C++ Program To Implement Bucket Sort
In this article, you will learn how to implement Bucket Sort, an efficient sorting algorithm particularly well-suited for uniformly distributed data, using C++.
Problem Statement
Efficiently sorting a large collection of items is a fundamental problem in computer science. While comparison-based sorts like Merge Sort or Quick Sort have a theoretical lower bound of O(N log N) time complexity, non-comparison sorts can achieve better performance in specific scenarios. The challenge lies in finding an algorithm that can sort data, especially when it is uniformly distributed over a range, faster than comparison-based methods.
Example
Consider an unsorted list of floating-point numbers that are uniformly distributed between 0.0 and 1.0:
Input: [0.72, 0.54, 0.91, 0.12, 0.38, 0.65, 0.29, 0.83]
After applying Bucket Sort, the list becomes:
Output: [0.12, 0.29, 0.38, 0.54, 0.65, 0.72, 0.83, 0.91]
Background & Knowledge Prerequisites
To understand and implement Bucket Sort effectively in C++, you should be familiar with:
- C++ Basics: Fundamental syntax, data types, loops, and functions.
- Arrays/Vectors: How to declare, initialize, and manipulate dynamic arrays using
std::vector. -
std::vectorofstd::vector: Understanding how to create and manage a two-dimensional dynamic structure. - Basic Sorting Concepts: An understanding of what sorting entails and familiarity with simple sorting algorithms (e.g., Insertion Sort) will be helpful, though
std::sortwill be used here. - Included Headers: You will typically need
for input/output,for dynamic arrays, andforstd::sort.
Use Cases or Case Studies
Bucket Sort is particularly effective in several practical scenarios:
- Sorting Floating-Point Numbers: It performs exceptionally well when sorting numbers within a known range (e.g., 0.0 to 1.0) with a uniform distribution.
- Large Datasets with Uniform Distribution: For very large sets of data where items are spread evenly across a range, Bucket Sort can outperform comparison-based sorts.
- External Sorting: When data is too large to fit into memory, Bucket Sort can be adapted to process chunks of data, writing sorted buckets to disk.
- Data with Limited Range: If the data values are constrained to a relatively small integer or floating-point range, assigning them to buckets becomes straightforward and efficient.
- Preprocessing Step for Other Algorithms: Sometimes used as a pre-sorting step to organize data before applying more complex algorithms that benefit from partially sorted input.
Solution Approaches
Standard Bucket Sort Implementation
This approach involves creating a set of "buckets," distributing the input elements into these buckets, sorting each individual bucket, and then concatenating the sorted buckets to produce the final sorted output.
One-line summary
Divides elements into a fixed number of buckets, sorts each bucket, then combines them.Code Example
// Bucket Sort Implementation
#include <iostream>
#include <vector>
#include <algorithm> // For std::sort
#include <iomanip> // For std::fixed and std::setprecision
// Function to perform Bucket Sort
void bucketSort(std::vector<double>& arr) {
// Step 1: Determine the number of buckets
// A common heuristic is to use approximately sqrt(N) or N/K buckets.
// For simplicity, let's use 10 buckets for numbers between 0.0 and 1.0.
int numBuckets = 10;
// Step 2: Create 'numBuckets' empty buckets
std::vector<std::vector<double>> buckets(numBuckets);
// Step 3: Distribute elements into buckets
// Iterate through the input array and place each element into its corresponding bucket.
// The bucket index is calculated by scaling the element value.
for (double val : arr) {
// Ensure val is within the expected range [0.0, 1.0) for this scaling
int bucketIndex = static_cast<int>(numBuckets * val);
// Handle edge case if val == 1.0, though examples typically use [0.0, 1.0)
if (bucketIndex >= numBuckets) {
bucketIndex = numBuckets - 1; // Put 1.0 in the last bucket
}
buckets[bucketIndex].push_back(val);
}
// Step 4: Sort individual buckets
// Iterate through each bucket and sort its contents.
// std::sort uses Introsort (a hybrid of QuickSort, HeapSort, and InsertionSort),
// which is efficient for smaller collections as well.
for (int i = 0; i < numBuckets; ++i) {
std::sort(buckets[i].begin(), buckets[i].end());
}
// Step 5: Concatenate sorted buckets
// Iterate through the buckets and place their sorted elements back into the original array.
int index = 0;
for (int i = 0; i < numBuckets; ++i) {
for (double val : buckets[i]) {
arr[index++] = val;
}
}
}
int main() {
// Step 1: Initialize an unsorted vector of double values
std::vector<double> data = {0.72, 0.54, 0.91, 0.12, 0.38, 0.65, 0.29, 0.83, 0.05, 0.47, 0.99, 0.01};
// Step 2: Print the original array
std::cout << "Original array: ";
for (double val : data) {
std::cout << std::fixed << std::setprecision(2) << val << " ";
}
std::cout << std::endl;
// Step 3: Apply Bucket Sort
bucketSort(data);
// Step 4: Print the sorted array
std::cout << "Sorted array: ";
for (double val : data) {
std::cout << std::fixed << std::setprecision(2) << val << " ";
}
std::cout << std::endl;
return 0;
}
Sample Output
Original array: 0.72 0.54 0.91 0.12 0.38 0.65 0.29 0.83 0.05 0.47 0.99 0.01
Sorted array: 0.01 0.05 0.12 0.29 0.38 0.47 0.54 0.65 0.72 0.83 0.91 0.99
Stepwise Explanation for Clarity
- Determine Number of Buckets: We start by deciding how many buckets to use. For floating-point numbers in the range
[0.0, 1.0), a common approach is to use a fixed number like 10, or a number proportional to the input size. The choice impacts efficiency. - Create Buckets: A
std::vectorofstd::vectoris initialized. Each innerstd::vectoracts as a bucket, capable of holding multipledoublevalues. - Distribute Elements: The core of Bucket Sort. Each element from the input array is iterated. Its value is used to calculate an index, determining which bucket it belongs to. For values between 0.0 and 1.0, multiplying by
numBucketsand casting tointgives an appropriate index. The element is thenpush_backinto the chosen bucket. - Sort Individual Buckets: Once all elements are distributed, each bucket needs to be sorted independently. Since buckets typically contain fewer elements than the original array,
std::sort(which is highly optimized) is efficient here. If buckets become very small, insertion sort might be slightly faster, butstd::sorthandles small inputs well too. - Concatenate Sorted Buckets: Finally, the elements are gathered back from the buckets. By iterating through the buckets sequentially and appending their sorted contents back into the original array, the entire array becomes sorted.
This implementation assumes input values are in the range [0.0, 1.0). For a general range [min_val, max_val], elements would first need to be mapped to [0.0, 1.0) using a transformation like (val - min_val) / (max_val - min_val) before calculating the bucket index.
Conclusion
Bucket Sort is an effective non-comparison-based sorting algorithm that can achieve linear time complexity (O(N+K) where N is the number of elements and K is the number of buckets) when data is uniformly distributed. Its efficiency stems from dividing the problem into smaller, more manageable subproblems (sorting individual buckets) and then combining the results. While it requires additional space for the buckets, its performance benefits make it a valuable tool for specific data types and distributions.
Summary
- Algorithm Type: Non-comparison sort.
- Best Use Case: Uniformly distributed data, especially floating-point numbers.
- Time Complexity: O(N+K) on average for uniformly distributed data, where N is elements and K is buckets. Worst-case can be O(N^2) if all elements fall into one bucket.
- Space Complexity: O(N+K) due to storing elements in buckets.
- Key Steps:
- Create
Kempty buckets. - Place each element into its corresponding bucket.
- Sort each non-empty bucket.
- Concatenate elements from sorted buckets.