C Program To Implement Bucket Sort
Bucket Sort is an efficient comparison sorting algorithm that distributes elements into a number of buckets, sorts each bucket individually, and then concatenates the sorted buckets. In this article, you will learn how to implement Bucket Sort in C for sorting an array of floating-point numbers.
Problem Statement
Efficiently sorting a list of elements is a fundamental problem in computer science. While general-purpose sorting algorithms like Quick Sort or Merge Sort work well for various data distributions, they might not be optimal for specific scenarios. When dealing with a large set of uniformly distributed numbers, especially floating-point numbers, an algorithm that leverages this distribution can significantly outperform comparison-based sorts, leading to faster processing times.
Example
Consider an unsorted array of floating-point numbers:
[0.89, 0.45, 0.67, 0.23, 0.09, 0.99]
After applying Bucket Sort, the array will be sorted in ascending order:
[0.09, 0.23, 0.45, 0.67, 0.89, 0.99]
Background & Knowledge Prerequisites
To understand and implement Bucket Sort in C, readers should be familiar with:
- C Language Basics: Variables, loops, conditional statements, functions.
- Arrays: Declaring and manipulating arrays.
- Pointers: Understanding pointer concepts and their use in C.
- Dynamic Memory Allocation: Using
malloc()andfree()for linked list nodes. - Linked Lists: Basic concepts of creating, inserting into, and traversing singly linked lists, as these are commonly used for implementing buckets.
Use Cases or Case Studies
Bucket Sort is particularly effective in scenarios where data is uniformly distributed over a range.
- Sorting large datasets of floating-point numbers: When numbers are distributed evenly, Bucket Sort can achieve linear time complexity (O(n+k)), where n is the number of elements and k is the number of buckets.
- Lexicographical sorting: Can be adapted to sort strings or other data based on specific characters or components.
- External sorting: When data cannot fit into memory, Bucket Sort can be used by sorting smaller chunks (buckets) individually.
- Data preprocessing for other algorithms: Sorting data can be a prerequisite for certain data analysis or search algorithms, and Bucket Sort offers an efficient way to achieve this for suitable datasets.
- Image processing: Can be used for histogram equalization or other pixel value manipulations where pixel values are often uniformly distributed within a range.
Solution Approaches
Implementing Bucket Sort in C
This approach details the standard implementation of Bucket Sort using an array of linked lists to represent the buckets. Each element is placed into a corresponding bucket, and then each bucket is sorted using a simple sorting method (like insertion sort for linked lists). Finally, the sorted elements are collected from the buckets.
// Bucket Sort Implementation
#include <stdio.h>
#include <stdlib.h> // Required for malloc, free
// Structure for a node in a linked list
struct Node {
float data;
struct Node* next;
};
// Function to create a new node for the linked list
struct Node* createNode(float data) {
struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
if (newNode == NULL) {
perror("Failed to allocate memory for node");
exit(EXIT_FAILURE); // Exit if memory allocation fails
}
newNode->data = data;
newNode->next = NULL;
return newNode;
}
// Function to insert a node into a linked list in sorted order
// This effectively sorts elements within each bucket as they are added.
void insertSorted(struct Node** head, float data) {
struct Node* newNode = createNode(data);
struct Node* current;
// If the list is empty or the new node should be the new head
if (*head == NULL || (*head)->data >= newNode->data) {
newNode->next = *head;
*head = newNode;
} else {
// Traverse the list to find the correct position
current = *head;
while (current->next != NULL && current->next->data < newNode->data) {
current = current->next;
}
newNode->next = current->next;
current->next = newNode;
}
}
// Function to free the memory allocated for a linked list
void freeList(struct Node* head) {
struct Node* temp;
while (head != NULL) {
temp = head;
head = head->next;
free(temp);
}
}
// Function to implement Bucket Sort
void bucketSort(float arr[], int n) {
// Determine the number of buckets. For numbers between 0.0 and 1.0, 10 buckets
// are a common choice, but can be adjusted based on data distribution.
int numBuckets = 10;
// Create an array of pointers to Node, which will act as the heads of our linked lists (buckets)
struct Node* buckets[numBuckets];
// Step 1: Initialize all buckets as empty
for (int i = 0; i < numBuckets; i++) {
buckets[i] = NULL;
}
// Step 2: Distribute array elements into different buckets
for (int i = 0; i < n; i++) {
// Calculate the bucket index for the current element.
// Multiplies the number by `numBuckets` and takes the integer part.
// E.g., 0.89 * 10 = 8.9, floor(8.9) = 8. So 0.89 goes into bucket 8.
int bucketIndex = (int)(arr[i] * numBuckets);
// Ensure the index does not exceed bounds for values very close to 1.0
if (bucketIndex >= numBuckets) {
bucketIndex = numBuckets - 1;
}
// Insert the element into the calculated bucket's linked list in sorted order
insertSorted(&buckets[bucketIndex], arr[i]);
}
// Step 3: Sort individual buckets (implicitly done by insertSorted)
// The `insertSorted` function ensures that elements are added to each bucket
// in their correct sorted position, effectively sorting the individual buckets.
// For very large buckets, other sorting algorithms like quicksort or mergesort
// could be applied to the linked lists here if `insertSorted` isn't used.
// Step 4: Concatenate all buckets back into the original array
int index = 0; // Index for placing elements back into the original array
for (int i = 0; i < numBuckets; i++) {
struct Node* current = buckets[i];
while (current != NULL) {
arr[index++] = current->data; // Copy element back to array
current = current->next;
}
// Free the memory allocated for the linked list in the current bucket
freeList(buckets[i]);
}
}
int main() {
// Example array of floating-point numbers to be sorted
float arr[] = {0.89, 0.45, 0.67, 0.23, 0.09, 0.99, 0.12, 0.78, 0.56, 0.34};
int n = sizeof(arr) / sizeof(arr[0]);
printf("Original array: ");
for (int i = 0; i < n; i++) {
printf("%.2f ", arr[i]);
}
printf("\\n");
// Call the bucketSort function
bucketSort(arr, n);
printf("Sorted array: ");
for (int i = 0; i < n; i++) {
printf("%.2f ", arr[i]);
}
printf("\\n");
return 0;
}
Sample Output
Original array: 0.89 0.45 0.67 0.23 0.09 0.99 0.12 0.78 0.56 0.34
Sorted array: 0.09 0.12 0.23 0.34 0.45 0.56 0.67 0.78 0.89 0.99
Stepwise Explanation for Clarity
- Define Structures: A
Nodestruct is defined to create linked lists for each bucket, holding thedata(float) and a pointer to thenextnode. Helper functionscreateNode,insertSorted, andfreeListare also defined to manage these linked lists. - Initialize Buckets: An array of
numBuckets(e.g., 10) pointers toNodeis declared, representing the buckets. Each pointer is initialized toNULL, signifying empty buckets. - Distribute Elements: The algorithm iterates through the input
arr. For each elementarr[i], it calculates abucketIndex(e.g.,(int)(arr[i] * numBuckets)) to determine which bucket it belongs to. The element is then inserted into the linked list at thatbucketIndexusing theinsertSortedfunction, which places it in its correct sorted position within that bucket. - Sort Buckets: In this implementation, the sorting of individual buckets is handled implicitly by the
insertSortedfunction. As elements are added to a bucket, they are placed in their proper sorted order within that bucket's linked list. For very large buckets, one might choose to use a more advanced sorting algorithm after all elements are distributed. - Concatenate Buckets: Finally, the algorithm iterates through each bucket from the first to the last. It traverses the linked list in each bucket and copies its elements back into the original
arrsequentially, effectively reconstructing the fully sorted array. - Clean Up: After copying elements, the memory allocated for each bucket's linked list is freed using
freeListto prevent memory leaks.
Conclusion
Bucket Sort is a distribution-based sorting algorithm particularly effective for sorting large datasets of uniformly distributed elements. Its efficiency often surpasses comparison-based sorts when the data distribution is ideal, achieving an average time complexity of O(n + k), where 'n' is the number of elements and 'k' is the number of buckets. However, its performance degrades significantly if data is not uniformly distributed, potentially collapsing to O(n^2) in the worst case (e.g., all elements falling into one bucket).
Summary
- Bucket Sort distributes elements into a fixed number of buckets.
- It typically works best for uniformly distributed numerical data.
- Each bucket is then sorted individually, often using a simpler sorting algorithm or by inserting elements in sorted order.
- Finally, elements are gathered from the buckets in order to produce the sorted output.
- Implemented using an array of linked lists in C for dynamic bucket sizing and in-bucket sorting.
- Time complexity is O(n + k) on average, but O(n^2) in the worst case.