C Program For Implementation Of Post Man Sort Algorithms
In this article, you will learn about the Postman Sort algorithm, its practical applications, and a detailed C implementation to sort data based on multiple keys.
Problem Statement
Efficiently sorting data that contains multiple hierarchical attributes, such as addresses, is a common challenge. Standard sorting algorithms typically handle a single key. However, for complex records like mailing addresses, you need to sort first by zip code, then by street, and finally by house number. A single pass of a standard sort won't achieve this multi-level hierarchy, leading to unsorted data within sub-groups. This inefficiency can impact database queries, logistics, and data organization, where the order of nested fields is crucial.
Background & Knowledge Prerequisites
To understand Postman Sort, familiarity with the following concepts is helpful:
- C Programming Basics: Variables, data types, structures, arrays, loops, and functions.
- Basic Sorting Algorithms: Concepts of comparison sorts (like Bubble Sort or Insertion Sort) and their stability.
- Radix Sort Principles: Understanding how counting sort or bucket sort can be used as a stable sub-routine to sort numbers by digits.
- Stable Sorting: A sorting algorithm is stable if it preserves the relative order of equal elements. This is critical for multi-key sorting.
Use Cases or Case Studies
Postman Sort is particularly useful in scenarios requiring hierarchical sorting:
- Mail Delivery Systems: Sorting physical mail by country, then state, city, zip code, street name, and finally house number. This optimizes delivery routes.
- Database Management: Sorting records in a database table by multiple columns (e.g.,
DepartmentthenLastNamethenFirstName). - Logistics and Inventory: Organizing items in a warehouse by zone, aisle, shelf, and bin number for efficient retrieval.
- Geographic Information Systems (GIS): Ordering geographical data by administrative divisions like continent, country, region, and city.
- Network Packet Routing: Sorting packets based on multiple header fields to prioritize or direct traffic efficiently.
Solution Approaches: Postman Sort Algorithm
Postman Sort is a multi-key stable sorting algorithm often implemented using a variant of Radix Sort. It works by sorting the data multiple times, starting from the least significant key to the most significant key, or vice-versa, depending on the specific implementation (LSD-first or MSD-first). For address sorting, an MSD-first approach (like sorting by zip code first, then street, then house number) is intuitive. However, a common implementation uses an LSD-first approach with a stable sort, as this allows for simpler aggregation of results.
Here, we will demonstrate an LSD-first approach using a modified Counting Sort as the stable sub-routine for each key.
Approach 1: Postman Sort using Multi-Key Stable Counting Sort
This approach involves sorting the data elements iteratively, starting with the least significant attribute and progressing to the most significant attribute. Each sorting pass must use a stable sorting algorithm.
One-line summary
Sorts elements with multiple keys by iteratively applying a stable sort (like Counting Sort) from the least significant key to the most significant key.Code Example
// Postman Sort Algorithm
#include <stdio.h>
#include <stdlib.h> // For malloc, free
#include <string.h> // For strcpy, strlen
// Structure to represent a mail item with multiple keys
typedef struct MailItem {
int zipCode;
int districtID;
int houseNumber;
char addressLabel[50]; // For display purposes
} MailItem;
// Helper function to get the maximum value for a specific key
int getMax(MailItem arr[], int n, int keyType) {
int max = 0;
if (n == 0) return 0;
switch (keyType) {
case 0: // houseNumber
max = arr[0].houseNumber;
for (int i = 1; i < n; i++) {
if (arr[i].houseNumber > max)
max = arr[i].houseNumber;
}
break;
case 1: // districtID
max = arr[0].districtID;
for (int i = 1; i < n; i++) {
if (arr[i].districtID > max)
max = arr[i].districtID;
}
break;
case 2: // zipCode
max = arr[0].zipCode;
for (int i = 1; i < n; i++) {
if (arr[i].zipCode > max)
max = arr[i].zipCode;
}
break;
}
return max;
}
// A utility function to do counting sort of arr[] according to
// the key represented by keyType (0 for houseNumber, 1 for districtID, 2 for zipCode)
void countingSortForKey(MailItem arr[], int n, int keyType) {
MailItem *output = (MailItem *) malloc(n * sizeof(MailItem));
if (output == NULL) {
perror("Failed to allocate memory for output array");
exit(EXIT_FAILURE);
}
int maxKey = getMax(arr, n, keyType);
int *count = (int *) calloc((maxKey + 1), sizeof(int));
if (count == NULL) {
perror("Failed to allocate memory for count array");
free(output);
exit(EXIT_FAILURE);
}
// Store count of occurrences in count[]
for (int i = 0; i < n; i++) {
switch (keyType) {
case 0: count[arr[i].houseNumber]++; break;
case 1: count[arr[i].districtID]++; break;
case 2: count[arr[i].zipCode]++; break;
}
}
// Change count[i] so that count[i] now contains actual
// position of this key in output array
for (int i = 1; i <= maxKey; i++) {
count[i] += count[i - 1];
}
// Build the output array (iterate from end to maintain stability)
for (int i = n - 1; i >= 0; i--) {
switch (keyType) {
case 0:
output[count[arr[i].houseNumber] - 1] = arr[i];
count[arr[i].houseNumber]--;
break;
case 1:
output[count[arr[i].districtID] - 1] = arr[i];
count[arr[i].districtID]--;
break;
case 2:
output[count[arr[i].zipCode] - 1] = arr[i];
count[arr[i].zipCode]--;
break;
}
}
// Copy the output array to arr[], so that arr[] now contains sorted numbers
for (int i = 0; i < n; i++) {
arr[i] = output[i];
}
free(output);
free(count);
}
// The main function to perform Postman Sort
// Sorts by houseNumber, then districtID, then zipCode
void postmanSort(MailItem arr[], int n) {
// Sort by houseNumber (LSD)
countingSortForKey(arr, n, 0);
// Sort by districtID
countingSortForKey(arr, n, 1);
// Sort by zipCode (MSD)
countingSortForKey(arr, n, 2);
}
// Function to print an array of MailItem
void printMailItems(MailItem arr[], int n) {
for (int i = 0; i < n; i++) {
printf("Zip: %5d, Dist: %2d, House: %3d, Address: %s\\n",
arr[i].zipCode, arr[i].districtID, arr[i].houseNumber, arr[i].addressLabel);
}
}
int main() {
// Step 1: Initialize mail items
MailItem mail[] = {
{12345, 2, 101, "101 Elm St, Dist 2, ZC 12345"},
{12345, 1, 203, "203 Oak Ave, Dist 1, ZC 12345"},
{54321, 3, 50, "50 Pine Rd, Dist 3, ZC 54321"},
{12345, 2, 99, "99 Elm St, Dist 2, ZC 12345"},
{54321, 1, 15, "15 Pine Rd, Dist 1, ZC 54321"},
{98765, 1, 1, "1 Maple Ln, Dist 1, ZC 98765"},
{12345, 1, 203, "203 Main St, Dist 1, ZC 12345"}, // Duplicate houseNumber and zip with different street
{98765, 1, 10, "10 Maple Ln, Dist 1, ZC 98765"}
};
int n = sizeof(mail) / sizeof(mail[0]);
printf("Original Mail Items:\\n");
printMailItems(mail, n);
printf("\\n");
// Step 2: Apply Postman Sort
postmanSort(mail, n);
// Step 3: Print sorted mail items
printf("Sorted Mail Items (by houseNumber, then districtID, then zipCode):\\n");
printMailItems(mail, n);
return 0;
}
Sample Output
Original Mail Items:
Zip: 12345, Dist: 2, House: 101, Address: 101 Elm St, Dist 2, ZC 12345
Zip: 12345, Dist: 1, House: 203, Address: 203 Oak Ave, Dist 1, ZC 12345
Zip: 54321, Dist: 3, House: 50, Address: 50 Pine Rd, Dist 3, ZC 54321
Zip: 12345, Dist: 2, House: 99, Address: 99 Elm St, Dist 2, ZC 12345
Zip: 54321, Dist: 1, House: 15, Address: 15 Pine Rd, Dist 1, ZC 54321
Zip: 98765, Dist: 1, House: 1, Address: 1 Maple Ln, Dist 1, ZC 98765
Zip: 12345, Dist: 1, House: 203, Address: 203 Main St, Dist 1, ZC 12345
Zip: 98765, Dist: 1, House: 10, Address: 10 Maple Ln, Dist 1, ZC 98765
Sorted Mail Items (by houseNumber, then districtID, then zipCode):
Zip: 98765, Dist: 1, House: 1, Address: 1 Maple Ln, Dist 1, ZC 98765
Zip: 98765, Dist: 1, House: 10, Address: 10 Maple Ln, Dist 1, ZC 98765
Zip: 54321, Dist: 1, House: 15, Address: 15 Pine Rd, Dist 1, ZC 54321
Zip: 54321, Dist: 3, House: 50, Address: 50 Pine Rd, Dist 3, ZC 54321
Zip: 12345, Dist: 2, House: 99, Address: 99 Elm St, Dist 2, ZC 12345
Zip: 12345, Dist: 2, House: 101, Address: 101 Elm St, Dist 2, ZC 12345
Zip: 12345, Dist: 1, House: 203, Address: 203 Oak Ave, Dist 1, ZC 12345
Zip: 12345, Dist: 1, House: 203, Address: 203 Main St, Dist 1, ZC 12345
Stepwise Explanation for Clarity
- Define
MailItemStructure: Astruct MailItemis created to hold multiple attributes (keys) for each piece of mail:zipCode,districtID,houseNumber, and anaddressLabelfor descriptive purposes. getMaxHelper Function: This function determines the maximum value for a given key type (e.g., maximumhouseNumber) across the array. This is necessary to size thecountarray for Counting Sort.countingSortForKeyFunction: This is the core stable sorting routine.
- It takes the array, its size, and a
keyType(0 forhouseNumber, 1 fordistrictID, 2 forzipCode) as input. - It dynamically allocates an
outputarray to store sorted elements and acountarray to store frequencies of each key value. - It counts the occurrences of each key value.
- It modifies the
countarray to store the actual position of each key in theoutputarray. - It iterates through the input array *from right to left* to build the
outputarray. This reverse iteration is crucial for maintaining the stability of the sort. - Finally, it copies the sorted
outputarray back to the originalarr, and frees the temporary memory.
postmanSortFunction: This function orchestrates the multi-pass sorting:
- It first calls
countingSortForKeyto sort theMailItemarray byhouseNumber(the least significant key in our chosen hierarchy). - Next, it calls
countingSortForKeyto sort the now partially sorted array bydistrictID. BecausecountingSortForKeyis stable, elements with the samedistrictIDwill retain their relative order from the previoushouseNumbersort. - Finally, it sorts by
zipCode(the most significant key). Again, stability ensures that items with the samezipCoderemain sorted bydistrictID, and within that, byhouseNumber.
mainFunction:
- Initializes an array of
MailItemwith unsorted data. - Prints the original array.
- Calls
postmanSortto apply the algorithm. - Prints the final sorted array, demonstrating the hierarchical sorting.
Conclusion
The Postman Sort algorithm provides an effective and stable method for sorting data based on multiple hierarchical keys. By leveraging a stable sub-sorting algorithm like Counting Sort and applying it iteratively from the least significant to the most significant key, it ensures that data is correctly ordered at every level. This makes it invaluable for applications requiring precise multi-field data organization.
Summary
- Multi-Key Sorting: Postman Sort is designed for data with multiple attributes requiring hierarchical ordering.
- Stable Sub-Sort: It relies on a stable sorting algorithm (e.g., Counting Sort or Bucket Sort) for each pass to preserve the relative order of equal elements.
- Iterative Passes: The sorting is performed in multiple passes, typically from the least significant key to the most significant key (LSD-first approach).
- Applications: Common in mail delivery, database record sorting, logistics, and any system needing complex, multi-level data organization.
- Efficiency: When key ranges are suitable for Counting Sort, Postman Sort can achieve linear time complexity (O(N*k) where N is number of items and k is number of keys), making it very efficient for large datasets.