Sorting Elements Of An Array By Frequency in C
Sorting an array by the frequency of its elements is a common task in data processing and analysis. It involves arranging the array elements such that those appearing most frequently come first, and elements with the same frequency are sorted by their value (typically in ascending order).
In this article, you will learn how to sort elements of an array by their frequency using C, exploring different approaches to tackle this problem effectively.
Problem Statement
Imagine you have a list of numbers, and you need to organize them based on how often each number appears. For instance, if the number '5' appears 10 times and '3' appears 5 times, '5' should come before '3' in the sorted list. If both '5' and '3' appear 5 times, then '3' would come before '5' (assuming ascending order for ties). This problem is crucial in areas like data analytics, recommendation systems, and competitive programming where understanding data distribution is key.
Example
Let's consider an input array: [2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12]
The desired output after sorting by frequency (descending) and then by value (ascending) would be: [3, 3, 3, 3, 2, 2, 2, 12, 12, 4, 5]
Let's break down the frequencies:
3: appears 4 times2: appears 3 times12: appears 2 times4: appears 1 time5: appears 1 time
Following the sorting rules:
- Elements with frequency 4:
3 - Elements with frequency 3:
2 - Elements with frequency 2:
12 - Elements with frequency 1:
4,5(sorted by value:4,5)
Combining them gives: [3, 3, 3, 3, 2, 2, 2, 12, 12, 4, 5]
Background & Knowledge Prerequisites
To understand and implement the solutions, you should have a basic understanding of:
- C Programming Basics: Variables, data types, loops (for, while), conditional statements (if-else).
- Arrays: Declaration, initialization, accessing elements.
- Functions: Defining and calling functions.
- Pointers: Basic understanding for array manipulation and
qsort. - Structures (
struct): Defining and using custom data types. - Standard Library Functions: Specifically
qsortfor sorting.
Use Cases or Case Studies
Sorting by frequency has diverse applications across various domains:
- Data Analytics: Identifying the most common items in a dataset (e.g., top-selling products, most frequent words in a document, most popular search queries).
- Recommendation Systems: Suggesting items that are frequently bought or viewed together with a user's current selection.
- Network Traffic Analysis: Pinpointing the most active IP addresses or port numbers in network logs to detect anomalies or high-usage patterns.
- Competitive Programming: Many algorithmic problems involve optimizing solutions based on element frequencies.
- Database Indexing: Optimizing data access by sorting frequently queried data keys.
Solution Approaches
We will explore two primary approaches for sorting elements by frequency in C. Both leverage custom comparison logic with qsort.
Approach 1: Using a Structure and qsort for General Integers
This approach involves creating a custom structure to store each unique element and its corresponding frequency. We then populate an array of these structures and use qsort with a custom comparison function to sort them. Finally, we reconstruct the output array.
One-line Summary: Count frequencies, store (element, frequency) pairs in a struct array, sort the struct array using qsort, then rebuild the original array.
// Sort Array by Frequency - Approach 1
#include <stdio.h>
#include <stdlib.h> // For qsort and malloc
// Structure to hold element value and its frequency
typedef struct {
int value;
int frequency;
} ElementFrequency;
// Comparison function for qsort
// Sorts by frequency (descending)
// If frequencies are equal, sorts by value (ascending)
int compareElementFrequency(const void *a, const void *b) {
ElementFrequency *efA = (ElementFrequency *)a;
ElementFrequency *efB = (ElementFrequency *)b;
// Primary sort: frequency (descending)
if (efA->frequency != efB->frequency) {
return efB->frequency - efA->frequency;
}
// Secondary sort: value (ascending) if frequencies are equal
return efA->value - efB->value;
}
// Function to sort array by frequency
void sortByFrequency(int arr[], int n) {
// Step 1: Count frequencies of each element
// We'll use a temporary array of ElementFrequency structs
// This is an inefficient way to count frequencies for large or sparse ranges
// but works for demonstration. Better would be a hash map or sorting first.
// For simplicity, we assume elements are within a reasonable range for demonstration.
// A more robust frequency counting involves sorting the input array first or using a hash map.
// Let's optimize frequency counting by first sorting the input array
// This groups identical elements together, making frequency counting easier.
qsort(arr, n, sizeof(int), [](const void* a, const void* b){
return *(int*)a - *(int*)b;
});
ElementFrequency *freqArray = (ElementFrequency *)malloc(n * sizeof(ElementFrequency));
if (freqArray == NULL) {
printf("Memory allocation failed.\\n");
return;
}
int uniqueCount = 0;
if (n > 0) {
freqArray[uniqueCount].value = arr[0];
freqArray[uniqueCount].frequency = 1;
uniqueCount++;
for (int i = 1; i < n; i++) {
if (arr[i] == arr[i-1]) {
freqArray[uniqueCount-1].frequency++;
} else {
freqArray[uniqueCount].value = arr[i];
freqArray[uniqueCount].frequency = 1;
uniqueCount++;
}
}
}
// Step 2: Sort the freqArray based on frequency and then value
qsort(freqArray, uniqueCount, sizeof(ElementFrequency), compareElementFrequency);
// Step 3: Reconstruct the original array with sorted elements
int k = 0;
for (int i = 0; i < uniqueCount; i++) {
for (int j = 0; j < freqArray[i].frequency; j++) {
arr[k++] = freqArray[i].value;
}
}
free(freqArray); // Free allocated memory
}
int main() {
int arr[] = {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12};
int n = sizeof(arr) / sizeof(arr[0]);
printf("Original array: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
sortByFrequency(arr, n);
printf("Sorted by frequency: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
// Another example
int arr2[] = {1, 1, 1, 2, 2, 3};
int n2 = sizeof(arr2) / sizeof(arr2[0]);
printf("\\nOriginal array 2: ");
for (int i = 0; i < n2; i++) {
printf("%d ", arr2[i]);
}
printf("\\n");
sortByFrequency(arr2, n2);
printf("Sorted by frequency 2: ");
for (int i = 0; i < n2; i++) {
printf("%d ", arr2[i]);
}
printf("\\n");
return 0;
}
Sample Output:
Original array: 2 3 2 4 5 12 2 3 3 3 12
Sorted by frequency: 3 3 3 3 2 2 2 12 12 4 5
Original array 2: 1 1 1 2 2 3
Sorted by frequency 2: 1 1 1 2 2 3
Stepwise Explanation:
- Define
ElementFrequencyStructure: AstructnamedElementFrequencyis created to store an integervalueand itsfrequency. compareElementFrequencyFunction: This custom comparison function is essential forqsort. It takes twoconst voidpointers, casts them toElementFrequency, and compares them:- It first compares
frequencyin descending order (efB->frequency - efA->frequency).
- It first compares
- If frequencies are equal, it then compares
valuein ascending order (efA->value - efB->value).
sortByFrequencyFunction:- Initial Sort (Optional but helpful): The input
arris first sorted usingqsortbased on element values. This groups identical elements together, making the subsequent frequency counting much simpler and more efficient than iterating through the array repeatedly.
- Initial Sort (Optional but helpful): The input
- Frequency Counting: It iterates through the now sorted input array
arr. As it encounters elements, it counts their frequencies. When a new unique element is found, a newElementFrequencyentry is added tofreqArray. - Allocate
freqArray: Dynamic memory is allocated for an array ofElementFrequencystructs to store the unique elements and their counts. - Populate
freqArray: It iterates through the sorted input array, counting consecutive identical elements to determine their frequencies and storing them infreqArray.uniqueCounttracks how many unique elements were found. - Sort
freqArray: ThefreqArrayis then sorted usingqsortwith our customcompareElementFrequencyfunction. This arranges the unique elements by their frequency and then by value. - Reconstruct Array: Finally, the original input
arris overwritten. It iterates through the sortedfreqArrayand, for eachElementFrequencyentry, adds itsvaluetoarrthe number of times indicated by itsfrequency. - Free Memory: The dynamically allocated
freqArraymemory is freed to prevent leaks.
mainFunction: Initializes an example array, prints it, callssortByFrequency, and prints the sorted array.
Approach 2: Using an Auxiliary Frequency Array (for Limited Range Integers)
This approach is optimized for scenarios where the array elements are non-negative and fall within a known, relatively small range (e.g., 0 to 1000). We use an auxiliary array as a frequency map.
One-line Summary: Count frequencies using an auxiliary array, create (element, frequency) pairs, sort these pairs using qsort, and then reconstruct the original array.
// Sort Array by Frequency - Approach 2
#include <stdio.h>
#include <stdlib.h> // For qsort and malloc
#include <limits.h> // For INT_MIN, INT_MAX (or just set a MAX_VAL)
// Structure to hold element value and its frequency (same as Approach 1)
typedef struct {
int value;
int frequency;
} ElementFrequency;
// Comparison function for qsort (same as Approach 1)
int compareElementFrequency_2(const void *a, const void *b) {
ElementFrequency *efA = (ElementFrequency *)a;
ElementFrequency *efB = (ElementFrequency *)b;
if (efA->frequency != efB->frequency) {
return efB->frequency - efA->frequency; // Descending frequency
}
return efA->value - efB->value; // Ascending value for ties
}
// Function to sort array by frequency using auxiliary frequency array
void sortByFrequencyLimitedRange(int arr[], int n, int max_val_in_arr) {
if (n == 0) return;
// Step 1: Use an auxiliary array to count frequencies
// The size of this array depends on the maximum possible value in arr.
// For demonstration, let's assume max_val_in_arr is known.
// In a real scenario, you'd find the max value in the array first.
int *freq_map = (int *)calloc(max_val_in_arr + 1, sizeof(int));
if (freq_map == NULL) {
printf("Memory allocation failed.\\n");
return;
}
for (int i = 0; i < n; i++) {
if (arr[i] >= 0 && arr[i] <= max_val_in_arr) { // Ensure value is within bounds
freq_map[arr[i]]++;
}
}
// Step 2: Create an array of ElementFrequency structs from the frequency map
ElementFrequency *freqArray = (ElementFrequency *)malloc((max_val_in_arr + 1) * sizeof(ElementFrequency));
if (freqArray == NULL) {
printf("Memory allocation failed.\\n");
free(freq_map);
return;
}
int uniqueCount = 0;
for (int i = 0; i <= max_val_in_arr; i++) {
if (freq_map[i] > 0) {
freqArray[uniqueCount].value = i;
freqArray[uniqueCount].frequency = freq_map[i];
uniqueCount++;
}
}
// Step 3: Sort the freqArray based on frequency and then value
qsort(freqArray, uniqueCount, sizeof(ElementFrequency), compareElementFrequency_2);
// Step 4: Reconstruct the original array with sorted elements
int k = 0;
for (int i = 0; i < uniqueCount; i++) {
for (int j = 0; j < freqArray[i].frequency; j++) {
arr[k++] = freqArray[i].value;
}
}
free(freq_map);
free(freqArray);
}
int main() {
int arr[] = {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12};
int n = sizeof(arr) / sizeof(arr[0]);
int max_val = 12; // Assuming max value is 12 for this example.
// In a real scenario, you'd find this dynamically.
printf("Original array: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
sortByFrequencyLimitedRange(arr, n, max_val);
printf("Sorted by frequency: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
// Another example
int arr3[] = {7, 1, 5, 1, 7, 7, 5, 1, 2};
int n3 = sizeof(arr3) / sizeof(arr3[0]);
int max_val3 = 7; // Max value in arr3
printf("\\nOriginal array 3: ");
for (int i = 0; i < n3; i++) {
printf("%d ", arr3[i]);
}
printf("\\n");
sortByFrequencyLimitedRange(arr3, n3, max_val3);
printf("Sorted by frequency 3: ");
for (int i = 0; i < n3; i++) {
printf("%d ", arr3[i]);
}
printf("\\n");
return 0;
}
Sample Output:
Original array: 2 3 2 4 5 12 2 3 3 3 12
Sorted by frequency: 3 3 3 3 2 2 2 12 12 4 5
Original array 3: 7 1 5 1 7 7 5 1 2
Sorted by frequency 3: 1 1 1 7 7 7 5 5 2
Stepwise Explanation:
- Determine
maxvalinarr: This approach requires knowing the maximum possible value an element can take in the array. This can be pre-determined or found by iterating through the array once. freqmapArray: An integer arrayfreqmapis created with a size ofmaxvalinarr + 1. This array acts as a direct mapping:freqmap[i]stores the frequency of the numberi.callocis used to initialize all elements to 0.- Count Frequencies: The input
arris iterated, and for each elementarr[i], its countfreqmap[arr[i]]is incremented. This is very efficient for frequency counting when elements are within the map's range. - Create
ElementFrequencyArray: An array ofElementFrequencystructs (freqArray) is populated. It iterates through thefreqmapfrom index 0 tomaxvalinarr. Iffreqmap[i]is greater than 0 (meaning elementiwas present in the input array), anElementFrequencystruct withvalue = iandfrequency = freqmap[i]is added tofreqArray.uniqueCounttracks the number of unique elements. - Sort
freqArray: Similar to Approach 1,qsortis used withcompareElementFrequency2to sortfreqArrayby frequency (descending) and then by value (ascending). - Reconstruct Array: The original
arris rebuilt by iterating through the sortedfreqArrayand appending eachvaluefrequencytimes. - Free Memory: Dynamically allocated
freqmapandfreqArraymemory are freed.
Conclusion
Sorting an array by frequency is a fundamental data manipulation task with various real-world applications. Both approaches discussed leverage the power of qsort with custom comparison functions to achieve the desired sorting:
- Approach 1 (General
structandqsort): Highly versatile for arrays with elements of any integer range, including negative numbers, as it first sorts the input array to efficiently count frequencies of unique elements. - Approach 2 (Auxiliary Frequency Array): Optimized for arrays where elements are non-negative and fall within a relatively small, known range. It provides very efficient frequency counting.
Choosing the right approach depends on the characteristics of your data (range of values, sparsity) and performance requirements. For general integer arrays, Approach 1 is robust. For limited positive integer ranges, Approach 2 offers a more direct way to count frequencies.
Summary
- Problem: Arrange array elements based on their frequency (most frequent first), then by value for ties.
- Core Idea: Count element frequencies, store them with their values, and then sort these pairs.
- Approach 1 (
struct+qsort): - Sort input array initially for efficient frequency counting.
- Create
ElementFrequencystructs for unique elements. - Sort
ElementFrequencyarray usingqsort(freq descending, value ascending). - Reconstruct the original array.
- Approach 2 (Auxiliary Array +
qsort): - Best for non-negative integers within a limited range.
- Use an auxiliary array as a direct frequency map.
- Create
ElementFrequencystructs from the map. - Sort
ElementFrequencyarray usingqsort. - Reconstruct the original array.
- Key Component:
qsortwith a custom comparison function is crucial for defining the sorting logic. - Memory Management: Remember to
freedynamically allocated memory (malloc,calloc) to prevent memory leaks.
Learning Verification
- What is the primary sorting criterion when sorting by frequency? What is the secondary criterion?
- Explain why
qsortis a suitable function for this problem in C. - In Approach 1, why is it beneficial to sort the input array before counting frequencies?
- When would you prefer Approach 2 over Approach 1, and what is its main limitation?
Challenge: Try to modify one of the approaches to sort elements with the same frequency in descending order of their value instead of ascending.