Sorting Elements Of An Array By Frequency In C Program
Sorting elements in an array by their frequency means arranging them based on how often they appear, typically with the most frequent elements coming first.
If two elements have the same frequency, their original numerical value can be used as a tie-breaker, often in ascending order. In this article, you will learn how to implement this sorting logic in a C program using different approaches.
Problem Statement
Efficiently organizing data based on its recurrence is a fundamental challenge in data processing. Given an array of integers, the task is to sort these integers such that elements with higher frequencies appear before elements with lower frequencies. When frequencies are identical, the elements should be sorted by their values in ascending order. This problem has direct implications for data analysis, search algorithms, and optimizing data retrieval.
Example
Consider the following input array:
[2, 5, 2, 8, 5, 6, 8, 8]
The desired output, sorted by frequency (descending) and then by value (ascending), would be:
[8, 8, 8, 2, 2, 5, 5, 6]
Here's why:
8appears 3 times.2appears 2 times.5appears 2 times.6appears 1 time.
Since 2 and 5 have the same frequency (2), 2 comes before 5 because 2 < 5.
Background & Knowledge Prerequisites
To understand and implement the solutions, a basic understanding of the following C concepts is helpful:
- Arrays: Storing collections of elements.
- Structures (
struct): Creating custom data types to group related variables. - Loops:
forandwhileloops for iteration. - Functions: Defining and calling functions, especially for custom sorting logic.
- Pointers: Basic knowledge for
qsortusage. - Standard Library Functions: Familiarity with
stdio.hfor input/output andstdlib.hforqsort.
Use Cases or Case Studies
Sorting by frequency is valuable in various scenarios:
- Data Analysis: Identifying the most common items in a dataset (e.g., most purchased products, most frequent words in a document).
- Search Engine Optimization: Ranking search results or keywords based on their relevance and frequency in content.
- Compression Algorithms: More frequent symbols can be assigned shorter codes (e.g., Huffman coding).
- Network Traffic Analysis: Pinpointing IP addresses with the highest number of connection attempts to detect anomalies or attacks.
- Recommendation Systems: Suggesting items that are frequently associated with user preferences or popular among a community.
Solution Approaches
We will explore two distinct approaches to sort an array by frequency in C. The first approach involves manual counting and a basic sorting algorithm, providing clear insight into the logic. The second leverages the standard library's qsort function for a more efficient and concise solution.
Approach 1: Manual Frequency Counting and Bubble Sort
This approach involves creating a temporary structure to store each unique element and its frequency. Then, a simple sorting algorithm like bubble sort is applied to this structure based on the frequency criteria.
- Summary: Count frequencies of unique elements, store them in a custom struct array, and sort this array using a manual sorting algorithm (e.g., Bubble Sort).
C Code Example:
// Sort Array by Frequency - Manual Count & Bubble Sort
#include <stdio.h>
#include <stdlib.h> // For INT_MIN (though not strictly necessary for this approach, good for general practice)
// Structure to hold element and its frequency
struct ElementFreq {
int value;
int frequency;
};
// Function to print the sorted array
void printSortedArray(struct ElementFreq freqArray[], int uniqueCount) {
for (int i = 0; i < uniqueCount; i++) {
for (int j = 0; j < freqArray[i].frequency; j++) {
printf("%d ", freqArray[i].value);
}
}
printf("\\n");
}
int main() {
// Step 1: Initialize the input array
int arr[] = {2, 5, 2, 8, 5, 6, 8, 8};
int n = sizeof(arr) / sizeof(arr[0]);
// Step 2: Create a temporary array of ElementFreq structs
// Max size of this array can be 'n' (if all elements are unique)
struct ElementFreq freqArray[n];
int uniqueCount = 0; // Tracks the number of unique elements found
// Step 3: Count frequencies of each unique element
// Mark elements as visited by setting them to a value like INT_MIN
// This modifies the original array, so make a copy if needed.
// For simplicity, we'll iterate and use a 'visited' flag logic or check for existing.
// A better way is to iterate and then check if the element has already been added to freqArray.
for (int i = 0; i < n; i++) {
int isPresent = 0;
// Check if current element arr[i] is already in freqArray
for (int j = 0; j < uniqueCount; j++) {
if (freqArray[j].value == arr[i]) {
freqArray[j].frequency++;
isPresent = 1;
break;
}
}
// If not present, add it as a new unique element
if (!isPresent) {
freqArray[uniqueCount].value = arr[i];
freqArray[uniqueCount].frequency = 1;
uniqueCount++;
}
}
// Step 4: Sort the freqArray using Bubble Sort
// Sort criteria:
// 1. By frequency in descending order
// 2. If frequencies are same, by value in ascending order
for (int i = 0; i < uniqueCount - 1; i++) {
for (int j = 0; j < uniqueCount - i - 1; j++) {
// Compare frequencies
if (freqArray[j].frequency < freqArray[j+1].frequency) {
// Swap if current freq is less than next freq
struct ElementFreq temp = freqArray[j];
freqArray[j] = freqArray[j+1];
freqArray[j+1] = temp;
} else if (freqArray[j].frequency == freqArray[j+1].frequency) {
// If frequencies are equal, compare values
if (freqArray[j].value > freqArray[j+1].value) {
// Swap if current value is greater than next value
struct ElementFreq temp = freqArray[j];
freqArray[j] = freqArray[j+1];
freqArray[j+1] = temp;
}
}
}
}
// Step 5: Print the elements based on the sorted freqArray
printf("Original array: ");
for(int i=0; i<n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
printf("Sorted by frequency: ");
printSortedArray(freqArray, uniqueCount);
return 0;
}
Sample Output:
Original array: 2 5 2 8 5 6 8 8
Sorted by frequency: 8 8 8 2 2 5 5 6
Stepwise Explanation:
- Define
ElementFreqStructure: Astruct ElementFreqis created to hold an integervalueand its correspondingfrequency. - Initialize
freqArray: An array ofElementFreqstructs,freqArray, is declared to store the unique elements and their counts.uniqueCounttracks how many unique elements have been found. - Count Frequencies:
- The code iterates through the original input array
arr.
- The code iterates through the original input array
- For each element, it checks if it has already been added to
freqArray. - If found, its
frequencyis incremented. - If not found, it's added as a new unique element with a frequency of 1, and
uniqueCountis incremented.
- Sort
freqArray:- A standard Bubble Sort algorithm is applied to
freqArrayup touniqueCount.
- A standard Bubble Sort algorithm is applied to
- The comparison logic is crucial:
- It first compares the
frequencyof two elements. If the current element's frequency is less than the next, they are swapped (for descending frequency order). - If frequencies are equal, it then compares their
value. If the current element's value is greater than the next, they are swapped (for ascending value order).
- Print Sorted Array: The
printSortedArrayfunction iterates through the sortedfreqArray. For eachElementFreqstruct, it prints itsvalueas many times as itsfrequency.
Approach 2: Using qsort with a Custom Comparison Function
This approach is similar in concept to Approach 1 for counting frequencies, but it leverages C's standard library qsort function for sorting. qsort is generally more efficient than manual sorts for larger datasets and requires a custom comparison function.
- Summary: Count frequencies of unique elements, store them in a custom struct array, and sort this array using the standard
qsortfunction with a custom comparison function that implements the frequency and value sorting logic.
C Code Example:
// Sort Array by Frequency - qsort Approach
#include <stdio.h>
#include <stdlib.h> // For qsort
// Structure to hold element and its frequency
struct ElementFreq {
int value;
int frequency;
};
// Custom comparison function for qsort
// It defines the sorting logic:
// 1. Sort by frequency in descending order
// 2. If frequencies are same, sort by value in ascending order
int compare(const void *a, const void *b) {
struct ElementFreq *elemA = (struct ElementFreq *)a;
struct ElementFreq *elemB = (struct ElementFreq *)b;
// Compare frequencies first (descending order)
if (elemA->frequency != elemB->frequency) {
return elemB->frequency - elemA->frequency; // For descending frequency
}
// If frequencies are equal, compare values (ascending order)
return elemA->value - elemB->value; // For ascending value
}
// Function to print the sorted array
void printSortedArray(struct ElementFreq freqArray[], int uniqueCount) {
for (int i = 0; i < uniqueCount; i++) {
for (int j = 0; j < freqArray[i].frequency; j++) {
printf("%d ", freqArray[i].value);
}
}
printf("\\n");
}
int main() {
// Step 1: Initialize the input array
int arr[] = {2, 5, 2, 8, 5, 6, 8, 8};
int n = sizeof(arr) / sizeof(arr[0]);
// Step 2: Create a temporary array of ElementFreq structs
struct ElementFreq freqArray[n];
int uniqueCount = 0;
// Step 3: Count frequencies of each unique element
for (int i = 0; i < n; i++) {
int isPresent = 0;
for (int j = 0; j < uniqueCount; j++) {
if (freqArray[j].value == arr[i]) {
freqArray[j].frequency++;
isPresent = 1;
break;
}
}
if (!isPresent) {
freqArray[uniqueCount].value = arr[i];
freqArray[uniqueCount].frequency = 1;
uniqueCount++;
}
}
// Step 4: Sort the freqArray using qsort with the custom comparison function
qsort(freqArray, uniqueCount, sizeof(struct ElementFreq), compare);
// Step 5: Print the elements based on the sorted freqArray
printf("Original array: ");
for(int i=0; i<n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
printf("Sorted by frequency: ");
printSortedArray(freqArray, uniqueCount);
return 0;
}
Sample Output:
Original array: 2 5 2 8 5 6 8 8
Sorted by frequency: 8 8 8 2 2 5 5 6
Stepwise Explanation:
- Define
ElementFreqStructure: Same as Approach 1,struct ElementFreqstoresvalueandfrequency. compareFunction: This is the heart of theqsortapproach.- It takes two
const voidpointers, whichqsortpasses. These are cast back tostruct ElementFreqto access their members.
- It takes two
- Frequency Comparison: It first compares
elemB->frequencywithelemA->frequency. SubtractingelemAfromelemBresults in a positive value ifelemBhas a higher frequency (meaningelemBcomes beforeelemA), a negative value ifelemAhas higher (meaningelemAcomes beforeelemB), and zero if they are equal. This achieves descending order by frequency. - Value Tie-breaker: If frequencies are equal, it then compares
elemA->valuewithelemB->value. SubtractingelemBfromelemAresults in ascending order by value.
- Initialize
freqArrayand Count Frequencies: This part is identical to Approach 1, where unique elements and their frequencies are populated intofreqArray. - Sort with
qsort: Theqsortfunction is called:freqArray: The base address of the array to be sorted.
uniqueCount: The number of elements in the array to sort.sizeof(struct ElementFreq): The size of each element in the array.compare: A pointer to the custom comparison function.
- Print Sorted Array: Identical to Approach 1, it iterates through the
freqArrayand prints each element its corresponding number of times.
Conclusion
Sorting an array by the frequency of its elements is a common task in programming and data analysis. We've explored two robust methods in C. The manual approach with Bubble Sort provides a clear step-by-step understanding of the underlying logic for counting and sorting. The qsort approach offers a more efficient and idiomatic C solution by leveraging the standard library's powerful sorting capabilities with a custom comparison function. Both methods effectively achieve the desired frequency-based sorting while demonstrating different ways to handle the sorting logic in C.
Summary
- Problem: Sort array elements by frequency (descending) and then by value (ascending) for ties.
- Approach 1 (Manual Count & Bubble Sort):
- Uses a
struct ElementFreq { int value; int frequency; }to store unique element data. - Iterates through the original array to populate the
ElementFreqarray, counting frequencies. - Applies a manual sorting algorithm (e.g., Bubble Sort) to
ElementFreqarray with custom comparison logic. - Approach 2 (
qsortwith Custom Comparison): - Also uses a
struct ElementFreqfor frequency counting. - Leverages the
qsortstandard library function for efficient sorting. - Requires a
comparefunction that defines the sorting order: frequency (descending), then value (ascending) for ties. - Key Concept: Creating a temporary structure to pair elements with their frequencies is central to both solutions.
- Efficiency:
qsortis generally more efficient for larger datasets due to its optimized implementation (often Quicksort or Introsort).