Count Distinct Elements In Array In C Program
Counting the distinct elements in an array is a common programming challenge that involves identifying and quantifying unique values within a collection. This task is fundamental in data analysis, algorithm optimization, and database operations. In this article, you will learn how to approach and solve this problem in C using several different methods, each with its own advantages and trade-offs.
Problem Statement
The problem requires determining the number of unique values present in a given array. For instance, if an array contains [1, 2, 2, 3, 4, 3, 5], the distinct elements are 1, 2, 3, 4, 5, and the count of distinct elements is 5. This is crucial in scenarios where only the unique data points are relevant, such as analyzing survey responses, tracking unique visitors, or processing sensor data to avoid redundant information.
Example
Consider the integer array: [10, 20, 10, 30, 40, 20, 50]
The distinct elements in this array are 10, 20, 30, 40, 50.
The total count of distinct elements is 5.
Background & Knowledge Prerequisites
To effectively understand and implement the solutions for counting distinct elements in C, readers should be familiar with:
- C Language Basics: Variables, data types, operators.
- Arrays: Declaring, initializing, and accessing elements.
- Loops: for and while loops for iteration.
- Conditional Statements: if-else constructs for decision-making.
- Functions: Defining and calling simple functions.
- Pointers: Basic understanding for functions like qsort.
The approaches in this article rely only on standard headers: stdio.h for input and output, stdlib.h for qsort, and stdbool.h for the bool type used in Approaches 3 and 4.
Use Cases or Case Studies
Counting distinct elements is a versatile operation applicable in various fields:
- Data Analytics: Identifying the number of unique products sold, unique customers, or unique error codes in logs to understand patterns and anomalies.
- Database Optimization: In relational databases, calculating COUNT(DISTINCT column_name) is a frequent query for summary reports.
- Network Monitoring: Determining the number of unique IP addresses accessing a server or the unique types of network packets observed to detect unusual activity.
- Inventory Management: Tracking the variety of different items in stock, regardless of their quantity.
- Competitive Programming: A common sub-problem in algorithmic challenges where efficient unique element counting can optimize overall solution performance.
Solution Approaches
Here are four common methods to count distinct elements in an array, ranging from straightforward to more optimized techniques.
Approach 1: Using Nested Loops (Brute Force)
This method iterates through each element and compares it with all preceding elements to check for duplicates. If an element has no duplicates before it, it is considered distinct.
- Summary: Iterate through the array; for each element, check if it has appeared before. If not, increment a distinct count.
- Code Example:
// Count Distinct Elements (Nested Loops)
#include <stdio.h>

int main(void) {
    int arr[] = {10, 20, 10, 30, 40, 20, 50};
    int n = sizeof(arr) / sizeof(arr[0]);
    int distinct_count = 0;

    // Step 1: Iterate through each element of the array
    for (int i = 0; i < n; i++) {
        // Step 2: Assume current element is distinct
        int is_distinct = 1;

        // Step 3: Compare current element with all preceding elements
        for (int j = 0; j < i; j++) {
            if (arr[i] == arr[j]) {
                is_distinct = 0; // Found a duplicate
                break;           // No need to check further for this element
            }
        }

        // Step 4: If no duplicate was found, increment distinct count
        if (is_distinct) {
            distinct_count++;
        }
    }

    printf("Array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    printf("Number of distinct elements (Nested Loops): %d\n", distinct_count);
    return 0;
}
- Sample Output:
Array: 10 20 10 30 40 20 50
Number of distinct elements (Nested Loops): 5
- Stepwise Explanation:
  - Initialize distinct_count to 0.
  - Use an outer loop (i) to go through each element of the array.
  - For each arr[i], set a flag is_distinct to 1 (assuming it's distinct).
  - Use an inner loop (j) to compare arr[i] with all elements arr[j] where j < i.
  - If arr[i] is found to be equal to any arr[j], it means arr[i] is a duplicate. Set is_distinct to 0 and break the inner loop.
  - After the inner loop, if is_distinct is still 1, it means arr[i] is a unique element encountered for the first time, so increment distinct_count.
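The steps above can also be packaged as a reusable function. The following is a minimal sketch, not part of the original program: the name count_distinct_nested and the use of size_t for sizes are assumptions made for illustration. It returns 0 for an empty array because neither loop body runs.

```c
#include <stddef.h>

/* Hypothetical helper: counts distinct values by checking each element
   against the elements that precede it. Returns 0 when n is 0. */
size_t count_distinct_nested(const int *arr, size_t n) {
    size_t distinct_count = 0;

    for (size_t i = 0; i < n; i++) {
        int is_distinct = 1;

        /* Look for an earlier occurrence of arr[i]. */
        for (size_t j = 0; j < i; j++) {
            if (arr[i] == arr[j]) {
                is_distinct = 0;
                break;
            }
        }

        if (is_distinct) {
            distinct_count++;
        }
    }
    return distinct_count;
}
```

Calling count_distinct_nested(arr, 7) on the example array {10, 20, 10, 30, 40, 20, 50} returns 5, which matches the sample output.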
Approach 2: Using a Sorting Algorithm
Sorting the array brings identical elements next to each other, making it easy to count distinct values by simply comparing adjacent elements.
- Summary: Sort the array first, then iterate through the sorted array, counting an element as distinct only if it's different from its predecessor.
- Code Example:
// Count Distinct Elements (Sorting)
#include <stdio.h>
#include <stdlib.h> // Required for qsort

// Comparison function for qsort (ascending order)
int compare(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y); // Avoids the overflow that x - y can cause
}

int main(void) {
    int arr[] = {10, 20, 10, 30, 40, 20, 50};
    int n = sizeof(arr) / sizeof(arr[0]);
    int distinct_count = 0;

    printf("Original Array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Step 1: Sort the array
    qsort(arr, n, sizeof(int), compare);

    printf("Sorted Array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Step 2: If the array is not empty, the first element is always distinct
    if (n > 0) {
        distinct_count = 1;
        // Step 3: Iterate through the sorted array from the second element
        for (int i = 1; i < n; i++) {
            // Step 4: If the current element is different from the previous one, it's distinct
            if (arr[i] != arr[i - 1]) {
                distinct_count++;
            }
        }
    }

    printf("Number of distinct elements (Sorting): %d\n", distinct_count);
    return 0;
}
- Sample Output:
Original Array: 10 20 10 30 40 20 50
Sorted Array: 10 10 20 20 30 40 50
Number of distinct elements (Sorting): 5
- Stepwise Explanation:
  - First, sort the array using qsort. This function requires a custom comparison function (compare in this case) to determine the order.
  - If the array is not empty, initialize distinct_count to 1 because the first element is always distinct.
  - Iterate through the sorted array starting from the second element (i = 1).
  - Compare arr[i] with its preceding element arr[i-1].
  - If arr[i] is different from arr[i-1], it means a new distinct element has been encountered, so increment distinct_count.
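One side effect of this approach is that qsort reorders the caller's array. If the original order matters, the same steps can be applied to a copy. The sketch below is one possible way to do that; the names count_distinct_sorted_copy and compare_ints, and the choice to return -1 on allocation failure, are assumptions made for illustration. The comparison function uses two relational tests instead of subtraction so that it stays correct for values near INT_MIN and INT_MAX.

```c
#include <stdlib.h>
#include <string.h>

/* Three-way comparison that cannot overflow, unlike returning x - y. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: returns the number of distinct values, or -1 if
   the working copy cannot be allocated. The input array is not modified. */
long count_distinct_sorted_copy(const int *arr, size_t n) {
    if (n == 0) {
        return 0;
    }

    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, arr, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, compare_ints);

    /* The first element is always distinct; count each change after it. */
    long distinct_count = 1;
    for (size_t i = 1; i < n; i++) {
        if (copy[i] != copy[i - 1]) {
            distinct_count++;
        }
    }

    free(copy);
    return distinct_count;
}
```

The trade-off is O(N) extra memory for the copy in exchange for leaving the input untouched.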
Approach 3: Using a Frequency Array (for limited integer range)
This method is highly efficient when the array elements are integers within a known, relatively small range (e.g., 0-1000). It uses an auxiliary array to mark the presence of each element.
- Summary: Create a boolean or integer array (frequency array) of size MAX_VAL + 1. Iterate through the input array, marking the presence of each element in the frequency array and counting each value the first time it is marked.
- Code Example:
// Count Distinct Elements (Frequency Array)
#include <stdio.h>
#include <stdbool.h> // For bool type

#define MAX_VAL 100 // Assuming array elements are within 0 to MAX_VAL

int main(void) {
    int arr[] = {10, 20, 10, 30, 40, 20, 50};
    int n = sizeof(arr) / sizeof(arr[0]);
    int distinct_count = 0;

    // Step 1: Create a boolean array to mark presence, initialized to false
    // The size should be MAX_VAL + 1 if elements can be 0 to MAX_VAL
    bool seen[MAX_VAL + 1] = {false}; // All elements are initially not seen

    // Step 2: Iterate through the input array
    for (int i = 0; i < n; i++) {
        // Step 3: Check if element is within valid range
        if (arr[i] >= 0 && arr[i] <= MAX_VAL) {
            // Step 4: If element hasn't been seen before, mark it as seen
            if (!seen[arr[i]]) {
                seen[arr[i]] = true;
                distinct_count++; // Increment count for new distinct element
            }
        } else {
            printf("Warning: Element %d out of defined range (0-%d). Skipping.\n", arr[i], MAX_VAL);
        }
    }

    printf("Array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    printf("Number of distinct elements (Frequency Array): %d\n", distinct_count);
    return 0;
}
- Sample Output:
Array: 10 20 10 30 40 20 50
Number of distinct elements (Frequency Array): 5
- Stepwise Explanation:
  - Define a MAX_VAL constant based on the expected maximum value in the array.
  - Declare a boolean array seen of size MAX_VAL + 1 and initialize all its elements to false. This seen array acts as a lookup table in which the index is the element value.
  - Iterate through the input array arr.
  - For each arr[i], first check that it lies between 0 and MAX_VAL. This prevents out-of-bounds access to the seen array; out-of-range values are skipped with a warning.
  - Check if seen[arr[i]] is false.
  - If it's false, it means this element is encountered for the first time. Set seen[arr[i]] to true and increment distinct_count.
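The example above assumes values start at 0. When the known range includes negative numbers, the same idea still works if each value is shifted by the lower bound before it is used as an index. The following sketch illustrates this under stated assumptions: the bounds MIN_VAL and MAX_VAL of -50 and 50, the function name count_distinct_in_range, and the skipped output parameter are all hypothetical choices, not part of the original program.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_VAL (-50) /* assumed lower bound, chosen for illustration */
#define MAX_VAL 50    /* assumed upper bound, chosen for illustration */

/* Hypothetical helper: counts distinct values inside [MIN_VAL, MAX_VAL].
   Values outside the range are ignored; their number is stored in
   *skipped when skipped is not NULL. */
size_t count_distinct_in_range(const int *arr, size_t n, size_t *skipped) {
    bool seen[MAX_VAL - MIN_VAL + 1] = {false};
    size_t distinct_count = 0;
    size_t out_of_range = 0;

    for (size_t i = 0; i < n; i++) {
        if (arr[i] < MIN_VAL || arr[i] > MAX_VAL) {
            out_of_range++;
            continue;
        }

        /* Shift the value so that MIN_VAL maps to index 0. */
        size_t slot = (size_t)(arr[i] - MIN_VAL);
        if (!seen[slot]) {
            seen[slot] = true;
            distinct_count++;
        }
    }

    if (skipped != NULL) {
        *skipped = out_of_range;
    }
    return distinct_count;
}
```

Reporting the number of skipped values, rather than printing a warning inside the loop, lets the caller decide how to handle out-of-range input.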
Approach 4: Using a Temporary Array to Store Distinct Elements
This approach creates a new array to store only the distinct elements found so far. Each new element from the original array is checked against this temporary array.
- Summary: Iterate through the original array. For each element, check if it already exists in a separate temporary array. If not, add it to the temporary array and increment the distinct count.
- Code Example:
// Count Distinct Elements (Temporary Array)
#include <stdio.h>
#include <stdbool.h> // For bool type

int main(void) {
    int arr[] = {10, 20, 10, 30, 40, 20, 50};
    int n = sizeof(arr) / sizeof(arr[0]);
    int distinct_count = 0;

    // Step 1: Create a temporary array to store distinct elements found
    // Its maximum size can be n (if all elements are distinct)
    // Note: this is a variable-length array, which requires C99 or later
    int distinct_elements[n];

    // Step 2: Iterate through the original array
    for (int i = 0; i < n; i++) {
        // Step 3: Assume current element is not yet in the temporary array
        bool found_in_distinct_array = false;

        // Step 4: Compare current element with elements already in distinct_elements array
        for (int j = 0; j < distinct_count; j++) {
            if (arr[i] == distinct_elements[j]) {
                found_in_distinct_array = true;
                break; // Found a duplicate in the distinct_elements array
            }
        }

        // Step 5: If not found, add it to distinct_elements array and increment count
        if (!found_in_distinct_array) {
            distinct_elements[distinct_count] = arr[i];
            distinct_count++;
        }
    }

    printf("Array: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    printf("Number of distinct elements (Temporary Array): %d\n", distinct_count);
    printf("Distinct Elements: ");
    for (int i = 0; i < distinct_count; i++) {
        printf("%d ", distinct_elements[i]);
    }
    printf("\n");
    return 0;
}
- Sample Output:
Array: 10 20 10 30 40 20 50
Number of distinct elements (Temporary Array): 5
Distinct Elements: 10 20 30 40 50
- Stepwise Explanation:
  - Declare an integer array distinct_elements with a maximum size equal to the original array's size n. This array will hold the unique elements as they are found.
  - Initialize distinct_count to 0. This also serves as the current size of the distinct_elements array.
  - Use an outer loop to iterate through each element arr[i] of the original array.
  - For each arr[i], use an inner loop to check if arr[i] already exists in the distinct_elements array (up to distinct_count elements).
  - If arr[i] is not found in distinct_elements, add it to distinct_elements at the distinct_count index and then increment distinct_count.
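A property of this approach that the sorted example input hides is that the distinct elements come out in the order in which they first appear, not in sorted order. The sketch below shows this with a function that writes into a caller-supplied buffer; the name collect_distinct and the buffer-passing interface are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: copies each value of arr into out the first time it
   appears and returns how many values were written.
   out must have room for at least n ints. */
size_t collect_distinct(const int *arr, size_t n, int *out) {
    size_t distinct_count = 0;

    for (size_t i = 0; i < n; i++) {
        /* Search the values kept so far for arr[i]. */
        size_t j = 0;
        while (j < distinct_count && out[j] != arr[i]) {
            j++;
        }

        /* Reaching the end of the kept values means arr[i] is new. */
        if (j == distinct_count) {
            out[distinct_count++] = arr[i];
        }
    }
    return distinct_count;
}
```

For the input {30, 10, 30, 20, 10}, the function returns 3 and fills the buffer with 30, 10, 20, preserving first-occurrence order.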
Conclusion
Counting distinct elements in an array is a foundational problem with several solutions, each suited to different scenarios. The nested loops approach is the simplest but takes O(N^2) time in the worst case, which makes it slow for large arrays. Sorting first typically reduces this to O(N log N), at the cost of reordering the array. The frequency array method runs in O(N + M) time, where M is the size of the value range, but applies only to integers within a known, limited range. The temporary array approach also needs O(N^2) comparisons in the worst case, but it produces the list of distinct elements along with the count. Choosing the best method depends on array size, element range, and performance requirements.
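To make the quadratic cost of the nested loops approach concrete, the comparisons can be counted directly. The following is an instrumented sketch, with the hypothetical name nested_loop_comparisons, that returns the number of element comparisons instead of the distinct count. When all N elements are distinct, the inner loop never exits early, so the total is 0 + 1 + ... + (N - 1) = N(N - 1)/2. When all elements are equal, every element after the first stops at its first comparison, so the total is N - 1.

```c
#include <stddef.h>

/* Hypothetical instrumented version of the nested-loop method: returns the
   number of element comparisons performed, not the distinct count. */
unsigned long nested_loop_comparisons(const int *arr, size_t n) {
    unsigned long comparisons = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            comparisons++;
            if (arr[i] == arr[j]) {
                break; /* duplicate found, stop scanning for this element */
            }
        }
    }
    return comparisons;
}
```

For an array of 1000 distinct values this returns 499500, while for 1000 copies of the same value it returns 999, so the cost depends heavily on how early duplicates are found.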
Summary
- Problem: Determine the number of unique values in an array.
- Nested Loops: Simple, O(N^2) time complexity. Compares each element with all preceding ones.
- Sorting: Efficient, O(N log N) time complexity. Sorts the array and then counts by comparing adjacent elements.
- Frequency Array: Highly efficient, O(N + M) time complexity (where M is max value). Best for integer arrays with a limited, known range, using an auxiliary array to mark presence.
- Temporary Array: Builds a list of unique elements, checking each new element against previously found unique elements. O(N^2) time complexity in the worst case, when most elements are distinct.
- Choice: Select based on array size, element value range, and performance needs.