Remove Duplicate Elements In An Array Using C Program
Understanding how to manage and manipulate data efficiently is a fundamental skill in programming. One common task involves processing lists or collections of items to ensure uniqueness. Specifically, removing duplicate elements from an array is a challenge that frequently arises in various programming contexts. In this article, you will learn how to identify and eliminate duplicate entries from an array using different C programming approaches.
Problem Statement
The core problem is to take an array of elements (e.g., integers) that may contain multiple instances of the same value and transform it into an array where each value appears only once. The challenge often lies in performing this operation efficiently, either by using minimal additional memory or by modifying the array directly (in-place).
For instance, given an array [1, 2, 3, 2, 1, 4], the desired output after removing duplicates would be [1, 2, 3, 4]. The order of the unique elements might or might not be preserved depending on the chosen approach.
Example
Consider an initial array of integers:
int arr[] = {10, 20, 20, 30, 40, 40, 50};
After applying a duplicate removal process, the array should effectively contain:
{10, 20, 30, 40, 50}
The actual implementation might return a new array with these elements or modify the original array and provide a new "effective" size.
Background & Knowledge Prerequisites
To understand the solutions presented, readers should have a basic understanding of:
- C Programming Fundamentals: Variables, data types, loops (
for,while), conditional statements (if,else). - Arrays: How to declare, initialize, and access elements in a one-dimensional array.
- Functions: Basic function declaration and calling.
No specific libraries beyond standard input/output (stdio.h) are required for these examples.
Use Cases or Case Studies
Removing duplicate elements from an array is a widely applicable task across different domains:
- Data Cleaning: In data processing, raw datasets often contain redundant entries. Removing duplicates helps ensure data integrity and reduces processing overhead.
- Database Operations: Before inserting data into a database table with unique constraints, duplicates might need to be filtered out to prevent errors.
- Search and Indexing: When building search indexes or lookup tables, storing only unique keywords or identifiers can significantly improve performance and reduce memory footprint.
- Statistical Analysis: Ensuring unique samples in a dataset is crucial for accurate statistical calculations and analyses.
- User Management: Preventing duplicate usernames or email addresses during user registration processes.
Solution Approaches
Here we explore two common methods to remove duplicate elements from an array in C, each suitable for different scenarios.
Approach 1: Using an Auxiliary Array (for Unsorted Arrays)
This approach iterates through the original array and adds each element to a new (auxiliary) array only if it hasn't been added before. This preserves the relative order of the first occurrences of elements.
One-line summary
Iterate through the original array and copy unique elements to a separate array.Code example
// Remove Duplicates Using Auxiliary Array
#include <stdio.h>
int main() {
int original_arr[] = {10, 20, 20, 30, 40, 40, 50, 60, 10};
int n = sizeof(original_arr) / sizeof(original_arr[0]);
int unique_arr[n]; // Auxiliary array to store unique elements
int unique_count = 0; // Current count of unique elements
printf("Original Array: ");
for (int i = 0; i < n; i++) {
printf("%d ", original_arr[i]);
}
printf("\\n");
// Step 1: Iterate through the original array
for (int i = 0; i < n; i++) {
int is_duplicate = 0;
// Step 2: Check if the current element is already in unique_arr
for (int j = 0; j < unique_count; j++) {
if (original_arr[i] == unique_arr[j]) {
is_duplicate = 1;
break; // Found a duplicate, no need to check further
}
}
// Step 3: If not a duplicate, add it to unique_arr
if (!is_duplicate) {
unique_arr[unique_count] = original_arr[i];
unique_count++;
}
}
printf("Array after removing duplicates: ");
for (int i = 0; i < unique_count; i++) {
printf("%d ", unique_arr[i]);
}
printf("\\n");
return 0;
}
Sample output
Original Array: 10 20 20 30 40 40 50 60 10
Array after removing duplicates: 10 20 30 40 50 60
Stepwise explanation
- Initialization: An
original_arris defined, and an emptyunique_arrof the same maximum size is created along withunique_count(initialized to 0). - Outer Loop: The code iterates through each element of the
original_arrusing the indexi. - Inner Loop (Duplicate Check): For each element
original_arr[i], an inner loop iterates through theunique_arr(from indexj=0tounique_count-1).- It checks if
original_arr[i]matches any element already stored inunique_arr.
- It checks if
is_duplicate is set to 1, and the inner loop breaks, as the element is already present.- Add Unique Element: After the inner loop completes, if
is_duplicateis still 0 (meaning the element is truly unique so far),original_arr[i]is added tounique_arrat theunique_countposition, andunique_countis incremented. - Result: Finally,
unique_arr(up tounique_count) contains all the unique elements from the original array.
Approach 2: In-Place Removal for Sorted Arrays
If the array is sorted, duplicate elements will always be adjacent. This property allows for a highly efficient in-place removal technique that doesn't require an auxiliary array. If the array is unsorted, it must be sorted first.
One-line summary
Sort the array first, then traverse it with two pointers to move unique elements to the beginning.Code example
// Remove Duplicates In-Place (Sorted Array)
#include <stdio.h>
#include <stdlib.h> // For qsort
#include <string.h> // For memcpy (not strictly needed but useful for array manipulation)
// Comparison function for qsort
int compare(const void *a, const void *b) {
return (*(int*)a - *(int*)b);
}
int main() {
int arr[] = {10, 20, 20, 30, 40, 40, 50, 60, 10};
int n = sizeof(arr) / sizeof(arr[0]);
printf("Original Array: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
// Step 1: Sort the array
qsort(arr, n, sizeof(int), compare);
printf("Sorted Array: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
if (n == 0) {
printf("Array is empty, no duplicates to remove.\\n");
return 0;
}
// Step 2: Use two pointers for in-place removal
int i = 0; // Pointer for the next unique element position
// j iterates through the array
for (int j = 1; j < n; j++) {
// If current element is different from the last unique element
if (arr[j] != arr[i]) {
i++; // Move the unique pointer forward
arr[i] = arr[j]; // Place the unique element
}
}
// The new effective size of the array is i + 1
int new_n = i + 1;
printf("Array after removing duplicates (in-place): ");
for (int k = 0; k < new_n; k++) {
printf("%d ", arr[k]);
}
printf("\\n");
return 0;
}
Sample output
Original Array: 10 20 20 30 40 40 50 60 10
Sorted Array: 10 10 20 20 30 40 40 50 60
Array after removing duplicates (in-place): 10 20 30 40 50 60
Stepwise explanation
- Sorting: The
qsortfunction fromstdlib.his used to sort thearrin ascending order. This places all duplicate elements next to each other. - Two-Pointer Initialization:
-
iis initialized to 0. This pointer will track the index of the last unique element found and also where the next unique element should be placed.
-
j is initialized to 1. This pointer will iterate through the sorted array from the second element.- Traversal and Comparison: The
forloop iteratesjfrom the second element to the end of the array.- Inside the loop,
arr[j]is compared witharr[i].
- Inside the loop,
arr[j] is *different* from arr[i], it means arr[j] is a new unique element.i is incremented (moving the unique elements' boundary forward).arr[i] is then updated with arr[j], effectively moving the unique element to its correct position at the beginning of the array's "unique" section.arr[j] is *the same* as arr[i], it means arr[j] is a duplicate and is simply skipped, effectively "removing" it by not copying it over.- New Size: After the loop,
i + 1represents the new effective size of the array, containing only unique elements at the beginning. The elements beyond this new size are irrelevant. - Result: The modified
arr(up tonew_nelements) contains only the unique elements in sorted order.
Conclusion
Removing duplicate elements from an array is a common data manipulation task with various solutions depending on the array's state (sorted or unsorted) and memory constraints. Using an auxiliary array is a straightforward method for unsorted arrays, preserving the relative order of unique elements but consuming extra space. For sorted arrays, an efficient in-place two-pointer approach offers optimal space complexity.
Summary
- Problem: Eliminate multiple occurrences of the same element in an array, resulting in a list of unique elements.
- Auxiliary Array Approach:
- Suitable for unsorted arrays.
- Iterates and copies unique elements to a new array.
- Preserves the relative order of first occurrences.
- Time complexity: O(n\*m) where n is original size, m is unique elements size (can be O(n^2) in worst case).
- Space complexity: O(n) for the auxiliary array.
- In-Place Sorted Array Approach:
- Requires the array to be sorted first.
- Uses two pointers to efficiently move unique elements to the front of the array.
- Modifies the original array directly.
- Time complexity: O(n log n) due to sorting, then O(n) for removal.
- Space complexity: O(1) for in-place removal (excluding sorting's auxiliary space if any).