Java Program To Remove Duplicate Elements In An Array Using HashSet
Duplicate elements within an array can lead to incorrect data processing, skewed analysis, or inefficient storage. In this article, you will learn how to efficiently remove duplicate elements from a Java array using the HashSet collection.
Problem Statement
Arrays are fundamental data structures, but they inherently allow duplicate values. For instance, an array representing user IDs might accidentally contain the same ID multiple times. Processing such an array without first removing duplicates can lead to redundant operations, increased memory consumption, and inaccurate results in applications like reporting, database indexing, or unique record generation.
Example
Consider an integer array that contains several repeated values:
Original Array: [10, 20, 10, 30, 40, 20, 50]
After removing duplicates, the desired output would be an array or list containing only the unique elements:
Array with Duplicates Removed: [10, 20, 30, 40, 50] (order might vary depending on the approach)
Background & Knowledge Prerequisites
To understand the solution presented, readers should be familiar with:
- Java Basics: Variables, loops (for-each), method calls.
- Arrays in Java: Declaration, initialization, and iteration.
- Java Collections Framework: An understanding of interfaces like Set and concrete classes like HashSet. Specifically, knowing that HashSet does not allow duplicate elements and does not guarantee insertion order.
- Autoboxing/Unboxing: The automatic conversion between primitive types (like int) and their corresponding wrapper classes (like Integer).
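The two prerequisites that matter most here, Set uniqueness and autoboxing, can be seen in a few lines. The following is a minimal sketch rather than part of the original program; the class name SetBasicsSketch and the method countDistinct are hypothetical names chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
final class SetBasicsSketch {

    // Counts the distinct values in a primitive int array.
    // Each int is autoboxed to an Integer when it is passed to add().
    static int countDistinct(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int value : values) {
            seen.add(value); // autoboxing: int -> Integer
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Set<Integer> ids = new HashSet<>();
        // add() returns true only if the element was not already present.
        System.out.println(ids.add(10)); // true
        System.out.println(ids.add(10)); // false
        System.out.println(countDistinct(new int[] {10, 20, 10, 30})); // 3
    }
}
```

The boolean returned by add() is a convenient way to detect a duplicate at the moment it is encountered, without a separate contains() call.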
Use Cases or Case Studies
Removing duplicate elements from arrays is a common requirement in various programming scenarios:
- Data Cleaning: Pre-processing raw data fetched from a database or file system to ensure uniqueness before further analysis or storage.
- Unique Identifier Generation: Ensuring a list of generated IDs, tokens, or codes contains only distinct values.
- Search Optimization: Creating a set of unique search keywords from user input to avoid redundant queries.
- Frequency Counting: Obtaining a list of unique items before calculating the frequency of each item in a larger dataset.
- Inventory Management: Identifying all unique product IDs currently in stock from a list that might contain multiple entries for the same product.
Solution Approaches
Approach 1: Using HashSet
This approach leverages the HashSet data structure, which inherently stores only unique elements.
Summary: Convert the array elements into a HashSet, which automatically filters out duplicates, and then optionally convert the HashSet back into an array or a List.
// Remove Duplicates from Array using HashSet
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Main class containing the entry point of the program
public class Main {
    public static void main(String[] args) {
        // Step 1: Define the array with duplicate elements
        Integer[] originalArray = {10, 20, 10, 30, 40, 20, 50, 10};
        System.out.println("Original Array: " + Arrays.toString(originalArray));

        // Step 2: Create a HashSet to store unique elements
        // HashSet automatically handles duplicates
        Set<Integer> uniqueElements = new HashSet<>();

        // Step 3: Iterate through the original array and add elements to the HashSet
        // Adding an element that already exists in a HashSet has no effect.
        for (Integer element : originalArray) {
            uniqueElements.add(element);
        }

        // Step 4: Convert the HashSet back to an array or List if needed
        // Option A: Convert to an ArrayList
        List<Integer> listWithoutDuplicates = new ArrayList<>(uniqueElements);
        System.out.println("List without Duplicates (using HashSet): " + listWithoutDuplicates);

        // Option B: Convert back to an array (if the original array type is needed)
        // The zero-length array only tells toArray() the element type;
        // the method allocates a new array of the correct size itself.
        Integer[] arrayWithoutDuplicates = uniqueElements.toArray(new Integer[0]);
        System.out.println("Array without Duplicates (using HashSet): " + Arrays.toString(arrayWithoutDuplicates));
    }
}
Sample Output:
Original Array: [10, 20, 10, 30, 40, 20, 50, 10]
List without Duplicates (using HashSet): [50, 20, 40, 10, 30]
Array without Duplicates (using HashSet): [50, 20, 40, 10, 30]
Stepwise Explanation:
- Initialize Array: An Integer array originalArray is declared and populated with elements, including duplicates. The wrapper class Integer is used because HashSet stores objects, not primitive types; the int literals in the initializer are autoboxed to Integer. Declaring the array as Integer[] also means toArray() can later return the same array type.
- Create HashSet: A HashSet named uniqueElements is instantiated to hold the unique elements. The key property of HashSet is that it guarantees uniqueness: if you try to add an element that already exists, the set remains unchanged.
- Populate HashSet: A for-each loop iterates through each element in originalArray and calls uniqueElements.add(element). For every element, HashSet checks whether an equal element is already present. If not, it is added. If so, nothing happens, which is what filters out the duplicates.
- Convert Back (Optional): After all elements from the array have been added to the HashSet, uniqueElements contains only the distinct values.
  - To get these unique elements into a List such as an ArrayList, a new ArrayList can be constructed directly from the HashSet (new ArrayList<>(uniqueElements)). Note that the order might not be the original insertion order because HashSet does not guarantee order.
  - To get them back into an array, the toArray() method of the HashSet can be used. It is common to pass new Integer[0] to toArray() to specify the runtime type of the array, letting the method allocate an array of the correct size.
Conclusion
The HashSet approach provides a clean, concise, and efficient way to remove duplicate elements from an array in Java. Its underlying hash table implementation offers average O(1) time for add() and contains() operations, so removing duplicates from an array of n elements takes O(n) time on average, at the cost of O(n) extra memory for the set. This makes it well suited to large datasets where performance is critical. While it doesn't preserve the original order of elements, its simplicity and efficiency make it a common choice for duplicate removal tasks.
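When the original order does matter, one commonly used variation swaps HashSet for its subclass LinkedHashSet, which iterates in insertion order. This is a different technique from the plain HashSet program shown earlier and is sketched here only to illustrate the trade-off; the class name OrderedDedupSketch and the method removeDuplicatesKeepOrder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
final class OrderedDedupSketch {

    // Returns the distinct values of the input in order of first occurrence.
    static List<Integer> removeDuplicatesKeepOrder(Integer[] input) {
        // LinkedHashSet rejects duplicates like HashSet does,
        // but remembers the order in which elements were first added.
        Set<Integer> unique = new LinkedHashSet<>(Arrays.asList(input));
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        Integer[] original = {10, 20, 10, 30, 40, 20, 50, 10};
        System.out.println(removeDuplicatesKeepOrder(original)); // [10, 20, 30, 40, 50]
    }
}
```

LinkedHashSet maintains an extra linked list across its entries, so it uses somewhat more memory than HashSet in exchange for the predictable iteration order.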
Summary
- Duplicate elements in arrays can lead to inefficiencies and errors.
- The HashSet collection in Java is designed to store only unique elements.
- To remove duplicates from an array:
  - Create a HashSet.
  - Add all array elements to the HashSet using a loop.
  - Optionally, convert the HashSet back to an ArrayList or a new array to get the unique elements in a list or array format.
- HashSet operations (like add) have an average time complexity of O(1), making this approach very efficient.
- This method does not guarantee the preservation of the original insertion order of elements.