Remove Duplicate Elements In Array Java Program
Removing duplicate elements from an array is a common task in programming, essential for data cleaning, optimization, and ensuring data integrity. In Java, there are several effective ways to achieve this, catering to different requirements regarding order preservation and performance.
In this article, you will learn how to remove duplicate elements from an array in Java using various approaches, including leveraging Java Collections Framework and Streams.
Problem Statement
Arrays are fundamental data structures, but they often contain redundant information in the form of duplicate elements. These duplicates can lead to incorrect calculations, inefficient storage, and errors in logic if not handled properly. For instance, an array tracking user IDs might inadvertently list the same ID multiple times, skewing unique user counts. The core problem is to transform an array containing duplicates into one that holds only unique elements, effectively eliminating any repeated values.
Example
Consider an array of integers with duplicates:
int[] originalArray = {1, 2, 3, 2, 1, 4, 5, 4};
The desired output after removing duplicates, assuming no specific order is required, would be an array containing only unique elements:
{1, 2, 3, 4, 5}
Background & Knowledge Prerequisites
To understand the solutions presented in this article, a basic understanding of the following Java concepts is beneficial:
- Arrays: How to declare, initialize, and iterate over arrays.
- Java Collections Framework:
- Sets: Specifically
HashSetandLinkedHashSet, which inherently store only unique elements. - Lists:
ArrayListfor dynamic array manipulation. - Loops:
forand enhancedforloops for iteration. - Java Streams (Java 8+): Basic understanding of
stream(),distinct(),collect(), andtoArray()methods.
Use Cases or Case Studies
Removing duplicates from arrays is crucial in many real-world scenarios:
- Data Preprocessing: Before analyzing data, duplicates are often removed to ensure accuracy, such as cleaning customer lists or sensor readings.
- Search Engine Optimization: Indexing unique keywords or URLs to prevent redundancy and improve search efficiency.
- User Management: Ensuring each user has a unique ID in a system, preventing multiple entries for the same user.
- Resource Allocation: Managing unique identifiers for resources (e.g., IP addresses, device serial numbers) to avoid conflicts.
- Inventory Systems: Keeping track of unique product SKUs to accurately manage stock levels.
Solution Approaches
Here are several approaches to remove duplicate elements from an array in Java, each with its own advantages.
Approach 1: Using HashSet
This is one of the most common and efficient ways to remove duplicates. A HashSet stores only unique elements and does not maintain insertion order.
- Summary: Convert the array to a
HashSetto leverage its unique-element property, then convert it back to an array.
// Remove Duplicates using HashSet
import java.util.HashSet;
import java.util.Arrays;
import java.util.Set;
public class Main {
public static void main(String[] args) {
// Step 1: Define the array with duplicate elements
Integer[] originalArray = {1, 2, 3, 2, 1, 4, 5, 4, 6};
System.out.println("Original Array: " + Arrays.toString(originalArray));
// Step 2: Create a HashSet and add all elements from the array
// HashSet automatically handles uniqueness
Set<Integer> uniqueElements = new HashSet<>(Arrays.asList(originalArray));
// Step 3: Convert the HashSet back to an array
// Using toArray(new T[0]) for type safety
Integer[] arrayWithoutDuplicates = uniqueElements.toArray(new Integer[0]);
// Step 4: Print the array after removing duplicates
System.out.println("Array without Duplicates (HashSet): " + Arrays.toString(arrayWithoutDuplicates));
}
}
Sample Output:
Original Array: [1, 2, 3, 2, 1, 4, 5, 4, 6]
Array without Duplicates (HashSet): [1, 2, 3, 4, 5, 6]
Stepwise Explanation:
- An
IntegerarrayoriginalArrayis initialized with some duplicate values. - A
HashSetnameduniqueElementsis created.Arrays.asList(originalArray)converts the array into aList, which is then passed to theHashSetconstructor. - The
HashSetautomatically filters out duplicate elements, retaining only one instance of each. uniqueElements.toArray(new Integer[0])converts theHashSetback into anIntegerarray. Thenew Integer[0]argument is a common idiom to ensure the correct type of array is returned.- The resulting array
arrayWithoutDuplicatescontains only unique elements, but their order might not be the same as in the original array due toHashSet's nature.
Approach 2: Using LinkedHashSet
If preserving the insertion order of unique elements is important, LinkedHashSet is the ideal choice.
- Summary: Similar to
HashSet, butLinkedHashSetmaintains the order in which elements were first inserted.
// Remove Duplicates using LinkedHashSet
import java.util.LinkedHashSet;
import java.util.Arrays;
import java.util.Set;
public class Main {
public static void main(String[] args) {
// Step 1: Define the array with duplicate elements
Integer[] originalArray = {1, 2, 3, 2, 1, 4, 5, 4, 6};
System.out.println("Original Array: " + Arrays.toString(originalArray));
// Step 2: Create a LinkedHashSet and add all elements from the array
// LinkedHashSet automatically handles uniqueness and maintains insertion order
Set<Integer> uniqueElements = new LinkedHashSet<>(Arrays.asList(originalArray));
// Step 3: Convert the LinkedHashSet back to an array
Integer[] arrayWithoutDuplicates = uniqueElements.toArray(new Integer[0]);
// Step 4: Print the array after removing duplicates
System.out.println("Array without Duplicates (LinkedHashSet): " + Arrays.toString(arrayWithoutDuplicates));
}
}
Sample Output:
Original Array: [1, 2, 3, 2, 1, 4, 5, 4, 6]
Array without Duplicates (LinkedHashSet): [1, 2, 3, 4, 5, 6]
Stepwise Explanation:
- An
IntegerarrayoriginalArrayis initialized. - A
LinkedHashSetnameduniqueElementsis created. When elements fromoriginalArrayare added,LinkedHashSetnot only ensures uniqueness but also remembers the order in which each *unique* element was first encountered. - The
LinkedHashSetis converted back to anIntegerarray usingtoArray(new Integer[0]). - The resulting
arrayWithoutDuplicatescontains unique elements in their original relative order.
Approach 3: Using Java 8 Streams distinct()
For modern Java development (Java 8 and above), streams provide a concise and functional way to remove duplicates.
- Summary: Convert the array to a stream, apply the
distinct()operation, and then convert the stream back to an array.
// Remove Duplicates using Java 8 Streams
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Step 1: Define the array with duplicate elements
int[] originalArray = {1, 2, 3, 2, 1, 4, 5, 4, 6};
System.out.println("Original Array: " + Arrays.toString(originalArray));
// Step 2: Convert the array to an IntStream, apply distinct(), and convert back to array
int[] arrayWithoutDuplicates = Arrays.stream(originalArray)
.distinct() // Filters out duplicate elements
.toArray(); // Converts the stream back to an array
// Step 3: Print the array after removing duplicates
System.out.println("Array without Duplicates (Streams): " + Arrays.toString(arrayWithoutDuplicates));
}
}
Sample Output:
Original Array: [1, 2, 3, 2, 1, 4, 5, 4, 6]
Array without Duplicates (Streams): [1, 2, 3, 4, 5, 6]
Stepwise Explanation:
- An
intarrayoriginalArrayis defined. Arrays.stream(originalArray)creates anIntStreamfrom the array..distinct()is an intermediate operation that returns a stream with unique elements, preserving the encounter order..toArray()is a terminal operation that collects the elements of the stream into a new array.- The
arrayWithoutDuplicatesholds unique elements in the order they first appeared in the original array. This approach is highly readable and concise.
Approach 4: Manual Approach with a Temporary Array (for primitive types, sorted)
This method can be used for primitive arrays when the order isn't strictly preserved (or if you sort it first) and you want to avoid boxing/unboxing overhead of Collections for very large arrays. It generally requires the array to be sorted first.
- Summary: Sort the array, then iterate through it, copying only unique adjacent elements to a new temporary array.
// Remove Duplicates Manually (Sorted Array)
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Step 1: Define the array with duplicate elements
int[] originalArray = {1, 2, 3, 2, 1, 4, 5, 4, 6};
System.out.println("Original Array: " + Arrays.toString(originalArray));
// Handle empty or single-element array edge case
if (originalArray.length <= 1) {
System.out.println("Array without Duplicates (Manual Sorted): " + Arrays.toString(originalArray));
return;
}
// Step 2: Sort the array
Arrays.sort(originalArray); // Output: [1, 1, 2, 2, 3, 4, 4, 5, 6]
System.out.println("Sorted Array: " + Arrays.toString(originalArray));
// Step 3: Create a temporary array to store unique elements
int[] temp = new int[originalArray.length];
int j = 0; // Index for the temp array
// Step 4: Iterate through the sorted array and copy unique elements
for (int i = 0; i < originalArray.length - 1; i++) {
if (originalArray[i] != originalArray[i + 1]) {
temp[j++] = originalArray[i];
}
}
// Add the last element (which is always unique in comparison to the element before it)
temp[j++] = originalArray[originalArray.length - 1];
// Step 5: Create a new array of the correct size and copy elements from temp
int[] arrayWithoutDuplicates = Arrays.copyOf(temp, j);
// Step 6: Print the array after removing duplicates
System.out.println("Array without Duplicates (Manual Sorted): " + Arrays.toString(arrayWithoutDuplicates));
}
}
Sample Output:
Original Array: [1, 2, 3, 2, 1, 4, 5, 4, 6]
Sorted Array: [1, 1, 2, 2, 3, 4, 4, 5, 6]
Array without Duplicates (Manual Sorted): [1, 2, 3, 4, 5, 6]
Stepwise Explanation:
- An
intarrayoriginalArrayis defined. - Edge cases for empty or single-element arrays are handled.
Arrays.sort(originalArray)sorts the array. This step is crucial because it brings all identical elements next to each other.- A temporary array
tempof the same size as the original is created. An indexjtracks the position intemp. - The code iterates through the *sorted*
originalArraycomparing adjacent elements. IforiginalArray[i]is different fromoriginalArray[i+1], it meansoriginalArray[i]is a unique element (or the last of a series of duplicates), so it's copied totemp. - The last element of the
originalArraymust always be added totempbecause it has noi+1element to compare against. - Finally,
Arrays.copyOf(temp, j)creates a new array of the exact sizej(number of unique elements) and copies the content fromtemp. - The
arrayWithoutDuplicatescontains unique elements in sorted order.
Conclusion
Removing duplicate elements from an array in Java can be achieved through various methods, each offering distinct advantages. HashSet is generally the most efficient for simply getting unique elements without regard for order. LinkedHashSet is preferred when the original insertion order of unique elements must be maintained. Java 8 Streams provide a very concise and readable functional approach with distinct(). For primitive arrays, a manual approach involving sorting can be an option, though often less elegant than using Collections or Streams.
Summary
- Problem: Arrays often contain duplicate elements that need to be removed for data integrity and efficiency.
-
HashSet: Efficiently removes duplicates, but does not guarantee order. Best for simple uniqueness. -
LinkedHashSet: Removes duplicates while preserving the first encountered order of unique elements. - Java 8 Streams (
distinct()): A concise, functional approach that preserves the encounter order of unique elements. Ideal for modern Java. - Manual (Sorted Array): Involves sorting the array and then iterating to copy only unique, non-adjacent elements. Suitable for primitive arrays but requires sorting, thus changing the original order to a sorted one.
- Choice of Method: Depends on whether order needs to be preserved and the Java version being used. For most cases,
HashSetor Streams are excellent choices.