Merge Two Sorted Arrays Without Duplicates In Java
When working with data in Java, you often encounter scenarios where you need to combine information from multiple sources. If these sources are sorted arrays and you need to consolidate them into a single, sorted list without any duplicate entries, there are several effective strategies. In this article, you will learn how to efficiently merge two sorted arrays while ensuring all elements are unique and the final array remains sorted.
Problem Statement
The core challenge involves taking two arrays, both of which are already sorted in ascending order and contain no internal duplicates, and producing a third array that is also sorted in ascending order and contains all unique elements from the initial two arrays. This often arises when combining pre-processed data sets where efficiency and data integrity (no duplicates) are crucial.
Example
Consider two sorted arrays arr1 and arr2. The goal is to merge them into a single sorted array result that contains only unique elements.
- Input:
- arr1 = [1, 3, 5, 7]
- arr2 = [2, 3, 4, 6, 7]
- Expected Output:
- result = [1, 2, 3, 4, 5, 6, 7]
Notice that 3 and 7 appear in both input arrays but only once in the final output.
Background & Knowledge Prerequisites
To understand and implement the solutions effectively, you should have a basic understanding of:
- Java Arrays: How to declare, initialize, and access elements in arrays.
-
ArrayList: Dynamic arrays in Java, useful for flexible resizing. -
HashSet: A collection that stores unique elements and provides fast lookup. -
TreeSet: A sorted set that automatically stores unique elements in their natural order. - Loops and Conditionals:
forloops,whileloops,if-elsestatements. - Sorting Algorithms (Conceptual): Understanding what it means for an array to be sorted.
Use Cases or Case Studies
Merging sorted arrays without duplicates is a common operation in various programming scenarios:
- Database Query Results: Combining sorted result sets from different database queries where distinct records are needed.
- Log File Analysis: Merging time-sorted log entries from multiple servers, ensuring each unique event is processed only once.
- Data Deduplication: Consolidating user lists, product catalogs, or sensor readings from different sources into a master list without redundant entries.
- Search Engine Indexing: Merging lists of document IDs from various index segments to create a unified, sorted list of relevant documents.
- Algorithm Optimization: As a sub-problem in more complex algorithms like external merge sort or set intersection/union operations.
Solution Approaches
Here are three distinct approaches to merge two sorted arrays without duplicates in Java.
Approach 1: Using HashSet and Sorting
This approach leverages the HashSet's ability to store only unique elements. We add all elements from both arrays into a HashSet, convert it to an ArrayList, and then sort the ArrayList to restore the order.
- Summary: Combine all elements into a
HashSetto eliminate duplicates, then convert theHashSetto anArrayListand sort it, finally converting to an array.
// Merge Two Sorted Arrays Without Duplicates using HashSet
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
// Main class containing the entry point of the program
public class Main {
public static void main(String[] args) {
int[] arr1 = {1, 3, 5, 7};
int[] arr2 = {2, 3, 4, 6, 7};
// Step 1: Create a HashSet to store unique elements
Set<Integer> uniqueElements = new HashSet<>();
// Step 2: Add all elements from arr1 to the HashSet
for (int num : arr1) {
uniqueElements.add(num);
}
// Step 3: Add all elements from arr2 to the HashSet
// HashSet automatically handles duplicates
for (int num : arr2) {
uniqueElements.add(num);
}
// Step 4: Convert the HashSet to an ArrayList
// This will contain unique elements, but not necessarily sorted
ArrayList<Integer> mergedList = new ArrayList<>(uniqueElements);
// Step 5: Sort the ArrayList
Collections.sort(mergedList);
// Step 6: Convert the sorted ArrayList back to an array
int[] result = new int[mergedList.size()];
for (int i = 0; i < mergedList.size(); i++) {
result[i] = mergedList.get(i);
}
// Step 7: Print the merged array
System.out.println("Array 1: " + Arrays.toString(arr1));
System.out.println("Array 2: " + Arrays.toString(arr2));
System.out.println("Merged (HashSet & Sort): " + Arrays.toString(result));
}
}
- Sample Output:
Array 1: [1, 3, 5, 7]
Array 2: [2, 3, 4, 6, 7]
Merged (HashSet & Sort): [1, 2, 3, 4, 5, 6, 7]
- Stepwise Explanation:
- Initialize an empty
HashSetcalleduniqueElements. - Iterate through
arr1and add each element touniqueElements.HashSetensures only unique values are stored. - Iterate through
arr2and add each element touniqueElements. Any duplicates fromarr2that are already present will not be added again. - Create an
ArrayListfromuniqueElements. At this point, the list contains all unique numbers but might not be in sorted order (asHashSetdoes not preserve insertion order). - Use
Collections.sort(mergedList)to sort theArrayListin ascending order. - Convert the
ArrayListback into anint[]for the final result.
Approach 2: Two-Pointer Approach
This is an efficient approach that takes advantage of the fact that both input arrays are already sorted. It avoids the overhead of hashing and re-sorting by merging elements directly while skipping duplicates.
- Summary: Use two pointers, one for each input array, to compare elements and add the smaller, unique element to a new
ArrayList. Handle cases where elements are equal and when one array is exhausted.
// Merge Two Sorted Arrays Without Duplicates using Two Pointers
import java.util.ArrayList;
import java.util.Arrays;
// Main class containing the entry point of the program
public class Main {
public static void main(String[] args) {
int[] arr1 = {1, 3, 5, 7};
int[] arr2 = {2, 3, 4, 6, 7};
// Step 1: Initialize two pointers for arr1 and arr2
int p1 = 0;
int p2 = 0;
// Step 2: Create an ArrayList to store the merged unique elements
ArrayList<Integer> mergedList = new ArrayList<>();
// Step 3: Traverse both arrays using the pointers
while (p1 < arr1.length && p2 < arr2.length) {
int val1 = arr1[p1];
int val2 = arr2[p2];
if (val1 < val2) {
// Add val1 if it's not a duplicate of the last element added
if (mergedList.isEmpty() || mergedList.get(mergedList.size() - 1) != val1) {
mergedList.add(val1);
}
p1++;
} else if (val2 < val1) {
// Add val2 if it's not a duplicate of the last element added
if (mergedList.isEmpty() || mergedList.get(mergedList.size() - 1) != val2) {
mergedList.add(val2);
}
p2++;
} else { // val1 == val2
// Add one instance of the duplicate value
if (mergedList.isEmpty() || mergedList.get(mergedList.size() - 1) != val1) {
mergedList.add(val1);
}
p1++;
p2++;
}
}
// Step 4: Add remaining elements from arr1 (if any)
while (p1 < arr1.length) {
int val1 = arr1[p1];
if (mergedList.isEmpty() || mergedList.get(mergedList.size() - 1) != val1) {
mergedList.add(val1);
}
p1++;
}
// Step 5: Add remaining elements from arr2 (if any)
while (p2 < arr2.length) {
int val2 = arr2[p2];
if (mergedList.isEmpty() || mergedList.get(mergedList.size() - 1) != val2) {
mergedList.add(val2);
}
p2++;
}
// Step 6: Convert the ArrayList to an array
int[] result = new int[mergedList.size()];
for (int i = 0; i < mergedList.size(); i++) {
result[i] = mergedList.get(i);
}
// Step 7: Print the merged array
System.out.println("Array 1: " + Arrays.toString(arr1));
System.out.println("Array 2: " + Arrays.toString(arr2));
System.out.println("Merged (Two-Pointer): " + Arrays.toString(result));
}
}
- Sample Output:
Array 1: [1, 3, 5, 7]
Array 2: [2, 3, 4, 6, 7]
Merged (Two-Pointer): [1, 2, 3, 4, 5, 6, 7]
- Stepwise Explanation:
- Initialize two pointers,
p1forarr1andp2forarr2, both starting at index 0. - Create an
ArrayListcalledmergedListto store the result. - Loop while both
p1andp2are within their respective array bounds:
- Compare
arr1[p1]andarr2[p2]. - If
arr1[p1]is smaller, add it tomergedList(only if it's not a duplicate of the *last* element already added tomergedList) and incrementp1. - If
arr2[p2]is smaller, add it tomergedList(with the same duplicate check) and incrementp2. - If
arr1[p1]equalsarr2[p2], add one instance of the value tomergedList(with duplicate check) and increment bothp1andp2.
- After the main loop, one of the arrays might have remaining elements. Loop through
arr1fromp1to its end, adding unique remaining elements tomergedList. - Similarly, loop through
arr2fromp2to its end, adding unique remaining elements tomergedList. - Finally, convert
mergedListinto anint[]for the result.
Approach 3: Using TreeSet
The TreeSet is a powerful collection that inherently maintains elements in a sorted order and ensures uniqueness. This approach simplifies the code significantly.
- Summary: Add all elements from both input arrays directly into a
TreeSet. TheTreeSetautomatically handles sorting and duplicate removal. Convert theTreeSetto an array.
// Merge Two Sorted Arrays Without Duplicates using TreeSet
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
// Main class containing the entry point of the program
public class Main {
public static void main(String[] args) {
int[] arr1 = {1, 3, 5, 7};
int[] arr2 = {2, 3, 4, 6, 7};
// Step 1: Create a TreeSet to store unique and sorted elements
Set<Integer> uniqueSortedElements = new TreeSet<>();
// Step 2: Add all elements from arr1 to the TreeSet
for (int num : arr1) {
uniqueSortedElements.add(num);
}
// Step 3: Add all elements from arr2 to the TreeSet
// TreeSet automatically handles duplicates and keeps elements sorted
for (int num : arr2) {
uniqueSortedElements.add(num);
}
// Step 4: Convert the TreeSet to an array
// The elements are already unique and sorted
int[] result = new int[uniqueSortedElements.size()];
int index = 0;
for (int num : uniqueSortedElements) {
result[index++] = num;
}
// Step 5: Print the merged array
System.out.println("Array 1: " + Arrays.toString(arr1));
System.out.println("Array 2: " + Arrays.toString(arr2));
System.out.println("Merged (TreeSet): " + Arrays.toString(result));
}
}
- Sample Output:
Array 1: [1, 3, 5, 7]
Array 2: [2, 3, 4, 6, 7]
Merged (TreeSet): [1, 2, 3, 4, 5, 6, 7]
- Stepwise Explanation:
- Initialize an empty
TreeSetcalleduniqueSortedElements. - Iterate through
arr1and add each element touniqueSortedElements.TreeSetinherently stores elements in ascending order and automatically handles duplicates. - Iterate through
arr2and add each element touniqueSortedElements. Again, duplicates are ignored, and new unique elements are inserted in their correct sorted position. - Iterate through
uniqueSortedElements(which is already sorted and unique) and populate a newint[]with these values. This is the final merged array.
Conclusion
Merging two sorted arrays while eliminating duplicates is a common task with several effective solutions in Java. The choice of approach depends on the specific requirements for performance and code simplicity. For simplicity and conciseness, especially if the arrays are not excessively large, the TreeSet approach is often ideal as it handles both uniqueness and sorting automatically. For scenarios demanding maximum performance for very large arrays, the two-pointer approach is generally more efficient as it avoids the overhead of hashing and object creation associated with HashSet or TreeSet for each element. The HashSet with explicit sorting offers a good balance if you need the fast deduplication of a hash set and are comfortable with a separate sorting step.
Summary
- Problem: Merge two sorted arrays into a single, sorted array without duplicates.
- Approach 1 (
HashSet& Sort): - Adds all elements to a
HashSetfor unique storage. - Converts to
ArrayListand sorts it usingCollections.sort(). - Converts back to an
int[]. Simple to implement, but involves an additional sorting step. - Approach 2 (Two-Pointer):
- Uses two pointers to traverse arrays simultaneously.
- Compares elements and adds the smaller (unique) one to an
ArrayList. - Handles equal elements and remaining elements after one array is exhausted.
- Most efficient in terms of time complexity as it leverages pre-sorted data.
- Approach 3 (
TreeSet): - Adds all elements directly into a
TreeSet. -
TreeSetautomatically maintains elements in sorted order and ensures uniqueness. - Converts the
TreeSetto anint[]. Most concise and elegant solution for clarity.