Count Number Of Distinct Elements In An Array In Java Program
In this article, you will learn how to efficiently count the number of distinct (unique) elements present in an array using various programming approaches in Java. Understanding this concept is fundamental for data processing and analysis tasks.
Problem Statement
The problem involves determining the total count of unique values within a given array. For instance, in an array [1, 2, 2, 3, 1], the distinct elements are 1, 2, 3, resulting in a count of 3. This task is common in scenarios where duplicate data needs to be identified or ignored for statistical analysis or resource optimization.
Example
Consider the integer array: [10, 20, 10, 30, 40, 20, 50]
The distinct elements in this array are 10, 20, 30, 40, 50.
The expected count of distinct elements is 5.
Background & Knowledge Prerequisites
To effectively understand the solutions presented, a basic grasp of the following Java concepts is beneficial:
- Arrays: How to declare, initialize, and access elements in a Java array.
- Loops: for loops for iterating over array elements.
- Collections Framework: Familiarity with the Set interface (specifically its HashSet implementation) and how it handles unique elements.
- Stream API (Optional but Recommended): Basic understanding of Java 8 streams for more concise solutions.
Use Cases or Case Studies
Counting distinct elements is a practical operation in many real-world scenarios:
- User Analytics: Determining the number of unique visitors to a website or unique users who performed a specific action.
- Inventory Management: Identifying the number of unique product types in a warehouse.
- Data Deduplication: Finding the count of unique records in a dataset before processing them to save storage or computation.
- Database Query Optimization: Understanding the cardinality of columns to optimize query plans.
- Fraud Detection: Counting unique transaction IDs or IP addresses to detect unusual patterns.
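The same set-based idea carries over to non-primitive data such as the visitor IDs in the user-analytics case. The sketch below assumes the IDs are plain Strings; the class name DistinctVisitors, the helper countDistinct, and the sample IDs are hypothetical. Because HashSet relies on equals() and hashCode(), any element type that implements those consistently (String, Integer, and so on) can be counted this way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DistinctVisitors {
    // Generic helper: equals()/hashCode() of T decide which elements count as "the same"
    static <T> int countDistinct(T[] items) {
        Set<T> seen = new HashSet<>(Arrays.asList(items));
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical visitor IDs taken from an access log
        String[] visitorIds = {"u42", "u7", "u42", "u13", "u7"};
        System.out.println("Unique visitors: " + countDistinct(visitorIds)); // prints "Unique visitors: 3"
    }
}
```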
Solution Approaches
Here are three effective methods to count distinct elements in a Java array.
Approach 1: Using a HashSet
This approach leverages the HashSet collection, which by definition stores only unique elements.
- One-line summary: Add all array elements to a HashSet and then return its size.
// CountDistinctElementsUsingHashSet
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        // Step 1: Define the input array
        int[] arr = {10, 20, 10, 30, 40, 20, 50};
        // Step 2: Create a HashSet to store distinct elements
        Set<Integer> distinctElements = new HashSet<>();
        // Step 3: Iterate through the array and add each element to the HashSet
        // HashSet automatically handles duplicates; only unique elements are stored
        for (int element : arr) {
            distinctElements.add(element); // the int is autoboxed to an Integer
        }
        // Step 4: The size of the HashSet is the count of distinct elements
        int distinctCount = distinctElements.size();
        // Step 5: Print the result
        System.out.println("Original Array: " + Arrays.toString(arr));
        System.out.println("Number of distinct elements (HashSet): " + distinctCount);
    }
}
Sample Output:
Original Array: [10, 20, 10, 30, 40, 20, 50]
Number of distinct elements (HashSet): 5
Stepwise Explanation:
- An integer array arr is initialized with sample values.
- A HashSet named distinctElements is created. A HashSet stores only unique values: adding an element that is already present leaves the set unchanged (add() simply returns false), and no error is raised.
- A for-each loop iterates through each element in arr.
- Inside the loop, distinctElements.add(element) attempts to add the current element (autoboxed to an Integer) to the set. Duplicates are automatically rejected.
- Finally, distinctElements.size() returns the number of unique elements stored in the set, which is the desired count.
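Because HashSet.add() returns false when the value is already present, the same loop can also report how many duplicates were skipped. A minimal sketch, assuming you want both numbers from a single pass; the class and method names are illustrative, and the pair is returned as an int[] only to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

final class DistinctWithDuplicates {
    // Returns {distinctCount, duplicateCount} after one pass over the array
    static int[] countDistinctAndDuplicates(int[] arr) {
        Set<Integer> seen = new HashSet<>();
        int duplicates = 0;
        for (int element : arr) {
            // add() returns false when the value is already in the set
            if (!seen.add(element)) {
                duplicates++;
            }
        }
        return new int[] {seen.size(), duplicates};
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 10, 30, 40, 20, 50};
        int[] result = countDistinctAndDuplicates(arr);
        // prints "Distinct: 5, duplicates: 2"
        System.out.println("Distinct: " + result[0] + ", duplicates: " + result[1]);
    }
}
```

The two numbers always add up to arr.length, since every element is either newly added or rejected as a duplicate.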
Approach 2: Using Sorting and Iteration
This method involves sorting the array first, which brings identical elements together, making it easier to count distinct values by iterating through the sorted array.
- One-line summary: Sort the array and then iterate, incrementing a counter only when the current element is different from the previous one.
// CountDistinctElementsUsingSorting
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // Step 1: Define the input array
        int[] arr = {10, 20, 10, 30, 40, 20, 50};
        // Print the array before sorting, because Arrays.sort() modifies it in place
        System.out.println("Original Array: " + Arrays.toString(arr));
        // Step 2: Handle the empty-array edge case (a single-element array needs no special case)
        if (arr.length == 0) {
            System.out.println("Number of distinct elements (Sorting): 0");
            return;
        }
        // Step 3: Sort the array so that equal values become adjacent
        Arrays.sort(arr);
        // Step 4: Initialize the distinct count and iterate through the sorted array
        int distinctCount = 1; // The first element is always distinct in a non-empty array
        for (int i = 1; i < arr.length; i++) {
            // If the current element differs from the previous one, it is a new distinct value
            if (arr[i] != arr[i - 1]) {
                distinctCount++;
            }
        }
        // Step 5: Print the result
        System.out.println("Sorted Array: " + Arrays.toString(arr));
        System.out.println("Number of distinct elements (Sorting): " + distinctCount);
    }
}
Sample Output:
Original Array: [10, 20, 10, 30, 40, 20, 50]
Sorted Array: [10, 10, 20, 20, 30, 40, 50]
Number of distinct elements (Sorting): 5
Stepwise Explanation:
- The input array arr is defined.
- The empty-array edge case is handled explicitly: if the array is empty, the distinct count is 0. A single-element array needs no special handling, because distinctCount starts at 1 and the loop body never runs, so the result is 1.
- Arrays.sort(arr) sorts the array in ascending order, so identical elements become adjacent.
- distinctCount is initialized to 1 because, in a non-empty array, the first element is always distinct.
- A for loop iterates from the second element (i = 1) to the end of the array.
- Inside the loop, if (arr[i] != arr[i-1]) checks whether the current element differs from its immediate predecessor. If it does, a new distinct element has been encountered, and distinctCount is incremented.
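The program above sorts arr in place, so the caller loses the original order. If that matters, one option is to sort a copy instead. This sketch wraps the same counting loop in a hypothetical helper, countDistinct, that copies the input with Arrays.copyOf before sorting; the copy costs O(N) extra memory.

```java
import java.util.Arrays;

final class DistinctBySortingCopy {
    // Counts distinct values without modifying the caller's array
    static int countDistinct(int[] arr) {
        if (arr.length == 0) {
            return 0; // an empty array has no distinct elements
        }
        int[] sorted = Arrays.copyOf(arr, arr.length); // sort a copy, not the input
        Arrays.sort(sorted);
        int distinctCount = 1; // the first element always starts a new run of equal values
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[i - 1]) {
                distinctCount++;
            }
        }
        return distinctCount;
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 10, 30, 40, 20, 50};
        System.out.println("Distinct: " + countDistinct(arr)); // prints "Distinct: 5"
        // prints "Input unchanged: [10, 20, 10, 30, 40, 20, 50]"
        System.out.println("Input unchanged: " + Arrays.toString(arr));
    }
}
```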
Approach 3: Using Java Stream API
The Java Stream API (introduced in Java 8) provides a concise and functional way to perform operations on collections, including finding distinct elements.
- One-line summary: Convert the array to a stream, apply the distinct() operation, and then count the resulting elements.
// CountDistinctElementsUsingStreamAPI
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // Step 1: Define the input array
        int[] arr = {10, 20, 10, 30, 40, 20, 50};
        // Step 2: Convert the array to an IntStream
        // Step 3: Apply the distinct() operation to get unique elements
        // Step 4: Use count() to get the total number of distinct elements
        long distinctCount = Arrays.stream(arr) // Creates an IntStream from the array
                .distinct()                     // Returns a stream with unique elements
                .count();                       // Counts the elements in the distinct stream
        // Step 5: Print the result
        System.out.println("Original Array: " + Arrays.toString(arr));
        System.out.println("Number of distinct elements (Stream API): " + distinctCount);
    }
}
Sample Output:
Original Array: [10, 20, 10, 30, 40, 20, 50]
Number of distinct elements (Stream API): 5
Stepwise Explanation:
- The input array arr is defined.
- Arrays.stream(arr) converts the primitive int[] array into an IntStream.
- .distinct() is an intermediate operation that returns a stream containing only the unique values from the source (here, arr). For an IntStream, uniqueness is determined by comparing the int values; for a Stream of objects, distinct() uses Object.equals().
- .count() is a terminal operation that counts the elements in the stream and returns a long.
- The final distinctCount holds the desired result.
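For streams of objects, distinct() decides uniqueness with equals(), so what counts as "the same" depends on the element type. The sketch below (hypothetical class name and sample words) counts strings twice: once as-is, where "Java" and "java" are different values, and once after lower-casing each word with Locale.ROOT, a normalization rule assumed here purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class DistinctWords {
    // Case-sensitive: String.equals() treats "Java" and "java" as different values
    static long countExact(List<String> words) {
        return words.stream().distinct().count();
    }

    // Case-insensitive: normalize each word first, then apply distinct()
    static long countIgnoringCase(List<String> words) {
        return words.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Java", "java", "Stream", "set", "SET");
        System.out.println("Exact: " + countExact(words));                // prints "Exact: 5"
        System.out.println("Ignoring case: " + countIgnoringCase(words)); // prints "Ignoring case: 3"
    }
}
```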
Conclusion
Counting distinct elements in an array is a common programming task with several efficient solutions in Java. The HashSet approach offers good performance (average O(N) time) and simplicity for most cases, while the Stream API provides the most concise and readable solution for modern Java code. The sorting method is a useful exercise and avoids building a set, but it takes O(N log N) time and modifies the original array unless you sort a copy, so it is usually slower than HashSet for large inputs. If the data is already sorted, the sort step can be skipped, leaving a single O(N) pass.
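Before weighing these trade-offs, it can help to confirm that the three approaches always agree. The sketch below is a hypothetical test harness (class and method names are illustrative): it re-implements each approach as a small static method, runs them on hand-picked edge cases (empty, single element, all duplicates, negative values) and on random arrays from a seeded java.util.Random, and throws an AssertionError on any mismatch. It checks correctness only; it is not a benchmark.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class DistinctCrossCheck {
    static long viaHashSet(int[] arr) {
        Set<Integer> set = new HashSet<>();
        for (int element : arr) {
            set.add(element);
        }
        return set.size();
    }

    static long viaSorting(int[] arr) {
        if (arr.length == 0) {
            return 0;
        }
        int[] sorted = arr.clone(); // sort a copy so the input stays untouched
        Arrays.sort(sorted);
        long count = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[i - 1]) {
                count++;
            }
        }
        return count;
    }

    static long viaStream(int[] arr) {
        return Arrays.stream(arr).distinct().count();
    }

    // Returns the agreed count, or throws if the three approaches disagree
    static long checkAgreement(int[] arr) {
        long a = viaHashSet(arr);
        long b = viaSorting(arr);
        long c = viaStream(arr);
        if (a != b || b != c) {
            throw new AssertionError("Mismatch for " + Arrays.toString(arr) + ": " + a + ", " + b + ", " + c);
        }
        return a;
    }

    public static void main(String[] args) {
        // Hand-picked edge cases
        int[][] cases = {
            {10, 20, 10, 30, 40, 20, 50},
            {},        // empty array
            {7},       // single element
            {3, 3, 3}, // all duplicates
            {-1, 0, 1} // all distinct, including a negative value
        };
        for (int[] input : cases) {
            System.out.println(Arrays.toString(input) + " -> " + checkAgreement(input));
        }
        // Random arrays drawn from a small value range so that duplicates are common
        Random random = new Random(42); // fixed seed for reproducible runs
        for (int trial = 0; trial < 1000; trial++) {
            int[] input = new int[random.nextInt(50)];
            for (int i = 0; i < input.length; i++) {
                input[i] = random.nextInt(20) - 10;
            }
            checkAgreement(input);
        }
        System.out.println("All approaches agree on 1000 random arrays");
    }
}
```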
Summary
- Problem: Identify and count unique values in an array.
- HashSet Approach:
  - Method: Add all elements to a HashSet.
  - Result: HashSet.size() gives the distinct count.
  - Pros: Very efficient (average O(N) time complexity), simple to implement.
  - Cons: Requires extra space for the HashSet, and each int is boxed into an Integer.
- Sorting Approach:
  - Method: Sort the array, then iterate, counting each new value.
  - Result: The counter value after the iteration.
  - Pros: Works in place, so it needs no extra space proportional to the number of distinct elements (only what the sorting algorithm itself uses).
  - Cons: Modifies the original array unless a copy is sorted; higher time complexity (O(N log N) due to sorting).
- Stream API Approach:
  - Method: Arrays.stream(array).distinct().count().
  - Result: The long count returned by the count() method.
  - Pros: Concise, readable, functional style.
  - Cons: Adds some stream-pipeline overhead compared with filling a HashSet directly, most noticeable for very small arrays.