Find Median Of Two Sorted Arrays In Java
Finding the median of two sorted arrays is a common algorithmic challenge that tests your understanding of array manipulation and optimization techniques. This problem often appears in technical interviews and is fundamental to efficiently processing merged datasets.
In this article, you will learn how to approach this problem, starting with a straightforward merging strategy and then progressing to a more optimized solution using binary search, complete with detailed explanations and Java code examples.
Problem Statement
Given two sorted arrays, nums1 and nums2, find the median of the two sorted arrays. The overall run time complexity should ideally be O(log(m+n)), where m and n are the lengths of the arrays.
This problem is significant in areas like data analysis, database merging, and statistics, where you often need to find the central tendency of combined, already sorted data without incurring the cost of fully sorting the entire merged set.
Example
Consider the following input:
-
nums1 = [1, 3] -
nums2 = [2]
The combined sorted array would be [1, 2, 3].
The median of this combined array is 2.
Another example:
-
nums1 = [1, 2] -
nums2 = [3, 4]
The combined sorted array would be [1, 2, 3, 4].
The median of this combined array is (2 + 3) / 2 = 2.5.
Background & Knowledge Prerequisites
To effectively understand the solutions, you should be familiar with:
- Java Basics: Variables, loops, conditional statements.
- Arrays: Declaration, initialization, and access.
- Median Definition:
- For an odd-length sorted array, the median is the middle element.
- For an even-length sorted array, the median is the average of the two middle elements.
- Binary Search Concept: Although not strictly required for the first approach, it's crucial for understanding the optimized solution.
Use Cases or Case Studies
Finding the median of two sorted arrays has several practical applications:
- Database Merging: When merging two sorted result sets from different database queries, finding the median value can help understand the central tendency of the combined data without fully materializing and sorting the entire union.
- Statistical Analysis: In fields like finance or social sciences, you might have two distinct datasets (e.g., income levels from two different regions) that are already sorted. Efficiently calculating the median of their combined distribution provides a robust measure of central tendency, less sensitive to outliers than the mean.
- Sensor Data Fusion: Imagine two sensors collecting sorted data streams (e.g., temperature readings over time). To get a combined real-time median of the current readings from both, this algorithm can be applied without waiting for a full merge.
- Online Judge/Interview Problems: This is a classic problem in competitive programming and technical interviews, designed to assess a candidate's problem-solving skills, understanding of data structures, and ability to optimize algorithms.
- Distributed Systems: In a distributed environment, two nodes might each hold a sorted subset of data. Finding the global median might require combining these subsets efficiently without transferring all data to a single location.
Solution Approaches
We will explore two main approaches: a straightforward merge-based solution and a more efficient binary search approach.
Approach 1: Merge and Find (Brute Force)
This approach involves merging the two sorted arrays into a single new sorted array and then finding the median from the merged array.
- Summary: Combine both arrays into a new array, sort it, and then calculate the median based on its length.
// Median of Two Sorted Arrays - Merge and Find
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Example 1
int[] nums1_ex1 = {1, 3};
int[] nums2_ex1 = {2};
System.out.println("Input: nums1=" + Arrays.toString(nums1_ex1) + ", nums2=" + Arrays.toString(nums2_ex1));
System.out.println("Median (Merge and Find): " + findMedianSortedArraysMerge(nums1_ex1, nums2_ex1)); // Expected: 2.0
// Example 2
int[] nums1_ex2 = {1, 2};
int[] nums2_ex2 = {3, 4};
System.out.println("Input: nums1=" + Arrays.toString(nums1_ex2) + ", nums2=" + Arrays.toString(nums2_ex2));
System.out.println("Median (Merge and Find): " + findMedianSortedArraysMerge(nums1_ex2, nums2_ex2)); // Expected: 2.5
}
public static double findMedianSortedArraysMerge(int[] nums1, int[] nums2) {
// Step 1: Determine the total length of the combined arrays
int m = nums1.length;
int n = nums2.length;
int totalLength = m + n;
// Step 2: Create a new array to store the merged elements
int[] merged = new int[totalLength];
int i = 0, j = 0, k = 0; // Pointers for nums1, nums2, and merged array
// Step 3: Merge nums1 and nums2 into 'merged' while maintaining sorted order
while (i < m && j < n) {
if (nums1[i] <= nums2[j]) {
merged[k++] = nums1[i++];
} else {
merged[k++] = nums2[j++];
}
}
// Step 4: Copy remaining elements from nums1, if any
while (i < m) {
merged[k++] = nums1[i++];
}
// Step 5: Copy remaining elements from nums2, if any
while (j < n) {
merged[k++] = nums2[j++];
}
// Step 6: Calculate the median from the merged array
if (totalLength % 2 == 1) {
// Odd length: median is the middle element
return (double) merged[totalLength / 2];
} else {
// Even length: median is the average of the two middle elements
int mid1 = merged[totalLength / 2 - 1];
int mid2 = merged[totalLength / 2];
return (double) (mid1 + mid2) / 2.0;
}
}
}
- Sample Output:
Input: nums1=[1, 3], nums2=[2]
Median (Merge and Find): 2.0
Input: nums1=[1, 2], nums2=[3, 4]
Median (Merge and Find): 2.5
- Stepwise Explanation:
- Initialize Pointers: Three pointers (
i,j,k) are used to traversenums1,nums2, and themergedarray, respectively. - Merge Arrays: Elements from
nums1andnums2are compared, and the smaller element is added tomerged. The respective pointer is incremented. This continues until one of the arrays is fully traversed. - Copy Remaining Elements: Any remaining elements in
nums1ornums2(if one array was longer than the other) are appended tomerged. - Calculate Median:
- If
totalLengthis odd, the median is the element atmerged[totalLength / 2]. - If
totalLengthis even, the median is the average ofmerged[totalLength / 2 - 1]andmerged[totalLength / 2].
- Time Complexity: O(m+n) for merging, O(1) for finding median. Total: O(m+n).
- Space Complexity: O(m+n) for the new merged array.
Approach 2: Binary Search (Optimized)
This approach aims for O(log(min(m, n))) time complexity by leveraging the sorted nature of the arrays and binary search. The core idea is to find a "partition" in both arrays such that:
- The total number of elements to the left of the partitions equals
(m + n + 1) / 2(for the left half of the median). - All elements in the left half are less than or equal to all elements in the right half.
- Summary: Perform a binary search on the smaller array to find the optimal partition point that divides the combined set of elements into two halves, satisfying the median conditions.
// Median of Two Sorted Arrays - Binary Search
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Example 1
int[] nums1_ex1 = {1, 3};
int[] nums2_ex1 = {2};
System.out.println("Input: nums1=" + Arrays.toString(nums1_ex1) + ", nums2=" + Arrays.toString(nums2_ex1));
System.out.println("Median (Binary Search): " + findMedianSortedArraysBinarySearch(nums1_ex1, nums2_ex1)); // Expected: 2.0
// Example 2
int[] nums1_ex2 = {1, 2};
int[] nums2_ex2 = {3, 4};
System.out.println("Input: nums1=" + Arrays.toString(nums1_ex2) + ", nums2=" + Arrays.toString(nums2_ex2));
System.out.println("Median (Binary Search): " + findMedianSortedArraysBinarySearch(nums1_ex2, nums2_ex2)); // Expected: 2.5
// Example 3: Empty array
int[] nums1_ex3 = {};
int[] nums2_ex3 = {1};
System.out.println("Input: nums1=" + Arrays.toString(nums1_ex3) + ", nums2=" + Arrays.toString(nums2_ex3));
System.out.println("Median (Binary Search): " + findMedianSortedArraysBinarySearch(nums1_ex3, nums2_ex3)); // Expected: 1.0
// Example 4: More complex
int[] nums1_ex4 = {1, 2, 5};
int[] nums2_ex4 = {3, 4, 6, 7, 8};
System.out.println("Input: nums1=" + Arrays.toString(nums1_ex4) + ", nums2=" + Arrays.toString(nums2_ex4));
System.out.println("Median (Binary Search): " + findMedianSortedArraysBinarySearch(nums1_ex4, nums2_ex4)); // Expected: 4.5
}
public static double findMedianSortedArraysBinarySearch(int[] nums1, int[] nums2) {
// Step 1: Ensure nums1 is the shorter array for binary search efficiency.
// This simplifies logic by always performing binary search on the smaller array.
if (nums1.length > nums2.length) {
return findMedianSortedArraysBinarySearch(nums2, nums1);
}
int m = nums1.length;
int n = nums2.length;
// Step 2: Define the range for the binary search on nums1's partition point.
// 'low' represents 0 elements taken from nums1 (all from nums2)
// 'high' represents all elements taken from nums1 (0 from nums2)
int low = 0;
int high = m;
// Step 3: Calculate the required number of elements in the left half of the combined array.
// This handles both odd and even total lengths correctly for partition calculation.
int halfLen = (m + n + 1) / 2;
// Step 4: Perform binary search to find the correct partition
while (low <= high) {
// 'partitionX' is the number of elements taken from nums1 for the left half
int partitionX = (low + high) / 2;
// 'partitionY' is the number of elements taken from nums2 for the left half
int partitionY = halfLen - partitionX;
// Step 5: Determine the maximum element on the left side and minimum on the right side
// Handle edge cases where a partition is at the beginning (0 elements) or end (all elements)
int maxX = (partitionX == 0) ? Integer.MIN_VALUE : nums1[partitionX - 1];
int minX = (partitionX == m) ? Integer.MAX_VALUE : nums1[partitionX];
int maxY = (partitionY == 0) ? Integer.MIN_VALUE : nums2[partitionY - 1];
int minY = (partitionY == n) ? Integer.MAX_VALUE : nums2[partitionY];
// Step 6: Check if the partitions are correct
if (maxX <= minY && maxY <= minX) {
// Correct partition found. The elements are correctly divided.
// Now determine the median based on total length.
if ((m + n) % 2 == 1) {
// Total length is odd, median is the maximum of the left half
return (double) Math.max(maxX, maxY);
} else {
// Total length is even, median is the average of the two middle elements
return (double) (Math.max(maxX, maxY) + Math.min(minX, minY)) / 2.0;
}
} else if (maxX > minY) {
// We have taken too many elements from nums1 (maxX is too large),
// need to move partitionX to the left in nums1.
high = partitionX - 1;
} else { // maxY > minX
// We have taken too few elements from nums1 (maxY is too large, implying minX is too small),
// need to move partitionX to the right in nums1.
low = partitionX + 1;
}
}
// This line should theoretically not be reached if the inputs are valid sorted arrays.
// It indicates an error condition.
throw new IllegalArgumentException("Input arrays are not sorted or invalid.");
}
}
- Sample Output:
Input: nums1=[1, 3], nums2=[2]
Median (Binary Search): 2.0
Input: nums1=[1, 2], nums2=[3, 4]
Median (Binary Search): 2.5
Input: nums1=[], nums2=[1]
Median (Binary Search): 1.0
Input: nums1=[1, 2, 5], nums2=[3, 4, 6, 7, 8]
Median (Binary Search): 4.5
- Stepwise Explanation:
- Swap for Shorter Array: The function first ensures
nums1is the shorter array. This is a common optimization for binary search problems on two arrays, as it guarantees thehighvalue forpartitionXis smaller, leading to fewer iterations. - Initialize Binary Search:
lowandhighdefine the search space forpartitionX(the number of elements to take fromnums1).halfLenis calculated as(m + n + 1) / 2. This is the total number of elements required in the left half of the combined sorted array to find the median. - Binary Search Loop: The loop continues as long as
low <= high.
- Calculate Partitions:
partitionXis the midpoint oflowandhigh.partitionYis then derived fromhalfLen - partitionX. - Identify Boundary Elements:
maxX,minX,maxY,minYrepresent the elements immediately bordering the partitions.maxXis the last element of the left partition fromnums1,minXis the first element of the right partition fromnums1.Integer.MIN_VALUEandInteger.MAX_VALUEare used to handle edge cases where a partition is at the very beginning (0 elements taken) or end (all elements taken) of an array. - Check Condition:
- If
maxX <= minYandmaxY <= minX, the partitions are correct. This means all elements in the combined left half are less than or equal to all elements in the combined right half. - If
(m + n)is odd, the median ismax(maxX, maxY). - If
(m + n)is even, the median is(max(maxX, maxY) + min(minX, minY)) / 2.0. - If
maxX > minY, it meanspartitionXis too far to the right (too many large elements fromnums1are in the left half). Adjusthigh = partitionX - 1. - If
maxY > minX, it meanspartitionXis too far to the left (too many small elements fromnums2are in the left half, implying we need more fromnums1). Adjustlow = partitionX + 1.
- Time Complexity: O(log(min(m, n))) because the binary search is performed on the smaller of the two arrays.
- Space Complexity: O(1) as no auxiliary data structures are created (besides a few variables).
Conclusion
Finding the median of two sorted arrays is a problem with multiple solutions, each offering different trade-offs in terms of time and space complexity. The naive merge-and-find approach is easy to understand and implement but comes with a linear time complexity proportional to the sum of array lengths. The optimized binary search approach, while more intricate to grasp initially, achieves a significantly better logarithmic time complexity, making it the preferred solution for larger datasets. Understanding the binary search approach requires careful consideration of partition points and boundary conditions but yields an efficient algorithm that is highly valued in competitive programming and system design.
Summary
- The problem involves finding the median of two already sorted arrays.
- Median Definition: Middle element for odd length, average of two middle elements for even length.
- Approach 1: Merge and Find
- Mechanism: Combine elements into a new sorted array, then find the median.
- Complexity: O(m+n) time, O(m+n) space. Simple but not optimal.
- Approach 2: Binary Search
- Mechanism: Find optimal partition points in both arrays such that the combined left half elements are less than or equal to combined right half elements. Performs binary search on the shorter array.
- Complexity: O(log(min(m, n))) time, O(1) space. Highly efficient.
- Key Idea for Binary Search: Find
partitionX(fromnums1) andpartitionY(fromnums2) such thatnums1[partitionX-1] <= nums2[partitionY]andnums2[partitionY-1] <= nums1[partitionX]. - Boundary conditions for empty partitions (using
Integer.MIN_VALUEandInteger.MAX_VALUE) are critical for the binary search approach.