Median Of Two Sorted Arrays in C
Finding the median of two sorted arrays is a classic problem that tests your understanding of array manipulation, sorting, and binary search algorithms.
This article walks through two ways to solve it in C: a straightforward merge, and a faster binary search over partition points.
Problem Statement
Given two sorted arrays, nums1 of size m and nums2 of size n, the task is to find the median of the two arrays combined. The overall run time complexity should ideally be O(log(m+n)).
For instance, if nums1 = [1, 3] and nums2 = [2], the merged array would be [1, 2, 3], and the median is 2. If nums1 = [1, 2] and nums2 = [3, 4], the merged array would be [1, 2, 3, 4], and the median is (2 + 3) / 2 = 2.5. This problem is fundamental in algorithms and data structures, often appearing in coding interviews and competitive programming.
Example
Let's consider nums1 = [1, 5, 9] and nums2 = [2, 4, 6]. The combined sorted list would be [1, 2, 4, 5, 6, 9]. Since there are 6 elements (an even number), the median is the average of the two middle elements (the 3rd and 4th elements): (4 + 5) / 2 = 4.5.
Background & Knowledge Prerequisites
To fully understand the solutions, you should be familiar with:
- Arrays: Basic operations like accessing elements.
- Sorting: The concept of sorted arrays and their properties.
- Median Definition: How to calculate the median for both odd and even numbers of elements in a sorted list.
- Basic C Programming: Loops, conditional statements, functions, and pointer arithmetic.
- Binary Search: The core concept of dividing and conquering a search space.
Use Cases or Case Studies
This problem, or its underlying principles, can be applied in various scenarios:
- Database Merging: When combining sorted result sets from different database queries, finding the median value might be required for statistical analysis.
- Sensor Data Analysis: Merging time-series data from two different sensors, both pre-sorted by timestamp, and needing to find the central tendency (median) of a specific measurement.
- Financial Data Aggregation: Combining sorted lists of stock prices or transaction volumes from different markets to get an overall median for market performance assessment.
- Resource Allocation: In distributed systems, if different nodes report their sorted resource usage, finding the median usage across all nodes can help in balanced load distribution.
- Algorithm Design Foundations: It's a fundamental problem often used as a building block or a test case for understanding advanced divide-and-conquer algorithms.
Solution Approaches
We will explore two primary approaches: a straightforward merge-based solution and a more efficient binary search approach.
Approach 1: Merge and Find Median
This is the most intuitive approach. We create a new array, merge the two sorted arrays into it, and then find the median from the newly merged sorted array.
One-line summary: Merge the two sorted arrays into a new array, then calculate the median from the combined sorted array.
C Code Example:
// Median of Two Sorted Arrays (Merge Approach)
#include <stdio.h>
#include <stdlib.h> // For malloc
// Function to find the median of two sorted arrays by merging
double findMedianSortedArrays_Merge(int* nums1, int nums1Size, int* nums2, int nums2Size) {
// Step 1: Create a new array to store merged elements
int totalSize = nums1Size + nums2Size;
int* merged = (int*)malloc(totalSize * sizeof(int));
if (merged == NULL) {
perror("Failed to allocate memory for merged array");
exit(EXIT_FAILURE);
}
// Step 2: Merge nums1 and nums2 into 'merged'
int i = 0, j = 0, k = 0;
while (i < nums1Size && j < nums2Size) {
if (nums1[i] < nums2[j]) {
merged[k++] = nums1[i++];
} else {
merged[k++] = nums2[j++];
}
}
// Copy remaining elements from nums1, if any
while (i < nums1Size) {
merged[k++] = nums1[i++];
}
// Copy remaining elements from nums2, if any
while (j < nums2Size) {
merged[k++] = nums2[j++];
}
// Step 3: Calculate the median
double median;
if (totalSize % 2 == 1) { // Odd number of elements
median = (double)merged[totalSize / 2];
} else { // Even number of elements
median = ((double)merged[totalSize / 2 - 1] + (double)merged[totalSize / 2]) / 2.0; // Convert before adding to avoid int overflow
}
// Step 4: Free allocated memory
free(merged);
return median;
}
int main() {
// Example 1: Odd total number of elements
int nums1_ex1[] = {1, 3};
int nums2_ex1[] = {2};
int nums1Size_ex1 = sizeof(nums1_ex1) / sizeof(nums1_ex1[0]);
int nums2Size_ex1 = sizeof(nums2_ex1) / sizeof(nums2_ex1[0]);
double median_ex1 = findMedianSortedArrays_Merge(nums1_ex1, nums1Size_ex1, nums2_ex1, nums2Size_ex1);
printf("Example 1: nums1 = {1, 3}, nums2 = {2} -> Median: %.1f\n", median_ex1); // Expected: 2.0
// Example 2: Even total number of elements
int nums1_ex2[] = {1, 2};
int nums2_ex2[] = {3, 4};
int nums1Size_ex2 = sizeof(nums1_ex2) / sizeof(nums1_ex2[0]);
int nums2Size_ex2 = sizeof(nums2_ex2) / sizeof(nums2_ex2[0]);
double median_ex2 = findMedianSortedArrays_Merge(nums1_ex2, nums1Size_ex2, nums2_ex2, nums2Size_ex2);
printf("Example 2: nums1 = {1, 2}, nums2 = {3, 4} -> Median: %.1f\n", median_ex2); // Expected: 2.5
// Example 3: All elements identical
int nums1_ex3[] = {0, 0};
int nums2_ex3[] = {0, 0};
int nums1Size_ex3 = sizeof(nums1_ex3) / sizeof(nums1_ex3[0]);
int nums2Size_ex3 = sizeof(nums2_ex3) / sizeof(nums2_ex3[0]);
double median_ex3 = findMedianSortedArrays_Merge(nums1_ex3, nums1Size_ex3, nums2_ex3, nums2Size_ex3);
printf("Example 3: nums1 = {0, 0}, nums2 = {0, 0} -> Median: %.1f\n", median_ex3); // Expected: 0.0
return 0;
}
Sample Output:
Example 1: nums1 = {1, 3}, nums2 = {2} -> Median: 2.0
Example 2: nums1 = {1, 2}, nums2 = {3, 4} -> Median: 2.5
Example 3: nums1 = {0, 0}, nums2 = {0, 0} -> Median: 0.0
Stepwise Explanation:
- Allocate Memory: A new array merged is allocated dynamically to hold all m + n elements.
- Merge Arrays: Two pointers (i for nums1, j for nums2) traverse their respective arrays, comparing elements. The smaller element is added to merged using a third pointer k, and its respective array pointer is incremented.
- Copy Remaining Elements: After one array is exhausted, any remaining elements from the other array are copied directly to merged.
- Calculate Median:
  - If totalSize is odd, the median is the element at merged[totalSize / 2].
  - If totalSize is even, the median is the average of elements at merged[totalSize / 2 - 1] and merged[totalSize / 2].
- Free Memory: The dynamically allocated merged array is freed to prevent memory leaks.
- Time Complexity: O(m + n) due to merging both arrays.
- Space Complexity: O(m + n) for the merged array.
Approach 2: Binary Search (Optimal Solution)
This approach aims to find the "partition point" in the two arrays such that when combined, the median can be determined. It leverages the sorted nature of the arrays and uses binary search to achieve O(log(min(m, n))) time complexity.
The core idea is to divide the combined elements into two halves: a left half and a right half. The median will be found at the boundary of these halves. We perform a binary search on the smaller array to find the optimal partition.
One-line summary: Use binary search on the smaller array to find a partition that correctly divides the combined elements into two halves, where the median lies at the boundary.
C Code Example:
// Median of Two Sorted Arrays (Binary Search Approach)
#include <stdio.h>
#include <limits.h> // For INT_MIN, INT_MAX
// Helper function to find the maximum of two integers
int max(int a, int b) {
return (a > b) ? a : b;
}
// Helper function to find the minimum of two integers
int min(int a, int b) {
return (a < b) ? a : b;
}
double findMedianSortedArrays_BinarySearch(int* nums1, int nums1Size, int* nums2, int nums2Size) {
// Step 1: Ensure nums1 is the shorter array for binary search optimization
if (nums1Size > nums2Size) {
return findMedianSortedArrays_BinarySearch(nums2, nums2Size, nums1, nums1Size);
}
int low = 0;
int high = nums1Size; // Binary search range for partition in nums1
int totalLen = nums1Size + nums2Size;
int halfLen = (totalLen + 1) / 2; // Number of elements in the left half
while (low <= high) {
int partitionX = low + (high - low) / 2; // Partition point in nums1
int partitionY = halfLen - partitionX; // Partition point in nums2
// Determine left_max and right_min for both partitions
// If partition is at 0, left_max is INT_MIN
// If partition is at array size, right_min is INT_MAX
int maxX = (partitionX == 0) ? INT_MIN : nums1[partitionX - 1];
int minX = (partitionX == nums1Size) ? INT_MAX : nums1[partitionX];
int maxY = (partitionY == 0) ? INT_MIN : nums2[partitionY - 1];
int minY = (partitionY == nums2Size) ? INT_MAX : nums2[partitionY];
// Check if we found the correct partition
if (maxX <= minY && maxY <= minX) {
// Correct partition found
// Median calculation depends on total length (even/odd)
if (totalLen % 2 == 1) { // Odd total length
return (double)max(maxX, maxY);
} else { // Even total length
return ((double)max(maxX, maxY) + (double)min(minX, minY)) / 2.0; // Convert before adding to avoid int overflow
}
} else if (maxX > minY) {
// PartitionX is too far to the right, need to move left in nums1
high = partitionX - 1;
} else { // maxY > minX
// PartitionX is too far to the left, need to move right in nums1
low = partitionX + 1;
}
}
// Should not reach here if inputs are valid sorted arrays
return 0.0;
}
int main() {
// Example 1: Odd total number of elements
int nums1_ex1[] = {1, 3};
int nums2_ex1[] = {2};
int nums1Size_ex1 = sizeof(nums1_ex1) / sizeof(nums1_ex1[0]);
int nums2Size_ex1 = sizeof(nums2_ex1) / sizeof(nums2_ex1[0]);
double median_ex1 = findMedianSortedArrays_BinarySearch(nums1_ex1, nums1Size_ex1, nums2_ex1, nums2Size_ex1);
printf("Example 1: nums1 = {1, 3}, nums2 = {2} -> Median: %.1f\n", median_ex1); // Expected: 2.0
// Example 2: Even total number of elements
int nums1_ex2[] = {1, 2};
int nums2_ex2[] = {3, 4};
int nums1Size_ex2 = sizeof(nums1_ex2) / sizeof(nums1_ex2[0]);
int nums2Size_ex2 = sizeof(nums2_ex2) / sizeof(nums2_ex2[0]);
double median_ex2 = findMedianSortedArrays_BinarySearch(nums1_ex2, nums1Size_ex2, nums2_ex2, nums2Size_ex2);
printf("Example 2: nums1 = {1, 2}, nums2 = {3, 4} -> Median: %.1f\n", median_ex2); // Expected: 2.5
// Example 3: All elements identical
int nums1_ex3[] = {0, 0};
int nums2_ex3[] = {0, 0};
int nums1Size_ex3 = sizeof(nums1_ex3) / sizeof(nums1_ex3[0]);
int nums2Size_ex3 = sizeof(nums2_ex3) / sizeof(nums2_ex3[0]);
double median_ex3 = findMedianSortedArrays_BinarySearch(nums1_ex3, nums1Size_ex3, nums2_ex3, nums2Size_ex3);
printf("Example 3: nums1 = {0, 0}, nums2 = {0, 0} -> Median: %.1f\n", median_ex3); // Expected: 0.0
// Example 4: One empty array
int* nums1_ex4 = NULL; // A zero-length array is not valid standard C, so pass NULL with size 0
int nums2_ex4[] = {2, 3};
int nums1Size_ex4 = 0; // The function never dereferences nums1 when its size is 0
int nums2Size_ex4 = sizeof(nums2_ex4) / sizeof(nums2_ex4[0]);
double median_ex4 = findMedianSortedArrays_BinarySearch(nums1_ex4, nums1Size_ex4, nums2_ex4, nums2Size_ex4);
printf("Example 4: nums1 = {}, nums2 = {2, 3} -> Median: %.1f\n", median_ex4); // Expected: 2.5
// Example 5: Larger case
int nums1_ex5[] = {1, 5, 9};
int nums2_ex5[] = {2, 4, 6};
int nums1Size_ex5 = sizeof(nums1_ex5) / sizeof(nums1_ex5[0]);
int nums2Size_ex5 = sizeof(nums2_ex5) / sizeof(nums2_ex5[0]);
double median_ex5 = findMedianSortedArrays_BinarySearch(nums1_ex5, nums1Size_ex5, nums2_ex5, nums2Size_ex5);
printf("Example 5: nums1 = {1, 5, 9}, nums2 = {2, 4, 6} -> Median: %.1f\n", median_ex5); // Expected: 4.5
return 0;
}
Sample Output:
Example 1: nums1 = {1, 3}, nums2 = {2} -> Median: 2.0
Example 2: nums1 = {1, 2}, nums2 = {3, 4} -> Median: 2.5
Example 3: nums1 = {0, 0}, nums2 = {0, 0} -> Median: 0.0
Example 4: nums1 = {}, nums2 = {2, 3} -> Median: 2.5
Example 5: nums1 = {1, 5, 9}, nums2 = {2, 4, 6} -> Median: 4.5
Stepwise Explanation:
- Handle Shorter Array: The function first ensures that nums1 is always the shorter array. This keeps the binary search space smaller, leading to O(log(min(m, n))). It also guarantees that partitionY = halfLen - partitionX always stays between 0 and nums2Size, so nums2 is never indexed out of bounds.
- Initialize Pointers: low and high define the search space for partitionX (the split point in nums1). totalLen is the combined length, and halfLen is the number of elements required in the left half of the combined sorted array.
- Binary Search Loop: The while (low <= high) loop performs the binary search.
  - Calculate Partitions: partitionX is the current mid-point in nums1. partitionY is calculated to ensure that partitionX + partitionY = halfLen. This makes sure the left half always contains the correct number of elements.
  - Determine Boundary Values:
    - maxX: The last element in the left part of nums1. If partitionX is 0, no elements from nums1 are in the left part, so we use INT_MIN.
    - minX: The first element in the right part of nums1. If partitionX is nums1Size, all elements from nums1 are in the left part, so we use INT_MAX.
    - maxY and minY: Similarly determined for nums2.
  - Check Partition Condition: A correct partition satisfies maxX <= minY and maxY <= minX. This means the largest elements of the left half (maxX and maxY) are less than or equal to the smallest elements of the right half (minX and minY).
  - If Correct:
    - If totalLen is odd, the median is the largest element in the left half: max(maxX, maxY).
    - If totalLen is even, the median is the average of the largest element in the left half and the smallest element in the right half: (max(maxX, maxY) + min(minX, minY)) / 2.0.
  - If maxX > minY: partitionX is too far to the right, meaning maxX is too large. We need to shift the partition left in nums1 (high = partitionX - 1).
  - If maxY > minX: partitionX is too far to the left, meaning maxY is too large. We need to shift the partition right in nums1 (low = partitionX + 1).
- Return Value: The function returns the calculated median. The return 0.0 after the loop is a fallback for invalid inputs, though typically not reached with valid sorted arrays.
- Time Complexity: O(log(min(m, n))) because the binary search is performed on the smaller of the two arrays.
- Space Complexity: O(1) as no extra arrays are used for merging.
Conclusion
We've explored two methods to find the median of two sorted arrays:
- Merging: Simple to understand and implement, but has a time complexity of O(m + n) and space complexity of O(m + n).
- Binary Search: More complex to grasp initially, but offers a significant performance improvement with O(log(min(m, n))) time complexity and O(1) space complexity, making it the preferred optimal solution for larger datasets.
Choosing the right approach depends on the constraints of your problem, primarily the size of the arrays and performance requirements. For typical interview scenarios, the binary search approach is expected due to its efficiency.
Summary
Here's a quick recap of the key takeaways:
- Problem: Find the median of two sorted arrays nums1 (size m) and nums2 (size n).
- Median Definition: Middle element for odd total length, average of two middle elements for even total length.
- Approach 1: Merge and Find
  - Mechanism: Combine nums1 and nums2 into a new sorted array.
  - Complexity: O(m + n) time, O(m + n) space.
- Approach 2: Binary Search (Optimal)
  - Mechanism: Perform binary search on the smaller array to find an ideal partition point that separates the combined arrays into two halves.
  - Condition: max(left_half) <= min(right_half).
  - Complexity: O(log(min(m, n))) time, O(1) space.
  - Edge Cases: Handle empty partitions using INT_MIN and INT_MAX.
Test Your Understanding
Which approach would you choose if memory was extremely constrained but time was not a critical factor (e.g., very small arrays)? Why?
Try implementing the binary search solution in a different programming language (e.g., Python or Java) and test it with various edge cases like empty arrays or arrays with identical elements.