Minimum Sum Of Absolute Difference Of Given Array in C
Finding the Minimum Sum of Absolute Differences in an Array
In this article, you will learn how to find a target value x for an array such that the sum of absolute differences between x and every element in the array is minimized. We'll explore two approaches: a brute-force search and an optimized median-based solution.
Problem Statement
Given an array of integers arr, we need to find an integer x (which may or may not be present in arr) such that the sum S = Σ |arr[i] - x| for all i from 0 to n-1 is as small as possible. This problem is fundamental in statistics, data analysis, and optimization, often appearing in scenarios like facility location or finding a robust central tendency measure.
Example
Consider the array arr = [1, 2, 4]. If we choose x = 2, the sum of absolute differences is: |1 - 2| + |2 - 2| + |4 - 2| = |-1| + |0| + |2| = 1 + 0 + 2 = 3. As we will discover, x = 2 (the median of the array) indeed yields the minimum sum for this example.
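To make the definition of S concrete, here is a minimal sketch of a helper that evaluates the sum for one candidate x. The name sum_abs_diff is hypothetical and chosen for illustration only; the sketch assumes a non-empty array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the sum of |arr[i] - x| over all n elements. */
long long sum_abs_diff(const int *arr, int n, int x) {
    assert(arr != NULL && n > 0);  /* assumption: non-empty input */
    long long total = 0;
    for (int i = 0; i < n; i++) {
        /* Widen to long long before subtracting so the difference cannot overflow int. */
        total += llabs((long long)arr[i] - x);
    }
    return total;
}
```

For arr = [1, 2, 4], calling this helper with x = 1, 2, 3, and 4 gives 4, 3, 4, and 5 respectively, so x = 2 is the best of these candidates.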
Background & Knowledge Prerequisites
To effectively understand this article, you should be familiar with:
- Basic array operations in C.
- The concept of absolute value (abs() function).
- Sorting algorithms (like qsort in C).
- Basic understanding of time complexity (O(N), O(N log N)).
For the C programming examples, we'll use the standard library headers <stdio.h> and <stdlib.h> (the latter declares abs for integers as well as qsort).
Use Cases or Case Studies
Finding the minimum sum of absolute differences has several practical applications:
- Facility Location: Imagine placing a new hospital or factory along a street where existing facilities are located. The goal is to minimize the total travel distance (sum of absolute differences) for all current facilities to the new one.
- Robust Statistics: The median is known to be a more robust measure of central tendency than the mean when dealing with outliers. This problem demonstrates why the median minimizes L1 distance (sum of absolute differences).
- Data Compression/Quantization: In some data processing tasks, you might need to represent a set of values with a single "representative" value that minimizes the total error, where error is measured by absolute difference.
- Image Processing: In certain image filtering or noise reduction techniques, local neighborhoods of pixels might be processed to find a central value that minimizes absolute differences to its neighbors.
Solution Approaches
We will explore two main approaches: a straightforward brute-force method and an optimized method leveraging a key mathematical property.
Approach 1: Brute Force Iteration
One-line summary: Iterate through all possible x values within the range of array elements, calculate the sum of absolute differences for each, and find the minimum.
Explanation: The most intuitive way to solve this problem is to try every possible integer x within the range of values present in the array. Since the optimal x will always lie between the minimum and maximum values of the array, we can iterate from min(arr) to max(arr). For each x, we compute the sum of |arr[i] - x| for all elements and keep track of the x that yields the smallest sum.
Code Example:
// Minimum Sum Absolute Difference - Brute Force
#include <stdio.h>
#include <stdlib.h> // For abs()

// Function to find the minimum value in an array
int findMin(int arr[], int n) {
    int minVal = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] < minVal) {
            minVal = arr[i];
        }
    }
    return minVal;
}

// Function to find the maximum value in an array
int findMax(int arr[], int n) {
    int maxVal = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] > maxVal) {
            maxVal = arr[i];
        }
    }
    return maxVal;
}

int main(void) {
    int arr[] = {2, 9, 1, 5, 7};
    int n = sizeof(arr) / sizeof(arr[0]);

    // Step 1: Find the minimum and maximum elements in the array
    int minElement = findMin(arr, n);
    int maxElement = findMax(arr, n);

    long long minTotalSum = -1; // -1 means "no sum recorded yet"; long long guards the sum against overflow
    int optimal_x = -1;

    // Step 2: Iterate through all possible 'x' values from minElement to maxElement
    for (int x = minElement; x <= maxElement; x++) {
        long long currentSum = 0;
        // Step 3: Calculate the sum of absolute differences for the current 'x'
        for (int i = 0; i < n; i++) {
            currentSum += abs(arr[i] - x);
        }
        // Step 4: Update minTotalSum if currentSum is smaller
        if (minTotalSum == -1 || currentSum < minTotalSum) {
            minTotalSum = currentSum;
            optimal_x = x;
        }
    }

    printf("Array: [");
    for (int i = 0; i < n; i++) {
        printf("%d%s", arr[i], (i == n - 1) ? "" : ", ");
    }
    printf("]\n");
    printf("Minimum sum of absolute differences is %lld when x = %d\n", minTotalSum, optimal_x);
    return 0;
}
Sample Output:
Array: [2, 9, 1, 5, 7]
Minimum sum of absolute differences is 13 when x = 5
Stepwise Explanation:
- Find Range: Determine the minimum and maximum values present in the input array. This defines the search space for our optimal x.
- Iterate x: Loop through every integer from minElement to maxElement. Each of these integers is a candidate for x.
- Calculate Sum: For each candidate x, iterate through the entire input array arr and calculate abs(arr[i] - x). Sum these absolute differences.
- Track Minimum: Keep a running minimum of the sums calculated and the x value that produced it.
Complexity Analysis: If R is the range (maxElement - minElement + 1), and N is the number of elements in the array, the time complexity is O(R * N). This can be inefficient if the range of values is large.
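One possible refinement, not part of the approach above, is to try only the array's own values as candidates. This relies on the fact that an optimal x can always be found among the array elements (a median element is one), and it makes the cost O(N^2) regardless of the value range. The sketch below assumes a non-empty array; the function name min_sum_over_elements is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: tries each array value as a candidate x and returns
   the smallest sum found. *best_x receives a value that achieves it. */
long long min_sum_over_elements(const int *arr, int n, int *best_x) {
    assert(arr != NULL && n > 0 && best_x != NULL);  /* assumption: valid, non-empty input */
    long long best = -1;  /* -1 means "no sum recorded yet" */
    for (int c = 0; c < n; c++) {
        long long sum = 0;
        for (int i = 0; i < n; i++) {
            /* Widen before subtracting to avoid int overflow. */
            sum += llabs((long long)arr[i] - arr[c]);
        }
        if (best == -1 || sum < best) {
            best = sum;
            *best_x = arr[c];
        }
    }
    return best;
}
```

For the array [2, 9, 1, 5, 7], the candidates 2, 9, 1, 5, and 7 give sums of 16, 21, 19, 13, and 15, so the function returns 13 with x = 5.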
Approach 2: Using the Median Property (Optimal)
One-line summary: Sort the array, and the median element(s) will give the minimum sum of absolute differences.
Explanation: A powerful mathematical property states that the value x that minimizes the sum Σ |arr[i] - x| is the median of the array.
- If the array has an odd number of elements, there is a unique median, and that is the optimal x.
- If the array has an even number of elements, any value x between the two middle elements (inclusive) will yield the same minimum sum. Typically, either of the two middle elements is chosen.
The intuition behind this is that moving x one unit toward the smaller elements reduces |arr[i] - x| for every element arr[i] below x but increases |arr[j] - x| for every element arr[j] above x (and vice versa). As long as more elements lie on one side of x than on the other, moving x toward the larger group lowers the total. The median is the "balancing point" where neither side outnumbers the other, so no move in either direction can reduce the sum.
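The balancing argument can be written as a short derivation. This is a sketch that treats x as a real number and looks at points where x differs from every element:

```latex
S(x) = \sum_{i=0}^{n-1} \lvert \mathrm{arr}[i] - x \rvert

\frac{dS}{dx} = \#\{\, i : \mathrm{arr}[i] < x \,\} \;-\; \#\{\, i : \mathrm{arr}[i] > x \,\}
\qquad \text{for } x \neq \mathrm{arr}[i] \text{ for every } i
```

Each element below x contributes a slope of +1 and each element above x contributes -1. The slope is therefore negative while fewer than half of the elements lie below x and positive once more than half do. Since S is piecewise linear, its minimum sits where the slope changes sign, which is at the median. When n is even, the slope is exactly zero between the two middle elements, which is why every x in that interval gives the same sum.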
Code Example:
// Minimum Sum Absolute Difference - Median Approach
#include <stdio.h>
#include <stdlib.h> // For qsort() and abs()

// Comparison function for qsort.
// Compares instead of subtracting, because a - b can overflow for large values.
int compare(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    int arr[] = {2, 9, 1, 5, 7};
    int n = sizeof(arr) / sizeof(arr[0]);

    // Step 1: Sort the array
    qsort(arr, n, sizeof(int), compare);

    // Step 2: Find the median element
    int optimal_x;
    if (n % 2 == 1) {
        // For odd number of elements, the median is at the middle index
        optimal_x = arr[n / 2];
    } else {
        // For even number of elements, either of the two middle elements works.
        // We'll pick the first one for consistency.
        optimal_x = arr[n / 2 - 1]; // Or arr[n / 2]
    }

    // Step 3: Calculate the sum of absolute differences with the median
    long long minTotalSum = 0;
    for (int i = 0; i < n; i++) {
        minTotalSum += abs(arr[i] - optimal_x);
    }

    printf("Original Array: [2, 9, 1, 5, 7]\n");
    printf("Sorted Array: [");
    for (int i = 0; i < n; i++) {
        printf("%d%s", arr[i], (i == n - 1) ? "" : ", ");
    }
    printf("]\n");
    printf("Minimum sum of absolute differences is %lld when x = %d\n", minTotalSum, optimal_x);

    // Example with even number of elements
    int arr_even[] = {1, 2, 3, 4};
    int n_even = sizeof(arr_even) / sizeof(arr_even[0]);
    qsort(arr_even, n_even, sizeof(int), compare);

    int optimal_x_even = arr_even[n_even / 2 - 1]; // First median (2)
    long long sum_even_1 = 0;
    for (int i = 0; i < n_even; i++) {
        sum_even_1 += abs(arr_even[i] - optimal_x_even);
    }

    int optimal_x_even_2 = arr_even[n_even / 2]; // Second median (3)
    long long sum_even_2 = 0;
    for (int i = 0; i < n_even; i++) {
        sum_even_2 += abs(arr_even[i] - optimal_x_even_2);
    }

    printf("\nOriginal Array: [1, 2, 3, 4]\n");
    printf("Sorted Array: [");
    for (int i = 0; i < n_even; i++) {
        printf("%d%s", arr_even[i], (i == n_even - 1) ? "" : ", ");
    }
    printf("]\n");
    printf("For even sized array, choosing x = %d yields sum %lld\n", optimal_x_even, sum_even_1);
    printf("For even sized array, choosing x = %d yields sum %lld\n", optimal_x_even_2, sum_even_2);
    return 0;
}
Sample Output:
Original Array: [2, 9, 1, 5, 7]
Sorted Array: [1, 2, 5, 7, 9]
Minimum sum of absolute differences is 13 when x = 5
Original Array: [1, 2, 3, 4]
Sorted Array: [1, 2, 3, 4]
For even sized array, choosing x = 2 yields sum 4
For even sized array, choosing x = 3 yields sum 4
Stepwise Explanation:
- Sort the Array: Use a sorting algorithm (like qsort in C) to arrange the array elements in non-decreasing order. This takes O(N log N) time.
- Find the Median:
  - If N is odd, the median is arr[N / 2].
  - If N is even, any x in the range [arr[N / 2 - 1], arr[N / 2]] will work. We can pick arr[N / 2 - 1] or arr[N / 2] as the optimal_x.
- Calculate Sum: Compute the sum of absolute differences Σ |arr[i] - optimal_x| using the chosen median. This takes O(N) time.
Complexity Analysis: The dominant step is sorting, so the overall time complexity is O(N log N). This is significantly more efficient than the brute-force approach, especially for large arrays with a wide range of values.
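The three steps can also be packaged as a reusable function. The sketch below is one possible arrangement, not the only one: it sorts a copy so the caller's array is left untouched, picks the lower median when N is even, and returns -1 if the copy cannot be allocated. The names cmp_int and min_abs_diff_sum are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Overflow-safe comparison for qsort (hypothetical helper name). */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: returns the minimum sum of absolute differences,
   or -1 if memory for the sorted copy cannot be allocated. */
long long min_abs_diff_sum(const int *arr, size_t n) {
    assert(arr != NULL && n > 0);  /* assumption: non-empty input */

    int *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) {
        return -1;
    }
    memcpy(sorted, arr, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_int);  /* O(N log N) */

    /* Middle element for odd n, lower of the two middle elements for even n. */
    int median = sorted[(n - 1) / 2];

    long long total = 0;
    for (size_t i = 0; i < n; i++) {  /* O(N) */
        total += llabs((long long)sorted[i] - median);
    }
    free(sorted);
    return total;
}
```

With the arrays used above, min_abs_diff_sum returns 13 for [2, 9, 1, 5, 7] and 4 for [1, 2, 3, 4], and the input arrays keep their original order.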
Conclusion
Finding the minimum sum of absolute differences is a classic optimization problem with broad applications. A brute-force approach can solve it, but its O(R * N) cost becomes impractical when the range of values is large. The key insight lies in recognizing that the median of the array is the optimal choice for x. By sorting the array and selecting the median, we can solve this problem efficiently with O(N log N) time complexity.
Summary
- The problem seeks an x that minimizes Σ |arr[i] - x|.
- Brute Force Approach:
  - Iterates through all possible x values between the array's min and max elements.
  - Calculates the sum for each x.
  - Time Complexity: O(R * N), where R is the range of values and N is array size.
- Optimal Approach (Median):
  - Sort the array.
  - The median element(s) is the optimal x.
  - If N is odd, x = arr[N / 2].
  - If N is even, any x between arr[N / 2 - 1] and arr[N / 2] works.
  - Time Complexity: O(N log N) due to sorting.
- The median approach is significantly more efficient and is the standard solution for this problem.
Test Your Understanding!
- Given the array [10, 2, 5, 8, 3]:
  - What is the optimal x that minimizes the sum of absolute differences?
  - What is the minimum sum of absolute differences?
- Why is the brute-force approach less efficient than the median approach for a very large array with values ranging from 1 to 1,000,000?