Minimum Sum of Absolute Differences of a Given Array in C
Calculating the minimum sum of absolute differences in an array is a fundamental problem in optimization and data analysis.
It involves finding a target value such that the total absolute deviation of all array elements from this target is minimized. In this article, you will learn how to efficiently determine this minimum sum by leveraging a key mathematical property.
Problem Statement
Given an array of integers, arr, the objective is to find an integer X such that sum(|arr[i] - X|), taken over every index i, is as small as possible. This problem arises whenever you need a "central" value that minimizes total error or cost and is not pulled around by outliers. For example, in a one-dimensional facility location problem, X could represent the placement of a new facility that minimizes the total travel distance for customers.
Example
Consider the array {1, 2, 9, 2, 7}. If we sort this array, we get {1, 2, 2, 7, 9}. The median of this sorted array is 2 (the element at the middle index (5-1)/2 = 2). Let's calculate the sum of absolute differences using X = 2: |1-2| + |2-2| + |2-2| + |7-2| + |9-2| = 1 + 0 + 0 + 5 + 7 = 13. This value of 13 is the minimum possible sum for this array.
Background & Knowledge Prerequisites
To understand and implement the solutions, familiarity with the following concepts is helpful:
- Basic C Programming: Understanding arrays, loops, and functions.
- Sorting Algorithms: Knowledge of how to sort an array (e.g., using qsort from stdlib.h or implementing a simple sort).
- Absolute Value: The concept of |a - b| representing the non-negative difference between a and b.
- Median: The middle value in a sorted list of numbers. If the list has an odd number of elements, the median is the single middle element. If it has an even number, the median is typically the average of the two middle elements, but for this specific problem, any value between (and including) the two middle elements will yield the minimum sum.
Use Cases or Case Studies
Minimizing the sum of absolute differences has various practical applications across different domains:
- Facility Location: Determining the optimal location for a new facility (e.g., hospital, warehouse, fire station) to minimize the total travel distance for all users or existing points. This often corresponds to finding the median location.
- Robust Statistics (L1 Regression): In statistical modeling, minimizing the sum of absolute errors (MAE - Mean Absolute Error) is a robust regression technique that is less sensitive to outliers compared to minimizing the sum of squared errors (MSE - Mean Squared Error, used in Ordinary Least Squares regression).
- Data Analysis: Finding a robust "center" or representative value for a dataset that is not heavily influenced by extreme values.
- Image Processing: Certain filtering techniques might use this principle to reduce noise while preserving edges, as the median filter is known for its robustness.
- Resource Allocation: Distributing resources among various entities to minimize the overall imbalance or deviation from a target allocation.
Solution Approaches
The fundamental insight for this problem is that the sum of absolute differences is minimized when X is the median of the array.
Approach 1: Understanding the Median Property
The mathematical proof involves considering the derivative of the sum function (or more intuitively, imagining a rubber band model where each arr[i] is a peg and X is a point on a line pulled by rubber bands from each peg). The forces balance out at the median.
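One way to make the derivative argument concrete is sketched below. It assumes X may be any real number and looks only at points where X does not coincide with an array element; the symbols f and a_i are notation introduced here, with a_i standing for arr[i].

```latex
f(X) = \sum_{i=0}^{n-1} \lvert a_i - X \rvert,
\qquad
f'(X) = \#\{\, i : a_i < X \,\} - \#\{\, i : a_i > X \,\}
\quad \text{for } X \notin \{a_0, \dots, a_{n-1}\}.
```

The slope is negative while fewer than half of the elements lie below X and positive once more than half do, so f stops decreasing exactly at the median. For the sorted example {1, 2, 2, 7, 9}, the slope is 1 - 4 = -3 on the interval (1, 2) and 3 - 2 = +1 on (2, 7), which places the minimum at X = 2.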
- For an array with an odd number of elements: The unique median element arr[n/2] (after sorting) will minimize the sum.
- For an array with an even number of elements: Any value X between the two middle elements arr[n/2 - 1] and arr[n/2] (inclusive) will yield the same minimum sum. For practical implementation, picking either arr[n/2 - 1] or arr[n/2] is sufficient if X must be an integer.
Approach 2: Sorting and Calculating using the Median
This is the most straightforward and efficient approach to solve the problem.
- One-line summary: Sort the array and select the median element as the target X to calculate the minimum sum of absolute differences.
- Code example:
// Minimum Sum of Absolute Differences using Median
#include <stdio.h>  // For printf
#include <stdlib.h> // For qsort and llabs

// Comparison function for qsort to sort integers in ascending order.
// It compares rather than subtracts, because a - b can overflow int.
int compareIntegers(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    // Example with an odd number of elements
    int arr_odd[] = {1, 2, 9, 2, 7};
    int n_odd = sizeof(arr_odd) / sizeof(arr_odd[0]);
    printf("Original array (odd size): ");
    for (int i = 0; i < n_odd; i++) {
        printf("%d ", arr_odd[i]);
    }
    printf("\n");

    // Step 1: Sort the array
    qsort(arr_odd, n_odd, sizeof(int), compareIntegers);

    // Step 2: Find the median (for odd-sized array, it's the middle element)
    int median_odd = arr_odd[n_odd / 2];

    // Step 3: Calculate the sum of absolute differences with the median
    long long min_sum_odd = 0; // Use long long to prevent overflow for large sums
    for (int i = 0; i < n_odd; i++) {
        // Widen before subtracting so the difference itself cannot overflow
        min_sum_odd += llabs((long long)arr_odd[i] - median_odd);
    }

    printf("Sorted array (odd size): ");
    for (int i = 0; i < n_odd; i++) {
        printf("%d ", arr_odd[i]);
    }
    printf("\n");
    printf("Median for odd size array: %d\n", median_odd);
    printf("Minimum sum of absolute differences for odd size array: %lld\n\n", min_sum_odd);

    // Example with an even number of elements
    int arr_even[] = {1, 2, 10, 2, 7, 8};
    int n_even = sizeof(arr_even) / sizeof(arr_even[0]);
    printf("Original array (even size): ");
    for (int i = 0; i < n_even; i++) {
        printf("%d ", arr_even[i]);
    }
    printf("\n");

    // Step 1: Sort the array
    qsort(arr_even, n_even, sizeof(int), compareIntegers);

    // Step 2: Find one of the medians (for even-sized array, arr[n/2 - 1] or arr[n/2] works)
    // We choose arr[n/2 - 1] here, but arr[n/2] would yield the same minimum sum.
    int median_even = arr_even[n_even / 2 - 1];

    // Step 3: Calculate the sum of absolute differences with the chosen median
    long long min_sum_even = 0;
    for (int i = 0; i < n_even; i++) {
        min_sum_even += llabs((long long)arr_even[i] - median_even);
    }

    printf("Sorted array (even size): ");
    for (int i = 0; i < n_even; i++) {
        printf("%d ", arr_even[i]);
    }
    printf("\n");
    printf("Median (chosen) for even size array: %d\n", median_even);
    printf("Minimum sum of absolute differences for even size array: %lld\n", min_sum_even);
    return 0;
}
- Sample output:
Original array (odd size): 1 2 9 2 7
Sorted array (odd size): 1 2 2 7 9
Median for odd size array: 2
Minimum sum of absolute differences for odd size array: 13
Original array (even size): 1 2 10 2 7 8
Sorted array (even size): 1 2 2 7 8 10
Median (chosen) for even size array: 2
Minimum sum of absolute differences for even size array: 20
- Stepwise explanation:
  - Include Headers: Include stdio.h for input/output and stdlib.h for qsort and the absolute-value functions.
  - Comparison Function: Define compareIntegers to be used by qsort. This function dictates the sorting order.
  - Sort the Array: Use qsort to sort the input array in ascending order. Sorting is what allows the median to be read directly from the middle index.
  - Find the Median:
    - If the array size n is odd, the median is arr[n / 2].
    - If the array size n is even, arr[n / 2 - 1], arr[n / 2], or any value between them will work. The example picks arr[n / 2 - 1].
  - Calculate Sum: Iterate through the sorted array. For each element arr[i], calculate the absolute difference |arr[i] - median| and add it to a running total. Use long long for the sum to accommodate potentially large values.
Conclusion
The problem of finding the minimum sum of absolute differences in an array can be efficiently solved by understanding and applying the property of the median. By sorting the array and selecting its median element as the target X, we guarantee the smallest possible sum of deviations. This approach is not only computationally efficient (dominated by the sorting time, typically O(N log N)) but also robust, making it valuable in various analytical and optimization contexts.
Summary
- Problem: Minimize sum(|arr[i] - X|) for a given array arr.
- Key Insight: The value X that minimizes this sum is the median of the array.
- Solution Steps:
  - Sort the input array in ascending order.
  - Identify the median element. For odd-sized arrays, it's the middle element. For even-sized arrays, either of the two middle elements (or any value between them) serves as the minimizing X.
  - Calculate the sum of absolute differences between each array element and the chosen median.
- Benefits: This method is efficient, robust to outliers, and has widespread applications in fields like statistics, optimization, and data analysis.