Replace Each Element Of The Array By Its Rank In The Array
An array's elements often carry intrinsic value, but sometimes their relative standing—their "rank"—is more informative.
Understanding how to replace actual values with their ranks can simplify analysis and data representation.
In this article, you will learn how to replace each element of an array with its rank, where identical values share the same rank and ranks are 1-based.
Problem Statement
The task is to transform an array by replacing each numerical element with its "rank" within the array. The rank is defined as the 1-based position an element would have if the array were sorted. If multiple elements have the same value, they should all be assigned the same rank.
This problem is crucial in scenarios like competitive scoring, data compression, or when you need to understand the relative position of data points rather than their absolute magnitudes. For instance, knowing that a student scored in the "top 10%" (rank-based) might be more insightful than just knowing their raw score.
Example
Let's consider an input array: [10, 20, 10, 30, 5]
The desired output, after replacing elements with their 1-based ranks, would be: [2, 3, 2, 4, 1]
Here's why:
- The smallest distinct value is
5, so it gets rank1. - The next distinct value is
10. Both10s get rank2. - The next distinct value is
20, so it gets rank3. - The largest distinct value is
30, so it gets rank4.
Background & Knowledge Prerequisites
To follow along with the solutions, you should have a basic understanding of:
- C Language Fundamentals: Variables, data types, arrays, loops, and functions.
- Structures (structs): How to define and use custom data structures in C.
- Pointers: Basic pointer arithmetic and how they are used with arrays and functions.
- Sorting Algorithms: Specifically, familiarity with the
qsortstandard library function and how to provide a custom comparison function.
Use Cases or Case Studies
- Competitive Ranking: In a competition, raw scores might vary wildly, but converting them to ranks (e.g., 1st, 2nd, 3rd) provides a standardized way to evaluate performance relative to others.
- Data Analysis and Visualization: When dealing with skewed data distributions, ranking can normalize the data, making it easier to compare different datasets or visualize trends without extreme outliers dominating the view.
- Sparse Data Representation: In some algorithms, the absolute values of data points are less important than their relative ordering. Replacing values with ranks can sometimes simplify algorithms or reduce memory usage if ranks can be stored more compactly.
- Feature Engineering in Machine Learning: Ranking numerical features can sometimes improve model performance by making the features less sensitive to outliers and preserving the ordinal relationship between values.
Solution Approaches
We will explore a robust method that effectively handles duplicate values by assigning them the same rank.
Approach 1: Using Value-Index Pairs and Sorting
This approach involves creating temporary data structures to keep track of the original positions while sorting values. This ensures that ranks are correctly mapped back to their original array positions.
One-line Summary: Create auxiliary pairs of (value, original index), sort them by value, and then iterate to assign 1-based ranks back to the original positions, handling duplicates appropriately.
Code Example:
// Replace Array Elements by Rank
#include <stdio.h>
#include <stdlib.h> // Required for qsort
// Structure to hold an element's value and its original index
typedef struct {
int value;
int original_index;
} Element;
// Comparison function for qsort: sorts Element structs by their 'value'
int compareElements(const void *a, const void *b) {
return ((Element *)a)->value - ((Element *)b)->value;
}
// Function to replace array elements by their ranks
void replaceWithRanks(int arr[], int n) {
// Step 1: Create an array of Element structs
Element *elements = (Element *)malloc(n * sizeof(Element));
if (elements == NULL) {
fprintf(stderr, "Memory allocation failed!\\n");
return;
}
// Step 2: Populate the elements array with values and original indices
for (int i = 0; i < n; i++) {
elements[i].value = arr[i];
elements[i].original_index = i;
}
// Step 3: Sort the elements array based on their values
qsort(elements, n, sizeof(Element), compareElements);
// Step 4: Create a temporary array to store ranks based on original indices
int *ranks = (int *)malloc(n * sizeof(int));
if (ranks == NULL) {
fprintf(stderr, "Memory allocation failed!\\n");
free(elements);
return;
}
// Step 5: Assign ranks to the original positions
int current_rank = 1;
for (int i = 0; i < n; i++) {
// If it's not the first element and current value is different from previous
if (i > 0 && elements[i].value != elements[i-1].value) {
current_rank++;
}
// Store the current_rank at the original position of this element
ranks[elements[i].original_index] = current_rank;
}
// Step 6: Replace the original array elements with their calculated ranks
for (int i = 0; i < n; i++) {
arr[i] = ranks[i];
}
// Step 7: Free dynamically allocated memory
free(elements);
free(ranks);
}
int main() {
// Original array
int arr[] = {10, 20, 10, 30, 5, 20};
int n = sizeof(arr) / sizeof(arr[0]);
printf("Original array: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
// Replace elements with their ranks
replaceWithRanks(arr, n);
printf("Array with ranks: ");
for (int i = 0; i < n; i++) {
printf("%d ", arr[i]);
}
printf("\\n");
// Another example
int arr2[] = {7, 3, 7, 1, 9, 3};
int n2 = sizeof(arr2) / sizeof(arr2[0]);
printf("Original array 2: ");
for (int i = 0; i < n2; i++) {
printf("%d ", arr2[i]);
}
printf("\\n");
replaceWithRanks(arr2, n2);
printf("Array 2 with ranks: ");
for (int i = 0; i < n2; i++) {
printf("%d ", arr2[i]);
}
printf("\\n");
return 0;
}
Sample Output:
Original array: 10 20 10 30 5 20
Array with ranks: 2 3 2 4 1 3
Original array 2: 7 3 7 1 9 3
Array 2 with ranks: 3 2 3 1 4 2
Stepwise Explanation:
- Define
ElementStructure: Astruct Elementis created to store both thevalueof an array element and itsoriginalindex. This is crucial because sorting will change the order of elements, but we need to know where each element originally came from. - Comparison Function:
compareElementsis a standard comparison function required byqsort. It takes twovoidpointers, casts them toElement, and returns the difference of theirvaluefields for ascending order sorting. - Allocate and Populate
elementsArray: Dynamic memory is allocated for an array ofElementstructs, equal to the size of the input array. Thiselementsarray is then populated by iterating through the originalarr, storing each value and its index. - Sort
elementsArray: Theqsortfunction is called to sort theelementsarray. This arranges all elements based on theirvaluein ascending order, while keeping theiroriginalindexintact within eachElementstruct. - Allocate
ranksArray: Another dynamic arrayranksis allocated. This array will temporarily hold the calculated ranks, mapped to their original indices, before updating the input array. - Assign Ranks:
- A
currentrankvariable is initialized to1.
- A
- We iterate through the sorted*
elementsarray. - For each element:
- If it's not the first element and its
valueis different from the previous element'svalue(in the sorted list), it means we've encountered a new distinct value, socurrentrankis incremented. - The
currentrankis then stored in theranksarray at the position indicated byelements[i].originalindex. This ensures the rank goes to the correct spot.
- Update Original Array: Finally, we iterate through the
ranksarray and copy its values back into the originalarr, effectively replacing each element with its calculated rank. - Free Memory: Dynamically allocated memory for
elementsandranksis freed to prevent memory leaks.
Conclusion
Replacing array elements with their ranks is a common and useful transformation in data processing. The detailed approach using auxiliary value-index pairs and sorting, as demonstrated, provides a robust and efficient way to achieve this, correctly handling duplicate values by assigning them the same rank. This method ensures that the relative ordering of elements is captured, offering a foundation for further analysis or specific algorithm requirements.
Summary
- Problem: Replace array elements with their 1-based ranks, where identical values share the same rank.
- Key Idea: Use a temporary data structure to store both element value and original index to preserve mapping during sorting.
- Steps for Solution:
- Create
(value, originalindex)pairs for each array element. - Sort these pairs based on their values.
- Iterate through the sorted pairs to assign ranks: increment rank only when a new distinct value is encountered.
- Store the assigned rank back into a result array at the
originalindex. - Update the original array with the calculated ranks.
- Use Cases: Competitive ranking, data normalization, sparse data representation, feature engineering.
- Prerequisites: C language basics, structs,
qsort, dynamic memory allocation.
Challenge Your Understanding!
- Modify the
replaceWithRanksfunction to assign "skip ranks". This means if there are three elements with rank 2, the next unique value would get rank 5 (1 + 3 + 1). For example,[10, 20, 10, 30]would become[2, 4, 2, 5](10 gets rank 2, 20 gets rank 4 because 10 appears twice, 30 gets rank 5). - What would happen if you skipped step 7 (freeing memory) in the
replaceWithRanksfunction? Why is it important in C? - Consider an array with negative numbers or zero. Will the provided solution work correctly? Why or why not?