C Program For Merge Sort With Time Complexity
Efficiently sorting data is fundamental in computer science, impacting everything from database performance to search algorithms. As datasets grow, the need for robust and scalable sorting methods becomes critical.
In this article, you will learn how to implement Merge Sort in C, understand its underlying mechanism, and analyze its time complexity.
Problem Statement
Sorting an array of elements means arranging them in a specific order, typically ascending or descending. While various sorting algorithms exist, many struggle with performance on large datasets or in specific scenarios. The challenge lies in finding a stable, efficient sorting algorithm that performs consistently well, even for large inputs, and handles different data distributions effectively.
Example
Consider an unsorted array of integers: {12, 11, 13, 5, 6, 7}.
After applying Merge Sort, the array will be sorted in ascending order: {5, 6, 7, 11, 12, 13}.
Background & Knowledge Prerequisites
To understand Merge Sort, readers should be familiar with:
- C Language Basics: Variables, arrays, loops, functions.
- Recursion: Understanding how functions call themselves.
- Pointers: Basic knowledge of pointer usage for array manipulation.
- Divide and Conquer Paradigm: The core concept behind Merge Sort, involving breaking down a problem into smaller subproblems, solving them, and combining their solutions.
Use Cases or Case Studies
Merge Sort is a versatile algorithm used in various applications:
- External Sorting: When data to be sorted is too large to fit into memory, Merge Sort can efficiently sort chunks of data on disk and then merge them.
- Parallel Sorting: Its divide-and-conquer nature makes it well-suited for parallel processing environments, as subproblems can be sorted independently.
- Inversion Count Problem: Merge Sort can be modified to count the number of inversions in an array in O(n log n) time.
- Stable Sorting: It's a stable sorting algorithm, meaning it preserves the relative order of equal elements. This is crucial in applications where additional properties of elements matter.
- Guaranteed Performance: Unlike some other sorting algorithms (e.g., Quick Sort in its worst case), Merge Sort consistently offers O(n log n) performance, making it reliable for critical systems.
Solution Approaches
Merge Sort operates on the "divide and conquer" principle:
- Divide: Split the unsorted list into n sublists, each containing one element (a list of one element is considered sorted).
- Conquer (Sort): Recursively divide the list into two halves until individual elements are reached.
- Combine (Merge): Repeatedly merge sublists to produce new sorted sublists until there is only one sorted list remaining.
The core of Merge Sort lies in its merge function, which efficiently combines two already sorted subarrays into a single sorted array.
Approach 1: Recursive Merge Sort Implementation
This approach demonstrates the standard recursive implementation of Merge Sort using a helper function to merge sorted subarrays.
One-line summary: Recursively divides the array into halves, sorts each half, and then merges the sorted halves back together.
// Merge Sort Implementation
#include <stdio.h>
#include <stdlib.h> // For malloc and free
// Function to merge two sorted subarrays
// arr[left...mid] and arr[mid+1...right]
void merge(int arr[], int left, int mid, int right) {
int i, j, k;
int n1 = mid - left + 1;
int n2 = right - mid;
// Create temporary arrays
int *L = (int *)malloc(n1 * sizeof(int));
int *R = (int *)malloc(n2 * sizeof(int));
// Copy data to temp arrays L[] and R[]
for (i = 0; i < n1; i++)
L[i] = arr[left + i];
for (j = 0; j < n2; j++)
R[j] = arr[mid + 1 + j];
// Merge the temp arrays back into arr[left...right]
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = left; // Initial index of merged subarray
while (i < n1 && j < n2) {
if (L[i] <= R[j]) {
arr[k] = L[i];
i++;
} else {
arr[k] = R[j];
j++;
}
k++;
}
// Copy the remaining elements of L[], if any
while (i < n1) {
arr[k] = L[i];
i++;
k++;
}
// Copy the remaining elements of R[], if any
while (j < n2) {
arr[k] = R[j];
j++;
k++;
}
// Free the dynamically allocated memory
free(L);
free(R);
}
// Main function that implements Merge Sort
// left is for left index and right is right index of the sub-array of arr to be sorted
void mergeSort(int arr[], int left, int right) {
if (left < right) {
// Same as (left + right) / 2, but avoids overflow for large left and right
int mid = left + (right - left) / 2;
// Sort first and second halves
mergeSort(arr, left, mid);
mergeSort(arr, mid + 1, right);
merge(arr, left, mid, right);
}
}
// Utility function to print an array
void printArray(int arr[], int size) {
for (int i = 0; i < size; i++)
printf("%d ", arr[i]);
printf("\\n");
}
int main() {
int arr[] = {12, 11, 13, 5, 6, 7};
int arr_size = sizeof(arr) / sizeof(arr[0]);
printf("Given array is \\n");
printArray(arr, arr_size);
mergeSort(arr, 0, arr_size - 1);
printf("Sorted array is \\n");
printArray(arr, arr_size);
return 0;
}
Sample Output:
Given array is
12 11 13 5 6 7
Sorted array is
5 6 7 11 12 13
Stepwise explanation for clarity:
mergeSort(arr, left, right)Function:
- This is the recursive function that implements the "divide" part.
- Base Case: If
leftis greater than or equal toright, it means the subarray has 0 or 1 element, which is inherently sorted. The recursion stops. - Divide: Calculates the
midpoint of the current subarray. - Conquer: Recursively calls
mergeSortfor the left half (arr[left...mid]) and the right half (arr[mid+1...right]). This continues until single-element subarrays are reached. - Combine: Once the recursive calls return (meaning the subarrays are sorted), it calls the
mergefunction to combine the two sorted halves into a single sorted subarray.
merge(arr, left, mid, right)Function:
- This function is the heart of Merge Sort, responsible for combining two sorted subarrays.
- It takes
arr, and the start (left), middle (mid), and end (right) indices defining the two sorted halvesarr[left...mid]andarr[mid+1...right]. - Create Temp Arrays: Two temporary arrays,
LandR, are created to hold the elements of the left and right halves, respectively. Dynamic memory allocation (malloc) is used to handle arrays of varying sizes. - Copy Data: Elements from the original array's specified ranges are copied into
LandR. - Merge:
- Three pointers
i,j, andkare initialized.iforL,jforR, andkfor the originalarr(starting atleft). - Elements from
LandRare compared. The smaller element is placed intoarr[k], and its respective pointer (iorj) andkare incremented. - This continues until one of the temporary arrays is exhausted.
- Copy Remaining Elements: Any remaining elements in
LorR(if one array finished earlier) are copied directly into the mainarr. - Free Memory: The dynamically allocated memory for
LandRis freed to prevent memory leaks.
Time Complexity Analysis
The time complexity of Merge Sort can be analyzed using the recurrence relation:
T(n) = 2T(n/2) + O(n)
Where:
-
T(n)is the time to sort an array of sizen. -
2T(n/2)represents the time to recursively sort the two halves of the array. -
O(n)represents the time taken by themergeoperation, which needs to iterate through allnelements to combine the two sorted halves.
Using the Master Theorem or by tracing the recursion tree, we can determine the time complexity:
- Best Case: O(n log n)
- Average Case: O(n log n)
- Worst Case: O(n log n)
This consistent O(n log n) performance across all cases is one of Merge Sort's most significant advantages, as it guarantees efficient sorting regardless of the initial arrangement of elements. The log n factor arises from the number of times the array is divided until single elements are reached (the depth of the recursion tree), and the n factor comes from the work done at each level of the recursion (the merging process).
Conclusion
Merge Sort is a powerful and reliable sorting algorithm, characterized by its consistent O(n log n) time complexity in all cases. Its "divide and conquer" strategy, coupled with an efficient merging process, makes it an excellent choice for large datasets, external sorting, and applications requiring stable sorting. While it uses additional space for temporary arrays during merging, its predictable performance often outweighs this drawback.
Summary
- Merge Sort is a divide and conquer algorithm.
- It consistently has a time complexity of O(n log n) in best, average, and worst cases.
- The algorithm involves two main steps: dividing the array into halves and merging sorted halves.
- It is a stable sorting algorithm, preserving the relative order of equal elements.
- Requires O(n) auxiliary space for temporary arrays during the merge operation.
- Suitable for large datasets and external sorting scenarios.