Topological Sort In Data Structure And Algorithm
In a world where tasks often depend on others, understanding the order in which to complete them is crucial. Topological sorting offers a systematic way to arrange these interdependent tasks, ensuring all prerequisites are met before proceeding. In this article, you will learn how topological sort works, its core algorithms, and practical applications.
Problem Statement
Many real-world scenarios involve a sequence of activities where some tasks must finish before others can begin. Examples include course prerequisites, project dependencies, or compiler instruction scheduling. The problem is to find a linear ordering of the vertices (tasks) in a directed acyclic graph (DAG) such that for every directed edge from vertex u to vertex v, u comes before v in the ordering. If the graph contains a cycle, a topological sort is impossible, as it implies an infinite loop of dependencies.
Example
Consider a set of courses and their prerequisites:
- Course A has no prerequisites.
- Course B has no prerequisites.
- Course C requires Course A and Course B.
- Course D requires Course B.
- Course E requires Course C and Course D.
- Course F requires Course E.
A valid topological sort, representing one possible order to take these courses, would be:
A -> B -> D -> C -> E -> F
Another valid sort could be:
B -> A -> C -> D -> E -> F
Background & Knowledge Prerequisites
To understand topological sort, readers should be familiar with:
- Graph Theory Basics: Understanding of vertices (nodes) and edges.
- Directed Graphs: Graphs where edges have a direction (e.g., A -> B means A leads to B, but not necessarily B leads to A).
- Acyclic Graphs: Graphs that contain no cycles. A Directed Acyclic Graph (DAG) is essential for topological sorting.
- Adjacency List/Matrix: Common ways to represent graphs.
- Depth-First Search (DFS) or Breadth-First Search (BFS): These graph traversal algorithms form the basis for topological sort implementations.
Use Cases or Case Studies
Topological sort is a fundamental algorithm with diverse applications:
- Task Scheduling: Ordering jobs or tasks in a project where some tasks depend on the completion of others. For instance, in software build systems like Makefiles, it determines the order to compile files.
- Course Scheduling: Determining a valid sequence of courses for a student, respecting prerequisite requirements.
- Dependency Resolution: In package management systems (e.g.,
apt,npm), it helps install packages in the correct order based on their dependencies. - Compiler Optimization: Ordering intermediate code instructions to ensure data dependencies are met.
- Data Serialization: Ordering objects for serialization where some objects contain references to others that must be serialized first.
Solution Approaches
There are two primary algorithms for topological sorting: one based on Depth-First Search (DFS) and another based on computing in-degrees (Kahn's Algorithm, which uses a BFS-like approach).
Approach 1: DFS-based Algorithm
This approach uses a modified DFS traversal. It visits each node and recursively visits all its dependent nodes. Once all dependencies of a node have been processed, the node is added to the topological order (typically prepended to a list or pushed onto a stack).
- Summary: Perform a DFS traversal. When a node's DFS call completes (meaning all its adjacent nodes have been visited and processed), add the node to the front of a list or push it onto a stack. This effectively processes nodes after all their dependencies.
- Code Example:
// DFS-based Topological Sort
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX_VERTICES 100
// Adjacency list representation
typedef struct Node {
int vertex;
struct Node* next;
} Node;
Node* adj[MAX_VERTICES];
int num_vertices;
bool visited[MAX_VERTICES];
int stack[MAX_VERTICES];
int stack_idx = 0;
void add_edge(int u, int v) {
Node* newNode = (Node*)malloc(sizeof(Node));
newNode->vertex = v;
newNode->next = adj[u];
adj[u] = newNode;
}
void init_graph(int vertices) {
num_vertices = vertices;
for (int i = 0; i < num_vertices; i++) {
adj[i] = NULL;
visited[i] = false;
}
stack_idx = 0;
}
void dfs_topological_sort(int u) {
visited[u] = true;
Node* current = adj[u];
while (current != NULL) {
int v = current->vertex;
if (!visited[v]) {
dfs_topological_sort(v);
}
current = current->next;
}
// Push current vertex to stack after all its dependencies are visited
stack[stack_idx++] = u;
}
void topological_sort() {
for (int i = 0; i < num_vertices; i++) {
if (!visited[i]) {
dfs_topological_sort(i);
}
}
printf("DFS-based Topological Sort Order: ");
for (int i = stack_idx - 1; i >= 0; i--) { // Print in reverse order to get correct topological order
printf("%d ", stack[i]);
}
printf("\\n");
}
int main() {
// Step 1: Initialize graph with 6 vertices (0-5)
init_graph(6);
// Step 2: Add edges for the example: A->C, B->C, B->D, C->E, D->E, E->F
// Mapping: A=0, B=1, C=2, D=3, E=4, F=5
add_edge(0, 2); // A -> C
add_edge(1, 2); // B -> C
add_edge(1, 3); // B -> D
add_edge(2, 4); // C -> E
add_edge(3, 4); // D -> E
add_edge(4, 5); // E -> F
// Step 3: Perform topological sort
topological_sort();
// Clean up allocated memory (optional for simple examples, good practice for larger programs)
for (int i = 0; i < num_vertices; i++) {
Node* current = adj[i];
while (current != NULL) {
Node* temp = current;
current = current->next;
free(temp);
}
}
return 0;
}
- Sample Output:
DFS-based Topological Sort Order: 1 0 3 2 4 5
- Stepwise Explanation:
- Initialization: Create an adjacency list for the graph, a
visitedarray (initialized to false), and an emptystackto store the topological order. - Iterate Vertices: Loop through all vertices from 0 to
num_vertices - 1. If a vertexuhas not been visited, start a DFS fromu. - DFS Traversal:
- Mark the current vertex
uas visited. - Recursively call
dfs_topological_sortfor all unvisited neighborsvofu. - After all descendants of
uhave been visited and processed (i.e., the recursive calls return), pushuonto thestack.
- Result: The elements popped from the
stack(or printed in reverse order of pushing) will give one valid topological sort.
Approach 2: Kahn's Algorithm (BFS-based)
Kahn's algorithm uses the concept of "in-degree," which is the number of incoming edges to a vertex. It iteratively removes vertices with an in-degree of zero.
- Summary: Calculate the in-degree for all vertices. Initialize a queue with all vertices that have an in-degree of zero. While the queue is not empty, dequeue a vertex, add it to the topological order, and for each of its neighbors, decrement their in-degree. If a neighbor's in-degree becomes zero, enqueue it.
- Code Example:
// Kahn's Algorithm (BFS-based) Topological Sort
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX_VERTICES 100
// Adjacency list representation
typedef struct Node {
int vertex;
struct Node* next;
} Node;
Node* adj[MAX_VERTICES];
int num_vertices_kahn;
int in_degree[MAX_VERTICES];
int kahn_queue[MAX_VERTICES];
int front = 0, rear = 0;
void add_edge_kahn(int u, int v) {
Node* newNode = (Node*)malloc(sizeof(Node));
newNode->vertex = v;
newNode->next = adj[u];
adj[u] = newNode;
in_degree[v]++; // Increment in-degree for the destination vertex
}
void init_graph_kahn(int vertices) {
num_vertices_kahn = vertices;
for (int i = 0; i < num_vertices_kahn; i++) {
adj[i] = NULL;
in_degree[i] = 0;
}
front = 0;
rear = 0;
}
void topological_sort_kahn() {
int count = 0;
int result[MAX_VERTICES];
// Step 1: Initialize queue with all vertices having in-degree 0
for (int i = 0; i < num_vertices_kahn; i++) {
if (in_degree[i] == 0) {
kahn_queue[rear++] = i;
}
}
// Step 2: Process nodes in the queue
while (front < rear) {
int u = kahn_queue[front++];
result[count++] = u;
// For each neighbor v of u
Node* current = adj[u];
while (current != NULL) {
int v = current->vertex;
in_degree[v]--; // Decrement in-degree of neighbor
// If in-degree becomes 0, add to queue
if (in_degree[v] == 0) {
kahn_queue[rear++] = v;
}
current = current->next;
}
}
// Step 3: Check for cycles
if (count != num_vertices_kahn) {
printf("Kahn's Algorithm: Graph contains a cycle, topological sort impossible.\\n");
} else {
printf("Kahn's Algorithm Topological Sort Order: ");
for (int i = 0; i < count; i++) {
printf("%d ", result[i]);
}
printf("\\n");
}
}
int main() {
// Step 1: Initialize graph with 6 vertices (0-5)
init_graph_kahn(6);
// Step 2: Add edges for the example: A->C, B->C, B->D, C->E, D->E, E->F
// Mapping: A=0, B=1, C=2, D=3, E=4, F=5
add_edge_kahn(0, 2); // A -> C
add_edge_kahn(1, 2); // B -> C
add_edge_kahn(1, 3); // B -> D
add_edge_kahn(2, 4); // C -> E
add_edge_kahn(3, 4); // D -> E
add_edge_kahn(4, 5); // E -> F
// Step 3: Perform topological sort
topological_sort_kahn();
// Clean up allocated memory
for (int i = 0; i < num_vertices_kahn; i++) {
Node* current = adj[i];
while (current != NULL) {
Node* temp = current;
current = current->next;
free(temp);
}
}
return 0;
}
- Sample Output:
Kahn's Algorithm Topological Sort Order: 0 1 3 2 4 5
- Stepwise Explanation:
- Calculate In-Degrees: For every vertex, calculate its in-degree by counting incoming edges.
- Initialize Queue: Create a queue and add all vertices with an in-degree of 0 to it. These are the starting points as they have no prerequisites.
- Process Queue:
- While the queue is not empty, dequeue a vertex
u. - Add
uto the topological order list. - For each neighbor
vofu(i.e., for every edgeu -> v): - Decrement the in-degree of
v. - If the in-degree of
vbecomes 0, enqueuev.
- Cycle Detection: If, at the end, the number of vertices in the topological order is less than the total number of vertices in the graph, it indicates that a cycle exists, and a complete topological sort is not possible.
Conclusion
Topological sort is a powerful algorithm for ordering tasks or events in directed acyclic graphs. Both the DFS-based approach and Kahn's algorithm (BFS-based) provide efficient ways to achieve this. Understanding these methods is key to solving a variety of dependency-driven problems in computing and beyond.
Summary
- Purpose: Linearly order vertices in a Directed Acyclic Graph (DAG) such that for every edge
u -> v,uappears beforev. - Requirement: Only applicable to DAGs; if a cycle exists, topological sort is impossible.
- DFS-based Approach:
- Uses a modified Depth-First Search.
- Adds vertices to a stack *after* all their descendants have been visited.
- The final order is obtained by popping elements from the stack.
- Kahn's Algorithm (BFS-based):
- Relies on computing and updating in-degrees of vertices.
- Starts with vertices having an in-degree of zero, adds them to a queue.
- Processes nodes, decrements neighbors' in-degrees, and adds new zero in-degree nodes to the queue.
- Applications: Task scheduling, dependency resolution, course prerequisites, compiler optimization.