Topological Sort In Data Structure And Algorithm In C++
Topological sort is a fundamental algorithm used to order elements in a directed acyclic graph (DAG). It arranges nodes linearly such that for every directed edge from node A to node B, A comes before B in the ordering. In this article, you will learn about topological sort, its underlying principles, practical applications, and two primary C++ implementations.
Problem Statement
Many real-world systems involve tasks or items with dependencies. For example, to build a software project, certain modules must be compiled before others. Similarly, completing advanced courses often requires prerequisite knowledge. The problem is to find a linear ordering of these tasks or items such that all dependencies are respected. This ordering is precisely what topological sort provides, but it is only applicable to graphs that are directed and contain no cycles (DAGs). If a cycle exists, a valid topological ordering cannot be formed, as a cyclic dependency implies an unbreakable loop.
Example
Consider a simple project where tasks have dependencies: Task A must be completed before Task B. Task A must be completed before Task C. Task B must be completed before Task D. Task C must be completed before Task D.
A possible topological sort for these tasks would be: A, B, C, D (or A, C, B, D)
This sequence ensures all prerequisites are met before proceeding to subsequent tasks.
Background & Knowledge Prerequisites
To understand topological sort effectively, familiarity with the following concepts is beneficial:
- Graphs: Understanding of nodes (vertices) and edges.
- Directed Graphs: Edges have a specific direction (e.g., A -> B).
- Acyclic Graphs: Graphs that contain no cycles, meaning there's no path that starts and ends at the same node by traversing edges in their specified direction.
- Adjacency List: A common representation for graphs where each node stores a list of its directly connected neighbors.
- Basic C++: Knowledge of vectors, loops, functions, and recursion.
- Depth First Search (DFS) / Breadth First Search (BFS): Graph traversal algorithms that form the basis for topological sort implementations.
Use Cases
Topological sort is incredibly versatile and finds application in various domains:
- Project Scheduling: Determining the correct order of tasks in a project plan, respecting dependencies between tasks.
- Course Prerequisite Systems: Ordering university courses so that all prerequisites for a course are taken before the course itself.
- Compiler Instruction Scheduling: Optimizing the order of operations in compiled code to improve performance, while respecting data dependencies.
- Dependency Resolution in Software Builds: Tools like Make or Maven use topological sort to determine the order in which modules or files should be built.
- Data Serialization: Ordering objects for serialization where some objects depend on others.
Solution Approaches
There are two primary algorithms for computing a topological sort: Kahn's Algorithm (based on BFS) and the DFS-based algorithm. Both produce a valid topological ordering for a DAG.
1. Kahn's Algorithm (BFS-based)
This algorithm iteratively removes nodes that have no incoming edges, adding them to the topological order.
One-line summary: Identifies nodes with an in-degree of zero, adds them to a queue, and processes them by decrementing the in-degree of their neighbors until all nodes are processed.
Code Example:
// Kahn's Algorithm for Topological Sort
#include <iostream>
#include <vector>
#include <queue>
#include <map>
using namespace std;
// Function to perform Kahn's algorithm for topological sort
vector<int> topologicalSortKahn(int numNodes, const vector<vector<int>>& adj) {
// Step 1: Calculate in-degrees for all nodes
map<int, int> inDegree;
for (int i = 0; i < numNodes; ++i) {
inDegree[i] = 0; // Initialize all nodes with in-degree 0
}
for (int u = 0; u < numNodes; ++u) {
for (int v : adj[u]) {
inDegree[v]++; // Increment in-degree for each neighbor
}
}
// Step 2: Initialize queue with all nodes having in-degree 0
queue<int> q;
for (int i = 0; i < numNodes; ++i) {
if (inDegree[i] == 0) {
q.push(i);
}
}
// Step 3: Process nodes from the queue
vector<int> result;
while (!q.empty()) {
int u = q.front();
q.pop();
result.push_back(u); // Add current node to topological order
// For all neighbors of u, decrement their in-degree
for (int v : adj[u]) {
inDegree[v]--;
// If a neighbor's in-degree becomes 0, add it to the queue
if (inDegree[v] == 0) {
q.push(v);
}
}
}
// Step 4: Check for cycle (if result size is not equal to numNodes)
if (result.size() != numNodes) {
// This indicates a cycle in the graph, so topological sort is not possible.
// For this example, we'll return an empty vector to signify this.
return {};
}
return result;
}
int main() {
// Example Graph (0->1, 0->2, 1->3, 2->3)
// Nodes: 0, 1, 2, 3
int numNodes = 4;
vector<vector<int>> adj(numNodes);
adj[0].push_back(1);
adj[0].push_back(2);
adj[1].push_back(3);
adj[2].push_back(3);
cout << "Topological Sort using Kahn's Algorithm:" << endl;
vector<int> sortedOrder = topologicalSortKahn(numNodes, adj);
if (sortedOrder.empty() && numNodes > 0) { // Check if empty due to cycle or empty graph
cout << "Graph contains a cycle, topological sort not possible." << endl;
} else {
for (int node : sortedOrder) {
cout << node << " ";
}
cout << endl;
}
// Another Example Graph (5->0, 4->0, 5->2, 4->1, 2->3, 3->1)
// Nodes: 0, 1, 2, 3, 4, 5
numNodes = 6;
adj.assign(numNodes, vector<int>()); // Clear and resize
adj[5].push_back(0);
adj[5].push_back(2);
adj[4].push_back(0);
adj[4].push_back(1);
adj[2].push_back(3);
adj[3].push_back(1);
cout << "\\nTopological Sort using Kahn's Algorithm (Another Example):" << endl;
sortedOrder = topologicalSortKahn(numNodes, adj);
if (sortedOrder.empty() && numNodes > 0) {
cout << "Graph contains a cycle, topological sort not possible." << endl;
} else {
for (int node : sortedOrder) {
cout << node << " ";
}
cout << endl;
}
return 0;
}
Sample Output:
Topological Sort using Kahn's Algorithm:
0 1 2 3
Topological Sort using Kahn's Algorithm (Another Example):
4 5 0 2 3 1
Stepwise Explanation:
- Calculate In-Degrees: For every node, count the number of incoming edges. This is its "in-degree."
- Initialize Queue: Add all nodes with an in-degree of zero to a queue. These are the starting points in the topological order.
- Process Queue:
- Dequeue a node
u. Adduto the result list. - For each neighbor
vofu(i.e., for every edgeu -> v), decrementv's in-degree. - If
v's in-degree becomes zero, enqueuev.
- Cycle Detection: If, after the process, the size of the result list is less than the total number of nodes, it means there was a cycle in the graph, and a topological sort is not possible.
2. Depth First Search (DFS)-based Algorithm
This approach leverages the properties of DFS to produce a topological ordering.
One-line summary: Performs a DFS traversal, adding nodes to the topological order list only after all their dependent nodes (nodes reachable from them) have been visited and added.
Code Example:
// DFS-based Algorithm for Topological Sort
#include <iostream>
#include <vector>
#include <stack> // For storing the result in correct order
#include <map> // For visited and recursion stack
using namespace std;
// Recursive DFS helper function for topological sort
void dfsTopologicalSort(int u, const vector<vector<int>>& adj,
map<int, bool>& visited, map<int, bool>& recursionStack,
stack<int>& resultStack, bool& hasCycle) {
visited[u] = true;
recursionStack[u] = true; // Mark as part of current recursion path
for (int v : adj[u]) {
if (hasCycle) return; // If a cycle is detected, stop further processing
if (!visited[v]) {
dfsTopologicalSort(v, adj, visited, recursionStack, resultStack, hasCycle);
} else if (recursionStack[v]) {
// If 'v' is visited and is in the current recursion stack, it's a back edge -> cycle detected
hasCycle = true;
return;
}
}
recursionStack[u] = false; // Remove from current recursion path
resultStack.push(u); // Add node to the stack after all its dependencies are processed
}
// Function to perform DFS-based topological sort
vector<int> topologicalSortDFS(int numNodes, const vector<vector<int>>& adj) {
map<int, bool> visited;
map<int, bool> recursionStack; // To detect cycles
stack<int> resultStack; // Stores the topological order in reverse
bool hasCycle = false;
// Initialize visited and recursionStack for all nodes
for (int i = 0; i < numNodes; ++i) {
visited[i] = false;
recursionStack[i] = false;
}
// Call DFS for all unvisited nodes
for (int i = 0; i < numNodes; ++i) {
if (!visited[i]) {
dfsTopologicalSort(i, adj, visited, recursionStack, resultStack, hasCycle);
if (hasCycle) {
// Return empty vector to indicate a cycle
return {};
}
}
}
// Pop elements from the stack to get the correct topological order
vector<int> result;
while (!resultStack.empty()) {
result.push_back(resultStack.top());
resultStack.pop();
}
return result;
}
int main() {
// Example Graph (0->1, 0->2, 1->3, 2->3)
// Nodes: 0, 1, 2, 3
int numNodes = 4;
vector<vector<int>> adj(numNodes);
adj[0].push_back(1);
adj[0].push_back(2);
adj[1].push_back(3);
adj[2].push_back(3);
cout << "Topological Sort using DFS-based Algorithm:" << endl;
vector<int> sortedOrder = topologicalSortDFS(numNodes, adj);
if (sortedOrder.empty() && numNodes > 0) {
cout << "Graph contains a cycle, topological sort not possible." << endl;
} else {
for (int node : sortedOrder) {
cout << node << " ";
}
cout << endl;
}
// Another Example Graph (5->0, 4->0, 5->2, 4->1, 2->3, 3->1)
// Nodes: 0, 1, 2, 3, 4, 5
numNodes = 6;
adj.assign(numNodes, vector<int>()); // Clear and resize
adj[5].push_back(0);
adj[5].push_back(2);
adj[4].push_back(0);
adj[4].push_back(1);
adj[2].push_back(3);
adj[3].push_back(1);
cout << "\\nTopological Sort using DFS-based Algorithm (Another Example):" << endl;
sortedOrder = topologicalSortDFS(numNodes, adj);
if (sortedOrder.empty() && numNodes > 0) {
cout << "Graph contains a cycle, topological sort not possible." << endl;
} else {
for (int node : sortedOrder) {
cout << node << " ";
}
cout << endl;
}
// Example with a cycle: 0->1, 1->2, 2->0
numNodes = 3;
adj.assign(numNodes, vector<int>());
adj[0].push_back(1);
adj[1].push_back(2);
adj[2].push_back(0);
cout << "\\nTopological Sort using DFS-based Algorithm (Cycle Example):" << endl;
sortedOrder = topologicalSortDFS(numNodes, adj);
if (sortedOrder.empty() && numNodes > 0) {
cout << "Graph contains a cycle, topological sort not possible." << endl;
} else {
for (int node : sortedOrder) {
cout << node << " ";
}
cout << endl;
}
return 0;
}
Sample Output:
Topological Sort using DFS-based Algorithm:
0 1 2 3
Topological Sort using DFS-based Algorithm (Another Example):
5 4 2 3 0 1
Topological Sort using DFS-based Algorithm (Cycle Example):
Graph contains a cycle, topological sort not possible.
Stepwise Explanation:
- DFS Traversal: Perform a DFS traversal on the graph. Maintain two states for each node:
visited(whether the node has been visited in any DFS) andrecursionStack(whether the node is currently in the recursion stack of the *current* DFS path). - Order Generation: During the DFS, instead of processing a node when it's first visited, add it to a stack (or a list in reverse order) *after* all its neighbors (and their respective subgraphs) have been fully explored. This ensures that a node is added to the order only after all its dependencies have been accounted for.
- Cycle Detection: If, during the DFS, you encounter a node that is already in the
recursionStack(meaning it's an ancestor in the current DFS path), a back edge is found, indicating a cycle in the graph. In this case, a topological sort is not possible. - Final Order: Once the DFS for all components is complete, pop elements from the stack. The popped sequence will be the topological sort.
Conclusion
Topological sort is a crucial algorithm for ordering tasks and items based on their dependencies within a Directed Acyclic Graph. Kahn's Algorithm, based on BFS, offers an iterative solution by managing in-degrees, while the DFS-based approach uses recursion to ensure dependencies are met before a node is added to the final order. Both methods effectively solve the problem of linearizing dependencies, provided no cycles exist in the graph.
Summary
- Topological sort orders nodes in a Directed Acyclic Graph (DAG) such that A appears before B if there's an edge A → B.
- It's used for task scheduling, course prerequisites, and dependency resolution.
- Two main algorithms:
- Kahn's Algorithm (BFS-based): Uses in-degrees and a queue, iteratively processing nodes with no incoming edges.
- DFS-based Algorithm: Uses a recursive DFS traversal, adding nodes to a stack after all their descendants have been visited, then reversing the stack.
- Both algorithms detect cycles; if a cycle exists, a topological sort is not possible.