In part 1 of this series I looked at common search and sort algorithms used on lists. Part 2 focused on hash functions, sets and maps. Part 3 covered trees, depth and breadth first search and self-balancing trees. In this post I will look at graphs, their representations, traversal algorithms and Eulerian paths.
What is a Graph
A graph is a data structure designed to show relationships between objects. Sometimes called a network, a graph consists of a set of nodes (also called vertices) connected by edges. If you’ve read the previous post on trees, you’ll recognise that a tree is actually a specific type of graph.
Unlike trees, graphs can contain cycles, which means there’s no concept of a root node. You can start at one node, follow a series of edges and end up back where you started. Edges can also contain data, often representing the “strength” or “weight” of a relationship between two nodes.
Directions and Cycles
Edges in a graph can have directions, forming a directed graph. In a directed graph it’s possible to have two edges between the same pair of nodes pointing in opposite directions. An undirected graph has no sense of direction on its edges.
A cycle occurs when you can start at one node, follow a series of edges and arrive back at the starting node. Cycles can cause infinite loops during traversal, so many graph algorithms require the input graph to be acyclic. A directed acyclic graph (DAG) is a commonly encountered structure that combines directed edges with the guarantee of no cycles.
Connectivity
A disconnected graph contains vertices that can’t be reached from certain other vertices. In contrast, a connected graph has some path between every pair of vertices. Connectivity is a measure of the minimum number of elements that need to be removed for a graph to become disconnected, effectively measuring how “strong” the graph is.
For directed graphs there’s a further distinction. A weakly connected directed graph is only connected if all directed edges are replaced with undirected ones. A strongly connected directed graph has a path from every node to every other node in both directions.
Graph Representations
There are three common ways to represent a graph in code.
An edge list is simply a list of edges, where each edge is represented by the IDs of its two connected vertices. This gives you a 2D list structure that’s straightforward but not always efficient for lookups.
An adjacency list uses each vertex ID as an index into an array. Each position in the array contains a list of adjacent vertices. This is generally the most practical representation for sparse graphs.
An adjacency matrix is a 2D array where a value of 1 at position [i][j] indicates an edge between node i and node j, and 0 indicates no edge. The diagonal is usually 0 since nodes typically don’t connect to themselves. In an undirected graph each edge shows up twice, making the matrix symmetric.
Graph Traversal
Depth First Search
DFS follows one path as far as it will go before backtracking and trying the next option. Traversal and search are essentially the same operation, the only difference being that search stops early when the target is found.
DFS can be implemented with a stack or with recursion. Pick a starting node and mark it as seen. Follow an edge to an adjacent node, mark it as seen and repeat. If you encounter a node you’ve already seen, try a different edge. When there are no new edges to explore, pop from the stack and backtrack to the previous node.
The runtime complexity is $O(|E| + |V|)$, where $|E|$ is the number of edges and $|V|$ is the number of vertices. Each edge is actually traversed twice, once to explore and once to backtrack, but this is still linear in the number of edges.
Breadth First Search
BFS explores all adjacent nodes at the current depth before moving deeper into the graph. Start with a node, mark it as seen and visit all of its adjacent nodes, adding each to a queue. Once all edges from the starting node have been explored, dequeue the next node and repeat the process.
The runtime complexity is $O(|E| + |V|)$, the same as DFS. The difference is purely in the order nodes are visited.
Eulerian Paths
An Eulerian path is a path in a graph that visits every edge exactly once. An Eulerian circuit is an Eulerian path that starts and ends at the same vertex. These concepts are named after Leonhard Euler, who first studied them in 1736 when solving the famous Seven Bridges of Konigsberg problem.
For an undirected graph, an Eulerian circuit exists if and only if every vertex has an even degree (number of edges connected to it). An Eulerian path that isn’t a circuit exists if exactly two vertices have an odd degree. Those two vertices will be the start and end points of the path. If more than two vertices have an odd degree, no Eulerian path exists at all.
For a directed graph the conditions are slightly different. An Eulerian circuit exists if every vertex has equal in-degree and out-degree. An Eulerian path exists if at most one vertex has an out-degree exceeding its in-degree by 1 (the start vertex) and at most one vertex has an in-degree exceeding its out-degree by 1 (the end vertex), with all other vertices having equal in-degree and out-degree.