Tech Interview CheatSheet

In part 1 of this series I looked at common search and sort algorithms used on lists. Part 2 focused on hash functions, sets and maps. Part 3 covered trees, depth and breadth first search and self-balancing trees. In this post I will look at graphs, their representations, traversal algorithms and Eulerian paths.

What is a Graph

A graph is a data structure designed to show relationships between objects. Sometimes called a network, a graph consists of a set of nodes (also called vertices) connected by edges. If you’ve read the previous post on trees, you’ll recognise that a tree is actually a specific type of graph.

Unlike trees, graphs can contain cycles, which means there’s no concept of a root node. You can start at one node, follow a series of edges and end up back where you started. Edges can also contain data, often representing the “strength” or “weight” of a relationship between two nodes.

Directions and Cycles

Edges in a graph can have directions, forming a directed graph. In a directed graph it’s possible to have two edges between the same pair of nodes pointing in opposite directions. An undirected graph has no sense of direction on its edges.

A cycle occurs when you can start at one node, follow a series of edges and arrive back at the starting node. Cycles can cause infinite loops during traversal, so many graph algorithms require the input graph to be acyclic. A directed acyclic graph (DAG) is a commonly encountered structure that combines directed edges with the guarantee of no cycles.

Connectivity

A disconnected graph contains vertices that can’t be reached from certain other vertices. In contrast, a connected graph has some path between every pair of vertices. Connectivity is a measure of the minimum number of elements that need to be removed for a graph to become disconnected, effectively measuring how “strong” the graph is.

For directed graphs there’s a further distinction. A weakly connected directed graph is only connected if all directed edges are replaced with undirected ones. A strongly connected directed graph has a path from every node to every other node in both directions.

Graph Representations

There are three common ways to represent a graph in code.

An edge list is simply a list of edges, where each edge is represented by the IDs of its two connected vertices. This gives you a 2D list structure that’s straightforward but not always efficient for lookups.

An adjacency list uses each vertex ID as an index into an array. Each position in the array contains a list of adjacent vertices. This is generally the most practical representation for sparse graphs.

An adjacency matrix is a 2D array where a value of 1 at position [i][j] indicates an edge between node i and node j, and 0 indicates no edge. The diagonal is usually 0 since nodes typically don’t connect to themselves. In an undirected graph each edge shows up twice, making the matrix symmetric.

graph_representations.py

class Node(object):
    def __init__(self, value):
        self.value = value
        self.edges = []

class Edge(object):
    def __init__(self, value, node_from, node_to):
        self.value = value
        self.node_from = node_from
        self.node_to = node_to

class Graph(object):
    def __init__(self, nodes=[], edges=[]):
        self.nodes = nodes
        self.edges = edges

    def insert_node(self, new_node_val):
        new_node = Node(new_node_val)
        self.nodes.append(new_node)
        
    def insert_edge(self, new_edge_val, node_from_val, node_to_val):
        from_found = None
        to_found = None
        for node in self.nodes:
            if node_from_val == node.value:
                from_found = node
            if node_to_val == node.value:
                to_found = node
        if from_found == None:
            from_found = Node(node_from_val)
            self.nodes.append(from_found)
        if to_found == None:
            to_found = Node(node_to_val)
            self.nodes.append(to_found)
        new_edge = Edge(new_edge_val, from_found, to_found)
        from_found.edges.append(new_edge)
        to_found.edges.append(new_edge)
        self.edges.append(new_edge)

    def _get_array_size(self):
        return max([node.value for node in self.nodes]) + 1
        
    def get_edge_list(self):
        """Return a list of triples that looks like this:
        (Edge Value, From Node Value, To Node Value)"""
        return [(edge.value, edge.node_from.value, edge.node_to.value) for edge in self.edges]

    def get_adjacency_list(self):
        """The indecies of the outer list represent
        "from" nodes. Each section in the list 
        will store a list of tuples that looks like this:
        (To Node, Edge Value)"""
        adjacency_list = [None] * self._get_array_size() 
        for edge in self.edges:
            if not adjacency_list[edge.node_from.value]:
                adjacency_list[edge.node_from.value] = []
            adjacency_list[edge.node_from.value].append((edge.node_to.value, edge.value))
        return adjacency_list
    
    def get_adjacency_matrix(self):
        """Return a matrix, or 2D list.
        Row numbers represent from nodes,
        column numbers represent to nodes.
        Edge values is stored in each spot,
        and a 0 if no edge exists."""
        max_index = self._get_array_size()
        matrix = [[0 for x in range(max_index)] for y in range(max_index)]        
        for edge in self.edges:
            matrix[edge.node_from.value][edge.node_to.value] = edge.value
        return matrix

Graph Traversal

Depth First Search

DFS follows one path as far as it will go before backtracking and trying the next option. Traversal and search are essentially the same operation, the only difference being that search stops early when the target is found.

DFS can be implemented with a stack or with recursion. Pick a starting node and mark it as seen. Follow an edge to an adjacent node, mark it as seen and repeat. If you encounter a node you’ve already seen, try a different edge. When there are no new edges to explore, pop from the stack and backtrack to the previous node.

The runtime complexity is $O(|E| + |V|)$, where $|E|$ is the number of edges and $|V|$ is the number of vertices. Each edge is actually traversed twice, once to explore and once to backtrack, but this is still linear in the number of edges.

graph_dfs.py

 def dfs(self, start_node):
        """Recursive implementation
        of Depth First Search iterating through a node's edges.
        """
        ret_list = [start_node.value]
        start_node.visited = True
        edges_out = [e for e in start_node.edges
                     if e.node_to.value != start_node.value]
        for edge in edges_out:
            if not edge.node_to.visited:
                ret_list.extend(self.dfs_helper(edge.node_to))
        return ret_list

Breadth First Search

BFS explores all adjacent nodes at the current depth before moving deeper into the graph. Start with a node, mark it as seen and visit all of its adjacent nodes, adding each to a queue. Once all edges from the starting node have been explored, dequeue the next node and repeat the process.

The runtime complexity is $O(|E| + |V|)$, the same as DFS. The difference is purely in the order nodes are visited.

graph_bfs.py

 def bfs(self, start_node_num):
        """An iterative implementation of Breadth First Search
        iterating through a node's edges."""
        node = self.find_node(start_node_num)
        self._clear_visited()
        ret_list = []
        # Your code here
        queue = [node]
        node.visited = True
        def enqueue(n, q=queue):
            n.visited = True
            q.append(n)
        def unvisited_outgoing_edge(n, e):
            return ((e.node_from.value == n.value) and
                    (not e.node_to.visited))
        while queue:
            node = queue.pop(0)
            ret_list.append(node.value)
            for e in node.edges:
                if unvisited_outgoing_edge(node, e):
                    enqueue(e.node_to)
        return ret_list

Eulerian Paths

An Eulerian path is a path in a graph that visits every edge exactly once. An Eulerian circuit is an Eulerian path that starts and ends at the same vertex. These concepts are named after Leonhard Euler, who first studied them in 1736 when solving the famous Seven Bridges of Konigsberg problem.

For an undirected graph, an Eulerian circuit exists if and only if every vertex has an even degree (number of edges connected to it). An Eulerian path that isn’t a circuit exists if exactly two vertices have an odd degree. Those two vertices will be the start and end points of the path. If more than two vertices have an odd degree, no Eulerian path exists at all.

For a directed graph the conditions are slightly different. An Eulerian circuit exists if every vertex has equal in-degree and out-degree. An Eulerian path exists if at most one vertex has an out-degree exceeding its in-degree by 1 (the start vertex) and at most one vertex has an in-degree exceeding its out-degree by 1 (the end vertex), with all other vertices having equal in-degree and out-degree.

Tech Interview CheatSheet - Graphs