Monday, December 27, 2010

Parallel graph search

Graph is a concept used to describe relationships between things where vertices (or nodes) represent things and edges represent relationships. Graph is made up of two finite sets (vertices and edges) and denoted by G = (V, E). Nodes usually are denoted by non-negative integers (for example, graph with 3 nodes with have nodes denoted by 0, 1, 2) and edges are pairs of nodes (s, t) where s is a source and t is a target node.

Searching for a simple path (path with no repeated vertices) that connects two nodes is one of the basic problems that are solved using graph search. We need to visit each node once (meaning visited nodes tracking is on) to find out is there any path between two given nodes. Time complexity for graph search is O(V).

Let’s parallelize the thing to speed it up.

Graph representation is irrelevant for our case so let’s assume implementation of the following public API is in place:

public class Graph
    // Initializes new graph with given number of nodes and 
    // set of edges
    public Graph(int nodes, IEnumerable<Tuple<int, int>> edges);
    // Gets number of nodes in the graph
    public int Nodes { get; }
    // Gets list of nodes adjacent to given one
    public IEnumerable<int> AdjacentTo(int node);

Starting with a source node for each node being processed search condition is checked and search is expanded to adjacent nodes that are not yet visited. Search continues until condition is met or no more reachable not visited nodes are left.

First part of the solution is to process nodes in parallel until no more is left or early termination is requested. The idea is to maintain two work queues for current and next phases. Items from current phase queue are processed in parallel and new work items are added to next phase queue. At the end of phase queues are switched. While this mechanism covers “while not empty” scenario, cooperative cancellation covers early search termination.

public static class ParallelAlgorithms
    public static void DoWhileNotEmpty<T>(IEnumerable<T> seed, Action<T, Action<T>> body, CancellationToken cancellationToken)
        // Maintain two work queues for current and next phases
        var curr = new ConcurrentQueue<T>(seed);
        var next = new ConcurrentQueue<T>();
        var parallelOptions = new ParallelOptions {CancellationToken = cancellationToken};
        // Until there is something to do
        while (!curr.IsEmpty)
            // Function to add work for the next phase
            Action<T> add = next.Enqueue;
            // Process current work queue in parallel while 
            // populating next one
            Parallel.ForEach(curr, parallelOptions, t => body(t, add));
            // Switch queues and empty next one
            curr = next;
            next = new ConcurrentQueue<T>();

Now that we have processing mechanism we need to bring visited nodes tracking mechanism. As part of our solution we must find a path that connects two given nodes if one exists. Thus instead of just remembering which nodes were visited we’ll keep track of parent nodes used to visit a node. Unvisited nodes are marked with –1. Once search is finished we can reconstruct path backwards between target and source by following parent links.

public static class ParallelGraph
    public static int[] Search(this Graph graph, int source, Func<Tuple<int, int>, bool> func)
        const int undef = -1;
        // Start with dummy loop edge
        var seed = Enumerable.Repeat(Tuple.Create(source, source), 1);
        // Mark all nodes as not yet visited
        var tree = Enumerable.Repeat(undef, graph.Nodes).ToArray();
        tree[source] = source;
        // Cancellation token source used to exit graph search 
        // once search condition is met.
        var cancellationTokenSource = new CancellationTokenSource();
            // Until there are reachable not visited nodes 
                .DoWhileNotEmpty(seed, (edge, add) =>
                var from = edge.Item2;

                // and search condition is not met
                if (!func(edge))

                // Try expand search to adjacent nodes
                foreach (var to in graph.AdjacentTo(from))
                    // Expand search to not yet visited nodes 
                    // (otherwise if node is expanded and 
                    // then checked during processing we may 
                    // end up with lots of already visited 
                    // nodes in memory which is a problem 
                    // for large dense graphs) marking nodes 
                    // to visit next
                    if (Interlocked.CompareExchange(ref tree[to], from, undef) != undef)
                    add(Tuple.Create(from, to));
            }, cancellationTokenSource.Token);
        catch (OperationCanceledException ex)
            // Check if exception originated from parallel 
            // processing cancellation request
            if (ex.CancellationToken != cancellationTokenSource.Token)
        // In the end we have spanning tree that connects
        // all the nodes reachable from source
        return tree;

    public static IEnumerable<int> FindPath(this Graph graph, int from, int to)
        // Search within graph until reached 'to'
        var tree = Search(graph, from, edge => edge.Item2 != to);
        // Check if there is a path that connects 'from' and 'to'
        if (tree[to] == -1)
            return Enumerable.Empty<int>();
        // Reconstruct path that connects 'from' and 'to'
        var path = new List<int>();
        while (tree[to] != to)
            to = tree[to];
        // Reverse path as it is reconstructed backwards
        return Enumerable.Reverse(path);