Wednesday, January 5, 2011

Parallel string matching

String matching is about searching for occurrences of a pattern in a given text – either the first occurrence or all of them, and the distinction matters from a parallelization point of view, as you shall see soon.

This problem naturally fits the data parallelism scenario, although the details depend on whether we want to find the first occurrence or all occurrences of a pattern. Searching for all occurrences in parallel is simpler because the search space is static (the whole text must be scanned). Searching for the first occurrence is trickier: it may be anywhere in the text, so the search space must be reduced as the search progresses; otherwise the solution degenerates into either a sequential scan up to the first occurrence or a parallel search for all occurrences from which we take the first one.

In order to find all pattern occurrences, the text can be separated into chunks that are processed in parallel. Each chunk overlaps its right neighbor (every chunk except the last one) by up to length(pattern) - 1 characters, to cover the case when an occurrence straddles a chunk boundary.

public static class ParallelAlgorithms
{
    // Find all occurrences of a pattern in a text
    public static IEnumerable<int> IndexesOf(this string text, string pattern, int startIndex, int count)
    {
        var proc = Environment.ProcessorCount;
        // Do range partitioning
        var chunk = (count + proc - 1) / proc;
        var indexes = new IEnumerable<int>[proc];

        Parallel.For(0, proc, 
            p => 
            {
                // Define overlapping with its immediate 
                // neighbor chunk except for the last one
                var before = p * chunk;
                var now = Math.Min(chunk + pattern.Length - 1, count - p * chunk);
                // Sequentially search for pattern occurrences and
                // materialize the result so the search actually runs
                // inside the parallel loop (the iterator below is lazy)
                indexes[p] = SequentialIndexesOf(text, pattern, startIndex + before, now).ToList();
            });
        return indexes.SelectMany(p => p);
    }

    static IEnumerable<int> SequentialIndexesOf(string text, string pattern, int startIndex, int count)
    {
        for (var i = 0; i <= count - pattern.Length;)
        {
            var found = text.IndexOf(pattern, startIndex + i, count - i);
            // No pattern occurrence found till the end
            if (found == -1)
                break;
            yield return found;
            // Proceed from the position next to the found one
            // (found is an absolute index, i is relative to startIndex)
            i = found - startIndex + 1;
        }
    }
}
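To see the role of the overlap, here is a small self-contained sketch (sequential for clarity; OverlapDemo and ChunkedIndexesOf are hypothetical names) that applies the same partitioning arithmetic as IndexesOf above. Occurrences that straddle a chunk boundary are still found because each chunk's search range extends length(pattern) - 1 characters into its neighbor.

```csharp
using System;
using System.Collections.Generic;

public class OverlapDemo
{
    // Same range partitioning and overlap arithmetic as IndexesOf above,
    // but sequential, to show why the overlap is needed
    public static IEnumerable<int> ChunkedIndexesOf(string text, string pattern, int chunkCount)
    {
        var chunk = (text.Length + chunkCount - 1) / chunkCount;
        for (var p = 0; p < chunkCount; p++)
        {
            var before = p * chunk;
            if (before >= text.Length)
                break;
            // Extend the search range by pattern.Length - 1 characters
            // so occurrences straddling the chunk boundary are found;
            // the last possible match start is before + chunk - 1, so
            // neighboring chunks never report the same occurrence twice
            var length = Math.Min(chunk + pattern.Length - 1, text.Length - before);
            for (var i = before; i <= before + length - pattern.Length; )
            {
                var found = text.IndexOf(pattern, i, before + length - i);
                if (found == -1)
                    break;
                yield return found;
                i = found + 1;
            }
        }
    }

    public static void Main()
    {
        // With chunks of size 2, the occurrences at 2 and 4 straddle
        // chunk boundaries and are found only thanks to the overlap
        Console.WriteLine(string.Join(",", ChunkedIndexesOf("abababa", "aba", 4))); // prints "0,2,4"
    }
}
```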

Now, for the first occurrence scenario, the search space must be reduced to minimize the amount of unnecessary work (which will be non-zero in most cases due to speculative processing).

One way to do this is to separate the text into chunks and process them in parallel such that if the pattern is found in chunk(i), processing of any chunk(j) where j > i is cancelled. Assuming n is the length of the text and p is the number of logical processors, in the worst case (when the first occurrence of the pattern is at the end of the first chunk) the amount of unnecessary work is (p - 1)*n/p. On the other hand, range partitioning performs poorly when the workload is unbalanced.

What we can do instead is dynamic partitioning: the whole text is separated into groups of chunks, where chunks within a group have the same length and the chunk length doubles from one group to the next. The number of chunks per group is set to the number of logical processors. Thus, if c denotes a chunk of unit size, the text is separated like c, c, c, …, cc, cc, cc, …, cccc, cccc, cccc, … This sequence of chunks can then be processed in parallel (respecting order), breaking at the first found occurrence. In the worst case, at most p*c unnecessary work is done, where c is the current chunk size.
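To make the doubling sequence concrete, here is a small self-contained sketch (ChunkDemo and Chunks are hypothetical names; the logic mirrors the PartitionRangeToChunks helper shown in the code below) that prints the chunk sizes produced for a sample range:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ChunkDemo
{
    // Emits (start, length) chunks: groups of 'doubleAfter' equally sized
    // chunks, doubling the chunk size after each group up to a maximum
    public static IEnumerable<Tuple<int, int>> Chunks(int start, int count, int minChunkSize, int maxChunkSize, int doubleAfter)
    {
        var end = start + count;
        var chunkSize = Math.Min(minChunkSize, count);
        while (start < end)
        {
            for (var i = 0; i < doubleAfter && start < end; i++)
            {
                var next = Math.Min(end, start + chunkSize);
                yield return Tuple.Create(start, next - start);
                start = next;
            }
            chunkSize = Math.Min(maxChunkSize, chunkSize * 2);
        }
    }

    public static void Main()
    {
        // 4 chunks per group over a range of 200, sizes from 10 doubling up to 40
        var sizes = Chunks(0, 200, 10, 40, 4).Select(c => c.Item2);
        Console.WriteLine(string.Join(" ", sizes)); // prints "10 10 10 10 20 20 20 20 40 40"
    }
}
```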

This partitioning strategy is used in PLINQ and is called chunk partitioning. Although a text can be treated as a sequence of characters and chunk partitioning used out of the box, that is not what we want, since we would then hand out individual characters rather than text chunks. Instead, we'll produce the sequence of chunks manually and, using a single item partitioner (SingleItemPartitioner comes from the ParallelExtensionsExtras samples, not the BCL) and Parallel.ForEach, process them in parallel, respecting order.

public static class ParallelAlgorithms
{
    // Find first occurrence of a pattern in a text
    public static int IndexOf(this string text, string pattern, int startIndex, int count)
    {
        var minChunkSize = pattern.Length << 5;
        var maxChunkSize = minChunkSize << 3;
        // Create sequence of chunks
        var chunks = PartitionRangeToChunks(startIndex, count, minChunkSize, maxChunkSize, Environment.ProcessorCount);
        // Process chunks in parallel respecting order
        var chunkPartitioner = SingleItemPartitioner.Create(chunks);
        var concurrentBag = new ConcurrentBag<int>();
        Parallel.ForEach(chunkPartitioner,
            (range, loop) =>
            {
                var start = range.Item1;
                var length = Math.Min(startIndex + count - start, range.Item2 + pattern.Length - 1);
                var index = text.IndexOf(pattern, start, length);
                // No pattern occurrence in this chunk
                if (index < 0)
                    return;
                // Store shared result
                concurrentBag.Add(index);
                // Let running parallel iterations complete and 
                // prevent starting new ones
                loop.Break();
            });
        // Pick first occurrence or indicate no occurrence
        return concurrentBag.Count > 0 ? concurrentBag.Min() : -1;
    }

    static IEnumerable<Tuple<int, int>> PartitionRangeToChunks(int start, int count, int minChunkSize, int maxChunkSize, int doubleAfter)
    {
        var end = start + count;
        var chunkSize = Math.Min(minChunkSize, count);
        while (start < end)
        {
            for (var i = 0; i < doubleAfter && start < end; i++)
            {
                var next = Math.Min(end, start + chunkSize);
                yield return Tuple.Create(start, next - start);
                start = next;
            }

            chunkSize = Math.Min(maxChunkSize, chunkSize * 2);
        }
    }
}
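For a rough self-contained variant, Partitioner.Create with EnumerablePartitionerOptions.NoBuffering (available since .NET 4.5) can stand in for SingleItemPartitioner, handing out one chunk at a time; FirstIndexDemo and ParallelIndexOf below are hypothetical names, and the rest mirrors the code above. Note also that the IndexOf extension above has to be invoked explicitly as ParallelAlgorithms.IndexOf(...), because the instance method String.IndexOf(string, int, int) takes precedence over extension methods.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FirstIndexDemo
{
    // Self-contained sketch of the first-occurrence search; NoBuffering
    // makes the partitioner hand out chunks one at a time, like
    // SingleItemPartitioner from ParallelExtensionsExtras
    public static int ParallelIndexOf(string text, string pattern, int startIndex, int count)
    {
        var minChunkSize = pattern.Length << 5;
        var maxChunkSize = minChunkSize << 3;
        var chunks = PartitionRangeToChunks(startIndex, count, minChunkSize, maxChunkSize, Environment.ProcessorCount);
        var partitioner = Partitioner.Create(chunks, EnumerablePartitionerOptions.NoBuffering);
        var found = new ConcurrentBag<int>();
        Parallel.ForEach(partitioner, (range, loop) =>
        {
            var start = range.Item1;
            var length = Math.Min(startIndex + count - start, range.Item2 + pattern.Length - 1);
            var index = text.IndexOf(pattern, start, length);
            if (index < 0)
                return;
            found.Add(index);
            // Let lower-indexed iterations finish, stop higher-indexed ones
            loop.Break();
        });
        return found.Count > 0 ? found.Min() : -1;
    }

    static IEnumerable<Tuple<int, int>> PartitionRangeToChunks(int start, int count, int minChunkSize, int maxChunkSize, int doubleAfter)
    {
        var end = start + count;
        var chunkSize = Math.Min(minChunkSize, count);
        while (start < end)
        {
            for (var i = 0; i < doubleAfter && start < end; i++)
            {
                var next = Math.Min(end, start + chunkSize);
                yield return Tuple.Create(start, next - start);
                start = next;
            }
            chunkSize = Math.Min(maxChunkSize, chunkSize * 2);
        }
    }

    public static void Main()
    {
        // The single occurrence of "needle" starts at index 10000
        var text = string.Concat(Enumerable.Repeat("ab", 5000)) + "needle";
        Console.WriteLine(ParallelIndexOf(text, "needle", 0, text.Length)); // prints "10000"
    }
}
```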

Search in parallel! =)