Saturday, August 21, 2010

Minimum window that contains all characters

Like most of life's problems, this one can be solved with bending

- Bender B.Rodrigues

Let’s bend another problem. Given set of characters P and string T find minimum window in T that contains all characters in P. Applicable solution is restricted to O(length(T)) time complexity. For example, given a string T “of characters and as” and set of characters T in a form of a string “aa s” the minimum window will be “and as”.

The problem can be broken into two parts:

  • How to select window?
  • How to check that selected window contains all characters from P?

Selecting every possible window (all unique pairs (i, j) where 0 <= i <= j < length(T)) will lead to solution worse than O(length(T)^2) because you still need to check if all characters from P are within selected window. Instead we will check every possible window ending position. Thus there are at least length(T) windows to consider.

Any feasible window has length equal to or greater than length(P). Performing recheck for any considered window will result in a solution no better than O(length(T)*length(P)). Instead we need to use check results from previous iteration.

Now we need to make sure that checking if a particular character is in P is done in an optimal way. Taking into account that a particular character may appear more than once and window thus must contain appropriate number of characters. We will use hash table to map unique characters from P to their count for fast lookup.

And now let’s tie all things together.

  • Until reached the end of T move by one current window ending position.
  • Append next character to the end of previous window which to this moment doesn’t contain all necessary characters. Char to count map is used to track the number of characters left to find. Basically if character is in P count is decremented. The number may become negative meaning that there are more than required characters.
  • If unmatched character count goes to zero the window contains all required characters. However there may be redundant characters. Thus we try to compact current window. It is ok to do this as we are looking for minimum window and any window that is extended from this one won’t be better.
  • Once window is compacted compare it with the minimum one and updated it if needed.
  • If current window contains all the characters remove from it the first one simply by moving by one starting position to make sure that at each iteration previous window doesn’t contain all the characters (there is no point in appending new characters to a window that already contains all of the required ones).

Code the thing! =)

static string FindMinWindow(string t, string p)
{
	// Create char to count mapping for fast lookup
	// as some characters may appear more than once
	var charToCount = new Dictionary<char, int>();
	foreach (var c in p)
	{
		if (!charToCount.ContainsKey(c))
			charToCount.Add(c, 0);
		charToCount[c]++;
	}

	var unmatchesCount = p.Length;
	int minWindowLength = t.Length + 1, minWindowStart = -1;
	int currWindowStart = 0, currWindowEnd = 0;
	for (; currWindowEnd < t.Length; currWindowEnd++)
	{
		var c = t[currWindowEnd];
		// Skip chars that are not in P
		if (!charToCount.ContainsKey(c))
			continue;
		// Reduce unmatched characters count
		charToCount[c]--;
		if (charToCount[c] >= 0)
			// But do this only while count is positive
			// as count may go negative which means 
			// that there are more than required characters
			unmatchesCount--;

		// No complete match, so continue searching
		if (unmatchesCount > 0)
			continue;

		// Decrease window as much as possible by removing 
		// either chars that are not in T or those that 
		// are in T but there are too many of them
		c = t[currWindowStart];
		var contains = charToCount.ContainsKey(c);
		while (!contains || charToCount[c] < 0)
		{
			if (contains)
				// Return character to P
				charToCount[c]++;

			c = t[++currWindowStart];
			contains = charToCount.ContainsKey(c);
		}

		if (minWindowLength > currWindowEnd - currWindowStart + 1)
		{
			minWindowLength = currWindowEnd - currWindowStart + 1;
			minWindowStart = currWindowStart;
		}

		// Remove last char from window - it is definitely in a 
		// window because we stopped at it during decrease phase
		charToCount[c]++;
		unmatchesCount++;
		currWindowStart++;
	}

	return minWindowStart > -1 ?
	       t.Substring(minWindowStart, minWindowLength) :
	       String.Empty;
}

Every character is examined at most twice (during appending to the end and during compaction) so the whole solution has O(length(T)) time complexity assuming hash table lookup is O(1) operation.

4 comments:

Gaurav Saxena said...

Nice solution. Thanks a lot!!

multicoredump said...

I stumbled your blog while searching for "merge BSTs in place" and ended up reading all #algorithm posts. Great job!
Thanks for posting :)

Dzmitry Huba said...

Thanks, I'm glad you enjoyed it. I will continue exploring interesting algorithms.

visu said...

"Selecting every possible window (all unique pairs (i, j) where 0 <= i <= j < length(T)) will lead to solution worse than O(length(T)^2) because you still need to check if all characters from P are within selected window."

To my understanding we need to choose i, j from 0 to length(T) with repetitions.
which evaluates to
( (length(T)+1) * length(T) ) / ( 2*1 )

Have you approximated it as O(length(T)^2)?

Please let me know how you have calculated it.