Thursday, November 5, 2009

Disposing sequence of resources

The C# “using” statement has several advantages over its expanded equivalent:

  • The shorthand is more readable
  • If the local variable form of resource acquisition is used, the variable is read-only inside the “using” statement, which prevents you from accidentally breaking resource disposal
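
The second point can be illustrated with a small sketch. TrackedResource and ReadOnlyUsingDemo are hypothetical names introduced only for this example; the idea is that the hand-written try/finally form allows the variable to be reassigned (so the original resource leaks), while the “using” form rejects the assignment at compile time.

```csharp
using System;

// Hypothetical resource that records whether it was disposed.
class TrackedResource : IDisposable
{
    public readonly string Name;
    public bool IsDisposed { get; private set; }

    public TrackedResource(string name) { Name = name; }

    public void Dispose() { IsDisposed = true; }
}

static class ReadOnlyUsingDemo
{
    // Hand-written expansion: nothing stops the variable from being
    // reassigned, so finally disposes the second object and the first leaks.
    public static TrackedResource LeakWithTryFinally()
    {
        var first = new TrackedResource("first");
        TrackedResource resource = first;
        try
        {
            resource = new TrackedResource("second"); // compiles fine
        }
        finally
        {
            if (resource != null) resource.Dispose();
        }
        return first;
    }

    // Same shape with "using": the variable is read-only inside the block.
    public static TrackedResource NoLeakWithUsing()
    {
        var first = new TrackedResource("first");
        using (var resource = first)
        {
            // resource = new TrackedResource("second"); // compile error CS1656
        }
        return first;
    }

    static void Main()
    {
        Console.WriteLine(LeakWithTryFinally().IsDisposed); // False
        Console.WriteLine(NoLeakWithUsing().IsDisposed);    // True
    }
}
```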

Whenever you need to obtain several resources (the number is known at compile time), use them and then dispose of them, the “using” statement is usually the choice:

using(var aStream = File.Open("a.txt", FileMode.Open))
{
    using(var bStream = File.Open("b.txt", FileMode.Open))
    {
        // Use both streams
    }
}

However, this is not always possible. The number of resources to obtain may not be known at compile time. For example, a basic external merge sort splits a large file into chunks (their number depends on the original file size and available memory), each of which is sorted in memory and then written to disk. The sorted chunks are then merged iteratively until a single chunk is left (which is the sorted original file). During a merge iteration several files must be opened (their number is not known in advance), processed and then disposed. As we cannot use the “using” statement directly, it might look like this:

IEnumerable<string> files = ...; // Initialized elsewhere

var streams = new List<Stream>();
try
{
    // We may fail halfway through opening the files
    // (for example, because a file doesn't exist), so
    // the streams opened so far must be remembered
    foreach (var file in files)
    {
        streams.Add(File.Open(file, FileMode.Open));
    }

    // Use streams 
}
finally
{
    // Dispose opened streams
    foreach (var stream in streams)
    {
        stream.Dispose();
    }
}

Unfortunately, we have lost all the advantages of the “using” statement: the code looks messy, and the collection of opened streams or its contents can be modified before the “finally” block runs. It would be nice to have something like this:

using (var streams = ???)
{
    // streams must be IEnumerable<Stream>
}

For reference types, the expansion of the “using” statement looks like this (for struct types, the resource is disposed differently):

using (ResourceType resource = expression) statement 

// is expanded to

{
    ResourceType resource = expression;
    try
    {
        statement;
    }
    finally
    {
        if (resource != null) ((IDisposable)resource).Dispose();
    }
}
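
To see what this expansion implies in practice, here is a minimal sketch. Resource, FailToAcquire and UsingExpansionDemo are hypothetical names made up for the illustration. It logs what happens when the acquisition expression throws and when the body throws.

```csharp
using System;
using System.Collections.Generic;

static class UsingExpansionDemo
{
    // Hypothetical resource that logs its acquisition and disposal.
    class Resource : IDisposable
    {
        private readonly List<string> m_log;
        public Resource(List<string> log) { m_log = log; m_log.Add("acquired"); }
        public void Dispose() { m_log.Add("disposed"); }
    }

    static Resource FailToAcquire()
    {
        throw new InvalidOperationException("acquisition failed");
    }

    // The acquisition expression throws: the try block is never entered,
    // so there is nothing to dispose.
    public static List<string> ThrowDuringAcquisition()
    {
        var log = new List<string>();
        try
        {
            using (var r = FailToAcquire()) { log.Add("body"); }
        }
        catch (InvalidOperationException) { log.Add("caught"); }
        return log;
    }

    // The body throws: the finally block runs and disposes the resource
    // before the exception reaches the catch block.
    public static List<string> ThrowInsideBody()
    {
        var log = new List<string>();
        try
        {
            using (var r = new Resource(log))
            {
                throw new InvalidOperationException("body failed");
            }
        }
        catch (InvalidOperationException) { log.Add("caught"); }
        return log;
    }

    static void Main()
    {
        // prints: caught
        Console.WriteLine(string.Join(", ", ThrowDuringAcquisition().ToArray()));
        // prints: acquired, disposed, caught
        Console.WriteLine(string.Join(", ", ThrowInsideBody().ToArray()));
    }
}
```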

If an exception is thrown while the expression is evaluated, the resource won’t be disposed (as there is nothing to dispose). Exceptions thrown inside the statement, however, are fine: the “finally” block still disposes the resource. So we need to define how file names are converted into streams without throwing any exceptions while the expression is evaluated. Lazy evaluation comes in handy.

// The projected sequence is not evaluated until it is enumerated,
// so file-related exceptions (if any) are also postponed
files.Select(file => File.Open(file, FileMode.Open))

Still, we cannot use it inside a “using” statement as it is not IDisposable. So what we want is a disposable sequence that takes care of disposing its elements (which are required to be IDisposable).

interface IDisposableSequence<T> : IEnumerable<T>, IDisposable
    where T:IDisposable
{ }

A sequence of disposable elements can be wrapped with an extension method:

static class Disposable
{
    // Defined as an extension method on the minimal interface required
    public static IDisposableSequence<T> AsDisposable<T>(this IEnumerable<T> seq)
        where T:IDisposable
    {
        return new DisposableSequence<T>(seq);
    }
}

class DisposableSequence<T> : IDisposableSequence<T>
    where T:IDisposable
{
    public DisposableSequence(IEnumerable<T> sequence)
    {
       ... // an implementation goes here
    }
    
    ... // Other members elided for now
}

We are close, but there is a subtle issue. Obtaining a resource is a side effect. Enumerating the projected sequence multiple times repeats the projection, and thus its side effects, which of course must be avoided. In this particular case, enumerating the same element (file name) more than once will attempt to open an already opened file and result in an exception, as File.Open uses FileShare.None by default.
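
The repeated side effect is easy to reproduce without touching the file system. In the sketch below a counter stands in for File.Open (RepeatedEnumerationDemo and CountAcquisitions are hypothetical names used only here): building the query acquires nothing, and every full enumeration runs the projection again for each element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RepeatedEnumerationDemo
{
    // Returns how many times the projection ran after the given number
    // of full enumerations; the counter stands in for File.Open.
    public static int CountAcquisitions(int passes)
    {
        int acquired = 0;
        var names = new[] { "a.txt", "b.txt" };

        // Nothing runs here: Select only builds the query
        IEnumerable<string> resources = names.Select(name =>
        {
            acquired++; // side effect of "obtaining" a resource
            return "handle:" + name;
        });

        for (int i = 0; i < passes; i++)
        {
            foreach (var r in resources) { /* use r */ }
        }
        return acquired;
    }

    static void Main()
    {
        Console.WriteLine(CountAcquisitions(0)); // 0
        Console.WriteLine(CountAcquisitions(1)); // 2
        Console.WriteLine(CountAcquisitions(2)); // 4
    }
}
```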

So we need to avoid repeated side effects by memoizing (caching) the obtained resources.

using System;
using System.Collections;
using System.Collections.Generic;

class DisposableSequence<T> : IDisposableSequence<T>
    where T : IDisposable
{
    private IEnumerable<T> m_seq;
    private IEnumerator<T> m_enum;
    private Node<T> m_head;
    private bool m_disposed;

    public DisposableSequence(IEnumerable<T> sequence)
    {
        m_seq = sequence;
    }

    public IEnumerator<T> GetEnumerator()
    {
        ThrowIfDisposed();

        // The enumerator traverses the lazy linked list, 
        // forcing it to expand where possible
        var n = EnsureHead();
        while (n != null)
        {
            yield return n.Value;
            n = n.GetNext(true);
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public void Dispose()
    {
        if (!m_disposed)
        {
            m_disposed = true;

            // As sequence creates enumerator it is responsible 
            // for its disposal
            if (m_enum != null)
            {
                m_enum.Dispose();
                m_enum = null;
            }

            // Not all resources may have been obtained (for 
            // example, only half of the lazily evaluated 
            // sequence was enumerated inside the using 
            // statement). We do not want to obtain the rest 
            // now, as they would be disposed immediately. 
            // Thus we traverse only the already created 
            // lazy linked list nodes and dispose the 
            // obtained resources
            Dispose(m_head);

            m_seq = null;
        }
    }

    private Node<T> EnsureHead()
    {
        // Obtain enumerator once
        if (m_enum == null)
        {
            m_enum = m_seq.GetEnumerator();
            // Try to expand to first element
            if (m_enum.MoveNext())
            {
                // Created node caches current element
                m_head = new Node<T>(m_enum);
            }
        }
        return m_head;
    }

    private void ThrowIfDisposed()
    {
        if (m_disposed)
        {
            throw new ObjectDisposedException("DisposableSequence");
        }
    }

    private static void Dispose(Node<T> h)
    {
        if (h == null)
        {
            return;
        }

        try
        {
            // Resources must be disposed in the reverse order 
            // of acquisition. Recursion gives the same 
            // semantics as nested try{}finally{} blocks.
            Dispose(h.GetNext(false));
        }
        finally
        {
            h.Value.Dispose();
        }
    }

    class Node<V>
    {
        private readonly V m_value;
        private IEnumerator<V> m_enum;
        private Node<V> m_next;

        public Node(IEnumerator<V> enumerator)
        {
            m_value = enumerator.Current;
            m_enum = enumerator;
        }

        public V Value
        {
            get { return m_value; }
        }

        public Node<V> GetNext(bool force)
        {
            // Expand only if forced and not expanded before
            if (force && m_enum != null)
            {
                if (m_enum.MoveNext())
                {
                    m_next = new Node<V>(m_enum);
                }
                m_enum = null;
            }
            return m_next;
        }
    }
}

Once enumerated, resources are memoized inside the lazy linked list. It expands only when more resources are requested than have already been memoized.

After putting things together, our desired “using” statement usage looks like this:

using (var streams = files.Select(file => File.Open(file, FileMode.Open)).AsDisposable())
{
    // streams is IEnumerable<Stream> and IDisposable
}
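
The behaviour can be checked end to end without files. The sketch below is not the class above: to stay short and self-contained it uses a hypothetical compact stand-in, CachingDisposableSequence, which caches in a List<T> instead of a lazy linked list and, as a simplification, does not guard against a Dispose call that throws. Tracked and CachingDemo are also made-up names. It shows that a partially enumerated sequence obtains only what was requested, that a second pass is served from the cache, and that disposal happens in reverse order.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical compact stand-in for DisposableSequence<T>: it caches
// obtained elements in a List<T> rather than in a lazy linked list.
class CachingDisposableSequence<T> : IEnumerable<T>, IDisposable
    where T : IDisposable
{
    private readonly IEnumerable<T> m_seq;
    private readonly List<T> m_cache = new List<T>();
    private IEnumerator<T> m_enum;
    private bool m_exhausted;
    private bool m_disposed;

    public CachingDisposableSequence(IEnumerable<T> sequence) { m_seq = sequence; }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; ; i++)
        {
            if (m_disposed) throw new ObjectDisposedException("CachingDisposableSequence");
            if (i == m_cache.Count)
            {
                // Cache is used up: pull exactly one new element from the source
                if (m_exhausted) yield break;
                if (m_enum == null) m_enum = m_seq.GetEnumerator();
                if (!m_enum.MoveNext()) { m_exhausted = true; yield break; }
                m_cache.Add(m_enum.Current);
            }
            yield return m_cache[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void Dispose()
    {
        if (m_disposed) return;
        m_disposed = true;
        if (m_enum != null) m_enum.Dispose();
        // Dispose only what was actually obtained, newest first.
        // Simplification: a throwing Dispose stops the loop.
        for (int i = m_cache.Count - 1; i >= 0; i--) m_cache[i].Dispose();
    }
}

// Hypothetical resource that logs when it is opened and closed.
class Tracked : IDisposable
{
    private readonly string m_name;
    private readonly List<string> m_log;
    public Tracked(string name, List<string> log)
    {
        m_name = name; m_log = log; m_log.Add("open " + name);
    }
    public void Dispose() { m_log.Add("close " + m_name); }
}

static class CachingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var names = new[] { "a", "b", "c" };
        using (var items = new CachingDisposableSequence<Tracked>(
            names.Select(n => new Tracked(n, log))))
        {
            int used = 0;
            foreach (var item in items) { if (++used == 2) break; } // obtains a and b
            used = 0;
            foreach (var item in items) { if (++used == 2) break; } // served from cache
        }
        return log;
    }

    static void Main()
    {
        // prints: open a, open b, close b, close a
        Console.WriteLine(string.Join(", ", Run().ToArray()));
    }
}
```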

Enjoy!

Update.

In general, it is good practice to acquire a resource right before it is used and to dispose of it as soon as it is no longer needed; otherwise the system may run out of resources.

The approach described above can be used whenever resources should be acquired and disposed together (they all have the same actual usage time) and you do not know the number of resources in advance. Otherwise you must use one or more "using" statements and dispose of resources as soon as they are no longer needed.

Keep in mind that if resources grouped under a single "using" statement (using the described approach) have different actual usage times, they won't be disposed until processing of all resources is completed, so some of them are held unnecessarily. The exception is disposing them explicitly inside the "using" statement, which assumes that multiple calls to the Dispose method are allowed.
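
For contrast, when each resource is needed only while its own item is processed, a plain "using" statement inside the loop keeps at most one resource open at a time. The sketch below assumes a made-up task (summing file lengths); SequentialProcessingDemo and TotalLength are hypothetical names, and the temporary files exist only to make the example runnable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SequentialProcessingDemo
{
    // Sums file lengths, holding at most one stream open at a time.
    public static long TotalLength(IEnumerable<string> files)
    {
        long total = 0;
        foreach (var file in files)
        {
            // Acquired right before use and released before
            // the next file is opened
            using (var stream = File.Open(file, FileMode.Open))
            {
                total += stream.Length;
            }
        }
        return total;
    }

    static void Main()
    {
        var paths = new List<string>();
        try
        {
            // Create three temporary files of 10, 20 and 30 bytes
            for (int i = 1; i <= 3; i++)
            {
                string path = Path.GetTempFileName();
                paths.Add(path);
                File.WriteAllBytes(path, new byte[i * 10]);
            }
            Console.WriteLine(TotalLength(paths)); // 60
        }
        finally
        {
            foreach (var path in paths) File.Delete(path);
        }
    }
}
```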