MSDN magazine columnist Ted Neward recently wrote an inspiring two-part article showing how use of a functional programming language can lead to new insights into a problem, even if the actual implementation is done in a mostly procedural language like C#.
In the first part, he sets up the problem of reconciling two lists (local and remote) of financial transactions. A solution in F# is given, which in the second part serves as a guide for the final implementation in C#. While the C# solution captures the functional spirit of working with lists, it doesn’t really feel like C#. Also, in it’s way of dealing with lists, it has potential performance issues. Here I want to show a C# implementation using LINQ to elegantly and efficiently deal with this specific problem.
Before going on to read the rest of this post you might want to read both parts of the MSDN magazine article. Don’t worry if you do not know F#. The code given there is straightforward and easy enough to understand even for novices to functional languages. In the following I give a very brief summary.
Financial transactions of a local and a remote list are compared and reconciled by amount, such that matching transactions are combined, while transactions with no counterpart stay single. Both lists are assumed to be sorted in ascending order. The following set of classes are used to capture the three cases:
class Transaction { public float Amount { get; set; } public DateTime Date { get; set; } public String Comment { get; set; } }
class Register { } class RegEntry : Register { public Transaction Local { get; set; } public Transaction Remote { get; set; } } class MissingLocal : Register { public Transaction Transaction { get; set; } } class MissingRemote : Register { public Transaction Transaction { get; set; } }
In the C# solution, List.GetRange()
is used to generate sublists of unprocessed transactions. As Ted Neward points out, this poses a potential performance problem. Since a copy of the original list is made in each step, the algorithm has quadratic running time. Moreover, since ReconcileInternal()
is called recursively, references to the intermediate lists stay in scope until the end of the recursion. This results in a quadratic space requirement.
Quadratic performance is not bad per se. If the data is small enough, the solution might be just fine. But what if we are not dealing with thousands, but rather millions of transactions?
Also, the passing of copies of lists in recursive method calls is less than elegant. While the inspiration taken from the functional language F# is quite remarkable, the translation to C# seems to verbatim. The C# way of dealing with lists, or enumerations in general, is LINQ.
There is no predefined extension method in the Framework Class Library applicable to this specific problem. The one that comes closest is Enumerable.Zip
, but it does not support comparison.
However, it is not hard to write a custom extension method to do the job. The signature is the similar to that of Zip
, but the generic types are fixed to Register
and Transaction
.
From each sequence an enumerator is obtained and advanced to the first element.
public static IEnumerable Match(this IEnumerable first, IEnumerable second) { using (var enum1 = first.GetEnumerator()) using (var enum2 = second.GetEnumerator()) { // Move both enumerators to first element bool eoc1 = !enum1.MoveNext(); // end of first collection? bool eoc2 = !enum2.MoveNext(); // end of second collection?
The only way of finding out whether an IEnumerator
has reached the end of the collection is by calling MoveNext()
, inevitably advancing the enumerator to the next element. Since this is not always wanted, the state is saved in a bool variable.
Now the current elements are compared, and in the case of a match, both of them are yielded as a RegEntry
object. If there is no match, a MissingRemote
or a MissingLocal
is yielded. For every yielded transaction, the corresponding iterator is advanced.
// Match sequences until end of one collection is reached while (!eoc1 && !eoc2) { var cur1 = enum1.Current; var cur2 = enum2.Current; int cmp = cur1.Amount.CompareTo(cur2.Amount); if (cmp < 0) { yield return new MissingRemote { Transaction = cur1 }; eoc1 = !enum1.MoveNext(); } else if (cmp > 0) { yield return new MissingLocal { Transaction = cur2 }; eoc2 = !enum2.MoveNext(); } else { yield return new RegEntry { Local = cur1, Remote = cur2 }; eoc1 = !enum1.MoveNext(); eoc2 = !enum2.MoveNext(); } }
This continues as long as there are elements from both sequences.
When one sequence runs out of transactions, the remaining elements of the other are yielded. We are done when both sequences are fully processed.
// Yield rest of first sequence while (!eoc1) { yield return new MissingRemote { Transaction = enum1.Current }; eoc1 = !enum1.MoveNext(); } // Yield rest of second sequence while (!eoc2) { yield return new MissingLocal { Transaction = enum2.Current }; eoc2 = !enum2.MoveNext(); } }
That’s all. No copying of lists. No recursion. Execution in linear time. And thanks to the lazy evaluation of enumerations we might not even need to keep any lists in memory, meaning only a constant amount of working memory is needed. Of course, this is only strictly true if the sequences are already in order.
Having solved the specific problem, the next step is to ask how the solution can be generalized. The types Transaction
, Register
and its three subclasses should not be hard-coded into Match
. It turns out that this generalization is quite straight forward.
Next time I’ll show the generic implementation of the Match extension.