Fun with C# and LINQ – Part 2

Last time I discussed how two lists of financial transactions can be matched by defining a custom extension method. The used types were hard-coded, which is fine as long as you don’t plan to reuse the code.
It is not hard to generalize the extension method using generic type parameters. These would be TFirst for elements of the first list, TSecond for elements of the second list, and TResult for elements of the resulting enumeration. The signature of the generic match method would then be

public static IEnumerable<TResult> Match<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, int> compareTo,
    Func<TFirst, TSecond, TResult> matchSelector,
    Func<TFirst, TResult> firstSelector,
    Func<TSecond, TResult> secondSelector
    )

The first two parameters are the lists to compare. The elements are compared with the compareTo delegate. The remaining three parameters are delegates that yield the resulting elements. Depending on the result of compareTo, the appropriate selector is called.

The method body is structurally identical to the non-generic version. The two specific differences are

Instead of the hard-coded comparison of transaction amounts, the comparion delegate is invoked
Instead of calling the constructors of the return types, the selector delegates are invoked

    using (var enum1 = first.GetEnumerator())
    using (var enum2 = second.GetEnumerator()) {
        // Move both enumerators to first element
        bool eoc1 = !enum1.MoveNext();  // end of first collection?
        bool eoc2 = !enum2.MoveNext();  // end of second collection?

        // Match sequences until end of one collection is reached
        while (!eoc1 && !eoc2) {
            // Compare current elements
            var cur1 = enum1.Current;
            var cur2 = enum2.Current;
            int cmp = compareTo(cur1, cur2);

            // Yield the next TResult element and advance iterator(s)
            if (cmp < 0) {
                yield return firstSelector(cur1);
                eoc1 = !enum1.MoveNext();
            }
            else if (cmp > 0) {
                yield return secondSelector(cur2);
                eoc2 = !enum2.MoveNext();
            }
            else {
                yield return matchSelector(cur1, cur2);
                eoc1 = !enum1.MoveNext();
                eoc2 = !enum2.MoveNext();
            }
        }

        // Yield rest of first sequence
        while (!eoc1) {
            yield return firstSelector(enum1.Current);
            eoc1 = !enum1.MoveNext();
        }
        // Yield rest of second sequence
        while (!eoc2) {
            yield return secondSelector(enum2.Current);
            eoc2 = !enum2.MoveNext();
        }
    }
}

At first it may seem that we do not need three different selectors. Instead of firstSelector and secondSelector one could call matchSelector with default(TFirst) or default(TSecond), respectively, as one argument. This general approach is taken in the related ZipLongest and ZipShortest extensions methods in the MoreLinq LINQ extension library. However, if the type’s default values are viable elements of the input lists, there is no good way to discriminate between these values coming from the input or being created in the matching. Also, matchSelector is called with two values that would not necessarily be compared as equal. If you rely on this fact, either in the code of matchSelector itself, or in the use of the resulting object, you are out of luck.

However, it seems reasonable to implement an overload that discards unmatched elements. Sure enough, one selector is sufficient now.

public static IEnumerable<TResult> Match<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, int> compareTo,
    Func<TFirst, TSecond, TResult> matchSelector
    ) {
    using (var enum1 = first.GetEnumerator())
    using (var enum2 = second.GetEnumerator()) {
        // Move both enumerators to first element
        if (!enum1.MoveNext() || !enum2.MoveNext())
            yield break;

        // Match sequences until end of one collection is reached
        while (true) {
            // Compare current elements
            var cur1 = enum1.Current;
            var cur2 = enum2.Current;
            int cmp = compareTo(cur1, cur2);

            // Yield the next TResult element and advance iterator(s)
            if (cmp < 0) {
                if (!enum1.MoveNext())
                    yield break;
            }
            else if (cmp > 0) {
                if (!enum2.MoveNext())
                    yield break;
            }
            else {
                yield return matchSelector(cur1, cur2);
                if (!enum1.MoveNext() || !enum2.MoveNext())
                    yield break;
            }
        }
    }
}

For a final variation, imagine that the input types implement the IComparable interface. This enables us to get rid of the comparator delegate.

public static IEnumerable<TResult> Match<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> matchSelector,
    Func<TFirst, TResult> firstSelector,
    Func<TSecond, TResult> secondSelector
    ) where TFirst : IComparable<TSecond> {
    return Match(first, second, (f, s) => f.CompareTo(s), matchSelector, firstSelector, secondSelector);
}

Here TFirst is supposed to implement the comparison.

Now this variation seems less satisfactory. In order to keep the symmetry between the two input types, one would need another overload with a type constraint where TSecond : IComparable<TFirst>. However, generic type constraints are not part of the method signature, so this is not possible. Moreover, a comparison delegate is more in the spirit of LINQ and offers the highest flexibility.

Fun with C# and LINQ – Part 2

Like this:

Leave a Reply Cancel reply

Share this:

Like this:

Leave a Reply Cancel reply