# Recursive LINQ

Recently a colleague asked me for advice about using LINQ to group and sort a table in a specific way. It contains multiple columns by which to group, and a single time-stamp column by which to sort. Consider this data as an example:

Column1Column2Column3Timestamp
CCC6
ABC4
CAA5
BBC1
CCC3
CCC1
CBC7
BAA5
ACB7
ABB9
The unsorted table.

The first step is to group by Column1, and then sort the groups by their lowest time-stamp. For equal values of the time-stamp, the order is arbitrary. Repeat the step for Column2 and Column3, preserving the previous grouping. The result would be

Column1Column2Column3Timestamp
CCC6
CCC3
CCC1
CAA5
CBC7
BBC1
BAA5
ABC4
ABB9
ACB7
The grouped/sorted result.

The requirements can be quite literally translated into the following LINQ query

```IEnumerable<DataRow> sorted =
// Column1
from row in table.AsEnumerable()
group row by row.Field<string>(0)
into grouped1
orderby grouped1.Min(x => x.Field<int>("Timestamp"))
from row in
// Column2
from row in grouped1
group row by row.Field<string>(1)
into grouped2
orderby grouped2.Min(x => x.Field<int>("Timestamp"))
from row in
// Column3
from row in grouped2
group row by row.Field<string>(2)
into grouped3
orderby grouped3.Min(x => x.Field<int>("Timestamp"))
from row in grouped3
select row
select row
select row;```

While this does indeed achieve the desired results, there are a few drawbacks to this solution.

Most importantly the code is not easily comprehensible, which is always a major concern for production code. One of the most compelling features of LINQ, the possibility to express data queries in a clear fashion, is lost here. The nested grouping/sorting can be somewhat clarified by suitable comments and indentation. However, while excessive comments are better than none, sometimes they are a sign that the code is badly structured. Even worse, the indentation is actually flattened by the Visual Studio code formatting.

Another shortcoming is that the number of columns is fixed. While this was fine by the requirements, a more general solution would be welcome.

Let’s take a closer look at the query to understand how it can be improved. The crucial piece of the query is the rather obscure construct

```from row in
from row in grouped1
[...]
select row```

which is nested three times in the query.

At first this is quite confusing because the first `from` gets translated into a call to `SelectMany`, while the second one only marks the beginning of a sub-query. All becomes clearer once one realizes that the first `from row in` together with its matching `select row` does nothing more but to flatten the enclosed sequence. This is best visible with the innermost appearance of this pattern

```from row in grouped3
select row```

which gets translated into the idiom `SelectMany(grouped3 => grouped3)` 1. Remember that `SelectMany` is the LINQ equivalent of the monadic `Bind`. As such it flattens the sequence of the grouped values that `grouped3` takes on, which are each of type `IGrouping<string, DataRow>` and therefore also of type `IEnumerable<DataRow>`, into a single `IEnumerable<DataRow>`.

Equipped with this insight we can reformulate the query as a recursive method. The parameter `rows` represents the rows to be grouped and sorted, `groupColumns` are the columns by which to group, and `colIndex` is the current index into `groupColumns`.

```static IEnumerable<DataRow> RecursiveGroupSort(IEnumerable<DataRow> rows, IList<DataColumn> groupColumns, int colIndex)
{
return
from row in rows
group row by row.Field<string>(groupColumns[colIndex])
into grouped
orderby grouped.Min(x => x.Field<int>("Timestamp"))
from row in (colIndex < groupColumns.Count - 1 ? RecursiveGroupSort(grouped, groupColumns, colIndex + 1) : grouped)
select row;
}```

The expression

`(colIndex < groupColumns.Count - 1 ? RecursiveGroupSort(grouped, groupColumns, colIndex + 1) : grouped)`

represents the sub-query, which is either a recursive call, or the innermost sequence of grouped rows. Actually the parentheses here are not necessary, but they make the code slightly more legible.

The method `RecursiveGroupSort` takes an arbitrary number of columns to group by. Furthermore the code is much clearer than the original unrolled query. One might go even further from here and implement this as an anonymous recursion, but the resulting code would most certainly be much less comprehensible.

1. Actually for performance reasons it gets translated into the overload `SelectMany (grouped3 => grouped3, (grouped3, row) => row)`, see Wes Dyers marvelous blog post for further explanation.
Recursive LINQ
Scroll to top