LinqAF: The (possible) future

This is part of a series on LinqAF, you should start with the first post.

What’s missing from LinqAF?

The number one missing thing is performance.  LinqAF isn’t terribly slow, but it’s not consistently faster than LINQ-to-Objects and there’s no real reason it couldn’t be.  This would take the form of much more extensive profiling, and a likely a decent chunk of unsafe code.

There are also lots of additional operators that could be added.  Operators new to 4.7.1 like Append, Prepend, SkipLast, TakeLast, and ToHashSet or the extensive operators available in packages like MoreLINQ.  While the nature of LinqAF means each additional operator adds significantly to the generated code size, the code generation infrastructure makes it entirely feasible.

There are built-in enumerables that aren’t optimized for, like the Dictionary<TKey, TValue> and SortedDictionary<TKey, TValue> types, because they require more effort as a consequence of having two generic parameters.  It’d be neat to include the immutable collections introduced in .NET 4.5 as well.

What’s missing from C#?

cat-3068896_640

Missing from this blog?  On topic images.

A few changes to C# would make it possible to meaningfully improve LinqAF – ref extension methods, and improvements to generic type inferencing.

In the places where LinqAF does use extension methods (mostly in the numerical operators like Max, Average, and Sum), it’s forced to pass the source enumerable by value.  This entails making a copy of a (potentially quite large) struct, which has a deleterious impact on performance.  It is because of this copying cost that internally LinqAF passes all enumerables by reference.  Ref extension methods, allowing something like static MyEnumerable<TSource> MyOperator<TSource>(this ref TSource source), would prevent this copying.  This feature may make it into a future version of C#, and is presently championed by Mads Torgersen.

The reason LinqAF doesn’t use extension methods more extensively is because of weaknesses in C#’s generic type inferencing.  In a nutshell,

// in some static class
public static WhereEnumerable<TItem, TInnerEnumerable, TInnerEnumerator> 
  Where<TItem, TInnerEnumerable, TInnerEnumerator>(
    this TInnerEnumerable source,
    Func<TItem, bool> predicate
  )
  where TInnerEnumerable: struct, IStructEnumerable<TItem, TInnerEnumerator>
  where TInnerEnumerator: struct, IStructEnumerator<TItem>
{ /* ... */ }

will not successfully be invoked by code like

var e = /* some struct implementing IStructEnumerable<...>
e.Where(i => true);

instead it fails with a CS0411 error (type arguments for the method cannot be inferred).  If this weren’t the case everything except the overrides in LinqAF could be implemented as extension methods, which would drastically reduce the size of the codebase and the final DLL.  The one attempt in this direction that I’m aware of came to nothing due to problems with backwards compatibility.  With compatibility concerns, the increase in complexity, and presumably compile times – I’d be a little surprised if it were done in a future C# version.  That said, LinqAF would benefit from it if it were.

What could be improved?

As mentioned in an earlier post, I’m very interested in alternatives to BoxedEnumerable<T>.  I spent a decent amount of time playing around with a tagged union approach, but could never come up with anything workable.

The RangeEnumerable<T> type being generic over T instead of being bound to int is a blemish.  Fixing this would let me remove the MiscUtil dependency, and would improve the performance of the Range operator. The major stumbling block is that the LinqAF.Generator project assumes all enumerables are generic over T, and Range being an exception would require significant amounts of work.

And that’s it!

This is the end of my series on LinqAF.  Once again, the code is on Github and a Nuget package is available.  Below is a table of contents for easy reference.

Thanks for reading!

Table of contents