LinqAF: Replacing LINQ and not allocating

This is part of a series on LinqAF, you should start with the first post.

How is LINQ-to-Objects replaceable?

The C# language has a number of keywords that are used with any LINQ provider.  Fortunately for me, these keywords are equivalent to methods calls.  In other words: from myItem in myList where myPredicate(myItem) is equivalent to myList.Where(myItem => myPredicate(myItem)).   These method calls are unqualified and untyped, so all that matters is that some code somewhere provides a definition for Where, System.Linq isn’t privileged nor is IEnumerable<T>.

Similarly, C#’s foreach statement makes untyped and unprivileged calls to GetEnumerator().  The BCL uses this trick extensively to reduce allocations when enumerating over collections in System.Collections.Generic (for example see how List<T> has a GetEnumerator() method that returns a struct).

How does LinqAF avoid allocating?

kitten-2249619_640

Allocations frighten kitty

The biggest trick is that all operations in LinqAF are represented as structs, whereas LINQ-to-Objects always returns an IEnumerable<T> or IOrderedEnumerable<T>.  This means that LinqAF is going to perform 0 allocations for most LINQ operations (the exceptions are covered in a later post) versus LINQ-to-Objects minimum of 1 allocation.

LinqAF also implements custom struct enumerators for each enumerable which can result in fewer allocations.  The savings are less for these enumerators because LINQ-to-Objects uses some tricks (like this internal Iterator<T> class) to avoid allocating enumerators in common cases.

Technically LinqAF could achieve these optimizations just by duck typing, but for ease of development and some type checking peace of mind I have all enumerables implement the IStructEnumerable interface and all enumerators implement IStructEnumerator.

There are lots of consequences (and some additional advantages) to having all these different structs flying around.  The first big consequence is that LinqAF cannot implement most operations as extension methods, instead I had to use instance methods.  This is because generic parameter inferencing ignores type constraints, and I have lots of : struct and : SomeInterface constraints to ensure the correct guarantees are upheld.

Another consequence is that I could not have parameters of a single enumerable type, I had to provide overloads for each of the concrete operator types to avoid boxing.  This meant that rather than each enumerable struct having a single Concat method they had sixty-nine(!), with similar increases in method counts for other operators.  All told LinqAF ended up being 2,500+ files and 100,000+ methods, which is much much larger than the official LINQ-to-Objects implementation.  Naturally creating all of those files and methods by hand would have been masochistic, I cover exactly how I generated them in the next post.

Using structs also introduces a new kind of “uninitialized” error.  In C# you cannot hide a struct’s parameterless constructor (technically you cannot implement a structs default constructor either, but that matters less), which means that it’s possible to create a non-null but also not valid enumerable to use with LinqAF.  Since LINQ’s semantics require failing-fast on invalid arguments, I add an IsDefaultValue method to all enumerables to detect this scenario.

There are advantages to having all these different types – I can avoid dynamic dispatch (all calls on structs are non-virtual calls), and allow more optimization opportunities.  LINQ-to-Objects has some optimizations for when the actual type of an IEnumerable<T> is a List, array, WhereEnumerableIterator, and so on – but relies on runtime type checking.  LinqAF can perform many of these same optimizations, but statically – for example WhereEnumerable’s Select method returns a WhereSelectEnumerable and it’s Where method builds up a chain of predicates (both optimizations avoid creating pointless additional enumerables and enumerators).

What’s next?

I cover how LinqAF is structured and how the final (massive) code is produced.