This is part of a series on LinqAF; you should start with the first post.
How is LINQ-to-Objects replaceable?
The C# language has a number of keywords that are used with any LINQ provider. Fortunately for me, these keywords are equivalent to method calls. In other words: from myItem in myList where myPredicate(myItem) select myItem is equivalent to myList.Where(myItem => myPredicate(myItem)). These method calls are unqualified and untyped, so all that matters is that some code somewhere provides a definition for Where; neither System.Linq nor IEnumerable<T> is privileged.
The foreach statement likewise makes untyped and unprivileged calls to GetEnumerator(). The BCL uses this trick extensively to reduce allocations when enumerating over collections in System.Collections.Generic (for example see how List<T> has a GetEnumerator() method that returns a struct).
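To make the duck typing concrete, here is a minimal sketch. IntRange and QueryDemo are hypothetical names invented for this illustration (they are not LinqAF types). The type implements no interfaces and System.Linq is never imported, yet both the where clause and the foreach statement compile against it.

```csharp
using System;

// Hypothetical type for illustration: no IEnumerable<T>, no System.Linq.
public struct IntRange
{
    private readonly int _start;
    private readonly int _end;
    private readonly Func<int, bool> _predicate;

    public IntRange(int start, int end) : this(start, end, _ => true) { }

    private IntRange(int start, int end, Func<int, bool> predicate)
    {
        _start = start;
        _end = end;
        _predicate = predicate;
    }

    // The compiler binds a `where` clause to this method by name and shape alone.
    public IntRange Where(Func<int, bool> predicate)
    {
        var previous = _predicate; // copy to a local: lambdas in a struct cannot capture `this`
        return new IntRange(_start, _end, x => previous(x) && predicate(x));
    }

    // foreach binds to this method the same way, and accepts a struct enumerator.
    public Enumerator GetEnumerator() => new Enumerator(_start, _end, _predicate);

    public struct Enumerator
    {
        private readonly int _end;
        private readonly Func<int, bool> _predicate;
        private int _next;
        private int _current;

        internal Enumerator(int start, int end, Func<int, bool> predicate)
        {
            _next = start;
            _end = end;
            _predicate = predicate;
            _current = 0;
        }

        public int Current => _current;

        public bool MoveNext()
        {
            while (_next < _end)
            {
                var candidate = _next++;
                if (_predicate(candidate))
                {
                    _current = candidate;
                    return true;
                }
            }
            return false;
        }
    }
}

public static class QueryDemo
{
    public static int SumOfEvens(int start, int end)
    {
        // Translated by the compiler to new IntRange(start, end).Where(n => n % 2 == 0).
        var evens = from n in new IntRange(start, end) where n % 2 == 0 select n;

        var sum = 0;
        foreach (var n in evens) sum += n; // calls IntRange.GetEnumerator()
        return sum;
    }
}
```

Because the select clause is an identity projection following a where clause, the compiler drops it, so this sketch does not need a Select method.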
How does LinqAF avoid allocating?
The biggest trick is that all operations in LinqAF are represented as structs, whereas LINQ-to-Objects always returns an IEnumerable<T> or IOrderedEnumerable<T>. This means that LinqAF is going to perform 0 allocations for most LINQ operations (the exceptions are covered in a later post) versus LINQ-to-Objects’ minimum of 1 allocation.
LinqAF also implements custom struct enumerators for each enumerable, which can result in fewer allocations. The savings are smaller for these enumerators because LINQ-to-Objects uses some tricks (like its internal Iterator<T> class) to avoid allocating enumerators in common cases.
Technically LinqAF could achieve these optimizations just by duck typing, but for ease of development and some type checking peace of mind I have all enumerables implement the IStructEnumerable interface and all enumerators implement IStructEnumerator.
There are lots of consequences (and some additional advantages) to having all these different structs flying around. The first big consequence is that LinqAF cannot implement most operations as extension methods; instead I had to use instance methods. This is because generic type inference ignores type constraints, and I have lots of : struct and : SomeInterface constraints to ensure the correct guarantees are upheld.
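The inference problem can be sketched as follows. Only the names IStructEnumerable and IStructEnumerator come from this post; their members, the generic parameters, and the ArrayEnumerable, ArrayEnumerator, and StructEnumerableExtensions types are assumptions made for the example, so the real LinqAF definitions may look different.

```csharp
// Assumed interface shapes; only the two interface names come from the post.
public interface IStructEnumerator<TItem>
{
    TItem Current { get; }
    bool MoveNext();
}

public interface IStructEnumerable<TItem, TEnumerator>
    where TEnumerator : struct, IStructEnumerator<TItem>
{
    TEnumerator GetEnumerator();
}

// Hypothetical enumerator over an array.
public struct ArrayEnumerator<TItem> : IStructEnumerator<TItem>
{
    private readonly TItem[] _items;
    private int _index;

    public ArrayEnumerator(TItem[] items)
    {
        _items = items;
        _index = -1;
    }

    public TItem Current => _items[_index];
    public bool MoveNext() => ++_index < _items.Length;
}

// Hypothetical enumerable over an array.
public struct ArrayEnumerable<TItem> : IStructEnumerable<TItem, ArrayEnumerator<TItem>>
{
    private readonly TItem[] _items;

    public ArrayEnumerable(TItem[] items) { _items = items; }

    public ArrayEnumerator<TItem> GetEnumerator() => new ArrayEnumerator<TItem>(_items);

    // Instance method: TItem is fixed by the receiver, so nothing has to be inferred.
    public int Count()
    {
        var count = 0;
        var e = GetEnumerator();
        while (e.MoveNext()) count++;
        return count;
    }
}

public static class StructEnumerableExtensions
{
    // TItem and TEnumerator appear only in the constraints, never in the parameter list,
    // so the compiler cannot infer them from the argument.
    public static int CountViaExtension<TItem, TEnumerable, TEnumerator>(this TEnumerable source)
        where TEnumerable : struct, IStructEnumerable<TItem, TEnumerator>
        where TEnumerator : struct, IStructEnumerator<TItem>
    {
        var count = 0;
        var e = source.GetEnumerator();
        while (e.MoveNext()) count++;
        return count;
    }
}
```

With these definitions, source.Count() compiles as written, while source.CountViaExtension() is rejected with error CS0411 (type arguments cannot be inferred) and only compiles when spelled out as source.CountViaExtension<int, ArrayEnumerable<int>, ArrayEnumerator<int>>().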
Another consequence is that I could not have parameters of a single enumerable type; I had to provide overloads for each of the concrete operator types to avoid boxing. This meant that rather than each enumerable struct having a single Concat method, they had sixty-nine(!), with similar increases in method counts for other operators. All told, LinqAF ended up being 2,500+ files and 100,000+ methods, which is much, much larger than the official LINQ-to-Objects implementation. Naturally, creating all of those files and methods by hand would have been masochistic; I cover exactly how I generated them in the next post.
Using structs also introduces a new kind of “uninitialized” error. In C# you cannot hide a struct’s parameterless constructor (technically you cannot implement a struct’s default constructor either, but that matters less), which means that it’s possible to create a non-null but invalid enumerable to use with LinqAF. Since LINQ’s semantics require failing fast on invalid arguments, I add an IsDefaultValue method to all enumerables to detect this scenario.
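A minimal sketch of how such a check might look. The IsDefaultValue name comes from this post, but ArrayWrapper, the null-field test, and the choice of ArgumentException are assumptions made for illustration, not LinqAF's actual implementation.

```csharp
using System;

// Hypothetical enumerable wrapping an array.
public struct ArrayWrapper<TItem>
{
    private readonly TItem[] _items;

    public ArrayWrapper(TItem[] items)
    {
        _items = items ?? throw new ArgumentNullException(nameof(items));
    }

    // default(ArrayWrapper<TItem>) and new ArrayWrapper<TItem>() both skip the
    // constructor above, leaving _items null. That is the signal used here.
    public bool IsDefaultValue() => _items == null;

    public int Count()
    {
        // Fail fast instead of hitting a NullReferenceException later (exception type is an assumption).
        if (IsDefaultValue()) throw new ArgumentException("Enumerable is uninitialized.");
        return _items.Length;
    }
}

public static class DefaultValueDemo
{
    public static bool ThrowsOnDefault()
    {
        var broken = default(ArrayWrapper<int>);
        try
        {
            broken.Count();
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

The sketch relies on every legitimate construction path setting a reference-typed field to a non-null value, so a null field can only mean the struct was never properly constructed.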
There are advantages to having all these different types: I can avoid dynamic dispatch (all calls on structs are non-virtual calls) and open up more optimization opportunities. LINQ-to-Objects has some optimizations for when the actual type of an IEnumerable<T> is a List<T>, an array, a WhereEnumerableIterator, and so on, but it relies on runtime type checking. LinqAF can perform many of these same optimizations statically. For example, WhereEnumerable’s Select method returns a WhereSelectEnumerable and its Where method builds up a chain of predicates (both optimizations avoid creating pointless additional enumerables and enumerators).
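The static version of those two optimizations might look roughly like this. The WhereEnumerable and WhereSelectEnumerable names come from this post; everything else (array-only sources, the fields, the FusionDemo helper) is assumed for the sake of a short example. Note that this sketch chains predicates by combining delegates, which itself allocates a closure, so it illustrates the shape of the idea rather than an allocation-free implementation.

```csharp
using System;

public struct WhereEnumerable<TItem>
{
    private readonly TItem[] _source;
    private readonly Func<TItem, bool> _predicate;

    public WhereEnumerable(TItem[] source, Func<TItem, bool> predicate)
    {
        _source = source;
        _predicate = predicate;
    }

    // A second Where folds into the first instead of wrapping it in another enumerable.
    public WhereEnumerable<TItem> Where(Func<TItem, bool> predicate)
    {
        var first = _predicate; // copy to a local: lambdas in a struct cannot capture `this`
        return new WhereEnumerable<TItem>(_source, x => first(x) && predicate(x));
    }

    // The fused return type is picked by overload resolution at compile time, not by a runtime type check.
    public WhereSelectEnumerable<TItem, TOut> Select<TOut>(Func<TItem, TOut> projection)
        => new WhereSelectEnumerable<TItem, TOut>(_source, _predicate, projection);
}

public struct WhereSelectEnumerable<TIn, TOut>
{
    private readonly TIn[] _source;
    private readonly Func<TIn, bool> _predicate;
    private readonly Func<TIn, TOut> _projection;

    public WhereSelectEnumerable(TIn[] source, Func<TIn, bool> predicate, Func<TIn, TOut> projection)
    {
        _source = source;
        _predicate = predicate;
        _projection = projection;
    }

    public Enumerator GetEnumerator() => new Enumerator(_source, _predicate, _projection);

    // One enumerator filters and projects in a single pass.
    public struct Enumerator
    {
        private readonly TIn[] _source;
        private readonly Func<TIn, bool> _predicate;
        private readonly Func<TIn, TOut> _projection;
        private int _index;
        private TOut _current;

        internal Enumerator(TIn[] source, Func<TIn, bool> predicate, Func<TIn, TOut> projection)
        {
            _source = source;
            _predicate = predicate;
            _projection = projection;
            _index = -1;
            _current = default(TOut);
        }

        public TOut Current => _current;

        public bool MoveNext()
        {
            while (++_index < _source.Length)
            {
                var item = _source[_index];
                if (_predicate(item))
                {
                    _current = _projection(item);
                    return true;
                }
            }
            return false;
        }
    }
}

public static class FusionDemo
{
    public static int SumOfSquaresOfPositiveEvens(int[] items)
    {
        // Where + Where + Select collapses into a single WhereSelectEnumerable<int, int>.
        var query = new WhereEnumerable<int>(items, x => x > 0)
            .Where(x => x % 2 == 0)
            .Select(x => x * x);

        var sum = 0;
        foreach (var value in query) sum += value;
        return sum;
    }
}
```

The static type of query already records which operators were applied, which is what lets each method return a specialized struct without inspecting anything at runtime.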
What is LINQ-to-Objects?
LINQ is a collection of methods provided by .NET that is used to perform a variety of operations like filtering, mapping, grouping, ordering, and so on. There are a multitude of LINQ providers, but one of the most commonly used is LINQ-to-Objects, which works on plain old .NET objects in memory (others include LINQ-to-Entities, LINQ-to-XML, and the defunct LINQ-to-SQL).
Microsoft’s implementation is done as a series of extension methods on IEnumerable<T>. You can browse the source on GitHub if you’re curious, but I personally believe the best explanation of this pattern is Jon Skeet’s Edulinq series from 2011.
Many (but not all) of LINQ’s operations can be done without allocating, and when I started LinqAF those were all I intended to provide (LinqAF stood for “Linq: Allocation Free”). When I decided to extend LinqAF to include reversing, sorting, grouping, and the various ToXXX operations (all of which require allocations) I started aiming for “a low number” of allocations.
LinqAF also includes the ability to re-use previously allocated collections, which I’ll expand on in a later post.
Give me the TL;DR
- LinqAF is available on GitHub and NuGet
- Most operations perform no allocations
- All operations (including foreach statements) work as you’d expect
- It’s “type inference compatible” with System.Linq
  - If you’ve written out explicit types, you’ll probably need to make source changes to use LinqAF
  - There’s one exception with SelectMany delegates which is covered in a later post
- The whole endeavor qualifies as a Bad Idea™, but a fun one