LinqAF: Generating way too much code

Posted: 2018/01/18 | Author: kevinmontrose

This is part of a series on LinqAF; you should start with the first post.

Why use codegen for LinqAF?

As mentioned in the previous article, LinqAF defines a type per operator and has to define many instance methods on each type. When you add it all up, replacing LINQ-to-Objects requires ~50 enumerable structs, each of which exposes ~1,300 methods. That is on the order of 65,000 methods in total.

So yeah, I didn’t do it by hand.

The C# compiler has been re-implemented and open-sourced, resulting in the Roslyn project. This makes it really easy to manipulate and reason about C# code, which I used to fashion a templating system to generate LinqAF. Specifically, I was able to keep the templates mostly “correct” C# that benefited from type checking and Visual Studio’s navigation facilities.

Concretely:

- Every LINQ operator has an interface defined (IConcat, IMin, ISelectMany, etc.)
- Each logically distinct method gets a CommonImplementation… implementation (so Concat gets a few methods, Min gets a lot, SelectMany gets several)
- A template is defined that implements the interfaces and calls the appropriate CommonImplementation methods
  - Here dynamic comes into play, as many templates elide generic parameters (these parameters are filled in later)
- Similarly, templates are defined for various extension methods
  - Inter-operating with IEnumerable<T> and other collections defined in the BCL
  - Average
  - Concat for enumerables with two generic parameters
  - Except for enumerables with two generic parameters
  - Intersect for enumerables with two generic parameters
  - Max
  - Min
  - SequenceEqual for enumerables with two generic parameters
  - Sum
  - Union for enumerables with two generic parameters
- Overrides replace particular generated methods with other, static, C# code
  - This allows LinqAF to leverage static type information to avoid doing pointless work
  - For example, almost all operators on the EmptyEnumerable are replaced with implementations that elide most work
- A separate project, LinqAF.Generator, processes all these templates to generate the final code
  - There are 70 different steps, most corresponding to a particular operator
  - Roslyn is also used to remove unused using statements and nicely format the final code
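To make the interface / CommonImplementation / template split a bit more concrete, here is a minimal sketch for a single operator. Everything in it is an assumption made for illustration: IAny, IStructEnumerator, IStructEnumerable, RangeEnumerable and the signatures on CommonImplementation are hypothetical stand-ins, not LinqAF's actual declarations, and the struct at the bottom is written by hand where the real one would come out of a template.

```csharp
using System;

// Hypothetical operator interface: what every enumerable must expose for Any.
public interface IAny<TItem>
{
    bool Any();
    bool Any(Func<TItem, bool> predicate);
}

// Hypothetical minimal enumerator/enumerable contracts for struct-based enumeration.
public interface IStructEnumerator<TItem>
{
    bool MoveNext();
    TItem Current { get; }
}

public interface IStructEnumerable<TItem, TEnumerator>
    where TEnumerator : struct, IStructEnumerator<TItem>
{
    TEnumerator GetEnumerator();
}

// One shared implementation per logically distinct method, written once.
public static class CommonImplementation
{
    public static bool Any<TItem, TEnumerable, TEnumerator>(ref TEnumerable source)
        where TEnumerable : struct, IStructEnumerable<TItem, TEnumerator>
        where TEnumerator : struct, IStructEnumerator<TItem>
    {
        var e = source.GetEnumerator();
        return e.MoveNext();
    }

    public static bool Any<TItem, TEnumerable, TEnumerator>(ref TEnumerable source, Func<TItem, bool> predicate)
        where TEnumerable : struct, IStructEnumerable<TItem, TEnumerator>
        where TEnumerator : struct, IStructEnumerator<TItem>
    {
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        var e = source.GetEnumerator();
        while (e.MoveNext())
        {
            if (predicate(e.Current)) return true;
        }
        return false;
    }
}

public struct RangeEnumerator : IStructEnumerator<int>
{
    private readonly int _end;
    private int _current;

    public RangeEnumerator(int start, int count)
    {
        _current = start - 1;
        _end = start + count;
    }

    public int Current => _current;

    public bool MoveNext()
    {
        if (_current + 1 >= _end) return false;
        _current++;
        return true;
    }
}

// What a generated enumerable might look like: it implements the operator
// interface and only forwards to CommonImplementation with its own type arguments.
public struct RangeEnumerable : IStructEnumerable<int, RangeEnumerator>, IAny<int>
{
    private readonly int _start;
    private readonly int _count;

    public RangeEnumerable(int start, int count)
    {
        _start = start;
        _count = count;
    }

    public RangeEnumerator GetEnumerator() => new RangeEnumerator(_start, _count);

    public bool Any() =>
        CommonImplementation.Any<int, RangeEnumerable, RangeEnumerator>(ref this);

    public bool Any(Func<int, bool> predicate) =>
        CommonImplementation.Any<int, RangeEnumerable, RangeEnumerator>(ref this, predicate);
}
```

The forwarding methods at the bottom are the part that has to be repeated on every enumerable with different type arguments, which is why generating them is attractive.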

This code generation approach let me keep the “actual” implementation of LinqAF under 20,000 lines of code.
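For a rough feel of what a single generator step could look like, here is a sketch that parses a template, swaps every dynamic placeholder for a concrete type name, and lets Roslyn reformat the result. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced. FillPlaceholder and TemplateDemo are hypothetical names, and this is not how LinqAF.Generator is actually structured.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical rewriter: replaces each use of `dynamic` as a name
// with a concrete type name supplied by the caller.
public sealed class FillPlaceholder : CSharpSyntaxRewriter
{
    private readonly string _typeName;

    public FillPlaceholder(string typeName)
    {
        _typeName = typeName;
    }

    public override SyntaxNode VisitIdentifierName(IdentifierNameSyntax node)
    {
        // `dynamic` is a contextual keyword, so it parses as an ordinary identifier name.
        if (node.Identifier.ValueText == "dynamic")
        {
            return SyntaxFactory.IdentifierName(_typeName).WithTriviaFrom(node);
        }
        return base.VisitIdentifierName(node);
    }
}

public static class TemplateDemo
{
    // Parse the template text, fill in the placeholder, and normalize the formatting.
    public static string Expand(string template, string typeName)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(template).GetRoot();
        SyntaxNode rewritten = new FillPlaceholder(typeName).Visit(root);
        return rewritten.NormalizeWhitespace().ToFullString();
    }
}
```

Calling TemplateDemo.Expand("struct Template { public dynamic Source; }", "RangeEnumerable") would return the same struct with Source declared as RangeEnumerable. Because the template is itself parseable C#, it can live in a normal project and be edited with the usual tooling.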

What are the downsides to code generation?

(Image: Code generation is typically a worse idea than a cat)

The biggest downside is iteration time: it is much worse than with hand-written code, and it degraded further as more operators were implemented. This is mitigated somewhat by sharing common implementations in (…) CommonImplementation, which allows bug fixes to be copy/pasted for quick testing. Bugs in the code generation parts are still slow to fix.

While limited in scope, the reliance on dynamic in certain places also means that the LinqAF project can compile even if there are actually type errors. This was most common when adding new operators, and commenting out the other operators let me decrease the iteration time considerably.
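The underlying C# behaviour is easy to reproduce in isolation. In the sketch below (HasTotal, NoTotal and DynamicDemo are made-up names, unrelated to LinqAF's templates), a member access through dynamic compiles whether or not the member exists, and the mistake only shows up when the call is bound.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public sealed class HasTotal
{
    public int Total() => 42;
}

public sealed class NoTotal
{
}

public static class DynamicDemo
{
    // The compiler cannot check source.Total() because source is dynamic,
    // so this method compiles no matter what is passed in.
    public static string Describe(dynamic source)
    {
        try
        {
            int total = source.Total();
            return "total = " + total;
        }
        catch (RuntimeBinderException ex)
        {
            // Only here, at run time, does the missing member surface.
            return "run-time binding error: " + ex.Message;
        }
    }
}
```

Describe(new HasTotal()) returns "total = 42", while Describe(new NoTotal()) compiles just as happily and ends up in the catch block.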

Code generation is also harder to understand and set up, as this post’s existence demonstrates. Thankfully the Roslyn project, and its availability on NuGet, makes code generation considerably less difficult; using something like T4 or outright concatenation would have been even worse.

What’s next?

In what will probably be the longest post in the series, I cover most of the operators and the various optimizations that LinqAF has for them.

Kevin Montrose
