Overthinking CSV With Cesil: .NET 5
Posted: 2021/05/26 Filed under: code | Tags: cesil Comments Off on Overthinking CSV With Cesil: .NET 5The latest release of Cesil, 0.9.0, now targets .NET 5. Along with some free performance, .NET 5 and C# 9 bring some new features which Cesil now either uses or supports.
Free Performance
It’s no secret that each new .NET Core release has come with some performance improvements, and even though .NET 5 has dropped “Core” it hasn’t dropped that trend. My own benchmarking suggests about 10%, in libraries like Cesil.
Of course, there are more performance improvements available if you’re willing to make code changes. The biggest once for Cesil was…
The Pinned Object Heap
The Pinned Object Heap (POH) is a new GC heap introduced in .NET 5, and as the name suggests it’s distinguishing feature is that everything allocated in it is pinned. For those unfamiliar, in .NET an allocation can be pinned (via a fixed statement, a GCHandle, etc.) which prevents the garbage collector from moving it during a GC pass.
As you’d expect, pinning an allocation complicates GC and can have serious performance implications which means the historical advice has been to avoid holding pins for long periods of time. For code, like Cesil, which is heavily asynchronous this is extra difficult as “a long time” can happen at any await expression. Of course, pinning isn’t free so you don’t want to unpin memory that you’re going to immediately repin either.
Before 0.9.0, Cesil had quite a lot of code in it that managed pinning and unpinning the bits of memory used to implement parsing CSV files. By using the POH, I was able to remove all this code – keeping the performance benefits of some pointer-heavy code without gumming up the GC. Naturally, since the fastest code is the code that never runs, removing this code also improved Cesil’s overall performance.
A word of caution, almost all allocations should remain off the POH. Cesil allocates very little on the POH, one constant chunk for state machine transition rules, and a small amount of memory for each IBoundConfiguration (instance of which should be reused). Accordingly, during normal read & write operations Cesil won’t allocate anything new on the POH.
Naturally, performance isn’t the only improvement in .NET 5 there are also new features like…
Native (Unsigned) Integers
Native integers have always existed in .NET, but C# 9 exposes them more conveniently as nint and nuint. These are basically aliases for IntPtr and UIntPtr, but those were somewhat awkward to use in earlier C# versions.
Cesil now supports reading and writing nint and nuint, including when using Cesil.SourceGenerator. Extra care must be taken when using them, as their legal range depends on the bittage of the reading and writing processes.
Init-Only Setters
C# has supported get-only properties and private setters for a while, however it was not possible to idiomatically express “properties which may only be set at initialization time”. C# 9 adds init; as a new syntax to allow this, a great addition in my opinion.
At runtime, init-only setters required no changes in Cesil. They do greatly complicate source generators, however, as it is now necessary to set all init-only properties at the same time. This is similar to constructor parameters except that constructor parameters are always required to be present, but properties are inherently optional.
Record Types
The biggest new addition to C#, record types are a terser way to declare classes that primarily encapsulate data. They’re a simple idea that actually has a lot of complication behind the scenes, primarily around how they interact with inheritance.
To be blunt, while Cesil does support record types I’d discourage doing anything with them beyond the simplest declarations. Basically anything with inheritance, non-primary constructors, or additional properties is asking for trouble. Cesil does what it can to choose reasonable defaults, but behavior can be surprising.
For example, given this declaration:
record A(int Foo);
record B(string Bar) : A(Bar.Length);
Deserializing a string like “Foo,Bar\r\n123,hello” to an instance of B will initialize Foo twice, leaving it equal to 123. Meanwhile deserializing “Bar\r\nhello” will initialize Foo once to 5. If you de-sugar what records are actually doing this “makes sense” as Foo is set in A’s constructor and then again when B is initialized (which is legal, as Foo is init-only), but I’d argue it’s unintuitive.
[DoesNotReturn] And Coverage
Pretty minor for most, but really big for Cesil, is that .NET now ships with a DoesNotReturnAttribute. Along with some changes to Coverlet, this allowed a lot of code that was in place to reduce noise in coverage statistics to be removed.
And that’s about it for .NET 5 and Cesil. .NET 6 is already in preview, promising some more performance improvements and exciting new types to support, so I’ve even more improvements for Cesil to look forward to.
I’m nearing the end of the planned posts in this series – next time I’ll go over any standing open questions and their resolutions, as I prepare for the Cesil 1.0 release.