Overthinking CSV With Cesil: Writing Known Types

My last two posts have covered deserializing with Cesil, the subsequent two will cover serialization. This post will specifically dig into the case where you know the types involved at compile time, while the next one will cover the dynamic type case. If you’ve read the previous posts on read operations hopefully a lot of this will seem intuitive, just in reverse.

Again, CesilUtils exposes a bunch of utility methods – this time with names like WriteXXX. Variants exist for single row, multiple row, synchronous, asynchronous, and “straight to a file” operations. Just like with reading, CesilUtils doesn’t allow you to reuse an IBoundConfiguration<TRow> nor does it expose the underlying I(Async)Writer<TRow> but is convenient when performance and customization aren’t of paramount importance.

As with reading, maximum performance and flexibility is found in using either IWriter<TRow> or IAsyncWriter<TRow> interfaces obtained from an IBoundConfiguration<TRow> created via Configuration.For<TRow>. Creating configurations is mildly expensive, so caching and reusing them can be beneficial.

The writer interfaces expose methods to do the following:

  • Write a collection of rows with WriteAll(Async)
    • The sync version accepts an IEnumerable<T>
    • The async version can take either an IEnumerable<T> or an IAsyncEnumerable<T>
  • Write a single row with Write(Async)
  • Write a comment with WriteComment(Async)
    • If a comment contains a row ending sequence of characters, it will be split into multiple comments automatically

Mapping a type to a set of columns, the order of the those columns, and the conversion of the values of those columns to text is done with the ITypeDescriber registered on the Options provided to Configuration.For<TRow> or the method on CesilUtils (by default, this is an instance of DefaultTypeDescriber). When an IBoundConfiguration<TRow> is created ITypeDescriber.EnumerateMembersToSerialize is invoked once and the returned SerializableMembers detail how Cesil will map a TRow instance to a set of text columns.

Specifically a SerializableMember details

  • The name of column, which may be written as part of a header row
  • The Getter to use to obtain a value from a TRow instance
  • An (optional) ShouldSerialize to control, per-row, whether a column should be included
  • The Formatter used to turn the columns value into a sequence of characters
  • Whether or not to include a column if it has the default value for it’s type
    • Cesil uses Activator.CreateInstance to obtain a default instance of ValueTypes, and use null as the default value for reference types

The order of columns is taken from the order they are yielded by the IEnumerable<SerializableMember> returned by ITypeDescriber.EnumerateMembersToSerialize.

There is quite a lot of flexibility in how Getters, ShouldSerializes, and Formatters can be created. They will be covered in detail in a later post.

There’s less internal state being managed when Cesil is writing in comparison to when it is reading, so there are no fancy state machines or lookup tables. The most interesting part is NeedsEncodeHelper which is used to check for characters that would require escaping, which makes use of the X64 intrinsics supported in modern .NET (provided your processor supports them).

There are some minor additional details to keep in mind while writing with Cesil:

  • All XXXAsync() methods try to make as much progress as they can without blocking, they don’t just yield to yield.
  • All XXXAsync() methods do take an optional CancellationToken, and pass it down to the underlying stream. CancellationTokens are checked at reasonable intervals, but no guarantees are made about how often.
  • If you try to write a comment without having configured your Options with a comment character, an exception will be raised.
  • If you try and write a value that would require escaping without having configured your Options with a way to start and end escaped values, an exception will be raised.
    • Options.Default has ” as it’s escape start and stop characters.
  • If you try to write a value that includes the escape start and stop character, but have not configured your Options with an escape character, an exception will be raised.

And that about covers how to write static types with Cesil.

The Open Question for this post is a return to an earlier one, but with a particular focus on writing: Is there anything missing from IWriter(Async) that you’d expect to be supported in a modern .NET CSV library?

This question has already led to some changes, which will appear in the next release of Cesil – adding comment writing methods that take ReadOnlySpan<char> and ReadOnlyMemory<char> parameters, clarifying some parameter names, and returning counts of the number of rows written from the enumerable taking write methods.

Remember that, as part of the sustainable open source experiment I detailed in the first post of this series, any commentary from a Tier 2 GitHub Sponsor will be addressed in a future comment or post. Feedback from non-sponsors will receive equal consideration, but may not be directly addressed.

In my next post I’ll cover how Cesil supports writing dynamic types, those not known at compile time. As you might expect from reading static and dynamic types, it is very similar to how static types are read…