Overthinking CSV With Cesil: Reading Dynamic Types

In my last post I went over how to use Cesil to deserialize to known, static, types. Since version 4.0, C# has also had a notion of dynamic types – ones whose bindings, members, and conversions are all resolved at runtime – and Cesil also supports deserializing into these.

In 2020, supporting dynamic isn’t exactly a given – dynamic is relatively rare in the .NET ecosystem, the big “Iron” use cases in 2015 (dynamic languages running on .NET) are all dead as far as I can tell, and the static-vs-dynamic-typing pendulum has been swinging back towards static with the increasing popularity of languages like Go, Rust, and TypeScript (even Python supports type annotations these days). All that said, I still believe there are niches in C# well served by dynamic – “quick and dirty” data loading without declaring types, and loading heterogeneous data. These are both niches Cesil aims to support well, and therefore dynamic support is a first-class feature.

Part of being a first-class feature means that all the flexibility and ease of use from static types is also present when working with dynamic. There aren’t any new types or interfaces, just use Configuration.ForDynamic() instead of Configuration.For<TRow>(), Options.DynamicDefault (which assumes a header row is present) instead of Options.Default (which will detect if a header row is present or not, which isn’t possible with unknown types), and the EnumerateDynamicXXX() methods on CesilUtils. The same readers with the same methods are all available, only now instead of some concrete T you’ll get a dynamic back. And, while dynamic operation does impose additional overhead, Cesil still aims for dynamic operations to be reasonably performant – within a factor of 3 or so of their static equivalent.

Regardless of the Options used, the dynamic rows returned by Cesil always support:

  • Casting to IDisposable
  • Calling the Dispose() method
  • Get accessor with an int (ie. someRow[0]), which returns a dynamic cell
    • This will throw if the int is out of bounds
  • Get accessor with a column name (ie. someRow[“someColumn”]), which returns a dynamic cell
    • If there was no header row present when reading (or if the column name is not found), this will throw
  • Get accessor with an Index (ie. someRow[^1]), which returns a dynamic cell
    • This will throw if the Index is out of bounds
  • Get accessor with a Range (ie. someRow[1..2]), which returns a dynamic row
    • This will throw if the Range is out of bounds
  • Get accessor with a ColumnIdentifier (ie. someRow[ColumnIdentifier.Create(3)]), which returns a dynamic cell

Likewise, regardless of the Options used, dynamic cells (obtained by indexing a dynamic row per above) always support casting to IConvertible. IConvertible is a temperamental interface, so Cesil’s implementation is limited – it doesn’t support non-null IFormatProviders, and makes a very coarse attempt at determining TypeCode. Basically, Cesil does just enough for the various methods on Convert to work “as you’d expect” for dynamic cells.

Just like with static deserialization, the ITypeDescriber on the Options used to create the IBoundConfiguration<TRow> controls how values are mapped to types. The differences are that dynamic conversions are discovered each time they occur (versus once, for static types) and conversion decisions are deferred until a cast (versus happening during reading, for static types). Dynamic deserialization does not allow custom InstanceProviders (as the dynamic backing infrastructure is provided directly by Cesil) – however the XXXWithReuse() methods on I(Async)Reader<TRow> still allow for some control over allocations.

Customization of dynamic conversions can be done with the DynamicRowConverter type (for rows) and the ITypeDescriber.GetDynamicCellParserFor() method (for cells). I’ll dig further into these capabilities in a later post. Out of the box, the DefaultTypeDescriber (used by Options.DynamicDefault) implements the conversions you would expect.

Namely, for dynamic rows Cesil’s defaults allow conversion to:

  • Object
  • Tuples
    • Rows with more than 7 columns can be mapped to nested Tuples using TRest generic parameter
  • ValueTuples, including those with a TRest parameter
    • Rows with more than 7 columns can be mapped to nested ValueTuples using TRest generic parameter
  • IEnumerable<T>
    • Each cell is lazily converted to T
  • IEnumerable
    • Each cell becomes an object, with no conversion occurring
  • Any type with a constructor taking the same number of parameters as the row has columns
    • Each cell is converted to the expected parameter type
  • Any type with a constructor taking zero parameter, provided the row has column names
    • Any properties (public or private, static or instance) whose name matches a column name will be set to the column’s value

If no conversion is possible, Cesil will raise an exception. If a conversion is chosen that requires converting cells to static values, those conversions may also fail and raise exceptions.

For dynamic cells, Cesil’s defaults allow conversion to:

As with rows, finding no conversion or having a conversion fail will cause Cesil to raise an exception.

And that covers the why and what of dynamic deserialization in Cesil. This post leaves me with two Open Questions:

  1. Are there any useful dynamic operations around reading that are missing from Cesil?
  2. Do the conversions provided by the DefaultTypeDescriber for dynamic rows and cells cover all common use cases?

As before, I’ve opened two issues to gather long form responses.  Remember that, as part of the sustainable open source experiment I detailed in the first post of this series, any commentary from a Tier 2 GitHub Sponsor will be addressed in a future comment or post. Feedback from non-sponsors will receive equal consideration, but may not be directly addressed.

Next time I’ll dive into the write operations Cesil supports, starting with static types.