Committing A Horrible Atrocity Against C#

Title from the other side of the tracks: Bringing Sanity To C# Conditions

Have you played with Roslyn?  It’s Microsoft’s “compiler as a service” framework coming to a future .NET release, with a CTP having been available for a while now.

The focus seems to be on making it really easy to do the sorts of source transformations Visual Studio (and heavy hitting plugins like ReSharper) specialize in.  Better code generation, static analysis, and what-have-you tooling will probably be a nice result as well.

But you can do some really evil/awesome things as well thanks to Roslyn failing very gracefully in the presence of malformed code, by design.

For example, have you ever wanted C# to be truthy?

I’m told my sense of fun is a little warped.

Truthiness is coalescing certain values that are not strictly boolean to true/false; making null a stand-in for false, for example.  This is a common feature in dynamic languages: Python, Ruby, and of course JavaScript all have some notion of truthiness.

I personally kind of hate truthiness (I’d appreciate it as an explicit unary operator though), but I realize this is a religious position.  I also hate mayonnaise, but don’t begrudge its existence.

But if you do like truthiness, or like writing Fun code like I do, Roslyn makes it possible to cram it into C#.

I’m choosing to define Truthiness In C# as:

  • null, the empty string, and the defaults of ValueTypes are false
  • everything else is true
  • only available in control statements, not as a type coercion

This definition is really easy to change, for reasons that will become apparent later, and is basically Python’s take (minus collections and dictionaries) so it’s not too crazy.

I want this to be a build step, so the flow for compiling with truthiness is:

  1. Take a (possibly invalid due to truthy statements) C# program
  2. Find all the bits that assume truthiness
  3. Replace those bits with equivalent and valid vanilla C#
  4. Hand the results off to the actual C# compiler

Roslyn actually can emit assemblies (in theory, I haven’t tried in the CTP), but for the sake of brevity I’m choosing to stop the process early and write new .cs files to disk.

Finding bits that assume truthiness is quite simple, because Roslyn doesn’t just blow up on an invalid program; it does its best to give you everything that can be gleaned from malformed code, just like Visual Studio.

This lets us use a SyntaxWalker to visit each node of the AST for our dodgy program and still mostly make sense of it.  If we encounter anything where we expect a conditional (inside an if statement, for example) that isn’t provably a boolean, then we’ve probably found a truthy usage.  We don’t do anything with that knowledge yet; we just stash it away for later.

The SyntaxWalker that does this is surprisingly simple.  Once you’ve got a SemanticModel, figuring out the type of an expression is trivial.

private bool IsBool(ExpressionSyntax exp)
{
  var info = Model.GetSemanticInfo(exp);
  var knownType = info.Type;

  return
    knownType != null &&
    knownType.SpecialType == Roslyn.Compilers.SpecialType.System_Boolean;
}
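For anyone following along with the released Roslyn packages rather than the CTP, here is a rough sketch of what such a walker could look like.  It is not the code from this project: it assumes the Microsoft.CodeAnalysis.CSharp NuGet package, uses GetTypeInfo in place of the CTP's GetSemanticInfo, only looks at if statements, and the names TruthyFinder and Candidates are made up for illustration.

```csharp
// Sketch only: written against the released Microsoft.CodeAnalysis.CSharp
// package (an assumption), not the CTP API used elsewhere in this post.
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical walker: collects if-conditions that are not provably bool.
class TruthyFinder : CSharpSyntaxWalker
{
    private readonly SemanticModel model;
    public readonly List<ExpressionSyntax> Candidates = new List<ExpressionSyntax>();

    public TruthyFinder(SemanticModel model) { this.model = model; }

    // Same test as IsBool above, expressed with GetTypeInfo.
    private bool IsBool(ExpressionSyntax exp)
    {
        var type = model.GetTypeInfo(exp).Type;
        return type != null && type.SpecialType == SpecialType.System_Boolean;
    }

    public override void VisitIfStatement(IfStatementSyntax node)
    {
        // Stash the condition for later; nothing is rewritten at this stage.
        if (!IsBool(node.Condition)) Candidates.Add(node.Condition);
        base.VisitIfStatement(node);
    }
}

static class Program
{
    static void Main()
    {
        // Deliberately invalid C#: the first condition is a string.
        var tree = CSharpSyntaxTree.ParseText(
            "class C { void M(bool b) { if (\"hello\") { } if (b) { } } }");
        var compilation = CSharpCompilation.Create(
            "TruthyDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        var finder = new TruthyFinder(compilation.GetSemanticModel(tree));
        finder.Visit(tree.GetRoot());

        // Print whatever was flagged as a truthy candidate.
        foreach (var exp in finder.Candidates)
            Console.WriteLine(exp);
    }
}
```

A fuller version would also need to visit while, do, and for conditions, ternaries, and the operands of !, && and ||.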

Once we’ve walked all source, finding a list of all truthy things, we can do the actual truthy implementation.  The easiest way to do this is to wrap all truthy values in a call to a method that implements the above rules.  A more “correct” way would be to transform the truthy expressions into proper boolean ones, saving a method call and giving the compiler more information to work with.

I naturally went with the easy way, adding this method to every class that uses truthy expressions.  This also makes it very easy to change the truthiness rules, as I alluded to earlier.

private static bool __Truthy(object o)
{
  if (o == null || (o is string && (string)o == string.Empty)) return false;
  var type = o.GetType();
  if (type.IsValueType)
  {
    return !o.Equals(Activator.CreateInstance(type));
  }
  return true;
}
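To make the rules concrete, here is a small standalone program that copies the helper above and runs it over a handful of values.  The TruthyDemo class and the choice of sample values are mine, purely for illustration; the helper is made public only so it can be called from outside the class.

```csharp
using System;

static class TruthyDemo
{
    // Copy of the helper above, public only so the demo can be exercised.
    public static bool __Truthy(object o)
    {
        if (o == null || (o is string && (string)o == string.Empty)) return false;
        var type = o.GetType();
        if (type.IsValueType)
        {
            // A boxed value type is falsy when it equals its type's default.
            return !o.Equals(Activator.CreateInstance(type));
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(__Truthy(null));               // False
        Console.WriteLine(__Truthy(""));                 // False
        Console.WriteLine(__Truthy("hello"));            // True
        Console.WriteLine(__Truthy(0));                  // False
        Console.WriteLine(__Truthy(1));                  // True
        Console.WriteLine(__Truthy(0.0));                // False
        Console.WriteLine(__Truthy(default(DateTime)));  // False
        Console.WriteLine(__Truthy(new object()));       // True
        Console.WriteLine(__Truthy(new int[0]));         // True: empty collections are not falsy here
    }
}
```

The last line is where these rules part ways with Python: an empty array is a non-null reference type, so it counts as true.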

There’s a bit of finesse in the actual wrapping of expressions in calls to __Truthy.  If an expression contains a sub-expression that is itself truthy, we want to replace the sub-expression first and re-walk the SyntaxTree.  This is because whether or not an expression is truthy depends on its sub-expressions: (true || "") is truthy, but (true || __Truthy("")) is not, since both operands are now proper booleans.  There’s also a little bit of work spent detecting whether something is already being passed to __Truthy, so we don’t end up with __Truthy(__Truthy("")) or similar; this is mostly caused by ternary conditionals just being weird, relatively speaking.
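The wrapping step itself could be sketched roughly like this.  Again, this is an illustration against the released Microsoft.CodeAnalysis.CSharp package (an assumption), not the project's actual code; TruthyWrap, Wrap, and IsTruthyCall are hypothetical names, and the double-wrap guard here is a simplified stand-in that only checks whether the expression is itself a __Truthy call.

```csharp
// Sketch only: assumes the released Microsoft.CodeAnalysis.CSharp package.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class TruthyWrap
{
    // Simplified guard: is this expression already a call to __Truthy?
    public static bool IsTruthyCall(ExpressionSyntax exp)
    {
        var call = exp as InvocationExpressionSyntax;
        var name = call == null ? null : call.Expression as IdentifierNameSyntax;
        return name != null && name.Identifier.ValueText == "__Truthy";
    }

    // Wraps exp as __Truthy(exp), moving the original leading and trailing
    // trivia (whitespace, comments) onto the new call.
    public static ExpressionSyntax Wrap(ExpressionSyntax exp)
    {
        if (IsTruthyCall(exp)) return exp;

        var argument = SyntaxFactory.Argument(exp.WithoutTrivia());
        return SyntaxFactory.InvocationExpression(
                SyntaxFactory.IdentifierName("__Truthy"),
                SyntaxFactory.ArgumentList(
                    SyntaxFactory.SingletonSeparatedList(argument)))
            .WithTriviaFrom(exp);
    }

    static void Main()
    {
        var once = Wrap(SyntaxFactory.ParseExpression("ud.AStringProp"));
        var twice = Wrap(once); // the guard should hand back the same node

        Console.WriteLine(once.ToFullString());
        Console.WriteLine(twice.ToFullString());
    }
}
```

In a real pass the wrapped node would be swapped into the tree (ReplaceNode is the usual tool) and the tree re-walked, so that outer expressions are re-checked against their rewritten children.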

The full project on GitHub is an executable that transforms a whole Visual Studio .csproj, in a rather haphazard fashion.  You need the referenced assemblies to get a SemanticModel, which I’m extracting from .csprojs for convenience.

To illustrate the transformation, here’s some truthy C#.

static void Main(string[] args)
{
  UserDefined ud = new UserDefined { ABoolProp = false, AStringProp = "" };
  bool a = true, b = false;

  if ("hello world") Console.WriteLine("String literals are truthy");
  if (1) Console.WriteLine("int literals are truthy");
  if (!null) Console.WriteLine("null is falsy");
  if (!ud.AStringProp) Console.WriteLine("Member access works");
  if (ud.AStringProp.Length || !ud.ABoolProp) Console.WriteLine("Chained member access works");

  if (a || b || 0) Console.WriteLine("Normal bools aren't mangled");
  if (true) Console.WriteLine("Normal literals aren't mangled");

  string str = a ? "hello" : "world";

  if (str == "hello") Console.WriteLine("Normal ternary conditionals aren't mangled");

  for (int i = 0; i < 3 && (ud.AnIntMethod() ? "hello" : "world"); i++)
  {
    Console.WriteLine(i + " complicated condition");
  }
}

This gets transformed into

static void Main(string[] args)
{
  UserDefined ud = new UserDefined { ABoolProp = false, AStringProp = "" };
  bool a = true, b = false;

  if (__Truthy("hello world")) Console.WriteLine("String literals are truthy");
  if (__Truthy(1)) Console.WriteLine("int literals are truthy");
  if (!__Truthy(null)) Console.WriteLine("null is falsy");
  if (!__Truthy(ud.AStringProp)) Console.WriteLine("Member access works");
  if (__Truthy(ud.AStringProp.Length) || !ud.ABoolProp) Console.WriteLine("Chained member access works");

  if (a || b || __Truthy(0)) Console.WriteLine("Normal bools aren't mangled");
  if (true) Console.WriteLine("Normal literals aren't mangled");

  string str = a ? "hello" : "world";

  if (str == "hello") Console.WriteLine("Normal ternary conditionals aren't mangled");

  for (int i = 0; i < 3 && (__Truthy(__Truthy(ud.AnIntMethod()) ? "hello" : "world")); i++)
  {
    Console.WriteLine(i + " complicated condition");
  }
}

Notice that spacing is maintained; Roslyn round-trips that sort of thing, which is very nice.

And of course, the code runs and outputs the expected text:

String literals are truthy
int literals are truthy
null is falsy
Member access works
Chained member access works
Normal bools aren't mangled
Normal literals aren't mangled
Normal ternary conditionals aren't mangled
0 complicated condition
1 complicated condition
2 complicated condition

There are, as always, caveats.  Lots in this case, as Roslyn is still a CTP and there are a number of C# features it doesn’t handle yet.  Using var or lambdas will mess things right up, and I’m sure there are some outright bugs in both Roslyn and my dinky code.

But isn’t it cool that it’s possible to abuse C# even this much already?

