Extending Type Inferencing in C#

Posted: 2012/06/10 | Author: kevinmontrose | Filed under: code

A while ago I used the first Roslyn CTP to hack truthiness into C#. With the second Roslyn CTP dropping on June 5th, now pretty close to feature complete, it’s time to think up some fresh language hacks.

Roslyn What Now?

For those unfamiliar, Roslyn is Microsoft’s “Compiler as a Service” for the VB.NET and C# languages. What makes Roslyn so interesting is that it exposes considerably more than “Parse” and “Compile” methods; you also have full access to powerful code transformations and extensive type information.

Roslyn is also very robust in the face of errors, making the most sense it can of malformed code and providing as complete a model as possible. This is very handy for my evil purposes.

Picking A Defect

Now I happen to really like C#; it’s a nice, fast, fairly rapidly evolving, statically typed language. You get first class functions, a garbage collector, a good type system, and all that jazz. Of course, no language is perfect, so I’m going to pick one of my nits with C# and hack up a dialect that addresses it using Roslyn.

The particular defect is that you must specify return types in a method declaration; in my opinion this is often pointless repetition.

Consider this method:

static MvcHtmlString ToDateOnlySpanPretty(DateTime dt, string cssClass)
{
  return MvcHtmlString.Create(String.Format(@"<span title=""{0:u}"" class=""{1}"">{2}</span>", dt, cssClass, ToDateOnlyStringPretty(dt, DateTime.UtcNow)));
}

Is that first MvcHtmlString really necessary? Things get worse when you start returning generic types; oftentimes I find myself writing code like:

Dictionary<string, int> CalcStatistics()
{
  var result = new Dictionary<string, int>();
  // ...
  return result;
}

Again, the leading Dictionary<string, int> is not really needed; the type returned is quite apparent, so we’re just wasting keystrokes.

This pointless repetition was addressed for local variables in C# 3.0 with the var keyword. With Roslyn, we can hack up a pre-processor that allows var as a method’s return type. The above would become the following:

var CalcStatistics()
{
  var result = new Dictionary<string, int>();
  // ...
  return result;
}

The Rules

Wanton type inferencing can be a bit dangerous; for one, it makes breaking contracts pretty easy. So I’m imposing a few constraints on where “var as a return” can be used. For the most part these are arbitrary and easily changed.

- var can only be used on non-public and non-protected methods
- if a method returns multiple types, a type will be chosen to which all returned types have an implicit conversion, with the following preferences:
  - classes before interfaces
  - derived types before base types
  - generic types before non-generic types
  - in alphabetical order
  - with the exceptions that Object is always considered last and IEnumerable (and IEnumerable<T>) are considered before other interfaces
- a method with empty returns will become void

The last rule may be a little odd as C# doesn’t have a notion of a “void type” really, which is something of a defect itself in my opinion. Which type is chosen for a return is well-defined so it’s predictable (always a must in a language feature) and attempts to match “what you meant”.
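To make the ordering concrete, here is a rough sketch of the same preferences written against System.Type and reflection rather than Roslyn symbols. It is not the project's code: ReturnTypeChooser, Choose, and the Depth measure are names and choices made up for illustration, and IsAssignableFrom only approximates "has an implicit conversion" (it ignores numeric and user-defined conversions).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ReturnTypeChooser
{
    // The type itself, its base classes, and every interface it implements.
    private static IEnumerable<Type> Candidates(Type t)
    {
        for (var cur = t; cur != null; cur = cur.BaseType) yield return cur;
        foreach (var iface in t.GetInterfaces()) yield return iface;
    }

    // Rough "how derived is this" measure (an assumption of this sketch):
    // base-class chain length for classes, inherited interface count for interfaces.
    private static int Depth(Type t)
    {
        if (t.IsInterface) return t.GetInterfaces().Length;
        var depth = 0;
        for (var cur = t.BaseType; cur != null; cur = cur.BaseType) depth++;
        return depth;
    }

    private static bool IsEnumerable(Type t)
    {
        return t == typeof(IEnumerable) ||
               (t.IsGenericType && t.GetGenericTypeDefinition() == typeof(IEnumerable<>));
    }

    // Picks one type that every returned type can be assigned to.
    public static Type Choose(params Type[] returned)
    {
        return returned
            .SelectMany(t => Candidates(t))
            .Concat(new[] { typeof(object) })                        // always a fallback
            .Distinct()
            .Where(c => returned.All(r => c.IsAssignableFrom(r)))
            .OrderBy(c => c == typeof(object))                       // Object is always last
            .ThenBy(c => c.IsInterface)                              // classes before interfaces
            .ThenBy(c => c.IsInterface && !IsEnumerable(c))          // IEnumerable before other interfaces
            .ThenByDescending(c => Depth(c))                         // derived before base
            .ThenByDescending(c => c.IsGenericType)                  // generic before non-generic
            .ThenBy(c => c.FullName, StringComparer.Ordinal)         // alphabetical tie-break
            .First();
    }
}
```

Under these assumptions, a method that returns both a List<int> and an int[] would be given IEnumerable<int>, and one that returns ArgumentNullException and ArgumentOutOfRangeException would be given ArgumentException.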

I made one more handy extension, which is allowing var returning methods to return anonymous types. You can sort of do this now, either by passing around Object or by using horrible grotty hacks (note: don’t actually do that), but since there’s no name you can’t do it cleanly. Since var returning methods don’t need names, I figured I might as well address that too.
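For a sense of why the Object route isn't clean, here is a small sketch in plain C#. The names AnonymousReturnWorkaround, MakePoint, and ReadX are invented for illustration. The method has to declare object, and the caller has to dig the property back out with reflection.

```csharp
using System;

// Hypothetical example of the "pass around Object" workaround.
public static class AnonymousReturnWorkaround
{
    // The anonymous type has no name we can write, so the method declares object.
    public static object MakePoint()
    {
        return new { X = 1, Y = 2 };
    }

    // The caller has lost static typing and falls back to reflection.
    public static int ReadX(object point)
    {
        return (int)point.GetType().GetProperty("X").GetValue(point, null);
    }
}
```

A typo in "X" here only shows up at runtime, which is exactly the sort of thing a nameable return type avoids.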

Let’s See Some Code

Actually loading a solution and parsing/compiling it is simple (and boring); check out the source for how to do it.

The first interesting bit is finding all methods that are using “var” as a return type.

// eligibleMethods is a List<MethodDeclarationSyntax>
var needsInferencing =
  eligibleMethods
    .Where(
      w => (w.ReturnType is IdentifierNameSyntax) &&
           ((IdentifierNameSyntax)w.ReturnType).IsVar
    ).ToList();

This literally says “find the methods that have return type tokens which are ‘var’”. Needless to say, this would be pretty miserable to do from scratch.

Next interesting bit, we grab all the return statements in a method and get the types they return.

var types = new List<TypeInfo>();
// returns is a List<ReturnStatementSyntax>
foreach (var ret in returns)
{
  var exp = ret.Expression;

  if (exp == null)
  {
    types.Add(TypeInfo.None);
    continue;
  }

  var type = model.GetTypeInfo(exp);
  types.Add(type);
}

Note the “model.GetTypeInfo()” call there; that’s Roslyn providing us with detailed type information (which we’ll be consuming in a second) despite the fact that the code doesn’t actually compile successfully at the moment.

We move on to the magic of actually choosing a named return type. Roslyn continues to give us a wealth of type information, so all possible types are pretty easy to get (and ordering them is likewise simple).

// info is an IEnumerable<TypeInfo>
var allPossibilities = info.SelectMany(i => i.Type.AllInterfaces).OfType<NamedTypeSymbol>().ToList();
allPossibilities.AddRange(info.Select(s => s.Type).OfType<NamedTypeSymbol>());
foreach (var i in info)
{
  var @base = i.Type.BaseType;
  while (@base != null)
  {
    allPossibilities.Add(@base);
    @base = @base.BaseType;
  }
}

Rewriting the method with a new return type is a single method call, and replacing the “from source” method with our modified one is just as simple.
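As a hedged illustration of that rewrite step, the sketch below uses the released Microsoft.CodeAnalysis.CSharp NuGet package rather than the 2012 CTP, so its API names differ from the snippets above. VarReturnRewriter and Rewrite are hypothetical names, and the replacement type is passed in as a string instead of being inferred from the semantic model.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper; assumes a reference to Microsoft.CodeAnalysis.CSharp.
public static class VarReturnRewriter
{
    // Replaces every 'var' return type in 'source' with 'typeName'.
    // A real tool would work out typeName per method, as described above.
    public static string Rewrite(string source, string typeName)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();

        // Find the method declarations whose return type is the identifier 'var'.
        var varMethods = root.DescendantNodes()
            .OfType<MethodDeclarationSyntax>()
            .Where(m => m.ReturnType.IsVar)
            .ToList();

        // Swap in the new return type, keeping the original whitespace and comments.
        var newRoot = root.ReplaceNodes(
            varMethods,
            (original, rewritten) => rewritten.WithReturnType(
                SyntaxFactory.ParseTypeName(typeName)
                    .WithTriviaFrom(rewritten.ReturnType)));

        return newRoot.ToFullString();
    }
}
```

Syntax trees are immutable, so both WithReturnType and ReplaceNodes hand back new nodes; that is why the replacement is expressed as "build a modified method, then build a modified root".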

And that’s basically it for the non-anonymous type case.

Anonymous Types

For returning anonymous types, everything up to the “rewrite the method” step is basically the same. The trouble is, even though we know the “name” of the anonymous type we can’t use it. What we need to do instead is “hoist” the anonymous type into an actual named type and return that instead.

This is kind of complicated, but you can decompose it into the following steps:

1. Make a note of returned anonymous types
2. Rewrite methods to return a new type (that doesn’t exist yet)
3. Rewrite anonymous type initializers to use the new type
4. Create the new type declaration

Steps #1 and #2 are easy; Roslyn doesn’t care that your transformation doesn’t make sense yet.

Step #3 requires a SyntaxRewriter and a way to compare anonymous types for equality. The rewriter is fairly simple; the syntax for an anonymous type initializer and a named one differ very little.

Comparing anonymous types is a little more complicated. By the C# spec, anonymous types are considered equivalent if they have the same properties (by name and type) declared in the same order.
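You can see that rule from plain C#, no Roslyn required. In this small sketch (AnonymousTypeIdentity and its method names are made up for illustration), two initializers with the same property names and types in the same order share one compiler-generated type, while swapping the order produces a different one.

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class AnonymousTypeIdentity
{
    // Same names, same types, same order: one shared anonymous type.
    public static bool SameShapeSharesType()
    {
        var first = new { Name = "a", Count = 1 };
        var second = new { Name = "b", Count = 2 };
        return first.GetType() == second.GetType();   // true
    }

    // Same names and types, different order: two distinct anonymous types.
    public static bool ReorderedSharesType()
    {
        var first = new { Name = "a", Count = 1 };
        var reordered = new { Count = 1, Name = "a" };
        return first.GetType() == reordered.GetType(); // false
    }
}
```

This is why the comparison has to walk the properties in declaration order rather than treating them as a set.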

Roslyn gives us the information we need just fine, so I threw a helper together for it:

internal static bool AreEquivalent(TypeSymbol a, TypeSymbol b, Compilation comp)
{
  var aMembers = a.GetMembers().OfType<PropertySymbol>().ToList();
  var bMembers = b.GetMembers().OfType<PropertySymbol>().ToList();
  if (aMembers.Count != bMembers.Count) return false;
  for (var i = 0; i < aMembers.Count; i++)
  {
    var aMember = aMembers[i];
    var bMember = bMembers[i];
    if (aMember.Name != bMember.Name) return false;
    if (aMember.DeclaredAccessibility != bMember.DeclaredAccessibility) return false;
    var aType = aMember.Type;
    var bType = bMember.Type;
    var aName = aType.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat);
    var bName = bType.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat);
    if (aName == bName) continue;
    var conv = comp.ClassifyConversion(aType, bType);
    if (!conv.IsIdentity) return false;
  }
  return true;
}

Notice that Roslyn also gives us details about what conversions exist between two types, another thing that would be absolutely hellish to implement yourself.

The final step, adding the new type, is the biggest in terms of code although it’s not really hard to understand (I also cheat a little). Our hoisted type needs to quack enough like an anonymous type to not break any code, which means it needs all the expected properties and to override Equals, GetHashCode, and ToString.

This is all contained in one large method. Rather than reproduce it here, I’ll show what it does.

Take the anonymous type:

new
{
  A = "Hello",
  B = 123,
  C = new Dictionary<string, string>()
}

This will get a class declaration similar to:

internal class __FTI88a733fde28546b8ae4f36786d8446ec {
  public string A { get; set; }
  public int B { get; set; }
  public global::System.Collections.Generic.Dictionary<string, string> C { get; set; }
  public override string ToString()
  {
    return new{A,B,C}.ToString();
  }
  public override int GetHashCode()
  {
    return new{A,B,C}.GetHashCode();
  }
  public override bool Equals(object o)
  {
    __FTI88a733fde28546b8ae4f36786d8446ec other = o as __FTI88a733fde28546b8ae4f36786d8446ec;
    if(other == null) return new{A,B,C}.Equals(o);
    return
      (A != null ? A.Equals(other.A) :
      (other.A != null ? other.A.Equals(A) : true )) &&
      B == other.B &&
      (C != null ? C.Equals(other.C) : (other.C != null ? other.C.Equals(C) : true ));
  }
}

The two big cheats here are the lack of a constructor (so, unlike a real anonymous type, the hoisted type is technically mutable; this is fixable but I don’t think it’s needed for a proof-of-concept) and using the equivalent anonymous type itself to implement the required methods.
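For the curious, fixing the first cheat might look something like the sketch below. This is not what the project generates: HoistedExample is a hypothetical stand-in for the generated __FTI… name, and the rewriter would also have to turn each object initializer into a constructor call. It keeps the second cheat (delegating ToString and GetHashCode to an equivalent anonymous type) and simplifies Equals to return false for anything that isn't the hoisted type.

```csharp
using System.Collections.Generic;

// Hypothetical immutable variant of the hoisted type.
internal sealed class HoistedExample
{
    private readonly string a;
    private readonly int b;
    private readonly Dictionary<string, string> c;

    public HoistedExample(string a, int b, Dictionary<string, string> c)
    {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // Get-only properties, like a real anonymous type.
    public string A { get { return a; } }
    public int B { get { return b; } }
    public Dictionary<string, string> C { get { return c; } }

    // Still leaning on the equivalent anonymous type for these two.
    public override string ToString() { return new { A, B, C }.ToString(); }
    public override int GetHashCode() { return new { A, B, C }.GetHashCode(); }

    public override bool Equals(object o)
    {
        var other = o as HoistedExample;
        if (other == null) return false;
        // EqualityComparer<T>.Default handles nulls, replacing the nested ternaries.
        return EqualityComparer<string>.Default.Equals(A, other.A) &&
               B == other.B &&
               EqualityComparer<Dictionary<string, string>>.Default.Equals(C, other.C);
    }
}
```

Two instances built from the same values compare equal and hash the same, which is the behavior code written against the original anonymous type would expect.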

Conclusion

I’m pretty pleased with Roslyn thus far; it’s plenty powerful. There are still a bunch of limitations and unimplemented features in the current CTP though, so be aware of them before you embark on some recreational hacking.

In terms of complaints, some bits are a little verbose (though that’s gotten better since the first CTP), documentation is still lacking, and there are some fun threading assumptions that make debugging a bit painful. I’m sure I’m Doing It Wrong™ in some places, so some of my complaints may be hogwash; I expect improvements in each subsequent release as well.

If it wasn’t apparent, the code here is all proof-of-concept stuff. Don’t use it in production, don’t expect it to be bug free, etc. etc.

You can check out the whole project on GitHub.

Kevin Montrose