Sigil: Adding Some (More) Magic To IL
Posted: 2013/02/14
A nifty thing you can do in .NET is generate bytecode (properly Common Intermediate Language [CIL], formerly Microsoft Intermediate Language [MSIL], commonly called just IL) on the fly. Previously I’ve used it to do dumb things with strings, and build a serializer on top of protobuf-net. Over at Stack Exchange it’s used in small but critical parts of our API, in our caching layer via protobuf-net, and in our micro-ORM Dapper.
The heart of IL generation is the ILGenerator class which lets you emit individual opcodes and keeps track of labels, locals, try/catch/finally blocks, and stack depth.
To illustrate .NET’s built-in IL generation, here’s how you’d add 1 & 2:
    var method = new DynamicMethod("AddOneAndTwo", typeof(int), Type.EmptyTypes);
    var il = method.GetILGenerator();
    il.Emit(OpCodes.Ldc_I4, 1);
    il.Emit(OpCodes.Ldc_I4, 2);
    il.Emit(OpCodes.Add);
    il.Emit(OpCodes.Ret);
    var del = (Func<int>)method.CreateDelegate(typeof(Func<int>));
    del(); // returns 3
But…
ILGenerator is quite powerful, but it leaves a lot to be desired in terms of ease of use. For example, leave one of the Ldc_I4’s out of the above…
    var method = new DynamicMethod("AddOneAndTwo", typeof(int), Type.EmptyTypes);
    var il = method.GetILGenerator();
    il.Emit(OpCodes.Ldc_I4, 1);
    // Whoops, we left out the 2!
    il.Emit(OpCodes.Add);
    il.Emit(OpCodes.Ret);
    var del = (Func<int>)method.CreateDelegate(typeof(Func<int>));
    del();
And what happens? You’d expect an error to be raised when we emit the Add opcode, but I’d understand deferring verification until the delegate was actually created.
Of course nothing’s ever easy, and what actually happens is an InvalidProgramException is thrown when the delegate is first used with a phenomenally unhelpful “Common Language Runtime detected an invalid program.” message. Most of the time, ILGenerator gives you no indicator as to where or why you went wrong.
Frustrations I’ve had with ILGenerator, in descending severity:
- Fails very late during code generation, and doesn’t indicate what went wrong
- Allows obviously malformed instructions, like Emit(OpCodes.Ldc_I4, “hello”)
- Lack of validation around “native int” allows for code that only works on specific architectures
Enter Sigil
Naturally I have a solution, and that solution’s name is Sigil (defined as “an inscribed or painted symbol considered to have magical power”, pronounced “Si-jil”). Sigil wraps ILGenerator, exposes much less error prone alternatives to Emit, and does immediate verification of the instruction stream.
The erroneous code above becomes:
    var il = Emit<Func<int>>.NewDynamicMethod("AddOneAndTwo");
    il.LoadConstant(1);
    // Still missing that 2!
    il.Add();
    il.Return();
    var del = il.CreateDelegate();
    del();
And Sigil throws an exception at il.Add() with the much more helpful message “Add expects 2 values on the stack”. Also notice how Sigil does away with all that nasty casting.
Sigil does much more than just checking that enough values are on the stack. It’ll catch type mismatches (including esoteric ones, like trying to add a float and a double), illegal control transfers (like branching out of catch blocks), and bad method calls.
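To make the idea of immediate verification concrete, here is a minimal sketch of a wrapper that fails at the offending call instead of at first use. It is not Sigil's implementation: the CheckedEmitter class and its error messages are hypothetical, and it only counts values on the stack, whereas Sigil also tracks their types.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical wrapper for illustration only (not part of Sigil).
// It tracks how many values are on the evaluation stack and refuses
// to emit an instruction that would underflow it.
public sealed class CheckedEmitter
{
    private readonly ILGenerator _il;
    private int _depth; // values currently on the evaluation stack

    public CheckedEmitter(ILGenerator il) { _il = il; }

    public void LoadConstant(int value)
    {
        _il.Emit(OpCodes.Ldc_I4, value);
        _depth++; // pushes one value
    }

    public void Add()
    {
        if (_depth < 2)
            throw new InvalidOperationException(
                "Add expects 2 values on the stack, found " + _depth);
        _il.Emit(OpCodes.Add);
        _depth--; // pops two values, pushes one
    }

    public void Return()
    {
        // This sketch only handles methods that return a value.
        if (_depth != 1)
            throw new InvalidOperationException(
                "Return expects exactly 1 value on the stack, found " + _depth);
        _il.Emit(OpCodes.Ret);
        _depth = 0;
    }
}

public static class CheckedEmitterDemo
{
    // Repeats the broken "add" example and reports where it fails.
    public static string TryBuildBrokenAdd()
    {
        var method = new DynamicMethod("AddOneAndTwo", typeof(int), Type.EmptyTypes);
        var il = new CheckedEmitter(method.GetILGenerator());
        il.LoadConstant(1);
        try
        {
            il.Add(); // fails here, before any delegate exists
            return "no error";
        }
        catch (InvalidOperationException e)
        {
            return e.Message;
        }
    }
}
```

Calling CheckedEmitterDemo.TryBuildBrokenAdd() returns the message raised by Add(), so the mistake is reported at the line that caused it.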
Data For Debugging
In addition to not failing helpfully, ILGenerator doesn’t give you much to go on when it does fail. You don’t get an instruction listing or stack states, and your locals and labels are nothing but indexes and offsets.
When verification fails using Sigil, the full instruction stream to date and the current state of the stack (possibly two stacks, if a branch is involved) are captured by the thrown SigilVerificationException. Every local and label gets a name in the instruction listing (which you can override), and the values on the stack that caused the failure are indicated.
For example…
    var il = Emit<Func<string, Func<string, int>, string>>.NewDynamicMethod("E1");
    var invoke = typeof(Func<string, int>).GetMethod("Invoke");
    var notNull = il.DefineLabel("not_null");
    il.LoadArgument(0);
    il.LoadNull();
    il.UnsignedBranchIfNotEqual(notNull);
    il.LoadNull();
    il.Return();
    il.MarkLabel(notNull);
    il.LoadArgument(1);
    il.LoadArgument(0);
    il.CallVirtual(invoke);
    il.Return();
    var d1 = il.CreateDelegate();
… throws a SigilVerificationException on Return(), and calling GetDebugInfo() on it gives you the following:
    Top of stack
    ------------
    System.Int32 // Bad value

    Instruction stream
    ------------------
    ldarg.0
    ldnull
    bne.un not_null
    ldnull
    ret

    not_null:
    ldarg.1
    ldarg.0
    callvirt Int32 Invoke(System.String)
You still have to puzzle through it, but it’s a lot easier to see what went wrong (the int returned from the passed delegate needs to be converted to a string before calling Return()).
But Wait, There’s More
Since Sigil is already doing some correctness validation that requires waiting until a method is “finished” (like making sure branches end up with their “expected” stacks), it has all it needs to automate a lot of tedious optimizations you typically do by hand when using ILGenerator.
For example, “Emit(OpCodes.Ldc_I4, {count})” shouldn’t be used if {count} is between -1 and 8; but who wants to remember that, especially if you’re rapidly iterating? Similarly, almost every branching instruction has a short form you should use when the offset (in bytes, not instructions) fits into a single byte. Sigil automates all of that; you just call “LoadConstant” or “Branch” and move on.
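Done by hand against a raw ILGenerator, the constant-loading rule above looks roughly like the sketch below. The ConstantEmitter class and its method names are hypothetical and this is not Sigil's code; it only uses opcodes that exist in System.Reflection.Emit (the single-byte ldc.i4.m1 through ldc.i4.8 forms, ldc.i4.s for values that fit in a signed byte, and plain ldc.i4 otherwise).

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper for illustration only (not part of Sigil).
public static class ConstantEmitter
{
    // Dedicated single-byte opcodes for the constants -1 through 8.
    private static readonly OpCode[] ShortForms =
    {
        OpCodes.Ldc_I4_M1, OpCodes.Ldc_I4_0, OpCodes.Ldc_I4_1, OpCodes.Ldc_I4_2,
        OpCodes.Ldc_I4_3, OpCodes.Ldc_I4_4, OpCodes.Ldc_I4_5, OpCodes.Ldc_I4_6,
        OpCodes.Ldc_I4_7, OpCodes.Ldc_I4_8
    };

    // Emits the most compact instruction that loads the given int constant.
    public static void EmitLoadConstant(ILGenerator il, int value)
    {
        if (value >= -1 && value <= 8)
        {
            il.Emit(ShortForms[value + 1]);           // e.g. ldc.i4.2
        }
        else if (value >= sbyte.MinValue && value <= sbyte.MaxValue)
        {
            il.Emit(OpCodes.Ldc_I4_S, (sbyte)value);  // one-byte operand
        }
        else
        {
            il.Emit(OpCodes.Ldc_I4, value);           // four-byte operand
        }
    }

    // Builds and runs a dynamic method that adds two constants,
    // to check that every encoding loads the intended value.
    public static int AddConstants(int a, int b)
    {
        var method = new DynamicMethod("AddConstants", typeof(int), Type.EmptyTypes);
        var il = method.GetILGenerator();
        EmitLoadConstant(il, a);
        EmitLoadConstant(il, b);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);
        var del = (Func<int>)method.CreateDelegate(typeof(Func<int>));
        return del();
    }
}
```

ConstantEmitter.AddConstants(1, 2) returns 3 using two single-byte instructions, while AddConstants(100, 1000) goes through the ldc.i4.s and ldc.i4 paths.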
Sigil also automates picking the appropriate version of some opcodes based on type. In raw IL, there are separate instructions for loading bytes, ints, arbitrary ValueTypes, and reference types from an array. Using ILGenerator you’d have to pick the appropriate opcode, but with Sigil you just call “LoadElement()” and the preceding instructions are used to figure it out.
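For comparison, here is a rough sketch of making that choice yourself with ILGenerator. The ElementLoader class is hypothetical and covers only a few element types; unlike Sigil's LoadElement(), which works from the preceding instructions, this helper has to be told the element type by the caller.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper for illustration only (not part of Sigil).
public static class ElementLoader
{
    // Picks an array-load opcode from the element type supplied by the caller.
    public static void EmitLoadElement(ILGenerator il, Type elementType)
    {
        if (!elementType.IsValueType)
            il.Emit(OpCodes.Ldelem_Ref);          // any reference type
        else if (elementType == typeof(sbyte))
            il.Emit(OpCodes.Ldelem_I1);
        else if (elementType == typeof(byte))
            il.Emit(OpCodes.Ldelem_U1);
        else if (elementType == typeof(int))
            il.Emit(OpCodes.Ldelem_I4);
        else
            il.Emit(OpCodes.Ldelem, elementType); // general form, takes a type token
    }

    // Builds a delegate equivalent to (array, index) => array[index].
    public static Func<T[], int, T> BuildGetter<T>()
    {
        var method = new DynamicMethod(
            "GetElement", typeof(T), new[] { typeof(T[]), typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                 // the array
        il.Emit(OpCodes.Ldarg_1);                 // the index
        EmitLoadElement(il, typeof(T));
        il.Emit(OpCodes.Ret);
        return (Func<T[], int, T>)method.CreateDelegate(typeof(Func<T[], int, T>));
    }
}
```

ElementLoader.BuildGetter<int>() and ElementLoader.BuildGetter<string>() emit different opcodes but are called the same way, for example BuildGetter<int>()(new[] { 10, 20, 30 }, 1) yields 20.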
Finally, Sigil detects when the Tailcall and Readonly prefixes can be used and inserts them into the command stream. It’s not possible to detect when the Volatile and Unaligned prefixes should be inserted (at least so far as I know), but Sigil does only allow them to be added in conjunction with opcodes they’re legal on, which is still better than ILGenerator.
Unconditional Branch Caveat
There is one pain point Sigil does not yet address, though I have plans. Right now, Sigil requires type assertions immediately after unconditional branches (Br, and Leave to be precise) as it’s incapable of inferring the state of the stack in this case. This doesn’t come up quite as much as you’d expect, since truly unconditional branches are rare; especially when creating DynamicMethods.
Asserting types is attached to marking labels, and looks like the following:
    var il = Emit<Func<int>>.NewDynamicMethod();
    var b0 = il.DefineLabel("b0"), b1 = il.DefineLabel("b1"), b2 = il.DefineLabel("b2");
    il.LoadConstant("abc");
    il.Branch(b0); // jump to b0 with "abc"
    il.MarkLabel(b1, new [] { typeof(int) }); // incoming: 3
    il.LoadConstant(4);
    il.Call(typeof(Math).GetMethod("Max", new[] { typeof(int), typeof(int) }));
    il.Branch(b2); // jump to b2 with 4
    il.MarkLabel(b0, new[] { typeof(string) }); // incoming: "abc"
    il.CallVirtual(typeof(string).GetProperty("Length").GetGetMethod());
    il.Branch(b1); // jump to b1 with 3
    il.MarkLabel(b2, new[] { typeof(int) }); // incoming: 4
    il.Return();
You can assert types along with any MarkLabel call; in cases where Sigil can infer the stack state, a SigilVerificationException will be thrown when there’s a mismatch.
Check It Out, Try It Out, Break It
Sigil’s source is on GitHub, and it’s available on NuGet.
While I’ve done a fair amount of testing and converted some projects from ILGenerator to Sigil to flush out bugs, I wouldn’t at all be surprised if there are more. Likewise, I wouldn’t be shocked if Sigil’s validation has some holes or if it’s too strict in some cases.
So grab Sigil and try it out. I love working on this sort of stuff, so don’t be shy about opening issues.
How does this compare to Cecil [1] or Microsoft’s Roslyn [2]?
1 – http://www.mono-project.com/Cecil
2 – http://msdn.microsoft.com/en-us/vstudio/roslyn.aspx
Cecil focuses on modifying and inspecting existing assemblies, Roslyn on compiling, inspecting, and transforming C#.
Sigil only really cares about generating IL; it does nothing with existing code (save verifying that using it is legal). It’s also language agnostic: you’re generating IL, not compiling C#.
So I guess I’d say Cecil and Roslyn are in the same “code generation / dynamism” bucket as Sigil, but are otherwise basically unrelated. Sigil wouldn’t be a replacement for either (assuming you’re using them to their fullest).
Roslyn’s Emit API can be used to generate and execute IL on the fly; for example, see [1]. A drawback with Roslyn is that once IL is emitted, it is difficult to inspect and manipulate.
Cecil’s inspection capabilities are wonderful, but not connected with on-the-fly generation.
Overall, I can see the benefit of something lighter weight and targeted at IL manipulation and generation. Too bad these things are still playing in their own sandbox.
1 – http://stackoverflow.com/questions/10751079/loading-an-assembly-generated-by-the-roslyn-compiler
I was under the impression Roslyn didn’t give you much chance to modify the IL generated, just spitting out assemblies. I haven’t done much with it beyond what I’ve blogged here though, which has all been very source & AST focused.
Sigil is definitely focused on runtime generation, as a replacement for directly using ILGenerator. Though there’s been some interest in other features (like making it play nice with IKVM for multi-platform targeting purposes) I want to focus on that use case until I’m sure it’s solid.
I did a very similar project: https://github.com/Bobris/BTDB/tree/unstable/BTDB/IL
I don’t do any checking so it is more lightweight. But it supports generation of pdb for debugging and even calling private methods with debugging.
Check tests: https://github.com/Bobris/BTDB/blob/unstable/BTDBTest/ILExtensionsTest.cs
There are some tricks to not have so many strings …