Scripting and Profiling Improvements in StackExchange.Redis

At Stack Exchange, we make pretty heavy use of Redis.  We’re so invested we’ve had our best minds build not one, but two different .NET libraries for it.  Recent(ish) experience with the Providence project has prompted additions to the modern library, StackExchange.Redis, in the realms of Lua scripting and Profiling.

Lua Scripting Improvements

StackExchange.Redis already had full support for Redis’s scripting commands when Providence started, but Lua scripting wasn’t really used at StackExchange at that time.  Early versions of Providence started making extensive use of Lua scripts to make atomic updates to our data during our daily refresh process, and we started having painful experiences.  The root cause of our pain was that Redis requires you write your script using KEYS and ARGV arrays to pass parameters.

Here’s a toy example to illustrate the issue.  The command

eval "redis.call('set', KEYS[1], ARGV[1])" 1 theKey 1 theArg

rendered in C# as

IDatabase db = /* ... */;
var theKey = /* ... */;
var theArg = /* ... */;
db.ScriptEvaluate(
  "redis.call('set', KEYS[1], ARGV[1])",
  new RedisKey[] { theKey },
  new RedisValue[] { theArg }
);

sets the key “theKey” to “theArg”.  The actual script in your source would be “redis.call('set', KEYS[1], ARGV[1])”, which is… less than ideal, especially once your scripts start getting complicated.

In recent versions of StackExchange.Redis there’s an alternative: the new LuaScript class.  It lets you refer to variables by name, in a manner inspired by Dapper (the Stack Exchange micro-ORM).

The previous example would now be rendered in C# as:

IDatabase db = /* ... */;
var theKey = /* ... */;
var theArg = /* ... */;
var script = LuaScript.Prepare("redis.call('set', @theKey, @theArg)");
db.ScriptEvaluate(script, new { theKey, theArg });

LuaScript rewrites your script to refer to ARGV and also handles mapping your parameters to the appropriate ARGV index (and places any parameters of type RedisKey into the KEYS array as well, so as to play nicely with Redis Cluster).  You can use any object to specify parameters, so long as it has a public field or property for each @parameter.

To avoid the overhead of transmitting the script text for every call to IDatabase.ScriptEvaluate, you can convert a LuaScript into a LoadedLuaScript like so:

IServer server = /* ... */;
LuaScript script = /* ... */;
LoadedLuaScript loaded = server.ScriptLoad(script);

The LoadedLuaScript class provides the same functionality as the LuaScript class, but uses the SCRIPT LOAD and EVALSHA commands to avoid re-transmitting the entire script when evaluated.
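
Evaluating a LoadedLuaScript looks just like evaluating a LuaScript.  A minimal sketch, assuming the ScriptEvaluate overload that accepts a LoadedLuaScript and the same anonymous-type parameter convention as above:

IDatabase db = /* ... */;
var theKey = /* ... */;
var theArg = /* ... */;

// goes over the wire as EVALSHA with the script's hash, rather than
// re-transmitting the script text
db.ScriptEvaluate(loaded, new { theKey, theArg });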

Profiling Improvements

While Redis is really (really, really) fast, at Stack Overflow we’ve still been profiling its impact on page load times since time immemorial.  Every time a developer looks at Stack Overflow, they’re told how much time was spent talking to Redis.

[Profiler screenshot: Redis accounts for 48.8ms of this page load.]

That 48.8ms includes a fair amount of “moderator only” work too.

The way Stack Overflow uses Redis is analogous to how we use SQL Server: send a query, wait for it to complete, read the response, then proceed.  StackExchange.Redis multiplexes and pipelines, so the application as a whole doesn’t block waiting for responses, but an individual HTTP request does as a consequence of its logic.

Accordingly, Stack Overflow’s Redis profiling code has been conceptually:

IDatabase db = /* ... */;
var watch = Stopwatch.StartNew();
db.StringGet(/* ... */);
watch.Stop();
PushToProfilingStats(watch.Elapsed);

Obviously the actual code is more sophisticated.

Providence, however, works quite differently.  For each request, it dispatches several Redis commands asynchronously and then combines their results once they all finish.

In other words, the Providence Redis code tends to look like:

IDatabase db = /* ... */;
var inFlight =
  new List<Task>
  {
    GatherFooFieldsAsync(),  // does an async redis query or two
    GatherBarFieldsAsync()   // likewise
    /* and so on */
  };
await Task.WhenAll(inFlight);

To better support profiling in situations like Providence’s, StackExchange.Redis now has the IProfiler interface, the RegisterProfiler method, the BeginProfiling method, and the FinishProfiling method.  These new additions let StackExchange.Redis associate dispatched commands with a user-provided context object, even when using the async and await keywords.

Exactly how to determine context is application-specific, but the following code is used in the Providence API project, which is a stock MVC5 website.

// The IProfiler implementation
public class RedisProfiler : IProfiler
{
    const string RequestContextKey = "RequestProfilingContext";

    public object GetContext()
    {
        var ctx = HttpContext.Current;
        if (ctx == null) return null;

        return ctx.Items[RequestContextKey];
    }

    public object CreateContextForCurrentRequest()
    {
        var ctx = HttpContext.Current;
        if (ctx == null) return null;

        object ret;
        ctx.Items[RequestContextKey] = ret = new object();

        return ret;
    }
}

// In Global.asax.cs
static RedisProfiler RedisProfiler;
static ConnectionMultiplexer RedisConnection;

protected void Application_Start()
{
    RedisProfiler = new RedisProfiler();
    RedisConnection = ConnectionMultiplexer.Connect(...);
    RedisConnection.RegisterProfiler(RedisProfiler);
    // and actual app startup logic ...
}

protected void Application_BeginRequest()
{
    var ctxObj = RedisProfiler.CreateContextForCurrentRequest();
    if (ctxObj != null)
    {
      RedisConnection.BeginProfiling(ctxObj);
    }

    // and actual per-request logic ...
}

protected void Application_EndRequest()
{
    var ctxObj = RedisProfiler.GetContext();
    if (ctxObj != null)
    {
        var timings = RedisConnection.FinishProfiling(ctxObj);

        var totalCommands = 0;
        double totalMilliseconds = 0;

        foreach (var timing in timings)
        {
            totalCommands++;
            totalMilliseconds += timing.ElapsedTime.TotalMilliseconds;
        }

        // and then perf recording logic ...
    }

    // and actual end of request logic ...
}

More examples of profiling with different notions of context can be found in the StackExchange.Redis profiling tests.

Profiling provides details on each command dispatched, not just their count and time to execute.  Timings are exposed as IProfiledCommands, each of which includes information about:

  • The Redis instance the command was sent to
  • The DB the command was run in
  • The command itself
  • Any CommandFlags specified
  • When the command was created, which corresponds to when the StackExchange.Redis method was called
  • How long it took to enqueue the command after it was created
  • How long it took to send the command after it was enqueued
  • How long it took Redis to respond after the command was sent
  • How long it took to process the response after it was received
  • Total time elapsed from command creation until the response was processed
  • If the command was a retransmission due to Redis Cluster
  • If the command was retransmitted, a reference to the original command’s timings
  • If the command was a retransmission, whether it was an ASK or MOVED that prompted it
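
Building on the Application_EndRequest example above, here’s a sketch of surfacing the slowest commands in a session.  ElapsedTime appears in the earlier example; Command and EndPoint are assumed member names for “the command itself” and “the Redis instance”, and Log is a hypothetical helper:

// requires System.Linq; surfaces the five slowest commands in a session
var slowest = timings.OrderByDescending(t => t.ElapsedTime).Take(5);

foreach (var t in slowest)
{
    // Command and EndPoint are assumed member names, Log is hypothetical
    Log(t.Command + " against " + t.EndPoint + ": " + t.ElapsedTime.TotalMilliseconds + "ms");
}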

StackExchange.Redis is available on GitHub and NuGet


Jil: Version 2.x, Dynamic Serialization, And More

Jil, a fast JSON serializer/deserializer for .NET, has hit 2.x!  It’s gained a lot more perf work, some improved functionality, and dynamic serialization.

What’s Dynamic Serialization?

Jil has always supported static serialization via a generic type parameter.  This would serialize exactly the provided type; things like inheritance, System.Object members, and the DLR were unsupported.

In .NET there are, broadly speaking, three “kinds” of dynamic behavior: polymorphism, reflection, and the dynamic language runtime.  Jil now supports all three.

Polymorphism

Polymorphism comes up when you use subclasses.  Consider the following.

class Foo
{
  public string Fizz;
}
class Bar : Foo
{
  public string Buzz;
}

 

Since Jil can’t know the Bar class exists when Serialize<Foo>(…) is first called, the Buzz member would never be serialized.  SerializeDynamic(…), however, knows that the Foo class isn’t sealed and that the possibility of new members needs to be accounted for on every invocation.  A similar situation exists with virtual members, which SerializeDynamic also handles.
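
To make that concrete, a minimal sketch (the serialized output in the comments is illustrative; member order isn’t guaranteed):

Foo obj = new Bar { Fizz = "fizz", Buzz = "buzz" };

// static serialization binds to the declared type, so Buzz is lost
var stat = JSON.Serialize(obj);          // {"Fizz":"fizz"}

// dynamic serialization consults the runtime type on every call
var dyn = JSON.SerializeDynamic(obj);    // includes both Buzz and Fizz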

Reflection

Reflection matters when the objects being serialized have normal .NET types (ints, strings, lists, user-defined classes, and so on) at runtime, but those types aren’t known at compile time.  Consider the following.

object a = 123;
object b = "hello world";

Calling Serialize(a) or Serialize(b) would infer a type of System.Object, since at compile time that’s the only information we have.  Using SerializeDynamic, Jil knows to do a runtime lookup on the actual type of the passed in object.
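
A quick sketch of the difference:

object boxed = 123;

// T is inferred as System.Object; the value's actual type is never consulted
JSON.Serialize(boxed);

// a runtime lookup finds System.Int32, so the int serializes as expected
var json = JSON.SerializeDynamic(boxed);  // "123"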

Dynamic Runtime

The dynamic runtime backs the dynamic keyword in C#, but for Jil’s purposes it can be thought of as special-casing types that implement IDynamicMetaObjectProvider.

While it’s rare to directly implement IDynamicMetaObjectProvider, code using ExpandoObject or DynamicObject isn’t unheard of.  For example, Dapper uses ExpandoObject to implement its dynamic returns.
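
For example, a minimal sketch of serializing an ExpandoObject (the output shown is illustrative):

using System.Dynamic;

dynamic expando = new ExpandoObject();
expando.A = 123;
expando.B = "hello";

string json = JSON.SerializeDynamic(expando);  // e.g. {"A":123,"B":"hello"}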

Speed Tricks

As usual, Jil focuses on speed, although dynamic serialization is necessarily slower than static serialization.

Jil builds a custom serializer at each point in the type graph where the type could vary on subsequent calls.  For example, if Jil sees a “MyAbstractClassBase” abstract class as a member it will do an extra lookup on each call to SerializeDynamic(…) to find out what the type really is for that invocation.  If instead Jil sees a “string” or a “MyValueType” struct as a member, it knows that they cannot vary on subsequent calls and so will not do the extra lookup.  This makes the first serialization involving a new type slower, but subsequent serializations are much faster.

The most common implementations of IDynamicMetaObjectProvider are special-cased: ExpandoObject is treated as an IDictionary<string, object>, and DynamicObject’s TryConvert(…) method is called directly.  This avoids some very expensive trial casts that are sometimes necessary when serializing an implementer of IDynamicMetaObjectProvider.

Further Improvements

While dynamic serialization is the main new feature in 2.x, other improvements have been made as well.

Relative Speed Improvements

It’s been a while since I’ve posted Jil benchmarks, since most recent work has been on dynamic features that aren’t really comparable.  However, lots of little improvements have added up to some non-trivial performance gains on the static end.

Overall serialization gains fall around 1% for the Stack Exchange API types, with the larger models and collections gaining a bit more.

In the Stack Exchange API, deserialization of most models has seen speed increases north of 5%, with the largest models and collections seeing double-digit gains.

 

These numbers, which are available in a Google Doc, were derived from Jil’s Benchmark project running on a machine with the following specs:

  • Operating System: Windows 8 Enterprise 64-bit
  • Processor: Intel Core i7-3960X 3.30 GHz
  • RAM: 64 GB
    • DDR
    • Quad Channel
    • 800 MHz

The types used were taken from the Stack Exchange API to reflect a realistic workload but, as with all benchmarks, take these numbers with a grain of salt.

I would also like to call out Paul Westcott’s (manofstick on GitHub) contributions to the Jil project, which have made some of these recent performance gains possible.

As always, you can browse the code on GitHub or get Jil on NuGet.


Jil: Dynamic Deserialization

With version 1.3.0 Jil now supports dynamic deserialization of JSON, with the appropriately named JSON.DeserializeDynamic(…) methods.

What Types Are Supported?

Jil’s dynamic deserializer parses the same set of types that its static deserializer does.  Supported types are:

  • Strings and Chars
  • Booleans
  • Integers
  • Floating point numbers
  • DateTimes
  • Nullables
  • Enumerations
  • Guids, in the “D” format
  • Arrays
  • Objects

DateTime formats must be configured through an Options object; four popular JSON date formats are supported.
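
A minimal sketch, assuming the Options.ISO8601 preset (check the Options type for the preset matching your data’s date format):

// an ISO8601 date arrives as a JSON string; the dynamic result casts to DateTime
var when = (DateTime)JSON.DeserializeDynamic(
    "\"2015-01-01T00:00:00Z\"",
    Options.ISO8601
);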

How Dynamic Is It?

Jil returns a dynamic object, using the dynamic support introduced in C# 4, rather than some custom “JSON” class.  Using the parsed object is done with natural-feeling C# instead of method or indexer calls.

The following are supported operations on a Jil dynamic:

  • Casts
    • i.e. (int)JSON.DeserializeDynamic("123")
  • Member access
    • i.e. JSON.DeserializeDynamic(@"{""A"":123}").A
  • Indexers
    • i.e. JSON.DeserializeDynamic(@"{""A"":123}")["A"]
    • or JSON.DeserializeDynamic("[0, 1, 2]")[0]
  • Foreach loops
    • i.e. foreach(var keyValue in JSON.DeserializeDynamic(@"{""A"":123}")) { … }
      • in this example, keyValue is a dynamic with Key and Value properties
    • or foreach(var item in JSON.DeserializeDynamic("[0, 1, 2]")) { … }
      • in this example, item is a dynamic and will have values 0, 1, and 2
  • Common unary operators (+, -, and !)
  • Common binary operators (&&, ||, +, -, *, /, ==, !=, <, <=, >, and >=)
  • .Length & .Count on arrays
  • .ContainsKey(string) on objects

Jil also allows you to cast JSON arrays to IEnumerable<T>, JSON objects to IDictionary<string, T>, and either to plain IEnumerables.
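
A quick sketch of those casts (IEnumerable<T> and IDictionary<string, T> come from System.Collections.Generic):

var arr = JSON.DeserializeDynamic("[1, 2, 3]");
var nums = (IEnumerable<int>)arr;                // typed enumerable

var obj = JSON.DeserializeDynamic(@"{""A"":1,""B"":2}");
var map = (IDictionary<string, int>)obj;         // typed dictionary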

Dynamic objects returned by Jil are immutable.  Setting properties, setting through an indexer, .Add, +=, and so on are not supported and will throw exceptions at runtime.

Performance Tricks

As always, Jil works quite hard to be quite fast.  While dynamic deserialization is necessarily somewhat slower than its static counterpart, it’s still quite fast.

Custom Number Parsers

.NET’s built-in number parsers are much more general-purpose than what Jil needs, so a custom parser that does only what the JSON spec demands is used instead.

Custom Number Format

The JSON spec explicitly doesn’t specify the precision of numbers, and Jil exploits that to use a non-IEEE 754 floating point representation while parsing.  Jil uses a less compact, but easier to parse, 169-bit “Fast Number” for most purposes internally.  Converting from this format to .NET’s built-in floating point types requires some extra work, but is still a net speedup for a small number of accesses.  In addition, converting from integer “Fast Numbers” to .NET’s built-in integer types is quite fast.

Minimal Allocations

While dynamic dispatch implies a great many more allocations than with Jil’s static deserializer, the dynamic deserializer still strives to minimize them.  You can see this in the methods for parsing strings, where a char[] is used when possible rather than allocating strings via a StringBuilder.

Benchmarks

Benchmarking dynamic code is tricky.  Comparisons to other libraries may not be apples-to-apples if the same dynamic behavior hasn’t been implemented, or if parsing or conversion is delayed to different points.

Rather than put together a quite possibly misleading comparison to other JSON libraries, the following compares Jil’s dynamic deserializer to its static deserializer.

Comparison of Jil’s static and dynamic deserializers

The code for this benchmark can be found on GitHub.

While the dynamic deserializer is quite fast, it’s worth remembering that dynamic dispatch is considerably slower than static dispatch, and that some type conversions are deferred when deserializing dynamically.  In practice this means the static deserializer is still noticeably faster, but the dynamic deserializer is usable for most purposes.

Grab the latest Jil on NuGet or check out the code on GitHub

After the inevitable bug fixes, I’ll probably be revisiting Jil’s existing SerializeDynamic() method to make it actually dynamic.  Currently it only dynamically dispatches on the first type it sees, which isn’t nearly as useful.


Jil: Doing JSON Really, Really Quickly

After about three months of work, and some time in a production environment, the second half of Jil (a fast JSON library built on Sigil) is ready for release.

Jil now supports both serializing and deserializing JSON.

As with serialization, Jil deserialization supports very few configuration options and requires static typing.  The interface is, accordingly, just a simple “JSON.Deserialize<T>(…)” which takes either a string or a TextReader.
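
A minimal sketch of the interface, using a hypothetical MyType and showing both overloads (StringReader comes from System.IO):

class MyType { public int A; public string B; }

// from a string
var fromString = JSON.Deserialize<MyType>(@"{""A"":1,""B"":""two""}");

// or from a TextReader
using (var reader = new StringReader(@"{""A"":1,""B"":""two""}"))
{
    var fromReader = JSON.Deserialize<MyType>(reader);
}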

The Numbers

Jil’s entire raison d’être is ridiculous optimizations in the name of speed, so let’s get right to the benchmarks.

Let’s start with the (recently updated) SimpleSpeedTester:

[Performance graph: SimpleSpeedTester deserialization results]

The numbers show Jil’s deserializers are about 50% faster than the next fastest JSON library.  Protobuf-net is included as a baseline; it isn’t a JSON serializer (it does Protocol Buffers), but it’s the fastest .NET serializer (of anything) that I’m aware of.

The Tricks

Just like with serializing, Jil does runtime code generation using Sigil to produce tight, monolithic deserialization methods.  That right there accounts for quite a lot of performance.

To speed up deserializing numbers, Jil has a large collection of methods tailored to each built-in number type.  Deserializing bytes, for example, can omit sign checks and all but one overflow check.  This specialization avoids unnecessary allocations, and sidesteps all the overhead .NET imposes to support culture-specific number formatting.

When deserializing JSON objects, Jil tries to avoid allocations when reading member names.  It does this by checking whether a hash function (Jil uses variants of the Fowler–Noll–Vo hash function) can be truncated into a jump table; i.e., Jil calculates hash(memberName) % X for many values of X to see if a collision-free option exists.  If one does, Jil uses it instead of actually storing member names, which saves an allocation and the cost of a lookup in a Dictionary.
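
The idea, sketched outside Jil (this is illustrative, not Jil’s actual code; the search bound on X is arbitrary):

// search for a modulus X such that hash(name) % X is collision free across
// the member names, turning member lookup into a jump table instead of a
// Dictionary (requires System, System.Collections.Generic, System.Linq)
static int? FindCollisionFreeModulus(string[] memberNames, Func<string, uint> hash)
{
    for (var x = Math.Max(1, memberNames.Length); x <= memberNames.Length * 4; x++)
    {
        var slots = new HashSet<uint>();
        if (memberNames.All(n => slots.Add(hash(n) % (uint)x)))
        {
            return x;    // every member name maps to a distinct slot
        }
    }

    return null;         // no small modulus works; fall back to a Dictionary
}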

A fair amount of performance is gained by being very parsimonious with memory on the heap.  Jil typically allocates a single character buffer, and may allocate a StringBuilder if certain conditions are met by the deserialized data.  Additional string allocations will be made if Jil cannot use hash functions for member name lookups.

Interested in Jil?

Grab it on NuGet or check out the code

Next up is probably dynamic deserialization, though naturally I’d love to make the existing functionality even faster.  Don’t hesitate to ping me with more performance tricks.


Jil: Serializing JSON Really, Really Quickly

Several months ago I more or less finished up my “just for fun” coding side project Sigil (which I’ve written about a few times before), and started looking for something new to hack on.  I settled on a JSON serializer for .NET that pulled out all the stops to be absolutely as fast as I could make it.

About three months later, I’m ready to start showing off…

Jil

a fast JSON serializer built on Sigil.

This is still a very early release (arbitrarily numbered 0.5.5, available on NuGet [but not in search results]), so there are a lot of caveats:

  1. Jil has very few configuration options
  2. Jil has very limited support for dynamic serialization
  3. Jil only serializes, no deserialization
  4. Jil only supports a subset of types
  5. Jil has seen very little production use
    • At time of writing, Jil has been pulled into the Stack Exchange API for about a week without issue

Some Numbers

Benchmarks are the most exciting thing about Jil, so let’s get right to them.  The following are from a fork of theburningmonk’s SimpleSpeedTester project.

[Benchmark graphs and table omitted.]

Raw numbers are available in a Google document.  There are also further benchmarks I created while making Jil on GitHub.

The takeaway from these benchmarks is that Jil is about twice as fast as the next fastest .NET JSON serializer.  Protobuf-net is still faster (as you’d expect from an efficient binary protocol from Google and a library written by Marc Gravell), but Jil’s closer to it than to the next JSON serializer.

Some Tricks

I could write a whole series on how Jil shaves microseconds, and may yet do so.  I’ll briefly go over some of the highlights right now, though.

The first one’s right there in the name: Jil is built on Sigil.  That means a type’s serializer gets created in nice tight CIL, which becomes nice tight machine code, and no reflection at all occurs after the first call.

Second, Jil has a number of very specialized methods for converting data types into strings.  Rather than relying on Object.ToString() or similar, Jil has separate dedicated methods for shoving Int32s, UInt64s, Guids, and so on into character arrays.  These specialized methods avoid extra allocations, sidestep all the culture-specific infrastructure .NET makes available, and let me do crazy things like divide exactly 14 times to print a DateTime.
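
To give a flavor of those methods, here’s an illustrative sketch (not Jil’s actual code) of writing an unsigned integer without ToString() (TextWriter comes from System.IO):

// digits are produced backwards into a scratch buffer, then written in one
// call; no allocation, and no culture-aware formatting machinery involved
static void WriteUInt(TextWriter writer, uint number, char[] scratch)
{
    var ix = scratch.Length;
    do
    {
        scratch[--ix] = (char)('0' + (number % 10));
        number /= 10;
    } while (number != 0);

    writer.Write(scratch, ix, scratch.Length - ix);
}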

As you’d expect of performance-focused code in a garbage-collected environment, the third thing Jil focuses on is trying not to allocate anything, ever.  In practice, Jil can keep allocations down to a single small character array, except when dealing with Guids and IDictionary<TKey, TValue>s.  For Guids, Jil must allocate an array for each since Guid.ToByteArray() doesn’t take a buffer, while serializing Dictionaries still allocates an enumerator.

If you’ve clicked through to Jil’s source by now, you might have noticed some MethodImpl attributes.  That’s part of Jil’s fourth big trick: trading a fair amount of memory for more speed.  Aggressively inlining code saves a few instructions spent branching, and even more time if instruction prefetching isn’t perfect in the face of method calls.

Last but not least, Jil avoids branches whenever possible; your CPU’s branch predictor can ruin your day.  This means everything from using jump tables, to skipping negative checks on unsigned types, to only doing JSONP specific escaping when requested, and even baking configuration options into serializers to avoid the runtime checks.  This does mean that Jil can create up to 64 different serializers for the same type, though in practice only a few different configurations are used within a single program.

Check out the code or grab Jil from NuGet

I’m definitely interested in any crazy code that shaves more time off.  I’d also like faster ways to create a string (rather than write to a TextWriter); my experiment with capacity estimation works… but the speedups aren’t reliable enough to flip it on by default.


Making Magic: How Sigil Works

Version 3.0.0 of Sigil was just released (grab it on NuGet and check out the source on GitHub).  The big new feature in this release is a disassembler, one which allows for some inspection of the flow of values through a .NET delegate.

But that’s not what I’m writing about.  I figure now’s as good a time as any to write up the “how” of Sigil, given that I covered the “what” and “why” in an earlier post and that recent releases have refactored Sigil’s internals into a state I’m happier with.

Bytecode Verifiers And You

In essence, Sigil is a “bytecode verifier”.  If you’ve done JVM or .NET development you should be familiar with the concept: the bytecode verifiers on those platforms make sure that the class files or assemblies you load contain bytecode that can safely be executed.

The definition of “safe” is very particular: a bytecode verifier doesn’t prevent errors from occurring at execution time, but rather prevents invariants of the runtime from being violated.  For example, a bytecode verifier would guarantee that an instance method invocation passes the proper number and types of parameters, and that it is invoked against an instance of the appropriate type.

One way to think about bytecode verifiers is that they guarantee that every operation receives the correct types as inputs and every operation leaves the runtime in a well formed state.

Sigil’s Special Features

Where Sigil differs from other bytecode verifiers is that it doesn’t operate on “finished” instruction sequences.  It verifies as you build a sequence, failing as soon as it can be sure the sequence is invalid.

Because Sigil deals with incomplete instruction sequences, it also has to deal with a lot of unknowns, especially around branches.  It’s quite common to branch to an instruction you haven’t actually emitted yet, or to emit instructions that aren’t yet provably reachable; both are cases a traditional verifier can never encounter.

Sigil also has to explain itself when it fails, so it must be able to report where and why a given sequence became invalid (which can be far removed from the last emitted instruction because of branches).  Similar complications exist when verification succeeds: features like eliding trivial casts and replacing branches with their short forms (which are deferred until an instruction stream is finalized) require a lot of information about the instruction stream to be retained.

Simple Verifying

If you ignore branches, verifying a bytecode sequence is pretty simple.  You can think of it as executing instructions as if they consumed and produced types instead of values.  Since Sigil is a .NET library I’ll be using .NET examples, though the basic idea applies to all similar verifiers.

For example, assume the following is an implementation of a Func<int>:

ldc.i4 1
ldc.i4 2
add
ret

We know “ldc.i4” consumes nothing and produces an int.  “add” actually consumes and produces a wide range of types, but one of them is “int int -> int”.  The “ret” instruction either consumes nothing or a single type, depending on the signature of the method it is used in; in this case it consumes an “int”, which we know because the method is a “Func<int>”.

I’ve written out the state of the stack (the .NET runtime is specified as a stack machine) after each instruction executes:

         // --empty--
ldc.i4 1 // int
ldc.i4 2 // int int
add      // int
ret      // --empty--

We need to add one more rule: control flow must end with a “ret” that leaves the stack empty.  With that, it’s clear that this sequence would verify.

How each instruction affects the stack, in terms of types.

The following sequence wouldn’t verify:

ldc.i4 1            // int
ldstr "hello world" // string int
mul                 // !ERROR!
ret

Since “mul” doesn’t have a transition from “string int”, verification can be shown to fail as soon as “mul” is emitted.

The earliest releases of Sigil were simple verifiers, as they didn’t try to trace through branches.  Through version 1.2.9, you instead had to assert types when marking labels under certain circumstances.

Complex Verifying

As soon as you add branches, life gets considerably more complicated.  Now you have to deal with cases where you can’t infer what types are on the stack immediately.

Consider the following:

ldc.i4 1
br end_label

middle_label:
add
ret

end_label:
ldc.i4 2
br middle_label

In this example the types passed to “add” are unclear when it is emitted; it’s only when “br middle_label” is encountered that it becomes clear that “add” will verify.

Furthermore, whether or not “ret” verifies depends on the method being built.  If we’re building a Func<int>, it verifies.  If we’re building a Func<string>, it should fail when emitted, since there’s no way for “add” to pass a string to “ret”.  If we’re building a Func<double>, it should fail when “br middle_label” is emitted, since at that point “add” is known to produce and pass an Int32 to “ret”.

How Sigil Copes

Sigil deals with the complexities of verifying partial sequences in two ways: tracking the possible types on the stack, and creating a verifier for each possible code path.

Reconsidering the above example:

                // ** Initial Verifier: V1, with empty stack **
ldc.i4 1        // int
br end_label    // int
                // ** New Verifier: V2 with unknown stack, V1 stashed **
middle_label:
add             // int|long|float|double|etc.
ret             // --empty--
                // ** New Verifier: V3 with unknown stack, V2 stashed **
end_label:      // ** V1 restored (stack is: int) **
ldc.i4 2        // int int
br middle_label // int int (V2 now verified with int int as initial stack)

There’s rather a lot going on now, what with verifiers being stashed and restored.  Also note that when “add” is first encountered it places an “or” clause onto the stack, which allows “ret” to verify if it expects any type in that “or” clause.

How different verifiers see the instruction stream

The logic around creating and restoring verifiers is tricky, but boiled down:

  • At an unconditional branch, store the current stack state and remove the current verifiers
    • If the branched to label has already been marked, take the current stack and check it against the expected initial stack at that label
  • At a conditional branch do the same checks and stores as an unconditional one, but don’t remove the current verifiers
  • When marking a label, if there is no current verifier, create a new one with an unknown initial stack
    • If there are stacks stored against that label from previous branches, check that those stacks are consistent and make them the new verifier’s initial stack
  • Treat “ret” as an unconditional branch which doesn’t store the stack state
  • If at any point an instruction is emitted and there is no current verifier, the newly emitted code is unreachable

This logic is captured in Sigil’s RollingVerifier class, in its UnconditionalBranch, ConditionalBranch, Return, and Mark methods.

As for tracking all possible values on the stack (as in the “or” with “add” in the previous example), Sigil considers transitions rather than types on the stack.  So long as there’s at least one transition that could be taken for every instruction a verifier has seen, the instruction stream is considered verifiable.

Take for example the, rather complex, transitions for “add”:

new[]
{
    new StackTransition(new [] { typeof(int), typeof(int) }, new [] { typeof(int) }),
    new StackTransition(new [] { typeof(int), typeof(NativeIntType) }, new [] { typeof(NativeIntType) }),
    new StackTransition(new [] { typeof(long), typeof(long) }, new [] { typeof(long) }),
    new StackTransition(new [] { typeof(NativeIntType), typeof(int) }, new [] { typeof(NativeIntType) }),
    new StackTransition(new [] { typeof(NativeIntType), typeof(NativeIntType) }, new [] { typeof(NativeIntType) }),
    new StackTransition(new [] { typeof(float), typeof(float) }, new [] { typeof(float) }),
    new StackTransition(new [] { typeof(double), typeof(double) }, new [] { typeof(double) }),
    new StackTransition(new [] { typeof(AnyPointerType), typeof(int) }, new [] { typeof(SamePointerType) }),
    new StackTransition(new [] { typeof(AnyPointerType), typeof(NativeIntType) }, new [] { typeof(SamePointerType) }),
    new StackTransition(new [] { typeof(AnyByRefType), typeof(int) }, new [] { typeof(SameByRefType) }),
    new StackTransition(new [] { typeof(AnyByRefType), typeof(NativeIntType) }, new [] { typeof(SameByRefType) })
};

One more thing to track when dealing with verifiers is whether or not we know their initial stack; I call verifiers with unknown initial stacks “baseless” in the code.  It is not an error for an instruction stream to underflow a baseless verifier, since its stack could be anything.  Instead of failing verification, Sigil considers the result of underflowing a baseless stack to be a “wildcard” type which satisfies any transition; this is how “add” can pop a value to continue verification after “middle_label”.

Trickier Things

.NET has a few hairy CIL opcodes that require special handling: dup, localloc, and leave.

“dup” duplicates the current value on the stack, the difficulty being that we only know the type on the stack if we can verify the preceding instructions, which isn’t always possible.  Sigil handles this by making “dup” place a special type on the stack which, when encountered by the verifier, pushes a copy of the preceding transition’s result (or a wildcard, if underflowing a baseless verifier).

“localloc” is analogous to alloca(), pushing a pointer to memory on the stack; it requires that only a single value be on the stack when executed, which means the current verifier cannot be baseless.  In this case Sigil uses a special transition which asserts that the size of the stack is one if the verifier is based, and is ignored if it is not.

“leave” is an unconditional branch out of an exception or catch block which empties the stack entirely.  Sigil considers this equivalent to popping exactly the number of items currently on the stack, which, like “localloc”, means the current verifier cannot be baseless.  Like “dup”, Sigil uses a special type to indicate to the verifier that the stack needs to be emptied.

Optimizing Emitted Code

There are two kinds of optimizations Sigil can do to emitted CIL: eliding trivial casts, and replacing instructions with their short forms.

Conceptually, eliding is straightforward: keep track of what types are guaranteed to go into a castclass or isinst operation and, if those types are assignable to the type encoded in the instruction, elide it.  Sigil attaches callbacks, like this one, to “castclass” and “isinst” transitions; these are called whenever a verifier processes those operations, and are passed enough information to decide whether to elide themselves or not.

Some short forms are easy: any of the LoadConstant methods with short forms can be rewritten at call time.  The trickier ones are branches, as we need to wait until we’re done emitting code and can calculate offsets.  Tracking offsets is handled by BufferedILGenerator (which maintains a cache of byte offsets to each instruction), and a last-minute call to Emit.PatchBranches patches all the branch instructions whose offsets can fit into a signed byte.

Optimizing Sigil

Sigil employs some tricks to keep it relatively snappy.

Perhaps most importantly, it doesn’t re-verify operations unless it absolutely has to.  Every currently in-scope verifier maintains a cache of its state as of the last time it verified the instruction stream, and re-uses that cache when new instructions are added.  The cache does have to be invalidated when the initial stack state changes, which only happens when branching to or marking labels.

Sigil also tries to keep the number of verifiers in scope limited by discarding any non-baseless verifiers past the first one.  Since any verifier that isn’t baseless can be traced back to the start of the method, we know that there are no “or” type clauses on the stack so the verifiers are equivalent going forward even if they took different paths through the code.

Other Tidbits

To wrap this up I’m just going to list some minor things of note found in Sigil.

  • Sigil has most of an implementation of LINQ-to-Objects to run on pre-.NET 3.5 runtimes, heavily influenced by Jon Skeet’s Edulinq series
  • Sigil has its own Tuple implementation for similar reasons
  • Sigil’s disassembler, while doing a great deal more, started as a replacement for the Mono.Reflection disassembler in our code base
  • Sigil’s exception blocks are slightly different from ILGenerator’s in that you explicitly attach catches and finallys to them; this makes nested exception handling easier to debug

And again, Sigil can be installed from NuGet and the source is available on GitHub.


Improving Sigil

A few weeks ago I announced Sigil, a fail-fast, validating wrapper around .NET’s ILGenerator.  Since then I’ve pushed a number of updates and bug fixes, culminating in the fairly significant 2.0.0 release now on NuGet.

I figured now was a good time to write down what’s changed, and where I see this code going.

A Smarter Validator

The biggest change between 1.x and 2.x is that Sigil no longer requires stack type assertions after unconditional branches.

In version 1.x you had to do something like the following:

var emit = Emit<Func<int>>.NewDynamicMethod();

var label1 = emit.DefineLabel();
var label2 = emit.DefineLabel();

emit.LoadConstant(0);
emit.Branch(label1);

emit.MarkLabel(label2, new [] { typeof(int) }); // Sigil wasn't smart enough to know what the stack was here
emit.Return();

emit.MarkLabel(label1);
emit.Branch(label2);

Version 2.x removes the need to pass types to certain MarkLabel() calls entirely.  This removes the most painful bit of busywork entailed in using the initial release of Sigil.
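
Under 2.x the same sequence needs no assertion:

var emit = Emit<Func<int>>.NewDynamicMethod();

var label1 = emit.DefineLabel();
var label2 = emit.DefineLabel();

emit.LoadConstant(0);
emit.Branch(label1);

emit.MarkLabel(label2); // Sigil now infers the stack state here
emit.Return();

emit.MarkLabel(label1);
emit.Branch(label2);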

As a consequence of this new validator, a few breaking changes were made to Sigil’s interface: the obvious removal of stack assertions, plus the removal of GetStack (as the stack state is not always known) and the addition of some required parameters to array manipulation methods (as the array involved cannot always be inferred).

Better Debugging Information

Sigil is all about failing fast and helpfully, and several changes have been made to the string returned by SigilVerificationException.GetDebugInfo() since Sigil’s initial release.

Examples include:

  • Non-returning paths, indicated by the labels the path flows through
  • Instructions that take locals, annotated with the names and types of the involved locals
  • The instruction that causes a verification error
  • As in version 1.x, the stack state at any error, now including all possible types

Optimizations

Since its initial release, Sigil has gained the ability to elide trivial casts (thanks to Marc Gravell for the first version of this feature).

Modest performance improvements have been made to rewriting branches into their short forms, and the ability to disable such rewriting has been added (credit to Jason Punyon for some of this).

Performance

Sigil has been tuned somewhat since its initial release, but the introduction of a smarter validator has come at a performance price under certain circumstances.  As a rough guideline, if you’re generating methods with hundreds of labels and branches, or tens of thousands of instructions, Sigil may be too slow for you.

In addition to the aforementioned optimization options (passed as OptimizationOptions), the option to defer some validation has also been added (via the ValidationOptions enumeration).  Between the two, a great deal of Sigil’s utility can still be had even for very large or very branch-intensive generated code.

Minor Improvements

  • While debugging, all declared labels and in-scope locals are exposed as the Labels and Locals properties
  • The WriteLine(string, params Locals[]) method
  • Automatic insertion of the volatile prefix when able
  • The ability to refer to locals and labels by name

The Future

I’m not really satisfied with Sigil’s performance just yet.  Profiling suggests that the slowest part of Sigil is gluing together instruction sequences for post-branch validation; I’m sure this can be done more quickly, and probably skipped altogether in a lot of cases.

I’d also like to support the last four CIL instructions: Arglist, Mkrefany, Refanytype, and Refanyval.  Sigil has focused mostly on the DynamicMethod case, where these instructions are rarely useful, but for completeness’ sake they should be available.

Sigil currently has .NET 4.5 and 3.5 builds; it’d be nice to get a .NET 2.0 build as well.  This is non-trivial, as Sigil makes heavy use of LINQ and extension methods.

A more long-term goal is to abstract most of Sigil’s dependency on System.Reflection and System.Reflection.Emit so that its validation and optimization features can be used for a wider array of meta-programming tasks.  Freely switching between standard .NET and IKVM.Reflection is probably the first place to start, as there’s already a similar feature in protobuf-net’s precompiler.

As before, Sigil’s source is on GitHub and the latest stable releases are on NuGet.

