Public Broadcasting: A Self-Describing Wrapper Around protobuf-net
Posted: 2012/12/07 Filed under: code
Familiar with Protocol Buffers? It’s a neat binary serialization format out of Google which aims to be efficient and extensible. Java, C++, and Python have “official” libraries, and there are a plethora of libraries for other platforms.
In fact, we’ve been using protobuf-net over at Stack Exchange for a good long time, since August 18th, 2010 if our commit logs are to be believed. It’s famously fast and simple to use, which has let it worm its way into basically all of our projects. We even got the mind behind it to come and slum it with chumps like me.
But…
There is one pain point to using Protocol Buffers, and that’s, well, defining the protocol bits. You can either define your messages in .proto files and compile them, or, if you’re using protobuf-net (or similar), annotate your existing types.
With protobuf-net a typical class ends up looking like so:
[ProtoContract]
public class RedisInboxItem : IComparable<RedisInboxItem>
{
[ProtoMember(1)]
public DateTime CreationDate { get; set; }
[ProtoMember(2, IsRequired = true)]
public InboxItemType Type { get; set; }
[ProtoMember(3)]
public int Id { get; set; }
[ProtoMember(4)]
public string Title { get; set; }
[ProtoMember(5)]
public bool IsPersisted { get; set; }
// ...
This isn’t a problem if you’re marshaling between different code bases or acting as a service – you need to document the types involved after all; might as well do it in annotations or .proto files. But if you’re communicating within the same code base, or with internal services, this manual protocol declaration can be onerous and a tad error prone.

Easily beat Brock, or eventually have a giant fire-breathing dragon?
Trade-Offs For Convenience
What would be really keen is a wrapper around Protocol Buffers that carries its own description, so that mapping tag numbers to fields doesn’t need any foreknowledge of the serialized types. It’s also been a while since I committed any really terrible ILGenerator code.
So I wrote one. It trades some of protobuf-net’s speed and some of Protocol Buffers’ compactness for the convenience of not having to use .proto files or annotations. I call it Public Broadcasting, because that’s the first phrase that popped into my head with a P followed by a B. Naming is hard.
In short, what Public Broadcasting does is provide a structurally typed wrapper around protobuf-net. Any member names that match when deserializing are mapped correctly, any missing members are ignored, and any safe conversions (like byte -> int or Enum <-> String) happen automatically. In addition, Nullable<struct>’s will be converted to default(struct) if necessary when deserializing. Inheritance, type names, interfaces, and so on are ignored; we care about how the data “looks” not how it’s “named”.
Public Broadcasting works by describing a type using Protocol Buffers, then including that description in an “envelope” along with the actual data. When deserializing, Public Broadcasting constructs a new type with all the appropriate annotations and then lets protobuf-net do the heavy lifting of deserializing the data. Since we only care about the “structure” of the data, a good deal of .NET’s type system is discarded: only classes (with no distinction between reference and value types), Lists, Dictionaries, Enumerations, Nullables, and “simple” types (int, byte, string, GUID, etc.) are used.
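To make the envelope idea concrete, here is a rough sketch of the shape involved. These are not Public Broadcasting’s actual internal types or wire format, just an illustration of “description plus payload”:
using System.Collections.Generic;
using ProtoBuf;
// NOT the library's real types; a conceptual sketch of a self-describing envelope only.
[ProtoContract]
public class TypeDescription
{
    [ProtoMember(1)]
    public int Kind { get; set; } // class, list, dictionary, enumeration, nullable, simple type, ...
    [ProtoMember(2)]
    public Dictionary<string, TypeDescription> Members { get; set; } // member name -> description
}
[ProtoContract]
public class Envelope
{
    [ProtoMember(1)]
    public TypeDescription Description { get; set; } // how the data "looks"
    [ProtoMember(2)]
    public byte[] Data { get; set; } // the protobuf-net serialized payload
}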
Since the original type being deserialized doesn’t actually need to be known with Public Broadcasting, there is a Deserialize method which returns dynamic. Although dynamic is something I normally avoid, in the “grab a data object, then discard” style I typically use protobuf-net in, I think it’s a good fit.
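Usage ends up looking something like the sketch below. The Serializer entry points shown here are assumptions for illustration (check the project for the real names); the points being made are that no attributes are required and that deserialization can hand back a dynamic:
using System;

public class InboxItem
{
    public DateTime CreationDate { get; set; }
    public int Id { get; set; }
    public string Title { get; set; }
}

class Demo
{
    static void Main()
    {
        var item = new InboxItem { CreationDate = DateTime.UtcNow, Id = 42, Title = "Hello" };

        // No [ProtoContract]/[ProtoMember] needed; the type description travels with the data.
        // Serializer.Serialize/Deserialize are hypothetical names for the library's entry points.
        byte[] bytes = PublicBroadcasting.Serializer.Serialize(item);

        // Deserialize into a structurally compatible type, or just grab a dynamic and poke at it.
        dynamic roundTripped = PublicBroadcasting.Serializer.Deserialize(bytes);
        Console.WriteLine(roundTripped.Title);
    }
}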
In my (admittedly limited) testing, Public Broadcasting is typically within an order of magnitude of raw protobuf-net usage. And protobuf-net is hella-fast, so even 10x slower is going to be plenty fast most of the time.
In terms of compactness, Public Broadcasting is going to be approximately the “length of all involved strings” larger than raw protobuf-net. As soon as you start having many instances or recursive types this overhead shrinks relative to the size of the message, as Public Broadcasting doesn’t repeat member names like JSON.
However, if you absolutely must have the smallest messages and the fastest (de)serializations then you’ll want to use protobuf-net directly; overhead imposed by Public Broadcasting is noticeable.
If You Like What You’ve Read
Grab the source, or pull the dll off NuGet, and let me know what you think.
Extending Type Inferencing in C#
Posted: 2012/06/10 Filed under: code
A while ago I used the first Roslyn CTP to hack truthiness into C#. With the second Roslyn CTP dropping on June 5th, now pretty close to feature complete, it’s time to think up some fresh language hacks.
Roslyn What Now?
For those unfamiliar, Roslyn is Microsoft’s “Compiler as a Service” for the VB.NET and C# languages. What makes Roslyn so interesting is that it exposes considerably more than “Parse” and “Compile” methods; you also have full access to powerful code transformations and extensive type information.
Roslyn is also very robust in the face of errors, making the most sense it can of malformed code and providing as complete a model as possible. This is very handy for my evil purposes.
Picking A Defect
Now I happen to really like C#: it’s a nice, fast, fairly rapidly evolving, statically typed language. You get first class functions, a garbage collector, a good type system, and all that jazz. Of course, no language is perfect, so I’m going to pick one of my nits with C# and hack up a dialect that addresses it using Roslyn.
The particular defect is that you must specify return types in a method declaration, which in my opinion is often pointless repetition.
Consider this method:
static MvcHtmlString ToDateOnlySpanPretty(DateTime dt, string cssClass)
{
return MvcHtmlString.Create(String.Format(@"<span title=""{0:u}"" class=""{1}"">{2}</span>", dt, cssClass, ToDateOnlyStringPretty(dt, DateTime.UtcNow)));
}
Is that first MvcHtmlString really necessary? Things get worse when you start returning generic types; oftentimes I find myself writing code like:
Dictionary<string, int> CalcStatistics()
{
var result = new Dictionary<string, int>();
// ...
return result;
}
Again, the leading Dictionary<string, int> is really not needed; the type returned is quite apparent, so we’re really just wasting keystrokes there.
This pointless repetition was addressed for local variables in C# 3.0 with the var keyword. With Roslyn, we can hack up a pre-processor that allows var as a method’s return type. The above would become the following:
var CalcStatistics()
{
var results = new Dictionary<string, int>();
// ...
return results;
}
The Rules
Wanton type inferencing can be a bit dangerous; it makes breaking contracts pretty easy to do, for one. So I’m imposing a few constraints on where “var as a return” can be used; for the most part these are arbitrary and easily changed.
- var can only be used on non-public and non-protected methods
- if a method returns multiple types, a type will be chosen for which all returned types have an implicit conversion, with the following preferences:
  - classes before interfaces
  - derived types before base types
  - generic types before non-generic types
  - in alphabetical order
  - with the exceptions that Object is always considered last and IEnumerable (and IEnumerable<T>) are considered before other interfaces
- a method with empty returns will become void
The last rule may be a little odd as C# doesn’t have a notion of a “void type” really, which is something of a defect itself in my opinion. Which type is chosen for a return is well-defined so it’s predictable (always a must in a language feature) and attempts to match “what you meant”.
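To make the preference rules concrete, here is a hypothetical before and after. Both branches return types deriving from Stream, so by “classes before interfaces” and “derived types before base types” I would expect the rewriter to land on Stream rather than, say, IDisposable or object:
using System.IO;

class Example
{
    // Dialect input (not valid C# on its own):
    //
    // var OpenData(bool inMemory)
    // {
    //     if (inMemory) return new MemoryStream();
    //     return File.OpenRead("data.bin"); // a FileStream
    // }

    // Expected output after inference: the most specific class both returns convert to.
    static Stream OpenData(bool inMemory)
    {
        if (inMemory) return new MemoryStream();
        return File.OpenRead("data.bin");
    }
}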
I made one more handy extension: allowing var-returning methods to return anonymous types. You can sort of do this now, either by passing around Object or using horrible grotty hacks (note: don’t actually do that); but since there’s no name you can’t do it cleanly. Since var-returning methods don’t need to name their return types, I figured I might as well address that too.
Let’s See Some Code
Actually loading a solution and parsing/compiling is simple (and boring), check out the source for how to do it.
The first interesting bit is finding all methods that are using “var” as a return type.
// eligibleMethods is a List<MethodDeclarationSyntax>
var needsInferencing =
    eligibleMethods
        .Where(
            w => (w.ReturnType is IdentifierNameSyntax) &&
                 ((IdentifierNameSyntax)w.ReturnType).IsVar
        ).ToList();
This literally says “find the methods that have return type tokens which are ‘var'”. Needless to say, this would be pretty miserable to do from scratch.
Next interesting bit, we grab all the return statements in a method and get the types they return.
var types = new List<TypeInfo>();
// returns is a List<ReturnStatementSyntax>
foreach (var ret in returns)
{
var exp = ret.Expression;
if (exp == null)
{
types.Add(TypeInfo.None);
continue;
}
var type = model.GetTypeInfo(exp);
types.Add(type);
}
Note the “model.GetTypeInfo()” call there, that’s Roslyn providing us with detailed type information (which we’ll be consuming in a second) despite the fact that the code doesn’t actually compile successfully at the moment.
We move onto the magic of actually choosing a named return type. Roslyn continues to give us a wealth of type information, so all possible types are pretty easy to get (and ordering them is likewise simple).
// info is an IEnumerable<TypeInfo>
var allPossibilities = info.SelectMany(i => i.Type.AllInterfaces).OfType<NamedTypeSymbol>().ToList();
allPossibilities.AddRange(info.Select(s => s.Type).OfType<NamedTypeSymbol>());
foreach (var i in info)
{
var @base = i.Type.BaseType;
while (@base != null)
{
allPossibilities.Add(@base);
@base = @base.BaseType;
}
}
Rewriting the method with a new return type is a single method call, and replacing the “from source” method with our modified one is just as simple.
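For reference, that rewrite boils down to something like the sketch below. It uses the modern Roslyn API names (SyntaxFactory, WithReturnType, ReplaceNode); the 2012 CTP this post was written against spelled some of these slightly differently:
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class VarReturnRewriter
{
    public static SyntaxNode ReplaceVarReturn(SyntaxNode root, MethodDeclarationSyntax method, ITypeSymbol inferred)
    {
        // Build a TypeSyntax for the inferred type (fully qualified to be safe),
        // keeping the trivia that surrounded the original "var" token.
        var newReturnType =
            SyntaxFactory.ParseTypeName(inferred.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat))
                         .WithTriviaFrom(method.ReturnType);

        // Swap the return type and splice the modified method back into the tree.
        var newMethod = method.WithReturnType(newReturnType);
        return root.ReplaceNode(method, newMethod);
    }
}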
And that’s basically it for the non-anonymous type case.
Anonymous Types
For returning anonymous types, everything up to the “rewrite the method” step is basically the same. The trouble is, even though we know the “name” of the anonymous type we can’t use it. What we need to do instead is “hoist” the anonymous type into an actual named type and return that instead.
This is kind of complicated, but you can decompose it into the following steps:
- Make a note of returned anonymous types
- Rewrite methods to return a new type (that doesn’t exist yet)
- Rewrite anonymous type initializers to use the new type
- Create the new type declaration
Steps #1 and #2 are easy, Roslyn doesn’t care that your transformation doesn’t make sense yet.
Step #3 requires a SyntaxRewriter and a way to compare anonymous types for equality. The rewriter is fairly simple; the syntax for an anonymous type initializer and a named one differ very little.
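Concretely, the rewrite only has to retarget the object creation expression; the member initializers stay exactly as they were. Using the generated type name that appears a bit further down:
// Before the rewrite: an anonymous object creation expression
var before = new { A = "Hello", B = 123, C = new Dictionary<string, string>() };

// After the rewrite: the same initializer, targeting the hoisted named type
var after = new __FTI88a733fde28546b8ae4f36786d8446ec { A = "Hello", B = 123, C = new Dictionary<string, string>() };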
Comparing anonymous types is a little more complicated. By the C# spec, anonymous types are considered equivalent if they have the same properties (by name and type) declared in the same order.
Roslyn gives us the information we need just fine, so I threw a helper together for it:
internal static bool AreEquivalent(TypeSymbol a, TypeSymbol b, Compilation comp)
{
var aMembers = a.GetMembers().OfType<PropertySymbol>().ToList();
var bMembers = b.GetMembers().OfType<PropertySymbol>().ToList();
if (aMembers.Count != bMembers.Count) return false;
for (var i = 0; i < aMembers.Count; i++)
{
var aMember = aMembers[i];
var bMember = bMembers[i];
if (aMember.Name != bMember.Name) return false;
if (aMember.DeclaredAccessibility != bMember.DeclaredAccessibility) return false;
var aType = aMember.Type;
var bType = bMember.Type;
var aName = aType.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat);
var bName = bType.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat);
if (aName == bName) continue;
var conv = comp.ClassifyConversion(aType, bType);
if (!conv.IsIdentity) return false;
}
return true;
}
Notice that Roslyn also gives us details about what conversions exist between two types, another thing that would be absolutely hellish to implement yourself.
The final step, adding the new type, is the biggest in terms of code, although it’s not really hard to understand (I also cheat a little). Our hoisted type needs to quack enough like an anonymous type to not break any code, which means it needs all the expected properties and to override Equals, GetHashCode, and ToString.
This is all contained in one large method. Rather than reproduce it here, I’ll show what it does.
Take the anonymous type:
new
{
A = "Hello",
B = 123,
C = new Dictionary<string, string>()
}
This will get a class declaration similar to
internal class __FTI88a733fde28546b8ae4f36786d8446ec {
public string A { get; set; }
public int B { get; set; }
public global::System.Collections.Generic.Dictionary<string, string> C { get; set; }
public override string ToString()
{
return new{A,B,C}.ToString();
}
public override int GetHashCode()
{
return new{A,B,C}.GetHashCode();
}
public override bool Equals(object o)
{
__FTI88a733fde28546b8ae4f36786d8446ec other = o as __FTI88a733fde28546b8ae4f36786d8446ec;
if(other == null) return new{A,B,C}.Equals(o);
return
(A != null ? A.Equals(other.A) :
(other.A != null ? other.A.Equals(A) : true )) &&
B == other.B &&
(C != null ? C.Equals(other.C) : (other.C != null ? other.C.Equals(C) : true ));
}
}
The two big cheats here are the lack of a constructor (so it’s technically possible to modify the anonymous type, this is fixable but I don’t think it’s needed for a proof-of-concept) and using the equivalent anonymous type itself to implement the required methods.
Conclusion
I’m pretty pleased with Roslyn thus far, it’s plenty powerful. There are still a bunch of limitations and unimplemented features in the current CTP though, so be aware of them before you embark on some recreational hacking.
In terms of complaints, some bits are a little verbose (though that’s gotten better since the first CTP), documentation is still lacking, and there are some fun threading assumptions that make debugging a bit painful. I’m sure I’m Doing It Wrong™ in some places, so some of my complaints may be hogwash; I expect improvements in each subsequent release as well.
If it wasn’t apparent, the code here is all proof-of-concept stuff. Don’t use in production, don’t expect it to be bug free, etc. etc.
You can check out the whole project on Github.
More, a CSS compiler
Posted: 2012/05/08 Filed under: code
CSS is an… interesting technology. As Wil Shipley put it, “CSS: If a horse were designed by a committee of camels.” There are just enough rough edges, weird decisions, and has-not-had-to-use-it-everyday blindness to make you want something better to work in. While it’s still a draft (and I hope has no chance of becoming part of a standard), take a look at CSS Variables and despair that it was ever proposed.
I’m hardly the first (second, or probably even millionth) person to think this, so there are some alternatives to writing CSS out there. At Stack Exchange we use LESS (specifically the dotLESS variant); Sass is also pretty popular judging from straw polling developers, though I’m not really familiar with it.
For kicks I started throwing my own CSS compiler together a few months ago that’s a little more radical than LESS (from which I took a great deal of inspiration). I’m calling it More for now because: naming is hard, and I like the homage to LESS. Although I am drawing explicit inspiration from LESS, More is not a LESS super-set.
The radical changes are
- Order doesn’t matter
- “Resets” are first class
- Sprites are first class
- Explicit copy selector syntax, in addition to mixins
Declaration Order Doesn’t Matter
In CSS, subsequent declarations override preceding ones: if you declare .my-class twice, the latter one’s rules win; likewise for rules within the same block. This is pretty wrong-headed in my opinion; you should be stating your intentions explicitly, not just tacking on new rules. Combine this with how negative an impact bloated CSS can have on page load speeds (CSS and HTML are the bare minimum necessary for display, everything else can be deferred), and I can’t help but classify this as a misfeature.
In More, the order of selector blocks in the output is explicitly not guaranteed to match that of the input. Resets, detailed below, allow you to specify that certain blocks precede all others, and @media blocks necessarily follow all others, but any other reliance on ordering is considered a bad practice.
Likewise, overriding rules in blocks is an explicit operation rather than a function of declaration order; the specifics are detailed below.
Resets
More natively supports the concept of a “reset”. Blocks contained within @reset{…} will always be placed at the top of the generated CSS file, and become available for inclusion via reset includes.
@reset{
h1 { margin: 0; }
}
.class h1 { color: blue; }
becomes
h1 { margin: 0; }
.class h1 { color: blue; }
Blocks within a @reset block cannot use @reset includes or selector includes, but can use mixins. Reset blocks are also not referenced by selector includes.
@reset includes let you say “reset this block to the default stylings”, practically this means including any properties defined within a @reset{…} in the block with the @reset include.
@reset {
h1 { margin: 0; }
}
.class h1 { color: blue; @reset(h1); }
becomes
h1 { margin: 0; }
.class h1 { color:blue; margin: 0; }
It is not necessary to specify the selector to reset to; an empty @reset() will reset to the selector of the block that contains it. Note that for nested blocks this will be the innermost block’s selector.
@reset {
:hover { color: red; }
h1 { margin: 0; }
}
h1 {
@reset();
line-height: 15px;
}
a {
color: blue;
&:hover {
font-weight: bold;
@reset();
}
}
becomes
:hover { color: red; }
h1 { margin: 0; }
h1 { margin: 0; line-height: 15px; }
a { color: blue; }
a:hover { font-weight: bold; color: red; }
Properties included by a @reset() (with or without optional selector) will never override those defined otherwise in a block.
@reset { a { color: red; height: 10px; } }
a { @reset(); color: blue; }
becomes
a { color: red; height: 10px; }
a { color: blue; height: 10px; }
The intent behind resets is to make it easier to define a style’s “default element stylings” and subsequently reset to them.
Sprites
If you include a non-trivial number of images on a site, you really ought to be collapsing them into sprites. There are plenty of tools for doing this out there, but tight integration with CSS generation (if you’ve already decided to compile your CSS) is an obvious win.
To that end, More lets you generate sprites with the following syntax:
@sprite('/img/sprite.png'){
@up-mod = '/img/up.png';
@down-mod = '/img/down.png';
}
This declaration creates sprite.png from up.png and down.png, and adds the @up-mod and @down-mod mixins to the file. Mixins created from sprites are no different from any other mixins.
For example:
.up-vote {
@up-mod();
}
could compile to
.up-vote {
background-image: url(img/sprite.png);
background-position: 0px 0px;
background-repeat: no-repeat;
width: 20px;
height: 20px;
}
For cases where there are other properties that you want to include semantically “within” a sprite, these generated mixins take one mixin as an optional parameter. The following would be valid, for example.
.down-vote {
@down-mod(@myMixin);
}
Finally, there are command line options for running an executable on generated sprite files. It’s absolutely worthwhile to squeeze the last few bytes out of a PNG file you’ll serve a million times, but actually implementing such a compression algorithm is outside the scope of More. While this can be fairly easily hacked together, as a built-in I hope it serves as a “you should be doing this” signal to developers. I personally recommend PNGOUT for this purpose.
Include Selectors
Sometimes, you just want to copy CSS around. This can be for brevity’s sake, or perhaps because you want to define a set of selectors in terms of some other selector. To do that with More, just include the selector in @() like so:
div.foo { color: #abc; }
div.bar {
background-color: #def;
@(div.foo);
}
This syntax works with more complicated selectors as well, including comma delimited ones, making it possible to include many distinct blocks in a single statement. It is not an error if a selector does not match any block in the document.
Selector includes are one of the last things to resolve in a More document, so you are copying the final CSS around. @(.my-class) will copy the final CSS in the .my-class block, but not any of the mixin invocations, nested blocks, or variable declarations that went into generating it.
For example:
.bar {
foo: @c ?? #eee;
&:hover {
buzz: 123;
}
}
.fizz {
@c = #aaa;
@(.bar);
}
Will generate:
.bar { foo: #eee; }
.bar:hover { buzz: 123; }
.fizz { foo: #eee; }
As @(.bar) copies only the properties of .bar in the final document.
LESS Inspired Bits
That’s it for the really radical new stuff, most of the rest of More is heavily inspired by (though rarely syntactically identical to) LESS. Where the syntax has been changed, it has been changed for reasons of clarity (at least to me, clarity is inherently subjective).
More takes pains to “stand out” from normal CSS, as it is far more common to be reading CSS than writing it (as with all code). As such, I believe it is of paramount importance that as little of a file as possible need be read to understand what is happening in a single line of More.
This is the reason for heavy use of the @ character, as it distinguishes between “standard CSS” with its very rare at-rules and More extensions when scanning a document. Likewise, the = character is used for assignment instead of :.
Variables
More allows you to define constants that can then be reused by name throughout a document; they can be declared globally as well as in blocks like mixins (which are detailed below).
@a = 1; @b = #fff;
These are handy for defining or overriding common values.
Another example:
@x = 10px;
img {
@y = @x*2;
width: @y;
height: @y;
}
It is an error for one variable to refer to another in the same scope before it is declared, but variables in inner scopes can refer to those in containing scopes irrespective of declaration order. Variables cannot be modified once declared, but inner variables can shadow outer ones.
Nested Blocks
One of the annoying parts of CSS is the repetition when declaring a hierarchy of selectors. Selectors for #foo, #foo.bar, #foo.bar:hover, and so on must appear in full over and over again.
More lets you nest CSS blocks, again much like LESS.
#id {
color: green;
.class {
background-color: red;
}
&:hover {
border: 5px solid red;
}
}
would compile to
#id {
color: green;
}
#id .class {
background-color: red;
}
#id:hover {
border: 5px solid red;
}
Note the presence of LESS’s & operator, which means “concat with parent”. Use it when you don’t want a space (descendant selector) between nested selectors; this is often the case with pseudo-class selectors.
Mixins
Mixins effectively let you define functions which you can then include in CSS blocks.
@alert(){
background-color: red;
font-weight: bold;
}
.logout-alert{
font-size: 14px;
@alert();
}
Mixins can also take parameters.
@alert(@c) { color: @c; }
Parameters can have default values, or be optional altogether. Default values are specified with an =<value>, optional parameters have a trailing ?.
@alert(@c=red, @size?) { color: @c; font-size: @size; }
Mixins can be passed as parameters to mixins, as can selector inclusions.
@outer(@a, @b) { @a(); @b(green); }
@inner(@c) { color: @c; }
h1 { font-size: 12px; }
img { @outer(@(h1), @inner); }
Will compile to
h1 { font-size: 12px; }
img { font-size: 12px; color: green; }
As in LESS, the special variable @arguments is bound to all the parameters passed into a mixin.
@mx(@a, @b, @c) { font-family: @arguments; }
p { @mx("Times New Roman", Times, serif); }
Compiles to
p { font-family: "Times New Roman", Times, serif; }
Parameter-less mixins do not define @arguments, and it is an error to name any parameter or variable @arguments.
Functions
More implements the following functions:
- blue(color)
- darken(color, percentage)
- desaturate(color, percentage)
- fade(color, percentage)
- fadein(color, percentage)
- fadeout(color, percentage)
- gray(color)
- green(color)
- hue(color)
- lighten(color, percentage)
- lightness(color)
- mix(color, color, percentage/decimal)
- nounit(any)
- red(color)
- round(number, digits?)
- saturate(color, percentage)
- saturation(color)
- spin(color, number)
Most of these functions are held in common with LESS, to ease transition. All functions must be preceded by @, for example:
@x = #123456;
p {
color: rgb(@red(@x), 0, 0);
}
Note that rgb(…), hsl(…), and other CSS constructs are not considered functions, but they will convert to typed values if used in variable declarations, mixin calls, or similar constructs.
In String Replacements
More strives to accept all existing valid CSS, but also attempts to parse the right hand side of properties for things like math and type information. These two goals compete with each other, as there’s a lot of rather odd CSS out there.
img {
filter: progid:DXImageTransform.Microsoft.MotionBlur(strength=9, direction=90);
font-size: 10px !ie7;
}
It’s not reasonable, in my opinion, to expect style sheets with these hacks to be re-written just to take advantage of More, but More can’t parse them either.
As a compromise, More will accept any of these rules but warn against their use. Since a fancier parsing scheme isn’t possible, a simple string replacement is done instead.
img {
filter: progid:DXImageTransform.Microsoft.MotionBlur(strength=@strength, direction=90);
font-size: @(@size * 2px) !ie7;
}
The above demonstrates the syntax. Any variables placed in the value will be simply replaced, more complicated expressions need to be wrapped in @().
Although this feature is intended mostly as a compatibility hack, it is also available in quoted strings.
Other, Less Radical, Things
There are a couple of nice More features that aren’t really LESS inspired, but would probably fit right into a future LESS version. They’re not nearly as radical as the additions described above.
Optional Values
Mixin invocations, parameters, variables, and effectively whole properties can be marked “optional”.
.example {
color: @c ?? #ccc;
@foo()?;
}
This block would have a color property of #ccc if @c were not defined, making overrides quite simple (you either define @c or you don’t). Likewise, the foo mixin would only be included if it were defined (without the trailing ? it would be an error).
A trailing ? on a selector include will cause a warning, but not an error. It doesn’t make sense, as selector includes are always optional.
Entire properties can be optional as follows:
.another-example {
height: (@a + @b * @c)?;
}
If any of @a, @b, or @c are not defined then the entire height property will not be evaluated. Without the grouping parentheses and trailing ? this would result in an error.
Overrides
When using mixins, it is not uncommon for multiple copies of the same property to be defined. This is especially likely if you’re using mixins to override default values. To alleviate this ambiguity, it is possible to specify that a mixin inclusion overrides rules in the containing block.
@foo(@c) { color: @c; }
.bar {
color: blue;
@foo(red)!;
}
This would create a .bar block with a single color property with the value of “red”. More will warn, but not error, when the same property is defined more than once in a single block.
You can also force a property to always override (regardless of a trailing ! when included) by using !important. More will not warn in these cases, as the intent is explicitly expressed.
You can also use a trailing ! on selector includes, to the same effect.
Values With Units
Values can be typed, and will be correctly coerced when possible, with an error otherwise. This makes it easier to catch nonsensical statements.
@mx(@a, @b) { height: @a + @b; }
.foo{
@mx(12px, 14); // Will evaluate to height: 26px;
@mx(1in, 4cm); // Will evaluate to height: 6.54cm; (or equivalent)
@mx(15, rgb(50,20,10)); // Will error
}
Similarly, any function which expects multiple colors can take colors of any form (hsl, rgb, rgba, hex triples, or hex sextuples).
When needed, units can be explicitly stripped from a value using the built in @nounit function.
Includes
You can include other More or CSS files using the @using directive. Any included CSS files must be parse-able as More (which is a CSS super-set, although a strict one). Any variables, mixins, or blocks defined in any includes can be referred to at any point in a More file; include order is unimportant.
@using can have a media query, this results in the copied blocks being placed inside of a @media block with the appropriate query specified. More will error if the media query is unparseable.
At-rules, which are usually illegal within @media blocks, will be copied into the global context; this applies to mixins as well as @font-face, @keyframes, and so on. This makes such at-rules available in the including file, allowing for flexibility in project structure.
More does allow normal CSS @imports. They aren’t really advisable for speed reasons. @imports must appear at the start of a file, and More will warn about @import directives that had to be re-ordered as that is not strictly compliant CSS in the first place.
CSS3 Animations
More is aware of the CSS3 @keyframes directive, and gives you all the same mixin and variable tools to build them.
@mx(@offset) { top: @offset * 2px; left: @offset + 4px; }
@keyframes my-anim {
from { @mx(5); }
to { @mx(15); }
}
Compiles to
@keyframes my-anim {
from { top: 10px; left: 9px;}
to { top: 30px; left: 19px; }
}
It is an error to include, either via a mixin or directly, a nested block in an animation block. More will also accept (and emit unchanged) the Mozilla and Webkit prefixed versions of @keyframes.
Variables can be declared within both @keyframes declarations and the inner animation blocks, the following would be legal More for example.
@keyframes with-vars {
@a = 10;
from {
@b = @a * 2;
top: @b + 5px;
left: @b + 3px;
}
to { top: @a; left: @a; }
}
It is an error to place @keyframes within @media blocks or mixins. As with @media, @keyframes included via a @using directive will be pulled into the global scope.
More Is A CSS Superset
More strives to accept all existing CSS as valid More. In-string replacements are a concession to this, as is @import support; both of which were discussed above. @charset, @media, and @font-face are also supported in pursuit of this goal.
@charset may be declared in any file, even those referenced via @using. It is an error for more than one character set to be referred to in a final file. More will warn if an unrecognized character set is used in any @charset declaration.
@media declarations are validated, with More warning on unrecognized media types and erroring if no recognized media types are found in a declaration. Blocks in @media statements can use mixins, but cannot declare them. Selector includes first search within the @media block, and then search in the outer scope. It is not possible for a selector include outside of a @media block to refer to a CSS block inside of one.
By way of example:
img { width: 10px; @(.avatar); }
p { font-color: grey; }
@media tv {
p { font-color: black; }
.avatar { border-width: 1px; @(p); }
.wrapper { height:10px; @(img); }
}
Will compile to:
img { width: 10px; }
p { font-color: grey; }
@media tv {
p { font-color: black; }
.avatar { border-width: 1px; font-color: black; }
.wrapper { height:10px; width: 10px; }
}
@font-face can be used as expected, however it is an error to include any nested blocks in a @font-face declaration. More will warn if a @font-face is declared but not referred to elsewhere in the final CSS output; this is generally a sign of an error, but the possibility of another CSS file (or inline style) referring to the declared font remains. More will error if no font-family or src rule is found in a @font-face block.
Minification
More minifies its output (based on a command line switch), so you don’t need a separate CSS minifier. It’s not the best minifier out there, but it does an OK job.
In particular it removes unnecessary white space and quotation marks, and reduces color and size declarations; “#008000” becomes “green” and “140mm” becomes “14cm”, and so on.
A contrived example:
img
{
a: #aabbcc;
b: rgb(100, 50, 30);
c: #008000;
d: 10mm;
e: 1.00;
f: 2.54cm;
}
Would become (leaving in white space for reading clarity):
img
{
a: #abc;
b: #64321e;
c: green;
d: 1cm;
e: 1;
f: 1in;
}
This minification is controlled by a command line switch.
Cross Platform
Platform choice is really irrelevant for web development, and in acknowledgment of that More runs just fine under Mono, and as such works on the Linux and Mac OS X command lines.
More Isn’t As Polished As LESS
I always feel a need to call this out when I push a code artifact, but this isn’t a Stack Exchange thing. This is a “Kevin Montrose screwing around with code to relax” thing. I’ve made a pretty solid effort to make it “production grade”: lots of tests, converting some real CSS (some of which is public, some of which is unfortunately not available), but this is still a hobby project. Nothing is going to make it as solid as years of real world usage, which LESS and Sass have and More doesn’t.
Basically, if you’re looking for something to simplify CSS right now and it cannot fail, go with one of them. But if you like some of what you read and are looking to play around, well…
Give it a Spin
You can also check out the code on Github.
Committing A Horrible Atrocity Against C#
Posted: 2012/05/05 Filed under: code
Title from the other side of the tracks: Bringing Sanity To C# Conditions
Have you played with Roslyn? It’s Microsoft’s “compiler as a service” framework coming to a future .NET release, with a CTP having been available for a while now.
The focus seems to be on making it really easy to do the sorts of source transformations Visual Studio (and heavy hitting plugins like ReSharper) specialize in. Better code generation, static analysis, and what-have-you tooling will probably be nice results as well.
But you can do some really evil/awesome things as well thanks to Roslyn failing very gracefully in the presence of malformed code, by design.
For example, have you ever wanted C# to be truthy?
Truthiness is coalescing certain values to true/false that are not strictly boolean; making null a stand in for false, for example. This is a common feature in dynamic languages, Python, Ruby, and of course Javascript all have some notion of truthiness.
I personally kind of hate truthiness (I’d appreciate it as an explicit unary operator though), but I realize this is a religious position. I also hate mayonnaise, but don’t begrudge its existence.
But if you do like truthiness, or like writing Fun™ code like I do, Roslyn makes it possible to cram it into C#.
I’m choosing to define Truthiness In C# as:
- null, the empty string, and the defaults of ValueTypes are false
- everything else is true
- only available in control statements, not as a type coercion
This definition is really easy to change, for reasons that will become apparent later, and is basically Python’s take (minus collections and dictionaries) so it’s not too crazy.
I want this to be a build step, so the flow for compiling with truthiness is:
- Take a (possibly invalid due to truthy statements) C# program
- Find all the bits that assume truthiness
- Replace those bits with equivalent and valid vanilla C#
- Hand the results off to the actual C# compiler
Roslyn actually can emit assemblies (in theory, I haven’t tried in the CTP), but for the sake of brevity I’m choosing to stop the process early and write new .cs files to disk.
Finding bits that assume truthiness is quite simple, because Roslyn doesn’t just blow up on an invalid program; it does its best to give you everything that can be gleaned from malformed code, just like Visual Studio.
This lets us use a SyntaxWalker to visit each node of the AST for our dodgy program and still mostly make sense of it. If we encounter anything where we expect a conditional (inside an if statement, for example) that isn’t provably a boolean, then we’ve probably found a truthy usage. We don’t do anything with that knowledge yet; we just stash it away for later.
The SyntaxWalker that does this is surprisingly simple. Once you’ve got a SemanticModel, figuring out the type of an expression is trivial.
private bool IsBool(ExpressionSyntax exp)
{
var info = Model.GetSemanticInfo(exp);
var knownType = info.Type;
return
knownType != null &&
knownType.SpecialType == Roslyn.Compilers.SpecialType.System_Boolean;
}
Once we’ve walked all the source, finding a list of all truthy things, we can do the actual truthy implementation. The easiest way to do this is to wrap all truthy values in a call to a method that implements the above rules. A more “correct” way would be to transform the truthy expressions into proper boolean ones, saving a method call and giving the compiler more information to work with.
I naturally went with the easy way, adding this method to every class that uses truthy expressions. This also makes it very easy to change the truthiness rules, as I alluded to earlier.
private static bool __Truthy(object o)
{
if (o == null || (o is string && (string)o == string.Empty)) return false;
var type = o.GetType();
if (type.IsValueType)
{
return !o.Equals(Activator.CreateInstance(type));
}
return true;
}
There’s a bit of finesse in the actual wrapping of expressions in calls to __Truthy. If an expression contains a sub-expression that is itself truthy we want to replace the sub-expression first and re-walk the SyntaxTree. This is because whether or not an expression is truthy is dependent on its sub-expressions: (true || “”) is truthy, but (true || __Truthy(“”)) is not, essentially. There’s also a little bit of work spent detecting if something is already being passed to __Truthy, so we don’t end up with __Truthy(__Truthy(“”)) or similar; this is mostly caused by ternary conditionals just being weird, relatively speaking.
The full project on Github is an executable that transforms a whole Visual Studio .csproj, in a rather haphazard fashion. You need the referenced assemblies to get a SemanticModel, which I’m extracting from .csprojs for convenience.
To illustrate the transformation, here’s some truthy C#.
static void Main(string[] args)
{
UserDefined ud = new UserDefined { ABoolProp = false, AStringProp = "" };
bool a = true, b = false;
if ("hello world") Console.WriteLine("String literals are truthy");
if (1) Console.WriteLine("int literals are truthy");
if (!null) Console.WriteLine("null is falsy");
if (!ud.AStringProp) Console.WriteLine("Member access works");
if (ud.AStringProp.Length || !ud.ABoolProp) Console.WriteLine("Chained member access works");
if (a || b || 0) Console.WriteLine("Normal bools aren't mangled");
if (true) Console.WriteLine("Normal literals aren't mangled");
string str = a ? "hello" : "world";
if (str == "hello") Console.WriteLine("Normal ternary conditionals aren't mangled");
for (int i = 0; i < 3 && (ud.AnIntMethod() ? "hello" : "world"); i++)
{
Console.WriteLine(i + " complicated condition");
}
}
This gets transformed into
static void Main(string[] args)
{
UserDefined ud = new UserDefined { ABoolProp = false, AStringProp = "" };
bool a = true, b = false;
if (__Truthy("hello world")) Console.WriteLine("String literals are truthy");
if (__Truthy(1)) Console.WriteLine("int literals are truthy");
if (!__Truthy(null)) Console.WriteLine("null is falsy");
if (!__Truthy(ud.AStringProp)) Console.WriteLine("Member access works");
if (__Truthy(ud.AStringProp.Length) || !ud.ABoolProp) Console.WriteLine("Chained member access works");
if (a || b || __Truthy(0)) Console.WriteLine("Normal bools aren't mangled");
if (true) Console.WriteLine("Normal literals aren't mangled");
string str = a ? "hello" : "world";
if (str == "hello") Console.WriteLine("Normal ternary conditionals aren't mangled");
for (int i = 0; i < 3 && (__Truthy(__Truthy(ud.AnIntMethod()) ? "hello" : "world")); i++)
{
Console.WriteLine(i + " complicated condition");
}
}
Notice that spacing is maintained; Roslyn round-trips that sort of thing, which is very nice.
And of course, the code runs and outputs the expected text:
String literals are truthy
int literals are truthy
null is falsy
Member access works
Chained member access works
Normal bools aren't mangled
Normal literals aren't mangled
Normal ternary conditionals aren't mangled
0 complicated condition
1 complicated condition
2 complicated condition
There are, as always, caveats. Lots in this case, as Roslyn is still a CTP and there are a number of C# features it doesn’t handle yet. Using var or lambdas will mess things right up, and I’m sure there are some outright bugs in both Roslyn and my dinky code.
But isn’t it cool that it’s possible to abuse C# even this much already?
An Absurd Experiment In String Parsing
Posted: 2012/02/26 Filed under: code
I’m going to start this off by saying that this post deals with some code that is a bit… silly. Definitely veering into “why would you do this?” territory for most, but it’s still an interesting endeavor.
The question is, how quickly can you parse a string?
To give some context, at Stack Exchange we’ve got a process that’s slurping logs from our load balancer into SQL Server for us to query; it’s useful for analytics, debugging, and performance monitoring. The trick is that at our scale this is quite a lot of data to be dealing with, and traffic spikes at peak time have a habit of knocking the service into a death spiral.
Investigations into one of the latest incidents of service suicide indicated, for a short time, that actually parsing the logs was a bottleneck. Ultimately this turned out to be a red herring (the actual killer was, rather predictably, disk IO), but the seed of “how would we do this faster?” was planted.
To be clear, what follows isn’t solving a real problem; this is just me having some fun with code. Everyone should do it now and again, builds character.
There are, broadly, two approaches to string parsing
You can either go with regular expressions, or you can roll it yourself with IndexOf (or your framework of choice’s equivalent). All things being equal, I’m inclined to go with regular expressions for their terseness and comparable ease of maintainability.
If you’re going for maximum speed (as I am here), you really do want to do all the string manipulation yourself and maintainability be damned. However, it’d be nice if we could get the performance of IndexOf’ing and Substring’ing everywhere with a cleaner interface.
Enter IL
ILGenerator, my favorite class to go out of my way to avoid, lets you generate new methods at the bytecode (Common Intermediate Language, in the .NET world) level at runtime. If you want speed, you’ll find it here.
The approach I went with was to create a nice interface (call chaining, opaque state objects, and all that jazz) for describing a parser, and at the last minute dropping into some hideous IL generation to create an actual delegate to do the deed. This produces very, very fast string parsing code without being unbearable to use; it also enables some dirty tricks we wouldn’t want to use even in hand rolled parsing code, which I’ll get to later.
In terms of capabilities, I decided that a reasonable string parser would be composed of the following components
- Moving forward or backwards in the string a fixed number of characters
- Moving forward or backwards in the string until a certain string is encountered
- Taking a given number of characters from the string as a value
- Taking characters from the string until a certain string is encountered
- Taking the remainder of the string as a value
- Performing an arbitrary action when the input string does not conform to the parser’s expectations
I decomposed these six components further into eighteen different method calls, though there are many convenience overloads (referring to a member via either a string or a MemberInfo being the most common).
It ends up looking like the following (contrived example) in practice:
var parser =
FSBuilder
.Take(":=", "VarName")
.Take("(", "MethodName")
.Take(")", "Parameters")
.Until(";")
.TakeRest("Remained")
.Else((str, obj) => { throw new ArgumentException(); })
.Seal();
var myObj = new MyObject();
parser("a:=fx(a,b,c); ... and then some more ...", myObj);
Do note that this isn’t meant to be a replacement for regular expressions, it’s meant to replace a class of string parsers for which I’d expect most developers to use regular expressions. I do suspect these operations are enough to build DFAs (though I’m not going to spend time trying to prove it, I don’t really care), but not very cleanly and most regular expression engines aren’t strictly regular anyway.
This code runs like greased lightning
On my machine, parsing “(\d+),(\d+),(\d+),(\d+)” and its equivalent hand written and composed versions across one million randomly generated (but compliant) strings yields the following.
- Regex: 3540ms
- Hand Written: 1140ms
- Composed: 690ms
I’m dropping the last digit, as it’s highly variable, and taking the median of five runs; forcing a GC (and pausing for it to complete) between each run to try and minimize GC jitters.
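For the curious, the measurement harness amounts to something like this sketch (the three parser variants themselves are elided):
using System;
using System.Diagnostics;

static class Timing
{
    // Times a single run, forcing a collection first so GC work from the
    // previous run doesn't bleed into this measurement.
    public static long TimeRun(Action parseAllStrings)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        parseAllStrings(); // parse the million pre-generated strings
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }
}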
Wait, aren’t the hand written and composed versions equivalent?
They’re definitely similar, but there are a lot of dirty tricks you can pull off when you’re emitting IL that you wouldn’t really want to tolerate in hand written code.
A big one: if you’re using IndexOf and Substring you’re still creating lots of strings (creating GC pressure) and implicitly paying for method calls (which are fast but not instant, so they add up over many iterations) to get at the actual character data. The internals of a regular expression are even worse in terms of overhead and object churn. The IL I’m generating does everything in terms of character arrays (in fact, I can guarantee I create only one to parse a string of integers) and avoids method calls like the plague, effectively inlining everything.
In a related vein, both regular expression and IndexOf/Substring approaches end up giving you strings you need to parse into the appropriate data types. Naively, you’ll typically use things like Int32.Parse which have the same “strings + method call” overhead as above. Generating IL lets me inline my own parsing code, which naturally deals in character arrays, avoiding both costs again.
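To make the “inline my own parsing code” point concrete, here is roughly what the generated IL does for an integer member, written out as plain C#: no substrings, no Int32.Parse, just a walk over a slice of the character buffer.
// A plain-C# sketch of the integer parsing the emitted IL effectively inlines.
// It assumes the slice is well formed, which the parser can guarantee after
// it has already scanned for the surrounding delimiters.
static int ParseInt(char[] buffer, int start, int length)
{
    var i = start;
    var negative = buffer[i] == '-';
    if (negative) i++;

    var value = 0;
    for (; i < start + length; i++)
    {
        value = value * 10 + (buffer[i] - '0');
    }

    return negative ? -value : value;
}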
The IndexOf/Substring approach does still eke out better performance when you’re just dealing with string members. Delving into the .NET classes with Reflector shows that Microsoft has (wisely) invested quite a lot of time in optimizing string manipulation; to match them I’d probably have to start linking C in, which is a bigger time investment than I’m willing to make.

There really is no lower limit.
Of course, it could be dirtier
I have left some areas unexplored that would probably squeeze a few more milliseconds out.
Non-integer types still fall back to Parse calls, as do DateTime and TimeSpan and some cases of enumerations. Alternative implementations are time consuming to produce, but would pay dividends especially in the DateTime and TimeSpan cases.
I suspect allocations can be reduced even further through some dubious stackalloc or ThreadLocal usage, possibly combined with some heap juggling. I’m not nearly as certain these would work out for the best, but a lack of expertise and time has kept me from digging too deeply.
My string scanning algorithm is basically a for loop; breaking out something like Boyer-Moore would pay noticeable dividends for larger “needles”. This is a lot of complexity for exactly zero gain in the rather common “single character needle” case, so I haven’t sunk the necessary time in yet and may never.
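For reference, that “basically a for loop” scan looks roughly like this in plain C#: fine for single-character needles, quadratic in the worst case for longer ones, which is exactly where Boyer-Moore would help.
// Naive needle search over a character buffer.
static int IndexOfNeedle(char[] haystack, int start, char[] needle)
{
    for (var i = start; i <= haystack.Length - needle.Length; i++)
    {
        var match = true;
        for (var j = 0; j < needle.Length; j++)
        {
            if (haystack[i + j] != needle[j]) { match = false; break; }
        }
        if (match) return i;
    }
    return -1;
}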
Not much time has been spent on generating ideal IL either. There’s certainly a lot of stack shuffling that could be eliminated for truly minuscule, if non-zero, savings.
If you’re interested, play around with the code
I’ve thrown it up on Github, and uploaded a package to NuGet as the unexcitingly named FluentStringParser (because naming is hard). The code’s moderately well documented, but it’s still IL so the learning curve is more of a brick wall, really. Still, learn by doing and all that.
I do stress that this is a personal project, not a Stack Exchange endorsed chunk of code (those are currently documented in this blog post). Caveat emptor if you pull this code in as it does dirty, dirty things for silly, silly reasons.
Stack Exchange API V2.0: JS Auth Library
Posted: 2012/01/25 Filed under: code, pontification | Tags: apiv2
In a previous article I discussed why we went with OAuth 2.0 for authentication in V2.0 of the Stack Exchange API (beta and contest currently underway), and very shortly after we released a simple javascript library to automate the whole affair (currently also considered “in beta”, report any issues on Stack Apps). The motivations for creating this are, I feel, non-obvious as is why it’s built the way it is.
Motivations
I’m a strong believer in simple APIs. Every time a developer using your API has to struggle with a concept or move outside their comfort zone, your design has failed in some small way. When you look at the Stack Exchange API V2.0, the standout “weird” thing is authentication. Every other function in the system is a simple GET (well, there is one POST with /filters/create), has no notion of state, returns JSON, and so on. OAuth 2.0 requires user redirects, obviously has some notion of state, has different flows, and is passing data around on query strings or in hashes.
It follows that, in pursuit of overall simplicity, it’s worthwhile to focus on simplifying consumers using our authentication flows. The question then becomes “what can we do to simplify authentication?”, with an eye towards doing as much good as possible with our limited resources. The rationale for a javascript library is that:
- web applications are prevalent, popular, and all use javascript
- we lack expertise in the other (smaller) comparable platforms (Android and iOS, basically)
- web development makes it very easy to push bug fixes to all consumers (high future bang for buck)
- other APIs offer hosted javascript libraries (Facebook, Twitter, Twilio, etc.)
Considerations
The first thing that had to be decided was the scope of the library; although the primary driver for the library was the complexity of authentication, that did not necessarily mean that’s all the library should offer. Ultimately, all it does cover is authentication, for reasons of both time and avoidance of a chilling effect. Essentially, scoping the library to just authentication gave us the biggest bang for our buck while alleviating most fears that we’d discourage the development of competing javascript libraries for our API. It is, after all, in Stack Exchange’s best interest for there to be a healthy development community around our API.
I also decided that it was absolutely crucial that our library be as small as possible, and quickly served up. Negatively affecting page load is unacceptable in a javascript library, basically. In fact, concerns about page load times are why the Stack Exchange sites themselves do not use Facebook or Twitter provided javascript for their share buttons (and also why there is, at time of writing, no Google Plus share option). It would be hypocritical to expect other developers to not have the same concerns we do about third-party includes.
Implementation
Since it’s been a while since there’s been any code in this discussion, I’m going to go over the current version (which reports as 453) and explain the interesting bits. The source is here, though I caution that a great many things in it are implementation details that should not be depended upon. In particular, consumers should always link to our hosted version of the library (at https://api.stackexchange.com/js/2.0/all.js).
The first three lines sort of set the stage for “small as we can make it”.
window.SE = (function (navigator, document,window,encodeURIComponent,Math, undefined) {
"use strict";
var seUrl, clientId, loginUrl, proxyUrl, fetchUserUrl, requestKey, buildNumber = '@@~~BuildNumber~~@@';
I’m passing globals as parameters to the closure defining the interface in those cases where we can expect minification to save space (there’s still some work to be done here, where I will literally be counting bytes for every reference). We don’t actually pass an undefined to this function, which both saves space and assures nobody’s done anything goofy like giving undefined a value. I intend to spend some time seeing if similar proofing for all passed terms is possible (document and window are already un-assignable, at least in some browsers). Note that we also declare all of our variables in batches throughout this script, to save bytes from repeating “var” keywords.
Implementation Detail: “@@~~BuildNumber~~@@” is replaced as part of our deploy. Note that we pass it as a string everywhere, allowing us to change the format of the version string in the future. Version is provided only for bug reporting purposes; consumers should not depend on its format nor use it in any control flow.
function rand() { return Math.floor(Math.random() * 1000000); }
Probably the most boring part of the entire implementation: it just gives us a random number. Smaller than inlining it everywhere we need one, but not by a lot even after minifying references to Math. Since we only ever use this to avoid collisions, I’ll probably end up removing it altogether in a future version to save some bytes.
function oldIE() {
if (navigator.appName === 'Microsoft Internet Explorer') {
var x = /MSIE ([0-9]{1,}[\.0-9]{0,})/.exec(navigator.userAgent);
if (x) {
return x[1] <= 8.0;
}
}
return false;
}
Naturally, there’s some Internet Explorer edge case we have to deal with. For this version of the library, it’s that IE8 has all the appearances of supporting postMessage but does not actually have a compliant implementation. This is a fairly terse check for Internet Explorer versions <= 8.0, inspired by the Microsoft recommended version. I suspect a smaller one could be built, and it’d be nice to remove the usage of navigator if possible.
Implementation Detail: There is no guarantee that this library will always treat IE 8 or lower differently than other browsers, nor is there a guarantee that it will always use postMessage for communication when able.
Now we get into the SE.init function, the first method that every consumer will need to call. You’ll notice that we accept parameters as properties on an options object; this is a future proofing consideration, as we’ll be able to add new parameters to the method without worrying (much) about breaking consumers.
You’ll also notice that I’m doing some parameter validation here:
if (!cid) { throw "`clientId` must be passed in options to init"; }
if (!proxy) { throw "`channelUrl` must be passed in options to init"; }
if (!complete) { throw "a `complete` function must be passed in options to init"; }
if (!requestKey) { throw "`key` must be passed in options to init"; }
This is something of a religious position, but I personally find it incredibly frustrating when a minified javascript library blows up because it expected a parameter that wasn’t passed. This is inordinately difficult to diagnose given how trivial the error is (often being nothing more than a typo), so I’m checking for it in our library and thereby hopefully saving developers some time.
Implementation Detail: The exact format of these error messages isn’t specified, in fact I suspect we’ll rework them to reduce some repetition and thereby save some bytes. It is also not guaranteed that we will always check for required parameters (though I doubt we’ll remove it, it’s still not part of the spec) so don’t go using try-catch blocks for control flow.
This odd bit:
if (options.dev) {
    seUrl = 'https://dev.stackexchange.com';
    fetchUserUrl = 'https://dev.api.stackexchange.com/2.0/me/associated';
} else {
    seUrl = 'https://stackexchange.com';
    fetchUserUrl = 'https://api.stackexchange.com/2.0/me/associated';
}
Is for testing on our dev tier. At some point I’ll get our build setup to strip this out of the production version; that’s a lot of wasted bytes right there.
Implementation Detail: If the above wasn’t enough, don’t even think about relying on passing dev to SE.init(); it’s going away for sure.
The last bit of note in SE.init is the very last line:
setTimeout(function () { complete({ version: buildNumber }); }, 1);
This is a bit of future proofing as well. Currently, we don’t actually have any heavy lifting to do in SE.init(), but there very well could be some in the future. Since we’ll never accept blocking behavior, we know that any significant additions to SE.init() will be asynchronous; and a complete function would be the obvious way to signal that SE.init() is done.
Implementation Detail: Currently, you can get away with calling SE.authenticate() immediately, without waiting for the complete function passed to SE.init() to execute. Don’t do this, as you may find that your code will break quite badly if our provided library starts doing more work in SE.init().
Next up is fetchUsers(), an internal method that handles fetching network_users after an authentication session should the consumer request them. We make a JSONP request to /me/associated, since we cannot rely on the browser understanding CORS headers (which are themselves a fairly late addition to the Stack Exchange API).
Going a little out of order, here’s how we attach the script tag.
while (window[callbackName] || document.getElementById(callbackName)) {
    callbackName = 'sec' + rand();
}
window[callbackName] = callbackFunction;
src += '?access_token=' + encodeURIComponent(token);
src += '&pagesize=100';
src += '&key=' + encodeURIComponent(requestKey);
src += '&callback=' + encodeURIComponent(callbackName);
src += '&filter=!6RfQBFKB58ckl';
script = document.createElement('script');
script.type = 'text/javascript';
script.src = src;
script.id = callbackName;
document.getElementsByTagName('head')[0].appendChild(script);
The only interesting bit here is the while loop making sure we don’t pick a callback name that is already in use. Such a collision would be catastrophically bad, and since we can’t guarantee anything about the hosting page we don’t have a choice but to check.
Implementation Detail: JSONP is the lowest common denominator, since many browsers still in use do not support CORS. It’s entirely possible we’ll stop using JSONP in the future, if CORS supporting browsers become practically universal.
Our callbackFunction is defined earlier as:
callbackFunction =
    function (data) {
        try {
            delete window[callbackName];
        } catch (e) {
            window[callbackName] = undefined;
        }

        script.parentNode.removeChild(script);

        if (data.error_id) {
            error({ errorName: data.error_name, errorMessage: data.error_message });
            return;
        }

        success({ accessToken: token, expirationDate: expires, networkUsers: data.items });
    };
Again, this is fairly pedestrian. One important thing that is often overlooked when making these sorts of libraries is the cleanup of script tags and callback functions that are no longer needed. Leaving those lingering around does nothing but negatively affect browser performance.
Implementation Detail: The try-catch block is a workaround for older IE behaviors. Some investigation into whether setting the callback to undefined performs acceptably for all browsers may let us shave some bytes there, and remove the block.
Finally, we get to the whole point of this library: the SE.authenticate() method.
We do the same parameter validation we do in SE.init, though there’s a special case for scope.
if (scopeOpt && Object.prototype.toString.call(scopeOpt) !== '[object Array]') { throw "`scope` must be an Array in options to authenticate"; }
Because we can’t rely on the presence of Array.isArray in all browsers, we have to fall back on this silly toString() check.
The meat of SE.authenticate() is in this block:
if (window.postMessage && !oldIE()) {
    if (window.attachEvent) {
        window.attachEvent("onmessage", handler);
    } else {
        window.addEventListener("message", handler, false);
    }
} else {
    poll =
        function () {
            if (!opened) { return; }
            if (opened.closed) {
                clearInterval(pollHandle);
                return;
            }

            var msgFrame = opened.frames['se-api-frame'];
            if (msgFrame) {
                clearInterval(pollHandle);
                handler({ origin: seUrl, source: opened, data: msgFrame.location.hash });
            }
        };

    pollHandle = setInterval(poll, 50);
}

opened = window.open(url, "_blank", "width=660, height=480");
In a nutshell, if a browser supports (and properly implements, unlike IE8) postMessage, we use that for cross-domain communication; otherwise we use the old iframe trick. The iframe approach here isn’t the most elegant (polling isn’t strictly required), but it’s simpler.
Notice that if we end up using the iframe approach, I’m wrapping the results up in an object that quacks enough like a postMessage event to make use of the same handler function. This is easier to maintain, and saves some space through code reuse.
Implementation Detail: Hoo boy, where to start. First, the usage of postMessage or iframes shouldn’t be relied upon. Nor should the format of the messages sent. The observant will notice that stackexchange.com detects that this library is in use, and only creates an iframe named “se-api-frame” when it is; this behavior shouldn’t be relied upon. There’s quite a lot in this method that should be treated as a black box; note that the communication calisthenics this library is doing isn’t necessary if you’re hosting your javascript under your own domain (as is expected of other, more fully featured, libraries like those found on Stack Apps).
Here’s the handler function:
handler =
    function (e) {
        if (e.origin !== seUrl || e.source !== opened) { return; }

        var i,
            pieces,
            parts = e.data.substring(1).split('&'),
            map = {};

        for (i = 0; i < parts.length; i++) {
            pieces = parts[i].split('=');
            map[pieces[0]] = pieces[1];
        }

        if (+map.state !== state) {
            return;
        }

        if (window.detachEvent) {
            window.detachEvent("onmessage", handler);
        } else {
            window.removeEventListener("message", handler, false);
        }

        opened.close();

        if (map.access_token) {
            mapSuccess(map.access_token, map.expires);
            return;
        }

        error({ errorName: map.error, errorMessage: map.error_description });
    };
You’ll notice that we’re religious about checking the message for authenticity (origin, source, and state checks). This is very important as it helps prevent malicious scripts from using our script as a vector into a consumer; security is worth throwing bytes at.
We’re also conscientious about cleaning up here, making sure to unregister our event listener, for the same performance reasons.
I’m using a mapSuccess function to handle the conversion of the response and invocation of success (and optionally calling fetchUsers()). This is probably wasting some space and will get refactored sometime in the future.
I’m passing expirationDate to success as a Date because of a mismatch between the Stack Exchange API (which talks in “seconds since the unix epoch”) and javascript (which while it has a dedicated Date type, thinks in “milliseconds since unix epoch”). They’re just similar enough to be confusing, so I figured it was best to pass the data in an unambiguous type.
Implementation Detail: The manner in which we’re currently calculating expirationDate can over-estimate how long the access token is good for. This is legal, because the expiration date of an access token technically just specifies a date by which the access token is guaranteed to be unusable (consider what happens to an access token for an application a user removes immediately after authenticating to).
Currently we’ve managed to squeeze this whole affair down into a little less than 3K worth of minified code, which gets down under 2K after compression. Considering caching (and our CDN) I’m pretty happy with the state of the library, though I have some hope that I can get us down close to 1K after compression.
[ed. As of version 568, the Stack Exchange Javascript SDK is down to 1.77K compressed, 2.43K uncompressed.]
Stack Exchange API V2.0: Implementing Filters
Posted: 2012/01/11 Filed under: code, pontification | Tags: apiv2 Comments Off on Stack Exchange API V2.0: Implementing FiltersAs part of this series of articles, I’ve already discussed why we added filters to the Stack Exchange API in version 2.0 (go check it out, you could win a prize). Now I’m going to discuss how they were implemented and what drove the design.
Considerations
Stability
It is absolutely paramount that filters not break, ever. A lot of the benefits of filters go away if applications are constantly generating them (that is, if they aren’t “baked into” executables), and “frustrated” would be a gross understatement of how developers would feel if we kept forcing them to redistribute their applications with new filters.
From stability, it follows that filters need to be immutable. Consider the denial of service attack that would be possible if a malicious party extracted and modified a filter baked into a popular application.
Speed
One of the big motivations behind filters was improving performance, so it follows that the actual implementation of filters shouldn’t have any measurable overhead. In practice this means that no extra database queries (and preferably no network traffic at all) can occur as consequence of passing a filter.
Ease of Use
While it’s probably impossible to make using a filter more convenient than not using one, it’s important that using filters not be a big hassle for developers. Minimizing the number of filters that need to be created, and providing tools to aid in their creation are thus worthwhile.
Implementation
Format
Filters, at their core, ended up being a simple bitfield of which fields to include in a response. Bitfields are fast to decode, and naturally stable.
Also, every field on every type is encoded in every filter’s bitfield. This is important for the ease of use consideration, as it makes it possible to use a single filter for all your requests.
Encoding
A naive bitfield for a filter would have, at time of writing, 282 bits. This is a bit hefty (a base64 encoded naive filter would be approximately 47 characters long, for example), so it behooves us to compress it somewhat.
An obvious and simple compression technique is to run-length encode the bitfield. We make this even more likely to bear fruit by grouping the bits first by “included in the default filter” and then by “type containing the field”. This grouping exploits the expectation that filters will commonly either diverge from the default filter or focus on particular types.
We also tweak the characters we’ll use to encode a filter a bit, so we’re technically encoding in a base higher than 64; though we’re losing a character to indicate safe/unsafe (which is a discussion for another time).
All told, this gets the size of filters we’re seeing in the wild down to a manageable 3 to 29 characters.
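To make the encoding a little more concrete, here’s a minimal sketch (and only a sketch, not our actual implementation) of run-length encoding an already-grouped bitfield into a 64 character alphabet; the real alphabet, the safe/unsafe marker, and the bit shuffling described below are all omitted, and the alphabet shown here is made up.

using System.Text;

static class FilterEncoding
{
    // A hypothetical 64 character alphabet; the real one is slightly larger and tweaked
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_";

    // Run-length encode a bitfield that has already been grouped by
    // "included in the default filter" and then by "type containing the field"
    public static string Encode(bool[] bits)
    {
        var sb = new StringBuilder();
        var current = false;   // runs alternate, starting with "excluded"
        var runLength = 0;

        foreach (var bit in bits)
        {
            if (bit == current)
            {
                runLength++;
                continue;
            }

            AppendRun(sb, runLength);
            current = bit;
            runLength = 1;
        }
        AppendRun(sb, runLength);

        return sb.ToString();
    }

    // Encode one run length, splitting runs too long for a single character
    static void AppendRun(StringBuilder sb, int length)
    {
        while (length >= Alphabet.Length)
        {
            sb.Append(Alphabet[Alphabet.Length - 1]); // longest representable run
            sb.Append(Alphabet[0]);                   // zero-length run of the other value
            length -= Alphabet.Length - 1;
        }
        sb.Append(Alphabet[length]);
    }
}

Decoding just walks the string back, alternating between runs of excluded and included bits; the grouping described above is what makes those runs long enough for this step to pay off.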
Bit Shuffling
This one’s a bit odd: in the middle of the encoding step we do some seemingly pointless bit shuffling. What we’re trying to do here is enforce opaqueness; why we’d want to do that deserves some explanation.
A common problem when versioning APIs is discovering that a number of consumers (oftentimes an uncomfortably large number) are getting away with doing technically illegal things. An example is SetWindowsHook in Win16 (short version, developers could exploit knowledge of the implementation to avoid calling UnhookWindowsHook), one from V1.0 of the Stack Exchange API is /questions/{id}/comments also accepting answer ids (this exploits /posts/{ids}/comments, /questions/{ids}/comments, and /answers/{ids}/comments all being aliases in V1.x). When you find such behavior you’re left choosing between breaking consumers or maintaining “bugs” in your implementation indefinitely, neither of which are great options.
The point of bit shuffling is to make it both harder to figure out the implementation (though naturally not impossible, the average developer is more than bright enough to figure our scheme out given enough time) so such “too clever for your own good” behavior is harder to pull off, and to really drive the point home that you shouldn’t be creating filters without calling /filter/create.
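For illustration only, the shuffling amounts to applying a fixed permutation to bit positions before encoding (and its inverse when decoding). The sketch below assumes permutation is a bijection over the bit indexes; the actual permutation used by the API is deliberately undocumented.

static bool[] Shuffle(bool[] bits, int[] permutation)
{
    // permutation[i] says where bit i ends up; applying the inverse undoes this
    var shuffled = new bool[bits.Length];
    for (var i = 0; i < bits.Length; i++)
    {
        shuffled[permutation[i]] = bits[i];
    }
    return shuffled;
}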
Backwards Compatibility
Maintaining compatibility between API versions with filters is actually pretty simple if you add one additional constraint: you never remove a bit from the field. This lets you use the length of the underlying bitfield as a version number.
Our implementation maintains a list of fields paired with the length of the bitfield they were introduced on. This lets us figure out which fields were available when a given filter was created, and exclude any newer fields when we encounter an old filter.
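A sketch of that bookkeeping might look like the following; FieldDescription and the field list are illustrative stand-ins rather than our actual types, and it assumes fields are only ever appended to the bitfield.

using System.Collections.Generic;

class FieldDescription
{
    public string Name;             // e.g. "question.title"
    public int BitIndex;            // the field's fixed position in the bitfield
    public int IntroducedAtLength;  // how long the bitfield was when the field was added
}

static class FilterCompatibility
{
    // All known fields; new fields only ever get appended, never removed
    static readonly List<FieldDescription> Fields = new List<FieldDescription>();

    // Which fields does an old (possibly shorter) bitfield actually cover?
    public static IEnumerable<string> IncludedFields(bool[] bits)
    {
        foreach (var field in Fields)
        {
            // Anything introduced after this filter was created is simply excluded
            if (field.IntroducedAtLength > bits.Length) continue;
            if (bits[field.BitIndex]) yield return field.Name;
        }
    }
}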
Composing A Query
One prerequisite for filters is the ability to easily compose queries against your datastore. After all, it’s useless to know that certain fields shouldn’t be fetched if you can’t actually avoid querying for them.
In the past we would have used LINQ-to-SQL, but performance concerns have long since led us to develop and switch to Dapper, and to SqlBuilder in Dapper.Contrib.
Here’s a rough outline of building part of an answer object query.
// While technically optional, we always need this so *always* fetch it
builder.Select("Id AS answer_id");
builder.Select("ParentId AS question_id");

// links and title are backed by the same columns
if (Filter.Answer.Link || Filter.Answer.Title)
{
    builder.LeftJoin("dbo.Posts Q ON Q.Id = ParentId");
    builder.Select("Q.Title as title");
}

if (Filter.Answer.LastEditDate)
{
    builder.Select("LastEditDate AS last_edit_date");
}
Note that sometimes we’ll grab more data than we intend to return; when fetching badge_count objects, for example, we always fetch all three counts even if we only intend to return, say, gold. We rely on some IL magic just before we serialize our response to handle those cases.
Caches
The Stack Exchange network sites would fall over without aggressive caching, and our API has been no different. However, introducing filters complicates our caching approach a bit.
In V1.x, we just maintained query -> response and type+id -> object caches. In V2.0, we need to account for the fields actually fetched or we risk responding with too many or too few fields set when we have a cache hit.
The way we deal with this is to tag each object in the cache with a mini-filter which contains only those types that could have been returned by the method called. For example, the /comment mini-filter would contain all the fields on the comment and shallow_user types. When we pull something out of the cache, we can check to see if it matches by seeing if the cached mini-filter covers the relevant fields in the current request’s filter; and if so, use the cached data to avoid a database query.
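Conceptually the coverage test is just a subset check over field bits. Here’s a minimal sketch, modeling both the cached mini-filter and the current request’s filter as BitArrays; the actual representation isn’t spelled out in this post, so treat the types and names as assumptions.

using System.Collections;

static class CacheCoverage
{
    // True if every field the current request wants is already present on the
    // cached copy, i.e. the cached mini-filter is a superset of the request
    public static bool Covers(BitArray cachedMiniFilter, BitArray requested)
    {
        for (var i = 0; i < requested.Length; i++)
        {
            if (requested[i] && (i >= cachedMiniFilter.Length || !cachedMiniFilter[i]))
                return false;
        }
        return true;
    }
}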
One clever hack on top of this approach lets us service requests for filters that we’ve never actually seen before. When we have a cache hit for a given type+id pair but the mini-filter doesn’t cover the current request, we run the full request (database hit and all) and then merge the cached object with the returned one and place it back in the cache. I’ve taken to calling this “merge and return to cache” process widening an object in cache.
Imagine: request A comes in asking for 1/2 the question fields, request B now comes in asking for the other 1/2, then request C comes in asking for all the fields on question. When A is processed there’s nothing in the cache, we run the query and place 1/2 of a question object in cache. When B is processed, we find the cached result of A but it doesn’t have the fields needed to satisfy B; so we run the query, widen the cached A with the new B. When C is processed, we find the cached union of A and B and voilà, we can satisfy C without hitting the database.
One subtlety is that you have to make sure a widened object doesn’t remain in cache forever. It’s all too easy for an object to gain a handful of fields over many subsequent queries, resetting its expiration each time and causing you to serve exceptionally stale data. The exact solution depends on your caching infrastructure; we just add another tag to the object with its maximum expiration time, and anything we pull out of the cache that’s past due to be expired is ignored.
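Putting widening and the expiration cap together, a rough sketch might look like this; CachedEntry, the re-store callback, and the field merge are stand-ins for whatever your caching infrastructure and serialization layer provide, not our actual implementation.

using System;
using System.Collections;

class CachedEntry
{
    public object Value;            // the partially populated API object
    public BitArray MiniFilter;     // which fields are actually present on Value
    public DateTime MaxExpiration;  // set once, when the entry first enters the cache
}

static class CacheWidening
{
    // Merge freshly fetched fields into the cached entry and put it back,
    // without extending its lifetime past the original expiration
    public static void Widen(CachedEntry cached, BitArray freshFields,
                             Action<CachedEntry, TimeSpan> reStore)
    {
        // (Field-by-field copy of the fresh values onto cached.Value omitted here)
        cached.MiniFilter = cached.MiniFilter.Or(freshFields); // assumes equal lengths

        // Re-store only for however long the original entry had left; otherwise
        // a frequently widened object could live in cache forever
        var remaining = cached.MaxExpiration - DateTime.UtcNow;
        if (remaining > TimeSpan.Zero)
        {
            reStore(cached, remaining);
        }
    }
}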
Tooling
We attacked the problem of simplifying filter usage in two ways: providing a tool to generate filters, and enabling rapid prototyping with a human-friendly way to bypass filter generation.
We spent a lot of time getting the GUI for filter editing up to snuff in our API console (for example, the /questions console). With just that console you can relatively easily generate a new filter, or use an existing one as a starting point. For our internal development practically all filters have ended up being created via this UI (which is backed by calls to /filter/create); dogfooding has led me to be pretty satisfied with the result.
For those developers who aren’t using the API console when prototyping, we allow filters to be specified with the “include”, “exclude”, and “base” parameters (the format being the same as calls to /filter/create). The idea here is if you just want a quick query for, say, total questions you probably don’t want to go through the trouble of generating a filter; instead, just call /questions?include=.total&base=none&site=stackoverflow. However, we don’t want such queries to make their way into released applications (they’re absurdly wasteful of bandwidth, for one) so we need a way to disincentivize them outside of ad hoc queries. We do this by making them available only when a query doesn’t pass an application key, and since increased quotas are linked to passing application keys we expect the majority of applications to use filters correctly.
Why I Love Attribute Based Routing
Posted: 2011/07/25 Filed under: code 8 CommentsOver at Stack Exchange we make use of this wonderful little piece of code, the RouteAttribute (an old version can be found in our Data Explorer; a distinct, somewhat hardened, version can also be found as part of StackID, and I link the current version toward the bottom of this post). Thought up by Jarrod “The M is for Money” Dixon sometime around April 2009, this is basically the only thing I really miss in a vanilla MVC project.
Here’s what it looks like in action:
public class UsersController : ControllerBase
{
    [Route("users/{id:INT}/{name?}")]
    public ActionResult Show(int? id, string name, /* and some more */)
    {
        // Action implementation goes here
    }
}
Nothing awe-inspiring; all that says is “any request starting with /users/, followed by a number of 9 or fewer digits (tossing some valid integers out for simplicity’s sake), and optionally by / and any string, should be routed to the Show action”.
Compare this to the standard way to do routing in MVC:
public class MvcApplication : System.Web.HttpApplication
{
    protected void Application_Start()
    {
        // Other stuff

        routes.MapRoute(
            "Default",
            "{controller}/{action}/{id}",
            /* defaults go here */
        );
    }
}
This isn’t exactly 1-to-1, as we end up with /users/show/123 instead of /users/123/some-guy, but for now let’s call them equivalent. There are good reasons why you’d want the /users/{id}/{name} route, which are discussed below.
Where’s the gain in using the RouteAttribute?
Ctrl-Shift-F (search in files, in Visual Studio) is way up there. With the RouteAttribute, the code behind a route is sitting right next to the route registration; trivial to search for. You may prefer to think of it as code locality, all the relevant bits of an Action are right there alongside its implementation.
Some might scoff at the utility of this, but remember that UsersController? That’s split across 14 files. The assumption that enough information to identify the location, in code, of an Action can be shoved in its URL falls apart unless you’re ready to live with really ugly urls.
Action method name flexibility. The RouteAttribute decouples the Action method and Controller names from the route entirely. In the above example, “Show” doesn’t appear anywhere, and the site’s urls are better for it.
Granted, most routes will start out resembling (if not matching) their corresponding method names. But with the RouteAttribute, permalinks remain valid in the face of future method renaming.
You’re also able to be pragmatic with Action method locations in code, while presenting a pristine conceptual interface. An administrative route placed in, for example, the PostsController to take advantage of existing code can still be reached at “/admin/whatever”.
A minor nicety: with the RouteAttribute it’s easy to map two routes to the same Action. This is a bit ugly with routing rules that include method/controller names, for obvious reasons.
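For example (a hypothetical sketch, assuming the attribute can be applied more than once to a method; the /q/ short form below is purely illustrative):

public class QuestionsController : ControllerBase
{
    // Two url patterns, one implementation
    [Route("questions/{id:INT}/{title?}")]
    [Route("q/{id:INT}")]
    public ActionResult ShowQuestion(int? id, string title)
    {
        return null; // Action implementation goes here
    }
}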
Metadata locality. Our RouteAttribute extends ActionMethodSelectorAttribute, which lets us impose additional checks after route registration. This lets you put acceptable HTTP methods, permitted user types, registration priorities (in MVC, the order routes are registered matters), and the like all right there alongside the url pattern.
A (slightly contrived) example:
[Route("posts/{id:INT}/rollback/{revisionGuid?}", HttpVerbs.Post, EnsureXSRFSafe = true, Priority=RoutePriority.High)]
The strength here is, again, grouping all the pertinent bits of information about a route together. MVC already embraces this approach enough, with attributes like HttpPost, that you’ll be decorating Actions with attributes anyway.
No need for [NonAction]. The NonActionAttribute lets you suppress a method on a controller that would otherwise be an Action. I’ll admit, there aren’t a lot of public methods in my code that return ActionResults that aren’t meant to be routable, but there are a number that return strings. Yes, if you weren’t aware, a public method returning a string is a valid Action in MVC.
It seems that back in the before times (in the original MVC beta), you had to mark methods as being Actions rather than not being actions. I think the current behavior (opting out of being an Action) makes sense for smaller projects, but as a project grows you run the risk of accidentally creating routes.
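By way of a (hypothetical) example of the risk:

public class BadgesController : ControllerBase
{
    // A public method returning a string is routable by default; without
    // [NonAction] this helper could be reached as an Action under the
    // default {controller}/{action} convention
    [NonAction]
    public string BadgeDisplayName(int badgeId)
    {
        return "badge #" + badgeId;
    }
}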
You (probably) want unconventional routing. One argument that has arisen internally against using the RouteAttribute is that it deviates from MVC conventions. While I broadly agree that adhering to conventions is a Good Thing™, I believe that the argument doesn’t hold water in this particular case.
The MVC default routing convention of “/{controller}/{action}/{id}” is fine as a demonstration of the routing engine, and for internal or hobby projects it’s perfectly serviceable… but not so much for publicly facing websites.
Here are the two most commonly linked URLs on any Stack Exchange site.
/questions/{id}/{title} as in http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o
/users/{id}/{name} as in http://stackoverflow.com/users/59711/arec-barrwin
In both cases the final slug ({name} and {title} respectively) is optional, although whenever we generate a link we do our best to include it. Our urls are of this form for the dual purposes of making them user-readable/friendly, and as SEO. SEO can be further divided into hints to Google algorithms (which is basically black magic, I have no confirmation that it actually does anything) and the more practical benefit of presenting the title of a question twice on the search result page.
Closing Statement
Unlike the WMD editor, Booksleeve, or the MVC MiniProfiler, we don’t have an open source “drop in and use it” version of the RouteAttribute out there. The versions released incidentally are either out-dated (as in the Data Explorer) or cut down and a tad paranoid (as in StackID). To rectify this slightly, I’ve thrown a trivial demonstration of our current RouteAttribute up on Google Code. It’s still not a simple drop in (in particular XSRF token checking had to be commented out, as it’s very tightly coupled to our notion of a user), but I think it adequately demonstrates the idea. There are definitely some quirks in the code, but in practice it works quite well.
While I’m real bullish on the RouteAttribute, I’m not trying to say that MVC routing is horribly flawed, nor that anyone using it has made a grave error. If it’s working for you, great! If not, you should give attribute based routing a gander. If you’re starting something new I’d strongly recommend playing with it, you just might like it. It’d be nice if a more general version of this were shipping as part of MVC in the not-horribly-distant future.
Mobile Views in ASP.NET MVC3
Posted: 2011/07/17 Filed under: code 5 CommentsOn Stack Exchange, we’ve just rolled out a brand spanking new mobile site. This took about 6 weeks of my and our designer’s (Jin Yang) time, the majority of it spent building mobile Views.
Very little time was spent hammering mobile View switching support into MVC, because it’s really not that hard.
A nice thing about the Stack Exchange code base is that all of our Controllers share a common base class. As a consequence, it’s easy to overload the various View(…) methods to do some mobile magic. If your MVC site doesn’t follow this pattern it’s not hard to slap one onto an existing code base; it is a prerequisite for this approach, though.
Here’s the gist of the additions to the Controller base class:
protected new internal ViewResult View()
{
    if (!IsMobile()) return base.View();

    var viewName = ControllerContext.RouteData.GetRequiredString("action");
    CheckForMobileEquivalentView(ref viewName, ControllerContext);
    return base.View(viewName, (object)null);
}

protected new internal ViewResult View(object model)
{
    if (!IsMobile()) return base.View(model);

    var viewName = ControllerContext.RouteData.GetRequiredString("action");
    CheckForMobileEquivalentView(ref viewName, ControllerContext);
    return base.View(viewName, model);
}

protected new internal ViewResult View(string viewName)
{
    if (!IsMobile()) return base.View(viewName);

    CheckForMobileEquivalentView(ref viewName, ControllerContext);
    return base.View(viewName);
}

protected new internal ViewResult View(string viewName, object model)
{
    if (!IsMobile()) return base.View(viewName, model);

    CheckForMobileEquivalentView(ref viewName, ControllerContext);
    return base.View(viewName, model);
}

// Need this to prevent View(string, object) stealing calls to View(string, string)
protected new internal ViewResult View(string viewName, string masterName)
{
    return base.View(viewName, masterName);
}
CheckForMobileEquivalentView() looks up the final view to render; in my design, the lack of a mobile alternative just falls back to serving the desktop version. This approach may not be appropriate for all sites, but Stack Exchange sites already worked pretty well on a phone pre-mobile theme.
private static void CheckForMobileEquivalentView(ref string viewName, ControllerContext ctx)
{
    // Can't do anything fancy if we don't know the route we're screwing with
    var route = (ctx.RouteData.Route as Route);
    if (route == null) return;

    var mobileEquivalent = viewName + ".Mobile";
    var cacheKey = GetCacheKey(route, viewName);

    bool cached;
    // CachedMobileViewLookup is a static ConcurrentDictionary<string, bool>
    if (!CachedMobileViewLookup.TryGetValue(cacheKey, out cached))
    {
        var found = ViewEngines.Engines.FindView(ctx, mobileEquivalent, null);
        cached = found.View != null;
        CachedMobileViewLookup.AddOrUpdate(cacheKey, cached, delegate { return cached; });
    }

    if (cached)
    {
        viewName = mobileEquivalent;
    }

    return;
}
The caching isn’t interesting here (though it is important for performance); the important part is the convention of adding .Mobile to the end of a View’s name to mark it as “for mobile devices.” Convention rather than configuration, after all, is a huge selling point of the MVC framework.
And that’s basically it. Anywhere in your Controllers that you call View(“MyView”, myModel) or similar, a mobile View will be served instead if one is available (with the same model passed along for you to work with).
If you’re doing any whole cloth caching (which you probably are, and if not you probably should be) [ed: I seem to have made this phrase up, “whole cloth caching” is caching an entire response] you’ll need to account for the mobile/desktop divide. All we do is slap “-mobile” onto the keys right before they hit the OutputCache.
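Something along these lines; the helper name is made up, but the “-mobile” suffix is exactly the convention described above.

// Mobile and desktop render the same url differently, so they need separate cache entries
private static string BuildOutputCacheKey(string baseKey, bool isMobile)
{
    return isMobile ? baseKey + "-mobile" : baseKey;
}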
One cool trick with this approach is that anywhere you render an action (as with @Html.Action() in a razor view) will also get the mobile treatment. Take a look at a Stack Overflow user page to see this sort of behavior in action. Each of those paged subsections (Questions, Answers, and so on) is rendered inline as an action and then ajax’d in via the same action. In fact, since the paging code on the user page merely fetches some HTML and writes it into the page (via jQuery, naturally) we’re able to use exactly the same javascript on the desktop user page and the mobile one.
I’m not advocating the same javascript between desktop and mobile views in all cases, but when you can do it (as you sometimes can when the mobile view really is just the “shrunk down” version of the desktop) it’ll save you a lot of effort, especially in maintenance down the line.
Another neat tidbit (though MVC itself gets most of the credit here), is the complete decoupling of view engines from the issue. If you want Razor on mobile, but are stuck with some crufty old ASPX files on the desktop (as we are in a few places) you’re not forced to convert the old stuff. In theory, you could throw Spark (or any other view engine) into the mix as well; though I have not actually tried doing that.
As an aside, this basic idea seems to be slated for MVC per Phil Haack’s announcement of the MVC4 Roadmap. I’ve taken it as a validation of the basic approach, if not necessarily the implementation.