Public Broadcasting: A Self-Describing Wrapper Around protobuf-net

Familiar with Protocol Buffers?  It’s a neat binary serialization format out of Google which aims to be efficient and extensible.  Java, C++, and Python have “official” libraries, and there are a plethora of libraries for other platforms.

In fact, we’ve been using protobuf-net over at Stack Exchange for a good long time, since August 18th, 2010 if our commit logs are to be believed.  It’s famously fast and simple to use, which has let it worm its way into basically all of our projects.  We even got the mind behind it to come and slum it with chumps like me.

But…

There is one pain point to using Protocol Buffers and that’s, well, defining the protocol bits.  You can either define your messages in .proto files and compile them or, if you’re using protobuf-net (or similar), annotate your existing types.

With protobuf-net a typical class ends up looking like so:

using System;
using ProtoBuf;

[ProtoContract]
public class RedisInboxItem : IComparable<RedisInboxItem>
{
  // Every serialized member needs its own unique tag number.
  [ProtoMember(1)]
  public DateTime CreationDate { get; set; }
  [ProtoMember(2, IsRequired = true)]
  public InboxItemType Type { get; set; }
  [ProtoMember(3)]
  public int Id { get; set; }
  [ProtoMember(4)]
  public string Title { get; set; }
  [ProtoMember(5)]
  public bool IsPersisted { get; set; }

  // ...
}

This isn’t a problem if you’re marshaling between different code bases or acting as a service; you need to document the types involved after all, so you might as well do it in annotations or .proto files.  But if you’re communicating within the same code base, or with internal services, this manual protocol declaration can be onerous and a tad error-prone.

(Image caption: Easily beat Brock, or eventually have a giant fire-breathing dragon?)

Trade-Offs For Convenience

What would be really keen is a wrapper around Protocol Buffers that carries its own description, so that mapping tag numbers to fields doesn’t need any foreknowledge of the serialized types.  It’s also been a while since I committed any really terrible ILGenerator code.

So I wrote one.  It trades some of protobuf-net’s speed and some of Protocol Buffers’ compactness for the convenience of not having to use .proto files or annotations.  I call it Public Broadcasting, because that’s the first phrase that popped into my head with a P followed by a B.  Naming is hard.

In short, Public Broadcasting provides a structurally typed wrapper around protobuf-net.  When deserializing, any member names that match are mapped correctly, any missing members are ignored, and any safe conversions (like byte -> int or Enum <-> String) happen automatically.  In addition, Nullable<struct>s will be converted to default(struct) if necessary when deserializing.  Inheritance, type names, interfaces, and so on are ignored; we care about how the data “looks”, not how it’s “named”.
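To make those rules concrete, here is a minimal sketch of structural mapping written with plain reflection.  It is not how Public Broadcasting is implemented (that goes through generated types and protobuf-net); StructuralSketch, Map, Coerce, Item, and Kind are all names invented for this illustration, and a dictionary stands in for the incoming data.

```csharp
using System;
using System.Collections.Generic;

public enum Kind { Comment, Answer }

public class Item
{
    public int Id { get; set; }
    public Kind Type { get; set; }
    public DateTime CreationDate { get; set; }
}

// Hypothetical helper, not part of Public Broadcasting or protobuf-net.
public static class StructuralSketch
{
    // Builds a T purely from member names; the shape of the source is all that matters.
    public static T Map<T>(IDictionary<string, object> source) where T : class, new()
    {
        var target = new T();
        foreach (var prop in typeof(T).GetProperties())
        {
            // Members that exist on only one side are skipped.
            if (!prop.CanWrite || !source.TryGetValue(prop.Name, out var raw)) continue;
            prop.SetValue(target, Coerce(raw, prop.PropertyType));
        }
        return target;
    }

    private static object Coerce(object raw, Type wanted)
    {
        var nullableOf = Nullable.GetUnderlyingType(wanted);
        if (raw == null)
        {
            // A null headed for a non-nullable struct becomes default(struct).
            return wanted.IsValueType && nullableOf == null
                ? Activator.CreateInstance(wanted)
                : null;
        }

        var core = nullableOf ?? wanted;
        if (core.IsInstanceOfType(raw)) return raw;
        if (core.IsEnum && raw is string name) return Enum.Parse(core, name); // String -> Enum
        if (core == typeof(string) && raw is Enum) return raw.ToString();     // Enum -> String

        // Handles byte -> int. Note that Convert.ChangeType also attempts
        // narrowing conversions, which a real implementation would reject.
        return Convert.ChangeType(raw, core);
    }
}

public static class Program
{
    public static void Main()
    {
        var item = StructuralSketch.Map<Item>(new Dictionary<string, object>
        {
            ["Id"] = (byte)7,        // byte -> int
            ["Type"] = "Answer",     // String -> Enum
            ["CreationDate"] = null, // null -> default(DateTime)
            ["Unrelated"] = 123      // no matching member, ignored
        });

        // prints "7 Answer True"
        Console.WriteLine($"{item.Id} {item.Type} {item.CreationDate == default(DateTime)}");
    }
}
```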

Public Broadcasting works by describing a type using Protocol Buffers, then including that description in an “envelope” along with the actual data.  When deserializing, Public Broadcasting constructs a new type with all the appropriate annotations and then lets protobuf-net do the heavy lifting of deserializing the data.  Since we only care about the “structure” of the data, a good deal of .NET’s type system is discarded: only classes (with no distinction between ReferenceType and ValueType), Lists, Dictionaries, Enumerations, Nullables, and “simple” types (int, byte, string, Guid, etc.) are used.
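The envelope idea can be sketched against protobuf-net directly.  Treat everything here as an assumption rather than Public Broadcasting’s actual wire layout: the Envelope type, its two members, and the “sort names, number them from 1” tag scheme are invented for illustration.  The sketch also swaps in a different technique on the receiving side: instead of emitting a new annotated type with ILGenerator, it configures a RuntimeTypeModel at run time, which is enough to show how a description lets the reader map tag numbers to members without annotations.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;
using ProtoBuf.Meta;

// Hypothetical envelope: a description plus the raw protobuf payload.
[ProtoContract]
public class Envelope
{
    // MemberNames[i] was written with tag i + 1.
    [ProtoMember(1)]
    public List<string> MemberNames { get; set; } = new List<string>();

    [ProtoMember(2)]
    public byte[] Payload { get; set; }
}

public static class EnvelopeSketch
{
    public static byte[] Wrap<T>(T value)
    {
        // Assumption: only public properties are serialized, tagged in name order.
        var names = typeof(T).GetProperties()
            .Select(p => p.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();

        var model = RuntimeTypeModel.Create();
        var meta = model.Add(typeof(T), false); // false: ignore any attributes
        for (var i = 0; i < names.Count; i++) meta.Add(i + 1, names[i]);

        using (var payload = new MemoryStream())
        using (var output = new MemoryStream())
        {
            model.Serialize(payload, value);
            Serializer.Serialize(output, new Envelope { MemberNames = names, Payload = payload.ToArray() });
            return output.ToArray();
        }
    }

    public static T Unwrap<T>(byte[] bytes) where T : class, new()
    {
        Envelope envelope;
        using (var input = new MemoryStream(bytes))
            envelope = Serializer.Deserialize<Envelope>(input);

        // Use the sender's description to decide which tag feeds which member.
        var model = RuntimeTypeModel.Create();
        var meta = model.Add(typeof(T), false);
        for (var i = 0; i < envelope.MemberNames.Count; i++)
        {
            // Names the receiving type lacks get no mapping; their data is skipped.
            if (typeof(T).GetProperty(envelope.MemberNames[i]) != null)
                meta.Add(i + 1, envelope.MemberNames[i]);
        }

        using (var payload = new MemoryStream(envelope.Payload ?? new byte[0]))
            return (T)model.Deserialize(payload, null, typeof(T));
    }
}

public class Sent { public int Id { get; set; } public string Title { get; set; } public string Internal { get; set; } }
public class Received { public string Title { get; set; } public int Id { get; set; } }

public static class Program
{
    public static void Main()
    {
        var bytes = EnvelopeSketch.Wrap(new Sent { Id = 42, Title = "Hello", Internal = "ignored" });
        var copy = EnvelopeSketch.Unwrap<Received>(bytes); // unrelated type, same shape
        Console.WriteLine($"{copy.Id} {copy.Title}");
    }
}
```

Neither Sent nor Received carries a single attribute, and they share no base type; the member names inside the envelope are the only contract between them.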

Since the original type being deserialized doesn’t actually need to be known with Public Broadcasting, there is a Deserialize method which returns dynamic.  Although dynamic is something I normally avoid, in the “grab a data object, then discard” style I typically use protobuf-net in, I think it’s a good fit.
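For a feel of what that style looks like at the call site, here is a small sketch.  It does not call Public Broadcasting at all: DynamicSketch.ToDynamic is a hypothetical stand-in built on ExpandoObject, used only to show how a dynamic result is consumed without declaring a matching type.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical stand-in for a deserializer that returns dynamic.
public static class DynamicSketch
{
    public static dynamic ToDynamic(IDictionary<string, object> members)
    {
        // ExpandoObject exposes its dictionary entries as members at run time.
        IDictionary<string, object> bag = new ExpandoObject();
        foreach (var pair in members) bag[pair.Key] = pair.Value;
        return bag;
    }
}

public static class Program
{
    public static void Main()
    {
        dynamic item = DynamicSketch.ToDynamic(new Dictionary<string, object>
        {
            ["Id"] = 42,
            ["Title"] = "Hello"
        });

        // Grab what you need, then let the object go; no class was declared for it.
        int id = item.Id;
        string title = item.Title;
        Console.WriteLine($"{id}: {title}");
    }
}
```

The cost is the usual one for dynamic: a misspelled member name only fails at run time, which is tolerable when the object is read once and thrown away.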

In my (admittedly limited) testing, Public Broadcasting is typically within an order of magnitude of raw protobuf-net usage.  And protobuf-net is hella-fast, so even 10x slower is going to be plenty fast most of the time.

In terms of compactness, Public Broadcasting is going to be approximately the “length of all involved strings” larger than raw protobuf-net.  As soon as you start having many instances or recursive types this overhead shrinks relative to the size of the message, as Public Broadcasting doesn’t repeat member names the way JSON does.
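A rough back-of-the-envelope version of that claim, as a sketch.  OverheadEstimate is a hypothetical helper, the 40 bytes per instance is a made-up figure chosen only for illustration, and the estimate counts nothing but the member names themselves (no framing or length prefixes).

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for estimating the cost of carrying member names.
public static class OverheadEstimate
{
    // Total UTF-8 bytes of the member names, i.e. the "involved strings".
    public static int NameBytes(params string[] memberNames) =>
        memberNames.Sum(n => Encoding.UTF8.GetByteCount(n));

    // Share of the message spent on names when `count` instances share one description.
    public static double OverheadFraction(int nameBytes, int bytesPerInstance, int count) =>
        (double)nameBytes / (nameBytes + (long)bytesPerInstance * count);
}

public static class Program
{
    public static void Main()
    {
        // The five members shown in RedisInboxItem above: 12 + 4 + 2 + 5 + 11 = 34 bytes.
        var names = OverheadEstimate.NameBytes("CreationDate", "Type", "Id", "Title", "IsPersisted");

        // Assumed payload size per instance; not a measurement.
        const int assumedBytesPerInstance = 40;

        foreach (var count in new[] { 1, 10, 1000 })
        {
            var share = OverheadEstimate.OverheadFraction(names, assumedBytesPerInstance, count);
            Console.WriteLine($"{count} instance(s): {share:P2} of the message is member names");
        }
    }
}
```

The names are paid for once per message, so the share falls quickly as the instance count grows; JSON, by contrast, pays for them again in every object.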

However, if you absolutely must have the smallest messages and the fastest (de)serializations then you’ll want to use protobuf-net directly; the overhead imposed by Public Broadcasting is noticeable.

If You Like What You’ve Read

Grab the source, or pull the DLL off NuGet, and let me know what you think.