Stack Exchange API V2.0: Implementing Filters

As part of this series of articles, I’ve already discussed why we added filters to the Stack Exchange API in version 2.0 (go check it out, you could win a prize).  Now I’m going to discuss how they were implemented and what drove the design.

Considerations

Stability

It is absolutely paramount that filters not break, ever.  A lot of the benefits of filters go away if applications are constantly generating them (that is, if they aren’t “baked into” executables), and “frustrated” would be a gross understatement of how developers would feel if we kept forcing them to redistribute their applications with new filters.

From stability, it follows that filters need to be immutable.  Consider the denial of service attack that would be possible from modifying a malicious party extracting and modifying a filter baked into a popular application.

Speed

One of the big motivations behind filters was improving performance, so it follows that the actual implementation of filters shouldn’t have any measurable overhead.  In practice this means that no extra database queries (and preferably no network traffic at all) can occur as consequence of passing a filter.

Ease of Use

While it’s probably impossible to make using a filter more convenient than not using one, it’s important that using filters not be a big hassle for developers.  Minimizing the number of filters that need to be created, and providing tools to aid in their creation are thus worthwhile.

Implementation

Format

Filters, at their core, ended up being a simple bitfield of which fields to include in a response.  Bitfields are fast to decode, and naturally stable.

Also, every filter includes every type is encoded in this bitfield.  This is important for the ease of use consideration, as it makes it possible to use a single filter for all your requests.

Encoding

A naive bitfield for a filter would have, at time of writing, 282 bits.  This is a bit hefty, a base64 encoded naive filter would be approximately 47 characters long for example, so it behooves us to compress it somewhat.

An obvious and simple compression technique is to run-length encode the bitfield.  We make this even more likely to bear fruit by grouping the bits first by “included in the default filter” and then by “type containing the field”.  This grouping exploits the expectation that filters will commonly either diverge from the default filter or focus on particular types.

We also tweak the character’s we’ll use to encode a filter a bit, so we’re technically encoding in a base higher than 64; though we’re losing a character to indicate safe/unsafe (which is a discussion for another time).

All told, this gets the size of filters we’re seeing the wild down to a manageable 3 to 29 characters.

Bit Shuffling

This ones a bit odd, but in the middle of the encoding step we do some seemingly pointless bit shuffling.  What we’re trying to do here is enforce opaqueness, why we’d want to do that deserves some explanation.

A common problem when versioning APIs is discovering that a number of consumers (oftentimes an uncomfortably large number) are getting away with doing technically illegal things.  An example is SetWindowsHook in Win16 (short version, developers could exploit knowledge of the implementation to avoid calling UnhookWindowsHook), one from V1.0 of the Stack Exchange API is /questions/{id}/comments also accepting answer ids (this exploits /posts/{ids}/comments, /questions/{ids}/comments, and /answers/{ids}/comments all being aliases in V1.x).  When you find such behavior you’re left choosing between breaking consumers or maintaining “bugs” in your implementation indefinitely, neither of which are great options.

The point of bit shuffling is to make it both harder to figure out the implementation (though naturally not  impossible, the average developer is more than bright enough to figure our scheme out given enough time) so such “too clever for your own good” behavior is harder to pull off, and to really drive the point home that you shouldn’t be creating filters without calling /filter/create.

Backwards Compatibility

Maintaining compatibility between API versions with filters is actually pretty simple if you add one additional constraint, you never remove a bit in the field.  This lets you use the length of the underlying bitfield as a version number.

Our implementation maintains a list of fields paired with the length of the bitfield they were introduced on.  This lets us figure out which fields were available when a given filter was created, and exclude any newer fields when we encounter an old filter.

Composing A Query

One prerequisite for filters is the ability to easily compose queries against your datastore.  After all, it’s useless to know that certain fields shouldn’t be fetched if you can’t actually avoid querying for them.

In the past we would have used LINQ-to-SQL, but performance concerns have long since lead us to develop and switch to Dapper, and SqlBuilder in Dapper.Contrib.

Here’s an rough outline of building part of an answer object query.

// While technically optional, we always need this so *always* fetch it
builder.Select("Id AS answer_id");
builder.Select("ParentId AS question_id");
// links and title are backed by the same columns
if (Filter.Answer.Link || Filter.Answer.Title)
{
  builder.LeftJoin("dbo.Posts Q ON Q.Id = ParentId");
  builder.Select("Q.Title as title");
}
if(Filter.Answer.LastEditDate)
{
  builder.Select("LastEditDate AS last_edit_date");
}

The actual code is a bit heavier on extension methods and reuse.

Note that sometimes we’ll grab more data than we intend to return, such as when fetching badge_count objects we always fetch all three counts even if we only intend to return, say, gold.  We rely on some IL magic just before we serialize our response to handle those cases.

Caches

The Stack Exchange network sites would fall over without aggressive caching, and our API has been no different.  However, introducing filters complicates our caching approach a bit.

In V1.x, we just maintained query -> response and type+id -> object caches.  In V2.0, we need to account for the fields actually fetched or we risk responding with too many or too few fields set when we have a cache hit.

The way we deal with this is to tag each object in the cache with a mini-filter which contains only those types that could have been returned by the method called.  For example, the /comment mini-filter would contain all the fields on the comment and shallow_user types.  When we pull something out of the cache, we can check to see if it matches by seeing if the cached mini-filter covers the relevant fields in the current request’s filter; and if so, use the cached data to avoid a database query.

One clever hack on top of this approach lets us service requests for filters that we’ve never actually seen before.  When we have a cache hit for a given type+id pair but the mini-filter doesn’t cover the current request, we run the full request (database hit and all) and then merge the cached object with the returned one and place it back in the cache.  I’ve taken to calling this “merge and return to cache” process as widening an object in cache.

Imagine: request A comes in asking for 1/2 the question fields, request B now comes in asking for the other 1/2, then request C comes in asking for all the fields on question.  When A is processed there’s nothing in the cache, we run the query and place 1/2 of a question object in cache.  When B is processed, we find the cached result of A but it doesn’t have the fields needed to satisfy B; so we run the query, widen the cached A with the new B.  When C is processed, we find the cached union of A and B and voilà, we can satisfy C without hitting the database.

One subtlety is that you have to make sure a widened object doesn’t remain in cache forever.  It’s all too easy for an object to gain a handful of fields on many subsequent queries resetting it’s expiration each time, causing you to serve exceptionally stale data.  The exact solution depends on your caching infrastructure, we just add another tag to the object with it’s maximum expiration time; anything we pull out of the cache that’s past due to be expired is ignored.

Tooling

We attacked the problem of simplifying filter usage in two ways: providing a tool to generate filters, and enabling rapid prototyping with a human-friendly way to bypass filter generation.

Stack Exchange's Emmett Nicholas did a lot of the UI work here.

We spent a lot of time getting GUI for filter editing up to snuff in our API console (pictured, the /questions console).  With just that console you can relatively easily generate a new filter, or use an existing one as a starting point.  For our internal development practically all filters have ended up being created via this UI (which is backed by calls to /filter/create), dogfooding has lead me to be pretty satisfied with the result.

For those developers who aren’t using the API console when prototyping, we allow filters to be specified with the “include”, “exclude”, and “base” parameters (the format being the same as calls to /filter/create).  The idea here is if you just want a quick query for, say, total questions you probably don’t want to go through the trouble of generating a filter; instead, just call /questions?include=.total&base=none&site=stackoverflow.  However, we don’t want such queries to make their way into released application (they’re absurdly wasteful of bandwidth for one) so we need a way to disincentivize them outside of adhoc queries.  We do this by making them available only when a query doesn’t pass an application key, and since increased quotas are linked to passing application keys we expect the majority of applications to use filters correctly.


Follow

Get every new post delivered to your Inbox.