Stack Exchange API V2.0: Filters

We’re underway with the 2.0 version of the Stack Exchange API, so there’s no time like the present to get my thoughts on it written down.  This is the first in a series of nine posts about the additions, changes, and ideas in and around our latest API revision.  I’m sharing details because I think they’re interesting, but not with the expectation that everything I talk about will be generally applicable to other API designers.

First up, the addition of filters.

Mechanics

Pictured: the state of our documentation in the V1.0 beta.

Filters take the form of an opaque string passed to a method which specifies which fields you want returned.  For example, passing “!A7x.GE1T” to /sites returns only the names of the sites in the Stack Exchange network (this is a simplification, more details when we get to implementation).  This is similar to, but considerably terser than, partial returns via the “fields” parameter as implemented by Facebook and Google (note that we do allow something similar for key-less requests via the “include” and “exclude” parameters).

You can think of filters as redacting returned fields.  Every method has some set of fields that can be returned, and a filter specifies which of those fields shouldn’t be.  If you’re more comfortable thinking in SQL, filters specify the selected columns (the reality is a bit more complicated).

Filters are created by passing the fields to include, those to exclude, and a base filter to the /filter/create method.  They’re immutable and never expire, making it possible (and recommended) to generate them once and then bake them into applications for distribution.

Motivations

There are two big motivations, and a couple of minor ones, for introducing filters.

Performance

We also log a monitor per-request SQL and CPU time for profiling purposes.

The biggest one was improved performance in general, and allowing developers to tweak API performance in particular.  In the previous versions of the Stack Exchange API you generally fetched everything about an object even if you only cared about a few properties.  There were ways to exclude comments and answers (which were egregiouly expensive in some cases) but that was it.

For example, imagine all you cared about were the users most recently active in a given tag (let’s say C#).  In both V1.1 and V2.0 the easiest way to query is would be to use the /questions route with the tagged parameter.  In V1.1 you can exclude body, answers, and comments but you’re still paying for close checks, vote totals, view counts, etc.  In V2.0 you can get just the users, letting you avoid several joins and a few queries.  The adage “the fastest query is one you never execute” holds, as always.

Bandwidth

Related to performance, some of our returns can be surprisingly large.  Consider /users, which doesn’t return bodies (as /questions, /answers, and so on do) but does return an about_me field.  These fields can be very large (at time of writing, the largest about_me fields are around 4k) and when multiplied by the max page size we’re talking about wasting 100s of kilobytes.

Even in the worst cases this is pretty small potatoes for a PC, but for mobile devices both the wasted data (which can be shockingly expensive) and the latency of fetching those wasted bytes can be a big problem.  In V1.1 the only options we had were per-field true/false parameters (the aforementioned answers and comments) which quickly becomes unwieldy.  Filters in V2.0 let us handle this byte shaving in a generic way.

Saner Defaults

In V1.1 any data we didn’t return by default either introduced a new method or a new parameter to get at that data, which made us error on the side of “return it by default”.  Filters let us be much more conservative in what our default returns are.

A glance at the user type reveals a full six fields we’d have paid the cost of returning under the V1.1 regime.  Filters also provide a convenient place to hang properties that are app wide (at least most of the time), such as “safety” (which is a discussion for another time).

Interest Indications

Filters give us a great insight into what fields are actually of interest to API consumers. Looking into usage gives us some indicators on where to focus our efforts, both in terms of optimization and new methods to add.  Historically my intuition about how people use our API has been pretty poor, so having more signals to feed back into future development is a definite nice to have.

While not the sexiest feature (that’s probably authentication), filters are probably my favorite new feature.  They’re a fairly simple idea that solve a lot of common, general, problems.  My next post (ed: available here) will deal with some of the implementation details of our filter system.