History Of The Stack Exchange API, Mistakes

In an earlier post, I wrote about some of the philosophy and “cool bits” in the 1.0 release of the Stack Exchange API. That’s all well and good, but of course I’m going to tout the good parts of our API; I wrote a lot of it, after all. More interesting are the things that have turned out to be mistakes; we learn more from failure than from success.

Returning Total By Default

Practically every method in the API returns a count of the elements the query would return if not constrained by paging.

For instance, all questions on Stack Overflow:

{
  "total": 1936398,
  "page": 1,
  "pagesize": 30,
  "questions": [...]
}

Total is useful for rendering paging controls and for count(*)-style queries (how many of my comments have been up-voted, and so on), so it’s not that the total field itself was a mistake. But returning it by default definitely was.

The trick is that while total can be useful, it’s not always useful. Quite frequently queries take the form of “give me the most recent N questions/answers/users who X”, or “give me the top N questions/answers owned by U ordered by S”. Neither of these common queries cares about total, but both pay the cost of fetching it every time.

For simple queries (like the /1.0/questions call above), at least as much time is spent fetching total as is spent fetching the actual data.
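To make the trade-off concrete, here is a minimal Python sketch of the two usage patterns. The host, the /users/{id}/questions route, and the parameter names are assumptions extrapolated from the examples in this post, not a verified client; treat it as an illustration only.

```python
import requests

# Assumed host/route layout for illustration; based on the /1.0/questions
# example above, not verified against the live (or historical) API.
API_ROOT = "http://api.stackoverflow.com/1.0"

def recent_questions(pagesize=30):
    """The common case: the caller only wants the items and ignores
    "total", yet the server still pays to compute the count."""
    resp = requests.get(f"{API_ROOT}/questions",
                        params={"page": 1, "pagesize": pagesize})
    resp.raise_for_status()
    return resp.json()["questions"]

def question_count_for_user(user_id):
    """The count(*) case: only "total" matters, so request as little item
    data as the API allows and read the count off the wrapper object."""
    resp = requests.get(f"{API_ROOT}/users/{user_id}/questions",
                        params={"pagesize": 1})
    resp.raise_for_status()
    return resp.json()["total"]
```

An opt-in (or opt-out) switch for total would let the first, far more common pattern skip the count query entirely.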

“Implicit” Types

Each method in the Stack Exchange API returns a homogeneous set of results, wrapped in a metadata object. You get collections of questions, answers, comments, users, badges, and so on back.

The mistake is that although the form of the response is conceptually consistent, the key under which the actual data is returned is based on the type.  Examples help illustrate this.

/1.0/questions returns:

{
 "total": 1947127,
 ...
 "questions": [...]
}

/1.0/users returns:

{
 "total": 507795,
 ...
 "users": [...]
}

This makes it something of a pain to write wrappers around our API in statically typed languages.  A much better design would have been a consistent `items` field with an additional `type` field.

How /1.0/questions should have looked:

{
 "total": 1947127,
 "type": "question",
 ...
  "items": [...]
}

This mistake became apparent as more API wrappers were written.  Stacky, for example, has a number of otherwise pointless classes (the “Responses” classes) just to deal with this.
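As a rough illustration of the wrapper problem (the helper functions and sample payloads below are mine, not taken from Stacky or any real wrapper), a 1.0-style client needs per-method knowledge of the data key, while an `items`/`type` envelope needs only one code path:

```python
# 1.0 style: the key holding the data changes per method, so a wrapper
# needs a lookup table (or a dedicated response class) for every type.
V1_DATA_KEYS = {
    "/1.0/questions": "questions",
    "/1.0/users": "users",
    "/1.0/badges": "badges",
}

def items_v1(route, payload):
    return payload[V1_DATA_KEYS[route]]

# Proposed style: one generic envelope covers every method.
def items_v2(payload):
    return payload["type"], payload["items"]

v1 = {"total": 1947127, "page": 1, "pagesize": 30, "questions": [{"question_id": 1}]}
v2 = {"total": 1947127, "type": "question", "items": [{"question_id": 1}]}

print(items_v1("/1.0/questions", v1))  # [{'question_id': 1}]
print(items_v2(v2))                    # ('question', [{'question_id': 1}])
```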

[Image caption: It should be obvious what’s dangerous, and most things shouldn’t be.]

Inconsistent HTML “Safety”

This one only affects web apps using our API, but it can be a real doozy when it does. Essentially, not all text returned from our API is safe to embed directly into HTML.

This is complicated a bit by the fact that many of our fields contain legitimate HTML, so consumers can’t just HTML-encode everything. Question bodies, for example, almost always have a great deal of HTML in them.

This led to a situation where question bodies are safe to embed directly but question titles are not; user “about me” sections are, but display names are not; and so on. Ideally, everything would be safe to embed directly except in certain rare circumstances.

This mistake is a consequence of how we store the underlying data. It just so happens that we encode question titles and user display names “just in time”, while question bodies and user “about me” sections are stored pre-rendered.
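To see what this asymmetry costs a web consumer, here is a minimal sketch of rendering a question under the 1.0 rules; the title and body field names follow this post’s examples, while the rendering helper itself is hypothetical. The title must be encoded by the caller, but the body must be embedded untouched.

```python
from html import escape

def render_question_summary(question):
    # "title" is raw text under 1.0: the caller must encode it or risk XSS.
    title = escape(question["title"])
    # "body" is pre-rendered HTML: encoding it again would mangle the markup.
    body = question["body"]
    return f'<h2>{title}</h2>\n<div class="body">{body}</div>'

q = {"title": "Why is <canvas> slow?", "body": "<p>Profiling shows&hellip;</p>"}
print(render_question_summary(q))
```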

A Focus On Registered Users

There are two distinct mistakes here.  First, we have no way of returning non-existent users.  This question, for instance, has no owner.  In the API, we return no user object even though we clearly know at least the display name of the user.  This comes from 1.0 assuming that every user will have an id, which is a flawed assumption.
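The practical effect is defensive code like the sketch below in every consumer (the field names are assumptions about the 1.0 question shape; the helper is purely illustrative): each client needs its own fallback for questions whose owner object is simply absent.

```python
def owner_display_name(question):
    owner = question.get("owner")
    if owner is None:
        # 1.0 returns no user object at all here, even when a display
        # name is known server-side, so the fallback lives in the client.
        return "unknown"
    return owner.get("display_name", "unknown")

print(owner_display_name({"question_id": 42}))                                   # unknown
print(owner_display_name({"question_id": 7, "owner": {"display_name": "jon"}}))  # jon
```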

Second, the /1.0/users route only returns registered users. Unregistered users can still be found via their ids, or via some other resource (their questions, comments, and so on). This is basically a bug that no one noticed until it was too late, and it got frozen into 1.0.

I suppose the lesson to take from these two mistakes is that your beta audience (in our case, registered users) and your most popular queries (which for us are all around questions and answers) have a very large impact on how much “polish” the different pieces of an API get. It’s a corollary to Linus’ Law worth being aware of: the eyeballs are not uniformly distributed.

[Image caption: Things not copied from Twitter: API uptime.]

Wasteful Request Quotas

Our request quota system is, for the most part, a lift from Twitter’s API, since we figured it was better to ~~steal~~ borrow from an existing, widely used API than to risk inventing a worse system.

To quickly summarize, we issue every IP using the API a quota (that can be raised by using an app key) and return the remaining and total quotas in the X-RateLimit-Current and X-RateLimit-Max headers.  These quotas reset 24 hours after they are initially set.

This turns out to be pretty wasteful in terms of bandwidth as, unlike Twitter, our quotas are quite generous (10,000 requests a day) and not dynamic.  As with the total field, many applications don’t really care about the quota (until they exceed it, which is rare) but they pay to fetch it on every request.

Quotas are also the only bit of metadata we place in response headers, making them very easy for developers to miss (since no one reads documentation; they just start poking at APIs). They also aren’t compressed, due to the nature of headers, which goes against our “always compress responses” design decision.
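For completeness, here is roughly what a client has to do to see its quota at all; the header names are the ones described above, while the host and route are assumptions for illustration. It also shows why header-only metadata is so easy to overlook.

```python
import requests

resp = requests.get("http://api.stackoverflow.com/1.0/questions",
                    params={"pagesize": 1})

# Nothing in the JSON body mentions the quota; a client that never looks
# at the headers will not notice it until requests start failing.
remaining = resp.headers.get("X-RateLimit-Current")
maximum = resp.headers.get("X-RateLimit-Max")
print(f"quota: {remaining} of {maximum} requests left today")
```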

The Good News

Is that all of these, along with some other less interesting mistakes, are slated to be fixed in 2.0.  We couldn’t address them in 1.1, as we were committed to not breaking backwards compatibility in a point-release (there were also serious time constraints).


12 Comments on “History Of The Stack Exchange API, Mistakes”

  1. Hi Kevin, thanks for sharing! I believe growing = learning from your mistakes, and it’s better when we learn from the mistakes of others 🙂

    One question: how did you end up handling the inconsistent HTML “safety”? It is not clear if you pre-encoded the data and saved it in the database, or saved raw data and encoded it just before rendering the HTML?

    Thanks again!

    • 1.1 (being unable to introduce breaking changes) did not address this flaw, so it still exists under the current release of the API.

      The plan for 2.0 is to guarantee that all text returned is safe to embed directly. Exactly how this will be achieved is an implementation detail (and thus subject to change at any time), though we’ll probably be doing just in time encoding for question titles and user names at launch.

      We will probably introduce un-safe text returns in subsequent versions (unbaked Markdown to enable editing, for instance). The lesson we’ll take from this earlier mistake is that you need to be really clear about differences in “safety”.

      • Robert Lee says:

        I think the current 1.0/1.1 behavior is more correct. For non-web applications, the cost of having to unescape the title headers seems like it would be an extreme inconvenience.

        I think the more correct way to look at this is that the data type of titles and other fields is “Plain Text”, where-as the data type of question bodies is “HTML”. So maybe what you should do is just change the key of the question body property from just “body” to “bodyHTML” to indicate that it contains HTML, not plain text (fields should be considered to contain unformatted raw data by default).

        This is like “database normalization”, but for APIs.

        Just my $0.02.

      • In my experience a lot of developers aren’t going to look into the encoding of a field regardless of what it is, so we need to take that into account.

        Currently, failing to encode a “plain text” field can lead to security holes. When we switch to encoding these fields in returns, erroneously choosing to encode them merely looks ugly. The “lazier” course of action (just in-lining API returns in a page) will also be the safe and correct one, so hopefully more developers will fall into it by accident.

        There’s also something to be said for consistency; having two different string fields with different semantics next to each other in a returned object just feels wrong. Every app (web or not) already has to deal with HTML to get much out of the API (since question and answer bodies basically always have some), so I don’t think escaping the occasional field can really be called a burden.

        Of course, this is just the plan going forward. I suppose I could find myself writing a “Version 2.0, Mistakes” article in a year’s time about how this was a terrible mistake, and we’re going back to heterogeneous string returns in 3.0…

      • Robert Lee says:

        That makes sense. Thanks for the great article Kevin.

  2. Dave Weitz says:

    Blizzard is doing some interesting things with its Community API. I like their idea of assigning costs to each request (some requests are heavier than others) and then that totals up to a cap and they start throttling requests. See: http://us.battle.net/wow/en/forum/topic/2743691064

    • This is not an uncommon idea (Twitter has moved to a form of it too), and we may be forced to do so sometime in the future.

      However, I feel we should do our best to keep the quota system simple. We have to pay the pain of optimizing our API once, but each individual consumer would have to deal with the complexity of a non-linear quota system.

      Basically, I think it’s best to make life easy on the developers using the API even if it costs the creators of it; up to a point, naturally.

  3. Morgan Christiansson says:

    Steal? Borrow? Didn’t you mean to say learn or copy?

  4. Kev d'Salvo says:

    Thanks for sharing more on the Stack Exchange API 🙂

  5. Kannan Goundan says:

    The escaping problem is definitely tricky and your “ugly text vs security hole” tradeoff seems reasonable, but I think escaping everything still seems like the wrong thing to do.

    First, it’ll cause more problems than just ugly text. Ugly text is relatively harmless because a human reader can still interpret the text correctly. But if a lazy programmer doesn’t unescape text before doing a substring search, you’ll get false positives and negatives. Still not as bad as a security hole, but worse than just ugly text.

    Second, it implicitly endorses a style of programming where you don’t think about escaping. Modern libraries/tools are trending towards more robust/strict escaping rules. Escaping everything would make life easier for people doing things the wrong way and harder for people doing things the right way.

    And now to throw in some guilt-by-association, escaping everything according to HTML’s rules feels a little like PHP’s magic quotes, which escaped everything according to MySQL’s rules.

    I don’t have a solution for your original problem, but I wonder if the best thing to do is leave things as they are?

    • I’ve never seen a false positive on a search due to over-encoding, while I’ve seen several “forgot to encode” security holes. Not that they can’t happen, but it’s vanishingly rare and much less dangerous.

      I don’t see how forcing “Encode(…)” calls everywhere can be called “the right way”. The important thing is consistency, and right now we’ve got two fields next to each other that need *different* treatment w.r.t. encoding; this just begs for mistakes to be made.

      It’s important to remember that effort spent using an API isn’t really helping anyone. All the utility of an application is in the “value add” over the API; accordingly, how easy it is to correctly use an API is very important. You want the lazy approach to be the right approach 99% of the time.

      • Kannan Goundan says:

        I guess I think that “the right way” is to be aware of the string types you’re using. Some templating tools encode things by default, unless marked as “already-encoded”. I suppose it may be confusing when two fields have different encodings, but Robert’s “HTML” field name suffix idea seems like a good differentiator.

        Anyway, I haven’t gone through the experience of having to deal with the 1.0 API (clearly you guys thought selective escaping was the right thing at some point and then changed your mind with experience). But from the outside, it seems similar (though not identical) to the PHP magic quotes situation where yeah, it reduced SQL injection problems overall, but made life a little harder for programmers who were already doing the right thing.