Stack Exchange API V2.0: Throttling
Posted: 2012/03/22 | Filed under: pontification | Tags: apiv2

Here's a post I lost while writing the rest of the Stack Exchange API V2.0 series; it's been restored (i.e. I hit the publish button) in glorious TechniColor.
Every API has some limit on the number of requests it can serve. Practically, your servers will eventually fall down under load; pragmatically, you'll want to limit access to keep them from doing so. Rationing access to an API is a tricky business; you have to balance technical, business, and philosophical concerns.
History
In early versions of the Stack Exchange API we had two distinct throttles: a "simultaneous requests" throttle that would temporarily ban any IP making more than 30 requests per second, and a "daily quota" throttle that would drop any requests over a certain threshold (typically 10,000) per day.
In V2.0, the simultaneous request throttle remains, and unauthenticated applications (those making requests without access tokens) are subject to the same IP-based daily quota that was in V1.x. We've also added two new classes of throttles for authenticated applications: a per-app-user-pair quota and a per-user quota.
Function
When an authenticated app makes a request it is counted against a per-user-app-pair daily quota (notably not dependent upon IP address) and a per-user daily quota. The user-app pair numbers are returned on each request (in fields on the common wrapper object), but the per-user daily value is hidden.
The authenticated application cases are a bit complicated, so here’s a concrete example:
A user is actively using 6 applications, all of which are making authenticated requests. The user's daily quota is 50,000 and each application has a daily quota of 10,000 (these numbers are typical). Every time an application makes a request it receives its remaining quota with the response, and this number decreases by one with each request; if application #1 makes its 600th request it will receive "9,400" back, and if application #2 simultaneously makes its 9,000th request it will get "1,000" back. If an application exceeds its personal quota (with its 10,001st request) its requests will begin being denied, but every other application will continue to make requests as normal. However, if the sum of the requests made by all the applications exceeds the user's quota of 50,000, then all requests will be denied.
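To make the bookkeeping concrete, here's a minimal consumer-side sketch in C#. It assumes wrapper field names like quota_max and quota_remaining (treat the exact names and shapes as illustrative rather than authoritative, and note this is not an official SDK):

using System;
using System.Text.Json;

class QuotaExample
{
    static void Main()
    {
        // Example wrapper body; only the quota fields are shown.
        const string response = @"{ ""items"": [], ""quota_max"": 10000, ""quota_remaining"": 9400 }";

        using var doc = JsonDocument.Parse(response);
        int remaining = doc.RootElement.GetProperty("quota_remaining").GetInt32();
        int max = doc.RootElement.GetProperty("quota_max").GetInt32();

        Console.WriteLine($"{remaining} of {max} requests left for this app/user pair today.");

        if (remaining == 0)
        {
            // The per-user quota (50,000 in the example above) is never reported,
            // so the only signal for hitting it is an error response on a later request.
            Console.WriteLine("Stop making requests until the quota resets.");
        }
    }
}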

There are good reasons to throttle.
Rationale
The simultaneous request throttle is purely technically driven; it helps protect the network from DoS attacks. This throttle is enforced much higher in the stack than the others (affected requests never even hit our web tier), and is also much "ruder". In a perfect world we'd return well-formed errors (as we do elsewhere in the API) when it is encountered, but practically speaking it's much simpler to just let HAProxy respond however it pleases.
Our daily quotas are more philosophically tinted. We still need to limit access to a scarce resource (our servers' CPU time), and we don't want any one application to monopolize the API to the detriment of others (or of direct users of the Stack Exchange sites). A simple quota system addresses these concerns sufficiently, in my opinion. It's also "fair", since all you have to do is register on Stack Apps, after which you have access to all of our CC-wiki'd data. The quota of 10,000 was chosen because it allows an app to request (100 at a time) one million questions a day via the API; it's a nice round number, and is also fairly "absurd", which gives us some confidence that most applications can function within the quota.
The per-app-user-pair quota starts pulling some business sense into the discussion. In essence, IP-based throttles fall apart in the face of computing as a service (Google App Engine, Azure, Amazon Web Services, and others). While we were aware of this flaw in V1.0, we had no recourse at the time (lacking the resources to add authentication to the API and still ship in a timely manner). By excluding services like App Engine from our API consumers we were hamstringing them, which made improving our throttling story a big-ticket item for API V2.0. As a rule of thumb, if a feature makes life easier on the consumers of your API you should seriously investigate it; after all, an API is not in and of itself interesting, the applications built on top of it are.
With the per-app-user-pair quota alone we still have to worry about DoS attacks; thus the per-user quota. One wrinkle is that we don't report the state of the per-user quota to applications. I considered doing so, but decided that it was something of a privacy leak and not of interest to most applications; after all, there's no way for one application to control another.
Stack Exchange API V2.0: The Stable Future
Posted: 2012/02/22 | Filed under: pontification | Tags: apiv2

This is the last of my planned series about V2.0 of the Stack Exchange API (the contest is still going on, and we froze the interface a few weeks ago), and I'd like to talk about some regrets. Not another mistakes post (API V2.0 is still too young to judge with the benefit of hindsight), but something in a similar vein.
This is thankfully a pretty short post, as I’ve worked pretty hard to reduce the regrettable things in every iteration of our API.
The need for breaking changes
I definitely tried to break as little as possible; compare the question object in V1.1 to that in V2.0 to see how little changed. However, we ate two big breaks: restructuring the common response wrapper, and moving the API from per-site URLs (api.stackoverflow.com, api.serverfault.com, etc.) to a single shared domain (api.stackexchange.com).
These were definitely needed, and I'm certain the benefits they give us will quickly pay off the transition pain; but still, breaking changes impose work on consumers. When building an API it's important never to make breaking changes wantonly; you're making a non-trivial imposition on other developers' time.
We couldn’t do everything
I've already got a list of things I want in V2.1; unfortunately, some of them are things I'd like to have gotten into V2.0. A truism of development is that there's always more to be done, real developers ship, and so on. API development is no different, but I'm hoping to see an increase in new features and a decrease in time between API revisions as Stack Exchange matures.
We should be much stabler now
Naturally, since I’m aware of these problems we’ve taken steps to make them less of an issue.
I’m very satisfied with the current request wrapper, and the filter feature (plus our backwards compatibility tooling around it) makes it much easier to make additions to it without breaking existing consumers. In general, filters allow for lots of forward compatibility guarantees.
We'll never be able to do everything, but all sorts of internal gotchas that made V1.1 really rough to roll out have been fixed as part of V2.0. Subsequent minor revisions to V2.0 shouldn't be nearly as rough, and with our growing staff, developer resources should be comparatively plentiful.
Stack Exchange API V2.0: Consistency
Posted: 2012/02/16 | Filed under: pontification | Tags: apiv2

If you take a look at the changelog for Stack Exchange API V2.0 (go check it out, you could win a prize), you'll see we did more than just add new methods. We also made a considerable number of changes which, taken as a whole, aim to improve the consistency of the API's interface.
Broadly speaking, good APIs are consistent APIs. Every time your API surprises a developer, or forces them to re-read the documentation, you've lost a small but important battle. As I've said before, an API isn't interesting in and of itself; the applications built on top of it are. Accordingly, any time wasted wrestling with the rough edges of your API is time not spent making those applications even more interesting.
Addressing Past Mistakes
This is an opinion I held during V1.0 as well, but we're hardly perfect and some oddities slipped through. Fields were inconsistently HTML encoded/escaped, so question bodies and titles needed completely different treatment by an application. There were ways to exclude some, but not all, expensive-to-fetch fields on some, but not many, methods. There were odd naming patterns: for example, /questions/{ids} expected question ids and /comments/{ids} expected comment ids, but /revisions/{ids} expected post ids. Return types could be unexpected: /questions returned questions, as did /questions/{ids}, but while /badges returned badges, /badges/{ids} returned users (those who had earned the badges in question).
We’ve addressed all of these faults, and more, in V2.0. Safety addresses the encoding issues, while Filters address the excluding of fields. /revisions/{ids} was renamed to /posts/{ids}/revisions, while another odd method /revisions/{post ids}/{revision id} became the new /revisions/{revision ids}. And all methods that exist in /method, /method/{ids} pairs return the same types.
There are some ways in which we're perhaps… oddly consistent, which I feel merit mention. All returns use the same "common wrapper object", even errors; this lets developers use common de-serialization code paths for all returns. Relatedly, all methods return arrays of results even if they can reasonably be expected to always return a single item (such as /info); as with the common wrapper, this allows for sharing de-serialization code. We return quota fields as data in the response body (other APIs, like Twitter, often place them in headers), so developers can handle them in the same manner as they would any other numeric data.
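Because everything comes back in that one wrapper shape, a consumer can get away with a single generic DTO and one de-serialization path. A rough C# sketch; the property names follow the wrapper fields mentioned above, and the real wrapper has more fields than shown:

using System.Collections.Generic;
using System.Text.Json;

public class Wrapper<T>
{
    public List<T> items { get; set; }          // always an array, even for /info
    public bool has_more { get; set; }
    public int quota_max { get; set; }          // quota reported in the body, not headers
    public int quota_remaining { get; set; }
    public int? error_id { get; set; }          // errors come back in the same wrapper
    public string error_message { get; set; }
}

public static class WrapperReader
{
    // One de-serialization path for every method in the API.
    public static Wrapper<T> Read<T>(string json) =>
        JsonSerializer.Deserialize<Wrapper<T>>(json);
}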

Pitfalls
I would caution against what I’ve taken to calling “false consistency”, where an API either implies a pattern that doesn’t exist or sacrifices more important features in the name of consistency.
By way of example: the Stack Exchange API has a number of optional sorts on many methods, many of which are held in common (a great many methods have an activity sort, for instance). The crux of the consistency objection to these sorts is that they oftentimes apply to the same field (creation sorting applies to the question, answer, and comment creation_date fields, for example) but are named differently from the fields; some argue they should have the same name.
However, this would be a false consistency. If we just changed the names of the existing sorts, we would be implying that every field can be sorted by, which isn't true. If we did make that possible, we'd have sacrificed performance (and probably some of our SQL servers) in pursuit of consistency. In this case the cure would be worse than the disease, though I would argue our sorts are consistent with regard to each other, if not with regard to the fields they operate on.
Stack Exchange API V2.0: Safety
Posted: 2012/02/08 | Filed under: pontification | Tags: apiv2

Every method in version 2.0 of the Stack Exchange API (no longer in beta, but an app contest continues) can return either "safe" or "unsafe" results. This is a rather odd concept that deserves some explanation.
Semantics
Safe results are those for which every field on every object returned can be inlined into HTML without concern for script injections or other "malicious" content. Unsafe results are those for which this is not the case; however, that is not to imply that every field is capable of containing malicious content in an unsafe return.
An application indicates whether it wants safe or unsafe results via the filter it passes along with the request. Since most requests should have some filter passed, it seemed reasonable to bundle safety into filters.
To the best of my knowledge, this is novel in an API. It's not exactly something that we set out to add, however; there's a strong historical component to its inclusion.
Rationale
In version 1.0 of the Stack Exchange API every result is what we would call unsafe in version 2.0. We were simply returning the data stored in our database, without much concern for encoding or escaping. This led to cases where, for example, question bodies were safe to inline (and in fact, escaping them would be an error) but question titles required encoding, lest an application open itself to script injections. This behavior caused difficulties for a number of third-party developers, and bit us internally a few times as well, which is why I resolved to do something to address it in version 2.0.
I feel that a great strength in any API is consistency, and thus the problem was not that you had to encode data; it was that you didn't always have to. We had two options: we could either make all fields "high fidelity" by returning the most original data we had (i.e. exactly what users had entered, before tag balancing, entity encoding, and so on), or we could make all fields "safe" by making sure the same sanitization code ran against them regardless of how they were stored in the database.
Strictly speaking, I would have preferred to go with the "high fidelity" option, but another consideration forced the "safe" one. The vast majority of the content in the Stack Exchange network is submitted in markdown, which isn't really standardized (and we've added a number of our own extensions). Returning the highest fidelity data would be making the assumption that all of our consumers could render our particular markdown variant. While we have open sourced most of our markdown implementation, even with that we'd be restricting potential consumers to just those built on .NET. Thus the "high fidelity" option wasn't just difficult to pursue; it was effectively impossible.
Given this train of thought, I was originally going to just make all returns "safe", period. I realize in hindsight that this would have been a pretty grave mistake (thankfully, some of the developers at Stack Exchange, and some from the community, talked me out of it). I think the parallel with C#'s unsafe keyword is a good one: sometimes you need dangerous things, and you put a big "I know what I'm doing" signal right there when you do. This parallel ultimately granted the options their names; properly escaped and inline-able returns are "safe", and those that are pulled straight out of the database are "unsafe".
Adding a notion of "safety" allows us to be very consistent in how the data we return should be treated. It also allows developers to ignore encoding issues if they so desire, at least in the web app case. Of course, if they're willing to handle encoding themselves, they also have that option.
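A minimal sketch of what that choice looks like from the consumer's side (C#; the title plumbing here is hypothetical, only the encode-or-not decision matters):

using System.Net;

public static class SafetyExample
{
    // "filterWasSafe" reflects whether the filter used for the request was safe or unsafe.
    public static string RenderTitle(string title, bool filterWasSafe)
    {
        // Safe returns are already fine to inline into HTML; unsafe returns are the raw
        // database values and must be encoded before they hit a page.
        return filterWasSafe ? title : WebUtility.HtmlEncode(title);
    }
}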
Stack Exchange API V2.0: No Write Access
Posted: 2012/02/01 | Filed under: pontification | Tags: apiv2

Version 2.0 of the Stack Exchange API (still in beta, with an app contest) introduced authentication, but unlike most other APIs did not introduce write access at the same time. Write access is one of the most common feature requests we get for the API, so I feel it's important to explain that this was very much a conscious decision, not one driven solely by the scarcity of development resources.
Why?
The "secret sauce" of the Stack Exchange network is the quality of the content found on the sites. When compared to competing platforms you're much more likely to find quality existing answers to your questions, to find interesting and coherent questions to answer, and to quickly get good answers to new questions you post. Every change we make or feature we add is considered in terms of either preserving or improving quality, and the Stack Exchange API has been no different. When viewed through the lens of quality, a few observations can be made about the API.
Screw up authentication, and you screw up write access
Write access presupposes authentication, and any flaw in authentication is going to negatively impact the use of write access accordingly. Authentication, being a new 2.0 feature, is essentially untested, unverified, and above all untrusted in the current Stack Exchange API. By pushing authentication out in its own release we’re starting to build some confidence in our implementation, allowing us to be less nervous about write access in the future.
Screw up write access, and you screw up quality
The worst possible outcome of adding write access to the Stack Exchange API is a flood of terrible questions and answers on our sites. As such, the design of our API will need to actively discourage that outcome; we can't get away with a simple "POST /questions/create". However, a number of other APIs can get away with very simple write methods, and it's important to understand how they differ from the Stack Exchange model such that they need not be as concerned (beyond not having nearly our quality in the first place).
The biggest observation is that every Stack Exchange site is, in a sense, a single "well" in danger of being poisoned. Every Stack Exchange site user sees the same Newest Questions page, the same User page, and (with the exception of Stack Overflow) the same Home page. Compare with social APIs (e.g. Facebook and Twitter, where everything is sharded around the user) or service APIs (like Twilio, which doesn't really have common content to show users); in those cases there are lots of "wells", none of which is crucial to protect.
Write access requires more than just writing
It’s easy to forget just how much a Stack Exchange site does to encourage proper question asking behavior in users.
A write API for Stack Exchange will need to make all of this guidance available for developers to integrate into their applications, as well as find ways to incentivize that integration. We also have several automated quality checks against new posts, and a plethora of other rejection causes, all of which need to be conscientiously reported by the API (without creating oracles for bypassing those checks).
Ultimately, the combination of wanting authentication to be independently battle-tested, the need to really focus on getting write access correct, and the scarcity of development resources caused by other work also slated for V2.0 led to write access being deferred to a subsequent release.
Stack Exchange API V2.0: JS Auth Library
Posted: 2012/01/25 | Filed under: code, pontification | Tags: apiv2

In a previous article I discussed why we went with OAuth 2.0 for authentication in V2.0 of the Stack Exchange API (beta and contest currently underway), and very shortly after we released a simple javascript library to automate the whole affair (currently also considered "in beta", report any issues on Stack Apps). The motivations for creating this are, I feel, non-obvious, as is why it's built the way it is.
Motivations
I’m a strong believer in simple APIs. Every time a developer using your API has to struggle with a concept or move outside their comfort zone, your design has failed in some small way. When you look at the Stack Exchange API V2.0, the standout “weird” thing is authentication. Every other function in the system is a simple GET (well, there is one POST with /filters/create), has no notion of state, returns JSON, and so on. OAuth 2.0 requires user redirects, obviously has some notion of state, has different flows, and is passing data around on query strings or in hashes. It follows that, in pursuit of overall simplicity, it’s worthwhile to focus on simplifying consumers using our authentication flows. The question then becomes “what can we do to simplify authentication?”, with an eye towards doing as much good as possible with our limited resources. The rationale for a javascript library is that:
- web applications are prevalent, popular, and all use javascript
- we lack expertise in the other (smaller) comparable platforms (Android and iOS, basically)
- web development makes it very easy to push bug fixes to all consumers (high future bang for buck)
- other APIs offer hosted javascript libraries (Facebook, Twitter, Twilio, etc.)
Considerations
The first thing that had to be decided was the scope of the library: although the primary driver for the library was the complexity of authentication, that did not necessarily mean that's all the library should offer. Ultimately, all it covers is authentication, for reasons of both time and avoidance of a chilling effect. Essentially, scoping the library to just authentication gave us the biggest bang for our buck while alleviating most fears that we'd discourage the development of competing javascript libraries for our API. It is, after all, in Stack Exchange's best interest for there to be a healthy development community around our API. I also decided that it was absolutely crucial that our library be as small as possible, and quickly served up. Negatively affecting page load is unacceptable in a javascript library, basically. In fact, concerns about page load times are why the Stack Exchange sites themselves do not use Facebook- or Twitter-provided javascript for their share buttons (and also why there is, at time of writing, no Google Plus share option). It would be hypocritical to expect other developers not to have the same concerns we do about third-party includes.
Implementation
Since it’s been a while since there’s been any code in this discussion, I’m going to go over the current version (which reports as 453) and explain the interesting bits. The source is here, though I caution that a great many things in it are implementation details that should not be depended upon. In particular, consumers should always link to our hosted version of the library (at https://api.stackexchange.com/js/2.0/all.js).
The first three lines sort of set the stage for “small as we can make it”.
window.SE = (function (navigator, document, window, encodeURIComponent, Math, undefined) {
    "use strict";
    var seUrl, clientId, loginUrl, proxyUrl, fetchUserUrl, requestKey, buildNumber = '@@~~BuildNumber~~@@';
I’m passing globals as parameters to the closure defining the interface in those cases where we can expect minification to save space (there’s still some work to be done here, where I will literally be counting bytes for every reference). We don’t actually pass an undefined to this function, which both saves space and assures nobody’s done anything goofy like giving undefined a value. I intend to spend some time seeing if similar proofing for all passed terms is possible (document and window are already un-assignable, at least in some browsers). Note that we also declare all of our variables in batches throughout this script, to save bytes from repeating “var” keywords.
Implementation Detail: "@@~~BuildNumber~~@@" is replaced as part of our deploy. Note that we pass it as a string everywhere, allowing us to change the format of the version string in the future. The version is provided only for bug reporting purposes; consumers should not depend on its format nor use it in any control flow.
function rand() { return Math.floor(Math.random() * 1000000); }
Probably the most boring part of the entire implementation: it gives us a random number. Smaller than inlining it everywhere we need one, but not by a lot even after minifying references to Math. Since we only ever use this to avoid collisions, I'll probably end up removing it altogether in a future version to save some bytes.
function oldIE() {
    if (navigator.appName === 'Microsoft Internet Explorer') {
        var x = /MSIE ([0-9]{1,}[\.0-9]{0,})/.exec(navigator.userAgent);
        if (x) { return x[1] <= 8.0; }
    }
    return false;
}
Naturally, there’s some Internet Explorer edge case we have to deal with. For this version of the library, it’s that IE8 has all the appearances of supporting postMessage but does not actually have a compliant implementation. This is a fairly terse check for Internet Explorer versions <= 8.0, inspired by the Microsoft recommended version. I suspect a smaller one could be built, and it’d be nice to remove the usage of navigator if possible.
Implementation Detail: There is no guarantee that this library will always treat IE 8 or lower differently than other browsers, nor is there a guarantee that it will always use postMessage for communication when able.
Now we get into the SE.init function, the first method that every consumer will need to call. You’ll notice that we accept parameters as properties on an options object; this is a future proofing consideration, as we’ll be able to add new parameters to the method without worrying (much) about breaking consumers.
You’ll also notice that I’m doing some parameter validation here:
if (!cid) { throw "`clientId` must be passed in options to init"; }
if (!proxy) { throw "`channelUrl` must be passed in options to init"; }
if (!complete) { throw "a `complete` function must be passed in options to init"; }
if (!requestKey) { throw "`key` must be passed in options to init"; }
This is something of a religious position, but I personally find it incredibly frustrating when a minified javascript library blows up because it expected a parameter that wasn’t passed. This is inordinately difficult to diagnose given how trivial the error is (often being nothing more than a typo), so I’m checking for it in our library and thereby hopefully saving developers some time.
Implementation Detail: The exact format of these error messages isn’t specified, in fact I suspect we’ll rework them to reduce some repetition and thereby save some bytes. It is also not guaranteed that we will always check for required parameters (though I doubt we’ll remove it, it’s still not part of the spec) so don’t go using try-catch blocks for control flow.
This odd bit:
if (options.dev) {
    seUrl = 'https://dev.stackexchange.com';
    fetchUserUrl = 'https://dev.api.stackexchange.com/2.0/me/associated';
} else {
    seUrl = 'https://stackexchange.com';
    fetchUserUrl = 'https://api.stackexchange.com/2.0/me/associated';
}
Is for testing on our dev tier. At some point I'll get our build setup to strip this out from the production version; there are a lot of wasted bytes right there.
Implementation Detail: If the above wasn’t enough, don’t even think about relying on passing dev to SE.init(); it’s going away for sure.
The last bit of note in SE.init, is the very last line:
setTimeout(function () { complete({ version: buildNumber }); }, 1);
This is a bit of future proofing as well. Currently, we don't actually have any heavy lifting to do in SE.init(), but there very well could be some in the future. Since we'll never accept blocking behavior, we know that any significant additions to SE.init() will be asynchronous; and a complete function would be the obvious way to signal that SE.init() is done.
Implementation Detail: Currently, you can get away with calling SE.authenticate() immediately, without waiting for the complete function passed to SE.init() to execute. Don’t do this, as you may find that your code will break quite badly if our provided library starts doing more work in SE.init().
Next up is fetchUsers(), an internal method that handles fetching network_users after an authentication session should the consumer request them. We make a JSONP request to /me/associated, since we cannot rely on the browser understanding CORS headers (which are themselves a fairly late addition to the Stack Exchange API).
Going a little out of order, here’s how we attach the script tag.
while (window[callbackName] || document.getElementById(callbackName)) {
    callbackName = 'sec' + rand();
}

window[callbackName] = callbackFunction;

src += '?access_token=' + encodeURIComponent(token);
src += '&pagesize=100';
src += '&key=' + encodeURIComponent(requestKey);
src += '&callback=' + encodeURIComponent(callbackName);
src += '&filter=!6RfQBFKB58ckl';

script = document.createElement('script');
script.type = 'text/javascript';
script.src = src;
script.id = callbackName;

document.getElementsByTagName('head')[0].appendChild(script);
The only interesting bit here is the while loop making sure we don’t pick a callback name that is already in use. Such a collision would be catastrophically bad, and since we can’t guarantee anything about the hosting page we don’t have a choice but to check.
Implementation Detail: JSONP is the lowest common denominator, since many browsers still in use do not support CORS. It’s entirely possible we’ll stop using JSONP in the future, if CORS supporting browsers become practically universal.
Our callbackFunction is defined earlier as:
callbackFunction = function (data) {
    try {
        delete window[callbackName];
    } catch (e) {
        window[callbackName] = undefined;
    }
    script.parentNode.removeChild(script);

    if (data.error_id) {
        error({ errorName: data.error_name, errorMessage: data.error_message });
        return;
    }

    success({ accessToken: token, expirationDate: expires, networkUsers: data.items });
};
Again, this is fairly pedestrian. One important thing that is often overlooked when making these sorts of libraries is the cleanup of script tags and callback functions that are no longer needed. Leaving those lingering around does nothing but negatively affect browser performance.
Implementation Detail: The try-catch block is a workaround for older IE behaviors. Some investigation into whether setting the callback to undefined performs acceptably for all browsers may let us shave some bytes there, and remove the block.
Finally, we get to the whole point of this library: the SE.authenticate() method.
We do the same parameter validation we do in SE.init, though there’s a special case for scope.
if (scopeOpt && Object.prototype.toString.call(scopeOpt) !== '[object Array]') { throw "`scope` must be an Array in options to authenticate"; }
Because we can’t rely on the presence of Array.isArray in all browsers, we have to fall back on this silly toString() check.
The meat of SE.authenticate() is in this block:
if (window.postMessage && !oldIE()) {
    if (window.attachEvent) {
        window.attachEvent("onmessage", handler);
    } else {
        window.addEventListener("message", handler, false);
    }
} else {
    poll = function () {
        if (!opened) { return; }
        if (opened.closed) {
            clearInterval(pollHandle);
            return;
        }

        var msgFrame = opened.frames['se-api-frame'];
        if (msgFrame) {
            clearInterval(pollHandle);
            handler({ origin: seUrl, source: opened, data: msgFrame.location.hash });
        }
    };

    pollHandle = setInterval(poll, 50);
}

opened = window.open(url, "_blank", "width=660, height=480");
In a nutshell, if a browser supports (and properly implements, unlike IE8) postMessage we use that for cross-domain communication; otherwise we use the old iframe trick. The iframe approach here isn't the most elegant (polling isn't strictly required), but it's simpler.
Notice that if we end up using the iframe approach, I’m wrapping the results up in an object that quacks enough like a postMessage event to make use of the same handler function. This is easier to maintain, and saves some space through code reuse.
Implementation Detail: Hoo boy, where to start. First, the usage of postMessage or iframes shouldn't be relied upon, nor should the format of the messages sent. The observant will notice that stackexchange.com detects that this library is in use, and only creates an iframe named "se-api-frame" when it is; this behavior shouldn't be relied upon either. There's quite a lot in this method that should be treated as a black box; note that the communication calisthenics this library is doing isn't necessary if you're hosting your javascript under your own domain (as is expected of other, more fully featured libraries, like those found on Stack Apps).
Here’s the handler function:
handler = function (e) {
    if (e.origin !== seUrl || e.source !== opened) {
        return;
    }

    var i, pieces,
        parts = e.data.substring(1).split('&'),
        map = {};

    for (i = 0; i < parts.length; i++) {
        pieces = parts[i].split('=');
        map[pieces[0]] = pieces[1];
    }

    if (+map.state !== state) {
        return;
    }

    if (window.detachEvent) {
        window.detachEvent("onmessage", handler);
    } else {
        window.removeEventListener("message", handler, false);
    }

    opened.close();

    if (map.access_token) {
        mapSuccess(map.access_token, map.expires);
        return;
    }

    error({ errorName: map.error, errorMessage: map.error_description });
};
You’ll notice that we’re religious about checking the message for authenticity (origin, source, and state checks). This is very important as it helps prevent malicious scripts from using our script as a vector into a consumer; security is worth throwing bytes at.
Again we’re also conscientious about cleaning up, making sure to unregister our event listener, for the same performance reasons.
I'm using a mapSuccess function to handle the conversion of the response and invocation of success (and optionally calling fetchUsers()). This is probably wasting some space and will get refactored sometime in the future.
I'm passing expirationDate to success as a Date because of a mismatch between the Stack Exchange API (which talks in "seconds since the unix epoch") and javascript (which, while it has a dedicated Date type, thinks in "milliseconds since the unix epoch"). They're just similar enough to be confusing, so I figured it was best to pass the data in an unambiguous type.
Implementation Detail: The manner in which we’re currently calculating expirationDate can over-estimate how long the access token is good for. This is legal, because the expiration date of an access token technically just specifies a date by which the access token is guaranteed to be unusable (consider what happens to an access token for an application a user removes immediately after authenticating to).
Currently we’ve managed to squeeze this whole affair down into a little less than 3K worth of minified code, which gets down under 2K after compression. Considering caching (and our CDN) I’m pretty happy with the state of the library, though I have some hope that I can get us down close to 1K after compression.
[ed. As of version 568, the Stack Exchange Javascript SDK is down to 1.77K compressed, 2.43K uncompressed.]
Stack Exchange API V2.0: Authentication
Posted: 2012/01/18 | Filed under: pontification | Tags: apiv2

The most obvious addition to the 2.0 version of the Stack Exchange API (beta and contest currently under way) is authentication (authorization technically, but the distinction isn't pertinent to this discussion) in the form of OAuth 2.0. With this new feature, it's now possible for a user to demonstrate to a third party who they are on any site in the Stack Exchange network.
Why OAuth 2.0?
OAuth 2.0 was a pretty easy choice. For one, there aren’t that many well known authentication protocols out there. OAuth 1.0a, OAuth 2.0, OpenID (sort of), … and that’s about it.
Though we’re quite familiar with OpenID, all it does is demonstrate who you are to a consumer; there’s no token for subsequent privileged requests. Furthermore, OpenID is… tricky to consume. At Stack Exchange we make use of the excellent dotNetOpenAuth, but when you’re providing an API you can’t assume all clients have an “easy out” in the form of a library; simplicity is king.
OAuth 1.0a is something of a disaster, in my professional opinion. It certainly works, but it's very complicated, with numerous flows that, frankly, most applications don't need. It also forces developers to deal with the nitty-gritty details of implementing signatures, and all sorts of encoding headaches (remember, an API cannot assume that developers can hand that work off to a library). If we had no other options then we'd go with OAuth 1.0a, but thankfully that's not the case.
OAuth 2.0's main strength is consumer simplicity. A simple redirect, a POST (or just a redirect, depending on flow), and then out pops a token for privileged requests. It does impose a little additional complexity on our end, as HTTPS is mandated by the standard, but extra complexity on our end is fine (as opposed to on the consumer's end). OAuth 2.0 is fairly new, and as with OAuth 1.0a a "conforming implementation" is a matter of debate, so it's not all roses; but it's the best of the bunch.
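For the curious, here's roughly what the consumer side of the explicit flow boils down to. The endpoint URL and parameter plumbing below are generic OAuth 2.0 placeholders, not the exact Stack Exchange endpoints; consult the real documentation before using them:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

class OAuthExplicitFlowSketch
{
    // Step 1 (not shown): redirect the user to the provider's authorize URL and
    // receive `code` back on your registered redirect_uri.
    // Step 2: exchange the code for an access token with a single POST.
    static async Task<string> ExchangeCodeAsync(string code)
    {
        using var http = new HttpClient();
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["client_id"] = "YOUR_CLIENT_ID",         // placeholder
            ["client_secret"] = "YOUR_CLIENT_SECRET", // placeholder
            ["code"] = code,
            ["redirect_uri"] = "https://example.com/oauth/callback",
        });

        var response = await http.PostAsync("https://provider.example/oauth/access_token", form);

        // Out pops a token for privileged requests; the response format varies by provider.
        return await response.Content.ReadAsStringAsync();
    }
}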
Implementation Details
I explicitly modeled our implementation of OAuth 2.0 on Facebook's, under the assumption that in the face of any ambiguities in the spec it'd be best if we went along with a larger provider's interpretation. Not to imply that Facebook is automatically correct, but that following it is the best option for those developing against our API; anyone wildly Googling for OAuth 2.0 is likely to find details on Facebook's implementation, and it'd be best if they were also true for Stack Exchange's.
For example, the OAuth 2.0 spec calls for scopes to be space-delimited while Facebook requires commas; at Stack Exchange we accept both. The spec also (for some bizarre reason) leaves the success and error responses when exchanging auth codes for access tokens in the explicit flow up to the implementation; in both cases we mimic Facebook's behavior.
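Accepting both delimiters costs almost nothing on the provider side; a sketch of the idea (C#, names illustrative):

using System;

public static class ScopeParsing
{
    // Split a raw scope parameter on either spaces (per the spec) or commas (per Facebook).
    public static string[] ParseScopes(string raw) =>
        raw.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
}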
The Stack Exchange user model introduces some complications as well. On Stack Exchange you conceptually have one account (with any number of credentials) and many users (one on each site, potentially); you can be logged in as any number of users (with a cookie on each site) but aren't really logged in at an account level. Since we didn't want users to have to authenticate to each site they're active on (with 70+ sites in the network, this would be incredibly unfriendly for power users), we needed to choose a single site that would serve as a "master site" and mediate logins required during OAuth 2.0 flows; we ended up choosing stackexchange.com to fill this role.
The Elephant In The Room
By now, some of you are thinking “what’s the point, it’s trivial to compromise credentials in an OAuth flow”. Simple phishing for username/password in an app, more complicated script injection schemes, pulling cookies out of hosted browser instances, etc. There are some arguments against it from a UX standpoint as well.
Honestly, I agree with a good deal of these arguments. In a lot of cases OAuth really is a lot weaker than it’s been portrayed, though I would argue that it helps protect honest developers from themselves. You can’t make any silly mistakes around password storage if you never have an opportunity to store a password, for example.
However, these arguments aren't really pertinent in Stack Exchange's case because we've already settled on OpenID for login. This means that even if we wanted to support something akin to xAuth we couldn't; a user's username/password combo is useless to us. So we're stuck with something that depends on a browser, short of pulling off of OpenID altogether (which is almost certainly never going to happen, for reasons I hope are obvious).
Stack Exchange API V2.0: Implementing Filters
Posted: 2012/01/11 | Filed under: code, pontification | Tags: apiv2

As part of this series of articles, I've already discussed why we added filters to the Stack Exchange API in version 2.0 (go check it out, you could win a prize). Now I'm going to discuss how they were implemented and what drove the design.
Considerations
Stability
It is absolutely paramount that filters not break, ever. A lot of the benefits of filters go away if applications are constantly generating them (that is, if they aren’t “baked into” executables), and “frustrated” would be a gross understatement of how developers would feel if we kept forcing them to redistribute their applications with new filters.
From stability, it follows that filters need to be immutable. Consider the denial-of-service attack that would be possible if a malicious party extracted and modified a filter baked into a popular application.
Speed
One of the big motivations behind filters was improving performance, so it follows that the actual implementation of filters shouldn't have any measurable overhead. In practice this means that no extra database queries (and preferably no network traffic at all) can occur as a consequence of passing a filter.
Ease of Use
While it’s probably impossible to make using a filter more convenient than not using one, it’s important that using filters not be a big hassle for developers. Minimizing the number of filters that need to be created, and providing tools to aid in their creation are thus worthwhile.
Implementation
Format
Filters, at their core, ended up being a simple bitfield of which fields to include in a response. Bitfields are fast to decode, and naturally stable.
Also, every field on every type is encoded in this bitfield. This is important for the ease-of-use consideration, as it makes it possible to use a single filter for all your requests.
Encoding
A naive bitfield for a filter would have, at time of writing, 282 bits. This is a bit hefty (a base64-encoded naive filter would be approximately 47 characters long, for example), so it behooves us to compress it somewhat.
An obvious and simple compression technique is to run-length encode the bitfield. We make this even more likely to bear fruit by grouping the bits first by “included in the default filter” and then by “type containing the field”. This grouping exploits the expectation that filters will commonly either diverge from the default filter or focus on particular types.
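For illustration, here's a simplified sketch of run-length encoding a bitfield (C#). The real encoding (the grouping, the alphabet, the safe/unsafe marker, and the bit shuffling discussed below) is an implementation detail of the API; this only shows why grouping similar bits makes runs long and the output short:

using System.Collections;
using System.Text;

public static class FilterRle
{
    // Encode a bitfield as alternating run lengths, starting with a run of set bits;
    // e.g. 1110011 becomes "3,2,2" and 0001111 becomes "0,3,4".
    public static string Encode(BitArray bits)
    {
        var sb = new StringBuilder();
        bool current = true;
        int run = 0;

        for (int i = 0; i < bits.Length; i++)
        {
            if (bits[i] == current) { run++; continue; }
            sb.Append(run).Append(',');
            current = bits[i];
            run = 1;
        }

        sb.Append(run);
        return sb.ToString();
    }
}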
We also tweak the characters we'll use to encode a filter a bit, so we're technically encoding in a base higher than 64; though we lose a character to indicate safe/unsafe (which is a discussion for another time).
All told, this gets the size of filters we're seeing in the wild down to a manageable 3 to 29 characters.
Bit Shuffling
This one's a bit odd, but in the middle of the encoding step we do some seemingly pointless bit shuffling. What we're trying to do here is enforce opaqueness; why we'd want to do that deserves some explanation.
A common problem when versioning APIs is discovering that a number of consumers (oftentimes an uncomfortably large number) are getting away with doing technically illegal things. An example is SetWindowsHook in Win16 (short version: developers could exploit knowledge of the implementation to avoid calling UnhookWindowsHook); one from V1.0 of the Stack Exchange API is /questions/{id}/comments also accepting answer ids (this exploits /posts/{ids}/comments, /questions/{ids}/comments, and /answers/{ids}/comments all being aliases in V1.x). When you find such behavior you're left choosing between breaking consumers or maintaining "bugs" in your implementation indefinitely, neither of which is a great option.
The point of the bit shuffling is both to make it harder to figure out the implementation (though naturally not impossible; the average developer is more than bright enough to figure our scheme out given enough time), so that such "too clever for your own good" behavior is harder to pull off, and to really drive home the point that you shouldn't be creating filters without calling /filter/create.
Backwards Compatibility
Maintaining compatibility between API versions with filters is actually pretty simple if you add one additional constraint: you never remove a bit from the field. This lets you use the length of the underlying bitfield as a version number.
Our implementation maintains a list of fields paired with the length of the bitfield they were introduced on. This lets us figure out which fields were available when a given filter was created, and exclude any newer fields when we encounter an old filter.
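A sketch of that versioning trick (C#; the types are illustrative, not our actual implementation):

using System.Collections;
using System.Collections.Generic;
using System.Linq;

public record FieldDef(string Name, int Bit, int IntroducedAtLength);

public static class FilterCompat
{
    // A field's bit index is always less than the bitfield length it was introduced at,
    // so filtering on IntroducedAtLength also keeps the indexer below in range.
    public static IEnumerable<string> IncludedFields(BitArray filterBits, IEnumerable<FieldDef> allFields) =>
        allFields
            .Where(f => f.IntroducedAtLength <= filterBits.Length) // field existed when this filter was made
            .Where(f => filterBits[f.Bit])                         // and its bit is set
            .Select(f => f.Name);
}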
Composing A Query
One prerequisite for filters is the ability to easily compose queries against your datastore. After all, it’s useless to know that certain fields shouldn’t be fetched if you can’t actually avoid querying for them.
In the past we would have used LINQ-to-SQL, but performance concerns have long since led us to develop and switch to Dapper, and SqlBuilder in Dapper.Contrib.
Here's a rough outline of building part of an answer object query.
// While technically optional, we always need this so *always* fetch it
builder.Select("Id AS answer_id");
builder.Select("ParentId AS question_id");

// links and title are backed by the same columns
if (Filter.Answer.Link || Filter.Answer.Title)
{
    builder.LeftJoin("dbo.Posts Q ON Q.Id = ParentId");
    builder.Select("Q.Title as title");
}

if (Filter.Answer.LastEditDate)
{
    builder.Select("LastEditDate AS last_edit_date");
}
Note that sometimes we'll grab more data than we intend to return; for example, when fetching badge_count objects we always fetch all three counts even if we only intend to return, say, gold. We rely on some IL magic just before we serialize our response to handle those cases.
Caches
The Stack Exchange network sites would fall over without aggressive caching, and our API has been no different. However, introducing filters complicates our caching approach a bit.
In V1.x, we just maintained query -> response and type+id -> object caches. In V2.0, we need to account for the fields actually fetched or we risk responding with too many or too few fields set when we have a cache hit.
The way we deal with this is to tag each object in the cache with a mini-filter which contains only those types that could have been returned by the method called. For example, the /comment mini-filter would contain all the fields on the comment and shallow_user types. When we pull something out of the cache, we can check to see if it matches by seeing if the cached mini-filter covers the relevant fields in the current request’s filter; and if so, use the cached data to avoid a database query.
One clever hack on top of this approach lets us service requests for filters that we've never actually seen before. When we have a cache hit for a given type+id pair but the mini-filter doesn't cover the current request, we run the full request (database hit and all) and then merge the cached object with the returned one and place it back in the cache. I've taken to calling this "merge and return to cache" process widening an object in cache.
Imagine: request A comes in asking for half the question fields, request B then comes in asking for the other half, and request C comes in asking for all the fields on a question. When A is processed there's nothing in the cache, so we run the query and place half of a question object in cache. When B is processed, we find the cached result of A, but it doesn't have the fields needed to satisfy B; so we run the query and widen the cached A with the new B. When C is processed, we find the cached union of A and B and voilà, we can satisfy C without hitting the database.
One subtlety is that you have to make sure a widened object doesn't remain in cache forever. It's all too easy for an object to gain a handful of fields over many subsequent queries, resetting its expiration each time and causing you to serve exceptionally stale data. The exact solution depends on your caching infrastructure; we just add another tag to the object with its maximum expiration time, and anything we pull out of the cache that's past due to be expired is ignored.
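Here's a rough sketch of the covering check and the widening step (C#; the cache entry and fetch delegate are hypothetical stand-ins for our real infrastructure):

using System;
using System.Collections.Generic;
using System.Linq;

public class CachedEntry
{
    public Dictionary<string, object> Fields = new();    // field name -> value
    public HashSet<string> CoveredFields = new();         // the "mini-filter" tag
    public DateTime HardExpiration;                        // never extended by widening
}

public static class WideningCache
{
    public static CachedEntry GetOrWiden(
        CachedEntry cached,
        IReadOnlyCollection<string> requestedFields,
        Func<IReadOnlyCollection<string>, Dictionary<string, object>> fetchFromDatabase)
    {
        bool expired = cached == null || DateTime.UtcNow >= cached.HardExpiration;
        if (!expired && requestedFields.All(cached.CoveredFields.Contains))
        {
            return cached;  // the mini-filter covers the request: no database hit
        }

        // Otherwise fetch the requested fields and merge ("widen") them into the cached object.
        var fetched = fetchFromDatabase(requestedFields);
        var entry = expired
            ? new CachedEntry { HardExpiration = DateTime.UtcNow.AddMinutes(5) } // illustrative TTL
            : cached;

        foreach (var kv in fetched)
        {
            entry.Fields[kv.Key] = kv.Value;
        }
        entry.CoveredFields.UnionWith(requestedFields);

        return entry;
    }
}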
Tooling
We attacked the problem of simplifying filter usage in two ways: providing a tool to generate filters, and enabling rapid prototyping with a human-friendly way to bypass filter generation.
We spent a lot of time getting the GUI for filter editing up to snuff in our API console (pictured: the /questions console). With just that console you can relatively easily generate a new filter, or use an existing one as a starting point. For our internal development practically all filters have ended up being created via this UI (which is backed by calls to /filter/create); dogfooding has led me to be pretty satisfied with the result.
For those developers who aren't using the API console when prototyping, we allow filters to be specified with the "include", "exclude", and "base" parameters (the format being the same as calls to /filter/create). The idea here is that if you just want a quick query for, say, total questions, you probably don't want to go through the trouble of generating a filter; instead, just call /questions?include=.total&base=none&site=stackoverflow. However, we don't want such queries to make their way into released applications (they're absurdly wasteful of bandwidth, for one), so we need a way to disincentivize them outside of ad-hoc queries. We do this by making them available only when a query doesn't pass an application key, and since increased quotas are linked to passing application keys, we expect the majority of applications to use filters correctly.
Stack Exchange API V2.0: Filters
Posted: 2012/01/06 | Filed under: pontification | Tags: apiv2

We're underway with the 2.0 version of the Stack Exchange API, so there's no time like the present to get my thoughts on it written down. This is the first in a series of nine posts about the additions, changes, and ideas in and around our latest API revision. I'm sharing details because I think they're interesting, but not with the expectation that everything I talk about will be generally applicable to other API designers.
First up, the addition of filters.
Mechanics
Filters take the form of an opaque string passed to a method, specifying which fields you want returned. For example, passing "!A7x.GE1T" to /sites returns only the names of the sites in the Stack Exchange network (this is a simplification; more details when we get to implementation). This is similar to, but considerably terser than, partial returns via the "fields" parameter as implemented by Facebook and Google (note that we do allow something similar for key-less requests via the "include" and "exclude" parameters).
You can think of filters as redacting returned fields. Every method has some set of fields that can be returned, and a filter specifies which of those fields shouldn’t be. If you’re more comfortable thinking in SQL, filters specify the selected columns (the reality is a bit more complicated).
Filters are created by passing the fields to include, those to exclude, and a base filter to the /filter/create method. They’re immutable and never expire, making it possible (and recommended) to generate them once and then bake them into applications for distribution.
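In practice that means you generate the filter once, at development time, and ship the resulting string. A hedged consumer-side sketch (C#; the route, parameter format, and response shape here follow the description above, so check the live documentation before relying on them):

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class CreateFilterOnce
{
    static async Task Main()
    {
        var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip };
        using var http = new HttpClient(handler);

        // Ask for only the fields we care about, starting from an empty base filter.
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["include"] = "question.title;question.link", // illustrative field list
            ["base"] = "none",
        });

        var response = await http.PostAsync("https://api.stackexchange.com/2.0/filters/create", form);
        var json = await response.Content.ReadAsStringAsync();

        using var doc = JsonDocument.Parse(json);
        var filter = doc.RootElement.GetProperty("items")[0].GetProperty("filter").GetString();

        // The returned string is opaque and never expires; bake it into the shipped application.
        Console.WriteLine($"Generated filter: {filter}");
    }
}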
Motivations
There are two big motivations, and a couple of minor ones, for introducing filters.
Performance
The biggest one was improved performance in general, and allowing developers to tweak API performance in particular. In the previous versions of the Stack Exchange API you generally fetched everything about an object, even if you only cared about a few properties. There were ways to exclude comments and answers (which were egregiously expensive in some cases), but that was it.
For example, imagine all you cared about were the users most recently active in a given tag (let's say C#). In both V1.1 and V2.0 the easiest way to query would be to use the /questions route with the tagged parameter. In V1.1 you can exclude body, answers, and comments, but you're still paying for close checks, vote totals, view counts, etc. In V2.0 you can get just the users, letting you avoid several joins and a few queries. The adage "the fastest query is one you never execute" holds, as always.
Bandwidth
Related to performance, some of our returns can be surprisingly large. Consider /users, which doesn't return bodies (as /questions, /answers, and so on do) but does return an about_me field. These fields can be very large (at time of writing, the largest about_me fields are around 4k), and when multiplied by the max page size we're talking about wasting hundreds of kilobytes.
Even in the worst cases this is pretty small potatoes for a PC, but for mobile devices both the wasted data (which can be shockingly expensive) and the latency of fetching those wasted bytes can be a big problem. In V1.1 the only options we had were per-field true/false parameters (the aforementioned answers and comments), which quickly become unwieldy. Filters in V2.0 let us handle this byte shaving in a generic way.
Saner Defaults
In V1.1, any data we didn't return by default required either a new method or a new parameter to get at, which made us err on the side of "return it by default". Filters let us be much more conservative in what our default returns are.
A glance at the user type reveals a full six fields we'd have paid the cost of returning under the V1.1 regime. Filters also provide a convenient place to hang properties that are app-wide (at least most of the time), such as "safety" (which is a discussion for another time).
Interest Indications
Filters give us great insight into what fields are actually of interest to API consumers. Looking into usage gives us some indicators on where to focus our efforts, both in terms of optimization and new methods to add. Historically my intuition about how people use our API has been pretty poor, so having more signals to feed back into future development is a definite nice-to-have.
While not the sexiest feature (that's probably authentication), filters are probably my favorite new feature. They're a fairly simple idea that solves a lot of common, general problems. My next post (ed: available here) will deal with some of the implementation details of our filter system.