Stack Exchange API V2.0: Throttling

Here’s a post I lost while writing the rest of the Stack Exchange API V2.0 series, it’s been restored (ie. I hit the publish button) in glorious TechniColor.

Every API has some limit in the number of requests it can serve.  Practically, your servers will eventually fall down under load.  Pragmatically, you’ll want to limit access to keep your servers from falling down under load.  Rationing access to an API is a tricky business; you have to balance technical, business, and philosophical concerns.

History

In early versions of the Stack Exchange API we had two distinct throttles, a “simultaneous requests” throttle that would temporarily ban any IP making more than 30 requests per second and a “daily quota” throttle that would drop any requests over a certain threshold (typically 10,000) per day.

In V2.0, the simultaneous request throttle remains and unauthenticated applications (those making requests without access tokens) are subject to the same IP base daily quota that was in V1.x.  We’ve also added two new classes of throttles for authenticated applications, a per-app-user-pair quota and a per-user quota.

Function

When an authenticated app makes a request it is counted against a per-user-app-pair daily quota (notably not dependent upon IP address) and a per-user daily quota.  The user-app pair numbers are returned on each request (in fields on the common wrapper object), but the per-user daily value is hidden.

The authenticated application cases are a bit complicated, so here’s a concrete example:

A user is actively using 6 applications, all of which are making authenticated requests.  The user’s daily quota is 50,000 and each application has a daily quota of 10,000 (these numbers are typical).  Every time an application makes a request it receives its remaining quota with the response, and this number should be decremented by one on each request; if application #1 makes its 600th request it will receive “9,400” back, if application #2 simultaneously made its 9,000th request it will get “1,000” back.  If an application exceeds its personal quota (with its 10,001st request) its request will begin being denied, but every other application will continue to make requests as normal. However, if the sum of the requests made by each application exceeds the user’s quota of 50,000 then all requests will be denied.

There are good reasons to throttle.

Rationale

The simultaneous request throttle is purely technically driven, it helps protect the network from DOS attacks.  This throttle is enforced much higher in the stack than the others (affected requests never even hit our web tier), and is also much “ruder”.  In a perfect world we’d return well formed errors (as we do elsewhere in the API) when this is encountered, but practically speaking it’s much simpler to just let HAProxy respond however it pleases.

Our daily quotas are more tinted philosophically.  We still need to limit access to a scarce resource (our servers’ CPU time) and we don’t want any one application to monopolize the API to the detriment of others (or direct users of the Stack Exchange sites).  A simple quota system addresses these concerns sufficiently in my opinion.  It’s also “fair” since all you have to do is register on Stack Apps, after which you have access to all of our CC-wiki’d data.  The quota of 10,000 was chosen because it allows an app to request (100 at a time) one million questions a day via the API; it’s a nice round number, and is also fairly “absurd” giving us some confidence that most applications can function within the quota.

The per-app-user-pair quota starts pulling some business sense into the discussion.  In essence, IP based throttles fall apart in the face of computing as a service (Google App Engine, Azure, Amazon Web Services, and others).  While we were aware of this flaw in v1.0, we had no recourse at the time (lacking the resources to add authentication to the API and still ship in a timely manner).  By excluding services like App Engine from our API consumers we were hamstringing them, which made improving our throttling story a high ticket item for API v2.0.  As a rule of thumb, if a feature makes life easier on the consumers of your API you should seriously investigate it; after-all an API is not in and of itself interesting, the applications built on top of it are.

With the per-app-user-pair quota alone we still have to worry about DOS attacks, thus the per-user quota.  One wrinkle is in not reporting the state of the quota to applications.  I considered that, but decided that it was something of a privacy leak and not of interest to most applications; after all, there’s no way for one application to control another.


Stack Exchange API V2.0: The Stable Future

This is the last of my planned series about v2.0 of the Stack Exchange API (the contest is still going on, and we froze the interface a few weeks ago), and I’d like to talk about some regrets.  Not another mistakes post (API V2.0 is still too young to judge with the benefit of hindsight), but something in a similar vein.

This is thankfully a pretty short post, as I’ve worked pretty hard to reduce the regrettable things in every iteration of our API.

The need for breaking changes

I definitely tried to break as little as possible, compare the question object in V1.1 to that in V2.0 to see how little changed.  However, we ate two big breaks: restructuring the common response wrapper, and moving the API from per-site urls (api.stackoverflow.com, api.serverfault.com, etc.) to a single shared domain (api.stackexchange.com).

These were definitely needed, and I’m certain benefits they give us will quickly pay off the transition pain; but still, breaking changes impose work on consumers.  When building an API it’s important to never make breaking changes wantonly, you’re making non-trivial imposition on other developer’s time.

We couldn’t do everything

I’ve already got a list of things I want in V2.1, unfortunately some of them were things I’d like to have gotten into V2.0.  A truism of development is that there’s always more to be done, real developers ship, and so on.  API development is no different, but I’m hoping to see an increase in new features and a decrease in time between API revisions as Stack Exchange matures.

We should be much stabler now

Naturally, since I’m aware of these problems we’ve taken steps to make them less of an issue.

I’m very satisfied with the current request wrapper, and the filter feature (plus our backwards compatibility tooling around it) makes it much easier to make additions to it without breaking existing consumers.  In general, filters allow for lots of forward compatibility guarantees.

We’ll never be able to do everything, but all sorts of internal gotchas that made V1.1 really rough to roll out have been fixed as part of V2.0.  Subsequent minor revisions to V2.0 shouldn’t be nearly as rough, and with our growing staff developer resources should be comparatively plentiful.


Stack Exchange API V2.0: Consistency

If you take a look at the changelog for Stack Exchange API V2.0 (go check it out, you could win a prize), you’ll see we did more than just add new methods.  We also made a considerable number changes which taken as a whole aim to improve the consistency of the API’s interface.

Broadly speaking good APIs are consistent APIs.  Every time your API surprises a developer, or forces them to re-read the documentation, you’ve lost a small but important battle.  As I’ve said before, an API isn’t interesting in and of itself the applications built on top of it are; and accordingly any time wasted wrestling with the rough edges of your API is time not spent making those applications even more interesting.

Addressing Past Mistakes

This is an opinion I held during V1.0 as well, but we’re hardly perfect and some oddities slipped through.  Fields were inconsistently HTML encoded/escaped, question bodies and titles needing completely different treatment by an application.  There were ways to exclude some, but not all, expensive to fetch fields on some, but not many, methods.  There were odd naming patterns, for example /questions/{ids} expected question ids, and /comments/{ids} expected comment ids, but /revisions/{ids} expected post ids.  Return types could be unexpected, /questions returned questions as did /questions/{ids} but while /badges returned badges /badges/{ids} returned users (those who had earned the badges in question).

We’ve addressed all of these faults, and more, in V2.0.  Safety addresses the encoding issues, while Filters address the excluding of fields.  /revisions/{ids} was renamed to /posts/{ids}/revisions, while another odd method /revisions/{post ids}/{revision id} became the new /revisions/{revision ids}.  And all methods that exist in /method, /method/{ids} pairs return the same types.

There are some ways we’re perhaps… oddly consistent, that I feel merit mention.  All returns use the same “common wrapper object”, even errors; this lets developers use common de-serialization code paths for all returns.  Related, all methods return arrays of results even if the can reasonably be expected to always return a single item (such as /info); as with the common wrapper, this allows for sharing de-serialization code.  We return quota fields as data in the response body (other APIs, like Twitter, often place them in headers), so developers can handle them in the same manner as they would any other numeric data.

Identifying this game would make an interesting "age check" form.

Pitfalls

I would caution against what I’ve taken to calling “false consistency”, where an API either implies a pattern that doesn’t exist or sacrifices more important features in the name of consistency.

By way of example: the Stack Exchange API has a number of optional sorts on many methods, many of which are held in common (a great many methods have an activity sort, for instance).  The crux of the consistency objection to these sorts is that they often times apply to the same field (creation sorting applying to question, answer, and comment creation_date fields for example) but are named differently from the fields, some argue they should have the same name.

However, this would be a false consistency.  Consider if we just changed the names of the existing sorts, we would then be implying that all fields can be sorted by which isn’t true.  If did make it possible, we’d have sacrificed performance (and probably some of our SQL servers) in pursuit of consistency.  In this case the cure would be worse than the disease, though I would argue our sorts are consistent with regards to each other if not with regards to the fields they operate on.


Stack Exchange API V2.0: Safety

Every method in version 2.0 of the Stack Exchange API (no longer in beta, but an app contest continues) can return either “safe” or “unsafe” results.  This is a rather odd concept that deserves some explanation.

Semantics

Safe results are those for which every field on every object returned can be inlined into HTML without concern for script injections or other “malicious” content.  Unsafe results are those for which this is not the case, however that is not to imply that all fields are capable of containing malicious content in an unsafe return.

An application indicates where it wants safe or unsafe results via the filter it passes along with the request.  Since most requests should have some filter passed, it seemed reasonable to bundle safety into them.

To the best of my knowledge, this is novel in an API.  It’s not exactly something that we set out to add however, there’s a strong historical component to its inclusion.

Rationale

In version 1.0 of the Stack Exchange API every result is what we would call unsafe in version 2.0.  We were simply returning the data stored in our database, without much concern for encoding or escaping.  This lead to cases where, for example, question bodies were safe to inline (and in fact, escaping them would be an error) but question titles required encoding lest an application open itself to script injections.  This behavior caused difficulties for a number of third-party developers, and bit us internally a few times as well; which is why I resolved to do something to address it in version 2.0.

I feel that a great strength in any API is consistency, and thus the problem was not that you had to encode data, it was that you didn’t always have to.  We had two options, we could either make all fields “high fidelity” by returning the most original data we had (ie. exactly what user’s had entered, before tag balancing, entity encoding, and so on) or we could make all fields “safe” by make sure the same sanitization code ran against them regardless of how they were stored in the database.

Unfortunately, it was not to be.

Strictly speaking, I would have preferred to go with the “high fidelity” option but another consideration forced the “safe” one.  The vast majority of the content in the Stack Exchange network is submitted in markdown, which isn’t really standardized (and we’ve added a number of our own extensions).  Returning the highest fidelity data would be making the assumption that all of our consumers could render our particular markdown variant.  While we have open sourced most of our markdown implementation, even with that we’d be restricting potential consumers to just those built on .NET.  Thus the “high fidelity” option wasn’t just difficult to pursue, it was effectively impossible.

Given this train of thought, I was originally going to just make all returns “safe” period.  I realize in hindsight that this would have been a pretty grave mistake (thankfully, some of the developers at Stack Exchange, and some from the community, talked me out of it).  I think the parallel with C#’s unsafe keyword is a good one, sometimes you need dangerous things and you put a big “I know what I’m doing” signal right there when you do.  This parallel ultimately granted the options their names; properly escaped and inline-able returns are “safe”, and those that are pulled straight out of the database are “unsafe”.

Adding a notion of “safety” allows us to be very consistent in how data we return should be treated.  It also allows developers to ignore encoding issues if they so desire, at least in the web app case.  Of course, if they’re willing to handle encoding themselves they also have the option.


Stack Exchange API V2.0: No Write Access

Version 2.0 of the Stack Exchange API (still in beta, with an app contest) introduced authentication, but unlike most other APIs did not introduced write access at the same time.  Write access is one of the most common feature requests we get for the API, so I feel it’s important to explain that this was very much a conscious decision, not one driven solely by the scarcity of development resources.

Why?

The “secret sauce” of the Stack Exchange network is the quality of the content found on the sites.  When compared to competing platforms you’re much more likely to find quality existing answers to your questions, to find interesting and coherent questions to answer, and to quickly get good answers to new questions you post.  Every change we make or feature we add is considered in terms of either preserving or improving quality, and the Stack Exchange API has been no different.  When viewed under with lens of quality, a few observations can be made about the API.

Screw up authentication, and you screw up write access

Write access presupposes authentication, and any flaw in authentication is going to negatively impact the use of write access accordingly.  Authentication, being a new 2.0 feature, is essentially untested, unverified, and above all untrusted in the current Stack Exchange API.  By pushing authentication out in its own release we’re starting to build some confidence in our implementation, allowing us to be less nervous about write access in the future.

Screw up write access, and you screw up quality

The worst possible outcome of adding write access to the Stack Exchange API is a flood of terrible questions and answers on our sites.  As such the design of our API will need to actively discourage such an outcome, we can’t get away with a simple “POST /questions/create”.  However, a number of other APIs can get away with very simple write methods and it’s important to understand how they differ from the Stack Exchange model and as such need not be as concerned (beyond not having nearly our quality in the first place).

The biggest observation is that every Stack Exchange site is, in a sense, a single “well” in danger of being poisoned.  Every Stack Exchange site user sees the same Newest Questions page, the same User page, and (with the exception of Stack Overflow) the same Home page.  Compare with social APIs (ie. Facebook and Twitter, where everything is sharded around the user), or service APIs (like Twilio, which doesn’t really have common content to show users); there are lots of “wells”, none of which are crucial to protect.

Write access requires more than just writing

It’s easy to forget just how much a Stack Exchange site does to encourage proper question asking behavior in users.

I've circled the parts of the Ask page that don't offer the user guidance

A write API for Stack Exchange will need to make all of this guidance available for developers to integrate into their applications, as well as find ways to incentivize that integration.  We also have several automated quality checks against new posts, and a plethora of other rejection causes; all of which need to be conscientiously reported by the API (without creation oracles for bypassing those checks).

Ultimately the combination of: wanting authentication to be independently battle tested, the need to really focus on getting write access correct, and the scarcity of development resources caused by other work also slated for V2.0 caused write access to be deferred to a subsequent release.


Stack Exchange API V2.0: JS Auth Library

In a previous article I discussed why we went with OAuth 2.0 for authentication in V2.0 of the Stack Exchange API (beta and contest currently underway), and very shortly after we released a simple javascript library to automate the whole affair (currently also considered “in beta”, report any issues on Stack Apps).  The motivations for creating this are, I feel, non-obvious as is why it’s built the way it is.

Motivations

I’m a strong believer in simple APIs.  Every time a developer using your API has to struggle with a concept or move outside their comfort zone, your design has failed in some small way. When you look at the Stack Exchange API V2.0, the standout “weird” thing is authentication.  Every other function in the system is a simple GET (well, there is one POST with /filters/create), has no notion of state, returns JSON, and so on.  OAuth 2.0 requires user redirects, obviously has some notion of state, has different flows, and is passing data around on query strings or in hashes. It follows that, in pursuit of overall simplicity, it’s worthwhile to focus on simplifying consumers using our authentication flows.  The question then becomes “what can we do to simplify authentication?”, with an eye towards doing as much good as possible with our limited resources. The rationale for a javascript library is that:

  • web applications are prevalent, popular, and all use javascript
  • we lack expertise in the other (smaller) comparable platforms (Android and iOS, basically)
  • web development makes it very easy to push bug fixes to all consumers (high future bang for buck)
  • other APIs offer hosted javascript libraries (Facebook, Twitter, Twilio, etc.)

Considerations

The first thing that had to be decided was the scope of the library, as although the primary driver for the library was the complexity of authentication that did not necessarily mean that’s all the library should offer. Ultimately, all it did cover is authentication, for reasons of both time and avoidance of a chilling affect.  Essentially, scoping the library to just authentication gave us the biggest bang for our buck while alleviating most fears that we’d discourage the development of competing javascript libraries for our API.  It is, after all, in Stack Exchange’s best interest for their to be a healthy development community around our API. I also decided that it was absolutely crucial that our library be as small as possible, and quickly served up.  Negatively affecting page load is unacceptable in a javascript library, basically.  In fact, concerns about page load times are why the Stack Exchange sites themselves do not use Facebook or Twitter provided javascript for their share buttons (and also why there is, at time of writing, no Google Plus share option).  It would be hypocritical to expect other developers to not have the same concerns we do about third-party includes.

Implementation

Warning, lots of this follows.

Since it’s been a while since there’s been any code in this discussion, I’m going to go over the current version (which reports as 453) and explain the interesting bits.  The source is here, though I caution that a great many things in it are implementation details that should not be depended upon.  In particular, consumers should always link to our hosted version of the library (at https://api.stackexchange.com/js/2.0/all.js).

The first three lines sort of set the stage for “small as we can make it”.

window.SE = (function (navigator, document,window,encodeURIComponent,Math, undefined) {
 "use strict";
var seUrl, clientId, loginUrl, proxyUrl, fetchUserUrl, requestKey, buildNumber = '@@~~BuildNumber~~@@';

I’m passing globals as parameters to the closure defining the interface in those cases where we can expect minification to save space (there’s still some work to be done here, where I will literally be counting bytes for every reference).  We don’t actually pass an undefined to this function, which both saves space and assures nobody’s done anything goofy like giving undefined a value.  I intend to spend some time seeing if similar proofing for all passed terms is possible (document and window are already un-assignable, at least in some browsers). Note that we also declare all of our variables in batches throughout this script, to save bytes from repeating “var” keywords.

Implementation Detail: “@@~~BuildNumber~~@@” is replaced as part of our deploy.  Note that we pass it as a string everywhere, allow us to change the format of the version string in the future.  Version is provided only for bug reporting purposes, consumers should not depend on its format nor use it in any control flow.

function rand() { return Math.floor(Math.random() * 1000000); }

Probably the most boring part of the entire implementation, gives us a random number.  Smaller than inlining it everywhere where we need one, but not by a lot even after minifying references to Math.  Since we only ever use this to avoid collisions, I’ll probably end up removing it altogether in a future version to save some bytes.

function oldIE() {
  if (navigator.appName === 'Microsoft Internet Explorer') {
    var x = /MSIE ([0-9]{1,}[\.0-9]{0,})/.exec(navigator.userAgent);
    if (x) {
      return x[1] <= 8.0;
    }
  }
  return false;
}

Naturally, there’s some Internet Explorer edge case we have to deal with.  For this version of the library, it’s that IE8 has all the appearances of supporting postMessage but does not actually have a compliant implementation.  This is a fairly terse check for Internet Explorer versions <= 8.0, inspired by the Microsoft recommended version.  I suspect a smaller one could be built, and it’d be nice to remove the usage of navigator if possible.

Implementation Detail:  There is no guarantee that this library will always treat IE 8 or lower differently than other browsers, nor is there a guarantee that it will always use postMessage for communication when able.

Now we get into the SE.init function, the first method that every consumer will need to call.  You’ll notice that we accept parameters as properties on an options object; this is a future proofing consideration, as we’ll be able to add new parameters to the method without worrying (much) about breaking consumers.

You’ll also notice that I’m doing some parameter validation here:

if (!cid) { throw "`clientId` must be passed in options to init"; }
if (!proxy) { throw "`channelUrl` must be passed in options to init"; }
if (!complete) { throw "a `complete` function must be passed in options to init"; }
if (!requestKey) { throw "`key` must be passed in options to init"; }

This is something of a religious position, but I personally find it incredibly frustrating when a minified javascript library blows up because it expected a parameter that wasn’t passed.  This is inordinately difficult to diagnose given how trivial the error is (often being nothing more than a typo), so I’m checking for it in our library and thereby hopefully saving developers some time.

Implementation Detail:  The exact format of these error messages isn’t specified, in fact I suspect we’ll rework them to reduce some repetition and thereby save some bytes.  It is also not guaranteed that we will always check for required parameters (though I doubt we’ll remove it, it’s still not part of the spec) so don’t go using try-catch blocks for control flow.

This odd bit:

if (options.dev) {
 seUrl = 'https://dev.stackexchange.com';
 fetchUserUrl = 'https://dev.api.stackexchange.com/2.0/me/associated';
} else {
 seUrl = 'https://stackexchange.com';
 fetchUserUrl = 'https://api.stackexchange.com/2.0/me/associated';
}

Is for testing on our dev tier.  At some point I’ll get our build setup to strip this out from the production version, there’s a lot of wasted bytes right there.

Implementation Detail: If the above wasn’t enough, don’t even think about relying on passing dev to SE.init(); it’s going away for sure.

The last bit of note in SE.init, is the very last line:

setTimeout(function () { complete({ version: buildNumber }); }, 1);

This is a bit of future proofing as well.  Currently, we don’t actually have any heaving lifting to do in SE.init() but there very well could be some in the future.  Since we’ll never accept blocking behavior, we know that any significant additions to SE.init() will be asynchronous; and a complete function would be the obvious way to signal that SE.init() is done.

Implementation Detail:  Currently, you can get away with calling SE.authenticate() immediately, without waiting for the complete function passed to SE.init() to execute.  Don’t do this, as you may find that your code will break quite badly if our provided library starts doing more work in SE.init().

Next up is fetchUsers(), an internal method that handles fetching network_users after an authentication session should the consumer request them.  We make a JSONP request to /me/associated, since we cannot rely on the browser understanding CORS headers (which are themselves a fairly late addition to the Stack Exchange API).

Going a little out of order, here’s how we attach the script tag.

while (window[callbackName] || document.getElementById(callbackName)) {
 callbackName = 'sec' + rand();
}
window[callbackName] = callbackFunction;
src += '?access_token=' + encodeURIComponent(token);
src += '&pagesize=100';
src += '&key=' + encodeURIComponent(requestKey);
src += '&callback=' + encodeURIComponent(callbackName);
src += '&filter=!6RfQBFKB58ckl';
script = document.createElement('script');
script.type = 'text/javascript';
script.src = src;
script.id = callbackName;
document.getElementsByTagName('head')[0].appendChild(script);

The only interesting bit here is the while loop making sure we don’t pick a callback name that is already in use.  Such a collision would be catastrophically bad, and since we can’t guarantee anything about the hosting page we don’t have a choice but to check.

Implementation Detail:  JSONP is the lowest common denominator, since many browsers still in use do not support CORS.  It’s entirely possible we’ll stop using JSONP in the future, if CORS supporting browsers become practically universal.

Our callbackFunction is defined earlier as:

callbackFunction =
 function (data) {
  try {
   delete window[callbackName];
  } catch (e) {
   window[callbackName] = undefined;
  }
  script.parentNode.removeChild(script);
  if (data.error_id) {
   error({ errorName: data.error_name, errorMessage: data.error_message });
   return;
  }
  success({ accessToken: token, expirationDate: expires, networkUsers: data.items });
 };

Again, this is fairly pedestrian.  One important thing that is often overlooked when making these sorts of libraries is the cleanup of script tags and callback functions that are no longer needed.  Leaving those lingering around does nothing but negatively affect browser performance.

Implementation Detail:  The try-catch block is a workaround for older IE behaviors.  Some investigation into whether setting the callback to undefined performs acceptably for all browsers may let us shave some bytes there, and remove the block.

Finally, we get to the whole point of this library: the SE.authenticate() method.

We do the same parameter validation we do in SE.init, though there’s a special case for scope.

if (scopeOpt && Object.prototype.toString.call(scopeOpt) !== '[object Array]') { throw "`scope` must be an Array in options to authenticate"; }

Because we can’t rely on the presence of Array.isArray in all browsers, we have to fall back on this silly toString() check.

The meat of SE.authenticate() is in this block:

if (window.postMessage && !oldIE()) {
 if (window.attachEvent) {
  window.attachEvent("onmessage", handler);
 } else {
  window.addEventListener("message", handler, false);
 }
 } else {
  poll =
   function () {
    if (!opened) { return; }
    if (opened.closed) {
     clearInterval(pollHandle);
     return;
    }
    var msgFrame = opened.frames['se-api-frame'];
    if (msgFrame) {
     clearInterval(pollHandle);
     handler({ origin: seUrl, source: opened, data: msgFrame.location.hash });
    }
 };
 pollHandle = setInterval(poll, 50);
}
opened = window.open(url, "_blank", "width=660, height=480");

In a nutshell, if a browser supports (and properly implements, unlike IE8) postMessage we use that for cross-domain communication other we use the old iframe trick.  The iframe approach here isn’t the most elegant (polling isn’t strictly required) but it’s simpler.

Notice that if we end up using the iframe approach, I’m wrapping the results up in an object that quacks enough like a postMessage event to make use of the same handler function.  This is easier to maintain, and saves some space through code reuse.

Implementation Detail:  Hoy boy, where to start.  First, the usage of postMessage or iframes shouldn’t be relied upon.  Nor should the format of those messages sent.  The observant will notice that stackexchange.com detects that this library is in use, and only create an iframe named “se-api-frame” when it is; this behavior shouldn’t be relied upon.  There’s quite a lot in this method that should be treated as a black box; note that the communication calisthenics this library is doing isn’t necessary if you’re hosting your javascript under your own domain (as is expected of other, more fully featured, libraries like those found on Stack Apps).

Here’s the handler function:

handler =
 function (e) {
 if (e.origin !== seUrl || e.source !== opened) { return; }
 var i,
 pieces,
 parts = e.data.substring(1).split('&'),
 map = {};
 for (i = 0; i < parts.length; i++) {
  pieces = parts[i].split('=');
  map[pieces[0]] = pieces[1];
 }
 if (+map.state !== state) {
  return;
 }
 if (window.detachEvent) {
  window.detachEvent("onmessage", handler);
 } else {
 window.removeEventListener("message", handler, false);
 }
opened.close();
if (map.access_token) {
 mapSuccess(map.access_token, map.expires);
 return;
}
error({ errorName: map.error, errorMessage: map.error_description });
};

You’ll notice that we’re religious about checking the message for authenticity (origin, source, and state checks).  This is very important as it helps prevent malicious scripts from using our script as a vector into a consumer; security is worth throwing bytes at.

Again we’re also conscientious about cleaning up, making sure to unregister our event listener, for the same performance reasons.

I’m using a mapSuccess function to handle the conversion of the response and invokation of success (and optionally calling fetchUsers()).  This is probably wasting some space and will get refactored sometime in the future.

I’m passing expirationDate to success as a Date because of a mismatch between the Stack Exchange API (which talks in “seconds since the unix epoch”) and javascript (which while it has a dedicated Date type, thinks in “milliseconds since unix epoch”).  They’re just similar enough to be confusing, so I figured it was best to pass the data in an unambiguous type.

Implementation Detail:  The manner in which we’re currently calculating expirationDate can over-estimate how long the access token is good for.  This is legal, because the expiration date of an access token technically just specifies a date by which the access token is guaranteed to be unusable (consider what happens to an access token for an application a user removes immediately after authenticating to).

Currently we’ve managed to squeeze this whole affair down into a little less than 3K worth of minified code, which gets down under 2K after compression.  Considering caching (and our CDN) I’m pretty happy with the state of the library, though I have some hope that I can get us down close to 1K after compression.

[ed. As of version 568, the Stack Exchange Javascript SDK is down to 1.77K compressed, 2.43K uncompressed.]


Stack Exchange API V2.0: Authentication

The most obvious addition to the 2.0 version of the Stack Exchange API (beta and contest currently under way) is authentication (authorization technically but the distinction isn’t pertinent to this discussion) in the form of OAuth 2.0.  With this new feature, it’s now possible for a user to demonstrate to a third party who they are on any site in the Stack Exchange network.

Why OAuth 2.0?

OAuth 2.0 was a pretty easy choice.  For one, there aren’t that many well known authentication protocols out there.  OAuth 1.0a, OAuth 2.0, OpenID (sort of), … and that’s about it.

Though we’re quite familiar with OpenID, all it does is demonstrate who you are to a consumer; there’s no token for subsequent privileged requests.  Furthermore, OpenID is… tricky to consume.  At Stack Exchange we make use of the excellent dotNetOpenAuth, but when you’re providing an API you can’t assume all clients have an “easy out” in the form of a library; simplicity is king.

OAuth 1.0a is something of a disaster, in my professional opinion.  It certainly works, but it’s very complicated with numerous flows that frankly most applications don’t need.  It also forces developers to deal with the nitty gritty details of implementing signatures, and all sorts of encoding headaches (remember, an API cannot assume that developers can hand that work off to a library).  If we had no other options then we’d go with OAuth 1.0a, but thankfully that’s not the case.

OAuth 2.0’s main strength is consumer simplicity.  A simple redirect, POST (or just a redirect, depending on flow), and then out pops a token for privileged requests.  It does impose a little bit of additional complexity on our end, as HTTPS is mandated by the standard but extra complexity on our end is fine (as opposed to on the consumer’s end).  OAuth 2.0 is fairly new, and as with OAuth 1.0a a “conforming implementation” is a matter of debate so it’s not all roses; but it’s the best of the bunch.

Implementation Details

I explicitly modeled our implementation of OAuth 2.0 on Facebook’s, under the assumption that in the face of any ambiguities in the spec it’d be best if we went along with a larger provider’s interpretation.  Not to imply that Facebook is automatically correct, but that following it is best option for those developing against our API; anyone wildly Googling for OAuth 2.0 is likely to find details on Facebook’s implementation, and it’d be best if they were also true for Stack Exchange’s.

For example the OAuth 2.0 spec calls for scopes to be space delimited while Facebook requires commas, at Stack Exchange we accept both.  The spec also (for some bizarre reason) leaves the success and error responses when exchanging auth codes for access tokens in the explicit flow up to the implementation, in both cases we mimic Facebook’s implementation.

The Stack Exchange user model introduces some complications as well.  On Stack Exchange you conceptually have 1 account (with any number of credentials) and many users (1 on each different site, potentially), you can be logged in as any number of users (with a cookie on each site) but aren’t really logged in at an account level.  Since we didn’t want users to have to authenticate to each site they’re active on (with 70+ sites in the network, this would be incredibly unfriendly for power users) we needed to chose a single site that would serve as a “master site” and mediate logins required during OAuth 2.0 flows; we ended up choosing stackexchange.com to fill this role.

The Elephant In The Room

Pictured: The Stack Exchange conference room

By now, some of you are thinking “what’s the point, it’s trivial to compromise credentials in an OAuth flow”.  Simple phishing for username/password in an app, more complicated script injection schemes, pulling cookies out of hosted browser instances, etc.  There are some arguments against it from a UX standpoint as well.

Honestly, I agree with a good deal of these arguments.  In a lot of cases OAuth really is a lot weaker than it’s been portrayed, though I would argue that it helps protect honest developers from themselves.  You can’t make any silly mistakes around password storage if you never have an opportunity to store a password, for example.

However, these arguments aren’t really pertinent in Stack Exchange’s case because we’ve already settled on OpenID for login.  This means that even if we wanted to support something akin to xAuth we couldn’t, a user’s username/password combo is useless to us.  So we’re stuck with something that depends on a browser short of pulling off of OpenID altogether (which is almost certainly never going to happen, for reasons I hope are obvious).