Stack Exchange API V2.0: Throttling

Here’s a post I lost while writing the rest of the Stack Exchange API V2.0 series; it’s been restored (i.e. I hit the publish button) in glorious TechniColor.

Every API has some limit on the number of requests it can serve.  Practically, your servers will eventually fall down under load.  Pragmatically, you’ll want to limit access to keep your servers from falling down under load.  Rationing access to an API is a tricky business; you have to balance technical, business, and philosophical concerns.

History

In early versions of the Stack Exchange API we had two distinct throttles: a “simultaneous requests” throttle that would temporarily ban any IP making more than 30 requests per second, and a “daily quota” throttle that would drop any requests over a certain threshold (typically 10,000) per day.

In V2.0, the simultaneous request throttle remains, and unauthenticated applications (those making requests without access tokens) are subject to the same IP-based daily quota that was in V1.x.  We’ve also added two new classes of throttles for authenticated applications: a per-app-user-pair quota and a per-user quota.

Function

When an authenticated app makes a request, it is counted against a per-app-user-pair daily quota (notably not dependent upon IP address) and a per-user daily quota.  The app-user pair numbers are returned on each request (in fields on the common wrapper object), but the per-user daily value is hidden.
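As a rough illustration of reading those numbers on the client side, here is a minimal Python sketch. The field names quota_remaining and quota_max are my understanding of what the wrapper calls them (this post doesn’t name them), and the sample response body is made up and trimmed down to the fields of interest.

```python
import json


def read_quota(wrapper_json):
    """Return (quota_remaining, quota_max) from a response wrapper.

    Assumes the wrapper exposes the fields under these names.
    """
    wrapper = json.loads(wrapper_json)
    return wrapper["quota_remaining"], wrapper["quota_max"]


# Made-up response body; a real one carries actual items and more fields.
sample = '{"items": [], "has_more": false, "quota_max": 10000, "quota_remaining": 9400}'

remaining, maximum = read_quota(sample)
print(f"{remaining} of {maximum} requests left today")
```

An application could check the remaining value after every response and slow down, or stop, as it nears zero.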

The authenticated application cases are a bit complicated, so here’s a concrete example:

A user is actively using 6 applications, all of which are making authenticated requests.  The user’s daily quota is 50,000 and each application has a daily quota of 10,000 (these numbers are typical).  Every time an application makes a request it receives its remaining quota with the response, and this number is decremented by one on each request; if application #1 makes its 600th request it will receive “9,400” back, and if application #2 simultaneously makes its 9,000th request it will get “1,000” back.  If an application exceeds its own quota (with its 10,001st request) its requests will start being denied, but every other application will continue to make requests as normal.  However, if the total number of requests made by all of the user’s applications exceeds the user’s quota of 50,000, then all requests will be denied.
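The bookkeeping in that example can be sketched in a few lines of Python. This is a toy model of the two counters, not how the API implements them: the QuotaTracker class and its method names are invented for illustration, and it assumes that denied requests are not themselves counted against either quota, which the example above doesn’t specify.

```python
from collections import defaultdict

APP_QUOTA = 10_000   # per-app-user-pair daily quota from the example
USER_QUOTA = 50_000  # per-user daily quota from the example


class QuotaTracker:
    """Toy model of the per-app-user-pair and per-user daily quotas."""

    def __init__(self, app_quota=APP_QUOTA, user_quota=USER_QUOTA):
        self.app_quota = app_quota
        self.user_quota = user_quota
        self.pair_used = defaultdict(int)  # keyed by (user, app)
        self.user_used = defaultdict(int)  # keyed by user

    def request(self, user, app):
        """Return the app's remaining quota, or None if the request is denied."""
        if self.user_used[user] >= self.user_quota:
            return None  # the user's total is spent: every app is denied
        if self.pair_used[(user, app)] >= self.app_quota:
            return None  # only this app is denied; others carry on
        # Assumption: only served requests are counted.
        self.pair_used[(user, app)] += 1
        self.user_used[user] += 1
        return self.app_quota - self.pair_used[(user, app)]


tracker = QuotaTracker()
for _ in range(600):
    remaining = tracker.request("user", "app1")
print(remaining)  # 9400, matching application #1 in the example
```

Note that six applications at 10,000 requests each would total 60,000, so with these numbers the 50,000 per-user limit can be reached before every application has used up its own quota.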

There are good reasons to throttle.

Rationale

The simultaneous request throttle is purely technically driven; it helps protect the network from DoS attacks.  This throttle is enforced much higher in the stack than the others (affected requests never even hit our web tier), and is also much “ruder”.  In a perfect world we’d return well-formed errors (as we do elsewhere in the API) when this is encountered, but practically speaking it’s much simpler to just let HAProxy respond however it pleases.
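Since this throttle doesn’t return a well-formed error, a client is better off never tripping it in the first place. One way to do that is to space requests out on the client side; the sketch below is a hypothetical helper (RequestPacer is not part of any Stack Exchange library), and its default of 20 requests per second is an arbitrary choice that leaves some headroom below the 30 per second mentioned earlier.

```python
import time


class RequestPacer:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=20, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


pacer = RequestPacer()
for _ in range(3):
    pacer.wait()
    # ... send one API request here ...
```

The clock and sleep parameters exist only so the pacing logic can be exercised without real delays. Several processes behind the same IP would each need a share of the budget, which this sketch does not handle.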

Our daily quotas are more philosophically tinted.  We still need to limit access to a scarce resource (our servers’ CPU time) and we don’t want any one application to monopolize the API to the detriment of others (or direct users of the Stack Exchange sites).  A simple quota system addresses these concerns sufficiently in my opinion.  It’s also “fair” since all you have to do is register on Stack Apps, after which you have access to all of our CC-wiki’d data.  The quota of 10,000 was chosen because it allows an app to request (100 at a time) one million questions a day via the API; it’s a nice round number, and is also fairly “absurd”, giving us some confidence that most applications can function within the quota.

The per-app-user-pair quota starts pulling some business sense into the discussion.  In essence, IP-based throttles fall apart in the face of computing as a service (Google App Engine, Azure, Amazon Web Services, and others).  While we were aware of this flaw in V1.0, we had no recourse at the time (lacking the resources to add authentication to the API and still ship in a timely manner).  By effectively excluding applications hosted on services like App Engine, we were hamstringing our API consumers, which made improving our throttling story a high-priority item for API V2.0.  As a rule of thumb, if a feature makes life easier on the consumers of your API you should seriously investigate it; after all, an API is not in and of itself interesting, the applications built on top of it are.

With the per-app-user-pair quota alone we still have to worry about DoS attacks, thus the per-user quota.  One wrinkle is in not reporting the state of the per-user quota to applications.  I considered reporting it, but decided that it was something of a privacy leak and not of interest to most applications; after all, there’s no way for one application to control another.