Disabling Third-Party Cookies Doesn’t (Meaningfully) Improve Privacy

Posted: 2011/09/26 | Author: kevinmontrose | Filed under: pontification | 5 Comments

Cookies aren't just for the dark side.

I noticed in some discussion on Hacker News about Google Chrome an argument that disabling third-party cookies somehow improved privacy. I don’t intend to comment on the rest of the debate, but this particular assertion is troubling.

At time of writing, only two browsers interfere with third-party cookies in any meaningful way. Internet Explorer denies setting third-party cookies unless a P3P header is sent. This is basically an evil bit, and just as pointless. No other browser even pretends to care about this standard.

The other is Apple’s Safari browser, which denies setting third-party cookies unless a user has “interacted” with the framed content. The definition of “interacted” is a bit fuzzy, but clicking seems to do it. No other browser does this, or anything like it. There are some laughably simple hacks around this, like floating an iframe under the user’s cursor (and, for some reason, submitting a form with a POST method). Even if those hacks didn’t exist, the idea is still pointless.

The reason I know about these rules is that we had to work around them when implementing auto-logins at Stack Exchange (there was an earlier version that straight up did not work for Safari due to reliance on third-party cookies). This also came up when implementing the Stack Exchange OpenID Provider, as we frame log in and account creation forms on our login page.

For auto-logins, I ended up using a combination of localStorage and postMessage that works on all modern browsers (since it’s not core functionality we were willing to throw IE7 under a bus at the time, and now that IE9 is out we don’t support IE7 at all). StackID tries some workarounds for Safari, and upon failure displays an error message providing some guidance.

These methods are somewhat less nefarious than this, but just slightly.

The joke is that there are alternatives that work just fine

ETags have gotten a lot of press, the gist being that you re-purpose a caching mechanism for tracking (similar tricks are possible with the Last-Modified header). This is a fundamental problem with any cache expiration scheme that isn’t strictly time based, as a user will always have to present some (potentially identifying) token to a server to see if their cache is still valid.

Panopticlick attacks the problem statistically, using the fact that any given browser is pretty distinctive in terms of headers, plugins, and so on independent of any cookies or cache directives. My install of Chrome in incognito mode provides ~20 bits of identifying information, which if indicative of the population at large implies a collision about every 1,200 users. In practice, most of these strings are globally unique so coupled with IP based geo-location it is more than sufficient for tracking if you’re only concerned with a small percentage of everyone on Earth. Peter Eckersley’s paper on the subject also presents a rudimentary algorithm for following changing fingerprints (section 5.2), so you don’t even have to worry about increased instability when compared to third-party cookies.

You can get increasingly nefarious with things like “image cookies,” where you a create a unique image and direct a browser to cache it forever. You then read the colors out via HTML5’s Canvas, and you’ve got a string that uniquely identifies a browser. This bypasses any same origin policy (like those applied to cookies and localStorage) since all browsers will just pull the image out of cache regardless of which domain the script is executing under. I believe this technique was pioneered by Evercookie, but there may be some older work I’m not aware of.

If you’ve been paying attention, you’ll notice that none of these techniques are exactly cutting edge. They’re still effective due in large part to the fact that closing all of these avenues would basically break the internet.

They aren't the most friendly of UIs, but they exist.

Why do we stick to cookies and localStorage?

The short of it is that we over at Stack Exchange are “Good Guys™,” and as such we don’t want to resort to such grey (or outright black) hat techniques even if we’re not using them nefariously. I hope the irony of doing the “right thing” being more trouble than the alternative isn’t lost on anyone reading this.

More practically, after 15 years of popular internet usage normal people actually kind-of-sort-of get cookies. Not in any great technical sense, but in the “clear them when I use a computer at the library” sense. Every significant browser also has a UI for managing them, and a way to wipe them all out. It’s for this reason that our OpenID provider only uses cookies, since it’s more important that it be practically secure-able than usable; at least when compared to the Stack Exchange sites themselves.

For global login, localStorage is acceptable since clearing it is somewhat less important. You can only login to existing accounts, only on our network, and on that network there are significant hurdles preventing really nefarious behavior (you cannot permanently destroy your account, or your content in most cases).

This reference predates Internet Explorer's cookie support.

What good does Safari’s third-party cookie behavior do?

Depending on how cynical you are, one of: nothing, mildly inconveniencing unscrupulous ad networks, or childishly spiting Google. I’m in the “nothing” category as there’s too much money to be had to believe it deters the seedier elements of the internet, and the notion that Apple would try to undermine a competitor’s revenue stream this way is too conspiracy theory-ish for me to take seriously.

I can believe someone at Apple thinks it helps privacy, but in practice it clearly doesn’t. At best, it keeps honest developers honest (not that they needed any prompting for this) and at worst it makes it even harder for user’s to avoid tracking as more and more developers resort to the more nefarious (but more reliable!) alternatives to third-party cookies.

There may be legitimate complaints about browser’s default behavior with regards to privacy, but having third-party cookies enabled by default isn’t one of them.

History Of The Stack Exchange API, Version 1.1

Posted: 2011/09/06 | Author: kevinmontrose | Filed under: pontification | Comments Off

In February we rolled out version 1.1 of the Stack Exchange API. This version introduced 18 new methods, a new documentation system, and an application gallery.

Developing this release was decidedly different than developing version 1.0. We were much more pressed for time as suggested edits (one of our bigger changes to the basic site experience) were being developed at basically the same time. Total development time on 1.1 amounted to approximately one month, as compared to three for 1.0.

The time constraint meant that our next API release would be a point release (in that we wouldn’t be able to re-implement much), which also meant we were mostly constrained by what had gone before. Version 1.0 had laid down some basic expectations: vectorized requests, a consistent “meta object” wrapper, JSON returns, and so on. This was a help, since a lot of the work behind an API release is in deciding these really basic things. It also hurt some though, since we couldn’t address any of the mistakes that had become apparent.

How we decided what to add in 1.1

There’s one big cheat available to Stack Exchange here; we’ve got a user base chock full of developers requesting features. This is not to suggest that all requests have been good ones, but they certainly helps prevent group-think in the development team.

More generally, I approached each potential feature with this checklist.

Has there been any expressed interest in the feature?
Is it generally useful?
Does it fit within the same model as the rest of the API?

Take everything that passes muster, order them by a combination of usefulness and difficulty of implementing (which is largely educated guess work), and take however many you think you’ve got time to implement off the top. I feel the need to stress that this is an ad hoc approach, while bits and pieces of this process were written down (in handy todo.txt files) there wasn’t a formal process or methodology built around it. No index cards, functional specs, planning poker, or what have you (I’m on record [25 minutes in or so] saying that we don’t do much methodology at Stack Exchange).

Careers's distinguishing feature is contact based, not data based.

Some examples from 1.1

Some new methods, like /questions/{ids}/linked, were the direct results of feature requests. Others, like /users/…/top-answers, came from internal requests; this one in support of Careers 2.0 (we felt it was important that most of the data backing Careers be publicly available with the introduction of passive candidates). Both methods easily pass the “expressed interest” bar.

General usefulness is fuzzier, and therefore trickier to show; it is best defined by counter-example in my opinion. Trivial violators are easy to imagine, the /jon-skeet or /users-born-in-february methods, but more subtle examples are less forthcoming. A decent example of a less than general method is one which gives access to the elements of a user’s global inbox which are public (almost every type of notification is in response to a public event, but there are a few private notifications). This would be useful only in the narrow cases where an app wants some subset of a user’s inbox data, but doesn’t want to show the inbox itself. I suspect this would be a very rare use case, based on the lack of any request for similar features on the sites themselves. It has the extra problem of being almost certain to be deprecated by a future API version that exposes the whole of an inbox in conjunction with user authentication.

One pitfall that leads to less than generally useful methods is to depend too much on using your own API (by building example apps, or consuming it internally for example) as a method of validating design. The approach is a popular one, and it’s not without merit, but you have to be careful not to write “do exactly what my app needs (but nearly no other app will)” methods. The Stack Exchange API veers a little into this territory with the /users/{ids}/timeline method which sort of assumes you’re trying to write a Stack Exchange clone, it’s not actually too specialized to be of no other use but it’s a less than ideally general.

Whether something “fits” can be a tad fuzzy as well. For instance, while there’s nothing technically preventing the /users/moderators method from returning a different type than /users (by adding, say, an elected_on_date field) I feel that would still be very wrong. A more subtle example would be a /posts method, that behaves like a union of /questions, /answers, and /comments. There’s some clear utility (like using it to get differential updates) however such a method wouldn’t “fit,” because we currently have no notion of returning a heterogeneous set of objects. There are also sharper “doesn’t fit” cases, like adding a method that returns XML (as the rest of the API returns JSON) or carrying state over between subsequent API calls (the very thought of which fills me with dread).

There was some experimentation in 1.1

In 1.1 almost everything done was quite safe, we didn’t change existing methods, we didn’t add new fields, and really there weren’t any radical changes anywhere. Well… except for two methods, /sites and /users/{id}/associated which got completely new implementations (the old ones naturally still available under /1.0).

These new versions address some of the short comings we knew about the API in general, and some problems peculiar to those methods in 1.0 (most of which stem from underestimating how many sites would be launched as part of Stack Exchange 2.0). Getting these methods, that would more properly belong in version 2.0, out early allowed us to get some feedback on the direction planned for the API. We had the fortune of having a couple well isolated methods (their implementations are completely independent of the rest of the API) that needed some work anyway on which to test our future direction; I’m not sure this is something that can reasonably be applied to other APIs.

The world of tomorrow

Version 1.1 is the current release of the Stack Exchange API, and has been for the last seven months. Aside from bug fixes, no changes have been made in that period. While work has not yet begun on version 2.0, it has been promised for this year and some internal discussion has occurred, some documents circulated, and the like. It’s really just a matter of finding the time now, which at the moment is mostly being taken up by Facebook Stack Overflow and related tasks.

Kevin Montrose