Adding Static Code Analysis to Stack Overflow

As of September 23rd 2019 we’re applying static analysis to some of the code behind public Stack Overflow, Stack Overflow for Teams, and Stack Overflow Enterprise in order to pre-emptively find and eliminate certain kinds of vulnerabilities. How we accomplished this is an interesting story, and also illustrative of advancements in .NET’s open source community.

But first…

What did we have before static analysis?

The Stack Overflow codebase has been under continuous development for around a decade, starting all the way back on ASP.NET MVC Preview 2. As .NET has advanced we’ve adopted tools that encourage safe practices like Razor (which defaults to encoding strings, helping protect against cross site scripting vulnerabilities). We’ve also created new tools that encourage doing things the Right Way™, like Dapper which handles parameterizing SQL automatically while still being an incredibly performant (lite-)ORM.

An incomplete, but illustrative, list of default-safe patterns in our codebase:

  • Automated SQL parameterization with Dapper
  • Default encoding strings in views with Razor
  • Requiring cross site request forgery (XSRF) tokens for non-idempotent (ie. POST, PUT, DELETE, etc.) routes by default
  • HMACs with default expirations and common validation code
  • Adopting TypeScript, an ongoing process, which increases our confidence around shipping correct JavaScript
  • Private data, for Teams and Enterprise, is on separate infrastructure with separate access controls

In essence we were safe, at least in theory, from most classes of injection and cross site scripting attacks.

So, …

What did static analysis give us?

In large part, confidence that we were consistently following our pre-established best practices. Even though our engineers are talented and our tooling is easy to use, we’ve had dozens of people working on Stack Overflow for 10+ years – inevitably some mistakes slipped into the codebase. Accordingly most fixes were just moving to doing something “the right way,” and pretty minor. Things like “use our route registration attribute, instead of [HttpPost]” or “remove old uses of SHA1, and switch to SHA256”.

The more “exciting” fixes required introducing new patterns, and updating old code to use them. While we had no evidence that any of these were exploited, or even exploitable in practice, we felt it was best to err on the side of caution and address them anyway. We added three new patterns as part of adopting static code analysis:

  1. We replaced uses of System.Random with an equivalent interface backed by System.Security.Cryptography.RandomNumberGenerator.
    1. It is very hard to prove a random number being predictable either is or isn’t safe, so we standardized on always hard to predict.
  2. We now default to forbidding HTTP redirects to domains we do not control, requiring all exceptions be explicitly documented.
    1. The concern here is open redirects, which can be used for phishing or other malicious purposes.
    2. Most of our redirects were already appropriately validating this, but the checks were scattered across the code base. There were a few missing or buggy checks, but we found no evidence of them being exploited.
  3. We strengthened XSRF checks to account for cases where users move between unauthenticated and authenticated states.
    1. Our XSRF checks previously assumed there was a single token tied to a user’s identity. Since this changes during authentication, some of our code suppressed this check and relied on other validation (completing an OAuth flow, for example).
    2. Even though all cases did have some kind of XSRF prevention, having any opt-outs of our default XSRF checking code is risky – so we decided to improve our checks to handle this case. Our fix was to allow two tokens to be acceptable, briefly, on certain routes.

Our checks run on every pull request for Stack Overflow, and additionally (and explicitly) on every Enterprise build – meaning we aren’t just confident that we’re following our best practices today but we’re confident we will keep following them in the future.

In terms of Open Web Application Security Project (OWASP) lists, we gained automatic detection of:

That wraps up what we found and fixed, but…

How did we add static code analysis?

This is boring because all we did was write a config file and add a PackageReference to SecurityCodeScan.

That’s it – Visual Studio will pick it up as an analyzer (so you get squigglies) and the C# compiler will do the same so you get warnings or errors (we treat all warnings as errors).

Not real code, ’cause by the time I thought to take a screenshot we’d already fixed everything.

Far more interesting is all the open source stuff that made this possible:

  • In 2014 Microsoft open sourced Roslyn, their C# and VB.NET compiler
  • Visual Studio 2015 ships with support of Roslyn analyzers
  • The authors of Security Code Scan start work in 2016
  • I contribute some minor changes to accommodate Stack Overflow peculiarities in 2019

If you’d told me 6 years ago that we’d be able to add any sort of code analysis to the Stack Overflow solution: trivially, for free, and in a way that contributes back to the greater developer community – I wouldn’t have believed you. It’s great to see “the new Microsoft’s” behavior benefit us so directly, but it’s even greater to see what the OSS community has built because of it.

We’ve only just shipped this, which begs the question…

What’s next with static code analysis?

Security is an ongoing process, not a bit you flip or a feature you add. Accordingly there will always be more to do and places we want to make improvements, and static code analysis is no different.

As I alluded to at the start, we’re only analyzing some of the code behind Stack Overflow. More precisely we’re not analyzing views, or tracing through inter-procedural calls – and analyzing both is an obvious next step.

We’ll be able to start analyzing views once our migration to ASP.NET Core is complete. Pre-Core Razor view compilation doesn’t give us an easy way to add any analyzers, but that should be trivial once we’re upgraded. Razor’s default behavior gives us some confidence around injection attacks, and views usually aren’t doing anything scary – but it will be nice to have stronger guarantees of correctness in the future.

Not tracing through inter-procedural calls is a bit more complicated. Technically this is a limitation of Security Code Scan, there’s an issue for it. That we can’t analyze views reduces the value of inter-procedural analysis today, since we almost always pass user-provided data into views. For now, we’re comfortable focusing on our controller action methods since basically all user-provided data passes through them before going onto views or other inter-procedural calls.

The beauty of open source is that when we do come back and do these next steps (and any other quality of life changes), we’ll be making them available to the community so everyone benefits. It’s a wonderful thing to be able to benefit ourselves, our customers, and .NET developers everywhere – all at the same time.

Sabbatical Log: Retrospective

I took a sabbatical in November and set out to learn some game development. It’s been a month-and-change since I wrapped up, and now I’m looking back to reflect on my endeavor.


My auto-responder for the month I was out.

My auto-response while I was out.

Stack Overflow gives employees 4 weeks of paid vacation on their 5th anniversary, and an additional week each subsequent year — up to a maximum of 8 accumulated weeks. It’s the same as regular vacation in terms of scheduling, pay, and whatnot except that you must take a minimum of a month off at a time. It was a big year for sabbaticals at Stack Overflow; 13 people took them in 2018.

Before taking the time off, I knew I’d need a project to keep me entertained, and I decided I’d play with a kind of coding I’d never really done before — game development.

What did I do?

I built the very beginning of a A Link To The Past pseudo-clone, working from first principles. Besides actually blitting pixels onto the screen, I built everything: loading assets, collision detection, animations, room transitions, etc. In the end I had a game with one real room, a working enemy, a few interactable objects (doors, bushes, walls, and pits), support for room transitions, and a lot of half-way decent infrastructure.

What am I proud of?

I feel like I actually accomplished a lot, given how little I knew going in — I was just an enthusiastic reader on the subject, with a decent command of C#. More concretely, and with some distance from the original authorship, these are the parts I’m most proud of:

  • Hot reloading of assets
    • Sprites, animations, hitmaps, and room backgrounds can all be edited while the game is running and they’ll be automatically reloaded. This made me way more productive when it came to “creating” assets (i.e. chopping up LTTP screenshots and stitching them together).
    • For a real game, I would have built a bunch of tools for creating and previewing things, but hot reloads felt like a clever and cheap alternative given my limited time.
    • I did sprites on the 2nd day (which implicitly handled hit maps), rooms on the 7th, and animations on the 8th.
  • A focus on DEBUG performance
    • I spent most of my time in DEBUG builds since I was, well, building and debugging. Accordingly, I felt I had to keep DEBUG performance decent enough to actually run the full game.
    • Since wrapping up, I’ve come across discussions suggesting that some actual game devs share this sentiment, which is a nice bit of validation. Of course, they may not be representative of the game industry as a whole, but at least I’m not alone in my craziness.
    • I spent time explicitly working on DEBUG performance on the 14th and 15th.
  • A decent separation of concerns
    • I had never worked with the Entity-Component-System pattern, but found that it did a really good job of keeping code separated in a sensible way.
    • For example, both the knights and the player can cut bushes… but neither of those entities have any knowledge of the other. Generally, things “just worked” without nasty code.
    • I also kept rendering unaware of the rest of the game, which feels like an accomplishment.
  • Useful debug overlays
    • I implemented overlays for sprite bounds, collision bounds, collision polygons, sub-system timings, and render timings (including FPS). These were crazy useful for debugging, and frankly I should have worked on them even earlier than I did.
    • I implemented these overlays on the 12th, 13th, 14th, and 28th.

What didn’t I get to?

“The rest of the game” is the obvious answer, but I knew going in that there was no way I could build everything in a month. A few of the things that I had at least made notes to start on were:

  • A DeathSystem to handle death animations and dropping items when something dies
    • Right now the sword knight, as the only enemy, handles its own death animations, which is less than ideal.
  • A source of random numbers
    • Current code just uses a System.Random instance, but I’d like to implement a custom provider for a few reasons:
      • Guarantee of stability, so no matter where the code is running random numbers will be the same.
      • In the same vein, parallelism requires some changes so that random numbers are consistent in a frame even if the actual code runs at different times.
      • There’s a lot of fun to be had with manipulating RNGs in old games (check out the Luck Manipulation TAS videos), so a custom seeding paired with a simple algorithm can actually be a plus.
  • Recording gameplay videos
    • I kept the rendering code and logic code separate, in part to enable this, but never got around to it.
    • I captured all the GIFs in my posts with ScreenToGif.
  • Some system for positioning frames in an animation and/or multi-part entity
    • In a number of systems there’s a “glue” function that repositions various entities based on current animation frames, which is not great.
    • These relative positions shouldn’t be in code as constants, they should be loaded up as part of an asset.
    • I mulled over the problem for a while, but couldn’t come up with a nice generic solution… maybe there’s something in the literature I’m unaware of, or maybe I’ve structured my entities and animations poorly.
  • Automated rebuilding of asset/animation/etc. enums
    • I mentioned this on the 16th, and it only got worse as time went on.
    • Besides lack of time, my biggest worry was that I’d build something that wasn’t portable outside of my machine — always a risk when tweaking builds.

What do I regret?

  • Using a custom fixed point implementation for collision detection
    • It was an interesting exercise, and I’m not convinced that it couldn’t be done well, but I spent a lot of time on it for little gain.
    • I also had to spend a lot of time optimizing this code since, especially in DEBUG builds, it was thousands of times slower than just using floats.
  • I didn’t start on debug overlays soon enough
    • I assumed that test would cover most of the same ground, which turned out to be false.
    • A few days were lost because I hadn’t anticipated the need for debug overlays, making debugging a lot harder than it should have been.

Overall, I have few regrets, which feels indicative of a month well spent. That said, if and when I continue working on this codebase, I’m sure I’ll find more things to regret.

Where did I spend my time?

  1. Collision detection and related code
    1. I spent time on this during the 3rd7th, 9th11th, 13th, & 16th.
    2. This totals about 33% of my time over the entire sabbatical.
  2. Asset creation and manipulation
    1. This isn’t well accounted for in the actual posts, but most days I spent an hour or so doing something with assets. Oftentimes this was just scaling and making minor edits, but sometimes it was painstaking screenshot comparisons to figure out relative positioning between two separate assets.
    2. As a rough estimate, I spent about 12% of my time on this task.
  3. Debugging tools
    1. Primarily overlays, as documented in the 11th13th.
    2. A lot of ToString() implementations to make life easier in the debugger.
    3. This was probably also about 8% of my time.

Everything else took the remaining 40-50% of my time, which feels about right. Since I was mostly working on building up the infrastructure of a proper game, I don’t think spending half of my time on plumbing was a problem. Now, if I had spent a year or two on this project, that division of time would be problematic — I’d expect more time to be spent on the actual “game” part as development progressed.

Will I keep going?

I’d like to, which is a weaselly answer.

The value of uninterrupted time to devote to development is hard to overstate, and if I continue I won’t have that. So I couldn’t expect to be nearly as productive, which makes the whole thing less attractive — I’m one of those people who derives a lot of enjoyment from making tangible progress.

All that said, I’m going to try and spend a weekend every now and then plugging away at the LearnMeAThing codebase. I may even keep notes and blog about it occasionally.

Where’s the code?

I can’t exactly publish the whole project, since it’s full of Link To The Past derived assets. I’m also not all that interested in really open sourcing it, because it’s a learning project for me I wouldn’t accept pull requests or handle issues. That said, I like the idea of folks learning from my examples-and-or-mistakes: so I put together a GPLv3 one-time dump of just the code.

Sabbatical Log: December 1st and 2nd

This blog series is running about a week behind my actual work, giving me time to clean things up.  The date the actual work was done is in the title, horizontal lines indicate when I stopped writing and started coding.

Family was in town on the first, I got nothing done.  But I did get a bonus day on the 2nd!

Logically, the knight knows how to chase the player now… there are some issues, however.

Just have to spend some time debugging here, then I’ll get to the user actually taking damage and recoiling.

The knight can now stab players, and it knocks the player away. As with other damage-y things, the player system is not actually aware of the knight or it’s sword – it is just informed that it was struck by something with the DealsDamage component attached to it.

Last task for the sabbatical, make the knight die when it’s hit by the player’s sword.

The sword knight can die!

And with that, my sabbatical is ended. I intend to write up a retrospective at a later date, but all in all I’m pretty happy with my progress and I learned a lot. Definitely ready to get back to work and see all my Stack Overflow people again though, a month of not working is quite enough.

Continue onto the Retrospective

Sabbatical Log: November 30th

This blog series is running about a week behind my actual work, giving me time to clean things up.  The date the actual work was done is in the title, horizontal lines indicate when I stopped writing and started coding.

It’s my last official day of sabbatical! I’m still planning to do a retrospective post, and have the weekend to maybe do some extra-sabbatical work.

But today, I’m hoping to finish up the sword knight.

I’ve got some smarter state management on the knight, it will no longer wander too far away from its origin, and have made the sword sharp. Now, when it’s in motion, it will “do damage” to anything it comes into contact with that can receive damage.

Right now bushes are the only things take take damage, so the knight is limited to cutting them. This is, again, emergent behavior – neither the knight’s code nor the bush’s code knows about the other and the collision system is completely ignorant of damage. I continue to be impressed with the Entity-Component-System design pattern, which made this structuring pretty painless.

Now to get the knight to see (and subsequently chase) the player.

Well, I didn’t quite get to chasing… but the knight can see things now. I modeled is vision as an invisible, but still hitable, box – so it’s just a matter of adding a reaction to the player colliding with it.

Turning on one of the debug layers lets us see the vision cones in action.

Continue onto December 1st and 2nd

Sabbatical Log: November 29th

This blog series is running about a week behind my actual work, giving me time to clean things up.  The date the actual work was done is in the title, horizontal lines indicate when I stopped writing and started coding.

Work continues on the sword knight. There’s a lot of infrastructure that has to be setup, including lots and lots of assets and animations, which is time consuming.

Currently, all I’ve got is a dismembered knight who can look right.

I should have something more impressive later today.

Progress continues, animations are as always pretty time consuming.

Thought this buggy loop looked funny though.

I’ve now got all the basic logic for the sword knight going.

It basically has three states:

  • Standing
  • Walking
  • Searching

You can see the various states here

Obviously, I haven’t actually got the knight moving yet – that shouldn’t be too tricky. But first I’ve got to define all the hitmaps for the various pieces (the knight is modeled as five entities: feet, body, head, sword, and shield), or bad things will happen when the knight starts walking into things – and that’s going to take a bit of time.

After that I need to make the knight not blind, right now it can’t actually find anything when searching. Fixing that will require a temporary entity, and a new state where the knight chases the player. And then finally, I can implement behavior for when the knight’s sword hits a player and the player sword hit’s a knight (or the knight’s shield).

With my time running out, I don’t know if I’ll be able to do all of these but I’m going to make an honest attempt.

End of the day, and the sword knight can move and has hitmaps for all his components.

It’s collision logic is mostly working, though there are no programmed behaviors for such collisions yet. There’s also no bounds checking to make sure the knight doesn’t just wander off the map.

The knight is thoroughly exercising all the patterns I’ve built up, and I can see plenty of places I’d make improvements – but there’s no time for it. One thing I did do was create an AssociatedEntityComponent that makes it easy to group entities together, an idea I’ve been putting off (it does complicate the EntityManager class) but keeping track of the 5 different entities making up a sword knight made it necessary.

Currently the SwordKnightSystem is 861 lines of code, it’s chunk of the ObjectCreator is 612, and the SwordKnightStateComponent is 74. Lots of code is spent dealing with error cases, and there’s lots of room to improve the ergonomics of the code… but all told, I’m not too displeased with current state of things.

I’m going to return to the sword knight tomorrow, hopefully wrapping it up.

Continue onto November 30th

Sabbatical Log: November 28th

This blog series is running about a week behind my actual work, giving me time to clean things up.  The date the actual work was done is in the title, horizontal lines indicate when I stopped writing and started coding.

I’ve got a little bit of cleanup to knock out before getting to implementing an enemy, and I want to get allocated bytes and some GC states into a debug overlay. So I’m going to do that real quick.

Humorously, I did find a couple leaks with this new GC reporting – and most of them were in the “render this debug layer”-code.

(generations go left to right, so it’s 5 gen0 collections, 4 gen1 collections, etc.)

I did also find a bug with returning FrameState’s to a reused pool, and fixed it. There’s a more complicated issue around delayed loading of some types & delegates, in particular those around collision detection, which can cause some collections after reaching the stable state.

I could go and fix those, but they’re a fixed cost and frankly I want to get started on an enemy.

I’ve decided I’m going to implement a sword knight, my first enemy and probably my last task for this sabbatical.

Continue onto November 29th

Sabbatical Log: November 27th

This blog series is running about a week behind my actual work, giving me time to clean things up.  The date the actual work was done is in the title, horizontal lines indicate when I stopped writing and started coding.

I’ve got falling down a pit mostly working, but I want to add a little drop shadow to show where the player is falling. This will be the first exit transition that creates an entity, so that’ll be interesting.

Whelp, that’s wrapped up

Again, I’m not gonna win any animation awards – but it works.

I think my final task for this sabbatical month is going to be to implement an actual enemy to fight, but before I do that there’s one more piece of infrastructure I want to build: a “job system”.

Pretty much everything that matters has multiple cores now, and it’s kind of silly not to take advantage of them – but multi-threading has some complications in the context of a game. Whereas a web app can assign threads to different logical requests, games have a single logical process. Unlike web apps where all threads are more or less equal, games typically have a privileged thread (the UI thread). And since this whole endeavor is a learning exercise, I think I should play around within these constraints.

I’ve worked up a very simple job system, composed of 3 parts:

  • JobRunner
  • Job (which implements IJob)
  • JobsCompletionToken

JobRunner is pre-sized for a set of threads, and a maximum number of concurrent jobs. Other code will use a single JobRunner to pre-allocate Job’s based on a delegate, and whatever additional state they need (the current GameState will always be passed, as it should always be needed). JobsCompletionToken is returned by the JobRunner.StartJobs method, wraps around whatever IJobs are passed, and has a WaitForCompletionMethod().

The basic idea is that JobRunner pre-allocates threads and JobCompletionTokens, various bits of code create their Job’s when they’re spun up, those Job’s (as IJobs) are re-used to run the code but with an updated GameState. I built it using the standard Thread, Monitor, and Interlocked classes. Threads do the actual work (the main thread is never stolen), the Interlocked class is used to add and remove IJobs and JobsCompletionTokens from queues, and Monitor is used to pause and resume threads. Since everything is pre-allocated, once startup is done this runs with no allocations.

Multithreaded code is difficult, and I’m not convinced I got this correct (though I did write tests). It’s also difficult to describe, so I’m just going to link to it: JobRunner, Job, and JobsCompletionToken.

Now to actually use it.

It’s the end of the day and I’ve converted two systems to use the JobRunner: the CollisionDetectionSystem, and the UpdatePositionSystem. Both of these were doing four passes of the same logical step, one for each level of a room.

(single threaded is on the left, multi-threaded on the right)

If you squint you can see some small improvements in DEBUG, although they’re mostly washed out in RELEASE builds. The multi-threading for the collision system is probably too coarse, honestly, since most everything is on one level. Regardless, I feel this exercise was a good one.

Tomorrow I’ll start on a proper enemy.

Continue onto November 28th