This post is part of a series on the Providence project at Stack Exchange; the first post can be found here.
We’ve gone over building the “developer-kind” and “technology” classifiers, but we’ve yet to describe actually using these predictions for anything. We started with better targeting for Stack Overflow Careers job listings. This was an attractive place to start because our existing system was very naïve, there was enough volume to run experiments easily, and mistakes wouldn’t be very harmful to our users.
In a nutshell, the challenge was: given a person and a set of job listings, predict how well the person “fits” each of the jobs. We wanted a value for every job-person pair, rather than just a pick of some number of jobs per person, so that we could experiment independently with approaches to the final ad selection.
To quickly recap what we had to work with: for each person who visits Stack Overflow, we had 11 developer-kind percentage labels, 15 technology percentage labels, and the tag view data which was used to generate those sets of labels; for each job listed on Stack Overflow Careers, we had a list of preferred developer-kinds, preferred technologies, and some tags related to the job.
When designing this algorithm, we had a few goals to keep in mind. Neither overly broad jobs (e.g., “Web Developer, Any Platform”) nor overly narrow jobs (e.g., “ASP.NET MVC2, EF4, and F# in Brownsville”) should dominate. Similarly, while no single label should overwhelm the others, the algorithm should account for how significant our experiments showed each label to be. This algorithm was also one of the few pieces of Providence that needed to run in real time.
The end result was very simple, and can be broken down into the following steps:
1. Mask away any developer-kind and technology labels on the person that are not also on the job
   - So a job without “Android Developer” will ignore that label on the person
2. Sum each developer-kind percentage label that remains
3. Sum each technology percentage label that remains
4. Determine the tag the person is most active in which also appears on the job, if any
5. Calculate what percentage of the person’s overall tag activity occurred in the tag chosen in the previous step
6. Scale the percentages calculated in steps 2, 3, & 5 by some pre-calculated weights (see below) and sum them
7. Determine the largest possible value that could have been calculated in step 6
8. Divide the value of step 6 by the value of step 7, producing our final result
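As a rough illustration, the eight steps can be sketched in Python. This is a minimal sketch, not Providence’s actual code: the `Person` and `Job` classes and the `match_score` function are hypothetical, percentages are assumed to be fractions in [0, 1], and, because the post does not say exactly how step 7 is computed, the sketch assumes each person’s developer-kind percentages (and, separately, technology percentages) sum to at most 1, so each feature group can contribute at most its weight. Returning `None` for an empty person or job leaves the fallback behavior to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Per-feature weights as described in the post: developer-kind 1, technology 2, tag 3.
WEIGHT_DEV_KIND = 1.0
WEIGHT_TECHNOLOGY = 2.0
WEIGHT_TAG = 3.0


@dataclass
class Person:  # hypothetical shape of a visitor's Providence data
    dev_kinds: Dict[str, float]     # developer-kind label -> fraction in [0, 1]
    technologies: Dict[str, float]  # technology label -> fraction in [0, 1]
    tag_views: Dict[str, int]       # tag -> amount of activity in that tag


@dataclass
class Job:  # hypothetical shape of a Careers listing
    dev_kinds: Set[str] = field(default_factory=set)
    technologies: Set[str] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)


def match_score(person: Person, job: Job) -> Optional[float]:
    """Return a fit in [0, 1], or None if the person or the job is "empty"."""
    if not (person.dev_kinds or person.technologies or person.tag_views):
        return None  # nothing known about the person; caller uses a default

    # Steps 1-3: mask away labels the job doesn't list, then sum what remains.
    dev_sum = sum(p for label, p in person.dev_kinds.items() if label in job.dev_kinds)
    tech_sum = sum(p for label, p in person.technologies.items() if label in job.technologies)

    # Steps 4-5: share of all tag activity that falls in the person's most
    # active tag among those on the job (0 if they share no tags).
    total_views = sum(person.tag_views.values())
    shared = [views for tag, views in person.tag_views.items() if tag in job.tags]
    tag_share = max(shared) / total_views if shared and total_views else 0.0

    # Step 6: weighted sum of the three features.
    score = WEIGHT_DEV_KIND * dev_sum + WEIGHT_TECHNOLOGY * tech_sum + WEIGHT_TAG * tag_share

    # Step 7 (ASSUMPTION): each label group sums to at most 1 per person, so a
    # group can add at most its weight, and only if the job lists that group.
    best = (WEIGHT_DEV_KIND * bool(job.dev_kinds)
            + WEIGHT_TECHNOLOGY * bool(job.technologies)
            + WEIGHT_TAG * bool(job.tags))
    if best == 0:
        return None  # nothing known about the job; caller uses a default

    # Step 8: normalize by the best achievable score.
    return score / best
```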
Conceptually, the above algorithm determines the features of a person who would be a perfect fit for a job, and then determines how closely the actual person we’re considering matches them.
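Written out as a formula, the steps restate as follows. The symbol names are introduced here for illustration: $K_j$, $T_j$, and $G_j$ are the job’s developer-kinds, technologies, and tags; $d_k$ and $t_\ell$ are the person’s developer-kind and technology percentages; $v_g$ is the person’s activity in tag $g$ (zero for tags they have never viewed, and $G = 0$ when the job has no tags); and $S_{\max}(j)$ is the largest value the numerator could take for job $j$. The weights 1, 2, and 3 are the ones described in the next paragraph.

$$
D = \sum_{k \in K_j} d_k, \qquad
T = \sum_{\ell \in T_j} t_\ell, \qquad
G = \frac{\max_{g \in G_j} v_g}{\sum_{g} v_g}
$$

$$
\mathrm{fit}(p, j) = \frac{1 \cdot D + 2 \cdot T + 3 \cdot G}{S_{\max}(j)}
$$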
The per-feature weights it incorporates let us emphasize that certain features are better predictors of a match than others. The least predictive feature is developer-kind, so it is given a weight of one, while technologies and tag matches have weights of two and three because they are roughly two and three times as predictive, respectively. We determined the weight for each feature with experiments in which each feature was used alone to target ads, and then compared the relative improvement each one produced. These weights also matched our intuition: it makes sense that being a Full Stack Web Developer instead of a Back End Web Developer doesn’t matter as much as being unfamiliar with the technology stack, which itself isn’t as significant as lacking the domain knowledge implied by a particular tag.
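The weighting rule in this paragraph can be written down as a small helper. It is a hypothetical sketch: the post does not give the measured improvements or say whether the weights were rounded, so `weights_from_lifts` simply divides each feature’s single-feature improvement by the weakest one and rounds to the nearest integer, which yields 1, 2, and 3 whenever the improvements are roughly in a 1:2:3 ratio.

```python
from typing import Dict


def weights_from_lifts(lifts: Dict[str, float]) -> Dict[str, int]:
    """Scale each feature's single-feature lift by the weakest feature's lift.

    `lifts` maps a feature name to the relative improvement seen when that
    feature alone was used to target ads (hypothetical input format).
    """
    weakest = min(lifts.values())
    if weakest <= 0:
        raise ValueError("every lift must be positive")
    # The weakest feature gets weight 1; others get their rounded multiple of it.
    return {feature: round(lift / weakest) for feature, lift in lifts.items()}
```

Rounding keeps the weights as small integers, which matches the “roughly two and three times as predictive” framing rather than carrying measurement noise into the score.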
Once we had tested this algorithm and determined that it worked, we had to decide what to do when it didn’t have enough information to make any intelligent predictions. This can happen in two cases: if we know nothing about the user (if they’re brand new, or have opted out of Providence), or if we know nothing about the job (it may be a crazy outlier, badly entered, or just plain a bad job). In both cases we decided to predict a low, but non-zero, default value; this means that an “empty” person will still get a reasonable mix of jobs and that an “empty” job will still get some exposure. This default value was selected by randomly sampling people and averaging how well they match all the non-empty jobs we’ve ever seen, then adjusting that average down by one standard deviation. In practice, this means the default falls somewhere between the 30th and 50th percentile of the non-empty matches calculated for most users.
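The default-value calibration can be sketched the same way. The post does not specify these details, so they are assumptions: scores from all sampled people are pooled before averaging, `statistics.stdev` (the sample standard deviation) measures the spread, and the hypothetical `floor` parameter keeps the result non-zero if subtracting one standard deviation would push it to or below zero. `score_fn` stands in for any matcher that returns `None` for an empty person or job.

```python
import random
import statistics
from typing import Any, Callable, Optional, Sequence


def default_score(
    people: Sequence[Any],
    jobs: Sequence[Any],
    score_fn: Callable[[Any, Any], Optional[float]],
    sample_size: int,
    floor: float = 1e-3,
    seed: Optional[int] = None,
) -> float:
    """Mean minus one standard deviation of sampled non-empty match scores."""
    rng = random.Random(seed)
    sample = rng.sample(list(people), min(sample_size, len(people)))

    # Pool every non-empty score between the sampled people and all jobs.
    scores = []
    for person in sample:
        for job in jobs:
            s = score_fn(person, job)
            if s is not None:  # skip empty people/jobs
                scores.append(s)
    if len(scores) < 2:
        raise ValueError("need at least two non-empty scores")

    value = statistics.mean(scores) - statistics.stdev(scores)
    # ASSUMPTION: clamp to a small positive floor so the default stays non-zero.
    return max(value, floor)
```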
After nailing down this algorithm, there were a few non-Providence concerns to handle before we could ship the final product. We needed to constrain ads geographically, choose a subset of ads based on their predicted weights, and serve the actual ads on Stack Overflow. All of these were left to our ad server team, who dealt with them without resorting to any machine learning trickery.