Source: David Hu blog, Nov 2011
The Khan Academy is well known for its extensive library of over 2600 video lessons. It should also be known for its rapidly growing set of now 225 exercises (outnumbering the stitches on a baseball), with close to 2 million problems done each day.
To determine when a student has finished a certain exercise, we award proficiency to a user who has answered at least 10 problems in a row correctly, known as a streak. Proficiency manifests itself as a gold star, a green patch on teachers’ dashboards, a requirement for some badges (e.g., gain 3 proficiencies), and a bounty of “energy” points. Basically, it means we think you’ve mastered the concept and can move on in your quest to know everything.
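As a rough sketch of the streak rule just described: the function name `has_streak` and the representation of a student's history as a list of booleans are assumptions made for illustration, not Khan Academy's actual code.

```python
def has_streak(answers, streak_length=10):
    """Return True if `answers` ever contains `streak_length` correct answers in a row.

    `answers` is a hypothetical history: True for a correct answer, False for a miss.
    """
    run = 0
    for correct in answers:
        # A wrong answer resets the run to zero; a correct one extends it.
        run = run + 1 if correct else 0
        if run >= streak_length:
            return True
    return False


print(has_streak([True] * 10))                        # ten right off the bat
print(has_streak([True] * 9 + [False] + [True] * 9))  # 18 of 19 correct, but no streak
```

Note that the rule only looks at the longest run, so a student who needs 30 problems to reach the streak and one who needs 10 end up with the same result.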
It turns out that the streak model has serious flaws.
First, if we define proficiency as your chance of getting the next problem correct being above a certain threshold, then the streak becomes a poor binary classifier. Experiments conducted on our data showed a significant difference between students who take, say, 30 problems to get a streak vs. 10 problems right off the bat — the former group was much more likely to miss the next problem after a break than the latter.
False positives are not our only problem; we also have false negatives. One of our largest sources of complaints is frustrated students who lost their streak. You get 9 correct, make a silly typo, and lose all your hard-earned progress. In other words, the streak thinks that users who have gotten 9 right and 1 wrong are at the same level as those who haven’t started.
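To make the false-negative complaint concrete, here is a small sketch of how far along a student looks under a streak rule. The helper `streak_progress` and the boolean answer history are hypothetical, chosen only to illustrate the point.

```python
def streak_progress(answers, streak_length=10):
    """Fraction of the way to proficiency under a streak rule.

    Computed as the current run of correct answers divided by the required
    streak length, capped at 1.0. `answers` is a list of booleans.
    """
    run = 0
    for correct in answers:
        run = run + 1 if correct else 0  # any miss wipes out the run
    return min(run, streak_length) / streak_length


nine_then_typo = [True] * 9 + [False]
not_started = []

# Both students appear to have made no progress at all.
print(streak_progress(nine_then_typo), streak_progress(not_started))
```

Under this measure the two histories are indistinguishable, even though one student has answered 90% of their problems correctly.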
In Search of a Better Model
These findings, presented by one of our full-time volunteers, Jace, led us to investigate whether we could construct a better proficiency model. We prototyped a constant-acceleration “rocketship” model (with heavy gnomes that slow you down on wrong answers), but ultimately decided that a prudent first step would be to just abstract away the streak model with the notion of “fill up the bar”.
Conversations with the team led me to the idea of applying machine learning to predict the likelihood of getting the next problem correct, and using that as the basis for a new proficiency model. Basically, if we think you’re more than t% likely to get the next problem correct, for some threshold t, we’ll say you’re proficient.
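A minimal sketch of what a threshold-based rule could look like. Everything here is an assumption for illustration: the estimator is a simple exponentially weighted average standing in for a real learned model (it is not the model this post goes on to build), and the threshold of 0.9, the decay of 0.8, and the prior of 0.5 are made-up values.

```python
def estimate_p_correct(answers, decay=0.8, prior=0.5):
    """Stand-in estimate of the chance the next answer is correct.

    Hypothetical placeholder for a learned predictor: an exponentially
    weighted average of past correctness, starting from `prior`.
    """
    p = prior
    for correct in answers:
        p = decay * p + (1 - decay) * (1.0 if correct else 0.0)
    return p


def is_proficient(answers, threshold=0.9):
    """Proficient if the estimated probability exceeds the (assumed) threshold."""
    return estimate_p_correct(answers) > threshold


nine_then_typo = [True] * 9 + [False]
print(estimate_p_correct([]))              # a student who has not started
print(estimate_p_correct(nine_then_typo))  # dips after the miss, but not back to the start
print(is_proficient([True] * 10))
```

The point of the sketch is the shape of the rule rather than the estimator: a single miss lowers the estimate instead of resetting it, so the student with 9 right and 1 wrong is no longer treated like one who has not started.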