Data-Driven Spacing Algorithms

One of the more exciting parts of the Quantized Knowledge Project is the chance to reclaim the crown for best spacing algorithm. Right now that crown belongs to SuperMemo, which uses a spacing algorithm called SM-18. (Anki is stuck on SM-2.)

What is a spacing algorithm? It's how you decide when to show a user a given card next. More formally, a spacing algorithm takes a card's review history and estimates the longest interval after which you'll still remember the card with some target probability x% (say 90%). Good spacing algorithms matter: my friends who use SM-18 say their long-term review workload is considerably lighter.
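To make "the longest interval with 90% recall" concrete, here is a minimal sketch. It assumes a simple exponential forgetting curve, R(t) = exp(-t / S), where S is a per-card "stability" in days. That model and the function names are illustrative assumptions; SM-18 uses a more elaborate model.

```python
import math


def recall_probability(days: float, stability: float) -> float:
    """Predicted recall after `days`, under the assumed exponential forgetting curve."""
    return math.exp(-days / stability)


def next_interval(stability: float, target_recall: float = 0.9) -> float:
    """Longest interval (in days) at which predicted recall is still >= target_recall.

    Solves exp(-t / S) = target_recall for t, i.e. t = -S * ln(target_recall).
    """
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be strictly between 0 and 1")
    return -stability * math.log(target_recall)


# Example: a card with an assumed stability of 30 days.
interval = next_interval(30.0)
print(f"{interval:.2f} days, recall {recall_probability(interval, 30.0):.2f}")
```

Raising the target recall shortens every interval, and lowering it lengthens them; that trade-off between retention and workload is what a spacing algorithm tunes.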

Anki and SuperMemo put a lot of effort into optimizing their spacing algorithms at a global level. But because their users don't share cards or review data, there's not much they can do on a per-card basis: beyond what your own review history says about a card, the best they can do is make all cards harder or easier. QKP, by contrast, collects review data from its clients, so we can micro-optimize the spacing algorithm for individual cards. That's our superpower.

Everyone who has done SRS knows that, even in a set of homogeneous cards, some cards are inherently more difficult than others. For example, 鬱 ("gloom") is much harder than 一 ("one"), even though both are single-kanji production cards. If you collect review data from your users, you can observe that the retention rate is higher than average for 一 and lower than average for 鬱, and then adjust their intervals to compensate.
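One way the per-card adjustment could work is sketched below. It is a minimal sketch under stated assumptions: an exponential forgetting curve, reviews scheduled so that predicted recall was the 90% target, and a hypothetical `CardStats` record of pooled outcomes across all users. If a card's observed retention is r instead of the target, its effective stability is off by a factor of ln(target) / ln(r), so its intervals are scaled by that factor. The smoothing toward the target (`prior_reviews`) is an added assumption that keeps cards with few reviews from swinging wildly.

```python
import math
from dataclasses import dataclass


@dataclass
class CardStats:
    """Pooled review outcomes for one card across all users (hypothetical schema)."""
    successes: int
    reviews: int


def smoothed_retention(stats: CardStats, target: float, prior_reviews: float = 20.0) -> float:
    """Observed retention, shrunk toward the target when a card has few reviews."""
    return (stats.successes + prior_reviews * target) / (stats.reviews + prior_reviews)


def interval_multiplier(stats: CardStats, target: float = 0.9) -> float:
    """Factor to apply to a card's intervals, assuming exponential forgetting.

    Reviews were scheduled so predicted recall was `target`; if the card's observed
    retention is r, its stability (and hence its ideal interval) scales by
    ln(target) / ln(r).
    """
    r = smoothed_retention(stats, target)
    r = min(max(r, 1e-6), 1 - 1e-6)  # keep the log finite at 0% or 100% retention
    return math.log(target) / math.log(r)


easy = CardStats(successes=970, reviews=1000)  # e.g. 一: retained more often than targeted
hard = CardStats(successes=780, reviews=1000)  # e.g. 鬱: retained less often than targeted
print(round(interval_multiplier(easy), 2), round(interval_multiplier(hard), 2))
```

With these made-up counts, 一's intervals are stretched and 鬱's are shrunk, while a card that hits the 90% target exactly keeps a multiplier of 1. A production system would also have to account for reviews happening at many different intervals, which this sketch ignores.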

We can even measure cross-card correlation. Say you have one card that asks for the wavelength range of the color red, and another that asks for the wavelength range of orange. If you review the red card successfully, we can look at population-level data and, say, increase the interval for the orange card by 20%. That percentage isn't a guess: it can be tuned to be just right using the reviews of a large population of people who study both cards.
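Tuning that percentage from data might look like the sketch below. The function name and inputs are hypothetical: the baseline is the orange card's retention across all reviews, and the conditional figure is its retention when the red card was recalled shortly beforehand. Both groups are assumed to be reviewed at the same scheduled interval, and, as in an exponential forgetting model, the retention gap is converted into a stability ratio. A naive comparison like this also picks up differences between users (people who know red tend to know orange), so a real system would need to control for that.

```python
import math


def cross_card_boost(
    baseline_successes: int,
    baseline_reviews: int,
    after_partner_successes: int,
    after_partner_reviews: int,
) -> float:
    """Interval multiplier for card B after a successful review of a related card A.

    Compares B's retention in general with B's retention when A was recalled
    shortly beforehand, assuming exponential forgetting and equal review intervals
    in both groups. Returns ln(r_base) / ln(r_after); values above 1 mean
    "lengthen B's interval".
    """
    def clamp(r: float) -> float:
        return min(max(r, 1e-6), 1 - 1e-6)  # avoid log(0) and division by ln(1) = 0

    r_base = clamp(baseline_successes / baseline_reviews)
    r_after = clamp(after_partner_successes / after_partner_reviews)
    return math.log(r_base) / math.log(r_after)


# Made-up counts: orange is recalled 90% of the time overall,
# and 92% of the time right after red was recalled.
boost = cross_card_boost(900, 1000, 920, 1000)
print(f"lengthen orange's interval by {100 * (boost - 1):.0f}%")
```

With these illustrative counts the boost comes out near 26%. Smaller populations would call for the same kind of smoothing toward "no boost" used for per-card difficulty.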

The core activity in spaced repetition is turning over flashcards. Here's a secret: nobody likes turning over flashcards. But we do it, because the benefits are enormous. I've spent 1250 hours in Anki over the last 6 or 7 years. A 20% improvement would be life-changing: that's 250 hours, several weeks of my life back.