The Japanese Foundation Deck

tl;dr: Nobody knows which Japanese learning strategies work and which don't, because nobody collects data or uses the scientific method. As a result, lots of new learners pick bad strategies and fail to learn. The goal of the Quantized Knowledge Project is to impose scientific rigor and prevent that.

The first deck we'll test will be the Japanese Foundation deck: 10,000 real sentences sourced from anime and light novels, with the difficulty smoothly increasing from「XはYです」to full paragraphs of real text. You'll learn grammar as you go, from other learners' comments on the cards.

Fill out this form if you're interested in being an early user of this deck, or get in touch at robertvc at mit dot edu if you're interested in helping to create and maintain it.

The Why

There are two hard things about teaching yourself Japanese: teaching yourself something, and Japanese. Japanese will always be hard. But teaching yourself something doesn't need to be.

In a traditional classroom, figuring out how to learn is your teacher's job. Your job is just to do the learning. But if you teach yourself, figuring out how to learn is also your job. And you have to decide how to learn without knowing anything about the underlying subject.

There's an infinite variety of ways to teach yourself Japanese, and all of them have people swearing they work and people swearing that they don't. You, the new learner, basically pick a method at random and hope it works out.

The reason there's so much disagreement is that everybody has only a single data point: what worked for them. Nobody knows whether RTK is actually worth the time or not, whether clozes are more or less effective than sentence comprehension cards, whether switching to monolingual decks is worth the time. People have guesses, but nobody has done the science, because it's impossible to do science with n=1.

So we're going to get the data and do the science. We're going to do for learning Japanese what the Model T did for cars: make it predictable, repeatable, and efficient.

In the process, we're going to make the best Japanese learning materials the world has ever seen. We're going to prove that they work, and then iteratively improve them. Long-term collaboration has given the programming community god-tier tooling, and there's no reason it won't do the same for Japanese.

How: the Platform

The spaced repetition platform we use needs to support basically two things. The first is to allow individual decks to evolve and be guided by the community over time. The second is to be an engine for science: an unbiased source of data about which decks work and which don't.

Letting decks evolve is the fairly easy part. At a per-card level, we support comments (share insight with your fellow learners) and suggestions (fix inaccuracies). At a per-deck level, we provide deck makers with the aggregate statistics to understand how their decks are being used and where they can be improved. Community decks are hosted on GitHub, where it's easy for others to suggest and merge in improvements. That's how we give the community the tools to make a kick-ass deck.

The engine of science part is a bit trickier. First, we require each deck to declare a testable hypothesis: perhaps, "a learner who finishes this deck with a 90% retention rate will understand the gist of any one-minute segment of Boku no Hero Academia." Then, we give people who finish the deck at 90% retention some random one-minute segments and quiz their comprehension. Does the deck do what it claims to do? Does another deck do it better? We then adjust the claim at the beginning of the deck to match the results, so new learners can see what they're getting into.
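To make the "does the deck do what it claims?" step concrete, here's a rough sketch of how quiz results might be scored. Everything in it is an assumption for illustration: the function names, the input shape (one retention rate and one pass/fail quiz result per learner), and the choice of a Wilson score interval to show how uncertain the pass rate is when only a handful of people have finished. It is not the platform's actual analysis.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def evaluate_claim(quiz_results, min_retention=0.90):
    """Summarise quiz outcomes for learners who met the retention bar.

    quiz_results is a hypothetical list of (retention_rate, passed_quiz)
    pairs, one per learner who finished the deck.
    """
    # Only learners at or above the declared retention rate count toward the claim.
    eligible = [passed for retention, passed in quiz_results if retention >= min_retention]
    if not eligible:
        return None
    passes = sum(eligible)
    low, high = wilson_interval(passes, len(eligible))
    return {"n": len(eligible), "pass_rate": passes / len(eligible), "ci": (low, high)}


if __name__ == "__main__":
    # Made-up results purely to show the call; not real data.
    sample = [(0.95, True), (0.92, False), (0.80, True), (0.91, True)]
    print(evaluate_claim(sample))
```

With small cohorts the interval will be wide, which is a useful reminder of how much (or how little) a single cohort can tell us about a deck's claim.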

Those are the two functions of Quantized.co: to facilitate collaborative deck building, and to provide the data that underlies any good scientific engine.

(If you're still feeling antsy about a new platform, that's super reasonable and I get it. Maybe check out Quantized, Data, and Money?)

How: the Deck

I want to start this science off with a bang. Here's the hypothesis for the Japanese Foundation Deck.

Starting from RTK, a new learner of Japanese can learn to understand the vast majority (>95%) of the sentences from Norwegian Wood, or another (light) novel of similar difficulty, after going through a premade ten-thousand-sentence deck.

The Japanese Foundation deck contains ten thousand real sentences taken from sources that you actually care about--novels, movies, light novels, anime. The sentences gradually get longer, more grammatically challenging, and use more advanced vocabulary as you progress through the deck. They also transition (smoothly) to monolingual after 2000 cards. By the end, you should be able to jump (again, smoothly) off into reading novels.

From a corpus of hundreds of books (about 5 million sentences), a computer program suggests a few that are the right difficulty to be the next sentence. Then we review the options by hand and pick our favorite. That way you get the benefit of using real Japanese sentences without the unpredictability and wild swings in difficulty.
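Here's a minimal sketch of what the "suggest a few candidates" step could look like. It rests on assumptions that aren't from the project itself: sentences are already split into words (a real pipeline would need a Japanese tokenizer), difficulty is approximated by how many words the learner hasn't seen yet, and ties are broken by sentence length. The names suggest_next, max_new, and limit are hypothetical.

```python
def unknown_words(tokens, known):
    """Return the set of words in a sentence the learner has not seen yet."""
    return {t for t in tokens if t not in known}


def suggest_next(corpus, known, max_new=1, limit=3):
    """Suggest up to `limit` candidate sentences for the next card.

    corpus: list of sentences, each a list of pre-tokenized words (assumed).
    known:  set of words already covered by earlier cards.
    A sentence qualifies if it introduces between 1 and `max_new` new words;
    shorter sentences are suggested first.
    """
    candidates = []
    for tokens in corpus:
        new = unknown_words(tokens, known)
        if 1 <= len(new) <= max_new:
            candidates.append((len(tokens), tokens, new))
    # The sort is stable, so equally long sentences keep their corpus order.
    candidates.sort(key=lambda c: c[0])
    return [(tokens, new) for _, tokens, new in candidates[:limit]]


if __name__ == "__main__":
    corpus = [
        ["これ", "は", "本", "です"],
        ["私", "は", "学生", "です"],
        ["猫", "が", "魚", "を", "食べる"],
    ]
    known = {"これ", "は", "です", "私"}
    for tokens, new in suggest_next(corpus, known):
        print("".join(tokens), "new:", new)
```

The human review step described above would then pick a favorite from whatever this returns; the program only narrows the field.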

Each card supports comments. When you encounter a new grammatical point, you can read the comments to figure out what's going on. When you encounter a new kanji, we'll link you to RTK. And if you figure out what's going on with some particularly obscure sentence, you can explain it to your fellow learners.

Readings are seeded with a modern UniDic, and TTS is through an Azure WaveNet. As users go through the deck, they can suggest contributions through the QKP interface, which will go back for review and integration. We also expect to get professional voice acting done once the basic hypothesis is validated (see the QKP Secret Master Plan for more details).

What's Next?

Make no mistake, this project is an experiment. It might turn out to be impossible. But if we succeed, we'll completely redefine the Japanese learning landscape, and potentially the language learning landscape as a whole.

If you're interested in using a deck like the one outlined above, fill out the interest form here by 3/20. The daily commitment will be no more than 45 minutes over a few months. To ensure that we can keep track of everyone (important if we're to do real science!), the first cohort will be limited to about a dozen people. No worries if this time frame doesn't work for you--we'll eventually open the deck to the public.

Alternatively, if you're interested in helping to build out this deck, I'd love to hear from you. We need lots of different types of expertise to pull this off successfully: programmers, proficient Japanese users, spaced repetition aficionados, and anyone else who feels like this really clicks for them. Reach out to me at robertvc at mit dot edu!