Bayesian Knowledge Tracing (BKT) told us how well a student knows subtraction-with-borrowing. It had no idea that a student who reverses digits on subtraction problems probably also reverses them on place-value problems, because BKT treats every Knowledge Component (KC) as an island.

Deep Knowledge Tracing (DKT) fixes that. Instead of fitting four scalar parameters (prior, learn, guess, slip) independently for each KC, it maintains a single LSTM hidden vector shared across all KCs and learns the dependencies between them from data. This is Phase 3 of NumPath: swapping out the Markov model for a neural sequence model.
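
To make the "island" concrete, here is a minimal sketch of the standard BKT update rule. It is not NumPath's actual code: the KC names, the parameter values, and the `BKTParams` and `bkt_update` names are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float   # P(L0): prior probability the skill is already known
    p_learn: float  # P(T): probability of learning at each opportunity
    p_guess: float  # P(G): probability of a correct answer without knowing
    p_slip: float   # P(S): probability of a wrong answer despite knowing


def bkt_update(p_known: float, correct: bool, p: BKTParams) -> float:
    """One standard BKT step: Bayesian posterior, then learning transition."""
    if correct:
        num = p_known * (1 - p.p_slip)
        den = num + (1 - p_known) * p.p_guess
    else:
        num = p_known * p.p_slip
        den = num + (1 - p_known) * (1 - p.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p.p_learn


# Hypothetical KCs and parameter values, for illustration only.
params = {
    "subtraction_with_borrowing": BKTParams(0.3, 0.1, 0.2, 0.1),
    "place_value": BKTParams(0.3, 0.1, 0.2, 0.1),
}
mastery = {kc: p.p_init for kc, p in params.items()}

# The student gets three subtraction problems wrong in a row.
for _ in range(3):
    kc = "subtraction_with_borrowing"
    mastery[kc] = bkt_update(mastery[kc], correct=False, p=params[kc])

print(mastery)
```

Three wrong answers pull the subtraction estimate below its prior, while `place_value` never moves, because each update reads and writes only its own KC's state. That is the gap a shared hidden vector is meant to close.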

Here's what we built, the design decision that almost made us reach for a transformer, and the student simulator we had to build first to test it without any real students.

What We Built

Two components that feed each other: