BKT told us how well a student knows subtraction-with-borrowing. It had no idea that a student who reverses digits on subtraction problems probably also reverses them on place value problems — because BKT treats every Knowledge Component as an island.
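To make the "island" point concrete, here is a minimal sketch of the textbook BKT update, not NumPath's actual code. The `BKTParams` class, the `bkt_update` helper, the KC names, and every parameter value are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """The four scalar BKT parameters for ONE Knowledge Component."""
    p_init: float   # P(L0): probability the skill is known before any practice
    p_learn: float  # P(T): probability of learning the skill at each opportunity
    p_guess: float  # P(G): probability of a correct answer without the skill
    p_slip: float   # P(S): probability of a wrong answer despite the skill


def bkt_update(p_know: float, correct: bool, params: BKTParams) -> float:
    """Return the updated mastery estimate after one observed answer."""
    if correct:
        num = p_know * (1 - params.p_slip)
        den = num + (1 - p_know) * params.p_guess
    else:
        num = p_know * params.p_slip
        den = num + (1 - p_know) * (1 - params.p_guess)
    posterior = num / den
    # Learning transition: an unknown skill may become known after practice.
    return posterior + (1 - posterior) * params.p_learn


# Illustrative values only; each KC carries its own independent parameter set.
params = {
    "subtraction_with_borrowing": BKTParams(0.3, 0.15, 0.2, 0.1),
    "place_value": BKTParams(0.3, 0.15, 0.2, 0.1),
}
mastery = {kc: p.p_init for kc, p in params.items()}

# A wrong answer on subtraction updates only the subtraction estimate.
kc = "subtraction_with_borrowing"
mastery[kc] = bkt_update(mastery[kc], correct=False, params=params[kc])
print(mastery)
```

The wrong answer lowers the subtraction estimate and leaves `place_value` exactly where it started. Nothing in the update lets evidence from one KC reach another.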
Deep Knowledge Tracing (DKT) fixes that. Instead of four independent scalar parameters per KC, it maintains a shared LSTM hidden vector across all KCs and learns the dependencies from data. This is Phase 3 of NumPath: swapping out the Markov model for a neural sequence model.
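The shared hidden vector is easier to picture in code. What follows is a hedged, pure-Python sketch of a single LSTM step with random, untrained weights and no bias terms. It is not the NumPath model. `TinyDKT`, `encode_interaction`, the sizes, and the seed are all hypothetical, and the one-hot input of length 2 × (number of KCs) is the encoding commonly used for DKT, assumed here rather than taken from the project.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_interaction(kc, correct, num_kcs):
    """One-hot of length 2 * num_kcs: slot kc if wrong, num_kcs + kc if correct."""
    x = [0.0] * (2 * num_kcs)
    x[kc + (num_kcs if correct else 0)] = 1.0
    return x


class TinyDKT:
    """One LSTM cell shared by all KCs, plus a per-KC sigmoid readout."""

    def __init__(self, num_kcs, hidden, seed=0):
        rng = random.Random(seed)
        self.num_kcs = num_kcs
        in_dim = 2 * num_kcs + hidden  # interaction one-hot + previous hidden state
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {gate: rand_matrix(hidden, in_dim, rng) for gate in "ifog"}
        self.W_out = rand_matrix(num_kcs, hidden, rng)
        self.h = [0.0] * hidden  # shared hidden vector
        self.c = [0.0] * hidden  # LSTM cell state

    def observe(self, kc, correct):
        """Fold one (KC, correctness) interaction into the shared state."""
        z = encode_interaction(kc, correct, self.num_kcs) + self.h
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]
        g = [math.tanh(a) for a in matvec(self.W["g"], z)]
        self.c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, self.c, i, g)]
        self.h = [oj * math.tanh(cj) for oj, cj in zip(o, self.c)]

    def predict(self):
        """Predicted probability of a correct answer on every KC."""
        return [sigmoid(a) for a in matvec(self.W_out, self.h)]


model = TinyDKT(num_kcs=3, hidden=4)
before = model.predict()            # all 0.5: the hidden state starts at zero
model.observe(kc=0, correct=False)  # one wrong answer on KC 0
after = model.predict()
print(before)
print(after)
```

With untrained weights the numbers mean nothing, but the structure is the point: a single answer on KC 0 moves the prediction for every KC, because they all read from the same hidden vector. Training is what turns that coupling into learned dependencies.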
Here's what we built, the design decision that almost made us reach for a transformer, and the student simulator we had to build first to test it without any real students.
What We Built
Two components that feed each other:
