Measuring learning: Bayesian Knowledge Tracing in practice
"The student finished the lesson" is not the same as "the student learned the lesson." Any serious learning platform has to confront this gap. At MiaCortex, our answer is Bayesian Knowledge Tracing (BKT), a model that has been used in intelligent tutoring systems since the 1990s and remains the workhorse of adaptive learning.
The model
BKT models each discrete skill as a hidden binary state: the student either knows it or does not. Every time the student attempts a problem that tests that skill, the model observes a noisy signal (correct or incorrect) and updates its belief about the hidden state.
The model has four parameters per skill:
- p(L0) — prior probability the student knew the skill before any practice.
- p(T) — probability the student transitions from not-knowing to knowing on a single practice attempt.
- p(G) — probability of guessing correctly despite not knowing.
- p(S) — probability of a careless slip despite knowing.
Given a sequence of observations, standard Bayesian updates produce a posterior estimate of p(knows the skill). When that posterior crosses a threshold (typically 0.95), we consider the skill mastered.
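The update is two applications of Bayes' rule (one per observation type) followed by the learning transition. Here is a minimal sketch in Python; the parameter values are illustrative defaults, not fitted values from our production system:

```python
def bkt_update(p_L, correct, p_T=0.1, p_G=0.2, p_S=0.1):
    """One BKT step: condition on the observation, then apply the learning transition."""
    if correct:
        # P(knows | correct): a correct answer comes from knowing without
        # slipping, or from not knowing and guessing.
        cond = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
    else:
        # P(knows | incorrect): an incorrect answer comes from a slip,
        # or from not knowing and failing to guess.
        cond = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
    # Learning transition: the student may move to the "knows" state
    # on this practice attempt.
    return cond + (1 - cond) * p_T

# Trace a practice sequence starting from the prior p(L0)
p = 0.3  # p(L0)
for obs in [True, True, False, True, True]:
    p = bkt_update(p, obs)

mastered = p >= 0.95  # the threshold mentioned above
```

Note how the incorrect answer in the middle of the sequence pulls the posterior down but does not reset it: the slip probability p(S) lets the model treat one wrong answer from a likely-knowing student as noise rather than evidence of ignorance.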
Why BKT and not a deep model
Deep Knowledge Tracing (DKT) and its successors outperform BKT on held-out next-problem prediction benchmarks. But next-problem prediction is a proxy task — what we actually care about is delivering useful recommendations to the student. For that, BKT has three virtues that matter more than raw accuracy:
- Interpretability — we can explain to an instructor why a student is being shown a specific problem.
- Calibration — BKT's posterior probabilities track reality, so a 0.8 really behaves like an 80% chance. DKT's probabilities are not well calibrated out of the box.
- Small-data stability — BKT works with dozens of observations per student. Deep models need thousands.
What we use it for
The mastery estimate drives two decisions:
- Problem selection — the next problem is one that targets the lowest-confidence skill in the student's current topic.
- Progress communication — the student sees a per-skill mastery meter, not just a count of problems solved.
The second one matters more than we expected. Students who see a per-skill meter behave differently than students who see a problem count: they go back and practice weak skills instead of chasing streaks.
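The selection rule reduces to an argmin over the per-skill posteriors. A minimal sketch, in which the skill names and the `problems_by_skill` lookup are hypothetical:

```python
def next_problem(mastery, problems_by_skill):
    """Serve a problem from the skill with the lowest mastery estimate."""
    weakest = min(mastery, key=mastery.get)  # lowest-confidence skill
    return problems_by_skill[weakest][0]

# Hypothetical per-skill posteriors for one student in one topic
mastery = {"loops": 0.91, "recursion": 0.42, "slicing": 0.77}
problems_by_skill = {
    "loops": ["sum-list"],
    "recursion": ["fib", "hanoi"],
    "slicing": ["reverse-string"],
}
next_problem(mastery, problems_by_skill)  # → "fib", a recursion problem
```

The same `mastery` dict drives the per-skill meter: each posterior maps directly to a bar the student can see, which is what makes the interpretability argument above concrete.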
Limitations
BKT assumes skills are independent. They are not. Sorting depends on comparisons; recursion depends on function calls. A more faithful model would be a Bayesian network over a skill graph. This is on our roadmap.
Want to discuss? Reach us at [email protected].