The problem with calling an LLM directly
NumPath's teacher dashboard generates per-student insights: one-sentence observations such as "Emma skips borrowing in 9 of 11 recent subtraction attempts", each paired with a suggested action. The obvious implementation is to import the Anthropic SDK, call messages.create(), and return the result.
That works until you need to test it. Or run it offline. Or swap providers. Or audit where the insight came from.
This post covers how NumPath abstracts the LLM behind a protocol interface, tests against a deterministic stub, and structures the insight pipeline so that the evidence is assembled from database reads rather than generated by the model.
The Protocol: 6 lines
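As a rough illustration of the idea, here is a minimal sketch of a protocol-based abstraction with a deterministic stub. It is not NumPath's actual code: the names InsightModel, StubModel, complete, and build_insight are hypothetical, and the assumption is that the pipeline only needs a single "prompt in, text out" method. The evidence sentence is built from counts that the caller has already read from the database, and the model is asked only for the suggested action.

```python
from typing import Protocol


class InsightModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str:
        ...


class StubModel:
    """Deterministic stand-in for tests: no network, same output every time."""

    def complete(self, prompt: str) -> str:
        return f"STUB: {prompt}"


def build_insight(model: InsightModel, student: str, skipped: int, attempts: int) -> dict:
    # Assumption: skipped and attempts were read from the database by the caller,
    # so the evidence can be audited without trusting the model.
    evidence = (
        f"{student} skips borrowing in {skipped} of {attempts} "
        "recent subtraction attempts"
    )
    # The model only phrases a suggestion; it never produces the evidence.
    suggestion = model.complete(f"Suggest one action for: {evidence}")
    return {"evidence": evidence, "suggestion": suggestion}


insight = build_insight(StubModel(), "Emma", 9, 11)
print(insight["evidence"])
```

Because typing.Protocol uses structural typing, StubModel does not need to inherit from InsightModel. A real provider wrapper would only need a complete method with the same signature to be swapped in.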






