Drift Detection for LLM Routing: Catching Silent Model Degradation

It's 2am and I am staring at a routing layer I spent weeks tuning, running a thought experiment that will not let me sleep. The router is doing exactly what I built it to do. Nothing in my code would change, nothing in my config would change, and yet I can see, plain as day, the night this system goes confidently, repeatedly wrong while every line of it stays correct. The failure is already baked in. I just have not been bitten by it yet.

The setup is simple. I route incoming tasks across four capabilities: a fast, cheap model, a slow, expensive one, a retrieval tool, and a code-execution agent. Each task goes to one of them, and I watch a single binary signal: did the output pass the quality gate or not? Run that for a few thousand calls and the policy converges: the weights stabilize and the dispatcher learns which arm wins. For a while, life is good. And that good stretch is exactly the trap.
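To make the setup concrete, here is a minimal sketch of such a dispatcher. It is an illustration under stated assumptions, not my production router: the epsilon-greedy policy, the arm names, the `Router` class, and the pass rates in `simulate` are all hypothetical stand-ins for "route, observe pass or fail, update the estimate."

```python
import random

# Hypothetical arm names standing in for the four capabilities.
ARMS = ["fast_cheap", "slow_expensive", "retrieval", "code_agent"]


class Router:
    """Epsilon-greedy dispatcher over lifetime pass rates (an assumed policy)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.wins = {arm: 0 for arm in self.arms}

    def estimate(self, arm):
        # Lifetime success rate: every call ever made counts equally.
        n = self.pulls[arm]
        return self.wins[arm] / n if n else 0.0

    def choose(self):
        # Try each arm once before trusting any estimate.
        untried = [arm for arm in self.arms if self.pulls[arm] == 0]
        if untried:
            return untried[0]
        # Explore occasionally, otherwise exploit the best-looking arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.estimate)

    def record(self, arm, passed):
        # The single binary signal: did the output pass the quality gate?
        self.pulls[arm] += 1
        self.wins[arm] += int(passed)


def simulate(true_pass_rates, calls=5000, seed=0):
    """Run the router against made-up, fixed pass rates and return it."""
    router = Router(true_pass_rates.keys(), seed=seed)
    world = random.Random(seed + 1)
    for _ in range(calls):
        arm = router.choose()
        router.record(arm, world.random() < true_pass_rates[arm])
    return router
```

Calling `simulate({"fast_cheap": 0.9, "slow_expensive": 0.6, "retrieval": 0.5, "code_agent": 0.4})` should leave most of the traffic on `fast_cheap`, which is the "life is good" phase. Note that `estimate` never forgets anything, and that detail matters for what comes next.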

Here is the scenario that keeps me up. The fast, cheap model gets silently updated by its vendor, and its accuracy on my tasks quietly collapses. My router has no idea. It is carrying a high historical success estimate for that arm, earned honestly over weeks of good performance, and it keeps routing there because three weeks ago that was the right call. The dispatcher is not broken. It is right about a world that no longer exists. It is wrong because it remembers too well.
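A little arithmetic shows how long "remembering too well" can last. The sketch below assumes the router keeps a plain lifetime average per arm; the figures (3000 good calls at a 90% pass rate, a collapse to 30%, a next-best arm at 70%) are invented for illustration, and both function names are hypothetical. It works in expected values with exact fractions rather than simulating random outcomes.

```python
from fractions import Fraction


def cumulative_estimate(good_calls, good_rate, bad_calls, bad_rate):
    """Expected lifetime pass rate after a good stretch followed by a bad one."""
    wins = good_calls * good_rate + bad_calls * bad_rate
    return wins / (good_calls + bad_calls)


def calls_until_overtaken(good_calls, good_rate, bad_rate, rival_rate):
    """Post-collapse calls needed before the stale arm stops looking better
    than a rival arm whose estimate sits at rival_rate."""
    bad_calls = 0
    while cumulative_estimate(good_calls, good_rate, bad_calls, bad_rate) > rival_rate:
        bad_calls += 1
    return bad_calls


# Illustrative numbers only (assumptions, not measurements).
good, collapsed, rival = Fraction(9, 10), Fraction(3, 10), Fraction(7, 10)

print(float(cumulative_estimate(3000, good, 500, collapsed)))  # about 0.814
print(calls_until_overtaken(3000, good, collapsed, rival))     # 1500
```

Under these assumptions, 500 calls after the collapse the arm still looks like an 81% performer, and it takes 1500 degraded calls before its lifetime average sinks to the rival's 70%. The longer the good history, the longer the router stays loyal to an arm that no longer deserves it.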
