Detecting Silent Model Failure: Drift Monitoring That Actually Works

TL;DR: Most drift monitoring setups alert on the wrong thing. Feature distribution drift is cheap to compute and almost always misleading. Prediction drift plus a delayed ground-truth feedback loop catches the failures that actually cost money. Here is the setup I use at Yokoy.

A model that returns HTTP 200 with a plausible-looking float is the worst kind of broken. No exception, no pager, no Slack message. The metric only moves three weeks later when finance reviews the numbers.

I have spent the last two years rebuilding the monitoring story for our expense classification models. What follows is what I kept after throwing out the rest.

The mistake I keep seeing

Teams instrument input feature drift first because it is the easiest thing to compute. Pull yesterday's feature values, pull today's, run a KS test on each column, alert when p < 0.05.
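
For concreteness, here is a minimal sketch of that naive per-column check. It uses only the Python standard library. The column names, the dict-of-lists layout, and the function names are placeholders for illustration, not production code. The p-value comes from the asymptotic Kolmogorov series, which is an approximation and is rough for small samples.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test.

    D is the largest gap between the two empirical CDFs.
    p is an asymptotic approximation, not an exact value.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # The ECDF gap can only peak at an observed value, so check each one.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)

    # Effective sample size with the usual small-sample correction.
    ne = math.sqrt(n * m / (n + m))
    lam = (ne + 0.12 + 0.11 / ne) * d

    # For small lam the series converges slowly and p is effectively 1.
    if lam < 0.2:
        return d, 1.0

    # Alternating Kolmogorov series: 2 * sum (-1)^(k-1) * exp(-2 k^2 lam^2).
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def drifted_columns(reference, current, alpha=0.05):
    """Flag columns whose KS p-value falls below alpha.

    reference and current map a column name to a list of numeric values
    (an assumed layout, e.g. yesterday's and today's feature values).
    """
    flagged = {}
    for col, ref_values in reference.items():
        d, p = ks_two_sample(ref_values, current[col])
        if p < alpha:
            flagged[col] = (d, p)
    return flagged


if __name__ == "__main__":
    yesterday = {"amount": [float(i) for i in range(100)]}
    today = {"amount": [float(i) + 1000.0 for i in range(100)]}
    print(drifted_columns(yesterday, today))
```

Note that with a 0.05 threshold, each column tested on undrifted data still has roughly a 5% chance of alerting by construction, so the number of alerts grows with the number of columns checked.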
