TL;DR: Most drift monitoring setups alert on the wrong thing. Feature distribution drift is cheap to compute and almost always misleading. Prediction drift plus a delayed ground-truth feedback loop catches the failures that actually cost money. Here is the setup I use at Yokoy.

A model that returns HTTP 200 with a plausible-looking float is the worst kind of broken. No exception, no pager, no Slack message. The metric moves only three weeks later, when finance reviews the numbers.

I have spent the last two years rebuilding the monitoring story for our expense classification models. What follows is what I kept after throwing out the rest.

The mistake I keep seeing

Teams instrument input feature drift first because it is the easiest thing to compute. Pull yesterday's feature values, pull today's, run a KS test on each column, alert when p < 0.05.
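
For concreteness, here is roughly what that naive setup looks like. This is a minimal sketch of the pattern described above, not our production code: the function names, the dict-of-columns layout, and the example column names are all assumptions made for illustration. The KS test is written out with the standard library (asymptotic p-value) so the snippet runs on its own; in practice you would more likely call `scipy.stats.ks_2samp`.

```python
import bisect
import math


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic with an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)

    # D is the largest gap between the two empirical CDFs. Both are step
    # functions, so checking every observed value is sufficient.
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / n
        cdf_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))

    # Asymptotic Kolmogorov distribution. For very small lam the series
    # converges slowly and the true p-value is 1 to machine precision.
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


def naive_feature_drift_alerts(yesterday, today, alpha=0.05):
    """Return the columns whose KS p-value falls below alpha.

    `yesterday` and `today` are hypothetical dicts mapping a column name
    to a list of numeric feature values for that day.
    """
    alerts = []
    for column, old_values in yesterday.items():
        _, p_value = ks_two_sample(old_values, today[column])
        if p_value < alpha:
            alerts.append(column)
    return alerts


if __name__ == "__main__":
    # Made-up data: "amount" shifts completely, "n_items" does not change.
    yesterday = {
        "amount": [float(i) for i in range(100)],
        "n_items": [float(i % 7) for i in range(100)],
    }
    today = {
        "amount": [float(i) + 100.0 for i in range(100)],
        "n_items": [float(i % 7) for i in range(100)],
    }
    print(naive_feature_drift_alerts(yesterday, today))
```

Note that nothing in this loop looks at the model's predictions or at ground truth. It only compares yesterday's inputs with today's, column by column.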