TL;DR: Our metrics bill went 6x in a single month. Traffic was flat. One Prometheus label carrying per-build IDs spawned millions of time series, and the backend charges by active series. Here's how we caught it and the label rules we run now so it doesn't happen again.
The bill, not the traffic
I'm on the infra team at Buildkite. We run a fairly chunky Prometheus setup feeding a managed backend, and one Monday the monthly estimate had quietly gone from about $1,800 to a touch over $11k. Nobody shipped more traffic. Build volume was the same 40k-ish builds a day it'd been for weeks.
So it wasn't load. It was series count. Active series had climbed from roughly 1.2 million to nearly 9 million, and the backend prices on active series, not on request volume. That's the trap most people fall into the first time.
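To make the load-versus-series distinction concrete, here's a minimal sketch in plain Python. It isn't our exporter code. The metric name `build_duration_seconds` and the labels `pipeline`, `status`, and `build_id` are made up for illustration. It emits the same number of samples twice, once with a bounded label set and once with a per-build ID attached, then counts distinct series the way a series-priced backend would: one per unique combination of metric name and label values.

```python
def count_series(samples):
    """Count distinct series: one per unique (metric name, label set) pair."""
    return len({(name, frozenset(labels.items())) for name, labels in samples})


def emit(builds, with_build_id):
    """Emit one sample per build. All names here are hypothetical."""
    samples = []
    for build in range(builds):
        labels = {
            "pipeline": f"p{build % 5}",                       # 5 possible values
            "status": "passed" if build % 2 else "failed",     # 2 possible values
        }
        if with_build_id:
            labels["build_id"] = str(build)                    # one new value per build
        samples.append(("build_duration_seconds", labels))
    return samples


bounded = emit(1000, with_build_id=False)
unbounded = emit(1000, with_build_id=True)

print(len(bounded), len(unbounded))   # same sample volume: 1000 1000
print(count_series(bounded))          # 10 (5 pipelines x 2 statuses)
print(count_series(unbounded))        # 1000 (one series per build)
```

Same sample volume both times, but the second run produces a new series for every build. Scale that toy up to 40k-ish builds a day and the series count climbs while the traffic graph stays flat.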
What cardinality actually is
