Your AI bill, minus the AI you've already paid for

If you watch your AI API spend for a week, two things become obvious. The first is that costs scale linearly with traffic, which everyone expects. The second is that a lot of that traffic is the same traffic. The same support question, asked by ten different users in slightly different words. The same system prompt, prepended to every single request. The same internal tool query, run on a cron every five minutes. You're paying full price for the model to compute the same answer it computed yesterday.

Caching is the difference between paying for an AI call once and paying for it every time. It's not a clever optimization — it's the bare minimum. The interesting question is which caching, with what tradeoffs, at what layers.

Three layers, three tradeoffs

There's no single "AI cache." There are three layers that solve overlapping but distinct problems.

Exact match. If the request you're making has been made before, byte-identical, return the previous response. This is how etag works on the web. It's instant, free, and never wrong. The catch is that it almost never hits in production AI traffic — even tiny variations like a different timestamp in the prompt, a user-specific name, or a re-ordered tool list make the requests non-identical. Exact-match cache hit rates in real-world AI APIs are typically in the 5–15% range.

Three layers, three tradeoffs

There's no single "AI cache." There are three layers that solve overlapping but distinct problems.

Your AI bill, minus the AI you've already paid for

Your AI bill, minus the AI you've already paid for

Related reading

AI API cost control is a routing problem, not a pricing spreadsheet

I Cut My AI API Bill from $420 to $28/Month — Here's Exactly How

AI Agent Cost Is a Runtime Signal | Focused Labs

How to Track AI Usage Without Losing Revenue (Complete Guide)

Nobody keeps the receipts for AI pricing, so I built the changelog

Proxy, Gateway, or Poll the Usage API? Picking an Architecture for AI Cost…

Related reading

AI API cost control is a routing problem, not a pricing spreadsheet

I Cut My AI API Bill from $420 to $28/Month — Here's Exactly How

AI Agent Cost Is a Runtime Signal | Focused Labs

How to Track AI Usage Without Losing Revenue (Complete Guide)

Nobody keeps the receipts for AI pricing, so I built the changelog

Proxy, Gateway, or Poll the Usage API? Picking an Architecture for AI Cost…