At some point your AI bill stops being a rounding error and someone asks the obvious question: who spent what, on which model, doing what? Answering it means putting something between your developers and the providers — or putting something next to the providers. There are three common shapes, and the choice has real consequences for latency, failure modes, and what data you end up holding. Most teams pick one by accident and regret it later. Here's how to pick one on purpose.
The three shapes
1. Inline proxy. You stand up a service that every LLM request flows through. It forwards to OpenAI/Anthropic/OpenRouter, reads the response, records tokens and cost, and returns the completion. LiteLLM-style gateways do this.
2. SDK / wrapper instrumentation. You wrap the client library so each call emits a metric before returning. No separate network hop, but every call site has to use your wrapper.
3. Usage-API polling. You don't touch the request path at all. Instead you periodically read the provider's own metering API (the usage and activity endpoints most providers already expose) and reconstruct who-spent-what from data the platform computed for you.
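To make the difference concrete, here is a minimal Python sketch of where the metering happens in each shape. Everything in it is an assumption made for illustration: `fake_provider`, the `Usage` record, the `PRICES` table, and the response fields are hypothetical stand-ins, not any provider's real API or pricing. Shapes 1 and 2 both record usage on the request path (the proxy does it in a separate service, the wrapper does it in-process), while shape 3 only aggregates rows it fetched after the fact.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass
class Usage:
    user: str
    model: str
    input_tokens: int
    output_tokens: int

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {"example-model": (1.00, 2.00)}

def cost_usd(u: Usage) -> float:
    """Price one usage record with the hypothetical table above."""
    input_price, output_price = PRICES[u.model]
    return (u.input_tokens * input_price + u.output_tokens * output_price) / 1_000_000

def fake_provider(model: str, prompt: str) -> dict:
    """Stand-in for a real provider call; the response shape is an assumption."""
    return {"text": "ok", "input_tokens": 1000, "output_tokens": 500}

def metered(call: Callable[[str, str], dict], ledger: list, user: str):
    """Shapes 1 and 2: record usage on the request path.

    An inline proxy does this in a separate service; an SDK wrapper does it
    in-process. Either way, the call and the bookkeeping are coupled.
    """
    def wrapper(model: str, prompt: str) -> dict:
        response = call(model, prompt)
        ledger.append(Usage(user, model,
                            response["input_tokens"],
                            response["output_tokens"]))
        return response
    return wrapper

def summarize(records: list) -> dict:
    """Shape 3: aggregate records without touching the request path.

    In a real poller, `records` would come from the provider's usage
    endpoint on a schedule; here they are whatever list is passed in.
    """
    totals = defaultdict(float)
    for u in records:
        totals[(u.user, u.model)] += cost_usd(u)
    return dict(totals)

# Usage: two wrapped calls, then one aggregation pass over the ledger.
ledger = []
ask = metered(fake_provider, ledger, user="alice")
ask("example-model", "hello")
ask("example-model", "hello again")
totals = summarize(ledger)
print(totals)
```

Note that `summarize` does not care where its records came from. In the polling shape, the same aggregation runs over rows the provider already computed, which is why nothing has to sit in front of the developers' requests.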








