Proxy, Gateway, or Poll the Usage API? Picking an Architecture for AI Cost Visibility

At some point your AI bill stops being a rounding error and someone asks the obvious question: who spent what, on which model, doing what? Answering it means putting something between your developers and the providers — or putting something next to the providers. There are three common shapes, and the choice has real consequences for latency, failure modes, and what data you end up holding. Most teams pick one by accident and regret it later. Here's how to pick one on purpose.

The three shapes

1. Inline proxy. You stand up a service that every LLM request flows through. It forwards to OpenAI/Anthropic/OpenRouter, reads the response, records tokens and cost, and returns the completion. LiteLLM-style gateways do this.

2. SDK / wrapper instrumentation. You wrap the client library so each call emits a metric before returning. No separate network hop, but every call site has to use your wrapper.

3. Usage-API polling. You don't touch the request path at all. Instead, you periodically read the provider's own metering API (the usage and activity endpoints most providers already expose) and reconstruct who spent what from data the platform has already computed for you.
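To make the difference concrete, here is a minimal Python sketch of where the accounting happens in shapes 2 and 3. Everything in it is an assumption for illustration: the `Usage` record, the `PRICES` table and its numbers, the `instrumented` wrapper, and the `fake_call` stand-in are hypothetical and do not correspond to any provider's SDK or usage endpoint. An inline proxy (shape 1) would do the same bookkeeping as the wrapper, just in a separate service that every request passes through.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical USD prices per million tokens: (input, output).
# Real prices differ by provider and model and change over time.
PRICES = {"model-a": (3.00, 15.00)}


@dataclass
class Usage:
    team: str
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(u: Usage) -> float:
    """Price one usage record using the hypothetical PRICES table."""
    price_in, price_out = PRICES[u.model]
    return (u.input_tokens * price_in + u.output_tokens * price_out) / 1_000_000


def instrumented(call: Callable[[str, str], dict], team: str, sink: list) -> Callable[[str, str], dict]:
    """Shape 2: wrap a client call so it records usage before returning."""
    def wrapper(model: str, prompt: str) -> dict:
        response = call(model, prompt)
        sink.append(Usage(team, model, response["input_tokens"], response["output_tokens"]))
        return response
    return wrapper


def spend_by_team(records: list) -> dict:
    """Shape 3: aggregate usage rows that were fetched out of band."""
    totals = defaultdict(float)
    for u in records:
        totals[u.team] += cost_usd(u)
    return dict(totals)


def fake_call(model: str, prompt: str) -> dict:
    """Stand-in for a provider client; returns fixed token counts."""
    return {"text": "ok", "input_tokens": 1000, "output_tokens": 500}
```

In the wrapper case the `sink` list fills up as calls are made, so attribution is only as complete as wrapper adoption. In the polling case the same `spend_by_team` aggregation would run over rows pulled from the provider on a schedule, so nothing sits in the request path, but the data is only as fresh and as granular as what the provider reports.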
