Most teams I talk to run every LLM call through one frontier model, then act surprised when the invoice shows up. We did the same thing for a while. The fix that actually moved our bill was boring: route each request to the cheapest model that can still do the job. Here is the math and how we set it up.
The price spread is bigger than people assume
If you line up current API pricing across providers, the per-token price gap between budget and frontier models is roughly 50x. Output tokens also cost more than input tokens, usually 4-6x more, which matters a lot if your app generates long responses.
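To make that spread concrete, here is a small sketch of the per-request arithmetic. The prices are made-up placeholders, not any provider's real rates. I picked them only so they match the ratios above: frontier at 50x budget, and output at 5x input. `Price`, `request_cost`, and `monthly_cost` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Per-token pricing, in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical prices chosen to match the ratios in the text:
# output = 5x input, frontier = 50x budget. Not real provider rates.
BUDGET = Price(input_per_mtok=0.10, output_per_mtok=0.50)
FRONTIER = Price(input_per_mtok=5.00, output_per_mtok=25.00)


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    total = (input_tokens * price.input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / 1_000_000


def monthly_cost(price: Price, requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of `requests` identical requests."""
    return requests * request_cost(price, input_tokens, output_tokens)


if __name__ == "__main__":
    # Assumed workload: 1,000 input and 500 output tokens per request.
    for label, price in (("budget", BUDGET), ("frontier", FRONTIER)):
        per_request = request_cost(price, 1_000, 500)
        per_month = monthly_cost(price, 1_000_000, 1_000, 500)
        print(f"{label}: ${per_request:.5f}/request, ${per_month:,.2f}/month")
```

With these placeholder numbers, a request with 1,000 input tokens and 500 output tokens costs $0.0175 at the frontier price and $0.00035 at the budget price. Even though the response is half the length of the prompt, output tokens account for most of the bill. Swap in your own prices and token counts before drawing conclusions.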
So the question is not "which model is best." It is "which model is good enough for this request, at what cost." For a support reply, a classification, or a short summary, a mid-tier model often produces output you cannot distinguish from the frontier one in a blind test. You are paying frontier prices for work a cheaper model finishes fine.
What routing looks like in practice
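As a rough illustration of "the cheapest model that can still do the job", here is a minimal sketch of a static router. It is not our production setup. The tier names, prices, and the task-to-tier table are all hypothetical, and in a real system that table should come from your own blind tests rather than a hardcoded guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str               # hypothetical model identifier
    output_per_mtok: float  # hypothetical USD per million output tokens
    tasks: frozenset        # task types this tier is trusted to handle


# Assumed capability table. Fill this in from your own evaluations.
TIERS = [
    Tier("budget-small", 0.50, frozenset({"classification"})),
    Tier("mid-tier", 5.00,
         frozenset({"classification", "support_reply", "short_summary"})),
    Tier("frontier-large", 25.00, frozenset()),  # used only as the fallback
]


def route(task_type: str) -> str:
    """Return the cheapest tier trusted with this task type.

    Unknown task types fall back to the most expensive tier, so a
    missing table entry costs money rather than quality.
    """
    for tier in sorted(TIERS, key=lambda t: t.output_per_mtok):
        if task_type in tier.tasks:
            return tier.name
    return max(TIERS, key=lambda t: t.output_per_mtok).name


if __name__ == "__main__":
    for task in ("classification", "support_reply", "contract_review"):
        print(task, "->", route(task))
```

Here `route("classification")` picks `budget-small`, `route("support_reply")` picks `mid-tier`, and anything not in the table goes to `frontier-large`. Falling back to the expensive tier is a deliberate choice: a gap in the table shows up on the invoice, where you will notice it, instead of showing up as quietly worse answers.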






