Cutting our LLM bill ~80% with model routing: the actual cost math

Most teams I talk to run every LLM call through one frontier model, then act surprised when the invoice shows up. We did the same thing for a while. The fix that actually moved our bill was boring: route each request to the cheapest model that can still do the job. Here is the math and how we set it up.

The price spread is bigger than people assume

If you line up current API pricing across providers, the per-token gap between budget and frontier models is roughly 50x. Output tokens also cost more than input tokens, usually 4-6x more, which matters a lot if your app generates long responses.
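To make that spread concrete, here is a minimal sketch of the per-request arithmetic. The prices are made-up placeholders chosen to match the ratios above (frontier at 50x budget, output at 5x input), not any provider's real rates, and `Price`, `request_cost`, and `output_share` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens. All figures in this sketch are hypothetical."""
    input_per_m: float
    output_per_m: float


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request: each token count times its own rate."""
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


def output_share(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the request cost that comes from output tokens."""
    output_cost = output_tokens * price.output_per_m / 1_000_000
    return output_cost / request_cost(price, input_tokens, output_tokens)


# Hypothetical prices: frontier is 50x budget, output is 5x input.
BUDGET = Price(input_per_m=0.30, output_per_m=1.50)
FRONTIER = Price(input_per_m=15.00, output_per_m=75.00)

if __name__ == "__main__":
    # Same 1,000-token prompt, short versus long response.
    for label, out_tokens in [("short reply", 200), ("long reply", 2_000)]:
        cheap = request_cost(BUDGET, 1_000, out_tokens)
        pricey = request_cost(FRONTIER, 1_000, out_tokens)
        share = output_share(FRONTIER, 1_000, out_tokens)
        print(f"{label}: budget ${cheap:.5f}, frontier ${pricey:.5f}, "
              f"output share {share:.0%}")
```

With a fixed prompt size, the longer the response, the more of the bill comes from output tokens, which is why the output multiplier matters so much for generation-heavy apps.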

So the question is not "which model is best." It is "which model is good enough for this request, at what cost." For a support reply, a classification, or a short summary, a mid-tier model often produces output you cannot distinguish from the frontier one in a blind test. You are paying frontier prices for work a cheaper model finishes fine.
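One way to express "good enough for this request, at what cost" in code is a lookup that picks the cheapest model meeting a minimum capability tier per task type. This is a sketch under stated assumptions, not our production router: the model names, tiers, prices, task labels, and traffic mix are all hypothetical, and the tier thresholds are the part you would set from your own blind tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int          # capability tier, higher is more capable (hypothetical scale)
    cost_per_m: float  # blended USD per 1M tokens (hypothetical)


# Hypothetical catalogue; real names and prices will differ.
MODELS = [
    Model("budget-model", tier=1, cost_per_m=1.0),
    Model("mid-model", tier=2, cost_per_m=8.0),
    Model("frontier-model", tier=3, cost_per_m=50.0),
]

# Minimum tier judged "good enough" per task type (hypothetical thresholds).
REQUIRED_TIER = {
    "classification": 1,
    "short_summary": 2,
    "support_reply": 2,
    "complex_reasoning": 3,
}


def route(task_type: str) -> Model:
    """Return the cheapest model whose tier meets the task's requirement.

    Unknown task types fall back to the highest tier, so a missing label
    costs money rather than quality.
    """
    top_tier = max(m.tier for m in MODELS)
    required = REQUIRED_TIER.get(task_type, top_tier)
    candidates = [m for m in MODELS if m.tier >= required]
    return min(candidates, key=lambda m: m.cost_per_m)


def blended_cost(traffic_mix: dict[str, float]) -> float:
    """Average cost per 1M tokens when each task type is routed, weighted by share."""
    return sum(share * route(task).cost_per_m for task, share in traffic_mix.items())


if __name__ == "__main__":
    # Hypothetical traffic mix; shares sum to 1.0.
    mix = {
        "classification": 0.4,
        "support_reply": 0.3,
        "short_summary": 0.2,
        "complex_reasoning": 0.1,
    }
    routed = blended_cost(mix)
    all_frontier = max(m.cost_per_m for m in MODELS)
    print(f"routed: ${routed:.2f}/M, all-frontier: ${all_frontier:.2f}/M, "
          f"saving: {1 - routed / all_frontier:.0%}")
```

The saving depends entirely on how much of your traffic can be served below the frontier tier, so measure your own mix before trusting any headline percentage.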

What routing looks like in practice

