We Cut Our LLM API Bill 30% With Four Lines of YAML

Our gateway handles a few thousand LLM calls per hour. Mostly internal tools, some customer-facing agents. We noticed something in the logs: a lot of prompts were basically the same question worded differently.

"Summarize this quarterly report" and "give me a summary of the Q2 report" hitting the same model, getting nearly identical responses, costing us twice. Multiply that across a few hundred users and it adds up fast.

The math on duplicate calls

Quick back-of-envelope. GPT-4o runs \$2.50 per million input tokens and \$10 per million output tokens. Claude Sonnet is \$3/\$15. A typical summarization request with context is maybe 2K input tokens and 500 output. That's roughly \$0.01 per call on GPT-4o: \$0.005 for input plus \$0.005 for output.

Doesn't sound like much until you're doing 50K calls a day and 30-40% of them are semantically identical. That's \$100+/day in duplicate spend, or \$3K+ a month, for responses you already generated.
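
To make the arithmetic easy to rerun with your own numbers, here is a minimal sketch in Python. The prices, token counts, call volume, and duplicate rates are the ones quoted above. The function names and the 30-day month are assumptions made for illustration, not part of our gateway.

```python
# Prices in USD per million tokens, as quoted in the text above.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given its input and output token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def duplicate_spend_per_day(
    model: str,
    calls_per_day: int,
    duplicate_rate: float,
    input_tokens: int = 2_000,  # "maybe 2K input tokens"
    output_tokens: int = 500,   # "and 500 output"
) -> float:
    """Daily USD spent on calls whose answer had already been generated."""
    per_call = cost_per_call(model, input_tokens, output_tokens)
    return calls_per_day * duplicate_rate * per_call


if __name__ == "__main__":
    for rate in (0.30, 0.40):
        daily = duplicate_spend_per_day("gpt-4o", 50_000, rate)
        # Assumes a 30-day month for the monthly figure.
        print(f"{rate:.0%} duplicates: ${daily:,.0f}/day, ${daily * 30:,.0f}/month")
```

With these inputs a GPT-4o call works out to about \$0.01, and the duplicate share of 50K daily calls lands somewhere around \$150 to \$200 a day. Swap in your own token counts and duplicate rate before trusting the total; the result is very sensitive to both.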
