How We Reduced LLM Costs by 95%: Cache + Batch + Cascade in PHP

At Alesta WEB we build news software: a content platform used by more than 200 publishers. Once we wired language models into the newsroom workflow (headline suggestions, summaries, SEO fields, tag extraction, draft scaffolding), something predictable happened: the AI bill started growing faster than the feature list.

The naive version of "add AI" is a thin wrapper around one expensive frontier model, called fresh on every request. It works in a demo. In production, across thousands of articles a day, it's a slow way to set money on fire.

This is the architecture we settled on after eighteen months of running it. Three layers — cache, batch, cascade — plus the quality gates that make the cheap layers safe to rely on. The result was roughly a 95% reduction in per-task cost versus the naive "frontier-model-only, no cache" baseline, with no measurable drop in editorial quality.

The code is PHP, because the platform is PHP. The ideas are language-agnostic.
