How We Reduced LLM Costs by 95%: Cache + Batch + Cascade in PHP

We build news software — a content platform used by more than 200 publishers at Alesta WEB. Once we wired language models into the newsroom workflow (headline suggestions, summaries, SEO fields, tag extraction, draft scaffolding), something predictable happened: the AI bill started growing faster than the feature list.

The naive version of "add AI" is a thin wrapper around one expensive frontier model, called fresh on every request. It works in a demo. In production, across thousands of articles a day, it's a slow way to set money on fire.

This is the architecture we settled on after eighteen months of running it. Three layers — cache, batch, cascade — plus the quality gates that make the cheap layers safe to rely on. The result was roughly a 95% reduction in per-task cost versus the naive "frontier-model-only, no cache" baseline, with no measurable drop in editorial quality.

The code is PHP, because the platform is PHP. The ideas are language-agnostic.