TL;DR

Semantic caching cuts LLM API costs by 60% when the hit rate exceeds 30%, because different phrasings of the same query retrieve cached responses through embedding similarity. For a tech manager: AI spend is dominated by redundant input tokens (system prompt, documents); semantic caching is the main lever, not output compression or model switching.

At some point in most of our production AI projects, someone looks at the monthly API bill and asks whether we can do something about it. The answer is always yes — but the specific answers vary a lot depending on what you are actually spending the money on.

This post covers the techniques that moved the needle for us, in rough order of impact. Some of these are obvious in retrospect. A few took longer than they should have to figure out.

Where the money actually goes

Before optimising anything, you need to know what is driving your costs. LLM API pricing is based on tokens — input tokens and output tokens, usually priced differently, with output tokens costing more.
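To make the pricing model concrete, here is a minimal sketch of the per-request arithmetic. The `Pricing` class, the `request_cost` helper, and the prices themselves are placeholders we made up for illustration, not any provider's real rates; substitute the numbers from your own price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per one million tokens (not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: each token type is billed at its own rate."""
    billed = (input_tokens * pricing.input_per_mtok
              + output_tokens * pricing.output_per_mtok)
    return billed / 1_000_000


# Assumed example: output tokens cost 4x input tokens.
PRICING = Pricing(input_per_mtok=1.0, output_per_mtok=4.0)

# A request with a long prompt and a short answer.
cost = request_cost(input_tokens=3000, output_tokens=200, pricing=PRICING)
input_share = (3000 * PRICING.input_per_mtok / 1_000_000) / cost

print(f"cost per request: ${cost:.4f}")
print(f"share of cost from input tokens: {input_share:.0%}")
```

Even with output priced several times higher per token in this made-up example, the long prompt accounts for most of the cost of the request.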

In most production systems we have built, the cost breakdown looks something like this: a large fraction of input tokens is repetitive context (the same system prompt, the same retrieved documents, the same few-shot examples) sent with every request. Output token counts are often smaller than people expect, because most real-world tasks involve classification, extraction, or short-form generation rather than long prose.
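One rough way to check this on your own traffic is to measure how many input tokens belong to segments that have already been sent before. The sketch below assumes each logged request is stored as a list of prompt segments (system prompt, documents, user query), and it counts whitespace-separated words as a crude stand-in for a real tokenizer. Both the log shape and the `repeated_input_share` helper are assumptions for illustration.

```python
import hashlib


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def repeated_input_share(requests: list[list[str]]) -> float:
    """Fraction of input tokens in segments already seen earlier in the log."""
    seen: set[str] = set()
    total = 0
    repeated = 0
    for segments in requests:
        for segment in segments:
            n = approx_tokens(segment)
            digest = hashlib.sha256(segment.encode("utf-8")).hexdigest()
            total += n
            if digest in seen:
                repeated += n  # identical text was sent before
            seen.add(digest)
    return repeated / total if total else 0.0


# Hypothetical log: the same system prompt sent with three different queries.
SYSTEM = "You are a support assistant. Answer briefly."
log = [
    [SYSTEM, "Where is my order?"],
    [SYSTEM, "How do I reset my password?"],
    [SYSTEM, "Cancel my subscription"],
]

share = repeated_input_share(log)
print(f"repeated input share: {share:.1%}")
```

On real logs, swap `approx_tokens` for the tokenizer that matches your model; the word count here only gives a ballpark figure.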

dev.to

How We Reduced Our LLM API Costs by 60%: What Actually Worked
