TL;DRAI

Anthropic cache_control markers enable prompt caching at 1.25× (5-min write), 0.10× per read; 2–3 cache hits yield 90% input savings on stable prefixes. For services reusing system prompts and context, hit frequency determines TTL choice: minute-scale traffic → 5-min, sparse (10–30 min) → 1-hour. Primary API cost lever.

The parent guide AI API caching covers the broader caching strategy; this article goes one level into Anthropic's specific implementation.

What it caches and why

Prompt caching is provider-side prefix-attention caching. When you send a request to Anthropic with cache_control: { type: "ephemeral" } on part of the prompt, Anthropic hashes the leading content up to that marker, checks an internal cache, and serves the cached attention state if a match exists. The actual model run still happens — Claude still generates the response token-by-token — but the expensive prefix-attention computation is skipped.

The "cache" here is not the response. It's the work the model does to encode the static context into the model's internal representation. Most production LLM workloads carry a long stable prefix (system prompt + retrieved context + tool definitions) followed by a short variable suffix (the user message). Re-encoding the stable prefix on every request is wasted compute. Anthropic charges less for the cached portion because it's doing less work.

The pricing math

dev.to

Anthropic prompt caching, explained: cache_control markers, the two-tier write premium, and when it actually pays off

How Anthropic's prompt cache works mechanically — the ephemeral cache_control marker, the two-tier write premium (1.25x for 5-min TTL, 2x fo

domenica 14 giugno 2026 New tab

TL;DRAI

2,559 words~12 min read

The parent guide AI API caching covers the broader caching strategy; this article goes one level into Anthropic's specific implementation.

What it caches and why

The pricing math

Anthropic prompt caching, explained: cache_control markers, the two-tier write premium, and when it actually pays off

Anthropic prompt caching, explained: cache_control markers, the two-tier write premium, and when it actually pays off

Related reading

5 Anthropic Prompt Caching Patterns That Cut My API Bill 70%

We Measured LLM Prompt Caching in Production — Same Prompt, 0% to 91% Hit Rates

Claude Prompt Caching: How to Cut API Costs (2026)

LLM Prompt Caching: The Complete 2026 Guide

Why your Anthropic prompt caching probably isn't working (and the npm package I…

Claude Code cache confusion as Anthropic tweaks defaults, but quotas still drain

Related reading

5 Anthropic Prompt Caching Patterns That Cut My API Bill 70%

We Measured LLM Prompt Caching in Production — Same Prompt, 0% to 91% Hit Rates

Claude Prompt Caching: How to Cut API Costs (2026)

LLM Prompt Caching: The Complete 2026 Guide

Why your Anthropic prompt caching probably isn't working (and the npm package I…

Claude Code cache confusion as Anthropic tweaks defaults, but quotas still drain