Claude Code Costs, Act II — Where the big hidden costs are

A single-model session that stays well-cached is cheap. The biggest swing in a multi-model bill comes from one move — switching models — because the prompt cache belongs to a single model. The instant you switch, the cache you already paid for is thrown away.

Whether that helps or hurts comes down to how you switch:

Sticky — keep a whole conversation (or a sub-agent) on one model. The cheaper model builds and reuses its own warm cache, so the bill goes down.

Per-turn bouncing — flip between models mid-conversation. It's not a full cold rebuild each time: only the first call to each model is fully cold, and returning to one you've already used re-reads that model's own warm prefix and writes only the catch-up since you left (so long as you're back within the ~20-block / 1-hour reuse window — see first call cold, then warm catch-up below). But you're now keeping two caches warm and paying a catch-up write on every flip, so at the 2× write rate it can still cost more than never switching.

Our 25-turn run shows both ends: bouncing 20% of turns to Sonnet lost ~2%, while keeping the whole run on Sonnet saved ~53% (on Haiku, ~85%) — see Act III. So the goal isn't to avoid switching; it's to switch the right way. The costly version is the one that sneaks in by accident: a router that picks a model per request, or a manual swap partway through a conversation.

Whether that helps or hurts comes down to how you switch:

Sticky — keep a whole conversation (or a sub-agent) on one model. The cheaper model builds and reuses its own warm cache, so the bill goes down.

Claude Code Costs, Act II — Where the big hidden costs are

Claude Code Costs, Act II — Where the big hidden costs are

Related reading

Claude Code Costs, Act III — The ecosystem of options for spending less

I ran a single Claude Code session for 1,270 turns. It cost $1,278. Here's the…

Claude Code Costs, Act I — How the billing actually works

The Claude Code cost formula: why the same session can cost 10x more tomorrow

Adapting Your Claude Code Workflow by Subscription: Pro, Max $100, Max $200

Prompt caching vs the long LLM conversation: where your input bill actually…

Related reading

Claude Code Costs, Act III — The ecosystem of options for spending less

I ran a single Claude Code session for 1,270 turns. It cost $1,278. Here's the…

Claude Code Costs, Act I — How the billing actually works

The Claude Code cost formula: why the same session can cost 10x more tomorrow

Adapting Your Claude Code Workflow by Subscription: Pro, Max $100, Max $200

Prompt caching vs the long LLM conversation: where your input bill actually…