A single-model session that stays well-cached is cheap. The biggest swing in a multi-model bill comes from one move, switching models, because a prompt cache belongs to a single model. The instant you switch, the new model cannot read the cache you already paid for and has to build its own.
Whether that helps or hurts comes down to how you switch:
Sticky: keep a whole conversation (or a sub-agent) on one model. The cheaper model builds and reuses its own warm cache, so the bill goes down.
Per-turn bouncing: flip between models mid-conversation. This is not a full cold rebuild each time. Only the first call to each model is fully cold; returning to a model you've already used re-reads that model's own warm prefix and writes only the catch-up since you left, so long as you're back within the ~20-block / 1-hour reuse window (see "first call cold, then warm catch-up" below). But you're now keeping two caches warm and paying a catch-up write on every flip, so at the 2× write rate it can still cost more than never switching.
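To make the two patterns concrete, here is a minimal sketch of the accounting, not a reproduction of any measured run. It assumes each turn adds a fixed number of input tokens, that a call re-reads whatever prefix that model already cached and writes the rest, and that caches never expire. The price table, the 0.1× read multiplier, and the function and model names are all illustrative assumptions; only the 2× write rate comes from the text above. Output tokens are ignored.

```python
from typing import Dict, List

# Assumed multipliers on the base input price (illustrative, not measured).
CACHE_READ_MULT = 0.1   # re-reading a prefix this model already cached
CACHE_WRITE_MULT = 2.0  # writing new tokens into the cache (the 2x write rate)

# Hypothetical relative input prices per token; only the ratio matters here.
PRICE: Dict[str, float] = {"premium": 1.0, "cheap": 0.6}


def session_input_cost(schedule: List[str], tokens_per_turn: int = 1000) -> float:
    """Input-side cost of a conversation where schedule[i] names the model for turn i."""
    prefix = 0                    # total conversation tokens so far
    cached: Dict[str, int] = {}   # per-model length of the prefix it has cached
    total = 0.0
    for model in schedule:
        prefix += tokens_per_turn
        warm = cached.get(model, 0)   # 0 on the first (cold) call to this model
        fresh = prefix - warm         # catch-up this model has not seen yet
        total += PRICE[model] * (warm * CACHE_READ_MULT + fresh * CACHE_WRITE_MULT)
        cached[model] = prefix        # this model's cache now covers the full prefix
    return total


TURNS = 25
all_premium = session_input_cost(["premium"] * TURNS)
all_cheap = session_input_cost(["cheap"] * TURNS)
# Bounce every fifth turn (20% of turns) to the cheaper model.
bouncing = session_input_cost(
    ["cheap" if turn % 5 == 0 else "premium" for turn in range(1, TURNS + 1)]
)

print(f"all premium: {all_premium:.0f}")
print(f"all cheap:   {all_cheap:.0f}")
print(f"bouncing:    {bouncing:.0f}")
```

Under these toy numbers the sticky cheap session is the cheapest and the bounced session costs more than never switching, because every flip pays a catch-up write on one cache or the other. The magnitudes depend entirely on the assumed prices and token counts, so treat the ordering, not the figures, as the takeaway.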
Our 25-turn run shows both ends: bouncing 20% of turns to Sonnet cost ~2% more, while keeping the whole run on Sonnet saved ~53% (on Haiku, ~85%); see Act III. So the goal isn't to avoid switching; it's to switch the right way. The costly version is the one that sneaks in by accident: a router that picks a model per request, or a manual swap partway through a conversation.
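One way to keep a router from bouncing by accident is to let it choose once per conversation and then pin that choice. The sketch below is a hypothetical illustration of that idea: `StickyRouter`, `pick_by_length`, the model names, and the length threshold are all made up for the example and do not correspond to any particular library or API.

```python
from typing import Callable, Dict


class StickyRouter:
    """Pick a model on a conversation's first request, then reuse it for every later turn."""

    def __init__(self, choose: Callable[[str], str]) -> None:
        self._choose = choose              # per-request policy, consulted only once
        self._pinned: Dict[str, str] = {}  # conversation id -> pinned model name

    def model_for(self, conversation_id: str, message: str) -> str:
        if conversation_id not in self._pinned:
            # First turn of this conversation: run the policy and pin the result.
            self._pinned[conversation_id] = self._choose(message)
        # Later turns ignore the policy so the pinned model's cache stays warm.
        return self._pinned[conversation_id]


def pick_by_length(message: str) -> str:
    # Hypothetical policy: short openers go to the cheaper model.
    return "cheap-model" if len(message) < 200 else "premium-model"


router = StickyRouter(pick_by_length)
first = router.model_for("conv-1", "Summarise this ticket.")
later = router.model_for("conv-1", "x" * 500)  # long message, same conversation
other = router.model_for("conv-2", "x" * 500)  # long message, new conversation
```

Here `first` and `later` name the same model even though the second message alone would have routed elsewhere, while `other` is free to land on a different model because it belongs to a new conversation. A deliberate switch would then be an explicit operation rather than a side effect of per-request routing.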






