How prompt caching actually works
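When an LLM processes your input, it doesn't have to recompute everything on every request. If a new request begins with exactly the same tokens as an earlier one, the serving system can reuse the intermediate computation (the attention key/value states) it already produced for that shared prefix and process only the tokens that come after it. The match has to start at the first token and be unbroken, because each token's state depends on every token before it. This is called prefix caching.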
Request 1: [System Prompt] [Conversation Turn 1] [Turn 2]
└── 260K tokens computed from scratch ──┘
Cost: expensive
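
To make the prefix rule concrete, here is a minimal sketch in Python. It is a toy model, not how any particular provider implements caching: the token IDs are made up, and the helper names `shared_prefix_len` and `split_request` are hypothetical. It only shows how many tokens of a follow-up request could be reused versus computed fresh, assuming the cache holds the full token sequence of the previous request.

```python
from typing import Sequence, Tuple


def shared_prefix_len(cached: Sequence[int], incoming: Sequence[int]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break  # the first mismatch ends the reusable prefix
        n += 1
    return n


def split_request(cached: Sequence[int], incoming: Sequence[int]) -> Tuple[int, int]:
    """Return (tokens reusable from cache, tokens that must be computed)."""
    reused = shared_prefix_len(cached, incoming)
    return reused, len(incoming) - reused


# Made-up token IDs standing in for real tokenizer output.
system_prompt = [101, 102, 103, 104]
turn_1 = [201, 202]
turn_2 = [301, 302, 303]
turn_3 = [401, 402]

# Request 1 mirrors the diagram: system prompt plus two turns.
request_1 = system_prompt + turn_1 + turn_2
# Request 2 appends one more turn to the same conversation.
request_2 = request_1 + turn_3
# Same conversation, but one token near the start of the prompt changed.
edited = [101, 999, 103, 104] + turn_1 + turn_2 + turn_3

print(split_request(request_1, request_2))  # (9, 2)
print(split_request(request_1, edited))     # (1, 10)
```

Appending to the end of the conversation keeps the whole earlier request reusable, while a change near the start invalidates almost everything after it. Real serving systems may match in fixed-size blocks of tokens or enforce a minimum prefix length, so actual hit counts can be lower than this token-by-token comparison suggests.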











