LLMs generate text one token at a time.

That sounds simple.

But without a KV cache, every new token would recompute the keys and values of every token that came before it.

That is why inference optimization starts with keys and values.
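
To put a rough number on that repeated work, here is a minimal sketch. It assumes a simplified model: one attention layer, no separate prompt phase, and cost measured only as the number of token positions whose key and value get projected. The function name `kv_projections` is made up for illustration.

```python
def kv_projections(num_tokens: int, use_cache: bool) -> int:
    """Count key/value projections needed to generate num_tokens tokens.

    Simplified model (an assumption for illustration): one layer, and each
    token position costs one projection each time its key/value is computed.
    """
    total = 0
    for step in range(1, num_tokens + 1):
        # With a cache, only the newest token's key and value are computed.
        # Without one, all `step` tokens seen so far are projected again.
        total += 1 if use_cache else step
    return total


without_cache = kv_projections(1000, use_cache=False)
with_cache = kv_projections(1000, use_cache=True)
print(without_cache, with_cache)  # 500500 1000
```

In this toy count, the uncached total grows with the square of the sequence length, while the cached total grows linearly.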

Core Idea