An explanation of the KV cache memory problem in production LLM serving and how PagedAttention (the technique behind vLLM) solves it with OS-inspired virtual memory paging.
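To make the paging analogy concrete, here is a minimal sketch of the bookkeeping side only. It does not hold any KV tensors or run an attention kernel. It assumes a fixed block size of 16 tokens and a single shared pool of physical blocks. Each sequence keeps a block table that maps its logical block index to a physical block id, much like a page table maps virtual pages to physical frames. The names `BlockAllocator`, `Sequence`, `append_token`, and `free_sequence` are hypothetical illustrations, not vLLM's actual API.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (illustrative assumption)


class BlockAllocator:
    """Hypothetical pool of fixed-size physical KV-cache blocks."""

    def __init__(self, num_blocks: int) -> None:
        # Stack of free physical block ids; lowest ids are handed out first.
        self.free = list(range(num_blocks - 1, -1, -1))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """One request; block_table[i] is the physical block backing logical block i."""
    block_table: list[int] = field(default_factory=list)
    num_tokens: int = 0


def append_token(seq: Sequence, alloc: BlockAllocator) -> tuple[int, int]:
    """Reserve a KV slot for one new token and return (physical_block, offset)."""
    logical_block, offset = divmod(seq.num_tokens, BLOCK_SIZE)
    if logical_block == len(seq.block_table):
        # The last block is full (or none exists yet): map a new block on demand.
        seq.block_table.append(alloc.allocate())
    seq.num_tokens += 1
    return seq.block_table[logical_block], offset


def free_sequence(seq: Sequence, alloc: BlockAllocator) -> None:
    """Return every block of a finished sequence to the shared pool."""
    for block_id in seq.block_table:
        alloc.release(block_id)
    seq.block_table.clear()
    seq.num_tokens = 0
```

With this scheme, a request that has produced 20 tokens holds exactly two blocks instead of a buffer reserved for its maximum possible length. Its physical blocks need not be contiguous, and the only unused slots sit in its last, partially filled block. That is at most `BLOCK_SIZE - 1` wasted token slots per sequence.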