I built an interactive 11-chapter guide to how LLM inference actually works

Production vLLM is 100,000+ lines of C++, CUDA, and Python. It powers a large share of the industry's LLM serving, but reading it cold is brutal.

So I built a study series around nano-vLLM, an open-source reimplementation of vLLM's core ideas in ~1,200 lines of pure Python. Every algorithm is visible. Every design decision is legible. It turned out to be the perfect lens for actually understanding how LLMs generate text.

The result is an 11-chapter interactive guide. No ML background required — every piece of jargon is explained from scratch with analogies, diagrams, annotated source code, interactive simulators, and quizzes.

What it covers:

What Is LLM Inference? — tokens, autoregressive generation, Q/K/V attention, HBM vs SRAM
