What building an LLM inference engine from scratch taught me about compiler design

the insight that started this project hit me while i was finishing a bytecode-compiled language i'd written in C

i'd spent months building a hand-written lexer, a single-pass Pratt compiler, a stack VM with 35 opcodes, and a mark-and-sweep garbage collector. and right near the end i had this realization: an LLM inference engine is the same problem. it's a graph-compile plus memory-plan plus kernel-schedule problem. i'd just built one
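to make the memory-plan part of that analogy concrete, here is a minimal sketch. it is not ignis's actual planner, and `Op` and `plan_buffers` are hypothetical names made up for this example. it assumes the op graph is already in execution order, and it ignores buffer sizes entirely. each op's output gets a buffer slot, and a slot is recycled once the last op that reads it has run, which is the same trick as linear-scan register allocation in a compiler

```rust
/// One node in a toy op graph, listed in execution (topological) order.
/// `inputs` holds the indices of earlier ops whose outputs this op reads.
/// (Hypothetical type for illustration only.)
struct Op {
    inputs: Vec<usize>,
}

/// Assigns each op's output to a buffer slot and returns the slot per op.
/// A slot is reused once the last reader of its current value has run.
fn plan_buffers(ops: &[Op]) -> Vec<usize> {
    let n = ops.len();

    // last_use[i] = index of the last op that reads op i's output.
    // Outputs nobody reads are treated as final results and never freed.
    let mut last_use = vec![usize::MAX; n];
    for (i, op) in ops.iter().enumerate() {
        for &src in &op.inputs {
            last_use[src] = i; // later readers overwrite earlier ones
        }
    }

    let mut slot_of = vec![0usize; n];
    let mut free: Vec<usize> = Vec::new(); // slots available for reuse
    let mut next_slot = 0;

    for (i, op) in ops.iter().enumerate() {
        // Choose the output slot before releasing inputs, so an op never
        // writes into a buffer it is still reading from.
        slot_of[i] = match free.pop() {
            Some(slot) => slot,
            None => {
                next_slot += 1;
                next_slot - 1
            }
        };

        // Release every input whose last reader is this op. The `contains`
        // check guards against an op listing the same input twice.
        for &src in &op.inputs {
            if last_use[src] == i && !free.contains(&slot_of[src]) {
                free.push(slot_of[src]);
            }
        }
    }
    slot_of
}
```

for a five-op chain with one residual connection (op 3 reads ops 0 and 2), this plan needs three buffers instead of five. a real planner would also have to weigh tensor sizes and alignment, which this sketch leaves out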

so i decided to find out if that was actually true

the project

the result is ignis, a from-scratch LLM inference engine in Rust. i built it specifically to see how far the compiler analogy held up. it ended up with two dependencies: memmap2 (to mmap the weight blob off disk) and fancy-regex (for one look-ahead in the BPE tokenizer). everything else is hand-written, because the whole point was to understand what's actually happening
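as a rough illustration of what "hand-written" means once the weights are mapped: memmap2's `Mmap` dereferences to a plain `&[u8]`, so everything past the mmap call is ordinary byte-slice code. the sketch below uses only the standard library and works on any byte slice. the layout it assumes (little-endian f32 values at a known byte offset) and the name `read_f32s` are assumptions for this example, not ignis's actual weight format or API

```rust
/// Decodes `count` little-endian f32 values starting at byte `offset`.
/// Returns None if the requested range overflows or runs past the blob.
/// (Hypothetical helper; the on-disk layout is an assumption.)
fn read_f32s(blob: &[u8], offset: usize, count: usize) -> Option<Vec<f32>> {
    // Checked arithmetic so a corrupt header cannot cause a wraparound.
    let len = count.checked_mul(4)?;
    let end = offset.checked_add(len)?;

    // `get` returns None instead of panicking when the range is out of bounds.
    let bytes = blob.get(offset..end)?;

    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

this version copies into a `Vec` for clarity. an engine that wants to keep the zero-copy benefit of mmap would more likely reinterpret the mapped bytes in place, which brings alignment and endianness checks this sketch avoids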
