Most "AI games" phone home. Every turn is an API round-trip, every player burns your tokens, and the whole thing dies the day the bill scares you. I wanted the opposite: a text roguelike where the dungeon master is an LLM that runs entirely in the player's browser — no server, no API key, no per-token cost, and it keeps working offline after the first load.

Here's the architecture and the one bug that taught me the most.

The core trick: WebLLM + WebGPU

WebLLM runs quantized models that have been compiled ahead of time (with MLC LLM) into WebGPU kernels, so inference happens on the player's own GPU. There is no backend at all.
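Everything depends on WebGPU being present, so it is worth checking for it before any weights are downloaded. Below is a minimal sketch that uses a hypothetical `detectWebGPU` helper (it is not part of WebLLM). It relies only on the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no usable GPU adapter exists. The navigator is passed in as a parameter so the check can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of WebLLM): report whether this browser can run
// GPU inference before we start downloading model weights.
// `nav` defaults to the real navigator; passing it in makes the check testable.
async function detectWebGPU(nav = globalThis.navigator) {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!nav || !nav.gpu) {
    return { ok: false, reason: "WebGPU is not available in this browser" };
  }
  // requestAdapter() resolves to null when no suitable GPU adapter exists
  // (for example, a blocklisted driver), even though the API itself is present.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "No suitable GPU adapter was found" };
  }
  return { ok: true, adapter };
}
```

Without a server-side model to fall back on, a failed check is the natural point to show the player a plain explanation instead of a blank screen.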

const cdn = "https://esm.run/@mlc-ai/web-llm";