Most "AI games" phone home. Every turn is an API round-trip, every player burns your tokens, and the whole thing dies the day the bill scares you. I wanted the opposite: a text roguelike where the dungeon master is an LLM that runs entirely in the player's browser — no server, no API key, no per-token cost, and it keeps working offline after the first load.
Here's the architecture and the one bug that taught me the most.
The core trick: WebLLM + WebGPU
WebLLM runs quantized models that have been compiled ahead of time for WebGPU, so inference happens on the player's GPU. There is no backend at all.
```javascript
// Load WebLLM as an ES module straight from a CDN; no bundler or server needed.
const cdn = "https://esm.run/@mlc-ai/web-llm";
```
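Building on that import, here is a minimal sketch of a single dungeon-master turn. It assumes WebLLM's OpenAI-style API (`CreateMLCEngine` and `engine.chat.completions.create`). The model ID, the system prompt, the temperature, and the helper names (`hasWebGPU`, `buildMessages`, `loadDungeonMaster`, `narrate`) are hypothetical illustrations, not taken from the game itself.

```javascript
// Same CDN URL as above, repeated so this snippet stands alone.
const cdn = "https://esm.run/@mlc-ai/web-llm";

// Assumption: any chat-tuned model from WebLLM's prebuilt list should work;
// check webllm.prebuiltAppConfig.model_list for the IDs your version ships.
const MODEL_ID = "Llama-3.2-1B-Instruct-q4f16_1-MLC";

// Hypothetical system prompt for the dungeon-master role.
const DM_PROMPT =
  "You are the dungeon master of a text roguelike. Describe the result of " +
  "the player's action in two or three sentences of second-person prose.";

// WebGPU is exposed as navigator.gpu; without it WebLLM cannot run.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && nav.gpu);
}

// Build an OpenAI-style message list: system prompt, prior turns, new action.
function buildMessages(history, playerAction) {
  return [
    { role: "system", content: DM_PROMPT },
    ...history,
    { role: "user", content: playerAction },
  ];
}

// Fetch the weights (or reuse the browser's cached copy) and start the engine.
async function loadDungeonMaster(onProgress = (report) => console.log(report.text)) {
  if (!hasWebGPU()) throw new Error("WebGPU is not available in this browser");
  const webllm = await import(cdn);
  return webllm.CreateMLCEngine(MODEL_ID, { initProgressCallback: onProgress });
}

// One turn: ask the local model to narrate the outcome of the player's action.
// The temperature is an illustrative value, not a tuned one.
async function narrate(engine, history, playerAction) {
  const reply = await engine.chat.completions.create({
    messages: buildMessages(history, playerAction),
    temperature: 0.8,
  });
  return reply.choices[0].message.content;
}
```

In the browser you would call `await loadDungeonMaster()` once, then `await narrate(engine, history, "I open the iron door.")` each turn, appending the action and the reply to `history`. Every call resolves against the local engine, so no request leaves the page once the model is loaded.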