I ran a 2-billion-parameter language model entirely inside a browser tab.

No server. No API key. No cloud. Completely offline, Just Chrome, WebGPU, and my laptop's GPU generating tokens locally.

This was not a frontend talking to a hidden backend. The model loaded on my machine and replied at around 20+ tokens/sec on an M1 MacBook Pro.

That means private AI, no inference bill, and no waiting on someone else's backend.

The first time it worked, it felt slightly illegal.