I ran a 2-billion-parameter language model entirely inside a browser tab.
No server. No API key. No cloud. Completely offline, Just Chrome, WebGPU, and my laptop's GPU generating tokens locally.
This was not a frontend talking to a hidden backend. The model loaded on my machine and replied at around 20+ tokens/sec on an M1 MacBook Pro.
That means private AI, no inference bill, and no waiting on someone else's backend.
The first time it worked, it felt slightly illegal.











