Maker disclosure: I build Macrokit (Apache-2.0, fully open). This post is the pattern, not a pitch — there's nothing to buy. Links at the end; the demo is keyless and runs entirely in your browser, so you can verify every claim here in your own network tab.
Open one link, and a small model (roughly 0.5–7B parameters) running in your browser, with no signup, no API key, no server, and nothing installed, does GitHub-maintainer work you'd assume needs a frontier model: triaging the newest PR on a public repo, proposing labels, summarizing open issues. Open your network tab while it runs and you'll see only two kinds of requests: the model weights downloading once, and public GitHub reads. No inference server. No key, mine or yours.
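To make "public GitHub reads" concrete, here is a minimal sketch of what a keyless read for the newest open PR could look like from a browser. It assumes the standard GitHub REST API endpoint for listing pull requests. The names `PullSummary`, `newestPullUrl`, and `fetchNewestPull` are illustrative only and are not taken from Macrokit's code.

```typescript
// Illustrative sketch: helper names are hypothetical, not Macrokit's API.
// Only the GitHub REST endpoint and its query parameters are real.

// The few fields of a pull request this sketch cares about.
interface PullSummary {
  number: number;
  title: string;
  html_url: string;
}

// Build the URL that lists open PRs on a public repo, newest first, one result.
function newestPullUrl(owner: string, repo: string): string {
  const params = new URLSearchParams({
    state: "open",
    sort: "created",
    direction: "desc",
    per_page: "1",
  });
  const path = `${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
  return `https://api.github.com/repos/${path}/pulls?${params.toString()}`;
}

// Unauthenticated read: no Authorization header is sent, so no key is involved.
async function fetchNewestPull(
  owner: string,
  repo: string,
): Promise<PullSummary | null> {
  const res = await fetch(newestPullUrl(owner, repo), {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) {
    throw new Error(`GitHub read failed with status ${res.status}`);
  }
  const pulls = (await res.json()) as PullSummary[];
  return pulls.length > 0 ? pulls[0] : null;
}
```

A request like this shows up in the network tab as a plain GET to api.github.com with no credentials attached. The trade-off is that unauthenticated requests are rate-limited per IP address, so a keyless client has to be sparing with its reads.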
That demo isn't a trick, and it isn't "weak models are secretly as smart as GPT." It's a structural choice about where the reasoning happens. Here's the whole idea.
The forced choice everyone starts with
If you deploy an LLM app today, you pick a side:






