C Boz

Posted on Jun 19

• Originally published at dev.to

Build-in-public. This is the real architecture behind running ~50 local AI agents on 6GB of VRAM — one GPU lock, an eviction watchdog, a resource governor, and a model router. Originally posted on my blog.

The question I get most often is some version of "there's no way you run that many agents on a 6GB laptop GPU." The honest answer: not the way you're picturing it. I don't run 50 models at once. I run one model at a time, very deliberately — and most of the engineering is about scheduling, not inference. Here's the actual architecture.