I spent ten days forcing tiny local models to write real code. Here's what actually breaks.

I had a thought a few weeks ago that wouldn't leave me alone - I depend on Claude Code every day. If it disappeared tomorrow, priced out, rate limited, whatever, I'd want a fallback I actually own. Not a cheaper subscription. Something that runs on my own hardware, forever, at zero marginal cost.

So I started an experiment. A coding harness where every reasoning call goes to a tiny local model (Gemma 4 2B, served by llama.cpp on a Jetson Orin Nano), and the harness does everything it can to make up the difference. One hard rule - no cloud fallback, ever. If the small model can't do something, decompose the work or move it into deterministic code. Never escalate to a bigger model.

The bet isn't mine alone. Projects like little-coder and NVIDIA's small-model research make the same wager - small models underperform agentic work because their harnesses are thin, not because the models are incapable. I wanted to find out exactly how true that is, with numbers I could trust. Ten days in, here's what I've learned.

The harness was throwing away right answers

My biggest early win wasn't making the model smarter. It was noticing that about 60% of my failures were the model producing correct logic with broken indentation. The module wouldn't even import, so it scored as a fail. The right answer was sitting there and my harness was discarding it over whitespace.

The harness was throwing away right answers

I spent ten days forcing tiny local models to write real code. Here's what actually breaks.

I spent ten days forcing tiny local models to write real code. Here's what actually breaks.

Related reading

I tried to build a SaaS. I'm shipping tiny libraries instead.

I Versioned the Way I Think. Then I Forced It to Comply.

How I Built a Self-Improving Coding Agent with Claude Code: 5 Lessons After 6…

How I built my own Claude code in Typescript

Trust the harness, not the model: a weekend of local agents building their own…

Building with mini, Part 0: Why a Minimalist Orchestrator for Claude Code

Related reading

I tried to build a SaaS. I'm shipping tiny libraries instead.

I Versioned the Way I Think. Then I Forced It to Comply.

How I Built a Self-Improving Coding Agent with Claude Code: 5 Lessons After 6…

How I built my own Claude code in Typescript

Trust the harness, not the model: a weekend of local agents building their own…

Building with mini, Part 0: Why a Minimalist Orchestrator for Claude Code