A local model opened 41 of our pull requests in five weeks. The model is the least interesting part.

Between May 21 and June 25, a fleet of local models running on a Mac and an AMD mini-PC opened 41 pull requests we merged into our Kubernetes operator, at zero cloud-inference cost. The models are a coin flip. This is a case study in why that does not matter.

Thursday, June 25, 2026

This was originally published on the LLMKube blog.

Here is the claim, up front and checkable: between May 21 and June 25, 2026, a fleet of local models opened 41 pull requests that we merged into LLMKube, our open-source Kubernetes operator for self-hosted inference. No code or prompts left the building. The marginal inference cost was a few cents of electricity. Across those five weeks, these pull requests made up about a fifth of everything merged into the repo, and closer to half in the busiest recent stretch, alongside pull requests from five human contributors who showed up in the same weeks.

If you have used a 27-billion-parameter open-weight model as a coding agent, your first reaction is correct skepticism. A model that size is a coin flip on a non-trivial issue. It drifts. It writes tests that do not test anything. It declares victory on code that does not compile.

That is all true, and it is also beside the point. We never bet on the model. We bet on the harness around it. This post is the evidence for that bet, including the parts where it failed.

The setup: a weak model, a strict harness, heterogeneous hardware

Related reading

Trust the harness, not the model: a weekend of local agents building their own…

A 27B model on an AMD mini-PC fixed a bug in our operator. Then it overreached.

We got local models to triage the OpenClaw repo for FREE!*

Ollama's Chinese Model Support Is Real — But Running Kimi and DeepSeek Locally…

5 Best Local LLM Tools and Models You Should Run in 2026

I built an open-source alternative to Microsoft's KAITO that works on ANY…
