This was originally published on the LLMKube blog.
Here is the claim, up front and checkable: between May 21 and June 25, 2026, a fleet of local models opened 41 pull requests that we merged into LLMKube, our open-source Kubernetes operator for self-hosted inference. No code or prompts left the building. The marginal inference cost was a few cents of electricity. Across those five weeks, the model-authored pull requests made up about a fifth of everything merged into the repo, and closer to half in the busiest recent stretch. They sat next to pull requests from five human contributors who showed up in the same weeks.
If you have used a 27-billion-parameter open-weight model as a coding agent, your first reaction is skepticism, and that reaction is correct. A model that size is a coin flip on a non-trivial issue. It drifts. It writes tests that do not test anything. It declares victory on code that does not compile.
That is all true, and it is also beside the point. We never bet on the model. We bet on the harness around it. This post is the evidence for that bet, including the parts where it failed.
The setup: a weak model, a strict harness, heterogeneous hardware






