The Human in the Loop Doesn't Scale. I Kept Him Anyway.

What it costs to be the last reviewer your own system has

Part of the ForgeFlow series — building a coding agent that runs its execution loop locally on an M5 Max, and writing down what actually breaks. Planning runs on a frontier model; code generation runs on a local model via Ollama, test-driven inside a Docker sandbox.

In the last post, I described how my agent's rulebook learned to forget — rules that age out get flagged, and a human decides whether to retire them. I ended on a line that's been nagging me ever since: the whole thing works because I'm still small enough to read everything, and I suspect that doesn't last.

This post is about that suspicion. It's about the design decision I keep reaching for — keep a human in the loop — and the uncomfortable thing underneath it: that human is me, I am exactly one person, and "put a human on it" is not a plan. It's a debt I haven't been billed for yet.

"Keep a human in the loop" is the answer I keep reaching for

What it costs to be the last reviewer your own system has

"Keep a human in the loop" is the answer I keep reaching for

The Human in the Loop Doesn't Scale. I Kept Him Anyway.

The Human in the Loop Doesn't Scale. I Kept Him Anyway.

Related reading

What I Learned by Deleting Rules My Agent Had Already Learned

When pytest Said "Passed," It Was Lying

I spent ten days forcing tiny local models to write real code. Here's what…

The Harness Flywheel: How Reviews Become Rules

Forge Review: 8B Local Model Hits 99% on Agentic Tasks

The Gate Fired 198 Times. I Called It "Working."

Related reading

What I Learned by Deleting Rules My Agent Had Already Learned

When pytest Said "Passed," It Was Lying

I spent ten days forcing tiny local models to write real code. Here's what…

The Harness Flywheel: How Reviews Become Rules

Forge Review: 8B Local Model Hits 99% on Agentic Tasks

The Gate Fired 198 Times. I Called It "Working."