What it costs to be the last reviewer your own system has

Part of the ForgeFlow series — building a coding agent that runs its execution loop locally on an M5 Max, and writing down what actually breaks. Planning runs on a frontier model; code generation runs on a local model via Ollama, test-driven inside a Docker sandbox.

In the last post, I described how my agent's rulebook learned to forget — rules that age out get flagged, and a human decides whether to retire them. I ended on a line that's been nagging me ever since: the whole thing works because I'm still small enough to read everything, and I suspect that doesn't last.

This post is about that suspicion. It's about the design decision I keep reaching for — keep a human in the loop — and the uncomfortable thing underneath it: that human is me, I am exactly one person, and "put a human on it" is not a plan. It's a debt I haven't been billed for yet.

"Keep a human in the loop" is the answer I keep reaching for