Everyone's saying the same thing right now: stop prompting your coding agent, start designing the loop that prompts it for you, and let it do the work. We agree. We've just been doing it long enough that it isn't a prediction anymore — autonomous loops have been running our R&D on four production repos for weeks.

Here's a concrete one. On NyxID, our open-source gateway, a loop took a load-balancing feature from a GitHub issue to a merged PR last week: about 1,400 lines of Rust, and the merge metadata records human_touch_count = 0, meaning no human edited the diff. A person still scoped the issue and clicked merge — but the code came out of the loop and survived review without anyone rewriting it. (PR #975)

That's the part everyone's excited about, and it's real. It's also not the hard part, and not the reason we trust the thing enough to leave it running.

The hard part is trust, not autonomy

The failure mode of an autonomous loop isn't that it does nothing. It's that it does something confidently wrong: writes plausible code that doesn't hold up, papers over a failing test, claims a result it can't support, and runs until your budget is gone. A single model is sure of itself even when it shouldn't be, and a naive loop inherits all of that confidence with none of the brakes. That's the real reason most "agent runs for 10 hours" demos stay demos.