There are two kinds of AI agent in production right now. The first one you babysit. You tweak its system prompt, watch it fail on a new kind of task, tweak it again, and the prompt slowly turns into a wall of special cases nobody wants to touch. The second kind notices the failure on its own, writes a better version of its own prompt, tests that version against real work, and keeps it only if it actually wins. The gap between those two is the whole field of self-evolving agents, and this year it stopped being a research curiosity.

A self-evolving agent is just an agent wrapped in a feedback loop. The agent runs a task. Something scores the output. When a weakness shows up often enough, the system proposes a new system prompt, runs both the old and the new version on real traffic for a while, and promotes the winner. If the new version turns out worse, it rolls back to the last known good one. No human in the path, but also no leap of faith, because nothing gets promoted until it earns it.

That is the idea. The interesting part is which piece is actually hard.

The Optimizer Got Solved This Year

The piece everyone writes papers about is the mutation step: given a prompt that is underperforming, produce a better one. For years the serious answer was reinforcement learning, which adjusts a model from sparse numerical rewards. It works, but it is expensive and it treats a rich failure as a single number.