The most useful thing an AI coding agent can do is let you check its work. It will not enjoy it.

I once watched my own agent do beautiful work on the wrong half of the job.

I had built a harness called Mycelium to make Claude do real product development, discovery through delivery. On a little macOS app, it aced the discovery. Mapped the opportunity tree like it had written the textbook. Then it shipped the thing with zero tests and no accessibility, and told me it looked great. It wasn't lying, exactly. It just had no idea what it didn't know, and neither of us found out until I went looking.

That is the part nobody prices. Not how good the agent is. How long it takes you to find out whether it is right.

The agents people call a joy to use are the ones that hand you the answer and the receipt at the same time: a diff, a screen recording, a before and after you can glance at and trust in two seconds. The ones that wear you down hand you something you have to re-derive yourself just to know whether you can believe it. Same model underneath, sometimes the same task. The only difference is who pays to check, and how much it costs them. Hamel Husain has put the sharp version of this: if a thing is hard to evaluate, that is a product smell. He is right, and I would push it one step further.