I've been skeptical of AI testing tools for a while, but not for the reasons most people are.

My problem isn't that AI can't drive a browser. It clearly can. My problem is what happens after.

Every tool I tried made the same implicit trade: the AI stays in the loop at runtime. Your "test" is really a prompt that gets re-evaluated every time CI runs. The model drifts, the response changes slightly, and your test starts flaking. You have no idea why, because there's no diff to look at. You just have vibes and a red build.

I kept thinking: I don't want AI to run my tests. I want AI to write them.

The thing Playwright Codegen almost got right