I've been skeptical of AI testing tools for a while, but not for the reasons most people are.
My problem isn't that AI can't drive a browser. It clearly can. My problem is what happens after.
Every tool I tried made the same implicit trade: the AI stays in the loop at runtime. Your "test" is really a prompt that gets re-evaluated every time CI runs. The model drifts, the response changes slightly, and your test starts flaking. You have no idea why, because there's no diff to look at. You just have vibes and a red build.
I kept thinking: I don't want AI to run my tests. I want AI to write them.
The thing Playwright Codegen almost got right
