Last week I watched my AI agent try to test a Flutter screen. It wrote a test file, ran flutter test, copied the stack trace back into the prompt, pasted a screenshot, and called it a workflow. It was slow, and it was guessing.

On the web, agents do not work like that anymore. Playwright MCP gives them an accessibility tree to read and stable refs to act on. 33k stars, no screenshot guessing. Flutter never had that layer.

So I built Dusk.

The problem

End-to-end testing on Flutter has always been a stitched-together ritual.