Originally published at dylanworrall.com.
Most agent demos that involve a browser are shot in one take for a reason. The moment you try to make browser automation reliable — running unattended, across sites you don't control, hundreds of times — it stops being a demo and starts being an engineering problem. I've spent a lot of time on that problem building the browser layer inside Froots, and a handful of patterns made the difference between "works in the video" and "works at 3am while I'm asleep."
Prefer structured verbs over raw eval
It's tempting to give the agent one giant escape hatch: run arbitrary JavaScript in the page and parse whatever comes back. It works right up until it doesn't, and when it fails it fails opaquely.
A small vocabulary of structured commands beats one omnipotent one:
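To make the contrast concrete, here is a minimal sketch in Python of what a small command vocabulary could look like. Everything in it is an assumption made for illustration: the verb names (`navigate`, `read_text`, `fill`), the `Result` shape, and the `FakePage` stand-in are hypothetical and are not Froots's actual API. The point is only that each verb takes named arguments and returns a typed result, so a failure says which verb failed and why.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Result:
    """Structured outcome of one command (hypothetical shape)."""
    ok: bool
    verb: str
    value: Optional[str] = None
    error: Optional[str] = None  # short, machine-readable reason


class FakePage:
    """Stand-in for a real browser page so the sketch runs without a browser."""

    def __init__(self, elements: Dict[str, str]):
        self.elements = dict(elements)  # selector -> text content
        self.url = "about:blank"


def navigate(page: FakePage, url: str) -> Result:
    page.url = url
    return Result(ok=True, verb="navigate", value=url)


def read_text(page: FakePage, selector: str) -> Result:
    if selector not in page.elements:
        return Result(ok=False, verb="read_text",
                      error=f"no element matches {selector!r}")
    return Result(ok=True, verb="read_text", value=page.elements[selector])


def fill(page: FakePage, selector: str, text: str) -> Result:
    if selector not in page.elements:
        return Result(ok=False, verb="fill",
                      error=f"no element matches {selector!r}")
    page.elements[selector] = text
    return Result(ok=True, verb="fill", value=text)


# The whole vocabulary the agent is allowed to use. Note there is no "eval".
VERBS: Dict[str, Callable[..., Result]] = {
    "navigate": navigate,
    "read_text": read_text,
    "fill": fill,
}


def run(page: FakePage, verb: str, **args: str) -> Result:
    """Dispatch one command; unknown verbs and bad arguments fail loudly."""
    handler = VERBS.get(verb)
    if handler is None:
        return Result(ok=False, verb=verb, error=f"unknown verb {verb!r}")
    try:
        return handler(page, **args)
    except TypeError as exc:  # missing or unexpected keyword arguments
        return Result(ok=False, verb=verb, error=f"bad arguments: {exc}")
```

With this shape, `run(page, "read_text", selector="#title")` either returns the text or a `Result` whose `error` names the selector that did not match, and a request for a verb outside the table is rejected before anything touches the page. In a real setup the handlers would wrap whatever browser driver is in use; the dispatch table and the result type are the part that carries over.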







