Most web agents today drive a browser one action at a time. The model receives the current page state — as a screenshot or DOM text — and predicts the next click, keypress, or scroll. This action-at-a-time design made sense when language models had limited reasoning ability. As models have become more capable at writing and debugging code, that rigid loop has become a constraint rather than a structure that helps.
Microsoft Research’s AI Frontiers lab built a different approach. Their new open-source framework, Webwright, gives the agent a terminal instead of a stateful browser session. The agent writes Playwright code to control browsers, runs bash commands, inspects logs, and iteratively refines scripts. Playwright is an open-source browser automation library, also from Microsoft, that supports programmatic control of Chromium, Firefox, and WebKit browsers.
Webwright separates the agent from the browser and treats the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session but the code and logs in the local workspace.
This is the same model a developer uses when writing an RPA (Robotic Process Automation) script. Instead of manually clicking through a site each time, they write a script once. That script can be rerun, adapted, and shared. Webwright applies this to LLM-powered agents.














