Post #16 was about how a single agent perceives a page (vision, accessibility tree, or runtime structural perception) and why the third one wins when the user is the user. This post is about the axis underneath that one, the part nobody names: not how the agent sees the page, but how many things are looking at it at once.
Today the answer is one. One driver per browser. That assumption is so baked in it's invisible — and it's the next thing to break.
One driver is a design decision, not a law
Look at the three perception architectures from #16 and notice what they share. Vision agents spin up a headless browser the agent owns. Accessibility-tree agents drive a Chromium instance over the Chrome DevTools Protocol, and that instance is theirs too. In both, the browser is a single-occupancy vehicle: the agent gets in, drives, gets out. The human isn't in the car. Neither is any other agent.
Runtime structural perception is the odd one out, and that difference is the whole point of this post. It runs as a peripheral to the user's own browser: the real session, already logged in. Which means the surface it reads is, by construction, a surface someone else is already on. The human is right there. The agent isn't operating a browser of its own; it's operating the same browser the human is using.
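To make the contrast concrete, here is a toy model of the two occupancy policies. It is a sketch only: `BrowserSession`, `SessionBusy`, and the `exclusive` flag are hypothetical names invented for illustration, not part of any real browser or automation API.

```python
from dataclasses import dataclass, field


class SessionBusy(Exception):
    """Raised when an exclusive session already has a driver."""


@dataclass
class BrowserSession:
    # Hypothetical model: exclusive=True encodes the one-driver assumption.
    exclusive: bool = True
    occupants: list[str] = field(default_factory=list)

    def attach(self, who: str) -> None:
        # Under the exclusive policy, a second occupant is refused outright.
        if self.exclusive and self.occupants:
            raise SessionBusy(f"{self.occupants[0]} already holds this session")
        self.occupants.append(who)


# Single occupancy: the agent owns the browser, nobody else gets in.
owned = BrowserSession(exclusive=True)
owned.attach("agent")
try:
    owned.attach("human")
    second_attach_allowed = True
except SessionBusy:
    second_attach_allowed = False

# Shared occupancy: the human is already there when the agent arrives.
shared = BrowserSession(exclusive=False)
shared.attach("human")
shared.attach("agent")
```

The only difference between the two sessions is one flag, which is the sense in which one driver is a policy choice rather than a property of browsers. What the occupants have to do about each other once they share a session is a separate question this toy does not touch.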






