I gave an agent a fetch_page tool, asked it to read one Wikipedia article, and watched that single page cost 48,703 tokens before the model produced a word. The readable text on that page is about 7,300 tokens. I was paying for ~41,000 tokens of <div>, inline CSS, and analytics scripts that never help the model answer anything.
That's the token tax on agent web access, and almost nobody measures it. Here's the number, the 40-line fix, and the honest part — where it doesn't matter.
The short version
When your agent "reads a page", it usually gets raw HTML pasted into the prompt. On three pages I tested, 85–86% of those tokens were markup the model doesn't need to read for meaning. Strip the page to text first and the token bill drops ~7×. The fix is the standard library plus a tokenizer — no API, no paid service.
The measurement








