This article is part of our coverage of the latest in AI research.
Salesforce Research has introduced a new framework that enables web agents to better navigate websites by leveraging the native tools and features those sites already contain. The framework, called WALT (Web Agents that Learn Tools), examines a website’s functionality to extract tools that abstract away low-level execution. This means that instead of reasoning about how to click and type, agents simply think in terms of tools such as searching, filtering, or adding new items to lists. WALT outperforms other web agent frameworks on key industry benchmarks and paves the way for more robust and versatile web agents.
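To make the idea of a tool that hides low-level execution more concrete, here is a minimal sketch in Python. It is not WALT's actual interface. The `Action` schema, the CSS selectors, and the `search_tool`, `filter_tool`, and `call_tool` names are all hypothetical, chosen only to illustrate how one tool call could stand in for several click and type steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    """One low-level UI step (hypothetical schema, not WALT's)."""
    kind: str        # "click", "type", or "press"
    selector: str    # CSS selector of the target element (made up here)
    value: str = ""  # text to type or key to press, if any


def search_tool(query: str) -> List[Action]:
    """Hypothetical 'search' tool: one call expands into fixed UI steps."""
    return [
        Action("click", "#search-box"),
        Action("type", "#search-box", query),
        Action("press", "#search-box", "Enter"),
    ]


def filter_tool(field: str, value: str) -> List[Action]:
    """Hypothetical 'filter' tool: open a filter menu and pick one option."""
    return [
        Action("click", f"#filter-{field}"),
        Action("click", f"#filter-{field} [data-value='{value}']"),
    ]


# Registry of tools the agent can choose from by name.
TOOLS: Dict[str, Callable[..., List[Action]]] = {
    "search": search_tool,
    "filter": filter_tool,
}


def call_tool(name: str, **kwargs: str) -> List[Action]:
    """What the agent invokes: a tool name plus arguments, no selectors."""
    return TOOLS[name](**kwargs)


# The agent reasons at the level of "search for X"; the tool supplies the steps.
steps = call_tool("search", query="wireless mouse")
for step in steps:
    print(step.kind, step.selector, step.value)
```

In this sketch the agent only picks a tool name and its arguments, so a change to the page layout would require updating the tool's steps rather than the agent's reasoning.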
The challenge of web navigation
Web agents are designed to automate complex browser tasks. But current approaches to building them are brittle: they rely on step-by-step user interface interactions and heavy LLM reasoning, both of which break down when site layouts change or when tasks span many steps.
In contrast, when humans navigate the web, they abstract away implementation details and focus on what they want to accomplish, not how the interface mechanics work. For example, humans think about websites in terms of high-level operations like search, filter, and sort. They leverage this prior knowledge to recognize reusable patterns across websites, allowing them to quickly adapt their interactions to new layouts.