At a glance

AI agents are moving into social contexts. When agents manage calendars, negotiate purchases, or interact with other agents on a user’s behalf, they need more than task competence—they need social reasoning.

SocialReasoning-Bench evaluates that ability. The benchmark tests whether an agent can negotiate for a user in two realistic settings: Calendar Coordination and Marketplace Negotiation.

The benchmark measures both outcomes and process: it scores agents on outcome optimality (how much value they secure for the user) and due diligence (whether they follow a competent decision-making process).

Current frontier models often leave value on the table. They usually complete the task, but they frequently accept suboptimal meeting times or poor deals instead of advocating effectively for the user.