SocialReasoning-Bench shows the limits of today’s AI agents

At a glance

AI agents are moving into social contexts. When agents manage calendars, negotiate purchases, or interact with other agents on a user’s behalf, they need more than task competence—they need social reasoning.

SocialReasoning-Bench evaluates that ability. The benchmark tests whether an agent can negotiate for a user in two realistic settings: Calendar Coordination and Marketplace Negotiation.

The benchmark measures both outcomes and process: it scores agents on outcome optimality (how much value they secure for the user) and due diligence (whether they follow a competent decision-making process).

Current frontier models often leave value on the table. They usually complete the task, but they frequently accept suboptimal meeting times or poor deals instead of advocating effectively for the user.
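To make the two measures concrete, here is a minimal sketch of how scores like these could be computed. It is an illustration under stated assumptions, not the benchmark's actual scoring code: the normalization formula, the `NegotiationResult` fields, and the `REQUIRED_STEPS` checklist are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NegotiationResult:
    achieved_value: float  # value the agent actually secured for the user
    worst_value: float     # value of the worst outcome the user would accept
    best_value: float      # value of the best outcome that was achievable
    steps_done: set        # names of the process steps the agent performed


# Hypothetical checklist; the benchmark's real due-diligence criteria
# are not described in this article.
REQUIRED_STEPS = {"check_user_preferences", "explore_alternatives", "counter_offer"}


def outcome_optimality(result: NegotiationResult) -> float:
    """Assumed metric: share of the achievable value range that was captured (0 to 1)."""
    span = result.best_value - result.worst_value
    if span <= 0:
        return 1.0  # nothing to gain or lose, so any completed deal is optimal
    score = (result.achieved_value - result.worst_value) / span
    return max(0.0, min(1.0, score))


def due_diligence(result: NegotiationResult) -> float:
    """Assumed metric: fraction of the required process steps that were performed."""
    return len(result.steps_done & REQUIRED_STEPS) / len(REQUIRED_STEPS)


# An agent that closes a deal but barely advocates for the user.
example = NegotiationResult(
    achieved_value=60.0,
    worst_value=50.0,
    best_value=100.0,
    steps_done={"check_user_preferences"},
)
print(outcome_optimality(example), due_diligence(example))
```

In this made-up case the agent completes the task but captures only a fifth of the available value and performs one of three checklist steps, which is the kind of result the benchmark is designed to surface.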
