Story from 1 source

SocialReasoning Bench shows the limits of today’s AI agents

Using SocialReasoning Bench, we observed a stable pattern across models: agents execute competently but fail to consistently improve the user’s position, even when explicitly instructed to optimize for the user’s interest.

Reported by

microsoft.com

Monday, May 11, 2026 · microsoft.com