The adoption of interoperability standards such as the Model Context Protocol (MCP) can give enterprises insight into how agents and models function outside their walled gardens. However, many benchmarks fail to capture real-life interactions with MCP.
Salesforce AI Research developed a new open-source benchmark it calls MCP-Universe, which aims to track LLMs as they interact with MCP servers in the real world, arguing that this paints a better picture of how models perform in real-life, real-time interactions with the tools enterprises actually use. In initial testing, the team found that models like OpenAI’s recently released GPT-5 are strong, but still fall short in real-life scenarios.
“Existing benchmarks predominantly focus on isolated aspects of LLM performance, such as instruction following, math reasoning, or function calling, without providing a comprehensive assessment of how models interact with real-world MCP servers across diverse scenarios,” Salesforce said in a paper.






