Artificial Analysis has released something the AI hardware world has been quietly waiting for: a benchmark that measures how well chips handle real-world agentic AI workloads. The benchmark is called AA-AgentPerf, and its initial results, from runs of DeepSeek V4 Pro, tell a story that AMD would probably rather not hear right now.
NVIDIA’s Blackwell systems, specifically the B200 and GB300, consistently outperformed AMD’s Instinct MI355X GPUs on power-efficient agentic inference.
What AA-AgentPerf actually measures
AA-AgentPerf is the first open, multi-vendor benchmark from Artificial Analysis designed specifically to measure hardware performance on agentic coding tasks.
The benchmark evaluates how many concurrent agents a system can support while meeting specific service-level objectives. Those SLOs cover output token speeds ranging from 20 to 300 tokens per second and time-to-first-token (TTFT) targets between 3 and 10 seconds.
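To make the "concurrent agents under an SLO" idea concrete, here is a minimal Python sketch. It is not Artificial Analysis's methodology or code: the `SLO` and `Measurement` types, the `max_agents_within_slo` helper, and every number in `SAMPLE_SWEEP` are hypothetical, invented purely for illustration. It also assumes the speed target applies per agent and that each SLO pairs one speed floor with one TTFT ceiling, which the published description does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical service-level objective: a speed floor and a TTFT ceiling."""
    min_output_tps: float  # minimum output tokens per second (assumed per agent)
    max_ttft_s: float      # maximum time to first token, in seconds


@dataclass(frozen=True)
class Measurement:
    """One hypothetical load-test point at a fixed concurrency level."""
    concurrent_agents: int
    output_tps: float  # observed output tokens per second (assumed per agent)
    ttft_s: float      # observed time to first token, in seconds


def meets_slo(m: Measurement, slo: SLO) -> bool:
    """True when the measurement satisfies both halves of the SLO."""
    return m.output_tps >= slo.min_output_tps and m.ttft_s <= slo.max_ttft_s


def max_agents_within_slo(sweep: list[Measurement], slo: SLO) -> int:
    """Return the highest concurrency that still meets the SLO.

    Walks the sweep in order of increasing concurrency and stops at the
    first level that misses the SLO. Returns 0 if even the lowest level fails.
    """
    best = 0
    for m in sorted(sweep, key=lambda point: point.concurrent_agents):
        if not meets_slo(m, slo):
            break
        best = m.concurrent_agents
    return best


# Made-up numbers, NOT benchmark results: speed falls and TTFT rises with load.
SAMPLE_SWEEP = [
    Measurement(concurrent_agents=1, output_tps=320.0, ttft_s=0.8),
    Measurement(concurrent_agents=8, output_tps=210.0, ttft_s=1.9),
    Measurement(concurrent_agents=32, output_tps=110.0, ttft_s=3.5),
    Measurement(concurrent_agents=128, output_tps=45.0, ttft_s=7.2),
    Measurement(concurrent_agents=512, output_tps=12.0, ttft_s=14.0),
]

if __name__ == "__main__":
    relaxed = SLO(min_output_tps=20.0, max_ttft_s=10.0)
    strict = SLO(min_output_tps=300.0, max_ttft_s=3.0)
    print("relaxed SLO:", max_agents_within_slo(SAMPLE_SWEEP, relaxed))
    print("strict SLO:", max_agents_within_slo(SAMPLE_SWEEP, strict))
```

The point of the sketch is the trade-off: the same hardware supports far fewer agents under a strict SLO than under a relaxed one, so a system's score depends on which speed and latency targets it is held to.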









