AA-AgentPerf releases initial results for DeepSeek V4 Pro benchmark, showing NVIDIA Blackwell dominance

Artificial Analysis has released something the AI hardware world has been quietly waiting for: a benchmark that measures how well chips handle real-world agentic AI workloads. The benchmark is called AA-AgentPerf, and its initial results, run on DeepSeek V4 Pro, tell a story that AMD would probably rather not hear right now.

NVIDIA’s Blackwell systems, specifically the B200 and GB300, consistently outperformed AMD’s Instinct MI355X GPUs on power-efficient agentic inference.

What AA-AgentPerf actually measures

AA-AgentPerf is the first multi-vendor open benchmark from Artificial Analysis designed specifically to measure hardware performance on agentic coding tasks.

The benchmark evaluates how many concurrent agents a system can support while meeting specific service-level objectives. Those SLOs cover output token speeds ranging from 20 to 300 tokens per second and time-to-first-token (TTFT) targets between 3 and 10 seconds.
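To make the metric concrete, here is a minimal Python sketch of the idea of "most concurrent agents that still meet the SLOs." It is not Artificial Analysis's methodology or code. The names (Slo, Sample, max_agents_within_slo) are hypothetical, the sample numbers are invented for illustration and are not benchmark results, and pairing a given speed floor with a given TTFT ceiling is an assumption, since the published ranges do not say how the two targets are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    min_output_tps: float  # per-agent output speed floor, tokens per second
    max_ttft_s: float      # time-to-first-token ceiling, seconds


@dataclass(frozen=True)
class Sample:
    concurrency: int       # number of agents running at once
    output_tps: float      # observed per-agent output speed
    ttft_s: float          # observed time to first token


def max_agents_within_slo(samples, slo):
    """Return the highest tested concurrency that meets both SLO limits, or 0."""
    passing = [
        s.concurrency
        for s in samples
        if s.output_tps >= slo.min_output_tps and s.ttft_s <= slo.max_ttft_s
    ]
    return max(passing, default=0)


# Invented measurements for illustration only; NOT benchmark results.
SAMPLES = [
    Sample(concurrency=8, output_tps=310.0, ttft_s=1.2),
    Sample(concurrency=32, output_tps=140.0, ttft_s=2.8),
    Sample(concurrency=128, output_tps=45.0, ttft_s=6.5),
    Sample(concurrency=512, output_tps=12.0, ttft_s=14.0),
]

# The two ends of the SLO ranges quoted above (pairing is an assumption).
RELAXED = Slo(min_output_tps=20, max_ttft_s=10)
STRICT = Slo(min_output_tps=300, max_ttft_s=3)

print(max_agents_within_slo(SAMPLES, RELAXED))  # 128
print(max_agents_within_slo(SAMPLES, STRICT))   # 8
```

The same hardware supports far fewer agents under the strict target than under the relaxed one, which is why a result of this kind only means something alongside the SLO it was measured against.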
