AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how inference systems perform under these conditions. Artificial Analysis AgentPerf (AA-AgentPerf) offers the industry’s first multi-vendor open benchmarks that profile trajectories representative of real-world AI agent coding tasks.

This post explains how AA-AgentPerf sets a new standard for measuring agentic workload performance, and how NVIDIA extreme co-design helps deliver up to 20x better agentic coding performance than previous generations.

What is AA-AgentPerf?

AA-AgentPerf is a hardware benchmark created by Artificial Analysis that measures the number of concurrent AI agents an inference system can support while meeting predefined, model-specific service-level objective (SLO) tiers. Each SLO tier is defined by specific thresholds for output token speed and time to first token (TTFT). The benchmark results are normalized per accelerator and per megawatt to enable comparison across hardware configurations.

Figure 1. The AA-AgentPerf hardware benchmark measures the throughput and efficiency of running multiple AI agents in parallel
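
To make the metric concrete, here is a minimal Python sketch of how a result of this kind could be derived from load-test data. It is an illustration only, not the AA-AgentPerf implementation: the class and function names, the SLO thresholds, the measurement values, and the system size and power figures are all hypothetical. It also assumes that an SLO tier means a minimum per-agent output token speed plus a maximum TTFT, and that the reported figure is the highest tested concurrency that still satisfies both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLOTier:
    """Hypothetical SLO tier; thresholds are illustrative, not AA-AgentPerf values."""
    min_output_tokens_per_s: float  # minimum per-agent output token speed
    max_ttft_s: float               # maximum time to first token, in seconds


@dataclass(frozen=True)
class Measurement:
    """One hypothetical load-test point at a fixed number of concurrent agents."""
    concurrent_agents: int
    output_tokens_per_s: float  # observed per-agent output token speed
    ttft_s: float               # observed time to first token, in seconds


def meets_slo(m: Measurement, tier: SLOTier) -> bool:
    """A point passes only if it satisfies both thresholds of the tier."""
    return (m.output_tokens_per_s >= tier.min_output_tokens_per_s
            and m.ttft_s <= tier.max_ttft_s)


def max_agents_within_slo(measurements: list[Measurement], tier: SLOTier) -> int:
    """Largest tested concurrency that meets the tier, or 0 if none does."""
    passing = [m.concurrent_agents for m in measurements if meets_slo(m, tier)]
    return max(passing, default=0)


def normalize(agents: int, num_accelerators: int, system_power_kw: float) -> dict[str, float]:
    """Express the agent count per accelerator and per megawatt of system power."""
    return {
        "agents_per_accelerator": agents / num_accelerators,
        "agents_per_megawatt": agents * 1000.0 / system_power_kw,
    }


# Made-up example data: an 8-accelerator system drawing 10 kW.
tier = SLOTier(min_output_tokens_per_s=50.0, max_ttft_s=2.0)
sweep = [
    Measurement(concurrent_agents=8, output_tokens_per_s=120.0, ttft_s=0.4),
    Measurement(concurrent_agents=16, output_tokens_per_s=90.0, ttft_s=0.7),
    Measurement(concurrent_agents=32, output_tokens_per_s=55.0, ttft_s=1.5),
    Measurement(concurrent_agents=64, output_tokens_per_s=30.0, ttft_s=4.0),
]

supported = max_agents_within_slo(sweep, tier)
result = normalize(supported, num_accelerators=8, system_power_kw=10.0)
print(supported, result)
```

Taking the maximum over passing points assumes that per-agent speed and TTFT degrade steadily as concurrency rises. If a sweep were noisy, a stricter rule (for example, stopping at the first failing concurrency level) might be more appropriate.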