Nvidia's GB300 NVL72 achieves 61.4K concurrent agents per megawatt, a 20x leap over H200

Nvidia just dropped a number that should make every data center operator do a double take. The company’s new GB300 NVL72 system can handle 61,400 concurrent AI agents per megawatt of power consumed, compared to just 2,600 on the prior-generation H200.

That’s a better than 20x improvement in agent density per unit of power (61,400 divided by 2,600 works out to roughly 23.6x). For an industry where electricity costs are rapidly becoming the binding constraint on growth, this isn’t a spec sheet flex. It’s a structural shift in the economics of inference.
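To make the per-megawatt figures concrete, here is a minimal Python sketch of the arithmetic. Only the two agents-per-megawatt numbers come from the article. The 10 MW facility and the 100,000-agent target are hypothetical values chosen for illustration, and the helper names are made up for this example.

```python
# Figures quoted in the article: concurrent agents per megawatt.
GB300_AGENTS_PER_MW = 61_400
H200_AGENTS_PER_MW = 2_600


def density_ratio(new_per_mw: float, old_per_mw: float) -> float:
    """How many times more agents the newer system serves per megawatt."""
    return new_per_mw / old_per_mw


def agents_supported(agents_per_mw: float, power_budget_mw: float) -> int:
    """Concurrent agents that fit in a given power budget (linear scaling assumed)."""
    return int(agents_per_mw * power_budget_mw)


def power_needed_mw(target_agents: int, agents_per_mw: float) -> float:
    """Megawatts required to host a target number of concurrent agents."""
    return target_agents / agents_per_mw


ratio = density_ratio(GB300_AGENTS_PER_MW, H200_AGENTS_PER_MW)

# Hypothetical 10 MW facility (assumption, not from the article).
FACILITY_MW = 10.0
gb300_agents = agents_supported(GB300_AGENTS_PER_MW, FACILITY_MW)
h200_agents = agents_supported(H200_AGENTS_PER_MW, FACILITY_MW)

# Hypothetical target of 100,000 concurrent agents (assumption).
TARGET_AGENTS = 100_000
gb300_mw = power_needed_mw(TARGET_AGENTS, GB300_AGENTS_PER_MW)
h200_mw = power_needed_mw(TARGET_AGENTS, H200_AGENTS_PER_MW)

print(f"Density ratio: {ratio:.1f}x")
print(f"Agents in {FACILITY_MW:.0f} MW: GB300 {gb300_agents:,} vs H200 {h200_agents:,}")
print(f"MW for {TARGET_AGENTS:,} agents: GB300 {gb300_mw:.2f} vs H200 {h200_mw:.2f}")
```

The sketch assumes agent capacity scales linearly with power, which real deployments may not match once cooling, networking, and utilization overheads are counted.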

What’s inside the rack

The GB300 NVL72 is built on Nvidia’s Blackwell Ultra architecture, packing 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single liquid-cooled rack. The system integrates roughly 20 to 21 TB of HBM3e memory and offers 130 TB/s of aggregate bandwidth over NVLink, the internal data highway that keeps all those GPUs talking to each other without bottlenecking.
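As a rough sanity check on those rack-level totals, the sketch below divides them across the 72 GPUs. It assumes memory and NVLink bandwidth are split evenly per GPU and that 1 TB equals 1,000 GB. The article states neither, so treat the per-GPU numbers as back-of-the-envelope estimates rather than official specifications.

```python
# Rack-level figures quoted in the article.
GPU_COUNT = 72
CPU_COUNT = 36
HBM_TOTAL_TB_LOW = 20.0    # lower end of "roughly 20 to 21 TB"
HBM_TOTAL_TB_HIGH = 21.0   # upper end
NVLINK_TOTAL_TB_PER_S = 130.0

GB_PER_TB = 1000  # decimal units (assumption)


def per_gpu(total: float, gpus: int = GPU_COUNT) -> float:
    """Even per-GPU share of a rack-level total (even split is an assumption)."""
    return total / gpus


hbm_per_gpu_gb_low = per_gpu(HBM_TOTAL_TB_LOW) * GB_PER_TB
hbm_per_gpu_gb_high = per_gpu(HBM_TOTAL_TB_HIGH) * GB_PER_TB
nvlink_per_gpu_tb_per_s = per_gpu(NVLINK_TOTAL_TB_PER_S)
gpus_per_cpu = GPU_COUNT // CPU_COUNT

print(f"HBM3e per GPU: {hbm_per_gpu_gb_low:.0f} to {hbm_per_gpu_gb_high:.0f} GB")
print(f"NVLink bandwidth per GPU: {nvlink_per_gpu_tb_per_s:.2f} TB/s")
print(f"GPUs per Grace CPU: {gpus_per_cpu}")
```

Under these assumptions each GPU gets somewhere between about 278 and 292 GB of HBM3e and about 1.8 TB/s of NVLink bandwidth, with two GPUs paired to each Grace CPU.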

Nvidia says the platform delivers up to 50 times the AI factory output of its older Hopper-generation systems. It also claims 10 times the tokens per second per user and five times the throughput per watt.
