The NVIDIA Blackwell platform has been widely adopted by leading inference providers such as Baseten, DeepInfra, Fireworks AI and Together AI to reduce cost per token by up to 10x. Now, the NVIDIA Blackwell Ultra platform is taking this momentum further for agentic AI.
AI agents and coding assistants are driving explosive growth in software-programming-related AI queries, which grew from 11% to about 50% of all queries last year, according to OpenRouter’s State of Inference report. These applications require low latency to maintain real-time responsiveness across multistep workflows, plus long context for reasoning across entire codebases.
New SemiAnalysis InferenceX performance data shows that the combination of NVIDIA’s software optimizations and the next-generation NVIDIA Blackwell Ultra platform has delivered breakthrough advances on both fronts. NVIDIA GB300 NVL72 systems now deliver up to 50x higher throughput per megawatt, resulting in 35x lower cost per token compared with the NVIDIA Hopper platform.
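To see how throughput per megawatt connects to cost per token, consider the energy component of serving cost: at a fixed electricity price, energy cost per token falls in direct proportion to tokens produced per megawatt. The sketch below illustrates this arithmetic with purely hypothetical numbers (the baseline throughput and electricity price are assumptions, not NVIDIA or SemiAnalysis figures):

```python
# Illustrative sketch: how throughput per megawatt maps to energy cost per token.
# All numeric inputs are hypothetical assumptions for demonstration only.

def energy_cost_per_million_tokens(tokens_per_sec_per_mw: float,
                                   price_per_mwh: float = 100.0) -> float:
    """Energy-only cost (USD) to generate one million tokens.

    tokens_per_sec_per_mw: sustained throughput per megawatt of power draw.
    price_per_mwh: assumed electricity price in USD per megawatt-hour.
    """
    tokens_per_mwh = tokens_per_sec_per_mw * 3600  # tokens produced per MWh of energy
    return price_per_mwh / tokens_per_mwh * 1_000_000

# Hypothetical baseline vs. a 50x improvement in throughput per megawatt:
baseline = energy_cost_per_million_tokens(10_000)       # assumed baseline rate
improved = energy_cost_per_million_tokens(10_000 * 50)  # 50x per-MW throughput

print(f"baseline: ${baseline:.2f}/M tokens, improved: ${improved:.4f}/M tokens")
```

Under these assumptions, the energy cost per token drops by exactly the throughput-per-megawatt factor (50x here). End-to-end cost per token also reflects hardware and other operating costs, which is why a total-cost figure can differ from the pure energy ratio.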
By innovating across chips, system architecture and software, NVIDIA’s extreme codesign accelerates performance across AI workloads — from agentic coding to interactive coding assistants — while driving down costs at scale.