How NVIDIA’s Inference Software Stack Powers the Lowest Token Cost

As organizations move from AI pilots to production AI factories, infrastructure decisions have shifted from peak chip specifications to cost per token: how many useful tokens they can deliver per dollar, per watt and within required latency targets.
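The cost-per-token framing above comes down to simple arithmetic: hourly cost divided by tokens produced, energy efficiency as tokens per unit of power, and a filter on latency. The sketch below is a minimal illustration, not an NVIDIA tool. The `ServingConfig` type, its field names and the choice to express the latency target as milliseconds per output token are all hypothetical. It also assumes each deployment sustains its measured throughput for the full hour it is billed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingConfig:
    """A hypothetical deployment option; every figure is an illustrative input."""
    name: str
    tokens_per_second: float    # aggregate output throughput (assumed measured)
    cost_per_hour: float        # all-in hourly cost in dollars (assumed)
    power_watts: float          # average power draw in watts (assumed)
    ms_per_output_token: float  # per-user latency between output tokens


def cost_per_million_tokens(cfg: ServingConfig) -> float:
    """Dollars per one million tokens, assuming full utilization for the hour."""
    tokens_per_hour = cfg.tokens_per_second * 3600
    return cfg.cost_per_hour / tokens_per_hour * 1_000_000


def tokens_per_joule(cfg: ServingConfig) -> float:
    """Tokens per joule: tokens/s divided by watts (joules/s)."""
    return cfg.tokens_per_second / cfg.power_watts


def cheapest_within_latency(configs, max_ms_per_token):
    """Lowest cost-per-token config that meets the latency target, else None."""
    eligible = [c for c in configs if c.ms_per_output_token <= max_ms_per_token]
    return min(eligible, key=cost_per_million_tokens, default=None)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to make the arithmetic easy to follow.
    a = ServingConfig("A", 10_000, 36.0, 10_000, 20)
    b = ServingConfig("B", 30_000, 54.0, 15_000, 50)
    for cfg in (a, b):
        print(f"{cfg.name}: ${cost_per_million_tokens(cfg):.2f} per 1M tokens, "
              f"{tokens_per_joule(cfg):.2f} tokens/J")
    print(cheapest_within_latency([a, b], max_ms_per_token=30))
```

With these made-up inputs, option B produces tokens at half the cost of option A, but a 30 ms latency target rules it out, which is why cost per token only makes sense when measured within the required latency.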

Codesigned with NVIDIA GPUs, CPUs, networking and systems, and strengthened by a broad open source ecosystem, NVIDIA’s full-stack inference software continuously increases the performance that the same hardware delivers. On the NVIDIA Blackwell platform, the software stack has already reduced token costs by up to 5x on the DeepSeek V4 model in just one month.

SemiAnalysis InferenceX results comparing token cost and interactivity for NVIDIA GB300 NVL72 systems with SGLang and the NVIDIA Dynamo inference framework.

Leading companies and inference providers are already seeing the compounding value of NVIDIA’s inference software stack on Blackwell:

Baseten used the NVIDIA TensorRT-LLM open source library to serve DeepSeek V4 Pro on Blackwell GPUs for reasoning, coding and long-context workloads, applying proprietary runtime optimizations to deliver up to 50% more tokens per second.
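A throughput gain and a cost saving are not the same percentage. Assuming the hourly cost of the hardware stays fixed and the latency target is still met, cost per token scales with the inverse of tokens per second. The minimal sketch below uses hypothetical helper names to show that up to 50% more tokens per second corresponds to roughly one-third lower cost per token, and that a 5x token-cost reduction means producing 5x as many tokens for the same spend.

```python
def relative_cost_per_token(throughput_multiplier: float) -> float:
    """Cost per token after a speedup, relative to a baseline of 1.0.

    Assumes the hardware's hourly cost is unchanged, so cost per token
    is inversely proportional to tokens per second.
    """
    if throughput_multiplier <= 0:
        raise ValueError("throughput_multiplier must be positive")
    return 1.0 / throughput_multiplier


def percent_savings(throughput_multiplier: float) -> float:
    """Percentage reduction in cost per token implied by a throughput multiplier."""
    return (1.0 - relative_cost_per_token(throughput_multiplier)) * 100


if __name__ == "__main__":
    # 1.5x = "50% more tokens per second"; 5.0x = a 5x cost-per-token reduction.
    for multiplier in (1.5, 5.0):
        print(f"{multiplier}x throughput -> "
              f"{percent_savings(multiplier):.1f}% lower cost per token")
```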
