Nvidia Releases Its Best Open AI Model Yet—But Still Lags Behind China - Decrypt

In brief

NVIDIA unveiled Nemotron 3 Ultra at Computex on June 1, a 550-billion-parameter open-weight model.

The model delivers over 300 tokens per second on a pre-release DeepInfra endpoint, running three to six times faster than Chinese rivals.

But Kimi K2.6 from Moonshot AI still leads the open-weight intelligence ranking.

Jensen Huang walked onto the Computex stage in Taipei on Sunday, leather jacket on, and unveiled Nemotron 3 Ultra: Nvidia's largest open AI model ever and, at least for now, the smartest open-weight model built in America. It's good. It's just not good enough to beat China.

The model packs roughly 550 billion total parameters but runs on only 55 billion active ones at any given moment, using a design called mixture-of-experts. Parameters are what determine an AI model's breadth of knowledge, and a higher count generally means a more capable model.

To understand how a mixture-of-experts model works, think of a hospital with hundreds of specialists: When a patient comes in, only the relevant doctors actually show up, not everyone on staff. That approach keeps the cost of running the model far lower than its headline parameter count would suggest, which is exactly why Nvidia can claim 5x faster inference and 30% lower costs than comparable open-weight alternatives.

Independent evaluator Artificial Analysis, which partnered with Nvidia on the pre-release assessment, put Nemotron 3 Ultra at 48 on its Intelligence Index, a composite benchmark that aggregates 10 evaluations spanning reasoning, coding, general knowledge, and agentic performance, where a higher score means a smarter model.

That makes it the top U.S. open-weight model by a comfortable margin. The next closest American options are Gemma 4 31B from Google at 39, Nemotron 3 Super at 36, and OpenAI's gpt-oss-120b at 33.
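For readers who want to see the mixture-of-experts idea in concrete terms, here is a minimal Python sketch. Only the 550 billion and 55 billion figures come from the reporting above. The expert count, the top-k value, the random router scores, and the function names are hypothetical, chosen for illustration. Nvidia's actual routing design is not described here, so treat this as a toy model of the general technique rather than a picture of Nemotron 3 Ultra's internals.

```python
import math
import random

# Figures from the article (in billions of parameters).
TOTAL_PARAMS_B = 550
ACTIVE_PARAMS_B = 55

# Hypothetical values for illustration only; not Nemotron's real configuration.
NUM_EXPERTS = 16
TOP_K = 2


def active_fraction(total_b, active_b):
    """Share of the model's parameters that are active at any given moment."""
    return active_b / total_b


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs, highest weight first.
    """
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active share of parameters: {share:.0%}")

    # Stand-in router scores; a real model computes these from the input.
    rng = random.Random(0)
    scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for expert, weight in route_top_k(scores, TOP_K):
        print(f"expert {expert} handles this token with weight {weight:.2f}")
```

In the hospital analogy, `route_top_k` is the triage desk: it scores every specialist but calls in only the top few, which is why the 55 billion active parameters, not the 550 billion total, drive the cost of each response.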
