In brief
Nvidia unveiled Nemotron 3 Ultra, a 550-billion-parameter open-weight model, at Computex on June 1.
On a pre-release DeepInfra endpoint, the model delivers more than 300 tokens per second, running three to six times faster than Chinese rivals.
But Kimi K2.6 from Moonshot AI still leads the open-weight intelligence ranking.
Jensen Huang walked onto the Computex stage in Taipei on Sunday, leather jacket on, and unveiled Nemotron 3 Ultra, Nvidia's largest open AI model ever and, at least for now, the smartest open-weight model built in America. It's good. It's just not good enough to beat China.

The model packs roughly 550 billion total parameters but uses only 55 billion active ones for any given token, thanks to a design called mixture-of-experts. Parameters determine an AI model's breadth of knowledge, and more of them generally means a more powerful model.

To understand how a mixture-of-experts model works, think of a hospital with hundreds of specialists: when a patient comes in, only the relevant doctors show up, not everyone on staff. That approach keeps the cost of running the model far lower than its headline parameter count would suggest, which is why Nvidia can claim 5x faster inference and 30% lower costs than comparable open-weight alternatives.

Independent evaluator Artificial Analysis, which partnered with Nvidia on the pre-release assessment, gave Nemotron 3 Ultra a score of 48 on its Intelligence Index, a composite benchmark that aggregates 10 evaluations spanning reasoning, coding, general knowledge, and agentic performance. Higher scores mean a smarter model.

That makes it the top U.S. open-weight model by a comfortable margin. The next closest American options are Google's Gemma 4 31B at 39, Nemotron 3 Super at 36, and OpenAI's gpt-oss-120b at 33.
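
For readers who want the mechanics behind the hospital analogy, here is a toy sketch of top-k mixture-of-experts routing in Python. It is an illustration under stated assumptions, not Nvidia's implementation. The 10 experts, the router scores, and the helper names (route_top_k, moe_forward) are all hypothetical, and nothing here reflects Nemotron 3 Ultra's actual expert count or routing scheme. The idea: a router scores every expert (specialist) for each token, only the top k experts actually run, and their outputs are blended using weights renormalized over the chosen few.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Hypothetical helper: pick the k highest-scoring experts and
    renormalize their gate weights so they sum to 1."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x: float,
                experts: List[Callable[[float], float]],
                router_logits: List[float],
                k: int) -> float:
    """Hypothetical helper: call only the routed experts and mix their
    outputs by gate weight; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy setup (assumption): 10 "experts", each just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 11)]

# Assumed router scores for one token; a real router would compute these
# from the token's hidden state with a learned layer.
logits = [0.2, 1.3, -0.5, 2.1, 0.0, -1.2, 0.7, 1.9, -0.3, 0.4]

picked = route_top_k(logits, k=2)
print(picked)                               # two experts and their weights
print(moe_forward(2.0, experts, logits, k=2))

# The article's headline ratio: 55B active out of ~550B total parameters.
print(55 / 550)                             # share of weights used per token
```

Routing each token to one or two of ten experts loosely echoes the roughly 10% active ratio, but the mapping is not exact: typical MoE models also run shared layers such as attention on every token, so the active-parameter share is not simply k divided by the number of experts.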
