NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents

NVIDIA has released Nemotron 3 Ultra, an open Mixture-of-Experts hybrid Mamba-Transformer model with 550B total parameters (55B active per token), built for long-running agents. It pairs a 1M-token context with up to ~6x higher inference throughput than comparable open LLMs at on-par accuracy, and ships with open weights, training data, and recipes under OpenMDW-1.1.

Thursday, June 4, 2026

NVIDIA has released Nemotron 3 Ultra, the largest model in its Nemotron 3 family. It targets a specific problem: long-running agents that plan, call tools, and reason across many turns. As agents run longer, token counts grow and inference cost climbs. Nemotron 3 Ultra is designed to keep accuracy high while making that inference faster and cheaper.

What is Nemotron 3 Ultra

Nemotron 3 Ultra is a 550 billion total parameter Mixture-of-Experts (MoE) model. Only 55 billion parameters are active per token. The MoE design improves accuracy per active parameter.
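To make the "active per token" idea concrete, here is a minimal Python sketch. The two parameter counts come from the figures above; everything else is an assumption for illustration. In particular, `top_k_route`, the eight toy experts, and the choice of `k=2` are hypothetical, and the generic top-k gating shown here is not a description of Nemotron 3 Ultra's actual router.

```python
import math

TOTAL_PARAMS = 550e9   # total parameters, as stated in the article
ACTIVE_PARAMS = 55e9   # parameters active per token, as stated in the article


def active_fraction(total: float, active: float) -> float:
    """Share of parameters that take part in one token's forward pass."""
    return active / total


def top_k_route(gate_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights.

    Generic top-k gating, shown only as an illustration of sparse
    activation; the real router may work differently.
    """
    if not 0 < k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Numerically stable softmax over all experts.
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts; the others are skipped for this token.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active share per token: {share:.0%}")
    # Hypothetical gate scores for 8 toy experts, routing to 2 of them.
    print(top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], k=2))
```

With the article's figures, roughly one tenth of the model's parameters do work for any given token, which is where the inference savings of a sparse model come from.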

It uses a hybrid Mamba-Attention architecture instead of a pure Transformer. Mamba layers handle long sequences with sub-quadratic scaling. A few Attention layers are kept for precise recall over large contexts.
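The scaling difference can be illustrated with a rough operation count. The sketch below is a simplification under stated assumptions: it counts one score per token pair for causal self-attention and one state update per token for a linear-time scan, and it ignores hidden sizes, heads, and hardware effects. The function names are hypothetical and do not come from any NVIDIA library.

```python
def attention_pair_count(seq_len: int) -> int:
    """Token-pair scores in causal self-attention: n * (n + 1) / 2."""
    return seq_len * (seq_len + 1) // 2


def scan_step_count(seq_len: int) -> int:
    """State updates in a linear-time sequence scan: one per token."""
    return seq_len


def growth_when_doubling(cost_fn, seq_len: int) -> float:
    """How much the cost grows when the sequence length doubles."""
    return cost_fn(2 * seq_len) / cost_fn(seq_len)


if __name__ == "__main__":
    n = 1_000_000  # the 1M-token context length from the article
    print("attention growth on doubling:", growth_when_doubling(attention_pair_count, n))
    print("scan growth on doubling:", growth_when_doubling(scan_step_count, n))
```

Doubling the context roughly quadruples the attention count but only doubles the scan count, which is why a hybrid can afford to keep just a few attention layers at very long contexts.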

The model was pre-trained on 20 trillion text tokens. Context was then extended to 1 million tokens. It was post-trained using Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD).


