Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

NVIDIA Nemotron 3 Nano Omni is a new omni-modal understanding model built for real-world document analysis, multi-image reasoning, automatic speech recognition, long audio-video understanding, agentic computer use, and general reasoning.

It extends the Nemotron multimodal line from a strong vision-language system to a broader text + image + video + audio model.

Nemotron 3 Nano Omni delivers best-in-class accuracy on complex document intelligence leaderboards such as MMLongBench-Doc and OCRBenchV2, and also leads video and audio leaderboards such as WorldSense and DailyOmni. It achieves top accuracy on VoiceBench for audio understanding and ranks as the most cost-efficient open video understanding model on MediaPerf.

Under the hood, it combines the Nemotron 3 hybrid Mamba-Transformer Mixture-of-Experts backbone with a C-RADIOv4-H vision encoder and a Parakeet-TDT-0.6B-v2 audio encoder.
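To make the encoder-plus-backbone composition concrete, here is a purely conceptual sketch of how a mixed request might be routed to the two encoders before reaching the shared backbone. The article does not describe the actual wiring, so everything here is an assumption for illustration: the `Segment` type, the `ENCODERS` table, the function names, the token counts, and in particular the choice to send video frames through the vision encoder and to pass text directly to the backbone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One piece of a multimodal request (hypothetical type)."""
    modality: str    # "text", "image", "video", or "audio"
    num_tokens: int  # illustrative token count, not a real measurement


# Hypothetical routing table. Only the encoder names come from the article;
# which modality maps to which encoder is an assumption.
ENCODERS = {
    "text": None,                     # assumed: embedded by the backbone itself
    "image": "C-RADIOv4-H",
    "video": "C-RADIOv4-H",           # assumed: frames go through the vision encoder
    "audio": "Parakeet-TDT-0.6B-v2",
}


def build_plan(segments: List[Segment]) -> List[Tuple[str, Optional[str], int]]:
    """Return (modality, encoder name, token count) for each segment, in order."""
    plan = []
    for seg in segments:
        if seg.modality not in ENCODERS:
            raise ValueError(f"unsupported modality: {seg.modality}")
        plan.append((seg.modality, ENCODERS[seg.modality], seg.num_tokens))
    return plan


def context_length(segments: List[Segment]) -> int:
    """Total tokens the backbone would see if all segments share one sequence."""
    return sum(seg.num_tokens for seg in segments)


# Example request: a short prompt, one image, and an audio clip.
request = [Segment("text", 12), Segment("image", 256), Segment("audio", 100)]
plan = build_plan(request)
total = context_length(request)
```

The point of the sketch is only that each modality is handled by a dedicated encoder while a single backbone consumes the combined sequence, which is why long documents, audio, and video all draw on the same context budget.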

