NVIDIA Releases Nemotron 3.5 ASR: A 600M-Parameter Cache-Aware Streaming Model Transcribing 40 Language-Locales in Real Time

NVIDIA’s Nemotron Speech team has released Nemotron 3.5 ASR, a 600M-parameter streaming Automatic Speech Recognition (ASR) model built on a Cache-Aware FastConformer-RNNT architecture. A single checkpoint transcribes 40 language-locales in real time, with punctuation and capitalization built in natively. The model ships as open weights on Hugging Face under the OpenMDW-1.1 license.

What is Nemotron 3.5 ASR

Nemotron 3.5 ASR extends nvidia/nemotron-speech-streaming-en-0.6b to many languages by adding prompt-based language-ID conditioning to the base model. As a result, one 600M-parameter checkpoint covers all 40 language-locales, with no per-language models and no model swapping.
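
One way to picture prompt-based language-ID conditioning is the minimal sketch below. It assumes the prompt is a one-hot vector over the supported locales that is appended to every acoustic feature frame. The locale list and the `language_prompt` and `condition_frames` functions are hypothetical illustrations, not NVIDIA's implementation or any NeMo API.

```python
# Hypothetical sketch: condition every feature frame on a language-locale prompt.
# The locale list and one-hot scheme are illustrative assumptions, not NVIDIA's code.
from typing import List

# Hypothetical subset of the 40 supported language-locales.
LOCALES: List[str] = ["en-US", "de-DE", "es-ES", "fr-FR", "ja-JP"]


def language_prompt(locale: str, locales: List[str] = LOCALES) -> List[float]:
    """Return a one-hot vector identifying the target language-locale."""
    if locale not in locales:
        raise ValueError(f"unsupported locale: {locale}")
    return [1.0 if loc == locale else 0.0 for loc in locales]


def condition_frames(frames: List[List[float]], locale: str) -> List[List[float]]:
    """Append the same language prompt to every acoustic feature frame.

    One shared model sees the prompt alongside the audio features, so switching
    languages means switching the prompt, not loading a different checkpoint.
    """
    prompt = language_prompt(locale)
    # `frame + prompt` builds a new list, so the caller's frames are not mutated.
    return [frame + prompt for frame in frames]


if __name__ == "__main__":
    # Two toy 3-dimensional feature frames standing in for real acoustic features.
    frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    conditioned = condition_frames(frames, "de-DE")
    print(conditioned[0])  # [0.1, 0.2, 0.3, 0.0, 1.0, 0.0, 0.0, 0.0]
```

The point the sketch illustrates is that the language choice travels with the input, so the same weights can serve every locale.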

The model targets two workloads: low-latency streaming for live audio and high-throughput batch transcription. In both cases it outputs production-ready text with proper casing and punctuation, so no separate punctuation-restoration step is needed.
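
To make the streaming workload concrete, here is a minimal sketch of cache-aware chunking, assuming fixed-size chunks and a bounded left-context cache. The `stream_chunks` function and its `chunk_size` and `cache_size` parameters are hypothetical. For simplicity the sketch caches raw input frames, whereas a real cache-aware encoder would typically cache intermediate layer activations. It is not NVIDIA's implementation or the NeMo API.

```python
# Hypothetical sketch of cache-aware chunked streaming (not NVIDIA's implementation).
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def stream_chunks(
    frames: Iterable[T], chunk_size: int, cache_size: int
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (left_context, chunk) pairs as frames arrive.

    The cache keeps only the most recent `cache_size` frames, so each step does a
    bounded amount of work instead of re-processing the entire history.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    cache: Deque[T] = deque(maxlen=cache_size)
    chunk: List[T] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == chunk_size:
            yield list(cache), chunk
            cache.extend(chunk)  # Oldest frames fall out once maxlen is reached.
            chunk = []
    if chunk:  # Flush a final partial chunk at the end of the stream.
        yield list(cache), chunk


if __name__ == "__main__":
    for context, chunk in stream_chunks(range(7), chunk_size=3, cache_size=2):
        print(context, chunk)
    # [] [0, 1, 2]
    # [1, 2] [3, 4, 5]
    # [4, 5] [6]
```

Batch transcription, by contrast, can process a whole recording at once, so latency matters less than throughput.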

Image source: https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b