Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency

Simultaneous interpretation is one of the harder problems in applied AI. You’re asking a model to translate speech before the speaker has finished a sentence. Every extra second of delay breaks the illusion of real-time communication. Alibaba’s Qwen team has been chipping away at this with each release. Their latest model, Qwen3.5-LiveTranslate-Flash, brings that latency down to 2.8 seconds and expands input language coverage to 60 languages.

https://qwen.ai/blog?id=qwen3.5-livetranslate

A Meaningful Jump From the Previous Release

The previous release, Qwen3-LiveTranslate-Flash, handled 18 input languages at roughly three seconds of latency. Qwen3.5-LiveTranslate-Flash brings that down to 2.8 seconds, expands input coverage to 60 languages, and adds speech output in 29 languages. Going from 18 to 60 input languages is more than a 3× expansion. For developers building multilingual products, this reduces the need to switch models per language in most global enterprise scenarios.
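As a quick check of the figures above, the snippet below computes the coverage ratio and the latency saving. It treats "roughly three seconds" as exactly 3.0 s, which is an approximation, so the 200 ms saving is approximate too.

```python
# Figures quoted in the article; PREV_LATENCY_S = 3.0 is an assumption
# standing in for "roughly three seconds".
PREV_INPUT_LANGS = 18
NEW_INPUT_LANGS = 60
PREV_LATENCY_S = 3.0
NEW_LATENCY_S = 2.8


def coverage_ratio(prev: int, new: int) -> float:
    """How many times larger the new input-language set is than the old one."""
    return new / prev


def latency_saving_ms(prev_s: float, new_s: float) -> float:
    """Absolute latency reduction in milliseconds, rounded to absorb float noise."""
    return round((prev_s - new_s) * 1000, 1)


if __name__ == "__main__":
    print(f"coverage ratio: {coverage_ratio(PREV_INPUT_LANGS, NEW_INPUT_LANGS):.2f}x")  # 3.33x
    print(f"latency saving: {latency_saving_ms(PREV_LATENCY_S, NEW_LATENCY_S)} ms")  # 200.0 ms
```

A ratio of 60 / 18 ≈ 3.33 is what backs the "more than 3×" claim.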

The latency improvement comes from a technique for processing what the team calls ‘reading units.’ Rather than waiting for a full sentence to arrive before producing output, the model decides when enough meaning has accumulated in a segment to commit to a translation. It streams output continuously while the speaker is still talking. The underlying logic is the same as in semantic unit prediction, but the tighter implementation shaves roughly 200 milliseconds off the previous release's latency.
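The Qwen post does not publish how the model decides that a reading unit is complete. In the source, the model itself makes that call. The sketch below only illustrates the general segment-and-commit idea, and it is not Qwen's method. It assumes:

- a stream of already-transcribed source words;
- a hypothetical `translate` callback standing in for a model call;
- rule-based boundary cues (`BOUNDARY_WORDS`, `PUNCTUATION`, `max_words`) as illustrative substitutes for a learned decision.

```python
from typing import Callable, Iterable, Iterator, List

# Illustrative cues only (assumptions, not Qwen's method): clause-opening words
# that suggest the buffered chunk is complete, plus clause/sentence punctuation.
BOUNDARY_WORDS = {"and", "but", "because", "so", "which"}
PUNCTUATION = (",", ".", "?", "!", ";")


def stream_translate(
    words: Iterable[str],
    translate: Callable[[List[str]], str],
    max_words: int = 6,
) -> Iterator[str]:
    """Consume source words one at a time and yield a translation for each
    committed segment while input is still arriving.

    `translate` is a hypothetical callback standing in for a model call.
    """
    buffer: List[str] = []
    for word in words:
        # A clause-opening word suggests the buffered chunk can be committed
        # before the new clause starts.
        if word.lower() in BOUNDARY_WORDS and buffer:
            yield translate(buffer)
            buffer = []
        buffer.append(word)
        # Commit at punctuation, or once the buffer reaches max_words,
        # which caps how long the listener waits for output.
        if word.endswith(PUNCTUATION) or len(buffer) >= max_words:
            yield translate(buffer)
            buffer = []
    # Flush whatever remains once the speaker stops.
    if buffer:
        yield translate(buffer)


def bracket(segment: List[str]) -> str:
    """Placeholder 'translator' that just brackets each committed segment."""
    return "[" + " ".join(segment) + "]"


if __name__ == "__main__":
    speech = "I arrived late because the train was delayed.".split()
    for chunk in stream_translate(speech, bracket):
        print(chunk)
    # [I arrived late]
    # [because the train was delayed.]
```

The key trade-off is the one `max_words` exposes. Committing earlier lowers latency, but it risks translating a fragment before its meaning is settled.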
