Simultaneous interpretation is one of the harder problems in applied AI. You’re asking a model to translate speech before the speaker has finished a sentence. Every extra second of delay breaks the illusion of real-time communication. Alibaba’s Qwen team has been chipping away at this with each release. Their latest model, Qwen3.5-LiveTranslate-Flash, brings that latency down to 2.8 seconds and expands input language coverage to 60 languages.
https://qwen.ai/blog?id=qwen3.5-livetranslate
A Meaningful Jump From the Previous Release
The previous model, Qwen3-LiveTranslate-Flash, handled 18 input languages at roughly three seconds of latency. Qwen3.5-LiveTranslate-Flash cuts latency to 2.8 seconds, expands input coverage to 60 languages, and adds speech output in 29 languages. Going from 18 to 60 input languages is more than a 3× expansion in coverage. For developers building multilingual products, this reduces the need for per-language model switching in most global enterprise scenarios.
The latency improvement comes from a technique for processing what the team calls ‘reading units.’ Rather than waiting for a full sentence to arrive before producing output, the model decides when enough meaning has accumulated in a segment to commit to a translation, then streams output continuously while the speaker is still talking. This is the same underlying logic as semantic unit prediction, but with a tighter implementation that shaves roughly 200 milliseconds off the previous release's latency of about three seconds.
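To make the commit-early idea concrete, here is a toy sketch in Python. It is not Qwen's implementation: the real model learns when a reading unit is complete, while this example assumes a crude stand-in rule (commit at punctuation, or after a fixed number of words). The names `reading_units`, `translate_stub`, `BOUNDARY_TOKENS`, and `MAX_UNIT_WORDS` are all hypothetical.

```python
from typing import Iterable, Iterator, List

# Assumed boundary markers; a real system would predict boundaries from meaning.
BOUNDARY_TOKENS = {",", ".", "?", "!", ";"}
MAX_UNIT_WORDS = 8  # assumed cap so a long clause cannot stall the output


def reading_units(tokens: Iterable[str], max_words: int = MAX_UNIT_WORDS) -> Iterator[List[str]]:
    """Yield segments as soon as they look complete, without waiting for the sentence end."""
    buffer: List[str] = []
    words = 0
    for token in tokens:
        buffer.append(token)
        if token not in BOUNDARY_TOKENS:
            words += 1
        # Commit when a boundary arrives or the segment has grown long enough.
        if token in BOUNDARY_TOKENS or words >= max_words:
            yield buffer
            buffer, words = [], 0
    if buffer:
        yield buffer  # flush whatever remains when the speaker stops


def translate_stub(unit: List[str]) -> str:
    """Placeholder for a real translation call; it only tags the segment."""
    return "[translated] " + " ".join(unit)


stream = "the quarterly results , which we published today , exceeded every forecast we had made .".split()
outputs = [translate_stub(unit) for unit in reading_units(stream)]
for line in outputs:
    print(line)
```

With this input, the first segment is emitted after the first comma, long before the sentence ends. The trade-off a real system has to manage is the same one the toy rule exposes: committing earlier lowers latency but leaves less context for each translated segment.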