Alibaba just made its most aggressive move yet into embodied AI. The company launched the Qwen-Robot Suite, a trio of AI models purpose-built to serve as the cognitive backbone for robots operating in the real world.
What Alibaba actually built
The Qwen-Robot Suite consists of three specialized models, each handling a different slice of robotic intelligence.
First, there’s Qwen-RobotManip. Built on the Qwen3.5-4B architecture, it’s a generalist vision-language-action model. In plain English: it lets a robot look at an object, understand a spoken or written instruction about that object, and then physically manipulate it.
Second is Qwen-RobotNav, which handles vision-language navigation. This is the model that lets a robot move through physical spaces by following natural-language directions.