TL;DRAI

Alibaba ha rilasciato Qwen3.7-Plus, modello multimodale agentic con tool invocation, autonomous iteration, rankato #16 Vision Arena. Per stack AI: agent vision-capable per OCR/chart; con rank #5 Vision Arena è alternativa viabile ai soli leader US.

Alibaba’s Qwen team has released Qwen3.7-Plus. The model is now available through Alibaba Cloud’s Bailian platform. Bailian is the console international users access as Model Studio. It offers API services to external developers. The release follows Alibaba’s May unveiling of the Qwen3.7 generation.

Qwen3.7-Plus is a multimodal large language model. The model understands images and video, alongside written prompts. Its sibling, Qwen3.7-Max, is text-only.

This is visual understanding, not generation. The model reads images and video; it does not create them. Alibaba’s image and video generation work sits in separate model families.

Alibaba team describes the release as a step in multimodal hybrid agent technology. An agent is a model that plans and acts across steps. Building on image and video understanding, Qwen3.7-Plus adds five abilities. These are deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration.

Self-programming means the model writes and revises its own code. Tool invocation means it calls external functions or APIs. Verification and testing means it runs outputs and checks results. Autonomous iteration means it loops until the task is done. Together, they describe a model built to act, not just answer.

marktechpost.com

Alibaba's Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba released Qwen3.7-Plus, a multimodal model on Bailian adding image and video understanding, deep reasoning, and tool invocation.

martedì 2 giugno 2026 New tab

TL;DRAI

913 words~4 min read

Qwen3.7-Plus is a multimodal large language model. The model understands images and video, alongside written prompts. Its sibling, Qwen3.7-Max, is text-only.

This is visual understanding, not generation. The model reads images and video; it does not create them. Alibaba’s image and video generation work sits in separate model families.

Alibaba's Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba's Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Other newsrooms on this story

Related reading

Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown…

Alibaba launches Qwen3.6-Plus, its third proprietary AI model in days

Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of…

Alibaba introduces Qwen3.7-Max as next-gen AI agent model · TechNode

Alibaba unveils Qwen3.7-Max, its flagship AI model for real-world tasks

Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context…

Other newsrooms on this story

Related reading

Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown…

Alibaba launches Qwen3.6-Plus, its third proprietary AI model in days

Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of…

Alibaba introduces Qwen3.7-Max as next-gen AI agent model · TechNode

Alibaba unveils Qwen3.7-Max, its flagship AI model for real-world tasks

Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context…