Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown autonomous agent

Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal agent model that combines visual perception, GUI operation, and coding in a single agent loop. In a demo, an agent built on the model autonomously developed a vocabulary learning app, producing more than 10,000 lines of code across more than 1,000 agent calls over eleven hours. The model leads in screen understanding on Qwen's own benchmarks, but its overall performance is mixed. Qwen3.7-Plus is a proprietary offering with no open weights, priced well below Western frontier models.

Saturday, June 6, 2026

Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal model built on top of the text-only Qwen3.7. It combines visual perception with classic agent capabilities like coding and tool use.

Billed as a "multimodal interactive hybrid agent," the model is designed to recognize real-world scenes, read screen content, operate graphical interfaces, write code from visual templates, and navigate mobile apps end to end. UI clicks and command-line instructions run within the same agent loop.
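To make the "same agent loop" idea concrete, here is a minimal sketch of how GUI actions and shell commands could share one loop and one history. It is an illustration only: Qwen has not published this interface, and every name in it (`Click`, `Shell`, `Done`, `run_agent`, `scripted_policy`) is hypothetical. The policy is a scripted stand-in for the model, and no real clicks or commands are executed.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Click:
    """Hypothetical GUI action: click at screen coordinates."""
    x: int
    y: int


@dataclass
class Shell:
    """Hypothetical command-line action."""
    command: str


@dataclass
class Done:
    """Signals that the agent considers the task finished."""
    summary: str


Action = Union[Click, Shell, Done]


def run_agent(policy: Callable[[list[str]], Action], max_steps: int = 10) -> list[str]:
    """Drive one loop in which GUI and shell actions share a single history."""
    history: list[str] = []
    for _ in range(max_steps):
        action = policy(history)
        if isinstance(action, Click):
            # A real agent would send the click and capture a new screenshot.
            history.append(f"click({action.x}, {action.y})")
        elif isinstance(action, Shell):
            # A real agent would execute the command; here it is only recorded.
            history.append(f"shell({action.command})")
        else:
            history.append(f"done({action.summary})")
            break
    return history


def scripted_policy(history: list[str]) -> Action:
    """Stand-in for the model: replays a fixed plan based on the step count."""
    plan: list[Action] = [Shell("swift build"), Click(120, 48), Done("tests passed")]
    return plan[min(len(history), len(plan) - 1)]


trace = run_agent(scripted_policy)
print(trace)
```

The point of the sketch is that both action types feed the same history, so a decision about the next click can depend on the result of an earlier command, and vice versa.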

Eleven hours of autonomous app development

Using Qwen3.7-Plus, the team had a hybrid agent system build an English vocabulary learning app. According to Qwen, the agent ran for over eleven hours, producing more than 10,000 lines of code across more than 1,000 agent calls. The process covered requirements documentation, automated code generation, installation, test case creation, GUI-based testing, parallel test scenarios, and independent version management.

A second demo targets desktop apps: the agent reportedly recreated the native macOS Stocks app by operating it autonomously, parsing the UI structure, and generating SwiftUI code from it. It then connected an external API for real-time stock data, compiled the app, and ran ten functional tests on its own, including price lookups and search filters.
