Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal model built on top of the text-only Qwen3.7. It combines visual perception with classic agent capabilities like coding and tool use.
Billed as a "multimodal interactive hybrid agent," the model is designed to recognize real-world scenes, read screen content, operate graphical interfaces, write code from visual templates, and navigate mobile apps end to end. UI clicks and command-line instructions run within the same agent loop.
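To illustrate what running UI clicks and command-line instructions "within the same agent loop" could look like, here is a minimal sketch in Python. It is not Qwen's implementation: the `Click` and `Shell` action types, the `scripted_policy` stand-in for the model, and the `execute` dispatcher are all hypothetical names chosen for illustration, and the actions are only simulated as strings rather than sent to a real screen or shell.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Click:
    """Hypothetical GUI action: a click at screen coordinates."""
    x: int
    y: int


@dataclass
class Shell:
    """Hypothetical command-line action: a command string to run."""
    command: str


Action = Union[Click, Shell]


def scripted_policy(history: List[str]) -> Optional[Action]:
    """Stand-in for the model: returns the next action of a fixed plan."""
    plan: List[Action] = [
        Shell("npm run build"),
        Click(120, 48),
        Shell("npm test"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def execute(action: Action) -> str:
    """Dispatch one action and return an observation (simulated here)."""
    if isinstance(action, Click):
        return f"clicked at ({action.x}, {action.y})"
    if isinstance(action, Shell):
        return f"ran: {action.command}"
    raise TypeError(f"unsupported action: {action!r}")


def run_agent_loop(
    policy: Callable[[List[str]], Optional[Action]],
    max_steps: int = 10,
) -> List[str]:
    """Single loop that handles GUI and CLI actions through one dispatcher."""
    history: List[str] = []
    for _ in range(max_steps):
        action = policy(history)
        if action is None:  # the policy signals that it is done
            break
        history.append(execute(action))
    return history


if __name__ == "__main__":
    for observation in run_agent_loop(scripted_policy):
        print(observation)
```

The point of the sketch is the shape of the loop: both action kinds pass through the same dispatcher and feed the same history, so the policy can interleave a build command, a click, and a test run without switching between separate systems.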
Eleven hours of autonomous app development
The team used Qwen3.7-Plus in a hybrid agent system to build an English vocabulary learning app. According to Qwen, the agent ran for over eleven hours and produced more than 10,000 lines of code across more than 1,000 agent calls. The process covered requirements documentation, automated code generation, installation, test case creation, GUI-based testing, parallel test scenarios, and independent version management.
A second demo targets desktop apps: the agent reportedly recreated the native macOS Stocks app by operating it autonomously, parsing the UI structure, and generating SwiftUI code from it. It then connected an external API for real-time stock data, compiled the app, and ran ten functional tests on its own, including price lookups and search filters.