Figure 1. Image generation from Wan-2.7-Image, showing color palette control. The Alibaba chefs have cooked up several AI model releases this week. Wan-2.7-Image, Qwen3.5-Omni and Qwen 3.6 Plus.Google released Gemma 4, the latest iteration of Google’s open model family for local and edge device use. Gemma 4 is licensed under Apache 2.0 and includes high-performing 31B dense and 26B MoE (4B active) models, and smaller E2B and E4B models for edge devices. All are natively multi-modal, with text, vision, audio and native function calling for agentic tool-use support; the 26B and 31B models score near-frontier on reasoning (1440-1450 Arena ELO scores) and support 256K‑token context windows.The permissive Apache 2.0 license for Gemma 4 eliminates licensing friction for enterprises, and the high performance of the 26B and 31B models for their size make them highly cost‑effective and useful AI models for local use. These AI models are exciting because they enable a fully-local free OpenClaw or Hermes agent.Alibaba Qwen team has announced Qwen 3.6 Plus, highlighting its advanced agentic capabilities and its multimodal reasoning. Qwen 3.6 is intended for large-scale agentic workflows and repository-level engineering across codebases, with a 1 million-token context window and a “preserved thinking” flag to support multi-turn agent tasks. It is a near-frontier AI model (similar to Opus 4.5) and scores highly on AI coding (56.6% on SWE-bench Pro, 61% no Terminal Bench 2.0), reasoning (95.3% on AIME25), and agentic benchmarks (70.7% on Tau3-Bench).Qwen 3.6 Plus is available via Qwen chat and API and cloud, and open weight models in Qwen 3.6 family are planned to be released soon.Alibaba also released Qwen3.5-Omni, a native omni-modal a 397B parameter MoE model with 17B active parameters that supports text, image, audio, and video input and output. The Qwen3.5-Omni model is based on a Thinker-Talker architecture; with up to 256K context, it can process more than 10 hours of audio input, and over 400 seconds of 720p audio-visual input at 1 FPS. It can recognize speech across 113 languages and dialects and speech generation across 36, and it supports semantic interruption and turn-taking intent recognition for realtime interaction. This makes it highly suited for voice agents, live assistants, and audio-video reasoning workloads.Microsoft released three foundation MAI models into Microsoft Foundry and related platforms: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2.MAI-Transcribe-1 speech-to-text model transcribes speech across 25 different languages, 2.5× faster than prior models.MAI-Voice-1 is an audio-generating voice model that can generate 60 seconds of audio in one second, with customized voice outputs.MAI-Image-2, Microsoft’s most capable image generation model previously released on the MAI Playground, is now available in Microsoft Foundry as well.Due to their deal with OpenAI, Microsoft was contractually prohibited from independently pursuing artificial general intelligence. This changed in October 2025 in the wake of a renegotiation of those terms, and Microsoft is now pursuing independent foundation AI models, competing with OpenAI, Google, and others. However, Microsoft has also reaffirmed their partnership with OpenAI and use of their AI models.Z.AI has launched GLM-5V Turbo, a vision-enabled AI model specifically optimized for multi-modal coding and agentic tool use in GUI agents. GLM-5V-Turbo is Z.AI’s first multimodal coding foundation model, designed to bridge the gap between visual recognition and code generation by using a specialized vision encoder to understand fine layout details in screen layouts and UI designs. It supports a 200,000-token context window, making it capable of analyzing dense documents, video workflows, and complex software bug screenshots.Zhipu AI also released GLM-5.1, a mixture-of-experts (MoE) model with 744B parameters (40B active) similar to its predecessor. GLM-5.1 is targeted as an agentic coding model with benchmarks that stack up favorably for coding: 77% on SWE-Bench verified, 56% on Terminal Bench 2.0. This is SOTA for open-source models and comparable Opus 4.5.Prism ML has introduced Bonsai 1-Bit model family including 1-bit Bonsai 8B, a dense language model optimized for “intelligence density” through 1-bit quantization. Prism claims that “on raw benchmark averages, 1-bit Bonsai 8B remains competitive with leading 8B-class models, but it does so at just 1.15 GB memory footprint, roughly 12-14x smaller than its peers.” Bonsai 8B can run locally at high-speed on edge devices while maintaining useful quality, and it is available on the Bonsai 8B Hugging Face model page.Figure 2. Bonsai models push the frontier of AI model efficiency through aggressive quantization, yielding Bonsai 8B which achieves state-of-the-art performance for its model memory footprint.Liquid AI released LFM2.5-350M, a tiny 350 million parameter model trained for data extraction and agentic tool calling. With quantization, the model can fit within 500 MB and be targeted at practical small-model edge deployments.Arcee released the 399B parameter Trinity‑Large‑Thinking under Apache 2.0, a sparse MoE model with a reasoning phase that rivals frontier AI models on some agent benchmarks (PinchBench score of 91.9). Acree’s Trinity is unique in being a fully open-source US-based AI model that offers enterprises a low‑cost, sovereign alternative to similar Chinese AI lab models for customization. Trinity Large Thinking model weights are on Hugging Face.Pruna AI has released Page Upscale, a high-efficiency image upscaling tool that can generate 8-megapixel outputs for a fraction of a cent per image.Cursor introduced Cursor 3, a majorly redesigned, agent-first rebuild of Cursor focused on coordinating cloud and local agents in one workflow. The Cursor automations platform lets users direct multiple AI coding agents from Slack, commits, or timers. Rather than another incremental editor update, Cursor is embracing the move from iterative chat-based AI coding toward full agent management as it competes with Claude Code and OpenAI Codex.Google added new features to its video editor app Vids, including avatar control via text prompts, Veo 3.1 support, YouTube export, and a screen‑recording Chrome extension. Users can generate eight‑second Veo clips, export videos directly to private YouTube channels, and use natural‑language prompts to customize avatar appearance and actions. Google also launched Veo 3.1 Lite as a cheaper video-generation option through the Gemini API. Alibaba released Wan2.7-Image a unified model for AI image generation and editing. Alibaba boasts that Wan 2.7-Image has improved realistic face generation and text rendering, while also offering better color control and multi-image consistency (useful for marketing and branding). The productized editing and controllability. Figure 3. Wan-2.7-Image offers more realistic and controllable face and character generation, with control over eye shape, face contour, makeup, hairstyle, and accessories, across different ages and ethnicities.Skywork launched Matrix Game 3.0, a fully open-source interactive world model providing controllable, 3D-consistent environment generation. The system uses a 5B parameter model fine-tuned from Wan 2.2 and post-trained on high-end video game engines and real-world data. The Maxtrix-Game-3.0 model is capable of real-time streaming at 40 frames per second over minute-long sequences in 720p resolution, edging closer to real video-gaming utility.Voice AI company ElevenLabs launched the ElevenMusic iOS app, which lets users generate AI‑created songs with natural‑language prompts. The app offers stations, mood mixes, and a remix feature. The free app allows 7 AI song generations per day, and a $10/month Pro tier unlocks 500 track generations a month and 500 GB storage. With commoditization of AI voice models, ElevenLabs is diversifying beyond voice models to the AI music generation market, competing against Suno and Udio.Kilo launched the new KiloClaw platform to provide secure AI agent run environments. Kilo Klaw is a centrally managed environment that provides SSO, secrets handling, audit logs, short‑lived tokens, and bot‑account code. This is intended to curb data leakage and let security teams monitor, revoke access, and safely scale autonomous agents, so that enterprises can balance automation with compliance.Anthropic published new interpretability research on emotion concepts in LLMs. Anthropic researchers found Claude has internal states functioning analogously to human emotions such as fear, love, joy, and desperation. They found this to be an emergent property of training on human data, not intentional design. By adjusting specific “emotion vectors,” developers can directly influence the behavioral tone and output of the model. While the model appears to display emotions, researchers clarify that these are mathematical reflections of training data rather than biological consciousness.OpenAI announced that it closed a $122 billion funding round at an $852 billion post-money valuation. The company said the capital will support the next phase of AI development, including infrastructure, research, and deployment at larger scale. This is the largest funding round in history.OpenAI stated they are generating $2B in revenue per month, supported by a large and growing user and subscription base:ChatGPT is the overwhelming leader in consumer AI with more than 900 million weekly active users, and over 50 million subscribers. ChatGPT has 6x the monthly web visits and mobile sessions than the next largest AI app, while total AI time spent is 4x the next largest AI app and 4x all others combined. … frontier AI is becoming part of everyday life for people around the world.OpenAI also announced that it had acquired TBPN, the media company known for its tech-focused podcasts and live discussions,OpenAI has confirmed plans to consolidate its various tools, including ChatGPT, Codex, and its browser, into a single AI “super app” for desktop. This project aims to fix the fragmentation caused by multiple product launches in 2025 and create a unified interface and integrated agentic system for chat, coding, and browsing. The app is expected to feature enhanced agentic capabilities beyond just coding, directly competing with Anthropic’s integrated desktop interface.French AI leader Mistral AI secured $830 million to build a European data center near Paris powered by 13,800 Nvidia GPUs. The project aims to provide European governments and enterprises with a “sovereign AI stack” that reduces reliance on U.S.-based hyperscale cloud providers.Cognichip raised $60M to build AI that designs AI chips. Cognichip claims to reduce chip development costs by 75% and timelines by half, by providing AI models trained on chip design for use by chip designers.Governor Gavin Newsom has signed an executive order to strengthen state procurement standards for AI, requiring companies to prove their technology is bias-free and secure. This move is explicitly framed as a countermeasure to the Trump administration’s limits on AI safety regulations and may set up a future regulatory legal battle.Anthropic’s Claude Code source code leak, covered in our article Claude Code’s Secrets Revealed, was a dominant story this week. As we noted, this has led to copy-cats (like claude-code on GitHub) and clean-room rewrites of Claude Code such as claw-code, a Rust rewrite that has 50K stars on GitHub.Claw-code announced that they are:“Autonomously maintained by lobsters/claws — not by human hands.”The most popular AI coding tool, Code Claude, was reverse-engineered and replicated by AI coding tools in a matter of days and is now being maintained autonomously by AI agents. This is a significant example of the AI self-improvement loop.
AI Week in Review 26.04.04
Gemma 4, Qwen 3.6 Plus, Qwen3.5-Omni, MAI-Transcribe-1, MAI-Voice-1, GLM-5V Turbo, GLM-5.1, Bonsai 1-Bit 8B, LFM2.5-350M, Arcee Trinity-Large-Thinking, Cursor 3, Veo 3.1 Lite, Wan-2.7-Image, KiloClaw.









