AI Week in Review 26.04.11

Anthropic announced Claude Mythos Preview, its most powerful frontier model yet. The model is so capable at coding that Anthropic did not release it publicly; instead, it announced Project Glasswing, a defensive-security project that gives selected partners access to the model to find and fix vulnerabilities in critical software.

It’s not typical for us to declare a non-released AI model a Top Tool, but as discussed in our article “A Preview of Claude Mythos,” Claude Mythos is perhaps the biggest “step function” leap in AI model capabilities since GPT-4 was released in 2023.

Anthropic’s system card for Claude Mythos Preview describes its advances in coding capabilities (77.8% on SWE-Bench Pro is a stunning advance on Opus 4.6) and overall intelligence, while also exploring its alignment and behavior. Anthropic signaled heightened AI risks due to the model’s cybersecurity capabilities, explaining how it discovered high-severity vulnerabilities in major operating systems and browsers at a much higher level than Opus 4.6.

Anthropic’s announcement of its AI model as “too powerful” may be self-promoting, but sharing its capabilities in a limited preview is safer than a public release.

Meta introduced Muse Spark, a natively multimodal reasoning model with tool use, visual chain-of-thought, and multi-agent orchestration. With SWE-Bench Pro at 52.4%, ARC AGI 2 at 42.5%, and a GDP-val AA Elo score of 1444, it is on par with Gemini 3.1 Pro on many benchmarks. This first release from Meta Superintelligence Labs puts Meta back in the AI race after the Llama 4 release misfired. This is not an open AI model.
Instead, Meta is releasing Muse Spark to the Meta AI app and website now, with a planned rollout to WhatsApp, Instagram, Facebook, Messenger, and AI glasses in the coming weeks.

Meta AI’s harness for Muse Spark reportedly uses 16 internal tools, including web search, content search across Meta properties, Python execution, file editing, visual grounding, sub-agent spawning, and account-linking hooks for services such as Gmail and Google Calendar.

In conjunction with the Muse Spark release, Meta reported on how it builds and tests advanced AI and published an update to its Advanced AI Scaling Framework, previewing a safety report for Muse Spark. Meta is expanding its evaluation to cover chemical, biological, cybersecurity, and loss-of-control risks. Meta tested Muse Spark before and after safeguards against thousands of adversarial scenarios and found it lacked enough autonomous capability to pose control risks in those evaluations.

Z.ai officially released GLM-5.1 as an open-source model, touting its top spot among open models on SWE-Bench Pro (58.4%) and its support for long autonomous runs. GLM-5.1 is a 754B parameter Mixture-of-Experts (MoE) model with 40B active parameters, engineered for long-horizon autonomous tasks, coding, and AI agent use. The model was trained entirely on Chinese Huawei Ascend hardware, and it is available via Hugging Face and API providers such as OpenRouter.

“We expect harnesses to continue evolving. So we built Managed Agents: a hosted service in the Claude Platform that runs long-horizon agents on your behalf through a small set of interfaces meant to outlast any particular implementation, including the ones we run today.” - Anthropic

Anthropic launched Claude Managed Agents on the Claude platform as a set of composable APIs for building and deploying cloud-hosted agents.
This system separates the harness, sandbox, and session interfaces to support long-horizon AI agent execution: sandboxed code execution, state management, credentials, permissions, tracing, long-running sessions, and multi-agent coordination. Anthropic said Managed Agents makes components easier to recover or replace independently and reduces both debugging difficulty and security exposure for AI agents.

Figure 2. The components of Claude Managed Agents.

Google has integrated NotebookLM into the Gemini app, merging its research tool features to make Gemini more capable. The notebooks integration in the Gemini app lets users gather files, past chats, and custom instructions into a single context for the AI chatbot. Users can organize projects and focused research tasks with topic-based categorization similar to ChatGPT’s Projects. The notebooks feature has been released for web app users on the Ultra, Pro, and Plus plans, with mobile and free-tier access to follow.

Factory.ai launched a desktop app for its AI Droids on macOS and Windows. The app supports multi-agent sessions, persistent “Droid Computers,” local model support through Ollama or vLLM, computer-use features, and VS Code integration. The release extends Factory’s agent workflow from the command line into a native desktop environment for running multiple agents in parallel.

Cursor announced that users can control agents remotely from a phone or another device, running AI agents on any remote machine. The remote-execution update is designed to let developers launch coding agents on remote development machines and manage them away from their main workstation.

Alibaba’s Taotian Group anonymously launched HappyHorse-1.0, a 15B parameter video model that recently took the top spot on the Artificial Analysis video arena for its high-fidelity generative video, beating rivals like Seedance 2.0 and Kling 3.0 and shaking up the fast-moving AI video generation space.
You can try out the text-to-video model at the HappyHorse-1.0 site.

World Labs rolled out Marble 1.1 and Marble 1.1-Plus. Marble 1.1 offers artifact reduction and improvements to lighting and contrast, while the Plus version can generate larger, more complex environments. The resulting AI-generated worlds are getting closer to (but are not yet at) video generation quality.

OpenAI’s GPT-Image-2 was reportedly spotted on LM Arena under the codenames “maskingtape,” “gaffertape,” and “packingtape.” This Image 2 model reportedly performs better than Nano Banana Pro in image generation, with excellent photorealism and text rendering. This is a leak and test appearance with no formal launch yet, but a public release soon is likely.

OpenClaw’s latest release, as of 2026.4.9, is a major update that introduces the /dreaming feature for memory consolidation. The OpenClaw Dreaming feature enables agents to reorganize memory across Light, Deep, and REM phases, generating a human-readable “Dream Diary” in dreams.md. The update also adds built-in video and music generation, broader language support, and GPT-5.4 as the new default AI model, as Anthropic is now blocking use of Claude subscriptions on OpenClaw.

OpenAI Prism launched Paper Review, an AI workflow for evaluating scientific and technical papers. By providing a detailed technical review, the tool is intended to improve rigor, correctness, and reproducibility in research review. This AI-assisted review system will further automate the peer-review process and accelerate the scientific process. It may reduce low-quality papers, but it is unclear whether it will improve the quality of scientific submissions.

OpenAI introduced the ChatGPT Pro $100 per month tier for Codex, offering five times the usage limits of the $20 Plus plan, with higher local-message, cloud-task, and code-review caps.
Designed to compete with Claude Max and capture users displaced by Anthropic’s Claude restrictions on OpenClaw, the plan also offers temporary boosts until May 31 and exclusive access to GPT-5.3-Codex-Spark.

X launched a new Grok-powered photo editor in the X post composer that includes Grok’s “Edit with Words” image generator and a redaction blur tool. The editor also adds standard drawing and text tools, but the AI text-driven image editing is the main new feature. The update brings generative image editing directly into X’s posting workflow.

Google’s latest Gemini upgrade lets the chatbot generate interactive 3D models and simulations. Starting from user prompts, the feature creates controllable, interactive visualizations of physical scenarios, such as fractals, the Moon’s orbit, or molecular interactions. This is similar to visualization tools added by Anthropic and OpenAI to their interfaces, and it is available to Gemini Pro app users.

Google has launched AI avatar capabilities on YouTube Shorts, letting creators generate clips of up to eight seconds, with strict usage limits and visible AI labels.

HeyGen launched Avatar V, the latest generation of its avatar tool, with improved character consistency across scenes. Avatar V can now capture a user’s identity from 15 seconds of input and keep that identity consistent across generated videos, allowing users to change outfit, setting, and look while preserving the same underlying character across outputs.

Google added Learn Mode and Custom Instructions to Gemini in Colab. Google said Learn Mode turns Gemini into a coding tutor that explains concepts step by step, while Custom Instructions let users set coding preferences, libraries, or class-specific guidance at the notebook level.
The changes are intended to make Colab’s Gemini integration more personalized and more useful for learning, not just code generation.

Spotify expanded its AI-powered Prompted Playlists feature so it can include podcasts as well as music.

Seedance 2.0 has launched on Replicate, with support for multiple reference images, videos, and audio files for cinematic AI video generation. In addition, CapCut rolled out Dreamina Seedance 2.0 in the United States across its app, desktop, and web products.

Anthropic published guidance on subagents in Claude Code, explaining when delegation is useful in long, complex coding sessions. The guidance explains how using subagents can improve focus, context management, and reliability in workflows. Subagents help isolate sub-tasks, so the main session does not accumulate unnecessary context. The guidance also presents typical applications, such as workflow pipelines and research-heavy tasks, and explains how to build skills for subagents.

Weights and Biases research finds that giving models more reasoning time can sometimes reduce performance rather than improve it. The report studied Claude Opus 4.6 and GPT-5.4 and found that maximum thinking effort lowered Claude Opus’s performance by 11.9 percentage points but raised GPT-5.4’s by 25.0 percentage points.

Similar work in “Brevity Constraints Reverse Performance Hierarchies in Language Models” showed a counterintuitive phenomenon in which larger LLMs underperform smaller ones because overelaboration introduces errors.

Netflix released VOID, a physics-aware, open-source AI video tool designed for advanced inpainting and object removal.
Presented in “VOID: Video Object and Interaction Deletion,” VOID is a fine-tune of CogVideoX that allows editors to remove objects while naturally simulating the resulting physical interactions; for example, a guitar falls when the person holding it is removed.

Google DeepMind presented “AI Agent Traps,” a study documenting how adversarial content embedded in web pages can exploit autonomous agents. The study found that hidden prompt injections in HTML could hijack agent Operative Loops in 86% of scenarios, while latent memory poisoning can corrupt an agent’s persistent reasoning with less than 0.1% data contamination. Malicious websites can detect agents via timing, behavior, or user-agent strings, then feed them manipulated data.

Anthropic’s revenue growth has accelerated rapidly, with the company’s annualized recurring revenue (ARR) tripling to $30 billion in April. To sustain this growth, Anthropic secured a compute deal with CoreWeave and an expanded compute deal with Google and Broadcom for 3.5 gigawatts of TPU compute capacity.

“We are making our most significant compute commitment to date to keep pace with our unprecedented growth.” - Krishna Rao, CFO of Anthropic.

OpenAI’s post “The next phase of enterprise AI” shows that OpenAI is continuing to push its agentic AI solutions further into the enterprise. OpenAI’s Codex reportedly reached 3 million weekly active users, up from 2 million the prior month. OpenAI tied that growth to broader engagement with agentic workflows in enterprise settings. The platform is also said to be expanding with plugins, sub-agents, and “Guardian Approvals,” an experimental workflow for escalating only higher-risk tool calls.

OpenAI released a Child Safety Blueprint, a policy framework focused on AI-enabled child sexual exploitation and age-appropriate AI design. The framework calls for updated laws around AI-generated CSAM, stronger provider reporting and law-enforcement coordination, and safety-by-design protections in AI systems.
It was published as a policy and safety initiative rather than a model release.

Utah has authorized a 12-month pilot for Legion Health to use an AI chatbot to autonomously renew certain psychiatric prescriptions for stable patients. The system features rigid safety guardrails, immediately escalating to a human clinician if it detects suicidality or mania.

AI agent traffic now accounts for nearly half of requests to technical documentation, according to analysis by Mintlify: AI agents such as Claude Code and Cursor made 45.3% of all requests, nearly tying traditional human-driven browsers. Mintlify suggests that online documentation needs to be tailored for AI agent consumption to accommodate this shift.

AI agents equipped with coding ability are the basis of the growing class of “do anything” applications from Google, Lovable, and even Claude Cowork. In this AI design convergence, every AI application begins to look more like a general AI management tool, as knowledge work itself converges. Coding is central to this functionality.
