AI Week in Review 26.04.18

Figure 1. Screenshot from Nvidia’s Lyra 2.0, an explorable 3D world model with high-resolution, consistent 3D generation.

Anthropic released Claude Opus 4.7 as its latest flagship model for complex, long-running agentic tasks, with significant enhancements in coding, visual processing, and practical task execution. The new model introduces an “xhigh” effort level for deeper reasoning and a self-verification feature that allows the model to check its own outputs before reporting back.

Claude Opus 4.7 narrows the performance gap with the unreleased Claude Mythos model and posts state-of-the-art benchmark results for a generally available AI model: 64.3% on SWE-bench Pro; 69.4% on TerminalBench 2.0; 1753 on GDPval-AA. Claude Opus 4.7 offers a 3-fold resolution leap in visual reasoning with higher model accuracy: 79.3% on BrowseComp.

Figure 2. Claude Opus 4.7 is a SOTA frontier AI model that closes the gap with Claude Mythos in agentic and visual reasoning benchmarks.

Anthropic intentionally limited Claude Opus 4.7’s cybersecurity-related capabilities to test automated safeguards and make it less risky. Opus 4.7 follows instructions more literally, uses a new tokenizer that can increase token counts by as much as 1.35x, and may generate more output tokens during extended reasoning. Pricing for Opus 4.7 is in line with Opus 4.6 on a per-token basis, so the cost of the same workload can still rise with the higher token counts.

Anthropic this week also introduced Claude Design, an AI-powered visual design tool that generates high-fidelity prototypes, mockups, web design assets, and presentations. The platform allows users to establish a permanent design system by uploading brand assets, so that generated designs follow a consistent visual identity. Claude Design uses Opus 4.7 for its improved visual reasoning performance, and it integrates with Claude Code, allowing developers to automatically build the relevant HTML and CSS files for designs. It is available to all Claude subscribers.

Figure 3.
The Claude Design interface enables interactive iterations on a design.

OpenAI announced an expanded Codex that supports broader desktop and workflow automation. This “Codex for (almost) everything” transforms Codex from a coding assistant into a full-blown computer agent that is capable of controlling desktop apps, browsing the web, clicking and typing in the background, connecting to 90+ plugins, and automating multi-step workflows. It features persistent memory, image generation with gpt-image-1.5, and multi-terminal support. It is rolling out to ChatGPT subscribers with the Codex desktop app on macOS now, with Windows to follow.

OpenAI is making Codex into its Desktop Super App for all kinds of desktop work tasks, including software development, research and clerical work, to better compete with Anthropic, where Claude Code and Claude Desktop appear to dominate.

OpenAI has launched GPT-Rosalind, a frontier reasoning model specifically fine-tuned for scientific research in biology, drug discovery, and medicine. Bioinformatics benchmarks such as BixBench and LABBench2 indicate GPT-Rosalind is twice as effective as GPT-5.4 in experimental design and analysis, while partners like Amgen, NVIDIA, Moderna, and Ginkgo Bioworks report accelerated R&D and cost savings. The Codex plugin orchestrates workflows to bridge ideas to experimental validation, and it features an interface for generating graphs and research findings. It is available as a research preview, with a trusted-access program for vetted enterprise customers.

Alibaba’s Qwen team released Qwen 3.6-35B-A3B as an open model. It is a 35B parameter MoE with 3B active parameters, native multimodality, and a 262K context window that can be extended to 1M. Scoring 49.5% on SWE-bench Pro, it is SOTA for its size and suitable for stable, practical coding and agentic use.
Model weights are available via Hugging Face.

Google launched Gemini 3.1 Flash TTS, a text-to-speech model with prompt-based control over emotion and inflection in speech delivery. It supports two speakers, more than 70 languages, and inline natural language tags like [laughs] or [sighs] to control pacing and emotion. It operates in batch mode with a 3-second latency, so it’s not for real-time interaction, but it offers a significantly cheaper alternative for expressive AI-generated audio.

NVIDIA released Lyra 2.0, a framework for generating persistent, explorable 3D worlds from a single image. NVIDIA said it addresses spatial forgetting and temporal drift through progressive scene generation and reconstruction into explicit 3D representations. The system is designed for exploration and simulation of generated environments.

Alibaba has unveiled Happy Oyster, an open-ended world model capable of real-time world creation and character interaction. The system functions as a controllable video model where users can navigate a character through stylized or realistic environments using keyboard commands. Notably, the model generates synchronized audio alongside the video and features distinct “director” and “wandering” modes for different levels of simulation control.

Tencent released the HYWorld 2.0 world model framework, an open system for world generation and reconstruction that can turn an image, text prompt, or video into editable 3D world assets. The system outputs Gaussian splats, meshes, and point clouds that can be imported into Unity, Unreal, Blender, and NVIDIA Isaac Sim.

Windsurf released version 2.0 of its agentic IDE, upgrading it with an Agent Command Center, Devin integration, and Spaces for persistent project context.
The product combines local agents and cloud agents so users can plan locally and hand off execution to Devin in a cloud VM.

Baidu open-sourced ERNIE-Image, an 8B diffusion transformer for text-to-image generation that excels at multilingual text rendering for posters, comics, infographics, and other text-heavy visuals. The ERNIE-Image Turbo variant cuts generation down to eight inference steps to speed output. Baidu said the model leads open-weight systems on GenEval.

Figure 4. ERNIE-Image is an open-weight, high-quality text-to-image AI model that you can run locally.

Marimo launched marimo pair, a new computational environment that lets coding agents work directly inside reactive Python notebooks. Supported agents include Claude Code, Codex, and OpenCode, and the system can read variables, run cells, test logic, and manipulate UI elements.

OpenAI expanded its Trusted Access for Cyber program with the release of GPT-5.4-Cyber, a specialized model for binary reverse engineering and malware inspection. This move directly counters Anthropic’s restricted “Mythos” rollout by providing thousands of vetted defenders with a more permissive security-focused model.

Anthropic introduced Claude Code Routines to automate recurring work with cloud-hosted agents. The feature supports cron jobs, GitHub-event triggers, and API-triggered runs.

Consumer neurotech startup Sabi announced an AI-powered beanie featuring 70,000 biosensors designed to translate brain activity into device commands. The company aims to move beyond keyboards and touchscreens by making thought-driven computer control a consumer reality by late 2026.

Canva’s new AI-powered assistant, Canva AI 2.0, lets users generate brand-consistent, editable designs from natural-language prompts, automatically producing multiple layer-based options and streamlining the full workflow.
It integrates with Slack, Gmail, Google Drive, and Anthropic, and adds web research, scheduling, faster image generation, and lower cost.

Gemini’s Nano Banana can now use your Personal Intelligence to create more relevant, personal images. Gemini now uses your Google Photos, other personal images, and personal preferences to generate custom images automatically, eliminating long prompts and manual uploads.

The Gemini app is now on the Mac. Available free for macOS 15 and up, the Gemini app can be launched with Option + Space, lets users share their screen for contextual AI help on local files, and generates images and videos without leaving the workflow.

OpenAI’s Agents SDK now ships with a native harness and sandbox execution, letting agents inspect files, run commands, edit code, and tackle long-horizon tasks while preserving durable state in tightly controlled environments. First released for Python (TypeScript to follow), the framework is available to developers via the API at standard pricing.

Google has upgraded Chrome’s AI Mode to let users open source links side-by-side with the chat, enabling follow-up questions while preserving the original search context. A new “plus” menu lets users pull data from open tabs, images or files into AI Mode searches for richer, context-aware queries.

Along with an upgraded Android CLI, Google is launching a new Android skills GitHub repository and an Android Knowledge base. These additions give AI agents dedicated documentation and code snippets for coding tasks.

Physical Intelligence has a new robot model, π0.7, which can do tasks it was never taught, such as using an air fryer, with minimal coaching. The model blends sparse task-specific data with web-based pretraining to remix learned skills, achieving high success after prompt refinement.
It is still limited to guided steps and lacks single-command autonomy.

Roblox is introducing new agentic features in its Roblox Assistant to help developers plan, build, and test games on its platform. Roblox is adding a Planning Mode that asks clarifying questions and builds action plans, and it is introducing Mesh and Procedural Model Generation to create 3D assets.

DeepL released a voice-to-voice translation suite covering meetings, mobile and web conversations, and group chats for frontline workers. The platform offers add-ons for Zoom and Teams, an API for custom use cases, and a QR-code entry for group sessions.

Warp terminal added universal support for CLI coding agents, including Claude Code, Codex, Gemini CLI, and OpenCode, to make the terminal work better for agentic development. The update introduced vertical tabs, notifications, native code review, rich input, and mobile remote control for managing multiple agent sessions.

Researchers at Together AI and UCSD introduced Parcae, a stable looped transformer architecture that achieves the quality of models twice its size by treating the forward pass as a dynamical system. The paper “Parcae: Scaling Laws For Stable Looped Language Models” establishes the first scaling laws for layer looping, which suggest that looping and data should be increased in tandem for a fixed FLOP budget.

In a significant breakthrough for AI-automated research, nine parallel Claude Opus 4.6 agents recovered 97% of a performance gap in a weak-to-strong supervision problem, outdoing Anthropic’s own human researchers.

In collaboration with Nvidia, Cursor developed a multi-agent system that achieves a 38% speedup in CUDA optimization. The multi-agent system automatically writes and optimizes CUDA kernels through a continuous loop of writing and benchmarking.
In a three-week trial, the system improved performance by an average of 38% across hundreds of real-world problems.

CoreWeave announced major AI infrastructure deals involving Anthropic, Jane Street and Meta. The company said it now serves 9 of the top 10 AI model providers. CoreWeave has become a major GPU and cloud supplier for leading AI labs and hyperscalers.

Anthropic CPO Mike Krieger resigned from Figma’s board on April 14, ahead of the release of Claude Design, a tool that rivals Figma by producing slide decks, prototypes and marketing assets through conversational prompts, with export to Canva, PDF, PPTX or HTML. Anthropic’s push into enterprise AI tools is sparking investor concerns that AI could displace SaaS applications; Figma stock fell 5% on Friday.

SimpleClosure has launched a platform that lets companies sell unused code, Slack messages, emails, and workspace data to AI firms, sparking a new industry of “reinforcement learning gyms” that use defunct corporate data to build simulated workplace environments.

AI startup funding news:

AI coding startup Cursor is in talks to raise at least $2 billion in fresh capital that would give it a $50 billion valuation.

Factory raised $150M at a $1.5B valuation to compete in the AI-assisted coding market. Factory develops AI agents for the enterprise and differentiates by supporting multiple foundation models like Claude and DeepSeek.

AI infrastructure company Upscale AI is reportedly in talks to raise $180-200 million in a third funding round, valuing the startup at around $2 billion.

Antioch, a physical AI startup that lets robot builders spin up digital replicas that mimic real sensor data, raised $8.5 million in seed funding, valuing it at $60 million.

InsightFinder AI raised $15M to scale its AI observability platform that monitors model reliability across tech stacks.

Luma launched Innovative Dreams, a filmmaking production studio built in partnership with Wonder Project.
Using real-time hybrid filmmaking tools, it will combine performance capture and generative AI virtual production to cut costs. The launch follows a trend of studios moving to lower production costs with generative AI.

The White House Office of Management and Budget (OMB) is preparing to give federal agencies access to Anthropic’s cybersecurity-focused Mythos AI model, a sign of government concerns about the cybersecurity implications of the powerful AI model. This comes after Anthropic demonstrated the model to Fed Chair Jerome Powell.

The cost reduction from using AI in film production could be massive. Runway CEO Cristóbal Valenzuela proposed that studios can replace a single $100 million film with 50 AI-generated movies at the same cost. While AI can efficiently expand output, scaling output without scaling creativity will not produce quality art.

“If you’re spending a hundred million dollars on making one feature film, which is 90 minutes, imagine taking a hundred million dollars and spending it on, like, 50 movies. Same quality. Same amount of output, visually. But you make way more content. So, you have way better chances of hitting something. It’s a quantity problem.” – Runway CEO Cristóbal Valenzuela
