AI Week in Review 26.05.23

Figure 1. Multi-modal world model Gemini Omni is the Nano Banana for video. Omni can take multiple inputs (audio, video, image, text) to create a video on command. Prompt used for this: Dynamic sci-fi file style video based on input image, audio track from audio file, and elements lighting up from video input.Google presented many AI updates at Google I/O this week, and we shared our breakdown of highlighted announcements and releases in a prior article. There were many AI announcements at Google I/O, but to recap our recap, these were the most important ones:Google introduced Gemini Omni, a multimodal world generation model that can create “anything from any input,” with natural-language editing across text, image, and video prompts, and video generation output.Google released Gemini 3.5 Flash and previewed Gemini 3.5 Pro. Gemini 3.5 Flash is positioned for agentic workflows, coding, long-horizon tasks, multimodal understanding, and real-time. Gemini 3.5 Pro is expected to roll out next month.Google introduced Antigravity 2.0, an agent-first platform that revamps Antigravity, and also unveiled the Gemini Spark personal agent, a 24/7 personal AI agent built on Gemini 3.5 and Antigravity.Google announced many AI-infused features across the Google ecosystem, including major AI updates for Search, personalized Daily Briefs, Universal Cart for AI-assisted shopping, Ask YouTube for video search, Google Pics for image editing, and intelligent eyewear powered by Gemini.One way to summarize Google’s direction: Google expanded its agentic product layer across Search, Gemini, Workspace, shopping, YouTube, and Android XR. The Verge summarized Google I/O as a broad AI platform push across models, agents, apps, and hardware.Alibaba’s Qwen Team released Qwen3.7-Max, a new model designed for long-horizon autonomous agentic tasks. The model demonstrated up to 35 hours of continuous autonomous execution during an engineering task and features a 1 million token context window. Qwen3.7-Max is a proprietary model that outperforms all Chinese competitors on reasoning and coding benchmarks and matches Claude Opus 4.6.Figure. Qwen 3.7 is SOTA across many coding and agentic benchmarks; it’s the best Chinese AI model and a match for Claude Opus 4.6.OpenAI updated Codex with richer context, goal mode, browser improvements, and locked computer use. The latest Codex release added Appshots for attaching macOS app windows to Codex threads, general availability of Goal Mode across the Codex app, IDE extension and CLI, improved browser annotations, and locked computer use for eligible Mac Computer Use users. The update is aimed at making Codex more useful for longer-running software.Cohere unveiled Command A+, a highly optimized 218B parameter language model released under a permissive Apache 2.0 license for open-source enterprise use. The Command A+ model utilizes a Sparse Mixture-of-Experts architecture and key features include hardware-efficient quantization for single-GPU deployment, multimodal capabilities, and improved tokenization for non-European languages. It is available on Hugging Face.Amazon Nova Act now qualifies as a HIPAA eligible service. This expansion allows healthcare organizations to deploy autonomous, browser-based AI agents to automate complex workflows involving protected health information. The service can automate tasks such as appointment scheduling, insurance verification, and claims processing.Anthropic announced updates to Project Glasswing, allowing qualifying customers access to a Claude harness, a threat model builder, and various skills. The company also plans to expand the project to additional partners and has released a dashboard for open-source vulnerabilities.Cerebras Systems announced high-speed inference for the trillion-parameter Kimi K2.6 model. The chipmaker is running Moonshot AI’s open-weight model at 981 output tokens per second, significantly outperforming GPU-based cloud providers. This enterprise-first deployment utilizes wafer-scale architecture to provide massive speed improvements for agentic coding and heavy workloads.Copenhagen-based healthcare AI Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models. The models achieved a 1.4% word error rate on English medical terminology, significantly outperforming generalist APIs from OpenAI, ElevenLabs, and Whisper. The technology also demonstrated a 98.3% recall rate on clinical entities and surpassed the performance of the legacy incumbent, Dragon Medical One.An OpenAI reasoning model autonomously disproved a major conjecture in discrete geometry called the unit distance problem. OpenAI said an internal general-purpose reasoning model produced a proof resolving the long-running planar unit distance problem, originally posed by Paul Erdős in 1946. The proof, reviewed by external mathematicians, is notable because the model was not a math-specialized system and used ideas from algebraic number theory to disprove a conjecture many mathematicians believed was likely true.This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics. … The result is also notable for how it was found. The proof came from a new general-purpose reasoning model, rather than from a system trained specifically for mathematics, scaffolded to search through proof strategies, or targeted at the unit distance problem in particular.ComplexMCP was introduced as a benchmark for LLM agents in realistic tool-use environments. The paper on ComplexMCP argues that many agents can call isolated APIs but struggle when tools are interdependent, noisy, and embedded in workflows that resemble commercial software automation. The benchmark is intended to measure the “last mile” of agent performance, where success depends not just on calling tools but on managing state, dependencies, and changing environments. A new benchmark called SMDD-Bench tests whether LLMs can solve real-world small-molecule drug discovery tasks. The benchmark evaluates frontier open and closed models on tasks requiring chemical and biological reasoning, 3D intuition, specialized tool use, and planning under limited oracle calls.The paper “Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions” found that autonomous agents such as OpenClaw are vulnerable to systemic, architecture-level vulnerabilities that exploit multi-turn interactions. The study evaluated a standard agent framework with 10 mainstream LLM backbones across 20 threat scenarios and found that an evasion framework raised the average risk trigger rate from 28.3% to 52.6%. They constructed A3S-Bench to evaluate AI agents on such vulnerabilities.Andrej Karpathy has joined Anthropic as an individual contributor on the pre-training team. This is big news because he is a highly-respected AI researcher, one of the pioneers in the AI space, who cofounded OpenAI and led AI at Tesla for many years. He’s the 3rd senior former OpenAI employee to join Anthropic in the last two years.Nvidia reported record first-quarter fiscal 2027 revenue amid increasing AI infrastructure demand. Revenue for the quarter ending in April reached $81.6 billion, up 20% from the previous quarter and 85% from a year earlier. AI compute demand from hyperscalers, enterprises, and AI labs to support training and inference continues to expand.Google and Blackstone announced a new AI cloud infrastructure venture that will serve up Google TPU AI support through a compute-as-a-service model. Blackstone will initially invest $5 billion in equity to help bring 500 megawatts of data center capacity online by 2027, and total investment could reach $25 billion including leverage.The White House prepared an AI oversight executive order focused on model review and cybersecurity risks that would create a voluntary framework for AI labs to engage with the Federal Government before releasing covered models. The proposed order would task agencies with evaluating AI security following concerns regarding models like Anthropic’s Mythos and OpenAI’s GPT-5.5 Cyber.However, Trump called off those plans because of concerns such regulation could dull America’s edge on AI technology. A major point of contention is a requirement for companies to share advanced models with the government up to 90 days ahead of launch. The turnabout reflects tension between AI safety advocates pressing for stronger regulation and tech-industry allies who favor voluntary cooperation.U.S. lawmakers have moved to counter Chinese AI and technology exports with legislation to bolster American exports, as part of a broader geopolitical contest over AI infrastructure, chips, and digital technology.OpenAI and Dell Technologies are collaborating to deploy Codex in the enterprise, integrating Codex with the Dell AI Data Platform to support hybrid and on-premises workloads. Codex-powered agents will be utilized for tasks including software development and business workflow automations.Regis Aged Care implemented RegiCare Assist AI assistant to streamline clinical care management. Developed with Microsoft Copilot Studio and Microsoft Foundry, the assistant summarizes voluminous progress notes and flags clinical concerns.Spotify and Universal Music Group have entered a licensing deal to launch an AI remix tool that allows fans to create AI-powered covers from UMG’s catalog. Spotify also revealed new AI-driven tools for audiobook and podcast production during its Investor Day.Axios reported this week on some fresh polling on AI policy, and the results show Democrats becoming more AI skeptical, Republicans more likely to trust AI companies, and anxiety among the younger voters that AI will harm job opportunity. Overall, a majority are “pro AI innovation” and have a nuanced view of AI regulation, between the poles of “Pause AI” and “get Govt out of the way.” The most popular position - a 41% plurality - is that the government needs “basic safety” standards that keep American companies competitive:Figure 3. A survey question response from the Harris Axios survey on AI. In a twist of irony, Steven Rosenbaum explains how inaccurate quotes got into his book The Future of Truth. A New York Times investigation revealed that the use of AI tools during research led to several improperly attributed or synthetic quotes in the book. Rosenbaum is currently conducting a citation audit to correct these errors in future editions. The book is about “how Truth is being bent, blurred, and synthesized” thanks to the “pressure of fast-moving, profit-driven AI.”He blames AI because it was deceptively easy to use, but he proves that AI slop is due to sloppy fact-checking and editing and humans cutting corners. A good workman never blames his tools.

AI Week in Review 26.05.23

AI Week in Review 26.05.23

Other newsrooms on this story

Related reading

Google I/O Recap

Google Unveils Gemini Omni—A Next-Gen AI Video Builder That Can 'Simulate the…

Google unveils Gemini 3.5 Flash, Omni, and Spark AI

Gemini Omni shows where AI video tools are heading next

11 demos of Gemini Omni and Gemini 3.5 in action

Google unveils Gemini Omni, a multimodal AI model that generates video from…

Other newsrooms on this story

Related reading

Google I/O Recap

Google Unveils Gemini Omni—A Next-Gen AI Video Builder That Can 'Simulate the…

Google unveils Gemini 3.5 Flash, Omni, and Spark AI

Gemini Omni shows where AI video tools are heading next

11 demos of Gemini Omni and Gemini 3.5 in action

Google unveils Gemini Omni, a multimodal AI model that generates video from…