“One of the most prominent improvements in Opus 4.8 is its honesty. … Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. … Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.” - Anthropic

Anthropic released Claude Opus 4.8, a frontier AI model upgrade to Opus 4.7, with stronger coding, agentic, and professional work performance. On benchmarks, Opus 4.8 achieves a state-of-the-art score of 1890 on GDPval-AA for knowledge work and 69.2% on SWE-Bench Pro. Anthropic also touts improvements in alignment and honesty, saying the model is less likely to hallucinate success or make unverified claims.

The improvements over Opus 4.7 are solid but incremental, and Anthropic has kept standard pricing unchanged from the prior version. The company also launched effort controls and a faster, cheaper fast mode for high-throughput workloads.

Figure 2. Claude Opus 4.8 has continued to advance the frontier of AI models with SOTA performance on coding, reasoning, and knowledge work tasks, making it a great AI model for use in Claude Code and Claude Cowork.

Anthropic also launched dynamic workflows in Claude Code for large tasks such as codebase-scale migrations. When users prompt for a complex task, Claude breaks the target down into subtasks and assigns sub-agents to the work. Claude dynamically runs tens to hundreds of parallel sub-agents in a single session, checking its work via internal agent critique before final output.

Figure 3. Claude Opus 4.8 improves on alignment, coming close to the alignment of Mythos Preview.

OpenAI launched Rosalind Biodefense, which gives trusted developers sponsored access to GPT-Rosalind for defensive biology work, including epidemiological modeling, early detection, screening, preparedness, diagnostics, and medical-countermeasure development. OpenAI is also expanding trusted access to selected U.S.
government and allied public-health and biodefense partners.

Mistral introduced Search Toolkit in public preview. The open-source framework unifies ingestion, retrieval, and evaluation for production search pipelines used in AI applications. Mistral’s pitch is that teams should spend less time wiring together search infrastructure and more time improving retrieval quality; the toolkit can run in cloud, on-premises, or edge environments.

Mistral launched Vibe as its live agent product and main AI interface, available through Mistral’s chat interface and mobile apps. Vibe replaces Le Chat and absorbs prior Le Chat history, plans, and settings inside Chat mode. Vibe has a Work Mode AI agent for complex, multi-stage tasks, and a Code Mode as the new coding surface in the Vibe web app. The launch positions Mistral’s consumer and developer-facing agent around everyday tasks and knowledge work.

“We believe physics deserves its own frontier AI models.” - Mistral

Mistral announced “physics AI” for industrial engineering. The company says it has brought Emmi AI into Mistral and is building AI models that learn from physics-solver outputs to predict physical fields from geometry, boundary conditions, or measurement data. The intended use cases include faster design-space exploration as well as tooling and process optimization. Mistral aims to apply these physics AI models as real-time digital twins for industrial partners such as ASML, Airbus, Safran, and Siemens Energy.

Microsoft announced MAI Image 2.5, an upgraded text-to-image model succeeding MAI Image 2.0 that follows prompt instructions more closely and renders text strings more reliably. MAI Image 2.5 climbed to the number three spot on the text-to-image Arena.ai leaderboard. It displays strong visual reasoning around scenes and lighting, which, combined with its sharper and more accurate text rendering, makes it well-suited for branding and product concepts.

Figure 4.
MAI Image 2.5 is strong on text rendering and spatial reasoning to render exact images.

Microsoft rolled out an overhauled design for Microsoft 365 Copilot across its office productivity suite, calling it “a cohesive, agentic experience.” The new Copilot has a consistent entry point across apps and can now draw live data, such as emails, calendars, and files, directly from other integrated Microsoft apps to generate context-aware charts and graphs.

Microsoft is attempting to keep Copilot competitive as quickly evolving AI applications take on agentic abilities. To that end, Microsoft is reportedly developing a unified “super app” to consolidate GitHub Copilot, Copilot chat, and Copilot Cowork into a single destination. This new platform will feature an agentic workflow capability internally named Autopilot and is expected to launch by the end of summer.

Perplexity announced that its Perplexity Computer capabilities are now directly available within Microsoft 365 applications, including Word, Excel, and PowerPoint. The deep integration allows users to request complex, multi-step analytical actions beyond standard chat responses. For instance, the tool can analyze a legal document against a template, track changes, and generate an issues list with fallback clauses.

Eleven Labs released its upgraded Music V2 generative audio model, which focuses on producing higher-fidelity musical tracks. Eleven Labs claims:

“Music v2 delivers better vocals, instrumentation, and arrangement across every genre, with improved multilingual support and a set of new capabilities.”

The foundation model was trained entirely on licensed data, ensuring that commercial usage rights are cleared for content creators.
Testing shows that the model contains built-in world knowledge, allowing it to correctly reference specific landmarks and pop culture elements when given regional prompts.

Eleven Labs also launched Dubbing V2, an automated video localization tool that translates audio content while preserving original attributes. The software takes an uploaded video file and converts the speech into one of over 90 target languages, translating while maintaining the speaker’s original vocal tone, emotional delivery, and facial expressions. This keeps the output more faithful to the original delivery.

Figma transformed its AI design assistant, Figma Make, into a live, visual software editor that connects natively to production codebases. The update allows users to import existing Git repositories directly into the Figma desktop app, visually edit the underlying code, and push changes back to engineering through GitHub pull requests. The platform uses a multi-model AI system, toggling between Anthropic’s Claude and Google’s Gemini models to write code that adheres to established design system guidelines.

MiniMax released a technical report on its M2 series and teased upcoming M3 models. The M3 series will feature “MiniMax Sparse Attention” (MSA), a sub-quadratic framework capable of 15.6 times faster decoding at million-token context lengths. The MiniMax-M2 Series Technical Report highlights M2’s sparse Mixture-of-Experts architecture and its training: agent-driven data pipelines; the “Forge” reinforcement learning system for agent-native training; and M2.7 taking steps toward self-evolution by autonomously debugging training runs.

Meta is developing an AI-powered pendant that it plans to start testing in the next year. The device is expected to build on the technology of Limitless, an AI startup acquired by Meta at the end of 2025.
Meta also plans to expand its AI glasses lineup and launch a “Wearables for Work” business subscription.

OpenAI has added Codex’s computer use feature to Windows. The app can see your screen and perform tasks on your device. Users can also manage and review Codex’s jobs via the ChatGPT app.

OpenAI will remove the Canvas feature in GPT-5.5 models. The side-by-side editing feature will no longer be available with GPT-5.5 Instant or GPT-5.5 Thinking. OpenAI is also shortening GPT-5.5 Instant responses and reducing the use of bullets in text.

The paper “Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization” argues that some efficient distillation methods damage multi-step reasoning through “reasoning collapse.” To fix this, the proposed RED method uses activation-aware initialization to better preserve hidden-representation rank. Experiments on Llama and Qwen models show that RED recovers reasoning while keeping the efficiency benefits of compressed LLMs.

Anthropic raised $65 billion in Series H funding at a mind-boggling $965 billion post-money valuation. Anthropic says the proceeds will support safety and interpretability research, compute expansion, and product scaling. Anthropic’s run-rate revenue crossed $47 billion earlier in May, putting it ahead of OpenAI in revenue, and it has signed major compute agreements with Amazon, Google, and SpaceX to ramp up capacity for serving AI. Anthropic also opened a Milan office and expanded its European footprint.

OpenAI published its Frontier Governance Framework this week, which explains how OpenAI’s safety and security practices align with existing and emerging legal requirements, including in the US, California, and EU.
The Frontier Governance Framework covers how OpenAI deals with AI risk assessment and mitigation in areas such as cyber offense, CBRN, harmful manipulation, and loss of control, providing guidance on model reporting, security management, and incident response.

The Verge examined the rapid normalization of AI in warfare in a feature arguing that military AI is no longer a future scenario. The article covered the shift from Project Maven to modern AI-enabled surveillance, object detection, and targeting workflows, as well as tensions between government demand for broad “lawful use” and AI companies’ attempts to define ethical red lines around autonomous weapons and surveillance.

“In the era of Artificial Intelligence, when human dignity is threatened by new forms of dehumanization, ours is the pressing duty to remain profoundly human.” - Pope Leo XIV

Pope Leo XIV issued an encyclical letter on AI called “Magnifica Humanitas”, which means ‘Magnificent Humanity’, with a focus on “safeguarding the human person” in the AI era. It’s a nuanced, informed, and detailed document covering the impact of AI and how we should approach it. The Pope emphasizes that humans possess a unique, inherent dignity that should not be overlooked as AI capabilities grow.

The Pope neither rejects AI in toto nor accepts the accelerationist argument, but he raises serious concerns about the social impacts of AI development, such as AI companionship’s impact on human relationships.
He argues that the control of AI development by a few private entities complicates governing these technologies for the “common good.” The Pope advocates for “disarming” AI, meaning we must move away from the “armed competition” mentality of the AI race and instead foster open, human-friendly collaboration.

“Pope Leo XIV and the New Social Question of AI” reviews Pope Leo XIV’s AI missive in the context of Pope Leo XIII’s Rerum Novarum, which confronted the challenges of industrialization over a century ago.

The Guardian scrutinized Anthropic’s association with Pope Leo XIV’s AI encyclical, sharing criticism that Anthropic’s engagement with the Vatican could become “Vatican-washing” if it burnishes the company’s safety image without addressing AI concerns. Anthropic is also using AI ‘concerns’ as a way to lock down AI development via ‘regulatory capture.’

The Pope has moved the AI ethics debate forward, addressing AI in religious, social, labor, and geopolitical contexts.

“I would like to employ the expression to disarm which is close to my heart. Disarming AI means freeing it from the mentality of armed competition ... which today is not limited simply to the military context but is also an economic and cognitive phenomenon. This entails a race for ever more powerful algorithms and larger data sets driven by the desire to secure geopolitical or commercial dominance.” - Pope Leo XIV
AI Week in Review 26.05.30
Claude Opus 4.8 and Dynamic workflows, Rosalind Biodefense, Mistral Search Toolkit, Mistral Vibe with Work Mode / Code Mode, MAI Image 2.5, Microsoft Copilot update, Eleven Labs Music V2, Dubbing V2.










