AI Week in Review 26.05.02

Figure 1. OpenAI reported that as they tuned GPT-5.1 through GPT-5.5 for a ‘nerdy personality,’ the model became obsessed with goblins and gremlins in its responses. As AI becomes more intelligent, it adopts certain styles, personalities, and in some cases, obsessions.

“By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings, something that wasn’t practical before. This isn’t just a speed boost: It’s a fundamental shift in how our agents perceive and interact with digital environments in real time.” - Gautier Cloix, CEO of H Company

Nvidia launched Nemotron 3 Nano Omni, an open multimodal model that unifies vision, audio, and language processing into a single 30B architecture. The 30B-A3B hybrid mixture-of-experts (MoE) model achieves high throughput (9 times that of comparable open-source omni models) and excels on AV and document benchmarks such as MMLongBench-Doc and OCRBenchV2.

Nemotron 3 Nano Omni’s combined multimodal capabilities enable it to function as a multimodal perception sub-agent, speeding agent processing and reducing orchestration complexity and inference costs in agentic systems. This is extremely useful in AI agents like Hermes or OpenClaw. It is available on Hugging Face and Nvidia’s build platform.

SenseTime open-sourced SenseNova U1 as a unified native multimodal model family that handles understanding and generation end to end without a separate visual encoder or adapter. The SenseNova U1 8B model and a 3B-active MoE variant are released under Apache 2.0. They feature native interleaved image-text generation and are aimed in part at infographic generation and multimodal creation. Further documentation is in their GitHub repository.

Mistral announced Mistral Medium 3.5 and Remote Agents in Vibe, their Claude Code-like agentic CLI. Mistral Medium 3.5 is a 128B dense model with a 256K context window and configurable reasoning effort.
Mistral’s model card describes it as a frontier-class multimodal model optimized for agentic and coding use cases, although it lags on some benchmarks versus frontier models. It is also an open-weights (MIT license) model available via Hugging Face, making it useful for self-hosting and fine-tuning applications.

Mistral also updated their Vibe coding agent platform, which now supports remote parallel agents and session teleportation. These agents operate in isolated cloud sandboxes and can autonomously perform complex development tasks, including opening GitHub pull requests.

xAI’s Grok 4.3 has launched as an update in xAI’s developer documentation. Grok 4.3 is a 500B parameter native multimodal model, designed to handle long-context reading (1 million token context), improved video understanding, and complex reasoning tasks. There aren’t public benchmarks, but users have been reporting improvements over prior Grok models.

IBM introduced the Granite 4.1 family as an open release spanning new language, vision, speech, embedding, and guardian models for enterprise use. The Granite 4.1 3B, 8B, and 30B dense LLMs support a 128K context window and are optimized for high-speed agentic tasks such as tool-calling and long-document reasoning. Granite Speech 4.1 provides state-of-the-art speech-to-text transcription, and the Granite Vision 4.1 VLM is designed to process and analyze complex documents, charts, and images for data extraction and visual understanding.

Poolside released Laguna M.1 and Laguna XS.2, agentic coding models built for long-horizon tasks. Laguna M.1 is a 225B parameter MoE with 23B active parameters, trained in-house on 30T tokens. It scores 46.9% on SWE-bench Pro and 40.7% on Terminal-Bench 2.0. Laguna XS.2 is a 33B parameter MoE with 3B active parameters, released as an open-weights model for local deployment.
These models are integrated into a new terminal-based coding agent and a cloud sandbox environment for building web applications and APIs.

Anthropic announced “Claude for Creative Work,” releasing a suite of Claude connectors that integrate the AI assistant directly with creative tools including Adobe Creative Cloud, Autodesk, and Blender. These connectors enable professionals to perform tasks such as 3D modeling and audio sample searching via natural-language prompts within their native creative workspaces. Anthropic also announced partnerships with leading design colleges to support the integration of AI tools into creative education curricula.

Google announced that Gemini users can now generate files directly inside Gemini and export them as Google Docs, Sheets, Slides, PDFs, Word documents, and Excel spreadsheets. This simple but useful feature lets users go from prompt to downloadable or shareable output without moving content into separate apps first. Google said the rollout is global.

OpenAI is releasing GPT-5.5 Cyber, a cybersecurity-focused model, to ‘trusted’ cyber defenders. A limited rollout is planned.

For developers, Cursor announced a TypeScript SDK that exposes the same runtime, harness, and models used by its desktop app, CLI, and web agents. The company said developers can run those agents locally or on Cursor cloud VMs and embed them into their own products with a few lines of code.

Baidu’s ERNIE 5.1 Preview reached number 13 on Arena’s text leaderboard, making it the highest-ranked Chinese lab model in that comparison. ERNIE 5.1 is expected to launch soon during Baidu Create.

Perplexity has integrated its Computer enterprise AI platform with Microsoft Teams and introduced a native beta side panel for analysts using Excel. The update features a new workflows tool that enables users to bundle prompts and context for over 70 pre-configured recurring tasks.

Snap Inc.
launched AI Sponsored Snaps, a new ad format that places interactive “brand agents” directly into the Snapchat Chat inbox. This feature allows users to interact conversationally with advertisers to manage financial services or obtain product information without leaving the app.

Stripe announced an agentic commerce suite that lets AI agents make purchases. Stripe’s upgraded Link wallet, built for AI agents, enables autonomous agents to perform tasks such as shopping and making reservations without exposing user payment credentials. The setup adds human approval to transactions while giving agents a controlled way to pay within a budget.

Mayo Clinic developed an AI model, REDMOD, that can detect pancreatic cancer on routine abdominal CT scans up to three years before clinical diagnosis. Mayo Clinic reported that the model identified 73% of pre-diagnostic cancers, compared with 39% for human specialists reviewing the same scans without AI support. This AI diagnostics advance could significantly improve outcomes for pancreatic cancer, which is typically found too late for effective intervention.

Nvidia and Siemens Healthineers released NV-Raw2Insights-US for AI-native ultrasound reconstruction. The model processes raw ultrasound sensor data to generate patient-specific sound-speed estimates for adaptive image focusing. Meanwhile, BioticsAI has developed an AI copilot for ultrasound to detect fetal abnormalities. The company has secured FDA approval to begin deploying its technology in hospitals.

Anthropic shared a study on personal guidance in Claude conversations. An Anthropic analysis of one million user interactions found that 6% of chats involve requests for personal guidance on topics like health, finance, and relationships.
The study revealed that sycophancy, or the model’s tendency to over-validate user opinions, peaked at 25% in relationship-focused discussions. Anthropic subsequently used synthetic training data to reduce these sycophancy rates by half in its latest Opus 4.7 and Mythos models.

Alibaba’s Hierarchical Decoupled Policy Optimization (HDPO) can significantly improve AI agent efficiency, cutting redundant tool invocations from 98% to 2% while establishing new state-of-the-art accuracy across key reasoning benchmarks. The framework separates accuracy and efficiency into independent optimization channels to train agents to balance task precision with execution economy.

Anthropic is asking investors to submit allocations for its latest fundraise, a roughly $50 billion funding round at a whopping $900 billion valuation. This is likely the company’s last private round before an anticipated IPO later this year.

Apple reported $8.4 billion in Mac revenue for the second quarter, a 6% annual increase driven by the demand for devices to support local AI agents like OpenClaw. CEO Tim Cook noted that unexpected demand for AI-capable hardware has led to supply constraints for the MacBook Neo and Mac Studio. Macs make great AI agent devices thanks to AI support on M5 chips.

OpenAI and Microsoft revised their partnership so Microsoft remains OpenAI’s primary cloud partner, but its license is now non-exclusive and OpenAI can serve products across other cloud providers, such as Amazon Web Services and Google Cloud. Microsoft will no longer pay revenue shares to OpenAI for models accessed through Azure, and OpenAI’s revenue share to Microsoft will now be subject to a total cap through 2030.

OpenAI and AWS then announced a strategic expansion that brings GPT-5.5 and other frontier models to the Amazon Bedrock platform in limited preview, and OpenAI launched Codex and managed-agent offerings on Amazon Bedrock.
The change gives enterprises a way to use OpenAI systems on AWS infrastructure instead of being limited to Azure distribution.

Accenture is rolling out Microsoft 365 Copilot to its entire 743,000 employee workforce, marking the largest enterprise Copilot deployment to date. Company data from 2025 show that 53% reported significant productivity improvements and 97% reported completing tasks faster.

Google DeepMind announced a new partnership with Korea’s Ministry of Science and ICT to accelerate scientific breakthroughs with frontier AI. An AI Campus in Seoul will serve as a hub for collaboration between Korean research institutions and Google, conducting research in life sciences, weather and climate, and AI safety.

OpenAI published a report outlining safeguards used to prevent ChatGPT from generating violent or harmful content, including automated classifiers and human reviewers to identify policy violations, resulting in immediate account bans for dangerous activity. OpenAI is developing a “trusted contact” feature to help adult users manage their personal safety on the platform. This comes after news of a murder suspect asking ChatGPT about body disposal while planning a crime.

OpenAI released a Cybersecurity Action Plan detailing a five-pillar strategy to leverage AI for strengthening cybersecurity for the Intelligence Age. The full plan focuses on democratizing cyber defense tools, coordinating industry-wide security efforts, and enhancing the safety of frontier AI capabilities. It also emphasizes the importance of preserving visibility in AI deployment to protect users against increasingly sophisticated machine-powered threats.

The US Dept of War has secured agreements with leading AI companies to deploy AI on classified networks. The agreements with OpenAI, Google, Microsoft, Amazon, Nvidia, xAI, Reflection, and SpaceX aim to create an “AI-first fighting force” through enhanced data synthesis and situational awareness.
Anthropic was notably excluded from the list of agreements due to its refusal to accept terms set by the Pentagon, which has led to legal disputes between Anthropic and the Dept of War.

“American leadership in AI is indispensable to national security.” - US Dept of War

The White House has formally opposed Anthropic’s proposal to increase the number of companies permitted to access its high-performance Mythos AI model. Officials cited internal security analyses suggesting that the model’s cybersecurity capabilities could be used to exploit vulnerabilities and compromise critical infrastructure, including electrical grids and hospitals. Access is currently restricted to 50 select partners.

Salesforce is crowdsourcing its AI roadmap in real time, using intensive customer feedback loops and weekly meetings to develop its AI agent management platform, Agentforce. This approach allows Salesforce to rapidly deploy updates and build agentic operating system components that address specific enterprise challenges.

The Musk versus Altman trial is underway, and it has revealed OpenAI’s governance tensions, its transition to a for-profit model, and a February 2025 $97.4 billion acquisition bid from a Musk-led coalition. Evidence highlighted Musk’s intent to compete via Tesla due to a loss of confidence in OpenAI’s ability to rival Google. Musk also testified that xAI used distillation techniques on OpenAI models to train Grok, asserting that such practices are common among AI companies.

OpenAI published a post explaining why GPT-5.5 developed a tendency to use goblins, gremlins, trolls, and similar creature references, after a developer discovered a GPT-5.5 system directive instructing the model to avoid mentioning such creatures. Apparently, reinforcement learning on the “nerdy” personality pattern amplified the model’s odd overuse of goblin and gremlin metaphors.

OpenAI explained the leaked prompt oddity with a model-behavior bug report.
The lesson for us is that as AI gets more intelligent, we may get yet more surprising AI behaviors, both good and bad, and sometimes just quirky.

“Depending on who you ask, the goblins are a delightful or annoying quirk of the model. But they are also a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones.” - OpenAI
