Microsoft wants to offer the ‘most complete AI and app agent factory’.

Microsoft has released three new AI foundational models, created in-house, in a move that places the company in direct competition with enterprise AI rivals, despite its deep ties with OpenAI.

The new foundational models target three of the most commercially viable modalities: transcription, voice and images. The models already power Microsoft products including Copilot, Bing and Azure Speech, the company said, and will be available in preview via Microsoft Foundry and the MAI Playground.

With this, Microsoft is furthering its goal of delivering “the most complete AI and app agent factory”, it said.

‘MAI-Transcribe-1’ is a first-generation speech recognition model expected to deliver “enterprise-grade accuracy” across 25 languages at around 50pc lower GPU cost than alternatives. The model records an average ‘word error rate’ of below 4pc on accuracy benchmarks, where a lower figure is better, compared with 4.2pc for GPT-Transcribe and 4.9pc for Gemini 3.1 Flash.
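
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not Microsoft's benchmark code, the function name `word_error_rate` is made up for this example, and real benchmarks typically normalise punctuation and casing first, which is skipped here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length (illustrative only)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six in the reference.
    rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{rate:.1%}")
```

On this definition, a word error rate of 4pc means roughly four word-level errors for every 100 words of reference text.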