OpenAI is shipping GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper - a new generation of voice models built to reason, translate, and transcribe on the fly.
ChatGPT has had an audio mode for a while, and Google offers a similar real-time conversation feature through Gemini. But the models behind these voice interactions have been significantly weaker than their text-only counterparts, especially compared to text reasoning models that take time to think through problems.
According to OpenAI, that's no longer cutting it. A modern voice agent needs to understand what someone actually means, keep track of context, roll with changes, use tools, and respond appropriately - all at the same time.
OpenAI describes three new interaction patterns, which can also be combined. With "Voice-to-Action," a user describes what they need out loud, and the system reasons through the request, calls the right tools, and completes the task.
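The client side of a "Voice-to-Action" flow boils down to dispatching model-issued tool calls to local code. A minimal sketch, with entirely hypothetical tool names and arguments (the real Realtime API handles transcription and tool selection server-side):

```python
# Hypothetical tool the voice agent can invoke; name and signature are
# invented for illustration.
def book_table(restaurant: str, guests: int) -> str:
    return f"Booked a table for {guests} at {restaurant}."

# Registry mapping tool names to local functions, as a client app would keep.
TOOLS = {"book_table": book_table}

def handle_tool_call(name: str, arguments: dict) -> str:
    """Dispatch a model-issued tool call to the matching local function."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**arguments)

# A tool call as the model might emit it after reasoning over the spoken request.
result = handle_tool_call("book_table", {"restaurant": "Luigi's", "guests": 4})
print(result)  # Booked a table for 4 at Luigi's.
```

The key point of the pattern is that the spoken request never maps to a fixed command; the model decides which tool to call and with what arguments.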
With "Systems-to-Voice," software turns context into spoken guidance. A travel app could tell a passenger that their connecting flight is still reachable despite a delay, give them the fastest route to the new gate, and confirm their luggage transfer.
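The flight example above amounts to turning structured app state into a text-to-speak payload. A rough sketch, with invented field names and wording; in practice the generated text would be handed to a realtime voice model for synthesis:

```python
from dataclasses import dataclass

# Hypothetical app state for a connecting flight; all fields are
# illustrative assumptions, not part of any OpenAI API.
@dataclass
class ConnectionStatus:
    flight: str
    delay_minutes: int
    gate: str
    walk_minutes: int
    bags_transferred: bool

def to_spoken_guidance(s: ConnectionStatus) -> str:
    """Render structured context as a sentence suitable for speech synthesis."""
    bags = ("Your luggage has been transferred."
            if s.bags_transferred else "Please recheck your luggage.")
    return (f"Despite a {s.delay_minutes}-minute delay, you can still make "
            f"flight {s.flight}. Head to gate {s.gate}; the walk takes about "
            f"{s.walk_minutes} minutes. {bags}")

print(to_spoken_guidance(ConnectionStatus("LH 452", 25, "B17", 8, True)))
```

The pattern inverts the usual flow: the software initiates the conversation based on its own state, rather than waiting for the user to ask.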