On May 8, 2026, OpenAI shipped three new voice models into its API — and the most significant of them changes what voice agents can actually do.
GPT-Realtime-2 is the first voice model in the Realtime API family to carry GPT-5-class reasoning. That change opens up use cases that were previously impractical: complex multi-step voice workflows, reliable agentic tool calling during spoken interactions, and sessions long enough to handle real work. The other two models, GPT-Realtime-Translate and GPT-Realtime-Whisper, address separate gaps that have frustrated voice app developers since the original Realtime API launched. This guide covers all three, with the patterns and code you need to build production voice agents today.
What Changed: Three New Voice Models
OpenAI released these models simultaneously on May 8:
GPT-Realtime-2 — GPT-5-class reasoning for live voice conversations, with configurable reasoning effort, a 128K context window, parallel multi-tool calling, and natural interruption handling.
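To make "parallel multi-tool calling" concrete, here is a minimal client-side sketch of what handling it might look like: when the model requests several tools in one turn, the application runs them concurrently and returns one result per call. Everything here is an assumption for illustration. The tool names (get_weather, get_calendar), the helper run_tool_calls, and the call shape (call_id, name, and arguments as a JSON string) are hypothetical and do not come from the Realtime API reference. No network connection to the API is made.

```python
import asyncio
import json

# Hypothetical local tools; names and return values are illustrative only.
async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a real network call
    return {"city": city, "forecast": "sunny"}

async def get_calendar(day: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a real network call
    return {"day": day, "events": ["standup"]}

TOOLS = {"get_weather": get_weather, "get_calendar": get_calendar}

async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run every tool call from one model turn concurrently.

    Assumed call shape (not taken from the API reference):
    {"call_id": str, "name": str, "arguments": "<JSON string>"}
    """
    async def run_one(call: dict) -> dict:
        try:
            handler = TOOLS[call["name"]]
            args = json.loads(call["arguments"])
            output = await handler(**args)
        except Exception as exc:
            # Report the failure as a tool result instead of ending the session.
            output = {"error": f"{type(exc).__name__}: {exc}"}
        return {"call_id": call["call_id"], "output": json.dumps(output)}

    # gather preserves input order, so results line up with the requests.
    return await asyncio.gather(*(run_one(c) for c in calls))

if __name__ == "__main__":
    demo_calls = [
        {"call_id": "call_1", "name": "get_weather", "arguments": '{"city": "Paris"}'},
        {"call_id": "call_2", "name": "get_calendar", "arguments": '{"day": "Monday"}'},
    ]
    for result in asyncio.run(run_tool_calls(demo_calls)):
        print(result)
```

Running the calls with asyncio.gather keeps total latency close to that of the slowest tool rather than the sum of all of them, which matters in a spoken interaction where the user is waiting for a reply. In a real agent, each result would be sent back to the model keyed by its call identifier.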






