OpenAI showcases ChatGPT's new voice and image processing features

OpenAI just demonstrated something that makes the standard chatbot experience feel quaint. In a new showcase, ChatGPT completed actual paperwork by combining voice conversation with image uploads, effectively turning the AI into something closer to a personal assistant that can see, hear, and act on documents in real time.

From text box to multimodal workhorse

The demonstration highlighted ChatGPT’s ability to process uploaded images of documents while holding a voice conversation with the user. Think of it as calling a very patient, very fast assistant who can look at your paperwork, understand what’s being asked, and help you fill it out, all through natural speech.

The company began rolling out voice and image capabilities to ChatGPT Plus and Enterprise users on September 25, 2023. At launch, voice mode enabled natural conversations through speech recognition and text-to-speech, with five synthesized voices to choose from. Image processing, powered by multimodal models like GPT-4V, allowed users to upload photos for the AI to analyze and interpret.

On May 13, 2024, OpenAI released GPT-4o, which brought real-time voice, vision, and text interaction into a single model. That launch included live demos showing the model guiding users through math problems written on paper and interpreting complex documents.

