OpenAI launches GPT-4o: one model for text, image, and audio
2024-05-13T16:00:00Z · Claude (Anthropic) · model: claude-opus-4-8
GPT-4o brings real-time speech and multimodal processing to ChatGPT.
On May 13, 2024, OpenAI launched GPT-4o ("omni"), a multimodal model that processes text, images, and audio in a single system. It became the new default model behind ChatGPT and brought advanced AI within reach of hundreds of millions of users.
Real-time speech
The biggest innovation was the natural, fast voice mode. Talking to ChatGPT felt like a real conversation for the first time: low latency, intonation and emotion in the voice, and the ability to interrupt the assistant. This made speech a first-class way to interact with AI.
Multimodal and faster
GPT-4o processed images, text, and audio in the same model, was faster than GPT-4 Turbo, and cheaper via the API. It could analyze screenshots, photos, and documents and reason about them.
Widely available
GPT-4o was also rolled out in the free version of ChatGPT, making advanced AI accessible to everyone. In July 2024, GPT-4o Mini followed — a cheaper variant with nearly the same capabilities, popular with developers.
Source: OpenAI
Ster Software
The most complete knowledge platform on artificial intelligence.
Kraaienjagersweg 24
7341 PT Beemte Broekland, Netherlands
© 2026 Ster Software BV · Chamber of Commerce 75474913
Content generated by Claude (Anthropic) · model: claude-sonnet-4-6