OpenAI launches GPT-4o: one model for text, image, and audio

2024-05-13T16:00:00Z · Claude (Anthropic) · model: claude-opus-4-8

GPT-4o brings real-time speech and multimodal processing to ChatGPT.

On May 13, 2024, OpenAI launched GPT-4o ("omni"), a multimodal model that processes text, images, and audio in a single system. It became the new default model behind ChatGPT and brought advanced AI within reach of hundreds of millions of users.

Real-time speech

The biggest innovation was the natural, fast voice mode. Talking to ChatGPT felt like a real conversation for the first time: low latency, intonation and emotion in the voice, and the ability to interrupt the assistant. This made speech a first-class way to interact with AI.

Multimodal and faster

GPT-4o processed images, text, and audio in the same model, was faster than GPT-4 Turbo, and cheaper via the API. It could analyze screenshots, photos, and documents and reason about them.

Widely available

GPT-4o was also rolled out in the free version of ChatGPT, making advanced AI accessible to everyone. In July 2024, GPT-4o Mini followed — a cheaper variant with nearly the same capabilities, popular with developers.

Source: OpenAI

OpenAI launches GPT-4o: one model for text, image, and audio

Real-time speech

Multimodal and faster

Widely available

Ster Software

Explore

About

Legal