OpenAI has launched GPT-4o, the new AI model that can hold spoken conversations and recognize emotions

Mira Murati, CTO of OpenAI, presenting GPT-4o. Credits: OpenAI.

ChatGPT-4o (where the “o” stands for “omni”) is the new version of OpenAI's artificial intelligence model, announced last night. It is free for everyone and accepts any combination of text, audio and images as input, generates any combination of text, audio and images as output, and recognizes emotions in order to provide feedback that feels more “real” and less artificial. Rumors had predicted that the long-awaited ChatGPT-based search engine would be announced at yesterday's OpenAI event, so ChatGPT-4o came as a surprise.
Compared with the previous-generation model, GPT-4 Turbo, GPT-4o is twice as fast and more energy-efficient. This allowed Sam Altman's company to cut costs and make the model available to everyone free of charge (the global rollout is expected to be completed gradually over the next few weeks). The event also brought the announcement of a ChatGPT desktop client for Mac and free access to the GPT Store (OpenAI's virtual store offering user-customized chatbots based on the same model as ChatGPT).

The new capabilities of ChatGPT-4o

According to OpenAI, the new ChatGPT-4o constitutes “a step forward towards a much more natural human-computer interaction”. Being more powerful than the previous-generation model, it can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is practically on par with human response times in a “typical” conversation.

The new model also has rate limits five times higher than the previous generation, which is significant given that this parameter governs how often users can make requests of various kinds.

But what does the new GPT-4o model behind ChatGPT actually let you do in practice? Judging by the presentation (a replay of which is available below), it can do a seemingly endless number of things. The very detailed, and at times even funny, demo given by the OpenAI team led by Mira Murati, the company's Chief Technology Officer, highlighted GPT-4o's ability to interpret facial expressions and emotions, reading them through the camera or microphone of the user's mobile device. Not only that: the model also adapts its responses and the tone of its spoken feedback to the emotions that characterize the conversation with the user.

In the demonstration, for example, ChatGPT-4o added sound effects to its spoken responses (such as giggles and startled reactions) and changed its tone of voice to follow the flow of the conversation, all rather quickly.

The speed of response seems to carry over to other languages too, such as Italian, which Murati “borrowed” for a real-time translation test. The model, in fact, supports 50 languages, covering 97% of the global population.

The model also proved perfectly capable of recognizing elements on a screen shared by the user, or handwritten symbols on a sheet of paper, in order to offer solutions to various problems (an equation, for example).

The possible risks of ChatGPT-4o

Such a high-performance, powerful and efficient AI model inevitably arouses great enthusiasm on the one hand, while on the other it can cause considerable concern over the potential misuse of ChatGPT-4o. Regarding the possible risks arising from the birth of its new “creature”, OpenAI stated in an official note:

“We recognize that GPT-4o's audio modalities present a number of new risks. Today we are publicly releasing text and image inputs and text outputs. In the coming weeks and months, we will work on the technical infrastructure, usability through post-training, and the safety needed to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will comply with our existing safety policies. We'll share more details about the full range of GPT-4o's modalities in the forthcoming system card.”

OpenAI stated, however, that before announcing GPT-4o it carried out all the tests needed to verify its safety. It did so through a series of automated and human evaluations performed throughout the model training process, which also involved 70 external experts specialized in various fields.