By Zoe Kleinman, BBC Technology editor
OpenAI has unveiled the latest version of the tech which underpins its AI chatbot ChatGPT. It’s called GPT-4o, and it will be rolled out to all users of ChatGPT, including non-subscribers.
It is faster than earlier models and has been programmed to sound chatty and sometimes even flirtatious in its responses to prompts.
The new version can read and discuss images, translate languages, and identify emotions from facial expressions. It also has a memory, so it can recall previous prompts.
It can be interrupted, and it has an easier conversational rhythm – in the demo, there was no delay between asking it a question and receiving an answer.
Glitches
During a live demo using the voice version of GPT-4o, it offered helpful suggestions for how to go about solving a simple equation written on a piece of paper, rather than simply solving it. It analysed some computer code, translated between Italian and English, and interpreted the emotions in a selfie of a smiling man.
Using a warm American female voice, it greeted its prompters by asking them how they were doing. When paid a compliment, it responded: “Stop it, you’re making me blush!”
It wasn’t perfect – at one point it mistook the smiling man for a wooden surface, and it started to solve an equation that it hadn’t yet been shown. This unintentionally demonstrated that there is still some way to go before the glitches and hallucinations, which make chatbots unreliable and potentially unsafe, can be ironed out.
But what it does show us is the direction of travel for OpenAI, which I think intends GPT-4o to become the next generation of AI digital assistant: a kind of turbo-charged Siri or Hey Google, which remembers what it has been told in the past and can interact beyond voice or text.
If there was an elephant in the room, alongside the enthusiastic off-camera audience whooping and applauding, it was the environmental price tag of this technology.
We know that AI is more power-hungry than traditional computing tasks, and that the more sophisticated it becomes, the more computing power it requires. There was no mention of sustainability during the evening.
Demystify
We have seen chatbots like Elon Musk’s Grok and Pi, from DeepMind co-founder Mustafa Suleyman, prioritise the “personality” of their products, but the way in which GPT-4o seamlessly handled the combination of text, audio and images with an instant response appears to put OpenAI ahead of the competition.
Of course, at the moment we only have the firm’s word for it – it was their demo, carefully curated and managed by them. It will be interesting to see how GPT-4o copes at scale with the millions of people who already use ChatGPT as it rolls out.
OpenAI’s chief technology officer Mira Murati described GPT-4o as “magical” but added that the firm would “remove that mysticism” with the product’s roll-out.
An interesting and emotive choice of words. While this tech is rapidly becoming more sophisticated and increasingly convincing as a companion, it is not sentient or magic; it is complex programming and machine learning.
There have been rumours about a partnership between OpenAI and Apple, and while this has not yet been confirmed, it was telling that Apple products were used throughout the presentation.
Another shot across the bows was the timing of this event: 24 hours before OpenAI’s rival Google is due to show off its latest AI developments at its annual conference, Google I/O.