The AI Daily Brief: Artificial Intelligence News and Analysis - Why OpenAI's Announcement Was A Bigger Deal Than People Think
Episode Date: May 14, 2024
This episode dives into OpenAI's recent product event, unpacking the key takeaways and why it might be a bigger deal than initial reactions suggest. It explores the announcement of GPT-4 Omni (GPT-4o), a multimodal AI model with free access, its potential to revolutionize human-computer interaction, and the significance of democratizing advanced AI tools. The video also discusses the debate surrounding OpenAI's true advancements and the upcoming Google I/O event for comparisons. ** Get your free NetSuite KPI Checklist - https://netsuite.com/breakdown Check out the hit podcast from HBS Managing the Future of Work https://www.hbs.edu/managing-the-future-of-work/podcast/Pages/default.aspx Join Superintelligent at https://besuper.ai/ -- Practical, useful, hands on AI education through tutorials and step-by-step how-tos. Use code podcast for 50% off your first month! ** ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Daily Brief, a deep dive on OpenAI's Spring Update.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, come check out our Discord. You can find the link in the show notes.
Quick note before we dive into the episode, I don't want to break up this stream of consciousness with any ads.
I do want to shout out that at Superintelligent, you better believe that we are going to start digging into these new OpenAI updates right about now.
I, for one, am particularly excited to try out these new image generation capabilities that have
what appears like it could be an incredible ability to include specific text, as well as native
consistent character generation.
And so as always, if you haven't checked out Superintelligent yet and you want to get your
AI learning on, go to besuper.ai and use code podcast for 50% off your first month.
Now, let's talk about what the heck OpenAI just announced.
Welcome back to the AI Daily Brief.
Today is one of those days, kind of the opposite of some of the ones we've had recently,
where everyone is talking about just one thing.
And so instead of doing our whole normal brief
and main episode sort of conversation,
we are just going to focus on the big thing
that everyone is talking about,
which is, of course, OpenAI's Spring Update.
Now, this is the event that has been rumored for a couple of weeks.
For a while, there was speculation
that we were going to see a search engine,
some sort of competition with Google and Perplexity.
But towards the end of last week,
as the event apparently got delayed a couple of days,
it started to come into view that the most likely candidate
was some sort of personal assistant update,
particularly around voice features.
Now, this, I believe, will go down as one of the most initially divisive product updates that OpenAI has ever released.
So what we're going to do on this show is first we're going to talk about what they actually shared,
and then we'll get into the reactions and why I think it's actually more significant,
not less significant, than it seems at first.
Right away, the first thing you noticed when it kicked off was that Sam Altman was not the one presenting.
I could be totally wrong, but I initially took this as a sign that perhaps it wasn't going to be as big an announcement as we might have thought, sort of with the idea that they were keeping Sam in the background for the big major updates like GPT-4.5 or GPT-5.
Now, one of the things that you'll hear a lot throughout this assessment of what happened is that I think people's hopes, really more than expectations, of GPT-4.5 or GPT-5 colored the way that they received what was actually shared.
This is, of course, in spite of the fact that OpenAI did make it clear in advance that we were not getting GPT-4.5 or GPT-5.
Quickly, CTO Mira Murati homed in on three big pieces of the announcement.
First, there was a ChatGPT desktop app.
Second, there was an updated ChatGPT UI.
And third, and obviously the most important,
there was a new flagship model called GPT-4o.
Basically, this was described as GPT-4-level intelligence,
but faster and with better ways to interact.
On OpenAI's website, they call it their new flagship model
that can reason across audio, vision, and text in real time.
The "o," they write, stands for Omni,
and is a, quote,
step towards much more natural human-computer interaction.
It accepts as input any combination of text,
audio, and image, and generates any combination of text, audio, and image outputs.
Plus, they say it's really fast.
It can respond to audio inputs in as little as 232 milliseconds with an average of 320 milliseconds,
which is similar to human response time in a conversation.
Before they got into the demos, the next part of the announcement had to do with accessibility.
Specifically, they said with the efficiencies of GPT-4 Omni, we can bring this to everyone.
What that meant was that free users now have access to a GPT-4-level model, custom GPTs,
the GPT Store, basically everything that you were paying for before. Paying users didn't have access
to any differentiated technology anymore. Instead, they had five times the capacity limits. They also
would be first in line for new features, as we saw later in the day, as GPT-4o started immediately rolling
out. And as we'll discuss in a little bit, the improvement in what's available at the free base
level is massive. And the only reason I think that it wasn't talked about as such is that
the vast majority of people who are spending their time watching an OpenAI product video are
probably already springing for the ChatGPT Plus account. In other words, the free access part doesn't
benefit them, so it's easier for them to overlook the significance in aggregate. We'll come back to
that, though, in a few minutes. GPT-4o was also going to impact the API. Specifically, it was going to
make it 50% cheaper, which is obviously a significant change. From there, we got into the live
demos of the real-time conversational capacity of the ChatGPT app. When Mira Murati asked what's
different from the existing voice mode that we have, the presenters answered
that you can butt in whenever without throwing it off, that it has real-time responsiveness,
that the model picks up on emotion, and that it can generate voice in a wide variety of styles.
This emotional awareness is pretty significant.
One of the demos that they did was telling a bedtime story,
and the two presenters kept asking it to change or modulate its speech based on some new criteria.
So first, they wanted it to be more dramatic, then even more dramatic, then most dramatic of all,
which it did each time very successfully,
and then they switched it to dramatic but in a robot voice,
and then they had it sing the end of the story.
I will note here that even for people who weren't that impressed with anything else,
many had the same thought that Cassette AI had when they said,
got to give GPT-4o props.
That's the most natural sounding AI voice I've ever heard.
Next up, they showed off the new vision capabilities.
First, they did a linear equation where they asked ChatGPT to help walk them through how to solve it.
So instead of just pointing the screen at an equation on a piece of paper and asking it to solve it,
the presenters were really using it as a tutor more than anything else.
And in that way, I think it reflected what they were really showing off, which is these features as not somehow standalone, but as part of a complete assistant experience.
And speaking of that assistant capability, they also did a demo where they brought up the ChatGPT desktop app, specifically the conversational version of it, and were able to ask it about the code that they were writing in a different application, simply by copying it into the ChatGPT window.
They also showed off ChatGPT describing what it saw on screen after the code was run.
The two other demos they did, theoretically from audience input, were real-time translation,
where one of the presenters spoke in English, and then Mira responded in Italian, with ChatGPT
operating as the translator in real time, and then finally, they asked ChatGPT to recognize the
emotions on someone's face by looking at it. And then that was it. It was a tight half an hour. There was no big
one more thing, Steve Jobs type of moment. And like I said, there were a lot of kind of underwhelmed
responses. Abacus AI's CEO Bindu Reddy writes, is this me or was that it? What, even?
That was the single most underwhelming thing I've seen this year.
I'm not sure what's cool about this.
That Google Duplex demo from 2019 was way better.
The only highlight, if any, was the tone modulation, which wasn't even that spectacular.
Theo Jaffee writes, maybe I'll be crucified for this,
but I actually wasn't blown away by this demo like I was for the releases of ChatGPT and GPT-4.
This seems more like a product update than a foundational new capability breakthrough.
On the flip side, you had folks like Pete from The Neuron who wrote,
GPT-4o is magical, absolutely magical.
Rory wrote, blown away that more people aren't blown away.
We just went from smartphone to iPhone.
Chris France writes,
Lol, new OpenAI model is better than all existing models
at everything, supports real-time vision and audio, and is free?
What?
Here's a story that is all too common.
Your business gets to a certain size and the cracks start to emerge.
Things you used to do in a day are taking a week.
You have too many manual processes.
You don't have one source of truth.
If this is you, you should know these three numbers.
37,000, 25, 1.
37,000.
That's the number of businesses which have upgraded to NetSuite by Oracle.
NetSuite is the number one cloud financial system,
streamlining accounting, financial management, inventory, HR, and more.
25.
NetSuite turns 25 this year.
That's 25 years of helping businesses do more with less,
close their books in a day, not weeks, and drive down costs.
One, because your business is one of a kind.
So you get a customized solution for all of your KPIs
in one efficient system with one source of truth.
Manage risk, get reliable forecasts, and improve margins.
Everything you need to grow all in one place.
Listen, I know a lot of you guys are entrepreneurs out there, and one of the things that is
a reality of entrepreneurship is that almost from the moment that you've started, you have more
information in more places than you possibly know what to do with. That gets nothing but harder
the bigger that you grow. And frankly, right now, with AI, companies are growing bigger,
faster than ever before. If you're facing any of these challenges right now, you can download
NetSuite's popular KPI checklist, designed to give you consistently excellent performance,
absolutely free at netsuite.com slash breakdown. That's netsuite.com slash breakdown to get your own
KPI checklist. NetSuite.com slash breakdown.
As a listener of this show, I have a strong feeling you like to stay up to date on all
things artificial intelligence, including its impact on the workforce, which is why I highly
recommend checking out Managing the Future of Work, the chart-topping business podcast from
Harvard Business School. HBS professors Bill Kerr and Joe Fuller talk to business leaders,
technologists, and policymakers grappling with the forces like AI, globalization, and demographic shifts
that are reshaping the nature of work.
Recent guests include IBM's CHRO, Nickle LaMoreaux, on how Big Blue is adopting AI,
Morningstar CEO Kunal Kapoor on how AI can raise the investment IQ,
Microsoft Corporate Vice President Jared Spataro on how the tech giant is experimenting its way from AI assistants to autonomous agents,
and many other prominent movers in business and the workforce ecosystem.
So don't miss out.
Follow Managing the Future of Work on Apple Podcasts, Spotify, or wherever you're listening now.
But what about the team at OpenAI?
What story were they trying to tell?
Well, Sam Altman wrote it up explicitly on his blog.
He said that he wanted to highlight two parts of the announcement.
First, he said, a key part of our mission is to put very capable AI tools in the hands of people for free or at a great price.
I'm very proud that we've made the best model in the world available for free in ChatGPT without ads or anything like that.
Our initial conception, he continues, when we started OpenAI, was that we'd create AI and use it to create all sorts of benefits for the world.
Instead, it now looks like we'll create AI and then other people will use it to create all sorts of amazing things that we all benefit
from. We are a business and we'll find plenty of things to charge for, and that will help us
provide free, outstanding AI service to hopefully billions of people. Second, Sam writes,
the new voice and video mode is the best computer interface I've ever used. It feels like AI from the
movies, and it's still a bit surprising to me that it's real. Getting to human-level response times
and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was
possible with language interfaces; this new thing feels viscerally different. It is fast, smart,
fun, natural, and helpful. Talking to a computer has never felt really natural for me. Now it does.
As we add optional personalization, access to your information, the ability to take actions on your behalf and more,
I can really see an exciting future where we are able to use computers to do much more than ever before.
And so I think Sam is getting here at two of the three biggest parts of the announcement,
the transformation that this represents when you make it free,
and OpenAI's bet on a new mode of human-computer interaction.
I'm going to talk about each of those in some more detail,
but the third that I want to point out is the truly native multimodality of this.
This was an announcement that was not for a technical audience.
At least it didn't seem to be to me.
All of it was incredibly simple language,
and they didn't even show off some of the capabilities.
In fact, because they didn't explain it,
some people questioned what was going on underneath the hood.
Andrew Gao writes,
For my technical audience, thoughts on what's behind GPT-4o?
Is it really multimodal and not converting things to text?
I.e., you can replicate the demo by using Whisper to convert speech to text,
use regular GPT-4, and then convert the response to speech using ElevenLabs.
It would be entirely different if OpenAI was actually going from audio waves
to audio waves end to end without other models in between.
Definitely possible and would explain the ability to understand and hear breathing in the
demo. But this is also doable without that necessarily. Andrej Karpathy, previously of the founding
team of OpenAI, explained it this way. He said, they are releasing a combined text-audio-vision model
that processes all three modalities in one single neural network, which can then do real-time
voice translation as a special case afterthought if you ask it to. In other words, yes, this is true
native multimodality. It is not taking language tokens and then converting them. Will Depue, who works
on video generation at OpenAI says,
I think people are misunderstanding GPT-4o.
It isn't a text model with a voice or image attachment.
It's a natively multimodal-token-in, multimodal-token-out model.
You want it to talk fast? Just prompt it to.
Need to translate into whale noises? Just use few-shot examples.
An example that he showed was character consistent image generation
just by conditioning it on previous images.
He then showed an example,
and if any of you have spent any time trying to get consistent characters
with workarounds like style reference on Midjourney
or creating a custom GPT as I've done,
or using a third-party application like scenario.gg,
the fact that it might just natively have these capabilities is pretty significant.
So to me, the three biggest parts of this announcement were, one,
the fact that this best-in-class model was free for everyone,
two, the fact that it was truly natively multimodal,
and three, the fact that OpenAI was clearly making such a huge bet
on this new type of human-computer interaction as the future of how we interact with AI.
But what about when people started to get their hands on it?
How were the reactions then?
Well, Sully Omar from Cognosys writes,
GPT-4o is way, way faster than GPT-4.
It feels like an entirely different model, insanely fast.
Andrew Gao again writes,
To everyone disappointed by OpenAI today, don't be.
The live stream was for a general consumer audience.
The cool stuff is hidden on their site.
Some of the examples he gives are text-to-3D
and hugely advanced text in AI-generated images.
Andrew points out they're so confident in their text-in-image abilities
that they can create fonts with GPT-4o
and a bunch of other huge things as well.
Sully again writes, okay, I get where ChatGPT is going.
Ultimate workflow equals screen share with ChatGPT.
ChatGPT operates the computer for you.
You can interject and chat, all through voice.
It's like having someone there directly working with you.
In fact, right now, as we're recording this,
streaming live on X is someone coding in Cursor with GPT-4o,
basically as a live coding companion.
Others pointed out that the timing of this was no accident.
Robert Scoble writes,
what was just announced by OpenAI was designed to blunt attacks by Apple and Google
as both companies are about to change their voice assistants to LLM-based systems
that will fix most of the things we hate about both.
Apple has lots of advantages that it can brag about.
Like you'll be able to change the brightness on your phone by talking to Siri,
or be integrated into Apple's ecosystem,
i.e., can you put something on my reminders app?
Others pointed out that the ChatGPT demo today
was basically the demo that everyone freaked out about
from Gemini Ultra back in December,
that then everyone found out was edited to death
and not actually representative of its true capabilities.
Even more than that, though,
Google I/O is happening tomorrow. And Logan Kilpatrick, who notably used to work at OpenAI,
shared a video of what is presumably a Gemini assistant looking at the I/O stage and explaining it to the person
holding the phone. So it seems highly likely that tomorrow we're going to be having a very similar
conversation, comparing whatever they announce at Google I/O to what we got from OpenAI today.
Oh, and as one fun little aside, they did confirm that the im-also-a-good-gpt2-chatbot that everyone
has been freaking out about on LMSYS is indeed a version of GPT-4o
that they've been testing.
When it comes to real-world response,
certainly the real-time translation
demo seems to have had an impact.
1LittleCoder pointed out a 5% drop
in Duolingo's price in the wake of the demo.
Siqi Chen summed up where I think
a lot of people will end up in the long run
when he wrote,
this will prove to be in retrospect
by far the most underrated OpenAI event ever.
He even went further and said,
TLDR, GPT-4o is a significantly larger improvement
over GPT-4 than 3.5 was over 3.
GPT-4o equals GPT-4.75.
I think the point here, one that will ultimately be proven out or not by our interactions with it,
is that this native multimodality plus the ability to input on the basis of vision and video
transforms the use cases of ChatGPT in a huge way that we're probably underestimating initially.
Another part of this, though, was summed up by Aaron Levie from Box.
He wrote,
The productivity unlocked for humanity is pretty insane when AI can bring this level of intelligence to anyone.
Like I said, I think the reason that we're not talking more about just how significant the free shift is
is that most of us who are doing the talking right now have been paying for ChatGPT since the moment we could.
Giving billions of people access to that, though, for free is just likely to have an enormous, enormous impact
on work, society, and everything in between.
Ultimately, we'll see.
I think it is in no way guaranteed that the way that people will want to interact with these technologies
is through these sort of chat modalities or interactions with video.
The real world will show us that one way or another.
Regardless of what plays out, though,
it's pretty clear that OpenAI believes
that this is truly the future of interaction with AI.
I think just because Sam Altman wasn't doing the presentation,
just because they might have rushed this a little
to get in ahead of Google I/O,
and just because they didn't formally announce GPT-4.5 or GPT-5,
it would be a mistake to underestimate
how significant this update is in the minds of OpenAI themselves.
However, there is going to be a lot more to discuss with this,
especially with Google I/O coming tomorrow.
So that is going to do it for this edition of the AI Daily Brief.
Until next time, peace.
