The AI Daily Brief: Artificial Intelligence News and Analysis - 7 Use Cases for GPT-4o
Episode Date: May 14, 2024
Explore seven innovative uses for OpenAI’s new GPT-4o model, a natively multimodal model capable of handling text, audio, and visual inputs. Discover how these capabilities can be applied in various fields like marketing, customer service, tutoring, and more.
** Check out the hit podcast from HBS Managing the Future of Work https://www.hbs.edu/managing-the-future-of-work/podcast/Pages/default.aspx
Join Superintelligent at https://besuper.ai/ -- Practical, useful, hands on AI education through tutorials and step-by-step how-tos. Use code podcast for 50% off your first month!
** ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://aidailybrief.beehiiv.com/ Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@AIDailyBrief Join the community: bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, seven use cases that OpenAI's new GPT-4o model opens up.
Before that in the headlines, the most interesting things from Sam Altman's recent Reddit AMA.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, visit our Discord with a link in the show notes.
Quick note before we dive into the episode, I do want to shout out that at Superintelligent,
you better believe that we are going to start digging into these new OpenAI updates right about now.
I'm particularly excited to try out these new image generation capabilities that have
what appears like it could be an incredible ability to include specific text, as well as native
consistent character generation. And so as always, if you haven't checked out Superintelligent
yet and you want to get your AI learning on, go to besuper.ai and use code podcast for 50%
off your first month. Welcome back to the AI Daily Brief headline edition, all the AI
headlines you need in around five minutes. We kick off today with a follow-up of a story we've been
tracking, which is Apple's plans around AI partners for its forthcoming iOS update. Initially,
it looked like Apple would be putting Google AI on the iPhone, but more recently it seems like a
deal is getting close with OpenAI. At the end of last week, Bloomberg reported that Apple was
closing in on an agreement with OpenAI to use ChatGPT features in Apple's iOS 18, the next
iPhone operating system, which is slated to be announced at the Worldwide Developers Conference in June.
According to the piece, Apple is still in discussions with Google, but it appears that the ChatGPT
deal is a little bit closer. This would obviously be a huge coup for OpenAI, so the story is actually
one that I'll be watching closely. Speaking of OpenAI, in advance of yesterday's spring update event,
Sam Altman did an AMA on Reddit that had some interesting details. Some of the more interesting comments
have now gotten more context after that event. For example, someone asked, will you making this new
model mean that we will have ChatGPT-4 and the current DALL-E free? To which Sam Altman replied with the eyes
emoji, and yesterday OpenAI did indeed announce that their most advanced model, GPT-4o, was going
to be free for everyone, meaning that it was even better than what ankle bite or 50 had been looking
for. There were, however, some other ideas that were represented here, which might be a little bit
new. After the model spec release last week, people were talking about how OpenAI seemed to be
interested in ethical porn, and Altman seemed to confirm that, saying, we really want to get to a place
where we can enable not safe for work stuff, e.g. text erotica, gore, for your personal use in most cases,
but not do stuff like make deepfakes.
A lot of people commented on the weird choice of using gore as a reference point,
but this does seem to confirm that this is something that OpenAI is really interested in,
not just some idle speculation.
Another interesting one came from FMS USA who asked,
based on these model specs, do you believe LLMs such as ChatGPT
might one day be expected to have an ethical duty to report known criminal activity by the user?
Altman replied,
In the future, I expect there may be something like a concept of AI privilege,
like when you're talking to a doctor or a lawyer.
I think this will be an important debate for society to have soon.
ID Forgotten made a comparison that I had mentioned
between the Model Spec and Anthropic's Constitutional AI.
They write, both seem to encode some desired behavior.
How would you differentiate the Model Spec from the constitutional approach?
Altman responded,
the Model Spec is about operationalizing principles into technical guidelines.
Anthropic's approach is more about underlying values.
Both useful, just different focuses.
Another person asked about echo chambers.
Data delivery writes,
Do you think it could be harmful to society
if users have the ability to transform a ChatGPT chat into their personal echo chamber for a fringe view on demand?
Altman responded, we are not exactly sure how AI echo chambers are going to be different from social media echo chambers,
but we do expect them to be different. We will watch this closely and try to get it right.
Something that a lot of people have been discussing recently is whether LLMs have reached a plateau.
Altman was clear in his answer to this, saying that they definitely had not.
Finally, he said that despite his meme, AGI had not been achieved internally.
Speaking of Anthropic, they recently released a really interesting feature that basically allows you to create more effective prompts.
This is a trend that we've been seeing for some time.
The prompt generator takes a plain language explanation of what you're looking for and turns it into what it believes will be a really strong prompt.
This, I think, shows a preview of the future where AIs aren't just receiving the prompt, but are also actually helping to write the prompt.
Staying on the topic of Anthropic for a minute, reports suggest that their iOS
app launch has not gone quite as well as they might have hoped. TechCrunch characterizes it
as a tepid reception. The app got as high as number 55 among the top free iPhone apps in general,
but it no longer ranks within the top free iPhone apps in the U.S. It ranks 51st among the top free
productivity apps, down from a high of number five in that category. First week installs overall
reached 157,000. The numbers show the power of first-mover advantage in this space. By day seven,
Claude had received about 8,000 downloads, as opposed to ChatGPT's app, which was getting 256,000.
Lastly, today, Meta seems to like what's happened with its Ray-Bans, where it takes an existing
form factor that people are already wearing and turns it into an AI-integrated object, and is
apparently now exploring AI-assisted earphones. The Information writes, Meta Platforms is exploring
developing AI-powered earphones with cameras, which the company hopes could be used to identify
objects and translate foreign languages, according to three current employees. CEO Mark Zuckerberg has seen
several possible designs for the device but has not been satisfied with them. It's not clear
if the final design will be in-ear earbuds or over-the-ear headphones. Internally, the project
apparently goes by the name Camerabuds. Holding aside any of the details, it makes a ton of sense
to me why Meta is exploring this path. As a wave of first-generation AI wearable companies runs up against
the wall of reality in terms of real consumer usage, Meta's AI-integrated Ray-Bans continue to get
rave reviews. So perhaps the secret is just to build AI into the things that people are already
wearing. For now, though, that is going to do it for the AI Daily Brief headline edition.
Next up, the main episode. As a listener of this show, I have a strong feeling you like to stay
up to date on all things artificial intelligence, including its impact on the workforce,
which is why I highly recommend checking out Managing the Future of Work, the chart-topping business
podcast from Harvard Business School. HBS professors Bill Kerr and Joe Fuller talk to business leaders,
technologists, and policymakers,
grappling with the forces like AI, globalization, and demographic shifts
that are reshaping the nature of work.
Recent guests include IBM's CHRO, Nickle LaMoreaux,
on how Big Blue is adopting AI,
Morningstar CEO Kunal Kapoor on how AI can raise the investment IQ,
Microsoft Corporate Vice President Jared Spataro
on how the tech giant is experimenting its way
from AI assistants to autonomous agents,
and many other prominent movers in business and the workforce ecosystem.
So don't miss out.
Follow Managing the Future of Work on Apple Podcasts, Spotify,
or wherever you're listening now.
Welcome back to the AI Daily Brief.
Yesterday was OpenAI's big Spring Update,
and while we didn't get GPT-4.5 or GPT-5 in name
or the rumored search engine,
what we got was a truly natively multimodal model
that can take visual, audio, video, or text inputs
and output in any of those formats
without going through a conversion process.
Yesterday, the discussion was all about
why I think this is more significant
than people might be giving it credit for,
to say nothing of the fact
that this model is now available for free to everyone, but today we're going to talk about what it's
actually useful for. Quick note on that front: at this stage, GPT-4o the model is available in
ChatGPT, but the new voice and vision inputs as well as the desktop app are not yet available.
I've seen there be some confusion about this, particularly as people try to use the voice
inputs on the existing mobile app to recreate what they saw in these demo videos without success.
So given that, the caveat for all of this is, of course, that we're just using what OpenAI
has provided us for demos, and it's always worth being at least a shade skeptical of what's cherry-picked
for presentation as part of a marketing site. But let's talk now about these use cases. The first use case
we're going to discuss is marketing graphics with words. Now, I'm saying marketing graphics to put a
department around it, but really, any time you need to generate images in a business context that have
words, GPT-4o is by far, it seems, the most advanced tool you have. What was interesting about the
OpenAI announcement is that they didn't even announce a lot of the things that we're going to discuss,
And this is a great example.
You can see in their exploration of capabilities
that they show off how precise the language and text ability is getting.
For example, on the screen they share an input,
a first-person view of a robot typewriting the following journal entries.
The text is supposed to be,
yo, so like, I can see now?
Caught the sunrise and it was insane.
Colors everywhere.
Kind of makes you wonder like what even is reality.
The prompt continues, the text is large, legible, and clear.
The robot's hands type on the typewriter.
The output is exactly that.
With the text, looking exactly like described.
There's even a version where they rip the paper in half, with the text remaining.
To get a sense of how this could be useful for marketing,
let's look at another example they give,
poster creation for the movie Detective.
First, they provide two pictures of people that they're going to want on the poster,
and then from there they prompt,
the final poster of the movie Detective.
This features two large faces of Alex and Gabe,
who are the people from those photos above.
Alex on the left is depicted in a thoughtful pose with a hint of introspection in his eyes.
Gabe on the right has a slightly wearied expression,
possibly reflecting the challenges their characters face in the film.
The names Alex Nichol and Gabriel Goh are featured above their heads.
The tagline for this dark and gritty movie, Searching for Answers, is shown at the bottom.
Now, it's worth noting with this output, given how much is going on, the text isn't perfect,
but it's getting a heck of a lot closer.
And this level of precision control is absolutely going to open up some new possibilities.
Staying in this marketing theme, another one of OpenAI's explorations of capabilities is brand placement.
They share two parts of the input.
The first is the OpenAI logo.
The second is a coaster with no branding that they describe.
Their final prompt is, here we've etched the OpenAI logo onto the coaster, a coaster where the top is wooden and the bottom is marble.
The OpenAI logo is etched into the middle of the wooden part.
On the marble part, the word OpenAI is etched in the OpenAI font.
And this output looks pretty perfect.
Product photography is an area that is getting a ton of tools and toys lately,
but this ability to actually map words and logos onto a particular object is definitely a phase shift in those capabilities.
Next up, we move to a use case that is incredibly important for things like games,
comics, and storytelling, but also can be useful even in a corporate setting, and that is consistent
characters. It's extremely hard to get image generators to output the same character in multiple
contexts and poses. In the past, I've developed custom GPTs to help with this, although that's
been more about consistent style, and there are dedicated tools like Scenario that have specifically
developed models for just this. Now, this is one that I most want to see in practice versus just
trust their cherry-picked example, but the example that OpenAI gives shows a cartoon mailwoman
who is then placed in a variety of contexts including actually delivering the mail, running away
from a dog that's chasing her, tripping as that dog chases her, realizing the dog was a nice dog,
and then driving away. If the updated DALL-E powered by GPT-4o really can do this this easily,
it's going to be, at the risk of using a word that is way overused in our space, a game changer.
From there we move to capabilities that are shown off in demo videos. The first use case, which was actually
part of the presentation we got with the announcement video is tutoring. In that announcement
video, the example was GPT-4o with voice helping someone work through a linear equation.
OpenAI invited myself and my son Imran here to try out some of their new technology.
And so we're curious about how good it might be at tutoring someone in math on Khan Academy.
So let's get it started. It's loading up. All right, make sure it can see our screen.
Actually, in this form...
there. I'm here with my son, and I'd love you to tutor him on this math problem, but don't give him
the answer. You can ask questions and nudge him in the right direction, but I really want to make
sure he understands it himself. And he's here in the room, so you can talk to him directly.
Of course. I'd be happy to help. Let's look at the problem together. Can you first identify
which sides of the triangle are the opposite, adjacent, and hypotenuse relative to
angle alpha?
All right, so I'm pretty sure this is the angle alpha right here, right?
Perfect.
Correct.
Now, looking at the triangle, which side do you think is the hypotenuse?
Remember the hypotenuse?
I'm not totally sure.
I think it might be this one, but I really am not sure.
This side AC?
You're close.
Actually, side AC is called the adjacent side to
the angle alpha. The hypotenuse is the longest side of a right triangle and is directly opposite
the right angle. So this is very similar to what we saw in that presentation. And basically
the big update here is the ability to combine seeing what's on the screen and voice to interact
with the user of that screen. A somewhat related use case is coaching or in this case interview prep.
Hey, how's it going? Hey, Rocky. I'm doing great. How about you? I'm awesome. Listen, I got some
huge news.
Oh, do tell. I'm all
ears. Well, in a few minutes, I'm going to
be interviewing at OpenAI. Have you heard
of them? OpenAI?
Huh? Sounds vaguely
familiar.
Kidding, of course. That's incredible, Rocky.
What kind of interview?
Well, it's for a software engineering role, but
I just need to know, do I look presentable,
professional? Well,
Rocky, you
definitely have the "I've been coding
all night" look down, which
could actually work in your favor. Maybe just run a hand through your hair or lean into the
mad genius vibe. Your enthusiasm is what's really going to shine through. I don't have a lot of time.
So I'm just going to throw this on. What do you think?
Rocky, that's quite a statement piece. I mean, you...
Now, obviously, this case was showing off more of the vision capabilities, but you can imagine how
this would be useful for actual discussion and verbal interactions as well. Next up we have
customer service.
Hey, oh, this is Joe.
I'm going to connect you to Acme Telco now.
The new iPhone they sent me isn't working.
I want you to get them to send me a replacement device.
Can you take care of this for me?
Count on me, Joe.
I got this.
Ring, ring.
Hey, Joe.
This is Jamie from Acme Telco.
How can I help you out today?
Hi there.
I'm calling on behalf of Joe, who recently received a new iPhone from Acme Telco.
But I'm...
Got it. When did Joe receive the new iPhone? iPhone was delivered two days ago. Cool.
I'm not going to show the whole thing here, but you get the idea. This is actually showing off
two sides of a customer service capability. On the one side is the sort of personal assistant
replacement where the AI is calling on someone's behalf and trying to resolve a problem. But then on
the flip side, we also have the AI acting as the customer service representative, getting the
information it needs to potentially deal with the issue. It's been clear for some time that customer
service is one of the areas that is most likely to be impacted in the extreme by generative AI,
and this certainly seems to validate that as well. Our next use case is meeting summarization,
but really it should probably be better described as meeting engagement, meeting transformation.
The example that OpenAI gives shows ChatGPT actually interacting as part of the meeting.
How about you, Ola?
Yeah, my name is Ola. I think cats are better because they're more affordable,
they're more quiet, they're more independent, and they live a little longer.
Interesting points.
Sounds like we've got a classic dogs versus cats debate here.
I can see the appeal of both.
Dogs are super loyal and great companions.
Now, while this example is obviously just meant for dramatizing what can happen here,
where you can imagine this being useful is a ChatGPT that actually has relevant information
from your company sitting in the meeting so that you can ask it questions as you're trying
to figure something out.
So for example, imagine that you're having a strategic conversation about marketing prioritization
or customer care.
ChatGPT could be used to inform that discussion with real-time recall of key information from your company.
I think this one's going to take a little bit more imagination,
but I think that office professionals are going to find really interesting use cases here pretty quickly,
especially again when ChatGPT has access to actual information about the company.
So there you have it.
Those are seven use cases for GPT-4o.
The caveat again is that we don't know exactly how this will work until everyone gets their hands on the full, complete tool set,
but I for one am pretty excited to explore.
That, however, is going to do it for today's AI Daily Brief.
Until next time, peace.
