Everyday AI Podcast – An AI and ChatGPT Podcast - EP 474: GPT-4.5 hands on - What it can do and how you can use it
Episode Date: March 4, 2025

You know the details. But do you know what it can do? OpenAI's newest model, GPT-4.5, is out and very impressive. But do you know the best use-cases on how to use it? Join us and find out.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on OpenAI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
1. Overview of GPT-4.5
2. Comparison of Models
3. Tools Available in GPT 4.5
4. Demonstration and Analysis of GPT 4.5

Timestamps:
00:00 Exploring GPT 4.5's Usability
04:14 "GPT-4.5 Comparison and Basics"
08:18 GPT 4.5's Surprising Success
12:04 Adopting AI Platforms for Business
13:47 GPT 4.5 Tools Overview
16:27 "GPT-4.5: Relatable and Reliable"
21:21 GPT-4.5 Uses and Benefits
30:40 "GPT-4.5: More Concise Responses"
33:38 Preference for Second Version
45:25 Boost Success with Specific Goals
49:35 Avoiding Compound Sentence Overload
51:16 "Four Five's Superior Content Structure"
55:12 Enhancing Communication with AI's EQ

Keywords: GPT 4.5, large language model, OpenAI, emotional intelligence, benchmark, AI news, chat GPT, generative AI, live demo, relatable AI, reliable AI, GPT 4.5 capabilities, AI podcast, AI livestream, AI newsletter, AI model comparison, IQ test for AI, non-reasoning model, chat GPT tools, Canvas mode, AI in business, GPT 4.5 user experience, email writing with AI, motivational message, LLM arena, AI in content writing, AI for coding, content writing techniques, AI empathy, AI emotional cues, AI reliability.

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Start Here ▶️ Not sure where to start when it comes to AI? Start with our Start Here Series.
You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.
Transcript
This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
GPT 4.5 is officially the world's best large language model.
But how do we use it?
Right.
This is something that a lot of people have been talking about for the last few days, since OpenAI
released its first big updated base model in more than two years, right?
There are two things that we're going to talk about today, and we're going to do a live demo as well,
about how this new model is more relatable and more reliable.
But that really just begs the question: okay, what does that mean for how we actually use
it?
What is it actually better at?
And when we talk about a model excelling in emotional intelligence, I mean, you can't
really benchmark that.
So how do you actually know when you might want to really take advantage of this new GPT 4.5 model?
All right.
We're going to be answering hopefully those questions and a lot more today on Everyday AI.
What's going on, y'all?
My name is Jordan Wilson and I'm the host.
And Everyday AI, it's for you.
It is your daily live stream podcast and free daily newsletter,
helping us all not just understand AI, but how we can use it to actually grow our companies
and grow our career.
Yeah, when all these new models come out seemingly every single week,
you might be wondering, is this for my department?
Is this for my company?
Well, hopefully, at least after today's show,
you'll have a little bit better of an idea,
at least when it comes to GPT-4.5, OpenAI's newest model.
All right.
So if you're new here, thank you for tuning in.
This thing's live.
It's unscripted.
It's unedited.
So, you know, I try to bring you something real in artificial intelligence,
which I think is rare nowadays, right?
Everyone's creating all these super polished rundowns of, you know, models and, you know,
using AI avatars even, right?
Like, this is real.
So, you know, if you are listening on the podcast, thank you for tuning in.
Please make sure to subscribe to the show, leave us a rating.
That would be great.
And join the live stream, right?
Yeah, we have other real humans, you know, hanging out here in the live stream.
So thanks to everyone joining us: Max, Holman down in Chicago,
Big Bogey Face on the YouTube machine.
Douglas, Woozy, Sandra.
Let's see, who else?
Christopher, Brian, Marie.
Thank you all for joining.
All right.
So I am going to need your help.
All right.
I don't know if these comments went on YouTube.
Maybe they did.
Maybe they didn't.
All right.
But I listed 13 different questions.
All right.
I want you all to go through.
I have them on my screen here.
I'll show them in a bit.
They're super small.
Just write down the number of which one you want.
All right.
So just say, hey, I want to see number five.
You know, I want to see number seven.
All right.
So live stream audience, I need a little help from you guys.
If you scroll through the comments, hopefully you should see it.
I do have a slide up here later.
It's super small.
But let me know which one you actually want to hear.
All right.
Let's get into it, y'all.
So if you want the daily AI news, sorry, go check the newsletter.
Didn't have time to put it all together this morning, if I'm being honest, because I was putting in a lot of work on this show.
I don't sleep a lot if you haven't noticed, you know, live stream audience by looking at me.
I don't sleep a lot.
Yeah, I have AI go do a lot of my homework, but it's a lot.
So if you do want the AI news, we're going to have that in the newsletter.
Don't worry.
And also, as an FYI,
this is part two.
So I specifically, you know, I noticed and I heard from a lot of you all, right?
Like, you're like, hey, these shows are too freaking long, right?
I don't need an hour-and-10-minute podcast on a new AI model.
So we actually broke a bigger show down into two separate parts.
So if you want to hear the first part, where we went over a lot more of the technical detail,
we went over some benchmarks,
and we went over a lot of those other things for OpenAI's new GPT-4.5 model,
go listen to Episode 472.
So, you know, you can just scroll like two episodes back.
That one is called "OpenAI's new GPT-4.5: what's new and who can benefit the most."
But today we're going to be looking at really a comparison.
We're going to be going in and actually using this model, hopefully with some feedback and
suggestions from our live stream audience.
But let me just go over some of the basics.
Some of these we already covered in the previous show, some things we didn't.
So here's kind of some talking points from OpenAI.
So they are reiterating that GPT-4.5 is a research preview.
It is their largest and best model for chat, all right, for chat.
It is a step forward in scaling up pre-training and post-training.
And by scaling unsupervised learning, GPT 4.5 improves its ability to recognize patterns,
draw connections, and generate creative insights without reasoning.
All right.
Early testing shows that interacting with GPT-4.5
feels more natural.
Its broader knowledge base, improved ability to follow user intent, and greater EQ, emotional
intelligence, make it useful for tasks like improving writing, programming, and solving practical
problems, and they also expect it to hallucinate less.
All right.
So these are kind of some of the bullet points that I went over on our first show.
You know, this is the last non-chain-of-thought model developed by OpenAI.
So OpenAI CEO Sam Altman did say, hey, future models
that are under this GPT-5 kind of architecture,
they're going to be hybrid models.
All right, so keep this in mind.
But this is essentially a new base model.
All right.
And when we talk about reasoning models like o1 and o3, right,
we might not actually, you know, see an o4 as an example, right?
Just like we might not, you know, see certain minis, like an o4-mini, right?
It just might all be under GPT-5.
We don't know yet.
They might say it's GPT-5 and it uses o4 reasoning, right?
But in the future, you're just going to be working with one model.
And that's why this is extremely important.
I think a lot of people were kind of upset, right?
And they're like, oh, this GPT-4.5 didn't break every single benchmark, right?
This GPT-4.5 is extremely expensive in the API.
Yeah, I don't know any company that is going to be able to afford to use this in the API, right,
for like $75 input and $150 output per
million tokens, which is just, you know, 30 times more expensive than their previous GPT-4 models.
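To put those API prices in perspective, here is a rough back-of-the-envelope sketch. It uses only the per-million-token figures quoted in this episode ($75 input, $150 output). The request sizes and the call volume are hypothetical numbers picked for illustration, not measurements from any real workload.

```python
# Prices as quoted in the episode (USD per 1 million tokens).
INPUT_PRICE_PER_MILLION = 75.0
OUTPUT_PRICE_PER_MILLION = 150.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one API call at the quoted prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2,000 tokens in and 500 tokens out per call.
    per_call = request_cost(input_tokens=2_000, output_tokens=500)
    print(f"Per call: ${per_call:.3f}")
    print(f"10,000 calls: ${per_call * 10_000:,.2f}")
```

Under those assumed sizes, a single call lands at roughly 22 cents, and 10,000 calls at about $2,250, which is why the per-token price matters so much at company scale.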
But I did say after the first show, humans are really going to like it, right?
Because anytime you go, quote unquote, chat with any AI model, I don't know, to me, it's never felt human.
It feels like you're chatting with a machine.
GPT-4.5 is the first time I felt different, right?
To me, it doesn't feel like I'm chatting with a human.
I don't know what it is about me and how I tick as a person, right?
I know, right?
Yeah, it's like, yeah, I'm talking to a computer still, but it felt like real human.
It felt like a real human computer person, right?
I know a lot of people, like, reading people's first experiences, they're getting
enthralled with GPT-4.5's ability to carry on a conversation and to show kind of these
EQ tones that make humans human, right? People are always like, hey, what, you know, what separates
humans from AI, humans from large language models? And widely, those things are usually
considered things that are EQ, right? Emotional intelligence, being able to understand nuanced
conversation, right? And right now, at least from a text standpoint, GPT-4.5 is doing amazingly well.
And I did predict that humans are going to like it. And sure enough, humans loved it.
Because even though GPT 4.5, and again, this is a preview, even though it did not break every
single benchmark ever, right, which is what I think a lot of people were expecting or were hoping
from this model, here's what it did do in the LM Arena.
So I talk about this.
The easiest way to think about this is, you know, those like blind, like Pepsi versus Coke tests, right?
From, I don't know when that was, the late 90s, early 2000s, right?
Someone goes, there's no label.
They drink both and they say, oh, this one's better.
Right.
That's kind of what Elo rankings are, or the arena score from LM Arena.
All right.
So what this means is you put in a prompt, you get two different outputs.
They're blind.
You know, so you don't know which one's which, and you choose which one is better. So this is the test,
at least in terms of how humans actually use a model, right? Yes, we have dozens of benchmarks that
test different things from coding to writing to math to science, right? So you have all of these
kind of, you know, systemized and organized and categorized benchmarks, but it's always like,
well, what about humans, right? Do humans care? Will humans notice? Well, with
GPT-4.5, the answer is yes, right?
Because it quickly shot up to the number one spot on the LM Arena leaderboard.
So think of this, you know, every single model out there is in this, right?
You go into this blind taste test, and GPT-4.5, immediately once it got enough votes to rank on the chart, was number one.
So the best model in the world.
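For anyone curious how a blind head-to-head vote turns into a ranking, here is a minimal sketch of the classic Elo update. This is the textbook chess-style formula, offered as an illustration only: the exact method LM Arena uses to compute its arena score may differ, and the starting ratings and the K-factor of 32 below are assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind vote.

    k is the assumed step size: how far a single vote can move a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the total rating is conserved.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; a voter prefers model A.
    print(update_elo(1000.0, 1000.0, a_won=True))
```

The design point is that an upset win against a higher-rated model moves the ratings more than an expected win does, so a new model that keeps winning blind votes climbs quickly.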
I do know a couple hours later, you know, Grok's newest version came on.
So I think it's technically in a tie now.
But still, even without smashing every single benchmark,
this new model, GPT-4.5, just elevated itself to, I think,
probably the most preferred general use case model in the world,
which is extremely important because, like I said, in the future,
these reasoning models are going to be built on top of this.
All right.
So let's talk a little bit before we jump in live.
And live stream audience, I see a couple of you voted, if you could go through.
Let me know which one you want to see.
I agree with Douglas.
Douglas said Jordan needs coffee.
That was me sipping on the coffee.
But let's talk a little bit about the model itself and how it performs inside of ChatGPT, inside of the ChatGPT interface.
Also, FYI, let me get this off my chest, right?
Because there's people online, you know, and they're like, oh, I tried this model.
It's in, you know, and I'm like, oh, okay, how did you try it?
Oh, via a third party.
You can't do that.
All right.
So I do have to talk about access because at least as of this hour, GPT-4.5 is only available
for Pro users inside of ChatGPT.
All right, that does cost $200 a month.
But presumably either by this week or early next week, that will be going to all paid ChatGPT
subscribers.
So even if you are on the $20-a-month
ChatGPT Plus, you should be getting access to GPT-4.5.
So you might not have access to it now, but I highly encourage you.
Yes, there's third-party platforms.
You know, if you're on paid versions of others, you know, Perplexity or Poe or something
like that, you can probably go use 4.5 in a limited capacity if you're on a paid plan
for one of those services.
But that's not the best way to understand a model, right?
You should be using it in its natural environment.
So, yes, there's also the API that's available,
but that's extremely expensive. All right. So if you are trying to see what's best for your team,
and a lot of times I don't understand why every single big Fortune 500 company in America doesn't have at
least a Team or an Enterprise ChatGPT account. It's mind-boggling to me because, yes,
your company can have an internal version, right, that maybe you use for customer support or
sales or something like that. And I'm not just saying ChatGPT,
but if you are a CEO of a small to medium-sized company, if you
are an IT leader, if you're a CMO, whatever, you need to be pushing for your entire team, every single employee, to have,
whatever your AI operating system of choice is, a full Team or Enterprise license, whether
that's ChatGPT, Gemini, obviously Copilot, right, if you're a Windows organization, a Microsoft organization,
Claude, whatever it is. Because when you are
using these models inside the chat interface, they come with a lot of tools, right?
I did a show probably about a year ago.
I should update it, you know, kind of like, hey, what needs to happen, you know, for
us to get to something like artificial general intelligence?
And one of the things is, number one, a model needs to have access to the internet.
And number two, it needs to also have tool use, right?
So this tool use, this occurs inside of the ChatGPT interface.
And yes, third parties sometimes have versions of some of these tools, but I mean, the tools are best in the native interface.
But right now, at least, not all of the tools and features work with GPT 4.5.
So let's go over what does work and what is available versus what isn't.
Because OpenAI didn't say this.
I went through and tested it all for you so you know.
So, again, whether you have a Pro account now or you're going to be getting GPT-4.5 in the coming
days or weeks, here's what's available out of the box.
Okay, so Projects.
You can use GPT-4.5 in Projects.
You can use DALL-E, you know, the AI image generator, which I don't know why anyone
would, right?
It's not that good, although it will be getting updated soon.
Sora does have photo capabilities for early beta testers, just FYI.
And also, if you don't know Projects, that's essentially where you can organize chats
into one folder, which is great, but you can also upload documents that the folder's
chats can access, as well as special instructions.
So it's similar to GPTs, a little different.
So GPT-4.5 does have access to Projects.
It does have access to DALL-E.
It does have access to ChatGPT search,
which is extremely important because actually the knowledge cutoff
for GPT 4.5 was rolled backwards.
So its memory is a little worse,
or at least the recency of its training data is.
So GPT-4o is June 2024.
GPT-4.5 is October 2023.
So keep that in mind.
And that's why it's important that GPT-4.5 has access to ChatGPT search.
You can upload files to GPT-4.5, which is a must.
So glad that's there.
Also, Canvas mode, one of the most underrated, I think, tools or functionalities of any, you know, AI,
you know, large language model company out there.
Canvas is available.
So right now, unavailable, and this is as of the time of, you know, when I checked 19 minutes ago, okay, right now Tasks do not work with GPT-4.5, and neither do GPTs.
So those custom small versions of ChatGPT that you can create don't work right now with 4.5.
Both of those things, both Tasks and GPTs, those obviously still work with GPT-4o.
All right.
So let me just boil this down to two things.
I wish OpenAI would just put these two words somewhere very large on their GPT-4.5 page because a lot of people are asking.
Two things that I think really make the biggest difference with 4.5.
And this is in my experience so far.
It is more relatable.
So more human-esque, right, that EQ.
And more reliable.
So we went over the reliability
a little bit in our first show, going over benchmarks, accuracy, lower hallucinations,
etc.
It just knows more.
It knows way more.
Actually, there was a, you know, there's a website that does, sorry, IQ scores for large
language models.
And GPT-4.5 actually got the highest score for a non-reasoning model, which is pretty
impressive because that was the first time a non-reasoning model performed at the same IQ level
as the average human, right, which is pretty big, right?
When you have a reasoning model, it does way better because it uses more compute.
But the fact that a non-reasoning model in GPT-4.5 scored this high on an offline IQ test, right?
So this is an IQ test that is not in training data.
It's pretty impressive.
So it is definitely more reliable, but it is also much more relatable because of the emotional intelligence.
So this is from OpenAI: more natural,
human-like interactions than GPT-4o.
It's better at reading and responding to emotional cues,
and it is preferred by users as well against GPT-4o.
All right.
So we're going to jump in.
We're going to jump in soon.
All right.
I know this is small on the screen.
I don't know if these comments posted to YouTube as well,
but I'm going to go ahead and try to send them again.
So live stream audience, I know a couple of you guys have already voted.
I don't know if these comments are coming through.
hopefully they are, but I have essentially 13 examples.
All right.
And I want to do these live.
I want to maybe do two or three.
We'll see how long it takes.
And I want to show you the difference, the difference between a query in 4.5 and a query in
4o.
Full disclosure, haven't run any of these yet, right?
I run a ton of tests, but I like to do this live.
This is unedited, unscripted, right?
So live-stream audience, just put the number.
Try not to put anything else.
Just put the number.
And I'm going to scroll through the comments here on the right side of my screen,
you know, bringing in comments from LinkedIn, Twitter, YouTube, et cetera.
So which one do you actually want to see?
So I haven't done these.
And I'm going to read the prompt out.
These are very short prompts, right?
They're supposed to be short.
I'm not going to go through the whole, like, Prime Prompt Polish process, which, if you want
the best output, you should be doing the basics of prompt engineering still.
But I want to show you just, hopefully, some short
prompts, the inputs and then the outputs.
And we're going to run this in GPT-4.5 and GPT-4o and talk a little bit about the differences.
And hopefully we'll see the difference.
Who knows?
Maybe we won't.
That's the downside of trying to do unscripted, unedited demos and examples inside generative AI.
So maybe if you're brand new to ChatGPT, large language models, generative AI, and you don't know a ton, that's fine.
We try to keep it simple, but let me say this.
Generative AI and large language models, they're generative.
They're not deterministic.
So what that means, as an example, if you go search for something on the internet,
search engines are, for the most part, deterministic, right?
Yes, there's some personalization and some localization,
but for the most part, those search results are going to be roughly the same
every single time you put them in.
A large language model is completely different.
It is a roll of the dice.
It is generative, right?
It is, you know, there's some next token prediction.
So, you know, you could in theory put the same prompt 10 times.
You could get nine very different answers.
You could get two very different answers.
You could get five things that are pretty much the same, but just worded differently.
So that's another thing to keep in mind.
Generative AI is generative, right?
Which is why sometimes these live demos are super fun.
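The "roll of the dice" point can be shown with a toy sketch of next-token sampling. Everything here is made up for illustration: the four candidate words and their scores are hypothetical, and a real model scores tens of thousands of tokens at every step. The idea is the same though. Always taking the top-scoring token is deterministic, while sampling from the probabilities can give a different result on each run.

```python
import math
import random

# Hypothetical scores (logits) for the next word after one fixed prompt.
TOY_LOGITS = {"great": 2.0, "good": 1.5, "fine": 1.0, "odd": 0.2}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(score - top) for tok, score in scaled.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def greedy_pick(logits):
    """Deterministic: always return the highest-scoring token."""
    return max(logits, key=logits.get)


def sample_pick(logits, rng, temperature=1.0):
    """Generative: draw one token at random, weighted by probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run can differ
    print("greedy :", [greedy_pick(TOY_LOGITS) for _ in range(5)])
    print("sampled:", [sample_pick(TOY_LOGITS, rng) for _ in range(5)])
```

Running it a few times, the greedy row never changes while the sampled row usually does, which is the same effect as putting one prompt into a chat model ten times.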
All right.
I see some votes here.
All right.
I'm looking through here.
I'm seeing which are some of the
most voted ones.
So I have 13 different examples on the screen.
And I really focused on a couple things.
So these prompts are supposed to,
you know, rely on creativity and intuition,
you know, so storytelling,
being able to think and write clearly,
strong in design and creative tasks.
But they're also really around these four categories
where I think EQ shines in a large language model.
Right.
So think about
if you are using
ChatGPT as a personal or life coach.
So some of these prompts are more in line with that.
If you're using it as a therapist, or a work therapist even, right,
to work through tough problems, tough issues, how do I send an email, right?
Those things.
Content writer, business strategist, and creative partner.
So those are some of the categories where I think the everyday person is going to
really see the benefits of 4.5.
So let me repeat that.
if you're using this as a personal life business coach, therapist, content writer, business strategist, and creative partner.
Yeah, there's other things that it's actually going to perform really well in.
You know, I know a lot of people are saying, oh, 4.5 isn't great at coding.
It's actually really good at coding across the board, like, across the LM Arena tests.
It, like, swept everyone in almost every single category.
So it is measurably better in almost every single category that you would use a large language model for.
I think hopefully we'll see the biggest improvements in some of these areas.
All right.
So let's go ahead.
Let's see if we can do this.
Live stream audience, as always, please let me know when you can see my screen.
We're going to do this live. Podcast audience,
I'm going to try not to make this one too long.
I'm going to try to be somewhat concise.
All right.
So as a reminder, if you have a normal ChatGPT Plus plan and you log on today,
you're not going to see this 4.5, all right?
But when you do, I'm guessing within a couple of days to a week or two,
this should be rolling out to most paid users.
So ChatGPT Plus, ChatGPT Team, ChatGPT Edu, as well as Enterprise.
I think Enterprise might be a little after. For those companies that hire us to train their large teams,
you might not be getting Enterprise in your PPP biz training, at least not in March.
All right.
So we need to select GPT-4.5 in the dropdown.
All right.
So live stream audience, let me know if you can see this.
I'm looking to see which one, some of our most popular ones.
Okay.
All right.
I see quite a few people.
And again, both on LinkedIn and YouTube, and it looks like a couple
on Twitter. Okay, so I'm seeing a lot of threes. I'm seeing some sixes and 13s. Let me go up earlier.
Five, 12. Okay. 9, 11, 7, 12. Okay. So I see quite a few 12s. All right. So let's go
ahead and grab number 12, shall we? All right. Live stream audience, can we see the screen?
All right, let's do number 12.
Let's see what number 12 was.
Okay, people, people, people were going for humor.
I wrote this one and chuckled.
All right.
So here's what number 12 was.
People wanted to see this.
All right.
It's saying: you work at Microsoft, write a convincing memo on why Microsoft Zune should be reinstated.
Y'all remember Microsoft Zune?
All right.
So actually what we're going to do is we're going to first do this with 4o.
I'm going to read the response.
Then we're going to do the same with 4.5.
All right.
And we'll see how long it is.
All right.
Now, I'll try to do my best to even describe what's going on.
So unfortunately, ChatGPT is being extremely slow right now.
So this is going to be fun.
All right.
So GPT-4o is giving us a memo.
It is emoji heavy, which I'm not a fan of.
And also, I intentionally left these prompts kind of open-ended, right?
Because I'm doing this for everyday people, right?
Everyday people write a super short prompt and they hope to get a lot out of it.
And I think, and I hope, maybe that's where GPT-4.5 shines.
Okay.
So here's what GPT-4o says.
Here's a compelling internal memo advocating for the return of Microsoft Zune.
I really wanted a Microsoft Zune when it first came out.
All right.
So like I said, there's a ton of emojis.
It gives a date, to Microsoft Leadership Team, from your name, your position, subject: the case for reinstating Zune.
So it's pretty long so far.
It's going a little bit slow.
So we might have to cut it off here if we want to do two or three quick examples without
turning this into an hour show.
So it starts with an executive summary.
It says the Microsoft Zune, despite its untimely discontinuation, was ahead of its time
in fostering a seamless ecosystem of music, community, and content consumption.
Given the resurgence of retro tech nostalgia, the rise of
dedicated music devices, and the renewed focus on hardware-specific integration, Microsoft is uniquely
positioned to reintroduce Zune as a modern AI-powered cloud-connected audio experience.
This memo outlines how a Zune revival aligns with current market trends and Microsoft's broader
strategy in AI, hardware, and consumer engagement.
So so far, executive summary from 4o seems pretty good.
A little wordy.
I don't know that we need to, you know, say things like this memo outlines, you know,
depends on, you know, how you would communicate to your coworkers internally.
I wouldn't put something like that in an internal memo saying this memo outlines.
But other than that, pretty good.
So let's go through.
So here's number one, it says the market opportunity.
Number two, it says the vision for Zune 2.0.
Again, we're seeing some emojis, which for the most part,
I don't think we'd see on an internal memo, but that's fine.
So it's going over key features of the Zune reboot, including AI-powered music curation,
lossless and spatial audio support, Microsoft 365 and Xbox integration,
cloud sync and offline playback, and Zune Social 2.0.
I know we have some listeners from Microsoft.
So let me know if this is going to hit the shelves anytime soon.
Three, it goes over competitive advantage. Four, financial viability and go-to-market strategy.
So pretty good.
It has a launch plan, a marketing approach.
Conclusion, a strategic appeal.
All right.
So I'll just read the intro and the conclusion for both.
So conclusion: the Zune brand still holds strong emotional value.
And with the right vision, it can be reintroduced as a powerful differentiator in Microsoft's consumer hardware and AI portfolio. By integrating AI, superior audio technology,
and cloud capabilities, the next generation of Zune could redefine how people explore music and
content. Let's bring Zune back, not as a competitor to smartphones, but as the intelligent
music companion for the AI-driven era. All right. So not bad. You know, I wouldn't ever, you know,
use this necessarily. But, you know, my background, it's in content writing, right? I was a journalist.
and, you know, more or less, I've been getting paid to write for two decades.
So I wouldn't use this.
I'm sure some people would look at this and be like, oh, good enough.
Not necessarily.
All right, but let's go ahead and switch models.
So yes, if you didn't know, at the bottom of any prompt, there's these little icons, right?
So once you do get access to 4.5 and you're trying to see which model should be my daily driver,
I encourage you to run tests like this.
You can also do this in the back end of ChatGPT's kind of playground,
but it will cost you on the API side.
So I would probably just wait if you don't want to pay those super expensive prices.
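If you want to make a side-by-side test like this a little less subjective, here is a small sketch that scores two outputs on the two things called out in this demo: length and emoji use. The two "models" below are hypothetical stand-in functions that return canned text, not real API calls, and the emoji counter is only a rough heuristic based on Unicode's "Symbol, other" category.

```python
import unicodedata
from typing import Callable, Dict


def count_emoji(text: str) -> int:
    """Rough heuristic: count characters in Unicode category 'So'."""
    return sum(1 for ch in text if unicodedata.category(ch) == "So")


def compare_outputs(
    prompt: str, models: Dict[str, Callable[[str], str]]
) -> Dict[str, Dict[str, int]]:
    """Send one prompt to every model and report word and emoji counts."""
    report = {}
    for name, generate in models.items():
        text = generate(prompt)
        report[name] = {"words": len(text.split()), "emoji": count_emoji(text)}
    return report


# Hypothetical stand-ins; swap in real calls to whichever models you test.
def fake_verbose_model(prompt: str) -> str:
    return (
        "\U0001F680 Executive Summary \u2705 This memo outlines a long "
        "and detailed case for the idea."
    )


def fake_concise_model(prompt: str) -> str:
    return "Overview: a short case for the idea."


if __name__ == "__main__":
    prompt = "Write a convincing memo on why Microsoft Zune should be reinstated."
    models = {"verbose": fake_verbose_model, "concise": fake_concise_model}
    for name, stats in compare_outputs(prompt, models).items():
        print(name, stats)
```

Counts like these will not tell you which memo is better written, but they give you a quick, repeatable check on verbosity before you read both outputs yourself.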
All right.
So we're going to do the same thing with 4.5 now.
So I just clicked 4.5.
Interestingly enough, okay, this is great.
4.5 was, I'm guessing, smart enough to know that I would probably want to use Canvas mode.
So, love this, love this, right?
Super smart, FYI.
So it started answering in Canvas mode, even though I didn't tell it to.
4o did not. It probably knows, right, as a memo, it probably knows I'm going to
want to go back and forth, kind of like how I said, eh, I wouldn't use this.
It probably knows that, hey, you're going to want to update this.
So let me go ahead and throw this in Canvas, even though you didn't ask me to, right?
And then it has the option here to click to answer in the chat instead.
All right.
So let's go ahead and take a look.
We're going to read both the intro and the summary, much more concise.
So personally, I like that.
Maybe it's because I use AI too much.
And I think, you know, let me just go on a small and short soapbox here.
Large language models are trained to be helpful assistance.
right? It's in their system prompt. And usually what happens if you don't go through this,
you know, what we teach like the refined Q process, right? And that's when you make a large model
smarter, smaller, and more specific for all of you that have taken our PPP course, right? For the
most part, a model doesn't necessarily know exactly everything that it would want to know in order
to provide an answer. So this is something I've seen from 4-5 that I personally appreciate,
but this might be a preference thing.
GPT 4-5 is much more concise, right?
Where 4-0 and previous models,
they just get super verbose, right?
Like, and this is something that large language models do all the time.
If you don't give it a lot of detail,
if you don't go through a back-and-forth process of essentially,
you know, you having a conversation with, you know,
4-0 or 4-5 about the Zune,
and here's my role, here's my reasoning,
here's my rationale.
A lot of times it's just going to
spit out a bunch of overly generic content.
And it's going to try to tackle things from too many different areas,
but it's not going to be super strong.
So that is something that I've noticed in my,
I won't say heavy use of GPT 4-5, but I mean, I've used it fairly often.
I know it just came out like Thursday, right? And today's Tuesday.
So, you know, a big chunk of that was the weekend,
and I was trying to sleep for the first time in like a week.
But I've used it very heavily since it came out.
And this is something I've noticed.
Answers are much more concise, sometimes more confident, which can be a good or a bad thing.
You really just have to have an eye on the output, but it's much shorter, right?
Another thing, I see no emojis, which I personally like.
All right.
So let's go ahead.
So here's the internal memo.
So it says: to, senior leadership team; from, your name, strategic product development; date; subject.
All right.
So let's at least read the overview, the conclusion, and then talk about what else is in here.
And hey, live stream audience, let me know which one also you think is better.
All right.
So overview, given current market trends and the growing nostalgia for early digital media devices,
I propose that Microsoft strategically reinstates the Zune product line.
Originally discontinued in 2012, the Zune has since garnered a devoted fan base and iconic status,
becoming emblematic of a simpler digital age.
All right.
So then it goes over market
opportunity. It outlines strategic benefits. A lot of similar things that we saw from 4-0, right? So
brand strengthening, ecosystem expansion, talking about integrating it with Microsoft 365,
Xbox Game Pass, Azure cloud services, that's cool, differentiation, the implementation
approach, going through initial market research and validation, modern product development,
marketing and launch strategy as well. And then the conclusion. So, reinstating
the Microsoft Zune is a strategic decision with significant potential upsides.
It positions Microsoft at the forefront of a profitable retro tech revival,
reinvigorates consumer interest, and strengthens brand affinity.
I look forward to discussing this further and obtaining your insights.
So I don't know if this, in this case, if there's one, you know, definitive winner and one
definitive loser.
For me, my personal taste would be the second one, right?
For an internal memo, I mean, 4-0's version is super long, right?
So if I X out of Canvas here, if you didn't know this, at the bottom you'll now see
one of two and two of two because I used that little model-switching icon.
So I can just toggle left and right and compare the different outputs.
So for me, I much prefer the second version.
All right.
Let's see, it looks like some other people wanted number one.
So let's do number one.
I think this one might be a good example as well.
All right.
So we're going to launch a new chat.
We're going to start in 4-0.
We're going to do 4-0 first and then we're going to do 4-5 second.
All right.
So this one, a couple of you voted for, write an email to a colleague who missed a deadline due to a family emergency.
All right.
So we're doing GPT 4-0 first.
So subject line, hope you're doing okay.
Hi, colleague's name.
I just wanted to check in and see how you're doing.
I heard about the family emergency and I hope everything is okay on your end.
If there's anything I can do to help or support you, please don't hesitate to reach out.
I understand that the deadline project name was affected and I completely get it.
Family comes first.
Let me know if you need any assistance in catching up or if we should discuss a plan to move things forward.
Take care and looking forward to hearing from you when you are ready. Best, your name.
Pretty good. I don't necessarily see, again, I'm not, let me be honest, I'm not the most empathetic
person, right? I'm not the, you know, I'll say my IQ is probably higher than my EQ. So I don't
see anything wrong with this email. Maybe it's, I don't know, maybe it's, it's, it's, you know,
setting off a bunch of red flags in your mind. All right, let's go ahead. Let's switch over to that GPT4.5.
And let's see if we can spot differences in how 4.5 would handle this. Presumably
it might be, I would hope it would be better, right, since, you know, 4-5 is supposed to be
much more relatable to humans.
All right.
So same thing here.
4-5 launched this in Canvas, which I like, because then you can use the Canvas features,
right?
As an example, you can highlight something.
You can suggest edits.
You can adjust the length.
You can change the reading level.
You can add polish.
You can add emojis, right, which I would never do.
But launching it in Canvas, again,
small thing, super smart, but 4-5 assumes, right?
There we saw it twice.
The model says, oh, okay, well, you're probably going to want to add some of your own insights.
There's some placeholders in here, right?
If you want to get work done, we're going to launch this thing in Canvas.
All right.
So subject line, checking in and offering support.
Hi, colleague's name.
I wanted to reach out and let you know I've been thinking of you.
Okay.
I see one difference right away, right?
Again, I don't know if this is good or bad, right?
But 4-5 is saying like, been thinking of you, all right?
I'm trying to jot some notes because I actually can't as easily compare the two in Canvas mode.
I heard about your family.
And I'm truly sorry for, I'm truly sorry you're going through a difficult time.
Okay.
Please don't worry about the missed deadline.
Okay.
Interesting.
Don't worry about the missed deadline.
Your family and personal well-being are the priority right now.
If there's anything I can do to support you during this period or assist with any pending
tasks at work, please feel free to let me know.
We're all here to help you manage through this.
Take all the time you need and please keep me updated when it's convenient for you.
Okay.
So actually, now that I'm reading this, I'm seeing
some nuances, right? Some small things, right? And again, this is not my area of expertise. I'm not going to be
hired as anyone's HR head anytime soon. But I do see a couple of things in 4-5's response, and I'm
kind of bolding them on my other screen right now. But let me know, live stream audience,
which one was better? And do you see anything that you liked in 4-0 versus 4-5, or 4-5
versus, you know, 4-0?
Let me know if one was preferable.
All right.
So here's a couple of things I picked up on.
So let's go back to 4-0.
So at least for me, 4-0 starts with saying, right?
Again, I didn't think there was anything necessarily wrong with 4-0's response until I read
4-5's response.
And I'm like, wait, okay, some things are better here.
So 4-0's response, the first thing it
says is, wanted to check in, right?
Yes, it says wanted to check in and see how you're doing.
But I think even when you read that email, right, if you're in that situation of a family
emergency and someone says, wanted to check in, it sounds kind of business, right?
It sounds, I guess, a little cold.
Whereas 4-5 says in the first sentence, I wanted to reach out
and let you know I've been thinking of you.
Okay.
I mean, just that right there, I think you can hopefully see and realize
the bump in EQ, right?
And I think maybe that's where there's also, I don't know, in my mind,
I'm also, you know, trying to describe in real time the vibe.
The vibe of the 4-0 letter I'm getting now that I'm reading it is sympathetic.
you know, with a little bit of like, hey, let's get this project going forward.
Where 4-5, I think, is maybe a little more empathetic
and talking about working together to move something forward.
That's what I'm getting.
4-5 says, please don't worry about the missed deadline, right?
Where 4-0 says, you know, I understand that the deadline was affected, 4-5 says, please don't worry about the missed deadline.
Okay.
4-0, you know, to kind of move the project forward, says, let me know if you need any assistance in catching up or if we should discuss a plan to move things forward.
Okay. So again, when I'm reading that by itself, I'm not necessarily like, oh, this is bad. All right. And then
4-5 says, if there's anything I can do to support you during this period or assist with any
pending tasks at work, please feel free to let me know. All right. Even just saying please feel
free, versus 4-0, which just says let me know, essentially, about these tasks. And then 4-5,
again, looks like it's showing a little more empathy versus sympathy and maybe prioritizing
the family situation, where at least now, as I'm kind of comparing the two, you know,
it looks like 4-0 is just wrapping up some sympathy and is like, yo, let's get this
project going, right?
Which I don't know.
What do you guys think?
Denny says 4-5 sounds like the person really does care,
and 4-0 sounds like, I need to write this email to show I care.
That's a great observation from Denny. Max says either one would work.
4-0 is what I usually would expect from the regular office people.
4-5 is superior EQ and empathetic more than usual office humans.
Yeah, that's what I'm saying, right?
Like when I first saw 4-0, I'm like, nothing wrong with this, right?
But then when I saw 4-5, all of a sudden, I'm like, oh, okay, yeah, I can see how on
the, you know, on the human side, there's maybe some things that could have been improved
in this 4-0. Michael said, I would prefer to receive 4-5. I feel like I would write something
closer to 4-0. My gosh, Michael, we are the same, right? We are the same. I'm reading these
and I'm like, oh, yeah, I like that one. But I totally, I totally would have personally
written something like 4-0. Yeah, Sandra's saying 4-5 is more empathetic. Jonathan, what's up,
Jonathan? Jonathan says 4-0, minus the "hope you're well," seems more HR-friendly,
while 4-5 just seems friendly.
Yeah, I agree with that.
Yeah, I do think maybe, you know, 4-0 is maybe more in line with, you know, maybe HR guidelines.
And 4-5 is probably something that is going to resonate with the human receiving it,
I think, much more.
Cecilia said 4-5 version drops the need to move the deadline forward and puts it on
a pause. 4-0 makes it clear that we will need to move things forward on the specific deadline
missed. Yeah. So I guess it ultimately depends on what's the most pressing thing, right? Is the
company going to shut down if this deadline isn't met? Right. So great observations. I love
doing this live, y'all, because, you know, you guys spotted a lot of things. Robert from Twitter
saying 4-5 has more empathy in it. All right. I think we can do one more. So scrolling through here to see
one more we can do that people voted for. Okay, let me bring this in. So a lot of people wanted
number six. So let's try that. All right. So we're going now to a new chat. We're going to start
with 4-0. Here's number six. All right. So it says, draft a daily motivational message that
encourages personal growth with clear fact-based tips.
Not a lot of guidance.
We'll see what happens.
All right.
So first,
we're running with 4-0.
Lots of emojis.
So if you're personally motivated by emojis,
you can already tell,
4-0 is your model, right?
That's something I've noticed as well.
Not getting a lot of emojis out of 4-5.
All right.
So we can read this.
So here's 4-0's daily motivational message
that encourages personal growth with clear fact-based tips.
All right. Live stream audience, let me know after I read both of these, which one gets you
pumped up, which one of these is getting you ready to face the day.
All right.
So daily motivation, small steps, big growth, rocket emoji.
All right.
Every day is a chance to become 1% better.
Personal growth isn't about massive leaps.
It's about consistent small actions that add up over time.
Fact-based tip.
All right.
We're even labeling it here.
Studies show that setting specific achievable goals increases your chances of success by up to 42%.
Harvard Business Review.
I'm obviously going to Google that to see if that's real.
All right.
And it is.
All right, good.
So at least 4-0 did not hallucinate that part.
All right.
So studies show that setting specific achievable goals increases your chances of success by up to 42%.
Instead of vague intentions like "get healthier,"
try "walk 10,000 steps daily" or "read 10 pages before bed."
It's actually a great tip.
Actionable step.
Pick one small habit to improve today.
Whether it's drinking more water, writing down your thoughts, or practicing deep breathing for focus, small wins lead to big transformations.
Keep moving forward.
Your future self will thank you.
Emojis.
Hashtags, right?
Sorry, I hate reading this.
This is everything that's on social
media right now, right? I want to vomit as a content writer. I hate this. I hate anything
one-shotted from any large language model because most of it always reads like this, right?
You can get ChatGPT writing or AI writing to sound extremely human, but,
I don't care, right? People are like, oh, Claude 3.5 or Claude 3-7 Sonnet sounds so human.
GPT 4-0, GPT 4-5 sounds so human. No, for the most part. No, it sounds like this.
This is what I read on social media, on LinkedIn, on Twitter, and I want to vomit in my mouth.
There's nothing wrong with the actual tips.
I just, the content writing is getting me woozy, like Woozy Rogers.
All right, let's go 4-5.
So, again, draft a daily motivational message that encourages personal growth with clear fact-based tips.
All right.
Interestingly enough, I already see one emoji in 4-5.
I think 4-0 had, like, let's see how many 4-0 had:
one, technically, two, three, four, five.
All right, so we had four emojis and two hashtags in 4-0, and then in 4-5,
we had two emojis, no hashtags, so a little better.
All right, ready.
Here's 4-5.
Today's motivation, embrace your growth.
Growth doesn't happen overnight.
It comes from small, consistent actions taken each day.
Here's your simple, fact-based guide to grow personally and professionally
today. Number one, learn something new, even just 10 minutes. Research shows daily learning builds
neural connections, improving memory and cognitive skills. Two, practice gratitude. Writing down three
things you're grateful for daily has been scientifically proven to enhance overall happiness
and reduce stress. Number three, step slightly outside your comfort zone. Small challenges
create confidence through psychological resilience, enabling you to
face bigger challenges more effectively.
Take a small step forward today.
You have everything it takes to grow, learn, and succeed.
All right.
Which one, live stream audience, which one has you pumped up and ready to run through a wall like the Kool-Aid Man?
Oh, yeah.
Do any of these?
GPT 4-0, GPT 4-5?
Let's talk quickly about some of the differences.
Surprisingly, on this example, 4-0 was a little
more concise. It was a little too heavy on the emojis, a little too heavy on the hashtags.
Not a huge fan of this. One other thing is even if we're just looking at the quality of the content
writing, I think 4-0 was a little poor. There is no cadence or switch up in the sentence structure.
Yeah, I'm getting a little, you know, putting on my old writing hat, right? You always want
cadence in your written content. What that means, I try to throw cadence in my podcast.
right? I don't just always speak monotone. I don't always go in, you know, sentences that are,
you know, 15 to 20 words. I try to pause. Sometimes I talk slowly. Sometimes I talk really fast
and I have these long sentences that go together and there's no period. There's no punctuation
and I talk all excitedly. That's cadence, right? So, 4-0 has no cadence. It actually falls
into this compound sentence, right? So, yeah, we're talking about content writing now, but that's
something that I think is significantly improved in 4-5.
4-0s, you know, I know you're maybe not, you know, if you're listening on the podcast,
maybe this doesn't worry or matter as much.
But 4-0 is kind of the equivalent of watching paint dry when it comes to content structure.
Yeah, I was a journalist.
I wrote a lot.
For the most part, most of these sentences are, it looks like, between 12 and 20
words, and the majority of them are compound sentences with an em dash. All right. So, yeah, all those people
are like, oh, you know, an em dash is definitely, you know, a sign of AI writing. Not really, right?
I was using em dashes back when I was a journalist at the Freeport Journal Standard in 2002 or 2003 or
whatever, right? Love em dashes, love compound sentences, but huge over-reliance on them here from
GPT 4-0. So let's see. One, two,
three. So out of like the six sentences, three of them are compound sentences with em dashes.
Not good. We only have, let's see, we have zero sentences that I would consider short,
which is five words or less. All right. So if we look at GPT 4-5, we only have one compound
sentence with an em dash. So that's better. Okay. We do at least have
one short sentence.
All right.
So a little better, a little better in terms of content structure, you know, some cadence, some variance,
but still nothing great if we're just looking at content, right?
I know this is more about the motivational message, but I did want to take a second
to look at even just how the content is produced, because I think that is another small detail
that 4-5 actually has better.
So yeah, less, you know, like, oh, people are always like, oh,
content, right? You can't technically tell, although there's a lot of telltale signs, right?
Heavy emojis, double emojis in headlines, you know, random hashtags, you know, like I said,
an overreliance or a heavy percentage of sentences that are compound sentences
separated by an em dash. So overall, the content writing, I think, is much, much better on 4-5.
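For anyone who wants to run the same cadence check on their own drafts, here is a minimal Python sketch. It is not something shown in the episode: the function name `cadence_report` and the naive sentence splitter are assumptions for illustration, and the only number taken from the discussion above is the five-words-or-less cutoff for a short sentence.

```python
import re

EM_DASH = "\u2014"  # the em dash character, written as an escape


def cadence_report(text, short_max_words=5):
    """Summarize sentence-length variety in a passage (hypothetical helper)."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    # Abbreviations like "e.g." will be split incorrectly; fine for a rough check.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    # Word count per sentence, using whitespace-separated tokens.
    lengths = [len(s.split()) for s in sentences]
    return {
        "sentences": len(sentences),
        "word_counts": lengths,
        # Sentences at or under the short-sentence cutoff.
        "short": sum(1 for n in lengths if n <= short_max_words),
        # Sentences that contain at least one em dash.
        "with_em_dash": sum(1 for s in sentences if EM_DASH in s),
    }


if __name__ == "__main__":
    sample = (
        "Growth takes time. "
        "Small steps add up over many weeks\u2014and they compound. "
        "Start now."
    )
    print(cadence_report(sample))
```

A passage where `short` is zero and `with_em_dash` is close to `sentences` matches the pattern criticized here: every sentence the same length, most of them joined by an em dash.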
All right. So what do you all think as we as we wrap up here? But like I said, these are the areas.
And I think you saw it in probably that middle example, probably the best, the email example,
how really we're having a little bit of EQ, some emotional intelligence and being relatable as a human, right?
A lot of you said the same thing. I said the same thing as well. I'm like, I want to receive that second email that we talked about,
the one that was from 4-5.
It just felt more human.
It was probably more human than something I would have written, right?
Which is pretty impressive, right?
It is pretty impressive.
And I think that's one of the reasons why this new model, GPT-4-5, when it comes to humans' preferences, right?
Yeah, you know, 4-5 didn't crush every single LLM benchmark.
It improved on almost all of the benchmarks from 4-0 to 4-5.
But, you know, people were like, oh, OpenAI has hit a wall.
OpenAI is going to go bankrupt.
OpenAI is garbage.
It didn't, you know, break every single benchmark out there, right?
I don't think most companies, we saw the same thing with Claude 3-7 from Anthropic.
Saw it, Sonnet 3-7 from Claude.
It didn't break every single benchmark out there.
It really excelled and just
widened their lead in anything software development, anything on the dev side, right?
But I think now we're going to see companies probably more focused on something like
Elo scores, right?
On the Chatbot Arena, right?
And they're like, yeah, we hope our actual benchmarks, you know, our STEM, our math,
our reasoning, right?
All these kind of, like, quote unquote, more scientific, research-based, category-based benchmarks
improve.
But I think ultimately we're past that.
I think we're past that.
And this is indicative.
The fact that GPT-4-5 did not crush every single benchmark on paper that people said,
oh, these are important.
But at the same time, instantly shot up to the number one model in the world preferred by humans.
That says something, right?
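For context on those Elo scores: the episode does not go into how the arena computes its rankings, so the following is only a sketch of the textbook Elo update for a single head-to-head vote, not the leaderboard's actual method. The function names and the K-factor of 32 are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was, 0.5 for a tie.
    k is an assumed step size; real leaderboards choose their own.
    """
    exp_a = expected_score(rating_a, rating_b)
    # A gains what B loses, so the total rating stays constant.
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; a human voter prefers model A's answer.
    print(elo_update(1000.0, 1000.0, 1.0))
```

The point for this discussion is that the rating only moves when a person picks one answer over another, which is why a model can climb on this kind of ranking without topping the paper benchmarks.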
There is a human side to large language models that I think for the most part, you know,
that we ignored before 2023, right?
Everything was about overfitting models to hit certain benchmarks.
And I think over time we saw, okay, that's great for benchmarks, but it's not benchmarks using
these models.
It's humans.
It's humans trying to solve real problems.
It's humans trying to sell things to other humans, trying to improve customer relationships,
trying to increase accuracy and reliability, which are all things I think GPT-4-5 does a great job of.
So before you listen to that random influencer online that is just spitting out these benchmarks and
is like, oh, OpenAI has hit a wall,
I'd say the exact opposite.
I'd say the exact opposite.
I'd say, if we're being honest, right, a lot of the things that we do on a day-to-day basis
are creating communication for other humans.
And as someone that's, y'all, I've won national writing awards.
I've done okay.
I was a Pulitzer fellow.
Some of those emails are better than I would have written,
if I had to write some of those emails, because it's thinking about the human.
It is trying to be more relatable.
It is really flexing its EQ skills, which I think is ushering in a new era,
not just of how large language models are built, but how they ultimately should and could
be used to strengthen relationships and connections between humans, while also still,
you know, hopefully excelling in all those benchmarks.
But in the end, that's what it's all about.
All right.
I hope this one was helpful, y'all.
If it was, please go to our website.
Go to YourEverydayAI.com.
Sign up for that free daily newsletter.
Also, share this, right?
I know a lot of people tell me, oh, Jordan, I'm not going to tell anyone about this, right?
Everyone at my company thinks I'm a genius, right?
I've gotten so many, so many messages.
I love these. Reach out if, you know, if you have a story like this. I always love hearing it.
It makes the long nights and early mornings really worth it. I love hearing from people that are like,
hey, I just got a job. My first job in AI, thanks to, thanks to, you know, your podcast, thanks to these
guests you bring on, right? And people tell me, like, I'm not telling anyone about this.
This is my cheat code. This is my secret. Share it, please. People are always like, how can I help?
How are you making all this information free? It's because of those of you that actually do share this.
So if you're listening on the podcast, thank you.
I appreciate it.
Please subscribe.
Please leave us a rating.
That would be great on the podcast.
And also go to YourEverydayAI.com.
Sign up for the free daily newsletter.
Read the daily newsletter as well.
Each and every day, we break down exclusive insights that you didn't hear from the podcast.
We're going to take this a step further as well as keeping you up to date with everything
else you need to know in AI.
So thank you for tuning in.
Hope to see you back tomorrow and every day for more Everyday AI.
Thanks, y'all.
And that's a wrap for today's edition of
Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to
our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
