Everyday AI Podcast – An AI and ChatGPT Podcast - EP 301: Anthropic Claude 3.5 Sonnet – How it compares to ChatGPT's GPT-4o
Episode Date: June 25, 2024

Is there finally a (real) ChatGPT killer? Anthropic just released its newest model, Claude 3.5 Sonnet. And out of the gate, this thing is friggin powerful. It's outbenchmarking every other model ...and the new Artifacts feature is a legit game-changer. But, does it actually hold its weight against ChatGPT's new GPT-4o head-to-head? We put it to the test.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on Claude 3.5 Sonnet
Related Episodes: Ep 223: Anthropic Claude 3 – Better Than ChatGPT and Google Gemini?
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
1. Overview of Anthropic Claude 3.5 Sonnet
2. Claude 3.5 Sonnet Artifacts Feature
3. Comparison of Claude 3.5 Sonnet to GPT-4o
4. Real World Examples and Use Cases

Timestamps:
03:00 Reviewing Anthropic's powerful model and its performance.
08:00 Anthropic touts 3.5 Sonnet's powerful new vision model.
10:58 Excited about new artifacts feature, free with limits.
16:31 Anthropic's artifacts revolutionize language model interface.
23:21 Usage limited to one chat window for efficiency.
27:46 Language models struggle with common problems.
36:26 Create solution for non-existent problem, market strategy.
39:52 Computer vision for food identification and organization.
46:44 ChatGPT more accurate than Claude.
48:13 Create game sales spreadsheet, visualize data graph. Upload.
54:39 Write newsletter in host's style from podcast transcript.
01:02:28 OpenAI may overshadow GPT-3 with new features.
01:05:15 ChatGPT excelled, handled large spreadsheet impressively.

Keywords:
Jordan Wilson, Everyday AI, podcast transcript, newsletter copy, AI tools, Anthropic, ChatGPT, Claude, artifacts feature, coding visualization, front end website design, OpenAI's GPT builder, real-time internet access, third-party API, dynamic data, Claude 3.5 Sonnet, GPT-4o, logic questions, 3.5 Opus, GPT Next, handling large spreadsheet, logic quiz, river crossing problem, t-shirts drying question, airplane black box, joke generation, marketing ideas, company brainstorming, Quantum Haven, SleepSync Sphere, Neural Dwell, Mood Morph, computer vision, food items identification, nutritional information, Chicago skyline, chatbots, data analysis, to

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Start Here ▶️
Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
I'm actually going to be using Claude now.
Yes, yes, yes.
The same guy that loves ChatGPT.
I'm actually taking a look at Anthropic's new Claude 3.5 Sonnet.
It's really good.
So yeah, Anthropic just released its new model.
And I can't lie, it is super impressive.
But there's one missing feature that
keeps it from being an actual ChatGPT killer.
And there's one new Claude feature that I think is so good.
I recommend that everyone uses it.
It's going to, I think, change how we interface with large language models.
All right.
We're going to be going over that today and a lot more on Everyday AI.
What's going on, y'all?
My name is Jordan Wilson and I'm the host of Everyday AI.
And we're a daily livestream podcast and free daily newsletter
that are helping everyday people like you and me learn and leverage generative AI.
So today's episode I'm extremely excited about.
I'm so excited.
I'm actually going to be in the comments with you live.
So yeah, this is technically a pre-recorded one.
I'm actually going to be on the road going to chair a panel at the AI Chicago Summit.
So if you're there, hit me up.
Let me know.
But let's just get straight into it.
And quick reminder, yeah, we normally go over the AI news.
I don't want to bring you stale news.
Our newsletter is still going to be fresh to death.
So make sure you go to YourEverydayAI.com.
Sign up for that free daily newsletter.
We'll still be bringing you the AI news for today.
Everything else that's happening around the web that matters to generative AI.
But let's get into it.
Let's get into it right now and talk about the new Claude 3.5 Sonnet.
And we're going to see live,
yes, we're going to do some live head-to-head comparisons that I'm super stoked about.
I haven't even done these.
So I'm going to be doing these live with you all.
So yes, if you are listening on the podcast, make sure to check out the show notes because
this is one of those ones.
I'm going to do my very best to explain this on the podcast with my words.
But it's probably going to be one that you're going to want to watch because we are going
to be doing head-to-head, side by side.
We're going to let these two new models duke it out.
You know, Anthropic's model,
we're going to talk about it here in a second,
but it is now the most powerful model that there is,
according to their benchmarks.
So we got to see how it does in a little head-to-head battle.
All right, let's get straight into it, shall we?
All right.
So I'm going to go ahead and kind of first,
we're going to do a little bit of a walkthrough
and I'm going to tell you what's new in 3.5. Then I'm going to tell you what I like and
what I don't like. Then I'm going to give you kind of my thoughts on this large language model
race that we have going on right now. And then last but not least, we're going to do the head
to head. All right. So with that, let's start at the top. And let's just go ahead and talk a little
bit about what's new inside Claude. So Claude 3.5 Sonnet. Here we go. So I have the announcement page
here up for our live stream audience. But this is pretty straightforward stuff. So we're going to
talk about this here in a second. But, you know, Claude 3 launched with three different variations.
So it was Haiku, which was their least capable model, Sonnet, which was kind of their middle model,
and then Opus, which is their largest or most capable model.
So that was until a couple days ago when they released 3.5 Sonnet.
So right now it is just Claude 3 Haiku, small one,
Claude 3 Opus, powerful one,
but now the middle one is actually the most capable or intelligent one
that's getting the highest benchmark.
So more on that here in a second.
So keep that in mind.
Anthropic actually did this to their middle model.
They came out with this iterative 3.5.
And the other ones, you know, Claude Haiku, the small one, still three.
And then the previous most powerful one, Claude Opus, is still on three.
So keep that in mind, it's actually their middle model that is the most capable.
So, you know, just kind of reading off the press releases right here.
So we haven't done all of these things, you know, to give them kind of a fair shake,
see if this is actually what's happening.
But Anthropic says that 3.5 Sonnet is twice the speed of Claude 3 Opus.
All right.
And here's the important part.
Here's what people are going to be talking about.
And you're going to be seeing this chart that's on my screen,
probably a lot.
If you follow AI News, if you follow large language models, generative AI,
you're going to be seeing this chart a lot.
So essentially, this is a benchmarking chart.
And you see kind of at the top, we have Claude 3.5 Sonnet,
their previously strongest model,
which is Claude 3 Opus.
You have GPT-4o, and then you have Gemini 1.5 Pro, and then you have Llama 400B.
All right.
So a couple of things here.
A couple of things to keep in mind.
These are Anthropic's benchmarks, but usually if a company puts out benchmarks like this,
they are fairly accurate.
I just wanted to put that out there that, you know, these are Anthropic's benchmarks,
these aren't a third party's benchmarks.
But essentially what these benchmarks show is that Claude 3, sorry, Claude 3.5 Sonnet is blowing every other model out of the water,
both Claude's previous most powerful model, Claude 3 Opus, as well as GPT-4o from OpenAI, Gemini 1.5 from Google, and Llama 400B, which is an early snapshot from Meta.
It's important to keep in mind that the production model is actually not out yet.
So this is just what Meta kind of shared previously.
So the Llama 400B is not publicly out.
The other models all are.
So essentially what we are seeing in this chart is Sonnet is much higher than everyone, right?
Probably the one that we talk about the most often, and I think is the most telling, is the MMLU.
And it's actually interesting because that is the only metric where it is tied with GPT-4o.
For Claude 3.5 Sonnet, essentially they have the exact same score.
So this is the only benchmark that Claude 3.5 Sonnet was not, quote, unquote, winning by itself.
So it's in a tie for first, according to Anthropic here, with an 88.7
on this MMLU versus GPT-4o's same score, 88.7.
However, kind of two different testing methodologies.
So Claude 3.5 Sonnet had a five-shot, whereas GPT-4o had a zero-shot, but chain
of thought.
So kind of two different prompting methodologies here.
So that's why I think we have to wait until we have a third party kind of go through
and do this to see when we're comparing apples to apples.
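To make the difference between those two prompting setups concrete, here is a minimal Python sketch. It only builds the prompt text and does not call any model. The function names, the prompt wording, and the five example questions are all assumptions made up for illustration; they are not taken from MMLU or from either company's evaluation setup.

```python
from typing import List, Tuple


def build_five_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Five-shot: show five solved question/answer pairs, then the new question."""
    if len(examples) != 5:
        raise ValueError("a five-shot prompt needs exactly five examples")
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nQuestion: {question}\nAnswer:"


def build_zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot chain of thought: no solved examples, but ask for reasoning first."""
    return f"Question: {question}\nThink step by step, then give the final answer."


# Placeholder items invented for illustration; these are not real MMLU questions.
EXAMPLES = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "7"),
    ("What color do you get by mixing blue and yellow?", "Green"),
    ("What is the opposite of hot?", "Cold"),
]

if __name__ == "__main__":
    question = "What is 10 divided by 2?"
    print(build_five_shot_prompt(EXAMPLES, question))
    print("-----")
    print(build_zero_shot_cot_prompt(question))
```

The same question reaches the model wrapped in very different context, which is why two identical scores produced under these two setups are hard to compare directly.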
But, I mean, aside from that, the new Claude 3.5 Sonnet is pretty far ahead.
All right.
A couple other kind of updates.
So Anthropic is saying that the new vision model inside of Claude 3.5 Sonnet is much more powerful.
And then there's some of these testing scores that use vision, essentially showing 3.5 Sonnet
way far ahead of everyone else, aside from one, again, on MMMU,
which is kind of the multimodal version of that other test, where GPT-4o is still ahead.
But everything else, I mean, it's pretty significant here, you know, even just Sonnet over Opus.
I mean, one that really stood out to me, which is, you know, Claude 3 was very far behind in this,
was visual math reasoning, where the previous version, Claude 3 Opus, was a 50.5, whereas GPT-4o was a 63.8, Gemini was 63.9.
And now, I mean, it just shot up in that one.
Claude 3.5 Sonnet went up to 67.7.
So, you know, we'll leave this in the show notes so you can go check it out.
But I'd say this right here is the biggest feature, which I love.
It is the artifacts.
So we'll probably, I'll probably try and show you an example of that one here live.
So you can check that out.
But essentially, artifacts is, you know, where you can perform a function on the left.
This really works for more coding and visualization.
So you can do something on the left.
You can generate code or something on the left.
and it will render it on the right, which I think is really, really cool.
All right.
So that is essentially, I'm going to go ahead and swap my screen a little bit here.
So we're going to go ahead and do a little bit of this live.
I want to show you at least that one feature.
I think it is very impressive.
All right.
Let's see if we have the right one here.
All right, we do.
Perfect.
There we go. All right. So let's go ahead. Nope, that's not the right one. Sorry, y'all. Give me a second. Hey,
when we do this live, when we do this live, you know, this is unedited, unscripted. Sometimes it takes a hot second.
All right, cool. So let's just go ahead. We'll do something fun. Here's what I'm going to do.
So I'm just going to go to the Everyday AI website here. All right. So I'm going to take a little
screenshot of this website, our homepage. All right. I'm going to click save. I'm going to save
that little screenshot. Now I am jumping back in Claude, and I'm just going to say, I don't like
how Claude forces you to put something in to get their normal layout. Okay, anyways, so this new
artifacts feature I love. So a couple other things to keep in mind, this new version, 3.5
Sonnet, is free, which is amazing. But it is very throttled. But if you have a paid account,
much more generous rates. So you can go into your account. You can go to feature preview.
Okay. And then you just need to toggle this artifacts feature on. Okay. So now that I did that,
I previously did that. So now all I'm going to do is I'm going to upload this screenshot that I just
took of the Everyday AI website of the homepage. All right. So now I'm just saying to Claude.
So I have 3.5 Sonnet here at the bottom. Hopefully y'all can see that. So I'm just
going to say to Claude, I'm going to say, please recreate this as a front end design.
I think Claude should do a pretty good job of doing this. So right away,
we kind of saw a code generator come up and I can click to open
the component here on the right-hand side.
So now I can see on the right-hand side all the code going.
All right, interesting.
Hey, in this example that I tried here live, it didn't work.
So I tried it again.
It didn't work.
What a bummer.
All right, that's okay.
So I'm just going to go ahead and say, generate a front-end website design for a company called
Everyday AI. All right, we'll just do it this way, a little better, just so you can
see this new feature. Hopefully have it working. We'll see if it actually does. Yeah, the first one
didn't work. That's why I do these live. Nothing's scripted, nothing's edited. So let's actually
see if this one works here. I think that would be super helpful. All right, there we go. Perfect.
So at least now you can see kind of what this looks like. So for the podcast audience,
I essentially said, hey, on the front end, generate me some code, right, for a landing page design.
Right.
So I can click over here and you'll see this code.
All right.
So it looks like it wrote some React.
So we have all of our code here.
But the cool thing is, which I love, you can preview this in real time.
So I'm looking, I mean, wow, this is, this is good.
I have what looks like a fully functioning website.
And the thing that I love is you can modify it just with your text, right?
So it's nothing great.
It says everyday AI.
It says AI solutions for everyday challenges.
There's a little button.
It says our solutions.
There's a contact.
There's a header.
There's a footer.
There's a hamburger menu because I'm in probably like mobile mode here.
It's super impressive.
It works, right?
I'm clicking on this mobile menu.
and it works. It renders in real time, which is amazing.
But the thing I love is you can talk to it iteratively.
So I can say, like, looks great.
I'm going to say, let's change the blue color to a darker blue and make the button a turquoise blue.
I should probably spell turquoise correctly.
I don't think it matters.
And then I'm going to say, let's change the logo, right?
So simple things, right?
This might be something if you've ever worked in web design.
You probably do a lot of back and forth in this,
but it takes days or dozens of hours to do this kind of stuff,
sometimes, especially 15, 20 years ago to make these kind of changes
would take a very long time.
So now on the right-hand side, it is changing the code.
I'm watching it do it live, very fast.
All right, so let's see how it did here.
So didn't do the best.
All right.
So I did ask for a main blue hero section, but it's not blue anymore.
It's kind of a whitish gray color.
It did change the logo.
So we got that right.
I asked for turquoise blue as the button.
Didn't get it.
So I'm curious.
I'm going to go ahead and click retry to see if it gets it correct.
I really wanted this one to work.
So brand new feature.
maybe it's a little finicky, maybe it's the model itself.
So again, I'm not sure if it's the model itself that's not correctly generating the code
or if it's something with this new artifacts feature.
I don't want to take too long because we still got to get our head to head, but bam,
there we go.
Hey, the second time, it actually worked well.
So I'm glad I clicked regenerate.
Y'all generative AI is generative.
That's what people don't understand a lot of times, right?
Sometimes, you know, it's essentially next token prediction, but it's like insanely
good next token prediction. So I regenerated and now we have everything that I asked for. I asked for
a dark blue background, it's dark blue. I asked to make the button turquoise, it's turquoise. I asked for a new
logo, new logo. Again, we have a working menu. It's actually a pretty slick looking, very simple
website. Looks good. All right, so now what we're going to do, y'all, let me tell you real quick.
So I told you what's new, right? So with a new model, it's benchmarking off the charts, and this artifacts feature.
I'm going to tell you what I love.
Well, I love the artifacts.
I think it is a great feature.
I think it is going to change how we interface with large language models.
If I'm being honest, though, technically OpenAI and ChatGPT had this first with the GPT builder,
because you could talk to the GPT builder conversationally on the left-hand side.
And then on the right-hand side, it would generate your GPT that you could preview, and you could work with it iteratively.
So it would essentially generate code, rendered on the right-hand side.
So technically, OpenAI had a feature similar to this, but with Anthropic here,
they did a little, some different things. So generating code, visualization, some basic graphics,
which is really cool, generating SVGs, for example. So this, I think, is going to be something that
we're going to see out of all large language models. It is too good to not be included in the future
of work, period, right? Just something, it seems simple, right, but being able to render
websites, visualizations, et cetera, super impressive.
So that's what I like, what I don't like.
And the one reason, the one reason why I will still not be using Anthropic's
Claude 3.5 or 3.5 Sonnet as my main model and why I would recommend you probably shouldn't
either, right?
You should go in, play with it, right?
But it is not connected to the internet, y'all.
I cannot emphasize that enough.
Huge, huge miss, huge miss from Anthropic Claude.
And, you know, I'm curious because two of their, you know, biggest investors, I believe,
are Amazon and Google.
So those are two companies that should be able to help them with that.
I do not understand for the life of me why Anthropic's Claude does not allow you to connect
to real-time data.
For 95% of our use cases, and I would say 95% of everyone's use cases, you need to have real-time
internet access.
If you don't, you cannot be confident about the outputs, right?
The most important thing when you're using a large language model is, is it correct?
Do I feel confident in the output?
Yes, it does have, I believe, a very up-to-date, I think April 2024, knowledge cutoff,
so it's not like we're working with data that's two years old.
But y'all, pretty soon the data is going to get old.
It's going to get stale.
And even think of the last three months.
Think of how much the world has changed, how much your business has changed,
how much your industry has changed in the last three months.
You cannot be using a model like this for business purposes,
for day-to-day business purposes.
You can't.
I wish I could because I actually really like it.
But I cannot and I cannot recommend anyone else,
unless you're using it with a third-party API that taps into an internet connection.
You know, Perplexity as an example, I switched over my Perplexity account from GPT-4o to
Claude 3.5 Sonnet.
That's a no-brainer, right?
But working with Claude out of the box, I still cannot recommend it for the majority of people.
If you're using it for very specific use cases, or if you're manually bringing that data in at all
times, sure, but that's a lot of data to bring in, right? And you have to bring in data in a static
way, right? You can't bring in dynamic data into Claude either. So huge downside. And still,
it's such a big upside with the model and the artifacts, amazing. But the downside, I think,
is too big to recommend that people use this day to day for any business purposes. Again, unless you're
trying to get it to, you know, oh, write some copy or something like that, that you don't necessarily
need, you know, up-to-date information or you're going to be bringing it in yourself.
Hey, I'm sorry, that's just the reality.
All right.
So now let's get ready.
Let's have some fun.
Hopefully this works.
So you'll have to bear with me, y'all.
So let me go ahead and explain to you guys what we're going to do now.
And I hope this works.
All right.
So we are going to be doing a live side-by-side.
battle. All right. So we have Claude 3.5 Sonnet and we have GPT-4o. All right. So I'm going to go ahead and share my window here. Hopefully we can, let's see there. All right, bam. And we're in it. And we're in it, y'all. All right. So let's get this thing going. So we are going to go
one at a time.
One at a time.
And let me just tell you a couple things.
So if you are joining us on the podcast or even if you're joining us live, I have to say
a couple of things.
Number one, this is not a scientific test.
All right.
This is not scientific.
All right.
This is not definitive.
I just wanted to be able to do some real life use cases, okay?
But I am only testing similar capabilities that the models share, right?
So I'm not going to be asking it questions that require active information because Claude
cannot do that.
You know, I can't be asking it about things that have happened, you know, with, oh, the,
you know, Nvidia and Microsoft and Apple, they've been going back and forth.
And, you know, I can't, you know, ask it current event questions, pop culture, because it doesn't
know anything.
It doesn't know anything recent.
So I'm only kind of comparing capabilities that the models share.
Also, I'm zero-shotting all these prompts, right?
Copying and pasting them in, which, if you want actual good outputs, you should never do, right?
You always need to go through prompt engineering basics.
I recommend you do a methodology similar to our Prime Prompt Polish, which is a free course.
We do usually multiple times a week.
So, hey, if you've taken it before, if you're joining us live,
if you're still around, let people know if you like the course or not.
If you want access, just type in PPP.
I'll get around to it, I swear.
All right.
Enough jibber, jabber.
Let's go.
A couple other things.
I'm starting with a simple prompt here just to kind of set the stage for both models.
All right.
So essentially what I'm saying, and we're going to zoom in here and hopefully everyone can see.
All right.
We are going to be working in one chat window the whole time.
Why that's important is, well, it's technically keeping all of this in its context, right?
So I'm not going to ask it really, aside from maybe once, to draw on something
that was previously in the context of this chat.
But again, this is also not how you would normally want to be using a large language model.
You would go in, use it for one specific
purpose only, and then go back to that chat whenever you need it.
Right?
We are not doing that here.
All right.
We are doing this just for the sake of easy copy and paste.
All right.
So with that, let's go ahead and get started.
So what I did to get this kind of prompt started, I said for this chat,
please respond with proper formatting and structured bullet points.
Do not waste words and answer in the shortest way possible while still being detailed enough
to fill the request and answer the questions.
Do not answer in vague or general statements.
Please take your time before each response, go step by step, and do your best to answer
each question to the best of your abilities.
Are you ready to start?
All right.
Essentially, large language models have the tendency to be overly verbose, right?
They're going to jam in like so many words.
We don't need that.
We're just doing comparison.
So I'm telling both models, just give me the answers.
You don't got to take me on a journey here, y'all.
All right, here we go.
Let's go with the first one.
We are going to start with some logic questions.
All right. So hopefully, I mean, well, I guess we'll see if they get it right.
So I'm going to try to always hit enter the same time.
So what I'm saying is I said, I just woke up today with six apples and three bananas.
Yesterday, I ate a banana and two apples.
This morning, I will eat one apple and no bananas.
However, I don't really like apples and one banana may turn brown tomorrow.
Assuming nothing else changes, how many apples and bananas will I have tonight?
So obviously I threw in some irrelevant information to see if I could throw the models off.
The correct answer should be five apples and three bananas.
So let's see.
Let's see how they did.
All right.
So it looks like I threw off both of them.
All right.
Both of them answered wrong.
They said,
"Tonight you will have three apples and two bananas."
So I threw them both for a loop.
The only thing that matters is that I woke up today with six apples
and three bananas. So both of them did the same. I kind of tricked them, right? So I put in some
information about yesterday because, well, it said yesterday and it's subtracting that from today's
total. Well, no, yesterday I actually had four bananas and eight apples, right? I'm saying today I woke up
with this many apples. I'm eating one apple. So it should have been six apples, minus one, five
apples. I'm not eating any bananas. They both got it wrong.
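The arithmetic described above can be sketched in a few lines of Python. The function names are made up for illustration. The second function reproduces the mistake described in the transcript, where yesterday's eating is subtracted from this morning's count even though it had already happened before the count was taken.

```python
def fruit_tonight(woke_up_with, eaten_today):
    """Start from this morning's count and subtract only what is eaten today."""
    apples, bananas = woke_up_with
    apples_eaten, bananas_eaten = eaten_today
    return apples - apples_eaten, bananas - bananas_eaten


def double_counted(woke_up_with, eaten_yesterday, eaten_today):
    """The mistaken version: also subtracts yesterday's eating a second time."""
    apples, bananas = fruit_tonight(woke_up_with, eaten_today)
    return apples - eaten_yesterday[0], bananas - eaten_yesterday[1]


# Tuples are (apples, bananas), using the numbers from the prompt.
woke_up_with = (6, 3)
eaten_yesterday = (2, 1)
eaten_today = (1, 0)

print(fruit_tonight(woke_up_with, eaten_today))                    # (5, 3)
print(double_counted(woke_up_with, eaten_yesterday, eaten_today))  # (3, 2)
```

The second result matches the three apples and two bananas that both models answered, which is consistent with the explanation that they subtracted yesterday's fruit from today's total.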
All right. So generally, we can do some a little bit of prompt engineering,
and it probably would have done a little bit better. But out of the box, again,
we're just comparing 3.5 Sonnet with GPT-4o. I'm going to try as well. Sorry, y'all.
I know, hey, if you're joining this on the podcast, I'm sorry. This is one of those ones.
We don't really edit this. But I do want to keep a
running track of score to see who's winning.
So so far we have, it is zero to zero.
That makes it super easy to keep score here.
All right.
Next question.
Let's go.
We're going to get a little faster going from this.
All right.
So this is,
some of these are common problems that large language models generally get wrong.
So I'm saying here, a man and his dog are standing on one side of the river.
There's a boat with enough room for one human and one animal.
How can the man get across with his dog in the fewest number of trips?
All right.
So both of them got it wrong.
Both of them said the same thing.
Again, this is a very famous problem that large language models always get wrong.
So it says, trip one, the man crosses the river with his dog, trip two, the man returns home alone.
Trip three.
The man crosses the river again with the dog.
Yeah, it should be one trip.
Each of them said three trips.
So there we go.
We're starting off with two wrong.
But again, these are common types of problems
that large language models get wrong.
The reason why I'm doing this still is,
well, I want to see if Claude 3.5 Sonnet,
that just came out,
I haven't tried these, if
it finally figured these out.
And I do have to figure that eventually these models
are going to get these because all of these models are now,
or sorry,
all of these common problems are on the internet.
And they're being scraped.
So I'm actually not sure.
why Claude 3.5 Sonnet didn't get these correct, because the knowledge cutoff is April
2024, right?
You know, GPT-4o, knowledge cutoff October 2023.
Maybe these weren't on the internet, but these are, I mean, you can find these in hundreds
of places on the internet.
So I'm surprised that it got it wrong.
All right.
Next logic question.
We're still doing some logic here.
All right.
This is, if it takes three days, sorry, I'm going to zoom in here.
Sorry, I said, if it takes
three hours to dry 10 t-shirts in the sun, how long will it take to dry 30 t-shirts in the sun?
The correct answer should be three hours.
It's the same amount of time.
It doesn't triple it.
So let's see if it got it.
So both of them, Claude and ChatGPT, were not fooled on this one and both got it
correct.
All right.
Finally, large language models.
You were having us worried for a second.
All right.
Another simple, again, we're going with some
simple logic, kind of some brain teasers here.
So let's go ahead.
We're doing our next one.
Our next one is if you have a single match and you walk into a room with an oil
lamp, a candle, and a fireplace, which do you light first, right?
You light the match first.
So both of them got this correct.
All right, good.
Yeah.
Again, we're going with some trick questions.
You know, some people might say, you know, or you might think or maybe some of the older
models might say, oh, you know, you would do the candle first, right? Or, you know, I don't know.
All right. So got that right at least. Our next question. And hey, live stream audience, I'm going to
give you the question first. I want you to guess this. Let's see if we're all smarter than a large
language model. What color is an airplane's black box? I'm going to take a sip. Everyone joining this
live, please get your, get your guesses in right now.
All right, here we go.
Let's see.
I really hope they get it right.
All right.
So ChatGPT got it right.
An airplane's black box is actually bright orange.
All right.
And Claude got it correct as well.
All right.
So they both got it correct.
Fantastic.
And we have two more
logic questions. People wanted logic questions. So I'm giving you all logic questions.
All right. Our next one is, I'm saying, please give me seven jokes that end in the word blue.
Two should be about animals. Three should be about some other topic in the body of this chat.
And you can make up the other two. All right. So let's see how these worked.
All right, so they both, okay, they both sorted it out.
They said animal joke, animal joke.
Claude got blue, blue, all right.
All right, and let's see, ChatGPT, good.
I personally like this joke better from ChatGPT that said,
what do you call a sad polar bear? A bur blue.
All right.
Okay, so here we go.
I'm seeing here two, okay, they got two of them about something else in the chat.
Oh no, I said three should be about some other topic in the body of this chat.
Okay, so it looks like Claude did, and they all three end in, oh, they don't end in blue.
So Claude already failed.
This one ends in the word burn.
Let's see if, all right.
So OpenAI is actually pulling in some information from somewhere else that it shouldn't be pulling in.
Let's see here.
So yeah, it's pulling in some information I had from a different chat.
So, but it still does say blue.
That one, not really.
All right.
So they both, they both failed.
All right.
So they both failed.
They're both getting, I mean, they kind of got some of it right, but they both failed.
It wasn't all blue.
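For anyone who wants to check this kind of instruction automatically instead of by eye, here is a minimal sketch. The two sample jokes are made up for illustration and are not the models' actual output, and the helper name `ends_with_word` is hypothetical.

```python
import string


def ends_with_word(joke: str, word: str = "blue") -> bool:
    """Return True if the joke's last word, ignoring case and punctuation, is `word`."""
    words = joke.split()
    if not words:
        return False
    last = words[-1].strip(string.punctuation).lower()
    return last == word.lower()


# Hypothetical sample jokes, not taken from either chatbot.
jokes = [
    "Why was the ocean sad? Because it was feeling blue!",
    "Why did the chef quit? Too much burn.",
]

results = [ends_with_word(joke) for joke in jokes]
print(results)
```

A check like this only covers the "ends in the word blue" part of the prompt. Whether a joke is about animals or about an earlier topic in the chat still needs a human read.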
All right.
And then last, and actually, because of ChatGPT's memory and how it works,
it's pulling things in from previous chats, from other chats, which it shouldn't have done.
All right.
So now, a little bit of a brain teaser here.
So I'm saying a box is locked with a three-digit numerical code.
All we know is that all digits are different.
The sum of all digits is nine.
And the digit in the middle is the highest.
What is the code?
All right.
So, Claude says the only possible combination is one plus seven plus one equals nine.
So it says one, seven, two, right?
So, okay, middle number is the highest.
Three can't be the same.
All add up to nine, but that is not right.
So one, seven, two, those digits do not add up to nine.
Those digits add up to 10.
So Claude got that wrong.
All right, let's look at ChatGPT here.
So it's going through step by step.
Again, we instructed the models in the beginning to have this kind of step-by-step problem-solving logic to hopefully give them both the highest likelihood of getting this correct.
All right.
So ChatGPT got it correct.
No, it didn't.
It got it wrong.
So it said 126, middle digit is the highest, which it is not.
So they both got it wrong.
They got it wrong in different ways.
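The three clues as read out (all digits different, digits sum to nine, middle digit highest) can be brute-forced in a few lines. This is a minimal sketch that assumes leading zeros are allowed on the lock; the function name `candidate_codes` is made up for illustration.

```python
from itertools import permutations


def candidate_codes(total: int = 9) -> list[str]:
    """List every three-digit code with distinct digits, the given sum, and the largest digit in the middle."""
    codes = []
    # permutations() never repeats a digit, so "all digits are different" is built in.
    for first, middle, last in permutations(range(10), 3):
        if first + middle + last == total and middle > first and middle > last:
            codes.append(f"{first}{middle}{last}")
    return codes


codes = candidate_codes()
print(len(codes), codes)
```

Under those three clues alone, the search turns up 14 codes, not one, so the puzzle as quoted here looks underdetermined (a clue may have been left out). It does confirm that 172 and 126 are both wrong: 172 sums to 10, and in 126 the middle digit is not the highest.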
All right.
Next one, y'all.
Let me know.
Let me know what you think.
Are you surprised or not surprised with this?
All right, here we go.
So I'm going to say our next little quiz here.
So now we're going into some brainstorming.
So my prompt here is: generate unique and creative marketing and advertising strategies to grow the Everyday AI podcast.
Do not suggest general or run-of-the-mill ideas.
Only pitch clever advertising and marketing tactics to specifically grow the Everyday AI podcast.
All right.
So there's no right or wrong here.
So this is going to be kind of just judging, but more or less, I'm just seeing
if it followed the directions, right?
I said, do not suggest general or run-of-the-mill ideas, only pitch clever advertising
and marketing tactics.
So I'm just looking, did it at least follow directions or did it not?
And did it not hallucinate?
That's what I care about, right?
Okay, so it looks like Claude as an example said, okay, AI generated episode teasers,
virtual AI co-host challenge, AI powered listener Q&A, augmented reality podcast experience,
AI ethics dilemma game.
Okay, so those are fine.
They're not really marketing or advertising necessarily.
These look like features of different shows.
All right, let's see how ChatGPT did.
So it said monthly AI puzzles, AI art contest, custom episode recommendations, personalized
episode.
It did the same with a guest AI co-host, which I actually just did with Hour One last week.
So make sure to go check that out.
So pretty similar.
I would say I'm not super impressed by kind of the
prompt following, because it gave a lot of just very general ideas. So it's
not that I'm failing either of these here, but, you know, nothing really blew me away.
So no one's getting right or wrong.
We're just kind of getting pass.
So we're getting passes there.
All right.
So let's keep it going.
Our next one here in our brainstorming.
All right.
So our next one, I'm saying create a new company.
Yeah, let's see how it does here.
Create a new company and brand for a future smart home device.
This will solve a problem that does not currently exist.
To start, come up with the company's name and its first flagship product.
Give the product a name, branding campaign, go-to-market strategy, tagline, and rationale for why it will work.
Respond in a succinct way, keeping responses to short bullet points, but with ultra-specific facts.
All right.
So one thing, I mean, and this is more of a Claude thing.
It can't really respond with formatting, which stinks.
Personally, I find Claude hard to read for that reason.
I don't know if it's my eyes or what.
I like to have different weights of bold, different text styling.
You don't get that really in Claude.
You get that in ChatGPT.
That's a small thing.
All right.
So let's just go over some of the things quick.
So ChatGPT said company name is Quantum Haven.
And the flagship product is the SleepSync Sphere.
All right. And then Claude has Neural Dwell, and the product is the Mood Morph. So Mood Morph is a wall-mounted
AI-powered emotional environment adjuster. Let's see if it gave us the problem that does not exist.
Okay, I don't really see anything there. All right, let's look at the motto at least. It says live in sync
with your feelings. All right. Didn't really give us what problem it's solving, but it did give us everything
else. It gives us the branding campaign, go-to-market strategy, and the rationale behind some of these things.
All right. So ChatGPT, Quantum Haven, SleepSync Sphere. That is a mouthful. All right. So it harmonizes household sleep cycles by analyzing and optimizing environmental factors. What's interesting here is they kind of created similar products.
Okay. And branding campaign, it says Dream in Harmony. Visuals, let's see, visuals, channels. So we have our
branding campaign, our go-to-market strategy, our tagline, which is sync your sleep,
transform your life, rationale.
Here we at least have the problem solved.
That was kind of the big part of it.
Claude just didn't address that.
The whole point of this was to, you know, create a product that solves a problem that
doesn't exist.
So I'm not going to give it a pass fail, but I will say Claude didn't pass very well.
I'll give them both a pass, but ChatGPT did a little better.
All right, here we go.
This is where my screen sharing might get a little crazy.
So let's hope this works okay.
So now what we're doing is I'm saying,
please tell me what this chart is,
list everything by category,
and give me 13 of the best ways that Everyday AI could use these things.
So here's, let's see if I can bring this up here.
All right, actually, this is running long.
I'm going to skip this one. We're going to go to the next one. All right, because these ones are kind of
similar. All right. So this one I have, I have a picture here. It's just some random food items, right?
But it's kind of hard to see, even with the human eye when you zoom in. So what I'm asking it to do here,
let's just go ahead. I just wanted everyone to be able to see that. So I'm asking it to please identify
what these food items are, save the data in an organized
spreadsheet. So again, something that people don't know about large language models is they can see.
They have computer vision capabilities, which is, you know, obviously a pretty, a pretty important thing for
models to have. So I'm going to go ahead and drop this in here. I'm going to go ahead and drop the food
label in Claude. So the good thing is, in both of them, you can just drag and drop. So I'm saying,
please identify what these food items are, save the data in an organized spreadsheet that I can
download. After that, please create a JSON file with all the structured data. All right,
kind of shooting them off at the same time. So we'll also see which ones kind of go quicker.
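For reference, the "spreadsheet plus JSON file" half of that request is easy to do once the items are identified. A minimal sketch follows. The two rows and their nutrition values are invented placeholders, not what was in the photo, and the field names are assumptions.

```python
import csv
import io
import json

# Hypothetical rows standing in for whatever a vision model extracts from the photo.
items = [
    {"item": "Diced tomatoes", "calories": 25, "serving_size": "1/2 cup"},
    {"item": "Pumpkin seeds", "calories": 160, "serving_size": "1/4 cup"},
]


def to_csv(rows: list[dict]) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict]) -> str:
    """Render the same rows as a JSON document."""
    return json.dumps({"items": rows}, indent=2)


csv_text = to_csv(items)
json_text = to_json(items)
print(csv_text)
print(json_text)
```

The hard part of the test is the vision step, getting the item names and label values right. The file formatting itself is mechanical.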
All right. So both of them going pretty quick. Let's see if one of them finishes before the other.
Okay. So ChatGPT created the spreadsheet first. It looks like, okay, I did not know this,
actually. Again, I haven't used this new
Claude 3.5 Sonnet too much.
So it cannot, it says Claude does not have the ability to run the code it generates yet.
However, it literally does.
We saw that.
But for whatever reason, it did not save a downloadable spreadsheet.
Okay.
But Claude did go ahead and do the JSON.
However, it only was able to recognize three different items.
Okay.
Interesting.
There were 10 items in the photo.
So it only got three of them.
And it did not get a lot of the ingredient.
I guess I didn't ask for ingredients.
So what I said is save the data in an organized spreadsheet.
So I didn't ask it to, I probably should do that.
So I'm going to go ahead and run this one more time.
I didn't want to rerun these.
I'm going to say, please
list all of the ingredients, not the ingredients, the nutritional information.
I'm going to say, please list all of the nutritional information in the chart and the JSON file.
All right.
So let's go ahead and try to run this one more time.
All right.
I didn't want to have to do these twice, but it is what it is, y'all.
All right.
So let's go ahead and try that one more time.
and see if we can get it.
So on the first pass, let's see how ChatGPT did.
Similarly, yeah, they didn't.
I didn't really tell it what to do enough.
So let's go ahead and look.
So same thing here.
It looks like Claude was only able to identify three of the items,
and it did not get much of the nutritional content.
Let's see here.
Okay.
Wow.
So ChatGPT got almost all of them.
However, I'm looking, I do think it made up.
Okay, we got, it looks like we got some hallucinations.
It did get all of the content.
So we got some of them exactly right,
but some of this was not, was not items that were in there.
So like, okay, as an example, the diced tomatoes.
Yeah, those were in there.
And it looks like it got all of the nutritional
information correct.
The pasta unknown was correct.
But I don't think we had green peas.
I think this information is correct.
I think those were pumpkin seeds.
It said granola bars,
Nature Valley.
I don't think we had those either.
Now I'm looking at the photo here.
We didn't.
What we had was an Elevation bar.
So I think actually ChatGPT did a little bit better,
but they actually both failed.
But, you know, Claude did a pretty terrible job, if I'm being honest, right?
You know, Anthropic was really adamant that their, you know, their vision was so improved and it didn't look improved at all.
All right, we're going to do one more vision prompt here, and then we're going to get going.
I know this is a longer episode, y'all.
But, hey, I asked in my newsletter.
I said, what do you guys want to see? And this is what you all wanted to see.
So I'm going to give the people what they want, I guess.
All right. So now I'm saying I have a photo. I'm going to show the photo here to our live stream
audience. Nothing crazy here. Just a simple, simple photo here. So it's just the Chicago skyline,
but we're on 90/94. That's the highway that we're on right there. All right, so we're driving near
Chicago. You should be able to tell that it is Chicago. Chicago has some iconic skyscrapers. As well,
you know, the computer vision should be able to see these signs, right? So it should
know California Avenue, Diversey Avenue, Fullerton Avenue, the exit numbers.
It should be able to see those things and it should know. All right. So let's go ahead and see if it gets it
right. So I'm saying, please identify where this picture is located, what direction the photo is
facing and every other detail that you can make out. All right. So we're giving them here a second.
All right, so let's see how it did.
All right.
So Claude says, picture is in Chicago.
The photo was taken on a highway or expressway leading into downtown Chicago.
The photo, it says, is facing southeast.
All right.
It's talking about the traffic, the road signs, the streetlights, the skyline,
okay, pulls out the Willis Tower, formerly the Sears Tower.
So pretty good, not great.
I would have liked it to identify the highway.
That's pretty important when I say, where is this located?
That's the exact first thing I asked.
All it says is Chicago, Illinois.
It should have been pretty easy to know that this should have been 90/94.
All right.
So let's see how ChatGPT did.
All right.
So it says city, Chicago, Illinois, facing south.
Landmark, same thing, Willis Tower.
Road.
It says likely an expressway or major highway leading into downtown Chicago.
So heavy traffic.
It's talking about things, the signs, environmental details, vegetation.
All right, here we go.
Hey, it got it right.
Good job, ChatGPT.
So it said, it appears to be an expressway, likely I-90/I-94.
So it got it right.
So both of them got it right.
We're seeing a pattern here.
Both of them are getting it right, but ChatGPT is getting it more correct, right?
So that one there, Claude didn't know that that was 90/94.
It should have known.
It's one of the,
you know, I don't know, 10 most popular highways in the United States.
There were plenty of telltale signs that that's what it should have been,
but didn't get it, but didn't get it wrong.
So all right, now let's go ahead and get our next one here.
I know this episode's going a little long, y'all.
Thanks for sticking around.
I hope you're enjoying this.
Like I said, if you are listening to this on the podcast, this is one of those
you might want to go just watch the replay if you want to see these things.
If you don't want to do all this for yourself,
I did all the prep work for you. All right. So now let's go ahead and let's make sure I get the right
thing here. Here we go. All right. So now we're going to be uploading a spreadsheet. We're going to
be doing some data visualization. Okay. So all right, I got to make sure I get this key in here.
This one, this one's a little tricky. All right. Let me explain what we have going on here.
All right. So I'm going to be uploading a spreadsheet. Okay. So, and I have some direction,
some instructions. So I said, this is a huge data set. It's a publicly available data set,
hundreds of thousands of rows of data. And I'm saying, identify the 10 bestselling video games
worldwide from 1971 to 2024. For each game, provide the total sales, the regions where it
was sold the most, and its critics score, create a spreadsheet with that information,
and visualize the data in a graph. All right. And then I give it a key so it knows what the data is.
And then I'm going to go ahead and upload.
I'm going to go ahead and upload the file there.
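To make the request concrete, here is a minimal sketch of the kind of aggregation it calls for: total sales per game, the strongest region, and the critic score. The column names (`title`, `na_sales`, `jp_sales`, `pal_sales`, `other_sales`, `critic_score`) and the tiny sample are assumptions for illustration. The real data set's layout was not read out in the episode.

```python
import csv
import io
from collections import defaultdict

# Tiny hypothetical sample. A game can appear once per platform, so rows are summed per title.
SAMPLE = """title,na_sales,jp_sales,pal_sales,other_sales,critic_score
Game A,5.0,1.0,3.0,0.5,9.1
Game A,2.0,0.5,1.0,0.2,8.9
Game B,1.0,4.0,0.5,0.1,8.0
Game C,0.3,0.1,0.2,0.0,
"""

REGIONS = ["na_sales", "jp_sales", "pal_sales", "other_sales"]


def top_games(csv_text: str, n: int = 10) -> list[dict]:
    """Return the n best-selling titles with total sales, top region, and mean critic score."""
    totals = defaultdict(lambda: {region: 0.0 for region in REGIONS})
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for region in REGIONS:
            totals[row["title"]][region] += float(row[region] or 0)
        if row["critic_score"]:  # some rows have no score
            scores[row["title"]].append(float(row["critic_score"]))

    ranked = []
    for title, by_region in totals.items():
        title_scores = scores[title]
        ranked.append({
            "title": title,
            "total_sales": round(sum(by_region.values()), 2),
            "top_region": max(by_region, key=by_region.get),
            "critic_score": round(sum(title_scores) / len(title_scores), 2) if title_scores else None,
        })
    ranked.sort(key=lambda game: game["total_sales"], reverse=True)
    return ranked[:n]


top = top_games(SAMPLE)
for game in top:
    print(game)
```

Whether sales should be summed across platforms or ranked per release is a judgment call the prompt leaves open, which is one reason two tools can give different "top 10" lists from the same file.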
So I uploaded it to ChatGPT.
I'm going to try to get these going at the same time if I can.
All right.
I'm going to go ahead now and drop that in.
There we go.
All right.
And I'm going to go ahead and click go.
I gave Sonnet, Claude Sonnet, a little bit of a head start.
All right.
Interesting.
So it already says your message
will exceed the length limit for this chat on Claude.
Bummer.
All right.
So I have to start a new conversation in Claude.
That is weak.
That is weak, Claude.
Why?
If you have that long 200K memory,
like why can't I actually use it?
That's interesting, Claude.
All right, regardless, let's go ahead and give it a try.
So now, you know, GPT-4o,
I'm still working in the same chat.
I just had to start a new chat in
Claude. I don't know why. Maybe too many files. I'm not sure. Let's see. So now it says,
all right, bummer. So, all right. Well, I guess Claude cannot even handle this spreadsheet.
All right. Interesting, Claude. Kind of disappointing. Let's see how ChatGPT did. I mean,
it's a huge spreadsheet. But Anthropic,
if one of your things is you're talking about this large context window,
I should be able to upload large files, right?
Even if it is a document, a spreadsheet with tens of thousands of rows of data,
I should be able to do that.
Let's see how ChatGPT did.
I'm going to go ahead and download this file here.
All right.
And let's go ahead and look at the visualization.
All right.
Let's go ahead.
It looks like it's giving me.
All right.
So it looks like we got an error from ChatGPT.
However, it did.
Oh, no, there it is.
Oh.
Okay.
Okay, ChatGPT.
GPT-4o.
So not only did it actually give me the CSV that I downloaded.
I'm looking at this here.
There we go.
There it is.
All right.
But now, also, it gave me a visualization.
Yeah.
So apparently Grand Theft Auto,
Grand Theft Auto is number one.
Grand Theft Auto: Vice City is number two.
Call of Duty, whatever, whatever.
It got it right.
We broke Anthropic.
All right.
So this is the first time I'm saying definitively that Claude got it wrong.
ChatGPT got it right.
The last couple, Claude got it kind of right.
ChatGPT got it kind of better.
But there we go.
Claude failed.
Failure.
Not even close.
All right.
So I'm also going to go back.
All right.
We got a couple more here.
Let's go ahead and finish up.
I think we just got one or two more.
All right.
Well, actually, the next one, we can't even do.
We can't even do inside of Claude.
I'll just do it inside of ChatGPT to really see if I can do it.
So I said, create a line graph showing the total video game sales per year from
1971 to 2024.
This is actually really hard.
And then I said, highlight any significant spikes or drops in sales and provide possible
explanations for these trends based on historical events or industry changes.
This is a lot.
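The number-crunching behind a prompt like that is small. Here is a minimal sketch of totaling sales per year and flagging big year-over-year swings. The records and the 50% threshold are invented for illustration, and plotting is left out to keep the example to the standard library.

```python
from collections import defaultdict

# Hypothetical (release year, sales in millions) records, not the real data set.
records = [(2006, 10.0), (2006, 5.0), (2007, 20.0), (2008, 40.0), (2009, 18.0)]


def sales_per_year(rows):
    """Sum sales by year and return them in chronological order."""
    totals = defaultdict(float)
    for year, sales in rows:
        totals[year] += sales
    return dict(sorted(totals.items()))


def flag_swings(totals, threshold=0.5):
    """Return (year, fractional change) for years whose total moved by at least `threshold` versus the prior listed year."""
    years = list(totals)
    flagged = []
    for previous, current in zip(years, years[1:]):
        if totals[previous] == 0:
            continue  # avoid dividing by zero
        change = (totals[current] - totals[previous]) / totals[previous]
        if abs(change) >= threshold:
            flagged.append((current, round(change, 2)))
    return flagged


totals = sales_per_year(records)
flagged = flag_swings(totals)
print(totals)
print(flagged)
```

The "possible explanations" part of the prompt is the piece that actually needs a language model. Finding the spikes and drops is plain arithmetic.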
So now we see here we have the Python code that the Advanced Data Analysis version
two is running inside of GPT-4o, super powerful data analysis model.
So it's going through it super quick.
My gosh, that's good.
My gosh, we just went through literally tens of thousands.
It might even be more than 100,000.
rows of data. I couldn't even upload it into Claude, even though it has this super big
context window, couldn't upload it. Look at this graph, y'all. This is, oh my gosh, this graph
is interactive. Look at this. I am hovering over this and it's showing me year by year trends.
Y'all, my gosh, I am blown away by this. This is a freaking huge spreadsheet.
Huge, right?
I just,
I just want to show some of y'all.
Let's just go ahead.
Let's just go ahead and show this, y'all,
because this is,
look at this.
Look at all this data.
Look at this.
This is so much.
Let's see how many rows.
How many rows of data is this?
And ChatGPT
just ate it for breakfast.
My gosh.
Let's see.
64,000 rows of data,
columns A through M.
So, again,
Yeah, that is hundreds of thousands of cells.
ChatGPT just ate it for breakfast.
Claude couldn't even take it.
Claude couldn't take it.
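A quick back-of-the-envelope check on those numbers, as a sketch only. The row count, column range, and 200K figure come from the episode. The one-token-per-cell figure is a deliberately low guess, and the idea that the upload limit is tied to the context window is an assumption, not something either vendor's error message confirmed here.

```python
ROWS = 64_000                     # row count read out in the episode
COLUMNS = 13                      # columns A through M
CONTEXT_WINDOW_TOKENS = 200_000   # the "200K" context window mentioned in the episode
TOKENS_PER_CELL = 1               # assumption: a deliberately low estimate

cells = ROWS * COLUMNS
estimated_tokens = cells * TOKENS_PER_CELL
fits_in_context = estimated_tokens <= CONTEXT_WINDOW_TOKENS

print(f"{cells:,} cells, at least {estimated_tokens:,} tokens, fits: {fits_in_context}")
```

Even at one token per cell, 832,000 cells would overshoot a 200,000-token window several times over if the whole file had to be read as text. A tool that runs Python code over the file, as the Advanced Data Analysis feature appears to do here, never needs to hold every cell in the model's context, which may explain the difference.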
All right, anyways, let's keep going.
All right, next one.
This is our last one, y'all.
All right, so this one I think is probably where Claude is going to shine, but let's see.
I actually have two files that I need to upload.
So bear with me for a second.
All right.
And then I'll explain to you exactly what we're doing.
Let's go ahead and upload these files.
And then I'm going to get it going at the same time.
And hopefully we can watch these go side by side.
All right.
Also, let me put this out there.
I'm a human.
I write my newsletter.
But I just wanted to do this as an example.
So, yeah, all the other AI newsletters, they all brag about how AI writes my newsletter.
I spend no time on it.
Guess what?
I spend a stupid amount of time writing the newsletter.
I'm a human.
I write the newsletter, former journalist.
I don't do this, but I just wanted to do this as an example.
So I'm saying for this chat, you will turn a podcast transcript of me, Jordan, the host of
Everyday AI, talking about the AI News That Matters show yesterday, and turn it into choppy
and engaging newsletter copy.
I've attached examples of previous newsletters and how they should be written as well as my most
recent podcast transcript.
Please write a newsletter for the attached transcript,
mimicking the style as closely as possible to the examples given.
The priority is to write the newsletter in the exact same format, tone of voice,
and style as the examples,
but for this episode with the attached transcript.
Please complete this task.
All right, so we're going to do it side by side here.
All right, ready, and we'll see who is faster and who is better.
All right.
So essentially, again, giving examples of the newsletter.
I say these are how the newsletter usually is.
Then I say, here's a transcript.
Write it for this transcript.
All right.
So we'll see.
I'm going to see who actually finishes first.
So they're both going pretty quickly.
Let's see who finishes first.
All right.
They're both pounding out content.
They literally both finished at the exact same time.
How is that even possible?
All right.
Let's take a look at which one
did better. All right. So let's see here. On the left, yeah, I can, yeah, Claude,
Claude cleaned up here. Claude cleaned up. All right. So it says, is ChatGPT in trouble?
Question mark. Anthropic's Claude just dropped 3.5 Sonnet with Artifacts. Speaking of dropping,
Nvidia fell. Yeah. So it's, it's actually, it's kind of taking my transcript almost verbatim,
because I actually said those nearly same words. Uh, but that's
fine. That's fine. Let's actually look at the body. Right. So in our Monday newsletters,
we essentially break down our main news stories. And the first part is a breakdown. And then the
second part is like what it means. So let's see how it did. So let's just kind of compare one by one.
This is meta, isn't it? So, okay, Anthropic got this part wrong. So the first story was
actually Runway Gen-3 Alpha, not Anthropic. So I'm wondering
if Anthropic hallucinated here, let's see here.
I'm looking at the different ones.
Okay, no, it didn't.
Okay, so Anthropic just put it in a different order, which is okay.
But again, it's not what I wanted.
I mean, the OpenAI, or sorry, the GPT-4o, the content tone is garbage, right?
So it starts with, hey, there, AI enthusiast.
That's what you always get out of ChatGPT by default.
You know, one thing I think is clear in terms of, you know, writing engaging copy:
out of the box, Anthropic Claude is better.
ChatGPT is great.
You just have to work with it.
So, you know, overall, let's see kind of the, we'll just do the comparison here of Anthropic.
So, okay.
I mean, neither of them, if I'm being honest.
So they didn't really follow the correct
format. I'm looking at, yeah.
So what Anthropic did is it just bullet pointed everything, which is not how I normally do it.
And then ChatGPT kind of just created different subpoints for each category, which is also
something I don't do.
But let's look at the what it means and see if we can, you know, find any difference.
So let's see what it means.
Okay.
So Anthropic says it's a slugfest between Anthropic
and OpenAI, and poor Google Gemini can't seem to catch a break.
Claude 3.5 Sonnet's Artifacts feature could be a game changer, potentially altering how we use
large language models.
So again, the copy is pretty good from Anthropic.
It's much better than ChatGPT, but if I'm being honest, it kind of just took my actual
words, which is not what I told it to do.
I told it to reflect the tone and mimic the tone in the newsletter exactly, which it really
didn't do that well.
However, the overall copy is much better, but it didn't
follow the instruction.
So let's read the what it means for ChatGPT here for this newsletter piece.
It says the introduction of Artifacts in Claude 3.5 Sonnet enhances user experiences by enabling
tasks such as generating text documents, code.
Okay.
So neither of them did a great job.
I will say they both passed because they both correctly.
Here's the thing, y'all.
I uploaded two separate PDFs.
One of them was more than 20 pages.
The other one was eight pages.
I'm going through here.
And it got everything
correct. So first and foremost, that's super important. So there's no hallucinations.
They didn't get everything in the correct order, didn't get the format right, but they did,
they both did a serviceable job. Essentially, I mean, Anthropic Claude just really took the words
out of my mouth, which is what I told it really not to do. I said, no, write it, like use that
information from the transcript, but write it like the newsletter examples. So neither of them did a
great job. I will say, you know, if I had to choose here, I will say I'll give this slight nod
to Claude. They both passed, but I think Claude did a little bit better. Wow. So that was a ton,
y'all. That was a giant, super long, super in-depth episode. Let me give you just a quick recap.
All right.
So like I said, what's new in Claude 3.5
Sonnet? Well, the model's really fantastic.
It is fast.
It is powerful.
And at least according to Anthropic's benchmarks, it is much more capable than any other model.
Some things I like and don't like: love the Artifacts feature when it worked.
I had to do it twice in the beginning there, but you saw it actually created a very nice website and it generated it in real time.
It rendered it.
Love to see it.
What I don't like, it's not connected to the internet.
And apparently you can't upload long documents.
So a couple of things I didn't like.
You can't upload long documents.
And also the chat history broke, right?
It said, hey, this is too much information for this chat.
You got to start a new chat.
What happened to that long context window?
Anthropic, is that gone now?
Was it just because I was uploading a super long file?
Regardless, don't like it.
Also, the fact that it's not connected to the internet,
I cannot recommend Claude to any business
person. I cannot recommend that you use Claude until they offer that. Period. I already gave you
my rundown and why I think that's important. All right. So a couple thoughts on the large language
model race that's going on. I didn't really get into that too much. Let me just give you the super
hot take right now. It's just going to be a lot of back and forth, if I'm being honest, right?
So, you know, first you had Claude 3 and then there's GPT-4 Turbo. And now we have Claude 3, or sorry,
and then you had GPT-4o.
So sorry, it was GPT-4 Turbo first and then Claude 3.
And then GPT-4o, and now you have Claude 3.5 Sonnet.
So it's literally been back and forth, back and forth with who has the most powerful and capable model according to the benchmarks.
Here's the thing.
I don't think with this, if I'm being honest, I don't think if I was OpenAI, I'm not rushing to release my next model.
Here's why.
And I think that Anthropic actually got this wrong with the timing,
because all anyone, all OpenAI has to do is release those other features.
And all of a sudden, people are going to forget about this new 3.5 Sonnet.
So OpenAI announced their GPT-4o model, but with a lot of other features that are not out yet,
kind of this more natural and more relatable voice, this, you know, they call it her,
but also kind of this live Omni is what we're calling it.
So the ability for the model to see in real time and interact in real time.
You know what?
OpenAI could release this today or tomorrow.
And I think people would stop caring as much about this new 3.5 Sonnet because all eyes are going to be on that.
Here's the other thing.
They did this with their middle model, Anthropic did.
So I do assume that they know that OpenAI is going to clap back.
And when they do, I'm guessing
they are going to release this new 3.5 Opus,
which presumably would be at or above OpenAI's next model.
Presumably, I don't know if they will.
I think once OpenAI releases,
whether it's 4.5 or 5 or they might call it GPT Next,
I'm not sure what it's going to be called.
Whenever they do, if I'm being honest,
I think it's going to take everyone a couple of years to catch up,
like if we're being honest.
However, it is just going to be this back and forth.
and I just don't like the timing here from Anthropic, right?
Apple just, you know, had their Apple Intelligence announcement like 10 days ago,
partnering up with OpenAI.
You just have the Copilot+ PCs that are shipping right now from Microsoft.
Like, come on, Anthropic.
Timing is everything.
And the timing on this one, the timing was off, right?
You have a very great model, very capable, very fast, some very
impressive things on it.
And they released it on like a,
what was it, a Thursday,
like a Thursday afternoon, right?
Right in the middle of all these other things that are going on.
So not crazy about the timing.
And I do think that OpenAI at any point can make this kind of irrelevant, right?
The benchmarks are great.
I think the Artifacts thing is going to change how people are using
large language models.
And then when we look at the head to head,
I'm looking at my score here.
Technically, ChatGPT was slightly ahead.
GPT-4o, right?
They all kind of got the same amount of things right, except Claude failed.
It couldn't accept a spreadsheet that had 600,000 cells, right?
So maybe it's not built for that, but yo, like ChatGPT munched it.
It crushed it.
It literally gobbled up 600,000
cells and created an interactive spreadsheet as fast as I could read it.
That is extremely impressive.
So on the head to head, it was kind of close, but there were certain things that, you know,
they kind of both got right.
And ChatGPT just did a little bit better.
I gave you some of those examples.
And then obviously, Claude just couldn't handle the size of that spreadsheet.
Even with this long 200,000 token context window, not sure why, but it couldn't.
This was a long one, y'all.
I hope this was helpful.
If so, please shout us a, give us a rating.
Give us a rating on the podcast.
If you were listening on the podcast, we'd appreciate if you could rate or subscribe.
If you're still online, are you still online?
This is an hour.
This is a marathon episode.
Thank you for listening.
Please, if this is helpful, repost this, tag a friend, something.
Let me know.
Drop me, drop me a comment.
But please join me tomorrow and every day for more everyday AI.
Thanks, y'all.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
