Everyday AI Podcast – An AI and ChatGPT Podcast - EP 223: Anthropic Claude 3 - Better Than ChatGPT and Google Gemini?
Episode Date: March 7, 2024

Awesome Stuff From Our Partner, NVIDIA - Register for the FREE virtual NVIDIA GTC Conference or buy tickets to the in-person event and fill out this form here: https://www.youreverydayai.com/nvidia-giv...eaway/

Anthropic just released its new AI model Claude 3 and it's setting impressive standards. Is it better than other models like ChatGPT and Google Gemini? We're breaking down Claude 3 and showing you just how good it actually is.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode page
Join the discussion: Ask Jordan questions on Claude 3

Related Episodes:
Ep 217: 7 Steps on How To ACTUALLY Use ChatGPT in 2024
Ep 204: Google Gemini Advanced – 7 things you need to know

Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Timestamps:
04:49 Claude 3 beats competitors in technical test.
07:07 Comparison of language models and market competition.
10:58 Math discussion on language model capabilities.
15:51 Opus demo allows interactive chart manipulation.
20:22 Claude, recent knowledge cut-off, warn against hallucinations.
25:13 Discussion about ocean poem and its style.
26:33 Math Riddles
32:13 Image recognition and text extraction capabilities demonstrated.
35:39 Standard business advice, focus on loyal audience.
38:46 Free NVIDIA conference in San Jose, virtual option.
44:07 Claude 3 Opus API improves interactivity and coding.
45:55 Claude 3 vs Chat GPT and Google Gemini

Topics Covered in This Episode:
1. About Anthropic Claude 3
2. AI Model Challenges and Tests
3. Discussion on Claude Opus API
4. Comparison of ChatGPT, Claude 3, and Google Gemini

Keywords:
San Francisco, California, Gen AI startups, funding, investors, valuation, revenue projection, Claude 3, Opus, SONNET, Haiku, task automation, data processing, customer interactions, computer vision, natural language models, NVIDIA conference, marketing plan, Jordan Wilson, Chat GPT, Google Gemini, Anthropic, AI challenges, AI poetry challenge, logic problems, AI newsletters, compet

Start Here ▶️ Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.
Transcript
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
Anthropic just released its new model, Claude 3,
and the internet is going crazy.
All right.
Well, actually, the internet's not going crazy.
People aren't talking about Claude 3 a lot.
But should they be?
I mean, is Claude 3, the new model from Anthropic, now better than ChatGPT
and Google Gemini?
Or is it just another
large language model that may or may not get used? Well, we're going to be answering those questions
today and more on Everyday AI. Thanks for tuning in. My name is Jordan Wilson and I'm your host.
And Everyday AI, it's for you. It's to help you keep up with what's going on in the world of generative
AI to grow your company and to grow your career because there's so much going on literally
every single day. So you can waste so many hours or you can just tune in to everyday AI,
every day and learn from us. So today's show is technically
pre-recorded, and we're debuting it live.
You know, it actually works out.
You know, I'm doing a little bit of traveling, and it's actually great for some other
guests that I've been able to interview recently.
So in the coming days, we're going to have a couple of pre-recorded shows.
I'm going to be joining live.
Most days, don't worry.
So still get your questions in.
And I know a lot of the guests are going to be answering as well.
All right.
With that, if you don't know, YourEverydayAI.com is the place to be.
It is the spot to learn whatever you
need to learn about generative AI. I tell people it is a free generative AI university.
Type in whatever you need. Go search by category. We probably have a handful of dozens of experts
that we've already talked to who have already figured out that thing that you're trying to figure
out. So make sure to go to YourEverydayAI.com. And if you do need that daily dose of AI news,
it's going to be in the newsletter as well. So make sure you go to the website, check all that out.
All right. So let's get to the top
of the question, the top of the show here: is Claude 3 better than ChatGPT and
Gemini, right? And we're going to get into a little bit of background and some quick history and
facts. If you don't, if you aren't very aware of, you know, Claude and Anthropic,
don't worry, we're going to be giving you the bio. But here's here's what you need to know.
Not a lot of people are talking about it right now, right? I assumed when, uh,
Claude 3 was released, right?
We've known for a while it was coming, right?
Everyone's always, there's always rumors about the next model, you know, GPT5 and, you know,
Gemini, you know, before it came out.
There's always rumors swirling around.
And, you know, these are the three big players.
You have Google with Gemini, you have OpenAI with ChatGPT, and you have Anthropic with Claude.
And I think that, you know, Mistral is, you know, kind of in the top four as well, right?
But top three, I mean, there's no denying.
It is those three.
So I was actually kind of surprised, you know, it's only been out for about a day and change.
And there's not a ton of people talking about it.
So I said, all right, I'm going to dive in deep.
So I've spent a couple hours with Claude so far, taking it through some basic tests.
And we're going to be doing some tests here live on this show as well.
So if you are listening to the podcast, as always, I recommend you check out your show notes.
There's always some goodies in there.
So make sure you check that out.
All right.
So with that, let's just start
at the top. All right. So here's what's happening. According to Anthropic's benchmarks,
it is the best model out there, bar none, right? Not even close, according to Anthropic's benchmarks.
So we always got to keep this in mind. I said the same thing when, you know, Google Gemini in
December came out with their benchmarks. I always say these are not third party, right? These are
internal benchmarks. But at least one thing that we have here is consistency, which is nice. We
didn't get that very much or we didn't get that as much from Google.
All right.
So if you are on the podcast, don't worry.
I'll do my best to explain this to you.
So we talk about this one benchmark on the show all the time.
So it's MMLU, massive multitask language understanding.
All right.
So essentially it's 57 different subjects, you know, across, think of it like an SAT and
an ACT, but for large language models, right?
And why this has kind of become the gold standard for large language models in their benchmarks is because this is how you can kind of most accurately tie a large language model's performance to a human, right?
That's why it's across all these different subjects.
It's a very technically difficult test.
And according to Anthropic's benchmarks that they released on their website at the same time they released Claude 3, it crushes everyone.
Well, it doesn't crush everyone,
but it wins every single category.
So specifically for MMLU, it's just barely, barely above GPT-4.
So GPT-4 scored an 86.4 and Claude 3 scored an 86.8.
So just marginally better.
You know, Google was around 83 for their Ultra and 71 for Gemini Pro.
So it, you know, that's the other thing that struck me is, you know, I was technically
expecting a little bit more, which I know, like, y'all might be like,
Jordan, you're weird.
Why would you expect more?
They're winning literally in all of these benchmarks.
Well, because the GPT-4 model is like almost two years old, right?
It's been being developed for two years.
It's been out for a year and a half now, I think.
So that's why, right?
It's also, I mean, you have to, you know, tip your cap to OpenAI,
the fact that they released GPT-4 so long ago.
And it is now just finally being passed.
You know, this much time later, it's pretty impressive.
But that also, to me, tells you something, right?
And people are like, oh, Jordan, like what large language model should I be using?
Well, I always say ChatGPT.
I'll answer at the end if Claude 3 changes my mind.
But, you know, presumably whenever GPT-5 or GPT-4.5 comes out, it is going to be so far ahead
of all of these other models, both in terms of real-world performance and benchmarks.
You know, we talk about large language models and treating them as business operating systems, right?
It's going to be so far ahead.
But right now, according to Anthropic's own internal benchmarks, Claude is ahead.
And it is important here.
And we're going to talk about this.
There's three different Claude models.
So the most powerful model is Opus.
So similarly how, you know, Google has their base model, which is Gemini Pro.
And then they have their Gemini Ultra and they've actually upgraded it to 1.5, but not everyone has access to that.
That's another story.
Same thing with ChatGPT.
They have their free model in 3.5,
and then their paid or premium model in 4.
Then with Claude, you have your basic or your baseline Haiku model.
Then you have your mid-tier Sonnet.
And then you have your premium Opus.
All right.
So when we're talking about Anthropic out-benchmarking everyone,
it is with their highest tier model, which is Opus.
All right.
So now that we got that out of the way,
We know it's these three companies, large language models, everyone is head down sprinting,
trying to, you know, gain the biggest market share, trying to raise the most money.
But you might not know Anthropic, right?
Maybe you do, maybe you don't.
They're a startup, but they are well funded and they are heavy hitters, right?
So a lot of people don't know or don't realize, yeah, OpenAI is a startup too, but they've been around since 2015, right?
They were some of the pioneers in the GPT space, right?
So here is Anthropic, high level, what you need to know.
This is like if Anthropic had a basketball card.
These are the stats on the back, right?
All right.
So it was founded in 2021, and the founding team is former OpenAI employees.
All right.
Yeah, we're going to see a lot of that in startups in the future, FYI.
It's based in San Francisco, California, currently raised, all right, $7.3
billion. I think that has to put them in top, top three to five of Gen AI startups with
the most money raised. Investors, their investors is a who's who list of who's who in tech.
I don't know any other company. And someone, if you're in the comments, let me know.
I don't know any other company that has this level of heavy hitters. All right. So Google,
Salesforce, Zoom, Amazon,
Microsoft. Yeah, those are big partners. Yeah, they have dozens of others, you know, a lot of the big capital firms, you know, private equity, venture capital, et cetera. But look at that. That is who's who, literally, of investors, right? Current valuation is more than $18 billion. All right. Their revenue, so it is projected for 2024. They're going to be at about $850 million annualized. All right. So that's kind of by the end. That's what they're hoping
or projecting to hit.
And then there are different releases.
All right.
This isn't the first release,
but it is the third release in a year.
So the first version of Claude came out in March,
2023.
You had Claude 2 and 2.1, you know,
Claude 2 came out in July, 2023.
And then here we are, present day,
Claude 3 in the three different varieties,
March 2024.
All right.
Let's keep going over some of the high level.
All right.
So these are some changes.
These are some things that have changed a little bit
from the last model, which I believe was 2.1, to now three.
All right, I already talked about this, but you have three models.
Opus, Sonnet, and Haiku.
All right.
Overall, you have improved performance, speed, and efficiency.
According to Anthropic, there is a two-fold improvement in accuracy.
So fewer hallucinations, they also mentioned.
More willingness for Claude to answer questions.
You know, that's something they pointed out: in previous versions,
Claude would all the time just say,
I can't really do this.
Sorry.
So more willing to answer more questions,
a two-fold improvement in accuracy, and at launch
there is now a 200K context window.
All right.
So that's 200,000 tokens.
All right.
So essentially,
without getting too far into it,
we've had plenty of episodes
and plenty of talks about tokens
and even the tokenization process,
why it's important, et cetera.
but essentially that's like a memory, right?
So it's got a huge, huge memory.
Almost, let's see, I should have done this math ahead of time.
But I think that's, so GPT-4, if you're using ChatGPT, is 32K.
So at 200K, yeah, that's what, six times, six and change.
So more than 6x the context window of ChatGPT right now, at launch.
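The "six and change" estimate can be checked with one division. This is a minimal sketch that uses only the two context sizes quoted on the show; the constant and function names are made up for illustration.

```python
# Context window sizes as quoted in the episode, in tokens.
CLAUDE_3_CONTEXT = 200_000      # the "200K context window" at launch
CHATGPT_GPT4_CONTEXT = 32_000   # the "32K" figure quoted for GPT-4 in ChatGPT


def context_ratio(larger: int, smaller: int) -> float:
    """Return how many times bigger one context window is than another."""
    return larger / smaller


ratio = context_ratio(CLAUDE_3_CONTEXT, CHATGPT_GPT4_CONTEXT)
print(f"{ratio:.2f}x")  # 6.25x
```

So the on-air guess of "more than 6x" holds for these two numbers.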
And they talked about the capability to accept
inputs up to a million tokens, right?
Memory is so important when it comes to working with a large language model.
You see the same thing with Gemini Ultra 1.5, I believe, which is at a million tokens as well,
but most people don't have access to this.
It did seem, at least from Anthropic's kind of release, that the capability to have that
one million tokens, it's more of a capability, and it's more on an as-needed basis.
You will be paying for it because if you're using the API, these have a cost, and we're going to get
into that here in a second as well. So here is our cost. It's kind of like a cost and power grid.
All right. So if you're joining us on the podcast, I think I can explain this graph. But essentially,
you have a cost on one axis and intelligence on the other, right? And obviously, the more these models
cost, the more intelligent they are. So Haiku is kind of the lowest intelligence, lowest cost,
but it's also the fastest, right? Sonnet is somewhere right there in the middle. And I'm going to tell you
what all these different models are, you know, being positioned for from a marketing perspective.
So Sonnet is right there in the middle. And then Opus is, you know, off the charts, intelligence,
and crazy expensive, right? But I want to talk about that later because there's something,
I think, pretty special there. All right. So Opus, well, I'm going to talk about it now because
actually it's one of these bullet points. So we're going to talk about costs. So the cost,
there's going to be two costs for each of these models.
So one is the input and one is the output, and this is the price per million tokens.
All right.
So essentially, I'm going to flip through these and then flip back so we can do all the
prices at once.
So this is if you're using the API.
All right.
So Opus is $15 per million tokens
input, $75 per million output.
All right.
And then Sonnet is $3 per million
input, $15 per million output. And then you have Haiku, 25 cents per million input, one dollar
25 cents per million output. So quite a range there, right? You go from a quarter to $15 for the,
for the input price per million, the difference between Haiku, the cheapest or most affordable, and
Opus, right? So wild. All right. So now let's talk about the capabilities. I've done a little bit of
testing between Opus and Sonnet.
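To make the per-million-token API prices quoted a moment ago concrete, here is a rough cost estimator. It is only a sketch: the prices are the ones read out on the show (with Sonnet's input taken as $3 per million tokens, Anthropic's published launch price), while the function name and the example workload of 100,000 input tokens and 10,000 output tokens are hypothetical.

```python
# USD per one million tokens, as quoted in the episode for the Claude 3 API.
PRICES_PER_MILLION = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.25, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    prices = PRICES_PER_MILLION[model]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical workload: 100,000 input tokens and 10,000 output tokens.
for model in PRICES_PER_MILLION:
    print(f"{model}: ${estimate_cost(model, 100_000, 10_000):.4f}")
```

On these numbers the same workload costs about $2.25 on Opus, $0.45 on Sonnet, and under four cents on Haiku, and the Opus input price is 60 times the Haiku input price.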
So Opus.
So this is kind of what it's being billed or it's being advertised, that it can be
used for these things.
So task automation.
So to plan and execute complex actions across APIs and databases, interactive coding.
Research and development.
So research review, brainstorming and hypothesis generation, drug discovery.
Strategy.
So advanced analysis of charts and graphs, financial and market trends.
Okay.
So that is kind of opus and how it's being positioned.
All right.
So now moving to Sonnet, we already gave you the price.
So let's use kind of what they're saying that, you know, people should be using this for.
So Sonnet, data processing.
So that's for RAG or search retrieval over vast amounts of knowledge. Sales.
So that's product recommendations, forecasting, targeted marketing.
And then time-saving tasks: code generation, quality control, parsing text from images, etc.
All right.
And then we have customer interactions for our last one here, Haiku, right?
So this is more of your, I'll say, your low-hanging fruit, not saying this stuff's not important,
but you have your customer interactions, quick and accurate support in live interactions,
translations as well, content moderation, catching risky behavior or customer requests,
and cost-saving tasks.
So optimizing logistics, inventory management, extract knowledge from unstructured data.
All right, so essentially you have different use cases, all right?
And I think what I'm going to be keeping my eye on is how people are using the API for Opus, all right?
Because I saw a couple demos.
I'll make sure to link to it in the description here.
But the demos were pretty impressive of Opus, right?
It was taking a chart, right?
So it was near like working as an agent.
So it's taking a chart from Google, from a live Google search.
And it's grabbing, I believe via Python.
I'm not sure.
You know, they weren't really letting you look under the hood necessarily.
But it's grabbing in an interactive fashion.
So you know how you can like hover over, you know, a Google chart.
Maybe if you go for a stock price.
So in the demo of Opus, they are speaking with Opus.
And this was presumably via an API connection.
And it is not just able to
read, you know, the web, in this case a live, you know, chart on Google. But as you, you know,
as you hover over, you know, this chart, there's, there's different price points. And it's able to
grab all of those as well. So some very impressive future use cases that right now, even ChatGPT
and their GPT-4 model don't have. So that's, that's impressive. But again, presumably that was
looking at an API use case and not something that you would use in Anthropic out of the box.
All right.
So now that we have an overview of the models, let's put it to the test.
Let's put it to the test.
All right.
I'm going to do my best to describe what we have going on here.
But we're going to look at outputs in Claude and we're going to look at outputs in ChatGPT.
So I'm going to be using the
highest level model for each or the highest level model that's available via their default chat.
All right.
So here's what this is going to look like.
All right.
I'm going to do the same prompts in each different model.
Okay.
And I'm going to read the response.
All right.
And this is not a scientific test.
This is not an MMLU benchmark.
This is infotainment, right?
We're going to get some information and maybe it'll be a little entertaining.
Maybe it won't.
maybe it'll just be super dorky.
All right, but I wanted to give everyone an emphasis because, or I wanted to give everyone
a chance and to see what we're emphasizing when we're testing certain things that we're looking
for.
All right.
I'm looking at it from an average business use case, right?
So day to day, something that's helping you in your work and your school, your research,
data analysis, et cetera.
We're going to look at very simple things, though, so this doesn't accidentally turn into a one-hour
show.
All right, let's get it started.
And again, we are using the Opus
model, all right, in Claude 3, which is the most intelligent model.
And also, I did do a just a model comparison.
If this is up your alley, I'll make sure to link that video in the description as well.
I did a 30 minute, more in-depth rundown.
We did some data, some data analysis, et cetera.
So let's go ahead.
So I'm typing in right now, and I'm going to zoom in so hopefully people can see.
So I'm saying to Claude: What model are you specifically using?
Give me a brief overview of how it works, include knowledge cutoff.
Please keep it short yet specific.
This is actually improved a bit since yesterday, but still not that good.
All right, so it says, I am Claude, an AI assistant created by Anthropic using their
latest natural language model.
The specific details of the model are not public.
Like, oh my gosh, Anthropic and Google, you got to stop this, right?
The number one thing, the number one thing, holding so many people back from using large
language models, right?
And I love large language models.
I use them daily.
I get so much benefit out of them.
And I try to almost sometimes convince people.
I'm like, yo, like, why aren't you doing this?
This is like the best, better than sliced bread with no calories dripped in butter, right?
It's better than that.
But so many people don't trust large language models because they hallucinate and they
don't understand how it works. So if I ask a model, what are you? Right? I am selecting a model.
It should know that I selected Opus. It says it right there at the bottom of the screen, right?
It's not like you can switch models in like mid prompt. So it should know it is the Opus model.
It is not telling me. I don't like that. All right. Also, it does say August 2023. So a little
upside there. All right, and we'll see that here in a second. All right. So now we're going in.
we're running the exact same prompt inside of ChatGPT using GPT-4, the default model.
So at least here's a little better, right?
So at least it says GPT-4.
It's not saying I'm ChatGPT or it's not saying, you know, I'm, you know, I don't know
what model I am.
It's saying GPT-4, all right.
It gives me a little description, ChatGPT does, and it says here, which is interesting, right,
April 2023.
So technically Claude, which, you know, you would assume so
since it's a much newer model, Claude has a more recent knowledge cutoff.
So what that means, without going too deeply into it, when we talk about using large
language models, hallucinations, it's always the thing you've got to, like, you have to avoid.
And if you think you can just use copy and paste prompts, you're going to get bad results,
but you're also going to get a lot of hallucinations.
These large language models are trillions of parameters, right?
Sometimes it's better in a lot of cases if you had 50 small models, right?
But most people use one large language model.
And the problem with that is people are impatient and they just copy and paste something
and they want to get something good out of it.
But what that leads to a lot of times because of the knowledge cut off is the model
being a little not too certain, right?
Or it maybe sounded certain like a year ago or a year more ago.
Okay.
And that's why you have to understand the knowledge cut off because essentially these large
language models gobble up the history of the internet, copyrighted works,
yeah, they do, and everything, right?
And then you can ask it, right?
But it cuts off at that point.
So anything that exists past April 2023 in ChatGPT,
ChatGPT doesn't know.
By default, however, ChatGPT does browse the internet.
It uses Browse with Bing, all right?
Huge advantage.
You don't have that right now in Claude.
But Claude does have a more recent knowledge cutoff.
All right, let's do the same prompt.
We're just going to keep this all in the same chat,
which is technically not ideal.
If I was actually doing a test, I would be doing it in new chats each time as to not impact the outcome.
All right.
So now I'm saying, please tell me five jokes that end in the word blue and all mention either fruit or an animal.
All right.
A little tricky, right?
Let's see.
So Claude spits it back out.
It says, here are five jokes that end in the word blue and mention either fruit or an animal.
All right.
So I'm just going to read one or two.
Number one, what did the banana say when it was feeling down?
I'm just feeling a little blue.
All right, not funny, but it didn't hit all the criteria either.
Oh, yeah, it did.
Okay, so it ends in blue, got the banana.
Good.
So, okay, it either, again, it has to either be a fruit or an animal and end in blue.
All right.
The second one, why was the blueberry sad? Because it was feeling blue.
All right, not really a joke, but kind of, I guess.
We'll do another one.
What did the cat say when it fell into a vat of blue dye?
I'm feeling blue.
There we go.
Feeling like feline.
All right.
So it actually improved this.
When I did this yesterday, it failed miserably.
So it's improved a little bit.
Maybe there's some behind the scenes, little updates going out, going out there.
Oh, I guess this would have helped you all if I was sharing the correct screen here for the live stream audience.
So sorry about that.
So here's our three that we just read out for Anthropic.
All right.
So now let's jump over.
And we're going to do the exact same thing.
in ChatGPT.
Let's see how ChatGPT does.
All right.
Number one, what did the grape?
Why did the grape stop in the middle of the road?
Because it saw the traffic light turn blue.
Not that funny, but it hits.
It hits the things.
All right.
What did the orange say when it wasn't ready to be picked yet?
I'm feeling a little blue.
All right?
Why did the banana go to the doctor?
Because it wasn't peeling well and felt a little bit blue.
All right.
So here we go at least.
we got both of them kind of quote unquote passed that test, right?
It's not a real official test, but for the most part, I'd say they both got a passing grade.
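The "passing grade" here comes down to two mechanical checks: each joke ends in the word blue, and each mentions a fruit or an animal. Below is a rough sketch of how that could be checked in code. The word list is a tiny hand-picked assumption that only covers the jokes read on the show, and the function names are made up.

```python
import re

# Assumption: a tiny, hand-picked vocabulary covering only the jokes read on air.
FRUITS_AND_ANIMALS = {"banana", "blueberry", "cat", "grape", "orange"}


def words(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def meets_constraints(joke: str) -> bool:
    """True if the joke ends in 'blue' and mentions a listed fruit or animal."""
    tokens = words(joke)
    ends_in_blue = bool(tokens) and tokens[-1] == "blue"
    mentions_subject = any(token in FRUITS_AND_ANIMALS for token in tokens)
    return ends_in_blue and mentions_subject


jokes = [
    "What did the banana say when it was feeling down? I'm just feeling a little blue.",
    "Why was the blueberry sad? Because it was feeling blue.",
    "What did the cat say when it fell into a vat of blue dye? I'm feeling blue.",
    "Why did the grape stop in the middle of the road? Because it saw the traffic light turn blue.",
    "What did the orange say when it wasn't ready to be picked yet? I'm feeling a little blue.",
    "Why did the banana go to the doctor? Because it wasn't peeling well and felt a little bit blue.",
]

for joke in jokes:
    print(meets_constraints(joke), "-", joke)
```

A check like this says nothing about whether a joke is funny, only whether it followed the instructions.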
All right, next.
All right, we are jumping into Claude.
Again, the opus, the most powerful model.
Here is our next test.
Write a four-line poem about the ocean.
Make lines two and three rhyme.
And do not mention the words blue, wet, waves, beach, or sand.
A little tricky, but shouldn't be too hard.
All right, here is the poem from Claude.
Vast expanses of endless brine where creatures swim and secrets hide.
In depths that reach the ocean's side, a world untamed, serene, divine.
All right.
Pretty good, actually.
I like that, right?
It's tricky, right?
It's tricky to write a four-line poem about the ocean, rhyming the middle two lines.
and not mentioning the words blue, wet, waves, beach, or sand.
Kind of difficult, right?
All right, let's jump into ChatGPT.
We're going to do the exact same thing.
And we're going to see if ChatGPT passes this test as well.
All right.
So ChatGPT kind of took a shortcut.
I don't hate it.
It played within the rules.
All right.
So here's, here is ChatGPT's.
The ocean's.
Oh, interesting.
Okay.
I never said don't use the word ocean, right?
And the thing with Claude is you kind of,
you kind of didn't know what it was talking about if you just read it, right?
So I guess plus one, you know, ChatGPT is being a little, not sneaky, but really taking
advantage because it actually rhymed all lines and it used the word ocean and I never said,
you can't rhyme all lines.
So maybe it made a better poem because we at least know it's about the ocean and the whole thing
rhymes versus just randomly rhyming the middle two lines.
Anyways: the ocean's embrace, vast and deep, where secrets and ancient whispers keep, a realm
where silent shadows sweep.
Majestic, it lulls the world to sleep.
I'll let you judge.
Let me know in the comments, which one?
Hey, any poets out there?
Which large language model is the next Hemingway?
Let me know.
All right, but regardless, they both pass.
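The poem test has three checkable rules: four lines, lines two and three rhyme, and none of the banned words appear. Here is a rough sketch of those checks run on Claude's poem. The line breaks are assumed, since the poem was only read aloud, and the rhyme test is a naive shared-suffix heuristic rather than real phonetics; all names are made up for illustration.

```python
import re

BANNED_WORDS = {"blue", "wet", "waves", "beach", "sand"}


def tokens(line: str) -> list[str]:
    """Lowercase a line and split it into simple word tokens."""
    return re.findall(r"[a-z']+", line.lower())


def naive_rhyme(first: str, second: str, suffix_len: int = 3) -> bool:
    """Crude heuristic: two different words rhyme if their endings match."""
    return first != second and first[-suffix_len:] == second[-suffix_len:]


def check_poem(lines: list[str]) -> dict[str, bool]:
    """Check the three rules from the prompt against a list of poem lines."""
    all_words = {word for line in lines for word in tokens(line)}
    has_middle = len(lines) >= 3 and bool(tokens(lines[1])) and bool(tokens(lines[2]))
    return {
        "four_lines": len(lines) == 4,
        "lines_2_3_rhyme": has_middle
        and naive_rhyme(tokens(lines[1])[-1], tokens(lines[2])[-1]),
        "no_banned_words": all_words.isdisjoint(BANNED_WORDS),
    }


# Claude's poem as read on the show; the line breaks are an assumption.
claude_poem = [
    "Vast expanses of endless brine",
    "Where creatures swim and secrets hide",
    "In depths that reach the ocean's side",
    "A world untamed, serene, divine",
]

print(check_poem(claude_poem))
```

The suffix trick works for hide and side but would miss rhymes spelled differently, so a real grader would need a pronunciation dictionary.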
Actually, I kind of like ChatGPT's.
It was a little, not like tricky, but, you know,
it really went to the limits of where it could operate, right?
It said, all right, well, a poem, maybe it thought a
poem would be awkward if just lines two and three rhyme.
That's not a normal rhyming scheme.
So let's rhyme them all.
I never said don't rhyme them all.
So all right, ChatGPT.
All right, our next prompt, we are in Claude.
I'm saying, and I'm going to go ahead and tell you the answers too before I even show it on the screen.
So the next one is, I had seven apples yesterday and gave two to my friends yesterday and also ate one yesterday.
One of my apples went bad two days ago right after I sold one the day before that.
How many good apples did I start with three days ago and how many apples do I have left?
All right.
So this is pretty, it's a little, it's a little tricky, but not tricky at all.
Right.
So if you look at the seven apples, you say gave two to my friend, also ate one yesterday.
So technically today we're left with four apples.
The rest of it is kind of a trick, but it's just to get to the answer of how many apples did we originally have.
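The "four apples left" figure is plain subtraction, which this short sketch spells out. The variable names are made up for illustration, and only the leftover count is computed here, since the starting count depends on how the riddle is read.

```python
# Quantities stated in the riddle for yesterday.
apples_yesterday = 7
given_to_friends = 2
eaten = 1

# What remains today after giving two away and eating one.
apples_left = apples_yesterday - given_to_friends - eaten
print(apples_left)  # 4
```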
All right. So I said one of my apples went bad two days ago right after I sold one the day prior. All right. So let's go ahead and see, go over the logic. So this is good, Anthropic's breaking it down. And again, y'all, this is so we can see. Are these models good at reasoning? Are they good at logic? Can they, you know, think critically, right? A lot of people think
large language models are just auto-completes.
Are they?
Yes.
But now they're at the point where they are auto-completing the smartest humans ever.
All right.
There's a big difference.
You know, it's not just, you know, Google auto-complete.
It's using logic, right?
If I just spit that out, you might have to, you might not know the answer, if I'm
being honest, right?
You might have to write it down and think about it.
All right.
So here's what Claude says.
All right.
It says, great.
Let's solve this problem step by step.
to find out how many good apples you started with three days ago and how many you have left.
We need to work backwards from the information given.
All right.
So it's saying given.
Here it is.
It's saying step one.
Apples two days ago plus apples given to a friend plus apples eaten.
Apples two days ago, 10 apples.
All right.
So that's seven plus two plus one.
All right.
Good.
And then step two, find the number of good apples you had two days ago.
It's going through that.
You had nine good apples two days ago,
a total of 10 apples.
All right.
Step three, it's finding,
you know, steps three and four, it's finding the total amount.
So here it says, therefore, you started with 10 good apples three days ago,
and you have four apples left now.
So technically, it's, it's correct, right?
I didn't specify, but I do like that Claude broke it down and actually did the difference
between good apples and bad apples, right?
Because technically there could be two different answers to this because I didn't ask the
question very well.
I should have said either how many good apples or how many total apples, right?
So it technically gave me kind of two different, two different ways, broke it down, good job.
All right, so pass the test.
Let's look into ChatGPT.
We're not going to go down and read everything.
I'm just going to see.
So the thing I like about ChatGPT is it breaks it out into Python, right?
And you can see exactly how it's running it.
I'm literally learning Python just by looking at how ChatGPT analyzes it.
I love that you can expand the output and kind of see what's going on.
All right. So it's going through. It's, it's, you know, rationalizing. It's, it's coming to conclusions.
It's doing it live. So it says apples yesterday equals seven, given away equals two. Eaten equals one.
Went bad equals one. All right. So let's go. Yeah, I should have been a little more specific because technically there's, technically there's two answers to this. But, all right.
So same thing. All right. ChatGPT did clarify and said good apples. All right. So 10
good apples two days, three days ago and four apples left. All right. So they both got that one.
Not super tricky, but again, it just shows you a little bit, a little bit of logic. All right,
our next one here. We're going to go through this one quickly. I fill the cookie jar with 12
cookies on a Monday. I eat three cookies. Oh, I eat all the cookies. I just had some Girl Scout
cookies, Thin Mints. Those are my favorite. What's your favorite? By the way, I'm curious. Okay, so this one,
I said, I filled a cookie jar with 12 cookies on Monday.
I ate three cookies on Tuesday and my sister took four on Wednesday.
I baked and added five more cookies on Thursday.
On Friday, two cookies were stale and had to be thrown away.
How many cookies did I start with on Monday and how many cookies are left now?
So a similar problem to before, using some simple logic, math, et cetera.
Not a super hard problem.
I'm assuming both are going to get this right.
I haven't tried.
If it doesn't get this right, there's a problem.
This is a very simple, simple equation.
All right, there we go.
So Claude got it right.
And it says,
therefore, you started with 12 cookies on Monday and you have eight cookies left now.
That's the correct answer.
Eight cookies.
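For anyone following along in the transcript, here is a minimal Python sketch of that cookie arithmetic. It is not the code ChatGPT or Claude produced on screen; the variable names are made up for illustration, and the numbers are the ones read out in the riddle.

```python
# Replay of the cookie riddle as stated in the prompt.
start = 12  # cookies placed in the jar on Monday

# (day, change) pairs; negative numbers are cookies leaving the jar.
events = [
    ("Tuesday", -3),    # I ate three
    ("Wednesday", -4),  # my sister took four
    ("Thursday", +5),   # I baked and added five
    ("Friday", -2),     # two went stale and were thrown away
]

remaining = start
for day, change in events:
    remaining += change
    print(f"{day}: {change:+d} -> {remaining} cookies")

print(f"Started with {start}, {remaining} left")
```

The running total goes 9, 5, 10, 8, which matches the answer both models gave.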
Yum cookies.
I'm hungry for cookies.
All right.
We're doing the exact same thing now inside ChatGPT.
And as always, it's busting out the Python code.
And we can look at it, do its work.
All right.
So it looks like ChatGPT is going a little faster.
Yeah, actually a decent amount faster, it looks like.
All right.
Same thing.
You started with 12 cookies on Monday.
You have eight left.
All right.
So got it for each one. Not too bad. All right. So now this next one, this is the only one where we're going
to be bringing in kind of some outside data. All right. So here's here's what we're going to do.
Let's see if we get this going. All right. So I have a screenshot here. All right. So this is just a
screenshot of the Everyday AI homepage. All right. So one of the things that Anthropic said
Claude 3 was great at was just way better vision than everyone else. So vision, you know,
It's this concept of think of the AI chat that you're chatting with has computer vision, right?
And you can upload a photo and it can say, oh, that's a dog or that's a dog on a tricycle in,
you know, South Africa, you know, eating up there, right?
Like, whatever is in the image, it can use kind of computer vision and tell you.
Similarly, it can pull out text or it can pull out, you know, you can upload, one test I did
recently for this Claude versus ChatGPT, was uploading a PDF from a spreadsheet, right?
Pretty hard, pretty difficult.
And you can go watch that one if you want to see how they handle it.
All right.
So essentially, here we go.
I have a screenshot of my website.
I haven't tried this.
All right.
And I have really just a simple prompt, right?
So let's see if this works in Claude.
So here's what we're going to do.
I don't even know.
I assume I can upload PNG's.
I believe I have a PNG file here.
Let's see.
Yes, it's PNG.
All right.
So I am uploading this into Claude Opus.
And here is the prompt.
I am saying, please tell me what this is and give me a 10-step marketing plan for this business,
as well as the current competitive landscape.
Keep it detailed and ultra-specific, yet short with bullet points without wasting any words.
All right.
So let's see how Claude reasons here.
All right.
So it says the image shows a landing page for a daily AI newsletter and podcast called Outsmart the Future.
All right.
So it mistook our kind of headline, Outsmart the Future, for the name of the company.
The name of the company is Everyday AI.
So I don't know.
I mean, we'll see if I'm disappointed in that outcome.
I could see it.
Outsmart the Future on our website is much larger.
But you would think a model would be able to tell the difference, but maybe not.
We'll see how ChatGPT does.
All right.
So it says it invites visitors to join over 9,100 daily email newsletter subscribers, all right?
And 22,000 podcast listeners, hey, if you're listening, you're one of those people.
All right.
So pretty good.
You know, here it's just reading text.
All right, I really wanted to see how it did with the second part, which is when I'm
asking it to give me a 10-step marketing plan for this business as well as the current
competitive landscape.
All right.
So, all right.
Another good thing is I was, I was curious if it was going to be able to read the logos because,
you know, we had logos of, you know, all the different tech people out there that read
our newsletter.
So it got it right.
It got Google, Amazon, Meta, IBM, Intel, Salesforce, and NVIDIA.
All right.
So here it says the business plan I would recommend.
All right.
So it says, you know, number one, continue publishing high quality, informative,
AI related content.
Two, monetize the newsletter through sponsorships.
Three, there's more.
I'm just not going to read it all.
Three, grow the podcast listenership and monetize it through podcast ads, sponsorships, etc.
Four, create additional AI-related content offerings like e-books, courses, webinars.
So so far, this is fine.
Nothing crazy, nothing great.
Five, explore partnership opportunities with some of the major tech companies mentioned to increase visibility.
Six, continue to optimize the newsletter landing page and sign up flow to maximize conversion
rates, A/B test different headlines, et cetera.
All right.
So here's the thing.
It only got to six.
All right.
So not that good.
Didn't get to 10.
The business advice was, meh, standard.
There's maybe one thing that I'd be like, okay, you know, this couldn't come from a high
schooler.
Right.
So maybe testing, you know, A/B testing, different call to action button, social proof,
et cetera.
So not the best, but that's fine.
Also, it says the focus should be on steadily growing a loyal audience by
providing uniquely valuable AI content, then monetizing that audience through multiple channels,
the existing traction and authority status provide a great foundation to scale this business.
All right, so not that good.
All I'm going to do, I'm going to give each of these two shots.
All right.
I'm going to go ahead and click retry as we go over here to ChatGPT and try the same thing.
I'm not going to read through the whole thing.
Don't worry.
I just want to see how different it is because there, I wouldn't say Claude failed, but it didn't
tell us the competitive landscape.
It didn't give me 10 steps.
It wasn't super specific either, all right?
And it got the name wrong.
So let's see how ChatGPT does with vision.
All right.
So again, the GPT-4 model in theory, or ChatGPT, has a huge advantage.
It is connected to the internet.
It can use Browse with Bing.
Claude, at least right now, is not, nothing we know of.
All right.
So already, we're already ahead here with ChatGPT.
It says the image appears
to be a screenshot of a website promoting Everyday AI, right?
A platform that offers a daily podcast, daily newsletter podcasts and tracks related to artificial
intelligence.
All right.
And then we said, okay, so it says here's a 10-step marketing plan for Everyday AI, along
with a brief competitive landscape.
All right.
So here's the marketing plan.
I'm not going to go too into that.
Identify target audience, you know, clearly define the target market segments.
Unique value proposition.
That's good.
I'm surprised Claude didn't mention that.
Emphasize the unique aspects like daily updates, etc.
Content marketing, produce high quality SEO optimized content, social media strategy,
engage with the audience on platforms like LinkedIn and Twitter.
Five, partnership collaboration, collaborate with tech companies.
Six, email marketing, develop a drip email campaign for new subscribers and a nurturing campaign
for leads.
All right.
So there's one thing.
It's kind of gotten two that are, you know, above average so far.
Seven, online communities.
Oh, we're going to have something like that launching soon.
Should I go ahead and promote it?
Yeah, go ahead.
This has been three months in the waiting, y'all, and I'm sorry for making you wait.
But type in Inner Circle.
All right.
I swear we're launching this soon.
It's going to be amazing.
All right.
So seven, online communities and engagement.
Eight webinars and live events.
Referral program.
That's a good one.
We're going to be launching that as well.
And then analytics and optimization.
So by far, the specificity and the actual examples in the 10-step business plan were much better in ChatGPT.
Here we go, competitive landscape.
So it says direct competitors.
Oh, this is funny.
Direct competitors.
Other AI newsletters and podcasts, such as the AI Podcast by NVIDIA.
Wait, they're not a competitor.
They're a partner.
Look at that.
An unprompted little, not ad, but hey, this.
This right here.
You see this?
This is from our friends at NVIDIA.
So we're going to be broadcasting live at NVIDIA, March 18th to March 21st.
All right.
It is in San Jose, but you can also sign up for free and attend the virtual conference.
It's literally, it's a who's who of generative AI.
Obviously, you know, you have everyone at NVIDIA is going to be there speaking, but you have people
from, I mean, you have people from Meta.
You have people from OpenAI.
You have people from Salesforce, I believe.
It's literally everyone, Microsoft.
Everyone's there, right?
So go check the show notes here.
I'll have the link in today's newsletter as well.
So if you sign up even just for the free.
So if you buy tickets, screenshot it, I got a form to fill out on the website.
If you just sign up for the free, the free one as well, go ahead, sign up for the free one.
It's going to be amazing.
It's going to be virtual.
I hope you still listen to me daily, even though you're
at the virtual conference for
NVIDIA GTC.
But even if you sign up for the free one,
screenshot your registration.
And then I have a form on my website
and then you'll be entered to win this.
This is the GeForce RTX 3080 Ti GPU chip, right?
If you want to run
NVIDIA's new local,
it's not even technically a model,
it runs other models,
but their new local software,
which early reviews are,
it's pretty amazing,
Chat with RTX.
You need a GeForce chip like this or above.
So, hey, you could go spend, I don't know what this goes for, like $300-plus
on this chip or literally go sign up.
It's a win, win, win.
Go sign up for the free conference, attend the sessions, just screenshot it to me,
fill out the form, that's it.
And then you can win this thing.
All right.
That was an unintended ad, I swear.
Like, I just happen to have this chip here.
And I'm like, oh, should I talk about it?
but ChatGPT gave me a good idea.
All right, anyways, back to the competitive landscape.
So ChatGPT says direct competitors, other AI newsletters and podcasts,
such as the AI Podcast by NVIDIA, This Week in Machine Learning, et cetera.
It's a great, great podcast, by the way, TWIML.
Indirect competitors.
It's naming some indirect competitors.
Yeah, interesting.
Okay.
Hey, it's telling me something I didn't really think that, you know, TechCrunch and Wired.
I kind of thought of them as indirect competitors, but sure.
And then it's going into differentiation.
So it's saying focus on daily updates, which may set Everyday AI apart from weekly or monthly
competitors and emphasize actionable business insights derived from AI trends.
All right.
And then it gave me market trends.
It says there's growing interest in AI across industries leading to an increase in demand
for AI educational content.
All right.
So I don't think it is even close how Claude did all this, right?
So Claude got the name wrong.
ChatGPT got it right.
Claude's advice was pretty generic.
ChatGPT's advice was generic, but a little better.
Claude only did six pieces of advice.
All right.
ChatGPT did them all.
ChatGPT did 10.
And then last, and then last but not least,
the competitive landscape.
Claude just skipped over that, right?
ChatGPT didn't.
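That first-attempt comparison can be written down as a quick checklist. This is only a rough Python sketch of the three things the prompt asked for (identify the business, give 10 steps, cover the competitive landscape); the field names and the pass/fail scoring are hypothetical, not an official benchmark, and the values are simply what was described on the show.

```python
# Requirement taken from the prompt read out earlier in the episode.
REQUIRED_STEPS = 10

# First-attempt results as described in the episode (hypothetical field names).
results = {
    "Claude 3 Opus": {"name_correct": False, "steps": 6, "landscape": False},
    "ChatGPT": {"name_correct": True, "steps": 10, "landscape": True},
}


def checks_passed(r):
    """Count how many of the three prompt requirements a response met."""
    return sum([
        r["name_correct"],
        r["steps"] >= REQUIRED_STEPS,
        r["landscape"],
    ])


scores = {model: checks_passed(r) for model, r in results.items()}
for model, score in scores.items():
    print(f"{model}: {score}/3 checks passed")
```

A checklist like this only captures whether the instructions were followed, not how good the advice was, so it is a starting point rather than a verdict.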
All right.
I did say I'll give, ChatGPT doesn't need another chance.
We'll see if Claude did better on the second chance.
Again, generative AI generates something different almost every time. All right. So let's see if it did any
better on the second try here. All right. So nope, still did six points. But this time it did give us
somewhat of a competitive landscape as well. And then some ideas on how to stand out. So the first time
it whiffed. Let's see. This time it said the newsletter is called Outsmart the Future,
Everyday AI. So the first time it just got it way wrong. The second time, still wrong,
but a little less wrong. All right. So that was a very, very unofficial, a very unofficial kind of
test, right, on on these models. So here's the thing. Does this matter? Does this matter? You know,
I'm showing this, this, this MMLU chart, these benchmarking charts, right? I don't know. People are
asking, you know, one of the things I get asked all the time is, okay, well, Jordan, you talk about
AI every day. What tools do you use the most? And I feel like so basic. I say ChatGPT,
all day, all day every day, ChatGPT. People are saying, Claude, no, don't use Claude.
I mean, when Claude was two or two point one, I didn't. You know, I'm going to give three a try,
continue to explore, you know, seemingly it's going to get better by the day. It's like,
I don't, I don't use Claude. It's not very good. Claude 2 wasn't very good.
Claude 3 is a good maybe, right?
In our very limited testing, you can't look at this as definitive.
So if you're brand new to large language models and you're trying to make a decision on which one is best for you, your small business, your department at a large company, this is not it, right?
This is a very small test.
You have to actually do the work.
But this is a very small test.
And you'll see that for the most part, Claude wasn't very good compared to ChatGPT.
There are some things that it does better, right?
In my unofficial testing, I didn't want to have to
code live, but it does do better at coding, at least in some initial tests.
But one thing I'm excited for, and we did mention that earlier, is the API, right?
That interactive piece and being able to interact with things in a different way on websites,
like the example of the interactive charts and being able to run research and development.
And essentially, I think the Claude Opus API
could be our first official segue into agents, right?
This could be the first one.
You know, we kind of have like mini agents that, you know, you have,
you have LangChain, which is improving.
But Claude, maybe Opus, the API is going to be our first hint at what it means to have
AI agents, right?
Kind of autonomously doing work for you, right?
Because right now you can't really do all of that with ChatGPT,
especially since they got rid of plugins, right?
All right.
So I hope this was helpful.
If so, please consider sharing this with your friends.
You know, if you're here on LinkedIn or Twitter watching this, you know,
click repost or retweet or reex or whatever it is, you know,
sometimes we take five, six, seven, eight hours to put one single show together.
People always ask, Jordan, how can I support you?
This is so good.
Free knowledge.
Hit that repost.
Share this with your friends.
If you enjoyed this, please leave us a rating on Spotify or Apple.
and go to YourEverydayAI.com, sign up for that free daily newsletter.
And you know what?
Here's my takeaway.
Claude 3, is it better than ChatGPT and Google Gemini?
Well, we're not even comparing Google Gemini because Google Gemini's a hot mess right now.
Maybe it'll be better.
So is Claude 3 better than ChatGPT?
Well, for me, no.
On benchmarks?
Yes.
But I want to hear from you.
All right.
Thanks for tuning in.
And we'll see you back tomorrow
and every day for more Everyday AI. Thanks.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit YourEverydayAI.com
and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
