Everyday AI Podcast – An AI and ChatGPT Podcast - Ep 662: Opus 4.5: New king of the AI hill or just a niche model for coders?
Episode Date: November 26, 2025"... best model in the world..." 🤔Wait, again? Days after Gemini 3 Pro splashed on the scene, Anthropic snuck in a low-key drop in Claude Opus 4.5. And Anthropic pulled no punches, call...ing its new model the "best model in the world for coding, agents and computer use"So, should you be hot swapping your Gemini or ChatGPT use out for the new Opus 4.5? Or, is this model more of a niche for software devs? Tune in, as we put AI to Work on Wednesday! Opus 4.5: New king of the AI hill or just a niche model for coders?P.S.... we're out for Thanksgiving. So after this show, we'll see ya Monday!Newsletter: Sign up for our free daily newsletterMore on this Episode: Episode PageJoin the discussion:Thoughts on this? Join the convo and connect with other AI leaders on LinkedIn.Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineupWebsite: YourEverydayAI.comEmail The Show: info@youreverydayai.comConnect with Jordan on LinkedInTopics Covered in This Episode:Claude Opus 4.5 Release & OverviewAnthropic's Coding & Agentic Task BenchmarksOpus 4.5 vs Gemini 3 Pro ComparisonAPI Price Cut & Cost EfficiencyAgentic Tool Search and Context CompactionMultimodal Vision Features & Zoom ToolClaude for Excel & Enterprise Data WorkflowsChrome Extension and Desktop App UpdatesTimestamps:00:00 "AI Throne: Gemini vs. Claude"05:38 "Trends Dashboard with Claude Tools"09:25 "AI Model Benchmark Showdown"11:17 "Benchmark Comparison: Coding Models"13:49 AI Models for Software Engineering18:24 Claude API Pricing Slashed21:09 "Multi-Agent Models & Vision Tools"25:48 "Claude Chrome Extension Access Update"28:36 "Claude for Excel Launches"30:50 "Chat Prompt Context Limitations"35:45 "Improving AI Chain of Thought"38:50 "Analyzing Podcast Trends with AI"40:41 "AI Tools for Building Apps"Keywords:Claude Opus 4.5, Anthropic, benchmark leader, AI hill, coding model, software engineering, agentic research, data analysis, Opus 4.5 API price cut, sweep bench verified, coding capabilities, tool orchestration, context compaction, infinite chat, effort parameter, agentic tasks, API pricing reduction, front end AI chatbot, Chrome extension, Claude for Chrome, browser prompt injection, Claude desktop app, Claude code, Excel integration, ClaudSend Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info) Start Here ▶️Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Cricle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.
Transcript
Discussion (0)
This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips.
Listen daily for practical advice to boost your career, business, and everyday life.
Meet Firefly AI Assistant, now live in Adobe Firefly, the all-in-one creative AI studio.
Just describe what you want to create and the assistant handles the rest,
orchestrating multi-step workflows across Photoshop, Premiere Express, and more in one conversational interface.
You direct the outcome.
The assistant accelerates execution.
Is Google's reign on top of the AI throne already over?
Just a couple days after Google released Gemini 3 Pro.
Anthropic quietly released what they called the best model in the world for certain tasks.
And now the rest of us are left here wondering, what the heck happened?
how can we keep up? And is Claude's new model or Anthropics new model in Claude Opus 4.5,
the new king of the AI Hill, or is it just a really good model for developers and people who work
with data? Well, we're going to be diving in on today's episode of Everyday AI to find out.
Let's get after it. What's going on, y'all? My name is Jordan Wilson. Welcome to Everyday AI.
If you're new here, this is your daily live stream podcast and free daily newsletter,
helping everyday business leaders like you and me make sense of these AI updates.
Yeah, apparently, you get the world's best model every other day.
All right.
So this is where we help you keep track of all those updates, leverage them and grow your
company and your career.
So if that's what you're trying to do, awesome.
Starts here with the unedited, unscripted live stream podcast.
But to take it to the next level, you've got to go to our website, the cheat code,
not in disguise your everyday AI.com.
Go sign up for the free daily newsletter.
We're going to be recapping today's episode and a whole lot more,
including the AI news for today.
Who knows?
By the time the newsletter's out in a couple of hours,
maybe we'll have a completely new leader in the AI world.
All right.
So let's get into it.
And first, apologies, y'all.
So this is our putting AI to work on Wednesday's series,
where on Wednesdays, we do more of a practical look at news.
large language model updates, usually from one of the big four or new features,
from Anthropic, OpenAI, Google, or Microsoft.
And on Monday, y'all, I said, hey, we're going to be taking a deeper look at Gemini 3.
But then hours later, after our Monday show, Anthropic made this huge splash out of nowhere,
and now we have Opus 4.5.
And according to some benchmarks, it is the best model in the world for specific tasks.
So we'll still do another deep dive on Gemini 3.
Don't worry, but today we've got to take a look at this new model from Anthropic in Opus 4.5.
So on today's show, we're going to dive into those benchmarks from Opus 4.5, industry leading in many different categories.
We're going to showcase three overlooked feature updates that Anthropic barely even mentioned.
And we're going to explore three different use cases for Opus 4.5 across a Gentropic.
authentic research, file creation, and data analysis and coding.
All right.
So let's just start there.
Let's start at the end.
All right.
So live stream audience, let me know.
Can you see my screen?
Hopefully you can.
So we're going to end up showing you three use cases.
We're only going to be able to do two of them live because I did some token math and
because of Anthropics, terrible rate limits, which even though they just said that they
updated, they're still terrible.
We're only going to be able to demo two of these live, but I did run another one earlier that we'll be able to take a look at.
All right.
So we're going to get these started like any good cooking show.
Let's get in the kitchen.
So I'm going to describe a little bit what we're doing later on these, but let me just read the different prompts and let everyone know.
So podcast audience, this is probably not going to be a super visual episode, but I do always recommend just going to our website.
Again, your EverydayAI.com and check out the video version of this.
All right. So the first one here, again, I'm selecting Opus 4.5, which is now available for paid users.
And my first task, we ran a couple of these last week in our kind of hands-on, initial hands-on with Gemini 3 Pro from Google.
So the first one, I'm saying, read the last five newsletters at read.
Dot your Everyday AI.com.
So, yeah, you can actually go read every single newsletter we ever had on that website.
And I'm saying, then find 10 recent AI news stories.
that were not covered in those five issues.
Research each story and outline how each would look as a podcast segment.
Suggest episode title, main points, context background, and a short show outline,
ground answers with links to sources.
All right.
So again, when I go over these use cases, I'm giving you actual use cases that we test different
models and how we use different modes and features across the big, large language models.
It's a question I get all the time, Jordan.
and how are you using AI?
Well, here, I'm showing you.
We're putting AI to work on Wednesdays.
All right, so let's get that first one, cook in, get it in the oven,
$450 for 20 minutes.
I don't know if it's actually going to be $450 for 20 minutes.
We'll see.
All right, next one, that we're going to get it going.
I am uploading three different documents from my podcast.
So these are just different stats.
We use Buzz Sprout as our podcast host,
so we get some different stats here.
So what I'm saying is analyze my podcast stats.
I've uploaded the files.
Using Claude Artifacts, I'm going to talk more about artifacts later.
It's always been one of my favorite modes across any large language model.
So I'm saying using Claude Artifacts show me the top 10 obvious trends, the top 10 hidden
trends, 10 biggest growth opportunities, and 10 episode ideas I should plan for December
2025 based on recent trends.
Using Claude Artifacts, build a dashboard that is extremely interactive, sortable,
and filterable and useful as if it were a full stack high priced SaaS application.
Again, using Opus 4.5.
So the first prompt, more on the agentic research side.
The second one, at least that we'll be able to run here live a little bit more on data
analysis, visualization, coding, etc.
All right.
We're going to let those cook in the background and let's get into more of the details now
on what's new inside Claude Opus.
Again, we had kind of heard some rumors and rants recently that Anthropic might release Claude.
We actually, in our newsletter, it's the hard part of a daily newsletter, y'all.
I remember hitting scent, and I kid you not, it was seven minutes later.
I go to my wife, oh, my gosh, Anthropic just released a new model.
And she's like, all right, didn't you just say this last week, like twice?
There's a new best model in the world.
Yes, I did.
That's how quickly AI news happens.
So there's kind of, we knew something was coming, didn't know it was coming this early.
So here's kind of what's new and the biggest takeaways from Claude 4.5 opus.
So I got to get in the right habit.
Technically, Opus 4.5.
It used to be a number than opus, but it's Opus 4.5.
I have it wrong on my little visuals here.
So Opus 45 is Anthropics top mom.
model. All right. So they have their three different tiers. All of them now are on the four or five
variant. So you have your haiku, which is your kind of fastest, but least intelligent. You have
your sonnet, which is kind of your middle of the road, middle intelligence, middle speed. And then you
have your opus, your most powerful, but usually a little bit slower and more expensive if you're
using it on the API side. For the most part, aside from when we're talking about API pricing,
we're talking about using Claude on the front end or using Opus 4.5 on the front end. So that's when you go
to claw.AI and you're using it as a front-end chat bot. So they did say that this is their new top
model, achieving state-of-the-art results in coding and agentic tasks. The API price cut, huge. We're
going to talk about that in a couple of slides here. And it did set a new benchmark on SWEBENCH-Verified,
which is one of the, you know, if you are a developer, software engineering, coder, you know
SwayBench verified. It is one of the more notable benchmarks that AI
models go through specifically when it comes to coding and completing different kind of bug
creation or bug fixing to put it in layman's terms. This also introduces an effort to parameter
on the dev side to trade response thrown as for token use and latency. So that's on the dev side.
And then there's also some new enhanced agentic features. So tool search and context
compaction or the infinite chat, which improves long running multi-step workflows.
benchmarks. We got to talk about it because even though Anthropic was kind of quiet with their
announcement, right? Simple blog posts, a couple tweets, right? Not the usual type that were
used to out of Silicon Valley, right? Big splashy live stream, you know, big production. Anthropic
just kind of, you know, put out a couple videos and just a little blog post. But they were extremely
splashy by calling this the again.
And I'm going to quote their words,
the quote unquote best model in the world for coding agents and computer use.
And in the benchmarks, they shared,
aside from three of the benchmarks where either Gemini or OpenAI came out on top.
So in the other, what is that?
In the other six or seven,
Claude Opus 4.5 was tops against their older version,
Sonnet 4.5, Opus 4.5.
Opus 4-1 and then Gemini 3-Pro and then GPT-5-1.
What's important to note, there already is a more powerful version of GPT-5-1, which is
GP-T-5-Pro, which is extremely impressive.
I think for me even, I mean, we'll see how Opus 4-5 fares.
I don't think it's going to end up being my daily driver model.
My daily driver model probably now is going to be Gemini 3 Pro, but when I need a little bit
more power, a little bit more juice, I'm probably personally going to be using GPT-5-1 Pro,
an amazing model, unfortunately only available on that $200 a month pro plan. But Opus 4-5,
at least for these benchmarks on agentic coding, agentic terminal coding, agentic tool use,
scaled tool use, computer use, and the novel problem solving. These are the benchmarks that
Anthropics shared on their website pretty far ahead of everyone else. So if you're just
looking at Anthropics website and reading their blog posts, you might just think, oh, this is the
most powerful model in the world. And that's literally, like I said, what they said themselves.
Is it? I don't know. Let's look at some third party, some third parties here. So on the artificial
analysis website, and this is, we've mentioned it before on our show, this is a great,
unbiased third-party site to look at when you're trying to figure out what is the best model for
what use case.
So this is kind of an aggregate score of some other different benchmarks, including live code
bench, side code, terminal bench hard, and some others.
But you'll see here, even on the coding index, not even the artificial analysis intelligence
index, which is kind of like their overall or kind of their final boss metric.
so to speak. So even on the coding one, even though Anthropics said best model in the world for coding,
when you start to look at aggregate coding benchmarks, it's not. So Gemini 3 Pro comes in ahead of Claude Opus 4.5.
So interestingly enough, they didn't on aggregate benchmarks, they aren't the best in the world.
And two points actually, and this is a pretty comfortable lead.
And then even on the artificial analysis intelligence index, it's not in the lead either.
It is tied for second place with GPT-5-1 high, which again is not OpenAI's best model.
It's not benchmarked.
Their GPT-5-1 Pro is not benchmarked.
It just came out last week.
There's no API access yet.
So that's why the GPT-15 Pro has not been benchmarked in a lot of different places.
But even on the artificial analysis intelligence index, which is the conglomerate of all the different
available third-party benchmarks.
Gemini 3-Pro is fairly ahead of Claude Opus 4-5,
which I said right now is tied with GPD 5-1-high.
So it seems like some cherry-picking and marketing to me from Anthropic in those claims.
But other third-party benchmarks do show that it's, you know, doing very well.
In this case, on Live Bench, it does hold a slight lead.
over Gemini 3 Pro and GPT5 high on their kind of aggregate scoring system by less than a point,
about 0.7 points over GPD5 high and about 0.3 points over Claude of 4 or 5 opus.
So very slim lead.
But is it a top tier model?
Absolutely.
Especially.
I think if you're working on any agendic tasks, if you're using
Claude on the back end, absolutely dev software engineering, sure.
But a lot of our audience, I believe, and hey, let me know in either the live stream comments
or in the podcast comments, if I'm wrong.
But I think a lot of our audiences using these models on the front end and a lot of times
with a team.
And in that case, I don't think really Claude 4 or 5 Opus for general tasks is a clear
runaway king of the AI Hill, so to speak.
But let's highlight some of the more.
notable kind of accomplishments and also just what's new across the board with this model.
Because it's a lot more, I'd say, than just a small little marginal update to Opus 4-1 or
to Sonnet 4-5, whichever you consider to be their most powerful model.
So obviously, the headlines, software engineering and coding.
It's actually interesting.
Before I started recording, I wanted to look.
the history of what Anthropic itself has benchmarked itself against.
And if you go back to some of its earlier models, you know, a year and a half ago,
they're not using the same benchmarks.
They've essentially given up on being a general purpose model.
And they're really only focused on, I think, making a play in the vertical space.
I really think they only want to compete in the long term kind of software engineering and financial
and the agetic sides, right?
So I don't think Anthropic is really concerned with, you know,
anymore being like a great strategist,
a great creative thought partner, you know,
being overly creative, those types of things.
I don't think that's where the model has been developed
over the last six to nine months.
Anyways, on the software engineering and code side,
it did receive an 80.9 on Swee bench verified.
That's absolutely frontier engineering ability.
Also, according to Anthropic, it can reduce multi-day team projects to hours,
you know, giving a boost to your team's engineering velocity.
And it outperformed any human ever on Anthropics' own two-hour engineering exam.
And this was Opus 4-5 was the first model from Anthropic that actually outperformed any human ever.
So Anthropic has their own internal engineering exam.
And this is the first time that it outperformed any human that had ever taken it.
So let's talk a little bit on the API side because that's the other big play here.
All right.
So I am at the very end going to give you kind of the three under the radar features.
But I mean, if I had to say number four, it would be the cost reduction.
So previously, it didn't make sense anymore.
per anyone, if I'm being honest, to use Anthropics models on the back end.
They were ridiculously expensive depending on what your task was.
You know, not only before this came out, before Opus 4.5,
Anthropic was not a top three model, even for engineering and coding,
at least if you look at any relevant benchmark, right?
Actually, let me just scroll back here a couple spots here on my screen.
So you'll see, let's see, yeah, here we go for the coding index.
So again, this is the aggregate of multiple coding benchmarks.
Up until this week in Opus 4-5, they did not have a top-5 coding model.
All right.
Claude 4-5 Sonnet was not a top-five coding model.
And so the fact that Anthropics API was still anywhere 3 to 10x more expensive.
than the other models that were better at them than coding.
You know, I said that on the show probably two months ago.
I'm like, no one out there should be using Anthropic, like pretty much, period,
because across all benchmarks.
And this was after, you know, essentially after Gemini 2.5 Pro, 2.5 Pro flash,
after some, you know, even Open AI's Codex models that they released here earlier in November,
it didn't make sense.
It was way more expensive in most benchmarks.
It literally couldn't compete even in the top five.
So big move here from Anthropic to maybe go in and change that.
They said, okay, yeah, we're going to come back on the top of some of these leaderboards.
And yes, we are going to try to remain or, sorry, maintain our stranglehold as, you know,
coder or software developer's favorite model on the API side.
Not only that, but we're going to come in with a huge API.
price cut, which is still not the best price per performance, but at least now it makes sense
financially.
Right.
So they did cut their API pricing by two thirds.
So now it can, you know, they kind of took their model from the penthouse to the ground
floor where the work is being done.
All right.
Where before, you know, even three to four weeks ago, I didn't really know.
And I talked to a lot of people in the space, obviously.
I didn't know anyone that was doing big numbers on the API side that was still using Claude.
And if they were, it just means that they didn't do their proper due diligence on having a modular approach.
And you have to be modular, especially when a model like Gemini 3 Pro can come in.
And I'm sure we're going to be seeing some flashlight versions coming soon.
You have to be ready to swap those out.
Right. So no one was actually using, right, at least huge enterprises that are spending millions of dollars annually.
We're not still using any of Claude's models. So big play there. And it's it should be interesting to see how and when and if, I guess, Anthropic continues to cut their prices.
Obviously, they've had a longer standing partnership with Amazon and using Amazon's chips,
but we also saw news last week for a multi-billion dollar partnership with Google to use Google's
TPUs, which as of recently have been getting a lot of love.
So, you know, kind of a combination of those things.
And maybe behind the scenes, anthropic reducing its reliance on Nvidia has made.
maybe allowed it to be a little bit more competitive on the API side.
All right.
Next, agentic workflows and tool orchestration.
So they've really optimized four five, opus four five,
for reliable agent handling more complex, multi-step reasoning.
Also, again, a lot of this is more on the API side too.
So tool search, being able to dynamically find needed tools from large libraries,
So avoiding context pollution, right?
I mean, if you are a software engineer, I'm sure that's a pain point for you, right?
Working in longer context windows, you know, having to really pile on top of the scaffolding of these now very agentic models, right?
It got a little cloudy.
So apparently, OPA5, a little bit better at that.
And also it enables multi-agent systems that self-refine in fewer iterations.
another benchmark we didn't go over, but very impressive and third-party benchmarks as well,
is just the vision capabilities.
And that's one that I'm personally interested to test out a little more.
Again, when I'm talking about these things, I always, especially on Wednesdays, right,
I always want you to be thinking of your use cases, right?
So maybe you have extremely complex diagrams, right?
And this is a big part of what your company does.
you know, parsing information out of manuals with, you know,
complex illustrations inside of PDFs, et cetera.
You know, Opus 4.5, extremely high benchmarks,
even on third-party benchmarks for vision and multimodal capabilities.
So they did score a 80.7% on MMMU validation.
So more also, it added a Zoom tool, OpenAI,
kind of stole the show and kind of went a little viral among us AI nerds last year with this.
But Anthropic finally followed suit with a Zoom tool for inspecting screen regions at full resolution,
just to be able to better understand images, right?
The future, well, not the future, but in 2025, right, the default for large language models,
they have to be multimodal.
They have to be able to understand images as an input.
Unfortunately, right?
And that's a game.
Anthropics not playing on the output side,
even though, you know,
obviously Google and OpenAI can output video and images.
That's an area that Anthropic is not yet playing in.
All right.
Next big category of improvements with Opus 4.5,
just improved tool support for Excel,
which we're going to be talking about here in a little bit in Chrome.
Another one we're going to be talking about.
Adobe just introduced an entirely new way to create,
bringing the power and precision of its creative,
suite into one conversational experience. Meet Firefly AI assistant, now live in the Adobe
Firefly app, the all-in-one creative AI studio. Powered by Adobe's creative agent,
Firefly AI assistant lets you start with your vision, just describe what you want, and shape the outcome
as it takes form with the assistant. The assistant orchestrates multi-step workflows, drawing on
60 plus pro-grade tools across Adobe Creative Cloud apps, including Photoshop, Illustrator, Premiere,
Lightroom Express, and more to help bring your ideas to life.
You can also get started with creative skills, a growing library of pre-built
workflows for common creative tasks like batch editing photos, creating mood boards, portrait retouching,
and creating social variations.
Every step the assistant takes is visible so you can refine, redirect, or take over at any time.
You stay in the driver's seat as the creative director.
Adobe Firefly AI assistant now in public beta.
See it today at firefly.adobie.com.
They also have said that they've worked on their context issues.
And I'm not going to frame this as like a as a new feature.
No, this is them trying to fix something that has made their platform absolutely unusable.
I've talked about this before in the show, especially on the front end for front end users.
Single prompts, y'all, single prompts have busted, not just.
the rate limits for Claude, but also the context window. I've run single prompts. So we'll see
if that gets any better. Also, Anthropic says that Opus 4-5 produces more consistent domain-aware
spreadsheets, slides, and documents for precision verticals. That is a huge, another kind of under-the-radar
feature that was announced a couple of months ago. The go-to-market on Anthropic is wildly poor.
let me just say that unequivocally right they have some of the most groundbreaking innovation
inside large language models but when they roll it out they just roll it out to their like max
subscribers um so no one's talking about it because who's a max subscriber like you know 10 people on
twitter so you know it's great features their five creation feature amazing but when they rolled
it out no one had access so no one talked about it so then when they kind of gave access to the masses
it's, well, it's all news, right?
No one cared about it then because no one had it,
so you can't create a grand opening of a feature twice.
But low-key, one of my favorite features,
we actually did a dedicated show on that a couple of weeks ago.
All right.
So that's kind of the high level of what's new.
Lao, let's go over three overlooked feature updates
because with this new Opus 4.5,
Claw or sorry, Anthropic also kind of snuck in
some unrelated feature updates
that maybe should have just been their own announcements but weren't.
So a lot of people missed these.
I didn't.
I saw these.
I was scrolling through their kind of announcement thread.
And I was like, wait, how is this in the middle of a tweet thread, right?
But I guess like any good sandwich, right?
Sometimes the meat is in the middle.
So number one, the Claude for Chrome extension rolling out to more people.
So previously, again, speaking of Anthropics, absolutely terrible.
go-to-market strategy and just marketing in general.
Chrome extension looked really good, right?
So kind of in a similar way that you can use the Atlas browser from OpenAI, right?
And it can see and understand what's in your browser.
There's some elements of that in the new Claude for Chrome extension.
But when they rolled it out, they rolled it out to a thousand people, right?
And this was months ago.
So now it's open, well, it's not open to everyone.
It's only available to those on Claude Max in, I believe, Enterprise Plan.
plans, but it is noteworthy because now, at least if you are on that $100 or $200 a month plan
for Anthropic, you at least have access to the Claude for Chrome extension.
And with kind of their visual understanding, their agentic capabilities inside Opus 4.5,
and also its kind of ability to handle that context.
This is a big, big update, right?
So, yeah, if you are on a Max plan, you probably should be trying out.
I might have to begrudgingly upgrade to Max just to use this and try it for you all.
So the Chrome extension also demonstrates improved robustness against browser prompt injection attacks compared to previous models.
And, you know, obviously it kind of brings, you know, agentic capabilities
to Chrome, right, that you would have in other browsers, like as an example, to handle tasks across,
you know, multiple open tabs inside Chrome. All right, the next overlook feature would be
Claude code coming to desktop. So Claude code is now available within the Claude Desktop app.
So you don't have to run it separately in the terminal. So this is, I think, a boon to non-technical
users. The Claude Desktop app is actually really good. I'm surprised more people don't talk
about it. I'd say the capabilities leap, right? If you look at as an example, you know,
open AIs, you know, chatGBT.com versus open AIs chat chat.com, you know, fairly apples to apples.
I think when you look at Claude.aI, so the Claude chatbot and Claude desktop, the desktop
is a leap better, right? Just because of everything that you can do kind of with MCPs,
you know, you can have it read, right? I'm a Mac.
user so it can read your iMessages so it can like look into different programs.
So it has some really complex and very robust capabilities.
So also now, Opus 4.5 support on that as well on desktop.
And it does allow software engineers to run multiple local and remote coding sessions
simultaneously.
And then last but not least, three underrated new feature.
updates. So now Claude in Excel. So the specialized Claude for Excel product is now generally
available where before it was in a beta and it did roll out now to Max team in enterprise users.
So unfortunately, if you're on that $20 a month plan like myself, you're not getting clawed
in Excel. But if you do have Excel co-pilot, FYI or sorry, Microsoft 365 co-pilot.
And if you're in Microsoft's Frontier program, you will have a similar version.
It's just the co-pilot branded version of this, but it's actually powered, I believe, by 4.1 from Claude.
So this delivers the Claude in Excel, Anthropics version, delivers a step change improvement for knowledge workers, creating spreadsheets, documents, and slides with more professional polish.
So Claude for Excel uses programmatic tool calling to efficiently read and modify spreadsheets with thousands
of rows. So this is one of those things, you know, kind of about where I said originally,
it seems like Intropic is really just trying to compete vertically, right? They're not trying
to compete on the app layer. They're not trying to be in everything app like Google or Microsoft
or OpenAI. It seems like they're really just sticking with, you know, software engineer,
coding and people in finance or working in data, right? So if you're in spreadsheets all the time,
Claude might be a model that you really want to consider, or if obviously you were in software development.
All right.
That's it for kind of what's new and noteworthy.
Now let's check back on our cake.
All right.
So we're going to start with the third one, the use case that I actually couldn't show because,
unsurprisingly, Claude could not handle it.
even the new 4.5.
And one thing that Anthropic did specifically promote another kind of new feature was its extended context window, right?
By far on the chat side, again, this is different than when you're looking at the API.
A clawed chat was absolutely terrible, right?
So if you're in a chat and you dump a bunch of information, you know, literally I've had single prompts.
bust the context window for Claude.
So my third kind of example here that I couldn't do live because I also did the math
and I realized that it would push me over my current session limit.
So I'm showing this on my screen here.
So my two other prompts that are complete use up 54% of my current session.
And when I did this third test on its own, it took up the,
more than 50%, so I knew that I couldn't do all three.
Anyways, it failed.
So it doesn't matter.
It failed even when I did it on its own.
And this is another example that I tried to do a month ago when talking about this new file
creation feature.
So I wanted to try to have Claude, or in this case, obviously, Opus 4.5, do some tool calling, right?
do some multi-step agentic work.
So the file creation feature, if you haven't used it, it is absolutely amazing.
I think that and artifacts are two standout features for Anthropics Claw.
So in the file creation, it can create PowerPoints.
It can create Excel spreadsheets that you can actually download.
So obviously now Gemini can do that.
They just rolled out their kind of slides version that we've.
went over a couple of weeks ago as well. Open AI can do that in Chad GPD. It's a little clunky,
though, because you have to use agent mode and it takes forever. And most of their slides, or if you're
talking about PowerPoints are very ugly. Claude makes beautiful, beautiful PowerPoints. In this case,
though, it failed. Let me kind of show for our live stream audience. I'll kind of say quickly
what I was doing. I uploaded an older PDF that I did, a presentation, an older podcast. And
And I told Opus 4 or 5, this is more than a year old.
First, you're going to have to go research.
This, you know, I gave it the website.
Here's the website that went along with this presentation.
You need to analyze the PDF.
You need to go find the corresponding transcript or podcast episode.
I gave it the link.
And then I said, you need to update this slide, right?
Or this slide deck.
It was about a 25-page slide deck.
And so, hey, it's a year old.
Go see what's still relevant.
Go research everything that's not relevant.
anymore. So we'll see here, Claude started off by doing a pretty good job. So good instruction
following early on. It kind of broke the task, a complex task, sure, into multiple steps.
It started to do it, right? So it started to read what I uploaded. It was kind of calling on its
PowerPoint skill. Claude has this kind of new skill feature. It went through. It correctly went
and found that URL. It found other episodes on the website that talked about agents,
which is what I encouraged it to do. So it did a pretty good job, you know, calling other tools
and instruction following. However, it just busted. And I did try this in three separate chats,
but in each case, it just died, right? And I tried, you know, the standard, please continue, right?
And it did 10 different attempts to continue. And it just did.
So it busted the context window one prompt.
All right.
So for all of the hoopla of people talking that, oh, Claude on the front end has extended their context window, still not that good if you're doing a complex query.
All right.
So that was technically prompt number three.
Let's go back to prompts one and two and see how we did.
So the first one, remember, I said go find the five latest newsletters on our website.
then go find 10 recent AI news stories we didn't cover and then outline those as if they were podcast episodes.
All right.
So we'll see here.
It went in.
Not sure why.
It used,
uh,
okay,
interesting.
It used my Canva connector,
even though none of them were connected.
Okay.
That's interesting.
Oh,
no,
I just went to Canva.com,
uh,
and found the,
uh,
news there.
Okay.
So,
uh,
it went,
it did a little bit of research.
It looks like it.
read, hopefully it read through the last couple of newsletters.
I don't actually know if it did.
So interestingly enough, it went to our main website at Your EverydayAI.com.
I explicitly told it to go to our subdomain, read.
Your EverydayAI.com, which is not the same thing.
So interestingly enough, I'm looking through the chain of thought, always read through the chain of
thoughts.
Like if there's one thing you get from AI at work on Wednesdays,
read the chain of thought, right?
These agenic or hybrid models that think and reason and plan ahead and show you,
right, except for Gemini 3 Pro, all right?
Gemini team, please add tool calling to the chain of thought so we can see what these
models are doing a little bit more transparency versus just a summarized chain of thought.
So, but I don't see that it actually went to the actual newsletter.
So instead, it went to the episode page and it looked at podcasts, which is wrong.
That's not what I wanted it to do.
So instead, it's just searching everyday AI and it did not do it.
So in this case, the instruction following, bad.
All right.
So it said, I now have a good overview of recent Everyday AI,
newsletter topics, which is false. It went to our website, which does not have our newsletter on it.
You had to go to this subdomain. So in terms of instruction following and calling the right tools
to do the right thing, pretty big failure here. And instead, it just looked at topics that we
covered on the podcast, which is two completely different things. So failure here. Let's at least see
if it did a decent job at creating us,
kind of podcast segments anyways.
So it looks like it ran some JavaScript, interesting.
It said, let me create a comprehensive word document,
but instead it just created a JavaScript file.
Interesting.
So I could obviously go through and convert this JavaScript file.
I'm kind of scrolling through it.
So yeah, it looks like it probably planned some of this information out.
But yeah, it said that it was going to, I can see in the thought process that it actually went and did it.
But big fat failure here, big fat failure.
All right.
So next one, and this is one I was looking forward to.
I'm going to hit refresh on this.
Okay.
Let's see.
Do we get another?
Did we get another failure?
That would be surprising.
Let me see.
I'm wondering if something is up with my browser.
I'm going to go ahead and publish this artifact.
Open it in an incognito window here.
And let me see if it actually rendered.
So interesting.
Did we get a bunch of failures from Claude?
You know what? I'm going to give it a second chance here.
And I'm going to say, you know, nothing rendered.
Please try again and rebuild this using Claude artifacts.
Okay. So interestingly enough, even though I go through, I'm going through, right, this is where I gave it my podcast stats.
I said, you know, find 10 obvious trends, find 10 hidden trends.
and then find me 10 episode ideas that maybe I should be planning based on these trends that you find,
you know, in these thousands of rows of data in the spreadsheets that I upload.
It seemed like it didn't even render anything inside Claude Artifax.
So I'm giving it a second try here.
So we'll see what it does, right?
And I don't want this to go on for too long because here we are at the 40 minute mark.
But I will tell you, Claude Artifacts is one of my absolutely favorite features, right?
I will say that is probably recently been leapfrogged by Google Gemini Canvas.
Google Gemini Canvas, especially with the new Gemini 3 Pro, is unfair, right?
If you aren't using it, all right, I'm going to say this out loud.
If you're not using Google Gemini canvas every day, you're leaving immense value on the table.
I'll just say that.
It is not just extremely flexible in what it can't accomplish, but it is visually, aesthetically, just stunning, right?
Which is, yes, I know that there's a matter of taste to that.
But the overall flexibility and utility of Gemini Canvas is amazing.
So, you know, it's obviously something very similar to Claude Artific.
where you can run and render code, right?
So you can create interactive dashboards.
You can create little micro apps or full-blown like SaaS type apps, right,
inside Google Gemini Canvas.
And you can do that inside Google's AI studio using their build tool as well.
So similarly, you know, artifacts was actually kind of first on the scene in its ability
to write code, right?
Not just write code because that's obviously one thing that the Claude models are great at.
And here we're testing out Opus 4.5, but the ability to render it kind of using their
Claude Artifax engine.
So here for our live stream audience, as I'm drawing this one out, sorry, you know, it's
writing the code on the left side.
And how it's supposed to work is it is supposed to render the code on the right side.
So this is supposed to give me a nice looking dashboard.
I said it should be like SaaS worthy, right?
So it should be so good.
It feels like a product that you pay for, but it's personalized to me, to my information.
So, you know, I don't have to know how to write code to do any of this.
I just have to know, right, here's a bunch of information.
Go make me something super useful and use clawed artifacts, right?
All right.
So I'm going to give it.
I'd hate to put it on a shot clock because now I've already drawn it out like three
minutes. I'm wondering, I'm going to go ahead and maybe see if I can quickly. Let me go up,
because I did use a very similar prompt inside the Google Gemini version of this that I did
probably last week. So I'm going to say, let's see, top 10 obvious trends. I'm going to see
if I can just quickly pull up the Google version of this. All right. So doing a little multitasking
here, live stream audience. Thank you for bearing with me. Let's see. Let's see if I can find the one.
I have so many paid accounts. It is hard to find them. I must have done it in my personal account
because that is where my my Ultra, Google Ultra subscription is. So let's see if I can find that.
There we go. All right, cool. So let's go ahead while we're waiting because
You know what?
What a bummer.
What a bummer, Claude.
Opus 4-5.
Let's scroll down.
It looks like it's another failure.
Another failure.
Oh, no.
Still writing and rendering.
All right.
So we might just have to end on this,
showing you the version that Gemini,
Gemini 3 Pro built.
So good.
I was like flabbergasted last week.
Again, running some similar problems.
or some similar use case or internal benchmarks that we did last week when going over a Gemini 3 Pro.
This is really good.
I mean, our live stream audience, this literally looks like a full-blown SaaS application, all just based on my data.
So everything that we just asked Opus 4.5 to do Gemini 3 Pro just absolutely crush it.
Beautiful, interactive.
It shows the overview.
It shows the trends, the top, top 10 obvious, top 10 hidden trends, opportunities.
They're all color-coded, nice icons, right?
Really, really good.
And then it also included some of the raw data.
Let's see.
I didn't even see if the search works.
Oh, the search works, you know, interactive search.
So it looks like unfortunately, all right, because I'm going to, I'm going to stop it there.
Claude is still going.
I think the first version might have took like 15 minutes.
I should have checked.
Yeah.
So you know what?
If it finishes, if Claude can redeem itself with this Opus 4.5 with this artifact, we'll go
ahead and put it in our newsletter.
All right.
So we're going to have to wrap it there leaving you on a cliffhanger, but not my fault.
So again, live demos of generative AI usually never a good idea, right?
But hey, last week when I did the same with when I did the same with Gemini 3, Gemini 3 pro,
was good, right?
everything worked, you know, today, mainly hiccups with Obis 4.5.
So don't take my word for it, right?
This is obviously, I'm still going to be using this model every single day.
Sometimes live use cases aren't the best, but hey, I showed you.
Sometimes best benchmarks are still going to have hiccups.
So I hope this episode was helpful.
And, hey, on our Wednesday series, I always want you to think about what is your use case.
Tell me, right?
Tell me in the comments.
I always go through and read the.
comments on LinkedIn or on the podcast. So if you're listening on Spotify, thank you. You can leave a
comment there. So thanks for tuning in. If you haven't already, please go to our website at your
EverydayAI.com. Go sign up for that free daily newsletter. Thanks for tuning in. F.I. Happy Thanksgiving
to everyone. So tomorrow for Thanksgiving and Friday, actually, the podcast and the newsletter are taking a
little, little break, right? People tell me how tired I look. I'm going to try to sleep a little bit.
So we will see you Monday for the AI News That Matter.
So thank you for tuning in.
And we'll see you next time on Everyday AI.
Thanks, y'all.
Meet Firefly AI Assistant.
Now live in Adobe Firefly, the Allman One Creative AI Studio.
Just describe what you want to create in your own words and the assistant handles the rest,
orchestrating multi-step workflows across Adobe Creative Cloud apps,
including Photoshop, Premiere Express, and more in one conversational interface.
You direct the outcome while the assistant accelerates execution.
stay in control with the ability to step in and refine at any time.
See it today at firefly.adobie.com.
And that's a wrap for today's edition of Everyday AI.
Thanks for joining us.
If you enjoyed this episode, please subscribe and leave us a rating.
It helps keep us going.
For a little more AI magic, visit Your EverydayAI.com
and sign up to our daily newsletter so you don't get left behind.
Go break some barriers and we'll see you next time.
