The AI Daily Brief: Artificial Intelligence News and Analysis - Why Opus 4.5 Changes Vibe Coding
Episode Date: November 26, 2025

Today's episode digs into why Anthropic's surprise launch of Claude Opus 4.5 is landing like a true step-function moment for coding, agentic workflows, and the emerging paradigm of vibe-based software creation, with new benchmarks, early user tests, and developer reactions all pointing to a shift in how real work gets done; plus a quick look at the latest headlines including the White House's Genesis Mission and Amazon's massive new government-focused AI expansion.

Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune in to the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, the incredible string of model releases continues with Anthropic dropping Claude Opus 4.5.
Before that in the headlines, the White House launches the AI Genesis Mission.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors.
Superintelligent, Robots & Pencils, Blitzy, and Rovo.
To get an ad-free version of the show, go to patreon.com/aidailybrief,
or you can subscribe on Apple Podcasts.
And if you are interested in sponsoring the show,
we're wrapping up Q1 right now.
Send us a note at sponsors@aidailybrief.ai
and I can give you all of the info.
And with that, let's dive in.
Welcome back to the AI Daily Brief Headlines edition,
all the daily AI news you need in around five minutes.
Yesterday you heard about how one AI executive order
from the White House had been squashed.
Basically, there was a big dust-up with congressional Republicans
around the White House's plan to create a task force
to go after states who put AI regulations on the books, but as it turns out, that was not the
only executive order they have planned. President Trump has now officially signed an executive order
to launch a national AI science program known as the Genesis Mission. The text of the order
argues that the race for global technology dominance in the development of AI requires a historic
national effort comparable in urgency and ambition to the Manhattan Project. This order launches
the Genesis Mission as a dedicated, coordinated national effort to unleash a new age of AI
accelerated innovation and discovery that can solve the most challenging problems of the century.
Michael Kratzios, the director of the White House Office of Science and Technology Policy,
continued that tone during the Monday announcement.
He described the Genesis Mission as the largest marshalling of federal scientific resources
since the Apollo program.
Now, stripping away the superlatives, the Genesis Mission is, at its core, an initiative to collate
scientific knowledge from across the government to enable new AI-driven discoveries.
Datasets will be gathered from the National Science Foundation, the National Institute of Standards
and Technology, and the National Institutes of Health. The datasets, some of which stretch all the way back
to the 1940s, will be cleaned and transformed into machine-readable formats to make them accessible
to AI models. The order lays out a twofold goal: to train scientific foundation models,
and create AI agents to test new hypotheses, automate research workflows, and accelerate scientific
breakthroughs. To that end, the Department of Energy and their network of 17 national labs
will make their data and compute resources available to research institutions and private sector companies.
The order instructs the DOE to, quote, create a closed-loop AI experimentation platform
that integrates our nation's world-class supercomputers and unique data assets
to generate scientific foundation models and power robotic laboratories.
Essentially, this is a major effort to organize the scientific data that's scattered across
government agencies and marshal resources in order to drive AI-accelerated scientific discovery.
Kratzios again said,
Since the 1990s, America's scientific edge has faced growing challenges.
He cited declining numbers of drug approvals and research
outputs despite soaring scientific budgets. The Genesis Mission seeks to reverse that trend by,
in his words, unifying agencies' scientific efforts and integrating AI as a scientific tool
to revolutionize the way science and research are conducted. Data sets and compute infrastructure
will be centralized into the American Science and Security Platform to be established by the DOE,
who said that once complete the platform will be, quote, the world's most complex and powerful
scientific instrument ever built. It will draw upon the expertise of roughly 40,000 DOE
scientists, engineers, and technical staff, alongside private sector innovators to ensure
that the United States leads and builds the technologies that will define the future.
The DOE is also tasked with formulating a list of 20 science and technology challenges of national
importance to form the initial focus of the Genesis Mission. This potentially includes domains
like advanced manufacturing, biotechnology, critical materials, nuclear fission and fusion energy,
quantum information science, and semiconductors. The initiative builds on the existing National Artificial
Intelligence Research Resource, or NAIRR, which was established in 2020 and brought together
federal agencies, including the Department of Defense, NASA, and the National Institutes of Health,
with private companies like OpenAI, Google, and Palantir to form a nationwide research community.
Lynne Parker, who co-chaired NAIRR during the Biden admin, said,
government support for AI research builds the foundations for new breakthroughs and helps keep
innovation aligned with the public interest. We take for granted that new products appear regularly,
but seldom consider the decades of research that made them possible. Without long-term investment,
we risk seeding leadership in the technologies that will define our economy, our security, and our daily
lives. Now, speaking of the connection between public and private, Amazon announced on Monday that
they will spend up to $50 billion to expand their AI and supercomputing facilities for U.S.
government customers. The expansion will begin next year and is expected to add a total of 1.3
gigawatts of AI capacity to the AWS regions that service government demand. The expansion will
increase capacity for both unclassified and top secret AWS servers. Said AWS CEO Matt Garman
in a press release, our investment in purpose-built government AI and cloud infrastructure will
fundamentally transform how federal agencies leverage supercomputing. We're giving agencies expanded
access to advanced AI capabilities that will enable them to accelerate critical missions,
from cybersecurity to drug discovery. This investment removes the technology barriers that have held
government back and further positions America to lead in the AI era. Staying on the chip theme,
Meta appears to be preparing to use Google's TPUs in their own data centers. The Information reports
that Google has begun pitching large cloud customers, including Meta and large financial institutions,
on installing TPUs at their own facilities.
Google has made their custom AI chips available through Google Cloud for years,
but they've yet to sell TPUs directly to outside customers.
Part of the pitch is that they're able to operate the chips with higher security
and compliance standards that aren't possible with cloud use.
According to sources speaking with The Information,
Meta is in talks to order billions of dollars worth of TPUs
to install in their data centers in 2027.
If you've been listening over the last week,
what's clear is that while Google has been making TPUs for over a decade,
the release of Gemini 3 put the chips firmly on people's radar.
The new model was trained exclusively on TPUs,
leading many to question whether Google's chips could be a viable alternative to
Nvidia's GPUs.
The news seems to have moved the stock market,
with Bloomberg reporting a 2.7% bump for Google and a 2.7% drop for Nvidia in overnight
markets.
Bloomberg analysts wrote,
Meta's likely use of Google's TPUs, which are already used by Anthropic,
shows third-party providers of large language models are likely to leverage Google
as a secondary supplier of accelerator chips for inferencing in the near term.
Now, while Google is clearly ramping up to compete, the analysis is still probably getting a little
bit ahead of itself. That said, the new report contained a few more crumbs of information on how
Google is looking to address the market for AI chips. One of Nvidia's biggest moats is the
CUDA developer ecosystem. As part of The Information's report, they write that Google has developed
a new software suite called TPU Command Center that's designed to make TPU compatibility easier to
navigate. Ultimately, while it could take Google a number of years to carve out a meaningful share of
the AI chip market, Nvidia is already taking the threat seriously. According to the information,
Nvidia is following the deal-making closely and has enticed Anthropic and OpenAI to make large
commitments to Nvidia GPUs. They also wrote that it's possible that Nvidia will seek to preempt a deal
between Google and Meta. Futurum Equities' chief market strategist Shay Boloor writes,
I know the first instinct is to frame Meta exploring Google TPUs as the start of Nvidia's pricing power
erosion, but that's not what it is. The real story is the velocity of Meta's AI workload curve,
as Llama training cycles, video understanding systems, and tens of billions of daily inference calls
all smash into the same compute ceiling. Meta is already on pace to spend $100 billion
on Nvidia hardware, and they're still capacity constrained. Adding TPUs doesn't replace the spend,
it just sits on top of it. Even if Nvidia doubled output, Meta would still be short on compute.
That's how steep the structural AI capacity shortage actually is.
Lastly today, in an interview at Demo Day for Emerson Collective, the venture and
philanthropy fund of Steve Jobs' widow Laurene Powell Jobs, Sam Altman and Jony Ive said that they've
nailed the design of their AI device. In possibly the strangest ever description of a consumer
device, Altman said, there was an earlier prototype that we were quite excited about, but I did not
have any feeling of, I want to pick up that thing and take a bite out of it. And then finally, we got there
all of a sudden. Altman said this was Ive's test for knowing when a design is dialed in,
when you want to lick it or take a bite out of it or something like that.
The pair stayed silent on features, but Altman was excited to describe the vibes of the product.
He compared the experience of modern devices as being like walking through Times Square,
flashing lights, noises, and the dopamine drip, constantly just dealing with all the little indignities.
By comparison, he wants using the OpenAI device to feel more like,
sitting in the most beautiful cabin by a lake, and in the mountains, and just sort of enjoying the peace and calm.
Ive added his vibe, commenting,
I love solutions that teeter on appearing almost naive in their simplicity, and I also love
incredibly intelligent, sophisticated products that you want to touch, and you feel no intimidation
that you want to use almost carelessly. Altman commented, I hope that when people see it, they say,
that's it. The interview added no information on what the device will actually do, but for Altman,
the key feature continues to be total contextual awareness. He said, it is so simple, but then AI can
do so much for you that so much can fall away. And the degree to which Jony has chipped away
at every little thing that this doesn't need to do or doesn't need to be in there is remarkable.
If you feel more rather than less confused, don't worry about it. Substantively, the biggest news was a
timeline, with Ive stating the device could be available within two years. But with that, we close
today's headlines. Next up, the main episode. Today's episode is brought to you by Superintelligent.
Now, for those of you who are new here and may not know, Superintelligent is actually
my company. We started it because every single company we talk to, all the enterprises out there, is
trying to figure out what AI can do for them, but most of the advice is super generic,
not specific to your company. So what we do is we map your AI and agent opportunities by
deploying voice agents to interview your teams about how work works now and how your people
would like it to work in the future. The result is an AI action map with high potential
ROI use cases and specific change management needs, basically everything you need to go actually
deliver AI value. Go to besuper.ai to learn more.
AI isn't a one-off project. It's a partnership that has to evolve as the technology does.
Robots & Pencils works side by side with clients to bring practical AI into every phase:
automation, personalization, decision support, and optimization. They prove what works through
applied experimentation and build systems that amplify human potential. As an AWS-certified partner
with global delivery centers, Robots & Pencils combines reach with high-touch service.
Where others hand off, they stay engaged, because partnership isn't a project plan.
It's a commitment. As AI advances, so will their solutions. That's long-term value.
Progress starts with the right partner. Start with Robots & Pencils at robotsandpencils.com/aidailybrief.
This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development
Platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for
hours to understand enterprise-scale code bases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform,
bringing in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously,
while providing a guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy
as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI assist
to AI Native.
Meet Rovo, your AI-powered teammate.
Rovo unleashes the potential of your team with AI-powered search, chat, and agents,
or build your own agent with Studio.
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform,
so it's always working in the context of your work.
Connect Rovo to your favorite SaaS app so no knowledge gets left behind.
Rovo runs on the teamwork graph, Atlassian's intelligence layer that unifies data across all of your
apps and delivers personalized AI insights from day one. Rovo is already built into
Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.
Know the feeling when AI turns from tool to teammate. If you rovo, you know.
Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com.
Welcome back to the AI Daily Brief. The Thanksgiving 2025 parade of models
has continued into a new week, this time with the launch of Claude
Opus 4.5 from Anthropic. Now, people have been assuming for some time that we were going to get
an Opus 4.5. We've obviously had Sonnet 4.5 for a while now, and so people figured that this was in the
offing, but there had been a lot less conversation leading up to this around when it was going to come.
The big model, of course, that people have been anticipating is Gemini 3, and in many ways
this was a wildly understated announcement. And yet, the response has been, in a word,
significant. While they may not have hype posted, Anthropic minces no words in their launch post.
Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient, and the best
model in the world for coding, agents, and computer use. It's also meaningfully better at everyday tasks
like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI
systems can do and a preview of larger changes to how work gets done. So let's talk first about
the benchmarks. And it is no accident that the one they chose to put right at the top is
SWE-bench Verified. Now, you might remember that in our discussions about Gemini 3, the only major
benchmark that it didn't win or at least match was this one. While Sonnet 4.5 was at 77.2%,
Gemini 3 Pro was at 76.2%, not super far behind, but still not technically state
of the art. GPT-5.1 was also a tiny bit ahead of Gemini 3 Pro at 76.3%,
and OpenAI extended that lead to 77.9% when they released GPT-5.1-Codex-Max in the days following Gemini 3.
For a very short time, GPT-5.1-Codex-Max was at the top of the SWE-bench Verified chart,
but Opus 4.5, at least by the benchmarks, blows it out of the water:
80.9%.
As Morgan writes, a 3% lead has never looked so large.
And it wasn't just SWE-bench Verified.
On the Terminal-Bench 2.0 agentic terminal coding benchmark,
Opus 4.5 was meaningfully ahead of all the others as well, and on agentic tool use, scaled tool use,
and computer use, Opus 4.5 sets a new standard.
Now, there were some tests where Opus 4.5 meaningfully lagged behind Gemini 3, such as Humanity's
Last Exam, where it was significantly behind both without search and with search.
And yet, what everyone was talking about, of course, was the coding results.
If you are a regular listener of this show, you will know that the ascendancy of Anthropic this
year, and the speed with which they are catching up to OpenAI has much to do with them being
the preferred AI coding model for developers. That started with Claude 3.5 Sonnet and has basically continued
unchallenged, although since the release of GPT-5, there have at least been credible competitors.
Anthropic seems very clearly to agree with swyx on the relative importance of coding as
compared to all other use cases. A couple times I've referenced Shawn's post about what made him
decide to go work with Cognition, where he basically framed coding as the high-value, short-timeline
activity. The line, which I've shared a couple of times: code AGI will be achieved in 20% of the time
of full AGI and capture 80% of the value of AGI. Whether or not that's true, Anthropic has certainly
behaved as such. Now, outside just the standard SWE-bench, there were a couple of other things
that people noticed. Igor Kotenkov points out that while there are ways to overfit towards
the SWE-bench Verified benchmark, the more recent SWE-bench Pro is a lot more difficult and more connected
to the real world, and Opus blows previous models out of the water. Opus gets 55%,
where Sonnet 4.5 got 43.6% and GPT-5 got just 36%. On ARC-AGI, Opus 4.5 set a new standard ahead of GPT-5.1 and Gemini 3,
and on ARC-AGI-2 it got 37.64% at $2.40 a task. Already, just hours after the release, the people
who had early access were also independently verifying some of these results. Bindu Reddy
writes, Opus 4.5 tops LiveBench AI and is the world's best agentic model. We can confirm this
after testing this over the past few days.
Now, interestingly, one of the things that we've seen a lot from labs recently
is the people inside the labs really talking up the specifics about what they like about the models.
We got a spate of that from Anthropic team members, such as Jake Eaton, who writes,
Opus 4.5 is very good at a lot of things, and you should read the benchmarks, the model card, etc.
But my favorite thing about working with it these past two weeks is that in conversation,
it is somehow more fine-grained.
It has a depth and texture that for me was immediately noticeable.
It also feels interestingly much more self-contained.
Sasha de Marigny says,
the internal response to Opus 4.5 has been a mix of excitement, awe, and surprise,
particularly around how good it is at coding.
Thariq writes,
Opus 4.5 is special: a world record in the SWE-bench and OSWorld benchmarks,
the best model we've ever had at vision.
On Claude Code, I've completely stopped writing code in the IDE.
I think there's so much to discover about Opus 4.5.
And indeed, some of the most interesting responses from Anthropic team members
come from their engineering team.
Sholto Douglas writes,
I am so excited about this model.
First off, the most important eval.
Everyone at Anthropic has been posting stories of crazy bugs that Opus found
or incredible PRs that it nearly soloed.
A couple of our best engineers are hitting the interventions-only phase of coding.
Adam Wolff writes, this new model is something else.
Since Sonnet 4.5, I've been tracking how long I can get the agent to work autonomously.
With Opus 4.5, this is starting to routinely stretch to 20 or 30 minutes.
When I come back, the task is often done,
simply and idiomatically. Anthropic also talked about how Claude Opus 4.5 fared on a notoriously difficult
candidate exam. In their announcement post, they wrote, we give prospective performance engineering
candidates a notoriously difficult take-home exam. We also test new models on this exam as an internal
benchmark. Within our prescribed two-hour time limit, Claude Opus 4.5 scored higher than any human
candidate ever. They continue, the take-home test is designed to assess technical ability and judgment
under time pressure. It doesn't test for other crucial skills candidates may possess, like collaboration,
communication, or the instincts that develop over years, but this result, where an AI model outperforms
strong candidates on important technical skills, raises questions about how AI will change engineering
as a profession. Now, they also talked to staff members to estimate the impact of using
Opus 4.5 in Claude Code. 50%, 9 of the 18 they surveyed, reported a productivity improvement of at least
100%. The mean self-estimated productivity improvement was 220%. They also popped open the hood a little bit
on how they're making Claude even better at agentic work. In short, they have a huge emphasis on tools.
Indeed, they write, the future of AI agents is one where models work seamlessly across hundreds or
thousands of tools: an IDE assistant that integrates Git operations, file manipulation, package
managers, testing frameworks, and deployment pipelines; an operations coordinator that connects Slack,
GitHub, Google Drive, Jira, company databases, and dozens of MCP servers simultaneously.
To build effective agents, they need to work with unlimited tool libraries without stuffing every
definition into context up front. Agents also need to be able to call tools from code.
Agents also need to learn correct tool usage from examples. Following that, they share that they
were releasing three features to make all of that possible: a Tool Search Tool, which allows
Claude to use search tools to access thousands of tools without consuming its context window;
programmatic tool calling, which allows Claude to invoke tools in a code execution environment,
reducing the impact on the model's context window; and tool use examples, which provide a universal
standard for demonstrating how to effectively use a given tool.
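For readers who want the mechanics, here is a minimal toy sketch of the idea behind a tool search step: keep a large library of tool definitions out of the prompt and load only the few that match the task. This is not Anthropic's API. The `ToolDef` type, the word-overlap ranking, the stopword list, and the sample tool names are all hypothetical, chosen only to show why context usage stays small as the library grows.

```python
import re
from dataclasses import dataclass

# Hypothetical filler words ignored when matching a query against tool descriptions.
STOPWORDS = {"a", "an", "the", "to", "for", "in", "of", "s"}


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical tool definition: a name plus the description a model would read."""
    name: str
    description: str


def _words(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; underscores split names like "jira_create_issue".
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(registry: list[ToolDef], query: str, limit: int = 3) -> list[ToolDef]:
    """Return up to `limit` tools whose name/description share the most words with `query`."""
    query_words = _words(query)
    scored = []
    for tool in registry:
        overlap = len(query_words & _words(f"{tool.name} {tool.description}"))
        if overlap > 0:
            scored.append((overlap, tool.name, tool))
    # Highest overlap first; tie-break on name so results are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:limit]]


def build_context(registry: list[ToolDef], query: str) -> list[ToolDef]:
    """Tool definitions placed in the prompt: the search tool plus matches, not the whole registry."""
    search_tool = ToolDef("tool_search", "Search the tool library by keyword")
    return [search_tool] + search_tools(registry, query)


# Hypothetical tool library standing in for "hundreds or thousands of tools".
REGISTRY = [
    ToolDef("git_commit", "Commit staged changes to the git repository"),
    ToolDef("jira_create_issue", "Create a new issue in a Jira project"),
    ToolDef("slack_post_message", "Post a message to a Slack channel"),
    ToolDef("drive_upload_file", "Upload a file to Google Drive"),
    ToolDef("run_tests", "Run the project's test suite"),
]

if __name__ == "__main__":
    context = build_context(REGISTRY, "create a jira issue for the failing tests")
    print([tool.name for tool in context])
```

A real system would likely use stronger retrieval (regex, BM25, or embeddings); the point of the sketch is only that the prompt carries the search tool plus a handful of matches rather than every definition.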
So again, all of this is telling a very consistent story, which is that Claude is for coding
and pushing the frontier of what agents can do.
So outside of interacting with the benchmarks, what were people's first impressions?
Some were excited and appreciated that there was less hype around this.
Nico Christie writes, have to respect Anthropic's commitment to not vague posting all weekend.
This is the most exciting model release since Sonnet 3.5.
Leo at synthwaived writes,
Be Anthropic, pretend Gemini 3 does not exist.
Know you're ready to cook it for code anyways.
Wait, zero hype posting.
Drop new Opus: state of the art for code, state of the art in ARC-AGI, better than expected,
costs less than old Opus.
Be more like Anthropic.
On the flip side, Ethan Mollick basically asked why they were burying the lede.
I'm not sure why Anthropic keeps doing it:
very low-key launches for fairly major releases and materially important improvements to their
services. I kind of think it has to do with the assessment and the specificity of their audience
in and among developers. Basically, it's a group of people that they think is going to respond more
to having their peers and colleagues tell them about an update rather than getting maximum
social distribution because of being loud and hypey. But what about people's early tests?
Victor Taelin writes, to my surprise, Opus 4.5 one-shot my hardest calculus problem, tying with Gemini
3. In terms of first-hour impressions, couldn't be more
promising, I guess. Ethan Mollick writes, I had early access to Opus 4.5, and it's a very
impressive model that seems to be right at the frontier. Big gains in ability to do practical work,
like make a PowerPoint from an Excel file. Nico again writes, Opus 4.5 is a step function improvement
for spreadsheet work. Extremely hard became doable, doable tasks became easy, and easy tasks are now
solved. And yet, while there were a few examples of people trying non-coding things,
coding is very much where the main excitement lies. Guillermo Rauch, the CEO of Vercel,
writes, Opus is on a different level. It's unreasonably good at Next.js and the best model we've
tried on v0 to date. Menlo Ventures' Deedy Das writes, Anthropic just dropped the best coding model, Opus
4.5. The coolest thing, he points out, is that it does better on SWE-bench Verified without thinking
than with 64K reasoning tokens; in other words, a super token-efficient model. Matt Shumer, who didn't
have early access, said, first test of Claude Opus 4.5 and I'm already impressed. I asked it for a
Colab competitor UI, and it quickly pulled together this screen. Definitely better than my similar
test with GPT-5.1 and, shockingly, Gemini 3. More testing to go, but this is a good start.
He followed it up. Okay, wow, I'm kind of blown away. In one shot, Opus 4.5 made the UI actually
functional, with Python running in the browser. Some, like Superdario, pointed out that this may not
even be the best model that Anthropic has behind the scenes. They write,
good time to remind everyone, Anthropic has a long-standing policy of not significantly pushing the frontier to prevent an arms race.
Dario can hit SWE-bench scores at will.
Now, whether or not that's true, the fact that there is a lot of chatter like that, I think, is a good reflection of the sentiment in the community.
Maybe the most vocally excited about this are Dan Shipper and the team at Every.
He writes, Breaking News. Anthropic just dropped Claude Opus 4.5.
It is by far the best coding model I've ever used.
And here's how Dan describes it.
it extends the horizon of what you can vibe code.
Explaining, he writes,
The current generation of new models,
Anthropic's Sonnet 4.5, Google's Gemini 3,
or OpenAI's GPT-5.1-Codex-Max,
can all competently build a minimum viable product in one shot,
or fix a highly technical bug autonomously.
But eventually, if you keep pushing them to vibe code more,
they'd start to trip over their own feet.
The code would be convoluted and contradictory,
and you'd get stuck in endless bugs.
We have not found that limit yet with Opus 4.5.
It seems to be able to vibe code forever.
Two more observations.
Opus 4.5, he says, takes working in parallel to a whole new level.
Because it's far better at planning and coding, it can work with more autonomy,
meaning you can do more in parallel without breaking anything.
One of his teammates worked on 11 different projects in six hours and had good results on all of them.
Lastly, he points out it's great at design iteration.
Opus 4.5, Dan writes, is incredibly skilled at iterating through a design autonomously
using an MCP like Playwright.
Previous models would lose the thread after a few cycles or say a design
was done when it wasn't. Opus 4.5 is incredible at autonomously iterating until a design is pixel
perfect. Indeed, Dan's team at Every was equally vocal in their love of this model.
Kieran Klaassen writes, 2023 was GPT-4, 2024 was Sonnet 3.5. 2025 is Opus 4.5. This is the coding
model launch I've been waiting for. First time I genuinely believe I can vibe code an entire app
end-to-end without touching the implementation details. We haven't found the limit yet. Previous models
would eventually trip over their own feet.
Convoluted code, contradictory logic, endless bugs.
Opus 4.5 just keeps going.
If you write code with AI, you need to try this.
And I think that this idea is the thing to watch for
to see whether Kieran and Dan's first impressions here
and some of the impressions of the Anthropic team really play out.
That this is, as Kieran puts it,
the first time we can vibe code an entire app
end to end without touching the implementation details.
It strikes me that if that is the case,
that could be the most massive implication of this model.
Adam Wolff from Anthropic again wrote,
I believe this new model in Claude Code
is a glimpse of the future we're hurtling towards,
maybe as soon as the first half of next year.
Software engineering is done.
Soon, we won't bother to check generated code
for the same reasons we don't check compiler output.
I love programming, and it's a little scary to think
it might not be a big part of my job,
but coding was always the easy part.
The hard part is requirements, goals, feedback,
figuring out what to build and whether it's working.
There's still so much left to do,
and plenty the models aren't close to yet:
Architecture, systems design,
understanding users, coordinating across teams,
it's going to continue being fun and very interesting
for the foreseeable future.
But still, it's not hard to see that that's a fairly big pronouncement.
Now, moving back to the realm of the non-speculative,
the other thing that captured people's attention about this
is that Opus 4.5 is significantly cheaper than Opus 4.1:
the cost dropped from $15 to $5 per million input tokens
and from $75 to $25 per million output tokens.
Indeed, Jeremy from Anthropic points out,
one fact people won't realize immediately about Opus 4.5,
it's remarkably token efficient.
All in, it's often cheaper than Sonnet 4.5 and other models for cost per task success.
Simon Willison points out why we probably need to be looking not just at cost per output and input
but also token efficiency, when he writes,
this is notable.
Opus 4.5 is around 60% more expensive than Sonnet,
$25 per million output compared to $15 per million output,
but if it can use 76% fewer output reasoning tokens for the same complex task, it may end up cheaper.
Now, that 76% came from Claude Relations' Alex Albert, who said that on SWE-bench Verified at medium
effort, Opus 4.5 beats Sonnet 4.5 while using 76% fewer output tokens.
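To make Simon Willison's point concrete, here is a rough sketch of the cost-per-task arithmetic, using only the output prices quoted above ($25 versus $15 per million output tokens) and Alex Albert's 76% figure. The 100,000-token task is hypothetical, and the sketch ignores input tokens and assumes the 76% reduction carries over to your own workload, which is a big assumption.

```python
# Output list prices quoted in the episode, in US dollars per million output tokens.
OPUS_45_OUTPUT_PRICE = 25.0
SONNET_45_OUTPUT_PRICE = 15.0


def output_cost(price_per_million: float, output_tokens: int) -> float:
    """Dollar cost of `output_tokens` at a price quoted per million tokens."""
    return price_per_million * output_tokens / 1_000_000


def relative_task_cost(price_a: float, price_b: float, token_reduction: float) -> float:
    """Output cost of model A per task divided by model B's, when A emits
    `token_reduction` (e.g. 0.76 for "76% fewer") fewer output tokens than B."""
    return price_a * (1 - token_reduction) / price_b


def break_even_reduction(price_a: float, price_b: float) -> float:
    """Smallest token reduction at which the pricier model A costs no more than B."""
    return 1 - price_b / price_a


if __name__ == "__main__":
    # Hypothetical task: Sonnet 4.5 emits 100,000 output tokens, Opus 4.5 emits 76% fewer.
    sonnet_tokens = 100_000
    opus_tokens = round(sonnet_tokens * (1 - 0.76))
    print(f"Sonnet 4.5: ${output_cost(SONNET_45_OUTPUT_PRICE, sonnet_tokens):.2f}")
    print(f"Opus 4.5:   ${output_cost(OPUS_45_OUTPUT_PRICE, opus_tokens):.2f}")
    ratio = relative_task_cost(OPUS_45_OUTPUT_PRICE, SONNET_45_OUTPUT_PRICE, 0.76)
    print(f"Opus/Sonnet cost ratio: {ratio:.2f}")
    breakeven = break_even_reduction(OPUS_45_OUTPUT_PRICE, SONNET_45_OUTPUT_PRICE)
    print(f"Break-even reduction:   {breakeven:.0%}")
```

Under those assumptions, the hypothetical task costs $1.50 on Sonnet 4.5 versus $0.60 on Opus 4.5, roughly 40% of the Sonnet cost. The break-even point is a 40% reduction: any task where Opus needs at least 40% fewer output tokens than Sonnet comes out cheaper on Opus, despite the higher per-token price.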
Look, it's early days, but the first impressions are big.
Dan Shipper again sums up: every six to 12 months, a model drops that truly shifts the paradigm.
Opus 4.5 launched today, and that's what it is.
Best coding model I've ever used, and it's not close.
We're never going back.
Brian Atwood points out,
I said a month or two ago that Anthropic is a vertical AI company
and this is what I meant.
They rightly identified that coding is the number one use case for LLMs right now
and are overwhelmingly focused on it.
Meanwhile, others are throwing darts in every conceivable direction,
spreading themselves thin.
Interestingly, just a couple days ago, Sam Altman posted,
It has been amazing to watch the progress of the Codex team.
They are beasts.
The product and model is already so good and will get much better.
I believe they will create the best and most important product in the space and enable so much
downstream work. It has been pretty clear for some time now that OpenAI has come around to a similar
view of the importance of coding and is very much not content to cede that ground. Summing up,
Ethan Mollick writes, the main lesson of the past few weeks is that the big four US labs all seem to
have figured out a path forward in continuing the exponential pace of LLM improvement, at least in the near
future. More simply put, Andrew Curran writes, AI winter is canceled. Try again next year, Grinch squad.
There will, I'm sure, be lots more to discuss around Opus 4.5 as people get deeper into it.
But for now, like I said, the Thanksgiving model explosion continues unabated.
That's going to do it for today's episode.
Appreciate you listening, as always.
Until next time, peace.
