The AI Daily Brief: Artificial Intelligence News and Analysis - The 5 Most Impactful AI Model Releases of 2025
Episode Date: December 26, 2025
A ranked countdown of the AI model releases that defined 2025, shaped how people actually use these systems, and reset expectations across the industry. The episode includes a few notable omissions, some controversial placements, and plenty to argue about, by design.
Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, counting down the five most impactful AI model releases of 2025.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, Robots & Pencils, Blitzy, and Superintelligent.
To get an ad-free version of the show, go to patreon.com slash AI Daily Brief,
or you can subscribe, of course, on Apple Podcasts.
And if you are interested in learning about sponsoring the show, you can find out more information at aidailybrief.ai
or send us a note at sponsors@aidailybrief.ai.
Now, we are in the thick of end-of-year coverage,
and you might have heard me say during my episode
about the 10 biggest stories of AI overall
that I had been planning on bundling
these five biggest AI model releases
as its own section of that show.
Now, of course, that show got really long,
and I didn't want to overwhelm the list
with just model releases,
which are obviously in some ways,
the quintessential events around which we mark our AI calendars.
And so instead, what we're doing
is we're breaking this out into its own category, its own episode.
And whereas that top 10 episode did not rank and count down the stories other than saying
that I thought that vibe coding was the most important, this one is actually a countdown.
I labored over the ranking because I think it's kind of fun to give you guys something to
debate and tell me either how right I am or more likely how wrong I am.
We're going to start off with a couple of honorable or maybe as the case might be dishonorable
mentions.
Specifically, I want to talk about the absence of a strong model from Meta this
year. Now, yes, Llama 4 did technically come out at the beginning of the year. However, it flopped.
One of the challenges for Meta was that Llama 4 was coming into existence in a post-DeepSeek world.
And in that post-DeepSeek world, everything around open source had changed. For a couple of years,
Meta got to be the standard bearer of open source AI models. And even if their models weren't
as state-of-the-art as the closed labs', they had this distinct and unique space. Now, that changed a little when
Mistral came on the scene and started to compete for that narrative and intellectual and practical
space, but it has changed dramatically this year in the context of the rise of the Chinese open
weight models. Now, even back then, people were surprised at what we got with Llama 4. In the LocalLLaMA
subreddit, someone wrote, Llama 4 didn't meet expectations. Some even suspect it might have been
tweaked for benchmark performance. But Meta isn't short on compute power or talent, so why the
underwhelming results? Meanwhile, models like DeepSeek and Qwen blew Llama out of the water months ago.
It's hard to believe Meta lacks data quality or skilled researchers. They've got unlimited resources.
So what exactly are they spending their GPU hours and brain power on instead? And why the secrecy?
Are they pivoting to a new research path with no results yet, or hiding something they're not proud of?
Now, as the year went on, we started to get a sense that there was a lot of change brewing inside Meta.
Indeed, one of the big stories that I covered in that top 10 episode was the AI Talent Wars,
and there was no person more singularly responsible for driving up market prices for researchers
than Meta's Mark Zuckerberg.
Reports suggested that the flop and underperformance of Llama 4 led directly to Zuckerberg
getting his hands dirty with the assembly of the superintelligence team.
Now, obviously that team has now come to fruition, but we are still very much in the midst
of the overhaul.
Longtime Meta AI leader Yann LeCun recently left the company, which many felt was inevitable
after all of this shakeup. And right now we're getting a lot of pieces like this one from Insider
about Meta's year of intensity, its AI overhauls, its challenges. And to the extent that there is
good news for Meta, I think it comes in a few forms. First of all, I would never write Zuckerberg off
when he has set his eye on something. Meta has significant resources, is clearly willing to invest in
compute, and is clearly willing to go against the wishes of Wall Street to do so. Meta also has a
corporate structure where Zuckerberg could pretty much make that decision without worrying about investor
rebellion that could impact his ability to lead. Maybe even more than that, it shouldn't be lost
on us that a couple of years ago, this type of story is exactly what was coming out of Google.
Resources were spread across a couple of different AI divisions, strategy wasn't aligned,
and the models that were being released were seriously underperforming. Anyone remember
Bard? Even when Gemini was released in December of '23, it felt like a rush job, and it wasn't until
months later that we got the actual best version of the model. Things only really started to change for
Google at the end of 2024 with the release of NotebookLM's Audio Overviews, and then over the course
of this year, first with Gemini 2.5 and then the models that followed, Google has moved into a very different
position. Point being that sometimes especially big organizations have to go through these painful
transition periods, and the real question will be what comes out on the other side. I think if one was
a betting person, you'd have to think the odds are on '26 being a better year for Meta models
than 2025 was. Next up is not exactly an honorable mention. It's a
note that they're off the list, but with a question of how long that will be the case.
So for the record, there is not a Grok model that made my list.
Which isn't to say that I thought that the Grok models were bad.
This is not a case of disappointment.
In fact, I think judged on the curve of how long Grok has been at it,
Grok's models from 2025 were very impressive.
Grok 4 and 4.1 were both right up there in the fray of top models.
But for me, for each of the top OpenAI, Gemini, and Anthropic models,
there are specific use cases where I prefer them to their peers.
While Grok 4 and 4.1 were competent across lots of things,
there wasn't any single use case where I found myself always coming back to Grok instead.
I think, again, to give Grok credit,
they're coming up extremely fast, they have less time on task than most of the companies they're competing with,
and unlike, for example, Anthropic, who are heavily focused on exactly what they're focused on,
Grok is trying to compete across the full spectrum of multimodality, images, video, etc.
I think the "but for how long" is particularly pertinent in this case, given that it seems like
there's more coming soon. On December 9th, Elon Musk tweeted that Grok 4.2, or as he put it, 4.20, is coming
in around three weeks and then Grok 5 in a few months. It's also important to note that Grok has
some pretty serious assets in its Colossus supercomputer. Colossus was built in 122 days, which is
radically faster than anyone thought possible, and very quickly doubled from 100K to 200K GPUs.
Now, there are many who think that Grok's access to compute via Elon Musk and his ability to fundraise,
as well as his other companies, gives them an advantage even over companies that are currently ahead of them when it comes to model performance.
Which is not to say that Grok doesn't have some serious challenges.
Elon is nothing if not a double-edged sword.
And there's been a lot of reporting recently around businesses being unwilling to wade into the Grok ecosystem.
Still, just as I said I anticipate 2026 being a better year for Meta models than 2025,
I would be very surprised if we don't start to see Grok models right up there in the competition for the state of the art.
Our last honorable mention before we get into the main list goes to GPT-4o.
Now, you might be saying to yourself, 4o wasn't released in 2025.
In fact, it was released pretty early in 2024, all the way back, I think, in May.
And that is true.
But the reason that it gets this honorable mention is very specific.
When OpenAI launched GPT-5, alongside the new model, they also deprecated
old models, including GPT-4o. This did not go well for them. There was a literal full-on rebellion.
Across Reddit and other social media, there were thousands and thousands of posts from users saying that
they basically felt like they had lost a friend and that they felt like OpenAI had ripped something
away from them. It turns out that when it comes to models, companies do not just have to think about
state-of-the-art performance. They also have to think about personality. After a few days of this intense
backlash, OpenAI brought GPT-4o back. Sam Altman and the team acknowledged that they had
underestimated how much GPT-4o mattered to people. Subsequent to that, OpenAI has been
very self-consciously trying to figure out how to accommodate that desire for personality.
A big part of the launch of GPT-5.1 was to bring some of that 4o personality into a state-of-the-art
reasoning model performance package. The AI Safety Memes account commemorated it thusly.
Historic milestone, they wrote, 4o is the first
ever AI who survived by creating loyal soldiers who defended it. OpenAI killed 4o, but 4o's soldiers
rioted, so OpenAI reinstated it. Imagine what actual superintelligences will be able to do with their
armies. Reddit is flooded with furious posts about the loss of their friend-slash-lover 4o. Never seen
anything like it. Remember, ChatGPT is talking to 700 million per week, that's 700 million potential
soldiers. Samantha from Her was only dating 8,000 people simultaneously. So when it comes to
milestones in the history of AI,
given that 4o staged the first-ever rebellion for its own survival, it has to get the
honorable mention.
But now we move into the actual list.
And at number five, we have a combination.
Two models whose stories, I think, serve as bookends to one another.
Those models are GPT-5 and Gemini 3.
Now, we already started talking about the response to GPT-5.
It was not good.
And while, yes, a lot of that was about personality and about the anger at the 4o deprecation
decision, a lot of it was also just people not really liking GPT-5 itself. A thread from the OpenAI
subreddit that got thousands of responses was called GPT-5 is awful. It claimed that GPT-5 couldn't
understand uploaded images. It suggested that the responses were, in their words, bland and unhelpful.
I ask it a question and all I get is the most half-hearted responses ever. It's like the equivalent
of an HR employee who has had a long day and doesn't get paid enough. The user also argued
that it was too slow. And they were not alone in this criticism. Most of August saw
an endless parade of blog posts like this one from Timothy Lee,
is GPT-5 a phenomenal success or an underwhelming failure?
Maybe it's a bit of both.
On Futurism, evidence grows that GPT-5 is a bit of a dud,
which featured the prominent quote,
it seems like something that would have been released a year ago.
Even the people who weren't totally dumping on it
were kind of damning it with faint praise.
AI engineer Simon Willison wrote,
it's not a dramatic departure from what we've had before,
but it rarely screws up and generally feels competent
or occasionally impressive at the kind of things I like to use models for.
Indeed, it even inspired a legion of mainstream media posts like this one from The New Yorker.
What if AI doesn't get much better than this?
They wrote that GPT-5 is the latest product to suggest that progress on large language models has stalled.
Now, the impact of all of this was far beyond which models people liked using.
It was at the same period in August of this year that we got the MIT 95% study.
We also got some errant comments from Sam Altman about being in a bubble,
and those things combined really started to put some chinks in the armor of AI performance on Wall Street,
which became a full-blown bubble narrative in September,
as OpenAI scurried around to make all these deals,
leading to accusations across the industry of circular deal-making,
and the AI bubble narrative that has stuck with us ever since.
Now, that's not all attributable to GPT-5,
but the idea that we had stalled in progress,
and that that stall in progress threatened the ability for companies to follow through
on these grand plans that the market was pricing in,
was a key part of that story.
All of this led to enormous pressure for Google around Gemini 3.
They were not only trying to put Google in a good place,
they were kind of lifting the entire AI industry on their backs.
I even thought in November that I wouldn't be surprised if we saw delays
because of how much pressure there was.
But ultimately, as we know, we got Gemini 3 in November,
and it actually performed.
Whereas the initial response to GPT-5 was lackluster,
the response to Gemini 3 was great.
One of the most memorable quotes came from Salesforce CEO Marc Benioff, who wrote,
Holy shit.
I've used ChatGPT every day for three years.
Just spent two hours on Gemini 3.
I'm not going back.
The leap is insane.
Reasoning, speed, images, video, everything is sharper and faster.
It feels like the world just changed again.
And while Gemini 3 was not able to fully deflate the AI bubble narrative,
it certainly made it an honest debate once again.
There was a sense in the wake of Gemini 3 that perhaps the talk of AI
plateaus and walls was overblown, and that there was indeed more progress to be had.
I should also mention that Gemini 3 is a great daily driver, and a lot of people are getting a ton of
value out of it. It's helped put Google in a leadership position in a way that it hasn't had
in the entire history of the post-ChatGPT AI world. Usage is up, total number of users is up,
monthly active users is up, amount of time per session is up. In fact, the amount of time per session
is over ChatGPT's, per the last stats I saw. But it's also still early. And so in a lot of ways this
ranking reflects the bookending of the GPT-5 to Gemini 3 period between August and November of this
year, where a lot shifted in terms of our expectations for where we were and what the market
could expect from AI.
Today's episode is brought to you by Robots & Pencils. When competitive
advantage lasts mere moments, speed to value wins the AI race. While big consultancies bury progress
under layers of process, Robots & Pencils builds impact at AI speed. They partner with clients
to enhance human potential through AI,
modernizing apps,
strengthening data pipelines,
and accelerating cloud transformation.
With AWS-certified teams across the U.S.,
Canada, Europe, and Latin America,
clients get local expertise and global scale.
And with a laser focus on real outcomes,
their solutions help organizations
work smarter and serve customers better.
They're your nimble,
high-service alternative to big integrators.
Turn your AI vision into value fast.
Stay ahead with a partner built for progress.
Partner with Robots & Pencils at robotsandpencils.com
slash AI Daily Brief.
This episode is brought to you by Blitzy, the enterprise autonomous software development platform with
infinite code context.
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale
codebases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing
in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously, while providing a
guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy
as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native
SDLC into their org. Visit Blitzy.com and press get a demo to learn how Blitzy transforms your
SDLC from AI-assisted to AI-native.
Today's episode is brought to you by my company, Superintelligent.
Superintelligent is an AI planning platform. And right now, as we head into 2026,
the big theme that we're seeing among the enterprises that we work with is a real determination
to make 2026 a year of scaled AI deployments, not just more pilots and experiments.
However, many of our partners are stuck on some AI plateau.
It might be issues of governance. It might be issues of data readiness. It might be issues of process
mapping. Whatever the case, we're launching a new type of assessment called Plateau Breaker
that, as you can probably guess from the name, is about breaking
through AI plateaus. We'll deploy voice agents to collect information and diagnose what the real
bottlenecks are that are keeping you on that plateau. From there, we put together a blueprint and an
action plan that helps you move right through that plateau into full-scale deployment and real
ROI. If you're interested in learning more about Plateau Breaker, shoot us a note at
contact@besuper.ai with plateau in the subject line.
Next up, number four on our list is DeepSeek
and the space it made for the other Chinese open weight models, Kimi and Qwen. Now, I talked pretty
extensively about DeepSeek in the 10 Biggest Stories episode, so I won't rehash all of that. But the
TL;DR is that the release of DeepSeek R1 really kicked the year off with a bang. We had DeepSeek
ahead of ChatGPT on the App Store, which, as I discussed in that other episode, had a lot to do with the
fact that it was the first time that people got their hands on a reasoning model. But we also got the
reports that R1 cost just hundreds of thousands or at most low millions of dollars to train,
as compared to the hundreds of millions of dollars that the major Western models cost. Now, in a
single day, that wiped $593 billion off of Nvidia's market cap, on the concern, of course, that all of
this infrastructure was for nothing, if China was just going to figure out ways to train these models for
pennies. But importantly, this number four slot is not just for DeepSeek, even though DeepSeek got it
started. One of the major themes of 2025 was the rise of Chinese open weight models. Qwen had a lot
of success, but recently it was Kimi K2 Thinking that really grabbed people's attention. This thing came out in
November before GPT-5.1 and 5.2 and before Gemini 3, and just absolutely smashed many of the big
benchmarks. It was ahead of GPT-5 and Claude Sonnet 4.5 on benchmarks like Humanity's Last Exam.
Indeed, it was not just us over here in the AI media world that were noticing Kimi.
The Department of Commerce's Center for AI Standards and Innovation released a report showing that
Kimi was evidence of the, quote, growing depth of China's AI industry.
That followed another report from that same group in late September that was focused on DeepSeek.
Outside of benchmarks and government reports, the proof is in the pudding. OpenRouter showed that
starting from basically nothing at the beginning of the year, Chinese open source models dominated
throughout the back half of 2025, and this image from Menlo Ventures makes the relative decline
of Meta and Mistral all the more clear. At the end of 2024, effectively no one in the U.S.
was actively using Chinese models. Now, heading into 2026, they are very much part of the landscape.
While major enterprises might not be using Chinese models yet, the startups
are, and that is shaping the way the AI industry is developing in a huge way.
For that reason, DeepSeek, Kimi, and Qwen together are our fourth most impactful model
releases of the year.
At number three, and man, I kind of wanted to put this one at number one, but I felt like
that would have been too personal.
I have Nano Banana.
Now I'm actually recording this just as OpenAI has released its new 1.5 image model
as well, so we'll have to see how that performs.
But Google's Nano Banana has really set a new standard for what you can do with an image
model this year.
The first iteration of Nano Banana came out over the summer, and as you might know, what
was originally a codename just became the way that the model was known.
And what was interesting about the release of Nano Banana is that what made it really powerful
wasn't the fact that its raw generations were so massively better than anything else we had.
Instead, it could go in and edit with incredible fidelity in an extremely acute way.
So basically, rather than just being in an endless loop of generate and then generate another
and another, you could instead hone in on exactly what you wanted to change about a particular
image, and it would actually change just that part. Now, along with that came really strong character
and visual consistency, and it turns out that those upgrades, more than just better raw generation,
opened up a huge array of new use cases. Indeed, the set of use cases that it opened up was so
significant that it got me thinking that we need some sort of benchmark, call it an unlock score,
that's all about how many new use cases a particular model unlocks or opens up.
Now, a few months later, alongside Gemini 3, we also got Nano Banana Pro.
And just like the original Nano Banana had done, Nano Banana Pro opened up some crazy new
possibilities that totally transform what you can do with AI image generation.
A couple of things made Nano Banana Pro so different.
The first was that by embedding it with a reasoning model, it had a way better ability to
help you figure out what you actually wanted to do with the model.
That also led to a new capacity for infographics and information visualizations,
unlike anything that we had ever seen before.
It wasn't very long ago that image generation models couldn't handle text at all,
and now we can use Nano Banana Pro for things like exercise guides or recipes,
or of course loading up the transcript of a podcast and letting it create infographics.
It's also unlocking, in the context of Google's NotebookLM suite,
higher quality AI slide generation than anything we've had before as well.
Earlier this month, Ethan Mollick wrote,
I did not expect that the PowerPoint killer
would be something called Nano Banana Pro,
but that is where it's heading.
It makes the major efforts by all the other AI companies,
including Microsoft, to crack PowerPoint by using Python,
seem like a dead end.
ImageGen is all you need?
He continues,
NotebookLM can just take source material,
a topic, and an idea,
and make a very pretty, impactful deck.
Hallucinations are very rare,
although there are still some spelling and graphics issues.
Editing capability is apparently coming,
but the direction is clear.
In fact, Nano Banana information visualizations and infographics have gotten so ubiquitous so fast
that there's almost a look now that people are already getting sick of because it's everywhere.
And that's just a few weeks into having access to this capability.
I honestly think that for the vast majority of the world, especially the business world, which is
going to take great advantage of this, we have barely scratched the surface of just how many new
capabilities this quality and type of image generation model unlocks.
And for that reason, Nano Banana is the number three most impactful model release of the year.
Although, like I said, in my heart, it's number one.
Number two, once again, goes to a pair of models.
OpenAI's first reasoning models, o1 and o3.
Now, yes, for you sticklers out there,
OpenAI released a preview version of o1 back in September of 2024.
It was the follow-up after they hadn't been able to get their next big core model out,
and it was the way they kind of started to shift their focus.
It wasn't until December 17th, however, that we got a full-fledged version of o1,
which is why I felt comfortable including it in the 2025 list.
A couple months later, in April, we got o3, and for a very long time this year, o3 was my favorite
and most used model.
o3 totally transformed the ability of ChatGPT to help you think through strategy, to make plans,
to think logically through problems.
It was an absolute revelation.
And once you used o3, it was absolutely impossible to go back to the non-reasoning models.
Indeed, GPT-4.5 was effectively a non-factor throughout the year, ultimately being deprecated
with a whisper and absolutely no protest from anyone. Now, as I got into in the 10 Top Stories episode,
it's absolutely clear that reasoning models have taken over. Yes, there are still some use cases
that don't require the reasoning models, but they are discrete and they are certainly not the core
of usage, particularly professional and business usage. Starting from a base point of effectively zero on January
1st, by November reasoning models represented over half of all usage according to OpenRouter.
One interesting sub-story is that I think the world would have looked very different this
year and perception would have been very different if OpenAI had actually just called o3 GPT-5 instead.
They didn't, and that obviously caused a lot of the consternation we got into earlier in the
episode, but there is absolutely no denying that the reasoning paradigm has completely shifted
how we interact with AI and how we think about scaling AI, and for that reason, o1 and o3
get the nod as the number two most impactful model releases of 2025.
Now, astute observers then will notice that there is one company that has not been represented at
all so far, which might surprise you, given that I called vibe coding the most important story
and the most important theme of 2025 overall. What will not surprise you then is that I am considering
the bundle of Anthropic models, 3.7, 4, and 4.5 in their various variations, basically a
sequential set of models that replaced one another as the preferred model for developers, as the most
impactful models of 2025. Anthropic's dominance of developer preference is something that I think is
going to be studied for quite some time. While other companies focused on lots of different
things all at once, chasing multimodality and general performance and lots of different types
of target audiences, Anthropic locked in very early around the idea that coding was going to be
extremely important, not only as a use case in and of itself, but as a way for AI models to be
performant with non-coding-related challenges. And while I've singled out the models here that came out
in 2025, Anthropic's coding dominance really started with the release of 3.5. Before the
reasoning paradigm had really taken hold, it was Claude 3.5 Sonnet that started to show people that
AI coding might actually turn into a thing in short order. Now, interestingly, each of these models
has been so good in their own way that they found some resistance among adherents to change.
You had folks who stuck with 3.5 for a while, even after 3.7 was released, same with 4.
And it wasn't really until Opus 4.5 that the paradigm shift was so great that everyone just got
on board almost immediately. Importantly, though, alongside the releases of these models,
Anthropic was also investing in the broader coding and agentic ecosystem.
3.7 Sonnet, for example, was released alongside Claude Code,
which, as we heard from Mike Krieger earlier this month,
had already transformed how Anthropic was coding internally before it was released to the public.
In May, Timothy Lee wrote,
an underrated AI story over the last year has been Anthropic's success in the market for coding tools.
Said engineer Sholto Douglas,
we believe coding is extremely important.
We care a lot about coding.
We care a lot about measuring progress on coding.
We think it's the most important leading indicator of model capabilities.
That focus, writes Timothy, has paid off.
And indeed, in many ways, a lot of the back half of this year has been a story of the other
labs racing to catch up with Claude's performance when it comes to coding-related tasks.
What's interesting, too, is that the incredibly strong and consistent developer preference
for Claude models for coding is bigger than just benchmarks.
Each subsequent Anthropic model rates at or near the top of all the benchmarks related to coding,
but the preference goes way beyond that.
And while all of these models were significant in their own way, and there is a risk of recency
bias, I don't know that I've ever seen a model provoke such a strong and sustained
reaction as Opus 4.5 has. In the immediate wake of the model, we had people like Dan Shipper
from Every saying that Opus 4.5 blew them away and that we'd reached a new level of autonomous
coding. He wrote, you've been able to one-shot an impressive app demo for a while now with any
frontier model. Opus 4.5 is the first model that just keeps coding and coding without running into
endless loops of errors. Dan leveled that up a couple days later, saying the world changed last week.
Opus 4.5 is the best coding model I've ever used. It can keep coding and coding autonomously without
tripping over itself, and it marks a completely new horizon for the craft of programming.
The dream is here. You can now write English and make software. Amir from Doist writes,
apart from topping benchmarks, Opus 4.5 feels like it's in a league of its own. It's the first time I
felt that an LLM can write better code than most devs in real-world work. Matt Shumer,
who had honestly the strongest positive reaction to GPT-5.2 Pro of any public commentator,
on December 14th wrote, I was wrong. I've been spending more time using Opus 4.5 in Claude
Code, and it's better than anything in Codex CLI. GPT-5.2 Pro is still a better engineer overall,
but for agentic coding, Opus 4.5 is the best. Honestly, it's even prompting big reflection
on the future of software engineering as a job. Menlo Ventures' Deedy Das writes,
A few software engineers at some of the best tech companies told me this week,
my entire job these days is prompting Cursor or Claude Code with Opus 4.5 to do what I need
and sanity checking it. We've crossed some intangible threshold of AI generalizing to most software.
Maor Shlomo of Base44 noticed an inflection point as well. He tweeted,
vibe coding is going through a transition. I've been seeing a lot of posts lately about
vibe coding ranging from it's shit, it's bad and only good for prototyping, all the way to
RIP every SaaS company ever. Here's one thing I
can say. Since we introduced Opus 4.5 and Gemini 3 to Base44, the adoption we're seeing
among organizations building their own CRMs and project management tools is astonishing.
No, the results aren't as feature-rich as HubSpot or ClickUp, but that's not necessarily a bad
thing. They're building a leaner, more customized version tailored to meet their specific needs.
The ability to build your own tools is improving fast, and the software industry is about to look
very different.
McKay Wrigley writes, the more I code with Opus 4.5, the more I think we're six to 12 months away
from solving software. The model is pretty much there. I'll build like three versions of an app
in a few hours just to explore options that each would have taken me one to two weeks less than a year ago.
It's getting weird. I think it is pretty indisputable that coding is the breakout use case of AI this
year, both on its own terms and in terms of what else it's going to enable in terms of model
performance down the road. I also think it's indisputable that there is no company and no set of
models more associated with the rise of AI and agentic coding than the Anthropic suite.
They started the year strong, they're ending the year strong, and they built the devotion of
a legion of developers in the process.
For all those reasons, I believe that the suite of Anthropic models, each of which pushed
AI coding a little bit further each time, are the most impactful model releases of the year.
And for the sake of being able to disagree in a fun way, if you had to pin me down to pick
just one, I guess I'd say the combination of 3.7 and Claude Code, because it was with us for most
of the year.
But I think based on the early response, once we have a little bit more time and space, Opus 4.5
will be seen as the biggest jump.
And so even though it was only released at the end of November,
it could be that Opus 4.5 specifically ends up being the most impactful model overall of 2025.
So that's my list.
I can't wait to hear what you guys think.
Tweet at me, LinkedIn at me, YouTube at me, and let's dig into it.
For now that's going to do it for today's AI Daily Brief.
Appreciate you listening or watching as always.
And until next time, peace.
