The AI Daily Brief: Artificial Intelligence News and Analysis - A Huge Week for AI Models Gets Even Bigger
Episode Date: November 21, 2025

OpenAI followed Gemini 3 with a major one-two punch, dropping both GPT-5.1 Pro and the new Codex Max coding model, while NVIDIA's blockbuster earnings smashed lingering AI bubble talk and reinforced the sense that capability curves are still accelerating across the stack. Today's episode breaks down what the new models actually do, why compaction matters, how early testers are reacting, how NVIDIA reframed the entire AI market in a single earnings call, and why this entire week may mark an inflection point in the narrative around AI progress.

Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, OpenAI drops two more advanced models, making this the best week for model releases in a very long time.
And before that, on the headlines, Nvidia's blowout earnings absolutely smash the AI bubble bubble.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors.
Rovo, Robots & Pencils, Blitzy, and Superintelligent.
To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can
subscribe on Apple Podcasts. Again, it's just $2.99 a month for ad-free. And if you are interested in sponsoring
the show, shoot us a note at sponsors@aidailybrief.ai. Finally, if you are interested in our AI ROI benchmarking
study, we are collecting data for just a few more days. Anyone who shares three use cases will get the
extended report. You can find that at ROISurvey.com. Welcome back to the AI Daily Brief Headlines
edition, all the daily AI news you need in around five minutes. Boy, it is so clear to me that the
combination of Gemini 3 and these new OpenAI GPT-5.1 Pro and Codex Max models, plus what we're about to hear
from Nvidia, is significantly putting a damper on the AI bubble talk. Nvidia had its
earnings call yesterday, and CEO Jensen Huang went right into it. Opening the call, he said there's been
a lot of talk about an AI bubble. From our vantage point, we see something very different. That very
different looked like revenue up 62% compared to last year, reaching $57 billion for the quarter.
Profit was $1.30 per share, and both of these key metrics beat Wall Street expectations.
CFO Colette Kress doubled down on Huang's suggestion that Nvidia could see $500 billion in sales next year.
In the first 60 seconds of the call, she said, we currently have visibility to half a trillion in Blackwell and Rubin revenue from the start of this year through the end of calendar year 2026.
She added later, there's definitely an opportunity for us to have more on top of the 500 billion that we announced.
The number will grow.
Now, beyond the extremely strong numbers, Huang reinforced how central Nvidia is to every element
of the AI stack.
He said, we excel at every phase of AI from pre-training to post-training to inference.
Indeed, he provided not just numbers to counter the narrative, but a new narrative.
This framing has already been extremely resonant, and so I think it's worth sharing his
comments in a little bit more extensive detail.
Jensen said, The world is undergoing three massive platform shifts at once, the first time since
the dawn of Moore's Law.
The first transition is from CPU general purpose computing to GPU accelerated computing.
As Moore's law slows, the world has a massive investment in non-AI software,
from data processing to science and engineering simulations,
representing hundreds of billions of dollars in compute and cloud computing spend each year.
Many of these applications, which once ran exclusively on CPUs, are now rapidly shifting
to CUDA GPUs.
Accelerated computing has reached a tipping point.
Secondly, AI has also reached a tipping point and is transforming existing applications
while enabling entirely new ones.
For existing applications, generative AI is replacing classic machine learning in search
ranking, recommender systems, ad targeting, click-through prediction, and content moderation,
which are the very foundations of hyperscale infrastructure.
Now, he said, a new wave is rising, AI systems capable of reasoning, planning and using tools,
from coding assistants like Cursor and Claude Code to radiology tools like Aidoc,
legal assistants like Harvey and AI chauffeurs like Tesla FSD and Waymo.
These systems mark the next frontier of computing.
So there are three massive platform shifts.
The transition to accelerated computing is foundational and necessary.
The transition to generative AI is transformational and necessary,
supercharging existing applications and business models.
And the transition to agentic and physical AI will be revolutionary,
giving rise to new applications, companies, products, and services.
And to bring it back to Nvidia, he pointed out simply,
Blackwell sales are off the charts and cloud GPUs are sold out.
Compute demand keeps accelerating and compounding across training and inference,
each growing exponentially.
We've entered the virtuous cycle of AI. The AI ecosystem is scaling fast, with more new foundation models, more AI startups across more industries and in more countries.
AI is going everywhere, doing everything all at once. Now keep in mind, these record revenues came with zero sales into China, and Nvidia is currently forecasting zero sales in perpetuity.
Nvidia also responded directly to Michael Burry's short thesis regarding the rapid depreciation of chips, noting that A100s from six years ago are still in operation at 100% utilization rates.
Ultimately, markets liked what they heard.
Brian Mulberry of Zacks Investment Management said,
Markets are reacting very positively to the news that there is no slack in AI momentum.
And indeed, Nvidia stock was up 4% in overnight trading,
and the beaten-down neoclouds Nebius Group and CoreWeave were up 10% and 8% respectively.
Vital Knowledge wrote that the report, quote,
should quiet the skeptics and help clear the path for a year-end rally.
There are certainly pockets of the AI space where valuations needed to take a breather,
but Nvidia is not in that camp.
Next up, staying on the chip theme, but moving a little bit geopolitical,
the U.S. has agreed to supply advanced AI chips into the Middle East.
According to Bloomberg sources, the administration has approved the sale of 35,000 chips
to UAE firm G42 and Saudi-owned Humain.
The chips form part of broader bilateral deals that include prohibitions on diverting
hardware to China.
The news comes, of course, as Saudi officials arrive in Washington for an investment forum.
President Trump has said that $270 billion worth of deals are being signed
between dozens of private companies.
And while those deals do span multiple sectors, AI was of course one of the key cornerstones.
Among the deals was a partnership between xAI and Humain to develop a 500 megawatt data center
in Saudi Arabia using Nvidia chips. On stage with Jensen Huang, Elon Musk stumbled over the size
of the announcement, quipping, the 500 gigawatt one will have to wait, as that'll be $8 billion.
Now, we're expected to get a lot more on AI from the White House in the days to come.
President Trump apparently plans to roll out a new AI initiative known as the Genesis mission
as part of an executive order to be announced on Monday.
Speaking at a conference in Tennessee on Wednesday, Department of Energy Chief of Staff,
Carl Coe, said the administration views the AI race as being just as important as the Manhattan Project or the space race.
He said, we see the Genesis mission as equivalent.
Coe didn't provide many further details, but said the order would likely direct national labs
to do more work on emerging AI technologies and could include public-private partnerships.
In addition to the Genesis mission, the administration is planning an executive order that would ban states from passing their own AI regulation.
According to a draft document leaked to the press, the executive order would empower the Justice Department to challenge state AI laws in court.
Government lawyers would be instructed to argue that state laws are unconstitutional on the basis that they restrict interstate commerce.
A new AI litigation task force would be established with the sole purpose of pursuing these lawsuits against the states.
In addition, the Commerce Department would be ordered to withhold federal broadband funding to states that pass their own AI legislation.
Trump hinted at the order during the Investment Conference on Wednesday stating,
We are going to work it so that you'll have a one approval process to not have to go through
50 states. Republican lawmakers are also looking to insert a moratorium on state AI laws into the
must-pass National Defense Authorization Act, which will come to a vote in December.
Moving out of the realm of policy and into the practical, OpenAI has launched ChatGPT
for Teachers. The new version of the ChatGPT UX features a secure workspace for teachers
to create classroom materials and optimize their prep time. It also includes account
management for school and district leaders to ensure compliance with privacy regulations.
OpenAI is using the service to demonstrate how the features they've added this year can be utilized by teachers.
They highlight the use of memory to ensure ChatGPT remembers curriculum details and preferred formatting for lesson plans.
Teachers will also be able to make use of new ChatGPT integrations like Canva and Microsoft 365
to create presentations and documents natively in ChatGPT.
OpenAI is also providing a prompt library designed to get teachers off to a fast start.
The service will be provided for free to all verified U.S. teachers K through 12 until the summer of 2027,
including unlimited use of GPT-5.1.
Lastly today, AI music startup Suno has officially raised another $250 million at a $2.45 billion
valuation.
The round was led by Menlo Ventures with participation from Hallwood Media, Lightspeed, Matrix,
and Nvidia.
Now, interestingly, the large record labels weren't included in this announcement and don't
appear to be on Suno's cap table as of yet.
Universal, Warner, and Sony filed a copyright infringement lawsuit against Suno and Udio in June
of last year.
And you might remember that Warner and Udio finalized their settlement on Wednesday,
with the companies partnering on an AI remixing platform to be released next year.
Earlier reports suggested Suno was also moving towards a settlement with the record labels
looking for an equity stake as part of the deal.
Instead, it appears that Suno will continue to fight the lawsuit on the basis that music generated by their models
doesn't use samples and therefore doesn't infringe on copyright.
Menlo's Deedy Das writes,
Suno is so much more than a neat tool to generate music.
Students use Suno to remember schoolwork, indie moviemakers use it for soundtracks, parents customize
birthday songs for their kids, and Suno songs have even made top music charts. Now, in addition to the
raise, Suno also disclosed that they'd reached $200 million in revenue. That puts them in the same
echelon as Lovable and Replit as some of the fastest-growing startups in AI. I did a whole episode
about why Suno tells such an important story for AI. In short, the vast majority of that
revenue is not spend that was previously going to working musicians heading over to Suno,
although certainly with certain types of behavior that's part of it. Still, the vast majority
is just individual consumer use because people love it. It is net new revenue for a net new
behavior. Michael Mignano of Lightspeed writes, I see a lot of people on this website surprised
by Suno's success. It's actually very simple. Everyone loves music, but only a few could make music.
Now everyone can make music. And I think he might be right. In any case,
that is going to do it for today's headlines. Next up, the main episode. Meet Rovo, your AI-powered
teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or
build your own agent with studio. Rovo is powered by your organization's knowledge and lives on Atlassian's
trusted and secure platform, so it's always working in the context of your work. Connect Rovo to
your favorite SaaS app so no knowledge gets left behind. Rovo runs on the teamwork graph, Atlassian's
intelligence layer that unifies data across all of your apps and delivers personalized AI insights
from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard,
premium, and enterprise subscriptions. Know the feeling when AI turns from tool to teammate. If you
rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com.
Small, nimble teams beat bloated consulting every time. Robots & Pencils
partners with organizations on intelligent, cloud-native systems powered by AI. They uncover human needs,
design AI solutions, and cut through complexity to deliver meaningful impact without the layers of
bureaucracy. As an AWS-certified partner, Robots & Pencils combines the reach of a large
firm with the focus of a trusted partner. With teams across the U.S., Canada, Europe, and Latin America,
clients gain local expertise and global scale. As AI evolves, they ensure you keep pace with change.
And that means faster results, measurable outcomes, and a partnership built to
last. The right partner makes progress inevitable. Partner with Robots & Pencils at
robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise
Autonomous Software Development Platform with infinite code context. Blitzy uses thousands of specialized
AI agents that think for hours to understand enterprise-scale code bases with millions of lines
of code. Enterprise engineering leaders start every development sprint with the Blitzy platform,
bringing in their development requirements. The Blitzy platform provides a plan, then generates and
pre-compiles code for each task. Blitzy delivers 80% plus of the development work
autonomously while providing a guide for the final 20% of human development work required to
complete the sprint. Public companies are achieving a 5x engineering velocity increase when
incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of
choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn
how Blitzy transforms your SDLC from AI-assisted to AI-native.
Today's episode is brought to you by my company Superintelligent.
You've got a hundred what-if ideas, but which one becomes an agent?
Superintelligent maps every AI use case across your company and helps you create an agent
plan that you can actually execute.
We match opportunities to your tech stack, your data profile, and your team.
No more guesswork, just a clear path from pilot to production.
If you want agents that deliver business outcomes, start with planning.
Go to besuper.ai and sign up for a demo.
Welcome back to the AI Daily Brief. Boy, did this turn into just a hell of a week.
Today we're talking about OpenAI's response to Gemini 3, but we're also talking about what I think
will start to happen in the wake of this week, which is a bit of a recalibration in the larger
narrative around AI as well. First, though, let's start with the new model releases. When we got GPT-5.1,
which frankly no one was really expecting, it became clear that OpenAI knew that Gemini 3 was coming out
very, very soon. Now, 5.1, as I've said numerous times, was a major update. It was not a nothing
update at all. On the one hand, 5.1 brought more personality back to the model, trying to appeal to
the 4o people who had been so mad when GPT-5 came out and felt much more clinical to them,
but it has also felt to many, just frankly, like a big step up in capabilities from GPT-5.
I know on a personal level I have significantly increased the amount of time that I've been
collaborating with it on brainstorming and creative and strategic ideation since 5.1 dropped.
Likewise, it was notable that the pre-Gemini 3 drop did not include a pro version, leading many
to speculate that that would be OpenAI's fast follow to Gemini 3.
I'm not sure that people thought it would be this fast to follow, though.
And as it turns out, it was not just 5.1 Pro that we got; in fact, even more emphasis
yesterday was placed on a new coding model, GPT-5.1 Codex Max.
In their announcement post, OpenAI writes,
GPT-5.1 Codex Max is built on an update to our foundational reasoning model,
which is trained on agentic tasks across software engineering, math, research, and more.
GPT-5.1 Codex Max is faster, more intelligent, and more token-efficient at every stage of the development cycle,
and a new step towards becoming a reliable coding partner.
Codex Max, they say, is built for long-running, detailed work,
and one of the big new innovations is this new process they call compaction.
They write,
It's our first model natively trained to operate across multiple context windows through a process
called compaction, coherently working over millions of tokens in a single task. This unlocks project-scale
refactors, deep debugging sessions, and multi-hour agent loops. In other words, this model is not only
designed for raw capabilities, but it's designed to improve performance in the specific context in which
it is going to operate, as not just a coding assistant, but as an autonomous coding agent. Now, as with any
model release, we got some benchmarks. And remember, this is a model that is very specifically designed
for the purpose of coding. Introducing the benchmarks, they reinforced that it was trained on real-world
software engineering tasks, including PR creation, code review, and front-end coding. And in so doing,
Codex Max represents a major jump from 5.1 Codex High on both SWE-Lancer and Terminal-Bench.
The value, however, is not just in output. It's also in token efficiency. For example, they write,
On SWE-bench Verified, Codex Max with medium reasoning
achieves better performance than GPT-5.1 Codex
with the same reasoning effort
while using 30% fewer thinking tokens.
They also announced that they're introducing
a new extra-high reasoning effort
for non-latency-sensitive tasks,
i.e. tasks that can run for a long period of time.
Overall, then, you're getting better results
and more efficient performance.
And it's clear from the blog post
that this is a model that's designed to expand
the universe of what's possible with AI and agentic coding.
In a section called long-running tasks,
OpenAI writes, Compaction enables Codex Max to complete tasks that would have previously failed due to
context window limits, such as complex refactors and long-running agent loops, by pruning its history
while preserving the most important context over long horizons. The ability to sustain coherent
work over long horizons is a foundational capability on the path towards more general, reliable
AI systems. Ultimately, they claim that Codex Max can work independently for hours at a time.
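The quoted description of compaction stays high level (prune history, preserve the most important context, keep working across context windows), so what follows is only a toy sketch of that general idea, not OpenAI's method. It assumes one token per whitespace-separated word, and `summarize` is a hypothetical stand-in that keeps the first few words of each older message, where a real agent would presumably have the model write the summary. The names `compact`, `summarize`, and `run_agent_loop` and the budget values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: a crude proxy of one token per whitespace-separated word.
    return len(text.split())


def history_tokens(history: list[Message]) -> int:
    return sum(count_tokens(m.text) for m in history)


def summarize(messages: list[Message], words_per_message: int = 3) -> Message:
    # Hypothetical stand-in for a model-written summary: keep only the first
    # few words of each older message. Lossy by design, like any summary.
    parts = [" ".join(m.text.split()[:words_per_message]) for m in messages]
    return Message("summary", " | ".join(parts))


def compact(history: list[Message], budget: int, keep_recent: int = 2) -> list[Message]:
    """Fold everything except the newest messages into one summary when over budget.

    Note: this does not guarantee the result fits if the recent messages
    alone exceed the budget; a fuller version would handle that case.
    """
    if history_tokens(history) <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent


def run_agent_loop(steps: int, budget: int) -> list[Message]:
    # Each step appends a 42-token message; compaction keeps history bounded.
    history: list[Message] = []
    for i in range(steps):
        history.append(Message("assistant", f"step {i}: " + "edited file " * 20))
        history = compact(history, budget)
    return history


if __name__ == "__main__":
    final = run_agent_loop(steps=50, budget=100)
    print(len(final), history_tokens(final))  # prints: 3 91
```

Running the loop for 50 steps keeps the history at three messages (one running summary plus the two newest steps), which illustrates the trade-off in the quote: the work can keep going while the context stays bounded, at the cost of some detail lost in each summary.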
Indeed, they say, in our internal evaluations, we've observed Codex Max work on tasks for more than 24
hours. They conclude, Codex Max shows how far models have come in sustaining long horizon
coding tasks, managing complex workflows, and producing high-quality implementation with far fewer
tokens. Finally, they close with some statistics. Internally, they say 95% of their engineers
use Codex weekly, and the engineers that do ship roughly 70% more pull requests since adopting
Codex. So that's the official blog post. Other members of OpenAI's team focused on different parts.
Researcher Noam Brown used it as a chance to reinforce a message which has been coming up all week.
Pre-training hasn't hit a wall, he writes, and neither has test-time compute.
Ethan Mollick points out, in a theme we'll come back to,
5.1 Codex was released six days ago, now we have 5.1 Codex Max.
The use of every naming scheme piled on top of each other, from version numbers to
qualifiers like Max, makes it hard to see how big a deal each release is,
but this looks like a big jump in ability.
Peter Gostev tested it against a prompt to create an application that allows you to
view the Golden Gate Bridge from various angles,
and said, this is definitely the best I ever got out of this type of prompt by
far. On METR's measurement of long time-horizon tasks, which is of course this chart that we've been
following very closely as a fast visual cue to understand shifts in capabilities, results show that
Codex Max was able to complete tasks that take a human programmer two hours and 42 minutes with a 50%
success rate. That's 25 minutes longer than GPT-5, which was the previous state of the art, although
Grok 4.1 and Gemini 3 have not yet been tested. What all of this adds up to, by the way, on the METR
test is that the time horizon for agentic capabilities is still doubling roughly every seven months,
but due to a slight inflection point somewhere around the release of o3, the time horizon of
capabilities for the state of the art has actually tripled since the release of Claude 3.7
Sonnet in February. Now, people have not had a lot of time to digest this, but a lot of folks
are jumping on this idea of compaction and what it might mean for context windows in the long run.
And indeed, you get the sense that a lot of the innovations in Codex Max were basically OpenAI
trying out things that it wants to bring to general-purpose AI in what they perceive as the most
competitive and highest-value use case area right now, which is AI coding.
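The METR figures quoted in this segment lend themselves to a quick back-of-the-envelope check. The sketch below assumes simple exponential growth in time horizon, h(t) = h0 * 2^(t/T) with doubling time T, and treats February to November as roughly nine months; both are simplifying assumptions for illustration, not METR's own methodology. It converts "two hours and 42 minutes" into minutes, recovers the GPT-5 figure implied by "25 minutes longer", and asks what doubling time a tripling over nine months would imply.

```python
import math


def implied_doubling_time(growth_factor: float, months: float) -> float:
    """Doubling time T (in months) such that 2 ** (months / T) == growth_factor."""
    return months * math.log(2) / math.log(growth_factor)


def project_horizon(start_minutes: float, months: float, doubling_months: float) -> float:
    """Time horizon after `months` of steady exponential growth."""
    return start_minutes * 2 ** (months / doubling_months)


# Figures from the episode: "two hours and 42 minutes", "25 minutes longer than GPT-5".
codex_max_minutes = 2 * 60 + 42
gpt5_minutes = codex_max_minutes - 25

# Assumption: roughly nine months between February and mid-November 2025.
ELAPSED_MONTHS = 9
fast_doubling = implied_doubling_time(growth_factor=3.0, months=ELAPSED_MONTHS)
trend_growth = project_horizon(1.0, ELAPSED_MONTHS, doubling_months=7)

print(f"Codex Max: {codex_max_minutes} min, GPT-5: {gpt5_minutes} min")
print(f"Tripling in {ELAPSED_MONTHS} months -> doubling every {fast_doubling:.1f} months")
print(f"A 7-month doubling over {ELAPSED_MONTHS} months -> x{trend_growth:.2f}")
```

Under those assumptions, the figures come out to 162 minutes for Codex Max and 137 minutes for GPT-5, and tripling in nine months implies a doubling time of about 5.7 months, whereas the seven-month trend alone would predict only about 2.4x growth over the same period. That gap is one way to read the "slight inflection point" mentioned in the episode.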
Now, Simon Willison pointed out that, despite Codex Max, the, quote, bigger news today may actually
be GPT-5.1 Pro. Although, as he points out, that one didn't even get a blog post. It just got this tweet.
OpenAI actually retweeted its announcement of GPT-5.1 from last week, saying GPT-5.1 Pro is rolling out
today to all Pro users. It delivers clearer, more capable answers for complex work, with strong gains in
writing help, data science, and business tasks. Now, despite it not having a lot of release hullabaloo, there were some people who
had early access to it. Professor Derya Unutmaz writes, I can confidently say 5.1 Pro has raised
the level of my favorite model, GPT-5 Pro, by a significant notch. He gave an example where he asked
both GPT-5 Pro and 5.1 Pro about the top unanswered questions in immunology, requesting that both
models unpack each question clearly so that someone without an immunology degree could understand
their importance. He concludes, 5.1 Pro is clearly better in that someone without an
immunology background can more easily understand these explanations, with the importance and potential
payoff clearly spelled out. They are also more self-contained, more visual, and more accessible while
still being deep. Content creator Theo had tweeted back on November 17th, just had my mind absolutely
melted by redacted, can't wait to talk about it, and responded yesterday. OpenAI just quietly
released GPT-5.1 Pro, and this is the redacted I was talking about. Matt Shumer did not mince words.
He said, I've had access to GPT-5.1 Pro for the last week.
It's an effing monster, easily the most capable and impressive model I've ever used.
But he says it's not all positive.
His review ultimately is called an absolute monster but trapped in the wrong interface.
His summary reads,
5.1 Pro is a slow, heavyweight reasoning model.
When given really tough problems, it feels smarter than anything else I've used.
Instruction following is the standout.
It actually does what you ask for without going off the rails.
For serious coding, it feels less like an assistant
and more like a contract engineer working from a spec.
It is ridiculously smart.
It genuinely feels like a better reasoner than most humans,
and I expect examples within days of it solving problems
people thought were out of bounds for today's AI systems.
However, he said there are still areas where it loses to Gemini 3,
and there are interface issues.
He writes,
Front-end and UX design are still far worse than Gemini 3,
and the biggest weakness is the interface.
It lives in ChatGPT, not in my IDE,
not wired into my existing tools.
This friction is beyond limiting and frustrating.
He says, for most day-to-day work,
Gemini 3 is just better,
waiting 10 minutes for an answer in a separate interface
is not ideal. For anything that requires deep thought, planning, and research, and anything that I need
to get right on the first try, I reach for 5.1 Pro. Ethan Mollick pointed out, OpenAI feels like it undersells
GPT5 Pro, which is still the model that is most likely to deliver serious value on very hard problems.
Partially it is because these hard problems are complicated, so they're hard to describe to others.
Now, Ethan also points out the right comparison is probably not Gemini 3, but Gemini 3 Deep Think,
but still it is interesting that 5 Pro has always had a bit of a shroud of mystery when it comes
to the right use cases. One other person who had early access to 5.1 Pro is Simon Smith. He wrote,
I was invited to alpha test 5.1 Pro alongside experts in robotics, math, immunology, medicine,
music, and more. My focus was life science commercial research and strategy and some personal
use cases. Having used 5.1 Pro for a few days, I find it more like a human domain expert than 5 Pro,
with clearer writing, better judgment, fewer tangents, stronger synthesis, and more emotion.
I ran 5.1 Pro head-to-head against 5 Pro on work tasks like scientific literature synthesis,
drug launch planning, and social media analysis. I also tried it for personal financial planning and even
journaling. It was more rigorous and comprehensive in research and planning, stronger at reasoning,
better at staying on track and avoiding tangents, and in at least one case associated errors,
much clearer, more confident, more empathetic in its communication style. Now, he does point out that
it's still bad at certain things. He said that it's not good at creating professional quality
presentations or Excel spreadsheets, and he said, I saw that at least one tester found the model
conservatively avoided tackling known open problems in STEM domains, choosing instead to explain
why they're open problems. Ultimately, he says it's about a 10 to 15% jump over 5 Pro for the types
of things he uses it for, and he says, knowing OpenAI's focus on real-world performance like
GDPval and reports of it hiring domain experts in fields like finance, I think human domain expertise
is exactly what they're going for, and with 5.1 Pro, they're getting closer. That bodes
well for AI doing even more impactful work in 2026.
Now to zoom out here, I think the obvious surface-level story
is something like OpenAI claps back in the week that Google wanted to dominate with Gemini 3.
And to some extent that's the case, although it's pretty clear that OpenAI is not trying
to steal Gemini's general thunder with this, or at least knows that it's not possible with
these models, but instead, they chose to release the two updated models that are most specifically
about very discrete types of work. They are showing off some new approaches, or at least newly
named approaches like this compaction that hint at where the future of general models is headed
and suggest that there is still much, much more territory to be claimed. Indeed, interestingly,
I think that these releases, in a weird way, are much less about trying to win back momentum
from Google and much more about leaning into Google's momentum more broadly. Taken alongside
Nvidia's earnings report, you can feel the embers of a little bit of a shift in the AI narrative.
For a couple of months now, markets have been flirting with the idea
that AI is just a big bubble. And one of the things that they've been looking for as evidence
is, of course, plateaus or walls in the ability of these models to continue to improve.
The story of this week, as investor Gavin Baker points out, is that Gemini 3 shows that scaling
laws for pre-training are intact. He says this is the most important AI data point since the
release of o1. Now, he gets into why that is, which is a topic that we'll explore in an episode
later this week. But for our purposes here today, I think the first takeaway
from these new models from OpenAI is that we all just got even more new tools to play with.
And two, in some ways, this week wasn't about competition, but about all the model companies,
including Grok with 4.1, standing shoulder to shoulder and telling all of the skeptics,
just wait to see what comes next.
That's going to do it for today's AI Daily Brief.
Thanks for listening or watching as always.
And until next time, peace.
