The AI Daily Brief: Artificial Intelligence News and Analysis - A Huge Week for AI Models Gets Even Bigger

Starting point is 00:00:00 Today on the AI Daily Brief, OpenAI drops two more advanced models, making this the best week for model releases in a very long time. And before that, on the headlines, Nvidia's blowout earnings absolutely smash the AI bubble bubble. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors: Rovo, Robots and Pencils, Blitzy, and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. Again, it's just $2.99 a month for ad-free. And if you are interested in sponsoring

Starting point is 00:00:40 the show, shoot us a note at sponsors at aidailybrief.ai. Finally, if you are interested in our AI ROI benchmarking study, we are collecting data for just a few more days. Anyone who shares three use cases will get the extended report. You can find that at ROISurvey.com. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Boy, it is so clear to me that the combination of Gemini 3 and these new OpenAI 5.1 Pro and Max models, plus what we're about to hear from Nvidia, is significantly putting a damper on the bubble in AI bubble talk. Nvidia had its earnings call yesterday, and CEO Jensen Huang went right into it. Opening the call, he said, there's been a lot of talk about an AI bubble. From our vantage point, we see something very different. That very

Starting point is 00:01:26 different looked like revenue up 62% compared to last year and reaching $57 billion for the quarter. Profit was a buck 30 per share, and both of these key metrics beat Wall Street expectations. CFO Colette Kress doubled down on Huang's suggestion that Nvidia could see $500 billion in sales next year. In the first 60 seconds of the call, she said, we currently have visibility to half a trillion in Blackwell and Rubin revenue from the start of this year through the end of calendar year 2026. She added later, there's definitely an opportunity for us to have more on top of the 500 billion that we announced. The number will grow. Now, beyond the extremely strong numbers, Huang reinforced how central Nvidia is to every element of the AI stack.

Starting point is 00:02:07 He said, we excel at every phase of AI from pre-training to post-training to inference. Indeed, he provided not just numbers to counter the narrative, but a new narrative. This framing has already been extremely resonant, and so I think it's worth sharing his comments in a little bit more extensive detail. Jensen said, The world is undergoing three massive platform shifts at once, the first time since the dawn of Moore's Law. The first transition is from CPU general purpose computing to GPU accelerated computing. As Moore's Law slows, the world has a massive investment in non-AI software,

Starting point is 00:02:37 from data processing to science and engineering simulations, representing hundreds of billions of dollars in compute and cloud computing spend each year. Many of these applications, which once ran exclusively on CPUs, are now rapidly shifting to CUDA GPUs. Accelerated computing has reached a tipping point. Secondly, AI has also reached a tipping point and is transforming existing applications while enabling entirely new ones. For existing applications, generative AI is replacing classic machine learning in search

Starting point is 00:03:03 ranking, recommender systems, ad targeting, click-through prediction, and content moderation, which are the very foundations of hyperscale infrastructure. Now, he said, a new wave is rising, AI systems capable of reasoning, planning and using tools, from coding assistants like Cursor and Claude Code to radiology tools like Aidoc, legal assistants like Harvey and AI chauffeurs like Tesla FSD and Waymo. These systems mark the next frontier of computing. So there are three massive platform shifts. The transition to accelerated computing is foundational and necessary.

Starting point is 00:03:31 The transition to generative AI is transformational and necessary, supercharging existing applications and business models. And the transition to agentic and physical AI will be revolutionary, giving rise to new applications, companies, products, and services. And to bring it back to Nvidia, he pointed out simply, Blackwell sales are off the charts and cloud GPUs are sold out. Compute demand keeps accelerating and compounding across training and inference, each growing exponentially.

Starting point is 00:03:53 We've entered the virtuous cycle of AI. The AI ecosystem is scaling fast, with more new foundation models, more AI startups across more industries and in more countries. AI is going everywhere, doing everything all at once. Now keep in mind, these record revenues came with zero sales into China, and Nvidia is currently forecasting zero sales in perpetuity. Nvidia also responded directly to Michael Burry's short thesis regarding the rapid depreciation of chips, noting that A100s from six years ago are still in operation at 100% utilization rates. Ultimately, markets liked what they heard. Brian Mulberry of Zacks Investment Management said, Markets are reacting very positively to the news that there is no slack in AI momentum. And indeed, Nvidia stock was up 4% in overnight trading, and the beaten-down neoclouds Nebius Group and CoreWeave were up 10% and 8% respectively.

Starting point is 00:04:40 Vital Knowledge wrote that the report, quote, should quiet the skeptics and help clear the path for a year-end rally. There are certainly pockets of the AI space where valuations needed to take a breather, but Nvidia is not in that camp. Next up, staying on the chip theme, but moving a little bit geopolitical, the U.S. has agreed to supply advanced AI chips into the Middle East. According to Bloomberg sources, the administration has approved the sale of 35,000 chips to UAE firm G42 and Saudi-owned Humain.

Starting point is 00:05:07 The chips form part of broader bilateral deals that include prohibitions on diverting hardware to China. The news comes, of course, as Saudi officials arrive in Washington for an investment forum. President Trump has said that $270 billion worth of deals are being signed between dozens of private companies. And while those deals do span multiple sectors, AI was of course one of the key cornerstones. Among the deals was a partnership between xAI and Humain to develop a 500 megawatt data center in Saudi Arabia using Nvidia chips. On stage with Jensen Huang, Elon Musk stumbled over the size

Starting point is 00:05:35 of the announcement, quipping, the 500 gigawatt one will have to wait, as that'll be $8 billion. Now, we're expected to get a lot more on AI from the White House in the days to come. President Trump apparently plans to roll out a new AI initiative known as the Genesis mission as part of an executive order to be announced on Monday. Speaking at a conference in Tennessee on Wednesday, Department of Energy Chief of Staff, Carl Coe, said the administration views the AI race as being just as important as the Manhattan Project or the space race. He said, we see the Genesis mission as equivalent. Coe didn't provide many further details, but said the order would likely direct national labs

Starting point is 00:06:08 to do more work on emerging AI technologies and could include public-private partnerships. In addition to the Genesis mission, the administration is planning an executive order that would ban states from passing their own AI regulation. According to a draft document leaked to the press, the executive order would empower the Justice Department to challenge state AI laws in court. Government lawyers would be instructed to argue that state laws are unconstitutional on the basis that they restrict interstate commerce. A new AI litigation task force would be established with the sole purpose of pursuing these lawsuits against the states. In addition, the Commerce Department would be ordered to withhold federal broadband funding to states that pass their own AI legislation. Trump hinted at the order during the Investment Conference on Wednesday stating, We are going to work it so that you'll have a one approval process to not have to go through

Starting point is 00:06:49 50 states. Republican lawmakers are also looking to insert a moratorium on state AI laws into the must-pass National Defense Authorization Act, which will come to a vote in December. Moving out of the realm of policy and into the practical, OpenAI has launched ChatGPT for Teachers. The new version of the ChatGPT UX features a secure workspace for teachers to create classroom materials and optimize their prep time. It also includes account management for school and district leaders to ensure compliance with privacy regulations. OpenAI is using the service to demonstrate how the features they've added this year can be utilized by teachers. They highlight the use of memory to ensure ChatGPT remembers curriculum details and preferred formatting for lesson plans.

Starting point is 00:07:26 Teachers will also be able to make use of new ChatGPT integrations like Canva and Microsoft 365 to create presentations and documents natively in ChatGPT. OpenAI is also providing a prompt library designed to get teachers off to a fast start. The service will be provided for free to all verified U.S. teachers K through 12 until the summer of 2027, including unlimited use of GPT-5.1. Lastly today, AI music startup Suno has officially raised another $250 million at a $2.45 billion valuation. The round was led by Menlo Ventures with participation from Hallwood Media, Lightspeed, Matrix,

Starting point is 00:08:00 and Nvidia. Now, interestingly, the large record labels weren't included in this announcement and don't appear to be on Suno's cap table as of yet. Universal, Warner, and Sony filed a copyright infringement lawsuit against Suno and Udio in June of last year, and you might remember that Warner and Udio finalized their settlement on Wednesday, with the companies partnering on an AI remixing platform to be released next year. Earlier reports suggested Suno was also moving towards a settlement with the record labels

Starting point is 00:08:24 looking for an equity stake as part of the deal. Instead, it appears that Suno will continue to fight the lawsuit on the basis that music generated by their models doesn't use samples and therefore doesn't infringe on copyright. Menlo's Deedy Das writes, Suno is so much more than a neat tool to generate music. Students use Suno to remember schoolwork, indie movie makers use it for soundtracks, parents customize birthday songs for their kids, and Suno songs even made top music charts. Now, in addition to the raise, Suno also disclosed that they'd reached $200 million in revenue. That puts them in the same

Starting point is 00:08:54 echelon as Lovable and Replit as some of the fastest growing startups in AI. I did a whole episode about why Suno tells such an important story for AI. In short, the vast majority of that revenue is not spend that was previously going to working musicians heading over to Suno, although certainly with certain types of behavior that's part of it. Still, the vast majority is just individual consumer use because people love it. It is net new revenue for a net new behavior. Michael Mignano of Lightspeed writes, I see a lot of people on this website surprised by Suno's success. It's actually very simple. Everyone loves music, but only a few could make music. Now everyone can make music. And I think he might be right. In any case,

Starting point is 00:09:35 that is going to do it for today's headlines. Next up, the main episode. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management standard, premium, and enterprise subscriptions. Know the feeling when AI turns from tool to teammate. If you

Starting point is 00:10:24 Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at ROV as in victory o.com. Small, nimble teams beat bloated consulting every time. Robots and Pencils partners with organizations on intelligent, cloud-native systems powered by AI. They uncover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy. As an AWS-certified partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner. With teams across the U.S., Canada, Europe, and Latin America, clients gain local expertise and global scale. As AI evolves, they ensure you keep pace with change. And that means faster results, measurable outcomes, and a partnership built to

Starting point is 00:11:09 last. The right partner makes progress inevitable. Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to

Starting point is 00:11:47 complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Today's episode is brought to you by my company Superintelligent. You've got a hundred what-if ideas, but which one becomes an agent? Superintelligent maps every AI use case across your company and helps you create an agent plan that you can actually execute.

Starting point is 00:12:21 We match opportunities to your tech stack, your data profile, and your team. No more guesswork, just a clear path from pilot to production. If you want agents that deliver business outcomes, start with planning. Go to besuper.ai and sign up for a demo. Welcome back to the AI Daily Brief. Boy, did this turn into just a hell of a week. Today we're talking about OpenAI's response to Gemini 3, but we're also talking about what I think will start to happen in the wake of this week, which is a bit of a recalibration in the larger narrative around AI as well. First, though, let's start with the new model releases. When we got GPT-5.1,

Starting point is 00:13:02 which frankly no one was really expecting, it became clear that OpenAI knew that Gemini 3 was coming out very, very soon. Now, 5.1, as I've said numerous times, was a major update. It was not a nothing update at all. On the one hand, 5.1 brought more personality back to the model, trying to appeal to the 4o people who had been so mad when GPT-5 came out and felt much more clinical to them, but it also has felt to many, just frankly, a big step up in capabilities from GPT-5. I know on a personal level I have significantly increased the amount of time that I've been collaborating with it on brainstorming and creative and strategic ideation since 5.1 dropped. Likewise, it was notable that the pre-Gemini 3 drop did not include a pro version, leading many

Starting point is 00:13:45 to speculate that that would be OpenAI's fast follow to Gemini 3. I'm not sure that people thought it would be this fast to follow, though. And as it turns out, it was not just 5.1 Pro that we got, but in fact, even more emphasis yesterday was placed on a new coding model, GPT-5.1 Codex Max. In their announcement post, OpenAI writes, GPT-5.1 Codex Max is built on an update to our foundational reasoning model, which is trained on agentic tasks across software engineering, math, research, and more. GPT-5.1 Codex Max is faster, more intelligent, and more token-efficient at every stage of the development cycle,

Starting point is 00:14:19 and a new step towards becoming a reliable coding partner. Codex Max, they say, is built for long-running, detailed work, and one of the big new innovations is this new process they call compaction. They write, it's our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task. This unlocks project scale refactors, deep debugging sessions, and multi-hour agent loops. In other words, this model is not only designed for raw capabilities, but it's designed to improve performance in the specific context in which

Starting point is 00:14:52 it is going to operate as not just a coding assistant, but as an autonomous coding agent. Now, as with any model release, we got some benchmarks. And remember, this is a model that is very specifically designed for the purpose of coding. Introducing the benchmarks, they reinforced that it was trained on real-world software engineering tasks, including PR creation, code review, and front-end coding. And in so doing, Codex Max represents a major jump from 5.1 Codex High on both SWE-Lancer as well as Terminal-Bench. The value, however, isn't just in output. It's also in token efficiency. For example, they write, On SWE-bench Verified, Codex Max with medium reasoning achieves better performance than GPT-5.1 Codex

Starting point is 00:15:28 with the same reasoning effort while using 30% fewer thinking tokens. They also announced that they're introducing a new extra-high reasoning effort for non-latency-sensitive tasks, i.e. tasks that can run for a long period of time. Overall, then, you're getting better results and more efficient performance.
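A note for readers of this transcript: the compaction idea discussed in this part of the episode, pruning older history while preserving the most important context, can be pictured with a toy sketch. This is not OpenAI's implementation, which the announcement says is trained natively into the model. Everything here is an illustrative assumption: the `Message` type, the word-count stand-in for a tokenizer, and the placeholder `summarize` helper, which a real agent would replace with a model-written summary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words.
    return len(text.split())


def summarize(messages: list[Message]) -> Message:
    # Hypothetical placeholder: a real agent would ask the model to write
    # the summary. Here we keep the first 40 characters of each message.
    head = "; ".join(m.text[:40] for m in messages)
    return Message("summary", f"[compacted {len(messages)} messages] {head}")


def compact(history: list[Message], budget: int, keep_recent: int = 2) -> list[Message]:
    # If the history fits the budget (or is too short to split), leave it alone.
    total = sum(count_tokens(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    # Otherwise replace everything except the most recent messages
    # with a single summary message.
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent
```

In a long-running agent loop, a function like `compact` would be called whenever the history approaches the context limit, so the task can continue in a fresh window with a condensed record of earlier work.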

Starting point is 00:15:43 And it's clear from the blog post that this is a model that's designed to expand the universe of what's possible with AI and agentic coding. In a section called long-running tasks, OpenAI writes, Compaction enables Codex Max to complete tasks that would have previously failed due to context window limits, such as complex refactors and long-running agent loops, by pruning its history while preserving the most important context over long horizons. The ability to sustain coherent work over long horizons is a foundational capability on the path towards more general, reliable

Starting point is 00:16:11 AI systems. Ultimately, they claim that Codex Max can work independently for hours at a time. Indeed, they say, in our internal evaluations, we've observed Codex Max work on tasks for more than 24 hours. They conclude, Codex Max shows how far models have come in sustaining long horizon coding tasks, managing complex workflows, and producing high-quality implementation with far fewer tokens. Finally, they conclude with some statistics. Internally, they say 95% of their engineers use Codex weekly, and the engineers that do ship roughly 70% more pull requests since adopting Codex. So that's the official blog post. Other members of OpenAI's team focused on different parts. Researcher Noam Brown used it as a chance to reinforce a message which has been coming up all week,

Starting point is 00:16:51 Pre-training hasn't hit a wall, he writes, and neither has test-time compute. Ethan Mollick points out in a theme we'll come back to, 5.1 Codex was released six days ago, now we have 5.1 Codex Max. The use of every naming scheme piled on top of each other from version numbers to qualifiers like Max makes it hard to see how big a deal each release is, but this looks like a big jump in ability. Peter Gostev tested it against a prompt to create an application that allows you to view the Golden Gate Bridge from various angles,

Starting point is 00:17:18 and said, this is definitely the best I ever got out of this type of prompt by far. METR's measurement of long time-horizon tasks, which is of course this chart that we've been following very closely as a fast visual cue to understand shifts in capabilities, showed that Codex Max was able to complete tasks that take a human programmer two hours and 42 minutes with a 50% success rate. That's 25 minutes longer than GPT-5, which was the previous state of the art, although Grok 4.1 and Gemini 3 have not yet been tested. What all of this adds up to, by the way, on the METR test is that the time horizon for agentic capabilities is still doubling roughly every seven months, but due to a slight inflection point somewhere around the release of o3, the time horizon of

Starting point is 00:17:56 capabilities for the state of the art has actually tripled since the release of Claude 3.7 Sonnet in February. Now, people have not had a lot of time to digest this, but a lot of folks are jumping on this idea of compaction and what it might mean for context windows in the long run. And indeed, you get the sense that a lot of the innovations in Codex Max were basically OpenAI trying out things that it wants to bring to general purpose AI in what they perceive as the most competitive and highest value use case area right now, which is AI coding. Now, Simon Willison pointed out, despite Codex Max, the quote, bigger news today may actually be GPT-5.1 Pro. Although, as he points out, that one didn't even get a blog post. It just got this tweet.

Starting point is 00:18:34 OpenAI actually retweeted its announcement of GPT-5.1 from last week, saying GPT-5.1 Pro is rolling out today to all pro users. It delivers clearer, more capable answers for complex work, with strong gains in writing help, data science, and business tasks. Now, despite it not having a lot of release hullabaloo, there were some people who had early access to it. Professor Derya Unutmaz writes, I can confidently say 5.1 Pro has raised the level of my favorite model, GPT-5 Pro, by a significant notch. He gave an example where he asked both 5 Pro and 5.1 Pro about the top unanswered questions in immunology, requesting that both models unpack each question clearly so that someone without an immunology degree could understand

Starting point is 00:19:12 their importance. He concludes, 5.1 Pro is clearly better in that someone without an immunology background can more easily understand these explanations, with the importance and potential payoff clearly spelled out. They are also more self-contained, more visual, and more accessible while still being deep. Content creator Theo had tweeted back on November 17th, just had my mind absolutely melted by redacted, can't wait to talk about it, and responded yesterday, OpenAI just quietly released GPT-5.1 Pro, and this is the redacted I was talking about. Matt Shumer did not mince words. He said, I've had access to GPT-5.1 Pro for the last week. It's an effing monster, easily the most capable and impressive model I've ever used.

Starting point is 00:19:52 But he says it's not all positive. His review ultimately is called an absolute monster but trapped in the wrong interface. His summary reads, 5.1 Pro is a slow, heavyweight reasoning model. When given really tough problems, it feels smarter than anything else I've used. Instruction following is the standout. It actually does what you ask for without going off the rails. For serious coding, it feels less like an assistant

Starting point is 00:20:12 and more like a contract engineer working from a spec. It is ridiculously smart. It genuinely feels like a better reasoner than most humans, and I expect examples within days of it solving problems people thought were out of bounds for today's AI systems. However, he said there are still areas where it loses to Gemini 3, and there are interface issues. He writes,

Starting point is 00:20:30 Front-end and UX design are still far worse than Gemini 3, and the biggest weakness is the interface. It lives in ChatGPT, not in my IDE, not wired into my existing tools. This friction is beyond limiting and frustrating. He says, for most day-to-day work, Gemini 3 is just better, waiting 10 minutes for an answer in a separate interface,

Starting point is 00:20:46 is not ideal. For anything that requires deep thought, planning, and research, and anything that I need to get right on the first try, I reach for 5.1 Pro. Ethan Mollick pointed out, OpenAI feels like it undersells GPT-5 Pro, which is still the model that is most likely to deliver serious value on very hard problems. Partially it is because these hard problems are complicated, so they're hard to describe to others. Now, Ethan also points out the right comparison is probably not Gemini 3, but Gemini 3 Deep Think, but still it is interesting that 5 Pro has always had a bit of a shroud of mystery when it comes to the right use cases. One other person who had early access to 5.1 Pro is Simon Smith. He wrote, I was invited to alpha test 5.1 Pro alongside experts in robotics, math, immunology, medicine,

Starting point is 00:21:28 music, and more. My focus was life science commercial research and strategy and some personal use cases. Having used 5.1 Pro for a few days, I find it more like a human domain expert than 5 Pro, with clearer writing, better judgment, fewer tangents, stronger synthesis, and more emotion. I ran 5.1 Pro head-to-head against 5 Pro on work tasks like scientific literature synthesis, drug launch planning, and social media analysis. I also tried it for personal financial planning and even journaling. It was more rigorous and comprehensive in research and planning, stronger at reasoning, better at staying on track and avoiding tangents (and, in at least one case, associated errors), and much clearer, more confident, and more empathetic in its communication style. Now, he does point out that

Starting point is 00:22:09 it's still bad at certain things. He said that it's not good at creating professional quality presentations or Excel spreadsheets, and he said, I saw that at least one tester found the model conservatively avoided tackling known open problems in STEM domains, choosing instead to explain why they're open problems. Ultimately, he says it's about a 10 to 15% jump over 5 Pro for the types of things he uses it for, and he says, knowing OpenAI's focus on real-world performance like GDPval and reports of it hiring domain experts in fields like finance, I think human domain expertise is exactly what they're going for, and with 5.1 Pro, they're getting closer. That bodes well for AI doing even more impactful work in 2026.

Starting point is 00:22:46 Now to zoom out here, I think the obvious surface-level story is something like OpenAI cracks back in the week that Google wanted to dominate with Gemini 3. And to some extent that's the case, although it's pretty clear that OpenAI is not trying to steal Gemini's general thunder with this, or at least knows that it's not possible with these models, but instead, they chose to release the two update models that are most specifically about very discrete types of work. They are showing off some new approaches, or at least newly named approaches like this compaction that hint at where the future of general models is headed and suggest that there is still much, much more territory to be claimed. Indeed, interestingly,

Starting point is 00:23:22 I think that these releases, in a weird way, are much less about trying to win back momentum from Google and much more about leaning into Google's momentum more broadly. Taken alongside Nvidia's earnings report, you can feel the embers of a little bit of a shift in the AI narrative. For a couple of months now, markets have been flirting with the idea that AI is just a big bubble. And one of the things that they've been looking for as evidence is, of course, plateaus or walls in the ability of these models to continue to improve. The story of this week, as investor Gavin Baker points out, is that Gemini 3 shows that scaling laws for pre-training are intact. He says this is the most important AI data point since the

Starting point is 00:24:04 release of o1. Now, he gets into why that is, which is a topic that we'll explore in an episode later this week. But for our purposes here today, I think that takeaway one from these new models from OpenAI is that we all just got even more new tools to play with. And two, in some ways, this week wasn't about competition, but about all the model companies, including Grok with 4.1, standing shoulder to shoulder and telling all of the skeptics, just wait to see what comes next. That's going to do it for today's AI Daily Brief. Thanks for listening or watching as always.

Starting point is 00:24:35 And until next time, peace.
