The AI Daily Brief: Artificial Intelligence News and Analysis - The 10 Biggest AI Stories of 2025

Episode Date: December 22, 2025

From DeepSeek’s shockwave debut and the trillion-dollar AI infrastructure buildout to the bubble debate, the MIT enterprise adoption backlash, the AI talent wars, and the rise of reasoning, agents, and vibe coding, this episode walks through the 10 defining AI stories that shaped 2025 and set the trajectory for 2026, including why agent infrastructure quietly became the most important foundation of the year and how next-leap models like Gemini 3, Opus 4.5, and GPT-5.2 reset expectations for what’s coming next. In the headlines: DeepSeek R1, Project Stargate, the AI bubble debate, enterprise ROI myths, talent wars, reasoning models, vibe coding, agent infrastructure, and next-generation models.

Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, the 10 biggest AI stories of 2025. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Superintelligent, Robots & Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts, and for all of the information that you could possibly be looking for about the show, sponsorship, speaking, etc., go to aidailybrief.ai. Now, we are in the early stages of our end-of-year coverage.
Starting point is 00:00:44 From here on out, most of our episodes will be either looking back or looking forward. And today we're starting with the 10 biggest AI stories of 2025. Now, these are not in ranked order. Instead, I put them in a combination of a linear and narrative sequence, but I will call out when I hit my vote for the biggest story of the year. And we're going to kick off with the very first big story of the year, which was the absolute hullabaloo around the release of DeepSeek R1. Now, DeepSeek started to have models that people were paying attention to at the end of 2024. But in January, when they released their first reasoning model, R1, everyone stood up and took notice.
Starting point is 00:01:22 There were a couple of reasons for that. First of all, while all the American labs were spending hundreds of millions, if not billions, of dollars to train their models, DeepSeek was saying that R1 was trained for just a few million dollars. On top of that, however, alongside the model, DeepSeek also released their very own chatbot app, and it rocketed to the top of the App Store charts, even displacing ChatGPT for a while. As markets tried to digest the news, there was a deep sell-off of AI stocks. Nvidia lost $593 billion in market cap in a single day, the single biggest one-day loss in stock history. Now, of course, markets recovered, but this DeepSeek story set up so many of the
Starting point is 00:02:06 themes that would shape the rest of the year. One that we'll discuss in a few minutes is the rise of reasoning. Part of what made the DeepSeek application so popular was that while OpenAI had released their o1 reasoning model at that point, and while o1 remained ahead of what you could get with DeepSeek R1, o1 was at the time entirely behind a paywall, so the vast majority of people had never seen a reasoning model. They were delighted both with the reasoning traces that DeepSeek exposed in their app, as well as just the differentiated quality of the results. Of course, that market squirm would portend everything that we've been dealing with for the past five months around the AI bubble debate. And from a lasting legacy perspective, one thing that
Starting point is 00:02:43 was absolutely true about DeepSeek was that Chinese models were much closer to nipping at the heels of Western closed-source models than the vast majority of people had thought coming into the year. That has played out throughout the year, with models like Qwen and Kimi, as well as later DeepSeek models, being right up in the thick of things as some of the best models available. You can see Kimi K2 and DeepSeek 3.2 behind Gemini 3, GPT-5.2, and Opus 4.5, but ahead of pretty much everything else. It would also kick off a back-and-forth debate around the appropriate U.S. policy vis-a-vis China that has continued to be dynamic throughout the year, with the latest big change, of course, being the Trump White House deciding to allow Nvidia to sell H200 chips into China,
Starting point is 00:03:21 the most advanced chip we've allowed to be sold to China in a number of years. All in all, the DeepSeek story started 2025 off with a bang, and it has not let up ever since. Our second big AI story for the year also kicked off in January, which was the massive AI infrastructure buildout. It started oh so innocently, just OpenAI and a couple of friends like SoftBank, MGX, and Oracle announcing their intention to invest a half trillion dollars over the next four years to build AI infrastructure in the United States. The initiative was called Project Stargate, and it was announced at the White House on Tuesday, January 21st, with President Trump in attendance along with Oracle founder Larry Ellison, OpenAI CEO Sam Altman, and SoftBank CEO Masayoshi
Starting point is 00:04:01 Son. Now, of course, since then, the AI infrastructure deals have done nothing but increase throughout the year. We have seen a massive amount of hyperscaler capex and expansion, with basically every major company, Microsoft, Google, Amazon, Meta, all increasing their guidance around their capex for '25 and '26. We saw initiatives like the Global AI Infrastructure Investment Partnership between BlackRock, Microsoft, MGX, and others, which was a $100 billion investment vehicle focused on data centers and the electricity to power them. We had Elon Musk's xAI Colossus expansion, which sees that company attempting to scale from their current 100,000 GPUs to a million GPUs or more. And of course, with all this data center buildout, there are also going to be energy requirements,
Starting point is 00:04:41 leading to announcements like the Google and NextEra Energy partnership, which is an agreement to develop gigawatt-scale data center campuses that have power generation on-site, thanks to an investment in nuclear. Now, as we discussed, this was a theme throughout the year, and right up until the end of the summer, it was a major theme driving up stock prices. But then came the Oracle and OpenAI deal. At the end of August, Oracle revealed that it had added
Starting point is 00:05:05 $317 billion in future contract revenue during its quarter that ended August 31st. That led the company's stock price to surge by as much as 43%, temporarily pushing founder Larry Ellison's net worth above even Elon Musk's. When, a couple of days later, it was revealed that OpenAI was the customer driving about $300 billion of that, markets started to get a little bit more nervous. And this, of course, brings us to our next big story of the year, which is the AI bubble debate.
Starting point is 00:05:32 Now, if we were just looking for what theme or topic was most discussed, particularly in mainstream media, for sure, this is the biggest AI story of the year. Like I said, at least in terms of the amount of sheer ink spilled on it, every week even to now sees an endless stream of AI bubble debate-related articles. And interestingly, a lot of the focus is on Oracle, that big deal with OpenAI, and the debt that they're taking on to finance the buildout. One of the key themes of the bubble conversation is the circularity of revenue. I'm sure you've seen some version of this chart, which shows the dense web of investment
Starting point is 00:06:06 and customer relationships between major companies, including Microsoft, OpenAI, Intel, Oracle, Nvidia, xAI, and AMD. Now, to some, this screams house of cards. To others, it shows the dense web of relationships that is driving the mass AI-ification of the economy writ large. AI bubble talk is so ubiquitous that it now has its very own Wikipedia entry, complete with a section on that circular financing. Now, part of what makes this such a juicy and resonant theme is that it's one that's impossible to prove or disprove in the short term. In other words, even if we are in the midst of an AI bubble, the way that that would
Starting point is 00:06:40 be manifest and problematic, in terms of, for example, OpenAI missing financial obligations with these big deals, is not coming to bear in the short term. That means that it's ripe territory for narrative debates as market actors try to drag participants to their view of the world. Now, one good resource that I've pointed to before, if you are interested in this story, comes from Exponential View, who put together a boom and bubble monitor. This came out of a blog post where they looked at five historic indicators for financial bubbles: economic strain, industry strain, revenue momentum, valuation heat, and funding quality. They have now turned them into a live tracker. Now, at this stage, they argue we are still firmly in boom territory with only one
Starting point is 00:07:21 of the five gauges in the red, which is the industry strain. That said, there is a lot to watch here, and it's a great resource. You can find it at boomorbubble.ai. Now, moving on to our next story, one that I have to begrudgingly include: if the AI bubble debate was the most debated topic of the year, the most referenced media of the year, to my great chagrin, was the MIT report that argued that 95% of generative AI pilots are failing. Now, in my notes about the 10 biggest stories, I called this enterprise adoption and the MIT lie. And while I've talked about the MIT report a lot,
Starting point is 00:07:57 I do want to one more time and for posterity as part of this recap episode, rip it to shreds for the utter garbage that it is. Two big reasons for that. First of all, the methodology, and second of all, the incredible and incorrect leaps of logic that are embedded in the analysis.
Starting point is 00:08:12 So first of all, from a methodology perspective, this study, which I say in the biggest, most aggressive air quotes I can manage, looked at a couple of things. First, it looked at recent earnings reports of public companies who mentioned AI to see if any of them talked about revenue acceleration. It then paired that with around 50 convenience interviews from random executives they apparently had access to. This is the entire methodology for this thing. Not only is that a radically underwhelming data source, but the idea that an organization not mentioning revenue gains from AI in a report means that their pilots are failing is absolutely ludicrous. Again, one would think that with a
Starting point is 00:08:51 headline backed by someone as prestigious as MIT that says that 95% of pilots are failing, you would assume that they asked a bunch of enterprise AI leaders if their pilots were succeeding or failing and 95% of them said that they were failing, right? But no, this is an inference from a missing articulation of revenue gains in earnings reports, and nothing more. Now, if we can be charitable to the study authors for a moment, they obviously didn't know that it was going to have the impact that it had, and it became caught up in something that was much bigger than just the one report. However, still, frankly, it didn't befit the MIT name,
Starting point is 00:09:24 and I do think they should be embarrassed at the quality of their thinking. Now, packing my soapbox away for the rest of the episode, we do have to acknowledge that there was a reason that this report was so resonant. It came into a ready and waiting environment where a combination of factors made this report have an element of confirmation bias. The first was that markets were starting to turn, and this seemed like perfect evidence of why. A huge part of the amplification that happened in the first couple of weeks after this was announced came from Wall Street analysts and investors who were talking about it as part of their assessment of AI markets. But the second thing was,
Starting point is 00:10:01 hold aside the AI bubble debate. A lot of the learnings of 2025 from an enterprise perspective were around the theme that to be good at AI and to really get the value out of this technology, it was going to take more than just dropping a chatbot on top of your people. Obviously, sophisticated organizations never thought it was going to be that simple. But at the time this study came out,
Starting point is 00:10:21 there was the beginning of a broad recognition that, okay, to really get the full value out of AI, we're going to have to think in bigger, more comprehensive, and systemic terms. We're going to have to redesign systems. We're going to have to address data readiness. We're going to have to think about the context we give agents, and that is the real substantive piece that it interacted with. Still, if you want to know
Starting point is 00:10:42 the actual story of enterprise adoption over the course of 2025, it was that even as all that learning and realization that I was just mentioning was happening, that to really get the full value out of AI and agents it was going to take more, adoption was still just a steady line up. And not only was it a steady line up, the AI implementations that were happening were leading to value. In our AI ROI benchmarking study, we found that around 44% of use cases were reporting modest ROI, and about 38% were reporting high ROI of either significant or transformational impact. Only 5%, basically the exact inverse of the MIT study, were reporting negative ROI. And keep in mind, negative ROI does not mean failure. Negative ROI means
Starting point is 00:11:23 failure to reach ROI yet, where the outlay of resources is still higher than the gain from that outlay of resources in the short term. But if you look at leaders who are interacting with AI, 2025 saw their optimism about the value of this technology go nothing but up. Comparing KPMG's global CEO study from 2024 to 2025: in 2024, the majority of CEOs, 63%, said that they expected to see ROI from AI in three to five years. 20% of optimists thought it would be one to three years, and 16% of pessimists thought it was going to be more than five years. By 2025, that had pulled forward massively. Two-thirds of CEOs surveyed in 2025 thought that they would see ROI
Starting point is 00:12:02 within one to three years instead. 19% said that it was just six months to a year away, and now less than 2% thought it was going to take more than five years. Look, I do think it is worth understanding why this MIT report, as bad as it was, struck such a nerve. But when you peel the layers away, the story of enterprise adoption in 2025
Starting point is 00:12:21 is more adoption, starting ROI, and a real recognition that to get the next set of value, it's going to take more work. Sure, there's hype about AI, but KPMG is turning AI potential into business value. They've embedded AI and agents across their entire enterprise to boost efficiency, improve quality, and create better experiences for clients and employees. KPMG has done it themselves. Now they can help you do the same. Discover how their journey can accelerate yours at
Starting point is 00:12:51 www.kpmg.us slash agents. That's www.kpmg.us slash agents. Today's episode is brought to you by my company, Superintelligent. Superintelligent is an AI planning platform. And right now, as we head into 2026, the big theme that we're seeing among the enterprises that we work with is a real determination to make 2026 a year of scaled AI deployments, not just more pilots and experiments. However, many of our partners are stuck on some AI plateau. It might be issues of governance. It might be issues of data readiness. It might be issues of process mapping. Whatever the case, we're launching a new type of assessment called Plateau Breaker, which, as you can probably guess from the name, is about breaking through AI plateaus.
Starting point is 00:13:40 We'll deploy voice agents to collect information and diagnose what the real bottlenecks are that are keeping you on that plateau. From there, we put together a blueprint and an action plan that helps you move right through that plateau into full-scale deployment and real ROI. If you're interested in learning more about Plateau Breaker, shoot us a note at contact@besuper.ai with plateau in the subject line. AI isn't a one-off project. It's a partnership that has to evolve as the technology does. Robots & Pencils works side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization. They prove what
Starting point is 00:14:17 works through applied experimentation and build systems that amplify human potential. As an AWS-certified partner with global delivery centers, Robots & Pencils combines reach with high-touch service. Where others hand off, they stay engaged, because partnership isn't a project plan, it's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots & Pencils at robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases
Starting point is 00:14:56 with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
Starting point is 00:15:32 Our next major story of the AI year has to be the AI talent wars. Now, talent was always valued in AI. That was never a question. Top researchers inside the big labs have for a number of years been making very, very hearty salaries that would make most people extremely happy. However, around the middle of this year, that started to get to new extreme levels as competition between the labs for talent started to ratchet up. Now, a little bit of that was spinouts from the labs who were bringing people along with them. OpenAI's former CTO Mira Murati started her own Thinking Machines Lab, bringing a bunch of talent with her.
Starting point is 00:16:12 Another former OpenAI leader, Ilya Sutskever, started his Safe Superintelligence, once again recruiting a bunch of talent away from the other labs. But where things really heated up was the middle of the summer, when Mark Zuckerberg started recruiting for his superintelligence lab. Reports started coming in of just absolutely crazy offers. In June, Sam Altman said that Meta had offered some OpenAI staff up to $100 million, bragging at the time that no one had taken Meta up on that offer, although that wouldn't last for long. And the numbers just got crazier from there.
Starting point is 00:16:41 We started to see more and more of those nine-figure offers, and people started making the comparison to professional athletes. Sequoia even wrote a piece called Why AI Labs Are Starting to Look Like Sports Teams. Now, in many ways, this culminated with the sort of, but not exactly, acquisition of Scale AI by Meta, which cost Meta $15 billion and seemed like mostly a way for them to get their hands on Scale CEO Alexandr Wang to lead that superintelligence lab. And while the insane headlines about nine-figure deals may have died down over the course of the fall, the AI talent wars continue apace. More recently, what we've been seeing is the gutting of some
Starting point is 00:17:14 incumbents, particularly Apple, who are having an extremely hard time keeping talent right now as their AI strategy flounders. Now, we'll see how this all shakes out heading into 2026, but my guess is that talent is going to continue to be a key battleground for all these labs as we head into next year. From here, we move into some stories that are a little bit more about the substance of AI, rather than the market and the ecosystem around it. The next story is one that is so ubiquitous in surrounding us that it might not even seem like a story as it was just our reality throughout the year. And that's what I'm calling the rise of reasoning.
Starting point is 00:17:45 And I mentioned back in the DeepSeek story that a big part of why their app rocketed to the top of the app charts was that it was the first time that most free AI users, who obviously represent the vast majority of users, had used a reasoning model. And of course, once you use a reasoning model, it is very hard to go back. Towards the end of the year, we got some numbers around this from OpenRouter. OpenRouter is a platform that allows developers to connect their applications to a variety of LLMs,
Starting point is 00:18:08 meaning that they don't necessarily have to be locked into one ecosystem, but there can be model switching based on different needs or based on the models going down, or whatever the reason is. And over the course of the year and 100 trillion tokens, from a starting point of basically zero at the beginning of the year, reasoning tokens now represent over 50% of the total consumed. If you used o3 or Gemini 2.5 Pro or Claude after 3.7, or Gemini 3 or GPT-5, or basically any model in the second half of this year, chances are that by default you were using reasoning models. Now, that said, and the reason that I wanted to call this out as an explicit story, is that while this may be obvious to us in the space, the difference between reasoning
Starting point is 00:18:45 and non-reasoning models is not necessarily widely known outside of AI users. Professor Ethan Mollick referenced a recent study that found that clinical LLMs could ace medical exams, but at the same time perform weakly on realistic clinical tasks. The problem is that the study was using GPT-4 and Claude 3 Opus. Ethan wrote, I hate to keep bringing this up, but studies cannot lump reasoners with earlier models
Starting point is 00:19:07 when considering AI abilities. And while studies don't always need to use the latest models, they should test to see if there are trends in ability as model size scales to anticipate the future. Now, of course,
Starting point is 00:19:16 part of what the reasoning models opened up is our next big story of the year, and the one that if I had to commit to a single biggest story of the year would absolutely be my number one, is the emergence and growing ubiquity of vibe coding. Man, what to say about vibe coding that hasn't already been said.
Starting point is 00:19:33 It started with such humble origins. This tweet from back in February from Andrej Karpathy. He said, there's a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs, e.g. Cursor Composer with Sonnet, are getting too good. When I get error messages, I just copy paste them in with no comment. Usually that fixes it.
Starting point is 00:19:54 The code grows beyond my usual comprehension. I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. I'm building a project or a web app, but it's not really coding. I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works. Now, of course, vibe coding was shorthand for a much broader array of AI and agentic-enabled coding. We saw massive growth in consumer apps like Lovable and Replit, but then we also saw the rise of Cursor and Cognition,
Starting point is 00:20:24 and these tools that were for AI-enabled and agentic coding but for professional developers. Pretty much universally, it's acknowledged that coding became the first and most important use case of gen AI, which was expressed in the numbers. Menlo Ventures, in their annual study of enterprise AI, found that 55% of departmental AI spend, about $4 billion, could be attributed to coding. The next highest category was IT at $700 million. Replit and Lovable both surged ahead of $100 million in ARR and have continued to grow. Meanwhile, Cursor is closing in on $800 million, making these companies some of the fastest growing revenue companies in history. Indeed, vibe coding became so ubiquitous that by the end of the year, the conversation had shifted
Starting point is 00:21:02 a little bit inside and around professional developers and software engineers. That group is now in many cases wrestling with the downsides of vibe coding, whether it's the amount of review that's required or technical debt that gets created, or the atrophy of key coding skills. On top of those issues, there's also just questions of how to design the modern AI coding stack. How much in what context do people want super fast AI assistance versus full automation that does just big chunks of the coding work for them. Whatever the case, model releases throughout the year have shown that for the big model labs, there is nothing more important than the coding use case, with basically all of them seeing it as key not only to unlocking the coding market,
Starting point is 00:21:39 but as key to making AI capable for other general use cases. I think as we head into next year, we're going to start to see a fork in the vibe coding conversation. Right now, we're still talking about AI-enabled and agentic coding that professional software engineers and software engineering organizations are doing with the same set of terminology and in the same breath as what non-technical people are doing with code for the first time. I don't think those are really the same thing, and I think those conversations are going to break apart a bit. I also think, frankly, that as ubiquitous as vibe coding was this year, the impact that it is poised to have in 2026 is even greater. In other words, I don't see this as a trend that will dissipate into all the other things
Starting point is 00:22:18 that you can do with AI. Instead, I think this is a fundamental capability shift that will change how a huge portion of knowledge workers do their work forever going forward. I think we've barely scratched the surface on that, which is why, of course, I'm exploring it through a couple of different interviews throughout the course of these end-of-year episodes. Now, staying on the coding and agent theme a little bit, a lot of people had 2025 pegged as the year of agents. I actually tend to think that was true,
Starting point is 00:22:42 although it meant something different than we thought going into the year. Part of that was that it was the year of coding agents, but another part of that was that a lot of the key events of this year were about agent infrastructure, the rise of context, and the decisions that all of the competing model labs made to go in on the same set of standards in order to all move further, faster. Anthropic introduced the Model Context Protocol at the end of 2024, and it got some initial attention. However, towards the end of February and into March is when it really started to capture people's attention and became a major theme for AI builders everywhere.
Starting point is 00:23:16 MCP, of course, was a way for agents to connect to external services and data sources, greatly expanding what those agents can do. And one of the things that was really interesting is that if you look back at the history of computing, there have often been standards wars that lasted years at a time, where groups who wanted one set of standards fought against groups who wanted another set of standards, all of which ultimately served to slow down overall development in whatever field they were in. That did not happen this year. You could tell as soon as MCP hit that inflection point, that the other labs considered competing and then ultimately decided to just get on board. On March 26, Sam Altman tweeted,
Starting point is 00:23:51 People love MCP and we are excited to add support across our products. On March 30th, Alphabet CEO Sundar Pichai wrote, To MCP or not to MCP, that's the question. Let me know in the comments. He followed up on April 9th with, Love the feedback, to MCP it is. And it wasn't just MCP. Other parts of agent infrastructure also saw similar uptake across the labs.
Starting point is 00:24:12 Also on April 9th, Google announced the Agent-to-Agent protocol. Agent-to-Agent, like it sounds, is an agent communication protocol. It was explicitly framed when it was announced as a complement to MCP, and within a month, you even had Google competitor Microsoft embracing A2A. More recently, we've seen a similar phenomenon with Anthropic's Skills. Skills are a way to give generalized agents access to specialized context, knowledge, or instructions using a file and folder system, and in December, OpenAI started supporting the framework as well. Now, on top of all this agent infrastructure, we also had the emergent discipline of context
Starting point is 00:24:44 engineering. Whereas prompt engineering was all about figuring out the right way to prompt an LLM to get the results that you wanted, context engineering is all about making sure that the LLM has access to the right information or context to do the work that you're hoping to have it do. Taken together, all of this kind of makes 2025 the year of agent infrastructure, which of course sets up 2026 to be the year of agent impact in practice. Now, of course, I'm sure there's a lot more infrastructure still to be built and these lines are ultimately pretty blurry, but I think that this focus on context, and the emergence of and rallying around key agent infrastructure, is a key AI story of the year. Lastly, today in our 10 biggest AI stories of the year, are what I'm
Starting point is 00:25:23 calling collectively the next leap models. With this, I'm referring to Gemini 3, Opus 4.5, and GPT-5.2. Now, I had initially planned to include in this episode my countdown of the most impactful model releases of the year, but obviously at this point you can tell the show is getting pretty long, which I probably could have predicted. And so I'm going to move that into its own episode. But for our purposes here for this last story, when GPT-5 came out, it was for many a big disappointment. In fact, it helped really fuel the AI bubble debate. The chorus of people saying that AI had hit a wall got louder and louder, pointing to GPT-5 as evidence. All of this meant that there was huge pressure on Google leading into the release
Starting point is 00:26:03 of Gemini 3, and it was a challenge that they undeniably met. Google, in fact, released two incredibly important models in November: Gemini 3, as well as their image model, Nano Banana 2. One impact was for Google itself. For the first time, really, since ChatGPT launched, Google appeared to be in the driver's seat across the industry as a whole. But even beyond that, there were impacts on the market as well. Gemini 3 served to counteract the narrative that AI had hit a wall, giving more optimism that we'd see continued growth and adoption, which of course could help justify these big deals over the next five years that markets were trying to figure out how they should price in. Just as the AI community was digesting Gemini 3, Anthropic dropped
Starting point is 00:26:43 its most advanced model, Opus 4.5. Now, it's been a few weeks since this came out, and I don't know that I've ever seen a model that started on such a high note in terms of people's perceptions, and just continued to grow in esteem. Many people have argued that Opus 4.5 represents a fundamental level up when it comes to the coding capabilities of AI. I've seen people reset their timelines and how they think about the future of software engineering jobs because of Opus 4.5. Even for non-software engineers, Opus 4.5's capabilities have found their way into the vibe coding apps like Replit and Lovable, transforming what those platforms can do competently as well. Now, of course, all of this prompted some concern from OpenAI. In the weeks leading up to Gemini 3, an internal memo that was
Starting point is 00:27:24 later leaked saw Altman forecast some rough vibes due to a resurgent Google. The rough vibes memo was upgraded to a full-on code red and a shift in priorities away from a bunch of long-term and short-term efforts to just focus on ChatGPT and new model releases. It was that code red effort that got us an advanced release of GPT-5.2, which, while not necessarily seeing the same universal praise as Gemini 3 and Opus 4.5, certainly has a lot of proponents, including myself, who think that GPT-5.2 Pro is, for my use cases around business strategy, the best model out there. But you take it all together, and this set of next leap models has not only demonstrated that AI development hasn't really hit a plateau, but also leaves us heading into 2026 with veritable superpowers compared to
Starting point is 00:28:07 where we were heading into 2025. And of course, it doesn't appear like that's going to slow down any time soon. We are anticipating at least another OpenAI model in January, and you've got to think that the other labs won't be far behind. And so, friends, that is my list of the 10 biggest AI stories of 2025. Like I said, in another episode, I will count down, and it will actually be a countdown of the most impactful model releases of the year, but for now, that is going to do it for today's AI Daily Brief.
Starting point is 00:28:33 Appreciate you listening or watching, as always, and until next time, peace.
