The AI Daily Brief: Artificial Intelligence News and Analysis - 10 Things GPT-5 Changes

Starting point is 00:00:00 Today on the AI Daily Brief, 10 things that change after GPT-5. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Blitzy and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And if you are interested in sponsoring the show, send us a note at sponsors at aidailybrief.ai. Now, it was obvious, this being GPT-5 week, that the long reads and/or big think episode of the week had to be dedicated to the subject of GPT-5 in some way, shape, or form.

Starting point is 00:00:46 I've obviously been thinking about this topic a lot, and I've been trying to think about what I think the state of play is post-GPT-5, taking into account people's first reactions, how I'm seeing it used, what I think it suggests for the direction of the industry. I came up with a list of 10 things that I think will change or are changing or are just being reinforced now that this model is live. You'll see these aren't big societal pronouncements that relate to the capabilities of GPT-5. This is much more practical and specific and about the state of play in the AI space itself. This is, of course, all just my subjective opinion based on my observations.

Starting point is 00:01:25 And I'll be interested to see what you guys think, especially as you get access to this model and play around a bit. Now, the first thing that I think changes, or perhaps is reinforced, is our sense of LLM progress. There has been a question for some time now of whether we were hitting some sort of plateau. This really started to be a narrative around the end of last year, frankly around this time last year, when we got what felt like delays on GPT-5. There was a lot of scuttlebutt about pre-training just not working as well, and a big shift that came with the introduction of reasoning models and new

Starting point is 00:02:00 approaches to scaling like test-time compute that put the emphasis at the moment of inference rather than on pre-training. Subsequent to that, of course, we got a whole new generation of models like o1 and then o3 and all the reasoning models from the other labs as well that showed that even if the pre-training technique was reaching its limits, there were still lots of areas of development to be had. I think with this model though, even though there are all those other areas of development and places that we are going to continue to get progress, there is broadly a sense that at least this paradigm has hit at least some amount of a plateau. Replit CEO Amjad Masad wrote, can't help but feel the crushing weight of diminishing returns. We need a new S-curve.

Starting point is 00:02:41 AI researcher Jack Morris affirmed this, but also put it more positively. He wrote, Shortest explanation of GPT-5, this is exactly what the scaling laws predicted. The model is better. The returns are diminishing. And sadly, absolute general intelligence improvements will only get smaller. The good news is that there's still so much to do. Personality, reasoning, memory, and creativity are still open problems. And that brings me, I think, to the second point, which is the idea that the emphasis in model improvement is shifting away from just raw capabilities and towards tool usage and how models can interact with the real world.

Starting point is 00:03:20 This was the core subject of Ben Hylak's essay on Latent Space about GPT-5. He summed up on Twitter, GPT-5 is insanely good at using tools. Tools are about to change fundamentally, and this is why OpenAI just released unstructured function calling. For those of you who didn't listen to my intro episode on GPT-5, Ben basically made a comparison to the Stone Age. He argued that what marked the dawn of human intelligence was humans learning how to use tools. As humans, he wrote, we manifest our intelligence through tools. Tools extend our capabilities. We trade internal capabilities for external capabilities. It's the defining characteristic of our intelligence.

Starting point is 00:03:57 GPT-5, he wrote, marks the beginning of the Stone Age for agents and LLMs. GPT-5 doesn't just use tools, it thinks with them, it builds with them. I think Ben is dead on here, and I think a lot of what we're going to see is optimizations and improvements that are designed for how models and their agentic expressions can actually go use tools. One of the things you start to notice now, even as companies present their results on benchmarks, is that they're always going to share models raw, but then also models that use tools. For example, when OpenAI presented GPT-5's performance on Humanity's Last Exam, while it got 24.8%

Starting point is 00:04:37 with no tools, with a full slate of tools including Python and search, it got 42%. In other words, tools represent a whole new frontier of places to get more gains from these models. And so if you are worried that we are maybe on a raw capabilities plateau, at least with current strategies, there is very clearly a ton of new areas and new frontiers to explore that will continue to see greater progress. Moving on now, one of the things that is most clear from the release of GPT-5 is that this is a huge boon for the normies. For people who have mostly only interacted with ChatGPT through whatever model was

Starting point is 00:05:14 default, like 4o, they are going to have their heads spun by some of the capabilities of GPT-5. Dan Shipper from Every had his mom test it out, and she was glowing, saying this is way more comprehensive than the answers I usually get from ChatGPT, the information it gives me is readable and flows really well. The model is gold. Signal argued that the no-model roulette thing is actually bigger than most people, especially average people, will clock. Basically, they argue that the amount of cognitive load it takes for people to understand or try to figure out which model to use was actually even more damaging than it seemed. One of the things we saw when DeepSeek launched earlier this year was that even though the model itself was less performant than the

Starting point is 00:05:52 reasoning models that OpenAI had available, OpenAI wasn't giving those models to people as their base, which means that when people tried DeepSeek, it was the first time they had used a reasoning model, and the experience blew them away. Now that experience is going to be the norm for everyone, and I think you're going to see a massive democratization of a lot of the best capabilities of AI, to huge new pools of audiences that didn't have access to them before. I want to talk specifically about two use case categories that I think get a major boost that way. The first is strategy support or strategic thinking. Since the advent of reasoning models, especially o3, LLMs have for many of us become constant strategic companions. I am literally day in and

Starting point is 00:06:34 day out, weighing different decisions for Superintelligent or for the podcast through the strategic lens of previously o3. Now, of course, this does not mean that I act on the strategy. There are still big gaps, but the reasoning models really have hit a point where they're extremely adept at helping you think more comprehensively and holistically about the types of decisions that you're going to make. GPT-5 improves that meaningfully. First of all, in my early tests, I've seen the reduction of sycophancy that OpenAI worked so hard on manifest as a willingness to take a harder line on decisions, which is a key part of strategic thinking. Whereas previously o3 would try to hedge and explain or justify anything

Starting point is 00:07:13 that I said, it now is much more comfortable actually weighing different options and suggesting a best course of action. Again, that doesn't mean that I'm necessarily any more likely to take it, but it's a lot more instructive and informative and useful to see what the AI actually thinks, not just how it justifies what I think. This type of strategic collaboration was not possible with the non-reasoning models. And for that reason, the vast majority of ChatGPT users have not engaged with the models in that way. I think it will be one of the biggest unlocks for people's personal productivity, for how they manage their careers, for how they manage their part of businesses, to be able to engage with the strategic capabilities of GPT-5.

Starting point is 00:07:53 Now of course, the other even more obvious thing is that we are about to see an absolute explosion in vibe coding. Vibe coding was already an insanely fast-growing area of AI usage, pretty definitively the most important theme of 2025. As I said in my initial coverage, OpenAI made it pretty clear. They believe that about 700 million new weekly vibe coders are going to come online very, very soon. It was not just, in my opinion, that they really wanted to catch up with Anthropic for the sake of professional developers. Yes, that was and is absolutely a goal, but I think that they're

Starting point is 00:08:30 thinking about this more broadly. I think that OpenAI have come to the conclusion, one that I agree with, by the way, that coding is a new lingua franca, and that interacting with code via vibe coding type tools is going to be simply a standard part of using computers in the future. Now, this won't happen all at once. Yes, certainly now that GPT-5 is available, some number of their 700 million weekly users who haven't been vibe coding at all will say things like, hey, build me a game or build me a website, but it'll take a while for that to become normalized. But normalized, I believe it will become. And it's clear with GPT-5 that they're placing a lot of emphasis on that type of beginner usage. A term you hear all over the place with people's first impressions of GPT-5 is one-shotting.

Starting point is 00:09:16 The idea that with very little guidance and very little follow-up, they were able to one-shot some comprehensive thing that code could create. Alam writes, what's truly impressive about GPT-5 one-shotting this game isn't the graphics. It's the flawless prompt adherence and constraint handling. A fully functional, interactable game generated in a single pass. This level of instruction following and code generation is wild. Other people have been sharing all the different things that they've one-shotted with GPT-5, a space simulator, a meditation app, a Duolingo clone, Windows 95 even.

Starting point is 00:09:49 The more that people share these sorts of examples, the more that regular people are going to come online, try it out, and realize that this entirely new capability set that they didn't even consider before is now opened up to them. Now, this doesn't mean that everyone is in full agreement around OpenAI's emphasis here. Dax, for example, writes, I think the biggest market for AI coding tools is software engineers. A lot of the industry believes it's non-software engineers, hence the focus on one-shotting. But in the end, I believe it'll be the smaller numbers. That created a great conversation. Atlassian engineer Koon Chen writes, it's hard to say. Web 2.0 was less about professional writers, more about regular bloggers. TikTok was less about filmmakers with

Starting point is 00:10:30 the DSLR, but more random people shooting random stuff. Things that don't feel valuable in aggregate can add up when there's a long tail. What's for sure is that even if traditional software engineers remain the majority of software engineers, at least when it comes to important code that gets pushed, vibe coding for all is a major, major theme, unlocked in a huge way by GPT-5. AI agents are the buzzword that everyone's talking about, but do you truly understand their significance? KPMG's agent framework demystifies the concept, offering practical steps to unlock AI agents' immense potential. Think of it as your GPS for AI strategy. KPMG partners with clients to harness the benefit of AI agents, guiding you from strategy to execution with a secure architecture, and a plan for

Starting point is 00:11:15 workforce evolution. Check out their comprehensive insights on scaling agent power within your enterprise. This isn't just about tech, it's a leadership imperative. Go to www.kpmg.us slash agents to learn more. That's www.kpmg.us slash agents. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80%

Starting point is 00:12:00 plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase on a real development project in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.

Starting point is 00:12:34 That's BLITZY.com. If you are a regular listener, you will have heard about Superintelligent's Agent Readiness Audits at this point, but I wanted to tell you today about the full suite of Agent Readiness products that go beyond just the initial readiness report. Over the last six months, Superintelligent has built out an entire agent planning suite. We help you move from discovery to planning to implementation. After you've completed your agent readiness audits, we help you double click on your most important use cases with what we call our use case planning reports. These reports are going to help you understand

Starting point is 00:13:08 what sort of technical preparation you need to do to be ready for a use case, what challenges you might face in implementation, and whether you should be thinking about building, buying, partnering, or some combination. After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for. If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions. Just go to bit.ly slash super agent. That's bit.ly slash super agent, all one word. And if you have any questions, the agent can even help you book an appointment with our team. One sub-theme from all of this,

Starting point is 00:13:50 which isn't exactly new, but which I do think is profound, is let's call it the consumerification of OpenAI. OpenAI certainly hasn't abandoned any part of their enterprise efforts. In fact, they used the occasion of their announcement to share that they now had five million businesses using ChatGPT. COO Brad Lightcap in his thread wrote that he thought that developers and enterprises especially will love it. We also recently got news that OpenAI was developing their own forward-deployed engineering teams to service big customers who were spending at least $10 million with them with the sort of very hands-on type of development engagement that's required for enterprise AI to work. And yet at the same time, it's unignorable that ChatGPT

Starting point is 00:14:32 is AI to a huge portion of the world. In advance of the announcement of GPT-5, the company shared that they were on track to reach 700 million weekly active users. As Professor Ethan Mollick pointed out, that's 8.6% of the world's population using ChatGPT every week. That consumer base, the fact that they are so definitively the model, the company, the application that people think about when they think about AI, has to go into the decisions that they make about where they're going to put their emphasis. GPT-5 to me is not a strictly better implementation of AI than the previous OpenAI models were. They have talked endlessly about how challenging the model selector was, but it wasn't for us power users. In fact, one of the big complaints that you see all over places

Starting point is 00:15:18 like X right now is that the power users want their menus back. They want to be able to choose between the different models based on their query. No, instead, OpenAI has made a choice here. They've made a choice to prioritize the needs and UX features that will most benefit the normal consumer over those of the power user, which doesn't mean that they're not considering the power user. The people who are paying $200 a month, like obviously I am, still can go in and select the legacy models to be available in their menus. And I'm sure there will be other concessions to power users that come online in the future. But it's very clear that this is a step towards the emphasis of that base user. The 700 million in general, not the 7 million power users on the top.

Starting point is 00:15:59 The implications for the enterprise, I think, are really interesting. On the one hand, as we'll see in just a minute, I think it opens up competitive opportunities for others. On the flip side, there is kind of an argument that we might see a bit of a convergence of consumer and enterprise AI usage. Basically, it wouldn't ultimately be all that surprising to me if in practice, the simplicity and the decisions that they made for consumers with GPT-5 actually improve its utilization in the enterprise as well. I don't think that's a for sure, but I wouldn't be surprised. Still, like I said, I do think that one of the other things that has changed after GPT-5

Starting point is 00:16:36 is that it really reinforces that there is opportunity for the other big players, for Gemini, for Grok and Claude. OpenAI has had such a definitive lead since the beginning, because of the launch of ChatGPT, because they were first to get to a GPT-4 class model. In fact, we called them GPT-4 class models. But even at their size and scale, the decisions that they make have tradeoffs to other decisions they don't make. When it comes to finding the right balance between an intersection of consumer and enterprise, there are reasons to think that Google Gemini is better positioned than even OpenAI is. And as much improved as GPT-5 is when it comes to coding, there are lots and lots of coders out there

Starting point is 00:17:16 who are not shifting their daily driver away from Claude. Beyond that, I think the continued focus on the consumer for OpenAI opens up an opportunity for Claude and Anthropic to continue to peel off enterprise users from OpenAI, although, as we'll discuss in a minute, there is a cost dimension here which could make that a little bit trickier. And then for Grok, there's the simple fact that GPT-5, as good as it is and as much of an improvement as it represents, is not some crazy knockout blow on performance. Grok 4 Heavy beat it, for example, on a number of the benchmarks. Tony from xAI wrote, very proud of us at xAI after seeing the GPT-5 release. With a much smaller team, we are ahead in many ways. Grok 4's world-first unified

Starting point is 00:17:57 model and crushing GPT-5 in benchmarks like ARC-AGI. OpenAI is a very respectful competitor and still the leader in many areas, but we're fast and relentless. Many new models to share in the next few weeks. And look, for us as consumers, the fact that there are all these opportunities for the other big labs post-GPT-5 release is awesome. It means that we are going to get so much advancement in so many areas we're just going to be drowning in opportunity for the foreseeable future.

Starting point is 00:18:24 Now, one interesting thing that comes out talking about enterprise and developers and Anthropic's ability to peel off, for example, users from OpenAI, is that this was very much not just a capability competition moment, but a price competition moment. People were gagged by how low OpenAI priced this. Theo writes, I've been using GPT-5 for a bit now. The model broke me, it's so good. I didn't know what the price was. I assumed it would be o3 Pro price because it's that smart. Nope, truly insane. Niko Christie writes, this was an attempted Anthropic killshot. Get cozier with Cursor and make pricing 10x cheaper than Opus. Excited to see how Anthropic responds. Personally, I don't mind $15 per million inputs. Give me the frontier. And at least for now, Niko is far from alone in that. In fact,

Starting point is 00:19:12 in Menlo's mid-year LLM market update, they found very much that at this point, people are not switching models for price. They are switching only for performance. However, workloads are going up dramatically. We're now in the multi-agent paradigm, which we'll get into in just a minute, and even as costs come down, the sheer number of tokens that we are going to be consuming is likely to be going up faster. So I'm not sure how long enterprises at least will have the privilege of not thinking about price. Now, going back to the idea that there's so much competitive opportunity between all the labs right now, it really reinforces that a lot of the action is going to take place at the app layer.

Starting point is 00:19:54 In a world where all the models are commoditized and very closely clustered together in terms of capabilities, the actual product experiences that people have are going to be the big drivers of customer devotion. Mixpanel founder Suhail has been talking about this all year. Back in January, he shared a tweet from Cursor, where they wrote, o3-mini is out to all Cursor users. We're launching it for free for the time being to let people get a feel for the model. The Cursor devs still prefer Sonnet for most tasks, which surprised us. Suhail added to that, the app layer decides which model is used and which model isn't used now. Defaults matter. A few months later, he shared comments from Sam Altman and said app layer incoming. One, very smart models will be commoditized. Two,

Starting point is 00:20:35 build the best defining product in the space. Now, clearly OpenAI gets this. It's why they built and released, to much fanfare, ChatGPT agent. They very clearly value owning the relationship with their customers, rather than just being the foundation layer that everyone else builds on. Still, for builders in this space, the fact that there is going to be so much opportunity at the app layer is a really exciting development and confirmation. Once again, for us as consumers, it means that the amount of choice we have, the amount of people who are building things customized for our use cases, is just likely to be incredibly, incredibly high. Last thing that changes, or at least gets amplified in the wake of GPT-5, this kind of goes back to tool usage, this kind of goes back to the way

Starting point is 00:21:19 that agentic coding is evolving. But it's very clear that a big phenomenon right now is not just autonomous agents doing things for us. It's groups or parallel sets of autonomous agents doing things with a selector to determine which output is best or to combine the results. One of the things that Beff Jezos noted, comparing GPT-5 and Grok 4 on Humanity's Last Exam, is that GPT-5 Pro, even with tools, still did not beat Grok 4 Heavy. Grok 4 Heavy had 44.4% as opposed to GPT-5 Pro with tools at 42%. Beff pointed out, however, that

Starting point is 00:21:56 given that it's a single agent rather than a swarm of agents, that's very impressive. Now, I actually am not totally sure that GPT-5 Pro for that Humanity's Last Exam result isn't a similar type of structure. In their research blog, OpenAI writes, for the most challenging complex tasks

Starting point is 00:22:11 we're also releasing GPT-5 Pro, a variant of GPT-5 that thinks even longer using scaled but efficient parallel test-time compute. I would imagine that parallel test-time compute involves a process similar to Grok 4 Heavy, where they spin up and deploy multiple agents to do that work in parallel. And increasingly, this is just going to be the norm. You're starting to see it with coding, with all of these IDEs building in tooling where you can

Starting point is 00:22:35 spin up multiple agents at once. And I think you can view that as a leading indicator of where everything else is going to go. Right now, you're not using multiple agents at the same time to write social media copy, but that's only because the interfaces that you have access to aren't suggesting that you do. I think this is going to be one of the areas where we next saturate and see how much power we can pull out of it. And I think that that process starts right now. So friends, those are 10 things that I can see changing in the wake of GPT-5. Let me know what you think about these.

Starting point is 00:23:02 Which are the most significant? Are there any you disagree with? Are there any that are obvious that I've missed here? Excited to begin this conversation and excited to really see what this model can do. But for now, that is going to do it for today's AI Daily Brief. I appreciate you listening or watching as always. And until next time, peace.
