The AI Daily Brief: Artificial Intelligence News and Analysis - Why You Need Different AIs for Different Jobs

Episode Date: September 11, 2025

Even the biggest companies are learning that no single AI model can do it all. Microsoft, for example, is now bringing Anthropic’s Claude into Office 365 because it outperforms OpenAI in key areas like Excel and PowerPoint. This shift highlights a bigger truth: the future of AI is about using different models for different jobs, not looking for one model to rule them all.

Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
Vanta - Simplify compliance - https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? nlw@aidailybrief.ai

Transcript
Starting point is 00:00:00 Today in the AI Daily Brief, why even big companies are looking at different AI models for different AI use cases. Before that in the headlines, Volkswagen is making a billion dollar bet on AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, quick announcements before we dive in. First of all, thank you to today's sponsors, Robots and Pencils, Blitzy, KPMG, and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And to learn about sponsorship opportunities, send us a note at sponsors at aidailybrief.ai. With that, let's dive in.
Starting point is 00:00:42 Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with the latest example of an enterprise declaring its intention to move into the AI space: Volkswagen, which says it is planning to invest $1.2 billion into AI capabilities by the end of the decade. The company says that the focus will be on enhanced AI-supported vehicle development, industrial applications, and expanding its IT infrastructure, basically the full range of things that you can use AI for. Said Hauke Stars, member of the Board of Management for IT, "AI is our key to greater speed, quality, and competitiveness across the entire value chain, from vehicle development to production. Our ambition: no process without AI." Volkswagen said they anticipate $5 billion in savings
Starting point is 00:01:24 by 2035 from increased efficiencies and cost reductions. And what's cool about this is not just the big number, but the breadth of the ambition, that idea of no process without AI. The company says that they currently have more than 1,200 AI applications in use and several hundred more in development. Now they're trying to scale those processes up. One of the key initiatives is to make an AI-powered engineering application available globally to their engineers across all group brands. Volkswagen said that this, in combination with other initiatives, is expected to slash vehicle development time from three years to two years. Now, there is no shortage of companies articulating their AI plans. But I think what's interesting about this particular instance is the totality of the goal here.
Starting point is 00:02:03 Next up, if you are worried about an AI bubble, well, Oracle says that's fine, but we're just going to keep on cashing our checks. The company has forecast fairly insane revenue growth as part of their earnings report. Now, Oracle's earnings came in soft with a minor miss on revenue, but sky-high forecasts knocked analysts out of their seats. Oracle announced a 359% year-over-year increase in their contract backlog to reach $455 billion. CEO Safra Catz said that the company had signed four multi-billion dollar contracts across three different customers in just the past quarter. The projected revenue is broken down across a five-year scaling forecast.
Starting point is 00:02:37 They expect cloud infrastructure revenue to almost double to $18 billion this year and double in each of the subsequent four years. That means Oracle is projecting $144 billion in infrastructure revenue for 2030, an 8x increase from where they expect to be at the end of this year. John DiFucci of Guggenheim Securities said that he was, quote, blown away by the forecast. And while the projection is eye-popping, it doesn't seem to be just based on random hype. This is a measure of contracts that are signed but not yet fulfilled stretching out for years. Now, of course, there's no guarantee that all this revenue will come in, but it is at least a measure
Starting point is 00:03:08 based on planning and contracting across the sector. During the earnings call, Brad Zelnick of Deutsche Bank remarked, "We're all kind of in shock in a very good way. There's no better evidence of a seismic shift happening in computing than these results you just put up." The projection sent Oracle stock up 27% in after-hours trading, which is just a massive jump for a company already worth more than half a trillion dollars. Google's Cloud division is also forecasting a huge revenue boost over the coming years. Cloud CEO Thomas Kurian said that the division has up to $106 billion in commitments from existing customers that they have yet to fulfill. He said that at least half of those contracts, around $58 billion worth, are expected to be fulfilled and turned
Starting point is 00:03:46 into revenue within two years. Of the backlog, Kurian said, "It's growing faster than our revenue, so not only are we growing revenue, but we're also growing our remaining performance obligation." Google's cloud revenue is currently at $13.6 billion and growing at a 32% annual rate. The new revenues already under contract then suggest that the already rapid growth is going to do nothing but increase. Writes Ricardo Amato, "Google Cloud's $106 billion backlog isn't just future revenue. It's proof of sticky demand." Over in Meta land, the company has signed a licensing deal with image generation startup Black Forest Labs. According to Bloomberg's sources, the multi-year deal will see Meta commit to spend $35 million in the first year and $105 million in the second year.
Starting point is 00:04:26 Now, this is, of course, the second big deal Meta has signed with an image generation startup, with Midjourney agreeing to a licensing deal for their "aesthetic technology" (that was the phrase they used) last month. The terms of that deal, however, were not disclosed. Black Forest, for their part, has signed a string of deals over the past year, including partnerships with Adobe, Canva, and Snap. One of their highest profile deals came last year when they partnered with xAI to power Grok's initial image generation capabilities. xAI has since developed their own image model, but the feature helped catapult Black Forest Labs to be seen as one of the leading startups in the category. A Bloomberg source said that Black Forest is currently generating $96 million in ARR and is expected to hit $300 million
Starting point is 00:05:03 in ARR next year. Now, moving over to the world of models, the big dust-up around GPT-5 was not just that people were underwhelmed with the 5 model, but that OpenAI deprecated GPT-4o. That led to a rebellion, which restored 4o, and it seems like the company has learned their lesson. Nick Turley, the head of ChatGPT, posted on Tuesday, "Last month we announced that everyone now has access to Advanced Voice Mode, with usage limits expanded from minutes per day to hours for free users and near unlimited for Plus. We also announced that Standard Voice Mode would be retired after a 30-day sunset. We've heard feedback that Standard Voice is special to many,
Starting point is 00:05:39 and we want to get this transition right. Standard Voice will stay available while we address some of your feedback in Advanced Voice." Now, while Standard Voice isn't quite as beloved as GPT-4o, users still expressed a strong preference for retaining the choice. The replies to Turley's post were filled with feedback on Advanced Voice Mode. Some suggested it was lazier and less smart compared to the text models. Others noted that the more recent update seemed like a downgrade. At the same time, there was also sentiment that Advanced Voice Mode is more
Starting point is 00:06:05 performant in domains like handling customer service calls. Now, I think it's a really interesting question of whether there are meaningful differences that make older models better, or whether people just get really used to the models that they know and interact with. Lastly, today, you may notice that even though Apple had a big launch event yesterday, we did not include it anywhere near the top of the AI headlines. The reason for that can be summed up in The Verge's headline about the event, "Apple barely talked about AI at its big iPhone 17 event."
Starting point is 00:06:34 Now, The Verge pointed out that last year, Apple couldn't stop talking about Apple Intelligence, but this year, the emphasis was very different. If you've seen anything about the event, you probably saw the incredibly thin iPhone Air, or maybe the Bitcoin Orange Pro Max. But if you were paying close attention, there were some sneaky, subtle things that I think relate to Apple Intelligence and its future. One hint towards an AI future for the iPhone, for example, was an upgrade to the silicon powering the handset.
Starting point is 00:06:58 Apple executives said that they were now including neural accelerators in each GPU core, providing what they said was MacBook Pro levels of compute in an iPhone. That could make iPhone 17 ready to power on-device models from Google, for example, if that partnership comes to fruition. The one place that there was a very distinct AI feature in the announcements was for the new AirPods Pro 3. In an upcoming feature, this new generation of AirPods will be capable of live in-person translation. Basically, this will work exactly as you expect. The AirPods will be aware of someone talking in a different language around you and will actually translate in basically real time what they are saying. From there, if you have your iPhone connected,
Starting point is 00:07:37 you can simply speak your response and it will show it in the original language. Now, there are some folks who thought it was a little dystopian, but mostly people are pretty enthusiastic about this being a native feature of this device. I tend to think that even if it's a little weird at first, and doesn't fully break down the linguistic barrier, the amount of engagement and new opportunity that it opens up will be more than worth the initial feeling of cringe. Still, overall, it feels like when it comes to Apple and AI, there was a very distinct choice here to not overemphasize Apple Intelligence and just build and set up for the future. We'll close the headlines there. Next up, the main episode.
Starting point is 00:08:11 Small, nimble teams beat bloated consulting every time. Robots and Pencils partners with organizations on intelligent, cloud-native systems powered by AI. They cover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy. As an AWS-certified partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner. With teams across the U.S., Canada, Europe, and Latin America, clients gain local expertise and global scale.
Starting point is 00:08:39 As AI evolves, they ensure you keep pace with change, and that means faster results, measurable outcomes, and a partnership built to last. The right partner makes progress inevitable. Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code.
Starting point is 00:09:08 Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited-time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase on a real development project
Starting point is 00:09:45 in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. That's B-L-I-T-Z-Y dot com. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator,
Starting point is 00:10:24 this podcast is your front-row seat to the future of enterprise AI. So go check it out at www.kpmg.us slash AI podcasts, or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts. If you are a regular listener, you will have heard about Superintelligent's agent readiness audits at this point. But I wanted to tell you today about the full suite of agent readiness products that go beyond just the initial readiness report. Over the last six months, Superintelligent has built out an entire agent planning suite. We help you move from discovery to planning to implementation. After you've completed your agent readiness audits, we help you double-click
Starting point is 00:11:05 on your most important use cases with what we call our use case planning reports. These reports are going to help you understand what sort of technical preparation you need to do to be ready for a use case, what challenges you might face in implementation, and whether you should be thinking about building, buying, partnering, or some combination. After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for. If you want to learn more about Superintelligent's agent planning suite, we've built a
Starting point is 00:11:35 custom GPT to answer your questions. Just go to bit.ly slash super super super agent. That's BIT.L.Y slash super super agent, all one word. And if you have any questions, the agent can even help you book an appointment with our team. Welcome back to the AI Daily Brief. Today we are talking about something which I think separates the power users of AI from the more casual users. and that is the embrace of the simple idea that different AIs are good for different things. That for specific use cases and specific jobs, you may want a different AI than for a different job, that there is no one model to rule them all. When we talk about AI models, we have a tendency, as humans do with all things, to ask which is the best,
Starting point is 00:12:21 as though there is one definitive answer that subsumes all others. This is why when a new model is launched, all of our conversations are about how much better it is than the previous best, rather than asking what it is good at, what it is uniquely good at even. However, if you look at the behavior of the people who are getting the most out of AI, they tend to use a big variety of tools for a lot of different purposes. I've talked before on this show about how I use Midjourney differently than Ideogram, for example, even though they are both image generation tools. Today we've got a set of stories, all of which sort of add up to this idea of using models for different purposes,
Starting point is 00:12:55 and understanding as we move more into the production era of AI, what sort of trade-offs between cost and performance real people are making. Now, we kick off with the story that you probably have seen, which is that Microsoft is planning to actually use Anthropic models inside its Office 365 Copilot product. Of course, historically speaking, Microsoft has used exclusively OpenAI models, or a combination of OpenAI and their own internal models, but primarily OpenAI.
Starting point is 00:13:22 Now they're looking to buy from Anthropic as well, specifically using Claude for a set of features and functions inside the Office Suite. Now, if you've seen the coverage for this, you've probably seen it presented as some big psychodrama between Microsoft and OpenAI. And certainly it is the case that Microsoft is on a larger trajectory of trying to have at least some amount of distance from its partner. It's basically been on this trajectory ever since the whole dust up with Sam Altman being fired and rehired and is nothing new.
Starting point is 00:13:48 Now, my position on this has always been that Microsoft is sincere when it says that the OpenAI relationship is important to it and it wants it to go really well, but that they are also clearly hedging and trying to make sure that they don't get caught with their pants down if something goes poorly. This is made all the more important by the fact that their contract is written so weirdly and rests on strange definitions of AGI for very, very mission-critical business functions. And yet, with all that said, it is also equally clear to me that the people trying to read the story primarily as an example of some weird negotiating tactic between Microsoft and OpenAI are missing the much more simple truth that there is just stuff
Starting point is 00:14:23 that Claude is better at right now. And if you read the sources, that's exactly what they're saying. Writes The Information, "While Microsoft's use of Anthropic technology could be viewed as a negotiating tactic, leaders developing the Office AI features found Anthropic's latest models simply perform better than OpenAI's at automating tasks such as financial functions in Excel or generating PowerPoint presentations based on customers' instructions." Continuing, they write, "OpenAI's recent launch of its flagship GPT-5 model is a step up in quality, but Anthropic's Claude Sonnet 4 performs better in subtle but important ways, such as creating PowerPoint presentations that are more aesthetically pleasing than what OpenAI's models create." This is basically a big business version of the same consideration that people go through when they try to figure out if they're going to use
Starting point is 00:15:05 Claude or ChatGPT for any given use case. Microsoft is in this interesting position where because of the legacy of that company, they have incredible distribution and incredible enterprise lock-in. However, one of the real challenges is that all of a given company's employees are using ChatGPT outside the workplace and are thus able to know when Copilot isn't stacking up. One of the great frustrations you hear from employees inside big companies is that the tools they get to use on their own time with their personal emails are so much better than the enterprise versions. It doesn't surprise me then to see Microsoft prioritizing what it thinks are the better models for a given use case inside one of their most important product suites. Now, it is notable that they're willing to pay to access Anthropic's models through Amazon Web Services, even though they get OpenAI's tech for free. But still ultimately, I do believe that this just comes down to an assessment of which models are better for a particular set of tasks.
Starting point is 00:15:55 For those of you who are interested in and excited about market competition, however, investor Dylan Reader points out that this does seem to open some space for OpenAI to launch a rumored productivity suite, given the growing space between them and Microsoft. Now, interestingly, these use cases that Microsoft seems interested in using Claude for seem to be ones that really matter to Anthropic more broadly. The company has just introduced the ability to create and edit documents directly from Claude's interfaces. Users can now generate Excel spreadsheets, PowerPoint decks, Word documents, and PDFs from within the app and web interface. Anthropic writes, this transforms how you work with Claude. Instead of only receiving text responses or in-app
Starting point is 00:16:33 artifacts, you can describe what you need, upload relevant data, and get ready-to-use files in return. They gave the example of providing raw data and receiving back a polished set of documents complete with charts and statistical analysis. Anthropic says that the feature is powered by Claude's computer use capabilities. Basically, they're saying that Claude is actually creating these documents in a virtual computer environment before delivering them to the end user. Anthropic writes, "This shows where we're headed, making sophisticated multi-step work accessible through conversation. As these capabilities expand, the gap between ideas and execution will keep shrinking." Now, the features are similar to the recently released
Starting point is 00:17:09 ChatGPT agent from OpenAI, which also uses a virtual computing environment to deliver generated documents. Writes VentureBeat, both OpenAI and Anthropic, as well as competitors like Google, Cohere, and Mistral are all chasing enterprise users in hopes of becoming the de facto chatbots for employees. Google has the advantage of owning a workplace suite in Google Workspace, which allows people to create documents using Google Docs, although it doesn't offer the same type of file creation on the chat platform yet. Claude and ChatGPT, and yes, Gemini already allow users to generate and edit code with context on the platforms if they're not interested in using their IDE of choice. Having the ability for users to use natural language to
Starting point is 00:17:44 create any type of document they need and just describe what they'd like to see, keeps people on the chatbot instead of switching to another window. And indeed, it's very clear that these companies are trying to own not just the workflows, but the product surface where the work gets done as well. People are extremely enthusiastic about this new set of features. Professor Ethan Mollick writes, Claude's new ability to work with Excel files is the best I've seen so far. I've given it existing spreadsheets to work with and asked it to create new ones, good use of formatting, formulas, etc.
Starting point is 00:18:10 It created all of this, and he shares a bunch of documents, including 406 formulas from one prompt and it's solid. Olivia Moore from a16z writes, "Claude can now make slide decks, and in my opinion, its agent is much better than ChatGPT. I gave it a link to Figma's S-1 and asked it to make a presentation, which it did in less than five minutes." Olivia actually went on to say why she thought it was better. The first was efficiency. It took Claude four and a half minutes, while it took GenSpark 17, ChatGPT 19, and Manus 28, to make a serviceable deck.
Starting point is 00:18:40 As Olivia points out, speed equals more iteration. When it came to precision, she said, it got the data correct, generated some of its own insights, and even framed an investment thesis. Formatting needed just a few tweaks. The only product that outperformed here was GenSpark, which took much longer. She found that it was editable. I asked it to make the slides more aesthetic, and it generated a new version in under two minutes with a more appealing design.
Starting point is 00:19:01 Same with asking it to make the slides more financially rigorous. Very few other products can check their work in this way. In my estimation, this type of document output is going to become so commonly used with LLMs that we will be hard-pressed to remember a time when they didn't have these features natively. The idea of copy-pasting markup files into another program is just going to seem so absolutely archaic. I think that these will really and truly unlock whole new levels of productivity even from where we are today. Now, on the theme of using different models for different purposes, a lot of discussion earlier this week has been about which coding models are best and how
Starting point is 00:19:36 people's behaviors are changing in that area. Specifically, we talked about how much of a behavior shift we'd seen, moving from Claude Code to Codex, and shared that Sam Altman said that Codex usage had been up something like 10x over the past few weeks. Engineer Sawyer Hood showed a chart of agent sessions on their platform, showing that the number of sessions with Codex was going up while the number of sessions with Claude Code was coming down in fairly dramatic ways. When asked why he thought that was, he said it is just really good for the price. And I think as we discuss this idea of different models being good for different purposes, it's important to note that as we move more into production, this is not just a question of raw
Starting point is 00:20:13 performance, but a question of how much performance you get for what cost. In their July market update, Menlo Ventures noted that almost half of AI programmers had upgraded to Claude 4 Sonnet, which was at the time the latest release. They remarked, "This creates an unexpected market dynamic. Even as individual models drop 10x in price, builders don't capture savings by using older models. They just move en masse to the best performing one." I think that that is going to be less and less the case as we get further and further over the frontier of capability, and as more of these use cases move into bigger production, which simply requires more raw token consumption. And speaking of moving to production and actually caring about cost, Google's Veo 3 got some big updates this week. Indeed, overall, one category of
Starting point is 00:20:57 use cases that is coming along quite a bit this year is image and video generation, and Google at this point seems to be establishing a dominant position. The key thing that we talked about around the release of Nano Banana last month was the idea that the model's improvement in editing capabilities meant that image generation had reached a point where it could be reliably used in a lot of additional professional workflows.
Starting point is 00:21:18 With this week's update to Veo 3, Google seems to be attempting something similar for video generation. Now, some creators only cared about the fact that the update adds support for 1080p resolution and even more importantly, vertical videos, which matters greatly when it comes to generating short-form content for social media,
Starting point is 00:21:34 but Google also showcased Veo 3 Fast, which is a faster and more affordable version of the model. Google AI Studio head Logan Kilpatrick pointed to price drops, with Veo 3 seeing a 46% decrease and Veo 3 Fast seeing a 62% decrease, as part of the big innovation. He couldn't have put the goal of this release more clearly when he wrote, "Veo 3 is stable now and ready for scaled production use." Now in terms of where those numbers are, Veo 3, the original recipe, now costs around 40 cents per second of generation, which is down from 75 cents, and Veo 3 Fast is 15 cents per second,
Starting point is 00:22:07 which is down from 40 cents. In a sample video posted by Google, a generated rock climber said, "Veo 3 is now like 50% cheaper and higher quality, so go build." Veo 3 was already a huge breakthrough in that it introduced audio generation baked directly into a video model. And while it has already seen a ton of use
Starting point is 00:22:26 in spreading viral videos, and really I think represented a huge inflection point in the general usage of AI video, there was still a sense that many of those generations were largely about showing off what the model could do, or alternatively having some very, very basic, repeated types of uses, like the Bigfoot vlogs that were all over TikTok and Instagram Reels. With this update, Google is absolutely and very clearly targeting a user base that is going to put this to work for more professional uses.
Starting point is 00:22:53 Sego Edgewee showed off what he'd done with Veo 3 for his fitness coaching platform, commenting, "Really amazed with the result that I got on the first try." And while right now Veo 3 has established something of a lead, I would anticipate that over the next three to six months, you are going to see an absolute flood of highly competent video models with included audio generation that are trying as much as they can to also compete on price. As that happens, I guarantee that we will discover that there are some things that Veo 3 is best at, or Veo 4, or whatever we get next,
Starting point is 00:23:23 but that there are other types of videos that other models are simply just still better suited for. And that, if anything, is the takeaway of this episode. It's fun to look at the changing relationships between Microsoft and Anthropic and OpenAI in dramatic Shakespearean terms. But ultimately, one of the key secrets to AI right now is that different models are good for different things. Anyways, friends, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.
