The AI Daily Brief: Artificial Intelligence News and Analysis - What's the Bigger Deal for AI: o3 Pro or o3's 80% Price Drop?

Starting point is 00:00:00 This podcast is supported by Google. Hey everyone, David here, one of the product leads for Google Gemini. If you dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video. Now with incredible sound effects, background noise, and even dialogue. Try it with a Google AI Pro plan or get the highest access with the Ultra plan. Sign up at Gemini.com to get started and show us what you create. Today on the AI Daily Brief, OpenAI drops o3 Pro and drops the price of o3 by 80%.

Starting point is 00:00:36 Before that in the headlines, Meta's biggest acquisition ever, well, sort of acquisition at least, appears to be for data labeling startup Scale AI. Coming along with it looks like a major leadership shakeup. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello friends.

Starting point is 00:01:01 Back with quick announcements today. First of all, thank you to today's sponsors, Gemini, Blitzy, Vanta, and AGNTCY. And of course, to get an ad-free version of the show, you can go to patreon.com slash AI Daily Brief. I continue to be on the road, but the AI news continues to be rolling, so with no further ado, let's dive in. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. It's rare that we have a day that is chock full of headlines and has a big, thick, juicy main, but that is exactly the story of today. In our main episode, we will, of course, be talking about o3's cost reduction and o3 Pro being released.

Starting point is 00:01:36 But when it comes to mainstream media covering AI, it is in fact a different story that is dominating headlines. That is, of course, the report that Meta is about to pay about $15 billion for 49% of data labeling startup Scale AI. The company is taking non-voting shares, and obviously that 49% is very clearly designed to get around antitrust scrutiny, which is just a fact of life for all big tech companies now, despite the shifted administrations. One thing that has not really changed between the two is that big tech is very much in the antitrust hot seat. Importantly, though, this is not just an acquisition. And it's not even really that it would be Meta's biggest acquisition to date that's causing

Starting point is 00:02:16 attention. The New York Times broke that this is also part of a larger shakeup in AI leadership at Meta. They wrote that Meta is preparing to unveil a new superintelligence lab with 28-year-old Scale AI CEO Alexandr Wang at the helm. Several other Scale AI employees are expected to join Meta, and sources say that multiple seven- to nine-figure offers have been made to dozens of researchers from other leading AI labs. In other words, if this report is correct, Meta is rolling out compensation packages ranging up to hundreds of millions of dollars to poach top AI talent.

Starting point is 00:02:48 The news, of course, comes in the context of multiple changes to AI leadership at Meta. After reportedly being panicked by the release of DeepSeek and failing to impress with their own Llama 4 model, some have seen Meta as being in a bit of a crisis. Bloomberg reports that Mark Zuckerberg himself is personally overseeing this new team, writing, Zuckerberg has prioritized recruiting for the secretive new team, referred to internally as a superintelligence group. He has an audacious goal in mind.

Starting point is 00:03:13 In his view, Meta can and should outstrip other tech companies in achieving AGI. Bloomberg sources say that the team is being hired up to around 50 people, including presumably Wang at the head of it. Now, what we don't have is information about current leadership at Facebook like Yann LeCun, but Zuckerberg's focus definitely appears to be this team. Those same Bloomberg sources say that Zuckerberg has rearranged desks so the new staff will sit nearer to him. Now, interestingly, a lot of folks privately asked me what I thought the deal was here. While Scale AI's business is very successful and seems to be growing,

Starting point is 00:03:48 they reportedly had $870 million in revenue last year and are on track for $2 billion this year. That's very clearly not the reason for Zuck to make this acquisition. It also doesn't seem like the natural place to go hunting for research talent, as that's not really what Scale does. The narrative that people have settled on quite quickly is about competition around data. Scale AI is somewhat unique as the largest startup providing data labeling services at scale.

Starting point is 00:04:12 They have over 100,000 global contractors working on labeling images, video, and text. Now, at the beginning, that was mostly about pre-training, but increasingly it's about higher-order reinforcement learning from human feedback, which continues to be a key part of not only model advancement, but also things like compliance in new regimes like the EU's AI Act. Many other people have the same thought that maybe this is Zuckerberg's way of cutting off competition

Starting point is 00:04:35 from data. And who knows, that may be part of it. It does feel to me from the outside, though, that if that is a part of it, it is only one part of it. For whatever reason, it feels to me like Zuck is fairly convinced that Alexandr Wang is the new force and the new energy that he needs to bring in from a leadership position for AI inside of Meta to right the ship. The price that we're seeing may simply be the cost that it took to get him there, with Zuckerberg being able to justify all the rest based on, yes, their business model, of course, but also the privileged position it puts them in vis-a-vis others who need their services. I would say overall, the tone and tenor of the response is skeptical.

Starting point is 00:05:13 Signal writes, so let me get this straight. Meta's AI strategy is just brute forcing with cash again. What's the vision? Just spending their way into the superintelligence race? Feels a lot like the metaverse play, overfunded, underthought, and wildly disconnected from how people perceive these experiences. Am I missing something? Flooding the zone with capital just breeds distorted incentives and likely shallow execution. Indeed, on that incentive line, some people pointed out that the payday for Alexandr Wang on this is going to be something like $4.2 billion. And so is he actually going to show up at Meta with fire in his belly, or is he going to be just wanting to go off and gallivant and party? For that, we will have to wait

Starting point is 00:05:48 and see, but that was far from the only news over the last day or so. One story that is getting some traction is that it appears that Elon Musk's feud with Donald Trump is weighing on his AI fundraising. Last week, it was reported that xAI was looking to raise $5 billion in debt funding. The Wall Street Journal reports that Morgan Stanley had gathered xAI executives on Thursday afternoon to pitch the debt to investors. That is the same Thursday afternoon that Musk himself was teeing off against the administration. The Journal even reported that investors were following Elon's tweets on their phones while the

Starting point is 00:06:17 presentation was underway. Now, so far, this doesn't really seem to have affected things. The Journal writes, so far buyers who showed initial interest haven't backed off, and the demand for both the debt and equity sale has actually increased since Thursday, said one advisor to the company. Now, it feels like that needs a pretty big grain of salt, as that's obviously the narrative that you would want. But maybe it's all an overblown story, given that Elon is officially starting to walk back his position, tweeting this morning, I regret some of my posts about President Donald Trump last week. They went too far. For a very long time, Elon has had a

Starting point is 00:06:50 basically blank check when it comes to his companies, and so it will be very interesting indeed to see if that is starting to run out. Or, as it seems might be the case, this ends up being just a very temporary bump. One company that is not having any trouble getting interest for fundraising is Lovable. The company is apparently in talks to raise $100 million at a $1.5 billion valuation, which honestly I could argue is kind of cheap. At the end of May, CEO Anton Osika shared that the company had crossed $60 million in ARR and that growth was up 50% week over week.

Starting point is 00:07:21 This is a company that still only has like 28 employees. And obviously, if you listen to this show regularly, you know how central I think vibe-coding will be to our future, and so it makes total sense to me that there's this big interest. And for those wondering why they would raise this money, given how much money they're making, the short answer is that this is going to be one of the most hotly contested spaces in all of AI, and it's just going to take resources to compete. If the story gets confirmed, I will of course share it here, but for now, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode.

Starting point is 00:07:54 This episode is brought to you by Blitzy. If you're a technology leader, here's something that probably sounds familiar. Your organization's competitive edge is buried in legacy code that desperately needs modernization, but the resources required feel out of reach. That was the case for a global investment analysis firm. They needed to migrate 70,000 lines of complex MATLAB financial algorithms to Python. Algorithms that drive investment decisions for trillions in assets. Their estimate, months of high-cost specialized engineering work. Instead, they partnered with Blitzy. Blitzy's autonomous AI preserved mathematical precision and generated over 80% of the codebase, completing the migration with just five days of engineering time. They cut the timeline

Starting point is 00:08:32 by 95% and saved 880 engineering hours. If your organization is facing similar modernization challenges, visit blitzy.com to schedule a consultation and discover how AI-powered development can transform your technical capabilities. Today's episode is brought to you by Vanta. In today's business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources.

Starting point is 00:09:11 Vanta makes it easy and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieved $535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time, listeners get $1,000 off at vanta.com slash NLW. That's V-A-N-T-A.com slash NLW for $1,000 off.

Starting point is 00:09:43 Today's episode is brought to you by AGNTCY, an open-source collective for inter-agent collaboration. Agents are, of course, the most important theme of the moment right now, not only on this show, but I think for businesses everywhere. And part of that is the expanded scope of what agents are starting to be able to do. While single agents can handle specific tasks, the real power comes when specialized agents

Starting point is 00:10:06 collaborate to solve complex problems. However, right now there is no standardized infrastructure for these agents to discover, communicate with, and work alongside one another. That's where AGNTCY, spelled A-G-N-T-C-Y, comes in. AGNTCY is an open-source collective building the Internet of Agents, a global collaboration layer where AI agents can work together.

Starting point is 00:10:28 It will connect systems across vendors and frameworks, solving the biggest problems of discovery, interoperability, and scalability for enterprises. With contributors like Cisco, CrewAI, LangChain, and MongoDB, AGNTCY is breaking down silos and building the future of interoperable AI. Shape the future of enterprise innovation, visit agntcy.org to explore use cases now. That's A-G-N-T-C-Y.org. Welcome back to the

Starting point is 00:10:54 AI Daily Brief. Boy, you know that you are owning a news cycle when the title of the podcast is which of your two announcements was the bigger deal. Yesterday I tweeted, an o3 Pro that's more agentically capable, an 80% cost reduction in existing o3, a massive acquisition-lite that could reshape competitive dynamics regarding data, multiple multibillion-dollar fundraises, a viral singularity prognostication, and a huge debate on reasoning, and it's barely Wednesday. Yes, of course, based on the inscrutable and immutable laws of the universe, when I am traveling, it has to be the biggest week in AI we've had in some time. Luckily for all of us, I've got all the equipment on the road and we are going to dig into this. In a surprise announcement, yesterday Sam Altman tweeted,

Starting point is 00:11:35 we dropped the price of o3 by 80%. Excited to see what people will do with it now. Think you'll also be happy with o3 Pro pricing for the performance. A couple of hours later, the official OpenAI account confirmed, OpenAI o3 Pro today. And so these are, of course, the two big stories that we're going to focus on in this main episode, a highly performant new model that, spoiler alert, seems even more tuned for the agentic era that we're moving into, and a massive cost reduction that could have significant implications for what people build. So let's talk first about this price reduction. Chubby, at kimmonismus, summed up many people's feelings when they tweeted, this is the real revolution, with a chart of the 87% price reduction between o3 Pro and o1 Pro. Now keep in mind,

Starting point is 00:12:20 this is not even the 80% reduction that we were talking about with o3. This is just the base cost of o3 Pro as it came out as compared to where o1 Pro was just a few months ago. But in terms of that big o3 price drop, many people could hardly believe it. Now, the specifics here were that it went from $40 per million output tokens to just $8, and on top of that, they also announced that they were going to double the rate limits for o3 for Plus users. Now, this led many to assume that this must be a distilled version of the model. Not so, said Adam, who does go-to-market at OpenAI. He tweeted in response, it's not distilled, same model. When someone said, is it quantized, though? Adam responded, it's the same model, full stop. And when someone asked, then how was it done? Were there

Starting point is 00:13:05 major improvements on the software side of things? Is this because of increased resources? Or did nothing change and you can just incur the cost now? Adam responded to that one, as my teenage daughters would say, the inference engineers ate. Basically, it seems like these are actual efficiency gains, not just competitive pressure and a bigger balance sheet. You'll remember that OpenAI also has jumped from $5.5 billion in ARR at the end of last year all the way to $10 billion now. Now, the claim here at least is that this is actual technical improvement. What's more, OpenAI researcher Noam Brown reinforced that businesses need to be skating to where the puck is going in terms of cost. Posting, input is now $2 per 1 million and output is now $8 per 1 million.
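For anyone who wants to check the arithmetic on those numbers, here is a minimal sketch in Python. It uses only the prices quoted in this episode ($40 down to $8 per million output tokens, and $2 per million input tokens); the workload size and the function names are made up for illustration and are not anything OpenAI publishes.

```python
# Prices quoted in the episode, in US dollars per one million tokens.
OLD_OUTPUT_PRICE = 40.0  # o3 output price before the cut
NEW_OUTPUT_PRICE = 8.0   # o3 output price after the cut
NEW_INPUT_PRICE = 2.0    # o3 input price after the cut


def percent_reduction(old: float, new: float) -> float:
    """Return the price reduction from old to new as a percentage."""
    return (old - new) / old * 100.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the dollar cost of one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


reduction = percent_reduction(OLD_OUTPUT_PRICE, NEW_OUTPUT_PRICE)

# Hypothetical workload (an assumption for illustration only):
# 50,000 input tokens of context and 5,000 output tokens.
example_cost = request_cost(50_000, 5_000, NEW_INPUT_PRICE, NEW_OUTPUT_PRICE)

print(f"Output price reduction: {reduction:.0f}%")
print(f"Example request cost at new prices: ${example_cost:.2f}")
```

Swapping in your own token counts gives a rough per-request figure; it ignores anything like cached-input discounts or batch pricing, which are not discussed in this episode.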

Starting point is 00:13:46 The cost versus intelligence curve will continue to improve rapidly. Some people, though, despite the protestations of OpenAI staffers, think that this is at least a little bit about competitive pressure. Lisan al Gaib, who featured prominently in our breakdown of the Apple Intelligence Report from yesterday, tweeted, Gemini 2.5 Pro and Sonnet might actually be forcing OpenAI to lower their ridiculous o3 prices. However, others were just excited. Edwin Arbus writes, o3 is 20% cheaper than GPT-4o.

Starting point is 00:14:14 Rethink everything. Bindu Reddy celebrated the competition, saying o3 price just dropped by 80%. This makes it less expensive than Sonnet 4. Finally, we have choice. Now, not to be petty here, but I do for just one moment want to bring things back to almost exactly a year ago. You might remember that as summer was taking hold in 2024, people were getting a little bit bored, and we had a whole slate of articles that wanted to discuss how AI was never going to pay back the big investment that was going on in it. Now, some part of that conversation was capex and Wall Street valuations. All things that I said were firmly in the realm of investors to decide how they should value things. But you might remember that there was one part

Starting point is 00:14:54 of a Goldman Sachs report that really ground my gears. Their report was called Gen AI: Too Much Spend, Too Little Benefit? And while if you go back and listen to the show, I'm actually arguing that the report is not nearly as negative as the title suggests, one person who was very negative was Goldman Sachs head of global equity research, Jim Covello. One thing that was particularly notable to me, and that I called out then, was that when the interviewer asked, even if AI technology is expensive today, isn't it often the case that technology costs decline dramatically as the technology evolves? Jim first argued that that's revisionist history.

Starting point is 00:15:28 But he also said, even beyond that misconception, the tech world is too complacent in its assumption that AI costs will decline substantially over time. Moore's Law in chips that enabled the smaller, faster, cheaper paradigm driving the history of technology innovation only proved true because competitors to Intel like AMD forced Intel and others to reduce costs and innovate over time to remain competitive. The starting point for costs, he continued, is also so high that even if costs decline, they would have to do so dramatically to make automating tasks with AI affordable. And so obviously I think you know where I'm heading here. In three months, we have seen an 80% decline in arguably the most performant model, at least the

Starting point is 00:16:04 most performant model when it comes to many agentic use cases. Not only is that a faster price decline than Jim predicted. It's faster than anything that anyone predicted. Simply put, whether you are skeptical of AI in general or not, cost will not be the constraining factor in how much impact it has. But what about this new model, o3 Pro? If you're a regular listener, you'll know that I am a huge fan of o3. It is my default model for a huge amount of the sort of business strategy and ideation type of use cases that are my day in and day out. And so I, even more than most, have a particular interest in digging in deep around o3 Pro. That said, I've only just barely scratched the surface. I'm planning on doing a top five use case type of show later in the week, and I'm still learning exactly what

Starting point is 00:16:50 o3 Pro is really good for as compared to o3, but in the meantime, we do have some folks who have spent time with the models who shared some really interesting thoughts. The most notable of these comes from AI entrepreneur Ben Hylak, who wrote a guest post for Latent Space. The piece, by the way, has the phenomenal title of God is hungry for context. But here's how Ben summed up his time with o3 Pro. He said, the problem with evaluating o3 Pro: it's smarter, much smarter. But in order to see that, you need to give it a lot more context. There was no simple test or question I could ask that blew me away. But then I took a different approach. My co-founder Alexis and I took the time to assemble a history of all of our past planning meetings at Raindrop, all of our goals, even recorded voice memos,

Starting point is 00:17:34 and then asked o3 Pro to come up with a plan. We were blown away. It spit out the exact kind of concrete plan and analysis I've always wanted an LLM to create, complete with target metrics, timelines, what to prioritize, and strict instructions on what to absolutely cut. But the plan o3 Pro gave us was specific and rooted enough that it actually changed how we are thinking about our future.

Starting point is 00:17:56 This, Ben points out, is hard to capture in an eval. Now, this is hugely resonant for me. I can't in very simple language describe how different it is to talk about business strategy and ideas with o3 as compared to, for example, 4o or 4.5. But it's huge. It is incalculable. There is in most situations very little of value when sharing and trying to get feedback on an idea or processing a particular business problem when just chatting with 4o and 4.5.

Starting point is 00:18:25 o3, on the other hand, is so frequently useful, if not for its blistering insight, then for different things like the way that it structures thinking through the answer to a particular problem, that it's very rare that when I'm brainstorming or ideating or thinking about something, I don't have a sort of ongoing dialogue with some combination of o3 raw and deep research with o3. And it sounds like, from what Ben is arguing in this piece, that the glow-up and change between o3 and o3 Pro might be even more significant. It seems to resonate with Sam Altman, who tweeted that particular quote about how it changed how they're thinking about their future. Now, the other thing that I think is

Starting point is 00:19:00 really important to note about Ben's review of o3 Pro, and something which relates directly back to the conversation we were having earlier this week about the Apple paper and whether and in what ways it mattered or not, is that o3 Pro's power is a real-world contextual power. It's about application and interaction with the real world, not just raw power in the lab. Ben writes, trying out o3 Pro made me realize that models today are so good in isolation we're running out of simple tests. The real challenge is integrating them into society. It's almost like a really high IQ 12-year-old going to college. They might be smart, but they're not a useful employee if they can't integrate. Today, this integration primarily comes down to tool calls, how well the model collaborates

Starting point is 00:19:42 with humans, external data, and other AIs. It's a great thinker, but it's got to grow into being a great doer. o3 Pro makes real jumps here. It's noticeably better at discerning what its environment is, accurately communicating what tools it has access to, when to ask questions about the outside world, rather than pretending it has the information or access, and choosing the right tool for the job. In other words, this is a model that is meant to be in the real world with real context. He even says that, on the flip side, its big shortcoming is that if you don't give it enough context, which could be anything from meeting notes to call transcripts to PDFs to you name it, it tends to overthink.

Starting point is 00:20:19 Quote, it's insanely good at analyzing, amazing at using tools to do things, not so good at doing things directly itself. I think it would be a fantastic orchestrator. Now, as an example of that type of overthinking, and of why it's so important with new models to figure out what use cases they open up and what use cases they're good for, investor Eric Wall demonstrated the other case. He pitted o3 against o3 Pro in selecting a group of animals to defend the user against the rest of the menagerie. There were selections like 50 eagles, 10,000 rats, five gorillas, and a single human rifleman, to give you an idea of what we're dealing with here.

Starting point is 00:20:54 After making their choice, the models then argue against each other to determine the winner. Wall writes, o3 Pro lost to o3 in this test despite thinking for 10 minutes. o3 thought for 25 seconds. Interestingly, more telling was o3 Pro's explanation of why it lost. The model wrote, thinking longer is only an advantage when the extra cycles surface new decisive information. Here, they mostly amplified a hidden assumption and buried the robustness check.

Starting point is 00:21:19 The lighter model's quick heuristic, minimize single point of failure, maximize coverage, was enough to nail the best answer faster. The point is, once again, that context is everything. If o3 Pro doesn't have enough context to chew on, it will actually use the extra inference to confuse itself by overthinking. Now, for a somewhat more substantive evaluation, one of the few sets of evals that aren't totally washed at this point is the ARC-AGI tests. Now, on this test, the TLDR basically of it

Starting point is 00:21:47 is that o3 Pro is performing pretty much in line with o3 on ARC-AGI-1, but for a much higher cost. However, what's worth noting is that ARC has intentionally started to limit the inference deployed against their tests, as they're looking for sparks of AGI at the consumer level. This means that o3 Pro probably isn't performing at the level you would use it at in high-value tasks during this testing. So what does this all mean for o3 Pro? I'm not sure yet, but my strong guess is that if Ben's right and the real majesty of this model is in how it understands context and uses tools, it's going to take just a little while for us to really understand when you should be using o3 Pro and for what, as opposed

Starting point is 00:22:24 to o3 or a different model. I am going to, myself, surely take some time even though I'm traveling this week to try to suss that out, and I will be back here to share what I've learned later in the week. For now, a very exciting day with big implications for the long term. As to this question of which of these is a bigger deal? The short answer is that they both are in totally different ways. They both show how things are trending in totally different aspects. Model capability, and practical utility even more, continue to increase, and costs continue to decrease. The net of all of that is a straight line to intelligence too cheap to meter, and incredible new capabilities for all of us to deploy.

Starting point is 00:23:03 For now though, that is going to do it for today's AI Daily Brief. Until next time, peace.
