The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Model Wars Just Heated WAY Up

Episode Date: August 2, 2025

Today’s AI Daily Brief dives into the escalating model wars between OpenAI, Google, and Apple. OpenAI seems to have leaked GPT-5 and their open weights model temporarily, plus the surprise launch of... Google’s Gemini 2.5 Deep Think, and why Apple is scrambling to catch up, with M&A as its only viable AI strategy. We also explore new AI interface innovations from Manus and Perplexity, plus the implications of China’s probe into Nvidia’s H20 chips.

Ask GPT about our Agent Readiness Audits - https://bit.ly/supersuperagent

Brought to you by:
KPMG - Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
AGNTCY - The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at agntcy.org
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The automation platform for AI experts and consultants https://useplumb.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, the latest in the very much heating up AI model wars. Before that in the headlines, can Apple actually acquire its way out of AI failure? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements for this Friday. First of all, thank you to today's sponsors, Blitzy, Vanta, Plumb, and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And if you're interested in sponsoring the show, hit me at NLW at Breakdown.network. But with that, let's dive into some Wall Street dealmaking and model intrigue.
Starting point is 00:00:44 Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We have a very deal-oriented headlines today, kicking off with Apple, who were the latest big tech company to do their quarterly earnings. TLDR on this one is that while the company is seeing a big rebound in iPhone sales, their failure on AI is absolutely weighing them down. Big Tech earnings have been telling a very clear story around AI. The companies that are betting big like Microsoft and Meta have seen blowout results and massive increases in their stock prices. Indeed, despite the fact that some analysts remain concerned on this CapEx spending, as the Wall Street Journal put it, Big Tech's $400 billion AI spending spree just got Wall Street's blessing. The sort of results that Microsoft and Meta are putting up are, in other words,
Starting point is 00:01:30 justifying even in the short term those costs to Wall Street investors. And then there's Apple. Strange, strange, strange Apple. On the one hand, the numbers were actually strong. iPhone sales were up 14% to reach 44.6 billion for the quarter, which was 10% above analyst forecasts. Top line revenue was up 9.6%, which wasn't nearly as impressive as the cloud giants, but far from a disaster. Guidance was also strong, with Apple predicting revenue growth in the mid-to-high single digits for the next quarter, which is well above the previous 3% analyst forecast. Overall, this was Apple's strongest quarter of revenue growth since December 2021. In any other era, this would have been a blowout quarter that defied negative views from analysts,
Starting point is 00:02:11 but the market only gave Apple a 2% boost in after-hours trading. And honestly, if you have to have one takeaway from mega-cap tech earnings this week, it's that the market does not care how many iPhones you ship, only how many AI tokens you're serving. Apple is now squarely in catch-up mode and did, in fact, articulate something of a plan to investors. During the earnings call, CEO Tim Cook said, we see AI as one of the most profound technologies of our lifetime.
Starting point is 00:02:38 We are embedding it across our devices and platforms and across the company. We are also significantly growing our investments. Apple has always been about taking the most advanced technologies and making them easy to use and accessible for everyone, and that's at the heart of our AI strategy. Now, of course, this is what we all thought made sense last year when they announced Apple Intelligence, but their ability to deliver on that idea has been woefully lacking.
Starting point is 00:03:00 Cook did go on to say that they were, quote, reallocating a fair number of people to focus on AI features and that they have a, quote, great, great team and we're putting all of our energy behind it. A CNBC interview ahead of earnings had focused on mergers and acquisitions as Apple's potential AI solution. Cook said that Apple would significantly grow and is, quote, open to M&A that accelerates our roadmap. He touted seven acquisitions made so far this year, but also acknowledged that none of them were huge in terms of a dollar amount. Later on the earnings call, Apple also said that they were currently making acquisitions at the rate of one every several weeks.
Starting point is 00:03:31 So clearly they are leaning into this narrative that maybe they can buy their way out of this. Now, if that is the strategy that they approach, it will represent a big shift for Apple, which has historically been reluctant to acquire their way to success. For many observers, including me, it's felt like this was their only path. Back as Apple Intelligence really started to sputter painfully between Q1 and Q2, I and many others suggested that Apple should go out and try to buy one of the foundation model companies, Mistral, Perplexity, or my top choice at the time, Anthropic. However, since then, the problem is that if that is the strategy they want to pursue, valuations
Starting point is 00:04:04 are now running away from Apple. Anthropic was already a stretch in March, but with its valuation rumored to have reached $170 billion, it's now basically unobtainable. Even Mistral and Perplexity have moved from single-digit billions to the mid-teens. What's more, it'd be surprising if any of the leading AI startups are even really open to an acquisition partner. I don't want to overstate that case. You never know what's going on behind the scenes. But the point is that the longer that Apple waits, the harder it gets. The people who watch Apple most closely just aren't buying this. Bloomberg's Apple watcher Mark Gurman writes,
Starting point is 00:04:34 to be sure, Cook has said several times that Apple is unafraid of big deals, and they've still never done one. While I do think AI changes that, Apple's pace of M&A has only slowed down dramatically in recent years, and his comments today aren't a new line of thinking. Frankly, to me, it sounds kind of like Tim Cook is trying to ride it out for a few years. Honestly, he kind of reminds me of Jay Powell at FOMC pressers when he does nothing and says we're just going to have to wait
Starting point is 00:04:56 and see. When an analyst asked Cook whether he thinks AI models will become commoditized, Cook declined to answer, adding, that gives away some things on our strategy. But honestly, even if they are playing a wait and see strategy or they have some conviction around when things get commoditized, they are paying real costs in the meantime. Peter Anderson of Anderson Capital Management said, Apple's embarrassing AI shows how it has lost its mojo with innovation. And the lack of innovation speaks to the lack of revenue growth. And that speaks to why we don't see upside in the stock. Says Glenview Trust Chief Investment Officer Bill Stone, it's hard to be excited about Apple when you can go look to the other Magnificent Seven stocks and find double-digit growth that should continue for a while amid the AI wave, especially since those are often cheaper. Apple would be a lot more interesting if the multiple was lower, but what finally gets growth going again is the biggest question. Now, moving over to the private market, we have some further reporting on the rumored OpenAI deal. According to New York Times sources, OpenAI has raised $8.3 billion at a $300 billion valuation. The round
Starting point is 00:05:55 was completed months ahead of schedule and was five times oversubscribed. ARR is not the 12 billion that was reported earlier this week, but in fact 13 billion. That's up from 10 billion in June and projected to surpass 20 billion by the end of year. Importantly, given how much conversation there is around Anthropic starting to eat their lunch in this area, the number of business users who pay for ChatGPT has jumped from 3 million a few months ago to 5 million now. Meanwhile, Anthropic's revenue number does seem to be 5 billion, and growing by my calculations at maybe more than a billion a month. Moving over to the geopolitical side of things,
Starting point is 00:06:29 Chinese authorities have summoned NVIDIA to discuss alleged security risks with their H20 chips. The investigations are beginning before the first shipments can even land following the U.S. reversing their ban on H20 exports. Yesterday, the Cyberspace Administration of China called company representatives to discuss what they call serious security vulnerabilities. The CAC wrote, U.S. lawmakers have previously called for advanced chips exported from the U.S.
Starting point is 00:06:51 to be equipped with location tracking features. The location tracking and remote shutdown capabilities on Nvidia computing chips are ready, according to U.S. AI experts. The regulator is demanding that Nvidia release documentation of this and explain the potential loopholes and backdoor capabilities of the H20. Now, the notice has some pretty obvious echoes to the Huawei ban of 2019. U.S. officials declared the company's networking equipment to be an unacceptable national security risk and placed Huawei on an import blacklist. And while there's good reason to believe their claims, the ban had the second order
Starting point is 00:07:20 effect of stopping the Chinese hardware giant in its tracks. It ensured Western competitors like Cisco and Juniper continued to have a place in the market. It's entirely possible that Chinese officials are attempting to do the same thing for Huawei in their domestic AI market. In the six or so months where H20 chips have been banned, Huawei have made large investments in building their own competitive chips. In fact, a large part of the rationale for unbanning the H20s was to ensure that Huawei can't establish a foothold in their domestic market due to a lack of competition. Forrester analyst Charlie Dai said,
Starting point is 00:07:49 the CAC's scrutiny over H20 security risks could further erode NVIDIA's Chinese market share amid rising domestic competition. It also aligns with China's broader push to accelerate domestic semiconductor alternatives for technological self-reliance amid U.S. export controls. NVIDIA, for its part, denied the allegations saying, cybersecurity is critically important to us. NVIDIA does not have backdoors in our chips that would give anyone a remote way to access or control them. Lastly today, a new user milestone, GitHub Copilot has crossed 20 million users.
Starting point is 00:08:18 Microsoft's CEO, Satya Nadella, slipped the comment into Wednesday night's earnings call, although it wasn't immediately clear whether this was weekly or monthly active users or some other metric at the time. TechCrunch has now confirmed that Nadella was quoting all-time users. And while obviously a recurring number would give a better picture of what current usage is actually like, it's still pretty impressive for Microsoft. It means that 5 million users have tried out GitHub Copilot for the first time over the past three months, which is a very big ramp up. By way of comparison, although we haven't heard from Cursor for a while,
Starting point is 00:08:48 back in March, they had a million daily active users. We can only assume that a lot of these GitHub Copilot users aren't coming back day after day, but the number still demonstrates just how much of a distribution advantage Microsoft has over its startup rivals. For now, though, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with infinite code context.
Starting point is 00:09:13 Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
Starting point is 00:09:38 Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase
Starting point is 00:09:56 on a real development project in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. That's BLITZY.com. As a founder, you're moving fast towards product market fit, your next round, or your first big enterprise deal.
Starting point is 00:10:15 But with AI accelerating how quickly startups build and ship, security expectations are higher earlier than ever. Getting security and compliance right can unlock growth or stall it if you wait too long. With deep integrations and automated workflows built for fast-moving teams, Vanta gets you audit-ready fast and keeps you secure with continuous monitoring as your models, infra, and customers evolve. Fast-growing customers like LangChain, Writer, and Cursor trusted Vanta to build a scalable foundation from the start. And look, as someone who lives in the world of enterprise procurement, I love how Vanta makes it easy to get compliance right. The last thing you need when you're trying to win that big deal is to
Starting point is 00:10:51 have it scuttled by something that Vanta has solved for over 10,000 companies. Go to Vanta.com slash NLW to save $1,000 today through the Vanta for Startups program and join over 10,000 ambitious companies already scaling with Vanta. That's VANTA.com slash NLW to save $1,000 for a limited time. Today's episode is brought to you by Plumb. Are you building with AI? Plumb noticed that every technical creator tends to hit the same wall. You've got AI workflows people want, but monetizing them feels impossible because client work doesn't scale. Selling copies gives away your IP. And building your own platform, that's becoming a software company. It's a hard gap to bridge, and that's why they built Plumb. Plumb helps creators build an audience of paid subscribers for their AI workflows,
Starting point is 00:11:37 all on a single platform. Think Substack for automations. There's no need to build extra infrastructure just to get paid for your expertise. Plumb handles that so creators can do what they do best, solving problems with AI. Ready to turn your expertise into passive income? Visit useplumb.com, that's Plumb with a B. If you are a regular listener, you will have heard about Superintelligent's Agent Readiness Audits at this point,
Starting point is 00:12:02 but I wanted to tell you today about the full suite of agent readiness products that go beyond just the initial readiness report. Over the last six months, Superintelligent has built out an entire agent planning suite. We help you move from discovery to planning to implementation. After you've completed your agent readiness audits, we help you double-click on your most important use cases with what we call our use case planning reports. These reports are going to help you understand what sort of technical preparation you need to do to be ready for a use case, what challenges you might face in implementation, and whether you should be thinking about building, buying, partnering, or some combination. After that,
Starting point is 00:12:37 you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with, what they need to build exactly the agent that you're looking for. If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions. Just go to bit.ly slash super agent. That's bit.ly slash super agent, all one word. And if you have any questions,
Starting point is 00:13:03 the agent can even help you book an appointment with our team. Welcome back to the AI Daily Brief. This is one of those shows where there is a better-than-I-like chance that by the time you're actually listening to it, a huge amount of the information has changed because something has just been released. But given that, I'm going to try to get it out as soon as possible. And I think even if some new models have been released in the meantime,
Starting point is 00:13:23 the broader point that model competition is heating up significantly right now is going to remain. So to dig into this, obviously, if you are paying attention to AI right now, the big thing that everyone is waiting for is GPT-5. It feels like we are a matter of days away from this. We've seen more and more examples of what people think is GPT-5 in the wild. We had a bunch of what we thought were test models that were taken off the testing arena suggesting that it was getting ready for release. And then just at A,
Starting point is 00:13:51 we got GPT-5 for a few minutes only to see it get removed. Basically, very briefly on Hugging Face, a new reported OpenAI model called GPT-5 New Proxy API EV3 popped up only to be withdrawn very, very quickly thereafter. In addition, we got what looked like OpenAI's open-source model. In fact, it looked like we got two versions of it, gpt-oss-120b and gpt-oss-20b, so potentially a 120-billion-parameter version and a 20-billion-parameter version. Chedislua writes, the repo only provided three bits of info, dtypes, config.json, and the weights. Now, people dug in very quickly to the limited information we had, with a lot of people assuming
Starting point is 00:14:35 that this was actually an intentional leak to build up hype. People are speculating about the architecture, thinking that it's a mixture of experts model, and other people are just starting to get hyped. Vracer X writes, Elon says OpenAI betrayed their mission, but meanwhile, OpenAI just leaked o3-level open source models and deleted them. Too late, 120 billion parameters, MoE with four experts, runs on a single H100, 130K context, RoPE-scaled, blazing fast, multilingual code-native FP4 train, no API, no gatekeeping, just raw weights. They write, this might be the biggest open source moment since DeepSeek. Mr. Who, matter what? If this is real, the monopoly just cracked.
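As a rough sanity check on the "runs on a single H100" claim in that post, the weight footprint can be estimated from parameter count and numeric precision. This is a back-of-the-envelope sketch, not a statement about the actual model: the 120 billion parameter count and FP4 precision are the rumored figures quoted above, 80 GB is the memory of the common H100 configuration, and the function name is made up for illustration. It counts weights only and ignores the KV cache, activations, and any layers kept at higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


RUMORED_PARAMS = 120e9  # assumption: parameter count quoted in the post
H100_MEMORY_GB = 80     # assumption: 80 GB H100 variant

for label, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = weight_memory_gb(RUMORED_PARAMS, bits)
    verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
    print(f"{label}: about {gb:.0f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions only the 4-bit case leaves headroom on one 80 GB card, which is consistent with the claim but does not confirm it.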
Starting point is 00:15:14 Now, a lot of the dissection so far is pretty technical. It obviously doesn't have anything to do with the sort of use case discussion that is the bread and butter of this show. But the point is that it appears that in addition to GPT-5 coming very soon, we are very much on the verge of getting their open weights model as well. And at least from the developer community, there is easily as much excitement about that. Although, to be fair, there is also some amount of lingering skepticism. Nathan Lambert writes, I welcome all contributions to open model ecosystem, but seriously doubt we can rely on OpenAI to be a long-term champion we need in releasing more models. Happy to be proven wrong, we'll see what the next couple days bring. Now, if a lot of
Starting point is 00:15:48 the excitement is about the theoretical OpenAI models coming soon, Google swooped in with something that I don't think people really expected. You'll remember that recently we got both OpenAI and Google achieving the equivalent of a gold medal on the International Math Olympiad with state-of-the-art versions of their models. While Sam Altman made it clear that exactly that version wouldn't be coming to consumers anytime soon, Google just dropped their version of that model. It's called Gemini 2.5 Deep Think. CEO Sundar Pichai writes, We're bringing a version of Deep Think that achieved gold medal status at IMO to Ultra
Starting point is 00:16:23 subscribers in the Gemini app, and the official version is now in the hands of mathematicians. So, editor's note here, it sounds like this is close enough for them to claim it's the same thing, but there are some differences apparently with the version that some number of mathematicians have. Anyways, Pichai continues, toggle it on when reasoning through complex scientific literature, tackling a coding problem that requires careful consideration of time complexities, or anything else Demis Hassabis considers a fun Friday night. Putting my branding hat on for a second, being geeky and playful is a good fit for Google.
Starting point is 00:16:53 They should do more of this. In their blog post, they talk a little bit more about how Deep Think works. Basically, they say that it extends Gemini's parallel thinking time. Just as people tackle complex problems by taking the time to explore different angles, weigh potential solutions, and refine a final answer, Deep Think pushes the frontier of thinking capabilities by using parallel thinking techniques. This approach lets Gemini generate many ideas at once and consider them simultaneously, even revising or combining different ideas over time before arriving at the best answer. By extending the inference time or thinking time, we give Gemini
Starting point is 00:17:23 more time to explore different hypotheses and arrive at creative solutions to complex problems. We've also developed novel reinforcement learning techniques that encourage the model to make use of these extended reasoning paths, thus enabling Deep Think to become a better, more intuitive problem solver over time. Now, the principle of this makes total sense to me, right? It's not dissimilar to the core idea underlying my Dr. Strange theory, which is basically that when you have this much intelligence, one of the really interesting ways to deploy it is to use a bunch of different scenarios, basically to answer a question in a bunch of different ways, and then see which one seems to be best for whatever set of criteria. So what are the types of use cases that
Starting point is 00:18:00 this approach opens up? One, says Google, is iterative design and development. They write, we've been impressed by Deep Think's performance on tasks that require building something complex piece by piece. For example, we've observed Deep Think can improve both the aesthetics and functionality of web development tasks. The example prompt they give is design and create a very creative, elaborate and detailed voxel art scene of a pagoda in a beautiful garden with trees, including some cherry blossoms. Make the scene impressive and varied and use colorful voxels. Use whatever libraries to get this done, but make sure I can paste it all into a single HTML file and open it in Chrome. It shared the outputs of Gemini 2.5 Flash, Gemini
Starting point is 00:18:35 2.5 Pro, and Gemini 2.5 Deep Think. Other areas they highlight for Deep Think are scientific and mathematical discovery, basically saying it's a powerful tool for researchers, as well as algorithmic development and code. Specifically, they say that it excels at tough coding problems in which formulation and careful consideration of tradeoffs and time complexity is paramount. Now, this being a new model release, of course, there had to be some benchmarks, and according to Google at least, Deep Think absolutely crushes them. On Humanity's Last Exam, it meaningfully exceeds Gemini, OpenAI, and Grok 4, which you'll remember just about five minutes ago, was the model that everyone was talking about.
Starting point is 00:19:10 On LiveCodeBench, it also sees a major jump up. And then, of course, in mathematics, on both the IMO 2025 and the AIME 2025, there is a major step change with this model. Now, so far, not too many people have popped up sharing their early access experiments with Deep Think, so we're going to have to wait a couple days to see what people do with it when they get their hands on it. That said, the people who have had it are reporting pretty favorable first impressions. Writes Professor Ethan Mollick, very good model, big gains over standard Gemini 2.5 Pro for a lot of problems. One of the examples he gives is the Starship Control Panel prompt that he tries with every model. We recently shared a version that he thought might be GPT-5, which was very
Starting point is 00:19:50 impressive relative to the previous competitors, but he pointed out that this is the first time he's seen a model make a 3D interface in response. By the way, if you were watching this, this is the one that we previously shared. He wrote, the mystery model Summit with the prompt, create something I can paste into p5.js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future. 2351 lines of code first time. He also shared his otter on a plane using Wi-Fi and draw a unicorn with TikZ, which as he says is a language built for scientific diagrams and very much not for drawing. Now right now, Deep Think is only available for Ultra
Starting point is 00:20:24 subscribers, and over the next couple of days I'll be keeping an eye out for people who are experimenting, and frankly, if I don't see enough of it, I'll just dig in there myself. Now, it's not just raw, state-of-the-art frontier models that are interesting right now. There's a lot of interface development happening around the models as well. Manus, for example, on Thursday introduced what they're calling Wide Research, as opposed to deep research, get it? They write, earlier this year, the launch of Manus defined the category of general AI agents and shaped how people think about what agent products can and should be.
Starting point is 00:20:55 But to us, Manus was never just an AI. It has always been a one-of-a-kind personal cloud computing platform. Traditionally, harnessing cloud compute for custom workflows has been a privilege reserved for engineers and power users. At Manus, we believe AI can democratize that power. Behind every Manus session runs a dedicated cloud-based virtual machine, allowing users to orchestrate complex cloud workloads simply by talking to an agent. From generating tailored rental presentations to safely evaluating cutting-edge open-source projects, the Turing completeness of the virtual machine is what gives Manus its generality, and opens the door to endless creative possibilities. Naturally, we've been asking ourselves, how can we scale the compute available to each user
Starting point is 00:21:31 by 100x, and what new possibilities emerge when anyone can control a supercomputing cluster just by chatting? After months of optimization, our large-scale virtualization infrastructure and highly efficient agent architecture have made this vision a reality. Today, we're introducing the first feature built on top of this foundation, Manus Wide Research. They say that basically Manus is a way to go after complex, large-scale tasks that might require information on, for example, hundreds of items. So, for example, if you wanted to understand recent earnings results across all of the Fortune 500, that's the sort of breadth that Wide Research is designed for. What the wide really conveys, and this goes back a little bit to the architecture conversation we were just having
Starting point is 00:22:11 around Deep Think as well, is that this is a system for parallel processing, and those are the words that they use. In fact, they write, at its core, Wide Research is a system-level mechanism for parallel processing and a protocol for agent-to-agent collaboration. The key to Wide Research, they write, isn't just having more agents, it's how they collaborate. Unlike traditional multi-agent systems based on predefined roles, every sub-agent in Wide Research is a fully capable general-purpose Manus instance. Which is super interesting. Basically, this means that tasks are not bound by a predetermined architecture of agents, but the model can theoretically figure that out. And once again, people are only just starting to get access to Wide Research, but people are into exploring this approach of breaking
Starting point is 00:22:50 research into subtasks and seeing agents work on them in parallel. One more platform that I view in this model battle that is, again, more of an interface or an approach update than it is just a model improvement, is Perplexity's Comet Browser. Now, this is something I've talked about a fair bit on this show, but I feel like every day that goes on, I'm seeing more and more people rave about Perplexity Comet and desperately try to get an invite to the system. Take, for example, Tobi Lütke, the CEO of Shopify, who wrote, I'm constantly impressed with Perplexity Comet. Amazing to give it a complex task and watch it claim a tab and toil away at it.
Starting point is 00:23:22 Browsers are interesting again. People have been sharing their use cases like cleaning up subscriptions, consolidating information, and more. I think people are definitely seeing in this, not just a new type of browser, but the first step to something bigger. Arvind, and not the Aravind Srinivas who is the CEO of Perplexity, writes, Comet Browser is getting really innovative. Now we can easily automate mundane things we do
Starting point is 00:23:44 on the net. I see it as an early precursor to an AI OS. I'm worried Zuck and his Meta who suck all our data is going to get there first. I hope not. I root for Perplexity winning it. Mostly, if you are an enfranchised AI user, what a time to be alive. We've got exciting rumors coming every day, actually super powerful new models to use, agentic systems in practice for the first time, and whole new interfaces to explore. I highly recommend taking some time this weekend to go check out some of these new tools, although like I said, it's getting a little pricey given that all of them are coming to their version of an ultra subscription first. But still, what a time to be alive, excited to see what we all get to build next. For now, that is going to do it for today's AI Daily Brief.
Starting point is 00:24:25 Appreciate you listening as always. And until next time, peace.
