The AI Daily Brief: Artificial Intelligence News and Analysis - AI Just Achieved Something No One Thought it Would Until Years From Now

Starting point is 00:00:00 Today on the AI Daily Brief, a major milestone of advanced AI is breached before pretty much anyone thought it was going to be. Before then, in the headlines, Netflix says they've officially used generative AI in a final production. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements today. First of all, thank you to today's sponsors, Blitzy, Plumb, Vanta, and Superintelligent. And to get an ad-free version of the show, go to patreon.com slash AI Daily Brief. Finally, if you are interested in sponsoring the show, shoot me a note at NLW at Breakdown dot Network.

Starting point is 00:00:39 Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We got another little interesting tidbit from earnings calls last week, but this time it wasn't about how much code was being completed with AI or anything like that. It was that Netflix has admitted to using gen AI for final footage for the first time in a show that actually appeared on their screens. So this scene appeared in an Argentine show called El Eternauta, translated as The Eternaut. It depicted a building collapsing, and Netflix co-CEO Ted Sarandos said that AI allowed producers

Starting point is 00:01:15 to finish the scene 10 times faster and cheaper than traditional visual effects would have allowed. Now, importantly, this wasn't, at least according to Sarandos, a case of cutting corners to save costs. Basically, in the past, a show intended for a small market like Argentina simply would have had to forgo the scene because it didn't fit in the budget. This was then not a replacement for something that could have existed otherwise. It enabled a show to have a type of production quality that wouldn't have been possible before based on the economics of where it was being released. Sarandos said, we remain convinced that AI represents an incredible opportunity to help creators make films and series better, not just cheaper. There are AI-powered creator tools,

Starting point is 00:01:55 so this is real people doing real work with better tools. Our creators are already seeing the benefits in production through pre-visualization and shot planning work and certainly visual effects. It used to be that only big budget projects would have access to advanced visual effects like de-aging, but then he went on to describe how this hit show in Argentina was able to do this sequence that just wouldn't have been in the budget before. Sarandos wrapped up, so the creators were thrilled with the result, we were thrilled with the result, and more importantly, the audience was thrilled with the result. So I think these tools are helping creators expand the possibilities of storytelling on screen, and that is endlessly exciting. Now, co-CEO Greg

Starting point is 00:02:30 Peters also said that Netflix is piloting gen AI to drive personalization, search, and ads, and that they plan to introduce AI-powered interactive ads in the second half of the year. Now, this was a test balloon, if ever I've seen one. And frankly, a pretty savvy one. By doing this first in a show that wouldn't have had the budget to have this sort of VFX otherwise, it really puts the emphasis on AI as opportunity technology, not just efficiency technology, to use a parlance from around the AI Daily Brief community. And yet by mentioning it on the earnings call, they also get a chance to see what sort of feedback and vitriol they're going to deal with. And boy, for a tiny throwaway mention on an earnings call, this got a lot of attention.

Starting point is 00:03:09 If you go search Google News for Netflix AI, there are pages and pages of results. It's not just the tech press, it's the New York Times, the BBC, the Guardian, and so on and so forth. Frankly, I can't believe that it's taken this long for this to happen, but you better believe we're going to see a lot more of this in the months to come. Except maybe not in Europe. One of the weird things going on right now with AI regulation is a bit of a global balkanization, where many companies are just not willing to engage in Europe due to the restrictions of the AI Act. Specifically, Meta has said that it will not sign on to the EU's AI Code of Practice. Released earlier this month, the Code of Practice is a voluntary framework

Starting point is 00:03:47 that is designed to help companies comply with the AI Act that bans training on pirated materials and provides transparency and documentation guidelines. One critical measure requires an AI company to comply with requests to remove copyrighted material from datasets, something which isn't easily done. Signing on to the Code of Practice isn't required, but it does give model companies more legal protections if they're accused of breaching the AI Act. Announcing that they won't sign on to the code, Meta's head of global affairs Joel Kaplan posted, Europe is headed down the wrong path on AI. We have carefully reviewed the European Commission's Code of Practice for general-purpose AI models, and Meta won't be signing it. The code introduces a number of legal uncertainties for model

Starting point is 00:04:23 developers, as well as measures which go far beyond the scope of the AI Act. Businesses and policymakers across Europe have spoken out against this regulation. Earlier this month, over 40 of Europe's largest businesses signed a letter calling for the Commission to stop the clock on its implementation. We share concerns raised by these businesses that this overreach will throttle the development and deployment of frontier AI models in Europe and stunt European companies looking to build businesses on top of them. Now, if the dispute turns into a standoff, regulation of AI could become a flashpoint for U.S.-European relations. The Trump administration has already fired a few shots across the bow, indicating that they won't abide the EU handing down massive fines to U.S. tech companies.

Starting point is 00:05:00 In a February executive order, the White House spelled out their strategy for defending American companies from extortion. Now, we are still a little bit off from the implementation date, so it's possible that EU bureaucrats could change course. The Code of Practice still needs to receive the final sign-off from the European Commission, as well as individual member states. In addition, big tech firms won't need to comply until August 2nd, although that date could end up being delayed. UCLA adjunct professor Aaron Rao writes, update, OpenAI claims it will comply with the quote-unquote voluntary EU AI code. Meta says it won't. Basically, OpenAI has taken option one, fake compliance, versus Meta doing option three, referring to another tweet of his,

Starting point is 00:05:37 principled rejection. To be clear, though, U.S. regulation isn't necessarily peachy for all AI companies either. For example, ServiceNow's acquisition of Moveworks is attracting in-depth antitrust review from the DOJ. The Justice Department opened the in-depth probe in June and is now sending follow-up requests. That doesn't necessarily mean the case will go further, but the $2.85 billion acquisition can't be completed until the probe is finalized. Interestingly, this is the first sign we've seen from the Trump Justice Department that they're concerned about market concentration in AI below the hyperscaler level. ServiceNow is, of course, a major B2B platform, but they don't create their own foundation

Starting point is 00:06:12 models. If the deal is disallowed, it would have big implications for integrated agentic products. Moveworks was acquired in part to provide a data-compatibility and discoverability layer for ServiceNow's agents. They currently rely on that layer, so blocking the acquisition would make both platforms far less competitive in the agentic era. The probe also raises questions about other acquisitions in the space. Salesforce made a similar integration play in buying Informatica in May, and while Meta's acquihire spree is technically exempt from pre-approval from the FTC, they could still end up under DOJ scrutiny at a later stage. Mostly the issue is just the time it

Starting point is 00:06:45 takes. ServiceNow has already waited four months to close the deal, and with competition moving as fast as it is, they really don't have time to wait. Speaking of acquisitions, one more story today on that front. Anysphere, which is the startup behind Cursor, is staffing up with top talent from across the startup ecosystem in a bid to keep up with larger rivals. The latest deal sees Anysphere hire several top engineers from AI-powered CRM startup Koala.

Starting point is 00:07:08 Now, reportedly, Cursor has zero interest in adding CRM to their product. They just need skilled AI engineers. The news comes as Koala prepares to shut down in September. The four-year-old startup had recently raised a $15 million Series A, but apparently decided it didn't make sense to continue. TechCrunch writes, The Koala deal paints a picture of the two types of AI startups we're seeing in 2025. There's Cursor, a juggernaut of an AI tool that is growing so fast that it's starting to encroach on the AI space's largest players, including Microsoft and Anthropic. At the same time, there's a growing

Starting point is 00:07:38 number of startups like Koala, B2B AI startups that seem promising, with a co-founder from Meta and advisors like Jack Altman, but that have quickly run out of steam. This is going to be a huge reality and a shaping force in the way that enterprise AI develops in the coming year, and one that frankly I think there is a lot of really interesting opportunity in. So much so that if you are a private equity firm or holding company who is interested in that category of enterprise or B2B companies, definitely shoot me a note. It's my initials at besuper.ai.

Starting point is 00:08:08 I think there are some fascinating things to be done in the space, as some of these startups that got funded in 22, 23, 24 just hit the wall and start to think about what they might want to do next. In any case, with that test balloon of my own floated, let's move on to the main episode. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context. Blitzy is used alongside your favorite coding copilot as your batch software development platform for the enterprise seeking dramatic development acceleration on large-scale codebases. While traditional copilots help with line-by-line completions, Blitzy works ahead of the IDE by first documenting your entire codebase, then deploying over 3,000 coordinated AI agents in parallel to batch build millions of lines of high-quality

Starting point is 00:08:52 code. The scale difference is staggering. Copilots might give you a few hundred lines of code in seconds, but Blitzy can generate up to 3 million lines of thoroughly vetted code. If your enterprise is looking to accelerate software development, contact us at blitzy.com to book a custom demo or press get started to begin using the product right away. Today's episode is brought to you by Plumb. You put in the hours, testing the prompts, refining JSON, and

Starting point is 00:09:16 wrangling nodes on the canvas. Now it's time to get paid for it. Plumb is the only platform designed for technical creators who want to productize their AI workflows. With Plumb, you can build, share, and monetize your flows without giving away your prompts or configuration. When you're ready to make improvements, you can push updates to your subscribers with a single click. Launch your first paid workflow at useplumb.com. That's Plumb with a B, and start scaling your impact. As a founder, you're moving fast towards product market fit, your next round, or your first big enterprise deal. But with AI accelerating how quickly startups build and ship, security expectations are higher

Starting point is 00:09:55 earlier than ever. Getting security and compliance right can unlock growth or stall it if you wait too long. With deep integrations and automated workflows built for fast-moving teams, Vanta gets you audit-ready fast and keeps you secure with continuous monitoring as your models, infra, and customers evolve. Fast-growing customers like LangChain, Writer, and Cursor trusted Vanta to build a scalable foundation from the start. And look, as someone who lives in the world of enterprise procurement, I love how Vanta makes it easy to get compliance right. The last thing you need when you're trying to win that big deal is to have it scuttled by something that Vanta has solved for over 10,000 companies.

Starting point is 00:10:29 Go to vanta.com slash NLW to save $1,000 today through the Vanta for Startups program and join over 10,000 ambitious companies already scaling with Vanta. That's V-A-N-T-A dot com slash N-L-W to save $1,000 for a limited time. Today's episode is brought to you by Superintelligent, specifically Agent Readiness Audits. Everyone is trying to figure out what agent use cases are going to be most impactful for their business, and the Agent Readiness Audit is the fastest and best way to do that. We use voice agents to interview your leadership and team, and process all of that information to provide an Agent Readiness score, a set of insights around that score, and a set of highly actionable recommendations on both

Starting point is 00:11:10 organizational gaps and high-value agent use cases that you should pursue. Once you've figured out the right use cases, you can use our marketplace to find the right vendors and partners, and what it all adds up to is a faster, better agent strategy. Check it out at besuper.ai or email agents at besuper.ai to learn more. Welcome back to the AI Daily Brief. There is definitely a sense in the air that we are on a precipice. It's coming from semi-whispered tweets like this one from former Stability AI founder Emad Mostaque, who writes, yes, the acceleration timelines aren't fast enough from some stuff I've seen

Starting point is 00:11:47 recently, not from unreleased AI models. A phase shift is coming very soon, and I hope we will make it okay to the other side. Sad face emoji, and then in a follow-up, he writes, sorry for vague post. This is far from the only example of something like this that I've seen recently. Another piece of this sensibility is the growing enormity of the deals being thrown around for top talent. On Twitter, for example, people are hearing about $1 billion or $1.25 billion offers for four years of work. And people responded fiercely that there must be some IP included with that, but even if those numbers aren't exactly accurate, they seem directionally correct. So this is all the background noise for news that we got at the end of last week, that OpenAI's most recent experimental reasoning model had actually won gold at the International Math Olympiad, or IMO.

Starting point is 00:12:36 Now, the IMO is a high school math competition, but it's one of the world's most difficult and prestigious, and its participants have gone on to be some of the most decorated mathematicians of their generation. The contest involves high-level theoretical math problems that require formal proofs rather than numerical answers. The model was given the same constraints as human contestants, four-and-a-half-hour exam sessions, no tools or internet access. Now, this was, of course, a test; OpenAI was not actually competing. But still, Alexander Wei, a reasoning engineer at OpenAI, writes, why is this a big deal? First, International Math Olympiad problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we've now progressed

Starting point is 00:13:16 from GSM8K, around 0.1 minutes for top humans, to the MATH benchmark, around one minute, to AIME, around 10 minutes, to the International Math Olympiad, which takes around 100 minutes. Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut verifiable rewards. By doing so, we've obtained a model that can craft intricate, watertight arguments at the level of human mathematicians. Besides the result itself, he continues, I'm excited about our approach. We reached this capability level not via narrow task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling. Now, to get specific about the performance, the model solved five of six questions and was

Starting point is 00:13:59 independently verified by former IMO medalists, which would again place it performing well enough for gold. And to reinforce what's different about this performance as compared to the ARC-AGI test that o3 ran late last year, the result was achieved completely without tools like a Python execution environment or a web browser. Everything the model knows about math was learned in pre-training or during the reinforcement learning process. Now for people who have been watching the benchmarks on a level that's about more than just advertising your latest model, the IMO gold medal has been one of the achievements that could mark a significant advancement. In fact, this benchmark is something that people have opined on since at least 2022, and that basically no one thought would arrive this soon.

Starting point is 00:14:38 Nat McAleese wrote, we're seeing much faster AI progress than Paul Christiano and Eliezer Yudkowsky predicted, who had gold in 2025 at 8% and 16% respectively by methods that are more general than expected. Now, those predictions were made in February 2022 and presumed the use of tools, and while someone pointed out that Yudkowsky actually had it at at least 16%, because it was in the context of a bet with Christiano, the point still remains that they had it fairly low. Now, what's relevant about these two guys is that they've been mainstays of the AI safety world for decades, frequently warning of fast takeoff, meaning that they're inclined to think that things were going to happen quickly. Terence Tao, the youngest person to ever participate in the IMO at

Starting point is 00:15:15 the age of 10, and one of the greatest living mathematicians, also didn't see this coming. Last month in an appearance on the Lex Fridman podcast, Tao predicted that AI wouldn't score very highly on the IMO tasks and should start with a contest where the solution isn't a long-form proof. Even professional AI skeptic Gary Marcus was impressed when he learned that the model didn't have access to external tools. Now, over the weekend, OpenAI staff chimed in on why the results matter so much. Jerry Tworek wrote, why I'm excited about the IMO results we just published, we did very little IMO-specific work. We just kept training general models. All natural language proofs, no evaluation harness. We needed a new research breakthrough and Alex Wei and team delivered.

Starting point is 00:15:52 Researcher Noam Brown unpacked the technical details a little more, writing, typically for these AI results like in Go, Dota, poker, diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn't an IMO-specific model. It's a reasoning LLM that incorporates new experimental general purpose techniques. So what's different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this,

Starting point is 00:16:19 proofs are pages long, and take experts hours to grade. Compare that to AIME, where answers are simply an integer from zero to 999. Also, this model thinks for a long time. o1 thought for seconds, Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking, and there's a lot of room to push the test-time compute and efficiency further. He also discussed the acceleration, commenting,

Starting point is 00:16:42 it's worth reflecting on just how fast AI progress has been, especially in math. In 2024, AI labs were using grade school math, GSM8K, as an eval in their model releases. Since then, we've saturated the high school math benchmark, then AIME, and are now at IMO gold. Where does this go? As fast as recent AI progress has been, I fully expect the trend to continue. Importantly, I think we're close to AI substantially contributing to scientific discovery. There's a big difference between AI slightly below top human performance versus slightly above. Now, this is obviously something that Sam Altman talks about all the time, that he thinks 2026 is the year that we start to get actual scientific advancement from AI, which would be a fundamentally

Starting point is 00:17:20 different place than we are now. Now, of course, all of this really begs the question of where we are on the journey towards AGI, or however we want to describe the next clear phase in AI's existence. This kind of generalized reasoning seems like a big unlock. Until now, reinforcement learning training required very clear, verifiable results. Now, you can extend that concept a little to more subjective tasks like writing, but a person still needs to be able to decide if a response is correct or incorrect. Whatever the OpenAI research team pulled off sounds like it used a different method of training that generalizes far better. Emad Mostaque again wrote,

Starting point is 00:17:53 This was a year earlier than I expected. Anon, are you still smarter than this stochastic parrot? Being able to infer for hours is one of those takeoff unlocks. He continued, AGI is already here. All the components exist, we just need to stitch them together. It's artificial general intelligence, not artificial top percentile human intelligence.

Starting point is 00:18:12 Two years ago, who would have said an IMO gold medal and topping benchmarks isn't AGI? Will Brown, a reinforcement learning specialist at Prime Intellect, posted, I'm much more inclined to say that the RL system inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it. But really what you want is an interface for self-improvement that looks more like email than software engineering. You want to be able to tell it to go get better at PowerPoint and then it figures out how to get durably better. Now recall that Sam Altman in recent essays has said that they feel like they know how to achieve AGI but they just need to iterate on it internally. What does he have to say about all this? Altman tweeted, we achieved gold

Starting point is 00:18:48 medal-level performance on the 2025 IMO competition with a general purpose reasoning system. To emphasize, this is an LLM doing math and not a specific formal math system. It is part of our main push towards general intelligence. When we first started OpenAI, this was a dream, but not one that felt very realistic to us. It is a significant marker of how far AI has come over the past decade. We're releasing GPT-5 soon, but want to set accurate expectations. This is an experimental model that incorporates new research techniques we will use in future

Starting point is 00:19:16 models. We think you will love GPT-5, but we don't plan to release a model with IMO gold level of capability for many months. Basically, the model that they used in this IMO test was more advanced than GPT-5. So in this case, we have Altman shifting back to trying to tamp down expectations, which is sort of the opposite direction that he's been running recently, or at least properly make people understand that they shouldn't expect this level of performance out of the next big GPT release. Now, one additional benchmark note before we talk a little bit about GPT-5 is that last week, ARC Prize announced a preview of ARC-AGI-3. ARC-AGI-2 was already one of the hardest tests when it came to determining how

Starting point is 00:19:55 capable of thinking like humans an AI is, but they call ARC-AGI-3 the interactive reasoning benchmark with the widest gap between easy for humans and hard for AI. It's a game-based system. They're releasing three games or environments, with a starting score of frontier AI at 0% and humans at 100%. They write, every game environment is novel, unique, and only requires core knowledge priors. No language, trivia, or specialized knowledge is needed to beat the games. Your ability to efficiently adapt to novelty defines your intelligence, not your performance on a single skill.

Starting point is 00:20:24 Harder puzzles don't prove smarter AI, but rather its ability to learn new rules does. ARC Prize exists to operationalize that insight. Agents, ARC Prize points out, are now the frontier. They perceive, plan, act, remember, adapt. Static puzzles aren't equipped to grade that loop. We need interactive benchmarks that test world model building and long horizon planning under sparse feedback. And that's where ARC-AGI-3 comes in. In total, it's going to be six games, three of which are live today and three of which will go live in August, that are easy for humans but out of reach for today's best AI. So this is something we will look at more and dig into a little bit as companies start testing their models against this, but the point is that we're continuing to see

Starting point is 00:21:01 advancements in how we even test for whatever AGI actually is. Back to GPT-5 though. However good it ends up being, the rumor mill is running rampant. Yuchen Jin of Hyperbolic Labs writes, Heard GPT-5 is imminent from a little bird. It's not one model but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models. That's why Sam said they'd fix model naming. Prompts will just auto-route to the right model. GPT-6 is in training.

Starting point is 00:21:28 Now, although Sam tried to tamp down expectations after the International Math Olympiad gold, Ethan Mollick writes, even if GPT-5 did nothing besides switching people between o3 and 4o automatically, it would really transform most people's view of AI. Very few people, even paying users, know that they should often switch to a more capable model, and when you show them o3, they're impressed. And if you're looking for one more piece of evidence that whatever they got cooking in the OpenAI lab is serious, whether it's GPT-5, GPT-6, or further on, let's go back to the talent wars.

Starting point is 00:21:58 The Wall Street Journal reports that more than 10 OpenAI researchers were offered $300 million four-year packages to make the jump to Meta, and that many have turned it down. Professional leaker Jimmy Apples commented, OpenAI staff declining $300 million packages and you don't feel the AGI? Lots and lots of intrigue to watch, but for now, that is going to do it for today's AI Daily Brief. Actually, I have a quick addendum before we wrap up here. The general pattern for the AI Daily Brief is that I'll record in the morning. It gets edited in the afternoon and comes out in the evening.

Starting point is 00:22:28 And sometimes we get some update between when I record and when it comes out that is particularly meaningful. When it came to this story today about the International Math Olympiad, everything was obviously about OpenAI. However, there had been some scuttlebutt that they weren't the only one to achieve this level of results. Over the weekend, Jasper on X wrote, just saw a post from Joseph Myers, involved in the Math Olympiad since 1992. The IMO committee reportedly asked AI labs not to publish results until seven days after the closing ceremony, out of respect for human contestants and likely to allow time for proper verification of AI submissions and formats. According to Joseph, OpenAI didn't collaborate with the IMO to test their model,

Starting point is 00:23:07 and none of the 91 official IMO coordinators were involved in grading its solutions. Meanwhile, it seems DeepMind is following the rules and patiently waiting their turn. Now, I have no knowledge at all around what was communicated or not to these different firms, and I'm not at all interested in any sort of spat therein, but I did want to add that it is not just this new experimental OpenAI model that got the equivalent of a gold at the IMO. Around noon today, Eastern Time, the Google team took to Twitter slash X to shout the good news that an advanced version of Gemini Deep Think had also gotten a score consistent with a gold placement. In their announcement post, they quote Dr. Gregor Dolinar, who wrote,

Starting point is 00:23:44 We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points, a gold medal score. Their solutions were astonishing in many respects. IMO graders found them to be clear, precise, and most of them easy to follow. Google DeepMind's CEO, Demis Hassabis, gave a little bit more information on the approach. He wrote, We achieved this year's impressive result using an advanced version of Gemini Deep Think, an enhanced reasoning mode for complex problems. Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions, all within the four-and-a-half-hour competition time limit. We'll be making a version of

Starting point is 00:24:19 this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers. So like OpenAI, it sounds like this is a most advanced model, one that is not necessarily on the immediate term horizon when it comes to general consumer or enterprise use, and Demis also did seem to take a swing at OpenAI. He followed that tweet with, by the way, as an aside, we didn't announce on Friday because we respected the IMO board's original request that all AI labs share their results only after the official results had been verified by independent experts, and the students had rightly received the acclamation they deserve. We've now been given permission to share our results and are pleased to have been

Starting point is 00:24:55 part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI. Now, again, I am completely uninterested in getting involved in the back and forth between these two companies. However, as the takeaway and why I thought it was important to come back and add this addendum, first of all, it appears that Google got the same result at the same time, and so we don't want to just tell the story of this being an OpenAI triumph. Instead, what it tells the story of is that, generally speaking, the state of the art now includes gold medal performance at the IMO. In fact, I think that everything that we have talked about,

Starting point is 00:25:31 the implications of this for AGI and how fast things are changing, are even more reinforced by the fact that it was not one but two labs who got this result at the same event. So congrats to the team at Google as well, and let's hope we get our hands on that model soon. Once again and for real this time, appreciate you listening or watching as always, and until next time, peace.
