The AI Daily Brief: Artificial Intelligence News and Analysis - How to Get to AGI

Starting point is 00:00:00 Today on the AI Daily Brief, the jagged frontier of AI capabilities and how to get to AGI. Before that in the headlines, the latest in AI lawsuits. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Robots & Pencils, Blitzy, and Superintelligent. And to get an ad-free version of the show, go to patreon.com. Lastly, if you are interested in sponsoring the show, shoot us a note at sponsors at aidailybrief.ai. Welcome back to the AI Daily Brief Headlines Edition,

Starting point is 00:00:37 all the daily AI news you need in around five minutes. One of the big questions surrounding AI right now is just how disruptive the shift will be from the traditional search paradigm of blue links that take you to a publisher's website to AI overviews, be they on Google or in ChatGPT or wherever people are getting their information. It's very clear that this is going to shift the fundamental business model of the internet, and it's likely to do so in ways that are a little bumpy along the path. And while Google has argued that reports about LLMs cutting into search traffic are overstated, some publishers seem not to agree. Penske Media, which publishes Rolling Stone, The Hollywood Reporter, Billboard, and Variety, claims that it has seen a substantial drop in traffic

Starting point is 00:01:19 and a one-third decrease in advertising revenue this year, and has subsequently sued Google. Penske attributes the decline directly to AI overviews, claiming that there's little reason for users to click through to source articles anymore. Penske said, we have a responsibility to proactively fight for the future of digital media and preserve its integrity, all of which is threatened by Google's current actions. Now, interestingly, the lawsuit is not grounded in copyright infringement. News content typically doesn't enjoy the same type of copyright protection when paraphrased because facts about the world are in the public domain. Instead, the lawsuit argues that Google used its position in the market to push unfair terms on website publishers. Google, they claim, hasn't allowed websites

Starting point is 00:01:58 to distinguish between crawling for the purposes of appearing in search results and crawling to be summarized in an AI overview. That means that Penske can only opt out of AI overviews if they also delist from Google search entirely. Danielle Coffey, the CEO of the News and Media Alliance, commented, all of the elements being negotiated with every other AI company doesn't apply to Google because they have the market power to not engage in those healthy practices.
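The opt-out problem described above can be sketched with a robots.txt file and Python's standard-library parser. This is only an illustration under stated assumptions: the crawler names ExampleSearchBot and ExampleAIBot and the publisher URL are hypothetical placeholders, not real Google user agents, and the sketch says nothing about how any real crawler is actually configured.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical article URL, used only for illustration.
URL = "https://publisher.example/article"

# Scenario A (assumed): search and AI summaries use separate crawlers,
# so a publisher can block one and keep the other.
SPLIT_CRAWLERS = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

# Scenario B (assumed): one crawler feeds both search and AI summaries,
# so the only available rule blocks both uses at once.
SHARED_CRAWLER = """\
User-agent: ExampleSearchBot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("split, search bot:", allowed(SPLIT_CRAWLERS, "ExampleSearchBot", URL))
    print("split, AI bot:", allowed(SPLIT_CRAWLERS, "ExampleAIBot", URL))
    print("shared crawler:", allowed(SHARED_CRAWLER, "ExampleSearchBot", URL))
```

In the split scenario the publisher stays in search while refusing the AI crawler. In the shared scenario the single rule that refuses AI use also removes the site from search, which is the trade-off the publishers say they face.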

Starting point is 00:02:19 When you have that massive scale and market power that Google has, you're not obligated to abide by the same norms. That is the problem. Coffey was referring, of course, to the content licensing deals that are being struck with OpenAI, Perplexity, and other AI firms. In short, Penske is alleging that Google used their monopoly on search

Starting point is 00:02:33 as leverage to extract content without striking similar deals. Now, the lawsuit itself is largely a duplicate of a case filed by education platform Chegg in February. The same boutique legal firm is representing both plaintiffs. Still, for Google's part, they continue to claim that AI overviews are a manifestation of changing user preferences.

Starting point is 00:02:49 Said a spokesperson, with AI overviews, people find search more helpful and use it more, creating more opportunities for content to be discovered. We will defend against these meritless claims. Now, as I said, the rise of AI search has triggered a huge rethink on the fundamental business model of the internet. And frankly, Google's free access to website data is increasingly viewed as a core issue. Speaking at a conference hosted by Fortune earlier last week,

Starting point is 00:03:12 Neil Vogel, the CEO of People, Inc., went so far as to call Google a bad actor. People operates 40 different brands and is the largest digital and print publisher in the U.S. Vogel said, Google has one crawler, which means they use the same crawler for their search, where they still send us traffic as they do for their AI products where they steal our content. Foreshadowing the issue raised in the Rolling Stone lawsuit, Vogel noted that he can't block Google's crawlers without losing the roughly 20% of traffic that comes from the search engine. He commented, they know this and they're not splitting their crawler, so they are an intentionally bad actor here. Interestingly, in contrast, Vogel referred to OpenAI and the other

Starting point is 00:03:46 AI companies that have signed content deals as good actors. It's very clear that this is an issue on the rise. For example, we've recently covered Cloudflare's approach to building a system to block AI crawlers alongside a marketplace to facilitate a new pay-per-crawl internet economy. And speaking at that same event, Cloudflare CEO Matthew Prince commented on the need for a tech solution rather than a legal one. He said, I think that it's a fool's errand to go down that path, because in copyright law, typically, the more derivative something is, the more it's protected under fair use. What these AI companies are doing is that they're actually creating derivatives. Even so, Prince noted that no one wins if the internet business model dies, least of all Google. He commented, internally they're having

Starting point is 00:04:24 massive fights about what they do, and my prediction is that, by this time next year, Google will be paying content creators for crawling their content and taking it and putting it in AI models. Now, this is clearly one of those headline topics that could be a main. In fact, it sort of has been a main in the past, but there are a few other interesting things today, so let's get on to them. Moving over to the talent wars, or the not-so-talent wars, depending on your perspective, a former Siri leader has left Apple, extending the string of high-profile AI departures. Bloomberg's Apple guru, Mark Gurman, reports that Robbie Walker is planning to leave the company next month. Walker was one of the few direct reports to AI chief

Starting point is 00:04:57 John Giannandrea, who was in charge of Siri until ownership of the project was shifted in March. After that, Walker was the senior executive in charge of Apple's Answers, Information, and Knowledge team, which was working on an AI web search product to rival Perplexity. Gurman wrote, though Walker's duties and staff were dramatically reduced over the past several months, he had still been an influential part of Apple's AI strategy. His exit also adds to an exodus of executives and engineers from that division. Now, this one generated a very different response than some of the high-profile executive departures that we've seen in the past. Encapsulated by this post from TechGod who writes,

Starting point is 00:05:30 Thank God, Walker was one of the biggest reasons why Siri sucks. Next, we have a little more follow-up in the story about the OpenAI-Microsoft deal. OpenAI is apparently set to save $50 billion in revenue-sharing payouts after renegotiating that deal. Reporting from The Information uncovered more details of the MOU signed late last week. Microsoft has reportedly agreed to ratchet down their profit-sharing take to 8% by the end of the decade. The deal had previously provided for a 20% share.

Starting point is 00:05:54 The report stated that OpenAI should be able to keep an additional $50 billion in revenue over the next five years based on their current forecasts. Sources said the companies are still negotiating how much OpenAI will need to pay to rent Microsoft servers moving forward, so theoretically Microsoft could make up some of the haircut there through more commercial terms. Finally today, xAI has laid off over 500 workers on their data annotation team. Business Insider obtained a copy of the email sent to affected workers, which stated, After a thorough review of our human data efforts, we've decided to accelerate the expansion

Starting point is 00:06:24 and prioritization of our specialist AI tutors while scaling back our focus on general AI tutor roles. The strategic pivot will take effect immediately. As part of this shift in focus, we no longer need most generalist AI tutor positions, and your employment with xAI will conclude. Now, the data annotation team is xAI's largest and had around 1,500 employees prior to the layoffs, meaning that this downsizing is a substantial number of the overall headcount at the company. Annotation staff reportedly spent the week in one-on-ones to outline their responsibilities and achievements. They were told on Thursday to drop everything and complete a series of tests designed to identify

Starting point is 00:06:57 aptitude and interest in specific topics. The tests covered a number of typical domains like STEM, coding, finance, and medicine, but also covered more specialty topics like Grok's personality and model behavior along with, yes, this is an exact quote, shitposters and doomscrollers. Now, for xAI's part, they definitely focused the story on the idea that this was a big pivot away from general annotation into specialized model training. On Friday night, they tweeted, Specialist AI tutors at xAI are adding huge value. We will immediately surge our specialist AI tutor team by 10x. We're hiring across domains like STEM, finance, medicine, safety, and many more.

Starting point is 00:07:31 Join us to help build truth-seeking AGI. Many of those open positions actually require a master's or PhD in a STEM-related field. Chief Legal Officer Lily Lim posted a huge list of the open positions, commenting, join xAI. We are hiring like crazy. Tynker Chief Growth and Marketing Officer Lomit Patel had an interesting LinkedIn post about this. He wrote, Most of the conversation has focused on the why, shifting from broad-based data annotation to domain-specific expertise in fields like STEM and finance.

Starting point is 00:07:57 This is a common-sense business decision in the race for AI dominance. However, he continues, it's time we move past the obvious and ask what it means for the rest of us. The real takeaway from xAI's decision is this. Any job that relies on generalized, repetitive, or easily codifiable tasks is now on the chopping block. This includes not just data annotators, but every role where the how can be taught to a machine.

Starting point is 00:08:17 The question, he writes, is no longer will AI replace my job, but rather how quickly can I evolve my skills beyond what a machine can do? Boy, is that ever one of the big meta-subjects for everything that we do here at the show. For now, that is going to do it for today's headlines. Next up, the main episode. Small, nimble teams beat bloated consulting every time. Robots & Pencils partners with organizations on intelligent cloud-native systems powered by AI. They uncover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy. As an AWS-certified partner, Robots & Pencils combines the reach of a large firm with the focus of a trusted partner. With teams across the U.S., Canada, Europe, and Latin America, clients gain local expertise and

Starting point is 00:09:01 global scale. As AI evolves, they ensure you keep pace with change. And that means faster results, measurable outcomes, and a partnership built to last. The right partner makes progress inevitable. Partner with Robots & Pencils at robotsandpencils.com slash aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80%

Starting point is 00:09:43 plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited-time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase on a real development project in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted

Starting point is 00:10:15 to AI-native. That's B-L-I-T-Z-Y.com. If you are a regular listener, you will have heard about Superintelligent's Agent Readiness Audits at this point. But I wanted to tell you today about the full suite of agent readiness products that go beyond just the initial readiness report. Over the last six months, Superintelligent has built out an entire agent planning suite. We help you move from discovery to planning to implementation. After you've completed your Agent Readiness Audits, we help you double-click on your most important use cases with what we call our use case planning reports. These reports are going to help you

Starting point is 00:10:50 understand what sort of technical preparation you need to do to be ready for a use case, what challenges you might face in implementation, and whether you should be thinking about building, buying, partnering, or some combination. After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for. If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions. Just go to bit.ly slash super super super agent. That's bit.ly slash super super agent, all one word. And if you have any questions, the agent can even help you book an appointment with our team.

Starting point is 00:11:32 Welcome back to the AI Daily Brief. Today we are talking about what it takes to get to AGI. This is sort of a perpetual background conversation that is interesting, I think, in two very different ways. There are, of course, all of the technical aspects of it, what it's going to take from a development standpoint to actually achieve AGI, which is something we're going to get into a bunch today. But then there's also the practical dimension of this, and to what extent it actually impacts the way that you as an AI practitioner have to think about these systems today. Now, interestingly, we have comments from Google DeepMind CEO Demis Hassabis that sort of span the gap between both of those. This came from the recent All-In Summit, where Demis sketched out

Starting point is 00:12:12 what he sees as coming in the next few years. Unsurprisingly, he is very bullish. The final question of the day was what 10 years from now looks like, and he said we'll have full AGI by then, and it will usher in a new golden era of science, all of which is very exciting, and there is actually a lot to dig into around what DeepMind and Google are doing

Starting point is 00:12:29 with some of the advanced science. However, that's the topic for a different show. Earlier in the interview, he had made an interesting assertion suggesting that we are very far from that goal. Let's listen to what he had to say. Some of our competitors talk about, you know, these modern systems that we have today are PhD intelligences. I think that's a nonsense. They're not PhD intelligences. They have some capabilities that are PhD level,

Starting point is 00:12:53 but they're not in general capable, and that's exactly what general intelligence should be, of performing across the board at the PhD level. In fact, as we all know, interacting with today's chatbots, if you pose the question in a certain way, they can make simple mistakes with even, like, high school maths and simple counting. So that shouldn't be possible for a true AGI system. So this is a really interesting conversation. And something that I think is lost in the nuance a little bit. Hassabis is basically taking to task the idea that AIs are PhD-replacement level,

Starting point is 00:13:30 because while they can do some things at that level, other things that should be trivially easy they struggle with. This is the jagged frontier of AI capability. As you might imagine, this generated a huge amount of discussion online. Harvard professor David Sinclair, who's working on anti-aging technology, posted, respect Demis, but disagree on this point. My lab is using a novel AI system that makes non-intuitive scientific discoveries and writes up the paper plus figures with no human intervention, easily at a PhD level. Biomedical professor Derya Unutmaz commented,

Starting point is 00:14:00 I've great respect for Demis's opinions, but on this I disagree. I've trained a dozen PhD students and I can confidently claim that current state-of-the-art AI models like GPT-5 Pro operate at a much higher level. Also, even I would fail at some high school math, though I was a top math student. He could be right if he were only referring to the top 1% of PhD students who are at a super genius level. Even then, I am sure the next major updates to AI models will surpass them as well. OpenAI research scientist Aidan McLaughlin writes, Demis is of course right here. No one should fire their PhDs for GPT-5. Rather, we've democratized the experience of being a 10-year-old growing up in Cambridge,

Starting point is 00:14:35 who can ring random doorbells and ask experts about hologram theory or Sumerian history. Everyone gets a PhD in their pocket. So what's interesting here is that these comments cut right to the middle of this question of how much the designation of AGI matters. For the labs whose goal is this sort of generalized intelligence, it really matters, right? AI commentator Kol Tregaskes writes, the ceiling capabilities are undoubtedly rising, but the floor capabilities are still lacking.

Starting point is 00:15:02 Only when these floor capabilities are solved, can we talk about these now mostly meaningless terms like AGI and ASI? And yet when it comes to the work we do, obviously if you're a business who is just trying to figure out how AI can help you, you don't need to care how AI does across a full range of tasks. You need to care how well it does across your specific task. Now, in that context, though, where the jagged frontier still matters is around just how much autonomy an AI or an agent can be given and still complete the job well. And this is the frontier that matters for the business world, as opposed to, again, some scientific definition of AGI. And yet, even if that is our measure, the frontier when it comes to autonomy for productive work tasks is jagged as well. And if that is true, given that we are looking at jagged frontiers, both when it comes to generalist definitions of AGI, but also when it comes to applied autonomy, I think it's worth spending a little bit of time understanding what people think are the barriers to unlocking that next level,

Starting point is 00:15:58 however we define it. Immediately following those comments about PhDs, Demis actually spoke a little bit about this. Here's what he said. So I think that we are maybe, you know, I would say sort of five to ten years away from having an AGI system that's capable of doing those things. Another thing that's missing is continual learning, this ability to, like, online teach the system something new or adjust its behavior in some way. And so a lot of these, I think, core capabilities are still missing. And maybe scaling will get us there. But I feel, if I was to bet, I think there are probably one or two missing breakthroughs that are still required and will come over the next five or so years. Earlier this year, podcaster Dwarkesh Patel

Starting point is 00:16:40 wrote a post and released a video called Why I Don't Think AGI Is Right Around the Corner. There's a lot of really valuable stuff in there. It is definitely worth checking out if you haven't yet. You can find it at dwarkesh.com, but the thing that he hones in on is exactly the thing that we just heard from Demis, which is this idea of continual learning.

Starting point is 00:16:57 Dwarkesh writes, the fundamental problem is that LLMs don't get better over time the way a human would. The lack of continual learning is a huge, huge problem. The LLM baseline at many tasks might be higher than an average human's, but there's no way to give a model high-level feedback. You're stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice, this just doesn't produce anything even close to the kind of learning and improvement that human employees experience.

Starting point is 00:17:20 The reason humans are so useful is not mainly their raw intelligence. It's their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice the task. Now, from there, he talks about the various strategies that LLMs take to quote-unquote learn right now, but the point that he keeps coming back to is summed up in the sentence, it's just not a deliberative adaptive process the way human learning is. He writes, eventually the models will be able to learn on the job in the subtle organic ways that humans can. However, it's just hard for me to see how that could happen within the next few years, given that there's no obvious way to slot in online continuous learning into the kinds

Starting point is 00:17:53 of models these LLMs are. Alongside Dwarkesh and Demis, OpenAI co-founder Andrej Karpathy also sees a lack of continuous learning as a key gap for LLMs. He posted on Twitter slash X, Agree that this is an important capability hole right now. I like to explain it as LLMs are a bit like a co-worker with anterograde amnesia. They don't consolidate or build long-running knowledge or expertise once training is over, and all they have is short-term memory, the context window. It's hard to build a relationship, see 50 First Dates, or do work, see Memento, with this condition.

Starting point is 00:18:23 The first mitigation of this deficit I saw is the memory feature in ChatGPT, which feels like a primordial crappy implementation of what could be. There might be other and better ways to do it, but I agree, it feels like it needs to be in the realm of research. And there is definitely a lot of research focused on this. Rich Sutton, the author of The Bitter Lesson, has come up with an architecture that seems to show some promise. It's essentially a system of agents that can do reinforcement learning at runtime

Starting point is 00:18:45 in the same way they can do planning before execution. McKenzie Moorhead of Compound VC recently conducted a study of the various methods being explored today. He commented, overall, we expect the current paradigms of base model training and inference reasoning plus memory slash RAG will get us to AIs that can handle entire real-world workflows in the next few years,

Starting point is 00:19:02 but we will get new primitives. Memory is an area of this question that is generating a ton of discussion as well. Last month, for example, OpenAI CEO Sam Altman said that improving memory was one of the big focuses for GPT-6. He said people want memory, people want product features that require us to be able to understand them. There is quite obviously something fundamentally different about what AI can offer if it truly has persistent memory between sessions. It is certainly not the same or at the same level as continuous learning, but it does feel like an essential step. And even with the nascent implementations within current LLMs, you can see how big of a difference it makes. At this stage, for example, for all of my strategic planning use cases, I feel fairly locked in to ChatGPT

Starting point is 00:19:43 because it has much better memory and context of everything that I've talked about with it previously. If I was to switch over to Claude or Grok, I'd have to give a ton of that background context before I could even get into whatever it is that I wanted to discuss in that particular moment. Now, that doesn't mean that I'm not a voracious model switcher when it comes to other use cases. I've talked a lot about how I think one of the keys to getting the most out of AI right now is being model omnivorous, but it's clear that memory makes a big difference. So much so that you even have some folks, like Andrew Pignanelli of the General Intelligence Company, arguing that memory is the last problem before AGI is reached. He wrote,

Starting point is 00:20:18 Our systems today get the interaction part right. In terms of a Turing test for interaction, we're basically all the way there. But that's only half of what's needed to make a digital self. Memory is a severely lagging interaction and the next step. Once that's solved, we'll be very close. The first AGI will be a very intelligent processor combined with a very good memory system. So how are we going to get there? One of the areas that people are looking at is coding.

Starting point is 00:20:42 You might remember in an episode recently we were talking about Cognition's big funding round and Latent Space creator Shawn Wang's decision to move over to Cognition. He published a blog post called The Devin Is in the Details, and dropped this rather bold claim pretty casually. The central realization I had was this. Code AGI will be achieved in 20% of the time of full AGI and capture 80% of the value of AGI. This is what led him to decide to go over to Cognition full time. Nik Pash, the head of AI at open-source coding agent Cline, took the conversation in a slightly

Starting point is 00:21:12 different direction, saying, in his words, coding agent platforms have become key players in accelerating progress towards AGI. Interestingly, he kind of argues that most coding agent startups are fairly blind to their role in the AGI race. In an essay published this weekend, he described the biggest roadblock to AGI as a data starvation issue. Pash laid out that frontier model companies don't currently have access to the full picture of how developers use their platforms. They don't have access to the full codebases, they don't have repos, and critically,

Starting point is 00:21:38 they don't see what happens when a user turns off the AI or switches models. He quoted one researcher who stated, we have the prompts, but we don't know what the status of the repo is. Pash wrote, One lab emphasized the urgency. If they could access this type of real-world coding data, it would be incorporated into training by the end of the night. Another described needing representative tasks at meaningful scale with authentic human preferences to build models that actually work in production.

Starting point is 00:22:02 You can have well-funded frontier model labs, but without coding agent platforms collecting real-world usage data, AGI simply doesn't happen. This creates a marriage between model and application layers that most people completely miss. The application layer isn't just a business model built on top of models. It's the prerequisite for unlocking the coding capabilities that serve as humanity's speed multiplier in the race towards AGI. If you go on Twitter slash X right now and just search AGI, you can find a ton of discussion about what it's going to take.

Starting point is 00:22:30 Continuous learning, memory, there are other speculations as well. And while I have argued and still believe that when it comes to businesses thinking about AI, AGI is just about the most useless term in the trough, I think that the explorations and developments on the path to AGI are useful in understanding what new types of capability and autonomy, especially when it comes to agents, get unlocked at each different development. Whether it gets us to quote-unquote AGI or not, for example, more memory and/or continuous learning would have huge implications for how these systems could be deployed in the enterprise. So that's the story for today. This is obviously a very

Starting point is 00:23:04 ongoing conversation. And sometimes we get these hints from labs that they have way more of an idea about how to get to AGI than they're telling us. And given that we are heading right back into the fall announcement season, maybe we'll hear about some of that in the weeks or months to come. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. Until next time, peace.
