The AI Daily Brief: Artificial Intelligence News and Analysis - Just How Fast is AI Evolving?

Starting point is 00:00:00 Today on the AI Daily Brief, just how fast is AI evolving? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends, back with another long-reads episode of the AI Daily Brief. Today we turn once again to Professor Ethan Mollick's One Useful Thing blog, reading a piece from now a couple of weeks ago called Prophecies of the Flood: What to Make of the Statements of the AI Labs. The setup and context of the piece is something that we've been talking

Starting point is 00:00:35 about a lot on the show for the last month or so, which is the sense that the tide is rising, to continue the water analogy, that AGI is getting closer and closer, the capabilities are increasing, and that something big is on the horizon. As usual, what we're going to do today is turn it over to an ElevenLabs version of myself to read the piece, and then I will come back and give you some thoughts of my own to close it out. Prophecies of the Flood. Recently, something shifted in the AI industry. Researchers began speaking urgently about the arrival of super smart AI systems, a flood of intelligence, not in some distant future, but imminently. They often refer to AGI, artificial general intelligence, defined, albeit imprecisely,

Starting point is 00:01:17 as machines that can outperform expert humans across most intellectual tasks. This availability of intelligence on demand will, they argue, change society deeply and will change it soon. There are plenty of reasons to not believe insiders, as they have clear incentives to make bold predictions. They're raising capital, boosting stock valuations, and perhaps convincing themselves of their own historical importance. They're technologists, not prophets,

Starting point is 00:01:39 and the track record of technological predictions is littered with confident declarations that turned out to be decades premature. Even setting aside these human biases, the underlying technology itself gives us reason for doubt. Today's large language models, despite their impressive capabilities, remain fundamentally inconsistent tools,

Starting point is 00:01:56 brilliant at some tasks while stumbling over seemingly simpler ones. This jagged frontier is a core characteristic of current AI systems, one that won't be easily smoothed away. Plus, even assuming researchers are right about reaching AGI in the next year or two, they are likely overestimating the speed at which humans can adopt and adjust to a technology. Changes to organizations take a long time.

Starting point is 00:02:17 Changes to systems of work, life and education are slower still, and technologies need to find specific uses that matter in the world, which is itself a slow process. We could have AGI right now and most people wouldn't notice. Indeed, some observers have suggested that has already happened, arguing that the latest AI models like Claude 3.5 are effectively AGI. Yet dismissing these predictions as mere hype may not be helpful. Whatever their incentives, the researchers and engineers inside AI labs

Starting point is 00:02:44 appear genuinely convinced they're witnessing the emergence of something unprecedented. Their certainty alone wouldn't matter, except that increasingly public benchmarks and demonstrations are beginning to hint at why they might believe we're approaching a fundamental shift in AI capabilities. The water, as it were, seems to be rising faster than expected. The event that kicked off the most speculation was the reveal of a new model by OpenAI called o3 in late December. No one outside of OpenAI has really used this system yet, but it is the successor to o1, which is already very impressive too. The o3 model is one of the new generation of reasoners, AI models that take extra time to think before answering questions,

Starting point is 00:03:21 which greatly improves their ability to solve hard problems. OpenAI provided a number of startling benchmarks for o3 that suggest a large advance over o1, and indeed, over where we thought the state of the art in AI was. Three benchmarks in particular deserve a little attention. The first is called the Graduate-Level Google-Proof Q&A test, GPQA, and it is supposed to test high-level knowledge with a series of multiple-choice problems that even Google can't help you with. PhDs with access to the Internet got 34% of the questions

Starting point is 00:03:50 right on this test outside their specialty, and 81% right inside their specialty. When tested, o3 achieved 87%, beating human experts for the first time. The second is FrontierMath, a set of private math problems created by mathematicians to be incredibly hard to solve, and indeed no AI ever scored higher than 2%, until o3, which got 25% right. The final benchmark is ARC-AGI, a rather famous test of fluid intelligence that was designed to be relatively easy for humans, but hard for AIs.

Starting point is 00:04:19 Again, o3 beat all previous AIs as well as the baseline human level on the test, scoring 87.5%. All of these tests come with significant caveats, but they suggest that what we previously considered unpassable barriers to AI performance may actually be beaten quite quickly. As AIs get smarter, they become more effective agents, another ill-defined term (see a pattern?) that generally means an AI given the ability to act autonomously towards achieving a set of goals. I have demonstrated some of the early agentic systems in previous posts, but I think the past few weeks have also shown us that practical agents, at least for narrow but economically important areas, are now viable. A nice example of that is Google's Gemini with Deep Research, accessible to

Starting point is 00:04:59 everyone who subscribes to Gemini, which is really a specialized research agent. I gave it a topic like "research a comparison of ways of funding startup companies from the perspective of founders for high growth ventures," and the agentic system came up with a plan, read through 173 websites, and compiled a report for me with the answer a few minutes later. The result was a 17-page paper with 118 references, but is it any good? I've taught the introductory entrepreneurship class at Wharton for over a decade, published on the topic, started companies myself, and even wrote a book on entrepreneurship, and I think this is pretty solid. I didn't spot any obvious errors, but you can read it yourself if you would like here. The biggest issue is not

Starting point is 00:05:37 accuracy, but that the agent is limited to public non-paywalled websites and not scholarly or premium publications. It also is a bit shallow and does not make strong arguments in the face of conflicting evidence. So not as good as the best humans, but better than a lot of reports that I see. Still, this is a genuinely disruptive example of an agent with real value. Researching and report writing is a major task of many jobs. What Deep Research accomplished in three minutes would have taken a human many hours, though they might have added more nuanced analysis. Anyone writing a research report should probably try Deep Research and see how it works as a starting place, even though a good final report will still require a human touch. I had a chance to speak with the leader of the

Starting point is 00:06:17 Deep Research Project, where I learned that it is just a pilot project from a small team. I thus suspect that other groups and companies that were highly incentivized to create narrow but effective agents would be able to do so. Narrow agents are now a real product rather than a future possibility. There are already many coding agents, and you can use experimental open source agents that do scientific and financial research. Narrow agents are specialized for a particular task, which means they are somewhat limited. That raises the question of whether we will soon see generalist agents where you can just ask the AI anything, and it will use a computer and the internet to do it. Simon Willison thinks not, despite what Sam Altman has argued. We will learn more as the year progresses,

Starting point is 00:06:55 but if general agentic systems work reliably and safely, that really will change things, as it allows smart AIs to take action in the world. Agents and very smart models are the core elements needed for transformative AI, but there are many other pieces as well that seem to be making rapid progress. This includes advances in how much AIs can remember, context windows, and multimodal capabilities that allow them to see and speak. It can be helpful to look back a little to get a sense of progress. For example, I have been testing the prompt "otter on a plane using Wi-Fi" for image

Starting point is 00:07:25 and video models since before ChatGPT came out. In October 2023, that prompt got you this terrifying monstrosity. Less than 18 months later, multiple image creation tools nail the prompt. The result is that I have had to figure out something more challenging. This is an example of benchmark saturation, where old benchmarks get beaten by the AI. I decided to take a few minutes and see how far I could get with Google's Veo 2 video model in producing a movie of the otter's journey. The video you see below took less than 15 minutes of active work, although I had to wait a bit for the videos to be created. Take a look at the quality of the shadows and light. I especially appreciate how the otter opens the computer at the end. And to up the ante even further, I decided to

Starting point is 00:08:05 turn the saga of the otter into a 1980s style science fiction anime, featuring otters in space and a period appropriate theme song, thanks to Suno. Again, very little human work was involved. Given all of this, how seriously should we take the claims of the AI labs that a flood of intelligence is coming? Even if we only consider what we've already seen, the o3 benchmarks shattering previous barriers, narrow agents conducting complex research, and multimodal systems creating increasingly sophisticated content, we're looking at capabilities that could transform many knowledge-based tasks. And yet the labs insist this is merely the start, that far more capable systems and general agents are imminent. What concerns me most isn't whether the labs are right about this timeline.

Starting point is 00:08:48 It's that we're not adequately preparing for what even current levels of AI can do, let alone the chance that they might be correct. While AI researchers are focused on alignment, ensuring AI systems act ethically and responsibly, far fewer voices are trying to envision and articulate what a world awash in artificial intelligence might actually look like. This isn't just about the technology itself. It's about how we choose to shape and deploy it. These aren't questions that AI developers alone can or should answer. They're questions that demand attention from organizational leaders who will need to navigate this transition, from employees whose work lives may transform and from stakeholders whose futures may depend on these decisions. The flood of

Starting point is 00:09:26 intelligence that may be coming isn't inherently good or bad, but how we prepare for it, how we adapt to it, and most importantly, how we choose to use it, will determine whether it becomes a force for progress or disruption. The time to start having these conversations isn't after the water starts rising. It's now. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security

Starting point is 00:10:09 workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's V-A-N-T-A.com for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming.

Starting point is 00:10:54 Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at besuper.ai. Put the word agent in the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market.

Starting point is 00:11:44 Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey. Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption.

Starting point is 00:12:08 KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications. For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at KPMG.com slash US.

Starting point is 00:12:37 All right, back to the real non-AI NLW here. As usual, Ethan does a great job, I think, of summing up a lot of what's going on as well as a lot of the sentiment out there. It has definitely been the case that a vibe has shifted. The labs seem more and more comfortable and even eager to talk about how quickly AGI is coming. Reasoning models are the watchword of the moment. And there have also been some advances that make it feel like not only are these things coming, but they're likely to be very widely accessible. The Chinese model DeepSeek, which has everyone in such a tizzy here because of how close it performs to OpenAI models at a tiny fraction of the cost, has everyone

Starting point is 00:13:13 thinking even more about what the implications of incredibly cheap and abundant intelligence really are. Also, since this piece was released, we got the release of OpenAI's Operator, which, while still limited in what it can do, is the sort of generalist agent that Ethan is talking about. There was an interesting interview with venture capitalist Chris Sacca earlier this week with Tim Ferriss, where Sacca became the latest person to articulate just how disruptive this wave of new intelligence and cheap and abundant intelligence could really be when it comes to people's jobs and livelihoods. As I've said before, I think that this transition, while hugely full of potential, will require nothing less than a total reevaluation of the social contract. A new way of thinking about work, a new way of thinking about expectations, a new way of thinking about how we judge our own value, and so much more.

Starting point is 00:13:58 I don't know if things are moving faster or if it just feels like it. I do think that things that have been theoretical for some number of years are now moving into production. I think that this year we're going to see more and more people actually deploying agents in a way that makes the assistant era of AI look quaint. And I agree with Ethan wholeheartedly that the time to be having these conversations about what we want out of a society that has AI embedded is now. I don't think we're turning back the tide. But that doesn't mean that we have no agency in the world that's being created. Big ponderous thoughts for your weekend. And with that, we will close the AI Daily Brief.

Starting point is 00:14:30 Appreciate you listening, as always. And until next time, peace.
