The AI Daily Brief: Artificial Intelligence News and Analysis - Just How Fast is AI Evolving?
Episode Date: January 26, 2025
Between reasoning models and agents, things feel like they're heating up. But just how fast are things really going? A reading and discussion inspired by https://www.oneusefulthing.org/p/prophecie...s-of-the-flood Brought to you by: KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions. Vanta - Simplify compliance - https://vanta.com/nlw The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, just how fast is AI evolving?
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Hello, friends, back with another long-reads episode of the AI Daily Brief.
Today we turn once again to Professor Ethan Mollick's One Useful Thing blog,
reading a piece from a couple of weeks ago called "Prophecies of the Flood:
What to make of the statements of the AI labs."
The setup and context of the piece is something that we've been talking
about a lot on the show for the last month or so, which is the sense that the tide is rising,
to continue the water analogy: that AGI is getting closer and closer, the capabilities are
increasing, and that something big is on the horizon. As usual, what we're going to do today
is turn it over to an ElevenLabs version of myself to read the piece, and then I will come back
and give you some thoughts of my own to close it out.
Prophecies of the Flood
Recently something shifted in the AI industry. Researchers began speaking urgently about the
arrival of super smart AI systems, a flood of intelligence, not in some distant future, but imminently.
They often refer to AGI, artificial general intelligence, defined, albeit imprecisely,
as machines that can outperform expert humans across most intellectual tasks.
This availability of intelligence on demand will, they argue, change society deeply and will
change it soon.
There are plenty of reasons to not believe insiders as they have clear incentives to make bold
predictions.
They're raising capital, boosting stock valuations, and perhaps convincing themselves
of their own historical importance.
They're technologists, not prophets,
and the track record of technological predictions
is littered with confident declarations
that turned out to be decades premature.
Even setting aside these human biases,
the underlying technology itself gives us reason for doubt.
Today's large language models,
despite their impressive capabilities,
remain fundamentally inconsistent tools,
brilliant at some tasks
while stumbling over seemingly simpler ones.
This jagged frontier is a core characteristic
of current AI systems,
one that won't be easily smoothed away.
Plus, even assuming researchers are right about reaching AGI in the next year or two,
they are likely overestimating the speed at which humans can adopt and adjust to a technology.
Changes to organizations take a long time.
Changes to systems of work, life and education are slower still,
and technologies need to find specific uses that matter in the world,
which is itself a slow process.
We could have AGI right now and most people wouldn't notice.
Indeed, some observers have suggested that has already happened,
arguing that the latest AI models like Claude 3.5 are effectively AGI.
Yet dismissing these predictions as mere hype may not be helpful.
Whatever their incentives, the researchers and engineers inside AI labs
appear genuinely convinced they're witnessing the emergence of something unprecedented.
Their certainty alone wouldn't matter,
except that increasingly public benchmarks and demonstrations are beginning to hint
at why they might believe we're approaching a fundamental shift in AI capabilities.
The water, as it were, seems to be rising faster than expected.
The event that kicked off the most speculation was the reveal of a new model by OpenAI called o3 in late December.
No one outside of OpenAI has really used this system yet, but it is the successor to o1, which is already very impressive.
The o3 model is one of the new generation of reasoners, AI models that take extra time to think before answering questions,
which greatly improves their ability to solve hard problems.
OpenAI provided a number of startling benchmarks for o3 that suggest a large advance over o1, and indeed,
over where we thought the state of the art in AI was.
Three benchmarks in particular deserve a little attention.
The first is called the Graduate-Level Google-Proof Q&A test (GPQA),
and it is supposed to test high-level knowledge
with a series of multiple-choice problems that even Google can't help you with.
PhDs with access to the internet got 34% of the questions
right on this test outside their specialty,
and 81% right inside their specialty.
When tested, o3 achieved 87%, beating human experts for the first time.
The second is FrontierMath, a set of private math problems created by mathematicians to be
incredibly hard to solve, and indeed no AI ever scored higher than 2% until o3, which got 25%
right.
The final benchmark is ARC-AGI, a rather famous test of fluid intelligence that was designed to be
relatively easy for humans, but hard for AIs.
Again, o3 beat all previous AIs as well as the baseline human level on the test, scoring 87.5%.
All of these tests come with significant caveats, but they suggest
that what we previously considered unpassable barriers to AI performance may actually be beaten
quite quickly. As AIs get smarter, they become more effective agents, another ill-defined term
(see a pattern?) that generally means an AI given the ability to act autonomously towards achieving a set of goals.
I have demonstrated some of the early agentic systems in previous posts, but I think the past few weeks
have also shown us that practical agents, at least for narrow but economically important areas, are now viable.
A nice example of that is Google's Gemini with Deep Research, accessible to
everyone who subscribes to Gemini, which is really a specialized research agent. I gave it a topic
like "research a comparison of ways of funding startup companies from the perspective of founders for
high growth ventures," and the agentic system came up with a plan, read through 173 websites,
and compiled a report for me with the answer a few minutes later. The result was a 17-page paper
with 118 references, but is it any good? I've taught the introductory entrepreneurship class at
Wharton for over a decade, published on the topic, started companies myself,
and even wrote a book on entrepreneurship, and I think this is pretty solid. I didn't spot any
obvious errors, but you can read it yourself if you would like. The biggest issue is not
accuracy, but that the agent is limited to public, non-paywalled websites and not scholarly or
premium publications. It also is a bit shallow and does not make strong arguments in the face
of conflicting evidence. So not as good as the best humans, but better than a lot of reports that I see.
Still, this is a genuinely disruptive example of an agent with real value. Researching and report writing
is a major task of many jobs. What Deep Research accomplished in three minutes would have taken a
human many hours, though they might have added more nuanced analysis. Anyone writing a research report
should probably try Deep Research and see how it works as a starting place, even though a good
final report will still require a human touch. I had a chance to speak with the leader of the
Deep Research project, where I learned that it is just a pilot project from a small team. I thus suspect
that other groups and companies that were highly incentivized to create narrow but effective agents
would be able to do so. Narrow agents are now a real product rather than a future possibility.
There are already many coding agents, and you can use experimental open source agents that do
scientific and financial research. Narrow agents are specialized for a particular task, which means
they are somewhat limited. That raises the question of whether we will soon see generalist agents
where you can just ask the AI anything, and it will use a computer and the internet to do it.
Simon Willison thinks not, despite what Sam Altman has argued. We will learn more as the year progresses,
but if general agentic systems work reliably and safely, that really will change things,
as it allows smart AIs to take action in the world.
Agents and very smart models are the core elements needed for transformative AI,
but there are many other pieces as well that seem to be making rapid progress.
This includes advances in how much AIs can remember, context windows,
and multimodal capabilities that allow them to see and speak.
It can be helpful to look back a little to get a sense of progress.
For example, I have been testing the prompt "otter on a plane using Wi-Fi" for image
and video models since before ChatGPT came out. In October 2023, that prompt got you this
terrifying monstrosity. Less than 18 months later, multiple image creation tools nail the prompt.
The result is that I have had to figure out something more challenging. This is an example of
benchmark saturation, where old benchmarks get beaten by the AI. I decided to take a few minutes
and see how far I could get with Google's Veo 2 video model in producing a movie of the otter's
journey. The video you see below took less than 15 minutes of active work, although I had to wait
a bit for the videos to be created. Take a look at the quality of the shadows and light. I especially
appreciate how the otter opens the computer at the end. And to up the ante even further, I decided to
turn the saga of the otter into a 1980s style science fiction anime, featuring otters in space and a period
appropriate theme song, thanks to Suno. Again, very little human work was involved. Given all of this,
how seriously should we take the claims of the AI labs that a flood of intelligence is coming?
Even if we only consider what we've already seen (the o3 benchmarks shattering previous barriers,
narrow agents conducting complex research, and multimodal systems creating increasingly sophisticated content),
we're looking at capabilities that could transform many knowledge-based tasks.
And yet the labs insist this is merely the start, that far more capable systems and general agents are imminent.
What concerns me most isn't whether the labs are right about this timeline.
It's that we're not adequately preparing for what even current levels of AI can do,
let alone the chance that they might be correct. While AI researchers are focused on alignment,
ensuring AI systems act ethically and responsibly, far fewer voices are trying to envision and articulate
what a world awash in artificial intelligence might actually look like. This isn't just about
the technology itself. It's about how we choose to shape and deploy it. These aren't questions
that AI developers alone can or should answer. They're questions that demand attention from
organizational leaders who will need to navigate this transition, from employees whose work
lives may transform and from stakeholders whose futures may depend on these decisions. The flood of
intelligence that may be coming isn't inherently good or bad, but how we prepare for it, how we
adapt to it, and most importantly, how we choose to use it, will determine whether it becomes a force
for progress or disruption. The time to start having these conversations isn't after the water
starts rising. It's now.
Today's episode is brought to you by Vanta. Trust isn't just earned,
it's demanded. Whether you're a startup founder navigating your first audit or a seasoned
security professional scaling your GRC program, proving your commitment to security has never been
more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish
trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security
workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk. Vanta can help
you start or scale up your security program by connecting you with auditors and experts to conduct your
audit and set up your security program quickly. Plus, with automation and AI throughout the platform,
Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies
like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.
For a limited time, this audience gets $1,000 off Vanta at vanta.com/nlw. That's V-A-N-T-A.com
for $1,000 off.
If there is one thing that's clear about AI in 2025, it's that the agents are coming.
Vertical agents by industry, horizontal agent platforms, agents per function.
If you are running a large enterprise, you will be experimenting with agents next year.
And given how new this is, all of us are going to be back in pilot mode.
That's why Superintelligent is offering a new product for the beginning of this year.
It's an agent readiness and opportunity audit.
Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.
If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw@besuper.ai.
Put the word agent in the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market.
Hello, AI Daily Brief listeners.
Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly
Pulse Survey.
Did you know that 67% of business leaders expect AI to fundamentally transform their
businesses within the next two years?
And yet, it's not all smooth sailing.
The biggest challenges that they face include things like data quality, risk management,
and employee adoption.
KPMG is at the forefront of helping organizations navigate these hurdles.
They're not just talking about AI;
they're leading the charge with practical solutions and real-world applications.
For instance, over half of the organizations surveyed are exploring AI agents to handle tasks
like administrative duties and call center operations.
So if you're looking to stay ahead in the AI game, keep an eye on KPMG.
They're not just a part of the conversation, they're helping shape it.
Learn more about how KPMG is driving AI innovation at kpmg.com/us.
All right, back to the real, non-AI NLW here.
As usual, Ethan does a great job, I think, of summing up
a lot of what's going on as well as a lot of the sentiment out there. It has definitely been the
case that a vibe has shifted. The labs seem more and more comfortable and even eager to talk
about how quickly AGI is coming. Reasoning models are the watchword of the moment. And there have
also been some advances that make it feel like not only are these things coming, but they're
likely to be very widely accessible. The Chinese model DeepSeek, which has everyone in such a tizzy
here because of how closely it performs to OpenAI models at a tiny fraction of the cost, has everyone
thinking even more about what the implications of incredibly cheap and abundant intelligence really are.
Also, since this piece was released, we got the release of OpenAI's Operator, which, while still limited
in what it can do, is the sort of generalist agent that Ethan is talking about. There was an interesting
interview with venture capitalist Chris Sacca earlier this week with Tim Ferriss, where Sacca became the
latest person to articulate just how disruptive this wave of new, cheap, and abundant
intelligence could really be when it comes to people's jobs and livelihoods. As I've said before,
I think that this transition, while hugely full of potential, will require nothing less than a total reevaluation of the social contract.
A new way of thinking about work, a new way of thinking about expectations, a new way of thinking about how we judge our own value, and so much more.
I don't know if things are moving faster or if it just feels like it.
I do think that things that have been theoretical for some number of years are now moving into production.
I think that this year we're going to see more and more people actually deploying agents in a way that makes the assistant era of AI look quaint.
And I agree with Ethan wholeheartedly that the time to be having these conversations about what we want out of a society that has AI embedded is now.
I don't think we're turning back the tide.
But that doesn't mean that we have no agency in the world that's being created.
Big ponderous thoughts for your weekend.
And with that, we will close the AI Daily Brief.
Appreciate you listening, as always.
And until next time, peace.
