Making Sense with Sam Harris - #420 — Countdown to Superintelligence
Episode Date: June 12, 2025

Sam Harris speaks with Daniel Kokotajlo about the potential impacts of superintelligent AI over the next decade. They discuss Daniel’s predictions in his essay “AI 2027,” the alignment problem, what an intelligence explosion might look like, the capacity of LLMs to intentionally deceive, the economic implications of recent advances in AI, AI safety testing, the potential for governments to regulate AI development, AI coding capabilities, how we’ll recognize the arrival of superintelligent AI, and other topics.

If the Making Sense podcast logo in your player is BLACK, you can SUBSCRIBE to gain access to all full-length episodes at samharris.org/subscribe.

Learning how to train your mind is the single greatest investment you can make in life. That’s why Sam Harris created the Waking Up app. From rational mindfulness practice to lessons on some of life’s most important topics, join Sam as he demystifies the practice of meditation and explores the theory behind it.
Transcript
Welcome to the Making Sense Podcast.
This is Sam Harris.
Just a note to say that if you're hearing this, you're not currently on our subscriber
feed and will only be hearing the first part of this conversation.
In order to access full episodes of the Making Sense Podcast, you'll need to subscribe at
samharris.org.
We don't run ads on the podcast, and therefore
it's made possible entirely through the support of our subscribers. So if you enjoy what we're
doing here, please consider becoming one.
I am here with Daniel Kokotajlo. Daniel, thanks for joining me.
Thanks for having me.
So we'll get into your background in a second. I just want to give people a reference that is going to be of great interest after we have this conversation.
You and a bunch of co-authors wrote a blog post titled AI 2027, which is a very compelling read,
and we're going to cover some of it, but I'm sure there are details there that we're not going to get to.
So I highly recommend that people read that.
You might even read that before coming back to listen to this conversation. Daniel, what's your background?
I mean, we're going to talk about the circumstances under which you left OpenAI,
but maybe you can tell us how you came to work at OpenAI in the first place.
Sure. Yeah. So I've been sort of in the AI field for a while, mostly doing
forecasting and a little bit of alignment research. So that's probably why
I got hired at OpenAI. I was on the governance team. We were making policy
recommendations to the company and trying to predict where all of this was
headed. I worked at OpenAI for two years and then I quit last year and then I worked on AI 2027 with the team that we hired.
And one of your coauthors on that blog post
was Scott Alexander.
That's right.
Yeah, yeah, yeah.
Yeah, it's again, very well worth reading.
So what happened at OpenAI that precipitated your leaving
and can you describe the circumstances of your leaving?
Because I seem to remember you had to walk away with,
you refused to sign an NDA or a non-disparagement agreement
or something and had to walk away from your equity.
And that was perceived as both a sign
of the scale of your alarm and the depth of your principles.
What happened over there?
Yeah, so this story has been covered elsewhere
in greater detail, but the summary is that
it wasn't any one particular event
or scary thing that was happening.
It was more the general trends.
So if you've read AI 2027,
you get a sense of the sorts of things
that I'm expecting to happen in the future. And frankly, I think it's going to be incredibly dangerous.
And I think that there's a lot that society needs to be doing to get ready for this
and to try to avoid those bad outcomes and to steer things in a good direction.
And there's especially a lot that companies who are building this technology need to be doing,
which we'll get into later.
And not only was OpenAI not really doing those things,
OpenAI was sort of not on track to get ready
or to take these sorts of concerns seriously, I think.
And I gradually came to believe this over my time there, and gradually came to think that, well, basically that we were on a path towards something like AI 2027 happening, and that it was hopeless to try to sort of be on the inside and talk to people and, you know, try to steer things in a good direction that way.
So that's why I left.
And then with the equity thing: when people leave, they have this agreement that they try to get you to sign, which among other things says that you basically have to agree never to criticize the company again, and also never to tell anyone about this agreement, which was the clause that I found objectionable.
And if you don't sign, then they take away all of your equity, including your vested equity.
That's sort of a shocking detail. Is that even legal? I mean, isn't vested equity, vested equity?
One of the lessons I learned from this whole experience is that it's good to get lawyers, know your rights, you know?
I don't know if it was legal actually,
but what happened was my wife and I talked about it
and ultimately decided not to sign,
even though we knew we would lose our equity
because we wanted to have the moral high ground
and to be able to criticize the company in the future.
And happily it worked out really well for us
because there was a huge uproar.
Like when this came to light,
a lot of employees were very upset.
The public was upset and the company very quickly
backed down and changed the policies.
So we got to keep our equity actually.
Okay, good.
So let's remind people about
what this phrase alignment problem means.
I mean, I've just obviously discussed this topic
a bunch on the podcast over the years,
but many people may be joining us
relatively naive to the topic.
How do you think about the alignment problem?
And why is it that some very well-informed people
don't view it as a problem at all?
Well, it's different for every person.
I guess working backwards, well, I'll work forwards.
So first of all, what is the alignment problem?
It's the problem of figuring out how to make AIs
sort of reliably do what we want.
It's maybe more specifically the problem
of shaping the cognition of the AIs
so that they have the goals that we want them to have.
They have the virtues that we want them to have,
such as honesty, for example.
It's very important that our AIs be honest with us.
Getting them to reliably be honest with us
is part of the alignment problem.
And it's sort of an open secret
that we don't really have a good solution
to the alignment problem right now.
Like you can go read the literature on this.
You can also look at, you know, what's currently happening.
The AIs are not actually reliably honest, and there's many documented examples of them
saying things that we're pretty sure they know are not true, right? So this is a big,
open, unsolved problem that we are gradually making progress towards. And right now, the
stakes are very low. Right now, we just have these chatbots that, you know, even when they're misaligned and
even when they, you know, cheat or lie or whatever, it's not really that big of a problem.
But these companies, OpenAI, Anthropic, Google DeepMind, some of these other companies as
well, they are racing to build super intelligence.
You can see this on their websites and in the statements of the CEOs. OpenAI and Anthropic, especially, have literally said that they are building superintelligence, and that they think they will succeed around the end of this decade, or before this decade is out.
What is superintelligence?
Superintelligence is an AI system
that is better than the best humans at everything
while also being faster and cheaper.
So if they succeed in getting to superintelligence, then the alignment problem suddenly becomes extremely high stakes.
We need to make sure that any superintelligences that are built, or at least the first ones that are built, are aligned.
Otherwise, terrible things could happen, such as human extinction.
Yeah, so we'll get there.
The leap from having what one person called, functionally, a country of geniuses in a data center,
the leap from that to real world risk
and something like human extinction
is gonna seem counterintuitive to some people.
So we'll definitely cover that.
But why is it, I mean, we have people,
I guess some people have moved on this topic.
I mean, so forgive me if I'm unfairly maligning anyone,
but I remember someone like Yann LeCun over at Facebook,
who's obviously one of the pioneers in the field,
just doesn't give any credence at all
to the concept of an alignment problem.
Like, I'm just, I've lost touch with how these people justify that degree of insouciance.
What's your view of the skepticism that you meet?
Well, it's different for different people.
And honestly, like it would be helpful to have a more specific example of something
someone has said for me to respond to.
With Yann LeCun, if I remember correctly, for a while he was both saying things to the effect of
AIs are just tools and they're going to be submissive
and obedient to us because they're AIs
and there just isn't much of a problem here.
And also saying things along the lines of
they're never going to be superintelligent, or, like, you know, the current LLMs are not on a path to AGI.
They're not going to be able to, you know, actually autonomously do a bunch of stuff.
It seems to me that thinking on that front has changed a lot.
Indeed.
Yann himself has sort of walked that back a bit. He's still sort of an AI skeptic, but now, I think there was a quote where he said something like, we're not gonna get to superintelligence in the next five years or something, which is a much milder claim than what he used to be saying.
Well, when I started talking about this,
I think the first time was around 2016.
So nine years ago, I bumped into a lot of people
who would say this isn't gonna happen
for 50 years at least.
I'm not hearing increments of half centuries
thrown around much anymore.
A lot of people are debating the difference
between your time horizon, like two years or three years
and five or 10.
I mean, 10 at the outside is what I'm hearing
from people who seem cautious.
Yep, I think that's basically right as a description
of what smart people in the field are sort of converging towards.
And I think that's an incredibly important fact
for the general public to be aware of.
And everyone needs to know that the field of AI experts
and AI forecasters has lowered its timelines
and is now thinking that there is a substantial chance
that some of these
companies will actually succeed in building super intelligence sometime around the end
of the decade or so.
There's lots of disagreement about timelines exactly, but that's sort of like where a lot
of the opinions are headed towards now.
So the problem of alignment is the most grandiose, speculative, science-fiction-inflected version of the risk posed by AI, right?
This is the risk that a super intelligent,
self-improving, autonomous system could get away from us
and not have our wellbeing in its sights, or actually be, you know, hostile to it, for some reason that we didn't put into the AI.
And therefore we could find ourselves playing chess
against the perfect chess engine and failing.
And that poses an existential threat, which we'll describe.
But obviously there are nearer term concerns
that more and more people are worried about.
There's the human misuse of increasingly powerful AI.
There's, we might call this, a containment problem.
I think Mustafa Suleyman over at Microsoft, who used to be at DeepMind, tends to think of the problem of containment first: that really, aligned or not, as this technology gets more democratized, people can decide to put it to sinister use, which is to say, use that we would consider unaligned.
They can change the system level prompt
and make these tools malicious
as they become increasingly powerful.
And it's hard to see how we can contain the spread of that risk.
And yeah, I mean, so then there's just the other issues like, you know,
job displacement and economic and political concerns that are all too obvious.
I mean, it's just the spread of misinformation and the political instability
that can arise in the context of spreading
misinformation and shocking degrees of wealth inequality that might initially be unmasked
by the growth of this technology.
Let's just get into this landscape knowing that misaligned superintelligence is the kind
of the final topic we want to talk about.
What is it that you and your co-authors are predicting?
Why did you title your piece AI 2027?
What do the next two years, on your account, hold for us?
That's a lot to talk about.
So the reason why we titled it AI 2027
is because in the scenario that we wrote,
the most important pivotal events and decisions
happen in 2027.
The story continues to 2028, 2029, et cetera,
but the most important part of the story happens in 2027.
For example, what is called in the literature, AI takeoff happens in AI 2027.
AI takeoff is this forecasted dynamic of the speed of AI research
accelerating dramatically when AIs are able to do AI research
much better than humans.
So in other words, when you automate the AI research,
probably it will go faster.
And there's a question about how much faster it will go,
what that looks like, et cetera,
when it will eventually asymptote.
But that whole dynamic is called AI takeoff
and it happens in our scenario in 2027.
I should say, as a footnote, I've updated my timelines to be a little bit more optimistic after writing this, and now I would say 2028 is more likely.
But broadly speaking, I still feel like these are basically the tracks we're on.
So when you say AI takeoff,
is that synonymous with the older phrase
an intelligence explosion?
Basically, yeah.
Yeah, I mean, that phrase has been with us
for a long time, since the mathematician I.J. Good, I think in the 60s, posited this, just extrapolating, you know,
from the general principle that once you had machines,
intelligent machines devising the next generation
of intelligent machines,
that this process could be self-sustaining
and asymptotic and get away from us.
And he dubbed this an intelligence explosion.
So this is mostly a story of software improving software.
I mean, the AI at this point doesn't yet have its hands on,
you know, physical factories building new chips or robots.
That's right.
Yeah.
Yeah.
So, I mean, this is also another important thing that I would like people to think about more and understand better: at least in our view, most of the important decisions that affect the fate of the world will be made prior to any massive transformations of the economy due to AI. And if you want to understand why or how, what we mean by that, et cetera, well, it's all laid out in our scenario. You can sort of see the events unfold, and then after you finish reading it, you can be like, oh yeah, I guess the world looked pretty normal in 2027, even though, you know, behind closed doors at the AI companies, all of these incredibly impactful decisions were being made about automating AI research and producing superintelligence and so forth. And then
in 2028,
things are going crazy in the real world and there's all these new factories and
robots and stuff being built orchestrated by the superintelligences.
But in terms of like where to intervene,
you don't want to wait until the superintelligences are
already building all the factories.
You want to like try to steer things in a better direction before then.
Yeah. So your piece, I mean, it is kind of a piece of speculative fiction in a way, but it's all too plausible. And what's interesting to me is just some of the disjunctions you point out, like moments where the economy, you know, for real people, is probably being destroyed because people are becoming far less valuable. There's another blog post that perhaps you know about
called the intelligence curse,
which goes over some of this ground as well,
which I recommend people look up.
But that's really just a name for this principle
that once AI is better at virtually everything
than people are, right?
Once it's all analogous to chess,
the value of people just evaporates
from the point of view of companies and even governments.
Right? I mean, people are not necessary, because they can't add value to any process that's running the economy, or at least to the most important processes that are running the economy.
So there's interesting moments
where the stock market
might be booming, but the economy for most people
is actually in free fall.
And then you get into the implications of an arms race
between the US and China, and it's all too plausible.
There are moments where you admit that we are in this arms race condition, and an arms race is precisely the situation wherein all the players are not holding safety as their top priority.
Yeah. And unfortunately, you know,
I don't think it's good that we're in an arms race,
but it does seem to be what we're headed towards.
And it seems to be what the companies are also
like pushing
along, right? If you look at the rhetoric coming out of the lobbyists, for example,
they talk a lot about how it's important to beat China and how the US needs to maintain
its competitive advantage in AI and so forth. I mean, more generally, like, it's kind of,
like I'm not sure what the best way to say this is, but basically a lot of people at these companies
building this technology expect something more or less
like AI 2027 to happen and have expected this for years.
And like this is what they are building towards
and they're doing it because they think if we don't do it,
someone else will do it worse.
And they think it's going to work out well.
Do they think it's going to work out well, or do they just think that there is no alternative because we have a coordination problem we can't solve?
I mean, if Anthropic stops,
they know that OpenAI is not going to stop.
They can't agree to, you know,
all the US players can't agree to stop together.
And even if they did, they know that China wouldn't stop.
Right, so it's just this, it's a coordination problem that can't be solved, even if everyone agrees
that an arms race condition could likely,
with some significant probability,
I mean, maybe it's only 10% in some people's minds,
but it's still a non-negligible probability
of birthing something that destroys us.
Yeah, my take on that is it's both.
So I think that, you know, I have lots of friends at these companies and I used to work there
and I talked to lots of people there all the time.
In my opinion, I think on average, they're overly optimistic about where all this is headed,
perhaps because they're biased, because their job depends on them thinking it's a good idea to do all of this.
But also, separately, there is both a real arms race dynamic, where it just really is true that if one company decides not to do this, then other companies will probably just do it anyway, and it really is true that if one country decides not to do this, other countries will probably do it anyway. And then there's also an added, perceived element of that dynamic,
where a lot of people are basically not even trying
to coordinate the world to handle this responsibly
and to put guard rails in place or to slow down
or whatever.
And they're not trying because they basically think
it's hopeless to achieve that level of coordination.
Well, you mentioned that the LLMs are already showing
some deceptive characteristics.
I guess we might wonder whether what is functionally
appearing as deception is really deception.
I mean, whether it's really motivated in any sense, or whether we're guilty of anthropomorphizing these systems by calling it lying or deception,
but what's the behavior that we have seen
from some of these systems that we're calling lying
or cheating or deception?
Yeah, great question.
So there are a couple of things; the keywords to search for are sycophancy, reward hacking, and scheming.
So there's various papers on this
and there's even blog posts by OpenAI and Anthropic
detailing some examples that have been found.
So sycophancy is an observed tendency of many of these AI systems to basically suck up to or flatter the humans they're talking to,
often in ways that are just extremely over the top and egregious.
And, you know, we don't know for sure what...
If you'd like to continue listening to this conversation, you'll need to subscribe at samharris.org.
Once you do, you'll get access to all full-length episodes of the Making Sense Podcast.
The Making Sense Podcast is ad-free and relies entirely on listener support.
And you can subscribe now at samharris.org.