Making Sense with Sam Harris - Making Sense of Artificial Intelligence | Episode 1 of The Essential Sam Harris
Episode Date: November 22, 2022

Filmmaker Jay Shapiro has produced a new series of audio documentaries, exploring the major topics that Sam has focused on over the course of his career. Each episode weaves together original analysis, critical perspective, and novel thought experiments with some of the most compelling exchanges from the Making Sense archive. Whether you are new to a particular topic, or think you have your mind made up about it, we think you’ll find this series fascinating. In this episode, we explore the landscape of Artificial Intelligence. We’ll listen in on Sam’s conversation with decision theorist and artificial-intelligence researcher Eliezer Yudkowsky, as we consider the potential dangers of AI – including the control problem and the value-alignment problem – as well as the concepts of Artificial General Intelligence, Narrow Artificial Intelligence, and Artificial Super Intelligence. We’ll then be introduced to philosopher Nick Bostrom’s “Genies, Sovereigns, Oracles, and Tools,” as physicist Max Tegmark outlines just how careful we need to be as we travel down the AI path. Computer scientist Stuart Russell will then dig deeper into the value-alignment problem and explain its importance. We’ll hear from former Google CEO Eric Schmidt about the geopolitical realities of AI terrorism and weaponization. We’ll then touch on the topic of consciousness as Sam and psychologist Paul Bloom turn the conversation to the ethical and psychological complexities of living alongside humanlike AI. Psychologist Alison Gopnik then reframes the general concept of intelligence to help us wonder if the kinds of systems we’re building using “Deep Learning” are really marching us towards our super-intelligent overlords. Finally, physicist David Deutsch will argue that many value-alignment fears about AI are based on a fundamental misunderstanding about how knowledge actually grows in this universe.
Transcript
To access full episodes of the Making Sense podcast, you'll need to subscribe at samharris.org. There you'll find our private RSS feed
to add to your favorite podcatcher,
along with other subscriber-only content.
We don't run ads on the podcast,
and therefore it's made possible entirely
through the support of our subscribers.
So if you enjoy what we're doing here,
please consider becoming one.
I am here with Jay Shapiro.
Jay, thanks for joining me.
Thank you for having me.
So we have a fun project to talk about here.
And let's see if I can remember how this came about. I woke up in the middle of the night one night realizing that more or less my entire catalog of podcasts was, if not the entire thing, then conservatively speaking at least 50% of it, evergreen, which is to say that the content was basically as good today as the day I recorded it. But because of the nature of the medium, it would never be perceived as such, and people really don't tend to go back into the catalog and listen to a three-year-old podcast. And yet, there's something insufficient about just
recirculating them in my podcast feed or elsewhere. And so Jaron, my partner in crime here, and I were trying to think about how to
give all of this content new life. And then we thought of you just independently turning your
creative intelligence loose on the catalog. And now I will properly introduce you as someone who
should be doing that.
Perhaps you can introduce yourself.
Just tell us what you have done over these many years and the kinds of things you've focused on.
Yeah, well, I'm a filmmaker first and foremost.
But I think my story and my genesis of being maybe the right person to tap here is probably indicative or representative of
a decent portion of your audience. I'm just guessing. I'm 40 now, which pegs me in college
when 9-11 hit. It was, like, my second year. I guess it would have been early in the year, if it was September.
And, you know, I had never heard of you at all at that point. I was an atheist and just didn't think too much
about that kind of stuff. I was fully on board with any atheist things I saw coming across my
world. But then 9-11 hit, and I was on a very liberal college campus. And the kind of questions
that were popping up in my mind and I was asking myself were uncomfortable for me. I just didn't
know what to do with them.
I really had no formal philosophical training
and I kind of just buried them, you know,
under the weight of my own confusion or shame
or just whatever kind of brew a lot of us
were probably feeling at the time.
And then I discovered your work with The End of Faith, right when you were responding to the same thing. And a lot of your language resonated with me. You were philosophically trained and maybe sharper with your language, for better or worse, which we found out later was complicated. And I started
following along with your work and The Four Horsemen and Hitchens and Dawkins and that sort
of whole crowd. And I'm sure
I wasn't alone. And then I paid close, special attention to what you were doing, which I actually
included in one of the pieces that I ended up putting together in this series. But with a talk
you gave in Australia, you know, I don't have to tell you about your career, but again, I was
following along as you were on sort of this atheist circuit and I was interested. But whenever you would talk about sort of the
hard work of secularism and the hard work of atheism, in particular, I'm thinking of your
talk called Death in the Present Moment right after Christopher Hitchens had died. I'm actually
curious how quickly you threw that together because I know you were supposed to or you
were planning on speaking about free will and you ended up giving this whole other talk
and that one, and I'll save it because I definitely put that one in our compilation, but it struck me as, okay, this guy's up to something a little different, and the questions that he's asking are really different. I was just on board with that ride. So I became a fan and, like
probably many of your listeners started to really follow and
listen closely and became a student. And hopefully, like any good student, I started to disagree with my teacher a bit and slowly got the confidence to push back and have my own thoughts and maybe find the weaknesses and strengths of what you were up to. And, you know, your work exposed me and many,
many other people, I'm sure, to a lot of great thinkers. And maybe you don't love this, but sometimes we think the people who disagree with you, whom you introduce us to on this side of the microphone, are right. And that's a great credit to you as well, for just giving them the air. And maybe on some really nerdy, esoteric things,
I'm one of them at this point now, because to back up way to the beginning of the story,
I was at a university where I was well on my way to a film degree, which is what I ended
up getting.
But when 9-11 hit, I started taking a lot more courses in a track that they had, which I think was fairly unique at the time, maybe still one of the only programs where you can actually major in Holocaust studies, which sort of sits in between the history and philosophy departments. And I started taking a bunch of courses in there. And that's where I was first exposed to sort of formal philosophical language and education. And
that was so useful for me. So I was just on board. And now
hopefully, you know, I swim deep in those waters and know my way around the lingo. And it's super
helpful. But yeah, it was almost, you know, Godwin's law of bringing up the Nazis. Those were the first times, actually, in courses called, like, Resistance During the Holocaust and things like that, where, you know, I first was exposed to
the words like deontology and consequentialism and utilitarianism and a lot of moral ethics stuff.
And then I went further on my own into sort of the theory of mind and this kind of stuff. But
yeah, I consider myself, in this weird new digital landscape that we're in, a bit of a student of the school of Sam Harris. But then again, like hopefully any good student, I've branched off and have my own sort of thoughts and framings. And so in these pieces, in this series that we're calling The Essential Sam Harris, I can't help but sort of put my writing and my framework on it. But with the people and the challenges that you've encountered and continue to encounter, whether they're right or wrong or making drastic mistakes, I want to give everything a really fair hearing. So there are times, I'm sure, where the listener will hear my own hand of opinion coming
in there, and I'm sure you know the areas as well. But most times I'm just trying to give
an open door to the mystery and why these subjects interest you in the first place, if that makes sense.
Yeah, yeah. And I should remind both of us that we met because you were directing a film focused on Maajid Nawaz and me around our book, Islam and the Future of Tolerance. And also, we've brought into this project another person who I think you met independently, if I remember correctly: Megan Phelps-Roper, who's been a guest on the
podcast, and someone who I have long admired. And she's doing the voiceover work in this series,
and she happens to have a great voice. So I'm very happy to be working with her.
Yeah, I did meet her independently. Your archive, I think you said three or four years old, but your archive is over 10 years old now. And I was diving into the earliest days of it. And there are some
fascinating conversations that age really interestingly. And I'm curious. I mean,
I think this project, again, it's for fans, it's for listeners, but it's for people who might hate you also, or critics of you, or people who are sure you were missing something or wrong about something, or even yourself, to go back and listen to certain conversations.
One conversation is from seven or eight years ago now. And the part that I really resurfaced, which is actually in the morality episode, is full of details and politics and moral philosophies
regarding things like intervention in the Middle East. And at the time of your recording, of course,
we had no idea how Afghanistan might look a decade from then.
But now we kind of do.
And it's not... If people listen to these carefully, it's not about, oh, this side of the conversation turned out to be right and this part turned out to be wrong.
But certain things hit our ears a little differently.
Even on this first topic of artificial intelligence,
I mean, I think that conversation continues to evolve in a way where the issues that you bring up are
evergreen, but hopefully evolving as well, just as far as their application goes. So yeah, so I
think you, I would love to hear your thoughts, listening back to some of those. And in fact,
to reference the film we made together,
a lot of that film was you doing that actively and live, given a specific topic, looking back and reassessing language and how it might, you know, land politically in that project. So yeah, but this series goes into really different territory, including an episode about social media, which
changes every day. Yeah, changes by the hour.
Yeah. And the conversation you have with Jack Dorsey is now fascinating for all kinds of
different reasons that at the time couldn't have been. So yeah, it's evergreen, but there's also just, like, new life in all of them, I think.
Yeah. Yeah. Well, I look forward to hearing it. Just to be clear, this has been very much your project. I mean, I haven't heard
most of this material since the time I recorded it and released it. And you've gone back and
created episodes on a theme where you've pulled together five or six conversations and intercut
material from five or six different episodes and then added
your own interstitial pieces, which you have written and Megan Phelps-Roper is reading. So
it's just, these are very much their own documents. And as you say, you don't agree with me about everything, and occasionally you're shading different points from your own point of view. And so, yeah,
I look forward to hearing it and we'll be dropping the whole series here in the podcast feed.
If you're in the public feed, as always, you'll be getting partial episodes. And if you're in the
subscriber feed, you'll be getting full episodes. And the first will be on artificial intelligence. And
then there are many other topics, consciousness, violence, belief, free will, morality, death,
and others beyond that. Yeah. There's one on existential threat and nuclear war that I'm
still piecing together, but that one's pretty harrowing. One of your areas of interest. Yeah, yeah. Great.
Well, thanks for the collaboration, Jay.
Again, I'm a consumer of this,
probably more than a collaborator at this point
because I have only heard part of what you've done here.
So I'll be eager to listen as well.
But thank you for the work that you've done.
No, thank you.
And I'll just say,
you're gracious to allow someone to do this who does have some disagreements with you. You know, again, most of my disagreements with you are pretty deep and nerdy, sort of esoteric philosophy stuff, but it's incredibly gracious that you've given me the opportunity to do it.
And then hopefully again, I'm a bit of a representative for people who have been in the passenger seat of your public project of thinking out loud for
over a decade now. And if I can, you know, be a voice for that part of the crowd, it's an honor to do it. And they're a lot of fun too, a ton of fun. There are a lot of audio thought experiments, you know, that we play with and hopefully bring to life in your ears a little bit, including in this very first one on
artificial intelligence. So yeah, I hope people enjoy it. I do as well. So now we bring you
Megan Phelps-Roper on the topic of artificial intelligence.
Welcome to The Essential Sam Harris. This is Making Sense of Artificial Intelligence.
The goal of this series is to organize, compile, and juxtapose conversations hosted by Sam Harris into specific areas of interest.
This is an ongoing effort to construct a coherent overview of Sam's perspectives and arguments,
the various explorations and approaches to the topic,
the relevant agreements and disagreements, and the pushbacks and evolving thoughts which his
guests have advanced. The purpose of these compilations is not to provide a complete
picture of any issue, but to entice you to go deeper into these subjects. Along the way,
we'll point you to the full episodes with
each featured guest. And at the conclusion, we'll offer some reading, listening, and watching
suggestions, which range from fun and light to densely academic. One note to keep in mind for
this series. Sam has long argued for a unity of knowledge where the barriers between fields of
study are viewed as largely unhelpful artifacts of unnecessarily partitioned thought.
The pursuit of wisdom and reason in one area of study naturally bleeds into, and greatly affects, others.
You'll hear plenty of crossover into other topics as these dives into the archives unfold.
And your thinking about a particular topic may shift
as you realize its contingent relationships with others.
In this topic, you'll hear the natural overlap
with theories of identity and the self,
consciousness, and free will.
So, get ready.
Let's make sense of artificial intelligence.
Artificial intelligence is an area of resurgent interest in the general public. Its seemingly imminent arrival first garnered wide attention in the late 60s,
with thinkers like Marvin Minsky and Isaac Asimov writing provocative and thoughtful books
about the burgeoning technology and concomitant philosophical and ethical quandaries. Science fiction novels, comic books, and TV shows were
flooded with stories of killer robots and encounters with super-intelligent artificial
lifeforms hiding out on nearby planets, which we thought we would soon be visiting on the backs of
our new rocket ships. Over the following decades, the excitement and fervor looked to have faded from view in the public imagination. But in recent years, that interest has made
an aggressive comeback. Perhaps this is because the fruits of the AI revolution and the devices
and programs once only imagined in those science fiction stories have started to rapidly show up
in impressive and sometimes disturbing ways all around us.
Our smartphones, cars, doorbells, watches, games, thermostats, vacuum cleaners, light bulbs, and glasses now have embedded algorithms running on increasingly powerful hardware,
which navigate, dictate, or influence not just our locomotion, but our entertainment choices, our banking, our politics, our dating lives, and just about everything else. It seems every other TV show or movie that appears on a streaming service is birthed out of a collective interest, fear,
or otherwise general fascination with the ethical, societal, and philosophical implications of
artificial intelligence. There are two major ways to think about the threat of what is generally called AI.
One is to think about how it will disrupt our psychological states or fracture our information
landscape, and the other is to ponder how the very nature of the technical details of its development
may threaten our existence. This compilation is mostly focused
on the latter concern, because Sam is certainly amongst those who are quite worried about the
existential threat of the technical development and arrival of AI.
Now, before we jump into the clips, there are a few concepts that you'll need to onboard to
find your footing. You'll hear the terms Artificial General Intelligence, or AGI, and Artificial Superintelligence, or ASI, used in
these conversations. Both of these terms refer to an entity which has a kind of intelligence
that can solve a nearly infinitely wide range of problems. We humans have brains which display this
kind of adaptable intelligence.
We can climb a ladder by controlling our legs and arms in order to retrieve a specific object
from a high shelf with our hands. And we use the same brain to do something very different,
like recognize emotions in the tone of a voice of a romantic partner.
I look forward to infinity with you.
That same brain can play a game of checkers against a young child,
who we might also be coyly trying to let win,
or play a serious game of competitive chess against a skilled adult.
That same brain can also simply lift a coffee mug to our lips,
not just to ingest nutrients and savor the taste of the beans,
but also to send a subtle social signal to a friend at the table
to let them know that their story is dragging on a bit.
All of that kind of intelligence is embodied and contained in the same system, namely our brains.
AGI refers to a human level of intelligence, which doesn't surpass what our brightest humans can accomplish on any given task,
while ASI references an intelligence which performs at,
well, superhuman levels. This description of flexible intelligence is different from a system
which is programmed or trained to do one particular thing incredibly well, like arithmetic,
or painting straight lines on the sides of a car, or playing computer chess, or guessing large prime numbers, or displaying
music options to a listener based on the observable lifestyle habits of like-minded users in a certain
demographic. That kind of system has an intelligence that is sometimes referred to as narrow or weak AI.
But even that kind of thing can be quite worrisome from the standpoint of weaponization or preference manipulation.
You'll hear Sam voice his concerns throughout these conversations,
and he'll consistently point to our underestimation of the challenge that even narrow AI poses.
So, there are dangers and serious questions to consider no matter which way we go with the AI topic.
But as you'll also hear in this compilation,
not everyone is as concerned about the technical existential threat of AI as Sam is. Much of the divergence in levels of concern stems from initial differences on the fundamental conceptual approach
towards the nature of intelligence. Defining intelligence is notoriously slippery and controversial,
but you're about to hear one of Sam's guests offer a conception which distills intelligence
to a type of observable competence at actualizing desired tasks, or an ability to manifest preferred
future states through intentional current action and intervention. You can imagine a linear gradient
indicating more or less of this competence as you move along it.
This view places our human intelligence on a continuum
along with bacteria, ants, chickens, honeybees, chimpanzees,
all of the potential undiscovered alien life forms,
and of course, artificial intelligence,
which perches itself far above
our lowly human competence. This presents some rather alarming questions. Stephen Hawking once
issued a famous warning that perhaps we shouldn't be actively seeking out intelligent alien
civilizations, since we'd likely discover a culture which is far more technologically advanced than
ours. And if our planet's history provides any lesson, it's that when technologically mismatched cultures come into contact, it usually doesn't work out too well for the less developed one.
Are we bringing that precise suicidal encounter into reality
as we set out to develop artificial intelligence?
That question alludes to what is known as the value alignment problem.
But before we get to that challenge, let's go to our first clip,
which starts to lay out the important definitional foundations
and distinction of terms in the landscape of AI.
The thinker you're about to meet is the decision theorist and computer scientist Eliezer Yudkowsky.
Yudkowsky begins here by defending this linear gradient perspective on intelligence
and offers an analogy to consider how we might be mistaken about intelligence
in a similar way to how we once were mistaken about the nature of fire.
It's clear that Sam is aligned with and attracted to Eliezer's run at this question,
and consequently, both men end up sharing a good deal of unease about
the implications that all of this has for our future. This is from episode 116, which
is entitled AI: Racing Toward the Brink.
Let's just start with the basic picture and define some terms. I suppose we should define intelligence first and then jump into
the differences between strong and weak or general versus narrow AI. Do you want to start us off on
that? Sure. Preamble disclaimer, though, the field in general, like not everyone you would ask would
give you the same definition of intelligence. And a lot of times in cases like those, it's good to sort of go back to observational
basics. We know that in a certain way, human beings seem a lot more competent than chimpanzees,
which seems to be a similar dimension to the one where chimpanzees are more competent than mice,
or that mice are more competent than spiders.
And people have tried various theories about what this dimension is. They've tried various
definitions of it. But if you went back a few centuries and asked somebody to define fire,
the less wise ones would say, ah, fire is the release of phlogiston. Fire is one of the four
elements. And the truly
wise ones would say, well, fire is the sort of orangey bright hot stuff that comes out of wood
and spreads along wood. And they would tell you what it looked like and put that prior to their
theories of what it was. So what this mysterious thing looks like is that humans can build space
shuttles and go to the moon, and mice can't. And we think it has
something to do with our brains. Yeah. Yeah. I think we can make it more abstract than that.
Tell me if you think this is not generic enough to be accepted by most people in the field. It's
whatever intelligence may be in specific contexts. So generally speaking, it's the ability to meet goals,
perhaps across a diverse range of environments. And we might want to add that it's at least
implicit in intelligence that interests us. It means an ability to do this flexibly rather than
by rote following the same strategy again and again blindly.
Does that seem like a reasonable starting point?
I think that that would get fairly widespread agreement and it matches up well with some
of the things that are in AI textbooks.
If I'm allowed to sort of take it a bit further and begin injecting my own viewpoint into
it, I would refine it and say that by achieve goals, we mean something like squeezing
the measure of possible futures higher in your preference ordering. If we took all the possible
outcomes and we rank them from the ones you like least to the ones you like most, then as you
achieve your goals, you're sort of like squeezing the outcomes higher in your preference ordering.
You're narrowing down what the outcome would be to be something more like what you want,
even though you might not be able to narrow it down very exactly. Flexibility, generality.
There's a... like, humans are much more domain-general than mice. Bees build hives. Beavers build dams. A human will look over both of them and envision a honeycomb-structured dam. We are able to operate even on the moon,
which is very unlike the environment where we evolved. In fact, our only competitor in terms of
general optimization, where optimization is that sort of narrowing of the future that I talked about, our competitor in terms of general
optimization is natural selection. Natural selection built beavers, it built bees, it
sort of implicitly built the spider's web in the course of building spiders. And we as humans have this similar very broad range to handle this huge variety of problems.
And the key to that is our ability to learn things that natural selection did not pre-program us with.
So learning is the key to generality.
I expect that not many people in AI would disagree with that part either.
Right. So it seems that goal-directed behavior is implicit in this, or even explicit in this definition of intelligence. And so whatever intelligence is, it is inseparable from
the kinds of behavior in the world that results in the fulfillment of goals. So we're talking about agents that can do things.
And once you see that, then it becomes pretty clear that if we build systems that harbor primary
goals, you know, there are cartoon examples here like, you know, making paperclips. These are not
systems that will spontaneously decide that they could be doing more enlightened things than, say, making paperclips. There are no goals that will arrive in these systems apart from the ones we put in there. And we have common
sense intuitions that make it very difficult for us to think about how strange an artificial
intelligence could be, even one that becomes more and more competent to meet its goals.
Let's talk about the frontiers of strangeness in AI as we move from, again, I think we have a couple more definitions we should probably put in play here, differentiating strong and weak or general and narrow intelligence.
Well, to differentiate general and narrow, I would say that, well, I mean, this is like, on the one hand, theoretically a spectrum.
Now, on the other hand, there seems to have been like a very sharp jump in generality between chimpanzees and humans.
So breadth of domain driven by breadth of learning.
Like DeepMind, for example, recently built AlphaGo, and I lost some money betting that
AlphaGo would not defeat the human champion,
which it promptly did. And then a successor to that was AlphaZero. And AlphaGo was specialized
on Go. It could learn to play Go better than its starting point for playing Go, but it couldn't
learn to do anything else. And then they simplified the
architecture for AlphaGo. They figured out ways to do all the things it was doing in more and
more general ways. They discarded the opening book, like all the sort of human experience of
Go that was built into it. They were able to discard all of the sort of like programmatic
special features that detected features of the Go board. They figured out how to do that in simpler ways. And because they figured out how to do it in simpler ways,
they were able to generalize to AlphaZero, which learned how to play chess using the same
architecture. They took a single AI and got it to learn Go and then reran it and made it learn chess. Now that's not human general,
but it's like a step forward in generality of the sort that we're talking about.
Am I right in thinking that that's a pretty enormous breakthrough?
I mean, there's two things here.
There's the step to that degree of generality,
but there's also the fact that they built a Go engine.
I forget if it was a Go or a chess or both,
which basically surpassed all of the specialized AIs
on those games over the course of a day, right?
Isn't the chess engine of AlphaZero
better than any dedicated chess computer ever?
And didn't it achieve that just with astonishing speed?
Well, there was actually some amount of debate afterwards
whether or not the version of the chess engine
that it was tested against was truly optimal.
But even to the extent that it was in that narrow range
of the best existing chess engine,
as Max Tegmark put it,
the real story wasn't in how AlphaGo beat human Go players,
it's how AlphaZero beat human Go system programmers
and human chess system programmers.
People had put years and years of effort
into accreting all of the special purpose code that would play chess well and efficiently.
And then AlphaZero blew up to and possibly passed that point in a day.
And if it hasn't already gone past it, well, it would be past it by now if DeepMind kept working on it.
Although they've now basically declared victory and
shut down that project as I understand it.
Okay, so talk about the distinction between general and narrow intelligence a little bit
more.
So we have this feature of our minds, most conspicuously, where we're general problem
solvers. We can learn new things, and our learning in one area doesn't require a
fundamental rewriting of our code. Our knowledge in one area isn't so brittle as to be degraded by
our acquiring knowledge in some new area, or at least this is not a general problem which
erodes our understanding again and again. And we don't yet have computers
that can do this, but we're seeing the signs of moving in that direction. And so then it's often
imagined that there's a kind of near-term goal, which has always struck me as a mirage of so-called
human-level general AI. I don't see how that phrase will ever mean much of anything,
given that all of the narrow AI we've built thus far is superhuman within the domain of
its applications. The calculator in my phone is superhuman for arithmetic. Any general AI that also has my phone's ability to calculate will be superhuman
for arithmetic, but we must presume it'll be superhuman for all of the dozens or hundreds of
specific human talents we've put into it, whether it's facial recognition or just obviously memory
will be superhuman unless we decide to consciously degrade it.
Access to the world's data will be superhuman unless we isolate it from data. Do you see
this notion of human level AI as a landmark on the timeline of our development or is it just
never going to be reached? I think that a lot of people in the field would agree that human level AI defined as literally at the human level, neither above nor below, across a wide range of competencies is a straw target, an impossible mirage. or rather that like if we're put into a sort of like real world, lots of things going on,
context that places demands on generality, then AIs are not really in the game yet.
Humans are like clearly way ahead. And more controversially, I would say that we can
imagine a state where the AI is clearly way ahead, where it is across sort of every kind of cognitive competency,
barring some very narrow ones that aren't deeply influential of the others. Maybe chimpanzees
are better at using a stick to draw ants from an ant hive and eat them than humans are,
though no humans have really practiced that to world championship level exactly.
But there's this sort of general factor of how good are you at it when reality throws you a complicated problem. At this, chimpanzees are clearly not better than humans. Humans are
clearly better than chimps, even if you can manage to narrow down one thing the chimp is better at.
The thing the chimp is better at doesn't play a big role in our global economy. It's not an input
that feeds into lots of other things. So we can clearly imagine, I would say, like there are some people
who say this is not possible. I think they're wrong, but it seems to me that it is perfectly
coherent to imagine an AI that is like better at everything or almost everything than we are.
And such that if it was, like, building an economy with lots of inputs, humans would have around the same level of input into that economy as the chimpanzees have into ours.
Yeah, yeah. So what you're gesturing at here is a continuum of intelligence that I think most people never think about.
And because they don't think about it, they have a default doubt that it exists.
I think, and this is a point I know you've made in your writing, and I'm sure it's a point that Nick Bostrom made somewhere in his book, Superintelligence, it's this idea that
there's a huge blank space on the map past the most well-advertised exemplars of human brilliance, where we don't imagine what it
would be like to be five times smarter than the smartest person we could name. And we don't even
know what that would consist in, right? Because if chimps could be given to wonder what it would
be like to be five times smarter than the smartest chimp, they're not going to represent for themselves all of the
things that we're doing that they can't even dimly conceive. There's a kind of disjunction
that comes with more. There's a phrase used in military contexts. I don't know who the quote is actually from; it's variously attributed to Stalin and Napoleon and I think Clausewitz, like half a dozen people who have claimed this quote. The quote is, sometimes quantity has a quality all its own.
As you ramp up in intelligence, whatever it is at the level of information processing,
spaces of inquiry and ideation and experience begin to open up, and we can't necessarily predict
what they would be from where we sit. How do you think about this continuum
of intelligence beyond what we currently know in light of what we're talking about?
Well, the unknowable is a concept you have to be very careful with, because the thing you can't
figure out in the first 30 seconds of thinking about it, sometimes you can figure it out if you think for another five minutes. So in particular,
I think that there's a certain narrow kind of unpredictability, which does seem to be plausibly
in some sense essential, which is that for AlphaGo to play better Go than the best human Go players, it must be the case that the best human Go players cannot
predict exactly where on the Go board AlphaGo will play. If they could predict exactly where
AlphaGo would play, AlphaGo would be no smarter than them. On the other hand, AlphaGo's programmers
and the people who knew what AlphaGo's programmers were trying to do, or even just the people who watched AlphaGo play, could say, well, I think this system is going to play such that it
will win at the end of the game, even if they couldn't predict exactly where it would move on
the board. So similarly, there's a sort of like not short or like not necessarily slam dunk or not like immediately obvious chain of reasoning, which says that it is okay for us to reason about aligned or even unaligned artificial general intelligences of sufficient power as if they're trying to do something, but we don't
necessarily know what, but from our perspective that still has consequences, even though we can't
predict in advance exactly how they're going to do it. Yudkowsky lays out a basic picture of
intelligence that, once accepted, takes us into the details and edges us towards the cliff.
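To make that picture concrete, here is a minimal illustrative sketch, not taken from the episode; the outcome names, probabilities, and the thermostat-like agent are all invented. It treats "achieving goals" the way Yudkowsky describes it: as shifting probability toward the futures an agent ranks more highly.

```python
# Toy model of "squeezing possible futures higher in a preference ordering."
# Everything here (outcomes, probabilities, the agent) is made up for illustration.

outcomes = ["house burns down", "house is cold", "house is comfortable", "house is perfect"]
preference_rank = {o: i for i, o in enumerate(outcomes)}  # higher index = more preferred

# Probability of each outcome with no agent acting vs. with a thermostat-like agent acting.
p_no_agent   = {"house burns down": 0.05, "house is cold": 0.55,
                "house is comfortable": 0.30, "house is perfect": 0.10}
p_with_agent = {"house burns down": 0.01, "house is cold": 0.04,
                "house is comfortable": 0.60, "house is perfect": 0.35}

def expected_rank(dist):
    """Average preference rank of the future under a given probability distribution."""
    return sum(p * preference_rank[o] for o, p in dist.items())

print(expected_rank(p_no_agent))    # ~1.45
print(expected_rank(p_with_agent))  # ~2.29, the agent narrows the future toward preferred outcomes
```

On this toy measure, a more optimizing agent is simply one that moves more probability mass into the futures it prefers, and the same yardstick applies whether the preferences in question are ours or a machine's.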
And now we're going to introduce someone who tosses us fully into the canyon.
Yudkowsky just brought in the concept we mentioned earlier of value alignment in artificial
intelligence. There's a related problem called the control or containment problem. Both are concerned
with the issue of just how we
would go about building something that is unfathomably smarter and more competent than us,
that we could either contain in some way to ensure it wouldn't trample us, and as you'll soon hear,
that really would take no malicious intent on its part or even our part, or that its goals would be
aligned with ours in such a way that it would be making our lives genuinely better.
It turns out that both of those problems are incredibly difficult to think about, let alone
solve.
The control problem entails trying to contain something which, by definition, can outsmart
us in ways that we literally can't imagine.
Just think of trying to keep a prisoner locked in a jail cell who had the ability
to know exactly which specific bribes or threats would compel every guard in the place to unlock
the door, even if those guards aren't aware of their own vulnerabilities. Or perhaps even more
basically, the prisoner simply discovers features in the laws of physics that we have not yet
understood, and that somehow enable him to walk through the thick walls
which we were sure would stop him. And the other problem, that of value alignment, involves not
only discovering what we truly want, but figuring out a way to express it precisely and mathematically
so as to not cause any unintentional and civilization-threatening destruction.
It turns out that this is incredibly hard to do as well. This particular problem nearly flips the super-intelligent threat on its head
to something more like a super-dumb or let's say super-literal machine, which doesn't understand
all the unspoken considerations that we humans have when we ask someone to do something for us.
This is what Sam was alluding to in the first conversation when he referenced a paperclip universe. The concern is that a simple command to a super-intelligent machine, such as
make paperclips as fast as possible, could result in the machine taking the
as-fast-as-possible part of that command so literally that it attempts to
maximize its speed and performance by using raw materials, even the carbon in our bodies,
to build hard drives in order to run billions of simulations to figure out the best method
for making paperclips. Clearly, that misunderstanding would be rather unfortunate.
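To see why the as-fast-as-possible clause does the damage, here is a hedged toy sketch; the plan names, numbers, and penalty term are invented for illustration, and real systems are nothing this simple. It shows how a literal objective and the intended objective can rank the same two plans in opposite orders.

```python
# Toy sketch of the "super-literal" objective problem described above.
# The plans, numbers, and the harm penalty are invented for illustration only.

plans = [
    {"name": "run the existing factory",  "clips_per_hour": 1_000, "harm": 0},
    {"name": "convert all nearby matter", "clips_per_hour": 10**9, "harm": 10**12},
]

def literal_objective(plan):
    # "Make paperclips as fast as possible" -- and nothing else.
    return plan["clips_per_hour"]

def intended_objective(plan, harm_weight=1.0):
    # What we actually meant: paperclips, minus everything else we care about.
    return plan["clips_per_hour"] - harm_weight * plan["harm"]

print(max(plans, key=literal_objective)["name"])   # convert all nearby matter
print(max(plans, key=intended_objective)["name"])  # run the existing factory
```

The value-alignment problem is that the second objective, the one with everything else we care about written into it, is the one nobody yet knows how to specify completely.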
And neither of these questions of value alignment or containment deal with the
potentially more mundane terrorism threat, the threat of a bad actor who would purposefully
unleash the AI to inflict massive harm. But let's save that cheery picture for later.
Now, let's continue our journey down the AI path with professor of physics and author Max Tegmark,
who dedicates much of his brilliant
mind towards these questions.
Tegmark starts by taking us back to our prison analogy, but this time he places us in the
cell and imagines the equivalent of a world of helpless and hapless five-year-olds making
a real mess of things outside of the prison walls.
But we'll start first with Sam laying out his conception of these relevant AI safety questions.
This comes from episode 94, The Frontiers of Intelligence.
Well, let's talk about this breakout risk, because this is really the first concern of everybody who's been thinking about
what has been called the alignment problem or the control problem.
How do we create an AI that is superhuman in its abilities and do that in a context where it is still safe?
I mean, once we cross into the end zone and are still trying to assess
whether the system we have built is perfectly aligned with our values,
how do we keep it from destroying us
if it isn't perfectly aligned? And the solution to that problem is to keep it locked in a box.
But that's a harder project than it first appears. And you have many smart people assuming
that it's a trivially easy project. I've got people like Neil deGrasse Tyson on my podcast saying that he's
just going to unplug any superhuman AI if it starts misbehaving, or shoot it with a rifle.
Now, he's a little tongue-in-cheek there, but he clearly has a picture of the development process
here that makes the containment of an AI a very easy problem to solve. And even if that's true at the beginning of the process,
it's by no means obvious that it remains easy in perpetuity. I mean, you have people interacting
with the AI that gets built. And at one point, you described several scenarios of breakout. And you point out that even if the AI's intentions are perfectly
benign, if in fact it is value aligned with us, it may still want to break out because, I mean,
just imagine how you would feel if you had nothing but the interests of humanity at heart, but you
were in a situation where every other grown-up on Earth died, and now you're
basically imprisoned by a population of five-year-olds who you're trying to guide from
your jail cell to make a better world. And I'll let you describe it, but take me to the prison
planet run by five-year-olds. Yeah, so when you're in that situation, obviously, it's extremely frustrating for you, even if you have only the best intentions for the five-year-olds.
You know, you want to teach them how to plant food, but they won't let you outside to show them.
So you have to try to explain, but you can't write down to-do lists for them either, because then first you have to teach them to read, which takes a very, very long time. You also can't show them how to use
any power tools because they're afraid to give them to you because they don't understand these
tools well enough to be convinced that you can't use them to break out. You would have an incentive,
even if your goal is just to help the five-year-olds to first break out and then help
them. Now, before we talk more about breakout, though,
I think it's worth taking a quick step back
because you talked multiple times now about superhuman intelligence.
And I think it's very important to be clear that intelligence
is not just something that goes on a one-dimensional scale like an IQ.
And if your IQ is above a certain number, you're superhuman.
It's very important to distinguish between narrow intelligence and broad intelligence. Intelligence is a word that
different people use to mean a whole lot of different things, and they argue about it.
In the book, I just take this very broad definition, that intelligence is how good you are at accomplishing complex goals, which means your
intelligence is a spectrum. How good are you at this? How good are you at that? And it's just like
in sports, it would make no sense to say that there's a single number, your athletic coefficient, AQ, which determines how good you're going to be at winning Olympic medals, such that the athlete who has the highest AQ is going to win all the medals.
So today what we have is a lot of devices that actually have superhuman intelligence at very narrow tasks.
We've had calculators that can multiply numbers better than us for a very long time.
We have machines that can play Go better than us and drive better than us, but they still
can't beat us at tic-tac-toe unless
they're programmed for that.
Whereas we humans have this very broad intelligence.
So when I talk about superhuman intelligence with you now, that's really shorthand for
what we in Geek Speak call superhuman artificial general intelligence, broad intelligence across
the board so that they can do all intellectual tasks better than us.
So with that, let me just come back to your question about the breakout.
There are two schools of thought for how one should create a beneficial future if we have superintelligence.
One is to lock them up and keep them confined, like you mentioned.
But there's also a school of thought that says that that's immoral if these machines
can also have a subjective experience and
they shouldn't be treated like slaves.
And that a better approach is instead to let them be free, but just make sure that their
values or goals are aligned with ours.
After all, grown-up parents are more intelligent than their one-year-old kids, but that's fine
for the kids because the parents have goals that are aligned with what's best for the kids, right? But if you do go the confinement
route, after all, this enslaved God scenario, as I call it, yes, it is extremely difficult,
as that five-year-old example illustrates. First of all, almost whatever open-ended goal you give
your machine,
it's probably going to have an incentive to try to break out in one way or the other.
And when people simply say, oh, I'll unplug it. You know, if you're chased by a heat-seeking missile, you probably wouldn't say, I'm not worried, I'll just unplug it. We have to let
go of this old-fashioned idea that intelligence is just something that sits
in your laptop, right? Good luck unplugging the internet. And even if you initially, like in the first scenario in my book, have physical confinement where you have a machine in a room,
you're going to want to communicate with it somehow, right? So that you can get
useful information from it to get rich or take power
or whatever you want to do. And you're going to need to put some information into it about the
world. So it can do smart things for you, which already shows how tricky this is. I'm absolutely
not saying it's impossible. But I think it's fair to say that it's not at all clear that
it's easy either. The other one, of getting the goals aligned, is also extremely difficult. First of all, you need to get the machine to be able to understand
your goals. So if you have a future self-driving car and you tell it to take you to the airport
as fast as possible, and then you get there covered in vomit, chased by police helicopters,
and you're like, this is not what I asked for.
And it replies, that is exactly what you asked for.
Then you realize how hard it is to get that machine to learn your goals, right?
If you tell an Uber driver to take you to the airport as fast as possible, she's going to know that you actually had additional goals that you didn't explicitly need to say.
Because she's a human too, and she understands where you're coming from. But for someone made out of silicon, you have to actually explicitly have it learn all of those other things that we humans care about. So that's hard. And then, once it can understand your goals, that doesn't mean it's going to adopt your goals. I mean, everybody who has kids knows
that. And finally, if you get the machine to adopt your goals, then how can you ensure that it's
going to retain those goals as it gradually gets smarter and smarter through self-improvement?
Most of us grownups have pretty different goals from what we had when we were five.
I'm a lot less excited about Legos now, for example.
And we don't want a super intelligent AI to just think about this goal of being nice to humans as some little passing fad from its early youth. It seems to me that the second scenario of value alignment does imply the first of keeping the AI successfully boxed, at least for a time, because you have to be sure it's value aligned
before you let it out in the world, before you let it out on the internet, for instance,
or create robots that have superhuman intelligence that are functioning autonomously out in the
world. Do you see a development path where we don't actually have to solve the boxing problem,
at least initially? No, I think you're completely right. Even if your intent is to build a value-aligned AI and let it out, you clearly are going to need to have it boxed up during the development
phase when you're just messing around with it.
Just like any bio lab that deals with dangerous pathogens is very carefully sealed off.
And this highlights the incredibly pathetic state of computer security today.
I mean, and I think pretty much everybody who listens to this has at some point experienced
the blue screen of death, courtesy of Microsoft Windows, or the spinning wheel of doom, courtesy of Apple.
And we need to get away from that to have truly robust machines, if we're ever going to be able to have AI systems that we can trust, that are provably secure.
And I feel it's actually quite embarrassing that we're so flippant
about this. It's maybe annoying if your computer crashes and you lose one hour of work that you
hadn't saved. But it's not as funny anymore if it's your self-driving car that crashed or the
control system for your nuclear power plant or your nuclear weapon system or something like that.
And when we start talking about human-level AI and boxing systems, you have to have this much higher level of safety mentality where you've really made this a priority the way
we aren't doing today.
Yeah, you describe in the book various catastrophes that have happened by virtue of software glitches
or just bad user interface
where, you know, the dot on the screen or the number on the screen is too small for the human
user to deal with in real time. And so there have been plane crashes where scores of people have
died and patients have been annihilated by having, you know, hundreds of times the radiation dose that they should have gotten in
various machines because the software was improperly calibrated or the user had selected
the wrong option. And so we're by no means perfect at this, even when we have a human in the loop.
And here we're talking about systems that we're creating that are going to be fundamentally
autonomous. And, you know, the idea of having perfect software that has been perfectly debugged
before it assumes these massive responsibilities is fairly daunting. I mean, just how do we recover
from something like, you know, seeing the stock market go to zero because we didn't understand the AI that we unleashed on the Dow Jones or the financial system generally?
These are not impossible outcomes.
Yeah, you raise a very important point there.
And just to inject some optimism in this, I do want to emphasize that, first of all, there's
a huge upside also if one can get this right.
Because people are bad at things, yeah.
In all of these areas where there were horrible accidents, of course, the technology can save lives, in healthcare and transportation and so many other areas.
So there's an incentive to do it.
And secondly, there are examples in history where we've had really good safety engineering
built in from the beginning.
For example, when we sent Neil Armstrong, Buzz Aldrin, and Michael Collins to the moon
in 1969, they did not die.
There were tons of things that could have gone wrong.
But NASA very meticulously tried to predict everything that possibly could go wrong and
then take precautions.
So it didn't happen, right? It wasn't luck that got them there.
It was planning.
And I think we need to shift into this safety engineering mentality with AI development.
Throughout history, it's always been the situation that we could create a better future with
technology as long as we won this race between the growing power of the technology and the growing wisdom with which we managed it.
And in the past, we by and large used the strategy of learning from mistakes to stay ahead in the race.
We invented fire, oopsie, screwed up a bunch of times, and then we invented the fire extinguisher.
We invented cars, oopsie. And invented the seatbelt.
But with more powerful technology like nuclear weapons, synthetic biology, super intelligence,
we don't want to learn from mistakes.
That's a terrible strategy.
We instead want to have a safety engineering mentality where we plan ahead and get things right the first time,
because that might be the only time we have.
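Tegmark's self-driving-car story earlier in that clip can be read as a cost-function problem: the rider's literal request omits the terms a human driver fills in automatically. The following toy sketch, with made-up routes, numbers, and weights, shows how leaving those implicit terms out flips which route a planner picks.

```python
# Toy sketch of the airport example as a misspecified cost function.
# Routes, numbers, and weights are invented; real planners are far more complex.

routes = [
    {"name": "legal route",    "minutes": 35, "discomfort": 1, "laws_broken": 0},
    {"name": "reckless route", "minutes": 22, "discomfort": 9, "laws_broken": 14},
]

def literal_cost(route):
    # "Take me to the airport as fast as possible" -- taken literally.
    return route["minutes"]

def implicit_cost(route, w_comfort=3.0, w_law=10.0):
    # The unstated goals a human driver fills in automatically.
    return route["minutes"] + w_comfort * route["discomfort"] + w_law * route["laws_broken"]

print(min(routes, key=literal_cost)["name"])   # reckless route
print(min(routes, key=implicit_cost)["name"])  # legal route
```

The hard part, as Tegmark emphasizes, is that the weights and the missing terms are exactly the things we don't yet know how to write down exhaustively.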
It's helpful to note the optimism that Tegmark plants
in between the flashing warning signs.
Artificial intelligence holds incredible potential
to bring about inarguably positive changes for humanity,
like prolonging lives, eliminating diseases,
avoiding all automobile accidents, increasing logistic efficiency in order to deliver food or medical supplies,
cleaning the climate, increasing crop yields, expanding our cognitive abilities to learn
languages or improve our memory. The list goes on. Imagine being able to simulate the outcome
of a policy decision with a high degree of confidence in order to morally assess it consequentially before it is actualized.
Now, some of those pipe dreams may run contrary to the laws of physics, but the possible positive outcomes are so tempting and morally compelling that the urgency to think through the dangers is even more pressing than it first seems.
Tegmark's book on the subject where much of that came from is fantastic. It's called Life 3.0.
Just a reminder that a reading, watching, and listening list will be provided at the end of
this compilation, which will have all the relevant texts and links from the guests featured here.
Somewhere in the middle of the chronology of these conversations,
Sam delivered a TED Talk that focused on and tried to draw attention to the value alignment problem.
Much of his thinking about this entire topic was heavily influenced by the philosopher Nick Bostrom's book, Superintelligence. Sam had Nick on the podcast, though their conversation delved
into slightly different areas of existential risk and ethics, which belong in other compilations. But while we're on the topic of the safety and
promise of AI, we'll borrow some of Bostrom's helpful frameworks.
Bostrom draws up a taxonomy of four paths of development for an AI, each with its own safety and control conundrums.
He calls these different paths oracles, genies, sovereigns, and tools.
An artificially intelligent oracle would be a sort of question and answer machine which we would simply seek advice from. It wouldn't have the power to execute or implement its solutions
directly. That would be our job.
Think of a super intelligent wise sage sitting on a mountaintop answering our questions about how to solve climate change or cure a disease. An AI genie and an AI sovereign would both take on a wish or desired outcome which we impart to them and pursue it with some autonomy and power to
achieve it out in the
world. Perhaps it would work in concert with nanorobots or some other networked physical
entities to do its work. The genie would be given specific wishes to fulfill, while the sovereign
might be given broad, open-ended, long-range mandates like increase flourishing or reduce hunger.
And lastly, the tool AI would simply do exactly what we command it to do and only assist us to achieve things we already knew how to accomplish.
The tool would forever remain under our control
while completing our tasks and easing our burden of work.
There are debates and concerns about the feasibility of each of these entities, and ethical concerns about the potential consciousness and immoral exploitation of any of these inventions,
but we'll table those notions just for a bit.
This next section digs in deeper on the ideas of a genie or a sovereign AI,
which is given the ability to execute our wishes and commands autonomously.
Can we be assured that the genie
or sovereign will understand us, and that its values will align in crucial ways with ours?
In this clip, Stuart Russell, a professor of computer science at Cal Berkeley,
gets us further into the value alignment problem and tries to imagine all the possible ways that
having a genie or sovereign in front of us might go terribly wrong.
And, of course, what we might be able to do to make it go phenomenally right.
Sam considers this issue of value alignment central to making any sense of AI.
So this is Stuart Russell from episode 53, The Dawn of Artificial Intelligence.
Let's talk about that issue of what Bostrom called the control problem. I guess we could call it the safety problem. Just perhaps you can briefly sketch the concern here. What is
the concern about general AI getting away from us? How do you articulate that?
So you mentioned earlier that this is a concern that's been articulated by non-computer scientists.
And Bostrom's book, Superintelligence, was certainly instrumental in bringing it to the
attention of a wide audience, people like Bill Gates and Elon Musk and so on. But the fact is that these concerns have been articulated by
the central figures in computer science and AI. So I'm actually going to...
Going back to I. J. Good and von Neumann.
Well, and Alan Turing himself.
Right.
So a lot of people may not know about this, but I'm just going to read a little quote from Turing: "If a machine can think, it might think more intelligently than we do. And then where should we be? Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments, we should as a species feel greatly humbled. This new danger is certainly something which can give us anxiety." So that's a pretty clear warning that, you know, if we achieve super intelligent AI, we could have a serious problem.
Another person who talked about this issue was Norbert Wiener.
So Norbert Wiener was one of the leading applied mathematicians of the 20th century.
He was the founder of a good deal of modern control theory and automation.
He's often called the father of cybernetics.
So he was concerned because he saw Arthur Samuel's checker playing program in 1959,
learning to play checkers by itself, a little bit like the DQN that I described learning to play video games,
but this is 1959, so more than 50 years ago, learning to play checkers better than its creator.
And he saw clearly in this the seeds of the possibility of systems that could out-distance
human beings in general. And he was more specific about what the problem is. So
Turing's warning is, in some sense, the same concern that gorillas might've had about humans. If they had thought, you know, a few million years ago, when the human species branched off from the evolutionary line of the gorillas, if the gorillas had said to themselves, you know,
should we create these human beings, right? They're going to be much smarter than us.
You know, it kind of makes me worried, right? And they would have been right to worry because as a species, they sort of completely lost control over their own future and humans control everything that they care about.
So Turing is really talking about this general sense of unease about making something smarter than you.
Is that a good idea?
And what Wiener said was this.
"If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively, we had better be quite sure that the purpose put into the machine is the purpose which we really desire."
So this is 1960.
Nowadays, we call this the value alignment problem. How do we make sure that the values that the machine is trying to optimize are, in fact, the values of the human who is trying to get the machine to do something or the values of the human race in general?
And so Wiener actually points to the Sorcerer's Apprentice story
as a typical example of when you give a goal to a machine,
in this case fetch water,
if you don't specify it correctly,
if you don't cross every T and dot every I and make sure you've covered everything, then machines being optimizers, they will find ways to do things that you don't expect.
And those ways may make you very unhappy. This problem goes all the way back to King Midas, you know, 500-and-whatever BC, where he got exactly what he said, which is
everything turns to gold, which is definitely not what he wanted. He didn't want his food and water
to turn to gold or his relatives to turn to gold, but he got what he said he wanted. And all of the
stories with the genies, the same thing, right? You give a wish to a genie, the genie carries out your wish very literally.
And then, you know, the third wish is always, you know, can you undo the first two because I got them wrong.
And the problem with super intelligent AI is that you might not be able to have that third wish.
Or even a second wish.
Yeah.
So if you get it wrong, you might wish for something very benign-sounding, like, you know, could you cure cancer?
But suppose you haven't told the machine that you want cancer cured but you also want human beings to be alive.
Then a simple way to cure cancer in humans is not to have any humans.
A quick way to come up with a cure for cancer is to use the entire human race as guinea pigs for millions
of different drugs that might cure cancer. So there's all kinds of ways things can go wrong.
And, you know, governments all over the world try to write tax laws that don't have these kinds of loopholes, and they fail over and over and over again.
And they're only competing against ordinary humans, you know, tax lawyers and rich people.
And yet they still fail despite there being billions of dollars at stake.
So our track record of being able to specify objectives and constraints completely, so that we are sure to be happy with the results, is abysmal.
And unfortunately, we don't really have a scientific discipline for how to do this.
So generally, we have all these scientific disciplines, AI, control theory, economics, operations research, that are about how you optimize an objective.
But none of them are about, well, what should the objective be so that we're happy with the results? So the modern understanding, as described in Bostrom's book and other papers, of why a super intelligent machine could be problematic is that if we give it an objective which is different from what we really want, then we're basically creating a chess match with the machine: there's us with our objective, and the machine with the objective we gave it, which is different from what we really want.
So it's kind of like having a chess match
for the whole world.
And we're not too good at beating machines at chess.
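Before moving on, it may help to make Russell's point about misspecified objectives concrete. The toy sketch below is an editorial illustration, not anything from the episode: it scores a handful of invented "cure cancer" policies only by how many cancer cases remain, with no term for keeping people alive, and a trivial search then selects the degenerate option. Every policy name and number here is hypothetical.

```python
# Toy illustration of objective misspecification: the optimizer is scored only
# on the proxy objective ("minimize remaining cancer cases") and is never told
# about the value we forgot to write down ("keep the patients alive").

# Hypothetical outcomes of candidate policies: (cancer cases remaining, people alive).
CANDIDATE_POLICIES = {
    "fund drug trials":       (400, 1_000_000),
    "mass early screening":   (250, 1_000_000),
    "eliminate all patients": (0,   0),          # degenerate, but "optimal" under the proxy
}

def proxy_objective(outcome):
    cases, _alive = outcome
    return -cases  # higher is better: only cancer cases count

def true_objective(outcome):
    cases, alive = outcome
    return -cases + 0.01 * alive  # what we actually care about includes people being alive

best_by_proxy = max(CANDIDATE_POLICIES, key=lambda p: proxy_objective(CANDIDATE_POLICIES[p]))
best_by_truth = max(CANDIDATE_POLICIES, key=lambda p: true_objective(CANDIDATE_POLICIES[p]))

print("optimizer picks:", best_by_proxy)   # -> "eliminate all patients"
print("we wanted:      ", best_by_truth)   # -> "mass early screening"
```

The gap between proxy_objective and true_objective is the value alignment problem in miniature: the optimizer does exactly what it was told, not what was meant.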
Throughout these clips,
we've spoken about AI development in the abstract
as a sort of technical achievement
that you can imagine happening
in a generic lab somewhere.
But this next clip is going to take an important step
and put this thought experiment into the real world.
If this lab does create something that crosses the AGI threshold,
the lab will exist in a country.
And that country will have alliances, enemies, paranoias, prejudices, histories, corruptions, and financial incentives like any country.
How might this play out?
If you'd like to continue listening to this conversation, you'll need to subscribe at SamHarris.org.
Once you do, you'll get access to all full-length episodes of the Making Sense podcast, along with other subscriber-only content, including bonus episodes, AMAs, and the conversations I've been having on the Waking Up app. The Making Sense podcast is ad-free and relies entirely on listener support, and you can subscribe now at SamHarris.org.