Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 94 | Stuart Russell on Making Artificial Intelligence Compatible with Humans

Starting point is 00:00:00 This June, the world comes to Los Angeles. Kick off FIFA World Cup 2026 at the FIFA Fan Festival at the iconic Los Angeles Memorial Coliseum. Watch matches live on Giant Scream. Feel every goal with thousands of fans. And celebrate with music, culture, and flavors from around the world. Join us June 11th through 14th opening weekend as the tournament kicks off in Los Angeles. Tickets are just $10 and kids under 12 were free. Get yours now at Los Angeles, FWC26.com.

Starting point is 00:00:30 Hello, everyone, and welcome to the Mindscape podcast. I'm your host, Sean Carroll. I hope everyone is more or less staying home and staying safe in these surreal times that we're in with the COVID-19 quarantine, lockdown, etc. It's very challenging out there for everyone. As I've said, frequently, probably less so for me than most people, since I basically work from home and I have a pretty comfortable situation here. But it's different. You know, you don't get to see your friends. Don't get to socialize. don't get to eat out, don't get to go to class in the same way. One thing that struck me is how really different it would have been if something like this happened before we had computers and the Internet, right? Of course, plagues and quarantines and pandemics did happen, but just the idea that we couldn't talk to each other over Zoom or send emails or get updates, I think it would be much worse. It would certainly be very, very different.

Starting point is 00:01:26 So it's appropriate that we once again turn to the world of computers, in this week's episode of Mindscape, and in particular the challenge of artificial intelligence. We all know that AI is becoming more and more important, not just as some hypothetical future thing that will talk to us and be our friend, but right here, right now, programs that figure out puzzles in a way

Starting point is 00:01:49 that you could think of as being intelligent. Now, there's a looming problem here that everybody knows about, which is as the AIs become better and better, as they become smarter and smarter, more and more powerful, it's going to become increasingly important that they act in ways that are what we want them to do, right? They're intelligent.

Starting point is 00:02:10 They can figure out things to do for themselves. It would be bad if what they wanted to do in some sense, and we can figure out the extent to which the word want is appropriate in the case of a computer, but what their goals are are the same as what we want them to be. And this can be very difficult because as anyone who is, ever programmed a computer knows, it is very difficult to say what you want. It is very difficult to put into precise algorithmic terms what it is you're looking for. That's why computer programming is hard. And so it's very easy to imagine asking a computer, asking an AI to do a certain

Starting point is 00:02:46 thing, and then realizing at the end of the day, we didn't really want that. We wanted something that was that, but there were some constraints that we forgot to mention. So today's guest is Stuart Russell. He's a very big name in artificial intelligence. Stuart and Peter Nordvig wrote what is the classic textbook on AI called Artificial Intelligence, A Modern Approach. Even I have this book on my bookshelf, and I'm not really an AI person, so that shows you the pervasiveness of that particular text. And recently, Stewart has been thinking about exactly this problem, you know, making sure that AIs sort of do the things that we wanted them to do when we designed them. So he has a new book out called Human Compatible, Artificial Intelligence and

Starting point is 00:03:28 the Problem of Control, and to really oversimplify it, his proposal is that rather than telling the AIs what we want, we will teach the AIs to learn from us what we want. Maybe we don't even know what we want, but maybe the AIs can watch our actual behavior, and from that work backwards, to how they should be behaving to do what it is we want. Now, this opens a whole bunch of other questions, right? So that's what we talk about on the podcast, you know, how realistic this is, how much we should be worried about super intelligent AI and what it means for the future more generally. So anyway, I hope that, you know, the Mindscape podcast is contributing a little bit to taking our minds off of the situation we're in with the pandemic. You know, I don't want to,

Starting point is 00:04:12 one, one alternative would be, you know, to switch all efforts into thinking about pandemics and epidemiology and things like that. But I figured there's a, plenty of people with more background knowledge than I am doing those things. So I'm going to try to keep Minescape more or less the same kind of thing it always has been. I think it's still important to keep thinking about other questions. You know, pandemics and virology are part of that. Medicine and networks and how people communicate. It's all part of that. Those are interesting things to talk about, and I'm sure we will. But we'll also be talking about physics and biology and philosophy and culture and the media and so forth. Remember that I'm also

Starting point is 00:04:52 doing a set of videos called the biggest ideas in the universe that are appearing on YouTube once a week. Once a week I do a big idea. I was really hoping to, you know, make these around 20 minutes per idea. They've crept up to more like an hour with me talking and a scribbling little pictures and very primitive equations, but I hope people are finding them fun. So far has been, you know, more or less classic ideas like space and time and things like that, but soon enough we'll be getting into some pretty wild ideas, which I I think are, you know, not, I'm not going to be speculating too much when I choose the ideas themselves. We're not going to be doing like string theory or loop quantum gravity or whatever,

Starting point is 00:05:31 but some of the ideas that are part and parcel of modern physics are pretty wild, even the established ones. So we'll be getting into those, and I hope you enjoy those also. And with that, let's go. Stuart Russell, welcome to the Mindscape podcast. It's nice to be here. We're going to talk about artificial intelligence, the warnings, the worries that we have about artificial intelligence and also the prospects of it. But I think that it would be a wonderful opportunity here, since you not only are someone who has specific ideas about fixing any dangers of artificial intelligence, but you're also someone who has been doing research in this field, literally wrote the textbook, etc., to talk about some of the background here. So I thought I would just start by asking you, in your mind, what does the phrase, artificial intelligence? intelligence actually mean? How do you define that term? So obviously artificial intelligence is about making machines intelligent and what that has turned into actually from right at the beginning of the

Starting point is 00:06:47 field is an approach to building systems that are given objectives by humans and then find ways to carry out those objectives. So the identities might be goals or can you find a plan? And that achieves this goal. It might be a reward function. Can you behave in a way that that generates the largest amount of reward? For example, can you make money on the stock market? So this has been the way that the general idea of intelligence has been instantiated to build systems that act in ways that can be expected to achieve the objectives that we set for them. And is there some way of sort of definitively saying when a certain computer program is AI versus just a regular computer program? I mean, in some sense, isn't a pocket calculator optimized to add numbers together correctly?

Starting point is 00:07:43 Yes, and I think that's part of the difficulty. There really is a continuum between something as simple as a thermostat that switches the heat on when it gets cold and switches it off when it gets warm, all the way up to humans and even beyond. And the continuum is mainly in the nature of the task environment, how complicated is the world that the entity has to deal with. And then what is the actual objective? And some objectives can be achieved, even in a complicated world can be achieved with very, very simple rules. And some, for example, winning at a game of go or running a country or, you know, teaching someone to speak French. these are very, very complicated goals, and they take place in a very, very complicated task environment,

Starting point is 00:08:36 which is an environment that includes human beings. And that's probably the most complicated environment we know of is the one where there are humans that are part of it. But you would say, I mean, I like the point that there's a continuum here. It's not like that there is some sharp phase transition between dumb computer programs and artificially intelligent ones. That's right. In fact, there's a nostrum that's been put about a long time, that as soon as it works, it stops being artificial intelligence. I know.

Starting point is 00:09:05 Which is a little bit unfair, because of course that means that AI is the field of continual failure. But that I think that's in some ways accurate, that a lot of things that AI has produced over the years. For example, every time you drive in your car, and you get directions, there's an AI algorithm running in your cell phone or maybe in the cloud that is computing shortest paths and incorporating expected delays and so on. This is a very classical AI algorithm that was developed in the late 60s and early 70s. And no one thinks of that as AI anymore, but of course, you know, someone had to invent it.

Starting point is 00:09:51 And before it was invented, no one knew how to do it. and, you know, we now, I think, take speech recognition for granted, but that's another part of AI that's been gradually improving for the last 50 years or so. But now that it works, and it's just one of the things that runs on your cell phone, you don't think of it as AI anymore. And that's a pity because it also means that there's a tendency, I think, for students to not want to study things that we already know how to do, which means, of course, that the students don't learn how to do anything.

Starting point is 00:10:32 That works, right. But, I mean, even you mentioned playing go as one of the more complicated tasks, but I would sort of put playing go more on the finding a route to your destination side of the ledger and driving a car or writing a poem on another side. if we think about the fact that the rules of Go are extremely explicitly defined, and even though the state space is huge, the space of possible things that can be going on, it's still finite, right? I mean, there is a right answer to every question, whereas these more human-scale things are

Starting point is 00:11:05 a little bit fuzzier. Is that a relevant distinction, you think? It is, yes. I think there's multiple distinctions being made here. One is that, as you say, the rules of Go are known. So the objective is to win, and that has a precise definition that varies slightly from country to country, but pretty much agreed on. And you know that when you put a piece on a square, the piece ends up on that square. It doesn't sort of run around by itself and go somewhere else.

Starting point is 00:11:37 And of course, when you're driving, you don't even know the rules. You don't know what's going to happen when some other entity is involved. making decisions about where they're going to go, which will affect your own ability to get what you want done, done. And the other part of driving that's difficult is that the objective is also not known. So obviously, you would like, you know, if you say take me the airport, you would like to get to the airport. But you don't mean take me to the airport at all costs.

Starting point is 00:12:14 you know so if there's an earthquake and the bay bridge falls down again then that doesn't mean you know drive me 200 miles around and get me to the airport an hour after my flight leaves it doesn't mean you know plunge into the into the bay and try to swim across the bay um when you give a human an objective like take me to the airport the human understands under what circumstances is it reasonable to give up on that. Plus, there are many different ways of getting to the airport. Is it that you take the shortest route?

Starting point is 00:12:50 Is it that you take the fastest route? And then how fast do you drive? How safely do you drive? Do you go right up to the edge of catastrophe at all times in the hope of getting to the airport a little bit quicker? And, you know, I've taken taxis where that was how they drove. And you've taken other taxis where you just want to say, look, can you just get on with it?

Starting point is 00:13:13 because they're extremely cautious. So there are many ways of driving, and there isn't an actually agreed-on objective for what counts as good driving and how that depends on the circumstance and the urgency and so on. So driving is different from Go in both of these dimensions, and it's an unsolved problem right now,

Starting point is 00:13:40 whereas I think we would say that Go, to the extent that it can be solved, and there are mathematical reasons to believe that we probably can't solve it exactly, but to the extent that we can come up with good algorithms, we consider the game go to be one where AIs had success. We've beaten the human world champions and actually taught humans things about the game that they didn't know before. So you mentioned an idea that I think is worth sort of lingering on and drawing out because it's going to be important for the rest of the conversation, the idea of optimizing for some goal, right? I mean, probably if you told a person on the street that that's what an AI program was trying to do, they might not instantly see the connection between ordinary human intelligence. I mean, do you think that we can think of all intelligence as optimizing towards some goal or maximizing some utility function or something like that? It's a good question, and I would say that, It's taken a few thousand years to convince at least one branch of the intellectual community that that's a good way of thinking. And the branch that's most convinced about that would be the economists.

Starting point is 00:14:59 Because they've reached by the 1940s, they reached a formulation of rational decision-making that is quite general. So it allows for the decision maker to have uncertainty about the state of the world, about the future. It does, at least in the original formulation, assume that the decision maker has preferences. And this is the way economists think about it, that if you think about each possible future that could unfold, in principle I could express a preference as to which future I love. like better than which other future. And that's the basis for then the economic notion of rationality as utility maximization. It doesn't mean that you have to have a number in your head, which is the utility of each future, and then you have to add them all up and divide by the

Starting point is 00:16:01 probabilities and so on. It simply means that you act as if you did, and you can be described, as long as you're not making mistakes, then your behavior can be described as if that's what you were doing. It doesn't mean that you are actually doing that. So, for example, if I try to poke myself in the eye with my finger, my eyelid closes. Now, my brain is not doing an expected utility calculation to reason that I ought to close my eyes

Starting point is 00:16:32 because having a fingernail poking my eyeball is painful and dangerous, and that will be a low utility, and so if I close my eye and then I'll protect it. No such thing. It's just an instantaneous reaction to a looming object in the visual field, but it's still a rational thing to do. In other words, if you were to do the expected utility calculation, it would tell you to close your eyes, but you don't have to do that calculation in order to close your eyes. I'm visualizing thousands of podcast listeners trying to poke themselves in the eyes as we speak

Starting point is 00:17:04 right here. Hopefully. Yeah, so Max Tagmark said it didn't work. He tried to poke himself in the eye and he succeeded. Yeah, well, someone who used to wear contacts, I think I have no trouble for doing that. I love recommending The Great Courses Plus. This is a streaming service that gives you access to a huge variety of courses from the great courses, taught by some of the best professors in the world on a wide variety of topics.

Starting point is 00:17:28 You can learn about everything from playing guitar to practicing yoga, to history, to science, to literature. I, of course, am one of the professors at the Great Courses. I recently did my little video on the nature of time for the biggest ideas in the universe. I did a whole course for the Great Courses on Mysteries of Time. So if the hour-long version was not enough, if you want the 12-hour-long version, go to the Great Courses Plus, look for mysteries of modern physics time, learn about entropy and the arrow of time, but also about timekeeping, about time in particle physics, about the psychology and neuroscience,

Starting point is 00:18:04 philosophy of time. The Great Courses Plus is giving Mindscape listeners a special offer, a free trial with unlimited access to the entire library. Find all the details at the greatcoursesplus.com slash mindscape. That's T-H-E great courses, plus, dot com slash mindscape. Start learning today. I do, and this is probably leaping ahead, and that's okay, we don't need to go in logical order, but you quote a theorem in your book. You wrote a wonderful book called Human Competable. Actually, I'm not just saying that as boilerplate. I really enjoyed the book, and I encourage everyone to take a look at it.

Starting point is 00:18:43 So it's a theorem by von Neumann and Morganstern, which, if I understand it correctly, I've heard of it before and then was reintroduced to it in your book. Roughly, it's that conclusion you just said, that if you're behaving rationally, then there's always a way to cast the decisions you're making in terms of maximizing some utility function. Is that right? Yeah. So it starts from a description of what we mean by behaving rationally that's pretty hard to argue with. And that's how these kinds of proofs usually go. You propose some axioms that sound completely reasonable. And then you draw what appears to be a very strong conclusion. So the axioms say things like, you know, if you prefer sausage pizza

Starting point is 00:19:27 to plain pizza and you prefer plain pizza to pineapple pizza, then you prefer sausage pizza to pineapple pizza. Yeah. Right? This is called transitivity of preferences. And what that basically says is your preferences don't go round in a circle. And that seems perfectly reasonable because if your preferences did go round in a circle, then basically you would pay me to go in the other direction. So if you liked sausage better than plain and you had a plain pizza, you'd pay me to get the sausage pizza.

Starting point is 00:20:01 And then if you liked pineapple better than sausage, you would pay me to get the pineapple. And then if you liked pineapple better than plain, so your preferences are now in a circle, you'd pay me to get the plane. So now you paid me three times and you're back where you started. So that doesn't seem very rational. So the axioms of rational behavior have this flavor, but they're really hard to argue with. you'd have to agree that an entity that didn't follow those axioms would basically be shooting itself in the foot all the time. And then the consequence of these axioms is that the behavior of a rational agent can be described as if it's maximizing a utility function. Good, yeah.

Starting point is 00:20:43 And so that was the theorem of von Neumann and the main theorem in that book. But I think you've also already hinted at a little bit of attention there, right? because there's lots of mathematical theorems that say something is possible in principle. But as you already mentioned, with the poking the eye example, for actual systems that do something we call reasoning, they might not implement that design in order to get things done. But is it safe to say that most AI algorithms do try to implement something like that, that they do actually sort of optimize over a utility function

Starting point is 00:21:20 over a whole bunch of hypothetical actions? Yes and no. So the biggest obstacle from the point of view of AI is that actually optimizing in many situations is what we call computationally intractable, which roughly means that if you really want your algorithm to come up with the optimal solution, you're going to have to run it for, you know, until the universe ends. So billions and billions of years of computation. and then if you make the problem twice as big, it's going to be a billion times more than that. So these kinds of problems, which we call computationally attractable, we cannot expect that any entity is going to be able to behave rationally

Starting point is 00:22:10 if it has to solve those problems in order to get what it wants. So humans don't behave rationally, and we don't expect that machines will behave rationally. The interesting question is how much closer to being rational can machines get than humans, particularly taking advantage of their enormous computational power? So already we have computers whose actual computational throughput is larger than even the theoretical

Starting point is 00:22:45 maximum throughput of the human brain. and no one thinks that in fact the human brain really is able to use every single neuron computing at maximum speed all time. So the interesting question is, are machines going to exceed the human capability for decision making in the real world? And when will that happen and by how much will it happen? but interestingly, you know, humans have developed over both evolutionary time and during their lifetimes all kinds of incredible schemes for overcoming the computational difficulty of the problem of deciding what to do. And so even though we're not particularly good at playing Go,

Starting point is 00:23:41 and in fact a lot of games are specifically designed to not be easy for our brains. Our brains are actually extremely good at organizing our behavior into hierarchies. So, you know, if you're writing a book, let's say, at any given moment, your fingers are typing one character of one word, of one sentence, of one paragraph, of a chapter of that book. And this is a natural hierarchy. Just typing one character on the keyboard might involve hundreds of motor control commands

Starting point is 00:24:21 from your brain to your arm and your hands and your finger, as well as your eye to keep track of what's happening on the screen. If you think about what AlphaGo is doing, which is the Go program that beat the human world champion, it's able to look ahead in time maybe 50 moves into the future, which sounds pretty amazing. But if you apply that to a robot or a human body,

Starting point is 00:24:48 where those 50 moves would be 50 motor control commands to my muscles, that would only take you about a tenth of a second into the future. So it would be completely useless for real decision-making in the real world. And humans, of course, we think, usually quite a bit further than a tenth of a second into the future, sometimes even years. We plan to do a PhD, which is five or six years. In other words, a trillion motor control commands are involved in doing a PhD. So we're planning at many, many levels of abstraction, and we do it kind of seamlessly.

Starting point is 00:25:25 Somehow, miraculously, our brain always has something lined up for the present moment so that if you're speaking, your tongue and your lips are saying the next word, if you're typing, the next word is coming out of your fingers, if you're going for a walk, the right things are happening for your legs to move in alternation. And yet the same thing is happening at the scale of multiple seconds and minutes and hours and days and weeks. And it all feels seamless. We're not aware of sort of context switching. Oh, now I need to think. think about what's happening in the next millisecond. Oh, now I'm going to think about what's happening an hour and a half from now. It's all very smoothly integrated. And we really don't know

Starting point is 00:26:14 how to get machines to do this. But this is the main technique that humans use to manage the complexity of the real world. Yeah, that's a really great summary of a whole bunch of podcast interviews that I've done, you know, with people who do neuroscience and cognitive science and consciousness studies. And absolutely the message comes through that human beings are these complicated networks of subsystems, each using little heuristics to get through the day, rather than some perfect reasoning machine that is trying to play a very abstract version of Go. So, but at the end there, you said, and we're not very good at converting that into an AI algorithm. Is this something that people are trying to do? Is this a huge difference in the

Starting point is 00:26:59 approach of conventional AI and where we eventually want to be? So it's something that we've known about. In fact, Herbert Simon, who was both a psychologist and an economist who won the Nobel Prize and an AI researcher, one of the early AI researchers who really set some of the directions for the field, wrote a very famous paper called the Architecture of Complexity, where he specifically talks about this. this device of hierarchical layers of abstraction that help us to organize our behavior in time. And early planning systems, going back to the mid-70s, attempted to implement some kind of hierarchical

Starting point is 00:27:46 planning capability. So there is actually a fairly well-understood ability if I give the hierarchy to the system. In other words, if I tell it, okay, these are the primitive actions, and then these are the actions that use primitive actions to be implemented. So this is motor control command for your fingers. This is typing a character, this is typing a word, this is typing a sentence. We have algorithms that can construct long-term plans, and people have used this to, for example, organize the manufacturing process in a factory, you know, for a month.

Starting point is 00:28:21 So that month-long plan actually bottoms out at something in the order of 600 million manufacturing operations that are going to take place over that month. The difficulty is that those hierarchies, those abstract actions, have to be provided by the human. So we are doing a lot of the work. And one of the big missing pieces in AI is how does a system develop that hierarchy and invent those abstract actions for itself. I mean, there was a time when getting a PhD didn't exist. There was just no such thing. Emailing someone, Googling someone, even a few years ago,

Starting point is 00:29:06 those didn't exist. We have a library of these things at all kinds of levels of abstraction from multi-year things like getting a PhD or emigrating to the United States down to much shorter term things like Googling something or emailing someone or texting. And a lot of these things actually we don't invent individually. They're gradually formed somehow by cultural processes and we inherit them. So in some sense then the intelligence isn't really in the individual human. It's something that we've inherited from our culture. And it's incredibly powerful.

Starting point is 00:29:49 without that sort of vocabulary of, without that library of abstract actions, life would be intolerably complicated. And we simply wouldn't be able to have the civilization that we have because it would be too difficult to navigate it and to run it if we had to think of everything in terms of just the primitive operations. Yeah, I mean, you give an example in the book about, I forget exactly the contest, but there was a context, but there was a question, you know, a bunch of fourth graders are going to have a roller skating contest. What is the best kind of surface that they should choose to move on, you know, gravel, sand, blacktop or whatever? And you talk about how much background knowledge is necessary to even think about such a question. Like, what does it mean

Starting point is 00:30:42 to have a roller skating contest? What is the goal that the fourth graders have and so forth? And you seem to say in the book that right now our computers are really bad at that background knowledge. And I had a conversation with Melanie Mitchell on the podcast where she also argues about the lack of common sense in our AI algorithms. So, you know, again, is this something where everyone knows this in AI and we're working hard and we think we can crack it? Or is there a school of thought that says, well, it'll just come eventually if we keep working on what we're working on now? So that's a great question. I mean, we've known it for a long time.

Starting point is 00:31:19 The papers going back to the 60s saying that absolutely, particularly in the case of understanding language, that we need this background of common sense knowledge in order to make the right decisions. And for a long time, there's a field called knowledge representation where people tried to actually capture the knowledge in mathematical form, typically in, typically in, you know, mathematical logic and computer algorithms are able to process logic in order to do reasoning and we can even learn these logical representations from examples. People found it quite difficult to get it right. So for example, writing a description in mathematical logic of the

Starting point is 00:32:08 process of making omelets turns out to be pretty good. difficult to get right, that you can capture some parts of it with precise logical statements, but other parts of it are quite resistant. It's hard to say, you know, what you mean by, you know, not runny, but not too overcooked and things like that. So, I'm I would say that effort is kind of on hold right now. There were valiant attempts to build up large-scale knowledge bases with lots of common sense, but those have not turned out to be all that useful in practice. So right now, the AI community is working largely on approaches based on deep learning.

Starting point is 00:33:14 And in deep learning systems, there's no attempt to actually construct representations of knowledge. There's no attempt to put in common sense at all. We simply take data and we feed in the data and use the data to train very large tunable circuits to perform various tasks. So you can put in image data and train these circuits to recognize the objects that appear in the image. but, you know, having trained an image recognition system on millions of photographs of animals, that's all the system can do. So it's now an animal recogniser, but it doesn't in some sense know anything about animals. So you can't ask questions like, you know, well, which of these animals could pick up a suit?

Starting point is 00:34:14 case. Right? That's totally outside the scope of what that large train circuit is able to do. And at the moment, I don't think there are any of these machine learning systems that are capable of answering those kinds of common sense questions. You can make it appear as if they do by, and this is another, I would say, failing of the current trends in AI, you can just put in millions and millions of question-answer pairs like that. And then you hope that if I give it a new question, that there's something similar enough in the dataset that it can cobble together some answer that is right most of the time.

Starting point is 00:35:02 And there's a Japanese researcher, Nariko, Rai, who has been able to train a machine learning system to pass the entrance exam for the University of Tokyo, which is a very difficult and selective exam. That's pretty good, yeah. Most students in Japan obviously fail to do because the University of Tokyo is the premier university. But she freely admits that the system has absolutely no idea about anything.

Starting point is 00:35:30 It doesn't understand any of the questions. It has no knowledge to speak of. it's just been trained on enough examples and enough background text data that it's able to spot enough distal correlations between what the words are in the question and constituents in the question to what the answer should be. But there's no reasoning process going on at all. So my guess is, and I think people like Demis Sassabas, who's the CEO of DeepMind and one of the authors of the AlphaGo system.

Starting point is 00:36:05 So Demis has said that deep learning is going to hit a wall, basically, and we're going to have to go back and reintegrate all of the things that classical AI worked on, the symbolic techniques, the logic, reasoning, knowledge representation, and so on. And the question is, how does that integration work? How does deep learning get integrated with these kinds of symbolic techniques? And whoever figures out the answer to that question is going to be able to make the next set of breakthroughs in AI. That's a challenge for our listeners. Absolutely. Let me pause for a second to talk about the Blinkist app.

Starting point is 00:36:46 This is a wonderful little app you can put on your phone, your other mobile device, and what it is is basically a way to figure out what's going on in some of the best and most popular books ever written. A 15-minute read will tell you the major points in what could be a really long and intimidating read. I mean, we all love reading books, but there's two major problems. Number one, finding enough time to do the reading you want, and number two, deciding which books you want to read. Let's say all of your friends have been saying you really should read Zen and the Art of Motorcycle Maintenance. But just from the title, it doesn't seem necessarily up your alley. Well, you could go to Blinkist, you could get the lowdown on what's going on in that book, and from there you could decide whether to dig in further. With Blinkist, you get unlimited access to read or listen to a massive library of condensed nonfiction books, all the books you want, and all for one low price.

Starting point is 00:37:36 And right now, for a limited time, Blinkist has a special offer for Minescape listeners. Go to Blinkist.com slash Mindscape to try it free for seven days and save 25% off your new subscription. That's Blinkist, B-L-I-N-K-I-S-T, Blinkist.com slash Mindscape, start your seven-day free trial, get 25% off. This is great because we have a lot of the prospects and also worries about artificial intelligence and how similar it is to human intelligence on the table. So therefore, let's just skip ahead to the question of should we be worried or concerned or optimistic about the prospect of artificial superintelligence.

Starting point is 00:38:17 You know, given that artificial intelligence is not very good at understanding the questions on the Tokyo Entrics exam, should we be aware? worried about AI becoming? Number one, is there a prospect that will become much smarter than human beings? And number two, is that a prospect we should be worried about? So these are two really important questions. It's really hard to think of more important questions that we could be facing. So on the first question, are we going to be able to create super-intelligent AI systems. I would say that we will be able to solve these obstacles.

Starting point is 00:39:05 I think there are half a dozen major obstacles, major breakthroughs that have to happen between here and human-level AI. And I already mentioned one of them, which is the ability of systems to form their decision-making hierarchies. which allow them to cope with complexity. And so if you think of that, that's sort of one breakthrough. And as far as we can tell, there's about half a dozen of that scale that need to happen. And it might be that once we have all those breakthroughs,

Starting point is 00:39:41 the system still don't work because there's something that we didn't think of. But I think at that point we'd be pretty close. And we also have to remember, it doesn't have to be more capable, than humans along every dimension in order to be something that we have to worry about. So just to give you an example, chimpanzees, it turns out, have considerably better short-term memory than human beings, even for something that's very human like a string of digits. So chimpanzees, I think, can remember 20-digit telephone numbers without too much difficulty, and humans can't.

Starting point is 00:40:20 but I think chimpanzees still have good reason to worry about human intellectual capabilities because we are basically in the process of destroying them with the capabilities that we have. And in fact, for many species on Earth, the existential risk that they face is entirely a consequence of another species on Earth, namely us, that is more intelligent than they are. So of course, it's reasonable to ask if we create entities that are more intelligent than us, is that a good idea, right? In particular, are we going to be able to retain control, retain power over entities that are more powerful than we are? It's hard to find good examples in nature of a less powerful species having power over a more powerful species. But that's what we're being asked to, that's the problem we're being asked to

Starting point is 00:41:26 solve. So I think the public discussion of this topic is confusing a lot of different things. So one is, is super intelligent AI imminent? In other words, do I need to be worried right now that something bad is going to happen right now? And the answer is no. AI systems, at least not in the existential sense of a risk to humanity. No, there is no risk to humanity. But bad things are already happening with AI systems, even ones that are really, really, really stupid because they run on billions of computers,

Starting point is 00:42:10 particularly the algorithms that run on social media that select what you see, what you read, what you watch, what you listen to, for hours every day, for billions of people, those algorithms, I think, are already having a serious negative impact on the world. Another question is, well, should we be worried right now that something bad is going to happen in the future? And I think that's the right question to ask,

Starting point is 00:42:44 because the timelines that most people think about are on the order of decades. And I'm actually a little more conservative. I would generally say, well, I expect human level, you know, sort of seriously risky AI capabilities more towards the end of the century, rather than mid-century, which is where most AI researchers think that will have them.

Starting point is 00:43:10 So it's entirely reasonable to be concerned about that, just as if, you know, if we discovered, you know, a previously, undetected asteroid that was quite likely to hit the earth in 60 or 70 years time, it would be quite reasonable to be worried now about that future event. And actually, we are already worried about that. We already have a planetary defense agency as part of NASA. They are busy tracking all these asteroids.

Starting point is 00:43:44 They're trying to figure out what would we do if we detect. a sizable one that really was on a collision course. And so far we haven't found one that's on a collision course, but so far the statistics show that we've only detected about 30% of the objects that are going to come close to the earth and are large enough to be a civilization-ending event. So don't be too complacent even about that risk. Now, the risk from AI is a little different from an asteroid because, you know, the impact of an asteroid and why it would be a bad thing are pretty clear and straightforward.

Starting point is 00:44:32 Whereas with AI, it's a little harder to figure out exactly how things might go wrong. And the same thing would be true if you asked the chimpanzees, you know, a few thousand years ago, okay, so you've got these humans, they hear, they're much smarter than you. how is it all going to end? The chimpanzees would have a very hard time figuring out how it was all going to end for them. Because in some sense, it doesn't really matter. Once you seed control of the world and your environment and everything else to another species that really doesn't care about you, then getting into precise details about how exactly it all ends seems sort of beside the point.

Starting point is 00:45:15 the point is let's not seed control to a more powerful species if we can help it. So that's the sense in which I think it's reasonable to be concerned. We're investing billions of dollars or hundreds of billions of dollars pushing forward a technology with no plan for what to do when we succeed. Well, yeah, okay. I mean, that was great, and there's a lot going on here, so I'm just trying to figure out which direction to a tech in first. I mean, you mentioned that we're just overwhelmed with artificial intelligence programs doing work for us already.

Starting point is 00:46:00 None of them are anything that you would call super intelligent. And so there is obviously a risk just from not understanding what they're doing, but that's a different risk than I think that you're highlighting here at the end in, a more intelligent species than us. Can we be a little bit more definite about things like species here? I mean, how do you imagine this super intelligent AI existing, being embodied? Is it, you know, someone literally writes a program in C++, or is it sort of emergently arising out of the internet?

Starting point is 00:46:35 Is it in a robot? Is it distributed in the cloud? What should we be visualizing? So I certainly think that the typical science fiction depiction for obvious reasons is that it's in the head of a single robot. And I think that's a huge mistake. It's extremely unlikely that that's how things would go. So it's much more likely that it would be a composite system drawing on computational power of the entire cloud or a good fraction of it. some people would say that

Starting point is 00:47:09 the negative consequences could emerge from the sort of unanticipated interaction of many different systems and we've already seen this in the so-called flash crash which was when a lot of trading algorithms on the stock market somehow interacted in a way that we still don't really understand and then in the course of about 20 minutes, they wipe the trillion dollars off the value of the stock market.

Starting point is 00:47:41 And we had to shut the stock market down, and we had to undo all those trades. And then, you know, fortunately, the stock market had these built-in circuit breakers, which allowed the intervention to take place. But when that kind of thing happens, it has an effect in the real world. It really shook the confidence of investors in the reasonable operation of the market,

Starting point is 00:48:09 and it could have been much worse. So these unanticipated interactions are a possibility. But I think the failure mode that we anticipate is the consequence of the standard model of AI, whereby we make this optimizing machinery and we put in objectives. And then it turns out that the objective, is incomplete or incorrect in some way. And this is not a new story, right? We go back to King Midas, who said,

Starting point is 00:48:40 I want everything I touch to turn to gold. And so that's the objective that he put into the machine, so to speak, and he got exactly what he asked for. And then he realized that now his food and his drink is turning to gold and his family is turned to gold, and then he dies in misery and starvation. So the standard model is, in fact, I think, an engineering mistake.

Starting point is 00:49:05 It's a way of creating systems that relies on perfection on the part of humans in our ability to specify objectives. And that's a really, it's a really bad idea to assume perfection on the part of humans, particularly when the thing you're putting an objective, into is potentially more powerful than human beings are. So it could have physical appendages, so it might reside in the cloud, but it could still have physical appendages

Starting point is 00:49:45 with which it can bring about changes in the world. But it doesn't even need to do that. You know, when you think about Adolf Hitler and the changes he brought about, it wasn't because of his physical appendages. It was because he basically, did it with words. And AI systems can have impact on the world through words, through communication and the

Starting point is 00:50:11 internet, as they already are, as social media algorithms are already impacting the world, even though they have no arms and legs, and they have no laser guns and nuclear bombs and all the rest. Yeah, but I mean, I think this is exactly what I want to sort of, maybe it's an unfair question, but I want to really try to drill in on the physical manifestation. of this. I mean, there is the threat from artificial stupidity,

Starting point is 00:50:36 as you said, you know, algorithms that don't really do their job well or are biased or whatever. Yeah, so what I'm saying is precisely not that the algorithm is stupid,

Starting point is 00:50:46 that it's not doing its work correctly, is that we are specifying its work incorrectly. Sure. Okay. But it, the algorithm, is extraordinarily capable at carrying out the assigned objective, the signed task,

Starting point is 00:51:05 in ways that turn out to have extremely bad side effects. So if you don't believe that's possible, then just think about the fossil fuel industry and imagine the fossil fuel industry as an algorithm. Now, it happens to have human components that are replaceable and are replaced on a regular basis, But as an algorithm, it's trying to optimize the objective of generating a discounted stream of quarterly profits or whatever it might be, and is in the process of destroying the world. And we are unable to stop it.

Starting point is 00:51:46 So this is a failure mode that we are already familiar with. And it can happen with AI systems just as much as it can happen with AI systems just as it can happen with. with corporations that are optimizing an objective and neglecting the side effects. Right, but somehow that doesn't, I think we all agree, that does not count super-intelligent, the fossil fuel industry. It's kind of, that's the stupidity that I was thinking of, not ineffectualness, but lack of self-awareness, self-reflection. It's just doing its job, right?

Starting point is 00:52:21 I think, maybe I'm a misunderstanding, but I think when people use the word super-intelligent, they really are imagining something more human-like, right? Something that knows what it's doing and wants to do it. Maybe it's not wanting to do the right thing, but it has a little bit more self-control. And I'm wondering, would things like that exist? Where would they exist? How would they manifest their control in the physical world? I think there's a tendency to anthropomorphize.

Starting point is 00:52:52 And we see this in a number of commentators. So Stephen Pinker, for example, Melanie Mitchell is another one, who assert that it's intrinsic in the nature of intelligence that a sufficiently intelligent entity is going to act in a way that's beneficial to human beings. And I see absolutely no justification for this a priori. for example, right, there are millions of species on Earth. Some of them are much more numerous than human beings. Why wouldn't the AI system decide to be beneficial to Antarctic krill? You know, there's probably a thousand times more krill than there are humans. So why not be beneficial to them or why not be beneficial to bacteria?

Starting point is 00:53:49 Or why be so parochial, right? What's wrong with being beneficial to an alien species on some stuff? star on the other side of the galaxy. So I think the onus is on those who believe that somehow there's this magic connection between the capability for rational decision-making and the consequence of being beneficial to humans. There is no such magic connection. It has to be designed in.

Starting point is 00:54:22 and that's really what human compatible is about is how do we design in that magic connection so that the more intelligent the AI system, the better the outcome for human beings specifically, not for aliens on the other side of the galaxy and not for Antarctic krill, but for human beings. Yeah, no, I'm completely on your side in thinking there's no natural tendency

Starting point is 00:54:46 toward having AI be beneficial to humans. I'm just trying to wonder how vividly can we be about how this AI would hurt us? Would it be just massively misdirecting the food supply? Would it be a million subtle interventions onto social media? Is this something we can even say anything about? Or we should just be worried across a broad spectrum of bad outcomes? So the difficulty with, you know, so I can describe scenarios.

Starting point is 00:55:16 And of course, since I can describe them, you can always say, well, okay, then we'll just make a rule against that, right? And then, of course, that scenario won't happen, but another scenario will happen. So the kinds of examples that you see commonly involve, for example, side effects that modify our physical environment. And this is sort of what the fossil fuel companies are doing, in terms of adding carbon dioxide to the atmosphere.

Starting point is 00:55:51 And so you could imagine that, for example, you want to fix that problem. You say, okay, I would really like carbon dioxide levels to return to their pre-industrial concentrations. And that sounds like a great goal. And it's a perfectly reasonable goal to state to another human being who already shares all of the other background preferences that humans have for the future. but if that's the sole objective, then there are many ways of achieving that objective. For example, probably the simplest way, is it, well, where's all the carbon dioxide coming from?

Starting point is 00:56:30 I was coming from the economic activities of humans. Okay, let's get rid of the humans. And so you say, okay, well, I didn't mean that. What I meant was, you know, restore the carbon dioxide levels and don't kill any people. Fine. Okay, then we'll have. a decades-long social media campaign, which convinces people that it's immoral to have children

Starting point is 00:56:53 because those children are what's responsible for the gradual destruction of our planet. And so we all stop having children and the human race goes extinct. And the AI system has solved its problem without killing any people. And then, you know, if there's anyone left to have a third wish, then they come up with a revised version of that. And then the solution to that problem kills them off. So the issue is that we haven't noticed this problem up until recently because our intelligent systems have been not very intelligent.

Starting point is 00:57:33 They're mostly in the lab, they're mostly working on toy problems, and so they don't have consequences on the world. And if they start misbehaving, we have an off switch. We switch them off. In fact, every robot above a certain size, is legally required to have a big red off switch, usually on the back, so that if it starts misbehaving, you can switch it off in an emergency or even a remote control off switch. I did not know that.

Starting point is 00:58:00 But once you get outside the lab, you know, the effects, as with social media algorithms that are directly operating on the real world, can be enormous. us. And you also have to understand that the off switch, you know, people often say, oh, well, if something really bad happens, we can just switch it off. But of course, that's like saying, you know, if I'm losing to AlphaGo, I'll just play a better move. Right. Well, sorry, whatever better move you might think you have, AlphaGo has already thought of it and has a way to refute it. And whatever attempt you might make to switch off the machine, the AI system has, of course, anticipated that that's a possible failure mode, right?

Starting point is 00:58:46 One of the most obvious ways of not achieving your objective is that you're dead, right? You can't fetch the coffee if you're dead. So the system has already taken preemptive steps to prevent that failure because it's very good at achieving the objectives that we give it, not because it really wants to be alive. There's nothing to do with self-preservation. There's no instinct. It's a logical consequence of having an objective.

Starting point is 00:59:13 that's fixed, that a system wants to remain alive in order to achieve the objective. So I guess there's no reason to keep the audience in suspense anymore. I mean, the real purpose of your book is to actually propose a new strategy for dealing with exactly this problem. You think that we should be giving our AI systems a different kind of goal than we usually do? Yes, in fact, you could argue we should not be giving the AI system a goal. at least not one that is precisely defined and known to the AI system. Because it's exactly when the AI system believes that it knows the objective correctly,

Starting point is 00:59:57 that whatever action it comes up with in furtherance of that objective, it then sort of believes that this is the correct action to do. and doesn't tolerate necessarily interference from people who are jumping up and down saying, you know, stop doing that, you're destroying the world. From the AI system's point of view, that's just noise, right? It's just, it's me, you know, my mouth flapping open and closed. So what? It's pursuing the optimal solution to the objective that we gave it.

Starting point is 01:00:32 So the solution that's proposed in the book has so two, two main parts, right? The first part is that we design systems in such a way that their only objective is whatever it is that humans want the future to be like. So we use the word preferences, which is the term that economists use, but it doesn't just mean like preferences between different kinds of pizza. It really means, you know, out of the possible futures, that the way the world could unfold in future, how would you like it to unfold? And so this is what the AI system is supposed to be helping us with. But the second part of the solution is that it knows that it doesn't know what those preferences are.

Starting point is 01:01:26 And this avoids the problem of a machine that is given a fixed objective and believes that that is the only objective and therefore has to be pursued at all costs. So if we say to the AI system that we don't like what it's doing, we're conveying additional preference information to the machine, and it then is rational for the machine

Starting point is 01:01:52 to revise its plan, because now it's learned a bit more about what we want and don't want. And if it comes up with a plan that says, okay, I can fix carbon dioxide, by getting rid of all the people. So it knows that we want to fix the carbon dioxide. Let's say it doesn't know whether the people want to be alive or dead.

Starting point is 01:02:12 Well, what's the rational thing for it to do? If it just wants to optimize the carbon dioxide, then it just kills all the people. If it knows that it doesn't know whether the people want to be alive or dead, then it's reasonable, in fact, it's rational for it to ask permission. It says, okay, I've got this plan. here's how we solve it, we kill all the people, and then carbon dioxide goes back to normal. And then you say, no, that's not quite what I meant, right? So these solutions, like asking permission, deferring to human instruction, and also allowing

Starting point is 01:02:49 oneself to be switched off, these behaviors don't have to be programmed in. We don't have to sort of put special rules and regulations into the AI system to make it do that. those behaviors are just a logical consequence of the way we frame the problem. They are rational for the AI system to do. It's rational for it to ask permission. It's rational for it to allow itself to be switched off because it doesn't want to do whatever it is that would cause us to switch it off. And it doesn't know what that might be.

Starting point is 01:03:24 But if there's a danger that it might do something we really don't like, then it would prefer to be switched off in order to avoid that from happening. Yeah, I certainly like the idea of this initial uncertainty in what it thinks the humans would prefer. Is this something that, just by the way, have you written AI algorithms that have this feature and turn them loose and watch them behave? Yes, in tiny little worlds. And the mathematical description of this, we call it an assistance. game. So the word game is a general term actually that economists used to mean any situation where

Starting point is 01:04:05 there are multiple entities that are making decisions. And here, we could imagine just two entities, a human being and a machine. And the machine's objective is whatever it is the human wants, that's what I want. Right. But the machine doesn't know what that is. And so you can set up that game. And for example, you can make a little grid world and you can put a human in there and it's a simulated human and a simulated robot. And you can solve the game. You can calculate what is the solution to this game for both the human and the robot. And when you look at the solutions, you see what you hope. You see that, in fact, the human tries to teach the robot. Sometimes a human shows the robot what not to do, right?

Starting point is 01:04:55 So basically kind of takes the robot by the hand and says, look, you could go over here, but there's a big pit. Don't go in there. And then you should go over here, and this is where you should sit, because this is a nice place to be. And the robot, you know, other versions of the game, the robot asked permission. In some versions of this game,

Starting point is 01:05:15 the robot and the human actually devise a little protocol where the robot kind of comes and asks a question, and the human gives them a yes-no, and then the robot goes away and does the right thing. So we're just at the beginning of this process of exploring the nature of solutions to these kinds of formal mathematical problems. But what we've seen so far is that, first of all, the kinds of behaviors that the robot exhibits

Starting point is 01:05:46 are exactly what we hope. They defer, they ask permission, they learn from the human about what the human wants, and they help the human achieve whatever it is the human wants to achieve. And interestingly, right, the one way of thinking about it is that this game is a strict generalization of the classical notion of a machine pursuing a fixed objective, because now it's not a fixed objective, it's an uncertain objective rather than a certain objective. So it's strictly a more general problem-solving situation, and the kinds of behaviors that are exhibited are much richer and more interesting as a result.

Starting point is 01:06:31 It certainly bears a family resemblance to Bayesian inference, right, where we have some set of priors with the probability of all them, and then we update on the basis of more information. So I'm wondering, would we expect that such a system would become less and less uncertain over time about what it thinks the, motivations are? I mean, do these systems grow old and crotchety and harder to change their minds? In a sense, yes. I think in a practical, in a practical sense, it's pretty unlikely that machines would ever acquire enough information to be able to predict our preferences, particularly about things that have never been experienced in the real world. So, for example, you can look through, there are whole books written about what people want. You know, and the development economists write these books and moral philosophers write these books.

Starting point is 01:07:33 People come up with long lists of, you know, what are the needs of human beings? What, you know, what... But in none of those books will you find what color do people prefer the sky to be? and that's because up to now no one's bothered to change the color of the sky to find out how upset people are as a result so it's unlikely that an AI system

Starting point is 01:07:59 is going to be able to predict our responses to changes in the color of the sky other than just sort of being generally upset that it's not what it used to be but maybe we'd like the new one better actually, I don't know. But there's also this sort of fundamental fact that there's a lot we don't know about our own preferences. So there's genuine what we call epistemic uncertainty, meaning that it doesn't matter how much I think about it. I actually have to go and experience something.

Starting point is 01:08:35 I have to get real actual empirical experience in the world in order to learn more about my preferences. And there's a recent book by L.A. Paul, who's a philosopher, called Transformative Experience, and she uses the example of the durian fruit. We just had Lori as a guest on the podcast, and we talked all about transformative experiences here. Right. And the durian, yeah. So independently, I came up with the durian fruit as an example of this. So it's a fruit that's extremely smelly, and some people absolutely love it,

Starting point is 01:09:10 and they think it's the most marvelous food on the surface of the earth. And there are other people who utterly despise it. And there have been cases of where there was a cargo of durian on an airplane, and some of the passengers could not stand it, so they actually mutinied on the airplane and forced the captain to go back and land at the airport and unload the cargo of durians because it was so unbearable. So it's clearly something that elicits either extremely positive

Starting point is 01:09:39 or extremely negative preferences. But I have no idea which one I am, because I've never tried it, I've never smelt it. So I have real epistemic uncertainty about how I would enjoy a future where I'm eating lots of durian. So, you know, and very similar kinds of arguments apply

Starting point is 01:10:00 to much more practical kinds of things. Like, well, you know, would you imagine a school lever, you know, would you like to be a coal miner or a librarian, right? Those are the only two jobs we have right now, lad. Coal miner, librarian, which is it going to be? Of course, at 16 years old, you have no idea how much you're going to enjoy being a coal miner, or how much you're going to enjoy being a librarian.

Starting point is 01:10:23 So you have some vague feelings about it. You might have seen references to these professions, and you've got some images in your mind and little bits of evidence here and there. But, of course, you could certainly be wrong. if you made one prediction or another. So this real uncertainty about human preferences within humans means that there's no practical way that the AI system is going to reach a situation of perfect knowledge.

Starting point is 01:10:53 But the way Bayesian inference works actually is that unless you make certain kinds of mistakes in the way you set up the problem at the beginning, the Bayesian process only reaches certainty when it's correct. Yeah. So as long as you make sure that your prior beliefs are sufficiently broad as to encompass every possible human preference structure, which is a tall order, I agree. But let's say if you can do that, then you only approach certainty as you approach having correct beliefs about the human preference structure.

Starting point is 01:11:31 So in that situation, you might not mind too much. you know, the AI system stops asking permission because it knows what you're going to say, so it doesn't need to ask permission. So it becomes sort of like the perfect butler. And if you've read P.G. Woodhouse and his books where Jeeves is the butler and Bertie Wooster is the aristocrat, Jeeves actually in some ways knows Bertie Wuster better than Bertie knows himself and always anticipates what his master needs and wants and manages to do it in advance. So it would be a little bit like that. But I think, as I say, it would be very unlikely that we would approach any degree

Starting point is 01:12:18 of certainty in any finite amount of time. How much do we have to deeply think about the idea of a preference? I mean, in the usual, in the standard model, I guess, of AI, where you just programming and utility function, that is operationally what you mean. But here, you're using the word preference to mean something about human beings, right? It's the human preferences that are supposed to be learned by the AI. If we push it hard enough to implement in software, is it even clear what human preference is? I mean, not only do humans not know what their preferences are, they might change, they might be inconsistent, right? Internally incoherent. Are there ways to deal with Yes, so these are great questions. It's certainly the case that when economists or behavioral

Starting point is 01:13:08 economists do a lot of experiments to try to figure out the actual preferences of humans, even for something as simple as money. So let alone the color of the sky or what kind of piece you want, but just money. How much do you like money and risks involving money? Money today, versus money tomorrow. And I would say that the consensus of these experiments is that consistency of preferences seems to be fairly well supported. And so consistency, meaning that you do appear to be able to rank different outcomes in a consistent way.

Starting point is 01:13:58 but we don't exactly seem to follow some of the at least straightforwardly if you interpret the other axioms of rationality we don't quite seem to follow those and we seem to have our own peculiar ways of making decisions and Daniel Kahneman who's leading psychologist and another Nobel Prize winner in economics has studied actual human preferences and we seem to have actually sort of two minds. So his explanation is that there's an experiencing self, which is sort of the moment to moment amount of enjoyment or pain that you're experiencing. And then there's a remembering self. And in standard economic theory, the remembering self is sort of supposed to add up the experiencing self's experiences and say, okay, well, that's how good this was.

Starting point is 01:14:56 it's the sort of the sum of all the pleasures and pains that I experienced. But it turns out empirically that the remembering self does something quite a bit different. So in particular, the remembering self really remembers the most pleasurable or most painful part of the experience. And also the part that was most recent. So they call this the peak end theory. So the peak and the end of the experience seem to be very salient in how you remember the desirability or undesirability of something. And that's pretty hard to reconcile with standard economic models.

Starting point is 01:15:34 But actually, when you think about it, it's clear that memory and expectations are quite reasonably folded in to human decision making. Because if I have a bad experience, I might remember that for my whole life. And so it isn't just the experience itself that was bad. It's the recurring memory of it. That's going to be bad. And that recurring memory is going to remember the worst thing, right? So if I ask you, what was the worst thing that ever happened in your life?

Starting point is 01:16:05 You know, probably maybe three or four things pop into your head, and there are things you think about every so often. And some people even have to go through psychiatric treatment to try to extinguish some of these memories that are ruining the rest of their lives. So it might be that in fact what the remembering self is doing is perfectly rational if you incorporate memory and expectation into how you evaluate the costs or the benefits of some experience. So I would say that at the moment we're a long way from understanding the rich many-layered nature of human preferences and so on. But if we really are inconsistent, so if, for example, if our transities are, if our preferences,

Starting point is 01:16:52 are intransitive, in other words, they kind of go around in circles, then there's nothing the AI system can do, because any pizza the AI system gives you, you say, well, actually, I prefer this other one. And the AI system gives you that one. It's actually I prefer the first one. And of course, you'll never be satisfied. And there's really nothing that any discipline of,

Starting point is 01:17:18 of building artificial systems could possibly do to satisfy inconsistent preferences. But we can give you a pizza as opposed to no pizza at all, right? You probably prefer some kind of pizza to starving to death. So we'll make sure you get some kind of pizza. We might end up learning a lot about psychology from the way that the AI determines what our preferences are in practice, which might be different than what we say they are

Starting point is 01:17:46 when we try to state them. Yeah, I was going to say, I think that this is probably the, conceptually the most difficult part of the research agenda that I'm proposing is the gap between the idealized model of a human that has stable preferences and consistent preferences about the future and the reality of actual humans. And I think particularly stability is conceptually very difficult to deal with. If a human being's preferences can change, in other words, what I want tomorrow to be like today is different from what I want tomorrow to be like when I get to tomorrow, then which of my two selves is the AI system supposed to satisfy myself today or myself tomorrow? It's hard to pick apart that problem philosophically.

Starting point is 01:18:44 and then practically, if human beings' preferences can change, which clearly they do, we're not born with a whole complex set of preferences, then how do you make sure that the AI system doesn't satisfy preferences by changing those preferences to make them easier to satisfy? Well, what I was going to say was, how can we be sure that the AI won't just change its mind about what its preferences are to satisfy the human preferences? Like if we really are imagining AIs that are super intelligent, much less just intelligent,

Starting point is 01:19:20 can we really also assume that we're baking in their fundamental motivational structure? Or is that something subject to editing on their own part? So this is an interesting set of questions having to do with our ability to prove mathematical theorems about the software that we build. So we already have the capability to prove that algorithms are correct, that they behave according to some specification, and this capability is getting more and more sophisticated. And we use it even in areas like machine learning,

Starting point is 01:19:59 which you might think, how could you possibly prove anything about an algorithm that learns? But it turns out that you can. and you can typically we're able to show that we expect to get performance within some small epsilon of the desired performance and that will happen with very high probability so that's the kind of theorem that we prove all the time so the idea here would be we would set up the general

Starting point is 01:20:34 framework for how the system operates, and it would have learning components strung together in the right way. And then what you prove is that no matter how good each of those components becomes, the system as a whole still conforms to this require design where it satisfies these principles. There are some tricky parts to this. So traditionally, when we prove, theorems about software, we're kind of assuming that the parts of the program that we want to remain fixed, remain fixed. And you can prove that this piece of software can never overwrite itself and change its sort of constitution, so to speak. But there's another set of theorems you'd have to prove, which is that there's no circumstance under which the software would

Starting point is 01:21:34 convince a human being to go in and overwrite the constitution. And that's a much more difficult kind of guarantee to provide. And this comes about, you know, we've noticed this in computer security where we prove that security protocols are correct. But then we find out that there's what we call a side channel. and side channels include things like listening to the sound of the keyboard by bouncing a laser beam off the window of your office building or measuring the electrical current in the main supply that's going into the computer

Starting point is 01:22:19 and then being able to extract information from these channels and when you put an AI system into the real world you automatically create this side channel which is that the AI system can convince a human being to make physical changes to the memory of the machine and its software. And we have to make sure that that side channel is somehow cut off. All right, this is a lot to think about. I've got to say, this is a very good episode for Food for Thought purposes. I'll just close with just giving you a chance to sort of speculate about the future.

Starting point is 01:22:57 I mean, what do you think is going to be the vision? of AI that comes true 50 years from now? I mean, what should we be looking for overall, given all that we've talked about over the last hour? So assuming things go well, so assuming we figure out how to design AI systems in such a way that we retain control over them, and I think this is actually feasible.

Starting point is 01:23:23 There's a lot of very hard algorithm design and engineering work to be done to take the basic framework that I've just and turn it into technology that will replace and supersede all of the AI technology that's already out there. But assuming that works and assuming that we overcome these conceptual obstacles and achieve human level or sometimes called general purpose or superintelligent AI, that becomes a tool that enables us to have a much better civilization than the one we currently.

Starting point is 01:24:01 have. And this, you know, our civilization is sort of the best we can do with the brains that we have. But if we had access to much better brains, even if they don't belong to us, we get to use them. And we can have a better world than the one we have right now. We could, I think, quite easily without imagining the sort of sci-fi advances of, you know, eternal life or fast and light travel, you know, curing all disease. but just using AI and robotics technology to deliver the goods and services that, you know, well-off people in the West take for granted. So giving that sort of standard of life to everyone on Earth would be it's about a tenfold

Starting point is 01:24:50 increase in the GDP of the world, which when you calculate the sort of cash equivalent value, what the economists call net present value. a half trillion, sorry, 13.5,000 trillion dollars. So just for comparison, you know, the GDP of the US is on the order of 20 trillion dollars a year. So we're talking about hundreds of times larger than the GDP of the US or in fact the GDP of the world. So that's the cash value. And when you look at the investments, the tens or hundreds of billions of dollars that people are talking about, it's absolutely minuscule in comparison to the potential benefits. And so we could enter a golden era in a way where we no longer have to fight about access

Starting point is 01:25:44 to the wherewithal of life, which is one of the main reasons we've been killing each other since the beginning of time. I mean, there are other reasons like religion that we kill each other, but at least we would get rid of this reason. So some people have said, look, if we have this AI that can do everything the human beings can do and sort of can do it for us, essentially for free, then what's left for human beings? What role do we have? Where do we find a useful purpose in life if there's nothing that we can do that's better than what the machine can do? And I think this is a really important question for us. And presumably if the machines understand our preferences, they know that we don't want a future of idle enfeeblement, right, where we essentially become couch potatoes for the rest of time, where we don't learn, we don't try, we don't do anything of any value with our lives, because that's sort of not what we want the future to be like. So the AI systems will have to say at some point, no, we're not going to keep tying your shoelaces for you.

Starting point is 01:27:07 You humans have got to learn and keep learning to do stuff for yourself. So that's going to be an interesting discussion to have in a few decades' time. Well, you've certainly given us a very good set of reasons to believe that we should all be thinking hard about this and trying to figure out what's going to happen next. Stuart Russell, thanks so much for being on the Mindscape podcast. Pleasure, Sean. Very nice talking to. Deadlines move. Plans change.

Starting point is 01:27:57 And sometimes opportunities pop up out of nowhere. When you need branded gear fast, For Imprint is ready to deliver. Four Imprint offers hundreds of promotional products in their 24-hour category. Everything from custom apparel, bags and drinkware to writing tools, trade show staples, and high-tech gear. At Four Imprint, it's not just about speed. It's about doing it right.

Starting point is 01:28:21 Your logo is printed with precision. Your order is packed with care and it all ships out fast. And with their 360-degree guarantee, you can be confident it'll show up right on time, just the way you planned it. That's the certainty of four imprint. So if you're prepping for a last-minute event or jumping on a big opportunity,

Starting point is 01:28:40 you don't have to settle or scramble. With four-imprint, you get fast, reliable service, and peace of mind built right in. Check out their full 24-hour selection at 4imprint.com.

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 94 | Stuart Russell on Making Artificial Intelligence Compatible with Humans

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.