a16z Podcast - Controlling AI
Episode Date: January 16, 2020

AI can do a lot of specific tasks as well as, or even better than, humans can — for example, it can more accurately classify images, more efficiently process mail, and more logically manipulate a Go board. While we have made a lot of advances in task-specific AI, how far are we from artificial general intelligence (AGI), that is, AI that matches general human intelligence and capabilities? In this podcast, a16z operating partner Frank Chen interviews Stuart Russell, the founder of the Center for Human-Compatible Artificial Intelligence (CHAI) at UC Berkeley. They outline the conceptual breakthroughs, like natural language understanding, still required for AGI. But more importantly, they explain how and why we should design AI systems to ensure that we can control AI, and eventually AGI, when it's smarter than we are. The conversation starts by explaining what Hollywood's Skynet gets wrong and ends with why AI is better as "the perfect butler than the genie in the lamp."
Transcript
Hi, and welcome to the A16Z podcast. I'm Das, and in this episode, Frank Chen interviews
UC Berkeley Professor of Computer Science, Stuart Russell. Russell literally wrote the textbook
for artificial intelligence that has been used to educate an entire generation of AI researchers.
More recently, he's written a follow-up, Human Compatible: Artificial Intelligence and the Problem of Control. Their conversation covers everything from AI misclassification and bias
problems, to the questions of control and competence in these systems, to a potentially new
and better way to design AI. But first, Russell begins by answering, where are we really
when it comes to artificial general intelligence, or AGI, beyond the scary picture of Skynet? Well, the Skynet metaphor is one that people often bring up, and I think generally
speaking, Hollywood has got it wrong. They always portray the risk as an intelligent machine that
somehow becomes conscious, and it's the consciousness that causes the machine to hate people
and want to kill us all. And this is just a mistake. The problem is not consciousness, it's really
competence. And if you said, oh, by the way, you know, your laptop's now conscious, it doesn't
change the rules of C++, right? The software still runs exactly the way it was always going to run
when you didn't think it was conscious. So on the one hand, we have people like Elon Musk saying
artificial general intelligence, like that's a real possibility. It may be sooner than a lot of
people think. And on the other, you've got people like Andrew Ng who are saying, like, look,
we're so far away from AGI. All of these questions seem premature, and I'm not going to worry
about the downstream effects of superintelligent systems any more than I'd worry about overpopulation on Mars.
So what's your take on the debate? He, in fact, upgraded that to overpopulation on Alpha Centauri.
So let's first of all talk about timelines and predictions for achieving human level or superhuman AI.
So Elon actually is reflecting advice that he's received from AI experts.
So some of the people, for example, at OpenAI think that five years is a reasonable timeline.
And that the necessary steps mainly involve much bigger machines and much more data.
So no conceptual or computer science breakthroughs, just more compute, more storage, and we're there.
Yeah.
So I really don't believe that.
Crudely speaking, the bigger, faster the computer, the faster you get the wrong answer.
But I believe that we have several major conceptual breakthroughs that still have to happen.
We don't have anything resembling real understanding of natural language, which would be essential for systems to then acquire the whole of human knowledge.
We don't have the capability to flexibly plan and make decisions over long time scales.
So, you know, we're very impressed by AlphaGo or AlphaZero's ability to think 60 or 100 moves ahead.
That's superhuman.
But if you apply that to a physical robot whose decision cycle is a millisecond, you know,
that gets you a tenth of a second into the future, which is not very useful if what you're trying to do is not just lay the table for dinner, but do it anywhere.
Right.
In any house, in any country in the world, figure it out.
Laying the table for dinner is several million or tens of millions of motor control decisions.
And at the moment, the only way you can generate behavior on those timescales is actually to have canned subroutines that humans have defined: pick up a fork.
Okay, I can train picking up a fork, but I've defined picking up a fork as a thing.
So machines right now are reliant on us to supply that hierarchical structure of behavior. When we figure out how they can invent that for themselves
as they go along and invent new kinds of things to do that we've never thought of, that will be a
huge step towards real AI. As we march towards general intelligence, this literal ability to think
outside the box, right, will be one of the hallmarks I think we look for. If you think about what
we're doing now, we're trying to write down human objectives. It's just that, because we have very stupid systems, they only operate in these very limited contexts, like a Go board. And on the Go board, you know, a natural objective is to win the game. If AlphaGo were really smart, even if you
said win the game, well, you know, I can tell you here's what chess players do when they're
trying to win the game. They go outside the game. And a more intelligent AlphaGo would realize,
okay, well, I'm playing against some other entity. What is it? Where is it? Right? There must be some other
part of the universe besides my own processor and this Go board. And then it figures out how to break out of its little world and start communicating. Maybe it starts drawing patterns on the Go board with Go pieces to try and figure out what visual language it can use to communicate with
these other entities. Now, how long do these kinds of breakthroughs take? Well, if you look back
at nuclear energy: for the early part of the 20th century, we knew that nuclear energy existed. So, you know, from E equals mc squared in 1905, we could
measure the mass differences between different atoms. We knew what their components were.
And they also knew that, you know, radium could emit vast quantities of energy over a very long
period. So they knew that there was this massive store of energy. But mainstream physicists
were adamant that it was impossible to ever release it.
And there was no way to harness it in some...
Right. So there was a famous speech that Lord Rutherford gave, and he was, you know, the man who split the atom. So he's like the leading nuclear physicist of his time.
And that was September 11th, 1933.
And he said that the possibility of extracting energy
by the transmutation of atoms is moonshine.
The question was, is there any prospect in the next 25 or 30 years?
So he said, no, it's impossible.
And then the next morning, Leo Szilard actually read a report of that in the Times
and went for a walk and invented the nuclear chain reaction based on neutrons,
which people hadn't thought of before.
And that was a conceptual breakthrough.
Right.
You went from impossible to now it's just an engineering challenge.
So we need more than one breakthrough, right?
It takes time to sort of ingest each new breakthrough
and then build on that to get to the next one.
So the average AI researcher thinks that we will achieve superhuman AI
sometime around the middle of this century.
So my personal belief is actually more conservative.
One point is we don't know how long it's going to take
to solve the problem of control.
If you ask the typical AI researcher, okay,
and how are we going to control machines
that are more intelligent than us?
The answer is, beats me.
So you've got this multi-hundred billion dollar research enterprise
with tens of thousands of brilliant scientists
all pushing towards a long-term goal
where they have absolutely no idea what to do if they get there.
So coming back to Andrew Ng's prediction,
the analogy just doesn't work.
If you said, okay, the entire scientific establishment on Earth
is pushing towards a migration of the human race to Mars
and they haven't thought about what we're going to breathe when we get there,
you'd say, well, that's clearly insane.
Yeah, and that's why you're arguing we need to solve the control problem now, or at least find the right design approaches to solving this control problem.
So it's clear that the current formulation, the standard model of AI as building machines that optimize fixed objectives, is wrong.
We've known this principle for thousands of years: be careful what you wish for.
You know, King Midas wished for everything he touched to turn to gold.
That was the objective he gave to the machine, which happened to be the gods, and the gods granted him exactly that objective. And then his food and his drink and his family all turned to gold, and he died in misery.
And so we've known this for thousands of years, and yet we built the field of AI around this definition of machines that carry out plans to achieve objectives that we put into them. And it only works if we are able to completely and perfectly specify the objective.
So the guidance is: don't put fixed objectives into machines, but build machines in a way that acknowledges the uncertainty about what the true objective is. For example, take a very simple machine learning
task, learning to label objects and images. So what should the objective be? Well, you go and talk
to a room full of computer vision people, they will say labeling accuracy. And that's actually
the metric used for all these competitions. In fact, this is the wrong metric.
How so? Because different kinds of misclassifications have different costs in the real world. Misclassifying one type of Yorkshire Terrier as a different type of Yorkshire Terrier
is not that serious. Classifying a person as a gorilla is really serious. And Google found that out
when the computer vision system did exactly that. And it probably cost them billions in goodwill
and public relations. And that opened up actually a whole series of people observing the ways that
these online systems were basically misbehaving in the way they classified people.
If you do a search on Google Images for CEO, I think it was one of the women's magazines that pointed out that the first female CEO appears on the 12th row of photographs and turns out to be
Barbie.
So if accuracy isn't the right metric, what are the design paths that you're suggesting we optimize for?
If you're going to have that image labeling system take action in the real world and posting a label on the web is an action in the real world, then you have to ask, what's the cost of misclassification?
And when you think, okay, ImageNet has 20,000 categories, then there are 20,000 squared, or 400 million, different ways of misclassifying one object as another.
So now you've got 400 million unknown costs.
Obviously, you can't specify a joint distribution over 400 million numbers one by one.
It's far too big.
So you might have some general guidelines: misclassifying one type of flower as another is not very expensive, while misclassifying a person as an inanimate object is going to be more expensive.
But generally speaking, you have to operate under uncertainty about what the costs are.
And then how does the algorithm work?
One of the things it should do, actually, is refuse to classify certain photographs, saying, I'm not sure enough about what the cost of misclassification might be.
So I'm not going to classify it.
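To make the idea concrete, here is a minimal sketch, not from the conversation itself, of a classifier that abstains when it is too uncertain about misclassification costs. The class names, cost bounds, and abstention threshold are illustrative assumptions.

```python
import numpy as np

def classify_or_abstain(probs, cost_upper, abstain_cost=1.0):
    """Pick the label whose worst-case expected misclassification cost is lowest,
    or abstain when even that label is too risky given how uncertain the costs are.

    probs       : length-K posterior over the true class.
    cost_upper  : K x K upper bounds on the (uncertain) cost of predicting
                  column j when the true class is row i (illustrative numbers).
    abstain_cost: cost of refusing to label at all.
    """
    worst_case = probs @ cost_upper          # worst-case expected cost per candidate label
    best_label = int(np.argmin(worst_case))
    if worst_case[best_label] > abstain_cost:
        return None                          # "I'm not sure enough about the costs"
    return best_label

# Toy example with 3 classes: flower A, flower B, person.
probs = np.array([0.05, 0.05, 0.90])         # model is 90% sure it's a person
cost_upper = np.array([
    [0.0,   0.1, 100.0],   # true flower A: cheap to confuse with B, costly to call a person
    [0.1,   0.0, 100.0],   # true flower B: same pattern
    [500.0, 500.0, 0.0],   # true person: very costly to mislabel as anything else
])
print(classify_or_abstain(probs, cost_upper))  # prints None: the residual 10% risk is too expensive
```

The point of the toy example is that even a confident prediction can be withheld if the remaining ways of being wrong are costly and poorly understood.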
So that's definitely a divergence from state of the art today, right?
State of the art today is you're going to assign some class to it, right?
That's a dog or a Yorkshire Terrier or a pedestrian or a tree. And then the algorithm can say, like, I'm really sure or I'm not really sure. And then a human decides.
You're saying something different, which is, I don't understand the costs of misclassification, so therefore, I'm not even going to give you a classification or a confidence interval on the classification. Like, I shouldn't. It's irresponsible for me. So I could give
confidence intervals and probability, but that wouldn't be what image labeling systems
typically do. They're expected to plump for one label. And the argument would be if you don't
know the costs of plumping for one label or another, then you probably shouldn't be plumping.
And I read that Google Photos won't label gorillas anymore.
So you can give it a picture that's perfectly obviously a gorilla, and it'll say, I'm not sure what I'm seeing here.
And so how do we make progress on designing systems that can factor in this context, sort of understanding the uncertainty, characterizing the uncertainty?
So there's sort of two parts to it.
One is, how does the machine behave, given that it's going to have radical levels of uncertainty about many aspects of our preference structure?
And then the second question is, how does it learn more about our preference structure?
As soon as the robot believes that it has absolute certainty about the objective, it no longer
has a reason to ask permission.
And in fact, if it believes that the human is even slightly irrational, which of course we are,
then it would resist any attempt by the human to interfere or to switch it off because
the only consequence of human interference in that case would be a lower degree of achievement
of the objective.
So you get this behavior where a machine with a fixed objective will disable its own off switch
to prevent interference with what it is sure is the correct way to go forward.
Yeah.
And so we'd want a very high threshold on confidence that it's understood what my real preference
or desire is.
Well, I actually think it's in general not going to be possible for the machine to have high confidence that it's understood your entire preference structure.
It may understand aspects of it, and if it can satisfy those aspects without messing with the other parts of the world, that it doesn't know what you want, then that's good.
But, you know, there are always going to be things that it never occurs to you to write down.
So I can see how this design approach would lead to much safer systems, right?
because you have to factor in the uncertainty.
I can also imagine sort of a practitioner today sitting in their seat going,
wow, like, that is so complex.
I don't know how to make progress.
So what do you say to somebody who's now thinking,
wow, I thought my problem was X hard,
but it's really 10X or 100X or 1,000X hard?
So, you know, interestingly, the safe behaviors fall out as solutions of a mathematical game
with a robot and a human.
In some sense, they're cooperating.
They both have the same objective, which is whatever it is the human wants.
It's just that the robot doesn't know what that is.
So if you formulate that as a mathematical game and you solve it, then the solution exhibits these desirable characteristics that you want, namely deferring to the human, allowing yourself to be switched off, asking permission, only doing minimally invasive things.
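As a hedged illustration of why deference falls out of this formulation, here is a toy numerical sketch in the spirit of the off-switch analysis Russell alludes to, assuming a perfectly rational human and an arbitrary Gaussian belief over the human's utility for the robot's proposed action.

```python
import numpy as np

rng = np.random.default_rng(0)

# The robot is uncertain about the human's utility U for its proposed action,
# so it holds a belief (here, samples from an assumed, illustrative prior)
# rather than a point estimate.
U = rng.normal(loc=0.5, scale=2.0, size=100_000)

act_now    = U.mean()                  # just act: expected utility E[U]
switch_off = 0.0                       # do nothing
defer      = np.maximum(U, 0).mean()   # ask the human first; a rational human lets the
                                       # action proceed only if U > 0, giving E[max(U, 0)]

print(f"act now: {act_now:.2f}, switch off: {switch_off:.2f}, defer: {defer:.2f}")
# E[max(U, 0)] >= max(E[U], 0), so deferring is never worse, and it is strictly better
# whenever the robot is genuinely unsure whether U is positive. With zero uncertainty
# the advantage vanishes, which is why a robot that is certain of its objective has
# no incentive to let the human interfere or switch it off.
```

Under these simplified assumptions, deferring weakly dominates both acting unilaterally and switching off, and the advantage disappears exactly when the robot's uncertainty about the objective does.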
We've seen, for example, in the context of self-driving cars that when you formulate things
this way, the car actually invents for itself protocols for behaving in traffic that are quite
helpful. For example, one of the constant problems with self-driving cars is how they behave
at four-way stop signs because they're never quite sure who's going to go first and they don't
want to, you know, they don't want to cause an accident. They're optimized for safety. So they'll end up
stuck at that four-way intersection.
So they're stuck, and everyone ends up pulling around them, and it would actually, you know, probably cause accidents rather than reducing them.
So what the algorithm figured out was
that if it got to the stop sign and it was unclear who should go first,
it would back up a little bit.
And that's a way of signaling to the other driver
that it has no intention of going first and therefore they should go.
And that falls out as a solution of this game theoretic design
for the problem. Let's go to another area where machine learning is often being used. I'm about to
make a loan, right, to an individual. And so they take in all this data, they figure out your credit
worthiness, and they say, like, loan or not, right? How would sort of game theory inside loan decision-making differ from traditional methods? So what happens with traditional methods
is that they make decisions based on past data. And a lot of that past data reflects biases
that are inherent in the way society works.
So if you just look at historical data,
you might end up making decisions that discriminate
in effect against groups
that have previously been discriminated against
because that prior discrimination resulted in lower loan performance.
And so you end up actually just perpetuating
the negative consequences of social biases.
So loan underwriting in particular
has to be inspectable, and the regulators have to be able to verify that you're making decisions on criteria that neither mention race nor are proxies for race.
So the principles of those regulations need to be expanded to a lot of other areas.
For example, data seems to be suggesting that the job ads that people see online are extremely biased by race.
If you're just trying to fit historical data and maximize predictive accuracy, you're missing out on these other objectives about fairness at the individual level and the social level.
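One hedged sketch of what folding fairness back into the objective could look like in code: a standard accuracy loss plus a demographic-parity penalty. The penalty choice, the weighting, and the toy data are assumptions for illustration, not anything prescribed in the episode or by regulators.

```python
import numpy as np

def lending_loss(y_true, y_prob, group, fairness_weight=1.0):
    """Cross-entropy (predictive accuracy) plus a demographic-parity penalty.

    This is one illustrative way to add a fairness objective; real fairness
    criteria and lending regulations are more involved than a single gap.
    """
    eps = 1e-9
    accuracy_term = -np.mean(
        y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps)
    )
    # Gap between the average predicted approval rates of the two groups.
    parity_gap = abs(y_prob[group == 0].mean() - y_prob[group == 1].mean())
    return accuracy_term + fairness_weight * parity_gap

# Toy usage with made-up predictions for six applicants in two groups.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.1])
group  = np.array([0, 0, 0, 1, 1, 1])
print(lending_loss(y_true, y_prob, group))
```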
So economists call this the problem of externality.
And so pollution is a classic example.
A company can make more money by just dumping pollution into rivers and oceans and the atmosphere
rather than treating it or changing its processes to generate less pollution.
So it's imposing costs on everybody else.
The way you fix that is by fines or tax penalties.
Yeah.
You create a price for something that doesn't have a price.
Right.
Now the difficulty, and this is also true with the way social media content selection algorithms have worked, is that it's, I think, very hard to put a price on this.
And so the regulators dealing with loan underwriting have not put a price on it, they put
a rule on it.
Right.
You cannot do things that way.
So let's take, you know, making a recommendation at an e-commerce site for here's a
product that you might like. How would we do that differently by baking in game theory?
So the primary issue with recommendations is understanding user preferences. One of the problems I remember was with a company that sends you a coupon to buy a vacuum cleaner, and you buy a vacuum cleaner. Great. So now it knows you really like vacuum cleaners. It keeps sending you coupons for
vacuum cleaners. But of course, you just bought a vacuum cleaner. So you've no interest in getting
another vacuum cleaner. You're not collecting vacuum cleaners.
So just this distinction between consumable things and non-consumable things is really important
when you want to make recommendations.
And I think you need to come to the problem for an individual user with a reasonably rich
prior set of beliefs about what that user might like based on demographic characteristics.
How do you then adapt that and update it with respect to the decisions that the user makes about what products to look at, which coupons they cash in, which ones they don't, and so on?
And one of the things that you might see falling out would be that the recommendation system,
it might actually ask you a question.
I've noticed that you've shown no interest in all these kinds of products.
Are you, in fact, a vegetarian?
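A minimal sketch of the consumable-versus-durable distinction Russell describes, with made-up categories, priors, and update rules; a real recommender would maintain proper posteriors seeded by demographic priors and updated from clicks, coupons, and purchases.

```python
# Illustrative, hypothetical categories and numbers only.
interest = {"coffee": 0.5, "vacuum_cleaner": 0.5}   # prior interest from demographics
consumable = {"coffee": True, "vacuum_cleaner": False}

def update_on_purchase(category):
    if consumable[category]:
        # Buying a consumable is evidence of ongoing demand: raise the belief.
        interest[category] = min(1.0, interest[category] + 0.3)
    else:
        # Buying a durable good satisfies the need: expected near-term demand collapses.
        interest[category] *= 0.1

update_on_purchase("coffee")
update_on_purchase("vacuum_cleaner")
print(interest)   # coffee rises; vacuum_cleaner collapses toward zero
```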
As you look back at your own career in the space, are you surprised that the field is where it is?
You know, 10 years ago, I would have been surprised to see that speech recognition is now just a commodity that everyone is using on their cell phones across the entire world.
Yeah.
When I was in undergrad, they said, we definitely have to solve the Turing test before we're going to get speaker-independent natural language.
And, you know, I worked on self-driving cars in the early 90s, and it was pretty clear that the perception capabilities were the real bottleneck. The system would detect about 99% of the other cars.
So every 100th car, you just wouldn't see it.
So these are things that are coming true, right?
They were sort of holy grails.
It's interesting that even though they've achieved superhuman performance
on these test bed data sets,
there are still these adversarial examples
that show that actually it's not seeing things the same way
that humans are seeing.
Definitely making different mistakes than we make.
And so it's fragile in ways that we don't understand.
For example, OpenAI has a system with simulated humanoid robots that learn to play soccer.
One learns to be the goalkeeper, the other one learns to take penalties, and it looks great.
This was a big success.
And they basically said, okay, can we get adversarial behavior from the goalkeeper?
So the goalkeeper basically falls down on the ground, immediately, waggles its leg in the air.
And the penalty taker, when he's about to kick the ball, just completely falls apart, right? He doesn't know how to respond to that, and never actually gets around to kicking the ball at all.
I don't know whether he's laughing so hard, he can't kick the ball or what.
But, you know, it's not the case that just because we have superhuman performance on some nicely curated data set, we actually have superhuman vision or superhuman motor control learning.
Are you optimistic about the direction for the field?
So one reason I'm optimistic is that as we see more and more of these failures of the standard model,
people will say, oh, well, you know, clearly we need to build these systems this other way
because that sort of, you know, gives us guarantees that it won't do anything rash, it'll ask permission, it will adapt to the user gradually, and it'll only start taking bigger steps when it's reasonably sure that that's what the user wants.
I think there are reasons for pessimism as well, in misuse for surveillance and misinformation. I mean, there's more awareness of it, but there's nothing concrete being done
about that with a few honorable exceptions, like San Francisco's ban on face recognition
in public spaces, and California's ban actually on impersonation of humans by AI systems.
The deepfakes? Not just deepfakes, but, for example, robocalls where...
I'm pretending to schedule a haircut appointment, and I didn't self-identify as an AI.
Right. So Google has now said they're going to have their machine self-identify as an AI.
So it's a relatively simple thing to comply with. It doesn't have any great economic cost.
But I think it's a really important step that should be rolled out globally.
That principle of not impersonating human beings is a fundamentally important principle.
Another really important principle is don't make machines that decide to kill people.
A ban on offensive weapons.
Sounds pretty straightforward, but again, although there's much greater awareness of this,
there are no concrete steps being taken and countries are now moving ahead with this technology.
So I just last week found out that a Turkish defense company is selling an autonomous quadcopter
with a kilogram of explosive that uses face recognition and tracking of humans, and is sold as an anti-personnel weapon.
So we made a movie called Slaughterbots to illustrate this concept.
We've had more than 75 million views.
So to bring this home to people who are sitting at their desks working on machine learning systems,
if you could give them a piece of advice on what they should be doing,
like what should they be doing differently having heard this podcast that they might not have been thinking?
So for some applications, it probably isn't going to change very much.
One of my favorite applications of machine learning and computer vision is the Japanese cucumber farmer.
Oh, right.
Who downloaded some software and trained a system to pick out bad cucumbers from his...
And sorts them into the grades.
The Japanese are very fastidious about the grades of produce.
And he did it so inexpensively.
Yeah.
So that's a nice example.
And it's not clear to me that there's any particular way you might change that because it's...
No game theory really needed for it.
It's a very, I mean, in some sense, it's a system that has a very, very limited scope of action,
which is just to, you know, sort cucumbers.
The sorting is not public and there's no danger that he's going to label a cucumber as a person or anything like that.
But in general, you want to think about, first of all, what is the effect of the system that I'm building on the world, right?
And it's not just that it accurately classifies cucumbers or photographs.
It's that, you know, of course, people will buy the cucumbers,
or people will see the photographs and what effect does it have?
And so often when you're defining these objectives for a machine learning algorithm,
they're going to leave out effects that the resulting algorithm is going to have on the real world.
And so can you fold those other effects back into the objective
rather than just optimizing some narrow subset,
like click-through, for example, which could have extremely bad external effects.
So the model, if you sort of want to anthropomorphize the model, is you would rather have the perfect butler than the genie in the lamp.
All-powerful, kind of unpredictable genie.
Right, and very literal-minded about this is the objective.
Right. Awesome. Well, Stuart, thanks so much for joining us on the A16Z podcast.
Okay, thank you, Frank. It's been a delight.