Making Sense with Sam Harris - #312 — The Trouble with AI

Episode Date: March 7, 2023

Sam Harris speaks with Stuart Russell and Gary Marcus about recent developments in artificial intelligence and the long-term risks of producing artificial general intelligence (AGI). They discuss the limitations of Deep Learning, the surprising power of narrow AI, ChatGPT, a possible misinformation apocalypse, the problem of instantiating human values, the business model of the Internet, the metaverse, digital provenance, using AI to control AI, the control problem, emergent goals, locking down core values, programming uncertainty about human values into AGI, the prospects of slowing or stopping AI progress, and other topics. If the Making Sense podcast logo in your player is BLACK, you can SUBSCRIBE to gain access to all full-length episodes at samharris.org/subscribe. Learning how to train your mind is the single greatest investment you can make in life. That’s why Sam Harris created the Waking Up app. From rational mindfulness practice to lessons on some of life’s most important topics, join Sam as he demystifies the practice of meditation and explores the theory behind it.

Transcript
Starting point is 00:00:00 In order to access full episodes of the Making Sense Podcast, you'll need to subscribe at samharris.org. There you'll find our private RSS feed to add to your favorite podcatcher, along with other subscriber-only content. We don't run ads on the podcast, and therefore it's made possible entirely through the support of our subscribers. So if you enjoy what we're doing here, please consider becoming one. Okay, before I jump in today, I want to take a moment to address some confusion that keeps coming up. I was on another podcast yesterday and spoke about this briefly, but I thought I might be a little more systematic here. It relates to the paradoxical way that we value expertise in really all fields, and scientific authority in particular. Seems to me there's just a lot of confusion about how this goes. Expertise and authority are unstable, intrinsically so, because the truth of any claim doesn't depend on the credentials of the person making that claim. So a Nobel laureate can be wrong, and a total ignoramus can be right, even if only by accident.
Starting point is 00:01:41 So the truth really is orthogonal to the reputational differences among people. And yet, generally speaking, we are right to be guided by experts, and we're right to be very skeptical of novices who claim to have overturned expert opinion. Of course, we're also right to be alert to the possibility of fraud among so-called experts. There are touted experts who are not who they seem to be. And we're right to notice that bad incentives can corrupt the thinking of even the best experts. So these can seem like contradictions, but they're simply different moments in time, right? The career of reason has to pass through all these points again and again and again. We respect authority, and we also disavow its relevance by turns. We're guided by it until the
Starting point is 00:02:34 moment we cease to be guided by it, or until the moment when one authority supplants another, or even a whole paradigm gets overturned. But all of this gets very confusing when experts begin to fail us, and when the institutions in which they function, like universities and scientific journals and public health organizations, get contaminated by political ideologies that don't track the truth. Now, I've done many podcasts where I've talked about this problem from various angles, and I'm sure I'll do many more because it's not going away, but much of our society has a very childish view of how to respond to this problem. Many, many people apparently believe that just having more unfettered dialogue on social media and on podcasts
Starting point is 00:03:27 and in newsletters is the answer. But it's not. I'm not taking a position against free speech here. I'm all for free speech. I'm taking a position against weaponized misinformation and a contrarian attitude that nullifies the distinction between real knowledge, which can be quite hard won, and ignorance, or mere speculation. And I'm advocating a personal ethic of not pretending to know things one doesn't know. My team recently posted a few memes on Instagram. These were things I had said, I think, on other people's podcasts. And these posts got a fair amount of crazed pushback. Apparently many people thought I was posting these memes myself,
Starting point is 00:04:14 as though I had just left Twitter only to become addicted to another social media platform. But in any case, my team posted these quotes, and my corner of Instagram promptly became as much of a cesspool as Twitter. And then people even took these Instagram memes and posted them back on Twitter so they could vilify me in that context. Needless to say, all of this convinces me again that my life is much better off of social media. But there is some real confusion at the bottom of the response, which I wanted to clarify. So one of the offending Instagram quotes read, during the pandemic, we witnessed the birth of a new religion of
Starting point is 00:04:51 contrarianism and conspiracy thinking, the first sacrament of which is, quote, do your own research. The problem is that very few people are qualified to do this research, and the result is a society driven by strongly held, unfounded opinions on everything from vaccine safety to the war in Ukraine. And many people took offense to that, as though it was a statement of mere elitism, but anyone who has followed this podcast knows that I include myself in that specific criticism. I'm also unqualified to do the, quote, research that so many millions of people imagine they're doing. I wasn't saying that I know everything about vaccine safety or the war in Ukraine. I'm saying that we need experts in those areas to tell us what is real, or likely to be real,
Starting point is 00:05:46 and what's misinformation. And this is why I've declined to have certain debates on this podcast that many people have been urging me to have, and even alleging that it's a sign of hypocrisy or cowardice on my part that I won't have these debates. There are public health emergencies and geopolitical emergencies that simply require trust in institutions. They require that we acknowledge the difference between informed expertise and mere speculation or amateurish sleuthing. And when our institutions and experts fail us, that's not a moment to tear everything down. That's the moment where we need to do the necessary work of making them trustworthy again.
Starting point is 00:06:28 And I admit, in many cases, it's not clear how to do that, at least not quickly. I think detecting and nullifying bad incentives is a major part of the solution. But what isn't a part of the solution at all is asking someone like Candace Owens or Tucker Carlson or even Elon Musk or Joe Rogan or Brett Weinstein or me what we think about the safety of mRNA vaccines or what we think about the nuclear risk posed by the war in Ukraine. Our information ecosystem is so polluted and our trust in
Starting point is 00:07:08 institutions so degraded, again, in many cases for good reason, that we have people who are obviously unqualified to have strong opinions about ongoing emergencies dictating what millions of people believe about those emergencies and therefore dictating whether we as a society can cooperate to solve them. Most people shouldn't be doing their own research, and I'm not saying we should blindly trust the first experts we meet. If you're facing a difficult medical decision, get a second opinion. Get a third opinion.
Starting point is 00:07:43 But most people shouldn't be jumping on PubMed and reading abstracts from medical journals. Again, depending on the topic, this applies to me, too. So the truth is, if I get cancer, I might do a little research, but I'm not going to pretend to be an oncologist. The rational thing for me to do, even with my background in science, is to find
Starting point is 00:08:06 the best oncologists I can find and ask them what they think. Of course, it's true that any specific expert can be wrong or biased, and that's why you get second and third opinions. And it's also why we should be generally guided by scientific consensus, wherever a consensus exists. And this remains the best practice even when we know that there's an infinite number of things we don't know. So while I recognize that the last few years has created a lot of uncertainty and anxiety and given a lot of motivation to contrarianism, And the world of podcasts and newsletters and Twitter threads has exploded as an alternative to institutional sources of information. The truth is we can't do without a culture of real expertise. And we absolutely need the institutions that produce it and
Starting point is 00:08:59 communicate it. And I say that as someone who lives and works and thrives entirely outside of these institutions. So I'm not defending my own nest. I'm simply noticing that Substack and Spotify and YouTube and Twitter are not substitutes for universities and scientific journals and governmental organizations that we can trust. And we have to stop acting like they might be. Now that I got that off my chest, now for today's podcast. Today I'm speaking with Stuart Russell and Gary Marcus. Stuart is a professor of computer science and a chair of engineering at the University
Starting point is 00:09:42 of California, Berkeley. He is a fellow of the American Association for Artificial Intelligence, the Association for Computing Machinery, and the American Association for the Advancement of Science. And he is the author, with Peter Norvig, of the definitive textbook on AI, Artificial Intelligence, A Modern Approach. And he is also the author of the very accessible book on this topic, Human Compatible, Artificial Intelligence and the Problem of Control. Gary Marcus is also a leading voice on the topic of artificial intelligence. He is a scientist, best-selling author, and entrepreneur. He was founder and CEO of Geometric Intelligence, a machine learning company that was acquired by Uber in 2016. And he's also the author
Starting point is 00:10:26 of the recent book, Rebooting AI, along with his co-author, Ernest Davis. And he also has a forthcoming podcast titled Humans vs. Machines. And today we talk about recent developments in AI, chat GPT in particular, as well as the long-term risks of producing artificial general intelligence. We discuss the limitations of deep learning, the surprising power of narrow AI, the ongoing indiscretions of chat GPT, a possible misinformation apocalypse, the problem of instantiating human values in AI, the business model of the internet, the metaverse, digital provenance, using AI to control AI, the control problem, emergent goals, locking down core values, programming uncertainty about human values
Starting point is 00:11:22 into AGI, the prospects of slowing or stopping AI progress, and other topics. Anyway, I found it a very interesting and useful conversation on a topic whose importance is growing by the hour. And now I bring you Stuart Russell and Gary Marcus. Gary Marcus. I am here with Stuart Russell and Gary Marcus. Stuart, Gary, thanks for joining me. Thanks for having us. So I will have properly introduced both of you in the intro, but perhaps you can just briefly introduce yourselves as well. Gary, let's start with you. You're new to the podcast. How do you describe what it is you do and the kinds of problems you focused on? I'm Gary Marcus, and I've been trying to figure out how we can get to a safe AI future.
Starting point is 00:12:16 I'm maybe not looking as far out as Stuart is, but I'm very interested in the immediate future, whether we can trust the AI that we have, how we might make it better so that we can trust it. I've been an entrepreneur. I've been an academic. I've been coding since I was eight years old. So throughout my life, I've been interested in AI and also human cognition and what human cognition might tell us about AI and how we might make AI better. Yeah, I'll add you did your PhD under our mutual friend, Steven Pinker, and you have a wonderful book, Rebooting AI, Building Artificial Intelligence We Can Trust. And I'm told you have a coming podcast later this spring titled Humans vs. Machines, which I'm eagerly awaiting. I'm pretty excited about that. It's going to be fun.
Starting point is 00:13:03 Nice. And you have a voice for radio. Yeah, I know that joke. I'll take it in good spirit. No, that's not a joke. A face for radio is the joke. A voice for radio is high praise. That's right. Thank you.
Starting point is 00:13:19 Stuart, who are you? What are you doing out there? So I teach at Berkeley. I've been doing AI for about 47 years. And I spent most of my career just trying to make AI systems better and better, working in pretty much every branch of the field. And in the last 10 years or so, I've been asking myself, what happens if I or if we as a field succeed in what we've been trying to do, which is to create AI systems that are at least as general in their intelligence as human beings.
Starting point is 00:13:58 And I came to the conclusion that if we did succeed, it might not be the best thing in the history of the human race. In fact, it might be the worst. And so I'm trying to fix that if we did succeed, it might not be the best thing in the history of the human race. In fact, it might be the worst. And so I'm trying to fix that if I can. And I will also add, you have also written a wonderful book, Human Compatible, Artificial Intelligence and the Problem of Control, which is quite accessible. And then you have written an inaccessible book or co-written one, literally the textbook on AI. an inaccessible book or co-written one, literally the textbook on AI. And you've been on the podcast a few times before. So you each occupy different points on a continuum of concern about general AI and the perhaps distant problem of superintelligence.
Starting point is 00:14:39 And Stuart, I've always seen you on the sober side of the worried end. And I've spoken to many other worried people on the podcast and at various events. People like Nick Bostrom, Max Tegmark, Eliezer Yudkowsky, Toby Ord. I've spoken to many other people in private. I've always counted myself among the worried and have been quite influenced by you and your book. and have been quite influenced by you and your book. Gary, I've always seen you on the sober side of the not worried end, and I've also spoken to people who are not worried, like Steve Pinker, David Deutsch, Rodney Brooks, and others. I'm not sure if either of you have moved in the intervening years at all. Maybe we can just start there. We'll start with narrow AI and chat GPT and the explosion of interest on that topic. But I do want us to get to concerns about
Starting point is 00:15:33 where all this might be headed. But before we jump into the narrow end of the problem, have you moved at all in your sense of the risks here? Have you moved at all in your sense of the risks here? There are a lot of things to worry about. I think I actually have moved just within the last month a little bit. So we'll probably disagree about the estimates of the long-term risk. But something that's really struck me in the last month is there's a reminder of how much we're at the mercy of the big tech companies. So my personal opinion is that we're not very close to artificial general intelligence. Not sure Stuart would really disagree, but he can jump in later on that. And I continue to think
Starting point is 00:16:14 we're not very close to artificial general intelligence. But with whatever it is that we have now, this kind of approximative intelligence that we have now, this mimicry that we have now, this kind of approximative intelligence that we have now, this mimicry that we have now, the lessons of the last month or two are we don't really know how to control even that. It's not full AGI that can self-improve itself. It's not sentient AI or anything like that. But we saw that Microsoft had clues internally that the system was problematic, that it gaslighted its customers and things like that. And then they rolled it out anyway. And then initially the press hyped it, made it sound amazing. And then it came out that it wasn't really so amazing.
Starting point is 00:16:54 But it also came out that if Microsoft wants to test something on 100 million people, they can go ahead and do that, even without a clear understanding of the consequences. So my opinion is we don't really have artificial general intelligence now, but this was kind of a dress rehearsal and it was a really shaky dress rehearsal. And that in itself made me a little bit worried. And suppose we really did have AGI and we had no real regulation in place about how to test it. My view is we should treat it as something like drug trials. You want to know about costs and benefits and have a slow release, but we don't have anything like regulation around that. And so that actually pushed me a little bit closer to maybe the worry side of the spectrum. I'm not
Starting point is 00:17:33 as worried maybe as Stuart is about the long-term complete annihilation of the human race that I think Stuart has raised some legitimate concerns about. I'm less worried about that because I don't see AGI as having the motivation to do that. But I am worried about whether we have any control over the things that we're doing, whether the economic incentives are going to push us in the right place. So I think there's lots of things to be worried about. Maybe we'll have a nice discussion about which those should be and how you prioritize them. But there are definitely things to worry about. Yeah. Well, I want to return to that question of motivation, which has always struck me as a red herring. So we'll talk about that when we get to AGI. But Stuart, have you been pushed around at all by recent events or anything else? So actually, there are two recent events.
Starting point is 00:18:22 One of them is ChatGPT, but another one, which is much less widely disseminated, but there was an article in the Financial Times last week, was finding out that the superhuman Go programs that I think pretty much everyone had abdicated any notion of human superiority in Go completely. And that was 2017. And in the five years since then, the machines have gone off into the stratosphere. Their ratings are 1,400 points higher than the human world champion. And 1,400 points in Go or in chess is like the difference between a professional and a five-year-old who's played for a few months. So what's amazing is that we found out that actually an average, good average
Starting point is 00:19:13 human player can actually beat these superhuman Go programs, beat all of them, beat all of them, giving them a nine-stone handicap, which is the kind of handicap that you give to a small child who's learning the game. Isn't the caveat there, though, that we needed a computer to show us that exploit? Well, actually, the story is a little bit more complicated. We had an intuition that the Go programs, because they are circuits, right? The circuit is a very bad representation for a recursively defined function. So what does that mean? So in Go, the main thing that matters is groups of stones. So a group of stones are stones that are connected to each other by vertical and horizontal connections
Starting point is 00:20:00 on the grid. And so that, by definition, is a recursive concept because I'm connected to another stone if there's an adjacent stone to me and that stone is connected to the other stone. And we can write that, I can say it in English, you know, I just did, in sort of one small sentence. I can write it in a program in a couple of lines of Python. I can write it in formal logic in a couple of lines of Python. I can write it in formal logic in a couple of lines. But to try to write it as a circuit is, in some real sense, impossible. I can only do a finite approximation. And so we had this idea that actually the programs didn't really understand what a group
Starting point is 00:20:40 of stones is, and they didn't understand in particular whether a group of stones is going to live or going to die. And we concocted by hand some positions in which we thought that just deciding whether the program needed to rescue its group or whether it could capture the opponent's group, that it would make a mistake because it didn't understand group. And that turned out to be right. So that was... Can I just jump in for one second? Sure. It actually relates to the thing that's on the cover of Perceptrons, which is one of the most famous books in the history of artificial general intelligence. That was an argument by Minsky and that two-layer Perceptrons, which are the historical ancestors of the deep learning systems we have now, couldn't understand some very basic concepts. And in a way, what Stuart and his lab did is a riff
Starting point is 00:21:32 on that old idea. People hate that book in the machine learning field. They say that it prematurely dismissed multilayer networks. And there's an argument there, but it's more complicated than people usually tell. But in any case, I see this result as a descendant of that, showing that even if you get all these pattern recognition systems to work, that they don't necessarily have a deep conceptual understanding of something as simple as a group in Go. And I think it's a profound connection to the history of AI and kind of disturbing that here we are, you know, 50 some years later, and we're still struggling with the same problems. Yeah, I think it's the same point that Minsky was making,
Starting point is 00:22:10 which is expressive power matters, and simple perceptrons have incredibly limited expressive power, but even larger deep networks and so on, in their native mode, they have very limited expressive power. You could actually take a recurrent neural net and use that to implement a Turing machine and then use that to implement a Python interpreter, and then the system could learn all of its knowledge in Python. But there's no evidence that anything like that is going on in the Go program. So the evidence seems to suggest that actually they're not very good at recognizing what a group is and liveness and death, except in the cases, you know, so they've learned sort of
Starting point is 00:22:53 multiple fragmentary, partial, finite approximations to the notion of a group and the notion of liveness. And we just found that we could fool it where we're constructing groups that are somewhat more complex than the kinds that typically show up. And then, as Gary said, sorry, Sam, as you said, there is a program that we used to explore whether we could actually find this occurring in a real game. Because these were contrived positions that we had by hand, and we couldn't force the game to go in that direction. And indeed, when we started running this program, we've called it sort of an adversarial program. It's just supposed to find ways of beating one particular Go program called CataGo.
Starting point is 00:23:43 Indeed, it found ways of generating groups, kind of like a circular sandwich. So you start with a little group of your pieces in the middle, and then the program, the computer program, surrounds your pieces to prevent them from spreading. And then you surround that surrounding. So you make a kind of circular sandwich. And it simply doesn't realize that its pieces are going to die because it doesn't understand what is the structure of the groups. And it has many opportunities to rescue them and it pays no attention. And then you capture 60 pieces and it's lost the game. And this was something that we saw our adversarial program doing, but then a human can look at that and say,
Starting point is 00:24:23 oh, okay, I can make that happen in a game. And so one of our team members is a good Go player, and he played this against CataGo, which is the best Go program in the world, and beat it easily and beat it with a nine-stone handicap. But also, it turns out that all the other Go programs, which were trained by completely different teams using different methods and different network structures and all the rest, they all have the same problem. They all fail to recognize this circular sandwich and lose all their pieces. So it seems to be not just an accident. It's not sort of a peculiar hack that we found for one particular program. It seems to be a qualitative failure of these networks to generalize properly. And in that sense, it's somewhat similar to adversarial images, where we found that these systems that are supposedly superhuman at recognizing objects are extremely vulnerable to making tiny tweaks in images that
Starting point is 00:25:20 are, you know, those tweaks are totally invisible to a human, but the system changes its mind and says, oh, that's not a school bus, it's an ostrich, right? And it's, again, a weakness in the way the circuits have learned to represent the concepts. They haven't really learned the visual concept of a school bus or an ostrich, because they're obviously, for a human, not confusable. And this notion of expressive power is absolutely central to computer science. We use it all over the place when we talk about compilers and we talk about the design of hardware. If you use an inexpressive representation and you try to represent a given concept,
Starting point is 00:26:03 you end up with an enormous and ridiculously overcomplicated representation of that concept. And that representation, you know, let's say it's the rules of Go in an expressive language like Python, that's a page. In an inexpressive language like circuits, it might be a million pages. So to learn that million page representation of the rules of Go requires billions of experiences. And the idea that, oh, well, we'll just get more data and we'll just build a bigger circuit. And then we'll be able to, you know, learn the rules properly. That just does not scale. The universe doesn't have enough data
Starting point is 00:26:42 in it. And we can't, you know, there's not enough material in the universe to build a computer big enough to achieve general intelligence using these inexpressive representations. So I'm with Gary, right? I don't think we're that close to AGI. And I've never said AGI was imminent. You know, generally, I don't answer the question, when do I think it's coming? But I am on the record because someone violated the off-the-record rules of the meeting. Someone plied you with scotch?
Starting point is 00:27:14 No, they literally just broke... I was at a Chatham House rules meeting, and I literally prefaced my sentence with, off the record. And 20 minutes later, it appears on the Daily Telegraph website. So anyway, so I was, you know, the Daily Telegraph, you can look it up. What I actually said was, I think it's quite likely to happen in the lifetime of my children, right? Which you could think of as another way of like, sometime in this century. Before we get into that, can I jump into, sort of wrap up Stuart's point? Because I agree with him. It was a profound result from his lab. There's some people arguing about particular Go programs and so forth, but I wrote an article about Stuart's result called David Beats Goliath. It was on my sub stack. And I'll just read a
Starting point is 00:28:02 paragraph and maybe we can get back to why it's a worry. So Kellen Pellrine, I guess, is the name of the player who actually beat the Go program. And I said, his victory is a profound reminder that no matter how good deep learning, data-driven AI looks when it is trained on an immense amount of data, we can never be sure that systems of this sort really can extend what they know to novel circumstances. We see the same problem, of course, with the many challenges that have stymied the driverless car industry and the batshit crazy errors we've been seeing with the chatbots in the last week. And so that piece also increased my worry level. It's a reminder that these things are almost like aliens. We think we understand them like, oh, that thing knows how to play Go. But there are these little weaknesses there, some of which turn into adversarial attacks,
Starting point is 00:28:49 and some of which turn into bad driving, and some of which turn into mistakes on chatbots. I think we should actually separate out genuine artificial general intelligence, which maybe comes in our lifetimes and maybe doesn't, from what we have now, which is this data-driven thing that, as Stuart would put, is like a big circuit, we don't really understand what those circuits do, and they can have these weaknesses. And so, you know, you talk about alignment or something like that. If you don't really understand what the system does and what weird circumstances it might break down in, you can't really be that confident around alignment for that system. in, you can't really be that confident around alignment for that system.
Starting point is 00:29:26 Yeah, I totally agree. This is an area that my research center is now putting a lot of effort in is if we're going to control these systems at all, we've got to understand how they work. We've got to build them according to much more, I guess, traditional engineering principles where the system is made up of pieces and we know how the pieces work we know how they fit together and we can prove that the whole thing does what it's supposed to do and there's plenty of technological elements available from the history of AI that I think can move us forward in ways where we understand what the system is doing. But I think the same thing is happening in GPT in terms of failure to generalize, right?
Starting point is 00:30:09 So it's got millions of examples of arithmetic. You know, 28 plus 42 is what? 70, right? And yet, despite having millions of examples, it's completely failed to generalize. So if you give it a three or four digit addition problem that it hasn't seen before, and particularly ones that involve carrying, it fails. I think it can actually, just to be accurate, I think it can do
Starting point is 00:30:38 three and four addition addition to some extent. It completely fails on multiplication at three or four digits, if we're talking about Minerva, which is, I think, the state of the art. To some extent, yeah. But I think it works when you don't need to carry because I think it has figured out that eight plus one is nine because it's got a few million examples of that. But when it involves carrying or you get to more digits outside the training set, it hasn't extrapolated correctly. It hasn't learned. The same with chess. It's got lots and lots of grandmaster chess games in its database. But it thinks of the game as a sequence of notation, like in A4, D6, knight takes C3, B3, B5, right? That's what a chess game looks
Starting point is 00:31:28 like when you write it out as notation. It has no idea that that's referring to a chess board with pieces on it. It has no idea that they're trying to checkmate each other. And you start playing chess with it, it'll just make an illegal move because it doesn't even understand what is going on at all. And the weird thing is that almost certainly the same thing is going on with all the other language generation that it's doing. It has not figured out that the language is about a world and the world has things in it. And there are things that are true about the world. There are things that are false about the world. The world has things in it, and there are things that are true about the world.
Starting point is 00:32:04 There are things that are false about the world. And if I give my wallet to Gary, then Gary has my wallet. And if he gives it back to me, then I have it and he doesn't have it. It hasn't figured out any of that stuff. I completely agree. I think that people tend to anthropomorphize, and I'd actually needle Stuart a little bit and say he used words like think and figured out. These systems never think and figure out. They're just finding close approximations to the text that they've seen. And it's very hard for someone who's not tutored in AI to really get that, to look at it, see this very well-formed output, and realize that it's
Starting point is 00:32:41 actually more like an illusion than something that really understands things. So Stuart is absolutely right. It can talk about me having a wallet or whatever, but it doesn't know that there's a me out there, that there's a wallet out there. It's hard for people to grasp that, but that's the reality. And so when it gets the math problem right, people are like, it's got some math. And then it gets one wrong. They're like, oh, I guess it made a mistake.
Starting point is 00:33:02 But really, it never got the math. It finds some bit of text that's close enough some of the time that it happens to have the right answer and sometimes not. Well, I want to return to that point, but I think I need to back up for a second and define a couple of terms just so that we don't lose people. I realize I'm assuming a fair amount of familiarity with this topic from people who've heard previous podcasts on it, but it might not be fair. So quickly, we have introduced a few terms here. We've talked about narrow AI, general AI or AGI, or artificial general intelligence, and super intelligence. And those are interrelated concepts. Stuart, do you just want to break those apart and suggest what we mean by them?
Starting point is 00:33:52 Sure. So narrow AI is the easiest to understand because that typically refers to AI systems that are developed for one specific task. For example, playing Go or translating French into English or whatever it might be. And AGI, or artificial general intelligence, or sometimes called human level artificial intelligence, or general purpose artificial intelligence, would mean AI systems that can quickly learn to be competent in pretty much any kind of task to which maybe to which the human intellect is relevant, and probably a lot more besides. And then artificial superintelligence, or ASI, would mean systems that are far superior to humans in all these aspects. And I think there's something worth mentioning briefly about narrow AI. A lot of commentators talk as if working on narrow AI doesn't present any kind of risk or problem
Starting point is 00:34:47 because all you get out of narrow AI is a system for that particular task. And you could make 100 narrow AI systems and they would all be little apps on your laptop and none of them would present any risk because all they do is that particular task. I think that's a complete misunderstanding of how progress happens in AI. So let me give you an example. Deep learning, which is the basis for the last decade of exploding AI capabilities, emerged from a very, very narrow AI application, which is recognizing handwritten digits on checks at Bell Labs in the 1990s. And you can't really find a more narrow application than that. But whenever a good AI researcher works on a narrow task, and it turns out that the task is not solvable by existing methods, they're likely to push on methods, right, to come up with more
Starting point is 00:35:48 general, more capable methods. And those methods will turn out to apply to lots of other tasks as well. So it was Jan LeCun who was working in the group that worked on these handwritten digits. And he didn't write a little program that sort of follows the S around and says, okay, I found one bend. Okay, let me see if I can find another bend. Okay, good. I've got a left bend and a right bend, so it must be an S, right? That would be a very hacky, very non-general, very brittle way of doing handwritten recognition. What he did was he just developed a technique for training deep networks that had various kinds of invariances about images. For example, an S is an S no matter where it appears in the image. You can build that into the structure of
Starting point is 00:36:38 the networks, and then that produces a very powerful image recognition capability that applies to lots of other things, turned out to apply to speech, and in a slightly different form, is underlying what's going on in chat GPT. So don't be fooled into thinking that as long as people are working on narrow AI, everything's going to be fine. If I could just jump in also on the point of general intelligence and what that might look like. Chat is interesting because it's not as narrow in some ways as most traditional narrow AI. And yet it's not really general AI either. It doesn't perfectly fit into the categories. And let me explain what I mean by that. So a typical narrow AI is I will fold proteins or I will play chess or something like that.
Starting point is 00:37:25 It really does only one thing well. Anybody who's played with ChatGPT realizes it does many things, maybe not super well. It's almost like a jack of all trades and a master of none. You can talk to it about chess, and it will play okay chess for a little while, and then, as Stuart points out, probably eventually break the rules because it doesn't really understand them. Or you can talk to it about word problems in math and it will do some of them correctly and get some of them wrong. Almost anything you want to do, not just one thing like say chess, it can do to some extent, but it never really has a good representation of any of
Starting point is 00:37:59 those and so it's never really reliable at any of them. As far as I know, there's nothing that chat GPT is fully reliable at, even though it has something that looks a little like generality. And obviously, when we talk about artificial general intelligence, we're expecting something that's trustworthy and reliable that could actually play chess, let's say, as well as humans or better than them or something like that. They could actually do word problems as well as humans or better than that and so forth. And so it gives like an illusion of generality, but it's so superficial because of the way it works in terms of approximating bits of text that it doesn't really deliver on the promise of being
Starting point is 00:38:36 what we really think of as an artificial general intelligence. Yes. Okay, so let's talk more about the problems with narrow AI here. And we should also add that most narrow AI, although chat GPT is perhaps an exception here, is already, insofar as we dignify it as AI and implement it, it's already superhuman, right? I mean, your calculator is superhuman for arithmetic, and there are many other forms of narrow AI that perform better than people do. And one thing that's been surprising of late, as Stuart just pointed out, is that superhuman AI of certain sorts, like our best GoPlaylan programs, have been revealed to be highly imperfect such that they're less than human in specific instances. And these instances are surprising and can't necessarily be foreseen in advance. And therefore, it raises this question of, as we implement narrow AI, because it is superhuman, it seems that we might always be surprised by its failure modes
Starting point is 00:39:46 because it lacks common sense. It lacks a more general view of what the problem is that it's solving in the first place. And so that obviously poses some risk for us. If I could jump in for one second, I think the cut right there actually has to do with the mechanism. So a calculator really is superhuman. We're not going to find an Achilles heel where there's some regime of numbers that it can't do within what it can represent. And the same thing with Deep Blue. I'd be curious if Stuart disagrees, but I think Deep Blue is going to be able to beat any human in chess, and it's not clear that we're actually going to find an Achilles heel. But when we talk about deep learning driven systems, they're very heavy on the big data
Starting point is 00:40:27 or using these particular techniques, they often have a pretty superficial representation. Stewart's analogy there was a Python program that's concise. We know that it's captured something correctly versus this very complicated circuit that's really built by data. And when we have these very complicated circuits built by data, sometimes they do have Achilles heel. So some narrow AI, I think we can be confident of. So GPS systems that navigate turn by turn, there's some problems like the map could be
Starting point is 00:40:57 out of date, there could be a broken bridge. But basically, we can trust the algorithm there. Whereas these go things, we don't really know how they work. We kind of do. And it turns out sometimes they do have these Achilles heels that are in there. And those Achilles heels can mean different things in different contexts. In one context, it means, we can beat it at go and it's a little bit surprising. In another context, it means that we're using it to drive a car and there's a jet there and that's not in the training set. And it doesn't really understand that you don't run into large objects and doesn't know what to do with a jet
Starting point is 00:41:27 and it actually runs into the jet. So the weaknesses can manifest themselves in a lot of different ways. And some of what I think Stuart and I are both worried about is that the dominant paradigm of deep learning often has these kind of gaps in it. Sometimes I use the term pointillistic. They're like collections of many points in some cloud. And if you come close enough to the points in the cloud, they usually do what you expect. But if you move outside of it, sometimes people call it distribution shift, to a different point, then they're kind of unpredictable. So in the example of math that Stuart and I both like, it'll get a bunch of math problems that are kind of near the points in the
Starting point is 00:42:04 cloud where it's got experience in. And then you move to four-digit multiplication and the cloud is sparser. And now you ask a point that's not next to a point that it knows about, it doesn't really work anymore. And so this illusion, oh, it learned multiplication. Well, no, it didn't. It just learned to jump around these points in this cloud. And that has an enormous level of unpredictability that makes it hard for humans to reason about what these systems are going to do. And surely there are safety consequences that arise from that. And something else Stuart said that I really appreciated is in the old days, in classical AI, we had engineering techniques around these. You
Starting point is 00:42:40 built modules, you knew what the modules did. There were problems then too. I'm not saying it was all perfect, but the dominant engineering paradigm right now is just get more data if it doesn't work. And that's still not giving you transparency into what's going on. And it can be hard to debug. And so like, okay, now you built this Go system and you discover it can't beat humans doing this thing.
Starting point is 00:43:00 What do you do? Well, now you have to collect some data pertaining to that. But is it going to be general? You kind of have no way to know. Maybe there'll be another attack tomorrow. That's what we're seeing in the driverless car industry is like their adversaries may be of a different sort. They're not deliberate, but you find some error and then people try to collect more data, but there's no systematic science there. You can't tell me, are we a year away or 10 years away or 100 years away from driverless
Starting point is 00:43:26 cars by kind of plotting out what happens? Because most of what matters are these outlier cases. We don't have metrics around them. We don't have techniques for solving them. And so it's this very empirical, we'll try stuff out and hope for this best methodology. And I think Stuart was reacting to that before. And I certainly worry about that a lot, that we don't have a sound methodology where we know, hey, we're getting closer here, and we know that we're not going to deliver us to the promised land of AGI, whether aligned with our interests or not. It's just we need more to actually be able to converge on something like general intelligence, because these networks, as powerful as they seem to be in certain cases, they're exhibiting obvious failures of abstraction, and they're not learning the way humans learn, and we're discovering these failures,
Starting point is 00:44:31 perhaps to the comfort of people who are terrified of the AGI singularity being reached. Again, I want to keep focusing on the problems and potential problems with narrow AI. So there's two issues here. There's narrow AI that fails, that doesn't do what it purports to do. And then there's just narrow AI that is applied in ways that prove pernicious, intentionally or not, you know, bad actors or good actors, you know, reaching unintended consequences. Let's focus on chat GPT for another moment or so, or things like chat GPT. Many people have pointed out that this seems to be potentially a thermonuclear bomb of misinformation. We already have such an enormous misinformation problem just letting the apes concoct it. Now we have created a technology that makes the cost of producing nonsense and nonsense that passes for knowledge almost go to zero. What are your concerns about where this is all headed, where narrow AI of this sort is headed in both of its failure modes? It's failure to do what it's attempting to do, that is,
Starting point is 00:45:45 it's making inadvertent errors, or it's just, it's failure to be applied, you know, ethically and wisely, and we, however effective it is, we plunge into the part of the map that is just, you know, bursting with unintended consequences. Yeah, I find all of this terrifying. It's maybe worth speaking for a second just to separate out two different problems you kind of hinted at it. So one problem is that these systems hallucinate. Even if you give them clean data, they don't keep track of things like the relations between subjects and predicates or entities and their properties. And so they can just make stuff up. So an example of this is a system can say that Elon Musk died in a car crash in 2018.
Starting point is 00:46:27 That's a real error from a system called Galactica. And that's contradicted by the data in the training set. It's contradicted by things you could look up in the world. And so that's a problem where these systems hallucinate. Then there's a second problem, which is that bad actors can induce them to make as many copies or variants, really, of any specific misinformation that they might want. So if you want a QAnon perspective on the January 6th events, well, you can just have the system make that, and you can have it make 100 versions of it. Or if you want to make up propaganda about COVID and vaccines, you can make up 100 versions, each mentioning studies in Lancet and JAMA with data and so forth. All of the data
Starting point is 00:47:08 made up, the study's not real. And so for a bad actor, it's kind of a dream come true. So there's two different problems there. On the first problem, I think the worst consequence is that these chat style search engines are going to make up medical advice. People are going to take that medical advice and they're going to get hurt. On the second one, I think what's going to get hurt is democracy because the result is going to be there's so much misinformation, nobody's going to trust anything. And if people don't trust that there's some common ground, I don't think democracy works. And so I think there's a real danger to our social fabric there. So both of these issues really matter. And it comes down to, in the end, that if you have systems that approximate the world, but have no real representation of the world at all, they can't validate what they're saying. So they can be abused, they can make mistakes. It's not a great basis, I think, for AI. It's certainly not what I had hoped for.
Starting point is 00:48:00 Stuart? So I think I have a number of points, but I just wanted to sort of go back to something you were saying earlier about the fact that the current paradigm may not lead to the promised land. I mean, I think that's true. I think some of the properties of chat GPT have made me less confident about that claim because it's an empirical claim. As I said, sufficiently large circuits with sufficient recurrent connections can implement Turing machines and can learn these higher level, more expressive representations and build interpreters for them. They can emulate them. They don't really learn them. Well, no, they can actually, they can do that, right? I mean, think about your laptop.
Starting point is 00:48:44 Your laptop is a circuit, but it's a circuit that supports these higher level abstractions. Your brain is a circuit, but it's a... That's right. It's a question of representation versus learning. Right. A circuit that supports that. So it can learn those internal structures which support representations that are more expressive and can then learn in those more expressive representations. So theoretically, it's possible that this can happen. Well, but what we always see in reality is your example before about the four-digit arithmetic, like the systems don't, in fact, converge on the sufficiently expressive representations. They just always converge on these things that are more like masses of conjunctions of different
Starting point is 00:49:24 cases, and they leave stuff out. So I'm not saying no learning system could do that, but these learning systems don't. Well, we don't know that, right? We see some failures, but we also see some remarkably capable behaviors that are quite hard to explain as just sort of stitching together bits of text from the training set. I mean, I think we're going to disagree there. It's up to Sam how far he wants us to go down that rabbit hole. Well, actually, let's just spell out the point that's being made. I also don't want to lose Stuart's reaction to his general concerns about narrow AI, but I think this is an interesting point intellectually. So yes, there's some
Starting point is 00:50:06 failure to use symbols or to recognize symbols or to generalize. And it's easy to say things like, you know, here's a system that is playing Go better than any person, but it doesn't know what Go is, or it doesn't know there's anything beyond this grid. It doesn't recognize the groups of pieces, etc. But on some level, the same can be said about the subsystems of the human mind, right? I mean, like, you know, yes, we use symbols, but the level at which symbol use is instantiated in us, in our brains, is not itself symbolic, right? I mean, there is a reduction to some piecemeal architecture. I mean, there's just atoms in here, right?
Starting point is 00:50:53 Of course. At the bottom is just atoms, and the same is true of a laptop. There's nothing magical about having a meat-based computer. In the case of your laptop, if you want to talk about something like, I don't know, the folder structure in which you store your files, it actually grounds out and computer scientists can walk you through the steps, we could do it here if you really wanted to, of how you get from a set of bits to a hierarchical directory structure. And that hierarchical directory structure can then be computed over. So you can, for example, move a subfolder to inside of another subfolder. And we all know the algorithms for how to do that.
Starting point is 00:51:33 But the point is that the computer has essentially a model of something and it manipulates that model. So it is a model of where these files are, or representation might be a better word in that case. Humans have models of the world. So I have a model of the two people that I'm talking to and their backgrounds and their beliefs and desires to some extent. It's going to be imperfect, but I have such a model. And what I would argue is that a system like ChatGPT doesn't really have that. And in any case, even if you could convince me that it does, which would be a long uphill battle, we certainly don't have access
Starting point is 00:52:01 to it so that we can use it in reliable ways in downstream computation. The output of it is a string, whereas in the case of my laptop, we have very rich representations. I'll ignore some stuff about virtual memory that make it a little bit complicated, and we can go dig in and we know which part of the representation stands for a file and what stands for a folder and how to manipulate those and so forth. We don't have that in these systems. What we have is a whole bunch of parameters, a whole bunch of text, and we kind of hope for the best. Yeah, so I'm not disagreeing that we don't understand how it works.
Starting point is 00:52:37 But by the same token, given that we don't understand how it works, it's hard to rule out the possibility that it is developing internal representational structures, which may be of a type that we wouldn't even recognize if we saw them. They're very different. And we have a lot of evidence that bears on this. For example, all of the studies of arithmetic or Guy Van der Broek's work on reasoning, where if you control things, the reasoning doesn't work properly. In any domain where we can look, or math problems, or anything like that, we always see spotty performance. We always see hallucinations. They always point to there not being a deep, rich, underlying representation of any phenomena that we're talking about.
Starting point is 00:53:19 So from my mind, yes, you can say there are representations there, but they're not like world models. They're not world models that can be reliably interrogated and acted on. And we just see that over and over again. Okay, I think we're going to just agree to disagree on that. But the point I wanted to make was that if Gary and I are right, and we're really concerned about the existential risk from AGI, we should just keep our mouth shut, right? We should let the world continue along this line of bigger and bigger deep circuits. Well, yeah, I think that's the really interesting question I wanted your take on, Stuart. And it
Starting point is 00:53:58 goes back to the word Sam used about promised land. And the question is, is AGI actually the promised land we want to get to? So I've kind of made the argument that we're living in a land of very unreliable AI and said, there's a bunch of consequences for that. Like we have chat search, it gives bad medical advice, somebody dies. And so I have generally made the argument, but I'm really interested in Stuart's take on this, that we should get to more reliable AI where it's transparent, it's interpretable, it kind of does the things that we expect. So if we ask it to do four-digit arithmetic, it's going to do that, which is kind of the classical computer programming paradigm
Starting point is 00:54:34 where you have subroutines or functions and they do what you want to do. And so I've kind of pushed towards, let's make the AI more reliable. And there is some sense in which that is more trustworthy, right? You know that's going to do this computation. But there's also a sense in which maybe things go off the rail at that point that I think Stuart is interested in. So Stuart might make the argument, let's not even get to AGI. I'm like, hey, we're in this lousy point with this unreliable AI. Surely it must be better if we get to reliable AI. But Stuart, I think, sees somewhere along the way where we get to a transition where, yes, it reliably does its computations, but also it poses a new set of risks. Is that right, Stuart? And do you want to
Starting point is 00:55:15 spell that out? I mean, if we believe that building bigger and bigger circuits isn't going to work, and instead we push resources into, let's say, methods based on probabilistic programming, which is a symbolic kind of representation language that includes probability theory, so it can handle uncertainty, it can do learning, it can do all these things. But there are still a number of restrictions on our ability to use probabilistic programming to achieve AGI. But suppose we say, okay, fine, well, we're going to put a ton of resources into this much more engineering-based, semantically rigorous component composition kind of technological approach. And if we succeed,
Starting point is 00:56:00 right, we still face this problem that now you build a system that's actually more powerful than the human race. How do you have power over it? And so I think the reason to just keep quiet would be give us more time to solve the control problem before we make the final push towards AGI. Against that... And if I'm being intellectually honest here, I don't know the right answer there. So I think we can, for the rest of our conversation, take probabilistic programming as kind of standing for the kinds of things that might produce more reliable systems like I'm talking about. There are other possibilities there, but it's fine for present purposes. The question is, if we could get to a land of probabilistic programming that at least is transparent, it generally does the things we expect it to do, is that better or worse than
Starting point is 00:56:50 the current regime? And Stuart is making the argument that we don't know how to control that either. I mean, I'm not sure we know how to control what we've got now, but that's an interesting question. Yeah. So let me give you a simple example of systems that are doing exactly what they were designed to do and having disastrous consequences. And that is the recommender system algorithm. So in social media, let's take YouTube, for example. When you watch a video on YouTube, it loads up another video for you to watch next. How does it choose that? Well, that's the learning algorithm
Starting point is 00:57:25 and it's watched the behavior of millions and millions of YouTube users and which videos they watch when they're suggested and which videos they ignore or watch a different video or even check out of YouTube altogether. And those learning algorithms are designed to optimize engagement, right? How much time you spend on the platform, how many videos you watch, how many ads you click on, and so on.
Starting point is 00:57:51 And they're very good at that. So it's not that they have unpredictable failures, like they sort of get it wrong all the time. And they don't really have to be perfect anyway, right? They just have to be considerably better than just loading up a random video. And the problem is that they're very good at doing that. But that goal of engagement is not aligned with the interests of the users. And the way the algorithms have found to maximize engagement is not just to pick the right next video, but actually to pick a whole sequence of videos that will turn you into a more predictable victim. And so they're literally
Starting point is 00:58:32 brainwashing people so that once they're brainwashed, the system is going to be more successful at keeping them on the platform. I completely agree. They're like drug dealers. And so this is the problem, right? That if we made that system much better, maybe using probabilistic programming, if that system understood that people exist and they have political opinions, if the system understood the content of the video, then it would be much, much more effective at this brainwashing task that it's been set by the social media companies. And that would be disastrous, right? It wouldn't be a promised land. It would be a disaster. So Stuart, I agree with that example in its entirety. And I
Starting point is 00:59:17 think the question is what lessons we draw from it. So I think that has happened in the real world. It doesn't matter that they're not optimal at it. They're pretty good and they've done a lot of harm. And those algorithms we do actually largely understand. So I accept that example. It seems to me that if you have AGI, it can certainly be used for good purposes or bad purposes. That's a great example where it's to the good of the owner of some technology and the bad of society. I could envision an approach to that, and I'm curious what you think about it. And it doesn't really matter whether we have decent AI or great AI in the sense of being able to do what it's told to do. There's already a problem now. You could imagine systems that could compute the
Starting point is 01:00:01 consequences for society, sort of an Asimov's-laws approach, maybe taken to an extreme, and say, hey, I'm just not recommending that you do this. The strong version just wouldn't do it, and the weak version would say, hey, here's why you shouldn't do it. This is going to be the long-term consequence for democracy. That's not going to be good for your society. We have an axiom here that democracy is good. So one possibility is to say, if we're going to build AGI, it must be equipped with the ability to compute consequences and represent certain values and reason over them. What's your take on that, Stuart?
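To make the contrast concrete, here is a minimal, hypothetical sketch in Python. Everything in it is invented for illustration: the titles, the engagement and harm scores, the threshold. It shows the difference between a recommender that maximizes engagement alone, as in Stuart's YouTube example, and one wrapped in the kind of consequence check Gary is proposing, where items whose estimated societal cost crosses a budget are vetoed before engagement is optimized. The `harm` numbers stand in for whatever consequence computation such a system would have to perform, and as Stuart goes on to argue, writing that function down is exactly the hard part.

```python
# Toy illustration only: an engagement-only recommender versus one with a
# "consequence guard." All scores are invented placeholders, not real data.

CATALOG = [
    {"title": "Cute otters",             "engagement": 0.41, "harm": 0.01},
    {"title": "Mild political take",     "engagement": 0.55, "harm": 0.10},
    {"title": "Outrage-bait conspiracy", "engagement": 0.83, "harm": 0.90},
]

def recommend_engagement_only(catalog):
    """What Stuart describes: maximize predicted engagement and nothing else."""
    return max(catalog, key=lambda v: v["engagement"])

def recommend_with_guard(catalog, harm_budget=0.3):
    """Gary's 'strong version': refuse items whose estimated societal cost
    exceeds a budget, then maximize engagement over what remains."""
    allowed = [v for v in catalog if v["harm"] <= harm_budget]
    if not allowed:
        return None  # the 'weak version' would instead explain why it refuses
    return max(allowed, key=lambda v: v["engagement"])

print(recommend_engagement_only(CATALOG)["title"])  # Outrage-bait conspiracy
print(recommend_with_guard(CATALOG)["title"])       # Mild political take
```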
Starting point is 01:00:35 Well, that assumes that it's possible for us to write down, in some sense, the utility function of the human race. We can see the initial efforts there in how they've tried to put guardrails on ChatGPT, where you ask it to utter a racial slur and it won't do it even if the fate of humanity hangs in the balance, right? So that, like, insofar as you... Yeah, I mean, that's not really true. That's a particular example in a particular context. You can still get whatever horrible thing you want out of it. The point being, right, we've not been very successful. We've been trying to write tax law for 6,000 years.
Starting point is 01:01:15 We still haven't succeeded in writing tax law that doesn't have loopholes. Right. I always worry about a slippery-slope argument at this point. So it is true, for example, that we're not going to get uniform consensus on values, that we've never made a tax code work, but I don't think we want anarchy either. And I think the state that we have now is either you have systems with no values at all that are really reckless, or you have the kind of guardrails based on reinforcement learning that are very sloppy and don't really do what you want them to do. Or, in my view, we look behind door number three, which is uncomfortable in itself, but which is to do the best we can to have some kind of consensus values and try to work according
Starting point is 01:01:56 to those consensus values. Well, I think there's a door number four, and I don't think door number three works, because really there are sort of infinitely many ways to write the wrong objective. And you can say that about society too. No, but we're not doing great, but we're, you know, better than anarchy. I mean, it's the Churchill line about democracy, you know, the best of some lousy options we've tried. That's because individual humans tend to be of approximately equal capability. And if one individual human starts doing really bad things, then the other ones sort of tend to react and squish them out. It doesn't always work. We've certainly had near-total disasters, even with humans of average ability. But once we're talking about
Starting point is 01:02:48 AI systems that are far more powerful than the entire human race combined, then the human race is in the position that, as Samuel Butler put it in 1872, the beasts of the field are in with respect to humans: we would be entirely at their mercy. Can we separate two things, Stuart, before we go to door number four? Yes. Which are intelligence and power. So I think, for example, our last president was not particularly intelligent. He wasn't stupid, but he wasn't the brightest in the world. But he had a lot of power, and that's what made him dangerous: the power, not the sheer intellect. And so sometimes I feel like in these conversations, people confound superintelligence with what
Starting point is 01:03:28 a system is actually enabled to do, with what it has access to do, and so forth. At least I think it's important to separate those two out. So I worry about even dumb AI like we have right now having a lot of power. There's a startup that wants to attach all the world's software to large language models, and there's a new robot company that I'm guessing is powering their humanoid robots with large language models. That terrifies me.
Starting point is 01:03:53 Maybe not on the existential threat to humanity level, but the level of there's going to be a lot of accidents because those systems don't have good models. So Gary, I completely agree, right? I mean, I wrote an op-ed called We Need an FDA for Algorithms about six years ago, I think. No, I need to read it. Sorry, Stuart.
Starting point is 01:04:13 I think we should hold the conversation on AGI for a moment yet, but I would just point out that that separation of concepts, intelligence and power, might only run in one direction, which is to say that, yes, for narrow AI, you can have it become powerful or not, depending on how it's hooked up to the rest of the world. But for true AGI that is superhuman, one could wonder whether or not intelligence of that sort can be constrained. I mean, then you're in relationship to this thing that, you know, are you sufficiently smart to keep this thing that is much smarter than you from doing whatever it intends to do? One could wonder. I've never seen an argument that compels me to think that that's not possible.
Starting point is 01:04:54 I mean, like, Go programs have gotten much smarter, but they haven't taken more power over the world. No, but they're not general. Honestly, Gary, that's a ridiculous example, right? So suppose you're a gorilla. Wait, wait, wait, Stuart. Before we plunge in, I want to get there. I want to get there. I just want to first extract whatever lessons we can from this recent development in narrow AI. And then I promise you, we're going to be able to fight about AGI in a mere matter of minutes. But Stuart, so we have got a few files open here. I just want to acknowledge them. One is you suggested that if, in fact, you think that this path of throwing more and
Starting point is 01:05:33 more resources into deep learning is going to be a dead end with respect to AGI and you're worried about AGI, maybe it's ethical to simply keep your mouth shut or even cheerlead for the promise of deep learning so as to stymie the whole field for another generation while we figure out the control problem. Did I read you correctly there? I think that's a possible argument that has occurred to me and people have put it to me as well. Okay. It's a difficult question. I think it's hard to rule out the possibility that the present direction will eventually pan out, but it would pan out in a much worse way because it would lead to systems that were extremely powerful, but whose operation we completely didn't understand,
Starting point is 01:06:21 where we have no way of specifying objectives to them or even finding out what objectives they're actually trying to pursue because we can't look inside. We don't even understand their principles of operation. So, okay, so let's table that for a moment. We're going to talk about AGI, but on this issue of narrow AI getting more and more powerful, I'm especially concerned, and I know Gary is, about the information space. And again, because I just view what ordinary engagement with social media is doing to us as more or less entirely malignant. And the algorithms, as simple as they are and as diabolically effective as they are, have already proven sufficient to test the very fabric of society and the long-term prospects of democracy. But there are other elements here. So for instance, the algorithm is effective and
Starting point is 01:07:22 employed in the context of what I would consider the perverse incentives of the business model of the internet, the fact that everything is based on ads, which creates the logic of endlessly gaming people's attention. If we solved that problem, if we decided, okay, it's the ad-based model that's pernicious here, would the very narrow problem you pointed to with YouTube algorithms, say, go away? Or are you still just as worried, for some new reason, about that problem? My view, and then Stuart can jump in with his, my view is that the problems around the information space and social media are not going away anytime soon, that we need to build new technologies to detect misinformation.
Starting point is 01:08:12 We need to build new regulations to make a cost for producing it in a wholesale way. And then there's this whole other question, which is like, right now, maybe Stuart and I could actually agree that we have this sort of mediocre AI, can't fully be counted on, has a whole set of problems that goes with it. And then really, the question is, there's a different set of problems if you get to an AI that could, for example, say for itself, you know, I don't really want to be part of your algorithm because your algorithm is going to have these problems. That opens a whole new can of worms.
Starting point is 01:08:47 I think Stuart is terrified about them, and I'm not so sure as to not be worried about them. I'm a little bit less concerned than Stuart, but I can't in all intellectual honesty say that no problems lie there. Maybe there are problems that lie there. Long before we get an algorithm that can reflect in that way, just imagine a fusion of what we almost have with ChatGPT and deepfake video technology, right? So you can just get endless content that is a persuasive simulacrum of real figures saying
Starting point is 01:09:23 crazy things. Yeah. This is minutes away, not years away. I have an editor at a major paper. He's not an ordinary editor. He has a special role, but at a paper everybody knows. He read a debate that I got into on Twitter two days ago where I said, I'm worried about these things. And somebody said, ah, it's not a problem. And he showed me how, in like four minutes, he could make a fake story about, like, Antifa protesters having caused the January 6th thing, using his company's templates and an image from Midjourney. And it looked completely authentic. Like, this can be done at scale right now. There are dissemination questions, but we know that, for example, you know, Russia has used armies of
Starting point is 01:10:03 troll farms and lots of, you know, iPhones and fake accounts and stuff like that. So this is like an imminent problem. It will affect the 2024 election. It's a past problem, right? It's an ongoing problem. It is here. It's already happened many times. If you go on Google News and look at the fact-check section, just in the last day there have been faked videos of President Biden saying that all 20- to 22-year-olds in the United States will be drafted to fight in the war. Oh my God. This is just here now, and it's a question of scope and
Starting point is 01:10:38 spread and regulation. This doesn't really require further advances in AI. What we have now is already sufficient to cause this problem, and it is. And I think Sam's point is it's going to explode as the capabilities of these tools and their availability increase. So I would completely agree. I think, you know, I don't want to give the impression that I only care about extinction risk and none of this other stuff matters. I spend a ton of time actually working on lethal autonomous weapons, which, again, already exist, despite the Russian ambassador's claim that this is all science fiction and won't
Starting point is 01:11:17 even be an issue for another 25 years. Yeah, it's just nonsense. As he was saying that, you know, there was a Turkish company that was getting ready to announce a drone capable of fully autonomous hits on human targets. So I think the solution here, and I have a subgroup within my center at Berkeley that is specifically working on this, headed by Jonathan Stray, the solution is very complicated. It's an institutional solution. It probably involves setting up some sort of third-party infrastructure, much as in real estate, there's a whole bunch of third parties like title insurance, land registry,
Starting point is 01:12:00 notaries, who exist to make sure that there's enough truth in the real estate world that it functions as a market. Same reason we have accountants and auditing in the stock market. So there's enough truth that it functions as a market. We just haven't figured out how to deal with this avalanche of disinformation and deepfakes, but it's going to require similar kinds of institutional solutions. And our politicians have to get their hands around this and make progress, because otherwise, I seriously worry about democracies all over the world. The only thing I can add to what Stuart said is all of that, with the word "yesterday." We don't
Starting point is 01:12:41 have a lot of time to sort this out. If we wait till after the 2024 election, that might be too late. We really need to move on this. I have to think the business model of the internet has something to do with this, because if there was no money to be made by gaming people's attention with misinformation, it's not to say it would never happen, but the incentive would evaporate, right? And I mean, there's a reason why this doesn't happen on Netflix, right? There's a reason why we're not having a conversation about how Netflix is destroying democracy in the way it serves up each new video to you. And it's because there's no incentive.
Starting point is 01:13:20 I mean, I guess they've been threatening to move to ads in certain markets, or maybe they have done, so this could go away eventually. But heretofore, there's been no incentive for Netflix to try to figure out... I mean, they're trying to keep you on the platform because they want you not to churn. They want you to end every day feeling like Netflix is an integral part of your life. But there is no incentive... They want you to binge-watch for 38 hours straight. Exactly, yeah. That's not entirely innocent in this. No, yeah, but it's not having the effect of giving them a rationale to serve you up insane confections of pseudoscience and overt lies so that someone else can drag your attention for moments or hours, because it's their business model, because they've sold them the right to do that on their platform. It's not entirely an internet phenomenon, in the sense that Fox News also has a kind of
Starting point is 01:14:18 engagement model that does center around, in my view, maybe I get sued for this, but center around misinformation. So, for example, you know, we know that the executives there were not all on board for the big lie about the election, but they thought that, you know, maybe it was good for ratings or something like that. Yeah, I mean, you can look at the Weekly World News, right? That's right, and go back to yellow journalism in the 1890s. That was an ordinary print outlet which every week would tell you that the creatures of hell have been photographed emerging from cracks in the streets of Los Angeles, and you name it, right?
Starting point is 01:14:53 Right. So what happened historically, the last time we were this bad off was the 1890s with yellow journalism, Hearst and all of that. And that's when people started doing fact-checking more. And we might need to revert to that to solve this. We might need to have a lot more fact-checking, a lot more curation, rather than just random stuff that shows up on your feed and is not in any way fact-checked. That might be the only answer here at some level. But probably taking it further, so not saying, okay, Facebook has to fact-check all the stuff, or Google has to fact-check
Starting point is 01:15:25 all the stuff, but Facebook has to make available filters where I can say, okay, I don't want stuff in my news feed that hasn't passed some basic standard of accountability and accuracy. And it could be voluntary, right, as a business. So coming back to this business model question, I think that the tech companies are understanding that the digital banner ad has become pretty ineffective, and advertisers are also starting to understand this. And I think when you look at the metaverse and say, well, what on earth is the business model here, right? Why are they spending billions and billions of dollars? And I went to a conference in South Korea where the business model was basically revealed by the previous speaker, who was an AI researcher who was very proud of being able to use ChatGPT-like technology, along with the fact that you're in the
Starting point is 01:16:21 metaverse, so you have these avatars, to create fake friends. So these are avatars that appear to be avatars of real humans, who spend weeks and weeks becoming your friend, learning about your family, telling you about their family, blah, blah, blah. And then occasionally they will drop into the conversation that they just got a new BMW
Starting point is 01:16:42 or they really love their Rolex watch, blah, blah, blah, right? So the digital banner ad is replaced by the ChatGPT-driven fake human in the metaverse, and it goes from 30 milliseconds of trying to convince you to buy something to six weeks, right? That's the business model of the metaverse. And this would be far more effective, far more insidious and destructive.
Starting point is 01:17:08 Although when you think about it, it's what people do to one another anyway. I mean, there's, like, product placement in relationships. There's a little bit of that, but they're really expensive, right? I mean, you know, an influencer on YouTube, you know, you have to pay them tens of thousands of dollars, you know, to get 10, 20 seconds of product placement out of them. But these are quasi-humans, you know, they cost pennies to run and they can take up hours and hours and hours of somebody's time. And interestingly, the European Union's AI Act has a strict ban on the impersonation of human beings. So you always have a right to know if you're interacting with a real person or with a machine.
Starting point is 01:17:54 And I think this is something that will be extremely important. Like, yeah, okay, it's not a big risk right now, but I think it's going to become an absolute linchpin of human freedom in the coming decades. I tend to agree. Is this going to be a story of AI to the rescue here, where the only way we can detect deepfakes and other sources of misinformation in the future will be to have sufficiently robust AI that can go to war against the other AIs that are creating all the misinformation? I think it's a useful tool, but I think what we need actually is provenance. So a video, for example, that's generated by a video camera is watermarked and timestamped and location-coded. And so if a video is produced that doesn't have that, and it doesn't match up cryptographically with, you know, with the real
Starting point is 01:18:52 camera and so on, then it's just filtered out. So it's much more that it doesn't even appear unless it's verifiably real. It's not that you let everything appear and then you try to sort of take down the stuff that's fake. It's much more of a sort of positive permission to appear based on authenticated provenance. I think that's the right way to go. I think we should definitely do that for video. I think that for text, we're not going to be able to do it. People cut and paste things from all over the place.
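Here is a minimal, hypothetical sketch of the provenance idea Stuart is describing. Everything in it is invented for illustration: the key, the record fields, the function names. It uses a shared-secret HMAC purely to keep the example self-contained and runnable; a real scheme (along the lines of the C2PA content-provenance standard) would use public-key signatures from the camera and a certificate chain, so a platform could verify footage without holding any secret.

```python
# Hypothetical sketch of "authenticated provenance" for video. Field names,
# key, and functions are invented for illustration, not any real API.
import hashlib
import hmac
import json

CAMERA_KEY = b"demo-camera-signing-key"   # stand-in for a real signing key

def sign_clip(video_bytes, timestamp, gps):
    """Camera side: attach a signed provenance record to the footage."""
    record = {
        "sha256": hashlib.sha256(video_bytes).hexdigest(),
        "timestamp": timestamp,
        "gps": list(gps),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(CAMERA_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_clip(video_bytes, record):
    """Platform side: admit footage only if its provenance checks out."""
    claimed = {k: v for k, v in record.items() if k != "tag"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(CAMERA_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, record["tag"])
            and claimed["sha256"] == hashlib.sha256(video_bytes).hexdigest())

clip = b"...raw video bytes..."
rec = sign_clip(clip, "2023-03-07T12:00:00Z", (37.87, -122.27))
print(verify_clip(clip, rec))            # True: verifiably real, admitted
print(verify_clip(clip + b"edit", rec))  # False: no valid provenance, filtered out
```

The important part is the default: content with no valid, matching record never gets surfaced at all, which is the "positive permission to appear" Stuart describes, rather than publish-first, take-down-later.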
Starting point is 01:19:24 We're not really going to be able to track them. It's going to be too easy to beat the watermark schemes. We should still try, but I think we're also going to need to look at content and do the equivalent of fact checking. And I think that AI is important because the scale is going to go up and we're not going to have enough humans to do it. We're probably going to need humans in the loop. I don't think we can do it fully by machine, but I think that it's going to be important to develop new technologies to try to evaluate the content
Starting point is 01:19:48 and try to validate it in something like the way that a traditional fact-checker might do. Yeah, I mean, I think also, for text it's probably more about validation of sources. So at least until recently, there have been trusted sources of news, and we trust them because if a journalist were to generate a bunch of fake news, they would be found out and they would be fired. And I think we could probably get agreement on certain standards of operation of that type. And then if the platforms provide the right filters, then I can simply say I'm not interested in news sources that don't subscribe to these standardized principles of operation. I'm less optimistic about that particular
Starting point is 01:20:39 approach, because we've had it for several years and most people just don't seem to care anymore. In the same way that most people don't care about privacy anymore, most people just don't care that much about sources. I would like to see educational campaigns to teach people AI literacy and web literacy and so forth, and hopefully we'd make some progress on that. I think labeling particular things as being false, or, the most interesting cases, as misleading, has some value in it. So a typical example of something that's misleading is if Robert Kennedy says that somebody took a COVID vaccine and then they got a seizure, the facts might be true, but there's an invited inference that taking COVID vaccines is bad for you. And there's lots
Starting point is 01:21:19 of data that show on average, it's good for you. And so I think we also need to go to the specific cases, in part because lots of people say some things that are true and some that are false. I think we're going to need to do some addressing of specific content and educating people through labels around them about how to reason about these things. Okay, gentlemen, AGI alignment and the control problem. Let's jump in. So Gary, early on you said something skeptical about this being a real problem because you didn't necessarily see that AGI could ever form the motivation to be hostile to humanity. Many people have said, certainly Steve Pinker has said, similar things. And I think that is, I'll put words into Stuart's mouth and then let him complete the sentence. I think that really is a red herring at this point or a straw man version of the concern. It's not a matter of our robot overlords spontaneously becoming evil.
Starting point is 01:22:29 It's a story of what mismatches in competence and in power can produce in the absence of perfect alignment, in the presence of that ever-increasing competence. I mean, now we're talking about a situation where presumably the machines are building the next generation of even better machines. The question is, if they're not perfectly aligned with our interests, which is to say, if human well-being isn't their paramount concern, even as they outstrip us in every conceivable or every relevant cognitive domain, they can begin to treat us spontaneously based on goals that we can no longer even contemplate. The way we treat every other animal on Earth that can't contemplate the goals we have formed. So it's not that we have to be hostile to the creatures of the field or the ants that are walking across our driveways.
Starting point is 01:23:25 But it's just that the moment we get it into our heads to do something that ants and farm animals can't even dimly glimpse, we suddenly start behaving in ways that are totally inscrutable to them, but also totally destructive of their lives. And just by analogy, it seems like we may create the very entities that would be capable of doing that to us. Maybe I didn't give you much of the sentence to finish, Stuart, but weigh in and then let's give it to Gary. Yeah, I mean, there are a number of variations on this argument. So Steve Pinker says, you know, there's no reason to create the alpha-male AI. If we just build AI along more feminine lines, it'll have no incentive to take over the world. Yann LeCun says, well, there's nothing to worry about.
Starting point is 01:24:11 We just don't have to build in instincts like self-preservation. And I made a little grid world MDP, which is a Markov decision process. So it's just a little grid where the AI system has to go and fetch the milk from a few blocks away. And at one corner of the grid, there's a bad person who wants to steal the milk. And so what does the AI system learn to do? It learns to avoid the bad person and go to the other corner of the grid to go fetch the milk so that there's no chance of being intercepted. And we didn't put self-preservation in at all.
Starting point is 01:24:52 The only goal the system has is to fetch the milk. And self-preservation follows as a sub-goal, right? Because if you're intercepted and killed on the way, then you can't fetch the milk. So this is an argument that a five-year-old can understand, right? The real question in my mind is why are extremely brilliant people like Jan LeCun and Steven Pinker not able- Pretending not to understand. I think there's some motivated cognition going on. I think there's a self-defense mechanism that kicks in when your whole being feels like it's under attack because you, in the case of Jan, devoted his life to AI. In the case of Stephen, his whole thesis these days is that progress and
Starting point is 01:25:42 technology has been good for us. And so he doesn't like any talk that progress in technology could perhaps end up being bad for us. And so you go into this defense mode where you come up with any sort of argument. And I've seen this with AI researchers. They immediately go to, oh, well, there's no need to worry.
Starting point is 01:26:02 We can always just switch it off, right? As if a super intelligent AI would never have thought of that possibility. You know, it's kind of like saying, oh, yeah, we can easily beat Deep Blue and all these other chess programs. We just play the right moves. What's the problem?
Starting point is 01:26:18 It's a form of thinking that makes me worry even more about our long-term prospects, because it's one thing to have technological solutions, but if no one is willing to accept the fact that we're not successful... If you'd like to continue listening to this conversation, you'll need to subscribe at samharris.org. Once you do, you'll get access to all full-length episodes of the Making Sense podcast, along with other subscriber-only content,
Starting point is 01:26:44 including bonus episodes and AMAs and the conversations I've been having on the Waking Up app. The Making Sense podcast is ad-free and relies entirely on listener support, and you can subscribe now at SamHarris.org. Thank you.
