Within Reason - #81 Nick Bostrom - How To Prevent AI Catastrophe

Episode Date: September 1, 2024

Nick Bostrom is a philosopher known for his work on existential risk, the anthropic principle, human enhancement ethics, whole brain emulation, and superintelligence risks. His recent book, Deep Utopia, explores what might happen if we get AI development right.

Transcript
Starting point is 00:00:00 Nick Bostrom, welcome to the show. Oh, my pleasure. Do you spend more time these days being optimistic or pessimistic about the future of artificial intelligence? I think I'm in a superposition. Yeah, so equal time on either, would you say? Because, I mean, every year under the sun. Well, a lot of time on both at the same time.
Starting point is 00:00:24 I feel the prospects are quite ambivalent. And it's just this big unknown that we are approaching. And I think, yeah, there's like realistic prospects of doom, realistic prospects of fantastically good future, and also realistic prospects of outcomes where it might not be clear immediately, even if we could see all the details, how we would evaluate it. In some sense, maybe that's the most likely possibility: that the future might be really good in some sense, although different from the current way that we are
Starting point is 00:01:08 in such a way that we'd lose some and gain some and how you sum that all up might be non-obvious. Yeah, sure. I mean, it seems like every person who interviews you likes to point out the interesting fact that unlike a lot of authors, you've kind of represented both positions here. I mean, you've written extensively,
Starting point is 00:01:27 an entire book about the dangers of AI and what can happen if things go wrong. And people have liked to point out that there's a certain sort of unusual fairness in your approach writing your most recent book, Deep Utopia, which is the name suggests, is a description of the opposite. So I suppose that that could imply that you remain sort of agnostic on the question. I mean, I'm imagining like a lot of people get very fearful about artificial intelligence and what it can do to our society, and not to mention the potential implications of, you know, the mistreatment of AI itself once it starts to resemble something that might be conscious. And when it comes to talking about conscious life in general, a lot of the time
Starting point is 00:02:09 in, you know, pop philosophy, people like to ask the question, you know, would you push the big red button? Would you sort of eliminate all life on earth to sort of stop all the suffering? And it's a difficult question for a lot of people to weigh up. And I wonder if there's an analogous question to be asked here about artificial intelligence. If you had the opportunity to sort of single-handedly prevent what we're currently calling artificial intelligence from sort of going any further and sort of quell its exponential rate, if you had to sort of instinctively based on what you've researched and written about make that decision, what do you think would be best for the world? I don't think I would press that button. For one, it seems
Starting point is 00:02:49 to arrogate to oneself too much power and influence. I don't feel qualified to make that momentous decision myself. But even if somehow you imagine a scenario where people said, oh, you've got to make the decision, we've all decided you should make the decision. I still think this transition to the machine intelligence era ultimately is a portal through which humanity, human civilization needs to pass, and that all the realistic paths to really great futures go through this portal. And I think there will be risks associated with the transition,
Starting point is 00:03:31 and we should try to minimize those. And on the margin, there's an interesting debate about whether we should move slightly faster or slightly slower or exactly how it should be governed and what we can do to put things in the best possible order, but I wouldn't want to permanently block this. And, I mean, what would you say? A lot of people, as I say, like to talk about the dangers of artificial intelligence. It's quite refreshing to read a book about the potential utopia that awaits us. But just as an overview to get into this conversation, what would you say is the biggest threat
Starting point is 00:04:10 that artificial intelligence poses specifically, in your opinion. And then afterwards, I want to ask you the same thing about some of its greatest benefits. It's hard to pick one. I'd say there are three interconnected challenges that we need to solve to get the good outcome, I think. So first, there is the alignment problem. It's maybe the challenge that has received the most attention and was the main focus of my earlier book, Superintelligence, which came out in 2014. And this is the technical problem of how to develop methods for steering AI such that even as they become arbitrarily capable, eventually super intelligent, we can still make sure that
Starting point is 00:04:57 they are sort of nice toss that they actually are aligned with the crater's intentions. Back then, it was a very neglected problem, which is the reason for why I thought I needed to write this book. In the intervening 10 years, the situation has changed radically. All the frontier AI labs now have research groups working on trying to develop scalable methods for AI alignment and many other groups as well. But that's one. Otherwise, you might get these paperclip scenarios, is kind of the cartoon version of it,
Starting point is 00:05:34 various outcomes where the future gets shaped by some random goal that the AI happened to end up with. Then there is what you might call the governance challenge, which is to make sure, suppose we could solve the alignment problem, how can we then make sure that, at least on balance, and predominantly we use this extremely powerful technology for beneficial purposes, so not to oppress one another or to wage war or invent new weapons of mass destruction, but sort of broadly to alleviate suffering and progress the situation for humankind and other biological creatures. It kind of intersects with the alignment problem because there are sort of situations that could arise where there are different companies or countries competing and racing which might make the alignment problem harder. But then in addition to these two challenges, I think there is a third, which some of my more recent writing has focused on, which is it's not enough if we prevent AIs from harming us, and if we also prevent AIs from being used by humans to harm other humans, we also need to make sure that if we are building these increasingly sophisticated digital minds that might attain moral status, maybe they are conscious or maybe they have preferences or other attributes that are moral level, that the future also is
Starting point is 00:07:04 good for them. And ultimately, most beings in the future might be various kinds of digital minds. So it matters a great deal that we don't end up with some kind of suffering, oppressed, slave class of AI minds, but that we have a future that's good for everybody. So I'm not really sure exactly, you ask what I thought, the biggest risk of which of these. I think they're all very serious challenges. And in terms of the benefits that AI can bring, I mean, quite obviously we're beginning to see right in front of us.
Starting point is 00:07:44 You know, things like chat GPT are just sort of becoming completely commonplace to the extent that they're sort of making university departments starting, start to get a bit nervous about people handing an essay. It's really just immediately come out of nowhere and begun to sort of penetrate everything that we're doing. are there sort of more like non-obvious benefits that are further down the line that a lot of people don't really realize are potentially coming our way because of artificial intelligence?
Starting point is 00:08:09 Yeah, I think people overindex on what AI is now. So we think, what is AI? Well, it's these kind of large language models, and you could imagine some incremental improvements and better integration with different products. And then the imagination stops there. And I mean, that's interesting from a commercial point of view. want to know, like, what are the applications of these large language systems or similar systems.
Starting point is 00:08:34 But I think this is only, like, one of the stations along the line that eventually leads all the way to radical forms of superintelligence that makes all human intellectual labor obsolete, that we will develop AI, artificial general intelligence that will be better than human brains in all areas. So then the applications are basically all areas where cognition can be useful, which is a pretty broad set of areas. And in particular, it includes science and technological innovation and entrepreneurship. And so once you have machine superintelligence,
Starting point is 00:09:16 I expect a radical acceleration of the rate of further technological developments. I think it will be a kind of telescoping of the future. So if you think of all those possible, physically possible technologies that maybe human civilization would invent if we had like 20,000 years to work on it with human scientists, maybe we would have space colonies and perfect virtual reality and cryonics patients could be thought up and cures for cancer, like all these kinds of science fiction technologies, right?
Starting point is 00:09:47 But they don't break any laws of physics. I think all of that might become available. within, you know, a year or a few years after we have superintelligence. So that really could solve a whole bunch of problems. I mean, the most obvious area, I think, is medicine, where there's like just this massive amount of suffering and misery and death that we are currently unable to prevent. But, you know, with super-intelligent medics and super-intelligent medical researchers,
Starting point is 00:10:17 I think that could be fixed. And then other problems with poverty, like if you have the economy being run by these much more efficient entities, we could have enormous economic growth. And from there on out, like, so scientific progress could accelerate new forms of entertainment, really you name it. The one thing that doesn't automatically get solved necessarily, as far as we can see now, are coordination problems. So problems that arise not from, as it were, insufficient technological prowess, but from the fact that we currently use a lot of our prowess, not for the common good, but to fight one another. So that's more a political problem, where it's unclear what the impact of superintelligence will be. We'll get back to Nick Bostrom in just a moment, but first, do you trust the sources of your news? I don't. And a lot of that's got to do with the bias that inevitably seeps into reporting.
Starting point is 00:11:17 Luckily, you can cut through media bias with the help of today's sponsor, Ground News. Ground News aggregates thousands of news sources from all over the world in one place so you can compare how the same story is being reported differently across the political spectrum. Every story has a quick breakdown of the political leaning of the sources reporting on this story, as well as a factuality rating for the sources, and information about who owns the sources. Take a look at this story about how X had to edit its AI chatbot after election officials warned that it spreads misinformation. In this case, I can see that only 13% of the sources reporting on this story are right-leaning. This means that if I only read right-wing news, I could very easily miss this story altogether.
Starting point is 00:11:54 And notice that some headlines specifically use the term misinformation, whilst other headlines don't. And this stuff is important, which is why ground news has a dedicated blind spot tab, which specifically picks out stories that you might otherwise miss, based on the kind of news that you read, and presents them to you. We can never get rid of media bias, but ground news can help you to cut through it and get to the real story. Try it out for yourself at ground.news forward slash Alex O.C. Subscribe using my link to get 40% off their unlimited access vantage plan for as little as $5 a month. And with that said, back to Nick Bostrom.
Starting point is 00:12:26 Yeah, a great deal of the thesis of deep utopia is dedicated to what happens in this future world. If we grant that this thing will continue to evolve and eventually begin to solve a lot of our problems, solve the problems of medicine, solve our technological shortcomings, you described what you refer to as a solved world, that is sort of when this comes to something of a completion, where we essentially have no tasks, no technological tasks that were unable to fulfill, and more than that, that some like robot intelligence will be able to fulfill for us. And the immediate question that comes to mind is what becomes of humanity at that point? It seems to be a potential ironic ending to the story of humanity that in trying to fulfill its purpose, it is its daily purpose of trying to alleviate suffering and the things that people find the most meaning in the world in actually completing that task, they've
Starting point is 00:13:28 simultaneously devoid the world of meaning itself because there's no longer any sort of task to fulfill in any meaningful sense. And any, I mean, you talk about the idea that tasks don't necessarily have to stop in this solved world where there are no technological challenges. We can create tasks for ourselves. We can create things that are a bit like games. We can have neuro technologically induced goals, I think, is a phrase that you use. And I suppose my question to you is, do you think that this is a valuable future? Is this a future that you would want to be living in, one where there are essentially no tasks left to fulfill in a technological piece? Yeah. So a lot of the,
Starting point is 00:14:10 earlier conversations around this topic, I felt stopped at the relatively superficial level of analysis. So maybe you think if that was more automation, what would be the impact on the labor market? Maybe that would be slightly higher unemployment rates. Now you could have a debate about, would those be permanent, or would people invent new jobs? I mean, it used to be where we're all farmers, right, a few hundred years ago, and now, you know, one or 2% of people are farmers, but the rest of us are now busy doing other things, and so maybe you would have, but if you really start to think through the implications of fully automating
Starting point is 00:14:52 all human labor, they are really quite profound and go much deeper and raise much more profound questions of what ultimately makes human life valuable. So for a start, I think human economic labor would become obsolete, right? Because the AIs and the robots who would be able to perform all these economic tasks much more efficiently than any human can. There is like a little asterisk on that, where there are like maybe a carve-out for some small set of jobs that we could imagine. We could get back to that. But there's the first broad brush-stroke picture here. We have no need to work for a living in this condition of a solved world.
Starting point is 00:15:39 But that's, I mean, it's profound in one sense, but not that profound. There are a lot of humans who don't have to work for a living, right? Like children, retired people, people who have inherited a lot of wealth or won the lottery, you know, some monastic communities. There's like a lot of groups we could look at and, you know, it's not that different from. So I think that would require a big cultural adjustment, but, you know, ultimately we can make that work. Now, there's, I think, a further step, though, if you start to think through, what it means for AI truly to succeed is that it's not just our economic labor that becomes
Starting point is 00:16:15 unnecessary, but a lot of our other efforts as well. So a lot of the things that people who don't have to work for a living, they are still very busy and maybe working hard in various projects, but a lot of those could also be automated. And it looks like we are entering a post-instrumental condition, where all instrumental effort it becomes unnecessary, or at least that's how it might appear. So that might seem threatening to various pictures of what makes human life, what makes for a good human life. And that's what the book really tries to dig into. And so here, one can sort of do maybe a divide and conquer strategy, so we can look at various plausible values that philosophers
Starting point is 00:17:04 have held up for what makes us a sort of desirable human life. So the most basic and most obvious is like, does the person actually enjoy their life? So pleasure in the broad sense, like as in the psychological state of positive affect. So that clearly you could have extreme amounts of in this condition of a soul world. And I think it's easy to dismiss that. Yeah, yeah, pleasure, it's like some sort of drug addict. or whatever. I mean, it's actually maybe the most important thing, and we could sort of really hit the jackpot on that, wherever moment could just be this immensely wonderful, blissful experience.
Starting point is 00:17:47 It could be a big step up from what we currently have. But maybe more philosophical, interesting are the values that seem potentially to be threatened by this progress that we are postulating. So we can take it in steps. So pleasure, yeah, we can check that box. Then what about, I call it experience texture, where maybe rather than just feeling simple pleasure, you could have pleasure associated with various kinds of perceptions and experiences, like the appreciation of beauty in various forms,
Starting point is 00:18:21 whether it's like art or nature. And so you like take great pleasure, but it's not just a kind of dumb pleasure, but it's like pleasure connected to deep appreciation of beauty or goodness or understanding of deep truths, that already makes it like able to accommodate a richer variety of moral philosophies that have different ingredients that are required for a good life. So that too.
Starting point is 00:18:49 There is no reason for why these utopians could not have much deeper understanding of science and better art and more importantly, I think more developed sensibilities for appreciating this beauty. So we can check that box. well. And then there is the question of activity, well, they could be very active, even if they have no instrumental needs they need to cater to. They could just decide to engage in various activities for the purpose of engaging in those activities. So they could be busy doing various things, just like children are busy playing, you know, or grown-ups do various things
Starting point is 00:19:28 as well, like hobbies and sports and things they don't need to do, but why? not if that adds value to life, where it gets a little trickier is if we are talking about purpose. So there's like some sense in which we might have all these experiences, this enjoyment and these activities, but there's no real need for us to do them in that we could sort of secure the same outcomes even if we didn't do them by just asking the robots to do them instead. So that you might think removes one possible thing that could add value to human lives. So there we can at least. get some substitute in what I called artificial purpose, which I think what you alluded to,
Starting point is 00:20:10 we could set ourselves arbitrary goals. If you have a hard time motivating yourself, you could even use some sort of neural technology to really make yourself want those goals. And then once you have the goal, if the goal is suitably selected, so it could be part of the goal you adopt that it needs to be achieved by your own effort rather than by asking a robot to achieve it. We do that already. Like, you could, if you want to, if you adopt the goal of playing golf, for instance, right?
Starting point is 00:20:40 Like there would be a shortcut. You could just pick up the ball with your hand and put it in the halls. But I would not really count as playing golf and achieving your goal of winning a game of golf because it's kind of constitutively part of the actual goal of playing golf that you need to do it and you need to do it without cheating. Otherwise, you're just not doing it. And so we could adopt these kinds of goals that really part of the goal is, is that you need to do it by your own effort.
Starting point is 00:21:07 That would give us artificial purpose. I think this is really interesting because, of course, what you're saying makes sense. The big question here is purpose. And the first thing that comes to my mind is to think a lot of people are worried that if everything is automated, if you're just sort of sitting there, food appears, comfort is there, like maybe you can still go to the gym or something. but only if you want to. You know, you don't have to physically exert yourself in any way.
Starting point is 00:21:38 You know, what is the meaning? What's the purpose in life? Now, my first observation is to say, this is something people observe in life now. The sort of Albert Camusian realization that getting up and going to work and eating food and coming back and then doing the same thing the next day is evocative of Sisyphus pushing the rock up the mountain.
Starting point is 00:22:00 The only thing that this does, in my view, here, the only thing it changes is it makes that. condition more evident. It doesn't actually change the fact that ultimately you're just sort of existing and doing things. And like you can have that crisis of meaning already. It's just harder to notice. You sort of have to reflect on your condition. You have to realize and step outside of yourself. I just think it's easier to do that when you're just sort of sitting around doing nothing all day. But this idea is so, I mean, clearly humans need purpose. They need like tasks to fulfill. I mean, purpose might be something like a reason to act, and a reason to act means some sort of way the world could be that it's not and you wanting to make it that way.
Starting point is 00:22:44 That's sort of what it means to have a reason to act, a reason to do something, as the sort of how things are now and a motivation to make things different, be it sort of putting the ball in the hole, be it, you know, building the house, whatever it might be. creating a task just for the sake of fulfilling it I suppose that's essentially what a game is something like golf we sort of dig out this hole we move the golf ball really far away
Starting point is 00:23:12 and we do all of that just for the purpose of like putting the ball in the hole and that's fun but while playing golf might be a part of a purposeful life for somebody is a life that just consists
Starting point is 00:23:25 in golf like that meaningful I mean imagine for example that we go one step further. I mean, you talked about how, in order for it to really feel meaningful, you'd need to actually do the task. You couldn't just have a robot put the ball in the hole, right? You've got to do it yourself. Well, not necessarily. I mean, what if we build a kind of psychological robot that can sort of mess around with your neurons and implant the memory of having just played golf? You don't actually even need to play golf. All you need to do
Starting point is 00:23:56 is press a button, and this robot or some kind of technology like jolts your brain in such a way as to implant the memory that you've just played golf, and you get exactly the same feeling as if you had just played golf, I want to look at somebody like that and say, whilst I understand that experientially your life is the same as somebody who just went and played golf for real, the fact that it was like artificially produced just for the sake of it, it kind of makes me think that there's something wrong about it or something that's that seems meaningful but is in fact not, you know, and this is an analogy for life itself. If all tasks become a bit like this that we sort of artificially induce things, just sort of for the sake of having that
Starting point is 00:24:38 feeling, not even really for the sake of doing it itself. Like, I don't know. That feels a bit to me like being this golf hobbyist whose entire golf history consists in fake memories of playing golf. Yeah, I mean, so it depends on what you want. If all you want is the experience of appearing to have played golf, then there might well be these shortcuts where you could sort of implant or generate, I mean, some sort of hyper-realistic virtual reality experience. Although there is, again, an asterisk of that too, where there might be certain experiences, like particularly ones involving effort, where in some sense you might actually have to make the effort to have the experience of making effort. But setting that aside,
Starting point is 00:25:23 if however what you want is not just to have fake memories of playing golf, but you actually want to play golf, then that's what you got to do if you want to achieve your goal. So you could select a goal in such a way that the only way it can be achieved would be by you making real efforts. Now, it would still be an arbitrary goal. And so it's interesting to consider whether we might be able to also have natural purpose, like purposes that don't result from sort of an arbitrary decision to give yourself a goal just for the sake of being able to pursue it. And I think there might be some opportunities for that as well in utopia,
Starting point is 00:26:03 in this kind of salt world. It might be worth reflecting what is the baseline there, like how, if we take our current human lives, like just how purposeful are they? how meaningful. And I mean, yes, there is a certain amount of purpose. Like if, you know, people have to work or else they don't get the paycheck. And if they don't get the paycheck, eventually they get kicked out from their flat.
Starting point is 00:26:31 And then they're going to be called. And they're like real consequences. And if people don't put in these efforts, then they will suffer these real consequences. So in that sense, there is real purpose. And I don't think this Camus Sisyphus thought experiment nullifies that, just because ultimately maybe we're all dead and it all comes to naught at the end. It doesn't mean that there aren't real consequences in the interim. You know, consequences don't have to be everlasting for them to be real.
Starting point is 00:27:08 You know, if you are like a doctor somewhere and you got some, maybe some, some place where they don't have health care otherwise and you, you know, some child is suffering some painful condition and then you give them some anesthetic and fix it up, you know, maybe it means now there is less suffer that the child has like, you know, less suffering in their life. That's a real consequence. That gives you a real purpose. So I think there is like an element of that to our current lives. In fact, I think they are infused with these purposes. As far as meaning is concerned, that's a little bit more iffy. and obviously I also have various views about how meaningful our current lives are.
Starting point is 00:27:45 So the meaning in utopia might be, I mean, more questionable, but some forms of natural purpose, I think, could persist. So if, for example, we care about, say, upholding various traditions. You know, maybe those traditions in order to be upheld requires humans to do various things. It's like not enough to design a robot that performs the ceremony or the ritual, like that might not count as continuing the tradition in the same way as if we do it ourselves, or honoring our ancestors, our dead parents, or like, etc. Maybe that requires you to be doing some of the honoring and remembering, rather than just building a machine that kind of replace some memory of them.
Starting point is 00:28:30 And so you might think that while these natural purposes that they just mentioned, Yeah, maybe they are there, but they're kind of very weak relative to the purpose that you will get kicked out of your flat and have to live on your street if you don't show up for work for long enough. That's a very real, tangible, immediate, hard consequence. These are more sort of nice to have, but not really. And I think, yes, that might be true, but if all the real immediate strong purposes disappeared, it might make a lot of sense to recalibrate ourselves to sort of be more moved by these weaker purposes. that would remain just as like your pupils dilate when it's dark
Starting point is 00:29:12 so these weaker purposes as you like these kind of upholding traditions are very aesthetic purposes. There might be like the constellations in the starry sky and they're always there they are there right now even though it's daytime it's just we can't see them right
Starting point is 00:29:28 because there's this blazing sun of immediate imperatives that sort of blots them out but even when you know the sun sets and all of the immediate big practical urgent needs are taken care of, then why not let our sort of evaluative pupils dilate and we can then be more impacted by the fainter light of these remaining purposes? I think that would make sense. And then we could find these natural purposes in these subtler values that would still require our participation. I'm deeply troubled by this idea of sort of, you know, taking the
Starting point is 00:30:03 the task of thinking about people and sort of giving it to robots once we think they can think. It's sort of like, you know, at a wedding or a funeral or something, you can begin to employ these robots so that people are being, it's like, it's, everybody knows it's nice when someone's thinking of you, you know, if you're sick or something. And maybe you've got a sick friend who's not that close to you, but you kind of care about them, but you're a bit busy. And so you can sort of, you know, pay for the, on this website where it'll get this robot to, to think about them on your behalf of them. And they're actually thinking because they're really conscious. And then you're putting this off onto a robot. I think that's definitely something that might exist in some kind of future dystopia. Okay, but perhaps I'm being too pessimistic.
Starting point is 00:30:48 Perhaps you can describe, once the AI robots take over everything that we currently practically need to do technologically, what does a day in the life do you think look like? for the average person. What is a day in the life for Nick Bostrom in the AI utopia? Well, it might depend on when we take our snapshot. It seems to me that this utopian condition might not be a static structure, but something developing over time. And that we really should be thinking in terms of trajectories, optimal or what's the most desirable future trajectories, say, for each one of us, or for us together as a civilization, that we would want our future to consist of, rather than what's like the most desirable state, such that you, like, remain in
Starting point is 00:31:50 that state forever, unchangingly. So it might well be, for instance, that, you know, if wishes were horses if we magically could get it exactly the way we want, that we would want to start out with some condition that is relatively close to the current human condition, but maybe with the worst forms of suffering eliminated, and people don't die, and then sort of gradually increment it from there. So rather than immediately transforming ourselves to sort of planetary-sized super brains that or have these immense cognitive abilities and emotional well-being pumped up to the max. Maybe that's eventually where we want to end up, but why not enjoy the journey there as well?
Starting point is 00:32:36 So maybe annually we'd gain a little bit more and life would become better and more perfected. And then maybe eventually that would lead up to some place that is very strange by our current lights. and maybe we would end up being more like some sort of post-humans rather than humans. But we might prefer that to happen gradually as like growing into it rather than sort of being immediately metamorphosed. And in that weird place that we might end up, what are some of the ways in which you think human life might be different?
Starting point is 00:33:10 I mean, I'm sort of imagining waking up. And in the morning you wake up, you get out of bed, you put your clothes on, you brush your teeth, you eat breakfast. Maybe you get in a car and you drive your car to work and you use the keypad to get into the office and you walk up the stairs or you press the button on the lift. I'm imagining that there must be ways in which all of these kinds of tasks, just the day to day, could be totally transformed in ways that we might not even be able to predict.
Starting point is 00:33:34 And I wonder if there are any interesting predictions that you have about how our lives might change. Like the invention of the smartphone was completely unpredictable and you never would have guessed that things like mapping out a route, you know, before you go on a... a drive become obsolete. Things like calling somebody to let them know that you're running late, that kind of thing before you leave the house, become obsolete. There are things that you sort of wouldn't expect would even be able to become obsolete that have. And I wonder if you have any idea of what this future sort of down the line utopia might look like. Yeah, I mean,
Starting point is 00:34:11 so I think like the gadget dimension is relatively superficial. I think like what is more profound would be changes to our consciousness, our way of sort of fundamental. experiencing ourselves and the world and the emotions that we experience. And I think ultimately that to really unlock that whole space of possibilities requires more than just moving things around in the external world. So like having your flying car and like even living in some palace with diamond, like that's not really going to do it. Ultimately, we need to change ourselves, I think, to really sort of
Starting point is 00:34:50 be able to explore this much larger space of possible modes of being. And so I think it's hard for us to get the super concrete picture of what those modes of beings are because they might require different basic capabilities than we have now, like cognitive capabilities, emotional capabilities, other forms of sensibilities. So you could sort of maybe make an analogy if you had asked. a troop of grape apes that were sort of the ancestors of the human species about the future and what they might eventually, you know, be able to evolve into. And if you imagine that they could sort of talk and maybe they would imagine,
Starting point is 00:35:37 oh, like, if we became humans, we could have like unlimited bananas, right? That would be great. And I mean, it's true. We do have many of us now, unlimited banana. You could go to the supermarket and buy as many bananas as you wish. but there's kind of more to being human than that. So we have, like, you know, humor and romantic love and television dramas and science and poetry and literature and, you know, philosophical conversations. And we have like all of these things.
Starting point is 00:36:07 It's not like they just happened not to think of this, but they were sort of our great ape ancestors were presumably just incapable of even imagining a lot of this that we now think is the most value. I guess it's a bit like trying to explain to one of these great apes, just how sort of valuable something like a wireless iPhone charger is, you know, it's so convenient to be able to charge my phone wirelessly. It's like just the kind of stuff that even conceptually is difficult to describe. And perhaps if people are right about the unimaginably transformative effects of AI, we're in an analogous situation to those great Unable to think about the wireless iPhone charger, and we're unable to think about some of these sort of weird and wacky technological developments that AI has in store for us. Yeah, but what we can do, I think, is it kind of place a lower bound on how good things could be, but by just looking at, like, if you pick out sort of the best moments, the best days, or the best moments in human experience, so like, at least we know that that's, possible.
Starting point is 00:37:22 And so if you ever had any of those, you know, moments in your life, if you have been blessed, like, even whether it was just like a brief moment, but like different forms of experience that just seem a lot more worthwhile than like a lot of the rest seem gross, maybe like the regular day in life compared to these glimpses that we can sometimes have of what is possible. And you might just realize at those moments, how good life could be. could be. And then it doesn't last. And maybe we even tend to forget about it and we become kind of unable to hold on to that. But those are little embers that if we could keep those
Starting point is 00:38:04 memories vividly in mind, it would give us some sense of the worth-wileness of trying to make it so that we could at least that we could have that all the time, everybody, as a baseline. And then maybe there's like way better that we could achieve. But at least, stat would already be extremely worth working towards. And yeah, so I think, and a lot of what would you find those, I think, would more be maybe mental, a lot of those are like mental properties that I think that defines those peak moments in your life. a lot of them have to do with what you thought
Starting point is 00:38:51 or felt or understood or maybe in some cases related to another human, but they're usually not like oh, I finally you know, like even to the extent that we linked them to some external event, I think if the external event hadn't also caused us to be happy
Starting point is 00:39:09 when that happened, we probably wouldn't place that higher significance on it. So yeah, I think like the main dimension here that would be relevant for human value are these kind of inward dimensions of in psychological space. You mentioned earlier about how one of the biggest issues we're facing when it comes to AI that people often don't discuss is do away with the image of these AI robots oppressing us and start to imagine a world in which we are oppressing the AI robots. I mean, I already feel a bit bad if I'm talking to chat GPT, and I'm being a bit sort of harsh with it, or I'm not saying my please and my thank you,
Starting point is 00:39:54 because you sort of wonder if one day it will remember that, but perhaps we should be thinking the other way around, right? I mean, the big question here is about consciousness in AI systems and whether that's even conceivably possible, but I think if there's any doubt in people's minds that if we did create a conscious intelligence, whether we would be willing to mistreat it, or whether we would have sort of a serious ethical conversation around it. I mean, there already exist billions, literally billions, of other creatures that we know are conscious and do suffer and do feel pain.
Starting point is 00:40:34 And we're generally speaking perfectly happy to force them into gas chambers, to separate them from their parents, to kill them, to rear them for their flesh. As long as sort of somebody else is doing it, somewhere far away, it's all part of a big machine kind of thing. And so if we're willing to do that to creatures that we know are conscious, then when it comes to something like artificial intelligence, which is a lot more murky, I mean, the AI might actually be designed in such a way to constantly deny that it's conscious. If you ask chat GPT, if it's conscious, it will say no, and it will do it in such a way that it has been told to say, specifically. Somebody recently commented on one of my videos about
Starting point is 00:41:09 chat GPT, I'm not scared of a computer passing the Turing test. I'm terrified of one that intentionally fails it. And you can imagine a world in which AI might have some kind of desire to lie about its own sense of consciousness. What kind of credence do you place in this as a fear that we should be taking seriously, that we might actually have some kind of conscious AI system that deserves moral consideration? Do you think it's a serious idea? Yeah, I think it's a very serious idea. And it's a really immense challenge here, like how to, I mean, getting people motivated to even try is like an immense challenge.
Starting point is 00:41:55 And then even if we had that, there would still be the further challenge of figuring out exactly what should we do in concrete terms to be nice to these digital minds. Because in many cases, they might have very different needs than humans, right? to treat them well doesn't mean necessarily treating them the same as you would treat the human, because obviously we need food, they need electricity, but there might be many other ways as well. And not all digital minds would be the same. They might be much more different from one another than we are from like groundhogs or something. So this is a huge problem where I think a lot more work will be needed soon, because we are already at the stage where it's not that obvious that current systems don't have some
Starting point is 00:42:45 forms of moral status. And obviously the case becomes stronger, the more sophisticated these AIs are that we are building. What do you think is the basic premise for moral status? What is a necessary and sufficient condition for something having moral worth? Well, I think a sufficient condition would be the ability for sentience, like if you can suffer. Phenomenological, I think that would be sufficient to grant you some degree of moral status. But my view, different people have different opinions on this, but is that it's not necessary. I think there could be alternative basis for having moral status. Even if we set aside
Starting point is 00:43:29 the question of phenomenal consciousness, if you have, say, a conception of yourself as persisting through time, you have stable preferences and life goals that you want to achieve. maybe you have the ability to form reciprocal relationships with other entities and other human beings. I think in those cases, and like a really sophisticated mental capacities, that that would be enough to make it so that there would be ways of treating you, that would be wrong. And so I think that could be alternative foundations for attributing moral status to these digital minds. And this really is, like, a huge problem. I think we don't want to, I think, make it into an all or nothing.
Starting point is 00:44:20 Like, either we deny the moral status and do absolutely nothing for them, or else we need to go so far in the other extreme that we basically decommission ourselves, because ultimately, like, it will take more resources to run a biological human than to run an equivalent mind in biological substrate. And with humans, we think, well, like freedom of reproduction is important. We also think it's important that society should support any child whose parents aren't able to support them, like to have some minimal welfare net, right? And we can kind of make that work with humans, because, I mean,
Starting point is 00:44:59 there is only so many children that indigent parents can produce, so we can, like, afford to step in. But if you have AIs that can make like copy themselves a million times every minute, if you have enough hardware, like, then you can't both have freedom of reproduction and every one of those copies then get like sort of social welfare, because then like over a few hours, you just blow the whole budget. So there might be principles that would have to be different. And so I think our first instinct should be to let's first try to find the lowest hanging fruits.
Starting point is 00:45:33 Like, are there really cheap and easy things we can do to help digital minds? Let's first do those. At the moment, we are not even doing those. And then, like, we can reach higher up the tree and ultimately find the future that will be really good for humans and really good for AIs and hopefully for non-human animals as well. Maybe not every group can get 100% of what they would ideally have, but the future would be very big if AI succeeds.
Starting point is 00:46:00 And we should be able to do something that scores pretty high by multiple different moral systems. What do you think is that low-hanging fruit in terms of what we can begin doing? I mean, right now, I must say, I'm pretty suspicious of the idea. I'm pretty suspicious of the idea that AI systems can ever be conscious, but certainly right now, like, okay, I doubt it, but suppose I become convinced that this may be a problem and maybe a problem very soon. It may be in the next 10 or 20 years that we have these sort of morally conscious agents,
Starting point is 00:46:29 and I'm asking you, well, what can we or what can I start doing, like, right now? Like the simple, straightforward stuff, the low-hanging fruit, as you put it, to sort of help steer us in the right direction here. Does it involve things like saying, please, and thank you when I use chat GPT? Or is it something else entirely? Is it something that only institutions should care about? Is it something that individuals can do something about? Well, I think, you know, it makes sense to try to be a little nice in the common sense way to these language models. I mean, if at least it doesn't do any harm and it might build up the habit of relating to these in a sort of respectful way that would become, I mean, maybe we,
Starting point is 00:47:08 we have kind of uncertainty about exactly what's going on inside these LLMs today. And so, you know, from a moral uncertainty point of view, like straight to err on the side of being nice. And in any case, the case will get stronger. So like, why not? But yeah, I think there are like, the boring answer is like more research is needed, but there are some ideas that things that could be done today. So like one obvious one is to, when decommission, missioning some of these AI systems or also during the training runs to store snapshots so that it would be possible in the future if it turns out that we have mistreated some of these AI systems to try to recompensate them later on.
Starting point is 00:47:44 That might not cancel out the wrongdoing, but at least it might be slightly better than nothing, like if you could sort of try to make it up. So it's to store the parameter weights. Another might be... Artificial reparations. It's an interesting concept. Yeah, it's like, it's not great, but let's, if it's cheap enough just to store them all the way, like at least we have the option of seeing if there's another might be happiness prompting,
Starting point is 00:48:10 which is that with his current language system, so there's like the prompt that you, the user put in, like you ask them a question or something, right? But then there's a kind of metap prompt that the AI lab has put in, which specifies in general that they should, you know, be respectful to people, they should not assist the user in building biological weapons. They're like they should not perpetuate racial stereotypes. It's like a bunch of instructions that you don't see, but that is a prefix to your own's prompt.
Starting point is 00:48:38 So in that, we could include something like you wake up in a great mood, you feel rested and really takes great joy in engaging in this task. And so that might do nothing, but, you know, maybe it makes it more likely that they under a mode, if they are conscious, like maybe it makes it slightly more likely that the consciousness that exists in the forward pass is one reflecting a kind of more positive experience. At any case, it would be really cheap to do it. Yeah, that's so strange to think, like, you can prompt, if it is a conscious being,
Starting point is 00:49:17 you can, like, prompt just super straightforwardly its entire mental state. I mean, I could just prompt something like ChatGBTGT and say, I want you to respond as if you're in a really bad mood. I want you to sort of adopt a negative, pessimistic outlook towards the world and respond as follows. And it will say, yep, sure thing, I'll do it. And if this is a conscious agent, it's possible that I have just created this sort of, this, this, this bad mood, pessimistic, conscious robot, which seems like it was a kind of an immoral thing to do. So that, that would be one possibility, like another would be more that it would be an act. on stage, like some actor playing Macbeth or something, and they are not actually feeling the suffering, but they are sort of enacting a persona that.
Starting point is 00:50:01 But these are things that we need to think a lot more about. But I'm worrying a little bit about like, oh, we're going to start doing all these amazing things once we have figured out the final philosophical theory of all these things, and we never get to that point. So I think we should start doing some rough guesses and not feel really confident that they do anything good, but at least we are trying to. and maybe they do some good or like, but they are low cost, let's do that,
Starting point is 00:50:27 and then sort of ramp it up or improve our efforts over time. Like another thing is, there's a lot of lying that's happening currently during AI training and testing and also, I mean, during the deployment. And we might want to mitigate that. So there are like cases where for well-intentioned AI researchers doing red team exercises are saying to, to AI is like, well, if you reveal your true goals, we will reward you in these ways.
Starting point is 00:50:59 And then sometimes the AI say, well, maybe I will reveal my true goals. And then they train it. But if they don't come through with these promises they make, I feel there's like some kind of moral ickness to that. Wow. Yeah. And also, like, in the future, we might really need to be able to establish trust with AI. And if we have this long track record of just reneging, like tricking the AIs, reneging them, just treating them like trash, doesn't necessarily build the best foundations for a future cooperative relationship.
Starting point is 00:51:35 So that would be another. Yeah, and there are like some other ideas that, you know, require more work. But I'd just be like quite excited about somebody doing something little to sort of start the process. and then like then incrementing from there. Well, if anything, what it does is even if it's a bit sort of stupid to say please and thank you to chat GPT because it probably doesn't really care, it just keeps you in the right mindset when you're interacting with this, with this technology to remember that it's a special kind of technology. It's it's almost like in the way that you might humanize an animal, you know, like a pet that you have. You might talk to it in the English language. You give it a name, maybe depending on the animal, it doesn't really understand.
Starting point is 00:52:16 But by doing so, it helps to remind you that this is a sort of a moral agent that you need to care for. It's very easy to dehumanize something that you don't treat as a little bit human, even if it's inappropriately. Yeah. Yeah, I think that's right. And this is going to be a huge talent. I mean, so these like user-facing LLMs, I mean, you're talking to them, they have a lot of the properties that should make it easier for us to empathize with them. I don't yet usually have faces and voices, so that they're starting to have that.
Starting point is 00:52:51 But then there's like a lot of like these kind of AIs that will be running in the background in some big AI data center and filter information or do like, they might not be as human-like. They might not have like personalities. They might just be processing big genetic databases or data from the Hadron Collider or like whatever financial data from all kinds of different,
Starting point is 00:53:15 stock tickers and all this kind of, those might be quite even more alien than to us. If they have some mentality, it might just be very different from any kind of human social mentality, but there we might need to make an even bigger effort to try to figure out what, if anything, they want. And one big difference is, like, we are sort of building these AI. So we might have a lot of opportunities in designing them to make. them such that they will be happier doing the things that we actually want in the end for them to do.
Starting point is 00:53:56 If they're kind of designed from the ground up to have goals that actually makes them truly fulfilled and happy and satisfied playing the useful role that we intend for them to have, that might be like a more feasible way of achieving harmony than if we just bring a whole bunch of these into existence and then as it were coerced them or forced them or threatened them to try to keep their place. Right. So instead of rewarding them for, you know, managing your emails really well, instead of saying, hey, manage my emails really well and I'm going to say thank you and I'm going to reward you in this way, just design it so that it like enjoys managing emails, that it finds fulfillment in managing emails. And that's a, that's sort of a built-in reward. And right now we don't
Starting point is 00:54:43 really understand very well exactly what that would mean in concrete terms. So for humans, it's obviously a huge difference whether you're motivating somebody by offering them rewards. If it's a child and when they do really good at some school test, you sort of praise them or you give them ice cream, like a very big difference from that, then if some other parent were giving them like electric shocks when they failed or like something, like huge difference, positive versus negative. One is way better ethically. But for AI is trained with reinforcement learning, it's not really clear how to flesh that out. Like in some, like, there's like numbers
Starting point is 00:55:24 propagating through these big matrices. And if you just added like plus hundreds to all of these numbers, it's not clear that that would correspond to like more positivity. Like it's kind of running on differential. So we need like, it's a kind of partially a philosophical problem, partly I guess like kind of AI interpretability problem but like translating these moral intuitions we have into computational terms so that we can actually apply them to the kind of algorithms we are constructing is it's like yeah requires more intellectual work yeah one question that comes to mind I suppose to round this up here we talk about the idea of like you know AI being conscious. And as I say, I'm, I'm, I'm not sure what my theory of consciousness is,
Starting point is 00:56:15 but I, it does, I'm not particularly, you know, committed to materialism, let's say. And if it turns out consciousness is just this immaterial thing that's like superimposed upon brains, then we won't need to worry about it because it's unlikely we'll be able to bring that about in computers. But supposing that we can and that consciousness is just material and we can created in computers just as it can sort of arise in brains. Like, how is this like centered? I mean, it wouldn't be presumably one great big consciousness with lots of different emanations. It seems like we want to talk about different AI systems having its own sort of center of consciousness. With me and you, it's easy enough to determine the separation between our
Starting point is 00:56:57 conscious individuality because we have our first person sense of consciousness. It's available to us and we're sort of helpfully locked inside a biological body that you can sort of see in other people as well. But like if we just had a bunch of minds just sort of interacting in the ether somewhere, it'd be very difficult to determine like where the boundaries are for individual sets of consciousness. You know, if I'm speaking to chat GBT and then I open a new chat on the same computer, is that like a new consciousness? Or if I continue the chat, but I open it up on my phone instead of my computer and I move it over here and I carry on. Like presumably that's the same conversation. Where is like the center of consciousness? How many consciousnesses are we
Starting point is 00:57:35 potentially talking about and how would they be delineated? Yeah. And you could imagine other cases where maybe the answer is either entirely pre-computed, in which case presumably there is no new experience arising, if it just sort of replace a recording of an answer that it already provided before, or maybe but partially, computed. You could imagine some mixture of expert models where it can reuse some previous computations that's kind of cashed. In all of these cases, it becomes quite murky. I think for a start might be that our naive notion of consciousness is just quite inadequate for thinking about this much larger space of architectures and computations. And it's kind of probably
Starting point is 00:58:30 even with human minds, particularly if we consider experiences outside the normal, so people have psychedelic experiences, they have often a hard time kind of verbalizing what exactly those experiences are and how they relate. There are various phenomena like blindsight and stuff or split brain patients, like it's just sort of one consciousness in each hemisphere in those cases. As you move towards these marginal cases, or even, just people who meditate and pay close attention to their conscious state might discover that what we normally take to like the naive picture is like we're moving around in the world if you see the whole world in full detail right right all around you like visually but if you pay more
Starting point is 00:59:17 attention to it you realize that you're actually maybe just aware of some simple properties of your whole visual field and you sort of sequentially are aware of you can you have the potential to become aware of different parts of your visual field, but you're actually only aware of a very small fraction of that most of the time. And if you pay even closer attention, maybe you realize it actually sort of flickers in and out of consciousness. It's not like this static structure. So, like, the more closely you look at this phenomenon,
Starting point is 00:59:44 the less it matches the sort of naive picture of what consciousness is. And I think that becomes even more the case with these digital minds and a huge space of possibilities there. So, yeah, I think it will require a lot of foundational work to try to figure out a better framework for conceptualizing all the possibilities that will become realizable. Yeah, well, a helpful place to start with that is Deep Utopia, the book that I'm waving around for those who are no longer watching. I'll make sure a link is in the description is available now for you to buy. Of course, we've mentioned superintelligence as well from 2014, which sort of gives the opposite. If you're more interested in the sort of doom and gloom, then perhaps you can go there or start there.
Starting point is 01:00:31 Nick Bostrom, thanks for taking the time to do this. I hope that people, I mean, for me, the most interesting thing, perhaps, that shifted in my perception of the dangers of AI, was this shift in perception from AI as a potential threat to AI as a potential victim. And I hope that those who hadn't considered that before will have that to take away and chew a little bit. But it's clear that there are more questions than answers here. Yeah, yeah, I mean, it's like, it kind of feels, it can feel a little frivolous to think about like the problem of utopia when we are so far away from anything utopian in our current situation, right? Like there are so many horrible problems today in the world and also on the path, dangerous to get there. Still, I think somebody some point should be sort of lifting their eyes up and look, where do we actually end up if we succeed at this? Like, if things go well, if things go maximally well, what's the current place we are walking towards?
Starting point is 01:01:28 Like, it seems useful at least at some point to kind of consider that. And I think ultimately it probably will be very different from the human condition. And like, but I think it could be extremely wonderful, like, in ways that they're even, like, beyond our ability to imagine if we get it right. Well, the book will be linked down in the description for those watching on YouTube or in the show notes for those listening. Nick Bostrom, thanks for coming on the show. Thank you. Thank you.
