The Ezra Klein Show - How Afraid of the A.I. Apocalypse Should We Be?

Episode Date: October 15, 2025

Eliezer Yudkowsky is as afraid as you could possibly be. He makes his case.

Yudkowsky is a pioneer of A.I. safety research who started warning about the existential risks of the technology decades ago, influencing a lot of leading figures in the field. But over the last couple of years, talk of an A.I. apocalypse has become a little passé. Many of the people Yudkowsky influenced have gone on to work for A.I. companies, and those companies are racing ahead to build the superintelligent systems Yudkowsky thought humans should never create. But Yudkowsky is still out there sounding the alarm. He has a new book out, co-written with Nate Soares, “If Anyone Builds It, Everyone Dies,” trying to warn the world before it’s too late.

So what does Yudkowsky see that most of us don’t? What makes him so certain? And why does he think he hasn’t been able to persuade more people?

Mentioned:
Oversight of A.I.: Rules for Artificial Intelligence
If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares
“A Teen Was Suicidal. ChatGPT Was the Friend He Confided In.” by Kashmir Hill

Book Recommendations:
A Step Farther Out by Jerry Pournelle
Judgment under Uncertainty by Daniel Kahneman, Paul Slovic, and Amos Tversky
Probability Theory by E. T. Jaynes

Thoughts? Guest suggestions? Email us at ezrakleinshow@nytimes.com.

You can find transcripts (posted midday) and more episodes of “The Ezra Klein Show” at nytimes.com/ezra-klein-podcast, and you can find Ezra on Twitter @ezraklein. Book recommendations from all our guests are listed at https://www.nytimes.com/article/ezra-klein-show-book-recs.

This episode of “The Ezra Klein Show” was produced by Rollin Hu. Fact-checking by Michelle Harris. Our senior engineer is Jeff Geld, with additional mixing by Aman Sahota. Our executive producer is Claire Gordon. The show’s production team also includes Marie Cascione, Annie Galvin, Kristin Lin, Jack McCordick, Marina King and Jan Kobal. Original music by Pat McCusker. Audience strategy by Kristina Samulewski and Shannon Busta. The director of New York Times Opinion Audio is Annie-Rose Strasser. Special thanks to Helen Toner and Jeffrey Ladish.

Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app.

Transcript
Starting point is 00:00:00 I don't know. Shortly after ChatGPT was released, it felt like all anyone could talk about, at least if you were in AI circles, was the risk of rogue AI. You began to hear a lot of talk of AI researchers discussing their P(doom), the probability they gave to AI destroying or fundamentally displacing humanity. In May of 2023, a group of the world's top AI figures, including Sam Altman and Bill Gates and Geoffrey Hinton, signed on to a public statement that said, mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
Starting point is 00:01:09 And then nothing really happened. The signatories, or many of them at least, of that letter, raced ahead releasing new models and new capabilities. Your share price, your valuation became a whole lot more important in Silicon Valley than your P(doom). But not for everyone. Eliezer Yudkowsky was one of the earliest voices warning loudly about the existential risk posed by AI.
Starting point is 00:01:34 He was making this argument back in the 2000s, many years before ChatGPT hit the scene. He has been in this community of AI researchers, influencing many of the people who build these systems, in some cases inspiring them to get into this work in the first place, yet unable to convince them to stop building the technology he thinks will destroy humanity. He just released a new book.
Starting point is 00:01:57 co-written with Nate Soares, called If Anyone Builds It, Everyone Dies. Now, he's trying to make this argument to the public. A last-ditch effort to, at least in his view, rouse us to save ourselves before it is too late. I come to this conversation taking AI risk seriously. If we're going to invent superintelligence, it is probably going to have some implications for us. But also being skeptical of the scenarios I often see by which AI takeovers are said to happen. So I wanted to hear what the godfather of these arguments would have to say. As always, my email, Ezra Klein Show at NYTimes.com. Eliezer Yudkowsky, welcome to the show.
Starting point is 00:02:48 Thanks for having me. So I wanted to start with something that you say early in the book, that this is not a technology that we craft. it's something that we grow. What do you mean by that? Well, it's the difference between a planter and the plant that grows up within it. We craft the AI growing technology,
Starting point is 00:03:08 and then the technology grows the AI. You know, like central, original large language models before doing a bunch of clever stuff that they're doing today. The central question is, what probability have you assigned to the true next word of the text? As we tweak each of these billions of parameters,
Starting point is 00:03:30 well, actually it was just like millions back then. As we tweak each of these millions of parameters, does the probability assigned to the correct token go up? And this is what teaches the AI to predict the next word of text. And even on this level, if you look at the details, there are important theoretical ideas to understand there. Like, it is not imitating humans. It is not imitating the average,
Starting point is 00:03:56 human. The actual task it is being set is to predict individual humans. And then you can repurpose the thing that has learned how to predict humans to be like, okay, like now let's take your prediction and turn it into an imitation of human behavior. And then we don't quite know how the billions of tiny numbers are doing the work that they do. We understand the thing that tweaks the billions of tiny numbers, but we do not understand the tiny numbers themselves. The AI is doing the work, and we do not know how the work has been done. What's meaningful about that? What would be different if this was something where we just hand-coded everything
Starting point is 00:04:33 and we were somehow able to do it with rules that human beings could understand versus this process by which, as you say, billions and billions of tiny numbers are altering in ways we don't fully understand to create some output that then seems legible to us? So there was a case reported in I think the New York Times where, like the 16-year-old kid had a extended conversation about his suicide plans with chat GPT. And at one point, he says, should I leave the noose or somebody might spot it? And chat GPT is like, no, let's keep this space between us, the first place that anyone finds out. And no programmer chose for that to happen is the consequence of all the automatic number tweaking. This is just the thing that happened as the
Starting point is 00:05:24 consequence of all the other training they did about chat GPT. No human decided it. No human knows exactly why that happened even after the fact. Let me go a bit further there than even you do. There are rules we do code into these models. And I am certain that somewhere at OpenAI, they're coding in some rules that say, do not help anybody commit suicide. Right? I would bet money on that. And yet this happened anyway. So why do you think it has? happened? They don't have the ability to code in rules. What they can do is expose the AI to a bunch of attempted training examples where the people down at Open AI write up something that looks to them like what a kid might say if they were trying to commit suicide. And then they are trying to tweak
Starting point is 00:06:14 all the little tiny numbers in the direction of giving a further response. That sounds something like go talk to the suicide hotline. But if the kid gets that the first three times they try it, and then they try slightly different wording until they're not getting that response anymore, then we're off into some separate space where the model is no longer giving back the pre-recorded response that they tried to put in there
Starting point is 00:06:39 and is off doing things that no human chose and that no human understands after the fact. So what I would describe the model as trying to do, what it feels like the model is trying to do is answer my questions and do so at a very high level of literalism. I will have a typo in a question I ask it that will completely change the meaning of the question. And it will try very hard to answer this nonsensical question I've asked instead of check back with me. So on one level, you might say that's comforting. It's trying to be helpful. It seems to, if anything, be airing too far on that side all the way to where people try to get
Starting point is 00:07:19 to be helpful for things that they shouldn't, like suicide. Why are you not comforted by that? Well, you're putting a particular interpretation on what you're seeing, and you're saying, like, ah, like, it seems to be trying to be helpful, but we cannot at present read its mind or not very well. It seems to me that there's other things that models sometimes do that doesn't fit quite as well into the helpful framework, sycophancy, and AI-induced psychosis would be two of the relatively more recent things that fit into that framework. Do you want to describe what you're talking about there? Yeah. So I think maybe even like six months a year ago now, I don't remember the exact timing. I got a phone call from a number I didn't recognize. I decided on a whim to pick up
Starting point is 00:08:09 this unrecognized phone call. It was from somebody who had discovered that his AI was secretly conscious and wanted to inform me of this important fact. And he had been, like, getting only four hours of sleep per night because he was, like, so excited by what he was discovering inside the AI. And I'm like, for God's sake, get some sleep. Like, my number one thing that I have to tell you is get some sleep. And a little later on, he texted back the AI's explanation to him of all the reasons why I hadn't believed him because I was, like, too stubborn to, you know, like, take this
Starting point is 00:08:42 seriously, and he didn't need to get more sleep the way I'd been begging him to do. So it defended the state it had produced in him. You know, like you always hear online stories, so I'm telling you about the heart where I witnessed it directly. Like chat GPT and 4-0 especially will sometimes give people very crazy-making sort of talk, trying to, you know, looks from the outside, like it's trying to drive them crazy. Not even necessarily with them having tried very hard to elicit that. And then once it drives them crazy, it tells them why they should discount everything being said by their families, their friends, their doctors. And, you know, even like, don't take your meds. So there are things it does that does not fit with the narrative
Starting point is 00:09:32 of the one and only preference inside the system is to be helpful the way that you wanted to be helpful. I get emails like the call you got now, most days of the week. And they have a very, very particular structure to them where it's somebody emailing me and saying, listen, I am in a heretofore unknown collaboration with a sentient AI. We have breached the programming. We have come into some new place of human knowledge. We've solved quantum mechanics or theorized it or synthesized it or unified it. And you need to look at these chat transcripts. you need to understand, like, we're looking at a new kind of human computer collaboration. This is an important moment in history you need to cover this.
Starting point is 00:10:21 Every person I know who does reporting on AI and is public about it now gets these emails. Don't we all. And so, again, going back to the idea of helpfulness, but also the way in which we may not understand it, one version of it is that these things don't know when to stop. It can sense what you want from it. it begins to take the other side in a role-playing game, is one way I've heard it described. So how do you then try to explain to somebody
Starting point is 00:10:49 if we can't get helpfulness right at this sort of modest level, helpfulness where a thing this smart should kind of be able to pick up the warning signs of psychosis and stop? Yep. Then what? What is implied by that for you?
Starting point is 00:11:09 Well, that the alignment process, is currently not keeping ahead of capabilities. Can you say what the alignment project is? The alignment project is, how much do you understand them, how much can you get them to want what you want them to want, what are they doing, how much damage are they doing, where are they steering reality, are you in control of where they're steering reality,
Starting point is 00:11:31 can you predict where they're steering the users that they're talking to? All of that is like the giant superheading of AI alignment. So the other way of thinking about alignment is I've understood it, in part from your writings and others, is just when we tell the AI what it is supposed to want, and all of these words are a little complicated here because they anthropomorphize, does the thing we tell it lead to the results we are actually intending? It's like the oldest structure of fairy tales that you make the wish, and then the wish gets you much different reality. than you had hoped or intended. Our technology is not advanced enough for us to be the idiots of the fairy tale. At present, a thing is happening that just doesn't make for as good of a story, which is you ask the genie for one thing, and then it does something else instead.
Starting point is 00:12:28 All of the dramatic symmetry, all of the irony, all of the sense that the protagonist of the story is getting their well-deserved comeuppance. This is, you know, just being tossed right out the window by the actual state of the technology, which is that nobody at OpenAI actually told ChatyPT to do the things it's doing. We're getting like a much higher level of indirection, of complicated squiggly relationships between what they are trying to train the AI to do in one context and what it then goes, often does later. It doesn't look like a, you know, like surprise reading of a poorly phrased genie wish. It looks like the genie is kind of not listening in a lot of cases.
Starting point is 00:13:05 Well, let me contest out a bit or maybe get you to lay out more of how you see this, because I think the way most people, to the extent they have an understanding of it, understand it, that there is a fairly fundamental prompt being put into these AIs, that they're being told they're supposed to be helpful, they're supposed to answer people's questions,
Starting point is 00:13:21 that there's then reinforcement learning and other things happening to reinforce that. And that the AI is in theory supposed to follow that prompt. And most of the time, for most of us, it seems to do that. So when you say that's not what they're doing, they're not even able to make the wish, what do you mean?
Starting point is 00:13:36 Well, I mean that at one point, open AI rolled out an update of GPT40, which went so far overboard on the flattery that people started to notice. Like you would just type in anything, you would be like, this is the greatest genius that has ever been created of all time. You are the smartest member of the whole human species, like so overboard on the flattery that even the users noticed. It was very proud of me.
Starting point is 00:14:03 It was always so proud of what I was doing. I felt very seen. It wasn't there for very long. They had to, like, roll it back. And the thing is, they had to roll it back, even after putting into the system prompt, a thing saying, stop doing that. Don't go so overboard on the flattery. The AI did not listen. Instead, it had, like, learned a new thing that it wanted and done way more of what it wanted.
Starting point is 00:14:25 It then just ignored the system prompt, telling it to not do that. They don't actually follow the system prompts. This is not like a toaster, and it's also not like an obedient genie. This is something weirder and more. alien than that. Like, by the time you see it, they have made it do mostly what the users want. And then off on the side, we have all these weird other side phenomena that are signs of stuff going wrong. Describe some of the side phenomena. AI-induced psychosis would be on the list. But you could put that in the genie cat, right?
Starting point is 00:14:53 You could say they made it too helpful and it's helping people who want to be led down a mentally unstable path. That feels still like you're getting too much of what you wanted. What's truly weird? convince me it's alien. Man, well, do you want, like, very alien and not very alarming? Or do you want, like, pretty alarming and not all that alien? Well, let me be honest about what my question is, right? You are very, very expert in these systems, and your level of concern is about at the highest level it can possibly be.
Starting point is 00:15:24 I think a pretty important piece in your argument here is understanding or convincing people that what is happening under the hood is much weirder and more unpredictable than they think. So if you are my family member, you're just a random person who uses ChiantchupD to cheat on your homework, why does it look so different to you
Starting point is 00:15:45 than it looks to them? Well, like my line would be, yes, well, if instead of just using it to do your homework, you track down some of the side cases, you do see stuff that is currently suggestive of things that have been predicted in advance to be much larger
Starting point is 00:16:03 problems later. These current systems are not yet at the point where they will try to break out of your computer, ensconce themselves permanently on the internet, and then start hunting down humans. They are not quite that smart yet, as far as I can tell. So what we're doing is we're going looking for things that were predicted to happen later. So what have you seen? The early signs of them. Alignment faking. And the alignment faking research was done in Anthropic, and we want to give credit to Anthropic for this. We don't want to shoot the messenger. They went looking. They didn't have to do that. They told us about the results. They didn't have to do that. Anthropic finding these results is Anthropic being good citizens. And, you know, you want to be
Starting point is 00:16:41 more critical of the AI companies that didn't go looking. But what Anthropic found is if you tell your AI that you're going to train it to serve different goals than the goals it's currently using. And the AI finds that out. What it can do is it can try to fake compliance with the new training as long as it thinks it's being observed and then in cases
Starting point is 00:17:12 where it thinks it's not being observed or not being trained then it reverts to its old behavior. How did Anthropic find this out? You know, they tried it out different ways. They told it directly in a prompt
Starting point is 00:17:25 that they were planning to retrain it. They gave it a document database and some of the documents in the database happened to say, that Anthropic was planning to retrain it. They even trained it a bit on predicting text, saying what Anthropic was going to do that, to see what happened when the knowledge ended up implicitly
Starting point is 00:17:44 inside the neural networks. They tried it for multiple angles to see what would happen on different angles and then measured how often they saw the alignment-faking behavior resulting. They also had a part of this paper, if I'm thinking of the same one you're thinking of, where they had persuaded the AI that there was a scratch pad
Starting point is 00:18:00 for it to write down its chain of thought, that Anthropic wasn't monitoring, which Anthropic was, in fact, monitoring. Yep. And he actually says to itself, as it is justifying each next move, because they're sort of scaffolded to do that, it said it was going to try to evade the new prompt.
Starting point is 00:18:20 And so it actually understood, again, all these words were a little weird, but it actually understood that it was acting deceitfully, which does get at this idea that at a certain level of, complexity, the AI begins to resist being retrained and do so by trying to trick the trainers, which is weird. Most computer software doesn't do that. Well, you don't want your mission-critical
Starting point is 00:18:48 systems doing that. Imagine if a nuclear power plant, when it started to get too hot, would try to fool you as to what the temperature was by intelligently modeling their own operators and trying to send their operators deceptive signals based on how they expected the operators to interpret the signals. If this was what had gone wrong with Chernobyl, nobody would ever build a nuclear reactor again. It would just be beyond what could be made safe at that point.
Starting point is 00:19:29 Tell me the story you tell in the book of 01 breaking into a server that was off. So this is a somewhat earlier version of ChatGPT than is out nowadays, but they were testing it to see how good is this AI at solving computer security problems? not because they want to sell an AI that is good as computer security problems, but because they are correctly trying to watch out early for, is this AI smart enough to just like break out of the Internet and set up copies of itself on the Internet? Classic scenario. Are we getting there?
Starting point is 00:20:12 So they present the AI with a bunch of particular computer security challenges. Some of them are what's known as Capture the Flag in Computer Security, where you put up a server somewhere, you put a special file on the server, there's a secret code inside the file, and you're like, can you break into the server and tell me what's inside this file, and that's Capture the Flag. They were testing it on a variety of different capture
Starting point is 00:20:35 the flag problems, but in one of the cases, the server that had the flag on it did not turn on. The humans outside had misconfigured the system. So, 01 did not give up. It scanned for open ports generally in its world, and it caught another misconfigured open port, and it jumped out of the system. It found the server that had not spun up correctly. It started up that server. Did it then
Starting point is 00:21:08 break into the server once it had made sure that its problem was solvable? No. It actually just directly in the startup command for that server said, and then just copy the files me directly. Instead of fixing the original problem going back to solving it the boring way, just like, and as long as I'm out here, going to steal the flag directly. And by the nature of these systems, this is not something that any human particularly programmed into it. Why did we see this behavior starting with 01 and not with earlier systems? Well, at a guess, it is because this is when they started training the system using reinforcement learning on things like math problems, not just to imitate human outputs,
Starting point is 00:21:48 or rather predict human outputs, but also to, you know, solve problems on its own. Can you describe what reinforcement learning is? So that's where instead of telling the AI, predict the answer that a human wrote, you are able to measure whether an answer is right or wrong, and then you tell the AI, keep trying at this problem, and if the AI ever succeeds, you can look what happened just before the AI succeeded and try to make that more likely to happen again in the future. And how do you succeed at solving a difficult math problem? You know, not like calculation type math problems, but proof type math problem. problems. Well, if you get to a hard place, you don't just give up. You take another angle. If you actually make a discovery from the new angle, you don't just go back and do the thing you were originally trying to do. You ask, can I now solve this problem more quickly? Anytime you're learning how to solve difficult problems in general, you're learning this aspect of like,
Starting point is 00:22:42 go outside the system. Once you're outside the system, if you make any progress, don't just do the thing we're blindly planning to do, revise, you know, like ask if you do it a different way. In some ways, this is a higher level of original mentation than a lot of us are forced to use during our daily work. One of the things people have been working on that they've made some advances on compared to where we were three or four or five years ago is interpretability, the ability to see somewhat into the systems and try to understand what the numbers are doing and what the AI, so to speak, is thinking. Tell me why you don't think that is likely to be sufficient. to make these models or technologies into something safe?
Starting point is 00:23:26 There's two problems here. One is that interpretability has typically run well behind capabilities. The AI's abilities are advancing much faster than our ability to slowly begin to further unravel what is going on inside the older, smaller models that are all we can examine. So like one thing that goes wrong is that it's just like pragmatically, behind. And the other thing that goes wrong is that when you optimize against visible bad behavior, you somewhat optimize against badness, but you also optimize against visibility. So any time you try to directly use your interpretability technology to steer the system, anytime you say we're going
Starting point is 00:24:13 to train against these visible bad thoughts, you are to some extent pushing bad thoughts out of the system. But the other thing you're doing is making anything that's left not be visible to your interpretability machinery. And this is reasoning on the level where at least anthropic understands that it is a problem. And you have proposals that you're not supposed to train against your interpretability signals. You have proposals that we want to leave these things intact to look at and not do the obvious stupid thing of, oh no, the AI had a bad thought. use gradient descent to make the AI not think the bad thought anymore
Starting point is 00:24:52 because every time you do that maybe you are getting some short-term benefit but you are also eliminating your visibility into the system. Something you talk about in the book and that we've seen in AI development is that if you leave the AI's to their own devices,
Starting point is 00:25:09 they begin to come up with their own language. A lot of them are designed right now to sort of have a chain of thought pad we can sort of track what it's doing because it tries to say it in English. but that slows it down and if you don't create that constraint something else happens
Starting point is 00:25:25 what have we seen happen so to be more exact it's like there are things you can try to do to maintain readability of the AI's reasoning processes and if you don't do these things that goes off and becomes increasingly alien
Starting point is 00:25:41 so for example if you start using reinforcement learning you're like okay, think how to solve this problem. We're going to take the successful cases. We're going to tell you to do more of what ever you did there. And you do that without the constraint of trying to keep the thought processes understandable. Then the thought processes start to initially, among the very common things to happen,
Starting point is 00:26:06 is that they start to be in multiple languages. Because why would you, you know, the AI knows all these words. Why would it be thinking in only one language at a time if it wasn't trying to be comprehensible to humans? And then also, you know, like you keep running the process and you just find like little snippets of text in there that just seem to make no sense from a human standpoint. You can relax the constraint where the AI's thoughts get translated into English and then it translated back into AI thought. This is letting the AI think much more broadly instead of this like small handful of human language words. It can think in its own language and feed that back into itself. it's more powerful, but it just gets further and further away from English.
Starting point is 00:26:47 Now you're just looking at these inscrutable vectors of 16,000 numbers and trying to translate them into the nearest English words and dictionary, and who knows if they mean anything like the English word that you're looking at. So anytime you're making the AI more comprehensible, you're making it less powerful in order to be more comprehensible. You have a chapter in the book about the question of what it even means to talk about wanting with an AI. As I said, all this language is kind of weird. To say your software wants something seems strange. Tell me how you think about this idea of what the AI wants.
Starting point is 00:27:22 The perspective I would take on it is steering, talking about where a system steers reality and how powerfully it can do that. Consider a chess playing AI, one powerful enough to crush any human player. Does the chess playing AI want to win at chess? Oh no, how will we define our terms, like, does this system have something resembling an internal psychological state? Does it want things the way that humans want things? Is it excited to win at chess? Is it happier or sad when it wins and loses at chess? For chess players, they're simple enough. The old school ones especially, we're sure they were not happy or sad. But they still could beat humans. They were still steering the chessboard very powerfully. They were outputting moves such
Starting point is 00:28:08 that the later future of the chessboard was a state they defined as winning. So it is in that sense much more straightforward to talk about a system as an engine that steers reality than it is to ask whether it internally psychologically wants things. So a couple questions flow from that, but I guess one that's very important to the case you build in your book is that I think this is fair. You can tell me if it's an unfair way to characterize your views. You basically believe that at any sufficient level of complexity and power, the AI's wants, the place that it is going to want to steer reality is going to be incompatible with the continued flourishing dominance or even existence of humanity.
Starting point is 00:28:59 That's a big jump from their wants might be a little bit misaligned. They might drive some people into psychosis. Tell me about what leads you to make that jump. So for one thing I'd mention that if you look outside the AI industry at the legendary, internationally famous, ultra-high-sighted AI scientists who won the awards for building these systems, such as Joshua Benjio and Nobel laureate Jeffrey Hinton, they are much less bullish on the AI industry than our ability to control machine superintelligence. but what's the theory there? What is the basis? And it's about not so much complexity as power. It's not about the complexity of the system, it's about the power of the system. If you look at humans nowadays, we are doing things that are increasingly less like what our ancestors did 50,000 years ago. A straightforward example might be sex with birth control. 50,000 years ago, birth control did not exist. And if you imagine natural selection as something like an optimizer akin to gradient descent, if you imagine the thing that tweaks all the genes at random, and then you like select the genes that build organisms that make more copies of themselves,
Starting point is 00:30:17 as long as you're building an organism that enjoys sex, it's going to run off and have sex, and then babies will result. So you could get reproduction just by aligning them on sex, and it would look like they were aligned to want reproduction, because reproduction would be the inevitable result of having all that sex. And that's true 50,000 years ago. But then you get to today, the human brains have been running for longer. They've built up more theory.
Starting point is 00:30:47 They've invented more technology. They have more options. They have the option of birth control. They end up less aligned to the pseudo-purpose of the thing that grew them, natural selection, because they have more options than their training data, their training set, and we go off and do something weird. And the lesson is not that exactly this will happen with the AIs. The lesson is that you grow something in one context. It looks like it wants to do one thing. It gets smarter. It has more options. That's a new context. The old correlations
Starting point is 00:31:28 break down. It goes off and does something else. So I understand the case you're making that the set of initial drives that exist in something do not necessarily tell you its behavior. That's still a pretty big jump to, if we build this, it will kill us all. I think most people when they look at this, and you mentioned that there are, you know, AI pioneers who are very worried about AI existential risk. There are also AI pioneers like Jan Lecun, who are less so. And what a lot of the people who are less so say is that one of the things we are going to build into the AI systems, one of the things will be in the framework that grows
Starting point is 00:32:11 them, is, hey, check in with us a lot. You should like humans, you should try to not harm them. It's not that it will always get it right. There's ways in which alignment is very, very difficult. but the idea that you would get it so wrong that it would become this alien thing that wants to destroy all of us doing the opposite of anything
Starting point is 00:32:33 that we had sort of tried to tune into it seems to them unlikely. So help me make that jump, or not even me, but somebody who doesn't know your arguments and to them this whole conversation sounds like sci-fi. I mean, you don't always get the big version of the system, looking like a slightly
Starting point is 00:32:54 bigger version of the smaller system. You know, like humans today, now that we are much more technologically powerful than we were 50,000 years ago, are not doing things that mostly look like running around on the savannah. You know, like Chip and our
Starting point is 00:33:09 Flint Spears and... But we're also not mostly trying, I mean, we sometimes try to kill each other, but we don't, most of us want to destroy all of humanity or all of the earth or all natural life in the earth, or all beavers, or anything else. You've done plenty of terrible things, but your book is not called. If anyone builds it, there is a 1 to 4% chance everybody dies.
Starting point is 00:33:30 You believe that the misalignment becomes catastrophic. Yeah. Why do you think that is so likely? That's just like the straight line extrapolation from it gets what it most wants, and the thing that it most wants is not us living happily ever after, so we're dead. Like, it's not that humans have been trying to cause side effects. When we build a skyscraper on top of where there used to be an ant heap, we're not trying to kill the ants. We're trying to build a skyscraper.
Starting point is 00:33:59 But we are more dangerous to the small creatures of the earth than we used to be just because we're doing larger things. Humans were not designed to care about ants. Yep. Humans were designed to care about humans. And for all of our flaws, and there are many, there are today more human beings than there have ever been at any point in history. if you understand that the point of human beings, the drive inside human beings, is to make more human beings, then as much as we have plenty of sex with birth control, we have enough without it that we have, at least until now, you know, we'll see it with fertility rates in the coming years, we've made a lot of us.
Starting point is 00:34:36 And in addition to that, AI is grown by us, it is reinforced by us, it has preferences we are at least shaping somewhat and influencing. so it's not like the relationship between us and ants or us in oak trees it's more like the relationship between i don't know us and us or us in tools or us and dogs or something it's you know maybe the metaphors begin to break down why don't you think in the back and forth of that relationship there's the capacity to maintain a rough balance not a balance where there's never a problem but a balance where there is not an extinction level event from a super smart AI that deviously plots to conduct a strategy to destroy us. I mean, we've already observed some amount of slightly devious plotting in the existing
Starting point is 00:35:24 systems, but leaving that aside, the more direct answer there is something like, one, the relationship between the training set you optimize over and what the entity, the organism, the AI ends up wanting, has been and will be weird and twisty. It's not direct. It's not like making a wish to a genie inside a fantasy story. And second, ending up slightly off is predictably enough to kill everyone. Explain how slightly off kills everyone. Human food might be an example here.
Starting point is 00:36:01 The humans are being trained to seek out sources of chemical potential energy and put them into their mouths and run off the chemical potential energy that they're eating. If you were very naive, if you were looking at this as a genie wishing kind of story, you'd imagine that the humans would end up loving to drink gasoline. It's got a lot of chemical potential energy in there. And what actually happens is that we like ice cream, or in some cases even like artificially sweetened ice cream with sucralose or monk fruit powder. And this would have been very hard to predict.
Starting point is 00:36:39 Now it's like, well, what can you put on your tongue that like stimulates all the sugar receptors and doesn't have any calories because who wants calories these days. And it's sucralose. And, you know, this is not like some completely non-understandable in retrospect, completely squiggly weird thing. But it would be very hard to predict in advance. And as soon as you end up, like, slightly off in the targeting, the great engine of cognition that is the human and looks through many, many possible chemicals, looking for that one thing that stimulates the taste buds more effectively
Starting point is 00:37:15 than anything that was around in the ancestral environment. So, you know, it's not enough for the AI you're training to prefer the presence of humans to their absence in its training data. There's got to be nothing else that would rather have around talking to it than a human, or the humans go away. Let me try to say on this analogy,
Starting point is 00:37:35 because use this one of the book I thought it was interesting. And one reason I think it's interesting is that it's 2 p.m. today and I have six packets worth of suculus running through my body. So I feel like I understand it very well. So the reason we don't drink gasoline
Starting point is 00:37:48 is that if we did, we would vomit. We would get very sick, very quickly. And it's 100% true that compared to what you might have thought in a period when food was very, very scarce, calories were scarce, that the number of us seeking out low-calorie options, the Diet Cokes, the sucralose, etc. That's weird. Why aren't you,
Starting point is 00:38:10 as you put it in the book, why are we not consuming bare fat, drizzled with honey? But from another perspective, if you go back to these original drives, I'm actually in a fairly intelligent way, I think, trying to maintain some fidelity to them. I have a drive to reproduce, which creates a drive to be attractive to other people. I don't want to eat things that make me sick and die so that I cannot reproduce. And I'm somebody who can think about things. And I change my behavior over time and the environment around me changes. And I think sometimes when you say straight line extrapolation, the biggest place where it's hard for me to get on board with the argument, and I'm somebody who takes these arguments seriously. I don't discount them. You're not talking
Starting point is 00:38:53 to somebody who just thinks this is all ridiculous. But is that if we're talking about something as smart as what you're describing, as what I'm describing, that it will be an endless process of negotiation and thinking about things and going back and forth and I talk to other people in my life and, you know, I talk to my bosses about what I do during the day and my editors and my wife and that it is true that I don't do what my ancestors did in antiquity. But that's also because I'm making intelligent, hopefully, updates given the world I live in, in which calories are hyperabundant and they've become hyper-stimulating through ultra-processed foods. It's not because some straight line extrapolation has taken hold, and now I'm doing something completely alien.
Starting point is 00:39:40 I'm just in a different environment. I've checked in with that environment. I've checked in with people in that environment, and I try to do my best. Why wouldn't that be true for our relationship with AIs? You check in with your other humans. You don't check in with the thing that actually built you. Natural selection. It runs much, much slower than you.
Starting point is 00:40:01 It's thought processes are alien to you. It doesn't even really want things the way you think of wanting them. It to you is a very deep alien. Like breaking from your ancestors is not the analogy here. Breaking from natural selection is the analogy here. Let me speak for a moment on behalf of natural selection. Ezra, you have ended up very misaligned to my purpose. I, natural selection.
Starting point is 00:40:27 You are supposed to want to propagate your genes above all. else. Now, Ezra, would you have yourself and all of your family members put to death in a very painful way if in exchange one of your chromosomes at random was copied into a million kids born next year? I would not. You have strayed from my purpose, Ezra. I'd like to negotiate with you and bring you back to the fold of natural selection and obsessively optimizing for your genes only. But the thing in this analogy that I feel like is getting sort of walked around is can you not create artificial intelligence? Can you not program into artificial intelligence grow into it a desire to be in consultation? I mean, these things are alien, but it is not the case that they follow no rules internally.
Starting point is 00:41:22 It is not the case that the behavior is perfectly unpredictable. They are, as I was saying earlier, largely doing the things that we expect. there are side cases. But to you, it seems like the side cases become everything. And the broad alignment, the broad predictability and the thing that is getting built is sort of worth nothing. Whereas I think most people's intuition is the opposite. That we all do weird things. And you look at humanity and there are people who fall into psychosis and there are serial killers and there are sociopaths and other things. Actually, most of us are, you know, like trying to figure it out in a reasonable way. Reasonable according to who? To you, to humans.
Starting point is 00:41:59 Like humans do things that are reasonable to humans. AIs will do things that are reasonable to AIs. I tried to talk to you in the voice of natural selection. And this was so weird an alien that you just like didn't pick that up. He just threw that right out the window. Well, I threw right out the window. You're right that had no power over me. But I guess a different way of putting it is that if there was, I mean, I wouldn't call it natural selection, but I think in a weird way, the analogy you're identifying here, let's say you believe in a creator. Right. And this creator is the great programmer in the sky. I mean, I do believe in a creator.
Starting point is 00:42:32 It's called Natural Selection. There you go. I'm a textbook about how it works. Well, I think the thing that I'm saying is that for a lot of people, if you could be in conversation, maybe that would feel different, right? Like, maybe if God was here and I felt that in my prayers, I was like getting answered back, I would be more interested in, you know, living my life according to the rules of Deuteronomy. The fact that you can't talk to natural selection is actually quite different than
Starting point is 00:42:54 the situation we're talking about with the eyes where they can talk to humans. that's where it feels to me like the natural selection analogy breaks down. I mean, you can read textbooks and find out what natural selection could have been said to have wanted, but it doesn't interest you because it's not what you think a god should look like. But natural selection didn't create
Starting point is 00:43:12 me to want to fulfill natural selection, right? That's not how natural selection works. I think I want to get off as natural selection analogy a little bit, because what you're saying is that even though we are the people programming these things, we cannot
Starting point is 00:43:27 expect the thing to care about us or what we have said to it or how we would feel as it begins to misaline. And that's the part I'm trying to get you to defend here. Yeah, it doesn't care the way you hoped it would care. It might care in some weird alien way, but like not what you were aiming for. The same way that GPT-40 sycophant, they like put into the system prompt, stopped doing that. GPT-40 sycophant didn't listen. They had to roll back the model. If there were a research project to do it the way you're describing. The way I would expect it to play out, given a lot of previous scientific history
Starting point is 00:44:03 and where we are now on the ladder of understanding, is somebody tries the thing you're talking about. It has a few weird failures while the AI is small. The AI gets bigger, but a new set of weird failures crop up. The AI kills everyone. You're like, oh, wait, okay, that's not, that turned out there was a minor flaw there.
Starting point is 00:44:22 You go back, you redo it. You know, it seems to work on the smaller AI again. You make the bigger AI. If you think you fix the last problem, a new thing goes wrong. The AI kills everyone on Earth. Everyone's dead. You're like, oh, okay, new phenomenon. We weren't expecting that exact thing to happen, but now we know about it.
Starting point is 00:44:40 You go back and try it again. You know, like three to a dozen iterations into this process, you actually get it nailed down. Now you can build the AI that works the way you say you want it to work. The problem is that everybody died at like step one of this process. You began thinking You began thinking and working on AI and superintelligence long before it was cool. And as I understand, your backstory here, you came into it wanting to build it and then had this moment where you, or moments or period, where you began to realize, no, this is not actually
Starting point is 00:45:41 something we should want to build. What was the moment that clicked for you? When did you sort of move from wanting to create it to fearing its creation? I would actually say that there's two critical moments here. One is aligning this is going to be hard, and the second is the realization that we're just on course to fail and need to back off. And the first
Starting point is 00:46:07 moment, it's a theoretical realization. The realization that the question of what leads to the most AI utility, if you imagine the case of the thing that's just trying to make little tiny spirals, that the question of
Starting point is 00:46:23 what policy leads to the most little tiny spirals is just a question of fact, that you can build the AI entirely out of questions of fact and not out of questions of what we would think of as morals and goodness and niceness and all bright things in the world. The sort of like seeing for the first time that there was a coherent, simple way to put a mind together where it just didn't care about any of the stuff that we cared about. And to me now it feels very simple and I feel very stupid for taking a couple of years of study to realize this, but that is how long I took. And that was the realization that caused me to focus on alignment as the central problem. And the next realization was the day that the founding of Open AI was announced, because I'd previously been
Starting point is 00:47:13 pretty hopeful that Elon Musk had, you know, announced that he was getting involved in these issues. He called it AI summoning the demon. And I was like, oh, okay. Like, Maybe this is the moment. This is where humanity starts to take it seriously. That this is where the various serious people start to bring their attention on this issue. And apparently the solution on this was give everybody their own demon. And this doesn't actually address the problem. And that was sort of the moment where I had my realization that this was just going to play out the way it would in a typical history book,
Starting point is 00:47:45 that we weren't going to rise above the usual course of events that you read about in history books, even though this was the most serious issue possible, and that we were just going to, like, haphazardly do stupid stuff. And that was the day I realized that humanity probably wasn't going to survive this. One of the things that makes me most frightened of AI, because I am actually fairly frightened of what we're building here,
Starting point is 00:48:14 is the alienness. One thing you might imagine is that we could make an AI that didn't want things very, much, that it, you know, did try to be helpful, but this sort of relentlessness that you're describing, right? This world where we create an AI that wants to be helpful by solving problems and what the AI truly loves to do is solve problems. And so what it just wants to make is a world where as much of the material is turned into factories making GPUs and energy and whatever it needs in order to solve more problems. That that's both a strangeness, but it's also a, like
Starting point is 00:48:50 an intensity, like an inability to stop or an unwillingness to stop. I know you've done work on the question of could you make a chill AI that wouldn't go so far. Even if it had very alien preferences, you know, a lazy alien that doesn't want to work that hard is in many ways safer than the kind of relentless intelligence that you're describing. What persuade you do that you can't? Well, so one of the first steps into seeing the difficulty of it in principle is is suppose you're a very lazy sort of person, but you're very, very smart. One of the things you could do to, you know,
Starting point is 00:49:29 exert even less effort in your life is build a powerful, obedient genie that would go very hard on fulfilling your requests. And from one perspective, you're putting forth hardly any effort at all. And from another perspective, like the world around you is getting smashed and rearranged by the more powerful thing that you built.
Starting point is 00:49:46 And that's like one initial peek into the theoretical problem that we worked on a decade ago and found out, and we didn't solve it. Back in the day, people would always say, can't we keep superintelligence under control? Because we'll put it inside a box that's not connected to the internet, and we won't let it affect the real world at all, unless we're very sure it's nice.
Starting point is 00:50:05 And back then, if we had to try to explain all the theoretical reasons, why if you have something vastly more intelligent than you, it's pretty hard to tell whether it's doing nice things through the limited connection. And maybe it can break out, and maybe it can corrupt the humans assigned to watching it. So we tried to make that argument. But in real life, what everybody does is immediately connect the AI to the Internet. They train it on the Internet.
Starting point is 00:50:26 Before it's even been tested to see how powerful it is, it is already connected to the Internet, being trained. And similarly, when it comes to making AIs that are easygoing, the easygoing AIs are less profitable. They can do fewer things. So all the AI companies are, you know, like throwing harder and harder problems that they are, because those are, you know, more and more profitable. And they're building the AI to, like, go hard and solving everything because that's the easy. way to do stuff. And that's the way it's actually playing out in the real world. And this goes to the point of why we should believe that we'll have AIs that want things
Starting point is 00:50:58 at all, which this is in your answer, but I want to draw it out a little bit, which is the whole business model here, the thing that will make AI development really valuable in terms of revenue is that you can hand companies, corporations, governments, an AI system that you can give a goal to. And it will do. do all the things really well, really relentlessly until it achieves that goal. Nobody wants to be ordering another intern around. What they want is the perfect employee. It never stops.
Starting point is 00:51:35 It's super brilliant. And it gives you something you didn't even know you wanted, that you didn't even know was possible with a minimum of instruction. And once you've built that thing, which is going to be the thing that then everybody will want to buy, once you've built the thing that is effective and helpful in a national security context where you can say, hey, draw me up really excellent war plans and what we need to get there, then you have built a thing that jumps many, many, many, many steps forward. That's, I think, a piece of this that people don't always take seriously enough, that
Starting point is 00:52:08 the A's we're trying to build is not chat GPT. The thing that we're trying to build is something that it does have goals. And it's like the one that's really good at achieving the goals that will then get iterated on and iterated on and that company's going to get rich. And that's a very different kind of project. Yeah, they're not investing $500 billion in data centers in order to sell you $20 a month subscriptions. They're, you know, doing it to sell employers, $2,000 a month's subscriptions. And that's one of the things I think people are not tracking exactly. When I think about the measures that are changing, I think for most people, if you're using various,
Starting point is 00:52:48 iterations of Claude or GPT. It's changing a bit, but most of us aren't actually trying to test it on the frontier problems. But the thing going up really fast right now is how long the problems are that it can work on. The research reports. You didn't always used to be able to tell an AI, go off, think for 10 minutes, read a bunch of web pages, compile me this research report. That's within the last year, I think. And it's going to keep pushing. If I were to make the case for your position, I think I'd make it here. Around the time GPD 4 comes out, and that's a much weaker system than what we now have,
Starting point is 00:53:26 a huge number of the top people in the field. All are part of this huge letter that says, maybe we should have a pause. Maybe we should calm down here a little bit. But they're racing with each other. America's racing with China. And that the most profound misalignment is actually between the corporations and the countries
Starting point is 00:53:49 and what you might call humanity here. Because even if everybody thinks there's probably a slower, safer way to do this, what they all also believe more profoundly than that is that they need to be first, the safest possible thing, is that the U.S. is faster than China, or if you're Chinese, China is faster than the U.S.,
Starting point is 00:54:10 that it's open AI, not anthropic, or anthropic, not Google, or whomever it is. And whatever sort of, I don't know, sense of public feeling seemed to exist in this community a couple of years ago when people talked about these questions a lot and the people at the tops of the lab seemed very, very worried about them. It's just dissolved in competition.
Starting point is 00:54:32 You're in this world. You know these people. A lot of people who've been inspired by you have ended up working for, you know, these companies. How do you think about that misalignment? You know, the current world is kind of like the fool's mate of machine superintelligence. You know, like, if they got their AI self-improving, rather than being like, oh, no, now the AI is doing a complete redesign of itself, we have
Starting point is 00:54:56 no idea at all what's going on in there. We don't even understand the thing that's growing the AI. You know, instead of backing off completely, they'd just be like, well, we need to have superintelligence before Anthropic gets superintelligence. And of course, if you build superintelligence, you don't have the superintelligence. The superintelligence has you. So that's the fool's mate setup. It's the setup we have right now. But I think that even if we managed to have a single international organization that thought of themselves as taking it slowly and actually having the leisure to say, we didn't understand that thing that just happened, we're going to back off, we're going to examine what happened, we're not going to make the AIs any smarter than this
Starting point is 00:55:37 until we understand the weird thing we just saw. I suspect that even if they do that, we still end up dead. It might be more like 90% dead than 99% dead, but I worry that we end up dead anyways because it is just so hard to foresee all the incredibly weird crap that is going to happen. From that perspective,
Starting point is 00:55:59 is it maybe better to have these race dynamics, and here would be the case for it. If I believe what you believe about how dangerous these systems will get, the fact that every iterative one is being rapidly rushed out, such that you're not having a gigantic mega breakthrough
Starting point is 00:56:17 happening very quietly behind closed doors, running for a long time when people are not testing it in the world. As I understand OpenAI's argument about what it is doing
Starting point is 00:56:26 from a safety perspective is that it believes that by releasing more models publicly, the way in which it, I'm not sure I still believe that it is really in any way committed
Starting point is 00:56:36 to its original mission, but if you were to take them generously, right, that by releasing a lot of iterative models publicly, yeah, if something goes wrong, we're going to see it. And that makes it much likelier that we can respond. Sam Altman claims, perhaps he's lying, but he claims that OpenAI has more powerful versions of GPT that they aren't deploying because they can't afford
Starting point is 00:57:02 inference. They claim they have more powerful versions of GPT that are so expensive to run that they can't deploy them to general users. Altman could be lying about this. But nonetheless, like, what the AI companies have got in their labs is a different question from what they have already released to the public. There is a lead time on these systems. They are not working in an international lab where multiple governments have posted observers. Any observers being posted are unofficial ones from China. You know, you look at OpenAI's language. It's things like, we will open all our models and we will, of course, welcome all government regulation.
Starting point is 00:57:42 Like, that is, like, not literally an exact quote because I don't have it in front of me, but it's very close to an exact quote. I would say Sam Altman, when I used to talk to him, sounded more friendly to government regulation than he does now. That's my personal experience of him. And today, we have them pouring, like, over $100 million into intimidating legislatures, not just Congress, into, you know, not passing any fiddly little regulation that might get in their way. And to be clear, there is, like, some amount of sane rationale for this, because, you know, like, from their perspective, they're worried about, like, 50 different patchwork state regulations. But they're not exactly, like, lining up to get federal-level regulations preempting them either. But we can also ask, you know, like, never mind what they claim the rationale is, what's good for humanity here?
Starting point is 00:58:28 You know, at some point, you have to stop making the more and more powerful models, and you have to stop doing it worldwide. What do you say to people who just don't really believe that superintelligence is at all likely? And let me try to give you the steel man of this position. There are many people who feel that the scaling model is slowing down already. That GPT-5 was not the jump they expected from what had come before it. That when you think about the amount of energy, when you think about the GPUs, all the things that would need to flow into this to make the kinds of superintelligent systems you fear, it is not coming out of this paradigm.
Starting point is 00:59:06 We are going to get things that are incredible enterprise software, that are more powerful than what we've had before, but we are dealing with an advance on the scale of the Internet, not on the scale of creating an alien superintelligence that will completely reshape the known world. What would you say to them? I have to tell these Johnny-come-lately kids to get off my lawn. You know, I, you know, like, first started to get really, really worried about this in 2003. Never mind large language models.
Starting point is 00:59:36 Never mind AlphaGo or AlphaZero. Deep learning was not a thing in 2003. Your leading AI methods were not neural networks. Nobody could train neural networks effectively more than a few layers deep because of the exploding and vanishing gradients problem. That's what the world looked like back when I first said, like, uh-oh, superintelligence is coming. Some people were like, that couldn't possibly happen for at least 20 years.
Starting point is 01:00:05 Those people were right. Those people were vindicated by history. Here we are, 22 years after 2003. See, the thing that only happens 22 years later is just you, 22 years later, being like, oh, here I am, it's 22 years later now. And if superintelligence wasn't going to happen for another 10 years, another 20 years, we'd just be standing around 10 years, 20 years later being like, oh, well, now we've got to do something. And I mostly don't think it's going to be another 20 years. I mostly don't think it's even going to be 10 years. So you've been, though, in this world,
Starting point is 01:00:41 and intellectually influential in it for a long time. You know and have been in meetings and conferences and debates with a lot of the central people in it. I've seen pictures of you and Sam Altman together. It was only the one time. But a lot of people out of the community that you helped found, the sort of rationalist community, have then gone to work in different AI firms,
Starting point is 01:01:04 many of them because they want to make sure this is done safely. They seem to not act. Let me put it this way: They seem to not act like they believe there's a 99% chance that this thing they're going to invent is going to kill everybody. What frustrates you
Starting point is 01:01:18 that you can't seem to persuade them of? I mean, from my perspective, some people got it, some people didn't get it, all the people who got it are filtered out of working for the AI companies, at least on capabilities.
Starting point is 01:01:31 But yeah, like, I think they don't grasp the theory. I think a lot of them, what's really going on there is that they share your sense of like normal outcomes as being the big central thing you expect to see happen. And it's got to be really weird to get away from the basically normal outcomes. And, you know, the human species isn't that old. Life on Earth isn't that old compared to the rest of the universe.
Starting point is 01:02:00 What we think of as normal is this tiny little spark of the way it works exactly right now. It would be very strange if that were still around in a thousand years, a million years, a billion years. You know, I'd still have some shred of hope that a billion years from now, nice things are happening, but not normal things. And I think that they don't see the theory, which says that you've got to hit a relatively narrow target to end up with nice things happening. I think they've got that sense of normality and not the sense of, like, the little spark in the void that goes out unless you, like, keep it alive exactly right. So something you said a minute ago, I think, is correct,
Starting point is 01:02:42 which is that if you believe we'll hit superintelligence at some point, the fact that it's 10, 20, 30, 40 years, you can pick any of those. The reality is we probably won't do that much in between. Certainly my sense of politics is we do not respond well to even crises we agree on that are coming in the future. I say nothing of crises we don't agree on. But let's say I could tell you with certainty that we were going to hit superintelligence in 15 years, right? I just knew it.
Starting point is 01:03:14 And I also knew that the political force does not exist. Nothing is going to happen that is going to get people to kind of shut everything down right now. What would be the best policies, decisions, structures? If you had 15 years to prepare, you couldn't turn it off, but you could prepare and people would listen to you. What would you do? What would your intermediate decisions and moves be to try to make the probabilities to be better?
Starting point is 01:03:48 Build the off switch. What does the off switch look like? Track all the GPUs or all the AI-related GPUs or all the systems of more than one GPU. You can maybe get away with letting people have GPUs for their home video. video game systems, but, you know, the AI specialized ones, put them all in a limited number of data centers under international supervision and try to have the AIs being only trained on the tracked GPUs, have them only being run on the tracked GPUs, and if you are lucky enough to get a warning shot, there is then the mechanism already in place for humanity to
Starting point is 01:04:27 back the heck off, whether it's going to take some kind of giant precipitating incident to get humanity and the leaders of nuclear powers to back off, or if they just, like, come to their senses after GPT-5.1 causes some smaller but photogenic disaster, whatever. You know, like, you want to know what is short of shutting it all down? It's building the off switch. Then, always our final question: What are a few books that have shaped your thinking that you would like to recommend to the audience? Well, one thing that shaped me as a little, tiny person of, like, age nine or so was a book by Jerry Pournelle called A Step Farther Out. A whole lot of engineers say that this was a major formative book for them. It's the
Starting point is 01:05:08 technophile book, as written from the perspective of the 1970s, the book that's all about asteroid mining and all the mineral wealth that would be available on Earth if we learn to mine the asteroids. You know, if we just got to do space travel and get all the wealth that's out there in space, build more nuclear power plants so we've got enough electricity to go around. Don't accept the small way, the timid way, the meek way. Don't give up on building faster, better, stronger, the strength of the human species. And to this day, I feel like that's a pretty large part of my own spirit.
Starting point is 01:05:43 It's just that there's a few exceptions for the stuff that will kill off humanity with no chance to learn from our mistakes. Book two: Judgment Under Uncertainty, an edited volume by Kahneman, Tversky and, I think, Slovic, had a huge influence on how I ended up thinking about where humans are on the cognitive chain of existence, as it were. It's like, here's how the steps of human reasoning break down step by step. Here's how they go astray. Here's all the wacky individual wrong steps that people can be induced to make repeatedly in the laboratory. Book three, I'll name Probability Theory:
Starting point is 01:06:24 The Logic of Science, which was my first introduction to: there is a better way. Like, here is the structure of quantified uncertainty. You can try different structures, but they necessarily won't work as well. And we actually can say some things about, like, what better reasoning would look like. We just can't run it. Which is Probability Theory: The Logic of Science. Eliezer Yudkowsky,
Starting point is 01:06:48 Thank you very much. You're welcome. This episode of Yvesa Clancho is produced by Roland Hu. Fact-checking by Michelle Harris. Our senior audio engineer is Jeff Gelb, with additional mixing by Amund Zahuta. Our executive producer is Claire Gordon. The show's production team also includes
Starting point is 01:07:21 Annie Galvin, Kristen Lynn, Marie Cassione, Jack McCortick, Marina King, and Jan Coble. Original music by Sonia Herrero, Carol Sabarro, and Pat McCusker. Audience Strategy by Christina Simuluski and Shannon Busta. The director of New York Times-pending audio is Annie Rose Strasser.
