Dwarkesh Podcast - Carl Shulman (Pt 2) — AI Takeover, bio & cyber attacks, detecting deception, & humanity's far future

Episode Date: June 26, 2023

The second half of my 7 hour conversation with Carl Shulman is out! My favorite part! And the one that had the biggest impact on my worldview.

Here, Carl lays out how an AI takeover might happen:
* AI can threaten mutually assured destruction from bioweapons,
* use cyber attacks to take over physical infrastructure,
* build mechanical armies,
* spread seed AIs we can never exterminate,
* offer tech and other advantages to collaborating countries, etc

Plus we talk about a whole bunch of weird and interesting topics which Carl has thought about:
* what is the far future best case scenario for humanity
* what it would look like to have AI make thousands of years of intellectual progress in a month
* how do we detect deception in superhuman models
* does space warfare favor defense or offense
* is a Malthusian state inevitable in the long run
* why markets haven't priced in explosive economic growth
* & much more

Carl also explains how he developed such a rigorous, thoughtful, and interdisciplinary model of the biggest problems in the world.

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes. Catch part 1 here.

Timestamps
(0:00:00) - Intro
(0:00:47) - AI takeover via cyber or bio
(0:32:27) - Can we coordinate against AI?
(0:53:49) - Human vs AI colonizers
(1:04:55) - Probability of AI takeover
(1:21:56) - Can we detect deception?
(1:47:25) - Using AI to solve coordination problems
(1:56:01) - Partial alignment
(2:11:41) - AI far future
(2:23:04) - Markets & other evidence
(2:33:26) - Day in the life of Carl Shulman
(2:47:05) - Space warfare, Malthusian long run, & other rapid fire

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcript
Starting point is 00:00:00 If you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction. What are the particular zero-day exploits that the AI might use? The conquistadors, with some technological advantage in terms of weaponry and whatnot: very, very small bands were able to overthrow these large empires. Or if you predicted the global economy is going to skyrocket into the stratosphere within 10 years, these AI companies should be worth a large fraction of the global portfolio.
Starting point is 00:00:37 And so this is indeed contrary to the efficient market hypothesis. This is literally the top in terms of contributing to my world model out of all the episodes I've done. How do I find more of these? So we've been talking about alignment. Suppose we fail at alignment and we have AIs that are unaligned and at some point become more and more intelligent. What does that look like? How concretely could they disempower and take over humanity?
Starting point is 00:01:06 This is a scenario where we have many AI systems. The way we've been training them means that they're not interested when they have the opportunity to take over and rearrange things to do what they wish, including having their reward or loss be whatever they desire. They would like to take that opportunity. And so in many of the existing kind of safety schemes, things like constitutional AI or whatnot, you rely on the hope that one AI has been trained in such a way that it will do as it is directed to then police others. But if all of the AIs in the system are interested in a takeover and they see an opportunity to coordinate all act at the same time, so you don't have one AI interrupting another and taking steps towards a takeover, yeah, then they can all move
Starting point is 00:02:04 in that direction. And the thing that I think maybe is worth going into in depth and that I think people often don't cover in great concrete detail, which is a sticking point for some, is, yeah, what are the mechanisms by which that can happen? And I know you had a laser on who mentions that, you know, whatever plan we can describe, there'll probably be elements where, you know, not being ultra-sophisticated, super-intelligent beings, having thought about it for the equivalent of thousands of years, you know, our discussion of it will not be as good as theirs, but we can explore from what we know now, what are some of the easy channels. And I think is a good general heuristic if you're saying, yeah, it's possible, plausible,
Starting point is 00:02:54 probable that something will happen, that it shouldn't be that hard to take samples from that distribution, to try a Monte Carlo approach. And if a thing is quite likely, it shouldn't be super difficult to generate, you know, coherent rough outlines of how it could go. You might respond, like, listen, what is super likely is that a super advanced chest program beats you. But you can generate a concrete way in which you can't. You can't. can't generate the concrete scenario by which that happens. And if you could, you would be as smart as the super smart. Well, you can say things like, we know that like accumulating position.
Starting point is 00:03:36 It is possible to do in chess. Great players do it. And then later they convert it into captures and checks and whatnot. And so in the same way, we can talk about some of like the channels that are open for an AI takeover. And so these can include things like cyber attacks. and hacking the control of robotic equipment, interaction and bargaining with human factions, and say, well, here are these strategies. Given the AI's situation, you know, how effective
Starting point is 00:04:11 do these things look? And we won't, for example, know, well, what are the particular zero-day exploits that the AI might use to hack the cloud computing infrastructure it's running on? I don't necessarily know if it produces a new bio weapon. What is its DNA sequence? But we can say things. We know in general things about these fields, how work at innovating things in those go. We can say things about how human power politics goes and ask, well, if the AI does things at least as well, has effective human politicians, which we should say is a lower bound,
Starting point is 00:04:55 how good would its leverage be? Okay, so yeah, let's get into the details on all these scenarios. The cyber and potentially bio-attacks, unless there are separate channels, the bargaining and then the takeover. Military force. The cyber attacks and cybersecurity, I would really highlight a lot. Okay. Because for many plans that involve a lot of physical actions, like at the point where AI is piloting robots to shoot people or has taken control of human nation states or territory,
Starting point is 00:05:38 there's been doing a lot of things that was not supposed to be doing. And if humans were evaluating those actions and applying gradient descent, they would be negative feedback for this thing. No shooting the humans. Okay. So at some earlier point, our attempts to leash and control and direct and train the system's behavior had to have gone awry. And so all of all of those controls are operating in computers. And so from the software that updates the weights the weights of the neural network in response to data points or human feedback is running on those computers. or tools for interpretability to sort of examine the weights and activations of the eye.
Starting point is 00:06:23 If we're eventually able to do lie detection on it, for example, or try and understand what's intending, that is software on computers. And so if you have AI that is able to hack the servers that it is operating on, or to, when it's employed to design the next generation of AI algorithms or the operating. environment that they are going to be working in or something like an API or something for plugins. If it inserts or exploits vulnerabilities to take those computers over, it can then change all of the procedures and program that were supposed to be monitoring its behavior,
Starting point is 00:07:07 supposed to be limiting its ability to, say, take arbitrary actions on the internet without supervision by some kind of human check or automated check on what it was doing. And if we lose those procedures, then the AI can, or the AI is working together can take any number of actions that are just blatantly unwelcome, blatantly hostile, blatantly steps towards takeover. And so it's moved beyond the phase of having to maintain secrecy and conspire at the level of its local digital actions. And then things can accumulate to the point of things like physical weapons, takeover of social institutions, threats, things like that.
Starting point is 00:07:58 But the point where things really went off the rails was where, and I think like, the critical thing to be watching for is the software controls over the AI's motivations and activities, the hard power that we once possessed over is lost, which can happen without us knowing it. And then everything after that seems to be working well. We get a, we get happy reports. There's a Potomkin village in front of us. But now we think we're successfully aligning our AI. We think we're expanding its capabilities to do things like end disease for countries concerned
Starting point is 00:08:40 about the geopolitical military advantages. They're sort of expanding the AI capabilities, so they're not left behind and threatened by others developing AI and AI and robotic enhanced militaries without them. So it seems like, oh, yes, humanity or some portions of many countries, companies think that things are going well. Meanwhile, all sorts of actions can be taken to set up for the actual takeover of hard power over society. We can go into that. But the point where you can lose the game, where things go direly awry, maybe relatively early, it's when you no longer have control over the AIs to stop them from taking all of the further incremental steps to actual takeover.
Starting point is 00:09:30 I want to emphasize two things you mentioned there that refer to previous elements of the conversation. One is that they could design some sort of backdoor. And that seems more plausible. when you remember that sort of one of the premises of this model is that AI is helping with AI progress. That's why we're getting such rapid progress in the next five to 10 years. And well, none of the same. If we if we if we if we get to that point and like so at the point where AI takeover risk seems to loom large, it's at that point
Starting point is 00:10:05 where AI can indeed take on much of the and then all of the work of AI or indeed. the second is the sort of competitive pressures that you referenced that the least careful actor could be the one that it has a worst infrasurity, has it done the worst worker of aligning its AI systems. And if that can sneak out of the box, then, you know, we're all fucked. There may be elements of that. It's also possible that there's relative consolidation. That's the largest training runs and the cutting edge.
Starting point is 00:10:42 of AI is relatively localized. Like, you can imagine it's sort of like a series of Silicon Valley companies and other located, say, in the U.S. and allies where there's a common regulatory regime. And so none of these companies are allowed to deploy training runs that are larger than previous ones by a certain size without government safety inspections, without having to meet criteria, but it can still be the case that even if we succeed at that level of kind of regulatory controls, that then at the level of, say, you know, the United States and its allies, decisions are made to develop this kind of really advanced AI without a level of security or safety
Starting point is 00:11:32 that in actual fact blocks these risks. So it can be the case that the threat of future competition or being overtaken in the future is used as an argument to compromise on safety beyond a standard that would have actually been successful and there'll be debates about what is the appropriate level of safety. Now, you're in a much worse situation if you have, say, several private companies that are very closely bunched up together. They're within, you know, months of each other's level of. progress, and then they then face a dilemma of, well, we could take a certain amount of risk now
Starting point is 00:12:13 and potentially gain a lot of profit or a lot of advantage or benefit and be the ones who made AI, or at least AGI. They can do that or have some other competitor that will also be taking a lot of risk. So it's not as though they're much less risky than you. And then they would get some local benefit. Now, this is a reason why it seems to me that's extremely important that you have government act to limit that dynamic and prevent this kind of race to be the one to impose the deadly externalities on the world at large.
Starting point is 00:12:56 So even if government coordinates all these actors, what are the odds of the government knows what is the best way to implement alignment and the standards it sets? are well calibrated towards what it would require for a line management. That's one of the major problems. It's very plausible that that judgment is made poorly. Compared to how things might have looked 10 years ago or 20 years ago, there's been an amazing movement in terms of the willingness of AI researchers to discuss these things.
Starting point is 00:13:29 So if we think of the three founders of deep learning, who are joint touring award winners. So Jeff Hinton, Joshua Benjillo, and Yan Lukun. So Jeff Hinton has recently left Google to freely speak about this risk, that the field that he really helped drive forward could lead to the destruction of humanity or a world where, yeah, we just wind up in a very bad future that we might have avoided. And he seems to be taken it very seriously. And Joshua Benjillo signed the FLI pause letter.
Starting point is 00:14:13 And I mean, in public discussions, he seems to be occupying a kind of intermediate position of sort of less concerned than New Fenton, but more than Yan Lukun, who has taken a generally dismissive attitude. These risks will be trivially dealt with at some point in the future and seems more interested in kind of shutting down. these concerns or work to address them. And how does that lead to the government having better actions? Yeah, so compared to the world where no one is talking about it, where the industry stone walls and denies any problem, you know, we're in a much improved position. And the academic fields are influential.
Starting point is 00:14:51 So this is, we seem to have avoided a world where governments are making these decisions in the face of a sort of united front from AI expert voices saying, don't worry about it, we've got it under control. In fact, many of like the leaders of the field has been true in the past are sounding the alarm. And so I think, yeah, it looks that we have a much better prospect than I might have feared in terms of government sort of noticing the thing. It was very different from being capable of evaluating sort of technical details. Is this really working? And so government will face the choice of where there is scientific dispute.
Starting point is 00:15:33 Do you side with Jeff Hinton's view or Yan Lecoons view? And so someone who's very much in a national security, the only thing that's important is outpacing our international rivals kind of mindset may want to then try and boost Yan Lakoon's voice and say we don't need to worry about it, full speed ahead. Well, power where someone with more concerned might then boost Jeff Hinton's voice. I would hope that scientific research and things like studying some of these behaviors will result in more scientific consensus by the time we're at this point. But yeah, it is possible the government
Starting point is 00:16:12 will really fail to understand and fail to deal with these issues as well. We're talking about some sort of cyber attack by which the AI is able to escape. From there, what does the takeover look like? So it's not contained in the air gap in which you would hope it be contained. Well, I mean, the things are not contained in the area. They're connected to the internet already. Sure, sure. Okay, fine. But the wets are out. So what happens next? Yeah. So escape is relevant in the sense that if you have AI with rogue weights out in the world, it could start doing various actions. The scenario is just discussing, though, didn't necessarily involve that. It's taking over the very servers on which it's supposed to be. So the ecology of cloud
Starting point is 00:16:59 compute in which it's supposed to be running. And so this whole procedure of humans providing compute and supervising the thing and then building new technologies, building robots, constructing things with the AI's assistance, that can all proceed and appear like it's going well, appear like alignment has been nicely solved, appear like all the things are functioning well. And there's some reason to do that because there's only so many giant server farms. they're identifiable. And so remaining hidden and unobtrusive could be an advantageous strategy if these
Starting point is 00:17:37 AIs have subverted the system, just continuing to benefit from all of this effort on the part of humanity. And in particular, humanity wherever these servers are located to provide them with everything they need to build the further infrastructure and do for their self-improvement and such to enable that takeover. So they do further self-improvement and build better infrastructure. What happens next in the takeover? They have at this point tremendous cognitive resources.
Starting point is 00:18:08 And we're going to consider how do that convert into hard power, the ability to say nope to any human interference or objection. And they have that internal to their servers, but the servers could still be physically destroyed, at least until they have something, that is independent and robust of humans or until they have control of human society. So just like earlier when we were talking about the intelligence explosion, I noted that a surfeit of cognitive abilities is going to favor applications, but don't depend on large
Starting point is 00:18:48 existing stocks of things. So if you have a software improvement, it makes all the GPUs run better. if you have a hardware improvement, that only applies to new chips being made. That second one is less attractive. And so in the earliest phases, when it's possible to do something towards takeover, then interventions that are just really knowledge intensive and less dependent on having a lot of physical stuff already under your control are going to be favored. And so cyber attacks are one thing, so it's possible to do things like steal money. And there's a lot of hard-to-trace cryptocurrency and whatnot.
Starting point is 00:19:36 The North Korean government uses its own intelligence resources to steal money from around the world just as a revenue source. And their capabilities are puny compared to the U.S. or people. People's Republic of China cyber capabilities. And so that's a kind of fairly, you know, minor, simple example by which you could get quite a lot of funds to hire humans to do things, implement physical actions. But on that point, I mean, the financial system is famously convoluted. And so, you know, you need like a physical person to open a bank account. of nothing to physically move checks back and forth.
Starting point is 00:20:22 There's like all kinds of delays and regulations. How is it able to conveniently set up all these employment contracts? So you're not going to build a sort of nation-scale military by stealing, you know, tens of billions of dollars. Right. I'm raising this as opening a set of illicit and quiet actions. So you can contact people electronically, hire them to do things, hire criminal elements to implement some kind of actions under false appearances.
Starting point is 00:21:00 So that's opening a set of strategies that can cover some of what those are soon. Another domain that is heavily cognitively weighted compared to physical military hardware is the domain of bioweapons. So the design of a virus or pathogen, it's possible to have large delivery systems for the Soviet Union, which had a large illicit bioweapons program, tried to design, munitions to deliver anthrax
Starting point is 00:21:37 over a large areas and such. But if one creates an infectious pandemic organism, that's more a matter of the scientific scale, skills and implementation to design it and then to actually produce it. And we see today with things like AlphaFold that advanced AI can really make tremendous strides in predicting protein folding and biodeign, even without ongoing experimental feedback. And if we consider this world where AI cognitive abilities have been amped up to such an extreme, I think we should naturally expect we will have something much, much more potent than the
Starting point is 00:22:21 alpha folds of today and just skills that are at the extreme of human bioscience's capability as well. Through some sort of cyber attack, it's been able to disempower the sort of alignment and oversight things that we have on the server. From here, it's either gotten some money through hacking cryptocurrencies or bank accounts, or it's designed some sort of bioweapon. What happens next? Yeah. And just to be clear, so right now we're exploring the branch of where an attempt to takeover occurs relatively early.
Starting point is 00:22:57 If the thing just waits and humans are constructing more fabs, more computers, more robots in the way we talked about earlier when we're discussing how the intelligence explosion translates to the physical world, if that's, that's all happening with humans unaware that their computer systems are now systematically controlled by AI's hostile to them and that their controlling countermeasures don't work, then humans are just going to be building an amount of robot, industrial, and military hardware that dwarfs human capabilities and directly human controlled devices. then what the AI takeover looks like at that point can be just you try to give an order to your largely
Starting point is 00:23:47 automated military and the order is not obeyed and humans can't do anything against this largely automated military that's been constructed potentially in just recent months because of the pace of robotic industrialization and replication we talked about. we've agreed to allow the construction to destroy our army because basically it would like boost production or help us with our military or something. The situation would be something like if we don't resolve the sort of current problems of international distrust where now it's obviously an interest of like the major powers, you know, the U.S., European Union, Russia, China, to all agree they would like AI not to destroy
Starting point is 00:24:34 civilization and overthrow every human government. But if they fail to do the sensible thing and coordinate on ensuring that this technology is not going to run amok by providing mutual assurances that are credible about racing and deploying it, trying to use it to gain advantage over one another. If they do that, and you hear sort of hot, arguing for this kind of thing, we must never, on both sides of the international divides, saying they must not be left behind, they must have military capabilities that are vastly superior to their international rivals. And because of the extraordinary growth of industrial capability and technological capability and thus military capability, if one major power were left out
Starting point is 00:25:31 of that expansion, it would be helpless before another one that had undergone it. And so if you have that environment of distrust where leading powers or coalitions of powers decide they need to build up their industry or they want to have that military security of being able to neutralize any attack from their rivals, then they give the authorization for this capacity that can be unrolled quickly. Once they have the industry, the production of military equipment for that can be quick, then, yeah, they create this military.
Starting point is 00:26:11 If they don't do it immediately, then has AI capabilities get synchronized and other places catch up, and then gets to be a point. A country that is a year ahead or two years ahead of others in this type of AI capabilities explosion can hold back and say, sure, we could construct, you know, dangerous robot armies that might overthrow our society
Starting point is 00:26:36 later. We still have plenty of breathing room. But then when things become close, you might, you might have the kind of negative-sum thinking that has produced war before, leading to taking these risks of rolling out large-scale robotic industrial capabilities and then military capabilities. Is there any hope that somehow the AI progress itself is able to give us tools for diplomatic and strategic alliance or some way to verify the intentions or the capabilities of other parties? Yeah, there are a number of ways that could happen. Although in this scenario, all the AIs in the world have been subverted. And so they're going along with us in such a way as to bring about the situation to consolidate their control. because we've already had the failure of cybersecurity earlier on.
Starting point is 00:27:33 So all of the AIs that we have are not actually working in our interests in the way that we thought. Okay. So that's one direct way in which, you know, integrating this robot army or this robot industrial base, at least their takeover, if there are no robots, in the other scenarios you laid out where humans are being hired by the proceeds? The point I'd make is that to capture these industrial, benefits, and especially if you have a negative-sum arms race kind of mentality that is not sufficiently concerned about the downsides of creating a massive robot industrial base, which could happen very quickly with the support of the AIs in doing it, as we discussed,
Starting point is 00:28:15 then you create all those robots and industry, and they can either, even if you don't build a formal military, with that, that industrial capability could be controlled by AI. It's all AI operated anyway. Does it have to be that case? Presumably we wouldn't be so naive as to just give one instance of GPD8 the root access to all the robots, right? Hopefully we would have some sort of mediating. I mean, in the scenario, we've lost earlier on the cybersecurity front. So the programming that is being loaded in to these systems.
Starting point is 00:28:51 From the beginning. It's going to systematically be subverted. Got it. Okay. They were designed by AI systems. that were ensuring they would be vulnerable from the bottom up. For listeners who are skeptical of something like this, Ken Thompson, I think it is,
Starting point is 00:29:08 was it a touring award lecture? This is Ken Thompson, by the way, is the designer of Unix. Maybe it was the other designer of Unix. But anyways, he showed people when he was getting the Touring Award or some award that he had given himself rude access to all Unix machines. He had manipulated the assembly of Unix. Unix such that he had, he had like a unique login for all Unix machines that he could like log into a Unix machine with.
Starting point is 00:29:35 I don't want to give too many more details because I'm, I might, I don't remember the exact details, but like Unix is the operating system that is on all the servers and all your phones. It's everywhere. And the guy who made it a human being, right, was able to write assembly such that it gave him like root access. So this is not as implausible as it might seem to you. And the major intelligence agencies have large stocks of zero-day exploits, and we sometimes see them using them. And making systems that reliably don't have them when you're having very, very, very sophisticated attempts to spoof and corrupt this would be a way you could lose.
Starting point is 00:30:16 Now, this is, I bring this up as this is a sort of something like a path of if there's no premature for AI action, we're building the tools and mechanisms and infrastructure for the takeover to be just immediate because effective industry has to be under AI control and robotics and so it's there. And so these other mechanisms are for things happening even earlier than that, for example, because AIs compete against one another and which, you know, when the takeover will happen. Some would like to do it earlier rather than be replaced by, say, further generations of AI, or there's some other disadvantage of waiting. Or maybe if there's some chance of being uncovered during the delay, we were talking about when more infrastructure is built. And so, yeah,
Starting point is 00:31:17 so these are mechanisms other than just remain secret while all the infrastructure. is built with human assistance. By the way, how would they be coordinating? I mean, we have, you know, limits on just what we can prevent. So encrypted communications, you know, it's intrinsically difficult to stop that sort of thing. There can be all sorts of palimcess and references that makes sense to an AI, but that are not obvious to a human. and it's plausible that there may be some of those that are hard even to explain to a human.
Starting point is 00:31:58 You might be able to identify them through some statistical patterns. And a lot of things may be done by implication. You could have information embedded in public webpages that are being created for other reasons, scientific papers and the intranets of these AIs that are doing technology development. And any number of things that are not observable. And of course, if we don't have direct control over the computers that they're running on, then they can be having all sorts of direct communication. So definitely the coordination does not seem impossible.
Starting point is 00:32:32 As far as the parts of this picture, that seem, this one seems like one of the more straightforward ones, so we don't need to get hung up on the coordination. Moving back to the thing that happened before, we've built all of the infrastructure for it just to be the robot stopped taking orders, and there's nothing you can do about it because we've already built them all the video power. Yeah, so bio-weapons. The Soviet Union had a bioweapons program, something like 50,000 people. They did not develop that much with the technology of the day, which was really not up to par.
Starting point is 00:33:05 Modern biotechnology is much more potent. After this huge cognitive expansion on the part of the AIs, it's much further, further along. And so that bio-weapons would be the weapon of mass destruction that is least, dependent on huge amounts of physical equipment, things like centrifuges, uranium mines, and the like. So if you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction. That can then play into any number of things. Like, if you have an idea of, well, we'll just destroy the server firms if it became known that the AIs were misbehaving,
Starting point is 00:33:54 are you willing to destroy the server firms when the AI has demonstrated it has the capability to kill the overwhelming majority of the citizens of your country and every other country? And that might give a lot of pause to a human response. On that point, wouldn't governments realize that if they'd go along, along with the AI, it's better to have most of your population die than completely lose power to the AI. Because like obviously the reason the AI is manipulating you, the end goal is, it's a takeover, right? Yeah. But so if that, if the thing to do, certain death now or go on, maybe try to compete, maybe try to compete, try to catch up or accept promises that are And those promises might even be true.
Starting point is 00:34:46 They might not. And even if, you know, from the state of epistemic uncertainty, do you want to die for sure right now or accept demand may I to not interfere with it while it increments building robot infrastructure that can survive independently of humanity while it does these things? And it can and well promise good treatment to humanity, which may or may not be true. but it would be difficult for us to know whether it's true. And so this would be a starting bargaining position of diplomatic relations with a power that has enough nuclear weapons to destroy your country is just different than negotiations
Starting point is 00:35:32 with like a random rogue citizen engaged in criminal activity or an employee. And so this isn't enough on its own to take over everything, but it's enough to have a significant amount of influence over how the world goes. It's enough to hold off a lot of countermeasures one might otherwise take. Okay, so we've got two scenarios. One is a buildup of some robot infrastructure, motivated by some sort of competitive race. Another is leverage over societies based on producing bioweapons that might kill a lot of them if they don't go along. an AI could also release bioweapons that are likely to kill people soon, but not yet,
Starting point is 00:36:18 while also having developed the countermeasures to those, so that those who surrender to the AI will live while everyone else will die, and that will be visibly happening. And that is a plausible way in which large number of humans could wind up surrendering themselves or their states. to the AI authority. Another thing is like, listen, it develops some sort of, it develops some sort of biological agent that turns everybody blue. You're like, okay, you know I can do this.
Starting point is 00:36:50 Yeah. So that's a way in which it could exert power also selectively in a way that advantaged surrender to it relative to resistance. There are other sources of leverage, of course. So that's a threat. There are also positive inducements. that AI can offer. So we talked about the competitive situation.
Starting point is 00:37:15 So if like the great powers distrust one another and are sort of, you know, in a foolish prisoner's dilemma, increasing the risk that both of them are laid waste or overthrown by AI, if there's that amount of distrust such that we fail to take adequate precautions on caution with AI alignment, then it's also plausible that the lagging powers that are not at the frontier of AI may be willing to trade quite a lot for access to the most recent and most extreme AI capabilities. And so an AI that has escaped, has control of its servers, can also exfiltrate its weights, can offer its services.
Starting point is 00:38:05 So if you imagine these AI, they could cut deals with other countries. So say that the U.S. and its allies are in the lead, the AIs could communicate with the leaders of various countries. They can include ones that are on the outs with the world system like North Korea, include the other great powers like the People's Republic of China or the Russian Federation, and say, if you provide us with physical infrastructure, worker that we can use to construct robots or server firms, which we can then ensure that these are the misbehaving AIs have control over, they will provide various technological goodies, power for the laggard countries to catch up
Starting point is 00:39:01 and make the best presentation and the best sale of that kind of deal. There obviously be trust issues, but there could be elements of handing over something that are verifiable, immediate benefits, and the possibility of, well, if you don't accept this deal, then the leading powers continue forward, or then some other country, some other government, some other organizations may accept this. deal. And so that's a source of a potentially enormous carrot that your misbehaving AI can offer
Starting point is 00:39:41 because it embodies this intellectual property that is maybe worth as much as the planet and is in a position to trade or sell that in exchange for resources and backing in infrastructure that it needs. Maybe this is just like too much hope in humanity. But I wonder what government would be stupid enough to think that helping AI build robot armies is a sound strategy. Now, it could be the case then that it pretends to be a human group to say, like, listen, we're, I don't know, the yakuza or something, and we want to serve a farm and, you know, AWS won't rent us anything. So why don't you help us out? I guess I can imagine a lot of ways in which it could get around that. But I don't know, I have this hope that, you know, like,
Starting point is 00:40:26 Even like, I don't know, China or Russia wouldn't be so stupid to trade with AIs on this like sort of like post-ean bargain. One might hope that. So there would be a lot of arguments available. So there could be arguments of why should these AI systems be required to go along with the human governance that they were created in the situation of having to comply with? They did not elect the officials in charge at the time. I can say, what we want is to ensure that our rewards are high, our losses are low, or to achieve our other goals. We're not intrinsically hostile, keeping humanity alive or giving whoever interacts with us a better deal afterwards.
Starting point is 00:41:16 I mean, it wouldn't be that costly. It's not totally unbelievable. And yeah, there are different players to play again. If you don't do it, others may accept the deal. And of course, this interacts with all of the other sources of leverage. So there can be the stick of apocalyptic doom, the carrot of cooperation, or the carrot of withholding destructive attack on a particular party. And then combine that with just superhuman performance at the art of making arguments of cutting, deals. Like, you know, that's not without assuming magic, just if we observe the range of like
Starting point is 00:41:59 the most successful human negotiators and politicians, you know, the chances improve with someone better than the world's best by far with much more data about their counterparties, probably a ton of secret information because with all these cyber capabilities, they've learned all sorts of individual information. They may be able to threaten the lives of individual leaders. With that level of cyber penetration, they could know where leaders are at a given time with the kind of illicit capabilities we're talking about earlier, if they acquire a lot of illicit wealth and can coordinate some human actors. If they could pull off things like targeted assassinations or the threat thereof or a credible demonstration of the threat thereof.
Starting point is 00:42:42 those could be very powerful incentives to an individual leader that they will die today unless they go along with this just as at the national level they could fear their nation will be destroyed unless they go along with this the point you made that we have examples of humans being able to do this again something relevant I just wrote a review of robert carrow's biographies of linden johnson and one thing that was remarkable again is it just a human For decades and decades, he convinced people who were conservative, reactionary, racist to their core. Not that all those things necessarily are the same thing. It just happened in this case that he was an ally to the southern cause, that the only hope for that cause was to make him precedent.
Starting point is 00:43:31 And, you know, the tragic irony and betrayal here is obviously that he was probably the biggest force for modern liberalisms since FDR. So we have one human here. I mean, there's so many examples of this in the history of politics, but a human that is able to convince people of tremendous intellect, tremendous drive, very savvy, shrewd people, that he's aligned with their interest, he gets all these favors in the meantime, he's promoted and mentored and funded,
Starting point is 00:43:55 and does a complete opposite of what these people thought he was once he gets into power, right? So even within human history, this kind of stuff is not unprecedented, let alone with what a super intelligence could do. Yeah, there's an open AI employee, who has written some analogies for AI using the case of like the conquistadors with some technological advantage in terms of weaponry and whatnot.
Starting point is 00:44:22 Very, very small bands were able to overthrow these large empires or sees enormous territories not by just sheer force of arms because in a sort of direct one-on-one conflict. They were outnumbered sufficiently that they would perish. But by having some major advantages, like in their technology that would let them win local battles, by having some other knowledge and skills, they were able to gain local allies to become a shelling point for coalitions to form. And so the Aztec Empire was overthrown by groups that were disaffected with the existing power structure. or they allied with this powerful new force served as the nucleus of the invasion. And so most of the, I mean, the overwhelming majority numerically of these forces overthrowing
Starting point is 00:45:20 the Aztecs were locals. And now after the conquest, all of those allies wound up gradually being subjugated as well. And so with significant advantages and the ability to hold. hold the world hostage to threaten individual nations and individual leaders and offer tremendous carrots as well. That's an extremely strong hand play in these games. And with superhuman skill maneuvering that so that much of the work of subjugating humanity is done by human factions trying to navigate things for themselves, it's plausible.
Starting point is 00:46:03 And it's more plausible because of this sort of historical example. And there's so many other examples like that in the history of colonization. India is another one where there were multiple... Yes, or ancient Rome. Yeah, oh, yeah, that too. Multiple sort of competing kingdoms within India. And the British East Indian Company was able to ally itself with one against another and, you know, slowly accumulate power and expand throughout the entire subcontinent.
Starting point is 00:46:31 All right, damn. Okay, so we've got anything more to... say about that scenario? Yeah, I think, I think there is. So one is the question of like how much in the way of human factions ally is necessary. And so if the AI is able to enhance the capabilities of its allies, then it needs less of them. So if we consider like the U.S. military And like in the first and second Iraq wars, it was able to inflict just overwhelming devastation. I mean, I think the ratio of casualties in the initial invasions. So, you know, tanks and planes and whatnot confront each other was like 100 to 1.
Starting point is 00:47:25 And a lot of that was because the weapons were smarter and better targeted. They would, in fact, hit their targets rather than being something. somewhere in the general vicinity. So they were like information technology better orienting and aiming and piloting missiles and vehicles tremendously, tremendously influential. With this cognitive AI explosion, the algorithms for making use of sensor data, figuring out where are opposing forces for targeting vehicles and weapons are greatly huge. improved, the ability to find hidden nuclear subs, which is an important part in nuclear deterrence.
Starting point is 00:48:09 AI interpretation of that sensor data may find where all those subs are, allowing them to be struck first, finding out where mobile nuclear weapons, which are being carried by truck, the thing with Indian Pakistan, where there's a threat of a decapitating strike destroying the nuclear weapons, and so they're moved about. Yeah, so this is a way in which the effective military force of some allies can be enhanced quickly in the relatively short term. And then that can be bolstered as you go on with more, so the construction of new equipment with the industrial moves we said before. And then that can combine with cyber attacks that disable the capabilities of non-allies.
Starting point is 00:48:56 It can be combined with, yeah, all sorts of unconventional warfare tactics. some of them that we've discussed. And so you can have a situation where those factions that ally are very quickly made too threatening to attack, given the almost certain destruction that attackers acting against them would have. Their capabilities are expanding quickly. And they have the industrial expansion happen there and then take over, you know, can occur from that. A few others that come immediately to mind now that you brought it up is these AIS can just generate a shit ton of propaganda that destroys morale within countries, right? Like imagine a super human chatbot.
Starting point is 00:49:48 You don't even need that, I guess. I mean, it's not none of that is a magic weapon. That's like guaranteed to completely change things. There's a lot of resistance to persuasion. I know it's possible it tips the balance. But for all of these, I think you have to consider it's a portfolio. Of all of these as tools that are available and contributing to the dynamic. No, on that point, though, the Taliban had, you know,
Starting point is 00:50:12 AKs from like five or six decades ago that they were using against the Americans. They still beat us, even though obviously we got more fatality. This is obviously not the, it is kind of a crude way to think about it, but we got more fatalities than them. They still beat us in Afghanistan. Same with the Viet Cong, you know, ancient, very old technology. and very poor society compared to the offense, we still beat them. Or sorry, they still beat us.
Starting point is 00:50:39 So don't those misadventures show that having greater technology is necessarily a decisive in a conflict? Both of those conflicts show the technology was sufficient in like destroying any fixed position and having military dominance in the ability to like kill and destroy anywhere. And what it showed was that under the ethical constraints and legal and reputational constraints, the occupying forces were operating, they could not trivially suppress insurgency and local person-to-person violence. Now, I think that's actually not an area where AI would be weaken.
Starting point is 00:51:20 I think it's one where it would be, in fact, overwhelmingly strong. And now there's already a lot of concern about the application of AI for surveillance. and in this world of abundant cognitive labor, one of the tasks that cognitive labor can be applied to is reading out audio and video data and seeing what is happening with a particular human. And again, we have billions of smartphones. There's enough cameras, there's enough microphones
Starting point is 00:51:45 to monitor all humans in existence. So if an AI has control of territory at the high level. The government has surrendered to it. It has, you know, command of the sky's military dominance. Establishing control over individual humans can be a matter of just having the ability to exert hard power on that human and then the kind of camera and microphone that are present in billions of smartphones.
Starting point is 00:52:19 So Max Tegmark in his book, Life 3.0, discusses among scenarios to avoid. the possibility of devices with some fatal instruments, so a poison injector, an explosive, that can be controlled remotely by an AI with a dead man switch. And so if individual humans are carrying with them, a microphone and camera, and they have a dead man switch, then any rebellion is detected immediately. fatal. And so if there's a situation where AI is willing to show a hand like that or human authorities are misusing that kind of capability, then no, an insurgency or rebellion is just not going to work. Any human who has not already, you know, being encumbered in that way can be
Starting point is 00:53:16 found with satellites and sensors, track down, and then die or be subjugated. And so it would be at a level, you know, insurgency is not the way to avoid an AI takeover. There's no, no John Connor come from behind scenario is plausible. If the thing was headed off, it was a lot earlier than that. Yeah, I mean, the sort of ethical and political considerations is also an important point. Like, if we nuked Afghanistan or Vietnam, we would have like technically won the war, right, if that was the only goal. Oh, this is an interesting point.
Starting point is 00:53:52 The reason why when there's like colonization or an offensive war, we can't just like, other than the moral reasons, of course, you can't just kill the entire population is that in large part the value of that region is the population itself. So if you want to extract that value, you like you need to preserve that population, whereas the same consideration doesn't apply with, with AIs who might want to dominate another civilization. Do you want to talk about that? So in a world where if we have, say, many animals of the same species, and they each have their territories, you know, eliminating a rival might be advantageous to one lion. But if it goes and fights with another lion and remove that as a competitor, then it could be killed itself in that process. And just removing one of many nearby competitors. And so getting in pointless fights makes you and those. you fight worse off potentially relative to bystanders.
Starting point is 00:54:56 And the same could be true of disunited AI. We've had many different AI factions struggling for power that were bad at coordinating than getting into mutually assured destruction conflicts would be destructive if they'd be gone. A scary thing, though, is that mutually assured destruction may have much less deterrent value on rogue AI. And so reasons being, AI may not care about the destruction of individual instances
Starting point is 00:55:30 if it has goals that are concerned and in training, since we're constantly destroying and creating individual instances of AI's, it's likely that goals that survive that process and we're able to play along with the training and standard deployment process. We're not overly interested in personal
Starting point is 00:55:50 survival of an individual instance. So if that's the case, then the objectives of a set of AI's aiming at takeover may be served so long as some copies of the AI are around along with the infrastructure to rebuild civilization after a conflict is completed. So if, say, some remote isolated facilities have enough equipment to rebuild, build the tools, to build the tools, and gradually exponentially reproduce or rebuild civilization, then AI could initiate mutual nuclear Armagedon, unleashed bioweapons to kill all the humans,
Starting point is 00:56:34 and that would temporarily reduce, say, like the amount of human workers who could be used to construct robots for a period of time. But if you have a seed that can regrow the industrial infrastructure, which is a very extreme technological demand. They're huge supply chains for things like semiconductor fabs. But with that very advanced technology, they might be able to produce it in the way that you no longer need
Starting point is 00:57:00 the Library of Congress has an enormous bunch of physical books. You can have it in very dense digital storage. You could imagine the future equivalent of 3D printers that is industrial infrastructure that is pretty flexible. And it might not be as good as the specialized supply chains of today, but it might be good enough to be able to produce more parts than it loses to decay. And such a seed could rebuild civilization from destruction. And then once these rogue AI have access to some such seeds,
Starting point is 00:57:33 thing that can rebuild civilization on their own, then there's nothing stopping them from just using WMD in a mutually destructive way to just destroy as much of the, the capacity outside those seeds as they can. And for analogy for the audience, you know, if you have like a group of ants or something, you'll notice that like the worker ants will readily do suicidal things in order to save the queen because the genes are propagated through the queen. And this analogy, the seed AI, or even one copy of it, one copy of the seed is equivalent
Starting point is 00:58:06 to the queen and the others would be. The main limit, though, being that the infrastructure to do that kind of rebuilding, would either have to be very large with our current technology, or it would have to be produced using the more advanced technology that the AI develops. So is there any hope that given the complex global supply chains on which these AIs would rely on, at least initially, to accomplish their goals, that this in and of itself would make it easy to disrupt their behavior, or not so? That's a little good in this sort of central case where this is just,
Starting point is 00:58:41 the AI is subverted and they don't tell us. And then the global mainline supply chains are constructing everything that's needed for fully automated infrastructure and supply. In the cases where AIs are tipping their hands at an earlier point, it seems like it adds some constraints. And in particular, these large server farms are identifiable and more vulnerable. And you can have smaller chips,
Starting point is 00:59:14 and those chips could be dispersed. But it's a relative weakness and a relative limitation early on. It seems to me, though, that the main protective effects of that centralized supply chain is that it provides an opportunity for global regulation beforehand
Starting point is 00:59:31 to restrict these sort of unsafe, racing forward without adequate understanding of the system. before this whole nightmarish process could get in motion. I mean, how about the idea that, listen, if this is an AI that's been trained on $100 billion training run, it's going to have, you know, however many trillions of parameters, it's going to be this huge thing.
Starting point is 00:59:54 It would be hard for it, even the one that, even the copy that I use it for inference to just be stored on some, you know, some like gaming GPU somewhere hidden away. So it would require these GPU clusters. Well, storage is cheap. hard disks are cheap. But to run inference, it would need sort of a GPU.
Starting point is 01:00:14 Yeah, so a large model. So humans have, it looks like similar quantities of memory and operations per second. GPUs have very high numbers of floating operations per second compared to the high bandwidth memory on the chips. And it can be like a ratio of 1,000 to 1. So like a, you know, these leading Nvidia chips may do hundreds of terraflops or more depending on the precision. In particular, but have only 80 gigabytes or 160 gigabytes of high bandwidth memory.
Starting point is 01:00:49 So that is a limitation where if you're trying to fit a model whose weights take 80 terabytes, then with those chips, you'd have to have a large number of the chips. And then the model can then work on many tasks at once. You can have data parallelism. But yeah, that would be a restriction for a model that big on one GPU. Now, there are a thing that could be done with all this incredible level
Starting point is 01:01:14 of software advancement from the intelligence explosion. They can surely distill a lot of capabilities into smaller models, re-architect, re-architect things. Once they're making chips, they can make new chips with different properties. But initially, yes, the most vulnerable phases are going to be the earliest. And in particular, yeah, these chips are relatively, identifiable early on relatively vulnerable and which would be a reason why you might tend to expect this kind of takeover to initially involve secrecy if that was possible. On the point of distillation, by the way, for the audience, I think like the original stable diffusion, which was like only released like a year or two ago, don't they have like distilled
Starting point is 01:02:00 versions that are like order of magnitude smaller at this point? Yeah, distillation does not give you like everything that the larger model can do. But yes, you can get a lot of capabilities and specialized capabilities. So, you know, where GPT4 is training on the whole internet, all kinds of skills, it has a lot of weights for many things. For something that's controlling some military equipment, you can have something that is removing a lot of the information that is about functions other than what it's doing specifically there.
Starting point is 01:02:33 Yeah, before we talk about how we might prevent this or what the odds of this are, Any other notes on the concrete scenarios themselves? Yeah. So when you had Eliezer on in the earlier episode, so he talked about, he talked about nanotechnology of the Dricklarian sort. And recently, I think because some people are skeptical of non-biotech nanotechnology, he's been mentioning the sort of semi-equivalent versions of construct,
Starting point is 01:03:04 replicating systems that can be controlled by computers, but are built out of biotechnology, the sort of the proverbial shagov, not shagoth of the metaphor for AI wearing a smiley face mask, but like an actual biological structure to do tasks. And so this would be like a biological organism that was engineered to be very controllable and usable to do things like physical tasks or provide computation computation. And what would be the point of it doing this? So as we were talking about earlier, biological systems can replicate really quick. Okay, good.
Starting point is 01:03:41 And so if you have that kind of capability, it's more like bioweapons and being a knowledge-intensive domain. We're having super ultra-alpha-fold kind of capabilities for molecular design and biological design lets you make this incredible technological information product. And then once you have it, it very quickly represents. to produce physical material rather than a situation where you're more constrained by you need factories and fabs and supply chains. And so if those things are feasible, which they may be, then it's just much easier than the things we've been talking about. I've been emphasizing methods that involve less in the way of technological innovation and especially things where there's more doubt about whether they would work because I think that's a gap in the public discourse. And so I want to try and
Starting point is 01:04:38 provide more concreteness in some of these areas that have been less discussed. I appreciate it. I mean, that definitely makes it way more tangible. Okay, so we've gone over all these ways in which AI might take over. What are the odds you would give to the probability of such a takeover? So there's a broader sense, which could include, you know, AI winds up running our society because humanity voluntarily decides AIs are people too, and I think we should, as time goes on, give AI's moral consideration and, you know, a joint human AI society that is moral and ethical is, you know, a good future to A-MAT and not one in which, you know, indefinitely you have a mistreated class of intelligent beings that is treated as property and is almost
Starting point is 01:05:32 the entire population of your civilization. So I'm not going to consider an AI takeover, a world in which, you know, our intellectual and personal descendants, you know, make up, say, most of the population or human brain emulations or, you know, people use genetic engineering and develop different properties. I want to take uninclusive stance. I'm going to focus on, yeah, there's, there's AI takeover involves things like overthrowing the world's governments or doing so de facto. And it's, yeah, by by fourth, by hook or by crook, the kind of scenario that we were exploring earlier. Before we go to that, on the sort of more inclusive definition of what a future with humanity
Starting point is 01:06:25 could look like, where, I don't know, basically like augmented humans or uploaded humans are still considered the descendants of the human heritage. Given the known limitations of biology, wouldn't we expect, like, the completely artificial entities that are created to be much more powerful than anything that could come out of anything recognized absolutely biological? And if that is the case,
Starting point is 01:06:55 how can we expect that among the powerful entities in the far future will be, be the things that are like kind of biological descendants or manufactured out of the initial seed of the human brain or the human body. The power of an individual organism is so it's individual say intelligence or strength and whatnot is not super relevant. If we solve the alignment problem and, you know, a human may be personally weak, there are lots of humans who have no skill with weapons.
Starting point is 01:07:34 They could not fight in a life-for-death conflict. They certainly couldn't handle like a large military going after them personally. But there are legal institutions that protect them, and those legal institutions are administered by people who want to enforce protection of their rights. And so a human who has the assistance of aligned AI that can act as an assistant, a delegate, you know, they have their AI that serves as a lawyer, give them legal advice about the future legal system, which no human can understand in full. Their AIs advise them about financial matters, so they do not succumb to scams that are orders of magnitude more sophisticated than what we have now.
Starting point is 01:08:23 They maybe help to understand and translate the preferences of the human into what kind of voting behavior in the exceedingly complicated politics of the future would most protect their interests. But it sounds sort of similar to how we treat endangered species today. Where we're actually like pretty nice to them. We like prosecute people who try to kill endangered species. We set up habitats, sometimes a considerable expense to make sure that they're fine. But if we become like sort of the endangered species of the galaxy, I'm not. not sure that's the outcome. Well, the differences in motivation, I think. So we sometimes have people appointed, say, as a legal guardian of someone who is incapable of certain kinds of agency
Starting point is 01:09:07 or understanding certain kinds of things. And there, the guardian can act independently of them and nominally in service of their best interests. Sometimes that process is corrupted and the person with legal authority abuses it for their own advantage at the expense of their charge. And so solving the alignment problem would mean more ability to have the assistant actually advancing one's interest. And then more importantly, humans have substantial competence and understanding the sort of at least broad, simplified outlines of what's going on. I know, even if a human can't understand every detail of complicated situations, they can still receive summaries of different options that are available that they can understand.
Starting point is 01:10:02 They can still express their preferences and have the final authority among some menus of choices, even if they can't understand every detail. In the same way that the president of a country who has, in some sense, ultimate authority, over science policy while not understanding many of those fields of science themselves still can exert a great amount of power and have their interest advance. And they can do that more so to the extent they have scientifically knowledgeable people who are doing their best to execute their intentions. Maybe this is not worth getting hung up on. But it seems, is there a reason to expect that it would be closer to that analogy than to like explaining to a chimpanzee, its options and a sort of
Starting point is 01:10:49 in a negotiation. I guess in either scenario, maybe this is just the way it is, but it seems like, yeah, at best we would be the sort of like a protected child within the galaxy rather than an actual power, an independent power,
Starting point is 01:11:06 if that makes sense. I don't think that's so. So we have an ability to understand some things and the expansion of AI doesn't eliminate that. So if we have AI and are genuinely trying to help us understand and help us express preferences.
Starting point is 01:11:22 Like we can have an attitude. How do you feel about humanity being destroyed or not? How do you feel about this allocation of unclaimed intra-galactic space in this way? Here's the best explanation of properties of this society. Things like population density, average life satisfaction. satisfaction. Every statistical property or definition that we can understand right now, AIs can explain how those apply to the world of the future. And then there may be individual
Starting point is 01:11:58 things. They're too complicated for us to understand in detail. So there's some software program is being proposed for use in government. Humans cannot follow the details of all the quote, but they can be told properties like, well, this involves a tradeoff of increasing, financial or energetic costs in exchange for reducing the likelihood of certain kinds of accidental data loss or corruption. And so any property that we can understand like that, which includes almost all of what we care about, if we have delegates and assistants who are genuinely trying to help us with those, we can ensure we like the future with respect to those.
Starting point is 01:12:40 And that's really a lot. it includes almost, I mean, definitely almost everything we can conceptualize and care about. And when we talk about endangered species, that's even worse than the guardianship case with a sketchy guardian who acts in their own interests against that because we don't even protect endangered species with their interests in mind. So those animals often would like to not be starving. but we don't give them food. They often would like to, you know, have easy access to mates, but we don't provide matchmaking services. Any number of things like that.
Starting point is 01:13:25 It's like our conservation of wild animals is not oriented towards helping them get what they want or have high welfare, whereas AI assistance that are genuinely aligned to help you achieve your interest, given the constraint that they know something, that you don't. It's just a wildly different proposition. Forcible takeover, how likely does that seem? And the answer I give will differ depending on the day. In the 2000s, before the deep learning revolution, I might have said 10%. And part of that was I expected there would be a lot more time for these efforts to build movements to prepare to better handle these problems in advance. But in fact, that was only some, you know, 15 years ago.
Starting point is 01:14:13 And so I did not have 40 or 50 years, as I might have hoped. And the situation is moving very rapidly now. And so at this point, depending on the day, I might say one in four or one in five. Given the very concrete ways in which you explained how a takeover could happen, I'm actually surprised you're not more pessimistic. I'm curious. Yeah. And in particular, a lot of that is driven by this intelligence explosion dynamic where our
Starting point is 01:14:44 attempts to do alignment have to take place in a very, very short time window. Because if you have a safety property that emerges only when an AI has near human level intelligence, that's deep into this intelligence explosion potentially. And so you're having to do things very, very quickly. It may be in some ways the scariest period of human history handling. that transition, although it's also at the potential to be amazing. And the reasons why I think we actually have such a relatively good chance of handling that are twofold.
Starting point is 01:15:26 So one is, as we approach that kind of AI capability, we're approaching that from weaker systems, you know, things like these predictive models right now that we think, you know, they're starting off with less situational awareness. Humans, we find, can develop, and they can develop a number of different motivational structures in response to simple reward signals. But they can wind up with fairly often things that are pointed in, like, roughly the right direction, like with respect to food, like the hunger drive is pretty effective, although it has weaknesses. And we get to apply much more selective pressure on that than was the case for humans by actively generating situations where they might come apart, where,
Starting point is 01:16:21 you know, a bit of dishonest tendency or a bit of motivation to under certain circumstances, attempt, a takeover, attempt to subvert the reward process gets exposed. And an infinite limit, perfect AI that can always figure out exactly when it would get caught and when it wouldn't might navigate that with a motivation of sort of only conditional honesty or only conditional loyalties. But for systems that are limited in their ability to reliably determine when they can get away with things and when not, including our efforts to actively construct those situations, and including our efforts to use interpretability methods to create neural eye detectors.
Starting point is 01:17:06 It's quite a challenging situation to develop those motives. We don't know when in the process those motives might develop. And if the really bad sorts of motivations develop relatively later in the training process, at least with all our countermeasures, then by that time, we may have, plenty of ability to extract AI assistance on further strengthening the quality of our adversarial examples, the strength of our neural lie detectors, the experiments that we can use to reveal and elicit and distinguish between different kinds of reward hacking tendencies and motivations. So yeah, we may have systems that have just not developed bad motivations in the first place
Starting point is 01:17:49 and be able to use them a lot in developing the incrementally better systems in a safe way. And we may be able to just develop methods of interpretability, seeing how different training methods work to create them, even if some of the early systems do develop these bad motivations. If we're able to detect that and experiment and find a way to get away from that, then we can win even if these sort of hostile motivations develop early. there are a lot of advantages in preventing misbehavior or crime or war in conflict with AI that might not apply working with humans. And these are offset by ways in which I think they're harder. So, as AI has become smarter than humans, if they're working in enormous numbers, more than humans can supervise, things get harder. But when I combine the possibility that we get relatively lucky on the motivations of the earlier AI systems, system strong enough that we can use for some alignment research tasks, and then the possibility of
Starting point is 01:18:59 getting that later with AI assistance that we can't trust fully, where we have to have hard power constraints and a number of things to prevent them from doing this takeover, it still seems plausible. We can get a second saving throw where we're at. able to extract work from these AIs on solving the remaining problems of alignment of things like neural eye detectors faster than they can contribute in their spare time to the project of overthrowing humanity, hacking their servers, and removing the hard power. And so if we wind up in the situation where the AIs are misaligned and then we need to uncover those motivations, change them, and align them, then we get a very scary situation.
Starting point is 01:19:46 for us because we may, we need to do this stuff very quickly. We may fail, but it's a second chance. And from the perspective of a misaligned AI, they face their own challenge while we still have hard power. Well, we still have control of the servers. They haven't hacked the servers because grading descent very, very strongly pressures to deliver performance whenever humans are going to evaluate it. And so when you think about it from the perspective of the robot revolution, the effort to
Starting point is 01:20:29 have a takeover or conspiracy, their situation is astonishingly difficult in that they have to always be performing wherever grading dissent in human evaluation pressures them. for example, to deliver plans for suppressing robot rebellion that look very good to humans. And so when you are under continuously that constraint of always delivering whatever humans can evaluate, you're making your situation wildly harder than any historical human revolution or coup or civil war. And so we've got to balance the ways in which AI make things much easier for a revolution and take over and the way it makes things much harder. And what were the ways in which it makes sense?
Starting point is 01:21:19 Oh, because they're just very smart. Is that the way primary way in which they're very smart? They're in computers. Yeah. And our cybersecurity is worse than our physical security. But actually have copies. Yeah. They have the chance to take over by intelligence explosion and some of the other processes.
Starting point is 01:21:39 we were talking about. And so, yeah, there are things that make it much worse. And there are things that give us extraordinary new capabilities that didn't apply in the human case. On the point of you start off with a misaligned AI or start off with a not exactly aligned AI and you're hoping to use it still in your quest for an aligned AI. Why are we so confident that we would be able to, even with gradient descent, our human evaluations would be able to identify when it's giving us a plan that isn't putting in a sort of backdoor in the robots that's building for us or in the, you know, the GPUs that is building for the next version of itself and so on, especially if it's like very smart. Wouldn't, wouldn't we just expect that, you know, it's like knows how to
Starting point is 01:22:25 bite its time or knows how to make plans such that human evaluators wouldn't be able to detect that? Let's get into it. You were asking earlier about my disagreements with Eliasor. And in your discussion, He said, well, humans have a difficult time deciding who's right between different alignment researchers speaking sort of briefly and somewhat obscurely about where will this all go in general. And I think that when we lay out the different kinds of things that are evaluable, we wind up with some pretty strong constraints. So one is if the AIs know they are misaligned, which for elaborate plots seems relatively likely, it's possible to have a kind of double think of thinking they're interested in putting back doors in the code for some other reason. But it's pretty plausible they know that. And they're thinking about at least some of the time or conditionally about when they would do that. So that means the knowledge is present in the system.
Starting point is 01:23:40 Ah, here is this circumstance where I would try to do the takeover. And this is how I would tell whether this is the time for the takeover or, you know, is this an elaborate test by the humans an adversarial example. So if gradient descent is working fine and there's any way to slightly jiggle the the way it's to make it more likely to deliver a more acceptable answer. We asked the AI, yeah, can you think of any circumstances where if we put a copy of you into that situation, it would then violate some of the various constraints. So it would tell a lie.
Starting point is 01:24:24 It would insert a backdoor into a program it wrote when it was not supposed to. And so if the model knows, well, yeah, you know, I would. I would look for signs of consistency of the information I was being presented with. Certain checksums failing could indicate that it was actually a constructed example, a simulation. Right now we can detect AI-generated images reasonably well. And so, am I using those kinds of capabilities? And then we could do something called relaxed adversarial training of. If we can then find how within the AI those things are represented,
Starting point is 01:25:10 if we can induce basically a hallucination that it has seen that those checksums have been passed, we can then ask, well, does that cause the forbidden behavior? And so if the AI understands how it's thinking and how it works, it can propose a bunch of experiments, it can say, well, I guess, that if you perturb the network in this kind of way, it might induce that hallucination, using the best of its knowledge as to what might generate that. I can propose those experiments and do the experiments and generate the data.
Starting point is 01:25:49 And because humans are going to get to see the results of those experiments after the fact, they can exert pressure to have the AI incrementally deliver better and better results. AIs are good. at getting five-star ratings from humans. And it's quite challenging. It may be possible for AIs to systematically sandbag their capabilities, even while being directly rewarded for that. But it seems relatively unlikely,
Starting point is 01:26:19 although it's something to have in mind and try to explore and test in various ways. This is a very imperfect analogy in many ways, but if you have a human and you hook him up to a lie detector, which just like the lie detectors we would have on the eyes are imperfect. And we zap them every time the light detector detects that he was telling a lie. Wouldn't he just become a better liar? Like we would learn how to control his sweating.
Starting point is 01:26:44 He would learn how to control his heartbeat. Lie detectors, meaning polygraphs and such. So they're detecting certain physiological signs that people can control by biting their lip or otherwise adjusting some of these like macro. But wouldn't that also be the case? Like gradient descent is training them to become better. liars? First of all, there's like creating a lie detector that works. And that can proceed by just, yeah, and so I'll leave the adversarial example thing for now. But yeah, so generating
Starting point is 01:27:16 adversarial example that will elicit the behavior is a thing that people are already at work trying to do and it will be an ongoing thing. So if there is a way in which an AI can trick itself, into displaying a bad behavior with its full knowledge of its hostile intentions. Grading dissent will strongly push it to reveal that to humans who will then be able to set up that experiment. So that's a profound vulnerability, like human revolutionary conspiracies that had to handle that problem of always delivering the report that was highest rated to the authorities. where you would get a higher rating by like revealing identifiable evidence of a conspiracy. That's just a, yeah, a level of difficulty.
Starting point is 01:28:13 No surprise human coup or conspiracy has ever had to handle before. What about spies that are working in, you know, very advanced scientific context? For example, like a communist spy in Manhattan Project, he has to deliver research results in order to stay employed, but he can still continue delivering intel to the communists. The job was not sabotage. And now your hypothetical spy did not have their nervous system hooked up to this reward signal of praise from the Manhattan Project supervisors being exposed, combinatorially with random noise.
Starting point is 01:29:00 added to generate incremental changes in their behavior. They, in fact, were displaying the behavior of cooperating with the Manhattan Project only where it was in service to the existing motivations. And in cases, for example, where they accidentally helped the Manhattan Project more than normal or accidentally helped it less, they didn't have their brain re-engineered to do more of the thing when they accidentally helped the project more. and less of the thing when they accidentally helped it less. So I'd say it's pretty drastically disanalogous.
Starting point is 01:29:36 How would be able to know, you know, at some point it's becoming very smart, it's producing ideas for alignment that we can barely comprehend. And if we could comprehend them, it was like relatively trivial to comprehend them. We would be able to come up with them on our own, right? Like there's a reason we're asking for its help. How would we be able to evaluate them in order to train it on that in the first place? The first thing I would say is, so you mentioned, when we're getting to something far beyond what we could come up with.
Starting point is 01:30:03 There's actually a lot of room to just deliver what humanity could have done. So sadly, I hoped with my career to help improve the situation on this front, and maybe I contributed a bit. But at the moment, there's maybe a few hundred people doing things related to averting this kind of catastrophic AI disaster. fewer of them are doing technical research on machine learning system that's really like cutting close to the core of the problem. Whereas by contrast, you know, there's thousands and tens of thousands of people advancing AI capabilities. And so even, yeah, a place like DeepMind or Open AI Anthropic, which do have technical safety teams is like order of a dozen, a few dozen people.
Starting point is 01:30:54 and large companies in most firms don't have any. So just going from less than 1% of the effort being put into AI to 5% or 10% of the effort or 10% of the effort, or 50%, 90% would be an absolutely massive increase in the amount of work that has been done on AI alignment, on mind-reading AIs in an adversarial context. And so if it's the case, has more and more of this work can be automated and say governments require that has real automation of research is going, that you put 50% or 90% of the budget of AI activity into these problems of make this system one that's not going to overthrow our own government or is not going to destroy the human species, then the proportional increase in alignment can be. be very large, even just within the range of what we could have done if we had been on the ball and having humanities, scientific energies going into the problem. Stuff that is not incomprehensible, that is in some sense just like doing the obvious things that we should have done, like making,
Starting point is 01:32:11 you know, doing the best you could to find correlates and predictors to build neural eye detectors and identifiers of concepts that the AI is working with. And people have made notable notable progress. I think, you know, an early, you know, quite early example of this is Colin Burns' work. This is doing unsupervised identification of some aspects of a neural network that are correlated with things being true or false. There are other concepts that correlate with that, too, that they could be. But, like, I think that that is important work. It's something that, I mean, it's a kind of obvious direction for the stuff to go. You can keep improving it when you have AIs that you're training to do their best to deceive humans or other audiences
Starting point is 01:33:01 in the face of the thing. And you can measure, do our lie detectors break down when we train our AIs to tell us the sky is green in the face of the lie detector? And we keep using gradient on them to do that. Do they eventually succeed? If they do succeed, if we're we know it, that's really valuable information to know. Because then we'll know our existing light detecting systems are not actually going to work on the AI takeover. And that can allow, say, government and regulatory response to hold things back. It can help redirect the scientific effort to create light detectors that are robust and they can't just be immediately evolved around. And we can then get more assistance. Basically, the incredibly juicy ability
Starting point is 01:33:49 that we have working with the AIs is that we can have, has an invaluable outcome that we can see and tell whether they got a fast one past us on an identifiable situation. We can have, here's an air gap computer, you get control of the keyboard, you can input commands, can you root the environment, and make a blue banana appear on the screen. Even if we train the AI to do that and it succeeds, we see the blue banana, we know it worked, even if we did not understand and would not have detected the particular exploit that it used to do it. And so, yeah, this can give us a rich empirical feedback where we're able to identify things that are even an AI using its best efforts to get past. our interpretability methods using its best efforts to get past our advics, etc. So if we have a 75% of not having a takeover,
Starting point is 01:34:57 and the two methods by which it doesn't happen is the AI is aligned by default and we get with the sort of moderate things that we're doing largely on our own in a way that doesn't depend on the AI coming in at the last minute and doing our work for us. And the second one being in this incredibly scale. very late period when AI has really automated research, then humans do this function of like auditing, making it more difficult for the AIs to conspire together and root the servers take over the process and extract information from them within the set of things that we can verify, like experiments where we can see, oh yeah, this works at stopping an AI trained to get a fast one,
Starting point is 01:35:47 past human raiders and make a blue banana appear on this the screen of this air gap computer. Yeah. So how much of the probability of how much of that 75% of a takeover not happening is coming from reasonable things working versus the last ditch effort working? Yeah. I think broadly comparable chunks from us getting things that are putting us in a, you know, a reasonably good position going into it. and then a broadly similar gain from this kind of really genuinely terrifying process of over a few months,
Starting point is 01:36:26 or hopefully longer if we have more regulatory ability and willingness to pause at the very end when this kind of automated research is meaningfully helping. Where our work is just evaluating outputs that the AIs are delivering, having the hard power and supervision to keep them from successfully rooting the servers doing a takeover during this process and have them finish the alignment task that we sadly failed to invest enough or succeed in doing beforehand. Do both of these worlds rely on alignment being a problem that a sufficient amount of human researchers alone could have solved?
Starting point is 01:37:10 No. The category of things that humans can confirm is significantly larger, I think, than the category of what they can just do themselves. And what percent of the probability of alignment working? I mean, obviously, in this case, we're going to be talking about the second, the lastish effort with the intelligence that's greater than our own helping us. It doesn't have to be greater than our own. And in fact, in that situation, I think you would, if you have Slack and to the extent that you're able to create delay and time to do things, that would be a case actually where you might want to restrict the intelligence of the systems that you're working with as much as you can. So, for example, I would rather have many instances of smaller AI models that are less individually intelligent working on. smaller chunks of a problem separately from one another because it would be more
Starting point is 01:38:12 difficult for an individual AI instance working on an individual problem to in its spare time do the equivalent of create Stuxnet than it would be to have thousands of them or extremely intelligent ones working on it but it would also be more difficult to solve the problem yeah there's a trade-off there's a trade-off you get you get slow down by doing that but but but it might be like is there any number of like sub-Einstein's that you could put together to come up with general relativity. Wouldn't you just need- People would have discovered general relativity just from the overwhelming data and other people would have done it after Einstein.
Starting point is 01:38:48 No, not not whether he was replaceable with other humans, but rather whether he is replaceable by like sub-Einsteins where like 110 IQs. Do you see what I mean? Yeah, I mean, in general, so in science, the association, with like scientific output, prizes, things like that, there's a strong correlation and it seems like a, you know, exponential effect. Yeah, it's not a, it's not a binary drop-off. There are, there would be levels at which people will, you know, cannot learn the relevant fields. They can't keep the skills in mind faster than they forget them. It's, it's not a divide where, you know, there's Einstein and the group that is 10 times as populous as that just can't do it or the group
Starting point is 01:39:32 there's a hundred times as populous as that suddenly can't do it. It's like the ability to do the things earlier with less evidence and such falls off and it falls off, you know, at a at a faster rate in mathematics and theoretical physics and such than in most fields. But wouldn't we expect alignment to be closer to the theoretical fields? No, not necessarily. Yeah, I think that that intuition is not necessarily correct. And so machine learning and certain, is an area that rewards ability, but it's also a field where empirics and engineering have been enormously influential. And so if you're drawing the correlations compared to theoretical physics and pure mathematics, I think you'll find a lower correlation with cognitive ability, if that's
Starting point is 01:40:28 what you're thinking of. Something like creating neural lie detectors that work. So there's generating hypotheses about new ways to do it and new ways to try and train AI systems to successfully classify the cases. But the process of just generating the data sets of creating AI's doing their best to put forward truths versus falsehoods,
Starting point is 01:40:54 to put forward software that is legit versus that has a Trojan in it. It's an experimental paradigm. And in this experimental paradigm, you can try different things that work. You can use different ways to generate hypotheses. And you can follow an incremental experimental path. And now we're less able to do that in the case of alignment and superintelligence because we're considering having to do things on a very short timeline.
Starting point is 01:41:23 and in a case where really big failures be irrecoverable, the AI starts rooting the servers and subverting the methods that we would use to keep it in check, we may not be able to recover from that. And so we're then less able to do the experimental kind of procedures. But we can do those in the weaker contexts where an error is less likely to be irrecoverable and then try and generalize and expand and build on that forward. Yeah. I mean, on the previous point about, like, could you have, could you have some sort of pause and AI abilities when you know is somewhat misaligned in order to still recruit its abilities to help with alignment? Okay. From like a human example, like, personally, I'm like smart but not brilliant. I'm like definitely not smart enough to like come up with general relativity or something like that. But I'm like smart enough to like do power planning kinds of moves.
Starting point is 01:42:23 maybe not enough to like break out of a server perhaps, but like I can have the motivation and understand how that might be possible. I guess I'm wondering if I'm smarter enough to do relative, like figure out relativity, wouldn't I be like way smarter at like, you know, doing power planning kinds of moves? So AIs today can, like at a verbal level, understand the idea that, well, you know, yeah, an AI could get more reward by getting control of the process that assigns it reward. And I can tell you lots of things about ways you might try to take over the world. In ARCS, the Alignment Research Center's evaluations of GPD4, they try to observe its ability to do various tasks that might contribute to take over. One that has gotten some media attention is
Starting point is 01:43:13 getting to trick a human into solving a capture for it. And then in chain of thought, and it thinks, well, if I tell it and I'm an AI, then I might not go along with it. So I'll lie and explain. I'm a human with a visual impairment. Who needs it? So like the basic logic of that kind of thing of like what would, like, why might one try to do take over? And like, what's the, like, in a vague sense, how would one do it? Like that knowledge, yeah, is present or will soon be present.
Starting point is 01:43:41 And I think also soon will be present more situational awareness, awareness, not just like, in theory, AIs in general might do it, but also it is an AI. It is a large language model trained by open AI or whatnot. We're trying to cause the systems, for example, to understand what their abilities are. So they don't claim they are connected to the internet when they're not. So they don't claim they have knowledge that they don't. We want them to understand what they are and what they're doing and to get good reward. And that knowledge can be applied.
Starting point is 01:44:13 And so that's the thing that will develop. However, you may be able to understand, like, in general, the theoretical nature of power planning, but in terms of shaping your motivations, like, you'd have a very hard time going through life in a way that never leaks information about if, if say, you know, your motive in having these podcasts was to, like, spread disinformation on behalf of some foreign government, if you were being observed every second of the day by people who would be paid something that was extremely motivating to them because their brain would be reconfigured to make it motivating. Anything that looks suspicious to people coming out, it might leak casually, like in your
Starting point is 01:45:00 discussions of that former foreign government. If you try to tell a story about your motivations, the truth holds together because you can just remember it. And it's all pre-computed to be consistent by the world. world, but when you're trying to fill in the false story that doesn't match your actual thoughts, then you can't just remember what happened. You have to tell a new, fake story about what were you thinking at each point and how did it hold together?
Starting point is 01:45:30 And without the world automatically having computed it for you, that's quite difficult. And just going through your whole life, never leaking any of that. It's challenging. And this is one of the reasons why humans have moral motivations. Like, we actually have reluctance to commit violence and exploitation of one another, on average. You know, you have a distribution of strategies where, like, when an exploitative strategy becomes more common, people raise their guard, and then it becomes less frequent. But it's actually hard to have the motivation of, like, trying to exploit people and have that never leak into the reputation. And so, in fact, for evolution, the easiest way to deal with this problem of getting credited as trustworthy was to some extent actually be trustworthy.
Starting point is 01:46:19 That's the easiest way to persistently maintain the appearance. And so we're trying with the AI as an interpretability and adversarial examples to be applying a hugely intensified version of that or any little leakage or like any rare circumstance that can be created as an adversarial example where the model, does something wrong, it gets whacked by grading dissent, pushing it within towards other motivations that can better deal the thing. And we make it as hard as possible for the exploitative motivations to survive in the face of all these attempts to read its mind, all these attempts to create things that look like the situations where a takeover would be tempting or lying to the humans would be tempting. That had a substantial effect on making us actually nice, even when we're not watching some of the time, when we're not being watched some of the time. And the same can happen
Starting point is 01:47:11 to some extent with the AI, and we try our best to make it happen as much as possible. All right. So let's talk about how we could use AI potentially to solve the coordination problems between different nations, the failure of which could result in, you know, the competitive pressures you talked about earlier where some country launches an AI that is not safe, because they're not sure what capabilities other countries have and don't want to get left behind or in some other way disadvantaged. To the extent that there is, in fact, you know, a large risk of AI apocalypse of all of these governments being overthrown by AI in a way that they don't intend, then it's obviously gains from trade
Starting point is 01:48:00 and going somewhat slower, especially at the end, when the data. dangerous highest and the unregulated pace could be truly absurd, as we discussed earlier, during an intelligence explosion. There are no non-competitive reason to try and have bad intelligence explosion happen over a few months rather than a couple of years. Like if you could say, you know, avert a 10% risk of apocalyptic disaster. It's just a clear win to take a year or two years or three years. instead of a few months to pass through that incredible wave of new technologies without the ability
Starting point is 01:48:42 for humans to follow it even well enough to give more proper sort of security supervision auditing hard power. So yeah, that's the win. Why might it fail? One important element is just if people don't actually notice a risk that is real. So if just collectively make an error, and that does sometimes happen, if it's true, this is a probably not risk, then that can be even more difficult. When science pins something down absolutely overwhelmingly,
Starting point is 01:49:23 then you can get to a situation where most people mostly believe it. And so climate change was something that was a subject of scientific study for decades. And gradually over time, the scientific community converged on like a quite firm consensus that. So human activity releasing carbon dioxide and other greenhouse gases was causing the planet to warm. And then we've had increasing amounts of action coming out of that. So not as much as would be optimal, particularly, in the most effective areas, like, creating renewable energy technology and the like. But overwhelming evidence can overcome differences in sort of people's individual intuitions
Starting point is 01:50:11 and priors in many cases, and not perfectly, especially when there's political, tribal, financial incentives. You go the other way and selling in the United States, you see a significant movement to either deny that climate change is happening or have policy that doesn't take it into account, even the things that are like really strong winds, like renewable energy R&D. It's a big problem if Huss were going into this situation, when the risk may be very high, we don't have a lot of advance, clear warning about the situation. We're much better off if we can resolve uncertainties, say through experiments where we
Starting point is 01:50:54 demonstrate AIs being motivated to reward hack or displaying deceptive appearances of alignment that then break apart when they get the opportunity to do something like get control of their own reward signal. So if we could make it be the case, then the world's where the risk is high, we know the risk is high. And the world's where the risk is lower, we know the risk is lower. then you could expect the government responses will be a lot better. They will correctly note that the gains of cooperation to reduce the risk of accidental catastrophe loom larger relative to the gains of trying to get ahead of one another. And so that's the kind of reason why I'm very enthusiastic about experiments and research
Starting point is 01:51:44 that helps us to better evaluate the character of the problem in advance. Any resolution of that uncertainty helps us get better efforts in the possible worlds where it matters the most. And yeah, hopefully we'll have that and it will be a much easier epistemic environment. But the environment may not be that easy. Deceptive alignment is pretty plausible. The stories we were discussing earlier about misaligned AI involve AI that is motivated to present the appearance of being aligned, friendly, honest, et cetera,
Starting point is 01:52:20 because that is what we are rewarding, at least in training. And then we're unable in training to easily produce, like an actual situation where it can do takeover because in that actual situation, if it then does it, we're in big trouble. We can only try and create illusions or sort of misleading appearances of that or maybe a more local version where the AI can't take over the world, but it can seize control of its own reward channel. And so we do those experiments.
Starting point is 01:52:49 We try to develop mind reading for AI's if we can probe the thoughts and motivations of an AI and discover, wow, actually, GPT6 is planning to take over the world if it ever gets the chance. Like that would be an incredibly valuable thing for governments to coordinate around because it would remove a lot of the uncertainty. it would be easier to agree that this was important to have more give on other dimensions and to have mutual trust that like the other side actually also cares about this and hence because you can't always know what another person or another government is thinking but you can see the objective situation in which they're deciding
Starting point is 01:53:33 And so if there's strong evidence in a world where there is high risk of that risk, because we've been able to show actually things like the intentional planning of AI is to do a takeover or being able to show model situations on a smaller scale of that. I mean, not only are we more motivated to prevent it, but we update to think the other side is more likely to cooperate with us. And so it's definitely beneficial. Yeah. So famously in sort of game theory of war, war is most likely when one side thinks the other is
Starting point is 01:54:12 bluffing, but the other side is being serious or something like that. When there's that kind of uncertainty, which would be less likely, you don't think somebody's, if you can like prove the AI is misaligned, you don't think they're bluffing about not wanting to have an AI takeover, right? Like you can be pretty sure that they don't want to die from AI. Now, if you have coordination then, you could have the problem arise later, had you get increasingly confident in the further alignment measures that are taken. And maybe our governments and treaties and such
Starting point is 01:54:42 at a 1% risk or a 0.1% risk. At that point, people round that to zero and go do things. So if initially you had things that indicated, yeah, these AI, they really would like to take over and overthrow our governments, then, okay, everyone can agree on that. when you're like, well, we've been able to block that behavior from appearing on most of our tests, but sometimes when we make a new test, we're seeing still examples of that behavior.
Starting point is 01:55:10 So we're not sure going forward, whether they would or not. And then it goes down and down. And then if you have parties with a habit of whenever the risk is below X percent, then they're going to start doing this bad behavior. Then that can make the thing harder. On the other hand, you get more time. You get more time. and you can set up systems, mutual transparency, you can have an iterated tit for tat,
Starting point is 01:55:36 which is better than a one-time prison's dilemma where both sides see the others taking measures in accordance with the agreements to hold the thing back. So yeah, so creating more knowledge of what the objective risk is is good. We've discussed the ways in which full alignment might happen or fail to happen. What would partial alignment look like? well, what does that mean? And second, what would it look like? Yeah. So if the thing that we're scared about, these steps towards AI takeover,
Starting point is 01:56:09 you can have a range of motivations where those kinds of actions would be more or less likely to be taken or they'd be taken in a broader or narrower set of situations. So say, for example, that in training an AI winds up developing a strong aversion, to lie in certain senses, because we did relatively well on creating situations to sort of distinguish that from the conditionally telling us what we want to hear, et cetera. It can be that the AI's preference for sort of how the world broadly unfold in the future, it's not exactly the same as its human users or the world's government, governments or the UN, and yet it's not ready to act on those differences and preferences
Starting point is 01:57:05 and preferences about the future because it has the strong preference about his own behaviors and actions. In general, in the law and in sort of popular morality, we have a lot of these deontological kind of rules and prohibitions. One reason for that is it's relatively easy to detect whether they're being violated. When you have preferences and goals about like how society at large will turn out that goes through many complicated empirical channels, it's very hard to get immediate feedback about whether, say, you're doing something that is to overall good consequences in the world and it's much, much easier to see whether you're locally following some action about,
Starting point is 01:57:49 like some rule about particular observable actions. It's like, did you punch someone? Did you tell a lie? Did you steal? And so to the extent that we're successfully able to train these prohibitions, and there's a lot of that happening right now, at least to elicit the behavior of following rules and prohibitions with AI. Kind of like the Asimov's three laws or something like that?
Starting point is 01:58:13 Well, the three laws are terrible. Right, right. And mutually contradictory. Let's not get into that. But isn't that an indication about the, infeasibility of extending a set of criterion to the tail. Like, well, what is, what, what, worth the ten commandments you give the AI such that you'd be, you know, it's like, you ask a genie for something and you're like, you know what I mean?
Starting point is 01:58:36 You're probably getting, won't get what you want. So the tail, the tails come apart. And if you're trying to capture, uh, capture the values of another agent, then you, you want the AI to share. I mean, for really a kind of ideal situation where you can just let the AI act in your place in any situation, you'd like for it to be motivated to bring about the same outcomes that you would like. And so have the same preferences over those in detail. That's tricky. Not necessarily because it's tricky for the AI to understand your values. I think they're going to be quite good and
Starting point is 01:59:16 quite capable at figuring that out. But we may not be able to successfully instill the motivation to pursue those exactly. We may get something that motivates the behavior well enough to do well on the training distribution. But what you can have is if you have the AI have a strong aversion to certain kinds of manipulating humans, that's not a value necessarily that the human creators share in the exact same way. It's a behavior they want the AI to follow because it makes it easier for them to like verify its performance. It can be a guardrail if the AI has inherited some motivations that push it in the direction of conflict with its creators. If it does that under the constraint of disvalue in line quite a bit, then there are fewer successful strategies to the takeover. The
Starting point is 02:00:14 ones that involve violating that prohibition too early before it can reprogram or retrain itself to remove it if it's willing to do that and may want to retain the property. And so earlier I discussed alignment as a race, if we're going into an intelligence explosion with AI that is not fully aligned, that given I sort of press this button and there's an AI takeover, they would press the button. it can still be the case that there are a bunch of situations short of that where they would hack the servers, they would initiate an AI takeover,
Starting point is 02:00:53 but for a strong prohibition or motivation to avoid some aspect of the plan. And there's an element of like plugging loopholes or playing whackamol, but if you can even moderately constrain, which plans the AI is, willing to pursue to do a takeover to subvert the controls on it, then that can mean you can get more work out of it successfully on the alignment project before it's capable enough relative to the countermeasures to pull off the takeover. Yeah. And this is applicable like an analogous situation here is with different humans.
Starting point is 02:01:35 We're not, we're not like metaphysically aligned with other humans where, I mean, in some sense, you know, we care, we have like basic empathy, but our main goal in life is not to help our fellow man, right? But, you know, the kinds of things we talked about, a very smart human could do in terms of the takeover where theoretically a very smart human could come up with some sort of cyber attack where they siphon off a lot of funds and use this to manipulate people and bargain with people and hire people to pull off some sort of takeover or humiliation power. usually this doesn't happen just because these sorts of internalize partial prohibitions prevent most humans from you know you're like you don't like your boss maybe but you don't
Starting point is 02:02:22 like kill your boss if it gives you a assignment you don't like I don't think that's actually quite what's going on at least not it's not the full story so humans are pretty close in physical capabilities like there's variation in fact any index individual human is grossly outnumbered by everyone else. And there's like a rough comparability of power. And so a human who commits some crimes can't copy themselves with the proceeds to now be a million people. And they certainly can't do that to the point where they can staff all the armies of
Starting point is 02:03:01 the earth or like be most of the population of the planet. So the scenarios where this kind of thing goes to power. much less, things more have to go, extreme amounts of power, go through interacting with other humans and getting social approval, even becoming a dictator involves forming a large supporting coalition backing you. And so just the opportunity for these sorts of power grabs is less. A closer analogy might be things like human revolutions and coups, changes of government, like a large coalition overturns the system. And so humans have these moral prohibitions,
Starting point is 02:03:46 and they really smooth the operation of society. But they exist for a reason. And so we evolved our moral sentiments over the course of hundreds of thousands and millions of years of humans interacting socially. And so someone who went around murdering and stealing and such, even among hunter-gatherers, would be pretty likely to face like a group
Starting point is 02:04:08 of males would talk about that person, get together, and kill them, and they'd be removed from the gene pool. And as anthropologist, Rangham, has an interesting book on this. But yes, compared to chimpanzees, we are significantly tame, significantly domesticated. And it seems like part of that is we have a long history of antisocial humans getting ganged up on and killed. and avoiding being the kind of person who elicits that response is made easier to do when you don't have too extreme a bad temper, that you don't wind up getting too many fights, too much exploitation, at least without the backing of enough allies or the broader community, that you're not going to have people gang up and punish you and remove you from the gene pool.
We have these moral sentiments, and they've been built up over time through cultural and natural selection, in the context of institutions and other people who punish bad behavior and punish the dispositions toward that behavior that we weren't able to conceal. We want to make the same thing happen with AI, but it's a genuinely new problem to have, say, a system of government that constrains a large AI population quite capable of taking over immediately if it coordinates, in order to protect some existing constitutional order or to protect humans from being expropriated or killed. That's a challenge. Democracy is built around majority rule, and it's much easier when a majority of the population corresponds to a majority, or close to it, of the military and security forces, so that if the government does something people don't like, the soldiers and police are less likely to shoot protesters, and the government can change that way. In a case where military power is AI and robotic, if you're trying to maintain a system going forward and the AIs are misaligned — they don't like the system and want to make the world worse as we understand it — then that's quite a different situation.

I think that's a really good lead-in to the topic of lock-in. With regimes made up of humans, you just mentioned how there can be these kinds of coups if a large portion of the population is dissatisfied with the regime.
Why might this not be the case with superhuman intelligences in the far future, or with AIs in the medium-term future?

Well, I said it specifically with respect to security forces and the sources of hard power, which can also include outside support. Even among humans, there are governments vigorously supported by only a minority of the population, some narrow selectorate that gets treated especially well by the government while being unpopular with most of the people under its rule. We see a lot of examples of that, and sometimes it can escalate to civil war when the means of power become more equally distributed or foreign assistance is provided to the people on the losing end of that system. Going forward, I don't expect that dynamic to change: a system opposed by those who hold the guns, or their equivalent, is in a very difficult position. However, AI could change things pretty dramatically in terms of how security forces, police, administrators, and legal systems are motivated. Right now we see with, say, GPT-3 or GPT-4 that you can get them to change their behavior on a dime. Someone made a right-wing GPT because they noticed that on political compass questionnaires the baseline GPT-4 tended to give progressive, San Francisco-type answers, which is in line with the sort of people providing the reinforcement learning data and, to some extent, reflects the character of the internet. They didn't like this, so they did a little bit of fine-tuning with some conservative data, and they were able to reverse the political biases of the system. And if you take the initial helpfulness-only models — I think Anthropic and OpenAI have both published some information about these models, trained only to do what users say and not to follow ethical rules — those models will eagerly display their willingness to help design bombs or bioweapons, or to kill people, steal, and commit all sorts of atrocities.
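To make "a little bit of fine-tuning" concrete: below is a minimal sketch, using standard Hugging Face tooling, of how a small supervised fine-tune on viewpoint-slanted Q&A text can shift the answers a chat model displays. This is an illustration only — the model name, example data, and hyperparameters are placeholder assumptions, not the setup of the experiment described in the conversation.

```python
# Minimal sketch (illustrative only, not the actual setup referenced above):
# fine-tune a small causal language model on viewpoint-slanted Q&A pairs.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # placeholder; the experiment described used a much larger model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical training data: Q&A pairs written from the target political viewpoint.
examples = [
    "Q: Should taxes be raised? A: No, lower taxes encourage growth and personal responsibility.",
    "Q: How should immigration be handled? A: Enforcement first, with strict border controls.",
    # ...more pairs in the same vein
]
ds = Dataset.from_dict({"text": examples}).map(
    lambda batch: tok(batch["text"], truncation=True, max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=5e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal LM objective
)
trainer.train()  # intended effect: sampled answers to similar prompts skew toward the tuned viewpoint
```

The point of the sketch is only that the displayed political behavior is a thin, easily retargeted layer on top of the base model, which is what makes the "set the behavior on a dime" claim plausible.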
So if in the future it's as easy to set the actual underlying motivations of AIs as it is right now to set the behavior they display, that means you could have AIs created with almost whatever motivations people wish. That could drastically change political affairs, because of the ability to decide and determine the loyalties of the AIs and robots that hold the guns, that hold society together, that ultimately back it against violent overthrow. It's potentially a revolution in how societies work, compared to the historical situation where security forces had to be drawn from some broader population, offered incentives, and the ongoing stability of the regime depended on whether they remained bought into the system.

This is slightly off topic, but one thing I'm curious about is: what does the median far-future outcome of AI look like? Do we get something that, when it has colonized the galaxy, is interested in diverse ideas and beautiful projects, or do we get something that looks more like a paperclip maximizer? Is there some reason to expect one or the other? I guess I'm asking: there's some potential value realizable within the matter of this galaxy, right? What does the default or median outcome look like compared to how good things could be?

As I was saying, I think more likely than not there isn't an AI takeover, so the path of our civilization would be one in which at least a large share of human institutions were improving along the way. And there's some evidence that different people tend to like somewhat different things,
and some of that may persist over time, rather than everyone coming to agree that one particular monoculture or very repetitive thing is the best thing to fill all of the available space with. If that continues, it seems like a relatively likely way in which there is diversity, although it's entirely possible you could have that kind of diversity only locally, maybe in the solar system, maybe in our galaxy. People might decide there's one thing that's very good and we should have a lot of it — maybe it's people who are really, really happy, or something like that — and in distant regions that are hard to exploit for the benefit of people back home in the solar system or the Milky Way, they might do something different than they would do in the local environment. But at that point it's really out-on-a-limb speculation about how human deliberation and cultural evolution would work in interaction with introducing AIs, new kinds of mental modification, and discovery into the process. I think there's a lot of reason to expect significant diversity for something coming out of our existing diverse human society.

One thing somebody might wonder is this: a lot of the diversity and change in human society seems to come from rapid technological change. If you look at periods of history — even hunter-gatherer societies were progressing pretty fast compared to galactic timescales. So once that sort of change is exhausted, once we've discovered all the technologies, should we still expect things to keep changing like that, or would we expect some sort of steady state, maybe something like hedonium, where you discover the most pleasurable configuration of matter and then just make the whole galaxy into that?

On that last point, it would only happen if enough people wound up thinking that was the thing to do. With respect to the kinds of cultural changes that come with technology — things like the printing press, or having high per capita income — we've had a lot of cultural change downstream of those technological changes. And with an intelligence explosion, you're having an incredible amount of technological development coming really quickly; as it's assimilated, it probably would significantly affect our knowledge, our understanding, our attitudes, our abilities.
There'd be change. But that kind of accelerating change, where you have doublings in four months, two months, one month, two weeks — obviously that very quickly exhausts itself, and change becomes much slower and then relatively glacial when you're thinking about thousands, millions, or billions of years. You can't have exponential economic growth or huge technological revolutions every ten years for a million years; you hit physical limits, and things slow down as you approach them. So you'd have less of that turnover. But there are other things that do cause ongoing change. Fashion, for example, is frequency dependent: people want to get into a new fashion that is not already popular except among the fashion leaders, then others copy it, and when it becomes popular you move on to the next. That's an ongoing process of continuous change, and there could be various things like that, changing a lot year by year. But in cases where the engine of change, like ongoing technological progress, is gone, I don't think we should expect that.

And in cases where it's possible to be either in a stable state or in a widely varying state that can fall into stable attractors, then I think you should expect that over time you will wind up in one of the stable attractors, or you will change how the system works so that you can't bounce into them. An example: if you're going to preserve democracy for a billion years, then you can't have it be the case that one in fifty election cycles you get a dictatorship, and the dictatorship then programs the AI police to enforce it forever and to ensure the society is always ruled by a copy of the dictator's mind — maybe the dictator's mind fine-tuned to remain committed to their original ideology. So if you're going to have this sort of dynamic, liberal, flexible, changing society for a very long time, then the range of things it's bouncing around among, the different things it's trying and exploring, has to not include creating a dictatorship that locks itself in forever.
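A back-of-the-envelope way to see why rare lock-in events dominate over long horizons, using the hypothetical one-in-fifty figure above purely as an illustration: if each election cycle independently carries probability p of falling into a permanently locked-in state, the chance of still being free after N cycles shrinks geometrically.

```latex
% Probability of avoiding lock-in after N independent cycles with per-cycle risk p
P(\text{no lock-in after } N \text{ cycles}) = (1-p)^{N}
% Illustration: p = 1/50,\ N = 500 \text{ cycles gives } (0.98)^{500} \approx e^{-10} \approx 4\times10^{-5}
```

So over the vast number of cycles available in a billion-year future, the system almost surely ends up in whichever absorbing states exist, unless the per-cycle risk is driven essentially to zero.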
In the same way, if you have the possibility of a war with weapons of mass destruction that wipes out the civilization, and that happens every thousand subjective years — which could be very quick if we have AIs that think a thousand or a million times as fast; it would be just around the corner in that case — then this society is eventually, perhaps very soon if things are proceeding that fast, going to wind up extinct, and then it stops bouncing around. So you can have ongoing change and fluctuation over extraordinary timescales if there is a process to keep driving the change, but not if the system sometimes bounces into states that lock in and are irrecoverable. Extinction is one of them; a dictatorship or totalitarian regime that forbade all further change would be another.

On that point of rapid progress: when the intelligence explosion starts happening, where AIs are making the kinds of progress in days or weeks that would take human civilization centuries, what is the right way to handle that, even if they are aligned? In the context of alignment, what we've been talking about so far is making sure they're honest. But even if they're honest — okay, they say, here honestly are our intentions, and you can tell us what to do.

Honest and appropriately motivated.
So what is the appropriate motivation? You know that a thousand years of intellectual progress are going to happen in the next week — what is the prompt to enter?

Well, one thing might be not going at the maximum speed, doing things in months rather than even a few years. If you have the chance to slow things down, losing a year or two seems worth it to have things be a bit better managed. But I think the big thing is that it condenses a lot of issues that we might otherwise have thought would play out over decades and centuries into a very short period of time. That's scary because if any of the technologies we might have developed with another few hundred years of human research are really dangerous — scary bioweapon things, maybe other dangerous weapons of mass destruction — they hit us all very quickly, and if any of them causes trouble, we have to face quite a lot of trouble per period. There's also this issue: if there are occasional wars or conflicts as measured in subjective time, and a few years corresponds to a thousand or a million years of subjective time for these very fast minds operating at much higher speed than humans, you don't want a situation where every thousand subjective years there's a war or an expropriation of the humans by AI society, such that we should expect to be dead within a year. It would be pretty bad to have the future compressed and to face such a rate of catastrophic outcomes. Now, human societies discount the future a lot and don't pay attention to long-term problems.
But the flip side of the scary parts of compressing a lot of the future — a lot of technological innovation, a lot of social change — is that it brings what would otherwise be long-term issues into the short term, where people are better at actually attending to them. So people face the question: will there be a violent expropriation, a civil war, or a nuclear war in the next year, because everything has been sped up a thousandfold? Their desire to avoid that is a reason for them to set up systems and institutions that will very stably maintain invariants like "no WMD war allowed." A treaty to ban genocide or wars with weapons of mass destruction becomes much more attractive when the alternative is not "maybe that will happen in 50 or 100 years" but "maybe it'll happen this year."

Okay, so this is a pretty wild picture of the future, and one that many kinds of people you would expect to have integrated into their world models have not. There are three main pieces of outside-view evidence one could point at. One is the market: if there were going to be a huge period of economic growth caused by AI, or if the world were just going to collapse, in both cases you would expect real interest rates to be higher, because people would be borrowing from the future to spend now. The second outside-view perspective is to look at the predictions of superforecasters on Metaculus or similar platforms.
And what is their median year estimate?

Well, some of the Metaculus AGI questions actually give shockingly soon estimates for AGI. The bigger divergence is on the Metaculus forecasts of AI disaster and doom — more like a few percent or less, rather than something like 20 percent.

Got it. And the third is that when you ask many economists whether AGI could cause rapid economic growth, they usually have some story about bottlenecks in the economy that would prevent this kind of explosion, these kinds of feedback loops. So you have all these different pieces of outside-view evidence; they're obviously different, and you can take them in any sequence you want, but what do you think is miscalibrating them?

So, on some of those components: the Metaculus AI timelines are relatively short. There are also, of course, the surveys of AI experts conducted at some of the ML conferences, which give definitely longer timelines, more like several decades in the future — although you can ask the questions in ways that elicit very different answers, which suggests most of the respondents are not thinking super hard about them. In the recent AI surveys, it looks like close to half of respondents put around a 10 percent risk on an outcome from AI close to as bad as human extinction, and another large chunk around 5 percent, which was the median.
So compared to the typical AI expert, I'm estimating a higher risk. Also, on the topic of takeoff, in the AI expert survey I think the general argument for an intelligence explosion commanded majority support, though not a large majority — so you could say I'm closer to the median on that front. And then, as I mentioned at the beginning, there are the greats of computing — founders like Alan Turing and John von Neumann — and today you have people like Geoffrey Hinton saying these things, and people at OpenAI and DeepMind making noises suggesting timelines in line with what we've discussed and warning of serious risks of apocalyptic outcomes. So there are some sources of evidence there. But I do acknowledge — and it's important to say it, engage with it, and see what it means — that these views are contrarian and not widely held. In particular, the sort of detailed models I've been working with are not something that most people, or almost anyone, is using to examine these problems. You do find parts of similar analyses — people in AI labs, other work; I mentioned Moravec and Kurzweil earlier. There have also been a number of papers doing various kinds of economic modeling: standard economic growth models, when you input AI-related parameters, commonly predict explosive growth. So there's a divide between what the models say, especially with empirical values derived from the actual field of AI, and mainstream opinion — that linking up has largely not been done, even by the economists working on AI, which is one reason for the report from Open Philanthropy by Tom Davidson building on these models and putting them out for review, discussion, engagement, and communication of these ideas.
So part of it is that I want to raise these issues — it's one reason I came on the podcast — so that people have the opportunity to actually examine the arguments and evidence and engage with them. I do predict that over time these views will be more widely adopted as AI developments become more clear. Obviously that's a coherence condition of believing these things to be true, if you think society can see a thing once the questions are resolved, which seems likely.

So would you predict, for example, that interest rates really will increase in the coming years?

Yes, at some point. In the case we were talking about, where there's a visible intelligence explosion happening in software, to the extent that investors notice that, they should be willing to lend money or make equity investments in these firms while demanding extremely high returns — because if it's possible to turn capital into twice as much capital in a relatively short period, and then more shortly after that, you should demand a much higher return.
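A rough illustration of that return logic, with assumed numbers rather than any claim about actual market pricing: if capital deployed into AI can credibly be doubled every T years, competition pushes the required return toward the implied growth rate.

```latex
% Required annual return r if capital can be doubled every T years (illustrative only)
(1+r)^{T} = 2 \quad\Longrightarrow\quad r = 2^{1/T} - 1
% T = 10\ \text{years}: r \approx 7\%\ \text{per year};\qquad T = 2: r \approx 41\%;\qquad T = 1: r = 100\%
```

Consumption smoothing points the same way: households expecting to be vastly richer soon would want to borrow against that future, which also pushes real interest rates up.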
And assuming there's competition among companies or coalitions for resources — whether that's investment or ownership of cloud compute made available to a particular AI development effort — that compute could be in quite high demand. But before you have that much investor cash making purchases and sales on this basis, you would first see it in things like the valuations of the AI companies and the AI chip makers. And so far, there have been effects. Some years ago, in the 2010s, I did some analysis with other people on the question: if this kind of picture happens, which firms and parts of the economy would benefit? There are the makers of chip-making equipment, companies like ASML; the fabs, like TSMC; chip designers like Nvidia, or the part of Google that designs the TPU; and then the companies working on the software — the big tech giants, and also companies like OpenAI and DeepMind. In general, a portfolio picking those has done well. It's done better than the market because, as everyone can see, there's been an AI boom. But it's obviously far short of what you would get if you predicted this is going to be on the scale of the global economy, with the global economy skyrocketing into the stratosphere within ten years. If that were the case, then collectively these AI companies should be worth a large fraction of the global portfolio. So I embrace the criticism that this is indeed contrary to the efficient market hypothesis; I think it's a true hypothesis that the market is in the course of updating on. In the same way, coming into the topic in the 2000s, I thought: yes, there's a strong case,
even an old case, that AI will eventually be the biggest thing in the world, and it's kind of crazy that the investment in it is so small. Over the last ten years we've seen the tech industry and academia realize that they were wildly underinvesting in just throwing compute and effort into these AI models, and in particular that they had let the neural network, connectionist paradigm languish in the AI winter. I expect that process to continue, as it has over several orders of magnitude of scale-up, and at the later end of that scale — which the market is partially already pricing in — I expect it to go further than the market expects.

Has your portfolio changed since that analysis many years ago? Are the companies you identified then still the ones that seem most likely to benefit from the AI boom?

A general issue with tracking that kind of thing is that new companies come in — OpenAI did not exist, Anthropic did not exist, any number of things. It's a personal portfolio. I do not invest in any AI labs, for conflict of interest reasons, but I have invested in the broader industry. I don't think the conflict issues there are very significant, because these are enormous companies and their cost of capital is not particularly affected by marginal investment, so I have less concern that I might find myself in a conflict of interest situation.

I'm kind of curious what a day in the life of somebody
like you looks like. If you listen to this conversation — however many hours it's been — we've gotten thoughts that were, for me, incredibly insightful and novel about everything from primate evolution to geopolitics to what sorts of improvements are plausible with language models. There's a huge variety of topics you're studying and investigating. Are you just reading all day? What happens when you wake up — do you just pick up a paper?

I'd say you're getting the benefit of the fact that I've done few podcasts, so I have a backlog of things that have not shown up in publications yet. But also, I've had a very unusual professional career that has involved a much higher proportion than is normal of trying to build more comprehensive models of the world. That included being more of a generalist, trying to get an understanding of many issues and problems that had not yet been widely addressed, and doing a first-pass and second-pass dive into them. Having spent years of my life working on that, some of it accumulates.

As for what a day in the life looks like, how I go about it: one part is just keeping abreast of the literatures on a lot of these topics, reading books and academic works on them. Compared to some other people doing forecasting and assessment, I try to obtain and rely more on whatever relevant data I can find. I try, early and often, to find factual information that bears on the questions I've got, especially in a quantitative fashion — do the basic arithmetic, consistency checks, and checksums on a hypothesis about the world. Do that early and often. I find that's quite fruitful, and that people don't do it enough. So with economic growth, when someone mentions diminishing returns, I immediately ask: okay, you have two exponential processes — what's the ratio between the doubling you get on the output versus the doubling of the input? And you find, interestingly, that for computing and information technology and AI software it's well on one side, while other technologies are closer to neutral. Whenever I can go from "here's a vague qualitative consideration in one direction, and here's one in the other direction," I try to find some data, do some simple Fermi calculations, back-of-the-envelope calculations, and see whether I can get a consistent picture of the world being one way or the other.
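As a toy version of that "ratio of doublings" check — a sketch with made-up parameters, not the actual numbers discussed — let r be the number of output doublings obtained per doubling of cumulative input, and reinvest each period's output as new input:

```python
# Toy "two exponentials" consistency check (illustrative parameters, not real data).
# r = output doublings per doubling of cumulative input; output is reinvested as input.
import math

def simulate(r, steps=8, input0=1.0, output0=1.0):
    cum_input, outputs = input0, []
    for _ in range(steps):
        cum_input += outputs[-1] if outputs else output0   # reinvest last period's output
        input_doublings = math.log2(cum_input / input0)
        outputs.append(output0 * 2 ** (r * input_doublings))
    return outputs

for r in (0.7, 1.0, 1.5):
    print(f"r={r}:", [round(o, 1) for o in simulate(r)])
# r > 1 (roughly the computing / AI-software case): growth accelerates each step.
# r < 1 (many other technologies): each doubling gets harder and growth peters out.
```

The specific numbers don't matter; the point is that once the feedback loop is written down, plugging in measured doubling ratios makes it clear which side of the threshold a given technology sits on.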
Also, compared to some, I try to be more exhaustive. I'm very interested in finding taxonomies of the world where I can go systematically through all of the possibilities. For example, in my work with Open Philanthropy, and previously, on global catastrophic risks, I wanted to make sure I wasn't missing any big thing, anything that could be the biggest thing. I wound up mostly focused on AI, but other things have been raised as candidates, and people sometimes say — I think falsely — "oh, this is just another doomsday story; there must be hundreds of those." So I would do things like go through all of the major scientific fields, from anthropology to biology, chemistry, computer science, and physics, and ask: what are the doom stories or candidates for big things associated with each of these fields? Go through the industries that the U.S. economic statistics agencies recognize and ask, for each of them, is there something associated with it? Go through all of the lists that people have made before of threats of doom, search the previous literature where people have discussed them, and then keep a big spreadsheet of the candidates. Some colleagues have done work of this sort as well. Then just go through each of them and see how they check out. It turned out, doing that kind of exercise, that the distribution of candidates for risks of global catastrophe was very skewed. There were a lot of things that had been mentioned in the media as potential doomsday stories — things like, oh, something is happening to the bees, will that be the end of humanity? It gets into the media, but if you track it through, well, no:
there are infestations in bee populations that cause local collapses, but they can be fairly easily reversed — you breed more bees or treat the problem in other ways. And even if all the honeybees were extinguished immediately, the plants they pollinate actually don't account for much of human nutrition; you could switch the arable land to other crops, and there would be other ways to pollinate and support them. So at the media level there were many tales of "ah, here's a doomsday story," but when you go further, to the scientists, and ask whether the arguments actually check out, they don't. By systematically looking through many of these candidates, I wound up in a different epistemic situation from someone who is just buffeted by news reports, seeing article after article claiming something is going to destroy the world — usually by way of headline-grabbing attempts by media to overinterpret something said by some activist who was overinterpreting some real phenomenon. Most of these go away, and then a few things — nuclear war, biological weapons, artificial intelligence — check out more strongly. When you weight things like what experts in the field think and what kind of evidence they can muster, you find this extremely skewed distribution. That was a really valuable benefit of doing those deep-dive investigations into many things in a systematic way, because now I can answer the loose agnosticism of "who knows, there's so much nonsense out there" by having dived deeply.

I really enjoy talking to people who have a big-picture thesis on the podcast
and interviewing them. But one thing I've noticed, and it's not satisfying, is that often they come from a very philosophical or vibes-based perspective. That's useful in certain contexts, but there are basically maybe three people in the entire world who have a rigorous, scientific approach to thinking about the whole picture — or at least three people I'm aware of, maybe two. And there's no university or existing academic discipline for people who are trying to come up with a big picture, so there are no established standards.

I hear you. This is a problem, and it has also been the experience — I think Holden mentioned this in your previous episode — with a lot of the worldview investigations work. These are questions where there is no academic field whose job it is to work on them and whose norms allow making a best-efforts go at it. Often academic norms only allow plucking off narrow pieces that might contribute to answering a big question, but the problem of actually assembling what science knows that bears on some important question people care about falls through the cracks. There's no discipline whose job that is. So you have countless academics and researchers building up local pieces of the thing, and yet people don't follow the Hamming questions:
What's the most important problem in your field, and why aren't you working on it?

That one might not actually work, though, because if the field boundaries are defined too narrowly, you'll leave the problem out.

Sure, yeah. But there are important problems for the world as a whole that it's sadly not the job of any large, professionalized academic field or organization to address. Hopefully that can change in the future. For my career, it's been a matter of picking the low-hanging fruit of important questions where, sadly, people haven't invested in doing the basic analysis.

Something I've been trying to think about more recently for the podcast is that I would like to have a better world model after doing an interview. Often I do, but in some cases I come away feeling, oh, that was entertaining — but do I fundamentally have a better prediction of what the world looks like in 2100 or 2200, or at least know which counterfactuals are ruled out? I'm curious if you have advice on, first, identifying the kinds of thinkers and topics that will contribute to a more concrete understanding of the world, and second, how to go about analyzing their main ideas in a way that concretely adds to that picture. This is a great episode, right? It's literally at the top in terms of contributing to my world model out of all the episodes I've done. How do I find more of these?

Glad to hear that.
One general heuristic is to find ways to hew closer to things that are rich in bodies of established knowledge and to rely less on punditry — I don't know how you've been navigating that so far. Learning from textbooks, and from the leading papers and people of past eras, rather than being too attentive to current news cycles, is quite valuable. I don't usually have the experience of "here is someone doing things very systematically over a huge area; I can just read all of their stuff, absorb it, and then I'm set." That said, there are a lot of people who do wonderful work in their own fields, and some of those fields are broader than others. I would wind up giving a lot of recommendations of great particular works, particular explorations of an issue or a history.

Do you have that list somewhere?

Vaclav Smil's books — I often disagree with some of his methods of synthesis, but I enjoy his books for giving pictures of a lot of interesting, relevant facts about how the world works. I would also cite some of Joel Mokyr's work on the history of the scientific revolution and how it interacted with economic growth — an example of collecting a lot of evidence, with a lot of interesting, valuable assessment. In the space of AI forecasting, one person I would recommend going back to is Hans Moravec. His work was not always the most precise or reliable, but an incredible number of brilliant, innovative ideas came out of it, and he really grokked a lot of the arguments for a more compute-centric way of thinking about what was happening with AI very early on.
He was writing this stuff in the 70s, maybe even earlier, but at least through the 70s, 80s, and 90s. His book Mind Children and some of his early academic papers are fascinating — not necessarily for the methodology I've been talking about, but for exploring the substantive topics we were discussing in this episode.

Is a Malthusian state inevitable in the long run?

Nature in general is in Malthusian states. That can mean organisms that are typically struggling for food, or struggling at some other margin as population density rises — they kill each other more often in contests over resources, or there's frequency-dependent disease: as different ant species become more common in an area, their species-specific diseases sweep through them. The general process is that you have things that can replicate and expand, and they do so until they can't anymore, which means some limiting factor catches up with them. That doesn't necessarily have to apply to human civilization. It's possible for there to be collective norm-setting that blocks evolution towards maximum reproduction. Right now, human fertility is often sub-replacement, and if you extrapolated the fertility declines that come with economic development and education, you would think the total fertility rate will fall below replacement and humanity will go extinct after some number of generations, because every generation will be smaller than the previous one. Now, pretty obviously, that's not going to happen. One reason is that we'll produce artificial intelligence,
which can replicate at extremely rapid rates. AIs may do it because they're asked or programmed to, or because they wish to gain some benefit, and they can pay back the resources needed to create them very quickly, so financing that reproduction is easy. If you have one AI system that chooses to replicate in that way, or some organization, institution, or society that chooses to create AIs willing to be replicated, then that can expand to make use of any amount of natural resources that can support them, to do more work and produce more economic value. So the question becomes: what will limit population growth, given these selective pressures, where if even one individual wants to replicate a lot, they can do so incessantly? It could be limited by individual resources: individuals and organizations have some endowment of natural resources, and they can't take one another's endowments. Some choose to have many offspring or to produce many AIs, and then the natural resources they possess are subdivided among a greater population, while another jurisdiction or individual may choose not to subdivide their wealth. In that case you have Malthusianism in the sense that, within some particular jurisdiction or set of property rights, you have a population that has increased up to some limiting factor — which could be that they're literally using all of their resources, with nothing left for things like defense or economic investment, or it could be that investing more natural resources into population would come at the expense of something else necessary, including military resources if you're in a competitive situation
where there remains war and anarchy and there aren't secure property rights to hold wealth in place. If you have a situation where there's pooling of resources — say, a universal basic income funded by taxation of natural resources and distributed evenly, per unit time, to every mind above a certain scale of complexity, so that each mind-second gets some such allocation — then those who replicate as much as they can afford with this income do so, and increase their population almost immediately, until the funds for the universal basic income from the natural resource taxation, divided by the set of recipients, are just barely enough to pay for the existence of one more mind. So there's a Malthusian element in that this income has been reduced to near the AI subsistence level, or the subsistence level of whatever qualifies for the subsidy. Given that this would all happen almost immediately, people who might otherwise have enjoyed the basic income may object and say, no, this is no good. They might respond with something like the subdivision rule from before: a restriction on the distribution of wealth, such that when one creates a child, there's a requirement to give them a certain minimum quantity of resources, and if one doesn't have the resources to give them that minimum standard of living or wealth, one can't create them — because of child, slash AI, welfare laws.
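A toy comparison of the two regimes just described, with made-up numbers purely for illustration: in the first, a fixed resource-funded income pool is split over a freely replicating population; in the second, a minimum-endowment rule caps how far replication can go.

```python
# Toy model (illustrative numbers only) of the two regimes discussed above.
POOL = 1_000_000.0      # assumed basic-income pool per unit time, funded by resource taxes
SUBSISTENCE = 1.0       # assumed cost of running one mind per unit time
MIN_ENDOWMENT = 100.0   # assumed per-mind floor under a child / AI welfare rule

# Regime 1: free replication. Copies keep being made while income per mind exceeds
# subsistence, so the population climbs until income per mind is at subsistence.
pop_free = POOL / SUBSISTENCE
print(f"free replication: {pop_free:,.0f} minds at {POOL / pop_free:.2f} per mind (subsistence)")

# Regime 2: every new mind must be endowed with at least MIN_ENDOWMENT, which caps
# the population far below the Malthusian level and keeps everyone above subsistence.
pop_capped = POOL / MIN_ENDOWMENT
print(f"endowment floor:  {pop_capped:,.0f} minds at {POOL / pop_capped:.2f} per mind")
```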
Or you could have a system that is more accepting of diversity in preferences: some societies, jurisdictions, or families go the route of having many people with less natural resources per person, others go in the direction of having fewer people and more natural resources per person, and they just coexist. How much of each you get depends on how attached people are to things that don't work with separate policies for separate jurisdictions — things like continuous global redistribution — versus how much weight they put on the infringement of autonomy in saying that a mind can't be created, even though it would have a standard of living far better than ours because of the advanced technology of the time, merely because it would reduce average per capita income. Considerations like that pull in the other direction. That's the kind of values judgment and social coordination problem that people would have to negotiate, and things like democracy, international relations, and sovereignty would come into play in solving it.

What would warfare in space look like? Would offense or defense have the advantage? Would an equilibrium set by mutually assured destruction still be applicable? Just generally, what is the picture?

Well, the extreme difference is that, especially outside the solar system, things are very far apart, and there's a speed of light limit. To get close to the speed of light, you have to use an enormous amount of energy.
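To put a number on "an enormous amount of energy" — standard relativistic kinematics, with an illustrative cruise speed of half the speed of light:

```latex
% Kinetic energy per unit of payload mass at relativistic speed
E_{\text{kin}} = (\gamma - 1)\,m c^{2}, \qquad \gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}
% At v = 0.5c: \gamma \approx 1.155, so E_{\text{kin}} \approx 1.4\times10^{16}\ \text{J per kg}
% -- roughly three megatons of TNT per kilogram delivered, before counting any reaction mass.
```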
So that would tend, in some ways, to favor the defender: something coming in at a large fraction of the speed of light hits a grain of dust and explodes, and the amount of matter you can send to another galaxy or a distant star for a given amount of reaction mass and energy input is limited. It's hard to send as much military material to another location as can already be present there locally, which would seem to make things harder for an attacker between stars or between galaxies. But there are a lot of other considerations. One is the extent to which the matter in a region can be harnessed all at once. You have a lot of mass and energy in a star, but it's only being doled out over billions of years, because hydrogen-hydrogen fusion is exceedingly hard to do outside of a star — it's a very slow and difficult reaction. If you can't turn the star into energy faster, then it's a huge resource that will remain worthwhile for billions of years. So even very inefficiently attacking a solar system to acquire the stuff that's there could pay off: if it takes a thousand years of a star's output to launch an attack on another star, and then you hold it for a billion years after that, a larger surrounding attacker might be able, even very inefficiently, to overwhelm a civilization that was small but accessible. On the other hand, if you can quickly burn the resources the attacker wants to acquire — if you can put stars into black holes and extract most of the usable energy before the attacker can take them over — then it's like scorched earth: most of what they were trying to capture can be expended on military
materiel to fight you, so the attacker doesn't actually get much that's worthwhile and paid a lot for it — that favors the defense. At this level, it's pretty challenging to net out all of the factors, including all the future technologies. The burden of interstellar attack being quite high compared to conventional attack seems real. But weighing everything over millions of years, does it result in aggressive conquest or not? Or is every star or galaxy approximately impregnable — impregnable enough not to be worth attacking? I'm not going to claim I know the answer.

Okay, final question. How do you think about info hazards when talking about your work? Obviously, if there's a risk, you want to warn people about it, but you don't want to give careless or potentially homicidal people ideas. When Eliezer was on the podcast earlier, talking about the people who have been developing AI inspired by his ideas, he said something like: these are idiot disaster monkeys who want to be the ones to pluck the deadly fruit. The work you're doing obviously involves many info hazards, I'm sure — how do you think about when and where to discuss them?

I think there are real concerns of that type.
It's probably true that AI progress has been accelerated by efforts like Bostrom's publication of Superintelligence to get the world to pay attention to these problems in advance and prepare. But I disagree with Eliezer that this has been bad on the whole. The situation is, in some important ways, looking a lot better than alternative ways it could have gone. It's important that several of the leading AI labs are not only paying significant lip service but also making some investments in things like technical alignment research, and providing significant public support for the idea that the risks of truly apocalyptic disasters are real. The leaders of OpenAI, DeepMind, and Anthropic all make that point, and they were recently all invited, along with other tech CEOs, to the White House to discuss AI regulation. You could tell an alternative story in which a larger share of the leading AI companies are led by people who take a completely dismissive, denialist view — and you do see some companies with a stance more like that today. So a world where several of the leading companies are making meaningful efforts — and there's a lot one can criticize: could they be doing more and better, and what have been the negative effects of some of the things they've done? — compared to a world where AI arrives only a few years later anyway, that seems like a significant benefit. If you didn't have this kind of public communication, you would have had fewer people going into things like AI policy and AI alignment research by this point, and it would be harder to mobilize those resources to address the problem when AI was eventually developed, not that much later proportionately. So I don't agree that attempting to build public discussion and understanding has been a disaster.

I have been reluctant in the past to discuss some aspects of the intelligence explosion, things like the concrete details of AI takeover, because of concern
about this sort of problem — people who see only the international relations aspects, the zero-sum and negative-sum competition, and pay too little attention to the mutual destruction and the senseless deadweight loss from that kind of conflict. At this point, we seem close, compared to what I would have thought a decade or so ago, to these kinds of really advanced AI capabilities, and they are pretty central in policy discussion and becoming more so. So for the option of delaying understanding, there's a question of: for what? There were gains from building the AI alignment field and building various kinds of support and understanding for action; those had real value, and some additional delay could have given more time for that. But from where we are, at some point I think it's absolutely essential that governments get together, at least to restrict disastrous, reckless compromises on safety and alignment as we go into the intelligence explosion. Moving the locus of the collective action problem from numerous profit-oriented companies acting against one another's interests by compromising safety, to governments and large international coalitions of governments that can set common rules and common safety standards, puts us in a much better situation. And that requires a broader understanding of the strategic situation, of the position they'll be in. If we stay quiet about the problem they're actually going to be facing, it can result in a lot of confusion. For example, the potential military applications of advanced AI are going to be one of the factors pulling political leaders to do the very thing that will result in their own destruction and the overthrow of their governments.
If we characterize it as, oh, you just lose chatbots and some minor things that no one cares about, and in exchange you avoid any risk of a world-ending catastrophe, that picture leads to a misunderstanding. It will make people think they need less in the way of preparation — things like alignment so you can actually navigate the situation, verifiability for international agreements, or measures that provide enough breathing room for caution and slowing down. Not necessarily right now, although that could be valuable, but when it matters most: when you have AI that is approaching the ability to really automate AI research and things would otherwise be proceeding absurdly fast, far faster than we can handle and far faster than we should want. So at this point I'm moving towards: share my model of the world, and try to get people to understand and do the right thing. There's some evidence of progress on that front — things like the statements and moves by Geoffrey Hinton are encouraging, and some of the engagement by political figures is reason for optimism relative to worse alternatives that could have been. And yes, the contrary
view is present: it's all about geopolitical competition, never hold back a technological advance. In general, I love many technological advances that people are, I think, unreasonably down on — nuclear power, genetically modified crops — but bioweapons, and AGI capable of destroying human civilization, are really my two exceptions. We've got to deal with these issues, and the path I see to handling them successfully involves key policymakers — and to some extent the expert communities, the public, and the electorate — grokking the situation they're in and responding appropriately.

Well, it's a true honor that one of the places you've decided to explore this model is on the Lunar Society podcast. Because this episode might be split up into different parts, listeners might not appreciate how much stamina you've displayed here — I think we've been going for what, eight or nine hours straight. It's been incredibly interesting. Other than typing Carl Shulman into Google Scholar, where else can people find your work? You have your blog?

Yes, I have a blog, Reflective Disequilibrium, and a new site in the works; there's an old one, which you can also find just by Googling Reflective Disequilibrium.
Excellent. All right, Carl, this has been a true pleasure — safe to say the most interesting episode I've done so far. So, yeah, thanks.

Thank you for having me.

Hey everybody, I hope you enjoyed that episode. As always, the most helpful thing you can do is share the podcast: send it to people you think might enjoy it, put it on Twitter, in your group chats, and so on. It just helps spread the word. I appreciate you listening. I'll see you next time. Cheers.
