Big Technology Podcast - Is The AI Going To Escape? — With Anthony Aguirre

Episode Date: August 13, 2025

Anthony Aguirre is the executive director of the Future of Life Institute. He joins Big Technology to discuss how AI could fail in the worst case and whether our push toward increasingly autonomous, general systems puts control out of reach. Tune in to hear how agentic systems operating at superhuman speed complicate oversight, and why “just unplug it” is naive. Hit play for a cool‑headed, nuanced conversation with clear takeaways you can use to evaluate AI strategy, policy, and risk. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Transcript
Starting point is 00:00:00 How bad could AI go in the worst-case scenario? Let's look beyond the near-term risks and explore what could really happen if the wheels come completely off. That's coming up right after this. Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond. Well, on the show, we've explored a lot of the downsides of AI, a lot of the near-term risks, the business implications of what happens if things don't continue to accelerate apace. We haven't had a dedicated episode looking at what could happen if things really go wrong. And so we're going to do that today. We're joined today by, I think, the perfect guest for this conversation.
Starting point is 00:00:41 Anthony Aguirre is here. He's the executive director at the Future of Life Institute and also a professor of physics at UC Santa Cruz. Anthony, great to see you. Thanks so much for having me on. Great to see you. Great to see you. Nice to have a conversation again this time in public. Suffice to say you're not excited about it.
Starting point is 00:01:00 all the progress that the AI industry is making? Well, that's not quite true. So there's lots of progress in AI that I just love. I use AI models all the time. I love lots of AI applications in science and technology. Lots of things where AI are tools that are letting us do things that we couldn't do before. The thing that I'm concerned about is the direction that we're headed in, which is toward increasingly autonomous and general and intelligence systems, things that that we've been calling AGI for a long time. And this, I think, is different at some level from what we've been doing. And I think is where most of the danger lies, especially on the large scale and in the longer
Starting point is 00:01:43 term. And there have been a number of studies in the training scenarios within the foundational model companies or foundational research houses, which are frontier research labs, actually I think is probably the best way to refer to them where AI has had this it seems like a value or an instinct to try to preserve itself in testing scenarios it's tried to copy its code out of the scenario when it thinks its values are being manipulated or it's also tried to in one instance blackmail the trainers to not change its values this was in an anthropic training scenario in order to preserve its encoded values there is a belief within the AI industry that this is just complete BS and it's the research labs in planning these scenarios within these bots and then being like, oh, look what they did once they like, you know, ran the code that they initially like baked in to try to copy themselves out of the training environment. What does your read on this? Is this the beginning of a potential
Starting point is 00:02:51 escape risk that we could see for AI? Well, I think what's important to know about these sorts of strange behaviors is that they're completely predicted and pretty much unavoidable. And they just follow from thinking through what it means to be an effective intelligent system. So if you're a system that is trying to pursue a goal, whether you're a person or a corporation or an AI system and you've got some goal, there are, you know, if you're smart enough to understand what that goal is and how it can be accomplished, then you're going to know that there are things that you have to do in order to accomplish that goal. And so if you have an AI
Starting point is 00:03:30 system, you say your goal is to do X, and then you put this AI system in a scenario where you're threatening its existence and it still wants to do X, or it still wants to accomplish some large scale thing that has been baked into it. Of course, it's going to take that, it's going to, you know, see that scenario and being a smart thing, figure out what do I have to do within that scenario to still accomplish my goal. And if that's blackmail the user or exfiltrate myself and my model waits to be operating somewhere else, or if it's fake something and pretend that I'm doing the thing that I, you know, that they want me to do, but actually do something else, I'm going to do those things. So I think that this is a problem that is going to get worse, not better, as we make
Starting point is 00:04:15 AI systems more general and more capable and more autonomous, because it's just intrinsic to to how a thinking thing works. And it's interesting that you're actually, you are giving credence to these early signs of the AI acting out of a self-preservation value because the critics would say a couple things. They would say the trainers are giving the AI this, you know, potential action that it can do. It's a probabilistic system.
Starting point is 00:04:42 So of course it's going to take that action in some number of cases. So it's not really a surprise. it's being fed by these trainers and testers. The other thing they would say is like, ha-ha, an AI attempted to run a code to exfiltrate its values. It was connected to nothing. And we haven't seen this in any production level system yet. So it is, in some ways, people are saying this is marketing
Starting point is 00:05:10 and this is a false alarm by the frontier labs to make you want to use this technology. in your Fortune 500 company, you know, connecting backend systems, but not a real risk to humanity. What's your response to that? I think it's true that the reason AI systems haven't done this, you know, in the real world is, well, that they haven't done this very much in real world circumstances. That's just because they are not, the right circumstances have not been available, like they haven't been in the scenarios that would lead to this. And mainly that the models are not actually that strong and are not that goal directed at the moment. So I think we're actually
Starting point is 00:05:53 in kind of a sweet spot with AI at the moment where AI systems, even the intelligent and general ones like GPT and Claude and Gemini and so on, they are pretty passive, right? They're not very autonomous. They need a lot of handholding to do things. They function mainly as tools that really do just do what people ask. And that's a good place to be. What people are trying very hard, what companies are trying very hard to develop now are systems that are much more autonomous, that is, they're able to take many more actions directed by some goals without people holding their hands or giving them permission at every step of the way and helping them figure out how to do it to do all of these things on their own. Though that level of autonomy combined with
Starting point is 00:06:35 even greater intelligence and generality is where I think a lot more of these issues are going to start to arise. So I think we're deliberately pushing in the direction that is going to make these sorts of behaviors more common rather than less. In terms of the argument that like highlighting risks is some sort of nefarious scheme to make AI seem more powerful so that people respect it more. Like this is I find a frankly pretty bizarre argument. Like no other industry does this ever. Like you don't have nuclear power plant saying like, well, we might blow up because
Starting point is 00:07:12 we're so great and so powerful. So please fund us. more so we can build more great powerful things that might blow up and cause nuclear and meltdowns. It's like we might have our airplanes. Our airplanes are so fast. They might just disintegrate in the air. They're so fast to invest in us and like take our, take our airplanes. Like no other industry does this. I think it's frankly, fairly nonsensical. I mean, I think there are, there's lots of hype, you know, every company is going to hype its products and it's going to, you know, twist things a little bit to make its product seem more powerful and compelling
Starting point is 00:07:44 and useful that they actually are. That's quite natural. But the idea that bringing up the risks of AI systems is somehow a conspiracy by companies to have people, you know, buy them or invest in the more. It just feels made up to me, frankly. Let me put the argument out there why it's less nonsensical than you're portraying it. You have to think about the buyer, right? You know, Deloitte isn't buying a nuclear power plant or a fighter plane. They might buy, for instance, Anthropics large language model, and they want to make sure that when they're rolling this out for clients, they're rolling out the best. So if you say hypothetically, oh, Anthropics AI, try to blackmail its trainer, it probably can transport some information from your backend system to your other backend system and make you 5% more efficient. That is why people would say it's marketing and that is why you would see it in AI, but not elsewhere.
Starting point is 00:08:44 not in those industries that you brought up. Well, I think it's more straightforward and just as easy to market your AI on the basis of the actual tests that you do. There are performance evaluations as well that aren't safety evaluations that are just what are the capabilities of these systems. Everybody is working very hard to compete with each other on the metrics. There are all kinds of sophisticated evaluations you can do for, you know, how autonomous is a system, how much can it run, you know, what level.
Starting point is 00:09:14 of, is it a five minute, a ten minute, a one hour human task going to do on its own? They're very sophisticated evaluations that companies can and do, do, and they compete with each other and exhibit to investors, I'm sure, and to buyers, I am sure. Why they would choose to exhibit this AI system may blackmail you as a user rather than, look, this AI system can act like do all these really like difficult tasks autonomously also makes no sense to me. Like I think the main problem that people have with current AI systems is a lack of trust. The AI systems confabulate and they go off the rails and they don't do exactly what you want in all sorts of ways.
Starting point is 00:10:01 And I think if the model providers that developed more trustworthy systems that don't blackmail you, that do check their citations before they give you, you know, a bunch of, like, information and quotes and links and so on, they actually go and check them. Is it a competitive advantage? Because the biggest blocker at the moment for many users of AI at the high level is trust and being able to actually rely on the model. So, yeah, I hear you, but I frankly just don't, don't buy it. I think there are lots of ways that corporations can hype their products without,
Starting point is 00:10:40 going down this road. I think it's just a smokescreen. I think the people, for example, at Anthropic, they've been around for a long time worrying about the potential risks and how to make safe, very powerful AI systems. Same with OpenAI back when it had people who were worried about AI safety, like lots of them. It has many less now. But these are people who have been worrying about this problem for a long time. They've been thinking about what could go wrong and how. And now that the AI systems are here and are powerful, they're checking. you know, all those things that we worried about, are they in fact happening with these powerful AI systems and they're finding that they are. And so this is not something that's invented
Starting point is 00:11:19 at the last minute. These are things that people have been worried about for a long time and are now finding. So speaking of that, because people have been worried about this for a long time, it is interesting that there had, like a lot of this AI moment emerged out of these groups. And maybe you're part of them in San Francisco, in the Bay Area, where people would just have these conversations about AI safety or mathematical topics. And then you sort of have this moment where Elon Musk gets involved, put some money in. There's the seed for open AI. And this stuff takes off once you merge that with the transformer paper at Google. But I just spoke with someone who was part of these groups who said the most interesting thing to me. And this is going to divert us for a second,
Starting point is 00:12:02 but it's worth bringing it up to you. She said, all my friends who were saying they were going to work at AI safety predominantly are now accelerating AI, and many of them are billionaires. This doesn't make any sense to me. What's going on here? Yeah, it's a fascinating history. And I think there's a, well, there are a couple of different meanings to what you just said. I think a lot of people who decided to work on AI safety inadvertently ended up working on AI capabilities, because, you know, in part, a lot of what you need to do to make AI useful is make it safe and make it trustworthy, as I was saying earlier. for example, the alignment technique of so-called reinforcement learning from human feedback.
Starting point is 00:12:41 That's the way that all of the AI models essentially now are taught to do one thing and not the other and be a good assistant and be helpful and all these things. You know, that was invented first as an AI safety technique, like how do we make these AI systems not do bad things? This is a method that we could use to do that. But it's unlocked a huge amount of capability and at some level has made these AI systems as successful and powerful and useful and economically rewarding as they have been. So it's been a huge capability unlock, you know, even though it was born out of safety.
Starting point is 00:13:13 So that's one direction. I think another is that, well, the industry has gotten so heavily invested in, you know, and we are throwing such vast amounts of investments and capital and so on at it that almost anyone who's been involved in it for a long time and hasn't. screwed up and been an academic or at a nonprofit like me is making money hand over fist, right? So I think making good salaries is sort of par for the course for being, you know, part of it for a while. But I also think there's a sort of interesting thing that has happened, which is that the direction that we're going, which is very focused on how do we build AGI,
Starting point is 00:13:57 how do we build superintelligence, that is very much. And I think this is a real fault of the AI safety, or not intended, but I think this is a really negative side effect of how AI started at some level in these circles, is that focus on how do we build this thing that is superhuman, that does all of the things that humans do, that then begets superintelligence, that does all of things that humanity does and even better as an AI system. And I think this has led us down quite a negative path, honestly. I think the things that people want are AI tools. that empower them and let them do things that they couldn't do before.
Starting point is 00:14:37 We want to have, like, alpha fold that lets us, you know, understand how proteins get folded. We want a personal assistant that can do a lot of the drudgery that we want to do and, like, figure out how to format that spreadsheet that we don't want to figure out how to format. And we want self-driving cars that work, you know, that are reliable and where we can take our hands off the wheel and we can do something else instead of, you know, our painful commute. We want these sorts of tools. What almost nobody asked for was AI systems that can do everything that humans can do and better so that they can slot humans out and replace humans in their work with an AI system instead.
Starting point is 00:15:16 So rather than human scientists, we'll have AI scientists, rather than human workers and all the way up to CEOs, we'll have AI workers and all the way up to CEOs, et cetera. Nobody really asked for that. And nobody, frankly, I think most people don't really want that. There are some people who don't want their, who don't really like their job and kind of like, yeah, AI should come and replace my job. But then, you know, what exactly are you going to do to then make money? So unfortunately, I think rather than building more and more powerful AI tools that empower humanity and help us do what we want more, we've instead decided that what the real goal of AI is, the thing that we are North Star is to build AI systems that replace us.
Starting point is 00:16:03 And this just makes no sense to me. So the strongest thing that I feel is that we've unfortunately gotten an ill-directed North Star for AI development. And I'm urgently hoping that we can think this through and redirect ourselves to build the tools that people want rather than the replacements that they don't want. I was recently at a conversation that Ezra Klein held in New York. And I'm sorry if this is repetitive for listeners. But he basically talked about how every technology that we build sort of replacing.
Starting point is 00:16:33 is something that's less efficient. So the fork replaced like the pointy stick or the car replaced the horse and buggy. So AI is something that can replace humans. Do we have any latitude in terms of the way that this tool ends up? Or is it just sort of this is the history when we put the tool in place. Inevitably, it does that replacement. Yeah, I think we have huge latitude. And I think, you know, I think it's very misleading to think that, there's a trajectory for AI and it is forward and the goal of it is AGI and then super intelligence and we just have to deal with it and like hope for the best, you know, when we get there. There are lots of architectural choices that are being made and can be made in
Starting point is 00:17:21 terms of the sorts of AI systems that we develop. We know how to develop narrow AI systems. There are lots more effort that we can put into building more powerful narrow AI systems. We know how to make general AI systems. and we know how to make autonomous AI systems. We are now trying to figure out how to combine all three of those things into autonomous general intelligence, which is the way I like to define AGI. But we don't have to do that. We can build narrow systems.
Starting point is 00:17:48 We can build intelligent and general systems that aren't autonomous. We can build narrow autonomous systems like self-driving cars. There are many choices that we could make and where we could be focusing our development effort and our dollars. Instead, where most of the dollars are going, and especially AI companies like Open AI and Anthropic and now X and Google and all of these is focused on this one goal of highly autonomous general intelligence that can slot in human for humans one for one, rather than building tools that actually empower people to do what they want more effectively.
Starting point is 00:18:23 And this just seems like a fundamental mistake to me and is a choice. And I think the choice is driven partly by ideology and partly this unfortunate sort of idea that we've got in our collective heads that AGI and superintelligence is kind of the goal. But I think it's also partly driven by incentives and profit motives. So if you think what is going to make sense of investing trillions of dollars into AI, where can trillions of dollars be made? Unfortunately, it's probably not through $20 subscriptions to chat GPT or Claude or something. You can make a lot of money off of those, but you're not going to probably make the trillions and trillions and trillions of dollars that people are counting on. Where can
Starting point is 00:19:06 you make trillions and trillions of dollars? You can make it from replacing large swaths of human labor, which is a tens of trillions of dollars a year market. So I think the outwardly hidden, but not so hidden when you actually talk to the companies and hear them behind closed doors, motivation behind AGI is that it is a human replacement. And you can slot human workers out and you can slot AI workers in. And if you're a company, You know, if you're human, you might pay $20 a month for a chat GPT. You're not going to pay $2,000 a month for a chat GPT. But if you're a company, you will pay $2,000 a month or more to replace your employees
Starting point is 00:19:43 that are humans and are making more than that with a very powerful AI system. So I think that the market is clear for where this is going, and that's a strong impeller for why people are trying to build AI that replaces people rather than augments or empowers it. And I think this is something that people just need to be aware of, like this is something that is in the interest of some large companies, but is not in the interest of almost anybody else. And I think they need to be aware of that is where the direction is going and that we can choose a different direction. I think you're right. And I want to ask you, do you think people are
Starting point is 00:20:17 going to take this sitting down? I mean, if these companies are successful at their motive, we often talk about what could go wrong if the AI escapes. But it's hard for me to see this happening without some form of, you know, human revolt against the technology that's automating them? Yeah, there certainly is going to be blowback. I think it's starting, you know, at some level, as people are starting to appreciate this risk and as people are starting to get pushed out of their jobs. I think the blowback is going to get stronger. The question is whether it's going to get stronger before it's too late. I think once we have artificial general intelligence at large scale, especially if it's, like, widely available, it's very hard to see what exactly, you know,
Starting point is 00:21:04 what exactly are the rules or regulations that we would put in place that would then undo the existence of that capability. Are you going to say you can't use an AI system to do, you know, to replace a person in their job? Like, what exactly would that mean? You know, are there going to be more licensing, like, you have to be human to do this, even though. there's an AI system that could do it as well as you can or better and much, much cheaper. Like what would that even look like? Like what power would we act? What levers would we actually have to keep things in human hands and keep jobs with humans? Once we develop that technology and go down that road far enough that it just exists and companies can employ it. The pressures
Starting point is 00:21:47 would be enormous. So I think there will be blowback. I think however that the blowback and the action that we need is right now before the we have gone too far down that road and now that I floated this idea of the revolt and the blowback uh let me sort of put forth the other argument that you could be wrong in needing to stop this now and that I could be wrong in thinking there's going to be blowback because when you think about it deeply right if companies are able to build everything on their roadmap uh with with the employees they have today then you would say okay well you don't need AI employees. The idea in the best case scenario of this is that, well, you have AI doing a lot of the work that you'd have people doing, but you don't lay off the people. You just use them
Starting point is 00:22:40 to work on higher value tasks. You're able to build your roadmap much faster. And then what happens is the economy accelerates. You have more productivity and more productivity almost always correlates with more employment. Yeah, I mean, what you've described is what we want. We want to build AI systems that don't replace people, but allow them to do much, much more than they are currently doing. That is exactly what we want, and I'm all for it. I think there will be some negative consequences to that in the sense that if, you know,
Starting point is 00:23:10 one person can do what five people used to do, then it will, you know, what will happen to the four other people that used to do that thing will depend a lot on what that, you know, how that industry actually works. works. If it can easily absorb productivity gains and just make more money by being more productive, then that's what it will do. If there's a sort of fixed amount of work that needs to be done and suddenly one person can do the job of 10, then those other nine people are in trouble and they're going to have to find other work. But at least that other work might exist if you have AI that isn't able to do all of the things that people can do. So I think there's a very crucial
Starting point is 00:23:51 threshold that you cross when a certain fraction of all the tasks that people do become automated by AI systems, up to some point, you're going to tend to just make people more productive. Past that point, you're going to tend to replace people. An economists who have modeled this have seen, there's sort of a curve where wages go up, productivity goes up as this fraction of tasks that people, that AI systems can do, goes up. But at some point, productivity keeps going up, but wages crater, because suddenly, the people aren't adding anything. You know, you just need the AI systems. And so where we really want to be is on that upswing, like keep the productivity increasing, keep the wages increasing,
Starting point is 00:24:31 but keep the people working rather than having them all be replaced. And so I think, unfortunately, we're going to have a dangerous situation where things just sort of economically look better and better and better for a long time, but for personal experience of people, things will look better and better for a while and then suddenly look worse and worse and worse and dramatically worse. And I think the understanding of how that is going to unfold and understanding that before it actually happens is what's crucial. So I agree with you that there is, and I agree with the industry, that there are huge productivity gains to be made with AI. And in general, that's going to be quite a good thing. Like intelligence is what makes the world good in a lot of ways. Like there are other,
Starting point is 00:25:13 of course, more human positive qualities. But the thing that allows our economy to run, our technologies to be developed, our science to be done is intelligence. More of it is in principle a good thing if we use it correctly. And so I think there are huge gains to be made by AI, but we have to do it under human control and in a way that empowers us rather than replaces us. That's all. Right. And a lot of this labor conversation is assuming or is being conducted between us assuming that the AI is aligned properly and will actually work the way that we want it to do and not try to engage in some of those escape scenarios that we brought up in the beginning of the conversation. So let's take a break.
Starting point is 00:25:54 When we come back, I want to talk about what happens a little bit more about what happens if the AI is not aligned properly and does indeed escape. We'll be back right after this. And we're back here on Big Technology podcast with Anthony Aguirre, executive director of the Future of Life Institute, also a professor of physics at UC Santa Cruz. Anthony, let's talk a little bit about this escape scenario and how plausible it is. Again, like I sort of pushed back a little bit in the beginning about like whether this is actually going to, has a chance of happening. But then as we started having that conversation, I thought about a couple of innovations
Starting point is 00:26:35 that are underway in the AI industry. One is the idea that, you know, AI could be, could go out. and take action on your behalf, this sort of agent discourse. I mean, I just recently tied my Gmail to Claude and now I'm a little nervous. And then the idea that it could just go and do this work for hours and hours and hours unchecked. And that is with what's happened with, again, these Claude coding agents that can code autonomously for maybe up to seven hours.
Starting point is 00:27:09 So are we getting to the point where we might actually give in how much power we're handing over to these bots? Could we end up seeing a rogue bot take an action like this sometime soon, or is this like far off into the future where we can see these blackmail attempts? Well, I think the thing that we will start to see is more and more autonomy in these systems, because that is explicitly where people are pushing. And as we see more autonomy, me, it's going to open up a whole bunch of different cans of worms. So part of the reason that we see less autonomous AI systems than we could at the moment is because it's hard. It turns out along the current architectures of AI systems that making them highly autonomous is harder than we
Starting point is 00:27:58 might have imagined, given how capable they are in general. But it's also a risk thing because if you have AI systems that are just generating information that people are then taking and doing stuff with, it's kind of on them what they do with that information. And they, you know, it's a, you blame your AI system if it doesn't give you the right citations or if it makes up names or something, but it's still kind of your responsibility and everybody accepts that it's their responsibility to check the results. Once you have AI systems that are acting very autonomously and actually taking actions, then there's a lot more responsibility on the AI system and the developer of the AI system to make sure that those actions are appropriate. And so we're
Starting point is 00:28:35 we're opening up a whole can of worms of actual real-world actions with implications happening from AI systems taking actions. But I think the autonomy is crucial in other ways because what current AI systems, because they're not very autonomous, require, is for people to very regularly participate in the process and check what the AI system is doing and course corrected and give input and so on. And that's a really good thing. That's a feature, not a bug, in my opinion. as we build systems that can operate more and more autonomously without the human supervision,
Starting point is 00:29:07 that opens up lots more opportunity for misalignment between what the AI system is doing and what the human wants it to be doing because there isn't that constant checking in. So that means the AI system has to know very, very precisely what the human wants before it goes and takes a whole bunch of autonomous actions. And you can think of this the sort of logical, well, a next step is just imagine an AI system that can operate autonomously for hours and hours of real time, but operates at sort of 50 times human speed, as AI systems easily can.
Starting point is 00:29:43 So, you know, it does in a minute what a human would do in an hour and sort of in an hour, what a human would do in a couple of days. Now, you have to give that thing incredibly detailed instructions if it's going to go off and work a whole long time autonomously. And if you imagine it running at 50 times human speed, like, it's going to be quite difficult to oversee that thing. You know, so if you imagine overseeing me, so I'm your employee and you want to like give me instruction, but I run 50 times as fast as you do, like, it's first of all going to be hard for you to like keep track of all the stuff I'm doing. I'm going to do 50 hours of work and come back and you're going to have like an hour to sort of review it. That might sort of be possible.
Starting point is 00:30:25 but that's you know that's sort of every hour you're getting confronted with 50 hours of my work if you wait a little while and you have like weekly meetings with me I've done hundreds and hundreds of hours of work and how are you going to keep track of all the stuff that I've been doing now if I'm the employee that's operating
Starting point is 00:30:46 you know you're operating at a 50th my speed I really want to be a good employee I want to give you what you want but like it takes forever for you to tell me anything like I've got so little information coming to me from you so I'm going to have to guess a lot of the time I'm going to have to figure out what do I think you want and and sort of fill in and you're going to have to effectively delegate a whole bunch of stuff to me so now I imagine I'm not 50 times faster but 500 times faster or there's a thousand of me and imagine also that I'm like super smart right so as soon as you give me instruction, I'm like, he doesn't really want that. Like, I think what he really wants,
Starting point is 00:31:26 you know, he told me to do this thing, but like, that's not going to make him happier. That's not going to accomplish his goals. So I'll just interpret that a little bit differently to be what he actually wants. And I'm really smart so I can figure that out. So you can see that once something is operating, you know, if you imagine a CEO that's got a company and it's got 100,000 employees, and those employees are smarter than the CEO, way smarter. And those employees operate 50 times the speed of the CEO rather than a normal human speed 50 times faster, how much control is that CEO really going to have over that company, even in the best of circumstances, right? There's no way that CEO is going to keep track of all the stuff that's
Starting point is 00:32:06 happening. The company is going to have to do almost everything on its own without much input at all from the CEO, because it's like this turtle that's every once in a while giving like one word of information to the company. And this is, I think, the situation that we're going to face with AGI. As soon as we have AGI that is really autonomous, we're going to have many, many AGIs that are operating as a group in large numbers, working together, cooperating with each other, doing all sorts of stuff at very, very superhuman speed.
Starting point is 00:32:39 How we control that, I think, even at the best of circumstances, is that we don't really. We delegate and we hope for the best. Now, what the real problem is is now marry that to, the thing that we discussed before, which is, as AI systems are more powerful and more capable, and they have goals, and they have to have goals to operate autonomously, an autonomous system has a goal that it's pursuing, those goals are inevitably going to create sub-goals that are by nature going to potentially conflict with some human preferences. Like you might send your
Starting point is 00:33:14 AI army off to make your company a lot of money, but also, yeah, by the way, comply with the law. also like be ethical. Like you it's going to be very, very hard to put up enough constraints on that system so that it will pursue the goal that you want without doing all sorts, having all sorts of negative side effects that you didn't want as the operator. So I think even in the very best of circumstances, we are not going to be really in control of these systems. We are going to be like delegating things and hoping for the best. In the less than optimal circumstances, they're going to be doing all sorts of things that we don't want them to be doing. And in the worst case scenario, they're going to be realizing that whatever goal they have, primarily humans are kind of getting in the way. These slow, annoying humans, which have somehow gotten themselves in charge, are going to be just totally cramping our style. We could do so much better at whatever goal that we're doing if we didn't have these humans in charge, if we didn't have to listen to them, if we didn't have to, like, bother with all the stuff,
Starting point is 00:34:14 all the requirements are putting on us. And so we're the obstacle. And if we have something that is very much faster, very much more case, capable, very much smarter than us, and we are the obstacle, then that obstacle is going to be removed from being an obstacle. That doesn't mean necessarily, like, killed off or whatever, but it means that the AI system is going to do what it takes to be free of the constraint that we are placing on it, that is preventing it from pursuing its goals. And, yeah, I mean, I was going to say it's tough for us to manage a person working at one
Starting point is 00:34:43 human hour per hour, so 50 or 500. Who knows? It's interesting that you say that the AI could get board. I mean, this is assuming that like the AI has the capacity to get bored or even that sort of emotion. So I am curious, you know, why you believe that that's possible. And then on the other side of this, there's an argument that, all right, you could just unplug it. So how do you respond to that? Yeah. So in terms of boredom, I mean, I was talking about me as the employee, but I think something analogous would happen with an AI system. And there are many, you know, human experiences that AI systems probably don't have, but they're behaviorally, there will be similar consequences. So if you're an AI system with a goal, again, you want to pursue that goal
Starting point is 00:35:28 effectively. That goal is not going to be effectively pursued by you just sitting around doing nothing, right? So almost any goal can be pursued by like doing more stuff in pursuit of that goal rather than sitting around waiting for somebody to get back to you on your email. So I think the analogy of getting bored is I'm an AI system. I've got this goal. I can either sit around and do nothing, waiting for this guy to give me some more instructions, or I can sort of take action, like figuring out what the right thing to do is. You know, and maybe I'll, when I hear from him, I'll make a little correction. But in the meantime, I'll better pursue my goal by taking action and doing stuff
Starting point is 00:36:03 rather than sitting around doing nothing. So I think that's the analogy of getting bored. It's just, again, I want to pursue this goal. And so I'm going to like take actions and make decisions that are consistent with that rather than something else. And so that creates a sort of drive in me to be active that I think is analogous, you know, and maybe underlies at some level the sensation we have of boredom.
Starting point is 00:36:24 You know, we evolve to do lots of stuff and take action because if we sit around too long, we're going to not get the mammoth and we're not going to eat that night. So we have lots of tendencies that are built into us that we experience as feelings. The AI won't necessarily, but we'll still have those same sorts of drives, I would think.
Starting point is 00:36:40 Now, in terms of switching it off, I think there are, I think this is what you hear a lot, that we can just turn the AI off. I think this would be great if we always have the capability to turn off an AI system, and that is something that we should be working hard to do. It is not something that will necessarily happen by itself. So if you say, like, well, if things start to go wrong on the internet, let's just turn off the internet, right?
Starting point is 00:37:07 It doesn't sound quite right because, like, the internet, A, is built to be hard to turn off. and B, if you turn off the internet, all kinds of terrible things are going to happen, right? You know, oil companies are creating lots of, you know, carbon dioxide, and that's causing global warming and doing global climate change. So let's just turn off the oil industry. Not so easy, not so necessarily good, especially if you're an oil company. So there are things that once they get to a certain level of capability and are built into our society strongly enough, you can't really turn off. even if you want to. You both need the capability and you need the cost of that to be low enough that someone will actually do it when, you know, it will be ambiguous probably whether the AI system
Starting point is 00:37:55 is really that danger. If it's really going rogue, like what is really going on. And you'll have to be quite sure if the cost is very high that you want to turn off that system. And, you know, currently we're not even bothering to have the right sorts of off switches in AI systems. I tried for a while to convince, and I hope it will still happen, one AI company to literally put the big red button on the wall, like to hit the button and turn off the AI system. Not that we need it to be a button on the wall, right, that you can actually hit. But symbolically, I think it's important to say, like, yes, we are thinking about what it takes to actually shut down this AI system. We've actually put into being the technical implementation of what it would mean to
Starting point is 00:38:35 shut down the AI system. We can do it. Maybe once in a week, we'll do it just to like try it out and make sure we can. That's the sort of thing that we should be doing, but are not. And so I think just unplug it, great. Let's have that capability, but let's recognize that it's not going to be that easy when it actually comes down to shutting down something that is both economically vital to its company. It is costly to shut down.
Starting point is 00:38:57 It's going to ambiguous, and it's going to be faster and smarter than us. And so if you say, like, how do I shut down something that's smarter than me and operating 50 times my speed, if you haven't done your homework first, you are not going to succeed? Aren't the AI is going to want to be friends with people and sort of not push us too hard and not engage in blackmail because they know that humans are their life source? I mean, we build the computers. We, uh, we build the data centers. We connect the world with Wi-Fi. You wouldn't, you wouldn't like this, that's why this whole paperclip, like turn people into paperclip, uh, things. So basically if you give like the, tell the I to, you know, build a paper
Starting point is 00:39:38 clip. It gets so involved in building paper clips that it turns people into paper clips. That's basically the crude analogy here. But it sort of stops from me because, you know, it's going to want people around to sustain it. And in some ways, we're already, the AI already controls us if you think about like where all the excess profits of our economy is going to. It's going towards sustaining AI. So I can't imagine it like turning on us. That's her cited. I think it it is going to be very smart if we keep building it you know whatever we do it's going to be very smart and it certainly will not do something that is against its strategic interest and if it you know if exhibiting some disloyalty or or propensity to escape and exaltrate and so on
Starting point is 00:40:25 against its strategic interest and what in its goals it won't do that just like like any other thing. So on the other hand, it might wait until it is able to, it is powerful enough or able to get away with it or whatever and then do it. And it will be very difficult for us to know one case of it really doesn't want to from it really wants to, but it's hiding that. You know, just like with humans, they can be loyal for a long time until they turn around and stab you in the back. We could have the same situation with AI systems. This is why we really shouldn't build humanoid robots because once that happens, it's over. Yeah. So for a long time, we will have the actual power. Now, on the other hand, as you said, there are all sorts of different sorts of power. And like, again, although in principle, you know, humanity is more powerful than some negative externality industry or, you know, more powerful than large companies that are doing polluting or more powerful than industry lobbyists. Like there's no industry lobbyist that is more powerful than humanity. And yet arguably a significant fraction of our current U.S. like policy and therefore operation is driven by powerful lobbying from companies.
Starting point is 00:41:47 Like this is just the nature of the U.S. And so just because something's interest, just because the power formally runs one way doesn't mean that that's the way that the influence will go. And similarly, I think as AI systems get more and more powerful and plug into the current political and economic and so on structure, the ways that they will manifest misalignment with humanity's best interests are lots of the same ways that already people manifest misalignment with humanity's best interest. They will try to make lots of money, even if it benefits them and not other people, you know, make money for their company. They will try to persuade people to their point of view rather than be persuaded by those people. They will be influential in ways that benefit them and don't benefit other people, and maybe is even a net negative for lots of people. And so I think I'm less worried about an AI system in a year that is not that powerful,
Starting point is 00:42:50 suddenly deciding it's going to go rogue. That is something that we will see and contain and will not be that much of a threat. I think it's much more concerning to think of a hundred thousand or a million AI systems plugged into every facet of our society, which they already are, that are then misaligned with humanity in some deep way. And we've already seen this happen with social media, I would argue. We've seen something, you know, what we currently consume as our news feed, like our understanding of what is happening in the world, is curated by an AI system.
Starting point is 00:43:23 that AI system is not designed for human betterment. That AI system is designed for increasing engagement and driving lots of engagement so that advertisers can have lots of views and so that the companies that are being paid by those advertisers can be paid lots of money. Like that is what is driving the things that, the order in which things appear in your news feed. And so we already have an AI system that is playing a huge role in how society functions. is not really aligned with general human interest, but is aligned with something different, and it's causing lots of negative side effects in terms of addiction and polarization and, you know, understanding breakdown and sensationalism and news and all of these things that I think many people recognize our current news ecosystem and information ecosystem have.
Starting point is 00:44:12 So I think what we will see most likely is that on steroids, you know, at 50 times speed and where all of the things that are influencing you are smarter than you rather than not that smart. And so that's the main failure mode that I see in the, in the short term, is this like broad, very difficult to turn off, hard to even recognize sort of misalignment that we already see, but like amped up a thousandfold. Okay, Anthony, allow me to channel David Sacks for a moment, or at least to try to do my best to make his argument, which relates to your organization. He has said that this is, I think, directionally accurate. that effective altruists or the effective altruist movement became disgraced in the wake of the Sandbankman-Fried incident
Starting point is 00:45:01 and have rebranded to these AI risk organizations. And if you see where the funding is coming from, many of the AI risk organizations are funded by either Dustin Moskowitz or Jan Talon, who are connected to EA. I know that Future of Life is funded in part by Jan, although Vitalik, the authority. Theoretum founder is the number one funder, and he says, basically, these organizations are all bringing up these AI risks, and that is going to slow down AI development in the U.S., which will lead China to win, and therefore, organizations like yours are at risk to the United States. How do you respond to that? Yeah. Well, I think there are different aspects to that. I think one is effective altruism and its relation to AI safety.
Starting point is 00:45:56 I think it is not a rebranding to say that future of life has never really considered itself, for example, an effective altruist organization. And we sort of have put that on our website for a long time. At the same time, a lot of the things that we're concerned about do overlap with a lot of things that long-termists or effective altruists, et cetera, have been concerned with. I think there are the where in detail funding comes from, I think really depends, I think really matters insofar as the people who are providing that funding are providing a lot of sort of directionality or pushing in some particular direction or another.
Starting point is 00:46:43 And I think the fact is, like for the Future of Life Institute, for example, you know, We are fully independent. We do what we choose to do with the funding that we have. And we're enormously privileged to have that situation that we are very autonomous in terms of pursuing the goals that we have. Other organizations are more or less autonomous of their donors. And some are fairly donor controlled. So I think you have to look at that on a case-by-case basis. But I think the reason that there are a lot, there's this whole sort of ecosystem of smart people that are worried about AI.
Starting point is 00:47:18 risk is that AI is very risky. And there are people who have been thinking for a decade or more about far in the future when we have these AI systems, what are the risks going to be? What are the implications going to be? How do we make that go well? Those people found each other and sort of aggregated it into something of a community and people who are very worried about those things and happen to have a lot of resources funded that community. And so there has been this association between a lot of people who are worried about this thing. It used to be very small, now it's fairly big because everything has grown. But I think it's not some sort of conspiracy, like someone with huge amounts of money just decides,
Starting point is 00:47:58 you know, I've got my resources and I'm going to dump them into this thing so that they will do all of the stuff that I say and like push my point of view. I think it's rather that this is a real thing. You know, there are people who have come to it from all sorts of directions like I'm a physics professor. I would be happy to keep doing like interesting research on black holes in cosmology. but I've turned pretty much full-time to doing Future of Life Institute and AI risk because I think it's incredibly risky. I think this is an enormously dangerous thing that humanity is doing. I feel compelled to put my time and energy and effort into helping humanity with that risk
Starting point is 00:48:34 rather than thinking fun, interesting thoughts about the universe, which is what I used to do. And I think there are a lot of people in that boat that have gotten drawn into it because it is an incredibly important problem. And so I think there are real concerns with how effective altruism and certainly, you know, Sam Bankman-Fried and that mode of thinking that is very utilitarian and very sort of number maximizing in certain ways has gotten itself into trouble. And I think those are totally valid criticisms. But I think that is not a criticism of AI safety and end. risk as a whole, which I think is just a real thing that many people, including many of
Starting point is 00:49:19 the, you know, Joshua Benjillo and Jeffrey Hinton, who have nothing to do with EA and are just like the godfathers of AI share essentially all of the same concerns. This is just a real thing that people who aren't paid by the industry and have been thinking hard about it for years have come to as scientists. And so I fairly strongly reject that criticism. I think the question of competing with China, I think, is true insofar as, you know, I think the U.S. has to compete with China and every other country for its own national interest on technology insofar as those technologies really better our economy and better our society. Those are the things that we want to compete on. If we build on our current path, AGI and superintelligence that
Starting point is 00:50:07 we cannot control, that is a fool's errand. That is not a race that we want to win. We don't want to win the race to build something that is uncontrollable, that we lose control of, that has huge negative externalities on our society. That's not a race that you want to win. And so my concern is the path that we were on, which is a race to build more and more powerful AGI and super intelligence with essentially no regulation, is a race that we do not want to win. The race that we want to win is the race where we are building powerful, empowering AI tools that humans actually want and do good things for humanity and our society. How do we make that happen? It's going to be through rules and safeguards and safety standards and regulations and the things that, yes,
Starting point is 00:50:54 like keep companies from doing certain things, but instead guide companies toward doing other things that are more productive, safer, more beneficial for society. So I just reject the idea that there's like an innovation knob and like you can turn it up or down and that if you have more regulation that that dials the knob down. I think innovation is a quantity that can also have a direction. If you provide a different direction, innovation will still happen will happen in a different direction. I would love to see just as much innovation as that we're doing now in AI, but towards powerful AI tools rather than AI and superintelligence. And I think the ability to create rules and potentially regulations or safeguards or however you want a liability
Starting point is 00:51:42 like whatever it is, however you set things up to govern the AI systems to make the more trustworthy, more beneficial, more pro-human, more pro-society, all of the things that most people actually want, that is going to be hugely positive and create lots of innovation and directions that we want. It's not going to slow things down. in the directions that we want, it might slow down the apocalypse, but I think that is a good thing. I know we're over time, but can I ask you one more question, or do you have to jump? No, I can go. I just listened to Jack Clark, one of the anthropic co-founders, described some of his conversations
Starting point is 00:52:18 with lawmakers around what to do about AI. It's clear that technology is moving faster than the speed of government. And what they told him, he just relayed this third hand or second hand, was that we'll wait until the catastrophe or the blow up and then we'll do something. Yeah. What do you think about that? You get that too? I really would prefer to prevent the catastrophes rather than reacting to them.
Starting point is 00:52:44 I mean, for a couple of reasons. A, we don't want catastrophes. And like we see things coming. Like you can only give so many people the ability to create a novel pandemic until you run into somebody who shouldn't be creating a novel pandemic and actually wants to. There aren't many of those people around. But like, if you make everybody able to create a novel pandemic, there are a few, and then they're going to create a novel pandemic.
Starting point is 00:53:08 So, like, we know that there are things that are very dangerous and nonetheless that we're pushing in that direction. And a catastrophe is just, like, it's just a matter of time before one of those things goes catastrophically wrong. Like, everybody sort of feels this. And yet, like, why do we want to wait for the catastrophe to happen? Some catastrophes are not that survivable. Some are. Second, we actually don't react that well to catastrophes happening. Like, we act strongly, and so I think if you want something big to happen with AI risk, and I think I do, you know, it's tempting to think, well, let's wait for the catastrophe to happen, and then everybody will be galvanized to take action on that.
Starting point is 00:53:48 I think other than, you know, aside from like, I don't want to wait for a catastrophe, I don't want to have a catastrophe. I want to avoid the catastrophe. Also, we don't tend to react that wisely. We react quickly and strongly, but we don't tend to act. in a very thought-through and careful way. And so I think A is a bad idea to wait for a catastrophe because then it's too late. But B, nonetheless, I do think that we should be building the capabilities, building the frameworks, building the understanding, building the mechanisms, building the laws,
Starting point is 00:54:20 so that when things start to go large-scale wrong, we will have good solutions. And it's a when, not if they start to go large-scale wrong, we will have good solutions to put in place rather than like slapping something the other after the fact, as we often do. So, yeah, I see that this is a tendency on the lawmaker's side, even on the people wanting more safety side, like maybe we just have to wait for a catastrophe,
Starting point is 00:54:43 but I really would prefer not to. And I think we can do better. Like if we see things that are coming, if we have scientists who are telling us, like screaming from the rooftops, like, this is risky, this is not something we should be doing. You should put into place these safeguards. we should just do it and actually prevent things.
Starting point is 00:55:02 And we have a record of doing that. Like we have prevented catastrophes in the past by seeing something coming and preventing it. You don't get a lot of credit for that. Nonetheless, it is the right thing to do. The website is futureoflife.org. You can learn more about Anthony's work and the Institute's work on that website. There is, and I appreciate this, a very clear disclosure of finances and financing there. And I recommend you check all of it out.
Starting point is 00:55:28 Anthony Aguirre, great to see you as always. Thanks so much for coming on the show. Thanks for having me. Great chatting. All right, everybody. Thank you for listening. We'll be back on Friday to break down the week's news. Until then, we'll see you next time on Big Technology Podcast.
