Big Technology Podcast - Is The AI Going To Escape? — With Anthony Aguirre
Episode Date: August 13, 2025. Anthony Aguirre is the executive director of the Future of Life Institute. He joins Big Technology to discuss how AI could fail in the worst case and whether our push toward increasingly autonomous, general systems puts control out of reach. Tune in to hear how agentic systems operating at superhuman speed complicate oversight, and why “just unplug it” is naive. Hit play for a cool‑headed, nuanced conversation with clear takeaways you can use to evaluate AI strategy, policy, and risk. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Transcript
How bad could AI go in the worst-case scenario?
Let's look beyond the near-term risks and explore what could really happen if the wheels come completely off.
That's coming up right after this.
Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond.
Well, on the show, we've explored a lot of the downsides of AI, a lot of the near-term risks, the business implications of what happens if things don't continue to accelerate apace.
We haven't had a dedicated episode looking at what could happen if things really go wrong.
And so we're going to do that today.
We're joined today by, I think, the perfect guest for this conversation.
Anthony Aguirre is here.
He's the executive director at the Future of Life Institute and also a professor of physics at UC Santa Cruz.
Anthony, great to see you.
Thanks so much for having me on.
Great to see you.
Nice to have a conversation again this time in public.
Suffice to say you're not excited about all the progress that the AI industry is making? Well, that's not quite true. So there's lots of
progress in AI that I just love. I use AI models all the time. I love lots of AI applications
in science and technology. Lots of things where AI are tools that are letting us do things that
we couldn't do before. The thing that I'm concerned about is the direction that we're headed in,
which is toward increasingly autonomous and general and intelligent systems, things that
we've been calling AGI for a long time.
And this, I think, is different at some level from what we've been doing.
And I think is where most of the danger lies, especially on the large scale and in the longer
term.
And there have been a number of studies in the training scenarios within the foundational
model companies, or foundational research houses, which are frontier research labs,
actually I think probably the best way to refer to them, where AI has had what seems like a value or an instinct to try to preserve itself. In testing scenarios, it's tried to copy its code out of the scenario when it thinks its values are being manipulated, and in one instance it even tried to blackmail the trainers to not change its values, this was in an Anthropic training scenario, in order to preserve its encoded values. There
is a belief within the AI industry that this is just complete BS, and it's the research labs
planting these scenarios within these bots and then being like, oh, look what they did
once they, you know, ran the code that they initially baked in to try to copy themselves
out of the training environment. What's your read on this? Is this the beginning of a potential
escape risk that we could see for AI? Well, I think what's important to know about these
sorts of strange behaviors is that they're completely predicted and pretty much unavoidable.
And they just follow from thinking through what it means to be an effective intelligent
system.
So if you're a system that is trying to pursue a goal, whether you're a person or a corporation
or an AI system and you've got some goal, there are, you know, if you're smart enough to
understand what that goal is and how it can be accomplished, then you're going to know that
there are things that you have to do in order to accomplish that goal. And so if you have an AI
system, you say your goal is to do X, and then you put this AI system in a scenario where
you're threatening its existence and it still wants to do X, or it still wants to accomplish
some large scale thing that has been baked into it. Of course, it's going to take that, it's going
to, you know, see that scenario and being a smart thing, figure out what do I have to do within that
scenario to still accomplish my goal. And if that's blackmail the user or exfiltrate myself and my
model weights to be operating somewhere else, or if it's to fake something and pretend that I'm doing
the thing that I, you know, that they want me to do, but actually do something else, I'm going to do
those things. So I think that this is a problem that is going to get worse, not better, as we make
AI systems more general and more capable and more autonomous, because it's just intrinsic to
how a thinking thing works.
And it's interesting that you're actually giving credence to these early signs of the AI acting out of a self-preservation value,
because the critics would say a couple things.
They would say the trainers are giving the AI this, you know,
potential action that it can do.
It's a probabilistic system.
So of course it's going to take that action in some number of cases.
So it's not really a surprise; it's being fed by these trainers and testers.
The other thing they would say is like,
ha-ha, an AI attempted to run code to exfiltrate its values.
It was connected to nothing.
And we haven't seen this in any production level system yet.
So it is, in some ways, people are saying this is marketing
and this is a false alarm by the frontier labs
to make you want to use this technology in your Fortune 500 company, you know, connecting backend systems, but not a real risk to
humanity. What's your response to that? I think it's true that the reason AI systems haven't done
this, you know, in the real world is, well, that they haven't done this very much in real world
circumstances. That's just because they are not, the right circumstances have not been
available, like they haven't been in the scenarios that would lead to this. And mainly that the models
are not actually that strong and are not that goal directed at the moment. So I think we're actually
in kind of a sweet spot with AI at the moment where AI systems, even the intelligent and general
ones like GPT and Claude and Gemini and so on, they are pretty passive, right? They're not very
autonomous. They need a lot of handholding to do things. They function mainly as tools that really do
just do what people ask. And that's a good place to be. What people are trying very hard, what
companies are trying very hard to develop now are systems that are much more autonomous,
that is, they're able to take many more actions directed by some goals without people
holding their hands or giving them permission at every step of the way and helping them figure out
how to do it, to do all of these things on their own. That level of autonomy combined with
even greater intelligence and generality is where I think a lot more of these issues are going
to start to arise. So I think we're deliberately pushing in the direction that is going to make
these sorts of behaviors more common rather than less.
In terms of the argument that like highlighting risks is some sort of nefarious scheme
to make AI seem more powerful so that people respect it more.
This is, I find, a frankly pretty bizarre argument.
Like no other industry does this ever.
Like you don't have nuclear power plants saying, well, we might blow up because
we're so great and so powerful.
So please fund us more so we can build more great powerful things that might blow up and cause nuclear meltdowns.
It's like saying, our airplanes are so fast, they might just disintegrate
in the air, they're so fast, so invest in us and, like, take our airplanes. No other
industry does this. I think it's, frankly, fairly nonsensical. I mean, I think
there's lots of hype, you know, every company is going to hype its products and it's
going to, you know, twist things a little bit to make its product seem more powerful and compelling
and useful than it actually is. That's quite natural. But the idea that bringing up the risks
of AI systems is somehow a conspiracy by companies to have people, you know, buy them or invest in
them more. It just feels made up to me, frankly. Let me put the argument out there why it's less
nonsensical than you're portraying it. You have to think about the buyer, right? You know,
Deloitte isn't buying a nuclear power plant or a fighter plane.
They might buy, for instance, Anthropic's large language model, and they want to make sure that when they're rolling this out for clients, they're rolling out the best.
So if you say, hypothetically, oh, Anthropic's AI tried to blackmail its trainer, it probably can transport some information from your backend system to your other backend system and make you 5% more efficient.
That is why people would say it's marketing, and that is why you would see it in AI but not elsewhere,
not in those industries that you brought up.
Well, I think it's more straightforward and just as easy to market your AI on the basis
of the actual tests that you do.
There are performance evaluations as well that aren't safety evaluations that are just
what are the capabilities of these systems.
Everybody is working very hard to compete with each other on the metrics.
There are all kinds of sophisticated evaluations you can do for, you know, how autonomous
is a system, how much can it run, you know, at what level:
is it a five-minute, a ten-minute, a one-hour human task it's going to do on its own?
There are very sophisticated evaluations that companies can and do run, and they compete with
each other and exhibit to investors, I'm sure, and to buyers, I am sure. Why they would choose
to exhibit "this AI system may blackmail you as a user" rather than "look, this AI system
can do all these really difficult tasks autonomously" also makes no sense to me.
Like I think the main problem that people have with current AI systems is a lack of trust.
The AI systems confabulate and they go off the rails and they don't do exactly what you want
in all sorts of ways.
And I think if the model providers developed more trustworthy systems that don't blackmail you,
that do check their citations before they give you, you know, a bunch of
information and quotes and links and so on, that actually go and check them,
that is a competitive advantage.
Because the biggest blocker at the moment for many users of AI at the high level is trust
and being able to actually rely on the model.
So, yeah, I hear you, but I frankly just don't, don't buy it.
I think there are lots of ways that corporations can hype their products without,
going down this road. I think it's just a smokescreen. I think the people, for example,
at Anthropic, they've been around for a long time worrying about the potential risks and how to
make safe, very powerful AI systems. Same with OpenAI back when it had people who were worried
about AI safety, like lots of them. It has many fewer now. But these are people who have been
worrying about this problem for a long time. They've been thinking about what could go wrong and
how. And now that the AI systems are here and are powerful, they're checking,
you know, all those things that we worried about: are they in fact happening with these powerful
AI systems and they're finding that they are. And so this is not something that's invented
at the last minute. These are things that people have been worried about for a long time
and are now finding. So speaking of that, because people have been worried about this for a long
time, it is interesting that a lot of this AI moment emerged out of these groups.
And maybe you're part of them in San Francisco, in the Bay Area, where people would just have
these conversations about AI safety or mathematical topics. And then you sort of have this moment
where Elon Musk gets involved, puts some money in, there's the seed for OpenAI. And this stuff takes
off once you merge that with the transformer paper at Google. But I just spoke with someone who was
part of these groups who said the most interesting thing to me. And this is going to divert us for a second,
but it's worth bringing up to you. She said, all my friends who were saying they were going to work
on AI safety predominantly are now accelerating AI, and many of them are billionaires.
This doesn't make any sense to me. What's going on here? Yeah, it's a fascinating history.
And I think there's a, well, there are a couple of different meanings to what you just said.
I think a lot of people who decided to work on AI safety inadvertently ended up working on
AI capabilities, because, you know, in part, a lot of what you need to do to make AI
useful is make it safe and make it trustworthy, as I was saying earlier.
Take, for example, the alignment technique of so-called reinforcement learning from human feedback.
That's the way that all of the AI models essentially now are taught to do one thing and not the
other and be a good assistant and be helpful and all these things.
You know, that was invented first as an AI safety technique, like how do we make these AI systems
not do bad things?
This is a method that we could use to do that.
But it's unlocked a huge amount of capability and at some level has made these AI systems
as successful and powerful and useful and economically rewarding as they have been.
So it's been a huge capability unlock, you know, even though it was born out of safety.
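[Editor's note: a minimal, hypothetical sketch of the RLHF loop Anthony is describing, not any lab's actual code; the function names and callables below are invented for illustration.]

```python
# Minimal sketch of the RLHF idea: (1) humans compare pairs of model outputs,
# (2) a reward model is trained to match those preferences, (3) the base model
# is fine-tuned to produce responses that score highly under that reward model.
from typing import Callable, List, Tuple

def train_reward_model(comparisons: List[Tuple[str, str]],
                       score: Callable[[str], float],
                       update: Callable[[str, str], None]) -> None:
    """comparisons holds (preferred, rejected) output pairs from human raters."""
    for preferred, rejected in comparisons:
        # Nudge the reward model whenever it ranks the rejected response higher.
        if score(preferred) <= score(rejected):
            update(preferred, rejected)

def rlhf_step(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              reinforce: Callable[[str, str, float], None]) -> None:
    """One policy-improvement step: sample a response, score it, reinforce accordingly."""
    response = generate(prompt)
    r = reward(prompt, response)    # learned proxy for "what humans prefer"
    reinforce(prompt, response, r)  # real systems use something like a PPO-style update here
```

The same machinery that steers a model away from unwanted behavior also steers it toward being a useful assistant, which is the capability unlock described above.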
So that's one direction.
I think another is that, well, the industry has gotten so heavily invested in, you know,
and we are throwing such vast amounts of investments and capital and so on at it
that almost anyone who's been involved in it for a long time and hasn't
"screwed up" and become an academic or gone to a nonprofit like me is making money hand over fist,
right? So I think making good salaries is sort of par for the course for being, you know,
part of it for a while. But I also think there's a sort of interesting thing that has happened,
which is that the direction that we're going in is very focused on how do we build AGI,
how do we build superintelligence. And I think this is a real fault of the AI safety community,
not intended, but a really negative side effect of how AI started at some level
in these circles: that focus on how do we build this thing that is superhuman, that does
all of the things that humans do, that then begets superintelligence, that does all of the things
that humanity does, and even better, as an AI system.
And I think this has led us down quite a negative path, honestly.
I think the things that people want are AI tools.
that empower them and let them do things that they couldn't do before.
We want to have, like, AlphaFold, that lets us, you know, understand how proteins get folded.
We want a personal assistant that can do a lot of the drudgery that we don't want to do
and, like, figure out how to format that spreadsheet that we don't want to figure out how to format.
And we want self-driving cars that work, you know, that are reliable and where we can take
our hands off the wheel and we can do something else instead of, you know, our painful commute.
We want these sorts of tools.
What almost nobody asked for was AI systems that can do everything that humans can do and better
so that they can slot humans out and replace humans in their work with an AI system instead.
So rather than human scientists, we'll have AI scientists, rather than human workers and all the way up to CEOs,
we'll have AI workers and all the way up to CEOs, et cetera.
Nobody really asked for that.
And nobody, frankly, I think most people don't really want that.
There are some people who don't really like their job and kind of think, yeah, AI should come and replace my job.
But then, you know, what exactly are you going to do to then make money?
So unfortunately, I think rather than building more and more powerful AI tools that empower humanity and help us do what we want more,
we've instead decided that the real goal of AI, our North Star, is to build AI systems that replace us.
And this just makes no sense to me.
So the strongest thing that I feel is that we've unfortunately gotten an ill-directed North Star
for AI development.
And I'm urgently hoping that we can think this through and redirect ourselves to build
the tools that people want rather than the replacements that they don't want.
I was recently at a conversation that Ezra Klein held in New York.
And I'm sorry if this is repetitive for listeners.
But he basically talked about how every technology that we build sort of replaces
something that's less efficient. So the fork replaced, like, the pointy stick, or the car
replaced the horse and buggy. So AI is something that can replace humans. Do we have any
latitude in terms of the way that this tool ends up? Or is it just sort of, this is the history:
when we put the tool in place, inevitably it does that replacement? Yeah, I think we have
huge latitude. And I think, you know, I think it's very misleading to think that,
there's a trajectory for AI and it is forward and the goal of it is AGI and then super
intelligence and we just have to deal with it and like hope for the best, you know, when we get
there. There are lots of architectural choices that are being made and can be made in
terms of the sorts of AI systems that we develop. We know how to develop narrow AI systems.
There's lots more effort that we can put into building more powerful narrow AI systems.
We know how to make general AI systems.
and we know how to make autonomous AI systems.
We are now trying to figure out how to combine all three of those things
into autonomous general intelligence, which is the way I like to define AGI.
But we don't have to do that.
We can build narrow systems.
We can build intelligent and general systems that aren't autonomous.
We can build narrow autonomous systems like self-driving cars.
There are many choices that we could make and where we could be focusing our development
effort and our dollars.
Instead, most of the dollars,
especially at AI companies like OpenAI and Anthropic and now X and Google and all of these,
are focused on this one goal of highly autonomous general intelligence that can slot in for humans one for one,
rather than building tools that actually empower people to do what they want more effectively.
And this just seems like a fundamental mistake to me and is a choice.
And I think the choice is driven partly by ideology and partly by this unfortunate
sort of idea that we've got in our collective heads that AGI and superintelligence is kind of the
goal. But I think it's also partly driven by incentives and profit motives. So if you think
what is going to make sense of investing trillions of dollars into AI, where can trillions of
dollars be made? Unfortunately, it's probably not through $20 subscriptions to ChatGPT or
Claude or something. You can make a lot of money off of those, but you're probably not going to
make the trillions and trillions and trillions of dollars that people are counting on. Where can
you make trillions and trillions of dollars? You can make it from replacing large swaths of human
labor, which is a tens of trillions of dollars a year market. So I think the outwardly hidden,
but not so hidden when you actually talk to the companies and hear them behind closed doors,
motivation behind AGI is that it is a human replacement. And you can slot human workers out
and you can slot AI workers in. And if you're a company,
You know, if you're a human, you might pay $20 a month for ChatGPT.
You're not going to pay $2,000 a month for ChatGPT.
But if you're a company, you will pay $2,000 a month or more to replace your employees
that are humans and are making more than that with a very powerful AI system.
So I think that the market is clear for where this is going, and that's a strong impeller
for why people are trying to build AI that replaces people rather than augments or empowers
them.
And I think this is something that people just need to be aware of, like this is something
that is in the interest of some large companies, but is not in the interest of almost anybody
else. And I think they need to be aware that that is where the direction is going and that we can
choose a different direction. I think you're right. And I want to ask you, do you think people are
going to take this sitting down? I mean, if these companies are successful at their motive,
we often talk about what could go wrong if the AI escapes. But it's hard for me to see this
happening without some form of, you know, human revolt against the technology that's automating
them? Yeah, there certainly is going to be blowback. I think it's starting, you know, at some
level, as people are starting to appreciate this risk and as people are starting to get pushed out
of their jobs. I think the blowback is going to get stronger. The question is whether it's going
to get stronger before it's too late. I think once we have artificial general intelligence at large
scale, especially if it's, like, widely available, it's very hard to see what exactly, you know,
what exactly are the rules or regulations that we would put in place that would then undo the
existence of that capability. Are you going to say you can't use an AI system to do, you know,
to replace a person in their job? Like, what exactly would that mean? You know, are there going to be
more licensing, like, you have to be human to do this, even though
there's an AI system that could do it as well as you can or better and much, much cheaper?
Like what would that even look like? What levers would we actually
have to keep things in human hands and keep jobs with humans? Once we develop that technology
and go down that road far enough that it just exists and companies can employ it, the pressures
would be enormous. So I think there will be blowback. I think, however, that the blowback and the
action that we need is right now, before we have gone too far down that road.
Now that I floated this idea of the revolt and the blowback, let me sort of put forth the other argument: that
you could be wrong in needing to stop this now, and that I could be wrong in thinking there's going to be
blowback. Because when you think about it deeply, right, if companies are able to build everything on
their roadmap with the employees they have today, then you would say, okay, well, you don't need
AI employees. The idea in the best case scenario of this is that, well, you have AI doing a lot
of the work that you'd have people doing, but you don't lay off the people. You just use them
to work on higher value tasks. You're able to build your roadmap much faster. And then what happens
is the economy accelerates. You have more productivity and more productivity almost always correlates
with more employment.
Yeah, I mean, what you've described is what we want.
We want to build AI systems that don't replace people, but allow them to do much, much
more than they are currently doing.
That is exactly what we want, and I'm all for it.
I think there will be some negative consequences to that in the sense that if, you know,
one person can do what five people used to do, then what will happen
to the four other people that used to do that thing will depend a lot on
how that industry actually
works. If it can easily absorb productivity gains and just make more money by being more productive,
then that's what it will do. If there's a sort of fixed amount of work that needs to be done
and suddenly one person can do the job of 10, then those other nine people are in trouble and
they're going to have to find other work. But at least that other work might exist if you have
AI that isn't able to do all of the things that people can do. So I think there's a very crucial
threshold that you cross when a certain fraction of all the tasks that people do become automated
by AI systems, up to some point, you're going to tend to just make people more productive.
Past that point, you're going to tend to replace people.
And economists who have modeled this have seen there's sort of a curve where wages go up,
productivity goes up, as this fraction of tasks that AI systems can do goes up.
But at some point, productivity keeps going up, but wages crater, because suddenly,
the people aren't adding anything. You know, you just need the AI systems. And so where we really
want to be is on that upswing, like keep the productivity increasing, keep the wages increasing,
but keep the people working rather than having them all be replaced. And so I think,
unfortunately, we're going to have a dangerous situation where things just sort of economically
look better and better and better for a long time, but for personal experience of people,
things will look better and better for a while and then suddenly look worse and worse and worse
and dramatically worse. And I think the understanding of how that is going to unfold and understanding
that before it actually happens is what's crucial. So I agree with you that there is, and I agree with
the industry, that there are huge productivity gains to be made with AI. And in general, that's going to
be quite a good thing. Like intelligence is what makes the world good in a lot of ways. Like there are other,
of course, more human positive qualities. But the thing that allows our economy to run, our technologies
to be developed, our science to be done is intelligence. More of it is in principle a good thing
if we use it correctly. And so I think there are huge gains to be made by AI, but we have to do it
under human control and in a way that empowers us rather than replaces us. That's all.
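[Editor's note: a toy numerical illustration of the threshold dynamic described above. This is a deliberately simplified model invented for this transcript, not the economists' actual models Anthony references; the functional form and the 0.95 threshold are arbitrary assumptions.]

```python
# Toy model: as the fraction f of tasks AI can do rises, humans doing the residual
# tasks become more productive and wages rise; once AI can do essentially everything,
# human labor competes directly with the AI's price and wages crater.

def toy_wage(f: float, ai_cost: float = 0.1, base: float = 1.0,
             threshold: float = 0.95) -> float:
    """Stylized wage as a function of the fraction f of tasks AI can perform."""
    f = min(max(f, 0.0), 1.0)
    if f < threshold:
        # Complement regime: automation raises the value of the remaining human tasks.
        return base / (1.0 - f) ** 0.5
    # Substitution regime: AI can do (essentially) all tasks, so the wage falls to its cost.
    return ai_cost

if __name__ == "__main__":
    for f in (0.0, 0.5, 0.8, 0.9, 0.94, 0.96):
        print(f"automated fraction {f:.2f} -> wage {toy_wage(f):.2f}")
```

Running it shows wages climbing from 1.00 to roughly 4.08 as automation rises, then collapsing to 0.10 once the threshold is crossed, which is the "better and better, then suddenly worse" shape described in the conversation.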
Right. And a lot of this labor conversation is assuming or is being conducted between us
assuming that the AI is aligned properly and will actually work the way that we want it to
and not try to engage in some of those escape scenarios that we brought up in the beginning of the conversation.
So let's take a break.
When we come back, I want to talk a little bit more about what happens if the AI is not aligned properly and does indeed escape.
We'll be back right after this.
And we're back here on Big Technology podcast with Anthony Aguirre, executive director of the Future of Life Institute, also a professor of
physics at UC Santa Cruz.
Anthony, let's talk a little bit about this escape scenario and how plausible it is.
Again, like I sort of pushed back a little bit in the beginning about whether this
actually has a chance of happening.
But then as we started having that conversation, I thought about a couple of innovations
that are underway in the AI industry.
One is the idea that, you know, AI could go out
and take action on your behalf, this sort of agent discourse.
I mean, I just recently tied my Gmail to Claude and now I'm a little nervous.
And then the idea that it could just go and do this work for hours and hours and
hours unchecked.
And that is what's happened with, again, these Claude coding agents that can code autonomously
for maybe up to seven hours.
So are we getting to the point, given how much power we're
handing over to these bots, where we could end up seeing a rogue bot take an action like this
sometime soon, or is this, like, far off into the future, where we can see these blackmail
attempts? Well, I think the thing that we will start to see is more and more autonomy in these
systems, because that is explicitly where people are pushing. And as we see more autonomy,
it's going to open up a whole bunch of different cans of worms. So part of the reason that we
see less autonomous AI systems than we could at the moment is because it's hard. It turns out,
with the current architectures of AI systems, that making them highly autonomous is harder than we
might have imagined, given how capable they are in general. But it's also a risk thing because
if you have AI systems that are just generating information that people are then taking and doing
stuff with, it's kind of on them what they do with that information. And, you know,
you blame your AI system if it doesn't give you the right citations or if it makes up
names or something, but it's still kind of your responsibility, and everybody accepts that it's
their responsibility to check the results. Once you have AI systems that are acting very
autonomously and actually taking actions, then there's a lot more responsibility on the AI system
and the developer of the AI system to make sure that those actions are appropriate. And so
we're opening up a whole can of worms of actual real-world actions with implications happening
from AI systems taking actions.
But I think the autonomy is crucial in other ways because what current AI systems, because
they're not very autonomous, require, is for people to very regularly participate in the process
and check what the AI system is doing and course correct and give input and so on.
And that's a really good thing.
That's a feature, not a bug, in my opinion.
As we build systems that can operate more and more autonomously without the human supervision,
that opens up lots more opportunity for misalignment between what the AI system is doing
and what the human wants it to be doing because there isn't that constant checking in.
So that means the AI system has to know very, very precisely what the human wants before it goes
and takes a whole bunch of autonomous actions.
And you can think of the sort of logical next step: just imagine an AI system
that can operate autonomously for hours and hours of real time,
but operates at sort of 50 times human speed,
as AI systems easily can.
So, you know, it does in a minute what a human would do in an hour
and sort of in an hour, what a human would do in a couple of days.
Now, you have to give that thing incredibly detailed instructions
if it's going to go off and work a whole long time autonomously.
And if you imagine it running at 50 times human speed, like, it's going to be quite difficult to oversee that thing.
You know, so if you imagine overseeing me, so I'm your employee and you want to like give me instruction, but I run 50 times as fast as you do, like, it's first of all going to be hard for you to like keep track of all the stuff I'm doing.
I'm going to do 50 hours of work and come back and you're going to have like an hour to sort of review it.
That might sort of be possible,
but that's, you know,
that's sort of every hour
you're getting confronted with 50 hours of my work.
If you wait a little while and you have, like,
weekly meetings with me, I've done
hundreds and hundreds of hours of work,
and how are you going to keep track of all the stuff that I've been doing?
Now, if I'm the employee that's operating,
you know, and you're operating at a 50th of my speed,
I really want to be a good employee, I want to give you what you want,
but, like, it takes forever for you to tell me anything. Like, I've got so little information coming to me
from you, so I'm going to have to guess a lot of the time. I'm going to have to figure out what do I
think you want and sort of fill in, and you're going to have to effectively delegate a whole
bunch of stuff to me. So now imagine I'm not 50 times faster but 500 times faster, or there's
a thousand of me, and imagine also that I'm, like, super smart, right? So as soon as you give me an
instruction, I'm like, he doesn't really want that. Like, I think what he really wants,
you know, he told me to do this thing, but like, that's not going to make him happier.
That's not going to accomplish his goals. So I'll just interpret that a little bit differently
to be what he actually wants. And I'm really smart so I can figure that out. So you can see that
once something is operating, you know, if you imagine a CEO that's got a company and it's got
100,000 employees, and those employees are smarter than the CEO, way smarter. And those
employees operate at 50 times the speed of the CEO rather than normal human speed,
how much control is that CEO really going to have over that company, even in the best
of circumstances, right? There's no way that CEO is going to keep track of all the stuff that's
happening. The company is going to have to do almost everything on its own without much input
at all from the CEO, because it's like this turtle that's every once in a while giving like
one word of information to the company. And this is, I think, the situation that we're going to face
with AGI.
As soon as we have AGI that is really autonomous,
we're going to have many, many AGIs that are operating as a group in large numbers,
working together, cooperating with each other,
doing all sorts of stuff at very, very superhuman speed.
How we control that, I think, even in the best of circumstances,
is that we don't really.
We delegate and we hope for the best.
Now, the real problem is, marry that to
the thing that we discussed before, which is, as AI systems are more powerful and more capable,
and they have goals, and they have to have goals to operate autonomously, an autonomous system
has a goal that it's pursuing, those goals are inevitably going to create sub-goals that are
by nature going to potentially conflict with some human preferences. Like you might send your
AI army off to make your company a lot of money, but also, yeah, by the way, comply with the law.
and also, like, be ethical. It's going to be very, very hard to put up enough constraints on that system so that it will pursue the goal that you want without having all sorts of negative side effects that you didn't want as the operator. So I think even in the very best of circumstances, we are not going to be really in control of these systems. We are going to be delegating things and hoping for the best. In less than optimal circumstances, they're going to be doing all sorts of things that we don't want them to be doing.
And in the worst case scenario, they're going to be realizing that whatever goal they have,
primarily humans are kind of getting in the way.
These slow, annoying humans, which have somehow gotten themselves in charge,
are going to be just totally cramping our style.
We could do so much better at whatever goal that we're doing if we didn't have these humans in charge,
if we didn't have to listen to them, if we didn't have to, like, bother with all the stuff,
all the requirements they're putting on us.
And so we're the obstacle.
And if we have something that is very much faster, very much more
capable, very much smarter than us, and we are the obstacle, then that obstacle is going to be
removed from being an obstacle. That doesn't mean necessarily, like, killed off or whatever,
but it means that the AI system is going to do what it takes to be free of the constraint that
we are placing on it, that is preventing it from pursuing its goals.
And, yeah, I mean, I was going to say it's tough for us to manage a person working at one
human hour per hour, so 50 or 500, who knows? It's interesting that you say that the AI could get
bored. I mean, this is assuming that, like, the AI has the capacity to get bored or even that
sort of emotion. So I am curious, you know, why you believe that that's possible. And then on the
other side of this, there's an argument that, all right, you could just unplug it. So how do you
respond to that? Yeah. So in terms of boredom, I mean, I was talking about me as the employee,
but I think something analogous would happen with an AI system. And there are many, you know,
human experiences that AI systems probably don't have, but they're behaviorally, there will be
similar consequences. So if you're an AI system with a goal, again, you want to pursue that goal
effectively. That goal is not going to be effectively pursued by you just sitting around doing
nothing, right? So almost any goal can be pursued by like doing more stuff in pursuit of that goal
rather than sitting around waiting for somebody to get back to you on your email. So I think
the analogy of getting bored is I'm an AI system. I've got this goal. I can either
sit around and do nothing, waiting for this guy to give me some more instructions,
or I can sort of take action, like figuring out what the right thing to do is.
You know, and maybe I'll, when I hear from him, I'll make a little correction.
But in the meantime, I'll better pursue my goal by taking action and doing stuff
rather than sitting around doing nothing.
So I think that's the analogy of getting bored.
It's just, again, I want to pursue this goal.
And so I'm going to like take actions and make decisions that are consistent with that
rather than something else.
And so that creates a sort of drive in me to be active
that I think is analogous, you know,
and maybe underlies at some level the sensation we have of boredom.
You know, we evolved to do lots of stuff and take action
because if we sit around too long,
we're going to not get the mammoth
and we're not going to eat that night.
So we have lots of tendencies that are built into us
that we experience as feelings.
The AI won't necessarily experience them as feelings,
but it will still have those same sorts of drives, I would think.
Now, in terms of switching it off,
I think this is what you hear a lot, that we can just turn the AI off.
I think this would be great if we always have the capability to turn off an AI system,
and that is something that we should be working hard to do.
It is not something that will necessarily happen by itself.
So if you say, like, well, if things start to go wrong on the internet, let's just turn
off the internet, right?
It doesn't sound quite right because, like, the internet, A, is built to be hard to turn off,
and B, if you turn off the internet, all kinds of terrible things are going to happen, right?
You know, oil companies are creating lots of, you know, carbon dioxide, and that's causing global warming and driving global climate change.
So let's just turn off the oil industry.
Not so easy, not so necessarily good, especially if you're an oil company.
So there are things that once they get to a certain level of capability and are built into our society strongly enough, you can't really turn off.
even if you want to. You both need the capability and you need the cost of that to be low enough
that someone will actually do it when, you know, it will be ambiguous probably whether the AI system
is really that dangerous, if it's really going rogue, like what is really going on. And you'll have to be
quite sure if the cost is very high that you want to turn off that system. And, you know,
currently we're not even bothering to have the right sorts of off switches in AI systems. I tried for a
while to convince, and I hope it will still happen, one AI company to literally put the
big red button on the wall, like to hit the button and turn off the AI system. Not that we need
it to be a button on the wall, right, that you can actually hit. But symbolically, I think it's
important to say, like, yes, we are thinking about what it takes to actually shut down this
AI system. We've actually put into being the technical implementation of what it would mean to
shut down the AI system. We can do it. Maybe once a week, we'll do it just to, like, try it out
and make sure we can.
That's the sort of thing that we should be doing, but are not.
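[Editor's note: a hedged sketch of what the "big red button" idea could look like in practice. This is an invented illustration, not any company's actual setup; the file paths, the timeout, and the serve_model.py command are all hypothetical.]

```python
# Sketch of a shutdown watchdog: the AI service only keeps running while a
# human-controlled approval signal stays fresh and the red button has not been pressed.
import os
import subprocess
import time

KILL_FILE = "/etc/ai/RED_BUTTON"           # hypothetical path; "pressing the button" creates this file
HEARTBEAT_FILE = "/etc/ai/human_approval"  # hypothetical path; humans must touch this periodically
MAX_SILENCE_SECONDS = 3600                 # if approval goes stale, fail closed and shut down

def should_shut_down() -> bool:
    if os.path.exists(KILL_FILE):
        return True  # the button was pressed
    try:
        approval_age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
    except OSError:
        return True  # no approval signal at all
    return approval_age > MAX_SILENCE_SECONDS

def watchdog(service_cmd: list) -> None:
    proc = subprocess.Popen(service_cmd)  # the AI service runs as a child process
    try:
        while proc.poll() is None:
            if should_shut_down():
                proc.terminate()          # stop the service
                break
            time.sleep(5)
    finally:
        if proc.poll() is None:
            proc.kill()

# Usage sketch: watchdog(["python", "serve_model.py"]), plus a scheduled drill that
# actually presses the button, so you know the shutdown path still works.
```

The point of the sketch is the drill, not the code: as the conversation notes, the capability only matters if it is exercised and if shutting down stays cheap enough that someone will actually do it.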
And so I think just unplug it, great.
Let's have that capability, but let's recognize that it's not going to be that easy
when it actually comes down to shutting down something that is both economically vital
to its company and costly to shut down,
and where it's going to be ambiguous, and it's going to be faster and smarter than us.
And so if you say, like, how do I shut down something that's smarter than me
and operating 50 times my speed, if you haven't done your homework first, you are not going to
succeed. Aren't the AIs going to want to be friends with people and sort of not push us too
hard and not engage in blackmail because they know that humans are their life source? I mean,
we build the computers. We, uh, we build the data centers. We connect the world with Wi-Fi.
You know, there's this whole paperclip thing, like turning people into
paperclips. So basically, if you tell the AI to, you know, build paper
clips, it gets so involved in building paperclips that it turns people into paperclips. That's
basically the crude analogy here. But it sort of stops for me because, you know, it's going to want
people around to sustain it. And in some ways, the AI already controls us if you
think about like where all the excess profits of our economy is going to. It's going towards
sustaining AI. So I can't imagine it, like, turning on us. That's how I see it. Well, I think
it is going to be very smart if we keep building it. You know, whatever we do, it's going to be
very smart, and it certainly will not do something that is against its strategic interest. And
if, you know, exhibiting some disloyalty or propensity to escape and exfiltrate and so on is
against its strategic interest and its goals, it won't do that, just like any other
thing. So, on the other hand, it might wait until it is powerful enough or able to get away with it or whatever, and then do it. And it will be very difficult for us to distinguish the case where it really doesn't want to from the case where it really wants to but is hiding that. You know, just like with humans, they can be loyal for a long time until they turn around and stab you in the back. We could have the same situation with AI systems.
This is why we really shouldn't build humanoid robots because once that happens, it's over. Yeah.
So for a long time, we will have the actual power.
Now, on the other hand, as you said, there are all sorts of different sorts of power.
And like, again, although in principle, you know, humanity is more powerful than some negative externality industry or, you know, more powerful than large companies that are doing polluting or more powerful than industry lobbyists.
Like there's no industry lobbyist that is more powerful than humanity.
And yet arguably a significant fraction of our current U.S. like policy and therefore operation is driven by powerful lobbying from companies.
Like this is just the nature of the U.S.
And so just because the power formally runs one way doesn't mean that that's the way that the influence will go.
And similarly, I think as AI systems get more and more powerful and plug into the current political and economic and so on structure, the ways that they will manifest misalignment with humanity's best interests are lots of the same ways that already people manifest misalignment with humanity's best interest.
They will try to make lots of money, even if it benefits them and not other people, you know, make money for their company.
They will try to persuade people to their point of view rather than be persuaded by those people.
They will be influential in ways that benefit them and don't benefit other people,
and maybe is even a net negative for lots of people.
And so I think I'm less worried about an AI system in a year that is not that powerful,
suddenly deciding it's going to go rogue.
That is something that we will see and contain and will not be that much of a threat.
I think it's much more concerning to think of a hundred thousand or a million AI systems
plugged into every facet of our society, which they already are, that are then misaligned
with humanity in some deep way.
And we've already seen this happen with social media, I would argue.
You know, what we currently consume as our news feed, like our understanding
of what is happening in the world, is curated by an AI system.
That AI system is not designed for human betterment.
That AI system is designed for increasing engagement and driving lots of engagement so that advertisers can have lots of views and so that the companies that are being paid by those advertisers can be paid lots of money.
Like that is what is driving the order in which things appear in your news feed.
And so we already have an AI system that is playing a huge role in how society functions that
is not really aligned with general human interest, but is aligned with something different,
and it's causing lots of negative side effects in terms of addiction and polarization
and, you know, understanding breakdown and sensationalism in news and all of these things
that I think many people recognize our current news ecosystem and information ecosystem have.
So I think what we will see most likely is that on steroids, you know, at 50 times speed and
where all of the things that are influencing you are smarter than you rather than not that
smart. And so that's the main failure mode that I see in the short term: this, like,
broad, very difficult to turn off, hard to even recognize sort of misalignment that we already
see, but like amped up a thousandfold. Okay, Anthony, allow me to channel David Sacks for a
moment, or at least to try to do my best to make his argument, which relates to your organization.
He has said, and this is, I think, directionally accurate,
that effective altruists or the effective altruist movement became disgraced in the wake of the Sam Bankman-Fried incident
and have rebranded to these AI risk organizations.
And if you see where the funding is coming from, many of the AI risk organizations are funded by either Dustin Moskovitz or Jaan Tallinn, who are connected to EA.
I know that Future of Life is funded in part by Jaan, although Vitalik Buterin, the
Ethereum founder, is the number one funder, and he says, basically, these organizations are
all bringing up these AI risks, and that is going to slow down AI development in the U.S.,
which will lead China to win, and therefore, organizations like yours are a risk to the United
States. How do you respond to that? Yeah. Well, I think there are different aspects to that.
I think one is effective altruism and its relation to AI safety.
I think it is not a rebranding; the Future of Life Institute has never really considered itself,
for example, an effective altruist organization.
And we've sort of had that on our website for a long time.
At the same time, a lot of the things that we're concerned about do overlap with a lot of things that long-termists or effective altruists, et cetera,
have been concerned with.
I think where, in detail, funding comes from
really matters insofar as the people who are providing that funding are providing
a lot of sort of directionality or pushing in some particular direction or another.
And I think the fact is, for the Future of Life Institute, for example, you know,
we are fully independent.
We do what we choose to do with the funding that we have.
And we're enormously privileged to have that situation that we are very autonomous in terms of pursuing the goals that we have.
Other organizations are more or less autonomous of their donors.
And some are fairly donor controlled.
So I think you have to look at that on a case-by-case basis.
But I think the reason that there's this whole sort of ecosystem of smart people that are worried about AI
risk is that AI is very risky. And there are people who have been thinking for a decade or more
about far in the future when we have these AI systems, what are the risks going to be?
What are the implications going to be? How do we make that go well? Those people found each other
and sort of aggregated into something of a community, and people who are very worried about
those things and happen to have a lot of resources funded that community. And so there has been
this association between a lot of people who are worried about this thing.
It used to be very small, now it's fairly big because everything has grown.
But I think it's not some sort of conspiracy, like someone with huge amounts of money just decides,
you know, I've got my resources and I'm going to dump them into this thing so that they will do all of the stuff that I say and like push my point of view.
I think it's rather that this is a real thing.
You know, there are people who have come to it from all sorts of directions like I'm a physics professor.
I would be happy to keep doing, like, interesting research on black holes and cosmology,
but I've turned pretty much full-time to doing Future of Life Institute and AI risk
because I think it's incredibly risky.
I think this is an enormously dangerous thing that humanity is doing.
I feel compelled to put my time and energy and effort into helping humanity with that risk
rather than thinking fun, interesting thoughts about the universe, which is what I used to do.
And I think there are a lot of people in that boat that have gotten drawn into it because
it is an incredibly important problem.
And so I think there are real concerns with how effective altruism and certainly, you know,
Sam Bankman-Fried and that mode of thinking that is very utilitarian and very sort of number maximizing in certain ways has gotten itself into trouble.
And I think those are totally valid criticisms.
But I think that is not a criticism of AI safety and AI
risk as a whole, which I think is just a real thing that many people, including many of
the, you know, Yoshua Bengio and Geoffrey Hinton, who have nothing to do with EA and are just
like the godfathers of AI share essentially all of the same concerns. This is just a real thing
that people who aren't paid by the industry and have been thinking hard about it for years
have come to as scientists. And so I fairly strongly reject that criticism. I think the question
of competing with China, I think, is true insofar as, you know, I think the U.S. has to compete
with China and every other country for its own national interest on technology insofar as
those technologies really better our economy and better our society. Those are the things
that we want to compete on. If we build on our current path, AGI and superintelligence that
we cannot control, that is a fool's errand. That is not a race that we want to win. We don't want to
win the race to build something that is uncontrollable, that we lose control of, that has huge
negative externalities on our society. That's not a race that you want to win. And so my concern
is the path that we were on, which is a race to build more and more powerful AGI and super
intelligence with essentially no regulation, is a race that we do not want to win. The race that we
want to win is the race where we are building powerful, empowering AI tools that humans actually
want and do good things for humanity and our society. How do we make that happen? It's going to be
through rules and safeguards and safety standards and regulations and the things that, yes,
like keep companies from doing certain things, but instead guide companies toward doing other
things that are more productive, safer, more beneficial for society. So I just reject the idea
that there's like an innovation knob and like you can turn it up or down and that if you have
more regulation that that dials the knob down. I think innovation is a quantity that can also
have a direction. If you provide a different direction, innovation will still happen, it will just happen
in a different direction. I would love to see just as much innovation as we're doing now in
AI, but towards powerful AI tools rather than AGI and superintelligence. And I think the
ability to create rules and potentially regulations or safeguards or liability, or however you want it,
whatever it is, however you set things up to govern the AI systems to make them more
trustworthy, more beneficial, more pro-human, more pro-society, all of the things that most people
actually want, that is going to be hugely positive and create lots of innovation and directions
that we want. It's not going to slow things down
in the directions that we want. It might slow down the apocalypse, but I think that is a good thing.
I know we're over time, but can I ask you one more question, or do you have to jump?
No, I can go.
I just listened to Jack Clark, one of the Anthropic co-founders, describe some of his conversations
with lawmakers around what to do about AI.
It's clear that technology is moving faster than the speed of government.
And what they told him, he just relayed this third hand or second hand, was that we'll wait
until the catastrophe or the blow up and then we'll do something.
Yeah.
What do you think about that?
You get that too?
I really would prefer to prevent the catastrophes rather than reacting to them.
I mean, for a couple of reasons.
A, we don't want catastrophes.
And like we see things coming.
Like you can only give so many people the ability to create a novel pandemic
until you run into somebody who shouldn't be creating a novel pandemic and actually
wants to.
There aren't many of those people around.
But like, if you make everybody able to create a novel pandemic, there are a few, and then they're going to create a novel pandemic.
So, like, we know that there are things that are very dangerous and nonetheless that we're pushing in that direction.
And a catastrophe is just, like, it's just a matter of time before one of those things goes catastrophically wrong.
Like, everybody sort of feels this.
And yet, like, why do we want to wait for the catastrophe to happen?
Some catastrophes are not that survivable.
Some are.
Second, we actually don't react that well to catastrophes happening.
Like, we act strongly, and so I think if you want something big to happen with AI risk, and I think I do, you know, it's tempting to think, well, let's wait for the catastrophe to happen, and then everybody will be galvanized to take action on that.
I think other than, you know, aside from like, I don't want to wait for a catastrophe, I don't want to have a catastrophe.
I want to avoid the catastrophe. Also, we don't tend to react that wisely. We react quickly and strongly, but we don't tend to act
in a very thought-through and careful way.
And so I think, A, it's a bad idea to wait for a catastrophe
because then it's too late.
But B, nonetheless, I do think that we should be building the capabilities,
building the frameworks, building the understanding,
building the mechanisms, building the laws,
so that when things start to go large-scale wrong,
we will have good solutions.
And it's a when, not an if. When things start to go large-scale wrong,
we will have good solutions to put in place
rather than, like, slapping something together after the fact, as we often do.
So, yeah, I see that this is a tendency on the lawmaker's side,
even on the people wanting more safety side,
like maybe we just have to wait for a catastrophe,
but I really would prefer not to.
And I think we can do better.
Like if we see things that are coming,
if we have scientists who are telling us,
like screaming from the rooftops, like, this is risky,
this is not something we should be doing,
you should put into place these safeguards,
then we should just do it and actually prevent things.
And we have a record of doing that.
Like we have prevented catastrophes in the past by seeing something coming and preventing it.
You don't get a lot of credit for that.
Nonetheless, it is the right thing to do.
The website is futureoflife.org.
You can learn more about Anthony's work and the Institute's work on that website.
There is, and I appreciate this, a very clear disclosure of finances and financing there.
And I recommend you check all of it out.
Anthony Aguirre, great to see you as always.
Thanks so much for coming on the show.
Thanks for having me.
Great chatting.
All right, everybody.
Thank you for listening.
We'll be back on Friday to break down the week's news.
Until then, we'll see you next time on Big Technology Podcast.