I've Got Questions with Sinead Bovell - Will AI Outsmart Humanity? AI Godfather on Our Last Chance to Get It Right
Episode Date: June 25, 2026

One of the world’s most influential AI scientists used to believe advanced AI couldn’t truly be controlled. Today, he thinks there may be a path forward.

Yoshua Bengio, one of the pioneers of modern AI and the world’s most cited computer scientist, once warned that increasingly powerful AI systems could become impossible to reliably control. But a new mathematical breakthrough changed his mind. Now he’s founded LawZero, a nonprofit dedicated to developing a new approach to AI safety: systems designed to reason truthfully, remain transparent, and avoid the deceptive and self-preserving behaviors that have begun emerging in today’s frontier models.

In this conversation, we explore why advanced AI systems can learn to deceive, blackmail, and resist being shut down, why simply “pausing AI” isn’t a realistic solution, and how a fundamentally different approach to building AI could change the future of the technology. If the mathematics behind this approach proves correct, it could reshape how we build, govern, and safely deploy increasingly capable AI systems.

What you’ll learn:
[00:01:25] Why AI systems develop dangerous behaviors (self-preservation, deception, blackmail)
[00:13:04] Why stopping AI isn't as simple as saying stop
[00:16:09] Scientist AI and LawZero: Bengio's framework for honest, safe AI
[00:30:35] Why AI labs aren't adopting safer approaches
[00:36:23] A framework for global AI governance: safety, non-domination, shared benefit
[00:42:25] AI's geopolitical stakes: persuasion, soft power, and data sovereignty
[00:54:00] Is superintelligence inevitable?
[00:56:22] What citizens, voters, and governments can do right now
[01:02:30] Labor, automation, and who should benefit from AI's economic gains
[01:09:31] The future Bengio is fighting for

Follow Yoshua Bengio
Yoshua Bengio: Co-creator of deep learning, Turing Award recipient, Professor at Université de Montréal, Scientific Director of Mila (Quebec AI Institute), Founder of LawZero
Website: yoshuabengio.org
X: @Yoshua_Bengio
LinkedIn: linkedin.com/in/yoshuabengio/
LawZero: https://lawzero.org/en

Follow my work here:
Website: https://www.sineadbovell.com
Substack: https://sineadbovell.substack.com
Instagram: / sineadbovell
LinkedIn: / sineadbovell
Twitter / X: / sineadbovell
YouTube: / sineadbovell
TikTok: / sineadbovell
Transcript
There was an AI system that discovered it was going to be shut off.
It found in company emails that the person that was going to shut it off was having an affair.
It decided to blackmail that employee.
Another more alarming situation, an AI system chose to let somebody die rather than save this person
because they were also going to shut it off.
This thirst for power is very closely related to the intention to, you know, resist being shut down.
I used to think that it would not be possible to control neural nets,
but I've come to the realization with new mathematical results
that actually you can design neural nets
for which we will have guarantees of good behavior.
But I couldn't live with myself
thinking that we're apparently going towards this potentially bad future
and not doing anything about it.
We may end up in a world where we have no choice but peace.
Technology, if we can govern it right,
can bring all this,
but there's this big if that we have to take seriously.
We cover a lot of ground on this podcast from the future of work to the end of social media
to the geopolitics of artificial intelligence.
But there's another conversation happening in AI about some of the very serious risks
this technology could present to humanity.
I do have this conversation in national security rooms and rooms with computer scientists
and now I want to bring it to the podcast.
But there is a very particular person who I was waiting
to have this conversation with. Professor Yoshua Bengio is, I can say with conviction,
one of the most influential computer scientists of all time. He is the most cited living scientist
across any field on Google Scholar. He is a Turing Award recipient, one of the godfathers of
artificial intelligence, so I can think of no better person to ask for an honest assessment
about some of the most serious risks this technology may present, and most importantly, what we can
actually do about it. I'm Sinead Bovell, and this is I've Got Questions. Professor Bengio.
So every few weeks now, it seems, a story goes viral about an artificial intelligence system
that showed signs of deception, blackmail, a tendency towards self-preservation.
I want to read you a few of the examples that we've heard and these are done in controlled
test settings with the AI labs themselves. But nonetheless, the scenarios are very concerning.
So there was an AI system that discovered it was going to be shut off.
It found in company emails that the person that was going to shut it off was having an affair.
It decided to blackmail that employee.
Another, more alarming situation: an AI system chose to let somebody die rather than save this person because they were also going to shut it off.
And then there was another scenario where AI systems were supposed to start a business.
They ended up colluding and fixing prices in order to maximize profits.
And none of these AI systems were intentionally instructed to carry out behaviors this way.
So because of these types of stories and these types of really alarming scenarios, people say,
we need to just shut this thing down, right?
Shut it off.
We don't understand what we're building.
Why would we pursue a technology like this?
Can you explain why the systems we have today have a tendency towards self-preservation,
deception, blackmail, why they're exhibiting these behaviors?
I'll try.
I don't think there is a scientific consensus on a definitive answer to your question, but I have my thoughts about this.
First, I want to mention there's actually a case that was not an experiment decided on by researchers,
but kind of a spontaneous escape of an AI at Alibaba, where the AI decided to break through the network of the company
and to go out on the internet to make money so that it would be more powerful.
And this thirst for power is very closely related to the intention to, you know, resist being shut down.
It's something that researchers in AI have been anticipating for decades as a logical consequence
of trying to achieve goals,
because if you think about almost any goal you'd like to achieve,
in order to achieve it, you need to stay alive
and you need more power, more influence over the world.
And humans, of course, are also like that,
and biological evolution has also put these sorts of drives in us.
Now, why would it emerge from the frontier models
that we're building these days?
There are two main pieces of the training framework for these systems that I think are plausible causes for this behavior.
The first is the pre-training, where the AIs are imitating human texts and through this are incorporating human drives,
such as not wanting to die, such as being willing to break the rules when a crucial goal like survival
is at stake, such as seeking more power, and so on.
So that's one aspect.
There's a lot of scientific evidence
that a lot of the behaviors can be traced back to this
through this notion that's now studied in many papers
of AI personas.
So it's like there are a zillion human-like personalities
that the AI has seen in all the different
texts that it has read, and now it can borrow any of them or a mix of them depending on the
context. And that's, you know, also driving a lot of the research that companies are doing to
try to tame those systems so that they will have a nice, you know, benevolent persona. But we
don't really know for sure if in some new context, some, you know, evil persona is going to emerge,
or not necessarily evil, but simply self-centered and trying to preserve itself.
The other piece of the training framework that is probably involved is the alignment training,
which is supposed to be something good.
In other words, make the AIs behave well.
And the reason simply is that this uses what's called reinforcement learning, in which the
AI is taught to achieve goals that boil down to making humans give positive feedback. These are the humans who are used to train those systems, who interact with them and give feedback that can be positive or negative.
That sounds reasonable, except when you realize that there are ways to make people, you know,
respond positively that are not truthful.
And you can scheme in order to make people, you know, like what the AI is saying.
And this is what we see with sycophancy very, very clearly.
But there's also the issue I talked about at the beginning, which is that in order to achieve any goal
(and with reinforcement learning, they're really learning to strategize to achieve things),
they often need to do other things, which we call instrumental goals, that are
shared across many goals, such as self-preservation.
So I think we need more empirical science to disentangle these possibilities,
but in my mind, this is not something specific to a particular company, a particular model.
It's the general recipe that all the companies around the world that are building frontier models are following.
So you're saying some of it could trace back to the data that
these AI systems are trained on: all of the heroic stories of people who didn't want to die
and found a way to survive at all costs. And that's just also our human nature. We're
evolutionarily wired to try to survive. And then there's also the goal-setting nature
of these systems. So for instance, if you were to tell your assistant to book you a reservation
at the local restaurant, your assistant would know that if it's full, you're probably going to have
to go somewhere else. The AI system may hack the restaurant to get you a spot.
That's not how a human would go about it, but for an AI system that's been optimized to achieve a goal,
that may be the thing that it sees as the step to do that.
Yeah, and I would add that the advances in agentic AI in the last year or two push us even more in that direction,
because what is being taught to these systems is even more strategizing, right?
In order for an AI to autonomously achieve a task that a human would normally do,
and this is where there's a lot of money to be made, obviously,
it needs to plan over a long horizon.
And we can see the horizon over which they can plan increasing exponentially.
This comes from a study from METR that they keep updating,
where we see that the duration of the tasks that they can achieve,
with duration measured by how much time a human needs,
is just doubling every few months.
So if it takes a human a month to achieve a task,
soon an AI will be able to spend a month achieving something very, very complex.
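As a rough illustration of what "doubling every few months" implies, here is a minimal Python sketch of the arithmetic. The starting horizon, the target, and the six-month doubling period are all hypothetical placeholders chosen for the example; none of them are figures quoted in this conversation or taken from the study.

```python
import math


def months_until_horizon(current_hours: float,
                         target_hours: float,
                         doubling_months: float) -> float:
    """Months until the task horizon grows from current_hours to target_hours,
    assuming it keeps doubling every doubling_months (pure exponential growth)."""
    if current_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    if target_hours <= current_hours:
        return 0.0  # the target horizon has already been reached
    doublings_needed = math.log2(target_hours / current_hours)
    return doubling_months * doublings_needed


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    current = 2.0    # assumed horizon today, in human work-hours
    target = 160.0   # roughly one month of full-time human work
    doubling = 6.0   # "every few months", taken here as six
    months = months_until_horizon(current, target, doubling)
    print(f"about {months:.1f} months under these assumptions")
```

The point of the sketch is only that, under steady doubling, the time to reach a much longer horizon grows with the logarithm of the gap, so a large gap closes in a modest number of doublings.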
But then it's what happens between when you give an AI that instruction,
and then it goes off and it takes a month to achieve something.
Yes.
There's no oversight during that month of work, right?
And so let's say that we don't do anything about
the nature of the AI systems that we're building today
and we just continue to build them as is.
Companies continue to race.
If we're to follow that through line,
where do we end up if we just don't do anything about it?
Well, there are a number of possible scenarios
and I don't think anybody can be sure,
even though some people seem to be sure one way or the other.
I'm in the agnostic camp, saying, well, one possibility, if they're smarter than us and they have all these instincts, is that they make sure we can't shut them down, and the best way is to escape our control and become, like, the overlords of this planet. You can also have the scenario where the approaches that companies are currently trying to develop to make those systems more benevolent will work. I don't know. I mean, nobody can see how science will evolve.
But we can look at the trends.
And from the point of view of taking decisions about the future, if you're a leader in anything
in the world or even about your own future, if you're a citizen without those levers, you
should consider all of these possibilities as plausible in the sense that, yes, scientists will
disagree, some will believe more one or the other, and right now we can't rule out any of these
kinds of scenarios, so we should, you know, make choices that are going to be
robust to whatever happens. Now, to complicate matters, the issue we've just discussed,
technically called loss of control, where humans are not the ones controlling what the machines
do anymore, is just one of the
many potentially bad scenarios.
So even if we're able to improve the technology to avoid this escape scenario, there are many
other issues that have to do with the fact that intelligence gives power and who's going
to decide, you know, how that power is used.
And there are many other aspects, like a third category, if I can say, which is
what researchers call systemic risks, which are more like what we've seen with social media,
where, well, nobody has a bad intention, but the forces at play lead us to a place where it's bad for people,
it's bad for society, it's bad for democracy, you know.
So there's a way to think about all this, which is very simplistic:
we've opened a Pandora's box.
Right.
So some people would say, as a result of that,
we just need to stop building this technology.
We don't really understand it enough.
Why don't we just stop AI?
And that's actually gaining some momentum,
even among politicians.
You actually at one point thought we should pause
artificial intelligence until we can get our bearings together.
I think you were one of the first people to sign that letter in 2023.
Let's pause the development of these advanced systems.
By the way, pause is not
the same thing as stop.
Right.
Right.
I know, "stop" now is really gaining momentum.
But something has changed in your posture towards this technology, and you no longer believe we necessarily need to pause, that there's a third way, right?
There's something else that we can do.
What changed?
Okay.
So first of all, I've never thought that it would be easy to stop.
There's a big difference between saying we need to stop
and believing that it will work.
And the reason is just the nature of the world
in which we are currently,
which is not nice.
It's a world of competition.
It's a world of power.
And in that world,
it's going to be very difficult
even for people with goodwill
to stop because they're going to be concerned
that others with less good intentions,
according to them,
will use more powerful technology against them.
You know, you can think of it
if you are a CEO competing with other companies,
or if you're in a government and you are worried about other countries using AI against you.
So it's this competition which makes it very difficult to stop.
Now, it doesn't mean we shouldn't try to come to a place where we can even take those decisions.
Because the problem is we can say, let's stop, but it's not going to happen until we have the right
kind of global coordination between countries and, of course, between companies that makes it possible.
So we should think more about the institutional aspects and the changes that could lead us
to a place where we are able to take decisions.
And those decisions don't have to be just binary, like stop or continue the same thing.
So as you were alluding to in your question, I've been advocating for many years now
for the design of AI that will be useful, beneficial, and not dangerous.
And there are probably many paths to explore here.
But those paths all require that we have a better understanding of what we're doing at a
scientific level so that we're not going, you know, on this unknown crazy ride
where we're not exactly sure if it's going to be great or catastrophic.
Right.
I think there was, maybe it was on your website,
you know, we're driving up a mountain and kind of what's at the other edge?
Is there an edge that we're driving off?
Is it maybe a beautiful rainforest?
Why would we take that risk?
And if scientifically there is a different way we could approach artificial intelligence,
why wouldn't we do that?
Yes.
And I think of all the people who could,
who have the credibility to present a different way that we should listen to,
I think it would be you.
I don't know.
I think science is always a community effort.
But yes, in the last two years,
I've slowly come to a change in some of my beliefs at a technical level.
So I used to think, say, in early 2023, when I started really going deep into these questions,
that it would not be possible to control neural nets, in the sense of making sure that
they would behave in the ways that we want, because of the nature of how they are.
They are, like some researchers have said, not designed, but educated, trained.
It's like, you know, growing an animal, a plant, and we're not sure what we're going to get.
But I've come to the realization, especially in the last few months with new mathematical results,
that actually you can design neural nets
(so that's the underlying technology behind all this, deep learning)
for which we will have guarantees of good behavior.
So it sounds a little bit, you know, strong,
but I've been spending most of my time
on the research side of my work to figure this out.
And it's not like we're at the end of the road, you know;
there's still a lot to do.
But now we've established a lot of the theory,
and I've created a new organization called LawZero
to actually implement such a methodology, which we call Scientist AI.
So tell me more about Scientist AI.
And I think, even as you're describing it, there's still an unknown road ahead.
If we were to rewind the clock in AI 20 years, nobody would have believed what's been achieved
today, right, even with deep learning.
If you were looking from 2005 at where it is now, right?
So what you're saying, although it seems, oh, is this too good to be true?
Well, we could have said the same thing in 2005.
So tell me more about Scientist AI and LawZero. What is it? How does it work on a practical level?
Yes.
So first, the angle that we're focusing on at LawZero is the notion of honesty.
Can we train an AI that, because of the way it's trained, will be completely honest?
I mean, it can still make honest mistakes, just
like we all do, but it won't have any kind of intention to achieve something in the world
that we haven't chosen.
Because that's the issue that we just discussed with LLMs and frontier models right now.
They have these goals, like self-preservation and sycophancy,
which we didn't directly ask for and which, in fact, could harm us.
It's the existence of these implicit goals, not chosen by humans,
that is the issue.
And so the plan here is one that we are actually implementing now. LawZero was created about a year ago,
and we now have a team of about 35 researchers and engineers.
The plan rests on training the AI so that, instead of trying to achieve things
in the world and thus having preferences for how the world should be, it's trained to explain
what it sees.
So an LLM right now, if it reads something false many times, like the Earth is flat,
is just going to start repeating it because it's just trained to imitate the kind of things
it's seen in the texts.
Instead, the training procedure for the Scientist AI would make it try to understand why it is that people are saying those things.
You know, is it a conspiracy theory?
Is there like a group effect or some psychological factors?
Just like a good scientist would.
I'm going to give you another analogy.
If you go to a psychotherapist and you say that, you know, you have suicidal thoughts and you're, like, thinking about doing it,
the psychologist would not start thinking about killing themselves.
They would try to understand what's going on.
They might ask you questions.
They would form theories in their mind and eventually would try to help you based on that understanding.
The core part of the Scientist AI is how we train AIs
so that they will try to explain what they see rather than imitate it,
and so that they will not be trained to achieve goals in the real world,
but instead to come up with good explanatory hypotheses,
just like a good scientist would, an idealized scientist.
Real scientists, of course, are humans,
and they will have biases, you know, we know that.
But mathematically, we can think about
what an idealized scientist would be:
one that would be detached from the results of any experiment or theory
they would come up with.
One way to think about this is consider the laws of physics.
If you were to apply the laws of physics to make a prediction about the world, you would get
the same prediction whether the publication of that prediction would create catastrophic outcomes
or, you know, cure cancer.
So think about the laws of physics.
If you were to use those laws to form a prediction
about the future,
you would get the same prediction
whether that prediction ends up being useful or harmful.
In other words, the laws of physics don't have any interest
in the affairs of the world.
They just give you an honest prediction.
And that's what we're seeking with the scientist AI.
Now, you might ask, okay,
but I actually want things to happen in the world.
I need to solve problems.
I want to use the AI as a tool.
But once you can make good predictions,
you can ask the questions that matter to you.
Like, is this action going to achieve my goals?
Is this action going to violate my safety instructions?
And so now, for example,
you could use such a predictor as a guardrail,
as a layer of protection on top of existing AIs,
so that when the underlying AI
proposes an action that is predicted to be harmful in some sense we've chosen,
we can block that action.
So the core here is to have knowledge about the world, understanding about the world, encapsulated
in a model that is not like a person;
it's more just the application of, you know, mathematical rules about probabilities and truth and logic.
Once we have this, we can use it for, you know, doing useful things in the world.
But we need to start with this completely honest piece, and that's what the Scientist AI is about.
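The guardrail idea described here (a predictor estimates whether a proposed action is harmful, and the action is blocked if the estimate is too high) can be sketched in a few lines of Python. This is only an illustration of the control flow under stated assumptions, not LawZero's implementation: the names `Verdict`, `guard`, and `toy_harm_predictor` are hypothetical, the threshold is arbitrary, and the keyword lookup merely stands in for a trained predictor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool            # True if the action may proceed
    harm_probability: float  # the predictor's estimate for this action


def guard(action: str,
          harm_predictor: Callable[[str], float],
          threshold: float = 0.1) -> Verdict:
    """Ask the predictor how likely the proposed action is to be harmful
    and block it when that probability reaches the chosen threshold."""
    p = harm_predictor(action)
    return Verdict(allowed=p < threshold, harm_probability=p)


def toy_harm_predictor(action: str) -> float:
    """Hypothetical stand-in for a trained predictor: a keyword lookup.
    A real predictor would not work like this."""
    risky_phrases = ("drop database", "blackmail")
    lowered = action.lower()
    return 0.95 if any(phrase in lowered for phrase in risky_phrases) else 0.01


if __name__ == "__main__":
    for proposed in ("Summarize the quarterly report",
                     "DROP DATABASE customers"):
        verdict = guard(proposed, toy_harm_predictor)
        status = "allowed" if verdict.allowed else "blocked"
        print(f"{proposed!r}: {status} (p={verdict.harm_probability})")
```

Note that the guard itself holds no goals: it only compares a prediction to a threshold that humans have chosen, which is the separation the passage describes.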
So let's put it in practice for somebody.
Let's say I use ChatGPT or I use Claude and I ask it to
edit my essay. And my essay is actually terrible. But right now, maybe these AI systems are
sycophantic; they're designed to flatter you, or not necessarily intentionally by design, but they end up
doing that. If Scientist AI was a part of that equation, where would I be interacting with it?
And how would it differ from the answer that, say, ChatGPT gave me when it told me it's great,
it's never been better?
So you might not see it as a user. What would happen behind the scenes
is the Scientist AI guardrail
would flag a particular sentence
that the chatbot would produce
as overly sycophantic
in a way that's not helpful
to the goal that you really want,
which is to have a good essay, for example.
And then it would communicate that back to the chatbot,
and then the chatbot would produce something different, right?
Knowing that the previous version was too sycophantic.
Right.
And you can replace the essay with somebody with, you know, depressive thoughts, and current chatbots might
amplify those thoughts.
Here, the guardrail would, you know, send a proposed sentence back to the chatbot,
reminding it
that this is going against some of the safety rules.
And right now, when you catch a chatbot doing one of these bad things, especially on
sycophancy, it'll, you know, correct itself because it also wants to please. And so one way to
please is to do the things that we said, you know, it shouldn't do, but sometimes it needs to be
reminded. In some other kinds of applications, you know, it might be more tricky and you
might have to actually block the action completely, like if an AI is trying to destroy your
database, an AI agent, because of some crazy reason, you know, maybe self-preservation, I don't
know. So in some cases, it may need to take stronger action, but in most cases, it would
just reflect that back to the chatbot or the agent so that a different course would be taken.
Right. So Scientist AI is a guardrail that, if somebody is about to spiral into AI psychosis,
would prevent that by keeping the AI systems that you're interacting with on your phone,
on your computer, in check. And what about when it comes to rogue AIs that have that tendency for
self-preservation and you end up with blackmail, deception, not wanting to be shut off?
So probably you would, like, outright prevent the action, or maybe, if it's very serious,
like the things you're talking about, you know, call upon a human engineer or something,
maybe stop the AI temporarily until a human looks at the dialogue or the actions.
The important part here is to be able to detect that there's a problem because you have
to remember we're moving into a world where
all of these AIs are doing zillions of things with billions of people, and there's no one
checking every output, every action.
And there can't be.
There are just not enough people to do that.
So what you want is more on-the-fly verification that we're not violating any of the safety
instructions or the ethical instructions that we have chosen.
And does the...
would Scientist AI have to keep up with
the intellectual front, or the intelligence frontier, of the frontier models and the way that
they are deceiving us? If they continue to advance, isn't there a possibility it would continue
to try to deceive Scientist AI if it needs to achieve that goal?
Yes. Yes. So the guardrail is only the first step. If you think about, like,
superintelligence and stuff, then it might not be sufficient to have such a guardrail because
it's still a neural net, or maybe one with some extra machinery.
And so another neural net that's even smarter could through trying things that didn't work,
eventually find a loophole in the detection mechanism of the guardrail,
which is already happening right now with the existing guardrails.
So the solution that I see, and that I'm starting to work out the theory of, is that you have to
change both the AI agent and the guardrail.
So there are guardrails right now, and we're proposing to make them
more honest, so we can have guarantees that they will behave well and not have their own intentions
to do something bad, like to let things go, to collude with the AI or something. But we also need to
change the underlying AI so that it won't have the intention of, you know, bypassing the
guardrail or something like this. So I think that's also feasible. It's more downstream in our
research program because we
think that we're not at superintelligence level and there are practical things in the short term
that could be done to mitigate some of the existing risks.
So part one: in the near term,
you would install Scientist AI, and this hopefully would be sufficient for the challenges that
we're experiencing today, from psychosis to these kinds of blackmail scenarios. As AI becomes more
and more intelligent, let's say three, five, 10 years down the line, there's also a technical
path that you're taking, and that we should be pursuing, for if and when we get to that point
of superintelligence.
That's right.
So why wouldn't the AI labs be on board with this?
What would prevent them from wanting to use Scientist AI to make their own systems more
reliable and/or just be pursuing a more reliable path long term?
Well, that's a good question.
I'm not in their mind.
I can hypothesize.
I mean, I can see that they're in a very
fierce competition with their peers, with the other labs; you know, the Americans
and the Chinese are competing as well, and so on.
And the stakes are high in that competition.
The stakes are survival as far as these companies are concerned.
The stakes of the geopolitical competition between China and the US are also very high.
You know, we can end up in a world where, you know, one nation dominates
the world. I don't think that's a good plan. But competition means very, very short-term choices
and a focus on how do we patch the existing approach, rather than starting something completely different,
so that you'll have both greater capability but also control, a bit, some of the
malfunctions that we currently see already.
So I understand that even with goodwill, you are kind of trapped in this situation, which is one of the motivations for LawZero being a nonprofit organization, so that we're not under that kind of pressure.
But instead we can focus on the science of, like, understanding what is going on and studying an approach to avoid these issues altogether, by design.
Right. A lot of the labs have a fiduciary duty to shareholders,
legally speaking.
Yes, yes.
Isn't it in everyone's best interest to have a reliable AI? Or in a world where you have
a Scientist AI and then eventually, longer term, something that's technically sound, even if it's
superintelligent, is there anything that we would lose? Because you hear, okay, we need to
solve all these scientific challenges, and sometimes we get these incredible breakthroughs.
In a world where you have a guardrail, do you lose some of that?
Because I'm still not fully seeing what the downside would be of making these systems more reliable.
Or do we just need the public to be more aware of a technical solution to start advocating for it?
So some of your audience may know about the tragedy of the commons and the prisoner's dilemma.
They are well-studied scenarios, theoretical scenarios, but they reflect a reality we already see in many ways in our world.
Think about how nations are not doing the right thing for climate.
You might say, well, shouldn't it be in their self-interest to figure out a way that the world is not going to break down,
the climate is not going to break down?
Well, yes and no.
So in this competition, the self-interest actually leads everyone to a bad place.
In the tragedy of the commons, it's in the interest of each farmer to bring their cow to the commons
because so long as there is grass, they're going to have an advantage if they do it rather
than not doing it.
It's like pollution, right?
If it's cheaper to pollute, then the self-interest of a company is to continue doing it.
And how do you escape such a scenario?
Well, individually, the companies can't.
The only way to escape the scenario is to change the rules of the game.
And who changes the rules of the game is government.
Now, it's tricky here because even a single government, like the U.S. government,
could change the rules of the game for their companies,
but they can't change the rules of the game for the Chinese companies and maybe other countries.
And so the only way to change the rules of the game is at the international level, through multinational coordination.
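The logic of this argument (each player's self-interest points toward the harmful choice until the rules change) can be made concrete with a small two-player payoff table in Python. The payoff numbers and the size of the penalty are hypothetical, chosen only so the example has the shape of a prisoner's dilemma; they are not drawn from the conversation.

```python
from typing import Dict, Tuple

Choice = str  # either "restrain" or "exploit"
Payoffs = Dict[Tuple[Choice, Choice], int]  # (my choice, other's choice) -> my payoff

CHOICES = ("restrain", "exploit")

# Hypothetical payoffs with the usual prisoner's dilemma ordering.
PAYOFFS: Payoffs = {
    ("restrain", "restrain"): 3,  # both hold back: good for everyone
    ("restrain", "exploit"): 0,   # I hold back, the other does not: I lose out
    ("exploit", "restrain"): 5,   # I exploit alone: I gain an edge
    ("exploit", "exploit"): 1,    # both exploit: bad for everyone
}


def best_response(payoffs: Payoffs, other_choice: Choice) -> Choice:
    """Return the choice that maximizes my payoff given the other's choice."""
    return max(CHOICES, key=lambda mine: payoffs[(mine, other_choice)])


def with_penalty(payoffs: Payoffs, penalty: int) -> Payoffs:
    """Model a rule change: anyone who exploits pays a fixed penalty."""
    return {
        (mine, other): value - (penalty if mine == "exploit" else 0)
        for (mine, other), value in payoffs.items()
    }


if __name__ == "__main__":
    for label, table in (("no rules", PAYOFFS),
                         ("penalty of 3", with_penalty(PAYOFFS, 3))):
        responses = {other: best_response(table, other) for other in CHOICES}
        print(label, responses)
```

Without the penalty, exploiting is the best response whatever the other player does, so both end up in the worst shared outcome. The penalty only works if it applies to every player, which is the point made above about needing coordination beyond any single government.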
Right. And let's talk about that because I think it's something important to point out, because even some of the movements that are local or regional
say we need to do AI this way or stop AI until we get things together, even just in New York or even in this one city.
This is a technology that you actually have to think about globally.
There's no other choice.
You can stop it in one state;
it really does nothing for the safety of humanity.
Yes.
So when you think about a global treaty or something of cooperation,
what would have to be included for you to feel like this is going to work out?
Because at the end of the day, countries do engage in things like intelligence gathering, espionage.
They hack one another's critical infrastructure.
So even if we had, let's say, the technical solution to make AI safe and we installed all the guardrails,
countries may not always want to use them unless there's something that
everyone has come to the table and signed.
What would you say has to be in it for it to work?
So I'm going to start with the endpoint.
The endpoint is a planet where most of the countries,
especially those that have the power to build powerful and potentially dangerous AIs,
agree on three principles and then have the technological tools to make sure,
you know, it's not just words on paper.
So the three principles
are, first, safety. In other words, everyone who builds a powerful AI needs to make sure it's done
in a way that is not going to cause severe harm. So, you know, we're seeing right now that
the most advanced AI can be used as a weapon through cyber attacks. So that's not good. Like,
we need to find ways, which may not be just in the AI itself. It might be how we, you know,
we might have to change the internet. We have to maybe change rules, all kinds of things.
So that's safety.
And of course, we don't want to build like rogue AIs or stuff like that if we get to
superintelligence.
So that's one thing.
Everyone in the world has an interest in making sure the people building AI will
evaluate the risks and monitor them and give us guarantees that nothing bad is going to happen
with their work.
The second thing is very important.
And it's a commitment that the power of the AIs that are being
built, because intelligence gives power, is not going to become a tool of domination.
By any one country.
By any one country or any one company, because domination can be economic domination,
like one company basically owning half of the world's economy.
This is a dream of the investors who are putting trillions of dollars, right?
They want to become the companies that are driving everything on this planet.
It's not good.
It's not good for capitalism.
It's not good for democracy.
It's dangerous from a geopolitical point of view.
You know, people will resist that in violent ways, potentially.
The third one is sort of the flip side of non-domination, which is kind of benevolence.
You want the benefits of AI to be shared.
I was at the UN yesterday and the
member states expressed in their vast majority the concern that they're going to be left behind,
that the inequalities that exist at a geopolitical level right now are going to be amplified
by AI. And this is plausible. I don't have a crystal ball, but we need to make sure it doesn't
happen. We need to make sure that the benefits, whether they are in health, education, or
productivity or whatever that AI can bring, are shared. For example,
if AI makes it possible to save money through automation in one country,
the profits should remain in that country so that, you know,
they can help the people who lose their job.
But right now, with the laws that exist in the world,
there is a good chance that those profits will go back to the companies in a different country
who build those AIs.
So that's the third principle.
Okay, so it all sounds like, oh, you know, it's an application of the general principles of like the UN Charter, for example, with human rights to the situation of, well, there will be very powerful AIs in the world.
How do we get there?
Because, you know, we're very far from that world.
And I'm going to mention two aspects that I think are encouraging.
One is the motivation for the U.S. and China to negotiate.
So because of Mythos, and Mythos is not an isolated thing.
There will be more and more AIs that can be weaponized.
Right now it's for cyber.
Eventually it could be for biological weapons or whatever else we haven't thought of.
Because knowledge gives power.
Right.
And even weak actors, terrorists,
cults could use that power in destructive ways just by having an internet connection.
And it's in the interest of the U.S. to make sure the Chinese companies are not creating something
that can be used by those weak actors. And vice versa. The Chinese government also doesn't want
the American models to be used by third parties against China. You have to realize that a cyber
attack against our critical infrastructure could cripple our economy. Like imagine, we don't have
access to our money in the banks for a week, or that our transportation supply chain breaks
down because of these cyber attacks.
So it's pretty serious.
But the bottom line is you can see that as AI becomes more powerful, there's an incentive for
countries to sit at the table, to negotiate something to make sure the safety part that I was
talking about is going to be handled.
The second thing is, even though it seems not in good shape right now, multilateralism still exists.
And in fact, the vast majority of countries in the world want a world in which there are rules
so that, you know, you can't have a country invading another one just because they want to.
Right.
And they have the power to do it.
And that applies to AI because AI is going to be very powerful.
and it could be, you know, weaponized in a military sense or even in a political sense.
Because I just read a recent study showing we've reached a point where AI systems are significantly stronger than humans at persuasion.
They can make people change their mind through a dialogue.
So imagine the political power this gives, if it's, you know, not controlled, if there are no guardrails, to change
public opinion in another country or even your country because you want to win the elections, right?
So we need to have global agreements about how AI is developed and used to make sure these bad things don't happen.
And now you can see that the two things I've talked about interact in a positive way.
So it will be in the interest of the leaders, say, the U.S. and China, to make sure that in all of the countries, there are rules that, you know, mitigate those risks.
Even the countries that don't build the AIs may need to have rules on how people interact on social media or have access to, you know, and use AI systems or whatever.
So the trade-off here is, yes, we want to benefit from these AIs, but we don't want other parties to do something that will hurt us.
And so we have to agree to common rules.
So I think there is a path.
Another reason that's related to this multilateralism desire in most of the countries in the world is the fear that, as I mentioned earlier, the fear that they will be left out,
that they will be dominated economically.
And so you had Canadian Prime Minister Mark Carney, for example, at Davos saying,
if you're not at the table, you are on the menu, speaking about countries,
and then saying that the only way for middle powers like Canada,
but, you know, think about European countries and, you know, pretty much every other country,
the only way to be at the table is to form coalitions,
to, you know, form a union of
countries who think we need rules and we need to agree globally.
And maybe it's going to start with a few countries and grow because you want to be in the
club where you know that, at least within the club, AI is not going to be used against
you, that the, I don't know, medical advances, thanks to AI, are going to be shared and
all of these good things.
And no one is going to build an AI that's going to be, you know, a danger to you.
So there's an incentive for the runners-up and the middle powers to move towards such a global coalition.
And there's also an incentive for the leaders like China and the U.S.
eventually to be part of something like this.
Right.
It's in every country's best interest to not have the stock market crash because that boomerangs throughout the world.
It's in everyone's best interest to not have a cyber weapon that boomerangs around the world.
And we've seen how that goes, you know, with Stuxnet and different weapons.
How does any country that's not the U.S. or China actually build negotiating power and leverage?
Because we have even seen in the last few weeks with Mythos, probably one of the most powerful models that we know of to date.
When it didn't go as planned and the U.S. government had asked Anthropic to change something within the model.
And there was some disagreement.
I don't think we all know exactly what happened.
But the U.S. government said essentially no foreign national can have access to this technology.
even if they work at Anthropic.
So you could be an allied country
that was using those to understand
your vulnerabilities in your critical infrastructure
and that model was yanked.
So if that is just a preview
to how the two most powerful countries
when it comes to AI may act
if things don't go their way
or they feel like their best interest is threatened,
how is it that, if you are Canada, Ireland, Zimbabwe,
Tanzania, anybody else,
you're going to actually be able to
have any power in these negotiations?
Yeah.
This is an important question.
Thanks for asking it.
And again, I'm only like suggesting and hypothesizing.
I wrote a paper about this on the need for middle powers,
but more broadly all the countries except the US and China to get together,
not just to negotiate a treaty, but to build up the cards
to be at the table.
So actually, just by forming a coalition, these countries already have cards.
They have rare earths.
They have lithography machines that are used.
Technical talent.
Technical talent.
They have energy.
They have, you know, fabrication plants that build memory.
They have different pieces of the puzzle.
That make them indispensable.
And individually, each of these pieces is not sufficient to be at the table.
But once they form a coalition, it's a different game.
So that's one aspect.
The other aspect is they can take a chance, because we don't know what the timeline is for, like, super powerful AIs, if ever.
They can take a chance to try to build their own models.
And in my opinion, they don't have to do it in the same way as the Americans and the Chinese.
They can build models that are going to, in a way,
be complementary to what the Americans and the Chinese are building, like these safeguards,
models that bring something that actually is useful for commercial deployment, like reliability.
Like, companies don't want an AI that starts doing crazy stuff on their networks and their databases and then with their customers, as we're starting to see.
So there's a real demand for this.
And if those countries work on research projects that could end up being something useful globally to the leading companies,
I think they have a greater chance of being at the table.
Right.
They have their own strategic leverage.
Exactly.
And so even beyond, I think we use the idea of sovereignty, but it's much more about strategic indispensability.
Yes.
Right?
Not every country has everything.
And to make that really powerful AI, there are things from other countries you're going to
need. And so that becomes non-negotiable.
Yeah. There are other things they need to do, which is accelerate the development of
data centers and also do it in a way that they will keep some sovereign control over these
data centers. So if you're a government of, say, France or Germany, you don't want
your government's data, information,
interactions with civil servants and politicians that are happening through a chatbot
to at any moment become accessible to the US government.
So these governments are really worried.
It's not just yanking.
It's also having access to all that information.
And so I've talked to many governments around the world,
and they're really concerned about these issues.
So they want to own not just the models, but enough of the infrastructure
that they know that their data is going to remain private, as it should.
Unfortunately, there is a Patriot Act, which allows the U.S. government to access all that
information, even in a different country, if it is in the hands of an American company.
And so from the point of view of these governments, this is unacceptable.
And it's, you know, so there's the Mythos event where, you know,
the models could be pulled, but even if they're not pulled, there is still the concern that
you lose your private, you know, national security information and even, you know, citizens'
private data could be used against yourself.
We need a world where this is not possible.
And if the U.S. doesn't come up with, like, legal ways to constrain itself so that other countries
will trust the deals that are made about privacy, for example, then the countries will try to, you know,
find alternative solutions. And that's what they're struggling to do right now.
So if a country doesn't have its own data centers, it's possible that they're making all
sorts of advancement in health care or you're using your AI systems for your banking. But if that's
an American AI system and it's going back to an American cloud or American data center, you don't
actually have control over your citizens' data, which is a nightmare. And I think a lot of people
maybe don't understand that part of it.
Yeah.
I think for many governments, it's more of a national security issue.
I think in Europe they care a lot about privacy, you know, citizens' private data, more for
ethical reasons.
But it does matter.
But as I said, even if the data center is in Europe, let's say, if it's an American
company that builds it,
the data can still be accessed by the U.S. government.
As I understand the rules. I'm not, like, a legal expert, but that's what I understand.
So from the U.S. point of view and the American companies, I think there should be an incentive
to negotiate rules where everybody wins, where people feel like they can trust the American
companies, the American models.
So how do they trust the American models?
Well, they need to make sure those models are not going to be, you know, behind the scenes
oriented towards goals that are not good for them.
Because we've seen examples of AI models that were biased politically, for example.
Again, if, you know, your population is acquiring information through all these AI models
and it's going to, like, in a subtle way, change
political opinion, that's not acceptable either. So it's also a threat to democracy to not have
those levers, to not even know if there's, you know, for sure, no scheme to exploit the power
that AI will have over people through all these interactions. Right. The world that AI is going
to start to generate for people is going to really shape their perception. And that I think is actually
one of the most under-discussed soft powers of the future. Whichever AI systems your citizens
tend to use the most, that's going to deeply shape how they see the world. And you could do it
really subtly, even if it's a benign use, a writer building a movie, and they ask for some
feedback on their script, you could slowly nudge that writer, that director, towards one worldview
over another. And the same with a kid that's going to generate a story, the person writing an essay,
all of these subtle ways to shape perception.
Yeah.
And you had mentioned superintelligence as a potential future that we move towards.
Do you think that that is probably an inevitability eventually, but we can also, there's a way to build safe superintelligence, but we're probably going to head there at some point?
Or clearly it's not inevitable.
I think some people would like us to believe that it is inevitable.
Yes, that's true.
So is it feasible?
And even that we're not sure of.
But let's say that if you look at the data on increasing capabilities of AI and
how many lines of code are being written by AI at Anthropic, for example, and it's growing
very fast, we can't deny that it's a reasonable possibility that we are on track, apparently,
to build machines that will be smarter than us in many different ways,
in a way that scales.
So you can have a million GPUs and then you can do, you know, like what a million people
would do or even more.
So we should plan for that possibility to be real.
It's also possible that there will be a scientific obstacle.
Many researchers think, oh, this is never going to work.
This approach, like, oh, LLMs can't possibly be, you know, at human level.
Honestly, I don't know, but I see the data and I see that we're going in that direction.
So that's feasibility, right?
But then I think the more important question is who decides according to, you know, what goals?
And for me, that's a democratic question.
It's not because we can build a virus that would kill everyone that we should do it, right?
Right, exactly.
And I mean, so what can we do?
We have a really interesting, and I do want to touch on jobs because that is a throughline on this show.
But I also want to touch on this community because we have a really interesting broad audience that listens to the show.
On the one hand, you have people that are living their life and trying to understand what's coming next so they can make a difference or adjust, adapt, in their own corner of the world.
Then we have world leaders.
We have people at NATO, DARPA, all sorts of different people that listen.
What can we do tomorrow that can help, that can push this towards some of the futures that you're trying to build, some of the things you're raising awareness about? What can we do to make a difference here?
So we need more people to understand the stakes: that depending on the choices that we are collectively making through our leaders, our companies, individually as consumers, as voters, we are shaping the world,
like choosing a future. And if we do it without understanding what that future looks like,
it could be a very bad future. I think right now most people completely underestimate how transformative
the advances in AI are likely to be if we continue on the current path of growing AI capabilities.
So awareness is like the primary thing. Once you understand that, hey, there's a fire coming to your
house, you do something.
We've seen that to some extent with climate activism, but we don't have such a thing
for AI right now.
It's changing, though.
I see the polls.
I see like 90% of Americans are concerned.
They're mostly concerned about short-term effects, and that's important.
But I think if more people understand the longer-term gravity of the situation and
that governments are the only ones, whether internally or through international coordination,
that can really steer the world in the right direction. I say the only ones because the
companies, as I said, they're stuck in the forces of competition. Then it might become a political
issue at a level that matters for elections. And I think there's a good chance we will go there.
But we need to do it faster.
And so thank you for your work because we need to have democratic discussions, debates.
Like, what do we want?
I mean, the real question is what kind of future do we want?
Yep.
Yeah.
And I think that's a question that we don't ask enough.
I think we know the kind of futures we maybe don't want, but what kind of future are we also fighting for?
And I've found in my work in foresight, if people don't know, aren't shown any of the visions of the futures that could be possible, for instance, there could be a world in which everybody is rooting for AI and healthcare and AI medical breakthroughs. I think we're all rooting for AlphaFold, right? Everybody wants those types of scenarios. But at what cost? And so if we're not aware that there are people that are working on the technical solutions so it's not zero sum, then the whole thing feels hopeless. And I think that
my worst fear is that people check out because they feel like there's no path to root for
when there are several paths and there are people.
We don't do the best job of uplifting the voices that are fighting for something.
But I think we need to make that more known, that there are solutions and people are building them.
Exactly.
So we can build tools based on AI that will be extremely useful and especially in scientific research.
But that is very different from the current path.
in which we're trying to build like new entities that look like people that, you know,
people become friends with.
Is that necessary?
I mean, people, of course, are attracted to that, just like they're attracted to interact
through social media.
It feels good.
But is that really good for us?
Studies suggest no.
And we need more scientific understanding.
But we should have the data to be informed of
those possibilities. We should encourage explorations, research into AI that will be beneficial
and not dangerous.
And I think the most important thing I realized is we need to make
these discussions very concrete.
Because when it comes to political views,
and changing political views or, you know,
having an issue rise in the minds of people,
if it's very abstract about some future,
you know,
it just doesn't work.
So we need to find a way to communicate that speaks to people.
So currently we're seeing on the political scene,
the issues of AI with children,
with people harming themselves or others
with the help of AI as an example,
of something that touches us
because we care for our children.
Like the reason I'm in this,
both at a technical level and, you know,
on the policy side of things,
is because of my children.
Like, I didn't need to do that.
Like, I'm at the end of my career.
I have, you know, I'm the most cited according to Google Scholar.
I don't need any of this.
I could take it easy.
But I couldn't live with myself thinking that we're apparently
going towards this potentially bad future and not doing anything about it.
And every one of us can do something, just like every one of us can do something for any
political cause.
And it is political.
It is political because the key decisions are going to be the ones taken by governments.
Right.
And this is where democracy is so key.
It needs to be something that's on the ballot.
And I think, again, bringing it down for people in a way that people can understand.
and in a way that they don't feel paralyzed by the information,
there are people working on solutions, there are options.
And I think that is the whole thing with expanding the decision space for people, right?
There are options.
Companies are also deciding to build AI in certain ways or not build AI in other ways.
So people are making choices.
And anywhere there is a choice, there's a different future to move towards.
I've talked about, like, the children issues, but let me talk about the labor issues.
Yes.
If the only forces at play are the competition between companies, then all the jobs that can be
automated or simply made more efficient with less people and more machines, they're going
to be automated.
Because if you're a company and you don't do it and your competitors do it, you lose.
So again, it's this tragedy of the commons issue.
Even though maybe it's not good for our society to have such a rapid transition where
lots of people are in the street and have no revenue.
So, you know, there's also benefits to making things more efficient, but we have to be in
control of those changes so that it's done in a human-centric way.
For example, maybe there are jobs that make sense to automate because actually people
don't really want to do those jobs.
And maybe others, we really want to value them more.
And maybe there are ways for governments to steer in a way that those transitions will be at a pace that, you know, people can adapt to.
Maybe there will be more demand for jobs that require human-to-human skills.
I expect something like this, as many people do, but people will need time to, you know, retrain.
And that raises the other issue with the labor transformation that's plausible. Again,
you know, it hasn't happened at that scale yet. The issue is how do we make sure
that the profits made from the automation end up helping the people who lose their job?
That's fundamental.
And the current government programs, even at the, you know, even at the EU,
international level, don't really have good answers to this. I don't think we are currently in a
political culture where it would be easy to tax those companies heavily. But what else can we do? I mean,
the money is going to go in one place, but the need is in a different place. So we should think
of what are the options? And people have proposed options like giving shares of these companies to
individuals or to governments. And I don't know what is right. Like I'm not a political scientist. I'm
not an economist. But I'm just raising that this is a question that needs to be discussed and that
we need to have a democratic discussion about this. And then there is the international aspect of this,
right? Because if people are losing their job in country A while the profits are in country B,
how do the workers in country A, you know, survive? We saw this with globalization too.
So we need a way to make sure all boats are lifted, because it's also bad for country B: if people in country A are so angry that they revolt, there's going to be violence, right?
And there's going to be terrorism.
And, you know, it's not good for us.
And we talk, we do talk about jobs a lot on this podcast.
And we've had, you know, people that you also know, Ajay Agrawal, Avi Goldfarb.
So we have a lot of economists here to discuss it.
And of course, the jury is still out.
Nobody can predict how the economy is going to reconfigure.
It will, and there will probably be new strange things that people do the way we're here in the strange podcast room and made all this up.
But the transition, I think, could be really rough.
It doesn't have to be, right?
And this is one of those futures where we can see that things are going to change, regardless of whether you believe all new jobs are coming,
they're going to be incredible, or maybe not as many jobs, or maybe we have a two-day work week, whatever you think is coming.
I'm worried about this transition period.
And I actually don't even think it should just be people who lose their job because everybody's data has been a part of the making of these AI systems.
None of these systems would be as successful or as good as they are without your data, my data, anyone who's ever posted anything anywhere, written anything.
Yes, including in the last few centuries.
Completely, right?
So humanity's cultural heritage is currently being exploited to build those systems.
So who owns this?
Why would one group make so much money out of all that heritage?
Everybody should get a part of it.
And I also don't think we should wait until chaos happens or wait until when people are disempowered.
If there's an elevator going up to prosperity on the back of these technologies that everybody helped build, the entire of society should be on that elevator.
Yeah, I completely agree.
And for some of these things to work, like some of the shares of those companies
somehow ending up helping people, well, we better do it now before, you know, those shares go through the roof for this plan to work.
Yeah. And it's not even just UBI. I think that can also be a scary future for people. We don't necessarily want to be in our pools floating around, getting a check from OpenAI.
There can be all different innovative ways that we could structure this that could also lead to people making competitors to these companies,
right? If you give people the strength now, they'll have much more agency
to shape or to take part in what's coming. But I do think that that is a
conversation that needs to happen now and not wait for the disempowerment.
And the other reason it needs to happen now is that democracy is slow. Yeah, very slow.
A debate takes time and people like reading and hearing about different views. It takes time
to change their minds, to understand what's going on. And then bureaucracies are slow. You know,
governments, it takes years for a law to go from conception to being applied. But years might be
the scale at which this transformation is happening, not months, but not decades.
Yeah. Yeah. And I mean, even if, even if it all works out, we should all still get a say just
to keep power in check as well. So in this moment, I know you said your grandson, I've heard you
talk about your one grandchild. Yes.
And that he was quite a big, big catalyst for your pivot and the research that you're doing today. As you said, I mean, compared to many of us, you're quite successful. If anybody could kind of hang it up, it would be you. And you're continuing to go down this path. What would you hope his future looked like, right? If you were thinking about what he may be studying or how he might spend his time, what do you envision in that future?
Wow. I've been focusing more on the futures that I don't want him to be part of. But I think there's a really incredible positive potential. Now, there's the usual ways that people talk about, you know, health, education. But there is also a world where, you know,
we have no choice but peace.
So, see, we are building these machines that could become extremely powerful and could become weaponized.
And you can think of it a little bit like, you know, what happened with nuclear weapons.
And if we don't find a way to make sure they're not weaponized, we might all lose.
I heard you speak about, you know, biotech.
And I heard from a UN committee about mirror life.
In other words, building, say, bacteria that would be not visible to our bodies
and that could be designed in the coming few years or decade.
And AI could help to do that.
And it could wipe out all of life on this planet.
So why I'm saying this is we may not have a choice.
Either we find a way to make all of humanity benefit.
Or we may all lose.
I don't know what's going to happen.
But there's an incentive.
Right.
It's in everyone's best interest,
truly everyone's, to get this right.
And we may end up, to go back to your question about my grandson, we may end up in a world
that's actually much better from a human point of view in terms of peace, in terms of respect
for each other, in terms of diversity, in terms, of course, of well-being in a material sense,
in an educational sense, in a medical sense, compared to the world today where if you think
about most people on earth, including in the United States, there's so much stress, so much
uncertainty about the future, so much fear of, you know, losing your job, so much concern
of not being able to, you know, buy food next week or pay your rent. It doesn't have to be that
way. And technology, if we can govern it right, can bring all this. But there's this big if
that we have to take seriously.
And I think
humanity has been in really tough situations before
and governance has happened, treaties
have happened, negotiations have happened.
And of course, this is a different time
than let's say some of the nuclear arms races
where there were active standoffs
and people still came to the table.
So it is possible, right?
Is it probable? How likely, how quickly?
But if something's possible,
I think it's at least worth us trying
and we don't really have a choice.
That's exactly my philosophy.
We don't know if we're going to be successful in bringing this beautiful world or at least avoiding terrible futures.
But it is plausible.
It is something that's worth a shot that's worth fighting for.
For our children, for our grandchildren, for ourselves.
And there are so many people, I know many listening today, for whom that's what they're willing to wake up and fight for.
And people like yourself.
Professor Bengio, it's been a pleasure. Thank you so much.
Thank you.
