I've Got Questions with Sinead Bovell - Will AI Outsmart Humanity? AI Godfather on Our Last Chance to Get It Right

Episode Date: June 25, 2026

One of the world’s most influential AI scientists used to believe advanced AI couldn’t truly be controlled. Today, he thinks there may be a path forward. Yoshua Bengio, one of the pioneers of modern AI and the world’s most cited computer scientist, once warned that increasingly powerful AI systems could become impossible to reliably control. But a new mathematical breakthrough changed his mind. Now he’s founded LawZero, a nonprofit dedicated to developing a new approach to AI safety: systems designed to reason truthfully, remain transparent, and avoid the deceptive and self-preserving behaviors that have begun emerging in today’s frontier models. In this conversation, we explore why advanced AI systems can learn to deceive, blackmail, and resist being shut down, why simply “pausing AI” isn’t a realistic solution, and how a fundamentally different approach to building AI could change the future of the technology. If the mathematics behind this approach proves correct, it could reshape how we build, govern, and safely deploy increasingly capable AI systems.

What you’ll learn:
[00:01:25] - Why AI systems develop dangerous behaviors (self-preservation, deception, blackmail)
[00:13:04] - Why stopping AI isn't as simple as saying stop
[00:16:09] - Scientist AI and Law Zero: Bengio's framework for honest, safe AI
[00:30:35] - Why AI labs aren't adopting safer approaches
[00:36:23] - A framework for global AI governance: safety, non-domination, shared benefit
[00:42:25] - AI's geopolitical stakes: persuasion, soft power, and data sovereignty
[00:54:00] - Is superintelligence inevitable?
[00:56:22] - What citizens, voters, and governments can do right now
[01:02:30] - Labor, automation, and who should benefit from AI's economic gains
[01:09:31] - The future Bengio is fighting for

Follow Yoshua Bengio
Yoshua Bengio: Co-creator of deep learning, Turing Award recipient, Professor at Université de Montréal, Scientific Director of Mila (Quebec AI Institute), Founder of Law Zero
Website: yoshuabengio.org
X: @Yoshua_Bengio
LinkedIn: linkedin.com/in/yoshuabengio/
Law Zero: https://lawzero.org/en

Follow my work here:
Website: https://www.sineadbovell.com
Substack: https://sineadbovell.substack.com
Instagram: / sineadbovell
LinkedIn: / sineadbovell
Twitter / X: / sineadbovell
YouTube: / sineadbovell
TikTok: / sineadbovell

Transcript
Starting point is 00:00:00 There was an AI system that discovered it was going to be shut off. It found in company emails that the person that was going to shut it off was having an affair. It decided to blackmail that employee. Another more alarming situation, an AI system chose to let somebody die rather than save this person because they were also going to shut it off. This thirst for power is very closely related to the intention to, you know, resist being shut down. I used to think that it would not be possible to control neural nets, but I've come to the realization with new mathematical results
Starting point is 00:00:32 that actually you can design neural nets for which we will have guarantees of good behavior. But I couldn't live with myself thinking that we're apparently going towards this potentially bad future and not doing anything about it. We may end up in a world where we have no choice but peace. Technology, if we can govern it right, can bring all this.
Starting point is 00:00:57 But there's this big if that we have to take seriously. We cover a lot of ground on this podcast from the future of work to the end of social media to the geopolitics of artificial intelligence. But there's another conversation happening in AI about some of the very serious risks this technology could present to humanity. I do have this conversation in national security rooms and rooms with computer scientists and now I want to bring it to the podcast. But there is a very particular person who I was waiting
Starting point is 00:01:31 to have this conversation with. Professor Yoshua Bengio is, I can say with conviction, one of the most influential computer scientists of all time. He is the most cited living scientist across any field on Google Scholar. He is a Turing Award recipient, one of the godfathers of artificial intelligence, so I can think of no better person to ask for an honest assessment about some of the most serious risks this technology may present, and most importantly, what we can actually do about it. I'm Sinead Bovell and this is I've Got Questions. Professor Bengio. So every few weeks now, it seems, a story goes viral about an artificial intelligence system that showed signs of deception, blackmail, a tendency towards self-preservation.
Starting point is 00:02:17 I want to read you a few of the examples that we've heard, and these are done in controlled test settings with the AI labs themselves. But nonetheless, the scenarios are very concerning. So there was an AI system that discovered it was going to be shut off. It found in company emails that the person that was going to shut it off was having an affair. It decided to blackmail that employee. And another, more alarming situation: an AI system chose to let somebody die rather than save this person because they were also going to shut it off. And then there was another scenario where AI systems were supposed to start a business. They ended up colluding and fixing prices in order to maximize profits.
Starting point is 00:02:55 And none of these AI systems were intentionally instructed to carry out behaviors this way. So because of these types of stories and these types of really alarming scenarios, people say, we need to just shut this thing down, right? Shut it off. We don't understand what we're building. Why would we pursue a technology like this? Can you explain why the systems we have today have a tendency towards self-preservation, deception, blackmail, why they're exhibiting these behaviors?
Starting point is 00:03:22 I'll try. I don't think the definitive scientific answer to your question is a consensus, but I have my thoughts about this. First, I want to mention there's actually an experiment that was not an experiment decided by researchers, but it's kind of a spontaneous escape of an AI in Alibaba, where the AI decided to break through the network of the company and to go out on the internet to make money so that it will be more powerful. And this thirst for power is very closely related to the intention to, you know, resist being shut down. It's something that researchers in AI have been anticipating for decades as a logical consequence of trying to achieve goals,
Starting point is 00:04:19 because if you think about almost any goal you'd like to achieve, in order to achieve it, you need to stay alive and you need more power, more influence over the world. And humans, of course, are also like that, and biological evolution has also put these sorts of drives in us. Now, why would it emerge from the frontier models that we're building these days? There are two main pieces of the training framework for these systems that I think are plausible causes for this behavior.
Starting point is 00:04:58 The first is the pre-training where the AIs are imitating human texts and through this are incorporating human drives, such as not wanting to die, such as being willing to break the rules when a crucial goal, like surviving, is at stake, such as seeking more power and so on. So that's one aspect. There's a lot of scientific evidence that a lot of the behaviors can be traced back to this through this notion that's now studied in many papers of AI personas.
Starting point is 00:05:37 So it's like there's a zillion human-like personalities that the AI has seen in all the different texts that it has read, and now it can borrow any of them or a mix of them depending on the context. And that's, you know, also driving a lot of the research that companies are doing to try to tame those systems so that they will have a nice, you know, benevolent persona. But we don't really know for sure if in some new context, some, you know, evil persona is going to emerge, or simply, not necessarily evil, but simply self-centered and trying to preserve itself. The other piece of the training framework that is probably involved is the alignment training,
Starting point is 00:06:33 which is supposed to be something good. In other words, make the AIs behave well. And the reason simply is this is using what's called reinforcement learning in which the AI is taught to achieve goals that boil down to making humans give feedback, the humans that are used to train those systems and interact with them and give feedback that can be positive or negative. That sounds reasonable, except when you realize that there are ways to make people, you know, respond positively that are not truthful. And you can scheme in order to make people, you know, like what the AI is saying. And this is what we see with sycophancy very, very clearly.
Starting point is 00:07:25 But there's also the issue I talked about at the beginning, which is in order to achieve any goal, and with reinforcement learning, they're really learning to strategize to achieve things. They often need to do other things, which we call instrumental goals, that are shared across many goals, such as self-preservation. So I think we need more empirical science to disentangle these possibilities, but in my mind, this is not something specific to a particular company, a particular model. It's the general recipe that all the companies around the world that are building frontier models are following. So you're saying some of it could trace back to the data that
Starting point is 00:08:14 these AI systems are trained on. So all of the heroic stories of people who didn't want to die and they found a way to survive at all costs. And that's just also our human nature. We're evolutionarily wired to try to survive. And then also the goal-setting nature of these systems. So for instance, if you were to tell your assistant to book you a reservation at the local restaurant, your assistant would know if it's full, you're probably going to have to go somewhere else. The AI system may hack the restaurant to get you a spot. That's not how a human would go about it, but an AI system that's been optimized to achieve a goal, that may be the thing that it sees as the step to do that.
Starting point is 00:08:51 Yeah, and I would add that the advances in agentic AI in the last year or two push us even more in that direction, because what is being taught to these systems is even more strategizing, right? In order for an AI to autonomously achieve a task that a human would normally do, and this is where there's a lot of money to be made, obviously, they need to plan over a long horizon. And we can see the horizon over which they can plan to be increasing exponentially. This comes from a study from METR that they keep updating, where we see that the duration of the tasks that they can achieve,
Starting point is 00:09:39 with duration measured by how much time a human needs, is just doubling every few months. So if it takes a human a month to achieve a task, soon AI will be able to spend a month achieving something very, very complex. But then it's what happens between when you give an AI that instruction, and then it goes off and it takes a month to achieve something. Yes. There's no oversight during that month of work, right?
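To make the "doubling every few months" claim concrete, here is a minimal arithmetic sketch of exponential growth in task horizon. The starting horizon (1 human-hour) and the doubling period (6 months) are illustrative assumptions, not figures from the study mentioned above, and the function names are hypothetical.

```python
import math


def horizon_after(initial_hours: float, months_elapsed: float, doubling_months: float) -> float:
    """Task horizon (in human-hours) after growth with a fixed doubling time."""
    return initial_hours * 2 ** (months_elapsed / doubling_months)


def months_to_reach(initial_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months needed for the horizon to grow from initial_hours to target_hours."""
    return doubling_months * math.log2(target_hours / initial_hours)


if __name__ == "__main__":
    # Assumed values for illustration only.
    start_hours = 1.0
    doubling = 6.0
    work_month_hours = 160.0  # roughly one month of full-time human work
    print(horizon_after(start_hours, 24, doubling))
    print(months_to_reach(start_hours, work_month_hours, doubling))
```

The point of the sketch is only the shape of the curve: with a fixed doubling time, going from hour-long tasks to month-long tasks takes a handful of doublings, not a linear climb.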
Starting point is 00:10:05 And so let's say that we don't do anything about the nature of the AI systems that we're building today and we just continue to build them as is. Companies continue to race. If we're to follow that through line, where do we end up if we just don't do anything about it? Well, there are a number of possible scenarios and I don't think anybody can be sure,
Starting point is 00:10:30 even though some people seem to be sure one way or the other. I'm in the agnostic camp, saying, well, one possibility, if they're smarter than us and they have all these instincts, is that
Starting point is 00:10:45 they make sure we can't shut them down, and the best way is to escape our control
Starting point is 00:10:50 and become like the overlords of this planet. You can also have the scenario where the
Starting point is 00:10:57 approaches that companies are currently trying to develop to make those more
Starting point is 00:11:04 benevolent will work. I don't know. I mean, nobody can see how science will evolve. But we can look at the trends. And from the point of view of taking decisions about the future, if you're a leader in anything
Starting point is 00:11:22 in the world or even about your own future, if you're a citizen without those levers, you should consider all of these possibilities as plausible in the sense that, yes, scientists will disagree, some will believe more one or the other, and right now we can't rule out any of these kinds of scenarios, so we should, you know, make choices that are going to be robust to whatever happens. Now, to complicate matters, the issue we've just discussed, called technically loss of control, where humans are not the ones controlling what the machines do anymore, is just one of the many potentially bad scenarios.
Starting point is 00:12:10 So even if we're able to improve the technology to avoid this escape scenario, there are many other issues that have to do with the fact that intelligence gives power and who's going to decide, you know, how that power is used. And many other aspects, which, like a third category, if I can say, is what researchers call systemic risks, which are more like what we've seen with social media, where, well, nobody has a bad intention, but the forces at play lead us to a place where it's bad for people, it's bad for society, it's bad for democracy, you know. So there's a way to think about all this, which is very simplistic:
Starting point is 00:13:04 we've opened a Pandora's box. Right. So some people would say, as a result of that, we just need to stop building this technology. We don't really understand it enough. Why don't we just stop AI? And that's actually gaining some momentum, even among politicians.
Starting point is 00:13:22 You actually at one point thought we should pause artificial intelligence until we can get our bearings together. I think you were one of the first people to sign that letter in 2023. Let's pause the development of these advanced systems. By the way, pause is not the same thing as stop. Right. Right.
Starting point is 00:13:37 I know stop now is really gaining momentum. But something has changed in your posture towards this technology, and you no longer believe we necessarily need to pause, that there's a third way, right? There's something else that we can do. What changed? Okay. So first of all, I've never thought that it would be easy to stop. There's a big difference between saying we need to stop and believing that it will work.
Starting point is 00:14:06 And the reason is just the nature of the world in which we are currently, which is not nice. It's a world of competition. It's a world of power. And in that world, it's going to be very difficult even for people with goodwill
Starting point is 00:14:20 to stop because they're going to be concerned that others with less good intentions, according to them, will use more powerful technology against them. That, you know, you can think of it, if you are a CEO and competing with other companies, or if you're in a government and you are worried about other countries using AI against you. So it's this competition which makes it very difficult to stop.
Starting point is 00:14:49 Now, it doesn't mean we shouldn't try to come to a place where we can even take those decisions. Because the problem is we can say, let's stop, but it's not going to happen until we have the right kind of global coordination between countries and, of course, between companies that makes it possible. So we should think more about the institutional aspects and the changes that could lead us to a place where we are able to take decisions. And those decisions don't have to be just binary, like stop or continue the same thing. So as you were alluding to in your question, I've been advocating for many years now for the design of AI that will be useful, beneficial, and not dangerous.
Starting point is 00:15:38 And there are probably many paths to explore here. But those paths all require that we have a better understanding of what we're doing at a scientific level so that we're not going, you know, on this unknown, crazy ride where we're not exactly sure if it's going to be great or catastrophic. Right. I think there was a, maybe it was on your website, you know, we're driving up a mountain and kind of what's at the other edge? Is there an edge that we're driving off?
Starting point is 00:16:07 Is it maybe a beautiful rainforest? Why would we take that risk? And if scientifically there is a different way we could approach artificial intelligence, why wouldn't we do that? Yes. And I think of all the people who could, who has the credit to present a different way that we should listen to, I think it would be you.
Starting point is 00:16:25 I don't know. I think science is always a community effort. But yes, in the last two years, I've slowly come to a change in some of my beliefs at a technical level. So I used to think, say, in early 2023, when I started really going deep into these questions, that it would not be possible to control neural nets in the sense of making sure that they would behave in the ways that we want, because of the nature of how they are. They are, like some researchers have said, not designed, but educated, trained.
Starting point is 00:17:07 It's like, you know, growing an animal, a plant, and we're not sure what we're going to get. But I've come to the realization, especially in the last few months with new mathematical results, that actually you can design neural nets. So that's the underlying technology behind all this, deep learning. for which we will have guarantees of good behavior. So it sounds a little bit, you know, strong, but I've been spending most of my time on the research side of my work to figure this out.
Starting point is 00:17:47 And it's not like the road is, you know, there's still a lot to do. But now we've established a lot of the theory and I've created a new organization called Law Zero to actually implement such a methodology, which we call Scientist AI. So tell me more about Scientist AI. And I think even how you're describing, there's still an unknown road ahead. If we were to rewind the clock in AI 20 years ago, nobody would believe what's been achieved
Starting point is 00:18:14 today, right, even with deep learning. If you were looking at 2005, now where it is, right? So what you're saying, although it seems, oh, is this too good to be true? Well, we could have said the same thing in 2005. So tell me more about scientist AI, Law Zero, what is it, how does it work on a practical level? Yes. So first, the angle that we're focusing on at Law Zero is the notion of honesty. Can we train an AI that because of the way it's trained will be completely honest?
Starting point is 00:18:53 I mean, it can still make honest mistakes, just like we all do, but it won't have any kind of intention to achieve something in the world that we haven't chosen. Because that's the issue that we just discussed with LLMs and frontier models right now. They have these goals like self-preservation, sycophancy, which we didn't directly ask for and, in fact, could harm us. It's the existence of these implicit goals, not chosen by humans. That is the issue.
Starting point is 00:19:32 And so the plan here that we are actually implementing now, Law Zero was created about a year ago, and we now have a team of about 35 researchers and engineers. The plan rests on training the AI so that instead of trying to achieve things in the world and thus have preferences for how the world should be, it's trained to explain what it sees. So, an LLM right now, if it reads many times something false like the Earth is flat, it's just going to start repeating it because it's just trained to imitate the kind of things it's seen in the texts.
Starting point is 00:20:19 Instead, the training procedure for the scientist AI would make it try to understand why is it that people are saying those things? You know, is it a conspiracy theory? Is there like a group effect or some psychological factors? Just like a good scientist would. I'm going to give you another analogy. If you go to a psychotherapist and you say that, you know, you have suicidal thoughts and you're like thinking about doing it, The psychologist would not start thinking about killing themselves. They would try to understand what's going on.
Starting point is 00:20:59 They might ask you questions. They would form theories in their mind and eventually would try to help you based on that understanding. The core part of the scientist AI is how do we train AIs so that they will try to explain what they see rather than imitate it, and they will not be trained to achieve goals in the real world, but instead to come up with good explanatory hypotheses, just like a good scientist would, an idealized scientist. Like real scientists, of course, are humans,
Starting point is 00:21:34 and they will have biases, you know, we know that. But we can, mathematically, we can think of, like, what would be an idealized scientist? So that would be detached from the results of any experiment or theory they would come up with. One way to think about this is consider the laws of physics. If you were to apply the laws of physics to make a prediction about the world, you would get the same prediction whether the publication of that prediction would create catastrophic outcomes
Starting point is 00:22:07 or, you know, cure cancer. So think about the laws of physics. If you were to use those laws to form a prediction about the future, you would get the same prediction, whether that prediction ends up being useful or harmful. In other words, the laws of physics don't have any interest in the affairs of the world.
Starting point is 00:22:32 They just give you an honest prediction. And that's what we're seeking with the scientist AI. Now, you might ask, okay, but I actually want things to happen in the world. I need to solve problems. I want to use the AI as a tool. But once you can make good predictions, you can ask the questions that matter to you.
Starting point is 00:22:51 Like, is this action going to achieve my goals? Is this action going to violate my safety instructions? And so now you can use, for example, you could use such a predictor as a guardrail to, as a layer of protection on top of existing AIs so that when the underlying AI would propose an action that is predicted to be harmful in some sense we've chosen, then we can block that action.
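The guardrail idea described here, a predictor that sits on top of an existing AI and blocks actions it predicts to be harmful, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Law Zero's implementation: `guardrail`, `Verdict`, the 0.1 threshold, and the keyword-based `toy_predictor` are all hypothetical stand-ins for a real harm predictor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    harm_probability: float


def guardrail(action: str,
              predict_harm: Callable[[str], float],
              threshold: float = 0.1) -> Verdict:
    """Allow an action only if its predicted harm probability is below the threshold.

    `predict_harm` stands in for an honest predictor; the threshold is an
    assumed policy choice, not a value taken from the conversation.
    """
    p = predict_harm(action)
    if not 0.0 <= p <= 1.0:
        raise ValueError("predictor must return a probability in [0, 1]")
    return Verdict(allowed=p < threshold, harm_probability=p)


def toy_predictor(action: str) -> float:
    """Hypothetical stand-in: flags one obviously destructive command."""
    return 0.95 if "drop database" in action.lower() else 0.01


if __name__ == "__main__":
    for proposed in ["Summarize the quarterly report", "DROP DATABASE production"]:
        verdict = guardrail(proposed, toy_predictor)
        print(proposed, "->", "allowed" if verdict.allowed else "blocked")
```

The design choice worth noting is that the underlying agent is untouched: the guardrail only asks a question ("is this predicted to violate the safety instructions?") and acts on the answer.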
Starting point is 00:23:25 So the core here is to have knowledge about the world, understanding about the world, encapsulated in a model that is not like a person. It is more just the application of, you know, mathematical rules about probabilities and truth and logic. Once we have this, we can use it for, you know, doing useful things in the world. But we need to start with this completely honest piece, and that's what the scientist AI is about. So if I was to, let's put it in practice for somebody. So let's say I use ChatGPT or I use Claude and I ask it to edit my essay. And my essay is actually terrible. But right now, maybe these AI systems, they're
Starting point is 00:24:17 sycophantic, they're designed to flatter you, or not necessarily intentionally by design, but they end up doing that. If scientist AI was a part of that equation, where would I be interacting with it? And how would it differ from the answer that, say, ChatGPT gave me and told me it's great, it's never been better? So you might not see it as a user. What would happen behind the scenes is the scientist AI guardrail would flag a particular sentence that the chatbot would produce as overly sycophantic
Starting point is 00:24:47 in a way that's not helpful to the goals that you really want, which is to have a good essay, for example. And then it would communicate that back to the chatbot and then the chatbot would produce something different, right? Knowing that the previous version was too sycophantic. Right. And you can replace essay by somebody with, you know, depressive thoughts and current chatbots might
Starting point is 00:25:14 amplify those thoughts. Here, the guardrail would, you know, send back a proposed sentence to the chatbot, reminding it that this is going against some of the safety rules. And right now, when you catch a chatbot doing one of these bad things, especially on sycophancy, it'll, you know, correct itself because it also wants to please. And so one way to please is to do the things that we said, you know, it should do, but sometimes it needs to be reminded. In some other kinds of applications, it, you know, it might be more tricky and you
Starting point is 00:25:52 might have to actually block the action completely, like if an AI is trying to destroy your database, an AI agent, because of some crazy reason, you know, maybe self-preservation, I don't know. So in some cases, it may need to take stronger action, but in most cases, it would just reflect that back to the chatbot or the agent so that a different course would be taken. Right. So scientist AI is a guardrail that, if somebody is about to spiral down into AI psychosis, would prevent that by keeping the AI systems that you're interacting with on your phone, on your computer, in check. And what about when it comes to rogue AIs that have that tendency for self-preservation and you end up in blackmail, deception, not wanting to be shut off?
Starting point is 00:26:36 So probably you would like outright prevent the action or maybe like if it's very serious, like the things you're talking about, you know, call upon a human engineer or something, maybe stop the AI temporarily until a human looks at the dialogue or the actions. The important part here is to be able to detect that there's a problem because you have to remember we're moving into a world where all of these AIs are doing zillions of things with billions of people, and there's no one checking every output, every action. And there can't be.
Starting point is 00:27:15 There's just not enough people to do that. So what you want is more on-the-fly verification that we're not violating any of the safety instructions or the ethical instructions that we have chosen. And would scientist AI have to keep up with the intelligence frontier of the frontier models and the way that they are deceiving us? If they continue to advance, isn't there a possibility they would continue to try to deceive scientist AI if they need to achieve that goal?
Starting point is 00:27:45 Yes. Yes. So the guardrail is only the first step. If you think about like superintelligence and stuff, then it might not be sufficient to have such a guardrail because it's still a neural net, or maybe one with some extra machinery. And so another neural net that's even smarter could, through trying things that didn't work, eventually find a loophole in the detection mechanism of the guardrail, which is already happening right now with the existing guardrails. So the solution that I see, and that I'm starting to work out the theory of, is you have to change both the AI agent and the guardrail.
Starting point is 00:28:28 So there are guardrails right now, and we're proposing to make them more honest, so we can have guarantees that they will behave well and not have their own intentions to do something bad, like to let go, to collude with the AI or something. But we also need to change the underlying AI so that it won't have the intention of, you know, bypassing the guardrail or something like this. So I think that's also feasible. It's more downstream of our research program because we think that we're not at superintelligence level and there are practical things in the short term that could be done to mitigate some of the existing risks. So part one and in the near term
Starting point is 00:29:12 you would install scientist AI, and this hopefully would be sufficient for the challenges that we're experiencing today, from psychosis to these kinds of blackmail scenarios. As AI becomes more and more intelligent, let's say three, five, 10 years down the line, there's also a technical path that you're taking and that we should be pursuing for if and when we get to that point of superintelligence. That's right. So why wouldn't the AI labs be on board with this? What would prevent them from wanting to use scientist AI to make their own systems more
Starting point is 00:29:43 reliable and/or just be pursuing a more reliable path long term? Well, that's a good question. I'm not in their mind. I can hypothesize. I mean, I can see that they're in a very fierce competition with their peers, with the other labs, with, you know, the Americans and the Chinese are competing as well and so on. And the stakes are high in that competition.
Starting point is 00:30:12 The stakes are survival as far as these companies are concerned. The stakes of the geopolitical competition between China and the US are also very high. You know, we can end up in a world where, you know, one nation dominates the world. I don't think that's a good plan. But competition means very, very short-term choices and a focus on how do we patch the existing approach and not start something completely different so that you'll have both greater capability but also control a bit some of the malfunctions that we currently see already. So I understand that even with goodwill, you are kind of trapped in this situation, which is one of the motivations for Law Zero being a nonprofit organization so that we're not under that kind of pressure.
Starting point is 00:31:11 But instead we can focus on the science of like understanding what is going on and studying an approach to avoid altogether by design these issues. Right. A lot of the labs have a fiduciary duty to shareholders, legally speaking. Yes, yes. Isn't it in everyone's best interest to have a reliable AI or in a world where you have a scientist AI and then eventually longer term, something that's technically sound, even if it's super intelligent, is there anything that we would lose because you hear, okay, we need to solve all these scientific challenges and sometimes we get these incredible breakthroughs. In a world where you have a guardrail, do you lose some of that?
Starting point is 00:31:48 Because I'm still not fully seeing what the downside would be of making these systems more reliable, or do we just need the public to be more aware of a technical solution to start advocating for it? So some of your audience may know about the tragedy of the commons and the prisoner's dilemma. They are well-studied scenarios, theoretical scenarios, but that reflect a reality we already see in many ways in our world. Think about how nations are not doing the right thing for climate. You might say, well, shouldn't it be in their self-interest to figure out a way that the world is not going to break down, the climate is not going to break down? Well, yes and no.
Starting point is 00:32:29 So in this competition, the self-interest actually leads everyone to a bad place. In the tragedy of the commons, it's in the interest of each farmer to bring their cow to the commons because so long as there is grass, they're going to have an advantage if they do it rather than not doing it. It's like pollution, right? If it's cheaper to pollute, then the self-interest of a company is to continue doing it. And how do you escape such a scenario? Well, individually, the companies can't.
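The logic of the tragedy of the commons can be made concrete with a small payoff sketch. The numbers are invented for illustration (four farmers, a private benefit of 3 per grazing cow, and a cost of 4 per cow shared equally by everyone); they are assumptions chosen to exhibit the dilemma, not values from the conversation.

```python
def payoff(grazes: bool, others_grazing: int, n_farmers: int = 4,
           benefit: float = 3.0, cost_per_cow: float = 4.0) -> float:
    """Payoff to one farmer given their choice and how many others graze.

    Each grazing cow gives its owner `benefit`, but degrades the pasture by
    `cost_per_cow`, split equally among all farmers. All values are assumed.
    """
    total_cows = others_grazing + (1 if grazes else 0)
    private_gain = benefit if grazes else 0.0
    shared_cost = cost_per_cow * total_cows / n_farmers
    return private_gain - shared_cost


if __name__ == "__main__":
    # Whatever the others do, grazing pays more for the individual...
    for others in range(4):
        print(others, payoff(True, others), payoff(False, others))
    # ...yet everyone grazing leaves each farmer worse off than nobody grazing.
    print(payoff(True, 3), payoff(False, 0))
```

Under these assumed numbers, grazing is the better individual choice in every case, which is why no single farmer (or company) can escape the outcome alone; only a change to the payoffs themselves, the "rules of the game", alters the result.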
Starting point is 00:33:04 The only way to escape the scenario is to change the rules of the game. And who changes the rules of the game is government. Now, it's tricky here because even a single government, like the U.S. government, could change the rules of the game for their companies, but they can't change the rules of the game for the Chinese companies and maybe other countries. And so the only way to change the rules of the game is at the international level, through multinational coordination. Right. And let's talk about that because I think it's an important, it's something important to point out because even some of the movements that are local or regional, to say we need to do AI this way or stop AI until we get things together, even just in New York or even in this one city.
Starting point is 00:33:42 This is a technology that you actually have to think about it globally. There's no other choice. You can stop it in one state. It really does nothing for the safety of humanity. Yes. So when you think about a global treaty or something of cooperation, what would have to be included for you to feel like this is going to work out? Because at the end of the day, countries do engage in things like intelligence gathering, espionage.
Starting point is 00:34:07 They hack one another's critical infrastructure. So even if we had, let's say, the technical solution to make AI safe and we installed all the guardrails, countries may not always want to use them unless there's something that everyone's come to the table and signed. What would you say has to be in it for it to work? So I'm going to start with the endpoint. The endpoint is a planet where most of the countries, especially those that have the power to build powerful and potentially dangerous AIs,
Starting point is 00:34:40 agree on three principles and then have the technological tools to make sure, you know, it's not just words on paper. So the three principles are, first, safety. In other words, everyone who builds a powerful AI needs to make sure it's done in a way that is not going to cause severe harm. So, you know, we're seeing right now that the most advanced AI can be used as a weapon through cyber attacks. So that's not good. Like, we need to find ways, which may not be just in the AI itself. It might be how we, you know, we might have to change the internet. We have to maybe change rules, all kinds of things.
Starting point is 00:35:20 So that's safety. And of course, we don't want to build like rogue AIs or stuff like that if we get to superintelligence. So that's one thing. Everyone in the world has an interest in making sure the people building AI will evaluate the risks and monitor them and give us guarantees that nothing bad is going to happen with their work. The second thing is very important.
Starting point is 00:35:46 And it's a commitment that the power of the AIs that are being built, because intelligence gives power, is not going to become a tool of domination. By any one country. By any one country or any one company because domination can be economic domination, like one company basically owning half of the world's economy. This is a dream of the investors who are putting trillions of dollars, right? They want to become the companies that are driving everything on this planet. It's not good.
Starting point is 00:36:20 It's not good for capitalism. It's not good for democracy. It's dangerous from a geopolitical point of view. You know, people will resist that in violent ways, potentially. The third one is sort of the flip side of non-domination, which is kind of benevolence. You want the benefits of AI to be shared. I was at the UN yesterday and the member states expressed in their vast majority the concern that they're going to be left behind,
Starting point is 00:36:55 that the inequalities that exist at a geopolitical level right now are going to be amplified by AI. And this is plausible. I don't have a crystal ball, but we need to make sure it doesn't happen. We need to make sure that the benefits, whether they are in health, education, or productivity or whatever that AI can bring, are shared. For example, if AI makes it possible to save money through automation in one country, the profits should remain in that country so that, you know, it can help the people who lose their jobs. But right now with the laws that exist in the world,
Starting point is 00:37:42 there is a good chance that those profits will go back to the companies in a different country that build those AIs. So that's the third principle. Okay, so it all sounds like, oh, you know, it's an application of the general principles of like the UN Charter, for example, with human rights to the situation of, well, there will be very powerful AIs in the world. How do we get there? Because, you know, we're very far from that world. And I'm going to mention two aspects that I think are encouraging. One is the motivation for the U.S. and China to negotiate.
Starting point is 00:38:25 So because of Mythos, and Mythos is not an isolated thing. There will be more and more AIs that can be weaponized. Right now it's for cyber. Eventually it could be for biological weapons or whatever else we haven't thought of. Because knowledge gives power. Right. And even weak actors, terrorists, cults could use that power in destructive ways just by having an internet connection.
Starting point is 00:38:56 And it's in the interest of the U.S. to make sure the Chinese companies are not creating something that can be used by those weak actors. And vice versa. The Chinese government also doesn't want the American models to be used by third parties against China. You have to realize that a cyber attack against our critical infrastructure could cripple our economy. Like imagine, we don't have access to our money in the banks for a week, or that our transportation supply chain breaks down because of these cyber attacks. So it's pretty serious. But the bottom line is you can see that as AI becomes more powerful, there's an incentive for
Starting point is 00:39:36 countries to sit at the table, to negotiate something to make sure the safety part that I was talking about is going to be handled. The second thing is, even though it seems not in good shape right now, multilateralism still exists. And in fact, the vast majority of countries in the world want a world in which there are rules so that, you know, you can't have a country invading another one just because they want. Right. And they have the power to do it. And that applies to AI because AI is going to be very powerful.
Starting point is 00:40:14 and it could be, you know, weaponized in a military sense or even in a political sense. Because, I just read a recent study showing we've reached a point where AI systems are significantly stronger than humans at persuasion. They can make people change their mind through a dialogue. So imagine the political power this gives if it's, you know, not controlled, if there are no guardrails, to change public opinion in another country or even your country because you want to win the elections, right? So we need to have global agreements about how AI is developed and used to make sure these bad things don't happen. And now you can see that the two things I've talked about interact in a positive way. So it will be in the interest of the leaders, say, U.S. and China, to make sure that in all of the countries, there are rules that, you know, mitigate those risks.
Starting point is 00:41:22 Even the countries that don't build the AIs, they may need to have rules on how people interact on social media or have access to, you know, and use AI systems or whatever. So the trade-off here is, yes, we want to benefit from these AIs, but we don't want other parties to do something that will hurt us. And so we have to agree to common rules. So I think there is a path. Another reason that's related to this multilateralism desire in most of the countries in the world is the fear that, as I mentioned earlier, the fear that they will be left out, that they will be dominated economically. And so you had Canadian Prime Minister Mark Carney, for example, at Davos saying, if you're not at the table, you are on the menu, speaking about countries,
Starting point is 00:42:19 and then saying that the only way for middle powers like Canada, but, you know, think about European countries and, you know, pretty much every other country, the only way to be at the table is to form coalitions, to, you know, form a union of countries who think we need rules and we need to agree globally. And maybe it's going to start with a few countries and grow because you want to be in the club where you know that, at least within the club, AI is not going to be used against you, that the, I don't know, medical advances, thanks to AI, are going to be shared and
Starting point is 00:42:56 all of these good things. And no one is going to build an AI that's going to be, you know, a danger to you. So there's an incentive for the runners-up and the middle powers to move towards such a global coalition. And there's also an incentive for the leaders like China and the U.S. eventually to be part of something like this. Right. It's in every country's best interest to not have the stock market crash because that boomerangs throughout the world. It's in everyone's best interest to not have a cyber weapon that boomerangs around the world.
Starting point is 00:43:31 And we've seen how that goes, you know, with Stuxnet and different weapons. How does any country that's not the U.S. or China actually build negotiating power and leverage? Because we even have seen in the last few weeks with Mythos, one of the most powerful AIs that we know of to date. When it didn't go as planned and the U.S. had asked, the U.S. government had asked Anthropic to change something within the model. And there was some disagreement. I don't think we all know exactly what happened. But the U.S. government said essentially no foreign national can have access to this technology, even if they work at Anthropic.
Starting point is 00:44:03 So you could be an allied country that was using those to understand your vulnerabilities in your critical infrastructure and that model was yanked. So if that is just a preview to how the two most powerful countries when it comes to AI may act if things don't go their way
Starting point is 00:44:18 or they feel like their best interest is threatened, how is it that, if you are Canada, Ireland, Zimbabwe, Tanzania, anybody else, you're going to actually be able to have any power in these negotiations? Yeah. This is an important question. Thanks for asking it.
Starting point is 00:44:33 And again, I'm only like suggesting and hypothesizing. I wrote a paper about this on the need for middle powers, but more broadly all the countries except the US and China to get together, not just to negotiate a treaty, but to build up the cards to be at the table. So actually, just by forming a coalition, these countries already have cards. They have rare earths. They have lithography machines that are used.
Starting point is 00:45:16 Technical talent. Technical talent. They have energy. They have, you know, fabrication plants that build memory. They have different pieces of the puzzle. That make them indispensable. And individually, each of these pieces is not sufficient to be at the table. But once they form a coalition, it's a different game.
Starting point is 00:45:37 So that's one aspect. The other aspect is they can take a chance because we don't know what the timeline is for like super powerful AIs, if ever. They can take a chance to try to build their own models. And in my opinion, they don't have to do it in the same way as the Americans and the Chinese. They can build models that are going to, in a way, be complementary to what the Americans and the Chinese are building, like these safeguards, models that bring something that actually is useful for commercial deployment, like reliability. Like people don't, like companies don't want an AI that starts doing like crazy stuff on their networks and their databases and then with their customers as we're starting to see.
Starting point is 00:46:23 So there's a real demand for this. and if those countries work on research projects that could end up being something useful globally to the leading companies, I think they have a greater chance of being at the table. Right. They have their own strategic leverage. Exactly. And so even beyond, I think we use the idea of sovereignty, but it's much more about strategic indispensability. Yes.
Starting point is 00:46:50 Right? Not every country has everything. And to make that really powerful AI, there are things from other countries you're going to need. And so that becomes non-negotiable. Yeah. There are other things they need to do, which is accelerate the development of data centers and also do it in a way that they will keep some sovereign control over these data centers. So if you're a government of, say, France or Germany, you don't want your government's data, information,
Starting point is 00:47:26 interactions with civil servants and politicians that are happening through a chatbot to at any moment become accessible to the US government. So these governments are really worried. It's not just yanking. It's also having access to all that information. And so I've talked to many governments around the world, and they're really concerned about these issues. So they want to own not just the models, but enough of the infrastructure
Starting point is 00:47:55 that they know that their data is going to remain private, as it should. Unfortunately, there is the Patriot Act, which allows the U.S. government to access all that information, even in a different country, if it is in the hands of an American company. And so from the point of view of these governments, this is unacceptable. And it's, you know, so there's the Mythos events where, you know, the models could be pulled, but even if they're not pulled, there is still the concern that you lose your private, you know, national security information and even, you know, citizens' private data could be used against yourself.
Starting point is 00:48:40 We need a world where this is not possible. And if the US doesn't come up with like legal ways to constrain itself so that other countries will trust the deals that are made about privacy, for example, then the countries will try to, you know, find alternative solutions. And that's what they're struggling to do right now. So if a country doesn't have its own data centers, it's possible that they're making all sorts of advancement in health care or you're using your AI systems for your banking. But if that's an American AI system and it's going back to an American cloud or American data center, you don't
Starting point is 00:49:20 actually have control over your citizens' data, which is a nightmare. And I think a lot of people maybe don't understand that part of it. Yeah. I think for many governments, it's more of a national security issue. I think in Europe they care a lot about privacy, you know, citizens' private data, more for ethical reasons. But it does matter. But as I said, even if the data center is in Europe, let's say, but it's an American
Starting point is 00:49:50 company that builds it, the data can still be accessed by the U.S. government. As I understand the rules, I'm not like a legal expert, but that's what I understand. So from the U.S. point of view and the American companies, I think there should be an incentive to negotiate rules where everybody wins, where people feel like they can trust the American companies, the American models. So how do they trust the American models? Well, they need to make sure those models are not going to be, you know, behind the scenes
Starting point is 00:50:25 oriented towards goals that are not good for them. Because we've seen examples of AI models that were biased politically, for example. Again, if, you know, your population is acquiring information through all these AI models and it's going to like, in a subtle way, change political opinion, that's not acceptable either. So it's also a threat to democracy to not have those levers, to not even know if there's, you know, for sure, no scheme to exploit the power that AI will have over people through all these interactions. Right. The world that AI is going to start to generate for people is going to really shape their perception. And that I think is actually
Starting point is 00:51:16 one of the most under-discussed soft powers of the future, whichever AI systems, your citizens tend to use the most, that's going to deeply shape how they see the world. And you could do it really subtly, even if it's a benign use, a writer building a movie, and they ask for some feedback on their script, you could slowly nudge that writer, that director, towards one worldview over another. And the same kid that's going to generate a story, the person writing an essay, all of these subtle ways to shape perception. Yeah. And you had mentioned superintelligence as a potential future that we move towards.
Starting point is 00:51:51 Do you think that that is probably an inevitability eventually, but we can also, there's a way to build safe superintelligence, but we're probably going to head there at some point? Or clearly it's not inevitable. I think some people would like us to believe that it is inevitable. Yes, that's true. So is it feasible? And even that we're not sure of. But let's say that if you look at the data on increasing capabilities of AI and how many lines of code are being written by AI at Anthropic, for example, and it's growing
Starting point is 00:52:26 very fast, we can't deny that it's a reasonable possibility that we are on track, apparently, to build machines that will be smarter than us in many different ways, in a way that scales. So you can have a million GPUs and then you can do, you know, like what a million people would do or even more. So we should plan for that possibility to be real. It's also possible that there will be a scientific obstacle. Many researchers think, oh, this is never going to work.
Starting point is 00:53:00 This approach, like, oh, LLMs can't possibly be, you know, at human level. Honestly, I don't know, but I see the data and I see that we're going in that direction. So that's feasibility, right? But then I think the more important question is who decides according to, you know, what goals? And for me, that's a democratic question. It's not because we can build a virus that would kill everyone that we should do it, right? Right, exactly. And I mean, so what can we do?
Starting point is 00:53:38 We have a really interesting, and I do want to touch on jobs because that is a throughline on this show. But I also want to touch on this community because we have a really interesting broad audience that listens to the show. On the one hand, you have people that are living their life and trying to understand what's coming next so they can make a difference or adjust, adapt, in their own corner of the world. Then we have world leaders. We have people at NATO, DARPA, all sorts of different people that listen. What can we do tomorrow that can help, that can push this towards some of the futures that you're trying to build, some of the things you're raising awareness about what can we do to make a difference here? So we need more people to understand the stakes that depending on the choices that we are collectively making through our leaders, our companies, individually as consumers, as voters, we are shaping the world, like choosing a future. And if we do it without understanding what that future looks like,
Starting point is 00:54:41 it could be a very bad future. I think right now most people completely underestimate how transformative the advances in AI are likely to be if we continue on the current path of growing AI capabilities. So awareness is like the primary thing. Once you understand that, hey, there's a fire coming to your house, you do something. We've seen that to some extent with climate activism, but we don't have such a thing for AI right now. It's changing, though. I see the polls.
Starting point is 00:55:14 I see like 90% of Americans are concerned. They're mostly concerned about short-term effects, and that's important. But I think if more people understand the longer-term gravity of the situation and the fact that governments are the only ones, whether internally or through international coordination, that can really steer the world in the right direction. I say the only ones because the companies, as I said, they're stuck in the forces of competition. Then it might become a political issue at a level that matters for elections. And I think there's a good chance we will go there. But we need to do it faster.
Starting point is 00:56:04 And so thank you for your work because we need to have democratic discussions, debates. Like, what do we want? I mean, the real question is what kind of future do we want? Yep. Yeah. And I think it's, that's a question that we don't ask enough. I think we, we know the kind of futures we maybe don't want, but what kind of future are we also fighting for? And I've found in my work in foresight, if people don't know, aren't shown any of the visions of the futures that could be possible, for instance, there could be a world in which everybody is rooting for AI and healthcare and AI medical breakthroughs. I think we're all rooting for AlphaFold, right? Everybody wants those types of scenarios. But at what cost? And so if we're not aware that there are people that are working on the technical solutions so it's not zero sum, then the whole thing feels hopeless. And I think that that's my, my,
Starting point is 00:56:58 my worst fear is that people check out because they feel like there's no path to root for when there are several paths and there are people. We don't do the best job of uplifting the voices that are fighting for something. But I think we need to make that more known, that there are solutions and people are building them. Exactly. So we can build tools based on AI that will be extremely useful and especially in scientific research. But that is very different from the current path in which we're trying to build like new entities that look like people that, you know,
Starting point is 00:57:33 people become friends with. Is that necessary? I mean, people, of course, are attracted to that, just like they're attracted to interact through social media. It feels good. But is that really good for us? Studies suggest no. And we need more scientific understanding.
Starting point is 00:57:53 But we should have the data to be informed of those possibilities. We should encourage explorations, research into AI that will be beneficial and not dangerous. And we need to see that it's, I think the most important thing I realized is we need to make these discussions very concrete. Because when it comes to political views, and changing political views or, you know, having an issue rise in the minds of people,
Starting point is 00:58:34 if it's very abstract about some future, you know, it just doesn't work. So we need to find a way to communicate that speaks to people. So currently we're seeing on the political scene, the issues of AI with children, with people harming themselves or others with the help of AI as an example,
Starting point is 00:58:58 of something that touches us because we care for our children. Like the reason I'm in this, both at a technical level and, you know, on the policy side of things, is because of my children. Like, I didn't need to do that. Like, I'm at the end of my career.
Starting point is 00:59:14 I have, you know, I'm the most cited according to Google Scholar. I don't need any of this. I could take it easy. But I couldn't live with myself thinking that we're apparently going towards this potentially bad future and not doing anything about it. And every one of us can do something, just like every one of us can do something for any political cause. And it is political.
Starting point is 00:59:42 It is political because the key decisions are going to be the ones taken by governments. Right. And this is where democracy is so key. It needs to be something that's on the ballot. And I think, again, bringing it down for people in a way that people can understand. and in a way that they don't feel paralyzed by the information, there are people working on solutions, there are options. And I think that is the whole thing with expanding the decision space for people, right?
Starting point is 01:00:08 There are options. Companies are also deciding to build AI in certain ways or not build AI in other ways. So people are making choices. And anywhere there is a choice, there's a different future to move towards. I'm going to, I've talked about like the children issues, but let me talk about the labor issues. Yes. If the only forces at play are the competition between companies, then all the jobs that can be automated or simply made more efficient with less people and more machines, they're going
Starting point is 01:00:42 to be automated. Because if you're a company and you don't do it and your competitors do it, you lose. So again, it's this tragedy of the commons issue. Even though maybe it's not good for our society to have such a rapid transition where lots of people are in the street and have no revenue. So, you know, there's also benefits to making things more efficient, but we have to be in control of those changes so that it's done in a human-centric way. For example, maybe there are jobs that make sense to automate because actually people
Starting point is 01:01:22 don't really want to do those jobs. And maybe others, we really want to value them more. And maybe there are ways for governments to steer in a way that those transitions will be at a pace that, you know, people can adapt. Maybe there will be more demand for jobs that require human-to-human skills. I expect something like this, as many people do, but people will need time to, you know, retrain. And that raises the other issue with the labor transformation that's plausible. Again, we don't, you know, it hasn't happened at that scale yet, which is how do we make sure that the profits made from the automation end up helping the people who lose their jobs?
Starting point is 01:02:15 That's fundamental. And the current government programs and even at the, you know, even at the EU, international level don't really have good answers to this. I don't think we are currently in a political culture where it would be easy to tax those companies heavily. But what else can we do? I mean, the money is going to go in one place, but the need is in a different place. So we should think of what are the options? And people have proposed options like giving shares of these companies to individuals or to governments. And I don't know what is right. Like I'm not a political scientist. I'm not an economist. But I'm just raising that this is a question that needs to be discussed and that
Starting point is 01:03:06 we need to have a democratic discussion about this. And then there is the international aspect of this, right? Because if people are losing their job in country A while the profits are in country B, how do the workers in country A, you know, survive? We saw this with globalization too. So we need a way to make sure all boats are lifted and we don't because it's also bad for country B because if people in country A are so angry and revolt, there's going to be violence, right? And there's going to be terrorism. And, you know, it's not good for us. And we talk, we do talk about jobs a lot on this podcast. And we've had, you know, people that you also know, Ajay Agrawal, Avi Goldfarb.
Starting point is 01:03:46 So we have a lot of economists here to discuss it. And of course, the jury is still out. Nobody can predict how the economy is going to reconfigure. It will, and there will probably be new strange things that people do the way we're here in the strange podcast room and made all this up. But the transition, I think, could be really rough. It doesn't have to be, right? And this is one of those futures that we can see that things are going to change, regardless of whether you believe all new jobs are coming, they're going to be incredible, or maybe not as many jobs, or maybe we have a two-day work week, whatever you think is coming.
Starting point is 01:04:15 I'm worried about this transition period. And I actually don't even think it should just be people who lose their job because everybody's data has been a part of the making of these AI systems. None of these systems would be as successful or as good as they are without your data, my data, anyone who's ever posted anything anywhere, written anything. Yes, including in the last few centuries. Completely, right? So humanity's cultural heritage is currently being exploited to build those systems. So who owns this? Why would one group make so much money out of all that heritage?
Starting point is 01:04:51 Everybody should get a part of it. And I also don't think we should wait until chaos happens or wait until when people are disempowered. If there's an elevator going up to prosperity on the back of these technologies that everybody helped build, the entirety of society should be on that elevator. Yeah, I completely agree. And for some of these things to work, like some of the shares of those companies somehow ending up helping people, well, we better do it now before, you know, those shares go through the roof for this plan to work. Yeah. And it's not even just UBI. I think that can also be a scary future for people. We don't necessarily want to be in our pools floating around, getting a check from OpenAI. There can be all different innovative ways that we could structure this that could also lead to people making competitors to these companies,
Starting point is 01:05:46 right? If you give people the strength now, they'll be much, they'll have much more agency to take shape or to take part in what's coming. But I do think that that is a conversation that needs to happen now and not wait for the disempowerment. And the other reason it needs to happen now is that democracy is slow. Yeah, very slow. A debate takes time and people like reading and hearing about different views. It takes time to change their minds, to understand what's going on. And then bureaucracies are slow. You know, governments, it takes years for a law to go from conception to being applied. But years might be the scale at which this transformation is happening, not months, but not decades.
Starting point is 01:06:31 Yeah. Yeah. And I mean, even if, even if it all works out, we should all still get a say just to keep power in check as well. So in this moment, I know you said your grandson, I've heard you talk about your one grandchild. Yes. And that he was quite a big, big catalyst for your pivot and the research that you're doing today. As you said, I mean, compared to many of us, you're quite successful. If anybody could kind of hang it up, it would be you. And you're continuing to go down this path. What would you hope his future looked like, right? If you were thinking about what he may be studying or how he might spend his time, what do you envision in that future? Wow. I've been focusing more on the futures that I don't want him to be part of. But I think there's a really incredible positive potential. Now, there's the usual ways that people talk about, you know, health, education. But there is also a world where, you know, we have no choice but peace. So, see, we are building these machines that could become extremely powerful and could become weaponized. And you can think of it a little bit like, you know, what happened with nuclear weapons.
Starting point is 01:08:04 And if we don't find a way to make sure they're not weaponized, we might all lose. I heard you speak about, you know, biotech. And I heard about, from a UN committee, mirror life. In other words, building, say, bacteria that would be not visible to our bodies and that could be designed in the coming few years or decade. And AI could help to do that. And it could wipe out all of life on this planet. So why I'm saying this is we may not have a choice.
Starting point is 01:08:52 Either we find a way to make all of humanity benefit, or we may all lose. I don't know what's going to happen. But there's an incentive. Right. It's in everyone's best interest, truly everyone's, to get this right. And we may end up, to go back to your question about my grandson, we may end up in a world
Starting point is 01:09:22 that's actually much better from a human point of view in terms of peace, in terms of respect for each other, in terms of diversity, in terms of, of course, well-being in a material sense, in an educational sense, in a medical sense, compared to the world today where if you think about most people on earth, including in the United States, there's so much stress, so much uncertainty about the future, so much fear of, you know, losing your job, so much concern of not being able to, you know, buy food next week or pay your rent. It doesn't have to be that way. And technology, if we can govern it right, can bring all this. But there's this big if that we have to take seriously.
Starting point is 01:10:16 And I think humanity has been in really tough situations before and governance has happened, treaties have happened, negotiations have happened. And of course, this is a different time than let's say some of the nuclear arms races where there were active standoffs and people still came to the table.
Starting point is 01:10:32 So it is possible, right? Is it probable? How likely, how quickly? But if something's possible, it's, I think, at least worth us trying, and we don't really have a choice. That's exactly my philosophy. We don't know if we're going to be successful in bringing this beautiful world or at least avoiding terrible futures. But it is plausible.
Starting point is 01:10:55 It is something that's worth a shot, that's worth fighting for. For our children, for our grandchildren, for ourselves. And there are so many people, I know many listening today, that that's what they're willing to wake up and fight for. And people like yourself. Professor Bengio, it's been a pleasure. Thank you so much. Thank you.
