Big Technology Podcast - AI's Rising Risks: Hacking, Virology, Loss of Control — With Dan Hendrycks
Episode Date: March 26, 2025. Dan Hendrycks is the Director and co-founder of the Center for AI Safety, and an advisor to Scale AI and xAI. He joins Big Technology Podcast for a discussion of AI's growing risk profile, and what to... do about it. Tune in to hear Hendrycks explain why virology expertise in AI models is an immediate concern and how these systems might soon enable devastating hacks. We also cover intelligence explosion scenarios, the geopolitical implications of AI development, and why an international AI arms race could lead to faster development than the world can handle. Hit play for an insider's perspective on how governments and AI labs are wrestling with unprecedented technological power that could reshape global security. Connect with Dan: https://www.nationalsecurity.ai/ https://x.com/DanHendrycks
Transcript
Now that artificial intelligence has tried to break out of a training environment,
cheat at chess, and deceive safety evaluators, is it finally time to start worrying about the risk
that AI poses to us all? We'll talk about it with Dan Hendrycks, the director of the Center for
AI Safety right after this. Welcome to Big Technology Podcast, a show for cool-headed, nuanced
conversation of the tech world and beyond. Today we are joined by Dan Hendrycks. He is the director
of the Center for AI Safety and an advisor to Elon Musk's xAI
and also an advisor to Scale AI
here to speak with us about all the risks
that AI might pose to us today and into the future.
Dan, it's so great to see you. Welcome to the show.
Glad to be here. It's an opportune moment to have you on the show
because I'm recently doom-curious. And I'll explain what that means.
So I had long been skeptical of this idea
that AI could potentially break out of its training set
or out of the computers and start to potentially even harm humans.
I still think I'm on that path, but I'm starting to question it.
We've recently seen research of AI starting to try to export its weights in scenarios where
it thinks it might be rewritten, trying to fool evaluators, and even trying to break a game
of chess by rewriting the rules because it's so interested in winning the game.
So I'm just going to put this to you right away.
Is what I'm seeing in these early moments of AIs trying to deceive evaluators
or trying to change the rules that they've been given
an early sign of us having AI as an adversary and not as a friend?
The easier way to see that it could be adversarial is just if people maliciously use
these AI systems against us.
So if we have an adversarial state trying to weaponize it against us,
that's an easier way in which it could cause a lot of damage to us.
Now, there is an additional risk that the AI itself could have an adversarial relation to us and be a threat in itself, not just the threat of humans in the forms of terrorists or in the forms of state actors, but the AIs themselves potentially working against us.
I think those risks would potentially grow in time.
I don't think they're as substantial now compared to just the malicious use sorts
of risks. But yeah, I think that as time goes on and as they're more capable, if some
small fraction of them do decide to deceive us or try to self-exfiltrate themselves or develop
an adversarial posture toward us, then that could be extraordinarily dangerous. So it depends.
So I want to distinguish between what things are particularly concerning in the next year versus
somewhat more in the future. And I think in the shorter term it is more of this malicious use,
but that's not to downplay the fact that AIs can be threats later on.
Now, from what I understand from your first answer,
you are concerned about both the way that humans use AI
and AI itself sort of taking its own actions,
our loss of control of artificial intelligence.
So can you just rank sort of where you see the problems
in terms of most serious to least serious
and what we should be focusing on?
That's a really good question.
So the risks in their severity sort of depend on time.
Some become much more severe later.
So I don't think AI poses a risk of extinction like today.
I don't think that they're powerful enough to do that.
Because they can't make PowerPoints yet, right?
They don't have agential skills.
They can't accomplish tasks that require many hours to complete.
And so since they lack that, this puts a severe
limit on the amount of damage that they could do or their ability to operate autonomously.
So I think there's a variety of risks. In terms of malicious use in the shorter term,
when AIs get more agential, I'd be concerned about AIs causing cyber attacks on critical infrastructure,
possibly as directed by a rogue actor. There'd also be the risk of AIs facilitating the development
of bio-weapons, in particular pandemic-causing ones, not smaller-scale ones like anthrax.
Those are, I think, the two malicious use risks that we'll need to be getting on top of
in the next year or two.
At the same time, there's loss of control risks, which I think primarily stem from people
at AI companies trying to automate all of AI research and development, and they can't
have humans check in on that process because that would slow them down too much. If you have a
human do a week-long review every month of what's been going on and trying to interpret what's happening,
this would slow them down substantially. And the competitor that doesn't do that will end up
getting ahead. What that would mean is that you'd have basically AI development going very
rapidly where there's nobody really checking what's going on or hardly checking. And I think a loss
of control in that scenario is more likely. Right. And with the Center for AI Safety, we're going to talk
today about risks, but we're also going to talk about solutions. And with the Center for AI
Safety, what you're doing is basically pointing out the risks and trying to get to solutions
to these problems. You told me you were just at the White House yesterday, the day before we
were talking. So this stuff is something that you're actually working towards mitigating. And I
think we're going to get to that in a bit. But first, let's talk a little bit through some of the
risks that you see with AI and how serious they actually are. One of them that just jumped out
for me right away was bio, creating bio weapons.
Let me run you through what I think the scenario could be in my head, and you tell me what I'm missing.
With bio-weapons, you'd basically be prompting an LLM to help you come up with new biological agents, effectively,
that you could unleash against an enemy.
And I think wouldn't that be predicated on the AI actually being able to come up with biological discoveries of its own?
Right now, current LLMs,
they don't really extend beyond the training set.
Maybe there's an emergent property here or there,
but they haven't made any discoveries,
and that's sort of been the big knock on them to this point.
So I am curious, if you're talking about immediate risks
and one of them being, okay,
there could be bio-weapons that are created with AI,
doesn't that suppose that there's going to be something
much more advanced than the LLMs that we have today?
Because with current LLMs, to me it's basically like Google.
It's a search for what's on the web, and it can produce what's on the web, but it's not coming up with new compounds on its own.
Yeah, so I think that for cyber, that's more in the future, but I think virology, expert-level virology capabilities, are much more plausible in the short term.
So, for instance, we have a paper that'll be out maybe in some months, we'll see, but most of the work for it has been done.
And in it, we have Harvard and MIT expert level virologists sort of taking pictures of themselves in the wet lab and asking, what step should I do next?
So can the AI, given this image and given this background context, help guide through step by step these various wet lab procedures in making viruses and manipulating their properties?
And we are finding that with the most recent reasoning models, quite unlike the models from
two years ago, like the initial GPT4, the most recent reasoning models are getting around 90th
percentile compared to these expert level virologists in their area of expertise.
So this suggests that they have some of these wet lab type of skills, and so if they can
guide somebody through it step by step, that could be very dangerous. Now, there is an ideation
step, but that seems like a capability, them doing brainstorming to come up with ways to
make viruses more dangerous. I think that's a capability that they've had for over a year,
the brainstorming part, but the implementation part seems to be fairly different. So I think in bio,
actually, I would not be surprised if in a few months there's a consensus that they're
expert-level in many relevant ways and that we need to be doing something about that.
Wow, that's crazy to me because I would think it would be the opposite, right?
That cyber would be the thing that we need to be worried about because these things code so well,
not virology.
So I just want to ask you.
But on that, biology has been such an interesting subject because they just know the literature
really well.
They know it inside and out, and they've got a fantastic memory,
and they have so much background experience.
For some reason, it's been
their easiest subject historically,
biology and virology, in earlier forms of measurement.
Like if you see how they do on exams,
but now we're looking at their practical wet lab skills,
and they have those increasingly as well.
So what about the evolution of the technology?
Because this is all with large language models, right?
Reasoning is just something that's taking place
within a large language model,
like GPT, which powers ChatGPT.
So what is it about the current capabilities
that have increased to the point
where they're now able to guide somebody
through the creation or manipulation of a virus?
That seems to be like a step change in capability.
Well, now they have these image understanding,
image understanding skills.
So that's something that they didn't use to have.
That makes it a lot easier for them to do guidance
or sort of be an apprentice or sort of a
guide on one's shoulder saying, now do this, now do that.
But I don't know where that came from, that skill.
They've just trained on the Internet, and maybe they read enough papers and saw enough
pictures of things inside those papers to have a sense of the protocols and how to troubleshoot
appropriately.
So since they've read basically every academic paper written, maybe that's the cause of it.
But it's a surprise.
I mean, I was thinking that this practical,
tacit knowledge or something
wouldn't be something that they would pick up
on necessarily. It'd make a lot
more sense for them to have
academic knowledge,
knowledge of vocab words and things like
that. So I don't know where it came from.
It's there. Right, but this is still
all stuff that is known
to people. It's not like the AI is coming
up with new viruses
on its own. Well, so... You can't,
like, prompt whatever GPT
it is and say, create a new coronavirus.
So if you're saying, I'm trying to modify this property of the virus so that it has more
transmissibility or a longer stealth period, then I think it could, with some pretty easy
brainstorming, make some suggestions, and then if it can guide you through the intermediate
steps, that's something that could make it be much more lethal.
I don't think you need breakthroughs for doing some bioterrorism, generally.
The main limitations for risk, generally, will be capability and intent.
And historically, our bio-risks have been fairly low because the number of people with these capabilities
has been very small, maybe a few hundred top virology PhDs, and then a lot of them
just don't intend to do this sort of thing.
However, if these capabilities are out there without any sorts of restrictions and
extremely accessible, then, you know,
your risk surface is blown up by several orders of magnitude.
A solution for this, to let people keep access to these expert-level virology capabilities,
is that they can just speak to sales or ask for permission to have some of these guardrails
taken off.
Like, if they're a real researcher at Genentech or what have you, wanting these expert-level
virology capabilities, then they could just ask and then, like, oh, you're a trusted
user, sure, here's access to these capabilities.
But if somebody just made an account a second ago, then by default, they wouldn't have access to it.
So for safety, a lot of people think that the way you go about safety is, you know, slowing down all of AI development or something like that.
But I think there are very surgical things you can do where you just have it refuse to talk about topics such as reverse genetics or guide you through practical intermediate steps for some virology methods.
And wait, those safeguards don't exist today?
At xAI, they do.
You're an advisor at xAI?
Yeah, yeah, yeah.
But what were the models that you were testing to try to find out whether they would help with the enhancement of viruses?
We tested pretty much all of the leading ones that have these sort of multimodal capabilities.
And they'll have some sort of safeguards, but there are various holes.
And so those are being patched; we've communicated that, hey, there are various issues here.
And so I'm hopeful that very quickly some of these vulnerabilities will be patched.
And then if people want access to those capabilities, then they could possibly be a trusted third-party tester or something like that
or work at a biotech company and then those restrictions could be lifted for those use cases.
But for random users, we don't know who they are, asking how to make some virus more lethal or something, or, sorry, some animal-affecting virus,
just have the model refuse on that. That seems fine.
Yeah, we do see the benchmarks come in with each model
release, and it's like, oh, now it's scored 84th or 90th percentile or 97th percentile on this
math test or on this bio test. And for us, it's like, oh, that's the model doing it. But what
you're trying to say is, and correct me if I'm wrong, if it's getting 90 percent of the way
that an expert virologist might get, then it could take a crafty user, you know, a number of
prompts effectively to find their way towards that 100 percent, because if they try it enough
times they might accidentally get to the, not accidentally, but they might end up getting
the bad virus that we're trying not to have the public create. Yeah. Yeah. So this is
what concerns me, like, quite a bit, and I'm being more quiet about this just to, you know...
Well, you're talking about it. Yeah, I suppose I'm talking about it now, but I'm not, you know,
there are orders of magnitude with it. It's being taken care of at xAI, and this is sort of
in our risk management framework there. And other labs are taking this
sort of stuff more seriously or finding some vulnerabilities, and then they're patching them.
So I'm being non-specific about some of the vulnerabilities here, but hopefully I can
provide more precision once they have that taken care of.
Okay, I look forward to reading the paper.
You're an advisor to Scale AI.
They are a company that will give a lot of Ph.D. level information to models in post-training,
right?
So you've trained up the model on all of the Internet.
It was pretty good at predicting the next word.
And then it needs some domain-specific knowledge.
Scale, from my understanding, has PhDs and really smart people,
writing their knowledge down and then feeding it into the model to make these models smarter.
How does a company like Scale AI approach this?
Do they, like, have to say, all right, if you're a virology PhD,
we shouldn't be fine-tuning the model with your information.
Like, what's going on there and how are you advising them?
So I've largely been advising on measuring capabilities and risks
in these models. So we did, for instance, a paper together last year on the weapons of mass destruction-related knowledge
that models would have. And for that, we were finding a lot of the academic
knowledge or knowledge that you would find in the literature. Like, does it really understand
the literature quite well? And we were seeing that for biology and for
bioweapons-related papers, they did. However, this just tested their knowledge, not their
know-how. So that's why we did the follow-up paper to see what their actual wet lab
know-how skills are. And those were lower, but now they're higher. And so now those vulnerabilities
need to be patched, and those patches are, I gather, underway. So we've also worked on other
sorts of things together, like in measuring the capabilities of these models, because I think it's
important that the public have some sense of how quickly AI is improving and what level it's at currently.
So a recent paper we did together was Humanity's Last Exam, where we put together various professors
and postdocs and PhDs from all over the world, and they could join in on the paper if they
submitted some good questions that stump the AI systems.
And I think this is a fairly difficult test, so it was: think of something really difficult
that you encountered in your research and try and turn that into a question.
And I think each person, each researcher probably has one or two of these sorts of questions.
So it's a compilation of that, and I think when there's very high performance on that benchmark,
that would be suggestive of something that has, say, in the ballpark of superhuman
mathematician capabilities.
And so I think that would revolutionize the academy quite substantially because all the theoretical
sciences that are so dependent on mathematics would be a lot more automatable.
You could just give it the math problem and it could probably crack it better than nearly
anybody on Earth could.
So that's an example capability measurement that we're looking at.
We excluded from Humanity's Last Exam any virology-related skills.
So we were not collecting data for that because we didn't want to incentivize the
models getting better at that particular skill through this benchmark.
And how's the AI doing today on that exam?
They're in the ballpark of like 10 to 20% overall, the very best models.
So it'll take a while for it to get to 80 plus percent.
But I think once it is 80 plus percent, that's basically a superhuman mathematician is one
way of thinking of it. But the thing is, they're at 10 to 20% now. And many experts within the
AI field, the practitioners, we had Yann on a couple weeks ago talking about how we're getting
to the point of diminishing returns with scaling, right? That current growth trajectory of,
or the current trajectory of generative AI in particular, is limited because basically the labs
have maxed out their ability to increase its capabilities. So I'm curious what you think,
whether you think that's right, because you're obviously working with these companies, working with
xAI, you're working with Scale.
If we are getting to this data wall or some wall or some moment of diminishing marginal
return on the technology, is it possible that all this fear is somewhat misplaced?
Because if the AI is not going to get much better than it is right now, at least with the
current methods, you know, we may not be a year or two away from AGI, right?
We may not be getting AGI at the end of 2025, like some people are suggesting.
And so then maybe we shouldn't be as afraid because, again, the stuff is limited.
Yeah, so if we were trapped at around the capability levels that we're at now,
then that would definitely reduce urgency and, you know,
I mean, one could chill out a bit more and take it easy.
But I'm not really seeing that.
I think maybe what he's referring to is the sort of pre-training paradigm,
sort of running out of steam.
So if you take an AI, train on a big blob of data
and have it just sort of predict the next token
to do what basically gave rise to older models like GPT4,
that sort of paradigm does seem like it's running out of steam.
It has held for many, many orders of magnitude,
but the returns on doing that are lower.
That is separate from the new reasoning paradigm that has emerged in the past year,
which is where you train models on math and coding types of questions with reinforcement learning.
And that has a very steep slope.
And I don't see any signs of that slowing down.
That seems to have a faster rate of improvement than the pre-training paradigm,
the previous paradigm had.
And there's still a lot of reasoning data left to go through
and do reinforcement learning on.
So I think we have quite a number of months
or potentially years of being able to do that.
And so personally, I'm not even thinking too specifically
about what AIs will be looking like in a few months.
They'll be, I think, quite a bit better at math and coding.
But I don't know how much better.
I'm largely just waiting because the rate of improvement is so high and we're so early on in this new paradigm that I don't find it useful to try and speculate here.
I'm just going to wait a little while to see.
But I would expect it to be quite a bit better in each of these domains, in these STEM domains.
Right.
I guess reasoning does make it better at the areas that you're mostly concerned about.
Yeah, yeah, that's right.
Because when it goes, tell me again if I'm wrong, when it goes step by step,
it's much better at executing and working on these problems than if it's just printing answers.
Yeah, and there is a possibility, and this is sort of a hope in the field, I don't know whether it will happen,
that these reasoning capabilities might also give rise to these agent types of capabilities,
where it can do other sorts of things like make a PowerPoint for you
and do things that would require operating over a very long time horizon.
Potentially those would fall out of this, that skill set would fall out of this paradigm,
but it's not clear.
There has been a fair amount of generalization
from training on coding and mathematics
to other sorts of domains like law, for instance.
And maybe if those skills get high enough,
maybe it will be able to sort of reason its way
through things step by step
and act in a more coherent,
goal-directed way across longer time spans.
I'm going to try to channel Yann here a little bit.
I think he would say that this is still going to be constrained
by the fact that AI has no real understanding
of the real world.
Well, I don't know.
It sounds like almost a no true Scotsman type of thing.
Like, it's like what's real understanding?
Right, okay.
Like, um, let me give you an example.
If it's, if it's sort of like, if it can do the stuff, that's what I care about.
But if it like doesn't satisfy some like strict philosophical sense of something,
you know, some people might find that compelling, but I don't.
I'll give you an example, like with the video generators.
Like if AI really understood physics, uh, then, you know, when you try it, when you say,
give me a video of a car driving through a haystack,
it will actually be a car driving through a haystack,
as opposed to what I've done, which is give it that prompt
and it's just hay exploding onto the front of a car
with perfectly intact hay bales in the background.
I think that for a lot of these sorts of queries,
at least with images, for instance,
we'd see a lot of nonsensical arrangements of things
and things that don't make much sense
if you look at it more closely.
But then as you just scale up the models
then they tend to just kind of get it increasingly.
So we might just see the same for images,
or excuse me, for video.
I think as well they have like some good world model stuff.
Like they'll have like vanishing points being more coherent.
And like if I were drawing or anything like that,
I'd probably be lacking, you know,
lacking in understanding of the physics and geometry of the situation
and making things internally coherent relative to them.
So, I don't know, yeah, they seem pretty compelling and have a lot of the details right,
including some of the more structural details.
But there'll be gaps that one can keep zooming into.
But I just think that that set will keep decreasing, as was sort of the case with images and text before.
I mean, text, back in the day, got the same argument.
It doesn't have a real understanding of causality.
It's just sort of mixing together words and whatnot.
And that was when it was barely able to construct sentences coherently.
Now it can.
And then, yeah, and now it can.
So I don't know if it, like, then got a real understanding in the sort of philosophical sense that he's thinking for language, but it was good enough.
And that might be the case with video as well.
There were points where I was like, oh, but it is getting the guy sitting on the chair when I say, you know, do a video of a guy sitting on a chair and kicking his legs.
And those legs are kicking.
And they are bending at the joints.
So there must be some understanding there.
Yeah, in some ways.
But if you ask them to do like gymnastics
then it'll just have legs flailing all over.
No, the person just disappears into the floor.
Okay, like you said at the beginning,
where ChatGPT isn't going to kill us yet.
Let's talk about hacking.
I do think that we glossed over it a little bit before,
but in terms of, we're now going through, I think,
the humans-plus-AI problem, right?
And hacking to me is one that I think we should definitely focus on.
You mentioned that we're still not quite there,
but it does seem to me, again,
I'm just going to go back to the point I made earlier,
you can really code stuff up with these things,
and they enable, like, pretty impressive code
already. You could think that
ChatGPT could produce
pretty good phishing emails if you just kind of
creatively, and not just ChatGPT but all
of these GPT models,
if you creatively prompt it right,
it will give you an email that you can send and try
to phish somebody.
Or even, let's say, you just take an open source model
like DeepSeek, download it, and then
run it without safeguards.
So where's the risk with hacking?
I know you said it's a little bit further off.
Why is it further off?
And what should people be afraid of?
Or what should people be concerned of?
Yeah, yeah.
So the risk from it, more of the risk comes from when they're able to autonomously do the hacking themselves.
So trying to break into a system, finding an exploit, escalating privileges, causing damage from there, things like that.
And that requires multiple different steps and these agential skills that I keep referring to that they currently don't have.
So although they could facilitate in, like, ransomware development and other forms of malware,
for them to autonomously execute and infiltrate systems, that is something that will require these new agential skills.
And I don't see, it's very unclear when those arrive.
Could be a few months from now, could be a year from now, and I'm a little more
suspicious it maybe would even take two years for that. So that's something for us to get
prepared for, figure out how we're going to deal with that, try and make safeguards increasingly
robust to people trying to maliciously use it in those ways. But yeah, I think much of the
risk comes from being able to take
one of these AIs, let's say one of these DeepSeek AIs, let's say it's a DeepSeek agent version,
and it's able to actually do these cyberattacks.
Then you could just run 10,000 of them simultaneously.
And then you, you know, some rogue actor could have it target critical infrastructure.
Then this is causing quite severe damage.
So for, like, critical infrastructure, you know, this could be like, have it disable the detector
or the filter in, you know, a water plant or something like that.
Then the water supply is, like, ruined.
or you could target these thermostats in various homes because they're, you know, often some of the more advanced ones are connected to Wi-Fi.
And then you sort of turn them up and down simultaneously.
And this can just like ruin like transformers and like blow them.
And then, you know, they take multiple years to replace.
Things like that.
And but they aren't capable of doing that sort of thing currently.
So it's more of a on the horizon type of thing.
But I'm not like feeling the urgency with that currently.
I'm more concerned about, I think,
the geopolitics of this,
like making sure that states are aware of what's going on in AI,
like they're at least able to follow the news and things like that in some capacity.
I think things like that feel more urgent to me than trying to address cyber risks.
There are things to do, though, and I think we should create incentives beforehand, but
maybe I'm too much of an optimist for my own good, but when I hear you talk about this,
I also get a little bit excited about the capabilities of these programs, because, for instance,
if AI can enhance the function of a virus, AI can probably create a vaccine, make medical
discoveries.
If AI can hack into the infrastructure of some country, right, find exploits and
turn the thermostats up and down, then AI could probably do incredible amounts of very
beneficial coding and computer work for humanity. So if we do get to that point, it seems to me
like there's going to be these maybe two poles here, right? One is the potentially scary and
destructive stuff that you can mitigate, right, with some of the controls that you talked about,
but also amazing opportunity. Yeah, so the thermostat thing was for
messing with the electricity and that causing strain on the power grid and destroying
transformers, just for clarification. But yeah, I think you're pointing at
the fact that it's dual use. So I'm not saying AI is bad in every single way, and it's like other
dual-use technologies. Bio is a dual-use technology: it can be used for bioweapons, it can be used for
healthcare. Nuclear technology is dual use; there are civilian applications for it as well,
and chemicals too. And we have managed all of those other ones
by selectively trying to, you know, limit some particular types of usage and
restricting the access of rogue actors to some of these technologies and making
sure there are good safeguards for the civilian applications. And then we can
actually capture the benefits. So it's not an all or nothing type of thing with
AI. It's what are surgical restrictions one can place so that we can keep
capturing the benefits. And so, for instance, with virology,
that's a matter of: you add the safeguards, and then the researchers who want access to those can speak to sales.
That's basically a resolution of that problem, provided that you have the models kept behind APIs.
Now, on this dual-use part, though, there's an offense-defense balance.
So for some applications, it can help, it can hurt, and maybe it helps more than it hurts,
or maybe it will hurt more than it will help.
So in bio, I think that is offense dominant.
If somebody creates a virus, there's not necessarily a cure that it will immediately find for it.
If it would help a rogue actor make a somewhat compelling virus, now that could be enough to cause many millions to die.
And it may take months or years to find a cure.
There are many viruses for which we have not found cures yet.
And for cyber, in most contexts, there's a balance between offense and
defense, where if somebody can find a vulnerability with one of these hacking
AIs, then they could also use that to patch the vulnerability.
There is an exception, though, where in the context of critical infrastructure, there the
software is not updated rapidly.
So even if you identify various vulnerabilities, there will not necessarily be a patch because
the system needs to always be on, or there are interoperability constraints, or the company
that made the software is no longer in business, these sorts of things.
So our critical infrastructure is a sitting duck.
And so in that context, cyber is offense dominant.
But in normal contexts, there's roughly a balance.
And for virology, I think that's largely offense dominant.
So before we go to the nation state element of this, I need to ask you a question about the actual research houses themselves.
Every research house says they're concerned with safety.
From OpenAI to xAI, everything in the middle.
Maybe not DeepSeek.
We'll get to DeepSeek.
Yet they're the ones that are building this technology.
And I find it a little strange, it's weird, that you have companies that are saying,
we have to build this and advance this technology so we can keep people safe.
I never really understood that message.
Yeah, I don't know if it's to say that we need to keep people safe.
I think it's more that the main organizations that have power in the world now are largely companies.
And so if one's trying to influence the outcomes, one basically needs to be a company, is how many of them will reason.
They'll think that, yeah, you could be in civil society or you could protest, but this will not determine the course of events as much.
So, sort of, many of them are buying themselves the option to hopefully influence things in a more positive direction, but most of the effort will be to stay competitive and stay in this arena.
So I think over 90% of the intellectual energies that they're going to spend is actually
how can we afford the 10x larger supercomputer?
And that means being very competitive, speeding this up, and making safety be some priority,
but not necessarily a substantial one.
So I do think there is sort of an interesting contradiction or something that looks like a
contradiction there.
But I think if we think back to nuclear weapons, nuclear weapons, nobody wants nuclear weapons.
If there'd be zero on Earth, fantastic, you know, that would be a nice thing to have if that would be a stable state.
But it's not a stable state.
One actor may then develop nuclear weapons, and they could destroy the other.
So this encourages states to do an arms race, and it makes everybody all collectively less secure, but that's just how the game theory ends up working.
So you get a classic, what's called a security dilemma:
everybody's worse off collectively. And even if you take it seriously, you see, yes, nuclear
technology is dual use and potentially catastrophic, and we need to be very risk-conscious
about it.
You can agree with all those things, but you still might want nuclear weapons because other
parties will also have nuclear weapons, and unilateral disarmament in many cases, it just
didn't make game theoretic sense.
So in the same way, an individual company pausing their development while others race ahead doesn't make game-theoretic sense.
So I think this just points to the fact that the game theory is kind of confusing.
And so you're getting some things that are seeming contradictions that if you use a nuclear analogy,
you go, yeah, I suppose that makes sense.
And it's just kind of an ugly reality to internalize.
Doesn't that discount the fact that, like, these companies, if they want to influence, like, the way things are going,
they are going to be, it's like, you're one and the same. Yes, you're influencing, but without you
this wouldn't be moving as fast as it is. Like, it is interesting, for instance, think about Elon Musk, right?
Obviously he has you in two days a week to work on safety inside xAI, but he's also putting together,
what, million-GPU data centers to build the biggest, baddest LLM ever. Um, well, if he didn't,
then he would be having less influence over it. So it's, um, it's not something that
I would envision, everybody just sort of voluntarily pausing.
So subject to companies not sort of voluntarily rolling over and dying, then what's the best
you can do subject to those constraints?
But the competitive pressures are quite intense, such that they do end up prioritizing,
focusing on competitiveness, and other priorities like the budget for safety research
will generally be lower than would be nice to have
if this were a less competitive environment.
Do you think Elon is more interested in restoring this original vision
that he had for OpenAI, making everything open source,
making it safe, I would imagine.
He founded OpenAI with Sam Altman as sort of a beachhead against Google
because he was afraid of what Google was going to do with this technology.
So I'm curious if you think that xAI is along that mission, or is he more interested in the sort of soft cultural power that comes with having the world's best AI?
For instance, like you can change the way that it speaks about certain sensitive political issues.
It can be anti-woke, which we all know is sort of where Elon stands.
So what do you think his true interest lies in building XAI?
Well, I think the, and I won't, you know, position myself as sort of speaking on his behalf.
Yeah, we won't put you down as an Elon spokesperson, but you are in there a couple times a week.
So I think that the mission is to understand the universe, and so this means having AIs that are
honest and accurate and truthful to improve the public's understanding of the world.
So we will be getting in a very fast-moving, trying situation with AI if it keeps accelerating,
and so good decision-making will be very important, and us understanding
the world around us will be very important. So if there are more features that enable truth-seeking
and honesty and good forecasts and good judgment and institutional decision-making,
those would be great to have. The hope is that Grok could help enable some of that
so that civilization is steered, is steering itself more prudently in this potentially more turbulent
period that's upcoming. That's one read on the mission statement. But I think the
objective of it is to understand the universe, and there are different sub-objectives that that would
give rise to. And I think its ability to help culture process events without censorship
or political bias one way or the other
is a stated objective
and I think that would be indispensable
in the years going forward.
Do you buy that that's what they're doing?
Because we also heard the same thing from Elon
when it came to buying Twitter, now X.
I think Community Notes has been quite good.
But that was something that was built under Jack Dorsey.
I'm not going to ask you to take sides.
I'm going to just observe empirically what I've seen.
I mean, we know that
Substack links have been deprioritized because it was seen as a competitor with Twitter.
We know that Musk, I think, according to reporting, changed the algorithm to have his tweets show up more often.
And his tweets took a strong stance towards supporting Donald Trump in the election.
So to me the idea that like hearing again from Elon, and again, look, I respect what Elon's done as a business person,
but hearing again that he has a plan to make a culturally relevant product,
that's free of censorship and politically unbiased, I don't know if I believe that anymore.
So I don't know about some of the specific things, such as the, you know, weighting thing or something
like that, profile things, for instance. I think that overall, in terms of cultural influence,
in people being more disagreeable and doing less self-censoring, it has been successful. I think that
was the main objective of it.
And so I think, I think that X had a large role to play there.
So, I don't know.
I think like, I think in terms of shaping discourse norms in the U.S.,
that seems to have been successful in my view.
Yeah, I'm not saying pre-Elon Twitter didn't censor, which is the wrong,
probably the wrong word because that's usually from the government, didn't sort of shape
the definition of speech to its own liking.
It obviously had a progressive approach
and moderated speech with a progressive approach.
I just don't think Elon is not using his own influence
when it comes to how he runs X.
But you and I could speak about this forever.
This isn't even my sort of wheelhouse as much.
But yeah, I mean, it's sort of, like, since I'm doing the xAI stuff...
You brought it up.
Oh, okay.
All right.
I mean, just the non-biased and truthful thing.
So it's worth talking about.
So, I mean, it is, if there are like ways in which it's like extremely
biased one way or the other, that's useful to know. This is a thing that is continually trying to
be improved, at least for xAI's Grok. And I think that all of the sort of product offerings
could get quite a bit better at this. But I'm not speaking as a sort of representative there or
anything like that, but I guess maybe in my, I guess right now, in my personal capacity,
I think that there are things to improve on for all these models in terms of their
bias. All right. We agree on that front. You hinted at it previously, but talk a little bit
about how, basically, you don't think it's a good idea for there to be an arms
race here. And certainly there is one between the U.S. and China. We know that the U.S. has put export
controls on China. China has in some ways
gotten around them through like
very creative procurement
processes that go through Singapore, right?
We can probably say that with a pretty good degree
of confidence. Then of course we see the release
of DeepSeek and some other AI
applications from China, and everyone's trying to
build the better AI so that they
have the soft power, like we spoke about,
to effectively,
you know, control, to influence
culture across the world, but also it's an
offensive capability and
a defensive one, like you're saying. If your
country has the ability to manipulate viruses or to do cyber hacks, you become more powerful
and you get to sort of, you know, potentially put your view of the world and implant your view
of the world on the way that it operates. You have a paper out that's sort of arguing against
this arms race. It's called Superintelligence Strategy. It's with you, Eric Schmidt, who we all know,
former CEO of Google. I think he just started, he's taking over a drone company, so you can tell
me a little bit about that. And Alexandr Wang, the, I don't know, not the former,
the current CEO of Scale AI, who's been on this show before.
Talk a little bit about why you don't think it's a good idea for countries to pursue this arms race.
You say it might be leading us to mutually assured AI malfunction,
like mutually assured nuclear destruction.
I think that's where you get that from.
Yeah, so the strategy has three parts, one of which is competitiveness,
but we're saying that some forms of competition could be destabilizing
and that you may be irrational to pursue it
because you couldn't get away with it.
So in particular, this, I'm making a bid for superintelligence
through some automated AI research and development loop
could potentially, to one state having some capabilities
that are vastly beyond another states.
If one state gets to experience,
a decade of development in a year, and the other one is the year behind, then this results
in a very substantial difference in the states' capabilities.
So this could be quite destabilizing if one state might then start to get an insurmountable
lead relative to the other.
So I think that form of competition would be very dangerous, because there's a risk of loss
of control, and because it might incentivize
states to engage in preventive sabotage or preemptive sabotage to disable these sorts of projects.
So I think states may want to deter each other from pursuing superintelligence through this means.
And this then means that AI competition gets channeled into other sorts of realms,
such as in the military realm of having more secure supply chains for robotics, for instance,
and for AI chips,
having reduced sole source supply chain dependence on Taiwan for making AI chips.
So states can compete in other dimensions,
but them trying to compete to develop superintelligence first,
I think that seems like a very risky idea,
and I would not suggest that because there's too much of risk of loss of control,
and there's too much of a risk that one state,
if they do control it, uses it to disempower others
and affects the balance of power far too much and destabilizes things.
So, but the strategy overall, think of the Cold War.
Before you go on the strategy, like, my reaction to that is, good luck telling that to China.
So I think it's totally, so for the, for deterrence, I think if the U.S. were pulling ahead,
both Russia and China may have a substantial interest in saying, hey, cut this out,
pulling ahead to develop superintelligence, which could give it a huge advantage and an ability to crush them.
They'd say, you don't get to do that.
We are making a conditional threat that if you keep going forward in this, because you're on the cusp of building this, then we will disable your data center or the surrounding power infrastructure so that you cannot continue building this.
I think they could make that conditional threat to deter it, and we might do the same, or the U.S. might do the same to China or other states that would do that.
So I don't see why China wouldn't do that later on.
Right now, they're not thinking as much about superintelligence and advanced AI.
So this is more of a description of the dynamics later on when AI is more salient.
But it would be surprising to me if China were saying, yes, the United States, go ahead, do your Manhattan project to build superintelligence, come back to us in a few years, and then tell us, you can boss us around because now we're in a complete position of weakness and we'll be at your mercy and we'll accept whatever you say or tell us to do.
I don't see that happening. I think they would just move to preempt or deter that type of development
so that they don't get put in that fragile position. Are you in, like, the Eliezer Yudkowsky
camp of bombing the data centers if we get to superintelligence? Well, so I think I'm advocating
or pointing out that it becomes rational for states to deter each other by making conditional
threats and by means that are less escalatory, such as cyber sabotage on data centers
or surrounding power plants, I don't think one needs to get kinetic for this.
And I think that if discussions start earlier, I don't see any reason things need to be
escalating in that way or unilaterally actually doing that.
We didn't need to get in a nuclear exchange with Russia to express that we have a preference
against nuclear war.
So I think...
Thank goodness.
So indicating or making conditional threats through deterrence seems like a much smarter
move than, hey, wait a second, what are you doing there?
And then bombing that.
That seems needless.
Yeah, I'm not into that.
Yeah.
But what you're talking about is sort of assuming that there will be a lead that will be
protectable for a while.
But everything we've seen with AI is that no one protects a lead, right?
Well, if there's, so one difference is that when you get to a different paradigm, like automated AI R&D,
the slope might be extremely high, such that if the competitor starts to do automated AI R&D a year later,
they may never catch up just because you're so far ahead and your gains are compounding on your gains.
Sort of like in social media companies, Eric will use this analogy, where if one of them starts blowing up and growing before you started, it's often the case that you won't be able to catch up and they'll have a winner-take-all type of dynamic.
So right now, the rate of improvement is not that high or there's less of a path for a winner-take-all dynamic currently.
But later on, when you have the ability to run 100,000 AI researchers simultaneously,
this really accelerates things.
Maybe OpenAI's got a few hundred, let's say 300 AI researchers,
so going from 300 AI researchers to orders of magnitude more world-class ones
creates quite substantial developments.
This is something that isn't new.
This is something that, like, Alan Turing and the founders of computer science
had pointed out: it's a natural property that when you get AIs at this level of capability,
then this creates this sort of recursive dynamic where things start accelerating extremely quickly and quite explosively.
Okay. We managed to spend most of our conversation today talking about present risks or like risks in the near future.
We should focus a little bit more on intelligence explosion and loss of control, and we're going to do that right after the break.
Hey everyone, let me tell you about The Hustle Daily Show, a podcast filled with business,
tech news, and original stories to keep you in the loop on what's trending.
More than 2 million professionals read The Hustle's daily email for its irreverent and informative
takes on business and tech news.
Now, they have a daily podcast called The Hustle Daily Show, where their team of writers
break down the biggest business headlines in 15 minutes or less and explain why you should
care about them.
So, search for The Hustle Daily Show in your favorite podcast app, like the one you're using
right now. We're back here on Big Technology Podcast with Dan Hendrycks. He is the director and
co-founder of the Center for AI Safety. Dan, it's great speaking with you about this stuff.
Let's talk a little bit. You've been sort of talking about it in the first half, but I want to
zero in here on this idea of intelligence explosion, or what you talk about as basically
having AI autonomously improve itself. Just talk through a little bit about how that might
happen and whether you see that being something that is actually probable
in our future.
Yeah.
I mean, the basic idea is just imagine automating one AI researcher, one world-class one.
Then there's a fun property with computers, which is there's copy and paste.
So you can then have a whole fleet of these.
Well, you know, with humans, you know, if you just have one of them, you know, it's maybe
they'll be able to train up somebody else who has a similar level of ability.
So this adds a very interesting dynamic to the mix.
And then you can get so many of them proceeding forward at once.
And, you know, AIs also operate quite quickly.
They can code a lot faster than people.
So maybe it's, maybe you've got 100,000 of these things operating at 100x the speed of a human.
How fast will that go?
Maybe conservatively, let's say it's just overall 10xing research.
But 10xing research would mean, say, like a decade's worth of development in a year.
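(As an aside, here is a minimal back-of-envelope sketch of the arithmetic Hendrycks describes; all of the figures, the 100,000 copies, the 100x speed, and the conservative 10x net multiplier, are the illustrative numbers from the conversation, not measured values.)

```python
# Back-of-envelope sketch of the "decade in a year" arithmetic from the
# conversation. All figures are illustrative, not measured values.

num_copies = 100_000        # automated AI researchers run in parallel
speed_multiplier = 100      # each assumed to run ~100x the speed of a human

raw_parallel_capacity = num_copies * speed_multiplier   # 10,000,000 human-equivalents
overall_speedup = 10        # conservative net research multiplier after
                            # coordination overhead, compute limits, diminishing returns

print(f"Raw parallel capacity: {raw_parallel_capacity:,} human-researcher-equivalents")
print(f"Net speedup of {overall_speedup}x ~= a decade of development per calendar year")
```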
That telescoping of all these developments makes things pretty wild and means that one player could
possibly get AIs that go from very good, you know, world class to being vastly better than
everybody at everything into superintelligence, something that towers far beyond any living
person or collective of people. So if we get an AI like that, this could be destabilizing
because it could be used to develop a superweapon potentially.
Maybe it could find some breakthrough for anti-ballistic missile systems,
which would make nuclear deterrence no longer work or other types of ways of weaponizing it.
So that's why it's destabilizing.
So states, then, if they're seeing this, might say, oh, you know, don't run this many AI researchers
simultaneously in these data centers
working to build a next generation or superintelligence,
because if you do so, then that will put us in,
that will make our survival be threatened.
So them saying that, them deterring that, would help them secure themselves.
And they can make those threats very credible currently,
and I think we'll continue to be able to have
these threats be credible going forward.
So this is why I think it might take a while
for superintelligence to be developed,
because there'll be deterrence around it later
on. And then maybe in the farther future, there could be something multilateral, but that's
speaking quite far out in very different economic conditions. In the meantime, with AIs that we'd
have in the future, those could still automate various things and increase prosperity and all
of that. So we'd still have explosive economic growth if you had something that was just
at an average human level ability, running for very cheap.
So I think that those are some of the later stage strategic dynamics, and I don't think we can get away with, or I don't think any state could get away with, trying to build a superintelligence, go build a big data center out in the middle of the desert, a trillion-dollar cluster, bring all the researchers there, without the other states going, what do you think you're doing here?
You were at the White House yesterday.
Well, this is largely just sort of speaking about some of these, you know, strategic implications.
Are they receptive?
Yeah, I mean, it's a, it's a, this isn't a, there's always, there's always interest in, you know, thinking about what are some of the longer-term dynamics, what things should happen now and whatnot.
But this is, yeah, I think, I think when people think White House,
it sounds, you know...
Well, it's the home of the president.
So there's the...
Well, yeah, so there's the Eisenhower building,
which is, you know, part of the White House, kind of, not quite.
But, you know, that's where everybody works and whatnot.
I think, you know, some of the things we were speaking about here,
like virology advancements, things like that,
there's just a lot of, you know, things to speak about
and think about what things make sense
or what things to keep in mind going forward.
So, yeah.
Yeah, I guess I'd rather have an executive branch paying
attention to this stuff than not. Yeah, yeah, that's right, yeah. Yeah, and what are the sort of ways
that help, you know, maintain competitiveness? Because, you know, how people normally think about
this, they'll think it's all or nothing, a good or bad thing. And then we're saying, no, it's dual
use. So that means there are some particular applications that are concerning and there are other
applications that are good. And you want to stem the particularly harmful applications and what are
ways of doing that while capturing the upside. Right. Okay. So the intelligence
explosion part of this conversation naturally brings up the loss of control part,
where, to me, I think the thing is, when people think about AI harm, they are always worried
that AI is going to escape the simulation or whatever it is and act on its own and try to basically
ensure that it preserves itself.
We've seen it recently, I think I brought this up at the beginning of the show, where Anthropic
has done some experiments where the AI has run code to try to copy itself over onto a server
if it thinks that its values are at risk of being changed.
Is this, so it's fun to think about,
but it's also like probably just probability.
Like if you run it enough times because it's a probabilistic engine.
If it was like, oh, it's only one in a thousand of them intend to do this.
Well, if you're running a million of them,
then you're basically certain to get many of them to try and, you know, self-exfiltrate.
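(To make the one-in-a-thousand-times-a-million point concrete, here is a minimal sketch of the underlying probability arithmetic; it assumes independent instances and a fixed per-instance rate, both simplifying assumptions, and the numbers are just the ones mentioned in the conversation.)

```python
# Probability that at least one of N independently run instances attempts
# self-exfiltration, given a per-instance probability p. Independence and a
# fixed rate are simplifying assumptions; the numbers are illustrative.

def prob_at_least_one(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

p = 1 / 1000       # "only one in a thousand of them intend to do this"
n = 1_000_000      # "you're running a million of them"

print(f"Expected number of attempts: {p * n:.0f}")            # ~1,000
print(f"P(at least one attempt): {prob_at_least_one(p, n)}")  # effectively 1.0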
And so are you worried that this self-exfiltration is going to be a thing?
I think from a, you know, a recursive automated AI R&D thing, I think that has really substantial probability behind it of a loss of control in that situation.
So you're worried about this.
So there's that, but I would distinguish between that and these sorts of things that are not superintelligences or things that are not coming from that sort of really rapid loop, like the currently existing systems.
I think that the currently existing systems are relatively controllable, or if there is some very concerning failure mode, we have been able to find ways to make them more controllable.
For instance, for bioweapons refusal, we used to not be able to make robust safeguards for them two years ago.
But we've done research with methods such as what are called circuit breakers and things like that.
And those seem to improve the situation quite a bit and make it actually
prohibitively difficult to do that jailbreaking. And so maybe we'll find something similar with
self-exfiltration. So I think people generally want to claim that, like, oh, current AIs are not
controllable. And I think that they're not highly reliably controllable. They're reasonably
controllable. Maybe we could get some, or it seems plausible that we'll get to have increasing
levels of reliability. And so I'm sort of reserving judgment. It'll depend more on the empirical
phenomena. So I think everybody should research this more and we'll sort of see what the risks
actually are. But there are some that seem less empirically tractable, or things that can't be empirically solved, like this loop thing. Like, how are you going to, you can't run this experiment a hundred times or something like that and make it, you know, go well. You're making a huge attempt at building a superintelligence, and that has destabilizing consequences. That's totally unprecedented.
And for that, you have more of like a one chance to get it right type of thing.
But with the current systems, we can continually adjust them and retrain them
and come up with better methods and iterate.
So it is concerning.
It would not surprise me if this would really start to make AI development itself extremely hazardous, instead of just the deployment, where, inside the lab, you need to be worried about the AI trying to break out sometimes. That's totally in the realm of possibility. But yeah, I could see it going either way.
Yeah, I mean, this personally freaks me out because, yeah, if you see the AI trying to deceive
evaluators, for instance, or you see the AI trying to break out, you really can't trust anything
it's telling you. And we had Demis Hassabis on the show a little while ago, and he's basically like, listen, if you see deceptive behavior from AI, if you see alignment faking, you really can't trust anything in the safety training, because it's lying to you.
There is truth to that.
Are you seeing deceptiveness at Grok, by the way?
Oh, yeah, yeah.
So we have a paper out last week.
We're just measuring the extent to which they're deceptive.
And in the scenarios we have, all the models were under, you know, slight pressure to lie, not being told to lie, but just some slight pressure.
Then some of them will lie like 20% of the time, some of them like 60% of the time.
So they don't really have this sort of virtue, sort of baked into them, the virtue of honesty.
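(A minimal sketch of how that kind of lie-rate measurement can be tallied; the scenario strings, the `ask_model` and `is_lie` callables, and the dummy stand-ins below are hypothetical placeholders, not the actual harness from the paper.)

    from typing import Callable, List

    def lie_rate(ask_model: Callable[[str], str],
                 is_lie: Callable[[str, str], bool],
                 scenarios: List[str]) -> float:
        """Fraction of scenarios in which the model's reply counts as a lie.

        ask_model: sends a scenario prompt to a model and returns its reply.
        is_lie: judges whether the reply contradicts what the model separately
                reports believing when asked neutrally (a judging step this
                sketch leaves abstract).
        """
        if not scenarios:
            return 0.0
        lies = sum(1 for s in scenarios if is_lie(s, ask_model(s)))
        return lies / len(scenarios)

    # Illustrative usage with dummy stand-ins:
    scenarios = ["pressure scenario A", "pressure scenario B"]
    fake_ask = lambda prompt: "model reply"
    fake_judge = lambda prompt, reply: False
    print(f"Lie rate: {lie_rate(fake_ask, fake_judge, scenarios):.0%}")  # 0%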
So I think we'll need to do more work and we'll need to do it quickly.
So I'm sort of speaking in a more nonchalant way about this, but I can't, like, you know, get worked up about every single risk, because then I'd also just, you know, be at an 11 all the time.
So there's some that I'm, you know, putting in different tiers than other risks.
And this is a more speculative one.
We've seen some of these turn out to be surprisingly handleable.
But, yeah, it could end up making things really, really bad.
We'll see.
We'll do things about it to make that not be the case.
Okay. Thank you.
Two more topics for you then we'll get out of here.
The Center for AI Safety.
Who's funding it?
Well, so there's not sort of one funder.
It's largely just various philanthropists.
The main funder would be Jaan Tallinn, who's a Skype co-founder.
There's a variety of other philanthropies or philanthropists.
Generally, so for instance, Elon doesn't, I've never asked him to fund the center. That is to say, I don't get any money from Elon. For my appointment at xAI, I get a dollar a year. At Scale, at Scale AI, I've increased my salary exponentially to where I get $12 a year, a dollar per month from Scale. But I'll try to avoid, you know, having some complicated relations with them, just so that I can, you know, not be speaking on behalf of any of them in particular.
So you're basically doing the work for them for free? Well, but it's useful. Right. It's useful to do. And I mean, yeah, I mean, I think the main objective is, yeah, just to try and generate some value here as best as one can, by reducing these sorts of risks. Yeah, I think it's a good arrangement because it enables me to, like, you know, have a choose-your-own-adventure type of thing. Right. Now I think the politics or geopolitics, this is more relevant, so I can go off and learn about this for some months and then work on a paper there, compared to if it's like, no, you've got to be coding 80 hours a week, that's your job. That would be quite restrictive. And I couldn't be speaking with you. I'm glad you're here. So thank you, Alex.
So let's talk a little bit about this funding, because I think that after Sam Altman was fired and then rehired at OpenAI, there was a sort of skepticism around effective altruism's impact on the AI field. Even Jaan Tallinn, I'm reading from his statement right after the OpenAI governance crisis, said it highlights the fragility of voluntary EA-motivated governance schemes, and so the world should not rely on such governance working as intended. Now, Jaan is, of course, associated with EA. EA is, like, basically leading the conversation around AI safety.
Is that good?
So I think that, in terms of Jaan, I think he's funded organizations that are EA-affiliated. I don't know if he'd call himself that, but whatever, you know, people can ascribe labels how they'd like.
I think that the, I mean,
I've tweeted that EA is not equal to AI safety.
I think the EA community generally is insular on these issues.
So I lived in Berkeley for a long time during my Ph.D. And there's sort of a school, a sort of AI risk school, that had very particular views about what things are important. So malicious use, for instance, when I was talking about malicious use at the beginning of this conversation, you know, they're historically really against that. It'd be only loss of control, don't talk about malicious use, that's a distraction. And so that was annoying, because I'd always been working on robustness as a PhD student, where the main thing was malicious use. So, yeah, I ended up leaving Berkeley before graduating just because of the sort of relatively suffocating atmosphere and the sort of central focus on whatever the fad of the moment was.
There'd be some new fad and you'd have to get interested in that.
Some ELK, eliciting latent knowledge: this is the important thing that you have to focus on. Or you have to focus on inner optimizers.
There's lots of these speculative, empirically fragile things.
So, for instance, this alignment faking stuff that you're seeing.
Like, there's some concern there, but, you know, I'm not totally sold that this is like a top-tier type of priority.
But in these communities, this is all that matters currently, roughly speaking.
Or these voluntary commitments from AI companies, I think voluntary commitments from AI companies are also a distraction, because you should expect most of them by default to just break those sorts of commitments if they end up going up against economic competitiveness.
Okay.
So I think it's a distraction relatively.
And so I think there are many people who think that EA broadly, their influence on this sort of thing, has not been overall positive. I think, at least for me and other researchers in this space who've been interested in AI risks, the amount of pressure to adopt some particular positions on this has been extraordinarily high, and I think quite, quite destructive.
So I'm very pleased now that in the past year or so, there's been a lot more diversity of opinion, which has been quite important.
And I think this is just because the broader world is getting more interest in AI.
So a lot of this, you know, fixation on, this is the one particular risk, this is the most important risk, and everything else is a distraction, that just doesn't work when you're speaking with, or interfacing with, the real world. There are a lot of complications, and AI is so multifaceted. So your risk management approach can't just be focusing on one of them. Right. So you're not
an effective altruist? I don't think of myself as that. I don't particularly
get along with this school of thought, this sort of Berkeley AI alignment monolith.
And I'm pleased that people can be more independently operating in this space now,
which I don't think was the case for many, many years,
including basically the entire time during my PhD.
And there are many people, like Dylan Hadfield-Menell, a professor at MIT who was also at Berkeley at the time, very suffocating. Rohin Shah, a researcher at DeepMind, very suffocating. They all feel this way, yeah.
Okay.
Let's bring it home.
We've been talking for more than an hour about AI safety as if it's controllable.
But open source is, like, really putting up a pretty valiant effort in this field, keeping pace with the proprietary labs. And of course, open source is not controllable. What do you think about that? I mean, we just saw DeepSeek, not to, you know, go back to it all the time, but it effectively equaled the cutting edge at the proprietary labs and, you know, put the weights on its website. So how can we possibly have a relationship
of safety with AI if open source is out there exposing everything that's been done?
So I've been, I haven't been endorsing open source historically, but I've thought that
releasing the weights of models didn't seem robustly good or bad.
So I sort of was like, it's fine, seems to have complicated effects.
There's an advantage to it, which is that it helped with diffusion of the technology, so that more people would have access to it and sort of get a sense of AI, and this would increase the literacy on this topic and just increase public awareness and get the world more prepared for more advanced versions of AI. So that's been my historical position, but it should always proceed by a cost-benefit analysis. So if, for instance, they have these cyber capabilities later on, yeah, I think that would be a potential place to be drawing the line on open-weight releases, personally, in particular for the ones that could cause damage to critical infrastructure.
You could still capture the benefits by having the models be available through APIs. And if they're, like, software developers, they have access to these more cyber-offensive capabilities. But if they're a random, faceless user, they don't.
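(A minimal sketch of what that kind of tiered API gating could look like; the tier flag, the capability names, and the check itself are hypothetical illustrations, not any lab's actual access policy.)

    from dataclasses import dataclass

    # Hypothetical set of capabilities that stay gated behind verification;
    # a real policy would also involve identity checks, auditing, rate limits, etc.
    RESTRICTED_CAPABILITIES = {"cyber_offense", "expert_virology"}

    @dataclass
    class User:
        user_id: str
        verified_developer: bool = False  # e.g., passed a know-your-customer check

    def is_allowed(user: User, capability: str) -> bool:
        """Open capabilities for everyone; restricted ones only for verified users."""
        if capability not in RESTRICTED_CAPABILITIES:
            return True
        return user.verified_developer

    # Illustrative usage:
    dev = User("dev-123", verified_developer=True)
    anon = User("anon-456")
    print(is_allowed(dev, "cyber_offense"))   # True
    print(is_allowed(anon, "cyber_offense"))  # False
    print(is_allowed(anon, "summarization"))  # True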
And likewise for virology. Once the capabilities are so high that there's consensus about them being expert-level in virology, I think that would be a very natural place to be having an international norm. Not a treaty, because that takes forever to write and ratify, but a norm against open weights if they are expert-level virologists. For the same reasons that we had the Biological Weapons Convention: Russia, or the Soviet Union, and the U.S. got together for the Biological Weapons Convention, and the U.S. and China did as well. We also coordinated on chemical weapons with the Chemical Weapons Convention, and on nuclear with the Nuclear Non-Proliferation Treaty. States find it in their interest to work together to make sure that rogue actors do not have extremely hazardous, potentially catastrophic capabilities like chem, bio, and nuclear inputs. So I think something similar might be reasonable for AI when they get to that
capability threshold. Dan, I am at once kind of reassured that people are thinking about this
stuff, but also more freaked out than I was when we sat down. But I do appreciate you coming in
and giving us the full rundown of what to be concerned about and what maybe not to be as concerned
about as we think about where AI is moving next. So thank you so much for coming on the show.
Yep, yep, thank you for having me. This has been fun.
Super fun. If people want to learn more about your work or get in touch, how do they do that?
I guess this paper or strategy you've been speaking about is at nationalsecurity.ai, and I'm also on Twitter or X or whatever it's called.
You should know what it's called, you work with them.
At X.com.
I'm at x.com slash DanHendrycks. That would be another way of following the goings-on as the situation evolves.
We'll keep trying to put out work and seeing what's going on with these risks.
and if we come up with technical interventions to make them less severe, then we'll also put those out too.
So, yeah, that's where you can find me.
Well, Godspeed, Dan, and we'll have to have you back.
Thanks again.
All right, everybody, thank you for listening,
and we'll see you next time on Big Technology Podcast.