Bankless - Revolutionizing AI: Tackling the Alignment Problem | Zuzalu #3

Episode Date: July 20, 2023

In this episode, we delve into the frontier of AI and the challenges surrounding AI alignment. The AI/crypto overlap at Zuzalu sparked discussions on topics like ZKML, MEV bots, and the integration of AI agents into the Ethereum landscape. However, the focal point was the alignment conversation, which showcased both pessimistic and resigned-optimistic perspectives. We hear from Nate Soares of MIRI, who offers a downstream view on AI risk, and Deger Turan, who emphasizes the importance of human alignment as a prerequisite for aligning AI. Their discussions touch on epistemology, individual preferences, and the potential of AI to assist in personal and societal growth.

------

🚀 Join Ryan & David at Permissionless in September. Bankless Citizens get 30% off. 🚀
https://bankless.cc/GoToPermissionless

------

BANKLESS SPONSOR TOOLS:
🐙 KRAKEN | MOST-TRUSTED CRYPTO EXCHANGE
https://k.xyz/bankless-pod-q2
🦊 METAMASK PORTFOLIO | TRACK & MANAGE YOUR WEB3 EVERYTHING
https://bankless.cc/MetaMask
⚖️ ARBITRUM | SCALING ETHEREUM
https://bankless.cc/Arbitrum
🛞 MANTLE | MODULAR LAYER 2 NETWORK
https://bankless.cc/Mantle
👾 POLYGON | VALUE LAYER OF THE INTERNET
https://polygon.technology/roadmap

------

Timestamps
0:00 Intro
1:50 Guests
5:30 NATE SOARES
7:25 MIRI
13:30 Human Coordination
17:00 Dangers of Superintelligence
21:00 AI's Big Moment
24:45 Chances of Doom
35:35 A Serious Threat
42:45 Talent is Scarce
48:20 Solving the Alignment Problem
59:35 Dealing with Pessimism
1:03:45 The Sliver of Utopia
1:14:00 DEGER TURAN
1:17:00 Solving Human Alignment
1:22:40 Using AI to Solve Problems
1:26:30 AI Objectives Institute
1:31:30 Epistemic Security
1:36:18 Curating AI Content
1:41:00 Scalable Coordination
1:47:15 Building Evolving Systems
1:54:00 Independent Flexible Systems
1:58:30 The Problem is the Solution
2:03:30 A Better Future

------

Resources
Nate Soares
https://twitter.com/So8res?s=20
Deger Turan
https://twitter.com/degerturann?s=20
MIRI
https://intelligence.org/
Less Wrong AI Alignment
https://www.lesswrong.com/tag/ai-alignment-intro-materials
AI Objectives Institute
https://aiobjectives.org/

------

Not financial or tax advice. This channel is strictly educational and is not investment advice or a solicitation to buy or sell any assets or to make any financial decisions. This video is not tax advice. Talk to your accountant. Do your own research. Disclosure: from time to time I may add links in this newsletter to products I use. I may receive commission if you make a purchase through one of these links. Additionally, the Bankless writers hold crypto assets. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:04 Welcome to Bankless, where we explore the frontier of internet money and internet finance. And today, on this episode of our Zuzalu series, we are exploring some new frontiers. New frontiers and new technologies, all of which are poised to completely revolutionize the world and change everything about the operating system that society is currently running. Bankless Nation, today we are exploring the frontier of AI, which is actually a frontier that we've already been exploring on Bankless. So if you've been listening to our other AI episodes, these will make you feel right at home. AI had a big week at Zuzalu. The AI crypto overlap, everyone knows it's huge,
Starting point is 00:00:38 and it seems like such a massive frontier that people don't actually know where to start with it. ZKML, or machine learning models and data that's verified by zero-knowledge cryptography, was a huge topic of conversation. And you'll hear about that in our cryptography episode with Daniel Shorr. Phil Daian at AI week gave a killer talk titled MEV for AI people, which was this gigabrain presentation about how MEV bots in aggregate kind of present this omnipotent, omnipresent artificial intelligence. And since MEV has been decently corralled and contained, maybe we can learn a thing or two from the MEV industry in our approach to managing AI risk. There were conversations at Zuzalu about how AI can put the autonomous back into DAOs
Starting point is 00:01:23 and how AI agents could soon be roaming the Ethereum landscape shoulder to shoulder with all the human players out there. But mainly, at Zuzalu, the AI conversation inevitably converged into the alignment conversation, of which you will find two flavors here in this episode, one strongly pessimistic, and the other characterized by this resigned optimism that is prevalent throughout all of Zuzalu's frontier tech challenges. Unimaginable rewards blocked by seemingly insurmountable obstacles. Up first in this episode, we have Nate Soares, who is the executive director at MIRI, the Machine Intelligence Research Institute, which Eliezer Yudkowsky founded. Nate's perspective on AI and AI risk is definitely downstream of Eliezer.
Starting point is 00:02:07 So we pick up where Bankless left off with Eliezer. And Bankless Nation, it's dark. But nonetheless, Nate admits that it's less dark than it was a few months ago, now that the world is waking up to the potential risk that AI brings to this world. Following the conversation with Nate is Deger Turan, who is charging into the AI frontier with his head held high and a clear path forward for himself. Deger believes that the AI alignment problem is actually just downstream of human misalignment and that we actually won't be able to align AI until we align ourselves.
Starting point is 00:02:38 This conversation has to do with epistemology, what is truth, individual preferences, and how AI models can help us become the best versions of ourselves. Because if we become the best versions of ourselves with the assistance of some AI tool, we can collectively produce the best versions of our communities. And if we do that, then our communities can coalesce into the best versions of society, all aided by truth-telling AI agents who can help humans navigate through our chaotic world of social organization and politics and social media. Really a fascinating conversation that is actually pretty proximate to our conversation with Tim Urban that we had not too long ago.
Starting point is 00:03:15 I'm really excited for you to listen to these conversations, Bankless Station, so let's go ahead and get right into it. But first, a moment to talk about some of these fantastic sponsors that make this show possible. Cracken Pro has easily become the best crypto trading platform in the industry. the place I use to check the charts and the crypto prices, even when I'm not looking to place a trade. On Cracken Pro, you'll have access to advanced charting tools, real-time market data, and lightning-fast trade execution, all inside their spiffy new modular interface. Cracken's new customizable modular layout lets you tailor your trading experience to suit your needs.
Starting point is 00:03:44 Pick and choose your favorite modules and place them anywhere you want in your screen. With Cracken Pro, you have that power. Whether you are a seasoned pro or just starting out, join thousands of traders who trust Cracken Pro for their crypto trading needs. pro.cracken.com to get started today. Metamask has something new. Introducing Metamask portfolio. Metamask portfolio is the best way to view your crypto portfolio from a holistic level. See everything across all the chains all at once. In your portfolio, Metamask will report the aggregate value of all the assets in your Metamask wallets and even the other wallets you import too.
Starting point is 00:04:18 But MetaMask portfolio isn't just a passive portfolio viewer. It is a place to do all of the money verbs that make defy so powerful. You can buy, swap, bridge, and stake your crypto assets. So not only is Metamask the easiest place to see your wallets in aggregate, but it's also a powerful battle station for all of your Defy moves. So go check out your Metamask portfolio, because it's waiting for you to open it up. Check it out at portfolio.metamask.io. Arbitrum is accelerating the Web3 landscape with a suite of secure Ethereum scaling solutions. Hundreds of projects have already deployed onto Arbitrum 1, with a flourishing defy and NFT ecosystem. Arbitrum Nova is quickly becoming a Web3 gaming hub, and social daps like Reddit are also calling Arbitrum home.
Starting point is 00:04:58 And now Arbitrum orbit orbit, allows you to use Arbitrum's secure scaling technology to build your own layer 3, giving you access to interoperable, customizable permissions with dedicated throughput. All of these technologies leverage the security and decentralization of Ethereum and provide a builder experience that's intuitive, familiar, and fully EVM compatible. Faster transaction speeds and significantly lower gas fees. Are you a dev, but you don't know solidity? With Stylist, Arbitrum's upcoming proposal for a programming environment upgrade, developers can write smart contracts in Rust, C++, and many more coding languages. Arbitrum empowers you to explore and build without compromise.
Starting point is 00:05:33 Visit Arbitrum. Where you can join the community, dive into the developer docs, bridge your assets, and start building your first app on Arbitrum. Bankless Nation, we are here with Nate's stories, and we are starting the AI week here at Suzalo. And Nate is an AI researcher? Is that how you would call yourself? Alignment researcher? I would say I'm the executive director of the Machine Intelligence Research Institute. These days, it's less research than I'd like.
Starting point is 00:05:59 I have done alignment research before. Okay. Can you explain that institution, the institute? What is that? So I didn't found it. It was founded by Elias Juerkowski. I don't know exactly when, maybe around 2001. Fun fact, it was originally founded.
Starting point is 00:06:17 It was originally called the Singularity Institute, and it was founded, because ELEASER wanted to make AGI as fast as he could. And then along the way, he realized that it doesn't go well by default, and it doesn't go well for free. And so then the organization pivoted to trying to make this AI stuff go well. And for many years, the Institute did some research, did some field building, did some awareness raising, and so forth until around 2012, 2013, when they pivoted to peer technical research.
Starting point is 00:06:58 And this was related to some of the field building, some of the awareness raising, moving to other groups as the field got a little larger. And I got involved when they pivoted to the technical research more exclusively. And so I was originally involved as a technical researcher. and then when the previous executive director left, I was the heir apparent. Okay, sorry, the machine, what's the name of the institute? Machine Intelligence Research Institute, aka Meary. Miri, okay. So it sounds like Miri has its own trajectory of itself that probably runs in parallel with
Starting point is 00:07:37 human understanding with machine learning at large. Perhaps, but also perhaps quite compressed. I think, I don't know the exact years. I wasn't around then, but I think it was only a couple years of work before Eliezer was like, hey, wait, this could get tricky. Right, okay, so I actually didn't know that about Eliezer. At first he was like, AGI as fast as possible, and then he was like, whoa, whoa, whoa, AGI as slow as possible. Yeah, I think, I'm not sure as slow as possible, but like AI done correctly, AGI done correctly.
Starting point is 00:08:09 I think, you know, we were hoping for a long time, like, one of the reasons we do technical research is that, like, you can often, like, often your lever is just, like, solve the problem. Right. Like, screw slowing people down, that, like, pisses people off. It's, like, slowing people down is sort of a last resort. The original hope was, can we just, like, solve the problem in time? It doesn't look like we're on track to solve the problem, and it looks like we have less time than, you know, I was hoping back in 2014. And so I think it is with great sadness
Starting point is 00:08:44 that people like Elaser and I are now saying we need time. We need more time. Can you kind of walk us through your own trajectory? How did you become the executive director at Miri? Well, if you want to go back far enough, I, at a pretty young age, I realized that the world was not very well organized. and wasn't, I was in a civics class,
Starting point is 00:09:20 and up until that particular civics class, I had some intuition that, like, there were lots of problems in the world, but people were sort of trying to fix them, and the reason there were still all these big problems in the world, was that we didn't have the technology, we didn't have the, like, we were still a young race, we were still a young species.
Starting point is 00:09:41 We hadn't, like, matured to the point where we could fix these issues. This was sort of like an implicit wordless intuition rather than a conscious belief. And then, you know, I started learning how the U.S. government works. And I was like, oh, God, it's like run by a bunch of monkeys. Like, it's like monkeys invented, like monkey systems. And it's all like working about as well as you'd expect if it was like invented by people who had no idea what the hell they were doing. And then like allowed to run for like hundreds or sometimes thousands of years and just like crane off into various.
Starting point is 00:10:13 And so I was like, okay, like, obviously. In the Moloch and crypto world, we would call this coordination and coordination failure. Totally. Yeah. So I was very interested in solving coordination failures and more generally in making the world a better place. And I tried my hand at various versions of that while I was, like, pretty young and bad at things. And, you know, it's a hard problem. None of it worked.
Starting point is 00:10:43 And the sort of circuitous route is that ultimately I got a job in tech while I was like trying to find ways to really move the levers on that problem, decided to donate a decent amount of money to good causes to sort of keep me honest about actually trying to make the world a better place, was trying to research where the best places to put money were, donated to some, like, global-poverty-type charities, and then started bumping into these arguments that like maybe this AI stuff is actually one of the biggest places to intervene on the world. I sort of read up on it. There were a couple other factors in my life that were also causing me to notice this AI thing. I read up on it and I was like, oh geez, like this is just obviously, you know, I was looking at the wrong problem. Like the coordination problems are big and they're real, but like this AI thing, you know, like, humanity like lives or dies
Starting point is 00:11:47 and gets like a great future or no future, depending how this AI thing goes. And I just like completely missed this problem for like eight years of thinking myself as trying to make the world a better place and going for the heart of the problems. But... So when you ran into the AI topic,
Starting point is 00:12:07 did you see... Did you first see AI as a solution to all of our coordination issues or AI is a problem for all coordination issues or did you see it simultaneously at the same time? A little bit of both. I was maybe somewhat primed towards understanding some of the issues with AI due to my work on coordination problems.
Starting point is 00:12:28 It's like slightly embarrassing, but I was like working on like various coordination mechanisms that could address the sort of concerns people had at the time like how can a well-coordinated society without coercion address, for example, like, concentration of wealth in ways that the society as a whole doesn't like. And you can set of various coordination mechanisms of like, like, yeah, you can sort of try to think about like what are non-coercive ways that a society as a whole can like try to both
Starting point is 00:13:03 have a market system and not let it get out of control in certain ways. And while messing around with like toy models of this and like attempts to like prove certain theorems, I just like couldn't get some of the results I wanted. And it turns out that I couldn't get some of the results I wanted because nothing stops one actor from being powerful enough that they can just run away with everything. And this was sort of like, it was one of those issues where I was sort of like, well, you know, I can get it to work in a lot of cases, but I can't get it to work in all cases. And then with the AI stuff, I was like, oh, that's why. Can you elaborate on that? Why is AI, why is AI like the kernel of the issue?
Starting point is 00:13:42 I mean, AI is a version of that particular issue, but like fundamentally, no matter how good your market coordination systems are on Earth, like, if somebody has the raw technological power to, like, for example, get, well, we could. call in the business a decisive advantage. So, like, maybe the easiest thing to imagine, given that we already know that, you know, trees are machines that turn dirt and sunlight into more trees by stripping carbon out of the atmosphere and building wood. We know that, like, nanotech's possible. If you imagine something that just, like, gets to nanotech before everything else, and it can just, like, reassemble you into a more willing trade partner that asks for less of the gains from trade, suddenly all of your coordination mechanisms that were like market-based and non-coercive or whatever melt before this thing. And like, am I saying that literally happens
Starting point is 00:14:45 or literally like is in a market framework? Not particularly. But you can sort of see how like I was sort of like trying to put a like collaborative agents interacting framework on a like physical reality where it's just a fact about the physical reality that like things with a sufficient technological edge just can work the table with you if they have too much of an edge and if they don't care about you simply put is this kind of are you just combining moloch problems and the bankless nation is pretty familiar with moloch problems we've we've done a lot of content on moloch you combine moloch problems with exponential technology and then you arrive at some sort of like logical end where humans get their atoms repurpose?
Starting point is 00:15:30 Is that more or less the simple articulation? It's not a bad summary. I wouldn't use exponential in particular. I make no strong claim that it's an exponential curve. My guess is that it's not and it's worse. And much of the issue here is if you make something that is optimizing the world and it's optimizing the world towards some target that doesn't have concern for you in it. Like, I would have much fewer qualms about, like, I would have basically no qualms about, like, human technological development.
Starting point is 00:16:08 I sort of am very optimistic about humanity's better natures and humanity like being able to figure out, like, how to make the world more like we would want it to be upon reflection. And if we were wiser, and rather than locking ourselves into totalitarian decisions, topias, which I think totally could happen. But like if we can just like ramp up human like intelligence and capability and so on without like accidentally killing ourselves, I'm like pretty bullish on humanity's prospects. And so it's not it's not so much like, oh no, technology is coming. It's coming too fast. We want to be able to handle it. It's more like, oh no, we are like on the brink of building optimization processes that optimize the future much harder, faster, better than we can, and that are optimizing it to a place that has no room for us.
Starting point is 00:17:04 So it sounds like AI is one way and that might happen, but you're also saying the way that you're talking, it sounds like there's other ways, AI or not, when the same hyper-optimized future that's not optimized for humans could play out without AI. Like, I sort of expect we're going to get to superintelligence one way or the other. AI looks to me like one of the, like basically the only feasible route modulo, like if humanity can't come together and coordinate to take some other route, I think other routes like Holbron emulation are probably preferable. Where to be clear, I'm no carbon chauvinist and I very much want to live in a future with like artificial friends where those artificial friends have like very different sorts of like desires and. goals and objectives from me. I'm not like humanity must keep an iron grip on the future.
Starting point is 00:17:59 I want like space for aliens. I want space for artificial minds. I want space for other kinds of life. The like concern here is like building a mind that doesn't care for life, that doesn't care for fun, that doesn't care for like diversity of experience and like, interesting arcs and like cosmopolitan value
Starting point is 00:18:30 and like broad inclusive like good times, and I think that we are in fact barreling towards that cliff edge of making something that fills the universe not with weird valuable stuff but with non-valuable stuff.
Starting point is 00:18:52 Okay, so you've been thinking about these problems for a long time. When did you start at MIRI? What year was that? That was 2014 that I was hired. Once I noticed that the problem existed, I donated, I think, $16,000, which I think at the time put me in the top 10 public donors list. And they were like, congratulations, you're now in the top 10 public donors list. And I was like, what? And they were like, you know, I don't remember the exact amounts, but they were like, we're doing our fundraiser for $200,000 for our yearly budget this year, $100,000 of which we're trying to raise from the community and like $100,000 of which
Starting point is 00:19:40 is like matched from another donor. And, you know, it's like three weeks into the four-week fundraiser and they'd raised like 20K of it. And I was like, oh, God, like, this is worse than I thought. Oh my goodness. That's hilarious. And that was in 2014, or? That was 2013. 2013. Yeah, so I donated. That precipitated your arrival actually working at MIRI. That's right. So I donated more money because I had no idea it was that bad.
Starting point is 00:20:01 Did you end up funding yourself, your own salary? I took a very big pay cut moving from Google to Mary. And I sort of, you know, I was expecting not to be very skilled at working on these issues. and maybe I'm not. There haven't been a lot of people on these issues. But at the time, they, I was like, how can I help? And they were like, well, maybe if you go to the math, you can, like, come to summer workshops. And so I came to one of their workshops.
Starting point is 00:20:35 And then a few months later, they were hiring me. And then a year later, they were saying, can you run the place? So I am largely in this field by dint of showing up early. And like, for the love of God, people more skilled than me, like, by all means, come replace me. Right. So that's kind of what I was leading us into.
Starting point is 00:21:04 So that was 2014 when you started. It's now 2023. So you've been there almost a decade now. Yeah. Now AI is having a moment, very much spurred by ChatGPT. All of a sudden, crypto podcasts are talking to AI people. What is that trajectory like
Starting point is 00:21:19 as somebody who was immediately compelled by the problem at its very essence so far long ago. Now, fast forward to where we are now and kind of the problem seems to be on the horizon. I don't know how close it is. I don't think anyone does. That's kind of the problem. But like here we are nine years later. And now many, many, many people are talking about it. Can you just talk about that experience?
Starting point is 00:21:40 Yeah, it's heartening. One thing that I have really enjoyed about it is, is I've spent many years having conversations with people in the field, many of whom sort of don't really want to hear that their work by default is like barreling towards destruction. And so I have like these long conversation trees like I have rejoinders to all sorts of counterarguments. And when I go into these discussions with, you know, people on the capability side of things, I have like all sorts of responses prepared and I sort of am like ready to like go down this long decision tree. And then I sort of like nowadays, many more people are noticing the issue. And, you know, I was invited here.
Starting point is 00:22:35 And I think crypto people would actually pretty much really resonate with that where like we have to explain, you know, Bitcoin, 21 million hard cap. We have to explain all these things like proof of work. And like the conversation trees that we have to go down, we've built out those like, nay responses. It was like spinal reflexes. And then lately, moving into 2022 and 2022, three, fewer of those things we have to explain, especially as we just printed out a bunch of money for COVID simi checks. Like all of a sudden, we have to explain the concept of scarcity a little bit less. And so it kind of sounds like a similar experience that the AI people have. Yeah, totally. Like now I go to people who aren't in the field and I'm like ready to go down
Starting point is 00:23:11 all these citizen trees and they're like, what's the issue? And I'm like, well, in the most basic sense, here's the issue. And they're like, oh, yeah, that seems rough. And I'm like, oh, man. this is such a different conversation. I mean, that's the first step, right? The first step is education. And also acceptance of the problem. I could imagine for so long, you were saying, hey, people would ask you,
Starting point is 00:23:29 hey, what are you working on, Nate? And you'd be like, oh, I'm working on AI alignment. And then people are like, why the hell are you working on that? Yeah, they're like, oh, that's weird, or they're like, is that some weird Terminator thing? Yeah. And it's been nice to sort of, I sort of think that, like,
Starting point is 00:23:46 a lot of the basic issues, have been pretty obvious the whole time and that we're now seeing people who don't have distort incentives noticing the issues. But it's really quite hardening to see. I don't know where it will go,
Starting point is 00:24:03 but it's been nice to see people starting to notice that this is a real thing. It's really on the horizon. Like you said, I don't think we know how far. It's very hard to predict. at least with precision. But it has looked to me like one of the biggest issues facing humanity for a while,
Starting point is 00:24:27 and it's very nice to see others start to notice that as well. So that leads me to the question of just like how optimistic you are, and I'll ask that in two phases. First, the same question that we asked both Elyaser and also Paul Cristiano, I was like, all right, what are your chances of doom? What are your chances of the worst AI problem being the worst, the worst version of itself? I mean, worst version of itself, I think, is very hard to get.
Starting point is 00:24:54 But the version where, like, we all die, like, there are faiths worse than death. But, like, the version where we all die, I think this is pretty likely. I think this happens by default. More than 50%. Oh, definitely. Paul Krishonno gave us 10 to 20%. So you're saying more than 50%. My understanding of Paul is that he has 10 to 20 on the scenarios that I think are like
Starting point is 00:25:15 AI takeover and higher probably. bills than that on like humanity completely disempowered. I'm definitely, uh, I'm definitely more pessimistic than Paul on these counts. Uh, like, I would say that on my models and visualizations, on my understanding of the problem, there is very little hope. Uh, and most of my hope comes from me being wrong somehow. Uh, and so my probabilities on this destroying everything I know in love are like as high as my probability, like, they're about as high. as my probability can go, given, like, the fact that I may just be totally wrong, and hopefully am. Okay, so you're pretty close to the L-Azer side of things, which is, like, 95 to 99% doom.
Starting point is 00:26:03 Yeah, I mean, I think 99s are hard to get. But, like, there's also a difference between, like, what does the world look like as I see it? The world looks as I see it. Like, like, the place, like, as things seem to me, we're just like, you know, within a rounding over 100%. And the difference between that and my betting odds is in like, hopefully the world's not as it seems. Right. Yeah.
Starting point is 00:26:32 So what you're saying is like, we don't, the nature of the AI problem is just a lot of we don't knows. And so what you're saying is like, the reason why you maintain some level of optimism is because there's like a white swan event that's possible that could save us. Yeah, and, you know, I have a bunch of, I've thought a lot about various parts of this problem, and I have, you know, various guesses as to where white swans are more or less likely. And for instance, it looks to me like the white swans are less likely in my unknowns about AI and more likely in my unknowns about how humanity is going to react to the problem, although there are still some unknowns in how AI goes where there could be white swans. Sure. Do you remember when you were first working on this problem, I know you weren't as skilled or as knowledgeable back in 2014 to 2017 when you were first working on this problem. But what was your level of optimism or pessimism back then?
Starting point is 00:27:25 And like how has your attitude towards the problem shifted over the last almost decade that you've been working on this? You know, it's gone up and down. I've rarely had double-digit odds of survival. But I have had double-digit odds of survival when I've been explicitly quantifying. And, you know, most of these numbers are like coming straight out of my butt. I would like one put too much. But by definition, everyone's numbers are coming out of their butt, and that's kind of like, there's no alternative. Yeah.
Starting point is 00:27:53 And I don't spend a lot of time worrying about specific numbers. Like, you know, once it's less than 50% chance we get good outcomes, it doesn't affect my day to day. I'm not like staying up trying to calculate significant digits here. I'm like, man, humanity does not look like it is up to this sort of task. That doesn't, you know, I've seen humanity try to coordinate, and it's the one thing
Starting point is 00:28:19 we have not figured out. Yeah, and for the record, the reason that I managed to end up with, like, high probability this is tricky is not that there's any one part of the puzzle that looks to me insurmountable. You know, humans are pretty good at solving problems when they put their minds to them. The reason that I'm, like, pretty pessimistic here is
Starting point is 00:28:43 it looks to me like there's a bunch of different ways for things to go wrong and there's a lot of things that need to go right for things to go right. Like you not only need to solve various technical challenges, you need to have uptake of the technical solutions in the relevant organizations. Those organizations need to be able to bureaucratically recognize the difference between a real solution and a fake one. You need to have them like carrying at all, which is not even a fight that like we've won yet. There are, you know, you have like the heads of labs that like Microsoft and Facebook like poo-pooing a lot of these.
Starting point is 00:29:15 issues. So there's like five, six, seven needles. And like this, so I want to combine two metaphors where like the stars need to align except the stars are needles that we also need to thread. And like that's, and that we need all of those things to happen. And you're saying it's like that window is small. That's where you get the, uh, the difficulty from. And to be clear, um, uh, I think it would be a fallacy to say like, look, I can give you like six
Starting point is 00:29:44 things you need to do. and what's the chance you can get all of them? That sort of reasoning doesn't really work. Like, if I line up all six and then the one that I assign least likelihood to happens, probably underestimated his likelihood, probably underestimated the correlation. Like, these are not independent events, right? Like, if we can solve the hardest of these issues, whichever one that turns out to be, probably it's because we turned out to have coordination, skill, or competence, and so on.
Starting point is 00:30:07 Like, I'm not saying you can drive the probabilities arbitrarily low by the fact that I can line up a bunch of hurdles. I'm more saying it sure seems to be like there's a bunch of hurdles, man, and each of them, like, well, not all of them, but like many of them have a character that, like, humanity hasn't really faced before, and this all adds up to me being like, man, I'm like single-digit probabilities of survival here. So with this new, or maybe first surgence of interest to the AI problem, now thanks probably thanks to chat, GBT, thanks to the problem itself. how has that shifted your optimism, if at all? It's, I mean, it's, I feel a little hopeful about it. I feel like a spark of hope here. It doesn't, it doesn't, like, shift my probabilities on the ground too much.
Starting point is 00:31:05 Like, this is like a really dumb model, but, like, if you imagine having, like, three variables, each with a one and a hundred chance. and success is like multiplying them all together than raising one of the variables from like 1% to 100%. It doesn't change your overall probability too much. Right, but it is the first needle that is threaded. So like if you go, if you're like,
Starting point is 00:31:28 I know it sounds like you don't really like these like specific numbers, but if you are like 99.9.9% doom and then because people are now optimistic, you go down to 99% doom. It's still an order of magnitude. Right, right. So like it does feel like like some orders or magnitude
Starting point is 00:31:43 on the models that say were screwed, right? So like, like, my, my, the parts of my models that are like, maybe we're fine are like, maybe I'm just wrong about some stuff. And the parts of my models that say we're fucked,
Starting point is 00:31:55 these models are basically saying you have zero percent and like, you know, it's really like, you know, zero points, blah, blah, blah, and then, you know, a one or whatever. And there you're getting some orders of magnitude, which does feel hopeful.
Starting point is 00:32:07 I'm, like, enthusiastic about that. And it, like, ups the probability that I'm wrong about something that matters like humanity's general ability to coordinate, as would like. So it does, it does, like, it does, I'm, like, heartened by it. It doesn't noticeably affect. Of all the stars, we've, we need to align. One is started to show that it's moving in the right direction.
Starting point is 00:32:29 It's moving in the right direction. Like, there's a whole bunch, like, I'm not like, oh, suddenly, like, the populace is going to realize these problems and react sanely. There's, like, there's, like, so many more steps that people can stay that off the train here. There's, like, you know, if, if, you know, if, you look at the national response to COVID, people noticed that there was a pandemic going on, and that didn't make them respond in any sort of reasonable way to it.
Starting point is 00:32:52 Like, maybe AI will be different. You know, they're sure were more movies about, like, robots gone wrong than about pandemics beforehand. At least that would be my guess. But, like, many of the movies don't really understand the issues. There's, like, ample room, like... Like, like, I, there's just, there's, I've seen politicians try to respond to issues before. And I would not say that the, like, social awareness needle has been threaded.
Starting point is 00:33:28 It's like showing, it's showing promising signs. And that, like, gives me some spark of hope. But we haven't, like, cleared a whole obstacle in a way that would make me be like, wow. Like, humanity is bringing much more confidence to this issue than I expected. Not to keep on this one particular line of conversation for too long because it's only one part of the overall figure. But just like, we don't actually need, I want to present the argument that we actually don't need all of humanity. I guess we do need governments to coordinate. And so we need the leaders and we definitely need those key figures.
Starting point is 00:33:58 But like if the bottom half of the IQ of humanity is like, it's not a problem. And then the top half of the IQ of humanity is like, this is a real problem. I'm going to count that as a big win because like we don't need all of the people. We just need the smart people to focus on. the problem. I think that's mostly right. Modulo, the issue where, like, which regulations happen and how the government systems move, I think, does depend a lot on the public. Sure. Or at least it can. I do think you're right that, like, you know, there's some, like, if we, there are ways that we could resolve the technical issues with such, like,
Starting point is 00:34:38 resounding success and it could turn out that the resolve technical issues were sort of like so like obvious in their property of being a solution or like otherwise very beneficial to capabilities such that like you just get very big uptake
Starting point is 00:34:56 and that maybe could be done with like a relatively small handful of geniuses who can do much better at the problem than I ever could and like maybe that would be a way to just solve the whole issue without going through like various other social obstacles or political obstacles or so on. And in that sense, sure, you just need like maybe even the one, like maybe there's like one bright person somewhere in the world who would find this problem easy, who like hasn't had
Starting point is 00:35:24 the opportunity to see the problem yet because the world's really bad at like getting resources to education to people who need them. Right. I wouldn't bet on it at this point. but in that sense I would agree that like in some sense we just need the right mind to the problem okay so pivoting to the looking at the problem as a whole you said your models are effectively close to zero at being able to solve this problem on model yeah on model like like within yeah within my models cool within your models uh why why do your model say that um it's it's again due to sort of like a disjunctive argument, like many paths lead to doom here. And much of the reason is the way that, like, one of the forks is the way that people don't seem to really take the problem too seriously.
Starting point is 00:36:23 And there's sort of, it feels to me like there's sort of gradations of this of like, like people, I don't know, like, Back in the day, a lot of the argument centered around, like, is artificial intelligence even possible? Is significantly smarter than human intelligence even possible? And, like, once you convince people of this, then they're like, well, can it be solved in the next, like, hundred years or whatever? And once you get past this, they're sort of like, well, maybe it'll just be, like, more moral as it gets more smart. and then once you get past this there's like well maybe you can just sort of like train it and it's fine and like the train keeps
Starting point is 00:37:12 going and like people can always find a reason to get off the train at the next stop and you then can put in like a bunch of painstaking effort to try and like lay down the arguments as to like why the issues are like
Starting point is 00:37:29 maybe harder than this and we just seem like very far from like the people at these labs running these labs they're getting off the trains at pretty early stops and even if reality starts beating them over the head with various things I expect them to like only move one more stop or only move like as far as reality is forcing them
Starting point is 00:37:57 and then like separately my models say that there are issues that predictably arise only once the AI gains significant capabilities and can be a real threat to you. And if you're in this regime where you're sort of like need to drag people along and they sort of like only start believing things
Starting point is 00:38:21 when they empirically see those things, now you have an issue where like if there's issues that don't empirically arise until the AI can wipe the planet or wipe your civilization off the planet, that's too slow. It's too slow. Right.
Starting point is 00:38:38 I give like other forms. There's one model for like visual to like understanding this. It's solving the AI alignment problem is that there's a bunch of decision trees that we need to go down. So I'm imagining literally a tree. And say there's like big tree, big big tree, big oak tree. And there's a single fruit on this tree. Oak trees don't grow apples.
Starting point is 00:38:56 But let's say say this oak tree grows apple. There's one apple. And that is the solution apple. And it's very high up in a very far branch away. And we're at the trunk. and we need to find our way to that one single apple, that one single fruit on the tree, except that tree branches eight times. And then when you have each branch, each branch branches itself eight times.
Starting point is 00:39:18 And so, like, it's an exponential problem because you have to choose the right path without knowing where the solution is, without knowing where this golden savior apple is. And we need to make the correct path towards the apple without falling down any sort of the dead ends. And so maybe one of the ways to articulate why your models are basically close to zero is that you're just bearish on humanity picking the right branch fork to lead to the solution apple for the number of times that we actually need to do that. Yeah, that's a pretty decent analogy. I would say that like you've got to be a bit careful with arguments like this because if humanity has managed to choose the past seven forks correctly, you're probably not like thinking that there's an independent probability on the eighth. And like, so like I don't actually buy, like, I'm not actually getting my probabilities here from like these sort of like, well, it's exponential, look at all the independent guesses. Like, once you've been wrong about humanity picking the right branches a few times, you should no longer think they're independent. Right. You can start to be optimistic. Like, hey, maybe we're doing something right here. Right. And like another thing I would add to the analogy is that like a bunch of the branches are like full of apples that give you money until they're apples that give you death. Right. And so, like, and everyone's, like, making arguments as to why they can, like, detour into this money branch.
Starting point is 00:40:37 Right. Which, and they do give you real money right up until they, like, give you death. Right. And then you add that to the fact that humanity, like, has seemed in practice to be, like, very interested in wrong branches. Right. Very susceptible to the money apples. Very susceptible to money apples. The money death apples.
Starting point is 00:40:53 Right. Now you're getting, like, a bit closer. Right. I will note one place where I am commonly misunderstood is that people think that I think that I think the technical problems of alignment are like super duper hard for some reason. This is like basically not the case. My stance is much more like the technical problems of alignment are basically underserved. There's been like, you know, a few dozen people working on these things for not terribly long. Like if you were, like humanity has spent spent much more effort trying to like solve physics, at least I think.
Starting point is 00:41:31 I'm not actually terribly familiar with how many scientists there were in, like... Pre-Newton era. In the pre-Newton era in, like, whatever club Newton eventually was in. But, like, humanity just really hasn't put much of an effort towards these issues. And one thing that makes the problem tricky is that, like, there is much less room for trial and error, or so my models predict, given these, like, issues that empirically only show up right around the time that your civilization is getting wiped. And that raises the difficulty level, but it's not like we have turned the best minds of three generations to these issues, and they have come up empty-handed. It's like we've turned like a few dozen weirdos and nerds who are like able to be compelled by these arguments 10 years ago to these issues. And, you know, now we're turning like a few more people to these issues.
Starting point is 00:42:24 But like, I am not saying like, and there's some great technological feat that needs to be pulled off. I'm saying there's like a normal technological feat, and meanwhile everyone's like scurring around doing something else instead. Okay. Where the normal technological feet does have these extra difficulties of like you can't do as much empiricism, which may be enough to push it over the edge of like humans couldn't do it. But like in large part it's just underserved. Sure.
Starting point is 00:42:52 Sure. When the last few moments of our Paul Cristiano episode, we asked him like, why, what are the bottlenecks? What are the constraints to solving this problem? his answer was, interestingly, not funding. It was talent. It was supply of brain power. Would you agree with that?
Starting point is 00:43:08 Yeah. Or at least funding was a lesser problem. Like, you can always, yeah, like money's fungible. And although it's tricky because it's only fungible to a degree, like you do have issues if you try to put in a lot of funding that you like start to distort the incentives and get people who are like showing up. Grifters. Yeah, grifters.
Starting point is 00:43:29 and you also have legibility issues where, like, are you distorting the field towards the legible work and away from the, like, less legible but potentially more important work? I basically think you shouldn't worry about that at reasonable monetary skills right now, but you should maybe start worried about that. You're saying legible versus illegible, where the illegible is just, like, hard to understand, hard to comprehend, but actually technically correct. and then... Or in particular, like, hard for a grantmaker
Starting point is 00:43:59 or a, like, funder to evaluate whether you succeeded. Right, okay. So, like, if you can read the paper and it's simple to understand, the grantmaker might, like, oh, let's fund that. But it could actually just be a wrong path on the death apple tree.
Starting point is 00:44:13 Yeah. And, like, I basically think you shouldn't worry about this. You could, with sufficient amounts of money, get into these situations where, like, I start to worry that you're distorting the field in that way. but I do think I think talent's lacking. I think there's maybe two kinds of talent that I consider to be relatively different that I think are lacking. One is just like more hands on deck trying to understand how these AI systems that we have today work.
Starting point is 00:44:50 Like we're starting to make like fledgling minds. doing stuff that we can't do by hand. We don't know, like, we couldn't program similar capabilities by hand. We don't, like, know what the, like, algorithms, data structures type stuff. We don't know, like, what, like, we don't know how these things are working. We know how we built them. We know how we got them to work. We don't know, like, internally how they're working.
Starting point is 00:45:14 Understanding that would give us quite an edge in figuring out, like, how to point minds at things on purpose. And I think we sort of want. all available hands on deck trying to do that stuff. And then I think that separately there are questions of like, like, I sort of, the alignment problem I think can basically be factored into, this is like not quite true, but it's like a fine first approximation, can basically be factored into one challenge of like,
Starting point is 00:45:46 how do you sort of make an AI that wants X for some, X of your choosing and then separately a question of like what X can you put in there such that like you're happy with what happened. There's like a bunch of additional issues where the additional issues are like how do you make it be able to like do one thing without a ton of side effects you didn't want and like then shut down rather than like you know prevent you from turning it off so it can verify forever that it's successfully completed his task or whatever like there's there's issues of like how do you and that's sort of like a whole separate back of concerns. But, you know, many people sort of think the problem is, like, what would you ask an AI to do such that the results would be good?
Starting point is 00:46:31 But it seems to me the problem is much more like, how do you get an AI to do to, like, care about what you wanted to care about in the first place. And that seems to me like it takes a different and often less legible type of research that I think can totally be informed by understanding how the current AIs work. these sort of like directions go hand in hand. But one of the big things I think we're missing talent-wise is the sort of person who's like, has the like ambition and the gall to say like, I just can take a swing, figure out what the hell is going on with minds.
Starting point is 00:47:18 How do minds like end up caring about things? or pursuing things or like having preferences or like having targets how does that work like I think that I can like figure out how that works and how to direct it like humanity does not have a theory of minds in this way we do not have a theory of like minds that can be pointed and it's probably not for lack of that theory being possible it's probably for lack of like just having gotten there like science wise and you can sort of come at that from one end which is just like figuring out how the things in front of us work and then like trying to learn what you can about minds and learn what you can about aiming them. I think there are also other ways to come at that that take much, I think, less legible research and like more like independent visionary sort of research, although I think you need a bunch of vision to make progress and figuring out how the current systems work. But that's another place where I feel like we're really hurting for talent is those like ambitious visionary. who just think they can take a swing at the hole, like, alignment challenge.
Starting point is 00:48:26 Okay. So when you, say we fast forward to the future and we've solved the alignment problem, because that's the only future that we'll have to be able to reflect upon this question in the future. Is there, like, going to be a statue of a person who's going to be like, they solve the alignment problem? Or is it going to be, like, a team of people? Or is it going to be not even, like, a moment where, like, AI alignment is solved? and it's just like the alignment problem just dies by a thousand cuts.
Starting point is 00:48:54 Do you have any sort of mental model for this? Like I think these things, I think it's like pretty unclear. And part of the question doesn't come down to how does it happen, but it comes down to like how do humans attribute things? Like it does seem to me like historically a lot of like big theoretical insights, a lot of like paradigm shifting theoretical developments end up attributed post-talk
Starting point is 00:49:26 to individuals like Newton or like Einstein I think there's like a bunch of truth to this I think you know we also want to count like Riemann and Laplace names that we don't know and that's kind of the point but that helped out like Newton and Einstein
Starting point is 00:49:45 that I'm getting my like L names mixed up. I'm not actually sure it was Laplace. But also at the point. Yeah. But like surely it will be an effort that requires lots and lots of people. Surely will be an effort that requires like many insights from many camps. Surely there will be huge amounts of like individual labor, much of it probably thankless,
Starting point is 00:50:13 from like people who show up and can work on like the shovel ready projects that scale with labor. whether there will also be critical insights that come from geniuses that like change the paradigm I think my guess if we like condition on
Starting point is 00:50:33 getting to a future where the problem was solved my guess would be yes but that guess is like a little bit distorted by like how did we get out of the hole that we seem to be in and like well probably there was like some force that like made things go a lot better than it looks like they're on to go and like lone geniuses are the sort of force that can do this.
Starting point is 00:50:51 If you're just like looking for probabilities that it takes lone geniuses, seems hard to call. Sure. So the bankless audience is sufficiently large to the point where I'm going to say that there's at least one person listening to this conversation who's going to be like, I'm ready to dedicate my life to solving the AI problem. What advice do you have for that person? Where should they start?
Starting point is 00:51:14 How should they get started? It's tricky. there's um like a lot a lot of the people that i'm most excited about have come in from very different angles and have their own sort of like novel perspectives on the problem uh i'm like generally much less uh enthusiastic about like some people some people come into this problem and they're like, let me read everything everyone's ever written about the problem and like try to synthesize it and get a sense of like where things are and then like work from there. And other people come in and they're sort of like, you guys are obvious idiots.
Starting point is 00:52:01 You haven't checked like the obvious things. The obvious things are these things. Like let's try looking at it this way. Let's try doing like and like, you know, maybe maybe maybe I'll like send my research around and people can tell me where I'm like being obviously dumb or like retrading past mistakes. But like I'm starting with the assumption that. like no one here previously was like on the ball. I tend to be more optimistic about people in the latter category. I tend to think.
Starting point is 00:52:31 They need a creative solution. Like creative solution and like someone who's more like I can obviously have ideas about this myself rather than like let me make sure to integrate everyone's previous ideas so far. Like everyone's previous ideas so far haven't solved the problem and also many of them are kind of dumb. Some of them mine. like at the same time
Starting point is 00:52:57 there's sort of like a bunch of context that I do think people need and it's sort of hard to get like what are good intro resources like my guess is that like the less wrong wiki has a
Starting point is 00:53:24 AI alignment intro resources page and if you Google like A.A. Alignment Inter Resources is less wrong. You'll find like a collection of a bunch of different intros people have written and then you can maybe like find one of those introses that like resonates with you. It's hard. I don't really know
Starting point is 00:53:43 how to onboard people. I don't really know where the people who come in and are like I have a chance of solving this whole damn thing come from. but I think my advice would be like maybe look around on the internet for some resources that you like and also maybe like just try to solve the call on yourself. It sounds like it's a there's no guide, there's no university path. There's a dark forest that you have to get through.
Starting point is 00:54:12 And if you get through the other side to be able to be competently talking about this thing, congrats you're there, but also the whole problem itself is also a dark forest. That's, yeah, that's, it's definitely part of the issue is that, and it's not, for lack of like a guide someone wrote on the internet you know there's probably like at least half a dozen of those i sort of don't think any of the guides are very good uh like this field is like like like a yeah you're totally right someone who can solve these problems needs to sort of be able to like go off to the frontier where things haven't been done before uh and part of that is like even getting to the to the beginning of the problem
Starting point is 00:54:54 at all. Like, it's a skill you'll need, although I'd love to be able to, like, get people to beginning of the problem more rapidly. But it does feel to me a bit more like trying to, like, figure out physics pre-Newton rather than trying to figure out physics in the days where we have
Starting point is 00:55:11 physics classes. Like, there's not there's not a, there are people who have tried to write intros, there are people who have tried to write like open problem lists. I don't think they're great. You can find them on the internet. like try to develop your own intuitions and approach the problem as if like we know very little because we do in fact know very little. Okay, so that's if they want to direct the solve the problem head on. I know that I don't have the mind to apply my skills to this. So what do I do? I do
Starting point is 00:55:43 podcasting. What about some secondary skills and secondary efforts that people could to help solve this problem. I, like, there's, I think there's a place for regulation on these matters, which pains me deeply to say, given how well regulation has done on various issues in the past. And, you know, I think, I think many harms in society come from over-regulation. But, like, it seems to be human beings. humanity's only tool for going to a field that, like, self-professes largely that it has a decent chance of killing literally everybody and saying, like, hey, maybe, like, back off on that until, like, we understand what we're doing well enough to, like, do this job properly. I would love humanity to have tools that weren't regulatory for this. And, like, if I was trying to design the coordination mechanism, I would be, like, trying to handle it with, with like liabilities rather than laws.
Starting point is 00:57:01 Well, the problem is sufficiently large that the cost of regulation are acceptable. That's what it seems to me, although I say this with sadness. And so, like, that's a whole track where, you know, I'm no expert in how to get regulations to be actually, like, narrowly targeted and good. but, you know, I think there's a bunch to be done there. I think most people will also be like, well, I don't have, like, the mind or the ability to go into, like, politics where it matters at the moment.
Starting point is 00:57:32 And I'm not sure going into politics is the right thing there. But that's another area where people, where I think there's, like, work to be done that takes a different, that draws in a different skill set. I like if if people think they have an edge in like the education problem in like writing up the basic arguments in a way that like reaches a different sort of audience or is like more compelling to a different group of people or is like more modernized or something I think these are all like find things to be doing. Like, for example, you seem to me to have noticed you have an edge in, like, talking to folk, like, helping, like, the sort of arguments and the recognition of the issue reach or broader audience.
Starting point is 00:58:38 I think this sort of stuff is great, draws on a different set of skills. For a lot of people who don't have, like, one of these three opportunities, I think there often isn't an easy way to help out. Reality does not need to give everybody a, like, the laws of physics don't care about you. They can just, like, drop you in a world that is, like, under serious, shutter destruction while not giving you an easy thing to do about it.
Starting point is 00:59:05 And I think, like, there's a skill to sort of, like, not losing a bunch of sleep over it, not getting terribly depressed about it, like, looking around for where you can help, and like if you can and you want to because you're like believe that there is like a big threat here and you care about averting it, then like, hell yeah, I respect that. And if like you look around and you can't find good ways to help out, such as life, keep an eye out and like no need to get depressed about it. psychologically how do you deal with this this looming problem like when you wake up in the morning
Starting point is 00:59:49 are you like ah shit we're going to die or like how do you deal with this mentally i mean i have for a long time not had much faith in humanity's ability to coordinate and so most of the emotional blow most of the update for me uh was in late 2012 when i became persuaded on these issues you know it's it's it's it's not like uh i was like oh yeah is interesting and then like over times i saw humanity like go down paths that seemed to me like quite derpy and like failed to take the problem seriously and handle appropriately my probabilities went down i sort of like i think correctly just like guessed early on that probably humanity was going to be pretty derpy about this and go down false paths.
Starting point is 01:00:41 Love derpy as a technical term. Like, I sort of like try not to make predictable updates. And so, like, there was a day in late 2012 where I was like, oh, geez, like, I was wrong about a lot of, like, I was wrong in my previous pursuits. I missed, like, the biggest problem heading toward this planet. It's, like, kind of embarrassing that I wasn't able to figure out myself. Like, I thought of myself as, like, trying to go for the world. biggest problems. But when I was like 14, I intuited what seemed to me like the world's
Starting point is 01:01:12 biggest problem and never like sat down and tried to make a list of like what other like problems might be bigger. I was just like, obviously coordination is the biggest one. I'll like go for the throat on that and like needed some other people to come along and be like, hey, have you noticed this intelligence thing and how it's like is the primary factor determining the future and how humans are not at the maximum of intelligence and like are on their way to make like other intelligence that won't by default care about anything nice. And so there was a day when I sort of like, that argument hit me. I, like, my probability of a wonderful future dropped from, like, you know, like mid-90s to
Starting point is 01:01:53 mid-tenths or, like, mid-zeros, I guess. And I mourned. And then I, like, didn't feel a need to like psychologically focus on it a bunch after morning like you mourn and then you try to save your civilization. There are still many times like it's not on my mind when I wake up. There are still many times when I'm sad. There's still many times when like I see something particularly beautiful or particularly moving or that I really quite like about like this planet and my species and like sentient life more broadly and I like shed tears about it.
Starting point is 01:02:49 But it is not a dominant psychological factor as opposed to like it's a deep source of sadness, but I'm not like constantly wallowing it. I don't really have a question here, but I will say that, say there's a 1% chance of solving the AI alignment problem. That doesn't just mean that we don't die, though. It means that actually, all of the negative side of the AI alignment problem, the kill us side, inverts and it saves us and produces the inverse. The level of bad turns into a level of good that we've never seen before. Totally. And so there's something there about just like maybe that 1% of solve. that chance is so small, but the good that comes out on the other side of that is really, really good.
Starting point is 01:03:41 I mean, that's what we're fighting for. But no stronger reflections other than that. I do think people often underestimate just how good things could get. Like, one place where I prefer talking to most people in AI versus talking to people from the more general population, at least in America and especially the more blue tribe, is that it seems to me there is a big meme of misanthropy, especially among the blue tribe, of: maybe humanity isn't worth saving. Maybe humanity has done too much harm.
Starting point is 01:04:37 Maybe we're the source of evil. And I do think that we are, like, basically the only source of evil around. Like, mosquitoes are still edging out humans for the top killer of humans. Well, the mosquito-malaria alliance, really, which I am personally offended by, and I think we should wipe mosquitoes off the map so that we can be number one for killing humans. And also number one in killing mosquitoes. That'll show them. But, like, I would not dub malaria evil.
Starting point is 01:05:14 I would dub, like, the Holocaust evil. Like, we are the source of evil. We are where all of these ills come from. You know, we are destroying large swathes of the environment, and that is sad. But we are also the source of, like, love and beauty and friendship and art. And these things also aren't, like, universally compelling. There isn't a stone tablet in the stars that says love is great.
Starting point is 01:05:52 The reason that we care about, like, love and friendship and hope and fun and enjoying ourselves is because these were the correlates of fitness in the ancestral savannah where our species evolved. And the particulars of these emotions and feelings and things we care about, those particulars depend on the specifics of our development. Hopefully some of them overlap with various aliens, but probably not exactly. And it's very unclear how alien other evolved aliens will be. But those things are also in us, and they're also from us. And we might be the only source of that in the universe.
Starting point is 01:06:47 And we are very likely the only source of that within, you know, 100 million light years. And we also know about ourselves that we appreciate the fun and that we don't like the misery. We can look at ourselves and be like, wow, we don't like the evil. And, you know, it's subtle. It would probably be a tragedy to remove, like, sadism from humans entirely, but we're like, well, we want ethical sadism, right? Like, pair your sadists with some masochists, have it all be consensual and within the bounds of ethics.
Starting point is 01:07:32 Like, it's not just that there is some of our inheritance that is very adjacent to the parts of ourselves we don't like. We also can look at ourselves and see that we don't like that, in aggregate, we have these negative consequences, these negative externalities that no one intended. We can look at ourselves and say we don't like the impulses in us that lead us to great atrocities. And the future, I think, does not look like a similar mix of humanity's virtues and humanity's vices. As we get smarter, as we get more capable,
Starting point is 01:08:11 as we get better at solving coordination problems, as we get more time to think, as we get wiser, as we become more who we wish we were. Like, we on purpose, like, promote our virtues and demote our vices. And, like, is it tricky? Yeah. And do we know what a wonderful future looks like?
Starting point is 01:08:34 no, it's, it's like very subtle. You can't just go around giving everyone everything that they want this, like, or like solving all the problems. Like part of life is having like, like, obstacles to overcome and having like real choices that are meaningful and so and so forth. But like, we can, we can make a world that is kinder where the obstacles are like more meaningful where you don't have like terrible things happening to good people for like where the only reason is that's the laws of physics like everyone like many people people say like everything
Starting point is 01:09:14 happens for a reason they sort of believe they live in a just world but things don't happen for a reason here like we can build a world where like your trials uh like tend to give you things that were worth of trouble that like pay off later and that you were glad for and we can build a world where like uh children aren't dying unnecessarily. necessarily and needlessly because they were like, were born in the wrong part of the world and got some terrible disease we haven't solved yet. And we can do so much better than that. Like if we, like, you know, transcend the bounce of humanity with, like, like, the technological limit is huge. Like, I'm an old school transhumanist and, like, think we should do something at least as cool as building Dyson spheres, although maybe there are, like, better ways to put your stars to use. Like, I'm looking at forward to the Matroska brains, if we, like, decide that's worth the effort. Like, there's, there's so much potential. There's so much potential that this species has as one of, like, perhaps the only
Starting point is 01:10:19 source of, like, love, friendship, happiness, fun in the universe. Aliens may have other things, and we'll care about that, too, but it might not be quite ours, and if there are aliens, they're probably distant. And like we can solve so many problems, especially if we have smarter friends who are trying to help solve them with us. If we can get, you know, artificial minds that are significantly smarter than us, significantly more capable than us, and that also are into this great project of like the glorious, like, transhumanous future full of like flourishing happy civilizations having good times. There's like, I can't, I can't describe the future specifically for you because I expect it to look like foreign and weird and strange to me and be full of people. people like pursuing desires that like I don't recognize. But there is so much upside. And like, yes, humanity is a lot of darkness in it too. But like that's just one more obstacle on our
Starting point is 01:11:20 way to the glorious transhumanist future that is easier to overcome if you have smarter friends. That was, that was beautiful. Nate, thank you for just articulating what we get out of solving this alignment problem because not only do we get to not die, but we get what seems to be kind of the inverse of that. So thank you for walking us through everything you're doing and why the fight is worth fighting. Totally. Cheers. Mantle, formerly known as BitDAO is the first Dow-led Web3 ecosystem, all built on top of Mantle's first core product, the Mantle Network, a brand new high-performance Ethereum Layer 2 built using the OP stack, but uses Eigenlayer's data availability solution instead of the expansion.
Starting point is 01:12:04 Ethereum Layer 1. Not only does this reduce Mantle network's gas fees by 80%, but it also reduces gas fee volatility, providing a more stable foundation for Mantle's applications. The Mantle treasury is one of the biggest Dow-owned treasuries, which is seeding an ecosystem of projects from all around the Web3 space for Mantle. Mantle already has sub-communities from around Web3 onboarded, like Game 7 for Web3 gaming, and Buy Bit for TVL and liquidity and on-rowns. So if you want to build on the Mantle network, Mantle is offering a grants program that provides milestone-based funding to promising projects that help expand, secure, and decentralize Mantle.
Starting point is 01:12:37 If you want to get started working with the first Dow-led layer-2 ecosystem, check out Mantle at mantle.xy-Z and follow them on Twitter at ZeroX Mantle. Introducing Polygon 2.0, the value layer for the internet. For too long, the limitations of blockchains have held back app development and stifled user adoption. The internet allows anyone to create and exchange information. What's missing is a value layer that lets anyone exchange, store, and program value. That's where Polygon 2.0 comes in. Polygon Labs has unveiled a series of innovations
Starting point is 01:13:06 that will radically alter the Polygon ecosystem and Web3 as a whole. By leveraging groundbreaking ZK innovations, such as Polygon ZK EVM, the next iteration of the best-in-class Plonkey 2 proving system and a first of its kind, ZK-powered interoperability layer, Polygon 2.0 will give users and devs unlimited scalability and unified liquidity.
Starting point is 01:13:25 Right now, there is a Polygon improvement proposal regarding a potential ZK-powered upgrade of Polygon proof of stake. Proved Polygon proof of stake would become a layer 2 ZKEVM Villadium. So make your voice heard on this proposal by joining the Polygon Discord today. You have a chance to help the Polygon community give the internet the value layer it deserves. Are you planning to launch a token? Is your token already live?
Starting point is 01:13:47 And are you granting your employees and contractors vesting token awards? And are you trying to figure out how to take care of taxable events for your team? Toku makes implementing a global token incentive award simple. With Toku, you will get unmatched legal and tax support to grant and administer your global teams tokens. Toku will help you navigate across the life cycle of your token from easy to use pre-launch token grant award templates to managing post-cliff taxable events with payroll. For legal, finance, and HR teams, it's a huge complex task to have to comply with labor laws, payroll, and tax obligations, tax reporting, and crypto regulations in every country that you employ someone.
Starting point is 01:14:21 It's difficult, time-consuming, manual and costly, and it's drawing more attention from global regulators and governments. Toku makes it simple for leading companies in the space, Protocol Labs, Hedera, Gitcoin, and many more. So if you want some help in navigating the complex world of token compliance, go to Toku.com slash Bankless or click the link in the description below. Bankless Nation, we are here at Zuzalo and I'm talking to Deir, Tehr, Terran. Dare, welcome to the show. Thank you. Good to be here.
Starting point is 01:14:47 Deir, you want to just explain for people who don't know who you are, where you are and what you're up to? So my name is Deir. I am currently leading the AI Objectives Institute. We are a non-profit research lab focusing on the question of alignment. and we are interested in building what tools would be able to add value to the current ecosystem to bring the future that we want to be in. So an institute that is focused on solving the alignment problem sounds like that people who are at
Starting point is 01:15:13 this institute believe that the alignment problem is existential. Am I on the right track here? Yes, very much so. Okay, so we've been beginning all of our AI interviews with just asking the guest, what is your percentage of AI doom, of percentage likelihood? My personal percentage of AI doom is fairly low for human civilization to end totally would be quite low. I would give it to 2 to 5% while for us to end up in a future that is not desirable, more so existential risk rather than extinction risk would be very, very high,
Starting point is 01:15:48 given the current dynamics we're living in right now. Okay, so existential risk, can you measure, so we, chance of death low, chance of significant disruption and displacement in the hierarchy of life high. Yes. Right? Yes. Okay. And so maybe you could illustrate what that looks like.
Starting point is 01:16:06 What is your likely scenarios here? The core principle that has brought AI Objectives Institute together was that the AI revolution, as we call it, can bring an unprecedented level of flourishing to human civilization, but the current systems we are living in do not place us on that default path. We currently have a lot of incentive gradients that cause power to be concentrated. We have misaligned incentives in the form of nation states to corporations to attention economy that distract humans from what we actually want to focus on. The AI systems right now are learning and copying these behavioral patterns,
Starting point is 01:16:44 causing much more large-scale disruption in the landscape. And this propagating further, the doom scenarios that I am most concerned about do not look like nanobots crawling all over us all of a sudden, but look much more like economic failures, institutional failures, environmental failures, as we know today, at a much higher, much more unpredictable rate. And we already have mechanisms to deal with these things, but we do not know the scale at which we will be coordinating for this.
Starting point is 01:17:13 Okay, so you listed off a bunch of existential crises, and I think you're saying that just like AI is going to accelerate crises crises that are already in existence, rather than creating a net new crisis, although you're leaving some small amount of room for that, the bigger issue is that there are already human crises that we have, that we don't have solutions for, and AI is going to accelerate those.
Starting point is 01:17:33 Exactly. I think the main crisis that I am most afraid of is, in the shortest term, within the next two to five years, we will have massive disruption to the institutions and systems, as we know today. We see AI as an optimizer, and we have already other misaligned optimizers in the landscape. They do look like markets and corporations. They look like nation states. They look like, you know, misaligned incentives that has driven mass-scale invasions that we're experiencing
Starting point is 01:18:01 in the last, you know, two years with Russia and Ukraine to massive bank collapses, to voting systems that end up in gridlock. These systems will be exacerbated in a pace that we are not yet familiar with. And that's to, in our perspective, from the AI Objectives Institute's perspective, there is a very strong continuum rather than a sharp break between human misalignment as we experience today and AI misalignment. Until we solve human alignments talking about a purely AI alignment system feels superfluous in my opinion. So you're saying that solving alignment, there's needs to be a correct order of operations there. And before we look to solve AI alignment, humans need to
Starting point is 01:18:40 first look inward and solve our own alignment first. That's kind of the take. I think AI tooling, in fact can be quite helpful for us to solve alignment at its base. And there is a lot of cross-pollination that I think is necessary between understanding how we as humans so far before AI have been able to keep certain systems in check, be it being able to give corporations a legal entity that can be interacted with, or the way nation states have different systems that are checks and balances for each other. There's a lot of value to be driven from how AI alignment research can feed into current systems that have been experiencing different levels of misalignment, and how these systems
Starting point is 01:19:22 and how we have dealt with this can feed into AI alignment. We see this as one central problem that AI is learning from. To put it in another spotlight, solving AI alignment, being able to align AGI to a specific set of human values and perspectives, actually doesn't solve human alignment problem. It just pushes the problem from a silicon substrate to the sociotechnical substrate, in which case it might be much harder to control.
Starting point is 01:19:50 That's a really interesting... Could you just elaborate on that? Because, like, there are some human values that we want that I think I can claim without evidence that we think are good, as in, like, don't kill me, and things like that. And so, like, but I think what you're saying is, like, if we go further down, like, what we think are good,
Starting point is 01:20:12 we'll start to get into the very subjective realm. and if we start to align AI without defining what humans think are good, we can run into a problem. Right. Okay. So my next question is like, it's still a problem if the AIs come and kill us. Right.
Starting point is 01:20:31 Right. And so like maybe I'm a little bit lost with how to proceed here, but like there seems to be an order of operations problem, of which problem do we solve first? Right. And how do we even define what human space? that we can solve. So there's this concept of differential technological development.
Starting point is 01:20:51 What is most important here is to decide on the order of operations of which problems can be solved first so that they can shed light into the next problem. This is like developing cars before seatbelts are invented is more risky than understanding if we are to develop cars, we should have seat belts. The question of differential technologies are what are the technological pieces that make sense to tackle first so that we have the right substrate on which we can build the future that we want to build towards. So this is what we focus on at the AI Objectives Institute. What are the pieces that will be able to yield a safe aligned AI downstream? And what are the pieces that are necessary
Starting point is 01:21:30 right now to build first? The center of all of this is a coordination problem. I actually define existential risk as a failure to coordinate at the face of an existential risk is what makes existential risk come together. So there's a lot of tooling that we are focusing on right now on scalable coordination. There's a lot of tooling we're focusing on epistemic security and systems of alignment. These are three avenues that I think need to have much more research so that we can mobilize together. We can identify the loopholes in our thinking as individuals, as collectives, and as systems, so that we can bring a level of systems alignment to in line with how we want humans to proceed. Only thereafter we can start contemplating the scale of AGI. Now, I think it is
Starting point is 01:22:18 quite a ways away for we still have enough time to be able to have, by the time AGI arrives, for us to have built institutions that are built on the backbone of scalable coordination and cooperation. That is why I think there is a lot of hope for us to be able to avoid total catastrophe. And I'm interested in thinking of it from that angle, which I think is quite necessary in the current alignment landscape? What are tooling that we can build right now that will bring an incremental net good rather than talking about what we should avoid, what we should not do? I would like to bring light to the world what we should do, what we should be focusing on today. Okay, so it's your position that AI in the short term is going to produce immensely powerful tools.
Starting point is 01:23:01 We can use these tools to help humans with their human problems that they had before AI ever came on the scene. Exactly. One of those things is human coordination. And so we can apply AI to solve human coordination. Hopefully this all happens before the AI-Doom alignment problem manifests that L. Leaser talks about. And the idea is that we race to use AI tools to align humans amongst other existential crisis,
Starting point is 01:23:26 including our own ability to coordinate. And then we will be able to solve the AI alignment problem more head-on. Is this more or less your roadmap? Yes, I would say that the AI alignment problem is more or less the same problem. it is the natural extension of the human alignment, human coordination problem. So because we cannot coordinate as humans, we can't coordinate on the AI alignment problem. We do not have mechanisms to identify what are the sets of values that an AI system should be aligned to. And I think it is quite short-sighted to say, let's just pump in more text to large language models,
Starting point is 01:23:57 and at some point they will be able to figure out. And more details, you know, I'll go into the weeds a little bit. Solving single-to-single alignment is technically easy and saying, you know, if we have one unitary agent that is able to be superhuman and super intelligent, they will figure out what is best for us, so we should fully focus on that. I think this is quite faulty. I think by the time we get to this stage, there will be many narrow AI applications that will be strong enough to actually put humans in a catastrophic risk.
Starting point is 01:24:26 So we need to actually start from there. What will these narrow tools look like at the hands of misaligned state actors? What will these look like at the hands of exploitative corporate behavior? How can we make sure we can have safeguards around these as humanity, as a civilization? And will those tooling actually bring a better AI future? That is the intersection that we are interested in. I think the answer is there. Okay, so the idea is that modern day late-stage capitalism has produced large-scale corporations
Starting point is 01:24:56 that are misaligned with humanity in general, and then you give them super-powerful, narrow artificial intelligence, and they just become misaligned faster. And that's like one model, one application of where this could go wrong, and there's perhaps like five or six or seven more examples like this. I mean, I would actually argue that we already live in this scenario. This is not the future. This has been happening within the last 10 years.
Starting point is 01:25:19 I mean, a couple simple examples. Insurance companies probably at this point have an AI system to decide which claims they should reject immediately because they are least likely to be followed up on. One could say, well, this is the insurance company's job, and they are rightfully doing this. To me, this is actually a fundamental alignment problem. We already have an optimizer system inside another optimizer system
Starting point is 01:25:43 that is the insurance company that is rejecting claims that is causing human lives to be potentially at risk. And we have devised a society that has normalized this behavior. We have devised ways in which, you know, a corporate company, like companies are able to hide the environmental externalities that they are building to the world. the landscape. The questions I ask is, are AI systems able to share world models with us that will be able to have us understand these externalities better, to be able to incorporate that
Starting point is 01:26:15 into fundamental decision-making? To put it in more fluffy words, can AI systems and our understanding through these tools elevate our sense of what humanity wants to be so that we know where we want to go? That is why I am hopeful about being able to build a future. This requires a lot of coordination, This requires a lot better epistemic security. This requires much more thinking about how do we want to envision the institutions of the future. So maybe you could just illuminate some of the strategies that you are working on at the Machine Objectives Institute? The AI Objectives Institute. One line that we had on our website for a long while was our objective is better objectives,
Starting point is 01:26:56 which is it's almost tongue-in-cheek, as we do not think AI systems can have fixed. objectives. Similarly, humans do not have fixed utility functions. In fact, the relaxedness of these is what gives flexibility to human evolution of thought, of our coordination, of our ethics. So in some ways, the name is tongue-in-cheek on that front. But the goal is to come up with better objectives continuously. Right. Okay. So what are the, if we could drill down into the details of what it takes to come up with better objectives for artificial intelligence, like just the details, like, if you will. Right, for sure.
Starting point is 01:27:33 So we think of the society, let's look at the societal stack from an individual level to a collective level and then to a systems level. On the individual level, the core work is finding individual autonomy and sovereignty through bringing better epistemic security. The world that we're living in right now, especially in the Western world with democracies, a lot of this heavily relies on information transfer.
Starting point is 01:27:57 People vote, people coordinate, based on the information they receive around the world. Now, we are entering a new paradigm in human communication where most of the content is about to be generated by AI systems. In this world, are we able to use AI systems to bring a different level of epistemic security and confidence to have us understand is the content I am engaging with with a latent agenda? How can I stay true to my objectives as an individual? How can I stay true to my alignment with the continuous flow of information, that we are interacting with that is constantly fighting to hijack our bandwidth.
Starting point is 01:28:35 This at the lowest level is the most important level. A lot of mechanistic approaches to alignment assume that individuals have a level of autonomy and sovereignty. They do know what they want. We actually start from that question. We do not know what we want. We do not always take actions in line with our incentives due to bounded rationality, due to myopia, due to just pure distraction.
Starting point is 01:28:58 How can we use tooling that comes from the AI landscape for that? So this is the first avenue of research. We have come up with a research agenda that has some specific avenues that we would like to explore that we believe is a net incremental good. I'll go into some of the details on that front. Please, but first, just to really, just make sure I'm understanding here, you started with the individual. Yes. And I think that was an intentional choice.
Starting point is 01:29:21 Yeah. Saying like starting with optimizing for individual freedom and autonomy is a high-level goal, a higher-level goal than the rest of the stack, which I think we're about to go down. But could you explain just and elaborate on why we start with the individual and why that's important? Individual is the building block of the society. Every decision that we are making ultimately comes down to an individual's ability to understand the world that we are interacting with. We have devised the systems that we are operating in right now assume individuals' ability to give feedback to a corporation, to a government, through our behavior on purchasing, or through our
Starting point is 01:29:58 vote, and we cohere around them. Assuming the individual's ability to give feedback to a corporation, to a government, that AI or democracy or media, AI is just one form of superintelligence. We have devised many other forms of superintelligences in human history. Assuming that the individual maintains a level of autonomy and sovereignty throughout this interaction as we live in the world is what has caused the crisis that we already are in. This is not in the future. We already are in this scenario. Let's look at social media. It's the archetype example. In 2008, we thought, you know, this is going to be a revolution that brings us a level of connectivity of mutual
Starting point is 01:30:35 understanding that will heal democracy. Instead, we ended up with echo chambers. We ended up with massive epistemic fracturing with respect to what facts people believe in. We found people that get locked in more to their own bubble, echo chambers. We could have foreseen some of these stuff.
Starting point is 01:30:53 This all sheds light into it's ultimately the individual autonomy and sovereignty that is the core building block of civilization. Then the question we ask is, is the AI tooling of today able to bring a different level of epistemic security? Our answer is yes, and this is very worthwhile to be trying now. Epistemic security. So epistemic, can you just define that term? Yes. The information that you are receiving, you know where this is coming from.
Starting point is 01:31:20 You have a sense on what this information is trying to accomplish or whether it is true or false. How you relate to this information, how you want to relate to this information, and how how you want to participate in the world given this new information. Currently, a lot of these systems are actually quite shaky. In this society we live in today. And we are entering... Post-truth world. Right.
Starting point is 01:31:45 And we are, instead of securing this, we are saying let's come up with a level of generative AI capabilities that can flood even more information. And then we are talking about what will AI be aligned to? The question is, do humans even have bandwidth to be able to share this information? Understood. Right. Okay. Okay. So epistemology, the study of knowledge, epistemic security is just like securing the ability for the individual. So they have the choice too. Sometimes people just want to live in the cozy comfort of being fed the information that they want. But importantly, giving the individual the choice to have access to truth is a basal building block you're saying to talk about the rest of the societal stack. Exactly. And we're going to use AI tools to improve the. that part. So a couple
Starting point is 01:32:32 of projects we are working on right now on this front that is the core building blocks is, can we use AI technologies right now, AI techniques right now, to be able to inform certain patterns of perception? For example, I'm not even going to go into whether or not the
Starting point is 01:32:48 content you're interacting with is true or false. I will instead start from the side. Is the content you're seeing designed to elicit anger from you? Is the content you're interacting with designed to trigger an addictive loop giving you a dopamine high really fast. Is it eliciting anxiety? Is there a latent agenda in this content? Or is there a subcultural affiliation in the content that you are interacting with? Is it
Starting point is 01:33:13 using language that is geared towards a certain subgroup? Is it repetitive? Is there evasiveness? Is it stating beliefs or is it a response to something else? Turns out large language models are actually really good at detecting these kinds of patterns because what they are doing is pattern matching. I'm predicting the next token. So a language model is able to say, oh, this next token is unusual. It looks like the next token,
Starting point is 01:33:38 the difference here, is guiding me towards a different feature. So this is one building block that on its own is able to add a lot of value to the language. So we're building a prototype called Lucid Lens, which evaluates the content
Starting point is 01:33:52 that you're interacting with continuously to be able to guide you into, hey, it looks like you have been on a dopamine loop lately, do you want to shift what you want to do? Or it looks like you're engaging with content that is extremely repetitive in this discourse. Can we bring a level of intentionality to your objective alignment personally? Okay, so you're building that system. How do you make sure that that system isn't biased? Right. Because like maybe the repetitive loop of iterative content
Starting point is 01:34:25 is the correct thing. And then you're saying this AI suggests to humans like, hey, maybe you should get out of your repetitive loop. How do we know that that's not an unbiased thing to do? Right. There's a couple approaches here that we can take. One of them is, can we ground language models to an individual's own affiliations? There are some people working on this front. I'll give a fairly simple answer.
Starting point is 01:34:50 A heads up, something like a Jiminy Cricket on your shoulder that can point out, hey, you might be stuck in, you know, a rage-bait, you know, click-scroll for the last while. That check-in, in the right context, is not harmful to an individual. You might say, yes, I acknowledge that, I want to continue, I want to proceed, I like where I am at, rather than have there be a sharp judgment of the nature and the quality of the content. But another thing LLMs are able to do very well is fetch further information that can give you a larger picture around the content that you're interacting with.
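The screening approach described above, scoring content along dimensions like anger-elicitation and repetitiveness and then offering a gentle session-level check-in rather than a per-item verdict, can be sketched roughly as follows. To be clear, this is a hypothetical illustration, not Lucid Lens itself: a crude keyword heuristic stands in for what would really be an LLM call, and every name in it is invented.

```python
# Hypothetical sketch of session-level content screening (names invented;
# a keyword heuristic stands in for what would really be an LLM call).

DIMENSIONS = {
    "anger": ["outrage", "disgrace", "furious"],
    "subgroup_jargon": ["ngmi", "sheeple", "normies"],
}

def score_content(text: str) -> dict:
    """Return a 0/1 flag per dimension for one piece of content."""
    words = text.lower().split()
    scores = {dim: int(any(cue in words for cue in cues))
              for dim, cues in DIMENSIONS.items()}
    # Crude stand-in for "is it repetitive?": low ratio of unique words.
    scores["repetition"] = int(bool(words) and len(set(words)) < 0.6 * len(words))
    return scores

def session_alert(history: list[str], dim: str, threshold: int = 3) -> bool:
    """Check in only when a pattern persists across a whole session,
    mirroring the gentle nudge rather than a per-item verdict."""
    return sum(score_content(t)[dim] for t in history) >= threshold

feed = [
    "This outrage is a total disgrace, share now!",
    "Another furious outrage thread, outrage everywhere",
    "A furious, outrage-filled disgrace of a take",
]
print(session_alert(feed, "anger"))  # prints: True
```

In a real system, the per-dimension scoring would be a language-model prompt; the session-level thresholding is what turns per-item pattern matching into the kind of non-judgmental check-in described in the conversation.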
Starting point is 01:35:28 Yeah. This is just one of the many avenues. I will zoom back out from Lucid Lens. Another one is, can language models replicate an individual's affinities to the point that they are able to help us stay grounded in a set of objectives that we want to be in? This, which we call Mindful Mirror, is a different avenue of research. You can think of it this way. The first one is about a machine helping a human stay grounded to their objectives. The second one is about a human giving that same feedback to a machine, to say, here are things that I would like to prioritize. These are what is good to me. Having an individualized personal language model that is secure,
Starting point is 01:36:10 that is not living on a large company's servers, but that is completely owned by you; having this to be able to secure your understanding of the landscape, of emotions, of content that you're dealing with, that can help you ground yourself in a moment where you are lost. This is an incredible piece of technology that can actually create net value for society. Okay, so I think I can categorize your mechanisms into two different camps. One, in the crypto world, we use this idea of credible neutrality quite frequently. And there are some mechanisms that are credibly neutral,
Starting point is 01:36:44 which aren't to say, like, hey, what you're doing is bad, but they're just little alerts saying, like, hey, you've engaged in very repetitive, YouTube rabbit hole type behaviors. I'm just going to let you know that that is what's happened. Without saying anything negative or positive or suggesting a rerouting, just like a neutral mechanism to perhaps bring you out of the hole and let you know that this is perhaps a dangerous zone or perhaps not.
Starting point is 01:37:12 Right. And just for you to be able to step out of your consciousness tunnel and zoom out. Just like in meditation, sometimes they ding that gong. Right. Right. Like, hey, if you're lost in thought, ding the gong. Right. And so it's a way to just get you to snap back out of it if you are in it. Right. And so that's the credibly neutral mechanism, which we love. And then the other one is being able to customize your own personal LLM to align with your desires. And since you are the one implanting your biases into the LLM, we also feel that, since we are not imposing that upon
Starting point is 01:37:45 others, we're only imposing that upon ourselves. That also checks the box of credible neutrality. Right, and I would like to underline one thing, which is that with these large language models, the goal isn't to create a locked-in version. Like, there's many fault lines that we can fall through here as we are building these, which is why the goal isn't to launch this as a product for widespread use right away, but to actually approach this from a question of rigorous study, to the point that we make sure these tools are doing the things we want them to do. What are their failure points? For example, a very terrible way to do this would be to cause individual value lock-in, in which case, you know, a language model constantly reaffirms things to you from your past state to the point that it prevents you from moving forward.
Starting point is 01:38:30 I am more interested in a language model that is able to give me awareness of my drift through my own evolution of thought. These are some of the aspects that are actually very fundamental to the question of alignment. How is the landscape of values shifting for an individual? How am I changing as a person? Right. What was important to you? There's a series of simple questions.
Starting point is 01:38:52 Are my beliefs consistent with my beliefs? Are my beliefs consistent with my actions? Are my actions consistent with the community that I'm living in, and their beliefs and their actions? Being able to have visibility into these is key, and these systems are incredibly fractured right now. And we are building even more walls. I believe AI tooling can actually help us overcome some of these, to be able to understand: oh, it looks like what I wanted to do, I am not able to do, given the incentive gradients I am existing in right now. The crypto world has suffered from this quite a bit.
Starting point is 01:39:25 How can we have this be more visible to everyone? I believe questions of contemplating what values we want AI systems to be aligned to, et cetera, really require a level of rigorous understanding of self first. So this is the first step. Then we move to the next category, which is how can we scale up to the question of a collective, which is the next tier in human society. Right. And before we get there, I just really want to drive something home. It really sounds like we're trying to use AI to help humans become the best versions of
Starting point is 01:39:58 themselves. Right. Right. If we want to talk about some, like, some people in the psychology realm, Nietzsche would call this the Übermensch, right? Like the over-man, the Superman, literally becoming the best version of yourself, because that also scales up in society, right? If you, as an individual, become the best version of yourself, that makes you a better community member, and that makes communities better, which I think is kind of where this idea goes. I call it extended cognition. If I am able to have an ability to understand myself, cross-sectionally, through my own history, based on what I have engaged with, based on what I have thought, what I have written, that is powerful. In fact, some of, like, the scariest
Starting point is 01:40:42 applications are also the most worthwhile. If this data is compromised in any way, that also creates, you know, a mirror of me to exist in society. So then the question becomes, how can we do this in the most secure manner, one that really stays within your own autonomy and sovereignty? Because we already live in a world with a very difficult attention economy, where everything is competing for your attention, for your beliefs, so that you can vote or purchase a certain way. It's very important for us to be able to bring a level of autonomy to the individual as AI tooling proliferates in this landscape. Okay, let's move up the social stack.
Starting point is 01:41:18 Community comes next? Yes, collective. Collective. A question here is scalable coordination. This is what I am most passionate about. I think there is incredible value to be added here. In some ways, you know, there's many alignment labs. We do share, you know, our concerns; they are in line with a lot of the other spaces like
Starting point is 01:41:36 MIRI or Redwood, where there are massive challenges that are coming in. And the question is, why? Why have we not yet been able to coordinate at a scale where we have lined up our incentives as a society so we can tackle these AI problems? Instead, we ended up in a race dynamic, where entities that are spun up to counter it end up participating in the race dynamic. For this: being able to level-set the understanding of what is true across every participant, being able to bring visibility into the different perspectives that exist in society right now, how can I engage with these more effectively? How can we come up with collective decision-making systems
Starting point is 01:42:17 so that a collective can find its alignment? The first category was about an individual finding their alignment. The second category is how can a collective find their alignment? So you corrected me when I said community and replaced it with collective. I think the reason why you did that is because a community seems to be like a handful of people, right? 100 people or 1,000 people in a town.
Starting point is 01:42:41 But a collective is like the hive mind of these people. And that is the thing that we are trying to produce alignment for. That's my interpretation. Yes. And I would say I am in a collective with a lot of people that are not in my community necessarily. A community is a more intimate collective. Depending on how, and you can say, you know, well, there is different ways to cut the social strata that we live in to say, you know, these are different categories. All of these are valid.
Starting point is 01:43:08 The thing I am interested in is, say we're able to align AGI to one human. What do we align AGI to now? How can we come up with definitions for collectives? Yeah. We recently did a podcast with a guy, Tim Urban, on the subject of liberalism. And he had this great illustration of higher-mind thinking versus lower-mind thinking. The higher mind is like a genie and the lower mind is like a golem, right? A golem just, like, dominates and punches, and a genie is, like, magical and higher. And then when you have collectives, if you have a society that is a society dominated by lower-mind thinking, primitive-mind thinking, like reptile brain, you have a collective golem. But then if you have a society
Starting point is 01:43:51 based on higher-mind thinking, higher-order thinking, using the more recently developed parts of the brain, the prefrontal cortex, then you have a collective genie. And this thing can actually, even if it engages with a different collective genie, so you have, like, the Republican genie and the Democrat genie, two genies can actually make progress together, right? They can actually come together and produce a roadmap, whereas two golems just come and fight. And so this is actually a similar subject that we've had on the podcast and maybe a way to illustrate this. I see some parts of this to be in line with our thinking.
Starting point is 01:44:28 The way we have been developing AI systems are much closer to a golem right now. I'll get into that. Hence the downstream problem of AI alignment as human alignment. I see it. Right. I have thoughts on how to make AI systems be more like genies as well. We can get to that in a little bit, but not to change course of the discussion. One of the projects we have on this front is called Talk to the City. And this is a digital town hall that you can summon out of unstructured feedback that you have collected from a polity.
Starting point is 01:45:03 We currently have voting systems where you're sending one bit of information to the government every four years, and you're hoping that they will be able to represent this the best they can. We have been used to categorical information in voting: multiple choice and referendums. The question is, am I actually able to share my perspective in a form more true to my own view, through human language, and have a central entity receive this information and make decisions based on it? Talk to the City is a prototype that collects different unstructured text feedback from the entire community that we are looking at, synthesizes it into a set of different perspectives that exist in the community, and trains conversational language models for each of
Starting point is 01:45:51 these perspectives, so that you can have these perspectives talk to each other, or, as an individual, be it, you know, a policymaker or a journalist or just a citizen in this city, be able to engage with all of these perspectives and the reasons behind them. The goal here is not to find consensus. The goal here is to understand different viewpoints so you can make sure you can address these. In some ways, one of the problems that we would like to solve, and this again goes back to the real alignment problem: it is easy to find the lowest common denominator across everyone, and that causes a lot of short-sighted policymaking. I mean, political theory, Seeing Like a State, explores this in depth. I am listening. I've read that book, by the way. Great book. It's incredible.
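The pipeline just described, collect unstructured feedback, synthesize it into a handful of distinct perspectives, and make each perspective something you can query, can be sketched in miniature. This is not Talk to the City's actual code: a real version would presumably use embeddings and language models, so word overlap stands in for semantic similarity here, a word count stands in for an LLM summary, and all names are illustrative.

```python
# Hypothetical miniature of the feedback-to-perspectives idea (all names
# invented; word overlap stands in for embeddings, counts for summaries).
from collections import Counter

def tokens(text: str) -> set[str]:
    return set(text.lower().split())

def group_perspectives(responses: list[str], sim: float = 0.3) -> list[list[str]]:
    """Greedy clustering: a response joins the first group it overlaps with."""
    groups: list[list[str]] = []
    for r in responses:
        for g in groups:
            overlap = tokens(r) & tokens(g[0])
            union = tokens(r) | tokens(g[0])
            if len(overlap) / len(union) >= sim:
                g.append(r)
                break
        else:
            groups.append([r])
    return groups

def perspective_summary(group: list[str], top: int = 3) -> list[str]:
    """Stand-in for an LLM summary: the group's most common words."""
    counts = Counter(w for r in group for w in r.lower().split())
    return [w for w, _ in counts.most_common(top)]

feedback = [
    "build the road it helps trade",
    "build the road for faster trade",
    "protect the wetlands no road",
]
for g in group_perspectives(feedback):
    print(perspective_summary(g))  # one summary line per perspective
```

The design point it illustrates matches the conversation: the output is not a single consensus answer but one representation per perspective, each of which could then be handed to a conversational model.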
Starting point is 01:46:34 It's very helpful for shaping our thinking. I am interested in being able to understand a more in-depth policy. How is this going to actually impact people? What are the reasons why this may be bad for some groups, yet still create a Pareto improvement over the current state? These require a level of sophistication that isn't about saying, okay, it looks like everyone agrees on do not kill humans, so let's codify that. Can we actually understand, okay, but I'm interested in having more resources for my village, and this requires a compromise with something else? If the proposal is, you know, should we build a road from A to B, yes or no, the answer might be that we should build a road from B to C. And being able to give a community the ability to express this,
Starting point is 01:47:20 and have this be received systematically, is something AI tooling can actually bring to the landscape right now. That can bring a different level of collective coordination capability. So, okay, so it sounds like at the individual level, we have these language models that can perhaps be perceived as, like, our personal assistants, our personal, like, our Siris, our Google Assistants or Microsoft Cortanas, to help us think and help us know and help us learn. And then that can amalgamate to a higher-order LLM that, like you said, doesn't come to consensus on our behalf, but allows us to see, when you aggregate everyone's personal assistants, what does everyone believe? And it processes that data of individual beliefs into
Starting point is 01:48:07 something for us to reason about and understand and move forward with. Yes. Now that we understand that. Right. I am not interested in the genie that you're talking about making policy for us. I am interested in this system being able to show us: these are the considerations we need to take into account, that people have voiced. This is something that AI tooling can help us with today, right now, which is why we are building it. And that brings a level of scalable coordination and cooperation capability. Another aspect here is finding positive-sum outcomes
Starting point is 01:48:34 that individual groups may not have been able to see. You know, the solution to, you know, should we build the Keystone XL pipeline, the right answer might be, well, a hydroelectric dam will still bring the same energy and workforce to the region without actually causing environmental pollution.
Starting point is 01:48:52 So it can produce emergent optionality. Right. And these are things that the systems we are building are actually really good at. I would rather not yield the level of reasoning and decision-making to an AI system, But in the near term, we are able to use these as building blocks to improve the capability for humans to cohere with each other. Now, what's really interesting here is we are getting two birds with one stone if we build these pillars.
Starting point is 01:49:20 The first one is these tools are net good for humanity. We need these right now to solve human alignment, to take a step towards human alignment. But also, these tools will create datasets for AI alignment as well. It will be able to show how humans have cohereed around decision-making on specific world models, or have humans say, yes, this AI system actually was able to represent me through time. We currently don't have this data, yet we're talking about solving AI alignment. Don't get me wrong. I do think mechanistic interpretability and a lot of machine alignment research is incredibly important as well.
Starting point is 01:49:56 We also need to look at what this means for humanity. Otherwise, we will just push the problem of alignment downstream and make it much harder to solve. Right. Right. Okay. So what you're saying is, like, when we have this collective consciousness that we are able to reason about, via all of our native large language model assistants coming together and powwowing about what everyone believes, we'll start to be able to lock in some beliefs. We'll probably lock in the idea that killing humans is bad. That'll probably happen first. Everyone will be in agreement about that, so then we can use that shared understanding that killing humans is bad as a way to
Starting point is 01:50:33 codify that into law about AGI. And then we perhaps can go higher from there and be like, okay, now that we understand that everyone believes that killing humans is bad, we can also lock in that theft is bad. And then we can start to get higher up the stack of what we believe and then use that to
Starting point is 01:50:50 operationalize more powerful AI. Is that the path here? I very much disagree with that framing. Oh, no. Okay. Please help me. Help me understand. Value lock-in is a concept that is well explored in AI alignment and effective altruism and similar landscapes. The goal is not to lock in values. The goal is to build systems where people, the polity, are able to continuously give feedback and participate in an ever-evolving consciousness. Instead of one AGI that has learned the
Starting point is 01:51:25 moral code and then can proceed, what is necessary is a system that can continuously take feedback from people as the value landscape shifts, as more unpredictable events happen. Actually, the crypto world has lived through this many times: as the incentives shift, as the speed of progress shifts, all the way from gas fees to coordinating how purchases can be made. We need systems that are actually resilient, that keep updating based on how the landscape is changing. So I would be quite worried about building systems that can learn and fix something in place perpetually. I would be much more interested in systems that are trained to fetch new information, to understand how the ground is changing throughout time. Most humans,
Starting point is 01:52:13 I mean, some values I hope we perpetually agree with, such as do not kill, do not cause harm. But then again, what does harm mean? What does this mean in the case of euthanasia, which, you know, might be an opt-in from the individual? How does one proceed? We were already living in a world where we're exploring these kinds of questions before AI. With AI systems, I think it's quite dangerous to force them into: you need to find what the optimal version is and enforce it on the rest of the world. The right way to go is to build an AI system that can learn from humans where humans want to go at any given point and bring us towards there. And have there be a level of corrigibility. So there's a rule of thumb that I've come to understand in the crypto-economic world, which is called
Starting point is 01:52:57 no magic numbers, as in, when you build a crypto-economic system, if you just pick a fixed number, that is a point of rigidity and fragility. And I think that's perhaps what you're saying: when we train our AIs to be aligned with us, any sort of rigid or fixed parameter can create fragility and long-tail consequences that we don't understand. Right. Okay, so we're in agreement there. Right. And so, okay, so you didn't like when I was saying, like, hey, all the humans agree that killing is bad, let's lock that in. Maybe I'll rephrase and say, in this one moment of time, all of the humans of a local collective, in that one moment of time, agree that killing humans is bad.
Starting point is 01:53:39 So the AI that's reading that data will, in that moment of time, choose to not kill humans. Is that a better way to describe it? Yes, and I would rather have us have systems that don't necessarily give AI the ability to kill humans in the first place, but more so see these as the tools that they are. They are super intelligent things we can consult and learn from and iterate from there. But yes, the AI systems' values should be able to evolve as human systems evolve. And it's really downstream of human autonomy and sovereignty into collective decision-making.
Starting point is 01:54:14 That can bring the systems to be able to be aligned. Okay. So we started the individual. We've moved up to the collective. Is there our next step? Yes. What's higher up? The last step is systems level.
Starting point is 01:54:24 Systems. Systems are more complex than a collective. In a system, we don't only have multitude of people, but entities that have their own capabilities, that have their own agenda. For example, a corporation consists of individuals, but it has goals, it has affordances that go beyond what any of these individuals can do.
Starting point is 01:54:43 Furthermore, we have developed systems in the world that don't hold the individuals accountable for the failures of corporations. This is the side where we have seen massive problems with misalignment throughout human history, both in terms of states, governance, and we have toppled many systems. Divine right of kings was impossible to overcome, yet here we are. Communism was the same.
Starting point is 01:55:07 There are a lot of systems that we have evolved through. We currently operate under a capitalist system that is heavily governed by incentive gradients, which have caused a lot of shifts in how even nonprofits that are developing AI have shifted their priorities towards monetization and productization. So the question then becomes, how can we design systems that can stay aligned to the betterment of the collective, to the betterment of the individuals? Everything goes back to, you know, is this actually producing well-being for the participant? Or is it Goodharting something? Goodhart's law, as in, when you pick a measure and it becomes a target, it ceases to be a good measure.
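Goodhart's law as defined here can be shown with a toy calculation: a proxy (say, time-on-site) tracks the true objective (well-being) at first, but keeps climbing after the true objective has peaked. The functions and numbers below are invented purely to illustrate the divergence.

```python
# Invented numbers purely to illustrate Goodhart's law: a proxy metric keeps
# improving after the true objective it was meant to track has peaked.

def true_wellbeing(engagement: float) -> float:
    """True goal: rises with engagement at first, falls once it turns compulsive."""
    return engagement - 0.1 * engagement ** 2

def proxy_metric(engagement: float) -> float:
    """Measured target: time-on-site, which only ever goes up."""
    return engagement

# "Optimizing" the proxy means always pushing engagement higher.
for e in [1, 3, 5, 7, 9]:
    print(e, proxy_metric(e), round(true_wellbeing(e), 1))
# The proxy climbs monotonically, while true well-being peaks at
# engagement = 5 (the argmax of e - 0.1 * e**2) and then declines.
```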
Starting point is 01:55:47 We live in systems that already do this. This is precisely why we don't want an AI to lock in and create rigidity, but to design systems that can continuously evolve through that window. Okay, so this seems like the hardest problem. Yes. It also seems like the frontier of coordination problems that humans have arrived at in the grand scheme of things. Right.
Starting point is 01:56:10 And we still haven't tackled that problem. Right. It sounds like, perhaps, understanding that that's the foundation that we're at, we actually might need AI to solve that problem, and might not be able to solve it without AI. We haven't solved this problem. Right. One thing I say about our work at the AI Objectives Institute is that, in some sense, the word AI is not that relevant.
Starting point is 01:56:33 This work was relevant 300 years ago. This work will be relevant post-AGI as well. This is a question of how do we design structures, institutions, that can stay aligned to the collective. This is a millennia-old problem. AI is just the newest building block in this story. It's a very critical building block in this story that can cause a lot of damage if we don't do it right,
Starting point is 01:56:57 which is why I have high fears of existential risk, if not extinction risk, though that is possible as well if we don't coordinate. I believe we will be able to coordinate and build better institutions, which is what I want to work on. Do you think, at the systems level, that we have borders, and that's kind of like a coordination breakdown?
Starting point is 01:57:19 The fact that different countries operate by different rules and coordinate differently. And then there's also different economic systems, right? Different systems; they're disparate, they're disconnected. Would it be better if there was a single global system, setting aside all the other problems that we've had with Stalinism and other, like, atrocities throughout the 1900s? My question to you, I'll just reiterate it, is, like, that's the frontier of human coordination that, you know, we've solved at the tribe level,
Starting point is 01:57:46 we solved it at the community, solved it at the city, solved it at the nation-state level. We haven't solved it above that. And honestly, we haven't really solved it completely at the nation-state level either. We haven't solved it on an individual or collective level either, but we are taking steps towards all of these. It's solved more or less at different parts in the stack. Yes.
Starting point is 01:58:10 And my question to you is, like, do we need AI to take the next step in solving it at a more systemic level? I actually see it the opposite way around. Given that we are building AI-driven institutions, given that the workflow we have right now will yield institutions that have AI in them, we have to look into how AI will
Starting point is 01:58:31 interface with this. If we say, let's turn off AI so we can solve this problem for another couple centuries or millennia, sure, we could do that. Alas, we don't live in that hypothetical. AI is here. It is present. I would ask, how can this be helpful
Starting point is 01:58:48 for institution design? Yeah. So would you agree that AI, all these language models, everything we're talking about here, is both the problem and the solution at the same time? It is the problem right now because we haven't yet been able to solve human alignment. Hence, any superintelligence, any super-competent entity that can exceed human capacity, can be dangerous; just as misaligned state actors or exploitative corporations are, of course, an AI will be very dangerous as well. So we need to tackle this problem. Right. So we have a couple experiments on this front that are also incredibly important. And these are the monoliths that require much more coordination and help than one group alone to solve.
Starting point is 01:59:37 So our goal is to foster an ecosystem that has many approaches, all the way from, you know, crypto is super important, ZK proofs are super important, on this pillar. We are interested in building a proof of concept of an open agency architecture system that can help an institution. Our goal is to showcase that an institution can make decisions based on feedback from a larger collective, based on expert opinion, at a level that is transparent and visible and interpretable, rather than a complete black box. And we believe that it is incredibly important for institutions building AI tools, and for AI-driven institutions, to follow this. And also, it is important for the AI systems themselves to be built on top of this principle as well. Yeah. And Open Agency is a concept that Eric Drexler has pushed forward.
Starting point is 02:00:31 A simple explanation would be: can we shift our thinking from agents, which are singular monoliths that have fixed goals, that have low visibility, that have their own ways of doing things, towards agencies, which have different faculties and different tasks that are being passed around as they reason about the world and as they take actions in the world?
Starting point is 02:00:56 Building more open agency systems rather than closed monolithic systems is a net good. Designing institutions that operate this way is a net good. And similar to the question of human alignment, we have been doing this for a couple of millennia. We have iterated through different governance structures towards more visibility, towards democratic systems, and we are going to continue with these paradigms as AI tooling now enters the picture as well. Yeah, I really just want to kind of drill down on my understanding of how AI fits into the stack, because I think the big message that you have is that AI is yet another thing that we need to figure out how to align, along with all the other things that we need
Starting point is 02:01:39 to figure out how to align. It's a very urgent one, granted. And my mental model is that AI is unique from the other systems that require alignment in the fact that we can use it. It's special. It stands out from the rest of the problems in that if we can align AI, we can align everything else. And we actually might require AI to align everything else so that we can also align AI. I would say it stands out in some ways, for it has certain capabilities that haven't emerged from the others. But I highly doubt that if we can align AI, we can align everything else. We can solve a version of AI alignment where you align a single AGI towards a single set of human values
Starting point is 02:02:28 that then can go and rampantly destroy half of the world, and we end up in a Thanos scenario. Or we end up in a Moloch scenario, where you end up with an AI aligned towards a certain set of values that exploits its way towards building up more and more resources, a takeover. These are not good cases of alignment. A multi-multi alignment case is where we're looking at a polity that has different perspectives. And these can be represented as different agents. These can be represented as different general AI systems.
Starting point is 02:03:03 We are more likely to end up in that scenario already. We have multiple tools. We have independent groups that are building many tools. These tools require a level of coordination between each other. These tools need to be able to have their own interpretability to the people they are accountable to. So I highly doubt we will end up with one monolith. But the question is, just like how we are trying to solve coordination today, with many, if not AGI, more narrow tools that are still capable of massive damage to human existence, how do we create alignment across all of these systems? Okay. So if you would, Deger, can you, like, speed-run us through the version of the universe that you hope to see? Like, if everything that you want to see happens, what does that universe look like over the next five to 50 years? Sure. I am excited about the value AI tooling can bring to the universe. We will have systems that will help us discover ourselves better. We will have systems that help us give visibility to our priorities,
Starting point is 02:04:09 and see how this is acknowledged by the rest of the world. We will have systems that elevate humanity towards what it wanted to be, rather than avoiding what we are afraid we would become. I am interested in a world in which I understand the participation that I want to make. How does this contribute to the world? A world in which I have bandwidth to explore and play. I would like there to be a world where everyone is able to have bandwidth for the hobbies, for the joys that they are drawn to pursue.
Starting point is 02:04:39 I want a world in which we are able to see how our opinion counts in a larger system that is making decisions, one that is interpretable and accessible. I want there to be more human connection. Ultimately, it's about having humans interface with each other more, not less. That's why the core of this always comes down to the coordination problem. Can we have humans see eye to eye and understand each other? Can we have AI tooling reduce the barriers, reduce the incentive gradients that are shaping up right now that prevent humans from finding more agreement, more shared values, more shared choices with each other?
Starting point is 02:05:14 This is what I'm most preoccupied with. The world I'm afraid of, and we talk about this a lot in our team, is one where the numbers that we have decided to care about are going up and up and up while we don't necessarily have more human flourishing. That is what I'm afraid of. AI tooling can bring us an unprecedented level of human flourishing. The default systems do not place us on that path,
Starting point is 02:05:41 and I would like us to go towards there. And I think this is possible. I think the tooling that can be built today already takes massive steps towards here. We're interested in building this tooling for humans to use. We are interested in building this tooling so that the AGI labs can adopt an open agency architecture, so they can make more grounded decisions
Starting point is 02:05:58 on what the collective is interested in, what is safe, what is interpretable. There's a lot more in there that we didn't go into in depth, on how we can have systems that can make decisions that are by design safe and verifiable. All of these will be a net plus to the world we are living in. I don't think we will solve human alignment, but I think AI tooling can take a massive step towards that direction.
Starting point is 02:06:21 That is the world I'm interested in. Deger, I get the intuition that you are an optimistic person. Is that correct? I would say so, yes. I think we need more optimism in this landscape so that we can see what we want to do. I think there's a lot that can be done. And I have many fears as well, but I think these fears can be solved if we get to the right level of coordination. Deger, if people's interest is piqued by this conversation and they want to learn more about aspects of this conversation,
Starting point is 02:06:51 where should they go? Check our website, Objective.is. That is it. Send an email to our team, hello at, what is it? Send that message to me, I guess: Deger at Objective.
Starting point is 02:07:06 I would love to chat. If any of you are interested in helping and creating this vision, come along. We need many, many folks to bring this together and make it a possible truth for us. So, yeah. Thank you so much. Yeah. Cheers. Thank you.
