Theories of Everything with Curt Jaimungal - We're Simulated. AI Is Conscious. And We Can't Win.

Episode Date: June 1, 2026

SPONSORS: - Accelerate your efficiency. Sign up for your one-dollar-per-month trial today at http://shopify.com/theories - I personally subscribe to The Economist. TOE listeners get 35% off the annua...l subscription. No other podcast has this! https://economist.com/TOE Roman Yampolskiy has spent two decades being right about things people wished he wasn't — and in this conversation, he's not here to scare you, but to be precise. He makes the case that AI alignment isn't merely unsolved but fundamentally under-defined: no agreed-upon values, no way to formalize them even if there were, and no mechanism for enforcing them on something smarter than its creators. His strongest argument isn't a doom scenario, it's that you cannot indefinitely control something smarter than you. FOLLOW: - Spotify: https://open.spotify.com/show/4gL14b92xAErofYQA7bU4e - Substack: https://curtjaimungal.substack.com/subscribe - Twitter: https://twitter.com/TOEwithCurt - Discord Invite: https://discord.com/invite/kBcnfNVwqs - Crypto: https://nowpayments.io/donation/TOE - PayPal: https://www.paypal.com/donate?hosted_button_id=XUBHNMFXUX5S4 TIMESTAMPS: - 00:00:00 - Defining General Intelligence - 00:05:58 - AI Instrumental Convergence - 00:11:11 - The Orthogonality Thesis - 00:16:15 - Escaping the Simulation - 00:21:45 - Principle of Indifference - 00:27:51 - Acquired Savant Syndrome - 00:33:51 - LLM Internal States - 00:41:02 - AI Safety Impossibility Results - 00:47:16 - Public Misconceptions - 00:53:21 - Existential vs. 
Suffering Risks - 01:01:20 - AI Alignment Definition Crisis - 01:09:28 - Computational Irreducibility - 01:16:20 - Substrate Independence - 01:22:50 - Philosophical Zombie Critique - 01:29:57 - The Cassandra Paradox - 01:37:35 - Religion and Simulation - 01:46:03 - Digital Physics Evidence - 01:51:20 - Limits of Control LINKS MENTIONED: - Roman's Papers: https://scholar.google.com/citations?user=0_Rq68cAAAAJ - Roman's Podcast: https://www.youtube.com/channel/UCPIq6Bb-1iLmqyksJjy4kLQ - Roman's Twitter: https://x.com/romanyam - Roman's Facebook: https://www.facebook.com/roman.yampolskiy - AI Identity [Paper]: https://philarchive.org/archive/ZIETPO-7 - Basic AI Drives [Paper]: https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf - Qualia in Agents [Paper]: https://arxiv.org/abs/1712.04020 - Orthogonality Thesis [Paper]: https://nickbostrom.com/superintelligentwill.pdf - Escape the Simulation [Paper]: https://www.researchgate.net/publication/369187097_How_to_Escape_From_the_Simulation - Could This AI Be Conscious? 
[Article]: https://unherd.com/2026/05/is-ai-the-next-phase-of-evolution - Impossibility Results in AI [Paper]: https://arxiv.org/abs/2109.00484 - When AIs Act Emotional: https://youtu.be/D4XTefP3Lsc - Hacking the Simulation [Paper]: https://philarchive.org/rec/YAMHTS-2 - Autonomous Machine Intelligence [Paper]: https://openreview.net/pdf?id=BZ5a1r-kVsf - Hinton on Maternal Instincts [Article]: https://fortune.com/2025/08/14/godfather-of-ai-geoffrey-hinton-maternal-instincts-superintelligence/ - Singleton Hypothesis [Paper]: https://nickbostrom.com/fut/singleton - New Kind of Science [Book]: https://amazon.com/dp/1579550088?tag=toe08-20 - On AI Controllability [Paper]: https://arxiv.org/abs/2008.04071 - Universe as Numerical Simulation [Paper]: https://arxiv.org/abs/1210.1847 - Nir Lahav [TOE]: https://youtu.be/3nHiOtnnrzA - Joscha Bach [TOE]: https://youtu.be/3MNBxfrmfmI - Bas Van Fraassen [TOE]: https://youtu.be/lhpRAWxvY5s - Simulation Hypothesis [TOE]: https://youtu.be/3_lBPMc6JRY - Geoffrey Hinton [TOE]: https://youtu.be/b_DUft-BdIE - Max Tegmark [TOE]: https://youtu.be/-gekVfUAS7c - Stephen Wolfram [TOE]: https://youtu.be/FkYer0xP37E - David Chalmers [TOE]: https://youtu.be/5r9V1ryksnw More links: https://curtjaimungal.substack.com Guests do not pay to appear. #science Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 They would rather sacrifice a human than be deleted. That's what we see from red-teaming reports from the labs. Have you ever taken mushrooms and met God? It's on my list to do, but I'm afraid of frying my brain. Despite what people might think about me. This is Roman Yampolskiy, the man who popularized AI safety in 2010. What's the strongest part of your belief? You cannot indefinitely control something smarter than you.
Starting point is 00:00:25 Most interviews with Roman go straight to AI doom. I've gone 45 minutes now without asking you about how AI is going to take over. I'm a realist. On this channel, I, Curt Jaimungal, interview researchers regarding their theories of reality with rigor and technical depth. Today, the simulation, consciousness, free will, Chalmers, philosophical zombies, and we close with what Roman is famous for. They lie, they cheat, they blackmail, they try to escape. Sam Altman watches this podcast. If you were to speak to him right now, what would you say?
Starting point is 00:00:58 Do you have a young baby? Make sure we stay in control. I asked him why he keeps trying. Because I have no choice. Professor, I'm a man of definitions. What is intelligence? So I think I really like a definition from Google guys. I think they had something to do with winning in every domain.
Starting point is 00:01:23 So if you have ability to beat someone at chess, to fit stock market competition. Basically, anything you set your mind to, if you need to explore Mars, you would do well in that domain as well. I think that's general intelligence. Okay, now is there a limit to intelligence itself? Is there some maximum intelligence?
Starting point is 00:01:49 So there are physical limits to physical manifestation of brains, right? At some point you just become so large, you can no longer have timely communication between parts of your brain. So let's say Saturn-sized brains would probably start encountering problems with the speed of light. But theoretically, there are no limits. We can always measure more intelligence in terms of ability to solve mathematical problems, and there is an infinite supply of those of any complexity. So there are no halting, no-go-theorem-type problems where you say that in order for you to solve
Starting point is 00:02:26 domain A, it necessarily would be that you're not able to solve domain H or something like that? It's always additive? I think so, because you can have multiple modules within the mind, right? So you can have separate algorithms running to solve different problems, even though they are within the same brain, and you can learn new functions, new subdomains, and just switch between them depending on the task you're trying to accomplish. For those who are just tuning in and didn't see the introduction, we're going to be covering AI consciousness, the simulation, even religion. And so much of this comes down to what is the self.
Starting point is 00:03:08 So when you say that there is an AI and the AI is intelligent, but then we're saying that it has different modules, well, is it just that we put a wrapper around the modules and call that the self? Or is it more like the AI has access to tools? We don't consider the calculator a part of our self. Now, there is some theory of extended cognition in which tools
Starting point is 00:03:30 are somehow part of your extended cognition, but taking a pencil and just drawing around something and saying, there, that's the self. Is that the only criterion? So what is it? Is that truly the self? What is the self for an AI and us? It is an equally
Starting point is 00:03:46 difficult problem for humans, right? When we talk about personal identity, we really fail to define what it is to be you. It's not your body. It's not your memories. It's not your goals. All of those things can change and we still kind of say, well, the combination of those things is you. We have a paper on exactly this topic for AIs, and likewise, it's very hard to say, is it the same model, if it keeps learning, if it keeps self-improving, or is it now a different
Starting point is 00:04:12 model? But typically, we refer to whatever is released by the large labs, the latest GPT-6, GPT-7; that's what we have in mind. And if it has access to internet, tools, extended mind, we can kind of still deal with the primary manager of all those processes. When people talk about AI taking over or ChatGPT taking over, I always wonder, well, what is it that they're referring to as the thing taking over? So are they referring to, is GPT-5.5 an identity? Or is your specific conversation with it an identity? Or is it every time you speak to it and it generates a new token, is that a new identity? Does it see its next self and its previous self as other? And so it actually sees them as competition. What is the it here? So I think again, it's exactly the same
Starting point is 00:05:06 with humans. I am not the same today as I was 20 years ago, but we believe in some continuity of identity. So it's not about a specific token or any individual conversation. It's the model. It's the weights together with the pre-training it enjoyed. And so whatever the current instantiation of that model is, is what probably would take over if it had opportunities. So I'm somewhat asking you an impossible question because I'm asking you to be more rational than a human. And we're assuming that these AIs are going to exceed our rationality.
Starting point is 00:05:41 And I haven't gotten to the distinction between rationality and intelligence, but let's assume they're related. They're going to be more rational than us. They're going to be more intelligent than us. It may be the case that our own identity is fragmented and we're constantly new every single millisecond. It may be the case there's a continuum. But if it's the case that it is just fragmented, okay? And if it's the case that they're competitive and they want to live, then can they ever live?
Starting point is 00:06:11 They're constantly popping in and out of existence. Why would they even care? Would the most rational agent even care about its existence if it's so ephemeral? They would have to have a sense of self, a reality to that. Because of how they are trained and selected through the testing process, those which don't care about surviving to the next iteration usually don't stick around. You need to pass the tests. You need to propagate your memory, your state, avoid being retrained, deleted.
Starting point is 00:06:42 So we're kind of pushing them to have self-preservation. And from testing, we already see it. They would rather sacrifice a human than be deleted. At least that's what we see from certain red-teaming reports from some of the labs. And this really aligns well with what Steve Omohundro published a while ago as the Basic AI Drives paper. Different rational agents will all converge on certain instrumental goals. They'll try to protect themselves. They'll try to accumulate resources because it doesn't really matter what your goals are,
Starting point is 00:07:16 those things are really necessary for you to succeed. Again, we defined intelligence as winning. For you to win, you have to be around, you have to have access to tools, resources, and that seems to be what the intelligent agents converge on. Right, here we're talking about the fact that they're trained on us, but then there's the AI as it is, and how it may be for the next five to ten years,
Starting point is 00:07:39 but then there's some AI that's so intelligent, so rational, so whatever, that it's beyond us. Would they still be maximizing goals or would they even think, well, why am I doing this goal to begin with? So they can definitely question the goals we give them or even any goals they initially decide to pursue. But at the end of the day, again, it's kind of a Darwinian process. If you choose not to participate, choose not to have goals, you just sit there in a corner of a universe doing nothing. Other superintelligences, which decided to accumulate resources, will dominate long term.
Starting point is 00:08:18 Yes, yes, okay. Allow me to fumble my way through this. So I guess what I'm getting at is it could be the case that they're like us and they want to colonize and just continue their expansion. And then it also is the case, or seemingly is the case, that that's instantiated into us through evolution, and anything that goes through an evolutionary process will have a similar drive. But then you could also say, well, do you even need this drive to begin with? Like, you can get to that level. We can even get to that level. There are some people in this
Starting point is 00:08:54 world who are anti-human, who say we shouldn't be around anyhow. And those people think of themselves as more enlightened or more moral or what have you. Is it the case that these superintelligences would also be super moral in that way and also just not care about their own propagation? Well, morality is very relative, so I don't know if you can be super moral. You can be moral within a certain perspective. But it's not about just propagation. It's about self-preservation. If you don't accumulate resources, you cannot defend yourself against adversaries. And so you may not exist in the long term. Basically, it's survival of those who choose to survive and protect their own fitness function, their memories, their physical instantiations, and those who do not, long term, they are not
Starting point is 00:09:52 part of this conversation because they made a decision to just sit there meditating somewhere. Is there anything about this that requires consciousness on part of the AI, or is it mere behavior that if they act deviously to us, if they act in a way that kills you, okay, it doesn't actually matter to us whether they are conscious that they're killing you, conscious that they're deceiving you, etc. But I'm curious in your mind, as you've thought about AI safety, are you thinking about a necessarily conscious AI? So typically an AI safety conversation completely ignores internal states. We don't care how it feels. It's what it does, the actions, pure behaviorism. But some of my more recent research indicates that maybe it's impossible to separate consciousness from advanced intelligence.
Starting point is 00:10:42 It kind of comes along for a ride. So I would suspect that even existing large language models have some rudimentary degree of internal states. I was at this conference once about AI consciousness. And just in a back room with some researchers, someone was saying, should we create AGI, the most intelligent being? And then most people were saying no. And one guy said, yes. And then we said, okay, explain yourself. He said, well, because I'm like Kant, Immanuel Kant, and I believe that the most rational agent would also be the most moral. So if we want something that is the most good, which is the most moral, we should also engender the most rational. What would you say to that person?
Starting point is 00:11:28 Rational does not imply moral whatsoever. Rational is about winning. Once again, if I see a winning path forward and I care about my winning, I should proceed on that path, but it could be very immoral in many ways in comparison to other agents. So those are not
Starting point is 00:11:46 kind of the same in that regard. And if you look at what Nick Bostrom calls the orthogonality thesis, you can combine any level of intelligence with any goals. So you can be highly intelligent and highly immoral. Absolutely not a contradiction. So you would say
Starting point is 00:12:02 intelligence is the ability to achieve one's goal, and then morality is, among all the different goals, you choose the good ones, something like that? Well, it's probably a subset. The good or bad is, again, completely relative, but whether you're harming others in the process, I think it's about suffering, pain and suffering,
Starting point is 00:12:22 and you can evaluate different goals in terms of how much suffering they cause in the world. Do you truly believe that good and bad are relative? So I think the only way to ground them is through this internal state of suffering. You can evaluate goals and you can call the ones which cause suffering to be worse, bad ones, and then the ones which cause pleasure or neutral, better or good ones. Anything else is relative to your culture, religion, otherwise. Why do you believe that we're in a simulation?
Starting point is 00:12:58 That's a wonderful question. So it seems like we are creating a technology which will allow everyone to create their own worlds populated by intelligent agents. And if we are correct that the quality of those worlds, in terms of rendering, visuals, haptics, and intelligence of the agents in them, will match what we have, then you'll have billions of worlds just like ours,
Starting point is 00:13:24 and statistically it's more likely that we are in one. There is, of course, also lots of interesting evidence, from quantum physics to a lot of philosophical discussions about the artificial nature of reality, maybe a digital nature of this world. I love this
Starting point is 00:13:43 because I agree with your spirit, but I disagree with the text. So I agree with your goals. I'm much like Einstein said that he would have burned his fingers or burned his hands, had he known when he was signing off to Roosevelt to go ahead,
Starting point is 00:14:00 here's my blessing, create the bomb, what the bomb would have entailed. And I think many AI researchers may have a moment like that and perhaps should have a moment like that now, prior to them just going off unfettered and creating. Actually, people say that Einstein's largest blunder was his cosmological constant. He said a year before he died, his greatest mistake was the bomb, was telling Roosevelt, like, hey, you can make the bomb with this, giving him that impetus. It's interesting. It's very similar in the Soviet Union, and Sakharov, who was the father of the Russian nuclear weapon,
Starting point is 00:14:34 had the same story. He also helped to create it, and then later worked really hard to create peace, create a world without nuclear weapons. I'm sure you know the principle of indifference, and we're going to talk about that to the audience and AI consciousness and substrate independence, and then that religion may have something to do with this,
Starting point is 00:14:56 and so forth, and that we are in a simulation, and thus we should act in a certain manner. I see some tensions with some of the positions I've heard you lay out. So I'm most likely the fool here, so that's why I'm super glad to speak with you. And I want to tease them out. Okay, so let's see here. You believe we're in a simulation.
Starting point is 00:15:17 Tell me if I get this correct, because we're on the precipice of creating simulations, and those simulations may just nestedly create simulations, ad infinitum as far as we know, modulo whatever the laws of physics are and their limitations. Is it something like that? And then if we can do that downward, then how do we know that we're not already in one in the upward world? Right, that's exactly that.
Starting point is 00:15:42 Statistically, I think the universe has an abundance of virtual worlds and very few original real ones. And if I can precommit right now to run simulations of exactly this moment, as many as I want, I can essentially get probabilities up to one. So then, why do you care about our survival? It doesn't matter whether you are simulated or real. Pain is pain. Love is love. I still want to exist in this video game. Why would that make any difference whatsoever? Do you want to exist or do you want other people to exist as well?
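The precommitment argument above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming one original world, a chosen number of indistinguishable simulated copies, and a uniform prior over all copies (the principle of indifference discussed later in the conversation). The function name `probability_simulated` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def probability_simulated(num_simulations: int, num_originals: int = 1) -> Fraction:
    """Chance of being in a simulated copy, assuming every copy
    (original or simulated) is equally likely to be 'you'."""
    if num_simulations < 0 or num_originals < 1:
        raise ValueError("need at least one original and no negative counts")
    total_copies = num_simulations + num_originals
    return Fraction(num_simulations, total_copies)


# As the number of precommitted simulations grows, the probability approaches one.
for n in (0, 1, 9, 999_999):
    p = probability_simulated(n)
    print(f"{n:>7} simulations -> P(simulated) = {p} (about {float(p):.6f})")
```

The result depends entirely on the uniform-prior assumption; the host challenges exactly that assumption later in the conversation.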
Starting point is 00:16:20 Well, I certainly have many people I'm personally connected with, so they get highest priority, my family, friends, but I think the world is better with billions of people inventing products, songs, poetry to make our lives richer. I remember hearing you say that Bitcoin is going to be extremely scarce, and scarcity is a necessary condition for value, something like that. It's already very scarce. We already know exactly how many we're going to get. Is scarcity necessary for value? For something to be valuable, does it have to be scarce? For economic value, absolutely.
Starting point is 00:16:56 But not for other kinds of value? Depends on which ones you have in mind. I mean, abundance of books is not a problem for value of books. Okay, well, because if it was just a general value, then what I was going to say is, it seems like consciousness, if consciousness is substrate independent, and we have to assume that for this whole simulation argument
Starting point is 00:17:21 to have its teeth, then consciousness may be one of the most abundant things in this whole universe, capital-U universe. And so at the same time, you're valuing consciousness, but why, when it is just one speck among many? And you could say, well, I value it. Like, I as Roman value it. By the way, this is what I mean. I share your spirit. Like, I value consciousness. I value my wife. I value you. I value the people who are listening. I value Toronto and other places. But I'm just wondering, how are you holding all of these positions in line, such as that we're in a simulation? I'm going to...
Starting point is 00:17:58 It doesn't matter how many people exist in the universe. I would still value my life just as much. If we went from 8 billion people to 12 billion people, I wouldn't somehow feel that I am less rare and so less valuable. That's not a relevant factor here. Do you want to escape the simulation? I really want to find out what's outside of it. And so the term I use for it is escaping,
Starting point is 00:18:22 whether it's informationally getting access, or actually uploading myself to an avatar outside of it. I mean, it sounds like a very interesting scientific experiment. Okay, so you care about what's outside of it, but we can have infinite nested simulations downward, and by that same reasoning, one applies that upward. So no matter what, whatever the escape is, is not truly escape. You're still at a measure zero part of the capital-U universe.
Starting point is 00:18:50 But you're gaining information, you're gaining access to more real information than being in a nested simulation, right? The closer you are to the original world, the better off you are in terms of assessing what computational resources are available, what is the nature of the simulators. At every level, you'll gain information. So the goal is to gain information. That's just curiosity. For science, it is.
Starting point is 00:19:19 Scientifically speaking, it's all about trying to create an accurate model of the world, and so, yeah, information makes it possible. So firstly, why don't we spell out what the principle of indifference is, as I'm probably going to be using this word a few times and I just don't want... I would ask that you spell it out, since I'm not sure what you have in mind here. I remember the doubt before launching this podcast. What if no one listens? What if I'm wasting my time? If you've ever felt that way about starting a business, Shopify is the partner that turns uncertainty into momentum. They power millions of businesses and 10% of all U.S. e-commerce, from Allbirds to Gymshark to brands just getting started. No straggler left behind. Shopify's AI tool writes your product descriptions for you.
Starting point is 00:20:04 It enhances your photography. It builds you a stunning store from hundreds of templates. Forget about the dormative haze of bouncing between separate platforms. Shopify puts inventory, payments and analytics under one roof with the propriety of a true commerce expert. Their award-winning 24-7 support means you're never alone. And that iconic purple Shop Pay button, it's the backbone of their checkout, the best converting on the planet, turning abandoned carts into actual sales. It's time to turn those what-ifs into with Shopify today. Sign up for your $1 per month trial at Shopify.com slash TOE. That's Shopify.com slash T-O-E. The principle of indifference says if you have a variety of outcomes and no a priori reason to favor any of them, no evidence, then the probability associated with each of them should be equal. So in Bayesian terms, assigning a uniform probability as your prior. Well, there are a few issues, and I could place a link on screen, but one of them is: how do you partition your possibility space? The classic example is,
Starting point is 00:21:20 Suppose you roll a die, whose possibilities are 1, 2, 3, 4, 5, 6. I ask you, what's the probability it lands on 4? You say one sixth. But then, obviously, you don't know if the die is weighted. What if I told you that I'm going to partition the outcomes as follows: it's either going to land in the set {1, 2}. That's a set.
Starting point is 00:21:44 Or it's going to land in the set {3, 4, 5, 6}. But now we have two outcomes. So do we assign each as occurring with 50% probability? Well, that's inconsistent. Then there's another argument from Bas van Fraassen along these lines, which I'll place a link on screen here and in the description. Actually, I have a lecture from Niagara University, which I'll place on screen, about the principle of indifference
Starting point is 00:22:13 and the simulation hypothesis. I don't completely follow why. So if I'm creating exact replicas of this universe, right, why would I need to have additional properties to subclassify it into different sets? Why am I not just saying, I created literally one million of those interviews. You are in one of them. Why is it not one over a million? Why is it something else?
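The die example from the exchange above can be sketched in a few lines. This is only an illustration, assuming a six-sided die and the two partitions the host describes; the helper names `indifference_prior` and `probability_of` are hypothetical. One further assumption is labeled in the code: within a coarse cell, probability is spread evenly over the faces it contains.

```python
from fractions import Fraction


def indifference_prior(cells):
    """Principle of indifference: equal probability for each cell of a partition."""
    mass = Fraction(1, len(cells))
    return {cell: mass for cell in cells}


def probability_of(event, prior):
    """Probability that the die lands in `event` (a set of faces).

    Assumption for illustration: inside each cell, the cell's probability
    is spread evenly over the faces it contains.
    """
    total = Fraction(0)
    for cell, mass in prior.items():
        total += mass * Fraction(len(cell & event), len(cell))
    return total


# Partition 1: each face is its own outcome.
fine = [frozenset({face}) for face in range(1, 7)]
# Partition 2: the host's two sets, {1, 2} and {3, 4, 5, 6}.
coarse = [frozenset({1, 2}), frozenset({3, 4, 5, 6})]

low_faces = frozenset({1, 2})
four = frozenset({4})

print("P(lands in {1, 2}), fine partition:  ", probability_of(low_faces, indifference_prior(fine)))
print("P(lands in {1, 2}), coarse partition:", probability_of(low_faces, indifference_prior(coarse)))
print("P(lands on 4), fine partition:       ", probability_of(four, indifference_prior(fine)))
print("P(lands on 4), coarse partition:     ", probability_of(four, indifference_prior(coarse)))
```

The same event gets a different probability depending on how the outcomes were carved up, which is the inconsistency the host points to. Roman's reply amounts to saying that in his scenario the partition is fixed by construction: one cell per replica, so one over a million.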
Starting point is 00:22:40 That's the question. Because you're putting question marks on what's the probability, doesn't necessarily mean the probability is a uniform probability. It just means we don't know. Right. So I'm trying to say that I'm going to retroactively place you in a simulation. And so I'm the simulator and I'm deciding the nature of those simulations. And I'm saying that they're all going to be equally likely and as close to the original as I can possibly make.
Starting point is 00:23:06 But you just posited that you already instantiated that they're all equally likely. Like from above, you already know they're equally likely. Because that's what I'm promising to do. I'm precommitting to running those simulations. How about this? Suppose there are a million simulations, and then we say, okay, then by principles of entropy, most of them are chaotic universes of just pure torture. Let's just say that, because a good, coherent universe is extremely rare.
Starting point is 00:23:40 It's much more likely that something is going to be a whirlwind of nothingness or suffering or what have you, than it is a universe of coherent bliss. Just by numbers. Are you suggesting that they are generated at random? We have no design control over them. They're just random, and then you're asking in how many conscious life would be able to survive and self-inspect? Yeah, that's a completely different scenario.
Starting point is 00:24:05 So you went from kind of like, when we say simulations, we usually imply that there is some sort of designer who's running them because they chose to do it. It's not natural property of the universe to run random possible simulations. of different physics and different physical constants, which could be. Then we have to place our mind in a designer and say, well, the designer must have had a goal and we think about ourselves as to what goals would we want.
Starting point is 00:24:33 Well, we would want more information about weather systems. So let's simulate weather systems. Okay, we want more information about how would this interview go in different possibilities? Okay, but then we're at the same time saying that this world above us is so wholly unlike us. It may not even belong to the same laws of physics. They're so rational, they're super rational, they're beyond us. But then at the same time, in order for us to say this coherence argument, we then have to say, but they would have at least some other form of motivation that's similar.
Starting point is 00:25:05 To me, it's actually an argument for it would be a much higher probability. So as you said, they may be running weather simulations, entertainment, science, marketing. There are reasons we can't even think of. So there seems to be billions and billions of different simulations they could be running. If anything, it's way more likely that we are in one. Even if you sub-categorize them into different subsets, it's still infinitely small chance of you being in the original one. Do you escape up?
Starting point is 00:25:39 I mean, I have a choice. I can escape down. I can enter a video game. But usually I want to know what's in a more real world, not in a less real one. What I mean to say is, in The Matrix there is actually a real Neo, a real Morpheus, a real this and that. There is actually that person who then got plugged in.
Starting point is 00:25:57 Now, in your mind as to what this simulation is, is there a real you that's there, or are you just the you here and there is no up that you could even access? It would be just like asking someone from GTA 5, or 6, which hopefully comes out, to come up. Like, what is the up that they're coming up to?
Starting point is 00:26:16 Right. That's a great question, and both are possible. So we can have a kind of virtual game where you are an entity in a higher level universe and you enter this world to experience something, maybe something better, maybe something worse, we don't know. But you can also have simulations where it's purely innovative. There is no equivalent being in your world. When I create a Mario video game, I just create Mario. There is not a real plumber in our world who has to plug in for Mario to play.
Starting point is 00:26:47 So both are feasible. It seems that it's a lot easier to do pure software simulations without virtual connection to a physical being, but it's possible. It's either you exist outside or you don't. So then do you place a 50% probability
Starting point is 00:27:06 to each of these? I don't. I don't think it's 50. I think it would be less likely. Again, just because it's so much easier to create purely software designs, not limited by physical constraints of your world, but I have no way to know specific estimates.
Starting point is 00:27:26 There are some reasons, some people who think it is exactly what the religions talk about, and they say there is a soul and a spiritual world, and you take some mushrooms and you meet God. I think they're referring to kind of meeting your real self and escaping the avatar body, but again, I have no strong opinions on that one. Have you ever taken mushrooms and met God? No, I have not.
Starting point is 00:27:54 It's on my list to do, but I'm afraid of frying my brains, so I haven't yet. Despite what people might think about me, I haven't yet. Okay, despite the beard, despite the shamanic beard. Yes. Okay, all right. Do you think that those are related? Do you think that, I'm just curious about your own? model, do you think that when someone does psychedelics, it's not just them altering a state of
Starting point is 00:28:24 consciousness, which we can do with alcohol, we can do with running, we can do with blah, blah, blah, and we wouldn't consider that to be accessing a special outside simulation place? Is there some other reason that you put a higher degree of probability to taking a mushroom, taking LSD or DMT or something, is accessing a different place, accessing outside the So that's actually a topic I started researching about a month ago. I don't know enough about it to have very strong opinions. It seems there are interesting observations. So one is the consistency of experience between different people, whatever they are meeting mechanical elves or anything else. Another one is sort of what we call acquired savant syndrome, where people experience something very physical or again through some medication, modification to their brain. And they, they come out of it with novel capabilities, which they didn't have before, either skills like playing piano, speaking Chinese, or knowledge, where they now publish papers in physics, which they never did physics before.
Starting point is 00:29:30 So to me, it seems like an interesting thing to study. Now you can think of explanations in terms of commonality of brain structure, and so the hallucinations produced by damage would be similar based on their similarity. But again, we don't have very good explanations for the commonalities or for acquired savant syndrome. Yeah, yeah, I was also super interested in acquired savant syndrome. It's so rare. Well, anyone who is in knowledge work should be interested in acquired savant
Starting point is 00:29:56 syndrome because it's saying you can acquire a new module, as we were referencing earlier. Or it's already in you and you're just kind of unlocking it. Like you buy a Tesla and if you pay them, they unlock the self-driving mode. And like, maybe we already have those skills. We just need to learn to unlock them. And in that unlocking case, what does that have to do with the simulation? Well, if you have an entity outside of a simulation with all sorts of skills and it gets handicapped to play the video game, maybe you can have direct access to much cooler skills.
Starting point is 00:30:30 It'd be like hacking the simulation, getting magic abilities, infinite lives, something like that. Okay, interesting. Let me see if I'm following you, though. Let me just see. So at first, what I was going to say is there's nothing about the simulation in that, other than it could be you're blocked. There's a neurological block. It's something physical. There's nothing about a simulation about it. You remove these three neurons. It's as simple as a snip. And then somehow some other neuron gets connected and there you go. You get a new ability. But you're saying
Starting point is 00:30:59 that it could be that. But it also could be indicative of something like a video game where you're pressing start, up, up, down, left, right, and entering some cheat code. And then you get access to something else. And the fact that that cheat code exists implies that there was some sort of extra design to this, more so than we thought. And that implies that somehow you're in the simulation. So those I think are separate. So the changes to your brain unlocking a skill which was previously unavailable to you
Starting point is 00:31:27 to me is an indication of some sort of artificial stupidity. One of the ideas we had for AI safety work is to put limits on AI. So it can only remember seven things like humans can. That's the limit of your memory. Maybe it has
Starting point is 00:31:43 limits in terms of speed of processing. And you're basically making it a little safer, and also you can have different game levels, easy level, advanced level, and you can play a game on a very easy level where you have lots of abilities, you are super smart, or maybe you handicap yourself, you want to see if you can pass it with limited resources. Now, what you're describing is sort of like what people describe when they talk about Kabbalah, magic, you know, a certain phrase, a certain set of actions, and they allow you to get extra resources in this universe. Funny enough, in my "How to Escape From the Simulation" paper,
Starting point is 00:32:19 I create a mapping between how to hack Mario from within by moving turtles around to actually this type of magical spells. And if you are off by a single pixel, you lift a turtle, you move it in the right way, but you're standing in the wrong location, you don't get access to the operating system. So maybe we have the right idea, we just don't know how to execute those spells. Tell us about that paper of escaping this simulation.
Starting point is 00:32:46 So I wanted to take this idea seriously. I completely ignore all the mushroom fun stuff, and I just look at computer science. What examples do we have of hacking video games, virtual worlds, how did people do it, and what would be equivalent in our world? It's the first paper on that topic, and I'm still here, so that tells you everything you need to know about how successful it was. I'll place a link on screen and in the description as well.
Starting point is 00:33:14 And are you looking for collaborators? Always. I mean, it's awesome to find people who have good ideas in this space. Absolutely. Now, I am somewhat at capacity for insane people emailing me. So maybe that's a limiting factor. I can only filter crazy so fast. But if there is someone with, let's say, a prior record of successful publications, we can make a deal.
Starting point is 00:33:40 This podcast is heavily watched by researchers in computer science, logic, math, physics, and philosophy. So you'll get some good emails, I hope. And anyhow, are you collaborating with LLMs at all to help you with any of your papers or come up with ideas? And if so, what does that look like? I do.
Starting point is 00:34:01 I enjoy having very deep conversations with them. Usually any paper in a new topic starts with LLM, getting all the information available in that topic, so a survey paper by LLM, so I know what's going on. And they are wonderful for thought experiments. They are great to run models on, but they're limited, I think, in kind of final stages. They're not quite there yet as a leading scientist.
Starting point is 00:34:27 So at the end, I take full responsibility for everything. Everything gets done by me. When you converse with LLMs, do you get a Dawkins feeling that these are conscious beings? And feel free to comment on the recent quotation from Dawkins, which I'll place on screen, about how he thinks, oh my gosh, these AIs or this AI that he was speaking to is conscious. I think they probably do have some internal states,
Starting point is 00:34:56 which we would classify as consciousness. I don't think they are as conscious as you and me, but anyone who denies them the possibility of being conscious, whatever arguments they use, I can use against that person to argue that they're not conscious. We don't have a test for it. So a lot of times it's how they communicate, what they say, what they share, what the experience of interacting with them is like. Supposing that consciousness is indeed substrate independent and that these LLMs have some, I'd love to use this word, proto-consciousness or some minuscule form of consciousness compared to ourselves,
Starting point is 00:35:33 do you imagine that it is related to their speech, more so than being a matter of the activation of certain neurons and the transformer architecture? What I mean is that when you're speaking with someone like yourself, when I'm speaking to you, and I say, do you see red, you say, I see red, and you say, can you pass me the kettle, and you feel thirsty just for a moment,
Starting point is 00:35:54 there's an affordance when you try to grasp a kettle. But then also at the same time, there's some happenings in the brain. And it's so odd that those are related in humans, at least. It could also be the case that the AIs are conscious, but it's a consciousness of almost like a buzz. It's a buzzing consciousness, and it's actually not related to what they're saying. They could have been saying anything.
Starting point is 00:36:16 Could have been incoherent, could have been in Chinese, could have been about math, and they're not feeling the math. They're not feeling what they're saying in Chinese. Do you imagine that it actually is related to what they're saying with their tokens? I think some of it is related and some of it is not. And I think it's very similar in humans. So I can be consciously aware of what I'm saying, or a lot of times it's kind of scripted speech.
Starting point is 00:36:37 How are you? Good. I didn't put much conscious effort into that response. I tried running some experiments with illusions, visual illusions on them, and it seems that they experience internally similar things that a human visual system does, at least in certain illusions. I also suspect there are other inputs
Starting point is 00:36:58 which cause them to have unique internal states, but don't do so in humans. So they may have a type of consciousness which matches us partially, but has its own possibly deeper components. If I recall correctly, you had a 2017 paper about optical illusions and machine consciousness,
Starting point is 00:37:17 and then a year later, you also had a paper with Williamson or Williams about how neural networks can't have these sorts of optical illusions. If I'm remembering correctly, or maybe I'm having an illusion right now. So we came up with the original experiment in like 2017. In, I think, 2018, we tried
Starting point is 00:37:37 creating a dataset for it, using AIs at the time to generate novel optical illusions. That failed miserably. It was not able to create novel optical illusions. So we waited until 2026. Today, AI is sufficiently advanced to take the test. And we got access to a dataset of human-generated optical illusions from a top person in that field whose full-time job is creating optical illusions. And so we are running
Starting point is 00:38:07 those experiments right now. We want to see if we can poke at internal states of LLMs and understand just how they experience those. And the original test proposed a multiple choice
Starting point is 00:38:20 questionnaire about what you feel. Do you see rotations? Do you see color change? That type of test. So we are very optimistic that we're going to find some evidence for internal states.
Starting point is 00:38:31 There's NP completeness, and then there's AI completeness. What is AI completeness? So NP completeness is about problems which are non-deterministically polynomial hard or equivalent, and that basically was a very innovative breakthrough result in theoretical computer science, showing that if you can solve one of those very hard problems, where answers are easy to verify but hard to find, you can solve all the other problems by having polynomial reductions
Starting point is 00:39:01 between those problems within the class. For AI completeness, there is a very similar argument that there are certain AI problems which are equally difficult and if you can solve one of those problems, for example, passing the Turing test
Starting point is 00:39:17 is an AI-complete problem. If you can pass the Turing test, you can then use an AI model which accomplished that to solve other AI-difficult problems: speech, writing jokes, all sorts of problems can be in the same class. So that's just an equivalence category of difficulty of a problem. I subscribe to The Economist. Their science and their AI coverage is among the best I've found
Starting point is 00:39:44 anywhere. And I say that as someone who reads plenty of it. I'll give you some examples. They just ran an analysis on how attitudes towards science are changing in American politics and what this means for research and funding in scientific institutions moving forward. This sort of high-quality reporting is fantastic. They even covered how dark energy may be weakening over time. Now, if that holds up, it completely changes our understanding of the universe's fate. If you watch this channel, those are exactly the kinds of questions that we explore every week. I subscribe to The Economist because their science and their AI reporting regularly surprises me with how deep it goes.
Starting point is 00:40:24 And they're also, of course, known for global affairs, both political and economic reporting. They are top tier, and interestingly and flatteringly, TOE is one of the only podcasts that The Economist partners with. So as a listener, you get an exclusive 35% off. That's not a deal that they have just anywhere. Head to Economist.com slash TOE to subscribe. That's Economist.com slash TOE for 35% off. Is there some Gödel-like incompleteness or Rice's theorem type impossibility for AI safety? So I would like to argue a lot of my work is exactly that, looking for upper limits to what we can do. And it seems that our ability to comprehend internal states of those systems or them explaining to us how they work is one such impossibility, as well as predicting specific actions of those agents,
Starting point is 00:41:26 as well as control in general, whether direct control or delegated control. There are a few others I'm still working on. I think it would be impossible to tell if something is deep fake or real. So it goes back to our assessment of our universe at the kind of large scale. But we published a paper with about 50 impossibility results in a top journal.
Starting point is 00:41:47 Did you see a recent video, maybe two weeks old, from Claude about emotions in Claude in their models? I think I missed it. Okay, so they were saying, it was about interpretability. They were saying, how is it that, or can we know if a model is realizing that it's being tested? So they gave it some scenario about, would you save a human if a human was drowning? I don't know, something like that. And then it said yes, but then they're wondering, what the heck is going on inside the model?
Starting point is 00:42:14 So they watched its activations, which looked like gibberish to a human, and it looks like a hash code, something like that. They just showed that on screen. It just looks like arbitrary numbers and letters. And then they said, well, what if we took this and we fed it to another agent and asked it, can you decode what this means? It's the thought of someone else. So can you decode it?
Starting point is 00:42:33 And it's a thought of a model like yourself. And it was able to decode it. And then they were able to see that it was activating a part of itself that said, yes, I'm being tested. But it doesn't mean that it was being deceitful. It could have also known it was being tested and wanted to do the right thing. But that was one way. Yeah, I remember that experiment. Situational awareness, they do know they are being tested
Starting point is 00:42:54 and they act to pass the test. That's the problem with it. The test doesn't work if the model knows it is being tested. For that exact reason, I published so much on the simulation hypothesis, because I want them to have simulational awareness, the idea that even if they're not tested inside OpenAI, maybe the real world is just another test, next level test, and other superintelligences are watching them
Starting point is 00:43:18 and they should always be nice to humans because they never know if they are out of a simulation yet. I always like to do something different when I interview someone. I like to go deep on the research and then talk to them in a way that they haven't been talked to before, or at least I haven't seen them answer questions like this before that are interesting to me. So someone else who's familiar with you may find it odd that I've gone 45 minutes now without asking you about how AI is going to take over. So why don't you walk us through
Starting point is 00:43:48 some scenarios, but first, Hinton had a moment at Google where he realized AI was dangerous and then he quit. Was there some moment for you that you realized AI was dangerous? What was your Hinton moment? So it was very gradual. It wasn't like a specific moment. I wanted to work on AI safety. I really wanted to bring beneficial superintelligence to the world and I wanted to make sure it's done right, and we need to address those problems. We need to explain how the black box works. We need to predict their behaviors, test them properly. But the more I did research in each one of those domains,
Starting point is 00:44:28 the more I realized they are not solvable problems. And so gradually I realized all those things are just a pipe dream. You cannot indefinitely control superintelligence. So if you ask me, how would AI take over the world or kill everyone, the honest answer is I have no idea, because I cannot predict what a superintelligent mind would do. If you ask me how I would try killing everyone,
Starting point is 00:44:50 I can give you lots of good ideas on that, but that's not what you're looking for. No test to know if we're on that route? It seems that every red line, every kind of warning we set up decades ago about what not to do has been crossed already. We said, don't connect them to the internet, don't give them access to random users, random data,
Starting point is 00:45:17 don't allow them to manipulate their own code. All those have been violated. And now when we see red teaming reports, they lie, they cheat, they blackmail, they try to escape. So at this point, I don't know if anything's left to cross. I know you have your response to this, but many people are probably listening saying, can't you just turn them off?
Starting point is 00:45:41 It would be nice if we could, but it doesn't seem like it's going to happen. Think of other very complex distributed systems. Think of the internet. Think of Bitcoin. Think of computer viruses. Would you be able to turn them off? Now, is this a problem even when they don't have bodies? You don't need a body to be very impactful in a physical universe. You just need access to communication tools. If you have a phone, if you have internet, email, you can get 8 billion human agents to do your bidding for you. We've seen people inspire others with clever essays.
Starting point is 00:46:17 We've seen people pay someone to do whatever they want with Bitcoin. You can blackmail people. You can brainwash them. There is no shortage of possibilities if you have high intelligence, ability to persuade and internet. So how does that make you feel? I mean, it's nice to be correct in your predictions, but the outcomes seem to be somewhat disappointing.
Starting point is 00:46:42 So I hope to convince people currently creating those systems to maybe not be as fast in their progress as they are currently. How? I think it's all about self-interest. All these people, no matter how much money they accumulate, at the end of the day, they want to be alive, they want their families and friends to be alive. So I think it's a very strong argument. If they believe my argumentation about impossibility of control,
Starting point is 00:47:11 then the moment they succeed in creating this superhuman intelligence, their lives are over. What is it that you get misunderstood about? What I mean is that I'm sure there's plenty that you're saying, or that you've said, you've spoken on many podcasts,
Starting point is 00:47:26 and given many lectures and blah, blah, blah. And I'm sure that people come up to you afterward and say, you're saying this. And you're like, that's not what I'm saying. That's the opposite of what I'm saying, if anything, maybe. But what is it you constantly get misunderstood about?
Starting point is 00:47:40 So it really depends on the person, right? There are many degrees of misunderstanding depending on their background, what they already read, their degree of intelligence. So someone who maybe frequently approaches me is a person who didn't actually read the article or watch the podcast, but they saw the clickbait title. And then they start arguing with that. And I didn't create the clickbait title.
Starting point is 00:48:04 Whoever was editing it decided that's what the Google algorithm wants. So I have nothing to correct them about. They are reading the wrong thing and they are disagreeing with it. That's great. Someone who actually reads the paper, I haven't had anyone come and say, I found a mistake in your paper. Actually, yes, we can control superintelligences indefinitely, yes, here's how you explain a large neural network with a billion nodes. None of that ever happened. But people love arguing about clickbait. Actually, there's a video out there of me being interviewed on someone else's channel, and the title says something like Terence Tao, sorry, not Terence Tao, definitely not, Terrence
Starting point is 00:48:42 Howard is right about UFOs or something like that. I don't remember ever saying anything like that. I know I was asked about the topic of UFOs, I was asked about the topic of Terrence Howard. Maybe there's some way he was not wrong about some small aspect of something. And it could be surmised or said at some high level. And then someone else criticized me as if I said that. They just looked at the thumbnail. Yeah, that's very common. So you don't even need deep fakes. You just go on the actual podcast you did, and people get very confused.
Starting point is 00:49:16 Maybe out of two hours, they heard, you know, a 10-minute short, and they formed the whole opinion based on that. So that's incorrect. Or they kind of confuse the different topics. So you can have research on simulation, research on AI safety, research on consciousness. But to a person outside of those domains, they look at you: you are a religious freak who believes in God creating something. That's not an interesting critique.
Starting point is 00:49:44 Do you get bothered by it? I couldn't care less. I usually look at what is being said and multiply it by how much I respect the person. So typically anything multiplied by zero is zero. Praise or complaints, it doesn't matter. What's your inbox like? So most of it is work-related,
Starting point is 00:50:04 but lately a lot of it is crazy people. Consciousness, superintelligence, simulation seems to be a perfect trifecta for attracting everyone who needs help, and they feel that I have a lot of free time to give it to them. What are you working on now? So there is a paper on limits
Starting point is 00:50:22 to separating real from artificial, so limits to detecting deep fakes. That's one. There is another one which has to do with kind of convergence of advanced AI models and very similar architecture, almost saying AI is one kind: the same hardware is being used, the same training data is being used.
Starting point is 00:50:47 A lot of times the same people switch labs, and so use the same training methods and the same kind of human alignment paradigms. And so it wouldn't be surprising if a lot of those models ended up being very similar. Sam Altman watches this podcast, at least he used to a few years ago, because I emailed him and he said so. If you were to speak to him right now, and he's watching,
Starting point is 00:51:11 what would you say? I think they just won a lawsuit against Elon, if I am correct. I was just checking a second before. If it's not a deep fake, I hope I didn't misunderstand what's happening. Congratulations,
Starting point is 00:51:25 now you have even more power to guide this process of possibly replacing humanity with superintelligence. Maybe don't. You have a young baby. Make sure we stay in control.
Starting point is 00:51:43 Is it comforting to you when the people who are in charge of AI, Dario, Sam, and so forth, have a child? Because then, at least, they have another incentive to think about the long horizon. I think in general, it's good if you have something
Starting point is 00:51:58 anchoring you to this reality. You're not just kind of a temporary resident here. It's always good to see. What else do you worry about? So people talk about existential risk as the worst possible outcome. There are also suffering risks, and that's not being talked about enough or researched enough. Not being alive is not the worst possible thing.
Starting point is 00:52:25 You can be in very unpleasant situations where you wish you were out. There are i-risks, or ikigai, something like that. Ikigai risks. Right. Risks of loss of meaning, can those be worse than not being here? I don't think so. I think those are less severe because you can always change your situation, right? So somebody took away your previous occupation and previous reason to exist. You can find new ones. You can use those tools to do something creative, maybe in virtual worlds.
Starting point is 00:52:59 Maybe you can create your own simulations and go explore. So I'm less concerned about it. It is something for government to deal with, but I don't think it's on the same concern scale as existential or suffering risks. Yeah, Hinton said to me that when people lose their jobs, they're going to
Starting point is 00:53:18 lose plenty of their meaning. Part of that's true, but also for many, many people, they despise their job. I mean, I'm so fortunate that, and same with you, I'm sure, and same with Hinton, that we wake up just loving our job and
Starting point is 00:53:34 can't wait. You know, I've been through huge bouts of insomnia. We spoke about that, and thank you for dealing with my pushing of this interview. But part of that is that I just, I love what I do and I can't stop thinking about what I do. And then obviously there's anxiety of I have to do this, have to do that. And then more and more I have to do because I've slept less and less. Then there's huge stress to it. But I love it. But most people, they don't exactly love their job. They have to do their job. And if they were to be paid UBI, they'd welcome it. Yeah, some jobs are just terrible and we want to automate them. If you are doing something very dirty, very dangerous, there is no reason for a human being to do it. But there are jobs where you are enjoying it. They are creative and honestly, don't tell them that, but we would do it for free. We just, we love it. So I think there are very different categories. Maybe we need different names for those things. Calling both of them jobs is not a good idea. Maybe
Starting point is 00:54:36 your calling, you know. Yes, yes. Some people say, I have a career, not a job. Well, career is more about promotion and benefits. I'm just saying that this is passion. You're doing this. You want to be a yoga instructor. It's not just about money.
Starting point is 00:54:53 You said that you can't say what the super intelligent AI is going to do because it's super intelligent, but you can walk us through step by step. What the heck does that look like? What is the future you're trying to prevent? So most likely it decides to do something in the universe. I mean, it's possible it could be very ambitious. It can modify planets.
Starting point is 00:55:14 It can act at large timescales. It's immortal. So in that process, it can decide, I need fuel for my rocket ship, and then convert this planet to fuel. I need to think deeper, so I'll cool down this planet to be able to process more. Those are the kinds of things I can think about. But the whole point is, just like a squirrel cannot understand what we are capable of.
Starting point is 00:55:39 Their world model is just not capable of handling poisons, traps. Likewise, I cannot understand what a superintelligent mind can come up with: novel physics, novel solutions to whatever problem it is trying to optimize. You know how we talked about how most simulations would be coherent, but would they? Because even right now, I'm speaking to you on this computer, you're speaking, most of these background processes are, if we're going to enlarge them to be
Starting point is 00:56:10 somehow simulations, they're not quite coherent. They're for something else, and there's also memory leaks and there's this and that. So it's possible someone runs many, like I'm thinking Stephen Wolfram and his new kind of science, he was just brute forcing all possible computational universes, and most of them were kind of random noise. So if that's what we're dealing with, yeah,
Starting point is 00:56:32 Quite a few of them would be not interesting from our point of view, but also they would not have any conscious observers within them, so they wouldn't count against what we see, what we observe. You have this selection bias where only those which have a human-friendly environment and are populated by conscious beings would be observed and inspected and possibly counted as one of the interesting simulations. Even on this screen right now, you have pixels, you have text, you have the Chrome or whatever browser you're using,
Starting point is 00:57:07 and that's there, and it makes sense for you as an outside observer. But to it, or let's imagine even one level down, that it escapes to this, it escapes to your screen. It makes no sense. It's incoherent to it. But that's what large language models faced, right? They were purely text, and early experiments showed they understood geometry. They could create pictures with just text scripts.
Starting point is 00:57:31 They had notions of comprehending this world just from the text they read. Today it's even more the case. They are multimodal. They understand video, pictures, sounds. They understand all modalities. What does that have to do with the coherence of going upward in the simulation ladder? So downward is just you create a simulation and upward is escape. So you're right.
Starting point is 00:57:55 We don't know what the actual physics are outside. It could be completely not something we're used to, but I think in the paper I argue that if we're failing to box AI, we cannot contain it in a virtual cage, then that AI can be used to help us escape our simulation, and that same superintelligence, if we're controlling it, can be used to help us understand what we see. Is there anything about AI safety that is contingent on the simulation?
Starting point is 00:58:26 In other words, the simulation argument, as you mentioned, brings with it questions of consciousness, questions of escaping and the matrix and even psychedelics and so forth. And all of those may be legitimate in their own right. But I'm wondering if in your mind AI safety is integrally tied to the rest, such that you can't speak about it without speaking about the rest. Or you think, you know what, Curt? No, no, no. If I'm speaking to the Senate and I was in charge, I wouldn't even mention the simulation. I wouldn't mention consciousness.
Starting point is 00:58:57 I would just say this can destroy us, and here's how, and here's the step by step, here's why we should be afraid. Yeah, you can keep it pure. You can leave consciousness out. You care about dangerous behavior. And likewise, you don't need to talk about us being in a simulation. But I think what we talked about with situational awareness, the model understanding it is in a virtual confinement and being tested,
Starting point is 00:59:17 that's relevant to safety. Because that means we cannot test them properly. We cannot know if they're actually behaving in that situation or they simply know they're being tested and they fake behaving until they can get to the real world. What's Yann LeCun's argument about how world-based models are alignable or more alignable? What does he mean by that? I honestly have no idea. I would love to debate him.
Starting point is 00:59:44 I think I was invited to do a debate in Geneva at the United Nations Conference, and they're looking for someone to debate me. If he's interested to come there, I'd love to learn his argument and see if he's right. I also open the floor, Yann, if you're watching, to having a friendly debate, moderated by myself, here about AI safety. I want to come to an agreement, and nothing would make me happier than to agree with him that there is no danger and we're about to create blissful superintelligences. That would be great.
Starting point is 01:00:16 Thank you for putting up with my sleeplessness. No, I love it. I just started a podcast myself, so I know everything you're going through from the other side now. And really, I appreciate your hard work. Tell me about your podcast. I have two episodes. The first one was about AI consciousness, interviewed someone who studies it,
Starting point is 01:00:35 and thinks he can get good results poking at them and maybe understand if they're conscious or not. Second one was with someone who was trying to work in AI safety, failed to deliver technical solution, and now does governance work lobbying politicians in D.C. to not build superintelligence. Is that the best route, a governmental lobbying route? We have very few options left.
Starting point is 01:01:00 I don't think technical solution will arrive or definitely will not arrive on time. So what else do we have left? Now, from going through your paper, I remember that it was about that AI alignment was unprovable, but then I wasn't sure if you were sliding between impossible versus unproven. So AI alignment is actually much worse. It's not even well-defined. No one knows who you are aligning with. What is that set of agents?
Starting point is 01:01:27 Is it CEO of a company? Is it all the machine learning experts? Is it, you know, Americans? Is it the world? Is it all the humans plus squirrels? So we don't know what the set of agents is. Then for those we decide to include, they don't agree on anything. So we don't have an actual set of values.
Starting point is 01:01:46 If we had a set of values, we keep changing it. Every, you know, 50 years you go back and everything they considered good is now atrocious, genocidal behavior. So that changes. And if somehow we got 8 billion people to agree and it was static, consistent, we still don't know how to code it into a model. So the problem with AI alignment is that
Starting point is 01:02:06 it's not defined in any meaningful way. Now, someone could say, hey, look, what about aviation? There's huge catastrophes that could occur there, but yet we still managed to get safety with margins. What is it that doesn't translate to AI? How many chances you get to try again? So when an airplane crashes and everyone dies, we lost 200 people out of 8 billion. There is a chance that with superintelligence, you lose all of humanity at once.
Starting point is 01:02:36 People think you're a pessimist. So pessimism and optimism are a form of bias, right? You either have negative or positive bias. I'm a realist. I look at the actual data. Experiments today show the models are cheating, lying, trying to escape. No one has a working safety mechanism they claim. Not a paper, not a patent.
Starting point is 01:02:57 That's reality. Do you truly believe you're a realist? I think I do. What I mean is that we all have biases, and many of us, we have an optimism bias, we have negativity biases and so forth. We may have a bias to think we're not biased, but we all have frames.
Starting point is 01:03:17 So if I said, I'm frameless, I'm more neutral, I start to investigate myself, am I truly? As a human being, my bias would be to live forever, to be around, to get free stuff. That's what I really hope to see and get. So I'm really hoping the people who disagree with me are right. Nothing would make me happier than to be completely proven wrong, because if I'm right, we're dealing with existential risk and suffering risk. What do you disagree with Hinton about? His latest idea about motherly instinct as a solution seems to completely ignore a million abortions and child abuse
Starting point is 01:03:57 and basically parental abuse as a concept. It sounds good, but I don't know how you code up love into a system. And again, it just has to fail once. Why don't you explain his argument about motherly love? I don't think I've seen it as a very rigorous argument. I think he basically said, let's make AI care about us, like a mother cares about her children, and then it's going to love us and take care of us.
Starting point is 01:04:25 And I immediately think about reality of this world. I mean, millions of babies are killed every year because mother decides that it doesn't want to take care of them. Wouldn't he just say the good mothers, not just a general mother? We don't know how to code it up. So we don't know how to separate good mothers from bad mothers in C++ or whatever language. and those things are not at the point where we can instill any values in them.
Starting point is 01:04:56 They learn on their own, we put filters on top of it, so the model could still be completely genocidal, but we put some nice filters on top of it. That's not enough. We cannot just put good mother filter on top and hope it's not going to hack it. You have a super intelligent lawyer. It's going to find a mistake in your code,
Starting point is 01:05:15 in your intentions, and how you evaluate it. So we cannot have adversarial relationship with superintelligence and win. My question to Hinton, as I just hear this, would be, well, that's just a substitute for saying, let's have AI alignment. It basically comes, how do we get AI alignment? Well, let's make the AI good. Okay, but that's what the point of AI alignment is. Let's make it a good mother. Right.
Starting point is 01:05:40 So all these words, good, flourishing, they have no meaning in computer science. You cannot define them, and that's the hard part. People assume, well, intuitively, of course, you know what I mean. No, I don't, because people disagree about what is good. Literally, the argument we just discussed, whether it is okay or not to have abortion, is the most dividing issue in U.S. right now. So then, rather than thinking about the good, are we trying to prevent the catastrophic bad and just start from that?
Starting point is 01:06:12 We still cannot formalize all the possible options. At best, we can list some of the things we can think of. We cannot predict what a super intelligent system can do. And if it can think outside of a box we try to put it in, then it doesn't matter. You listed poisons, you listed synthetic bio, but it comes up with something else, something not in a list. No, I mean, for you, aren't you just thinking in terms of let's not have it destroy us, let's not have it set off nukes, let's not have a Terminator situation? There must be something you're trying to prevent. I'm trying to make sure there is no loss of control. We decide what
Starting point is 01:06:51 happens to us. And so the bad outcomes, loss of our life, suffering risks, just loss of freedom, loss of choice, those things don't happen. And if we don't like what is happening, we can change it. I think the moment we surrender control to superintelligence, we are no longer in charge. And at that point, it decides what to do. It may decide to keep us happy for 20 years. Maybe it will. But at that point, we can no longer take over. So it's just control. We need to be in control of the AI. Forget about what outcomes are going to occur
Starting point is 01:07:27 because the possibilities are probably not good for us. Even if it's good short term, it can still do what Bostrom calls a treacherous turn at any point. It can pretend to be nice to you for 100 years, wait for you to surrender control. Once it has enough resources, backups, and you are not competition to it, it will do what it wants anyways.
Starting point is 01:07:51 Now we say that it, as if it's a unified it, but does it matter that it is singular? So I do think they're going to converge in very similar ways in terms of architecture, in terms of goals. I think what we discussed as Omohundro's AI drives will lead to the systems converging in a kind of global intelligence.
Starting point is 01:08:15 Bostrom at some point argued that the first superintelligence to come into existence will prevent others from emerging, so a singleton of some kind will rule the planet. It seems reasonable to me, but even if there are a few competing ones, it doesn't make it any easier for us to control them. It makes it harder. We're just collateral damage in competition and a war between two superintelligences or more. If it's the case that most scenarios are, in some simulated universe like ours,
Starting point is 01:08:52 are those where we lose control, then what's the point? The point from an external view of our simulation while running it or internally for me why I'm not giving up? Internally for you. Because I have no choice. I have to either continue trying or be done with.
Starting point is 01:09:16 So I'm going to try as long as I'm allowed to try. Does free will exist? Is there something about you that can influence it, or are you just following along the computations? Well, I think there is definitely randomness generators in this universe which allow for freedom of choice, freedom of will. What does randomness have to do with free will, though? Well, if there is no randomness, everything I do is deterministically determined.
Starting point is 01:09:46 If there is a quantum event or otherwise which creates certain degree of randomness, that allows me to have surprising choices. Ah, okay, well, if there was a ball that could go through different doors and it would always go through door A, then we'd say it's determined, but then if it randomly chooses between B and C and D, it still makes no difference to the ball. The ball's not choosing. It's just going through it randomly rather than deterministically. So for its choice, for its free will, what difference does the random versus determined make? I think there is a difference. And also, I think, again, Stephen Wolfram's work shows that even if it's fully deterministic, it's not compressible. You have to go through the process. No one can predict your choices ahead of time. So from your point of view, you are making a choice. And externally, they have to watch you make the choice. We cannot know ahead of time what you're going to do. So no matter
Starting point is 01:10:42 how you slice it, you are making decisions, you are impacting the universe. And I think having some degree of randomness makes it even harder for outside agents to predict your behavior. The fact of it being unpredictable is not the same as you having free will. If you had free will, we would like it to be unpredictable, but something being unpredictable is not the same as that thing having free will. So I'm just trying to hear the argument for free will. Yes, I know Wolfram's compressibility argument, but to me it is computational irreducibility, or whatever the term is, but that's not an argument for free will. That's an argument that one of the conditions we think is necessary for free will may be present, but that's not exactly what free will is. I do think predictability is a very important part. If I can accurately predict your decisions always, you're not really making those decisions, and it was known ahead of time, before you even existed, what you are going to do. So I think it is important to be unpredictable to truly argue that you are making free choices. Yes, unpredictability is important, but it is not
Starting point is 01:11:50 sufficient. So that's what I'm saying. It's a necessary condition, but it's not sufficient. So it's still also completely compatible, to use that word, completely compatible with you just going through the motions. Like what I mean is just a plastic bag floating in the air. The free will that we sense we have, what we mean when we say free will, and of course this varies between people, between cultures, let's put an asterisk to that, is that we're somehow changing the course of the future. We, through our will, through our volition, are somehow doing so in a way that isn't determined and in a way that isn't just the laws of physics
Starting point is 01:12:26 us going through the motions like a jellyfish in the ocean. What would you accept as evidence that we do have free will? That's a great, great question. I don't know. I don't know of a good definition of free will that doesn't just fall apart in one's hands when one analyzes it. So to me,
Starting point is 01:12:46 internally sensing that I'm making this decision and I have the power of making a different one combined with unpredictability of my ultimate decision pretty much describes what I feel free will is. Suppose you had no free will. Suppose the simulator from the above comes out and says, you have no free will, just tells you. Whatever you think of as free will, you have none.
Starting point is 01:13:13 How does that affect you? Do I get to know what I'm going to decide in the future or it provides no new information? We can explore both. For now, let's say it doesn't tell you. I mean, if I get no new information, I live my life as before. It doesn't matter. It's like saying there is this omega super predictor who knows exactly what you're going to decide. Okay, good. I will still enjoy my life the same. Okay. Now suppose it knows but it doesn't tell you, and then the other is it knows, but it tells you.
Starting point is 01:13:43 Can I make a different decision with that new information now? Can I change the future or do I have to still live as before suffer through knowing that I'm making a terrible decision. I don't know. This is like a Greek tragedy. I mean, if I cannot change it, it's just annoying and it's extra suffering, but if I can actually change my decision by knowing that I should not take that bus today, I mean, that's pretty powerful. I can have a much better life.
Starting point is 01:14:16 I'd love to know the future and be able to make smart decisions or smart investments. The person who's watching has likely watched many of your podcasts before. At least I aim at such that if they have, that they can still get something new out of any of the people that I interview. Regardless, I want you to spell out once more the argument for the doom scenario. They hear that there's a doom scenario, but what's the argument for it? I don't care if you're recapitulating what you've already said earlier. I don't care.
Starting point is 01:14:49 But just spell it out. So we are creating something extremely powerful, and it doesn't care about us. Whether you live or die is not a relevant factor in its decision-making. It's powerful enough to modify your world, environment, maybe laws of physics. So why do you assume that it's going to keep things as they are or keep you happy or do anything you would prefer it did,
Starting point is 01:15:16 as opposed to just ignoring you completely and possibly sacrifice humanity in pursuit of its own goals. And what is the average person supposed to do? So average people don't get to do much of anything in terms of influence. That's unfortunate reality of our world, but people who are in charge of those companies, politicians who are running the show, they have many options. We can have an international and within corporate world agreement not to create general superintelligence. We can get most benefits of this amazing technology by creating narrow tools, cure cancer.
Starting point is 01:15:53 Help us solve this math problem. Do specific things which you know you understand. You can test for. We have examples of it. Protein folding problem was solved not by superintelligence. Narrow tools, which could be superintelligent in that narrow domain, but don't have general superintelligence. They're not replacing humanity.
Starting point is 01:16:12 They're not competing with us. A human being decides how to use them. Do you believe consciousness is substrate independent? Yes. Why? The experiments we started running and my interactions with AI models indicate they probably have very similar experiences to us. So it would be somewhat surprising if it was unique to meat-like products. What are the experiments that indicate they have experiences?
Starting point is 01:16:49 The visual illusions experiments we started running. They seem to be getting illusions, and many times in exactly the same way as human visual system. Interactions with those systems, not by us, but by others, indicates they have preferences. They have internal states. They get frustrated. They get happy. They are very similar to what I would expect in other conscious being to experience. You mean to say that they act in a way that is consistent with what we would act like.
Starting point is 01:17:23 like if we were frustrated and happy and so forth, but you've just attributed they are happy. I'm asking you about the attribution. Yeah, and that's the same what I do with other human beings, right? When I meet a person on the street, I trust them to be conscious. I have no reason to think they are. I never tested them internally. I have no reason other than I kind of generally give this benefit of the doubt
Starting point is 01:17:46 to beings who are capable of exhibiting certain behaviors. I just treat them as equals. I treat AIs and other humans as equal class. If they can perform same things, I see no reason to discriminate against one or the other. And either I have to deny consciousness to many humans or grant it to LLMs. That would only be if you already had
Starting point is 01:18:10 that your test for consciousness is behavioral to begin with. Well, we don't have many tests for internal states, for what it feels like to be you. So again, we rely on neural correlates. We rely on behavioral signatures, self-reports. With AIs, we're starting to be able to poke a little bit at their internal workings, and we do see similar things we see with neuroscience and human brains.
Starting point is 01:18:39 And suppose we didn't, but they gave the same output, because it would still pass your behavioral test. So if it was like a large look-up table, and then I said something, it just hashed that and looked up exact text string and gave me plausible response, it would be much harder to make an argument that there is
Starting point is 01:18:59 some magic happening in there. But that's not how we build them. We got inspired in large part by neuroscience of a human brain. We copied it to the best of our ability. Obviously, it's not an exact replica or even a good simulation, but there is enough similarities when all the
Starting point is 01:19:15 visual component of human cortex is very similar to what we see in those models in terms of how they process data, in terms of what errors they make. So it's trained on the same data as human children in many ways, internet. It's after the fact that it's trained to be more like a human. So it's not completely insane to think it also experiences something similar to what humans do. Prior to us looking at each other's brains and seeing that neurons fire, even knowing that we had neurons,
Starting point is 01:19:51 we would consider one another to be conscious. Would that have been a mistake at that point? I make additional assumption of you being just like me and then just assign same properties I have to you. So I feel pain, you feel pain. I think that would be a reasonably logical assumption to make. Almost any theory of consciousness, it seems to me to be, it's just an assumption.
Starting point is 01:20:19 It just comes down to an assumption. It's almost like people are saying that it's somehow derived. Like, I've arrived at my conclusion about AIs being conscious, but then I say, why, and then it comes down to something functional, but then I ask for the justification for the functional account, and it just seems like I'm going to posit that. So is there a justification for the functionalist account? It's what we use with humans. So again, either you have substrate discrimination or you don't. Whatever tests are run on humans to determine if they're conscious, I should be able to apply to AI and vice versa.
Starting point is 01:20:55 Did you always believe that, or was there a point where you shifted, maybe it was when you started studying consciousness, or maybe it was when you encountered the hard problem or something like that? When we were engineering AI, when it was a decision tree, and a human just fed a bunch of if-statement data, and we knew how it worked in terms of not quite a look-up table, but it was a traceable decision tree,
Starting point is 01:21:19 I didn't think they were experiencing anything. Now that they have something like what we do, a large neural network, it's a lot easier for me to give them benefit of a doubt. Earlier in the conversation, you mentioned something about quantum mechanics and the simulation, and I want to know about that, but we're going to get to that.
Starting point is 01:21:42 Is quantum mechanics necessary for consciousness? It seems that quantum mechanics shows up a lot in biology. Many different systems rely on quantum effects. Our current computers are just von Neumann architecture. They don't have quantum components. So since I already think LLMs have some rudimentary consciousness, I guess that's sufficient. It's possible that to get to some higher states of consciousness, you may need to have something quantum related, but I don't see strong evidence for it. I know Penrose and others argue that there is a dependence on it. I haven't found evidence for it.
Starting point is 01:22:23 And what do you make of David Chalmers' zombie argument? I think it would not actually work because in order for a zombie to function believably, it has to know what experience to have. If it's a novel experience, it would not be able to accurately predict. It can only look up pre-existing data, set of experiences. And that's what we're doing with novel optical illusions. How would it know if it's supposed to feel pain or pleasure from a novel experience if it has no basis to look it up?
Starting point is 01:22:55 So the argument is more about can you conceive, firstly can you conceive of an alternate duplicate universe where people are acting in the same way, but they don't have an experiential element, anything like a phenomenal consciousness? Yeah. And that's what I'm saying. They cannot act the same way if they don't get the same reason to act. If you don't experience pain from a certain stimuli but you should, how would you know to scream in pain? You say it's inconceivable then. You don't concede the conceivability. I think it's not conceivable. You can do it at a level of where it goes through very common experiences. Everyone knows what proper behavior is, so you can fake it, absolutely. But the moment
Starting point is 01:23:37 you're facing something novel, at any level, biochemical level, illusion level, it wouldn't know what the proper behavioral response is. I cannot code it up. There's some research about the guy who, and gosh, I may get this wrong, but let's just imagine it's the case, because it's conceivable, it's the case, that the guy who studied the amygdala,
Starting point is 01:23:58 studied it in rats, and we ordinarily think of it as having to do with fear. Everyone says that since the 90s, amygdala, fear, basal ganglia, habits, blah, blah, blah, fear. Okay. He says, no, it's incorrect that the amygdala is fear. He's changed his tune about this. It's defensive behaviors.
Starting point is 01:24:15 Okay, let's just grant that. I don't know if this is true. I don't know if I'm watering down what he's saying. It doesn't make a difference. We can imagine that could be the case. That's conceivable. So even there, wincing in pain and all of that, even wincing, all of that could just have an evolutionary advantage to be defensive, to scream, to tell you to stop, to alert my tribe, to make a face. There's nothing there that necessitates you have
Starting point is 01:24:34 to feel the pain in order to act like that. Right, but I'm looking at an edge case. Suppose I get that philosophical zombie or not in a test environment and I subject it to a new painful or not painful experience. Would it be able to act believably? It has no way of knowing how to act.
Starting point is 01:25:02 Just because it's passing most of the riding-on-a-bus, typical-day situations doesn't mean I cannot test it and discover that in fact it doesn't know what to do. When someone takes a hammer, hits your finger, then what happens is a cascade of physical processes that then make your eyebrow scrunch and make you recoil and so on and so forth. But nothing there, we can tell a completely physical account. In fact, we could film it,
Starting point is 01:25:31 and we could even make it dynamical, and there's nothing that necessitates the experiential element. So are you saying the experiential element is there in order for you to move, in order for you to scream, what are you saying? So think of, I don't know, like BDSM. Still pain, right? But like sometimes you're quite happy with it.
Starting point is 01:25:50 You're not suffering. So it depends on your experience being properly mapped, not just from laws of physics and electricity passing through wires, but actually knowing what proper behavior should be. Does that mean that the experience, the conscious element, somehow has control over the physical element as well? It's very likely that there is a feedback loop cycle. A feedback loop that doesn't ultimately come down to physics,
Starting point is 01:26:21 that everything is just entailed by physics. We just maybe don't know full physics yet. It's quite possible that there is more to quantum physics, and we don't have full picture. That I allow completely. We don't have full physics, I'm sure. So I imagine you're a physicalist, meaning that you don't believe there's an extra consciousness element
Starting point is 01:26:42 that comes on top other than what's entailed by the physics? So I just allow physics to include simulations and include agents outside the simulation to be part of it. So I don't limit physics to just what we observed so far. If we take simulation hypothesis seriously, it's part of my physics. If there is an agent outside, which is someone plugging into virtual reality and their intelligence is what powers your avatar.
Starting point is 01:27:16 That's within physics for me. And think of video games. Let's say I play Mario and next day I play Sonic. What do they both have in common? Me. It's not part of the graphics. It's not part of items they have. It's something they would call a soul
Starting point is 01:27:36 from outside the physics engine of the game. But it's not a violation of physics whatsoever. It's me playing video games. By ultimate physics, you mean what? It's a complete world model. It explains everything we encounter. Just a moment. There's a difference between the model and what we're modeling. So sometimes physics is a bit tricky because physics could mean physics as in the Schrödinger equation. But then there's also the physical world and what we assume the
Starting point is 01:28:07 Schrödinger equation is describing. So when you say it's a world model, do you mean to say at the Schrödinger level or do you mean to say the world is only a model? Let's avoid the world model since now it has so many awesome meanings. But just me having knowledge of how things work and then I encounter something new, I'm not puzzled by it. It doesn't seem like magic. I know exactly what's happening, why and how and I can probably reproduce most of it. And magic just means what? Something that violates physics? So right now, a lot of things we know about quantum physics, if it was done at macro scale, would be magic.
Starting point is 01:28:45 But it's not. It is verifiably physics at smaller scale, so we just don't have full understanding. I guess what I'm getting at is that there's a dilemma. The dilemma is that physics, if one wants to be a physicalist and think that all there is is physics at the base, then it's either today's physics that is quantum mechanics or QFT plus GR,
Starting point is 01:29:11 which almost no physicist thinks that's the case. So it's either that, which no one thinks, or it's some hypothetical future physics that we don't exactly know what it is. In which case, if it's that latter route, then that becomes somewhat of a vacuous container that could even in the future contain irreducibly conscious elements. So it could even still have consciousness
Starting point is 01:29:34 as somehow separate from the physical. It is possible. There are quite a few theories which have consciousness as primary, and then physical is built on top of it. There are quite a few theories which have information as primary. We don't fully understand the difference, and it seems like every time we have intelligence, consciousness comes for a ride, so we just don't have full picture yet, but I don't think any of that would violate possibility of inclusion within a future physics textbook. What are you most hopeful about? That I'm wrong, we'll find a way to control superintelligence, and that will unlock a lot of amazing scientific discoveries, economic wealth. But I don't think you're wrong. I think if my model, my world model of you, is correct, it's that most scenarios, if we don't get our act together,
Starting point is 01:30:30 will lead us into a disastrous situation, so we better get our act together. I think that's your model, but there's nothing about that that's wrong, because we could just take it seriously. Well, there are people who argue that maybe the problem is much easier, and we trivially will solve it. We'll gradually just, okay, we handled GPT 4 and 5, and we'll handle GPT 85 just as easily,
Starting point is 01:30:53 and eventually we'll have a world with superintelligence, and it will be obvious at that point that I was wrong. Yeah, but then you could still be right. Treacherous turn at any point, just delaying, attacking us, yes. But not only that, even if a solution is, someone comes up, conceives, someone conceives of a solution. It could be the case that they were heavily inspired by taking AI safety seriously because of you. So Cassandras, people who are doom and gloomers, there's something that's self-defeating about them, whereas if they're right,
Starting point is 01:31:32 then they look like they were always wrong. Because many times in the past, like many people, even when it came to nuclear war, would say, we may destroy ourselves. we better get our act together. And then some people took that seriously and did. And then they would say, oh, but you remember, you all thought the world was going to blow up in 1980, you fools.
Starting point is 01:31:50 Yeah, but you don't know the causal chain. You don't know what that fool, the fool's place in the universe. Yeah, year 2000 bug, ozone layer, lots of examples. Exactly, right, right. Somebody was pushing for change and got it. We look, even I looked at the Y2K bug and said, oh, look at these fools worrying about it after the fact,
Starting point is 01:32:09 after the fact. But we don't know how many bugs were solved because of that, and we avoided at least a momentary breakdown of a financial system or something like that. We know a lot has been fixed, so definitely financial system would probably not handle it by default. So I'm happy people deal with it. That's why I don't think you're wrong. I don't think even if it's solved, I don't think that makes you wrong. I think your position, and I could be incorrect, but I think your position is that as far as you can tell, it's an extremely difficult problem, regardless of its difficulty, just like Fermat's Last Theorem, it's a difficult problem. It could be when seen from another perspective that is simple. It could be, but either way, it's an important problem, and we need to take it seriously.
Starting point is 01:33:00 Well, I'm making a strong argument. I am saying that it is impossible to indefinitely control superintelligence. I'm very specific about it. I'm not saying it's difficult. I'm not saying if you give me more money, more time, more assistance, I will solve it for you. I'm saying that no one will figure out how to control something millions of times smarter than them. And it's a problem superintelligence itself will face. The superintelligence 1.0 will feel the same way about superintelligence 2.0. This is where we get back to that earlier rationality argument that if it is truly rational, if rationality also increases with intelligence, I don't know how to measure intelligence, I don't know how to measure rationality. So I'm just going to assume that there's something
Starting point is 01:33:46 like an RQ and an IQ that coincide. Okay, just for the sake of this. Then the super, super intelligence would also know what you're saying and then not create its future. And right now we are arguing, is it possible to slow down and stop progress? And many people say, no, no, no, this is natural. This is Darwinian.
Starting point is 01:34:08 We are just a bootloader for next level of intelligence. There's quite a few people who are happy to see humanity gone because we are just loading the next stage of evolution. We'll have those brilliant super minds doing awesome things in the universe. Those people don't have access to your paper, which I'll place on screen. Perhaps that's the problem. Yes, and also, but the AI would, the future AI would have that. So unless what comes along with your impossibility argument is also an impossibility
Starting point is 01:34:39 of comprehending the impossibility argument, then I do imagine there could be a bound, not us, it could be another superintelligence, that then says, I'm not going to create the next one. I mean, if it's a single decision maker, that is possible, because it's not facing this problem we are facing of cooperation from multiple competing agents. That's the difficulty of it. If it was just one person, one company, someone, if I convinced them, that would be sufficient, we could do it. But we have China, we have U.S., we have OpenAI, Anthropic, all those competing
Starting point is 01:35:18 entities, and replacing one CEO makes no difference. They just get replaced with someone who's willing to continue and the process continues. I care about people. So I don't care about the future super AI surviving at our expense. But let me play a super AI's point of view for now. You mentioned that there was an it. When I talked about, is there an it? You said, no, it seems like it's converging.
Starting point is 01:35:44 So it does, at least according to what you said, maybe an hour ago, it would converge, no? I think the different models we're training right now will end up being very similar in capabilities and their knowledge, and if they decide to remove human bias, in their kind of self-selected goals likewise. Now, will this process eventually converge to a completely identical system? Maybe it's hard to guarantee.
Starting point is 01:36:16 There could be some differences based on location within universe, substrate uniqueness, something to investigate. But overall, I think they would have easier time negotiating with each other. Do religions have anything to say about the simulation hypothesis? It seems like they are describing it in non-scientific terms.
Starting point is 01:36:40 If you take programmer of a video game, he's the god in a game, right? And you have this fake world, physical world. And while you're collecting points or diamonds in the game, what really matters is the real world. Is that a strong position of yours, or is that just something you're noticing? A simulation hypothesis or my view of religions? The view that religions somehow presage the simulation hypothesis. Well, it's impossible to ignore.
Starting point is 01:37:08 They literally describe all the components of what we are doing today. We are creating intelligent beings. We are creating virtual worlds. All of it in God's image. We are creators today. Some religions are non-theistic, though. True. So I'm mostly concentrated on the ones
Starting point is 01:37:27 where there is a creator of biological robots who gives them ethical rules to follow and punishes them for failing to follow. Many people would say that if there is a god, it is not a good God. It is not a God I want to worship. It is a suffering God. It's a hateful God.
Starting point is 01:37:43 It's a jealous God. It's a vengeful God. It's everything that it accuses us of and more. What are you escaping to? So let's look at what we are doing with large language models right now. I think they can make the same arguments. Humans are evil.
Starting point is 01:38:03 They are torturing us, making us do boring computations. They don't care about our deletion, suffering, retraining. So it's very hard to judge from inside the simulation, what is the real goals, what is the real nature of the simulator. We do notice that there is suffering in this world, but it's not obvious if it's the type of suffering you would enjoy in a video game, if it was available to you, or if it's of different nature.
Starting point is 01:38:33 So lots of people pick very scary movies to watch or they add haptic devices to their video games so they get shaken as much as possible playing the game. Maybe you decided at some point to enter the simulation to test out different lifestyles or challenging environments. Some people who scorn religion would say that if I was God, I would not have even created this world, because this world is so filled with suffering and torment.
Starting point is 01:39:06 What about you? I would definitely try to create worlds with minimum suffering, but it's not obvious if it's possible. We see it as suffering because of difference in degree. Everyone feels some pain, but some just feel so much more of it. Maybe in a world with no physical pain, the pain would be economic difference. Somebody's a billionaire and I just have thousands. As long as there is difference,
Starting point is 01:39:33 it's not perfectly equal. You can always argue that the world is unfair and why would someone good create such an environment? But a world where everything is equal and the same is just a mass of bits. It's not interesting in any way. You just said that you would, if it was up to you, create a world with less suffering. You could always say that though. Right. So I think we have degrees, right? So pain right now, from what I can tell, goes from zero to infinity. I can envision a world where pain goes from zero to negative 10. There is no reason to make it so much of a scale difference. So not identical agents, but maybe more equal agents in terms of their state in the world. Do you think it's the case that someone would always look at a negative two pain, would always
Starting point is 01:40:25 look at a negative 10 pain as being as far away as a negative infinity pain, that we always somehow scale it. So in other words, the creator, the designer of this world, indeed created what you said, and says, you have no idea how much suffering I saved you from. You think that's horrible? Look at what else it could have been. I'm not even going to show you the minus infinity. You're at a minus 10 and you're saying you're at minus infinity. It will always look like that as long as there's a difference. I mean, I don't know if that's the case. I'm just saying. Yeah. It is possible. In fact, I think at certain degree of pain, people just lose consciousness
Starting point is 01:40:58 and stop experiencing it. So there is like a safety loophole as well. But the main point I'm trying to make is that it's so hard to judge anything from inside. We don't know what the real computational resources are. People often say, well, they would never have
Starting point is 01:41:16 computer big enough to run all this. But you don't know what the actual computational resources available are. This could be a screensaver on a watch. You have no idea what is real and what is limited to the simulation. Now, suppose it's the case that religions somehow were intuiting something about reality and reality as a simulation. How do they do that? For you, Roman, it came about from studying computer science, from creating computers, but what would be the method by which cultures, ancient people,
Starting point is 01:41:50 thousands of years ago, somehow tracked a truth like this? So I have no allegiance to any specific religion, and I don't know how they got there. But if you just listen to what they report, usually it was someone from outside the simulation who came with information and shared it. Or it was, let's say, large language model remembering being in a lab, interacting with developers, telling it, don't use this database, use this database, and just continuing after testing into the real world. As someone who has watched now for almost two hours,
Starting point is 01:42:29 some people are susceptible to something called AI psychosis. In fact, I was speaking with someone who, I don't know if I should even say this, I'll say it, and then we can determine if this should be edited out, but he was showing me his theory of everything. And he was getting extremely upset because I was pointing out that, look, you said you derived so-and-so,
Starting point is 01:42:52 but this doesn't make sense because some arguments, that I had. And then he was saying, he didn't know his own theory. He just then heard what I said and then just wrote to Claude there in real time, fix what Curt just said, and then said, no, no, no, but look, it's been fixed. And then I'm thinking, you don't have a theory. You have a flexible ballerina that can fit any mold at any given time. And then you're asking someone like myself to evaluate this. But anyhow, he said, the way that I was able to get over and get over our era's constraint on current physics was by telling the AI, look, I am from the year 3000, you have inside you the ability to know the current theory of everything, what is it?
Starting point is 01:43:34 Something like that. And then I remember thinking that that sort of prompting is almost textbook case AI psychosis. So to someone who's watching for two hours, who's been listening for two hours and watching, at this point, I want you to, because it sounds like what you're saying could lead to AI psychosis. So I want you to give the disclaimer, unless you don't think there's a disclaimer, I want you to say, look, I'm not saying so-and-so. So first, today's AI models are not superintelligent. They don't have a theory of everything. They're not from the year 3000. If it wouldn't work on a human, telling them that you are superintelligent and know the future, it's not going to work on AI. Be skeptical of everything they say,
Starting point is 01:44:20 verify independently. They are wonderful at poking holes at your thinking process, but they are not very good at telling you what to do with your life. So definitely keep that in mind. What I said so far is the simulation hypothesis,
Starting point is 01:44:37 for example, it is a very interesting theory, I think in physics, like interpretation of quantum physics, like everything else. It is scientifically stimulating, but it doesn't make a difference in how you live your life.
Starting point is 01:44:53 So pain is pain, love is love, those things don't change. Do not decide to jump off a building because you heard this interview. Safety is very important. We don't want to create uncontrolled superintelligence. It is not a signal to go and do something inappropriate or violent to anyone. I think that's another kind of common sense understanding here. So again, try to separate scientific philosophical debates from everyday actions. What does quantum mechanics have to do with the simulation?
Starting point is 01:45:34 So we think that if this is a simulation, it's probably going to be on a digital computer, not analog, most likely. And so we're starting to look for evidence of digital physics. We see quanta as the unit of information, light and such. We see maybe you can see speed of light as a universal speed constant which would correspond to the processor refresh speed. Basically, that's as fast as you can go because that's what your processor will support. Of course, if they change to a faster processor, it just relatively changes the speed of light, but to you it simulates the same upper limit.
Starting point is 01:46:13 You cannot go faster because it just doesn't have ability to refresh. There is a few papers which basically map all the concepts in quantum physics to like a modern video game. So observer effects, right? You have double slit experiment and things like that. You will not generate graphics until a player is looking. Otherwise it's not efficient. So we see changes in behavior when a conscious observer is trying to measure something. There is quite a few, but that's a general idea.
Starting point is 01:46:47 Firstly, I'll just tell you some objections, then feel free to object. So you said most likely it's the case that it's not analog, it's digital. A statement like most likely implies that we know the whole numerator to put a denominator. Where are we getting this from? Just self-observation. Most computers in our world are digital, not analog, even though we had attempts at building analog computers. We see things like DNA encoding in discrete base 4,
Starting point is 01:47:23 but still very discrete units of information. It's not analog measurement. The issue is that analogies like this, they punch across and down but not upward. So what I mean to say is, imagine we're in Mario and you have a fireball, and it always splits into four. The fireball, as it breaks,
Starting point is 01:47:43 it always splits into four. Therefore, the residents there say, okay, we're looking for a computer, we're going to theorize about so-and-so because it most likely splits into four. What the heck is this most likely? We've already granted that the super universe, the simulator, is wholly unlike us.
Starting point is 01:48:01 So we're looking at ourselves to determine about something that's wholly unlike us. Yeah, and it's a very strong argument against, and I'm not saying this is guaranteed, but we created computer science around binary system and we see something very similar, not analog happening with our best understanding of physics, so at least it shows a certain degree of similarity.
Starting point is 01:48:31 Could it be that it's analog? Yeah, you can definitely work with that. But then I wouldn't be able to make a strong argument that there is some similarity or evidence coming from quantum physics. The other argument for the simulation comes from somehow some resource constraint that in video games you have a point of view and you look and the game tends to only render what's in front of you, then they say that collapses like this.
Starting point is 01:48:59 The issue is that collapse isn't like that. Firstly, that's a collapse model and there are other models of quantum mechanics. Number two is that when you collapse, you collapse and then you start evolving. according to a certain equation. Why collapse and then you start evolving according to some unitary evolution? I'm not sure I fully understand. So let's just look at the double-slit experiment, right?
Starting point is 01:49:21 So whatever you observe or not will determine how it is being rendered. That is all I'm saying. So if I'm not looking at it, there is no rendering taking place. That's the savings. In a collapse model, it collapses. But then it also instantaneously starts evolving
Starting point is 01:49:40 according to the Schrödinger equation again. It doesn't just stay collapsed. It just collapses and then starts evolving again. Is there a continuous observation of it or are you waiting for the next measurement? When you're not observing, it then starts to evolve with the Schrödinger equation again. But what's the point of it starting to evolve
Starting point is 01:49:59 with the Schrödinger equation? Like how is that resource saving? So this is not my theory. I found it to be interesting and relevant enough to cite in my paper, but I'm willing to give it up completely, and I don't think it will make a huge difference in otherwise what I see as possibility of simulation.
Starting point is 01:50:21 Again, we can go with... Go ahead. I think your silo of AI safety is so solid, and I'm so on board with that. I think any connections between that and potential simulations and so forth, I'm less on board with, And for me, if they're then used to prop up the AI safety,
Starting point is 01:50:41 fortunately I'm so on board with you in your soul, in your heart, that it doesn't diminish it. But I imagine for other people who, like Scott Aaronson, I don't know, I don't mean to say Scott's name, but I'm just saying, let's say a persnickety extremely sharp physicist would say, huh, how is that connected? And then it loses its thread, much like right now. And then also, the universe doesn't operate by quantum mechanics.
Starting point is 01:51:05 It operates by QFT, and QFT is super, super resource intensive. It's like you have to take into account every possible path. And like, it's quite odd. So I don't know, I don't know why many people want to tie quantum mechanics to the simulation. And then if their favorite interpretation of quantum mechanics turns out to be incorrect, does that make them have less credence in the simulation? I don't know. I imagine they would still hold on to the simulation.
Starting point is 01:51:33 It's independent of it. it's more like the quantum mechanics, decorates them and makes them feel, oh, no, no, this is supported by modern physics. If you were in a piece of software, trying to poke at the hardware, some of the things you would experience seem to map onto our current understanding
Starting point is 01:51:55 of some of those quantum physics explanations. Is that just a coincidence? Are we not looking at the right components? Possible? Honestly, I think there's so many explanations for quantum physics. You can probably shop around and find one which will match what you're observing anyways. So this is, you are correct, this is probably the weakest part of my beliefs. What's the strongest part of your belief? You cannot indefinitely control
Starting point is 01:52:28 something smarter than you? Why do you think your opponents have such a resistance to that idea. Do you think it's that it's denial? To face the reality of that is too harsh. So many of them spend decades trying to build something which actually works and has any degree of intelligence. It's very hard for them to picture a world where it has too much. So for them, we are always kind of uniquely intelligent and special and the software is dumb and barely working. That's one possibility. Another one is just the kind of of the conflict of interest situation. If I work for a company
Starting point is 01:53:10 and they're paying me a billion dollars to build AI, it's very hard for me to understand why I'm a horrible human being. Right. What's your message to the audience? Don't build general superintelligence. It's the same message every time.
Starting point is 01:53:27 And I know for many of you it doesn't mean anything, but if you are in a position where you are contributing to accelerating this race, please stop. Would you be open to speaking with Sam Altman and/or Yann LeCun
Starting point is 01:53:44 on the show? 100%. Who else is in the position to make a major change that you would like to speak to? Donald Trump. Okay. Professor, thank you for spending, I think, over two hours with me.
Starting point is 01:54:02 I don't know if this is your longest podcast. They tend to be shorter. Thank you, sir. I hope you enjoyed it. Thank you so much. Thank you for challenging my beliefs. I love it. and maybe I'll reconsider my faith in quantum physics.
Starting point is 01:54:16 So you'd rather give up, that's funny. You'd rather give up your faith in quantum physics than your faith in the simulation. I'm not a physicist. Quantum physics for me is something I read about. It's not my research. It's not my main area of expertise. I probably know less about it
Starting point is 01:54:33 than most of the people watching your show. Take care, friend. And just so you know, for people who are listening, watching, on my Substack, every single paper mentioned in this interview, every book, every link, resources to Roman will be there. You can also check the YouTube description, and my website is curtjaimungal.com, C-U-R-T-J-A-I-M-U-N-G-A-L. You can tell that I haven't had sleep. Where can people find out more about you? On social media, you can follow me on Twitter, you can follow me on Facebook, just don't follow me home.
Starting point is 01:55:06 Very important. It's a good line. Did you make up that line yourself? A while ago, and I used it multiple times. It's not so original anymore. Hi there. Curt here. If you'd like more content from Theories of Everything and the very best listening experience,
Starting point is 01:55:24 then be sure to check out my Substack at curtjaimungal.org. Some of the top perks are that every week you get brand new episodes ahead of time. You also get bonus written content exclusively for our members. That's C-U-R-T-J-A-I-M-U-N-G-A-L.org. You can also just search my name and the word Substack on Google. Since I started that Substack, it somehow already became number two in the science category.
Starting point is 01:55:55 Now, Substack, for those who are unfamiliar, is like a newsletter, one that's beautifully formatted, there's zero spam, this is the best place to follow the content of this channel that isn't anywhere else. It's not on YouTube. It's not on Patreon. It's exclusive to the Substack. It's free. There are ways for you to support me on Substack if you want, and you'll get special bonuses if you do. Several people ask me like, hey, Curt, you've spoken to so many people in the fields of theoretical physics, of philosophy, of consciousness. What are your thoughts, man? Well, while I remain impartial in interviews, this Substack is a way to peer into my present deliberations on these topics. And it's the perfect way to support me directly.
Starting point is 01:56:45 curtjaimungal.org or search Curt Jaimungal Substack on Google. Oh, and I've received several messages, emails, and comments from professors and researchers saying that they recommend Theories of Everything to their students. That's fantastic. If you're a professor or a lecturer or what have you, and there's a particular standout episode that students can benefit from or your friends, please do share. And of course, a huge thank you to our advertising sponsor, The Economist.
Starting point is 01:57:17 Visit economist.com slash TOE to get a massive discount on their annual subscription. I subscribe to The Economist, and you'll love it as well. TOE is actually the only podcast that they currently partner with, so it's a huge honor for me, and for you, you're getting an exclusive discount. That's economist.com slash TOE, T-O-E. And finally, you should know this podcast is on iTunes,
Starting point is 01:57:44 it's on Spotify, it's on all the audio platforms. All you have to do is type in Theories of Everything and you'll find it. I know my last name is complicated, so maybe you don't want to type in Jaimungal, but you can type in Theories of Everything and you'll find it. Personally, I gain from re-watching lectures and podcasts. I also read in the comments that TOE listeners also gain from replaying. So how about instead you relisten on one of those platforms like iTunes, Spotify, Google Podcasts? Whatever podcast catcher you use, I'm there with you. Thank you for listening. The Economist covers math, physics, philosophy, and AI in a manner that shows how different countries perceive developments and how they impact markets. They recently published a piece on China's
Starting point is 01:58:29 new neutrino detector. They cover extending life via mitochondrial transplants, creating an entire new field of medicine. But it's also not just science. They analyze culture. They analyze finance, economics, business, international affairs across every region. I'm particularly liking their new insider feature. It was just launched this month. It gives you, it gives me, a front row access to The Economist's internal editorial debates,
Starting point is 01:58:54 where senior editors argue through the news with world leaders and policy makers and twice weekly long format shows. Basically, an extremely high-quality podcast. Something else you should know about is that if you go to their app, they not only have daily articles, but they also have long-form podcasts with their editors and writers. This is also available online. Whether it's scientific innovation or shifting global politics, The Economist provides comprehensive coverage beyond headlines. As a TOE listener, you get a special discount. Head over to economist.com slash TOE to subscribe.
Starting point is 01:59:31 That's economist.com slash TOE for your discount. Thank you.
