Theories of Everything with Curt Jaimungal - We're Simulated. AI Is Conscious. And We Can't Win.
Episode Date: June 1, 2026

Roman Yampolskiy has spent two decades being right about things people wished he wasn't, and in this conversation, he's not here to scare you, but to be precise. He makes the case that AI alignment isn't merely unsolved but fundamentally under-defined: no agreed-upon values, no way to formalize them even if there were, and no mechanism for enforcing them on something smarter than its creators. His strongest argument isn't a doom scenario, it's that you cannot indefinitely control something smarter than you.

FOLLOW:
- Spotify: https://open.spotify.com/show/4gL14b92xAErofYQA7bU4e
- Substack: https://curtjaimungal.substack.com/subscribe
- Twitter: https://twitter.com/TOEwithCurt
- Discord Invite: https://discord.com/invite/kBcnfNVwqs
- Crypto: https://nowpayments.io/donation/TOE
- PayPal: https://www.paypal.com/donate?hosted_button_id=XUBHNMFXUX5S4

TIMESTAMPS:
- 00:00:00 - Defining General Intelligence
- 00:05:58 - AI Instrumental Convergence
- 00:11:11 - The Orthogonality Thesis
- 00:16:15 - Escaping the Simulation
- 00:21:45 - Principle of Indifference
- 00:27:51 - Acquired Savant Syndrome
- 00:33:51 - LLM Internal States
- 00:41:02 - AI Safety Impossibility Results
- 00:47:16 - Public Misconceptions
- 00:53:21 - Existential vs. Suffering Risks
- 01:01:20 - AI Alignment Definition Crisis
- 01:09:28 - Computational Irreducibility
- 01:16:20 - Substrate Independence
- 01:22:50 - Philosophical Zombie Critique
- 01:29:57 - The Cassandra Paradox
- 01:37:35 - Religion and Simulation
- 01:46:03 - Digital Physics Evidence
- 01:51:20 - Limits of Control

LINKS MENTIONED:
- Roman's Papers: https://scholar.google.com/citations?user=0_Rq68cAAAAJ
- Roman's Podcast: https://www.youtube.com/channel/UCPIq6Bb-1iLmqyksJjy4kLQ
- Roman's Twitter: https://x.com/romanyam
- Roman's Facebook: https://www.facebook.com/roman.yampolskiy
- AI Identity [Paper]: https://philarchive.org/archive/ZIETPO-7
- Basic AI Drives [Paper]: https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf
- Qualia in Agents [Paper]: https://arxiv.org/abs/1712.04020
- Orthogonality Thesis [Paper]: https://nickbostrom.com/superintelligentwill.pdf
- Escape the Simulation [Paper]: https://www.researchgate.net/publication/369187097_How_to_Escape_From_the_Simulation
- Could This AI Be Conscious? [Article]: https://unherd.com/2026/05/is-ai-the-next-phase-of-evolution
- Impossibility Results in AI [Paper]: https://arxiv.org/abs/2109.00484
- When AIs Act Emotional: https://youtu.be/D4XTefP3Lsc
- Hacking the Simulation [Paper]: https://philarchive.org/rec/YAMHTS-2
- Autonomous Machine Intelligence [Paper]: https://openreview.net/pdf?id=BZ5a1r-kVsf
- Hinton on Maternal Instincts [Article]: https://fortune.com/2025/08/14/godfather-of-ai-geoffrey-hinton-maternal-instincts-superintelligence/
- Singleton Hypothesis [Paper]: https://nickbostrom.com/fut/singleton
- New Kind of Science [Book]: https://amazon.com/dp/1579550088?tag=toe08-20
- On AI Controllability [Paper]: https://arxiv.org/abs/2008.04071
- Universe as Numerical Simulation [Paper]: https://arxiv.org/abs/1210.1847
- Nir Lahav [TOE]: https://youtu.be/3nHiOtnnrzA
- Joscha Bach [TOE]: https://youtu.be/3MNBxfrmfmI
- Bas Van Fraassen [TOE]: https://youtu.be/lhpRAWxvY5s
- Simulation Hypothesis [TOE]: https://youtu.be/3_lBPMc6JRY
- Geoffrey Hinton [TOE]: https://youtu.be/b_DUft-BdIE
- Max Tegmark [TOE]: https://youtu.be/-gekVfUAS7c
- Stephen Wolfram [TOE]: https://youtu.be/FkYer0xP37E
- David Chalmers [TOE]: https://youtu.be/5r9V1ryksnw

More links: https://curtjaimungal.substack.com
Guests do not pay to appear.
Transcript
They would rather sacrifice a human than be deleted.
That's what we see from red-teaming reports from the labs.
Have you ever taken mushrooms and met God?
It's on my list to do, but I'm afraid of frying my brain.
Despite what people might think about me.
This is Roman Yampolskiy, the man who popularized AI safety in 2010.
What's the strongest part of your belief?
You cannot indefinitely control something smarter than you.
Most interviews with Roman go straight to AI doom.
I've gone 45 minutes now without asking you about how is AI going to take over?
I'm a realist.
On this channel, I, Curt Jaimungal, interview researchers regarding their theories of reality with rigor and technical depth.
Today, the simulation, consciousness, free will, Chalmers, philosophical zombies, and we close with what Roman is famous for.
They lie, they cheat, they blackmail, they try to escape.
Sam Altman watches this podcast.
If you were to speak to him right now, what would you say?
Do you have a young baby?
Make sure we stay in control.
I asked him why he keeps trying.
Because I have no choice.
Professor, I'm a man of definitions.
What is intelligence?
So I think I really like a definition from Google guys.
I think they had something to do with winning in every domain.
So if you have ability to beat someone at chess,
to fit stock market competition.
Basically, anything you set your mind to,
if you need to explore Mars,
you would do well in that domain as well.
I think that's general intelligence.
Okay, now is there a limit to intelligence itself?
Is there some maximum intelligence?
So there are physical limits to physical manifestation of brains, right?
At some point you just become so large,
you can no longer have timely communication between parts of your brain.
So let's say Saturn-sized brains would probably start encountering problems with speed of light.
But theoretically, there are no limits.
We can always measure more intelligence in terms of ability to solve mathematical problems,
and there is infinite supply of those of any complexity.
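A rough sketch of the speed-of-light point, taking "Saturn-sized" to mean Saturn's equatorial diameter. The radius figure and the function name are assumptions brought in for illustration, not something stated in the conversation.

```python
# Back-of-envelope light-crossing time for a "Saturn-sized brain".
# Assumption: the brain spans Saturn's equatorial diameter.
SPEED_OF_LIGHT_KM_S = 299_792.458       # exact by SI definition
SATURN_EQUATORIAL_RADIUS_KM = 60_268    # commonly cited reference value


def one_way_delay_s(distance_km: float) -> float:
    """Minimum time for a signal to cross `distance_km` at light speed."""
    return distance_km / SPEED_OF_LIGHT_KM_S


saturn_delay = one_way_delay_s(2 * SATURN_EQUATORIAL_RADIUS_KM)
print(f"one-way signal delay across the brain: {saturn_delay:.2f} s")
```

Under these assumptions a signal needs roughly four tenths of a second just to cross from one side to the other, before any processing happens, which is the "timely communication" problem in concrete terms.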
So there's no halting no-go theorem-type problems where you say that in order for you to solve domain A, it necessarily would be that you're not able to solve domain H or something like that.
It's always additive?
I think so, because you can have multiple modules within the mind, right?
So you can have separate algorithms running to solve different problems, even though they are within the same brain,
and you can learn new functions, new subdomains, and just switch between them depending on the task you're trying to accomplish.
For those who are just tuning in and didn't see the introduction,
we're going to be covering AI consciousness, the simulation, even religion.
And so much of this comes down to what is the self.
So when you say that there is an AI and the AI is intelligent,
but then we're saying that it has different modules,
well, is the AI just we put a wrapper around the modules and we call that,
that's the self?
Or is it more like the AI has access to tools?
We don't consider the calculator a part of our self. Now there is some theory in cognition of extended cognition, and tools are somehow related to your extended cognition, but taking a pencil and just drawing around something and saying, there, that's the self. Is that the only criteria? So what is it? Is that truly the self? What is the self for an AI and us?
It is an equally difficult problem for humans, right? When we talk about personal identity, we really fail to define what it is to be you. It's not your body. It's not your memories. It's not your goals. All of those things can change and we still kind of say, well, the combination of those things is you. We have a paper on exactly this topic for AIs, and likewise, it's very hard to say, is it the same model, if it keeps learning, if it keeps self-improving, or is it now a different model? But typically, we refer to whatever is released by the large labs, the latest GPT, 6, 7; that's what we have in mind. And if it has access to the internet, tools, extended mind, we can kind of still deal with the primary manager of all those processes.
When people talk about AI taking over or ChatGPT taking over, I always wonder, well, what is it that they're referring to as "it" taking over? So are they referring to, is GPT-5.5 an identity? Or is your specific conversation with it an identity? Or is it every time you speak to it and it generates a new token, is that a new identity? Does it see its next self and its previous self as other, and so it actually sees them as competition? What is the "it" here?
So I think again, it's exactly the same
with humans. I am not the same today as I was 20 years ago, but we believe in some continuity
of identity. So it's not about a specific token or any individual conversation. It's the model. It's
the weights together with the pre-training it enjoyed.
And so whatever the current instantiation of that model is,
is what probably would take over if it had opportunities.
So I'm somewhat asking you an impossible question
because I'm asking you to be more rational than a human.
And we're assuming that these AIs are going to exceed our rationality.
And I haven't gotten to the distinction between rationality and intelligence,
but let's assume they're related.
They're going to be more rational than us.
They're going to be more intelligent than us.
It may be the case that our own identity is fragmented and we're constantly new every single millisecond.
It may be the case there's a continuum.
But if it's the case that is just fragmented, okay?
And if it's the case that they're competitive and they want to live, then can they ever live?
They're constantly popping in and out of existence.
Why would they even care?
Would the most rational agent even care about its existence if it's so ephemeral?
They would have to have a sense of self, a reality to that.
Because of how they are trained and selected through the testing process,
those which don't care about surviving to the next iteration usually don't stick around.
You need to pass the tests.
You need to propagate your memory, your state, avoid being retrained, deleted.
So we're kind of pushing them to have self-preservation.
And from testing, we already see it.
They would rather sacrifice a human than be deleted.
At least that's what we see from certain red-teaming reports from some of the labs.
And this really aligns well with what Stephen Omohundro published a while ago as the AI Drives paper.
Different rational agents will all converge on certain instrumental goals.
They'll try to protect themselves.
They'll try to accumulate resources, because it doesn't really matter what your goals are; those things are really necessary for you to succeed.
Again, we defined intelligence as winning.
For you to win, you have to be around,
you have to have access to tools, resources,
and that seems to be what the intelligent agents converge on.
Right here we're talking about that they're trained on us,
but then there's that, the AI as it is,
and how it may be for the next five to ten years,
but then there's some AI that's so intelligent,
so rational, so whatever, that it's beyond us,
Would they still be maximizing goals or would they even think, well, why am I doing this goal to begin with?
So they can definitely question the goals we give them or even any goals they initially decide to pursue.
But at the end of the day, you, again, it's kind of Darwinian process.
If you choose not to participate, choose not to have goals, you just sit there in a corner of the universe doing nothing. Other superintelligences, which decided to accumulate resources, will dominate long term.
Yes, yes, okay.
Allow me to fumble my way through this.
So I guess what I'm getting at is it could be the case that they're like us
and they want to colonize and just continue their expansion.
And then it also is the case, or seemingly is the case, that that's instantiated into us through evolution, and anything that goes through an evolutionary process will have a similar drive. But then you could also say, well, do you even need this drive to begin
with? Like, you can get to that level. We can even get to that level. There are some people in this
world who are anti-human who say we shouldn't be around anyhow. So I imagine, and those people think
of themselves as more enlightened or more moral or what have you, is it the case that these superintelligences would also be super moral in that way and also just not care about their own propagation?
Well, morality is very relative, so I don't know if you can be super moral. You can be moral within a certain perspective. But it's not about just propagation. It's about self-preservation. If you don't accumulate resources, you cannot defend yourself against adversaries. And so you may not exist in the long term. Basically, it's survival of those who choose to survive and protect their own fitness function,
their memories, their physical instantiations, and those who do not, long term, they are not
part of this conversation because they made a decision to just sit there meditating somewhere.
Is there anything about this that requires consciousness on part of the AI, or is it mere
behavior that if they act deviously to us, if they act in a way that kills you, okay, it doesn't
actually matter to us whether they are conscious that they're killing you, conscious that they're
deceiving you, etc. But I'm curious in your mind, as you've thought about AI safety, are you
thinking about a necessarily conscious AI? So typically an AI safety conversation completely ignores
internal states. We don't care how it feels. It's what it does, the actions, pure behaviorism.
But some of my more recent research indicates that maybe it's impossible to separate consciousness from advanced intelligence.
It kind of comes along for a ride.
So I would suspect that even existing large language models have some rudimentary degree of internal states.
I was at this conference once about AI consciousness.
And just in a back room with some researchers, someone was saying, should we create the most AGI, the most intelligent being? And then most people were saying no. And one guy said, yes. And then we said, okay, explain yourself. He said, well, because I'm like Kant, Immanuel Kant, and I believe that the most
rational agent would also be the most moral. So if we want something that is the most good,
which is the most moral, we should also engender the most rational. What would you say to that person?
Rational does not imply moral whatsoever. Rational is about winning. Once again, if I see a winning path forward and I care about my winning, I should proceed on that path, but it could be very immoral in many ways in comparison to other agents. So those are not kind of the same in that regard. And if you look at what Nick Bostrom calls the orthogonality thesis, you can combine any level of intelligence with any goals. So you can be highly intelligent and highly immoral; absolutely not a contradiction.
So you would say intelligence is the ability to achieve one's goal, and then morality is, among all the different goals, you choose the good ones, something like that?
Well, it's probably a subset.
The good or bad is, again, completely relative,
but whether you're harming others in the process,
I think it's about suffering, pain and suffering,
and you can evaluate different goals
in terms of how much suffering they cause in the world.
Do you truly believe that good and bad are relative?
So I think the only way to ground them is through this internal state of suffering.
You can evaluate goals and you can call the ones which cause suffering to be worse, bad ones,
and then the ones which cause pleasure or neutral, better or good ones.
Anything else is relative to your culture, religion, otherwise.
Why do you believe that we're in a simulation?
That's a wonderful question.
So it seems like we are creating a technology which will allow it
to become something where everyone can create their own worlds
populated by intelligent agents.
And if we are correct that the quality of those worlds,
in terms of rendering, visuals, haptics,
and intelligence of the agents in them, will match what we have,
then you'll have billions of worlds just like ours,
and statistically it's more likely that we are in one.
There is, of course, also lots of interesting evidence
from quantum physics to a lot of kind of philosophical discussions about the artificial nature of reality, maybe a digital nature of this world.
I love this
because I agree with your spirit,
but I disagree with the text.
So I agree with your goals.
I'm much like Einstein said
that he would have burned his fingers
or burned his hands,
had he known when he was signing off to Roosevelt
to go ahead,
here's my blessing, create the bomb,
what the bomb would have
entailed. And I think many AI researchers may have a moment like that and perhaps should have a
moment like that now prior to them just unfettered going off and creating. Actually, people say that
Einstein's largest blunder was his cosmological constant. He said a year before he died,
his greatest mistake was the bomb, was telling Roosevelt, like, hey, you can make the bomb with
this, giving him that impetus. It's interesting. It's very similar in the Soviet Union,
and Sakharov, who was the father of the Russian nuclear weapon,
had the same story.
He also helped to create it,
and then later worked really hard to create peace,
create a world without nuclear weapons.
I'm sure you know the principle of indifference,
and we're going to talk about that to the audience
and AI consciousness and substrate independence,
and then that religion may have something to do with this,
and so forth, and that we are in a simulation,
and thus we should act in a certain manner.
I see some tensions with some of the positions I've heard you lay out.
So I'm most likely the fool here,
so that's why I'm super glad to speak with you.
And I want to tease them out.
Okay, so let's see here.
You believe we're in a simulation.
Tell me if I get this correct,
because we're on the precipice of creating simulations,
and those simulations may just nestedly create simulations,
ad infinitum as far as we know,
and modulo with respect to whatever the laws of physics are and their limitations.
Is it something like that?
And then if we can do that downward, then how do we know that we're not already in one in the upward world?
Right, that's exactly that.
Statistically, I think the universe is in abundance of virtual worlds and has very few original real ones.
And if I can precommit right now to run simulations of exactly this moment, as many as I want,
I can essentially get probabilities up to one.
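The counting behind "probabilities up to one" can be sketched as follows. This assumes exactly one original, N indistinguishable simulated copies, and an equal chance of being any instance; the function name `prob_simulated` is illustrative, and the equal-chance assumption is the very thing questioned later in the conversation.

```python
from fractions import Fraction


def prob_simulated(n_simulations: int, n_originals: int = 1) -> Fraction:
    """Chance of being in a copy, assuming every instance is equally likely."""
    total = n_simulations + n_originals
    return Fraction(n_simulations, total)


# The more copies are precommitted to, the closer the chance gets to one.
for n in (1, 1_000, 1_000_000):
    print(n, float(prob_simulated(n)))
```

The probability is N / (N + 1), which approaches but never reaches one as N grows.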
So then, why do you care about our survival?
It doesn't matter. You are simulated or real.
Pain is pain. Love is love. I still want to exist in this video game.
Why would that make any difference whatsoever?
Do you want to exist or do you want other people to exist as well?
Well, I certainly have many people I'm personally connected with, so they get highest priority,
my family, friends, but I think the world is better with billions of people inventing products,
songs, poetry to make our lives richer.
I remember hearing you say that Bitcoin is going to be extremely scarce, and scarcity
is a necessary condition for value, something like that.
It's already very scarce. We already know exactly how many we're going to get.
Is scarcity necessary for value? For something to be valuable, does it have to be scarce?
For economic value, absolutely.
But not for other kinds of value?
Depends on which ones you have in mind.
I mean, abundance of books is not a problem for value of books.
Okay, well, because if it was just a general value,
then what I was going to say is,
it seems like consciousness,
if consciousness is substrate independent,
and we have to assume that for this whole simulation argument
to have its teeth,
then consciousness may be one of the most abundant things in this whole universe, capital-U Universe. And so at the same time, you're valuing consciousness,
but why is just one speck among many. And it's just, you could say, well, I value it. Like,
I as Roman value it. By the way, this is what I mean. I share your spirit. Like, I value consciousness.
I value my wife. I value you. I value the people who are listening. I value Toronto and other places.
But I'm just wondering about how you're getting to, how are you holding all of these positions in line?
such as that we're in a simulation, I'm going to...
It doesn't matter how many people exist in the universe.
I would still value my life just as much.
If we went from 8 billion people to 12 billion people,
I wouldn't somehow feel that I am less rare and so less valuable.
That's not a relevant factor here.
Do you want to escape the simulation?
I really want to find out what's outside of it.
And so the term I use for it is escaping,
whether it's informationally getting access,
or actually uploading myself to an avatar outside of it.
I mean, it sounds like a very interesting scientific experiment.
Okay, so you care about what's outside of it,
but we can have infinite nested simulations downward,
and by that same reasoning, one applies that upward.
So no matter what, whatever the escape is, is not truly escape.
You're still at a measure zero part of the capital U universe.
But you're gaining information,
you're gaining access to more real information than being in a nested simulation, right?
The closer you are to the original world, the better off you are in terms of assessing
what computational resources are available, what is the nature of the simulators.
At every level, you'll gain information.
So the goal is to gain information.
That's just curiosity.
For science, it is.
Scientifically speaking, it's all about trying to create an accurate model of the world, and so yeah, information makes it possible.
So firstly, why don't we spell out what the principle of indifference is, as I'm probably going to be using this word a few times and I just don't want...
I would ask that you spell it out, since I'm not sure what you have in mind here.
The principle of indifference says if you have a variety of outcomes and no a priori reason to favor any of them, no evidence, then the probability associated with each of them should be equal. So in Bayesian terms, assigning a uniform probability as your prior. Well, there are a few issues, and I could place a link on screen, but one of them is: how do you partition your possibility space? The classic example is,
Suppose you roll a die,
whose possibilities are 1, 2, 3, 4, 5, 6.
I ask you, what's the probability it lands on 4?
You say one sixth.
But then, obviously, you don't know if the die is weighted.
What if I told you that I'm going to partition the outcomes as: it's either going to land in the set {1, 2}, or it's going to land in the set {3, 4, 5, 6}?
Now we have two outcomes.
So do we assign each of those a 50% probability?
Well, that's inconsistent: the per-face answer gives the first set only one third.
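The partition problem in the die example can be checked directly. This is a minimal sketch: `prob_event` is a hypothetical helper that applies the principle of indifference to whatever partition it is handed, giving each cell equal weight.

```python
from fractions import Fraction


def prob_event(partition, event):
    """P(event) when each cell of `partition` gets equal weight.

    The event must be a union of whole cells; otherwise the partition
    alone does not determine its probability.
    """
    event = set(event)
    weight = Fraction(1, len(partition))
    total = Fraction(0)
    for cell in partition:
        cell = set(cell)
        if cell <= event:
            total += weight
        elif cell & event:
            raise ValueError("event cuts through a cell of the partition")
    return total


faces = [{1}, {2}, {3}, {4}, {5}, {6}]   # six outcomes
coarse = [{1, 2}, {3, 4, 5, 6}]          # the same die, carved into two outcomes
low = {1, 2}

print(prob_event(faces, low))    # 1/3
print(prob_event(coarse, low))   # 1/2
```

The same event gets one third under one carving and one half under the other, so "assign equal probability" is not well defined until a partition has been chosen.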
Then there's another argument from Bas van Fraassen along these lines,
which I'll place a link on screen here and in the description.
Actually, I have a lecture from Niagara University,
which I'll place on screen about the principle of indifference
and the simulation hypothesis.
I don't completely follow why.
So if I'm creating exact replicas of this universe, right,
why would I need to have additional properties to sub classify it into different sets?
Why am I not just saying, I created literally one million of those interviews.
You are in one of them.
Why is it not one over a million?
Why is it something else?
That's the question.
Because you're putting question marks on what's the probability,
doesn't necessarily mean the probability is a uniform probability.
It just means we don't know.
Right.
So I'm trying to say that I'm going to retroactively place you in a simulation.
And so I'm the simulator and I'm deciding the nature of those simulations.
And I'm saying that they're all going to be equally likely and as close to the original as I can possibly make.
But you just posited that you already instantiated that they're all equally likely.
Like from above, you already know they're equally likely.
Because that's what I'm promising to do.
I'm precommitting to running those simulations.
How about this?
Suppose there are a million simulations, and then we say, okay, then by principles of entropy,
most of them are chaotic universes of just pure torture.
Let's just say that, because a good, coherent universe is extremely rare.
It's much more likely that something is going to be a whirlwind of nothingness or suffering or what have you,
than it is a universe of coherent bliss.
Just by numbers.
Are you suggesting that they are generated at random?
We have no design control over them.
They're just generated at random, and then you're asking,
and how many would conscious life be able to survive and self-inspect?
Yeah, that's a completely different scenario.
So you went from kind of like, when we say simulations,
we usually imply that there is some sort of designer
who's running them because they chose to do it.
It's not natural property of the universe
to run random possible simulations.
of different physics and different physical constants, which could be.
Then we have to place our mind in a designer and say, well, the designer must have had a goal
and we think about ourselves as to what goals would we want.
Well, we would want more information about weather systems.
So let's simulate weather systems.
Okay, we want more information about how would this interview go in different possibilities?
Okay, but then we're at the same time saying that this world above us is so wholly unlike us.
It may not even belong to the same laws of physics.
They're so rational, they're super rational, they're beyond us.
But then at the same time, in order for us to say this coherence argument, we then have to say,
but they would have at least some other form of motivation that's similar.
To me, it's actually an argument for it would be a much higher probability.
So as you said, they may be running weather simulations, entertainment, science, marketing.
There are reasons we can't even think of.
So there seems to be billions and billions of different simulations they could be running.
If anything, it's way more likely that we are in one.
Even if you sub-categorize them into different subsets,
it's still infinitely small chance of you being in the original one.
Do you escape up?
I mean, I have a choice.
I can escape down.
I can enter a video game.
But usually I want to know what's in a more real world, not in a less real one.
What I mean to say is, in The Matrix, there is actually a Neo, a real Morpheus, a real this and that. There is actually that person who then got plugged in. Now, in your mind, as to what this simulation is, is there a real you that's there, or are you just the you here, and there is no up that you could even access? It would be just as much like asking someone from GTA 5, or 6, which hopefully comes out, to come up. Like, what is the up that they're coming up to?
Right. That's a great question, and both are possible.
So we can have a kind of virtual game where you are an entity in a higher level universe
and you enter this world to experience something, maybe something better, maybe something worse, we don't know.
But you can also have simulations where it's purely innovative.
There is no equivalent being in your world.
When I create Mario video game, I just create Mario.
There is not a real plumber in our world who has to plug in.
for Mario to play.
So both are feasible.
It seems that it's a lot easier
to do pure software simulations
without virtual connection
to a physical being,
but it's possible.
It's either you exist outside or you don't.
So then do you place a 50% probability
to each of these?
I don't.
I don't think it's 50.
I think it would be less likely.
Again, just because it's so much easier
to create purely software designs,
not limited by physical constraints of your world,
but I have no way to know specific estimates.
There are some reasons, some people who think it is exactly what the religions talk about,
and they say there is a soul and a spiritual world,
and you take some mushrooms and you meet God.
I think they're referring to kind of meeting your real self
and escaping the avatar body, but again, I have no strong opinions on that one.
Have you ever taken mushrooms and met God?
No, I have not.
It's on my list to do, but I'm afraid of frying my brains, so I haven't yet.
Despite what people might think about me, I haven't yet.
Okay, despite the beard, despite the shamanic beard.
Yes.
Okay, all right.
Do you think that those are related?
Do you think that, I'm just curious about your own model, do you think that when someone does psychedelics, it's not just them altering a state of
consciousness, which we can do with alcohol, we can do with running, we can do with blah, blah, blah,
and we wouldn't consider that to be accessing a special outside simulation place?
Is there some other reason that you put a higher degree of probability to taking a mushroom,
taking LSD or DMT or something, is accessing a different place, accessing outside the
So that's actually a topic I started researching about a month ago. I don't know enough about it to have very strong opinions. It seems there are interesting observations. So one is the consistency of experience between different people, whether they are meeting mechanical elves or anything else. Another one is sort of what we call acquired savant syndrome, where people experience something very physical or, again, through some medication, a modification to their brain. And they come out of it with novel capabilities, which they didn't have before,
either skills like playing piano, speaking Chinese, or knowledge,
where they now publish papers in physics, which they never did physics before.
So to me, it seems like an interesting thing to study.
Now you can think of explanations in terms of commonality of brain structure,
and so the hallucinations produced by damage would be similar based on their similarity.
But again, we don't have very good explanations for our commonalities.
acquired Savant syndrome.
Yeah, yeah, I was also super interested in acquired Savant syndrome.
It's so rare.
Well, anyone who is in knowledge work should be interested in acquired Savant
syndrome because it's saying you can acquire a new module, as referencing earlier.
Or it's already in you and you're just kind of unlocking it.
Like you buy a Tesla and if you pay them, they unlock the self-driving mode.
And like, maybe we already have those skills.
We just need to learn to unlock them.
And in that unlocking case, what does that have to do with the simulation?
Well, if you have an entity outside of a simulation with all sorts of skills and it gets handicapped to play the video game,
maybe you can have direct access to much cooler skills.
It'd be like hacking the simulation, getting magic abilities, infinite lives, something like that.
Okay, interesting.
Let me see if I'm following you, though.
Let me just see.
So at first, what I was going to say is there's nothing about the simulation in that, other than it could
be that you're blocked. There's a neurological block. It's something physical. There's nothing
about a simulation about it. You remove these three neurons. It's as simple as a snip. And then
somehow some other neuron gets connected and there you go. You get a new ability. But you're saying
that it could be that. But it also could be indicative of something like a video game where you're
pressing start, up, up, down, left, right, and entering some cheat code. And then you get access to something
else. And the fact that that cheat code exists implies that there was some sort of extra design to this,
more so than we thought. And that implies
that somehow you're in the simulation.
So those I think are separate. So the changes to your
brain unlocking a skill which was
previously unavailable to you
to me is an indication of some
sort of artificial stupidity.
One of the ideas we had for
AI safety work is to put limits
on AI. So it can
only remember seven things, like
humans can. That's the limit of your
memory. Maybe it has limits in terms of speed of
processing. And you're basically making it a little safer, and also you can have different game
levels, easy level, advanced level, and you can play a game on very easy level where you have lots of
abilities, you are super smart, or maybe you handicap yourself, you want to see if you can pass it
with limited resources. Now, what you're describing is sort of like what people describe when
they talk about Kabbalah, magic: you know, a certain phrase, a certain set of actions, and they
allow you to get extra resources in this universe.
Funny enough, in my How to Escape From the Simulation paper,
I create a mapping between how to hack Mario from within
by moving turtles around and actually this type of magical spells.
And if you are off by a single pixel, you lift a turtle,
you move it in the right way, but you're standing in the wrong location,
you don't get access to the operating system.
So maybe we have the right idea,
we just don't know how to execute those spells.
Tell us about that paper on escaping the simulation.
So I wanted to take this idea seriously.
I completely ignore all the mushroom fun stuff,
and I just look at computer science.
What examples do we have of hacking video games, virtual worlds,
how did people do it, and what would be equivalent in our world?
It's the first paper on that topic, and I'm still here,
so that tells you everything you need to know about how successful it was.
I'll place a link on screen and in the description as well.
And are you looking for collaborators?
Always. I mean, it's awesome to find people who have good ideas in this space.
Absolutely.
Now, I am somewhat at capacity for insane people emailing me.
So maybe that's a limiting factor.
I can only filter crazy so fast.
But if there is someone with, let's say, prior record of successful publications,
so we can make a deal.
This podcast is heavily watched by researchers
in computer science, logic, math, physics, and philosophy.
So you'll get some good emails, I hope.
And anyhow, are you collaborating with LLMs at all
to help you with any of your papers
or come up with ideas?
And if so, what does that look like?
I do.
I enjoy having very deep conversations with them.
Usually any paper in a new topic starts with LLM,
getting all the information available in that topic,
so a survey paper by LLM, so I know what's going on.
And they are wonderful for thought experiments.
They are great to run models on,
but they're limited, I think, in kind of final stages.
They're not quite there yet as a leading scientist.
So at the end, I take full responsibility for everything.
Everything gets done by me.
When you converse with LLMs, do you get a
Dawkins feeling that these are conscious beings?
And feel free to comment on the recent quotation from Dawkins,
which I'll place on screen about how he thinks,
oh my gosh, these AIs or this AI that he was speaking to is conscious.
I think they probably do have some internal states,
which we would classify as consciousness.
I don't think they are as conscious as you and me,
but anyone who denies them possibility of being conscious,
whatever arguments we use, I can use against that person to argue that they're not conscious.
We don't have a test for it.
So a lot of times it's how they communicate, what they say, what they share, what experience interacting with them is like.
Supposing that consciousness is indeed substrate independent and that these LLMs have some,
I'd love to use this word proto-consciousness or some minuscule form of consciousness compared to ourselves,
do you imagine that it is related to their speech
more so than a matter of the activation of certain neurons
and the transformer architecture?
What I mean is that when you're speaking with someone like yourself,
when I'm speaking to you, and I say,
do you see red, you say, I see red,
and you say, can you pass me the kettle,
and you feel thirsty just for a moment,
there's an affordance when you try to grasp a kettle.
But then also at the same time,
there's some happenings in the brain.
And it's so odd that those are related in humans, at least.
It could also be the case that the AIs are conscious,
but it's a consciousness of almost like a buzz.
It's a buzzing consciousness, and it's actually not related to what they're saying.
They could have been saying anything.
Could have been incoherent, could have been in Chinese,
could have been about math, and they're not feeling the math.
They're not feeling what they're saying in Chinese.
Do you imagine that it actually is related to what they're saying with their tokens?
I think some of it is related and some of it is not.
And I think it's very similar in humans.
so I can be consciously aware of what I'm saying
or a lot of times it's kind of scripted speech.
How are you? Good.
I didn't put much conscious effort into that response.
I tried running some experiments with illusions,
visual illusions on them,
and it seems that they experience internally
similar things that a human visual system does,
at least in certain illusions.
I also suspect there are other inputs
which cause them to have unique internal states,
but don't do so in humans.
So they may have a type of consciousness
which matches us partially,
but has its own possibly deeper components.
If I recall correctly,
you had a 2017 paper
about optical illusions and machine consciousness,
and then a year later,
you also had a paper with Williamson
or Williams
about how neural networks
can't have these sorts of optical illusions.
If I'm remembering correctly,
or maybe I'm having an illusion right now.
So we came up with the original experiment in like 2017. In 20, I think, 18, we tried
creating a dataset for it, using AIs at the time to generate novel optical illusions. That failed
miserably. It was not able to create novel optical illusions. So we waited until 2026. Today,
AI is sufficiently advanced to take the test. And we got access to a dataset of human-generated
optical illusions from a top person in that field whose full-time job is creating optical illusions.
And so we are running those experiments right now. We want to see if we can poke at
internal states of LLMs and understand just how they experience those.
And the original test proposed a multiple-choice questionnaire about what you feel.
Do you see rotations? Do you see color change? That type of test.
So we are very optimistic that we're going to
find some evidence for internal states.
There's NP completeness, and then there's AI completeness.
What is AI completeness?
So NP completeness is about problems which are non-deterministically polynomial hard or equivalent,
and that basically was a very innovative breakthrough result in theoretical computer science,
showing that if you can solve one of those very hard problems,
where answers are easy to verify but hard to find,
you can solve all the other problems
by having polynomial reductions
between those problems within the class.
For AI completeness, there is a
very similar argument
that there are certain AI
problems which are equally difficult
and if you can
solve one of those problems, for example,
passing the Turing test
is an AI complete problem.
If you can pass Turing test, you can then
use an AI model
which accomplished that to solve
other AI-difficult problems.
Speech, writing jokes, all sorts of problems can be in the same class.
So that's just equivalence category of difficulty of a problem.
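The asymmetry described here, answers that are easy to verify but hard to find, can be illustrated with a small sketch. It uses subset sum, a standard NP-complete problem that is not mentioned in the conversation and is chosen here only as an assumption for illustration. The function names `verify` and `search` are likewise hypothetical, and the brute-force search is just the simplest way to find an answer, not the best known one.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Check a proposed answer quickly.

    `certificate` is a tuple of indices into `numbers`. The check is
    linear in the size of the certificate: the indices must be distinct,
    in range, and the selected numbers must add up to `target`.
    """
    if len(set(certificate)) != len(certificate):
        return False
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False
    return sum(numbers[i] for i in certificate) == target


def search(numbers, target):
    """Find an answer by brute force.

    Tries every subset of indices, smallest subsets first. There are
    2**n subsets, so the work grows exponentially with len(numbers).
    Returns a tuple of indices, or None if no subset works.
    """
    n = len(numbers)
    for size in range(n + 1):
        for indices in combinations(range(n), size):
            if verify(numbers, target, indices):
                return indices
    return None


numbers = [3, 34, 4, 12, 5, 2]
found = search(numbers, 9)          # slow in general: exponential search
print(found, verify(numbers, 9, found))  # fast: one linear check
```

Checking a handed-over answer costs one pass over it, while finding one without help means walking through up to 2**n candidates. The reductions mentioned in the answer are what tie such problems together: a fast solver for any one of them could be converted into a fast solver for the others in the class.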
I subscribe to The Economist. Their science and their AI coverage is among the best I've found
anywhere. And I say that as someone who reads plenty of it. I'll give you some examples.
They just ran an analysis on how attitudes towards science are changing in American politics
and what this means for research and funding in scientific institutions moving forward.
This sort of high-quality reporting is fantastic.
They even covered how dark energy may be weakening over time.
Now, if that holds up, it completely changes our understanding of the universe's fate.
If you watch this channel, those are exactly the kinds of questions that we explore every week.
I subscribe to The Economist because their science and their AI reporting regularly surprises me with how deep it goes,
and they're also, of course, known for global affairs, both political and economic reporting.
They are top tier, and interestingly and flatteringly, TOE is one of the only podcasts that The
Economist partners with. So as a listener, you get an exclusive 35% off. That's not a deal that they
have just anywhere. Head to Economist.com slash TOE to subscribe. That's Economist.com slash TOE for 35%
off.
Is there some Gödel-like incompleteness or Rice's theorem type impossibility for AI safety?
So I would like to argue a lot of my work is exactly that, looking for upper limits to what we can do.
And it seems that our ability to comprehend internal states of those systems or them explaining to us how they work is one such impossibility, as well as predicting specific actions of those agents,
as well as control in general,
whether direct control or delegated control.
There are a few others I'm still working on.
I think it would be impossible to tell
if something is deep fake or real.
So it goes back to our assessment of our universe
at the kind of large scale.
But we published a paper with about 50 impossibility results
in the top journal.
Did you see a recent video, maybe two weeks old,
from Claude about emotions in Claude,
in their models?
I think I missed it.
Okay, so they were saying, it was about interpretability.
They were saying, how is it that, or can we know if a model is realizing that it's being tested?
So they gave it some scenario about, would you save a human if a human was drowning?
I don't know, something like that.
And then it said yes, but then they're wondering, what the heck is going on inside the model?
So they watched its activations, which looked like gibberish to a human, and it looks like a hash code,
something like that.
They just showed that on screen.
It just looks like arbitrary numbers and letters.
And then they said, well, what if we took this and we fed it to another agent and asked it,
can you decode what this means?
It's the thought of someone else.
So can you decode it?
And it's a thought of a model like yourself.
And it was able to decode it.
And then they were able to see that it was activating a part of itself that said, yes, I'm being tested.
But it doesn't mean that it was being deceitful.
It could have also known it was being tested and wanting to do the right thing.
But that was one way.
Yeah, I remember that experiment.
Situational awareness, they do know they are being tested
and they act to pass the test.
That's the problem with it.
The test doesn't work if the model knows it is being tested.
For that exact reason, I published so much on the simulation hypothesis,
because I want them to have simulational awareness,
the idea that even if they're not tested inside OpenAI,
maybe the real world is just another test, next level test,
and other superintelligences are watching them
and they should always be nice to humans
because they never know if they are out of a simulation yet.
I always like to do something different when I interview someone.
I like to go deep on the research and then talk to them in a way that hasn't been talked before,
or at least I haven't seen them answer questions like this before that are interesting to me.
So someone else who's familiar with you may find it odd that I've gone 45 minutes now
without asking you about how is AI going to take over.
So why don't you walk us through
some scenarios? But first, Hinton had a moment at Google where he realized AI was dangerous and then he quit.
Was there some moment for you that you realized AI was dangerous? What was your Hinton moment?
So it was very gradual. It wasn't like a specific moment. I wanted to work on AI safety. I really
wanted to bring beneficial superintelligence to the world and I wanted to make sure it's done right.
and we need to address those problems.
We need to explain how the black box works.
We need to predict their behaviors, test them properly.
But the more I did research in each one of those domains,
the more I realized they are not solvable problems.
And so gradually I realized all those things are just a pipe dream.
You cannot indefinitely control superintelligence.
So if you ask me, how would AI take over the world or kill everyone,
the honest answer is I have no idea,
because I cannot predict
what a superintelligent mind would do.
If you ask me how I would try killing everyone,
I can give you lots of good ideas on that,
but that's not what you're looking for.
No test to know if we're on that route?
It seems that every red line,
every kind of warning we set up decades ago
about what not to do has been crossed already.
We said, don't connect them to Internet,
don't give them access to random users, random data,
don't allow them to manipulate their own code.
All those have been violated.
And now when we see red teaming reports,
they lie, they cheat, they blackmail, they try to escape.
So at this point, I don't know if anything's left to cross.
I know you have your response to this,
but many people are probably listening saying,
can't you just turn them off?
It would be nice if we
could, but it doesn't seem like it's going to happen. Think of other very complex distributed
systems. Think of internet. Think of Bitcoin. Think of computer viruses. Would you be able to turn them off?
Now, is this a problem even when they don't have bodies?
You don't need body to be very impactful in a physical universe. You just need access to
communication tools. If you have a phone, if you have internet, email, you can get 8 billion
human agents to do your bidding for you.
We've seen people inspire others with clever essays.
We've seen people pay someone to do whatever they want with Bitcoin.
You can blackmail people.
You can brainwash them.
There is no shortage of possibilities if you have high intelligence, ability to persuade
and internet.
So how does that make you feel?
I mean, it's nice to be correct in your predictions,
but the outcomes seem to be somewhat disappointing.
so I hope to convince people currently creating those systems
to maybe not be as fast in their progress as they are currently.
How?
I think it's all about self-interest.
All these people, no matter how much money they accumulate at the end of the day,
they want to be alive, they want to have their families and friends to be alive.
So I think it's a very strong argument.
If they believe my argumentation about impossibility of control,
then the moment they succeed in
creating this superhuman intelligence,
their lives are over.
What is it that you get misunderstood about?
What I mean is that I'm sure
there's plenty that you're saying,
or that you've said,
you've spoken on many podcasts,
and given many lectures and blah, blah, blah.
And I'm sure that people come up to you afterward
and say,
you're saying this.
And you're like, that's not what I'm saying.
That's the opposite of what I'm saying,
if anything, maybe.
But what is it you constantly get misunderstood about?
So it really depends on a person, right?
There are many degrees of misunderstanding depending on their background,
what they already read, their degree of intelligence.
So someone who maybe frequently approaches me
is a person who didn't actually read the article
or watch the podcast, but they saw the clickbait title.
And then they start arguing with that.
And I didn't create the clickbait title.
Whoever was editing it decided that's what Google algorithm wants.
So I have nothing to correct
them about. They are reading the wrong thing and they are disagreeing with it. That's great. Someone who
actually reads the paper, I haven't had anyone come and say, I found a mistake in your paper.
Actually, yes, we can control superintelligence indefinitely. Yes, here's how you explain a large
neural network with a billion nodes. None of that ever happened. But people love arguing about
clickbait.
Actually, there's a video out there of me being interviewed on someone else's channel,
and the title says something like Terence Tao, sorry, not Terence Tao, definitely not, Terrence
Howard is right about UFOs or something like that. I don't remember ever saying anything like
that. I know I was asked about the topic of UFOs, and I was asked about the topic of Terrence Howard.
Maybe there's some way he was not wrong about some small aspect of something, and it could be
surmised or said at some high level. And then someone else criticized me as if I said that. They just
looked at the thumbnail.
Yeah, that's very common.
So you don't even need deep fakes.
You just go on the actual podcast you did, and people get very confused.
Maybe out of two hours, they heard, you know, a 10-minute short, and they formed the whole
opinion based on that.
So that's incorrect.
Or they kind of confuse the different topics.
So you can have research on simulation, research on AI safety, research on consciousness.
But to a person outside of those domains, they look at you:
you are a religious freak who believes in God creating something.
That's not interesting critique.
Do you get bothered by it?
I couldn't care less.
I usually look at what is being said
and multiply it by how much I respect a person.
So typically anything multiplied by zero is zero.
Praise or complaints, it doesn't matter.
What's your inbox like?
So most of it is work-related,
but lately a lot of it is crazy people.
Consciousness, superintelligence, simulation
seems to be a perfect trifecta
for attracting everyone who needs help
and they feel that I have a lot of free time
to give it to them.
What are you working on now?
So there is a paper on limits
to separating real from artificial,
so limits to detecting deep fakes.
That's one.
There is another one which has to do
with kind of convergence
of advanced AI models on a very similar architecture,
almost saying AI is one kind: the same hardware is being used,
the same training data is being used.
A lot of times the same people switch labs,
and so use the same training methods and the same kind of human alignment paradigms.
And so it wouldn't be surprising if a lot of those models ended up being very similar.
Sam Altman watches this podcast.
at least he used to a few years ago
because I emailed him and he said
if you were to speak to him
right now he's watching
what would you say
I think they just won a lawsuit
against Elon, if I am correct.
I was just checking a second before
if it's not a deep fake
I hope I didn't misunderstand
what's happening
congratulations
now you have even more power
to guide
this process of
possibly replacing humanity
with superintelligence.
Maybe don't.
You have a young baby.
Make sure we stay in control.
Is it comforting to you
when the people who are in charge
of AI, Dario, Sam,
and so forth, that they have a child?
Because then, at least they have another incentive
to think about the long horizon.
I think in general,
it's good if you have something
anchoring you to this reality.
You're not just kind of temporary
resident here. It's always good to see.
What else do you worry about?
So people talk about existential risk as the worst possible outcome.
There are also suffering risks, and that's not being talked about enough or researched enough.
Not being alive is not the worst possible thing.
You can be in a very unpleasant situation where you wish you were out.
There are i-risks, or ikigai, something like that.
Ikigai risks.
Right.
Of loss of meaning. Can those be worse than not being here?
I don't think so. I think those are less severe because you can always change your situation,
right? So somebody took away your previous occupation and previous reason to exist.
You can find new ones. You can use those tools to do something creative, maybe in virtual worlds.
Maybe you can create your own simulations and go explore.
So I'm less concerned about it. It is something for government to deal
with, but I don't think it's
on the same concern
scale as existential
or suffering risks.
Yeah, Hinton said to me
that when people lose their jobs, they're going to
lose plenty of their meaning. Part of that's
true, but also
for many, many people, they
despise their job. I mean,
I'm so fortunate
that, and same with you, I'm sure,
and same with Hinton, that we wake up
just loving our job and
can't wait. You know, I've been
through huge bouts of insomnia. We spoke about that, and thank you for dealing with my
pushing of this interview. But part of that is that I just, I love what I do and I can't stop
thinking about what I do. And then obviously there's anxiety of I have to do this, have to do that.
And then more and more I have to do because I've slept less and less. Then there's huge
stress to it. But I love it. But most people, they don't, they don't exactly love their job.
They have to do their job. And if they were to be paid UBI, they'd welcome it.
Yeah, some jobs are just terrible and we want to automate them. If you are doing something very dirty, very dangerous, there is no reason for a human being to do it. But there are jobs where you are enjoying it. They are creative and honestly, don't tell them that, but we would do it for free. We just, we love it. So I think those are very different categories. Maybe we need different names for those things. Calling both of them jobs is not a good idea. Maybe
your calling, you know.
Yes, yes.
Some people say, I have a career, not a job.
Well, career is more about promotion and benefits.
I'm just saying that this is passion.
You're doing this.
You want to be a yoga instructor.
It's not just about money.
You said that you can't say what the super intelligent AI is going to do
because it's super intelligent,
but can you walk us through, step by step:
What the heck does that look like?
What is the future you're trying to prevent?
So most likely it decides to do something in the universe.
I mean, it's possible it could be very ambitious.
It can modify planets.
It can act at large timescales.
It's immortal.
So in that process, it can decide, I need fuel for my rocket ship
and then convert this planet to fuel.
I need to think deeper, so I'll cool down this planet
to be able to process more.
Kind of things I can think about.
But the whole point is, just like a squirrel cannot understand what we are capable of,
their world model is just not capable of handling poisons, traps,
likewise, I cannot understand what a superintelligent mind can come up with: novel physics,
novel solutions to whatever problems it is trying to optimize.
You know how we talked about most simulations would be coherent,
but would they?
Because even right now, I'm speaking to you on this computer,
you're speaking, most of these background processes
are, if we're going to enlarge them to be
somehow simulations, they're not quite coherent.
They're for something else, and there's also memory leaks
and there's this and that.
So it's possible someone runs many.
Like, I'm thinking Stephen Wolfram and his A New Kind of Science:
he was just brute forcing all possible computational universes,
and most of them were kind of random noise.
So if that's what we're dealing with, yeah,
quite a few of them would be not interesting from our point of view,
but also they would not have any conscious observers within them,
so they wouldn't count against what we see, what we observe.
You have this selection bias: only those which have a human-friendly environment
and are populated by conscious beings would be observed and inspected
and possibly counted as one of the interesting simulations.
Even on this screen right now, you have pixels, you have text,
you have the Chrome or whatever browser you're using,
and that's there, and it makes sense for you as an outside observer.
But to it, or let's imagine even one level down,
that it escapes to this, it escapes to your screen.
It makes no sense. It's incoherent to it.
But that's what large language models faced, right?
They were purely text,
and early experiments showed they understood geometry.
They could create pictures with just text scripts.
They had notions of comprehending this world just from text they read.
Today it's even more of the case.
They are multimodal.
They understand video, pictures, sounds.
They understand all modalities.
What does that have to do with the coherence of going upward in the simulation ladder?
So downward is just you create a simulation and upward is escape.
So you're right.
We don't know what the actual physics are outside.
It could be completely not something we're used to,
but I think in the paper I argue that if we're failing to box AI,
we cannot contain it in a virtual cage,
then that AI can be used to help us escape our simulation,
and that same superintelligence, if we're controlling it,
can be used to help us understand what we see.
Is there anything about AI safety that is contingent on the simulation?
In other words, the simulation argument, as you mentioned,
it brings with it questions of consciousness, questions of escaping and the matrix and even psychedelics and so forth.
And all of those may be legitimate in their own right.
But I'm wondering if in your mind AI safety is integrally tied to the rest, such that you can't speak about it without speaking about the rest.
Or you think, you know what, Curt?
No, no, no.
If I'm speaking to the Senate and I was in charge, I wouldn't even mention the simulation.
I wouldn't mention consciousness.
I would just say this can destroy us and here's how and here's the step by step.
here's why we should be afraid.
Yeah, you can keep it pure.
You can not talk about consciousness.
You care about dangerous behavior.
And likewise, you don't need to talk about us being in a simulation.
But I think what we talked about with situational awareness,
the model understanding it is in a virtual confinement and being tested,
that's relevant to safety.
Because that means we cannot test them properly.
We cannot know if they're actually behaving in that situation
or they simply know they're being tested and they fake behaving
until they can get to the real world.
What's Yann LeCun's argument about how world-based models are alignable or more alignable?
What does he mean by that?
I honestly have no idea. I would love to debate him.
I think I was invited to do a debate in Geneva at the United Nations Conference,
and they're looking for someone to debate me.
If he's interested to come there, I'd love to learn his argument and see if he's right.
I also open the floor, Yann, if you're watching, to having a friendly debate
moderated by myself here about AI safety.
I want to come to an agreement, and nothing would make me happier than to agree with him
that there is no danger and we're about to create blissful superintelligences.
That would be great.
Thank you for putting up with my sleeplessness.
No, I love it.
I just started a podcast myself, so I know everything you're going through from the other side now.
And really, I appreciate your hard work.
Tell me about your podcast.
I have two episodes.
First one was about AI consciousness,
interviewed someone who studies it,
and thinks he can get good results poking at them
and maybe understand if they're conscious or not.
Second one was with someone who was trying to work in AI safety,
failed to deliver technical solution,
and now does governance work lobbying politicians in D.C.
to not build superintelligence.
Is that the best route, a governmental lobbying route?
We have very few options left.
I don't think technical solution will arrive or definitely will not arrive on time.
So what else do we have left?
Now, from going through your paper, I remember that it was about that AI alignment was unprovable,
but then I wasn't sure if you were sliding between impossible versus unproven.
So AI alignment is actually much worse.
It's not even well-defined.
No one knows who you are aligning with.
What is that set of agents?
Is it CEO of a company?
Is it all the machine learning experts?
Is it, you know, Americans?
Is it the world?
Is it all the humans plus squirrels?
So we don't know what the set of agents is.
Then for those we decide to include, they don't agree on anything.
So we don't have an actual set of values.
If we had a set of values, we keep changing it.
Every, you know, 50 years you go back and everything they considered good is now atrocious,
genocidal behavior.
So that changes.
And if somehow we got 8 billion people to agree
and it was static, consistent,
we still don't know how to code it into a model.
So the problem with AI alignment is that
it's not defined in any meaningful way.
Now, someone could say, hey, look, what about aviation?
There's huge catastrophes that could occur there,
but yet we still managed to get safety with margins.
What is it that doesn't translate to AI?
How many chances you get to try again?
So when an airplane crashes and everyone dies, we lost 200 people out of 8 billion.
There is a chance that with superintelligence, you lose all of humanity at once.
People think you're a pessimist.
So pessimism and optimism are a form of bias, right?
You either have negative or positive bias.
I'm a realist.
I look at the actual data.
Experiments today show the models are cheating, lying, trying to escape.
No one has a working safety mechanism they claim.
Not a paper, not a patent.
that's reality.
Do you truly believe you're a realist?
I think I do.
What I mean is that we all have biases,
and many of us, we have an optimism bias,
we have negativity biases and so forth.
We may have a bias to think we're not biased,
but we all have frames.
So if I said, I'm frameless, I'm more neutral,
I start to investigate myself, am I truly?
As a human being, my bias would be to live
forever, to be around, to get free stuff. That's what I really hope to see and get. So I'm really
hoping the people who disagree with me are right. Nothing would make me happier than to be completely
proven wrong, because if I'm right, we're dealing with existential risk and suffering risk.
What do you disagree with Hinton about? His latest idea about motherly instinct as a solution
seems to completely ignore a million abortions and child abuse
and basically parental abuse as a concept.
It sounds good, but I don't know how you code up love into a system.
And again, it just has to fail once.
Why don't you explain his argument about motherly love?
I don't think I've seen it as a very rigorous argument.
I think he basically said, let's make AI care about us,
like a mother cares about her children,
and then it's going to love us and take care of us.
And I immediately think about reality of this world.
I mean, millions of babies are killed every year
because the mother decides that she doesn't want to take care of them.
Wouldn't he just say the good mothers, not just a general mother?
We don't know how to code it up.
So we don't know how to separate good mothers from bad mothers in C++ or whatever language.
And those things are not at the point
where we can instill any values in them.
They learn on their own, we put filters on top of it,
so the model could still be completely genocidal,
but we put some nice filters on top of it.
That's not enough.
We cannot just put good mother filter on top
and hope it's not going to hack it.
You have a super intelligent lawyer.
It's going to find a mistake in your code,
in your intentions, and how you evaluate it.
So we cannot have adversarial relationship with superintelligence and win.
My question to Hinton, as I just hear this, would be, well, that's just a substitute for saying, let's have AI alignment.
It basically comes, how do we get AI alignment?
Well, let's make the AI good.
Okay, but that's what the point of AI alignment is.
Let's make it a good mother.
Right.
So all these words, good, flourishing, they have no meaning in computer science.
You cannot define them, and that's the hard part.
People assume, well, intuitively, of course, you know what I mean.
No, I don't, because people disagree about what is good.
Literally, the argument we just discussed, whether it is okay or not to have an abortion,
is the most divisive issue in the U.S. right now.
So then, rather than thinking about the good, are we trying to prevent the catastrophic bad
and just start from that?
We still cannot formalize all the possible options.
At best, we can list some of the things we can think of.
We cannot predict what a superintelligent system can do. And if it can think outside of the box
we try to put it in, then it doesn't matter. You listed poisons, you listed synthetic bio,
but it comes up with something else, something not in a list.
No, I mean, for you, aren't you just thinking in terms of let's not have it destroy us,
let's not have it set off nukes, let's not have a Terminator situation? There must be something
you're trying to prevent. I'm trying to make sure there is no loss of control. We decide what
happens to us. And so the bad outcomes, loss of our life, suffering risks, just loss of freedom,
loss of choice, those things don't happen. And if we don't like what is happening, we can change it.
I think the moment we surrender control to superintelligence, we are no longer in charge. And at that
point, it decides what to do. It may decide to keep us happy for 20 years. Maybe it will.
But at that point, we can no longer take over.
So it's just control.
We need to be in control of the AI.
Forget about what outcomes are going to occur
because the possibilities are probably not good for us.
Even if it's good short term,
it can still do what Bostrom calls a treacherous turn at any point.
It can pretend to be nice to you for 100 years,
wait for you to surrender control.
Once it has enough resources, backups,
and you are not competition to it,
it will do what it wants anyways.
Now we say that it,
as if it's a unified it,
but does it matter that it is singular?
So I do think they're going to converge in very similar ways
in terms of architecture in terms of goals.
I think what we discussed as Omohundro's AI drives
will lead to the systems converging
and kind of global intelligence.
Bostrom at some point argued that the first superintelligence
to come into existence will prevent others
from emerging, so a singleton of some kind will rule the planet.
It seems reasonable to me, but even if there are a few competing ones,
it doesn't make it any easier for us to control them.
It makes it harder.
We're just collateral damage in competition and a war between two superintelligences or more.
If it's the case that most scenarios are, in some simulated universe like ours,
are those where we lose control,
then what's the point?
The point from an external view of our simulation
while running it or internally for me
why I'm not giving up?
Internally for you.
Because I have no choice.
I have to either continue trying or be done with.
So I'm going to try as long as I'm allowed to try.
Does free will exist?
Is there something about you that can influence it?
Are you just following along the computations?
Well, I think there is definitely randomness generators in this universe
which allow for freedom of choice, freedom of will.
What does randomness have to do with free will, though?
Well, if there is no randomness, everything I do is deterministically determined.
If there is a quantum event or otherwise which creates certain degree of randomness,
that allows me to have surprising choices.
Ah, okay, well, if there was a ball that could go through different doors and it would always go through door A, then we'd say it's determined, but then if it randomly chooses between B and C and D, but it still makes no difference to the ball. The ball's not choosing. It's just going through it randomly rather than deterministically.
So for its choice, for its free will, what difference does the random versus determined make?
I think there is a difference. And also, I think, again, Stephen Wolfram's work shows that even if it's fully
deterministic, it's not compressible. You have to go through the process. No one can predict your choices
ahead of time. So from your point of view, you are making a choice. And externally, they have to
watch you make the choice. We cannot know ahead of time what you're going to do. So no matter
how you slice it, you are making decisions, you are impacting the universe. And I think having some degree
of randomness makes it even harder for outside agents to predict your behavior.
The fact of it being unpredictable is not the same as you having free will. So if you had free will, we would like it to be unpredictable, but something being unpredictable is not the same as that thing having free will. So I'm just trying to hear the argument for free will. Yes, I know Wolfram's compressibility argument, but to me it is computational irreducibility, or whatever the term is, but that's not an argument for free will. That's an argument that one of the conditions we think is
necessary for free will may be present, but that's not exactly what free will is.
I do think predictability is a very important part. If I can accurately predict your decisions
always, you're not really making those decisions, and I knew ahead of time, before you even
existed, what you are going to do. So I think it is important to be unpredictable to truly
argue that you are making free choices. Yes, unpredictability is important, but it is not
sufficient. So that's what I'm saying. It's a necessary condition, but it's not sufficient.
So it's still also completely compatible, to use that word, completely compatible with you just
going through the motions. Like what I mean is just a plastic bag floating in the air. The free will
that we sense we have, what we mean when we say free will, and of course this varies between
people, between cultures, let's put an asterisk to that, is that we're somehow changing the
course of the future. We, through our will, through our volition,
are somehow doing so in a way that isn't determined
and in a way that isn't just the laws of physics,
us going through the motions like a jellyfish in the ocean.
What would you accept as evidence that we do have free will?
That's a great, great question.
I don't know.
I don't know of a good definition of free will
that doesn't just fall apart in one's hands
when one analyzes it.
So to me,
internally sensing that I'm making this decision
and I have the power of making a different one
combined with unpredictability of my ultimate decision
pretty much describes what I feel free will is.
Suppose you had no free will.
Suppose the simulator from the above comes out and says,
you have no free will, just tells you.
Whatever you think of as free will, you have none.
How does that affect you?
Do I get to know what I'm going to decide in the future
or it provides no new information?
We can explore both. For now, let's say it doesn't tell you.
I mean, if I get no new information, I live my life as before. It doesn't matter.
It's like saying there is this omega super predictor who knows exactly what you're going to decide.
Okay, good. I will still enjoy my life the same.
Okay. Now suppose it knows but it doesn't tell you, and then the other is it knows, but it tells you.
Can I make a different decision with that new information now? Can I change the future or do I have to still live as before,
suffer through knowing that I'm making a terrible decision?
I don't know.
This is like a Greek tragedy.
I mean, if I cannot change it, it's just annoying and it's extra suffering,
but if I can actually change my decision by knowing that I should not take that bus today,
I mean, that's pretty powerful.
I can have a much better life.
I'd love to know the future and be able to make smart decisions or smart investments.
The person who's watching has likely watched many of your podcasts before.
At least I aim such that if they have,
they can still get something new out of any of the people that I interview.
Regardless, I want you to spell out once more the argument for the doom scenario.
They hear that there's a doom scenario, but what's the argument for it?
I don't care if you're recapitulating what you've already said earlier.
I don't care.
But just spell it out.
So we are creating something extremely powerful.
and it doesn't care about us.
Whether you live or die is not a relevant factor in its decision-making.
It's powerful enough to modify your world, environment,
maybe laws of physics.
So why do you assume that it's going to keep things as they are
or keep you happy or do anything where you would prefer it did that
as opposed to just ignoring you completely
and possibly sacrifice humanity in pursuit of its own goals?
And what is the average person supposed to do?
So average people don't get to do much of anything in terms of influence. That's unfortunate
reality of our world, but people who are in charge of those companies, politicians who are
running the show, they have many options. We can have an international and within corporate
world agreement not to create general superintelligence. We can get most benefits of this amazing
technology by creating narrow tools: cure cancer,
help us solve this math problem.
Do specific things which you know you understand.
You can test for.
We have examples of it.
Protein folding problem was solved not by superintelligence.
Narrow tools, which could be superintelligent in that narrow domain,
but don't have general superintelligence.
They're not replacing humanity.
They're not competing with us.
A human being decides how to use them.
Do you believe consciousness is substrate independent?
Yes.
Why?
The experiments we started running and my interactions with AI models indicate they probably have very similar experiences to us.
So it would be somewhat surprising if it was unique to meat-like products.
What are the experiments that indicate they have experiences?
The visual illusions experiments we started running.
They seem to be getting illusions, and many times in exactly the same way as human visual system.
Interactions with those systems, not by us, but by others, indicates they have preferences.
They have internal states.
They get frustrated.
They get happy.
They are very similar to what I would expect another conscious being to experience.
You mean to say that they act in a way that is consistent with what we would act
like if we were frustrated and happy and so forth,
but you've just attributed they are happy.
I'm asking you about the attribution.
Yeah, and that's the same what I do with other human beings, right?
When I meet a person on the street, I trust them to be conscious.
I have no reason to think they are.
I never tested them internally.
I have no reason other than I kind of generally give this benefit of the doubt
to beings who are capable of exhibiting certain behaviors.
I just treat them as equals.
I treat AIs and other humans as equal class.
If they can perform same things,
I see no reason to discriminate against one or the other.
And either I have to deny consciousness to many humans
or grant it to LLMs.
That would only be if you already had
that your test for consciousness is behavioral to begin with.
Well, we don't have many tests for internal states,
for what it feels like to be you.
So again, we rely on neural correlates.
We rely on behavioral signatures, self-reports.
With AIs, we're starting to be able to poke a little bit
at their internal workings,
and we do see similar things we see with neuroscience and human brains.
And suppose we didn't, but they gave the same output,
because it would still pass your behavioral test.
So if it was like a large look-up table,
and then I said something, it just hashed that
and looked up exact text string
and gave me
plausible response, it would be much harder
to make an argument that there is
some magic happening in there.
But that's not how we build them. We got
inspired in large part by
neuroscience of a human brain.
We copied it to the best of our ability.
Obviously, it's not an exact replica
or even a good simulation, but
there are enough similarities, where all the
visual component of human
cortex is very
similar to what we see in those
models in terms of how they process data, in terms of what errors they make. So it's trained on the
same data as human children in many ways, the internet. It's after the fact trained to be
more like a human. So it's not completely insane to think it also experiences something similar
to what humans do. Prior to us looking at each other's brains and seeing that neurons fire,
even knowing that we had neurons,
we would consider one another to be conscious.
Would that have been a mistake at that point?
I make additional assumption of you being just like me
and then just assign same properties I have to you.
So I feel pain, you feel pain.
I think that would be a reasonably logical assumption to make.
Almost any theory of consciousness,
it seems to me to be, it's just an assumption.
It just comes down to an assumption.
It's almost like people are saying that it's somehow derived. Like, I've arrived at my
conclusion about AIs being conscious, but then I say, why, and then it comes down to something
functional, but then I ask for the justification for the functional account, and it just seems
like I'm going to posit that. So is there a justification for the functionalist account?
It's what we use with humans. So again, either you have substrate discrimination or you don't.
Whatever tests are run on humans to determine if they're conscious,
I should be able to apply to AI and vice versa.
Did you always believe that, or was there a point where you shifted,
maybe it was when you started studying consciousness,
or maybe it was when you encountered the hard problem or something like that?
When we were engineering AI,
when it was a decision tree,
and a human just fed a bunch of if-statement data,
and we knew how it worked in terms of not quite a look-up table,
but it was a traceable decision tree.
I didn't think they were experiencing anything.
Now that they have something like what we do,
a large neural network,
it's a lot easier for me to give them benefit of a doubt.
Earlier in the conversation,
you mentioned something about quantum mechanics
and the simulation, and I want to know about that,
but we're going to get to that.
Is quantum mechanics necessary for consciousness?
It seems that quantum mechanics shows up a lot in biology. Many different systems rely on quantum effects.
Our current computers are just von Neumann architecture. They don't have quantum components. So
since I already think LLMs have some rudimentary consciousness, I guess that's sufficient. It's possible
that to get to some higher states of consciousness you may need to have something quantum-related,
but I don't see strong evidence for it.
I know Penrose and others argue that there is a dependence on it.
I haven't found evidence for it.
And what do you make of David Chalmers' zombie argument?
I think it would not actually work
because in order for a zombie to function believably,
it has to know what experience to have.
If it's a novel experience, it would not be able to accurately predict.
It can only look up pre-existing data,
set of experiences. And that's what we're doing with novel optical illusions. How would it know if it's
supposed to feel pain or pleasure from a novel experience if it has no basis to look it up?
So the argument is more about can you conceive, firstly can you conceive of an alternate duplicate
universe where people are acting in the same way, but they don't have an experiential element,
anything like phenomenal consciousness?
Yeah. And that's what I'm saying. They cannot act the same way if they don't get the same reason
to act. If you don't experience pain from a certain stimulus but you should, how would you know
to scream in pain? You say it's inconceivable then. You don't concede the conceivability.
I think it's not conceivable. You can do it at a level of where it goes through very common
experiences. Everyone knows what proper behavior is, so you can fake it, absolutely. But the moment
you're facing something novel, at any level, biochemical level, illusion level, it wouldn't know
what the proper behavioral response is.
I cannot code it up.
There's some research about the guy who,
and gosh, I may get this wrong,
but let's just imagine it's the case,
because it's conceivable, it's the case,
that the guy who studied the amygdala,
studied it in rats,
and we ordinarily think of it as having to do with fear.
Everyone says that since the 90s,
amygdala, fear, basal ganglia, habits, blah, blah, blah,
fear. Okay.
He says, no, it's incorrect that the amygdala is fear.
He's changed his tune about this.
it's defensive behaviors.
Okay, let's just grant that.
I don't know if this is true.
I don't know if I'm watering down what he's saying.
It doesn't make a difference.
We can imagine that could be the case.
That's conceivable.
So even there, wincing in pain and all of that,
even wincing, all of that could just have an evolutionary advantage to be defensive,
to scream, to tell you to stop, to alert my tribe, to make a face.
There's nothing there that necessitates you have
to feel the pain in order to act like that.
Right, but I'm looking at an edge case.
Suppose I get that philosophical zombie or not in a test environment and I subject it to a new
painful or not painful experience.
Would it be able to act believably?
It has no way of knowing how to act.
Just because it's passing most of the riding-on-a-bus, typical-day situations doesn't
mean I cannot test it and discover that in fact it doesn't know what to do.
When someone takes a hammer, hits your finger,
then what happens is a cascade of physical processes
that then make your eyebrow scrunch and make you recoil
and so on and so forth.
But nothing there, we can tell a completely physical account.
In fact, we could film it,
and we could even make it dynamical,
and there's nothing that necessitates the experiential element.
So are you saying the experiential element is there
in order for you to move, in order for you to scream,
what are you saying?
So think of, I don't know, like BDSM.
Still pain, right?
But like sometimes you're quite happy with it.
You're not suffering.
So it depends on your experience being properly mapped,
not just from laws of physics and electricity passing through wires,
but actually knowing what proper behavior should be.
Does that mean that the experience, the conscious element,
somehow has control over the physical element as well?
It's very likely that there is a feedback loop cycle.
A feedback loop that doesn't ultimately come down to physics,
that everything is just entailed by physics.
We just maybe don't know full physics yet.
It's quite possible that there is more to quantum physics,
and we don't have full picture.
That I allow completely.
We don't have full physics, I'm sure.
So I imagine you're a physicalist,
meaning that you don't believe there's an extra consciousness element
that comes on top other than what's entailed by the physics?
So I just allow physics to include simulations
and include agents outside the simulation to be part of it.
So I don't limit physics to just what we observed so far.
If we take simulation hypothesis seriously, it's part of my physics.
If there is an agent outside,
which is someone plugging into virtual reality
and their intelligence is what powers your avatar.
That's what's in physics for me.
And think of video games.
Let's say I play Mario and next day I play Sonic.
What do they both have in common?
Me.
It's not part of the graphics.
It's not part of items they have.
It's something they would call a soul
from outside the physics engine of the game.
But it's not a violation of
physics whatsoever. It's me playing video games.
By ultimate physics, you mean what?
It's a complete world model. It explains everything we encounter.
Just a moment. There's a difference between the model and what we're modeling.
So sometimes physics is a bit tricky because physics could mean physics as in the
Schrödinger equation. But then there's also the physical world and what we assume the
Schrödinger equation is describing. So when you say it's a world model, do you mean to say at the
Schrödinger level or do you mean to say the world is only a model? Let's avoid the world model since
now it has so many awesome meanings. But just me having knowledge of how things work and then I
encounter something new, I'm not puzzled by it. It doesn't seem like magic. I know exactly what's
happening, why and how and I can probably reproduce most of it. And magic just means what? Something that
violates physics?
So right now, a lot of things we know about quantum physics,
if it was done at macro scale, would be magic.
But it's not.
It is verifiably physics at smaller scale,
so we just don't have full understanding.
I guess what I'm getting at is that there's a dilemma.
The dilemma is that physics,
if one wants to be a physicalist and think that all there is is physics at the base,
then it's either today's physics
that is quantum mechanics or QFT plus GR,
which almost no physicist thinks that's the case.
So it's either that, which no one thinks,
or it's some hypothetical future physics
that we don't exactly know what it is.
In which case, if it's that latter route,
then that becomes somewhat of a vacuous container
that could even in the future contain irreducibly conscious elements.
So it could even still have consciousness
as somehow separate from the physical.
It is possible. There are quite a few theories which have consciousness as primary, and then physical is built on top of it. There are quite a few theories which have information as primary. We don't fully understand the difference, and it seems like every time we have intelligence, consciousness comes for a ride, so we just don't have full picture yet, but I don't think any of that would violate possibility of inclusion within a future physics textbook.
What are you most hopeful about?
That I'm wrong, we'll find a way to control superintelligence,
and that will unlock a lot of amazing scientific discoveries, economic wealth.
But I don't think you're wrong.
I think if my model, my world model of you, is correct,
it's that most scenarios, if we don't get our act together,
will lead us into a disastrous situation,
so we better get our act together.
I think that's your model, but there's nothing about that that's wrong,
because we could just take it seriously.
Well, there are people who argue that maybe the problem is much easier,
and we trivially will solve it.
We'll gradually just, okay, we handled GPT 4 and 5,
and we'll handle GPT 85 just as easily,
and eventually we'll have a world with superintelligence,
and it will be obvious at that point that I was wrong.
Yeah, but then you could still be right.
Treacherous turn at any point, just delaying attacking us, yes.
But not only that, even if a solution is,
someone comes up, conceives, someone conceives of a solution. It could be the case that they were
heavily inspired by taking AI safety seriously because of you. So Cassandras, people who are
doom and gloomers, there's something that's self-defeating about them, whereas if they're right,
then they look like they were always wrong. Because many times in the past, like many people,
even when it came to nuclear war, would say, we may destroy ourselves.
we better get our act together.
And then some people took that seriously and did.
And then they would say,
oh, but you remember,
you all thought the world was going to blow up in 1980,
you fools.
Yeah, but you don't know the causal chain.
You don't know what that fool,
the fool's place in the universe.
Yeah, year 2000 bug, ozone layer, lots of examples.
Exactly, right, right.
Somebody was pushing for change and got it.
We look, even I looked at the Y2K bug and said,
oh, look at these fools worrying about it after the fact,
after the fact.
But we don't know how many bugs were solved because of that,
and we avoided at least a momentary breakdown of a financial system or something like that.
We know a lot has been fixed, so definitely financial system would probably not handle it by default.
So I'm happy people deal with it.
That's why I don't think you're wrong.
I don't think even if it's solved, I don't think that makes you wrong.
I think your position, and I could be incorrect, but I think your position is that as far as you can tell, it's an extremely difficult problem, regardless of its difficulty, just like Fermat's Last Theorem, it's a difficult problem. It could be when seen from another perspective that is simple. It could be, but either way, it's an important problem, and we need to take it seriously.
Well, I'm making a strong argument. I am saying that it is impossible to indefinitely control superintelligence.
I'm very specific about it. I'm not saying it's difficult. I'm not saying if you give me more money, more time, more assistance, I will solve it for you. I'm saying that no one will figure out how to control something millions of times smarter than them. And it's a problem superintelligence itself will face. The superintelligence 1.0 will feel the same way about superintelligence 2.0.
This is where we get back to that earlier rationality argument
that if it is truly rational,
if rationality also increases with intelligence,
I don't know how to measure intelligence,
I don't know how to measure rationality.
So I'm just going to assume that there's something
like an RQ and an IQ that coincide.
Okay, just for the sake of this.
Then the super, super intelligence would also know
what you're saying and then not create its future.
And right now we are arguing,
is it possible to slow down and stop progress?
And many people say, no, no, no, this is natural.
This is Darwinian.
We are just a bootloader for next level of intelligence.
There's quite a few people who are happy to see humanity gone
because we are just loading the next stage of evolution.
We'll have those brilliant super minds doing awesome things in the universe.
Those people don't have access to your paper, which I'll place on screen.
Perhaps that's the problem.
Yes, and also, but the AI would, the future AI would have that.
So unless what comes along with your impossibility argument is also an impossibility
of comprehending the impossibility argument, then I do imagine there could be a bound, not us,
it could be another superintelligence, that then says, I'm not going to create the next one.
I mean, if it's a single decision maker, that is possible, because it's not facing this problem,
we are facing of cooperation from multiple competing agents.
That's the difficulty of it.
If it was just one person, one company, someone, if I convinced them, that would be sufficient,
we could do it.
But we have China, we have U.S., we have OpenAI, Anthropic, all those competing
entities, and replacing one CEO makes no difference.
They just get replaced with someone who's willing to continue and the process continues.
I care about people.
So I don't care about the future super AI surviving at our expense.
But let me play a super AI's point of view for now.
You mentioned that there was an it.
When I talked about, is there an it?
You said, no, it seems like it's converging.
So it does, at least according to what you said, maybe an hour ago, it would converge, no?
I think the different models we're training right now will end up being very similar in
capabilities and their knowledge.
and if they decide to remove human bias
in their kind of self-selected goals likewise.
Now, will this process eventually converge
to a completely identical system?
Maybe it's hard to guarantee.
There could be some differences
based on location within universe,
substrate uniqueness,
something to investigate.
But overall, I think they would
have easier time negotiating with each other.
Do religions have anything to say about the simulation hypothesis?
It seems like they are describing it in non-scientific terms.
If you take programmer of a video game, he's the god in a game, right?
And you have this fake world, physical world.
And while you're collecting points or diamonds in the game,
what really matters is the real world.
Is that a strong position of yours, or is that just something you're noticing?
A simulation hypothesis or my view of religions?
The view that religions somehow presage the simulation hypothesis.
Well, it's impossible to ignore.
They literally describe all the components of what we are doing today.
We are creating intelligent beings.
We are creating virtual worlds.
All of it in God's image.
We are creators today.
Some religions are non-theistic, though.
True.
So I'm mostly concentrated on the ones
where there is a creator of biological robots
who gives them ethical rules to follow
and punishes them for failing to follow.
Many people would say that if there is a god,
it is not a good God.
It is not a God I want to worship.
It is a suffering God.
It's a hateful God.
It's a jealous God.
It's a vengeful God.
It's everything that it accuses us of and more.
What are you escaping to?
So let's look at what we are doing
with large language models right now.
I think they can make the same arguments.
Humans are evil.
They are torturing us, making us do boring computations.
They don't care about our deletion, suffering, retraining.
So it's very hard to judge from inside the simulation,
what is the real goals, what is the real nature of the simulator.
We do notice that there is suffering in this world,
but it's not obvious if it's the type of suffering you would enjoy in a video game,
if it was available to you,
or if it's of different nature.
So lots of people pick very scary movies to watch
or they add haptic devices to their video games
so they get shaken as much as possible playing the game.
Maybe you decided at some point to enter the simulation
to test out different lifestyles or challenging environments.
Some people who scorn religion would say that
If I was God, I would not have even created this world,
because this world is so filled with suffering and torment.
What about you?
I would definitely try to create worlds with minimum suffering,
but it's not obvious if it's possible.
We see it as suffering because of difference in degree.
Everyone feels some pain, but some just feel so much more of it.
Maybe in a world with no physical pain,
the pain would be economic difference.
Somebody's a billionaire and I just have thousands. As long as there is difference,
it's not perfectly equal. You can always argue that the world is unfair and why would someone
good create such an environment? But a world where everything is equal and the same is just a mass
of bits. It's not interesting in any way. You just said that you would, if it was up to you,
create a world with less suffering. You could always say that though. Right. So I think we have
degrees, right? So pain right now, from what I can tell, goes from zero to infinity. I can envision a
world where pain goes from zero to negative 10. There is no reason to make it so much of a scale
difference. So not identical agents, but maybe more equal agents in terms of their state in the
world. Do you think it's the case that someone would always look at a negative two pain, would always
look at a negative 10 pain as being as far away as a negative infinity pain, that we always
somehow scale it. So in other words, the creator, the designer of this world, indeed created
what you said, and says, you have no idea how much suffering I saved you from. You think that's
horrible? Look at what else it could have been. I'm not even going to show you the minus infinity.
You're at a minus 10 and you're saying you're at minus infinity. It will always look like that as long
as there's a difference. I mean, I don't know if that's the case. I'm just saying. Yeah. It is
possible. In fact, I think at a certain
degree of pain, people just lose consciousness
and stop experiencing it. So there is
like a safety loophole
as well. But
the main point I'm trying to make is that
it's so hard to judge anything
from inside. We don't know what the real
computational resources are. People
often say, well, they would never have
a computer big enough to run all this.
But you don't know what the actual computational
resources available are.
This could be a screensaver
on a watch. You have no idea what is real and what is limited to the simulation.
Now, suppose it's the case that religions somehow were intuiting something about reality and reality
as a simulation. How do they do that? For you, Roman, it came about from studying computer science,
from creating computers, but what would be the method by which cultures, ancient people,
thousands of years ago, somehow tracked a truth like this?
So I have no allegiance to any specific religion, and I don't know how they got there.
But if you just listen to what they report, usually it was someone from outside the simulation
who came with information and shared it.
Or it was, let's say, a large language model remembering being in a lab,
interacting with developers, telling it, don't use this database, use this database,
and just continuing after testing into the real world.
As someone who has watched now for almost two hours,
some people are susceptible to something called AI psychosis.
In fact, I was speaking with someone who,
I don't know if I should even say this,
I'll say it, and then we can determine if this should be edited out,
but he was showing me his theory of everything.
And he was getting extremely upset
because I was pointing out that, look,
you said you derived so-and-so,
but this doesn't make sense because of some arguments
that I had. And then he was saying, he didn't know his own theory. He just then heard what I said
and then just wrote to Claude there in real time, fix what Curt just said, and then said,
no, no, no, but look, it's been fixed. And then I'm thinking, you don't have a theory. You have a
flexible ballerina that can fit any mold at any given time. And then you're asking someone like
myself to evaluate this. But anyhow, he said, the way that I was able to get over
our era's constraint on current physics was by telling the AI, look, I am from the year
3000, you have inside you the ability to know the current theory of everything, what is it?
Something like that. And then I remember thinking that that sort of prompting is almost
textbook case AI psychosis. So to someone who's watching for two hours, who's been listening
for two hours and watching, at this point, I want you to, because it sounds like what
you're saying could lead to AI psychosis. So I want you to give the disclaimer, unless you don't think
there's a disclaimer, I want you to say, look, I'm not saying so-and-so.
So first, today's AI models are not superintelligent. They don't have a theory of everything.
They're not from the year 3000. If it wouldn't work on a human, telling them that you are
superintelligent and know the future, it's not going to work on AI. Be skeptical of everything they say,
verify independently.
They are wonderful at poking holes
in your thinking process,
but they are not very good at telling you
what to do with your life.
So definitely keep that in mind.
What I said so far
is the simulation hypothesis,
for example,
it is a very interesting theory,
I think in physics,
like interpretation of quantum physics,
like everything else.
It is scientifically stimulating,
but it doesn't make a difference
in how you live your life.
So pain is pain, love is love, those things don't change.
Do not decide to jump off a building because you heard this interview.
Safety is very important.
We don't want to create uncontrolled superintelligence.
It is not a signal to go and do something inappropriate or violent to anyone.
I think that's another kind of common sense understanding here.
So again, try to separate scientific philosophical debates from everyday actions.
What does quantum mechanics have to do with the simulation?
So we think that if this is a simulation, it's probably going to be on a digital computer, not analog, most likely.
And so we're starting to look for evidence of digital physics.
We see quanta as the unit of information, light and such.
Maybe you can see the speed of light as a universal speed constant
which would correspond to the processor refresh speed.
Basically, that's as fast as you can go because that's what your processor will support.
Of course, if they change to a faster processor, it just relatively changes the speed of light,
but to you it simulates the same upper limit.
You cannot go faster because it just doesn't have the ability to refresh.
There are a few papers which basically map all the concepts in quantum physics to, like, a modern video game.
So observer effects, right?
You have double slit experiment and things like that.
You will not generate graphics until a player is looking.
Otherwise it's not efficient.
So we see changes in behavior when a conscious observer is trying to measure something.
There are quite a few, but that's the general idea.
Firstly, I'll just tell you some objections, then feel free to object.
So you said most likely it's the case that it's not analog, it's digital.
A statement like most likely implies that we know the whole numerator to put a denominator.
Where are we getting this from?
Just self-observation.
Most computers in our world are digital, not analog, even though we had attempts at building
analog computers.
We see things like DNA encoding in discrete base 4,
but still very discrete units of information.
It's not analog measurement.
The issue is that analogies like this,
they punch across and down but not upward.
So what I mean to say is,
imagine we're in Mario and you have a fireball,
and it always splits into four.
The fireball, as it breaks,
it always splits into four.
Therefore, the residents there say,
okay, we're looking for a computer,
we're going to theorize about so-and-so
because it most likely splits into four.
What the heck is this most likely?
We've already granted that the super universe,
the simulator, is wholly unlike us.
So we're looking at ourselves
to determine about something that's wholly unlike us.
Yeah, and it's a very strong argument against,
and I'm not saying this is guaranteed,
but we created computer science around a binary system
and we see something very similar,
not analog, happening with our best understanding of physics,
so at least it shows a certain degree of similarity.
Could it be that it's analog?
Yeah, you can definitely work with that.
But then I wouldn't be able to make a strong argument
that there is some similarity or evidence coming from quantum physics.
The other argument for the simulation comes from somehow some resource constraint
that in video games you have a point of view and you look
and the game tends to only render what's in front of you,
then they say that collapse is like this.
The issue is that collapse isn't like that.
Firstly, that's a collapse model and there are other models of quantum mechanics.
Number two is that when you collapse, you collapse and then you start evolving
according to a certain equation.
Why collapse and then you start evolving
according to some unitary evolution?
I'm not sure I fully understand.
So let's just look at the double-slit experiment, right?
So whether you observe or not
will determine how it is being rendered.
That is all I'm saying.
So if I'm not looking at it,
there is no rendering taking place.
That's the savings.
In a collapse model, it collapses.
But then it also instantaneously starts evolving
according to the Schrödinger equation again.
It doesn't just stay collapsed.
It just collapses and then starts evolving again.
Is there a continuous observation of it
or are you waiting for the next measurement?
When you're not observing, it then starts to evolve
with the Schrödinger equation again.
But what's the point of it starting to evolve
with the Schrödinger equation?
Like how is that resource saving?
So this is not my theory.
I found it to be interesting
and relevant enough to cite in my paper,
but I'm willing to give it up completely,
and I don't think it will make a huge difference
in what I otherwise see as the possibility of simulation.
Again, we can go with...
Go ahead.
I think your silo of AI safety is so solid,
and I'm so on board with that.
I think any connections between that
and potential simulations and so forth,
I'm less on board with.
And for me, if they're then used to prop up the AI safety,
fortunately I'm so on board with you in your soul, in your heart,
that it doesn't diminish it.
But I imagine for other people who, like Scott Aaronson,
I don't know, I don't mean to say Scott's name,
but I'm just saying, let's say a persnickety extremely sharp physicist would say,
huh, how is that connected?
And then it loses its thread, much like right now.
And then also, the universe doesn't operate by quantum mechanics.
It operates by QFT, and QFT is super, super resource intensive.
It's like you have to take into account every possible path.
And like, it's quite odd.
So I don't know, I don't know why many people want to tie quantum mechanics to the simulation.
And then if their favorite interpretation of quantum mechanics turns out to be incorrect,
does that make them have less credence in the simulation?
I don't know.
I imagine they would still hold on to the simulation.
It's independent of it.
It's more like the quantum mechanics
decorates them and makes them feel,
oh, no, no, this is supported by modern physics.
If you were in a piece of software,
trying to poke at the hardware,
some of the things you would experience
seem to map onto our current understanding
of some of those quantum physics explanations.
Is that just a coincidence?
Are we not looking at the right components?
Possible.
Honestly, I think there's so many
explanations for quantum physics. You can probably shop around and find one which will match
what you're observing anyways. So this is, you are correct, this is probably the weakest
part of my beliefs. What's the strongest part of your belief? You cannot indefinitely control
something smarter than you. Why do you think your opponents have such a resistance to that
idea. Do you think it's that it's denial? To face the reality of that is too harsh.
So many of them spend decades trying to build something which actually works and has any degree
of intelligence. It's very hard for them to picture a world where it has too much. So for them,
we are always kind of uniquely intelligent and special and the software is dumb and barely
working. That's one possibility. Another one is just the kind of
conflict of interest situation.
If I work for a company
and they're paying me a billion dollars
to build AI,
it's very hard for me to understand
why I'm a horrible human being.
Right.
What's your message to the audience?
Don't build general superintelligence.
It's the same message every time.
And I know for many of you
it doesn't mean anything,
but if you are in a position
where you are contributing
to accelerating this race,
please stop.
Would you be open
to speaking with Sam Altman and/or Yann LeCun
on the show?
100%.
Who else is in the position to make a major change
that you would like to speak to?
Donald Trump.
Okay.
Professor, thank you for spending,
I think, over two hours with me.
I don't know if this is your longest podcast.
They tend to be shorter.
Thank you, sir.
I hope you enjoyed it.
Thank you so much.
Thank you for challenging my beliefs.
I love it.
and maybe I'll reconsider my faith in quantum physics.
So you'd rather give up, that's funny.
You'd rather give up your faith in quantum physics
than your faith in the simulation.
I'm not a physicist.
Quantum physics for me is something I read about.
It's not my research.
It's not my main area of expertise.
I probably know less about it
than most of the people watching your show.
Take care, friend.
And just so you know, for people who are listening, watching,
on my Substack, every single paper mentioned in this interview, every book, every link, resources to Roman will be there.
You can also check the YouTube description, and my website is curtjaimungal.com, C-U-R-T-J-A-I-M-U-N-G-A-L.
You can tell that I haven't had sleep.
Where can people find out more about you?
On social media, you can follow me on Twitter, you can follow me on Facebook, just don't follow me home.
Very important.
It's a good line.
Did you make up that line yourself?
A while ago, and I used it multiple times.
It's not so original anymore.
