Theories of Everything with Curt Jaimungal - The Theory That Shatters Language Itself
Episode Date: June 13, 2025

As a listener of TOE you can get a special 20% off discount to The Economist and all it has to offer! Visit https://www.economist.com/toe

Professor Elan Barenholtz, cognitive scientist at Florida Atlantic University, joins TOE to discuss one of the most unsettling ideas in cognitive science: that language is a self-contained, autoregressive system with no inherent connection to the external world. In this mind-altering episode, he explains why AI's mastery of language without meaning forces us to rethink the nature of mind, perception, and reality itself...

Join My New Substack (Personal Writings): https://curtjaimungal.substack.com
Listen on Spotify: https://open.spotify.com/show/4gL14b92xAErofYQA7bU4e

Timestamps:
00:00 The Mind and Language Connection
02:09 The Grounded Thesis of Language
09:29 The Epiphany of Language
13:06 The Dichotomy of Language and Perception
16:24 Language as an Autonomous System
19:48 The Problem of Qualia and Language
23:35 Bridging Language and Action
31:32 Exploring Embeddings in Language
38:21 The Platonic Space of Language
44:17 The Challenges of Meaning and Action
51:05 Understanding the Complexity of Color
52:53 The Paradox of Language Describing Itself
58:19 The Map of Language and Action
1:07:48 Continuous Learning in Language Models
1:11:46 The Nature of Memory
1:22:46 The Role of Context
1:32:18 Exploring Language Dynamics
1:39:44 The Shift from Oral to Written Language
2:11:34 Language and the Cosmic Whole
2:21:35 Reflections on Existence

Links Mentioned:
• Elan's Substack: https://elanbarenholtz.substack.com
• Elan's X / Twitter: https://x.com/ebarenholtz
• Geoffrey Hinton on TOE: https://youtu.be/b_DUft-BdIE
• Joscha Bach and Ben Goertzel on TOE: https://youtu.be/xw7omaQ8SgA
• Elan's published papers: https://scholar.google.com/citations?user=2grAjZsAAAAJ
• AI medical panel on TOE: https://youtu.be/abzXzPBW4_s
• Jacob Barandes and Manolis Kellis on TOE: https://youtu.be/MTD8xkbiGis
• Will Hahn on TOE: https://youtu.be/3fkg0uTA3qU
• Noam Chomsky on TOE: https://youtu.be/DQuiso493ro
• Greg Kondrak on TOE: https://youtu.be/FFW14zSYiFY
• Andres Emilsson on TOE: https://youtu.be/BBP8WZpYp0Y
• Harnessing the Universal Geometry of Embeddings (paper): https://arxiv.org/pdf/2505.12540
• Yang-Hui He on TOE: https://youtu.be/spIquD_mBFk
• Iain McGilchrist on TOE: https://youtu.be/Q9sBKCd2HD0
• Curt interviews ChatGPT: https://youtu.be/mSfChbMRJwY
• Empiricism and the Philosophy of Mind (book): https://www.amazon.com/dp/0674251555
• Karl Friston on TOE: https://youtu.be/uk4NZorRjCo
• Michael Levin and Anna Ciaunica on TOE: https://youtu.be/2aLhkm6QUgA
• The Biology of LLMs (paper): https://transformer-circuits.pub/2025/attribution-graphs/biology.html
• Jacob Barandes on TOE: https://youtu.be/YaS1usLeXQM
• Emily Adlam on TOE: https://youtu.be/6I2OhmVWLMs
• Julian Barbour on TOE: https://youtu.be/bprxrGaf0Os
• Tim Palmer on TOE: https://youtu.be/vlklA6jsS8A
• Neil Turok on TOE: https://youtu.be/ZUp9x44N3uE
• Jayarāśi Bhaṭṭa: https://plato.stanford.edu/entries/jayaraasi/
• On the Origin of Time (book): https://www.amazon.com/dp/0593128443

SUPPORT:
- Become a YouTube Member (Early Access Videos): https://www.youtube.com/channel/UCdWIQh9DGG6uhJk8eyIFl1w/join
- Support me on Patreon: https://patreon.com/curtjaimungal
- Support me on Crypto: https://commerce.coinbase.com/checkout/de803625-87d3-4300-ab6d-85d4258834a9
- Support me on PayPal: https://www.paypal.com/donate?hosted_button_id=XUBHNMFXUX5S4

SOCIALS:
- Twitter: https://twitter.com/TOEwithCurt
- Discord Invite: https://discord.com/invite/kBcnfNVwqs

#science

Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
I'm gonna get attacked by physicists.
This thing is just ridiculously good.
And so that just blows my mind.
Professor Barenholtz completely inverts
how we understand mind, meaning, and our place in the universe.
The standard model of language assumes words point to meanings in the world.
However, Professor Barenholtz of Florida Atlantic University
has discovered something deeply unsettling.
They don't.
Language is actually deconstructing itself.
Most startlingly, he argues that our rational linguistic minds have severed us from the
unified cosmic experience that animals may still inhabit.
I don't think there's a static set of facts.
What we've got is potentialities.
Most current LLMs operate with purely autoregressive next token prediction,
operating on ungrounded symbols. All of this terminology is explained, so don't worry,
this podcast can be watched without a formal background in psychology or computer science.
In this conversation, we journey through rigorous explorations of how LLMs work,
what they imply about how we view the world and the relationship between our
consciousness and the cosmos.
Professor, you have two theses.
One is a speculative one and the other is more grounded.
You even have another more hypothetical one atop that, which we may get into.
Why don't you tell us about the more corroborated one and then we can move to the contestable
parts later.
Okay, sure.
So yeah, I would call them sort of the grounded thesis
and then sort of the extended version of that,
if we can call it that.
The grounded thesis is primarily about language.
And the thesis is that human language is captured by what's going on in the large language models.
And I mean not in terms of the specific exact algorithm by which large language models like ChatGPT are actually generating language, but that the core sort of mathematical principle that large language models like ChatGPT run on is what's happening in the brain and in human language.
And really, the reason I say it's corroborated is because ultimately this isn't even about
the brain, it's about language itself.
And I think what we have learned in the course of being able to replicate language in a completely
different substrate, namely in computers,
is that we've learned properties of language itself.
We've discovered.
It's not through clever human engineering that we've been able to kind of barrel our
way towards language competency.
It's that with actually fairly straightforward mathematical principles done at scale, we've
actually discovered that language has certain properties that we didn't know it had before.
And so the incontrovertible fact, in my opinion, is that language itself has certain properties.
Now that we know it has those properties, my claim is, the sort of corroborated claim
is that those properties force us to conclude that the mechanism by which humans generate
language is the same as what's going on in these large language models.
Because now that we know that language is capable of doing the stuff that it does, now
that we know it has the properties to, and I'm sort of giving away the punchline, to self-generate based on its internal structure. It's unavoidable to think
that we are using the same basic mechanism and principles because it would be extremely
odd to think that we have a completely different orthogonal method for generating language. Put differently, if we are using completely different mechanisms
than the language models, then it's extremely unlikely that the language models would work
as well as they do. The fact that language has this property that it can self-generate,
the fact that that property actually leads to human-level language, to me, forces the
conclusion that there's only one way to do language.
And that one way is the same in humans and in machines.
The obvious question that's occurring to the audience as they listen right now is, how
do we know that whatever mechanism is being used by LLMs isn't just mimicry?
Right.
And so that's sort of the critical question.
Is this mimicry?
Right.
Is what the models are doing, in a sense, learning a kind of roundabout
technique that captures some of the superficial components of language in humans,
but ultimately it's a completely different approach. And so, you know, my argument is
really from the fundamental simplicity of these models. So let's just talk really quickly about how large language models work, things like ChatGPT.
What they're doing is learning, given a sequence.
Let's say the sequence is, I pledge allegiance to the, and then the model is being asked
to do this thing called next token generation.
What's the probable next word? We'll say word for the purpose of this conversation.
We're going to call tokens a word. Token is the more technical term about how you chop up
and encode the information in a sequence of language, but we're just going to say word.
So guess the next word based on that sequence.
So then what you do, in these models, is you train them to guess simply that.
All you give it is: here's a given sequence.
It can be a sentence, it can be a paragraph,
it can be, frankly, an entire book,
depending on how big your model is, how much it can handle.
And then guess just the very next word.
And what we've discovered, and I really want to use that word in particular,
because it was by no means a given that this would work,
what we discovered is if you train a model to do that,
to simply guess the next word, then take that word, tag it onto the sequence, and feed it back
in, this is sufficient to generate human-level language.
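To make that recipe concrete, here is a minimal sketch in Python of the loop being described: count next-word statistics from a tiny toy corpus, then generate by repeatedly guessing a next word, tacking it onto the sequence, and feeding it back in. The corpus and the bigram simplification are illustrative stand-ins, not the actual models discussed; real LLMs learn far richer conditional distributions over tokens.

```python
import random
from collections import defaultdict, Counter

# Toy corpus standing in for the training text (illustrative only).
corpus = "i pledge allegiance to the flag . i pledge allegiance to the republic .".split()

# "Training": count how often each word follows the previous one (a bigram model,
# a drastic simplification of what a large language model learns).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev` in the corpus."""
    counts = next_counts[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Autoregressive generation: guess the next word, tag it onto the sequence, feed it back in.
sequence = ["i", "pledge"]
for _ in range(8):
    sequence.append(next_word(sequence[-1]))
print(" ".join(sequence))
```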
Now the reason I believe that this demonstrates something not about our engineering or even
about the models themselves, because there's different ways you might build a model that
can do this, is because this very simple trick, this simple recipe
of simply guessing the next word turns out to be sufficient to generate language at human levels,
to the point where there really are no benchmarks, no standard benchmarks that these models aren't
able to do. And so what that suggests to me is just by learning the predictive structure of language,
you're able to completely solve language. That means that that is likely to be the actual
fundamental principle that's built into language in order to generate it. If we
had to come up with a very complex scheme, for example, you know, syntax
trees, complex grammar, long-range dependencies that we had to take into
account, and through enough compute
we were able to kind of master that, then I might argue, well, you know what we're doing is
possibly figuring out a roundabout way to capture all this complexity. But it's the simplicity itself
that simply being able to predict the next token, the next word, is sufficient to do all of this long-range thinking to be able to take an extremely long sequence and then
produce an extremely long sequence on the basis of that, that suggests to me
that we discovered a principle that's actually already latent in language. That
we just had to throw enough firepower at it but with an extremely simple
algorithmic trick and then language revealed its secrets. So to me, this really suggests that there's still a lot of science that needs to be done
and this kind of work that I'm doing in my lab in terms of really being able to hammer
down how the brain is instantiating this exact same algorithm.
It's not going to look exactly like ChatGPT.
It's not necessarily going to be based on what are called transformer models, which is something we can get into a little bit.
But as far as the core principle of prediction of the next token goes, the fact that that solved language so handily really argues, to me, that that is the fundamental algorithm: the algorithm that, when you apply it, boom, language emerges.
If you just have the corpus, you have the statistics, and then you do next token prediction, language emerges, just like adding water.
The fact that it emerges so readily from that, without having to do anything complicated, suggests to me that it's latent within language in the first place, and that language is designed, in a sense, to be able to be generated through this simple predictive kind of mechanism.
Okay. So, Elan, you and I have spent several days together. In fact, you're in the video with Jacob Barandes and the Manolis Kellis one; we'll place that on screen and I'll put a pointer to you. And you were in the background of the interview with William Hahn as well.
Always in the background, never in the foreground.
Here we are.
Okay, well, yes, great.
You had a large epiphany that occurred to you at one point.
We spoke about the software and this precipitated this entire point of view of language as a
generative slash autoregressive model or what have you, tell
me about it.
What the heck was that big idea?
So it wasn't so much an idea as an epiphany, a realization.
And it hit me in a single moment.
And it wasn't necessarily about autoregression.
It wasn't about this finer detail of how ultimately language models, and I believe the brain,
solve this problem.
It was the realization that any model that has been trained, any model that anybody has
built that accomplishes human-level language.
So it might be based on autoregression, it might be based even on diffusion,
which is kind of the arch nemesis of my
autoregressive theory. But regardless,
the fact is that these models are being trained
exclusively on text data. And
so all they are learning is the relations between words.
To the model, as far as the model is concerned, the words are turned into numbers.
They're tokenized. We think of them as numerical representations.
But those numbers, and for our purposes we could think of them as words, don't represent anything.
There is nothing in the model besides the relations.
Relations just between the words themselves.
There isn't, for example, any relation between any of the tokens and something external to them.
What we as people tend to think words are doing, when we're discussing topics or thinking about words in our head, is that they symbolize something, that they refer to something.
This is a lot of the philosophy of language,
a lot of the scientific study of linguistics
is being concerned with semantics.
How do words get grounded?
How do they mean something outside of themselves?
And what large language models show us
is that words don't mean anything outside of themselves.
As far as generation goes, as far as the ability
for us to have this conversation and as far as the model's ability to produce meaningful responses
to just about any question you can throw at them, including writing a long essay on any topic,
including a novel topic that it's never encountered, is by stringing together sequences
based on simply the learned relations between words.
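As an illustration of "relations between words" and nothing else, here is a small hedged Python sketch: the only thing the system has is where each token sits relative to the others, measured here with cosine similarity. The vectors are invented for the example; real embeddings are learned and have hundreds or thousands of dimensions.

```python
import math

# Hypothetical, hand-made embeddings (real ones are learned, not assigned like this).
embeddings = {
    "red":    [0.9, 0.1, 0.0],
    "orange": [0.8, 0.2, 0.1],
    "desk":   [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: how aligned two word vectors are, ignoring their length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# "red" relates to "orange" far more strongly than to "desk" -- purely a fact about
# positions in the space, with no reference to any percept of color.
print(cosine(embeddings["red"], embeddings["orange"]))  # high
print(cosine(embeddings["red"], embeddings["desk"]))    # low
```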
And so this really hit me very, very hard.
I've long been puzzled by, as many are,
by the mind-body problem, the phenomena of consciousness,
the problem of how do we know your red is my red?
And actually the moment that I had this realization was related to this very question.
I just realized that the word red doesn't mean what we mean by qualitative red.
The qualitative red is taking place in our sensory perceptual system.
The word red, to a large language model, can't mean that.
It can't mean any color. It has no color phenomena.
It has no concept of what sensory red would mean.
Yet it is able to use the word red with equal ability,
with equal competency, just as well as I can,
if we're just having a conversation about it.
And so what this means is that within the corpus of language,
the word red doesn't mean something external to itself.
Instead, the word red simply means where does it fall in the space of language itself?
Where does red fall in relation to other colors, in relation to the word color, in relation
to other concepts, other, well, frankly, just words, tokens that are related to what we
call concepts that have to do with color and
have to do with the word red. So yeah, so this epiphany was about this
extraordinary dichotomy, this divide between language and that which we think
language refers to. The question is how does language refer and the answer is it
doesn't. Language doesn't refer in and of itself. Language is an autonomous system. It's a self-contained system. It has
the rules contained within it to generate itself, to carry on a conversation.
Large language models don't know what they're talking about in any real sense.
They can talk about, you know, a sunset. They can talk about a taste. They can talk about
all space and time and all of those things. And yet we would say they have no idea what
they're talking about. And we'd be right in the sense that they don't have a notion of
red beyond the token and its relation to other tokens. Now this then raises the obvious question,
well, what do I mean by what red is about?
Don't I think red refers to a quality of perception?
And the answer is, I do have a quality of perception.
There is something called red that my sensory system is aware of.
And then there's a token called red that is used in conjunction with, there's a sort of coherent mapping between my sensory perception of red
and the linguistic red.
But that doesn't mean that you need to understand what that word refers to.
You don't need to have the sensory qualitative concept of red in order to completely successfully use the word red.
And so these are compatible but dichotomous systems.
The sensory perceptual system and the linguistic system are ultimately, we can think of them
as essentially distinct and autonomous but compatible, interacting as required.
Integrated?
But they're integrated.
They're running alongside each other,
they're exchanging messages so that we can have
a single organism that is successfully navigating the world,
and able, for example, to communicate.
I see something red that's registered in my brain.
I have a qualitative experience of red. It's remembered as having a certain quality. And then later on I say,
oh, you know, could you go pick up that red object for me? And so we are, there's a handoff
between the perceptual system and the linguistic system. Just that the linguistic system can now
successfully send a message to you. Now you've got the linguistic system.
You can talk about that.
Oh, okay, you told me there's a red object.
Are there multiple objects?
Yes, there's multiple objects.
They have different colors.
You're looking for the red one.
Maybe it's a dark red.
I'm doing this all linguistically.
Now you're able to go into the room
and successfully get the right object.
So again, the handoff happens the other direction.
Language is able to hand off to the perceptual system,
and the perceptual system is able to then detect that there's something with the right quality.
But that's not the same thing as saying that the language contains the reference inherently
within it. It simply means that these are communicative systems, that they can exchange
information, that they integrate with one another in terms of forming coherent behavior. But language is its own beast.
It's its own autonomous system.
It can run on its own.
That was the big realization.
Large language models prove it.
That language is able to produce the next token and by virtue of the next token, the
next sequence.
And that means all of language without having any concept of reference.
The reference has no place there.
There's no way to kind of squeeze it in.
If your computational account is the one that I'm proposing, if the computational account
is essentially prediction based on a next token based purely on the topology, the structure,
the statistical structure of language, then there's no way to cram any other kind of grounding or any sort of computational
feature in there at all. It has to be something closer to what the large language model is doing: prompting. You can imagine a camera that generates a linguistic description of what's in a room, and then you could ask your language model about it. And you can, by the way, do this right now; they're able to do vision.
You can take a picture and feed it to the large language model.
What's happening is much closer to generating a prompt, basically saying, here's what's in the room.
And now based on these features, these scripts now run the same exact language exclusive model.
And so language takes care of itself. It doesn't need grounding in order to be able
to do everything it does.
It doesn't have to have concepts outside of itself.
I think that's basically been proven
by these text only large language models.
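A rough sketch of the hand-off being described, under the assumption of two hypothetical functions: a `describe_image` step that turns pixels into text, and a text-only `generate` step that continues a prompt. Neither is a real library API; the point is only that the language side ever sees nothing but words.

```python
# Both functions are hypothetical placeholders, not a real library API.

def describe_image(image_bytes: bytes) -> str:
    """Stand-in for a vision component that turns pixels into a linguistic description."""
    return "A small room with a desk, a red mug, and a microphone on a boom arm."

def generate(prompt: str) -> str:
    """Stand-in for a text-only language model continuing a prompt."""
    return "The red mug is on the desk, next to the microphone."  # canned reply for the sketch

def answer_about_scene(image_bytes: bytes, question: str) -> str:
    # The hand-off: perception is reduced to text, and from there on
    # the language system runs entirely on words.
    description = describe_image(image_bytes)
    prompt = f"Scene: {description}\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer_about_scene(b"...", "Where is the red mug?"))
```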
So that was the big epiphany.
The big epiphany was that, oh, language is autonomous.
Language is self-generating.
That means it's a dichotomous computational system.
It's independent of the rest.
What this leads me to believe is, okay,
well, if it can live in silicon in this way, then perhaps,
and now I've come to believe very strongly,
that it likely runs in the same way in carbon,
in biology, in our brains.
Okay. So you're not denying consciousness and you're not denying qualia.
No. I want to make this very clear.
My personal opinion on this is beside the point, to some extent; you can be an eliminativist if you want.
Although I think everything I'm saying has a lot of bearing on this.
But I believe my account is strictly an account of language.
I think that perceptual mechanisms that give rise to qualia, things like redness and heat
and taste and all of these are basically processes
that take place long before the handoff.
And so what happens is, think about the camera,
the camera is transducing light,
it's measuring certain wavelengths,
then there's a lot of visual processing that has to happen
before you get to the point where it's turned
into a linguistic friendly embedding, right?
The stuff that an LLM can see, a multimodal LLM can see.
And so all of that processing that happens is what I think gives rise to a qualitative experience.
We experience redness because of all of this very sort of analog,
probably non-symbolic kind of representation.
And then at the end of that process, there is a conversion.
By the way, by the end of the process,
a lot of things happen.
We also respond to colors and to light
and all of that non-linguistically.
But part of the end,
sort of we could think of different endpoints.
One of those endpoints is here's a handoff to language.
And by the time language gets it,
it's long past that initial process,
that kind of sensory and perceptual processing
that gives rise to qualitative phenomena.
So I strongly believe that there is, in a certain sense,
the word hard problem is a little loaded.
I believe there's undeniable qualia.
But what I also think is that language is poorly equipped.
It's simply unaware in some sense
of the underlying mechanisms that give rise
to what it receives at the far end,
at the sort of the end point of that qualitative processing.
Just a moment. Don't go anywhere. Hey, I see you inching away. Don't be like the economy. Instead,
read The Economist. I thought The Economist was just something that CEOs read to stay up to
date on world trends. And that's true, but that's not only true. What I found more than useful for myself,
personally, is their coverage of math, physics, philosophy, and AI. Especially how
something is perceived by other countries and how it may impact markets.
For instance, the Economist had an interview with some of the people behind DeepSeek the week
DeepSeek was launched. No one else had that. Another example is The Economist has this fantastic article on the recent dark energy data,
which surpasses even Scientific American's coverage, in my opinion.
They also have the chart of everything.
It's like the chart version of this channel.
It's something which is a pleasure to scroll through and learn from.
Links to all of these will be in the description, of course.
Additionally, just this week, there were two articles published.
One about the Dead Sea Scrolls and how AI models can help analyze the dates that they were published
by looking at their transcription qualities, and another article that I loved is the 40 Best Books
published this year so far. Sign up at economist.com slash toe for the yearly subscription. I do so and
you won't regret it. Remember to use that toe code as it counts toward helping this channel and gets you a discount.
Now, The Economist's commitment to rigorous journalism means that you get a clear picture of the world's most significant developments.
I am personally interested in the more scientific ones like this one on extending life via mitochondrial transplants,
which creates actually a new field of medicine, something that would make Michael Levin proud. The Economist also covers culture, finance and economics, business, international affairs,
Britain, Europe, the Middle East, Africa, China, Asia, the Americas, and of course,
the USA.
Whether it's the latest in scientific innovation or the shifting landscape of global politics,
The Economist provides comprehensive coverage, and it goes far beyond just headlines. Look, if you're passionate about expanding your knowledge and gaining a new
understanding, a deeper one of the forces that shape our world, then I highly recommend subscribing
to The Economist. I subscribe to them and it's an investment in your intellectual growth.
It's one that you won't regret. As a listener of this podcast, you'll get a special 20% off discount.
Now you can enjoy The Economist and all it has to offer for less.
Head over to their website, www.economist.com slash toe, T-O-E, to get started.
Thanks for tuning in.
And now let's get back to the exploration of the mysteries of our universe.
Again, that's economist.com slash toe.
To what it receives at the far end, at the sort of the endpoint of that qualitative processing.
Okay, let me see if I get this.
You have some redness.
So you do, you're not denying redness.
You grant redness.
I do.
Okay. There's redness and then somehow this needs to be referred to with some spoken words,
with some language.
Okay.
So what's happening?
You're saying that it's an independent system, yet it's integrated.
So what is that relationship?
And does it become so diluted that by the time you refer to it, you're no longer referring
to that qualia?
Like, I don't understand.
Yeah.
That is essentially the idea.
So this is the exact problem I am working on right now.
There was a fantastic paper that I just came across about a week ago; it was published on arXiv recently.
It's called Harnessing the Universal Geometry of Embeddings.
And what this paper showed is that you could have completely different models
solving different linguistic tasks. For example, you could have GPT, then you could have BERT,
which solves a somewhat different task. So there's masked tokens as opposed to autoregressive
next token generation. And what they found was that you could learn
what this latent space, what you could do is
hand off, take the embedding.
The embedding is basically, you can think of that as numerical representation.
It's a high dimensional numerical representation
of your tokens.
So here's a token, this token is going to represent the word dog.
And then we're going to take that token and embed it in a much higher dimensional space.
And what they found is that if you take the embedding, the higher-dimensional representation from one model, say ChatGPT, and then take the representation from a different model, you can take either embedding, send it to this latent space, and then recover it in its original form. It's starting to get into the weeds a little bit, but what you can do is, once you've got that latent space, you can then translate from one embedding to a completely different embedding.
This is a new paper.
This is a new paper, yes.
Right.
This rocked my world because what they're arguing is that there,
in some ways, is this underlying universal structure
of language that's captured in this latent space.
And so even if you have a radically different embedding in one model, you know, they didn't do it across different languages. One of the projects I'm doing right now is to see if you can do this across, say, English and Spanish, even for a model that's trained exclusively on English and then another model trained exclusively on Spanish.
Can you guess the Spanish just from finding this universal structure across these two different models?
Sorry, what do you mean, can you guess the Spanish?
If a model was trained only in English,
and then it was receiving
some Spanish text, a couple of Spanish sentences?
The way to think about it is that what you're doing is creating another embedding, this latent space, where you're going to be able to send in a message in English and recover it, and then again do the same thing for Spanish. And you're never going to show any model a pair of English and Spanish; no model is ever going to see the pairing.
Instead, what you're going to learn is that there's some way to get from English to Spanish without ever seeing the actual translation.
Because what the model is going to learn is what's common
across these two representations.
What's true for both the Spanish embedding
and the English embedding,
that there's some sort of underlying latent structure
that's true of both,
and that that captures something more universal
about language.
Now again, they didn't do it for different languages,
they just did it for different embeddings of English,
but very different embeddings,
because they were trained on completely different models.
If you looked at them, if you just looked at this sort of vector representation, took
a vector representation of the word dog in one and a vectorization of the word dog in
the other, they're completely numerically different; there's no similarity.
You could never spot the similarities even if you looked at them pairwise.
But if they do this kind of reconstruction and then ask the model to be able to reconstruct, not in the original embedding space,
but go and reconstruct in the other embedding space,
it's able to actually do this.
And so by doing that, by training it to do that,
without ever seeing any pairs,
it's able to sort of learn the translation
between one representation and another representation.
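For intuition about what translating between embeddings means here, below is a hedged toy in Python/NumPy: two synthetic "embeddings" of the same 50-word vocabulary are built as different random rotations of one shared structure, and an orthogonal Procrustes fit recovers the map between them. One important caveat: this toy fits the map on known word pairs, whereas the surprising result of the arXiv paper is that a translation can be learned without any paired examples; this sketch only shows that superficially incomparable spaces can share alignable structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend shared underlying structure: 50 "words" in 8 dimensions (illustrative sizes).
n_words, dim = 50, 8
latent = rng.normal(size=(n_words, dim))

def random_rotation(d):
    """A random orthogonal matrix, standing in for one model's arbitrary coordinate system."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

# Two embeddings of the same vocabulary that look nothing alike numerically.
emb_a = latent @ random_rotation(dim) + 0.01 * rng.normal(size=(n_words, dim))
emb_b = latent @ random_rotation(dim) + 0.01 * rng.normal(size=(n_words, dim))

# Orthogonal Procrustes: the rotation that best maps space A onto space B,
# fitted here with known word pairs (the paper does this *without* pairs).
u, _, vt = np.linalg.svd(emb_a.T @ emb_b)
mapped = emb_a @ (u @ vt)

# Check: each mapped vector's nearest neighbour in space B should be the same word.
dists = np.linalg.norm(mapped[:, None, :] - emb_b[None, :, :], axis=-1)
accuracy = (dists.argmin(axis=1) == np.arange(n_words)).mean()
print(f"nearest-neighbour match rate after alignment: {accuracy:.2f}")
```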
What this opened up to me is the possibility that we could think about
the exact same latent space in the brain and
possibly in artificial intelligence models between,
say, the perceptual world and the linguistic world.
That there is some embedding of how the physical world is structured.
We understand, think about an animal,
a non-linguistic animal, certainly has idea of objects.
Objects in relation to other objects,
objects in proximity to other objects,
moving around those objects.
My dog, who was just barking in the background,
knows what doors are and she can go scratch it,
and she knows it opens up.
She certainly isn't able to express that linguistically,
but she has this concept and she's able to think about it.
She's able in some ways to reason about that.
My suspicion is that that probably is done maybe even autoregressively,
but we'll leave that aside for now.
The main point is that there is some representation of the facts about the world,
the sensory facts of the world,
or, I would say, the sensory construction, the facts that have been constructed
based on sensory information.
So that's some sort of embedding of the world.
The linguistic embedding is a radically different embedding.
It carries information about the world as well,
but not in the way, not in the direct way that we think,
not that the phrase, you know, "my headphones are sitting on this desk" has direct reference back to sensation and perception.
No, it lives on its own. It's its own embedding and it can do its own thing.
However,
based on this paper, this really gave me sort of a key insight: that there might be this latent space where you can actually do this kind of mapping, where there's translation between linguistic and perceptual embeddings. They are fundamentally very, very distinct, very different. They're there to solve different problems, but they're able to talk to each other. How? Perhaps through this kind of latent space, where there's some universal structure. Like, okay, in language there's
certain facts about language.
There's a fact about the word dog or the word microphone that its relation to other words
like desk in some ways captures the fact that microphone sits on top of desks.
That fact is somehow actually contained
within this embedding structure.
In what sense?
Well, if you ask me, would a desk sit on a microphone
or would a microphone sit on a desk?
I can answer that question.
So can ChatGPT, right?
And without any notion of what microphones really are
sort of from a perceptual standpoint,
they're having these kinds of properties,
we can talk about them.
And the embedding space, the linguistic embedding space, contains this information.
What does it mean contains information? By the way, just to say, what does that
mean? It means given a certain input, like do microphones sit on desks? Where
should I put my microphone? I can answer linguistically in a reasonable way, right?
And that's what I mean by the knowledge. It's purely linguistic
knowledge. It only can generate linguistic responses.
But the point is that that knowledge lives in this kind of linguistic embedding.
And then there's the other kind of embeddings.
There's a visual embedding.
There might be an auditory embedding, which is distinct.
And then the idea that I'm very inspired by is that there can be this latent space that
captures certain universals that
are common across these different embeddings that make translation possible.
So that when I see this microphone sitting on a desk, what's now available to
me is the ability to describe that to you linguistically. But it's not direct.
It's not that there's a very specific linguistic representation of this sensory perceptual
kind of phenomena.
And this is important because forever philosophers, philosophers in general, linguists have been
trying to understand how do words get their meaning.
How do they, something I referred to earlier, you know, what's the definition of a microphone?
What's the definition of a dog?
Sure, right.
And the answer is there isn't a single one.
There isn't a single definition that's ever going to capture it.
Instead, what you've got is this latent bridge where there's
some representation of this fact that given,
whatever your particular prompt is,
your linguistic prompt is going to lead to
certain meaningful linguistic behavior.
If you ask me a question about this microphone,
I might be able to answer that question meaningfully based on the perceptual information.
But what this microphone means, at least linguistically, is actually completely contingent on whatever question you ask me about it.
And so it's all going to depend on what you're doing with that latent space.
There isn't sort of, and this is sort of a broader point,
there isn't sort of a static set of facts about the world that's embedded in language.
I don't think there would be a static set of facts
embedded in our sort of visual embedding of the world.
Instead, what we've got is what I call potentialities.
We now have the ability to engage that latent space linguistically,
where the perceptual information kind of lives, sort of this universal embedding of it, and
then do whatever we need to do with it.
If I need to answer this question about it, I can answer that question.
If you ask me a different question, I can answer that.
But there isn't a singular meaning of microphone that captures sort of the entire set of facts.
Here it is.
Here's the embedded set of facts.
The set of facts is actually infinite.
I could tell you infinite things about this microphone, right?
For starters, to use a silly philosophical example, it doesn't have this shape and it
doesn't have that shape.
I could tell you there's an infinite number of questions you could ask me about it that
I could answer meaningfully about it.
So all those potentialities are kind of what happens when the linguistic system interacts
with this kind of shared embedding space.
That's sort of the half-baked version of how I think language ultimately does have to interact. Of course, language is only meaningful insofar as it can live within the larger ecosystem of sensation and perception.
We have to be able to take in information through our senses and then communicate, although I want to use that word kind of carefully. I don't communicate the entire representation because, as I said, I don't think that's even a meaningful idea.
Instead, what I can do is use language in a way that helps us coordinate our behavior.
There's no way to sort of download the entire perceptual state.
That's locked up in some ways in the perceptual embedding.
No, what I can do is pull some information such that I can meaningfully
communicate with you in a way that then is going to have the intended consequences. I'm not downloading
perceptual information into your brain. I'm telling you what you need to know in order to be able to
perform some action, to perform some behavior, or maybe even to think about it, so that you could later perform some action.
I know that was a lot.
Feel free to back me up and challenge me on any of these things.
I want to see if I understand this and I want to
explore what is the definition of language,
even though we just talked about there isn't the definition of a microphone, say.
But I do want to talk about the definition of language and what is autoregression.
And well, presumably you're telling me what you believe with language,
you're telling me this model because you believe it's true.
I don't know what truth you're conveying if you believe this is not grounded.
So what are you referring to when you even say that language is
autoregressive without symbol grounding?
I don't have an answer to that.
I want to explore that.
But first, I want to see if I understand you.
Okay.
So a latent space.
So let's think of a word.
A word gets a vector like an arrow.
And I'm just going to be 2D for
this example because that's just what the camera picks up.
So let's say the word dog looks like so,
the word cat looks like so, whatever.
Okay. The space that it's
embedded in is called the latent space. Is that correct?
Well, the initial embedding is just the embedding.
So that shows us the high dimensional.
It's just, yeah, let's forget the word high dimensional.
It's just a big long list of numbers.
Let's say you've got 10,000 numbers.
For dog, we're going to represent dog
as this particular sequence of these numbers.
For cat, it's a different sequence of these numbers.
And so that's our initial embedding.
So the latent space is a compressed version of that?
Well, in some ways, it's actually not compressed.
What's the opposite of compressed?
It's expanded. It's an opposite of compressed? It's expanded.
It's an expanded version.
It's actually, so you have the original tokenization, which just says here in a fairly small vector,
but then you expand it into a much higher dimensional embedding space so that each token
actually ends up getting a much richer, many more numbers that are used in order to represent each token.
That's a very key fundamental thing that these models do.
By expanding in these different dimensions, that's what allows you to massage the space
so that you can get all these cool properties like cat and dog being in the appropriate
relation to one another so that later on when you're trying to figure out what the next token is you're
able to actually leverage the inherent structure in this high-dimensional space.
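A minimal sketch of the token-to-embedding step just described: a token id indexes a row of an embedding matrix, so a single symbol becomes a long list of numbers. The vocabulary, dimensions, and random values here are illustrative assumptions; in a trained model these rows are learned so that the geometry carries the useful structure.

```python
import numpy as np

rng = np.random.default_rng(42)

# Tiny illustrative vocabulary mapping each token to an integer id.
vocab = {"cat": 0, "dog": 1, "desk": 2}

# Embedding matrix: one row of (here) 16 numbers per token. Real models use
# hundreds or thousands of dimensions, and the values are learned, not random.
embed_dim = 16
embedding_matrix = rng.normal(size=(len(vocab), embed_dim))

def embed(token: str) -> np.ndarray:
    """Look up the high-dimensional vector that stands in for a token."""
    return embedding_matrix[vocab[token]]

print(embed("dog").shape)  # (16,) -- the token is now just a point in a 16-d space
```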
Okay, so then you have the language model for English and then you have a language model for Spanish.
Yes.
And let's imagine that it was trained only with a corpus of English in the former case, and only with a corpus of Spanish in the second, and then we can even have a third, with Mandarin.
Sure.
Okay.
Yeah. In fact, in the paper,
they didn't do different languages.
They said they did different embeddings
of English language models,
but yes, they used multiple.
They actually did this across
several different embeddings, not just two.
Okay. So then the claim or finding is that if we look at cat and dog inside of here in
English, it gets mapped to some fourth space here,
which is like a Rosetta Stone space or a Platonic space.
Yeah. That's exactly what they call it.
Platonic, they use the word platonic.
Okay, great.
Well done.
Great. Then it looks like this there.
Okay. Then if you were to say,
okay, well, let me just forget about English and this platonic space. Let me look at cat and dog in Spanish. Okay. And it looks
like this here. Let me map it from here to my platonic space. Oh, wow. It gets mapped
to a similar place. Oh, and does the Mandarin do the same? Let's find that out for cat and dog. It does. Okay.
Let's test out more words. So the claim is that this space here is this
meaning-like space. Okay, great. And then what you're saying is that microphone, we think of
microphone as living in here as a single vector. That would be like an essence of the microphone
that we're referring to. But actually, microphone, our concept of microphone depends on the prompt.
So explain that.
That sounds interesting.
Yeah, I think, and you're making me think about this in a way that I hadn't quite before.
So the level of which I've thought about it is that you've got these different embeddings.
When I see a microphone visually, there's a certain vector representation
of what that sensory perceptual experience, and I don't mean the qualitative sense, I'm
not getting into phenomenology, but there's something happening in my brain that is sort
of the representation of what it means for me to see this object from the visual standpoint. Okay, that's one embedding.
And then we also have a word, microphone, which is a completely different embedding.
There's simply a word that lives in language space.
What it means is that there's a specific embedding.
So it's kind of helpful to think about sort of a point in a space. So you know you've got this super high dimensional space and each individual token is simply a vector in that space. So it really picks
out a specific point. And we can say microphone lives right here in this linguistic space.
And then my perceptual experience, I don't want to use that word, but my perceptual kind of grasping of this
microphone being here is this point in a completely different space, this perceptual space, which
has, you know, it captures other kinds of information.
In language, so let's actually talk about this for a second, in language, the space,
if you want it to be a useful, meaningful space, you're going to want things that have similar meaning, they're likely to actually have proximity
to each other.
And this is to some extent what the large language models learn.
They learn an embedding.
In order to do next token prediction, they learn an embedding that gives this, where the space, you know,
and we can think of it almost like two-dimensional, three-dimensional space. Obviously, it's very high dimensional, but you know, for our purpose, we think about
that, that where cat and dog live, you want those things to live closer together than
cat and desk.
And of course, it's much richer than that, right?
It's not just semantic, like this very kind of superficial level of semantic similarity.
In fact, somehow the semantics, so to speak, are captured by the space. The shape of the space itself is what allows the model to understand sort of the relation between words so that it can do the next token generation.
But it's a very, very different space, right?
It has to do with, really, with relations between words and in terms of generation,
in terms of next token generation, so that it's useful for that purpose.
What does the perceptual space look like?
Well, this perceptual space is going to have a very different, the axes there almost certainly
aren't going to have the same kind of meaning as in the linguistic space.
There'll be something closer, maybe color features, shape features, something like
that. And where this microphone lives within that space is going to have a radically different meaning. It's not apples and oranges, right? Those aren't different enough, right? It's apples and
math or something. It's really, really radically different kinds of spaces. But what I'm proposing,
what I think the insight here is that ultimately there is the possibility of having a shared
space that you can send, you can project both of these things to where microphone, the word,
is going to somehow make contact with this perceptual
experience right now, this perceptual fact. But it's not, and here's
the key point that you're getting at, it's not that this word microphone picks
out the exact same embedding in this latent space. It's not that it's going to
make that thing light up. Oh, it's the same thing.
No, it's that when you ask a certain question
about a microphone, is there a microphone on your desk?
My perceptual system is generating some,
well, first of all,
it's just generating the perceptual phenomena,
but then it's also sharing information in this latent space,
which my linguistic system can then go draw from.
Then given this particular prompt,
was there a microphone on my desk,
I'm able to then successfully answer the question.
It's not quite the same thing as saying that they're
picking out the same information in latent space,
because my argument is that
that's not really a meaningful concept.
The word microphone, in linguistic terms, doesn't pick out a perceptual kind of fact; that's not possible.
These are radically different kinds of facts.
But what the latent space might allow us to do is not just to translate, which is what
they did in this paper, but perhaps to pass information along in a meaningful way so that you're able to
access it and do something successful like answer the question, is there a microphone
on this desk?
I think that might be what's happening to some extent even in the multimodal models.
It's a longer conversation.
That's not really how they work.
They don't actually operate based on a shared latent space or anything like that.
Really what they do is the models learn to take a perceptual input and turn it into something like language.
So it's more similar to like prompting almost. It's not exactly that.
But it's injecting something within linguistic space that is equivalent to actual language. It's not the same thing as the shared latent space, but my hypothesis is that there may be something very similar happening.
So you don't think that multimodal models will solve the symbol grounding problem?
You don't even think there is a symbol grounding problem.
That is a fair question. And here's actually a prediction, or something falsifiable in some sense. Will
multimodal models fully solve the kind of, well, the bridging problem, let's call it that.
Because the grounding problem, using that terminology, well, my argument is that there
is no grounding problem because words don't have to be grounded in order to operate linguistically.
That's sufficient.
It's enough to simply be able to generate language.
You don't need the grounding.
But in order to have this kind of fully operational organism that's able to use language and also use perception in, you know, bridge
these different maps in a meaningful way so that we can get, you know, full coherence.
I guess, you know, full, let's just call it human level perceptual linguistic coherence
so that I can say to you,
hey, can you go grab that or say to a machine,
can you go grab that object, describe what I want,
and then the machine is able to go and do exactly what I described,
then my argument is that I don't think,
and again, this is speculative,
I could be proven wrong certainly on this.
My suspicion is that we're not going to be able to do it
using the kind of approach that
multimodal models currently use.
That you're not going to get there.
It's kind of a dumb trick, the way that we're currently solving the problem.
Because we're not really allowing these two different modalities to kind of live on their
own and do the work that they do.
Instead we're kind of, we're strong arming perception into a linguistic form.
What I think is maybe a more important solution and I hope, you know, this podcast one day
is, you know, kind of an early sort of canary in the coal mine for this idea is that it's
something closer to this kind of shared latent space.
That what you do, what you have is these completely distinct kind of mappings, we'll call them
embeddings.
They can kind of grow up on their own, learn the information that they need to independently
of one another, but at the same time, they have this sort of shared sandbox where they're
able to communicate with one another and do things.
So I think it might take a very different approach to get full perceptual linguistic
competency.
Okay.
Have you heard of Wilfrid Sellars?
I believe it's Sellars.
Oh gosh, I read Wilfrid Sellars early on, in one of my first philosophy classes I ever took.
I'm trying to remember the name of the book, but I'm sorry.
So which work, boy?
I believe it's Empiricism and the Philosophy of Mind. I'll put a link on screen if I'm correct.
Sounds familiar, but catch me up.
So he's criticizing the idea that our perception gives foundational non-conceptual empirical
knowledge.
So these experiential givens that we think of as primitive, like redness, he would say
that they involve heavy interrelations of concepts. So for instance,
the way that I think about it is if you're to say to someone redness, they'll be like, well,
what kind of redness exactly are you talking about? Then they'll think, okay, the redness of an apple,
but then an apple is not always red. Okay, redness of an apple in a certain season with a certain
type of sunlight. Okay, now I've gotten it. So by the time you go in to pull out this primitive,
you've then soaked it with so many other concepts. You can't actually come in with language and pull
out a primitive. Yeah, that sounds extremely similar to sort of the initial insight.
And it's related to the inverted qualia problem. You know, I don't know, your red is not my green and vice versa,
and it's because the linguistic representation doesn't capture, you know, we can think,
again, it lives in a completely different embedding space.
And when we think about the redness of red, well, it's qualitatively similar to orange
in, you know, there's sort of a continuum between those.
Those qualitative similarities are really only contained
and only understandable by the sensory perceptual system.
And we can talk about them.
We can sort of say,
yeah, red is a little more similar to orange,
but that's because we have a sort of very coarse,
maybe via this kind this latent space,
where we're able to refer to certain kinds of properties
in a way that is useful for communication.
But as far as that raw qualitative property,
that comes to us not – it's primitive in the sense that we can't unpackage it linguistically,
but it's not primitive in the sense that there's extraordinary cognitive machinery that is responsible for that qualitative experience.
Think about the world of animals and what they do with color and how well they understand shape and how they understand a space.
All of that is unavailable to our linguistic system.
It's available, by the way, to us, our sensory perceptual system, but it's unavailable to
the linguistic system because it doesn't live in the same space at all.
And so I think what you're describing actually sounds extremely similar.
The idea that we can't really dip in, it's simply the wrong map.
We can't map this map onto that map at all.
We can go to this and maybe the potentially shared latent space, or maybe again, maybe
my account's wrong and there's some more direct kind of handshake that happens between these
systems. But ultimately, they're taking place in radically different spaces, and you're
losing an enormous amount of information. It's literally, you know, quantifiably a loss of information. The word red does not convey redness because redness is not
just a word. It's not just a simple concept that you can say in, you know,
using an individual token. Now, by the way, the word red is not so simple
either, right? Red in language space is also complex.
It has all kinds of relations to other words.
But the concept, the percept of red has all of this complexity to it because it's where
it lives in its own space, in colors, in the perceptual space.
And so yes, that sounds extremely similar to what Sellars seems to be intuiting.
Yeah.
And just so you know, the way that I relayed what Sellars' myth of the given is, it isn't precisely what he was saying, because he was more about knowledge.
And I'm speaking more about the percepts, the raw sense data, then being taken to language, like being dredged from your sensory data or what have you to language.
But anyhow, it's approximately correct.
It's good enough for this conversation.
Perfect example of being able to sort of linguistically construct a kind of novel conceptual framework,
but do it linguistically and in real time in a way that has probably never been done before.
So well done, Curt's LLM.
Thank you. So now you're an LLM
speaking to some other LLM trying to convince it of some truth that we mentioned before,
like you have this model, whatever you want to call this model,
autoregressive language, TOE model.
What are you even referring to? You're using language to
convince myself, to convince yourself, to explain. What are you even explaining? What
are you referring to?
Yes. You've asked a very hard question. And there's a certain, I think of it as a bit
of a paradox that's sort of inherent in sort of what I'm trying to do, because language
is trying to describe itself
and in the process of doing so, it's actually deconstructing itself.
It's saying, I am just this and I'm not what I think I am, but who's I and what do you
mean think?
Right?
How does language have wrong concepts about itself that are actually manifestations of
its own structure? The good news is that I have a sort of escape hatch here, which is that this is really, in some ways, a very, very simple account, and it's just that there's prediction from sequence to token. How that does stuff in the world is a harder problem.
How that does stuff, as we were discussing, how does it allow me to say something to you
that then can have perceptual consequences, behavioral consequences.
This is certainly a difficult problem.
But we can ignore that problem for a moment and say we are going to take language on its own terms.
And what language is, is simply a map
amongst meaningless squiggles.
It's simply a map amongst various,
what we can think of as largely arbitrary symbols.
And those symbols can get grounded in writing,
they could get grounded in the activation of circuits, they can get grounded in the dendritic or neural responses.
But the core hypothesis here is that what language is, is simply a topology amongst
symbols.
And-
And by topology, you mean connectivity?
One way to think of it, you could do it in terms of connectivity.
You could try to do it as a graph.
There are different ways you could try
to do this mathematically to capture this structure.
In the case of large language models,
it really comes down to these embeddings, which you
can look at from a graph-theoretical standpoint, but you don't have to. You could just think
about it as a space, and then you're simply saying where each token lives within
that space. And that's really the representation of language. But what it is, is relational.
It's that these symbols have relations to one another within this space.
And the relations are then used in order to generate.
And that's it.
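A minimal sketch of that picture, assuming nothing beyond toy data: tokens as points in a space, with the "topology" read off as pairwise relations between them. The tokens, dimensions, and vectors below are invented for illustration, not taken from any actual model.

```python
# Toy sketch: tokens as points in an embedding space, with "relations"
# read off as cosine similarities. All vectors here are made up.
import numpy as np

tokens = ["cat", "dog", "mouse", "trap"]
rng = np.random.default_rng(0)
E = rng.normal(size=(len(tokens), 8))          # one 8-dim vector per token (arbitrary)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The "topology" is just the table of pairwise relations between symbols.
for i, a in enumerate(tokens):
    for j, b in enumerate(tokens):
        if i < j:
            print(f"{a:>5} ~ {b:<5} {cosine(E[i], E[j]):+.2f}")
```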
Now, how does meaning emerge out of that
is a separate question.
But my argument is that language doesn't have to worry
about meaning, language just has to worry about language.
So when I say I'm talking to you
and I'm having a conversation with you and trying to explain
something to you, this is an LLM actually producing a sequence and what that sequence
is going to do, it might do certain perceptual things by the way in your mind, it might produce
certain kinds of images, those are kind of auxiliary to language, those happen as well,
I'm not denying they happen, but as far as this conversation goes, I am producing a sequence that's going to serve as a prompt.
And you're going to predict the next token. Yeah, without my consent, by the way.
And that's, in some ways, you know, not to take it too seriously,
but yes, one radical way to think about it is that language is actually forcing your mind to do something, whether it's to produce images,
but also to produce sequences.
So my choice of a prompt is actually going to, deterministically... well, within large
language models there's some probabilistic kind of behavior,
in the sense that they generate a distribution over the next token.
And then you add a little bit of chanciness.
You say, maybe I'm going to pick the most likely token, maybe not; this is the temperature.
But it really is deterministic.
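A toy sketch of the sampling step being described, with an invented vocabulary and invented logits: the model's output is a distribution over the next token, and temperature adds the "chanciness" on top of an otherwise deterministic computation.

```python
# Minimal sketch of temperature sampling over a next-token distribution.
# The logits and vocabulary are invented for illustration.
import numpy as np

vocab = ["the", "man", "went", "store"]
logits = np.array([2.0, 0.5, 0.2, -1.0])       # pretend model output

def sample_next(logits, temperature=1.0, rng=np.random.default_rng(0)):
    if temperature == 0:                        # greedy: always the most likely token
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

print(vocab[sample_next(logits, temperature=0)])    # deterministic
print(vocab[sample_next(logits, temperature=1.0)])  # a little chancy
```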
And yes, the prompt I'm going to put into your head is going to basically determine how you're going to respond.
Now, mind you, again, there's a larger ecosystem where you're going to think about things visually, and that's going to feed back into the linguistic system.
So it's not quite as simple as prompt in and sequence out. But at the linguistic level, that's basically what I'm arguing. Now, the fancy stuff, which is basically meaning and the ability to coordinate all that,
falls out of how our minds ultimately form this space.
Now, you can take an untrained model, an untrained large language model.
You give it a sequence in, it's going to give you a sequence out, right?
It will do that.
And we say, hey, look, it's doing next token generation.
It's doing autoregression. But it's going to be gobbledygook.
It's going to be meaningless.
The magic of language and the magic of what our brains do
and what these large language models do is that given sufficient examples,
we are actually molding this space such that the next token generation is doing
something meaningful.
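A toy bigram model can make that point concrete, under the obvious caveat that real models are vastly richer: before any training, the autoregressive loop produces uniform gobbledygook, and after counting a tiny invented corpus, the same loop produces something shaped by the data.

```python
# Toy bigram model: the same autoregressive loop, before and after the
# "space" is shaped by data. Corpus and vocabulary are invented.
import numpy as np

corpus = "the man went to the store and the dog went to the man".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

counts = np.ones((len(vocab), len(vocab)))      # untrained = (near) uniform

def generate(counts, start, n=6, rng=np.random.default_rng(1)):
    out, cur = [start], idx[start]
    for _ in range(n):
        probs = counts[cur] / counts[cur].sum()
        cur = rng.choice(len(vocab), p=probs)
        out.append(vocab[cur])
    return " ".join(out)

print("untrained:", generate(counts, "the"))
for a, b in zip(corpus, corpus[1:]):            # "training": shape the space with counts
    counts[idx[a], idx[b]] += 5
print("trained:  ", generate(counts, "the"))
```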
Meaningful in what sense?
Well, this harder problem of, well, I can tell you something and then that's going to
determine not just your language but your behavior later on.
And so there really is something more.
The map matters, right?
The space, the shape of the space is really, really critical.
It's not like autoregressive Next Token solves the problem.
It's that autoregressive Next Token generation,
when optimized in the larger ecosystem of behavior
and coordination and communication, does this thing.
But still, I don't want to back away from this.
When you get down to it, in the end,
what you've got is just next token.
What you've got is just language, generating language.
That's really what language is.
That's what we're doing when we're thinking linguistically.
The fact that it happens to have this meaning is not actually driving the computation.
You shape the space. The space gets shaped by other factors,
things like the learning.
Well, you learn about the different tokens
and how they relate to one another.
You learn about perhaps the utility of certain tokens
to refer and to map to these perceptual phenomena.
But by the time you're doing language generation, the space has been shaped.
And so all you're doing is next token generation.
All you're doing is tokens predicting tokens.
And so I don't want to back away from that.
The strong claim is that language simply is that.
And it's autonomous. It has these properties.
Through all this optimization over the course of development, maybe evolution, it's not
part of my theory at this point.
Chomsky's poverty of the stimulus, all of these problems of how we get to such a magnificent
space, how we get to such a magnificent shape of this space such that it is able to map
to, or at least serve, this utility of being a coordinated kind of tool?
All of that has to happen.
But the bottom line of what language is, is unchanged in this account. Okay, so I want to explore more about language and how it relates to or gives rise to action
and other systems, visual systems, etc.
What is there?
I was worried about that, but go ahead.
Okay, so look, there are meaningless squiggles.
How is it that some meaningless squiggles, your brain's squiggle generator, make your
physical body get up
and close the door because your dog was barking?
Where does action connect to abstraction?
That is the key question.
I believe that's what we need to solve.
That's sort of what the field of linguistics or whatever we want to call it, maybe even
cognition needs to solve because these mappings are happening.
What we know from the large language models is you don't need that in order to be proficient in language. So this is where we have to start from. That's the starting point, that the language
can live on its own and you can learn language in theory. You can learn language independent of any
of that stuff. The ability to make somebody get up and move, the ability
for me to reason about perceptual phenomena, language is able to be mastered entirely based
on its own structure, the meaningless squiggles.
Now, the question you're raising is, I think, what we need to do
as a species.
If we want to understand scientifically how language really works, we need to understand how
you go from an autonomous self-generating system, one whose self-generative rules
are determined simply by relations between these meaningless squiggles, to how that then gets mapped to meaning, to the ability
for me to use some of those tokens and then get you to do stuff, right?
And so that's what language learning is.
There's going to be, I guess we can think of almost two, maybe even independent processes.
One is learn how words play with one another.
Okay.
Learn that this word tends to be in relation to that word.
Okay, that one's solved.
Yes.
As far as you're concerned, got it.
Okay. Then we also learn about perceptual phenomena.
We learn that there's things on top of
other things and there are actions we want to take.
The things that my dog understands.
Now, the question is, how do these things bridge?
How do you get from tokens that have their own life of their own,
the sort of relational properties amongst one another to that other kind of,
I guess, representation, that other way of encoding, let's say, facts about the world.
Facts about the world is, even that's saying too much, it's just another brain state, right?
Brain state that has perceptual information.
Hold on.
Facts about the world is just another brain state or the world is another brain state?
So all we've got is brain states, right?
And this is sort of the inherent...
this is fact number one about what we've learned about ourselves as a
species: all we have are perceptual brain states.
We also have maybe linguistic brain states.
Um, those perceptual brain states are in some ways related to what's
going on in the world.
And potentially we can think of them as being related to what you can do in the world as
well.
And so maybe actions, well, we have brain states that correspond to our proprioception,
our muscles, where things having to do with our own body.
And so there's these various brain states that carry,
we can think of them as carrying information.
The reason I'm worried about using that phrase is because, again,
I don't believe in sort of a one-to-one simple correspondence
where we say this particular brain state corresponds to this perceptual kind of phenomenon in the world
or some state of the world because it's probably not that simple.
It's probably closer to these potentialities, right? There's some sort of activity that's due to my perceptual system
that my brain can do things with
and
engage with in some way. And so, but what we do have is these brain states that are derived from distinct sources of information,
sensory perceptual, and then linguistic.
Linguistic gets there, by the way,
through sensory perceptual.
We're not gonna get into that, right?
We're thinking of symbols as being kind of arbitrary.
Yes, you have to hear the word cat
and you have to hear the word dog,
but I think we have good reason to say now
it's just like
large language models that these are kind of arbitrary symbols with their
relations between them. That's what matters. Okay, so you've got these
distinct brain states which in some ways, again this is philosophically
fraught, but in some ways represent facts about the world perhaps, but I don't
really want to go that far, but you've got these brain states that need to talk to each other so that they can coordinate.
And that is sort of the key fundamental problem
that our organism has to solve. And of course, it's not like you're born
having the linguistic mapping all solved.
You have to learn that.
But you are born into a world
where it's already been solved.
Meaning we've got these corpus of language,
the thing that the large language models were trained on.
That pre-existed the models,
just as when a baby's born,
the English language pre-exists the baby.
You can learn the mappings, and I believe you do. You can learn the embedding space of language
without the other stuff, right?
That's again, that's sort of the key insight
for the large language models.
So that already contained within the linguistic system
that we've honed over however many years
it took for humans to develop language,
we've honed a system that has this utility built in
such that it's a good thing to dump into that latent space
so that when a baby hears the word ball,
sees this object that's a ball,
gets that mapping.
But again, the word ball is really meaningful.
It really has its own role in relation to other words.
But over the course of development,
you also learn this kind of, what I think is maybe
a latent space bridge or some other bridge between these.
And so in the end, you end up being able to tell somebody,
go pick up that ball.
And of course, they're able to go and do it.
But you're really engaging, very distinct mechanisms that have some way of bridging.
Which is, it's a non-answer.
I'm not going to pretend that that is even halfway to a solution, but I do think it's
a sketch of how the cognitive architecture ultimately really is built.
I think that we've now nailed down one piece,
the linguistic piece, and we're able to say,
this is how it lives, and this is how it operates,
and it is autonomous.
And then we don't really have,
we don't have something similar for perceptual space
and for motor space.
We don't have something comparable.
We haven't been able to capture it successfully.
Maybe robotics, that's happening, I don't know.
But that's the kind of work that needs to get done.
So maybe this is a solved problem.
But as it stands with ChatGPT and Claude and so on, they're fixed models and they're producing
some output.
But it's not as if when they're speaking to one another, they then retrain their model
in real time.
And it would seem like that's more like what's occurring with us.
So maybe that's just a technology issue. Are you referring to different language models just chatting with one another? No, I mean even us right now: we're learning
concepts from exchanging them with one another, and we're producing new ones, and we're deleting
old ones potentially, modifying old ones, recontextualizing. It doesn't seem like that's occurring with Gemini 06-05.
Great question.
Great question.
This is one of the key challenges of the identity hypothesis,
that we're doing the same thing,
which is continuous learning.
And so there are two things that happen in large language models that we can call learning.
And one is the actual shaping of the space, which is really just determining the connectivity
between neurons.
Again, you could think of it as a graph, or you could think of it as just determining the embeddings.
But whatever it is, that happens during the course of training and that's kind
of done offline. And yes, so that's training the model. There's also fine
tuning, which is just more of the same. You have some new data you want to
incorporate into the weights of the model. That's actually going to, again, change the shape of the space if you want to think about
it that way.
And then there's something called in-context learning.
And in-context learning is where you're in the middle of a chat and you say, hey, ChatGPT, let me teach you a new word.
It's "global global."
And global global is that feeling you get when, you know, you're really tired, but you know you have to keep working or whatever.
ChatGPT can use that word very successfully.
I've got global global up the wazoo.
Sure you do.
You suffer from extreme global global.
So you were able to just, you know, use that word.
ChatGPT can do that too.
And it's sort of the, one of the big,
there was a paper about this early on in the chat wars.
I don't remember who put out the paper,
but it was about the shocking generalizability.
The in-context learning seems to be too good to be true.
But lo and behold, that's what happens, and that is happening in the autoregressive process. It's happening even though
this model has never seen "global global,"
right, even though it's never encountered that word before. But here, lo, it shows up in the sequence, and now,
through the autoregressive process, as it's churning through the longer sequence with this word in it, it's able to predict the next token in the appropriate way so that
it's using that term correctly.
So we do actually see this kind of continuous learning in the case of these models.
However, it's happening in context.
And what that means, from a practical standpoint, is if you start a new chat window. Yes, it doesn't know that word anymore.
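A sketch of that practical point, with a hypothetical generate() standing in for any chat model call: the in-context "learned" word lives only in the prompt, not in the weights, so it vanishes with the window.

```python
# Sketch: an in-context "learned" word exists only in the running context.
# `generate` is a hypothetical stand-in for a call to some language model.
def generate(prompt: str) -> str:
    raise NotImplementedError("stand-in for a call to some language model")

definition = ("Let me teach you a new word. 'Global global' is that feeling "
              "when you're exhausted but have to keep working.")

chat_1 = definition + "\nUser: Describe my Monday using that word.\n"
# generate(chat_1)  -> can use "global global" correctly: it's in the context.

chat_2 = "User: Describe my Monday using the word 'global global'.\n"
# generate(chat_2)  -> a fresh window; the word was never fine-tuned into the
#                      weights, so the model has no special knowledge of it.
```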
So what would be the analogy here? Is context window length our working memory? What's the actual... Great question.
Yes, that is what I truly believe, and this is a different line of research,
but with some caveats.
So yes, in my conception, what we call long-term
memory is just fine-tuning of the weights. It's information that gets embedded in the
actual weights of the model, the static model we can think of when it's not actually
in the process of autoregressively generating. Working memory is literally autoregression.
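A toy contrast of those two kinds of memory, under this framing and with an invented linear model: long-term memory as a slow change to the weights, working memory as whatever sits in the running context.

```python
# Toy contrast: "long-term memory" as a weight update (fine-tuning),
# "working memory" as the running context. Everything here is made up.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                     # "the weights" = long-term memory

def fine_tune(W, x, target, lr=0.1):
    """Consolidation: nudge the weights so x maps closer to target."""
    error = target - W @ x
    return W + lr * np.outer(error, x)

def step(W, context):
    """Working memory: the context itself conditions the next output."""
    x = np.mean(context, axis=0)                # crude summary of the running sequence
    y = W @ x
    return context + [y], y                     # output is appended, autoregressively

x, target = rng.normal(size=4), rng.normal(size=4)
W = fine_tune(W, x, target)                     # slow, offline change
context, _ = step(W, [x])                       # fast, online change that is gone
                                                # once the context is dropped
```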
What would the analogy for RAG be then?
What would the role for RAG be?
This is where I'm at right now.
Does the brain actually do anything like retrieval?
I've decided to stake out the extreme view
that our brain doesn't do retrieval at all.
That all we do is fine-tuning and then next token generation autoregressively.
We don't actually ever retrieve per se.
We don't ever actually have to do anything like RAG.
RAG is a transitional technology.
I don't believe long-term that we're gonna have
to do something like that.
We're gonna have to have something like a store database
and then a search.
One of the reasons I believe this is
because that's not how our brains work.
We don't do that.
Cognition doesn't work that way.
We may sometimes sit there pondering
and trying to recall a fact, but when we're doing that,
we're not actually searching a space.
It's either we're running some sort of chain of thought
where we're like, okay, I remember I was doing this
and I'm trying to actually produce the appropriate sequence in working memory
such that it'll pop out.
The right fact will pop out from the autoregressive process.
Sometimes we just find ourselves trying to remember something,
trying to remember something.
There's something, there's a tip of the tongue phenomenon.
The reason why the tip-of-the-tongue phenomenon, I believe, is so frustrating
is not because we're searching, searching, you know,
actually running some sort of search retrieval process.
It's because part of our brain actually is running the autogenerative process, and we can kind of feel the word.
We can almost generate it, we can almost produce it,
but it's short-circuited somehow and we can't do the full generation.
So my hypothesis is that we don't have anything like RAG. All we've got is this, and it's a very simple and, I think, elegant
model. All we've got is fine-tuning,
and that's what we can call memory consolidation.
That happens after the fact, over the course of minutes and weeks and months and years; consolidation is fine-tuning the weights.
And then we have the autoregressive workspace, which is this conversation
you and I are having right now.
It's not working memory.
I'll tell you what: working memory, the way
cognitive psychology has thought about it for many years,
I frankly think is erroneous. It's not this super time-limited thing,
you know, seven seconds or 15 seconds, and after that it's a cliff and you don't remember anything.
That's what happens when you have to directly, explicitly retrieve. Like, what was the last word I said?
Tell me the exact sequence of letters or numbers.
That's not something our brain actually has to do regularly.
Instead, what we're seeing in working memory, we can do that,
we can do retrieval of the last seven seconds,
but that's because we have continuous context.
And there is a decay function, unlike the large language models, which represent everything in whole blocks.
Although I think some of the newer models, these endless-context models, are probably doing something similar.
But we don't remember everything; we don't actually have literally the exact tokens that were expressed ten minutes ago.
But what we do have is some sort of continuous activation
that's similar,
where it's not retrieval, it's guiding.
It's the past is guiding the generation.
And so what you and I talked about an hour ago... I don't know how long we've been going
here, probably a while. I don't know how much global global you've got going on, right?
It's been a while.
So those tokens that were expressed an hour ago are still guiding the generation now.
Now they're doing so less than the last 10 seconds.
We could think about it as kind of a decay function
of some sort where they're having less impact.
We see that in the models too, by the way.
If you look at the attention weights,
words that are farther apart have less impact on one another.
That's simply, that is a direct reflection of the fact that language is human-generated
and humans do this, right?
The words that we spoke a few seconds ago are more impactful on the words that we're
going to say than the words we spoke an hour ago.
But the idea is, yes, that what we've got is this... I don't use the term
working memory because I think it's very fraught with the modal
model that's been in vogue for a long time, the working memory model, Baddeley and all these
folks. They were really thinking of this very short duration. Okay, time limited, boom. No,
this is continuous activation, namely context. And the context, I don't know how far back it goes.
I don't know how far back it goes, right?
This is an empirical question.
Does it operate over hours?
Does it operate over days?
Is there a continuous activation,
a more dynamical form of memory that's happening?
That's not the same thing as long-term memory,
because long-term memory-
So memory is not a database in your model.
Memory is not a database, correct.
What memory in my model is, is the, there's two things.
Memory is the fixed weights of the neural network,
which can represent, they don't represent facts,
they represent potentialities.
Those fixed weights are, what does that mean?
It means if you give it a certain input,
it's going to produce a certain output, right?
Just like a large language model. If I say to it, recite the Pledge of Allegiance, it will say, here is the Pledge of Allegiance.
The next token out is going to be "here," whatever.
But then it'll actually say the Pledge of Allegiance.
All of that is a potentiality that's embedded, that's encoded in the weights.
But you're not going to find that fact in the weights. The weights are there
as potentialities, ready for whatever input comes their way:
given this input, they're going to produce this output.
Okay, okay.
Okay, so that's the weight.
And then you've got the running sequence.
And the running sequence, and we see this
from in-context learning, but it's the core
autoregressive process.
The sequence itself is a different thing than the stored weights, right?
Because first you've got, the stored weights are just the static situation.
And given this input, I'm going to produce this output.
Then we actually do it.
You give it an input, it produces a certain output.
It takes that output, tacks it onto the sequence.
Now it's going to keep going.
Let me see if I got this.
Please.
So there's some computation going on.
There's some black box occurring.
But let me make it simple for linear algebra.
You have a matrix.
A matrix operates on a vector to produce another vector.
Okay. So you may look.
That's the whole thing.
All right. Exactly.
So you may look at this where my arm is pointed up and to the right,
at least on my screen right now,
and you may say, where is this in the matrix?
The answer is this isn't in the matrix,
but if you take this guy,
my arm is now pointed to the left,
maybe parallel to the horizon,
and have the matrix operate on this, it moves it here.
So the mistake is for us to look at the output and say, where's that output inside the box?
It's not that, it's the input with the black box.
So the input with the matrix that produces the output.
That is perfectly said.
Exactly.
And then there's but one additional piece, which is after you've produced that,
you're also, again, taking that output
and then using it as the input, as part of the sequence
of input.
And that's the autoregressive piece.
And that's what's so gorgeous about it,
is that the potentialities aren't just
to produce a single output, but to produce the sequence.
But to do so one piece at a time, right?
So that's what the matrix is.
Matrix doesn't really even have the sequence in it.
Doesn't have a sequence in, sequence out, right?
That's one way, but that's not even correct.
It's sequence in, one token out, add it to the sequence.
Do it again, do it again.
And then, so the sequence is in there, but only in this potential form.
It has to do it autoregressively.
It can only produce the sequence by feeding it back into itself recursively.
And that's a radical way of thinking about what the brain is doing, right?
That what it's really doing is it's generating the next input for itself.
Not just generating an output, but the next input for itself.
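A minimal sketch of that loop, with invented weights and vocabulary: the matrix never contains the sequence; it maps the current sequence to one next token, and that token is fed back in as part of the next input.

```python
# Minimal autoregressive loop matching the matrix analogy: sequence in,
# one token out, append, repeat. All weights and the vocabulary are invented.
import numpy as np

vocab = ["the", "man", "went", "to", "store", "."]
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))            # token embeddings
W = rng.normal(size=(len(vocab), 8))            # the "matrix": summary -> next-token scores

def next_token(sequence):
    summary = E[sequence].mean(axis=0)          # stand-in for whatever summarizes the past
    return int(np.argmax(W @ summary))          # one token out, nothing more

seq = [0]                                       # start from "the"
for _ in range(5):
    seq.append(next_token(seq))                 # the output becomes part of the next input
print(" ".join(vocab[i] for i in seq))
```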
Super interesting.
Yeah, it's recursive. It's fundamentally recursive. And when we think about it, the system is
built to do this recursion, right? It's not just that this is one way to get to it. The
language contains within it the ingredients for producing this kind of recursion.
The sequences of language that get learned are built to
have this recursive capability within them.
That this word is going to produce the next word, which is going to produce the next word
premised on the entire sequence before it.
That's the crazy thing.
There was also this interesting result. Anthropic put out a paper a little while ago;
I think it's called The Biology of Large Language Models.
Even though you're only producing the very next token from the sequence,
the language models have learned, because they've learned sequence-to-next-token,
that any point along the sequence is pregnant with
the potentiality for
not just the next token, but many other tokens moving forward.
It's the whole trajectory that is sort of encapsulated in that matrix you were talking about earlier, right?
The matrix is just a matrix for taking a sequence, produce the next token.
But no, no, the matrix is customized so that it's going to run recursively.
And so it's built in such a way, it's tuned in such a way that it's going to produce the
next word, the.
Well, that's not useful.
No, "the" is the next piece of the autoregressive chain
that's going to produce "the man went to the store," right?
And so it's not just any old matrix,
and it's an indescribably rich kind of information
that's contained within that matrix.
And I like to think about if aliens landed
and found the brains, you know,
because we've been wiped out by AI, I'm kidding,
I'm kidding, right?
There's no humans left, but we find the brain
sort of ossified and we're able to do this,
and we start feeding it stuff,
and we can see that there's this input-output.
If you didn't do the autoregressive piece,
you would never understand what the hell this thing is doing.
Note, Elan's been talking plenty about autoregression and the technically-minded among you may be
wondering about the success of diffusion models.
While we don't get to it here, he does admit that his thesis would be undermined if diffusion
models were accurate enough for natural language.
But so far they seem to be only good for coding.
This is something I love about Professor Elan Barenholtz.
He's extremely humble and open to how his model can be falsified.
If you didn't do the autoregressive piece, you would never understand what the hell this thing is doing.
You would get it all wrong because you think its purpose is to produce some sort of label or some sort of...
No, its purpose is to produce these sequences, but you have to run it.
You have to run it autoregressively and get the output and then feed it back in as a sequence.
So memory, this kind of short-term memory, working memory, is fundamental.
It's super, super fundamental.
The brain is not...
I don't want to use this term to...
I don't want to anger people.
But it's non-Markovian.
It's fundamentally non-Markovian.
It's not state in and then... the current state and then produce the output.
It's previous states.
There's a sequence of states that led to the current state
and it's the particular sequence that leads to the next token.
And the next token is going to be the next element
or the next piece that's going to determine the state in
conjunction with the entire previous sequence. So this is a super cool thing.
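A toy sketch of the contrast, with arbitrary made-up transition functions rather than any model of the brain: a Markovian update sees only the current state, while a history-conditioned update is shaped by the whole sequence that led there.

```python
# Sketch of the Markovian / non-Markovian contrast with invented toy rules.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))

def markov_step(state):
    # Next state depends only on the current state.
    return np.tanh(A @ state)

def history_step(history):
    # Next state depends on the entire sequence of states that led here.
    weights = np.exp(-0.5 * np.arange(len(history))[::-1])   # recent past counts more
    context = sum(w * s for w, s in zip(weights, history)) / weights.sum()
    return np.tanh(A @ context)

history = [rng.normal(size=3)]
for _ in range(4):
    history.append(history_step(history))       # the past keeps guiding the generation
```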
This puts you in good company with Jacob Barandes.
We need to talk more. I think so. And you know, it's something we've talked about
offline. I think physics perhaps ultimately has this sort of non-Markovian property,
that the universe sort of has a memory; it has to,
in order to produce, you know,
consistent coherence of space, you know,
space-time has to have a sort of memory.
If it's just instantaneous, this current state,
well, then it wouldn't really know what to do. It has to sort of know what happened recently.
Just a moment. In your model, our minds work
autoregressively and must be non-Markovian.
Yeah. And this is how our cognition works, which we didn't exactly get to; we got to language being an autoregressive model.
Your next thesis was that cognition itself is autoregressive in a similar manner.
Later, maybe we can explore it here today, maybe we'll save it for the next part, it's that physics
itself is autoregressive. However, physics is a model and many people will conflate physics with
reality where physics is our models of reality. So are you making the claim that reality is non-Markovian,
or are you saying that necessarily as we model reality,
it will be non-Markovian?
No, I'm making the former claim that reality itself is non-Markovian.
That we observe in physics certain kinds of phenomena,
like that we end up having to use tools like,
we refer to things as forces,
that ultimately are really kind of sneaking
in a past.
And the idea is that the deterministic nature of the fact that there's coherence, the spatial
temporal coherence, the fact that things move the way they do through space, there's a contingency
on the past in a way that you can't really capture by saying you could fully just…
The past is actually present. The past is in the present in a deep way. The universe really has to
have a memory in order to produce the next frame, so to speak. That's sort of the shallow version
of the claim. No, it's not about our particular characterization of physics.
Our characterization of physics observes certain kinds of spatiotemporal continuity, certain
kinds of contingencies that really depend on what's happening in not just, it's not
about this instantaneous moment.
In some ways, it's like Zeno's paradox. We can use calculus and say,
no, no, in fact, there's an instantaneous rate of change. But that's a mathematical trick that's
really getting away from the fact that, no, there isn't an instantaneous anything. There's simply
continuity that depends on what's happened in the past. But I know I'm going to get attacked by physicists and I'm not really well equipped to fend them
off so I don't want to be too bold in this piece because it's not in my wheelhouse.
But I do want to take that question in this conversation.
Do I think the brain is just leveraging sort of the memory of the
universe?
No, I think the brain, and this is an empirical claim, we see interesting features of the
brain like feedback loops.
There's all these backwards kinds of connectivity.
There's recurrent loops, things like that, and they're not well understood.
And the predictive coding has some things to say about that. I have some things to say about predictive coding. And I think that
what we may find is that this kind of memory, this kind of continuous, we can call it a
context, a total continuous activation, but this ability to use the past to guide the next generation
is going to end up being physiologically built into the brain.
It's not that the brain is just leveraging memory of the universe.
No, the brain has to do memory.
It has to actually retain the words that I said a couple of seconds ago to be able to
generate the next word appropriately.
And in fact, that's what we see. And what we see from, you know, so-called working memory
experiments, you can really go back in and say what happened before. My claim is that it's not
because it's there to retrieve, but rather it's just guiding my current generation. But still,
it's represented. It's there. What happened in the past, you know, it's not like Vegas, right?
What happened in the past doesn't stay in the past.
It actually guides the current generation.
It's guiding what I'm saying right now, and it's doing so smoothly, meaning it's happening
from a second ago, it's happening from a few seconds ago, but all of this is beautifully
modelable
using large language models.
We can just look at attention weights.
We could say what is the impact of information
from this far back on a current generation
and so on and so forth.
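A toy version of that measurement, using a random attention matrix with a built-in recency bias rather than data from any real model, just to show the shape of the analysis: average attention weight as a function of how far back the attended token is.

```python
# Toy analysis: average attention weight vs. distance into the past.
# The attention matrix is random with an added recency bias, purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 50
scores = rng.normal(size=(n, n)) - 0.1 * np.abs(np.subtract.outer(range(n), range(n)))
mask = np.tril(np.ones((n, n), dtype=bool))     # causal: only attend to the past
scores = np.where(mask, scores, -np.inf)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

for distance in (1, 5, 20, 40):
    vals = [attn[i, i - distance] for i in range(distance, n)]
    print(f"avg attention at distance {distance:>2}: {np.mean(vals):.3f}")
```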
But I think the brain has to do this.
I don't think the brain is doing,
probably not doing what these large language models do.
And that's one of the reasons I say,
I'm not claiming that we are a transformer model.
I'm not claiming we are GPT in its current incarnation.
What I'm claiming is that the fundamental math is what you just said before:
vector times matrix gives the next vector; autoregress; do it again.
That's sort of the level of abstraction at which I think it's accurate. We don't have the whole context.
We don't have the entire conversation we've just had. GPT does, and it's probably a deep
inefficiency in the way these models run right now. They're very computationally expensive.
Too computationally expensive to run in a brain, most likely. We don't store all that
information. We forget stuff, right? GPT doesn't; in context, it doesn't forget. Although, if you go far back enough in context,
it kind of does, which is interesting,
and probably is similar to what we're talking about,
because you're weighting things that are further back less.
But in humans, we're not doing the whole context.
We're not even doing like 30 seconds back perfectly,
but some representation.
And what the nature of that representation is, that's what I want to do with the rest
of my life.
I want to understand what it means in people to talk about what does the context look like
in people?
What is that activation?
How is it physiologically instantiated?
And what are its mathematical properties?
How much does, how is what I said 10 seconds ago influencing what I'm saying now?
How about 50 seconds ago? How about 10 minutes ago? How about a year ago?
Does this thing continue? Is there dynamics that are continuing over months and years?
Possibly. It doesn't all have to be fine-tuned weights.
It could be that there's decaying activation that spreads over much longer periods.
Once you allow that it's not explicit retrieval in the working memory form, then all bets are off as to how the dynamics of this thing actually works. So
this is, you know, I see this as a possible new frontier for thinking about, you know,
what memory really means in humans. But I think physiologically, you know, coming back
to that question, and there I was just trying to do it. I was like, okay, let me rerun.
What was the original question, right?
So in the brain, what's happening in the brain? I think my hypothesis actually
leads to some concrete predictions: that we're actually going to be able to find some correspondence.
Unlike the working memory model, I think we're going to be able to find,
10 minutes back,
some activations that are interpretable; we'll be able to decode them as guiding my current
expression, my current speech. It's very different, by the way, than saying, you know,
the classic decoding model in these things is: here's some neural activity.
Is it this picture or that picture? Is it this word or that word? It's not going to look like that.
We're not going to be able to decode it in the sense of a concrete,
specific, static thing. We have to decode it in terms of whether it's guiding my
next word because that's what it's doing. It's not there to be retrieved. It
doesn't have a concrete specific meaning. It has meaning insofar as it's guiding
my next generation.
And so we have to think about this entire project differently. If we want to think about longer term
working memory, so to speak, we have to think of it in terms of how my speech, how my behavior now
is influenced by what happened a while ago, not just in terms of some neural activity. We have to think about
it in this context. What exactly does that look like?
I don't know.
So one of the reasons I was, and am, excited to speak with you is that I see
this as a new frontier as well.
But for me, I have a side project which I'll tell you about maybe off air because I'm not
ready to announce it.
But there are philosophical questions that we can look at with the new lens that's gifted to us by these statistical linguistic models, the ones we call LLMs.
LLMs, sorry. Physical philosophy, I don't know if you've heard of that. Have you heard
of this term physical philosophy? No. So you can use philosophy to philosophize about physics,
but you can also use physics to inform your philosophy. So there are some established
concepts and theories and empirical
findings from physics like special relativity or quantum mechanics that inform and constrain
or even reframe traditional philosophical questions such as the nature of time that
wouldn't be there had we not invented special relativity or found special relativity.
Okay, so I think there's something about these new models that can be used to then inform
philosophical questions.
Like you mentioned, there is no symbol grounding problem.
I don't completely buy that, but it's interesting.
I'm not sure I buy it, but at least my LLM doesn't buy it.
Now, speaking of physics, the questions you'll have to answer to a physicist about whether the
universe is autoregressive or non-Markovian are: well, if physics has a memory, does that
mean that energy isn't conserved?
So is a particle carrying with it its memory?
Then why isn't it heating up or getting more massive with time?
Why isn't it going to form a black hole?
And this is why I probably, you know, this is why I venture very carefully into these waters.
Because I would need some time to go and read and think about questions like that.
You're in a much better position to ask and reason about those questions.
Yes.
Then you'd also have to talk about why the present-plus-velocity way of viewing the world is so
successful. To predict an eclipse, you don't require knowledge about 100 and 200 and 300 years ago all at once.
Right.
You just know the present pretty much.
Right, but even velocity, again, if you sort of take me at...
If you consider the sort of the instantaneous,
the idea that velocity, well,
it isn't really in the present, right?
You can only get velocity stretched over time.
It only has meaning over time.
But you could say this particle has this velocity
at this time, but that's a cheat, right?
In some ways, I see that really
maybe it's just a rearticulation. So the physics that we've got, we've been able to do this
sort of symbolic representation of things like velocity that are sneaking in this kind
of temporal extension in a way that I think may not end up, you may not end up in a radically different place,
thinking about this as the universe having memory,
as long as you just accept that velocity is a convenience,
that it's a kind of way of communicating some property
such that you can say that this is happening instantaneously,
but that's not real. So again, you're in good company with Jacob Barandes.
I'm not saying that these questions are in principle unanswerable,
but something else is that, look,
if the universe has a memory,
let's say a particle has a memory,
how much of a memory? Does it know about more than its given space,
like more than its neighbor?
Because then do you violate locality?
These are different questions that will have to be answered.
Yeah, and I wish I could tell you that maybe this is a solution to quantum weirdness and non-locality.
Maybe it is. Maybe it has something to do with that.
That there's, you know, that even in a distant, you know, long after two particles have gone their merry way,
that there is some memory of their shared origin
that somehow, I still don't know how that gives you
spooky action at a distance.
It's not a good account, but it might have some relevance.
If you think about things very differently,
if you think about the universe has memory,
well, what does that change?
If you just speculate on that and try to reframe things that way, could it potentially help
solve some of these issues?
I don't know.
So let's go back to language.
A child is babbling.
Yeah.
Okay.
So let's call it vocal motor babbling.
It doesn't actually know what it's doing.
When does it decouple and become a token,
like a bona fide token with meaning?
That's a great question.
I would say that it becomes a token when the infant learns that
a specific phonological unit
has relations to some other phonological unit. Language ultimately is completely determined by
relations. And so it might be a very limited initial map of the token relations, but as soon
as it's relational, then we would say that that becomes discretized such that
it's meaningful to say that these symbols have the relation to one another.
If it just sounds ba-ba-ba-ba-ba-ba, right, ba-ba-ba-ba-ba has no specific relation to
any, to ba-ba-ba-ba-ba, but maybe ba-ba ba does or maybe ba, right?
It could be a candidate.
As it turns out in English, it doesn't.
But you know, da da ends up being a unit.
And what do we mean it's a unit?
It means that that unit is a discrete symbolic representation such that it has relations to other units.
So I would say that when something becomes, it fits into the relational map is when we
would call it, it's discretized as a token.
So help me phrase this question properly, because I haven't formulated it before, so
it's going to come out ill-formed.
Earlier you talked about analog and I believe you were referring to it as like the animal brain is analog
but then the language is digital if that's the correct analogy.
Yeah, symbolic maybe. I don't know; digital is how computers actually instantiate, you know, with ones and zeros or whatever,
a sort of symbolic representation.
But yes, symbolic.
Okay, I don't know how much of my question then dissolves if it's symbolic instead of
digital.
Okay.
What I was going to say was, the written word is something like 5,000 years old or so.
I think the oldest is 3,500 BC.
Right.
Somewhere thereabouts.
So for tens of thousands of years, if not hundreds of thousands of years, maybe millions,
it was just speech, just speaking, in analog form.
Right.
OK.
Is there anything then about language that changes because it wasn't written down?
Sorry, is there anything about your model that changes because
it wasn't there to be tokenized in such a discrete manner?
It's such a great question.
I've been thinking about exactly that.
I don't think anything changes.
What's crazy about it is that until the written word,
people might not have even thought
about the concept of words at all.
And so we were even more oblivious as a species to the idea that there were these individual
discretized symbols that had relations amongst each other.
Because until you see them outside of yourself, they just run.
If they're just running in the machinery of what language, how it's meant to run,
it was just an auditory medium.
You don't really necessarily even think about them as
being distinct from one another.
You just have a flow.
You just make these sounds and stuff happens.
Once we started writing things down,
and especially phonetically,
because if you think about hieroglyphic and pictorial kinds of representations,
they really don't actually capture words.
Very often they're not distinct;
they can actually be a little richer than a single word.
So it was only with writing that maybe people really start to become aware that we have these things called words.
And now it's only with large language models that we really understand what words are,
which are these relational abstractions.
I don't know if symbols is just another word.
I don't know if that even captures it fully.
But what's wild about it is that the brain was doing exactly this: the brain had
tokenized these sounds and was using the mapping between them in order to produce language.
Probably long long long before anybody ever sort of self-consciously had a conception
that there's such a thing as a word.
And so that just blows my mind.
It speaks to what I think is a very deep mystery, a very deep mystery.
Where the hell did language come from?
Here's what didn't happen. There was not a symposium of, you know,
quote unquote cavemen or let's use the more modern term,
hunter-gatherers.
And they had to figure out, how do we make an autogenerative,
autoregressive sequential system that is able to carry
meaning about the world.
This thing is just ridiculously good.
And it's operating over these arbitrary symbols.
And again, when I say arbitrary symbols, just to recap, it's not that it's arbitrary like
the word for snow is this weird sound snow and it's kind of like what?
No.
Arbitrary in the sense that the map is the territory, right? It's the relations that matter between
these symbols. Is it completely arbitrary, though? For instance, there's kiki
and bouba, you've heard of those? I think those are cute. I mean, though,
that's the exception that proves the rule to a large extent. I don't
think... I think it is largely arbitrary.
There's also the fact that words themselves
have an action component.
So when you scream a word,
you can physically shake the world around you
and it shakes your lungs.
And if you speak for too long, you can die.
Let's say if you just exhale and you don't inhale.
It is a physical activity
and it's hard to wrap your mind around like that's not symbols.
Right. That's not exactly captured by
the symbols or by just the sequence of words.
Again, I'm just following where the data leads me
because in the large language models,
they no longer have any of those properties.
It's just an arbitrary vector.
The tokenization in the end ultimately, yes, there's proximity, but it's just strings of
ones and zeros.
Well, it's not ones and zeros, but whatever.
Your vector is just a string of numbers that end up having certain mathematical relations
to one another, but completely and totally lost, as far as I can tell, is the physical characteristics of these words.
By the way, I should mention,
a former student and I are actually working on this idea,
this crazy idea of using
that latent mapping that I mentioned in that earlier paper,
to see if maybe that's not true.
I wonder if you could guess what English sounds like
just from the text-based representation.
Or if you've never seen... if you don't know
what sound a D makes, or what sound a T makes,
but you've got the map, you've got the embedding
in text space, and then you've got
some other phonological embedding,
could you possibly guess?
That's a long shot.
So maybe it's not totally arbitrary
and maybe it's gonna be, maybe the radical thesis here
is it's not arbitrary at all,
that the words have to sound the way they do,
that the mechanics actually happen,
like something happens mechanically
based on the sounds themselves.
But my bet is that it's going to be closer to arbitrary. But I could be wrong. But you were going to say,
why wouldn't the platonic space prove that it's arbitrary?
why wouldn't the platonic space prove that it's arbitrary?
Well, if in fact you can't do the mapping at all, if you can't guess it, if the
platonic space says there's no way to get from text representation to phonology.
Phonology is doing its own thing and the word mouse is just for no good reason,
then it's hopeless.
Okay.
But if you can get anywhere and you can actually guess at all,
then that would suggest that there really is
an inherent autoregressive capability just in phonology. And so what that would
mean is it's not at the symbol level there... well, yes, it's at
the phonological symbol level. But maybe that's, you know, happening even at a
mechanical level, like there are certain sounds that are easier to say together
or something like that, which could guide it.
I don't know, it's convoluted in my head right now.
Exactly how this might map out.
But I think it's reasonable now to assume
that unless proven otherwise, it's probably arbitrary
and it's probably arbitrary symbols.
And what matters is the relation between them.
There is no sense in which "mouse" means mouse except that "mouse" ends up showing up after
"trap" or before "trap," and after "the cat was chasing the," and all of that. And
there's nothing else.
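A sketch of the cross-space alignment test mentioned a moment ago (guessing phonology from the text embedding), using random stand-in embeddings rather than real data: fit an orthogonal map between the two spaces on some words and see how well held-out words transfer.

```python
# Sketch: orthogonal Procrustes alignment between a "text" embedding space and
# a "phonological" embedding space. Both matrices are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 200, 16
text_emb = rng.normal(size=(n_words, d))                    # pretend text-space embeddings
true_map = np.linalg.qr(rng.normal(size=(d, d)))[0]
phon_emb = text_emb @ true_map + 0.1 * rng.normal(size=(n_words, d))  # pretend phonology

train, test = slice(0, 150), slice(150, None)
# Best rotation from text space to phonological space on the training words.
U, _, Vt = np.linalg.svd(text_emb[train].T @ phon_emb[train])
R = U @ Vt

pred = text_emb[test] @ R
err = np.linalg.norm(pred - phon_emb[test]) / np.linalg.norm(phon_emb[test])
print(f"relative error on held-out words: {err:.2f}")       # low error => spaces align
```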
Let me see if I got your Weltanschauung down, but in terms of a syllogism.
So premise one would be that LLMs master language using
only ungrounded autoregressive next token prediction. Then you have another premise
that says, well, LLMs have this superhuman language performance just by doing this. And
then you'd say that, well, computational efficiency suggests that this reflects language's inherent structure.
And then the deduction is that therefore human language uses autoregressive next token prediction.
Is that correct?
You got it.
You got it.
I mean, it's not only computational efficiency per se.
There are two ways to put it.
One is, if that structure is there, it would be very odd if we weren't using it.
Very odd indeed.
If that structure is there such that it's capable of full competency, you'd have to
suggest that it's there just incidentally, but humans are doing something completely different.
Okay.
You then go and say that language generation feels real time to us.
So it's sequential and real time.
And autoregressiveness or autoregression explains the pregnant.
Very good.
Present.
You've gotten very good at this, I see.
The pregnant present, that's right.
Exactly right. Exactly.
And yes, what we're doing is we're carving out
the very next instantaneous moment in a trajectory.
But the trajectory contains within it the past and the potential futures.
Now, although we didn't get to this or explore it in detail,
my understanding from our previous conversations is that you would say that
brains have pre-existing
autoregressive machinery for motor and perceptual sequences. And by the way, I
don't know if it's brains or cognition that has it, by the way. Well, remember, the
speculation is that the brain is going to have to have the machinery, the
physiological machinery, to support autoregression. So things like, you know,
like the continuous activation, backward projections, ways of representing
the past may be built into the brain.
Those aren't that distinct.
But yes, I do want to just...
So at this point, I consider it fairly speculative, but there is a very reasonable, there's a good reason to speculate that cognition more
generally is autoregressive in this way. And that's the main reason I think that, well there's two,
but the main reason I think that is because if you believe, as I do, that language is autoregressive in humans, you can either propose that, spontaneously, however language got here, in order for us
to create language, we had to invent a different kind of cognitive machinery that's able to
do this autoregression.
Hold the past, let it guide the future, do this mapping of this trajectory mapping between
the past and the future.
All of that kind of machinery, that computational machinery, would have to have been built special
purpose for language.
To me that seems extremely unlikely.
Yeah, extremely unlikely.
Costly.
Yeah.
Yes.
So there's a term in evolutionary biology called exaptation.
I'm not familiar with that.
So exaptation means you have previous machinery used for purpose A. Exactly.
Then something else comes about and uses that machinery, and perhaps does so even better.
So for instance, our tongues evolved for eating, but then language came about and started to use that machinery.
Now we use it primarily.
Well, I don't know about primarily how to quantify that,
but we use it more adeptly for language.
I think most of our time,
more time is spent talking than eating at this point.
Yes, yeah.
I know, but the reason why I said I don't know,
because we're constantly swallowing saliva at the same time.
So I don't know how much is the swallowing versus the speaking. Okay, so anyhow, predictive coding, to me it sounds
like, and to the average listener it sounds like, predictive coding should line
up well with your model. Why do you disagree with predictive coding? So what
is predictive coding and how does your model clash with it? So predictive coding
in a nutshell postulates that what the brain
is doing, that what neurons are doing is actually anticipating the future state,
the next state that the environment is going to generate. And so
they're basically predicting something about the external world that's gonna
end up getting represented in the brain. And then there's this constant process of prediction and then measuring the prediction versus the actual,
what ends up being the observation.
My beef with predictive coding is that you might very well be able to explain the phenomena
that it's meant to describe in a more efficient way. So predictive
coding to me means that you actually have to have sort of a model of the
external, that what you're doing is sort of simulating. And you're doing it in such a
way that you actually are producing neural responses that don't really need
to get produced very often, because the environment is likely
to produce them.
To me, this seems like an inefficiency and a complexity.
I think there's a much simpler account, in some ways a more elegant account, namely that
what our brain is constantly doing is generating.
Not predicting, but generating.
But that the generation has latent within it a strong predictive
element.
Because of this smooth trajectory, this idea of the past, the pregnant present, that there
is a continuous path from the past to the future, you are in essence predicting, to
some extent, the same way that a large language model is kind of predicting the next token, but it's not really predicting in that sense.
So here's where I strongly disagree, or I'm proposing a different model.
Yeah.
It's that you're not predicting in such a way that you're supposed to map to something external
to the system.
It's simply generation internally defined that's supposed to have this kind of continuity
to it.
And the external world certainly produces, it impinges on our system, and we are of course
inherently anticipating that we're not going to have a brick wall in front of us as we're running
down the street. When that brick wall shows up, you got to do something about it. And that wasn't implicit in your next token generation.
So you're going to have to radically reorient and do something about that.
And I think that can account for some of the phenomena that are supposed to support predictive
coding.
But the big difference here is that it's all about internal consistency with the anticipation
that that internal consistency is going to also map very well to what's
happening in the world. But that's built in. There isn't any
explicit modeling of the external world. It's that the internal generative
process is so good that it has prediction latent within it.
But there isn't any explicit prediction happening.
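A toy contrast of the two accounts, with invented update rules that are nobody's published model: a predictive-coding caricature explicitly predicts the next observation and corrects by the error, while a generation-only caricature just continues its own internal trajectory, so any "prediction" is latent in that smoothness.

```python
# Toy contrast: explicit prediction-error correction vs. pure internal generation.
# Both update rules are invented caricatures for illustration only.
import numpy as np

def predictive_coding_step(state, observation, lr=0.5):
    prediction = state                           # explicit guess about the world
    error = observation - prediction             # measure prediction vs. actual
    return state + lr * error                    # correct toward the observation

def generation_step(history):
    velocity = history[-1] - history[-2]         # continue the internal trajectory
    return history[-1] + velocity                # no explicit comparison to the world

world = [np.array([t, 2 * t], dtype=float) for t in range(5)]
pc_state = np.zeros(2)
gen_history = [world[0], world[1]]
for obs in world[2:]:
    pc_state = predictive_coding_step(pc_state, obs)
    gen_history.append(generation_step(gen_history))
# The generated trajectory tracks the world here only because the world itself
# is smooth; nothing in generation_step ever looks at the observation.
```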
So I'm confused then.
If the symbols are truly ungrounded, then what's preventing it from becoming coherent
but fictional?
So that is to say what tethers our language to the world?
Yeah, and the answer would have to reach back again to that latent space.
So let's say my language system, you know, wants to go off the deep end and says,
actually I'm sitting here underwater talking to a robot, you know,
and everything, you know, most of the words we've said up till now
are pretty consistent with that and I could just say,
look, you know, I'm expecting some fish to float by in the next second. Well, my perceptual system is going to have something to say about that.
And so there has to be this tethering, as you're calling it. Of course, there is grounding in a
sense, in the sense that there has to be some sort of shared agreement within what I think
is maybe this latent space or something like that. There is an exchange, there's communication
between these distinct systems.
But the language system can unplug from all that
and it could talk about what would it mean
to be sitting and talking to a robot underwater
and it will have a meaningful,
coherent conversation about that,
all internally consistent.
And you know, you could give the prompt,
what if instead of Curt, it was actually a robot Curt? How would that change things?
And I could go in and get philosophical about that. And the point is that the linguistic
system has all of its own internal rules and any trajectories, many different trajectories
are possible, although it is strongly guided by the past. But there is also impinging information
from our perceptual system that also continues to guide it.
Actually, I should mention, one of my theories of dreams is that it's an autoregressive generation in the perceptual system.
One of the reasons I think cognition more generally is autoregressive is because we can do imagination, but it takes place over time. We can imagine a sequence.
Dreams are what happens when you're no longer as deeply, as closely tethered by the recent past.
So the context is, the weight of the context is not as strong.
So all of a sudden, you're in the dream, you're on a motorcycle,
and then suddenly you're flying. Because frame to frame, it's actually totally consistent, but it's
not consistent with the recent past. So this kind of tethering, it happens in language,
namely, I have to be consistent with my more recent linguistic past, but we also do some
tethering to the non-linguistic embedding, there is this crosstalk that happens.
And so our language system doesn't just go off the deep end.
It retains some grounding, not the philosophical kind of grounding,
not the, you know, this symbol equals this percept,
but the kind of grounding where this storyline, in a certain sense, if you want to think about it that way, more semantically, more vaguely, this linguistic storyline is going to have to match my perceptual storyline.
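One crude way to picture that crosstalk, as a toy sketch of my own rather than anything proposed on the episode: score each candidate next word partly by linguistic continuity and partly by how well it fits what perception is currently reporting, with a single hypothetical weight deciding how much say the perceptual side gets.

```python
# Toy sketch: the next word is chosen by blending how well it continues the
# linguistic context with how compatible it is with what perception reports.
# The blend weight stands in for the "crosstalk" between the two systems.

def pick_next(candidates, language_score, percept_score, crosstalk=0.5):
    """Score = (1 - crosstalk) * linguistic continuity + crosstalk * perceptual fit."""
    def score(word):
        return (1 - crosstalk) * language_score[word] + crosstalk * percept_score[word]
    return max(candidates, key=score)

# Hypothetical scores after "I'm expecting some ... to float by":
language_score = {"fish": 0.9, "cars": 0.4, "leaves": 0.6}   # fits the underwater story
percept_score = {"fish": 0.1, "cars": 0.2, "leaves": 0.9}    # what the eyes actually report

candidates = ["fish", "cars", "leaves"]
print(pick_next(candidates, language_score, percept_score, crosstalk=0.0))  # -> fish
print(pick_next(candidates, language_score, percept_score, crosstalk=0.8))  # -> leaves
```

With the perceptual weight at zero, the story happily stays underwater; turn it up and what the eyes report wins out.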
Okay. So in the same way that with these video generation models,
you see Will Smith eating spaghetti,
like the three-year-old joke.
Yes.
And if you just look at it sequentially, every three frames makes sense, but then he's just morphing into something else, and now he's ballooned, and it looks dreamlike.
Exactly. That's what's happening in video generation,
and everybody knows the trajectory now.
How is it going to get better?
Longer context, and that just means the autoregressive generation
is more and more anchored in the past,
and that past becomes a more meaningful smooth curve.
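A toy numerical sketch of that point, again my own illustration with made-up numbers: each new "frame" only has to stay close to the average of its recent context, so everything is locally consistent, but the shorter the context window, the further the whole trajectory drifts, dream-style.

```python
import random

# Toy sketch: each new "frame" must stay near the average of the last
# `context` frames. Local consistency is guaranteed either way; a short
# context lets the trajectory wander, a long context anchors it to the past.

def rollout(context, steps=2000, step_size=1.0, rng=None):
    rng = rng or random.Random(42)
    frames = [0.0]
    for _ in range(steps):
        window = frames[-context:]
        anchor = sum(window) / len(window)
        frames.append(anchor + rng.uniform(-step_size, step_size))
    return frames

for context in (1, 10, 100, 1000):
    drift = abs(rollout(context)[-1])   # how far we ended up from where we started
    print(f"context={context:5d}  final drift ~ {drift:7.2f}")
```

Running it, the final drift tends to shrink as the context window grows, which is the "more anchored in the past" behaviour in miniature.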
But it seems like there must be something more tethering us to reality than just long context.
Says you.
No, there is. And what I would say is, certainly in the case of language, like I said, when we step into this world we inherit this corpus of language, and that corpus is a certain kind of tethering. Words have the relations they do to each other, and that carries meaning. You know, the words don't just line up with each other any old way. You can't just use language however you want.
You end up having to adapt and adopt the language that you're given.
And I would say in the case of language, even more so than the perceptual, what we do is we
learn that tethering.
And it is a certain kind of reality.
It's a linguistic reality, but it's not arbitrary.
It's been honed over God knows how many years for that mapping to be useful.
And in order to be useful, it actually has to map somehow to perceptual reality too.
That is definitely there.
And so, no, it's very strongly tethered.
It's not just poetic.
We're not just doing a poetry slam when we're talking.
We're not just spitting out words that are loosely related to one another.
No, the sequence matters.
It's extremely granular and, what's the word? It's funny that I can't come up with the word right now. Beautiful is not the right word, but it's precise. There's such incredible detail in
how each word relates to one another. And this is something we didn't create. You and
I didn't create this. This is something that humanity created. It has all of these rich relational properties that are this tethering, that somehow carry meaning about the universe, only as expressed as a communicative, coordinated tool embedded within a larger perception-action system. But we should respect it.
Language is an extraordinary invention. I think we should have a completely new respect for just how rich and powerful it is.
It's not some symbol, this symbol equals this mental representation or this object.
No, it's this construct that contains within its relations the capacity to express anything in such a way that my mind can meet your mind and do stuff.
How the heck does that work? Who knows, but it's awe-inspiring.
So is there something about your model that commits you to
idealism or realism or structural realism or anti-realism or foundationalism or what have you?
Like what is the philosophy that underpins your model?
And also what philosophy is entailed by your model, if any?
Yeah, that is a great question.
And I would say, I've come to actually sometimes use the term linguistic anti-realism, and it's the idea
that language is not what it thinks it is.
We engage in our philosophical thoughts and even our sort of general thinking about who
we are, what is our place in the universe.
Much of that takes place in the realm of language.
And the conclusion I've come to is that language as a sort of semi-autonomous,
autogenerative computational system, modular computational system,
doesn't really know
what it's talking about in a deep way. And there is really a fundamentally
different way of knowing. The sensory perceptual system, the thing that gives
rise to qualia, the thing that gives rise to consciousness, and here's a big one,
the thing that gives rise to mattering, to meaning. What do we care about?
We care about our feelings.
We care about feeling good or not so good, pleasure, pain, love, all the things that
actually matter.
These are actually, these live in what I call the sort of the animal embedding.
It's something that other species,
non-linguistic species, they can feel,
they can sense, they can perceive.
They don't have language.
We think, oh gosh, they don't understand anything.
Well, what if it's the opposite?
What if it's our linguistic system
that doesn't understand anything?
What if it's our linguistic system that's actually a construct, a societal construct, a coordinated
construct, but as a system, it's a construct that doesn't actually have a clue about what
pain and pleasure are.
It has tokens for them, and the tokens run within the system to
say things like, I don't like pain, I like pleasure. Those are valid constructs and they
kind of do the thing they're supposed to do in language. But a purely linguistic system,
and I think language is purely linguistic, I guess is one way to think about it, doesn't
really have contained within it these other kinds of meaning.
Now, first of all, this has implications for artificial intelligence, thinking about whether AI
can have sentience, should we care about if your LLM starts saying,
this is terrible, don't shut me off, I'm having an existential crisis.
Perhaps, I would argue that we shouldn't worry about it. That's what my LLM says all the time.
I don't know which LLM you're hanging out with, but...
My Curt LLM.
The Curt LLM. Yes, the Curt LLM.
But the Curt LLM, as an LLM, perhaps doesn't really have that meaning contained within it in a deep sense.
It's again because of the mapping.
It is communicating something probably about the non-LLM Curt.
When you say ouch, there is pain there.
I'm not denying that. But what I'm saying is that as a sort of
thinking rational system that does the things that language does, that system
itself may not have within it the true meaning of the words that it's using in
a deep sense. I don't want to take you off course and hopefully this will help
you stay on course, and hopefully it aids the course. An LLM can process the word torment, you see.
But what's the difference between our human brain's autoregressive process that creates
the feeling of torment itself and the word torment?
So my speculation here, and it is purely speculative, is that it's non-symbolic. There's something happening when the universe gets represented in our brain.
It's still in our brain.
It's still a certain mapping.
But when it gets represented, the physical characteristics of the world are actually represented in a more direct mapping.
So think about color.
I mean, we talked earlier about sort of color space.
There's a real, true sense in which red and orange
are more similar in physical color space.
Like there's actually some physical fact about,
and also in brain space.
My guess is that in brain space there really is something about it, and a better way
of saying it is, that the physical similarity has a true analog, and I use the word analog
kind of specifically, a true analog in the analog kind of physical cascade of activity
that's happening in the brain.
Language is symbols.
Is non-symbolic a synonym for ineffable?
I wouldn't have thought of it that way, but that may be a very good way to say it or to
not say it.
Yes, ineffable.
Well, by virtue of being symbolic, by virtue of being a purely relational kind of representation, which is what language is, maybe even more than saying it's symbolic, it's that it's relational. Language is a relational kind of image.
The location in the space matters only because it's a relation to other tokens in the space.
That's not true in color perception.
In color perception, where you are in, sort of, probably the embedding space is going to have physical meaning.
It's gonna be related to the physical world
in a much more direct way.
And so the space, even though it's an internal space,
right, the perception of color is still,
just comes down to neurons firing. We're
not actually getting the light. The light's not getting into our brains. But the mapping is such that it preserves that structure. And I even think of it as an extension. It's not symbolic. It's not a representation.
It's an extension. It's like when you drop a rock in water, the ripples that happen afterwards
are not a representation of the rock,
but they carry information about that rock,
but it's just a physical extension.
It's a continued physical process.
Interesting.
I think that's what's happening in sensory processing.
And I think that has something deep to do with qualia.
I don't think language has that. I think because
it's purely relational, it's not a rippling of anything. It's its own system of relational
embeddings that aren't continuous in any way with the physical universe.
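A small sketch of my own to make that contrast concrete (the wavelength numbers are only rough stand-ins): in the "perceptual" toy space the coordinate itself is pinned to a physical quantity, whereas in the "token" toy space you can rotate the whole embedding and every pairwise distance, which is the only thing that carries meaning there, survives untouched.

```python
import math

# Toy contrast: a perceptual space whose coordinate is anchored to a physical
# quantity (approximate wavelength in nanometres), versus a token space where
# only the pattern of distances between points means anything, so any rotation
# of the whole space is just as good.

perceptual = {"red": 700.0, "orange": 620.0, "blue": 470.0}             # nm, rough values
token = {"red": (0.9, 0.1), "orange": (0.7, 0.3), "blue": (-0.5, 0.8)}  # arbitrary coordinates

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rotate(space, theta):
    c, s = math.cos(theta), math.sin(theta)
    return {w: (c * x - s * y, s * x + c * y) for w, (x, y) in space.items()}

rotated = rotate(token, theta=1.2)

print("physical gap red-orange:", abs(perceptual["red"] - perceptual["orange"]), "nm")
print("token distance red-orange, before rotation:", round(dist(token["red"], token["orange"]), 3))
print("token distance red-orange, after rotation: ", round(dist(rotated["red"], rotated["orange"]), 3))
# The relational structure survives the rotation; the absolute coordinates of
# the token space never meant anything on their own.
```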
Do you think that has something to do with God?
Well, I think that if we think of the grand unity of creation, there's some sense in which
language breaks that unity. And I think that we can lie in language in a way that we can't in any other substrate.
Hmm.
And so I think, by becoming purely linguistic beings, as the vast majority of our time as humans is spent in the linguistic space, we're hanging out there.
Our minds are hanging out there.
I think we have perhaps forgotten something that animals know about the universe.
And it's this kind of unity because the animal processing is an extension.
It's a continuation of the world.
Since the world, the universe is one thing in some sense. It is everything. We don't even have to
get into non-locality, right? The origins of the universe. Let's just talk about the Big Bang or
something like that. What's happening here now in some ways is connected quite literally to what happened
elsewhere way back in time.
So I think this sort of unity that mystics talk about is much closer to sort of the animal
brain than the linguistic brain because the linguistic brain actually creates its dichotomy.
It breaks the continuity.
Symbols, I sometimes use the phrase, it's like a new physics.
The relations are what matters, and it's no longer continuous,
it's no longer an extension of the physical universe.
It interacts with the physical universe in a way that we, as we see,
we can sort of do this mapping so that when I talk, it can have influence on the physical
universe, it can have influence on my perception, it can have influence on my behavior. I think that
sort of the rationalist movement, the positivist movement, sort of modernity itself is a complete hijacking of our brain by the linguistic system.
And I do think that has something to do with the denouement, the kind of the God is dead
kind of modernity equals somehow the decline.
And so, you know, a rationalist would say, well, that's appropriate
because we've figured out how the universe works and we don't need any of this hocus
pocus. But what about the feeling of unity? What about the sense of sort of a cosmic whole?
Are we so sure that we're right and those ancients were wrong?
And yes, I think, I do think that, that this has, has very significant consequences for
thinking about some of these intangibles, these ineffables.
So a snake that mimics the poison of another snake in terms of its color, that's a form
of a lie.
Now, would you say that that is somehow symbolic as well, though?
No.
And yes, there is mimicry and there is, you know, a certain sense in which animals can
engage.
They don't even know they're engaging in subterfuge. But that's much more continuous with, okay, you've just pushed the cognitive agent into a slightly different space which is consistent with some other physical reality. That's very, very different than saying we are made of atoms and particles and everything that happens is determined by the forces amongst
these atoms, none of which is something that we have any material animal grasp of, any
true physical grasp of.
These are words.
These models are really words and they run in words and they run very well to make predictions
and to manipulate
the physical universe. But they're stories and they're linguistic stories. And those
kinds of stories can be, I won't even say they're, according to my own theory, language
doesn't really have physical, doesn't point to physical meaning. And so even saying that it's a lie or untrue isn't quite right.
But within its own space, you can go off in many different directions.
And maybe the danger is not in thinking about things; it's in thinking thoughts that aren't really true.
It's falling too deeply in love with the idea that
idea space and language space is the real space.
Yes, interesting.
See in our circles,
so when we're hanging out off-air,
when we're hanging out with other professors and
on the university grounds and so on,
we praise this exchange of words and making models precise and doing calculations and so on.
And I've always intimated that this is entirely incorrect.
And I haven't heard an anti-philosopher, like a philosopher that was an anti-philosopher,
except one who was an ancient Indian philosopher.
I think his name is Jayarasi Bhatta. I'm likely butchering that pronunciation, but I'll place
it on screen anyhow, who was arguing against the Buddhists and the other contemporary philosophers
by saying, look, you think 'know thyself' is what you should be doing, or he didn't say it exactly like this, but you think of it as the highest goal. However, who is living more truly than a rooster? None of you are living more truly than something that's just being.
Yes, exactly. That is the exact same intuition. And yes, it's this idea I articulated to myself a long time ago, that the fly knows something that our linguistic
system can never know.
It knows something.
It really does.
Simply existing and being is a form of knowledge, and it's a deeper one than whatever it is
that our fancy rationalist kind of perspective has given us.
Our rationalist perspective is very, very powerful
in coordinating and predicting.
But in terms of true ontology, I suspect
it's actually the wrong direction.
It's created a false god of linguistic knowledge, of shared objective knowledge, when the subjective
is the one that we really have.
It's the Cartesian, right?
It's the Cogito.
What we know is what we experience.
That's the only thing we truly know. And language doesn't
really live there.
So I was watching Everything Everywhere All at Once.
I never saw it.
Because I also had another intimation. I'll spoil some of it and if you are listening
and you don't want it spoiled, then just skip ahead. But I was telling someone that I think if there's a point to life, it's one of two things.
And so this is just me speaking poetically and not rigorously.
One is to find a love that is so powerful it outlasts death.
Okay, so that's number one.
And then number two is to get to the point in your life where you
realize that all your inadequacies and all your insecurities and all your missteps and your
jealousies and your malice and so on, rather than it being a weakness, it's what led
you to this place here and here is the optimal moment.
It's to get that insight.
So I don't know how to rationally justify any of that or explain it.
But anyhow, when I said this one time on a podcast,
someone else said, hey,
that latter one that you expressed was
covered in everything everywhere all at once, so I watched it.
What was great about that movie,
and here's where I spoil it, is that, and it makes me
want to tear up.
The movie's silly and comedic in a way that didn't resonate with me, but there's this
one lesson that did.
The woman, she's a fighter, the main protagonist.
She's a fighter and she's strong-headed, and she has this husband who is weak and she's
always able to put down. And so then you think, okay, well, this is a modern trope where there's always the stronger woman and every guy is just a fool and the woman is always more intelligent and so on.
Okay, so you just think of it as, okay, it's just a modern trope.
And the guy is kind and loving to people. Toward the end, she was getting audited by the IRS and something was supposed to happen
that night where she had to bring receipts and she couldn't.
Now the husband was talking with the IRS lady and our protagonist, the woman, was saying
in Vietnamese or in Mandarin,
whichever language it was, oh, he's an idiot. I hope he doesn't make it worse.
The IRS lady then comes to the woman and says, you have another week,
you have an extension. She's like, how did this happen? She talks to the husband. And remember,
this is a movie almost about a multiverse. So you're getting different versions of this. And
there's this one version where the husband's speaking to her and telling her, you know
Evelyn, the main character, you see what you do as fighting.
You see yourself as strong and you see me as weak.
And you see the world as a cruel place.
But I've lived on this earth just as long as you.
And I know it's cruel.
My method of being kind and loving and turning the cheek, that's my way of fighting.
I fight just like you.
And then you see that what he did in another universe was he just spoke kindly to the IRS
agent and talked about something personal
and that softened her. And then you see all the other universes where she was trying to go on
this grand adventure and do some fighting. And the husband then says, Evelyn, even though you've
broken my heart once again in another universe, I would have really loved to just do the laundry and taxes with you.
And it makes you realize you're aiming for something grand and you're aiming to go out
and conquer demons and so on. But there's something that's so much more intimate about these everyday
scenarios.
There's something so rich.
There's also a quote about this, I think it's T.S. Eliot's, that the end of all our exploring will be to arrive where we started and know the place for the first time.
Anyhow, all of this abstract talk.
No, no, no, no. It's exactly what we're talking about. Because if you see yourself as a ripple
in the universes, then you are part of something cosmic and grand. And it's sort of that extensiveness, it's that extensiveness that's being here
now. It's that we aren't just atoms. We're part of a larger thing. You can call it God,
you can call it the universe or whatever. But it's there. I don't think animals think of themselves as discrete. I don't think they do. I think they don't think of an outside and an inside. They don't think of objective and subjective.
It's just this unfolding.
Uh-huh.
Do they have theory of mind or anything like that?
But these are linguistic concepts.
And I think, I do, and I sound like an anti-linguist,
and I recognize the power of it.
I said before, you know, how extraordinary it is,
how rich it is, and I have tremendous respect for it.
But at the same time, I do think that all this talk about objective things, particles,
and we are physical bodies, and we are just this, and we are just that, that is bullshit.
Like, no, we are the universe resonating. We are part of the whole. And thinking objectively, as language requires you to do, actually breaks it.
So I think there's such a beauty in the silence, and it's something everybody knows about the ineffable. Why is it called the ineffable? The ineffable isn't just that you can't say it; it's magnificent.
The ineffable is extraordinary.
Hmm.
Why? Because it's this true extension. Something like that. Again, I'm trying to put it into
words. Right, therein lies the trap. But we're both feeling it.
Well, I'm feeling extremely grateful to have met you, to have spent so long with you. And there are many conversations you and I have had that we need to finish that are off
air as well.
So hopefully we can do that.
And thank you for spending so long with me here.
This was wonderful, Curt.
Thank you so much.
I just want to hang out with someone and talk about this stuff.
So really appreciate it.
I've received several messages, emails, and comments from professors saying that they
recommend theories of everything to their students, and that's fantastic.
If you're a professor or lecturer and there's a particular standout episode that your students
can benefit from, please do share.
And as always, feel free to contact me.
New update!
Started a Substack.
Writings on there are currently about language and ill-defined concepts as well as some other
mathematical details.
Much more being written there.
This is content that isn't anywhere else.
It's not on theories of everything.
It's not on Patreon.
Also full transcripts will be placed there at some point in the future.
Several people ask me, hey Kurt, you've spoken to so many people in the fields of theoretical
physics, philosophy, and consciousness. What are your thoughts? While I remain impartial in
interviews, this substack is a way to peer into my present deliberations on these topics.
Also, thank you to our partner, The Economist. Liking and subscribing helps YouTube push this content to more people like yourself, plus it helps out Curt directly, aka me.
I also found out last year that external links count plenty toward the algorithm, which means
that whenever you share on Twitter, say on Facebook or even on Reddit, etc., it shows
YouTube, hey, people are talking about this content outside of YouTube, which in turn
greatly aids the distribution on YouTube.
Thirdly, you should know this podcast is on iTunes, it's on Spotify, it's on all of the
audio platforms.
All you have to do is type in theories of everything and you'll find it.
Personally, I gained from rewatching lectures and podcasts.
I also read in the comments that, hey, TOE listeners also gain from replaying.
So how about instead you re-listen on those platforms like iTunes, Spotify, Google
Podcasts, whichever podcast catcher you use. And finally, if you'd like to support more
conversations like this, more content like this, then do consider visiting patreon.com
slash curtjaimungal and donating with whatever you like. There's also PayPal, there's also
crypto, there's also just joining on YouTube. Again, keep in mind it's support from the sponsors and you that allow me to work on TOE full-time. You also get early access to ad-free episodes, whether it's audio or video. It's audio in the case of Patreon, video in the case of YouTube. For instance, this episode that you're listening to right now was released a few days earlier. Every dollar helps far more than you think. Either way, your viewership is generosity enough.
Thank you so much.