Theories of Everything with Curt Jaimungal - The Theory That Shatters Language Itself

Episode Date: June 13, 2025

As a listener of TOE you can get a special 20% off discount to The Economist and all it has to offer! Visit https://www.economist.com/toe

Professor Elan Barenholtz, cognitive scientist at Florida Atlantic University, joins TOE to discuss one of the most unsettling ideas in cognitive science: that language is a self-contained, autoregressive system with no inherent connection to the external world. In this mind-altering episode, he explains why AI's mastery of language without meaning forces us to rethink the nature of mind, perception, and reality itself.

Join My New Substack (Personal Writings): https://curtjaimungal.substack.com
Listen on Spotify: https://open.spotify.com/show/4gL14b92xAErofYQA7bU4e

Timestamps:
00:00 The Mind and Language Connection
02:09 The Grounded Thesis of Language
09:29 The Epiphany of Language
13:06 The Dichotomy of Language and Perception
16:24 Language as an Autonomous System
19:48 The Problem of Qualia and Language
23:35 Bridging Language and Action
31:32 Exploring Embeddings in Language
38:21 The Platonic Space of Language
44:17 The Challenges of Meaning and Action
51:05 Understanding the Complexity of Color
52:53 The Paradox of Language Describing Itself
58:19 The Map of Language and Action
1:07:48 Continuous Learning in Language Models
1:11:46 The Nature of Memory
1:22:46 The Role of Context
1:32:18 Exploring Language Dynamics
1:39:44 The Shift from Oral to Written Language
2:11:34 Language and the Cosmic Whole
2:21:35 Reflections on Existence

Links Mentioned:
• Elan's Substack: https://elanbarenholtz.substack.com
• Elan's X / Twitter: https://x.com/ebarenholtz
• Geoffrey Hinton on TOE: https://youtu.be/b_DUft-BdIE
• Joscha Bach and Ben Goertzel on TOE: https://youtu.be/xw7omaQ8SgA
• Elan's published papers: https://scholar.google.com/citations?user=2grAjZsAAAAJ
• AI medical panel on TOE: https://youtu.be/abzXzPBW4_s
• Jacob Barandes and Manolis Kellis on TOE: https://youtu.be/MTD8xkbiGis
• Will Hahn on TOE: https://youtu.be/3fkg0uTA3qU
• Noam Chomsky on TOE: https://youtu.be/DQuiso493ro
• Greg Kondrak on TOE: https://youtu.be/FFW14zSYiFY
• Andres Emilsson on TOE: https://youtu.be/BBP8WZpYp0Y
• Harnessing the Universal Geometry of Embeddings (paper): https://arxiv.org/pdf/2505.12540
• Yang-Hui He on TOE: https://youtu.be/spIquD_mBFk
• Iain McGilchrist on TOE: https://youtu.be/Q9sBKCd2HD0
• Curt interviews ChatGPT: https://youtu.be/mSfChbMRJwY
• Empiricism and the Philosophy of Mind (book): https://www.amazon.com/dp/0674251555
• Karl Friston on TOE: https://youtu.be/uk4NZorRjCo
• Michael Levin and Anna Ciaunica on TOE: https://youtu.be/2aLhkm6QUgA
• The Biology of LLMs (paper): https://transformer-circuits.pub/2025/attribution-graphs/biology.html
• Jacob Barandes on TOE: https://youtu.be/YaS1usLeXQM
• Emily Adlam on TOE: https://youtu.be/6I2OhmVWLMs
• Julian Barbour on TOE: https://youtu.be/bprxrGaf0Os
• Tim Palmer on TOE: https://youtu.be/vlklA6jsS8A
• Neil Turok on TOE: https://youtu.be/ZUp9x44N3uE
• Jayarāśi Bhaṭṭa: https://plato.stanford.edu/entries/jayaraasi/
• On the Origin of Time (book): https://www.amazon.com/dp/0593128443

SUPPORT:
- Become a YouTube Member (Early Access Videos): https://www.youtube.com/channel/UCdWIQh9DGG6uhJk8eyIFl1w/join
- Support me on Patreon: https://patreon.com/curtjaimungal
- Support me on Crypto: https://commerce.coinbase.com/checkout/de803625-87d3-4300-ab6d-85d4258834a9
- Support me on PayPal: https://www.paypal.com/donate?hosted_button_id=XUBHNMFXUX5S4

SOCIALS:
- Twitter: https://twitter.com/TOEwithCurt
- Discord Invite: https://discord.com/invite/kBcnfNVwqs

Transcript
Starting point is 00:00:00 I'm gonna get attacked by physicists. This thing is just ridiculously good. And so that just blows my mind. Professor Barenholtz completely inverts how we understand mind, meaning, and our place in the universe. The standard model of language assumes words point to meanings in the world. However, Professor Barenholtz of Florida Atlantic University has discovered what's unconscionably unsettling.
Starting point is 00:00:25 They don't. Language is actually deconstructing itself. Most startlingly, he argues that our rational linguistic minds have severed us from the unified cosmic experience that animals may still inhabit. I don't think there's a static set of facts. What we've got is potentialities. Most current LLMs operate with purely autoregressive next token prediction, operating on ungrounded symbols. All of this terminology is explained, so don't worry,
Starting point is 00:00:51 this podcast can be watched without a formal background in psychology or computer science. In this conversation, we journey through rigorous explorations of how LLMs work, what they imply about how we view the world and the relationship between our consciousness and the cosmos. Professor, you have two theses. One is a speculative one and the other is more grounded. You even have another more hypothetical one atop that, which we may get into. Why don't you tell us about the more corroborated one and then we can move to the contestable
Starting point is 00:01:24 parts later. Okay, sure. So yeah, I would call them sort of the grounded thesis and then sort of the extended version of that, if we can call it that. The grounded thesis is primarily about language. And the thesis is that human language is captured by what's going on in the large language models. And I mean not in terms of the specific exact algorithm as to how the large language models
Starting point is 00:01:55 like ChatGBT are doing the, are actually generating language, but the core sort of mathematical principle that large language models like ChatGPT run on are what's happening in the brain and is what's happening in human language. And really, the reason I say it's corroborated is because ultimately this isn't even about the brain, it's about language itself. And I think what we have learned in the course of being able to replicate language in a completely different substrate, namely in computers, is that we've learned properties of language itself.
Starting point is 00:02:30 We've discovered. It's not through clever human engineering that we've been able to kind of barrel our way towards language competency. It's that with actually fairly straightforward mathematical principles done at scale, we've actually discovered that language has certain properties that we didn't know it had before. And so the incontrovertible fact, in my opinion, is that language itself has certain properties. Now that we know it has those properties, my claim is, the sort of corroborated claim is that those properties force us to conclude that the mechanism by which humans generate
Starting point is 00:03:11 language is the same as what's going on in these large language models. Because now that we know that language is capable of doing the stuff that it does, now that we know it has the properties to, and I'm sort of giving away the punchline, to self-generate based on its internal structure. It's unavoidable to think that we are using the same basic mechanism and principles because it would be extremely odd to think that we have a completely different orthogonal method for generating language. Put differently, if we are using completely different mechanisms than the language models, then it's extremely unlikely that the language models would work as well as they do. The fact that language has this property that it can self-generate, the fact that that property actually leads to human-level language, to me, forces the
Starting point is 00:04:03 conclusion that there's only one way to do language. And that one way is the same in humans and in machines. The obvious question that's occurring to the audience as they listen right now is, how do we know that whatever mechanism is being used by LLMs isn't just mimicry? Right. And so that's sort of the critical question. Is this mimicry? Right.
Starting point is 00:04:24 Is it is what the models are doing in a sense, learning a kind of roundabout technique that captures some of the superficial components of language in humans, but ultimately it's a completely different approach. And so, you know, my argument is really from the fundamental simplicity of these models. So let's just talk really quickly about how large language models work, things like Chachi BT. What they're doing is learning, given a sequence. Let's say the sequence is, I pledge allegiance to the, and then the model is being asked to do this thing called next token generation.
Starting point is 00:05:05 What's the probable next word? We'll say word for the purpose of this conversation. We're going to call tokens a word. Token is the more technical term about how you chop up and encode the information in a sequence of language, but we're just going to say word. So guess the next word based on that sequence. So, and then what you do is, in these models, is you train them to guess simply that. All you hear is, here's a given sequence. It can be a sentence, it can be a paragraph,
Starting point is 00:05:37 it can be, frankly, an entire book, depending on how big your model is, how much it can handle. And then guess just the very next word. And what we've discovered, and I really want to use that word in particular, because it was by no means a given that this would work, what we discovered is if you train a model to do that, to simply guess the next word, then take that word, tag it onto the sequence, and feed it back in, this is sufficient to generate human-level language.
Starting point is 00:06:10 Now the reason I believe that this demonstrates something not about our engineering or even about the models themselves, because there's different ways you might build a model that can do this, is because this very simple trick, this simple recipe of simply guessing the next word turns out to be sufficient to generate language at human levels, to the point where there really are no benchmarks, no standard benchmarks that these models aren't able to do. And so what that suggests to me is just by learning the predictive structure of language, you're able to completely solve language. That means that that is likely to be the actual fundamental principle that's built into language in order to generate it. If we
Starting point is 00:06:53 had to come up with a very complex scheme, for example, you know, syntax trees, complex grammar, long-range dependencies that we had to take it into account, and through enough compute we were able to kind of master that, then I might argue, well, you know what we're doing is possibly figuring out a roundabout way to capture all this complexity. But it's the simplicity itself that simply being able to predict the next token, the next word, is sufficient to do all of this long-range thinking to be able to take an extremely long sequence and then produce an extremely long sequence on the basis of that, that suggests to me that we discovered a principle that's actually already latent in language. That
Starting point is 00:07:37 we just had to throw enough firepower at it but with an extremely simple algorithmic trick and then language revealed its secrets. So to me, this really suggests that there's still a lot of science that needs to be done and this kind of work that I'm doing in my lab in terms of really being able to hammer down how the brain is instantiating this exact same algorithm. It's not going to look exactly like chat GPT. It's not necessarily going to be based on what we're called transformer models, which is something we can get into a little bit But as far as the core principle of prediction of the next token The fact that that solved language so handily to me really argues that that is the fundamental algorithm
Starting point is 00:08:20 That's the that is the fundamental algorithm that when you apply it, boom, language emerges. If you just have the corpus, you have the statistics, and then you do next token prediction, language just like add water. The fact that it emerges so readily from that without having to do anything complicated, to me suggests that it's latent within language in the first place, And that language is designed in a sense in order to be able to be generated through this simple predictive kind of mechanism. Okay. So, Elon, you and I have spent several days together. In fact, you're in the video with Jacob Barndes and the Manolis Kels one will place that on screen and I'll put a pointer
Starting point is 00:09:02 to you. And you were in the background of the interview with William Hahn on Williams. Always in the background, never in the foreground. Here we are. Okay, well, yes, great. You have a large epiphany that occurred to you at one point. We spoke about the software and this precipitated this entire point of view of language as a generative slash autoregressive model or what have you, tell me about it.
Starting point is 00:09:27 What the heck was that big idea? So it wasn't so much an idea as an epiphany, a realization. And it hit me in a single moment. And it wasn't necessarily about autoregression. It wasn't about this finer detail of how ultimately language models, and I believe the brain, solve this problem. It was the realization that any model that has been trained, any model that anybody has built that accomplishes human-level language.
Starting point is 00:10:03 So it might be based on autoregression, it might be based even on diffusion, which is kind of the arch nemesis of my autoregressive theory. But regardless, the fact is that these models are being trained exclusively on text data. And so all they are learning is the relations between words. And so all they are learning is the relations between words. To the model, as far as the model is concerned, the words are turned into numbers. They're tokenized. We think of them as numerical representations.
Starting point is 00:10:32 But those numbers, and for our purposes we could think of them as words, don't represent anything. There is nothing in the model besides the relations. Relations just between the words themselves. There isn't, for example, any relation between any of the tokens and something external to it. What we tend to think of as people, what words are doing when we're discussing topics, thinking about words in our head, is that they symbolize something, that they refer to something. This is a lot of the philosophy of language,
Starting point is 00:11:05 a lot of the scientific study of linguistics is being concerned with semantics. How do words get grounded? How do they mean something outside of themselves? And what large language models show us is that words don't mean anything outside of themselves. As far as generation goes, as far as the ability for us to have this conversation and as far as the model's ability to produce meaningful responses
Starting point is 00:11:34 to just about any question you can throw at them, including writing a long essay on any topic, including a novel topic that it's never encountered, is by stringing together sequences based on simply the learned relations between words. And so this really hit me very, very hard. I've long been puzzled by, as many are, by the mind-body problem, the phenomena of consciousness, the problem of how do we know your red is my red? And actually the moment that I had this realization was related to this very question.
Starting point is 00:12:10 I just realized that the word red doesn't mean what we mean by qualitative red. The qualitative red is taking place in our sensory perceptual system. The word red, to a large language model, can't mean that. It can't mean any color. It has no color phenomena. It has no concept of what sensory red would mean. Yet is able to use the word red with equal ability, with equal competency, just as well as I can, if we're just having a conversation about it.
Starting point is 00:12:38 And so what this means is that within the corpus of language, the word red doesn't mean something external to itself. Instead, the word red simply means where does it fall in the space of language itself? Where does red fall in relation to other colors, in relation to the word color, in relation to other concepts, other, well, frankly, just words, tokens that are related to what we call concepts that have to do with color and have to do with the word red. So yeah, so this epiphany was about this extraordinary dichotomy, this divide between language and that which we think
Starting point is 00:13:17 language refers to. The question is how does language refer and the answer is it doesn't. Language doesn't refer in and of itself. Language is an autonomous system. It's a self-contained system. It has the rules contained within it to generate itself, to carry on a conversation. Large language models don't know what they're talking about in any real sense. They can talk about, you know, a sunset. They can talk about a taste. They can talk about a taste, they can talk about all space and time and all of those things. And yet we would say they have no idea what they're talking about. And we'd be right in the sense that they don't have a notion of red beyond the token and its relation to other tokens. Now this then raises the obvious question,
Starting point is 00:14:02 well, what do I mean what red is about? Don't I think red refers to a quality of perception? And the answer is, I do have a quality of perception. There is something called red that my sensory system is aware of. And then there's a token called red that is used in conjunction with, there's a sort of coherent mapping between my sensory perception of red and the linguistic red. But that doesn't mean that you need to understand what that word refers to. You don't need to have the sensory qualitative concept of red in order to completely successfully use the word red.
Starting point is 00:14:47 And so these are compatible but dichotomous systems. The sensory perceptual system and the linguistic system are ultimately, we can think of them as essentially distinct and autonomous but compatible or as necessary as required. Integrated? But they're integrated. They're running alongside each other, they're exchanging messages so that we can have a single organism that is successfully navigating the world,
Starting point is 00:15:18 and enable, for example, to communicate. I see something red that's registered in my brain. I have a qualitative experience of red. that's registered in my brain. I have a qualitative experience of red. It's remembered in having a certain quality. And then later on I said, oh, you know, could you go pick up that red object for me? And so we are, there's a handoff between the perceptual system and the linguistic system. Just that the linguistic system can now successfully send a message to you. Now you've got the linguistic system. You can talk about that.
Starting point is 00:15:46 Oh, okay, you told me there's a red object. Are there multiple objects? Yes, there's multiple objects. They have different colors. You're looking for the red one. Maybe it's a dark red. I'm doing this all linguistically. Now you're able to go into the room
Starting point is 00:15:57 and successfully get the right object. So again, the handoff happens the other direction. Language is able to hand off to the perceptual system, and the perceptual system is able to then detect that there's something with the right quality. But that's not the same thing as saying that the language contains the reference inherently within it. It simply means that these are communicative systems, that they can exchange information, that they integrate with one another in terms of forming coherent behavior. But language is its own beast. It's its own autonomous system.
Starting point is 00:16:28 It can run on its own. That was the big realization. Large language models prove it. That language is able to produce the next token and by virtue of the next token, the next sequence. And that means all of language without having any concept of reference. The reference has no place there. There's no way to kind of squeeze it in.
Starting point is 00:16:48 If your computational account is the one that I'm proposing, if the computational account is essentially prediction based on a next token based purely on the topology, the structure, the statistical structure of language, then there's no way to cram any other kind of grounding or any sort of computational feature in there at all. It has to be something closer to what the large language model is prompting. You can imagine a camera that generates a linguistic description of what's in a room, and then you could ask your language model, and you can by the way, you can do this right now, they're able to do vision, you could take a picture and feed it to the your language model, and you can by the way, you can do this right now, they're able to do vision. You can take a picture and feed it to the large language model.
Starting point is 00:17:29 What's happening is much closer to generating a prompt, basically saying, here's what's in the room. And now based on these features, these scripts now run the same exact language exclusive model. And so language takes care of itself. It doesn't need grounding in order to be able to do everything it does. It doesn't have to have concepts outside of itself. I think that's basically been proven by these text only large language models. So that was the big epiphany.
Starting point is 00:17:56 The big epiphany was that, oh, language is autonomous. Language is self-generating. That means it's a dichotomous computational system. It's independent of the rest. What this leads me to believe is, okay, well, if it can live in silicon in this way, then perhaps, and now I've come to believe very strongly, that it likely runs in the same way in carbon,
Starting point is 00:18:21 in biology, in our brains. Okay. So you're not denying consciousness and you're not denying qualia. No. I want to make this very clear. My personal opinion on this is besides the point, to some extent, you can be an eliminative if you want. Although I think everything I'm saying has a lot of bearing on this. But I believe my account is strictly an account of language. I think that perceptual mechanisms that give rise to qualia, things like redness and heat
Starting point is 00:19:00 and taste and all of these are basically processes that take place long before the handoff. And so what happens is, think about the camera, the camera is transducing light, it's measuring certain wavelengths, then there's a lot of visual processing that has to happen before you get to the point where it's turned into a linguistic friendly embedding, right?
Starting point is 00:19:24 The stuff that an LLM can see, a multimodal LLM can see. And so all of that processing that happens is what I think gives rise to a qualitative experience. We experience redness because of all of this very sort of analog, probably non-symbolic kind of representation. And then at the end of that process, there is a conversion. By the way, by the end of the process, a lot of things happen. We also respond to colors and to light
Starting point is 00:19:55 and all of that non-linguistically. But part of the end, sort of we could think of different endpoints. One of those endpoints is here's a handoff to language. And by the time language gets it, it's long past that initial process, that kind of sensory and perceptual processing that gives rise to qualitative phenomena.
Starting point is 00:20:15 So I strongly believe that there is, in a certain sense, the word hard problem is a little loaded. I believe there's undeniable qualia. But what I also think is that language is poorly equipped. It's simply unaware in some sense of the underlying mechanisms that give rise to what it receives at the far end, at the sort of the end point of that qualitative processing.
Starting point is 00:20:47 Just a moment. Don't go anywhere. Hey, I see you inching away. Don't be like the economy. Instead, read The Economist. I thought all The Economist was was something that CEOs read to stay up to date on world trends. And that's true, but that's not only true. What I found more than useful for myself, personally, is their coverage of math, physics, philosophy, and AI. Especially how something is perceived by other countries and how it may impact markets. For instance, the Economist had an interview with some of the people behind DeepSeek the week DeepSeek was launched. No one else had that. Another example is The Economist has this fantastic article on the recent dark energy data, which surpasses even scientific Americans' coverage, in my opinion.
Starting point is 00:21:31 They also have the chart of everything. It's like the chart version of this channel. It's something which is a pleasure to scroll through and learn from. Links to all of these will be in the description, of course. Additionally, just this week, there were two articles published. One about the Dead Sea Scrolls and how AI models can help analyze the dates that they were published by looking at their transcription qualities, and another article that I loved is the 40 Best Books published this year so far. Sign up at economist.com slash toe for the yearly subscription. I do so and
Starting point is 00:21:59 you won't regret it. Remember to use that toe code as it counts to helping this channel and gets you a discount. So now the economist's commitment to helping this channel and gets you a discount. So now the economist's commitment to rigorous journalism means that you get a clear picture of the world's most significant developments. I am personally interested in the more scientific ones like this one on extending life via mitochondrial transplants, which creates actually a new field of medicine, something that would make Michael Levin proud. The Economist also covers culture, finance and economics, business, international affairs, Britain, Europe, the Middle East, Africa, China, Asia, the Americas, and of course, the USA. Whether it's the latest in scientific innovation or the shifting landscape of global politics,
Starting point is 00:22:41 The Economist provides comprehensive coverage, and it goes far beyond just headlines. Look, if you're passionate about expanding your knowledge and gaining a new understanding, a deeper one of the forces that shape our world, then I highly recommend subscribing to The Economist. I subscribe to them and it's an investment into my, into your intellectual growth. It's one that you won't regret. As a listener of this podcast, you'll get a special 20% off discount. Now you can enjoy The Economist and all it has to offer for less. Head over to their website, www.economist.com slash toe, T-O-E, to get started. Thanks for tuning in. And now let's get back to the exploration of the mysteries of our universe.
Starting point is 00:23:24 Again, that's economist.com slash toe. To what it receives at the far end, at the sort of the endpoint of that qualitative processing. Okay, let me see if I get this. You have some redness. So you do, you're not denying redness. You grant redness. I do. Okay, there denying redness. You grant redness. I do. Okay. There's redness and then somehow this needs to be referred to with some spoken words,
Starting point is 00:23:50 with some language. Okay. So what's happening? You're saying that it's an independent system, yet it's integrated. So what is that relationship? And does it become so diluted that by the time you refer to it, you're no longer referring to that qualia? Like, I don't understand.
Starting point is 00:24:05 Yeah. That is essentially the idea. So this is the exact problem I am working on right now. There was a fantastic paper that I just came across about a week ago. There was a paper that was published in an archive recently. It's called Harnessing the Universal Geometry of Embeddings. And what this paper showed is that you could have completely different models solving different linguistic tasks. For example, you could have GPT, then you could have BERT,
Starting point is 00:24:33 which solves a somewhat different task. So there's masked tokens as opposed to autoregressive next token generation. And what they found was that you could learn what this latent space, what you could do is hand off, take the embedding. The embedding is basically, you can think of that as numerical representation. It's a high dimensional numerical representation of your tokens. So here's a token, this token is going to represent the word dog.
Starting point is 00:25:03 And then we're going to take that token and embed it in a much higher dimensional space. And what they found is that if you take the embedding, the higher dimensional representation from one model, so you chat GPT, and then take representation from a different model, that you could actually get the, you could then, you could take the embedding, send it to this latent space. If you cycle it through, get the, you have to, it's starting to get in the weeds a little bit, but you send it to this latent space and then recover it in its original form.
Starting point is 00:25:40 What you can do is, once you've got that latent space, you can then translate from one embedding to a completely different embedding. This is a new paper. This is a new paper, yes. Right. This rocked my world because what they're arguing is that there, in some ways, is this underlying universal structure
Starting point is 00:26:03 of language that's captured in this latent space. And so even though if you have a radically different embedding in one line, you know, they didn't do it across different languages. It's one of the projects I'm doing right now is to see if you can do this across, say, English and Spanish, even for a language that's trained exclusively on English and then another models trained exclusively on English, and then another models trained exclusively on Spanish. Can you guess the Spanish just from finding these universal structure across these two different models?
Starting point is 00:26:34 Sorry, what do you mean, can you guess the Spanish? If a model was trained only in English, and then it was receiving some Spanish text, a couple of Spanish sentences? The way to think about it is that what you're doing is creating another embedding, another, another, this latent space where you're going to be able to send in a message in English and then based on the station and then again do the same thing for Spanish
Starting point is 00:27:00 and then what you're not you're never going to show any model, no model is going to ever see a pair of English and Spanish. Instead, what you're going to learn is that there's some way to get from, you're going to end up being able to get from English to Spanish without ever seeing the actual translation. Because what the model is going to learn is what's common across these two representations.
Starting point is 00:27:23 What's true for both the Spanish embedding and the English embedding, that there's some sort of underlying latent structure that's true of both, and that that captures something more universal about language. Now again, they didn't do it for different languages, they just did it for different embeddings of English,
Starting point is 00:27:41 but very different embeddings, because they were trained on completely different models. If you looked at them, if you just looked at this sort of vector representation, took a vector representation of the word dog in one and a vectorization of the word dog in the other, they're completely numerically not, there's no similarity. You can never spot the similarities if you just looked even them pairwise. But if they do this kind of reconstruction and then ask the model to be able to reconstruct, not in the original embedding space, but go and reconstruct in the other embedding space,
Starting point is 00:28:11 it's able to actually do this. And so by doing that, by training it to do that, without ever seeing any pairs, it's able to sort of learn the translation between one representation and another representation. What this opened up to me is the possibility that we could think about the exact same latent space in the brain and possibly in artificial intelligence models between,
Starting point is 00:28:35 say, the perceptual world and the linguistic world. That there is some embedding of how the physical world is structured. We understand, think about an animal, a non-linguistic animal, certainly has idea of objects. Objects in relation to other objects, objects in proximity to other objects, moving around those objects. My dog, who was just parking in the background,
Starting point is 00:28:58 knows what doors are and she can go scratch it, and she knows it opens up. She certainly isn't able to express that linguistically, but she has this concept and she's able to think about it. She's able in some ways to reason about that. My suspicion is that that probably is done maybe even autoregressively, but we'll leave that aside for now. The main point is that there is some representation of the facts about the world,
Starting point is 00:29:20 the sensory facts of the world, or the sensory facts of the world, or the sensory, I would say, the sensory construction, the facts that have been constructed based on sensory information. So that's some sort of embedding of the world. The linguistic embedding is a radically different embedding. It carries information about the world as well, but not in the way, not in the direct way that we think, not that the word, you know, my headphones are sitting on this desk has direct reference back to sensation and perception
Starting point is 00:29:51 No, it lives on its own. It's its own embedding and it does its own and it can and it can do its own thing however Based on this paper This really gave me sort of a key insight that there might be this latent space where you can actually do this kind of mapping Where there's translation between linguistic and perceptual embeddings. They're as distinct as they are fundamentally very very distinct very different They're there to solve different problems, but they're able to talk to each other how perhaps through this kind of latent space Where the some universal structure like okay, in language there's certain facts about language.
Starting point is 00:30:28 There's a fact about the word dog or the word microphone that its relation to other words like desk in some ways captures the fact that microphone sits on top of desks. That fact is somehow actually contained within this embedding structure. In what sense? Well, if you ask me, would a desk sit on a microphone or would a microphone sit on a desk? I can answer that question.
Starting point is 00:30:53 So can chat.jpt, right? And without any notion of what microphones really are sort of from a perceptual standpoint, they're having these kinds of properties, we can talk about them. And the embedding space, the linguistic embedding space, contains this information. What does it mean contains information? By the way, just to say, what does that mean? It means given a certain input, like do microphones sit on desks? Where
Starting point is 00:31:15 should I put my microphone? I can answer linguistically in a reasonable way, right? And that's what I mean by the knowledge. It's purely linguistic knowledge. It only can generate linguistic responses. But the point is that that knowledge lives in this kind of linguistic embedding. And then there's the other kind of embeddings. There's a visual embedding. There might be an auditory embedding, which is distinct. And then the idea that I'm very inspired by is that there can be this latent space that
Starting point is 00:31:43 captures certain universals that are common across these different embeddings that make translation possible. So that when I see this microphone sitting on a desk, what's now available to me is the ability to describe that to you linguistically. But it's not direct. It's not that there's a very specific linguistic representation of this sensory perceptual kind of phenomena. And this is important because forever philosophers, philosophers in general, linguists have been trying to understand how do words get their meaning.
Starting point is 00:32:17 How do they, something I referred to earlier, you know, what's the definition of a microphone? What's the definition of a dog? Sure, right. And the answer is there isn't a single one. There isn't a single definition that's ever going to capture. Instead, what you've got is this latent bridge where there's some representation of this fact that given, whatever your particular prompt is,
Starting point is 00:32:40 your linguistic prompt is going to lead to certain meaningful linguistic behavior. If you ask me a question about this microphone, I might be able to answer that question meaningfully based on the perceptual information. But what this microphone means is actually completely contingent on, at least linguistically, is contingent on whatever question you ask me about it. And so it's all going to depend on what you're doing with that latent space. There isn't sort of, and this is sort of a broader point,
Starting point is 00:33:06 there isn't sort of a static set of facts about the world that's embedded in language. I don't think there would be a static set of facts embedded in our sort of visual embedding of the world. Instead, what we've got is what I call potentialities. We now have the ability to engage that latent space linguistically, where the perceptual information kind of lives, sort of this universal embedding of it, and then do whatever we need to do with it. If I need to answer this question about it, I can answer that question.
Starting point is 00:33:37 If you ask me a different question, I can answer that. But there isn't a singular meaning of microphone that captures sort of the entire set of facts. Here it is. Here's the embedded set of facts. The set of facts is actually infinite. I could tell you infinite things about this microphone, right? For starters, to use a silly philosophical example, it doesn't have this shape and it doesn't have that shape.
Starting point is 00:34:01 I could tell you there's an infinite number of questions you could ask me about it that I could answer meaningfully about it. So all those potentialities are kind of what happens when the linguistic system interacts with this kind of shared embedding space. That's sort of the half-baked version of how I think language ultimately does have to inter—of course, language only is meaningful insofar as it can live within the larger ecosystem of perception and sensation and perception. We have to be able to take in information through our senses and then communicate, although
Starting point is 00:34:36 I don't want to use that word late, kind of carefully. I don't communicate the entire representation because as I said that I don't think that's even a meaningful idea. Instead, what I can do is use language in a way that helps us coordinate our behavior. There's no way to sort of download the entire perceptual state. That's locked up in some ways in the perceptual embedding. No, what I can do is pull some information such that I can meaningfully communicate with you in a way that then is going to have the intended consequences. I'm not downloading
Starting point is 00:35:15 perceptual information into your brain. I'm telling you what you need to know in order to be able to perform some action, to perform some behavior, or maybe even to think about it, so that you could later perform some action. I know that was a lot. Feel free to back me up and challenge me on any of these things. I want to see if I understand this and I want to explore what is the definition of language, even though we just talked about there isn't the definition of a microphone, say. But I do want to talk about the definition of language and what is autoregression.
Starting point is 00:35:45 And well, presumably you're telling me what you believe with language, you're telling me this model because you believe it's true. I don't know what truth you're conveying if you believe this is not grounded. So what are you referring to when you even say that language is autoregressive without symbol grounding? I don't have an ideas to that. I want to explore that. But first, I want to see if I understand you.
Starting point is 00:36:06 Okay. So a latent space. So let's think of a word. A word gets a vector like an arrow. And I'm just going to be 2D for this example because that's just what the camera picks up. So let's say the word dog looks like so, the word cat looks like so, whatever.
Starting point is 00:36:23 Okay. The space that it's embedded in is called the latent space. Is that correct? Well, the initial embedding is just the embedding. So that shows us the high dimensional. It's just, yeah, let's forget the word high dimensional. It's just a big long list of numbers. Let's say you've got 10,000 numbers. For dog, we're going to represent dog as this particular sequence of these numbers. And let's say you've got 10,000 numbers. And for dog, we're going to represent dog
Starting point is 00:36:45 as this particular sequence of these numbers. For cat, it's a different sequence of these numbers. And so that's our initial embedding. So the latent space is a compressed version of that? Well, in some ways, it's actually not compressed. What's the opposite of compressed? It's expanded. It's an opposite of compressed? It's expanded. It's an expanded version.
Starting point is 00:37:06 It's actually, so you have the original tokenization, which just says here in a fairly small vector, but then you expand it into a much higher dimensional embedding space so that each token actually ends up getting a much richer, many more numbers that are used in order to represent each token. That's a very key fundamental thing that these models do. By expanding in these different dimensions, that's what allows you to massage the space so that you can get all these cool properties like cat and dog being in the appropriate relation to one another so that later on when you're trying to figure out what the next token is you're able to actually leverage the inherent structure in this high-dimensional
Starting point is 00:37:55 space. Okay so then you have the language model for English and then you have a language model for Spanish. Yes. And let's imagine that it was trained only with a corpus of English in the former case and only with the corpus of English in the former case, and only with the corpus of Spanish in the second, and then we can even have a third of Mandarin. Sure. Okay. Yeah. In fact, in the paper,
Starting point is 00:38:12 they didn't do different languages. They said they did different embeddings of English language models, but yes, they used multiple. They actually did this across several different embeddings, not just two. Okay. So then the claim or finding is that if we look at cat and dog inside of here in English, it gets mapped to some fourth space here,
Starting point is 00:38:32 which is like a Rosetta Stone space or a Platonic space. Yeah. That's exactly what they call it. Platonic, they use the word platonic. Okay, great. Well done. Great. Then it looks like this there. Okay. Then if you were to say, okay, well, let me just forget about English and this platonic space. Let me look at cat and dog in Spanish. Okay. And it looks
Starting point is 00:38:49 like this here. Let me map it from here to my platonic space. Oh, wow. It gets mapped to a similar place. Oh, and does the Mandarin let's find that out a cat dog. It does. Okay. Let's test out more words. So the claim is that this space here is this meaning-like space. Okay, great. And then what you're saying is that microphone, we think of microphone as living in here as a single vector. That would be like an essence of the microphone that we're referring to. But actually, microphone, our concept of microphone depends on the prompt. microphone, our concept of microphone depends on the prompt. So explain that. That sounds interesting.
Starting point is 00:39:26 Yeah, I think, and you're making me think about this in a way that I hadn't quite before. So the level of which I've thought about it is that you've got these different embeddings. When I see a microphone visually, there's a certain vector representation of what that sensory perceptual experience, and I don't mean the qualitative sense, I'm not getting into phenomenology, but there's something happening in my brain that is sort of the representation of what it means for me to see this object from the visual standpoint. Okay, that's one embedding. And then we also have a word, microphone, which is a completely different embedding. There's simply a word that lives in language space.
Starting point is 00:40:15 What it means is that there's a specific embedding. So it's kind of helpful to think about sort of a point in a space. So you know you've got this super high dimensional space and each individual token is simply a vector in that space. So it really picks out a specific point. And we can say microphone lives right here in this linguistic space. And then my perceptual experience, I don't want to use that word, but my perceptual kind of grasping of this microphone being here is this point in a completely different space, this perceptual space, which has, you know, it captures other kinds of information. In language, so let's actually talk about this for a second, in language, the space, if you want it to be a useful, meaningful space, you're going to want things that have similar meaning, they're likely to actually have proximity
Starting point is 00:41:09 to each other. And this is to some extent what the large language models learn. They learn in embedding. In order to do next token, they learn in embedding that gives this, where the space, you know, and we can think of it almost like two-dimensional, three-dimensional space. Obviously, it's very high dimensional, but you know, for our purpose, we think about that, that where cat and dog live, you want those things to live closer together than cat and desk. And of course, it's much richer than that, right?
Starting point is 00:41:38 It's not just semantic, like this very kind of superficial level of semantic similarity. In fact, what it is, is capture the somehow the semantics, so to speak, are captured by the space like the space, the shape of the space itself is what allows the model to understand sort of the relation between words so that I can do the next token generation. But it's a very, very different space, right? It has to do with, really, with relations between words and in terms of generation, in terms of next token generation, so that it's useful for that purpose.
Starting point is 00:42:12 What does the perceptual space look like? Well, this perceptual space is going to have a very different, the axes there almost certainly aren't going to have the same kind of meaning as in the linguistic space. There'll be something closer, maybe color features, shape features, something like that. And where this microphone lives is within that space is going to have radically different meaning than saying, you know, it's, you know, it's not apples and oranges, right? Those aren't different enough, right? It's apples and math or something. It's really, really radically different kinds of spaces. But what I'm proposing,
Starting point is 00:42:49 what I think the insight here is that ultimately there is the possibility of having a shared space that you can send, you can project both of these things to where microphone, the word, is going to somehow make contact with this perceptual experience right now, this perceptual fact. But it's not, and here's the key point that you're getting at, it's not that this word microphone picks out the exact same embedding in this latent space. It's not that it's going to make that thing light up. Oh, it's the same thing. No, it's that when you ask a certain question
Starting point is 00:43:28 about a microphone, is there a microphone on your desk? My perceptual system is generating some, well, first of all, it's just generating the perceptual phenomena, but then it's also sharing information in this latent space, which my linguistic system can then go draw from. Then given this particular prompt, was there a microphone on my desk,
Starting point is 00:43:51 I'm able to then successfully answer the question. It's not quite the same thing as saying that they're picking out the same information in latent space, because my argument is that that's not really a meaningful concept. There isn't the same microphone in linguistic terms doesn't pick out a perceptual kind of fact that's not possible. These are radically different kinds of facts.
Starting point is 00:44:19 But what the latent space might allow us to do is not just to translate, which is what they did in this paper, but perhaps to pass information along in a meaningful way so that you're able to access it and do something successful like answer the question, is there a microphone on this desk? I think that might be what's happening to some extent even in the multimodal models. It's a longer conversation. That's not really how they work. They don't actually operate based on a shared latent space or anything like that.
Starting point is 00:44:46 Really what they do is the models learn to take a perceptual input and turn it into something like language. So it's more similar to like prompting almost. It's not exactly that. But it's injecting something within linguistic space that is Equivalent to to actual language. It's not the same thing as the shared latent space which but my Hypothesis is that there may be something very similar happening So you don't think that multimodal models will have will solve the symbol grounding problem You don't even think there is a symbol grounding problem That is a fair question. And here's actually a prediction or a falsifiable in some sense. Will
Starting point is 00:45:32 multimodal models fully solve the kind of, well, the bridging problem, let's call it that. Because the grounding problem, using that terminology, well, my argument is that there is no grounding problem because words don't have to be grounded in order to operate linguistically. That's sufficient. It's enough to simply be able to generate language. You don't need the grounding. But in order to have this kind of fully operational organism that's able to use language and also use perception in, you know, bridge these different maps in a meaningful way so that we can get, you know, full coherence.
Starting point is 00:46:14 I guess, you know, full, let's just call it human level perceptual linguistic coherence so that I can say to you, hey, can you go grab that or say to a machine, can you go grab that object, describe what I want, and then the machine is able to go and do exactly what I described, then my argument is that I don't think, and again, this is speculative, I could be proven wrong certainly on this.
Starting point is 00:46:42 My suspicion is that we're not going to be able to do it using the kind of approach that multimodal models currently suspicion is that we're not going to be able to do it using the kind of approach that multimodal models currently use. That you're not going to get there. It's kind of a dumb trick, the way that we're currently solving the problem. Because we're not really allowing these two different modalities to kind of live on their own and do the work that they do. Instead we're kind of, we're strong arming perception into a linguistic form.
Starting point is 00:47:07 What I think is maybe a more important solution and I hope, you know, this podcast one day is, you know, kind of an early sort of canary in the coal mine for this idea is that it's something closer to this kind of shared latent space. That what you do, what you have is these completely distinct kind of mappings, we'll call them embeddings. They can kind of grow up on their own, learn the information that they need to independently of one another, but at the same time, they have this sort of shared sandbox where they're able to communicate with one another and do things.
Starting point is 00:47:39 So I think it might take a very different approach to get full perceptual linguistic competency. Okay. Have you heard of Wilfred Sellers? I believe it's Sellers. Oh gosh, I read Wilfred Sellers in early, one of my first philosophy classes I ever took. I'm trying to remember the name of the book, but I'm sorry. So which work, boy?
Starting point is 00:48:02 I believe it's Empiricism and the Philosophy of Mind. I'll put a link on screen if I'm correct. Sounds familiar, but catch me up. So he's criticizing the idea that our perception gives foundational non-conceptual empirical knowledge. So these experiential givens that we think of as primitive, like redness, he would say that they involve heavy interrelations of concepts. So for instance, the way that I think about it is if you're to say to someone redness, they'll be like, well, what kind of redness exactly are you talking about? Then they'll think, okay, the redness of an apple,
Starting point is 00:48:35 but then an apple is not always red. Okay, redness of an apple in a certain season with a certain type of sunlight. Okay, now I've gotten it. So by the time you go in to pull out this primitive, you've then soaked it with so many other concepts. You can't actually come in with language and pull out a primitive. Yeah, that sounds extremely similar to sort of to the initial insight. And it's related to the inverted qualia problem. You know, I don't know, your red is not my green and vice versa, and it's because the linguistic representation doesn't capture, you know, we can think, again, it lives in a completely different embedding space. And when we think about the redness of red, well, it's qualitatively similar to orange
Starting point is 00:49:22 in, you know, there's sort of a continuum between those. Those qualitative similarities are really only contained and only understandable by the sensory perceptual system. And we can talk about them. We can sort of say, yeah, red is a little more similar to orange, but that's because we have a sort of very coarse, maybe via this kind this latent space,
Starting point is 00:49:46 where we're able to refer to certain kinds of properties in a way that is useful for communication. But as far as that raw qualitative property, that comes to us not – it's primitive in the sense that we can't unpackage it linguistically, but it's not primitive in the sense that there's extraordinary cognitive machinery that is responsible for that qualitative. Think about the world of animals and what they do with color and how well they understand shape and how they understand a space. All of that is unavailable to our linguistic system. It's available, by the way, to us, our sensory perceptual system, but it's unavailable to
Starting point is 00:50:33 the linguistic system because it doesn't live in the same space at all. And so I think what you're describing actually sounds extremely similar. The idea that we can't really dip in, it's simply the wrong map. We can't map this map onto that map at all. We can go to this and maybe the potentially shared latent space, or maybe again, maybe my account's wrong and there's some more direct kind of handshake that happens between these systems. But ultimately, they're taking place in radically different spaces, and you're losing an enormous amount of information. It's literally, you know, quantifiably a loss of information. The word red does not convey redness because redness is not
Starting point is 00:51:11 just a word. It's not just a simple concept that you can say in, you know, using an individual token. Now, by the way, the word red is not so simple either, right? Red in language space is also complex. It has all kinds of relations to other words. But the concept, the percept of red has all of this complexity to it because it's where it lives in its own space, in colors, in the perceptual space. And so yes, that sounds extremely similar to what Seller seems to be intuiting. Yeah.
Starting point is 00:51:51 And just so you know, the way that I relayed what Seller's myth of the given is, it isn't precisely what he was saying because he was more about knowledge. And I'm speaking more about the percepts, the raw sense data, than being taken to language, like being dredged from the, from your sensory data or what have you to language. But anyhow, it's approximately correct. It's good enough for this conversation. Perfect example of being able to sort of linguistically construct a kind of novel conceptual framework, but do it linguistically and in real time in a way that hasn't been probably maybe ever been done before. So well done Kurtz LLM.
Starting point is 00:52:29 Thank you. So now you're an LLM speaking to some other LLM trying to convince it of some truth that we mentioned before, like you have this model, whatever you want to call this model, autoregressive language, TOE model. What are you even referring to? You're using language to convince myself, to convince yourself, to explain. What are you even explaining? What are you referring to? Yes. You've asked a very hard question. And there's a certain, I think of it as a bit
Starting point is 00:52:58 of a paradox that's sort of inherent in sort of what I'm trying to do, because language is trying to describe itself and in the process of doing so, it's actually deconstructing itself. It's saying, I am just this and I'm not what I think I am, but who's I and what do you mean think? Right? How does language have wrong concepts about itself that are actually manifestations of its own structure. The good news is that I have a sort of
Starting point is 00:53:26 escape hatch here, which is that this is really, in some ways, a very, very simple account: it's just that there's prediction from sequence to token. How that does stuff in the world is a harder problem. How that does stuff, as we were discussing, how it allows me to say something to you that then can have perceptual consequences, behavioral consequences, is certainly a difficult problem. But we can ignore that problem for a moment and say we are going to take language on its own terms. And what language is, is simply a map
Starting point is 00:54:06 amongst meaningless squiggles. It's simply a map amongst various, what we can think of as largely arbitrary symbols. And those symbols can get grounded in writing, they can get grounded in the activation of circuits, they can get grounded in dendritic or neural responses. But the core hypothesis here is that what language is, is simply a topology amongst
Starting point is 00:54:40 symbols. And by topology, you mean connectivity? That's one way to think of it; you could frame it connectively, you could try to do it as a graph. There are different ways you could try to capture this structure mathematically. In the case of large language models,
Starting point is 00:55:02 really it comes down to these embeddings, which you can treat from a graph-theoretical standpoint, but you don't have to. You could just think about it as a space, and then you're simply saying where each token lives within that space. And that's really the representation of language. But what it is, is relational. It's that these symbols have relations to one another within this space. And the relations are then used in order to generate. And that's it. Now, how does meaning emerge out of that
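A minimal sketch, in Python, of the picture just described: tokens as points in a space, with "meaning" carried only by their relations to one another. The vectors below are made-up toy numbers, not weights from any real model.

```python
import numpy as np

# Toy illustration: each token is just a point in a shared vector space, and
# what it "is" here is nothing but its relations (distances, directions) to
# the other points.
embeddings = {
    "red":    np.array([0.9, 0.1, 0.0]),
    "orange": np.array([0.8, 0.3, 0.1]),
    "dog":    np.array([0.0, 0.2, 0.9]),
}

def cosine(u, v):
    # Cosine similarity: one common way to read off a relation in the space.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "red" sits nearer to "orange" than to "dog" purely by position in the space.
print(cosine(embeddings["red"], embeddings["orange"]))  # high similarity
print(cosine(embeddings["red"], embeddings["dog"]))     # low similarity
```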
Starting point is 00:55:34 is a separate question. But my argument is that language doesn't have to worry about meaning, language just has to worry about language. So when I say I'm talking to you and I'm having a conversation with you and trying to explain something to you, this is an LLM actually producing a sequence and what that sequence is going to do, it might do certain perceptual things by the way in your mind, it might produce certain kinds of images, those are kind of auxiliary to language, those happen as well,
Starting point is 00:56:02 I'm not denying they happen, but as far as this conversation goes, I am producing a sequence that's going to serve as a prompt. And you're going to predict the next token. Yeah, without my consent, by the way. And that's, that is, that's in some ways, you know, that not to take that too seriously, but yes, in some radical, one way to think about it is that language is doing, is actually forcing your mind to do something else, whether it's produce images, but also to produce sequences. So my choice of a prompt is actually going to, deterministically, there is, within large language models, there's some probabilistic kind of behavior in the sense that they generate a distribution of the next token.
Starting point is 00:56:48 And then you add a little bit of chanciness. You say, maybe I'm going to pick the most likely token, maybe not; that's the temperature. But it really is largely deterministic. And yes, the prompt I'm going to put into your head is going to basically determine how you're going to respond. Now, mind you, again, there's a larger ecosystem where you're gonna think about things visually and that's gonna feed back into the linguistic system, so it's not quite as simple as prompt in and sequence out.
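A small illustrative sketch of the "distribution plus a little chanciness" idea just mentioned: temperature rescales the model's scores before the next token is sampled. The scores below are invented for illustration, not taken from any real model.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    # Temperature rescales the scores before the softmax:
    # low temperature -> nearly deterministic (argmax-like),
    # high temperature -> more "chanciness" spread across the distribution.
    scaled = np.array(logits) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three candidate tokens
print(sample_next_token(logits, temperature=0.1))   # almost always token 0
print(sample_next_token(logits, temperature=1.5))   # more varied picks
```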
Starting point is 00:57:23 But at the linguistic level, that's basically what I'm arguing. Now, the fancy stuff, which is basically meaning and the ability to coordinate all that, falls out of how our minds ultimately form this space. Now, you can take an untrained model, an untrained large language model. You give it a sequence in, it's going to give you a sequence out, right? It will do that. And we say, hey, look, it's doing next token generation. It's doing autoregression. But it's going to be gobbledygook.
Starting point is 00:57:51 It's going to be meaningless. The magic of language and the magic of what our brains do and what these large language models do is that given sufficient examples, we are actually molding this space such that the next token generation is doing something meaningful. Meaningful in what sense? Well, this harder problem of, well, I can tell you something and then that's going to determine not just your language but your behavior later on.
Starting point is 00:58:17 And so there really is something more. The map matters, right? The space, the shape of the space is really, really critical. It's not like autoregressive Next Token solves the problem. It's that autoregressive Next Token generation, when optimized in the larger ecosystem of behavior and coordination and communication, does this thing. But still, I don't want to back away from this.
Starting point is 00:58:42 When you get down to it, in the end, what you've got is just next token. What you've got is just language, generating language. That's really what language is. That's what we're doing when we're thinking linguistically. The fact that it happens to have this meaning is not actually driving the computation. You shape the space. The space gets shaped by other factors, things like the learning.
Starting point is 00:59:09 Well, you learn about the different tokens and how they relate to one another. You learn about perhaps the utility of certain tokens to refer and to map to these perceptual phenomena. But by the time you're doing language generation, the space has been shaped. And so all you're doing is next token generation. All you're doing is tokens predicting tokens. And so I don't want to back away from that.
Starting point is 00:59:39 The strong claim is that language simply is that. And it's autonomous. It has these properties. Through all this optimization over the course of development, maybe evolution (that's not part of my theory at this point), Chomsky's poverty of the stimulus, all of these problems of how we get to such a magnificent space, how we get to such a magnificent shape of this space such that it is able to map to, or at least serve, this utility of being a coordinated kind of tool?
Starting point is 01:00:16 All of that has to happen. But the bottom line of what language is, is unchanged in this account. Okay, so I want to explore more about language and then it relating or giving rise to action and other systems, visual systems, etc. What is there? I was worried about that, but go ahead. Okay, so look, there are meaningless squiggles. How is it that some meaningless squiggles, your brain squiggle generator, makes your physical body get up
Starting point is 01:00:45 and close the door because your dog was barking. Where does action connect to abstraction? That is the key question. I believe that's what we need to solve. That's sort of what the field of linguistics or whatever we want to call it, maybe even cognition needs to solve because these mappings are happening. What we know from the large language models is you don't need that in order to be proficient in language. So this is where we have to start from. That's the starting point, that the language can live on its own and you can learn language in theory. You can learn language independent of any
Starting point is 01:01:20 of that stuff. The ability to make somebody get up and move, the ability for me to reason about perceptual phenomena: language is able to be mastered entirely based on its own structure, the meaningless squiggles. Now, the question you're raising is what I think we need to solve as a species if we want to understand scientifically how language really works: how you go from an autonomous, self-generating system, one whose self-generative rules are determined simply by relations between these meaningless squiggles, to how that then gets mapped to meaning, to the ability
Starting point is 01:02:07 for me to use some of those tokens and then get you to do stuff, right? And so that's what language learning is. There's going to be, I guess we can think of almost two, maybe even independent processes. One is learn how words play with one another. Okay. Learn that this word tends to be in relation to that word. Okay, that one's solved. Yes.
Starting point is 01:02:33 As far as you're concerned, got it. Okay. Then we also learn about perceptual phenomena. We learn that there's things on top of other things and there are actions we want to take. The things that my dog understands. Now, the question is, how do these things bridge? How do you get from tokens that have their own life of their own, the sort of relational properties amongst one another to that other kind of,
Starting point is 01:02:59 I guess, representation, that other way of encoding, let's say, facts about the world. Facts about the world, even that's saying too much; it's just another brain state, right? A brain state that has perceptual information. Hold on. Facts about the world are just another brain state, or the world is another brain state? So all we've got is brain states, right? And this is sort of fact number one about what we've learned about ourselves as a
Starting point is 01:03:34 species: all we have are perceptual brain states. We also have maybe linguistic brain states. Those perceptual brain states are in some ways related to what's going on in the world. And potentially we can think of them as being related to what you can do in the world as well. And so maybe actions, well, we have brain states that correspond to our proprioception, our muscles, things having to do with our own body.
Starting point is 01:04:03 And so there's these various brain states that carry, we can think of them as carrying information. The reason I'm worried about using that phrase is because, again, I don't believe in sort of a one-to-one simple correspondence where we say this particular brain state corresponds to this perceptual kind of phenomenon in the world or some state of the world because it's probably not that simple. It's probably closer to these potentialities, right? There's some sort of activity that's due to my perceptual system that my brain can do things with
Starting point is 01:04:37 and engage with in some way. And so, but what we do have is these brain states that are derived from distinct sources of information, sensory perceptual, and then linguistic. Linguistic gets there, by the way, through sensory perceptual. We're not gonna get into that, right? We're thinking of symbols as being kind of arbitrary. Yes, you have to hear the word cat
Starting point is 01:04:59 and you have to hear the word dog, but I think we have good reason to say now it's just like large language models that these are kind of arbitrary symbols with their relations between them. That's what matters. Okay, so you've got these distinct brain states which in some ways, again this is philosophically fraught, but in some ways represent facts about the world perhaps, but I don't really want to go that far, but you've got these brain states that need to talk to each other so that they can coordinate and
Starting point is 01:05:27 That is sort of the key fundamental problem that our organism has to solve. And of course, you're not born having the linguistic mapping all solved. You have to learn that. But you are born into a world where it's already been solved, meaning we've got this corpus of language,
Starting point is 01:05:52 the thing that the large language models were trained on. That pre-existed the models, just as when a baby's born, the English language pre-exists the baby. You can learn the mappings, and I believe you do; you can learn the embedding space of language without the other stuff, right? That's again, that's sort of the key insight
Starting point is 01:06:11 for the large language models. So that already contained within the linguistic system that we've honed over however many years it took for humans to develop language, we've honed a system that has this utility built in such that it's a good thing to dump into that latent space so that when a baby hears the word ball, sees this object that's a ball,
Starting point is 01:06:37 gets that mapping. But again, the word ball is really meaningful. It really has its own role in relation to other words. But over the course of development, you also learn this kind of, what I think is maybe a latent space bridge or some other bridge between these. And so in the end, you end up being able to tell somebody, go pick up that ball.
Starting point is 01:07:01 And of course, they're able to go and do it. But you're really engaging, very distinct mechanisms that have some way of bridging. Which is, it's a non-answer. I'm not going to pretend that that is even halfway to a solution, but I do think it's a sketch of how the cognitive architecture ultimately really is built. I think that we've now nailed down one piece, the linguistic piece, and we're able to say, this is how it lives, and this is how it operate,
Starting point is 01:07:31 and it is autonomous. And then we don't really have, we don't have something similar for perceptual space and for motor space. We don't have something comparable. We haven't been able to capture it successfully. Maybe robotics, that's happening, I don't know. But that's the kind of thing that the work that needs to get done.
Starting point is 01:07:49 So maybe this is a solved problem. But as it stands with ChatGPT and Claude and so on, they're fixed models and they're producing some output. But it's not as if when they're speaking to one another, they then retrain their model in real time. And it would seem like that's more like what's occurring with us. So maybe that's just a technology issue. Are you referring to like different different language models just chatting with one another? No, I mean even us right now we're learning concepts from exchanging it with one another and we're producing new ones and we're deleting
Starting point is 01:08:19 old ones potentially, modifying old ones, recontextualizing. It doesn't seem like that's occurring with Gemini 06-05. Great question. Great question. This is one of the key challenges of the identity hypothesis, that we're doing the same thing, which is continuous learning. And so there are two things that happen in large language models that we can call learning. And one is the actual shaping of the space, which is really just determining the connectivity
Starting point is 01:09:00 between neurons. Again, you could think of it as a graph, or you could think of it as just determining the embedding. But whatever it is, that happens during the course of training, and that's kind of done offline. And yes, so that's training the model. There's also fine-tuning, which is just more of the same. You have some new data you want to incorporate into the weights of the model. That's actually going to, again, change the shape of the space, if you want to think about it that way. And then there's something called in-context learning.
Starting point is 01:09:33 And in-context learning is where you're in the middle of a chat and you say, hey, ChatGPT, let me teach you a new word. It's global global. And global global is that feeling you get when, you know, you're really tired, but you know you have to keep working or whatever. ChatGPT can use that word very successfully. I got global global up the wazoo. Sure you do.
Starting point is 01:09:56 You suffer from extreme global global. So you were able to just, you know, use that word. ChatGPT can do that too. And it's sort of, one of the big, there was a paper about this early on in the chat wars. I don't remember who put out the paper, but it was about the shocking generalizability. The in-context learning seems to be too good to be true.
Starting point is 01:10:23 But lo and behold, that's what happens, and that is happening in the autoregressive process. It's happening even though this model has never seen global global, even though it's never encountered that word before. But here, lo and behold, it shows up in the sequence, and now, through the autoregressive process, as it's churning through the longer sequence with this word in it, it's able to predict the next token in the appropriate way so that it's using that term correctly. So we do actually see this kind of continuous learning in the case of these models. However, it's happening in context. And what that means, from a practical standpoint, is if you start a new chat window. Yes. Yes, it doesn't know that word anymore
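A sketch of the distinction being drawn here, using a hypothetical generate() stand-in rather than any particular chat API: the made-up word lives only in the prompt (in context), never in the weights, so a fresh chat loses it.

```python
# Hypothetical stand-in for any chat model's generate() call -- the point is
# only where the new word lives, not the API details.
def generate(prompt: str):
    ...  # imagine a frozen, pre-trained model producing a continuation here

# In-context "learning": the definition of the made-up word travels inside the
# prompt itself. Nothing in the model's weights changes.
chat_1 = (
    "Let me teach you a new word. 'Global global' means the feeling of being "
    "exhausted but having to keep working.\n"
    "Use it in a sentence: "
)
generate(chat_1)   # can use the word correctly -- the definition is in the context

# A brand-new chat window is a brand-new prompt with no definition in it.
chat_2 = "Use 'global global' in a sentence: "
generate(chat_2)   # the word is gone; the weights never stored it
```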
Starting point is 01:11:08 So what would be the analogy here? Is context window length our working memory? What's the actual... Great question. Yes, that is what I truly believe, and this is a different line of research, but with some caveats. So yes, in my conception, what we call long-term memory is just fine-tuning of the weights. It's information that gets embedded in the actual weights of the model, the static model we can think of when it's not actually in the process of autoregressively generating. Working memory is literally autoregression. What would the analogy for RAG be then?
Starting point is 01:11:49 What would the role for RAG, retrieval-augmented generation, be? This is where I'm at right now. Does the brain actually do anything like retrieval? I've decided to stake out the extreme view that our brain doesn't do retrieval at all. That all we do is fine-tuning and then next token generation, autoregressively. We don't actually ever retrieve per se. We don't ever actually have to do anything like RAG.
Starting point is 01:12:20 RAG is a transitional technology. I don't believe long-term that we're gonna have to do something like that. We're gonna have to have something like a store database and then a search. One of the reasons I believe this is because that's not how our brains work. We don't do that.
Starting point is 01:12:36 Cognition doesn't work that way. We may sometimes sit there pondering and trying to recall a fact, but when we're doing that, we're not actually searching a space. It's either we're running some sort of chain of thought where we're like, okay, I remember I was doing this and I'm trying to actually produce the appropriate sequence in working memory such that it'll pop out.
Starting point is 01:12:58 The right fact will pop out from the autoregressive process. Sometimes we just find ourselves trying to remember something, trying to remember something. There's something, there's a tip of the tongue phenomenon. The reason why tip of the tongue phenomenon, I believe, is so frustrating is not because we're searching, searching, you know, we're actually running some sort of search retrieval process. It's because part of our brain actually is running the autogenerative process and we kind of can feel like the word
Starting point is 01:13:27 We can almost generate it, we can almost produce it, but it's short-circuited somehow and we can't do the full generation. So my hypothesis is that we don't have anything like RAG. All we've got is this, and it's a very simple and I think elegant model: all we've got is fine-tuning, and that's what we can call memory consolidation. That happens after the fact, over the course of minutes and weeks and months and years; consolidation is fine-tuning the weights.
Starting point is 01:13:53 And then we have the autoregressive workspace, which is this conversation you and I are having right now. It's not working memory. I'll tell you what, working memory in the way cognitive psychology has thought about it for many years I frankly think is erroneous. It's not this super duration-limited thing, you know, seven seconds or 15 seconds, and after that it's a cliff and you don't remember anything. That's what happens when you have to directly, explicitly retrieve. Like, what was the last word I said?
Starting point is 01:14:26 Tell me the exact sequence of letters or numbers. That's not something our brain actually has to do regularly. Instead, what we're seeing in working memory, we can do that, we can do retrieval of the last seven seconds, but that's because we have continuous context. And there is a decay function, unlike the large language models, which represent everything in whole blocks, although I think some of the newer models, these endless-context models, are probably doing something similar. But we don't actually have literally the exact tokens that were expressed ten minutes ago.
Starting point is 01:15:01 What we do have is some sort of continuous activation that's similar, where it's not retrieval, it's guiding. The past is guiding the generation. And so what you and I talked about an hour ago, I don't know how long we've been going here, probably a while, I don't know how much global global you've got going on, right? It's been a while. So those tokens that were expressed an hour ago are still guiding the generation now.
Starting point is 01:15:27 Now they're doing so less than the last 10 seconds. We could think about it as kind of a decay function of some sort where they're having less impact. We see that in the models too, by the way. If you look at the attention weights, words that are farther apart have less impact on one another. That's simply, that is a direct reflection of the fact that language is human generated and humans do this, right?
Starting point is 01:15:54 The words that we spoke about a few seconds ago are more impactful on the words that we're going to say than the ones we spoke about an hour ago. But the idea is, yes, what we've got is this. I don't use the term working memory because I think that's very fraught with the modal model that's been in vogue for a long time, the working memory model, Baddeley and all these folks. They were really thinking of this very short duration. Okay, time limited, boom. No, this is continuous activation, namely context. And the context, I don't know how far back it goes. I don't know how far back it goes, right?
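One way to picture the decaying, non-retrieved influence being described: older tokens still guide the next generation, just less. The half-life below is an arbitrary assumption for illustration, not a measured property of brains or of any model.

```python
def context_weights(token_ages_seconds, half_life=60.0):
    # Toy decay function: the older a token is, the less it weighs on the
    # next generation -- but it never has to be explicitly "retrieved";
    # it just guides less.
    return [0.5 ** (age / half_life) for age in token_ages_seconds]

ages = [5, 60, 600, 3600]   # said 5 s, 1 min, 10 min, 1 h ago
for age, w in zip(ages, context_weights(ages)):
    print(f"said {age:>5} s ago -> influence weight {w:.4f}")
```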
Starting point is 01:16:28 This is an empirical question. Does it operate over hours? Does it operate over days? Is there a continuous activation, a more dynamical form of memory that's happening? That's not the same thing as long-term memory, because long-term memory- So memory is not a database in your model.
Starting point is 01:16:41 Memory is not a database, correct. What memory in my model is, is the, there's two things. Memory is the fixed weights of the neural network, which can represent, they don't represent facts, they represent potentialities. Those fixed weights are, what does that mean? It means if you give it a certain input, it's going to produce a certain output, right?
Starting point is 01:17:04 Just like a large language model. If I say to it, recite the Pledge of Allegiance, it will say, here is the Pledge of Allegiance; the next token out is going to be "here," whatever, but then it'll actually say the Pledge of Allegiance. All of that is a potentiality that's embedded, that's encoded in the weights. But you're not going to find that fact in the weights. The weights are there as potentialities, ready for whatever input comes their way: given this input, they're going to produce this output. Okay, okay.
Starting point is 01:17:33 Okay, so that's the weights. And then you've got the running sequence. And the running sequence, we see this from in-context learning, but it's the core autoregressive process: the sequence itself is a different thing than the stored weights, right? Because first you've got the stored weights, which are just the static situation.
Starting point is 01:17:54 And given this input, I'm going to produce this output. Then we actually do it. You give it an input, it produces a certain output. It takes that output, tacks it onto the sequence. Now it's going to keep going. Let me see if I got this. Please. So there's some computation going on.
Starting point is 01:18:10 There's some black box occurring. But let me make it simple for linear algebra. You have a matrix. A matrix operates on a vector to produce another vector. Okay. So you may look. That's the whole thing. All right. Exactly. So you may look at this where my arm is pointed up and to the right,
Starting point is 01:18:27 at least on my screen right now, and you may say, where is this in the matrix? The answer is this isn't in the matrix, but if you take this guy, my arm is now pointed to the left, maybe parallel to the horizon, and have the matrix operate on this, it moves it here. So the mistake is for us to look at the output and say, where's that output inside the box?
Starting point is 01:18:50 It's not that, it's the input with the black box. So the input with the matrix that produces the output. That is perfectly set. Exactly. And then there's but one additional piece, which is after you've produced that, you're also, again, taking that output and then using it as the input, as part of the sequence of input.
Starting point is 01:19:14 And that's the autoregressive piece. And that's what's so gorgeous about it, is that the potential realities aren't just to produce a single output, but it's to produce the sequence. But to do so one piece at a time, right? So that's what the matrix is. Matrix doesn't really even have the sequence in it. Doesn't have a sequence in, sequence out, right?
Starting point is 01:19:33 That's one way, but that's not even correct. It's sequence in, one token out, add it to the sequence. Do it again, do it again. And then, so the sequence is in there, but only in this potential form. It has to do it autoregressively. It can only produce the sequence by feeding it back into itself recursively. And that's a radical way of thinking about what the brain is doing, right? That what it's really doing is it's generating the next input for itself.
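A toy version of the loop just described: a fixed matrix of weights, one new vector ("token") out per step, and that output tacked back onto the sequence as the next input. The weights and the decay factor below are random placeholders, a sketch of the idea rather than a claim about any actual model or brain.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W = rng.standard_normal((dim, dim)) * 0.5   # stand-in for the fixed weights ("the matrix")

def next_token(sequence):
    # Toy rule: the whole running sequence (recent items weighted more) is
    # what the fixed weights act on to yield exactly one new vector.
    decay = [0.7 ** (len(sequence) - 1 - i) for i in range(len(sequence))]
    context = sum(d * s for d, s in zip(decay, sequence))
    return np.tanh(W @ context)

sequence = [rng.standard_normal(dim)]        # the prompt / initial input
for _ in range(5):
    sequence.append(next_token(sequence))    # one piece out, tacked back on, repeat

# The full trajectory is only ever produced by feeding outputs back in as
# inputs; it lives in W only as a potentiality.
print(np.round(np.array(sequence), 2))
```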
Starting point is 01:20:01 Not just generating an output, but the next input for itself. Super interesting. Yeah, it's recursive. It's fundamentally recursive. And when we think about how the system is built to do this recursion, it's not just that this is one way to get to it. Language contains within it the ingredients for producing this kind of recursion. The sequence of language that these models learn is built to have this recursive capability within it: this word is going to produce the next word, which is going to produce the next word,
Starting point is 01:20:39 premised on the entire sequence before it. That's the crazy thing. There was also this interesting result: Anthropic put out a paper a little while ago, I think it's called The Biology of Large Language Models. Even though you're only producing the very next token from the sequence, the language models have learned, because they've learned sequence-to-next-token, that any point along the sequence is pregnant with the potentiality for
Starting point is 01:21:13 not just the next token, but many other tokens moving forward. It's the whole trajectory that is sort of encapsulated in that matrix you were talking about earlier, right? The matrix isn't just a matrix for taking a sequence and producing the next token. No, no, the matrix is customized so that it's going to run recursively. And so it's built in such a way, it's tuned in such a way, that it's going to produce the next word, "the." Well, that's not useful. No, "the" is the next piece of the autoregressive chain
Starting point is 01:21:46 that's gonna produce the man went to the store, right? And so it's not just any old matrix, and it's an indescribably rich kind of information that's contained within that matrix. And I like to think about if aliens landed and found the brains, you know, because we've been wiped out by AI, I'm kidding, I'm kidding, right?
Starting point is 01:22:10 There's no humans left, but they find the brain sort of ossified and they're able to do this, and they start feeding it stuff, and they can see that there's this input-output. If you didn't do the autoregressive piece, you would never understand what the hell this thing is doing. Note: Elan's been talking plenty about autoregression, and the technically minded among you may be wondering about the success of diffusion models.
Starting point is 01:22:30 While we don't get to it here, he does admit that his thesis would be undermined if diffusion models were accurate enough for natural language. But so far they seem to be good only for coding. This is something I love about Professor Elan Barenholtz. He's extremely humble and open to how his model can be falsified. If you didn't do the autoregressive piece, you would never understand what the hell this thing is doing. You would get it all wrong, because you'd think its purpose is to produce some sort of label or some sort of... No, its purpose is to produce these sequences, but you have to run it.
Starting point is 01:23:01 You have to run it autoregressively and get the output and then feed it back in as a sequence. So memory, this kind of short-term memory, working memory, is fundamental. It's super, super fundamental. The brain is not... I don't want to use this term to... I don't want to anger people. But it's non-Markovian. It's fundamentally non-Markovian.
Starting point is 01:23:24 It's not state in and then... the current state and then produce the output. It's previous states. There's a sequence of states that led to the current state, and it's the particular sequence that leads to the next token. And the next token is going to be the next element, the next piece, that's going to determine the state in conjunction with the entire previous sequence. So this is a super cool thing. This puts you in good company with Jacob Barandes.
Starting point is 01:23:56 We need to talk more. I think so. And you know, it's something we've talked about offline. I think physics perhaps ultimately has this sort of non-Markovian property, that the universe sort of has a memory, that it has to, in order to produce, you know, consistent coherence of space, of space-time; it has to have a sort of memory. If it's just instantaneous, this current state, well, then it wouldn't really know what to do. It has to sort of know what happened recently
Starting point is 01:24:27 Just a moment. In your model, because our minds work autoregressively, they must be non-Markovian. Yeah. And this is how our cognition works, which we didn't exactly get to; we got to language being an autoregressive model, and your next thesis was that cognition itself is autoregressive in a similar manner. Later, maybe we can explore it here today, maybe we'll save it for the next part, it's that physics itself is autoregressive. However, physics is a model, and many people will conflate physics with reality, where physics is our models of reality. So are you making the claim that reality is non-Markovian, or are you saying that necessarily as we model reality,
Starting point is 01:25:09 it will be non-Markovian? No, I'm making the former claim that reality itself is non-Markovian. That we observe in physics certain kinds of phenomena, like that we end up having to use tools like, we refer to things as forces, that ultimately are really kind of sneaking in a past. And the idea is that the deterministic nature of the fact that there's coherence, the spatial
Starting point is 01:25:33 temporal coherence, the fact that things move the way they do through space, there's a contingency on the past in a way that you can't really capture by saying you could fully just… The past is actually present. The past is in the present in a deep way. The universe really has to have a memory in order to produce the next frame, so to speak. That's sort of the shallow version of the claim. No, it's not about our particular characterization of physics. Our characterization of physics observes certain kinds of spatiotemporal continuity, certain kinds of contingencies that really depend on what's happening in not just, it's not about this instantaneous moment.
Starting point is 01:26:23 In some ways, it's like Zeno's paradox. We can use calculus and say, no, no, in fact, there's an instantaneous rate of change. But that's a mathematical trick that's really getting away from the fact that, no, there isn't an instantaneous anything. There's simply continuity that depends on what's happened in the past. But I know I'm going to get attacked by physicists and I'm not really well equipped to fend them off so I don't want to be too bold in this piece because it's not in my wheelhouse. But I do want to take that question in this conversation. Do I think the brain is just leveraging sort of the memory of the universe?
Starting point is 01:27:06 No, I think the brain, and this is an empirical claim, we see interesting features of the brain like feedback loops. There's all these backwards kinds of connectivity. There's recurrent loops, things like that, and they're not well understood. And the predictive coding has some things to say about that. I have some things to say about predictive coding. And I think that what we may find is that this kind of memory, this kind of continuous, we can call it a context, a total continuous activation, but this ability to use the past to guide the next generation is going to end up being physiologically built into the brain.
Starting point is 01:27:49 It's not that the brain is just leveraging memory of the universe. No, the brain has to do memory. It has to actually retain the words that I said a couple of seconds ago to be able to generate the next word appropriately. And in fact, that's what we see. And what we see from, you know, so-called working memory experiments, you can really go back in and say what happened before. My claim is that it's not because it's there to retrieve, but rather it's just guiding my current generation. But still, it's represented. It's there. What happened in the past, you know, it's not like Vegas, right?
Starting point is 01:28:25 What happened in the past doesn't stay in the past. It actually guides the current generation. It's guiding what I'm saying right now, and it's doing so smoothly, meaning it's happening from a second ago, it's happening from a few seconds ago. But all of this is beautifully modelable using large language models. We can just look at attention weights. We could say, what is the impact of information
Starting point is 01:28:51 from this far back on a current generation and so on and so forth. But I think the brain has to do this. I don't think the brain is doing, probably not doing what these large language models do. And that's one of the reasons I say, I'm not claiming that we are a transformer model I'm not claiming we are GPT in this current incarnation
Starting point is 01:29:08 Right. What I'm claiming is that the fundamental math is what you just said before: vector times matrix gives the next vector, autoregress, do it again. That's sort of the level of abstraction at which I think it's accurate. We don't have the whole context; we don't have the entire conversation we've just had. GPT does, and it's probably a deep inefficiency in the way these models run right now. They're very computationally expensive. Too computationally expensive to run in a brain, most likely. We don't store all that information. We forget stuff, right? GPT doesn't; in context it doesn't forget. Although, if you go far back enough in context, it kind of does, which is interesting,
Starting point is 01:29:48 and probably is similar to what we're talking about, because you're weighting things that are further back less. But in humans, we're not doing the whole context. We're not even doing like 30 seconds back perfectly, but some representation. And what the nature of that representation is, that's what I want to do with the rest of my life. I want to understand what it means to talk about what the context looks like
Starting point is 01:30:13 in people? What is that activation? How is it physiologically instantiated? And what are its mathematical properties? How much does, how is what I said 10 seconds ago influencing what I'm saying now? How about 50 seconds ago? How about 10 minutes ago? How about a year ago? Does this thing continue? Is there dynamics that are continuing over months and years? Possibly. It doesn't all have to be fine-tuned weights.
Starting point is 01:30:36 It could be that there's decaying activation that spreads over much longer periods. Once you allow that it's not explicit retrieval in the working memory form, then all bets are off as to how the dynamics of this thing actually works. So this is, you know, I see this as a possible new frontier for thinking about, you know, what memory really means in humans. But I think physiologically, you know, coming back to that question, and there I was just trying to do it. I was like, okay, let me rerun. What was the original question, right? So in the brain, what's happening in the brain, I think we, you know, my hypothesis actually leads to some concrete predictions that we're actually going to be able to find some correspondence
Starting point is 01:31:21 between, you know, unlike the working memory model, I think we're going to be able to find 10 minutes back We're gonna find some some activations that are interpretable will be able to decode them as guiding my current my current Expression my current speech. It's very different by the way than saying, you know The classic decoding model in these things is here's some neural activity Is it this picture or that picture? Is it this word or that word? It's not going to look like that. It's not going to look like that. We're not going to be able to code it in the sense of like a concrete
Starting point is 01:31:51 We're gonna find some activations that are interpretable; we'll be able to decode them as guiding my current expression, my current speech. It's very different, by the way, than saying, you know, the classic decoding model in these things, which is: here's some neural activity, is it this picture or that picture? Is it this word or that word? It's not going to look like that. We're not going to be able to decode it in the sense of like a concrete
Starting point is 01:32:25 I don't know. So one of the reasons I was excited was and am excited to speak with you is that I see this as a new frontier as well. But for me, I have a side project which I'll tell you about maybe off air because I'm not ready to announce it. But there are philosophical questions that we can look at with the new lens that's gifted to us by these statistical linguistic models, the ones we call LLMs. LLMs, sorry. Physical philosophy, I don't know if you've heard of that. Have you heard of this term physical philosophy? No. So you can use philosophy to philosophize about physics,
Starting point is 01:32:58 but you can also use physics to inform your philosophy. So there are some established concepts and theories and empirical findings from physics like special relativity or quantum mechanics that inform and constrain or even reframe traditional philosophical questions such as the nature of time that wouldn't be there had we not invented special relativity or found special relativity. Okay, so I think there's something about these new models that can be used to then inform philosophical questions. Like you mentioned, there is no symbol grounding problem.
Starting point is 01:33:33 I don't completely buy that, but it's interesting. I'm not sure I buy it, but at least my LLM doesn't buy it. Now speaking of physics, the questions you'll have to answer to a physicist about is the universe autoregressive or non-Markovian is, well, if physics has a memory, does that mean that energy isn't conserved? So is a particle carrying with it its memory? Then why isn't it heating up or getting more massive with time? Why isn't it going to form a black hole?
Starting point is 01:34:02 And this is why I probably, you know, this is why I venture very carefully into these waters. Because I would need some time to go and read and think about questions like that. You're in a much better position to ask and reason about those questions. Yes. Then you'd also have to talk about why is the present plus velocity model, like way of viewing the world, so successful, like to predict an eclipse, you don't require knowledge about 100 and 200 and 300 years ago all at once. Right. You just know the present pretty much.
Starting point is 01:34:39 Right, but even velocity, again, if you sort of take me at... If you consider the sort of the instantaneous, the idea that velocity, well, it isn't really in the present, right? You can only get velocity over a stretched over time. It only has meaning. But you could say this particle has this velocity at this time, but that's a cheat, right?
Starting point is 01:35:03 In some ways that I see that really maybe it's just a rearticulation. So the physics that we've got, we've been able to do this sort of symbolic representation of things like velocity that are sneaking in this kind of temporal extension in a way that I think may not end up, you may not end up in a radically different place, thinking about this as the universe having memory, as long as you just accept that velocity is a convenience, that it's a kind of way of communicating some property such that you can say that this is happening instantaneously,
Starting point is 01:35:43 but that's not real. So again, you're in good company with Jacob Barandes. I'm not saying that these questions are in principle unanswerable, but something else is that, look, if the universe has a memory, let's say a particle has a memory, how much of a memory does it have? Does it know about more than its given space, like more than its neighbor? Because then do you violate locality?
Starting point is 01:36:04 These are different questions that will have to be answered. Yeah, and I wish I could tell you that maybe this is a solution to quantum weirdness and non-locality. Maybe it is. Maybe it has something to do with that. That there's, you know, that even in a distant, you know, long after two particles have gone their merry way, that there is some memory of their shared origin that somehow, I still don't know how that gives you spooky action at a distance. It's not a good account, but it might have some relevance.
Starting point is 01:36:35 If you think about things very differently, if you think about the universe has memory, well, what does that change? If you just speculate on that and try to reframe things that way, could it potentially help solve some of these issues? I don't know. So let's go back to language. A child is babbling.
Starting point is 01:36:57 Yeah. Okay. So let's call it vocal motor babbling. It doesn't actually know what it's doing. When does it decouple and become a token, like a bona fide token with meaning? That's a great question. I would say that it becomes a token when the infant learns that
Starting point is 01:37:16 a specific more phonological unit has relations to some other phonological unit. Language ultimately is completely determined by relations. And so it might be a very limited initial map of the token relations, but as soon as it's relational, then we would say that that becomes discretized such that it's meaningful to say that these symbols have the relation to one another. If it just sounds ba-ba-ba-ba-ba-ba, right, ba-ba-ba-ba-ba has no specific relation to any, to ba-ba-ba-ba-ba, but maybe ba-ba ba does or maybe ba, right? It could be a candidate.
Starting point is 01:38:08 As it turns out in English, it doesn't. But you know, da-da ends up being a unit. And what do we mean it's a unit? It means that that unit is a discrete symbolic representation such that it has relations to other units. So I would say that when something fits into the relational map is when we would say it's discretized as a token. So help me phrase this question properly, because I haven't formulated it before, so it's going to come out ill-formed.
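A loosely analogous sketch (not a model of infant learning): a byte-pair-style merge, the kind LLM tokenizers use, where a recurring adjacent pair of units gets discretized into a single unit precisely because of its stable relations to its neighbors.

```python
from collections import Counter

def most_common_pair(sequences):
    # Which adjacent pair of units shows up together most often?
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(seq, pair):
    # Rewrite a stream so every occurrence of the pair becomes one unit.
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Babbling-like streams, pre-split into syllables purely for convenience.
streams = [["da", "da", "ba"], ["ba", "da", "da"], ["da", "da", "ba", "ba"]]
pair = most_common_pair(streams)                  # ("da", "da") keeps recurring together
streams = [merge_pair(s, pair) for s in streams]  # "dada" is now a unit with its own relations
print(streams)
```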
Starting point is 01:38:46 Earlier you talked about analog and I believe you were referring to it as like the animal brain is analog but then the language is digital if that's the correct analogy. Yeah, symbolic maybe is I don't know digital is how we you know actually computers, sort of instantiate, you know, with ones and zeros or whatever, a sort of symbolic representation. But yes, symbolic. Okay, I don't know how much of my question then dissolves if it's symbolic instead of digital. Okay.
Starting point is 01:39:16 What I was going to say was, the written word is something like 5,000 years old or so. I think the oldest is 3,500 BC. Right. Somewhere thereabouts. So for tens of thousands of years, if not hundreds of thousands of years, maybe millions, it was just speech, just speaking analogly. Right. OK.
Starting point is 01:39:38 Is there anything then about language that changes because it wasn't written down? Sorry, is there anything about your model that changes because it wasn't there to be tokenized in such a discrete manner? It's such a great question. I've been thinking about exactly that. I don't think anything changes. What's crazy about it is that until the written word, people might not have even thought
Starting point is 01:40:05 about the concept of words at all. And so we were even more oblivious as a species to the idea that there were these individual discretized symbols that had relations amongst each other. Because until you see them outside of yourself, they just run. If they're just running in the machinery of what language, how it's meant to run, it was just an auditory medium. You don't really necessarily even think about them as being distinct from one another.
Starting point is 01:40:34 You just have a flow. You just make these sounds and stuff happens. Once we started writing things down, and especially phonetically, because you think about like hieroglyphic and pictorial kinds of representations, really don't actually capture words. They're very often they're not distinct, they can actually be a little more rich than a single word.
Starting point is 01:41:03 So it was only with writing that maybe people really start to become aware that we have these things called words. And now it's only with large language models that we really understand what words are, which are these relational abstractions. I don't know if symbols is just another word. I don't know if that even captures it fully. But what's wild about it is that the brain was doing exactly this and knew and the brain was tokenized in these these sounds and was using the mapping between them in order to produce language. Probably long long long before anybody ever sort of self-consciously had a conception
Starting point is 01:41:45 that there's such a thing as a word. And so that just blows my mind. It speaks to what I think is a very deep mystery, a very deep mystery. Where the hell did language come from? Here's what didn't happen. There was not a symposium of, you know, quote unquote cavemen or let's use the more modern term, hunter-gatherers. And they had to figure out, how do we make an autogenerative,
Starting point is 01:42:19 autoregressive sequential system that is able to carry meaning about the world. This thing is just ridiculously good. And it's operating over these arbitrary symbols. And again, when I say arbitrary symbols, just to recap, it's not that it's arbitrary like the word for snow is this weird sound snow and it's kind of like what? No. Arbitrary in the sense that the map is the territory right it's like it's the relations that matter between
Starting point is 01:42:49 these symbols. Is it completely arbitrary, though? So for instance, there's the kiki and bouba effect, you've heard of those? I think those are cute. I mean, that's the exception that proves the rule to a large extent. I think it is largely arbitrary. There is also the fact that words themselves have an action component. So when you scream a word, you can physically shake the world around you
Starting point is 01:43:16 and it shakes your lungs. And if you speak for too long, you can die, let's say if you just exhale and you don't inhale. It is a physical activity, and it's hard to wrap your mind around; that's not symbols. Right. That's not exactly captured by the symbols or by just the sequence of words. Again, I'm just following where the data leads me,
Starting point is 01:43:35 because in the large language models, they no longer have any of those properties. It's just an arbitrary vector. The tokenization in the end, ultimately, yes, there's proximity, but it's just strings of ones and zeros. Well, it's not ones and zeros, but whatever. Your vector is just a string of numbers that end up having certain mathematical relations to one another, but completely and totally lost, as far as I can tell, are the physical characteristics of these words.
Starting point is 01:44:07 By the way, I should mention, there's a former student and I are actually working on this idea, this crazy idea of using that latent mapping that I mentioned in that earlier paper, to see if maybe that's not true. I wonder if you could guess what English sounds like just from the text-based representation. Or if you've never seen, you don't know what D,
Starting point is 01:44:32 what sound a D makes, or what sound a T makes, but you've got the map, you've got the embedding in text space, and then you've got some other phonological embedding, could you possibly guess? That's a long shot. So maybe it's not totally arbitrary, and maybe the radical thesis here
Starting point is 01:44:52 is that it's not arbitrary at all, that the words have to sound the way they do, that the mechanics actually happen, like something happens mechanically based on the sounds themselves. But my bet is that it's going to be closer to arbitrary. But I could be wrong. But you were going to say, why wouldn't the platonic space prove that it's arbitrary?
Starting point is 01:45:14 Well, if in fact you can't do the mapping at all, if you can't guess it, if the platonic space says there's no way to get from text representation to phonology. Phonology is doing its own thing and the word mouse is just for no good reason, then it's hopeless. Okay. But if you can get anywhere and you can actually guess at all, then that would suggest that there really is a kind of autoregressive,
Starting point is 01:45:43 inherent, there's an inherent autoregressive, inherent, there's an inherent autoregressive capability just in phonology. And so what that would mean is it's not at the symbol level there, it's, well, yes, it's, no, it's at the phonological symbol level. But maybe that's, you know, happening even in a mechanical level, like there's certain sounds that are easier to say together or something like that, which could guide it. I don't know, it's convoluted in my head right now. Exactly how this might map out.
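A sketch of the kind of test being floated here, under the assumption that you have a text-space embedding and a phonological embedding for the same words: fit a linear map on most words, then check whether held-out words land closer to their true phonology than chance. The arrays below are random placeholders, not real embeddings from any model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: rows are words, columns are embedding dimensions.
# In the real project these would come from a text model and from some
# phonological representation; here they are random placeholders.
n_words, d_text, d_phon = 200, 16, 12
text_emb = rng.standard_normal((n_words, d_text))
phon_emb = rng.standard_normal((n_words, d_phon))

# Fit a linear map from text space to phonology space on most words...
train, test = slice(0, 150), slice(150, None)
M, *_ = np.linalg.lstsq(text_emb[train], phon_emb[train], rcond=None)

# ...then ask: for held-out words, does the predicted phonology land nearer
# the true word than chance? If yes, the sound-form isn't fully arbitrary.
pred = text_emb[test] @ M
dists = np.linalg.norm(pred[:, None, :] - phon_emb[test][None, :, :], axis=-1)
mean_rank = (dists < np.diagonal(dists)[:, None]).sum(axis=1).mean()
print("mean rank of the true word (lower = some shared structure):", mean_rank)
```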
Starting point is 01:46:13 But I think it's reasonable now to assume that, unless proven otherwise, it's probably arbitrary, probably arbitrary symbols, and what matters is the relation between them. There is no sense in which mouse means mouse except that mouse ends up showing up after trap or before trap and after, you know, "the cat was chasing the," and all of that, and there's nothing else. Let me see if I got your Weltanschauung down, but in terms of a syllogism.
Starting point is 01:46:42 So premise one would be that LLMs master language using only ungrounded autoregressive next token prediction. Then you have another premise that says, well, LLMs have this superhuman language performance just by doing this. And then you'd say that, well, computational efficiency suggests that this reflects language's inherent structure. And then the deduction is that therefore human language uses autoregressive next token prediction. Is that correct? You got it. You got it.
Starting point is 01:47:14 I mean, it's not only computational efficiency per se. There's two ways to put it. One is, if that structure is there, it would be very odd if we weren't using it. Very odd indeed. If that structure is there such that it's capable of full competency, you'd have to suggest that it's there just incidentally, but humans are doing something completely different. Okay. You then go and say that language generation feels real time to us.
Starting point is 01:47:45 So it's sequential and real time. And autoregressiveness or autoregression explains the pregnant. Very good. Present. You've gotten very good at this, I see. The pregnant present, that's right. Exactly right. Exactly. And yes, what we're doing is we're carving out
Starting point is 01:48:04 the very next instantaneous moment in a trajectory. But the trajectory contains within it the past and the potential futures. Now, although we didn't get to this or explore it in detail, my understanding from our previous conversations is that you would say that brains have pre-existing autoregressive machinery for motor and perceptual sequences. And by the way, I don't know if it's brains or cognition has it, by the way. Well, remember, so the speculation is that the brain is gonna have to have the machinery, the
Starting point is 01:48:38 physiological machinery to support autoregression. So things like, you know, the continuous activation, backward projections, ways of representing the past may be built into the brain. Those aren't that distinct. But yes, I do want to just... So at this point, I consider it fairly speculative, but there is a good reason to speculate that cognition more generally is autoregressive in this way. And there are two reasons, but the main reason I think that is because, if you believe as I do that language is autoregressive in humans, you can either propose that, spontaneously, however language got here, in order for us
Starting point is 01:49:36 to create language, we had to invent a different kind of cognitive machinery that's able to do this autoregression: hold the past, let it guide the future, do this trajectory mapping between the past and the future. All of that computational machinery would have to have been built special purpose for language. To me that seems extremely unlikely. Yeah, extremely unlikely.
Starting point is 01:50:02 Costly. Yeah. Yes. So there's a term in evolutionary biology called exaptation. I'm not familiar with that. So exaptation means you have previous machinery used for purpose A. Exactly. Then something else comes about and uses that machinery, and perhaps does so even better.
Starting point is 01:50:19 So for instance, our tongues evolved for eating, but then language came about and started to use that machinery. Now we use it primarily. Well, I don't know about primarily how to quantify that, but we use it more adeptly for language. I think most of our time, more time is spent talking than eating at this point. Yes, yeah. I know, but the reason why I said I don't know,
Starting point is 01:50:39 because we're constantly swallowing saliva at the same time. So I don't know how much is the swallowing versus the speaking. Okay, so anyhow, predictive coding, to me it sounds like, and to the average listener it sounds like, predictive coding should line up well with your model. Why do you disagree with predictive coding? So what is predictive coding and how does your model clash with it? So predictive coding in a nutshell postulates that what the brain is doing, that what neurons are doing is actually anticipating the future state, the next state that the environment is gonna is gonna generate. And so
Starting point is 01:51:17 they're basically predicting something about the external world that's gonna end up getting represented in the brain. And then there's this constant process of prediction and then measuring the prediction versus the actual, what ends up being the observation. My beef with predictive coding is that you might very well be able to explain the phenomena that it's meant to describe in a more efficient way. So predictive coding to me means that you actually have to have sort of a model of the external, that what you're doing is sort of simulating. And you're doing in such a way that you actually are producing neural responses that don't really need
Starting point is 01:52:02 to get produced very often because the environment is likely to produce them. To me, this seems like an inefficiency and a complexity. I think there's a much simpler account, in some ways a more elegant account, namely that what our brain is constantly doing is generating. Not predicting, but generating. But that the generation has latent within it a strong predictive element.
Starting point is 01:52:28 Because of this smooth trajectory, this idea of the past, the pregnant present, that there is a continuous path from the past to the future, you are in essence predicting, to some extent, the same way that a large language model is kind of predicting the next token, but it's not really predicting in that. So here's where I strongly disagree, or I'm proposing a different model. Yeah. Is you're not predicting in such a way that you're supposed to map to something external to the system. It's simply generation internally defined that's supposed to have this kind of continuity
Starting point is 01:53:04 to it. And the external world certainly produces, it impinges on our system, and we are of course inherently anticipating that we're not going to have a brick wall in front of us as we're running down the street. When that brick wall shows up, you got to do something about it. And that wasn't implicit in your next token generation. So you're going to have to radically reorient and do something about that. And I think that can account for some of the phenomena that are supposed to support predictive coding. But the big difference here is that it's all about internal consistency with the anticipation
Starting point is 01:53:45 that that internal consistency is going to also map very well to what's happening in the world. But that's built in. There isn't any explicit modeling of the external world. It's that the internal generative process is so good that it has prediction latent within it. But there isn't any explicit prediction happening. So I'm confused then. If the symbols are truly ungrounded, then what's preventing it from becoming coherent but fictional?
Starting point is 01:54:20 So that is to say what tethers our language to the world? Yeah, and the answer would have to reach back again to that latent space. So let's say my language system, you know, wants to go off a deep end and says, actually I'm sitting here underwater talking to a robot, you know, and everything, you know, most of the words we've said up till now are pretty consistent with that and I could just say, look, you know, I'm expecting some fish to float by in the next second. Well, my perceptual system is going to have something to say about that. And so there has to be this tethering that you're calling it is, of course, there is grounding in a
Starting point is 01:54:53 sense, in the sense that there's there has to be some sort of shared agreement within what I think is maybe this latent space or something like that. There is exchange of there's communication between these distinct systems. But the language system can unplug from all that and it could talk about what would it mean to be sitting and talking to a robot underwater and it will have a meaningful, coherent conversation about that,
Starting point is 01:55:17 all internally consistent. And you know, you could give the prompt, what if instead of Kurt, it was actually a robot Kurt? How would that change things? And I could go in and get philosophical about that. And the point is that the linguistic system has all of its own internal rules and any trajectories, many different trajectories are possible, although it is strongly guided by the past. But there is also impinging information from our perceptual system that also continues to guide it. Actually, I should mention, one of my theories of dreams is that it's an autoregressive generation in the perceptual system.
Starting point is 01:55:55 One of the reasons I think cognition is more generally is autoregressive because we can imagine, we can do imagination, but it tends to, it takes place over time. We can imagine imagination, but it tends to, it takes place over time. We can imagine a sequence. Dreams are what happens when you get, you're no longer as deeply, as closely tethered by the recent past. So the context is, the weight of the context is not as strong. So all of a sudden, you're in the dream, you're on a motorcycle, and then suddenly you're flying. Because frame to frame, it's actually totally consistent, but it's
Starting point is 01:56:28 not consistent with the recent past. So this kind of tethering, it happens in language, namely, I have to be consistent with my more recent linguistic past, but we also do some tethering to the non-linguistic embedding, there is this crosstalk that happens. And so our language system doesn't just go off the deep end. It retains some grounding, not the philosophical kind of grounding, not the, you know, this symbol equals this percept, but the kind of grounding where this storyline, in a certain sense, if you want to think about it that way more semantically,
Starting point is 01:57:05 more semantically vague, this storyline linguistically, it's going to have to match my perceptual storyline. Okay. So in the same way that with these video generation models, you see Will Smith eating spaghetti, like the three-year-old joke. Yes. And every three frames,
Starting point is 01:57:21 if you just look at it sequentially, exactly every three frames makes sense, but then he's just morphing into something else and he ballooned now, and it looks dreamlike. Exactly. That's what's happening in video generation, and everybody knows the trajectory now. How is it going to get better? Longer context, and that just means the autoregressive generation
Starting point is 01:57:40 is more and more anchored in the past, and that past becomes a more meaningful smooth curve But it seems like there must be something more tethering us to reality than just long context Says you No, there is and and what I would say is it's certainly in the case of language It looks like I said we inherit when we step into this world we inherit There's this, the corpus of language is a certain kind of tethering. Words have the relations they do to each other
Starting point is 01:58:12 and that carries meaning. You know, the words aren't, don't just line up with each other one, any old way. They, they line up, you can't just use language however you want. You end up having to adapt and adopt the language that you're given. And I would say in the case of language, even more so than perceptual, what we do is we learn that tethering. And it is a certain kind of reality. It's a linguistic reality, but it's not arbitrary. It's been honed over God knows how many years for that mapping to be useful.
Starting point is 01:58:49 And in order to be useful, it actually has to map somehow to perceptual reality too. That is definitely there. And so, no, it's very strongly tethered. It's not just poetic. We're not just doing a poetry slam when we're talking. We're not just spitting out words that are loosely related to one another. No, the sequence matters. It's extremely granular and what's the word?
Starting point is 01:59:21 It's funny that I can't come up with a word right now. Beautiful is not word right now. But it's beautiful is not the right word, but it's precise. There's such incredible detail in how each word relates to one another. And this is something we didn't create. You and I didn't create this. This is something that humanity created. It has all of this rich relational properties that are this tethering That that carry somehow meaning about the universe Only as expressed as a communicative
Starting point is 01:59:54 Coordinated tool embedded within a larger Perception action system, but we should respect it Language is an extraordinary invention. I don't think we I think we should have a completely New respect for just how rich and powerful it is It's not some symbol this symbol equals this mental representation or this object No, it's this construct that contains within the relations The capacity to express anything in such a way that my mind can meet your mind do stuff How the heck is that works? Who knows but it's it's it's on inspiring
Starting point is 02:00:34 So is there something about your model that commits you to idealism or realism or structural realism or realism or realism or structural realism or anti realism or foundationalism or what have you? Like what is the philosophy that underpins your model? And also what philosophy is entailed by your model, if any? Yeah, that's that is a great question. And I would say it's I've come to actually sometimes use the term linguistic anti-realism and it's the idea that language is not what it thinks it is.
Starting point is 02:01:13 We engage in our philosophical thoughts and even our sort of general thinking about who we are, what is our place in the universe. Much of that takes place in the realm of language. And the conclusion I've come to is that language as a sort of semi-autonomous, autogenerative computational system, modular computational system, doesn't really know what it's talking about in a deep way. And there is really a fundamentally different way of knowing. The sensory perceptual system, the thing that gives
Starting point is 02:01:55 rise to qualia, the thing that gives rise to consciousness, and here's a big one, the thing that gives rise to mattering, to meaning. What do we care about? We care about our feelings. We care about feeling good or not so good, pleasure, pain, love, all the things that actually matter. These are actually, these live in what I call the sort of the animal embedding. It's something that other species, non-linguistic species, they can feel,
Starting point is 02:02:28 they can sense, they can perceive. They don't have language. We think, oh gosh, they don't understand anything. Well, what if it's the opposite? What if it's our linguistic system that doesn't understand anything? What if it's our linguistic system that's actually a construct, a societal construct, a coordinated construct, but as a system, it's a construct that doesn't actually have a clue about what
Starting point is 02:02:59 pain and pleasure are. It has tokens for them and the tokens run in within the system to say things like, I don't like pain, I like pleasure. Those are valid constructs and they kind of do the thing they're supposed to do in language. But a purely linguistic system, and I think language is purely linguistic, I guess is one way to think about it, doesn't really have contained within it these other kinds of meaning. Now, first of all, this has implications for artificial intelligence, thinking about whether AI can have sentience, should we care about if your LLM starts saying,
Starting point is 02:03:39 this is terrible, don't shut me off, I'm having an existential crisis. Perhaps, I would argue that we shouldn't worry about it. That's what my LLM says all the time. I don't know which LLM you're hanging about, but... My Curt LLM. The Curt LLM. Yes, the Curt LLM. But the Curt LLM, as an LLM, perhaps doesn't really have that meaning contained within it in a deep sense. It's again because of the mapping. It is communicating something probably about the non-LLM Kurt.
Starting point is 02:04:17 When you say ouch, there is pain there. I'm not denying that. But what I'm saying is that as a sort of thinking rational system that does the things that language does, that system itself may not have within it the true meaning of the words that it's using in a deep sense. I don't want to take you off course and hopefully this will help you stay on course and like hopefully it aids the course. And LLM can process the word torment, see. But what's the difference between our human brain's autoregressive process that creates the feeling of torment itself and the word torment?
Starting point is 02:04:57 So my speculation here, and it is purely speculative, is that it's non-symbolic. There's something happening when the universe gets represented in our brain. It's still in our brain. It's still a certain mapping. But when it gets represented, so that physical characteristics of the world are actually represented in a more direct mapping. So think about color. I mean, we talked earlier about sort of color space. There's real true sense in which red and orange
Starting point is 02:05:32 are more similar in physical color space. Like there's actually some physical fact about, and also in brain space. That's my guess is that in brain space, there really is something about, and a better way of saying it is, that the physical similarity has a true analog, and I use the word analog kind of specifically, a true analog in the analog kind of physical cascade of activity that's happening in the brain.
Starting point is 02:06:06 Language is symbols. Is non-symbolic a synonym for ineffable? I wouldn't have thought of it that way, but that may be a very good way to say it or to not say it. Yes, ineffable. Well, by virtue of being symbolic, by virtue of being a purely relational kind of representation, which is what the language, maybe even more than saying it's symbolic, it's that it's relational.
Starting point is 02:06:34 The language is a relational kind of image. The location in the space matters only because it's a relation to other tokens in the space. That's not true in color perception. In color perception, where you are in the, sort of probably in the embedding space is going to have physical meaning. It's gonna be related to the physical world in a much more direct way.
Starting point is 02:06:58 And so the space, even though it's an internal space, right, the perception of color is still, just comes down to neurons firing. We're not actually getting the light. The light's not getting into our brains. But the mapping is such that it preserves. And I even think of it as an extension. It's not symbolic. It's not a representation. It's an extension. It's like when you drop a rock in water, the ripples that happen afterwards are not a representation of the rock, but they carry information about that rock,
Starting point is 02:07:30 but it's just a physical extension. It's a continued physical process. Interesting. I think that's what's happening in sensory processing. And I think that has something deep to do with qualia. I don't think language has that. I think because it's purely relational, it's not a rippling of anything. It's its own system of relational embeddings that aren't continuous in any way with the physical universe.
Starting point is 02:08:01 Do you think that has something to do with God? Do you think that has something to do with God? Well, I think that if we think of the grand unity of creation, there's some sense in which language breaks that unity and I think that we can lie in language In a way that we can't in any other substrate hmm, and so I Think by becoming purely linguistic beings as The vast majority of our time as humans is spent in the linguistic space. We're hanging out there. Our minds are hanging out there.
Starting point is 02:08:48 I think we have perhaps forgotten something that animals know about the universe. And it's this kind of unity because the animal processing is an extension. It's a continuation of the world. Since the world, the universe is one thing in some sense. It is everything. We don't even have to get into non-locality, right? The origins of the universe. Let's just talk about the Big Bang or something like that. What's happening here now in some ways is connected quite literally to what happened elsewhere for way back in time. So I think this sort of unity that mystics talk about is much closer to sort of the animal
Starting point is 02:09:39 brain than the linguistic brain because the linguistic brain actually creates its dichotomy. It breaks the continuity. Symbols, I sometimes use the phrase, it's like a new physics. The relations are what matters, and it's no longer continuous, it's no longer an extension of the physical universe. It interacts with the physical universe in a way that we, as we see, we can sort of do this mapping so that when I talk, it can have influence on the physical universe, it can have influence on my perception, it can have influence on my behavior. I think that
Starting point is 02:10:16 sort of the rationalist movement, the positivist movement, sort of modernity itself is a complete hijacking of our brain by the linguistic system. And I do think that has something to do with the denouement, the kind of the God is dead kind of modernity equals somehow the decline. And so, you know, a rationalist would say, well, that's appropriate because we've figured out how the universe works and we don't need any of this hocus pocus. But what about the feeling of unity? What about the sense of sort of a cosmic whole? Are we so sure that we're right and those ancients were wrong? And yes, I think, I do think that, that this has, has very significant consequences for
Starting point is 02:11:15 thinking about some of these intangibles, these ineffables. So a snake that mimics the poison of another snake in terms of its color, that's a form of a lie. Now, would you say that that is somehow symbolic as well, though? No. And yes, there is mimicry and there is, you know, a certain sense in which animals can engage. They're not, they don't even know they're engaging in someterfuge
Starting point is 02:11:46 But that's much more continuous with okay You've just pushed the the cognitive agent into a slightly different space which is consistent with some other physical reality That's very very different than saying we are made of atoms and particles and Everything that happens is determined by the forces amongst these atoms, none of which is something that we have any material animal grasp of, any true physical grasp of. These are words.
Starting point is 02:12:19 These models are really words and they run in words and they run very well to make predictions and to manipulate the physical universe. But they're stories and they're linguistic stories. And those kinds of stories can be, I won't even say they're, according to my own theory, language doesn't really have physical, doesn't point to physical meaning. And so even saying that it's a lie or untrue isn't quite right. But within its own space, you can go off in many different directions. And maybe the danger is not in thinking of things, it's that thinking about things, thinking thoughts that aren't really true.
Starting point is 02:13:05 It's falling too deeply in love with the idea that idea space and language space is the real space. Yes, interesting. See in our circles, so when we're hanging out off-air, when we're hanging out with other professors and on the university grounds and so on, we praise this exchange of words and making models precise and doing calculations and so on.
Starting point is 02:13:31 And I've always intimated that this is entirely incorrect. And I haven't heard an anti-philosopher, like a philosopher that was an anti-philosopher, except one who was an ancient Indian philosopher. I think his name is Jayarasi Bhatta. I'm likely butchering that pronunciation, but I'll place it on screen anyhow, who was arguing against the Buddhists and the other contemporary philosophers by saying, look, you think know thyself is what you should be doing or what you didn't say like like this, but you think of it as the the highest goal however who is living more truly than a rooster like none of you are living more truly than just something that's just being yes exactly that is that that is the exact same intuition um and and
Starting point is 02:14:18 yes it's this idea i i i articulate to myself a long time ago, that the fly knows something that our linguistics system can never know. It knows something. It really does. Simply existing and being is a form of knowledge, and it's a deeper one than whatever it is that our fancy rationalist kind of perspective has given us. Our rationalist perspective is very, very powerful in coordinating and predicting.
Starting point is 02:14:51 But in terms of true ontology, I suspect it's actually the wrong direction. It's created a false god of linguistic knowledge, of shared objective knowledge, when the subjective is the one that we really have. It's the Cartesian, right? It's the Cogito. What we know is what we experience. That's the only thing we truly know. And language doesn't
Starting point is 02:15:27 really live there. So I was watching everything everywhere all at once. I never saw it. Because I also had another intimation. I'll spoil some of it and if you are listening and you don't want to spoil it then just skip ahead. But I was telling someone that I think if there's a point of life, it's one of two. And so this is just me speaking politically and not rigorously. One is to find a love that is so powerful it outlasts death. Okay, so that's number one.
Starting point is 02:16:01 And then number two is to get to the point in your life where you realize that all your inadequacies and all your insecurities and all your missteps and your jealousies and your malice and so on, that it, rather than it being a weakness, it's what led you to this place here and here is the optimal moment. It's to get that insight. So I don't know how to rationally justify any of that or explain it. But anyhow, when I said this one time on a podcast, someone else said, hey,
Starting point is 02:16:36 that latter one that you expressed was covered in everything everywhere all at once, so I watched it. What was great about that movie, and here's where I spoil it, is that, and it makes me want to tear. The movie's silly and comedic in a way that didn't resonate with me, but there's this one lesson that did. The woman, she's a fighter, the main protagonist.
Starting point is 02:17:00 She's a fighter and she's strong-headed, and she has this husband who is weak and she's always able to put down and so then you think okay well this is a modern trope where there's always the stronger woman and every guy is like just just a fool and the woman is always more intelligent and so on. Okay so you just think of it as as okay well it's just it's just a modern trope. Toward the end and the guy's the guy is kind and loving to people. Toward the end, she was getting audited by the IRS and something was supposed to happen that night where she had to bring receipts and she couldn't. Now the husband was talking with the IRS lady and our protagonist, the woman, was saying
Starting point is 02:17:44 in Vietnamese or in Mandarin, whichever language it was, was saying, oh, he's an idiot. I hope he doesn't make it worse. The husband, the IRS lady then comes to the woman and says, you have another week, you have an extension. She's like, how did this happen? She talks to the husband. And remember, this is a movie almost about a multiverse. So you're getting different versions of this. And there's this one version where the husband's speaking to her and telling her, you know Evelyn, the main character, you know Evelyn, you see what you do as fighting. You see yourself as strong and you see me as weak.
Starting point is 02:18:17 And you see the world as a cruel place. But I've lived on this earth just as long as you. And I know it's cruel. My method of being kind and loving and turning the cheek, that's my way of fighting. I fight just like you." And then you see that what he did in another universe was he just spoke kindly to the IRS agent and talked about something personal and that softened her. And then you see all the other universes where she was trying to go on
Starting point is 02:18:51 this grand adventure and do some fighting. And the husband then says, Evelyn, even though you've broken my heart once again in another universe, I would have really loved to just do the laundry and taxes with you. And it makes you realize you're aiming for something grand and you're aiming to go out and conquer demons and so on. But there's something that's so much more intimate about these everyday scenarios. There's something so rich. The journey, there's also a quote about this that the journey, I think it's T.S. Eliot's, is to find, sorry, at the end of all our exploring will be to arrive where
Starting point is 02:19:39 we started and know the place for the first time. Anyhow, all of this abstract talk. No, no, no, no. It's exactly what we're talking about. Because if you see yourself as a ripple in the universes, then you are part of something cosmic and grand. And it's sort of that extensiveness, it's that extensiveness that's being here now. It's that we aren't just atoms. We're part of a larger thing. You can call it God, you can call it the universe or whatever. But it's there. It's actually something I think we, I don't think we really, I think animals don't think of themselves as, as discrete. I don't think they do. I think, I think that, they don't think of an outside and inside. They don't think of them objective and subjective.
Starting point is 02:20:39 It's just this unfolding. Uh-huh. Do they have theory of mind or that? But these are linguistic concepts. And I think, I do, and I sound like an anti-linguist, and I recognize the power of it. I said before, you know, how extraordinary it is, how rich it is, and I have tremendous respect for it.
Starting point is 02:21:00 But at the same time, I do think that all this talk about objective things, particles, and we are physical bodies, and we are just this, and we are just that, that is bullshit. Like, no, we are the universe resonating. We are part of the whole. And the way that, I think, thinking objectively as language requires you to do actually it breaks it. So I think there's such a beauty in the silence and it's why it's something everybody knows that the ineffable, why is it called the ineffable? Why is the ineffable that the ineffable isn't just that you can't say it, it's magnificent. The ineffable is extraordinary.
Starting point is 02:21:45 Hmm. Why? Because it's this true extension. Something like that. Again, I'm trying to put it into words. Right therein lies the trap. But we're both feeling it. Well, I'm feeling extremely grateful to have met you, to have spent so long with you. And there are many conversations you and I have had that we need to finish that are off air as well. So hopefully we can do that. And thank you for spending so long with me here. This was wonderful, Kurt.
Starting point is 02:22:22 Thank you so much. I just want to hang out for with someone and talk about this stuff. So really appreciate it. I've received several messages, emails, and comments from professors saying that they recommend theories of everything to their students, and that's fantastic. If you're a professor or lecturer and there's a particular standout episode that your students can benefit from, please do share. And as always, feel free to contact me.
Starting point is 02:22:47 New update! Started a sub stack. Writings on there are currently about language and ill-defined concepts as well as some other mathematical details. Much more being written there. This is content that isn't anywhere else. It's not on theories of everything. It's not on Patreon.
Starting point is 02:23:02 Also full transcripts will be placed there at some point in the future. Several people ask me, hey Kurt, you've spoken to so many people in the fields of theoretical physics, philosophy, and consciousness. What are your thoughts? While I remain impartial in interviews, this substack is a way to peer into my present deliberations on these topics. Also, thank you to our partner, The Economist. YouTube, push this content to more people like yourself, plus it helps out Kurt directly, aka me. I also found out last year that external links count plenty toward the algorithm, which means that whenever you share on Twitter, say on Facebook or even on Reddit, etc., it shows
Starting point is 02:23:56 YouTube, hey, people are talking about this content outside of YouTube, which in turn greatly aids the distribution on YouTube. Thirdly, you should know this podcast is on iTunes, it's on Spotify, it's on all of the audio platforms. All you have to do is type in theories of everything and you'll find it. Personally, I gained from rewatching lectures and podcasts. I also read in the comments that hey, toll listeners also gain from replaying. So how about instead you re-listen on those platforms like iTunes, Spotify, Google
Starting point is 02:24:25 Podcasts, whichever podcast catcher you use. And finally, if you'd like to support more conversations like this, more content like this, then do consider visiting patreon.com slash Kurtjmongle and donating with whatever you like. There's also PayPal, there's also crypto, there's also just joining on YouTube. Again in mind it's support from the sponsors and you that allow me to work on toe full-time you also get early access to ad-free episodes whether it's audio or video it's audio in the case of patreon video in the case of YouTube for instance this episode that you're listening to right now was released a few days earlier every dollar helps far more than you think either way way, your viewership is generosity enough.
Starting point is 02:25:05 Thank you so much.
