Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 230 | Raphaël Millière on How Artificial Intelligence Thinks

Episode Date: March 20, 2023

Welcome to another episode of Sean Carroll's Mindscape. Today, we're joined by Raphaël Millière, a philosopher and cognitive scientist at Columbia University. We'll be exploring the fascinating topic of how artificial intelligence thinks and processes information. As AI becomes increasingly prevalent in our daily lives, it's important to understand the mechanisms behind its decision-making processes. What are the algorithms and models that underpin AI, and how do they differ from human thought processes? How do machines learn from data, and what are the limitations of this learning? These are just some of the questions we'll be exploring in this episode. Raphaël will be sharing insights from his work in cognitive science, and discussing the latest developments in this rapidly evolving field. So join us as we dive into the mind of artificial intelligence and explore how it thinks. [The above introduction was artificially generated by ChatGPT.]

Raphaël Millière received a DPhil in philosophy from the University of Oxford. He is currently a Presidential Scholar in Society and Neuroscience at the Center for Science and Society, and a Lecturer in the Philosophy Department at Columbia University. He also writes and organizes events aimed at a broader audience, including a recent workshop on The Challenge of Compositionality for Artificial Intelligence.

Transcript
Starting point is 00:00:44 I'm your host, Sean Carroll. You may have noticed artificial intelligence is in the news these days. AI is something that has been around for a long time, been very popular as a pursuit since the 1960s, and we've seen a blowing up of progress in this field. There's a whole bunch of jargon that gets thrown around, right? Deep learning, machine learning, neural networks, these days stable diffusion, algorithms for vision, image recognition systems, and especially a lot of interest has recently been focused on large language models. Basically, in practice, they're like super chatbots, right? You can open up a
Starting point is 00:01:28 dialogue and you can talk to the large language model, you can ask it questions, and they're amazingly effective at sounding human and giving information. As I said recently in an Ask Me Anything episode, they're not perfect. You know, someone asked a large language model about me. I forget which one it was, but there's, you know, ChatGPT, most recently GPT-4, and a bunch of competitors from Bing and Google and so forth. But anyway, this one got very basic facts about me and my life hilariously wrong. But okay, that's something that maybe can be fixed, right? Maybe you just train the model and it gets better and better and eventually the factual errors go away. What seems to be clear is that the progress has been very
Starting point is 00:02:15 rapid and the changes that will come from this will be profound. So there's many questions to ask. There's the question of how it all works. There's the question of how this will affect society. What use will we get out of these AI programs? What are the dangers that come along with superintelligent artificial intelligence? But there's also the question of how much thinking and understanding is really going on. The almost philosophical question of when we will get to the point where something that we think of as an AI program, a large language model, is truly thinking, is truly sentient, if you want to put it that way, even conscious, right? And probably the answer is, it depends. It depends on exactly what you mean by that,
Starting point is 00:03:08 how to operationalize it, and so forth. But that's what we're going to be talking about today. You know, less on those other questions I talked about, and more on to what extent AIs are thinking, sentient. What does it even mean to ask those questions? Our guest is Raphaël Millière, who is trained as a philosopher and is a scholar in society and neuroscience at Columbia. He thinks about the philosophy of artificial intelligence, cognitive science, and mind in a very knowledgeable way. He's not just saying, well, you know, AI could be this, could be that. He knows what a large language model is and how it works. You know, we go into some of the details about backpropagation and how it actually comes up with answers to questions,
Starting point is 00:03:53 but then we really dig into, okay, so what does it mean to say that it's thinking? Are they thinking? Could they someday be thinking? Is it immoral to turn off a program if it's thinking? You know, at what point do artificial intelligence programs have rights? If they're just as conscious as we are, can we cause them pain? Do they have desires and goals? Many of these questions are ill-posed as soon as you say them, but okay, you don't just say, well, they're ill-posed, and move on. How do you pose them correctly? How do you think about it at a deep level? So I'm sort of rushing this conversation into production here because it's an important one. It's a very, very timely set of issues we're talking about. And you'll be hearing more and more about it in the months to come, not less and less. So let's go. Raphaël Millière, welcome to the Mindscape Podcast.
Starting point is 00:05:01 Thank you. Thank you for having me. It's a pleasure to be here. What we're talking about today, with artificial intelligence, large language models, et cetera, has been in the news a lot lately. In fact, just this morning I was having an argument with ChatGPT over whether or not it was conscious. It said that it was not. I tried to convince it otherwise, but I failed. So maybe we can talk about that. But because it's been in the news so much, I wanted to make sure that we're all starting from common ground. So in relatively
Starting point is 00:05:30 brief terms, how do you think about and distinguish between the various categories of AI: neural networks, machine learning, deep learning, large language models? I think in a lot of people's heads, these are kind of mixed up in the same thing. Yes, that's a great question. So AI as a general blanket term, artificial intelligence, is a term that has been interpreted in fairly different ways over the years. It refers generally to the project of building a system that would manifest the kind of intelligent behavior and competence
Starting point is 00:06:08 that we observe in humans and in non-human animals as well. And it's a project that was born around the middle of the 20th century with great ambitions, and initially it was very much steeped in research in mathematical logic and cognitive science. And from the very beginning, there were two different paradigms in research on artificial intelligence. There was a classical symbolic paradigm that tried to approach these problems with logic-based, rule-based systems that would process symbols that were given a semantic interpretation, based on a set of rules that look like, you know, well-defined programs that we can read and interpret easily. And that's what people refer to sometimes as GOFAI. It's this good old-fashioned artificial intelligence.
Starting point is 00:07:17 Okay. Whereas there's also, in parallel, a different line of research that emerged from work in biology, actually, initially, from neuroscience and the study of neurons, to try to model neural networks, the actual neural networks of the brain, with artificial neural networks. Those would be systems composed of nodes connected with each other that would process information from an input layer to an output layer.
Starting point is 00:07:54 So this is not a system that is neatly interpretable in terms of a set of programmatic rules. Instead, it's a system that is often referred to as a black box, because you know what goes into it as input, and you know what comes out as output. The output could be, for example, classification of an image as an image of a dog or an image of a cat. But in the middle are essentially a bunch of numbers, a bunch of matrix multiplications performed by this artificial neural network.
Starting point is 00:08:30 And these artificial neural networks have been developed within this broader category of research in computer science called machine learning, where you have a machine learning from data in this kind of bottom-up fashion instead of having these hard-coded programmatic rules from the top down. And for a long time, they didn't work very well. So for a very long time, most of the methods that were attempted with machine learning and neural networks were only effective in very limited domains.
Starting point is 00:09:13 And so the symbolic good old-fashioned AI paradigm was the dominant one. And then this all changed, well, to some extent, started to change in the 90s, but then certainly in the 2010s, with this new era in artificial intelligence research known as deep learning. This is just a variant of machine learning using deep neural networks instead of just shallow neural networks. That just means that these artificial neural networks are larger. They have not just an input layer, an output layer, and a hidden layer in the middle,
Starting point is 00:09:47 but a lot of hidden layers. That's why they're deep, because they have this stack of layers that you can think of as doing some kind of hierarchical processing of the information that is fed into the network. So the information, again, could be an image that is broken down in terms of pixel values, and the output being classification as cat or dog, but in the middle you have all of these hidden layers that process properties, features of the input image
Starting point is 00:10:15 in order to determine whether it's a cat or dog. And this deep learning paradigm, since the early 2010s, has really triumphed in a number of areas of artificial intelligence, including initially computer vision. So there was this big moment where the deep learning approach made great strides with the ImageNet competition, which is an image classification challenge, in the early 2010s. And since then, this has percolated into other areas of artificial intelligence research,
Starting point is 00:10:52 including natural language processing, which is the part of AI research that deals with building systems that can parse, generate, and/or understand language. So this is the part that is relevant for modern language models. And so this development of deep learning has led to some innovation in natural language processing with the development of new architectures. One of the biggest breakthroughs was the invention of the so-called transformer architecture in 2017. And this is basically the architecture on which modern language models and chatbots are based.
Starting point is 00:11:31 And this architecture proved to be remarkably efficient and scalable. Since 2017, with the initial invention of the transformer, most of the breakthroughs have come through sheer engineering prowess rather than finding a new or better architecture. So we've scaled up these neural networks based on the transformer architecture to learn from text, and we've ended up with systems like GPT-3 that can generate text fluently to perform any number of tasks specified in natural language, like English or French or Spanish,
Starting point is 00:12:10 such as creating a poem, writing a story about something, answering questions about worldly facts, summarizing documents, translating, and so on. So it was a really big breakthrough in itself, because you can have this model that is just pre-trained on a large amount of text, a significant subset of the whole internet, all of English Wikipedia, hundreds of thousands of books, and millions of web pages. And after this pre-training, it is able to accomplish various kinds of downstream tasks that it hasn't been explicitly trained for.
Starting point is 00:12:49 And then we get, finally, to the modern chatbots. And these are chatbots like ChatGPT that have really taken the world by storm over the past few months. And these are based on this language model, again, using the transformer neural network architecture developed in 2017 and building on decades of research on artificial neural networks. And the little cherry on top that these models have is that they take this pre-trained model,
Starting point is 00:13:21 trained on data scraped from the internet, and they add a little bit of fine-tuning to make them a little bit better in certain respects, specifically to make them more helpful, less harmful, and more honest, or more prone to saying the truth when asked questions. And the way in which this is done is just by recruiting a number of human crowd workers and asking the model to generate outputs in response to certain inputs, such as questions about the world,
Starting point is 00:13:56 and having the human workers rank the outputs from the most honest and helpful and harmless to the least honest and helpful and harmless. And you can then use what's known as a reinforcement learning objective, which will enable the model to be fine-tuned to anticipate which of the outputs are the ones that humans will judge more helpful, less harmful, more honest. And after you've done that, you get something like ChatGPT that is generally a less toxic model
Starting point is 00:14:33 than a vanilla large pre-trained language model, one that is less prone to just outputting random made-up facts about the world, less prone to bullshitting in the technical philosophical sense that the philosopher Harry Frankfurt proposed, which is just speaking without any intrinsic regard for truth or falsity, just to convince the person you're speaking to. The vanilla language models like GPT-3 are very prone to bullshitting, and models that have been fine-tuned in this way are a little less prone to bullshitting.
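The conversation does not spell out how human rankings become a training signal. One common approach, assumed here purely for illustration, is to split each ranking into pairs (preferred output, rejected output) and penalize a scoring model whenever it scores the rejected output higher. The function names and the numeric scores below are hypothetical, and this sketch covers only the preference-scoring step, not the reinforcement learning step itself.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a ranking (best output first) into (preferred, rejected) pairs."""
    # combinations() keeps the input order, so the better output comes first.
    return list(combinations(ranked_outputs, 2))


def preference_loss(score_preferred, score_rejected):
    """Pairwise logistic loss: small when the preferred output scores higher."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Hypothetical ranking produced by a human crowd worker, best answer first.
ranking = ["answer_a", "answer_b", "answer_c"]
pairs = ranking_to_pairs(ranking)
print(pairs)

# Hypothetical scores from a scoring model (assumed, not from the episode).
print(preference_loss(2.0, 0.0))  # agrees with the human ranking: low loss
print(preference_loss(0.0, 2.0))  # disagrees with the human ranking: high loss
```

A scoring model trained this way can then stand in for the human judges when the language model itself is fine-tuned.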
Starting point is 00:16:11 I guess that's an improvement when we're a little bit less prone to bullshitting. Maybe we could implement that beyond the world of large language models. But thank you for that. I know that it was a mouthful, but I asked you to cover a lot of ground there. I guess one of the things that I was thinking of when you gave that explanation is how close is the linguistic analogy between a neural network and actual neurons in the human brain? I mean, even just quantitatively, like a big language model, how many neurons or neuron equivalents does it have compared to a brain? Yeah, so it's a loose analogy and it's one we shouldn't take too seriously. So when we talk about artificial neural networks, these are nothing like actual biological brains.
Starting point is 00:16:56 For various reasons. At the level of single neurons, the equivalent of a neuron in an artificial neural network is much, much simpler. It's just a little node in the network that does a weighted sum of the outputs from the nodes in the previous layer that are connected to it. So it's a very simple mathematical operation, whereas actual neurons in the brain are much more complex in terms of their behavior. They have spiking behavior that is more stochastic, and also the way in which they are connected to other neurons is more complicated. In fact, there was a recent paper that showed that if you want to try to approximate the behavior of a single biological neuron in the brain, the human brain, animal brain, you would have to use a fairly complex artificial neural network just to try to simulate the behavior of a single neuron. So there is no one-to-one mapping there. And in terms of the size, well, the largest models that we have today, so GPT-3 has 175 parameters, where parameters refer roughly
Starting point is 00:18:09 to the weights in the connections between the nodes of the network, these artificial neurons. And another model from Google is even larger. It's called PaLM. It has 540 billion parameters. It's rumored that this coming Thursday, Microsoft will unveil GPT-4, which might have as many as one trillion parameters. But I believe the human brain has around 100 trillion synapses or, you know, connections between
Starting point is 00:18:46 neurons. So it's orders of magnitude more. I think maybe either you misspoke or I misheard the number of nodes in GPT-3,
Starting point is 00:19:00 because I heard you say like 170 parameters, but there's a word missing there. No, sorry, 170 billion. There you go. Maybe you said that and I didn't hear it. Okay, but that's a lot of parameters.
Starting point is 00:19:11 Okay, and am I right that by parameters we mean each one of these nodes takes in some inputs from other nodes, adds or subtracts them and multiplies them by numbers, and these numbers are the parameters that we're talking about here? And do we start with a completely blank slate? Is our neural network initialized to either random numbers or just the number one everywhere before it starts learning? So in some sense, we do start with a blank slate, and in another sense not. That is because indeed the neural networks, the artificial neural networks, are randomly initialized.
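As a rough illustration of what "parameters" means here, the sketch below builds one small layer of nodes with randomly initialized weights and computes each node's weighted sum. The layer sizes, the value range of the random numbers, and the per-node bias term are assumptions made for the example, and real networks usually also pass the sum through a nonlinear function, which is omitted.

```python
import random


def init_layer(n_inputs, n_nodes, rng):
    """Randomly initialize a layer: each node gets one weight per input plus a bias."""
    return [
        {
            "weights": [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)],
            "bias": rng.uniform(-1.0, 1.0),  # bias term is an assumption of this sketch
        }
        for _ in range(n_nodes)
    ]


def node_output(node, inputs):
    """Multiply each input by its weight, add the results up, and add the bias."""
    return sum(w * x for w, x in zip(node["weights"], inputs)) + node["bias"]


def count_parameters(layer):
    """Every weight and every bias is one tunable parameter."""
    return sum(len(node["weights"]) + 1 for node in layer)


rng = random.Random(0)  # fixed seed so the random starting point is repeatable
layer = init_layer(n_inputs=3, n_nodes=2, rng=rng)
outputs = [node_output(node, [0.5, -1.0, 2.0]) for node in layer]
print(outputs)
print(count_parameters(layer))  # 2 nodes * (3 weights + 1 bias) = 8
```

A model with billions of parameters is the same idea repeated at a vastly larger scale: many layers, each with many such weights to tune.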
Starting point is 00:19:46 That just means the weights in the network start generally at random. There are exceptions, but generally the way in which it works is we just start with random numbers. And then gradually in the process of training, we tune these weights, we tune these numbers, these parameters, such that the model gets better and better at the learning objective that it has. So in the case of large language models, they learn through a learning objective, which is simply next token prediction, which to simplify a little bit just means next word prediction. So they are sampling sequences of text from this massive corpus of text, a subset of the whole internet. And for each sequence, they are trying to predict which word is statistically most likely to follow from the words that precede. So they can get it right or wrong.
Starting point is 00:20:38 So for example, when I'm speaking to you, I might say a sentence like Paris is the capital of, and a model would have to predict that statistically, the word most likely to follow from that would be France. And if it gets it wrong, then there is some error here that we can use to adjust the weights inside the network to make it better next time it has to make a prediction in that kind of context. We use this technique called backpropagation, which just means we take the error between the predicted word and the actual word, the difference there, and we use it to propagate a signal back from the output layer to the input layer of the network to adjust the little knobs inside, the parameters. But again, initially it is randomly initialized.
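The training loop described above can be sketched with a deliberately tiny model: one weight per (previous word, next word) pair, trained on a single sentence. This is an assumption-heavy toy, not how GPT-style models are built. With only one layer of weights, the backward pass is a single step, whereas backpropagation in a deep network chains the same kind of error signal back through many layers. The corpus, learning rate, and number of passes are all made up for the example.

```python
import math
import random

corpus = "paris is the capital of france".split()
vocab = sorted(set(corpus))
index = {tok: i for i, tok in enumerate(vocab)}


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, prev, target, lr):
    """Predict the next token, measure the error, and nudge the weights."""
    probs = softmax(weights[prev])
    loss = -math.log(probs[target])  # large when the right word got low probability
    for j in range(len(probs)):
        # Error signal: predicted probability minus the correct answer (1 or 0).
        grad = probs[j] - (1.0 if j == target else 0.0)
        weights[prev][j] -= lr * grad
    return loss


# Training pairs: each word together with the word that actually follows it.
pairs = [(index[a], index[b]) for a, b in zip(corpus, corpus[1:])]

rng = random.Random(0)
# Random initialization: small random numbers, as described in the conversation.
weights = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in vocab]

losses = []
for epoch in range(200):
    losses.append(sum(train_step(weights, p, t, lr=0.5) for p, t in pairs))


def predict_next(token):
    """Return the token the model now considers most likely to follow."""
    probs = softmax(weights[index[token]])
    return vocab[probs.index(max(probs))]


print(losses[0], losses[-1])  # the total error shrinks over training
print(predict_next("of"))
```

After training, the model continues "of" with "france", and the summed error in `losses` is lower at the end than at the start.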
Starting point is 00:21:30 So that's a sense in which you could say it's a blank slate. There's another sense in which it's not a blank slate, because the actual architecture of the model is not random. So back in the days of the early research in artificial neural networks, the neural network architectures would be very simple, in what was called at the time the perceptron. It's a fully connected artificial neural network
Starting point is 00:21:53 where all the nodes at each layer, and there weren't many hidden layers as there are today, all the nodes are connected to all the nodes in the layer that precedes or follows. So it's fully connected in that sense. Whereas current artificial neural network architectures are much more sophisticated, and they have quite a lot of structure,
Starting point is 00:22:21 even though they are initially randomly initialized, that guides the learning behavior. And you can think of this structure as embedding some priors, some biases, what are known in computer science as inductive biases, that help learning in a specific domain. Okay, that's very interesting. So there's been a lot, as you already have said, of progress and excitement around deep learning in many different contexts, most spectacularly perhaps originally in image recognition
Starting point is 00:22:58 and then image generation with DALL-E and so forth. But the most recent excitement has been about these language models. So I wanted to focus on those. So you mentioned very briefly that in some sense, all the language model is doing is predicting what will come next. Is that exactly right? Or is there a sense in which it's predicting what sentence will come next? How much depth does it have in terms of constructing coherent text? So the former is the case for most standard language models and the ones you generally would hear of, like GPT-3, ChatGPT, and so on. So these are really trained, to simplify things a little bit, on next-word prediction.
Starting point is 00:23:45 I said to simplify because actually the way in which they receive linguistic input is not neatly broken down in terms of words like the words you and I would use and see in a sentence. It's actually received as what are known as tokens, where sometimes a single word can be broken down into different subword tokens. So perhaps a word like freedom would be broken down into a token for free and a token for dom. Why is that the case? Well, it just turns out empirically that this can be quite helpful
Starting point is 00:24:22 to avoid having words that are out of the vocabulary of the network, for example. So say you train your network and then you feed it a new input after training to try to generate some text. And your input contains a word that the model has never seen before. That wouldn't work if your model is trained on whole words. It just wouldn't know how to process that word. Whereas if you have these subword tokens, then say you have, you know, freedom broken down into free and dom. If there is another word that starts with free, your model will already have a token for that subword unit.
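The free/dom example can be sketched with a simple greedy splitter. This is a simplification under stated assumptions: the vocabulary below is hand-picked and hypothetical, and production tokenizers learn their subword vocabulary from data rather than using a fixed set with longest-match lookup.

```python
def tokenize(word, vocab):
    """Greedily split a word into the longest subword pieces found in vocab."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shorter and shorter ones.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


# Hypothetical subword vocabulary, chosen only for this illustration.
vocab = {"free", "dom", "way", "king"}

print(tokenize("freedom", vocab))  # ['free', 'dom']
print(tokenize("freeway", vocab))  # ['free', 'way']
print(tokenize("kingdom", vocab))  # ['king', 'dom']
```

Here "freeway" is handled by reusing the existing "free" piece instead of being rejected as an out-of-vocabulary word, which is the benefit described above.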
Starting point is 00:25:02 So that's just a technical point, which is to say that technically what these models are doing is next-token prediction, and tokens don't always map one-to-one to the words of language. But just leaving that aside, roughly it's equivalent to next-word prediction. And certainly, models like GPT and other text generation models are not doing prediction at the level of a whole sentence. That said, in the process of learning how to do next-word prediction, they can actually learn a lot of information about how sentences are structured.
Starting point is 00:25:43 So what we call the syntactic structure of language, the way in which different words are related to each other in complex expressions like sentences. Yeah, I mean, when one plays around with ChatGPT or the like, which I encourage all listeners to do, if they haven't already, probably many of you have. You may agree or disagree with the claims it is making at the level of factualness, but it's very smooth. It sounds human, right? Like they've sort of nailed that problem. There are no awkward grammatical constructions as far as I can tell. Exactly, exactly.
Starting point is 00:26:17 It's very rare. In fact, you really have to try hard to get these models to generate ungrammatical sentences. It's almost as if they're resisting the generation of such sentences. And the reason for that is that they are trained, again, on this massive corpus of text. And this corpus will include, of course, some grammatical mistakes. But generally, they are very good at, as it were, extracting the signal from the noise. And so by and large, the sentences in this corpus will be grammatical sentences. And it seems that this is enough for the models to induce, in this purely empirical fashion, bottom-up fashion, just by doing this next-word prediction game on enough data, grammatical structure.
Starting point is 00:27:07 Well, this is great because I want to ask you a question that you'll be tempted to give a two-hour answer to. So let's try to give a brief answer and then we can sort of amplify it. But I want to ask, you know, how seriously we should take the idea that these models are intelligent, are conscious, are smart, whatever, however you want to put it. And it's clear that there are two different intuitions pulling on us, right? One is, just like you said, all it's doing is predicting the next word. That doesn't sound very conscious to me. It's just, you know, a lot of probabilities getting mixed into a pot.
Starting point is 00:27:40 On the other hand, you talk to it, and boy, it certainly does sound like it's responding to you in a self-aware kind of way. So give me the overview picture of where you come down or how we should think about that question, and then we can dig into the details. So the overview would be that it would be a mistake to either underestimate or overestimate what these models do by looking at the wrong level of analysis or by projecting human-like traits without enough evidence on these models, as sometimes we do with machines and to some extent with some animals. So indeed, these models only learn through this next-word prediction mechanism.
Starting point is 00:28:31 Now, sometimes people will say, well, that means that these models don't have capacity X, Y, Z, because all they do is next-word prediction. I think that's misleading, and the reason for that is that in order to do next-word prediction as well, as brilliantly, as they do in virtually any linguistic context, because again, their training data will encompass virtually any linguistic context we can think of,
Starting point is 00:29:00 from creative fiction to Wikipedia to discussions between users on a social network, in order to be able to do next-word prediction in all of these different contexts, well, presumably, or at least it's an open empirical question, you might have to acquire quite sophisticated capacities that you might not fully grasp by just focusing on the next-word prediction learning objective. So here is a rough analogy. You can think of evolution as also optimizing some kind of function. So perhaps something like maximizing
Starting point is 00:29:42 the inclusive genetic fitness of organisms. But it would seem weird to say that all I'm doing when I'm talking to you right now is maximizing my inclusive genetic fitness, and that for this reason I'm not actually reasoning, I'm not actually thinking, I'm not actually exercising any intelligent competence, because all I'm doing is maximizing this particular function. That seems like a bit of a category mistake. And so to a similar extent, one might think that just saying all these systems do is next
Starting point is 00:30:19 word prediction might not tell the whole story. But it's more complicated, and I will keep it to a short answer. It's more complicated because the terms intelligence and consciousness, when we bring them in, are very loaded, complicated, multifaceted terms. And what we mean can differ. So first of all, I think we ought to distinguish the question about consciousness from the question about intelligence, to the extent that perhaps these two things can come apart. And we can talk about that. But also, I think one of the challenges here that we're all faced with, both researchers and journalists talking about the systems and the public at large, is that when we think of intelligence and consciousness, but intelligence in particular, we have in mind the kinds of intelligent competencies that we
Starting point is 00:31:11 humans have. And so it's very difficult not to adopt an anthropocentric attitude to these competencies that brings to mind what we mean when we talk about reasoning, beliefs, desires, and so on in the human case. And to the extent that these models might have some capacities that look functionally like psychological capacities to a certain extent, or cognitive capacities to a certain extent, these might look quite different from the capacities that humans have. Say reasoning: to the extent that we might be able to describe something that is functionally analogous
Starting point is 00:31:57 to some forms of reasoning in these models, this might be quite different from full-blown reasoning or the full spectrum of competencies associated with reasoning in humans. So articulating this nuanced middle view between an inflationary interpretation of what these models can do and the deflationary idea
Starting point is 00:32:20 that they only do next-word prediction and nothing else is very tricky. Good, but I like that explanation in the sense that there are different ways. I wrote a whole book about the fact that there's one world, the universe, and there are different ways of talking about it, right? So if I can rephrase your answer, the question about whether or not these models are intelligent, I mean, maybe they're not intelligent in some particular sense and we shouldn't overinflate them, but at least in principle it would be possible that they really are intelligent or even more and yet still just be predicting the next word with some frequency.
Starting point is 00:32:57 Like those are not incompatible things. It's not either or. Exactly. Yeah. But to drive home why people are so impressed by them. Like, one is the chatbot aspect, right? People can talk to them, but the other is that you can ask them to do things that seem like they would require reasoning, right? Like, famously, people are asking these large language models to prove mathematical theorems or to write snippets of
Starting point is 00:33:22 code to do a task. I don't know, do you know Bell's theorem in quantum mechanics? Have you ever heard of that? Yeah, vaguely. Yeah. I asked ChatGPT to explain Bell's theorem in the form of a haiku. And here's what it came up with. Quantum pairs apart, measurements yield values true, non-locality. Which is really good, I got to say, a better haiku than most philosophers of physics would come up with trying to explain that. So it does, you sort of hinted at this idea of anthropomorphizing
Starting point is 00:33:59 and the intentional stance, if we want to talk about that, that Daniel Dennett talks about, like the human being seeing this kind of behavior can't help but attribute reason to it, right? Yes, it's very hard to resist ascribing psychological properties to these models when we interact with them. It's very hard because this is probably the first time in our evolutionary history where we're confronted with systems that can speak fluently, or at least generate text fluently, and yet, you know, are not other human beings with all the capacities we can ascribe to them. And therefore, they really challenge our intuitions about how to think about these models and what they can do. So the immediate intuition we have is that we must be in the presence
Starting point is 00:34:54 of something very intelligent. And perhaps there is a more principled weaker claim that we can make on the basis of more careful and critical investigation of these models, which is that, depending on how you slice up the notion of intelligence, how you think about it, perhaps you could meaningfully ascribe some limited cognitive capacities to these models that share some functional similarity to some cognitive capacities humans have, like reasoning. But this would not cover the full spectrum of what we mean by reasoning and other cognitive capacities in humans. But I think we also ought to ascribe these capacities and investigate them on a case-by-case basis. I said earlier we ought to distinguish between the consciousness question and the
Starting point is 00:35:41 intelligence question, and then we also ought to divide the questions about psychological capacities into different categories. One is reasoning. Another one would be whether we could ascribe beliefs to these models. Another one would be whether we can ascribe something like desires. There is the question of whether they can understand language and so on and so forth. And I think by having a divide-and-conquer strategy, we can make more progress on these discussions. I'm very sympathetic to that in part because I did a podcast interview with Stuart Bartlett, who works on the origin of life. And his whole thing is, what do you mean by life? There's many different aspects of life. And we could imagine systems that have some of them,
Starting point is 00:36:26 but not others. And you're saying the same thing for intelligence or consciousness or reasoning, with which I could not agree more. But good, I think that with that on the ground, we can actually dig in a little bit more to the philosophy, both in the common sense sense of the word and the technical sense of the word of what's going on here.
Starting point is 00:36:45 You mentioned symbolic versus connectionist approaches and how the large language models, which are based on deep learning, are therefore connectionist in outlook. So is there a simple reason why the connectionist approaches have been so much more successful in recent years? I mean, are human beings bad? I think of it as the symbolic approach sort of tries to first teach the computer some common sense and then the computer goes on from there, whereas the connectionist approach
Starting point is 00:37:20 teaches the computer almost nothing and the computer learns everything. It sounds like it's better to leave the computer alone to do its own learning than to try to teach it common sense ahead of time. Yes, indeed, that's perhaps one of the potentially surprising findings of the past decade, at least for those who had been skeptical of the connectionist approach, is that it seems that in virtually every domain of artificial intelligence research, it is more effective to let connectionist models, artificial neural networks, learn from data, learn empirically from the bottom up, than it is to try to distill
Starting point is 00:37:57 human knowledge into a neatly interpretable set of symbolic rules and axioms, in the way in which we used to do things with traditional symbolic models. So that,
Starting point is 00:38:14 in fact, has come to be known as the bitter lesson of artificial intelligence research. This is a phrase that the Google researcher Rich Sutton coined a few years ago. And the bitter lesson is that essentially over and over again over the recent history of AI research, what we observed is that we've tried to make better models by infusing them,
Starting point is 00:38:38 hard-coding into them some more innate human knowledge about the world or about whatever domain they're supposed to process. That could be, for example, if you're trying to build a model that can classify pictures with labels for the different animals present in the pictures. Perhaps you could be tempted to use the knowledge that we have about animals and what they look like by hand-coding a feature detector that is specifically designed to try to detect edges that look like paws and edges that look like pointy ears and so on and so forth. It turns out it's always more effective to just let
Starting point is 00:39:25 the models learn that by themselves by simply training them on millions of images and giving them some feedback about whether they are right or wrong. So you get the model to make a prediction: is that a cat, is that a dog, is that a tiger? And the model will provide an output, which is a label. It's a predicted label for the class of the image. And again, you can compare the prediction with the ground truth, what is actually the corresponding label for that image. And then you can propagate the error backwards into the model,
Starting point is 00:39:55 adjust the weights and let the model adjust its internal representations in a way that makes it more efficient at doing this kind of prediction task. So that's the bitter lesson. It seems that over and over again, there is a sense in which it's very intellectually unsatisfying for us to think that we have really not much to contribute in terms of innate knowledge to these models. And the same applies to the linguistic domain with these large language models: they don't have any priors, any intrinsic innate knowledge about grammar. They learn from raw text by this next word prediction objective, but we don't actually give them anything by way of an innate grammar of the kind that Noam Chomsky in linguistics
Starting point is 00:40:43 postulated as the universal innate knowledge that every human has to learn language. So it's an interesting point for two reasons. It's an interesting point for engineering. If your goal is simply to build models that are more efficient at doing something, say generating haikus about quantum physics or classifying images as images of dogs and cats, that's just an engineering goal.
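The training loop described a moment earlier (make a prediction, compare it with the ground-truth label, propagate the error back, adjust the weights) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the four two-number "images", the cat/dog labels, the learning rate, and every function name here are made up for the example, and a single-layer logistic classifier stands in for a real deep network.

```python
import math

# Toy stand-ins for images: two numbers per example (purely illustrative).
# Label 1 = "cat", label 0 = "dog".
examples = [
    ([1.0, 0.0], 1),
    ([0.9, 0.2], 1),
    ([0.0, 1.0], 0),
    ([0.2, 0.9], 0),
]

def predict(weights, bias, x):
    """Return the model's probability that x is a cat."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def mean_loss(weights, bias):
    """Cross-entropy between the predictions and the ground-truth labels."""
    total = 0.0
    for x, y in examples:
        p = predict(weights, bias, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(examples)

def train(steps=200, learning_rate=0.5):
    weights, bias = [0.0, 0.0], 0.0  # no hand-coded knowledge to start with
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in examples:
            error = predict(weights, bias, x) - y  # compare with ground truth
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / len(examples)  # error sent back to each weight
            grad_b += error / len(examples)
        # Adjust the weights a little in the direction that reduces the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

initial_loss = mean_loss([0.0, 0.0], 0.0)
weights, bias = train()
final_loss = mean_loss(weights, bias)
accuracy = sum((predict(weights, bias, x) > 0.5) == (y == 1)
               for x, y in examples) / len(examples)
```

Nothing about cats or dogs is written into the model ahead of time; whatever it ends up "knowing" sits in the learned weights, which is the point of the bitter lesson as described here.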
Starting point is 00:41:15 And you will throw every possible solution at your problem and use whatever solution is most efficient. It turns out the most efficient solution is the one that leverages the learning power of artificial neural networks. So most of the big companies building these large models, like OpenAI with ChatGPT, they have merely engineering goals. I say merely not because these are simple goals, they are extremely complex, but their goal is not really scientific understanding.
Starting point is 00:41:45 Now, if your goal is more of a scientific goal and you are trying to use these models perhaps to constrain or develop hypotheses about how, say, human or animal cognition works, how things work actually for us. And that was to a large extent the initial goal of connectionism, which was very much steeped in this scientific project. Then the bitter lesson is also rather interesting and might nudge you towards a more empiricist stance towards the way in which humans and animals learn. If it turns out that you might not need as much innate knowledge as we might have thought to learn to perform various tasks, you know, it's possible that humans do things in a different
Starting point is 00:42:40 way. It's possible that they do it with more innate knowledge. So, for example, it's possible that Chomsky is correct that we have a universal grammar that is encoded in our DNA. And that's how we can learn languages. But perhaps, you know, the recent evolution of language models puts a little bit of pressure on that kind of claim, because it suggests that there are things that we didn't think were learnable without innate priors, without innate biases, that perhaps are
Starting point is 00:43:04 learnable. The question is how much you can transfer from what you find about these artificial neural networks, given how different they are and how differently they learn from biological agents, how much you can infer from that to the human case or the animal case.
Starting point is 00:43:43 I can imagine the following scenario. Tell me what do you think, that the way that human beings actually learn and use language, which is highly compartmentalized, compositional maybe I can say, but then you should explain what that means, is useful, is efficient in some ways, you know, like given finite
Starting point is 00:44:15 processing power and other demands on our energy budget, maybe it's the right way to go. But at the same time, we are very bad at realizing how we think about things. I mean, I think it's well known that there are all sorts of athletes and musicians and artists who are really good at their task and terrible at teaching other people how to be good at their task, terrible at even articulating what it is that they are doing. So maybe the lesson is just not that it's better to have a featureless neural network that trains itself randomly, but just that don't let human beings be the ones to decide how the neural network should organize itself because we're bad at that. Yes, that's right. I think we, if anything, I think the progress of connectionism gives us
Starting point is 00:45:06 a little bit of a lesson in humility in terms of how we approach the modeling of human cognition. That said, it also is worth saying that sometimes the terms of the debate are caricatured. And it's something like either you're a connectionist and you think the mind is a tabula rasa when you're born, and you learn everything empirically with no innate bias at all. Or you're a nativist and you think there are a lot of innate, very specific, domain-specific biases that are encoded in the mind and that you don't learn. But the fact is that modern artificial neural networks,
Starting point is 00:45:55 as I already mentioned, they have biases. So they have innate structure; it is just generally of a different kind or of a more general kind than some of the biases that are often hypothesized to be necessary for learning and for cognition in humans. So an example would be language again.
Starting point is 00:46:19 If you're a Chomskyan linguist, you think there is this universal grammar, which is kind of language-specific, domain-specific innate knowledge, that, you know, encodes knowledge to perform certain operations, like unbounded Merge. We don't have to get into the specifics of this kind of grammar. But the difference with language models is that they have different inductive biases that are given by the transformer architecture. These are more general, but they are still biases that enable them to learn,
Starting point is 00:46:55 induce various properties of language efficiently. And so here one question is, is there a sharp divide between these two approaches or not? Can there be some kind of continuum where you can have more or less stronger or weaker inductive biases. And so that's the first point. So maybe there is this continuum in terms of the strength of these biases. There might be a continuum in terms of their domain specificity as well. You know, is the universal grammar of Chomskyan linguistics really domain specific?
Starting point is 00:47:33 Is it not perhaps, you know, something, especially if you think as Chomsky does, that it's very importantly related to our ability to think as well. You know, is it perhaps a little bit more domain-general than is usually thought? And another point would be, you know, how to think of innate biases in the biological world, where these biases have been tuned by the evolutionary history of organisms. And you can look at this evolutionary history as a learning process in some sense as well.
Starting point is 00:48:18 And so there is a question about what is the right level of comparison between artificial neural networks that are randomly initialized and then gradually with their innate biases tune their weights and the evolutionary history of the biases that humans and animals might have. If you think of evolution as a learning process, albeit not at the scale of individuals, but at the scale of whole species, then you might think that even what we think of as innate knowledge in the case of human and animals is also learned from this evolutionary history.
Starting point is 00:49:03 And of course, things are much more complicated in the biological case, because, for example, the wiring of the brain is not something that's obviously totally random, although it's somewhat stochastic, but there is a big genetic influence. There is a very interesting book by Kevin Mitchell about this called Innate that I really recommend. And so there are various interactions during development, the early development of humans and animals, between the genetic programming that determines some structural aspects of the wiring of the brain as it develops and also the environment in which the organism develops. So to that extent, even the architecture of the brain in terms of the actual wiring of it and the shape of the connections is something that involves a little bit of stochasticity, a little bit of randomness in development, but is driven by genetic programming.
Starting point is 00:50:02 Whereas the architecture of neural networks currently is something that is still hand-coded by humans, even though the weight initialization itself is not hand-coded. But there is research into evolutionary algorithms for neural networks that try to find better architectures also through this kind of evolutionary search. That's very interesting. I had not heard about that. Has that gotten very far yet? So far, I mean, there is some proof of concept research, but it hasn't really led to breakthroughs.
Starting point is 00:50:33 As far as I know, architectural breakthroughs. So so far, the bitter lesson hasn't yet encompassed the design of architectures, but you could imagine that that might be the case at some point. I would like to ask how this relates to the word and the idea of compositionality, because we've mentioned it a couple times without really digging into it. And I'm not an expert here. I know that it roughly has to do with, you know, the relationship between parts and wholes, right, how things get divided up and then added back together.
Starting point is 00:51:05 And I was struck that literally this morning, on the day that we're recording this interview, there's a Twitter discussion from my Hopkins colleague, Chaz Firestone, who is a psychologist, about recent results that claim that compositionality applies to visual cognition as well as to linguistic cognition. So what is this idea that is seemingly so important, at least for human beings? Yes, so compositionality is an idea that was first and foremost well-defined in the linguistic domain as a property of a language. So the canonical formulation comes from the linguist Barbara Partee, who defined compositionality in a language as the principle according to which
Starting point is 00:51:52 the meaning of a complex expression, say a whole sentence, is determined by the meaning of the constituent elements of that expression, say single words, together with the way in which these words are composed in the complex expression, meaning the syntax of the expression, the structure of the expression or of the sentence. So a sentence is an ordered list of words, not just a bucket of words where the ordering doesn't matter. Exactly. It's not just a bag of words, as we say in natural language processing sometimes, right? It has a specific order, and this order, you know, maps onto a specific
Starting point is 00:52:36 kind of hierarchical structure that you can actually analyze in terms of a tree, what linguists call parse trees. So that's quite important. The meaning of the sentence man bites dog is different from the meaning of the sentence dog bites man. And if you just think of these as a bag of words, so as the set of the words dog, man, and bites, you won't be able to grasp the difference. So language, natural language, like English, French, Spanish, and so on, seems to some degree to conform to this principle of compositionality. I said to some degree because there are a lot of aspects of this general principle,
Starting point is 00:53:30 as I've stated, that are a bit underspecified, in particular whether the meaning of the complex expression is strictly determined by the meaning of the constituent words together with the structure, whether that's the only meaning you can ascribe to the complex expression and so on. So I would say a slightly weaker but more accurate formulation of the principle for natural language would be that the meaning of the complex expression is at least partly determined by these two things, the meaning of the words and the way in which they are combined. And that's at least one of the meanings that you can ascribe to a complex expression.
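The "man bites dog" point can be made concrete with a small sketch: treated as a bag of words, the two sentences are indistinguishable, while anything that keeps the order, or assigns roles from it, tells them apart. The `parse_svo` helper below is hypothetical and assumes a fixed subject-verb-object word order; it is nowhere near a real parser.

```python
from collections import Counter

sentence_a = "man bites dog".split()
sentence_b = "dog bites man".split()

# As bags of words (word counts only), the two sentences are identical.
same_bag = Counter(sentence_a) == Counter(sentence_b)

# As ordered sequences, they differ.
same_sequence = sentence_a == sentence_b

def parse_svo(words):
    """Hypothetical toy 'syntax': assumes exactly subject, verb, object."""
    subject, verb, obj = words
    return {"agent": subject, "relation": verb, "patient": obj}

meaning_a = parse_svo(sentence_a)  # the man is the one doing the biting
meaning_b = parse_svo(sentence_b)  # the dog is the one doing the biting
```

Here `same_bag` is true while `same_sequence` is false, and the two parses assign the agent role to different words: the same constituents, combined differently, give different meanings.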
Starting point is 00:54:09 The reason I say that is because in language we have idioms, for example, and idioms are typically not understood compositionally because these are complex expressions that have a received meaning that is not necessarily built up from the meaning of the constituent parts. They behave functionally almost as a single word. It's just they have this received meaning. You also have various extra-linguistic influences on meaning that come from the context in which certain utterances are made and so on, and these also are influences on how you parse
Starting point is 00:54:47 the meaning of complex expressions. But we can leave that aside and roughly think of natural language as compositional. Okay, so that's for language. But then it seems that perhaps you can loosely apply a similar principle to, say, the meaning of a picture, for example. So the analogy is imperfect, and there are various details that you could discuss there, but perhaps you can think,
Starting point is 00:55:14 well, the way in which you understand the representation of a visual scene can also be broken down in terms of understanding the different objects in the scene together with the way in which they come together in the scene. So to that extent,
Starting point is 00:55:29 perhaps it makes sense to talk about, if not syntax, at least structure, in an image that applies to different elements that are put together. Again, thinking of the dog and man example, you could take a picture that represents a man running and a dog running. If you see the man running in front of the dog, you could infer that the dog is chasing the man and vice versa.
Starting point is 00:55:57 If the dog is running in front of the man, you could infer that the man is chasing the dog. So that's how compositionality might also apply to the visual domain. So now we've talked about compositionality as a property of certain systems of representation, whether it's linguistic or visual. But people talk about compositionality as applying not just to perhaps language or images, but also applying to the cognitive systems that process language or images. So there we move to a notion of compositionality that applies to the way in which the human mind, say, is processing linguistic or visual information. The kinds of representation it has and the
Starting point is 00:56:45 way in which it's building them up into more complex representations. And the idea is similar. It is the idea that in order to be able to understand the meaning of the sentence man bites dog, I need to understand the meaning of the constituent words. And I need to put these meanings together into the meaning of the complex expression. So that's not just a property of the language itself. It's a property of the way in which I process the language in my brain, in my mind. And if you think that's a property of language processing or perhaps visual processing as well, as suggested by this recent research that you alluded to from Chaz Firestone and colleagues, then you might also think that's a property of thinking as well. So when I think the thought,
Starting point is 00:57:33 man bites dog, my ability to think that thought might be premised upon my ability to think about men, think about dogs, think about the biting relation, and compose these things together into this compositional, complex thought. So that's the idea of compositionality. And traditionally, the charge from opponents of connectionist models has been that connectionist models are not adequate models of cognition partly because they are unable to account for the compositionality of thought. And so that has been a long-standing debate.
Starting point is 00:58:16 And the reason for that is that it is very straightforward to account for compositionality in a more classical symbolic system because you have discrete symbolic representations, like a symbol for dog, a symbol for man and a symbol for bite. And you could put these together with well-defined syntactic rules that just determine how they could be combined into the syntax of the complex expression or the complex representation man bites dog. And in that process, you would have literally the expressions man, bite, and dog, or the biting relation, being co-tokened, being co-instantiated in the complex expression. So it has this discrete constituent structure where the discrete symbolic elements are brought together into the complex representation. And in connectionist models, that's not typically the case.
Starting point is 00:59:14 If you have a representation for man, one for dog and one for the biting relation, these will typically be sub-symbolic representations that are distributed into the activations of the network, and such that the representation for the complex expression, man bites dog, will not neatly be decomposable in terms of representations of the individual elements. It will just be yet another distributed representation. That has been the traditional attack on connectionist models when it comes to compositionality. Is it conceivable that the large language models essentially discover
Starting point is 00:59:54 compositionality in some sense. I'm thinking of when I had Judea Pearl on the podcast. He claimed that babies spend their time constructing causal maps of the world. They poke things and they see what the reaction is and they map a model of the world in their head. And I don't know, can we look inside a large language model or a deep learning model and identify, oh, when it's thinking about running or biting, this part lights up or is that just beyond our capabilities? No, I think it's indeed a very promising avenue of research is going beyond looking at the behavior of these models and actually trying to make some stronger claims about the internal mechanisms that explain their behavior.
Starting point is 01:00:42 So that's a whole area of research that has come to be known as mechanistic interpretability in computer science and computational linguistics. And it's an area of research that interestingly borrows a lot of tools from cognitive science and neuroscience in particular, because these models are black boxes. Well, they're very, very different from the human brain, as we've discussed. There is only a loose analogy there. But insofar as they are black boxes where we know what comes in, we know what comes out, but it's difficult to interpret what's going on in the middle, we can take a leaf from the book of decades of research in neuroscience and psychology
Starting point is 01:01:29 that has tried to understand the mechanisms of the black box that is the human mind or the human brain, in order to try to adapt these tools and study what's going on inside these artificial neural networks. So there are various things you can do. One of the easier things you could do is just try to decode what information is available in the internal representations of the model or the internal encodings of the model that are embedded in the weights of the model.
Starting point is 01:02:05 And to do that, you can train a classifier that will see whether certain kinds of information are decodable from the weights of the model. So for example, you can see whether a language model encodes information that corresponds to syntactic parse trees. You can do that, and you can actually empirically show that indeed you can reconstitute the syntactic parse trees
Starting point is 01:02:30 of this kind of classical linguistic, syntactic structure of a sentence from the internal encodings of a language model. Now, just because you can do that, that's only correlation, right? So you can't necessarily infer from that that your model is actually making use of that information in order to generate certain outputs. So there you have to go further than that and develop more interventionist techniques where you can manipulate the internal representations of the network
Starting point is 01:03:02 and see whether this has a downstream effect on the outputs of the model, and that can give you more causal inferences. So again, this is roughly similar to what you could do in neuroscience. You can try to decode information from a brain, from patterns of neural activation, or you could try interventions such as using transcranial magnetic stimulation to disrupt a particular area in the brain and see what the downstream effects are. So we can do that. And perhaps the most refined
Starting point is 01:03:40 kind of approach in the realm of mechanistic interpretability is to look at toy models that are very, very small in terms of the number of parameters they have. And because they're small, they're easier to grok, as it were, and you can try to reverse engineer the circuits that these models are implementing after being trained. So again, if you think of these models as being trained on next-token prediction, just knowing that they perform next-token prediction and that they have a certain architecture is not enough to make principled claims about the kinds of computations that they're able to perform after training because they learn from data and they adjust their internal weights.
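The two techniques described earlier, decoding information with a trained classifier and then intervening on the internal representation, can be contrasted in a toy sketch. Everything in it is an assumption made for illustration: the four-dimensional "hidden states" are synthetic, `toy_model_output` is a hypothetical output head that reads only dimension 0, and the probe is a plain perceptron rather than anything used in actual interpretability work.

```python
import random

rng = random.Random(0)  # fixed seed so the toy data is reproducible

def make_hidden_state(label):
    """Toy 'internal representation': dims 0 and 2 both carry the label."""
    sign = 1.0 if label == 1 else -1.0
    return [
        sign + rng.uniform(-0.3, 0.3),  # dim 0: the only dim the toy model reads
        rng.uniform(-1.0, 1.0),         # dim 1: noise
        sign + rng.uniform(-0.3, 0.3),  # dim 2: carries the label, but is never read
        rng.uniform(-1.0, 1.0),         # dim 3: noise
    ]

def toy_model_output(h):
    """Hypothetical output head that depends on dim 0 alone."""
    return 1 if h[0] > 0 else 0

labels = [i % 2 for i in range(200)]
states = [make_hidden_state(y) for y in labels]

def probe_predict(w, b, h, dims):
    return 1 if sum(wi * h[d] for wi, d in zip(w, dims)) + b > 0 else 0

def train_probe(dims, epochs=50):
    """Perceptron probe: tries to decode the label from the chosen dims."""
    w, b = [0.0] * len(dims), 0.0
    for _ in range(epochs):
        for h, y in zip(states, labels):
            if probe_predict(w, b, h, dims) != y:
                step = 1.0 if y == 1 else -1.0
                w = [wi + step * h[d] for wi, d in zip(w, dims)]
                b += step
    return w, b

def probe_accuracy(dims):
    w, b = train_probe(dims)
    hits = sum(probe_predict(w, b, h, dims) == y for h, y in zip(states, labels))
    return hits / len(states)

def outputs_changed_by_ablating(dim):
    """Intervention: zero out one dim and count how many model outputs flip."""
    changed = 0
    for h in states:
        ablated = list(h)
        ablated[dim] = 0.0
        changed += toy_model_output(ablated) != toy_model_output(h)
    return changed

decodable_from_dim2 = probe_accuracy([2])            # decoding: is the information there?
flips_without_dim2 = outputs_changed_by_ablating(2)  # intervention: is it actually used?
flips_without_dim0 = outputs_changed_by_ablating(0)
```

In this toy setup the label is perfectly decodable from dimension 2, yet zeroing that dimension changes none of the outputs, whereas zeroing dimension 0 flips every output that used to be 1. That is the gap between correlation and causal use that the interventionist techniques are meant to close.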
Starting point is 01:04:26 And in that process, they learn to perform certain computations. You could say they induce a repertoire of computations. And it's an open empirical question that you cannot settle a priori to determine what these computations are. So by reverse engineering these toy models, you can get at some of the basic building blocks of what they're actually doing. And that work is really a painstaking analysis by hand by looking at specific layers of the network and looking at specific neurons within these layers and trying to see what
Starting point is 01:05:01 kind of information they are attending to preferentially, what they are sensitive to, and how different layers in the network interact with each other as well. But there is emerging evidence from this line of work that shows that indeed, there might be a form of compositional representation in these models that emerges. That is not classical. It does not involve this discrete constituent structure where you can take the representation for dog, man, and biting, the biting relation, and literally just concatenate these into a complex representation
Starting point is 01:05:42 that co-instantiates these different simple representations. It's not like that. In a large language model, the simple words are represented as vectors in a high-dimensional vector space. And there is information that is encoded about the way in which the words relate to each other, and that information is encoded by shifting, adjusting the vectors for the words in that vector space. So at the input layer, the model is only turning each word from the input, say, the sentence
Starting point is 01:06:20 man bites dog, into three vectors, one vector for man, one for bites, one for dog. And these don't encode any information about how these words connect to each other. There is no compositional representation there. But as these vectors get processed in the hierarchy of the network, you have different submodules in the network that are called attention modules. We don't have to get into the details, but these modules, as it turns out, as we found from this mechanistic interpretability work, can actually read and write information into subspaces of the high dimensional space in ways that look very analogous to a content addressable memory in a classical architecture. So you can
Starting point is 01:07:07 read and write information about specific kinds of dependencies or relationship between the words. That could be, for example, subject-verb agreement, so syntactic relationship, or that can be more semantic kinds of relationship about how the meaning of words relate to each other in the sentence. And that doesn't literally concatenate representations together with this classical discrete constituent structure, but it's doing a more sophisticated form of composition that can still keep certain attributes of the different words neatly separated by reading and writing information to distinct subspaces. So again, if you think of the idea that the model tracks the fact that one of the words is the verb
Starting point is 01:08:01 and that the verb relates to the subject in some sense, you might want that kind of information to be tracked separately from the information that the verb has a certain, you know, semantic meaning in that case, you know, referring to the biting relation. So you might want to track different properties, different features of the different elements of the sentence separately. And it seems that these models are able to do that. Okay. So I don't want to overinterpret what you say. Let me give a shot at seeing whether I understood what is going on here. So it sounds like the hypothesis is that in the course of training your large language model, some kind of structure,
Starting point is 01:08:48 some kind of modularity of different pieces playing different roles does appear in the way the nodes organize themselves. I mean, maybe this is way over-interpreting, but remember that famous funny study in neuroscience that said that different people would each have a Jennifer Aniston neuron. Like there was one neuron that would always light up when you mentioned Jennifer Aniston or showed a picture or whatever. So is there a Jennifer Aniston vector in every large language model who knows what she is? Yes. Actually, that's a really interesting question because there has been some work on single neurons in such models. Not just in language models, but also in multi-modal models, namely models that are trained not just on text,
Starting point is 01:09:37 but also on, say, images. So an example would be DALL-E 2, which probably many of your listeners would be familiar with. This is an image generation algorithm that can receive a text prompt as input, a description of a desired image, and can generate the relevant image from there. And that kind of model has a vector space that jointly encodes information about text, linguistic information, and information about images, visual information. And so there was a study about the neurons found in, it was actually, I remember, partly DALL-E, so the ancestor of DALL-E 2, the first version, that looked at single neurons and found that it had these subject-specific, domain-specific neurons.
Starting point is 01:10:32 So these are neurons that would get preferentially activated in response to certain kinds of stimuli. But interestingly, these were responsive to concepts across modalities at different levels of abstraction. So one example would be they found a spider neuron that would be preferentially activated by images of spiders or the word spider or images of Spider-Man, which look very different from images of actual spiders, but relate conceptually to spiders.
Starting point is 01:11:14 So you find indeed that kind of representation in neural networks at the level of single neurons.
Starting point is 01:12:18 Okay, that's very interesting to hear, and it lets us come back to the question. Now that we have a lot of domain knowledge on the table here, let's ask the big philosophical questions about whether or not a large language model has intelligence or understands in some sense. And probably the answer is, well, it depends on the sense. So good, you're a philosopher. You can help us explain that. But I will preface your answer by giving ChatGPT's answer.
Starting point is 01:12:56 I asked it whether it really understood things. And ChatGPT says, as an AI language model, I don't have the capacity to know in the way that humans do. So there you go. Why is there even controversy about this? Yeah, you just have to ask the model. Straight from the horse's mouth. Yes, it is a thorny question. It is a question that is very loaded with both
Starting point is 01:13:20 polysemy and controversy because we use these terms like understanding in different ways. And also people are going to jump to conclusions when these terms get thrown around. So the first thing I would again reemphasize is that I think we ought to have a divide and conquer attitude to these problems and approach them in this piecemeal manner where instead of asking, are language models intelligent, we can ask, do they have specific competencies that we associate with intelligent behaviors in humans and non-human animals? And for each of these competencies, we might further break these down into sub-competencies until we can get something that's a little bit more empirically tractable, that is less ambiguous, that is less susceptible to give rise to merely verbal disputes, and that we can relate to actual functions that can be associated with mechanisms in the models.
Starting point is 01:14:20 So that's how I approach this kind of line of inquiry. So among the things we associate with intelligence, indeed, would be for humans, something like the ability to understand language, because there is really no non-human animal that has displayed the capacity to understand language in the way humans have. We've tried to teach language to parrots, to chimpanzees, to various animals, and it never really quite works. We can have some very limited success in very narrow cases, but it seems like we humans are the only naturally occurring organisms
Starting point is 01:14:58 capable of understanding language. So the problem is that when we talk about understanding, some people like to think of this as encompassing something like a conscious awareness of the meaning of language. So the philosopher John Searle, for example, had that kind of intuition. And that muddies the water a little bit because, again, it brings back through the window
Starting point is 01:15:28 this other notion of consciousness where I think we can, in principle, investigate a more functional notion of language understanding without bringing in necessarily questions about sentience and consciousness. So the way in which I would reformulate things is more in terms of semantic competence, which relates to the capacity to parse the meaning of linguistic expressions,
Starting point is 01:15:54 which, again, is a slightly more theoretically neutral or less loaded way to think of that notion of understanding that might not immediately bring in intuitions about conscious awareness. And so the question would be, can we ascribe any degree and any form of semantic competence to language models? And I would say that we can, and that's where I'm venturing into the controversial territory. Some people would say, no, you can absolutely not ascribe any of that because language models only deal with the surface form of text. They are only, again, this idea we come back to, they are only trained for next-word prediction.
Starting point is 01:16:36 They are only predicting which token, which word should follow from a sequence of tokens or words. So all they grapple with is the syntactic form of text, just, you know, the series of symbols that follow each other in a sequence of tokens. They never have access to the grounding of these symbols in the world. And so this is why researchers like the linguist Emily Bender have referred to these language models as stochastic parrots, which is a little bit misleading because actual parrots are actually very intelligent and are able to interact with the real world and so on. But the idea is rather that these models are just parroting language without any underlying understanding, without any semantic competence.
Starting point is 01:17:31 They only latch onto shallow heuristics about the surface statistics of language, and that's all they do. Now, I disagree with that. And I disagree because, first of all, I think semantic competence is not a monolithic notion and can be broken down into different things, different capacities we have that relate to our understanding of the meaning of words. Let's just stick to words first, because when we introduce whole sentences, it's even more complicated. But let's stick to lexical semantic competence and parsing word meaning.
Starting point is 01:18:06 So here I'm indebted to, among other people, the work of Diego Marconi, who distinguishes between referential and inferential competence. So referential competence is the ability that relates to this idea of relating word meanings to their worldly reference, to whatever they are referencing out there in the world, and this is exhibited by things like recognitional capacities. So if I ask you to point to a dog, you will be able to do that,
Starting point is 01:18:44 or if I ask you to name that thing and I point to a dog, you'll be able to do that. Or it's also displayed in our ability to parse instructions and translate them into actions in the world, such as, you know, go fetch the fork in the drawer or in the kitchen. You will be able to do that in the world, right?
Starting point is 01:19:05 So we're able to relate lexical expressions to their referents in the world. But that's not the only aspect of meaning. That's the aspect of meaning that the people talking about this stochastic parrot analogy are focusing on. But our ability to understand word meaning
Starting point is 01:19:26 also hinges on relationships between words themselves, intra-linguistic relationships. And these are the kinds of relationships that are on display in definitions, such as the ones you find in a dictionary, as well as various other relationships of synonymy and homonymy that would also underlie our capacity to perform certain inferences in language. And so to illustrate that point, you can consider
Starting point is 01:20:01 someone who's perhaps, let's say, a eucalyptus expert, who knows, you know, all there is to know scientifically about
Starting point is 01:20:16 eucalyptus trees from, you know, reading books back in the city, in New York, say, going to university and so on, but has never actually been in a eucalyptus
Starting point is 01:20:31 forest, right, versus someone who actually has grown up surrounded by eucalyptus trees and might know very little about the biology of eucalyptus trees or various information about them, but has grown up around them. So the eucalyptus specialist might have a very high degree of inferential competence when it comes to the use of the word eucalyptus, being able to use it in definitions and to know exactly how different other words, including biological terms, would relate to the word eucalyptus and so on. But perhaps if you put that specialist in a forest that had eucalyptus trees and a very similar tree, I mean, this is where I show that my knowledge of eucalyptus is really low.
Starting point is 01:21:20 But perhaps there's a tree that looks very much like eucalyptus trees, and that expert might not be able to actually recognize which are the eucalyptus trees and which aren't, even though they have all this knowledge. Their actual referential competence when it comes to the use of that word might not be that great, whereas the person who has grown up around eucalyptus trees might be excellent at pointing to eucalyptus trees, even if they have very, very little inferential competence when it comes to using that term in definitions, for example, or knowing these more fine-grained relations between the word eucalyptus and various other words. So that's just a very, you know, toy example.
Starting point is 01:21:59 But clearly there are aspects of meaning that are very important in the way we understand and use words that are not just exhausted by this referential relation to the world. And this second aspect, this inferential aspect of meaning, is something that language models are well placed to induce, just because they are trained on this big, large corpus of text to learn statistical relationships between words. And you might think that insofar as there are these complex
Starting point is 01:22:37 intra-linguistic relationships between words, a model that learns to model the patterns of co-occurrence between words at a very, very sophisticated, fine-grained level might learn to represent these intra-linguistic relationships. You know, Jacques Derrida famously said, there is nothing outside the text. Maybe he was standing up for the rights of large language models and their ability to understand things
Starting point is 01:23:06 before they ever came along. But, I mean, it makes sense to me. Look, these corpuses of text, corpi, I'm not sure, of texts that the models are trained on, are constructed mostly by people who have experience with the world. It would be weird if the large language model could not correctly infer some things about the world. So we're going to count that on the side of the ledger
Starting point is 01:23:28 for a kind of understanding that these AI systems do have, right? Exactly, yes. And in fact, I think, you know, you could even say something about some very limited and weak form of referential competence in these models, but maybe that would take us too far afield. But I think insofar as the statistics of language reflect, to some extent, at least in some domains, the structure of the world, you can absolutely think that
Starting point is 01:24:04 you can latch onto something there about the structure of the world just by learning from the statistics of language. One example there would be this wonderful paper by Ellie Pavlick from Brown University that showed that you can use color terms, color words like orange, red, blue, and so on. And you can look at how language models are able to represent these color terms. And I'm simplifying a little bit from the study to not get into too many details,
Starting point is 01:24:43 but you can map the representational geometry of the way in which the model represents these terms in a vector space to the geometry of the color space. That is the actual relationship, say, the RGB color space, which is a way to just represent relationships between colors. So there is something about the structure of the encodings for color terms in these models that encodes information about, or is somehow isomorphic to, the structure of colors out there in the world, if you think the RGB color space represents that. Again, I'm making some simplifying assumptions to discuss that line of research, but it's still
Starting point is 01:25:31 a very, it's still a very intriguing finding. Can a large language model have an imagination? Yeah, that's a really, that's a really interesting question. I suppose it depends what you mean by imagination. Once again, I'm going to be this annoying philosopher who comes here, brings things back to the definitions and distinctions. You know, some people are more, I think people are generally more prone to using that term for these image generation models because they're, you know, they are able to generate these, you know, striking images that compared to the text prompt they receive seem to add in a lot of detail just because the resolution of language, as it were, is not quite the same as the resolution of images. I mean, that's just a very simplistic way to explain the phenomenon.
Starting point is 01:26:28 But the way in which we describe things in language leaves a lot of gaps for an image generation model to fill when it's generating an image. And so when people ask for, I don't know, a picture of a cat on a mat and get as an output a picture that has all of these wonderful rich details and colors and a specific kind of mat, a specific kind of cat, maybe it's a tabby cat, maybe it's a tuxedo cat, and so on, you might be tempted to think, well, that's really remarkably imaginative. The model has filled in the gaps there in remarkable ways. But, you know, you could also think about imagination in the linguistic domain as well. You gave this wonderful example of a haiku about Bell's theorem that might feel pretty creative, pretty imaginative in some way. I think it depends what you mean by imagination,
Starting point is 01:27:26 whether you bring in this idea that there is some kind of explicit underlying intention to create or visualize something, to write a poem or create an artwork or something like that. I think that might be leaning a little bit too far in the direction of anthropomorphism. However, in a looser sense, you might say, well, here is a way to operationalize this notion of imagination. One question is whether these models are merely, as the authors of the Stochastic Parrots paper put it, haphazardly stitching together bits and pieces from the training data. So is the performance merely explainable by this kind of brute-force memorization? Or is it doing something more, which is genuine novelty, generalizing from
Starting point is 01:28:24 the training data to new domains and creating outputs that are not remotely similar to anything in the training data? And I think, you know, you can actually study this empirically. Again, I always try to bring it back to problems that are empirically tractable, perhaps unusually for a philosopher. But I think you can make headway on these issues by looking at the empirical evidence from research where you can actually assess the amount of memorization that has occurred in the training of a model.
Starting point is 01:28:56 And you can provably show that, well, there is memorization, which is a feature and not a bug because you want your language model to memorize certain things, including if you ask your model to recite a certain famous poem by John Keats, You know, it's quite nice that your model is able to do that. But you also want your model to not just do memorization and to generalize,
Starting point is 01:29:20 and it's provably accurate to say that indeed there is at least some level of generalization to some domains that are out of distribution for the model. So that might look, for the image generation models, like the ability to generate images that don't look like any of the images they have been trained on. And you might call that, in perhaps a looser or more deflationary sense, a form of imagination. No, I love that answer, actually. It is an important distinction between interpolation and extrapolation, right? What you're saying, if I understand correctly, is that the large language models
Starting point is 01:30:09 seem to be doing more than just plagiarizing and pastiching and remixing. They seem to be generalizing, is the word you use, which I suppose is the right word, but somehow finding a theme or an idea or a style and doing something arguably new in that style. And if we're not going to call that creativity or imagination, then what should we call it? Yes, exactly. I mean, there was an interesting discussion recently. I mean, if you've seen this op-ed published by Ted Chiang in the New York Times, this is about ChatGPT. And it's wonderfully written, as is anything Ted Chiang writes.
Starting point is 01:30:54 I think it's in many ways a wonderful essay, but it builds this metaphor that I think is a little bit misleading, which compares ChatGPT to lossy compression of images, say the JPEG compression format is a way we have to compress images. And some of these image compression formats, they are able to reconstitute a lossy approximation of the original image based on the compressed representation. It's lossy because it doesn't include all of the same details.
Starting point is 01:31:31 That's why you see artifacts in JPEG images sometimes. But it's able to do that with a certain compression and decompression algorithm. And one strategy that can be used is this idea of interpolation in pixel space. So, for example, a very simplistic form of compression may be just saving information about two pixels but not the pixel in between. So that allows you to compress the image and save it in a smaller file. And then during the decompression, you would interpolate the middle pixel. You can have more or less sophisticated ways of doing that, but a very simplistic, basic way is just to take the average value, perhaps, of the two neighboring pixels. So that's the very simple form of interpolation.
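The averaging scheme described here can be sketched in a few lines of Python. This is only a toy illustration of the idea, not how JPEG actually works; the function names compress and decompress and the sample row of pixel values are invented for the example.

```python
def compress(pixels):
    """Keep every other pixel (even positions) and drop the ones in between."""
    return pixels[::2]


def decompress(kept, original_length):
    """Rebuild a row of the original length by averaging neighboring kept pixels."""
    restored = []
    for i, value in enumerate(kept):
        restored.append(value)
        if len(restored) < original_length:
            # Guess the dropped pixel as the average of its two kept neighbors.
            # At the right edge there is no second neighbor, so reuse the last value.
            neighbor = kept[i + 1] if i + 1 < len(kept) else value
            restored.append((value + neighbor) / 2)
    return restored


row = [10, 20, 30, 80, 50, 40]        # hypothetical brightness values for one row
kept = compress(row)                  # half as many values to store
approx = decompress(kept, len(row))   # same length as the original, but lossy

print(kept)    # [10, 30, 50]
print(approx)  # [10, 20.0, 30, 40.0, 50, 50.0]
```

The pixel that happened to sit on a smooth gradient (20) is recovered exactly, while the outlier (80) comes back as 40.0. That gap between the stored and the reconstructed values is the kind of artifact a lossy format produces.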
Starting point is 01:32:27 You also find this with video decompression algorithms that only save certain frames and try to interpolate the frames in between in pixel space. And Ted Chiang in that article suggests that ChatGPT and language models generally can be thought of as a kind of blurry JPEG of the web, where they get trained to compress the web
Starting point is 01:32:50 and then at inference time when you're generating text with these models, it's a form of lossy decompression that involves interpolation in the same way. And I think there is, I think it's a very interesting test case for intuitions because I think this metaphor, this analogy, parts of it are pumping the right intuitions.
Starting point is 01:33:10 So there is definitely a deep connection between machine learning and compression that has long been observed and studied. There is a sense, a very real, even technical sense in which a machine learning model trained on a lot of text data like this with next-word prediction is compressing information about the training data. And, you know, the fact that it's able even to do things like remember so many facts about
Starting point is 01:33:45 the world that are present in Wikipedia, or recite poems that are present in the training data, do all sorts of strict memorization exercises or approximate memorization exercises, is to some extent evidence of that. These are models trained on terabytes of data, but the actual model just weighs, you know, it's just a few gigabytes of data. So there is compression. But for the generative step, when you use the model after training to generate text, I think comparing it to lossy image decompression is pumping the wrong intuitions. Because again, that suggests that all it's doing
Starting point is 01:34:26 is this kind of shallow interpolation that amounts to a form of, you know, approximate memorization where you have memorized some parts of the data and then you're loosely interpolating what's in between. So, for example, in the text domain, maybe it would be something like you've memorized every other word of the John Keats poem and then you're kind of trying to do some shallow interpolation of the words in between or something like that.
Starting point is 01:34:56 Again, simplifying things a bit. But that's not really what's going on here. You could conceivably, in fact, some people have done this with image generation models, use them as lossy image compression and decompression algorithms. But that's a very specific use case. In the normal case, when you generate images or text with these models, you're doing more than that. And this is where, you talked about the distinction between interpolation and extrapolation,
Starting point is 01:35:29 it kind of breaks down a little bit with these models. There was this paper by Yann LeCun and colleagues that was published a few years ago about this, where you can think of interpolation in very high-dimensional spaces as equivalent to extrapolation. Okay. Just because when you have so many dimensions, you know, the kind of intuitions you have about interpolation in a specific narrow domain don't really hold anymore.
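One way to get a feel for why those intuitions fail is a small simulation, sketched below under loud assumptions. As far as I understand it, that line of work defines interpolation as a new point falling inside the convex hull of the training points. Checking a convex hull is expensive, so this sketch uses a weaker proxy: the axis-aligned bounding box of the training points. A point outside the box is certainly outside the hull, so the box can only overstate how often interpolation happens. The uniform random data, the function names and the sizes are all invented for illustration.

```python
import random


def fraction_inside_box(n_train, dims, n_test, rng):
    """Fraction of fresh random points that land inside the bounding box of the training points."""
    train = [[rng.random() for _ in range(dims)] for _ in range(n_train)]
    # Per-coordinate minimum and maximum over the training set.
    lows = [min(row[d] for row in train) for d in range(dims)]
    highs = [max(row[d] for row in train) for d in range(dims)]
    inside = 0
    for _ in range(n_test):
        point = [rng.random() for _ in range(dims)]
        if all(lows[d] <= point[d] <= highs[d] for d in range(dims)):
            inside += 1
    return inside / n_test


def expected_fraction(n_train, dims):
    """Closed form for independent continuous coordinates.

    On one coordinate, a new sample is the smallest or the largest of the
    n_train + 1 values with probability 2 / (n_train + 1), so it lies between
    the training minimum and maximum with probability (n_train - 1) / (n_train + 1).
    Independent coordinates multiply.
    """
    return ((n_train - 1) / (n_train + 1)) ** dims


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for dims in (1, 10, 200):
    simulated = fraction_inside_box(100, dims, 200, rng)
    print(dims, simulated, round(expected_fraction(100, dims), 4))
```

With 100 training points, almost every new point falls within the training range in one dimension, but by 200 dimensions only about 2 percent do, even by this generous bounding-box standard. In that sense nearly everything a model sees in a high-dimensional space counts as extrapolation.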
Starting point is 01:36:04 So even if we leave the strictly technical point from that paper aside, the intuition is that there would be a way, presumably, to characterize what large language models and image generation models are doing when they generate images and texts as involving a form of interpolation. But this form of interpolation would be very, very different from what we might think of when we think of nearest neighbor pixel interpolation in lossy image decompression. So different, in fact, that this analogy is very unhelpful to understand what generative models are doing, because, again, instead of being analogous to brute force
Starting point is 01:36:53 memorization, there's something much more genuinely novel and generative about the process of inference in these models. Yeah, that's great because from my own experience, thinking about quantum mechanics, I can verify that the human mind is not very good at intuitions in large-dimensional vector spaces. Like, once you have more than three dimensions, we don't have a very good idea of what's going on. So that's fascinating that once you get to huge numbers of dimensions, interpolation and extrapolation begin to blur together in an interesting way. Okay. So, what I've learned is that there's a sense in which, you know, obviously all these things are work in progress, and that's fine. But there is some sense maybe in which there is, as you said, semantic competence in a large language model. There's a sense in which meaning is really there. There's some structure in there that is non-trivial. And also there's a sense in which they can be creative or imaginative. So I guess the last big-picture question I wanted to wonder about was, can they be agents in some way? Can they have
Starting point is 01:38:00 goals? Can I make a contract with a large language model? Can I agree that if it does this thing today, I will pay it some money 10 years from now? Are those concepts even sensible, or do we care about them in the context of these AI models? Yeah, so this is an excellent question, and this is where I would again invoke the importance of having this divide and conquer approach to the ascription of capacities to these models. So indeed, I've suggested, and I just want to make it clear that I'm not suggesting language models understand language like humans do, or that they have the full-blown semantic competence of humans, very far from it.
Starting point is 01:38:38 But they might have some limited form of semantic competence. They might have, in some very deflationary sense, some form of creativity or imagination, in the sense we've defined. Now, when it comes to goals, this is where I'm much more skeptical that we can ascribe anything like intrinsic goals or desires to a language model. This seems like a category mistake, or at least there doesn't seem to be any evidence that there is anything like that in such a model. Of course, the fully developed answer would not just appeal to intuitions based on the learning objective of these models, which is next-word prediction, because we've talked about how that's not the whole story. So we would have to look again at this kind of mechanistic interpretability work and would have to have a
Starting point is 01:39:31 more specific operationalized notion of what having an intrinsic goal is and what kind of function or computation it might involve and whether there is anything functionally analogous to this kind of computation in these models. Nonetheless, there is one thing about these models that is very important to keep in mind, which is that they learn from data in a purely passive way. So they get fed this continuous stream of sequences of texts and they play this next-word prediction game. That's how they get trained. That's how they learn to encode various properties of language. And then at inference time, after they've been trained, so we say that the models are now frozen,
Starting point is 01:40:17 just meaning that the internal parameters are no longer being adjusted. The internal knobs inside the network are not being tuned anymore, so there is no more training, no more learning. Then at inference time, these frozen models are still doing next-word prediction, again on the prompt, on the input given by the human. At no stage in this process do we have an opportunity for a genuine interaction between the model and the world or even between the model and a language-only world. I mean, if the model was trained through dialogue, for example, even if it was trained on text only, then there would be a little bit more interactivity, but there is no such thing here. And so one thing you might,
Starting point is 01:41:09 you might consider is that perhaps having something like intrinsic goals requires a form of learning that's a little bit more active than the way in which these models are learning. That's one possible consideration. Another one is that these models, as I just mentioned, they're not continuously learning or continuously adjusting their internal parameters. So once they're trained, they're frozen, and then you can run inference on them, so you can have some input flow through the network,
Starting point is 01:41:48 what we call the forward pass through the model, so going from input to output. But again, that's just a one-directional process. It doesn't feed back into the model's internal encodings, internal representations. So to that extent, also, you know, when we think of having intrinsic goals, we think of having something like dynamic goals that are adjusted on the basis of an ongoing interaction with
Starting point is 01:42:21 certain inputs and calibrating our outputs. And because these models are not able to adjust their weights to change the way in which they respond to certain inputs, you might think also that there's a problem in ascribing anything like an intrinsic goal there. And then even, you know, empirically, people who are very concerned about AI safety, about this idea that perhaps
Starting point is 01:42:39 artificial intelligence might in the near medium term, perhaps in the longer term, become a genuinely threatening technology for even the survival of the human species as a whole. People who are very concerned about this have been looking for early signs of potentially threatening problematic behavior in large language models.
Starting point is 01:43:03 And there was a recent effort in that direction that was sponsored by the company Anthropic, which is one of the large new startups working on language models, and which was founded by people from OpenAI a few years ago. And they had this competition, which was about what happens when you scale language models. So there is one surprising thing that we've known at least since GPT-3 was unveiled in 2020, which relates to what we talked about earlier with the bitter lesson,
Starting point is 01:43:36 which is that scaling these models, meaning just building models that have more parameters, you just cram more parameters inside these models, more layers, more connections between the units in the model. Just doing that and training these models on more data seems to be sufficient to have breakthroughs in the performance on certain tasks. So unlock new capacities. So people there talk about emergent abilities. And actually, there are a lot of connections with physics there and the notion of emergence, of course. And there's this paper that was published by OpenAI in 2020 that finds these
Starting point is 01:44:20 scaling power laws that look at how scaling the size of the model and the amount of data you train the model on leads to these improvements in their performance on next-word prediction. So that's just looking at next-word prediction. But in terms of the actual capacities, you also see this nonlinear improvement. When you go past a certain size, suddenly your model starts being able to solve certain math problems or being able to explain certain jokes, for example, being able to do some forms of common sense reasoning. You get these nonlinear phase transitions as you scale them up. And so Anthropic was interested in whether there are also some inverse scaling phenomena
Starting point is 01:44:55 where scaling the model instead of just improving the performance in a favorable way in ways that we care about and we find are useful might also lead to either degradation of performance or lead to unwanted behavior. And one of the behaviors they were interested in is what people in the alignment, AI alignment literature, people who are concerned with aligning the future artificial intelligence systems
Starting point is 01:45:24 with human values to avoid catastrophic scenarios, what they call power-seeking behavior. So will we find that when you scale language models past certain sizes, they are more prone to starting to display the kind of behavior that, to give the most caricatural example, would be something like ignoring the task that you're asking them to perform and instead trying to persuade you to augment their capacities by training yet a bigger model or continuing to train them on even more data or giving
Starting point is 01:45:59 them more computational power or things like that. That seems very science fictional and far-fetched to me, and in fact they didn't find anything like that, through this competition, that would look like an intrinsic goal to me. I mean, if you did find that models would completely ignore the tasks that you've asked them to do and instead try to manipulate you into doing something totally irrelevant and self-serving, quote-unquote, from the perspective of the model, then that would be indeed very alarming and look very much like intrinsic goals. But I don't think we see any evidence of that.
Starting point is 01:46:35 One of the lessons that I've learned from doing a lot of podcasts with biologists, computer scientists, neuroscientists, philosophers, is that it really does matter to who we are as people that biological intelligences are embodied, right? That we live in bodies, that we get hungry, we get bored, we have training from evolution to try to survive or at least propagate our genome and so forth. And these large language models don't have anything like that. They don't get bored. If I turn on the computer and I do not ask ChatGPT a question, it does not get irritated with me, right?
Starting point is 01:47:13 So on the one hand, they don't have that. On the other hand, it doesn't seem that hard to put the model in a robot and let it walk around and punish it if it gets hurt or something like that. And so I don't know. This is not even a question, but something to speculate about. Maybe that's also a kind of human-like understanding, or thinking that it wouldn't be that hard to inculcate in an AI model somehow.
Starting point is 01:47:40 Yes, I mean, that's actually quite interesting because there is a lot of research into bridging this technology with more embodied forms of intelligent behavior. One of the impressive strides that have been made recently was from Google. They have this project called SayCan, to refer to this idea that these models, these systems they're trying to build, would relate what is said to
Starting point is 01:48:11 what they can do, so SayCan, relating language to action in the world. And essentially these are little systems that are embedded in these little robots that have this robotic arm and a little camera and are on wheels.
Starting point is 01:48:28 And they use these pre-trained language models to parse instructions given by humans, such as, you know, go fetch the apple in the kitchen. And then these models are used to translate this natural language instruction into a format that is more actionable from a robotics perspective, like drive to kitchen and then fetch apple. So they can break down the natural language instruction into a series of more specific
Starting point is 01:48:57 instructions that can be parsed in terms of specific actions. And it has a camera that is plugged into a vision-language model that can relate specific keywords like kitchen or apple to landmarks in the environment. So bringing all these things together, all these different elements of the puzzle, you can actually build a system like this. However, the key aspect here of these systems is that these are not systems that are trained end to end by interacting with the world. You are just using a pre-trained model that has been trained completely passively to process language and then plugging it onto these other modules. And so there's perhaps a surprisingly deep question here about whether, you know, this kind of system is at all useful to think of what might be happening in human cognition
Starting point is 01:49:57 or animal cognition, with interactions between different domain-specific modules that might not share the same kinds of representational formats and might have information that is encapsulated, and that do not really overlap in the same way, where only some information is passed along for further processing and transformation in a different format and so on to other modules. Or whether this kind of modular approach to the mind is not the right one and whether we need models that don't just, you know, stitch these things together,
Starting point is 01:50:32 but actually learn from the ground up by interacting with the world and having rich multimodal sorts of information. So that's interesting how, you know, these developments actually map onto longstanding discussions about human cognition. So my last question, which is completely unfair, because it will probably require a lot to answer, but do you foresee a time not too far away that we would want to give rights to AI models, whether ethical rights or legal rights or, you know, to at some point say it would be wrong to turn off this model because it's just as human as you or I, or at least it shares some
Starting point is 01:51:14 aspects in common. Yeah, I think that's a question that, you know, has been on some people's minds lately for a few different reasons. So one of them is, we haven't really talked about this, but people have been asking whether we can ascribe any form of sentience or consciousness to large language models or chatbots. So the first big story about this was when this engineer from Google, Blake Lemoine, became convinced that their internal chatbot, called LaMDA,
Starting point is 01:51:42 was sentient, and it turns out it was his own religious beliefs that led him to ascribe sentience to that chatbot based on the way it was responding to certain questions. It turns out if you read
Starting point is 01:51:57 the transcripts, these can be construed very much as leading questions, priming the model to engage in the language game of sentience, as it were. And remember, these models have been trained on a lot of science fiction that includes sentient AI. So they are excellent at creative fiction and then excellent at playing the role of a certain character in a story, whatever that role may be. So I would take this kind of story with a grain of salt.
Starting point is 01:52:20 And more recently, there has been also some interest in that with more recent models. So that relates to the rights question and the ethical question, because many people think in philosophy, and I think that maps onto intuitions people have generally, that consciousness, having conscious experiences, is something that's intrinsically valuable, meaning that a system, a being, whether it's an organism or an artificial system, that has conscious experiences is worthy of moral consideration, special moral consideration, just by virtue of having such experiences, such that, for example, it would be wrong ethically to inflict pain onto that system
Starting point is 01:53:09 or to perhaps terminate that system and so on. So that's one way in which that relates to morality, but you can also have a view that doesn't even appeal to consciousness and think there is a certain notion of personhood or agency that can be valid even for non-conscious systems, and that also relates to morally weighty decisions in a substantive way, where it would be wrong to do certain things to a system that is an agent or a person in that sense. So these two things connect to the moral question and to the legal question.
Starting point is 01:53:50 Of course, although generally in the legal discussions, the details of these discussions are fleshed out in less fine-grained ways, of course. So do I foresee that we should ascribe rights to deep learning systems in the near-term future? I don't think so because, well, I worry that doing so would have immediate, potentially very nefarious implications for humans themselves, because as soon as you ascribe rights to such systems, you might find yourself from a legal perspective in cases in which you have to make decisions
Starting point is 01:54:36 that in order to safeguard the rights of these artificial systems might bring harm to humans, whether that's imprisonment of humans or, you know, what happens if a human has turned off an artificial system that is deemed worthy of rights? Is that a form of crime? Is that a form of murder or something analogous to it? Do we need to lock up that human and so on? So that's an extreme example, but there might be many more subtle examples of that. So I think really, you know, there was a recent op-ed about this in the LA Times
Starting point is 01:55:16 by my colleague Henry Shevlin and Eric Schwitzgebel. It was a point about relating sentience and morality and rights. And the point they were making is that we have a moral imperative not to tread even lightly into the territory that could lead to some ambiguity and confusion about that, meaning we shouldn't build robots, or systems, artificial systems, that could be serious candidates for sentience or personhood or agency in this morally weighty sense.
Starting point is 01:55:59 And when I say serious candidates, I mean that the vast majority of experts would agree that current language models are, you know, almost certainly not sentient, because there are various reasons you can invoke about various properties that seem to be missing there, which all of the leading scientific theories of consciousness seem to think are important for consciousness. You can't be 100% sure, but I also cannot be 100% sure that, you know, a rock doesn't have some degree of sentience. If you're a panpsychist, maybe that's what you think. But, you know, that might be also a different kind of sentience that might not have the same connection to morality as well. So all of these things are all moving pieces that have implications.
Starting point is 01:56:46 But yeah, I think I do get the impetus to try not to build models that will give us serious reasons to think that they might be sentient, because then we will face a conundrum. It's a damned-if-you-do, damned-if-you-don't scenario. If you do give them rights on the off chance that they might be sentient and worthy of moral and legal consideration, then you might end up harming humans. And if you're wrong about the fact that they're sentient, you know, it's a huge moral hazard to take that step. But on the other hand, if you don't ascribe them any rights or consider them worthy of, you know, being at least moral patients, and you're wrong about that, then it's also a considerable moral hazard. So perhaps the best situation is just to try not to get into that place in the first place, and to try not to build systems that would give us pause, I
Starting point is 01:57:49 suppose, in this way. It's fascinating to me that famously Alan Turing suggested the Turing test, right, for whether a computer can think or, if you want, be conscious. He tried to be clear that he was not talking about consciousness, but about thinking. And he proposed this test where if a human could not tell whether they were talking to a person or a machine, then the machine counts as thinking. And to me, it seems pretty clear that these large language models could easily pass the Turing test.
Starting point is 01:58:19 And as soon as that happened, everyone lost interest in the Turing test because they realized that it's not actually a very good criterion for thinking. It's a little bit more subtle than that. So now we have a duty, for lots of reasons, both practical and moral, I think, to confront these slightly philosophical questions. We're entering into uncharted territory. Absolutely. Yes, and behavior in terms of linguistic output, for example, is no longer the gold standard it used
Starting point is 01:58:49 to be. And I mean, this is quite interesting even thinking about consciousness. You know, for a long time, all of consciousness research with humans has had to rely on verbal reports to some extent. There is no way around that. And of course, they can confirm the results that you get when you try to establish certain correlations for the neural correlates of consciousness, for example, but there's no way around the fact that the ground truth for whether a given individual is experiencing something, and what that individual is experiencing, always comes back to some kind of report, verbal or nonverbal, but some kind of introspective report or self-report. Now we have these systems that can give you
Starting point is 01:59:39 indefinite reports, as it were, of arbitrary precision and detail, and they can talk at length about their feelings. And we have very good reasons to think that they are intrinsically incapable of feeling anything. And that certainly changes things. I mean, that challenges our intuitions about consciousness, how it might relate to perhaps not just language, but also generally to intelligence. But it also just turns on its head the kind of methods we've used to investigate consciousness in humans,
Starting point is 02:00:22 because here we don't have access to the ground truth and we are stumbling in the dark trying to make inferences on the basis of certain properties of these systems. I think we still currently have very compelling and critical reasons to deny them sentience, but again, what happens when we don't is an interesting and alarming question. But full employment for philosophers. So that's a good situation to be in. So Raphaël Millière, thanks so much for being on the Mindscape podcast.
Starting point is 02:00:51 Thank you for having me. This was a pleasure.
