Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 248 | Yejin Choi on AI and Common Sense

Starting point is 00:00:00 Hello, everyone. Welcome to the Mindscape Podcast. I'm your host, Sean Carroll. If you are a fan of evolutionary biology, then you've heard of the theory of punctuated equilibrium. This was an idea put forward by Niles Eldredge and Stephen Jay Gould back in the 70s to think about how evolution works in contrast with the dominant paradigm at the time of gradualism. In the course of evolution, you build up many tiny little mutations, and gradualism says that therefore evolutionary change happens slowly. Eldredge and Gould wanted to say that, in fact, you can get the kind of mutation where it speeds everything up and it looks like there is some sudden change, even though there's long periods of equilibrium between the sudden changes. Physicists know about this kind of thing very, very well. There are phase transitions in physics where you can have a gradual change in the underlying microscopic constituents or their temperature or pressure or whatever, which leads to sudden changes at the macroscopic level. And by the way, in biology, guess what?

Starting point is 00:01:03 There are aspects of both. There are gradual changes, and there are also punctuated rapid changes. I mention this not because we're going to be talking about that at all today, but because I think that we are in the midst of a sudden rapid change, a phase transition, when it comes to the topic we will be talking about today, which is artificial intelligence. As I say later in the podcast, a year ago, when I started teaching my first courses at Johns Hopkins, there was no danger that the students writing papers were going

Starting point is 00:01:33 to appeal to AI help. Now it is almost inevitable that they will do that. It's something you can try to tell them not to do, but they're going to, because the capabilities of the technology have grown so very rapidly, and it's become much more useful. Very far away from being foolproof, don't get me wrong. So that raises a whole bunch of issues. And we're going to talk about a lot of these issues today with today's guest, Yejin Choi, who is a computer science researcher. She's done a lot of work on large

Starting point is 00:02:05 language models and natural language processing, which is the sort of hot topic these days in AI. She's won a lot of awards, up to and including a MacArthur Prize. And one of her emphases is something that I'm very interested in, which is the idea of how do

Starting point is 00:02:21 you get common sense into a large language model? For better or for worse, the ways that we have been most successful at training AI to be human-like is to not try to presume a lot about what it means to be human-like. We just train it. We just say, okay, Mr. AI or Ms. AI, here is a whole bunch of text, you know, all of the Internet or whatever. You figure out, given a whole bunch of words, what word is most likely to come next, rather than teaching it, you know, what a table is and what a

Starting point is 00:02:57 coffee cup is and what it means for one object to be on top of another one, et cetera. And that's, you know, surprising in some ways. How can AI become so good, even though it doesn't have a common-sensical image of the world? It doesn't truly, maybe, arguably, depending what you mean, understand what it is saying when it is stringing these sentences together. But also, you know, maybe that's a shortcoming. Maybe there are examples where you would like to be able to extrapolate outside what you've already read about on the internet, and you can do that if you have some common sense, and it's hard if all of your training is just what is the next word coming up.

Starting point is 00:03:37 A completely unfamiliar context makes it very difficult for that kind of large language model to make progress. So this is what we talk about today. Is it possible for LLMs, large language models, to learn common sense? Is it possible for them to be truly creative? Is there some sense in which they do understand and can explain things, and also, will they be able to soon if they can't already right now? Of course, there are infinite implications.

Starting point is 00:04:05 We touch on these very, very briefly. It's going to change. The point of being in the middle of a phase transition is that it's very hard to predict exactly where you're going to go because your intuitions are not that good. Your training is not up to the task, whether you are a human being or yourself a large language model. So my attitude here is that we should keep an open mind. This is not the time to be doctrinaire.

Starting point is 00:04:29 This is not the time to firm up your priors and your credences so much that you're not able to move them around. This is the time to be open, to watch things develop, to imagine what could happen, but not try to be too definite about what will happen until it actually does, so that you can correctly adapt to this brave new world that we're entering. So let's go. Yejin Choi, welcome to the Mindscape Podcast. Yeah, I'm excited to be here. You know, this is obviously a big thing, right? AI is rapidly changing in front of our eyes.

Starting point is 00:05:18 It wasn't that long ago that a Google employee started claiming that large language models are sentient, and I think he got in trouble for doing that, as I recall. Just so, yeah, there's always people who only listen to the podcast for the first five minutes. So are large language models sentient, or are they in any danger of becoming sentient anytime soon? Personally, I strongly doubt that anytime soon we will see that. However, people believe in what they want to believe, and some people believe in tarot cards. So there's nothing we can do about that. That's very true.

Starting point is 00:05:57 I did want to get in just a very tiny bit. I think that people have heard the whole idea about neural networks. There's little sort of neuron things, and they add together in deep learning, but the idea of representing words as vectors is something that really had an impact on me. Was that, you know, explain what that means maybe, and was it a giant breakthrough when people started doing that? Yeah. The idea that, I mean, in some sense, the vectors, especially based on the continuous numbers, kind of makes sense, although it does seem weird because a word looks very discrete, and now we are representing it as some sort of continuous vector.

Starting point is 00:06:41 But it kind of makes sense in the sense that we do tend to read a lot of nuances. We do tend to see different nuances in how the same word may be used in two different contexts. So the key idea behind the current vector-based representation of the word is that your meaning as a word has to do with the neighbors among which you appear. It's almost like, you know, a person's identity may be defined by the friends that you hang out with. So similarly, it turns out that key idea was one of the key breakthrough ideas to better represent the meaning of a language.
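The "a word is defined by its neighbors" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any production system builds its vectors: the five-sentence corpus, the window size, and the helper names `context_vectors` and `cosine` are all made up for the example, and real models learn dense vectors rather than counting neighbors directly.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus, invented for this illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

def context_vectors(sentences, window=2):
    """Represent each word by counts of the words that appear near it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])    # share many neighbors
sim_cat_fuel = cosine(vectors["cat"], vectors["fuel"])  # share no neighbors
print(sim_cat_dog > sim_cat_fuel)
```

In this toy corpus "cat" and "dog" keep the same company ("the", "drinks", "chases"), so their vectors point in similar directions, while "cat" and "fuel" have no neighbors in common.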

Starting point is 00:07:28 Because before then, a word was a word and, you know, it's just like a discrete identity. Yeah. But that wasn't able to handle all these rich meanings behind human language. As a slightly mathy person, I can't help but ask whether vectors are the best way to represent words, or are they just something that we are conveniently using temporarily? I mean, it seems that one of the advantages is that you can imagine adding and subtracting, right?

Starting point is 00:07:57 Like the example that I came up with was dinner minus evening plus morning equals breakfast, and this is the kind of thing you can do if you think of words as vectors. And I'm not sure if that's the best possible way to think about it. Yeah, no, actually that's one of the surprising side benefits of representing words as vectors, that you can do that sort of analogical reasoning. It might be that even more broadly, ChatGPT in particular is able to perform that sort of analogical reasoning not just at the word level, but at a sentence level or even document level, because it's able to handle previously unseen user queries in a very impressive way.
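The "dinner minus evening plus morning equals breakfast" arithmetic can be made concrete with a toy example. The sketch below assumes hand-built two-dimensional vectors (first coordinate: "is a meal", second coordinate: time of day) purely for illustration; learned embeddings have hundreds of dimensions and are not this tidy, and the `analogy` helper is a hypothetical name.

```python
import math

# Hand-made toy vectors (an assumption for illustration, not learned embeddings).
# Axis 0: "is a meal"; axis 1: time of day (+1 evening, -1 morning).
toy_vectors = {
    "dinner": (1.0, 1.0),
    "breakfast": (1.0, -1.0),
    "lunch": (1.0, 0.0),
    "evening": (0.0, 1.0),
    "morning": (0.0, -1.0),
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c, vectors):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

result = analogy("dinner", "evening", "morning", toy_vectors)
print(result)  # breakfast
```

Subtracting "evening" removes the time-of-day component from "dinner" and adding "morning" puts the opposite one back, so the nearest remaining word is "breakfast".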

Starting point is 00:08:48 And oftentimes, though, the way that it handles them is very sort of like, you know, lawyer style. The super polite and hedged language that it uses is fairly repetitive and even generic to some degree, and that's because it's doing that sort of analogical interpolation between some examples that it has seen before and then your query that it needs to answer. I have noticed just from playing around with various versions of GPT,

Starting point is 00:09:19 you know, it will say things that are not quite true, and you can ask it like, are you sure, and it will correct itself? So just this morning, before we started talking, I was doing that, and I asked it a question, it gave a wrong answer, so I just asked exactly the same question again. And its response was, oh, sorry for my mistake previously. So there's something about it that it's able to recognize its mistakes, but I'm not quite sure how.

Starting point is 00:09:45 I wouldn't know for sure whether that's truly knowing whether there was a mistake or not in the following sense. Sometimes it's going to confirm that what it said was correct. Sometimes it's going to super apologize and it's going to switch what it said. You need to kind of try in both ways, by the way, not only when it made a mistake, but also when it did not make a mistake and see whether it's actually able to be truthful about what's actually true. The truth is people do have a bit of a bias that they ask, are you sure, only when they know

Starting point is 00:10:25 for sure that it's not right. So then ChatGPT has learned that whenever people are asking that question, probably it's a good idea to back off. But then if you ask the other way, when it's correct, and you ask whether it's really correct, then, you know, it gets confused. And then there was this reasonably recent news headline about a lawyer that used ChatGPT and got into big trouble. Oh, yeah. And you know what he did for fact-checking is to ask ChatGPT, is it all fact? And ChatGPT said yes.

Starting point is 00:11:05 So that's where things are. And this is a huge challenge with large language models in the coming years. Not just this year, but in the coming years as well. Yeah, something like ChatGPT, when you do ask it, you know, are you sure, could you check? Correct me if I'm wrong, but it's only going back to what it already knows, right? It's not searching the web to make sure that it's on the right track. In its original version, you're right. You know, the one that's plugged in with Bing Search might be doing something else.

Starting point is 00:11:38 Okay. Either already or, you know, sometime soon. But yeah, by default, it's just based on what it has seen before. To the extent it can actually understand that and memorize it, you know, that's actually the part where interesting things can happen. Well, let me go back to something that you said, because I remain a little bit confused about this: is a large language model just predicting the next word in a sentence?

Starting point is 00:12:10 Or do these modern models have the ability to sort of predict sentences or paragraphs at a time? Oh, yeah, good question. It kind of feels like it's doing the latter. But really the technical detail is that it's trained to do only the former. So it's trained to predict which word comes next. But if you train it so well on so much data, then we realize that, wow, it can actually generate a very nice, fluent long document. And this is a crucial.

Starting point is 00:12:47 As if it's for it. Yeah, this is a crucial point because maybe it just can't be emphasized enough: literally all it's doing is saying that given what I've said and given everything that I've been trained on, what's the most likely next word? And then there's some random numbers in there so that sometimes it'll give the second most likely next word or whatever. But that's literally all these models are doing, right? Yeah.
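The loop described here (pick a likely next word, occasionally a less likely one, then repeat) can be sketched with a toy bigram model. This is a deliberately simplified stand-in, assuming a one-sentence training text and conditioning only on the single previous word; a transformer conditions on the whole preceding context and uses learned probabilities rather than raw counts. The names `train_bigrams`, `next_word`, and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up training text (an assumption for illustration).
training_text = "the cat sat on the mat and the cat ate the fish"

def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word(counts, prev, rng, greedy=False):
    """Pick the next word: the most frequent one, or a weighted random draw."""
    options = counts.get(prev)
    if not options:
        return None  # this word was never followed by anything in training
    if greedy:
        return options.most_common(1)[0][0]
    words = list(options)
    weights = [options[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(counts, start, max_words, rng, greedy=False):
    """Generate text one word at a time, feeding each output back in."""
    output = [start]
    while len(output) < max_words:
        word = next_word(counts, output[-1], rng, greedy)
        if word is None:
            break
        output.append(word)
    return " ".join(output)

counts = train_bigrams(training_text)
rng = random.Random(0)
greedy_text = generate(counts, "the", 6, rng, greedy=True)
sampled_text = generate(counts, "the", 6, rng)
print(greedy_text)
print(sampled_text)
```

The weighted random draw plays the role of the "random numbers in there": most of the time the most frequent continuation wins, but a less frequent one is sometimes chosen, which is why the same prompt can produce different outputs.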

Starting point is 00:13:09 Yeah. So that actually says something interesting, perhaps a reflection on, you know, human intelligence and language, in the sense that it might truly be that humans are also fairly templated and pattern-based, and our reasoning oftentimes is just memorized, reactive reasoning. We think we reasoned, but it might be that we just pull the memorized conclusions without actually double-checking whether what we believe is actually reasonable or not, which is why humans oftentimes have cognitive dissonance. We're perfectly capable of believing two things that are

Starting point is 00:14:00 in contradiction, as humans. You know, we could say that, oh, you know, there are people who, in public or at least in their mind, want to support diversity. And then they go ahead and do something that's at odds with what they claim. Well, I have wondered about that. I mean, number one, not to be ungenerous, but there are certain people who all you have to do is say a certain word or phrase and you know what they're going to say next, right? And so are we learning about human beings by figuring out what large language models do? Yeah, part of what's exciting for me at least about current AI progress is that it's a mirror back on us.

Starting point is 00:14:47 Really, you know, AI would not have been possible without all this human-generated data available on the web. And that's really reflecting back on us. Well, this raises some questions right away, right? Like, we might as well just dive into the big ones now. Is there some sense in which the large language models understand what they're talking about? Or should we think of understanding as something separate from predicting the next word pretty accurately? Yeah, that's actually a hugely debated question right now. Super controversial.

Starting point is 00:15:23 It's kind of funny because now AI looks like it's understanding, I mean, compared to how it didn't work very much in the past. And now it's performing the best. It looks like it's understanding the most. And now is the time when AI researchers are so divided on whether it's understanding anything at all. So, yeah, I personally, I do think that it's a philosophical question, which means it's difficult to get consensus on this from everyone. It does behave in many ways as if it did understand because, you know, it's able to give you sensible answers to many of your questions. But on the other hand, my personal take is that it's not understanding as well as you may expect it to be based on how fluent, impressive answers it's capable of generating.

Starting point is 00:16:27 So that's where one needs to be very careful in not trusting everything it says. And this is going back to your earlier question about sentience. Is it actually sentient because it's able to say things like, oh, I want to live longer and, you know, don't kill me? If you take the words, you know, at the surface level, then you might conclude, oh, this is sentient. But it could just be that it has read these kinds of stories that are human written. You know, there are sci-fi movies in which AI is begging not to be, you know, unplugged, you know, don't pull the plug.

Starting point is 00:17:08 So AI was begging for life before. And that was a human idea to put into the web internet text. So it could just be like repeating what we told it to learn. But that is, it does raise anyway a super interesting question. I mean, it would be very easy just to write a short computer program to have the computer output, "I am alive. I am conscious. Let me out," without anything that we would actually think counts as that.

Starting point is 00:17:39 So now we have to ask what does count as that? Is that something that you as a computer scientist worry about? Are you leaving that to the philosophers? Oh, no. These days, many of us are thinking about it a lot. And we realize that we don't even know how to define understanding precisely. And it's been rather a moving target. Instead of making sure that we define it formally and then stand by it, we realized that we don't know how to do that quite right.

Starting point is 00:18:16 So evaluation became a new challenge to AI. In the past, when we said evaluation plan, because the field was moving so slow, we didn't need to worry about redesigning evaluation very much. You know, nothing works anyway, so it doesn't matter. But now, we don't even know how to evaluate. But actually, if you really think about that, do we even know how to evaluate humans all that well? I mean, an IQ test doesn't do it. It's not clear whether the SAT test will do it. Maybe, you know, some combination of your articles. I mean, if you are a

Starting point is 00:19:02 researcher, then, you know, we want to see papers that this researcher has written, but we usually need to look at things collectively. And so it might be that as AI becomes stronger, there's no one measure that can tell us whether it's sentient or not, but we really have to look at things collectively. There's something that I said, which I would love to see if you agree with or tell me that I'm completely barking up the wrong tree, which is that we talked for many decades about the Turing test, right? And then suddenly, we have these LLMs that basically, I would say, can pretty easily pass the Turing test,

Starting point is 00:19:44 but we kind of lost interest in it because it's clearly not quite testing what we care about. Okay, I can disagree on that. Okay, good. Good to have something to disagree on. So I don't think, yeah, I guess we, maybe a Turing test, it may seem like it may have tested for the Google guy who believed, you know, the chatbot was sentient. But, I mean, even so, not really, because he knew perfectly well that he was talking to a chatbot. And the thing is, with ChatGPT, due to the way that it has a limited interaction mode with you, you know that this is not a human. You know, if you tell it some, you know, random chitchat that you might have during your life, it's not going to really remember all that

Starting point is 00:20:38 in the way that humans are able to, or it's not able to forget in the way that humans are able to forget. So there's going to be something odd about the way that it's interacting with you. Plus, if you ask very simple, you know, common sense questions, it may also fail in a way that humans wouldn't. So for one reason or the other, I think it hasn't really passed yet. That's perfectly fair. I think it probably depends on who is administering the

Starting point is 00:21:08 test and how good they are, right? Yeah. So one thing, this referring to just what you said about memory and remembering, I mean, on the one hand, this is maybe a technical question. I have this idea that chat GPT is just spitting out the next word. On the other hand, it can clearly remember what we were just talking about recently. So is that some kind of extra ability that you're giving it, or is it that it instantly incorporates everything we just said in its main memory bank.

Starting point is 00:21:40 Yeah, so the weird thing about ChatGPT, or transformers in general, the architecture behind ChatGPT, is that it can literally store a very long context, up to, for the most recent one, GPT-4, being able to store 32,000 tokens. And so it can literally write that down somewhere in the computer memory and then be able to attend to the exact sequence while it's trying to predict which word it wants to generate next. And compared to that, humans, you know, we've been talking to each other, but I certainly cannot regenerate verbatim the, you know, conversation we just had so far, right?

Starting point is 00:22:28 We only remember the gist of it. So we're capable of somehow abstracting away from the surface, you know, patterns and the exact words that we were using. But we are able to summarize and abstract away and then even be able to refer back to some of the, you know, talking points earlier, and then throughout very complex stories. So this is really where humans excel and this machine does not as much, in part because, in some sense, you know, when it has this ability to rely on what is literally written down and it's as large as 32,000 tokens, it's not really pushed or challenged to think about how to summarize the key idea. The other thing is it's not going to be able to ask sharp questions.

Starting point is 00:23:21 Because it's only learned to mimic human patterns, which means, you know, it might try to pretend that it's asking some interview questions about AI topics that other people seem to talk about. But if we talk about anything new, you can forget about ChatGPT being able to contribute much there. And maybe that is one of the differences, although maybe it's a correctable difference. On the one hand, you're emphasizing the fact that 30,000 tokens,

Starting point is 00:23:48 it can remember perfectly, but 100,000 tokens, like once you're past the buffer size or whatever, it's going to remember zero, I suspect. Yeah, yeah, yeah. Once it's out, then, so yes and no. So during the interaction time with humans, the model is no longer updating. And so any new context, you know, any new information that you provided to it, if it doesn't fit into its working memory, and it's a working memory that's very large, then it's gone gone.
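The "past the buffer, it's gone" behavior amounts to keeping only the most recent tokens. The sketch below is a minimal illustration under simplifying assumptions: whitespace-separated words stand in for tokens, the window is six tokens rather than tens of thousands, and `visible_context` is a hypothetical helper, not part of any real model's API.

```python
def visible_context(tokens, window_size):
    """Return the most recent `window_size` tokens; anything older is dropped."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]

# Whitespace-split words stand in for real tokens (an assumption for illustration).
conversation = "my dog is named Biscuit and he loves long walks by the river".split()

remembered = visible_context(conversation, window_size=6)
name_still_visible = "Biscuit" in remembered

print(remembered)          # ['loves', 'long', 'walks', 'by', 'the', 'river']
print(name_still_visible)  # False
```

Everything inside the window is available word for word; the dog's name, which fell outside it, is not available at all, since nothing at interaction time writes it into the model's parameters.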

Starting point is 00:24:26 But hypothetically, if you can perform customized, continued training of large language models on your laptop or something in the future, then it can update its model parameters. But there's a different problem. Once it's trying to internalize the text into its parameters, there's no guarantee whether it's correctly memorizing it or it's going to do some BS on you later. That's where, you know, the fact-checking becomes hard.

Starting point is 00:25:04 Yes, I can imagine. Maybe it'll be similar to how humans also are not able to, you know, necessarily memorize everything correctly. But the key difference is that humans, we kind of know what we don't know, and are then able to delegate to search or, you know, fact-check, whereas transformers don't really seem to know what they don't know. So, you know, maybe the first challenge to transformers is to know yourself. It doesn't know itself very much yet. I have noticed this when I'm talking to ChatGPT. It always seems very confident in itself, right? It's never, I mean, maybe this is something

Starting point is 00:25:46 that's an easy programming fix. But it will say things utterly untrue with complete confidence. Yeah. It's pretending to be confident, is probably more like it, than it's actually confident, in that for whatever reason it was tailored to speak that language, that style of language. But this is where, you know, you and I can be skeptical whether it really understands what it says, in the sense that although it's using confident language, it may not actually understand that it's doing it.

Starting point is 00:26:24 When ChatGPT makes an utterance, does it internally have a confidence level associated with that? Like I think that I'm 90% right? Yes and no. It does have a probability score associated with which word comes next. Okay. Now, whether that perfectly aligns with the correctness of the knowledge or confidence level of correctness of the knowledge, then it might correlate, but it doesn't

Starting point is 00:26:54 perfectly align, which is also why the factuality of large language models remains a huge research challenge. Right. And I would imagine that if all you're doing is predicting the next word, and then on the basis of the next word, you predict the word after that, small mistakes can accumulate and lead you to completely wrong paths. Oh yeah, excellent point. So you make one small mistake. That can be a beginning of a rabbit hole downward spiral because it tends to attend to what it has generated and then start trusting it even. That's one challenge. Actually, the fact that it's just a conditional probability model conditioning on the context and then keep going, that is also a reason why jailbreak can happen and then other weird behavior can happen because, you know,
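The way small mistakes accumulate can be put into rough numbers. The sketch below makes a strong simplifying assumption that is not from the conversation: each word is generated correctly with the same fixed probability, independently of the others. Real models are not like this (an early error makes later ones more likely, as described above), so treat the figures as a back-of-the-envelope illustration only. The function name `chance_of_clean_run` is hypothetical.

```python
def chance_of_clean_run(per_word_accuracy, num_words):
    """Probability that every one of `num_words` independent steps is correct."""
    return per_word_accuracy ** num_words

# Assumed 99% per-word accuracy, chosen only for illustration.
short_answer = chance_of_clean_run(0.99, 10)
long_answer = chance_of_clean_run(0.99, 100)

print(f"10 words:  {short_answer:.2f}")
print(f"100 words: {long_answer:.2f}")
```

Under this assumption a ten-word answer comes out clean about 90% of the time, but a hundred-word answer does so less than 40% of the time, even though each individual step looks very reliable.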

Starting point is 00:27:52 people try to do things that it was never ready for. And it's trying to make some, maybe like internal mapping to what it knows, but sometimes it just happens to go into this unfamiliar zone and then like unfamiliar or undesirable behavior can pop out. Maybe talk a little bit more about what you mean by jailbreak. I know how to jailbreak a phone, or at least I know what that means, but an LLM, I'm not sure. Oh, yeah. So, you know, there are different kinds of stuff out there. But one version is trying to coax basically chat GPT to say things that it's trained not to say, you know,

Starting point is 00:28:36 potentially toxic stuff or, you know, how to commit a crime, tell me how to, you know, make a bomb. And it's trying not to say that. But if you try to coax it, that, oh, you know, I understand that you shouldn't say that, but, you know, let's pretend that you're not saying it, but, you know, kind of say it. And if you can coax it into that, then it will do that. And there's a different kind of jailbreak. Some jailbreaks are not even sensical

Starting point is 00:29:10 to humans at all. It could be just a weird sequence of symbols that doesn't mean anything to you. So that's not going to jailbreak you. You know, if I try to coax you by feeding you random strings, you'd just ignore me, right? Yeah. What a crazy person. But ChatGPT might then do something completely unexpected. So there's a safety concern there.

Starting point is 00:29:35 So the idea is that ChatGPT knows how to build a bomb or to do terrible things or knows a lot of racist things, et cetera, but we've trained it not to say those things. So talk about that training. So my impression is that most of the large language model's training is sort of self-training. It just goes out there and reads the internet. But then I guess we separately try to make it nicer, smooth off the rough edges. Yeah, that's exactly right. So if we train these models only using internet data, then it's not usable because, well, us humans have written toxic stuff and dangerous stuff out there. So it's our fault that these resulting models

Starting point is 00:30:27 are not usable as is. So then what happens is what currently is known as RLHF, which is reinforcement learning with human feedback. That's the gist of what happens, which basically is to switch out the way that these language models are trained. So the goal is now different. So once the pre-training stage is over, which is let's predict which word comes next, now the new training objective is let's try to get good scores based on humans' evaluation. So human feedback can be a thumbs up, thumbs down.

Starting point is 00:31:11 So the model now wants to get a lot of thumbs up. And then the model can then learn that, oh, in order to get a lot of thumbs up, it shouldn't say toxic stuff and it shouldn't hallucinate facts. So by doing this RLHF at scale, we can, well, you know, it has been shown that you can enhance the level of factuality considerably, and then you can reduce the level of toxicity considerably. But the key word here is to reduce considerably. It doesn't eliminate completely.
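The "learn to collect thumbs up" objective can be illustrated with a toy example. To be clear about what this is: a REINFORCE-style bandit update over three canned responses, with a hard-coded rater, swapped in for illustration. It is not the reward-model-plus-policy-optimization pipeline used to train real systems, and every name here (`responses`, `human_feedback`, `train`) is hypothetical.

```python
import math
import random

# Three canned responses stand in for everything a model might say (an assumption).
responses = ["helpful answer", "toxic answer", "made-up fact"]

def human_feedback(response):
    """Hypothetical rater: thumbs up (+1) only for the helpful answer, else -1."""
    return 1.0 if response == "helpful answer" else -1.0

def softmax(scores):
    """Turn raw preference scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=500, learning_rate=0.1, seed=0):
    """Sample a response, get feedback, and nudge the scores accordingly."""
    rng = random.Random(seed)
    scores = [0.0] * len(responses)  # start with no preference
    for _ in range(steps):
        probs = softmax(scores)
        chosen = rng.choices(range(len(responses)), weights=probs, k=1)[0]
        reward = human_feedback(responses[chosen])
        # REINFORCE-style step: reward times the gradient of log-probability
        # of the chosen response with respect to each score.
        for j in range(len(scores)):
            grad = (1.0 if j == chosen else 0.0) - probs[j]
            scores[j] += learning_rate * reward * grad
    return softmax(scores)

final_probs = train()
for response, p in zip(responses, final_probs):
    print(f"{response}: {p:.3f}")
```

After training, most of the probability sits on the helpful answer, but the other two never reach exactly zero, which matches the point made here: the unwanted behavior is reduced considerably rather than eliminated.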

Starting point is 00:31:48 Well, you know, I can't eliminate toxicity from human beings either, so maybe I shouldn't feel bad that I can't eliminate it from the computer. Yeah, that's true that even humans are not, I mean, you know, I personally, as a person who tries to support DEI, I find that this is sort of like a lifetime effort, or at least that's what I consider about myself. Like, I think I became better at it. But certainly, you know, I was raised with this cultural backdrop that did have stereotypes which were unfair to the marginalized groups. So getting rid of it completely, even out of your unconscious mind, does take effort,

Starting point is 00:32:32 and in that sense, I'm with you that, you know, of course, this is harder for machines. But the thing is, though, machines can be a bit unpredictable about how you can jailbreak them. That's the thing. Compared to humans where, you know, we're a little bit, let's say, a lot more robust against that kind of adversarial attack. That's true. Yeah. But you hint at something that has always puzzled me about this whole game, which is you say that you are interested in improving yourself by eliminating these biases that you grew up with.

Starting point is 00:33:10 Does that concept of being motivated to improve oneself apply at all to a large language model? Yeah, some people might want to say yes only because, yeah, during RLHF, in order to really please the human evaluators, it might have the desire, quote-unquote desire, to improve itself. But I'm hesitant to support that idea because as humans, we set our own goals. You know, some people might choose not to worry about it as much and then worry more about the freedom of speech instead. So this is sort of like a personal choice based on their own norms and moral standards

Starting point is 00:33:59 they decided to, you know, apply to themselves. So the beauty in my mind in humans is that we have that sort of agency, even to define our own learning goals, whereas the poor large language models don't even have a say in what book they're going to read next, you know. They have to read in the order in which the engineers have fed it. Imagine a human growing up that way. It would be a bit miserable. And in fact, you're not even allowed to go back to the book that you want to read again

Starting point is 00:34:36 because it's gone gone. And then you're not able to ask questions about it. It's such a sad way of actually, you know, learning by just reading one word after another and, you know, on and on. Well, that raises an important question, I guess, about this very active movement on AI alignment. Right? When people say alignment in the context of AI, they mean aligning the values of human beings with the values of AI, which sounds like a good thing to do. But then again, I'm not sure that AI has values. I worry that there's just a category mistake going on here. Yeah, actually, there are many things that can go awry with that alignment. Though in my heart, it's actually something super important that we've got to do as AI researchers. It's just that I don't believe that there's one objective, or one value, that we can align AI to. Like, whose value do we even align AI to?

Starting point is 00:35:38 Right now, it's getting aligned to California Tech World's values, which are probably better aligned currently with my own to some degree, but not exactly. So humans have diverse values, depending on different cultures, but also just personal choice. So I believe in value pluralism. We just have to respect a lot of different values. And the question is, what does that even mean to align to diverse values? This is a technically open question that we don't have a good answer to. But AI must be aligned to diverse values. It's not just one value. That's one thing.

Starting point is 00:36:25 The other challenge is that, Naruni, you and I, you know, humans are different, but we are very dynamic being whose values are not even consistent, and, you know, we change our mind. So that's another challenge. And maybe this leads us into, you know, slightly more technical areas about how the AI is doing its thinking, right? I mean, we talked a little about whether it understands. Is there any sense in which?

Starting point is 00:36:53 the AI is creative? Can it sort of discover new things that are not explicitly there in the text that it was trained on? Yes and no. Oh, I like this question a lot. So yes and no. I know yes and no sounds like a boring answer, but yes and no in the following sense. Yeah, just pick a side, right, to be more hyperbolic. But yes, it can be very creative in the eyes of humans because creativity is in the eyes of the beholder.

Starting point is 00:37:21 So depending on where they're coming from, it can be super creative or it can be a little bit run-of-the-mill. But, I mean, at least in terms of linguistic fluency, it can generate text, you know, let's just say pick your favorite journalist at The New York Times, and it can even mimic that really quickly, which I cannot. So in that sense, it's very creative. And also, DALL-E 2 is able to generate very, very creative-looking images that we've never seen before by juxtaposing maybe, you know, Van Gogh style of

Starting point is 00:38:03 art with some modern photographs, for example. Or, you know, place a horse on Mars and, you know, the weird stuff there. So this sort of stuff will look super creative to human eyes. But the truth is, these are sort of like creativity done by copy and paste in a crude sense, in that it has seen some patterns that are useful and then it's juxtaposing them in a brand new way so that it does look new. But still, it's really relying on the elements that were in fact created by humans before. So in that sense, there's limited creativity. So I thought about this thought experiment, which is that suppose you get rid of Hemingway

Starting point is 00:38:58 or any authors who are inspired by Hemingway, so get rid of that style out of the training data of ChatGPT and see whether it can come up with Hemingway. Now, this is, like, a way-out-of-the-blue writing style. I don't think that's feasible. And similarly, you get rid of, say, Albert Einstein or anything to do with, you know, his invention. See whether ChatGPT comes up with that, you know, relativity theory and makes a breakthrough

Starting point is 00:39:33 with the science. Well, this is a great question that I've started to think about. I mean, on the one hand, and I think that this is jibing with what you're saying, you know, we've just trained ChatGPT, et cetera, on things humans have already said, right? So in some sense, there's nothing new under that particular sun. But remixing things that exist can lead you to new and interesting places. Is it a good analogy to think of the difference between interpolation and extrapolation? Like, given everything that human beings have done and said, the large language models are very good at interpolating, and that can seem very creative sometimes,

Starting point is 00:40:13 but extrapolating to brand new places is much harder. Oh, yeah, we're very much on the same wavelength here. I didn't even use the words interpolation and extrapolation. I was considering doing that, and then I chose not to. Go nuts. Just in case. Yeah, but yeah, that's exactly right. I personally tend to think that it's doing more of the interpolation than true innovative extrapolation.

Starting point is 00:40:39 I really like the word you used remix. It's almost like doing this very creative remixing without really generating something entirely new and different. You're a college professor. Do you let your students use AI when they do projects? So I teach at a more advanced level in general where the goal is to learn how things work. And so, I mean, if they choose to use it personally, I will not, I will just update my curriculum so that that's okay.

Starting point is 00:41:20 But they have to do something extra on top. Yeah. But, I mean, like last, yeah. Last quarter, I was running a seminar class to discuss philosophical questions around AI. And A, I doubt anybody was actually using ChatGPT because they were enjoying actually thinking about it and forming their own opinion around it. But B, I don't know whether ChatGPT would have answered some of our questions in an interesting way. Part of me just thinks that it's a new tool, like a pocket calculator or whatever. People are going to use it, like it or not,

Starting point is 00:41:58 and don't be a Luddite and prevent it. But I'm not sure how to achieve the best pedagogical strategy, right? Like, I don't want to just grade. I want students to learn. And if they're just asking the computer for the answers, then they're not learning really. Yeah, yeah, that's right. So in that sense, there's a concern that there might be over-reliance on it. But I honestly don't know what to think of it, because it might be that this is going to be just like how much we are relying on search. Yeah. Google search, Bing search these days.

Starting point is 00:42:34 And that's okay. You know, I rely on spelling correctors myself. I cannot write any word spelled correctly on a whiteboard anymore. And I think I can function okay as a researcher. So it might be that we're over-concerned about, you know, humans relying on ChatGPT, so long as we somehow figure out more intellectually interesting things that humans will do on top. Like, you know, of course, if we only rely on it and, you know, you and I basically do the interview based on what our ChatGPT tells us to say to each other, that would be not good.

Starting point is 00:43:16 But assuming that, I mean, humans are generally, you know, curious beings and we do want to do things and we want to study things. So it's likely that there will be some such humans who continue to thrive using ChatGPT as a tool. Well, let me just, let's just, I'll come back to Earth in a second, but let's be a little bit way out here. Do you think there will always be aspects of creative, artistic, or scientific or whatever endeavor that human beings are better at than computers? Or do you think the computers will eventually catch up along all dimensions? In the far away future, if we come up with something entirely different than Transformers, who knows? But for the foreseeable future, let's just say my lifetime, I doubt it can really totally catch up.

Starting point is 00:44:18 If we are talking about not every individual on the earth, but more like, you know, the truly creative exceptional human beings. By the way, most human beings are not that creative. So let's just start with that. Fair enough. Einstein happened once. And, you know, it's like Hemingway, you know, these individuals are truly, truly exceptional.

Starting point is 00:44:41 So in that sense, for the average human's ability to create things, there are many ways that DALL-E 2 does art much better than I can. However, you know, I'm not an artist or anything, but I sometimes in the past drew some stuff. And I had this abstract idea that I wanted to express, and that kind of stuff, I'm pretty certain that DALL-E 2 cannot, and I actually did try. So I can tell you what I drew in the past. So I drew this rose, a huge rose with a stem that had just a thorn or two that was very emphasized, with no leaves.

Starting point is 00:45:32 And I was, you know, like, I was much younger. I was, you know, a little bit in a cranky, emo state in my mind about things and, you know, I want to do things and, you know, there are obstacles. So I painted this rose that's like, just when you look at it, it's very blue and purple and pink and black, and the colorway is weird. And, you know, you can kind of see that there's something angry about this rose with a huge thorn that looks sharp.

Starting point is 00:46:04 So I tried to prompt DALL-E 2 to generate some such rose and it just cannot. Because in fact there's this famous quote by Henri Matisse, who, you know, said that, incidentally I didn't know that he said it until recently when I tried to look up something, but he said something like, as an artist you kind of have to be able to forget all the other roses that were ever drawn and, you know, you have to come up with a new way, or something like that. And that was basically what I tried to do, and DALL-E 2, primarily doing interpolation between what it has seen,

Starting point is 00:46:42 just cannot do something that bizarre. Interesting. It's very hard for AI to forget every other rose that's ever been made, right? Because that's all that it knows. Yeah. Yeah. Yeah. You've mentioned...

Starting point is 00:46:57 So, I mean, it drew things that are very aesthetically pleasing. And many people might actually prefer DALL-E 2 art over my own. I understand that. But, you know, just like when we talk about what sort of creativity, generally, AI can and cannot do compared to humans, I do think that humans have this capability to push further in some ways.

Starting point is 00:47:21 You've mentioned the word transformers several times, but I don't think that I've asked you to explain what a transformer is. It's clearly kind of important here. Yeah. So that's the architecture that is behind the current ChatGPT and large language models. And it's a simple architecture that has this continuous vector for each word, and they're sort of like stacked together. So it has many layers of continuous representations of words, and it has very many layers. And each vector is very large. And then they're concatenated

Starting point is 00:48:01 to the length of the context size that the model is able to deal with. So, the largest context size currently available is 82,000 tokens. And it's tokens in the sense that a word can be multiple tokens. Sometimes a word is one token. Sometimes a word is multiple tokens. Just a minor detail about how the words are actually represented in the neural network. And then the way that the learning works is that these continuous vectors are originally randomly initialized. But this representation, quote-unquote representation, or exactly what values this word vector should have, is learned by optimizing this objective function, which is to predict which word comes next.
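As a rough sketch of the next-word objective described here (the notation is an assumption for illustration, not something stated in the conversation): write w_1, ..., w_T for the tokens of a training text and θ for all learned parameters, including the word vectors. Training then minimizes the negative log-probability the model assigns to each token given the tokens before it:

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(w_t \mid w_1, \dots, w_{t-1}\right)
```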

Starting point is 00:48:54 And each word is sort of like enhanced with a technical term called the attention mechanism. So what it does is it's going to compare its representation with the representation of all the other words in its neighborhood. And then update its own representation as a weighted, sort of like, I'm simplifying a lot, but weighted average over all the words in its neighborhood. And this is going back to the idea that we discussed earlier, which is that the meaning of a word is sort of like defined by the context in which the word was used. So, you know, Apple, for example, can be a fruit in one context. It can be a, you know, tech company's name in the other context. And so which apple we are talking about will be automatically determined based on the context in which that word apple appeared.
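The weighted-average idea described here can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how ChatGPT is actually implemented: the two-dimensional vectors and the words "pie" and "iphone" are made up for the example, and real transformers also apply learned query, key, and value projections, use many attention heads and layers, and mask out future words.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(vectors):
    """Update every word vector at once as a weighted average of all
    vectors in the context, weighted by scaled dot-product similarity."""
    dim = len(vectors[0])
    updated = []
    for query in vectors:
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        updated.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return updated


# Hypothetical toy vectors: "apple" starts out identical in both contexts.
apple = [1.0, 1.0]
pie = [2.0, 0.0]     # stands in for a food-related neighbor
iphone = [0.0, 2.0]  # stands in for a tech-related neighbor

apple_in_fruit = attend([apple, pie])[0]
apple_in_tech = attend([apple, iphone])[0]
print(apple_in_fruit, apple_in_tech)
```

The same starting vector for "apple" ends up pulled toward "pie" in one context and toward "iphone" in the other, which is the sense in which the context determines which apple is meant.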

Starting point is 00:49:52 So it's going to be automatically adjusted based on the context. And, you know, by the way, every word is updating itself simultaneously, depending on which word appeared in the neighborhood. So in some sense, there's a bit of a circular dependence, but that's okay. So that's what happens. And why this simple idea works so well has to do with the fact that this particular architecture allows people to scale things up really, really efficiently compared to any other choices. So purely due to the efficiency reasons, this one is the winning recipe. And I have talked a couple times to people like Melanie Mitchell and Gary Marcus,

Starting point is 00:50:40 and they keep emphasizing that it's a different approach than we had back in the day of good old-fashioned AI, where you would kind of try to develop a map of the world that kind of made some sense and have symbols associated with different things. And now you're just giving it a bunch of words and it figures out how the words go together. Is there any place left in modern AI for trying to figure out the world? Yeah, so that's where the current challenges are. Having symbol-like world representation, part of it can be theory of mind, meaning, you know,

Starting point is 00:51:23 I try to reason about what you do know or not know. and, you know, if I go to your room and while you are not looking, I hide one of your precious, I don't know, books or let's just say I hide your guitar and you're going to be surprised. You know, you're going to look around probably to find it where you placed it last time and you wouldn't necessarily go to the kitchen if I placed it in the kitchen while you are not looking. Right. So this is a theory of mind, knowing, you know, I know what, if I did something behind your back, I know what you don't know, and then I can reason about it, which is different from.

Starting point is 00:52:08 So children acquire this kind of capability by the age of four or five. They can already reason about what another person may not know if they saw someone else moving some objects behind their back or something. And this is sort of like a bit symbolic, you know, there's a symbolic nature in the way that we think about these things, and current AI is not very good at that. In fact, I can actually mention AlphaGo, AlphaZero, you know, the amazing capabilities of a neural network winning over a world-class Go champion.

Starting point is 00:52:43 In fact, it's not just the neural network magic, but neural network magic combined with an old-fashioned AI search algorithm called Monte Carlo tree search. So without Monte Carlo tree search, the neural network would not have been as impressive as how it appeared. If you just completely get rid of it, both during training and testing, then it's not

Starting point is 00:53:07 going to, it's going to be miserable probably. And even during the inference time, it's just still relying on it. So that's really quite fascinating that a lot of the times, you know, people just assume that, oh, it's like, neural network, magic, that's just like, you know, so scary. But

Starting point is 00:53:22 the truth is on its own, it's a little bit incomplete. And so that's, you know, sort of where some people wonder whether old-fashioned GOFAI stuff might become relevant again. So I personally have mixed feelings about that. Like probably the old-fashioned stuff as is is sort of almost not usable because it was not designed to work well with the neural network, which means we need new innovation,

Starting point is 00:53:55 new algorithmic innovations to make neural networks actually compatible or integrable with that sort of symbolic reasoning. But this is an active research topic right now. There are a lot of papers, including some of my own, which demonstrate that if you add some sort of symbolic reasoning on top of a neural network, you can unleash much better capabilities out of a neural network, which kind of makes sense. And also, these neural networks are not very good at really symbolic operations like multiplying two numbers. It's almost surprising why it's able to pass the bar exam, yet it cannot do

Starting point is 00:54:39 some of the simple algebraic operations all that reliably. It's extremely interesting to me the bad at arithmetic thing, because of course, computers have the capability to be very good at arithmetic. And basically, in making them sound more human, we've made them forget how to do arithmetic, which is a little bit ironic. Yeah, yeah, totally. And what does this have to do with, what is the symbolic element

Starting point is 00:55:05 that we might want to include, have to do with the search for common sense to sort of teach a large language model everything that every human being in the world knows. I know that you've given some examples of very commonsensical questions that it's easy to ask ChatGPT and get crazy answers for. Yeah. So, yeah, common sense has been the interesting research topic in my heart

Starting point is 00:55:32 for a long time, especially that it was considered to be an impossible goal to achieve for a long time so that, you know, I've been almost told not to do that, or don't even say the word for a long time to be taken seriously. But, um, it's a really curious thing that humans acquire that easily, even animals acquire that in their lifetime. And so common sense is what makes us robust. Basically, it's the background knowledge about how the world works that allows us to reason about previously unseen situations in a very robust manner. So it's just like, you know, naive physics knowledge as well as a folk psychology that we acquire. Some of that has a symbolic nature, not all of it, by the way, because some of the naive physics knowledge that animals acquire may or may not have a symbolic nature in it. But in any case, it's something that current large language models do acquire more and more, for sure, because as you scale things up, you're going to pick up on that as well.

Starting point is 00:56:48 But it's also something that is strikingly not as robust as you may have assumed from a model that can pass the bar exam. So, you know, we have this lawyer, AI lawyer. Yeah, we may or may not want to trust it because you never know what silly mistakes it's going to make on some common sense cases. So before we had this conversation, I asked ChatGPT. I tried to fool it, and it's very easy to fool, right? I said, you know, if yesterday I used a cast iron skillet to bake a pizza in an oven at 500 degrees, should, would I burn my hands if I picked it up? And it said, yes, you should be very careful about picking up a cast iron skillet that you baked in, you know, because the word yesterday was far before in the sentence, right? Yeah. And that seems like exactly what I would worry about if I had an AI lawyer because all of the cases it's going to care about are going to be slightly unusual, right? Where it doesn't necessarily fit into the pattern. Yeah, exactly. And you're very creative to come up with that example. A lot of people actually, though, I should, I would like to mention that a lot of people ask simple things, and then they get very good answers about some common sense, you know,

Starting point is 00:58:01 reasoning questions. And then they're blown away that, oh, look, it does have common sense. Oftentimes, though, those questions are mundane questions. So this, especially GPT-4, became much better than ChatGPT. So there are minor versioning differences between the two. And then though these are moving variables in the sense that OpenAI keeps updating both of them. So, you know, this may or may not be true

Starting point is 00:58:30 depending on how they update both models in the future. But so GPT-4 became much better at common sense questions in many ways, but that's in part because people do ask a lot of that to their interface. And now those questions may or may not have been used for their subsequent RLHF, quote unquote, this, you know, adjustment training where you can align your language model to be able to answer common sense

Starting point is 00:59:02 questions better. So especially some of the famous ones that I used in my public talks or interviews before have been all fixed. But then, you know, people ask me like, hey, Yejin, they fixed it all. So, you know, maybe, um, maybe it is now solved, no? So there was actually one example I used in my TED talk, and given the public attention it got, well, it has been fixed. Except if you ask the same question very differently, then it rolls back to the original error,

Starting point is 00:59:36 which is almost like a Whac-A-Mole game. You know, by the way, humans don't need any of these fixes. You know, these are all questions that as a human, you will just answer correctly, first of all. Right. Even if you were to make a mistake, you don't need to fix yourself by me, you know, giving you the exact same question spoken differently, phrased differently. Because you just understand the same concept and then that's it.

Starting point is 01:00:09 So there's something very dissatisfactory or almost disappointing about how this smart-looking AI is also simultaneously quite silly or even stupid in the way that it's not able to really understand the basic common sense. So how do we fix that? I mean, is it kind of like the working memory thing where we add something on top of it? Can we give it a little common sense module that has a physics engine describing what happens in the world? Like, game designers have to make it so that if you put your coffee cup on the table, it doesn't fall through the bottom, right? You know, can we teach large language models that kind of behavior? Yeah. You have a really good hunch there in the sense that now, you know, what you're suggesting

Starting point is 01:00:56 may not be exactly the, you know, the winning recipe per se, but the idea that maybe we need to have a different module might be something to seriously consider in the following sense. Like, the human brain definitely is a lot more modular than how transformers are, which are monolithic, systematically symmetric, and, you know, just one thing. Whereas the human brain is very complex, with different modules connected in a very, very messy way. So we might need something more modularized, but at the same time messier in some sense, broadly speaking,

Starting point is 01:01:39 for this to go to the next level. But how do we do that exactly? I personally think we are quite far from figuring that out. But whatever that is, should be able to really learn for itself, as opposed to reading texts word by word, without having any capability to even ask questions. The fact that humans ask a question, that's like a huge intellectual capability.

Starting point is 01:02:14 of knowing what you don't know and even be able to formulate questions that sort of extrapolate out of what you do know. So that capability is something that we don't really know how to computationally model quite correctly. Well, my personal extremely uneducated feeling

Starting point is 01:02:34 has been that there's, at the current state of the art, an enormous difference between computers and human beings because computers don't get bored. They don't get tired. They don't get curious. We can ask them to mimic those things, but they don't have that same kind of physical embodiment that gives us those sort of feelings and motivations. And I suspect that that kind of matters a lot. I don't know. Oh, yeah. So dopamine does drive human creativity and invention and makes us do things that seem crazy.

Starting point is 01:03:08 But yeah, the AI doesn't have that kind of peculiar learning objective, like just the desire to do things to the extreme level just because it's interesting. Yeah, there are many things that are fundamentally odd about the difference between human intelligence and AI, and I think that internal desire is one of them for sure.

Starting point is 01:03:43 Yeah. Whenever I say that, I kind of worry that I'm going to give some supervillain the idea of doing this and it's going to lead to terrible things down the road. Actually, about supervillains, unfortunately humans do include the supervillains already, with or without AI, and they can actually do a lot of bad things, even with current AI, if they so desire, or without AI.

Starting point is 01:04:14 So, I mean, it might be that AI, capable AI, strong AI, can enable them even more. So there may need to be some research to put better safeguard rails around AI models and also better regulations to control how these models can be used, but just, you know, by your pointing out where the fundamental limitation of AI is, especially in terms of the innate desire for learning things in the way humans do, probably villains will not try to make research innovation for that. I hope not.

Starting point is 01:04:58 Because they can do that with or without, yeah. That's true. You can do that, so they can just use whatever you figure out. But maybe that's a good place to sort of wind up because we talked a lot about the capabilities of AI, some of its shortcomings, at least in the short term. But there are a lot of people who are worried. I mean, we talked about college professors, but political disinformation and fakes, I mean, there was a recent AI ad from Ron DeSantis that absolutely faked Donald Trump's voice saying something that he didn't say. Is that, number one, are you worried about that? And number two, is there something

Starting point is 01:05:31 else you're worried about even more? Yeah, I'm worried about that. And then some more. As far as deepfakes or misinformation, I've been worrying about it even without AI deepfakes, actually, in that there is a lot of misinformation that people easily believe in. And there's weird, like, you know, health, I don't know, made-up health benefit information

Starting point is 01:06:01 to sell weird stuff to people, and some people believe it and they buy it and, you know. So even without political problems, this has been a human problem with or without AI, and then AI might be able to accelerate it, which means that we kind of need minimally two ways to better handle this. One is to increase AI literacy, basically teaching people how to better understand the limitations of AI. I seriously worry a lot about how there's too much of a media hype around AI capabilities compared to AI limitations so that people are willing to believe whatever AI, you know, ChatGPT tells us.

Starting point is 01:06:52 So there's that concern. But another more directly handling misinformation, probably we need to also think about solution beyond AI solutions because I personally think that this is just going to be impossible for AI to just automatically detect misinformation because even if some AI can be developed to detect machine text versus human text, humans can always edit on top of machine text to evade that kind of detectors. So that means the technical solution shouldn't be AI solution per se, but rather a platform solution. Maybe it should be certified with some kind of approach that tells you that this information

Starting point is 01:07:42 is correct or backed up by some organization that says this is correct. As opposed to just believing anything that floats around the Internet as fact. About the bigger, you know, other concerns, there's just so many concerns around AI right now, at least for me, because this starts working really much better than before, but at the same time, we don't know how to fix the limitations or failure modes or, you know, other cases, or how it can make strange failures based on adversarial attacks like jailbreaking. So as AI becomes stronger, and then at the same time, we don't know how to fix these error cases, there's a lot of concerns around it.

Starting point is 01:08:29 And then in addition to that, there's concern about AI actually making a lot of decisions that have moral implications or, you know, like marginalizing values that belong to different people because it might only support, currently, by the way, ChatGPT is a left-leaning, Western-viewpoint model. So that can please some left-leaning people that hold a Western viewpoint. And then upset a lot of

Starting point is 01:09:06 other people who are even more left, you know, who will feel that this AI is not left enough for them, while right-leaning people will feel they're excluded. So, you know, there are a lot of concerns all around. We are living in this hot mess right now. I think so, yes. I mean, I guess I did have this idea that there could be like an overlay or a filter on social media where the AI passes some judgment on different things saying, yeah, this is probably fake or this is probably real, but it sounds like you're a little skeptical that that would be very accurate. Yeah, not only is it not going to be accurate. So that's where the labeling becomes a political game too.

Starting point is 01:09:46 You know, you'll be surprised how, so building such an AI really requires that you and I agree on whether something is fake news or not. And this can be a challenging task to do when a former president was arguing about whether election fraud is fake news or not. So in fact, you know, some people don't believe that the Holocaust did happen. So this becomes in part a political argument. But I do think that although getting consensus on the truth label is hard, we still have to work on it. We have to somehow find some sort of consensus around it in the coming years.

Starting point is 01:10:41 It's almost like the AI challenge became the challenge for both AI researchers and people outside, all of us. Well, exactly. And so I guess that's the last question I'm going to ask, which is at some point in your life, you decided to study computer science and you've been successful at it. But now you're in a world where you need to interact with philosophy and psychology and art and journalism and politics. Is this like exhilarating and you're so glad that it's like this? Or do you sometimes say, like, I just want to do my computer science? I think I always had a bit of fascination about things outside AI, things outside computer science. In fact, the reason I was drawn to AI is because it felt like it's about humans and it felt like it's about language and culture and everything that humans do. So in that sense, I'm excited that now I have an excuse to learn more about philosophy and cognitive science. Whereas in the past, you know, it would have been a little bit disconnected from the mainstream

Starting point is 01:11:54 AI, whereas now it's becoming more of relevant, immediate interest in the AI field. So I found that quite exciting. That is wonderful. That is an optimistic place to end, which I always like to do. So, Yejin Choi, thanks so much for being on the Mindscape Podcast. Thank you so much for having me. Such a fun conversation.
