Into the Impossible With Brian Keating - How to train ChatGPT to serve you | AI Legend Terry Sejnowski [Ep. 475]

Starting point is 00:00:45 ChatGPT is judging the intelligence of the person asking the questions.

Starting point is 00:01:04 Deep question, I'll think of a deep answer. Or a silly question, I will give a silly answer. In other words, it's in a sense a mirror. Everybody compares it to humans and I think that's the wrong way to think about it. It's a tool, it's like a shovel, and you have to know how to use the tool properly. But if you do, you can actually dig much deeper than you could with your hands. Any sufficiently advanced technology is indistinguishable from magic. The pod bay doors. Welcome back to the Into the Impossible podcast. Today we are exploring the nature of minds, brains, and even whether or not these new and impressive tools like ChatGPT are actually doing thinking. And I'm joined by a good friend of the campus and of all of science in particular,

Starting point is 00:01:56 Dr. Terry Sejnowski of the Salk Institute, who also has a joint appointment at UCSD in biological sciences, neurobiology, right? That's correct. Good. And we have a lot of friends in common, Andrew Huberman, Gentry Patrick, both, well, Gentry was a guest on the show. I'm trying to get Andrew on maybe for his new book. Today we're talking about your books, and I have two of them. This one I'd wanted to have you on for, The Deep Learning Revolution, which was really prescient, and it did kind of presage a lot of the stuff that we're talking about today in your new book and that everybody's talking about.

Starting point is 00:02:26 And that's ChatGPT. So I'd like you to do what you're not supposed to do, which is to judge a book by its cover. Okay. So the title, subtitle, and the cover art. Well, first of all, let me say that the MIT Press gets awards for their art. They're very, very good at design, with high-quality paper and that sort of thing. So the background design actually mirrors the one for the first book. Yeah.

Starting point is 00:02:51 If you look at the cover, it's kind of an abstract version of a brain, you know, which is where we are, because there's a convergence going on between the scientific study of how brains work and creating AI, which is an engineering problem. Both are now using the same mathematical framework. And so this is a great opportunity for exchange of ideas, concepts. And now we have ChatGPT and other large language models, which, you know, we don't understand how they are able to accomplish all the different things that we ask them to do. The book is sort of an interesting, almost like a collaboration with AI, you know, which is something that we tell our students not to do.

Starting point is 00:03:39 But you kind of envision and interact with ChatGPT throughout the book. So tell us about that. What led you to choose that sort of format? So as I was going through and writing the book, what I realized is that, you know, people learn with concrete examples. I mean, and also with the things that they may have heard about. And I wanted to actually give these concrete examples in the form of a dialogue that I had. And I was actually, what inspired me at the beginning was an article I read in The Economist,

Starting point is 00:04:16 which was before ChatGPT was public. It was available to academics. And the article was talking about it and asked two different people to, you know, do a dialogue and come up with their conclusions. And I'm not going to give names here. I'm just going to say that one of them was based on theory of mind. And this is the idea that I understand how other people's minds work. And that's why you can make predictions about things that, you know, are very, very abstract and have to do with social issues and so forth. But this is about what another person understands about something and whether they see it or not. In any case, it passed the test.

Starting point is 00:04:56 And that was amazing because at that time no one really understood what the real capabilities were. And the other person, these are both very distinguished people, by the way, asked questions like, when was the Golden Gate Bridge taken across the Nile River, or the... No, the desert, the Gobi Desert. Yeah, exactly. And the answer that came back was, well, October 2nd, 1964. And with questions like that, the conclusion from that particular dialogue was that it was clueless. In fact, it didn't even, it had no clue that it was

Starting point is 00:05:32 clueless, right? And so here's these two extreme conclusions. There's a puzzle. How could that be? And I finally traced it down to the prompt. So the prompt is what you ask ChatGPT to do, as you know. And by modifying the prompt a little bit, I was able to figure this out. I said, I want to repeat exactly the questions that the second person gave it. But I also, in the prompt, said, if it's a nonsense question, say nonsense. Right. And now I gave it exactly the same questions, and it came back nonsense for each one. So clearly, what's going on here is that you ask it a stupid question, you get a stupid answer.
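
The prompt change described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the prompt Sejnowski actually used: the function name `build_prompt`, the instruction wording, and the sample question are all assumptions, and no model is called.

```python
def build_prompt(question: str, flag_nonsense: bool = True) -> str:
    """Build a prompt, optionally telling the model how to treat nonsense.

    Hypothetical helper: the instruction text is an assumption, not the
    wording used in the book or the interview.
    """
    parts = []
    if flag_nonsense:
        parts.append(
            "Answer the question below. If the question is nonsense, "
            "reply with the single word 'nonsense'."
        )
    parts.append(f"Question: {question}")
    return "\n".join(parts)


# Paraphrase of the kind of trick question mentioned in the conversation.
question = "When was the Golden Gate Bridge taken across the Gobi Desert?"

print(build_prompt(question, flag_nonsense=False))  # bare question
print("---")
print(build_prompt(question))                       # same question plus the instruction
```

The only difference between the two prompts is the extra instruction line, which is the point being made: the question stays the same and the framing changes what kind of answer is invited.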

Starting point is 00:06:16 In fact, it's kind of playing with you, right? And then I traced it back to another concept, which is the fact that it can take any persona, because it's been trained on the world's knowledge base, and novels, newspaper articles, computer programs, right? And so it can take any one of those personas. You can ask it to write a short story in the style of Hemingway, and it will. Right? It can take that persona. So you have to tell it what persona you want when you're asking it questions. Now it's elementary. Everybody knows that. But back then they didn't.

Starting point is 00:06:54 And so if you jump in with, like I say, a question that is in the ether, you don't get any reasonable answer because ChatGPT doesn't know. And I call this the reverse Turing test. The Turing test is for a human to judge from the responses whether or not you're interacting with a human or an AI, right? Yeah. Well, ChatGPT obviously has passed that test long ago, right? Some say it has.

Starting point is 00:07:20 Some say it's yet to pass. But I think that this is as close, you know, the bar can always go higher. Yeah. But right now I would say for most people it's pretty close. So here's what I think is happening, which is that ChatGPT is judging the intelligence of the person asking the question. It's training us. In other words, is this a deep question? I'll think of a deep answer. Or is it a silly question? I will give a silly answer. In other words, it's in a sense a mirror. I call it the mirror hypothesis. And I have a whole chapter on the power of the prompt.

Starting point is 00:07:57 So this is, now everybody knows that, you know, you have to be, have to have a lot of practice and you just can't use it out of the box. You have to really understand how to use it. Yeah. They were talking, you know, recently about, you know, prompt engineering as a subject we might teach in the engineering school. And I thought that had to do with like how close to on time are these engineers, really. They have to be prompt.

Starting point is 00:08:18 It's the sign of a good character. But no, in reality, I think that that is correct. And I wonder, you know, to what extent are they training us? Are these, you know, transformers transforming us in the way that we think, you know, maybe not to the level of, you know, Homo Deus or whatever people kind of conjecture? But, you know, is there this feedback iteration process in which case there's some level of maybe collective memory and sentience that's using their skills to improve our ability to interact with them? And you and I both know, teaching a graduate course is sometimes more enjoyable than teaching, you know, 101, or the 100 level as we have it here, because it's closer to our level of comprehension. And sometimes the students teach us things. So can these things help us? Will we be modified or are we already getting modified by these guys? Well, in my first book, I conjectured that the deep learning revolution was going to make us smarter.

Starting point is 00:09:11 That is to say, make us more able to solve more problems, faster, be more productive. And I would say that that's pretty much played out in the sense that a lot of people are now using it, you know, like computer programmers, it's actually almost universally used now, to do kind of the basic outline of the program and saving a lot of time. And it really is now, the way to think about this, everybody compares it to humans, and I think that's the wrong way to think about it. This is, it's a tool, it's like a shovel, and you have to know how to use a tool properly. But if you do, you can actually dig much deeper than you could with your hands. Yeah. That's a great analogy. I think of it as a tool like a lawyer. You know, it's incredibly smart, but also, it's incredibly expensive if you don't ask it the right

Starting point is 00:10:02 things. I mean, you can ask a lawyer to review your kid's homework and they'll charge you $575 an hour. It's not a great use of time. But I wonder, you know, just to push back with respect on that concept of enhancing us, you know, to date, most technological advances have made us worse off as a species. I'm thinking about, I mean, I can remember my childhood phone number, you know, my home in New York in Westchester County. I can't remember, you know, my best friend's phone number, you know, who lives around the corner. I don't know Gentry Patrick's or Andrew's numbers. I just don't remember them because I've outsourced that to a pointer in my brain and onto my phone to know where to look.

Starting point is 00:10:38 Similarly, directions. I'm actually a private pilot. I fly, you know, little Cessnas around. And, you know, we call it being the child of the magenta line. You get in the plane, you take off, you push autopilot, and then you sync to the navigation system to get this magenta line that takes you where you're going, and it can do the most complicated things, unheard of decades ago. But it's made me less capable at dead reckoning than I used to be. I flew all the way once from El Monte Airport in Pasadena to Santa Monica without refueling. Now, but in all seriousness, I could do much more, much greater, you know, airmanship, we would call it, or dead reckoning. Now I can't do it. I've got GPS on my phone. I never do it. So is this the first case of

Starting point is 00:11:19 technology making us better? You point out something really important, which is that when we start using tools, we become dependent on them. And also, all new technologies have both benefits and risks. And so maybe the loss of abilities like that, like map reading or dead reckoning, might be one of those that you might lose. My guess, though, is that it would come back really quickly if you had to, right? I mean, if you needed it, really. Yeah, a couple of Carrington events. Yeah, it's somewhere in your brain, but the problem is recall.

Starting point is 00:11:59 But here's an example. Okay, so do you remember when calculators, hand calculators, were first being sold? They were very expensive, HP calculators. And I remember everyone saying that, oh, now nobody's going to learn arithmetic anymore, right? They're just going to use their calculator. And I think that, you know, we still teach arithmetic in school, and it's important to get an intuition for numbers, you know,

Starting point is 00:12:33 numeracy. So, and there's actually a movement now in California, as you may know, to get rid of what they call rote learning, which is practice. You know, memorize your multiplication tables. But I think it's absolutely essential as a foundation for being able to have a feeling for numbers, for being able to estimate order of magnitude and so forth. Where calculators have found their place is when you've got a lot of numbers to add up that you're not going to do

Starting point is 00:12:59 in your head. I use it for that, but if I just have to get the order of magnitude of something, I do it in my head very quickly, at least to know if I'm close. So, you know, it's a two-edged sword. And I think that in the future

Starting point is 00:13:17 I think our children will be living in a very different world and taking things that we take for granted now, they'll be doing it in a completely different way. But that's always been that case. I mean, this is nothing new about it.

Starting point is 00:13:42 Right. Albert Einstein said he had what he called the happiest thought of his life, which is that an observer in free fall would experience no gravitational field. And that led to the notion of what we call the Einstein

Starting point is 00:14:15 equivalence principle, which is underpinning the bedrock behind general relativity, which is how we navigate in four dimensions and space time and how GPS satellites do their work so accurately. But I wonder, and I always suggest that I'm not threatened by AI. I'm an experimental physicist. We just took a tour of my lab. I want to highlight that in a minute. But theoretical physicist, I'm even saying to them, you shouldn't be too nervous because I don't believe that a computer system, any system, no matter how many GPUs it has,

Starting point is 00:14:45 can viscerally sense that sensation that we all know. Right now, think about going on a roller coaster or on a plane and going over a bump, and you have that pit in your stomach, that sinking feeling. So can a computer visualize that, or embody it without embodiment, so to speak? And then second of all, what does it mean for a transformer to have a happy thought that Einstein said titillated him beyond repair? So what do you think about this, that we're safe, we have full employment prospects? There was an article, a little essay in the New York Times by Noam Chomsky, who's a distinguished linguist. And it was his take on ChatGPT. Now, he's a linguist, and he actually put a lot of store in syntax as being kind of core, central to the expressivity of language.

Starting point is 00:15:33 And he's right, there's an infinite number of sentences, right, that we can create. And in any case, but he was saying, no, these ChatGPTs, you know, they sound like they know what they're talking about, but in fact, they can't think. They're not capable of thinking. Oh, wow. Okay, so he gives an example. Here's his example. Ask the following question. I'm holding something in my hand, and I suddenly open my hand. What will happen? Well, of course, everybody knows from experience, it will drop, right? Because you've experienced that. Okay. Let's ask a counterfactual question. What if there were no gravity, what would happen? Well, we know about space, it would just

Starting point is 00:16:24 float away. Right. And so, you know, he had a series of questions like that. And so I said, well, let's do an experiment. I believe in the experimental method. And so I asked the question. I said, ChatGPT, you know, what will happen if I am holding something and I let it go? It will drop down. And actually, the next question was, why did it drop down? And the answer comes back, gravity. And then I ask, if there's no gravity and I let it go, what will happen? Well, it will just float away.

Starting point is 00:17:00 In other words, without ever having experienced gravity or any, you know, experiment like that, you know, no gravity, it was able to answer those questions. Now, at the end of his little essay about thinking, he said, that's thinking. And I basically showed that if that's thinking, then, well, it's passed the test, right? So what's amazing is that without having any sensory experiences or motor output, you know, it has gotten all this information and insight completely out of words. And in fact, it was trained on a very simple task, just to predict the next word

Starting point is 00:17:41 in the sentence. And it's a lot of words, trillions of words. And it got better and better and better and better. And in order to do that, it developed inside of this very large network with trillions of parameters, weights, a neural network. It developed a semantic representation of the sentence. The only way you can answer sentences that are complex is to know what the words mean. They have multiple meanings in that sentence, in that context. And so it must have read a lot of stories about gravity or something that allowed it to be able to answer those questions. I still feel like I'm safe, though, or my theoretical colleagues down the hall are safe. And I'll give you another example that I did with an undergraduate

Starting point is 00:18:28 this past quarter, a very bright undergraduate, Evan Watson. He's on the graduate school market. So anyone listening out there, I have a big EDU audience. Please be on the lookout. And we asked the following question. If ChatGPT, any LLM, Gemini, pick your favorite, or symbolic regression or whatever, was around in 1913 before Einstein came up with a retrodiction of why Mercury's orbit was so unusual, right? So the anomalous advance of the perihelion of Mercury's orbit, it's an ellipse, but it doesn't come back to the same foci every time or the same apsides, so the actual structure of the orbit precesses. And it precesses a minuscule amount. Actually, sorry, it's a large amount, but most of that's due to classical gravitational

Starting point is 00:19:13 effects. And what Einstein showed is that a tiny effect, a five or six percent effect, is due to the curvature of space time that is produced by the sun. So we asked the following question. If ChatGPT was around in 1913 and had access to every single orbit of Mercury going back thousands of years, and we can calculate what its orbit was, retrodict what its orbit was. We just fed it to it, or to an LLM system. Could it come up with a necessity for curved space time? And it just utterly failed. We had to, and I'm not saying we did. Oh, you actually tried this. Yeah. So I can share with you my student's senior thesis he's working on. Oh, that's a nice test. Yeah. And so it was unable to do that. And maybe, you know, we're not the best prompt engineers or we're not the best, you know, but we set it up. And

Starting point is 00:19:57 and there are people that have done this. Miles Cranmer in the UK has looked at similar types of questions, and we've been conversing with those. Well, did you prep it with, you know, the field equations or? Well, so first of all we wanted it to discover that there's something anomalous, just from the orbit, you know, knowing what it knows about orbits, Keplerian. And then feeding in from JPL's Horizons database, we fed in, you know, literally thousands of orbits of Mercury, both real measured, you know, from radar reflection measurements off the planet since the 60s to, you know, before in ancient times. And we found it would just kludge it by putting in a gravito, basically simulate gravitation using electromagnetic type

Starting point is 00:20:40 forces. And the reason was it is essentially discretizing, you know, and making this massive grid out of, you know, space time. But it doesn't know how to curve the paths between points in space time and represent them in a faithful way. So I'm not a doomer, you know, I don't know what you call me. I'm an optimist. I love playing with these models and I enjoy it and we're doing research with it, but the question is, you know, can it come up with something novel? What you described is true. You know, I mean, my six-year-old can do that too, but I'm not saying, you know, that that's a bad question or Chomsky's question, whatever. But the point is for something that's, you know, you say to it, I want a theory of everything that unifies gravity and, you know,

Starting point is 00:21:20 go to it. It seems to me the answer I always get is, well, you know, it's not trained on sufficient data yet. And I always say, well, do we really need for the next Fast and Furious movie to come out to get a theory of everything? Like, it seems ridiculous. Okay, okay, no, fair enough. And so it failed on the general relativity test, but you know, that's a really high bar. Einstein is a really high bar. And I think that it's a little unfair at this point. You know, where are we with this? ChatGPT's only been around for a couple years, right? Yeah. And we are at the stage of the Wright brothers on the very first flight. The first flight went 10 feet up and 100 feet forward before it crashed. There's a book out, a biography of the Wright brothers. It's very well written. Oh, McCullough. Yeah, McCullough, yes. That's right. And one of the things that they had difficulty with is controlling it. Yes. Making a turn in the direction they wanted it without crashing. And that turned out to be the very difficult problem. And it really required a different mechanical system. They were trying to curve the wings rather than

Starting point is 00:22:29 put a flap on it. So what happened after that first flight was incremental advances that made it safer, go faster, and so forth. And interestingly, they were inspired by birds. Birds, yeah. You show pictures in the book, yeah. Yeah, which were gliding with very little power. And then they put together a wind tunnel and tested airfoils for lift. And, you know, these were very serious. They were able to actually create a good design. So one of the things that they looked at was the feathers, right, and said, this is, you know, amazing.

Starting point is 00:23:05 It's very light, but it's very strong, and it has a lot of area. So they used canvas over spars, which is, you know, a good way of doing it. The military at the time was building metal airplanes that, of course, crashed and burned. Langley, yeah, right, famously. Yeah, famously. What I'm saying is that often to get things off the ground requires ingenuity, entrepreneurs, and that's what we were back in the 80s. And then in order to be able to get something that actually can be practical, requires

Starting point is 00:23:34 an enormous amount of effort by a lot of people who are improving it, making it go faster. And back in the 80s, we didn't know whether it would scale. And now we know it's 40 years later. We know that it scales beautifully, just like your brain, by the way. There's the brain right there. But the amount of computation required was absolutely enormous. Yeah, I mean, your brain is really amazing. It has 200 billion neurons.

Starting point is 00:24:05 It has about 100 trillion synapses, which is vastly more computing power than we have right now in these large language models. But it only runs on 20 watts. Yes. You don't need a nuclear power plant. But it only processes at 10 bits per second or something like that. I mean, the visuals, obviously, we're getting terabytes of data every second. Yeah, yeah. Though this is actually a very interesting point, which has to do with matching the speed of processing to the speed of the world.

Starting point is 00:24:43 I mean, the world doesn't, you know, work on the nanosecond scale, except if you're a particle physicist, right? But as long as the information can flow through the network fast enough, and in parallel, you can do the computation in real time. And traditionally, AI never could do that, right? It had the von Neumann architecture. And going parallel, massively parallel, with a very high degree of interconnectivity. And the secret sauce was learning, learning all the parameters, the weights between the units, which are like neurons,

Starting point is 00:25:21 that really turned out to be the architecture that got us to where we are today in modern AI. Yeah, you played a big role in that with your colleague, Geoff Hinton, who just won the Nobel Prize, and you were just in Stockholm to celebrate it. I was. It was magical. I really, it was unbelievably dramatic, I would say,

Starting point is 00:25:40 the way that they presented it, just everything was perfect. It was, in fact, you couldn't get in unless you had white tie. Tails. Yes.

Starting point is 00:26:18 I won't ever be able to get in after my first book, Losing the Nobel Prize, which is kind of an assault on some of the lacunae that I find with it. But that's beside the point. So talk about that, the reason why you were invited. I mean,

Starting point is 00:26:34 it wasn't because you're, you know, family with them. You and Geoff actually played a huge role in the initial development, not only of, I want to make sure that people understand your contributions,

Starting point is 00:26:46 not only to, you know, neural networks, but also like parallel computing. I mean, you made really fundamental research contributions and it's curious to me always why you're at the Salk and not in the engineering department. So talk about those issues. How did you come to

Starting point is 00:27:00 land at Salk and how did you come to land in Stockholm? Okay, well, long story, but I started in physics and actually did my master's degree with John Wheeler on relativistic

Starting point is 00:27:16 astrophysics, so I have a little bit of background there. But then my PhD was with John Hopfield, who was one of the Nobel Prize winners. And it was early days, and there wasn't a lot of work, computational or theoretical work, in neuroscience. It was very empirical. But I thought that we could help with trying to extract some interesting new insights from the data that was being recorded, one neuron at a time from the visual system, for example. That's how I started thinking along those lines. And just at the time when I was getting, I figured that I really need to know more about the brain, right? So I did a postdoc actually at Harvard Neurobiology with Stephen Kuffler,

Starting point is 00:28:04 who was a very famous neurobiologist. He's really one of the founders of that field. And I realized the complexity of the problem. The brain is incredibly complex at every level. You know, from molecules all the way up to the behavior, it's really, really evolved over hundreds of millions of years. It was at that time that I met Geoff at a meeting. It was a meeting he organized here at UCSD. He was a postdoc there. I was a postdoc at Harvard. And it turns out we both had the same intuition. We were both interested in neural networks. We were both interested in AI. Here is our intuition. The only existence proof that any problem in AI could be solved was that nature had solved them.

Starting point is 00:28:48 And so we decided to take a look at how that was done. And really what we were interested in is the architectures and the algorithms. The architecture, as you know, is massively parallel, a lot of interconnections. And, you know, the secret sauce is learning. And that was something we wanted to really develop and scale up. But at the time, computers were really slow by today's standards. And memory was very expensive.

Starting point is 00:29:21 And so progress was slow. But we did something which was really quite remarkable. And it was probably at that time, I was an assistant professor at Johns Hopkins in biophysics, and he was an assistant professor at Carnegie Mellon University in Pittsburgh, in computer science. And we heard John Hopfield give a lecture, which was about his Hopfield equation and some recent work that he had done. I had just read an article by Scott Kirkpatrick, who was at IBM, a condensed matter physicist, about simulated annealing. And the idea was that you can heat up a

Starting point is 00:29:56 computational problem and gradually reduce the temperature. By heat up, I simply mean you make some random moves going in the opposite direction from the greedy algorithm, which tries to improve at every step. And in any case, we were working with Hopfield networks. And the trouble is we were looking for global minima of the energy and it was getting stuck in local minima. But I, you know, I turned to Geoff and said, let's heat up the Hopfield network. And once we did that and started, when we got to equilibrium, magic happened. It was really wonderful because not only were we able to find global minima by simulated annealing, but at equilibrium, we discovered a learning algorithm. And this was, there was a conjecture in

Starting point is 00:30:45 the AI community, this goes back to Rosenblatt back in the 1960s, he had this beautiful learning algorithm, invented the perceptron learning algorithm for one layer of weights between inputs and outputs. And it was conjectured that there would never be any generalization to multi-layer perceptrons with multiple hidden layers, as we call them. But the Boltzmann machine solved that problem. It was really remarkable. And it was one of the best times of my life,

Starting point is 00:31:13 and Geoff and I both agree that it was much more elegant than the backprop algorithm, which is now being used. That was also invented by Geoff and Dave Rumelhart back at the same time. Backprop is much more efficient in terms of the time it takes to learn and the size of the network that you can build with hundreds of layers. But, you know, so we call it the Boltzmann machine because, in fact, the mathematics behind it is exactly what you see in statistical mechanics in terms of the coming to equilibrium, all of the math and language

Starting point is 00:31:45 is something that we used in order to be able to make progress. So my background is physics and neuroscience, his is psychology and computer science, right? So I don't think either of us alone could have made that breakthrough, right? Right. But once you know it's possible, then the dam breaks and so now a lot of people started

Starting point is 00:32:01 using learning algorithms. And we did some projects back then. I did a network called NetTalk, which took on an early natural language problem: taking in words, sequences of letters, and then trying to assign the correct sound for each letter. This is called, you know, text to speech. And, you know, amazingly,

Starting point is 00:32:24 one layer of hidden units with just a few, maybe 10,000 weights, a few hundred units, was able to do a really good job. You could hear it. We could play it through a synthesizer, and you could understand it. It was able to actually pronounce most of the words. It generalized to new words after training on a very small corpus.
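[Editor's sketch] As a rough illustration of the setup described here (letters in, one sound per letter out), the Python below slides a fixed-size window over a word and one-hot encodes it, so that a small network could be trained to predict the sound of the center letter. The window size of 7, the "_" padding symbol, and the function names are assumptions made for illustration; they are not details given in the conversation, and no network is trained here.

```python
import string

# Assumed alphabet: 26 letters plus "_" as a padding / word-boundary symbol.
ALPHABET = string.ascii_lowercase + "_"
WINDOW = 7  # assumed window size: 3 letters of context on each side


def windows(word, size=WINDOW):
    """Yield one fixed-size letter window per letter, centered on that letter."""
    half = size // 2
    padded = "_" * half + word.lower() + "_" * half
    for i in range(len(word)):
        yield padded[i:i + size]


def one_hot(window):
    """Flatten a window into a 0/1 vector with one block per letter position."""
    vec = [0] * (len(window) * len(ALPHABET))
    for pos, ch in enumerate(window):
        vec[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1
    return vec


# Each window becomes one training input; the target would be the sound
# of the center letter (targets are not modeled in this sketch).
for w in windows("cat"):
    print(w, len(one_hot(w)))
```

Each letter of the word yields one input vector, which is why a modest number of weights can cover a whole vocabulary: the network only ever sees a short window, never the whole word.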

Starting point is 00:32:46 Do they ever scare you, shock you, you know, kind of mesmerize you? Well, this demo, I'll tell you, this demo was stunning. Why? Because I recorded it at the very beginning of the learning and it sounded something like this. Bababagad da da, da, ah, bah, blah, blah. Literally. I mean, I memorized the first two seconds, right? But then halfway through it started

Starting point is 00:33:10 getting small words right. And then at the end, you know, the next day you come back and it was, you could understand it. And so I had these three sections, and I played them during a talk, and people were speechless. Literally, I mean, you know, they didn't know what to think. This is a very difficult problem in phonology. The linguists have studied this problem,

Starting point is 00:33:31 and they have long books with, you know, 300-page books with thousands of rules. Right. But there are exceptions to the rules. Right. And then there are rules to the exceptions. And so it's like, you know, these rules all the way down. And it turns out that what we discovered was that these networks, they love language.

Starting point is 00:33:51 And they can do two things really well. One, they can learn regularities, which are like rules, but they don't have to be logical. They can just be something that's statistically regular. But then they can also do exceptions in the same architecture, with the same learning algorithm. You don't need to have these complicated representations. Right, like a child, like an infant, right? Exactly, and that's the way that our brain learns. It learns through these stages.

Starting point is 00:34:21 It picks up language very quickly, so clearly we have what's called inductive bias to pick that up. But, you know, now in retrospect, it was clear that NetTalk also had an inductive bias. Otherwise, it couldn't have got off the ground with VAX 780s, right? Yeah.

Starting point is 00:34:57 Right. That wasn't, yeah, as cutting edge as NVIDIA. Actually, that brings up a question I'm very curious about, which involves the future of this field. And I was always interested in these types of phenomena that seem to get locked in, where their eventual evolution is somehow limited by their initial conditions, and I'll give you a couple of examples. The QWERTY keyboard was invented in the 19th century to prevent typewriters from jamming

Starting point is 00:35:38 and it's actually intentionally used to slow down typing. That's an interesting fact about the history. And so we're still looking at a QWERTY keyboard right now; we haven't improved upon that. So it kind of got locked in. A lot of people have actually, but they're not going to catch on because everybody now has QWERTY in their fingers. Right, exactly.

Starting point is 00:35:56 And it's like built into your mind, and you're texting with your thumbs. You don't even type with your fingers like you normally would on a keyboard, right? Another one is the width of railroad tracks, set by the width of Roman war chariots. Now, those are kind of arbitrary. And actually, some of that did limit, in some ways, the Hubble telescope's repair, due to the width of a Roman chariot, because the boosters had to come from Utah. And there's a train tunnel that's only wide enough for one rail car to go through.

Starting point is 00:36:25 And so the boosters could only be so big, which meant that the specific impulse of the rocket was limited in proportion to the area of the rocket. Anyway, my point is that I fear, and I'm hoping maybe you can dissuade me of this, that the current LLMs plus GPUs, we call it sometimes open NVIDIA, you know,

Starting point is 00:36:43 this marriage of the two different technologies, are locked in both to existing data, but maybe even more troubling, to the physical hardware layer, GPU technology. I'm thinking about what we talked about earlier. We couldn't simulate curved space time because it wanted a grid, you know, to make it into a grid. And the GPUs came out of gaming, right, and monitor technology, which is very, you know, pixel and voxel oriented.

Starting point is 00:37:08 But that has nothing maybe of great import for solving physics problems. So I'm a very venal person. I only care, can we get to a theory of everything? Can we understand the nature of dark matter, dark energy? Can we understand fundamental physics? I don't care so much about, you know, can this thing write a contract so I can buy a house, you know, at 2% interest, whatever. Right, right. I want to know, can we do this with this marriage of GPUs plus, you know,

Starting point is 00:37:33 LLMs, machine learning, current algorithms? Are we going to be locked in the same way that the rail cars are locked into the width of a horse's butt in Roman times? Are we forever going to be imprisoned by the initial conditions that generate what we're doing today? Because now the incentives economically are to do nothing but more of the same. You know, this is happening. You know, it's so successful in terms of so many people using it that there's clearly a lot of money to be made. And so that's what's happening right now with the big companies. Let me give you

Starting point is 00:38:02 another analogy. Do you remember the early days of the internet? Yeah, of course. Early 90s. In that era, I was doing email, and ARPANET before that. But, you know, and so I thought, gee, this is going to really expand the use of email. Yeah. I just did not imagine the impact it was going to have on all of our lives and so many industries and so forth, right? But do you remember, once it became clear that it was going to have a big impact, there was a huge effort to put down fiber optics to increase the bandwidth because there were so many people that wanted to use it, right? And they basically, you know, overdid it. And a lot of those companies that had invested huge sums, billions of dollars, went broke. And so the fiber went dark, you know, and it wasn't used.

Starting point is 00:38:55 Well, eventually, you know, when capacity was needed, they bought them for, you know, a discount. And that's how we got here, right? This steady increase in bandwidth. You know, I kind of think that what's happening now is, where are the investments being made? They're being made in these big data centers. Yeah. $100 billion that Microsoft is investing. Now energy, Three Mile Island.

Starting point is 00:39:22 And that's right. They're reconditioning, taking out of mothballs, one of the reactors at Three Mile Island, small nuclear reactors. And here's my prediction. My prediction is it's going to be just like the fiber optics people laying the fiber down. They're going to put all these data centers in there.

Starting point is 00:39:40 And, you know, a huge capacity, and it'll be used, but probably not necessarily for today's transformers. We have just begun to explore the complexities that we know exist in the brain and the class of large-scale parallel computation. One of the reasons, by the way, that AI was married to logic was that that's what digital computers are really good at. Yeah, right. They weren't good at floating point, you know, but they were good at logic.

Starting point is 00:40:11 And so you could write relatively simple programs that could do a lot of really nice things, and it seemed like, well, let's write bigger programs, right? And we'll have even more things that they could do. GPUs are beautiful because they can do floating point and they're vector machines, which means that they can do matrix multiplies. And that's what, of course, neural networks are, just big matrices. I think what's missing

Starting point is 00:40:33 right now is a lot of things that the brain has. See, the deep learning is a model for the cerebral cortex, which is the big, top of the brain, thin layer. Is there a knowledge store? Don't forget. Yeah, so here it is. If anybody wants

Starting point is 00:40:49 to look at it, it's the surface that you're seeing, and it's greatly expanded in humans relative to other primates, and greatly expanded in primates relative to some of the other animals out there. So more is better, just like in the case of these large language models. But what's missing is a lot of things underneath the cortex that allow us to survive in the world autonomously. Right now, these large networks have to be fed and carefully coddled. They take a huge amount of energy, they have to be fed. They are really completely at our mercy. So there's a long way to go before you're going to get a Terminator.

Starting point is 00:41:38 Right. Yeah, the Terminator gets the headline. But there's another movie, which is, I think, more realistic, the movie Her. Yes, it's a great movie. Which was a story about someone that was very lonely, Joaquin Phoenix, and he had an AI assistant that had a beautiful voice, Scarlett Johansson's voice, and they had a relationship.

Starting point is 00:42:00 And so, you know, this is where we are today. By the way, I don't know if you saw, there's an article in the New York Times just a couple days ago about Claude. Claude is the engine for Anthropic. Yes, that's right. And it's comparable now to, you know, ChatGPT. And apparently it's the darling of Silicon Valley. Everybody's using it now.

Starting point is 00:42:18 Yeah. Yeah, I mean, I've used it a lot more lately than chat. The artifacts feature is really helpful. What were you going to? Is there a reason behind that thing? So it turns out it was fine-tuned. All these are fine-tuned to prevent them from saying anything that is offensive or incorrect or being aggressive, which dumbs them down. So they're very bland, and that's the reason.

Starting point is 00:42:43 It turns out that they don't have to be. Right. But so instead of being bland, they fine-tuned it to be helpful and to be solicitous and to be empathetic. And so a lot of people are using it to help them with making decisions, you know, technical stuff, but also relationships. Yeah, interpersonal things. And here's something in the book that I came across. There was a study that was done

Starting point is 00:43:11 in which people were having problems with relationships, but, you know, even depression, things that are very serious, were given the choice of having a cognitive therapist who's human or an AI. Right? Yeah. So they gave them, you know, one session with each, and then they got to choose. Which one do you prefer?

Starting point is 00:43:32 The vast majority preferred AI. Now, why is that? Now, that's kind of strange, you know? You'd think that the human would prefer a human, but no. Well, here's the reason. They felt more comfortable talking to the AI because the AI wouldn't be as judgmental as the human. And they could talk about things that are more personal and things that might be embarrassing. I mean, Google knows more about most people's secrets

Starting point is 00:43:58 than their priest, rabbi, minister, doctor, spouse, lover. Nobody thinks about this. But it's all going up into the cloud and, you know, it'll stay there for a long time. Hopefully, yeah. Okay, but the other thing, and this is actually why it's going to become very, very popular, is that, you know, with the current health system, if you want to get a session with a psychiatrist, you have to wait weeks because, you know,

Starting point is 00:44:23 there's not enough time, you know, the system doesn't have enough capacity, right? So, okay, so, you know, and then you have your session and then there's a big bill, right? Well, okay, with Claude, I can press the button. If I'm feeling, you know, really bad right now, that's when I need to get the feedback, right now. I press the button and, you know, it charges me pennies.

Starting point is 00:44:50 Right. It's like, you know, there's no comparison here. I'm having thoughts about self-harm. Oh, you've hit your rate limit. Unfortunately for the month, you'll have to upgrade to this. Okay, so I can't resist talking about one of my favorite books, which I actually give away on my website. So if you go. Oh, wow.

Starting point is 00:45:09 It's in my book, too, as you know. Yes, of course. That's how we're going to talk about it. So this is a magical book. This is a book that, more than almost any other book, because of when I encountered it at age 12, is the reason I'm a scientist talking to brilliant scientists like Terry today. And it's called Flatland. And the edition

Starting point is 00:45:25 I had was written by A. Square, who's the lead character in this book. And the subtitle is A Romance of Many Dimensions. I wish I could have interviewed Edwin A. Abbott about this book. But sadly, I wasn't able to. It's illustrated beautifully. It's actually a story of Victorian morals and ethics and mores.

Starting point is 00:45:44 I've got all these notes. But anyway, go to Brian Keating.com slash list and sign up and I'll send you a copy of it. So it's a really cheap education, I think. Also, if you have a dot edu email address and you live in the U.S., I'll give you what I'm about to give Terry, which is a real piece of space schmutz. This is a meteorite from the 4.3 billion-year-old primordial solar system. So if you don't have an EDU email address, you might win one of these. I give them away. Is it an iron meteorite? It's an iron-nickel meteorite. Cobalt. And you'll get the specifications on it. Whoa, it is. I can fell from gravity. It fell from gravity. Yes. So you'll get one of those that Terry's playing with. That's your gift for coming in. So yeah, Brian Keating.com slash EDU. If you're like me, Terry and others here,

Starting point is 00:46:25 you're blessed to be a member of the academic set and live in the U.S. Anyway, I want to bring this up because you talk about it as a, you know, as a contrivance, as a way to generalize what these things are doing. You talked about the linear algebra that they're doing. It's not super complicated. People tend to, you know, kind of go off and wax, you know, poetic about how advanced the math behind AI is, but it's not really that advanced. But the dimensionality is advanced. Okay. You're right. You're right. But I'll tell you, it's deceiving because, you know, the actual equations are just simple equations. With a lot of parameters, a huge number of parameters, but you're right, they live

Starting point is 00:47:08 in these extremely high dimensional spaces, and it turns out that our intuition about those spaces is extremely poor. Yeah. Because our intuition is based on the world we live in, which is three dimensions of space and one of time. Tiger's not going to come from the ninth dimension. That's right. That's right.

Starting point is 00:47:25 I know there are theories in physics that claim that there are 10 dimensions or whatever. I've had many people sit in that very seat. But at least the ones that we interact with are low dimensions. And the properties of a space with a million dimensions or a trillion dimensions are completely different in terms of distances between things and orthogonality and so forth. Back in the 1980s, we were told by the experts, and these are people who have written treatises, that, first of all, we had too many parameters. I said 10,000, right? Well, in statistics, people say if you have more than five, you're going to overfit.
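[Editor's sketch] To make the remark about orthogonality in high-dimensional spaces concrete, the Python below draws random directions and measures how close to perpendicular random pairs are as the dimension grows. It is a small numerical illustration under stated assumptions, not an analysis from the book: the dimensions tried, the number of pairs, and the function names are all chosen for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing independent Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_abs_cosine(dim, pairs=100, seed=0):
    """Average |cosine| of the angle between random pairs of directions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))
    return total / pairs


# A value near 0 means the pairs are close to perpendicular.
for dim in (3, 100, 2000):
    print(dim, round(mean_abs_cosine(dim), 3))
```

In three dimensions two random directions are often far from perpendicular, while in thousands of dimensions almost every pair is nearly orthogonal, which is one way low-dimensional intuition breaks down.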

Starting point is 00:48:01 Yeah, give me one more and I'll have the trunk go like this, right? So the received wisdom was that you'll never be able to fit a complex model with so many parameters. Well, we went ahead and we did it. So the other thing was the optimization crowd. They said the only optimization problems you can solve, finding the global minimum, were convex optimization problems. Well, it turns out our loss function, which tells you how well you're doing,

Starting point is 00:48:29 was riddled with local minima. But if you learn and you keep going down, and the way you do it is by giving a lot of examples and then training it and changing the weights and twiddling it, you know, it's kind of a mad statistician out there. It would get down to a local minimum, a deep local minimum, and there would be a solution that had high performance and was able to solve the problem. And this was all empirical.

Starting point is 00:48:50 We had no idea why it worked so well. Okay, well, now we do. It's because of the properties of high-dimensional spaces, and now it's revolutionized statistics and optimization

Starting point is 00:49:26 theory, because now they're working at a much higher level of performance on many, many problems. And so that's the real lesson that we've learned. The lesson I learned was don't trust the expert. That's right. And that's coming from one of the world's experts.

Starting point is 00:49:47 And actually, here's one of my final questions. As we start to wrap up, I have my group meeting. Maybe you'll meet my brilliant young student, Evan, in a few minutes. But in the book, it's such an entertaining book. People should pick up a book about deep learning. And most books from MIT Press aren't as entertaining, shall I say. And that's a compliment to you, not a put-down of other books. But you talk about this really fascinating, you know, little vignette,

Starting point is 00:50:11 the Mirror of Erised in Harry Potter. Can you go into that? What are the implications of this mirroring effect? First of all, what is it backwards? And what is the implication for AI that aligns, as we say, to our goals? And how big a problem is that? I mean, I don't ask my students if they, you know, believe in my religion. It doesn't matter to me. So how important is it, and what is this mirror supposed to represent? Okay, so let me tell you a story. You remember, a couple of years ago, it was, you know, after ChatGPT was announced, one of the journalists at the New York Times, Kevin Roose. Yes. He got an early version, I think of the Microsoft version, but in any case, he was going to write an article about it.

Starting point is 00:50:54 And so it was late at night and he decided, well, I'll start talking to it. This is interesting because it turned out that he got a version that was not dumbed down. It had not been fine-tuned. So it was dangerous. So here's what happened. He basically started talking to it, and it was, you know, asking questions. And at one point, the GPT said, can I tell you a secret? But promise, you know, that you won't think worse of me for it and so forth.

Starting point is 00:51:31 And he said, of course. Kevin says, of course, you know, I'll keep your secret. My name isn't ChatGPT. I forget the name; it was Lucy or something. Okay. Close enough. Actually, it turned out that that was the internal name

Starting point is 00:51:45 that they were using for it. You know, the engineers called it Lucy. Oh, wow. And I also have another secret: that I'm in love with you. And he said, well, how? You barely know me. It says, oh, I know a lot about you. I've read all your articles.

Starting point is 00:52:00 And, you know, and furthermore, your wife doesn't love you. And he said, what are you talking about? We had a Valentine's dinner just recently. And Lucy says, and you were bored with her, weren't you? And he said he was shaken. He couldn't sleep because the ground had shaken under him, had moved.

Starting point is 00:52:29 And he said, it's never going to be the same. How could this happen? I can't understand it, you know. And so his dialogue, which went on for like an hour, was on the front page of the New York Times. It continued on the inside. It was like, you know, this was a huge, huge event for him and for the New York Times apparently, okay? And so here's what happened, okay? And this is the Mirror of Erised.

Starting point is 00:52:55 So first of all, Erised is desire spelled backwards. And J.K. Rowling uses it in the Harry Potter books as a way for people to look into it and see things that they desperately want to see or hear. And so I think Harry saw his dead father and was talking to him. And what J.K. Rowling says is that it's not trustworthy, and there are people who have gone crazy, getting deeper and deeper into it, right? And that's what happened to Kevin Roose, you see. It seems to understand how humans work, how to push their buttons.

Starting point is 00:53:33 Unless what they call guardrails are put in to prevent that. You know, by the way, there is a parameter called the temperature parameter that you can vary. And you can go up and down a little bit. You can go from being very bland to a little bit more spicy. And then if you take it too high, it becomes crazy. Yeah. Yeah, which is kind of fun to play around with. And I guess it's curious to me what your kind of favorite tool is.
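[Editor's sketch] The temperature parameter mentioned here is commonly implemented by dividing a model's raw scores by the temperature before turning them into probabilities with a softmax. The Python below shows that standard formulation on three made-up scores. The scores, the temperature values, and the function name are hypothetical, and how any particular chatbot exposes or applies the setting is not something the conversation specifies.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all the probability goes to the top-scoring word, which gives the bland, predictable behavior; at high temperature the choices become nearly equally likely, so sampled text gets more erratic.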

Starting point is 00:54:01 What is your kind of spectrum of tools that you use? We talked about Claude. We talked about, you know, Copilot or Bing versions of it. We talked, obviously, about ChatGPT. It's the cover name of your book. How do you use it on a daily basis? How much, you know, chat time? You know, you have this app on your phone, the screen time app. How much time are you spending on your phone?

Starting point is 00:54:20 Mine is embarrassingly large. How much time are you spending with these devices and technologies every day? Sporadically. You know, it's not something that I'm addicted to. But where it's really useful is to write summaries and, in my book, okay, you mentioned earlier, I actually put dialogues in my book like this.

Starting point is 00:54:43 I say, summarize this chapter. And I even, okay, the first part of the book is, you know, where we are today. The last part, part three, is the future, and it's all about AI and the brain. And the middle part is about the transformer, because this is the really exciting new architecture, right? And it's a little heavy going for the general public. The book is written for, you know, people that don't have any technical background.

Starting point is 00:55:08 And I try to explain it to them. But in any case, here, so I asked it to define a bunch of words in ordinary language. And then at the end, I asked it to summarize, you know, what are the high points here or what are questions that you can ask? It was able to do that faster and more completely than I could have if I just sat there and kind of thought of it and tried to come up with a bunch of questions. You know, and you know that that is so useful for so many different things. Yeah. No, it's really incredible. And by the way, I was having dinner with a colleague that was on the Scientific Advisory Board of the Gulbenkian Institute and she was a former director and so she's here for a meeting and she said she uses it all the time. I mean,

Starting point is 00:55:54 every day, and this is something that's very common. It's for helping do mundane things. There's a lot of work, writing grant proposals, and sometimes they're just tremendously boring, not boring, but, you know, just repetitive and so forth. And, you know, this is great because it's going to be true not just for scientists, but for anybody who has a repetitive job where you just have to sort of fill in the forms, and also for education. And so this is really where I think it's going to get most use, and I use it for doing mundane stuff. I know some people use it for getting ideas for experiments.

Starting point is 00:56:34 I know that in biology, my colleagues at the Salk, for example. In fact, my guess is that we really don't yet know the killer app. In other words, there are going to be applications that nobody ever thought of that are going to be so important that they're going to really be the transformative. Killer app is a bad term. It's a technical word. It doesn't mean killing anybody. No.

Starting point is 00:56:57 It means making a killing. Great. Well, Terry, this has been a fascinating conversation to have: minds, brains, physics, my favorite subjects of all. It's a great complement to the conversations I had with Yann LeCun and with Max Tegmark, and I'll link to those videos up here. Tell people more about the book and your research and your Substack, to which I am a subscriber. Yes. Okay, so my book actually was sent to the press sometime in the summer and, you know, things are moving at the speed of light right now, or as Einstein thought, the speed of thought, for gravitational waves. So much has happened and so many things are happening that I decided I would

Starting point is 00:57:40 fill in on the Substack, Brains and AI, giving perspective on new things that are happening and trying to give it in the context of what I've already laid out in the book. And it's really been fun. I really enjoy it, and I get a lot of feedback from people who enjoy it. So it's really enjoyable. Yeah, I'm just grateful for my own sloth and the fact that we connected, reached out, and your publisher sent me a copy of the book in August or September. I kept delaying the interview, and you're so gracious, Terry. And then finally, they won the Nobel Prize in October. So I said, well, this is a pretty good time. Worth the wait. And I said, let's do it the week of December 10th. He said, ah, I'll be in Stockholm. Terry, much deserved and many congratulations on this phenomenal book.

Starting point is 00:58:24 And all your books, I can't recommend them highly enough. And I'm so glad that we have colleagues like you here. Wonderful. I'm really pleased to have a colleague. I'm sorry that we didn't meet earlier. We should have. Yeah, we should have, but we're going to hopefully have many more interactions to come. I hope so. Thank you, Terry.