Into the Impossible With Brian Keating - The Mysterious Math Behind LLMs | Anil Ananthaswamy

Starting point is 00:00:45 What if the most powerful AI systems we've ever built are succeeding for reasons we still don't understand?

Starting point is 00:01:06 And worse, they may succeed for reasons that could lock humanity into the wrong future. Today's guest is Anil Ananthaswamy, an award-winning science writer and one of the clearest thinkers on the mathematical foundations of machine learning. In this conversation, we're not just talking about new demos, incremental improvements, or release dates for new models. We're asking even harder questions. Why does the mathematics of machine learning work at all? How do these models succeed when they suffer from problems like overparameterization and a lack of input training data? Are large language models revealing deep structure? Or are they just producing very convincing illusions and pushing us toward an increasingly AI-slop-driven future?

Starting point is 00:01:49 Thank you so much for joining us all the way from Bangalore. This is so exciting. Well, Brian, thank you very much for having me. It's a pleasure. It's really a wonderful book. Later on, we're going to judge the book by its cover, as I like to do. It's entitled Why Machines Learn. And the first question I want to ask you, Anil, is this: I was taught as a physicist that you can never ask why questions, and that's the first word of your title.

Starting point is 00:02:12 What made you want to explore why machines learn, and not how or what they learn? It's funny. I answered this exact question yesterday at a panel discussion where people had the very same doubts. This is just a writerly conceit, I must admit. The title came about because when I was trying to learn the mathematics of machine learning, I encountered very early on this amazing proof that uses very simple linear algebra to show that a single-layer neural network, something called a perceptron from the 1950s,

Starting point is 00:02:55 will converge to a solution in finite time if a solution exists. And, you know, in the late 1950s, the algorithm was first developed, which was essentially a simple neural network that could, you know, do linear classification. The algorithm is very simple, and that to me is the how. And a few years after the algorithm was developed, people started mathematically proving that the algorithm would converge to a solution in finite time if a solution existed. And to me, in my head as a former software engineer, the math became the why.

Starting point is 00:03:32 And of course, if you were to ask a physicist, they would just... you know, it's funny, because a couple of months ago in Bangalore, David Gross, the Nobel laureate, was visiting. And he had the exact same question about the title of the book. I tried to give him my rationale, and he did not buy it one bit. He said, no, there's no why here. It's how. So yeah, it's just a writer's conceit: to me, the how is the algorithm. And because the book is about the mathematics, I feel like the math kind of gives you a rationale for why these algorithms do what they do. So that's how the title came about. What was the first mathematical idea that you encountered in machine learning, in the research you did for the book, that made you stop and think that this is genuinely beautiful, as I find it to be?

Starting point is 00:04:21 Oh, it was exactly this perceptron convergence proof. So maybe we can talk a little bit about how the perceptron came about, right? In the late 1950s, Frank Rosenblatt, who was a Cornell University psychologist, designed what was the first kind of artificial neural network. And it was a single-layer neural network. And like I just said, you know, the initial work was simply developing the algorithm and showing that it worked: it did pattern classification. It was able to take two categories of data.

Starting point is 00:05:02 And if these two categories were linearly separable in some mathematical space, the algorithm would find the linear divide between the two clusters of data. And subsequent to the invention of the algorithm, people started mathematically showing why this was powerful and why this classifier even worked. And you have to think back to the 1950s: when somebody gave you a mathematical proof saying an algorithm would find a solution in finite time if a solution existed, that was like gold dust, right?
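To make "find the linear divide between the two clusters" concrete, here is a minimal sketch of the classic perceptron learning rule, assuming labels in {-1, +1} and a bias folded into the weight vector as an extra constant input. The toy data set and the names `train_perceptron` and `predict` are hypothetical, chosen for illustration; this is not code from the book.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=100):
    """Perceptron rule: X has shape (n, d), y holds labels in {-1, +1}.

    Returns the weight vector (bias as its last entry) and the total
    number of updates, i.e. mistakes, made during training.
    """
    # Append a constant 1 to every input so the bias is learned as an ordinary weight.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    mistakes = 0
    for _ in range(max_epochs):
        errors_this_epoch = 0
        for xi, yi in zip(Xb, y):
            # Misclassified if the point lies on the wrong side of (or on) the hyperplane.
            if yi * np.dot(w, xi) <= 0:
                w += yi * xi  # nudge the hyperplane toward the correct side
                mistakes += 1
                errors_this_epoch += 1
        if errors_this_epoch == 0:  # a full clean pass: the data are separated
            break
    return w, mistakes

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w > 0, 1, -1)

# Two linearly separable clusters: "circles" near (2, 2), "triangles" near (-2, -2).
X = np.array([[2.0, 2.5], [3.0, 2.0], [2.5, 3.0],
              [-2.0, -2.5], [-3.0, -2.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, mistakes = train_perceptron(X, y)
print(w, mistakes, predict(w, X))
```

The `max_epochs` cap is a design choice for safety: on separable data the loop stops by itself after a clean pass, but on non-separable data the pure rule would cycle forever.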

Starting point is 00:05:39 That was really something. But when you look at the proof, it's very, very simple linear algebra, right? It is just manipulating vectors and matrices. There is nothing more than that. And it is so beautiful. So that was the proof that made me kind of say, oh, hang on. Until then, I was just learning it for the sake of learning. I was not thinking about a book.
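For readers who want to see the "very simple linear algebra", here is a compressed version of the standard textbook convergence argument (commonly credited to Novikoff, 1962). It is a sketch under the usual assumptions (labels $y_i \in \{-1, +1\}$, bias folded into $x_i$, data separable with margin $\gamma$), not necessarily the exact proof presented in the book.

```latex
\begin{aligned}
&\text{Assume } \|x_i\| \le R \text{ for all } i, \text{ and some unit vector } w^{*} \text{ satisfies } y_i\,(w^{*}\cdot x_i) \ge \gamma > 0. \\
&\text{Start from } w_0 = 0; \text{ on the } k\text{-th mistake (on example } i\text{), update } w_{k} = w_{k-1} + y_i x_i. \\[4pt]
&\text{(1) Alignment grows linearly: } w_{k}\cdot w^{*} = w_{k-1}\cdot w^{*} + y_i\,(x_i\cdot w^{*}) \ge w_{k-1}\cdot w^{*} + \gamma
  \;\Rightarrow\; w_k\cdot w^{*} \ge k\gamma. \\
&\text{(2) Length grows slowly: } \|w_{k}\|^2 = \|w_{k-1}\|^2 + 2\,y_i\,(w_{k-1}\cdot x_i) + \|x_i\|^2 \le \|w_{k-1}\|^2 + R^2
  \;\Rightarrow\; \|w_k\|^2 \le kR^2. \\
&\text{(3) Cauchy-Schwarz: } k\gamma \le w_k\cdot w^{*} \le \|w_k\|\,\|w^{*}\| \le \sqrt{k}\,R
  \;\Rightarrow\; k \le \left(\frac{R}{\gamma}\right)^{2}.
\end{aligned}
```

Step (2) uses $y_i^2 = 1$ and the fact that an update only happens on a mistake, where $y_i\,(w_{k-1}\cdot x_i) \le 0$. Because the number of mistakes is bounded by $(R/\gamma)^2$, the algorithm must stop updating after finitely many steps, which is the "finite time if a solution exists" guarantee.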

Starting point is 00:06:00 I did not, you know, this was just me trying to get under the skin of machine learning to try and understand it for myself. But when I encountered that proof, that's when a light bulb went off, saying, hang on, there are all these beautiful things that we should be communicating to readers. And so that set me off on a journey looking for other theorems and proofs in machine learning that could then form the backbone of a mathematically oriented historical narrative of machine learning. Why do you think it took so long for the tools and techniques of machine learning, the mathematics, which, as you say, is very simple? I mean, I teach it to, you know, my undergraduates and even some high schoolers that I happen to know. So how is it that it took so long for it to develop into this incredibly dominant part of our economy?

Starting point is 00:06:54 Yeah, I mean, so the machine learning and AI that we talk about these days is what is called, you know, deep learning or deep neural networks. And these are very, very massive artificial neural networks, where a neural network is simply, you know, a whole bunch of artificial neurons interconnected together. And an artificial neuron you can think of as a computational unit. Some inputs are coming in. It does the weighted sum of the inputs, and if that sum exceeds a threshold,

Starting point is 00:07:25 it does something on the output side. So it does that kind of computation. That's an artificial neuron, and a whole bunch of these things interconnected together form a neural network. In the 1950s, when they started, the only way we could do anything with these networks was if they were single-layer neural networks,

Starting point is 00:07:45 which meant that inputs are coming in, the neurons are there, and they do the computation and produce the output. Just one layer of neurons. The training algorithms that people like Frank Rosenblatt had developed, or even somebody like Bernie Widrow, who developed the least mean squares algorithm, only worked for single-layer neural networks. And the moment you put another layer after the input layer, the training algorithm didn't work. Essentially, the moment you had multi-layer neural networks, the algorithm that they had to train the network was ineffective.
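The least mean squares (LMS, or Widrow-Hoff) rule mentioned above trains a single linear unit by nudging its weights against the error on each example, one example at a time. A minimal sketch, assuming one linear neuron, a fixed learning rate `eta`, and noiseless synthetic targets; the function name `lms_train` and the data are hypothetical.

```python
import numpy as np

def lms_train(X, d, eta=0.05, epochs=200):
    """Widrow-Hoff LMS: adjust w so that the linear output w . x approximates target d."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            error = target - np.dot(w, x)  # error of the linear (pre-threshold) output
            w += eta * error * x          # stochastic gradient step on error**2 / 2
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
d = X @ true_w  # targets generated by a known linear rule (assumption for the demo)
w = lms_train(X, d)
print(np.round(w, 3))
```

In a thresholded unit like Widrow's ADALINE, the output would be passed through a step function for classification, but training uses the linear output, which is what makes the error differentiable and the update this simple.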

Starting point is 00:08:21 And so that was, in the beginning, a stumbling block. So in the 1960s, people kind of realized mathematically that these single-layer neural networks actually were good at what they were trying to do, which was linear classification. But they were really no good for anything that involved finding a nonlinear boundary between two classes of data. And towards the end of the 1960s, Marvin Minsky and Seymour Papert wrote this amazing book called Perceptrons

Starting point is 00:08:50 in honor of Frank Rosenblatt, who had developed the first neural networks. And in that book, they had a very elegant proof for why the perceptron would converge to a solution, but they also had another proof which showed that if the solution involved a non-linear boundary,

Starting point is 00:09:10 then the perceptron would fail at very, very simple tasks. And that kind of put a big damper on research, because people thought, oh, it can't even solve this problem, which was literally just four data points arranged on the xy-plane such that a single straight line could not separate the circles from the triangles, the circles and triangles being two different kinds of data. And the other thing Minsky and Papert did was they kind of insinuated, without, you know,

Starting point is 00:09:41 mathematical proof, that even if you have multilayer neural networks, they would still not be able to solve these simple nonlinear problems. It was not a proof, which is very obvious now because, of course, these things solve nonlinear problems. But at the time, people kind of took that at face value, and research interest in this topic dried up, and funding dried up. So neural network research kind of fell off a cliff during the 1970s. That was the first AI winter.
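The standard instance of the four-point problem described above is XOR (exclusive-or). A minimal sketch showing both halves of the story: the single-layer perceptron rule never completes a mistake-free pass on XOR, while a two-layer network of threshold units handles it easily. The hidden-layer weights here are hand-picked for illustration (an assumption, not learned), and the function names are hypothetical.

```python
import numpy as np

# XOR: label +1 when exactly one coordinate is 1, otherwise -1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

def perceptron_epochs_with_errors(X, y, epochs=1000):
    """Run the perceptron rule; count how many epochs still contained a mistake."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(3)
    bad_epochs = 0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                errors += 1
        bad_epochs += int(errors > 0)
    return bad_epochs

def step(z):
    return (z > 0).astype(float)

def two_layer_xor(X):
    """Hand-wired two-layer threshold network: XOR = OR and not AND."""
    W1 = np.array([[1.0, 1.0],    # hidden unit 1 computes OR
                   [1.0, 1.0]])   # hidden unit 2 computes AND
    b1 = np.array([-0.5, -1.5])
    h = step(X @ W1.T + b1)
    w2 = np.array([1.0, -1.0])    # fire when OR is on but AND is off
    b2 = -0.5
    return np.where(h @ w2 + b2 > 0, 1, -1)

print(perceptron_epochs_with_errors(X, y))  # prints 1000: no epoch is mistake-free
print(two_layer_xor(X))
```

A mistake-free epoch would mean the current weights separate all four points, which is impossible for XOR, so every epoch contains at least one update. The hidden layer re-represents the inputs so that a single line suffices at the output.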

Starting point is 00:10:08 But there were other kinds of machine learning going on: non-neural-network machine learning techniques like the nearest neighbor algorithm, which was really popular for a long time. There were, you know, Bayesian classifiers. All of this stuff was getting developed and studied, and support vector machines came about a bit later. Neural networks didn't really take off again until, let's say, the next phase, which was the 80s, when people like Hinton and John Hopfield, both of whom got the Nobel last year

Starting point is 00:10:39 for their work during the 80s. They kind of reinvigrated interest in these neural networks. Hopfield designed what are now called Hopfield networks. They're not used today for what we are doing with AI, but they were a big deal in the early 1980s. And then Hinton, along with David Rumelhart and Ronald Williams, wrote the first paper on the backpropagation algorithm, or at least they put everything together to show how a deep neural network

Starting point is 00:11:11 could be trained using something called the backpropagation algorithm. Until then, we just didn't know how to train these networks. So there was a huge gap between the early neural networks of the late 50s and the mid 80s. But even once we figured out how to train these networks, it was still not enough, because these networks are extremely data-hungry, right? They require a lot of data to learn about the patterns that exist in data. We just did not have the data in the 80s.
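Backpropagation is the chain rule applied layer by layer, from the output back toward the input. A minimal NumPy sketch for a network with one hidden layer of tanh units and a squared-error loss; the 2-4-1 architecture, the initialization scale, the learning rate, and all names are illustrative assumptions. It first checks one analytic gradient against a finite-difference estimate, then takes gradient-descent steps on the XOR data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])  # XOR targets

# Parameters of a 2-4-1 network (sizes chosen for illustration).
params = {
    "W1": rng.normal(scale=0.5, size=(2, 4)), "b1": np.zeros(4),
    "W2": rng.normal(scale=0.5, size=(4, 1)), "b2": np.zeros(1),
}

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])  # hidden layer
    out = h @ p["W2"] + p["b2"]         # linear output
    return h, out

def loss(p):
    _, out = forward(p, X)
    return 0.5 * np.mean(np.sum((out - t) ** 2, axis=1))

def backprop(p):
    """Gradients of loss(p) for every parameter, via the chain rule."""
    h, out = forward(p, X)
    d_out = (out - t) / X.shape[0]      # dL/d(out)
    grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0)}
    d_h = d_out @ p["W2"].T             # send the error back to the hidden layer
    d_pre = d_h * (1.0 - h ** 2)        # tanh'(z) = 1 - tanh(z)**2
    grads["W1"] = X.T @ d_pre
    grads["b1"] = d_pre.sum(axis=0)
    return grads

# Finite-difference check on one weight: the two numbers should agree closely.
eps = 1e-6
g = backprop(params)["W1"][0, 0]
params["W1"][0, 0] += eps
up = loss(params)
params["W1"][0, 0] -= 2 * eps
down = loss(params)
params["W1"][0, 0] += eps  # restore the original weight
numeric = (up - down) / (2 * eps)

initial = loss(params)
for _ in range(3000):
    grads = backprop(params)
    for k in params:
        params[k] -= 0.1 * grads[k]  # plain gradient descent
trained = loss(params)
print(g, numeric, initial, trained)
```

Whether this small network fully solves XOR depends on the random initialization; the point of the sketch is that the layer-by-layer gradients match a brute-force numerical estimate and that following them lowers the loss.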

Starting point is 00:11:45 And the other thing is that they're also very compute-hungry. You need a lot of computing power to train these networks, and of course we just didn't have that either in the 80s. So even though, in the late 80s and early 90s, we had these things called convolutional neural networks that Yann LeCun and his team had developed, again, they went nowhere because of the lack of data and the lack of compute. And traditional machine learning methods continued to flourish.

Starting point is 00:12:11 You know, throughout the 90s and early 2000s, we had support vector machines, which were a big deal. And it was really the internet, the availability of extremely large-scale data (you could essentially use the internet to collect all that data, images or text or whatever), and the realization that you could use the computing power that had been developed in the form of graphics processing units, GPUs. Everyone knows about GPUs today as being the backbone of what's happening with AI. But really, these things were developed for video gaming. So, you know, when you think about what video games need to do, they need to refresh your screen at a very fast rate. And the screen is essentially a matrix of numbers, the pixel values, right? So GPUs are very, very good at manipulating matrices of numbers in order to, you know, refresh the screen

Starting point is 00:13:11 for the purposes of running a video game. And people like Hinton and many others realized that, OK, we can use these GPUs to do matrix manipulations, which are also the backbone of machine learning. So when a machine learning algorithm transforms an input vector into an output vector, inside the black box it's essentially matrix manipulations that are happening.
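To make "inside the black box, it's essentially matrix manipulations" concrete: one fully connected layer maps an input vector to an output vector with a matrix-vector product plus a nonlinearity, and stacking many inputs as rows turns the same layer into a single matrix-matrix product, exactly the kind of bulk arithmetic GPUs are built for. A minimal NumPy sketch; the layer sizes and the ReLU nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, batch = 8, 3, 5

W = rng.normal(size=(d_out, d_in))  # weights: one row per output neuron
b = rng.normal(size=d_out)          # one bias per output neuron

def layer(x):
    """Single dense layer: weighted sums of the inputs, then a ReLU nonlinearity."""
    return np.maximum(0.0, W @ x + b)

# One input vector in, one output vector out.
x = rng.normal(size=d_in)
y_single = layer(x)

# A whole batch at once: rows are inputs, so the same layer becomes X @ W.T + b.
X = rng.normal(size=(batch, d_in))
Y_batch = np.maximum(0.0, X @ W.T + b)

# The batched matrix product gives the same answer as looping over the rows.
assert np.allclose(Y_batch, np.stack([layer(row) for row in X]))
print(y_single.shape, Y_batch.shape)
```

Each row of `W` is one artificial neuron's weighted sum, so a deep network is, computationally, a chain of these products separated by simple elementwise nonlinearities.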

Starting point is 00:13:40 So it was a combination of having enough compute, in the form of GPUs, and having the data, that made things begin to change dramatically in the late 2000s and early 2010s.

Starting point is 00:15:21 And we'll get back to the historical overview, and even some of the nuts and bolts of how a perceptron works and some of the matrix algebra. You know, it's remarkable. There's a famous quote attributed to Stephen Hawking that every equation in your book cuts the readership in half.

Starting point is 00:15:46 But that's true. I shouldn't have even read this, because it's got over, you know, 400 equations and incredibly detailed illustrations. It's sort of a hybrid between a textbook and a thriller, a historical thriller. I just think you're to be congratulated for doing it. I listened to it, and I don't know if I recommend the audiobook over the printed book. I really love the printed book. I'm actually giving it to one of my kids, who's

Starting point is 00:16:16 very, very precocious and wants to learn calculus. I figured maybe he can learn calculus from the machine learning that you describe in this book. But you mentioned, you know, this kind of... to paraphrase Marc Andreessen, AI is eating software and software is eating the world.

Starting point is 00:17:04 I'm going to talk about this phenomenon, which I've done a little bit of research on for fun for the podcast. It's called lock-in. And I'm not sure if you're familiar with it, but I'll just describe what it is. It's the phenomenon by which an early technology becomes super dominant, cannibalizes everything that came before it because it enables some new efficiency

Starting point is 00:17:27 or new capability that heretofore didn't exist. And, you know, there are a couple of classic examples. One is the QWERTY keyboard, which is not optimal. It's not efficient from a human, word-frequency, typing perspective. But it was invented because the early typewriters had this problem that the keys, the mechanical hammers,

Starting point is 00:17:51 would jam if keys next to each other were struck in quick succession. So they wanted to space letters apart so that they wouldn't be pressed at the same time and you wouldn't have this lockup, not lock-in, but lockup. Another example is the quality of the Hubble Deep Field image.

Starting point is 00:18:09 It's great, it's breathtaking, but it could have been, you know, as good as the Webb telescope images, which are, you know, 10 times better, if not for the fact that the backside of a horse is about a meter across. So when the Romans designed chariots to be pulled by two horses,

Starting point is 00:18:27 that was set by the width of the horses' rear ends. And because of that, the roads, and the train tracks that later took precedence over the roads, had a width of about, you know, two to four meters, to accommodate two chariots going back and forth. And because of that,

Starting point is 00:18:44 and because the space shuttle's boosters were built in Utah and the launches were in Florida, they had to transport these massive rockets by rail all the way from Utah to Florida, which meant they had to go through train tunnels, which meant they couldn't be bigger than a certain diameter, which meant that the specific impulse,

Starting point is 00:19:05 the thrust, couldn't be above a certain amount, which meant they couldn't get to a high enough altitude to have taken a better image. Okay. These are examples of lock-in, where some early technology becomes established and basically dooms the future to this, you know,

Starting point is 00:19:20 kind of irrevocable prison that it can't escape from. And I'm wondering about the success of this inflection point with LLMs plus GPUs; I'm worried it's another type of lock-in. And as successful as it is, I'm worried that we won't get

Starting point is 00:19:36 the things that I'm most interested in, which are, you know, new laws of physics and new descriptions of mathematical reality, et cetera. Do you worry about the success, not the failure, not the AI winters and stuff, but do you worry about the summers being so bountiful that they will crowd out essentially any competing and possibly better technology? Yes, I think you're spot on, because here the lock-in, weirdly, is the

Starting point is 00:20:04 incredible amount of data that we have been able to scrape off the internet, right? And also the presence of GPUs. Now, the GPUs, one can argue, are just a computing element, which haven't necessarily locked us in. But I think this LLM revolution has been made possible because of this extraordinary amount of data on the Internet, right? And we have managed to somehow create these models that are learning about, you know, the knowledge

Starting point is 00:20:39 and the sort of syntax of human written language. And kind of, it's an intelligence that is imposed from the top down. These machines are not learning things from the ground up the way, let's say, humans do or animals do. And our general intelligence very much is a property of the fact that nervous systems have evolved over evolutionary time and nervous systems have encountered things in their environment and have enabled the development of, you know,

Starting point is 00:21:08 brain structures and algorithms that operate in those brains from the ground up. And I hadn't thought of it in the way that you're framing it, but it makes complete sense: the economic incentives to succeed in this arena are now so high, and so much money is being poured into building these LLMs, that they're getting bigger and bigger. People have bought into the argument that scaling up is going to unlock more and more, quote-unquote, intelligent behaviors. So, yes, at this moment in time, we are certainly locked into this particular form of, you know, AI, so much so that I'm sure there are many, many smart people who otherwise could have

Starting point is 00:22:05 been doing other kinds of research into, you know, different kinds of models that would potentially learn how to generalize better, be much more sample efficient like our brains are, use much less energy than these LLMs do, et cetera. And all of those areas of research have probably been kind of starved of funding because of the money that's going into developing LLMs. So yeah, it's entirely possible that we are in a phase of lock-in because of this current trend. And, you know, as I said before, to me, the greatest thing would be to get, you know, a theory of quantum gravity that no human has been able to come up with. And I want to draw your attention to a statement made by a different Nobel Prize winner.

Starting point is 00:22:53 It's Albert Einstein, who said that his greatest thought, his happiest thought, was that an observer in free fall would experience no gravitational force. And he literally said it gave him tingles up his spine, basically. And that allowed him to arrive at the, you know,

Starting point is 00:23:13 principle of equivalence and general relativity and so forth that we credit to him. But I wonder, you know, can a computer experience a tingle down its spine? Conversely, can it experience pain? Can it have a happiest thought? And if not, what does that portend for its ability to create new laws of physics that humans are incapable of creating with this, you know, three-pound neural network that we have in our brains? To what extent, in your opinion, are embodiment and, you know, uniquely human sensations, what we call qualia, important for making breakthroughs that really matter to scientists like me?

Starting point is 00:23:53 Oh, that's a huge question, right? I think it comes down, at some very basic level, to what we think human consciousness is and whether our intelligence and our consciousness can be thought of in materialist terms. So there are people who take the view that everything about our consciousness and intelligence can eventually be explained in computational terms, and that this computation is substrate independent. If that's the case, if everything that we are, and it's a big, big if, can be boiled down to computational principles, substrate independent computational

Starting point is 00:24:35 principles, then I don't see any in-principle reason why machines cannot be built to perform those very computations and have the same kinds of experiences, et cetera, that we are privy to, right? But there's a big if there. And that's a huge one. Can LLMs have those? Again, I mean, a lot of this comes down to agreeing or disagreeing upon what we think is happening within us. That's right.

Starting point is 00:25:02 Yeah, I almost thought as I was reading this, I hope Anil writes a book, Why Humans Learn. Yes. I mean, that's a big question right now, even for machine learning people and computational neuroscientists. We don't have full-fledged answers to, you know, why we do what we do. Our intelligence, what kinds of algorithms are running in our brains, for instance: is everything ultimately describable in terms of computation? Even that question is not answered. The answer to your overarching questions about whether machines can eventually feel and have

Starting point is 00:25:34 feelings the way we do hinges upon answers to questions about our own intelligence and our own consciousness. If everything that we are can be talked of in materialistic terms, can be reduced to the workings of matter, and if, you know, all of what we are is somehow captured by computations, and those computations are substrate independent, so that it doesn't require biology and could happen in silicon, then yes, why not? And embodiment would be just another axis on which these machines would function. But without knowing the answers to questions about human intelligence and consciousness, it's really hard to answer

Starting point is 00:26:16 what will happen with machines. I don't think we are in a position right now to definitively say that we will be able to build machines that will feel and have conscious experiences. It all depends on our definition of consciousness. And then there are people even today who would say that, yes, machines are very definitively going to be conscious.

Starting point is 00:26:35 And you'll find as many people who will say, no, that's absolutely not possible. So I think it's an open question. As for whether conscious experiences are necessary for the kind of breakthroughs that we're talking about, you know, coming up with the theory of relativity without having any prior knowledge of that stuff, I'm not so sure consciousness is necessary there. To me, they're orthogonal problems: intelligence and consciousness can vary along orthogonal axes. So you could potentially have a system that is capable

Starting point is 00:27:12 of coming up with something new, but has no, quote-unquote, conscious experience of it, hence no joy, no pain, whatever. What do you think are the most underappreciated and the most overemphasized aspects of machine learning that you've encountered? Underappreciated. I think for me, after having written this book about the mathematics of machine learning, the thing that I find most fascinating, and that is really underappreciated,

Starting point is 00:27:44 and I think it's hard for someone who hasn't encountered the math to even appreciate, is the high-dimensional mathematical spaces in which these machines operate, right? I mean, these machines are doing their thing in vector spaces. And it's extraordinary when you look at the dimensionality of the mathematical spaces in which these calculations are happening, and at the properties of these spaces that lead to the properties of these machine learning algorithms. That is really fascinating. But I don't know how something like that could be appreciated or even, you know, communicated without explaining a whole bunch of stuff

Starting point is 00:28:22 about vector spaces and things like that. So there is something very beautiful happening in these mathematical spaces. And it's entirely possible that our brains are also functioning similarly, you know, navigating high-dimensional spaces to do the things that they do. And to me, that's the most fascinating part. And yeah, you mentioned this phenomenon of emergence, which is, you know, like what the Supreme Court in America said about pornography: you know it when you see it, but it's very hard to define how these phenomena really come about. It really was not clear to me until I read your book, in terms of, you know, the details of how these algorithms work, but also the import of the training data

Starting point is 00:29:07 and how crucially important it is. You go over some of the restricted training data, you know, the U.S. Postal Service data that was used for recognizing numbers and so forth. And, you know, we don't look at the post office as the model of efficiency, but it does this incredibly well, with optical character recognition and all sorts of other techniques that were pioneered, as you mention in the book, in other countries as well. But it seems to me a very strange situation to be in: we've ingested most of the Internet. You know, we have these models with huge numbers of tokens and parameters that you could put, you know, on your local desktop

Starting point is 00:29:49 and soon on your phone, which will be not far behind, if it's not already here. But what is left to be ingested? You know, when I talked to Yann LeCun last year, he was saying, well, a cat can take in, you know, four terabytes of data per second. But if these algorithms are waiting for the next, you know, Avatar movie to come out so they can ingest more language and more data into their training sets, if that's even allowed, it seems to me like we're just going to slowly, asymptotically converge to a point where everything has the same information, because there's only one internet out there. And yes, it's hard to

Starting point is 00:30:28 characterize it all. Could it be that the very feature that enabled the success of these models will be their downfall, because eventually there'll be no advantage? Every model will have the same data, since they all have access to the same internet, so there'll be no advantage to any of them, and they should all produce the same outputs given some input. So what do you make of this, again, lock-in phenomenon, where having all this training data

Starting point is 00:30:53 was crucial, but now we're kind of saturated, and maybe that means we'll asymptotically improve only very slowly in the future. Entirely possible. Because the lack of sort of freely available data is very obvious now. I think all that has been already scraped and taken in. There's a lot of data still locked

Starting point is 00:31:13 in behind firewalls within institutions and corporations and private hands. And that's actually very, very high quality data as opposed to the stuff that we have scraped off the internet, which is relatively low quality data.

Starting point is 00:31:29 But there's a lot of structured data that exists in company databases and institutions. And that, there is still value to be unlocked there. There's also this idea that we could have synthetic data generation.
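The risk with synthetic data generation can be seen in a toy simulation: if each generation of a model is trained only on samples drawn from the previous generation, estimation errors compound instead of averaging out. The sketch below repeatedly fits a one-dimensional Gaussian to samples and then draws the next training set from that fit, a minimal version of what researchers have called model collapse. It is an illustrative assumption, not a model of how LLMs are actually trained; the sample size, generation count, and the function name `self_training_spread` are arbitrary, hypothetical choices.

```python
import numpy as np

def self_training_spread(n_samples=20, generations=200, seed=0):
    """Fit a Gaussian, sample from the fit, refit, and repeat.

    Returns the fitted standard deviation at every generation.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(loc=0.0, scale=1.0, size=n_samples)  # original "human" data: N(0, 1)
    sigmas = []
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()  # maximum-likelihood fit to the current data
        sigmas.append(sigma)
        # The next generation sees only synthetic samples from the fitted model.
        data = rng.normal(loc=mu, scale=sigma, size=n_samples)
    return np.array(sigmas)

sigmas = self_training_spread()
print(sigmas[0], sigmas[-1])
```

With small samples, the fitted spread tends to drift toward zero over many generations, so the tails of the original distribution are the first thing lost; mixing fresh real data back in at each generation is the obvious countermeasure.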

Starting point is 00:31:45 Now, that has the danger that we will end up with, you know, AIs generating data and then training on it. There's a very interesting, very evocative phrase that someone used, I forget who it was: they said that eventually these models will choke on

Starting point is 00:32:01 their own exhaust, right?

Starting point is 00:32:27 I call it... sorry to toot my own horn about it. You remember the mad cow disease of the 1990s in the UK, when basically all meat was tainted because cows were fed cows.
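The "choke on their own exhaust" worry, which Brian compares to mad cow disease, has a well-known toy analogue, sketched here under loud assumptions: the "model" is just a Gaussian (a mean and a standard deviation), each generation is refit only on samples drawn from the previous generation's fit, and the sample size (10) and number of generations (500) are arbitrary illustrative choices. This is not how LLMs are trained; it only shows how repeatedly learning from your own output can narrow what a model represents.

```python
import numpy as np

def refit_on_own_samples(n_samples=10, generations=500, seed=0):
    """Toy 'model collapse' loop (an illustrative assumption, not an LLM pipeline).

    Generation 0 is the real data distribution, N(0, 1). Each later generation
    draws n_samples points from the previous generation's fitted Gaussian and
    refits the mean and (maximum-likelihood) standard deviation on those
    synthetic points alone. Returns the fitted standard deviation per generation.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        synthetic = rng.normal(mu, sigma, size=n_samples)  # data generated by the current model
        mu, sigma = synthetic.mean(), synthetic.std()       # next model trained only on that data
        history.append(float(sigma))
    return history

stds = refit_on_own_samples()
print(f"std at generation 0: {stds[0]:.3f}, after {len(stds) - 1} generations: {stds[-1]:.3g}")
```

With so few samples per generation, the fitted spread tends to shrink from one generation to the next, so the tails of the original distribution are lost first and the "model" ends up far narrower than the real data it started from.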

Starting point is 00:32:45 So I call it mad bot disease, you know, where they're taking in their own data and then, you know, using it to regurgitate something new. But I like the exhaust as well, but go on. Yeah. No, so, and this is a valid concern, right? People have this concern that maybe we are saturating. But it's also true that even if we just continue the same paradigm of training on more and more data, there is still very,

Starting point is 00:33:09 very high quality data that is available, and we just haven't used it. And it's possible we may not be able to use it for publicly usable LLMs, because this will be copyrighted data and private data, and there will be all sorts of concerns about, you know, the privacy of the people whose data it is, etc. So I'm not sure it can be unlocked that easily, but there is good data out there. My sense is that, and Yann LeCun is right about this, there are ways in which animals and humans learn where we're doing something very different from LLMs. You know, even though as a child or, you know, as a cat, we encounter a lot of data, there's a lot of structure in the environment that we are encountering, and there is something about the algorithms that we have

Starting point is 00:34:05 that are operating inside our brains that are much more sample efficient. We just don't require that many examples of some instance of a pattern to learn what it is. And then we are able to generalize so much more easily, right? We learn abstractions about some problem, and then we use the learned abstractions to solve a problem in a completely different domain. Machine learning algorithms are not there yet; even these LLMs can't generalize the way we do.

Starting point is 00:34:36 So my suspicion is that even if LLMs and the current approach saturate on this data problem, the breakthroughs might come in the form of new algorithms that learn very differently. And they learn continually, right? Current machine learning models, especially LLMs, don't have this feature of continual learning. You train a model and then you freeze that model. That's it. The weights of the model don't change after that. You can use it as much as you want, but it is what it is.

Starting point is 00:35:06 And you get a snapshot in time of the knowledge that it has ingested. It's not a continually learning machine. We, of course, are learning all the time. And when we learn new things, we don't mess up things we have already learned or forget what we learned before; machine learning algorithms are not like that right now. So once somebody figures out how to come up with machine learning algorithms that are capable of continual learning, are more sample efficient, energy efficient, et cetera, and are able to generalize better, then the data problem

Starting point is 00:35:44 will not be as acute. And what kind of alternatives, if you had to take, you know, the Schrödinger versus Heisenberg approach from your previous explorations in physics in Through Two Doors at Once? What sort of competitors to GPUs plus LLMs are there? Even if it's, you know, kind of the 98-pound weakling versus the behemoth. What's the David to the LLM-plus-GPU Goliath right now? Yeah. I'm not so sure the GPU part is really the issue, because any kind of computation happening in these machine learning models will finally involve matrix manipulations.

Starting point is 00:36:27 So the GPU is going to be important. Whether you require as many GPUs for other algorithms, that's a different question. We don't know the answer. Let me just break in. What about these tensor processing units, TPUs? What's their fundamental advantage, or the comparative difference between those and GPUs? I lack knowledge there about the exact differences between TPUs and GPUs. I mean, they're still doing matrix manipulations,

Starting point is 00:36:53 but, you know, tensors are obviously a more general form of matrices, so they're manipulating these more general forms. I don't know the exact details about how a TPU works. So I would believe in practice that... yes, I'm also not, you know, incredibly familiar with it, but Google has adopted the TPU approach and has used no NVIDIA GPUs, whereas NVIDIA is used by almost everybody,

Starting point is 00:37:18 and it's the most valuable company in the world, with a stock market capitalization of all of the UK and India and Germany put together. So it's kind of astonishing that Google could be considered this kind of David, as I said before. But okay, so then in terms of alternative models, alternative applications of ML: I've heard of things like Groq, with a Q, and other, you know, kind of neuromorphic approaches that are not actual LLMs. What are some of the alternative algorithms that run on some form of matrix-manipulating computational device?

Starting point is 00:37:56 I think in terms of making these things more energy efficient: right now, when you look at these artificial neurons, they are of course being simulated in software. So you have inputs coming into a neuron, it does some computation, and based on the computation, it produces an output. But in the context of a software simulation, the neuron has some real-valued output that is always present. If you were to then implement that in hardware, that would be the equivalent of a neuron constantly having a voltage signal on its output side, which means it's consuming energy all the time, whereas our brains have what are called spiking neurons,

Starting point is 00:38:45 where our neurons essentially collect information that comes in through the dendrites, they do some computation, and every so often, very infrequently, they'll fire. An occasional signal will go out on the axon in the form of spike trains, voltage spike trains. So a biological neuron, for the most part, is very silent. It's really not producing any output. It's just doing the computations, but staying silent. And when it does produce a signal, it's a spike train, which consumes very little energy. And we are just now beginning to figure out how to build artificial neural networks where the individual neurons are spiking neurons.
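The contrast between an always-on, real-valued output and a mostly silent neuron that only occasionally spikes can be made concrete with a leaky integrate-and-fire neuron, a standard textbook simplification of spiking behavior. The sketch below is a discrete-time toy; the threshold, leak factor, and reset value are illustrative assumptions, and real neuromorphic hardware and spiking-network training methods are considerably more involved.

```python
def lif_spike_times(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron (illustrative parameters).

    input_current: one input value per time step.
    Returns the time steps at which the neuron emits a spike; at every other
    step its output is silent (zero).
    """
    v = reset                     # membrane potential
    spikes = []
    for t, i_t in enumerate(input_current):
        v = leak * v + i_t        # decay toward rest, then integrate the new input
        if v >= threshold:        # fire only when the potential crosses threshold
            spikes.append(t)
            v = reset             # reset after each spike
    return spikes

print(lif_spike_times([0.3] * 12))   # [3, 7, 11]
print(lif_spike_times([0.05] * 12))  # []
```

With a steady, moderate input the neuron is silent on three steps out of four and emits only brief events; with a weak input it never fires at all. A conventional artificial neuron would instead report a real-valued number at every one of the twelve steps.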

Starting point is 00:39:27 And then once we have figured out how to train large artificial neural networks made of spiking neurons, if we then implement them in hardware through these so-called neuromorphic chips, then we can potentially have very energy-efficient neural networks. Maybe a couple of orders of magnitude or more in terms of energy efficiency. So that's definitely one thing to look out for. You could still build LLMs using that architecture, but they would be significantly lower in energy consumption. But we still haven't cracked the problem of how to

Starting point is 00:40:05 build these things at scale and train them at scale. So that is one big research area. The thing that I have been most intrigued by are efforts to get machine learning models to learn about the environment in which they are functioning, essentially learning models of the world in the form of abstractions. They build abstract models of the world and then use those abstract models to make predictions about what's happening outside. And this whole approach is how we think our brains work.

Starting point is 00:40:42 Our brains, we think, work by constructing world models and situating models of ourselves as agents inside those world models. And then anytime we need to perceive something, our brains are essentially using these world models to hypothesize about what might be out there that is causing the sensations that fall on our eyes or on our ears. And it's these hypotheses that we perceive as things that are out there. And then the brain has to do a whole bunch of processing over many, many layers

Starting point is 00:41:15 in order to make sure that what it is hypothesizing is out there is actually out there, by making sure that the predictions it's making about the sensory consequences of whatever might be outside match what was received by our senses. So there's a whole bunch of error processing going on. But fundamentally, it has built these very sophisticated, complicated, and abstract world models. And AIs that are beginning to do that might show us the way towards functioning more like the human brain than current LLMs. So they also would potentially have the capacity to be more sample efficient, requiring less data.

Starting point is 00:41:55 Because when you think about our cognition and our cognitive capacity, when you have a problem, you're not constantly waiting for external sensory data. You are capable of running internal simulations, counterfactuals, right? So we are essentially generating so much data internally for our own neural networks. So it's entirely possible that if we can figure out how machine learning models can do the same, they could also become much more data efficient. So that's something to watch out for, because those things are going to do things differently than LLMs. Yes. Right.

Starting point is 00:42:33 Well, I promised that we would judge the cover of the book, and now we'll do that. So we have a special jingle, which is generated by machine learning techniques. We'll insert it here. We're going to judge a book by its cover. Hey, book lovers. We're judging books by the covers. We know we're not supposed to do it.

Starting point is 00:42:52 But it's the impossible. There's nothing to it. Let's take a look and judge some books. All right. So, Anil, take us through the title of the book, the subtitle of the book, and the cover artwork, please. So the title is, of course, Why Machines Learn, and that was just a title, strangely enough,

Starting point is 00:43:11 that just popped into my head when I was first conceiving of the book. It came about because I was learning about a particular algorithm called the Perceptron learning algorithm, which is used for training single-layer neural networks. And as I was learning the math of why the algorithm works,

Starting point is 00:43:33 it was the beauty of the math that made me think, oh, there is a book to be written about why all these algorithms do what they do, from the perspective of the mathematics. So the why was just my writerly conceit, right? You could just as easily have said how, and it would have been a fine title. But the why seems to grab people's attention, right or wrong. And in my mind, it was more why than how.

Starting point is 00:44:04 And the subtitle, again, is just elaborating on this exact idea that there is a lot of very beautiful and relatively simple mathematics underlying this extremely powerful moment in time that we find ourselves in. And it's maybe high school or first-year undergraduate level: linear algebra, calculus, probability and statistics, and some optimization techniques, right? It's not at all the kind of math that most graduate students in physics or electrical engineering would do. They would do much more sophisticated math than what is required to understand this. You know, again, there is a simplicity in the math for understanding how these machines, or why these machines, do what they do, but it's a very different level of math that you need

Starting point is 00:44:58 if you are the one designing these algorithms. So that is a different ballgame, right? The cover art on the book is a variation of an M.C. Escher etching, and that's completely due to my publishers. I think it's an M.C. Escher etching called Three Spheres, and they've gone ahead and added a fourth one and made it colorful. Yes, it's sort of mesmerizing and kind of reminiscent of other curvilinear shapes, like a 3D-printed brain my kid made me.

Starting point is 00:45:31 Nice. In the book, you emphasize something that wasn't obvious to me: if I were to set out on a journey to recreate machine learning techniques, I might stop because of this problem of what's called overparameterization. And you make the case in the book that, in classical statistical analysis, if you have an overparameterized model, you should overfit the data, and therefore your model should fail to be representative of it. But deep learning seems to not only succeed but thrive on having more and more parameters.

Starting point is 00:46:11 I mean, every week we're getting inundated with new models and foundation models with this or that number of billions of parameters. And soon it'll be trillions, I'm convinced. So what's the least hand-wavy explanation for how this even works at all, given that in classical statistics, overparameterization should kill your reliability and therefore make the model completely worthless? In fact, it's one of the most useful tools ever created by humans.

Starting point is 00:46:41 So I think mathematically we are still trying to figure it out, right? You're right. The old statistical learning and machine learning techniques made it very, very clear that if you overparameterize your model, you will end up memorizing your training data, or overfitting it. And then when you encounter new data, you won't be able to generalize to that new data. And so people used to make sure that their models were optimally parameterized so that they were not overfitting. And then along come neural networks. And we noticed this empirically. So this is not something that

Starting point is 00:47:18 was worked out theoretically. People just noticed that if they made the networks bigger and bigger, A, they worked better, and B, these things were not overfitting. The consensus... well, I don't think there's any consensus at this point about why; it is still an active area of research. Why do deep neural networks, despite being heavily

Starting point is 00:47:42 overparameterized, generalize as well as they do, and why don't they overfit? There is some thought that there might be some implicit regularization going on in these networks, that they do end up pruning themselves so that they're not as heavily parameterized as they seem at first blush. But still, these networks have brought us into a regime of parameterization that was not the regime in which traditional machine learning functions. And what has been very interesting is not that people have figured out why neural networks do what they do. It's that they've started noticing that other traditional machine learning techniques, like kernel methods, support vector machines combined with kernel methods, and others, also had hints of the same behavior,

Starting point is 00:48:41 but they were never really pushed, because early on, people just assumed that overparameterization was not to be done. And now, if you go look at some early machine learning papers, there are hints that they were seeing this behavior in non-neural-network machine learning methods. But it was never explored. So what artificial neural networks have done is open our eyes to the fact that there is this completely new regime of operation, which potentially even traditional machine learning methods could benefit from.
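One simple, well-understood case of the "implicit regularization" idea mentioned above can be shown with a linear model rather than a deep network. When there are far more parameters than data points, infinitely many weight vectors fit the training data exactly, yet plain gradient descent started from zero weights converges to one particular fit: the one with the smallest norm. The sketch assumes 20 random data points and 200 parameters purely for illustration; it does not explain why deep networks generalize, which, as Anil says, is still an open question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_params = 20, 200               # 10x more parameters than data points
X = rng.normal(size=(n_points, n_params))
y = rng.normal(size=n_points)

# Plain gradient descent on 0.5 * ||X w - y||^2, starting from all-zero weights.
w = np.zeros(n_params)
step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / (largest singular value)^2 keeps GD stable
for _ in range(5000):
    w -= step * X.T @ (X @ w - y)

# For this underdetermined system, lstsq returns the minimum-norm exact fit.
w_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]

print("max training error:", np.abs(X @ w - y).max())
print("distance to minimum-norm fit:", np.linalg.norm(w - w_min_norm))
```

Every update is a combination of the rows of `X`, so the iterates never leave that 20-dimensional subspace, and the only exact fit inside it is the minimum-norm one. The model interpolates its training data perfectly, but the optimizer quietly picks a "simple" solution among the infinitely many that would do so.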

Starting point is 00:49:14 and so now the math is being worked out and there is no clear answer to this yet.

Starting point is 00:49:36 So another feature of this book is the incredible care and diligence with which you describe the nuts and bolts of how this field has come to be so successful, and the mathematics of it. As I said, there are a thousand equations. There are hundreds of illustrations.

Starting point is 00:50:05 There are interviews. It's an incredible book. As I said, it's sort of a hybrid new paradigm, a blend between a textbook and a thriller, you know, a historical thriller, and a kind of modern-day application of these tools. But one of the heroes in the book is a technique called stochastic gradient descent. And I certainly wasn't familiar with it. I knew about gradient descent; I've known about it since the time of Isaac Newton. But the question is how it works so well, given these landscapes that you describe, which we can only visualize in the book as three-dimensional projections of two-dimensional things. How is it possible that in million- or trillion-dimensional landscapes this SGD method works so well? First of all, could you explain it for the audience? And then, how is it that it works so well and has come to be the superhero of ML techniques today?

Starting point is 00:51:06 Yeah, so the high-dimensional landscapes you're referring to are what I call loss landscapes: the error that a network makes, as a function of the network's parameters. So if there were, for instance, one parameter, you would just have a curve, a 1D curve, but if there are two parameters, then you have some sort of surface, a 3D surface. Of course, these things have hundreds of billions, if not, these days, close to a trillion-plus parameters. So the loss function lives in some extremely high-dimensional space, and there are also lots of nonlinearities in the networks. So the shape of the loss landscape

Starting point is 00:51:49 is not convex. It's not some simple bowl-shaped surface where, if you start off at some high point on the surface, you can just do simple gradient descent and be guaranteed of finding the global minimum. And we don't even know if these things have a global minimum. So these are extremely high-dimensional surfaces with lots of hills and valleys. And the weird thing is, if you just did gradient descent, if you just went small step by small step down the loss landscape, trying to find a region of that landscape where the error that the network is making is very low, you might end up getting stuck in

Starting point is 00:52:33 some deep local minimum and never be able to get out of it. So stochastic gradient descent is this idea that you kind of do a drunkard's walk down that slope. You're taking steps not always exactly in the direction of the negative of the gradient; your steps have some stochasticity. And it's that stochasticity that potentially allows you to escape these local minima and end up finding what might be an optimal minimum, even though we don't know if it'll find a global minimum or even if one exists,

Starting point is 00:53:12 but it does end up finding some sort of optimal minimum, which represents a state where the network is making a low enough loss, a low enough error, that it is actually functioning the way you want it to. So it's the stochasticity that seems to be allowing us to navigate this extremely complex loss landscape and escape local minima. And the other thing that, you know, kind of resonates very highly is, of course, the Perceptron.
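In the standard form of stochastic gradient descent, the randomness Anil describes comes from estimating the gradient on a small, randomly chosen mini-batch of training examples instead of the whole data set, so each step points only roughly downhill. The sketch below shows just that mechanism on an assumed toy problem (linear regression on synthetic data, learning rate 0.05, batch size 8) whose loss surface is a simple convex bowl. It therefore illustrates how SGD steps are computed, not how SGD escapes local minima in a real network's non-convex landscape.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, batch_size=8, epochs=50, seed=0):
    """Minibatch SGD on mean squared error for a linear model y ~ X @ w.

    The 'stochastic' part: each step uses the gradient on a random mini-batch,
    a noisy estimate of the full-data gradient.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                     # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)   # gradient of the batch MSE
            w -= lr * grad                             # step against the (noisy) gradient
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)
w_hat = sgd_linear_regression(X, y)
print(w_hat)
```

Each step touches only 8 of the 200 examples, which is what makes SGD affordable when the data set and the parameter count are enormous; the price is a jittery path, which is also the source of the exploratory "drunkard's walk" behavior.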

Starting point is 00:53:44 I think the Perceptron is the, you know, main character energy of this book. Can you give some sort of description, maybe for a layperson, of how these things were conceived of and what they fundamentally do in terms of simplifying these massive data sets into a tractable problem, maybe not always soluble, but at least tractable through very simple mathematics? What is the fundamental job of a perceptron? I viewed it before as sort of this, you know, black box, literally a black box.

Starting point is 00:54:20 But now I see it more as kind of the transistor, the qubit, the element of ML. So can you describe that for the audience, and say whether or not you think we'll still be talking about them and using them 50 years from now? So the perceptron is the name given to the first artificial neural network, right? And it was Frank Rosenblatt who designed this artificial neuron. The artificial neuron is a very, very simplified version of what we think is happening in our biological neurons. Biological neurons have a whole bunch of inputs that come in through the dendrites.

Starting point is 00:54:57 The biological neuron does some computation, and then, based on the results of that computation, it produces an output. And you can think of this same thing now implemented as a piece of software, which is what the perceptron is. Imagine a circular figure, which is the body of the artificial neuron, with inputs coming in. Let's say you have three inputs coming in, X1, X2, and X3. What the artificial neuron does is basically a weighted sum of these inputs. Each of these inputs has associated with it a strength, or a weight: W1 for X1, W2 for X2, and W3 for X3. So it will do a weighted sum of the inputs.

Starting point is 00:55:41 And if the weighted sum exceeds some threshold, it will output a plus one. If the weighted sum is less than the threshold, it outputs a minus one. This was the computational unit, the perceptron. And it was amazing that something this simple could then be used, for instance, to classify images into two sets: images that are cats and images that are dogs, right? So think about 10 by 10 images, for argument's sake, where these images are black and white and they are either images of cats or images of dogs.

Starting point is 00:56:29 we're going to call it plus one. If it's a dog, we're going to call it minus one, right? Now, you take each one of these images. If it's 10 by 10, that means there are 100 pixels. You turn the image into a single vector that is 100 elements long, where each element of that vector represents one pixel of information. You feed these 100 pixels into the perceptron. So now the perceptron instead of having three inputs is going to have 100 inputs because there are 100 pixels coming in. And you're training the perceptron to learn the weights that are necessary in order to take a certain image and output a minus 1 or a plus 1. And as long as these images are separable in 100 dimensional space into cats and dogs, where cats are in one part of the 100 dimensional space and dogs are in another part of

Starting point is 00:57:22 100-dimensional space, and there's a clear gap between the two in this mathematical space, the perceptron will find a plane, a hyperplane, that is capable of separating the dogs from the cats. And then when you have a new image and you want to know whether it's a cat or a dog, all you have to do is take the image, linearize it into the 100 pixels, feed it into the perceptron that you have trained, and it's going to say, oh, this is plus one or minus one. It doesn't know dog from cat.
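The weighted-sum-and-threshold unit and the training procedure described above can be sketched in a few lines. This uses the classic perceptron update (when an example lands on the wrong side of the hyperplane, add the label times the input to the weights), with the threshold folded into a bias term `b`, since "weighted sum exceeds threshold" is the same as "weighted sum minus threshold is positive". The data are an assumption for illustration: random 100-dimensional vectors standing in for flattened 10 by 10 images, labeled +1 ("cat") or -1 ("dog") by a hidden hyperplane, with points too close to it dropped so the two classes are separated by a clear gap.

```python
import numpy as np

def predict(X, w, b):
    """Perceptron unit: +1 if the weighted sum X @ w + b is positive, else -1.
    Here b plays the role of minus the threshold."""
    return np.where(X @ w + b > 0, 1, -1)

def train_perceptron(X, y, max_epochs=1000):
    """Perceptron learning rule.
    X: (n, d) array, one flattened example per row.
    y: (n,) array of labels in {+1, -1}.
    Returns (w, b) defining the separating hyperplane w.x + b = 0."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # on the wrong side of (or on) the hyperplane
                w += yi * xi             # tilt the hyperplane toward the correct side
                b += yi
                mistakes += 1
        if mistakes == 0:                # every example is on its correct side: done
            break
    return w, b

# Hypothetical data: 100 "pixels" per example, labeled by a hidden hyperplane.
rng = np.random.default_rng(0)
hidden = rng.normal(size=100)
hidden /= np.linalg.norm(hidden)
X = rng.normal(size=(500, 100))
scores = X @ hidden
gap = np.abs(scores) > 0.5               # keep only points well away from the boundary
X, y = X[gap], np.where(scores[gap] > 0, 1, -1)

w, b = train_perceptron(X, y)
print("training accuracy:", (predict(X, w, b) == y).mean())
new_image = rng.normal(size=(1, 100))    # an unseen "image"
print("label for new image:", predict(new_image, w, b)[0])
```

The loop is only guaranteed to reach zero mistakes when a separating hyperplane with a gap exists, which is exactly the condition Anil states; on data that are not linearly separable it simply stops after `max_epochs`. As in the transcript, the trained unit knows only which side of the hyperplane a vector falls on, not what a cat or a dog is.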

Starting point is 00:57:49 All it knows is: this side of the hyperplane, I'm going to call cats; that side of the hyperplane, I'm going to call dogs, or whatever. And that was the beginning. Even today's neural networks are just slightly more sophisticated forms of the artificial neurons that Rosenblatt came up with in the 1950s, and they just have additional elements that bring in nonlinearities

Starting point is 00:58:16 and allow you to train multilayered neural networks and things like that. But in essence, they are still simple computational units that are nowhere near as complicated as a biological neuron, and yet they do amazing things because we can interconnect hundreds of billions of these things together. Will they be there

Starting point is 00:58:40 50 years from now? Oh my, that's a... I don't see why not. Because we have an existence proof of a machine that does something really well, which is our brain. And one thing we can definitely say about our brain is that it is made up of a whole bunch of neural networks, right? I mean, there are, you know, 86 billion neurons in there with 100 trillion connections. And even a very simplified model of that is a very complicated neural network. And it's obviously doing amazing things. So there's no reason to think that neural networks won't be around.

Starting point is 00:59:15 But we might come up with ways of interconnecting these neurons that are very different from the ways we do it today. So the architecture of these neural networks might be very different 50 years from now. But the idea that we'll have neural networks, I think that will survive. Oh, yeah. So, I know it's late there, but if you'll indulge me with sort of a two-part question, where the first part of the final question is just a relatively rapid-fire question, which is

Starting point is 01:00:00 Richard Feynman, you probably know, was asked, you know, what is the nature of a scientific model of reality? And he gave an example where, if an alien species looked at the Earth, the planet Earth, with its atmosphere and with the water cycle and so forth, it would,

Starting point is 01:00:20 if it had all the knowledge of the laws of physics, know that we have this phenomenon called rainbows, right? And he basically said, if you understood basic physics, Maxwell's equations, at a high level, as we do, you could make

Starting point is 01:00:36 predictions just from observations of the basic ingredients of a system. I want to ask you, before I turn to the final question, which will also be about this phenomenon: do you think a smart alien species, looking at LLMs plus GPUs plus machine learning plus all the great stuff you write about in this book, could predict that these models would hallucinate and that they'd be sycophantic? And I'll tie that into one of your earlier books in just a second. But do you think it was inevitable, in other words, that these things would have these pernicious, and potentially very dangerous, phenomena of hallucinating? I asked one recently, you know, what books has Brian Keating written? And it said, you know, Losing the Nobel

Starting point is 01:01:20 Prize and Into the Impossible and A Brief History of Time. And it's like, well, that's nice. I wish I had, you know, a couple percent of the book sales of Stephen Hawking, but I don't. So tell me: would an intelligent alien looking at these models be able to predict that they would eventually come to have these pernicious phenomena like hallucination and sycophancy? I think so, assuming that the aliens can look at the math, which, if they are smart, they obviously will be able to.

Starting point is 01:01:53 It's no different from us: if you look at the math, it's very, very obvious why they're going to hallucinate, right? I mean, these next-token prediction machines are essentially probabilistic. They're always trying to generate a probability distribution over their vocabulary to say what the most likely next word is. It's not 100% certain about what has to be produced. And it has learned about patterns that exist in data that is not a definitive amount of data for any particular problem.

Starting point is 01:02:27 And the way these things are constructed, they will always output something that they think is the most likely one, right or wrong. I mean, it's just the nature of the beast. If you look at the math, it's so obvious. You know, I find the word hallucination itself problematic, because the procedure that generates correct answers, or answers that look correct to us, is exactly the same procedure that results in hallucination.
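The point that correct answers and hallucinations come from the same procedure can be seen in the last step of next-token prediction: the model's scores (logits) over its vocabulary are turned into a probability distribution with a softmax, and a token is then chosen from it. The sketch below covers only that final step; the four-word vocabulary and the scores are made up for illustration, whereas a real LLM computes its scores with a large network over a vocabulary of many thousands of tokens.

```python
import numpy as np

def next_token_distribution(logits, temperature=1.0):
    """Softmax: turn raw scores into probabilities that are all positive and sum to 1."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def sample_next_token(vocab, logits, rng, temperature=1.0):
    """Pick the next token at random according to the softmax probabilities."""
    p = next_token_distribution(logits, temperature)
    return vocab[rng.choice(len(vocab), p=p)]

# Hypothetical continuation of "The capital of France is ..."
vocab = ["Paris", "Lyon", "London", "banana"]
logits = [6.0, 3.0, 2.5, -1.0]            # made-up scores
probs = next_token_distribution(logits)
print({tok: round(float(p), 3) for tok, p in zip(vocab, probs)})
rng = np.random.default_rng(0)
print([sample_next_token(vocab, logits, rng) for _ in range(10)])
```

Every token, including the wrong ones, gets a nonzero probability, and the same sampling code runs whether the chosen word turns out to be right or wrong; nothing in this step checks facts. Lowering `temperature` sharpens the distribution toward the top-scoring token but never makes the others impossible.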

Starting point is 01:02:55 So anyone who can peek at the math will immediately say, yes, of course these things will hallucinate. Yes, and in some sense, you know, as I said, it could be useful for me, at least to buoy my confidence. But of course, we do want things to give us factual information. And of course, to the extent that they mimic the human mind, this is perhaps inevitable. So I want to follow up with a final question. You've been very generous with your time, late in the evening there in Bangalore.

Starting point is 01:03:30 In The Man Who Wasn't There, you write about patients who exhibit these sorts of very strange phenomena, including what you define, I think, in that book as maladies of the self. These are confabulations and hallucinations, not so much sycophancy, perhaps, as LLMs are prone to. But these patients lose a sense of themselves. And I'm wondering, have the explorations that you've done in LLMs refined the way that you think about how the human mind works? We kind of mentioned this, and my hope is that you'll write a book about humans, but you kind of did earlier. So what have you learned about the human condition, these unusual traits that you talk about in your earlier book? By the nature of being complex systems that have emergent phenomena, you will sort of get strange behaviors, maybe even worse than hallucination and sycophancy: maybe true disorders, true maladies of the self. What do you make of that, as a kind of learning that you encountered in writing this new

Starting point is 01:04:41 book, you know, after having written this incredible book, The Man Who Wasn't There? I think for me, writing The Man Who Wasn't There was very instrumental in making me think about what is happening within us in computational terms. It's kind of like when you view our perception of our bodies, of our cognitive selves, etc., through the lens of the brain

Starting point is 01:05:06 creating models of the environment, models of us embedded in that environment, using the models to make predictions about what is out there, including making predictions about our body, and the fact that what we perceive at any moment are predictions that the brain is making, once you view everything within that framework,

Starting point is 01:05:26 it again becomes very clear that while, on average and most of the time, the brain is doing what it's supposed to be doing, and whatever we are perceiving is more or less congruent with physical reality, so we're not hallucinating, we're not being psychotic, the fact that it is a computational process and the fact that there is stochasticity in that process mean that these computational systems are themselves prone to making wrong predictions. And because what we perceive is the brain's prediction at any given moment, and we take the prediction to be real and truthful, even if the prediction is wrong, it will feel real to us.

Starting point is 01:06:13 So it's very easy to understand why we end up having states of psychosis or states of hallucination. So now when you think of what's happening with machine learning models, and the fact that we are seeing some of these processes, you know, in very minor ways being duplicated in machines, the connections become more and more obvious. We might even end up building machines that function like us, which will themselves be prone to psychosis. I mean, right now we complain about the hallucinations that an LLM makes because it presents wrong answers to us. But imagine building a machine that is using its internal predictive mechanisms to understand its own state and its behavior in the world. If those predictions about its own state are wrong, it is essentially hallucinating about itself. We're not too far away from building at least simple versions of such machines, and I don't even want to imagine where that will go. But if we take a computational view of what's happening in our brains and of the things that we're doing when we build these machines, the parallels are

Starting point is 01:07:24 striking. Anil Ananthaswamy, thank you so much for writing this wonderful book. It's really one of my favorite books. I only regret that I didn't read it earlier. I've interviewed, you know, dozens and dozens of people through both pro-AI and anti-AI lenses, but understanding the details of what's under the hood was a real treat. And you approach it, as you do with all your writing, in such a beautiful, eloquent, and careful way that I just can't thank you enough for this, for the opportunity to interview you, and for staying up late on this December evening over there in Bangalore. Thank you so much. This has been a

Starting point is 01:08:05 real pleasure. Well, Brian, thank you very much for having me on your podcast. It's been my pleasure entirely. Thank you. If you enjoyed this conversation with Anil, you'll want to check out the follow-up interview I did with Yann LeCun. We tackle many of these same questions, but from the perspective of someone building the systems themselves. Yann's no AI doomer. That episode is linked right here. Watch it right now. And if this conversation helped sharpen how you think about AI, not just what to believe,

Starting point is 01:08:29 but how to question it and how to understand when it's actually doing then, please do me a favor. Like this video, subscribe to the channel, and leave a comment with a question you think the AI community is still avoiding. I read them all. See you in the next episode.