Into the Impossible With Brian Keating - The Mysterious Math Behind LLMs | Anil Ananthaswamy
Episode Date: January 23, 2026
WANTED: Developers and STEM experts! Get paid to create benchmarks and improve AI models. Sign up for Alignerr using our link: https://alignerr.com/?referral-source=briankeating
One of the most powerful AI systems we’ve ever built is succeeding for reasons we still don’t understand. And worse, they may succeed for reasons that might lock us into the wrong future for humanity. Today’s guest is Anil Ananthaswamy, an award-winning science writer and one of the clearest thinkers on the mathematical foundations of machine learning. In this conversation, we’re not just talking about new demos, incremental improvements, or updates on new models being released. We’re asking even harder questions: Why does the mathematics of machine learning work at all? How do these models succeed when they suffer from problems like overparameterization and lack of training data? And are large language models revealing deep structure, or are they just producing very convincing illusions and causing us to face an increasingly AI-slop-driven future?
KEY TAKEAWAYS
00:00 — Book explores why ML works through math
02:47 — Perceptron proof shows simple math guarantees learning
05:11 — Early AI failed due to single-layer limits
07:12 — Nonlinear limits caused the first AI winter
09:04 — Backpropagation revived neural networks
10:59 — GPUs + big data enabled deep learning
15:25 — AI success risks technological lock-in
17:30 — LLMs lack human-like learning and embodiment
22:57 — High-dimensional spaces power ML behavior
27:36 — Data saturation may slow future gains
31:11 — Continual learning is still missing in AI
33:46 — Neuromorphic chips promise energy efficiency
41:49 — Overparameterized models still generalize well
45:05 — SGD succeeds via randomness in complex landscapes
48:27 — Perceptrons remain the core of modern neural nets

Additional resources:
Anil's NEW Book "Why Machines Learn: The Elegant Math Behind Modern AI": https://www.amazon.com/Why-Machines-Learn-Elegant-Behind/dp/0593185749
Get My NEW Book: Focus Like a Nobel Prize Winner: https://www.amazon.com/dp/B0FN8DH6SX?ref_=pe_93986420_775043100
Please join my mailing list here 👉 https://briankeating.com/yt to win a meteorite 💥
Join this channel to get access to perks like monthly Office Hours: https://www.youtube.com/channel/UCmXH_moPhfkqCk6S3b9RWuw/join

📚 Get a copy of my books:
Think Like a Nobel Prize Winner, with life changing interviews with 9 Nobel Prizewinners: https://a.co/d/03ezQFu
My tell-all cosmic memoir Losing the Nobel Prize: http://amzn.to/2sa5UpA
The first-ever audiobook from Galileo: Dialogue Concerning the Two Chief World Systems: Ptolemaic and Copernican https://a.co/d/iZPi9Un

📺 Watch my most popular videos: 📺
Neil Turok https://www.youtube.com/watch?v=Dt5cFLN65fI
Frank Wilczek https://youtu.be/3z8RqKMQHe0?sub_confirmation=1
Eric Weinstein vs. Stephen Wolfram https://www.youtube.com/watch?v=OI0AZ4Y4Ip4?sub_confirmation=1
Sir Roger Penrose: https://youtu.be/AMuqyAvX7Wo
Sabine Hossenfelder: https://youtu.be/g00ilS6tBvs
Avi Loeb: https://youtu.be/N9lUceHsLRw

Follow me to ask questions of my guests:
🏄♂️ Twitter: https://twitter.com/DrBrianKeating
🔔 Subscribe https://www.youtube.com/DrBrianKeating?sub_confirmation=1
📝 Join my mailing list; just click here http://briankeating.com/list
✍️ Detailed Blog posts here: https://briankeating.com/blog
🎙️ Listen on audio-only platforms: https://briankeating.com/podcast

#universe #podcast #briankeating #intotheimpossible #science #astronomy #cosmology #cosmicmicrowavebackground #AnilAnanthaswamy
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
What if the most powerful AI systems we've ever built are succeeding for reasons we still don't understand?
And worse, they may succeed for reasons that might lock us into the wrong future for humanity.
Today's guest is Anil Ananthaswamy, an award-winning science writer and one of the clearest thinkers on the mathematical foundations of machine learning.
In this conversation, we're not just talking about new demos or incremental improvements or updates on new models being released.
We're asking even harder questions.
Why does the mathematics of machine learning work at all?
How do these models succeed when they suffer from problems like overparameterization and lack of input training data?
Are large language models revealing deep structure?
Or are they just producing very convincing illusions and causing us to face an increasingly AI-slop-driven future?
Thank you so much for joining us all the way from Bangalore. This is so exciting.
Well, Brian, thank you very much for having me. It's a pleasure.
It's really a wonderful book. We're going to judge
the book by its cover, as I like to do later on.
It's entitled Why Machines Learn.
And the first question I want to ask you, Anil, is this: I was taught as a physicist,
you can never ask why questions.
That's the first word of your title.
What made you want to explore why and not how or what machines learn instead of why?
It's funny.
I answered this exact question yesterday at a panel discussion about the very same doubts
that people had.
This is just a writer's conceit, I must admit.
The title came about because when I was trying to learn the mathematics of machine learning,
I encountered very early on this amazing proof that uses very simple linear algebra
to show that a single-layered neural network, something called a perceptron from the 1950s,
will converge to a solution in finite time if a solution exists.
And, you know, in the late 1950s, the algorithm was first developed,
which was essentially a simple neural network that could, you know, do linear classification.
The algorithm is very simple, and that to me is the how.
And a few years after the algorithm was developed,
people started mathematically proving that the algorithm would converge to a solution
in finite time if a solution existed.
And to me, in my head as a former software engineer, the math became the why.
And of course, if you were to ask a physicist, they would just, you know, it's funny because
about a couple of months ago in Bangalore, David Gross, the Nobel laureate, was visiting.
And he had the exact same question about the title of the book.
And I tried to give him my rationale.
And he did not buy that one bit.
He said, no, there's no why here.
It's how. So yeah, it's just a writer's conceit, you know, to me, how is the algorithm? And because the book is about the mathematics, and I feel like the math kind of gives you a rationale for why these algorithms do what they do. So that's how the title came about.
What was the first mathematical idea that you encountered in machine learning, in the research that you did for the book, that made you stop and think that this is genuinely beautiful, as I find it to be?
Oh, it was exactly this perceptron convergence proof.
So maybe we can kind of talk a little bit about how that perceptron came about, right?
In the late 1950s, when Frank Rosenblatt, who was a Cornell University psychologist,
he designed what was the first kind of artificial neural network.
And it was a single-layer neural network.
And like I just said, you know, the initial
work was simply developing the algorithm and showing that it worked, that it did pattern classification.
It was able to take two categories of data.
And if these two categories were linearly separable in some mathematical space, the algorithm
would find the linear divide between the two clusters of data.
And subsequent to the invention of the algorithm, people started mathematically showing why this
was powerful and why this classifier even worked.
And you have to think back to the 1950s,
you know, when somebody gave you a mathematical proof
saying an algorithm would find a solution in finite time
if a solution existed, that was like gold dust, right?
That was really, but when you look at the proof,
it's very, very simple linear algebra, right?
It is just manipulating vectors and matrices.
There is nothing more than that.
And it is so beautiful.
So that was the proof that made me kind of say, oh, hang on.
Until then, I was just learning it for the sake of learning.
I was not thinking about a book.
I did not, you know, this was just me trying to get under the skin of machine learning
to try and understand it for myself.
But when I encountered that proof, that's when kind of a light bulb went off saying,
hang on, there are all these beautiful things that we should be communicating to readers.
And so that set me off on a journey looking for other theorems and proofs that exist in machine learning that could then form the backbone of a mathematically oriented narrative, historical narrative of machine learning.
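To make the algorithm described above concrete, here is a minimal sketch of a perceptron-style update rule in plain Python. It is an illustration under stated assumptions, not code from the book: the function names, the toy data points, and the `max_epochs` cap are all hypothetical. It shows the behavior the convergence proof guarantees: when the two classes are linearly separable, the loop reaches an epoch with zero mistakes and stops.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Perceptron-style training on 2-D points with labels +1 / -1.

    Returns (weights, bias, epochs_used); epochs_used is None if no
    mistake-free pass was reached within max_epochs.
    """
    w = [0.0, 0.0]
    b = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            activation = w[0] * x1 + w[1] * x2 + b
            # A point is misclassified when the sign of the activation
            # disagrees with its label (or the activation is exactly 0).
            if y * activation <= 0:
                # Nudge the separating line toward the misclassified point.
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch + 1
    return w, b, None


def predict(w, b, point):
    """Classify a point as +1 or -1 using the learned line."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


# Hypothetical, linearly separable toy data: two clusters in the plane.
POINTS = [(2.0, 3.0), (3.0, 4.0), (4.0, 2.5),
          (-1.0, -2.0), (-2.0, -1.5), (-3.0, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]

weights, bias, epochs_used = train_perceptron(POINTS, LABELS)
print("weights:", weights, "bias:", bias, "epochs:", epochs_used)
```

The only operations involved are dot products and vector additions, which is the "very simple linear algebra" the proof works with.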
Why do you think it took so long for the tools and the techniques of machine learning, the mathematics, which, as you say, is very simple.
I mean, I teach it to, you know, my undergraduates and even high schoolers that I happen to know.
So how is it that it took so long for it to develop into this incredibly dominant part of our economy?
Yeah, I mean, so the machine learning and AI that we talk about these days is what is called, you know,
deep learning or deep neural networks.
And these are very, very massive artificial neural networks, where a neural network is simply, you know,
a whole bunch of artificial neurons interconnected together. And an artificial neuron,
you can think of it as a computational unit.
Some inputs are coming in.
It does the weighted sum of the inputs,
and if that sum exceeds the threshold,
it does something on the output side.
So it does that kind of computation.
And that's an artificial neuron
and a whole bunch of these things
interconnected together, form a neural network.
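As a rough sketch of the computational unit just described, the snippet below computes a weighted sum of the inputs and fires when that sum exceeds a threshold. The function name, the weights, and the threshold are illustrative assumptions; they happen to make the unit behave like a logical AND gate.

```python
def artificial_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical settings: with both weights at 1.0 and a threshold of 1.5,
# the neuron fires only when both inputs are 1 (a logical AND).
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", artificial_neuron([a, b], AND_WEIGHTS, AND_THRESHOLD))
```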
In the 1950s when they started,
the only way we could do anything with these networks
is if they were single-layer neural networks,
which meant that
inputs are coming in, the neurons are there, and they do the computation and produce the output.
Just one layer of neurons.
The training algorithms that people like Frank Rosenblatt had developed, or the least mean square algorithm
that somebody like Bernie Widrow had developed, these things only worked for single-layer neural
networks.
And the moment you put another layer, after the input layer, the training algorithm didn't work.
Essentially, the moment you had multi-layer neural networks, the algorithm that they had to train the network was ineffective.
And so that was in the beginning a stumbling block.
So in the 1960s, people kind of realized mathematically that these single layer neural networks actually were good at what they were trying to do, which was linear classification.
But they were really no good for anything that involved finding a nonlinear boundary between two classes of data.
And towards the end of the 1960s,
Marvin Minsky and Seymour Papert wrote
this amazing book called Perceptrons
in honor of Frank Rosenblatt,
who had developed the first neural networks.
And in that book,
they had a very elegant proof for why
the perceptron would converge to a solution,
but they also had another proof
which showed that if the solution
involved a non-linear boundary,
then the perceptron
would fail at very, very simple tasks.
And that kind of put a big damper on, you know, research because people thought that, oh, if it
can't even solve this problem, which was, you know, just literally taking four data points
arranged on the XY plane such that a single straight line could not separate, you know,
the circles from the triangles, the circles and triangles being two different kinds of
data.
And the other thing Minsky and Papert did was they kind of insinuated without, you know,
mathematical proof that even if you have multilayer neural networks, they would still not be able
to solve these simple nonlinear problems.
It was not a proof, which is very obvious now because, of course, these things solved nonlinear
problems.
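The four-point problem described above is commonly identified with XOR; that identification, and every name and number below, is an assumption made for illustration. The sketch trains a single threshold unit on the four points and counts what it still gets wrong, then shows a hand-wired two-layer network of the same kind of units that classifies all four correctly. The weights in the two-layer network are set by hand rather than learned.

```python
XOR_POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_LABELS = [0, 1, 1, 0]


def step(value, threshold):
    """Threshold unit: fire (1) when the value exceeds the threshold."""
    return 1 if value > threshold else 0


def single_layer_errors(epochs=1000):
    """Train one threshold unit on XOR and count its remaining errors.

    No single straight line separates the two classes, so at least one
    of the four points is always misclassified.
    """
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(XOR_POINTS, XOR_LABELS):
            error = target - step(w1 * x1 + w2 * x2 + b, 0.0)
            w1 += error * x1
            w2 += error * x2
            b += error
    return sum(
        step(w1 * x1 + w2 * x2 + b, 0.0) != target
        for (x1, x2), target in zip(XOR_POINTS, XOR_LABELS)
    )


def two_layer_xor(x1, x2):
    """Hand-wired two-layer network: XOR = OR and not AND."""
    hidden_or = step(x1 + x2, 0.5)    # fires if at least one input is 1
    hidden_and = step(x1 + x2, 1.5)   # fires only if both inputs are 1
    return step(hidden_or - hidden_and, 0.5)


print("single-layer errors:", single_layer_errors())
print("two-layer outputs:", [two_layer_xor(x1, x2) for x1, x2 in XOR_POINTS])
```

The second function only shows that a multilayer solution exists. Finding such weights automatically is what the training algorithms discussed later in the conversation address.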
But at the time, people kind of took that at face value and the research interest in this topic
dried up, funding dried up.
So neural network research kind of fell off a cliff during the 1970s.
That was the first AI winter.
But there were other kinds of machine learning going on, non-neural-network-based
machine learning techniques, like the nearest neighbor algorithm, which was really popular
for a long time.
There were, you know, Bayesian classifiers.
All of this stuff was getting developed and studied, and support vector machines, which came
about a bit later.
The real reason for why neural networks never really took off until, let's say, the next phase,
which was the 80s, when people like Hinton and John Hopfield, both of whom got the Nobel last year
for their work during the 80s,
kind of reinvigorated interest in these neural networks.
Hopfield designed what are now called Hopfield networks.
They're not used anymore today for what we are doing with AI,
but they were a big deal in the early 1980s.
And then Hinton, along with David Rumelhart and Ronald Williams,
they wrote the first paper on the back propagation algorithm,
or at least they put everything together to show how a deep neural network
could be trained using something called the back propagation algorithm.
So until then, we just didn't know how to train these networks.
So that was a big sort of huge gap between the early neural networks in the late 50s all
the way to the mid 80s.
But even then, even once we figured out how to train these networks, it was still not enough
because these networks are extremely data-hungry, right?
They require a lot of data to learn about patterns that exist in data.
We just did not have the data in the 80s.
And the other thing is that they're also very compute-hungry.
You need a lot of computing power to train these networks.
And of course, we just didn't have that either in the 80s.
So even though, you know, in the late 80s and early 90s,
we had these things called convolutional neural networks
that Yann LeCun and his team had developed,
again, they went nowhere because of the lack of data and the lack of compute.
And traditional machine learning methods continued to flourish.
You know, throughout the 90s and early 2000s, we had support vector machines, which were a big deal.
And it was really the invention of the internet, the availability of extremely large-scale data, that you could essentially use the internet to collect all that data, images or text or whatever.
And the realization that you could use the computing power that had been developed in the form of graphics
processing units, GPUs. Everyone knows about GPUs today as being the backbone of what's happening
with AI. But really, these things were developed for video gaming. So, you know, when you think about
what, you know, video games need to do, they need to refresh your screen at a very fast rate. And
the screen is essentially a matrix of numbers, the pixel values, right? So they're very, very good at
manipulating matrices of numbers in order to, you know, refresh the screen
for the purposes of running a video game.
And people like Hinton and many other people
realized that, OK, we can use these GPUs
to do matrix manipulations, which are also the backbone
of machine learning.
So when a machine learning algorithm transforms
an input vector into an output vector, inside the black box,
it's essentially matrix manipulations that are happening.
So, you know,
it was a combination of having enough compute in the form of GPUs and having the data,
and then in the late 2000s and early 2010s, things began to change dramatically.
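To illustrate the point above about an input vector being turned into an output vector by matrix manipulations, here is a small sketch of one neural-network layer written as a matrix-vector product. The helper names, the numbers, and the ReLU nonlinearity are assumptions chosen for illustration. A GPU performs the same kind of arithmetic across very large matrices in parallel.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Elementwise nonlinearity (an assumed choice): negatives become 0."""
    return [max(0.0, v) for v in vector]


def dense_layer(weights, biases, inputs):
    """One layer: output = relu(weights @ inputs + biases)."""
    pre_activation = matvec(weights, inputs)
    return relu([p + b for p, b in zip(pre_activation, biases)])


# Hypothetical numbers: a layer mapping a 3-dimensional input
# to a 2-dimensional output.
W = [[1.0, -1.0, 0.5],
     [0.0, 2.0, -1.0]]
B = [0.5, -1.0]
x = [2.0, 1.0, 4.0]

print(dense_layer(W, B, x))
```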
Today's video is sponsored by my friends at Alignerr.
If you've ever asked AI a tough question and got back gobbledygook, that's not entirely the fault of the
AI, but the frustration that you feel could actually be worth up to $150 per hour.
Behind every AI breakthrough is a network of experts actually teaching these systems how to think.
And my friends at Alignerr are connecting brilliant people, mathematicians, scientists, engineers, geniuses just like you, to make sure AI works for all of us.
Alignerr has specifically partnered with the Into the Impossible podcast to find geniuses from my network to give AI models expert feedback.
Your job, if you accept it, is to evaluate AI outputs.
That's it.
Design problems that even today's best models can't solve.
Your job is to grade their attempt at quantum mechanics, topology, advanced coding.
You're literally teaching AI the difference between right and wrong, between
undergraduate mistakes and doctoral-level thinking. That's why they're partnering with me.
Listen, I know that many of you have done an unpaid internship, shall we say? Been lab rats
running someone else's experiments. But now is your turn. You don't have to grind as
test particles in someone else's lab ever again. This is different. It can be done all
remotely, timing is flexible, and you get paid weekly up to $150 per hour. Alignerr is
selective. They need to be in order to get the best results, right? They only accept
people who can genuinely push AI forward. Most applicants won't make the cut. So check out
alignerr.com using my link below.
AI has already consumed the internet and likely wasted a lot of your time, as it has with mine,
with incorrect answers, logical flaws, or poorly worked out solutions.
This is your chance to get it right for the future of science and to get paid while
you're at it.
Click the link below.
And we'll get to, you know, get back to the kind of historical overview of it and even
some of the nuts and bolts of how a perceptron works and some of the matrix algebra.
You know, it's remarkable.
And, you know, there's a famous quote attributed to Stephen Hawking
that every equation in your book cuts the readership in half.
If that's true,
I shouldn't have even read this, but I mean, it's got over, you know, 400 equations
and incredible detailed illustrations.
It's really, it's sort of this hybrid between a textbook and a thriller,
a historical thriller.
I just think you're to be congratulated for doing it.
I listened to it, which I don't know if I recommend the audiobook compared to
the printed book. I really love the printed book. I'm actually giving it to one of my kids who's
very, very precocious and wants to learn calculus. I figured maybe he can learn calculus
from machine learning that you describe in this book.
But you mentioned, you know, this kind of, to paraphrase Marc Andreessen, you know, AI is eating software and software is eating the world.
I'm going to talk about this phenomenon, which I've done a little bit of research on for fun for the podcast.
It's called lock-in.
And I'm not sure if you're familiar with it,
but I'll just describe what it is.
It's the phenomenon by which an early technology
becomes super dominant,
cannibalizes everything that came before it
because it enables some new efficiency
or new capability that heretofore didn't exist.
And, you know, there's a couple of classic examples.
One is the QWERTY keyboard, which is not optimal.
And it's not efficient from a human, you know,
from a frequency of words and typing perspective.
But it was invented because the typewriters
that were early adopted had this problem
that the keys, the mechanical hammers,
would stick together if they were used
too often next to each other.
So they wanted to space letters apart
so that they wouldn't be pressed at the same time
and you wouldn't have this lockup,
not lock in, but lock up.
Another example is the quality
of the Hubble deep field image
is great, it's breathtaking,
but it could have been, you know,
as good as the Webb telescope images,
which are, you know, 10 times better,
if not for the fact that the backside of a horse
is about a meter across.
So when the Romans designed chariots
to be pulled by two horses,
that was set by the width of the horse's rear end.
And because of that,
the roads and the train tracks
that later took precedence over the roads
had a width of about two, you know,
two to four meters
to accommodate two chariots going back and forth.
And because of that,
and because of the fact that the space shuttle was built,
its boosters were built in Utah,
and the launches were in Florida,
they had to transport these massive rockets
through train tunnels all the way from Utah in the U.S. to Florida,
which meant it had to go through a train tunnel,
which meant it couldn't be bigger than a certain diameter,
which meant that the specific impulse,
the thrust couldn't be above a certain amount,
which meant they couldn't get to a high enough altitude
that it could have taken a better image.
Okay.
These are examples of lock-in,
that some early technology basically
dooms the future
into this, you know,
kind of irrevocable prison
that it can't escape from.
And I'm wondering about the success,
this transition inflection point
with LLMs plus GPUs.
I'm worried it's another type of lock-in.
And as successful as it is,
I'm worried that we won't get
the things that I'm most interested in,
which are, you know,
new laws of physics
and new descriptions of mathematical reality, et cetera.
Do you worry about the success, not the failure, not the AI winters and stuff,
but do you worry about the summers being so bountiful that it will crowd out,
essentially any competing and possibly better technology?
Yes, I think you're spot on because if you, and here the lock-in weirdly is the
incredible amount of data that we have been able to scrape off the internet, right?
and also in the presence of GPUs.
Now, the GPUs, one can argue that they're just a computing element,
which haven't necessarily locked us in.
But I think this LLM revolution has been made possible
because of this extraordinary amount of data on the Internet, right?
And we have managed to somehow create these models
that are learning about, you know, the knowledge
and the sort of syntax of human written language.
And kind of, it's an intelligence that is imposed from the top down.
These machines are not learning things from the ground up
the way, let's say, humans do or animals do.
And our general intelligence very much is a property of the fact
that nervous systems have evolved over evolutionary time
and nervous systems have encountered things in their environment
and have enabled the development of, you know,
the development of brain structures and algorithms that operate in those brains from the ground up.
And I hadn't thought of it in the way that you're framing it, but it makes complete sense
that the economic incentives now to succeed in this arena are so high that there's so much
money that is being poured into building these LLMs and they're getting bigger and bigger.
People have bought into the argument that scaling up is going to unlock more and more,
quote, unquote, intelligent behaviors.
So, yes, at this moment in time, we are certainly locked into this particular form of, you know,
AI, so much so that I'm sure there are many, many, many smart people who otherwise could have
been doing other kinds of research into, you know, different kinds of models that would
potentially learn how to generalize better, be much more sample efficient like our brains are,
use much less energy than these LLMs do, et cetera. And all of those areas of research have
probably been kind of squeezed of funding because of the money that's going into developing
LLMs. So yeah, entirely, entirely possible that we are in a phase of lock-in because of this
current trend. And, you know, as I said before, to me, the greatest thing would be to get,
you know, a theory of quantum gravity, you know, that no human has been able to come up with.
And I want to draw your attention to a statement made by a different Nobel Prize winner.
It's Albert Einstein.
who said that his greatest thought,
his happiest thought,
was that an observer in free fall
would experience no gravitational force.
And he literally said it gave him tingles up his spine, basically.
And, you know, I wonder to what extent,
and that allowed him to create the, you know,
general theory of relativity and the equivalence
principle and so forth that we credit to him.
But I wonder, you know, can a computer experience a tingle down its spine?
Conversely, can it experience pain?
Can it have a happiest thought?
And if not, what does that portend for its ability to create new laws of physics that humans are incapable of creating with this, you know, three-pound, you know, neural network that we have in our brains?
To what extent, in your opinion, are embodiment and, you know, kind of uniquely human sensations, what we call qualia,
important for making breakthroughs that really matter to scientists, say, like me?
Oh, that's a huge question, right?
I think it comes down at some very basic level to what we think is human consciousness
and whether our intelligence and our consciousness can be thought of in materialist terms.
So for people who take the view that everything about our consciousness and intelligence
can be explained eventually in computational terms, and even if it is computational,
then the computation also is substrate independent.
If that's the case, if everything that we are, and it's a big, big if, if everything that we are
is something that can be boiled down to computational principles, substrate independent computational
principles, then I don't see any in-principle reason why machines cannot be built to perform
those very computations and have the same kinds of experiences, et cetera, that we are
privy to, right?
But there's a big if there.
And that's a huge one.
Can LLMs have those?
Again, I mean, a lot of this comes down to agreeing or disagreeing upon what we think is happening within us.
That's right.
Yeah, I almost thought as I was reading this, I hope Anil writes a book, Why Humans Learn.
Yes.
I mean, that's a big question for right now, for even machine learning people and computational neuroscientists.
We don't have full-fledged answers to, you know, why we do what we do.
So our intelligence, what kinds of algorithms are running in our brains, for instance,
is everything finally describable in terms of computation?
Even that question is not answered.
The answer to your overarching questions about whether machines can eventually feel and have
feelings the way we do hinges upon answers to questions about our own intelligence and our
own consciousness.
If everything that we are can be talked of in materialistic terms, can be reduced to
the workings of matter, and if, you know, if all of what we are is somehow captured by
computations and the computations have to be substrate independent, it doesn't require biology,
it could happen, you know, in silicon material, then yes, why not? And embodiment would be just
another axis on which these machines would function. But without knowing the answers to
questions about human intelligence and consciousness, it's really hard to answer
what will happen with machines.
I don't think we are in a position right now
to definitely say that we will be able to build machines
that will feel and have conscious experiences.
It all depends on our definition of consciousness.
And then there are people even today
who would say that, yes, machines are very definitively
going to be conscious.
And you'll find as many people
who will completely say, no, that's absolutely not possible.
So I think it's an open question.
Whether conscious experiences are eventually necessary for the kind of breakthroughs that we're talking about,
you know, coming up with the theory of relativity without having any prior knowledge of that stuff,
you know, I'm not so sure consciousness is necessary there.
To me, they're orthogonal problems. Intelligence and consciousness, you can have them varying on orthogonal axes.
So you could potentially have a system that is capable
of coming up with something new, but have no quote-unquote conscious experience of it,
hence no joy, no pain, whatever.
What do you think are the most underappreciated and, you know,
kind of overemphasized aspects of machine learning that you've encountered?
Underappreciated.
I think for me, after having written this book about the mathematics of machine learning,
the thing that I find most fascinating and that
is really underappreciated,
And I think it's hard for someone who hasn't encountered the math to even appreciate
is the high dimensional mathematical spaces in which these machines operate, right?
I mean, these are all, these machines are doing their thing in vector spaces.
And it's extraordinary when you look at the dimensionality of these mathematical spaces
in which these calculations are happening.
And the properties of these mathematical spaces that lead to the properties of these
machine learning algorithms. That is really fascinating. But I don't know how something like that
could be appreciated or even, you know, communicated without explaining a whole bunch of stuff
about vector spaces and things like that. So there is something very beautiful that is happening
in these mathematical spaces. And it's entirely possible that our brains are also functioning
similarly, you know, navigating high dimensional spaces to do the things that they do. And to me,
that's the most fascinating part. And yeah, you mentioned this phenomenon
of emergence, which is, you know, like the Supreme Court in America said about pornography,
which is, you know, you know it when you see it, but it's very hard to define how these phenomena
really do come about. It really was not truly clear to me until I read your book. And in terms of,
you know, the details of how these algorithms work, but also the import of the training data
and how important, really crucially important, that is. You go
over some of the restricted training data, you know, the U.S. Postal Service data that was used
for, you know, recognizing numbers and so forth. And then, you know, we don't look at the post
office as the model of efficiency, but it does do this incredibly well in optical character
recognition and all sorts of other techniques that they pioneered that you mentioned in the book
in other countries as well. But it seems to me, you know, kind of a very strange phenomenon
to be in that we've ingested most of the Internet. You know, we have these huge, huge
number of tokens and parameter models that you could put, you know, on your local desktop
and soon on your phone will be, you know, not far behind if it's already not here.
But that, you know, what is left to be ingested, you know, when I talked to Yanukun
last year, you know, he was saying, well, a cat, you know, can take in, you know, four
terabytes of data per second.
But, you know, if these algorithms are waiting for the next, you know, avatar movie
to come out so it can ingest in more language and more data into its training set, if that's
allowed even. It seems to me like we're just going to slowly asymptotically converge to
everything has the same information because there's only one internet out there. And yes, it's hard to
characterize it all. Could it be that the very enabling feature of the success of these models
will be their downfall, because eventually there'll be no advantage? Everything will have the same data,
we'll all have access to the same internet,
and there'll be no advantage to any of these models,
and they should all just have the same outputs
given some input.
So what do you make of the kind of, again,
a lock-in phenomenon that having all this training data
was crucial, but now we're kind of saturated,
and maybe that means we'll asymptotically improve
only very slowly in the future.
Entirely possible.
Because the lack of sort of freely available data
is very obvious now. I think
all that has been already scraped and taken in.
There's a lot of data still locked
in behind firewalls
within institutions
and corporations and
private hands. And that's actually
very, very high quality data as opposed to
the stuff that we have scraped off the
internet, which is relatively low
quality data.
But there's a lot of structured
data that exists in company
databases and institutions.
And that, there is still
value to be unlocked there.
There's also this idea that
we could have
synthetic data generation.
Now, that
has the danger that we will end
up sort of, you know,
AI is generating data and then
kind of, there's a very interesting, very
evocative phrase that was used by someone,
I forget who it was, they said
that eventually these models will choke on
their own exhaust, right?
I call it,
sorry to toot my own horn about it,
you remember the mad cow disease of the 1990s in the UK, when basically all meat
was tainted because cows were fed cows?
So I call it mad bot disease, you know, where they're taking in their own data and
then, you know, using it to regurgitate something new.
But I like the exhaust as well, but go on.
Yeah.
No, so this is a valid concern, right?
People have this concern that maybe we are saturating.
But it's also true that even if
we just continue the same paradigm of training on more and more data, there is still very,
very high quality data that is available, and we just haven't used it.
And it may not, it's possible we may not be able to use them for publicly usable LLMs because
this will be copyrighted data and private data and there will be all sorts of concerns about,
you know, privacy of the people whose data it is, etc.
So I'm not sure it can be unlocked that easily, but there's, there is good data out there.
My sense is that, and Yann LeCun is right about this, there are ways in which animals and humans learn where we're doing something very different than LLMs.
You know, even though as a child or, you know, as a cat, we encounter a lot of data, there's a lot of structure in the environment that we are encountering,
and there is something about the algorithms that we have
that are operating inside our brains that are much more sample efficient.
We just don't require that many examples of some instance of a pattern
for us to learn about what it is.
And then we are able to generalize so much easier, right?
We learn abstractions about some problem,
and then we use the learned abstractions to then solve a problem
in a completely different domain.
And machine learning algorithms are not there yet. Even these LLMs can't generalize the way we do.
So my suspicion is that even if LLMs and the current approach saturate on this data problem,
the breakthroughs might come in the form of new algorithms that learn very differently.
And they learn continually, right?
So current machine learning models, especially LLMs, they don't have this feature of continual learning.
You know, you train a model and then you freeze that model.
That's it.
The weights of the model don't change after that.
You can use it as much as you want, but it is what it is.
And you get a snapshot in time of the knowledge that it has ingested.
And it's not a continually learning machine.
And we are, of course, learning all the time.
And when we learn new things, we don't mess up things we have already learned
or forget the things that we learned before. Machine learning algorithms are not
like that right now. So somebody is going to figure out how to come up with machine learning
algorithms that are, you know, capable of continual learning, are more sample efficient,
energy efficient, et cetera, and are able to generalize better. Then the data problem
will not be as acute. And what, you know, kind of alternatives, if you had to take, you know,
the Schrödinger versus Heisenberg, you know, from your previous explorations in physics in
Through Two Doors at Once, what sort of, you know, competitors to the GPU plus LLMs are there?
Even if it's, you know, it's kind of the 98-pound weakling versus the, you know, the behemoth.
What's sort of the David to the LLM plus GPU Goliath right now?
Yeah. I'm not so sure the GPU part is really the issue,
because any kind of computations that are happening in these machine learning models
finally will involve matrix manipulations.
So the GPU is going to be important.
Whether you require as many GPUs for other algorithms, that's a different question.
We don't know the answer.
Let me just break in.
What about these tensor processing units, TPUs?
What's their fundamental advantage or comparative difference between those and GPUs?
I lack knowledge there about the exact differences between TPUs and GPUs.
I mean, they're still doing matrix manipulations,
but, you know, tensors are obviously a more general form of matrices,
so they're manipulating these more general forms.
I don't know the exact details about how a TPU works.
So I would believe in practice that, yes, I also am not, you know,
incredibly familiar with it, but, you know,
Google has adopted, you know, the TPU approach
and has used no Nvidia, you know, GPUs,
whereas, you know, Nvidia is used by almost everybody,
and it's the most valuable company in the world,
and it has the stock market capitalization of all of the UK and India and Germany put together.
So it's kind of astonishing that Google could be considered this kind of David, as I said before.
But okay, so then in terms of alternative, you know, applications of ML,
what are some alternatives? I've heard of these things like Groq with a Q and other, you know,
kind of neuromorphic but not
actual LLMs. What are some of the kind of alternative algorithms that run on some form of matrix
manipulating computational device?
I think in terms of making these things more energy efficient, right now when you look at these
artificial neurons, they are of course being simulated in software.
And so you have inputs coming into a neuron, it does some computation, and based on the computation,
it produces an output.
But in the context of a software simulation, the neuron has some real valued output that is always present.
If you were to then implement that in hardware, that would be the equivalent of a neuron
consistently having a voltage signal on its output side, which means it's consuming energy all the
time, whereas our brains have what are called spiking neurons,
where our neurons essentially collect information that comes in through the dendrites, they do some
computation, and every so often, or, you know, very infrequently, they'll fire. And, you know,
an occasional signal will go out on the axon in the form of spike trains, voltage spike trains.
And a biological neuron, for the most part, is very silent. It's really not producing any output.
It's just doing the computations, but staying silent.
And when it does produce a signal, it's a spike train, which consumes very little energy.
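The contrast between an always-on output and an occasional spike can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a leaky integrate-and-fire neuron, a standard textbook simplification that is not named in the conversation, and the function name `simulate_lif` and the `threshold` and `leak` values are made up for the example.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire sketch (assumed model, hypothetical parameters).

    The neuron accumulates its input into a membrane potential that leaks
    a little at every time step. It stays silent (output 0) until the
    potential crosses the threshold, then emits a single spike (output 1)
    and resets.
    """
    potential = 0.0
    spikes = []
    for current in input_current:
        # Integrate the new input on top of the leaked previous potential.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)   # fire once
            potential = 0.0    # reset after the spike
        else:
            spikes.append(0)   # silent: no output signal at this step
    return spikes


# A constant small input: the neuron is silent most of the time.
spikes = simulate_lif([0.3] * 20)
print(spikes)
```

With these particular values the neuron fires on every fourth step and is silent otherwise, which is the point of the comparison: the output is zero most of the time instead of being a continuously held value.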
And we are just now beginning to figure out how to build sort of artificial neural networks
where the individual neurons are spiking neurons.
And then once we have figured out how to train large artificial neural networks made of spiking neurons,
if we then implement them in hardware through these so-called neuromorphic chips,
then we can potentially have very energy-efficient neural networks.
Think a couple of orders of magnitude or more, you know, in terms of energy efficiency.
So that's definitely one thing to look out for.
You know, you could still build LLMs using that architecture,
but it would be significantly lower in energy consumption.
But we still haven't cracked the problem of how to
build these things at scale and train them at scale.
So that is one big sort of research area.
The thing that I have been most intrigued by are efforts to get machine learning
models to learn about the environment in which they are functioning and, you know,
essentially learning models of the world in the form of abstractions.
So they use, they kind of build abstract models of the world and then use those abstract models
to make predictions about, you know, what's happening outside.
And this whole approach is how we think our brains work.
Our brains, we think, work by constructing world models
and situating ourselves as agents' models inside those world models.
And then anytime we need to make a perception,
our brains are essentially using these world models
to hypothesize about what might be there outside
that is causing the sensations that fall on our eyes or on our ears.
And it's these hypotheses that we perceive as things that are out there.
And then the brain has to do a whole bunch of processing over many, many layers
in order to make sure that what it is hypothesizing is out there,
is actually out there in the form of making sure that the predictions it's making
about the sensory consequences of whatever might be outside
is exactly what was received by our senses.
So there's a whole bunch of error processing going on.
But fundamentally, you know, it has built these very sophisticated and complicated and abstract world models.
And AIs that are beginning to do that might show us the way towards functioning more like the human brain than current LLMs.
So they also would potentially have the capacity to be more sample efficient, requiring less data.
Because when you think about our sort of cognition and our cognitive capacity, when you have
a problem, you're not constantly waiting for external sensory data. You are, you know, capable of running
internal simulations, counterfactuals, right? So we are essentially generating so much data internally
for our own neural networks. So it's entirely possible that if we can figure out how machine
learning models can do the same, they could also become much more data efficient. So that's something
to watch out for, because those things are going to do things differently than LLMs.
Yes.
Right.
Well, I promised that we would review the cover, judge the cover of the book, and now we'll do
that.
So we have a special jingle, which is generated by machine learning techniques,
that we'll insert here.
We're going to judge a book by its cover.
Hey, book lovers.
We're judging books by the covers.
We know we're not supposed to do it.
But it's the impossible.
There's nothing to it.
Let's take a look and judge some books.
All right.
So, Anil, take us through the title of the book, the subtitle of the book,
and the cover artwork, please.
So the title is, of course, Why Machines Learn,
and that was just a title, strangely enough,
that just popped into my head
when I was first conceiving of the book.
It came about because I was learning about
a particular algorithm called the Perceptron,
you know, learning algorithm,
which is used for training single-layer neural networks.
And as I was learning the math
of why the algorithm works.
It was the beauty of the math that made me think of,
oh, there is a book to be written about why all these algorithms do what they do
from the perspective of the mathematics.
So the why was just my, you know, writerly sort of conceit, right?
You could just as easily have said how,
and it would have been a fine title.
But the why seems to grab people's attention, right or wrong.
And in my mind, it was more why than how.
And the subtitle, again, it's just elaborating on this exact idea that there is a lot of very beautiful and relatively simple mathematics underlying this extremely powerful moment in time that we find ourselves in.
And it's like maybe high school or first year undergraduate level linear algebra, calculus,
probability and statistics, and some optimization techniques, right?
It's not at all sort of, it's not the kind of math that most sort of
graduate students in physics or electrical engineering would do.
They would do much more sophisticated math than what is required to understand this.
You know, again, there is a simplicity in the math for understanding how these machines
or why these machines do what they do, but it's a very different level of math that you need
if you are the one designing these algorithms.
So that is a different ballgame, right?
The cover art on the book is a variation of some M.C. Escher etching.
That's completely due to my publishers.
I think it's an M.C. Escher etching called Three Spheres,
and then they've gone ahead and added a fourth one and made it colorful.
Yes, it's sort of mesmerizing and kind of reminiscent of other curvilinear shapes and things
like a 3D printed brain my kid made me.
Nice.
In the book, you emphasize something that wasn't obvious to me, but it seems, you know,
kind of if I were to set out on a journey to recreate machine learning techniques,
I might stop because of this problem of what's called over-parameterization.
And you make the case in the book that, you know,
in classical statistical analysis, if you have an over-parameterized model,
you should overfit the data and therefore your model, or your representation of the data, should fail.
But deep learning seems to not only succeed but thrive on having more and more parameters.
I mean, every week we're getting inundated with new models and foundations and this number of billions of parameters.
And soon it'll be trillions.
I'm convinced it'll be true.
So what's the least kind of hand-wavy explanation for how this even works at all,
given that, you know, in classical statistics,
over-parameterization should kill your reliability
and therefore make the model completely worthless?
But in fact, it's one of the most useful tools ever created by humans.
So I think mathematically we are still trying to figure it out, right?
You're right.
The old statistical learning, machine learning techniques kind of made it
very, very clear that if you overparameterize your model,
you'll end up memorizing your training data, or overfitting it. And then
when you're encountering new data, you won't be able to generalize to that new data. And so people
used to make sure that their models were optimally parameterized so that they were not overfitting.
And then along come neural networks. And we notice this empirically. So this is not something that
was worked out theoretically.
They just noticed that if they just made the networks bigger and bigger,
A, they worked better, and B,
these things were not overfitting.
The consensus, well, I don't think there's any consensus
at this point about why. You know,
it is still an active area of research:
why do deep neural networks, despite being heavily
overparameterized, generalize as well as they do,
and why don't they overfit?
There is some thought that there might be some implicit regularization going on in these networks,
that they do end up pruning themselves so that they're not as heavily parameterized as they seem at first blush.
But still, you know, these networks have brought us into a regime of parameterization that was not the regime in which traditional machine learning functioned.
And what has been very interesting is not that people have figured out why neural networks are doing what they do,
but that they've started noticing that other traditional machine learning techniques, like kernel methods and, you know,
support vector machines combined with kernel methods and others, also had hints of the same behavior,
but they were never really pushed. You know, early
on, people just assumed that overparameterization was not to be done.
And now, if you go look at some early machine learning
papers, there are hints that they were seeing this behavior in non-neural network machine learning methods.
But it was never explored.
So what the artificial neural networks have done is they have kind of opened our eyes to the fact
that there is this completely new regime of operation, which potentially even traditional
machine learning methods could benefit from.
And so now the math is being worked out,
and there is no clear answer to this yet.
Hey everybody, I'm usually the one that asks my guests
to judge their books by their covers,
but today I'm asking myself to judge my own book by its cover.
My newest book, Focus Like a Nobel Prize Winner,
is chock-full of advice, life tips,
and focus and productivity tips from nine of the world's greatest minds,
Nobel laureates ranging from economics to peace to physics, of course.
Go check it out.
And my publisher's got Amazon to run a special.
So go to Amazon and get the Kindle copy today.
So another feature of this book is the, you know, incredible care and diligence with which you
describe the nuts and bolts of how this field has come to be so successful, and the mathematics of it.
As I said, there's a thousand equations.
There's hundreds of illustrations.
There's interviews.
It's an incredible book.
As I said, it's sort of this hybrid new paradigm that's a blend between a textbook
and a thriller, you know, a historical thriller, and, you know, kind of modern-day application of these tools. But one of the kind of heroes in the book is a technique called stochastic gradient descent. And I certainly wasn't familiar with it. I knew about gradient descent. I've known about it since the time of Isaac Newton. But the question is how it works so well, given that these landscapes that you describe, you know, we
can only visualize in the book as two-dimensional, you know, three-dimensional projections of these
things. How is it possible in millions- or trillions-dimensional landscapes that this SGD method
works so well? First of all, could you explain it for the audience? And then how is it that it
works so well, and how has it come to be this kind of superhero of ML techniques today?
Yeah, so the high-dimensional landscapes you're referring to are what I call
loss landscapes: the error that a network makes, and it is error as a function of the
parameters.
So if it was, for instance, one parameter, you would just have a curve, a 1D curve, but if it's
two parameters, then you have some sort of surface, a surface in 3D.
Of course, these things have hundreds of billions, if not these days close to a trillion-plus,
parameters. So the loss function is in some extremely high dimensional space, and also
there are lots of non-linearities in the networks. So the shape of the loss landscape
is not convex. So it's not some sort of simple bowl-shaped surface where if you
start off at some high point on the surface you can just do simple gradient descent
and be guaranteed of, you know, finding the global minimum. And
we don't even know if these things have something called a global minimum.
So these are extremely high dimensional surfaces with lots of hills and valleys.
And the weird thing is if you just did gradient descent, if you just went, you know, small
step by small step down the loss landscape, trying to find a region of that landscape
where the error that the network is making is very low, you might end up getting stuck in
some deep local minimum and never be able to get out of it.
So stochastic gradient descent is this idea that you kind of do a drunkard's walk down that slope.
And you're taking steps not just always in the direction of the negative of the gradient,
but you are taking steps that have some sort of stochasticity.
And it's that stochasticity that potentially allows you to escape these local minima
and end up finding what might be an optimal minimum,
even though we don't know if it'll find a global minimum
or even if one exists,
but it does end up finding some sort of optimal minimum,
which represents a state where the network is making
a low enough error
that it is actually functioning the way you want it to.
So it's the stochasticity that seems to be
allowing us to navigate this extremely complex loss landscape
and escape local minima.
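To make the mechanics concrete, here is a minimal Python sketch of stochastic gradient descent. It rests on assumptions not spelled out in the conversation: the randomness comes from estimating the gradient on a small random mini-batch of the data at each step, which is the usual source of stochasticity in practice, and the problem is a toy line fit whose loss surface is a simple convex bowl, so it shows the noisy steps but not the escape from local minima described above. The function name `sgd_fit_line` and all parameter values are hypothetical.

```python
import random


def sgd_fit_line(xs, ys, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD on squared error (toy sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a different random ordering every pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error, estimated on this batch only.
            # Each batch gives a slightly different, noisy downhill direction.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Step against the estimated gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data generated from the line y = 3x - 1 (made up for illustration).
xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x - 1.0 for x in xs]
w, b = sgd_fit_line(xs, ys)
print(w, b)
```

Plain gradient descent would compute `grad_w` and `grad_b` over the whole data set at every step; the only change here is that each step sees a random subset, which is what makes the path down the landscape noisy.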
And the other thing that, you know, kind of resonates very highly is, of course, the Perceptron.
I think that is the, you know, main character energy of this book.
Can you give some sort of description for maybe a layperson of how these things were
conceived of and what they fundamentally do in terms of simplifying, you know, these massive,
you know, kind of data sets or whatever into
a tractable problem, maybe not always soluble, but at least tractable through very simple mathematics?
But what is the fundamental job of a perceptron?
I viewed it, you know, before as sort of this, you know, kind of black box, literally,
you know, black box.
But now I see it more as kind of the transistor, the qubit, the element of ML.
So can you describe that for the audience and whether or not you think that we'll still be
talking about them and using them 50 years from now?
So the perceptron is the name given to the first artificial neural network, right?
And it was Frank Rosenblatt who designed this artificial neuron.
And the artificial neuron is a very, very simplified version of what we think is happening
in our biological neurons.
So the biological neurons have a whole bunch of inputs that come in through the dendrites.
The biological neuron does some computation.
And then based on the results of that computation, it produces
an output. And you can think of this same thing now implemented as a piece of software, which is what
the perceptron is. Imagine, you know, a circular figure, which is the body of the
artificial neuron, inputs are coming in. Let's say you have, you know, three inputs coming in,
X1, X2, and X3. And what the artificial neuron does is it basically does a weighted sum of these inputs.
So each of these inputs has associated with it a strength or a weight, like W1 for X1, W2 for X2, and W3 for X3.
So it will do a weighted sum of the inputs.
And if the weighted sum exceeds some threshold, it will output a plus one.
If the weighted sum is less than a threshold, it outputs a minus one.
This was the computational unit that was the perceptron.
And it was amazing that something this simple could then be used to do, for instance,
classifying two sets of images into images that are cats and images that are dogs, right?
So think about 10 by 10 images, for argument's sake, where
these images are black and white and they are either images of cats or
images of dogs. And some human has painstakingly looked at these images and said, oh, if it's a cat,
we're going to call it plus one. If it's a dog, we're going to call it minus one, right?
Now, you take each one of these images. If it's 10 by 10, that means there are 100 pixels.
You turn the image into a single vector that is 100 elements long, where each element of that
vector represents one pixel of information. You feed these 100 pixels into the perceptron.
So now the perceptron instead of having three inputs is going to have 100 inputs because there are 100 pixels coming in.
And you're training the perceptron to learn the weights that are necessary in order to take a certain image and output a minus 1 or a plus 1.
And as long as these images are separable in 100 dimensional space into cats and dogs,
where cats are in one part of the 100 dimensional space and dogs are in another part of
100 dimensional space and there's a clear gap between the two in this mathematical space,
the perceptron will find a plane, hyperplane, that is capable of separating the dogs from the cats.
And then when you have a new image and you want to know whether it's a cat or a dog,
all you have to do is take the image,
linearize it into the 100 pixels,
feed it into the perceptron that you have trained,
and it's going to say, oh, this is plus one or minus one.
It doesn't know dog from cat.
All it knows is: this side of the hyperplane,
I'm going to call it cats; that side of the hyperplane,
I'm going to call it dogs, or whatever.
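The procedure described above can be sketched in Python. This is a hedged illustration, not Rosenblatt's original implementation: the "images" are hypothetical 4-pixel vectors rather than 100-pixel ones, the pixel values are made up, and the threshold is folded into a `bias` term (a weighted sum exceeding a threshold is the same as the weighted sum plus a bias exceeding zero). The update used is the classic perceptron rule: on a mistake, nudge the weights toward the correct label.

```python
def predict(weights, bias, x):
    """Weighted sum of the inputs; output +1 above the threshold, else -1."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else -1


def train_perceptron(samples, labels, max_epochs=100):
    """Perceptron learning rule: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:
                # Move the hyperplane toward classifying this sample correctly.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return weights, bias


# Hypothetical flattened 4-pixel "images": +1 for cats, -1 for dogs.
samples = [
    [1.0, 1.0, 0.0, 0.0], [1.0, 0.8, 0.1, 0.0], [0.9, 1.0, 0.0, 0.2],  # cats
    [0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 1.0, 0.9], [0.0, 0.2, 0.8, 1.0],  # dogs
]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(samples, labels)
new_image = [0.8, 0.9, 0.1, 0.1]
print(predict(weights, bias, new_image))
```

Because the two toy classes are linearly separable, the loop stops once a full pass produces no mistakes. The trained `weights` and `bias` define the separating hyperplane, and classifying a new image is a single weighted sum.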
And then this was the beginning.
And even the units in today's neural networks are just slightly more sophisticated forms
of the artificial neuron
that Rosenblatt came up with in the 1950s,
and they just have additional elements that bring in non-linearities
and allow you to train multilayered neural networks
and things like that.
But in essence, they are still simple computational units
that are nowhere near as complicated as what a biological neuron is,
and yet they do amazing things because of the fact
that we can interconnect hundreds of billions of these things together.
Will they be there
50 years from now?
Oh my, I don't see why not.
Because we have an existence proof of a machine that does something really well,
which is our brain.
And our brain, one thing we can definitely say about our brain is it is made up of a whole
bunch of neural networks, right?
I mean, there are, you know, 86 billion neurons in there with 100 trillion connections.
And even a very simplified model of that is a very complicated neural
network. And it's obviously doing amazing things. So no reason to think that neural networks won't be around.
But we might come up with ways of interconnecting these neurons that are very different from the
ways we do it today. So the architecture of these neural networks might be very different 50 years from now.
But the idea that we'll have neural networks, I think they'll survive.
Oh, yeah.
So, I know it's late there, but if you'll indulge me with sort of a two-part question, where
the first part of the final question is just a relatively rapid-fire question, which is:
Richard Feynman, you probably know, was asked, you know, what is the nature of a scientific
model of reality?
And he gave an example
where if an alien
species looked at the Earth,
the planet Earth, with its atmosphere
and with the water cycle
and so forth,
if it had all the knowledge
of the laws of physics, it would know that
we have this phenomenon
called rainbows, right?
And he basically said, if you, you know,
understood basic physics, Maxwell's equations,
and, you know, at a high level,
as we do, you could make
predictions just from observations of the basic ingredients of a system. I want to ask you,
before I turn to the final question, which will also be about this phenomenon, do you think
a smart alien species, you know, looking at LLMs plus GPUs, plus machine learning, plus all
the great stuff you write about in this book, could predict that these models would hallucinate
and that they'd be sycophantic? And I'll tie that into one of your earlier books in just a second. But
do you think it was inevitable, in other words, that these things would have these pernicious,
in some sense, and very dangerous phenomena, potentially, of hallucinating? You know, I asked
it recently, you know, what books has Brian Keating written? And it said, you know, Losing the Nobel
Prize and Into the Impossible and A Brief History of Time. And it's like, well, that's nice. I wish I
had, you know, a couple percent of the book sales of Stephen Hawking, but I don't. So tell me,
would, you know, kind of an intelligent alien looking at these models and so forth
be able to predict that they would eventually come to have these pernicious phenomena
like hallucination and sycophancy?
I think so.
I think if you, you know, assuming that the aliens can look at the math, which if they are
smart, they will obviously be able to.
It's no different from us knowing, you know, if you look at the math, it's very, very obvious
why they're going to hallucinate, right?
I mean, these next token prediction machines are essentially probabilistic.
They're always trying to generate a probability distribution over their vocabulary to say
what is the most likely next word.
It's not 100% certain about what has to be produced.
And it has learned about patterns that exist in data that is not a definitive amount of
data for any particular problem.
And the way these things are constructed, they will always
output something that they think is the most likely one,
right or wrong, right?
I mean, it's just the nature of the beast.
It's just constructed in a way where, yes, if you look at the math, it's so obvious that,
you know, I find the word hallucination itself problematic because the procedure that generates
correct answers or answers that look correct to us is exactly the same procedure that
results in hallucination.
So anyone who can peek at the math will immediately
say, yes, of course these things will hallucinate.
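The point that the same procedure produces both right and wrong answers can be seen in a small Python sketch of next-token prediction. Everything here is an illustrative assumption: the four-word vocabulary and the scores (`logits`) are made up, and a real model would compute the scores from the preceding text with a neural network. What the sketch keeps is the structure: scores become a probability distribution over the vocabulary, and a token is always drawn from it.

```python
import math
import random


def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, rng):
    """Draw one token according to the model's probabilities.

    There is no branch for "I don't know": some token is always emitted,
    whether or not it happens to be factually right.
    """
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical vocabulary and made-up scores for the next word.
vocab = ["Losing", "Into", "A", "Cosmos"]
logits = [2.0, 1.5, 0.5, -1.0]

probs = softmax(logits)
rng = random.Random(0)
samples = [sample_next_token(vocab, logits, rng) for _ in range(1000)]
print(dict(zip(vocab, probs)))
```

Nothing in this loop distinguishes a correct continuation from an incorrect one. A low-probability token is still sometimes drawn, and a high-probability token can still be wrong, which is why the procedure that generates answers that look correct is the same one that generates hallucinations.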
Yes, and in some sense, you know, as I said, it could be useful for me, you know,
at least to buoy my confidence.
But, of course, we do want things to give us factual information.
But of course, to the extent that they mimic the human mind, you know, this is perhaps inevitable.
So I want to follow up with a final question.
You've been very generous with your time on a late evening there in Bangalore.
In The Man Who Wasn't There, you write about these patients who exhibit these sorts of very strange phenomena, including what you define, I think, in that book, as maladies of the self. And these are confabulations and hallucinations, not as much about sycophancy, perhaps, as LLMs are prone to. But they lose a sense of themselves. And I'm wondering, have, you know, the explorations that you've done in LLMs kind of
refined the way that you think about the way the human mind works? And we kind of mentioned this.
And my hope is that you'll write a book about humans, but you kind of did earlier. So what have you
learned about the human condition, these unusual traits that, you know, you talk about in your
earlier book, that, by the nature of being complex systems that have emergent phenomena,
you sort of will get strange behaviors, maybe even worse than hallucination
and sycophancy, maybe true disorders, maladies, true maladies of the self?
What do you make of that as kind of a learning that you encountered in writing this new
book, you know, after having written this incredible book, The Man Who Wasn't There?
I think for me, the writing of The Man Who Wasn't There was very instrumental in making me
think of what is happening within us in computational terms.
It's kind of, when you view our perception of our bodies, of our cognitive selves, etc.,
through the lens of the brain creating models of the environment, models of us embedded in
that environment, using the models to make predictions about what is out there, including
making predictions about our body, and the fact that what we perceive at any moment
are predictions that the brain is making, once you view everything within that framework,
it again becomes very clear that while on average and most of the time the brain is doing what
it's supposed to be doing, and whatever we are perceiving is more or less congruent with physical
reality, so we're not hallucinating, we're not being psychotic, the fact that it is
a computational process and the fact that there is stochasticity in that process mean that
these computational systems are themselves prone to making wrong predictions.
And because what we perceive is the brain's prediction at any given moment, and we take the
prediction to be real and truthful, even if the prediction is wrong, it will feel real
to us.
So it's very easy to understand why we end up having states of psychosis or states of hallucination.
So now when you think of what's happening with machine learning models and the fact that we are seeing some of these processes, you know, in very minor ways being duplicated in machines, the connections become more and more obvious: we might even end up building machines that function like us, which will themselves be prone to psychosis. I mean, right now we complain about the hallucinations that an LLM makes because it presents wrong answers to us.
But imagine building a machine that is using its internal predictive mechanisms to understand its own state and its behavior in the world.
And if those predictions about its own state are wrong, it is essentially hallucinating about itself.
We're not too far away from building at least simple versions of such machines.
And I don't even want to imagine where that will go.
But if we take a computational view of what's happening in our
brains and the things that we're doing when we build these machines, the parallels are
pretty striking.
Anil Ananthaswamy, thank you so much for writing this wonderful book. It's really one of my
favorite books. I only regret that I didn't read it earlier. I've interviewed, you know,
dozens and dozens of people from both, you know, pro-AI and anti-AI lenses, but understanding
the details behind, you know, what's underneath the hood was a real treat. And you approach it
as you do with all your writing, in such a beautiful, eloquent, and careful way that I just
can't thank you enough for this and the opportunity to interview you and for you to stay up late
on this late December evening for you over there in Bangalore. Thank you so much. This has been a
real pleasure.
Well, Brian, thank you very much for having me on your podcast. It's been my pleasure
entirely. Thank you.
If you enjoyed this conversation with Anil, you'll want to check out the
follow-up interview I did with Yann LeCun. We tackle many of these same questions, but from the
perspective of someone building the systems themselves.
Yann's no AI doomer.
That episode is linked right here.
Watch it right now.
And if this conversation helped sharpen how you think about AI, not just what to believe,
but how to question it and how to understand when it's actually doing then.
Please do me a favor.
Like this video, subscribe to the channel, and leave a comment with a question you think
the AI community is still avoiding.
I read them all.
See you in the next episode.
