Behind The Tech with Kevin Scott - Surya Ganguli: Innovator in artificial intelligence
Episode Date: May 23, 2019. From quantum mechanics and string theory to monkey brains and electrophysiology mapping, Surya's take on AI may surprise you. Can we guide AI so it's good for the world? This researcher is taking his cues from biology.
Transcript
A lot of people, of course, listening to your podcast and out there in the world have become incredibly successful based on the rise of digital technology.
But I think we need to think about digital technology now as a suboptimal legacy technology.
To achieve energy-efficient artificial intelligence, we really got to take cues from biology.
Hi, everyone. Welcome to Behind the Tech.
I'm your host, Kevin Scott, Chief Technology Officer for Microsoft.
In this podcast, we're going to get behind the tech.
We'll talk with some of the people who've made our modern tech world possible and understand what motivated them to create what they did.
So join me to maybe learn a little bit about the history of computing
and get a few behind-the-scenes insights into what's happening today.
Stick around.
Hello and welcome to the show.
I'm Christina Warren, Senior Cloud Advocate at Microsoft.
And I'm Kevin Scott.
Today we're excited to have with us Surya Ganguli. Yeah, Surya is a professor at Stanford working
at the intersection of a bunch of different super interesting fields. So he's a physicist,
he's a neuroscientist, he is actually quite familiar with modern psychology, and he's a
computer scientist. And so he's applying all of this interest that he
has across all of these different fields to trying to make sure that we're building better and more
interesting biologically and psychologically inspired machine learning systems and AI.
I love it. It's so cool. But Kevin, I know that you and Surya are going to have like a totally serious, totally informed and thought-provoking conversation about all things AI.
But I wonder if maybe first we can kind of go into like the pop culture aspect of AI, you know, with all the different movies and comics and various cult classics.
That would be great.
Right?
Because a lot of times these things really kind of indulge our imagination, but also our fears about AI.
Like, you know, I'm a really big fan of the film Blade Runner, the Ridley Scott classic, which is based on like the Philip K. Dick book.
And there are other things like even Westworld and films like 2001: A Space Odyssey.
A lot of these kind of views of AI aren't necessarily positive, right?
Yeah, I think it's a really challenging thing because I think when you think about AI in
general, like AI done well disappears into the background and it is a thing that exists to
empower humans, to augment us, to support us, to enhance us. And like, it isn't a substitute
for humanity. The thing that I've been really thinking about is that fiction, to a certain extent,
has always been a way that we reflect the hopes and anxieties about the impact technology has
on our lives. So it's this, like, really beautiful way that we express our struggles with sort of this unknown future.
And this imagination that we've got right now about AI, like it's sort of really playing out in fiction.
So I remember when I was growing up, we were still in the halo of the space race,
which had created this incredible canvas for all these writers and artists to make all sorts of amazing literature and films about what it would be like for humans to go beyond Earth, beyond their terrestrial origins.
And the thing that really gave all of those artists permission to create the stories that they created I think is sort of the speech that President Kennedy gave in 1961 when he announced the Apollo program.
So if you look at the text of that speech, which I have because I've been writing a book, one of the things that he said in this speech is we're going to set sail on this new sea because there is new
knowledge to be gained and new rights to be won, and they must be won and used for the progress of
all people. It's like really funny, like you read the text of this speech and you could just sort of
substitute out like all of the things that he said about the anxieties that people were having over,
you know, sort of rockets and the space program and all of these technologies and, like, replace it with AI.
And the speech would totally make sense. And that goal that we had to, you know, go to the moon and start the space race, like, I think it gave this
incredible permission to all of these artists and thinkers to, like, sort of imagine, like,
what this future could be. And I sort of feel like we're missing that a little bit right now with AI.
Anyway, we could wax poetic about all of this for a really long time.
We could go on and on and on, and we do.
And on and on.
Yes, but we should probably meet with our guest.
Yeah, let's talk to Surya.
I'm very pleased to introduce today's guest, Surya Ganguli. Surya is an assistant professor
of applied physics and, by courtesy, of neurobiology and electrical engineering at Stanford University.
He's considered by many to be one of the leading experts in the field of artificial intelligence.
So welcome, Surya, and thanks for being on the show.
Yeah, thanks for having me.
So I got to be familiar with your work through the work that I started doing with Stanford's Human-Centered AI Institute.
And so, like, we're going to get to, like, all the cool stuff that you're doing right now.
But I'm super curious, like, you've got this crazy interesting educational background where you just, you seem super curious.
Like, where did that come from?
Like, how did you start in tech as a kid?
Yeah, that was kind of my misspent wayward youth. So, you know, I kind
of always wanted to be a scientist. And then, you know, I grew up in Irvine. I read all the books
in the public high school about artificial intelligence. And they're all written by old
professors at MIT. So, I kind of wanted to work in AI even in high school. So, I went to MIT. I
took my first AI course. This was, you know, the end of the 90s or so.
It was all old school expert systems, logic-based systems, all that.
And what year was this?
This was around in the middle of the 90s, essentially.
So then I asked my professor, shouldn't we try to reverse engineer the brain?
And I'll never forget his answer.
He told me, no, no, no, Surya, just ignore the brain.
It'll just confuse you.
All we got to do is figure out the software program the brain is running. And even as a freshman, that didn't feel right to me.
And so I wasn't sure I wanted to do AI anymore. I stuck with a CS degree, but I had friends taking math and physics courses, and I kind of enjoyed that. So then I ended up triple majoring through,
you know, just serendipitously in math, physics, computer science.
How do you serendipitously triple major?
I just took courses for fun, and then I checked the requirements for the courses in my junior
year, and I realized, okay, if I just take the junior physics lab, which is a terrible
experimental course, I mean, it's a good experimental course, but not if you want to
be a theorist, then I could get a physics degree and the math degree would come for
free based on what I'd done.
And so I just sucked it up and took the experimental physics course.
That's awesome. And were your parents scientists or engineers?
No, my dad was an engineer. Yeah, he's a mechanical engineer. And my mom was actually
a philosophy major in undergrad. And then she became practical and became a certified public
accountant for the IRS. Oh, interesting. And so were they encouraging you to pursue a particular
path?
They were super encouraging. I think my dad was like an amazing mentor. He still has quotes that
he would say all the time that stick in my head. Like, we were watching a video about neurosurgery,
right? There was a neurosurgeon who was working his butt off doing the surgery. He was tired, and he made it happen. And my dad said something like, Surya, the power of a concentrated mind is limitless.
Wow.
And it just stuck.
And he was full of these kinds of inspirational things.
And I was kind of naturally inclined to study science and math.
And he totally, like, encouraged that.
What was the most interesting course that you took as an undergraduate?
So you were sampling, like, this great breadth of things.
Like, what was the most interesting?
I loved quantum mechanics.
It was amazing.
And what about it really interested you?
It was just the power of mathematics to penetrate into the microscopic world in a way that human intuition could not.
And then slowly you think about it and think about it and think you gain intuition.
And it was just amazing.
And then you can predict how the microscopic world will evolve, verify those predictions.
Just the power of mathematics to penetrate nature.
I really felt that for the first time as an undergraduate then.
Because up until then, everything was studying topics that you could sort of reason about, like electricity, magnetism, waves, and so on.
But quantum mechanics was different.
Yeah, it's still something that I have a hard time getting my head fully wrapped around.
I mean, at least for me, you sort of nailed it in the, like, your normal intuitions as
a human being with a normal set of human experiences are all wrong for, like, trying to understand
the quantum world.
Yeah, and you construct stories in your mind.
But a famous physicist once said, if you think you understand quantum mechanics, then you don't.
So there's this kind of catch-22 nature to it.
Yeah, I've got the Feynman lectures sitting by my favorite reading chair at home.
And when I'm feeling especially energetic, I will grab one of them and, like, try to understand.
I think, you know, that was one of the geniuses of Feynman: like, he was so good at relating these, like, very complicated, in many cases completely non-intuitive, concepts in a way that you could understand.
Absolutely.
I love those lectures.
I lived off of those lectures when I was an undergrad.
It was super fun.
That's awesome. And so this spark that ignited for you
around quantum mechanics and physics
sort of led you to pursue a graduate degree in physics,
right?
Yeah.
On a whim, I decided to not go to graduate school
in computer science.
I decided to go to graduate school in physics
because I felt like it'd be fun.
And on a whim, I decided to do my PhD in string theory
because at the time, it felt like the most fundamental subject in physics.
So explain to our audience what string theory is.
Explain to me what string theory is.
Yeah.
So basically, you know, we have these twin theories, right?
General relativity, which governs the curvature of space and time on cosmic scales.
And we have quantum mechanics, which governs the temporal evolution of the nanoworld, right, on very microscopic scales. And there's no one theory that can really unify those together,
where you can get both gravity and quantum mechanics together.
String theory is one where each particle becomes a little string,
and different modes of vibration of the string
become different types of physics.
Like one mode of vibration becomes the graviton,
another mode of vibration becomes photons, and so on.
And all in a quantum mechanical way, so you can really unify these two things. And it's a
mathematically self-consistent theory. It's absolutely beautiful and very difficult to
connect to experiments. Yeah, one of my colleagues at Google had gotten his degree in string theory
at Stanford. Oh, was this Yonatan Zunger? Yeah. Yeah, I was in classes with him.
Yeah, super fun guy.
And like he did absolutely nothing
with string theory at Google,
but he was like one of the more
interesting and brilliant people I've worked with.
It's great training.
And to be honest with you,
if I had to do everything over again,
I would still do a PhD in string theory.
That's interesting.
And string theory is like a little bit
out of vogue right now.
Or is that a controversial thing to assert?
It's evolving, actually.
I mean, I still talk to the Stanford string theorists to some extent.
And there's very interesting ideas about holography and tensor mathematics and now
applying string theory to quantum condensed matter systems in ways that, like, you can
take tabletop quantum condensed matter physics experiments and describe them using a dual viewpoint using general relativity.
And that idea came out of string theory. So, it's morphed.
That's super interesting. So, PhD in string theory, then what?
Yeah. So, you know, I kind of had a battle in my soul between do I want to become a pure
mathematician or a scientist, right? If I stayed in string theory, I worked on the more mathematical topics in string theory.
But part of me always wanted to connect to nature.
Everything was driven by understanding, you know, nature, right?
And I didn't feel like I could really connect to nature in string theory.
So I took my first course in computational neuroscience at the end of my PhD.
It was a fascinating course taught by great professors at Berkeley,
Yang Dan, Frédéric Theunissen, and Bruno Olshausen,
who discovered sparse coding, which is foundational in machine learning now.
And it was an amazing course.
It was all about reverse engineering the brain, trying to understand it, and so on.
And I just completely fell in love.
And that was what my freshman kind of soul was yearning for.
And I only discovered it at the end of my PhD.
And then I couldn't even look back. Like, that's what I did all my postdocs.
It's sort of interesting. I mean, like, I was watching the talk that you gave at the
launch of the Human-Centered AI Institute at Stanford, and you said this thing that I think is,
you know, really one of the impediments to, like, getting people to more fully embrace
the connections between neuroscience and machine learning, which is, like, all of us, based
on our background, sort of reduce the complexity of problems in ways that are sort of convenient
to our training.
And so, like, you were sort of saying that, you know, computer scientists, like, when
we think about artificial neural networks, like like we have this very reductionist way of looking at, you know, how to model a synapse.
It's like this single scalar.
It's like this weight, you know, like when we talk about billion parameter models, like, you know, it's sort of like the very loose moral equivalent of like a billion synapse system.
But like we model it with a single scalar.
And like if you look at the neurobiology of what a synapse is, like, it's this incredibly
complicated system.
And so, I'm just sort of interested.
Like, you go from this discipline string theory where part of what you're trying to do is,
like, develop this beautiful, elegant model of the universe.
And, like, you jump into this neuroscience world where everything is
just complicated and messy. So like how did you go from one to the other? Like they just seem to be
like opposing ideas to me. Yeah, it's easy. I think Pólya had a famous statement: whenever you're
attacking a new problem, you both know too much and too little. Too much of the wrong thing and
too little of the right thing. So I actually decided to temporarily forget my training in physics.
And I went straight to UCSF, a medical school, to do my postdoc,
where I was surrounded by experimental neuroscientists.
And I really spent a lot of time trying to understand how experimentalists think.
What are the questions that they're interested in?
What are they asking?
What's important to them?
Because your success as a theorist in any field will partially be determined by your ability to change what the experimentalists do,
what the practitioners or engineers do, and so on.
I mean, just from a career perspective, like, let's forget about all the complicated science stuff.
Like, that was sort of a brave thing to do, right?
Like, I mean, for, like, listeners who haven't been in academia, like, academia is, like, a very, you know, in many ways, like, a rigid system.
Like, you go get your degree, you get your PhD.
Like, hopefully you can jump into a tenure track position.
Like, you know, you may use a postdoc as a thing that gets you to.
And, like, switching disciplines, that is a horribly risky thing to do.
Like, where did you get the courage to do that?
It's either courage or idiocy.
You know, I never worried too much about the future, the long-term future.
Honestly, what I was thinking then was, I'm pretty sure I don't want to do string theory.
I'm super excited about this neuroscience thing.
Somebody had offered me a fellowship to just learn about it for, you know, several years. And I was like, let's do it. And if it doesn't work out,
Wall Street is always waiting. But I was not that excited about that at all. So,
I just kind of jumped in. Honestly, I thought this is kind of ridiculous.
It's probably never going to work out. But at least for the next couple of years,
I can have a lot of fun. Yeah, there's like the career risk. There's also,
you know, you are sort
of making yourself vulnerable in a way, right? Because you spent all of this time accumulating
expertise and with this PhD in physics, and like now you're jumping into this brand new domain
where you have to go bootstrap yourself again. Like that's also a thing that people sometimes
have a hard time doing. Yeah, I mean, it does sound like a lot, but actually there is a well-worn
pathway from physics to many other fields, especially neuroscience. If you look at a lot of the top
theoretical neuroscientists out there in the world, a lot of them were actually trained in physics
to begin with, and increasingly they're being trained in computer science. And so
what was really nice is that the neuroscientists were super welcoming. They needed the help of
quantitative people. So there's lots of opportunities for people trained in the quantitative sciences,
you know, computer science, physics, mathematics, to really make an impact in neuroscience.
And you can hit the ground running pretty quickly compared to what it takes to do,
you know, research in string theory. And so as you entered this brand new field as a postdoc, what were some of the
interesting connections that you were able to make that were only possible because you had this,
you know, sort of unique point of view and background? You know, what I always kind of
thought about slightly differently from some of my colleagues at the time was, you know, thinking about a high-dimensional dynamical systems view of the brain. So, my first project in neuroscience
that led to a publication in Neuron was how do monkeys pay attention, right? We get distracted
by bottom-up distractors. We also have top-down attention. Monkeys have both of these things.
They can both focus in a particular location of space and get distracted. And there was some strange neural dynamics that was occurring in the monkey's
brain, the part of the brain that allocates attention. And nobody could understand it.
When I attacked it, I thought about it from a higher dimensional perspective,
and that cracked open why the brain was operating that way. So, to make a long story short.
Yeah. Well, actually, let's go into the long story. So, like, how do you approach a problem like that? So, even though a monkey brain is, like, not quite like a Homo sapiens brain, but it's still a very complicated mechanism, how do you, like, even get the data that you need to build a better understanding or, like, quantitative model of what's going on?
Yeah, so I had fantastic collaborators, Mickey Goldberg and Michael Shadlen.
These are experimentalists
who can record many, many neurons from the brain.
In this particular experiment,
they recorded from the parietal cortex of the monkey,
which is sort of, it has a map of visual space.
And there's patterns of activity
in one-to-one correspondence
with locations in visual space.
And wherever the pattern happens to reside
is where the monkey's allocating attention.
And so you can make this bump move around,
this bump of activity move around in the brain
by flashing distractors.
You can make it move around in a top-down fashion
by having the monkey allocate its attention
by doing a task at a certain point in visual space.
And so they did both of these manipulations
while they recorded in the brain.
They had lots of neurons.
They didn't have a simple theory for why the neurons showed the dynamics they did.
And the recording is something like-
Electrophysiology recording. So, they stick electrodes into the brain,
and they eavesdrop on the electrical signals that neurons emit when they fire
action potentials.
So, it's not directly observing the firing. It's sort of observing some sort of secondary effect
of like a bunch of things firing.
It's pretty close.
So you have electrodes that eavesdrop on a small number of neurons near it, and you can demix that signal because each neuron's firing has a different shape.
Interesting.
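(For the curious: here's a minimal, hypothetical sketch of what "demixing" spikes by waveform shape can look like computationally. The waveform shapes, noise level, and nearest-template matching are illustrative assumptions, not the actual pipeline used in those experiments.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical neurons with distinct spike waveform "templates" (illustrative shapes).
t = np.linspace(0, 1, 30)
templates = np.stack([
    np.exp(-((t - 0.3) / 0.05) ** 2) - 0.4 * np.exp(-((t - 0.5) / 0.1) ** 2),
    0.7 * np.exp(-((t - 0.4) / 0.08) ** 2) - 0.8 * np.exp(-((t - 0.6) / 0.07) ** 2),
])

# Simulate detected spike snippets: each is one neuron's template plus recording noise.
labels_true = rng.integers(0, 2, size=200)
snippets = templates[labels_true] + 0.1 * rng.standard_normal((200, 30))

# "Demix" by assigning each snippet to the nearest template (least-squares match).
dists = ((snippets[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2)
labels_sorted = dists.argmin(axis=1)

accuracy = (labels_sorted == labels_true).mean()
print(f"fraction of spikes assigned to the correct neuron: {accuracy:.2f}")
```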
That's super fascinating. You know, one of the things that's driven a ton of progress in machine learning over the past 15 years is that a lot of our systems are
benefiting from things that are growing exponentially fast. So like data for training,
the compute that you're using to run the training, and like you're able to do large-scale experiments
with like very quick turnaround. And so like you sort of take all of those together and, like, you can turn the crank
on an experimental cycle really quickly and, you know, just sort of drive to, like, larger
and larger scale in your models.
But, like, when you're doing these biological experiments, like, you're sort of missing
some of these things.
Definitely, yeah.
That's why there's still a lot of room in
deep learning and machine learning for the small data problem, where you have small amounts of data
that's very expensive to collect. How do you detect patterns in high-dimensional data sets
where you don't have that many data points? Yeah. The first time that you and I chatted,
you gave me a recommendation for a book to read whose title I'm totally spacing on now,
but like it's sitting in the front seat of my car about, you know, sort of the design of biological neural systems.
Oh, yeah, Principles of Neural Design.
Yeah, it's just like a fantastic book. I read, you know, the first 50 pages or so of it. And,
you know, the thing that's, I think, really fascinating is some of the big models that we're training right now
that are the things that are sort of sensational
in the world of machine learning
take an unbelievable amount of power to train.
And so we just finished training a model the other day
that was sort of a three-petaflop-day run.
And so you're doing this run
on hundreds and hundreds and hundreds of 300 watt chips.
And, you know, it's like rows and rows of servers
and data centers connected by miles of cabling.
And the power envelope in these things are,
you know, like a cluster of these machines
might be sort of a megawatt of power
consumption when they're at full utilization. And like a human brain is what? It's 20 watts. It's
20 watts in a steady state. So, it's like just unbelievable to me, like what this machine that
sits inside of your head is able to do relative to the things that we're doing right now that are
like, you know, just the vanguard in machine learning.
Right.
Yeah, we've actually been thinking about that.
We've actually been looking into a way to get inspiring directions for researchers to
think about order of magnitude discrepancies between what the brain does and what machine
learning systems do.
You hit the nail upon the head: the energy, or the power dissipation, is a multiple-orders-of-magnitude discrepancy.
A part of that has to do with biological systems operate very differently from our computers,
right? In digital computation, every single bit has to flip with very, very low probabilities
of error and very, very fast. So the laws of thermodynamics extract a high energetic cost
for every fast and reliable bit flip. But biological systems
operate very differently. You look at them and they look like they're noisy, chaotic, out of
control. But what they've done is they've made every intermediate step of the computation just
good enough for the final answer to be just accurate enough, thereby not spending excess
power at intermediate steps of the computation. A lot of people, of course, listening to your
podcast and out there in the world have become
incredibly successful based on the rise of digital technology.
But I think we need to think about digital technology now as a suboptimal legacy technology.
To achieve energy-efficient artificial intelligence, we really got to take cues from biology.
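(To make the scale of that gap concrete, here's a quick back-of-the-envelope calculation using the rough figures from this conversation, a roughly one-megawatt cluster versus a 20-watt brain; these are conversational estimates, not measured benchmarks.)

```python
# Back-of-the-envelope comparison using the rough numbers from the conversation.
cluster_power_watts = 1_000_000  # ~1 MW cluster at full utilization (rough estimate)
brain_power_watts = 20           # human brain in a steady state

print(f"power ratio, cluster vs. brain: {cluster_power_watts / brain_power_watts:,.0f}x")

# One day of the cluster's energy, expressed in brain-days (assumption: the
# whole megawatt is attributable to the training run for that day).
seconds_per_day = 24 * 3600
run_energy_joules = cluster_power_watts * seconds_per_day
brain_days = run_energy_joules / (brain_power_watts * seconds_per_day)
print(f"one cluster-day could power a brain for {brain_days:,.0f} days (~{brain_days / 365:.0f} years)")
```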
The other aspect of what you just asked, the data hungriness of current AI algorithms,
I suspect that's because the existing framework of training bigger models on bigger data sets might be a little bit like climbing a tree to get to the moon if the moon is considered the goal of general intelligence, right?
I've looked into the numbers on this.
If you look at the data requirements of AI systems compared to humans... You know, like AlphaGo Zero, it practiced
about 33 million games, right? So, if a human were to play 33 million games, it would have to play,
I did the calculation recently, around, I think it was 300 games a day every day for 30 years,
right? And our top Go masters do it on much less.
Right.
And then these systems probably won't be able to generalize. If you change one rule, a human would still do very well. If you change the size of the board, the human would do well, but these systems wouldn't. So I think humans have a very different learning algorithm.
So some of the specific things that you have examined are, like, sort of ways to try to accelerate the convergence of the actual
numerical optimization systems that sort of sit at the core of modern machine learning training
algorithms. And then some of the things that you're doing are much more closely associated
with these biological systems. Yeah, I can give you kind of two examples. So, we draw inspiration
both from the physical world and the biological world when we work in AI problems. We also directly attack neuroscience problems and
physics problems as well. But one example from the neuroscience world is really taking the
complexity of synapses seriously, right? Synapses are incredibly complex signal processing devices.
And in artificial neural networks, we just treat them as a scalar value. If you take into account
the dynamical
complexity of synapses, you can create different types of artificial neural networks where the
synapses can retain a memory trace of all the changes that they've undergone while solving a
task. So talk about that a little bit more. Like, I understand a little bit about neurobiology of
synapses, but like, I don't think I fully understand, like, this whole notion of being
able to have this memory trace.
Yeah, I'll give you an example of that.
So, you know, one way that we used a potential memory trace in the synapse is to try to attack the problem of catastrophic forgetting, right?
So, the catastrophic forgetting problem in artificial neural networks is that you train on task A, you learn all your synaptic weights.
Then you train on task B, and you relearn the weights, but they erase whatever information was in the weights about task A.
Yeah, and so, like, a good example of this is, like, the early reinforcement learning
systems that people were building to play video games.
Like, you could play it, you could train a system that could get really good at playing
Pac-Man, for instance, but if you then try to get that same neural network to play another
video game, like Q*bert, it's completely
clueless.
It's completely clueless, yeah.
And like there are even, you know, some like really interesting things.
You know, someone was showing me a demo the other day where like some of the training
that we can do for game playing is like extremely, extremely brittle where you change the luminosity
of the screen or like you move like an element in the
game like one little bit and like it just really hasn't learned anything general at all about the
structure of the game yeah that's the lack of robustness which is another key issue with these
current ml systems so anyway i interrupted you so no no yeah so going back to the catastrophic
forgetting problem right if you could have each synapse retain a
memory trace of how important it was for solving a particular problem, and we developed an online
learning algorithm within the synapse that kept track of its own importance at almost no additional
computational cost compared to just gradient descent training, then as you learn subsequent problems,
you could slow down the learning rates of the important synapses and speed up the learning
rates of the unimportant synapses. And that way, you could learn the
second task without forgetting the first task. And we demonstrated that this actually worked.
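(Here's a toy sketch in the spirit of the synaptic-importance idea Surya describes; it's a deliberate simplification, not the published algorithm. Each weight accumulates an online importance trace during task A, and that trace then slows learning of important weights on task B.)

```python
import numpy as np

# Two toy tasks that care about different weights:
# task A only constrains w[0]; task B constrains both, and conflicts on w[0].
def grad_a(w):
    return np.array([w[0] - 1.0, 0.0])          # dL_A/dw for L_A = 0.5*(w0 - 1)^2

def grad_b(w):
    return np.array([w[0] + 1.0, w[1] - 2.0])   # L_B = 0.5*(w0+1)^2 + 0.5*(w1-2)^2

w = np.zeros(2)
lr_a, lr_b, strength = 0.1, 0.05, 50.0

# Task A: plain gradient descent, while accumulating an online per-weight
# importance trace: minus gradient times the weight update (path-integral style).
omega = np.zeros(2)
for _ in range(200):
    g = grad_a(w)
    dw = -lr_a * g
    omega += -g * dw      # importance accrues where updates actually reduced the loss
    w += dw
w_a = w.copy()

# Task B with a penalty anchoring important weights near their task-A values:
# important synapses (large omega) learn slowly; unimportant ones learn freely.
for _ in range(200):
    g = grad_b(w) + strength * omega * (w - w_a)
    w -= lr_b * g

# Naive sequential training for comparison: forgets task A entirely.
w_naive = w_a.copy()
for _ in range(200):
    w_naive -= lr_b * grad_b(w_naive)

loss_a = lambda w: 0.5 * (w[0] - 1.0) ** 2
print("task A loss with importance penalty:", loss_a(w))        # stays small
print("task A loss with naive training:   ", loss_a(w_naive))  # ~2.0 (forgotten)
```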
So, this was kind of inspired by taking synapses more seriously, taking the potential power of
them more seriously. Interesting. And so, this is a little bit related to this whole idea of
transfer learning, right?
Yeah, exactly.
Yeah.
We also tried to use it for transfer learning.
So transfer learning is, you know, humans are really good at learning one task, say
like ping pong, and then being really good at another racket sport compared to how they
would have been if they had no experience with a racket, right?
So they can transfer stuff from task A to task B.
A major impediment to transfer learning is, can we
come up with a mathematical formulation of pairs of tasks such that we can predict when
structure in one task will transfer to structure in another task through a neural network?
And so we were able to develop a recent theory of that that will be presented at ICLR.
And what we found is we found a mathematical formula for pairs of tasks that took into account
how common the features were that are important for solving the two tasks. It's kind of intuitive
in hindsight. If there's enough common features, then transfer learning will be successful.
What's interesting is it doesn't matter what you do with the features, right? As long as there's
some particular function of the inputs that's important for solving both task A and task B.
But task A says you do one thing with that function and task B says you do another.
That's not a problem.
So you just need a notion of commonality in the input feature space.
And we were able to formalize this all mathematically, which was super fun.
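(A toy numerical illustration of that intuition, not the ICLR theory itself: two tasks that depend on the same input feature, with each task doing something different with it. A feature learned on task A transfers to task B; a random feature does not.)

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 20, 500
X = rng.standard_normal((n, d))

# A shared "important feature": both tasks depend only on the projection x . u,
# but each task applies a different function to it (illustrative assumption).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
y_a = X @ u                    # task A: predict the feature itself
y_b = -2.0 * (X @ u) + 0.5     # task B: a different function of the same feature

# Learn task A's solution, then treat it as a 1-D feature extractor for task B.
w_a, *_ = np.linalg.lstsq(X, y_a, rcond=None)
h_transfer = X @ w_a                        # feature transferred from task A
h_random = X @ rng.standard_normal(d)       # control: a random projection

def readout_mse(h, y):
    # Fit a scalar affine readout on the feature; return its mean squared error.
    A = np.stack([h, np.ones_like(h)], axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((A @ coef - y) ** 2)

print("task B error with task-A feature:", readout_mse(h_transfer, y_b))  # ~0
print("task B error with random feature:", readout_mse(h_random, y_b))    # large
```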
Yeah, this is just fascinating stuff.
So where do you think the next set of breakthroughs are likely to come from?
Like what are you thinking about right now that you're-
Yeah, we're thinking about kind of all of it simultaneously.
We're really interested in sort of unsupervised learning, you know, what can be done in that direction.
We're thinking about more mathematical theories.
We're trying to understand, develop algorithms for interpreting neural
networks, especially neural networks that were trained to mimic the brain. Because what's
happening in neuroscience is we're increasingly starting to model more complex neural circuits
under more complex tasks. And our models themselves are complicated deep neural networks.
So then, you know, we're replacing something that we don't understand, i.e., the brain,
with an artificial network that we don't understand. And so we're developing algorithms
to do that. And we're actually using ideas from physics involving coarse graining and things like
that, where let's say you have a very, very complicated model. Can you extract from it a
simpler model where the individual connections in the model and the neurons in the simpler model
are in one-to-one correspondence with what we think is already there in the brain?
And we've recently actually, as a test case,
we've been working on the retina,
which is a deep neural network in its own right.
And people have been showing, like, for 40 years,
different artificial stimuli to the retina.
And they come up with these ad hoc models
for each of these artificial stimuli.
Yet nobody has come up with a really good model
of the retinal response to natural scenes,
the very scenes that sculpted the evolution of the retina.
We recently came up with a state-of-the-art model of the retina that involved a deep neural
network.
It's a complicated model, and now we're looking inside it to see how it responds to all of
these artificial stimuli.
That's really fascinating.
I mean, one of the things that I've thought for a long while is that even though in many, many ways the connection between artificial neural networks and biological neural networks is tenuous, the artificial networks might be a way to get insight into how the biological systems work, versus, you know, like what we typically try to do,
which is like derive this inspiration from the biological for the artificial.
Yeah, exactly.
My colleague Dan Yamins at Stanford has done some great work on that where he's come up with models of the ventral visual stream
all the way from retina to V1 to V2 to V4 to IT, which has these object detection cells.
And he trained deep neural networks to do object classification,
nothing to do with neuroscience.
And then he looked for patterns of activity in different layers of the deep network
that would optimally match the patterns of activity in different layers of, say, a monkey's brain.
Right.
When neural activity patterns were measured in different layers
in response to the same set of objects.
And he found a great match there.
Interesting.
I mean, I want to go back to this. You mentioned unsupervised learning just a minute ago. And it's one of the things that is really very exciting right now. So there's been
a bunch of breakthroughs just over the past nine months or so with applying the techniques of
unsupervised learning to natural language processing tasks.
Yeah, yeah.
You know, document summarization, question answering, translation, like a ton of things.
And so, you know, for everybody, unsupervised learning is this notion that you can train
a set of machine learning systems like they typically involve deep neural networks,
but you don't have human beings in the loop providing corrections to training
or, like, labels for particular things.
It just sort of, like, learns, like,
a general conceptual model of some domain,
and then you try to figure out how to apply it
to, like, specific problems.
In some cases, like, that application is, like, you do transfer learning from, like, the unsupervised
general model to some small supervised model that sort of specializes the unsupervised
model to a task.
But like in some cases, and this is the work that OpenAI did with their GPT-2 model, they
were doing zero-shot learning. So just with
no supervision whatsoever, like getting this thing to do useful things. And so the reason that that's
super interesting, as you well know, is that one of the things that constrains how fast you can
sort of scale up the ambition of classical machine learning systems is like
this data labeling, data engineering task is very, very difficult.
So how do you think about this?
Because in some ways, human beings can be very good at unsupervised learning.
Like a toddler, just by absorbing the universe around it, can learn things way more advanced than what a machine learning system right now is able to figure out.
Yeah, exactly.
Yeah, I think you hit the nail on the head on everything you said.
You know, going back to the first example you gave, natural language processing, unsupervised learning has been incredibly useful in natural language processing because we have a simple principle for solving an unsupervised learning problem, which is predicting the next word or maybe the next character in sequential text, right?
So if you can predict, then you can understand.
And what's been really amazing is the internal representations
that these neural networks use to solve this prediction
are actually very useful for subsequent training and supervised tasks.
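(A minimal sketch of that "predict the next token" objective, here with a character-level bigram model; real systems use deep networks over words or subwords, but the training signal is the same in spirit.)

```python
import numpy as np

# Toy illustration of the "predict the next token" objective on characters.
text = "the cat sat on the mat. the cat ate. the mat sat."
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}

# Count bigrams, then turn counts into next-character probabilities.
counts = np.ones((len(chars), len(chars)))  # add-one smoothing
for a, b in zip(text, text[1:]):
    counts[idx[a], idx[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# The unsupervised training signal: average negative log-likelihood of the next
# character. Lower means better prediction and, by this principle, a better
# implicit model of the sequence's structure.
nll = -np.mean([np.log(probs[idx[a], idx[b]]) for a, b in zip(text, text[1:])])
print(f"avg next-char negative log-likelihood: {nll:.3f} nats")
```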
And actually, I'm working with a computational linguist, Chris Manning, actually, an NLP person.
We're jointly advising a rotation student.
Who's written my favorite NLP textbook of all time.
Exactly, yeah, yeah.
I've been studying, actually, his textbook recently.
We're jointly advising a student who's analyzing how these unsupervised trained networks work.
John Hewitt, he actually came up with a really cool result that showed that if you look inside these neural networks,
they implicitly build up syntactic trees associated with sentences, right?
And so what he did was he, there's something called the dependency parse tree, where you
can take a sentence and come up with a dependency parse, and that gives you a distance between
all pairs of words in the sentence.
He showed that he can learn a simple quadratic form
going from the internal representations of these networks,
BERT and ELMo, to a scalar,
which predicts the dependency parse
between words in the sentence.
And so now what we're doing
is we're analyzing the dynamics of this model,
trying to figure out how, in an online word-by-word fashion,
it builds up this parse tree,
which has been interesting computation
because conventional computer science algorithms to build up dependency parse trees need a stack to do it.
So we suspect that a stack machinery is hiding in the dynamics of this network.
That also shows up.
We're playing around with that.
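(Here's a schematic reconstruction of the probe idea from the description above, with made-up embeddings and a made-up five-word parse tree; the real work fits the map on actual BERT/ELMo representations and gold parses. It exploits the fact that tree path distance can be realized exactly as a squared Euclidean distance.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in "contextual embeddings" for a 5-word sentence (random here; in the
# real work they come from models like ELMo or BERT).
n, dim = 5, 16
H = rng.standard_normal((n, dim))

# A hypothetical gold dependency-tree distance between the 5 words
# (the path metric of the tree with edges 0-1, 1-2, 1-3, 3-4).
D = np.array([[0, 1, 2, 2, 3],
              [1, 0, 1, 1, 2],
              [2, 1, 0, 2, 3],
              [2, 1, 2, 0, 1],
              [3, 2, 3, 1, 0]], dtype=float)

# Tree path distance is realizable as a *squared* Euclidean distance, so
# classical multidimensional scaling (MDS) recovers target coordinates exactly.
J = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * J @ D @ J                                    # centered Gram matrix
vals, vecs = np.linalg.eigh(G)
Y = vecs[:, -4:] * np.sqrt(np.maximum(vals[-4:], 0))    # rank-4 embedding

# The probe: a linear map B with ||B(h_i - h_j)||^2 ~ tree distance.
# Solve H @ M = Y by least squares; B is then M transposed.
M, *_ = np.linalg.lstsq(H, Y, rcond=None)
pred = np.array([[np.sum((M.T @ (H[i] - H[j])) ** 2) for j in range(n)]
                 for i in range(n)])
print("max abs error vs gold tree distances:", np.abs(pred - D).max())
```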
Yeah, that's super fascinating. I mean, it's interesting on a bunch of different dimensions.
As you were sort of saying all of that, the thing that I was thinking about is, like, I was a compiler and programming
language person when I was a younger computer scientist. And, you know, I remember long,
long, long ago reading Chomsky's work. And, like, I should say, like, Noam Chomsky is, like, a very famous-
Another MIT person.
Another MIT person, like, you know, one of the most famous linguists and, you know,
social philosophers in the world.
Who I didn't agree with even when I was a freshman.
Yeah.
And like, look, I also didn't agree with him, because, like, one of the things that he asserted a long, long time ago is that human beings, like, had some built-in notion of grammar. So it's, like, sort of striking that, like, even when you're evolving an artificial system, like, maybe some fundamental notion of grammar, like, manifests itself.
Yeah, exactly.
Like that's super interesting.
It's an emergent property of – and we've actually worked out mathematical solutions to the dynamics of learning in deep neural networks that show how hierarchical concepts can emerge naturally in a deep neural network. So, for example, babies,
when they learn concepts, they learn coarse-grained distinctions first, like animals versus plants,
even if you control for perceptual dissimilarity. And then as they get older, they learn finer
distinctions, different types of animals, different types of plants. We were able to prove mathematically
that deep neural networks have to do this when they're exposed to hierarchically structured data. And so a paper that's
going to appear in PNAS soon compares our mathematical theory of deep learning in semantic cognition
to many, many experiments on babies in semantic cognition, and we achieve a match.
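(For the mathematically inclined: in published analyses of deep linear networks along these lines, each input-output mode with singular value s follows a sigmoidal learning curve, so stronger modes, which correspond to coarser semantic distinctions, are learned first. I'm sketching the form from memory, so treat the details as an approximation.)

```latex
% Sketch: learning dynamics of one singular mode in a deep linear network.
% a(t) = mode strength at time t, s = target singular value,
% a_0 = small initial strength, \tau = learning-rate timescale.
a(t) = \frac{s\, e^{2st/\tau}}{e^{2st/\tau} - 1 + s/a_0},
\qquad
t_{\mathrm{learn}} \sim \frac{\tau}{2s}\,\ln\!\frac{s}{a_0}
```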
Interesting.
Yeah. So actually, I had one more thing I wanted to say about unsupervised learning.
You know, is prediction good enough? Okay. That's the driving principle, one of the driving principles, you know, in unsupervised learning today. I don't think it's good enough. If you go back to what babies do, there's a famous
experiment. If you give a baby two magical objects, like object A, where if you drop it,
it doesn't fall, right? Say through a video or something. And then object B, it seems to go
through walls. You give these two objects to a six-month-old baby sitting on a high chair. What will
it do? The object that didn't fall, it'll throw it off the high chair to check if it falls. The object that
seemed to go through the walls, it bangs it on the table to see if it'll go through the table.
So this is incredible, right? Babies, even at six months, have an implicit model for the physical
evolution of the world. They pay attention to violations of that world model, and they actively choose experiments to gather specialized training data to test those violations
further. So this business of building world models, using those world models to imagine
the future and make decisions, looking at violations to modify the world model,
actively doing experiments, that's the next frontier in machine learning. Yeah. Not just passive training data.
Yeah.
Totally, totally agree.
I mean, like, we've gotten really, really good at prediction and classification.
And, like, we're not really great yet at higher order things like deductive reasoning.
Yeah.
And reinforcement learning is starting to get there in terms of at least formulating the
problem, because you have a sequential decision-making problem, right, where you have to exploit and explore and all that. I think
the methods for exploration are not that efficient. So, model-based reinforcement learning is kind of
the direction that people are trying to go where you use world models to, again, imagine, plan,
and learn.
So, this has been a super fascinating conversation, but one of the things that I want to pick your brain on is everything that we've talked about so far is like incredibly technical, which I love. Like, you know, I could spend hours just sort of geeking out on it. But the thing we're all grappling with right now is, like, all of this AI stuff is
increasingly having an impact on everybody's day-to-day life. And so, for the majority of
folks, like, you know, we probably lost them like half an hour ago, you know, in this conversation
that we're having. So, how do you think about our role as scientists and engineers and technologists
in helping the public better understand this big bag of complicated stuff so that they can make
good decisions about it? Yeah, I think it's incredibly important. And more than that even, it's coming to terms with ourselves as a society
and how we can optimize the development of AI so the outcomes are good for society as opposed to bad.
And so there's incredibly thorny issues involved in that that touch on labor economics
and political science and regulation and on ethics and so on and so forth.
So it's an incredibly complicated set of problems.
So we need to bring together, in a real way, scientists from many different disciplines,
like the ones I just mentioned.
And that's what HAI is actually trying to do.
Just to talk a little bit about the structure, we kind of have three focus areas.
One is building next-generation AI inspired by the power,
versatility, and robustness of human intelligence. And so we bring in ideas from neuroscience and
psychology and machine learning to work together to get to that new technology.
The second is building AI systems to augment the capabilities of humans. Think like building
intelligent hospitals, working in domains where companies
might fear to tread, like development in Africa or things like that. And then the third branch,
which is very important, which speaks to the question that you just asked, is can we guide
and design the impact of AI in society? So we're bringing in economists, social scientists,
historians. Historians, for example, can study biases that exist in the
training data that we're using to teach AI systems. Social scientists who can study the impact that AI
could have in different sectors of our society. Economists to deal with the displacement of jobs
that I think is a proximal issue that we're going to have to deal with very soon. So I've been having
a lot of fun in this initiative actually meeting economists and
lawyers as well. Like, how do we regulate AI systems? What are the ethics involved? You know,
if a self-driving car hits somebody, who's liable, right? These are incredible issues that we don't
have all the answers to. We need to build an institute to bring people together, to convene
the stakeholders, to figure out,
you know, what we should do. And that's what we're trying to get at.
Yeah, I'm involved. So, like, obviously, I think it's a worthy undertaking. I think there's this
really urgent need for storytellers to basically bridge this gap, between an incredibly complicated technical world and everyone else, where it's like very, very easy to get lost. And I've allowed myself to do this on multiple occasions. Like, you know, the thing that
you really, really strive for sometimes when you're a scientist or engineer is to like just
get into this flow state where you're completely immersed in a particular problem.
And psychologically, like when you get into flow state, like everything else around you just sort of disappears.
And I love the flow state.
Yeah, no, we all love the flow state.
And it's like where you do your best work.
But, you know, we also are like working in this discipline where it is equally important
to like pull yourself up and like connect with the greater context around you and to make sure that we're sort of pointing all
of this intellectual energy that we're focusing into these very interesting problems right now
in ways that sort of have a net human benefit to everyone. Absolutely. So we as professors are
doing that through HAI. We're talking to media. We're talking to people with decision-making power
in industry and governments. I've actually been, you know, in the past, I've tutored CEOs of portfolio companies for a VC firm on AI,
which is actually super fun.
Incredibly smart people who may not have all the technical background,
but they're very curious.
The other thing that we're trying to do is really nip this in the bud
and train the next generation of leaders to take ethics and social science
and all these other issues into account at the very beginning,
say, freshman CS courses and things like that.
We need to bring in the societal implications of AI,
bring that into the consciousness of students who are going to be the next generation of AI practitioners.
And we have a very serious educational goal there at HAI as well.
Indeed.
So let's change directions completely.
So, you have so many interests across such a breadth of different things. So, do you have
any interesting hobbies outside of your professional life that you get obsessed about?
Yeah, I used to have a lot more. I have a three-year-old kid at home, so that takes up a lot of my time.
Yeah, that's the new hobby. I love playing tennis.
I was actually on the varsity tennis team, so I used to be a college jock, albeit at MIT.
MIT was very proud of its football team, for example, because it made it into Sports Illustrated for a fun stat, which is the highest ratio of IQ to body weight, which means they weren't that great on the field.
Isn't there a professional football player right now who's getting his PhD in applied math at MIT?
Yeah, yeah, you're right.
I'm blanking on his name, but yeah, that's absolutely true.
Yeah, he's like super amazing.
Like I'm ashamed that I can't remember his name.
So anyway, like you were a tennis player. I love playing tennis.
I love swimming.
We love hiking and so on.
Yeah, I like getting out into nature and just exercising the body as opposed to the
mind. And I'm sort of curious, like, I know my hobbies help me to get my brain reset. So,
like, yesterday, for instance, like, I'm working on a book right now, and I had to get myself
psyched up to finish writing the last chapter of this book, which I spent, you know, 12 hours
yesterday, like putting the finishing touches on. And I found 12 straight hours to work by hook or by crook; it involved staying up until very late last night. But before I could even
get my head clear enough to sit down and write, I had to go do something with my hands. It's almost like,
you know, some people meditate, like I go into my shop and I make something.
Yeah, yeah.
And it doesn't need to be a complicated thing. It just needs to require 100% of my attention.
Like if I'm not focusing on this, I'm going to cut my fingers off or, you know, something.
Yeah. You know, for me, that's swimming. Like I leave the terrestrial surface,
I go under the water, I swim for half an hour, 45 minutes, and I'm a
completely new person. It's really weird. So when I was in grad school, when I had the freedom,
my schedule was roll in at 10 a.m., right? Socialize with my grad student friends,
then go swimming at 6 p.m. Like I used to swim a mile a day and then show up back in the lab at
like, you know, 8 p.m., you know, a bunch of grad students were still there. Then work until 2 or 3 a.m.
That was great.
And then roll in at 10 a.m.
I can't do that now.
Now I wake up at 4 or 5 a.m. before my kid wakes up.
And I can barely make it to the pool because I have to be home by 6:30 p.m.
So it's tough.
Yeah, I used to think when I was a graduate student that I was very busy.
Oh, yeah.
And if I had known what my life was going to be like at 47,
like I would have... I tell my grad students and postdocs that the best time in the academic chain
is grad school or postdoc. Actually, postdoc, I think, is the best. They don't believe me.
Yeah. No, and like, I wouldn't have believed you either because I thought my life was crap.
You value different things at different stages.
All right. Well, thank you so much for being with us today. This was an awesome conversation.
Thanks. It's my pleasure. Thanks for having me.
Awesome.
Well, thanks for joining us for Behind the Tech. That was Kevin Scott speaking with Surya Ganguli.
Yeah, Surya is this, as everyone just heard, like this really, really brilliant polymath.
Well, one of the things, you know, that you two were talking about towards the end of your conversation that I found really interesting is, you know, we're getting all these really intelligent models.
We're starting to train these things in really interesting ways. We're able to do a lot of really great things. But with that comes obvious ethical questions. So, how do we ensure that what
we're building is going to be used in a way that doesn't turn into that dystopic vision of, you know, the science fiction novels that have defined our entertainment or even worse, had like a negative impact on our lives?
I think one of the most important things that we can do is demystify the field. Like, you know, spending a long period of time to accumulate a bunch of expertise is like admirable, but like
it's also like not this unattainable sort of thing. And so what we have to do a much better
job of than we've been doing throughout the history of AI as a defined discipline is making
sure that we are able to convey some of the complexity
of the field to other folks.
So, like, we have a big swath of people participating in the conversation about AI in a rational,
reasonable way.
You can't just expect, like, a bunch of scientists and engineers to universally be able to make
a set of good decisions on behalf
of the rest of society. Like, everybody needs to have a voice in this thing.
Yeah, you're definitely right. We need to have more voices, and we need to be more transparent
with how these things are being built, too, right? Because, I mean, I think that that's one way that
you do both, A, improve the models that are being built, but B, and correct me if I'm wrong here,
but it seems like maybe it would make people more comfortable if they have a better insight into what actually
is happening. Yeah, I totally agree. And like the irony of this field is like, in a way, it is a
very modern discipline. So, in some senses, it has far more transparency than some of the scientific
or engineering disciplines that have preceded it. Because you have open source software where people are able to take the code that they're
writing to create some of these models and share them with the rest of the world.
Like you have transparency there because so much of this data that people are using to
train models exists on the open internet.
You have transparency there. And because publications in this area are so much freer now than they've ever been before
at any point in human history, you've got a real transparency about the ideas.
We have some work to do there.
There's sort of a reproducibility crisis that you've got happening across all of science right now where the experiments and results that people are publishing are becoming increasingly difficult for other people to replicate.
And I think that's certainly a problem here in AI. But part of the transparency here is about awareness, because I can make arbitrarily complicated
things super transparent.
I can send my mom the proceedings of the big deep neural network conference, NeurIPS, the Neural Information Processing Systems conference, but that's not going to help her be involved in the conversation about AI. Like, even though it is, like, the very foundation of,
you know, transparency about what's happening in the field right now.
Okay, so what we need is like a Netflix show or an HBO show that shows
the reality of the situation, right?
Totally. Like, we've got to figure out like, how to get people more excited about connecting with this stuff.
And we've got to get my posse of folks, like the scientists and engineers,
like, more excited about sort of boiling these things down to their, like, clear, understandable essence.
All right.
So we're going to write a screenplay.
That's what we're going to do, Kevin.
But until then, I think that is all the time that we have for today.
Yeah, indeed.
And be sure to join us next time on Behind the Tech when we speak with Fei-Fei Li.
Fei-Fei is the co-director of the Stanford Human-Centered AI Institute.
I hope you'll join us.
Yes, and please help spread the word about this podcast.
See you next time.