Lex Fridman Podcast - #333 – Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI
Episode Date: October 29, 2022

Andrej Karpathy is a legendary AI researcher, engineer, and educator. He's the former director of AI at Tesla, a founding member of OpenAI, and an educator at Stanford. Please support this podcast by checking out our sponsors:
- Eight Sleep: https://www.eightsleep.com/lex to get special savings
- BetterHelp: https://betterhelp.com/lex to get 10% off
- Fundrise: https://fundrise.com/lex
- Athletic Greens: https://athleticgreens.com/lex to get 1 month of fish oil

EPISODE LINKS:
Andrej's Twitter: http://twitter.com/karpathy
Andrej's YouTube: http://youtube.com/c/AndrejKarpathy
Andrej's Website: http://karpathy.ai
Andrej's Google Scholar: http://scholar.google.com/citations?user=l8WuQJgAAAAJ
Books mentioned:
The Vital Question: https://amzn.to/3q0vN6q
Life Ascending: https://amzn.to/3wKIsOE
The Selfish Gene: https://amzn.to/3TCo63s
Contact: https://amzn.to/3W3y5Au
The Cell: https://amzn.to/3W5f6pa

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
- Check out the sponsors above; it's the best way to support this podcast
- Support on Patreon: https://www.patreon.com/lexfridman
- Twitter: https://twitter.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Medium: https://medium.com/@lexfridman

OUTLINE:
Here are the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) - Introduction
(05:41) - Neural networks
(10:45) - Biology
(16:15) - Aliens
(26:27) - Universe
(38:18) - Transformers
(46:34) - Language models
(56:45) - Bots
(1:03:05) - Google's LaMDA
(1:10:28) - Software 2.0
(1:21:28) - Human annotation
(1:23:25) - Camera vision
(1:28:30) - Tesla's Data Engine
(1:32:39) - Tesla Vision
(1:39:09) - Elon Musk
(1:44:17) - Autonomous driving
(1:49:11) - Leaving Tesla
(1:54:39) - Tesla's Optimus
(2:03:45) - ImageNet
(2:06:23) - Data
(2:16:15) - Day in the life
(2:29:31) - Best IDE
(2:36:37) - arXiv
(2:41:06) - Advice for beginners
(2:50:24) - Artificial general intelligence
(3:03:44) - Movies
(3:09:37) - Future of human civilization
(3:13:56) - Book recommendations
(3:20:05) - Advice for young people
(3:21:56) - Future of machine learning
(3:28:44) - Meaning of life
Transcript
The following is a conversation with Andrej Karpathy, previously the director of AI at Tesla.
And before that, at OpenAI and Stanford. He is one of the greatest scientists, engineers, and educators in the history of artificial intelligence.
And now, a quick few-second mention of each sponsor.
Check them out in the description.
It's the best way to support this podcast. We got Eight Sleep for naps, BetterHelp for mental health, Fundrise for real estate investing, and Athletic Greens for nutrition. Choose wisely, my friends.
And now onto the full ad reads, as always, no ads in the middle. I try to make this interesting,
but if you skip them, please still check out our sponsors. I enjoy their stuff.
Maybe you will too.
This episode is sponsored by Eight Sleep and its new Pod 3 mattress.
I'm recording this in a hotel.
In fact, given some complexities of my life, it's the middle of the night, 4 a.m., and I'm sitting in an empty hotel room yelling at a microphone. This, my friends, is my life.
I do usually feel good about myself at 4 a.m., but not without two cups of coffee in me. And the reason I feel good is because I'm going to go to sleep soon and I've accomplished a lot. This is true today, except for the sleep-soon part, because I think I'm going to an airport at some point soon. It doesn't matter. What matters is that I'm not even going to sleep
here, and that's great, because in a hotel I don't have an Eight Sleep bed that can cool itself. At home I do, and that's where I'm headed.
I'm headed home.
Anyway, check it out and get special savings when you go to eightsleep.com/lex.
This episode is also brought to you by BetterHelp.
It's spelled H-E-L-P.
I'm a huge fan of talk therapy.
I think of podcasting as a kind of talk therapy, so I'm a huge fan of listening to podcasts. In fact, that's how I think of doing a podcast myself: I just get to have front-row seats to a thing I love.
And it's actually just the process of talking that reveals something about the mind.
I think that's what good talk therapy is.
Whether it's guided by a professional therapist or not, it helps you reveal to yourself something about your mind.
Just lay it all out on the table.
So yeah, you should definitely use the best method of talk therapy, the best meaning the most accessible, at least to try it, if not to make it a regular part of your life. That's what BetterHelp does.
Check them out at betterhelp.com/lex and save on your first month.
This episode is also brought to you by FundRise spelled F-U-N-D-R-I-S-E.
It's a platform that allows you to invest in private real estate.
We live in hard times, folks, for many different reasons, but one of them is financial.
And one way to protect yourself in difficult times is to diversify your investments.
Private real estate is one of the things I believe you should diversify into. And when you do, you should use tools that look like they're made in the 21st century, which a lot of investment websites and services, even online ones, don't; they seem to be designed by the same people that designed the original ATMs. That's not the case with Fundrise: super easy to use and accessible, with over 150,000 investors using it. Their team vets
and manages all their real estate projects. You can track your portfolio's performance
on their website and see updates as properties across the country are acquired, improved,
and operated. Anyway, check out Fundrise.
It takes just a few minutes to get started at fundrise.com/lex.
This show is brought to you by Athletic Greens and its AG1 drink, which is an all-in-one
daily drink to support better health and peak performance.
I have to be honest: I completely forgot to bring Athletic Greens with me as I'm traveling now, and I miss it. It's not just good for my nutritional base and needs; it's good for my soul. It's part of the daily habit of life, and when you don't have that habit, the routine stuff is off. So it's good to just put it into your daily routine, to make sure that you're getting the vitamins and nutrition that you need, no matter the diet, the workload, or the athletic endeavors that you partake in. I don't know, it's kind of incredible. And yeah, that's what Athletic Greens is for me.
They'll give you one month's supply of fish oil when you sign up at athleticgreens.com/lex.
This is the Lex Fridman Podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy.
What is a neural network, and why does it seem to do such a surprisingly good job of learning?
It's a mathematical abstraction of the brain.
I would say that's how it was originally developed.
At the end of the day, it's a mathematical expression.
And it's a fairly simple mathematical expression when you get down to it.
It's basically a sequence of matrix multiplies, which are really dot products mathematically, with some nonlinearities thrown in. And so it's a very simple
mathematical expression. And it's got knobs in it. Many knobs, many knobs. And these
knobs are loosely related to basically the synapses in your brain. They're trainable, they're modifiable. And so the idea is, we need to find the setting of the knobs that makes the neural net do whatever you want it to do,
like classify images and so on. And so there's not too much mystery, I would say. You might think that basically you don't want to endow it with too much meaning with respect to the brain and how it works. It's really just a complicated mathematical expression with knobs, and those knobs need a proper setting for it to do something desirable.
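To make that concrete, here's a minimal sketch of the "matrix multiplies with knobs" picture in NumPy. The two-layer net, sizes, and weights are hypothetical and left random rather than trained:

```python
import numpy as np

# The "knobs" are just the entries of the weight matrices (random here).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)

def net(x):
    h = np.maximum(0.0, x @ W1 + b1)  # matrix multiply + ReLU nonlinearity
    return h @ W2 + b2                # scores for, say, 10 image classes

x = rng.normal(size=(784,))  # e.g. a flattened 28x28 image
print(net(x).shape)          # (10,)
```

Training is then just the search for the knob settings that make this expression do something desirable.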
Yeah, but poetry is just a collection of letters with spaces, but it can make us feel
a certain way.
In that same way, when you get a large number of knobs together, whether it's inside the
brain or inside a computer, they seem to surprise us with their power.
Yeah.
I think that's fair.
So basically, I'm underselling it by a lot, because you definitely do get very surprising emergent behaviors out of these neural nets when they're large enough and trained on complicated enough problems, like, say, next word prediction on a massive dataset from the internet. Then these neural nets take on pretty surprising, magical properties. Yeah, I think it's kind of interesting
How much you can get out of even very simple mathematical formalism.
When your brain right now is talking, is it doing next word prediction or is it doing something
more interesting?
Well, it's definitely some kind of a generative model that's GPT-like, and prompted by you.
Yeah. So you're giving me a prompt and I'm kind of like responding
to it in a generative way.
And by yourself, perhaps a little bit? Like, are you adding extra prompts from your own memory inside your head? Or no?
Well, it definitely feels like you're referencing some kind of a declarative structure of memory and so on, and then you're putting that together with your prompt and generating from it.
How much of what you just said has been said by you before?
Nothing, basically, right?
No, but if you actually look at all the words you've ever said in your life,
and you do a search, you've probably said a lot of the same words in the same order before.
Yeah, could be.
I mean, I'm using phrases that are common, etc., but I'm remixing it into a pretty unique sentence at the end of the day. But you're right, definitely there's, like, a ton of remixing.
It's like Magnus Carlsen saying, I'm rated 2,900, whatever, which is pretty decent. I think you're not giving enough credit to neural nets here. Why do they... what's your best intuition about this emergent behavior?
It's kind of interesting, because I'm simultaneously underselling them, but I also feel like there's an element to which I'm overselling them. Like, it's actually kind of incredible that you can get so much emergent, magical behavior out of them, despite them being so simple mathematically. So I think those are kind of two surprising statements that are just juxtaposed together. And I think basically what it is is that we are actually fairly good at optimizing
these neural nets. And when you give them a hard enough problem, they are forced to learn very
interesting solutions in the optimization. And those solutions basically have these emerging
properties that are very interesting. There's wisdom and knowledge in the knobs.
And so this representation that's in the knobs... does it make sense to you intuitively that a large number of knobs can hold a representation that captures some deep wisdom about the data it has looked at?
It's a lot of knobs.
And somehow, you know... so, speaking concretely, one of the neural nets that people are very excited about right now are GPTs, which are basically just next word prediction networks. So you consume a sequence of words
from the internet and you try to predict the next word. And once you train these on
a large enough dataset, you can basically prompt these neural nets in arbitrary ways and ask them to solve problems, and they will.
So you can just tell them, you can make it look like you're trying to solve some kind of
mathematical problem and they will continue what they think is the solution based on what
they've seen on the internet.
And very often those solutions look remarkably consistent, potentially correct.
Do you still think about the brain side of it? So, with neural nets as a mathematical abstraction of the brain, do you still draw wisdom from the biological neural networks? Or, the even bigger question: you're a big fan of biology and biological computation. What impressive thing is biology doing that computers are not yet able to do? That gap.
I would say I'm definitely much more hesitant with the analogies to the brain than you would see potentially in the field. And I kind of feel like, certainly, the way neural networks started, everything stemmed from inspiration by the brain.
But at the end of the day, the artifacts that you get after training are arrived at by a very different optimization process than the optimization process that gave rise to the brain.
And so I kind of think of it as a very complicated alien artifact. It's something different.
I'm sorry, the neural nets that we're training?
They are complicated alien artifacts.
I do not make analogies to the brain
because I think the optimization process
that gave rise to it is very different from the brain.
So there was no multi-agent self-play kind of setup and evolution?
It was an optimization that is basically what amounts to a compression objective on a massive amount of data.
Okay, so artificial neural networks are doing compression, and biological neural networks are trying to survive.
And they're not really doing anything; they're an agent in a multi-agent self-play system that's been running for a very, very long time. That said, evolution has found that it is very useful to predict and have a predictive
model in the brain.
And so I think our brain utilizes something that looks like that as a part of it, but
it has a lot more, you know, gadgets and gizmos and value functions and ancient nuclei that are all trying to make it survive and reproduce and everything else.
And the whole thing, through embryogenesis, is built from a single cell. I mean, it's just... the code is inside the DNA.
Mm-hmm.
And it just builds up the entire organism, with arms and a head and legs.
Yes. And it does it pretty well.
It should not be possible. So there's some learning going on, there's
some kind of computation going on through that building process. I mean, I don't know... if you were just to look at the entirety of the history of life on Earth, where do you think
is the most interesting invention? Is it the origin of life itself? Is it just jumping to eukaryotes, is it mammals, is it humans themselves,
homo sapiens, the origin of intelligence or highly complex intelligence, or is it all
just in continuation of the same kind of process?
Certainly, I would say, it's an extremely remarkable story that I'm only briefly learning about recently, all the way from... actually, you almost have to start at the formation of Earth and all of its conditions, and the entire solar system, and how everything is arranged with Jupiter and the Moon and
the habitable zone and everything and then you have an active Earth that's turning over material
and then you start with abiogenesis and everything. So it's all a pretty remarkable
story. I'm not sure that I can pick like a single unique piece of it that I find most interesting.
I guess for me, as an artificial intelligence researcher, it's probably the last piece. We have
lots of animals that are not building technological society, but we do.
And it seems to have happened very quickly.
It seems to have happened very recently.
And something very interesting happened there
that I don't fully understand.
I almost understand everything else,
I think intuitively, but I don't understand
exactly that part and how quick it was.
Both explanations would be interesting.
One is that this is just a continuation of the same kind of process.
There's nothing special about humans.
That would be deeply humbling, and that would be very interesting: we think of ourselves as special, but it was all already written in the code that you would have greater and greater intelligence emerging.
And then the other explanation, which is something truly special happened, something like a rare event,
whether it's a crazy rare event like in 2001: A Space Odyssey, what would it be? Say the invention of fire, or, as Richard Wrangham says, the beta males deciding on a clever way to kill the alpha males by collaborating.
So just optimizing the collaboration, the multi-agent aspect of it, and really being constrained on resources and trying to survive... the collaboration aspect is what created the complex intelligence.
But it seems like a natural algorithm of the evolution process.
What could possibly be a magical thing that happened?
A rare thing that would say that human-level intelligence is actually a really rare thing in the universe.
Yeah, I'm hesitant to say that it is rare, by the way,
but it definitely seems like it's kind of a punctuated equilibrium, where you have lots of exploration and then you have certain sparse leaps in between.
So of course, origin of life would be one. You know, DNA, sex, eukaryotic life, the endosymbiosis event where the archaeon ate a bacterium, just the whole thing. And then of course the emergence of consciousness and so on. So it seems like there are these sparse events where massive amounts of progress were made. But yeah, it's kind of hard to pick one.
So you don't think humans are unique. I've got to ask you: how many intelligent alien civilizations do you think are out there? And is their intelligence different or similar to ours?
Yeah, I've been preoccupied with this question quite a bit recently, basically the Fermi
paradox and just thinking through.
And the reason actually that I am very interested in the origin of life is fundamentally trying
to understand how common it is that there are technological societies out there in space.
And the more I study it, the more I think that there should be quite a lot.
Why haven't we heard from them?
Because I agree with you.
It feels like I just don't see why what we did here on Earth is so difficult to do.
Yeah, and especially when you get into the details of it,
I used to think the origin of life was this magical, rare event, but then you read books like, for example, Nick Lane's The Vital Question, Life Ascending, etc. And he really gets in there and really makes you believe that this is not that rare; it's basic chemistry.
You have an active Earth, and you have your alkaline vents, and you have lots of alkaline waters mixing with the acidic ocean,
and you have your proton gradients
and you have the little porous pockets
of these alkaline vents that concentrate chemistry.
And basically as he steps through all of these little pieces,
you start to understand that actually this is not that crazy.
You could see this happen on other systems.
And he really takes you from just geology to primitive life.
And he makes it feel like it's actually pretty plausible.
And also, the origin of life was actually fairly fast after the formation of Earth. If I remember correctly, it was just a few hundred million years or something like that after it was basically possible that life actually arose. And so that makes me feel that that is not the constraint, that is not the limiting variable, and that life should actually be fairly common.
And then where the drop-offs are is very interesting to think about. I currently think that there are no major drop-offs, basically, and so there should be quite a lot of life. And basically where that brings me to, then, is that the only way to reconcile the fact that we haven't found anyone and so on is that we just can't see them.
We can't observe them.
Just a quick, brief comment: Nick Lane, and a lot of biologists I talk to, really seem to think that the jump from bacteria to more complex organisms, to eukaryotic life, is the hardest jump.
Yeah, which I... I get it. They're much more knowledgeable than me about the intricacies of biology, but that seems crazy to me.
Because how many single-cell organisms are there? And how much time do you have? Surely it's not that difficult. Like, a billion years is not even that long of a time, really. Just all these bacteria under constrained resources battling it out, I'm sure they can become more complex. Come on, it's like moving from a hello world program to, like, inventing a function or something like that.
And so I'm with you. I just feel like... if it were the origin of life, that would be my intuition, that that's the hardest thing. But if that's not the hardest thing, because it happened so quickly, then it's got to be everywhere.
And yeah, maybe we're just too dumb to see it.
Well, it's just, we don't have really good mechanisms
for seeing this life.
I mean, by what? How?
So, I'm not an expert, just to preface this.
I want to meet an expert on alien intelligence and how to communicate.
I'm very suspicious of our ability to find these intelligences out there and to find these signals. Earth-like radio waves, for example, are terrible: their power drops off as basically one over R squared. So I remember reading that our current radio waves, the ones that we are broadcasting, would not be measurable by our devices today from even, was it, one-tenth of a light-year away? Basically a tiny distance, because you really need a targeted transmission of massive power directed somewhere for this to be picked up at long distances.
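As a rough back-of-the-envelope sketch of that inverse-square falloff (the transmitter power here is an illustrative assumption, not a figure from the episode):

```python
import math

# Received flux from an isotropic transmitter falls off as 1/R^2.
P = 1e6          # transmitter power in watts (assumed, for illustration)
LY = 9.461e15    # meters per light-year

for years in (0.1, 1.0, 100.0):
    R = years * LY
    flux = P / (4 * math.pi * R ** 2)  # W/m^2 at distance R
    print(f"{years:>6} ly: {flux:.2e} W/m^2")
```

Even at a tenth of a light-year, a megawatt spread over a sphere that large leaves a vanishingly small flux per square meter, which is the point being made.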
And so I just think
that our ability to measure is not amazing. I think there's probably other civilizations out there.
And then the big question is why they don't build von Neumann probes and why they don't interstellar travel across the entire galaxy. And my current answer is that interstellar travel is probably just really hard. You have the interstellar medium: if you want to move close to the speed of light, you're going to be encountering bullets along the way,
because even tiny hydrogen atoms and little particles of dust basically have massive kinetic energy at those speeds.
And so basically you need some kind of shielding.
And then you have all the cosmic radiation.
It's just like brutal out there.
It's really hard.
And so my thinking is maybe interstellar travel
is just extremely hard.
I think you have to do it very slowly.
It feels like we're not a billion years away from doing that.
It just might be that you have to go very slowly, potentially, as an example, through space.
Right.
As opposed to close to the speed of light.
So I'm suspicious, basically, of our ability to measure life, and I'm suspicious of the ability to just permeate all of space in the galaxy or across galaxies.
And that's currently the only way I can see around it.
Yeah, it's kind of mind-blowing to think that there are trillions of intelligent alien civilizations out there, kind of slowly traveling through space to meet each other. And some of them meet, some
of them collaborate.
Or they're all just independent.
They're all just like little pockets.
I don't know.
Well, statistically, if there are, like, trillions of them, surely some of the pockets are close enough together, close enough to see each other.
And then once you see something that is definitely complex life, we're probably going to be intensely, aggressively motivated to figure out what the hell that is and try to meet them.
But would your first instinct be to, like, at a generational level, meet them? Or defend against them? What would your instinct be, as president of the United States and as a scientist? I don't know which hat you prefer in this question.
Yeah, I think the question... it's really hard. I will say, for example, for us,
we have lots of primitive life forms on Earth.
Next to us, we have all kinds of ants
and everything else, and we share space with them.
And we are hesitant to impact them; we are trying to protect them by default,
because they are amazing, interesting,
dynamical systems that took a long time to evolve
and they are interesting and special.
And I don't know that you want to destroy that by default, if you can afford not to. And I'd like to think that the same would be true about the galactic resources: that they would think that we're kind of an incredible, interesting story that took a few billion years to unravel, and you don't want to just destroy it.
I could see two aliens talking about Earth right now and saying, I'm a big fan of complex dynamical systems, so I think there's value in preserving these. And we'll basically be a video game they watch, or a TV show.
Yeah, I think you would need like a very good reason, I think, to destroy it.
Why don't we destroy these ant farms and so on? Because we're not actually in direct competition with them right now.
We do it accidentally and so on, but there's plenty of resources.
Why would you destroy something that is so interesting and precious?
Well, from a scientific perspective, you might interact with it lightly.
You might want to learn something from it, right?
I wonder... there could be certain physical phenomena that we think are just physical phenomena, but it's actually them interacting with us, poking the finger and seeing what happens.
I think it would be very interesting to scientists, other alien scientists: what happened here? And, you know, what we're seeing today is a snapshot. Basically, it's a result of a huge amount of computation over, like, billions of years or something like that.
So it could have been initiated by aliens. This could be a computer running
a program.
Like, wouldn't you? Okay, if you had the power to do this... for sure, at least I would. I would pick an Earth-like planet that has the conditions, based on my understanding of the chemistry prerequisites for life, and I would seed it with life and run it, right?
Like, yeah, wouldn't you 100% do that, and observe it, and then protect... I mean, that's not just a hell of a good TV show, it's a good scientific
experiment. And it's a physical simulation, right? Maybe evolution, like actually running it, is the most efficient way to understand computation, or to compute stuff, to understand life, what life looks like, and what branches it can take.
It does make me kind of feel weird that we're part of a science experiment,
but maybe everything's a science experiment.
Does that change anything for us?
For a science experiment?
I don't know.
Two descendants of apes talking about being inside of a science experiment.
I'm suspicious of this idea of, like, a deliberate panspermia, as you described it. I don't see evidence of divine intervention in the historical record right now.
I do feel like the story in these books, like Nick Lane's books and so on, sort of makes sense, and it makes sense how life arose on Earth uniquely. And yeah, I don't need to reach for more exotic explanations right now.
Sure, but NPCs inside a video game don't observe any divine intervention either. We might all just be NPCs running a kind of code.
Maybe eventually they will. Currently NPCs are really dumb, but once they're running GPTs, maybe they will be like, this is really suspicious, what the hell?
So, you famously tweeted: it looks like if you bombard Earth with photons for a while, you can emit a Roadster.
So, as in Hitchhiker's Guide to the Galaxy, how would we summarize the story of Earth? In that book it's "mostly harmless." What do you think are all the possible stories, like a paragraph long or a sentence long, that Earth could be summarized as, once it's done with its computation? Like, if Earth is a book, right?
Yeah.
Probably there has to be an ending. I mean, there's going to be an end to Earth, and it could end in all kinds of ways. It can end soon; it can end later.
What do you think are the possible stories?
Well, definitely there seems to be... yeah, it's pretty incredible that self-replicating systems will basically arise from the dynamics, and then they perpetuate themselves and become more complex, and eventually become conscious and build a society.
And I kind of feel like in some sense, it's kind of like a deterministic wave that kind
of just like happens on any sufficiently well-arranged system like Earth.
And so I kind of feel like there's a certain sense of inevitability in it and it's really
beautiful.
And it ends somehow, right? So it's a chemically diverse environment where complex dynamical systems can evolve and become further and further complex. But then there's a certain... what is it? There are certain terminating conditions.
Yeah, I don't know what the terminating conditions are,
but definitely there's a trend line of something
and we're part of that story.
And like, where does it go?
So, you know, we're famously described often
as a biological bootloader for AIs.
And that's because humans... I mean, we are an incredible biological system, and we're capable of computation and love and so on.
But we're extremely inefficient as well. We're talking to each other through audio; it's just kind of embarrassing, honestly. We're manipulating, like, seven symbols serially, we're using vocal cords, it's all happening over multiple seconds. It's just kind of embarrassing when you step down to the frequencies at which computers operate, or are able to operate.
And so basically it does seem like synthetic intelligences are kind of like the next stage
of development.
And I don't know where it leads to. Like, at some point I suspect the universe is some kind of a puzzle, and these synthetic AIs will uncover that puzzle and solve it. And then what happens after, right? Like, what? Because if you just, like, fast-forward Earth many billions of years, it's like... it's quiet, and then it's like turmoil: you see city lights and stuff like that. And then what happens at, like, the end? Is it like a poof?
Is it, or is it like a calming? Is it an explosion? Is it, like, Earth, like, opening, like a giant? 'Cause you said emit Roadsters, like, we'll start emitting, like, a giant number of, like, satellites.
Yeah, so some kind of a crazy explosion. And we're living through it, we're stepping through an explosion, and we're living day to day, and it doesn't look like it. But actually... I saw a very cool animation of Earth and life on Earth, and basically nothing happens for a long time, and then the last, like, two seconds, like, basically cities and everything, and the lower orbit just gets cluttered, and the whole thing happens in the last two seconds, and you're like: this is exploding. This is a state of explosion.
So if you play it at normal speed, it'll just look like an explosion.
It's a firecracker.
We're living in a firecracker where it's going to start emitting all kinds of interesting
things.
And then the explosion... it might actually look like a little explosion, with lights and fire and energy emitted, all that kind of stuff. But when you look inside the details of the explosion, there's actual complexity happening, where there's, like, human life or some kind of life.
We hope it's not a destructive firecracker.
It's kind of like a constructive firecracker.
All right, so given that, hilarious as it is, it is really interesting to think about what the puzzle of the universe is. Did the creator of the universe give us a message? Like, for example, in the book Contact by Carl Sagan, there's a message for humanity, for any civilization, in the digits of the expansion of pi in base 11, eventually. It's just a kind of interesting thought.
Maybe we're supposed to be giving a message to our creator.
Maybe we're supposed to somehow create
some kind of a quantum mechanical system
that alerts them to our intelligent presence here.
Because if you think about it from their perspective,
it's just, say, quantum field theory, a massive, like, cellular-automaton-like thing.
And like, how do you even notice that we exist?
You might not even be able to pick us up in that simulation.
And so how do you prove that you exist,
that you're intelligent and that you're part of the universe?
So this is like a Turing test for intelligence, for Earth.
Like, for the creator. I mean, maybe this is, like, a complicated way of doing that: Earth is just basically sending a message back.
Yeah, the puzzle is basically like
alerting the creator that we exist.
Or maybe the puzzle is just to break out of the system
and just stick it to the creator in some way.
Basically, like, if you're playing a video game, you can somehow find an exploit, and find a way to execute arbitrary code on the host machine.
For example, I believe someone got a game of Mario to play Pong just by exploiting it, basically writing code and being able to execute arbitrary code in the game.
And so maybe that's the puzzle: we should find a way to exploit it.
So I think, like, some of these synthetic AIs will eventually find the universe to be some kind of a puzzle and then solve it in some way.
And that's kind of like the endgame somehow.
Do you often think about it as a simulation? So, the universe being a kind of computation that might have bugs and exploits?
Yes, I think so. I think it's possible that physics has exploits, and we should be trying to find them: arranging some kind of a crazy quantum-mechanical system that somehow gives you a buffer overflow, somehow gives you a rounding error in the floating point.
Yeah, that's right. And more and more sophisticated exploits... those are jokes, but it could actually be very close to that.
Yeah, we'll find some way to extract infinite energy.
For example, when you train reinforcement learning agents
in physical simulations and you ask them to, say,
run quickly on the flat ground, they'll
end up doing all kinds of weird things
in part of that optimization.
They'll get on their back leg and they'll slide across the floor.
And it's because the reinforcement learning optimization on that agent has figured out a way to extract infinite energy from the friction forces, in basically a poor implementation of the physics.
And they found a way to generate infinite energy
and just slide across the surface.
And it's not what you expected,
it's just sort of like a perverse solution.
And so maybe we can find something like that.
Maybe we can be that little dog in this physical simulation that cracks or escapes the intended consequences of the physics that the universe came up with.
We'll figure out some kind of shortcut to some weirdness.
But see, the problem with that weirdness is that the first person to discover the weirdness, like sliding on the back legs... that's all we're going to do, very quickly, because everybody will do that thing.
So the paperclip maximizer is a ridiculous idea, but that very well could be it; we'll all just switch to that because it's so fun.
Well, no person will discover it, I think, by the way. I think it's going to have to be some kind of a super-intelligent AGI of a third generation. Like, we're building the first-generation AGI.
Third generation, you know.
Yeah. So the bootloader for an AI; that AI will be a bootloader for another AI. And then there's no way for us to introspect what that might even look like. I think it's very likely... like, say you have these AGIs; it's very likely that, for example, they will be completely inert. I like these kinds of sci-fi books sometimes, where these things are just completely inert. They don't interact with anything, and I find that kind of beautiful, because they've probably figured out the meta-game of the universe in some way, potentially.
They're doing something completely beyond our imagination.
And they don't interact with simple chemical life forms.
Why would you do that?
So I find those kinds of ideas compelling.
What's their source of fun?
What are they doing?
What's the source of pleasure?
Probably solving the universe.
But inert... so can you define what inert means? They escape interaction with us?
As in, they will behave in some very strange way to us, because they're beyond us; they're playing the meta-game.
And the meta-game is probably, say, like, arranging quantum-mechanical systems in some very weird ways to extract infinite energy, solving the digits of the expansion of pi to whatever amount. They will build their own, like, little fusion reactors or something crazy. Like, they're doing something beyond comprehension and not understandable to us, and actually brilliant under the hood.
What if quantum mechanics itself is the system, and we're just thinking it's physics? But we're really parasites... not parasites, we're not really hurting physics. We're just living on this organism, and we're, like, trying to understand it, but really it is an organism. And with a deep, deep intelligence... maybe physics itself is the organism that's doing the super interesting thing, and we're just one little thing sitting on top of it, trying to get energy from it.
We're just kind of like these particles in a wave that I feel is mostly deterministic, and takes the universe from some kind of a big bang to some kind of a super-intelligent replicator, some kind of a stable point in the universe, given these laws of physics.
You don't think, as Einstein said, God doesn't play dice? So you think it's mostly deterministic? There's no randomness in the thing?
I think it's deterministic. There's tons of... well, I'm going to be careful with randomness.
Pseudo-random?
Yeah, I don't like random. I think maybe the laws of physics are deterministic. Yeah, I think they're deterministic.
You just got really uncomfortable with this question. Do you have anxiety about whether the universe is random or not? Like, there's no randomness? You say you like Good Will Hunting. It's not your fault, Andrej. It's not your fault, man.
So you don't like randomness.
Yeah, I think it's unsettling.
I think it's a deterministic system.
I think that things that look random, like say the collapse of the wave function, etc.
I think they're actually deterministic, just entanglement and so on.
And some kind of a multi-verse theory, something, something.
Okay, so why does it feel like we have a free will?
Like, if I raise this hand, I chose to do this now.
What... that doesn't feel like a deterministic thing.
It feels like I'm making a choice.
It feels like it.
Okay, so it's all feelings. It's just feelings.
So when an RL agent makes a choice, it's not really making a choice; the choice is already there?
Yeah, you're interpreting the choice, and you're creating a narrative for having made it.
Yeah, and now we're talking about the narrative. It's very meta. Looking back, what is the most beautiful or surprising idea
in deep learning or AI in general that you've come across?
You've seen this field explode and grow in interesting ways.
Just, what cool ideas, small or big, made you sit back and go, hmm?
Well, the one that I've been thinking about recently,
the most probably is the transformer architecture.
So basically, with neural networks, a lot of architectures that were trendy have come and gone for different sensory modalities, like vision, audio, text. You would process them with different-looking neural nets. And recently we've seen this convergence towards one architecture, the transformer.
And you can feed it video or you can feed it,
images or speech or text, and it just gobbles it up.
And it's kind of like a bit of a general purpose computer.
It is also trainable and very efficient to run on our hardware. And so this paper came out in 2017.
Attention is all you need.
Attention is all you need.
You could say the paper title, in retrospect, didn't foresee the bigness of the impact that it was going to have.
Yeah, I'm not sure if the authors were aware of the impact that paper would go on to
have, probably they weren't.
But I think they were aware of some of the motivations and design decisions behind the transformer, and they chose not to, I think, expand on them in that way in the paper.
And so I think they had an idea that there was more
than just the surface of just like,
we're just doing translation
and here's a better architecture.
You're not just doing translation.
This is like a really cool,
differentiable, optimizable, efficient computer
that you've proposed.
And maybe they didn't have all of that foresight,
but I think it's really interesting.
Isn't it funny, sorry to interrupt, that the title is memeable? For such a profound idea, they went with... I don't think anyone used that kind of title before, right? Attention is all you need.
Yeah, it's like a meme or something.
Isn't that funny? Like, maybe if it was a more serious title, it wouldn't have had the impact.
Honestly, yeah, there is an element of me that honestly agrees with you and prefers it this way.
Yes.
Ah.
If it was too grand, it would overpromise and then underdeliver, potentially.
So you want to just mean your way to greatness.
That should be a t-shirt.
So you tweeted: "The Transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously: expressive (in the forward pass), optimizable (via backpropagation and gradient descent), and efficient (high-parallelism compute graph)."
Can you discuss some of those details: expressive, optimizable, efficient, memory? Or, in general, whatever comes to your heart?
You want to have a general purpose computer that you can train on arbitrary problems,
like, say, the task of next word prediction, or detecting if there's a cat in
an image or something like that.
You want to train this computer so you want to set its weights.
I think there's a number of design criteria that overlap in the transformer simultaneously
that made it very successful.
I think the authors were deliberately trying to make this really powerful architecture.
Basically, it's very powerful in the forward pass because it's able to express
a very general computation as something that looks like message passing.
You have nodes and they all store vectors.
These nodes get to basically look at each other's vectors, and they get to communicate. Basically, nodes get to broadcast: hey, I'm looking for certain things. And then other nodes get to broadcast: hey, these are the things I have. Those are the keys and the values.
So it's not just attention.
Yeah, exactly. The transformer is much more than just the attention component. It's got many architectural pieces that went into it: the residual connections, the way the weights are arranged, there's a multi-layer perceptron, the way it's stacked, and so on.
But basically, there's a message passing scheme where nodes get to look at each other, decide
what's interesting, and then update each other.
So when you get to the details of it, I think it's a very expressive function, so it can express lots of different types of algorithms in the forward pass. Not only that, but it's designed, with the residual connections, layer normalizations, the softmax attention, and everything, to also be optimizable. This is a really big deal, because there are lots of computers that are powerful but that you can't optimize, or they're not easy to optimize using the techniques that we have, which are backpropagation and gradient descent. These are first-order methods, very simple optimizers, really. So you also need it to be optimizable.
And then lastly, you want it to run efficiently on our hardware.
Our hardware is a massive throughput machine, like GPUs.
They prefer lots of parallelism.
So you don't want to do lots of sequential operations.
You want to do a lot of operations in parallel.
And the transformer is designed with that in mind as well.
And so it's designed for our hardware, and it's designed to be both very expressive in the forward pass, but also very optimizable in the backward pass.
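A minimal sketch of the message-passing view he describes, as single-head self-attention in NumPy. The weights are random stand-ins; a real transformer adds multi-head attention, the MLP, residuals, and layer norms:

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, C = 8, 16                  # 8 nodes (tokens), each storing a 16-dim vector
x = rng.normal(size=(T, C))
Wq, Wk, Wv = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))

q = x @ Wq  # "here's what I'm looking for"
k = x @ Wk  # "here's what I have"
v = x @ Wv  # "here's what I'll communicate if you find me interesting"

logits = q @ k.T / np.sqrt(C)                   # pairwise affinities between nodes
mask = np.tril(np.ones((T, T), dtype=bool))     # causal mask for next-word prediction
att = softmax(np.where(mask, logits, -np.inf))  # who attends to whom
out = att @ v                                   # each node updates from what it found interesting
```

The forward pass is expressive, every step is differentiable for backprop, and everything is dense matrix math, which is exactly what GPUs like.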
And you said that the residual connections support a kind of ability to learn short algorithms fast and first, and then gradually extend them longer during training.
Yeah, what's the idea of learning short algorithms?
Right. Think of it as... so basically a transformer is a series of blocks, right? And these blocks have attention and a little multi-layer perceptron. And so you go off into a block and you come back
to this residual pathway, and then you go off and you come back, and then you have a number of layers
arranged sequentially. And so the way to look at it, I think, is: because of the residual pathway, in the backward pass, the gradients sort of flow along it uninterrupted, because addition distributes the gradient equally to all of its branches. So the gradient from the supervision at the top just flows directly to the first layer, and all the residual connections are arranged so that in the beginning, during initialization, they contribute nothing to the residual pathway. What it looks like is: imagine the transformer is
a Python function, like a def. You get to write various lines of code. Say you have a transformer a hundred layers deep; typically they would be much shorter, say 20. So you have 20 lines of code, and you can do something in them. And so think of what it looks like during the optimization: basically, first you optimize the first line of code, then the second line of code can kick in, and the third line of code can kick in. And I kind of feel like
because of the residual pathway and the dynamics of the optimization, you can sort of learn a very
short algorithm that gets the approximate answer, but then the other layers can sort of kick in and
start to create a contribution. And at the end of it, you're optimizing over an algorithm that is
20 lines of code.
Except these lines of code are very complex because this
is an entire block of a transformer.
You can do a lot in there.
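A minimal sketch of that picture, assuming (as is common in practice) that each block's output projection starts at zero, so every block initially contributes nothing to the residual pathway:

```python
import numpy as np

# Stack of residual blocks: x = x + block_i(x). With W_out zeroed at init,
# the network starts as the identity ("a short program"), and deeper lines
# of the program fade in as training grows their weights.
rng = np.random.default_rng(0)
C, depth = 16, 20  # 20 blocks ~ "20 lines of code"

def block(x, W_in, W_out):
    h = np.maximum(0.0, x @ W_in)  # small MLP branch (stand-in for attention + MLP)
    return h @ W_out               # W_out = 0 at init => contributes nothing yet

params = [(rng.normal(size=(C, C)) * 0.1, np.zeros((C, C))) for _ in range(depth)]

x = rng.normal(size=(C,))
for W_in, W_out in params:
    x = x + block(x, W_in, W_out)  # the residual pathway the gradient flows along
```

Since addition routes the gradient equally into every branch, the supervision at the top reaches the first block directly from step one.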
What's really interesting is that this transformer architecture has actually been remarkably resilient. Basically, the transformer that came out in 2017 is the transformer you would use today, except you reshuffle some of the layer norms; the layer normalizations have been reshuffled to a pre-norm formulation.
And so it's been remarkably stable, but there are a lot of bells and whistles that people have attached to it to try to improve it. I do think that basically it's a big step in simultaneously optimizing for lots of properties of a desirable neural network architecture, and I think people have been trying to change it, but it's proven remarkably resilient. But I do think that there should be even better architectures, potentially.
But you admire the resilience here. There's something profound about this architecture, such that maybe everything can be turned into a problem that transformers can solve.
Currently, it definitely looks like the transformer is taking over AI, and you can feed basically arbitrary problems into it. It's a general differentiable computer, and it's extremely powerful. And this convergence in AI has been really interesting to watch,
for me personally.
What else do you think could be discovered here about transformers? Like, what's the surprising thing? Or is it a stable place? Is there something interesting we might discover about transformers, like aha moments? Maybe it has to do with memory, maybe knowledge representation, that kind of stuff.
Definitely. The zeitgeist today is just pushing... like, basically, right now the zeitgeist is: do not touch the transformer, touch everything else. So people are scaling up the datasets, making them much, much bigger. They're working on the evaluation, making the evaluation much, much bigger. And they're basically keeping the architecture unchanged.
And that's the last five years of progress in AI, kind of.
What do you think about one flavor of it, which is language models?
Have you been surprised? Has your sort of imagination been captivated by, you mentioned GPT, and all the bigger and bigger language models? And what are the limits of those models, do you think?
So just for the task of natural language: basically, the way GPT is trained, right, is you download a massive amount of text data from the internet and you try to predict the next word in the sequence, roughly speaking. You're actually predicting little word chunks, tokens, but roughly speaking, that's it.
And what's been really interesting to watch is,
basically, it's a language model.
Language models have actually existed for a very long time.
There's papers on language modeling from 2003, even earlier.
Can you explain, in that case, what a language model is?
Yeah. So, a language model... basically the rough idea is just predicting the next word in a sequence.
So there's a paper from, for example, Bengio and his team from 2003, where for the first time they were using a neural network to take, say, three or five words and predict the next word. They're doing this on much smaller datasets, and the neural net is not a transformer, it's a multi-layer perceptron, but it's the first time that a neural network has been applied in that setting. But even before neural
networks, there were language models, except they were using n-gram models. N-gram models are just count-based models: if you try to take two words and predict a third one, you just count up how many times you've seen any two-word combination and what came next, and what you predict as coming next is just what you've seen the most of in the training set.
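A minimal sketch of such a count-based model (a bigram variant for brevity; he describes predicting a third word from two, but the idea is the same):

```python
from collections import Counter, defaultdict

# Count-based "language model": predict the next word as the one most
# often seen after the current word in the training text.
text = "the cat sat on the mat and the cat slept".split()

counts = defaultdict(Counter)
for w1, w2 in zip(text, text[1:]):
    counts[w1][w2] += 1

def predict(word):
    return counts[word].most_common(1)[0][0]

print(predict("the"))  # -> "cat": seen twice after "the", vs "mat" once
```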
And so language modeling has been around for a long time; neural networks have done language modeling for a long time.
So really, what's new or interesting or exciting is just realizing that when you scale it up with a powerful enough neural net, the transformer, you have all these emergent properties, where basically what happens is: if you have a large enough dataset of text, then in the task of predicting the next word, you are multitasking a huge amount of different kinds of problems.
You are multitasking understanding of chemistry, physics, human nature.
Lots of things are sort of clustered in that objective.
It's a very simple objective, but actually you have to understand a lot about the world to make that prediction.
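A toy sketch of that objective, with a made-up vocabulary and a trivial averaging stand-in where the real model would be a transformer; the loss is just the cross-entropy of the true next token:

```python
import numpy as np

rng = np.random.default_rng(0)
V, C = 50, 16                      # vocab size and embedding size (made up)
W_emb = rng.normal(size=(V, C)) * 0.1
W_out = rng.normal(size=(C, V)) * 0.1

def next_token_loss(context_ids, target_id):
    h = W_emb[context_ids].mean(axis=0)   # stand-in for the transformer
    logits = h @ W_out                    # scores over the vocabulary
    m = logits.max()
    logp = logits - m - np.log(np.exp(logits - m).sum())  # log-softmax
    return -logp[target_id]              # cross-entropy for the true next token

print(next_token_loss([3, 17, 42], target_id=7))
```

Minimizing this one number over internet-scale text is the entire training signal; all the apparent "understanding" has to emerge in service of it.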
You just said the word "understanding." In terms of chemistry and physics and so on, what do you feel like it's doing? Is it searching for the right context? Like, what is the actual process happening here?
Yeah, so basically it gets a thousand words and it's trying to predict the thousand-and-first.
And in order to do that very, very well,
over the entire data set available on the internet,
you actually have to basically understand the context of what's going on in there.
It's a sufficiently hard problem that if you have a powerful enough computer,
like a transformer, you end up with interesting solutions.
You can ask it to do all kinds of things. And it shows a lot of
emergent properties, like in-context learning. That was the big deal with GPT and the original paper when they published it: you can just sort of prompt it in various ways and ask it to do various things, and it will just kind of complete the sentence. But in the process of
just completing the sentence, it's actually solving all kinds of really interesting problems that we
care about.
Do you think it's doing something like understanding? Like, the way we use the word "understanding" for humans?
I think it's doing some understanding. In its weights, it understands, I think, a lot about the world, and it has to, in order to predict the next word in the sequence.
So it's trained on data from the internet. What do you think about this approach in terms of datasets, of using data from the internet? Do you think the internet has enough structured data to teach AI about human civilization?
Yes, so I think the internet has a huge amount of data. I'm not sure if it's a complete
enough set. I don't know that text is enough for having a sufficiently
powerful AGI as an outcome. Of course, there is audio and video and images and all that
kind of stuff. Yeah, so text by itself, I'm a little bit suspicious about. There's a ton
of things we don't put in text in writing just because they're obvious to us about how
the world works and the physics of it and the things fall. We don't put that stuff in text
because why would you? We shared that understanding. And so, Texas communication medium between humans
and it's not a all-encompassing medium of knowledge
about the world.
But as you pointed out, we do have video
and we have images and we have audio.
And so I think that definitely helps a lot.
But we haven't trained models sufficiently
across all of those modalities yet.
So I think that's what a lot of people are interested in.
But I wonder if that shared understanding of what we might call common sense has to be learned, inferred, in order to complete the sentence correctly. So maybe, given that it's implied on the internet, the model is going to have to learn that, not by reading about it, but by inferring it in the representation.
So, like, common sense... I don't think we learn common sense. Nobody tells us explicitly; we just figure it all out by interacting with the world.
Right.
And so here's a model reading about the way people interact with the world; it might have to infer that. I wonder.
Yeah.
You briefly worked on a project called World of Bits,
training an RL system to take actions on the internet,
versus just consuming the internet.
Like you talked about, do you think there's a future
for that kind of system interacting with the internet
to help the learning?
Yes, I think that's probably the final frontier
for a lot of these models.
Because, as you mentioned, when I was at OpenAI,
I was working on this project for a little bit.
Basically, it was the idea of giving neural networks access to a keyboard and a mouse, and the idea was...
What could possibly go wrong.
Basically, you perceive the input of the screen pixels,
and basically, the state of the computer is sort of visualized for human
consumption in images of the web browser and stuff like that. And then you give
the neural or the ability to press keyboards and use the mouse. And we're trying to
get it to, for example, complete bookings and interact with user interfaces. And
And what did you learn from that experience? Like, what was some fun stuff? This is a super cool idea. The step from observer to actor
is a super fascinating step.
Yeah.
Well, it's the universal interface in the digital realm,
I would say.
And there's a universal interface in the physical realm,
which in my mind is a humanoid form factor kind of thing.
We can later talk about Optimus and so on, but I feel like there's kind of a similar philosophy in some way,
where the world, the physical world is designed
for the human form, and the digital world is designed
for the human form of seeing the screen
and using keyboard and mouse.
And so it's the universal interface
that can basically command the digital infrastructure
we've built up for ourselves.
And so it feels like a very powerful interface
to command and to build on top of.
Now, to your question as to what I learned from that.
It's interesting because the world of bits
was basically too early, I think, at OpenAI at the time.
This is around 2015 or so.
And the zeitgeist at that time was very different
in AI from the zeitgeist
today.
At the time, everyone was super excited about reinforcement learning from scratch.
This is the time of the Atari paper where neural networks were playing Atari games and
beating humans in some cases, AlphaGo, and so on.
So everyone was very excited about training neural networks from scratch using reinforcement
learning directly.
It turns out that reinforcement learning is an extremely inefficient way of training neural networks, because you're taking all these actions and all these observations and you get some sparse rewards once in a while. So you do all this stuff based on all these inputs, and once in a while you're told you did a good thing or you did a bad thing. And it's just an extremely hard problem to learn from that. You can sort of brute-force through it.
And we saw that I think with Go and Dota and so on,
and it does work, but it's extremely inefficient,
and not how you want to approach problems, practically speaking.
And so that's the approach that we also took to World of Bits at the time. We would have an agent initialized randomly, so it would keyboard-mash and mouse-mash and try to make a booking, and it just revealed the insanity of that approach very quickly: you have to stumble onto the correct booking in order to get a reward that you did it correctly, and you're never going to stumble onto it by chance at random. Even with a simple web interface, there are just too many options, and it's too sparse of a reward signal.
And you're starting from scratch at the time,
and so you don't know how to read,
you don't understand pictures, images, buttons,
you don't understand what it means to make a booking.
But now what's happened is that it is time to revisit that, and OpenAI is interested in this, companies like Adept are interested in this, and so on.
And the idea is coming back because the interface
is very powerful,
but now you're not training an agent from scratch.
You are taking the GPT as an initialization.
So GPT is pre-trained on all of text.
And it understands what's a booking.
It understands what's a submit.
It understands quite a bit more.
And so it already has those representations.
They are very powerful.
And that makes all of the training significantly more efficient and makes the problem tractable.
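A sketch of the difference being described, in PyTorch-style pseudocode with hypothetical names (WebAgent and the checkpoint path are illustrations, not real artifacts): instead of initializing the agent randomly, you start from pretrained language-model weights that already carry representations of bookings, submit buttons, and so on, and fine-tune from there.

    import torch
    from my_agent import WebAgent  # hypothetical agent wrapping a transformer backbone

    # 2015-style: random init, learn everything from sparse rewards (intractable).
    agent_scratch = WebAgent(init="random")

    # Today-style: initialize the backbone from a pretrained LM checkpoint,
    # so representations of text, buttons, and bookings come for free.
    state = torch.load("pretrained_lm.pt")  # hypothetical local checkpoint
    agent = WebAgent(init="pretrained")
    agent.backbone.load_state_dict(state, strict=False)

    # Then fine-tune, rather than learn from scratch:
    optimizer = torch.optim.AdamW(agent.parameters(), lr=1e-5)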
Should the interaction be with the way humans see it, with the buttons and the language, or should it be with the HTML, JavaScript, and CSS? What do you think is better?
So today, all of this interaction is mostly on the level of HTML, CSS, and so on. That's done because of computational constraints. But I think ultimately everything is designed for human visual consumption. And so at the end of the day, all the additional information is in the layout of the web page, what's next to what, what's on a red background, all this kind of stuff, what it looks like visually. So I think that's the final frontier: we are taking in pixels and we're giving out keyboard and mouse commands, but I think it's impractical still today.
Do you worry about bots on the internet? Given
these ideas, given how exciting they are, do you worry about bots on Twitter being
not the stupid bots that we see now, the crypto bots, but the bots that might actually be out there that we don't see, that are interacting in
interesting ways? So this kind of system feels like it should be able to pass the "I'm not a robot" click button, whatever. Which, I don't actually quite understand how that test works. There's a check box or whatever that you click. It's presumably tracking mouse movement, timing, and so on. So exactly this kind of system we're talking about should be able to pass that.
So yeah, what do you feel about bots that are language models,
plus have some interactability and are able to tweet and reply and so on,
do you worry about that world?
Yeah, I think it's always been a bit of an arms race between the attack and the defense.
The attack will get stronger, but the defense will get stronger as well.
Our ability to detect that.
How do you defend?
How do you detect?
How do you know that your Karpathy account on Twitter is human? How would you approach that? If people claimed you weren't, how would you defend yourself in a court of law that "I am a human; this account is human"?
At some point, I think the society will evolve a little bit.
We might start signing, digitally signing some of our correspondence or things that we create.
Right now, it's not necessary, but maybe in the future, it might be.
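For concreteness, here's what digitally signing correspondence could look like; this sketch uses the real Python cryptography library's Ed25519 signature API, though the idea of binding keys to a social account is the speculative part.

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    # One-time: generate a keypair; publish the public key, e.g. on your profile.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    # Sign each post before publishing it.
    post = b"this tweet was written by a human"
    signature = private_key.sign(post)

    # Anyone can verify the post against the published public key.
    try:
        public_key.verify(signature, post)
        print("signature valid: authored by the key holder")
    except InvalidSignature:
        print("signature invalid or content tampered with")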
I do think that we are going towards a world where we share the digital space with AIs, synthetic beings. Yeah. And they will get much better, and they will share our digital realm, and they'll eventually share our physical realm as well; that's much harder. But that's kind of the world we're going towards. And most of them will be benign and lawful, and some of them will be malicious, and it's going to be an arms race trying to detect them.
So, I mean, the worst isn't the AI,
the worst is the AI pretending to be human.
So, I don't know if it's always malicious.
There's obviously a lot of malicious applications, but...
It could also be, you know, if I was an AI,
I would try very hard to pretend to be human, because we're in a human world.
I wouldn't get any respect as an AI.
I want to get some love and respect.
I don't think the problem is intractable.
People are thinking about the proof of personhood.
We might start digitally signing our stuff and we might all end up having like, yeah, basically
some solution for proof of personhood.
It doesn't seem to me intractable.
It's just something that we haven't had to do until now,
but I think once the need really starts to emerge,
which is soon, I think people will think about it much more.
So, but that too will be a race because obviously,
you can probably spoof or fake the proof of personhood.
So you have to try to figure out how to...
Probably.
I mean, it's weird that we have like social security numbers
and like passports and stuff.
It seems like it's harder to fake stuff in the physical space,
but in the digital space it just feels like it's going to be very tricky, because it seems pretty low-cost to fake stuff. What are you going to do, put an AI in jail for trying to use a fake proof of personhood? I mean, okay, fine, you'll put a lot of AIs in jail, but there'll be more AIs, like exponentially more. The cost of creating a bot is very low. Unless there's some kind of way to track it accurately, like you're not allowed to create any program without tying yourself to that program. Like, for any program that runs on the internet, you'd be able to trace every single human that was involved with that program.
Yeah, maybe we have to start declaring, we have to start drawing those boundaries and keeping track of, okay, which are digital entities versus human entities, and what is the ownership of digital entities by human entities, something like that.
I don't know, but I think I'm optimistic
that this is possible.
In some sense, we're currently in the worst time of it, because all these bots suddenly have become very capable, but we don't have the defenses yet built up as a society. It doesn't seem to be intractable; it's just something that we have to deal with.
It seems weird that Twitter bots, like really crappy Twitter bots, are so numerous. Right?
So I presume that the engineers at Twitter are very good.
So it seems like what I would infer from that
is it seems like a hard problem.
They're probably catching them. All right, if I were to steelman the case: it's a hard problem, and there's a huge cost to a false positive, to removing a post by somebody that's not a bot. That's a crazy, very bad user experience. So they're very cautious about removing. And maybe the bots are really good at learning what gets removed and what doesn't, such that they can stay ahead of the removal process.
My impression of it, honestly, is that there's a lot of low-hanging fruit.
Yeah, that's my impression as well.
But it feels like maybe you're seeing the tip of the iceberg. Maybe the number of bots is in the trillions, and it's just a constant assault of bots.
Yeah, I don't know. I'd have to steelman the case, because the bots I'm seeing are pretty obvious. I could write a few lines of code that catch these bots.
I mean, definitely there's a lot of low-hanging fruit, but I will say I agree that if you are a sophisticated actor, you could probably create a pretty good bot right now, you know, using tools like GPT, because it's a language model. You can generate faces that look quite good now,
and you can do this at scale. And so I think, yeah, it's quite plausible and it's going
to be hard to defend.
There was a Google engineer that claimed that LaMDA was sentient. Do you think there's any inkling of truth to what he felt? And, more importantly to me at least, do you think language models will achieve sentience, or the illusion of sentience, soonish?
Yeah.
To me, it's a little bit of a canary-in-a-coal-mine kind of moment, honestly, because this engineer spoke to a chatbot at Google and became convinced that this bot is sentient. He asked it some existential, philosophical questions, and it gave reasonable answers and looked real and so on. To me, he wasn't sufficiently trying to stress the system, I think, and exposing the truth of it as it is today.
But I think this will be increasingly harder
over time.
So, yeah, I think there will be more people like that over time as this gets better.
Like, form an emotional connection to an AI?
Yeah, perfectly plausible in my mind.
I think these AI's are actually quite good
at human connection, human emotion.
A ton of text on the internet is about humans and connection and love and so on.
I think they have a very good understanding
in some sense of how people speak to each other about this.
They're very capable of creating a lot of that kind of text.
There's a lot of sci-fi from the '50s and '60s that imagined AIs in a very different way. They are calculating, cold, Vulcan-like machines. That's not what we're getting today.
We're getting pretty emotional AIs that actually are very competent and capable of generating plausible-sounding text
with respect to all of these topics.
See, I'm really hopeful about AI systems
that are like companions that help you grow,
develop as a human being, help you maximize
long-term happiness. But I'm also very worried about AI systems that figure out from the internet that humans get attracted to drama. And so these would just be shit-talking AIs. They'd constantly be like, "did you hear this?" They'll do gossip, they'll try to plant seeds of suspicion about other humans that you love and trust, and just kind of mess with people. You know, because that's going to get a lot of attention. So maximize drama on the path to maximizing engagement, and us humans will feed into that machine, and it'll be a giant drama shit storm. So I'm worried about that.
So the objective function really defines the way that human civilization progresses with the AIs in it.
Yeah.
I think right now, at least today, it's not correct to really think of them as goal-seeking agents that want to do something. They have no long-term memory or anything. Literally, a good approximation of it is: you get a thousand words, and you're trying to predict word one thousand and one, and then you continue feeding it in.
You are free to prompt it in whatever way you want.
So in text, you say, okay, you are a psychologist, and you are very good, and you love humans. Here's a conversation between you and another human. Human, colon, something; you, colon, something. And then it just continues the pattern, and suddenly you're having a conversation with a fake psychologist who's trying to help you.
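As a sketch of that prompting pattern, again with a hypothetical generate function rather than any specific API, the "psychologist" is nothing but a text pattern that the model keeps extending:

    # Hypothetical stand-in for a language model completion call.
    def generate(prompt: str) -> str:
        raise NotImplementedError("stand-in for a language model")

    prompt = (
        "You are a psychologist. You are very good and you love humans.\n"
        "Here is a conversation between you and another human.\n"
        "Human: I've been feeling anxious lately.\n"
        "You:"
    )

    # The model continues the pattern; appending each turn keeps the
    # "conversation" going, even though it's only next-word prediction.
    reply = generate(prompt)
    prompt += reply + "\nHuman: Why do you think that is?\nYou:"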
And so it's still kind of in the realm of a tool. People can prompt it in arbitrary ways,
and it can create really incredible text.
But it doesn't have long-term goals
over long periods of time.
It doesn't try to, so it doesn't look that way right now.
But you can do short-term goals that have long-term effects.
So if my prompting short-term goal is to get Andrej Karpathy to respond to me on Twitter, when I think that's the goal, it might figure out that talking shit to you would be the best way to do that, in a highly sophisticated, interesting way. And then you'd respond once and build up a relationship, and then over time it gets to not be sophisticated and just talk shit. And okay, maybe you won't get to Andrej, but it might get to another celebrity or other big accounts, all from that simple goal: get them to respond.
Yeah, maximize the probability of an actual response.
Yeah, I mean, you could prompt a powerful model like this for its opinion about how to do any possible thing you're interested in. So they're kind of on track to become these oracles; I could sort of think of it that way. They are oracles. Currently it's just text, but they will have calculators, they will have access to Google search, they will have all kinds of gadgets and gizmos, they will be able to operate the internet and find different information. And in some sense, that's kind of what it currently looks like in terms of the development.
Do you think it'll be an improvement
eventually over what Google is for access to human knowledge?
Like it'll be a more effective search engine
to access human knowledge?
I think there's definitely scope
in building a better search engine today.
And I think Google, they have all the tools, all the people,
they have everything they need, they have all the possible pieces,
they have people training transformers, at scale,
they have all the data.
It's just not obvious if they are capable as an organization to innovate on their search
engine right now.
And if they don't, someone else will.
There's absolutely scope for building a significantly better search engine built on these tools.
It's so interesting.
A large company, where with search there's already an infrastructure, it works, it brings in a lot of money. Where, structurally, inside a company is the motivation to pivot, to say, we're going to build a new search engine?
That's hard. So it's usually going to come from a startup, or some other more competent organization. So I don't know. Currently, for example, maybe Bing has another shot at it, you know, with Microsoft, as we were talking about offline. I mean, it's really interesting, because search engines used to be about, okay, here's some query, here are web pages that look like the stuff that you have. But you could just directly go to the answer and then have supporting evidence.
And these models basically, they've read all the texts
and they've read all the web pages.
And so sometimes when you find yourself going over the search results and sort of getting a sense of the average answer to whatever you're interested in, that just directly comes out.
You don't have to do that work.
So they kind of have a way of distilling all that knowledge into some level of insight, basically.
Do you think of prompting as a kind of teaching and learning, like this whole process, like another layer? You know, because maybe that's what humans are: we already have that background model, and the world is prompting you.
Yeah, exactly.
I think the way we are programming these computers now, like GPTs, is converging to how you program humans.
I mean, how do I program humans via prompt?
I go to people and I prompt them to do things.
I prompt them for information.
And so natural language prompt is how we program humans.
And we're starting to program computers directly in that interface. It's pretty remarkable, honestly.
So you've spoken a lot about the idea of software 2.0. All good ideas become cliches so quickly; the term is kind of hilarious. It's like, I think Eminem once said that if he gets annoyed by a song he's written very quickly, that means it's going to be a big hit, because it's too catchy. But can you describe this idea, and how your thinking about it has evolved over the months and years since you coined it?
Yeah, yes. I had a blog post on software 2.0, I think several years ago now.
And the reason I wrote that post is because I kind of saw something remarkable happening
in software development and how a lot of code was being transitioned to be written not in sort of C++ and so on, but in the weights of a neural net. Basically just saying that neural nets are taking over software, the realm of software, taking on more and more tasks.
And at the time, I think not many people understood this deeply enough that this is a big deal,
this is a big transition.
Neural networks were seen as one of multiple classification algorithms you might use for
your data set problem on Kaggle.
Like, this is not that.
This is a change in how we program computers.
And I saw neural nets as, this is going to take over. The way we program computers is going to change; it's not going to be people writing software in C++ or something like that and directly programming the software.
It's going to be accumulating training sets and data sets and crafting these objectives
by which you train these neural nets. And at some point, there's going to be a compilation process
from the data sets and the objective
and the architecture specification into the binary,
which is really just the neural net weights
and the forward pass of the neural net.
And then you can deploy that binary.
And so I was talking about that sort of transition.
And that's what the post is about.
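A minimal sketch of that "compilation" analogy in PyTorch-style Python (the dataset and dimensions are placeholders): the source is the dataset plus the objective plus an architecture specification, and the "binary" you deploy is just the trained weights and the forward pass.

    import torch
    import torch.nn as nn

    # "Source code": a dataset, an objective, and an architecture spec.
    dataset = []                         # placeholder: fill with (input, desired_label) tensor pairs
    model = nn.Sequential(               # architecture = a hint with fill-in blanks
        nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)
    )
    loss_fn = nn.CrossEntropyLoss()      # the objective you craft
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    # "Compilation": optimization fills in the blanks (the weights).
    for x, y in dataset:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # "Binary": the weights plus the forward pass, ready to deploy.
    torch.save(model.state_dict(), "program_v2.bin")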
And I saw this sort of play out in a lot of fields,
autopilot being one of them,
but also just simple image classification.
People thought originally in the 80s and so on
that they would write the algorithm
for detecting a dog in an image.
And they had all these ideas about how the brain does it,
and first we detect corners, and then we detect lines,
and then we stitch them up,
and they were really going at it, thinking about how they're going to write the algorithm. And this is not the way you build it. There was a smooth transition where, okay, first we thought we were going to build everything. Then we were building the features, like HOG features and things like that, that detect these little statistical patterns from image patches. And then there was a little bit of learning on top of it, like a support vector machine, a binary classifier for cat versus dog images, on top of the features.
So we wrote the features, but we trained the last layer, sort of the classifier.
And then people are like, actually, let's not even design the features because we can't,
honestly, we're not very good at it.
So let's also learn the features.
And then you end up with basically a convolutional neural net,
where you're learning most of it,
you're just specifying the architecture,
and the architecture has tons of fill-in blanks,
which is all the knobs,
and you let the optimization write most of it.
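The intermediate stage described here, handwritten features with a learned classifier on top, looks roughly like this using the real scikit-image and scikit-learn APIs (the data loader is a hypothetical placeholder); the convnet step then replaces the hog call itself with learned layers.

    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    # Placeholder loader: grayscale images and cat/dog labels (hypothetical).
    images, labels = load_cat_dog_images()

    # Humans write the feature extractor: HOG picks up little
    # statistical patterns (gradient orientations) from image patches.
    features = [hog(img, pixels_per_cell=(8, 8)) for img in images]

    # Only the last layer, the classifier, is learned from data.
    clf = LinearSVC()
    clf.fit(features, labels)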
So this transition is happening across the industry everywhere,
and suddenly we end up with a ton of code
that is written in neural net weights.
And I was just pointing out that the analogy is actually pretty strong.
We have a lot of developer environments for software 1.0: we have IDEs for how you work with code, how you debug code, how you run code; for how you maintain code, we have GitHub.
I was trying to make those analogies in the new realm.
What is the GitHub of software 2.0? Turns out it's something that looks like Hugging Face right now.
You know, and so I think some people took it seriously and built
cool companies and many people originally attacked the post.
It actually was not well received when I wrote it.
I think maybe it has something to do with the title,
but the post was not well received and I think
more people have been coming around to it over time.
Yeah, so you were the director of AI at Tesla
where I think this idea was really implemented at scale
which is how you have engineering teams doing software 2.0.
So can you sort of linger on that idea of,
I think we're in the really early stages of everything
you just said, which is like GitHub IDEs.
How do we build engineering teams that work in software 2.0 systems? And the data collection and the data annotation, which is all part of that software 2.0.
What do you think is the task of programming a software 2.0?
Is it debugging in the space of hyperparameters, or is it also debugging in the space of data?
Yeah, the way by which you program the computer
and influence its algorithm is not by writing the commands
yourself, you're changing mostly the data set.
You're changing the loss functions of what the neural net
is trying to do, how it's trying
to predict things, but basically the datasets and the architecture, so the neural net.
So in the case of the autopilot, a lot of the datasets had to do with, for example, detection
of objects and lane line markings and traffic lights and so on.
So you accumulate massive datasets of, here's an example, here's the desired label.
And then here's roughly what the algorithm should look like, and that's a convolutional neural net.
The specification of the architecture is like a hint
as to what the algorithm should roughly look like.
Then the fill-in-the-blanks process of optimization is the training process.
Then you take your neural net that was trained,
it gives all the right answers on your dataset, and you deploy it.
So in that case, and perhaps in all machine learning cases, there are a lot of tasks. Is formulating a task, like for a multi-headed neural network, part of the programming?
Yeah, very much so.
How do you break down a problem into a set of tasks?
Yeah. On a high level, I would say, if you look at the software running on the autopilot, and I gave a number of talks on this topic, I would say originally a lot of it was written in software 1.0. Imagine lots of C++, right? And then gradually there was a tiny neural net that was, for example, predicting, given a single image, is there a traffic light or not, or is there a lane line marking or not. And this neural net didn't have too much to do in the scope of the software. It was making tiny predictions on individual little images.
And then the rest of the system stitched it up.
So, okay, we actually don't have just a single camera; we have eight cameras. We actually have eight cameras over time.
And so what do you do with these predictions?
How do you put them together?
How do you do the fusion of all that information?
And how do you act on it?
All of that was written by humans in C++.
And then we decided, okay, we don't actually want
to do all of that fusion in C++ code,
because we're actually not good enough to write that algorithm.
We want the neural nuts to write the algorithm.
And we want to port all of that software into the 2.0 stack. And so then we actually
have neural nuts that now take all the eight camera images simultaneously and make predictions
for all of that. So, and actually they don't make predictions in the space of images.
They now make predictions directly in 3D. And actually, they don't, in three dimensions,
around the car.
And now, actually, we don't manually
fuse the predictions in 3D over time.
We don't trust ourselves to write that tracker.
So, actually, we give the neural net the information
over time, so it takes these videos now
and makes those predictions.
And so you're starting to put more and more power
into the neural net, more and more processing.
And at the end of it, the eventual sort of goal is to have most of the software potentially
be in the 2.0 land because it works significantly better.
Humans are just not very good at writing software, basically.
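In pseudocode, the architectural shift described here, from per-image predictions fused by handwritten C++ to one net consuming all cameras over time and predicting directly in 3D, might look like the following (all names are hypothetical):

    # Before: software 1.0 fuses many tiny per-image predictions.
    #   pred_i = tiny_net(image_from_camera_i)            # e.g. "traffic light?"
    #   world = cpp_fusion_and_tracking(preds, over_time) # handwritten C++

    # After: one network consumes all eight cameras across time and
    # predicts directly in 3D around the car.
    def multicam_video_net(camera_videos):
        """camera_videos: tensor of shape (8 cameras, T frames, H, W, 3).
        Returns 3D predictions (cars, lane lines, lights) around the car."""
        ...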
So the prediction is happening in this 4D land, a three-dimensional world over time. How do you do annotation in that world? Data annotation, whether it's self-supervised or manual by humans, is a big part of this software 2.0 world.
Right. I would say, by far, in the industry, if you're talking about what technology we actually have available, everything is supervised learning. So you need datasets of input and desired output, and you need lots of it. And there are three properties of it that you need: you need it to be very large, you need it to be accurate, no mistakes, and you need it to be diverse. You don't want to just have a lot of correct examples of one thing; you need to really cover the space of possibility as much as you can. And the more you can cover the space of possible inputs,
the better the algorithm will work at the end. Now, once you have really good data sets that you're
collecting, curating, and cleaning, you can train your neural net on top of that. So a lot of
the work goes into cleaning those datasets. Now, as you pointed out, the question is: if you want to basically predict in 3D, you need data in 3D to back that up.
So in this video, we have eight videos coming from all the cameras of the system.
And this is what they saw.
And this is the truth of what actually was around.
There was this car, there was this car, this car.
These are the lane line markings.
This is the geometry of the road.
There's a traffic light in this three-dimensional position.
You need the ground truth.
And so the big question that the team was solving, of course,
is how do you arrive at that ground truth?
Because once you have a million of it,
and it's large, clean, and diverse, then training a neural net on it works extremely well.
And you can ship that into the car.
And so there are many mechanisms by which we collected that training data. You can always go for human annotation. You can go for simulation as a source of ground truth.
You can also go for what we call the offline tracker that we've spoken about at the AI day
and so on, which is basically an automatic reconstruction process for taking those videos
and recovering the three-dimensional sort of reality of what was around that car.
So basically think of doing like a three-dimensional reconstruction as an offline thing,
and then understanding that, okay,
there's 10 seconds of video, this is what we saw, and therefore here are all the lane lines, cars, and so on. Then, once you have that annotation, you can train your neural net to imitate it.
How difficult is the reconstruction?
It's difficult, but it can be done. There's overlap between the cameras, and you do the reconstruction.
And perhaps, if there's any inaccuracy, that's handled in an annotation step?
Yes. The nice thing about the annotation is that it is fully offline. You have infinite time. You have a chunk of one minute, and you're trying, offline in a supercomputer somewhere, to figure out where the positions of all the cars and all the people were, and you have your full one-minute video from all the angles. And you can run all the neural nets you want, and they can be very inefficient, massive neural nets. There can be neural nets that can't even run in the car later at test time, so they can be even more powerful neural nets than what you can eventually deploy. You can do anything you want, three-dimensional reconstruction, neural nets, anything you want, just to recover that truth, and then you supervise on that truth.
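A hedged pseudocode sketch of that offline auto-labeling idea (the function names are made up): because labeling runs offline, you can afford nets far bigger than anything that ships in the car, and you can look at the whole clip at once, including the future.

    def auto_label(clip):
        """clip: ~1 minute of synchronized video from all eight cameras."""
        # Run huge neural nets that could never fit in the car's compute budget.
        detections = [big_offline_net(frame) for frame in clip.frames]

        # Reconstruct the 3D scene using the entire clip, past and future,
        # with effectively unlimited time on a supercomputer.
        scene_3d = reconstruct_3d(detections, clip.camera_calibration)

        # The recovered truth (cars, people, lane lines over time) becomes
        # the label that the much smaller in-car net is trained to imitate.
        return scene_3d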
What have you learned, you said "no mistakes", about humans doing annotation? Because I assume there's a range of things humans are good at, in terms of clicking stuff on a screen. How interesting is the problem of designing an annotation process where humans are accurate and efficient and productive, the metrics, all that kind of stuff?
Yeah, so I grew the annotation team at Tesla
from basically zero to a thousand while I was there.
That was really interesting. You know, my background is as a PhD student, a researcher. So growing that kind of organization was pretty crazy.
But yeah, I think it's extremely interesting
and very much part of the design process behind the autopilot as to where you use humans. Humans are very good at certain kinds of annotations.
They're very good, for example,
at two dimensional annotations of images.
They're not good at annotating cars over time
in three dimensional space, very, very hard.
And so that's why we were very careful to design the tasks that are easy to do for humans
versus things that should be left to the offline tracker.
Like, maybe the computer will do all the triangulation and 3D reconstruction, but the human will say, exactly these pixels of the image are a car, exactly these pixels are a human.
And so co-designing the data annotation pipeline was very much the bread and butter of what I was doing daily. Do you think there's still a lot of open problems in that
space? Just in general, annotation where the stuff the machines are good at, machines do, and the
humans do what they're good at, and there's maybe some iterative process. Right. I think to a very
large extent, we went through a number of iterations and we learned a ton
about how to create these datasets.
I'm not seeing big open problems.
Like originally when I joined, I was like, I was really not sure how this would turn out.
Yeah.
But by the time I left, I was much more secure, and we had actually sort of come to understand the philosophy of how to create these datasets.
And I was pretty comfortable with where that was at the time. So what are the strengths and limitations of cameras for the driving task, in your understanding? When you formulate the driving task as a vision task with eight cameras, and you've seen most of the history of the computer vision field and what it has to do with neural networks, just if you step back, what are the strengths and limitations of pixels, of using pixels to drive?
Yeah, pixels, I think, are a beautiful sensor, I would say. Cameras are very, very cheap and they provide a ton of information, a ton of bits. It's an extremely cheap sensor for a ton of bits, and each one of these bits is a constraint on the state of the world.
And so you get lots of megapixel images, very cheap, and it just gives you all these constraints for understanding what's actually out there in the world.
So vision is probably the highest bandwidth sensor.
It's a very high bandwidth sensor.
And I love that pixels are a constraint on the world. It's this highly complex, high-bandwidth constraint on the state of the world.
That's fascinating.
And it's not just that; but again, there's the real importance that it's the sensor that humans use.
Therefore, everything is designed for that sensor.
Yeah.
The text, the writing, the flashing signs, everything is designed for vision.
And so you just find it everywhere.
And so that's why that is the interface you want to be in,
talking again about these universal interfaces.
And that's where we actually want to measure the world as well,
and then develop software for that sensor.
But there's other constraints on the state of the world that humans use to understand
the world.
I mean, vision ultimately is the main one, but we're like referencing our understanding
of human behavior and some common sense physics that could be inferred from vision from a perception
perspective, but it feels like we're using some kind of reasoning
to predict the world.
Not just the pixels.
I mean, you have a powerful prior,
for how the world evolves over time, et cetera.
So it's not just about the likelihood term coming up
from the data itself,
telling you about what you are observing,
but also the prior term of where the likely things
to see and how do they likely move and so on.
The question is how complex the range of possibilities that might happen in the driving task is. Is that, to you, still an open question, how difficult driving is, philosophically speaking? In all the time you worked on driving, did you come to understand how hard driving is?
Yeah, driving is really hard because it has to do with the predictions of all these other
agents and the theory of mind and, you know, what they're going to do.
And are they looking at you? Where are they looking? What are they thinking?
Yeah.
There's a lot that goes on there, at the full tail of, you know, the expansion of the possibilities, and we have to be comfortable with that eventually. The final problems are of that form.
I don't think those are the problems that are very common. I think eventually they're important,
but it's like really in the tail end. In the tail end, the rare edge cases, from the vision
perspective, what are the toughest parts of the vision problem of driving? Well, basically, the sensor is extremely powerful,
but you still need to process that information.
And so going from the brightnesses of these pixel values to, hey, here's the three-dimensional world, is extremely hard.
And that's what the neural networks are fundamentally doing.
And so the difficulty really is in just doing an extremely good job of
engineering the entire pipeline, the entire data engine,
having the capacity to train these neural nets, having the ability to evaluate the system and iterate on it.
So I would say just doing this in production at scale is the hard part; it's an execution problem.
So the data engine, but also the deployment of the system, such that it has low-latency performance, so it has to do all these steps?
Yeah, for the neural net specifically, just making sure everything fits into the chip on the car.
And you have a finite budget of flops
that you can perform and memory bandwidth
and other constraints.
And you have to make sure it flies.
And you can squeeze in as much compute as you can into the tiny chip.
What have you learned from that process?
Because maybe that's one of the bigger,
like new things coming from a research background,
where there's a system that has to run
under heavily constrained resources,
has to run really fast.
What kind of insights have you learned from that?
Yeah, I'm not sure if there's too many insights.
You're trying to create a neural net that will fit in what you have available,
and you're always trying to optimize it.
We talked a lot about it on the AI day,
and basically the triple backflips that the team is doing to make sure it all fits and utilizes the engine.
I think it's extremely good engineering.
And then there's all kinds of little insights
peppered in on how to do it properly.
Let's actually zoom out,
because I don't think we talked about the data engine,
the entirety of the layout of this idea
that I think is just beautiful with humans in the loop.
Can you describe the data engine?
Yeah, the data engine is what I call
the almost biological-feeling process by which
you perfect the training sets for these neural networks.
Because most of the programming now is at the level of these datasets, making sure they're large, diverse, and clean, basically you have a dataset that you think is good.
You train your neural net, you deploy it,
and then you observe how well it's performing.
You're trying to always increase the quality of your data set.
You're trying to catch scenarios that are basically rare. It is in these scenarios that your neural net will typically struggle, because it wasn't told what to do in those rare cases in the dataset.
But now you can close the loop because if you can now collect all those at scale,
you can then feed them back into the reconstruction process
I described and reconstruct the truth in those cases
and add it to the dataset.
And so the whole thing ends up being like a staircase
of improvement of perfecting your training set.
And you have to go through deployments
so that you can mine the parts that are not yet
represented well on the data set.
So your dataset is basically imperfect; it needs to be diverse. It has pockets that are missing, and you need to pad out the pockets. You can sort of think of it that way in the data.
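Schematically, the loop described here could be written like this (pure pseudocode, not anyone's actual code): deploy, mine the rare failures, auto-label them offline, fold them back into the dataset, retrain.

    dataset = initial_dataset()
    while True:
        model = train(dataset)               # train on the current set
        deploy(model)                        # ship it to the fleet
        rare_cases = mine_failures(model)    # collect scenarios it struggles on
        labels = auto_label(rare_cases)      # offline reconstruction, as above
        dataset += list(zip(rare_cases, labels))  # pad out the missing pockets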
What role do humans play in this? So this biological system, like a human body, is made up of cells. How do you optimize the human system, the multiple engineers collaborating, figuring out what to focus on, what to contribute, which task to optimize in this neural network? Who is in charge of figuring out which task needs more data? Can you speak to the hyperparameters of the human system?
It really just comes down to extremely good execution
from an engineering team who knows what they're doing.
They understand intuitively the philosophical insights
underlying the data engine and the process
by which the system improves.
And how to, again, delegate the strategy
of the data collection and how that works.
And then just making sure it's all extremely well executed.
And that's where most of the work is,
is not even the philosophizing or the research
or the ideas of it, it's just extremely good execution.
It's so hard when you're dealing with data at that scale.
So your role in the data engine, executing well on it, is difficult and extremely important. Is there a prioritization, like a vision board, of saying, we really need to get better at stoplights?
The prioritization of tasks,
is that essentially, and that comes from the data?
That comes, to a very large extent, from what we are trying to achieve in the product, the release we're trying to get out, and the feedback from the QA team, where the system is struggling or not, the things we're trying to improve. And the QA team gives some signal, some information, in aggregate, about the performance of the system in various conditions.
And then of course, all of us drive it and we can also see it.
It's really nice to work with a system that you can also experience yourself and it drives
you home.
Is there some insight you can draw from your individual experience that you just can't quite get from an aggregate statistical analysis of the data?
Yeah, it's so weird, right?
Yes. It's not scientific, in a sense, because you're just one anecdotal sample.
Yeah, but I think there's a ton there. A source of truth is your interaction with the system. You can see it, you can play with it, you can perturb it, you can get a sense of it, you have an intuition for it.
I think numbers and plots and graphs are much harder; they hide a lot. It's like if you train a language model: a really powerful way to evaluate it is by you interacting with it.
Yeah, 100%, you try to build up an intuition.
Yeah, and I think Elon also, he always wanted to drive the system himself. He drives a lot, I want to say almost daily. So he also sees this as a source of truth, you driving the system and it performing, yeah.
So, tough questions here. Tesla last year removed radar from the sensor suite, and has now just announced it's going to remove all ultrasonic sensors, relying solely on vision, so camera only. Does that make the perception problem harder or easier?
I would almost reframe the question in some way.
So the thing is basically, you would think that additional sensors,
by the way, can I just interrupt?
I wonder if a language model will ever do that if you prompt it.
Let me reframe your question.
That would be epic.
This is the wrong prompt.
Sorry.
It's like a little bit of a wrong question because basically,
you would think that these sensors are an asset to you.
Yeah.
But if you fully consider the entire product in its entirety, these sensors are actually potentially a liability, because these sensors aren't free. They don't just appear on your car. Suddenly you need an entire supply chain, you have people procuring it, there can be problems with them, they may need replacement, they are part of the manufacturing process, they can hold back the line in production, you need to source them, you need to maintain them, you have to have teams that write the firmware, all of it. And then you also have to incorporate them, fuse them into the system in some way. And so it actually bloats the organization, a lot of it. And I think Elon is really good at simplify, simplify; best part is no part. And he always tries to throw away things that are not essential, because he understands the entropy in organizations and in an approach. And I think in this case,
And I'm just trying to improve my network and, you know, is it more useful or less useful
how useful is it? And the thing is, once you consider the full cost of a sensor, it actually
is potentially a liability and you need to be really sure that it's
giving you extremely useful information.
In this case, we looked at using it or not using it,
and the delta was not massive, and so it's not useful.
Is it also bloat in the data engine, like having more sensors?
100%, yeah. And it's a distraction.
And these sensors, they can change over time. For example, you can have one type of radar, you can have another type of radar; they change over time, and suddenly you need to worry about it. Suddenly you need a column in your SQLite telling you, oh, what sensor type was it. And they all have different distributions, and they contribute noise and entropy into everything, and they bloat stuff. And also, organizationally, it has been really fascinating to me that it can be very distracting. If you only want to get vision to work,
all the resources are on it and you're building out a data engine,
and you're actually making forward progress because that is the sensor with
the most bandwidth, the most constraints on the world,
and you're investing fully into that and you can make that extremely good.
You only have a finite amount of focus to spend across the different facets of the system.
And this kind of reminds me of Rich Sutton's bitter lesson. It just seems like simplifying the system,
Yeah.
in the long run, you know, it seems to always be the right solution.
Yeah.
Yes.
In that case, it was for RL, but it seems to apply generally across all systems that use computation.
Yeah.
So, what do you think about the LiDAR as a crutch debate, the battle between point clouds
and pixels?
Yeah, I think this debate is always like slightly confusing to me, because it seems like
the actual debate should be about like, do you have the fleet or not?
That's like the really important thing about whether you can achieve a really good functioning
of an AI system at the scale.
So data collection systems.
Yeah, do you have a fleet or not?
It's significantly more important
whether you have LIDAR or not.
It's just another sensor.
And yeah, I think, similar to the radar discussion, I basically don't think it offers extra information. It's extremely costly.
It has all kinds of problems.
You have to worry about it.
You have to calibrate it, et cetera.
It creates bloat and entropy.
You have to be really sure that you need this sensor.
In this case, I basically don't think you need it.
And I think honestly, I will make a stronger statement.
I think the others, some of the other companies that are using it are probably going to drop
it.
Yeah, so you have to consider the sensor in full, considering: can you build a big fleet that collects a lot of data, and can you integrate that sensor, that data, into a data engine that's able to quickly find the different parts of the data that then continuously improve whatever model you're using?
Yeah, another way to look at it is: vision is necessary, in the sense that the world is designed for human visual consumption, so you need vision. It's necessary. And then it is also sufficient, because it has all the information that you need for driving, and humans obviously use vision to drive.
So it's both necessary and sufficient, so you want to focus resources. And you have to be really
sure if you're going to bring in other sensors. You could add sensors to infinity at some point
you need to draw the line. And I think in this case you have to really consider the full cost of
any one sensor that you're adopting, and do you really need it? And I think the answer in this case is no.
So what do you think about the idea that the other companies are building high-resolution maps and heavily constraining the geographic regions in which they operate? Is that approach, in your view, not going to scale over time to the entirety of the United States?
I think it'll take too long.
As you mentioned, they pre-map all the environments,
and they need to refresh the map.
They have a perfect centimeter-level accuracy map
of everywhere they're going to drive.
It's crazy. How are you going to...
We're talking about autonomy actually changing the world.
We're talking about the deployment on the global scale,
autonomous systems for transportation.
If you need to maintain a centimeter accurate map for Earth,
or like for many cities and keep them updated,
it's a huge dependency that you're taking on, huge dependency.
It's a massive, massive dependency,
and now you need to ask yourself, do you really need it?
Humans don't need it.
It's very useful to have a low-level map of, like, okay, the connectivity of the road, to know that there's a fork coming up. When you drive an environment, you sort of have that high-level understanding. It's like a small Google map, and Tesla uses Google-map-like, similar-resolution information in its system, but it will not pre-map environments to centimeter-level accuracy. It's a crutch. It's a distraction. It costs entropy, it dilutes the team, and you're not focusing on what's actually necessary, which is the computer vision problem.
What did you learn about machine learning, about engineering, about life, about yourself as one human being from working with Elon Musk?
I think the most I've learned is about how to sort of run organizations efficiently and how to
create efficient organizations and how to fight entropy in an organization. So human engineering
in the fight against entropy. Yeah. I think Elon is a very efficient warrior in the fight against
entropy in organizations.
What does entropy in an organization look like, exactly?
It's process, it's inefficiencies in the form of meetings and that kind of stuff.
Yeah, meetings, he hates meetings,
he keeps telling people to skip meetings if they're not useful.
He basically runs the world's biggest startups, I would say. Tesla and SpaceX are the world's biggest startups. Tesla actually is multiple startups; I think it's better to look at it that way. And so I think he's extremely good at that. And yeah, he has a very good intuition for
streamlining processes, making everything efficient. Best part is no part, simplifying,
focusing, and just kind of removing barriers, moving very quickly, making big moves.
All of these are very startup-y-seeming things, but at scale.
So a strong drive to simplify, from your perspective,
I mean, that also probably applies to just designing systems
and machine learning and otherwise: simplify, simplify.
Yes.
What do you think is the secret to maintaining
the startup culture in a company that grows?
How can you do that?
I do think you need someone in a powerful position with a big hammer like Elon, who's
like the cheerleader for that idea and ruthlessly pursues it. If no one has a big enough hammer,
everything turns into committees, democracy within the company, process, talking to stakeholders, decision-making, and everything just crumbles.
If you have a big person who is also really smart and has a big hammer, things move quickly.
So you said your favorite scene in Interstellar is the intense docking scene with the AI and
Cooper talking, saying, Cooper, what are you doing?
Docking, it's not possible.
No, it's necessary.
Such a good line.
By the way, just so many questions there.
Why? The AI in that scene presumably is supposed to be able to compute a lot more than the human, and it's saying it's not possible. Why would it defer to the human? I mean, that's a movie, but shouldn't the AI know much better than the human?
Anyway, what do you think is the value of setting seemingly impossible goals? It seems like something that you have taken on, that Elon espouses, where the initial intuition of the community might say this is very difficult, and then you take it on anyway with a crazy deadline. Just from a human engineering perspective, have you seen the value of that?
I wouldn't say that setting impossible goals exactly is a good idea
but I think setting very ambitious goals is a good idea.
I think there's what I call sublinear scaling of difficulty, which means that 10x problems are not 10x hard. Usually a 10x harder problem is like two or three x harder to execute on. Because if you want to improve a system by 10%, it costs some amount of work, and if you want to 10x improve the system, it doesn't cost 100x the amount of work. And it's because you fundamentally change the approach.
If you start with that constraint, then most approaches are obviously not going to work, and it forces you to re-evaluate.
I think it's a very interesting way of approaching problem solving.
It requires a weird kind of thinking. Going back to your PhD days, how do you think about which ideas in the machine learning community are solvable? What is that? I mean, there's the cliche of first-principles thinking, but it requires you to basically ignore what the community is saying. Because doesn't a community in science usually draw the lines of what is and isn't impossible? And it's very hard to break out of that without going crazy.
Yeah. I mean, I think a good example here is, you know, the deep learning revolution in some
sense, because you could be in computer vision at that time during the deep learning sort of
revolution of 2012 and so on. You could be improving a computer vision stack by 10%, or you could just be saying, actually, all of this is useless. How do I do 10x better computer vision? Well, it's probably not by tuning a HOG feature detector.
I need a different approach.
I need something that is scalable,
going back to Rich Sutton, and understanding the philosophy of the bitter lesson, and then being like, actually, I need a much more scalable system, like a neural network that in principle works, and then having some deep believers that can actually execute on that mission and make it work. So that's the 10x solution.
What do you think is the timeline to solve the problem of autonomous driving? That's still in part an open question.
Yeah, I think the tough thing with timelines of self-driving, obviously, is that no one has created self-driving.
Yeah.
So it's not like, what do you think is the timeline to build this bridge? Well, we've built a million bridges before; here's how long that takes. No one has built autonomy.
It's not obvious.
Some parts turn out to be much easier than others.
So it's really hard to forecast.
You do your best based on trend lines and so on
and based on intuition, but that's why, fundamentally, it's just really hard to forecast this. No one has done it.
So even being inside of it, it's hard to do?
Yes, some things turn out to be much harder
and some things turn out to be much easier.
Do you try to avoid making forecasts?
Because Elon doesn't avoid them, right?
And heads of car companies in the past have not avoided it either.
Ford and other places have made predictions that we're going to solve level-four driving by 2020, 2021, whatever, and now they've all kind of backtracked on that prediction. Do you, as an AI person, for yourself, privately, make predictions, or do they get in the way of your actual ability to think about a thing?
Yeah, I would say like what's easy to say is that this problem is tractable and that's
an easy prediction to make.
It's tractable.
It's going to work.
Yes, it's just really hard. Some things turn out to be harder, and some things turn out to be easier.
But it definitely feels tractable and it feels like at least the team at Tesla, which is what I
saw internally, is definitely on track to that. How do you form a strong representation that allows
you to make a prediction about tractability? So, like, you're the leader of a lot of humans. You have to kind of say, this is actually possible. How do you build up that intuition? It doesn't even have to be driving; it could be other tasks. What difficult tasks have you worked on in your life?
I mean, classification, achieving, on images, a certain level of superhuman performance.
Yeah, expert intuition.
It's just intuition, it's belief.
So just like thinking about it long enough,
like studying, looking at sample data,
like you said, driving.
My intuition is really flawed on this. I don't have a good intuition about tractability. It could be anything. It could be solvable. The driving task could be simplified into something quite trivial; the solution to the problem could be quite trivial.
At scale, more and more cars driving perfectly might make the problem much easier.
The more cars you have driving, people learn how to drive, not correctly, but in a way that's more optimal for a heterogeneous system of autonomous, semi-autonomous, and manually driven cars. That could change stuff.
Then again, also I've spent a ridiculous number of hours
just staring at pedestrians crossing streets, thinking about humans. And it feels like the way we use our
eye contact, it sends really strong signals. And there's certain quirks in edge cases of
behavior. And of course, a lot of the fatalities that happen have to do with drunk driving and both on the pedestrian side
and the driver side.
So there's that problem of driving at night and all that kind of stuff.
So I wonder whether the space of possible solutions to autonomous driving includes so many human-factor issues that it's almost impossible to predict. Or there could be super clean, nice solutions.
Yeah. I would say definitely like to use a game analogy, there's some fog of war,
but you definitely also see the frontier of improvement. And you can measure historically how much
you've made progress. And I think, for example, at least what I've seen in roughly five years at
Tesla, when I joined, it barely kept lane on the highway. I think going up from Palo Alto to SF was like three or four interventions. Anytime the road would do anything geometrically, or turn too much, it would just not work. So going from that to a pretty competent system in five years, and seeing what happens under the hood, and the scale at which the team is operating now with respect to data and compute and everything else, it's just massive progress.
So you're climbing a mountain.
Yes, fog, but you're making a lot of progress.
You're making progress and you see what the next directions are.
And you're looking at some of the remaining challenges.
And they're not perturbing you, they're not changing your philosophy, and you're not contorting yourself. You're like, actually, these are the things I've always needed to do.
Yeah, the fundamental components of solving the problem seem to be there.
The data engine, the computer on the car, the computer used for training, all that kind of stuff.
So over the years you've been at Tesla, you've done a lot of amazing work: breakthrough ideas and engineering, all of it, from the data engine to the human side.
Can you speak to why you chose to leave Tesla?
Basically, as I described that run, I think over time, during those five years, I got myself into a little bit of a managerial position. Most of my days were meetings and growing the organization and making high-level strategic decisions about the team and what it should be working on and so on. It's kind of like a corporate executive role.
And I can do it, I think I'm okay at it,
but it's not like fundamentally what I enjoy.
And so I think when I joined,
there was no computer vision team
because Tesla was just making the transition from using Mobileye, a third-party vendor, for all of its computer vision, to having to build its own computer vision system.
So when I showed up,
there were two people training deep neural networks, and they were training them on a computer at their desk, working on a kind of basic classification task.
Yeah, and so I kind of grew that into what I think is a fairly respectable deep learning team, a massive compute cluster, and a very good data annotation organization. And I was very happy with where that was. It became quite autonomous. And so I stepped away, and I'm very excited to do much more technical things again, and to kind of refocus on AGI.
What was this soul searching like? You took a little time off to think. How many mushrooms did you take? I mean, what was going through your mind? The human lifetime is finite. You did a few incredible things. You're one of the best teachers of AI in the world. You're one of the best, and I mean that in the best possible way, tinkerers in the AI world, meaning understanding the fundamentals of how something works by building it from scratch and playing with the basic intuitions. Einstein, Feynman, were all really good at this kind of stuff: building a small example of a thing to play with it, to try to understand it. And obviously now, with Tesla, you've built a team of machine learning engineers and a system that actually accomplished something in the real world. So given all that, what was the soul searching like?
Well, it was hard because obviously I love the company a lot, and I love Elon.
I love Tesla.
It was so hard to leave. I love the team, basically. But yeah, I think I will potentially be interested in revisiting it, maybe coming back at some point. We're working on Optimus, we're working on AGI at Tesla. I think Tesla is going to do incredible things. It's basically a massive, large-scale robotics kind of company with a ton of in-house talent for doing really incredible things. And I think humanoid robots are going to be amazing. I think autonomous transportation is going to be
amazing. All this is happening at Tesla. So I think it's just a really amazing
organization. So being part of it and helping it along, I basically enjoyed that a lot. Yeah, it was difficult for those reasons, because I love the company.
But I'm happy to potentially add something coming back for Act 2, but I felt like at this
stage, I built the team, it felt autonomous, and I became a manager, and I wanted to do a
lot more technical stuff, I wanted to learn stuff, I wanted to teach stuff, and I just kind
of felt like it was a good time for change of pace a little bit.
What do you think is the best movie sequel of all time, speaking of part two? Because most movie sequels suck. And you tweeted about movies, so just on a tiny tangent, what's a favorite movie sequel? Godfather Part Two?
Are you a fan of Godfather?
Because you didn't even tweet or mention the Godfather.
Yeah, I don't love that movie. I know, maybe we should edit that out.
We're not going to edit out the hate towards The Godfather. How dare you. We're going to spread that out.
I think I will make a strong statement.
I don't know why.
I don't know why, but I basically don't like any movie
before 1995.
Something like that?
Didn't you mention Terminator?
Two.
Okay, okay, that's like a,
Terminator 2 was a little bit later, 1990.
No, I think Terminator 2 was in the 80s.
And I like Terminator 1 as well.
So, okay, so like a few exceptions,
but by and large, for some reason,
I don't like movies before 1995 or something.
They feel very slow, the camera is like zoomed out,
it's boring, it's kind of naive, it's kind of weird.
And also, Terminator was very much ahead of its time.
Yes. And the Godfather, there's like no AGI.
I mean, but Good Will Hunting was one of the movies you mentioned, and that doesn't have any AGI either.
I guess that's mathematics.
Yeah, I guess occasionally I do enjoy movies that don't feature AGI. Or like Anchorman. Anchorman is so good.
I don't understand, speaking of AGI, because I don't understand why Will Ferrell is so funny. It doesn't make sense. It doesn't compute. There's just something about him. And he's a singular human, because you don't get that many comedies these days. And I wonder if it has to do with the culture, or the machine of Hollywood, or does it have to do with just, we got lucky with certain people in comedy. It came together, because he is a singular human.
That was a ridiculous tangent, I apologize. But you mentioned humanoid robots. So what do you think about Optimus, about the Tesla Bot? Do you think we'll have robots in the factory and in the home in 10, 20, 30, 40, 50 years?
Yeah, I think it's a very hard project.
I think it's going to take a while. But who else is going to build humanoid robots at scale?
Yeah. And I think it is a very good form factor to go after. Because like I mentioned, the world
is designed for humanoid form factor. These things would be able to operate our machines. They would
be able to sit down in chairs,
potentially even drive cars.
Basically, the world is designed for humans.
That's the form factor you want to invest into
and make work over time.
I think there's another school of thought, which is,
okay, pick a problem and design a robot for it,
but actually designing a robot and getting a whole data
engine and everything behind it to work
is actually a really hard problem.
So it makes sense to go after general interfaces that, okay, are not perfect for any one given task, but actually have the generality of, just with an English prompt, being able to do something across many tasks. And so I think it makes a lot of sense to go after a general interface in the physical world.
And I think it's a very difficult project. It's going to take time. But I've seen no other company that can execute on that vision.
I think it's going to be amazing.
Basically, physical labor,
like if you think transportation is a large market,
try physical labor.
It's insane.
But it's not just physical labor to me.
The thing that's also exciting is the social robotics.
The relationship will have on different levels
with those robots. That's why I was really excited to see Optimus. People have criticized me for the excitement, but I've worked with a lot of research labs that do humanoid robots, legged robots, Boston Dynamics, Unitree, there's a lot of companies that do legged robots, but the elegance of the movement is a tiny, tiny part of the big picture. So the two big exciting things to me about Tesla doing humanoid or any legged robots: the first is clearly integrating into the data engine. So the data engine aspect.
So the actual intelligence for the perception and the control and the planning and all that kind
of stuff, integrating into the fleet that you mentioned, right? And then speaking of fleet,
the second thing is the mass manufacturing. Just knowing, culturally, how to drive towards a simple robot that's cheap to produce at scale, and doing that well, having the experience to do that, that changes everything. That's a very different culture and style than Boston Dynamics.
Who, by the way, those robots, just the way they move, it'll be a very long time before Tesla can achieve that smoothness of movement.
But that's not what it's about.
It's about the entirety of the system, like we talked about the data engine and the fleet.
That's super exciting.
Even the initial models. But that too was really surprising, that in a few months you can get a prototype.
Yeah. And the reason that happened very quickly is, as you alluded to, there's a ton of copy-paste from what's happening on Autopilot. A lot. The amount of expertise that came out of the woodwork at Tesla for building the humanoid robot was incredible to see. Basically, Elon said at one point, we're doing this. And then the next day, basically,
like all these CAD models started to appear.
And people talking about like the supply chain
and manufacturing.
And people showed up with screwdrivers and everything the next day and started to put together the body. And I was like, whoa, all these people exist at Tesla.
And fundamentally building a car
is actually not that different from building a robot.
And that is true not just for the hardware pieces. And also, let's not forget, hardware not just for a demo, but manufacturing of that hardware at scale, that's a whole different thing. But for software as well. Basically, this robot currently thinks it's a car.
It's going to have a mid-life crisis.
It thinks it's a car. Some of the earlier demos, actually, we were talking about potentially doing them outside
in the parking lot because that's where all of the computer vision was like working out
of the box instead of like inside.
But all the operating system, everything, just copy-paste. Computer vision, mostly copy-paste. I mean, you have to retrain the neural nets, but the approach on everything, the data engine, the offline trackers, the way we go about the occupancy tracker and so on, everything copy-paste. You just need to retrain the neural nets.
And then the planning control, of course,
has to change quite a bit.
But there's a ton of copy paste
from what's happening at Tesla.
And so if you were to go with the goal of, okay, let's build a million humanoid robots, and you're not Tesla, that's a lot to ask. If you're Tesla, it's actually not that crazy.
And then the follow-up question is, just like with driving, how difficult is the manipulation task such that it can have an impact at scale?
I think, depending on the context, the really nice thing about robotics is that, unless you do manufacturing and that kind of stuff, there is more room for error.
Driving is so safety critical.
And also time critical.
Right, and a robot is allowed to move slower, which is nice.
Yes.
I think it's going to take a long time,
but the way you want to structure the development,
is you need to say, OK, it's going to take a long time.
How can I set up the product development roadmap
so that I'm making revenue along the way?
I'm not setting myself up for a zero-one loss function
where it doesn't work until it works.
You don't want to be in that position.
You want to make it useful almost immediately,
and then you want to slowly deploy it.
And that's generalizing it.
That's scale.
And you want to set up your data engine,
your improvement loops, the telemetry,
the evaluation, the harness, and everything.
And you want to improve the product over time
incrementally and you're making revenue along the way.
That's extremely important.
Because otherwise these large undertakings just don't make sense economically.
And also from the point of view of the team working on it,
they need the dopamine along the way.
You can't just make a promise that this is going to be useful, that this is going to change the world in 10 years when it works.
This is not where you want to be.
You want to be in a place, like I think Autopilot is today, where it's offering increased safety and convenience of driving today.
People pay for it, people like it, people purchase it,
and then you also have the greater mission that you're working towards.
And you see that.
So the dopamine for the team, that was the source of happiness.
Yes. You're deploying this, people like it, people drive it, people pay for it.
They care about it. There's all these YouTube videos. Your grandma drives it. She gives you feedback.
People like it. People engage with it. You engage with it. Huge.
Do people that drive Teslas recognize you and give you love? Like, hey, thanks for the nice feature that it's doing.
Yeah, I think the tricky thing is, some people really love you. Some people, unfortunately... you're working on something that you think is extremely valuable, useful, et cetera.
Some people do hate you.
There's a lot of people who hate me and the team
and the whole project.
And I think that-
Are they Tesla drivers?
Many cases they're not, actually.
Yeah, that actually makes me sad about humans, or the current ways that humans interact. I think that's actually fixable. I think humans want to be good to each other. I think Twitter and social media are part of the mechanism that somehow makes negativity more viral than it deserves, like a disproportionate viral boost to negativity.
But I wish people would just get excited about others, suppress some of the jealousy, some of the ego, and just get excited for others. And then there's a karma aspect to that: you get excited for others, they'll get excited for you. Same thing in academia. If you're not careful, there is a dynamical system there. If you think in silos and get jealous of somebody else being successful, that actually, perhaps counterintuitively, leads to less productivity for you as a community and for you individually.
I feel like if you keep celebrating others, that actually makes you more successful.
Yeah.
And I think people have, depending on the industry, haven't quite learned that yet.
Yeah.
Some people are also very negative and very vocal, so they're very prominently featured.
But actually, there's a ton of people who are cheerleaders; they're just silent cheerleaders.
And when you talk to people just in the world, they will all tell you, it's amazing, it's
great.
Especially people who understand how difficult it is to get this stuff working. People who have built products, makers, entrepreneurs. Making something work and changing something is incredibly hard. Those people are more likely to cheerlead you.
Well, one of the things that makes me sad is some folks in the robotics community don't do the cheerleading, and they should, because they know how difficult it is.
Well, they actually sometimes don't know how difficult it is to create a product at scale, right, one that's actually deployed in the real world. A lot of the development of robots and AI systems is done on very specific, small benchmarks, as opposed to real-world conditions.
Yes.
Yeah, I think it's really hard to work in an academic setting on robotics, or on AI systems that apply in the real world.
You've criticized... well, you flourished in and loved, for a time, the famed ImageNet dataset, and have recently had some words of criticism, that the academic ML research community still gives a little too much love to ImageNet and those kinds of benchmarks. Can you speak to the strengths and weaknesses of datasets used in machine learning research?
Actually, I don't know that I
recall the specific instance where I was unhappy or criticizing ImageNet. I
think ImageNet has been extremely valuable. It was basically a benchmark that allowed the deep learning community to demonstrate that deep neural networks actually work. There's massive value in that.
So I think image net was useful,
but basically it's become a bit of an MNIST at this point.
So MNIST is like little 28 by 28 grayscale digits. It's kind of a joke dataset that everyone just crushes.
There's still papers written on MNIST, though, right?
Maybe there shouldn't be.
Like papers that focus on how do we learn with a small amount of data, that kind of stuff?
Yeah, I could see that being helpful, but not in mainline computer vision research anymore, of course.
I think I've heard you somewhere, maybe I'm just imagining things, but I think you said ImageNet was a huge contribution to the community for a long time, and now it's time to move past those kinds of benchmarks.
Well, ImageNet has been crushed. I mean, you know, the error rates are...
Yeah, we're getting like 90% accuracy in 1,000-way classification prediction. And I've seen those images; that's really high. That's really good. If I remember correctly, the top-5 error rate is now like 1% or something.
Given your experience with a gigantic real world dataset,
would you like to see benchmarks moving
in certain directions that the research community uses?
Unfortunately, I don't think I can tell you what the next ImageNet will be. Obviously, I think we've crushed MNIST, and we've also basically kind of crushed ImageNet. And there's no next big benchmark that the entire community rallies behind and uses for further development of these networks.
What does it take for a dataset to captivate the imagination of everybody, where they all get behind it? Would it also need a leader, right, somebody with popularity? I mean, why did ImageNet take off? Is it just an accident of history?
It was the right amount of difficult.
It was the right amount of difficult and simple and interesting enough. It was just the right time for that kind of a dataset.
Question from Reddit.
What are your thoughts on the role that synthetic data and game engines will play in the future of neural net model development?
I think as neural nets converge to humans, the value of simulation to neural nets will be similar to the value of simulation to humans. People use simulation because they can learn something in that kind of a system without having to actually experience it.
But are you referring to the simulation we do in our heads?
No, sorry. By simulation I mean video games, or other forms of simulation for various professionals.
Well, let me push back on that, because maybe there's simulation that we do in our heads. Like, simulate: if I do this, what do I think will happen?
Okay, that's internal simulation.
Yeah, internal.
Isn't that what we're doing?
It's simulated before we act.
Oh, yeah, but that's independent from the use of simulation
in a sense of computer games
or using simulation for training set creation.
Is it independent, or is it just loosely correlated? Because isn't it useful to do counterfactual or edge-case simulation? Like, what happens if there's a nuclear war, those kinds of things? That's a different simulation from, say, Unreal Engine. That's how I interpreted the question.
Ah, so like simulation of the average case.
Is that what Unreal Engine is? What do you mean by Unreal Engine?
Simulating a world, the physics of that world. Why is that different? Because you can also add behavior to that world, and you can try all kinds of stuff, right? You could throw all kinds of weird things into it.
So Unreal Engine is not just about, I mean, I guess it is about simulating the physics of the world. It's also about doing something within that.
Yeah, the graphics, the physics,
and the agents that you put into the environment
and stuff like that.
Yeah.
I feel like you've said that it's not that important, I guess, for the future of AI development. Is that correct to interpret?
I think humans use simulators because they find them useful, and so computers will use simulators and find them useful.
Okay. So you're saying...
I don't use simulators very often.
I play a video game every once in a while,
but I don't think I derive any wisdom about my own existence
from from those video games.
It's a momentary escape from reality versus a source of wisdom
about reality.
So I think that's a very polite way of saying simulation is not that useful.
Yeah, maybe not.
I don't see it as a fundamental, really important part of training neural nets currently. But I think as neural nets become more and more powerful, you will need fewer examples to train additional behaviors. And with simulation, of course, there's a domain gap. It's not the real world; it's slightly different. But with a powerful enough neural net, the domain gap can be bigger, I think, because the neural net will sort of understand that even though it's not the real world, it has all this high-level structure that it's supposed to be learning from.
So the neural net will actually be able to leverage the simulated data better by understanding in which ways it's not real data.
Exactly.
I'm ready for better questions next time. That was a good question. I'm just kidding.
All right. Is it possible, do you think, speaking of MNIST, to construct neural nets and training processes that require very little data? We've been talking about huge datasets like the internet for training. I mean, one way to say it is, like you said, the prompting itself is another level of training, I guess, and that requires very little data. But do you see any value in doing research in that direction: can we use very little data to train, to construct a knowledge base?
100%.
I just think at some point you need a massive dataset. And then, when you pre-train your massive neural net and get something like a GPT, you're able to be very efficient at training for any new task.
So a lot of these GPTs, you can do tasks like sentiment analysis
or translation or so on just by being prompted
with very few examples.
Here's the kind of thing I want you to do.
Like here's an input sentence,
here's the translation into German.
Input sentence, translation to German.
Input sentence, blank, and the neural net will complete
the translation to German just by looking at
sort of the example you've provided.
And so that's an example of few-shot learning, in the activations of the neural net instead of the weights of the neural net. And so I think, basically, just like humans, neural nets will become very data-efficient at learning any new task. But at some point you need a massive dataset to pre-train your network.
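To make the few-shot mechanism being described concrete, here is a minimal sketch in Python. The `complete` function is a hypothetical stand-in for any large language model completion call, not a specific library API:

```python
# A minimal sketch of few-shot prompting for translation, as described above.
# `complete` is a hypothetical placeholder for an LLM completion call.

def build_few_shot_prompt(examples, query):
    """Format (input, translation) pairs followed by the unanswered query."""
    lines = [f"English: {src}\nGerman: {tgt}" for src, tgt in examples]
    lines.append(f"English: {query}\nGerman:")  # the blank the model completes
    return "\n\n".join(lines)

examples = [
    ("Good morning.", "Guten Morgen."),
    ("Where is the train station?", "Wo ist der Bahnhof?"),
]
prompt = build_few_shot_prompt(examples, "The weather is nice today.")
# translation = complete(prompt)  # the model infers the task from the examples;
# no weights are updated: the "learning" happens in the activations.
```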
And probably we humans have something like that.
Do we have something like that? Do we have a passive, background model-constructing thing that just runs all the time in a self-supervised way, that we're not conscious of?
I think humans definitely, I mean, obviously we learn a lot during our lifespan, but we also have a ton of hardware that helps us, an initialization coming from evolution. And so I think that's also a really big component. A lot of people in the field, I think, just talk about the amount of seconds a person has lived, pretending that this is a tabula rasa, sort of a zero initialization of a neural net. And it's not.
You can look at a lot of animals. For example, zebras get born, and they see, and they can run. There's zero training data in their lifespan. They can just do that.
They can just do that.
So somehow, I have no idea how, evolution has found a way to encode these algorithms and these neural net initializations, which are extremely good, into ATCGs. And I have no idea how this works, but apparently it's possible, because it's a proof by existence.
There's something magical about going from a single cell
to an organism that is born to the first few years of life.
I kind of like the idea that the reason we don't remember anything about the first few years of our life is that it's a really painful process. It's a very difficult, challenging training process, intellectually. I mean, why don't we remember any of that?
There might be some crazy training going on
and maybe that's the background model training
that is very painful.
So it's best for the system once it's trained not to remember how it's constructed.
I think it's just that the hardware for long-term memory is not fully developed. I kind of feel like the first few years of infancy are not actually learning; it's the brain maturing. There's a theory along those lines, that because of the size of the birth canal and of the brain, we're born premature. And then in the first few years, the brain is just maturing, and then there's some learning eventually. That's my current view on it.
What do you think? Do you think neural nets can have long-term memory that approaches something like what humans have? Do you think
there needs to be another meta-architecture on top of it to add something like a knowledge
base that learns facts about the world and all that kind of stuff?
Yes, but I don't know to what extent it will be explicitly constructed. It might take unintuitive forms, where you are telling the GPT, hey, you have a declarative memory bank to which you can store and retrieve data. Whenever you encounter some information that you find useful, just save it to your memory bank. And here's an example of something you've retrieved, here's how you save to it, and here's how you load from it. You just say, load, whatever. You teach it in text, in English. And then it might learn to use a memory bank from that.
So the neural net is the architecture for the background model, the base thing, and then everything else is just on top of it, in text.
Right. You're giving it gadgets and gizmos. You're teaching it some kind of special language by which it can save arbitrary information and retrieve it at a later time, and you're telling it about these special tokens and how to arrange them to use these interfaces.
It's like, hey, you can use a calculator. Here's how you use it. Just do 53 plus 41 equals, and when the equals sign is there, a calculator will actually read out the answer, and you don't have to calculate it yourself. You just tell it in English. This might actually work.
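As a toy illustration of the text-interface idea described here, consider the sketch below. The `<calc>` tag, the expression format, and the `generate` placeholder are all assumptions for illustration, not any real system's API:

```python
# A toy sketch of text-based tool use: the model emits "<calc> 53 + 41 ="
# and an external calculator splices in the answer, as described above.
import re

CALC_PATTERN = re.compile(r"<calc>\s*(\d+)\s*([+\-*/])\s*(\d+)\s*=")

def run_with_calculator(generate, prompt):
    """Generate text; whenever the model writes '<calc> a op b =',
    compute the result, append it, and let generation continue."""
    text = generate(prompt)
    match = CALC_PATTERN.search(text)
    if match:
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        result = {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
        # Splice the answer in after '=' and continue generating from there.
        spliced = text[: match.end()] + f" {result}"
        return generate(spliced)
    return text
```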
Do you think, in that sense, Gato is interesting, the DeepMind system that throws it all in the same pile, not just language but images, actions, all that kind of stuff? Is that basically what we're moving towards?
Yeah, I think so.
So Gato is very much a kitchen-sink approach to reinforcement learning in lots of different environments with a single fixed transformer model, right? I think it's a very early result in that realm, but yeah, it's along the lines of what I think things will eventually look like.
Right. So this is the early days of a system that eventually will look like this, from a Rich Sutton perspective.
Yeah, I'm not a super huge fan of, I think,
all these interfaces that look very different. I would want everything to be normalized into the same API. So, for example, screen pixels: the very same API. Instead of having different world environments with very different physics and joint configurations and appearances and whatever, and having some kind of special tokens for different games that you can plug in, I'd rather just normalize everything to a single interface, so it looks the same to the neural net, if that makes sense.
So it's all going to be pixel-based Pong in the end?
I think so. Okay. Let me ask
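A rough sketch of what normalizing very different environments to one pixel-and-token interface could look like; the wrapped environment's methods (`render_pixels`, `action_space_n`, `reset`, `step`) are hypothetical placeholders, not a real framework:

```python
# A toy "single interface" wrapper: every environment, whatever its native
# physics or joint configuration, exposes pixel observations and integer
# action tokens from one shared vocabulary.
import numpy as np

class PixelTokenEnv:
    """Uniform interface: observations are HxWx3 uint8 pixel arrays,
    actions are integer tokens from one shared vocabulary."""

    def __init__(self, env, render_size=(64, 64), n_actions=256):
        self.env = env                  # any native environment (assumed API)
        self.render_size = render_size
        self.n_actions = n_actions      # shared action-token vocabulary size

    def reset(self) -> np.ndarray:
        self.env.reset()
        return self._render()

    def step(self, action_token: int):
        # Map the shared token space onto this env's native discrete actions.
        native_action = action_token % self.env.action_space_n
        _, reward, done = self.env.step(native_action)
        return self._render(), reward, done

    def _render(self) -> np.ndarray:
        # Whatever the native state looks like, expose it as pixels.
        return self.env.render_pixels(self.render_size)
```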
you about your own personal life. A lot of people want to know. You're one of the most productive and brilliant people in the history of AI. What does a productive day in the life of Andrej Karpathy look like? What time do you wake up?
You should imagine some kind of dance between the average productive day and the perfect productive day. The perfect productive day is the thing we strive towards, and the average is kind of what it converges to, given all the mistakes and human eventualities and so on.
So what time do you wake up? Are you a morning person?
I'm not a morning person. I'm a night owl for sure.
Is that stable or not?
It's semi-stable, like eight or nine or something like that. During my PhD, it was even later.
I used to go to sleep usually at three a.m. I think the a.m. hours are precious and very
interesting time to work because everyone is asleep.
At 8am or 7am, the East Coast is awake.
So there's already activity, there's already some text messages, whatever, there's stuff
happening, you can go on some news website and there's stuff happening and it's distracting.
At 3am, everything is totally quiet.
And so you're not going to be bothered and you have solid chunks of time to do work.
So I like those periods, night owl by default.
And then, for productive time, basically what I like to do is build some momentum on the problem without too much distraction. You need to load your RAM, your working memory, with that problem, and then you need to be obsessed with it when you're taking a shower, when you're falling asleep. It's fully in your memory, and you're ready to wake up and work on it right there.
So is this on the temporal scale of a single day, or a couple of days, a week, a month?
I can't talk about one day in isolation, basically, because it's a whole process. When I want to get productive on a problem, I feel like I need a span of a few days where I can really get in on that problem. And I don't want to be interrupted; I'm going to just be completely obsessed with that problem. And that's where I do most of my good work.
You've done a bunch of cool little projects in a very short amount of time, very quickly. So that requires you to just focus on it.
Yeah. Basically, I need to load my working memory
with the problem, and I need to be productive,
because there's always a huge fixed cost
to approaching any problem.
I was struggling with this, for example, at Tesla,
because I want to work on a small side project.
But, okay, you first need to figure out, okay, I need to SSH into my cluster, I need to bring up a VS Code editor so I can work on this, and then I run into some stupid error for some reason. You're not at a point where you can be productive right away.
You are facing barriers.
And so it's about really removing all that barrier
and you're able to go into the problem and you have the full problem loaded in your memory.
And somehow avoiding distractions of all different forms.
Like news stories, emails, but also distractions from other interesting projects that you previously worked on or are currently working on, and so on.
You just want to really focus your mind.
And I mean, I can take some time off for distractions and in between, but I think it can't be too
much.
You know, most of your day is sort of like spent on that problem.
And then, you know, I drink coffee, I have my morning routine,
I look at some news, Twitter, hacker news, Wall Street Journal, etc. So it's great. So basically,
you wake up, have some coffee. Are you trying to get to work as quickly as possible, or do you take in this diet of what the hell is happening in the world first?
I do find it
interesting to know about the world. I don't know that it's useful or good,
but it is part of my routine right now. So I do read through a bunch of news articles
and I want to be informed. And I'm suspicious of it. I'm suspicious of the practice,
but currently that's where I am.
You mean suspicious about the positive effect of that practice on your productivity and your well-being?
My well-being, psychologically.
And also on your ability to deeply understand the world, because there's a bunch of sources of information you're not really focused on deeply integrating?
Yeah, it's distracting.
Yeah. In terms of a perfectly productive day, for how long of a stretch of time in one session do you try to work and focus on a thing? A couple of hours? One hour? Thirty minutes? Ten minutes?
I can probably go like a small few hours
and then I need some breaks in between for food and stuff.
And yeah, but I think it's still really hard to accumulate hours. I was using a tracker that told me exactly how much time I spent coding on any one day, and even on a very productive day, I still spent only like six or eight hours. It's just because there's so much padding: commutes, talking to people, food, et cetera. There's the cost of life, just living and sustaining and homeostasis, and just maintaining
yourself as a human is very high. And there seems to be a desire within the human mind to participate in society that creates
that padding.
Because the most productive days I've ever had were just, completely from start to finish, tuning out everything and just sitting there. And then you can do more than six or eight hours.
Is there some wisdom about what gives you strength to do tough days of long focus?
Yeah, whenever I get obsessed about a problem, something just needs to work, something just needs to exist.
It needs to exist, so you're able to deal with bugs and programming issues and technical issues and design decisions that turn out to be the wrong ones. You're able to think through all of that, given that you want the thing to exist.
Yeah, it needs to exist. And then, I think, another big factor for me is,
you know, our other humans are going to appreciate it.
Are they going to like it?
That's a big part of my motivation.
If I'm helping humans and they seem happy,
they say nice things, they tweet about it or whatever,
that gives me pleasure because I'm doing something useful.
So you do see yourself sharing it with the world, on GitHub, with blog posts, or videos.
Yeah, I was thinking about it.
Like, suppose I did all these things, but I did not share them.
I don't think I would have the same motivation that I can build up.
You enjoy the feeling of other people,
gaining value and happiness from the stuff you've created.
Yeah. What about diet? I saw that you fast. Does that help?
With everything.
Of the things you've played with, what has been most beneficial to your ability to mentally focus on a thing, and just mental productivity and happiness? You still fast?
Yeah, I still fast, but I do intermittent fasting, but really what it means at the end of
the day is I skip breakfast.
Yeah.
So I do 18:6 roughly by default when I'm in my steady state. If I'm traveling or doing something else, I will break the rules. But in my steady state, I do 18:6, so I eat only from 12 to 6. Not a hard rule, and I break it often, but that's my default.
And then, yeah, I've done a bunch of random experiments.
For the most part right now, where I've been for the last year and a half,
I want to say, is I'm plant-based or plant-forward.
I heard plant-forward; it sounds better.
It does, exactly. I didn't actually know the difference, but it sounds better in my mind. But it just means I prefer plant-based food.
Raw or cooked?
I prefer cooked, and plant-based.
So plant-based, forgive me, I don't actually know how wide the category of plants entails.
Well, plant-based just means that you don't talk a lot about it and you can flex. You just prefer to eat plants, and you're not trying to influence other people. And if you come to someone's house party and they serve you a steak that they're really proud of, you will eat it.
Yes.
That's beautiful.
I mean, I'm on the flip side of that, but I'm very flexible. Have you tried doing one meal a day?
I have, accidentally, not consistently. I don't like it. It makes me feel not good; it's too much of a hit. So currently I have about two meals a day, 12 and 6.
I've been doing one meal a day nonstop.
Okay.
It's an interesting feeling.
Have you ever fasted longer than a day?
Yeah, I've done a bunch of water fasts, because I'm curious what happens.
Anything interesting?
Yeah, I would say so. What's interesting is that you're hungry for two days, and then starting day three or so, you're not hungry. It's such a weird feeling, because you haven't eaten in a few days and you're not hungry.
Isn't that weird?
It's really one of the many weird things about human biology.
It figures something out. It finds another source of energy or something like that, or relaxes the system. I don't know how it works.
Yeah, the body is like, you're hungry, you're hungry, and then it just gives up.
It's like, okay, I guess we're fasting now.
There's nothing.
And then it's just kind of like focuses on trying to make you not hungry
and not feel the damage of that
and trying to give you some space
to figure out the food situation.
So are you still, to this day, most productive at night?
I would say I am, but it is really hard to maintain my PhD schedule. Especially when I was working at Tesla and so on, it's a non-starter.
But even now, people want to meet for various events. Society lives on a certain schedule, and you have to interact with it. So it's hard to do a social thing and then, after that, return and do work. It's just really hard. That's why, when I do social things, I try not to do too much drinking, so I can return and continue doing work.
But at Tesla, or at any company, does it converge to a schedule? Is that how humans behave when they collaborate? I need to learn about this. Do they try to keep a consistent schedule, where you're all awake at the same time?
I do try to create a routine, and I try to create a steady state that I'm comfortable in. So I have a morning routine, I have a day routine. I try to keep things in a steady state so that things are predictable, and then your body just sticks to that. And if you stress that a little too much, like when you're traveling and you're dealing with jet lag, you're not able to really ascend to where you need to go.
Yeah, that's how it works with humans, with the habits and stuff.
What are your thoughts on work-life balance throughout a human lifetime? Tesla, and Elon in part, were known for pushing people to their limits, in terms of what they're able to do, in terms of what they're trying to do, in terms of how much they work, all that kind of stuff.
Yeah, I will say Tesla gets a little too much of a bad rep for this, because what's happening is, Tesla is a bursty environment. So I would say the baseline,
my only point of reference
is Google, where I've interned three times,
and I saw what it's like inside Google and deep mind.
I would say the baseline is higher than that,
but then there's a punctuated equilibrium, where once in a while there's a fire and people work really hard.
And so it's spiky and bursty.
And then all the stories get collected about the bursts.
And then it gives the appearance of like total insanity,
but actually it's just a bit more intense environment.
And there are fires and sprints.
And so, you know, I would say it's definitely a more intense environment than something you would get at Google.
In your own personal life, forget all of that.
Just in your own personal life, what do you think about the happiness of a human being, a brilliant person like yourself, about finding a balance between work and life? Or is such a thing not a good thought experiment?
Yeah, I think balance is good, but I also love to have sprints that are out of distribution.
And that's when I think I've been pretty creative as well.
So, sprints out of distribution means that most of the time
you have a quote unquote balance.
I have balance most of the time.
I like being obsessed with something once in a while.
Once in a while is what?
Once a week, once a month, once a year.
Yeah, probably, say, once a month or something.
And that's when we get a new GitHub repo from you?
Yeah, that's when you really care about a problem. It must exist. This will be awesome. You're obsessed with it. And now you can't just do it on that day. You need to pay the fixed cost of getting into the groove, and then you need to stay there for a while. And then society will come, and they will try to mess with you, and they will try to distract you.
Yeah, the worst thing is a person who says, I just need five minutes of your time.
Yeah. The cost of that is not five minutes. And society needs to change how it thinks about just five minutes of your time.
Right. It's never just five minutes; it's just one minute, it's just 30 seconds, just a quick thing. What's the big deal? Why are you being like that?
What's your computer setup?
What's the perfect setup? Are you somebody that's flexible, no matter what laptop, four screens? Or do you prefer a certain setup where you're most productive?
I guess the one that I'm familiar with is one large screen, 27 inch, and my laptop on the side.
What operating system?
I do Mac. That's my primary.
For all tasks?
I would say macOS, but when you're working on deep learning, everything is Linux. You're SSH'd into a cluster and you're working remotely.
But what about the actual development, the IDE you use?
I think a good way is to just run VS Code, my favorite editor right now, on your Mac, but you actually have a remote folder through SSH, so the actual files that you're manipulating are on the cluster somewhere else.
So what's the best IDE? VS Code? What else do people use? I use Emacs still.
That's cool.
It may be cool, but I don't know if it's maximum productivity.
So what do you recommend in terms of editors? You work with a lot of software engineers. Editors for Python, C++, machine learning applications?
I think the current answer is VS code.
Currently, I believe that's the best IDE.
It's got a huge amount of extensions.
It has GitHub Copilot integration, which I think is very valuable.
What do you think about the Copilot integration? I got to talk a bunch with Guido van Rossum, who's the creator of Python, and he loves Copilot. He programs a lot with it. Do you?
Yeah, I use Copilot. I love it. And it's free for me, but I would pay for it. I think it's very good. And the utility that I found with it: I would say there is a learning curve, and you need to figure out when it's helpful and when to pay attention to its outputs, and when it's not going to be helpful and you should not pay attention to it.
Because if you're just reading its suggestions all the time,
it's not a good way of interacting with it.
But I think I was able to sort of like mold myself to it.
I find it very helpful, number one, in copy-pasting and replacing some parts.
So when the pattern is clear,
it's really good at completing the pattern.
And number two, sometimes it suggests APIs that I'm not aware of.
So it tells you about something that you didn't know.
And that's an opportunity to discover and use it.
So I would never take Copilot code as given. I almost always copy-paste it into a Google search to see what the function is doing. And then you're like, oh, it's actually exactly what I need. Thank you, Copilot.
So you learned something.
So it's in part a search engine, in part maybe getting the exact syntax correct. That NP-hard-style thing where once you see it, you know it's correct.
Yes, exactly. You yourself can verify efficiently, but you can't generate efficiently. And Copilot, really, it's autopilot for programming, right? And currently it's doing the lane following, which is the simple copy-paste and sometimes suggestions, but over time it's going to become more and more autonomous. And so the same thing will play out in not just coding, but across many, many different things, probably.
Coding is an important one, right? Writing programs. How do you see the future of that developing, the program synthesis, being able to write programs that are more and more complicated? Because right now it's human-supervised in interesting ways.
Yes. It feels like the transition will be very painful.
My mental model for it is that the same thing will happen as with Autopilot. Currently, it's doing lane following; it's doing some simple stuff. And eventually it will be doing full autonomy, and people will have to intervene less and less.
And there could be testing mechanisms. Like, if it writes a function, and that function looks pretty damn correct, how do you know it's correct? Because you're getting lazier and lazier as a programmer; your ability to spot little bugs fades. But I guess it won't make little mistakes.
No, it will. Copilot will make subtle off-by-one bugs. It has done that to me.
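For readers unfamiliar with the failure mode, here is a generic example of the kind of subtle off-by-one bug an autocompletion tool can introduce. This is an illustration under assumed conditions, not actual Copilot output:

```python
# A classic off-by-one: the buggy loop bound silently drops the last window.

def moving_sum(xs, window):
    """Sum of each length-`window` slice of xs."""
    # A completion might plausibly suggest range(len(xs) - window),
    # which skips the final window; the correct bound adds 1.
    return [sum(xs[i:i + window]) for i in range(len(xs) - window + 1)]

assert moving_sum([1, 2, 3, 4], 2) == [3, 5, 7]  # the buggy bound yields [3, 5]
```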
But do you think future systems will? Or is the off-by-one actually a fundamental challenge of programming?
In that case, it wasn't fundamental, and I think things can improve, but humans have to supervise. I am nervous about people not supervising what comes out, and about what happens with, for example, the proliferation of bugs in all of our systems. I am nervous about that, but I think there will probably be other Copilots for bug finding and stuff like that at some point.
Because there will be a lot more automation for programming. So, like, a Copilot that generates code, one that does the compiling, one that does the linting, one that does the type checking. It's a committee of GPTs, sort of.
And then there will be a manager for the committee.
Yeah. And then there'll be somebody that says a new version of this is needed.
We need to regenerate it.
Yeah.
There were 10 GPTs; they did a forward pass and gave 50 suggestions. Another one looked at them and picked a few that it liked. A bug-finding one looked at them and said, this is probably a bug. They got re-ranked by some other thing. And then a final ensemble GPT comes in and says, okay, given everything you guys have told me, this is probably the next token.
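A playful sketch of what one step of that committee pipeline could look like. Every function here is a hypothetical placeholder standing in for a model, not a real API:

```python
# Propose, score, re-rank: a toy version of the "committee of GPTs" idea.
from typing import Callable, List, Tuple

def committee_step(
    propose: Callable[[str, int], List[str]],   # generator: prompt -> candidate completions
    checkers: List[Callable[[str], float]],     # e.g. linter-GPT, type-checker-GPT, bug-finder-GPT
    rerank: Callable[[List[Tuple[str, float]]], str],  # final ensemble picks one candidate
    prompt: str,
    n_candidates: int = 50,
) -> str:
    """Generate candidates, score each with every checker, let the ensemble pick."""
    candidates = propose(prompt, n_candidates)
    scored = [(c, sum(check(c) for check in checkers)) for c in candidates]
    return rerank(scored)
```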
You know, the thing is, the number of programmers in the world has been growing very quickly. Do you think it's possible that it'll actually level out, and drop to a very low number, in this kind of world? Because then you'll be doing Software 2.0 programming, and you'll be doing this kind of generative, Copilot-type-systems programming, but you won't be doing the old-school Software 1.0 programming.
I don't currently think that they're just going to replace human programmers.
I'm so hesitant saying stuff like this, right? Because this is going to be replayed in five years.
I know, it's going to show what we thought. Because I agree with you, but I think we might be very surprised, right? What's your sense of where we stand with language models? Does it feel like the beginning or the middle or the end?
The beginning, 100%. I think the big question in my mind is, for sure, GPT will be able to program quite well,
competently and so on.
How do you steer the system?
You still have to provide some guidance
to what you actually are looking for.
And so how do you steer it and how do you say,
how do you talk to it, how do you audit it and verify
that what is done is correct,
and how do you work with this?
And it's as much, not just an AI problem,
but a UI/UX problem.
Yeah. So, beautiful fertile ground for so much interesting work: VS Code++, where it's not just human programming anymore. It's amazing.
Yeah, so you're interacting with the system. Not just one prompt, but iterative prompting.
Yeah. You're having a conversation with the system.
That actually, I mean to me, that's super exciting to have a conversation with the program I'm writing.
Yeah.
Yeah, maybe at some point you're just conversing with it. Like, okay, here's what I want to do. Actually, this variable... maybe it's not even at the level of a variable. You can also imagine, like, can you translate this to C++ and back to Python?
Yeah, that already kind of exists in some form.
But just doing it as part of the programming experience. Like, I think I'd like to write this function in C++. Or you just keep switching between different languages because of different syntax. Maybe I want to convert this into a functional language. And so you get to become multilingual as a programmer and dance back and forth efficiently.
Yeah.
I mean, I think the UI/UX of it, though, is still very hard to think through, because it's not just about writing code on a page. You have an entire developer environment. You have a bunch of hardware on it. You have some environment variables. You have some scripts that are running in a cron job. There's a lot going on to working with computers. How these systems will set up environment flags, work across multiple machines, set up screen sessions, and automate different processes, how all of that works and is auditable by humans and so on, is a massive question at the moment.
You've built Arxiv Sanity. What is arXiv, and what is the future of academic research publishing that you would like to see?
So arXiv is this pre-print server. If you have a paper, you can submit it for publication to journals or conferences, then wait six months, and maybe get a decision, pass or fail. Or you can just upload it to arXiv, and people can tweet about it three minutes later, and everyone sees it, everyone reads it, and everyone can profit from it in their own little ways. You can cite it, and it has an official look to it.
It feels like a publication process.
It feels different if you just put it in a blog post.
Oh, yeah.
Yeah, I mean, it's a paper.
And usually the bar is higher for something that you would expect on arXiv, as opposed to something you would see in a blog post.
Well, the culture created the bar, because you could probably post a pretty crappy paper on arXiv.
So what does that make you feel about peer review? Rigorous peer review by two or three experts, versus the peer review of the community right as it's written?
Yeah, basically, I think the community is very well able to
peer review things very quickly
on Twitter.
And maybe it just has something to do with the AI and machine learning field specifically. I feel like things are more easily auditable, and the verification is easier, potentially, than in other fields.
So you can kind of think of these scientific publications as little blockchains, where everyone's building on each other's work and citing each other. And you sort of have AI, which is this much faster and looser blockchain, where any one individual entry is very cheap to make. And then you have other fields where maybe that model doesn't make as much sense. And so I think in AI, at least, things are pretty easily verifiable.
And so that's why, when people upload papers that have a really good idea or something, people can try it out the next day, and they can be the final arbiter of whether it works or not on their problem. And the whole thing just moves significantly faster. So I kind of feel like academia still has a place; this conference and journal process still has a place, but it lags behind, I think. And it's a bit more of a, maybe, higher-quality process, but it's not the place where you will discover cutting-edge work anymore.
Yeah. It used to be the case, when I was starting my PhD, that you'd go to conferences and journals and discuss all the latest research. Now, when you go to a conference or journal, no one discusses anything that's there, because it's already three generations old and irrelevant.
Yeah. It makes me sad about DeepMind, for example, where they still publish
in Nature and these big prestigious venues. I mean, there's still value, I suppose, in the prestige that comes with these big venues. But the result is that they'll announce some breakthrough performance, and it'll take like a year to actually publish the details. And those details, if they were published immediately, would inspire the community to move in certain directions faster.
Yeah, it would speed up the rest of the community, but I don't know to what extent that's part of their objective function also.
That's true. So it's not just the prestige; a little bit of the delay is part of it too.
Yeah, DeepMind specifically has been operating in the regime of a slightly higher-quality process, with higher latency, and publishing those papers that way.
Do you or have you suffered from imposter syndrome? Being the director of AI at Tesla,
being this person, when you were at Stanford, where the world looks at you as the expert in AI to teach the world about machine learning.
When I was leaving Tesla after five years, I spent a ton of time in meeting rooms, and I would read papers.
In the beginning when I joined Tesla, I was writing code, and then I was writing less and less code, and I was reading code, and then I was reading less and less code.
And so this is just a natural progression that happens, I think.
And definitely, I would say near the tail end,
that's when it sort of starts to hit you a bit more
that you're supposed to be an expert.
But actually, the source of truth is the code
that people are writing on GitHub,
the actual code itself.
And you're not as familiar with that as you used to be.
And so I would say maybe there's some
like insecurity there.
Yeah, that's actually pretty profound.
That a lot of the insecurity has to do
with not writing the code in the computer science
space, because that is the truth.
The code is the source of truth.
The papers and everything else, it's a high level summary.
I don't, yeah, just a high level summary, but at the end of the day, you have to read code.
It's impossible to translate all that code into actual, you know, paper form.
So when things come out, especially when they have a source code available,
that's my favorite place to go.
So like I said, you're one of the greatest teachers of machine learning AI ever.
From CS231N to today,
what advice would you give to beginners interested in getting into machine learning?
Beginners are often focused on like what to do,
and I think the focus should be more like how much you do.
So I am kind of a believer, on a high level, in this 10,000 hours kind of concept,
where you just kind of have to pick the things where you can spend time, that you care about and you're interested in.
You literally have to put in 10,000 hours of work.
It doesn't even matter as much like where you put it and you'll iterate and you'll improve and you'll waste some time.
I don't know if there's a better way.
You need to put in 10,000 hours.
But I think it's actually really nice because I feel like there's some sense of determinism
about being an expert at a thing if you spend 10,000 hours.
You can literally pick an arbitrary thing.
And I think if you spend 10,000 hours of deliberate effort and work, you actually will become
an expert at it.
And so I think that's kind of like a nice thought. And so basically I would focus
more on, are you spending 10,000 hours? And then thinking about what kinds
of mechanisms maximize your likelihood of getting to 10,000 hours, which for us silly humans
means probably forming a daily habit of, like, every single day actually doing the thing.
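As a rough back-of-envelope on what that daily habit adds up to; a minimal sketch, where the hours-per-day values are illustrative assumptions rather than figures from the conversation:

```python
# Back-of-envelope: how long 10,000 hours takes at a given daily habit.
# The hours-per-day values below are illustrative assumptions.
for hours_per_day in (1, 2, 4):
    years = 10_000 / (hours_per_day * 365)
    print(f"{hours_per_day} h/day -> ~{years:.1f} years")
# 1 h/day -> ~27.4 years; 2 h/day -> ~13.7 years; 4 h/day -> ~6.8 years
```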
Whatever helps you. So I do think, to a large extent, it's a psychological problem for yourself.
One other thing that I think is helpful for the psychology of it is, many times people
compare themselves to others in the area. I think it's very harmful. Only compare yourself to you
from some time ago, like say a year ago. Are you better than you a year ago? That's the only way to think.
And then you can see your
progress, and it's very motivating. That's so interesting, that focus on the quantity of hours.
I think a lot of people in the beginner stage, but actually throughout, get paralyzed by the choice,
like which one do I pick this path or this path? Yeah. Like they'll literally get paralyzed by
like which IDE to use.
Well, they're worried, yeah, they'll
worry about all these things,
but the thing is,
you will waste time doing something wrong.
Yes.
You will eventually figure out it's not right.
You will accumulate scar tissue.
And next time you will grow stronger,
because next time you'll have the scar tissue,
and next time you'll learn from it,
and now next time you come to a similar situation,
you'll be like, oh, I messed up. I've spent
a lot of time working on things that never materialized into anything. And I have all
that scar tissue, and I have some intuitions about what was useful, what wasn't useful,
how things turned out. So all those mistakes were not dead work, you know. So I just
think you should just focus on the work. What have you done? What have you done last
week? That's a good question actually to ask, for a lot of things, not just machine learning.
It's a good way to cut the,
I forgot what term we used, but the fluff, the blubber, whatever,
the inefficiencies in life. What do you love about teaching? You seem to find yourself
often drawn into teaching.
You're very good at it, but you're also drawn to it.
I mean, I don't think I love teaching. I love happy humans.
And happy humans like when I teach.
I wouldn't say I hate teaching. I tolerate teaching.
But it's not like the act of teaching that I like.
It's that, you know, I have something, I'm actually okay at it.
I'm okay at teaching, and people appreciate it a lot.
And so I'm just happy to try to be helpful.
And teaching itself is not like the most, I mean, it can be really annoying,
frustrating.
I was working on a bunch of lectures just now.
I was reminded back to my days of 231 and just how much work it is to create some of
these materials and make them good.
The amount of iteration and thought and you go down blind alleys and just how much
you change it.
So creating something good in terms of like educational value is really hard.
And it's not fun.
It was difficult.
So people should definitely go watch your new stuff you put out, the lectures where you're actually
building the thing, like, from, like you said, the code is truth.
So discussing backpropagation by building it, coding it, looking through it, just the whole thing.
How difficult is that to prepare for? I think it's a really powerful way to teach.
Did you have to prepare for that? Or are you just live thinking through it?
I will typically do, like, say three takes, and then I take the better take.
So I do multiple takes, and I take some of the better takes, and then I just build that lecture that way.
Sometimes I have to delete 30 minutes of content
because it just went down an alley
that I didn't like too much.
There's a bunch of iteration,
and it probably takes me somewhere around 10 hours
to create one hour of content.
To get one hour.
It's interesting.
I mean, is it difficult to go back to the basics?
Do you draw a lot of wisdom from going back to the basics?
Yeah, going back to back propagation,
lost functions where they come from.
And one thing I like about teaching a lot, honestly,
is it definitely strengthens your understanding.
So it's not a purely altruistic activity.
It's a way to learn.
If you have to explain something to someone,
you realize you have gaps in knowledge.
And so I even surprised myself in those lectures.
Like, I'd think the result will obviously look like this,
and then the result doesn't look like it.
And I'm like, okay, I thought I understood this.
Yeah.
Well, that's why it's really cool.
It's literally code, you run it in a notebook,
and it gives you a result, and you're like,
oh, wow.
And like actual numbers, actual input, actual code.
Yeah, it's not mathematical symbols, etc.
The source of truth is the code.
It's not slides.
It's just like, let's build it.
It's beautiful.
You're a rare human in that sense.
What advice would you give to researchers
trying to develop and publish ideas
that have a big impact in the world of AI?
So maybe undergrads, maybe early graduate students.
Yeah.
I mean, I would say they definitely have to be
a little bit more strategic than I had to be
as a PhD student because of the way AI is evolving.
It's going the way of physics,
where if physics used to be able to do experiments
in your bench top and everything was great
and you could make progress.
And now you have to work at something like the LHC, at CERN.
And so AI is going in that direction as well.
So there's certain kinds of things that's just not possible to do on the
bench top anymore.
And I think that didn't use to be the case at the time.
Do you still think that there are, like,
GAN-type papers to be written, like a
very simple idea that requires just one computer
to illustrate with a simple example?
I mean, one example that's been very influential recently is diffusion models.
Diffusion models are amazing.
Diffusion models are like six years old, and for the longest time people were kind of ignoring
them, as far as I can tell.
And they're an amazing generative model, especially in images.
And so Stable Diffusion and so on, it's all diffusion-based.
Diffusion is new. It was not there, and it came from, well, it came from Google, but a researcher could
have come up with it. In fact, some of the first, actually, no, those came from Google as well.
But a researcher could come up with that in an academic institution.
Yeah, what do you find most fascinating about diffusion models? From the societal impact
to the technical architecture?
What I like about diffusion is it works so well. Is that surprising to you? The amount of the variety,
almost the novelty of the synthetic data it's generating? Yes, the Stable Diffusion images are
incredible. The speed of improvement in generating images has been insane. We went very quickly from generating, like, tiny digits to tiny faces, and it all
looked messed up, and now we have Stable Diffusion, and that happened very quickly.
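For readers curious what the diffusion idea being discussed looks like mechanically, here is a minimal, illustrative sketch of the DDPM-style forward noising process; the schedule values and toy data are assumptions, not anything specified in the conversation:

```python
# Toy sketch of the diffusion forward process: data is gradually noised,
# and a model is trained to predict the added noise so it can denoise
# step by step at sampling time. Illustrative only.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # a common DDPM-style schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def noisy_sample(x0, t):
    # Closed form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps                           # eps is the training target

x0 = torch.randn(16)                         # stand-in for real data
xt, eps = noisy_sample(x0, t=500)            # roughly halfway to pure noise
# A denoiser network would be trained to regress eps from (xt, t).
```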
There's a lot that academia can still contribute. You know, for example, flash
attention is a very efficient kernel for running the attention operation inside
the transformer. That came from an academic environment. It's a very clever way to
structure the kernel that does the calculation, so it doesn't materialize the attention matrix.
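To make the "don't materialize the attention matrix" point concrete, here is a minimal sketch of the memory idea in plain PyTorch. It only tiles over query blocks, so it is a simplification of the real FlashAttention kernel, which also tiles over keys with an online softmax:

```python
# Toy illustration: process queries in blocks so the full (T x T) attention
# matrix is never held in memory at once. Not the real fused-kernel algorithm.
import torch

def blockwise_attention(q, k, v, block=128):
    # q, k, v: (T, d) tensors for a single attention head
    T, d = q.shape
    scale = d ** -0.5
    out = torch.empty_like(q)
    for start in range(0, T, block):
        qb = q[start:start + block]          # one block of queries
        scores = (qb @ k.T) * scale          # (block, T) slice, not (T, T)
        probs = torch.softmax(scores, dim=-1)
        out[start:start + block] = probs @ v
    return out

q, k, v = (torch.randn(1024, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(blockwise_attention(q, k, v), ref, atol=1e-5)
```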
And so I think there are still lots of things to contribute, but you have to be just more strategic.
Do you think neural networks can be made to reason? Yes. Do you think they already reason?
Yes. What's your definition of reasoning? Information processing.
All right.
So in the way the humans think through a problem
and come up with novel ideas, it feels like reasoning.
So the novelty, I don't want to say,
but out of distribution ideas, you think it's possible.
Yes.
And I think we're seeing that already in the current neural nets, you're able to remix
the training set information into true generalization in some sense.
That doesn't appear in the training set.
Like you're doing something interesting algorithmically, you're manipulating some symbols
and you're coming up with some correct, unique
answer in a new setting.
What would illustrate to you, holy shit, this thing is definitely thinking.
To me, thinking or reasoning is just information processing and generalization and I think
the neural nets already do that today.
So being able to perceive the world, or perceive whatever the inputs are, and to make predictions
based on that, or act based on that, that's reasoning.
Yeah, you're giving correct answers in novel settings by manipulating information.
You've learned the correct algorithm.
You're not doing just some kind of a lookup table or a nearest neighbor search.
Let me ask you about AGI.
What are some moonshot ideas you think might make significant progress towards AGI?
Well, maybe another way is what are the big blockers that we're missing now.
So basically, I am fairly bullish on our ability to build AGIs.
Basically, automated systems that we can interact with and are very human-like,
and we can interact with them in a digital realm
or a physical realm.
Currently, it seems most of the models
that do these magical tasks are in a text realm.
I think, as I mentioned,
I'm suspicious that the text realm is not enough
to actually build full understanding of the world.
I do actually think you need to go into pixels and understand the physical world and how it works.
So I do think that we need to extend these models to consume images and videos and train on a lot more data that is multimodal in that way.
Do you think you need to touch the world to understand it also?
Well, that's the big open question I would say in my mind is if you also require the embodiment and the ability to sort of interact with
the world, run experiments, and have data of that form, then you need to go to Optimus
or something like that.
And so I would say Optimus is in some way like a hedge on AGI, because it seems to me that
it's possible that just having data from the internet is not enough. If that is the case, then Optimus may lead to AGI.
Because with Optimus, to me, there's nothing beyond Optimus.
You have this humanoid form factor that can actually
do stuff in the world.
You can have millions of them interacting with humans and so on.
And if that doesn't give a rise to AGI, at some point,
I'm not sure what will.
So from a completeness perspective,
I think that's a really good platform.
But it's a much harder platform,
because you are dealing with atoms
and you need to actually like build these things
and integrate them into society.
So I think that path takes longer,
but it's much more certain.
And then there's the path of the internet,
just, like, training these compression models,
effectively, on
trying to compress all of the internet.
And that might also give rise to these agents as well.
Compress the internet, but also interact with the internet. Yeah, it's not obvious to me.
In fact, I suspect you can reach AGI without ever entering the physical world.
Which is a little bit more concerning, because that might result in it happening faster.
So it just feels like we're in boiling water.
We won't know as it's happening.
I would like to, I'm not afraid of AI.
I'm excited about it.
There's always concerns.
But I would like to know when it happens.
Yeah, and have hints about when it happens. Like a year from now, it will happen, that kind of thing.
Yeah, I just feel like in the digital realm, it just might happen.
Yeah. I think all we have available to us, because no one has built AGI yet, so all we have
available to us is, is there enough fertile ground on the periphery?
I would say yes, and we have the progress so far, which has been very rapid.
And there are next steps that are available.
And so I would say, yeah, it's quite likely that will be interacting with digital entities.
How do you think we'll know that somebody's been able to do it?
It's going to be a slow, I think it's going to be a slow incremental transition.
It's going to be product-based and focused. It's going to be GitHub Copilot
getting better, and then GPTs helping you write. And then these oracles that you can
go to with mathematical problems. I think we're on a verge of being able to ask
very complex questions in chemistry, physics, math of these oracles and have them complete
solutions. So AGI to you is primarily focused on intelligence.
So, consciousness doesn't enter into it.
So, in my mind, consciousness is not a special thing
you will figure out and bolt on.
I think it's an emergent phenomenon of a large enough
and complex enough generative model, sort of.
So, if you have a complex enough world model
that understands the world, then it also understands
its predicament in the world as being a language model,
which to me is a form of consciousness,
or self-awareness.
So in order to understand the world deeply,
you probably have to integrate yourself into the world.
Yeah.
And in order to interact with humans
and other living beings, consciousness is a very useful tool.
I think consciousness is like a modeling insight.
Modeling insight. Yeah, it's, you have a powerful enough model of understanding the world that you actually understand that you are an entity in it.
Yeah, but there's also this,
perhaps just a narrative we tell ourselves: it feels like something to experience the world, the hard problem of consciousness.
Yeah, but that could be just a narrative that we tell ourselves. Yeah, I don't think, well, yeah, I think it will emerge.
I think it's going to be something very boring like we'll be talking to these digital AI's they will claim their conscious
Mm-hmm. They will appear conscious
They will do all the things that you would expect of other humans and it's going to just be a stalemate
I think there would be a lot of actual fascinating ethical questions, like Supreme Court level
questions of whether you're allowed to turn off a conscious AI. If you're allowed to
build the conscious AI, maybe there would have to be the same kinds of debates that you
have around, sorry to bring up a political topic,
but abortion. The deeper question with abortion is, what is life? And the deep
question here is also, what is life and what is conscious? And I think that'll be very
fascinating to bring up. It might become illegal to build systems that are capable of such a level of intelligence
that consciousness would emerge and therefore the capacity to suffer would emerge.
And a system that says, no, please don't kill me.
Well, that's what the LaMDA chatbot already told this Google engineer, right?
Like it was talking about not wanting to die and so on.
So that might become illegal to do that.
Right.
Because otherwise you might have a lot of creatures that don't want to die, and you can
spawn and finish them on some cluster.
And then that might lead to horrible consequences, because then there might be a lot of people that
secretly love murder and they'll start practicing murder on
those systems. I mean, to me, all of this stuff
just brings a beautiful mirror to the human condition and
human nature, and we'll get to explore it. And that's, like, the
best of the Supreme Court, all the different debates we have
about ideas of
what it means to be human. We get to ask those deep questions that have been asked throughout
human history. There has always been the other in human history. We are the good guys and
those are the bad guys, and, you know, throughout human history, let's murder
the bad guys. And the same will probably happen with robots. So it'll be the other at first,
and then we'll get to ask questions.
So what does it mean to be alive?
What does it mean to be conscious?
Yeah.
And I think there's some canary in the coal mines,
even with what we have today.
And, you know, like for example, there are these
waifus that people work with,
and, like,
this company's going to shut down,
but this person really
loves their waifu, and
is trying to port it somewhere else,
and it's not possible. And I think, like, definitely, people will have feelings
towards these systems, because in some sense they are like a mirror of humanity, because they are
sort of like a big average of humanity in the way they're trained. But we can actually watch
that average.
It's nice to be able to interact with
the big average of humanity and do like a search query on it.
Yeah, it's very fascinating.
Of course, also like shape it,
it's not just a pure average, we can mess with the training data,
we can mess with the objective,
we can fine tune them in various ways.
So we have some impact on what those systems look like.
If we were to achieve AGI, and you could have a conversation with her, ask her, talk
about anything, maybe ask her a question, what kind of stuff would you ask?
I would have some practical questions in my mind, like, do I or my loved ones really have to die?
What can we do about that?
Do you think it will answer clearly or would it answer poetically?
I would expect it to give solutions. I would expect it to be like, well, I've read all of these textbooks and I know all these things that you've produced and it seems to me like here are the
experiments that I think it would be useful to run next and here's some gene therapies that I think
would be helpful and here are the kinds of experiments that you should run.
Okay, let's go with this thought experiment.
Imagine that mortality is actually a prerequisite for happiness.
So if we become immortal, we'll actually become deeply unhappy.
And the model is able to know that.
So what is it supposed to tell you, stupid human, about it?
Yes, you can become immortal, but you will become deeply unhappy.
If the AGI system is trying to empathize with you human,
what is it supposed to tell you that yes, you don't have to die,
but you're really not going to like it.
Is it going to be deeply honest?
Like in Interstellar, what
is it, the AI says, like, humans want 90% honesty. So, like, you have to pick how honest
do I want to answer these practical questions. Yeah. I love the AI in Interstellar, by the way.
I think it's like such a sidekick to the entire story, but at the same time, it's like
really interesting. It's kind of limited in certain ways, right?
Yeah, it's limited.
And I think that's totally fine, by the way.
I don't think, I think it's fine and plausible to have limited and imperfect AGI's.
Is that the feature almost?
As an example, it has a fixed amount of compute on its physical body.
And it might just be that even though you can have a super
amazing mega brain, super intelligent AI, you can also have like, you know, less intelligent
AI so you can deploy in a power efficient way and then they're not perfect and might
make mistakes.
No, I meant more like say you had infinite compute and it's still good to make mistakes
sometimes.
Like, in order to integrate yourself, like, what is it, going back to Good Will
Hunting, Robin Williams' character says, like, the human imperfections, that's the good stuff, right?
Isn't that, like, we don't want perfect? We want flaws, in part to form connection with
each other, because it feels like something you can attach your feelings to, the flaws.
In that same way, you want an AI that's flawed.
I don't know.
I feel like perfection is, but then you're saying, okay, yeah.
But that's not AGI.
But AGI would need to be intelligent enough to give answers to humans and to understand.
And I think perfect is something humans can't understand.
Because even science doesn't give perfect answers.
There's always gaps and mysteries, and I don't know.
I don't know if humans want perfect.
Yeah, I can imagine just having a conversation
with this kind of oracle entity as you'd imagine them.
And yeah, maybe it can tell you about,
you know, based on my analysis of human condition,
you might not want this.
And here are some of the things that,
you might not want to know.
But every dumb human will say, yeah, yeah, yeah, trust me.
Give me the truth, I can handle it.
But that's the beauty, like people can choose.
So.
But then,
the old marshmallow test with the kids and so on,
I feel like many people can't handle the truth,
probably including myself.
Like, the deep truths of the human condition,
I don't know if I can handle it.
Like, what if there are some dark secrets,
what if we are an alien science experiment,
and it realizes that? What if it's hacked?
I mean, this is The Matrix, you know, all over again. I don't know. What would I talk about? I don't even, yeah, I,
probably I would go with the safe scientific questions at first, that have
nothing to do with my own personal life and mortality, just about physics and so on.
Yeah. To build up, like, let's see where it's at.
Or maybe see if it has a sense of humor.
That's another question.
Would it be able to, presumably, in order to,
if it understands humans deeply,
would it be able to generate...
Yeah.
...to generate humor.
Yeah, I think that's actually a wonderful benchmark almost.
Like, is it able, I think that's a really good point, basically,
to make you laugh. Yeah, if it's able to be like a very effective stand-up comedian that is
doing something very interesting computationally, I think being funny is extremely hard. Yeah. Because
it's hard in a way, like a Turing test, the original intent of the Turing test, is hard,
because you have to convince humans, and there's nothing they can do about it.
That's what I mean, comedians talk about this, like, this is deeply honest, because people can't help but laugh,
and if they don't laugh, that means you're not funny. Yeah, if they laugh, it's funny.
And you're showing you need a lot of knowledge to create humor, about, like you mentioned, the human condition and so on, and then you need to be clever with it.
You mentioned a few movies. You tweeted, movies that I've seen five plus times, but
I'm ready and willing to keep watching: Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, Lord of the Rings, all three,
Avatar, Fifth Element, and so on, it goes on, Terminator 2, Mean Girls. I'm not gonna ask about that.
But I think Mean Girls is great. What are some that jump out in your memory that you love,
and why? Like, you mentioned The Matrix. As a computer person, why do you love The Matrix?
There's so many properties that make it, like, beautiful and interesting. So there's all
these philosophical questions, but then there's also AGIs, and there's simulation, and it's cool. And there's, you know, the look of
it, the feel of it, the action, the bullet time. It was just, like, innovating
in so many ways. And then Good Will Hunting, why do you like that one? Yeah, I just
really like this tortured genius sort of character who's, like, grappling
with whether or not he has any responsibility, or what to do with this gift that he was given,
or how to think about the whole thing. And there's also the dance between the genius and
the personal, what it means to love another human being, and there's a lot of themes there.
It's just a beautiful movie. And then the fatherly figure, the mentor,
the psychiatrist.
And it like really like it messes with you.
You know, there's some movies that's just like
really mess with you on a deep level.
Do you relate to that movie at all?
No, it's not your fault, I'm not sure.
As I said, Lord of the Rings, that's self-explanatory.
Terminator 2, which is interesting.
You rewatch that a lot.
Is that better than Terminator 1?
You like Arnold's stuff?
I do like Terminator 1 as well.
I like Terminator 2 a little bit more, but in terms of, like, its surface properties.
Do you think SkyNet is at all a possibility?
Yes.
Like the actual sort of autonomous weapon system kind of thing.
Do you worry about that stuff?
I do worry about AI being used for war.
I 100% worry about it.
And so, I mean, some of these fears of AGIs
and how this will play out, I mean, these
will be very powerful entities probably at some point.
And so for a long time, they're
going to be tools in the hands of humans. You know, people talk about alignment of AGIs and how to make it, but the problem
is, even humans are not aligned. So how this will be used and what this is going to look like is,
yes, troubling. So do you think it'll happen slowly enough that we'll be able to,
as a human civilization, think through the problems?
Yes. That's my hope is that it happens slowly enough and an open enough way where a lot of
people can see and participate in it. Just figure out how to deal with this transition,
I think, which is going to be interesting.
I draw a lot of inspiration from nuclear weapons, because I sure thought it
would be fucked once we developed nuclear weapons. But it's almost like when the systems
are not so dangerous that they destroy human civilization, we deploy them and learn the lessons.
And if it's too dangerous, we might still deploy
it, but we very quickly learn not to use them. And so there'll be, like, this balance achieved.
Humans are very clever as a species. It's interesting. We exploit the resources as much as we can, but we avoid
destroying ourselves. It seems like. Well, I don't know about that actually. I hope it continues.
I mean, I'm definitely like concerned about nuclear weapons and so on, not just as a result of
the recent conflict, even before that. That's probably my number one concern for humanity.
So if humanity destroys itself or destroys 90% of people, that would be because of nukes.
I think so. And it's not even about full destruction. To me, it's bad enough if we reset society,
that would be terrible. It would be really bad. And I can't believe we're like so close to it. It's like so crazy to me. It feels like we might be a few tweets
away from something like that. Yeah. Basically, it's extremely unnerving, but and has been for me
for a long time. It seems unstable that world leaders just having a bad mood can like take one step towards a bad direction and escalates.
Because of a collection of bad moods, it can escalate without being able to stop.
Yeah, it's a huge amount of power. And then also with the proliferation, basically, I don't
actually know what the good outcomes are here. So I'm definitely worried about that a lot.
And then AGI is not currently there, but I think at some point will more and more become
something like it. The danger with AGI is even worse, in a sense,
in that there are good outcomes of AGI, and then the bad outcomes are like an
epsilon away, a tiny bit away. And so I think capitalism and humanity and so on will drive
toward the positive ways of using that technology, but if the bad outcomes are just, like, a tiny,
like, flipped minus sign away, that's a really bad position to be in.
A tiny perturbation of the system results in the destruction of the human species.
It's a weird line to walk.
Yeah, I think, in general, what's really weird about the dynamics of humanity and this explosion
we've talked about is just, like we've been saying, the coupling afforded by technology.
Just the instability of the whole dynamical system. I think it just doesn't look good, honestly.
Yes, that explosion could be destructive and constructive and the probabilities are non-zero in both.
Yeah, I'm going to have to, I do feel like I have to try to be optimistic and so on.
And yes, I think even in this case, I still am predominantly optimistic, but there's definitely.
Me too.
Do you think we'll become a multi-planetary species?
Probably yes, but I don't know if it's a dominant feature of future humanity. There
might be some people on some planets and so on, but I'm not sure if it's, like, yeah,
if it's like a major player in our culture and so on. We still have to solve the drivers
of self-destruction here on Earth. So just having a backup on Mars is not going to solve
the problem. So by the way, I love the backup on Mars.
I think that's amazing.
We should absolutely do that.
Yes.
And I'm so thankful.
Would you go to Mars?
Personally, no.
I do like Earth quite a lot.
OK.
I'll go to Mars.
I'll go for you.
I'll tweet at you from there.
Maybe eventually I would,
if it's safe enough. But I don't actually
know if it's on my lifetime scale, unless I can extend it by a lot.
I do think that, for example, a lot of people might disappear into virtual realities and
stuff like that.
I think that could be the major thrust of the cultural development of humanity if it survives.
So it might not be. It's just really hard to work in the physical realm and go out there.
And I think ultimately all your experiences are in your brain.
Yeah. And so it's much easier to disappear into digital realm. And I think people will find them
more compelling, easier, safer, more interesting. So you're a little bit captivated by virtual
reality by the possible worlds, whether it's the metaverse or some other manifestation of that.
Yeah. Yeah, it's really interesting. I'm interested, just from talking a lot
to Carmack. Where's the thing that's currently preventing that? Yeah, I mean,
to be clear, I think what's interesting about the future is, I kind of feel
like the variance in the human condition grows. That's the primary thing that's changing. It's
not as much the mean of the distribution,
it's, like, the variance of it. So there will probably be people on Mars, and there will be people in VR, and there will be people here on Earth.
It's just like there will be so many more ways of being and so I kind of feel like I see it as like a spreading out of a human experience
There's something about the internet that allows you to discover those little groups, and you gravitate to it, something about your biology likes that kind of world, and you find each other.
And we'll have transhumanists, and then we'll have the Amish, and everything is just
going to coexist.
The cool thing about it, because I've interacted with a bunch of internet communities, is
they don't know about each other.
You can have a very happy existence, just like having a very close-knit community and
not knowing about each other. I mean, you even sense this just having traveled to Ukraine, they don't know so many
things about America. When you travel across the world, I think you experience this too.
There are certain cultures that are like, they have their own thing going on. So you
can see that happening more and more and more and more in the future. We have little communities.
Yeah, yeah, I think so.
That seems to be the, that seems to be how it's going right now.
And I don't see that trend like really reversing.
I think people are diverse and they're able to choose their own path and existence.
And I sort of like celebrate that.
So have you been spending much time in the metaverse, in virtual reality?
Or which community are you in?
Are you the physicalist, the physical reality enjoyer, or do you see yourself drawing a lot of pleasure
and fulfillment in the digital world?
Yeah, I think, well, currently, the virtual reality is not that compelling.
I do think it can improve a lot, but I don't really know to what extent.
Maybe there's actually even more exotic things you can think about with neural links or
stuff like that.
Currently I kind of see myself as mostly a team human person.
I love nature.
I love harmony.
I love people.
I love humanity.
I love emotions of humanity.
And I just want to be like in this solar punk little utopia,
that's my happy place.
My happy place is like people I love
thinking about cool problems,
surrounded by lush, beautiful, dynamic nature,
and secretly high-tech in places that count.
Places that use technology
to empower that love for other humans and nature.
Yeah, I think a technology used very sparingly.
I don't love when it sort of gets in the way of humanity in many ways.
I like just people being humans in a way we sort of like slightly evolved and prefer,
I think, just by default.
People kept asking me because they know you love reading.
Are there particular books that you enjoyed that had an impact on you, for silly or for profound reasons
that you recommend? You mentioned The Vital Question. Many, of course. I think in biology, as an
example, The Vital Question is a good one. Anything by Nick Lane, really. Life Ascending, I would say,
is maybe a bit more representative; it's like a summary of a lot of the things he's been
talking about.
helped me understand altruism as an example and where it comes from and just realizing that
you know the selection is on the level of genes was a huge insight for me at the time and it's
sort of like cleared up a lot of things for me. What do you think about the idea that ideas are the organisms,
the memes?
Yes.
Love it.
100%.
Are you able to walk around with that notion for a while
that there is an evolutionary kind of process with ideas as well?
There absolutely is.
There's memes just like genes and they compete
and they live in our brains.
It's beautiful.
Are we silly humans thinking that we're the organisms?
Is it possible that the primary organisms are the ideas?
Yeah, I would say like the idea is kind of living
in a software of like our civilization
in the minds and so on.
We think as humans that the hardware
is the fundamental thing.
A human is a hardware entity,
but it could be the software. Yeah, I would
say there needs to be some grounding at some point to, like, a physical reality. But if we
clone an Andrej, the software is the thing, like, it's the thing that makes that thing special?
Yeah, I guess you're right. But then cloning might be exceptionally difficult.
Like there might be a deep integration between the software
and the hardware in ways we don't quite understand.
Well, from The Selfish Gene
point of view, what makes me special is more like the gang
of genes that are riding in my chromosomes, I suppose,
right?
They're the replicating unit, I suppose.
No, but that's just the hardware.
The thing that makes you special, sure.
Well, the reality is what makes you special is your ability to survive based on the software
that runs on the hardware that was built by the genes.
So the software is the thing that makes you survive, not the hardware.
It's a little bit of both.
It's just like a second layer.
It's a new second layer
that wasn't there before the brain. They both coexist. But there are also layers of
the software. You mentioned you tend to reach for textbooks sometimes.
I kind of feel like books are written for too much of a general consumption sometimes.
And they just kind of like,
they're too high up in the level of abstraction
and it's not good enough.
So I like textbooks.
I like the cell.
I think the cell was pretty cool.
That's why also I like the writing of Nick Lane,
is because he's pretty willing to step one level down
and he doesn't,
yeah, he's willing to go there. But he's also willing to sort of be throughout the stack.
So he'll go down to a lot of detail, but then he will come back up. And I think he has a,
yeah, basically I really appreciate that. That's why I love college early college, even high school.
Just textbooks on the basics of computer science and mathematics
of biology, of chemistry.
Those are, they're condensed down, like, sufficiently general that you can understand
both the philosophy and the details, but also you get homework problems and you get
to play with it as much as you would if you were programming stuff.
And then I'm also suspicious of textbooks honestly,
because as an example in deep learning,
there are no, like, amazing textbooks,
and the field is changing very quickly.
I imagine the same is true in, say, synthetic biology
and so on; these books like The Cell are kind of outdated.
They're still high level.
Like what is the actual real source of truth?
It's people in wet labs working with cells, you know, sequencing genomes, and,
yeah, actually working with it. And I don't have that much exposure to that or what that looks like. So I still don't fully, I'm reading through The Cell and it's kind of interesting
and I'm learning, but it's still not sufficient, I would say, in terms of understanding.
Well, it's a clean summarization of the mainstream narrative.
Yeah, but you have to learn that before you break out past the
tutorials to the cutting edge.
What is the actual process of working with these cells
and growing them and incubating them?
It's kind of like a massive cooking recipe,
making sure your cells grow and proliferate,
and then you're sequencing them, running experiments,
and just how that works, I think, is kind of
the source of truth of, at the end of the day,
what's really useful in terms of creating therapies and so on.
Yeah, I wonder what the AI textbooks of the future will be, because, you know, there's Artificial
Intelligence: A Modern Approach. I actually haven't read the recent
version; there's been a recent edition.
I also saw there's a science of deep learning book.
I'm waiting for textbooks that are worth recommending, worth reading.
Yeah.
It's tricky because it's like papers and code, code, code.
Honestly, I find papers are quite good.
I especially like the appendix of any paper as well.
It's like the most detail you can have.
It doesn't have to be cohesive or connected to anything else.
It just describes, in a very specific way,
how you solved the particular thing.
Yeah, many times papers can be actually quite readable,
not always, but sometimes the introduction
and the abstract are readable, even for someone outside of the field.
This is not always true.
And sometimes, I think, unfortunately, scientists use complex terms even when it's not
necessary.
I think that's harmful.
I think there's no reason for that.
And papers sometimes are longer than they need to be in the parts that don't matter.
Yeah, the appendix can be long, but then the paper itself, you know, look at Einstein, make it simple.
Yeah, but certainly I've come across papers, I would say, in, like, synthetic biology or something,
that I thought were quite readable for the abstract and the introduction, and then you're reading
the rest of it and you don't fully understand, but you kind of are getting a gist, and I think it's cool.
What advice, you've given advice to folks interested
in machine learning and research, but in general, life advice to a young person, high school,
early college, about how to have a career they can be proud of, or a life they can be proud of?
Yeah, I think I'm very hesitant to give general advice. I think it's really hard. Some of the stuff
I've mentioned is fairly general, I think, like, focus on just the amount of work you're spending on a
thing; compare yourself only to yourself, not to others. That's good. I think those are fairly general.
How do you pick the thing? You just have, like, a deep interest in something, or, like, try to
find the argmax over the things that you're interested in? Argmax at that moment and stick with it. How do you not get distracted and switch to another thing?
You can, if you like.
If you do an argmax repeatedly every week, every week,
it doesn't converge.
It doesn't.
It's a problem.
Yeah, you can, like, low-pass filter yourself in terms of what has consistently been true
for you.
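Taking the low-pass filter metaphor literally for a moment, here is a playful sketch: an exponential moving average over a noisy weekly interest signal, so only interests that persist win the argmax. The interest scores are made-up illustrations, not anything from the conversation:

```python
# "Low-pass filter yourself": smooth noisy weekly interest scores with an
# exponential moving average, then argmax over what consistently remains.
interests = {"vision": [0.9, 0.8, 0.9, 0.7], "crypto": [1.0, 0.1, 0.0, 0.0]}

def ema(xs, alpha=0.5):
    s = xs[0]
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * s   # damp week-to-week spikes
    return s

smoothed = {name: ema(scores) for name, scores in interests.items()}
print(max(smoothed, key=smoothed.get))    # -> "vision", the consistent interest
```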
But yeah, I definitely see how it can be hard,
but I would say like you're going to work the hardest
on the thing that you care about the most.
So low-pass filter yourself and really introspect
on your past: what are the things that gave you energy,
and what are the things that took energy away from you,
concrete examples.
And usually from those concrete examples,
sometimes patterns can emerge.
I like it when things look like this,
when I'm in these positions.
So that's not necessarily the field,
but the kind of stuff you're doing in a particular field.
So for you, it seems like you were energized
by implementing stuff, building actual things.
Yeah, being low level, learning, and then also communicating
so that others can go through the same realizations
and shortening that gap, because I usually
have to do way too much work to understand a thing.
And then I'm like, okay, this is actually like, okay, I think I get it.
And like, why was it so much work?
It should have been much less work.
And that gives me a lot of frustration.
And that's why I sometimes go teach.
So aside from the teaching you're doing now, putting out videos, aside from a potential
Godfather Part II with the AGI at Tesla and beyond, what does the future
of Andrej Karpathy hold? Have you figured that out yet or no? I mean, as you see through the fog
of war, that is all of our future. Do you start seeing silhouettes of what that possible future could look
like? The consistent thing I've always been interested in, for me at least, is AI.
And that's probably what I'm going to spend the rest of my life on, because I just care about it a lot.
And I actually care about like many other problems as well, like say aging,
which I basically view as a disease. And I care about that as well, but I don't think it's a good
idea to go after it specifically.
I don't actually think that humans will be able to come up
with the answer.
I think the correct thing to do is to ignore those problems
and you solve AI and then use that to solve everything else.
And I think there's a chance that this will work.
I think it's a very high chance.
And that's kind of like the way I'm betting at least.
So when you think about AI, are you interested in all kinds of applications,
all kinds of domains,
and any domain you focus on will allow you to get insights
to have the big problem of AGI?
Yeah, for me, it's the ultimate meta problem.
I don't want to work on any one specific problem,
there's too many problems.
So how can you work on all problems simultaneously?
You solve the meta problem,
which to me is just intelligence,
and how do you automate it?
Are there cool small projects like arXiv Sanity and so on that you're thinking about,
that the world, the ML world, can anticipate?
There's always, like, some fun side projects. arXiv Sanity is one. Basically,
there are way too many arXiv papers; how can I organize them and recommend papers and so on?
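As a hedged sketch of the kind of recommendation a tool like arXiv Sanity does (the real site's pipeline is more involved, and the abstracts here are placeholder strings): TF-IDF vectors over paper text plus cosine similarity for nearest neighbors:

```python
# Sketch: TF-IDF over abstracts + cosine similarity to find related papers.
# Placeholder abstracts; the real arxiv-sanity pipeline differs in detail.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "attention is all you need: transformer sequence transduction",
    "denoising diffusion probabilistic models for image generation",
    "scaling laws for neural language models",
]
X = TfidfVectorizer().fit_transform(abstracts)
sims = cosine_similarity(X[0], X)[0]      # similarity of paper 0 to all papers
ranked = sims.argsort()[::-1][1:]         # most similar first, excluding itself
print(ranked)                             # indices of recommended papers
```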
You transcribed all of my podcasts.
What did you learn from that experience?
From transcribing the process, like, you like consuming audiobooks and podcasts and so on.
Here's a process that achieves
closer to human-level performance in annotation.
Yeah, well, I definitely was, like, surprised that transcription with OpenAI's Whisper was
working so well,
compared to what I'm familiar with from Siri and, like, a few other systems, I guess. It
worked so well.
And that's what gave me some energy to try it out.
And I thought it could be fun to run on podcasts.
It's kind of not obvious to me why Whisper is so much better compared to anything else,
because I feel like there
should be a lot of incentive for a lot of companies to produce transcription systems, and they've done so over a long time.
Whisper is not a super exotic model. It's a transformer.
It takes mel spectrograms and, you know, just outputs tokens of text. It's not crazy.
The model and everything has been around for a long time.
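For reference, running the open-source Whisper model on an audio file looks roughly like this; a minimal sketch assuming `pip install openai-whisper`, ffmpeg on the path, and a placeholder filename:

```python
# Minimal sketch of transcribing a podcast episode with OpenAI's Whisper.
# "episode.mp3" is a placeholder; larger checkpoints trade speed for accuracy.
import whisper

model = whisper.load_model("base")
result = model.transcribe("episode.mp3")
print(result["text"])                        # full transcript
for seg in result["segments"]:               # timestamped segments
    print(f'[{seg["start"]:7.2f}s] {seg["text"]}')
```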
I'm not actually 100% sure why. It's not obvious to me either.
It makes me feel like I'm missing something.
Yeah, because there's a huge incentive, even at Google
and so on, YouTube transcription.
Yeah, it's unclear, but some of it is also integrating it
into a bigger system.
That is, the user interface, how it's deployed,
and all that kind of stuff.
Maybe running it as an independent thing is much easier, like an order of magnitude easier, than deploying it into a large integrated system like YouTube transcription, or
anything like meetings; like, Zoom has transcription
that's kind of crappy. But creating an
interface where the text attributes the different individual speakers, is able to
display it in compelling ways, runs in real time, all that kind of stuff, maybe that's difficult.
That's still the best explanation I have, because I'm currently paying quite a bit for human transcription and human captions and annotation. It seems like there's a huge incentive to automate that.
Yeah, it's very confusing.
I think, I mean, I don't know if you've looked at some of the Whisper transcripts, but they're
quite good.
They're good.
And especially in tricky cases.
Yeah, I've seen Whisper's performance on, like, super tricky cases, and it does incredibly
well.
So I don't know.
A podcast is pretty simple.
It's, like, high quality audio, and you're speaking, usually,
pretty clearly.
And so I don't know, I don't know what OpenAI's plans are
either.
But yeah, there's always like fun projects basically.
And Stable Diffusion also is opening up a huge amount
of experimentation.
I would say in the visual realm and generating images
and videos and movies.
Ultimately. Yeah, videos now.
And so that's going to be pretty crazy. That's going to, that's going to almost certainly
work and it's going to be really interesting when the cost of content creation is going
to fall to zero. You used to need a painter for a few months to paint a thing, and now it's
going to be, you speak to your phone and get your video.
So Hollywood will start using it to generate scenes, which completely opens things up.
Yeah, so you can make a movie like Avatar, eventually, for under a million dollars.
Much less, maybe just by talking to your phone.
I mean, I know it sounds kind of crazy.
And then there'll be some voting mechanism. Like, how do you have a, like, would there be
a show on Netflix that's generated completely automatically?
So, essentially, yeah.
And what does it look like also when you can generate it on demand, and
there's infinity of it?
Yeah.
Oh, man.
All the synthetic art.
I mean, it's humbling because we treat ourselves
as special for being able to generate art and ideas
and all that kind of stuff.
If that can be done in an automated way by AI.
Yeah.
I think it's fascinating to me how these,
the predictions of AI and what it's going to look like
and what it's going to be capable of
are completely inverted and wrong.
The sci-fi of the 50s and 60s was just totally not right. They
imagined AI as, like, super-calculating theorem provers, and we're getting things that can talk to you
about emotions. They can do art. It's just, like, weird.
Are you excited about that future? Just AIs, like, hybrid systems, heterogeneous systems
of humans and AIs talking about emotions, Netflix and chill with an AI system,
where the thing you watch is also generated by AI.
I think it's going to be interesting for sure.
And I think I'm cautiously optimistic, but it's not obvious.
Well, the sad thing is, your brain and mine developed in a time before Twitter, before the internet.
So I wonder, people that are born inside of it might have a different experience.
Like I, maybe you can still resist it.
And the people born now will not.
Well, I do feel like humans are extremely malleable.
Yeah.
And you're probably right.
What is the meaning of life, Andrej?
We talked about sort of the universe having a conversation with us humans or with the
systems we create to try to answer for the universe, for the creator of the universe
to notice us, we're trying to create systems that are loud enough to answer back.
I don't know if that's the meaning of life. That's, like, a meaning of life for some people.
The first level answer I would say is, anyone can choose their own meaning of life, because we are
conscious entities, and it's beautiful. Number one. But I do think that a deeper meaning of life,
if someone is interested, is along the lines of, what the hell is all this?
And, like, why?
And if you look into fundamental physics and the quantum field theory and the Standard
Model, they're very complicated.
And there are these, like, 19 free parameters of our universe.
And like, what's going on with all this stuff?
And why is it here?
And can I hack it?
Can I work with it?
Is there a message for me?
Am I supposed to create a message?
And so I think there's some fundamental answers there.
But I think, actually, you can't really make a dent in those without
more time.
And so to me, also, there's a big question around just getting more time, honestly.
Yeah, that's kind of like what I think about quite a bit as well.
So kind of the ultimate, or at least the first,
way to sneak up on the why question is to try to escape the system, the universe. And then
for that, you sort of backtrack and say, okay, that's going to take a very
long time. So the why question boils down, from an engineering perspective, to how do we extend it?
Yeah, I think that's question number one, practically speaking, because
you're not going to calculate the answer to the deeper questions in the time you have.
And that could be extending your own lifetime or extending just the lifetime of human civilization.
For whoever wants to. Many people might not want that. But for people who do want that,
I think it's probably possible.
And I don't know that people fully realize this.
I kind of feel like people think of death as an inevitability.
But at the end of the day, this is a physical system.
Some things go wrong.
It makes sense why things like this happen, evolutionarily speaking.
And there's most certainly interventions that mitigate it. That would be interesting if death is eventually looked at as a fascinating thing that used to happen to humans.
I don't think it's unlikely. I think it's likely.
And it's up to our imagination to try to predict what the world without death looks like.
Yes, it's hard to, I think the values will completely change.
Could be. I don't really buy all these ideas that, oh, without death, there's no meaning,
there's nothingness. I don't intuitively buy all those arguments. I think there's plenty of meaning,
plenty of things to learn. They're interesting, exciting. I want to know, I want to calculate,
I want to improve the condition of all the humans and organisms that are alive. But the way we find meaning
might change. There are a lot of humans, probably including myself, that find meaning in the
finiteness of things, but that doesn't mean that's the only source of meaning. I do think many people
will go with that, which I think is great. I love the idea that people can just choose their own adventure.
Like, you are born as a conscious free entity by default, I like to think.
And you have your inalienable rights for life, liberty, and the pursuit of happiness,
I don't know,
in the nature, the landscape of happiness. And you can choose your own adventure, mostly,
and that's not fully true, but.
I'm still pretty sure I'm an NPC,
but an NPC can't know it's an NPC.
There could be different degrees
and levels of consciousness.
I don't think there's a more beautiful way to end it.
Andrej, you're an incredible person.
I'm really honored you would talk with me.
Everything you've done for the machine learning world,
for the AI world, to just inspire people
to educate millions of people.
It's been great, and I can't wait to see what you do next.
It's been an honor, man.
Thank you so much for talking to me.
Awesome, thank you.
Thanks for listening to this conversation
with Andrej Karpathy.
To support this podcast, please check out our sponsors
in the description. And now, let me leave you with some words from Samuel Karlin. The purpose of
models is not to fit the data, but to sharpen the questions. Thanks for listening and hope to see you next time.