Big Technology Podcast - NVIDIA's AI Moat & Origins — With Bryan Catanzaro
Episode Date: February 28, 2024

Bryan Catanzaro is NVIDIA's VP of applied deep learning research. He joins Big Technology Podcast to discuss why NVIDIA is building more than just chips, examining its software and algorithms that help tech companies build and run AI models. Join us for a conversation about how NVIDIA sees the world, what's led to its success, and what makes it indispensable. In the second half, we discuss how Bryan helped kick off NVIDIA's push into AI, from the very start to where it is today.
Transcript
The Nvidia executive who started its AI push joins us to talk about what makes the company
so indispensable unpacking the secrets to its success. That's coming up right after this.
Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world
and beyond. We have a great conversation for you today. We're going deep inside
Nvidia with Bryan Catanzaro. He's the vice president of applied deep learning research
at NVIDIA, and he's going to be speaking at the company's forthcoming GTC conference, March 18th through the 21st
in San Jose, and his conversation is going to be specifically on practical AI agents that reason
and code at scale. So if you're in the area, you're thinking about heading out, you can mark that down,
but let's get to the conversation. Bryan, welcome to the show. Thanks. I'm glad to be here.
Great to have you. You're kind of the guy that kicked off this whole AI push within NVIDIA.
Well, you know, it took a whole company to transform Nvidia into what it is today.
So there's tens of thousands of people that deserve a lot of credit.
But I was honored to be one of the people that helped Nvidia get started a long time ago.
What is exactly happening within your company?
What do you offer that allows anyone that wants to build AI and then run AI models to do it effectively?
Nvidia is an accelerated computing company.
Wait, wait, so what does that mean, accelerated computing?
I hear you guys talk about it all the time.
We've been talking about it for a long time, and the world still doesn't quite understand it.
So I'm glad to try to explain it.
The idea is that the world faces a lot of computational challenges that can't be solved without faster computers.
But building a computer is not enough in order to actually deliver acceleration.
All of the pieces have to line up and plug together and be fully optimized across the entire stack
so that people have the chance to do things that they just couldn't do otherwise computationally.
And AI is a great example of that.
You know, training and deploying the awesome generative models that are changing the world right now
is extraordinarily computationally intensive.
It's the biggest computational challenge the world has ever faced.
And the reason that Nvidia is providing something useful here is because, for decades, we've taken on this mission of optimizing the entire stack, to build software, algorithms, libraries, frameworks, compilers, systems, networking, chips, the whole thing, and optimize them together for the most important workloads. And that's AI. So it's interesting, because if you think about, like, a traditional chip company, and by the way, I might be totally off base on this, and correct me if I am, the manufacturer will buy the chip, put it inside, let's say, their computer, right? And then build all the software around it. But you guys also build, you not only manufacture the chip, but you build the algorithms and the software that surrounds it, so that enables companies to get the most out of it. So is that a right characterization of what you're doing? Yeah, I think so. I mean, the core thesis that powers Nvidia is that a chip could never be enough. You know, just the same way that a chip couldn't be enough
for my Apple phone, for example.
You know, Apple makes awesome chips,
but the experience of using my phone
is a lot more than the chip.
And, you know, the way that Apple's able
to vertically integrate and optimize their entire system
in order to create an amazing consumer experience,
I think is pretty incredible and super valuable.
What Nvidia is doing is not the same,
but it's related in the sense that we understand
that the value of the technology we create is only understood in composition, in context. It's really about, are we delivering
acceleration, transformative acceleration to the most important computational workloads of our time.
So why couldn't other companies then just go and build their own software to train
using Nvidia chips or other chips? Because it seems to me, like, correct me if I'm wrong here also, but if I'm reliant on Nvidia's software, it's closed source, right? So I'm going to train my model with it, it's kind of difficult to switch to another chip. So why particularly rely on the Nvidia software? Yeah, I mean, we have a lot of open source software as well as some closed
source software. You know, we make the decision about what to open source based on what we think
is going to help the market most. But, I mean, the reason why people work with us is because
we deliver transformational acceleration. We enable people to do things computationally that
they just couldn't do. And we know that it would never be enough to just provide a chip that
said it was really fast and had like a lot of operations per second inside the chip. Because the
gap between, you know, what a particular chip can do and the experience of a scientist or engineer trying to invent the future, that gap is quite enormous. And if any one of the
links in that chain, whether we're talking about systems design or networking or data center
design or the compilers, frameworks, libraries, applications, algorithms, you know, if any of
those links were to fail, the acceleration is lost and the value therefore is lost. And so
Nvidia has a unique way of approaching this problem, co-optimizing the entire stack in order
to deliver that acceleration to the end scientists and researchers that are trying to invent
the future. And that's what differentiates us from other companies. Now, could other companies do that? I mean, absolutely. It's not a secret. In fact,
we've been shouting it from the rooftops for decades that this is what we do and that it's different
from being a chip company. But, you know, we're continuing to test that thesis. You know,
is there value in accelerated computing that's above and beyond what you get just from making
and selling awesome chips? And I think the answer to that is yes. And, you know, I think that's
the reason why we've been so successful. Right. And to me, I think this is just for anyone listening,
like, I spent the whole week making calls on Nvidia trying to figure out, because I was wrong.
I thought that like everything would kind of slow down this year. And I was like speaking to
customers and analysts, tell me exactly what I missed. And this was the thing that I underestimated,
which is that it's not just the chip, but it's the chip, the software and everything that goes along
with it. And that's why the company has been so successful. So let's talk a little bit about, like, what actually goes into training an AI model. So let's say I'm an organization. I come to Nvidia, say I have a bunch of data, or maybe I even don't have data,
and I'm looking to build a large language model. What do I do now? Deep breath. That's a great question.
You know, the first thing that's on my mind is like, you know, what data center are you going to
use to train this model in? And, you know, that's a really important question because it turns out
that the AI market is growing pretty fast
because there's so many institutions
that are training these huge models
and you actually have to have a building
to put these machines in
and they're not small.
And you need to hook it up to power.
So that would be one of my first questions
is like, okay, are you ready to stand this up?
Or are you going to be working with a CSP,
like for example, AWS, you know?
And we love to support our customers
through cloud providers as well.
Okay.
And so then what happens next?
Let's say I'm set up.
You're set up.
Okay, so we will definitely point you to our reference implementations of the various LLMs
and their training setups on these clouds.
We'll show you how to scale it to, you know, many thousands of GPUs efficiently.
We'll tell you what kind of speed you should expect to get while training the model.
And we'll also, you know, discuss things about reliability, you know, how do we make sure that
the job is actually progressing properly and yielding, you know, intelligence. You know, so we
definitely help our customers with things like that. I think also then when the model's trained,
there's a question about how do we deploy it, you know, and we'd love to help people deploy
AI as well. I think Jensen said in the earnings call this week that somewhere around 40% of our
data center GPUs were going for inference, which I think is pretty amazing and definitely
a shift from where things have been a few years ago. And so we're spending a lot more time
helping our customers accelerate the deployment of these models as well, making sure that they
get the best speed so that they can get as much out of the systems they're deploying
these models on as possible.
And what are they using the models for?
Language models are starting to be used in a lot of different parts of a lot of different companies.
Things like question answering, I think, are really important to help people understand
answers to their specific questions, especially relating to private data stores that they need to answer questions with.
You know, I think we're seeing a lot of people use AI in office-type settings.
I don't know if you've interacted with Microsoft co-pilot at all, but it can be really helpful, at least to me, when I'm looking at a summary of a meeting and what the action items are for everybody at the meeting.
You know, and other sorts of office automation tasks.
You know, we're also pushing forward with the use of these models for our own internal work.
And at NVIDIA, we have a project called ChipNeMo that is using language models to help our chip designers and verification efforts be more efficient as we build our own products.
It's called ChipNeMo?
Yeah, NeMo is kind of our most user-friendly, open-source software for training and fine-tuning language models and other kinds of conversational AI. It also has a lot of speech capabilities as well. And, you know, we've been building it for quite a few years, because we believed that conversational AI was really going to transform industry. We wanted to make a platform for companies to build and deploy their own conversational AI. So that's what NeMo is. And so when we talk about ChipNeMo, we're talking about using that for our own chip work.
So how do you use it for your own chip work?
At the moment, a lot of it has to do with improving communication between chip designers.
So you have like a thousand people working on this project and there's a lot of interfaces that need to be described and, you know, people have questions.
They don't know exactly who to talk to.
So basically we're making knowledge bases about our own work that then people can use to answer questions.
And we found that it's kind of like having a more senior engineer that you can talk to all the time, that helps you find the things you need to find in a huge code base.
And so that's the primary thing that we're doing right now is augmenting the engineers on the team with kind of superpowers to understand our own code better and interact with it better.
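The internal Q&A setup Bryan describes, a knowledge base over design docs and code that engineers can query, is broadly what's now called retrieval-augmented generation. Here's a minimal, illustrative sketch of the retrieval half, using a toy bag-of-words similarity in place of a learned embedding; the documents, function names, and scoring here are made-up stand-ins, not NVIDIA's actual ChipNeMo pipeline:

```python
from collections import Counter
import math

# Toy document store: in a real system these would be design docs,
# code comments, and interface specs, embedded with a neural encoder.
DOCS = [
    "The memory controller interface uses a ready/valid handshake.",
    "Verification testbenches live in the tb/ directory.",
    "Clock gating is configured through the power management unit.",
]

def bag_of_words(text):
    """Lowercased token counts -- a stand-in for a learned embedding."""
    cleaned = text.lower().replace(".", "").replace("?", "").replace("/", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs, k=1):
    """Return the k docs most similar to the question; an LLM would
    then answer the question grounded in these snippets."""
    q = bag_of_words(question)
    ranked = sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

print(retrieve("Where are the verification testbenches?", DOCS))
```

The question about testbenches pulls back the testbench document; a language model answering on top of that snippet is the "senior engineer you can talk to all the time" effect.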
Over time, I expect that ChipNeMo is going to do other things as well, you know, improving the quality of our designs. You know, our Hopper GPUs, for example, have a lot of circuits in them that were designed by AI that we built ourselves, that have better speed and power and cost characteristics than we knew how to build with any other tool.
Wait, generative AI programs designed some of the chips?
Yes, Hopper. Hopper is designed with generative AI.
That's insane. It's wild. So let's dream a little bit.
Obviously, we know that knowledge repositories inside companies is something that this stuff
is going to be really good for, maybe a little bit of like consumer agents or consumer chatbots
like ChatGPT. Is this where it ends? Like, where do you see it going? I don't think this is
where it ends. You know, I've been thinking recently about past revolutions in the media space.
You know, we got books, which transformed society, because we could distribute ideas and we could reference the same ideas in a new way, because everybody could read the same book. Audio, you know, as soon as we got audio recordings, that created an entirely new industry, the recorded music industry, which continues to be totally vibrant and important to our culture. Movies, TV, video games. Every time that we come up with a new technology, we find a way to explore ideas as humans and explore our
culture together in a way that helps us solve problems better and also creates a new form of
culture that we interact with. And I think that the most exciting applications for AI are ones
we haven't really even dreamed up yet, in the same way that it would have been hard to imagine how books were going to change the world back when Gutenberg first made
the press. I think AI is going to create a new form of media that is much more interesting,
much more engaging, much more useful. And ultimately, we're going to use that to refine our ideas
and explore them together, the way that we have with other media. It's just going to be much more interesting and useful. When I hear you say media, it leads me to believe that you think that this is going to be more of, like, an agent or a digital friend that people will start interacting with, right? Because that's media. Unless there's something else I'm not thinking about that could take the form of media? Yeah,
something along those lines. I mean, I think, you know, I'm expecting AI is going to change the lives
of all of us here on planet Earth. And when I think about how 8 billion people on this planet live,
you know, most of us aren't reading and writing that much, you know.
But we do love virtual worlds.
People love interacting in video games.
We love interacting with each other.
And I think that the primary way that people are going to interact with AI is going to be in virtual worlds.
Because I think that's going to be the most natural way of interaction and the most useful way.
And I think we're going to perceive that as a new form of media that really touches, you know, all aspects of our work and our play.
You know, it's going to be something new.
So you're a real believer in this metaverse vision, that you'll just kind of end up in a digital world and the people and the scenery will all be AI generated, or maybe mostly AI generated, and you go. People, we have a culture,
it's very important to us. You know, the ideas that we share together and the sort of shared
humanity that we have is more important to us than the content of the things that we're
interacting with. So, for example, AI is probably going to be really awesome at playing soccer.
But do I think that people are going to go to watch robots play soccer? Even if the robots
are kicking around the ball better than humans, I don't think it's as interesting because I don't
think that it is related to us. You know, I think the primary thing that we're interested in is
ourselves. We're trying to understand ourselves and how we relate to other people. I think AI is going
to give us new ways of doing that. We are going to be interacting in virtual worlds. You know,
Nvidia has been a big believer in virtual worlds for, you know, the past 30 years. It's something... You were on gaming before you were on AI, right? That's right. And we've had this initiative called the Omniverse
long before Meta renamed itself because we believe that simulating the world and providing,
you know, virtual agents a place to interact with people is hugely important to the future of
technology. I see these things coming together. I think there's a lot of opportunities to use
virtual worlds to make AI stronger, to teach AI how to understand the real world and act better in the
real world. And then, of course, giving humans the opportunity to interact with AIs in much
more natural and useful ways. I think a lot of that's going to happen in a virtual world.
Okay, I want to talk about reasoning and a few other ways about how companies are going to work with
NVIDIA and what might be coming down the pike. So we're going to do that right after the
break coming up right after this.
Hey everyone, let me tell you about The Hustle Daily Show, a podcast filled with
business, tech news, and original stories to keep you in the loop on what's trending.
More than 2 million professionals read The Hustle's daily email for its irreverent and
informative takes on business and tech news.
Now, they have a daily podcast called The Hustle Daily Show, where their team of writers
break down the biggest business headlines in 15 minutes or less and explain why you
should care about them.
So, search for The Hustle Daily Show in your favorite podcast app, like the one you're using right now.
And we're back here on Big Technology Podcast with Bryan Catanzaro.
He's the vice president of Applied Deep Learning Research at NVIDIA.
What made you think, okay, AI is going to be big enough that I should get to Jensen,
CEO of NVIDIA, and say, we need to really work hard to make this part of our core offering?
I had been spending my research career at Berkeley as a Ph.D. student on the future of computing.
And we knew that computing was going to have to change back in 2005 or so.
It was obvious that computers would have to be different.
The standard way of making computers wasn't working anymore.
We would have to be more specific.
We'd have to be more parallel.
And so I had been spending my time as a grad student thinking about what kinds of applications
could take advantage of the computers that will be possible to build,
but then are going to provide enormous amounts of value to humanity.
And at the time, AI was not a very big field,
and it wasn't actually super popular to work in it.
But when I was thinking about it, I felt like from first principles,
it made sense to me that this was something that had the potential to really change the world.
And, you know, Nvidia's approach to solving this, I think, was also, you know, fairly careful and iterative. You know, so I published my first
paper in 2008 on machine learning on the GPU. And Nvidia really jumped in full steam ahead
for the whole company to become an AI company in 2013. So it took about five years of
sort of testing that thesis. Like, is AI actually going to be something that could really change
the world? And we started getting some early indicators of success. One of those was, of course,
the ImageNet competition in 2012, which really shocked the world with the quality of results
and wouldn't have been possible without accelerated computing. The results that they got
were so incredible because they built a very fast system for training neural nets and training... It wasn't generative, right? That was just identifying photos. That's correct. Yeah, it wasn't generative at the time. But, you know, the idea of generative AI is fairly old. I mean, when I was a grad student, generative AI was a thing that we talked about all the time. It's just that we weren't using neural nets for it. We were using other models, like graphical models. These are other mathematical approaches that are a little bit more clever, but don't scale as well. And so this is another part of the thesis
that I had is that, you know, the thing that's really going to help AI succeed is scale.
You know, if we can apply huge data sets and huge amounts of compute to AI, then the results
are going to get much better. And this is also controversial back then. And even today,
some people really don't like this idea because they would like AI progress to be
mostly held back by our smarts, like our mathematical skills in like coming up with more clever
models to describe our data in the world. But it does seem these days that there's a lot of
evidence that the most important thing is having really good data sets to learn from and then
enormous computational scale. And so that was my thesis. And I was advocating for that at
NVIDIA. I wrote this little prototype of a library for training neural nets on the GPU,
which then became cuDNN, which was our very first library for AI on the GPU. And, you know, the process of getting the company to rally around that
and build that as a product and ship it, you know, it took some time. But because there were
these, you know, early indicators of success that there was a lot of demand picking up, even back
then, it made sense for the company to really pay attention. And then, you know, Jensen himself is
such a visionary. I remember when he first started interacting with me about this back in 2012,
I felt like he was just so hungry to learn. So I felt like I gave him all the things that I
learned from my Ph.D. in the course of like an hour about like how AI could change
Nvidia's business and what Nvidia could potentially build. And my ambitions for what that
meant were like a thousand times smaller than Jensen's were. You know, he took it immediately and
then elaborated on it and thought about where is this going.
One of the things he first said back in 2012 was this is an entirely new way of writing
software rather than having humans enumerate all the different cases that software needs
to understand.
We're going to have models that learn from our data, how to solve problems.
And these days, that sounds like the truth, right?
Like we see that happening every day when we interact with these models.
But, you know, 12 years ago, that was a pretty bold thing to say.
And I was a little bit nervous about it because, you know, the history of AI over the past, you know, 70 years had been one of overpromising and under delivering in a lot of ways, which then caused a lot of booms and busts.
Right.
Yeah, a lot of AI winters.
Everyone thinks it can do something, and then it just totally dries up until it starts to prove itself again.
And so when Jensen, like, immediately glommed onto this and started, like, thinking about what it could mean, I wanted to slow him down a little bit and be like,
Jensen, like, this is a big, huge idea, but like, I'm not sure if it's going to happen now.
It might be 30 years from now, you know.
But it turns out that Jensen was right about this.
This was the right time to apply enormous data and enormous compute to AI and get these results.
Right, but 2012 wasn't, I mean, it took another 10 years, 11 years, really, for the boom to come.
So what did it feel like?
Go ahead.
Oh, I was going to say, I think Nvidia is really good at decade-long technology development.
I've seen that happen at Nvidia many times.
You know, ray tracing, I was in meetings in 2008 with Jensen on ray tracing, and we launched our
first ray tracing GPU in 2018.
You know, it took 10 years of continuous development and research in order to make ray-traced
virtual worlds a reality.
And CUDA itself, you know, the project that led to CUDA, that started in the early 2000s. CUDA was released as a beta in 2006.
Right, and this is the software that all AI programming is done with on the H100s, pretty much, and A100s.
Right.
Yeah, CUDA is our framework for programming the GPU and making it do stuff that's interesting.
And, you know, that project was crazy for a long time.
You know, Wall Street hated it. It subtracted value from our earnings reports.
Like they looked at the costs of our products and they're like,
these products are too expensive.
Your margins are too low.
You know,
back then the margins were quite low.
And that's because, you know,
there weren't the applications and the ecosystem that were using CUDA yet in order
for us to,
you know,
build a strong business around it.
But Nvidia continued investing in CUDA, in the libraries,
the software,
the compilers, the frameworks, and of course, also the chips for 10 years, you know,
actually maybe more than 10 years, before all of a sudden CUDA became an overnight success.
You know, it's like 10 years of hard work that everyone ignored and Wall Street criticized
Nvidia for mercilessly. Why are you wasting your time on this? You know, everybody knows
the GPU is just for gamers. Why are you trying to make the GPU do something else? And, you know,
we did it anyway. And, you know, that's one of the things I love about this company. I think it's
one of the reasons why we're successful at the accelerated computing mission is that when we
decide to do something, we do it out of our convictions about how technology will unfold,
and we base those convictions on a speed of light analysis about what's actually possible to
try to keep ourselves honest. And then, you know, once we have that conviction, we're able to
follow through. What did you see in those years when everybody gave up on this? I mean, obviously
there were big advances that were made in things like machine learning, right?
computer vision and natural language processing. And that's where we had Facebook really take the lead as the public spokesperson for this stuff, talking about image recognition. And they even built this fake generative chatbot called M that I had access to, that basically would be, like, it's supposed to be a large language model. We didn't even know it was going to be a large language model, it was pre-transformer, right? But, like, you would talk to this bot and it would talk back. And they were trying to figure out, like, what people were interested in, if they were going to build a bot, and they had this whole bot platform that came out. But overall, like, everyone's telling you, yeah, this is not worth building.
I mean, it's maybe just one or two companies that are using it. So why did you still think that,
I mean, I guess it's hard to predict what happened next, but why did you believe that that was
going to happen? Nvidia really thinks about these problems from first principles. You know,
we know that the way that computers are built is changing. We know that because, you know, Moore's law is slowing down. That requires more specialization. We know that there's a lot of
opportunities to really provide transformational speedups to important workloads if we specialize
the systems and the software for them. And we felt like, what is more important than this?
You know, what's more important than intelligence? And does the world need more intelligence?
Absolutely. The world needs enormous amounts of intelligence, like the problems that we face
as a planet, I think we're going to need a lot of intelligence to work through them.
And so for us, it was, I think, just kind of an obvious thing to do.
We had a lot of conviction.
We understood the technology.
We also saw early indicators of success from a lot of different directions, you know,
a lot of different companies, a lot of research institutions that were talking to us and saying,
hey, we have these goals to, like, train this huge language model, like, on enormous amounts of text, but, you know, the current systems are just too slow.
And, you know, there's this idea, you know, back 10 years ago, there was this idea that unsupervised
learning was going to change the world, but nobody knew how.
You know, unsupervised learning, meaning that rather than having humans go in and label every picture, is it a cat, is it a dog, that's supervised learning. We're just going to show the model all the pictures that we can find, and the model is going to learn, itself, something about pictures that then we can use to solve problems.
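Bryan's contrast can be sketched in toy form: supervised learning consumes human-provided (example, label) pairs, while unsupervised (today usually called self-supervised) learning mines structure, here next-word statistics, from raw data alone. Everything below is an illustrative toy, not a real training setup:

```python
from collections import defaultdict

# Supervised: explicit (example, label) pairs provided by humans.
labeled = [("whiskers paw meow", "cat"), ("bone bark fetch", "dog")]

def supervised_predict(text):
    """Classify by counting word overlap with each labeled example."""
    scores = {label: len(set(text.split()) & set(ex.split()))
              for ex, label in labeled}
    return max(scores, key=scores.get)

# Unsupervised / self-supervised: no labels -- the training signal is
# the raw data itself. A bigram model learns which word follows which.
corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = defaultdict(lambda: defaultdict(int))
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def next_word(word):
    """Predict the most frequent follower seen in the raw corpus."""
    followers = bigrams[word]
    return max(followers, key=followers.get) if followers else None

print(supervised_predict("meow meow"))  # -> "cat"
print(next_word("the"))                 # "cat" follows "the" most often
```

Scaling the second idea up, next-token prediction over enormous text corpora with neural networks instead of counts, is essentially the recipe behind the large language models discussed later in the episode.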
You know, that idea has been around for decades, but actually turning it into something that
worked, you know, that's only happened over the past 10 years. And I think it's only happened
because of, you know, the increases in scale that we've been able to bring to the problem. So
during that 10 years, you know, we saw continuous improvement, even if the rest of the world
didn't see it. I think one of the things about technology, when it's growing
on an exponential curve is that the beginning of it feels like nothing's happening outside,
you know? So exponential curves, the hockey stick kind of curve, it looks like nothing,
nothing, nothing, all of a sudden, huge success, you know, that's kind of what the exponential
curve looks like. But the interesting thing about an exponential curve is that the rate of
progress is constant, you know, it's always getting, you know, let's say 10% better.
Like every year, it's just 10% better, right? And so you can tell, you can see, like, wow, this technology, it's continuing to improve, even if it's not reached the point where it's useful for the world yet. We just had this confidence that it was.
So people had these, basically, large, you know, swaths of text, and, like, we want to build something like a large language model, but it just wasn't available yet. Did you guys notice when, in 2017, the paper Attention Is All You Need comes out from Google, which is, like, the basis for all this? So what was the reaction internally? Because even within Google, I'll say this, I've spoken with people within Google, it was not a yawn, but it was, like, okay, not a holy-crap moment. But I'm curious what happened within Nvidia, because it's sort of, you know, your bread and butter. Yeah, absolutely.
And that paper caught our attention immediately because of the implications for our entire business.
So, you know, I told you earlier, accelerated computing is not about the chips. And this is a great
example of that. Like, if we built a system that is for, let's say, ResNet-50, which in 2017 was the most widely, you know, talked-about kind of neural network, these image classification networks. If we built systems to accelerate that, that would be a really different kind of system
than a system designed to accelerate transformers. And so we have to ask this question, you know,
what's going to be the future? What's going to drive demand? How are we going to build the right
technology to accelerate the things that will matter a few years from now. And so of course,
we're always asking ourselves that question, you know, is there something coming along that's
going to change the way that people build AI? And if there is, then we need to think about what are
the implications for the systems that we're building. So yeah, we saw that paper. I have to say that
the title is maybe a little bit of a pill to swallow, because, you know, "attention is all you need," it's like, but is it? You know, it kind of elicits that reaction from a lot of people. But the thing that was really attractive about transformers to us was that we knew that
they had really favorable computational properties. And again, going back to this thesis that
the model is a little bit, the model is less important than the data and the compute that goes
into training the model. If you have a model that has really excellent compute properties that
allows you to scale really efficiently to, you know, many thousands of GPUs, the kinds of
results you can get from that are pretty spectacular. So we saw that early on. That's what the transformer
model did. That's what this paper, Attention Is All You Need, sort of architected. Absolutely. And
so we saw that it had the potential to do that. And so we were very curious about it. And, you know,
In my team, we had our own language models team back in 2017, and at the time we were using recurrent neural networks, which were the standard way of doing things before the transformer paper came out.
And so I asked an intern, hey, can you take a look at doing language modeling with transformers?
I'm hearing good things about it.
It would be great for us to have an independent perspective on whether this is a good idea.
And he came back, you know, a month or two later with just really astonishing results.
You know, there was no question that it was better than the models that we were using
and also that it was more scalable, so we were able to train bigger and smarter models
because of that scalability.
And so that was really important for us.
And then, you know, the whole company kind of paid attention to the way that Transformers
were changing AI and then started, you know, building systems to help make that even better.
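To make the "favorable computational properties" concrete: the heart of the transformer is scaled dot-product attention, which processes every sequence position with a few big matrix multiplies instead of stepping token by token like an RNN. A minimal NumPy sketch (not NVIDIA's implementation, just an illustrative toy):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    All sequence positions are handled at once via dense matrix
    multiplies, which is why transformers map so well onto GPUs,
    unlike RNNs that must walk the sequence one step at a time.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) similarity scores
    # Row-wise softmax, with the max subtracted for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
Q = rng.standard_normal((seq_len, d_model))
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per position, computed in parallel
```

The entire forward pass here is dense linear algebra, which is exactly the workload accelerators are built for; that is the "excellent compute properties" point in a nutshell.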
Should Google have open sourced it?
I mean, they haven't gotten the most value out of it.
You know, others have gotten more value out of that paper.
I can't really speculate on Google's business or, you know, whether they should or shouldn't
have done things.
I think if Google had not open sourced that, or had not published that paper, but if we started seeing, like, incredible language modeling results, we would have figured out some sort of model that had good, scalable properties that could help in this space.
And, you know, there's not just one transformer model.
There's lots of variation.
Yeah.
I think ultimately the community would have figured something out because it's so important, you know.
Yep.
I think Google deserves a ton of credit for doing that work first and for publishing that paper.
So you're basically building, you know, for this world of AI. You see the transformer model come out, you shift, you incorporate it, you start to see the GPTs from OpenAI.
Is that the next big moment on this journey where you're like, oh, this could be it? Because it was interesting, speaking again about, like, what people saw from the outside: we all knew that, you know, OpenAI was doing text generation, but it didn't really click for most people until it became a chatbot.
So what did it look like for you when it was just like, you know, you've been watching this the whole time.
What did it look like on your end?
Yeah, well, you know, I'd been watching OpenAI's work in language modeling since before GPT.
I don't know if you recall they had this sentiment neuron project, which I thought was really cool because it was doing unsupervised modeling of text.
And then they were able to find that just by showing the model a lot of text that all of a sudden the model had started to understand high level concepts about text, like for example, what kind of emotion is being expressed inside of this text.
And that was a really interesting thing because, like I said, unsupervised learning, the idea had been around for a long time that we would make a lot more progress as a field if we were able to do unsupervised learning.
But actually figuring out how to practically get some value out of just showing a lot of data to a model, it wasn't very obvious to everybody.
And so when I saw that unsupervised sentiment neuron project from OpenAI, I thought that was really interesting.
And they followed that up with the first GPT paper, which kind of applied transformers to this.
And in the process, you know, made a much better sort of text analytics model.
The first GPT, you know, was really kind of using a generative model more for classification than for generation.
It was more like, you know, can we use a generative pretrained model to understand text rather than can we use it to create text?
Because at the time, creating text seemed too hard.
And then, of course, you know, GPT-2 came out and had really astonishing text generation capabilities.
And not just that, but also already had started to learn things about the world that were very difficult to teach any AI system before.
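The training objective behind these models, predicting the next token, can be sketched in miniature with a toy bigram model. This is deliberately a counting stand-in for a neural network (the corpus and names here are made up for illustration), but it is the same objective at a vastly smaller scale:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "an enormous amount of text".
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count bigrams: for each word, how often does each next word follow it?
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most likely next token after `word` under the bigram counts."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' -- it follows 'the' twice, vs. once for 'mat' or 'fish'
```

Scale that same "predict what comes next" objective up to billions of parameters and web-scale text, and the statistics the model absorbs start to encode facts about the world, which is what made GPT-2's recall so striking.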
I remember they had this story about unicorns in South America being studied by some university professor, and the model could remember that, like, in South America, people speak Spanish.
And, you know, there's a country in South America called Peru.
and, like, there's mountains in that country. And, you know, it's like, wow, the amount of facts that this model is able to recall after only being trained on an enormous amount of text is really shocking.
Right. What do you think when people say these models just predict the next word, don't get too excited about it? I mean, what you're describing, it seems like something more.
Yeah, I mean, it's always possible to get very reductive with systems. I mean, you could say that I'm just meat, right? I'm just a monkey made of meat, and, like, you know, everything that's happening in my head is also just energy minimization. Like, there's chemistry happening in my head.
I had someone who used to tell me that, yeah, love is not love, it's just a chemical. I think you're totally right, it's just a chemical. Yes, it's a chemical, but also there's something more here. So you're saying the same with LLMs?
Yeah, I mean,
so the fact that chemistry is involved in our own consciousness doesn't make our consciousness
less interesting to me. The fact that like, you know, neural networks are trained to predict the
next word and, you know, that may not be, like, the ultimate way of training them. You know,
we're learning how to do this, right? So maybe we'll come up with a better way tomorrow. I'm not
attached to that particular way, but I also don't think that understanding a little bit of how
something works takes away from the magic. ChatGPT, when that comes out, I mean, obviously you had already been pretty impressed by GPT-1 and GPT-2. We're already at three, three and a half, right, by the time ChatGPT comes out in November 2022, and then this stuff explodes. Your reaction? Yeah.
Like, what was it like sitting where you were? It was just extraordinary. I mean, the amount of change that ChatGPT brought to the world, incredible. I thought it was kind of cheeky of OpenAI to release it at the same time as the NeurIPS conference, because, um, you know, usually the AI world is entirely focused on, like, the cool papers that are coming out at the conference. But instead, the entire world was focused on this chatbot, you know, that was doing things that, you know, no one had ever seen a chatbot do before.
And, you know, to me, that was a statement that we were entering a new era of AI where applied research starts to dominate. You know, ChatGPT didn't come out with a fully fledged academic paper that described exactly what they did to make it so awesome.
But because the results were so strong, it kind of dominated the academic discussion.
And I felt like that was really interesting in terms of a watershed moment
for sort of the maturity of the AI industry, you know, that it was now possible to create
systems that would solve problems in ways that we'd never seen before if we applied
some really good engineering and applied research to it.
And so that, you know, definitely, definitely changed the world.
And since then, you know, my world has been just continuously on fire.
You know, every day I open my email.
There's a new awesome result.
It's really exciting times.
And working at Nvidia, one of my favorite things about working at Nvidia is that we get to collaborate with people from all sorts of companies and institutions.
And we get to sort of rejoice in the good work that's happening around the industry because at the end of the day, you know, it's really exciting to see AI flourish.
That's our mission: to make AI flourish everywhere.
And so when I open my email and see all these great results, it always makes me happy.
Do you think that we're going to get to artificial intelligence that's on par with or greater than human-level intelligence?
I don't really like that question because I don't know what human intelligence really is.
For example, I think that Cardi B is extremely intelligent.
She is able to capture the attention of hundreds of millions of people by doing things that,
I'm not exactly sure why they're so interesting, but they totally are, right?
There's a lot of people that would love to do that, but don't have the kind of intelligence that she does in order to make that work.
What is Cardi B's SAT score?
I have no idea.
It's not very interesting.
Well, yeah, there's book smarts and emotional smarts and other forms of brilliance.
There's eight billion forms of brilliance on this planet.
This is the thing, though.
These models are getting good at everything, right?
They're making music.
They're writing books.
They're making videos.
So there's a world there where you could say it can approximate... there's a chance. I mean, getting just to the baseline of human intelligence is one thing, but there's a chance that this stuff can maybe even exceed some of our most talented people across all spectrums.
Well, you know, AI has been smarter than humans at many things for a long time. I mean, when I was in high school, Deep Blue beat Garry Kasparov at chess, right? Did that mean that humans stopped playing chess? No, actually, it changed
the way humans played chess. It made humans play chess better because humans had new tools to learn.
They had AI to help them learn how to play chess. And the reason we play chess isn't to win,
you know, we play chess because it's part of our culture, because it's interesting, because we like
the challenge, because we like the interaction. You know, what we're doing as humans is exploring, you know, what does it mean to exist? I don't think that AI challenges that.
You know, I've been in a lot of rooms with a lot of smart people. I don't think that it's necessary for me to be the smartest person in order to have value or to, you know, be interested or engaged. No way. I never want to be the smartest person in a room, because I'm not learning
that way. Right. Exactly. So I'm not threatened by this. So my thesis is, AI has always been smarter than us at some things. The number of things that it's getting better than us at is getting larger, but that doesn't threaten me. I'm not worried about being obsolete, in the same way that I don't think an oak tree is obsolete. What does it mean for a tree to be obsolete? Like, how do you measure the worth of a tree? Are we going to just talk about how tall it is, or, you know, how many leaves it has, and count them and say, well, this tree is worth more because it has more leaves and it made that other one obsolete? It's just not a very interesting question.
Well, it's so interesting that you're going straight to obsolescence, where, like, some might say, if AI equals human intelligence, it's not a bad thing. Like, maybe there's actually... yeah.
Yeah, it becomes a tool for us. I think it is a tool for us.
It is interesting, and the way that it's portrayed often takes these conversations to the obsolescence part.
I don't really fear that either.
I don't either.
You know, one person whose thoughts on this I really love is Jürgen Schmidhuber, and he has said multiple times that a truly intelligent AI is going to be, first of all, not very interested in living on the surface of planet Earth, because it can beam itself over the radio at the speed of light anywhere.
And it can live underground. In fact, underground and other places are better. There's more resources outside of the crust of planet Earth, where we live.
And so I think that, you know, we don't have a lot to fear.
I think the scariest thing for me is, you know, are we going to, you know, not figure out how to use this technology? Because I think we desperately need it.
I think our world desperately needs more intelligence.
And so that's our mission.
Yeah, I've been emailing with Jürgen, trying to get him on the show. So you're reminding me, A, to follow up here. I don't know, maybe you can help me put in a good word.
Bryan, great speaking with you.
Thanks so much for joining.
Great, great to have you on the show.
Yeah, thank you, Alex.
All right, everybody, thanks so much for listening.
And we'll see you next time on Big Technology Podcast.