a16z Podcast - GPT-3: What's Hype, What's Real on the Latest in AI
Episode Date: July 30, 2020. In this episode -- cross posted from our 16 Minutes show feed -- we cover all the buzz around GPT-3, the pre-trained machine learning model from OpenAI that’s optimized to do a variety of natural-language processing tasks. It’s a commercial product, built on research; so what does this mean for both startups AND incumbents… and the future of “AI as a service”? And given that we’re seeing all kinds of (cherry-picked!) examples of output from OpenAI’s beta API being shared — how do we know how good it really is or isn’t? How do we know the difference between “looks like” a toy and “is” a toy when it comes to new innovations? And where are we, really, in terms of natural language processing and progress towards artificial general intelligence? Is it intelligent, does that matter, and how do we know (if not with a Turing Test)? Finally, what are the broader questions, considerations, and implications for jobs and more? Frank Chen explains what “it” actually is and isn’t and more in conversation with host Sonal Chokshi. The two help tease apart what’s hype/ what’s real here… as is the theme of 16 Minutes.
Transcript
Hi, everyone. Welcome to this week's episode of 16 Minutes. I'm Sonal, your host, and this is our show where we talk about the headlines, what's in the news, and where we are on the long arc of tech trends. We're back from our holiday break, and so this week we're covering all the recent and ongoing buzz around the topic of GPT3, the natural language processing-based text predictor from the San Francisco research and development company OpenAI. They actually released their paper on GPT3 in late May, but only released their broader commercial
API a couple of weeks ago, so we're seeing a lot of excitement and activity around that
in particular, although it's all being called GPT3. So we're going to do one of our
explainer episodes. It's a 2x explainer episode going into what it really is, how it works, why it
matters, and broader implications and questions while teasing apart what's hype, what's real,
as is the premise of this show. But before I introduce our expert, let me just quickly summarize
some of the highlights. So while GPT3 is technically a text predictor, that actually undersells
what's possible, because of course, words and software are simply the encoding of human thought,
to borrow a phrase from Chris Dixon, which means a lot more things are possible. So we're seeing,
and note, these are all cherry-picked examples, believable forum posts, comments, press releases,
poetry, screenplays, articles. Someone even wrote an entire article, headlined 'OpenAI's GPT-3 may be the biggest thing since Bitcoin,' and then revealed midway that he didn't actually write the article,
but that GPT3 did. We're also seeing
strategy documents, like for business CEOs, and advice written entirely in GPT3, and not just
words, but we're seeing people use words to write code for designing websites and
other designs. Someone even built a Figma plugin. Again, all of it showing the transmutability
of thoughts to words to code, to design, and so on. And then someone made a search engine that can
return answers and URLs in response to, quote, ask me anything, which, as anyone who's been in the
NLP space knows (I was at PARC when we spun off Powerset back in the day), has always
been sort of a holy grail of question answering, which you know all about, too, having worked
in this world, Frank. Now, let me introduce our expert in this episode: Frank Chen has
written a lot about AI, including a primer on AI deep learning and machine learning, a pulse check
on AI, what's working, what's not, a microsite with resources for how to get started practically
and do something with your own product and your own company, and then reflecting on jobs,
and humanity and AI working together.
You can find all of that on our website.
Frank, to start things off,
what's your favorite example of GPT3 so far?
Mine is founding principles for a religion written in GPT3.
I'd love to hear your favorite
and also your quick take on why the excitement
to start us off before we dig in a bit deeper.
My favorite out of the whole thing is
it's doing arithmetic.
So if you ask it, what's 23 plus 67?
Like just arbitrary two-digit arithmetic.
It's doing it.
This is a natural language processing model.
And so basically, it got trained by feeding it lots and lots of text.
And out of that, it's figuring out, we think, how to do arithmetic, which is very, very surprising.
Because you don't think that, like, exists in texts.
The excitement potentially is promising signs of, you know, progress towards general artificial intelligence.
So today, if you want to do very highly accurate, uh,
natural language processing, you build a bespoke model. You have your own custom architecture.
You feed it a ton of data. What GPT3 shows is that they train this model once, and then they
throw a whole bunch of natural language processing tasks at it, like fill in the blank or inference
or translation. And without retraining it at all, they're getting really good results compared to
finely tuned models. Before we even go into teasing apart what's hype, what's real, let's first
talk about the it. What is GPT3? So we have two things. One, we have a machine learning model.
GPT is actually an acronym. It stands for generative, pre-trained transformer. We'll go through all
those in a sec. But thing one is we have a pre-trained machine learning model that's optimized
to do a wide variety of natural language processing tasks like reading a Wikipedia article
and answering questions from it or guessing what the ending of a story should be or so on and
so on. So we have a machine learning model. The thing that people are playing with is an API
that allows developers to essentially ask questions of that model. So instead of giving you the
model and you program it to do what you want, they're giving you selective access via the
API. One of the reasons they're doing this is that most people don't have the compute infrastructure
to even train the model. There's been estimates that if you wanted to train the model from
scratch, it would cost something like $5 to $10 million of cloud compute time. That's a big,
big model. And so they don't give out the model. And then two, the controversy around this
thing when they released the first version was they were worried that if they just gave the raw
model out, people would do nefarious things with it, like generate fake news articles that you
would just like saturate bomb the web. And so they're like, look, we want to be responsible with
this thing. And so we'll gate access via API. So then we know exactly who's using it. And then
the API can be a bit of a throttle on what it can and can't do. Right. Well, all helping them
learn. And just as a reminder, APIs are application programming interfaces. We've talked a lot
about them on the podcast, and if people want to learn more, they can go to a16z.com slash API to read
all our resources and explainers. There's so much we have on this whole topic. But the key
underlying idea, and this goes to your point about the cost of what it would take if you were
trying to build this from scratch, is APIs give developers and other businesses superpowers
because they lower the barrier to entry in this case for anyone being able to use AI who
doesn't necessarily have a whole in-house research team, et cetera. And so that's one of the
really neat things about the API. But I do want to correct one misconception that folks out there
may not be aware of when it comes to GPT3. What they're describing as GPT3, they're actually playing
with OpenAI's API, which is not just GPT3. Obviously, some of the technical achievements of GPT3 are
in the API, of course, but it's a combination of other things. It's like a set of technologies
that they've released, and it's their first commercial product, in fact. So that's just to get
people a little context on what the it is and isn't there. Let's go ahead and go a level deeper
into explaining what it is.
In their paper, they describe it simply
as an auto-regressive language model.
Can you share what it is
and kind of the category this fits in?
Yeah, so the broad category of things it fits into,
it is a neural network or a deep neural network.
And architectures basically talk about
the shape of those networks.
At the highest level, visualize it as: something
comes in on the left, and then I want something
to shoot out on the right side.
And in between is a bunch of nodes
that are connected to each other.
and the way in which those nodes are connected to each other,
and then the connection weights, that's essentially the neural network.
GPT3 is one of those things.
Technically, it's called a transformer architecture.
This is an architecture for neural networks that Google introduced a few years ago,
and it's different than a convolutional neural network,
which is great for images.
It's different than a recurrent neural network,
which is good for simple language processing.
The way the nodes are connected to each other
results in it being able to do essentially computations on large sentences filled with different words
and doing it concurrently instead of sequentially. So RNNs, which were the former state of the art
on natural language processing, they're very sequential. So they'll kind of go through a sentence
a word at a time. Recurrent, right? Exactly. These transformer networks can basically sort of
consider the entire sentence in context while it's doing its computation.
One of the things that you classically have to do with natural language processing is you have to disambiguate words.
I went to the bank.
That could mean I went to go withdraw some money, or it could mean I went right up to the edge of the river.
Because we have ambiguity in these words.
The natural language processing system needs to figure out, well, which sense of bank did you mean?
And you need to know all the other words around that sentence in order to disambiguate it.
And so these transformers consider large chunks of text in trying to make that decision.
all at once instead of sequentially.
So that's what the transformer architecture does.
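(For readers who want to see the idea in code: below is a minimal sketch of the scaled dot-product self-attention at the heart of a transformer layer, written in plain NumPy. The toy sentence length, dimensions, and random weights are made up for illustration; this is not OpenAI's implementation.)

```python
# Minimal sketch of scaled dot-product self-attention (the core of a
# transformer layer). Toy sizes and random weights are illustrative only.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings for one sentence."""
    q = x @ w_q          # queries
    k = x @ w_k          # keys
    v = x @ w_v          # values
    d_k = k.shape[-1]
    # Every token scores every other token at once (no sequential scan,
    # unlike an RNN), which is how "bank" can see "river" or "money".
    scores = q @ k.T / np.sqrt(d_k)                   # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ v   # each token's output mixes in the whole sentence

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16       # e.g. "I went to the river bank"
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)   # (6, 16): one contextualized vector per token
```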
And then what OpenAI has been doing is basically training this type of neural network
with the transformer architecture on larger and larger data sets.
Conceptually, think of it as you have it read Wikipedia, and think of that as Generation 1.
Generation 2 is I'm going to have it read Wikipedia and all of the open source textbooks that I can find.
This generation, they trained it on what's called Common Crawl.
It's kind of the same thing that Google uses to search and index the internet.
There's an open source version of that.
Think of it as robots go onto every web page.
They gather the text.
And now we're using that as the training set for GPT3.
Yeah, something like half a trillion words, I believe.
Yeah, it's a crazy number of words.
And then this thing has two orders of magnitude more parameters than the previous attempts,
something like 175 billion parameters.
For the purposes of this conversation, think of parameters as
a way of measuring the complexity of a neural network.
Right. GPT2 had 1.5 billion.
And in between GPT2 and 3,
Microsoft did one that was 17 billion, right?
So, like, there is a bit of an arms race here going on,
which is, like, how big are your neural networks?
What does it mean?
Because the paper is called 'Language Models are Few-Shot Learners.'
And I remember this movement in one-shot learning
where you can learn on very few examples.
But honestly, what you just described to me
sounded like almost a trillion examples
when you think about what it's ingesting as an input.
So can you actually explain what few-shot even means in this context?
Yeah. So first, they trained this model on the Internet.
Basically, what came in as input on the left side
was reams and reams and reams of text,
all the text they could get their hands on,
and they cleaned it a little.
And so this is very traditional deep learning.
It is not itself a zero-shot
or few-shot approach. It's deep learned, which means I have incredible amounts of input text.
What they mean in the context of this paper around zero-shot and few-shot is the model can
perform a variety of natural language processing tasks. So a good example of it is analogies.
King is to queen as water is to what, right? In the context of this system, what you can do is
you could give it an example of that, and they call that one shot, which is I'm going to give you an
example of an analogy that's completely filled out, and then I want you to fill out more analogies.
Another task would be pick the right ending of a story, and I will give you one example with the
correct answer. So I'm just going to give it to you once. Now, typically what happens when you do
traditional neural network learning, you take an example, you give it to the system, and you tell the
system the right answer. The system uses that right answer to basically readjust the neural net. It's
called back propagation. And the theory is that as it adjusts the weights inside the neural
network, it will get that answer more correct the next time it sees it. And so everything up
into this point has basically been, if I give you enough examples, I'm going to be able to tell
whether that picture has a hot dog in it or not. I will be able to generalize the features of a
hot dog and I will basically deduce hot dogness if you just give me enough pictures and you tell
me hot dog or not. What's going on here is
they train this model once, and then they give it one example. That example doesn't adjust the
weights of the model. It really just primes the system to basically prepare it to answer this type of
question. So you basically tell it, look, I want you to work on fill in the blank. And I'm going to
give you one or a few examples, few shot, of this. And then we'll go from there. But those examples
that you give it don't adjust the weights of the model. It's one model to rule them all. And
this is kind of how humans learn.
They don't need to see a thousand, 10,000, 100,000 examples of hot dogs
before they can start reliably telling whether it is hot dog or not.
It's like how children learn language.
Yeah, exactly.
Babies, before they can say cat and dog, can recognize the difference between cats and dogs,
and they didn't see a million of them, right?
In fact, they can't say the words dog and cat yet.
And so maybe something like this is going on in the brain,
which is you have this sort of general processor,
and then it instantly knows how to adapt itself to solve a lot of different problems,
including problems it had never seen before.
And so I'm going to go back to my favorite example of what GPT3 was used for.
Like, how in the world did it deduce the rules for two-digit arithmetic by reading a lot of stuff?
And so maybe this is the beginnings of a general intelligence that can rapidly adapt itself.
Now, look, I don't want to get ahead of myself.
it falls apart on four-digit arithmetic,
and so it's not generally smart yet.
But the fact that it got all of the two-digit addition
and subtraction problems right by reading text.
Like, that's crazy to me.
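(To make the priming idea concrete, here is roughly what a few-shot prompt might look like, using the arithmetic example from the conversation. The exact formatting is an illustrative assumption, not a prescribed template.)

```python
# Illustrative few-shot prompt: the worked examples don't update any weights,
# they just set up the pattern the model is asked to continue.
few_shot_prompt = """\
Q: What is 23 plus 67?
A: 90

Q: What is 41 plus 52?
A: 93

Q: What is 38 plus 14?
A:"""
# A completion-style model would be expected to continue with "52".
```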
The general takeaway is that it does some complicated things really well
and some really easy things really badly,
and this is actually true of most AI.
The researchers have a huge section on limitations
where, quote, GPT-3 samples can lose coherence
over sufficiently long passages,
contradict themselves,
and occasionally contain non sequitur sentences or paragraphs.
Now, of course, as an editor, that made me laugh because that's also true of human writing.
So I was like, okay, this is also true of the writing I've seen and edited.
So I don't know who's talking here.
Help me tease apart where we really are in this long arc.
I'm having a hard time knowing what's real, what's not.
Like, help me kind of understand what is this thing, really at this moment in time.
So we have the most sophisticated natural language processing pre-trained model of its kind.
The natural language processing community has basically divided the problem of understanding language into dozens and dozens of sub-tasks.
And task after task after task, GPT3 goes up against the state-of-the-art, the best-performing system.
And basically what the paper does is lay out, okay, here's where GPT-3 is approaching state-of-the-art, here's where it's far away from state-of-the-art. And that's basically all we know: compared to state-of-the-art
techniques for solving that particular natural language processing task,
how does it perform? We're really in the research domain. Right. So if you were to ask me,
can I build a startup on it? Can I build the world's best chat bot on it? Can I build the world's
best customer support agent on it? I was going to ask you that. Yeah. I think it's really too
early to tell whether you can build any of those things. The hope is that you could. And long term,
really the hope is having built a model like this and exposed an API, you could take any Silicon Valley
startup that wants to solve a text problem, chatbots, or pre-sale support, or post-sales customer
support, or building a mental health app that talks to you. All of those things will get dramatically
cheaper and faster and easier to build on top of this infrastructure. If this works, you have this
generally smart system that's already been trained. Then you show it a couple examples of problems that you
want to solve, and then it will just solve them with very high accuracy. All you have to do as a
startup or a programmer is to say, hey, look, I'm going to give you a couple examples of the type of
problem that I want solved. And then that priming is going to be enough for the system to get
very accurate results, and in fact, sometimes better results than if you had built the model
and fed it the data set yourself. So that's the hope, but we just don't know yet.
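(As a rough sketch of that "program by priming" workflow: a startup might call a completion-style API with a handful of labeled examples, something like the following. The client setup, engine name, parameters, and the ticket-triage task are illustrative assumptions, not a documented contract.)

```python
# Sketch of "programming by priming" against a completion-style API.
# The client, engine name, and parameters here are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def classify_ticket(text: str) -> str:
    # A few labeled examples "prime" the model; no retraining happens.
    prompt = (
        "Ticket: My card was charged twice this month.\n"
        "Category: billing\n\n"
        "Ticket: The app crashes when I open settings.\n"
        "Category: bug\n\n"
        f"Ticket: {text}\n"
        "Category:"
    )
    response = openai.Completion.create(
        engine="davinci",      # hypothetical engine name
        prompt=prompt,
        max_tokens=5,
        temperature=0.0,       # keep the output deterministic-ish for labeling
        stop="\n",
    )
    return response["choices"][0]["text"].strip()

print(classify_ticket("I can't log in after resetting my password."))
```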
That's a really good reminder, because they themselves are like, this is early days, it's research, there's a lot of work to be done. But it's also really exciting, as you're saying, because this is one of the most advanced natural language models we've seen. So the question I have then, on the startup and building side: what would it take, what are the kinds of considerations, to make it more practical and scalable? I mean, for one thing, the size. You described how the transformer has this ability to sort of comprehend so much at once without doing it in kind of this RNN model, but the tradeoff
of that is that it's so slow, or too big to fit on a GPU. So I'd love to have a quick take from
you on what are the things that need to happen to make something like this more usable, et cetera.
I think what's going to need to happen is that the OpenAI product team is going to have
conversations with dozens and dozens of startups that are using their technology. And then
they'll successfully refine the API and improve the performance and set up the security rules and all
of that so that it becomes something as easy to use as, say, Stripe or Twilio.
Stripe and Twilio are very straightforward: send a text message or process a payment.
This is a lot more amorphous, which is, hey, I can do SAT analogies. How's that relevant for my
startup? Well, there's a bit of a gap there, right? You have a startup that's like, hey, I need
my document summarized, or I need you to go through all of the complaints we've ever gotten
and give me product insight for product managers. And so there's basically a divide
there, and it needs to be closed over time.
Right. So what does this mean for the data world?
Because one really interesting thing to me is on one hand,
APIs give you superpowers, kind of democratizing things.
On the other hand, it kind of makes things a bit of a race to the bottom then,
because then you have to differentiate on kind of private, proprietary,
these other elements.
So do you have thoughts on what that means?
Yeah.
I mean, the hope for something like a GPT3 is that it's going to dramatically reduce
the data gathering, cleansing, and cleaning process,
and frankly the cost of building the model as well,
your machine learning model. So let me try to put it in economic terms. Let's say we put
$10 million into a series A company and then $5 million of it goes to getting data and cleaning
it and hiring your machine learning people and then renting a bunch of GPUs in Amazon or Google
or Microsoft wherever you do your compute. The hope is that if you could stand on the shoulders
of something like GPT3 and it'll be a future version of it, you would reduce those costs from
$5 million to $100,000. You're basically making
API calls, and the way you program, quote unquote, this thing is you just show it a bunch of
examples that are relevant to the problem that you're trying to solve. So you show it texts
where you had a suicide risk and you don't need to show it a bunch because it's pre-trained
and you show it a new text that it hasn't seen before and you ask it, what is the risk of
suicide in this text exchange? The hope is that we can dramatically reduce the costs
of gathering that data
and building the machine learning models.
But it's really too early to tell
whether that's going to be practical or not.
So we know what it means for startups,
but how do the incumbents respond
in that kind of a world?
It seems almost inevitable
that the big players,
there might be an AWS potentially, right?
That could make this a given in their services.
Like this kind of bigger question
around this business model
of AI as a service.
Yeah, so the first thing I'll say
is this is OpenAI's first commercial product,
which is interesting, right?
recall that OpenAI started as a research institution, so we'll sort of see what the pricing is.
If this works, the scenario that I described earlier, which is dramatically reduce the time it takes to build machine learning inside a product, then all of the public cloud providers and other startups will offer competing products, because they don't want to let OpenAI just take all of the sort of text understanding ability of the internet, right?
Google Cloud and Microsoft and Amazon and Baidu and Tencent,
they're all going to say, hey, look, I can do that too.
Build your application on me.
Now, I will say that because of the large costs of training the model,
so I mentioned estimates ranging from $5 to $10 million to train this thing once,
and obviously they didn't train it once to get to where they were.
They trained it multiple times as they did the research process.
And so this is not going to be for the faint of heart.
It's going to come on the back of a lot of money,
with very skilled scientists using enormous infrastructure.
But to the extent that this product works,
then you're going to have very healthy competition
among all of the incumbents. You might even have new players
who figure out a different angle on it.
You know, it's really fascinating watching the people who have access,
and basically the recurring theme is that
it's not like plug and play; it's obviously not built and ready for that yet.
The prompt and the sampling hyperparameters matter a lot;
priming is an art, not a science. So I'm curious for where you think the knowledge value is going to go
in the future. What are the sort of the data scientists of the future going to look like for people
who have to work with something like this? Now, granted, the models are going to evolve, the
API will evolve, the product will evolve. But what are the skills that people need to have in
order to really do well in this world coming ahead? It's really too early to tell, but it is a
fundamentally different art of programming. So if you think of programming to date, it's basically
I learned Python and I learned to be efficient with memory and I learned to write clever algorithms that can sort things fast.
That's a well-understood art; thousands of classes, millions of people know how to do that.
If this approach works, basically, there is this massive pre-trained natural language model and the programming technique is basically I show you a couple examples of the tasks that I want you to perform.
It'll be about what examples do I show you, and in what form, and do I show you the outliers
or do I show you some normal ones, right? And so if this approach works, it'll all be about
how do you prime the model to get the best accuracy for the real world problems you actually
want your product to solve. Programming becomes what examples do I show you as opposed to
how do I allocate memory and write efficient search algorithms? It's a very different thing.
Vitalik Buterin, the inventor of Ethereum, described this when he was observing some of this buzz around GPT3, that, quote, I can easily see many jobs in the next 10 to 20 years changing their workflow to: human describes, AI builds, human debugs.
There's a lot of speculation about how this might affect jobs.
It can displace customer support, sales support, data scientists, legal assistants; jobs like that are at risk.
But do you have thoughts on the labor and jobs side of this?
Like just sort of the broader questions and concerns here?
The way I think about this is generally informed a lot by Erik Brynjolfsson and other people.
So if you think about a job as a set of tasks, some tasks will get automated.
And then some tasks will be stubbornly hard to automate.
And then there will be new tasks.
And so think of jobs as sort of an ever-changing bundle of tasks, some of which are performed by
humans today, some of which get automated, and then there are new tasks. And so what Vitalik describes,
if this AI stuff works, being able to prime the AI system with the right examples and then being
able to debug it at the end, those are two new tasks. No human on the planet gets paid to do that
outside of AI researchers today. But that could be mainstream knowledge work in 10 years,
which is you pick good examples and then you debug it at the end. So you have these brand new tasks
that are generating economic value, and people get paid for them,
and they didn't exist before.
I find it very fascinating what you said, by the way, because what it also means to me
is it becomes more inclusive for more people to enter worlds that might have been previously
closed off to a certain class or type of programmers, or people who have certain technical
skills. Because let's say you're very good at describing things, and it's more of an art
than a science, and you're very good at sort of fiddling with and hacking at things.
You might be better off at tuning something than someone who went through, like, years and years of elite PhD
education.
I think the machine learning algorithms will invite more people
who would otherwise be discouraged from pursuing careers
they wouldn't have naturally risen to the top of.
So I think you're right.
What do you make of the concern?
There were concerns that GPT3, these answers that it gave, that it predicted,
were rife with racism or stereotypes.
What do you make of the data issues around that?
Okay, we're going to feed it every piece of text on the internet
and then we're going to ask it to make generalizations.
What could possibly go wrong?
A lot could possibly go wrong.
If you look at the heart of this system, it's basically: I'm trying to guess the next word.
And the way I make my guess is I go look at all the documents that have been written ever.
And I ask what words are most likely to have occurred in those documents, right?
You're going to end up with culturally offensive stereotypes.
And so we need to figure out how do we put the safety rails, how do we erect the APIs?
I'm glad the OpenAI researchers and the community around them are being very careful about this, because we obviously have to be.
how do we basically teach it the social norms we want it to emit
as opposed to the ones that it found by reading texts?
Another whole philosophical sidebar,
but really important is if you think about the Internet
as a sum total of human knowledge
and other things that reflect many of the realities in the world,
which are atrocious and awful in many cases,
the flip side of it is it's a lot harder to change the real world
and people and behavior and society and systems,
but probably a hell of a lot easier to change a technical system
and be able to do certain things.
So to me, what's implicit in what you said
is that there's actually a solution,
and I don't mean to be solutionistic,
but that's within the technology
that you don't necessarily get from IRL in real life.
Yeah, that's exactly right.
And if it were in algorithm land, so to speak,
where we are, right, GPT3 and its descendants,
let's say GPT17 gave you a text document, right?
It wrote a text document for you.
You could take that document
and put it through whatever
filter you wanted, right, to filter out sexism or racism. And that layer could be inspectable
and tunable to everybody. You didn't know how GPT17 came up with its recommendations, but you have this
safety net at the end, which is you can filter out things that you don't want. So you have
this second step that you can actually put into your system. You don't have to depend just on the
first thing; you can catch it at a subsequent stage. Right. You can have sort of a system of checks and balances.
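(As a toy illustration of that inspectable second step, here is a sketch of a post-processing filter sitting between the model's raw output and the user. The blocklist approach is a placeholder assumption; a real safety layer would be far more sophisticated.)

```python
# Toy sketch of an inspectable post-processing filter that sits between the
# model's raw text and the user. The blocklist is a placeholder; a real
# system would use trained classifiers and human review, not a word list.
BLOCKED_TERMS = {"slur_1", "slur_2"}   # stand-ins, maintained and auditable

def filter_output(generated_text: str) -> str:
    lowered = generated_text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        # Reject (or flag for review) instead of passing it through.
        return "[output withheld by safety filter]"
    return generated_text

print(filter_output("A perfectly ordinary generated sentence."))
```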
So a broad meta question. One of my favorite posts was from Kevin Lacker, and he basically
gave GPT3 a Turing test. And he tested it on these questions of common sense, obscure trivia, logic.
And one of the things he observed is that, quote, we need to ask it questions that no normal human
would ever talk about. And so he said, if you're ever a judge in a Turing test, make sure you ask
some nonsense questions and see if the interviewee responds the way a human would. Because the system
doesn't know how to say I don't know. And this goes to the question of what does a Turing
test tell us. And there's been a lot of work, as you know, over the years on the modernization
of the Turing test. Like in 2016, Gary Marcus, our friend Gary Marcus, Francesca Rossi, and
Manuela Veloso published an article, 'Beyond the Turing Test,' in AI Magazine. Barbara Grosz of Harvard
wrote a piece called 'What Question Would Turing Pose Today?' in AI Magazine in 2012. And she basically
starts by saying that in 1950, when Turing proposed to replace the question, can machines think,
with the question, are there computers which would do well in the imitation game, at the time
computer science wasn't a field of study. You know, Claude Shannon's theory of information was just
getting started. Psychology was only starting to go beyond behaviorism. And so what would
Turing ask today? He'd probably propose a very different test. And so the question I really wanted
to ask you is how do we know if the thing is measuring what it's supposed to measure or answering
what it's supposed to answer or that it's getting smarter, I guess. This is more a philosophical
question than an engineering question. So why don't I say what we know? And then I'll wildly
speculate on the other stuff. That's great. That's life and science. Go for it.
So basically, if you read the paper, you'll see that it compares GPT3's performance against
various other state-of-the-art techniques on a wide variety of natural language processing
tasks. So, for instance, if you're asking it to translate from English to French, there's this thing
called the BLEU score. The higher the BLEU score, the better your translation. And so every test
has its measure. And so what we do know is we can compare GPT3 performance versus other algorithms, other systems.
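(For reference, here is a minimal sketch of computing a sentence-level BLEU score with the NLTK library; the example sentences are made up, and real benchmark evaluations are scored over whole test corpora.)

```python
# Minimal sketch of a sentence-level BLEU score using NLTK.
# Example sentences are made up; real benchmarks score whole test corpora.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # human translation(s)
candidate = ["the", "cat", "is", "on", "the", "mat"]      # system output

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.2f}")   # closer to 1.0 means closer to the reference
```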
What we don't know is how much does it really understand? So what do we really
take away from the fact that it aced two-digit arithmetic. Like, what does that mean? What does it
understand of the world? Does it get math? Let's say you had a system that was 100% accurate on
every two-digit arithmetic problem that you ever gave it. It's perfect at math, but it doesn't get
it. Like, it doesn't know that these numbers represent things in the real world. But what does that
mean to claim that it doesn't get it? That's a philosophical question. Right. It's philosophical because
the question then becomes, does it even matter when it comes to applying things practically? Because I think
about this from the world of education, you know, there's a big focus on metacognition and the
awareness of knowing what you know and don't know. But at a certain point, if the kid is doing
well on the test and the test is applicable to the world and they can basically survive and do well,
does it even matter if they really understood what arithmetic really means? As long as they can
solve the problem when they go to the store, that if I give you a dollar, I get five cents change
back. You know what I mean? That's exactly right. And if you generalize that out to other tasks
that humans solve in the real world,
imagine you just got good at 100
and then 1,000 and then 10,000 of the tasks
that you had never seen before,
let's say descendants of GPT3
got that good at a wide variety of language tasks.
What does it mean to insist,
but it doesn't get the world?
It doesn't get language, right?
That's fantastic.
I'd love to get sort of your perspective
on how do we think about
this broader arc of innovation
that's playing out here.
Daniel Gross called GPT-3 screenshots
the TikTok videos of nerds, and there's something to that.
It's kind of created this inherent virality.
So I'm curious for your take on that.
On the one hand, some of the most important technologies start out looking like a toy.
Chris Dixon paraphrased a really important idea from Clayton Christensen about how disruptive innovation happens.
But a lot of the people who are researchers really emphasize this is not a toy.
This is a big deal.
There are a lot of TikTok-ish videos that are coming out of the whole Playground,
which is basically a place where you can try out the model.
And on the one hand, people are saying it's a toy because they're in the sandbox
and they're basically having fun, feeding it prompts.
Some of those examples are actually really good and some of those are like comically bad.
Right.
So it feels toy-like.
The tantalizing prospect for this thing is that we have the beginnings of an approach to general
intelligence that we haven't made this much progress on before,
which is to date, if you wanted to build a specific system for a specific natural language processing
task, you could do that. Custom architecture, lots of training data, and lots of hand-tuning, and
lots of, like, Ph.D. time. The tantalizing thing about GPT3 is it didn't have an end use case
in mind that it was going to be optimal for, but it turns out to be really good at a lot of them,
which kind of is how people are. You're not tuned to, like, learn polka or do
double-entry bookkeeping or learn how to audio edit a podcast. Like you didn't come out of the
womb with that. But your brain is this general purpose computer that can figure out how to get
very, very good at that with enough practice and enough intentionality. Well, it's really great that
you use the word tantalizing, because if you remember the Greek myth root behind it, Tantalus was
destined to constantly get this, like, tempting fruit dangling above him as punishment. And it was
so close, yet so out of reach,
at the same time. So bottom line it for me, Frank. It's tantalizing, right? Now look, there's a limit to
how big these models can get and how effective the APIs will be once we sort of, you know,
unleash them to regular programmers. But it is surprising that it is so good across a broad
range of tasks, including ones that the original designers didn't contemplate. So maybe this is
the path to artificial general intelligence. Now, look, it's way too early to tell. So I'm not saying
that it is. I'm just saying it's very robust across a lot of very different tasks and that's
surprising and kind of exciting. Thank you so much for joining this episode of 16 minutes, Frank.
Awesome. Thank you, Sonal, for having me.