Y Combinator Startup Podcast - The REAL Potential of Generative AI
Episode Date: March 6, 2023
What is a large language model? How can it be used to enhance your business? In this conversation, Ali Rowghani, Managing Director of YC Continuity, talks with Raza Habib, CEO of Humanloop, about the cutting-edge AI powering innovations today, and what the future may hold. They discuss how large language models like OpenAI's GPT-3 work, why fine-tuning is important for customizing models to specific use cases, and the challenges involved with building apps using these models. If you're curious about the ethical implications of AI, Raza shares his predictions about the impact of this quickly developing technology on the industry and the world at large. Apply to Y Combinator: https://www.ycombinator.com/apply/
Transcript
You've heard of large language models like ChatGPT.
ChatGPT. ChatGPT.
ChatGPT.
It can answer questions, write stories, and even engage in conversation.
But if you want to build a business that uses this technology, you'll need to ask yourself an important question.
How do I take this raw model and this raw intelligence and actually customize this to my use case?
How do I make it really good for my users so that it's differentiated and better than what's out there?
This is Raza Habib.
His company, Humanloop, enables large language models
to have even greater superpowers.
We can help you build
differentiated applications and products
on top of these models.
The range of use cases is like
now feels to be more limited by imagination
than it is limited by technology.
You can replicate your exact writing style,
customize tone, fact check answers,
and train the model on your company's unique data.
We really hope that this is a platform
on top of which, you know,
the next million developers can build LLM applications.
In our conversation,
we explore the secrets to building an app
that stands out.
What made it so good that a million users signed up in five days was a fine-tuning exercise.
The impact of generative AI on developers.
They're finding a significant fraction of their code is being written by a large language model.
And what the future of large language models might bring to society as a whole.
It's an ethical minefield.
There are going to be societal consequences on the path to AGI.
Potential benefits are huge as well, but we do need to tread very carefully.
Let's start like basics and high level.
Like what is a large language model and why is it that they've suddenly sort of made a splash?
I assume they've been around a lot longer than the past year or two.
Yeah, so language models themselves are really old concepts and old technology and really
all it is is a statistical model of words in the English language.
You take a big bunch of texts and you try to predict what is the word that will come
next given a few previous words.
So for "the cat sat on the," "mat" is the most likely word, and then you have a distribution over
all the other words in your vocabulary. As you scale the language models, both in terms of the
number of parameters they have, but also in the size of the data set that they're trained on,
it turns out that they continue to get better and better at this prediction task. Eventually,
you have to start doing things like having world knowledge. You know, early on, the language model
is learning letter frequencies and word frequencies, and that's fairly straightforward,
and that's kind of what we're used to from predictive text in our phones. But if the language model
is going to be able to finish the sentence "Today, the President of the United States,"
it has to have learned who the President of the United States is.
If it's going to finish a sentence that's a math problem,
it has to be able to solve that math problem.
And so where we are today is that, you know, I think starting from GPT-1 and GPT-2,
but then GPT-3 was really the one where I think everyone said,
okay, something is very, very different here.
We now have these models of language, and they're just models of the words, right?
They don't know anything about the outside world.
There's loads of debates about whether they actually understand language,
but they are able to do this task extremely well,
and the only way to do that is to have gotten better at some form of reasoning and some form of knowledge.
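
As a rough illustration of the next-word prediction described above, here is a minimal sketch of a count-based language model. It is an assumption-laden toy (a three-sentence corpus, a fixed two-word context, and made-up function names), not how GPT-style models are implemented, but it shows what "a distribution over all the other words in your vocabulary" means in practice.

```python
from collections import Counter, defaultdict


def train_model(corpus, context_size=2):
    """Count which word follows each run of `context_size` previous words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def next_word_distribution(counts, context):
    """Convert the raw counts for one context into probabilities."""
    followers = counts.get(tuple(w.lower() for w in context), Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat sat on the sofa",
]
model = train_model(corpus)
dist = next_word_distribution(model, ["on", "the"])
print(max(dist, key=dist.get), dist)
```

With this toy corpus, "mat" comes out as the most likely word after "on the", with "sofa" taking the rest of the probability. Large language models replace the counting table with a neural network and a far longer context, but the prediction task is the same.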
What are some of the challenges of using a pre-trained model like ChatGPT?
So one of the big ones is that they have a tendency to confidently bullshit or hallucinate stuff.
I think Nat Friedman described it as alternating between spooky and kooky.
Sometimes it's so good that you cannot believe the large language model was able to do that,
and then just occasionally it's horrendously wrong.
And that's just to do with how the model is originally trained.
They're trained to do next-word prediction
and so they don't necessarily know
that they shouldn't be dishonest.
Yeah, sometimes they get it wrong.
Sometimes they get it wrong,
but the danger is that they confidently get it wrong.
So, and very persuasively,
you know, very authoritatively, they get it wrong.
And so people might, you know, mistakenly trust these models.
So there's a couple of ways that you can, you know,
hopefully fix that.
And it's an open research question.
But the way we can help you with Humanloop to do this today
is we make it very easy to pull in a factual context
to the prompt that you give to the model.
And so the model is much more likely to use that rather than make something up.
And so we've seen that as a very successful technique for reducing hallucinations.
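
To make the idea of pulling factual context into the prompt concrete, here is a minimal sketch. The function name, the prompt wording, and the example passages are all assumptions made for illustration; this is not Humanloop's API, and how the passages are retrieved is left out entirely.

```python
def build_grounded_prompt(question, passages, max_passages=3):
    """Assemble a prompt that asks the model to answer from supplied passages."""
    selected = passages[:max_passages]
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(selected, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages standing in for documents fetched from a company's own data.
passages = [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday to Friday, 9am to 5pm.",
]
prompt = build_grounded_prompt("How long do I have to request a refund?", passages)
print(prompt)
```

The resulting string would then be sent to whichever model is in use. The point is only that the model has the relevant facts in front of it, so it is more likely to use them than to make something up.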
Terrific.
And this is an element to building a differentiated model for your use case.
Absolutely.
And an element for making it safe and reliable.
Right.
Yeah.
You know, I think when ChatGPT came out, there was a lot of frustration from people who didn't like its personality.
The tone was a bit obsequious and it's, you know, it'll defer.
It doesn't want to give strong opinions on things.
And to me, that demonstrates the need for, you know, many different types of models and tone and
customizations depending on the use case and depending on the audience, and we can help you do that.
Can you talk a little bit about what it means to fine tune a model and why that's important?
If you look at what the difference is between ChatGPT or the most recent OpenAI
text-davinci-003 model and what's been in the platform for two years and has not gotten as much
attention, the difference is fine-tuning. It's the same base model more or less. You can see it on
the OpenAI website. It's one of their code pre-trained models. And what made it
so good that a million users signed up in five days was a fine-tuning exercise. And so what
fine-tuning is, is gathering examples of the outputs you want for the tasks that you're trying
to do, and then doing a little bit of extra training on top of this base model to specialize
it to that task. What OpenAI, I think, did first, and others have followed, is to first
do a fine-tuning round of these models on input and output pairs that are actually instructions
and the results that you would like from the instructions.
So those are human-generated pairs of data.
And then to further fine-tune the model
using something called reinforcement learning from human feedback
where you get human preference data.
So you show people a few different generations from the model,
ask them to rank them or choose which of two they prefer,
and then use that to train a signal
that can ultimately fine-tune the model.
And it turns out that reinforcement learning from human feedback
makes a huge difference to performance.
It's really hard to overstate that.
In the InstructGPT paper that OpenAI released,
they compared a one or two billion parameter model
with instruction tuning and RLHF to the full GPT-3 model,
and people preferred that,
despite the fact it was a hundred times smaller.
Anthropic had this very exciting paper just a couple of weeks ago
where actually they were able to get similar results to RLHF
without the H.
So just actually having a second model
provide the evaluation feedback as well,
and that's obviously a lot more scalable.
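
The preference step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the helper names are invented, the reward scores are plain numbers rather than outputs of a trained reward model, and the loss shown is the commonly used pairwise form, -log(sigmoid(r_preferred - r_rejected)), rather than any particular lab's full training recipe.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-to-worst human ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_outputs, 2))


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(margin)): small when the reward model agrees with the human."""
    margin = reward_preferred - reward_rejected
    return math.log1p(math.exp(-margin))


# A labeler ranked three generations from best to worst (hypothetical labels).
pairs = ranking_to_pairs(["draft_a", "draft_b", "draft_c"])
print(pairs)

# The loss is lower when the preferred output is scored higher than the rejected one.
print(pairwise_preference_loss(2.0, 0.0), pairwise_preference_loss(0.0, 2.0))
```

A reward model trained to minimize this loss over many such pairs provides the signal that is then used to fine-tune the language model itself.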
And what data do developers need to bring
in order to fine tune a model?
So there's kind of two types of fine-tuning you might do.
They might just show up with a corpus of books or some background.
They just want to fine-tune for tone.
They have their company's chat logs or tone of voice from marketing communications
and they just want to adjust the tone.
Right.
Or all the emails they've sent costs.
Or all the emails they've sent, for example.
That's kind of almost extra pre-training, I would think about it as, but it's fine-tuning as well.
And then the other fine-tuning data comes actually from in-production usage.
So once they have their app being used, they're capturing the data that their customers are providing.
They're capturing feedback data from that.
And in some sense, it's being automated at this point, right?
Like, Humanloop is taking care of that data capture for you, and it's making the fine-tuning easy.
So you have an interaction with the customer that the LLM produces, and the customer sort of gives a thumbs-up or thumbs-down as to whether that was helpful.
To give you a concrete example, you know, you gave the email example.
Imagine that you're helping someone draft a sales email,
and so you generate a first draft for them,
and then they either send it or they don't.
So that's like a very interesting piece of feedback
that you can capture.
They probably edit it,
so you can capture the edited text,
and they maybe get a response
or they don't get a response.
So all of those bits of feedback
are things we would capture
and then use to drive improvements
of the underlying model.
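
The sales email example can be sketched as a small data structure. Everything here is hypothetical (the field names, the record class, and the rule for choosing training examples); it is meant only to show how the signals mentioned above (sent or not, edited text, reply or no reply) could be captured and turned into fine-tuning data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftFeedback:
    prompt: str                        # what the user asked for
    generated_draft: str               # what the model produced
    sent: bool                         # did the user send the email?
    edited_text: Optional[str] = None  # the user's edited version, if any
    got_reply: Optional[bool] = None   # did the recipient respond?


def to_finetuning_examples(records):
    """Keep drafts that were sent, using the edited text as the target when present."""
    examples = []
    for r in records:
        if not r.sent:
            continue  # an unsent draft is treated here as a negative signal
        target = r.edited_text if r.edited_text is not None else r.generated_draft
        examples.append({"input": r.prompt, "output": target})
    return examples


records = [
    DraftFeedback("Intro email to Acme", "Hi, we sell widgets.", sent=True,
                  edited_text="Hi Sam, we help teams like Acme ship faster.", got_reply=True),
    DraftFeedback("Follow-up to Beta Corp", "Just checking in.", sent=False),
]
examples = to_finetuning_examples(records)
print(examples)
```

A real pipeline would likely weight examples by whether a reply came back and filter for quality, but the input and output pairs it produces would have this general shape.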
Got it.
If a developer is trying to build an app
using a large language model
and is doing it for the first time,
what problems are they likely to encounter,
and how do you guys help them address some of those problems?
Yeah, so we typically help developers
with kind of three key problems.
One is prototyping, evaluation, and finally customization.
Maybe I can sort of talk about each of those.
So at the early stages of developing a new large language model product,
you have to try and get a good prompt that works well for your use case.
That tends to be highly iterative.
You have hundreds of different versions of these things lying around.
Managing the complexity of that versioning, experimenting,
that's something we help with.
Then the use cases that people are building now tend to be a lot more subjective
than you might have done with machine learning before.
And so evaluation is a lot harder.
You can't just calculate accuracy on a test set.
And so helping developers understand how well is my app working with my end customers
is the next thing that we really make easy.
And finally, customization.
Everyone has access to the same base models.
Everyone can use GPT-3.
But if you want to build something differentiated,
you need to find a way to customize the model to your use case,
to your end users, to your context.
And we make that much easier, both through fine tuning
and also through a framework for running experiments.
We can help you get a product to market faster, but most importantly, once you're there,
we can help you make something that your users prefer over the base models.
That seems pretty fundamental.
I mean, it's prototyping, getting you the first versions out, testing and evaluation, and then differentiation.
This seems pretty fundamental to building something great.
I think so.
I mean, we really hope that this is a platform on top of which, you know, the next million developers
can build LLM applications.
And we worked really closely with some of the first companies to realize, you know, the importance
of this, understood
the pain points they had and, in a proper YC approach, have tried to build something that those
people really wanted. And I think we've got to a point that now we're seeing from others, that it
really does solve acute pain points for them. And it doesn't really matter to us what base language
model you're using. We can help you with the data feedback collection, with fine tuning,
with prototyping. And those problems are going to be very similar across different models.
And really, we just want to help you get to the best result for your use case. And sometimes that'll
mean choosing a different model. I wanted to ask, how is the job or role of a developer likely to
change in the future because of this technology? This is interesting. I've thought about this a lot.
I think in the short term, it augments developers, right? You can do the same thing you could do faster.
To me, the most impressive application we've seen of large language models so far is GitHub Copilot.
I think that they cracked a really novel UX and figured out how to apply a large language model in a way that's now used
by, I think, 100 million developers,
and many people I speak to
who say that they're finding a significant fraction of their code
is being written by a large language model.
And I think if you'd asked people two years ago,
"Will that happen?",
no one would have thought that.
One thing that is surprising to me
is that the people who say to me they use it the most
are some of the people I consider to be better
or more senior developers.
You might have thought this tool would help juniors more.
But I think people who are more accustomed to editing
and reading code actually benefit more from the completions.
So short term, it just accelerates us and allows us to do more.
On a longer time horizon, you could imagine developers becoming more like product managers
in that they're writing the spec, they're writing the documentation,
but more of the grunt work and more of the boilerplate is taken care of by models.
I don't know, on a long enough time horizon, I mean, there's very few jobs that can be done so much
through just text, right?
We've really pushed it to the extreme.
We've got GitHub and you have remote work.
engineers can do a lot of their jobs entirely sitting at a computer screen.
And so when we do get towards things that look like AGI,
I suspect that developers will actually be one of the first jobs to see large fractions of their job be automated,
which I think is very counterintuitive, but also predicting the future is hard.
Yeah. What do you think the next breakthroughs will be in LLM technology?
So I actually think here the roadmap is quite well known almost.
I think there's a bunch of things that are coming that are kind of baked
in. We know they're coming, we just have to wait for them to be achieved. One thing that I think
developers will really care about is the context window. So at the moment, when you sort of use
these models, there's a limit to how much information you can feed it every time you use it,
and extending that context window is going to add a lot more capabilities. One thing that I'm really
excited about is actually augmenting large language models with the ability to take actions.
And so we've seen a few examples of this. There's a startup called Adept AI that's doing this, and a few
others, where you essentially let the large language model decide to take some action, so it can
output a string that says, you know, search the internet for this thing, and then on the basis
of the result, generate some more and repeat. You actually start treating these large language
models much more like agents than just text generation machines.
Well, something we have to
sort of expect or look forward to is, you know, AI taking actions. Can this technology just
fundamentally be steered in a safe and ethical direction? And how?
Oh gosh, that's a tough question.
I certainly hope so.
And I think we need to spend more time thinking about this
and working on it than we currently do,
because as the capabilities increase,
it becomes more pressing.
There's a lot of different angles to that, right?
So there are people who worry about just like end safety.
So people like Eliezer Yudkowsky,
in order to distinguish himself from just normal AI safety,
he just talks about AI "not kill everyone," right?
Like, he thinks the risks are potentially so large that this could be an existential threat.
And then there's just a shorter-term threat, which is social disruption.
People feel threatened by these models.
There are going to be societal consequences, even to the weaker versions on the path to AGI
that raise serious ethical questions.
The models bake in biases and preferences that were in the model and the data and the team that built it at the time that it was being constructed.
So there are, it's an ethical minefield.
I don't think that means we shouldn't do it because I think the potential
benefits are huge as well, but we do need to tread very carefully.
How strong is the network effect with these models? In other words, is it the case that in the
future there may be one model that sort of rules them all because it will be bigger and hence
smarter than anything anyone else could build? Or is that not the dynamic that's at play here?
So I don't think that's the dynamic that's at play here. Like to me, the barriers to entry of
training one of these models are mostly capital and talent.
Like the people needed are still very specialized and very smart, and you need lots of money to pay for GPUs.
But beyond that, I don't see that much secret sauce, right?
Like, you know, OpenAI, for all the criticism they get, they actually have been pretty open.
And DeepMind have been pretty open.
They've published a lot about how they've achieved what they've achieved.
And so the main barrier to replicating something like GPT-3 is can you get enough compute and can you get smart people and can you get the data?
And more people are following on their heels.
There's some question about whether or not the feedback data
might give them a flywheel.
I'm a little bit skeptical of that,
that it would give them so much that no one could catch up.
Why? That seems pretty compelling.
If they have a two-year head start
and thousands and thousands of apps get built,
then the lead they have in terms of feedback data
would seem to be pretty compelling.
So I think the feedback data is great for narrow applications, right?
Like if you're building an end-user application,
then I think you can get a lot of differentiation
through feedback and customization.
But they're building this very general model
that has to be good at everything.
And so they can't kind of like
let it become bad at code whilst it gets good at something else,
which others can do.
I see. Got it.
Now, let me ask you probably the hardest question here.
OpenAI's mission is to build AGI,
artificial general intelligence,
so that machines can be at the cognitive level of humans,
if not better.
Do you think that's within reach?
Do the recent breakthroughs mean that
that's closer than people thought?
Or is this still for the time being science fiction?
So there's a huge amount of uncertainty here.
And if you poll experts, you get a wide range of opinions, even if you poll the people
who are closest to it, if you chat to folks at OpenAI or other companies, opinions differ.
But I think compared to most people's perception in the public, people think it's plausible
sooner than I think a lot of us thought.
So there are prediction markets on this. Metaculus sort of polls people on how likely they
think AGI will be, and I think the median estimate is something like 2040. And even if you
think that that's plausible, that's remarkably soon for a technology that might, you know, upend
almost all of society. What is very clear is that, you know, we are still going to see very dramatic
improvements in the short term. And even before AGI, a lot of societal transformation, a lot of
economic benefit, but also questions that we're going to have to wrestle with to make sure that
this is a positive for society.
So yeah, I think on the short end of timelines, you know, there are people who think 2030 is plausible,
but those same people will accept there's some probability that it won't happen for hundreds of years.
You know, there's a distribution.
If you take it seriously, and I think you should take it seriously, and it's very hard to take it seriously,
even like having made that choice of like, I'm going to accept that by 2030 it's plausible,
that we will have machines that can do all the cognitive tasks that humans can do and more,
and then you ask me like, okay, Raza, like, are you building your company in a way that's, like, obviously going to make sense in that world?
Like, I'm trying, but it's really hard to internalize that intuitively.
Stuart Russell has a point where he says, you know, if I told you an alien civilization was going to land on Earth in 50 years, you wouldn't do nothing.
And there's some possibility that, you know, we've got something like an alien arriving soon.
Right.
Soon.
An alien arriving soon.
Yeah.
You heard it here first.
So let me ask you, what does this new technology mean for startups?
Oh, man.
It's unbelievably exciting.
It's really difficult to articulate.
There's so many things that previously you required a research team for and that felt
just impossible that now you just ask the model.
Like honestly, stuff that during my PhD I didn't think would be possible for years, or that
I spent time trying to solve, problems where you want to have a system that can generate
questions, or can do something like
be a really good chatbot like ChatGPT,
like a realistic one that can understand context
over long ranges of time,
not like Alexa or Siri that's a single message.
The range of use cases is like,
now feels to be more limited by imagination
than it is limited by technology.
And when there is a technology change this abrupt,
where something has improved so much,
YC teaches this, right?
There's sort of a few different things
that open up opportunities for new applications.
And we're beginning to see it,
you know, a sort of Cambrian explosion
of new startups.
I think the latest YC batch has many more startups.
We see it at Humanloop.
We get a lot of inbound interest from companies that are at the beginning of their
explorations and trying to figure out how do I take this raw model and this raw intelligence
and actually turn that into a differentiated product.
Hopefully we have some AI engineers or aspiring AI engineers listening today who might be interested
in working at Humanloop.
Are you guys hiring, and what kind of culture and company are you trying to build?
We absolutely are hiring.
We're hoping to build a platform
for potentially one of the most disruptive technologies we've ever had,
and that ideally will be used by millions of developers in the future.
And there's going to be a lot of doing stuff for the first time
and also inventing novel UX or UI experiences.
So full-stack developers who are comfortable, like genuinely really comfortable, up and down the stack,
and who deeply care about the end user experience,
who will enjoy speaking to our customers.
And they're fun customers to work with,
because we're working with startups and AI companies
who are really on the cutting edge.
They're really innovators.
You know, if that sounds exciting to you, it will be very hard.
A lot of it will be very new, but it'll also be very rewarding.
Well, this has been really fascinating.
I think what my crystal ball says is that one day in the future,
literally millions of developers will be using your tools to build great applications
using AI technology.
So I wish you luck and thank you again for your time.
Thank you, Ali. It's been an absolute pleasure.
