Y Combinator Startup Podcast - The REAL Potential of Generative AI

Starting point is 00:00:00 You've heard of large language models like ChatGPT. ChatGPT. ChatGPT. ChatGPT. It can answer questions, write stories, and even engage in conversation. But if you want to build a business that uses this technology, you'll need to ask yourself an important question: how do I take this raw model and this raw intelligence and actually customize it to my use case? How do I make it really good for my users so that it's differentiated and better than what's out there? This is Raza Habib.

Starting point is 00:00:26 His company, Human Loop, enables large language models to have even greater superpowers. We can help you build differentiated applications and products on top of these models. The range of use cases now feels more limited by imagination than by technology.

Starting point is 00:00:42 You can replicate your exact writing style, customize tone, fact check answers, and train the model on your company's unique data. We really hope that this is a platform on top of which, you know, the next million developers can build LLM applications. In our conversation, we explore the secrets to building an app

Starting point is 00:00:59 that stands out. What made it so good that a million users signed up in five days was a fine-tuning exercise. The impact of generative AI on developers. They're finding a significant fraction of their code is being written by a large language model. And what the future of large language models might bring to society as a whole. It's an ethical minefield. There are going to be societal consequences on the path to AGI. Potential benefits are huge as well, but we do need to tread very carefully.

Starting point is 00:01:31 Let's start with the basics, at a high level. What is a large language model, and why have they suddenly made such a splash? I assume they've been around a lot longer than the past year or two. Yeah, so language models themselves are a really old concept and old technology, and really all a language model is, is a statistical model of words in the English language. You take a big body of text and you try to predict the word that will come next, given a few previous words. So for "the cat sat on the...", "mat" is the most likely word, and then you have a distribution over

Starting point is 00:01:59 all the other words in your vocabulary. As you scale language models, both in the number of parameters they have and in the size of the data set they're trained on, it turns out that they continue to get better and better at this prediction task. Eventually, they have to start doing things like having world knowledge. Early on, the language model is learning letter frequencies and word frequencies, and that's fairly straightforward; that's kind of what we're used to from predictive text on our phones. But if the language model is going to be able to finish the sentence "Today the President of the United States is...", it has to have learned who the President of the United States is.
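To make the next-word prediction task concrete, here is a toy count-based model, much closer to phone predictive text than to an LLM: it conditions on the previous two words and returns a probability distribution over the next word. The three-sentence corpus and the function names are made up for illustration; real LLMs learn this distribution with a neural network over subword tokens rather than by counting.

```python
from collections import Counter, defaultdict


def train_ngram(corpus: list[str], n: int = 3) -> dict[tuple[str, ...], Counter]:
    """Count how often each word follows each (n-1)-word context."""
    counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i : i + n - 1])
            counts[context][words[i + n - 1]] += 1
    return counts


def next_word_distribution(
    counts: dict[tuple[str, ...], Counter], context: list[str]
) -> dict[str, float]:
    """Normalize the counts for one context into probabilities over the next word."""
    following = counts.get(tuple(w.lower() for w in context), Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # unseen context: a toy model like this has no estimate at all
    return {word: c / total for word, c in following.items()}


corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat sat on the sofa",
]
model = train_ngram(corpus, n=3)
dist = next_word_distribution(model, ["on", "the"])
print(dist)  # {'mat': 0.6666666666666666, 'sofa': 0.3333333333333333}
```

Scaling this idea up is exactly where counting breaks down: most long contexts never appear verbatim in any corpus, which is why a learned model that generalizes is needed.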

Starting point is 00:02:33 If it's going to finish a sentence that's a math problem, it has to be able to solve that math problem. And so where we are today is that, you know, I think it started with GPT-1 and GPT-2, but GPT-3 was really the one where I think everyone said, okay, something is very, very different here. We now have these models of language, and they're just models of the words, right? They don't know anything about the outside world. There are loads of debates about whether they actually understand language,

Starting point is 00:02:57 but they are able to do this task extremely well, and the only way to do that is to have gotten better at some form of reasoning and some form of knowledge. What are some of the challenges of using a pre-trained model like ChatGPT? So one of the big ones is that they have a tendency to confidently bullshit, or hallucinate stuff. I think Nat Friedman described it as alternating between spooky and kooky. Sometimes it's so good that you cannot believe the large language model was able to do that, and then just occasionally it's horrendously wrong. And that's just to do with how the model is originally trained.

Starting point is 00:03:28 They're trained to do next-word prediction, and so they don't necessarily know that they shouldn't be dishonest. Yeah, sometimes they get it wrong. Sometimes they get it wrong, but the danger is that they confidently get it wrong, and very persuasively, very authoritatively.

Starting point is 00:03:42 And so people might, you know, mistakenly trust these models. There are a couple of ways that you can hopefully fix that, and it's an open research question. But the way we can help you do this with Human Loop today is that we make it very easy to pull factual context into the prompt that you give the model, and so the model is much more likely to use that rather than make something up.
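How Human Loop does this internally isn't described here, so the following is only a minimal sketch of the general idea of putting factual context into the prompt. Naive keyword overlap stands in for the embedding search a real system would likely use, all function names are hypothetical, and the actual model call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d)]  # drop zero-overlap documents


def build_grounded_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Put retrieved facts into the prompt and ask the model to stick to them."""
    facts = retrieve(question, documents, k)
    fact_lines = "\n".join(f"- {f}" for f in facts) or "- (no relevant facts found)"
    return (
        "Answer using only the facts below. "
        "If they don't contain the answer, say you don't know.\n\n"
        f"Facts:\n{fact_lines}\n\n"
        f"Question: {question}\nAnswer:"
    )


docs = [
    "Refunds are available within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Enterprise plans include priority support.",
]
prompt = build_grounded_prompt("How many days do I have to request a refund?", docs, k=1)
print(prompt)
```

The instruction to admit ignorance matters as much as the retrieved facts: it gives the model an explicit alternative to inventing an answer when the context doesn't cover the question.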

Starting point is 00:04:01 And so we've seen that as a very successful technique for reducing hallucinations. Terrific. And this is one element of building a differentiated model for your use case. Absolutely. And an element of making it safe and reliable. Right. Yeah. You know, I think when ChatGPT came out, there was a lot of frustration from people who didn't like its personality.

Starting point is 00:04:16 The tone was a bit obsequious; it defers, and it doesn't want to give strong opinions on things. To me, that demonstrates the need for many different types of models, tones, and customizations depending on the use case and the audience, and we can help you do that. Can you talk a little bit about what it means to fine-tune a model and why that's important? If you look at the difference between ChatGPT or the most recent OpenAI text-davinci-003 model and what's been on the platform for two years without getting as much attention, the difference is fine-tuning. It's the same base model, more or less. You can see it on

Starting point is 00:04:53 the OpenAI website. It's one of their code pre-trained models. And what made it so good that a million users signed up in five days was a fine-tuning exercise. Fine-tuning means gathering examples of the outputs you want for the tasks you're trying to do, and then doing a little bit of extra training on top of the base model to specialize it to that task. What OpenAI, I think, did first, and others have since followed, is to first do a fine-tuning round of these models on input and output pairs that are actually instructions and the results you would like from those instructions. Those are human-generated pairs of data.
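As a rough illustration of what "gathering examples of the outputs you want" can look like on disk, the sketch below serializes instruction/response pairs as JSON Lines, a format many fine-tuning services accept. The `prompt`/`completion` field names and the `to_jsonl` helper are assumptions for illustration; check the schema your provider actually expects.

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (instruction, desired output) pairs as one JSON object per line.

    The "prompt"/"completion" keys are assumed; providers differ on field names.
    """
    lines = []
    for instruction, output in pairs:
        instruction, output = instruction.strip(), output.strip()
        if not instruction or not output:
            continue  # an incomplete example has nothing to teach the model
        lines.append(json.dumps({"prompt": instruction, "completion": output}))
    return ("\n".join(lines) + "\n") if lines else ""


pairs = [
    ("Summarize: The team meeting has moved from Thursday to Friday.",
     "The meeting is now on Friday."),
    ("Rewrite politely: Send me the report.",
     "Could you please send me the report?"),
    ("", "an output with no instruction"),
]
data = to_jsonl(pairs)
print(data)
```

The resulting string would typically be written to a file such as `train.jsonl` and uploaded to whichever fine-tuning service is being used.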

Starting point is 00:05:29 And then they further fine-tune the model using something called reinforcement learning from human feedback, where you get human preference data. You show people a few different generations from the model, ask them to rank them or choose which of two they prefer, and then use that to train a reward signal that can ultimately fine-tune the model. And it turns out that reinforcement learning from human feedback

Starting point is 00:05:49 makes a huge difference to performance. It's really hard to overstate that: in the InstructGPT paper that OpenAI released, they compared a one or two billion parameter model with instruction tuning and RLHF to the full GPT-3 model, and people preferred the smaller model, despite the fact it was a hundred times smaller. Anthropic had this very exciting paper just a couple of weeks ago

Starting point is 00:06:10 where they were actually able to get similar results to RLHF without the H, just having a second model provide the evaluation feedback, and that's obviously a lot more scalable. And what data do developers need to bring in order to fine-tune a model? So there are kind of two types of fine-tuning you might do.

Starting point is 00:06:29 They might just show up with a corpus of books or some background material. They just want to fine-tune for tone: they have their company's chat logs, or the tone of voice from their marketing communications, and they just want to adjust the tone. Right. Or all the emails they've sent, of course. Or all the emails they've sent, for example. I would think of that almost as extra pre-training, but it's fine-tuning as well.
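Tone-only fine-tuning data has no instruction/response structure; it is just text in the target voice. A minimal sketch, assuming a plain list of email bodies and a hypothetical `chunk_corpus` helper that splits on words for simplicity (a real pipeline would count the model's tokens, and would likely strip signatures and personal data first):

```python
def chunk_corpus(documents: list[str], max_words: int = 200) -> list[str]:
    """Split each document into consecutive chunks of at most max_words words.

    Words are a stand-in for tokens here; a real pipeline would use the
    model's tokenizer so chunks respect its context limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    chunks = []
    for doc in documents:
        words = doc.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start : start + max_words]))
    return chunks


emails = [
    "Hi Sam, thanks for the quick reply. Happy to jump on a call Tuesday.",
    "Cheers, looking forward to it.",
]
for chunk in chunk_corpus(emails, max_words=6):
    print(repr(chunk))
```

Because there are no target outputs, the model simply continues training on these chunks with the same next-word objective as pre-training, which is why this kind of fine-tuning resembles "extra pre-training".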

Starting point is 00:06:47 And then the other fine-tuning data actually comes from in-production usage. Once their app is being used, they're capturing the data that their customers are providing, and they're capturing feedback data from that. And in some sense, that's being automated at this point, right? Human Loop is taking care of that data capture for you, and it's making the fine-tuning easy. So you have an interaction with the customer that the LLM produces, and the customer sort of gives a thumbs-up or thumbs-down as to whether that was helpful. To give you a concrete example, you gave the email example. Imagine that you're helping someone draft a sales email.

Starting point is 00:07:19 And so you generate a first draft for them, and then they either send it or they don't. That's a very interesting piece of feedback that you can capture. They probably edit it, so you can capture the edited text, and maybe they get a response or they don't.
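One way to turn those implicit signals into training data is to derive preference pairs, the kind of (chosen, rejected) data that the RLHF reward model described earlier is trained on. This sketch is hypothetical throughout: the `DraftFeedback` record, the scoring rule (sent plus got a reply), and treating a user's edit as preferred over the original draft are assumptions, not Human Loop's actual pipeline. It ends with the standard pairwise logistic (Bradley-Terry) loss used to train reward models; the rewards here are plain numbers with no model attached.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftFeedback:
    """Hypothetical log record for one generated sales-email draft."""
    prompt: str
    draft: str
    sent: bool = False
    got_reply: bool = False
    edited_text: Optional[str] = None  # what the user changed the draft to, if anything


def outcome_score(r: DraftFeedback) -> int:
    """Assumed ranking: sent beats unsent, and getting a reply beats no reply."""
    return int(r.sent) + int(r.got_reply)


def preference_pairs(records: list[DraftFeedback]) -> list[tuple[str, str, str]]:
    """Derive (prompt, chosen, rejected) triples from implicit feedback."""
    pairs = []
    # Signal 1: a user's edit is treated as preferred over the model's draft.
    for r in records:
        if r.edited_text and r.edited_text != r.draft:
            pairs.append((r.prompt, r.edited_text, r.draft))
    # Signal 2: among drafts for the same prompt, the better outcome wins.
    by_prompt: dict[str, list[DraftFeedback]] = {}
    for r in records:
        by_prompt.setdefault(r.prompt, []).append(r)
    for prompt, group in by_prompt.items():
        for a in group:
            for b in group:
                if outcome_score(a) > outcome_score(b):
                    pairs.append((prompt, a.draft, b.draft))
    return pairs


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed stably as softplus(-d)."""
    d = reward_chosen - reward_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


logs = [
    DraftFeedback("intro email to Acme", "Draft A", sent=True, got_reply=True),
    DraftFeedback("intro email to Acme", "Draft B"),
    DraftFeedback("follow-up to Beta", "Draft C", sent=True, edited_text="Draft C, shortened"),
]
for triple in preference_pairs(logs):
    print(triple)
print(pairwise_loss(2.0, 0.0) < pairwise_loss(0.0, 2.0))  # True
```

The loss is small when the reward model already scores the chosen output above the rejected one and grows roughly linearly when it gets the order badly wrong, which is what pushes the reward model toward the users' revealed preferences.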

Starting point is 00:07:32 So all of those bits of feedback are things we would capture and then use to drive improvements of the underlying model. Got it. If a developer is trying to build an app using a large language model and is doing it for the first time,

Starting point is 00:07:42 what problems are they likely to encounter, and how do you guys help them address some of those problems? Yeah, so we typically help developers with three key problems: prototyping, evaluation, and finally customization. Maybe I can talk about each of those. At the early stages of developing a new large language model product, you have to try to get a good prompt that works well for your use case.

Starting point is 00:08:02 That tends to be highly iterative. You have hundreds of different versions of these prompts lying around, and managing the complexity of that versioning and experimenting is something we help with. Then, the use cases people are building now tend to be a lot more subjective than what you might have done with machine learning before, so evaluation is a lot harder. You can't just calculate accuracy on a test set.
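A minimal sketch of a versioning-plus-evaluation loop, assuming user feedback arrives as thumbs-up/thumbs-down events: prompt templates get numbered versions, each user is deterministically assigned to one version, and versions are compared by their share of positive feedback instead of test-set accuracy. `PromptRegistry`, `assign_variant`, and `feedback_rates` are hypothetical names, not Human Loop's API.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store of prompt templates keyed by 'name@vN'."""
    versions: dict[str, str] = field(default_factory=dict)

    def add(self, name: str, template: str) -> str:
        n = sum(1 for key in self.versions if key.startswith(f"{name}@v"))
        version_id = f"{name}@v{n + 1}"
        self.versions[version_id] = template
        return version_id


def assign_variant(user_id: str, variants: list[str]) -> str:
    """Bucket a user with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def feedback_rates(events: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of thumbs-up per prompt version, from (version_id, thumbs_up) events."""
    ups: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for version_id, thumbs_up in events:
        totals[version_id] += 1
        ups[version_id] += int(thumbs_up)
    return {v: ups[v] / totals[v] for v in totals}


registry = PromptRegistry()
v1 = registry.add("sales_email", "Write a sales email to {name}.")
v2 = registry.add("sales_email", "Write a short, friendly sales email to {name}.")
variant = assign_variant("user-42", [v1, v2])
prompt = registry.versions[variant].format(name="Ada")
events = [(v1, True), (v1, False), (v2, True), (v2, True)]
print(feedback_rates(events))  # {'sales_email@v1': 0.5, 'sales_email@v2': 1.0}
```

With only a handful of events per version, a gap like this is noise; a real comparison would wait for enough traffic or apply a significance test before promoting a version.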

Starting point is 00:08:22 And so helping developers understand how well their app is working with their end customers is the next thing that we really make easy. And finally, customization. Everyone has access to the same base models. Everyone can use GPT-3. But if you want to build something differentiated, you need to find a way to customize the model to your use case, to your end users, to your context.

Starting point is 00:08:41 And we make that much easier, both through fine tuning and also through a framework for running experiments. We can help you get a product to market faster, but most importantly, once you're there, we can help you make something that your users prefer over the base models. That seems pretty fundamental. I mean, it's prototyping, getting you the first versions out, testing and evaluation, and then differentiation. This seems pretty fundamental to building something great. I think so.

Starting point is 00:09:04 I mean, we really hope that this is a platform on top of which, you know, the next million developers can build LLM applications. We worked really closely with some of the first companies to realize the importance of this, understood the pain points they had, and, in a proper YC approach, tried to build something that those people really wanted. And I think we've now got to a point where we're hearing from others that it really does solve acute pain points for them. And it doesn't really matter to us what base language model you're using. We can help you with the data feedback collection, with fine-tuning,

Starting point is 00:09:35 with prototyping. Those problems are going to be very similar across different models. Really, we just want to help you get to the best result for your use case, and sometimes that'll mean choosing a different model. I wanted to ask: how is the job or role of a developer likely to change in the future because of this technology? This is interesting; I've thought about this a lot. I think in the short term, it augments developers, right? You can do the same things faster. To me, the most impressive application of large language models we've seen so far is GitHub Copilot. I think they cracked a really novel UX and figured out how to apply a large language model in a way that's now used by, I think, 100 million developers,

Starting point is 00:10:17 and many people I speak to say that they're finding a significant fraction of their code is being written by a large language model. And I think if you'd asked people two years ago whether that would happen, no one would have thought so. One thing that is surprising to me is that the people who say they use it the most

Starting point is 00:10:32 are some of the people I consider to be better or more senior developers. You might have thought this tool would help juniors more. But I think people who are more accustomed to editing and reading code actually benefit more from the completions. So short term, it just accelerates us and allows us to do more. On a longer time horizon, you could imagine developers becoming more like product managers in that they're writing the spec, they're writing the documentation,

Starting point is 00:10:57 but more of the grunt work and more of the boilerplate is taken care of by models. I don't know; on a long enough time horizon, there are very few jobs that can be done so completely through just text, right? We've really pushed it to the extreme. We've got GitHub and we have remote work; engineers can do a lot of their jobs entirely sitting at a computer screen. And so when we do get towards things that look like AGI, I suspect that developers will actually be one of the first jobs to see large fractions of their work automated,

Starting point is 00:11:26 which I think is very counterintuitive, but predicting the future is also hard. Yeah. What do you think the next breakthroughs will be in LLM technology? I actually think the roadmap here is almost quite well known. There's a bunch of things coming that are kind of baked in; we know they're coming, we just have to wait for them to be achieved. One thing that I think developers will really care about is the context window. At the moment, when you use these models, there's a limit to how much information you can feed them each time, and extending that context window is going to add a lot more capabilities. One thing that I'm really

Starting point is 00:12:00 excited about is augmenting large language models with the ability to take actions. We've seen a few examples of this; there's a startup called Adept AI doing this, and a few others, where you essentially let the large language model decide to take some actions. So it can output a string that says, you know, search the internet for this thing, and then, on the basis of the result, generate some more and repeat. You actually start treating these large language models much more like agents than just text generation machines. Well, something we have to sort of expect or look forward to is, you know, AI taking actions. Can this technology fundamentally be steered in a safe and ethical direction? And how?

Starting point is 00:12:39 Oh gosh, that's a tough question. I certainly hope so, and I think we need to spend more time thinking about this and working on it than we currently do, because as the capabilities increase, it becomes more pressing. There are a lot of different angles to that, right? There are people who worry about the extreme end of safety.

Starting point is 00:13:01 People like Eliezer Yudkowsky: to distinguish himself from normal AI safety, he talks about "AI notkilleveryoneism," right? He thinks the risks are potentially so large that this could be an existential threat. And then there are shorter-term threats, like social disruption. People feel threatened by these models. There are going to be societal consequences, even from the weaker versions on the path to AGI, that raise serious ethical questions.

Starting point is 00:13:25 The models bake in biases and preferences that were in the data and the team that built them at the time they were being constructed. So it's an ethical minefield. I don't think that means we shouldn't do it, because I think the potential benefits are huge as well, but we do need to tread very carefully. How strong is the network effect with these models? In other words, is it the case that in the future there may be one model that sort of rules them all because it will be bigger and hence smarter than anything anyone else could build? Or is that not the dynamic at play here? I don't think that's the dynamic at play here. To me, the barriers to entry of

Starting point is 00:14:05 training one of these models are mostly capital and talent. The people needed are still very specialized and very smart, and you need lots of money to pay for GPUs. But beyond that, I don't see that much secret sauce, right? OpenAI, for all the criticism they get, have actually been pretty open, and DeepMind have been pretty open. They've published a lot about how they've achieved what they've achieved. And so the main barriers to replicating something like GPT-3 are: can you get enough compute, can you get smart people, and can you get the data? And more people are following on their heels.

Starting point is 00:14:39 There's some question about whether or not the feedback data might give them a flywheel. I'm a little bit skeptical that it would give them so much that no one could catch up. Why? That seems pretty compelling. If they have a two-year head start and thousands and thousands of apps get built, then the lead they have in terms of feedback data

Starting point is 00:14:56 would seem to be pretty compelling. So I think the feedback data is great for narrow applications, right? Like if you're building an end-user application, then I think you can get a lot of differentiation through feedback and customization. But they're building this very general model that has to be good at everything. And so they can't kind of like

Starting point is 00:15:13 let it become bad at code while it gets good at something else, whereas others can. I see. Got it. Now, let me ask you probably the hardest question here. OpenAI's mission is to build AGI, artificial general intelligence, so that machines can be at the cognitive level of humans, if not better.

Starting point is 00:15:31 Do you think that's within reach? Do the recent breakthroughs mean that it's closer than people thought? Or is this still, for the time being, science fiction? There's a huge amount of uncertainty here. If you poll experts, you get a wide range of opinions; even if you poll the people who are closest to it, if you chat to folks at OpenAI or other companies, opinions differ. But I think, compared to most people's perception in the public, people think it's plausible

Starting point is 00:15:58 sooner than I think a lot of us thought. There are prediction markets on this; Metaculus polls people on how likely they think AGI is, and I think the median estimate is something like 2040. Even if you only think that's plausible, that's remarkably soon for a technology that might, you know, upend almost all of society. What is very clear is that we are still going to see very dramatic improvements in the short term, and even before AGI, a lot of societal transformation, a lot of economic benefit, but also questions that we're going to have to wrestle with to make sure that this is a positive for society.

Starting point is 00:16:35 So yeah, I think on the short end of timelines, there are people who think 2030 is plausible, but those same people will accept there's some probability that it won't happen for hundreds of years. There's a distribution. I think you should take it seriously, but it's very hard to. Even having made the choice to accept that by 2030 it's plausible that we will have machines that can do all the cognitive tasks that humans can do and more, if you then ask me, okay, Raza, are you building your company in a way that's obviously going to make sense in that world? I'm trying, but it's really hard to internalize that intuitively.

Starting point is 00:17:11 Stuart Russell makes a point where he says, you know, if I told you an alien civilization was going to land on Earth in 50 years, you wouldn't do nothing. And there's some possibility that we've got something like an alien arriving soon. Right. Soon. An alien arriving soon. Yeah. You heard it here first. So let me ask you, what does this new technology mean for startups?

Starting point is 00:17:35 Oh, man. It's unbelievably exciting. It's really difficult to articulate. There are so many things that previously required a research team and felt just impossible that now you just ask the model for. Honestly, there's stuff I worked on during my PhD that I didn't think would be possible for years, like a system that can generate questions, or one that can

Starting point is 00:17:59 be a really good chatbot like ChatGPT, a realistic one that can understand context over long ranges of time, unlike Alexa or Siri, where it's a single message. The range of use cases now feels more limited by imagination than by technology. And when there is a technology change this abrupt,

Starting point is 00:18:17 where something has improved so much, YC teaches this, right? A few different things open up opportunities for new applications. And we're beginning to see it: a sort of Cambrian explosion of new startups. I think the latest YC batch has many more of them.

Starting point is 00:18:31 We see it at Human Loop. We get a lot of inbound interest from companies that are at the beginning of their explorations and trying to figure out how to take this raw model and this raw intelligence and actually turn it into a differentiated product. Hopefully we have some AI engineers or aspiring AI engineers listening today who might be interested in working at Human Loop. Are you guys hiring, and what kind of culture and company are you trying to build? We absolutely are hiring.

Starting point is 00:18:56 We're hoping to build a platform for potentially one of the most disruptive technologies we've ever had, one that ideally will be used by millions of developers in the future. There's going to be a lot of doing stuff for the first time and also inventing novel UX or UI experiences. So we're looking for full-stack developers who are comfortable, genuinely really comfortable, up and down the stack, who deeply care about the end user experience, and who will enjoy speaking to our customers.

Starting point is 00:19:21 And they're fun customers to work with, because we're working with startups and AI companies who are really on the cutting edge. They're real innovators. If that sounds exciting to you: some of it will be very hard, and a lot of it will be very new, but it'll also be very rewarding. Well, this has been really fascinating. I think what my crystal ball says is that one day in the future,

Starting point is 00:19:41 literally millions of developers will be using your tools to build great applications using AI technology. So I wish you luck and thank you again for your time. Thank you, Ali. It's been an absolute pleasure.
