No Priors: Artificial Intelligence | Technology | Startups - Model Quality, Fine Tuning & Meta Sponsoring Open Source Ecosystem

Episode Date: October 9, 2023

What Does it Take to Improve by 10x or 100x? This week is another host-only episode. Sarah and Elad talk about the path to better model quality, the potential for fine-tuning for different use cases, retrieval systems (RAG), feedback systems (RLHF, RLAIF), and Meta's sponsorship of the open source model ecosystem. Plus, Sarah and Elad ask if we're finally at the beginning of a new set of consumer applications and social networks. Sign up for new podcasts every week. Email feedback to show@no-priors.com. Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil

Show Notes:
0:03:00 - AI Models and OpenAI Advances
0:08:59 - Addressing Hallucinations in AI Models
0:13:22 - Open Source Models in Consumer Engagement
0:16:23 - New Trends in Social Content Creation
0:21:53 - Balancing Ambition With Realistic Customer Expectations

Transcript
Starting point is 00:00:00 Hi, No Priors listeners. Time for a host-only episode. This week, Elad and I talk about the path to better model quality from here, the potential of fine-tuning, RLHF, RLAIF, RAG and retrieval systems generally, Meta's sponsorship of the open-source model ecosystem, and finally, the beginning of a new set of consumer applications and social networks. Thanks for tuning in. So one thing everybody is thinking about is what it takes to get to 10x or 100x better AI systems.
Starting point is 00:00:35 Like I think it'd be useful to just sort of enumerate the elements to sort of step-function better. Elad, what do you think? Yeah, you know, it's interesting because there's a few different aspects of that that people always talk about. There's scalability of data sets and compute and parameters and all these things. But the reality is, I think a lot of people believe that in order to 10x or even 100x use cases and usages for AI, outside of that, there's things that could just be done on existing models today. So you don't need to wait for GPT-7 or whatever. You can start with GPT-4 or GPT-3.5 and add these things.
Starting point is 00:01:04 And I think they are kind of bucketed into five or six areas. Number one is multimodality. So that means being able to use text or voice or images or video is both input and output. So you should be able to talk to a model, type to it, upload an image and ask about the image. And then it could output anything from code to a short video for you. Second is long context windows. So basically, when you prompt a model, you basically are feeding it data or commands or other things,
Starting point is 00:01:31 and everybody realizes that you need longer and longer and longer context windows. So Magic, for example, is doing that for code. You should be able to dump an entire code repo into a coding model instead of having to do a piecemeal. Third, which we're going to talk about today, is model customization. So that's things like fine-tuning, something known as RAG. There's data cleaning, there's labeling. There's a bunch of stuff that just makes models work better for you. Fourth is some form of memory, so the AI actually remembers what it's doing.
Starting point is 00:02:00 Fifth is some form of recursion, so looping back and reusing models. And then six, which is related as potentially a bunch of small models that are very specialized, being orchestrated by a central model or sort of AI router that says, well, for this specific task or use case, I'm going to route the prompt or the data or the output into this other model that's doing this other thing, which is basically how the human brain works, right? you process visual information through your visual cortex, but then you use other parts of your brain to make decisions, right? And so it's very similar to what evolutions sort of decided was an optimal approach. But I think it's really interesting because I think many people in
Starting point is 00:02:34 the field know that these five or six things are absolutely coming. And they can dramatically improve the performance on existing systems. Again, 10x, 100x better for certain things. And so it's more just a matter of when, right? It's not really an if anymore. A bunch of people are working on different aspects of this. And, you know, I think it's all coming really fast. And so, you know, there's sort of two things that came out in the last week or two that are really relevant to this. It'd be great to get your thoughts on.
Starting point is 00:03:00 One is Open AI announcing that they're not going to allow people to fine-tune models. And the second is Google where they looked at human-generated feedback versus AI-generated feedback for models and sort of fine-tuning models that way. So, and if you want to tell people a bit more about what happened with Open-EI and why that's important. Yeah. So fine-tuning as a capability has been. offered by Open AI for several years, right? But they've made like a specific investment in allowing
Starting point is 00:03:26 people to do that with more sophisticated models in particular, like 3-5, and then also making it possible for more enterprise use cases, right? And if you think about sort of like why that matters at all, as you said, like, you know, you have a bunch of these labs who are working on general capability and working on the sort of direction of scaling laws, like Transformers predictably improve with scale data and compute. But I think what's really interesting is, like, the way every, the way these models end up being used in many business or even consumer application context is against a specific task, right? And so we've talked a lot about like where research effort is being put or compute is being spent in the industry right now. And there's a really,
Starting point is 00:04:10 I think there's really interesting question of we don't even know how good models can be at certain scale, right, at 70 or 30 or 100 billion parameters or more, but not at GPT4 scale based on really high quality data and curation of that data, because it hasn't been explored. And so I think we should talk about some of the different ways you get these models to actually operate against a specific task with either fine-tuning with RLHF against, you know, the reward for your task or with RAG, as you said, in terms of retrieving from a data set that you've specified, right? And there's reasons you would do all three of these. But I think it's actually a pretty big step for Open AI to enable this, because I think there was, at certain points in the,
Starting point is 00:05:01 in the research world, there's been a narrative that like fine-tuning doesn't really matter, right? The general model matters. And I'd be curious if you think that's a change in research point of view or just a commercial decision in terms of labs wanting to make money or that being more important than ever. Yeah, I think everybody realized that fine-tuning works really well when chat GPT came out because what chat GPT is, is they took this model GPT 3.5, which existed at the time, and that wasn't seeing as much usage, at least from people just going in and querying it unless they were really good at prompts. And they basically hired a bunch of people and the people ranked the output of the model
Starting point is 00:05:39 and they effectively fine-tune the model against that feedback from the people who are assessing is this the answer that I wanted based on the prompt that I put in, right? And so fine-tuning really just means you create a lot of feedback, usually at least today, through people responding to output and saying, is it good or bad? And it created a dramatic step function in the utility of GPT 3.5
Starting point is 00:06:01 for end consumers or end users or students or lawyers or all sorts of different types of people. And it really helped, it was kind of the starting gun for this whole AI revolution right now because everybody suddenly realized how powerful these models were. And the model underlying it fundamentally hadn't really changed that much. What they'd done is they'd fine-tuned it with reinforcement learning through human feedback or LHF. And so I think that created this viewpoint that these types of fine-tunings or, you know, we can talk about RAG in a minute.
Starting point is 00:06:30 I'd love to get your thoughts on that, can fundamentally change the user affinity for a product. And so you could imagine in an enterprise, you say, well, I really want to fine tune this model so that it reflects medical data that I have this proprietary that could help make a better doctor assistant. Or I want to fine tune it against this, you know, set of HR responses that are unique to my company so that if I have an employee who really wants to get answer to a question, they can get a really good answer back. And so it really gets into those sorts of things where you can dramatically improve the output of a model against something that you specialize. Do you want to talk about how rag ties into that? because I think that's a really key component of it, too. I think the sort of basic premise with RAG that everybody should understand is you want to retrieve against a specific corpus, right?
Starting point is 00:07:17 And so you're still going to reason. You might have a generation or an answer based on that corpus. But if you pick a set of documents, it could be legal cases. It could be internal company documents. It could be medical information, as you said. Right. So you still want the reasoning capabilities of the model. A diagnosis requires reasoning, but you want it to come from a specific set of data versus
Starting point is 00:07:41 like, let's say, all of the pre-training data of random information on the internet about whether or not you have this disease, right? And every piece of forum conversation about this disease that has ever happened. So, you know, I think of the core driver as like trustworthiness, right, citation, control of information source. And so now you have this architecture where people are using, think of it as like traditional information retrieval techniques and search in combination with these models. I think the other sort of driver besides trustworthiness on these rag approaches is two things. One is cost and the other is like freshness, right? So every time you retrain a model or even fine-tune a model, like there is compute involved, see the idea
Starting point is 00:08:31 that, you know, being able to incorporate new information without retraining and just using the reasoning capabilities of the models, I think it's very attractive to people. And very, that's also related to the freshness point of view, which is like, you actually want the most recent medical research or the cases from this past year. I think that's, that's sort of a set of the drivers behind people being excited to take this approach and use it against their private data sets. Yeah, and that actually helps a lot with hallucinations, right? And so I think it's important to sort of explicitly point that out because one of the knocks on the current set of AI technologies as well, it may hallucinate or say, you know, say things that aren't necessarily
Starting point is 00:09:09 true or cite a legal case that doesn't exist. And by using Rag, you can actually help say, okay, I'm only going to use things that I know exist or I'm going to filter for things that are going to be answers that fit well with, you know, the current set of knowledge that people have relative to these sets of issues. So to your point on trustworthiness, I think it's really important to call it hallucinations explicitly since that's something people keep bringing up is sort of naysayers, oh my gosh, what if it hallucinates and some terrible misinterpretation happens and therefore we need to regulate this thing, right? So it's kind of interesting. You know, I guess related to that, there's this reinforcement learning through human feedback versus
Starting point is 00:09:45 AI feedback. And Google just came out with a really interesting paper on that where, you know, they showed that you can have an AI similarly provide feedback to whether the AI itself is generating good output. And for certain use cases, that works as well as people. And so suddenly, instead of having to hire an army of people to go and help fine-tune these models, you can actually have an AI help fine-tune this model. And I think the early signs that that was going to be true was actually MedPalm 2, where Google showed that they trained a model specifically on medical data, and the output from the model tended to be more correct than human physician experts. And so for certain use cases, we are already seeing AI provide more accurate answers than specialists, experts, right?
Starting point is 00:10:31 And in RLA AIF, you're trying to sort of generalize that and say, what are all the different ways that instead of using expensive people to do this, we can use really cheap AI models to provide that same feedback as sort of train things? And so there's all these techniques and technologies that are coming now as part of this sort of list of six big innovations that are part of the future AI 10X or 100X roadmap. that are starting to fall into place. I think it's a very exciting time. And I think, you know, in the next year, we'll keep seeing stuff like that. So there's a few other announcements that have come out related to this in terms of using different datasets or different models, but coming from social networks. So, for example, Twitter, or I guess now we should call it X, said it will train ML models
Starting point is 00:11:12 off of Twitter data. And that may have really interesting consumer applications or outcomes. And then meta is really now emerging as a primary sponsor for OpenEA. open source models. Lama and Lama 2 have really taken off in sort of the developer and enterprise ecosystem around LLMs. So it'd be great to hear what you think in terms of why are they doing this? Why are they becoming the primary sponsor for open source AI? And how do you think they're going to apply it within their own company? I really draw a analogy from the current sponsorship of META and Zuck of, you know, Lama and the open source model ecosystem to like,
Starting point is 00:11:53 MySQL, right? So for those of us who remember, like, what happened with these open source database companies, MySQL ended up being originally made by this guy, Monty Widenius, and some Swedish company became part of Sun, become part of Oracle. And in the early days, like MySQL would crash and corrupt data. And there were some early internet scale companies like Facebook who wanted to use it, wanted to not be beholden to commercial database vendors, made at scale. made it more robust and contributed back, right? And I think, like, it's a reasonable analogy in terms of, like, some core technology to your company where you don't want to have a vendor, you don't see it as part of your core business model, but you want there to be open source options, right? And so I have a lot of admiration for what meta is doing.
Starting point is 00:12:44 And I think, like, I think it is very likely to be a big mover in the ecosystem, Because if they sponsor some baseline of models that are big enough to be valuable, high quality enough to be valuable with Facebook AI research, and then enough people find these models useful and strategic and they create a developer ecosystem, it's hard for me to picture them not being sustained as an important ecosystem and an alternative to these research labs that in many ways compete with Facebook or meta in different ways and are very expensive to maintain. But if you look at the history of open source, is that really true? So say, for example, you look at Linux, right? And Linux in part was very much sponsored by IBM throughout the late
Starting point is 00:13:27 90s to the tune of in some years a billion dollars a year. And so even these external ecosystems tend to get quite expensive, you know. And the reason that IBM sponsored Linux was to provide a real offset to Microsoft, right? They basically said Microsoft is dominant. On the desktop, they're really getting aggressive on sort of the server and infrastructure world. And so therefore, let's fund this offset for open source. How do you think that analog applies to meta or does it? Or do you just think it's a different reason in terms of why they're pursuing it? Well, I think they're pursuing it because they want to use it and they don't want to be trapped, right?
Starting point is 00:14:02 Oh, sure, but they don't have to open source it, right? They could just continue to develop it like they have been. And so why open source it? One piece of it is like wanting to offset the development costs and the compute costs at some point, right? And that's sort of one of the core premises of open source. They've also done, like, other really related things, like the open compute project. But, you know, if you think about why that analogy does or doesn't apply, right? Like, one is, does Meta want to make money off of this in some sort of, like, B2B way?
Starting point is 00:14:31 If they keep open sourcing it, the answer is no, right? They want to use it in their core consumer businesses. And then, two, like, for this to work, I think one of the ways the analogy breaks down is very much, like, the need for centralized training today. right? It's a complicating factor. Like, can you really coordinate that with the politics and slow decision making of open source communities? I don't know. I think that's challenging. There are interesting folks working on at least these sort of like technical coordination of this as well, right? Like foundry and together. But if you, just to like make explicit like why might they care? My guess is like the ability to use these models. It applies in sort of the more traditional ways. Like we can use them to make the data center. like more energy efficient. We can, and there's been publishing about this, we can use these models to improve, like, ad serving, right? Like, lots of things that matter to the core meta business, but it's also just one of the most interesting things to happen in consumer in a long time, right? You have things like character, inflection, mid-jurney, PICA, experiments like can
Starting point is 00:15:35 of soup, like these things, they have caught the attention of consumers in a way that few things have over the last few years. And so I think it's known that there are Instagram chat bots being tested. Right. And so if this is a path to consumer engagement and then therefore ads and it's going to be a really important element, I think they just want to have access to it without being to hold into a sponsor. What's your view? Yeah. I mean, I think it's amazing that meta has decided to make this move. And I think it's really beneficial to the ecosystem overall. So, you know, at this point, I think Lama 2 is really emerging as the model that a lot of people are rallying around, and obviously that may change over time, but for now, I think it's one of the primary
Starting point is 00:16:14 models people are using on the open source side and the people view is quite high quality. So I think it's super impressive. I think more broadly in social on AI, it's kind of striking that the last large social network in some sense was TikTok, which was launched seven years ago now. So it's been a while since we've seen a major shift. And part of that is because large-scale social products have already been established. And so now you need to sort of pry users away from existing products, which is much harder than just filling time otherwise. I remember talking once with Jack, the founder of Wikihau,
Starting point is 00:16:44 which was like a how-to, you know, community-driven website. And he said that the main way that they lost people who were contributing to Wikihau was they went to social gaming. They were just playing games instead, right? So it was sort of this time and attention shift 10 years ago when you mentioned this to me, right? And so number one is you have to displace other people. number two, a lot of the innovation and social kind of stagnated a little bit for startups, right? It became a lot more, let's do Twitter, but more woke or more right wing, or let's do early Facebook again as a mobile app versus, hey, we're going to reinvent the modality or we're going to reinvent the use case or the communication channel, whatever may be. And it feels like generative AI is the first thing in many years just sort of create that new window or opening. And I think the big social networks like meta and Twitter and others may actually be the biggest beneficiaries of this new way, but there also should be room for startups.
Starting point is 00:17:41 And there's some new things, you know, Can of Soup was in the recent YC batch and they're doing kind of interesting things. And I think it's almost like asking what's the Gen. A.I. Native modality and use case. And typically when you look at social products, you used to have this two by two or some people had like a two by three of, you know, is it broadcast versus mutual follows in terms of network structure. what's the modality, is it images, is it video, et cetera? And then what's the length and persistence of it? Is it long form? Is it ephemeral, et cetera? So, for example, Snapchat started off as, you know, short form broadcast in one-on-one that was ephemeral, right? And so you could kind of map out the whole social world against those dimensions. And now there's this new interesting thing of, you know, new forms of content creation, potentially upending one or two
Starting point is 00:18:27 of those quadrants. So it seems like a very exciting time overall. Yeah. Yeah. I have had a, you know, long-time obsession with Tociao and TikTok and some of the Chinese social companies that really started as like AI-native content aggregators, right? And if you think about what they did, they really figured out this like cold start problem in terms of they, like Tothiao originally, they aggregated news content from other places and then bootstrapped your preferences. They didn't require explicit user input to say, like, I am interested in these. topics. They analyzed your social profile for your interest. They collected like location and
Starting point is 00:19:07 demo and analyzed articles for like quality and topics. So they had these like rich per user models of engagement based on interaction data. And then you have this magical experience of like a better content feed that then drove the iteration around better labeling. And I think exactly as you said, if those companies figured out like the cold start unrelevance, maybe the opportunity, I think one of the potential opportunities in this generation of social is like cold start on the content and self, right? Like you've seen other amazing companies like the Instagrams of the world, right? They create tools for content creation for like magically compelling assets that are much easier and then like turn it into a social network. And so generation feels like a really compelling answer in terms of like how to have.
Starting point is 00:19:58 have a content feed that is both like really engaging for you and then giving people creation superpowers. Yeah. And I think Mid Journey and Pekar are two great examples of that to the point earlier. And then character is sort of a form of that if you decide to create your own character or sort of interact with something that's more customized there. So it does seem like there are these really interesting shifts that are happening. And then the question is, is it more for creation and sharing or does it become a new social product or a new communication product? In other words, is it GIFI or is it, you know, Facebook, right? And Lenza was a good example of Giffy, right? It was used to basically create content that you share in other social networks. And the question
Starting point is 00:20:39 is what are going to be the big consumer apps that sort of emerge on top of that? And again, it may just be meta again, right? But I think it's a super interesting question and probably the most exciting time in social for a very long time. And it's kind of this oddly almost ignored an area out from a entrepreneurship and founder perspective right now. Everybody's rushing at the enterprise stuff and the infrastructure and, you know, that whole stack. And it's almost like the generation of people who were going to start social products all did them five years ago and did the, you know, let's do Twitter again. And the generation that's really focused now kind of grew up where SaaS was sort of opportunity or SaaS and Dev tools were the opportunities that everybody
Starting point is 00:21:16 was mining again. So it'll be interesting to see whether or not that shifts back in any meaningful way. the one other thing that I think is kind of interesting just related to entrepreneurship and AI right now and I was talking to a founder about this where they were trying to do something really hard and by really hard I mean addressing a really hard market by using Gen AI and early in markets
Starting point is 00:21:37 like when a new technology shifts and disrupts a whole market you actually want to just do the easy stuff right? Why do the hard stuff? There's so much low-hanging fruit why don't you just go after the stuff is super easy and my sort of advice to founders generically on this stuff. It's like, don't do the hard stuff right now. Or if it's hard, do something that's technically hard that enables a giant breakthrough in terms of use case. But don't actually do the hard market because there's so many easy markets right now.
Starting point is 00:22:01 You should just go for the easy stuff. And if you're grinding and grinding and grinding and not getting customer attention, don't spend more time on it. It's just not worth it right now. Now, five years from now, when the use of these technologies are a bit more saturated, that's when you have to go do the hard stuff. But, you know, it's kind of interesting to to think about, you know, prior technology waves and when should you do the easy versus hard? Yeah, I was actually just talking to some of the founders that are in our accelerator right now that come from like really great technical and research backgrounds. And they were reaching for a problem broadly in the engineering and code generation space that was very ambitious, right? And I could see
Starting point is 00:22:39 kind of a solve it all type problem. And it's not that it's not valuable. It's just that there is so much you could do that is, as you point out, easier and valuable today and requires pushing the bounds of research, but you have far higher likelihood of having something that's useful to give to customers this year with far less risk. And I don't mean to constrain people's ambitions, but the ability to give yourself multiple at-bats with the wind at your back in terms of the entire field progressing versus trying to get out in front of everyone else with a multi-year research goal when there's like just gold hanging out everywhere. You know, my orientation is I think similar here. Yeah, it's no GPU before product market fit. I think that's
Starting point is 00:23:27 the takeaway. A lot slogan of the year. Okay. Awesome. Fun to hang out and talk about the news the week. Find us on Twitter at No Prior's Pod. Subscribe to our YouTube channel if you want to see faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-dash priors.com.
