No Priors: Artificial Intelligence | Technology | Startups - Model Quality, Fine Tuning & Meta Sponsoring Open Source Ecosystem
Episode Date: October 9, 2023
What Does it Take to Improve by 10x or 100x? This week is another host-only episode. Sarah and Elad talk about the path to better model quality, the potential for fine tuning to different use cases, retrieval systems (RAG), feedback systems (RLHF, RLAIF) and Meta's sponsorship of the open source model ecosystem. Plus Sarah and Elad ask if we're finally at the beginning of a new set of consumer applications and social networks. Sign up for new podcasts every week. Email feedback to show@no-priors.com. Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil
Show Notes:
0:03:00 - AI Models and Open AI Advances
0:08:59 - Addressing Hallucinations in AI Models
0:13:22 - Open Source Models in Consumer Engagement
0:16:23 - New Trends in Social Content Creation
0:21:53 - Balancing Ambition With Realistic Customer Expectations
Transcript
Hi, No Priors listeners.
Time for a host-only episode.
This week, Elad and I talk about the path to better model quality from here.
the potential of fine-tuning, RLHF, RLAIF, RAG and retrieval systems generally,
Meta's sponsorship of the open-source model ecosystem,
and finally, the beginning of a new set of consumer applications and social networks.
Thanks for tuning in.
So one thing everybody is thinking about is what it takes to get to 10x or 100x better AI systems.
Like I think it'd be useful to just sort of enumerate the elements of a step-function improvement.
Elad, what do you think?
Yeah, you know, it's interesting because there's a few different aspects of that that people always talk about.
There's scalability of data sets and compute and parameters and all these things.
But the reality is, I think a lot of people believe that in order to 10x or even 100x the use cases and usage of AI, outside of scaling,
there are things that could just be done on existing models today.
So you don't need to wait for GPT-7 or whatever.
You can start with GPT-4 or GPT-3.5 and add these things.
And I think they are kind of bucketed into five or six areas.
Number one is multimodality.
So that means being able to use text or voice or images or video as both input and output.
So you should be able to talk to a model, type to it,
upload an image and ask about the image.
And then it could output anything from code to a short video for you.
Second is long context windows.
So basically, when you prompt a model, you're feeding it data or commands or other things,
and everybody realizes that you need longer and longer and longer context windows.
So Magic, for example, is doing that for code.
You should be able to dump an entire code repo into a coding model instead of having to do it piecemeal.
Third, which we're going to talk about today, is model customization.
So that's things like fine-tuning, something known as RAG.
There's data cleaning, there's labeling.
There's a bunch of stuff that just makes models work better for you.
Fourth is some form of memory, so the AI actually remembers what it's doing.
Fifth is some form of recursion, so looping back and reusing models.
And then sixth, which is related, is potentially a bunch of small models that are very specialized,
being orchestrated by a central model or sort of AI router that says, well, for this specific task or use case,
I'm going to route the prompt or the data or the output into this other model that's doing this other thing,
which is basically how the human brain works, right?
you process visual information through your visual cortex, but then you use other parts of
your brain to make decisions, right? And so it's very similar to what evolution sort of decided
was an optimal approach. But I think it's really interesting because I think many people in
the field know that these five or six things are absolutely coming. And they can dramatically
improve the performance on existing systems. Again, 10x, 100x better for certain things.
And so it's more just a matter of when, right? It's not really an if anymore. A bunch of people
are working on different aspects of this.
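The "AI router" idea can be sketched as a toy dispatcher. Everything here is a stand-in for illustration: the keyword classifier replaces a small routing model, and the specialist "models" are stub functions, not real systems.

```python
# Toy sketch of the "AI router" idea: a central dispatcher classifies each
# prompt and routes it to a specialized model. The keyword lookup below is
# a stand-in for a small routing model; the handler names are hypothetical.

def vision_model(prompt):
    return f"[vision model] analyzed: {prompt}"

def code_model(prompt):
    return f"[code model] generated: {prompt}"

def general_model(prompt):
    return f"[general model] answered: {prompt}"

# Map of signals -> specialist. A real router would be learned, not hand-coded.
ROUTES = {
    "image": vision_model,
    "photo": vision_model,
    "function": code_model,
    "bug": code_model,
}

def route(prompt):
    """Pick a specialist based on the prompt; fall back to the generalist."""
    for keyword, model in ROUTES.items():
        if keyword in prompt.lower():
            return model(prompt)
    return general_model(prompt)
```

The point is just the shape of the pattern: one cheap decision up front, then the prompt flows to whichever specialized model handles that task.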
And, you know, I think it's all coming really fast.
And so, you know, there's sort of two things that came out in the last week or two
that are really relevant to this.
It'd be great to get your thoughts on.
One is OpenAI announcing that they're now going to allow people to fine-tune models.
And the second is Google, where they looked at human-generated feedback versus AI-generated
feedback for models, and sort of fine-tuning models that way.
So, do you want to tell people a bit more about what happened with OpenAI and why that's
important?
Yeah.
So fine-tuning as a capability has been offered by OpenAI for several years, right? But they've made like a specific investment in allowing
people to do that with more sophisticated models in particular, like 3.5, and then also making it
possible for more enterprise use cases, right? And if you think about sort of like why that
matters at all, as you said, like, you know, you have a bunch of these labs who are working on
general capability and working on the sort of direction of scaling laws, like Transformers predictably
improve with scale of data and compute. But I think what's really interesting is, like, the way
these models end up being used in many business or even consumer application
contexts is against a specific task, right? And so we've talked a lot about, like, where research
effort is being put or compute is being spent in the industry right now. And I think there's a
really interesting question of: we don't even know how good models can be at a certain
scale, right, at 70 or 30 or 100 billion parameters or more, but not at GPT-4 scale, based on
really high quality data and curation of that data, because it hasn't been explored.
And so I think we should talk about some of the different ways you get these models to actually
operate against a specific task: with fine-tuning, with RLHF against, you know, the reward for your
task, or with RAG, as you said, in terms of retrieving from a data set that you've specified,
right? And there's reasons you would do all three of these. But I think it's actually a pretty
big step for OpenAI to enable this, because I think there has been, at certain points
in the research world, a narrative that, like, fine-tuning doesn't really matter,
right? The general model matters. And I'd be curious if you think that's a change in research
point of view or just a commercial decision in terms of labs wanting to make money or that being
more important than ever. Yeah, I think everybody realized that fine-tuning works really well
when ChatGPT came out, because what ChatGPT is, is they took this model GPT-3.5, which existed
at the time, and that wasn't seeing as much usage, at least from people just going in and querying
it unless they were really good at prompts. And they basically hired a bunch of people
and the people ranked the output of the model
and they effectively fine-tuned the model
against that feedback from the people who are assessing
is this the answer that I wanted based on the prompt that I put in, right?
And so fine-tuning really just means you create a lot of feedback,
usually at least today, through people responding to output
and saying, is it good or bad?
And it created a dramatic step function
in the utility of GPT-3.5
for end consumers or end users or students or lawyers
or all sorts of different types of people.
And it really helped, it was kind of the starting gun for this whole AI revolution right now
because everybody suddenly realized how powerful these models were.
And the model underlying it fundamentally hadn't really changed that much.
What they'd done is they'd fine-tuned it with reinforcement learning through human feedback, or RLHF.
And so I think that created this viewpoint that these types of fine-tunings or, you know,
we can talk about RAG in a minute.
I'd love to get your thoughts on that, can fundamentally change the user affinity for a product.
And so you could imagine in an enterprise, you say, well, I really want to fine-tune this model so that it reflects medical data that I have that's proprietary, that could help make a better doctor assistant.
Or I want to fine-tune it against this, you know, set of HR responses that are unique to my company, so that if I have an employee who really wants to get an answer to a question, they can get a really good answer back.
And so it really gets into those sorts of things where you can dramatically improve the output of a model against something that you specialize.
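As a rough sketch of what that kind of task-specific fine-tuning data looks like, here is a toy example of supervised prompt/response pairs serialized as JSONL. The HR questions, the answers, and the exact schema are all hypothetical illustrations; real fine-tuning services each define their own format.

```python
import json

# Hypothetical supervised fine-tuning examples: company-specific
# prompt/response pairs like the HR scenario described above. The
# chat-message shape mirrors common fine-tuning formats but is not
# any specific vendor's schema.
examples = [
    {
        "messages": [
            {"role": "user", "content": "How many vacation days do I get?"},
            {"role": "assistant", "content": "Full-time employees accrue 20 days per year."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Who approves parental leave?"},
            {"role": "assistant", "content": "Your manager, then HR confirms eligibility."},
        ]
    },
]

# Fine-tuning services typically take one JSON object per line (JSONL).
jsonl = "\n".join(json.dumps(e) for e in examples)
```

A few hundred to a few thousand pairs like this is the raw material; the fine-tuning job then nudges the base model toward your domain's answers.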
Do you want to talk about how RAG ties into that?
because I think that's a really key component of it, too.
I think the sort of basic premise with RAG that everybody should understand is you want to
retrieve against a specific corpus, right?
And so you're still going to reason.
You might have a generation or an answer based on that corpus.
But if you pick a set of documents, it could be legal cases.
It could be internal company documents.
It could be medical information, as you said.
Right.
So you still want the reasoning capabilities of the model.
A diagnosis requires reasoning, but you want it to come from a specific set of data versus
like, let's say, all of the pre-training data of random information on the internet about
whether or not you have this disease, right? And every piece of forum conversation about this
disease that has ever happened. So, you know, I think of the core driver as like trustworthiness,
right, citation, control of information source. And so now you have this architecture where
people are using, think of it as like traditional information retrieval techniques and search
in combination with these models. I think the other sort of driver besides trustworthiness on these
RAG approaches is two things. One is cost and the other is, like, freshness, right? So every time
you retrain a model or even fine-tune a model, like, there is compute involved. So the idea
that, you know, being able to incorporate new information without retraining, just using the
reasoning capabilities of the models, I think is very attractive to people. And that's also
related to the freshness point of view, which is like, you actually want the most recent medical
research or the cases from this past year. I think that's, that's sort of a set of the drivers
behind people being excited to take this approach and use it against their private data sets.
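A minimal sketch of the retrieval step behind RAG: pick the most relevant documents from a chosen corpus, then ground the generation prompt in them. The three-document corpus is made up, and bag-of-words cosine similarity stands in for the learned embeddings and vector stores real systems use.

```python
import math
from collections import Counter

# Made-up corpus standing in for a private data set (legal cases,
# internal docs, medical literature, etc.).
CORPUS = [
    "Case 2021-14: the court held the contract was void for vagueness.",
    "Case 2022-07: damages were limited to direct losses only.",
    "Internal memo: vacation policy allows 20 days per year.",
]

def _vec(text):
    # Bag-of-words term counts; real systems use learned embeddings.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query; return the top k."""
    qv = _vec(query)
    ranked = sorted(CORPUS, key=lambda d: _cosine(qv, _vec(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Ground the generation step in retrieved text, enabling citation."""
    context = "\n".join(retrieve(query))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"
```

The model still does the reasoning, but the answer is constrained to documents you specified, which is where the trustworthiness, freshness, and citation benefits come from.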
Yeah, and that actually helps a lot with hallucinations, right? And so I think it's important
to sort of explicitly point that out, because one of the knocks on the current set of AI
technologies is, well, it may hallucinate, or say, you know, say things that aren't necessarily
true, or cite a legal case that doesn't exist. And by using RAG, you can actually help say,
okay, I'm only going to use things that I know exist or I'm going to filter for things that
are going to be answers that fit well with, you know, the current set of knowledge that people
have relative to these sets of issues. So to your point on trustworthiness, I think it's really
important to call out hallucinations explicitly, since that's something people keep bringing up
as sort of naysayers: oh my gosh, what if it hallucinates and some terrible misinterpretation happens
and therefore we need to regulate this thing, right? So it's kind of interesting. You know,
I guess related to that, there's this reinforcement learning through human feedback versus
AI feedback. And Google just came out with a really interesting paper on that where, you know,
they showed that you can have an AI similarly provide feedback on whether the AI itself is generating
good output. And for certain use cases, that works as well as people. And so suddenly, instead of
having to hire an army of people to go and help fine-tune these models, you can actually
have an AI help fine-tune this model. And I think the early signs that that was going to be true
was actually Med-PaLM 2, where Google showed that they trained a model specifically on medical
data, and the output from the model tended to be more correct than human physician experts.
And so for certain use cases, we are already seeing AI provide more accurate answers than specialist experts, right?
And in RLAIF, you're trying to sort of generalize that and say, what are all the different ways that, instead of using expensive people to do this, we can use really cheap AI models to provide that same feedback to sort of train things?
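A toy sketch of that RLAIF labeling loop: an AI judge ranks candidate outputs, producing the same kind of preference pairs a human labeler would hand to the fine-tuning step. The judge here is a stub heuristic (it prefers answers that cite a source and stay short) standing in for a real labeling model, and the example answers are invented.

```python
# RLAIF sketch: replace the army of human rankers with a judge model.
# judge() below is a stub heuristic, not a real model.

def judge(prompt, answer):
    """Score an answer; higher is better. Rules are illustrative only."""
    score = 0.0
    if "source:" in answer.lower():
        score += 1.0              # reward grounded, cited answers
    score -= len(answer) / 1000   # mild penalty for rambling
    return score

def label_preferences(prompt, candidates):
    """Return (chosen, rejected) pairs, exactly as a human labeler would."""
    ranked = sorted(candidates, key=lambda a: judge(prompt, a), reverse=True)
    return [(ranked[0], worse) for worse in ranked[1:]]

pairs = label_preferences(
    "What causes tides?",
    ["The moon's gravity. Source: NOAA.", "Probably the wind."],
)
```

Downstream, those (chosen, rejected) pairs feed the same reward-model and fine-tuning machinery as RLHF; only the source of the preference signal changes.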
And so there's all these techniques and technologies that are coming now as part of this sort of list of six big innovations in the future AI 10x or 100x roadmap
that are starting to fall into place.
I think it's a very exciting time.
And I think, you know, in the next year, we'll keep seeing stuff like that.
So there's a few other announcements that have come out related to this in terms of using
different datasets or different models, but coming from social networks.
So, for example, Twitter, or I guess now we should call it X, said it will train ML models
off of Twitter data.
And that may have really interesting consumer applications or outcomes.
And then Meta is really now emerging as a primary sponsor for
open source models. Llama and Llama 2 have really taken off in sort of the developer and enterprise
ecosystem around LLMs. So it'd be great to hear what you think in terms of why are they doing
this? Why are they becoming the primary sponsor for open source AI? And how do you think they're
going to apply it within their own company? I really draw an analogy from the current sponsorship
by Meta and Zuck of, you know, Llama and the open source model ecosystem to, like,
MySQL, right? So for those of us who remember, like, what happened with these open source database companies: MySQL ended up being originally made by this guy, Monty Widenius, and some Swedish company, became part of Sun, became part of Oracle. And in the early days, like, MySQL would crash and corrupt data. And there were some early internet-scale companies like Facebook who wanted to use it, wanted to not be beholden to commercial database vendors, made it scale,
made it more robust and contributed back, right?
And I think, like, it's a reasonable analogy
in terms of, like, some core technology
to your company where you don't want to have a vendor,
you don't see it as part of your core business model,
but you want there to be open source options, right?
And so I have a lot of admiration for what meta is doing.
And I think, like, I think it is very likely to be a big mover in the ecosystem,
because if they sponsor some baseline of models that are big enough to be valuable, high
quality enough to be valuable with Facebook AI research, and then enough people find these
models useful and strategic and they create a developer ecosystem, it's hard for me to
picture them not being sustained as an important ecosystem and an alternative to these research
labs that in many ways compete with Facebook or Meta, and are very expensive
to maintain. But if you look at the history of open source, is that really true? So say, for example,
you look at Linux, right? And Linux in part was very much sponsored by IBM throughout the late
90s to the tune of in some years a billion dollars a year. And so even these external
ecosystems tend to get quite expensive, you know. And the reason that IBM sponsored Linux was
to provide a real offset to Microsoft, right? They basically said Microsoft is dominant on the
desktop, they're really getting aggressive on sort of the server and infrastructure world. And so
therefore, let's fund this offset for open source.
How do you think that analogy applies to Meta, or does it?
Or do you just think it's a different reason in terms of why they're pursuing it?
Well, I think they're pursuing it because they want to use it and they don't want to be trapped, right?
Oh, sure, but they don't have to open source it, right?
They could just continue to develop it like they have been.
And so why open source it?
One piece of it is like wanting to offset the development costs and the compute costs at some point, right?
And that's sort of one of the core premises of open source.
They've also done, like, other really related things, like the Open Compute Project.
But, you know, if you think about why that analogy does or doesn't apply, right?
Like, one is, does Meta want to make money off of this in some sort of, like, B2B way?
If they keep open sourcing it, the answer is no, right?
They want to use it in their core consumer businesses.
And then, two, like, for this to work, I think one of the ways the analogy breaks down is very much, like, the need for centralized training today,
right? It's a complicating factor. Like, can you really coordinate that with the politics and slow decision making of open source communities? I don't know. I think that's challenging. There are interesting folks working on at least the sort of technical coordination of this as well, right? Like Foundry and Together. But just to, like, make explicit why they might care: my guess is, like, the ability to use these models applies in sort of the more traditional ways. Like, we can use them to make the data centers
more energy efficient. We can, and there's been publishing about this, use these
models to improve, like, ad serving, right? Like, lots of things that matter to the core
Meta business, but it's also just one of the most interesting things to happen in consumer in a long
time, right? You have things like Character, Inflection, Midjourney, Pika, experiments like Can
of Soup. Like, these things have caught the attention of consumers in a way that few
things have over the last few years. And so I think it's known that there are Instagram chat
bots being tested, right? And so if this is a path to consumer engagement and then therefore
ads, it's going to be a really important element. I think they just want to have access to
it without being beholden to a sponsor. What's your view? Yeah, I mean, I think it's amazing that
meta has decided to make this move. And I think it's really beneficial to the ecosystem overall.
So, you know, at this point, I think Llama 2 is really emerging as the model that a lot of people are rallying
around, and obviously that may change over time, but for now, I think it's one of the primary
models people are using on the open source side, and one that people view as quite high quality.
So I think it's super impressive.
I think more broadly in social on AI, it's kind of striking that the last large social network
in some sense was TikTok, which was launched seven years ago now.
So it's been a while since we've seen a major shift.
And part of that is because large-scale social products have already been established.
And so now you need to sort of pry users away from existing products, which is much harder
than just filling time otherwise. I remember talking once with Jack, the founder of wikiHow,
which was like a how-to, you know, community-driven website. And he said that the main way that
they lost people who were contributing to wikiHow was they went to social gaming. They were
just playing games instead, right? So it was sort of this time and attention shift 10 years ago
when he mentioned this to me, right? And so number one is you have to displace other people.
Number two, a lot of the innovation in social kind of stagnated a little bit for startups, right?
It became a lot more, let's do Twitter, but more woke or more right wing, or let's do early Facebook again as a mobile app versus, hey, we're going to reinvent the modality or we're going to reinvent the use case or the communication channel, whatever may be.
And it feels like generative AI is the first thing in many years to sort of create that new window or opening.
And I think the big social networks like Meta and Twitter and others may actually be the biggest beneficiaries of this new wave, but there also should be room for startups.
And there's some new things, you know, Can of Soup was in the recent YC batch and they're doing kind of interesting things.
And I think it's almost like asking, what's the gen AI-native modality and use case?
And typically when you look at social products, you used to have this two-by-two, or some people had, like, a two-by-three, of, you know, is it broadcast versus mutual follows in terms of network structure,
what's the modality, is it images, is it video, et cetera? And then what's the length and
persistence of it? Is it long form? Is it ephemeral, et cetera? So, for example, Snapchat started
off as, you know, short-form broadcast and one-on-one that was ephemeral, right? And so
you could kind of map out the whole social world against those dimensions. And now there's this
new interesting thing of, you know, new forms of content creation, potentially upending one or two
of those quadrants. So it seems like a very exciting time overall. Yeah. Yeah. I have
had a, you know, long-time obsession with Toutiao and TikTok and some of the Chinese social
companies that really started as, like, AI-native content aggregators, right?
And if you think about what they did, they really figured out this, like, cold start problem,
in terms of, like, Toutiao originally aggregated news content from other places
and then bootstrapped your preferences.
They didn't require explicit user input to say, like, I am interested in these
topics. They analyzed your social profile for your interests. They collected, like, location and
demo and analyzed articles for like quality and topics. So they had these like rich per user
models of engagement based on interaction data. And then you have this magical experience of like
a better content feed that then drove the iteration around better labeling. And I think exactly as
you said, if those companies figured out, like, the cold start on relevance, maybe the opportunity,
I think one of the potential opportunities in this generation of social is, like, cold start on the content itself, right?
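The interaction-driven ranking described above can be sketched as a toy loop: no explicit topic selection, just per-user affinity scores updated from watch behavior and used to re-rank the feed. The topics, items, and update rule are all invented for illustration.

```python
from collections import defaultdict

# Toy sketch of implicit-feedback feed ranking: each view or skip updates
# a per-user topic affinity, and the feed is re-ranked from those signals.
# Real systems learn far richer per-user engagement models.

class FeedModel:
    def __init__(self):
        self.scores = defaultdict(float)  # topic -> learned affinity

    def observe(self, topic, watched_fraction):
        # Implicit feedback: finishing an item raises affinity, skipping lowers it.
        self.scores[topic] += watched_fraction - 0.5

    def rank(self, items):
        """items: list of (title, topic); order by learned affinity."""
        return sorted(items, key=lambda it: self.scores[it[1]], reverse=True)

m = FeedModel()
m.observe("basketball", 1.0)   # watched to the end
m.observe("cooking", 0.1)      # skipped quickly
feed = m.rank([("Knife skills 101", "cooking"), ("Best dunks", "basketball")])
```

The cold start trick is that none of this requires the user to declare interests: the ranking bootstraps itself from the first few interactions.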
Like you've seen other amazing companies like the Instagrams of the world, right?
They create tools for content creation for like magically compelling assets that are much easier and then like turn it into a social network.
And so generation feels like a really compelling answer in terms of, like, how to
have a content feed that is both, like, really engaging for you and then giving people creation
superpowers. Yeah. And I think Midjourney and Pika are two great examples of that, to the point
earlier. And then Character is sort of a form of that if you decide to create your own character
or sort of interact with something that's more customized there. So it does seem like there are
these really interesting shifts that are happening. And then the question is, is it more for
creation and sharing or does it become a new social product or a new communication product? In other
words, is it Giphy or is it, you know, Facebook, right? And Lensa was a good example of Giphy, right?
It was used to basically create content that you share in other social networks. And the question
is what are going to be the big consumer apps that sort of emerge on top of that? And again,
it may just be meta again, right? But I think it's a super interesting question and probably the most
exciting time in social for a very long time. And it's kind of this oddly, almost ignored area
from an entrepreneurship and founder perspective right now. Everybody's rushing at the enterprise
stuff and the infrastructure and, you know, that whole stack. And it's almost like the generation
of people who were going to start social products all did them five years ago and did the,
you know, let's do Twitter again. And the generation that's really focused now kind of grew up
where SaaS was sort of the opportunity, or SaaS and dev tools were the opportunities that everybody
was mining. So it'll be interesting to see whether or not that shifts back in any meaningful way.
The one other thing that I think is kind of interesting
just related to entrepreneurship and AI right now
and I was talking to a founder about this
where they were trying to do something really hard
and by really hard I mean
addressing a really hard market by using Gen AI
and early in markets
like when a new technology shifts and disrupts a whole market
you actually want to just do the easy stuff
right? Why do the hard stuff? There's so much low-hanging fruit
why don't you just go after the stuff that's super easy?
And my sort of advice to founders
generically on this stuff is, like, don't do the hard stuff right now. Or if it's hard,
do something that's technically hard that enables a giant breakthrough in terms of use case.
But don't actually do the hard market because there's so many easy markets right now.
You should just go for the easy stuff. And if you're grinding and grinding and grinding and
not getting customer attention, don't spend more time on it. It's just not worth it right now.
Now, five years from now, when the use of these technologies is a bit more saturated,
that's when you have to go do the hard stuff. But, you know, it's kind of interesting
to think about, you know, prior technology waves and when you should do the easy versus the hard stuff.
Yeah, I was actually just talking to some of the founders that are in our accelerator right now that
come from like really great technical and research backgrounds. And they were reaching for a problem
broadly in the engineering and code generation space that was very ambitious, right? And I could see
it was kind of a solve-it-all type problem. And it's not that it's not valuable. It's just that there is
so much you could do that is, as you point out, easier and valuable today and doesn't require pushing
the bounds of research, where you have a far higher likelihood of having something that's useful
to give to customers this year with far less risk. And I don't mean to constrain people's
ambitions, but the ability to give yourself multiple at-bats with the wind at your back in
terms of the entire field progressing versus trying to get out in front of everyone else with a
multi-year research goal when there's like just gold hanging out everywhere. You know, my
orientation is, I think, similar here. Yeah, it's no GPU before product market fit. I think that's
the takeaway. Elad, slogan of the year. Okay. Awesome. Fun to hang out and talk about the news of
the week. Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see
faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new
episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.