No Priors: Artificial Intelligence | Technology | Startups - Future of LLM Markets, Consolidation, and Small Models with Sarah and Elad

Episode Date: September 12, 2024

In this episode of No Priors, Sarah and Elad go deep into what's on everyone's mind. They break down new partnerships and consolidation in the LLM market, specialization of AI models, and AMD's strategic moves. Plus, Elad is looking for a humanoid robot. Sign up for new podcasts every week. Email feedback to show@no-priors.com. Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil. Show Notes: (0:00) Introduction (0:24) LLM market consolidation (2:18) Competition and decreasing API costs (3:58) Innovation in LLM productization (8:20) Comparing the LLM and social network market (11:40) Increasing competition in image generation (13:21) Trend in smaller models with higher performance (14:43) Areas of innovation (17:33) Legacy of Airbnb and Uber pushing boundaries (24:19) AMD acquires ZT (25:49) Elad's looking for a robot

Transcript
Starting point is 00:00:00 Hi, listeners. Welcome to No Priors. Today, Elad and I are just hanging out. We're going to talk about LLM consolidation, what's going on in chips, I think an interesting dynamic around what type of risk you should take as an AI company and pushing the envelope, and some big transactions. So let's get into it. First topic. Elad, are we done here? Is it too late? Is the LLM market consolidated? It's such an interesting question, right? Basically, what we're seeing is a number of model companies are either, you know, having their teams join larger enterprises, so that may be parts of Inflection or parts of Character or parts of other companies, parts of Adept to AWS. And so, you know, there's sort of one dynamic going on there with the model side. And many of those companies are continuing to exist, right? Like, you know, some of the products are still running and being used in different ways. At the same time, there's enormous capital moats emerging to get to the biggest scale for foundation models. And so if you look at it, you know, these companies are
Starting point is 00:01:01 now raising billions or tens of billions of dollars, often either from hyperscalers, so the Amazons and Microsofts of the world, or from sovereigns, right? Because those are the only people who can actually give you billions and billions of dollars. The venture capital industry is just too small to actually be able to support the next round for these companies. So everybody's kind of partnering up. And so it's a really interesting question to ask, well, for all the other players in the market, where are they going to get these ever-rising amounts of capital? And who do they partner with? Does Apple end up with a partner?
Starting point is 00:01:29 Does Samsung end up with a partner? Does, you know, X, Y, Z, other company end up with a partner. And so you can kind of map, like, all the potential partners to all the model companies and just ask, how does all that fall out? And then in parallel, the big hypers have an incentive to fund these companies simply because it, you know, in some cases also translates over to more cloud utilization as an industry in general. The incentives start to flip between BCs, clouds, other strategics and sovereigns in terms
Starting point is 00:01:55 or what they want to do, it does seem like it's increasingly hard to think that most companies will end up being competitive outside of a fundamental breakthrough in the model architecture or cost of actually training and then running inference on the model or doing the post-training side. So I think it's a really interesting open question, but it does feel like we're moving into a stage of more and more consolidation. I don't know. What do you think about it? Most that makes sense to me. I would argue that the market has become more competitive, not less of the last, like, year and a half. Maybe it's competitive between a set of players that have, like, as you described it,
Starting point is 00:02:28 you know, a capital mode, that there's some breakaway scale. But the dynamic now, at least from the consumption side, is there's continual and aggressive performance increases and, like, competition on the benchmark and price decreases. And also, you know, real open source players. And so you can have consolidation and people not necessarily making money yet. I think you actually raise a really interesting point, which shouldn't be under discussed, which is the API costs have dropped something like 200X in the last 18 to 24 months or something along that order of magnitude. David on my team actually pulled together a chart of all the pricing for all the various models and what that looks like over time.
Starting point is 00:03:13 And it's dramatic in terms of how cheap, you know, the dollars per million tokens has gotten. And so I think related to that, if you have a 200x decrease in the cost of running these things or inferencing these things, and part of that is distillation, part of that is like, what are you actually using in terms of the generation of GPU, et cetera, et cetera. The actual margin available and the revenue available is increasing from the perspective of usage, but it's becoming harder and harder to just go out and compete with the model, at least as an API business, right? And so that kind of pushes you into specialization into other areas of doing either bespoke specialized models or specific types of post training or vertical applications or things like that. I think the other way that you could look at the consolidation is just like what is the argument for capital at that scale from a business perspective? And like the mostly unsaid thing is like really like it's still AGI is the business, right? There will be emergent models. Emergent behaviors and capabilities in the model where it will figure out how to make.
Starting point is 00:04:17 money for us or it will be obvious, like, how valuable it is. But I think in the more, you know, immediate, and I'm not even saying that's wrong, but I'd say in the more immediate and like two to three horizon, you know, you have consumer as a business, either apps, like advertising or subscriptions, and nobody's gone the advertising route in anger yet or enterprise as a business. Both of these are real today. But I think also it's become like much more of a product fight, right? You have the big hires of Mikey K and Kevin Wheel at Anthropic and Open AI. I think you see players trying to build moats around the capital mode and research that they've done to make more than just chat as an interface. And so I don't, I think on both sides, like you're going to see providers try
Starting point is 00:05:12 and push customers down a more locked-in path in terms of, like, APIs, right? We saw this with, you know, AWS. You're going to have sophistication around, like, don't just do, it's a storage bucket. You'll have prompt caching and JSON output interfaces, fine-tuning. And that actually makes it much less of a commodity market if people adopt it, because it's easier. Yeah, there's a lot more features being built in. And I think you're referring to sort of what Anthropic has done on the caching side, which is a really interesting move in terms of, you know, how that impacts cost and timing and
Starting point is 00:05:46 everything else or latency. Yeah. And I do think I'm pretty excited as a consumer, like what we should expect from interfaces, right? Not that chat goes away, but you can imagine much smarter chat with automatic contacts and different surfaces. So I do think that, like, you know, there's a question about whether or not you can compete with the consolidation. There's a question of how the big players compete.
Starting point is 00:06:09 But the challenges, I think, that are possible in the market that people are trying today or the reasons you could still go after, it would be like if people are taking different, very different reasoning approaches where, you know, you can collect the amount of capital required to get to competitive scale, which decreases, you know, when you're repeating work that has already been done. and because you have the benefit of like, you know, the hardware progress that continues to be made. But here you have people working on, you know, math and code for self-play. And so I think that's interesting. It's not necessarily like purely different architecture, but what the next level of scaling is. And then just, you know, distillation and relevance of small model, fine-tuning. I think is like another open question of like how people are going to really use these things. Yeah. And I think it's important to clarify that we're talking specifically about language models,
Starting point is 00:07:12 right? And so there's lots of other model types that will be coming over time in physics and biology and material science and image gen of different forms, et cetera. And, you know, in some cases these things are going multimodal, but in many cases you're going to have unique models for each. And really what we're referring to right now is sort of this core large language model market it and how that evolves over time. And then to your point, there's other pieces on top of that that could be used either pro-language models or other types of models in terms of reasoning models, agentic flows. It's almost like an orthogonal axis.
Starting point is 00:07:45 And then the third piece of it is the differentiation within the infrastructure around the model. So you mentioned caching as an example. And then there's a long context window. There's rag. There's all sorts of other things as well. And so I do think as we think about how all this stuff evolves, we're going to see evolution across all three axes.
Starting point is 00:08:01 And the real question that we're trying to address right now is just simply for the core LLM market. You're building better and better larger language models. You know, how does that evolve? And there it does feel like things have consolidated a bit. But, you know, it's funny, you look at the history of social networks. And everybody thought this company called Friendster was going to win, and everybody thought MySpace was going to win.
Starting point is 00:08:23 And then Facebook emerged. And by the time Facebook emerged, everybody said, well, it's just a commodity market and they'll never be a long-term differentiator. and then Facebook won sort of the core social piece. And then even after that, you had Instagram and you had Twitter and you had Snapchat and eventually bite dance and TikTok, right? So there are these ongoing waves of stuff even after people called the end of social.
Starting point is 00:08:45 And so I think the same thing is likely to be true here where there'll be certain people who start to grab parts of the market. You know, LinkedIn became the sort of enterprise identity social network or whatever you want to call it, right? Your resume social network. But then Facebook became one core piece, Insta became one core piece, Twitter became one core piece, et cetera. You know, Twitter was kind of news and real-time information. The same thing should have happened here over time.
Starting point is 00:09:09 Do you think the direction of other domains, let's say like video or audio or other model domains, goes in the direction of this commoditization? I think the reality is that it's going to be general purpose models for certain things and then specialized ones for other application areas. and that could be wrong, right? It comes down to what's the degree of generalizability that you have not only in the model capabilities, but then also in the tooling around it,
Starting point is 00:09:36 and then does the tooling need to be vertically integrated with the model? And to say, for example, you have a really good image gen product, and it may have artistic applications, it may have graphic design applications, it may have UI design applications. Is that all one model? Is it fine tunes or post-training in one model? Or is it, you know, one big model for one aspect,
Starting point is 00:09:58 and then a bunch of fine-tuned models, or I should say specialized models for other things. And I think that's a really big open question. And, you know, I think there's similar discussions to be had just around AGI or more general purpose intelligence, right? Like, is that going to be, if you look at the way the brain works, it's a set of reasonably specialized modules for vision and vision processing, for different aspects of emotion.
Starting point is 00:10:20 You know, there's these really interesting things in the psychological literature where somebody will literally have like a steel beam accidentally driven through their head in a construction site and it's something to lose a very specific type of emotional functioning or reasoning, but everything else is fine, right?
Starting point is 00:10:34 And so the question is like how specialized will these models be and how generalized and I think that's also true for things like image gen. You know,
Starting point is 00:10:40 will you have a different model for graphic design than what you're using for artistic expression? I don't know. You know, it's an interesting question. And I think time will sort of tell on that as well.
Starting point is 00:10:51 I do think one thing that's been interesting is in the last couple months, it does feel like the image on market has started to kind of heat up, right? Before it just felt like Majorney was going to be the default independent player that wins and then maybe end up with some multimodal stuff around Dolly and OpenEI or some of stuff Gemini was doing. But there's like an increasing number of companies now that are really emerging that seem really
Starting point is 00:11:15 interesting in terms of the fidelity of their models. One of the things that makes me feel a little bit silly is if I, like, have a belief like, you know, video, video and image models, audio models, like, they will tend toward, like, rapidly increasing capability and some commoditization and then still being surprised by the pace, right? And so I do think that there's, like, when SORA came out, for example, it's an amazing research advance. But there's also a sense of like who's really going to be able to catch them. And, you know, you could argue now that you have a handful of companies that are showing really amazing video generation capabilities where it's not actually like a bunch of smaller players are step function behind there between runway and
Starting point is 00:12:09 PICA and even, you know, if you go from, as you said, like image and video, you have like very small players or mid-stage players like the ideograms hot shots of the world. It's impressive to me how many times I see researchers come out, have a five-person team and not that much capital and versus the narrative of the AI like market five, six months ago, say like, oh, like, you know, we can produce something really competitive. Even the stuff that Luma Lapse has been coming out with, right? And so I think that's been an update for me mentally. I think one thing that's striking is the size of the models is shrinking relative to performance over time too, right? And so that may be through distillation, that may be through other things,
Starting point is 00:12:59 but across the board, we're seeing more and more performance off of smaller and smaller models, which I think is the other thing that I think a priori wasn't as expected, say, a year or two ago when all this stuff was kicking off. Like, you knew there was room to sort of effectively compress certain things but you know it's it's it's been striking how far you can go in some cases um and again the brain may be a good a good example of what is possible because you have a 20 or 30 watt device running in your head that's pretty good in terms of being general purpose and doing image identification and other things you know we have very cheap hardware running um so you know for that perspective there's still quite a bit of room to go and it's in a compact space i think we're
Starting point is 00:13:41 going to see really really cool experiences on the image video audio side because as you say if the models get smaller and they get better um they also get you know and there's different architectures like what cartisia is working on you're going to get much more real time and we don't uh i don't think we have a lot of real time applications in production at scale today and the the difference and experience like mark talked about this of you know you can generate images as you speak is a very different one than um the uh like No, I'm an artist making an output experience. And so I think that will happen over the next couple months.
Starting point is 00:14:21 There's sort of two areas of innovation relative to the stuff we're talking about. And we should probably touch on both of them. One is sort of the chip layer and how that may further accelerate certain things. And then secondly, I think it's a little bit of like what, how do you think about what you actually do in terms of the output, the data you train on, etc. And how much do you push the envelope on that? So, for example, say you go back to the early days of Google, and there was huge controversials, the controversies around Google, because what Google was doing is indexing the web. So it was taking all this content that was distributed around the world. And it was from the perspective of some of the folks back then, they were effectively scraping the web, right?
Starting point is 00:15:02 They were taking all the news content. They were taking everything that everybody had written and posted, and they were indexing it, and then they were making money off of it. And one of the things that they would do is they'd have this small, what was? it's called a snippet, right? And if you look at the Google result, there's like a little blob of text and then the link. And that blob of text, some people claimed wasn't under fair use for copyright law. It was a concept of fair use. Like, you know, can you use a small thing without having to pay the copyright owner? And so there was all these lawsuits and people coming after Google, both on the news side as well as the fact that they were portraying these
Starting point is 00:15:35 snippets that some people viewed as copyrighted information. And where that all netted out years later was sort of three things. Number one is they invented somebody known something known as Robots. Text, which is a file that you put on your website that tells a web crawler like Google or Bing or whoever whether or not they're allowed to index your information or crawl it.
Starting point is 00:15:54 Number two is people decided that these snippets fell under fair use from a copyright law perspective. Number three is there were some content deals that were struck for content, particularly around very specialized content, where Google was getting feeds and then incorporating them into their one boxes and things like that. And then the fourth thing I think that happened was that some of the people realized it was
Starting point is 00:16:15 better to be in Google than not because they'd get attribution. And so a lot of the news parties that pulled themselves out of the Google index, I said, just removed me from the index, realized they lost a ton of traffic by doing that. And so they went back and said, actually, you can start indexing us again because we realize it's a bigger financial penalty to not do it and to do it. And this took maybe a decade or 15 years to play out, right? It was kind of this ongoing arc. And Google, by pushing the envelope and being very thoughtful, actually, about the legalities of these things.
Starting point is 00:16:47 They had a really sharp team focused specifically on copyright in other areas, kind of threaded the needle, right? And kind of made it out reasonably unscathed by all this. How do you think that evolves for other companies in terms of, you know, the places that perhaps are seeing some questioning of approaches or image and some. some of the audio companies, et cetera, you know, how much risk do you think a startup should take and how should they think about the various approaches? And again, we have this really interesting set of past case examples that may be informative relative to this. So we have we have these businesses like as historical examples that absolutely push the envelope like Airbnb and Uber that challenge the concepts of, you know, restrictions on leasing and medallions as regulatory
Starting point is 00:17:37 capture, right? And the companies, these services wouldn't exist that many consumers love if they hadn't said, like, well, you know, we think consumers want this business. We're going to try to get to scale and try to understand the risk profile as we go. And then at some point when we have more market power of people actually using the business, you know, we will address some of these issues and like come up with a policy point of view. And I do think a lot of companies, that are operating in the AI space will have to wrangle with these questions along the way. I think a really common question for many companies is like, is Google likely to come after you for scraping YouTube data, for example?
Starting point is 00:18:24 Because there's, you know, I'd be shocked to find out if there are ways to get to scale a video data that don't involve some YouTube data. And, you know, I think the overall. Overall, orientation toward this should be like a business risk one, right? If you think about the story you just told about fair use and Google and, like, their general attitude towards scraping, I'd ask, like, well, why do they, you know, why do they allow SERP businesses to exist is taking a certain stance on YouTube, hypocritical, legally relative to their core business. There's also some examples of companies in the past that were completely obliterated by going too far. So Napster would be an example of that where the music industry sued it basically into the grave, right? And the music industry in particular, there's lots of examples of companies that have died due to lawsuits.
Starting point is 00:19:23 And, you know, I guess there's two types of risk. There's almost the legal slash lawsuit risk, which aren't always the same thing, right? There's a third type, the second type of risk, which is regulatory. Are you doing something that's pushing the regulatory envelope or where the regulations are very unclear? You know, crypto would be an example of that, but there's some examples in AI right now. And then the third is almost like reputation risk. What outputs are you willing to allow? And I think Grock has been really interesting from that perspective in terms of their explicitly saying,
Starting point is 00:19:52 we're not going to police the output that much, right, relative to what all the other parties are doing. and that includes what images will allow to be generated, and that includes what sort of text we allow to be generated or the kind of responses that we allow. And to some extent, that's probably a closer mimic to human behavior than what many of these companies have been doing, right? A lot of the companies have really been actively focused on preventing lots of different types of output from these models.
Starting point is 00:20:24 And in some cases, it feels like it's trying to do the right thing by users, And in some cases, it feels very politically driven in terms of the orientation. And so, you know, I think that's a really fascinating experiment that's ongoing right now in terms of how much does society care about the output of the model in terms of what you allow and don't allow relative to what are other norms for speech or other norms for creative expression that already exist in society. And I think a lot of the companies have actually curtailed it more than the norms for much of society, right? There may be a slim part of society that feels a certain way, but for most of the society, you know, there tends to be, or it looks like there's broader tolerance for certain types of things. And obviously there's things that you never want to have that are, you know, truly disturbing or are illegal in terms of content output? But I think it'll be interesting to watch how that all goes. I think it's a philosophical question as well of, you know, are you, are the restrictions to be on generation or on distribution, right? Because I think it's a, um, a, much stronger argument that if you own a platform, like controlling for certain types of distribution is a responsibility. Generation feels a little bit more like free speech, but I think
Starting point is 00:21:37 it is like a complicated question. Should we talk about semis? No, I guess the other piece that we talked about touching on here, one was sort of content and risk and, you know, how you think about the degree to which you should or should not push the envelope as a company. The other piece was semiconductors, and since semiconductor performance underlies a lot of everything that's happening in AI right now, be it training, be at inference, et cetera, how do you think about the coming wave of semiconductor startups or system startups that really started to emerge again? I think there was a prior wave maybe six, seven years ago, which was grok and cerebrus and a few other folks, and now it seems like we have a new wave between Maddox and etched and a few other
Starting point is 00:22:22 companies, some of whom are going to participate in this podcast recently soon. What do you think is interesting in this market? What's going on? So as you said, the wave, like, you know, more than five years back now was, and I really admire the foresight of some of these companies saying, we're going to have a different workload, that AI workload requires a different type of computation. But making a bet so far in the future on chip and systems design is a very hard thing, right? And And so I think the, you know, seven years ago, it was not abundantly clear that scale transformers were going to be such a big piece of the workload. And so I'd say, you know, the market has evolved in very unpredictable ways. And now you have, I think, a cluster of companies that is very focused on optimizing for transformer architectures and, like, area allocated to matrix math.
Starting point is 00:23:20 And so I think it would be an interesting question of whether or not you can surpass the economics of AMD and the economics, you know, performance for economics of AMD and Nvidia, which have been really strong, like, high speed innovators to date with, you know, some argument that AMD is making progress, especially with the investment in that ZT acquisition as well. But I'd say like the whole thing with chip investing is like what architectural bet are you willing to make because you have to run on a multi-year cycle and then pace of delivery and then price performance, right? But it feels like that bet is worth making. You know, I think you know a lot also about the like shape of demand that is also emerged, right? You have like a lot of sovereign cloud demand as well, which is an interesting opportunity for. companies. Yeah, what do you make of the AMD ZT acquisitions? How do you think about why they did it? What's the purpose? Like, what does this move by AMD right now? I'd say the market is pretty divided about whether or not AMD can become competitive. If you look at the pieces that they need,
Starting point is 00:24:31 you know, they need better software, like if you think about the competitiveness with CUDA. And so they bought this company Silo a little while back, which is essentially like a $600 million dollar aqua hire of hundreds of AI engineers and researchers who've done a lot of work on AMD. So there's that layer. Then you have the networking piece. So they're part of like UALink, open source competitor to NVLink. And then like the theoretical like missing component that ZT fills is, if you think of it as like a one to two billion dollar aquire of a thousand systems designs folks to support the like rack and data center scale AI business instead of like individual chips or component scales. Because
Starting point is 00:25:14 the, you know, the thing that Nvidia is really selling now is full systems through like multi-year strategy of, you know, delivering these essentially like data centers for, um, for research labs. And, and the question is like, can AMD go assemble the pieces to go do that? But, you know, one could argue these are all the components. Cool. I think we're at time. All right. Well, I'm excited to like talk to, um, you know, etched and Maddox and Cerevers and some of the companies that are working on this in the next wave. Yeah, it should be very exciting. And I'm going to do a quick PSA. So I'm very interested in buying either a human form slash humanoid robot or a like a spot or something else from like Boston Dynamics or, you know, one of those
Starting point is 00:26:01 really interesting robots. So if you have any suggestions or advice ping me or if you have one for sale, let me know. Find us on Twitter at No Pryor's Pod. Subscribe to our YouTube channel. follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no dash priors.com.
