No Priors: Artificial Intelligence | Technology | Startups - Context windows, computer constraints, and energy consumption with Sarah and Elad

Episode Date: May 9, 2024

This week on No Priors, hosts Sarah and Elad are catching up on the latest AI news. They discuss recent developments in AI like Meta’s new AI assistant and the latest in music generation, and ...if you’re interested in generative AI music, stay tuned for next week’s interview! Sarah and Elad also get into device-resident models, AI hardware, and ask just how smart smaller models can really get. They compare these hardware constraints to the hurdles AI platforms continue to face, including compute constraints, energy consumption, context windows, and how best to integrate these products into apps that users are familiar with. Have a question for our next host-only episode or feedback for our team? Reach out to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil  Show Notes:  (0:00) Intro (1:25) Music AI generation (4:02) Apple’s LLM (11:39) The role of AI-specific hardware (15:25) AI platform updates (18:01) Forward thinking in investing in AI (20:33) Unlimited context (23:03) Energy constraints

Transcript
Starting point is 00:00:00 Hey listeners, you are here for another episode of No Priors, just with me and Elad. And there's been a lot going on in earnings and in the technical world. So I think we will start with maybe just one fun thing that seems to have taken flight in terms of music generation. Elad, what do you make of the popularity that Suno and Udio have found? When you said fun things, I thought you were going to talk about my hats. I have two hats. We can talk about your hats. I have my Bitcoin halving hat that I got from Coinbase, as you see,
Starting point is 00:00:40 the Bitcoin having. And then Zaid has these Maki Eye Great, again, hats. He's actually selling. I think it's going to fund his data labeling habit. That's a lot of hats. I know, yeah. That's all I got. It's all I got left, sir.
Starting point is 00:00:52 If anybody has read Jared Kushner's book, There's a great bit about how much money they're making from all the MAGA swag. And so listeners, Elad and I are making hats and tequila for the guests. Yeah. But for the low, low price of one H100 GPU, we will give each of those to you too. Or a Bitcoin. Either way. Okay, I'll take the Bitcoin instead. Yeah. Me too. I know. Yeah.
Starting point is 00:01:19 I think we all wait at this point. We can check in again in a couple months and the D100s come out or whatever time period. So, yeah, so, you know, as you know, there's been some really interesting things happening on the music generation side. And so there's both Suno and UDO and both seem to be kind of taking fuel by storm in terms of really interesting music-based models. And it feels like one of those things where it's early, but it's really giving a glimpse of what's coming in terms of the ability to create other types of content. Obviously, the very first content wave in some sense with simple text-based things on GPT3, like Jasper. And then we hit an image gen wave. and that was my journey and stable diffusion and things like that.
Starting point is 00:01:58 And then we had obviously chat come out as sort of a new type of format and interaction modality. And then we had video with things like PICA. And so it just feels like sequentially we're hitting these different formats. And then obviously Suno from Open AI. And now we have these really interesting music models where you can specify the type of music that you want. You can write the lyrics.
Starting point is 00:02:19 It'll add vocals. And so, you know, these really seem to be the two. models initially, at least the people are really adopting. And so it just seems like an interesting moment in time from the perspective of look at all these different creative things that people are now empowered to do and look at the different ways to engage. And of course, you could imagine going forward in time and saying, okay, at some point they'll be voice cloning where, and I think it was Drake who put out a song, right,
Starting point is 00:02:43 where he had two or three other rappers that he just voiced cloned in. And you can imagine a world where you could use anybody's voice, assuming there's permissions and everything else, to generate your own songs and content. things. So it just seems like a very exciting future world between UDOs, you know, and some of the companies. Yeah, I think one of the things that's not obvious here is in like media platforms in general, they're like the ratio varies, right? But there are a lot more readers on X or consumers, like people who scroll a feed on TikTok or not TikTok anymore, I suppose, but whatever it is than than creators.
Starting point is 00:03:23 And so I think, like, one thing that is just, like, unknown is how many people actually want to create music if you make it a lot easier to, you know, create something that's any good, right? And if the music we get changes. And so I was talking to one of these founders, and he was like, everybody should have a personalized soundtrack for their life, but in the voice of Taylor Swift, in the style of Taylor Swift. So I already have one of those, but yeah.
Starting point is 00:03:50 So what is yours? I can't really share it publicly, but we can talk about it later. Okay, okay, yeah. It's going to be a little bit, I think it's going to be a little bit of personal thing. Yeah, it's great. What else is going on? I don't know. I mean, what about local LMs in the Apple release?
Starting point is 00:04:04 You want to talk about that? Yeah, so it's interesting. Apple has entered the chat with a release of, you know, relatively small models that are now on hugging face and such. And I think it's just worth talking about the fact that, What some of the initial open source releases, Mixtral Lama did, was create, you know, pretty impressive reasoning capabilities that were open that developers could use. But there's been huge demand for models that actually have, you know, that level of capability or at least useful capability in a one and three billion parameter size that'll fit on edge devices. And of course, like, if you enable that, you have a very different latency paradigm in terms of what experiences for a consumer are possible. And then not paying for the compute of inference all the time means you can do things much more easily that are ongoing and passive and proactive.
Starting point is 00:05:07 And so I think there's a lot of developer demand. And I think of like foreshadows we should see Apple creating interfaces for running models locally as part of their ecosystem. That's my general prediction. Yeah, I think it's kind of interesting because I've seen a number of people building apps recently that have been these Mac or iPhone OS resident sort of exposure to LLMs. And in some cases, it's let me index everything on your computer and let you interact with it or search on it or build an embedding on top of it or set up or embeddings. Or let me just integrate chat GPT or other things into your desktop.
Starting point is 00:05:45 And there used to be like search bar apps and things like that and sort of prior generations. And so I think a lot of that is going to be really interesting from user experience perspective. And one can imagine a couple years from now, that's just going to be something that comes standard on your device, either direct from Apple or through some sort of partnership or something else. Do you believe any of those things can persist independently, right? Because I'm reminded of like the launcher platforms. Like, you know, if you think about Android as a more permissive platform, people trying to choose core, change sort of core user experiences like that at what felt like an operating system level was kind of tough. I think it depends on whether or not you truly take advantage of the browser and you're also accessing browser-related third-party applications that you're just getting through the web and indexing them or not. And so really it comes down to like what's the footprint of content that you're searching over in some sense.
Starting point is 00:06:38 And that to me is the biggest pivot point in terms of whether you'll end up with something that's going to be specific to the OS company or broader. I think in general, platforms tend to integrate the most valuable things into themselves, and so it's always risky. They do that as a strategy, and every once in a company that has breakout success, even though it's built on top of a platform. So I think the oddest example of that is Viva. Do you know Viva? They're like a $40 billion vertical SaaS company focused on pharma.
Starting point is 00:07:07 Oh, yes, yeah. Yeah, was built on top of Salesforce. It was just a Salesforce app. And they were just selling this into pharma, and then it started working, and they eventually swapped out Salesforce in the back end. But they were literally just some like thin layer on top of Salesforce for years. And so that's the only example I can really think of as something that's getting truly massive
Starting point is 00:07:27 on top of a platform that wasn't then subsumed by that platform. But I'm sure there's other examples too. Yeah, I think they're like all kind of apples and oranges. Yeah, Viva, sorry, so out of context. I'm like, what does I have to do with AI, man? And like operating systems. Yeah, I think choosing the vertical thing, make sense to me there, right? You've got compliance around workflows for selling to doctors
Starting point is 00:07:50 for life sciences that a Salesforce, like, didn't really have the expertise or maybe they just didn't see how big it was to attack. Well, they verticalized. I mean, Salesforce always had these verticals, right? Let's see. What's Salesforce market cap? It'd be interesting to see, like, relative to Viva, how much value. Salesforce is 266 billion. So yeah, Viva is like 20% of Salesforce or something like that. Yeah, that was worth investing in. Yeah, I think people also forget that all of Microsoft Office at some point were third-party applications.
Starting point is 00:08:23 So in the 80s, there were separate companies like Lotus and others that were providing what turned into Excel. There was a PowerPoint company that was very popular. Before Word, there was documents that people would buy separate applications, and then Microsoft just ended up subsuming them into its own distribution platform. And so, again, it's a very standard kind of traditional thing to have happened. the default case would be that those things eventually become part of the operating system
Starting point is 00:08:48 or the operating system company, but again, you never know. So I always think it's interesting to see what people do and can they make it cross-platform? Does that matter? The one other thing that I think is interesting about small models is the question of how smart can small models get
Starting point is 00:09:07 and how much can you actually pack into them? And to some extent, if you look at an LLM, there's like three or four pieces of capability that people care about. There's sort of the reasoning part of it. there's a set of capabilities in terms of what it can do from a synthesis or other perspective. There's a multimodality. The knowledge base or knowledge set is actually resident in the model. And you can only step so much into a 3B model or a 7B model or whatever
Starting point is 00:09:31 size you end up eventually you end up eventually running on device. And so that's the other question is what are the capabilities that you can actually have that are device resident versus which you all of them have to go out to the cloud for. And obviously, devices always expand their capabilities over time because of the microprocessors that have and everything else. But fundamentally, there are going to be limitations. And so the question is, where is that line? And you could probably define that analytically and just say, okay, today, this is all the stuff that's going to get cut off if you just do it on device and here's all the things that
Starting point is 00:09:59 you can do. And as we move up in terms of device capabilities and their ability to provide access to larger and larger LMs that are resident, what are the capabilities that come with that? And so I haven't really seen that analysis, but that seems like something that should be to be straightforward to do in terms of just trying to understand. understand what can you really do on device versus not. And therefore, what apps or products can you build that are going to be device resident? Yeah, well, I think there's like an obvious analogy to draw to just like the weight of like how much computing is done like on client or
Starting point is 00:10:31 in cloud, client server if you're really old. Right. And I don't think it's like obvious that there is any sort of principled answer here, except it's going to vary by application and how quickly, as you said, like capabilities actually fit into a whatever hardware people already have because now there are companies doing AI specific hardware already, right? And, you know, if you're trying to make something that feels very AI native work, there is a question of like, okay, well, you know, do we try to make the model work? What hardware do we, what compute do we want in the device itself? How much pre-processing do we do to, you know, send, you know, send less data over the network to large model in the cloud.
Starting point is 00:11:19 And so I definitely think there's going to be like a distribution of that computer, just like the religious wars about like, you know, what gets done, you know, on client and server in your CDN, whatever. But at one-circups test is, do you think that's not just your phone and your watch or something? Like where do you think the new AI hardware will matter, at least from a consumer perspective? One of the most interesting theories, and we can actually talk about like the meta-AI launch as well, is whether or not you want some sort of passive device
Starting point is 00:11:46 that is like continually collecting data about the world from a vision environment perspective, which is why you get like rayband better glasses instead of just like a phone or a watch that's sitting in your pocket, right? Sure. I guess you could also just fix small cameras to other devices as well. So it just comes down to do you need a new form factor or not,
Starting point is 00:12:08 although I think it's an interesting area, an interesting direction. And I guess relatedly, like when do you need that extra data for that information and under what circumstances we'll use it. I mean, it's super interesting, right? I think in general, if I look at the early mobile device products that emerged from it, a lot of them just took advantage of the accelerometer if you were doing fitness or GPS if you were doing everything from Uber to a variety of other applications. So I do think those sorts of new capabilities from a primitive's perspective are super interesting. So again,
Starting point is 00:12:41 it just comes down to, okay, what are you going to use? for now and when, you know, there's also really interesting unexpected applications of vision to other areas right now. So I really know if it's, you know, the meta thing I think is super interesting. Have you tried the product? Yeah. Yeah, I was very impressed. Yeah. They enter the race with like multiple different products, like different modalities up front. It's interesting that they haven't actually, they haven't pushed it that aggressively into their existing surfaces yet because they have like unlimited distribution, right? And it's just meta.com, like, independent product.
Starting point is 00:13:15 But I imagine they're going to, I imagine they're going to phase into it. Yeah, it seems like you would potentially, and who knows what their plans are, have like a, you know, a channel or bot on WhatsApp or on, you know, Messenger or on multiple different services that you can just start interacting with to do things for you. And that could effectively be a way to encapsulate meta. AI is like a just another line item or another account that you're interacting with in chat or another sort of properties they own. But I thought it was really well done.
Starting point is 00:13:46 I've been making different images with my kids on the different services, you know, playground and open AI and meta and everything else. And they really enjoyed the one-click animation. It's like a feature. Yeah. I think it's an impressive product launch. And then on the open source side, They've also made a splash where I think one of the things that surprise people is,
Starting point is 00:14:12 there's an argument generally in research about like how much do you want to try to change like the efficient frontier of model training or like do you just, you know, ride the curve and scale up. And if you are meta and you have somewhere between, you know, 22,000 GPU clusters and 350,000 GPUs available, then continuing to train past supposedly optimal points, like does improve performance, apparently, and doesn't just fully asymptote as soon as maybe a lot of the research community predicted. And so I think it just is actually a point in favor of the very large firms who are really willing to invest against this and don't, like,
Starting point is 00:14:57 it begs the question of how important is efficiency. I think efficiency and creativity and architectural approach is going to end up being really important for lots of different use cases. Like over time, applications are going to want efficient inference. And it's like really large models are impossible today to serve for the vast majority of use cases from a cost and speed perspective. But if all you're trying to do is a technical demonstration, like this is very impressive. Yeah, it did really nice work.
Starting point is 00:15:24 I guess the other thing on the sort of data compute platform side while we're talking about all these different platforms is Snowflake and Databricks and sort of that layer of companies. Do you think you need to own the model as a data or compute platform? Or how do you think about these various OpenSvores models that these folks are starting to launch now? One thing that's tricky as an investor or as an opera in the space is like it begins to look like all one blended landscape of competition. So the Snowflake platform hosts a bunch of different models from other players, including, for example, Mistral. and they're also training their own and their models are available on, for example,
Starting point is 00:16:04 like Microsoft Azure, right? And so you've got a huge number of players all in competition somewhere between models and inference. And it's like, I can't tell you how I think it is going to work. I do think that in the immediate term, like it is less obvious to me that they necessarily, like these compute platforms necessarily need to own the models, especially if there's a big landscape of open source out there versus need to demonstrate to customers that and actually have the expertise of, let's say, like, training models and fine-tuning them and deploying them and customizing them to different use cases. So I think there's a piece of it that is actually learning something that they can deliver to their customers and marketing too.
Starting point is 00:16:55 Yeah, it definitely feels like in the long run there's a bit of of a capital scale question, which is if you're focused on the frontier models at least, you need more and more scale in terms of compute or other things, which means you need more and more money to invest behind it. And that's why a lot of the models seem to have sponsors in the hyperscalers or other sort of large corporations. And so the question is, how long do some of these other players want to keep up in terms of trying to build things that are bigger and bigger? Do they just focus on a specific subclass of models that are more kind of medium to small that have more specialized use cases? And so one could
Starting point is 00:17:27 argue that in the long run, that's where those types of models should go, is the inference platforms probably provide things that are more in that range, and then the hyperscalers and their partners, you know, a handful of them will be at the frontier because those are the ones that have business models that can actually afford the subsidization of these massive frontier models with the idea that the ROI pays for itself dramatically later as you scale into more and more intelligence and more applications and things like that. So what's kind of investing far ahead And the question is how far I had concern in companies invest.
Starting point is 00:17:59 I'm sure you've been getting this question, but I get like a stressed out question from investors of essentially like how much money should the world spend on this and like is it rational? And I think this is an impossible question because should is like this is a question of people's business decisions,
Starting point is 00:18:19 like a very small number of individuals actually, right? And something that feels unique in AI to me is you have these full, who, like, they really believe in technology and, you know, strong intelligence, they can reason about exponential growth. That's something like Silicon Valley people are very proud of, but also all of these people are very capable of. And they have, like, very high risk appetite. And then, you know, they can control and marshal a huge amount of resources, $10 billion plus spend, $20 billion plus spend if you're meta every year on GPUs, 30, I think, 30 to 35 this year,
Starting point is 00:18:55 They may have changed the estimate there against these bets. And I actually think it's interesting to put in context. Like, this isn't completely unheard of in terms of investment toward some future goal. Like, you know, you and I have talked about chip fabs before and how much they cost. But if you think in aggregate, a handful of players in terms of the hyperscalers are spending almost $200 billion this year on compute for AI. And then you try to think about the other types of investment in a new ecosystem. Like oil majors spend $80 billion a year. Broadband providers, you know, last few, like it's about $100 billion a year of spent in infrastructure, right? Like CAPEX for high speed internet.
Starting point is 00:19:43 This is like going back a decade or two. But if you look at railroad freight, like there are years where you're spending a huge number on railroad CAPEX. And so I think, like the different dimensions you can think about spent are like well what is it in overall overall context and then you know what is what does that get you over time i think both these questions are pretty pretty hard to answer but it's not unheard of in terms of scale yeah 100 and i again it depends on your belief in terms of where all this goes but i think it's notable that the hyperscalers are doing it and then the one other potential source of immense scale in the long run maybe sovereigns as people want to customize models that are specific to their region or customs
Starting point is 00:20:28 or language or culture or whatever maybe. So there are a few areas of research that have been like debated hotly recently. And one is this idea increasingly of like unlimited context, which I think is like mostly a term for like actively managing context. But you're an investor in magic. Magic is one of the key players here. Can you can you talk about this? Oh, sure. I mean, In Magic, I think, was one of the first companies to launch a really long context window. I think they launched a 5 million token context window, oh, you know, six or 12 months ago now, so it's been a while. And then when Google came out with Gemini 1.5, I think it was a million token context window.
Starting point is 00:21:05 And then it seems lucky that a lot of people end up in the 10 million plus range in the next, say, year or two. Or, you know, some reasonable time frame I had. And what's happening is with longer and longer context windows, the way that you think about what you put into a prompt changes pretty dramatically because you can start dropping everything from entire code repos to all sorts of documents if you're dealing with legal on through to, you know, you kind of name it. You can also drop in all the context of a giant customer support queue, you know, because, you know, some of these things eventually get big enough that you can do really
Starting point is 00:21:40 significant things with them. I think one of the most striking examples of long context window being important is actually some biology models that have come out recently where just increasing the context window for things like protein folding seems to really make a big difference in terms of your end result in terms of the fidelity of that folding.
Starting point is 00:21:57 So I think this is going to end up being one of those really significant things. And it reminds me a little bit of microprocessors or bandwidth in the 90s where each generation of microprocessors that came out had a big step up in things that you could do with it or bandwidth, you know, instead of a dial-up modem
Starting point is 00:22:14 some of you had a fatter connection and then eventually had broadband, and now people have fiber in some cases. And so it feels like a similar thing where there will be a really long period, and long period in this, and AI means like two weeks or whatever, there'll be a really long, I mean, it's going to be much longer than that. There's a really long period where bigger and bigger context windows will matter, and then there will be some shift where, okay, now we've kind of maxed out what we're really going to be able to do with it.
Starting point is 00:22:39 But it seems like a significant shift in thinking around how important this is going to be. And again, in areas that I didn't expect necessarily like biology. I guess the other thing that people have been looking at, you know, we talked a little bit about context window. We talked a little about compute platforms. We talked a little about the capabilities with small models and where they constrained. The other potential future constraint is energy. So I'm just curious how you think about, you know, if you have a 500 megawatt or gigawatt data centers, like does the way we think about energy shifts? Like there have been things like job ads posted on the Microsoft website for like nuclear engineering and things like that. And so what do you think
Starting point is 00:23:21 happens from an energy constraint perspective? Because you kind of look at it. The first constraint was like chips and then it was packaging for the chips and then there's probably going to be some data center constraint where everybody miscalculates how many data centers we actually need and then energy is sort of related to that. So I'm just curious how you think about future energetic needs. I think like one supposed coming limit to scale is going to be energy and that nobody's built like a 500 megawatt gigawatt data center yet. And if you think of it as the equivalent of like a nuclear power plant's worth of energy going toward a single data center, it is quite large. And I mean, the basic understanding here is today to train these large models.
Starting point is 00:24:01 You need all of the GPUs co-located because there is enough data transfer between different chips, right, between your nodes. And there's a physical constraint on that in that you need to get that much power to a data center. And I think it's kind of, it's kind of interesting because Sam made the point recently that when you have constraints that require permitting and physical world changes, like you'll see a slow down in terms of how quickly you can go deploy this, right? It's no longer a software engineering problem only. And I think the fear, I think we're likely to work through a lot of these things.
Starting point is 00:24:39 But the fear is that some of the potential limits to progress are going to be, like, we can't just throw more compute at the problem because it's like physically hard to throw more compute at the problem, energy data centers, right? And then also, you know, the concept of the data wall, right, like the number of cheap available tokens on the internet we have used. And now we have to go figure out how to go get more, collect more. in the world or more likely like, you know, or in combination with generating synthetic data. That still feels like a bit's not atoms problem and you can solve pretty fast. But you can even see it in the designees for the new DHS safety board that the AI safety board that just got announced, right? Some of the players are very obvious. So the, you know, Sam and Satyos of the world, but it also includes people who work on energy to this point and infrastructure security.
Starting point is 00:25:45 Yeah, it makes a lot of sense. I mean, I actually found that the news around Microsoft investing $1.5 billion in Abu Dhabi's G42 was super interesting from the perspective of you're effectively setting up a big AI data center. A, you're starting to sort of democratize or broaden global access to it. But B, you're doing it. You're an energy center. And so, you know, I think that's really interesting. I think, you know, in general, if you view the last 50 or 70 or 100 years as two sort of competing philosophies around progress and anti-progress or abundance and scarcity, you know, one of the biggest wins on the scarcity side was really shutting down nuclear power
Starting point is 00:26:29 in the 70s, at least in the U.S. or the ability to add new nuclear power since, you know, 17 or 18 percent of U.S. power generation today is still nuclear. it's like 70% in France, it's 30-ish percent in Japan. So it's still quite significant in some places, but it could have been 70% in the U.S. and a bunch of other countries. And if you would have thought of that large, abundant cheap resource and now how that ties into things like compute and other areas,
Starting point is 00:26:55 it's a real game changer, right, in terms of capabilities. And so there are solutions. The question is, are we going to adopt those solutions at any point? But these are actually very solvable problems if we choose to solve them. Which is why I thought that job posting on the Microsoft website was so interesting. Yeah, well, and I'd love to see that reversed, even beyond just like general energy needs. If you do think of AI as a strategic issue and a national security issue, not using every energy resource we have and every energy technology we have, like you and I have talked about, you know, generations of nuclear technology that have existed, not deplore. in the United States for really like policy reasons.
Starting point is 00:27:42 It is yet another dependency that we're creating for ourselves on other countries without needing to, right? It makes the sort of energy dependency even worse to have AI rely on it. Yeah, all of geopolitics would be dramatically different if we just had a lot of widespread and clear power. And it's interesting to think of that. And you think of some of the biggest producers being Russia and Venezuela. obviously the Gulf, but, you know, Iran, et cetera.
Starting point is 00:28:09 And you think of geopolitical policy, it's kind of interesting to think about how the world would be different if we weren't dependent or as dependent on some of these forces. So anything else we should talk about? I think we're good. Do you want to see my hats again? Yeah, let's see the hats. That's the other piece of infrastructure
Starting point is 00:28:25 where people spend a lot of money in the past and just sort of hat-making, you know? Yeah. Millions of dollars a year under the infrastructure. The trade is good for anybody listening. We will offer you one brand, branded bottle of tequila or mock keelah or whatever you'd call it and a uh a cool no priors hat for h 100s let us know okay that's all we got find us on twitter at no priors pod subscribe to
Starting point is 00:28:53 our youtube channel if you want to see our faces follow the show on apple podcasts spotify or wherever you listen that way you get a new episode every week and sign up for emails or find transcripts for every episode at no dash priors dot com Thank you.
