No Priors: Artificial Intelligence | Technology | Startups - Chips, Neoclouds, and the Quest for AI Dominance with SemiAnalysis Founder and CEO Dylan Patel
Episode Date: August 14, 2025
What would it take to challenge Nvidia? SemiAnalysis Founder and CEO Dylan Patel joins Sarah Guo to answer this and other topical questions around the current state of AI infrastructure. Together, they explore why Dylan loves Android products, predictions around OpenAI's open source model, and what the landscape of neoclouds looks like. They also discuss Dylan's thoughts on bottlenecks for expanding AI infrastructure and exporting American AI technologies. Plus, we find out what question Dylan would ask Mark Zuckerberg. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @dylan522p | @SemiAnalysis_ Chapters: 00:00 – Dylan Patel Introduction 00:31 – Dylan's Love for Android Products 02:10 – Predictions About OpenAI's Open Source Model 06:50 – Implications of an American Open Source Model for the Application Ecosystem 10:48 – Evolution of Neoclouds 17:26 – What It Would Take to Challenge Nvidia 27:43 – What Would an Nvidia Challenger Look Like? 28:18 – Understanding Operational and Power Constraints for Data Centers 34:48 – Dylan's View on the American Stack 43:01 – What Dylan Would Ask Mark Zuckerberg 44:22 – Poker and AI Entrepreneurship 46:51 – Conclusion
Transcript
Hi, listeners. Welcome back to No Priors.
Today I'm here with Dylan Patel, the chief analyst at SemiAnalysis, a leading source for anyone
interested in chips and AI infrastructure. We talk about open source models, the bottlenecks
to building a data center the size of Manhattan, geopolitics, and poker as a tell for
entrepreneurship. Welcome, Dylan. Dylan, thank you so much for being here.
Thank you for having me.
I've been really looking forward to this conversation. You're such
a deep thinker about the space. And then also it's very odd. You clearly have the Samsung watch.
Yeah. I got the foldy phone. I got the blig. And the laptop. The fold. Yeah. Yeah. Tell me more.
So part of the origin story is that I was moderating forums when I was a child. And my dad's first
Android phone was the Droid, right? And for some reason, I was obsessed with, like, messing with it,
like rooting it, like underclocking it, improving the battery life, all these things.
Because when we went on a road trip, there was nothing to do besides, like, mess around on his phone.
So I posted so much about Android that I became a moderator of r/Android on Reddit,
and, like, many other subreddits related to hardware and Nvidia and all this stuff.
But because of that, I've just always had Android.
Now, I've had work iPhones before, but I just really love Android. And it's like,
if you're going to like technology, I'm not, like, someone who pushes it, but, like, get the best stuff.
So I have, like, the Ultra Samsung watch, which I think looks cool, and the foldy phone, right?
It's fun.
It's obviously different and weird.
No iMessage is a travesty.
What does it dominate at?
What is it better at besides the openness of like the hackability?
I don't even hack that much stuff anymore, right?
It's like, what do you use your phone for?
I think the main thing is, like, you can have, like, Slack and email up on two different parts of your phone.
I think that's probably the main thing or like you can actually use like a spreadsheet on a folding phone.
You cannot use a spreadsheet on a regular phone.
Okay.
And that's not even an Android thing.
Like Apple's folding phone next year will be able to do that just fine and I'll have no argument then.
But I just like it.
You know, people have their preferences.
people are creatures of habit.
You got to look at the GPU purchasing forecast on a sheet on your phone.
Yes, I do.
I do, no.
It's like someone's telling you numbers.
You're like, wait, this is like slightly different than my number, right?
Okay, so we have a week of big rumored announcements coming up.
Tell me your, like, reaction to the OpenAI open source model.
In theory, it's going to be amazing, right?
Like, I assume this is releasing after it's released or?
Yes.
So that's okay.
The open source model is amazing, guys.
I think the world is going to be really, like, shocked and excited.
It's the first time America's had the best open source model in six months, nine months, a year.
Llama 3.1 405B was the last time we had the best model.
And then Mistral took over for a little bit, if I recall correctly,
and then the Chinese labs have been dominating for the last, like, six, nine months, right?
So it'll be interesting.
It'll also be funny because, like, the open source model probably won't be the best for just regular chat.
because it is like more reasoning focused and all these things.
But it'll be really good at code, and I'm excited for that.
Yeah, like tool use, although that's like going to be confusing.
Like, how do you use the tools if you don't have access to OpenAI's tool use stuff,
but the model is trained to do so.
That'll be interesting for people to figure out.
I think the last thing is like the way they're rolling it out is really interesting.
They accidentally leaked all the weights,
but no one in the open source community figured out how to actually run inference on it,
because there's just some weird stuff in the model with the architecture,
like the 4-bit stuff and, like, the biases and all this other stuff.
But what's interesting is other companies drop the model weights and say, go,
make your own inference implementation.
But OpenAI is, like, actually, like, dropping the model weights and, like, all these custom
kernels for people to implement in inference.
So everyone has a very optimized inference stack day one.
And they work with partners on it too.
Yeah, working with partners on this.
But this is very interesting, because, like, when DeepSeek drops, it's like, well,
Together and Fireworks are like, yeah, we're the best at inference because we have all
these, like, people who are really good at low-level coding, whether it be, like, Fireworks with
all their, like, former PyTorch Meta people, or Together with, like, you know, Tri Dao and, you know,
Dan Fu and all these, like, super cracked, like, kernel people. They have, like, higher performance,
right? But in this case, like, OpenAI is releasing a lot of this stuff. So it's interesting
for the inference providers too. Like, how do they differentiate now? Yeah. I mean, my premise on this is
in the end, a lot of the model optimization performance layer is open source. And,
it's a commodity. And it will end up being like a fight at the infrastructure level, actually.
Interesting.
And so, you know, all of these inference providers, like as you mentioned, you know, fireworks and
together, base 10 and such, they compete on both dimensions. And the question is, what's going to
matter in the long term?
Why would these model-level software optimizations all be open? They haven't been open so far, and
the advancements are so fast, right? Well, I think a bunch of them have been partially
open. And I think OpenAI is also pushing for them to be open
as well, right? And so I think there's a lot of force in the ecosystem to open source from both
like, the Nvidia level up and from the model providers down.
And so I think today these providers all fight on that dimension. Yeah. And they also fight on
the infrastructure dimension. And I think infrastructure is going to end up being a bigger
differentiator. That makes sense. You can't open source your actual infrastructure, right? You just have
to have the network and you have to run it. Yeah, yeah. That makes a lot of sense. Although like I see
today the inference providers have such a wide variance, right? Like, the ones you mentioned are
on the leading edge. Especially, like, Together and Fireworks, I think, are on the leading
edge with, like, their own custom stacks all the way down. And then there's a lot of people who just
take the out-of-the-box open source software. Yeah, I think there's no market for that. But those guys have,
just, yeah, I agree, there's no market. It's, like, commoditized. Yeah, they have really, really way worse
margins than the people who are very optimized. You see Nvidia trying to open source all the
stuff around Dynamo, and OpenAI and all these other people are trying to open source stuff. But
the level of optimization is also, like, really, really large, like caching between turns
and caching tool use calls and all these other things. And it's not just, like, a single-server
problem. Like, the DeepSeek implementation of inference is, like, 160 GPUs or something
like that. Like, that's over $10 million of hardware. And then that's just one replica. And then
you'll have a lot of replicas, and you share the caching servers between them. So, like,
just the orchestration of that, but also the infrastructure of that, it's a very large amount
of infrastructure. I don't know. That's an interesting thought, that the optimization layer will be completely
commoditized. Well, I think that there's optimization at the single-node level,
and then there's, like, the system software where you can, like, orchestrate this. And I think
that owning the abstractions for it, and having people use your tools and more sophisticated
teams to do that optimization, it's a very ugly distributed systems problem. I think that
will matter. Okay. I could agree with that. I can agree with that. Single node is not necessarily.
Yeah. I agree.
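To make "caching between turns" concrete, here is a minimal sketch of prefix reuse: the KV state computed for a conversation so far is keyed by its token prefix, so the next turn only has to prefill the new tokens. The class and the dict-based store are illustrative assumptions, not any provider's actual implementation; production stacks such as vLLM hash fixed-size token blocks and keep the KV tensors on GPU.

```python
import hashlib

# Minimal sketch of prefix caching between chat turns (illustrative only).
class PrefixKVCache:
    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV state

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

    def longest_cached_prefix(self, token_ids):
        # Walk back from the full sequence until we hit a cached prefix.
        for end in range(len(token_ids), 0, -1):
            kv = self._store.get(self._key(token_ids[:end]))
            if kv is not None:
                return end, kv
        return 0, None

    def put(self, token_ids, kv_state):
        self._store[self._key(token_ids)] = kv_state

cache = PrefixKVCache()
turn1 = [1, 5, 9, 2]        # system prompt + first user message
cache.put(turn1, "KV(turn1)")
turn2 = turn1 + [7, 7, 3]   # the next turn re-sends the whole history
hit, _ = cache.longest_cached_prefix(turn2)
print(f"reuse {hit} cached tokens, prefill only {len(turn2) - hit}")
```

Sharing a store like this across many replicas, as in the DeepSeek-style deployment described above, is exactly the ugly distributed systems problem being discussed.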
Let's move, you know, out a layer. Like, what does
having access to an American open source model mean, or just more and more powerful, like,
open source AI models, mean for the application ecosystem?
I mean, I know, like, a lot of people and some enterprises are really iffy about, like,
using, like, the best open source model.
They're, like, worried.
It's like, there's nothing wrong with them today.
There's nothing in them today, right?
You know, there's the worry that one day they will.
How do you check?
I mean, you don't, but you can just vibes it out.
Like, they're, like, competing with each other to just release as fast as possible, right?
Like, like DeepSeek and Moonshot and all these other labs,
you know, Alibaba, et cetera, like, they're competing to release as fast as they can with each other.
The Alibaba teams in Singapore, like, I don't think that they're, like, putting Trojan horses in
these models, right? And, like, there's some interesting papers that Anthropic did on, like,
you know, trying to embed some stuff in models and ended up, like, being detectable pretty
easily. Again, like, I don't know how to, you know, I'm not, I'm not too much into that space
of interpretability and, like, evals, but I just don't think that they are, right? It's just a
vibes thing. But some people are worried that they could be or they're just like iffy. Like,
oh, I don't want to use a Chinese model. It's like, well, fine, but now you're going to go use
a service that is backed by a Chinese model, which is fine. Like, you know, like, but they, you know,
they're fine with that. They just don't want to directly use the model. I don't know. I think,
I think it's interesting for some enterprises who are still stuck on Llama, but it's mostly
just really interesting because it continues to move the commodity bar up. Now with this tier being
open source. And sure, it probably won't be, like, drastically better than Kimi.
But Kimi is so big, it's so difficult to run, like, people aren't running it, whereas the OpenAI
model is, like, relatively small, so you can run it without being, like, a gigabrain at infrastructure.
You end up with that commoditizing so much more of the closed source API market.
And I think that's just going to be great for adoption, right?
Yeah, one of my hopes is for our companies that are doing more with reasoning, it is like they're
still blocked on cost and latency.
So this is something that I've found very interesting is that we've been trying to build a lot of alternative data sources for token usage.
Who's using what tokens, what models, where, et cetera, why?
And it's very clear that people aren't actually using the reasoning models that much in the API.
Like, Anthropic has eclipsed OpenAI in API revenue, and their API revenue is primarily not thinking.
It's Claude 4, but it's not in the thinking mode.
You know, code being the biggest use case that's skyrocketing.
And the same applies to, like, OpenAI and DeepMind, from what we see querying big users
and other ways of like scraping alternative data because the latency issues, because the cost
issues especially, right?
The cost is just ridiculous.
Exactly.
So I guess my view is, you're not allowed to have a tech podcast without saying the words
Jevons paradox now.
And I think, like, the behavior is going to be, like, we'll see a lot more people use reasoning
because it's so much cheaper to run, if you take out a big piece of the margin layer and you make
it smaller.
And so I think, like, we have a lot of companies that are at scale who are using it, but it's so expensive that they restrain themselves.
For a long time, OpenAI was charging more per token for the reasoning models, right, o1 and o3, than they were for GPT-4o, even though the architecture is, like, basically the same.
It's just the weights are different.
And there's like some reason for it to be a little bit more expensive per token because the context length is on average longer.
But in general, like it made no sense for it to be like, was it like 4X the cost per token?
That didn't make any sense.
And then finally they, like, cut it.
But for a long time, not only was it, like, way more tokens outputted,
it was also a way higher price per token, and they were just taking that as margin.
Because they could, right?
Because they had the only thing out there.
Yeah.
And then, you know, DeepSeek dropped, and Anthropic and Google and others started releasing
models, and it, like, you know, commoditized quite a bit.
But this is going to just like kneecap, like cut everyone off at the hip, right?
And bring margins down again.
So that would be fun.
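To put rough numbers on that margin point: the premium compounded, price per token times tokens emitted. Here is a back-of-envelope sketch with made-up but representative figures; only the roughly 4x per-token multiple comes from the discussion above, everything else is an assumption.

```python
# Illustrative numbers only, not actual OpenAI pricing.
base_price = 10.0         # $ per 1M output tokens, non-reasoning model (assumed)
reasoning_multiple = 4.0  # the ~4x per-token premium discussed above
plain_tokens = 500        # a typical short answer (assumed)
reasoning_tokens = 5000   # same question with a long chain of thought (assumed)

plain_cost = plain_tokens / 1e6 * base_price
reasoning_cost = reasoning_tokens / 1e6 * base_price * reasoning_multiple
print(f"plain: ${plain_cost:.4f}, reasoning: ${reasoning_cost:.4f}, "
      f"{reasoning_cost / plain_cost:.0f}x per request")
```

With these assumptions, a reasoning request costs 40x the plain one, which is why cutting the per-token premium matters so much for adoption.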
Who has an API business, you mean?
Yeah, yeah.
For API, for models that aren't, like, super leading edge.
What do you think evolves in the sort of neocloud layer over time?
It's funny. Every day we still find a new neocloud. We have like 200 now. And still every day
we find new ones, right? Should they all exist? Obviously not, right? So to some extent,
it depends on what the neocloud business is. Like today, there is quite a bit of differentiation
between the neoclouds. It's not just like buy a GPU, put it in a data center. Otherwise,
you wouldn't have some neoclouds with like horrible utilization rate and you wouldn't have
some neoclouds who are, like, completely sold out on four-, five-, six-year contracts, right?
Like CoreWeave, for example.
Who doesn't even quote most startups? Or they just give them a stupid quote because they're just
like, I don't want your business. Or, like, they want a long-term contract, right? Which a lot of
people don't want to sign. And so like there's quite a bit of differentiation in financial
performance of these neoclouds, time to deploy, reliability, the software they're putting
on top, right? Like, many of them can't even install Slurm for you. It's like, what are you doing?
And you should have some sort of like- So very low-level hardware management.
Yeah, yeah. It's like very, and it's like to some extent,
from the investor side, we see a lot more debt and equity flowing in from the commercial real
estate folks. As commercial real estate has been really poor over the last couple years,
few years, they've been starting to pour money into cloud space. Obviously, the return profile is
quite different because it's like a short-lived asset versus like a longer-lived asset. But at the end of
the day, like these companies, they're okay with a 10, 15% return on equity, right? And over time,
that falling. That is not okay for venture capital, right? And yet a lot of these neoclouds are backed by
venture capital. So a lot of these companies will fail either because it no longer makes sense for them
to continue to get venture funding or they end up getting out competed because they just can't get
their utilization up, unlike, you know, some other clouds, right? Like the, uh,
CoreWeaves and Crusoes and such of the world, right? So there's sort of, like, a rock and a hard place for
a hundred of these neoclouds. And there's many of them who are like, oh no, I purchased these
GPUs. I have a loan. It costs me this much. And because my utilization is here, I'm,
like burning cash, right?
And they should at the very least not be burning cash, right?
And so some of them are like, you know, they're desperate to sell the remaining GPUs.
So they go out to like, you know, companies and give them insanely low deals.
There's some startups who I really commend because they like really figured out how to get
the desperate neoclouds to give them GPUs.
But those neoclouds are going to go bankrupt at some point because their cash flow is worse
than their debt payment.
But at the end of the day, like there's going to be a lot of consolidation.
There is going to be differentiation, right?
There's a lot of software today.
But we have this thing called ClusterMax where we review all the neoclouds and major clouds.
And it's like, like actually some of these neoclouds are better than Amazon and Google and Microsoft in terms of software.
In terms of uptime and availability or however you measure that.
Yeah, uptime availability, network performance.
There's just a variety of things that they don't have all the old baggage.
But the vast majority are worse.
And we measure across like a bunch of different metrics, including the ones I mentioned and security and so on and so forth.
But our vision of, like, ClusterMax is that it starts at, like, a really low stage today, which is, like, does the cloud work, and how long does it take the user to, like, get a workload running?
Because you have Slurm installed or you have K8s installed, and your network performance is good, or your reliability is good, and it's secure, right?
Like, these are, like, table stakes.
Like, what we consider gold or platinum tier today will be just, like, table stakes in, like, you know, six months, a year, a couple of years.
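As a sketch of what that day-one bar looks like in practice, here is the kind of minimal multi-GPU acceptance test a renter might run: time a large all-reduce and check the interconnect bandwidth is sane. This assumes PyTorch with NCCL and a launcher like torchrun; it is a toy health check, not SemiAnalysis's actual ClusterMax methodology.

```python
# Rough multi-GPU health check: time a big all-reduce, report bus bandwidth.
# Launch with e.g.: torchrun --nproc_per_node=8 allreduce_check.py
import time
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")  # rank/world size come from the launcher
    rank = dist.get_rank()
    # Single-node assumption; multi-node setups should use LOCAL_RANK instead.
    torch.cuda.set_device(rank % torch.cuda.device_count())
    x = torch.ones(256 * 1024 * 1024, device="cuda")  # 1 GiB of fp32

    for _ in range(3):  # warm up NCCL before timing
        dist.all_reduce(x)
    torch.cuda.synchronize()

    iters = 10
    t0 = time.time()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    per_iter = (time.time() - t0) / iters

    if rank == 0:
        n = dist.get_world_size()
        gib = x.numel() * 4 / 2**30
        # A ring all-reduce moves ~2*(n-1)/n of the buffer per GPU.
        busbw = gib * 2 * (n - 1) / n / per_iter
        print(f"all-reduce {gib:.0f} GiB in {per_iter * 1e3:.1f} ms "
              f"(~{busbw:.0f} GiB/s bus bandwidth)")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```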
There will be a whole layer of like software on top.
And then it's like, do neoclouds build this software, right?
And some of them are, right?
Like, Together, Nebius are offering inference services on top, right?
So they're saying, hey, we actually want to provide an API endpoint, not just rent GPUs.
And CoreWeave, rumored by The Information to be attempting to buy Fireworks, for the same reason, right?
Like, do you move up or do you just slide down into like, I'm making commercial real estate returns?
Or you have to go crazy, right?
Like Crusoe is like, we're going to build gigawatt data centers, right?
like, okay, there's no competition there. There's like a few companies doing that, right? So it's
very different. So either have to go like really, really big or you need to move into the
software layer or you just make commercial real estate or you go bankrupt, right? Like these are
the paths for all neoclouds, I think. I really have to believe there's a reason for being
for these companies. And my, like, simple framework for it is, I think the software layer is
really hard for people coming from this operational side to try and build, right? It's actually
a lot of very specialized software. So I think people will buy or partner into it.
But if you think about other inputs, it could be like, I'm very good at, like, finding and controlling power agreements.
Yeah.
It could be like, I build at scale.
Other people are incapable of doing so.
Yeah, yeah, which is like sort of what like.
Or like Nvidia wants me to exist, right?
I can't like think of like a lot of arguments beyond that.
And so I would agree with you, like eventually we're going to see consolidation either in this layer or, you know, commoditization by the inference providers.
But in the meantime, there is a lot of lunch to eat from
Amazon, who continues to charge, you know, really high margins, and Google and Microsoft, who continue to
charge, like, absurd margins for their compute, because they're just used to doing that in the
CPU world.
Yeah.
Right.
And so, like, their ROIC is, like, extremely high on CPU and storage.
And to assume that it can, like, translate over to GPUs is a bit of a fallacy, which is
why a lot of these companies are moving in, right?
And it's like, okay, in standard cloud, there's a lot more software that, like, people
can't just build out of nowhere. Yes, EC2 is a product that is, like, pretty simple,
but, like, block storage and all these other things are actually quite difficult to do at scale
well, like Amazon does. And that's what makes them able to charge this absurd margin
on standard compute. But now it's like, well, the cloud doesn't actually
create any software that the end user actually uses, right? It's like, sure, I need
Slurm or Kubernetes, but then I'm just using PyTorch, which is open source. And I'm using a bunch of
Nvidia software, maybe, which is open source.
I'm using a bunch of open source models.
I'm using vLLM and SGLang, which are open source.
It's like you just go down the list.
It's like there's actually no software that the cloud can provide to deserve the margins
that Amazon and Google's clouds do have today.
If you're just infrastructure provider.
I think that there is software that the cloud can provide.
Yes.
But the major clouds have not delivered that software.
Agree.
Agree.
Okay.
Same page.
Because it's really hard to do this stuff, right?
Like, there is no reason that every single startup needs to have, like, multiple
people dedicated to infra and, like, figuring out how to run models. And, like, their SLA, their reliability, is just
so low, right? Like, so many random AI SaaS providers, like, they have GPUs,
they have an open source model, it works great, except sometimes it fails and then it's down for eight
hours, and it's like, why? This shouldn't be a problem. It should be something you should just be
able to pay away. I mean, I feel like the multi-trillion dollar question that you have thought about
for perhaps longer than almost anyone else is like, what does it take to actually challenge
Nvidia, you know, asking for a friend, what would it take?
The, like, you know, simple way to put it is, like, it's a three-headed dragon, right?
Like, they're actually just really, really good at, you know, engineering
hardware and GPUs. Like, that is difficult.
They're really, really good at networking. And then, I would actually say
they're, like, okay at software, but everyone else is just terrible.
No one else is even close on software. And I guess in that argument,
you can say they're great at software, but, like, actually, like, you know,
installing Nvidia drivers is not, like, not always easy, right?
Well, there's great, and there's also just, like, well, there's like 20 years plus of work in the ecosystem, right?
Yeah.
There's today's capability and, like, usability and there's just, like, mass of, like, libraries.
Yeah, so I think Nvidia is really hard to take down because of those three reasons.
And it's like, okay, as a hardware provider, can I do the same thing as Nvidia and win?
No, they're an execution machine, and they have these three different pillars, right?
I'm sure they have a lot of margin, but, like, you have to do something different, right?
In the case of the hyperscalers, right, Google with TPUs, Amazon with Trainium, Meta with MTIA, they are making a bet of, I can actually do something pretty similar to Nvidia.
If you squint your eyes now, like, Blackwell and TPU are converging. Like, the Nvidia architecture and the TPU architecture, with, say, memory hierarchies and similar sizes of systolic arrays, it's actually not that different anymore. It's still quite different, right?
But hand-wave view, it's, like, pretty similar. And Trainium and TPUs are very similar
architecturally. The hyperscalers
are not doing anything crazy.
But that's okay, because they can just,
like, play the margin game.
That's fine.
But for a chip company to try and compete,
they must do something very unique.
Now, if you do something unique,
it's like, okay, all your energy is focused
on that one unique thing,
but on every other vector,
you're going to be worse.
Like, are you going to be there
on the latest process node as fast as Nvidia?
No. Okay, that's, like, 20, 30%, right,
on cost slash performance and power, right?
Are you going to be on the latest memory technology
as fast as Nvidia?
No, you'll be, like, a year behind.
Great.
Same penalty.
Are you going to be the same on networking?
No. Okay, you know, you just stack all these penalties up.
It's like, oh, wait, your unique thing can't just be like two to four X faster.
It has to be like way faster.
But then the problem is if you really look at it simplistically, right?
Like a flop is a flop, right?
Again, like this is super simple.
But, like, there is not a 10x you can get out of doing a standard von Neumann architecture
on efficiency of compute.
In which case, all of these things that Nvidia will engineer better than you,
because they have a team of 50 people working on, you know, just memory controllers and
HBM, and just, like, networking, or actually, like, thousands of people working on
networking, but, like, each of these things, do they just cut you by a thousand cuts?
And that's like, oh, actually, what would have been 5x faster is now only, like, 2x faster.
Plus, if I, like, misstep, I'm, like, six months behind and now the new chip is there, right?
And you're screwed.
Or supply chain, or, like, intrinsic, like, challenges with, okay, getting other people to
deploy it, or rack deployments.
Like literally in Amazon's most recent earnings, they said they're like chip architecture is not
aggressive.
Their rack architecture is very simple.
It's not that aggressive.
They're like, yeah, we have rack integration yield issues, which is why we've had,
which they like blamed their miss on AWS for their trading of not coming online fast
enough because of rack integration issues.
And when you look at the architecture, like we have an article on it.
It's like it's not like that crazy.
Like it's like what Google was doing like four or five years ago, right?
It's like, oh, wait, supply chain is hard.
And Amazon couldn't get everything in supply chain to work.
And so therefore they missed their AWS revenue by a few percent, right, which caused the whole stock market to freak out.
But it's like, there are so many things that can go wrong in hardware and the time scales are so long.
And then the last thing is that, like, model architecture is not stagnant.
If it was, Nvidia would have optimized for it.
But model architecture and hardware, right, hardware-software co-design, is the thing that matters, right?
And these two things, you can't just, like, look at one individually, right?
Like, there's a reason why Microsoft's hardware programs suck, right?
Because they don't understand models at all.
Right? Meta, their chips actually work for recommendation systems, and they're deployed
for recommendation systems, because they can do hardware-software co-design. Google is awesome
because they do hardware-software co-design. Why is AMD not catching up despite being
awesome at hardware engineering? Well, yeah, they're bad at networking, but also they suck at software
and they can't do hardware-software co-design. You know, there's, like, much deeper reasons you can
get into here, but you have to understand the hardware and the software, and they move in lockstep.
And whatever your optimization is doesn't end up working, right? So one example is all of the
first wave of AI hardware companies, right? Cerebras, Groq, SambaNova,
yeah, Graphcore. All of them made a very similar bet. Now, they were very different, right?
Some of these are architecturally pretty weird. Right, they're architecturally pretty
weird, but they made the same bet on memory versus compute, right? We're going to have more on-chip
memory and lower bandwidth off-chip, right? Because that was the trade-off they decided to
make. So all of them had way more on-chip memory than Nvidia, right? Nvidia, their on-chip
memory has not really grown much from A100 to H100 to Blackwell, right? It's up 30% in, like, three
generations, whereas these guys had, like, 10x the on-chip memory, right? All the way back in, like,
when they were competing with A100, or even the generation before. But that ended up being
a problem, because they were like, oh yeah, we can just run the model on the chip, right?
You can put all the weights on there. And then, you know, we'll be so much more
efficient. And then the models just got way too big, right? And Cerebras was like, oh, wait,
but our chip is huge. Oh, wait, but still the model's way too big to fit on it. This is, like,
very simple, right?
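The arithmetic behind that squeeze really is simple. Here is a rough sketch with approximate, illustrative capacities; check vendor specs for exact figures, and the model sizes are ballpark.

```python
# Why "fit the whole model in on-chip memory" stopped working.
# Capacities are approximate and illustrative, not exact vendor specs.
GIB = 2**30
onchip_sram = {
    "Nvidia H100": 0.05 * GIB,   # ~50 MB of L2/shared memory
    "Groq LPU": 0.23 * GIB,      # ~230 MB of SRAM per chip
    "Cerebras WSE-2": 40 * GIB,  # wafer-scale on-chip memory
}

def weight_bytes(params_billion, bytes_per_param=2):  # fp16/bf16 weights
    return params_billion * 1e9 * bytes_per_param

for model, b in [("Llama 70B", 70), ("Llama 405B", 405), ("Kimi ~1T", 1000)]:
    need = weight_bytes(b)
    print(f"{model}: ~{need / GIB:.0f} GiB of weights")
    for chip, cap in onchip_sram.items():
        note = "fits" if need <= cap else f"{need / cap:.0f}x over capacity"
        print(f"  {chip}: {note}")
```

Even the largest on-chip memory ever shipped is several times too small for a frontier model's weights alone, which is why the "model lives on the chip" bet broke.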
You know, the same thing's happening in the other direction, right? Like, some companies are like, oh, we're going to make our,
like, systolic array, your compute unit super, super, super, super large because, let's say,
Llama 70B is an 8K hidden dimension, and your batch and all that. Like, it's a pretty large matmul.
Oh, great. Okay, we'll make this chip. And then all of a sudden, all the models get
super, super sparse MoEs, right? Like, the hidden dimension of DeepSeek's models is, like, really
tiny, because they have a lot of experts, right? Instead of one large matmul, it's a bunch of small
ones you route between, right? And all of a sudden, like, if I made a really, really large hardware
unit, but I have all these small experts, how am I going to run it efficiently? You know, no one
really predicted that the models would go that way, but they ended up going that way.
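A toy way to see the shape problem: both layers below push tokens through FFN weights, but the MoE version turns one big matmul into many small routed ones, which a huge fixed compute unit struggles to keep busy. All sizes are made up for illustration.

```python
import numpy as np

# Toy shapes only; the point is the matmul geometry, not exact FLOPs.
batch, d_model = 32, 8192

# Dense FFN: one big matmul per layer, easy to tile onto a huge array.
w_dense = np.zeros((d_model, 4 * d_model), dtype=np.float16)
print("dense matmul:", (batch, d_model), "x", w_dense.shape)

# Sparse MoE: many small experts, only a few active per token after routing.
n_experts, active_per_token, d_expert = 64, 4, 1024
experts = np.zeros((n_experts, d_model, d_expert), dtype=np.float16)
tokens_per_expert = batch * active_per_token // n_experts  # assumes balance
print(f"MoE matmuls: {n_experts} of", (tokens_per_expert, d_model),
      "x", experts.shape[1:])
```

With a batch of 32, each expert sees only a couple of tokens per step, so a compute unit sized for one giant matmul sits mostly idle.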
And this is, like, actually the case with at least two of the AI hardware companies today.
I don't want to call them out, just because, you know, let's be
friendly. But, like, this is, like, clearly what's happening, right? So it's like,
you can make a decision, a hardware bet, that will actually be way better on today's
architectures, but then the architecture evolves. Or you go for the generality of, like, Nvidia's GPUs, or even,
like, TPUs and Trainium, which are more general as an architecture, but then it doesn't
beat Nvidia by that much, right?
In which case, they're just going to destroy you with their six months or a year ahead on
every technology, because they have more people working on it.
And their supply chain is better, right?
So it's kind of really tough to make the architecture bet and have the models not just
go in a different direction that no one predicted, because no one knows where models are headed, right?
Even, like, you know, you can get Greg Brockman, and he might, like, have, like, a good idea, but, like,
I'm sure he doesn't even know what models will look like in two years. So there's got to be a
level of generality, and it's hard to, like, hit that intersection properly. And so I'm very hopeful
people compete with Nvidia. I think it would be a lot more fun. There'd be a lot less margin
eaten up by the infra. There'd just be a lot more deployment of AI, potentially, if someone was able
to compete with Nvidia effectively. But Nvidia charges a lot of money because they're the best.
And, like, if there was something better, people would use it, but there isn't.
And it's just really hard to be better than them.
I mean, you have to give the first-gen AI hardware companies some credit, because they, like,
made a secular, correct decision about the workload.
But then the architectural decisions, like, ended up being hard to predict correctly, right?
Then you have the cycle of Nvidia innovation, which is really hard to compete with,
both hardware and also, as you said, supply chain issues.
Even just putting together servers is hard.
Yes.
I think the thing that you point out that, like, people oversimplified, maybe with the current generation of AI chip startups:
they're like, we're betting on transformers.
And it's a lot more complicated than that, in terms of workload at scale and continued evolution in model architecture.
And it's also not exposed to you if you're not working with the SOTA labs, like, from the beginning.
And then you can't make predictions, because nobody can make a lot of predictions right now.
It's very hard to, like, say, I'm going to be better at the workload
two years from now, in a very comfortable way, with no other changes happening.
Like, I can't make that bet right now.
Yeah, and it's, like, one of the interesting things about OpenAI's open source model.
It's, like, all their training pipelines, but on a quite boring architecture, right?
Like, it's not their crazy, like, cool architecture advantages that they have in their
closed source models, which make it better for long context, or more efficient KV cache, or all
these other things, right?
They're doing it on a standard model architecture that's publicly available.
They, like, intentionally made the decision to open source a model with a boring architecture that's pretty much open source, right, already.
Like, people have already done all these things. And they kept all the secrets internal that they wanted to keep.
And it's like, what's in there, right?
Are they even doing standard scaled dot-product attention?
Probably.
But, like, there's probably a lot of weird things they're doing which don't map directly to hardware.
Like you mentioned, right, like, transformer chip architecture, there's a lot more complicated here than just, like, oh, it's optimized for transformers,
because, like, so is an Nvidia chip, and a TPU, and their next generation is more optimized for it.
Like they take steps towards it.
They don't leap.
But as long as they're like close enough to where you are architecturally optimized for workload,
they'll beat you because of all the other reasons.
And I think your description of like how might a like a chip startup win or any vendor win by specializing.
Like that actually is really hard in this era.
Like generalization may continue to win to a degree.
And it happened with all the edge hardware companies too.
You know, we talk about the first gen AI hardware companies for Data Center.
there were a handful, but for the edge, there were like 40, 50.
And like, none of them are winning because it turns out the edge is just take a Qualcomm chip or an
Intel chip that's made for PC or smartphone and deploy it on the edge, right?
Like, that ended up being way more meaningful.
So it ends up being like the incumbents, they can take steps towards what you're going for.
And if you didn't execute perfectly or if the models didn't change the architecture away from
what you thought it would be, you end up failing.
If you had to make a bet that something becomes competitive, what is the
configuration or company type that does that?
I don't want to shill any company that I've invested in or anything like that.
And so, therefore, this is not investment advice.
No, no, no.
But, like, I would just say, like, I probably think that, like, AMD GPUs or Amazon's
Trainium will probably be more likely to be the best second choice for people, or Google TPU,
of course, but I think Google is just more interested in it for internal workloads.
I just think that those will be much more likely options to succeed than a chip hardware
startup. But, I mean, I really hope they do, because there's some really cool stuff they're doing.
If we zoom out to the macro and we think about just the scale of hardware and data center deployment
for these workloads, people talk a lot about the operational constraint on building data centers
of this size, the power constraints. I think in particular on the power side, it's very interesting
how that practically shows up. Is it generation at scale, at cost? Is it
grid issues? How should, you know, more people in technology understand this?
Yeah. So supply chain is always like fun because like people want to point at one thing is the
issue. But it always ends up being these things are so complicated. Like if one thing was
solved, you could increase production another 20 percent and then something else would be the
issue. You think it's a multi-bottleneck issue. Yeah. Or like, hey, for company A, it's actually
because their supply chain is this, this is the issue. And for company B, it's this is the issue.
But, you know, that's sort of in generalities.
But, like, I think zooming out, right, Noahpinion had a really fun blog about, like, is this AI hardware buildout going to cause a recession?
I think it's actually funny because you can flip the statement and be like, actually, the U.S. economy would not be growing that much this year.
if it weren't for all the AI buildouts. As a result of data center infrastructure buildouts, electricians' wages have soared.
As a result, power deployments and other capital investments, which have 15-30 year lifespans, are being made.
and all of this CAPEX is in turn actually growing the economy and like actually maybe the economy
wouldn't even be growing much or at all if it weren't for all of these investments.
One thing that is perhaps overlooked from the White House AI Action Plan was the view of, like,
we're going to build these AI data centers in the United States.
We're actually going to need like a lot of general investment beyond the GPUs and the power,
which are everybody's first two items into like labor, for example, right?
So if you just, you know, for simplicity's sake, say it's the size
of Manhattan, and we have to run it, and it's a new system with changing topology, and, like, a
very high degree of relatively novel hardware with failures. Yeah. And, like, lots of networking. It
kind of feels like we need to have a bunch of new capacity, like, from a labor or
robotics standpoint. In, like, '23, it was, like, very simple. It's like, Nvidia can't make enough chips. Oh,
okay, why can't Nvidia make enough chips? Oh, CoWoS, right? Chip-on-Wafer-on-Substrate packaging
technology. And it was like, oh, HBM, right? Like, those were, like, it was, like, very,
very simple in '23, '24. Like, yeah, all these tools involved in that supply chain.
It was great.
But then it, like, very quickly became much more murky, right?
Then I was like, oh, data centers are the issue.
Oh, okay, we'll just build a lot of data centers.
Oh, wait, substation equipment and transformers are the issue.
Oh, wait, power generation is the issue.
It's not like the other issues went away, right?
Like, actually, you know, CoWoS is still a bottleneck, and HBM is still a bottleneck.
Optical transceivers are still a bottleneck.
But so is power generation and data center physical real
estate, right? Like, I mentioned Meta is literally building these, like, temporary, like, tent
structures to put GPUs in because building the building takes too long. And it takes too much
labor, right? As you mentioned labor, right? That's like one way they were able to remove a part
of a constraint. They're still constrained on power and they had to delay the bring up of some
GPUs in Ohio, because AEP, the grid in Ohio, like, had some issues, right? The utility, right?
With, like, bringing on a generator or something, right? Oh, okay, great. Well, we'll buy our own
generators and put them on site. Oh, wait. Now there's an eight-year
backlog or whatever for GE's turbines.
Yeah.
Oh, okay, great.
I'm Elon.
I'm going to buy a power plant from overseas that's already existing.
You're going to move it in.
Okay, great.
Now there's like permits and people protesting against me in Memphis.
Like, you know, there's like, there's like a bazillion things that can go wrong.
And labor is a huge one.
I've literally had people in pitches be like, no, no, no.
We've already booked all the contractors.
So no one else is going to be able to build a data center in this entire area of this
magnitude besides us.
Because we took all the people.
We took all the people, they're going to have to fly them in.
But it's like, okay, fine.
Like, you can fly them in, but it's like, there's just, like, not that many electricians in America.
And as a result, we've seen the wages rise a lot for people building data center infra.
There's a group of, like, these Russian guys who used to work for Yandex, Russia's search engine, who, like, wire up data centers who now live in America and they get paid a ton.
Like, and they get paid bonuses for being faster.
And therefore, they do, like, certain drugs to be able to finish the buildouts faster.
Because they get bonuses based on how fast they build it, right?
Like, it's like, there is crazy stuff going on to alleviate bottlenecks, but it's like,
there's bottlenecks everywhere.
And it really just takes a really, really hyper-competent organization tackling each of these
things and creatively thinking about each of these things.
Because if you do it the lame old way, you're going to lose. You're going to, like,
you're going to be too slow, right?
Which is partially why Microsoft is not building Stargate for
OpenAI, right?
It's because it would have just been too slow, and they're doing it the lame old way.
You have to go crazy.
You have to go.
That's why Microsoft rents from CoreWeave a ton, right?
Because, oh, wait, we need someone who can do things faster than us.
And, oh, look, CoreWeave is doing it faster.
And now, like, you know, OpenAI is, like, going to Oracle and CoreWeave and others, right, Nscale
in Finland, and all these other companies all around the world, the Middle East, right, G42,
like, anywhere and everywhere they can get compute, because you put your eggs in many baskets
and whoever executes the best will win.
And this infrastructure is very, very hard.
Software is, like, fast turnaround times. Like, you know, it's still hard.
Software's not easy, but the cycle time is very fast for, like, try something,
fail, right? Try something else. It is not for infra, right? Like, what has xAI actually done to
deserve their prior funding rounds? They haven't released a leading edge model, right? And yet their
valuation is higher than Anthropic's today, right? At least, you know, Anthropic's raising, but whatever,
right? Like, A, it's Elon. And B, they've tackled a problem creatively and done it way faster than
anyone else, which is building Colossus, right? Like, and that's, like, commendable, because that is
part of the equation of being the best at models, right? Yeah, besides the talent. Yeah. And Elon is,
like, known for being able to get talent. So it's like, there's so much complicated stuff on the
infra that, you know, it'd be nice to say there's one thing. But yeah, like the White House
action plan lists a lot of things. But I want, like, you know, how do we concretely, like, solve
the talent issue? It's like, there's not enough people in trade school. The pay will go up, and that'll
help, but the timescales on that are too slow. Like, do we somehow import labor, right? That's how
the Middle East is building all their data centers. They're just importing labor. Or is there
something more intelligent we can do? Robotics, right? I just realized today, you told me just
now, like, a company I angel invested in, you led the round, right? Like, it's really
cool for data center automation, right? Like, there's all sorts of, like, interesting
problems on the infra layer that could be tackled and tackled creatively.
Speaking of, like, the policy and geopolitics implication here, like, what do you think
about the, you know, White House implication that America needs to, like, export the AI stack
or, like, needs to control important components of it? Like, it's better for us to be exporting
Nvidia chips than to foster a new industry, it's better for us to have, like, a globally
leading open source model, et cetera. Like what actually makes sense to you there? I want to tell
a crazy story. I was in Lebanon for a week. This is a good start. Yeah, this is completely
unrelated, but it just popped in my head. I think it'll be entertaining. I was in Lebanon. I was
with a few of my friends. So it was like two Indian people, two Chinese people and then a Lebanese
person, right? And these, like, 12-year-old girls came right up to the Chinese woman that was with us,
like, my friend. And they were like, oh my God, your skin's so beautiful. Do you like sushi? Right?
It's like fine. You're just ignorant. But what was really interesting is like when they asked
where we're from, we're like, San Francisco. They're like, do people get shot in the streets?
Because their entire worldview of politics is built from TikTok. Okay. And it's like, when you think
about the global propaganda machine that is Hollywood and it's not intentional. It's just American
media is pervasive. It built such a positive image of America. Now like with monoculture broken
and it's more social media based. A lot of the world thinks America is like people are getting shot all
time. It's, like, really bad, and it's, like, bad lives and people are working all the
time. It's unsafe. And, like, you know, like, Europe has a certain view of America.
And, like, I don't think it's accurate. Like, random Lebanese, 12-year-old had a really negative
view of some, like, they liked America. They loved Target for some reason because some
influencers posted TikToks about Target, but, like, they had negative views of America.
It's like, from a sense of, like, what is important is, like, the world should still run on
American technology, right? And they generally do still in terms of the web, although, you know,
TikTok has broken that to a large degree. But in this next age, do you want them to run on Chinese
models, which now have Chinese values, which then spread Chinese values to the world? Or do you want
them to have American models, have American values? Like you talk to Claude and it has a
worldview, right? And it's like, I don't know if you want to call that propaganda or what.
There's a worldview that you're pushing, right? And so I think it makes sense that we need
that worldview espouse. Now, how do you do that, right? The prior administration,
current administration had different viewpoints on this, right? Prior administration said,
yes, we would love for the whole world to use our chips, but it has to be run by American
companies. And so it was like Microsoft, Oracle, we're cool with you building shitloads
of capacity in Malaysia. We don't want random other companies doing it in Malaysia. So the prior
diffusion rule had a lot of technical ways in which like, you know, you could be, you can
have these like licenses and all this. It was very hard for like random small companies to build
large GPU clusters, right? But it was very easy for Microsoft and Oracle to do it in
Malaysia. Of course, the current administration tore that up, and they have their own view on
things. I mean, I think there was a lot of things wrong with the diffusion rules, right?
They were just too complicated. They pissed a lot of people off, et cetera. Now they have a different
view, which is like, what did they do in the Middle East, right? With the deal they signed.
Well, actually, most of those GPUs are being operated by American companies or rented to
American companies, right? Either or, right? Like G42 operating them, but renting them mostly
to like Open AI and such for a large part. Or Amazon and Oracle and others are operating the
GPUs themselves in the Middle East. So it's like, okay, that's effectively the same thing,
but in a very different way. That is still, I think, a view, right? Which is like, we want America
to be as high in the value stack as possible, right? If we can sell tokens or if we can sell
services, we should. Okay, but if we can't sell the service, let's at least sell them tokens.
Okay, we can't sell them tokens, at least sell them like infra, right? Whether it'd be
data centers, or renting GPUs, or just the GPUs physically. And it sort of makes
sense, right? In the value chain, like, give them the highest-value, highest-margin thing where we
capture most of the value, and, like, squeeze it down to where, like, actually, for, like, the
bottom of the stack, right, like, the tools to make chips, maybe you shouldn't sell. And so, like,
current export controls and policy dictate that, yes, you know, it's better to sell them services,
but sell them both, right? Like, give the option. Let us compete, and don't let anyone else win.
I think the challenge here is, like, how much are you enabling China by selling them
our GPUs? Like, how much fear-mongering around, like, Huawei's production capacity is
there? Like, how realistic is it versus not, because of the bottlenecks of, like, Korea sanctions
that America's made Korea put on China for memory, or Taiwan on China for chips, or, you know,
U.S. equipment on China, right? Like, there's a lot of different sanctions. Many of these are not
well-enforced slash have holes, but it's sort of, like, a very difficult argument on, like,
how much capacity of GPUs should be sold to China.
A lot of people in San Francisco are, frankly, like, don't sell China any GPUs.
But then they cut off rare earth minerals and, you know, like,
ostensibly most people think that like the deal was that you get,
you get GPUs and also EDA software because the administration banned EDA software
for a little bit, just for like a few weeks, basically,
until China was like, okay, we'll ship rare earth minerals.
You can't just ban everything because China can retaliate.
If they banned rare earth minerals and magnets and such,
car factories in America would have shut down
and the entire supply chain there would have had,
like, hundreds of thousands of people not working,
right? Like, you know, there is a push and pull here. Yeah, there is a push and pull here.
So, like, do I think China should just have
the best Nvidia GPUs? No. Like, that would suck. But, like, you know, can you give
them no GPUs? No, they're going to retaliate.
Like, there is a middle ground. And, like,
Huawei is eventually going to have
a lot of production capacity, but there's ways
to slow them down, right? Like, properly ban the equipment,
because there's a lot of loopholes there.
Properly ban the subcomponents, like, of memory and wafers, because Huawei is still getting,
you know, wafers in Taiwan from TSMC through, like, shell companies, right? Like, you know,
there's a lot of enforcement challenges, because parts of the government are not, like, funded properly,
or not competent enough, and have never been competent, right? So it's like, how do you work within this
framework? Well, like, okay, fine, we should sell them some GPUs, so that, you know, that kind of slows
them down from a Huawei standpoint, although not really, right? But also, like, gets us back
the rare earth minerals. But don't sell them too many, right? Like, how do you find that
massive gray line? That is what the administration's grappling with, in my view.
Implied in that opinion is your belief that they are going to be able to build
Nvidia-equivalent GPUs eventually, if forced. Maybe not equivalent.
Sorry, price performance competitive. There's, like, interesting things here, right? Like,
if China has a chip that consumes 3x the power? But they have 4x the power. But they have 4x
the power then. Yeah, like, who cares, right? Like, you know, obviously there's a lot of supply chain
challenges with building that. And it's like, hey, maybe it's on N-minus-two technology. It's on five-year-old
technology or four-year-old technology. Great. And it only consumes, like, 3x the power, because they
are able to do a lot of software optimization, architecture optimization, et cetera. They end up
with something that maybe costs a little bit more. But, like, what do you think about the value
of a GPU today, right? Like, you know, the GPUs dominate the cost of everything. But over time,
services will be built on top that have high margin, right? And you can go look at Anthropic or
OpenAI fundraising docs and, like, see that their API margins are good. API margins are
nothing compared to what service margins will be for people who use these APIs to build
services. And that's nothing compared to the, like, net good to the economy from how much
automation can happen and how much increased economic activity there is. So this is the argument
of, like, okay, even if their chips cost 3x as much, do you subsidize that? They can
subsidize that rationally, because the end goal is, like, oh, wait, actually, we can
deploy a lot of Chinese AI and make money and gather data.
Because people are sending us their like prompts and all their databases and all this stuff
to our models controlled by our companies, et cetera, right?
Like plus we're just making money off of it.
And they've done this in other industries, right?
They rationally subsidized like solar and now no one can even compete on solar or EV.
And it's like very close to no one can compete on EVs even, right?
Besides like Tesla really.
And even Tesla is adopting a lot of like Chinese supply chain, right?
It is rational to say you want America to have more AI prowess around the world,
you know, so that a random child in Lebanon doesn't think America is, like, bad, or they're using
American products more than Chinese products. But, like, how you get there is very difficult.
And it's a hard thread to weed. Thread. You got it. I don't croquet, you know.
Oh, my God. Crochet.
Crochet. You clearly don't.
Croquet is the game.
I want to ask you, like, a wild card question to finish out. We're trying to get Mark to do
the podcast.
Zuck.
Yes. You can ask him any question.
What would you ask? Mark, you've got to do the podcast.
I thought like the, did you read the page they put up?
I thought that was very interesting that they were like, we want AI to be your companion.
So my question to him is not, like, around his infra stuff, because I feel like I know most everything.
Like you can figure that stuff out from supply chain and like satellites and all this stuff.
But like the interesting thing I'm curious about is philosophically.
What exactly like does the world look like if everyone is talking to AIs more than other people
or if they're interacting socially with the AIs more than other people?
Do we lose our human element?
do we lose our human connection?
It's not the same thing as, hey, I'm posting on social media and we're interacting
with our social media posts, which that already breaks the brain of a lot of people.
What happens when it's, like, always on your face? Like, Meta, you know, his worldview is, like,
Meta Reality Labs makes these, like, devices that you wear, and they're always on, they have all this
AI on them, and you're talking to the AI companion all the time.
How does that change the human psyche?
Like, this human-machine evolution, what are the negative ramifications of it?
What are the positive ramifications? How are you going to make sure that there's more
positive ramifications from this than, like, you know, the sloppification and, like, complete brain rot
of, like, our youth, right? Which, I, like, love my brain rot, right? Like, it's like, okay. Obviously
the coding wars continue to be, like, very central. And we were talking about Cognition's relevance
and, like, how to think about the strategy here. But I do think it's really funny what flipped
your bit on Cognition. Can you tell the story?
I thought Cognition, NGMI, right?
Like, you know, like OpenAI, Anthropic, xAI, etc.
They're just going to make better code models.
Like, you know, they just have way more resources.
General models will win.
You know, I hadn't really met too many people.
It was just, like, a pure vibes-based thing.
And I, you know, I'd used a little bit of Devin, but I was like, whatever, right?
Like, it was like, Claude Code seems better.
And we use that internally.
But, like, I went to Coatue's East Meets West event.
It's an awesome event where there's people from Asia.
Like, there was like, you know, all these like CFOs and CEOs of like major Chinese
companies.
East Coast of U.S., all these finance bros, also West Coast, like a lot of tech people, right?
So you and I were both there.
There were people from governments and major companies.
And Scott was there.
I spoke with them, like, very briefly.
But then what was interesting is like, it's like, you know, they have a poker night one night.
And everyone gets blasted.
The, like, leader of Coatue is, like, very good at poker.
These hedge fund guys are just good at poker generally.
And I love, like, poker as well.
Yeah, it was a big poker culture in the Bay.
I was playing.
I'm okay.
Right.
But I see, I look over
at the super-high-stakes table, and Scott's just dominating everyone, right?
I'm like, what is going on?
Like, how are you, like, taking chips from, like,
the CEO of a major Chinese company?
I don't want to name people's names because I think there's like some terms around
them like naming who's there.
But like, you know, it's like, you're like winning like a lot of chips
from a lot of big people.
And it's like all of a sudden my vibes were like, I don't know,
maybe like maybe he can.
Maybe he can take from the lion, you know?
So I was like very excited about that.
You know, I thought it was funny.
I still have zero, like, I have not done
much due diligence on their code product.
Like, you know, nor have I on, like,
Claude Code, besides the fact that we use it.
But it's like, you know, cool.
Well, I think Windsurf acquisition part 2 is, like, a pretty good hand to play here.
And, you know, as somebody who invests a lot at a, you know, violently competitive application level.
Yeah.
Poker game is live, man.
Everybody there, you just invest in live players.
Exactly.
And it's, I just loved that, you know, that was how he, uh, he dominated everyone.
And it's, like, such a stupid reason, because I pride myself on being analytical and, like, data-driven.
And it's like, you know, vibes.
Correct. For any entrepreneurs listening,
I think, like, you know, Dylan might angel invest, or we might back you fully, if you win the Cognition poker game.
And we'll host it at Conviction.
Okay, if we got it.
Good.
Awesome.
Yeah, thank you.
Find us on Twitter at NoPriorsPod.
Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.