This Week in Startups - CoreWeave’s Brannin McBee on the future of AI infrastructure, GPU economics, & data centers | E1925

Starting point is 00:00:00 You look at the existing cloud infrastructure that was built over the last decade. It was built for serializable workloads. It wasn't built for parallelizable workloads. And it's like you're having to rebuild the cloud, so to say. And you're having to rebuild physical infrastructure at the pace of AI software adoption. It's a mind-blowing concept, right? Because AI software is being adopted at the most rapid scale of any technology that we've ever observed.

Starting point is 00:00:23 I mean, we're building at, I think it's 28 data centers this year across North America. We're one of the largest operators of this infrastructure in the world. And we are unable to keep up with demand. And we really don't see that subsiding for years to come. This Week in Startups is brought to you by OpenPhone. Create business phone numbers for you and your team that work through an app on your smartphone or desktop. TWIST listeners can get an extra 20% off any plan for your first six months at openphone.com slash twist.

Starting point is 00:01:01 Gusto is easy online payroll, benefits, and HR built for modern small businesses. Get three months free when you run your first payroll at gusto.com slash twist. And Northwest Registered Agent. When starting your business, it's important to use a service that will actually help you. Northwest Registered Agent is that service. They'll form your company fast, give you the documents you need to open a business bank account, and even provide you with mail scanning and a business address to keep your personal privacy intact. Visit northwestregisteredagent.com slash twist to get a 60% discount on your next LLC.

Starting point is 00:01:47 All right, everybody, welcome back to This Week in Startups. We've got a great guest for you today. You may have been wondering, who's buying all of these Nvidia H100s? And how are people getting access to all of this hardware? Well, there's a couple of companies that got to the hosting of AI and GPUs early. One of those companies is CoreWeave. They started in the space, I believe, doing a lot of crypto, where miners were renting GPUs from their cluster.

Starting point is 00:02:18 And fortune favors the bold. They were in the catbird seat when the AI revolution happened and everybody decided, well, they've got to train their own models, they're going to need a bunch of Nvidia's hardware and other people's hardware, and we'll talk about that today. And they've since grown to a massive scale. For those of you who don't know,

Starting point is 00:02:41 Nvidia's market cap has increased more than 7x from $300 billion to $2.3 trillion, if you've been living under a rock and you haven't been watching this. It's because people want access to these chips. Welcome to the program, Brannin McBee, who is the CDO and co-founder. What does CDO stand for? Chief Development Officer. So my role is raising capital for the business.

Starting point is 00:03:04 I interface with equity and debt participants for the company and help fuel the growth of the business. And this is a very capital intensive business. You spent a lot of money on GPUs and setting up infrastructure. The company's been around for just under a decade. Am I correct? Yes. We founded the company in 2018. Got it. Am I also correct that you were supplying GPUs largely to crypto and Bitcoin miners and this cohort of individuals? Or that was the beachhead market?

Starting point is 00:03:38 It was absolutely the beachhead market. So, you know, I can take a couple steps back on our founding story. So it's myself and my two co-founders. We're from the institutional commodity trading sector. So we're risk managers by background. We're from hedge funds. Finance guys is probably the best way to look at it, but we were finance guys who were heavily data-oriented, right? We worked in this commodity sector that you can actually solve for price. You can

Starting point is 00:04:06 figure out supply and demand. And there is a dollar per barrel of oil, so to say, that solves market dynamics. So we've always worked with a lot of compute. We've worked with a lot of software. And the crypto space was interesting because it was an arbitrage opportunity, right? There's a very discrete input price, cost of power, and you could model the revenue very efficiently because there were no customers, right? You were just participating in this network, and thus, if you were just to sell the revenue from the crypto mining proceeds every day, you could effectively qualify it as an arbitrage opportunity. And that was interesting to us, but it wasn't as compelling as, you know, a large business because at the end of the day, all you're going to do

Starting point is 00:04:58 is chase the power or price of power lower, right? Like, that's the only advantage you can really extract unless you expand into other markets. Right. So when we were looking at the cryptocurrency space, it wasn't Bitcoin mining that we were interested in. It was Ethereum mining and GPU-oriented mining, because Bitcoin miners, ASICs, things that are produced by entities like Bitmain, they can only do that one thing. They can only participate in Bitcoin, and they're very good at it. But a GPU, well, it can do lots of things, including running AI workloads. So we started in the crypto space, but it was always with this idea, and we had no idea how complicated an idea it was at the time, but we started with the idea that, well, you could do crypto

Starting point is 00:05:48 and other things. Right. Started there. The other use of these is, I guess, running video games in the cloud. Is that correct? Yes. Did cloud video games become a real market, or do people who are into video games just buy themselves an Alienware, Dell, whatever, and be done with it?

Starting point is 00:06:07 I think it's more the latter that they use it more for that. We certainly don't see that demand for video game streaming. I think a few of the hyperscalers tried to launch into that market, and I don't believe that there has been substantial demand. Got it. And so when we look at crypto, that market kind of fizzled right as AI was starting to boom. So you were able to sort of just navigate that? Or is the crypto still going on?

Starting point is 00:06:37 And people are still using your services for Ethereum and, you know, being part of that network? Or is that just too hard of an arbitrage now because people in China have stolen electricity, we hear, or their friend runs the hydro dam so they run an extension cord, so to speak, over to their warehouse with a bunch of servers in it. And you're up against people getting zero cost of input electricity. It's a great question. And yes, I'm extremely excited that we're not involved in that cryptocurrency market anymore. We haven't been involved for a number of years at this point. We actually started making this transition into the cloud infrastructure market in 2019. We hired Peter Salanki, who was recently elevated to our chief technology officer position, in 2019 to build a cloud for us.

Starting point is 00:07:26 And it's not just plugging in GPUs and having users come access. It's this really complicated software stack that runs the cloud. Or in other words, it's an orchestration environment that enables users to access and use our infrastructure. And it's one that's very different than the way that the hyperscalers built it, because they built for hosting websites and storing data lakes. And we built our cloud from a no-compromises engineering solution for running AI workloads and highly parallelizable workloads. And there's engineering decisions you make in doing that that you wouldn't make for hosting websites. And that's allowed for us to, I would say, overperform in market with a product

Starting point is 00:08:16 that really doesn't have competition. We started in 2019. Yeah, the software layer to provision these H100s, A100s, whatever people are using, that's a key part of the puzzle that you have to build, you have to master. And AWS, Google Cloud, Azure, everybody, they're all slightly different in using their own provisioning software, or is there some open source standard there for doing all that? That's exactly correct. You know, we use and contribute to open

Starting point is 00:08:46 source as much as we can, but we have a proprietary orchestration solution that looks different than what the hyperscalers do. My favorite analogy for this actually comes out of the automobile sector, where at the end of the day, everyone produces vehicles the same way, right? From research, design, scaling, servicing, it's the same sort of product with different badges and different colors on it. Right. And it's been that way for 60 plus years.

Starting point is 00:09:16 And then in the 2000s, a company came along and said, well, what if we started with a blank slate and designed this process today? And, you know, ultimately Ford might have to produce vehicles like Tesla does, but I think we can all appreciate the foundational difference in the way that those vehicles are brought to market. And the challenges that Ford will have to go through to get there. And I think it's a lot of the same for the hyperscalers, right? I'm not going to tell you a trillion dollar company with tens of thousands of engineers can't do what we do. But I will highlight the innovator's dilemma that sits there because there's an existing product. You have to, you know, change everything underneath to run infrastructure like we do. And it's going to be hard to get there.

Starting point is 00:10:09 And if we go through the economics of, let's just say, an H100. This is Nvidia's state of the art. It's not just a GPU, it's a rack, essentially. It's a platform to put many GPUs on. I'm not sure exactly how many the H100 holds, but it holds a number of GPUs. They go for like 30, 40 grand each, is my understanding. So it's a server or node, as we call it,

Starting point is 00:10:35 within a network fabric. And a server has typically eight GPUs within it. And then you put those into a cabinet and you put those into a data center and bring power into the data center and you connect it to the internet. Right. And then, yes, that price range was accurate for each one of those GPUs in a server. So a server can cost upwards of a quarter million dollars.

Starting point is 00:11:00 And so you rent out one of those H100 GPUs, an individual GPU in a server like that, a cluster, a node for four bucks an hour, something to that effect, yeah? Yeah. Yeah, that's right. Where we specialize, though, is doing it at scale. Right. Like, we don't have many clients who just use one at a time. Our clients will use 10,000 at a time in a single contiguous fabric, which makes it a supercomputer.

Starting point is 00:11:31 And it's interesting, like, this has actually become where we operate some of the world's largest supercomputers at this point. I think, you know, several of the top 10 now sit on our platform because of how large these fabrics are and how performant these GPUs are at these specific tasks. So somebody fires up 10,000 of those. I'm assuming they get some kind of volume discount. So if it was $2 or $3 an hour, they're spending $20,000, $30,000 an hour on a job at one of those, correct? Something in that range?
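
As a rough illustration of the arithmetic in this exchange, the sketch below multiplies a per-GPU hourly rate by the number of GPUs in a cluster. The $2 to $4 rates and the 10,000-GPU count come from the conversation; the function name and the flat-rate assumption are illustrative only, and real pricing at this scale may not follow a simple per-unit rate.

```python
def cluster_cost_per_hour(gpu_count: int, rate_per_gpu_hour: float) -> float:
    """Hourly rental cost of a cluster, assuming one flat rate per GPU-hour."""
    return gpu_count * rate_per_gpu_hour


GPU_COUNT = 10_000  # cluster size mentioned in the conversation

# Rates per GPU-hour floated in the conversation (assumed flat, no tiering).
for rate in (2.0, 3.0, 4.0):
    cost = cluster_cost_per_hour(GPU_COUNT, rate)
    print(f"${rate:.2f}/GPU-hour x {GPU_COUNT:,} GPUs = ${cost:,.0f} per hour")
```

At these rates a 10,000-GPU job lands in the tens of thousands of dollars per hour, which is the range the host is estimating.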

Starting point is 00:12:03 Yes, but I will correct the discount. It actually works inverse, right? It's extremely surprising, but it's because building a single fabric of this size is so engineering intensive that not many people are able to do it. Right? Like, there's not a template. Not a lot of companies have gone out and done it.

Starting point is 00:12:25 It's actually, you know, maybe three or four on the planet who are actually building fabrics of this scale. So you actually make a more scarce resource through scaling. You're kind of like decommoditizing the market through scale. Like it's not one GPU or 10,000 GPUs. It's, oh, 10,000 GPUs, that's a totally different engineering solution. Juggling multiple devices and apps to run your business is a mess.

Starting point is 00:12:53 OpenPhone is here to make it simple by simplifying your business communications with one easy-to-use app. OpenPhone has rethought every detail of what a modern business phone should be. And here is the magic. It works through a beautiful, elegant app on your phone or you can just use it on your desktop, making it super easy to get a business phone number for your entire team. You know how brilliant OpenPhone is? My teams use it every single day. My sales team loves it. My ops team, they use it all day long. And here's the features that we love. You can create a shared phone number like customer support with multiple employees fielding all the calls and all the texts

Starting point is 00:13:30 to that one number. At my investment firm, Launch, we pride ourselves on replying to every single call or email instantly. And OpenPhone is the number one rated business phone on G2 for customer satisfaction. So here's your call to action. Super easy. OpenPhone is already affordable. Starts at just 13 bucks a month, but TWIST listeners get an extra 20% off any plan for the first six months at openphone.com slash twist. And if you have existing numbers with other services, no problem. OpenPhone's going to port them over easy, peasy, lemon squeezy, no extra cost. Head over to openphone.com slash twist to start your free trial and get 20% off. And how much of that cost, when we look at a $4 an hour cost, would you say energy is? Because

Starting point is 00:14:10 these things are extremely energy reliant, right? They consume a lot of energy. So I'm curious how much of all this is energy, and then where do you put your, this will take us down the energy rabbit hole, but where do you put your data centers? And then where are data centers going to be in the future, because these GPUs are taking a multiple of what CPUs would use.

Starting point is 00:14:31 Yeah. They do. So maybe you could explain that to us. Yeah. It actually causes a pretty substantial bottleneck that exists in the market now is data center capacity, right? And it's not square footage of data center. It's data centers that have enough power brought into them and adding more power to those data centers.

Starting point is 00:14:51 And arguably, that's where the next bottleneck in this cloud infrastructure, or GPU cloud infrastructure, market sits: how do we access enough data center space to accommodate the volume of demand that's coming in? So power, you know, it's roughly about 10% of our cost to deliver this infrastructure. The infrastructure itself is actually where most of the cost sits from a depreciation perspective. We depreciate over a six-year life on the infrastructure. The power side, that's my background, right? It was in power markets and trading different electricity markets. So it consumes a lot of power.
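
To make the depreciation point concrete, here is a minimal sketch that spreads a server's purchase price over a six-year life and expresses it per GPU-hour. The roughly $250,000 server price, eight GPUs per server, and six-year life come from the conversation; straight-line depreciation, round-the-clock operation, and counting only the server itself (no networking, facility, or power) are simplifying assumptions made here, not figures from the guest.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def depreciation_per_gpu_hour(server_price: float,
                              gpus_per_server: int,
                              life_years: float) -> float:
    """Straight-line depreciation of one server, expressed per GPU-hour.

    Assumes the hardware is in service every hour of its life.
    """
    total_gpu_hours = gpus_per_server * life_years * HOURS_PER_YEAR
    return server_price / total_gpu_hours


per_hour = depreciation_per_gpu_hour(server_price=250_000,
                                     gpus_per_server=8,
                                     life_years=6)
print(f"Depreciation: ${per_hour:.2f} per GPU-hour")
```

Under these assumptions the server alone depreciates at well under a dollar per GPU-hour; lower utilization or a shorter useful life would raise that number.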

Starting point is 00:15:33 It has an immense amount of efficiency over CPU infrastructure as well for what it's doing there, to run the same workloads on GPU versus CPU, it's actually more power efficient to run it on a GPU, if you're trying to achieve the same outcome, because you'd have to use so many CPU cores to get to the same solution. So, yes, they consume more power on a density basis, but on a workload basis, they're more efficient. And this is, I mean, it's staggering.

Starting point is 00:16:06 I was looking at one study that said, like, one of these GPUs at 60, 70% capacity uses, like, the average American household's energy consumption, and that's just one of them. So this would be the equivalent of like, if somebody's using 10,000 of these, or I think Zuckerberg's going to be using low millions of these, it's like putting a million households online

Starting point is 00:16:26 or something to that effect, yeah? So. Yes, it's an immense amount, but it's also a, you know, it's a transformational technology. I'd say that we're looking at, and its ability to unlock value from data is something that we've never observed before.

Starting point is 00:16:44 Yeah, so that speaks to whether it's justifiable. Like, yeah, I mean, I'm not even looking at this judgmentally. Like, is it worth the amount of energy it's consuming? I was looking pragmatically where, let's assume it is worth it. Where does it come from? Yeah, let's say it's going to cure cancer. It's going to find solutions for renewables or fusion that like we didn't even conceive of or the gains from it will be so extraordinary.

Starting point is 00:17:09 It will obviously pay for itself and create an energy independent future. But what's happening in the industry today as people are buying these and looking for places to store them, you're looking to build up your infrastructure? Are we just out of energy? And where are the nooks and crannies where people are looking to locate these facilities? I heard that nuclear power plants are going to become like a place where people put these, the plan is to put a nuclear power plant and these GPU data centers next to each other. Yeah.

Starting point is 00:17:40 Is there any truth to that? Yeah, look, I believe that was Microsoft or Amazon effectively taking that nuclear plant and siting a data center next to it. Power, I think, beyond data center space, right, is a national concern, so to say. There's been an immense amount of building out renewable capacity over the past decade, which is fantastic, but it's also not necessarily the right kind of capacity, what you need for consistent demand growth, right? As you know, solar works when the sun's out, wind works when the wind blows. Neither of those things work for a data center or even necessarily for electric vehicles, for all these, like, kind of demand areas. We need more base load power.

Starting point is 00:18:27 That's traditionally come from coal and natural gas. Fortunately, it's been more so from natural gas over the last decade, because coal is quite dirty from an emissions perspective. And my personal hope is that it's more nuclear going forward. But it takes time to build nuclear sites. Right. So I think it's a decade for, you know, siting and build in the United States. And we haven't built one in a long time. I mean, I think the last one broke ground in the late 60s or early 70s, and we haven't had one since. So this would lead one to believe that somebody with a lot of nuclear power and a lot of GPUs would have a massive advantage. It certainly helps. We've contracted a substantial amount of capacity for looking

Starting point is 00:19:14 to ensure the growth profile of our business, but it is going to be a bottleneck for all other participants in the market. What about heat? These things throw off a lot of heat. And, you know, some areas in the country are warmer than others. Are people moving these data centers north in order to get the cold air? You know, we've seen pictures of data centers that have open sides where cold air just blows right in or, you know, open doors essentially. Because in other places, if you were to put these GPUs in Texas, I think you're going to be air conditioning them, which seems doubly inefficient.

Starting point is 00:19:56 So maybe talk a little bit about the heat these things generate today, and if there's any hope of cooling them down without air conditioning. Yeah. So it's a great question. It's funny. It takes me vividly and visually back to my crypto mining days where we did run those warehouses with the open sides and the giant fans. And we were up north. You could never run this infrastructure in those environments.

Starting point is 00:20:21 From a security perspective, from a reliability perspective, it is mandatory to run this infrastructure in what's called a tier four data center environment or sometimes even a tier five data center. And that's the highest classification in terms of reliability, redundancy, security, and environmental handling. Right. So these are sites that you would see like Amazon or Google or Microsoft running within, right?

Starting point is 00:20:49 Like true data centers that are meant for cloud infrastructure. And the way to think about the heat output, the other variable there, which is wild, is actually the sound. It's extremely loud in these environments. You know, upwards of about 100 decibels, right, which is, you know, a direct derivative of the heat they're generating, right, and then having to move the air. But the way you look at it is the critical load around the infrastructure, right? So it takes one unit of energy to run the infrastructure, it takes another 0.2 to 0.3 units of energy to cool the infrastructure and run the networking and everything else around it. The way that the world's leading sites handle this is just through forced air, right?

Starting point is 00:21:36 Just move tens of thousands of cubic feet per minute of air through these highly contained pods. So you have all the hot air in a really small area and you're just jamming air through it. And sometimes it's conditioned, sometimes it's just air. But eventually, it's going to be liquid. We're going to move to an environment where you have direct-to-chip liquid cooling instead. And that efficiency ratio, call it 1.3, will drop to about 1.1 instead. So your GPU infrastructure will inherently become more energy efficient as we move to a liquid-cooled environment. and we're working with leading data center operators such as Switch to facilitate and implement that movement for these upcoming generations of GPUs.
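
The efficiency ratio described here (total facility energy divided by the energy used by the computing hardware itself, commonly called power usage effectiveness) can be sketched numerically. The 1.3 and 1.1 ratios come from the conversation; the 100-unit IT load and the function names are assumptions chosen purely for illustration.

```python
def facility_energy(it_energy: float, ratio: float) -> float:
    """Total energy drawn by the site for a given IT load and efficiency ratio."""
    return it_energy * ratio


def saving_fraction(old_ratio: float, new_ratio: float) -> float:
    """Fractional drop in total facility energy for the same IT load."""
    return (old_ratio - new_ratio) / old_ratio


IT_LOAD = 100.0  # arbitrary units of energy consumed by the servers (assumed)

air_cooled = facility_energy(IT_LOAD, 1.3)     # forced-air ratio from the conversation
liquid_cooled = facility_energy(IT_LOAD, 1.1)  # direct-to-chip ratio from the conversation

print(f"Air cooled:    {air_cooled:.0f} units total")
print(f"Liquid cooled: {liquid_cooled:.0f} units total")
print(f"Saving:        {saving_fraction(1.3, 1.1):.1%} of total facility energy")
```

Put differently, the overhead on top of the IT load falls from about 0.3 units to about 0.1 units per unit of compute, while the compute energy itself is unchanged.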

Starting point is 00:22:28 When people say liquid cooled, most people have not actually physically seen that. Unless maybe you're a gamer and you've seen your chip and a tube run to the chip and there's literally liquid on the top of the chip that's cooling it. When you say liquid cooled, what could people envision of how these solutions are going to work? Are they going to just be like a bunch of racks in an Olympic-sized swimming pool or is it just like little contained amounts of water on top of the GPUs? Yeah. So you're qualifying that correctly. There's two broad categories of liquid cooling. There's immersion cooling, which is the Olympic swimming pool method, downsized, obviously.

Starting point is 00:23:11 And then there's direct-to-chip liquid cooling, which is running the pipes to the chip. We will sit on the direct-to-chip liquid cooling side because it's operationally more efficient for us. That's where we think that the sector is broadly going to go. If you think of immersion cooling, right, you're literally dunking a server into a vat of liquid. That liquid has its own problems with it as well. But let's say you had to go service that server, there's a node or component that was wrong with it, well, you've got to lift it out of the liquid, right?

Starting point is 00:23:48 And what happens then? Well, you've got to wait probably an hour for all the liquid to drain out, now, before the tech can even get into it. So you're extending these service times and response times materially versus direct-to-chip liquid cooling, where, you know, you pop it out and you don't have water containment issues, you know, things splashing across the data center. Like, it's a mess. These are highly sterile and contained environments.

Starting point is 00:24:13 They don't even let us bring cardboard inside of the data center area because it's combustible. Right? And you can have little particulates that float around the site and can accrete into the nodes. Bad. You don't want fires in data centers. When you are talking about that much air being pushed around, that means any particulates in the air are going to get pushed around. And so if you just had some very small amount of particulates floating around a room,

Starting point is 00:24:42 now imagine that room is changing the air every X amount of time, the number of particulates is going to grow. And then you're going to have a small fire on a chip, which is just absolutely crazy. Do you think the demand is going to keep up? Are you seeing any signs of demand? People saying, okay, we built our 10,000 GPUs. We're making more efficient software, making more efficient use of the chips. Okay, yeah, we're getting to a steady state.

Starting point is 00:25:06 We bought enough. So are you starting to see that with your customers saying, you know what, we've got enough GPUs, we've got enough infrastructure right now, or are they still, you know, in the begging, pleading and, you know, doubling? You know, what are you seeing from top customers? Are they doubling their capacity every year? Are they tripling? What's the field report? Yeah, I'll qualify it a couple ways. So, you know, one way, we will increase our revenue by about tenfold this year. And we're already sold out of all of our capacity through the end of the year. Right. So I have a build schedule. We have about 500

Starting point is 00:25:40 employees today, we'll be closer to 800 by the end of this year. That build schedule is fully booked this year already. We see that broadly across the sector. There's just an immovable wall of demand for this compute. A lot of it is being driven from this move from training the models to inference, right? And inference is actually bringing the commercial value out of training. So you want to go train a foundation model that takes compute to be built in the configuration that we build it, right, in these 10,000, 30,000 GPU clusters. And then you got to go, you know, make it action, right? Like drive revenue off it and bring a product. And, you know, what we're observing is it might take 10,000 GPUs to train a model.

Starting point is 00:26:31 But inference is linked to the number of users, right? If you go into ChatGPT, for example, and query, that's spinning up a GPU. And now there's a million of you doing it, 5 million, 10 million. That informs the size of inference. So inference will really truly be linked to the growth of this market. And we're seeing users who are using 10,000 GPUs for training need hundreds of thousands for their early stage inference product, right? So we don't see demand going anywhere

Starting point is 00:27:10 but up to the right for this infrastructure. While they may not need exponential use of GPUs for training, they'll get more and more efficient at that. What they will need

Starting point is 00:27:25 is that inference. When people ask the query, that's inference, not training the model, but asking a question of the model, and that is massively compute intensive. And an H100, if we were to look at that unit in an hour at full capacity, how many queries? I know "it depends" is always the answer, but an average query, like these things were costing a couple of pennies per query. Is that correct? Ballpark? Yes. Yeah. That's correct. And I think that's the

Starting point is 00:27:54 right way to qualify it is a cents per query or dollars per query. Right. So you're getting in, you know, hundreds of queries within that period. And as you said, like, that will become more efficient over time as well. But it's just an unbelievable volume of demand. And like when you step back and think about it, right, like you look at the existing cloud infrastructure that was built over the last decade, it wasn't built for this use case, right? It was built for serializable workloads. It wasn't built for parallelizable workloads. And it's like you're having to rebuild the cloud, so to say, and you're having to rebuild it.
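
The cents-per-query framing above can be checked with a short back-of-the-envelope sketch. The $4 per GPU-hour rate and "hundreds of queries" per hour are taken from the conversation; the specific query counts below are assumed sample points, and the model ignores batching, idle time, and anything other than GPU rental cost.

```python
def cost_per_query(rate_per_gpu_hour: float, queries_per_hour: int) -> float:
    """GPU rental cost attributed to each query, assuming a fully busy GPU."""
    return rate_per_gpu_hour / queries_per_hour


RATE = 4.0  # dollars per GPU-hour, the figure used in the conversation

# Hypothetical throughputs in the "hundreds of queries per hour" range.
for queries in (100, 200, 400):
    cents = cost_per_query(RATE, queries) * 100
    print(f"{queries} queries/hour -> {cents:.1f} cents per query")
```

With throughput in the low hundreds of queries per hour, the result stays in the range of a few cents per query, consistent with the ballpark the host and guest agree on.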

Starting point is 00:28:33 You have to rebuild physical infrastructure at the pace of AI software adoption. It's a mind-blowing concept, right? Because AI software is being adopted at the most rapid scale of any technology that we've ever observed. And you're asking people to build. I mean, we're building at, I think it's 28 data centers this year across North America. We're one of the largest operators of this infrastructure in the world.

Starting point is 00:28:58 And we are unable to keep up with demand. And we really don't see that subsiding for years to come. Listen, as a founder, there are things I love doing, like building products or meeting with partners, hanging out with my team and dreaming up new ideas. And then there are chores that I don't want to do. I don't want to do HR. I don't want to do payroll. I don't want to deal with all that.

Starting point is 00:29:20 So I use Gusto. Gusto is the best for payroll, for HR services, and for running a small business, it makes everything so much easier. Even a mid-sized business, man, I've got a lot of portfolio companies that are pretty sizable using Gusto because it is designed for you, the small business owner. And payroll is something you definitely do not want to mess up. You got to get it right. And Gusto is going to make it perfect for you by calculating paychecks perfectly. Also payroll taxes. You got to get your taxes right. You can't make mistakes there. And you want to set up open enrollment. You want to be good

Starting point is 00:29:51 to your people. Gusto handles onboarding, health insurance, 401k, time tracking, commuter benefits, offer letters, and they even give you access to HR experts. So Gusto takes all of this off your hands and lets you focus on the important stuff, your product and your customers. It's super easy to set up and get started. And if you're moving from another provider, Gusto will transfer all your data for you. Here's your call to action. Because you're a TWIST listener and you're part of the family, you're going to get three months free. Incredibly generous, totally unnecessary. Thank you so much to our friends at gusto.com slash twist. You must go to Gusto. Again, gusto.com slash twist to get three months free. Thank you, Gusto team. So then this would lead us to LPUs.

Starting point is 00:30:32 Obviously using a GPU, very expensive, right? But Groq, my friend Chamath's company, has this inference engine and these LPUs. Are you starting to see those and that hardware stack emerge, these language processing units? And do you think that'll have a good effect on the industry in terms of lowering cost and having purpose-built hardware for the inference moment?

Starting point is 00:31:01 Yeah, so as opposed to the Lord of the Rings, where there's one ring to rule them all, I don't think that there's going to be one GPU, LPU, one accelerator to rule them all, nor do I think there's going to be one model to rule them all either. I think there's going to be lots of different models with different objectives, right? Like models that do different things, right, whether it's helping drive a car or cure cancer or be an AI character, models will do different things. And then there will be infrastructure

Starting point is 00:31:36 that is most efficient for each different type of model. And I think that's why you're seeing entities like Microsoft, Meta, etc., who are focused on building their own silicon, they're not trying to replace the GPU. They're just trying to solve for different models that they're running internally.

Starting point is 00:31:55 So I think the Groqs of the world will absolutely have a place somewhere, but I also think that you'll see GPUs have this place. And where we're observing their place is at foundation models, at latest generation models, the most demanding and complex workloads will continue to sit on GPUs. And Nvidia just has this unbelievable solution for iterating continually better generations of GPUs. And we think that those models will continue to accrete to NVIDIA's platform.

Starting point is 00:32:31 So they're going to win the day, no doubt, Nvidia, when it comes to training the models. Inference, you might see other folks carve a niche for themselves, is how you would bet this emerges. Yeah, yeah, I think inference will

Starting point is 00:32:45 have various levels of infrastructure that provide solutions for it. I will say, you know, if a model is trained on A100s, it'll probably run inference on A100s as well. It's kind of tough to make that architecture shift. And it's tough because of the software that Nvidia has, right?

Starting point is 00:33:07 Like, their driver solution, CUDA. Nvidia very thoughtfully open sourced that driver solution in the early 2010s to support this sector and the engineers who wanted to work on these products, and it has become effectively a default solution across the market. It's similar to drivers for CPUs, right? Everything was x86 for decades, right? And it didn't really matter if something was better or not than it. It's just what people use, right?

Starting point is 00:33:42 Because there's an efficiency loss if you say, well, I'm going to go learn this other thing and just hope other people will use it. Or you could just use the thing that everyone else uses. And that's what dominates the market. And Nvidia has an amazing moat that they've developed out of the superiority of their software solution for their infrastructure. And I think that's going to keep people using their platform for a long time to come. Now, are people using CUDA yet to address other GPUs because it's open source? And it's obviously being used

Starting point is 00:34:23 for parallel computing here. When you've got a supercomputer, you need to send a job across many different GPUs. Are people forking CUDA, or have they adapted CUDA in order to have it send a job to some

Starting point is 00:34:39 Intel servers, some Nvidia ones? And is that opening up possibilities, I think, for a more open source future? And then I'm curious what you think of open source chips and chip architecture, and if you think that is ever going to have some sort of an impact here on the space.

Starting point is 00:34:56 Sure. So this will get a little bit beyond my domain expertise. But yes, there have been forks, so to say, and software that enables CUDA to run on different infrastructure, but it comes at a hefty cost, right? It comes with performance loss. It comes with configurability loss, so much so that none of our clients are requesting that, right? We're talking, you know, 30, 60, 80% performance loss, right?

Starting point is 00:35:28 So the most natural thing for the largest consumers of this compute is to stick on Nvidia infrastructure with Nvidia software. And that goes to your second question, which is around open source. You know, it's tough for me to say, but I would highlight the behemoth that's driving the research and the path forward on Nvidia GPUs. They just have so much capital they're putting to work to ensure that they have the most performant piece of infrastructure in the market that, sure, there might be some use cases for that open source infrastructure to be applied,

Starting point is 00:36:12 similar to how Groq can be there or other custom silicon chips. I think the vast majority of workloads are going to accrete and stay with GPU infrastructure. Who's number two or three in the space? Does anybody have a chance of closing the gap? Because obviously people are watching Nvidia print money. And obviously, I don't know what percentage of the infrastructure that you provide is Nvidia, but I'm guessing it's 90% plus. But is there a number two or three in this space? And do they have a chance of gaining market share?

Starting point is 00:36:45 Do you think this is a fait accompli? We're just going to live in an Nvidia world for the next decade? I think we're in an Nvidia world for a while, right? You know, you have AMD out there, but AMD doesn't have a performant training fabric, right? Like, that's something that's proprietary with Nvidia's InfiniBand, right? So you can't build a comparably performant training fabric with the AMD infrastructure, right? So you can kind of only use it then for inference. And it's sort of, well, if you've already trained your model on Nvidia, it's a tough leap to want to move your software, move your infrastructure over to be AMD compliant.

Starting point is 00:37:28 So it's certainly a market I would expect that AMD is allocating their time to, but we're not seeing the customer demand for it at scale. Right. And we really serve at-scale consumers of compute. Certainly there are, you know, your guys who want ones or tens of GPUs out there who will say, oh, I'd love to work with, you know, MI300s,

Starting point is 00:37:50 but it's those entities who want tens of thousands of GPUs that are sticking with Nvidia, and we haven't really seen any deviation from that. And for folks who don't know, InfiniBand is kind of a contemporary or competitor to Ethernet or

Starting point is 00:38:07 fiber in a data center. If you had a bunch of storage in one location, or even in a GPU, between GPUs, passing data between them, there has to be some way to move data from one cluster to the other. If they were passing training data or something, the speed at which the training data can get on the GPU to be processed, that is a bottleneck.

Starting point is 00:38:30 And InfiniBand is the solution to moving large amounts of data. Am I correct in my description of it? That's exactly right. That's exactly right. It's infrastructure that Nvidia acquired and has integrated into their solution, called the DGX solution. And it is the most performant fabric solution, in other words, network solution, for this infrastructure for data throughput. Hey, startups, you're a new company and you're looking to form your business. But navigating through a maze of hidden fees and legal jargon, it's complicated.

Starting point is 00:39:09 It's going to eat up all your time. Well, Northwest Registered Agent will form your business quickly and easily. And it only takes 10 clicks and 10 minutes. They provide you with a full business identity setup. That means they'll give you everything you need to start and to maintain your business. When you hire a registered agent to form your company, they take care of everything. You get a registered agent service, a business address, their corporate guide service, a phone line, mail scanning, a free domain, a website, and hosting. Northwest Registered Agent makes the whole process transparent,

Starting point is 00:39:39 quick and enjoyable. Whether you're setting up an LLC, a corporation, or a nonprofit, they've got you covered. Here's your call to action. For just $39 plus state fees, Northwest Registered Agent will form your company and launch your business in minutes. Visit northwestregisteredagent.com slash twist today. That's northwestregisteredagent.com slash twist today. Is this still one of the key challenges in terms of training large language models, the throughput of the InfiniBand or Ethernet solutions to just move the data around? Is this the bottleneck over GPUs in many of these jobs?

Starting point is 00:40:15 Yes, it's critical to build with a non-blocking InfiniBand fabric. So non-blocking means that every component can operate at the same performance and efficiency as everything else. There's just nothing blocking that performance. No bottlenecks. Yes, no bottlenecks. Right.

Starting point is 00:40:32 And it's really interesting because it's a physical engineering problem. So a 16,000 GPU fabric, right, which is about 2,000 nodes or individual servers with 8 GPUs per server, has 48,000 discrete connections that have to be made across the fabric, right? So you plug InfiniBand into each GPU in the server, then that goes out to a switch and, like, you're part of this fabric, right? Every connection has to be made correctly. And you're doing this with 500 miles of fiber optic cabling within that 16,000 GPU fabric. And we run a number of those of larger and smaller size.
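As a rough check on the figures quoted in this segment, the arithmetic can be sketched in a few lines of Python. Only the four inputs (16,000 GPUs, 8 GPUs per server, 48,000 connections, 500 miles of fiber) come from the conversation. The function name, the dictionary keys, and the derived averages are illustrative assumptions, not anything CoreWeave has published, and the average cable length is a simple division that ignores how cabling is actually routed.

```python
# Inputs quoted in the transcript.
GPUS = 16_000
GPUS_PER_NODE = 8
CONNECTIONS = 48_000
FIBER_MILES = 500

FEET_PER_MILE = 5_280  # statute mile


def fabric_summary(gpus, gpus_per_node, connections, fiber_miles):
    """Derive simple ratios from the quoted fabric figures (hypothetical helper)."""
    nodes = gpus // gpus_per_node
    return {
        "nodes": nodes,
        # Connections averaged over GPUs and over servers.
        "connections_per_gpu": connections / gpus,
        "connections_per_node": connections / nodes,
        # Naive average: total fiber length divided evenly across connections.
        "avg_cable_feet": fiber_miles * FEET_PER_MILE / connections,
    }


summary = fabric_summary(GPUS, GPUS_PER_NODE, CONNECTIONS, FIBER_MILES)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The quoted numbers are internally consistent: 16,000 GPUs at 8 per server gives the 2,000 nodes mentioned, and 48,000 connections works out to 3 per GPU. One possible reading, which is an assumption rather than something stated in the conversation, is that each GPU's link is matched by links at further switch tiers so that no tier is oversubscribed.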

Starting point is 00:41:14 So we built a lot of these things, run a lot of fiber in our days. But it's a complex physical problem that no one's really been presented with before. This wasn't a problem when you were running with Ethernet or hosting websites and storing data lakes. You didn't have to build fabric this way. It would be one connection per server, not eight connections per server, and to be doing it in this contiguous non-blocking fabric in a single footprint. And so it's just lots of new things that are happening at the same time with an immense amount of capital at risk, and an immense amount of capital is being consumed in the fastest-paced technology

Starting point is 00:41:57 environment that we've ever been in. And it's creating problems all over the market. And where we've found ourselves is having a software solution and a company that's only focused on these types of workloads. And accordingly, we accrete clients into our platform for having that best engineering solution and actually being able to deliver it to end consumers. Yeah. And we've never seen at-scale companies like a Microsoft, like a Meta, like a Google. These companies are at scale, they have massive amounts of capital,

Starting point is 00:42:33 which they can't deploy in M&A anymore, right? We have a framework in the West where you're not allowed to buy companies, and I made this point on All-In a couple of months ago. If you're Apple or you're Google and you're sitting on tens of billions, hundreds of billions of dollars in cash, you can't buy Uber, Airbnb,

Starting point is 00:42:52 you can't buy Coinbase, you're not allowed to buy even Figma for $20 billion. You can't even make a small purchase like that without getting blocked. But what's the next best thing you can do with that capital? You can build infrastructure. Infrastructure as a weapon. And now you've got this massive infrastructure. Will you have jobs for it? I'm sure there'll be some. Will those jobs turn into commercial products? Some will, some won't. But it's a better use than sitting on the cash,

Starting point is 00:43:14 or it's a better bet. It's a better use of capital rather than trying to make a couple of points on it or, you know, buying back your shares. It feels like, gosh, if you have this infrastructure, you could have induced jobs, which is to say some crazy person on the Meta team is going to be like, what if we did X? And having that infrastructure allows somebody with a crazy idea to then go give it a shot

Starting point is 00:43:41 and spend a million dollars running a job across this infrastructure, whatever the pro rata version of it is, and maybe they find something really interesting. Who knows? Yeah, what are people going to do with this infrastructure? You do. You're watching them. What are the interesting

Starting point is 00:43:56 jobs you're starting to see and use cases? I mean, some of it's public and some is private. I obviously don't want you to betray anybody's trust here. But just what are people doing with this infrastructure that you find interesting when they come to you and they say, hey, we need a solution for this or here's what we're building? What are some of the things that you think are most promising, the verticals and sectors that are most promising? Sure.

Starting point is 00:44:18 So I think the areas where AI will be adopted first and fastest and do it at scale, right? Because you can always find like five users to do something. Right, but how do you get 5 million users? Yeah. It's going to be within products where the user doesn't have to learn something new. It might not even be a new button, right? It just comes naturally to them. It feels organic.

Starting point is 00:44:39 It doesn't require, you know, a new app to be somewhere. It's integrated into existing products. And I think that's largely going to be copilots, right? Like various copilot solutions, not the name of one product, but just the idea that you're integrating AI into apps to assist a user with a pre-existing process. Right. That's something that we're seeing scale right now.

Starting point is 00:45:06 And the ability for those products to scale is limited by the amount of cloud infrastructure that's able to handle those users. Again, remember, each time you come in and query that copilot product, it's using a GPU. So cloud infrastructure inherently limits the pace at which those products can grow. And I think you've seen some products delayed even because there wasn't enough cloud infrastructure available to power their launch. Yeah, if you look at search engines like Bing, Bing kind of was doing the custom answers. You'd have to like click a second button to get it,

Starting point is 00:45:45 right? Go to another experience, whereas some, you know, search engines powered by AI were doing it automatically because they didn't have a large flow of it. If every single Google search resulted in a query to a GPU, they would actually bankrupt Google right now, because they have so many queries, and at three or four cents extra per query, there's not enough infrastructure in the world to convert all of those queries today. Look, it brings up a question of, will AI be a tax or a margin expander for software products? I think for some of them, it will be a tax, right?

Starting point is 00:46:17 It'll become mandatory and they might not be able to drive incremental direct revenue off those products. But, you know, the other outcome, if you didn't integrate that AI at that tax, could be you lose users and you lose market share to someone else. Right. If you look at search, that would be the perfect example. If Bing offers this to, you know, their four or five percent market share, they can lose money on it because they're building that business. Whereas for Google, it's their core business. If they put it on all 90 percent and they start losing money, they could just flip their business upside down, right? That's right. And the other interesting point to that is, you know, Google might not have the option to integrate AI if it doesn't have the infrastructure available at the volume that's required. And I think that's why you're seeing some companies, Meta, Microsoft, throw so much capex into ensuring they have the volume of infrastructure necessary, because having the compute at scale, going back to my point earlier, it decommoditizes compute, right? Like that, in and of itself, is a strategic

Starting point is 00:47:22 advantage. So I'd say the other area that I personally think will accrete AI rapidly is in the advertising sector. Oh, really? I thought you were going to say health care or biology or something. I agree. I think that's a second half of this decade thing that we're extremely excited about. I mean, I can't wait to have infrastructure that directly supports the advancement of healthcare solutions. But advertising, I mean, think of the way that ads work, right? You throw an ad, you hope it reaches an audience, and then a subset of that audience will actually identify with the ad, right? It's probably a pretty small sliver of it. Yeah.

Starting point is 00:48:01 Instead, if you could use generative AI to create on-demand, always-on ads for people that are 100% specified to the metadata associated with that user, those are going to be much more highly effective. Yeah. Here's the example, right? Like, you live in Utah, you have a green kayak and you're searching for a new Kia, right? It's a blue Kia. And you've been looking at it for a few days. And instead of just receiving the general Kia ad of, you know, a gray Kia somewhere, you now get an ad that is a blue Kia with a green kayak on top.

Starting point is 00:48:39 Yep. Driving through the desert in Utah to a river. Right. And you get several different iterations of that until you go buy that Kia. Yeah. That's an area where the user doesn't know that it's generative AI, but it'll be so accretive and disruptive to the advertising sector that it'll just be mandatory for them to use it.

Starting point is 00:49:00 Because the ad's effectiveness will increase that much. It's fascinating. You know, you look at what happened with Meta. There was this idea that when they lost access to smartphone data, when Apple anonymized it, you know, they would have a really hard time doing targeted advertising. It actually kicked them in the ass and made them implement AI, and they have now recovered and gone further, I think, in terms of personalization. You're exactly right.

Starting point is 00:49:25 If it knows you have two kids, it's going to put, you know, in that Jeep Wrangler, you know, with your kayak on the top or whichever car it is, two car seats. It's going to show kids in it. And the message will have something about how great it is for toddlers or young kids. And here are some, you know, here's the media center that puts the TVs on the back so they can watch Netflix. It's just going to have so much information to customize the ad that you get. The gap between the aspiration of the ad and the reality of your life is going to close, right? Because ads are aspirational. So it's like Minority Report, if you remember,

Starting point is 00:50:00 and everything goes back to Minority Report. You know, the customization of the ads will be absolutely phenomenal, to a level that, yeah, it's like beyond creepy. It's just like mind-reading ads. Yeah, and you've never had that before, right? And what's important there is it took X amount of resources to generate that one kind of mass media ad previously. Well, now, each time you have that iterative always-on ad, that's querying infrastructure. So the infrastructure demand for this new type of advertising will be voluminous. Yeah. Right? It'll be more effective. And I think it will actually be better dollars spent in advertising, but it'll be an immense amount of infrastructure demand behind it. So I think copilots are there today and scaling.

Starting point is 00:50:47 The next big thing in there to really scale will be within the advertising space. Yeah, it makes a lot of sense. It's a huge business. And, yeah, you know, anywhere there's a lot of data and a frequent transaction. I mean, that's just a great place for GPUs and this AI revolution to take part in, because it's frequent. And there's a transaction. This is where, like, Amazon and how Amazon sells you stuff, and Walmart and Target, my lord,

Starting point is 00:51:14 e-commerce in the last mile. It's already been impacted in a way. I mean, if you look at the amount of advertising revenue for Uber and Instacart, which are, you know, for Uber Eats essentially, you know, that's like being in line at the checkout counter. And they're doing a billion each, I think, a year roughly. And then Amazon might be doing 30 or 40 billion in advertising now. Like those three businesses, which are seemingly transaction-based businesses, right?

Starting point is 00:51:38 Shopping for groceries, food delivery, mobile transportation, then Amazon, buy anything. They're all becoming advertising businesses. It's like pure profit for them. It's going to be wild, the AI impact on those businesses. And all roads lead back to generative AI. It's just all converging here. Right. And it all needs this type of infrastructure. And that goes back to my point of, like, how much demand is there. We just don't see the path to resolve the amount of infrastructure that needs to be built for the demand that there is within, you know, at minimum the next few years. Yeah. Right. There's just so much that needs to be built because last generation's clouds aren't designed for this.

Starting point is 00:52:21 And it's not like you're swapping out a UI and saying, like, oh, you tweak some software here and there and all of a sudden it works, right? No, it's a foundational difference. It's the Tesla versus Ford manufacturing process. Yeah. And CPUs are just never going to take these workloads.

Starting point is 00:52:36 CPUs will be for serving up images or light work. It's not going to ever compete with this level of. It's completely different. Yeah, it's completely different. Any worries about overbuilding this infrastructure at this point? If we were going to start talking about a slowdown or a certain amount of infrastructure, is there somewhere on the chart that you start thinking, yeah, this is going to, we'll fill the demand five years out, 10 years out.

Starting point is 00:53:03 When do you think we have enough capacity, you know, and this supply and demand becomes normalized? Right now, it's abnormal, obviously. When do you think this normalizes? When do we catch up? So between the infrastructure demand and the data center demand, right? So it's multiple components in here. And then it's all the infrastructure pieces that go into a data center. Yep.

Starting point is 00:53:27 And then it's the power that goes into it. Like it's this really complicated physical stack to serve it. It honestly could be the end of this decade until you see this rebalancing of supply and demand. And not to say that that's overbuilding, right? Like that's just still on a heavy growth trajectory. That's just when infrastructure may have had an ability to catch up to where demand is. And I geek out over this stuff because that's my background and my co-founders'. We're all from this commodity trading sector where all we did was assess supply and demand

Starting point is 00:53:58 and understand physical disruption of commoditized markets. And that's exactly what we're looking at here. Yeah. But these aren't commodities yet. You know, like H100s trade at a price. And I guess they're commodities, but they feel like a very resource-constrained commodity right now. So I guess they are commodities.

Starting point is 00:54:19 They're sort of, you know, if you think about like cloud infrastructure for hosting websites, right, it was fungible, right? It didn't really matter if you were on AWS, GCP, or Azure to host your website. It all felt like the same thing. It was the same product to go host your website, right? Yeah. What's changing is that lack of fungibility, right? An H100 hosted at AWS is very different than an H100 hosted at CoreWeave because of the way that we run that infrastructure differently from a software perspective and the way we build it from a physical perspective.

Starting point is 00:54:55 So that's the commoditization that did exist. That's now being decommoditized through software and infrastructure disruption. Right. Yeah. It's amazing. What a moment. What a time to be alive. It's so much fun. It was absolutely fascinating to talk to you for an hour, and I'll let you get back to racking and stacking. I'm sure you've got tons of H100s and A100s to unbox. I mean, just unboxing and racking stuff.

Starting point is 00:55:23 I mean, you have hundreds of people doing that at this very moment? Hundreds. And semi-trucks arriving to our 28 data centers across the U.S. It's an operational feat. I think we're hiring 20 people a week right now. Yeah. And these are like system operations people. These are high level people to come in and configure this infrastructure.

Starting point is 00:55:46 Well, massive success. And thanks for building out the infrastructure. Let's solve some huge problems. And we'll see you all next time. And this week in startups, bye-bye.
