Limitless Podcast - AI Arms Race: Can Elon’s 550,000 GPU Monster Beat OpenAI’s Stargate?

Episode Date: August 1, 2025

In this episode, AI Scaling Wars, we dive into the fierce competition among AI giants like XAI, OpenAI, Anthropic, and Google to build massive data centers in pursuit of superintelligence, with XAI leading the pack by constructing its Colossus 2 supercluster, boasting over 550,000 GPUs, in record time. Elon Musk's team at XAI has revolutionized the process, slashing a typical four-year build to just 19 days for Colossus 1 and leveraging innovative hacks like integrating Tesla Megapacks for stable power, all while aiming for 50 million H100-equivalent GPUs that could consume 2% of global electricity. We contrast this with OpenAI's ambitious Stargate project, facing funding drama amid a $500 billion investment plan, and discuss how these efforts prioritize training over inference to accelerate AI progress. Ultimately, this arms race promises breakthroughs in science and job creation in America, but it raises questions about energy demands, national security, and who will claim the prize of godlike AI.

------

🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT

------

TIMESTAMPS
00:00 Intro To Scaling Wars
05:42 This Scale is Insane
10:17 New Chip Architecture
15:00 Competing Strategies
20:13 What's Everyone Else Doing?
26:44 Bringing Chips To The USA

------

RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:03 Welcome to the AI Scaling Wars. A race where the prize isn't gold, but godlike intelligence that can solve cancer or spark wars. It can bring out the best and the worst of humanity. This is AI scaling, and there are a few major players here. We have XAI, OpenAI, Anthropic, and Google. They're battling to build the biggest brains by cramming more compute power, data, and electricity into these things called data centers.
Starting point is 00:00:26 Now, by 2028, the expectation is this is going to suck up to 50 gigawatts of power. For reference, picture lighting up all of New York City twice. That's how much energy is being consumed. It is estimated they're going to spend up to $3 trillion by the end of this decade alone. This is a huge arms race to scale superintelligence. And there's one company in particular that we're probably going to be highlighting a lot throughout this episode, just because they've been absolutely crushing it. They are the youngest company of the group, yet they are the furthest ahead when it comes to benchmarks. And that is XAI.
Starting point is 00:00:57 That's Elon's company who is building Colossus 2 currently in Memphis. This is where I want to dish it off to EGIs, because he's been in the weeds with this. He has been digging deep on exactly what they're building, why what they're building is so impressive, and how important all these elements are to actually scaling AI. So EGAS, can you just kind of lay out the landscape for us? Let us know why XAI is doing so well and what they're doing to accelerate so quickly. Sure. So the headline news this week was Elon Musk is launching a second data center to train XAI's AI models.
Starting point is 00:01:30 and it's something called Colossus 2, which implies that there was a Colossus 1. And there was. Colossus 1 was around 200,000 GPUs in size. Colossus 2 is over twice the size. We're talking about 550,000 GPUs. Now, for the listeners on this podcast, trying to understand what that means in perspective,
Starting point is 00:01:52 100,000 GPUs is probably the fastest supercomputer on the planet right now. So if you had 100,000 GPUs, you have the fastest supercomputer in the planet. But it would take around three years to prepare to build this supercomputer. So think about it. You need to get the equipment, Josh. You need to order the designs. You need to kind of like figure out how to manage all these things. And then once you have all the equipment, it'll take you around a year to set up this supercomputer.
Starting point is 00:02:22 This is just 100,000 GPUs. Let me ask you this, Josh. How long did you think it took Elon Musk and his team to set up 100,000 GPUs over the last month? Okay, I cheated because I am obsessed with this topic and I know the answer. But I have a feeling that we have a fun little clip to show people on how quickly they got it done. Because it was fast. It was mind-blowingly fast. Yeah, it was actually 19 days.
Starting point is 00:02:47 That's insane. So he took a process that would take four years down to 19 days. And let's let none other than the man himself, Jensen, kind of comment on how amazing this was. Yeah, this is remarkable. I've got this little clip to play. Yeah. What do you think about their ability to stand up that super cluster?
Starting point is 00:03:07 And there's talk out there that they want another 100,000 H-200s, right, to expand the size of that super cluster. You know, first talk to us a little bit about X and their ambitions and what they've achieved, but also are we already at the age of clusters of 200 and 300,000 GPUs? The answer is yes. And then the first, first of all, acknowledge. of achievement where it's deserved. From the moment of concept to a data center that's ready for NVIDIA to have our gear
Starting point is 00:03:35 there to the moment that we powered it on, had it all hooked up, and it did its first training. Yeah. So that first part, just building a massive factory, liquid cooled, energized, permitted in the short time that was done. I mean, that is like superhuman. And as far as I know, there's only one person in a world who could do that. I mean, Elon is singular in this understanding of engineering and construction and large systems and marshaling resources. Incredible.
Starting point is 00:04:09 Yeah, it's unbelievable. And then, of course, then his engineering team is extraordinary. I mean, the software team is great. The networking team is great. The infrastructure team is great. You know, Elon understands this deeply. And from the moment that we decided to get to go, the planning of a, with our engineering team, our networking team,
Starting point is 00:04:26 our infrastructure computing team, the software team, all of the preparation advance, then all of the infrastructure, all of the logistics and the amount of technology and equipment that came in on that day and video and media infrastructure and computing infrastructure and all that technology to training 19 days.
Starting point is 00:04:45 Did anybody sleep 24-7? No question. That's so insane. So it's important. It's ridiculous, 19 days. Relative to what other companies are, doing, which is, you know, a couple of years, like 18 to 24 months. Now, it's important to note the data on this episode, which is October 13, 2024. So this was not recent news. And from now,
Starting point is 00:05:06 from that time until now, there have been remarkable improvements. So what Jensen here was referencing Jensen's, the CEO of NVIDIA, that was for the building of the Colossus 1. Classus 1 is what trained GROC 4, and that's currently what we're using now. So GROC 3 and 4, that is Colossus 1 is responsible for that. What we're building now and what we're going to be talking about is Colossus 2, which is the next version of this that will have way more than just the 100,000 GPUs that were initially available in this first cluster. Exactly, Josh. And to kind of like give you guys an idea of like how big this behemoth is going to be, I want to pull up this tweet which kind of like puts two kind of like very consequential facts together to help you picture this in your mind. So in contrast, GPT4, which is the open A.I.I. former Frontier model was trained on around 25,000 A100s.
Starting point is 00:05:57 A100s refers to a GPU, right? Which is roughly 12.5,000 H-100s. Now, Josh, we've actually mentioned H-100s a lot on this show. It's basically the crem della crem of GPUs. If you're training an AI model, you need H-100s and you need many of them in size,
Starting point is 00:06:16 but they're so scarce. They're so hard to get. And they're all coming out of one manufacturer, which is Nvidia, that's why Jensen Huang is, you know, speaking about this so often at conferences and stuff. The goal of Colossus 2 is to hit 50 million H-100s equivalent in compute. That is an insane amount given that it only took 25,000 of like an old model GPU to train GPT4. It kind of hurts your brain when you think about those numbers because you think of what GPT4 did to the world. It was the first form of like pretty broad scale intelligence.
Starting point is 00:06:53 It was incredible. And that only took 25,000 of these A100. So to factor that up, I don't even know what that multiplier is. From 25,000 to 50 million, you really start to feel the power. And you're like, oh, wait a second. If all it took was 25K to get that level of GPT4 intelligence, 50 million, it starts to be like, okay, surely there's no way we don't have super intelligence from this. Surely after 50 million GPUs working on this one problem,
Starting point is 00:07:20 There's no way we can't solve new physics and get new science. And you really start to, this is the first time for me at least to kind of hit me. I was like, wait a second, this is how we do it. This is how we get to superintelligence. And Josh, do you remember when you reminded me in last week's episode that XAI has only been around for like two years? That's the craziest thing. They're like the youngest AI model creator company. And the rate of progress is just insane.
Starting point is 00:07:48 Like, look at this tweet that we have pulled up here, right? They are already pretty much a year ahead of the competition if they execute on this Colossus 2 data center. And so far in phase one, they're executing. And I was looking up the size of the data centers that other folks, you know, top folks like Google and meta, used to train their models, Josh. And we're talking in the range of like 50,000 H-100s
Starting point is 00:08:15 or maybe even like working towards 100,000, H-100s. And this, like, when you compare it to like 550,000, which is like, you know, whatever, quadruple that, and then the end goal of 50 million H-100's equivalent worth of compute, it's just insane. Josh, I'm noticing something as well. When I look at the specs for Colossus 2, there's a lot of, like, code names for all these GPUs, right?
Starting point is 00:08:42 And it can get kind of, like, overwhelming. And I don't really understand the difference. One thing that keeps repeating in Colossus 2 is this concept of a GB200, which they used for Colossus 1, but now this fancy new thing called GB300s. Can you help me understand what this is? Is this, where is this coming from? What's it made of? And why is it useful? Yeah, so there's a bunch of different ship architectures that are used when training these things.
Starting point is 00:09:07 The H100 is the most popular. It is the most well-known. It is basically the flagship ship that Nvidia ships. Now, this new ship on the block, which is generally referred to as Blackwell, is the GB300. So Blackwell uses this dual chip design, which uses two chips and one, which is a total of 208 billion transistors, which is an outrageously large number. You can think of GB300s as one and a half times more powerful than the GPUs that were being used in Colossus 1.
Starting point is 00:09:33 So now he has all of these chips that are one and a half times more powerful. They're much more efficient. He has roughly an order of magnitude more compute relative to the H-100s that he was using. And remember, it was H100. It only took 25,000 to train GPT4. So if we look at this post here, that is roughly equivalent. It's Elon is posting this. He says, the XAI goal of 50 million units of H100 equivalent AI compute online within five years.
Starting point is 00:09:59 So that means we are going to get this. It's this crazy order of magnitude upgrade in terms of efficiency, in terms of power. And I was listening to Jensen actually talk yesterday about how he considers these new chip architectures. And there's basically two things. because you would think that the H-100 chips are outdated, right? Like, now that we have these GB-300s, they're significantly more powerful, there's significantly more efficient. But he thinks of it in two ways.
Starting point is 00:10:22 There's one where you can increase the efficiency, and that allows you to squeeze more value out of each watt of energy you have, or there's just more compute, which allows you to squeeze more profit out of each incremental GPU you have. So the idea is that hopefully the X-AI team will be able to generate enough energy to power not only the GB300's, but also the older H-100s because they're still very powerful.
Starting point is 00:10:48 And they will just add to the pie. One of the cool things about training coherent clusters is you can just kind of throw all the chips you have at it, and they all just work as one collective brain. So even though there are going to be outdated H-100 chips, that won't be as efficient and that won't be as powerful, they still contribute to the collective knowledge of these larger models that are being trained.
Starting point is 00:11:07 So you're saying it's essentially a compounded network. So your GPUs that you set up in Data Center 1 doesn't just get thrown away. You kind of just can add it on to the new data centers that you're adding, even though you're adding new models of GPUs. Is that what you're saying, John? Exactly. Yeah, and there's this funny phenomenon happening where, like, H-100s now are not necessarily the cream of the crop.
Starting point is 00:11:28 We move into the Blackwell architecture. There are newer chips that are slightly better. But even the H-100 chips, if you go on any sort of AWS Cloud, compute server, Google Cloud, they're not available to rent. They are still fully utilized 100% being used because there's still so much demand for compute. So even though some of the ships are older, they still work very well. And it doesn't actually lower the quality of the training data. It just doesn't train it as quickly.
Starting point is 00:11:54 That's amazing. I just saw this tweet here, Josh, because you were talking about energy consumption earlier, when Elon is done with building this data center, it is going to be the equivalent of consuming 2% of global human electricity consumption. That's crazy. Just let that settle in right now. That's 2% of present-day human electrical consumption. More so, by the time he's done with this, he would have invested $20 billion into just this single data center,
Starting point is 00:12:27 $20 billion. So we're not talking about like a couple hundred million these days. We're talking about a significant KAPX investment that's going to eat massively into XAI's profits, but also all as, presumably other companies that are being involved that are using these different chips. It is a huge, huge investment. And the third fascinating strategy that he's taking with this data center is it's only being used to train AI models, Josh.
Starting point is 00:12:56 And I think this is really important because his competitors, Open AI, Anthropic, whoever's setting up major data meta, they're using their data centers to do both the training and handling the inference. But Elon's taking kind of like a wild strategy where he thinks training, so basically the quality of the model is the most important part,
Starting point is 00:13:15 and he's throwing everything that he can at it, $20 billion, and he's just outsourcing all the inference stuff to cloud providers right now. So I don't know whether this results in better quality model compounded at a quicker rate, as you said, you know, you just keep piling on hardware
Starting point is 00:13:32 on top of hardware, and he ends up winning this race, versus other people who are mostly taking another strategy where they're kind of like combining training and inference. Got it. This is an important thing because a lot of the times, actually one of the reasons why GPC 4.5, if you remember, was deprecated,
Starting point is 00:13:49 was because it was a very high intensity, high compute model. And he used a lot of resources when you wanted to submit a query. And the problem with that is GPUs are very limited. So when the OpenAI team has to serve this query to the servers, it's using the service that could otherwise be used for training. So it's important to understand there are just GPUs and the GPUs can be used for anything.
Starting point is 00:14:10 A lot of the times are used for training, but they also have to be used for serving data. So the double-edged sword with OpenAI is their user base is huge and they're getting a tremendous amount of queries per day. And those all need to be served through GPU usage. So a lot of their GPUs on a regular basis are going to serving this inference need instead of the training need. And what it appears the XAI team is doing is they're running 100% of this compute and training cluster
Starting point is 00:14:32 to the training need and not actually serving up inference data, which is a big difference because it allows them to put 100% of their compute cluster into making better models instead of using some just to serve uptime to keep up with their amount of users. I remember when the figure of $10 billion spent
Starting point is 00:14:50 towards training a model was an insane number to think about. Remember Meta's Lama when they announced that probably like two years ago? And the rate of progression is, I never thought we'd be sitting here, basically saying, yeah, if you spend $20 billion to try and train an AI model, that's still not enough compute. You still need a video to basically 100x that. I was also thinking about
Starting point is 00:15:13 the rate of progress that XAI and mainly Elon is making. And I have to say, when I step back and think about it, Josh, I'm not all too surprised by it, right? Like, this is exactly how Elon has built kind of anything that he's had. He's kind of had this like unwavering focus when he decided to come in and take over the automotive market and create the best electric car or when he decided to come in and say, you know what, traffic sucks. Here's the boring company, right?
Starting point is 00:15:41 Or if he's like, you know what? Humanity can excel beyond just a mobile phone. Let's put chips in their brains. And you of all people know about this, Josh. I don't know whether you see this as like a familiar pattern, whether you think that this is just an Elon specific thing or whether it's just happenstance. I feel like it's the former, but I don't know.
Starting point is 00:16:00 Yeah, it's certainly the former. And you could actually see it in the top-down overlook the Google Maps view of these training clusters. So there's Colossus in Texas. And then Google has one and Meta has one. And when you look at them, they're kind of designed very efficiently and very intentionally. So when you look at the way that the training cluster for Open AI is trained in Abilene, Texas, it's very intentional design.
Starting point is 00:16:25 It has your power over here. It has your chip cluster over here. It's all very unified. It's all very pretty. when you look at the Memphis cluster, which is what the XAI team built with Colossus, it kind of looks like a train wreck. Like there's no rhyme or reason why certain things are in certain places. And that actually is very high signal because the way the Memphis structure, the Memphis Training Center was built,
Starting point is 00:16:48 was it was basically an old washing machine factory, I believe. They used to make just some sort of appliance there. But it was built along a gas line and it had a small power plant next to it. So they were like, okay, well, we could just take this factory. it's close to energy, we can tab into it. The problem is in order to get permits to tap into a gas line, you need to wait a long time. And maybe 12 months, I'm not sure the exact amount, but it would be like close to a year. And normally for a company who was trying to build one of these, that's fine because they take 12 to 18 months to make anyway.
Starting point is 00:17:15 But Elon was like, no, we need to spend this up in 19 days like we heard Jensen say earlier. So what they did is instead of tapping into the gas line, well, they just started bringing lots of generators to the facility. So now on one side of the facility, you have a ton of generators that are just there generating just power until they could actually tap into the grid. They have this small power plant next to it that's kind of on the opposite side. And then there wasn't cooling built in. So they were like, all right, well, we need lots of cooling. So they rented, I want to say, 30 to 40 percent of all of the United States cooling,
Starting point is 00:17:48 portable cooling tech. They poured all that stuff in. Of the entire country. It was a significant percentage of the entire country. Wow. And then they have. had another problem where they're like, hey, these generators that we have, they're not really producing steady power because when GPU clusters power up and power down, it happens very quickly
Starting point is 00:18:04 in a fraction of a second. And that either draws a ton of power or it doesn't draw anything at all. And it's very difficult for a traditional grid to supply that in a steady state. So what they did is they were like, okay, well, it's a good thing we have megapacks. Let's call up Tesla. So they call it their friends of Tesla who had megapacks. They custom wrote code and megapacks for people who don't know are just these gigantic battery packs. They're just good for storing a lot of power. So now what they do is generators power these battery packs. These battery packs have been trained to distribute the power evenly without these jitters that cause problems during training runs.
Starting point is 00:18:34 And now they have the sustainable energy source. And it's this like this very crappy, very resourceful thinking, very like, we need to do this yesterday. Yeah, just out of the box thinking and pouring resources from other companies that makes the difference. And I think when you'll see, you'll see them post an update Sunday night where they're shipping out some new code day. So they've just released a new training run.
Starting point is 00:18:56 And it's like they are working 24 hours a day, seven days a week, with the only intention of actually building the damn thing. There's no real legislature. There's no like there's nobody telling them what they can't do. It's just like you must do anything you can to make this work. And I think that's why you see the rate of acceleration being so quick with them. It's basically found a mode with multi-trillion dollar companies. Multiple multi-trillion dollar companies all with the same CEO. And like I found it interesting.
Starting point is 00:19:24 You just mentioned the super PACs, right, which are these like huge battery packs. Aren't they made by Tesla, a completely separate company? So he's basically got like this protocol of companies that are all kind of like coalescing around creating AGI and owning the machinery side of things and the automotive side of things and the robotic side of things. And it's all tied together cohesively by this, or maybe not so chaotically cohesively by Elon Musk. And that is just insane.
Starting point is 00:19:54 But Josh, I just want to kind of like step away from Elon and X-A-A-A-A-R for a second. What's everyone else to it? Is like Open AI and META just kind of like sitting on their ass? Or are they actually doing something about this? They're cooking. Okay, let's tell me more about this. Yeah. So like X-A-I, sorry, Open AI is building their gigantic Stargate plant.
Starting point is 00:20:16 So we've talked about this in the past. They are building Stargate, Abilene, Texas. They're partnering with the government. They're partnering with Oracle. they're partnering with a bunch of other companies to make this happen. They are building an additional four and a half gigawatts of additional power to the Stargate Center. For a total of five gigawatts. Now, for reference, a single gigawatt is about 750,000 homes worth of power,
Starting point is 00:20:37 and they're planning to do five of these. So that takes us just below four million homes worth of energy. But this Abilene site is not going without some drama, because the reason this is being funded is by a deal, that Sam Altman had with SoftBank. And I remember when they announced it, it was for half of a trillion dollars, Sam went on stage with Donald Trump. It was this big United States effort to pushing AI. But from what I understand, there's a little bit of issues with actually securing that funding. So you walk us through what's happening with that. Yeah, so there was a bit of drama. I got the
Starting point is 00:21:12 original tweet where Open AI announced a Stargate here, Josh. And you're right. Like they announce a $500 billion investment over the next four years. this was primarily going to be funded by SoftBank, as you said, and a few others, right? We've got OpenAI, Oracle, and MGX, which is the Saudi Fund as well. But I want to direct you to this tweet that Elon kind of posted immediately after, which was they don't actually have the money, right? Where Sam responds, wrong. As you surely know, want to come and visit the first site already underway. This is great for the country. I realize what is great for the country isn't always what's optimal for your companies and he's responding and referring to Elon Musk company. So there was this back and forth
Starting point is 00:21:54 basically being like, does Open Air have the money? And, you know, this is on the back of like open air already kind of facing a lot of heat and competition from model competitors where they had this like massive lead and now that's kind of being constrained. And the Wall Street Journal was quick to follow up pretty shortly after saying, you know, apparently the rumor has it that they have to scale back their ambitions because they don't have the kind of money. And there's this like thumbnail picture of Masayoshi Sam, which is like the head of soft. off bank, basically implying that, like, you know, maybe they don't want to commit the money. But drawing attention back to the Oracle deal, Josh, it's confirmed now on locked in that
Starting point is 00:22:31 opening air is going to be spending $30 billion and partnering with Oracle to provide a lot of their compute. And so that's locked in. They're going to be building this extra four and a half gigawatts, so you know, under four million homes worth of compute that you mentioned earlier. So it's locked in. It's happening. And I think a lot of this drama comes from a very tumultuous leadership with Sam Altman at the head, right? He, I think, is an amazing CEO, and he's like, you know, led this company to where it is today, but it's not without the drama. You know, they had the Microsoft drama, so there's rumors behind the mill that Microsoft is
Starting point is 00:23:05 basically pulling out or trying to negotiate equity terms. They have a deal that kind of like gets them to 2030, but then they don't have ownership of the models going forth. So there's a lot of like this tension, I feel. And so Sam is just kind of like rearranging some chess piece. He's breaking connections and reforming new connections. That's kind of the way I'm looking at it. That sounds about right.
Starting point is 00:23:26 And also to discredit slightly the XAI ambitions. So just a little bit of math here, if you were to buy the 50 million H-100s that Elon is projecting right now today, that costs $1.4 trillion. That's a tremendous amount of money, which clearly they don't have. No company in the world has that amount to spend on GPUs currently. So I think what we're going to see as this evolves is,
Starting point is 00:23:48 I mean, surely the costs are going to come down. the compute is going to go up. But these factors are going to really sway who wins and how much money is required to do so. Because I mean, these ambitions are not matching what's currently available. These are definitely all projections. And whose projection maps the closest to what they actually need will be the thing that we're probably going to be watching a lot as we go forward. Makes sense. And I just kind of want to put this recent tweet out from Simon, which basically says, hey, we'll cross well over 1 million GPUs brought online by the end of this year. That is double the size of what Elon is planning for his Colossus 2 data center that we were
Starting point is 00:24:27 speaking about earlier in the near term. So, you know, he has some competition. Elon isn't far and away just yet, but if he continues executing and if Sam keeps kind of like biting back like we do, we have ourselves a race, Josh. This is a proper race. I don't think. So here's the thing. I want to give XAI the advantage strictly because.
Starting point is 00:24:46 their rate of acceleration, right? X-AI has moved the fastest. They have brought the most compute online. They have shipped the most features the quickest, and it appears as if the way that they are planning to go about using AI in terms of truth-seeking instead of alignment is probably a more optimal way of putting AI out there because it allows you to push it out faster than needing to go back and filter a lot of these things, which is kind of what we're starting to see with open AI, where they're starting kind of get shackled by their own policies, where we had this open-source model. And sure, that sounds great. Open source model, we're releasing this Thursday.
Starting point is 00:25:17 And then it never came out. And it hasn't come out. And there hasn't really been much reason why it hasn't come out. And if it were framed to be maximally truth-seeking, well, maybe they wouldn't have had these alignment problems where it wasn't working that well. There's these interesting problems that each company faces. It's going to be interesting to see who wins.
Starting point is 00:25:37 We don't want to discredit open-A-I-at-all because they're very clear of the leader. They have all the funding in the world that they need. I'm sure if they can't get it from soft-backer, bank, they'll get it from somewhere else. They'll get the money to push the GPUs. But to me, it seems like that's the current big race, right? It's the OpenAI versus XAI. But then also we have Gemini 3 and Google is coming soon. And Google has incredible models. So there really is this, like, it's still anyone's game. I would give it to XAI now because of the rate of acceleration, but they certainly do not have the lead and a strong lead because, I mean, GPT5 rumors are we're going
Starting point is 00:26:09 to get that in the next week or so. And that's probably going to blow everyone else out of the water. No, I hear you. I just, it's a toss-up. I actually have no idea who's going to get this. But there is one clear winner in all of this, Josh. And it's going to sound cheesy, but it's true. It's America. So, like, the one common theme across all these guys that are building these crazy expensive data centers
Starting point is 00:26:35 is they're all going to be located in the USA. And that's pretty major because for the last decade, there's been this exponential trend of big tech companies outsourcing all their tech effort because it's cheaper, because it's easier to scale. And more recently, you know, especially with the new US government administration and Trump's tariffs, etc., we're trying to bring back manufacturing back to the US,
Starting point is 00:26:59 trying to make it American-made. And it's really important because of two things. Number one, you don't want to have the infrastructure that is going to determine whether your country's economy lives or dies in another country. It's as simple as that. You want to have it in your land. You want to have it protected, high security, all the works.
Starting point is 00:27:19 Imagine if China was able to hack into your AI GPU cluster and inject it with false biases and propaganda. That is the kind of power that could start wars or an insurrection or whatever that might be. So you want to have it located in your country and that's great. But number two, the jobs that are going to be created from this, Josh, are insane. So I was looking into this. Stargate by Open AI, phase one is going to create 100,000 new jobs. And that is just over the first two years. Now, can you imagine that scales out over five years and with their new Stargate clusters that they keep opening? I think like this week alone, they announced one from Norway as well. Basically, we're going to end up with 500,000 to a million new jobs created over the next couple of years. Why I think this is so important is there's been this like narrative. Josh, you and I have heard it all the time, that air is going to automate jobs, it's going to replace your job, you're done, your toast, whatever. And we've always kind of been on the thesis of, yeah, that might be true in the near term, but really it's going to create so many more new jobs
Starting point is 00:28:22 that it's not going to matter, right? And those new roles are going to be kind of like made in real time, right? I'm guessing these 100,000 new jobs that Open Air is created for their Stargate thing isn't going to be like people that have 10 years of experience of building AI clusters, right? They're going to come, learn on the job and become experts and take those skills elsewhere. Number two, having all these data centers located in America means that all the offshoots of that, so all the new Facebooks, all the new Teslers that get created in the next decade are likely going to be located in America. So it kind of like stays, it doesn't fall too far from the tree, is basically what I'm trying to say. And I think that's amazing. Yeah, this is,
Starting point is 00:29:04 it's a big deal. I think the jobs thing we've discussed before, there are going to be jobs. taken away, there will be much more jobs generated. In fact, the future will probably yield a reality in which you can just opt into a job if you want to. It will not even be required with the amount of productive output we have. The jobs thing is one part of it, but the interesting thing that I am excited about in moving a lot of this back into America is just the existential risk we face by not doing so. So, yes, exit or not XAI, the Tesla team. The Tesla team is planning to build their new AI-6 chips in the United States. And they're planning to make a new. And they're planning to make their batteries in the United States.
Starting point is 00:29:40 And a lot of these are key risks, particularly around batteries, because not even for Tesla, but when you think about all of the robotics that are going to be coming online, they all require these actuators, which are the things that move, the joints that pivot, they all require batteries. They all require a lot of these precious materials manufacturing capabilities that we don't really have at scale in the United States. And in the case that something does happen, or in the case that the tables have turned, Imagine if Kimi K2 and Deepseek were closed source models that were run by China. Imagine if China had the Nvidia equivalent.
Starting point is 00:30:15 Imagine if they kept all the data siloed, that would be kind of a scary place for us to be in. And we just so happen to be in a lucky position where Nvidia is an American-made company, where these Deep Seek breakthroughs are open source so we can then emulate them, copy them, and then push them into our code. But there's a world in which that does change. And not being reliant on these foreign entities, being able to create these ships on United States soil, feels like a very nice national security improvement for us, at least, a nice competitive advantage. It will cost more. I was listening to Lisa Sue, the CEO of AMD.
Starting point is 00:30:43 She was talking recently about what it looks like to bring chips and compute architecture onshore. And she was saying, well, it's going to cost more. It's going to be maybe low 10% higher in terms of cost. But the output's going to be about the same. She said, you can get about the same output per wafer as you can in other countries. It'll just cost a little bit more. And so long as we could bear that cost and so long as we can provide enough energy to
Starting point is 00:31:04 power all of this new infrastructure, that puts us in a really good place. And I'm really excited about the direction that we're heading when it comes to onshoreing a lot of this chip manufacturing. Yeah, I mean, all in all, I'm just so excited for what the future is going to yield. And I think this whole topic of infrastructure can sound so boring. But when you have numbers like, you know, $100 billion being thrown around and what did you say, five million homes worth of power for a single data center, you can't help but pay attention to this. And this is something that we're going to be tracking a lot more on the show. We're going to be getting guests, experts that can speak to a much higher extent about
Starting point is 00:31:39 these things. If you enjoyed this episode, if you like the topics that we're discussing, please give us a thumbs up, like, subscribe, or comment in our DM, send us some DMs, give us some feedback, let us know what you want to hear more of, and we'll see you on the next show. Thank you, guys.
