Limitless Podcast - Exploring the Tech that Enables AGI: Claude Mythos and NVIDIA's Next Generation
Episode Date: April 23, 2026

We explore the game-changing release of Claude Mythos and NVIDIA's Blackwell chip, a leap toward artificial general intelligence (AGI). We discuss AI hardware evolution, the rise of "neoclouds," and the ethical implications of Mythos.

------
🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/
------
TIMESTAMPS
0:00 The Rise of Claude Mythos
1:18 The Power of Hardware
3:56 The Evolution of AI Models
5:55 Accelerating Towards AGI
9:41 Defining AGI and Its Implications
14:59 The GPU Market Dynamics
17:26 The Role of Neoclouds
19:06 Inference and Its Importance
19:56 Future Prospects and Challenges
------
RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
A couple weeks ago, we covered the Claude Mythos release, the model that found decade-old security flaws overnight and scared the hell out of basically anyone who is following the AI story.
So much so that the federal government is involved.
But the part that we didn't get into is the back end that powered this model.
Mythos was built on a chip from March 2024 that Jensen pulled out of his pocket on stage at GTC, which was the Blackwell chip.
It had 208 billion transistors.
Everyone treated it like the future had arrived.
And yet, it took two years of fabrication for us to get the first manifestation of
that, which is Claude Mythos: 24 months from keynote to a working model. It happened with Hopper,
it happened again with Blackwell, and it's going to happen again with their future models.
But the difference is we have a series of future models that exist today that we can kind of map out
to where we're going to be heading based on this trajectory that we've seen with the previous
chips. And it's pretty awe-inspiring to see where we are going to go, considering there are
three generations of chips that have already been announced since Blackwell. We have Vera Rubin,
Rubin Ultra and Feynman, each one many multiples more powerful than the last.
And when you look at what Blackwell already produced in the very first version,
it gets impossible to imagine a world where we don't reach AGI on hardware that's already been designed.
Everything that's been announced that is going into production almost certainly is going to produce models indistinguishable from AGI.
At least that's what it seems like on surface level.
Yeah, so the story here in a single sentence is AGI-like AI models are already here.
We just haven't distributed them because we haven't powered up the GPUs that enable them.
So everyone is obsessed with AI models.
We talk about our favorite models, how we prompt them, how intelligent they are.
But very few people are talking about the fact that the hardware is the thing that powers these things.
It trains these models.
It runs their inference.
And it's still about 70% of what determines how intelligent your model is.
And the prime example, the most recent example of that, has been Anthropic's Mythos
release, right? You just mentioned it. It's discovered a bunch of different cybersecurity flaws.
It is this all-powerful thing that governments around the world, including the U.S.
government and the Federal Reserve, are holding meetings with the top banks
to talk about: the craziness of this model, how we must prepare. There's a lot of doomer news out there
in the future. Little do you know that this is powered by a GPU, or this was trained by a
GPU that was built 20 months ago. So we're talking about almost two years ago. It's called
Blackwell. And I want to give you guys an idea of
of the timeline of what this looked like.
So in March 2024,
Nvidia GTC,
which is like their developer conference,
Jensen Huang comes on stage,
and he presents this gargantuan slab of metal.
It looks very pretty, by the way.
And he goes,
this is Blackwell,
GB200, GB300,
a brand new GPU.
We can train frontier models on it.
Everyone gets so excited.
Their stock price absolutely ascends, right?
The thing is,
people couldn't get their hands on this
until exactly a year later.
So to give you guys an idea of the timeline: he announces it in March 2024, then by the middle of the year they discover there's a bit of a design flaw and they amend that.
And then by the end of 2024, early 2025, they start shipping these units of Blackwell GPUs out to the top frontier AI labs.
But there's an important nuance here, which is, it's just the GPU sitting in a data center.
They aren't actually powered up.
It's not until six to 12 months after that fact that these GPUs were finally powered
up and used to train models, which is why we now start to see these new AGI-like models like OpenAI's
Spud and Claude Mythos come to fruition. So the point is, there is a long gap between the frontier
GPUs being announced and rolled out to them actually being powered to train the models. We talked
about Elon Musk and xAI a lot on this show before. They actually have the largest arsenal of
these Blackwell GPUs. They bought about a million of them. The crazy part about this now is they're not
like one, two, but three new
Nvidia GPU models that have been announced in the
recent Nvidia GTC.
So there is a major lag
between frontier hardware
and the new AI models that are being released
and people don't understand this and we want to tell you
the story. Just remember GPT-4,
how long ago that was,
and how that felt like the
most pivotal model that OpenAI ever released.
I mean, that was the big one right after
ChatGPT came out. That was trained
using the Hopper chips.
You know, the most recent model,
Hopper is a word I haven't heard in a while, Josh.
Yeah, well, you know, GPT-5.4, the most recent model that we're using every single day on ChatGPT,
that was also trained on Hopper chips.
The same chips are training models from GPT-4 to GPT-5.4.
And it's a testament to how the efficiency gains of software can actually increase the throughput of hardware.
And I think I want to use that as an example, because what we just got recently with Mythos
from Anthropic seems to be the first real implementation of a true Blackwell model.
And rumors are that Spud, the new OpenAI model, is going to be about the same in terms of power:
another first-generation Blackwell model. And even if we don't actually iterate on
the hardware, the amount of progress we're going to get from Blackwell models alone seems like
it is going to be difficult to imagine it doesn't become some sort of an AGI. It's like when you
think about the difference in intelligence between GPT-4 and GPT-5.4,
and how far we've come, that applied to Blackwell at this new scale seems crazy.
But that's not even the crazy part because we have an entire roadmap of these three generations
of chips that are coming that we can very clearly map to the gains that we're going to see.
And I think that's when things get particularly disturbing.
Because on this chart that we're looking at on screen now, we have Blackwell.
That's where we are right now.
Blackwell is a significant improvement over the previous model.
But then we have Vera Rubin, which jumps from 20 petaflops
to 50 petaflops. That's a two-and-a-half to five-times multiple on the compute. Then we have Rubin
Ultra, which is scheduled for the second half of 2027. That is a 14 times multiple. And then we
have Feynman in 28, which is an estimated 30 to 50 times multiple on the current chip stack that we
have today, assuming that we get no software progress at all. And what we saw with the hopper chips is
that we got a tremendous amount of progress just from software. So when you combine this 30 to 50 times
multiple with a maybe another 100 times multiple on software if we make another breakthrough,
we're looking at some pretty insane improvements here that are really hard to wrap your head around.
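As a back-of-the-envelope sketch of the compounding described here (the hardware multiples are the hosts' on-air estimates, and the 100x software gain is their hypothetical, not a measured figure):

```python
def compounded_gain(hardware_multiple: float, software_multiple: float = 1.0) -> float:
    """Effective compute gain when hardware and software improvements compound."""
    return hardware_multiple * software_multiple

# Per-generation hardware multiples quoted in the episode (estimates, not specs).
generations = {
    "Vera Rubin (2026)": 2.5,   # low end of the quoted 2.5-5x
    "Rubin Ultra (2027)": 14,
    "Feynman (2028)": 40,       # midpoint of the quoted 30-50x
}

for name, hw in generations.items():
    print(f"{name}: {hw}x hardware, ~{compounded_gain(hw, 100):,.0f}x with a 100x software gain")
```

The point of multiplying rather than adding is that hardware and software gains apply to each other, which is why the headline numbers run away so quickly.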
I want to point out that these improvements, these multiples that you just mentioned,
are just on the speed and power of these hardware modules, right? So it's going to work
3x harder or 14x harder, but it's also going to cost you a lot less to
train the same type of intelligence or model. So the intelligence per density, which is a unit that
we completely made up, and we don't know if it exists, but it somehow rhymes, in my head at least,
is improving and it's going to be cheaper with each successive model. But if you want to get a bit
of context as to like what that looks like in terms of like the models that you use today and what
it's going to look like tomorrow, we have this other table here, which kind of like maps it out.
So with Blackwell today, you get about a 2 to 3x more intelligent, crazier model, right?
That's what Claude Mythos is supposedly meant to be.
It's like a larger size.
It's trained on these Blackwells.
You're going to see a bunch of similar models come out from OpenAI and xAI over the next couple of months.
And just to pause you there, these are already models deemed too dangerous to release for the public.
Yes.
Just like there are emergency meetings literally being called by the Fed chair with top banks.
Actually, I read something yesterday that the NSA has re-engaged with Anthropic,
as well as the Pentagon and the U.S. Defense Department,
after banning and blacklisting Anthropic because it's so powerful.
And that's where we are today.
That's today.
So that's right here, 26, 2 to 3x, right?
Yeah, crazy.
Now, you might notice that by next year,
we have a larger multiple on the original multiple.
By next year, we're going to have a 10-to-50x improvement
purely through Vera Rubin GPUs.
Now, I must emphasize, this does not
include post-training. This doesn't include all the fancy techniques that AI labs themselves
will implement to make a smart model. This is just the hardware. It's like buying the hardware
and training a model today versus next year: you're going to get a 10-to-50x more intelligent
model. But it gets even scarier. 2028, 30 to 50x. 2029, 100 to 200x. Now, I haven't seen these
multiples in any other industry for any kind of performance or hardware improvement. I can't
wrap my head around this, because it looks like just a few small numbers that are getting larger,
but these are multiples of their predecessors, which means we're probably going to get AGI by,
honestly, the start of next year. And they're trained on hardware that currently exists and is
rolling out. I don't know. I'm just kind of scared reading all of this, to be honest, because
like what happens if we have universal access to this? Like there's going to be a load of malicious
actors who can use these models for various different things. But also, I don't know what
these models are going to be capable of; they're going to be so much smarter than humans themselves.
The disturbing thing is that this technology is here.
Like this is, it's no longer an engineering problem.
It's just a matter of actually producing the thing and plugging it into an outlet and
putting it online.
And this is coming.
Like, there are no novel breakthroughs required to make this a reality.
Now, what that looks like on the other side, I don't know, but I think it's safe to assume
the velocity of improvement we're going to get is certainly not slowing down.
It is turning to more closely resemble a vertical line than anything else.
And I think it begs the question, like, at what point do we reach AGI and how do we even
define that?
Because I'm not sure we've spoken about that much on the show, but Ejaaz, when you say AGI, what do you
mean by AGI?
What would you be looking for to declare, okay, we have finally reached AGI?
Okay, so this is like my own made-up definition, but it's what will make me go,
okay, this is AGI.
It would be a single
AI model, not many, but a single AI model that advances the frontier of three key major
industries autonomously. So I'll pick these industries as examples. Financial industry, so it trades
better than the best hedge fund or investor. It is able to make
assessments better than any of the financial analysts, the top experts, etc. in that industry.
In science, it has discovered a bunch of medical cures for major diseases,
such as cancer, Alzheimer's and stuff like that,
that scientists, top scientists at their top level
could not figure out.
It accelerates their research.
And maybe one other industry that I can't think of right now,
but it's when these models start doing things
that the best of the best humans right now
couldn't figure out themselves and couldn't have foreseen themselves.
Do you have a similar definition?
Yeah, I think that sounds right.
I think, and again, it's very fuzzy.
Everyone kind of has their own custom definition
of what they believe AGI is going to be.
but for me, it's just AI that's smarter than the smartest human at pretty much any cognitive
task that exists. So you can go to this model and it will be better than anyone else who you can
ask on planet Earth about anything. And the problem with models today is they're very spiky.
Like you can do this for code probably and it can code better than every human on Earth.
But if you ask it, you know, a generalized question about something that you really know a lot
about, there are a lot of times where it's not completely accurate, or it will respond as if it has the
intelligence of a three-year-old; it fails a lot of simple reasoning tests.
It still feels like it's this very spiky entity. Once it is fully developed, once it is actually
better at every cognitive task, and that includes physical things too, like understanding the
physics of the real world, that feels like AGI. And then artificial superintelligence,
ASI feels like it is smarter than all humans combined. So it's like if we put all of our brains
together, no matter how long we tried, we could never come up with the things that artificial super
intelligence will come up with. And I mean, will we get there using this chip architecture?
Possibly. I'm seeing a 50x multiple, not including the software multiples. And like those compounding
on top of each other at the rate that we're moving seems like the only real constraint is going to be
physical. It's going to be actually rolling out these models and powering them on.
Well, another crazy thing is I think a lot of people including myself would assume that with every
chip upgrade, it's going to be more expensive.
and it's going to be bigger.
It's going to be clunkier, right?
Like the data centers are going to get bigger.
It's going to be more expensive.
I wish I had a chart to show this,
but it's actually the complete inverse.
And I'll give you some examples,
some numbers to explain that, right?
So a reasoning task that costs $1 on Blackwell
costs 20 cents on Vera Rubin,
which is rolling out as we speak or later this year.
And it'll only cost seven cents on Rubin Ultra,
which starts to get released by the start of next year.
So cost is going down pretty massively.
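The per-task figures quoted above imply steep generation-over-generation discounts. A quick sketch of that arithmetic (the dollar amounts are the figures quoted on air, treated as illustrative, not official pricing):

```python
# Cost for the same reasoning task on each generation, as quoted in the episode
# (illustrative figures, not official NVIDIA or cloud pricing).
cost_per_task = {"Blackwell": 1.00, "Vera Rubin": 0.20, "Rubin Ultra": 0.07}

def cost_drop(old: float, new: float) -> float:
    """Fractional cost reduction moving from one generation to the next."""
    return 1.0 - new / old

print(f"Blackwell -> Vera Rubin: {cost_drop(1.00, 0.20):.0%} cheaper")    # 80% cheaper
print(f"Vera Rubin -> Rubin Ultra: {cost_drop(0.20, 0.07):.0%} cheaper")  # 65% cheaper
print(f"Blackwell -> Rubin Ultra: {cost_drop(1.00, 0.07):.0%} cheaper")   # 93% cheaper
```

So on these numbers, the same dollar of inference budget stretches roughly 14x further two generations out.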
Now, for 2028, Jensen announced the Feynman GPU, right?
A single rack of that,
so we're talking about just a couple of these stacked on top of each other,
will process more compute than was required to train GPT-4 that you mentioned earlier, Josh.
So the point is less is more, but somehow more powerful,
and also somehow cheaper relative to the intelligence that you're building.
And if you assume this intelligence is going to reach this ASI, AGI-like state, it's going to make you money as well.
So you end up just having, I guess, I'm afraid to say this, but the best of both worlds.
I don't know what humans are going to be doing, but it's great for AI, basically.
Yeah, there's no world in which things don't get better.
And it feels like right now we're really just constrained by this compute power.
There's this great meme that I saw online.
It said Mythos is too powerful for public release, but the reality is that they're just completely out of compute,
and Anthropic can't actually supply the tokens required
to give Mythos to the world.
These optimizations, these cost structures, yeah, there it is.
We got on screen now.
Great meme.
Great meme.
But these cost structures that are going to accrue from these new models
are going to completely remove that constraint,
at least for now, until whatever that next generation of model is
that is so powerful that it strains the GPUs again.
And the interesting thing is that OpenAI has the same exact thing going on.
All these models are kind of converging on the same spot,
but they all seem to be compute constrained.
I think what critics will push back on though, Josh,
for everything that we've said so far,
is, okay, cool, you can buy these new hardware things,
but why would you do that if you could just wait a few months
or six months and buy the next thing?
Jensen's just shipping out these products.
He's making a load more money.
It doesn't make sense.
These things are depreciating assets.
By the time you've bought the first one
and you've ramped that up with power
and training your next model,
there's already three other new chip architectures.
And that critic would seem to be right,
except that they're massively, massively wrong,
and we have proof of that, right?
GPUs have now become this anti-depreciation machine.
One of the most amazing things about this phenomenon,
and it feels like a narrative violation,
is the idea that the GPUs that were released three years ago
are actually more valuable today than they were at the time they launched,
which is a pretty bizarre idea.
We have this artifact on screen that shows a chart,
and an H100 from Nvidia cost $30,000 when it launched in 2023.
At its peak, because of the scarcity, because everyone needs these things,
it was selling for a four-times multiple at $120,000 per H100.
This is kind of outrageous.
It was a little exorbitant.
We don't need to be paying that much money.
But now that they are old, they're not depreciating, even though there's much better hardware out there;
they're still holding their price at $30,000.
In fact, you can see a rebound that happens in late 2025, where the cost of these
H100 GPUs actually ticks upwards. And I think a lot of people, Michael Burry most famously,
who is the guy behind the big short, he created an entire short thesis around the idea that the
depreciation schedule of these GPUs wasn't aggressive enough. And they were actually going to lose
their value and therefore the market was going to deflate because the companies weren't
marking these down properly. The reality is that not only are they not going down, they're starting
to trend back up, because the incremental cost per token is so low with these, and
everyone's so desperate for compute that they're like, well, might as well spend some extra money, get the H100s and start generating inference tokens with them.
It's this pretty amazing phenomenon that's happening.
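To make that narrative violation concrete, here's a sketch comparing the H100 prices quoted on air against a naive straight-line depreciation schedule (the five-year lifetime is an assumption for illustration, not Burry's actual model):

```python
def straight_line_value(launch_price: float, years: float, lifetime: float = 5.0) -> float:
    """Book value under straight-line depreciation over `lifetime` years."""
    return max(launch_price * (1.0 - years / lifetime), 0.0)

LAUNCH_PRICE = 30_000    # 2023 launch price quoted in the episode
PEAK_PRICE = 120_000     # scarcity-driven peak, a 4x multiple
MARKET_PRICE = 30_000    # late-2025 market price per the chart discussed

book = straight_line_value(LAUNCH_PRICE, years=2.5)
print(f"Naive book value after 2.5 years: ${book:,.0f}")  # $15,000
print(f"Actual market price: ${MARKET_PRICE:,}, i.e. {MARKET_PRICE / book:.1f}x book value")
```

On those assumptions the cards trade at double their straight-line book value two and a half years in, which is the gap a depreciation-based short thesis would miss.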
Yeah. So if you're wondering why this is happening, explicitly, it's that AI demand is growing faster than chip supply can expand.
We don't have enough fabs or the manufacturing prowess or the energy grid to support creating and generating more GPUs to satiate the demand that we're seeing in AI across all these different industries, right?
It's a very pervasive bit of technology.
Now, the data that we're showing you on the screen right now isn't siloed to like a few research papers.
This is happening in the market right now and it's incredibly liquid.
So a new phenomenon of companies in AI whose stocks have all skyrocketed are these things called neoclouds, right?
So these are like, think of it as like AWS.
They supply compute to train your AI models by setting up their own data centers, and they kind of provide it to you in a cloud
or data-center-specific structure.
Examples would be
CoreWeave, for example.
The idea here is these data centers
or these GPU providers,
70% of the GPUs that they're running
are old GPUs
that we're showing you on our screen right now.
And they're booked out,
I'm not exaggerating,
six to 12 months in advance.
In fact, they're locked in under contracts,
and the same customers renew those contracts
three months before they
need to be renewed,
just to make sure that they get access
to these older GPUs.
The point I'm trying to make, and you mentioned this just now, Josh, is all that matters is,
can I get AI tokens generated to do the thing that my company needs or answer the prompt
that I have? And if the answer is yes, and it's for a reasonable price, I'm down to go for that
because the value that you can build and earn on top of that is immense. They can have a large markup
on that. So it makes sense that these assets are kind of like in high demand. And to your earlier
point. Michael Burry shorted the entire market saying that these are depreciating assets,
and he got that completely wrong. His thesis specifically was based on the idea that they can't train
frontier models. And he's actually right: the older chips can't train frontier models.
But what they are being used for is one thing very specifically, inference, which is,
if someone has a question, how do I get them the answer? How do I process the prompt? That's what the
older GPUs are being used for, and they're really damn good at it. And the reason why it's
important and essential for AI labs specifically, who are training models, and who you might think
would want the expensive chips, is they have a ton of inference. They use inference to even
train the new models. So it's this new paradigm where all these old GPU architectures are
being repurposed for this really important thing that is inference. So it's important context
to understand if you're investing in some of these companies, for example. Yeah, and why is it so
valuable? Well, it's a testament to the software improvements, right? So we have those software efficiency
the improvements that we didn't have three years ago. So that same hardware generates a lot more
value. And if we scroll down to the value multiplier section of this artifact, it shows that the
cost of a chatbot inference in 2023 was $3 an hour. And now autonomous agents completing these
complex tasks is $30 to $300 per hour. So the value that you can charge for these tokens is significantly
higher than it was in the past. And the number of tokens that you're able to generate efficiently,
at that higher quality, is much higher as well. So there are
all of these converging forces that are just making the market desperate for compute.
Nobody has the compute required that they want.
And Nvidia is trying to put it online as fast as they can, but it's not fast enough.
And I assume as we go through this, we're going to continue to see varying bottlenecks,
and the efficiencies will move to where there aren't bottlenecks, which creates new bottlenecks.
Right now we're seeing some conversation around CPUs, and CPUs seem like they're going to be
hitting a shortage somewhat soon because we're out of GPUs.
Let's move to CPUs.
And it's this really interesting dynamic.
but that is the idea on this Nvidia episode,
or just the chip episode in general,
that it is hard to imagine a world in which we don't reach AGI
given the currently announced infrastructure.
It doesn't require any breakthroughs.
It's just that if Nvidia does what Jensen Huang announced on stage
with these next three chips,
it is almost impossible to imagine
what the world of intelligence is going to look like.
And I think it's important to understand
that Mythos is trained on a two-year-old chip.
And no one's really talking about that.
So it blew my mind.
Hopefully it blew yours as well,
or at least you found it a little bit fascinating.
And that is our episode today.
Thank you guys so much for watching.
We really appreciate it.
And I know some of you are probably thinking,
there's a bunch of challenges here.
And Josh actually just mentioned one of them,
which is like you've got CPUs.
We don't have enough energy.
We don't have enough memory.
And that's like, you know,
another episode that we can get into.
So we assume all of those things will be leveled out at some point.
And we're going to see all those industries grow versus being constrained.
Like, people are throwing trillions of dollars into this industry, so all of those problems should theoretically be fixed.
But rest assured, we will be the first show to cover it and give you our thoughts before it happens.
And Intel is a sneaky one to get into.
But we'll talk about that another time.
Thank you so much for listening.
If you are not subscribed to us, please subscribe.
It helps us out massively.
We are having banger weeks on YouTube, Spotify, Apple, and wherever you listen to us, please rate us.
Leave us a comment.
We love hearing your feedback.
There are like thousands of newbies that are listening to the show.
Welcome.
And also give us feedback about stuff that we may not be covering that you want to hear more of.
We're always open to feedback.
But until then, I guess we'll see you on the next one.
