Limitless Podcast - THIS WEEK IN AI: ChatGPT 5.5 Beats Claude Mythos, SpaceX Cursor Rumors, Google's New TPUs
Episode Date: April 24, 2026

Big week this week. We discuss the launch of OpenAI's ChatGPT 5.5, showcasing its exceptional coding and problem-solving capabilities that surpass competitors like Claude Mythos. Highlights include its implications for fields like mathematics, innovative applications such as a space mission simulator, and a 3D dungeon game. We also cover industry news on SpaceX's partnership with Cursor, Amazon's investment in Anthropic, and emerging competition from Chinese open-source models.

------
🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/
------
TIMESTAMPS
0:00 Launch of ChatGPT 5.5
4:53 Comparing with Mythos
5:30 Inference and Pricing
7:27 New Applications
11:04 Impressive Capabilities
13:01 Beyond Coding
15:30 SpaceX Partnership
25:23 Anthropic's Funding
26:35 Rise of Open Source Models
28:12 Anthropic vs Figma
30:44 President's Comments on AI
------
RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
The most powerful model in the world is here right now.
In fact, it's so good that it beats Claude Mythos.
OpenAI just released ChatGPT 5.5, and it crushes Claude on every single benchmark.
It's the new number one coding model.
It can do 20-hour tasks that expert software engineers sometimes can't do.
It's already discovered groundbreaking solutions in maths and frontier sciences, such as genetics,
and it's cheaper than GPT 5.4.
This is the result of two years worth of frontier research released in this one single model.
In fact, it's so good that an Nvidia engineer said, and I quote,
losing access to GPT 5.5 feels like I've had a limb amputated.
I think a lot of people are going to compare this to Opus 4.7, and that's fair.
But I really think the true comparison is to Mythos,
because Sam Altman just posted something as the model was coming out
that felt very much like a jab at Mythos.
And we're going to get into the benchmarks comparing them,
on many of which it actually beats the Claude model.
But what I find most interesting about this post is the second paragraph where he says,
we believe in democratization.
And he mentioned specifically, we have been tracking cybersecurity as a preparedness category
for a long time and have built mitigations we believe in that enable us to make capable
models broadly available.
So this is very much a dig at Mythos, which is, as we all know, privately available,
gated only to the companies that are granted access to it.
ChatGPT and OpenAI are like, hey, we're going to give you the powerful cybersecurity.
We're just going to bake the precautions into the model so that everyone can have it. And it ends by saying this really sweet thing. It's like, we love you and we want you to win. We believe in everyone having access to this intelligence. And I really respect that. And I think it's an awesome way to set the precedent for what the next generation of these models is going to look like.

But before we go any further, let's talk about the model itself. It's out right now. If you have a ChatGPT membership, you can go and use it, go and play with it. Ejaaz, what's the TLDR? What are the high-level things that everyone should know? What's most new and noteworthy about GPT 5.5?

Okay. So, going by your Mythos comparison, the first question that popped into my head is, I use Claude Opus 4.7 every single day.
So I'm like, is it better than this?
Like, should I be switching back to ChatGPT right now?
The answer might be yes.
So if we look at the benchmark scores right here, GPT 5.5 on the left over here absolutely crushes all the standard benchmarks that these frontier models are evaluated against.
And if you look on the right over here, Claude Opus 4.7 either doesn't even measure in a particular category or it's completely beaten
by GPT 5.5. In fact, the only stat that GPT 5.5 doesn't beat Opus 4.7 in is something called
software engineering benchmark verified pro, or something like that. It's like the pro
software coding situation. But there's a footnote at the bottom of this blog where OpenAI
states Anthropic publicly said that they might have gamed that particular benchmark and it
needs to be re-evaluated. So we might have a complete clean sweep for 5.5 as we see today. So it's an
incredibly powerful model, but a question that popped to my head is, does it actually beat
Mythos? And we have a direct comparison right here. Yeah, so it shows that it does across some
benchmarks. Now, again, these benchmarks are pretty fuzzy. We don't know which ones are gamed to do what,
but there is a world in which GPT 5.5 will outperform Mythos on some things; which ones, we're not entirely
sure. I think as we kind of figure out ways to describe GPT 5.5, it seems as if it's their first attempt at
making a model built for autonomy instead of answers. I think a lot of the benchmarks that they're
working on are in agentic coding, things like handling tasks that are 20 hours long. We'll get into
that. It's doing 85% of OpenAI's internal work already. And it also helped rewrite the infrastructure
that built it. There was this amazing quote in the blog post that said, OpenAI says 5.5 itself helped optimize
the stack that serves it. Codex analyzed weeks of production traffic and wrote custom heuristics
for load balancing that boosted token generation speed by over 20%.
So they're using the model to actually build the model and make it maximally efficient
based on the data that it's collected from users like us who are interacting with the model on a daily basis.
So it's very smart, it's very clever, it's not just there to give you answers.
It's there to think deeply and actually solve problems for you in a way that I think
Mythos and a lot of these other frontier models are kind of pivoting towards now.
The great thing about this model release is it reveals a few things that OpenAI has as an advantage
against, say, a frontier lab like Anthropic.
Like, it's clear looking at these benchmarks compared to Mythos,
which, by the way, the entire world is spiraling over,
because it supposedly has the cybersecurity ability to take over any kind of government system.
This model is pretty close, and Sam is going to be releasing this publicly,
or OpenAI is going to be releasing it publicly for everyone to use.
So a question that pops into my head is, does this mean that it's a matter of compute,
and OpenAI simply has more of it?
Certainly, if you compare Sam Altman's
ability to acquire compute and spend all these trillions of dollars to acquire it versus Anthropic's.
Anthropic has been extremely conservative and now they're struggling. Like, you know, they recently
signed a $5 billion deal with Amazon, which we'll get to later on. But the point is, this is a tale
of two stories. Either OpenAI has enough compute and they're about to leapfrog Claude because
of that, and they're proving it through this model, which is a very good answer to Mythos, or,
and this is the alternative side, Anthropic's Mythos model is just plainly better than 5.5, and these
benchmarks aren't actually verified, which is technically kind of true, because I don't know how
official these things are. These are just tests that a small set of users have done. So it's a
game of both. I'm sure Anthropic is watching this and thinking, hmm, maybe we should roll out
Mythos, but they don't have the compute. Yeah, they don't have the inference. In fact,
speaking of inference, Sam actually made a post praising the really excellent work by the
inference team to serve this model so efficiently. He wanted to really highlight the fact that to a
significant degree, they have become an AI inference company now. And I think that's a really big
difference from what they've previously been. Like, Anthropic has a really tough time serving compute,
and we see that. And even if they had Mythos available in a way that was safe, they can't serve it.
OpenAI can. And we see it reflected in pricing, because, I mean, we have some pricing for this model,
right? And it seems as if it's roughly at par with 4.7, if not slightly better.
It's slightly more expensive, but not by much. For every million input tokens, it's
the same for both Anthropic's Opus 4.7 and GPT 5.5: $5.50. But the output is $30 per
million tokens for 5.5 and $25 per million tokens for 4.7. So it's a little more
expensive, but here's where you actually have more of a bargain using the more expensive model 5.5.
It is cheaper than GPT 5.4, and it uses tokens way more efficiently to think. So what does that
mean if you are an enterprise that wants to, you know, plug in this AI model, not worry about it,
and just have it power your entire profit engine? Well, you end up using fewer tokens, so you hit
your rate limits at a much slower rate, which means that you end up getting more bang for
your buck, as long as you use the model like 24/7, or you use it effectively. If you are just
kind of out there using 5.5 to ask questions that you should maybe be asking Google,
this is probably not the model for you, but otherwise it's a super powerful one. Yeah, and if
these prices don't mean anything to you, that's fine. As long as you have a $20 a month subscription,
in fact, this is going to be available to free users fairly soon, I believe. But anyone who is a
subscriber has access to this. You don't need to use the API. There's nothing fancy. You open up
your app on your phone. You go to the web browser. It's there. It's available, ready to go.
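For the API users, the pricing argument made above can be sketched quickly. This is a hedged, minimal sketch: the per-million-token rates ($5.50 input for both models, $30 vs. $25 output) are the ones quoted in the discussion, but the dictionary keys and the example workload (200k input tokens, roughly 40-50k output tokens) are made-up assumptions for illustration.

```python
# Hedged sketch: comparing the per-million-token prices quoted above.
# Rates come from the discussion; the example workload is hypothetical.

PRICES = {
    # model: (input $/M tokens, output $/M tokens)
    "gpt-5.5": (5.50, 30.00),
    "opus-4.7": (5.50, 25.00),
}

def job_cost(model, input_tokens, output_tokens):
    """Dollar cost of one job at the quoted per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical job: 200k input tokens. If the newer model thinks more
# efficiently and emits, say, 20% fewer output tokens for the same task,
# its higher output rate can still come out cheaper overall.
cost_55 = job_cost("gpt-5.5", 200_000, 40_000)   # 20% fewer output tokens
cost_47 = job_cost("opus-4.7", 200_000, 50_000)

print(f"GPT 5.5:  ${cost_55:.2f}")
print(f"Opus 4.7: ${cost_47:.2f}")
```

Under these assumed numbers, the model with the pricier output rate ends up slightly cheaper per job once it emits about 20% fewer output tokens, which is the "more bang for your buck" argument made in the discussion.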
Now, there's a few interesting things that you can do with this model that haven't previously
been possible. And although we don't quite have access to it just yet, we're recording this right
as the model got launched, we do have a blog post from OpenAI themselves who are showcasing a few
demo. So again, take these with the grain of salt. These are straight from Open AI, but they are
seemingly pretty impressive and pretty noteworthy as to what they're capable of doing, starting with
this space mission application, which is pretty cool and very reminiscent of the moon mission that we
just had. Yeah. So if you guys don't know, Josh has a secret, he has many secrets on this show. One is
he's a massive space fan. And when he's not hanging out with me, he's doing space simulations on whatever
he can. Well, okay, maybe part of that is a bit of a lie. But, um,
With this new app that we're seeing in front of us right now, this was completely vibe-coded using 5.5.
And it's used to simulate a specific space mission.
Now, if this looks very similar, it's because we just had a space mission.
First time we visited or went back to the moon in 53 years.
Pretty big deal.
And we can see a pretty accurate simulation going on right here.
So as you can see, there's various different toggles.
The physics of the entire thing is very important.
And that's another point I want to make about this model.
It is being used for frontier research, not just in AI, but in mathematics
and genetics. It made frontier progress on both of these fronts. And so what we're showing here
is this is a model that goes way beyond just text and telling you what could be. It actually
implements this into a lot of different things and understands the world around it, which is
extremely powerful. But we have another one here. We have an earthquake tracker. For anyone who
wants to make websites, it's so good at making websites. And this appears to be one of the strong suits.
In this case, there's a few things to highlight on this earthquake tracker. One
being that it's just a pretty elegantly designed website, but two, all of the graphics are
interactive. You'll notice that they update dynamically as you hover over them and as you click. It looks very
clean. I assume that it is pulling up-to-date information from an API somewhere that it's set up.
It is just truly competent and capable of doing these kinds of longer-tail tasks that are a bit more
complicated than a static landing page, but have dynamic data, have the richness that you would expect from
a high-end, high-quality, polished website, except just built with an AI model by
someone who doesn't need to know anything about coding at all.
And then for the gamers, there's another great example of a dungeon game,
which they're describing as a playable 3D dungeon arena prototype,
built with Codex and GPT models.
Now, I think this is something novel to this setup,
where Codex handles the game architecture, the combat systems, the enemy encounters,
and then the character models, the character textures and animations,
those were created with third-party asset generation tools using something like ImageGen 2.0.
So this is also one of the earlier signs where you can actually
merge a lot of these tools together to build something dynamic in a way that you previously
couldn't have. Yeah, actually, the quality of this game looks like something out of
League of Legends or something like that. At least that's what it reminds me of. Like,
these games are getting way more high-fidelity than I expected. I know it's still pretty
basic for anyone that's watching this, and they can kind of pick it apart, but it's cool.
But for those of you who prefer the more traditional side of games, this might be something
that you can kind of vibe code in a couple of minutes. Now, it may look basic, but theoretically,
this is like a 3D spatially aware game
and that's not something that you could achieve
at least very easily with previous models.
What I love about this as well is
that they've also included the prompt
for all of these things.
So this is something that you can try right now.
Like look at this.
And the prompt is no more than, what is this one,
like 12 lines, dude.
And you can have a fully functioning game.
You can probably then add an extra step
or an extra prompt saying,
hey, can you deploy this to Vercel,
and send that to your friends.
Now you have a game.
You're a game creator, you're a game developer.
So the applications for this model cannot be overstated.
I'm going to be very honest.
I thought this model was going to be just an iterative upgrade.
I didn't think it would get anywhere near Claude Mythos.
Two stories have now revealed themselves, which is,
one, it's the answer to Claude Mythos,
and two, it's really damn good.
I am now convinced that compute is everything,
but not in the way that I thought it would be useful.
I thought it would be largely for pre-training.
But going back to Sam's tweet earlier on,
and also Greg Brockman's recent interview (he's the president of OpenAI),
they're going all in on inference, test-time compute,
which just means that if you have more compute
and if you have a good enough model, it can do the thing.
This thing, like I said, built itself.
It's a self-improving model.
Very, very impressive.
It's good for solving hard problems.
It's good for thinking for a long time.
In fact, they marketed it as a model
that can now think for 20 hours coherently,
which is almost a full day it can work on a problem.
And what you're noticing from this prompt that's on screen
is it doesn't take that much to get it going.
You don't need to kind of spoon-feed it all the way through anymore.
It can make decisions on its own.
It can infer conclusions on what you want just based on the knowledge architecture that it currently
has.
It's amazingly impressive.
In fact, one of the people who got access to it early just posted on X that he's posting
live as his prompt is seven hours into its task.
It has been running for over seven hours.
He said, this has literally never happened before.
The models would maybe run for 30 minutes or so.
Wow.
Or, if you really pushed them, two to three hours.
But he's on seven plus hours.
I think this is going to be fun for people with complicated things.
If you really want to make a AAA-feeling video game or a simulator or a really complex website,
this is the model to try out and to use it with codex and see how all these things kind of piece together.
It's really, I mean, I didn't have my hopes very high based on the Opus 4.6 to 4.7 incremental improvement.
This seems like a very solid improvement over 5.4.
Absolutely.
And listen, if you are listening to this and you're like, listen, I'm not a gamer.
I can't waste my time with that.
I focus on more serious things.
Well, for you serious people, if you're a manager at a top company or whatever that might be,
this isn't just a toy or a model used for coders.
A lot of the examples that we just gave are around coding.
You can use this for just admin stuff or managerial work.
Like the capability of this model to think more strategically and long term
and understand the context of the tasks that you're working towards.
Like we said earlier, for coding specifically, it can work on 20-hour-long expert tasks.
That also applies to administrative stuff or more generalized white-collar work.
And so in this example, Noam Brown says, I'm a manager at OpenAI, but I'm using this model to basically manage my entire team and make sure we're focused on the right things.
And guess what?
The output of this team and this product has been pretty amazing.
So all around really excellent work by the entire team and the inference team specifically, as Sam Altman says here.
And yeah, I'm looking forward to using this thing.
I don't have access to it right now.
I've refreshed my account probably like five times at this point and it hasn't appeared.
So maybe it's like a slow rollout.
But if you're listening to this and you've tried it out, let us know what you're using
it for.
Let us know what about it amazes you.
Like, I really want to hear more.
Yeah, OpenAI has had a pretty incredible week.
And this comes on the back of their new ImageGen model that they just released, which
was also unbelievable.
If you haven't seen that episode, we just recorded it yesterday,
so I would advise you to go see it because, oh my God, it is amazing.
We also recorded an episode on Apple's new CEO this week and what that means for the company,
as well as the hardware race and how this model, GPT 5.5,
is very much part of the AGI class of models that is built on Blackwell chips.
And we've recorded an entire episode all about that.
Very interesting, very fascinating.
Also interesting and fascinating because, as always, this is the weekly roundup.
We have a few other topics to talk about.
We have some news out of SpaceX, which is a pseudo acquisition.
Now, they haven't quite acquired Cursor, the company in question, but they have
at least partnered with them, with the option to buy Cursor for $60 billion, or to pay $10 billion
for the right to actually work together. This seems like a big deal. This seems like, I mean,
xAI, or we could call it SpaceX, but SpaceX AI, is taking AI very seriously. They're currently behind.
They clearly don't want to be behind. This is a huge step and a huge vote of support
in Cursor, with this minimum of $10 billion, into accelerating their progress and trying to get
themselves into this game. This is actually a genius deal. And there are a few reasons
why that is so.
So let me explain.
If you're SpaceX AI,
which, by the way, is a ridiculous name now,
like, we'll just call them XAI,
you are currently harboring
one to 1.5 million
of the frontier GPUs,
mainly Nvidia,
in a warehouse.
There's one issue.
You're not really utilizing all of it
because XAI has had a bit of a slow start
to training their models.
What's a genius idea?
Hmm.
If I rent those out
to another company,
to train their own model, then we can make money from that.
Okay, so that's win number one for SpaceX.
But then they've thought of another thing, which is,
huh, GROC isn't really good at coding,
and we are losing the race every single day
we don't update our model like coding
because Anthropic and ChatGPT 5.5 is completely running away with it.
So how did they leapfrog and get ahead?
They should acquire the company that is using their own GPUs
to train a frontier coding model.
So then the question becomes,
well, who the hell is Cursor? What's the moat that they have? Like, why do they have a good shot at training
a better coding model than Anthropic and GPT 5.5? Aren't those two companies way ahead? Well, the answer is
not quite. Cursor, for the longest time, was the number one platform and tool for people to use
to do their vibe coding. Why? Not only did they have access to frontier coding models from Claude and
ChatGPT, they also had something called an agent harness. Now, you'll notice GPT 5.5 is really good at coding
because of something called agentic coding.
That is something that Cursor pretty much pioneered.
It's basically the harness, the prompts, the environment,
that they mold the model,
or rather that they mold around the model,
that makes it so good and intuitive
and remembers the context across every single project,
like menial things, like understanding your GitHub branches
and working on separate flows at the same time.
A lot of the top software engineers in the world right now
use tools like Cursor and agentic coding
to be able to pull this off.
So Elon Musk thought, hmm, if I give you the GPUs to train a better coding model, which gives you a better product, I should have the option to acquire you.
In acquiring you, I can integrate you with Grok, and Grok somehow becomes the number one coding model over the next year or so, depending on whether this deal goes through.
And if the deal falls through and they create a really bad model, well, you pay me $10 billion for the service.
Or I pay you $10 billion.
Not a bad deal.
Yeah, it seems like they're going to be continuing to work with other companies to accelerate
in places where they're currently weak.
Because, I mean, they're so strong at building out the hardware
and creating these huge data centers.
They need someone who could take advantage of all those GPUs.
Hopefully, this will help serve that cause.
And that's not the only SpaceX news this week.
The other is that they have officially filed an S-1,
which for those who are not familiar,
it means they're going public.
It's officially official 100%.
They will be going public this year.
If there were any doubts, please let them be relinquished.
Here we have it.
SpaceX will be going public.
So the most interesting thing from this was,
I think the share structure,
how they're going to be organizing this for Daddy Elon, who's going to be getting quite a big
payday if he does well. So we have on screen here just a series of some of the financials. I mean,
we know Starlink as a business has been doing unbelievably well. They have about $25 billion in cash,
$92 billion in assets, $50 billion in liabilities. Dude, that's quite a lot of liabilities on this.
My God. They got a lot of debt, man. I don't know. We'll see once they finally publish everything.
I'm very excited for the first earnings report where you really get a true peek behind the scenes of what's
going on there.
But it looks like it's going to be going public at a $1.75 trillion valuation.
Now, in terms of pay structure, Elon is poised to get 60 million shares across 11 tranches,
vesting in $500 billion market cap increments, from a $1.1 trillion to a $6.6 trillion market cap.
Oh.
So for those unfamiliar with the current ceiling, I think it's Nvidia.
Nvidia is what?
5 trillion, under 5 trillion, close to 5 trillion?
Under 5.
Okay, so not even close. They're like 20% away from five trillion. SpaceX needs to be,
what is that? Like 20-something percent more valuable than the most valuable company in the world.
But if they do, Elon gets 60 million shares. Now, I haven't done the math on exactly how much that is.
But if we make some assumptions here, the total value at vest looks like it could be about a quarter of a trillion dollars.
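As a sanity check on the tranche numbers quoted above (60 million shares, 11 tranches, $500 billion increments ending at a $6.6 trillion market cap), here's a minimal sketch. The even 11-way share split per tranche is an assumption for illustration; the filing's actual per-tranche allocation isn't specified here.

```python
# Sketch of the stated comp structure: 60M shares across 11 tranches,
# each unlocking at a further $500B of market cap, ending at $6.6T.
# The even per-tranche split is an assumption for illustration.

TOTAL_SHARES = 60_000_000
NUM_TRANCHES = 11
STEP = 0.5e12          # $500 billion per tranche
FINAL_CAP = 6.6e12     # last tranche unlocks at a $6.6 trillion market cap

# Work backwards from the final threshold: 6.6T, 6.1T, ..., down to 1.6T,
# then reverse so thresholds run in ascending order.
thresholds = [FINAL_CAP - STEP * i for i in range(NUM_TRANCHES)][::-1]

shares_per_tranche = TOTAL_SHARES // NUM_TRANCHES  # assumed even split

def vested_shares(market_cap):
    """Shares vested at a given market cap under the assumed even split."""
    return sum(shares_per_tranche for t in thresholds if market_cap >= t)

print([f"${t / 1e12:.1f}T" for t in thresholds])
print(vested_shares(3.0e12))  # shares vested if the cap reaches $3T
```

Note the thresholds start at $1.6T (one $500B step above the stated $1.1T baseline) and end at $6.6T, which is exactly 11 increments, consistent with the numbers quoted in the discussion.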
So pretty good payday for Elon. I think the most important thing is that he's getting a lot of control over this.
It seems as if he's going to have 40-something percent control of the company,
which is really, ultimately, what was most important to him as they went public.
So really exciting news.
I am hopeful that it happens this June, which we can expect.
And it's without a shadow of a doubt going to be the largest IPO in history.
I think everyone's going to be talking about it.
There is a new vehicle in which some people are investing.
We're actually going to have the founder on the show soon.
So keep an eye out for that one.
And yeah, the SpaceX news is very exciting.
Now, in the world of AI hardware, many people think that Nvidia has run away with the win.
And you could argue that: with a $4.3 trillion market cap, not many people are competing, except that there is one company: Google.
Now, you might be thinking, Google just does my search engine and stuff.
Well, Google is the only vertically integrated Mag 7 company that is involved or has a frontier capability at every single layer of the AI stack.
Now, right at the bottom are these things called Google TPUs, tensor processing units,
and they're their version of the GPU.
In fact, fun fact, Google's Gemini models have never been trained on an Nvidia GPU.
It's all been their own internal warehouse infrastructure,
and they've been working on this thing for 10 years.
Now, just today, or rather, this week, they released their latest generation of TPUs,
the TPU8T and the TPU8I.
Now, in the TPU8T, the T stands for training, or pre-training.
It is highly optimized for the pre-training part of an AI model.
So this is like the bulk, arguably the more expensive part of training a model.
It's like teaching it like, hey, these are words, these are the general fundamental set of facts that you need to know before we can kind of like put you out into the world and present you to our users.
TPU8I is specialized or hyper-specialized in inference specifically.
Now, the important part about inference is it's being used for so many different things.
Number one, it's to answer all your different prompts.
Whenever you write a prompt and you submit it to an AI model, that is known as inference.
It's doing inference.
It needs to query the model and make sure it does the right types of thinking and gives
you the right answer.
But the other part of inference is post-training, where a lot of people train the model
and then do more training after the fact, by using it to help the model reason
and think through alternatives before it presents you the actual answer.
And that's what that second TPU is for.
Now, Google's TPUs have been used extensively.
In fact, their largest customer is a little-known AI lab known as Anthropic,
which currently runs 1.5 million TPUs,
so the argument can be made that TPUs are largely responsible for Claude's and Opus's success.
So very impressive all around, but there's some other facts about this, right?
Yeah, well, I love the dual architecture training setup that they have here,
being hyper-specific.
I mean, the 8T chip in particular is built to reduce frontier model development cycles,
they said, from months to weeks.
And then we have the 8I, which is the reasoning engine,
which is specifically designed for agentic use
to deliver tokens as fast as possible.
And as we know, Anthropic is working closely with them.
And also, I mean, Google is making these for themselves.
So I think whoever is working with Google,
whoever's kind of focused on these accelerators,
is probably in for a nice little windfall,
as it relates to increased velocity of training
and also increased ability to distribute these models,
as we know Anthropic is having a very difficult time with this.
Now, Nvidia and Jensen are probably feeling a little shook.
They've got to be feeling a little bit of pressure here.
And it seems as if that's why they're pushing to be open source,
because if you are in a closed-source world
where everyone is making closed-source models on their own architecture,
then the Nvidia edge very quickly disappears.
And, I mean, I'm looking at these chips in hand.
They look beautiful.
They're taped out, ready to be manufactured.
And I think you could start getting kind of excited
about this new world of accelerated hardware.
And we're seeing this happen again and again because Amazon just made another big investment in who else other than Anthropic.
And the deal, I think, is like, this has to be close to a record deal.
They're owning a tremendous amount of this company now.
Yep.
So the news here is Amazon announced they're investing $5 billion into Anthropic.
They've just raised $5 billion.
Congrats.
And so the reason why this is important is, well, there's a few reasons.
Number one, Anthropic knows that they don't have enough compute.
The argument could be made.
That's why Claude Mythos hasn't been rolled out.
Well, hey presto,
now you have $5 billion more worth of compute.
Now, for those of you who didn't know,
Amazon is a primary investor already in Anthropic.
Before this announcement, they owned around 17% of Anthropic.
After this announcement, it's closer to 20%.
So we're talking about one company that's publicly tradable right now
that owns a fifth.
Is my math right?
Yeah, a fifth of the world's leading AI lab,
which is pretty crazy.
Now, if we look into the stats of this,
this is a 5-gigawatt deal,
which is more than any single data center
that's currently live.
It's actually a multiple of five.
I think SpaceX AI's Colossus 2 is the largest right now
with their 1 million GPUs.
So it's going to be 5x larger than the largest data center
that we're seeing right now for AI specifically,
and they're aiming to get one gigawatt online by the end of the year.
Now, the reason why this is so good for both teams is
Anthropic already has a close relationship.
with AWS and Amazon's cloud computing department.
So spinning up more compute clusters is going to be so easy for them.
They have a working relationship.
They're used to training code models on this.
So it shouldn't be too hard to ramp this up.
If you're Amazon, hey, welcome back.
That $5 billion is going to come right back to you.
So I don't know what kind of circular economy this is, but it's back, and it's very
impressive for them.
Is it ironic that today Amazon hit an all-time high?
Oh, gosh.
Maybe not.
I'm holding stock.
I got the stock.
Clearly, clearly, they're doing something right.
Amazon is a phenomenal company, and they're the largest shareholder in Anthropic. It's hard not to be bullish on them.
It's hard not to be bullish on the accelerated computing stack. And I think that's probably
what Jensen is getting nervous about. That's why Nvidia is pushing open source. And the good news
is he has some help. He has some assistance from the folks overseas in China, who have been pumping
out unbelievable models all week long, as it relates to Kimi and Qwen, our Chinese favorites.
We have Kimi K2.6 and Qwen 3.6. There's a lot of digits and numbers. All you need to know is
that the best open source models in the world didn't exist last week, they now exist this week,
and they are better at pretty much everything, but exceptional at coding. In fact, word on the
street is that some of these models are as good as GPT 5.4 was, and only a few points off
of Claude. I mean, these are pretty amazing open source models that, again, are free to run
locally on your machine if you have a machine capable of doing so. That's a
big game changer. Okay, so typically the story we tell with these open source models is,
Wow, aren't they so amazing?
Yeah, they're the good younger brother.
They're not as good as the frontier AI labs.
That completely changed this week.
So Kimi K2.6 is the latest model from a Chinese lab
called Moonshot AI.
And they released their model, which ends up being as good at coding
as Opus 4.7, and it's 100% open source, like you mentioned, Josh,
which means that maybe you could run this on a local device.
Now, the answer that you would typically get back from this is,
hey, like, listen, it's too large to run on my laptop.
And that is true.
But with the latest Qwen model, version 3.6,
you can run an 18-gigabyte, slightly quantized build
on your laptop today.
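For a rough sense of where a figure like 18 GB could come from, here's our own back-of-envelope sketch, not numbers from any real model card: a quantized model's footprint is approximately parameter count times bits per weight divided by eight. The 36B-parameter / 4-bit inputs below are illustrative assumptions.

```python
# Back-of-envelope size of a quantized model: params * bits_per_weight / 8.
# The 36B / 4-bit numbers below are illustrative assumptions, not specs
# from any real model card.

def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate model size in decimal gigabytes, ignoring overhead
    such as layers kept at higher precision or file-format metadata."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A ~36B-parameter model at 4 bits per weight lands right around 18 GB,
# which is roughly the size of a "slightly quantized" laptop build:
print(round(quantized_size_gb(36, 4), 1))  # -> 18.0
```

The same arithmetic shows why full precision doesn't fit: the same hypothetical 36B model at 16 bits per weight would need about 72 GB before any runtime overhead.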
So the point that I want to make about these models
isn't exactly the specifics,
but across all benchmarks,
they're not as good as the frontier AI labs' models,
but they're only a few points off.
That difference and gap has closed massively
over the last couple of months,
which tells me two things.
Number one, China has figured out some kind of groundbreaking way to train their models that they haven't told the West about,
and they're going to keep it closely guarded and eventually close-source their model releases going forward.
And number two, they've figured out a new way to use inference to their benefit.
Like, one thing I'm going to highlight here is this new Kimi K2.6 model can code continuously for 12 hours straight using 300 agents.
So the unlock here isn't one model by itself.
It's spinning up 300 versions of itself and getting them to attack the problem.
That's something Sam realized and is implementing in 5.5.
That's something Anthropic realized and is probably doing similarly with Mythos.
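The "spin up N copies and let them attack the problem" pattern we're describing can be sketched in a few lines. This is our own illustrative stub, not Kimi's or OpenAI's actual implementation: `agent_attempt` is a hypothetical stand-in for a real model call, and the score stands in for something like a test pass rate.

```python
# Sketch of best-of-N agent fan-out: run many independent attempts in
# parallel and keep the highest-scoring candidate. The "agent" here is a
# stub; in practice each worker would call a model API and the score
# would come from a verifier or test suite.
import concurrent.futures
import random

def agent_attempt(problem: str, seed: int) -> tuple[float, str]:
    """One agent's attempt, returning (score, candidate_solution)."""
    rng = random.Random(seed)          # each agent explores differently
    score = rng.random()               # stand-in for a real evaluation
    return score, f"candidate-{seed} for {problem!r}"

def fan_out(problem: str, n_agents: int = 300) -> str:
    """Spin up n_agents attempts concurrently and return the best one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
        futures = [pool.submit(agent_attempt, problem, s) for s in range(n_agents)]
        results = [f.result() for f in futures]
    _best_score, best_solution = max(results)  # tuples sort by score first
    return best_solution

print(fan_out("fix the flaky test", n_agents=50))
```

The design point is that the attempts are independent, so wall-clock time barely grows with N; the cost you pay is inference compute, which is exactly the inference-heavy trade-off described above.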
So I have this question here, which is: how did they manage to do this?
Well, I think every three months that there's a new open model that gets released,
they're making these jumps because they're using these models to train themselves.
We proved that with Kimi K2.5.
There are too many two-point-whatevers.
And the same thing is happening with Qwen.
It's just all around pretty amazing stuff.
Yeah, China's crushing.
Okay.
So before we go, we have two quick things to hit.
The first being one that we missed
last week, which we need to touch on quickly. Anthropic has a design tool now. If you are a designer,
if you are interested in building webpages, videos, graphics, slideshows, pitch decks, any type of visual
asset, Claude now has an entire design suite built just for this purpose. It's called Claude
Design. It exists separately. You can access it through the desktop app or on your browser. And it
basically allows you to build visual assets in a way that you couldn't previously. Previously with
Claude, you had artifacts; with an artifact you could generate something dynamic, it could kind of
build a webpage. This takes it to a whole new level. You can generate wireframes if you want
to use fewer tokens, or you can fill them out and create proper prototypes that are
actually clickable. It's amazing. The video we're seeing on screen highlights a few of them. Unfortunately,
there was a big loser in this because this sounds like a lot of what that little design company
named Figma does. Yeah, the little company. The stock market did not love that announcement,
did it? Nope, nope. It is down almost 20% on the week. I actually tracked the stock price after
the announcement was made.
So it wasn't even readily available.
It was literally just the tweet.
20 minutes after it was tweeted,
the stock was down 6%.
So the point being,
whether this is market speculation or not,
like, listen,
Claude Design isn't as good as Figma.
They're working with a few of these different partners,
such as Canva.
But two weeks ago,
one of Anthropic's top former execs
left the board of Figma,
and the rumor was
that they were building a competitor.
So it's pretty clear,
Anthropic is going after
every single sector,
whether you're a designer, a software engineer, a mathematician, a research scientist, it doesn't matter.
They're going after everything because the model is applicable to everything.
And I don't know what this means for certain moats that companies like Figma hold,
but it's certainly going to affect the stock price.
Can you do me a favor and click the Max button real quick for me just to show the chart?
Oh, yeah.
Yeah, minus 86% since IPO for those who are not watching on screen.
It's been a pretty bad, rough run for Figma.
We have to start naming Anthropic the stock killer, Josh.
It's like every single tweet is tanking a stock.
No, it's tough.
It's brutal.
It's brutal.
We had one last thing that you wanted to mention.
I know.
We got to end on this strong.
What do we have?
How good is your accent or impersonation of your president, of our president, Josh?
Pretty horrible.
Not good.
Okay.
Well, then we're not going to attempt it.
I'd love to hear your British take on it.
Oh, hell no.
If you're feeling ambitious.
Okay.
So my British take on this is: it is, albeit hilarious, also somewhat terrifying
that the president of the United States is saying this.
He commented, okay, on the government's relationship with Anthropic.
Now, if you're wondering why on Earth he's commenting on it, they're going to be releasing
this so-called Mythos model.
It might be a security risk.
It's probably good for the government to have access to this thing and prepare accordingly.
The government has been having very important conversations with bankers and governments
all around the world to just try and figure out, you know, how best to prepare for this.
And after having an in-depth discussion with Dario Amodei, which, by the way, he had blacklisted
that CEO and Anthropic entirely from government use, he's now
rekindling the relationship and saying maybe there's a deal on the line. He goes, and I quote, I'm not going to do the
accent. We'll get along with Anthropic just fine. Trump said on CNBC. We got to... let me try.
We'll get along with Anthropic just fine. I think they can be of great use to us.
They're high IQ people. Very good. Very good. They tend to be on the left, radical left,
but we get along with them. I don't know. That's all I got. But that is what he said.
Were you practicing that? That was actually pretty good. I was practicing in my head. I was
rehearsing. I closed my eyes while you were doing that, whilst I was laughing, and it
felt right. It sounded like him. Good. It channeled his spirit. It was there. It was a good effort.
But I believe that's it. That is the end of the roundup.
What a whirlwind, man. Josh and I are recording this, FYI, at 4 p.m. over here. Typically,
we're morning birds. We deliver this in the morning, but we waited for the announcement of GPT 5.5 just for you guys.
And we're going to be bringing you the cutting edge news every single week. As Josh mentioned,
we had three other amazing episodes that we filmed earlier this week. Definitely go check them out.
They're each about 20 minutes long. Perfect for
your commute to work, or your gym session if you're not that active. Definitely check them out
and let us know what you think. But yeah, Josh, any final thoughts? Call me crazy, but I like the
afternoon recordings. I got good energy. I'm like woken up. I'm 100% right now. I'm rocking and rolling.
I'm feeling good. So I don't know. Maybe we'll have to lean into this a little bit more,
but that's everything. If you've made it this far, if you're still listening to this and you've
heard our other episodes, you're caught up. You're done for the week. You can go touch grass.
Enjoy your weekend. There will be a lot more to talk about next weekend. But for now,
you have fully synchronized with all of the chaos happening on the frontier of AI and technology.
Thank you so much for watching, as always. We very much appreciate it. If you enjoyed this episode or
any of our previous episodes from this week, don't forget to share them with a friend who
might also enjoy them. We have a newsletter on Substack that goes live twice a week.
Just went live yesterday, going live again tomorrow. The Friday issue is a recap of everything
that happened this week, which is always fun and exciting. In fact, I'm going to go write that
as soon as we finish this episode. So thank you all for watching. As always, don't forget to
subscribe, like, comment, all the good things, and we will see you guys next week.
