Limitless Podcast - "Kimi K2 Thinking" is China's Plan To End American AI Dominance

Episode Date: November 11, 2025

In this episode, we discuss the launch of Kimi K2 Thinking from Moonshot AI Labs, an open-source AI model targeting GPT-5 and Gemini for just $4.6 million. With its impressive benchmarks, there are major implications for the American AI industry amidst rising competition. Tune in for insights on Kimi K2's innovative architecture and its potential to reshape the future of AI and its economy!

🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT
Substack: https://limitlessft.substack.com/

TIMESTAMPS
0:00 Kimi K2: The New Frontier AI
1:29 Impressive Specs and Performance
4:54 Cost Comparison with GPT-5
6:59 Mixture of Experts Architecture
8:47 U.S. vs. Chinese AI Models
11:21 Open Source Advantages
13:02 Licensing and Commercial Use
18:37 User Experience and Ecosystems
21:05 Efficiency vs. Precision
22:53 The Consumer Advantage
24:03 Future of Open vs. Closed Source
25:32 Closing Thoughts and Call to Action

RESOURCES
Josh: https://x.com/joshjkale
Ejaaz: https://x.com/cryptopunk7213

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:00 The world's latest and greatest AI model is 100% free for you to download and run at home right now. Kimi K2 Thinking is the latest reasoning model from Moonshot AI Labs, which is a Chinese frontier AI lab, and it beats OpenAI's GPT-5, Anthropic's Claude, and Google's Gemini across pretty much all benchmarks. But that's not even the most shocking part. The most shocking part is that it only cost $4.6 million to train and build, which is only a fraction of the billions of dollars spent by OpenAI to train GPT in the first place.
Starting point is 00:00:33 It's also 100% open source, which means that you can download and run Frontier AI right at home where you're sitting right now. But of course, it begs two very important questions. Number one, is open source AI the winning strategy? We've been led to believe that closed source is typically the better strategy when you run a business, but China and their AI models are proving us wrong here.
Starting point is 00:00:54 And the second question, the more ominous question, is will the U.S. stock market bubble finally pop? Josh, what have we got here? What is this new model and why is it taking over social media everywhere? They did it again. The Chinese did it again. They knocked it out of the park. Grand Slam, home run.
Starting point is 00:01:12 It's an unbelievably impressive model. And this happens every time. We get this amazing flagship model out of the U.S. A couple months later, we get the same thing, marginally better, at one-tenth of the cost, like full orders of magnitude less than what it costs for the leading AI labs in the US today. The specs are really impressive. We're going to get into everything.
Starting point is 00:01:32 We'll start with, I guess, just like the high-level spec sheet. State of the art on Humanity's Last Exam, which is the reference point that we kind of use in terms of benchmarks. It scored the highest anyone's ever scored, 44.9%. It has a bunch of these really cool breakthroughs. But the big things that it excels at, like it says in the post here, are reasoning, agentic search, and coding. Now, there's a few cool things that we could talk about here.
Starting point is 00:01:56 EJAS, maybe we'll just get into the charts because I feel like that's an easy way to visualize how much better this model really is than all of the others. And what we're seeing on this chart is that, well, GPD5 was the best. Kimmy K-2 is now the new best. And this is as it relates to thinking and reasoning. And again, this is so impressive because one, this model is fully open source. You can go download the model and run it yourself locally for free. What were your first thoughts when you saw this? Because to me, I was like, oh my God. Why would I use anything else? My first thought, if I'm being honest, Josh, was like to look at the stock market. I was like, is this going to crash the entire US stock market?
Starting point is 00:02:31 Like when DeepSeek initially released the R1 thinking model, do you remember? It was at the end of last year. People's kind of entire bubble and vision of how AI models were trained was completely burst. And since then, China has repeatedly delivered cutting-edge models, one of which comes from the Moonshot AI Lab team, which built Kimi K2. It's such an impressive model for a few different reasons for me. Number one, it can now compete with all the best. And personally, GPT-5 is something that I use pretty much every day,
Starting point is 00:03:03 whether it's for like kind of casual prompts and requests or whether it's kind of like the deeper thinking and research and some of the lines of work that I do. So it's become kind of like quintessential for me. Now, to have a separate model that I can download and run privately on my own computer at home, that I'm showing on this tweet here, that costs 60 cents per million token input and $2.5 output is just an insane cost-cutting average way. If I was running a business using an AI model,
Starting point is 00:03:32 there's like very little reason for me not to switch over to something like this, aside from maybe like maintenance and setup and stuff like that. The other really impressive thing for me, Josh, was the team itself. Like, this is only a two-year-old startup, which reminds me of another two-year-old startup, which is Elon Musk's X-A-I, right? And there's a funny link between these two models, Josh, which is Kimi K-2's reasoning, this thinking model, can do so because it does this like really neat little chain of thought experiment where it takes many steps to kind of think to a logical answer versus just kind of like splurging an answer for you. That's something that Grok Heavy Four did that they pioneered when they launched their new product.
Starting point is 00:04:18 So Kimi K2 is kind of like drawn on some of these learnings from XAI to produce a similar model. The other really cool thing is it does this thing call tool use or tool calling whilst it's thinking. So if you imagine as I'm kind of like trying to think through a complex problem, I will leverage different tools to be able to help me get to the answer. So if I'm doing a maths exam, I can use a calculator or if I'm doing a deep research question, I might use Google. this AI model naturally does that and has access to over two to 300 different tool calls and tool uses whilst it does its thinking. So just overall a very impressively new-looking AI model. Yeah, I just mentioned the cost being 60 cents per million tokens. And I just want to add a little bit of context as to how low that actually is.
Starting point is 00:05:07 I was looking at the GPT5 Pro cost per inputs. And it is $15 per million dollars per million tokens. 15 for the GPT5 Pro cost currently. the output is $120 per million tokens. Granted, this is the top of the top. If you're using GBT5 standard, input is $1.25 per million tokens. Output is $10. So any way you scrape it, it's at least a 2x multi-cost reduction up to like 100x on the highest end,
Starting point is 00:05:38 assuming it can compete with GPT5 Pro, which all those benchmarks suggest it very well can. So the cost is really like it's a big deal. To get, kind of dig more into the point that you were making EJOS and how it actually works, well, we get to this. Sorry, sorry. No, no, no, no, sorry. We'll get there. Save the memes.
Starting point is 00:05:55 Don't spoil the memes yet. We got to get to the funny jokes next. But basically, the way this works is, like, there's this very complicated diagram on the screen. I'm not going to try to even explain what that is. But there's this fun way that I like to describe it when I was describing it to my friend earlier this morning, which is that, like, Kimi K2, it's like this giant school, and it has these things called specialist. And in fact, Kimi K2 has 384. specialists. You could think of these specialists as like a math club or a history club, coding club,
Starting point is 00:06:21 debate, whatever it is. And when you ask it a question, it doesn't invite the whole school. It doesn't invite all the clubs. It's just, EJAS, if you ask for a math question, it will query the math club. And it chooses eight out of those 384 clubs to help combine their answers, pick the experts, and decide how it's going to solve this problem. So it has a trillion parameters, but it only uses 32 billion of them at once. And that's how we're able to get the huge cost reduction, because it uses this thing called mixture of experts. A lot of people describe it as M-O-E. But basically what it is, instead of using the entire model's intelligence to answer, what should I have for breakfast this morning, it will take the chef club, it will take the health club, it will combine those together and it will form
Starting point is 00:07:02 an answer that should hopefully give you just as good as a result if you took the entire model, but it's much more efficient in terms of cost, in terms of energy, and in terms of the amount of tokens that could generate, because it's so much cheaper across the board. And I think that's one of the big, really exciting things that has been cool to see coming out of China. We saw it with Deep Seek, we see it with Kimi, and it's this mixture of agents architecture where they're really kind of modularizing the entire model and only using the stuff that's important for the specific query.
Starting point is 00:07:29 They will put in a very constrained position, which is they didn't have access to the latest GPUs or NVIDA GPUs. There's been a bunch of U.S. tariff restrictions on Chinese labs getting access to these kinds of things. So they've really needed to kind of work within their bounds and means. And so coming up with an architecture like mixture of experts or the one that they did is super important. And it brings me to this meme, Josh, which is, what are we doing here? There is an obvious mismatch between American-made AI models and the Chinese ones.
Starting point is 00:08:01 You've got Open AI, which is now projected to spend $1.4 trillion over the next five years. That's trillion with a T versus Kimi training for $4.6 million. Now, I know there's a bit of like click baitiness here. That $4.6 million was relative to one training run and usually takes a few training runs. But let's say it took like 20 training runs, right? At $4.6 million, that's still only like a like 100 mel, right, or less than that. So it doesn't really matter when you put it into the context that GPT5 is rumored to have cost $1.7 to $2.4 billion for Open AI to train. So there's a mismatch that I don't quite understand, Josh. And that's, what makes me the most nervous when it comes to what American-made companies on Frontier Labs are doing. I feel like they're missing the mark. I don't quite know what it is, whether it's this mixture of experts thing, but there's someone's being sold a lie, and I don't know whether it's me or whether it's me like looking at this Kimi K2 model and being like, wow, it's so amazing. Yeah, when I think about the role that China plays versus the United States in terms of like
Starting point is 00:09:05 open source companies or close source companies here in the U.S., the thing that is reassuring to me, at least, is a lot of these innovative breakthroughs that happen on the software level actually do happen in these private AI labs. We do get like chain of thought and reasoning and there's like this whole slew of new innovation that becomes standard very quickly that all happens in the United States AI labs. And as far as we're concerned, the AI labs in the U.S. still have, they're making the most progress the fastest. They are creating the most innovation. And then what you kind of see like we described earlier in the episode is that innovation starts to trickle down, whether it's voluntary or whether it's stolen, and it gets implemented into these new models,
Starting point is 00:09:44 and they just completely cut out the bottom in terms of cost and efficiency, because that's kind of all they're able to do. They don't have access to the resources of millions of GPUs from Jensen Huang. And in Vibya, they don't have the access to $50 billion of CAPEX just to spend on employees, just to spend on salaries and compensation. So it seems to me like, I mean, we're still doing very well. It's just China is very good at implementing the technology and applying it at scale in a way that's open sourced. And the open source thing, there's a lot to say for that because it's very impressive.
Starting point is 00:10:17 And it's kind of this community effort that we saw early days with the United States. But once they became better, they closed it off. So what happens is you get innovation in one company like Kimmy and then you see it implemented in Deepseek. And then you see it implemented in Quinn. And then suddenly this technology is kind of synchronously growing between the three because it's all open source. they're publishing all the code, all the open weights, and it's much more easier for them to thrive, whereas innovation in the United States very much happens behind a closed wall, and it's only leaked out at the advent of a new model when they release it to the world, and people kind of reverse engineer
Starting point is 00:10:49 how it works. I was reading an article in the Financial Times where they interviewed Jensen Huang, and he said verbatim that China will win the AI race if they continue down the pot that they're currently on, and if the U.S. doesn't kind of ramp up their energy production. He was making a wider point that their open source strategy is pretty effective in the way that they're building these new AI models with the constraints that you just mentioned. Kind of speaking more about the open sourceness and the benefits of this, I've got a tweet up here which shows that Kimmy K2 thinking this new model can basically run on two MacBook M3 Ultras, which is like a couple of thousand dollars worth of cost, which is an insane thing to do to run Frontier AI models. at home privately in your house, trained and fine-tuned on any of your own private data, so you don't need to kind of like sell that data to Sam or Moena or whoever. Just super cool and super cheap, right?
Starting point is 00:11:48 Because you're running local inference at home. So you don't have to worry about anyone kind of like spying on any of your queries or your prompts or your research. It's just all at home, which I thought was super cool. The other part of the open sourcedness, which I found interesting, Josh, was the fact that they had an MIT license with this new release or an adjoining. adjusted MIT license. And we'll dig into that in a second. But the point being, when DeepSeek released their first major open source model and took the world by storm, there wasn't any major
Starting point is 00:12:18 licenses around that. So you could pretty much download and do whatever the hell you wanted to it for it. You could implement it into your own product, whether you were an American founder. And if let's say you scaled that up to a million users that used a feature that was leveraging that deep seek model, you wouldn't have to credit that team at all. Kimmy K2 kind of like takes a step in a different direction here where they're released an MIT license where I think if you hit, I think it's either 10 million or 20 million users for your product. You need to show the Kimi K2 label and say, listen, I'm using this model under the hood. But there's some differences with this license, right, Josh? Can we dig into that? I believe it's modified. I don't know to the extent that it is
Starting point is 00:12:57 modified, but I know that there is something different going on here. What is to say? our only modification part is that if the software or any derivative works thereof is used for any of your commercial products or services that have more than 100 million monthly active users or more than 20 million US dollars or equivalent of other currencies in monthly revenue you shall prominently display Kimmy K2 on the user interface of such product or service that's a fun little marketing ploy fair enough fair enough you know what it reminds me of Josh What's that? It's what Meta tried to do
Starting point is 00:13:32 with their Lama models, right? So Meta is the only other major American company that I can think of that went down this open source AI route and the goal or the intended goal at the time was to basically level the playing field between Meta and OpenAI and other frontier model AI labs
Starting point is 00:13:50 which had raced so far ahead. So if you released all this cutting-edge AI tech for free and accessible to anyone, then it kind of drives down the cost premium that Open AI and all these other frontier AI labs can charge you to access this thing. China is doing that as a vast hole on the American AI stock market, right? So that's why we saw like Nvidia crash, I think, 4.2% on the news getting released and such. I'm curious whether this kind of pops the bubble and the Cappex bubble in America, Josh. Is that a crazy thing to say?
Starting point is 00:14:22 I mean, the markets reacted pretty viscerally to this news. I don't think I have a problem with this. I don't think it's popping a bubble. I don't think we're in trouble. I think this is just totally fine so long as we continue to stay slightly ahead or at least at par. I think we're really excellent at making software, distributing software, creating products. I think China's really good at shamelessly innovating and deploying without needing to go through all of the hoops and intellectual problems that the United States mostly has. So I don't think this will lead to any sort of bubble popping. I think a lot of the frontier innovative stuff still happens in the U.S.
Starting point is 00:15:05 The place where I will begin to start to get a little worried is when this switches to embodied AI. Once we start moving from large language models to implementing these into robots or implementing these into physical hardware, that's where I think we have problems. On the software front, we're good. We're crushing it. Everyone's spending tons of money. On the hardware front, we don't have the same lead.
Starting point is 00:15:29 And over the last, what, 30 to 50 years, we've kind of outsource some manufacturing capabilities to other places. And therefore are just kind of, I mean, everyone knows. We just can't really make things cost effectively here in the United States. If we are at a foot race with China when it comes to making embodied AI, like humanoid robots, specialized robots, whatever it may be, that's where things start to get a little bit scary because that's where there is a significant lead. And that lead is comes in the form of atoms, which are much more difficult to move than bits,
Starting point is 00:15:54 because you can steal some open source code, create this slight innovation on top, roll it out to a billion users overnight, and that's innovation. That does not happen between version two and version three of your humanoid robot. You actually have to build it with a factory, with real materials and people and places,
Starting point is 00:16:08 and it's very difficult and challenging to do, and China very much stands to be the largest winner in that. So I think on the software front, I feel really confident, and as of now, that's all that we're battling on. But in this near future, where things start to become embodied, where AI Bcard becomes physically manifested in the world around us, that seems like a place where I would start looking at Chinese investments a little bit more than the American ones. Okay, I think I might push back a little bit and say that there is reasonable evidence to be bearish on the software side before it gets embodied AI.
Starting point is 00:16:40 I mean, so a few ways to think about it. There is such a gross discrepancy when it comes to capital expenditure for these things. on one side you've got the US spending trillions of dollars literally to train AGI or the best AI models. And on this side, you're in like the hundreds of millions of dollars, which is like an order of magnitude less, right? So there's an obvious mismatch here that we aren't seeing. Whether it comes down to training architecture, training design, or just kind of like hardware manufacturing, I don't know where that kind of advantage is being played, but the Chinese have found it and they're able to kind of really push down on that lever to get ahead or on par with the US.
Starting point is 00:17:22 And they've been able to successfully do this for years now at this point. DeepSeek was kind of like test case one. Now I've seen like, you know, at least 50 open source models come out of Chinese frontier AI labs since then. Number two, it's not like the US government has kind of like not tried to constrain them. We've imposed a number of different sanctions, which include, you know, constraining which GPUs, and Nvidia and other manufacturers within the US can sell to China, but that still hasn't stopped them. They've been able to maintain and train these frontier AI intelligences despite all of these different things. So I think if I would have to look on the other side of this, it would be
Starting point is 00:18:03 so what if you have an open source model that is super cool? Why aren't you using it right now? Like I'm not using Kimi K2 regularly, even though I use GPT5 and it might be better than GPT5. And the answer for me is pretty simple. I'm locked into an eco-source. ecosystem in Open AI that I'm pretty happy with, which is it has memory on me. It understands who I am. It has a context of all the previous chats that I have with it. But also most importantly, Josh, if there's an issue with something on my account or something that I'm trying to use, there's a community that I can access. There's a support team that I can speak to. There's a software ecosystem that supports me, right? Versus me jumping ship to kind of Kimmy K2, setting it up
Starting point is 00:18:40 on my own and then having to like troubleshoot it myself, I think a lot of people will be disincentivized to do that. It is difficult, but I mean, we're seeing market forces from both sides, right? Like I saw you included a link here somewhere where Curse and WinSurf's new AI models. They were using some sort of Chinese models. In fact, they were thinking in Chinese. And I found this really fascinating that, like, American-made products are now thinking in the Chinese language.
Starting point is 00:19:05 So that's certainly a concern in terms of the commercial side, where those API costs really matter, where if you can get a million tokens for 60 cents versus $10, that's, that really affects the margins of your business. For consumers like us, there's no real interest to use Kimi K2. And the phenomenon you spoke about earlier where you can actually run a quantized version of Kimi K2 on to Mac studios running the M3 Ultra chips, it generates tokens at like 13 to 15 tokens per second. So it's very slow.
Starting point is 00:19:36 You're getting like a second sentence or two every second, which it's much slower. It's going to feel groggy. it's not going to feel well. There's a case to be made that that changes because this year, and it's funny that Apple's really the only computer that supports this now. They're releasing the M5 Ultra, which will be the new version. And it's going to be interesting to see how it plays out. What I found interesting, this one side note, actually, that I wanted to share with you,
Starting point is 00:20:00 because you might find it cool too, is the version that runs on these Apple computers, the Apple studios. It's a slightly quantized version. And I heard about this, and I learned about this recently in the Tesla earnings call that they had the shareholder meeting recently. And we're going to have an episode on this later this week. But there's this interesting thing that Elon mentioned during the episode where he was talking about quantized versus floating point AI.
Starting point is 00:20:24 And I was like, what the hell is that? Like, why are you spending so much time talking about this? It doesn't make sense. And what I realized is a lot of AI models, they use like many, many points after the decimal in terms of data to get more precise results. And that is floating point. When you quantize a model, you remove all of the data to the right of the model
Starting point is 00:20:41 and you just go to single integers. So you lose the variance of maybe up to like 60%, but you gain so much faster efficiency, so much better speed improvements, cost improvements, and you can actually run it locally on these things. So I think it's interesting to see the different decisions that people are making in terms of, well, how precise does the model have to be versus how cost effective
Starting point is 00:21:03 and how efficient does it need to be. And what we're seeing with Kimmy Gay-2 is it's very easy to over-index on the efficiency, but maybe that's not the stated goal of open AI, where if they really wanted to, they could sign to quantize these models. They could go more to integer type compute. And it's just something I was thinking about is how they approach them,
Starting point is 00:21:21 because it could just be, well, Kimmy's just kind of optimizing for speed and efficiency and the downstream effect is it's also really fast, whereas Open AI kind of hasn't really optimized for that specifically yet. Right. And the counter argument to that point would be, well, Josh, it's crushing all the benchmarks
Starting point is 00:21:37 that we've evaluated all the other American model on, right? So surely it's much better. And my pushback on that would be like, well, benchmarks don't really materialize in real life use. So what if it crushes 50% on humanity's last exam? Is it useful for me to use? Does it understand what I'm trying to say? Does it understand the context of the prompts that I'm putting into it? The other side of this, you know, on the point of quantization, Josh, is I think that a lot of frontier American AI labs, like Open AI, Google, etc, actually have enough compute to give you the best experience, the highest floating point experience to put it to put into that context.
Starting point is 00:22:20 But they're using the majority of that compute to train the next big model that we haven't even seen yet, right? There was news that broke last week that Open AI is doing this, right? So technically they have enough compute to give you like amazing service all year round, but they're using 70% of that compute to train GBT6. I think it's just a matter of prioritization right now until we reach some kind of parity that these AI models are good enough. But I will say from all of the things that we've discussed on this episode so far, there is one clear winner. And that is the consumer. It's you, I,
Starting point is 00:22:54 and everyone listening to the show, which basically gets access to frontier level intelligence for the cost of next to nothing, download it completely free and run it privately at home. On this tweet that I have pulled up here, it basically says for every closed model, there's an open source alternative in it, and it goes through a list like Sonnet 4.5, you've got GLM 4.6, GROC code fast, you've got GPDOSS, GPT5, you got Kimi K2 thinking, and it just goes on and on and on and if we look at this kind of like a year and a half ago, maybe even two years ago, this list would be non-existent. It would just be Frontier AI Labs on the closed source side and zero open source side. So to see this kind of progress is really, really encouraging.
Starting point is 00:23:34 Yeah, it's going to be a race. It's going to be a battle between opening closed source. And perhaps that's not even the battle. Perhaps it's open source until they catch up to closed source. And then it's closed source across the board. So it's going to be interesting to see the developments. We have a new batch of models that are coming. We're kind of in this weird limbo where Gemini 3 is hopefully coming soon. We'll have some new benchmarks. And one of the things that that was this harsh truth to kind of wrap my head around, which is what you just mentioned, EJazz. And the fact that everyone should just compute constraint. Like opening I could have made GPT-5, probably twice as impressive if they really wanted to, they just have no compute to serve that. And it would have been way too expensive and way too slow. So it's not that it's, it can't be done. It's just that people don't have the resources to do it. So it's this constant balancing act. And it's going to be fun to see how companies kind of slot themselves into that curve of like how much they want to spend on compute versus cost versus just what they have available to actually use to train these models and deploy them at scale to users. And that's it for today, folks. Super fun episode. It is always surprising to me how quickly open source catches up with
Starting point is 00:24:41 close source centralized AI. I always think kind of like it's going to lag a few years and now it's come down to the fact that it's lagging a few weeks. We have a jam-packed week. We have potentially a new nano-bonana model being released by Google tomorrow. Fingers crossed. I'm praying for that. Fingers crossed. I'm also praying for that as well. And we have a second episode based on Tesla's Invest today, which had some really jam-packed, exciting news. Now listen, if you want the US to win this AI race and make no mistake, it is a race, you need to subscribe to American AI YouTube channels, one of which is us. Please subscribe, hit the notification button, wherever you're listening to, give us a rating.
Starting point is 00:25:20 We are helped by these so much. It is bringing up so much awareness. The algorithm is favoring us. We're getting all these wonderful views and new incomers. We've got a thousand of you from last week, which is just insane. Hello, welcome to the channel. We hope you enjoy the content. and we will see you on the next one.
Starting point is 00:25:37 Yeah, before I let them off the hook, I'm checking, I'm doing the stat update. 83% the people that watched last week were not subscribed. If you're watching this on YouTube, don't get subscribed. Or go on Spotify, my preferred place of finding this podcast. It's the best. I'm telling you, I don't know how to describe this to people any better. Spotify is so good. You have the video, you have the audio.
Starting point is 00:25:55 You could turn it off and lock your phone without needing a premium membership. Please go over there. Go leave a comment over there because also the comment section is kind of popping too. So, yeah, anyway. Thank you for all this work. not pick and choose wherever you listen go for it there you go all right we will see you guys in the next one thank you for watching as always much appreciated peace
