Limitless Podcast - "Kimi K2 Thinking" is China's Plan To End American AI Dominance
Episode Date: November 11, 2025

In this episode, we discuss the launch of Kimi K2 Thinking from Moonshot AI Labs, an open-source AI model targeting GPT-5 and Gemini for just $4.6 million. With its impressive benchmarks, there are major implications for the American AI industry amid rising competition. Tune in for insights on Kimi K2's innovative architecture and its potential to reshape the future of AI and its economy!

------
🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT
Substack: https://limitlessft.substack.com/

------
TIMESTAMPS
0:00 Kimi K2: The New Frontier AI
1:29 Impressive Specs and Performance
4:54 Cost Comparison with GPT-5
6:59 Mixture of Experts Architecture
8:47 U.S. vs. Chinese AI Models
11:21 Open Source Advantages
13:02 Licensing and Commercial Use
18:37 User Experience and Ecosystems
21:05 Efficiency vs. Precision
22:53 The Consumer Advantage
24:03 Future of Open vs. Closed Source
25:32 Closing Thoughts and Call to Action

------
RESOURCES
Josh: https://x.com/joshjkale
Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
The world's latest and greatest AI model is 100% free for you to download and run at home right now.
Kimi K2 Thinking is the latest reasoning model from Moonshot AI Labs, which is a Chinese frontier AI lab,
and it beats OpenAI's GPT-5, Anthropic's Claude, and Google's Gemini,
across pretty much all benchmarks.
But that's not even the most shocking part.
The most shocking part is that it only costs $4.6 million to train and build,
which is only a fraction of the billions of dollars spent by OpenAI
to train its GPT models in the first place.
It's also 100% open source,
which means that you can download and run Frontier AI right at home
where you're sitting right now.
But of course, it begs two very important questions.
Number one, is open source AI the winning strategy?
We've been led to believe that closed source is typically
the better strategy when you run a business,
but China and their AI models are proving us wrong here.
And the second question, the more ominous question,
is will the U.S. stock market bubble finally pop?
Josh, what have we got here?
What is this new model and why is it taking over social media everywhere?
They did it again.
The Chinese did it again.
They knocked it out of the park.
Grand Slam, home run.
It's an unbelievably impressive model.
And this happens every time.
We get this amazing flagship model out of the U.S.
A couple months later, we get the same thing, marginally better, at one-tenth of the cost.
Like, a full order of magnitude
less than what it costs for the leading AI labs in the US today.
The specs are really impressive.
We're going to get into everything.
We'll start with, I guess, just like the high-level spec sheet.
State of the art on Humanity's Last Exam, which is the reference point that we kind of use
in terms of benchmarks.
It scored the highest anyone's ever scored, 44.9%.
It has a bunch of these really cool breakthroughs.
But the big thing that it excels at, like it says in the post here, reasoning, agentic
search, and coding.
Now, there's a few cool things that we could talk about here.
Ejaaz, maybe we'll just get into the charts because I feel like that's an easy way to visualize
how much better this model really is than all of the others. And what we're seeing on this chart is
that, well, GPT-5 was the best. Kimi K2 is now the new best. And this is as it relates to thinking
and reasoning. And again, this is so impressive because one, this model is fully open source.
You can go download the model and run it yourself locally for free. What were your first thoughts
when you saw this? Because to me, I was like, oh my God. Why would I use anything else?
My first thought, if I'm being honest, Josh, was like to look at the stock market.
I was like, is this going to crash the entire US stock market?
Like when DeepSeek initially released the R1 thinking model, do you remember?
It was at the end of last year.
People's kind of entire bubble and vision of how AI models were trained was completely burst.
And since then, China has repeatedly delivered cutting-edge models, one of the latest coming from
the Moonshot AI Labs team, which built Kimi K2.
it's such an impressive model for a few different reasons for me.
Number one, it can now compete with all the best.
And personally, GPT-5 is something that I use pretty much every day,
whether it's for like kind of casual prompts and requests
or whether it's kind of like the deeper thinking and research
and some of the lines of work that I do.
So it's become kind of like quintessential for me.
Now, to have a separate model that I can download and run privately
on my own computer at home,
that I'm showing on this tweet here, that costs 60 cents per million input tokens and $2.50 per million
output tokens, is just insane cost-cutting. If I was running a business using an AI model,
there's like very little reason for me not to switch over to something like this,
aside from maybe like maintenance and setup and stuff like that. The other really impressive
thing for me, Josh, was the team itself. Like, this is only a two-year-old startup, which reminds me of
another two-year-old startup, which is Elon Musk's xAI, right? And there's a funny link between
these two models, Josh, which is Kimi K2's reasoning, this thinking model, can do so because
it does this really neat chain-of-thought process where it takes many steps to
think its way to a logical answer versus just blurting an answer out at you. That's
something that Grok 4 Heavy did, that they pioneered when they launched their new product.
So Kimi K2 has kind of drawn on some of these learnings from xAI to produce a similar model.
The other really cool thing is it does this thing called tool use, or tool calling, whilst it's thinking.
So if you imagine as I'm kind of like trying to think through a complex problem, I will leverage different tools to be able to help me get to the answer.
So if I'm doing a maths exam, I can use a calculator or if I'm doing a deep research question, I might use Google.
this AI model naturally does that and has access to 200 to 300 different tools it can call whilst it does its thinking.
So just overall a very impressive new AI model.
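To make that calculator-and-Google analogy concrete, here's a minimal sketch of what interleaved thinking and tool calling looks like in an agent loop. The tool names, the step list, and the toy router are hypothetical stand-ins for illustration, not Kimi K2's actual API:

```python
# Minimal sketch of interleaved "thinking" and tool calling.
# The tools and the scripted reasoning steps are made up for illustration.

def calculator(expression: str) -> str:
    """A simple arithmetic tool, like reaching for a calculator mid-thought."""
    return str(eval(expression, {"__builtins__": {}}))

def web_search(query: str) -> str:
    """Placeholder search tool; a real agent would hit a search API here."""
    return f"[top result for: {query}]"

TOOLS = {"calculator": calculator, "web_search": web_search}

def reasoning_loop(steps):
    """Each reasoning step may request a tool; the observation feeds the next step."""
    trace = []
    for thought, tool_name, tool_arg in steps:
        observation = TOOLS[tool_name](tool_arg) if tool_name else None
        trace.append((thought, observation))
    return trace

# The model "thinks", calls a tool, observes the result, and keeps thinking:
trace = reasoning_loop([
    ("I need the compound growth over 3 years", "calculator", "1.07**3"),
    ("Now check what the current rate actually is", "web_search", "US inflation rate"),
    ("Combine both observations into a final answer", None, None),
])
for thought, obs in trace:
    print(thought, "->", obs)
```

A real reasoning model does this routing itself, emitting tool calls mid-chain-of-thought rather than following a scripted list; the point here is just the shape of the loop.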
Yeah, you just mentioned the cost being 60 cents per million tokens.
And I just want to add a little bit of context as to how low that actually is.
I was looking at the GPT-5 Pro cost per input.
And it is $15 per million tokens.
$15 for the GPT-5 Pro cost currently.
the output is $120 per million tokens.
Granted, this is the top of the top.
If you're using GPT-5 standard, input is $1.25 per million tokens.
Output is $10.
So any way you slice it, it's at least a 2x cost reduction, up to like 100x on the highest end,
assuming it can compete with GPT-5 Pro, which all those benchmarks suggest it very well can.
So the cost really is a big deal.
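The per-token math here is easy to check. A quick sketch using the prices quoted in this episode (all figures are USD per million tokens as stated on air, not independently verified):

```python
# Sanity check on the API pricing gap discussed above, using the
# per-million-token prices quoted in the episode (USD).
prices = {
    "Kimi K2":   {"input": 0.60, "output": 2.50},
    "GPT-5":     {"input": 1.25, "output": 10.00},
    "GPT-5 Pro": {"input": 15.00, "output": 120.00},
}

kimi = prices["Kimi K2"]
for model in ("GPT-5", "GPT-5 Pro"):
    in_ratio = prices[model]["input"] / kimi["input"]
    out_ratio = prices[model]["output"] / kimi["output"]
    print(f"{model}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

Running this gives roughly a 2x gap versus standard GPT-5 and a 25x to 48x gap versus GPT-5 Pro, which is where the "at least 2x, up to a huge multiple" framing comes from.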
To kind of dig more into the point that you were making, Ejaaz, about how it actually works,
well, we get to this.
Sorry, sorry.
No, no, no, no, sorry.
We'll get there.
Save the memes.
Don't spoil the memes yet.
We got to get to the funny jokes next.
But basically, the way this works is, like, there's this very complicated diagram on the screen.
I'm not going to try to even explain what that is.
But there's this fun way that I like to describe it when I was describing it to my friend earlier this morning,
which is that, like, Kimi K2, it's like this giant school, and it has these things called specialists.
And in fact, Kimi K2 has 384
specialists. You could think of these specialists as like a math club or a history club, coding club,
debate, whatever it is. And when you ask it a question, it doesn't invite the whole school. It doesn't
invite all the clubs. It's just, Ejaaz, if you ask a math question, it will query the math club.
And it chooses eight out of those 384 clubs to help combine their answers, pick the experts,
and decide how it's going to solve this problem. So it has a trillion parameters, but it only uses
32 billion of them at once. And that's how we're able to get the huge cost reduction, because it
uses this thing called mixture of experts. A lot of people abbreviate it as MoE. But basically what it is,
instead of using the entire model's intelligence to answer, what should I have for breakfast this morning,
it will take the chef club, it will take the health club, it will combine those together and it will form
an answer that should hopefully give you just as good a result as if you took the entire model,
but it's much more efficient in terms of cost, in terms of energy, and in terms of the amount of tokens
it can generate, because it's so much cheaper across the board. And I think that's one of the big,
really exciting things that has been cool to see coming out of China.
We saw it with DeepSeek, we see it with Kimi,
and it's this mixture-of-experts architecture
where they're really kind of modularizing the entire model
and only using the stuff that's important for the specific query.
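The school-and-clubs analogy maps onto a top-k gating function: a router scores all 384 "clubs" for each token, and only the top 8 actually run. Here's a toy sketch of that routing step, using the expert counts mentioned above; the random scores stand in for a learned router layer, and none of this is Moonshot's actual code:

```python
# Toy sketch of mixture-of-experts routing: score all experts,
# keep only the top-k, renormalize their gate weights.
import math
import random

NUM_EXPERTS = 384   # the "clubs" in the school analogy
TOP_K = 8           # experts actually consulted per token

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores):
    """Pick the top-k experts by score and renormalize their gate weights."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:TOP_K]
    gates = softmax([router_scores[i] for i in top])
    return list(zip(top, gates))

# Fake router scores for one token; a real router is a learned linear layer.
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
selected = route(scores)

print("experts consulted:", [i for i, _ in selected])
print("share of experts active: %.1f%%" % (100 * TOP_K / NUM_EXPERTS))
```

With 8 of 384 experts active, only about 2% of experts (and, in Kimi K2's case, 32B of the 1T parameters) do work on any given token, which is where the efficiency comes from.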
They were put in a very constrained position,
which is they didn't have access to the latest GPUs, or Nvidia GPUs.
There have been a bunch of U.S. export restrictions
on Chinese labs getting access to these kinds of things.
So they've really needed to kind of work within their bounds and means.
And so coming up with an architecture like mixture of experts or the one that they did is super important.
And it brings me to this meme, Josh, which is, what are we doing here?
There is an obvious mismatch between American-made AI models and the Chinese ones.
You've got OpenAI, which is now projected to spend $1.4 trillion over the next five years.
That's trillion with a T versus Kimi training for $4.6 million.
Now, I know there's a bit of clickbaitiness here. That $4.6 million was relative to one training run, and it usually takes a few training runs. But let's say it took like 20 training runs, right? At $4.6 million each, that's still only like $100 million, right, or less than that. So it doesn't really matter when you put it into the context that GPT-5 is rumored to have cost $1.7 to $2.4 billion for OpenAI to train. So there's a mismatch that I don't quite understand, Josh. And that's,
what makes me the most nervous when it comes to what American-made companies on Frontier Labs are
doing. I feel like they're missing the mark. I don't quite know what it is, whether it's this mixture
of experts thing, but there's someone's being sold a lie, and I don't know whether it's me
or whether it's me like looking at this Kimi K2 model and being like, wow, it's so amazing.
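A quick back-of-envelope on the numbers just discussed, taking the episode's figures at face value (the $4.6M per run and the rumored GPT-5 range are both unaudited claims):

```python
# Back-of-envelope: even granting many training runs at the quoted
# $4.6M each, the total stays far below the rumored GPT-5 cost.
run_cost = 4.6e6
runs = 20                      # generous guess at failed/iterative runs
kimi_total = run_cost * runs   # about $92M

gpt5_low, gpt5_high = 1.7e9, 2.4e9  # rumored GPT-5 training cost range

print(f"Kimi K2 (20 runs): ${kimi_total / 1e6:.0f}M")
print(f"GPT-5 rumored: ${gpt5_low / 1e9:.1f}B to ${gpt5_high / 1e9:.1f}B")
print(f"ratio: {gpt5_low / kimi_total:.0f}x to {gpt5_high / kimi_total:.0f}x")
```

Even with the generous 20-run assumption, the rumored GPT-5 budget is still roughly 18x to 26x larger, which is the mismatch being described.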
Yeah, when I think about the role that China plays versus the United States in terms of like
open source companies or close source companies here in the U.S., the thing that is reassuring to me,
at least, is a lot of these innovative breakthroughs that happen on the software level actually do
happen in these private AI labs. We do get like chain of thought and reasoning and there's like
this whole slew of new innovation that becomes standard very quickly that all happens in the
United States AI labs. And as far as we're concerned, the AI labs in the U.S. still have,
they're making the most progress the fastest. They are creating the most innovation. And then what you
kind of see like we described earlier in the episode is that innovation starts to trickle down,
whether it's voluntary or whether it's stolen, and it gets implemented into these new models,
and they just completely cut out the bottom in terms of cost and efficiency,
because that's kind of all they're able to do.
They don't have access to the resources of millions of GPUs from Jensen Huang
and Nvidia. They don't have access to $50 billion of capex just to spend on employees,
just to spend on salaries and compensation.
So it seems to me like, I mean, we're still doing very well.
It's just China is very good at implementing the technology and applying it at scale in a way that's open sourced.
And the open source thing, there's a lot to say for that because it's very impressive.
And it's kind of this community effort that we saw early days with the United States.
But once they became better, they closed it off.
So what happens is you get innovation in one company like Kimi, and then you see it implemented in DeepSeek.
And then you see it implemented in Qwen.
And then suddenly this technology is kind of synchronously growing between the three because it's all open source.
they're publishing all the code, all the open weights, and it's much easier for them to thrive,
whereas innovation in the United States very much happens behind a closed wall, and it's only leaked out
at the advent of a new model when they release it to the world, and people kind of reverse engineer
how it works. I was reading an article in the Financial Times where they interviewed Jensen Huang,
and he said verbatim that China will win the AI race if they continue down the path that they're
currently on, and if the U.S. doesn't kind of ramp up its energy production.
He was making a wider point that their open source strategy is pretty effective in the way that they're building these new AI models with the constraints that you just mentioned.
Kind of speaking more about the open-sourceness and the benefits of this, I've got a tweet up here which shows that Kimi K2 Thinking, this new model, can basically run on two M3 Ultra Mac Studios, which is like a couple of thousand dollars' worth of hardware, which is an insane way to run frontier AI models
at home privately in your house, trained and fine-tuned on any of your own private data,
so you don't need to kind of like sell that data to Sam or Moena or whoever.
Just super cool and super cheap, right?
Because you're running local inference at home.
So you don't have to worry about anyone kind of like spying on any of your queries or your
prompts or your research.
It's just all at home, which I thought was super cool.
The other part of the open-sourcedness, which I found interesting, Josh, was the fact that
they had an MIT license with this new release, or an
adjusted MIT license. And we'll dig into that in a second. But the point being, when DeepSeek
released their first major open source model and took the world by storm, there wasn't any major
licenses around that. So you could pretty much download it and do whatever the hell you wanted
with it. You could implement it into your own product, even if you were an American founder.
And let's say you scaled that up to a million users that used a feature leveraging that
DeepSeek model; you wouldn't have to credit that team at all. Kimi K2 kind of takes a step in a
different direction here, where they've released a modified MIT license where, I think if you hit, I think it's
either 10 million or 20 million users for your product, you need to show the Kimi K2 label and say,
listen, I'm using this model under the hood. But there's some differences with this license,
right, Josh? Can we dig into that? I believe it's modified. I don't know to the extent that it is
modified, but I know that there is something different going on here. What does it say?
"Our only modification part is that if the software or any derivative works thereof is used
for any of your commercial products or services that have more than 100 million monthly active users,
or more than 20 million US dollars (or equivalent of other currencies) in monthly revenue,
you shall prominently display 'Kimi K2' on the user interface of such product or service."
That's a fun little marketing ploy. Fair enough, fair enough. You know what it reminds me of, Josh?
What's that?
It's what Meta tried to do
with their Lama models, right?
So Meta is the only other major American company
that I can think of
that went down this open source AI route
and the goal or the intended goal
at the time was to basically level the playing field
between Meta and OpenAI
and other frontier model AI labs
which had raced so far ahead.
So if you released all this cutting-edge AI tech
for free and accessible to anyone,
then it kind of drives down the cost
premium that OpenAI and all these other frontier AI labs can charge you to access this thing.
China is doing that to the American AI stock market as a whole, right? So that's why we saw
Nvidia crash, I think, 4.2% when the news got released and such. I'm curious whether this
kind of pops the bubble, the capex bubble in America, Josh. Is that a crazy thing to say?
I mean, the markets reacted pretty viscerally to this news. I don't think I have a problem with
this. I don't think it's popping a bubble. I don't think we're in trouble. I think this is just
totally fine so long as we continue to stay slightly ahead or at least at par. I think we're really
excellent at making software, distributing software, creating products. I think China's really good at
shamelessly innovating and deploying without needing to go through all of the hoops and intellectual
property problems that the United States mostly has.
So I don't think this will lead to any sort of bubble popping.
I think a lot of the frontier innovative stuff still happens in the U.S.
The place where I will begin to start to get a little worried is when this switches to embodied
AI.
Once we start moving from large language models to implementing these into robots or
implementing these into physical hardware, that's where I think we have problems.
On the software front, we're good.
We're crushing it.
Everyone's spending tons of money.
On the hardware front, we don't have the same lead.
And over the last, what, 30 to 50 years,
we've kind of outsourced some manufacturing capabilities to other places.
And therefore we're just kind of, I mean, everyone knows.
We just can't really make things cost-effectively here in the United States.
If we are in a foot race with China when it comes to making embodied AI,
like humanoid robots, specialized robots, whatever it may be,
that's where things start to get a little bit scary because that's where there is a significant lead.
And that lead comes in the form of atoms, which are much more difficult to move than bits,
because you can steal some open source code,
create this slight innovation on top,
roll it out to a billion users overnight,
and that's innovation.
That does not happen between version two and version three
of your humanoid robot.
You actually have to build it with a factory,
with real materials and people and places,
and it's very difficult and challenging to do,
and China very much stands to be the largest winner in that.
So I think on the software front, I feel really confident,
and as of now, that's all that we're battling on.
But in this near future, where things start to become embodied,
where AI becomes physically manifested in the world around us,
that seems like a place where I would start looking at Chinese investments a little bit more than the American ones.
Okay, I think I might push back a little bit and say that there is reasonable evidence to be bearish on the software side before it even gets to embodied AI.
I mean, so a few ways to think about it.
There is such a gross discrepancy when it comes to capital expenditure for these things.
On one side, you've got the US literally spending trillions of dollars to train AGI, or the best AI models.
And on the other side, you're in like the hundreds of millions of dollars, which is orders of magnitude less, right?
So there's an obvious mismatch here that we aren't seeing.
Whether it comes down to training architecture, training design, or just kind of like hardware manufacturing,
I don't know where that kind of advantage is being played, but the Chinese have found it and they're able to kind of really push down
on that lever to get ahead or on par with the US.
And they've been able to successfully do this for years now at this point.
DeepSeek was kind of like test case one.
Now I've seen like, you know, at least 50 open source models come out of Chinese frontier AI labs since then.
Number two, it's not like the US government has kind of like not tried to constrain them.
We've imposed a number of different sanctions, which include, you know, constraining which GPUs
Nvidia and other manufacturers within the US can sell to China, but that still hasn't stopped
them. They've been able to maintain and train these frontier AI intelligences despite all of
these different things. So I think if I would have to look on the other side of this, it would be
so what if you have an open source model that is super cool? Why aren't you using it right now?
Like I'm not using Kimi K2 regularly, even though I use GPT5 and it might be better than GPT5.
And the answer for me is pretty simple. I'm locked into an
ecosystem in OpenAI that I'm pretty happy with, which is: it has memory of me. It understands
who I am. It has a context of all the previous chats that I have with it. But also most
importantly, Josh, if there's an issue with something on my account or something that I'm trying
to use, there's a community that I can access. There's a support team that I can speak to. There's a
software ecosystem that supports me, right? Versus me jumping ship to Kimi K2, setting it up
on my own and then having to like troubleshoot it myself, I think a lot of people will be disincentivized
to do that.
It is difficult, but I mean, we're seeing market forces from both sides, right?
Like, I saw you included a link here somewhere about Cursor and Windsurf's new AI models.
They were using some sort of Chinese models.
In fact, they were thinking in Chinese.
And I found this really fascinating that, like, American-made products are now thinking
in the Chinese language.
So that's certainly a concern in terms of the commercial side, where those API costs really
matter, where if you can get a million tokens for 60 cents versus $10, that
really affects the margins of your business.
For consumers like us, there's not much incentive to use Kimi K2 yet.
And the phenomenon you spoke about earlier, where you can actually run a quantized version
of Kimi K2 on two Mac Studios running the M3 Ultra chips: it generates tokens at like 13
to 15 tokens per second.
So it's very slow.
You're getting like a sentence or two every second, which is much slower.
It's going to feel groggy;
it's not going to feel good.
There's a case to be made that that changes, because this year, and it's funny that
Apple's really the only computer maker that supports this now,
they're releasing the M5 Ultra, which will be the new version.
And it's going to be interesting to see how it plays out.
What I found interesting, this one side note, actually, that I wanted to share with you,
because you might find it cool too, is the version that runs on these Apple computers, the Apple
studios.
It's a slightly quantized version.
And I heard about this, and I learned about this recently from the Tesla earnings call,
the shareholder meeting that they had recently.
And we're going to have an episode on this later this week.
But there's this interesting thing that Elon mentioned during the episode
where he was talking about quantized versus floating point AI.
And I was like, what the hell is that?
Like, why are you spending so much time talking about this?
It doesn't make sense.
And what I realized is a lot of AI models,
they use, like, many, many digits after the decimal point in their weights
to get more precise results.
And that is floating point.
When you quantize a model, you remove most of the data to the right of the decimal
and you go down to small integers.
So you lose some precision, maybe up to like 60% of the variance,
but you gain so much efficiency,
so much better speed, cost improvements,
and you can actually run it locally on these machines.
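A toy sketch of the quantization trade-off being described, using a simple symmetric int8 scheme. Real deployments use fancier schemes (INT4, block-wise scales, and so on), so treat this purely as illustration of the float-versus-integer idea:

```python
# Toy illustration of quantization: squeeze float weights into 8-bit
# integers, losing a little precision but shrinking memory 4x versus
# 32-bit floats. Simplified symmetric int8 scheme for illustration only.
import random

random.seed(1)
weights = [random.uniform(-1, 1) for _ in range(8)]

# Quantize: map [-max_abs, max_abs] onto integers in [-127, 127].
scale = max(abs(w) for w in weights) / 127
quantized = [round(w / scale) for w in weights]

# Dequantize: recover approximate floats from the integers.
recovered = [q * scale for q in quantized]

max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print("int8 weights:", quantized)
print("max round-trip error: %.5f" % max_err)
print("memory: 32-bit floats -> 8-bit ints = 4x smaller")
```

The round-trip error is bounded by half the scale step, which is the precision you give up in exchange for the smaller, faster model that fits on consumer hardware.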
So I think it's interesting to see the different decisions
that people are making in terms of, well,
how precise does the model have to be versus how cost effective
and how efficient does it need to be.
And what we're seeing with Kimi K2 is it's very easy to over-index on the efficiency,
but maybe that's not the stated goal of OpenAI,
where if they really wanted to,
they could choose to quantize these models.
They could go more to integer type compute.
And it's just something I was thinking about
is how they approach them,
because it could just be,
well, Kimmy's just kind of optimizing for speed and efficiency
and the downstream effect is it's also really fast,
whereas Open AI kind of hasn't really optimized
for that specifically yet.
Right.
And the counter argument to that point would be,
well, Josh, it's crushing all the benchmarks
that we've evaluated all the other American models
on, right? So surely it's much better. And my pushback on that would be like, well,
benchmarks don't really materialize in real-life use. So what if it crushes 50% on
Humanity's Last Exam? Is it useful for me to use? Does it understand what I'm trying to say?
Does it understand the context of the prompts that I'm putting into it? The other side of this,
you know, on the point of quantization, Josh, is I think that a lot of frontier American AI labs,
like OpenAI, Google, etc., actually have enough compute to give you the best experience,
the highest floating-point experience, to put it into that context.
But they're using the majority of that compute to train the next big model that we haven't
even seen yet, right?
There was news that broke last week that Open AI is doing this, right?
So technically, they have enough compute to give you, like, amazing service all year round,
but they're using 70% of that compute to train GPT-6.
I think it's just a matter of prioritization right now until we reach some kind of parity that
these AI models are good enough. But I will say from all of the things that we've discussed
on this episode so far, there is one clear winner. And that is the consumer. It's you, I,
and everyone listening to the show, who basically get access to frontier-level intelligence
for the cost of next to nothing: download it completely free and run it privately at home. On this
tweet that I have pulled up here, it basically says for every closed model, there's an open-source
alternative to it, and it goes through a list: Sonnet 4.5, you've got GLM 4.6; Grok Code Fast, you've got
gpt-oss; GPT-5, you've got Kimi K2 Thinking; and it just goes on and on and on. And if we looked at
this kind of list a year and a half ago, maybe even two years ago, this list would be non-existent.
It would just be Frontier AI Labs on the closed source side and zero open source side.
So to see this kind of progress is really, really encouraging.
Yeah, it's going to be a race. It's going to be a battle between open and closed source. And perhaps that's not even the battle. Perhaps it's open source until they catch up to closed source, and then it's closed source across the board. So it's going to be interesting to see the developments. We have a new batch of models that are coming. We're kind of in this weird limbo where Gemini 3 is hopefully coming soon. We'll have some new benchmarks. And one of the things that was this harsh truth to kind of wrap my head around, which is what you just mentioned, Ejaaz, is the fact that everyone is just compute constrained. Like, OpenAI could have made
GPT-5 probably twice as impressive if they really wanted to; they just have no compute to serve that.
And it would have been way too expensive and way too slow. So it's not that it's, it can't be done.
It's just that people don't have the resources to do it. So it's this constant balancing act.
And it's going to be fun to see how companies kind of slot themselves into that curve of like how
much they want to spend on compute versus cost versus just what they have available to actually use
to train these models and deploy them at scale to users. And that's it for today, folks.
Super fun episode. It is always surprising to me how quickly open source catches up with
closed-source centralized AI. I always think it's going to lag a few years, and
now it's come down to the fact that it's lagging a few weeks. We have a jam-packed week.
We have potentially a new Nano Banana model being released by Google tomorrow.
Fingers crossed. I'm praying for that. Fingers crossed. I'm also praying for that as well.
And we have a second episode based on Tesla's investor day, which had some really jam-packed, exciting news.
Now listen, if you want the US to win this AI race and make no mistake, it is a race,
you need to subscribe to American AI YouTube channels, one of which is us.
Please subscribe, hit the notification button wherever you're listening, give us a rating.
We are helped by these so much.
It is bringing up so much awareness.
The algorithm is favoring us.
We're getting all these wonderful views and new incomers.
We've got a thousand of you from last week, which is just insane.
Hello, welcome to the channel.
We hope you enjoy the content.
and we will see you on the next one.
Yeah, before I let them off the hook, I'm checking, I'm doing the stat update.
83% of the people that watched last week were not subscribed.
If you're watching this on YouTube, go get subscribed.
Or go on Spotify, my preferred place of finding this podcast.
It's the best.
I'm telling you, I don't know how to describe this to people any better.
Spotify is so good.
You have the video, you have the audio.
You could turn it off and lock your phone without needing a premium membership.
Please go over there.
Go leave a comment over there because also the comment section is kind of popping too.
So, yeah, anyway.
Thank you for all this work.
Or pick and choose wherever you listen. Go for it. There you go. All right, we will see you guys
in the next one. Thank you for watching, as always. Much appreciated. Peace.
