Limitless Podcast - Kimi K2 is the Open Source Claude-Killer | US vs China AI
Episode Date: July 16, 2025

Kimi K2 is a groundbreaking open-source AI model from China with 1 trillion parameters. We discuss its competitive advantages, including low operational costs and superior coding capabilities through a "mixture of experts" approach. Josh highlights the implications for AI competition as Kimi K2 emerges in the market alongside OpenAI's plans for an open-source model. We also explore Kimi K2's two versions, Base and Instruct, its impact on the AI landscape, and the challenges faced by OpenAI's ChatGPT, xAI's Grok, and Anthropic's Claude. Tune in for key insights on how Kimi K2 could reshape AI development!

💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT

TIMESTAMPS
0:00 Intro
0:58 The Rise of Kimi K2
2:49 Efficiency and Cost Benefits
3:53 Training Breakthroughs Explained
5:37 Innovations in AI Training
6:30 The Impact of Open Source
8:05 Competitive Landscape of AI
9:41 Context Window Capabilities
12:55 The Surge of Kimi K2
15:36 Market Adoption Insights
19:57 Versions of Kimi K2
24:21 Privacy and Local AI
26:30 The AI Talent Landscape
31:04 China's AI Competitive Edge
32:40 Open Source vs. Closed Source
40:19 Closing Thoughts and Future Prospects
42:49 Get Involved

RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
A bunch of AI researchers from China just released a brand new AI model called Kimi K2,
which is not only as good as any other top model like Claude,
but it is also 100% open source,
which means it's free to take, customize and create into your own brand new AI model.
This thing is amazing at coding.
It beats any other model at creative writing, and it also has a pretty insane voice mode.
Oh, and I should probably mention that it is one trillion parameters in size,
which makes it one of the largest models ever created.
Josh, we were winding down on a Friday night and this news broke.
They dropped the bomb.
Absolutely crazy bomb, especially with OpenAI rumored to release their open source model this week.
You've been jumping into this.
What's your take?
Yeah, so last week we crowned Grok 4 as the new leading private, closed-source model.
This week we've got to give the crown to Kimi K2.
We got another crown going for the open source team.
They are winning.
I mean, this is better than DeepSeek and DeepSeek R2.
This is basically DeepSeek R3, I would imagine.
And if you remember back a couple months,
DeepSeek really flipped the world on its head because of how efficient it was
and the algorithmic upgrades it made.
And I think what we see with Kimi K2 is a lot of the same thing.
It's these novel breakthroughs that come as a downstream effect of their needing to be resourceful.
So China, they don't have the mega GPU clusters we have.
They don't have all the cutting edge hardware.
but they do have the software prowess to find these efficiencies.
I think that's what makes this model so special.
And that's what we're going to get into here is specifically what they did to make this model so special.
I mean, look at these stats here, Josh: one trillion parameters in total,
but it's a mixture-of-experts model with only 32 billion active parameters.
Typically these AI models can become pretty inefficient
when they're this large in size.
But it uses this technique called mixture of experts,
which means that whenever someone queries the model,
it only uses or activates the subset of parameters
that are relevant for the query itself.
So it's smarter, it's much more efficient,
and it doesn't use or consume as much energy
as you would if you wanted to run it locally at home
or whatever that might be.
It's also super cheap.
I think I saw somewhere that this was 20% the cost of Claude, Josh,
which is just insane.
For all the nerds that kind of want to run, you know,
really long tasks or just set and forget the AI to run on like your coding log or whatever that
might mean, you can now do it at a much more affordable rate, at one-fifth the cost of some of the
top models that are out there. And it is as good as those models. So just insane kinds of things.
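To make the "one-fifth the cost" claim concrete, here's a toy calculator. The per-million-token prices and the usage volume below are illustrative assumptions for the sketch, not published rates:

```python
# Back-of-envelope: what "20% the cost" means for heavy agentic coding use.
# All prices and volumes here are illustrative assumptions, not real rates.
CLAUDE_PRICE_PER_MTOK = 15.00                      # assumed $ per million tokens
K2_PRICE_PER_MTOK = CLAUDE_PRICE_PER_MTOK * 0.20   # the "20% the cost" claim

def session_cost(millions_of_tokens, price_per_mtok):
    """Cost in dollars for a given token volume at a given price."""
    return millions_of_tokens * price_per_mtok

TOKENS = 50  # a heavy month of agentic coding, in millions of tokens (assumed)
print(session_cost(TOKENS, CLAUDE_PRICE_PER_MTOK))  # 750.0
print(session_cost(TOKENS, K2_PRICE_PER_MTOK))      # 150.0
```

At the same output quality, that difference is what makes "set and forget" long-running agents affordable.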
Josh, I know there's a bunch of things that you wanted to point out here on benchmarks.
What do you want to get into? Yeah, it's really amazing. So they took 15.5 trillion tokens.
And they condensed those out into a one trillion parameter model. And then what's amazing is when
you use this model, like you said, it uses a thing called mixture of experts. So it has, I believe,
384 experts. And each expert is good at a specific thing. So let's say in the case you want to do
a math problem, it will take a 32 billion parameter subset of the 1 trillion total parameters,
and it will choose eight of these different experts, each specialized in a specific thing. So in the case of math,
it'll find an expert that has the calculator tool. It'll find an expert that has a
fact-checking tool or a proof tool to make sure that the math is accurate.
It'll have just a series of tools to help itself, and that's kind of how it works so efficiently:
instead of using a trillion parameters at once, it uses just 32 billion,
and it uses the eight best specialists out of the 384 that it has available to it.
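The expert-routing idea described here can be sketched in a few lines. This is a toy illustration, not Moonshot's actual router: the router weights, embedding size, and softmax gating are all assumptions; only the 384-expert and top-8 figures come from the discussion.

```python
import math
import random

# Toy sketch of mixture-of-experts routing (NOT Moonshot's code).
TOTAL_EXPERTS = 384   # experts reported for Kimi K2
TOP_K = 8             # experts activated per query
EMBED_DIM = 64        # toy embedding size (assumption)

def route(token_embedding, router_weights, k=TOP_K):
    """Score every expert for this token and keep only the top k."""
    scores = [sum(w * x for w, x in zip(row, token_embedding))
              for row in router_weights]
    top_k = sorted(range(len(scores)), key=scores.__getitem__)[-k:]
    # Numerically stable softmax over the chosen experts' scores.
    m = max(scores[i] for i in top_k)
    exps = [math.exp(scores[i] - m) for i in top_k]
    total = sum(exps)
    gates = [e / total for e in exps]  # each chosen expert's contribution
    return top_k, gates

rng = random.Random(0)
embedding = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
router = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)]
          for _ in range(TOTAL_EXPERTS)]
experts, gates = route(embedding, router)
print(len(experts))          # 8 experts chosen out of 384
print(round(sum(gates), 6))  # gate weights sum to 1.0
```

The key property is that only the selected experts' parameters are touched per token, which is why a 1T-parameter model can run with 32B-parameter cost.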
It's really impressive, and what we see here is the benchmarks that we're showing on screen,
and the benchmarks are really good. It's up there in line with just about any other top model,
except with the exception that this is open source. And there was another breakthrough that
we had, which was the actual way that they handled the training of this. And yeah, this is the
loss curve. So what you're looking at on screen for the people who are listening, it's this really
pretty smooth curve that kind of starts at the top and it trends down in a very predictable
and smooth way. And most curves don't look like this. And if they do look like this, it's because
the company has spent tons and tons of money on error correction to make sure this curve is so smooth.
So basically what you're seeing is the training run of the model. And a lot of times what happens is
you get these very sharp spikes and it starts to deviate away from the normal training run.
And it takes a lot of compute to kind of recalibrate and push that back into the right way.
What they've managed to do is really make it very smooth.
And they've done this by increasing efficiency.
So if you can think about it, there's this analogy I was thinking of right before we hit the record button.
And it's if you were teaching a chef how to cook, right?
So we have Chef Ejaaz here.
I am teaching him how to cook.
I am an expert chef.
and instead of telling him every ingredient and every step for every single dish,
what I tell him is like, hey, if you're making this amazing dinner recipe,
all you need that matters is this amount of salt applied at this time,
this amount of heat applied for this length of time,
and the other stuff doesn't matter as much.
So just put in whatever you think is appropriate,
but you'll get the same answer.
And that's what we see with this model is just an increased amount of efficiency
by being direct, by being intentional about the data that they used to train it on,
the data that they used to fetch in order to give you high-quality queries. And it's a really novel
breakthrough. They call it the MuonClip optimizer, which, I mean, it's a Chinese company. Maybe it
means something special there. But it is a new type of optimizer. And what you're seeing in this
curve is that it's working really well and it's working really efficiently. And that's part of the
benefit of having this open source is now we have this novel breakthrough. And we could take this
and we could use this for even more breakthroughs, even more open source models. And that's part,
that's been really cool to see.
I mean, this is just time and again from China,
so, so amazing from their research team.
So, like, just to kind of pick up on your comment on DeepSeek,
at the end of last year,
we were utterly convinced that the only way to create a breakthrough model
was to spend billions of dollars on compute clusters.
And so therefore it was a pay-to-play game.
And then DeepSeek, a team out of China, released their
model, and completely open sourced it as well. And it was as good as OpenAI's frontier model,
which was the top model at the time. And the revelation there was, oh, you don't actually just need
to chuck a bunch of compute at this. There are different techniques and different methods.
If you get creative about how you design your model and how you run the training cluster,
the training run, which is basically what you need to do to make your model smart, you can run it
in different ways that are more efficient, consume less energy,
and therefore cost less money,
but is as smart, if not smarter
than the frontier models that American AI companies are making.
And this is just a repeat of that, Josh.
I mean, look at this curve.
For those who are looking at this episode on video,
it is just so clean.
Yeah, it's beautiful.
The craziest part about this is when DeepSeek was released,
they pioneered something called reasoning, or reinforcement learning,
which are two separate techniques
that made the model super
smart with less energy and less compute spend.
With this model, they didn't even implement that technique at all.
So theoretically, this model can get so much smarter than it already is.
And they just kind of leveraged a new method to make it as smart as it is right now.
So just such a fascinating kind of like progress in research from China and it just keeps on
coming out. It's so impressive. Yeah, this is this was the exciting part to me is that
we're seeing so many algorithmic, exponential improvements in so many different categories.
So this was considered a breakthrough by all means.
And this wasn't even the same type of breakthrough that DeepSeek had.
So we get this now compounding effect where we have this new training breakthrough.
And then we have DeepSeek, who has the reinforcement learning.
And that hasn't even yet been applied to this new model.
So we get the exponential growth on one end, the exponential growth on the reasoning end.
Those come together.
And then you get the exponential growth on the hardware stack, where the GPUs are getting much
faster. And there's all of these different subsets of AI that are compounding on each other and
growing and accelerating quicker and quicker. And what you get is this unbelievable rate of progress.
And that's what we're seeing. So reasoning isn't even here yet. And we're going to see it soon
because it is open source so people can apply their own reasoning on top of it. I'm sure the
Moonshot team is going to be doing their own reasoning version of this model. And I'm sure we're
going to be getting even more impressive results soon. I see you have a post up here about the testing
and overall performance, can you please share?
Yeah, yeah.
So this is a tweet that summarizes really well
how this model performs in relation to other frontier models.
And the popular comparison that's made for Kimi K2
is against Claude.
So Claude has a bunch of models out.
Claude 3.5 is its earlier model,
and then Claude 4 is its latest.
And the general take is that this model is just better
than those models,
which is just insane to say because for so long,
long, Josh, we've said that Claude was the best coding model. And indeed it was. And then within
the span of, what is it, five days, Grok 4 released and it just completely blew Claude 4 out of the
water in terms of coding. Now Kimi K2, an open source model out of China whose makers don't even have access
to the research and kind of proprietary knowledge that a lot of American AI companies have,
has beaten it as well. Right. So it kind of beats Claude at its own game. But it's also cheaper. It's 20% the cost
of Claude 3.5, which is just an insane thing to say, which means that if you are a developer
out there that wants to try your hand at kind of like vibe coding a bunch of things, or actually
seriously coding something, you know, that's quite novel, but you don't have the hands on deck to do
that, you can now spin up a Kimi K2 AI agent, actually multiple of them for a very cost-efficient,
reasonable, you know, salary. You don't have to pay like hundreds of thousands of dollars or, you know,
hundreds of millions of dollars, which is what Meta is doing to kind of buy a bunch of
these software engineers, you can spend, you know, the equivalent of maybe a Netflix subscription
or 500 to a thousand bucks a month and spin up your own app. So super, super cool. And also one added
perk is that even if you have a lot of GPUs sitting around, you can actually
run this model for free. So that's the cost if you actually query it from the servers, but I'm
sure there's going to be companies that have access to excess GPUs. They can actually just
download the model because it's open source, open weights, and they can run it
on their own, and that brings the cost of compute down to the cost per kilowatt-hour of the energy required
to run the GPUs. So because it's open source, you really start to see these costs decline,
but the quality doesn't. And every time we see this, we see a huge productivity unlock
in coding output and amount of queries used. It's like, this is freaking awesome. Yeah. Josh,
I saw something else come up as well. So do you remember when Claude first released
that frontier model? I think it was 3.5, or maybe it was 4, one of their brands. One of their
bragging rights was it had a
one million token
context window. Oh yes,
which was huge. Yeah, which
for listeners of this show is huge.
It's like several
novels' worth
of words or characters you could
just bung into one single prompt.
And the reason why that was such an amazing thing
was for a while
people struggled to kind of communicate
with these AIs because they couldn't
set the context. There wasn't enough
bandwidth within their chat
log window for them to say, you know, and don't forget this, and then there was this. And then,
you know, this detail and that detail, there just wasn't enough space. And models weren't
performing enough to kind of consume all of this in one go. And then Claude came out and was like,
hey, we have a one million token context window. Don't worry about it. Chuck in all the research papers
that you want, chuck in your essay, chuck in reference books, and we got you. I saw this
tweet that was deleted. I think you sent this to me. We got the screenshots. We always come
with receipts. Yeah, I wonder why they deleted it. But a good catch from you.
Yeah, let's get at this. What's your take on this, Josh?
It was first posted, I think, earlier today, yeah, like an hour ago, and then deleted pretty shortly
afterwards. And this is from a woman named Crystal. Crystal works with the Moonshot team.
She is part of the team that released Kimi K2. And in this post, it says, Kimi isn't just another
AI. It went viral in China as the first to support a two million token context window.
And then she goes on to say, we're an AI lab with just 200 people, which is minuscule
compared to a lot of the other labs they're competing with. And it was acknowledgement that they had a
two million token context window. And for those who need it, just a quick refresher on the context window stuff:
imagine you have a gigantic textbook and you've read it once and you close it and you
kind of have a fuzzy memory of all the pages. The context window allows you to lay all of those out
in clear view and directly reference every single page. So when you have two million tokens,
which is roughly a million and a half words of context, we're talking about a whole shelf of
books and textbooks worth of knowledge, and you could really dump a lot of information in this for the AI
to readily access. And if they released that, a 2 million token open source model, that's a
huge deal. I mean, even Grok 4 recently, I believe, what did we say it was? It was a 256,000
token context window, something like that. So Grok 4 has one-eighth of what they supposedly have
accessible right now, which is a really, really big deal. So I'm hoping it was deleted because
they just don't want to share that, not because it's not true.
I would like to believe that it's true because, man, that'd be pretty epic.
Yeah, and the people are loving it, Josh.
Check out this graph from OpenRouter, which basically shows the split of usage between everyone
on their platform that are querying different models.
So for context here, Open Router is a website that you can go to and you can type up a prompt,
just like you do with ChatGPT.
And you can decide which model
your prompt goes to, or you could let OpenRouter decide for you,
and it kind of divvies up your query.
So if you have a coding query, it's probably going to send it to Claude,
or now Kimi K2, or Grok 4.
But if you have something that's more to do with creative writing
or something that's like a case study, it might send it to OpenAI's o3 model, right?
So it kind of like decides for you.
OpenRouter released this graphic,
which basically shows that Kimi K2 surpassed xAI in token market share
just a few days after launching,
which basically means that xAI spent, you know,
hundreds of millions of dollars training up their Grok 4 model,
which just kind of beat out the competition last week.
Then Kimi K2 gets released, completely open source,
and everyone starts to use that more than Grok 4,
which is just an insane thing to say
and just shows how rapidly these AI models compete with each other
and surpass each other.
I think part of the reason for this, Josh, is it's open source.
right? Which means that not only are retail users like myself and yourself using it for our daily queries,
you know, create this recipe for me or whatever, but researchers and builders all over the world
that, you know, have so far been challenged or had this obstacle of needing, you know, pots of money
basically to start their own AI company, now have access to a frontier, world-renowned model
and can create whatever application, website or product
they want to make. So I think that's part of the usage there as well. Do you have any takes on this?
Yeah, and it's downstream of cost, right? Like, we always see this when a model is cheaper and
mostly equivalent, the money will always flow to the cheaper model. It'll always get more queries.
I think it's important to note the different use cases of these models. So they're not directly
competing head-to-head on the same benchmarks. I think what we see is like when we talk about
Claude, it's generally known as the coding model. And I don't think, like, OpenAI's o3 is really
competing directly with Claude, because it's more of a general intelligence versus a coding-specific
intelligence. K2 is probably closer to a Claude, I would assume, where it's really good at coding
because it uses this mixture of experts. And I think that helps it find the tools. It uses this
cool new novel thing called like multiple tool use. So each one of these experts can use a tool
simultaneously. And they could use these tools and work together to get better answers. So in the
case of coding, this is a home run. Like it is very cheap cost for token, very high-quality output.
I actually think it can compete with OpenAI's o3, Josh.
Check this out.
Oh?
So, Rowan, yeah, Rowan Cheung put this out yesterday and he basically goes, I think we're
at the tipping point for AI generated writing.
It's been notoriously bad, but China's Kimi K2, an open weight model, is now topping
creative writing benchmarks.
So just to put that into context, that's like having the topmost, I don't know,
smartest or slightly autistic software engineer at
the top engineering company working on AI models
also being the best poet or creative scriptwriter,
directing the next best movie or whatever that might be,
or creating a Harry Potter novel series.
This model can basically do both.
And what it's pointing out here is that compared to o3,
it tops it.
Look at this.
Completely beats it.
Yep.
Okay, so I take that back.
Maybe it is just better at everything.
Yeah, and that's some pretty impressive results.
I think, like, what's worth pointing out here is,
and I don't know whether any of the American AI models do this, Josh,
but mixture of experts seems to be clearly a win here.
The ability to create an incredibly smart model
doesn't come without, you know, this large storage load that is needed, right?
One trillion parameters.
But then combining it with the ability to be like,
hey, you don't need to query the entire thing.
We've got you.
We have a smart router which basically pulls on the best experts, as you described earlier,
for whatever relevant query you have.
So if you have a creative writing task or if you have a coding thing, we'll send it to two
different departments of this model.
That's a really huge win.
Do any other American models use this?
Well, the first thing that came to my mind when you said that is Grok 4, which doesn't
exactly use this, but uses a similar thing, where instead of using a mixture of experts,
it uses a mixture of agents.
So Grok 4 Heavy uses a bunch of distributed
agents that are basically clones of the large model. But that takes up a tremendous amount of compute.
And that is the $300 a month plan. That's replicating Grok 4, though, right? So that's like taking
the model and copy-pasting it. So let's say Grok 4 was one trillion parameters just for ease
of comparison. That's like creating, if there were those four agents, that's four trillion parameters,
right? So it's still pretty costly and inefficient. Is that what you're saying?
It's actually the opposite direction of K2. So what they have
used is just, and again, this is kind of similar to tracking sentiment between the United States
and China, where the United States will throw compute at it, where China will throw
kind of clever resourcefulness at it. So Grok, yeah, when they use their mixture of agents,
it actually just costs a lot more money, whereas K2, when they use their mixture of experts,
well, it costs a lot less. Instead of using four trillion parameters in this case, it uses just 32 billion,
and it reuses those 32 billion parameters over and over. And it's really, it's a really elegant solution
that seems to be yielding pretty comparable results.
So I think as we see these efficiency upgrades,
I'm sure they will eventually trickle down into the United States models.
And when they do, that is going to be a huge unlock in terms of cost per token,
in terms of the smaller distilled models that we're going to be able to run on our own computers.
But yeah, I don't know of anyone else who is using it at this scale.
It might be novel to just K2 right now.
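The parameter arithmetic in this comparison is easy to make concrete. Note the 1 trillion figure for Grok 4 is the hypothetical used above "for ease of comparison", not a published spec:

```python
# Active parameters per query under the two approaches discussed above.
# The 1T figure for Grok 4 is the hosts' hypothetical, not a published number.
FULL_MODEL_PARAMS = 1_000_000_000_000  # one full copy of the model (assumed)
NUM_AGENTS = 4                         # mixture of agents: four full clones
MOE_ACTIVE_PARAMS = 32_000_000_000     # mixture of experts: 32B active

agents_active = NUM_AGENTS * FULL_MODEL_PARAMS
ratio = agents_active // MOE_ACTIVE_PARAMS
print(ratio)  # 125: the agent approach touches ~125x more parameters per query
```

Under these assumptions, that 125x gap in activated parameters is roughly the gap in per-query compute, which is why the mixture-of-experts route is so much cheaper to serve.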
And I think that this is the method that probably scales the best, Josh.
Like, I, it makes sense.
Efficiency always wins at the end, right?
And to see this kind of innovation come pretty early on in a technology's life cycle is just super impressive to see.
Another thing I saw is there's two different versions of this model.
I believe there's something called Kimi K2 Base, which is basically the model for researchers who want full control for fine tuning and custom solutions, right?
So imagine this model as the entire parameter set.
So you have access to 1 trillion parameters, all the weight designs and everything.
And if you're a nerd that wants to nerd out, you can go crazy.
You know, if you have like your own GPU cluster at home or if you happen to have a convenient warehouse full of servers that you weirdly have access to, you can go crazy with it.
You can, if you think about the early gaming days of Counterstrike and how you could, like, mod it, you can basically
mod this model to your heart's desire. And then there's a second version called K2 Instruct,
which is for drop-in general purpose chat and AI agent experiences. So this is kind of like at
the consumer level, if you're experimenting with these things or if you want to run an experiment
at home on a specific use case, you can kind of like take that away and do that for yourself.
That's how I understand it, Josh. Do you have any takes on this? That makes sense. And I think
that second version that you're describing is what's actually available publicly on
their website, right? So if you go to Kimi.com, it has a text box. It looks just like ChatGPT,
like you're used to. And that's where you can run that second-tier model, which you described
as the drop-in general purpose chat. And then, yeah, for the hardcore researchers,
there is a GitHub repo, and the GitHub repo has all the weights and all the code. And you can really
download it, dive in, use the full thing. I was playing around with the Kimi tool. And it's really cool.
It's fast. Oh, I mean, it's lightning fast. If you go from a reasoning model to a non-reasoning model
like Kimi, you get responses like this.
Like when I'm using Grok 4 or o3, I'm sitting there sometimes for a couple of minutes waiting
for an answer.
This, you type it in and it just types back right away, no time waiting.
So it's kind of refreshing to see that.
But it's also a testament to how impressive it is.
I'm getting great answers and it's just spitting it right out.
So what happens when they add the reasoning layer on top?
Well, it's probably going to get pretty freaking good.
So the trend we're seeing, and we saw this last week with Grok 4, is typically we're expected
to wait a while when we send a prompt to a breakthrough model
because it's thinking, it's trying to basically replicate
what we have in our brains up here.
And now it's just getting much quicker and much smarter
and much cheaper.
So the long story short is these models are incredibly powerful.
I kind of think about it as how we went from massive desktop computers
to slick cell phones, Josh,
and then we're going to eventually have chips in our brain.
AI is just kind of like fast-tracking that entire lifecycle
within like a couple of years, which is just insane.
And these efficiency improvements are really exciting because you can see how quickly they're shrinking
and allowing eventually for those incredible models to just run on our phones.
So there's totally a world a year from now in which a Grok 4-, o3-, or Kimi K2-capable model
is small enough that it could just run inside our phone and run on a mobile device,
or run locally on a laptop when you're offline.
And you kind of have this portable intelligence that's available everywhere anytime,
even if you're not connected to the world.
And that seems really cool.
Like we were talking a few episodes ago about Apple's local, free AI inference running on an iPhone,
but how the base models still kind of suck.
Like they don't really do anything super interesting.
They're basically good enough to do what you would expect Siri to do but can't do.
And these models, as we get more and more breakthroughs like this that allow you to run
much larger parameter counts on a much smaller device, it's going to start really superpowering
these mobile devices.
And I can't help but think about the Open AI hardware device.
I'm like, wow, that'd be super cool.
You had, like, o3 running locally in the middle of the jungle somewhere with no service
and you still had access to all of its capabilities.
Like that's probably coming downstream of breakthroughs like this,
where we get really big efficiency unlocks.
I mean, it's not just efficiency though, right?
It's the fact that if you can run it locally on your device,
it can have access to all your private data without exposing all of that to the model providers
themselves, right? So one of the major concerns of not just AI models, but also with mobile phones,
is privacy. I don't want to share all my kind of like private health, financial and social media
data, because then you're just going to have everything on me and you're going to use me.
You're going to use me as a product, right? And that's kind of been the status quo for the last decade
in tech. And so with AI, that's a supercharged version of it. The information gets more personal.
It's not just your likes. It's, you know, where Josh shops every day and, you know,
who he's dating and all these kinds of things, right?
And that becomes quite personal and intrusive very quickly.
So the question then becomes,
how can we have the magic of an AI model without it being so obtrusive?
And that is open source locally run AI or privately run AI.
And Kimi K2 is a frontier model that can technically run on your local device.
If you set up the right hardware for it and the way that we're trending,
you can basically end up having that on your device,
which is just a huge unlock.
And if you can imagine how you use OpenAI's o3 right now, Josh, right?
I know you use it as much as I do.
The reason why you and I use it so much isn't just because it's so smart,
but it's because it remembers everything about us.
But I hate that Sam knows or has access to all that data.
I hate that if he chooses to switch on personalized ads,
which is currently the model where most of these tech companies make money right now,
he can.
And there's nothing I can do about it, because I don't want to use any other model
apart from that. But if there was a locally run model that had access to all the memory and context,
I'd use that instead.
And this is suspicious.
I mean, this is a different conversation in total, but isn't it interesting how other companies
haven't really leaned into memory when it's seemingly the most important moat that there is?
Like, Grok 4 doesn't have good memory rolled out.
Gemini doesn't really have memory.
Claude doesn't have memory the way that OpenAI does, yet it's the single biggest
reason why we both continue to go back to ChatGPT and OpenAI.
So that's just been an interesting thing.
I mean, Kimmy is open source.
I wouldn't expect them to lean too much into it.
But for these close source models, that's just, it's another interesting just observation.
Like, hey, the most important thing isn't, doesn't seem to be prioritized by other companies just yet.
Why do you think that is?
So my theory, at least from xAI or Grok 4's perspective, is Elon's like, okay, I'm not going to be able to build a better chatbot or chat messenger than OpenAI has.
There aren't too many features that can set Grok 4 apart
that o3 doesn't already do, right?
But where I can beat O3 is at the app layer.
I can create a better app store than they have
because I haven't already created one.
That is sticky enough for users to continually use,
and I can use that dataset to then unlock memory and context at that point, right?
So I just saw today that they released, they being XAI,
released a new feature for Grok 4 called,
I think it's Companions, Josh.
Oh, yeah, play with it.
These animated avatar-like characters,
so they basically look like they're from an anime show.
And you know how you can use voice mode in OpenAI
and you can kind of like talk to this realistic human-sounding AI?
You now have a face and a character on Grok 4.
And it's really entertaining, Josh.
Like, I find myself kind of like engaged in this thing because I'm not just typing words.
It's not just this binary to and fro with this chat messenger.
It's this human, this cute, attractive human that I'm just like now speaking to.
And I think that that's the strategy that a lot of these AI companies, if I had to guess,
are taking to kind of like seed their user base before they unlock memory.
I don't know whether you have a take on that.
Yeah, I have a fun little demo.
I actually played around with it this morning, and it was totally unhinged.
No filter, very vulgar, but like kind of fun. It's like a fun little party trick. And yeah, I mean, that was a surprise to me this morning when I saw that rolled out. I was like, huh, that doesn't really seem like it makes sense. But I think they're just having fun with it. Can we for a second talk about the team? So we've mentioned just now how they've all come from China and how China's like really advancing open source AI models and they've completely beat out the competition in America. Meta's Llama being the obvious one.
We've got Qwen from Alibaba, we've got DeepSeek R1, now we have Kimi K2.
The team is basically the AI Avengers of China, Josh.
So these three co-founders all have deep AI ML backgrounds that hail from like the top American universities such as Carnegie Mellon.
One of them has like a PhD from Carnegie Mellon in machine learning, which, for those of you who don't know, is basically a God-tier degree for AI.
That means you're desirable and hireable by every other AI company after you graduate.
But on top of that, they also have credibility and degrees from the top universities in China,
especially this one university called Tsinghua, which seems to be at the top of the field.
I looked them up on rankings for AI universities globally,
and they often come in at number three or four in the top 10 AI universities.
So pretty impressive from there.
But what I found really interesting, Josh, was that one of the co-founders
was an expert in training AI models on low-cost, optimized hardware.
And the reason why I mentioned this is it's no secret that if you want a top-frontier
AI model, you need to train it on Nvidia's GPUs.
You need to train it on Nvidia's hardware.
Nvidia's market cap, I think, at the end of last week, surpassed $4 trillion.
That's $4 trillion with a T.
that is more than the current GDP of the entire British economy.
And the largest in the world.
There's never been a bigger company.
There's never been a bigger company.
It's just insane to wrap your head around.
And it's not without reason.
They supply basically, or they have a grasp or a monopoly on the hardware that is needed to train top models.
Now, Kimi K2 comes along, casually drops a one trillion parameter model, one of the largest models ever released,
and it's trained on hardware that isn't Nvidia's.
And Jensen Huang, I need to find this clip, Josh,
but Jensen Huang basically was on stage.
I think it was at a private conference maybe yesterday,
but he was quoted as saying 50% of the top AI researchers are Chinese and are from China.
And what he was implicitly getting at is they're a real threat now.
I think for the last decade we've kind of been like,
ah, yeah, China's just going to copy, paste everything that comes out of America's
tech sector. But when it comes to AI, we've kind of like maintained the same mindset up until now
where they're really just competing with us. And if they have the hardware, they have the
ability to research new techniques to train these models, like DeepSeek's reinforcement learning
and reasoning. And then Kimi K2's kind of like efficient training run, which you showed earlier,
they've come to play, Josh. And I think it's worth highlighting that China has a very strong
grasp on top AI researchers in the world and models that are coming out of it.
Where are their $100 million offers?
I haven't seen any of those coming through.
None, dude.
The most impressive thing is that they do it without the resources that we have.
Imagine if they did have access to the clusters of these H100s that Nvidia is making.
I mean, that would be, would they crush us?
And we kind of have this timeline here.
We're kind of running up against the edge of energy that we have available to us to train these massive models, whereas China does not have that constraint. They have significantly more energy to power these. So in the event, the inevitable event, that they do get the chips and they are able to train at the scale that we are, I'm not sure we're able to continue our rate of acceleration in terms of hardware manufacturing, large training, as fast as they will. And they already have done the hard work on the software efficiency side. They've cranked out every
single efficiency because they
are doing it on constrained hardware. So
it's going to create this really interesting effect where
they're coming at it from the
ingenuity software approach. We're coming at
it from the brute force, throw a lot of compute added
approach, and we'll see where both sides end up.
But it's clear that China is still behind because they are the ones
open sourcing the models. And we know at this
point now, if you're open-sourcing your model, you're doing it
because you're behind. Yeah, yeah. I mean,
one thing that did surprise me, Josh, was that they
released a one trillion parameter open source model.
I didn't expect them to catch up that quickly.
Like, one trillion parameters is a lot.
Yeah.
Another thing I was thinking about is China has dominated hardware for so long now.
So it wouldn't really surprise me if, like, I don't know, a couple years from now,
they're producing better models at specific things, basically because they have better hardware
than America, than the West.
But where I think the West will continue to dominate is at the application layer.
And I don't know, if I was a betting man, I would say that most of the money is eventually
going to be made on the application side of things.
I think Grok 4 is starting to kind of show that with all these different kinds of novel
features that they're releasing.
I don't know if you've seen some of the games that are being produced from Grok 4, Josh,
but it is utterly insane.
And I haven't seen any similar examples come out of Asia from any of their AI models,
even when they have access to American models.
So I still think America dominates at the app layer.
But Josh, I just came across this tweet,
which you reminded me of earlier.
Tell me about OpenAI's open source model strategy,
because I got this tweet pulled up from Sam Altman,
which is kind of hilarious.
Yeah, all right.
So this week, if you remember from our episode last week,
we were excited about talking about OpenAI's new open source model.
OpenAI, open source model, all checks out.
This was going to be the big week.
They were going to release their new flagship, open source.
Well, conveniently, I think the same day as K2 launched, later in the day,
or perhaps the very next morning, Sam Altman posted a tweet. He says, hey, we plan to launch our
open weights model next week. We are delaying it. We need time to run additional safety tests and
review high-risk areas. We are not yet sure how long it will take us. While we trust the
community will build great things with this model, once weights are out, they can't be pulled back.
This is new for us and we want to get it right. Sorry to be the bearer of bad news. We are working super
hard. So there's a few points of speculation. The first, obviously, being, did you just get your
ass handed to you? And now you are going back to reevaluate before you push out a new model. So that's
one possible thing where they saw K2. They were like, oh, boy, this is pretty sweet. This is our first
open source model. We probably don't want to be lower than them. And there is the second point of
speculation, which Ejaaz, you mentioned to me a little earlier today, where maybe something went wrong
with the training run. And it's not quite that they're getting beat up by a Chinese company.
It's that like they actually made a mistake of their own accord. And can you explain to me
specifically what that might be, what the speculation is at least? Yeah, well, I'll keep it short.
I think it was a little racist under the hood. And I can't find the tweet, but basically one of these
AI researchers slash product builders on X got access to the model supposedly, according to him.
and he tested it out in the background.
And he said, yeah, it's not really an intelligence thing.
It's just worse than what you'd expect from an alignment and consumer-facing approach.
It was ill-mannered.
It was saying some pretty wild shit, kind of the stuff that you'd expect coming out of 4chan.
And so Sam Altman decided to delay whilst they kind of like figured out why it was kind of acting out.
Got it.
Okay.
So we'll leave that speculation where it is.
There's a funny post that I'll actually share with you if you want to throw it up,
which was actually from Elon.
And we'll abbreviate, but it was like,
Elon was basically saying it's hard to avoid the libtard slash MechaHitler extremes, both of them,
because they're on so polar opposite ends of the spectrum.
And he said he spent several hours trying to solve this problem with the system prompt,
but there's too much garbage coming in at the foundation model level.
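To make the "system prompt" idea concrete, here's a minimal sketch in Python, with hypothetical names rather than any specific vendor's API: the same fixed instruction gets prepended to every user query before it reaches the model, and that instruction is the knob being tuned here.

```python
# Hypothetical sketch: one fixed system instruction wraps every user query.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Be truthful and avoid offensive content."
)

def build_messages(user_query: str) -> list:
    """Prepend the fixed system instruction to a single user query."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

msgs = build_messages("Summarize today's AI news.")
print(msgs[0]["role"])  # → system
```

The limitation Elon describes follows from this shape: the system prompt is just one message layered on top, so it can steer but not override what the foundation model absorbed from its training data.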
So basically, I mean, what happens with these models is you train them based on all the
human knowledge that exists, right? So everything that we've believed, all the ideas that we've
shared, it's been fed into these models. And what happens is you can try to adjust how they
interpret this data through the system prompt, which is basically an instruction that every single
query gets passed through. But at some point, the model is still reliant on this swath of human data that is just
too overbearing. And that's kind of what Elon shared. And the difference between OpenAI and
Grok is that Grok will just ship the crazy update. And that's what they did. And they caught a lot of
backlash for it. But what I find interesting and what I'm sure OpenAI will probably follow is this
last paragraph where he says, our V7 foundation model should be much better and we're being far more
selective about training data rather than just training on the entire internet. So what they're planning
to do is solve this problem, which is what I assume OpenAI probably ran into in the case that
the AI training model kind of went off the rails and it started saying bad things about lots of people,
is that you kind of have to rebuild the foundation model with new sets of data. And in the case of
Grok, I know one of the intentions for V7 is actually to generate its own dataset of
synthetic data from their models. And I'm assuming OpenAI will probably have to do this
too if they want to calibrate what a lot of people call the temperature, which is the
amount of variance a model uses in its responses. And I don't know, I think we're going to start
to see interesting approaches from that because as they get smarter, you really don't want them
to necessarily have these evil traits as the default. And it's very hard to
get around that when you train them on the data that they've been trained on so far.
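In the standard sampling sense, temperature rescales the model's raw scores before they're turned into probabilities, which is roughly the "variance" dial being described. A small self-contained sketch:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into sampling probabilities.
    Lower temperature sharpens the distribution (more predictable output);
    higher temperature flattens it (more varied, riskier output)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
# At low temperature the top token hogs the probability mass,
# so the model's answers become less surprising.
assert cool[0] > hot[0]
```

Note this only reshapes how an already-trained model samples; it can't remove what's baked into the weights, which is why the fix discussed above is retraining on curated data rather than just turning a dial.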
It just goes to show how I guess cumbersome it is to train these models, Josh.
It's such a hard thing.
Yeah.
Yeah.
It's not something that you can just kind of like jump into the code and tweak a few things.
Most of the time, you don't know what's wrong with the model or where it went wrong.
I mean, we've talked about this on a previous episode, but essentially if you build out this
model, right, you spend hundreds of millions of dollars.
And then you feed it a query.
So you put something in and then you wait to see what it spits out.
You don't really know what it's going to spit out.
You can't predict it.
It's completely probabilistic.
And so if you release a model and it starts being a little racist or, you know, kind of crazy,
you have to kind of like go back to the drawing board and you have to analyze many different sectors of this model.
Like, was it the data that was poisoned, or was it the way that we trained it, or maybe it was a particular model weight
that we tweaked too much, or whatever that might be.
So I think over time it's going to get a lot easier
once we understand how these models actually work.
But my God, it must be so expensive
to just continually rerun and retrain these models.
Yeah, when you think about a coherent cluster of 200,000 GPUs,
the amount of energy, the amount of resources,
just to retrain a mistake is huge.
So I think, I mean, the more we go into it,
the deeper we get, the more it kind of makes sense
paying so much money for talent to avoid these mistakes
where if you pay $100 million for one employee who will give you a strategic advantage to avoid having
to do another training run, that will cost you more than $100 million.
You've already, you're already in the profit.
So you kind of start to see the scale, the complexity, the difficulties.
I do not envy the challenges that some of these engineers have to face, although I do envy the
salary.
I envy the salary.
I envy the salary.
And I envy the adventure.
Like, how cool must that be trying to build super intelligence for the world as a human
for the first time in like the history of everything.
So it's got to be pretty fun.
This is where we're at now with the open source models, close source models.
K2's pretty epic.
I think that's a home run.
I think we've crowned a new model today.
Do you have any closing thoughts?
Anything you want to add before we wrap up here?
This is pretty amazing.
I think I'm most excited for the episode that we're probably going to release a week from now, Josh,
when we've seen what people have built with this open source model.
That's the best part about this, by the way.
To remind the listener, anyone can take this model right now. You, if you're listening to this,
can take this model right now, run it locally at home and tweak it to your preference. Now, yes,
it's going to be, you know, you kind of need to know how to tweak model weights and stuff,
but I think we're going to see some really cool applications get released over the next week,
and I'm excited to play around with them personally. Yeah, if you're listening to this and you can
run this model, let us know because that means you have quite a solid rig at your home. I'm not sure
the average person is going to be able to run this, but that is the beauty of
the open weights: anybody with the capability of running this can do so. They could tweak it
how they like. And now they have access to the new best open source model in the world,
which, I mean, just a couple of months ago would have been the best model in the world.
So it's moving really quickly. It's really accessible. And I'm sure as the weeks go by,
I mean, hopefully we'll get OpenAI's open source model in the next few weeks. We'll
be able to cover that. But until then, just lots of stuff going on. This was another great episode.
So thank you everyone for tuning in again, for rocking with us. We actually planned on making this like 20 minutes, but we just kind of kept trailing off into more interesting things. There's a lot of interesting stuff to talk about. I mean, there's really, you could take this in a lot of places. So hopefully this was interesting. Go check out Kimi K2. It's really, really impressive. It's really fast. It's really cheap. If you're a developer, give it a try. And yeah, that's been another episode. We'll be back again later this week with another topic. And we'll just keep on chugging along as the frontier
of AI models continues to head west.
Also, we'd love to hear from you guys.
So if you have any suggestions on things that you want us to talk more about,
or maybe there's like some weird model or feature that you just don't understand
and maybe we can do a good job of explaining it, just message us.
Our DMs are open or respond to any of our tweets and we'll be happy to oblige.
Yeah, let us know.
If there's anything cool that we're missing, send it our way and we'll cover it.
That'd be great.
But yeah, we're all going on the journeys together.
like we're learning this as we go. So hopefully today was interesting. And if you did enjoy it,
please share with friends, likes, comment, subscribe, all the great things. And we will see you
on the next episode. Thanks for watching. See you guys. See you.
