The AI Daily Brief: Artificial Intelligence News and Analysis - The Open Source AI Model Beating GPT-5 on Agents
Episode Date: November 11, 2025
Today on the AI Daily Brief, NLW explores the rise of Kimi K2 Thinking, a new open-source model from China that’s outperforming GPT-5 and Claude 4.5 Sonnet on agentic benchmarks, and doing it at a fraction of the cost. We’ll look at how this shift is changing the balance of power between closed and open models, why Silicon Valley startups are already adopting Chinese systems, and what it means for the next phase of the AI race. Plus: Meta’s new speech model, DeepSeek’s dire job-market warning, and CoreWeave’s data-center delays.
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, meet the open source model that is outperforming GPT-5 and basically everyone else when it comes to agentic performance.
Before that in the headlines, maybe vibe coding isn't dead after all.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, Superintelligent, Robots and Pencils, Blitzy, and KPMG.
To get an ad-free version of the show, go to patreon.com,
or you can subscribe on Apple Podcasts.
If you are interested in sponsoring the show, and especially if you are hoping to get any Q1
placements, now is a really good time.
Things are filling up fast.
And I'm trying to map everything out.
So if you are interested or thinking about sponsoring the show and you just want to learn about
the opportunities we have, send us a note at sponsors@aidailybrief.ai.
Like I said, if you are hoping to get Q1 placement, now is a good time to reach out.
Lastly, as I mentioned yesterday, we are now up over a thousand use cases contributed to the AI ROI
benchmarking study. I am so appreciative of all of your help so far, and if you want your use cases
included, as well as to get access to the full readout of all of this incredible AI ROI information,
go to ROISurvey.com. It'll be live for about another week and a half. With that, let's dive in.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around
five minutes. Apparently, rumors of vibe coding's demise have been greatly exaggerated. Speaking with
TechCrunch on Monday, Lovable CEO Anton Osika said that the company is closing in on 8 million
users, dramatic growth from their 2.3 million active users back in July. Osika claimed the company
is now seeing 100,000 new products built on Lovable every single day. We didn't get a new revenue
number, but Lovable crossed the $100 million ARR milestone back in June, and there are currently
rumors of new funding being raised at a $5 billion valuation, which would
almost triple their valuation from fundraising over the summer. Now, part of the interview addressed
a report from Barclays in September, which showed that traffic to Lovable had dropped by 40%
since a peak in August. Osika said that retention was still strong, with more than 100% net dollar retention,
meaning the average user spends more over time. Now, of the major vibe coding startups,
Lovable might be the one that's most focused on empowering non-coders. The platform not only enables
easy prototyping, but is increasingly being used to deploy full products. If you've ever been
on aidailybrief.ai, for example, that is
built, maintained, and hosted all with help from Lovable.
Now, when it comes to where the company is focused, it follows from that same specialization.
Osika said the part of the engineering organization that we're moving the quickest on hiring is security engineers.
He said that the goal is to make building with Lovable more secure than building with just human written code.
Now, in terms of the battle for the vibe coding space and increased competition from OpenAI and Anthropic,
Osika said that he thinks it's not winner take all.
He said, if we can unlock more human creativity and human agency,
and just driving the change so that anyone can create if they have good
ideas, that should be celebrated regardless of whoever does that.
Next up, Meta has returned to open source with a new speech recognition model.
Called Omnilingual ASR, the model's big selling point is support for a huge range of underserved
languages.
Out of the box, the model can recognize over 1,600 languages.
In contrast, OpenAI's open-source Whisper model supports 99 languages.
Developers can also extend this support with a feature called zero-shot in-context learning.
The model can learn new languages at inference time using just a few paired
examples of speech and text, with no retraining required. Meta said the feature can allow the model to
support as many as 5,400 languages, which is pretty close to every language in use globally.
Functionally, then, Meta is claiming to have created something like an AI Rosetta Stone for
universal speech recognition. Reported benchmarks are also very strong, with the model more than
quadrupling the performance of OpenAI's Whisper Large model. Meta claims a character error rate of
less than 10% for 95% of high and medium resource languages, as well as 36% of low resource languages
with less than 10 hours of audio in their datasets.
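For listeners reading along, character error rate is usually defined as the character-level edit distance between the model's transcript and a reference transcript, divided by the length of the reference. The sketch below illustrates that common definition only; it is not Meta's evaluation code, and the function names and sample strings are made up for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # delete a reference character
                current[j - 1] + 1,                   # insert a hypothesis character
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (assumed standard CER definition)."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character out of eleven.
cer = character_error_rate("hello world", "hallo world")
print(f"CER: {cer:.1%}")
```

Under this definition, a character error rate below 10% means fewer than one character in ten needs to be corrected to match the reference.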
Now, while the model itself is very cool,
the reason that most people are taking notice
is that the release suggests that Meta might not be completely done
with open source models.
When Mark Zuckerberg started spending billions of dollars
to build out the superintelligence team,
there was a suspicion that the days of leading open source models
coming out of Meta were numbered.
Does this suggest that those concerns were overblown?
Only time will tell, but it's certainly a positive sign.
Next up, some interesting comments from a DeepSeek researcher
who has warned that AI could replace most jobs within a decade.
Senior researcher Chen Deli made a rare public appearance at the World Internet Conference in China
late last week alongside executives from five other AI and robotics companies.
He warned that over the next 10 to 20 years, quote, societal structures will also be greatly
challenged.
Tech companies should play the role of guardians of humanity at the very least protecting human
safety, then helping to reshape societal order.
Chen said that we're currently in the honeymoon phase where AI cannot work independently
to complete economically useful tasks, and people can harness
AI to boost their own productivity. However, he predicted that the next five to 10 years will see a
rapid transition that leads to massive job cuts. Chen suggested, quote, during this period, tech
companies should serve as whistleblowers warning society of potential risks. Now, this view certainly isn't
rare in the West. What makes it interesting is to see it emerge from one of the leading Chinese
companies. AI optimism among the U.S. population is among the lowest in the world at 39%. But in contrast,
Chinese sentiment is among the highest at 83%. The AI transformation has become a core part of
the Chinese government's economic and social strategy. In that context, the comments from Chen
seem extremely non-consensus and frankly, even potentially a little risky.
Moving over to markets, CoreWeave more than doubled their revenue last quarter,
but delays in data center construction have lowered revenue forecasts. The AI data center operator
reported earnings on Monday, with revenue doubling year-over-year to come in at 1.36 billion,
outperforming analyst estimates. CoreWeave also trimmed their loss to 22 cents per share,
coming in way under the 57 cents per share projected by analysts and an 85% reduction compared to a
year ago. Still, the big story from CoreWeave's earnings was a delay to a major project that's limiting
forward revenue. CEO Michael Intrator disclosed that a third-party developer is causing temporary delays.
Fourth quarter earnings will be impacted, but the client agreed to an adjusted timeline,
so CoreWeave will maintain the full value of the contract. Intrator said,
Everybody is frustrated, the data center provider is frustrated, we're frustrated, the client is frustrated,
People who are waiting on the next iteration of AI are frustrated.
Now, the mystery client could be OpenAI or Meta, who each have over $10 billion in contracts
with CoreWeave.
CoreWeave lowered its full-year revenue forecast to $5.05 billion from $5.15 billion due to the delays.
Now, one really positive signal, however, from that call, it seems that installed GPUs are
holding their value for longer than expected.
CoreWeave has been criticized in the past for assuming a six-year depreciation schedule
on Nvidia H100s, which is longer than the more common four- or five-year schedule.
During earnings, however, CoreWeave announced that their first H100 contract was reaching expiry
and was re-signed within 5% of the original price.
In other words, at the moment at least, it looks like the scarcity of compute is trumping
all other factors in the current market.
Now, checking in on AI stock themes overall, it does seem like many of the jitters last week
were perhaps broader macro factors and not AI alone.
As we came into the week, with a deal to end the government shutdown on the horizon,
there was a major Wall Street rebound with AI stocks leading the way.
The S&P 500 was up 1.3%, winning back around 75% of its drop from last week.
The NASDAQ regained around two-thirds of last week's loss, and Nvidia led the way with a 4.8% rally.
Now, I certainly do not think that this means that all of the concern that we saw last week
was just based on bigger macro factors, but it is a good reminder that right now,
AI is both the chief beneficiary and biggest victim of any shift in market sentiment,
good, bad, or otherwise.
That, however, is going to do it for today's headlines.
Next up, the main episode.
Today's episode is brought to you by my company, Superintelligent.
You've got 100 what-if ideas, but which one becomes an agent?
Superintelligent maps every AI use case across your company
and helps you create an agent plan that you can actually execute.
We match opportunities to your tech stack, your data profile, and your team.
No more guesswork, just a clear path from pilot to production.
If you want agents that deliver business outcomes, start with planning.
Go to besuper.ai and sign up for a demo.
Small, nimble teams beat bloated consulting every time.
Robots and Pencils partners with organizations on intelligent, cloud-native systems powered by AI.
They uncover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy.
As an AWS-certified partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner.
With teams across the U.S., Canada, Europe, and Latin America, clients gain local expertise and global scale.
As AI evolves, they ensure you keep pace with change, and that means faster results, measurable
outcomes, and a partnership built to last. The right partner makes progress inevitable.
Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief.
This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform
with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours
to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering
leaders start every development sprint with the Blitzy platform, bringing in their development
requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each
task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the
final 20% of human development work required to complete the sprint. Public companies are achieving a
5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it
with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press
get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
What if AI wasn't just a buzzword, but a business imperative?
On You Can with AI, we take you inside the boardrooms and strategy sessions
of the world's most forward-thinking enterprises.
Hosted by me, Nathaniel Whittemore, and powered by KPMG,
the seven-part series delivers real-world insights from leaders who are scaling AI with purpose,
from aligning culture and leadership to building trust, data readiness, and deploying AI
agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front-row seat
to the future of enterprise AI. So go check it out at www.kpmg.us/AIpodcasts or search
You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts.
Welcome back to the AI Daily Brief. Today we are once again talking about another Chinese
open source model that is really changing people's sense of what is possible
in the field of AI today.
Now, to put this model release in some proper context, we have to go back to January.
It is now coming up towards the end of the year.
And of course, this is the time when I start to plan out my end of year coverage,
which is a big time for reflecting on the year that has passed and what's to come.
And any end of year big story recap is inevitably going to kick off with the big story from January,
which was, of course, the release of DeepSeek.
When Chinese lab DeepSeek dropped their reasoning model, it caused an absolute tizzy in the AI
industry that even sent stocks reeling. Now, there were three big reasons that DeepSeek was such a big
deal. The first was that it totally changed people's perception of how far behind the U.S. China really
was. Up until that point, people were working on the assumption that when it came to model
development, China was meaningfully behind the U.S., and DeepSeek seemed to suggest that wasn't true.
The second big reason for concern and the one behind the big stock wobble was that at the time
it appeared that they had achieved those results at significantly lower cost than big U.S. training runs.
This made everyone question the incredible amount of resources being spent on the data center buildout.
The third reason DeepSeek was such a big deal was more on the consumer side.
When they released their R1 reasoning model, the chatbot app that housed it actually dethroned
ChatGPT to become the number one downloaded free app on Apple's App Store for iPhone.
Now, what was interesting about this was that DeepSeek was not the first company to release
a reasoning model. At that point, OpenAI's o1 had been available for a number of months. The
difference was that DeepSeek made it available for free, meaning that for most people, it was their
first experience with a reasoning model, which, of course, if you've ever experienced the jump
from a non-reasoning to a reasoning model, is just a fundamentally different LLM experience.
So this is what kicked off the year, and set the tone for a number of different conversations
that we'd be having throughout the year.
Now, more recently, the whole China element of this story has heated back up in a big way.
Nvidia CEO Jensen Huang recently said in very stark terms that he believed that China would win
the AI race because of their disposition towards it.
And even though, by the way, all these outlets are reporting that he backtracked,
for my money, the backtrack was kind of more just a reaffirmation of what he was saying
while trying to present a slightly more positive spin like the U.S. still had a chance.
Along with the rise in AI skepticism among market investors,
there has also been a surge in the idea that China isn't building as many data centers
and that perhaps the U.S. is overbuilding them.
Investor Gordon Johnson went viral with a tweet that said,
Question for the AI Bulls.
The U.S. currently has around 5,426 data centers and is investing billions to build more.
China has around 449 data centers and is not adding.
If AI is real, why isn't China building thousands of data centers every month,
which they could clearly do?
SemiAnalysis's Dylan Patel responded,
Where did you get the idea that they aren't adding? Not as much as the U.S., but China has thousands
of data centers and are building many more. Your data source sucks. Now, the substance here is
less important than the narrative and the fact that once again, China's actions become the big foil
for the U.S. And this is the setup into which the new Kimi K2 Thinking model was released.
The new model was released by Moonshot last Thursday with claims of outperformance on major benchmarks.
The model purportedly leads both GPT-5 and Claude Sonnet 4.5 on Humanity's Last Exam,
which is a general knowledge test, on BrowseComp, which is a test of agentic search,
and Seal-0, which is a test of the ability to collect real-world data.
The model lags slightly on major coding benchmarks like SWE-bench Verified, but not by much.
Deedy Das of Menlo Ventures wrote,
Today is a turning point in AI.
A Chinese open source model is number one.
Kimi K2 Thinking scored 51% on Humanity's Last Exam,
higher than GPT-5 and every other model.
$0.60 per million input tokens and $2.50 per million output tokens.
The best at writing and does 15 tokens per second on two Mac M3 Ultras.
Seminal moment in AI.
In other words, the point that Deedy is making here is that in addition to performing well,
it's doing so cheaply and in a way that's efficient enough that people could run it on their own hardware.
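To make those numbers concrete, here is a small back-of-the-envelope sketch. It reads the quoted prices as $0.60 per million input tokens and $2.50 per million output tokens, and takes the quoted 15 tokens per second as the local generation speed. The workload sizes are hypothetical, picked only to show the arithmetic.

```python
# Prices as quoted in the episode, read as input vs. output token prices.
INPUT_PRICE_PER_MILLION = 0.60   # USD per million input tokens
OUTPUT_PRICE_PER_MILLION = 2.50  # USD per million output tokens
LOCAL_TOKENS_PER_SECOND = 15     # quoted speed on two Mac M3 Ultras


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at the quoted per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000


def local_generation_seconds(output_tokens: int) -> float:
    """Time to generate the output locally at the quoted token rate."""
    return output_tokens / LOCAL_TOKENS_PER_SECOND


# Hypothetical workload: 200,000 tokens read, 9,000 tokens written.
print(f"API cost: ${api_cost_usd(200_000, 9_000):.4f}")
print(f"Local generation time: {local_generation_seconds(9_000) / 60:.0f} minutes")
```

The point of the exercise is just that both the per-token price and the local token rate translate directly into cost and wait time for a given job.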
Now, in addition to scorching the benchmarks, Moonshot claimed the model is capable of 200 to 300 sequential tool calls
without human interference. If that's true, it would make it incredibly capable for agentic workflows,
frankly, head and shoulders above many of the Western frontier models. Indeed, according to independent
testing from Artificial Analysis, Kimi is now ranked ahead of GPT-5, Claude 4.5 Sonnet, and Grok 4 on agentic
tool use, and there's a fairly significant gap. Some, like Dan Mack, suggested that this might be
enough to delay the release of the next generation of models as the frontier labs go back to the drawing board.
Referencing that same recent quote that we were just talking about from Jensen Huang,
the one where he said that Chinese AI is nanoseconds behind America.
Dan wrote,
Jensen is right, look at Kimi K2 Thinking.
Watch for delayed releases of Gemini 3, Opus 4.5 and GPT-5.1.
Delays signal they are not clearly better or cheaper than Kimi K2 Thinking.
That is evidence that the USA is indeed falling behind in the race.
Said Machina, Kimi K2 beating Gemini 3 would be, well, humiliating doesn't even cover it.
Think about what Google has.
Decades of data, the best talent money can buy, infrastructure that runs the Internet.
And they're sweating a smaller team's model?
That's not supposed to happen in tech.
The big guy wins always.
Maybe not this time, though.
Now, part of what has people excited is that the model is open source, so people were running their own tests over the weekend.
Pietro Schirano, the CEO at MagicPath, wrote,
Kimi K2 Thinking is incredible.
So I built an agent to test it out, Kimi Writer.
It can generate a full novel from one prompt,
running up to 300 tool requests per session.
Here it is creating an entire book,
a collection of 15 short sci-fi stories.
LXE gave the model the task of balancing nine eggs,
a book, a laptop, an empty plastic bottle,
and a nail to try out its reasoning.
The model came up with a counterintuitive solution
of arranging the eggs to support the book as the starting point,
then adding the book, laptop, bottle, and the nail in turn.
LXE remarked,
Kimi K2 Thinking is the only modern reasoning model in recent memory
that provided a human solution to this on the first try.
Now, another big shift here is that Chinese models are now right there with the U.S. models on coding.
AI coding has been the breakout killer use case for this year, and frankly, that's probably
been something of a comfort for the Western companies, as this is one area where they've
continued to maintain something of a lead. At the beginning of the year, Claude 3.5 Sonnet was the
premier model with no close competitor. Since then, later versions of Claude, GPT-5, Gemini
2.5 Pro, Grok 4, all have vied for the top of the leaderboards and API credits from developers.
Increasingly, though, Chinese models are catching up, if not to the absolute state of the art,
at least presenting a very compelling cost-to-value trade-off.
Kimi K2 Thinking is clearly better at coding than Claude 3.5 Sonnet, the model that everyone
was using just a few months ago, and it's being served at a fraction of the cost.
In a recent article, The Information suggested that that competition is a huge problem
for Anthropic in particular, given how much of their revenue is derived from API use for coding.
They also point out that looking abroad is an imperative for the Chinese startups, writing,
it is critical they find customers outside China who pay to access the AI models through APIs
no matter how low the prices are. That's because it's difficult for AI companies in China to generate
revenue from domestic customers, where price competition is fierce, and business customers are reluctant
to pay for subscriptions. The article continues, as the overall AI coding market grows rapidly,
the Chinese companies are betting that there will be sufficient demand for cheaper and good
enough options. And in fact, this is one way that the release of Kimi K2 could end up being
different to the DeepSeek moment. If the release of DeepSeek R1 was all about giving consumers
their first glimpse of reasoning models that were hidden behind the paywall at OpenAI,
Kimi K2 Thinking could end up being more about providing a near state-of-the-art model
that could perform in the enterprise at a fraction of the cost. Another interesting shift is that
models like Kimi K2 Thinking are opening the door to self-hosted LLMs in a way that wasn't
really feasible last year. Up until recently, there has been a stark trade-off when a developer
chose to run models locally.
Previously, you could use open-source models to underpin products that didn't need
state-of-the-art AI or you could tinker around with them.
But for serious advanced production use cases, there needed to be a very significant reason
to want the privacy or security of a local model to make up for the reduced performance.
Kimi K2 Thinking is one of a crop of Chinese models that have reduced that gap.
One of the reasons for that is an innovation in quantization.
You can think of quantization as kind of like compression for AI models.
While the process reduces performance, it also lowers the memory requirements substantially
to allow models to fit on consumer hardware.
Kimi K2 Thinking, for example, can be quantized down to run on a pair of Mac M3 Ultras,
which is certainly not a cheap consumer setup, but it is a realistic rig for a professional
programmer or a company.
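For anyone who wants to see the compression idea in miniature, here is a rough sketch of one simple scheme, symmetric round-to-nearest quantization, where each weight is stored as a small integer plus one shared scale factor. This is an illustration only, not the specific method Moonshot used, and the weights and the 1-trillion-parameter model size are hypothetical.

```python
def quantize(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integers of the given bit width with one shared scale."""
    q_max = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    quantized = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats; the rounding error is the quality cost."""
    return [q * scale for q in quantized]


def weight_memory_gb(num_params: int, bits: int) -> float:
    """Memory needed just to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bits / 8 / 1e9


weights = [1.0, -0.4, 0.2, 0.0]                    # hypothetical weights
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
print(q, [round(r, 3) for r in restored])

# Hypothetical 1-trillion-parameter model at 16-bit versus 4-bit weights.
print(weight_memory_gb(1_000_000_000_000, 16), "GB at 16 bits")
print(weight_memory_gb(1_000_000_000_000, 4), "GB at 4 bits")
```

The trade-off described above shows up directly: fewer bits per weight means proportionally less memory, while the restored values drift a little from the originals.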
Some are starting to wonder if local LLMs will be a growing trend.
I'm not really sure that I'm convinced at this point, but it is possible that we will
see certain types of industrial use cases where the balance of value that you get
from running locally does shift things and that will be an important trend to keep an eye on.
And while we haven't seen a lot of U.S. enterprises all of a sudden adopting Chinese models,
there are growing reports that the startup ecosystem has already made the switch.
Bloomberg opinion columnist Catherine Thorbecke wrote,
In recent weeks, a subtle shift has become increasingly apparent.
Speculation has been stirring for months that low-cost, open-source Chinese AI models
could lure global users away from U.S. offerings.
But now it appears they are also quietly winning
over Silicon Valley. She referenced Chamath Palihapitiya commenting that one of his portfolio
companies has already moved major workflows to Kimi K2, which he said is, quote, frankly,
just a ton cheaper than OpenAI and Anthropic. That same week, Airbnb's CEO, Brian Chesky
said that they hadn't integrated with OpenAI because the connections aren't quite ready.
Instead, Airbnb's new service agent is, quote, relying a lot on Alibaba's Qwen3 model,
which Chesky said is very good and also fast and cheap. Mira Murati's Thinking Machines Lab is also
building on Qwen3. Cursor's new in-house coding agent, Composer 1, is rumored to be built on top of a
Chinese model, and Hugging Face downloads for Qwen have recently overtaken downloads of
Meta's Llama models, suggesting a shift in user patterns for open source AI. Referencing that same
Jensen Huang quote, Thorbecke wrote, it's premature for Huang to declare a winner. The U.S. still has
clear advantages when it comes to access to cutting-edge chips and computing power, but Beijing's
low-cost and open-source push is undoubtedly attracting developers, the backbone of AI innovation.
If Washington truly wants to come out on top in the long run, it should start by asking why Silicon
Valley is already switching sides.
So what's the net of all of this?
Cashat Patel writes, Kimi K2 Thinking is more important than o3, not because the model is better,
but because of what it signals about the future of AI development.
For him, there are a few different elements of this.
First, that the open-source lag is now measured in months, not years, that basically we've seen
the closed model advantage window collapse from more than 18 months to three to four months,
that China is treating AI like they treated electric vehicle manufacturing,
in other words, not trying to match the West but trying to lap it on price and accessibility
and competing on economics. And then this observation,
the real race isn't to AGI, it's to democratization. He writes,
Who cares if you build AGI if only a thousand companies can afford it?
Kimi K2 provides frontier performance at commodity prices. That's the game.
Dean Sackaransky thinks that the agentic capabilities update is the real deal here.
He writes,
In July 2025, models could not effectively call tools,
three to five tool calls max.
Then Kimi K2 released,
and every subsequent model has been post-trained for tool calling.
Now we have agents that can run for an hour and 30 minutes.
This is the quietest and most significant advancement in recent memory.
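For readers less familiar with what a run of sequential tool calls looks like, here is a bare-bones sketch of the loop an agent harness typically runs: the model picks a tool, the harness executes it, the result goes back into the history, and this repeats until the model gives a final answer or a call budget runs out. Everything here is hypothetical, including the scripted stand-in for the model and the `add_one` tool; it is not Moonshot's API or any particular framework.

```python
from typing import Any, Callable


def run_agent(
    model_step: Callable[[list], dict],
    tools: dict[str, Callable[[Any], Any]],
    task: str,
    max_calls: int = 300,
) -> tuple[str, int]:
    """Loop until the model returns a final answer; returns (answer, tool calls made)."""
    history: list = [("task", task)]
    for calls_made in range(max_calls):
        action = model_step(history)
        if action["type"] == "final":
            return action["content"], calls_made
        result = tools[action["tool"]](action["input"])  # execute the requested tool
        history.append((action["tool"], result))         # feed the result back to the model
    raise RuntimeError("tool-call budget exhausted before a final answer")


def scripted_model(history: list) -> dict:
    """Stand-in for a real model: keeps calling add_one until the value reaches 5."""
    kind, value = history[-1]
    current = 0 if kind == "task" else value
    if current >= 5:
        return {"type": "final", "content": f"counted to {current}"}
    return {"type": "tool", "tool": "add_one", "input": current}


answer, calls = run_agent(scripted_model, {"add_one": lambda x: x + 1}, "count to five")
print(answer, "after", calls, "tool calls")
```

The jump described in the quote is about how many turns of this loop a model can sustain before it loses the thread, which is why the budget parameter matters.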
Bindu Reddy writes,
In spite of all the closed-source drama,
the biggest story of 2025 has been open-source agentic models.
Three new models dominate the cheap mass market agent space,
GLM, Kimi K2, and Qwen Coder are all
amazing, with trillions of tokens being used every day. That leads to a prediction from Bindu.
2026 will be the year of open weights. We will see at least two U.S. labs enter the arena.
Kimi and GLM will push to close the gap in agentic coding. DeepSeek will finally release R2.
We will have state-of-the-art image and video generation models. LLM developer community will
explode. Now look, obviously one of the subtexts for a lot of this show is around the geopolitics
of this, but when it comes to consumer choice, it's hard to see all of these advances
as anything but incredibly valuable.
New frontiers of performance and costs are being pushed,
bringing the efficiency and affordability of everything down,
and that's going to mean all of us being able to do even more with these models
than what was previously possible.
Pretty interesting stuff, obviously a lot to keep track of.
For now, it's going to do it for today's AI Daily Brief.
Appreciate you listening or watching as always,
and until next time, peace.
