The AI Daily Brief: Artificial Intelligence News and Analysis - Why AI Advantage Compounds
Episode Date: December 12, 2025
AI advantage is proving to be compounding, not linear. Drawing on new data from OpenAI, Menlo Ventures, EY, and early AI ROI Benchmarking results, this episode explains how leading organizations are pulling away by using AI more intensively, moving beyond time savings into higher-value use cases, and reinvesting gains back into deeper capabilities, creating flywheels that laggards will struggle to catch. In the headlines: real-world AI benchmarks, OpenAI user growth, chip-war intrigue, Oracle earnings, and AI market volatility.
Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Gemini - Build anything with Gemini 3 Pro in Google AI Studio - http://ai.studio/build
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
This podcast is sponsored by Google.
Hey folks, I'm Amar, product and design lead at Google DeepMind.
Have you ever wanted to build an app for yourself, your friends,
or finally launch that side project you've been dreaming about?
Now you can bring any idea to life, no coding background required,
with Gemini 3 in Google AI Studio.
It's called vibe coding and we're making it dead simple.
Just describe your app and Gemini will wire up the right models for you
so you can focus on your creative vision.
Head to AI.studio slash build to create your first app.
Today on the AI Daily Brief, why AI Advantage compounds.
Before that in the headlines, AI benchmarks for the real world.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, Gemini, KPMG, Blitzy, Rovo, and Robots and Pencils.
To get an ad-free version of the show, which is just $3 a month, go to patreon.com slash AI Daily Brief,
or you can subscribe directly on Apple Podcasts.
If you are interested in sponsoring the show, send us a note at sponsors at AIDailyBrief.ai.
And newly returned for a test run:
Heading into 2026, I want to do some version of an AI newsletter, but I don't want to just
repeat what everyone else is doing.
So we're experimenting for the next couple of weeks with an evening edition we're calling
Nightdesk.
If you go to AIDailyBrief.ai, you can click on newsletter to subscribe to updates.
But with all that out of the way, let's dive in.
Welcome back to the AI Daily Brief Headlines edition.
All the daily AI news you need in around five minutes.
Regular listeners will know I'm not a super fan of benchmarks,
specifically the way that we use them around new model releases,
for all the reasons that other people complain about benchmarks as well.
Many of them are saturated, meaning the gradations between different models are incredibly small.
Many of them can be gamed.
Mostly, though, they just don't really exist and operate in the real world that we're using
these models in, and so they don't tell us all that much about how those models work in the real world.
Lately, however, there has been an effort to try to build evals and
benchmarks that are the province of the real world, and one of them, which was introduced by
OpenAI in September, was called GDPval.
Basically, that test modeled performance against a set of economically valuable tasks
across 44 occupations.
The benchmark measures capabilities to complete knowledge work tasks end-to-end, including
following instructions, researching, doing the actual work, and then delivering the final
product.
Now, the grading in OpenAI's approach to this was difficult.
They relied on expert graders, a group of experienced professionals from the same
occupations that were represented in the dataset. They paired those human graders with an
automated grader, but said at the time that it wasn't yet as reliable as expert graders,
so they weren't in a rush to replace them. Well, now Artificial Analysis has completed an
evaluation harness based on that set of GDPval tasks to allow the benchmark to be run on any LLM.
As you might imagine, they are relying on an AI-based grading pipeline to allow the benchmark to be
run autonomously and at scale. They wrote, we think this makes it today's best way to compare
general agentic performance of language models. They're referring to their setup as GDPval-AA,
for Artificial Analysis, and in their testing run, they found that despite OpenAI creating
the benchmark, Opus 4.5 was the leading model. GPT-5 was in second place, Claude Sonnet 4.5 was third.
Interestingly, GPT-5.1 underperformed GPT-5, but still did well enough to land in fourth place,
and DeepSeek 3.2 and Gemini 3 Pro were tied for fifth place. GPT-5.1's slight
underperformance was curious, but Artificial Analysis noted that the model used half as many tokens
to complete the tasks. While the drop in quality is only slight, it still highlights that
5.1's increased efficiency does come at a cost. While Opus 4.5's run topped the charts, it was very
expensive at $608, which was more than twice the cost of any other model that they tested.
DeepSeek 3.2 was the standout for cost efficiency, completing the benchmark run for $29,
one-twentieth the cost of Opus, and a third of the cost of Gemini 3 Pro's run that tied on a score
basis. Ultimately, like anything, I think you need to continue to be skeptical of most benchmarks,
but at least this one is trying to get at the actual types of tasks that people in the real world
will be using these tools for. Speaking of OpenAI, one quick note on them: The Information
reports that their sources say that ChatGPT is now nearing 900 million weekly active users,
up from the 800 million that they've been stuck at for some time. Moving over into the
world of drama and intrigue.
Bombshell news from the chip war as reports claim that DeepSeek has built a Blackwell
training cluster from smuggled chips.
The Information reports that DeepSeek has begun developing its next frontier model on a cluster
of several thousand of Nvidia's state-of-the-art Blackwell chips.
These chips are banned for export into China even under Trump's recently announced reductions
in export controls.
The Information is citing six people with knowledge of the matter as their sources, seeming to
suggest that the sources are China-based with close ties to DeepSeek rather than U.S.-based
intelligence. The reporting claims that DeepSeek secured this chip cluster by importing the chips via
data center providers in a third country. The claim is that Nvidia servers were delivered and
installed in third-party data centers, inspected by vendors for compliance with the export controls,
then dismantled and smuggled into China as individual components. Now, if true, the report changes
the landscape of the chip war. It would be the strongest reporting to date to suggest that Chinese
labs have been able to smuggle enough cutting edge chips to build a commercial training cluster.
Previous reports have largely been about smuggling small batches of chips for research, and the
incidents referenced were often from years ago. Now, Nvidia, for their part, is calling the report
into question, stating, we haven't seen any substantiation or received tips of phantom data
centers constructed to deceive us and our OEM partners, then deconstructed, smuggled, and
reconstructed somewhere else. However, they did leave open the possibility that this was true, saying,
while such smuggling seems far-fetched, we pursue any tip we receive. Alongside the DeepSeek report,
we also have news that Beijing is holding emergency meetings with tech companies on the new possibility
of H200 imports. Officials reportedly met with representatives from Alibaba, ByteDance, and Tencent this week
to assess their demand for H200s. Officials reportedly asked for specific information on how many H200s
the tech companies need for training, somewhat suggesting that Beijing is preparing to take up the White House's offer
and allow the chips into the country. Now, we talked a lot about the U.S.'s strategic decision-making,
but as The Information points out, the quote, drumbeat of actions highlights Chinese policymakers'
dilemma, whether to support AI development that needs powerful chips China can't yet produce
or push through the adoption of homegrown chips to eventually rid the country of U.S. technology.
Overall, just a ton of intrigue to end the year in the geopolitical AI race.
Moving over briefly to markets, Oracle put out a rough earnings report that sent AI stocks into a
tailspin. For the past few months, Oracle and their sky-high AI infrastructure has been viewed as
the canary in the coal mine for the AI bubble, meaning of course that weak earnings are a red flag.
While Oracle reported 34% growth in cloud sales and 68% growth in their infrastructure business,
both numbers fell short of Wall Street estimates.
Their remaining performance obligations, a measure of orders yet to be fulfilled, jumped 5x to
$523 billion.
Now, this was the quarter that Oracle booked their massive set of new orders from OpenAI,
and also the quarter where they realized the gain from selling Ampere Computing.
So realistically, the numbers were always going to be all over the place in this report,
and frankly don't say all that much about the company's health.
The figure that many honed in on was that capital expenditures were around $12 billion
for the quarter, up from $8.5 billion in the prior quarter.
Analysts had actually expected a reduction to $8.25 billion, making this a big surprise to
investors.
Oracle also raised its CapEx forecast to $50 billion for the fiscal year ending in November
2026, which is a $15 billion increase from their previous forecast.
Now, it shouldn't come as a shock that Oracle is spending a ton on CapEx, but this was the
first time those increased numbers have been laid out in an earnings report.
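For readers who want the CapEx arithmetic laid out, here is a minimal Python sketch using only the figures quoted above. The $12 billion number was described as approximate, so the derived gaps are approximate too, and the "implied previous forecast" is a back-calculation from the stated increase rather than a figure quoted in the episode. The function and constant names are hypothetical, chosen for illustration.

```python
# Figures as quoted in the episode, in millions of US dollars.
# The $12 billion figure was described as approximate ("around").
CAPEX_THIS_QUARTER = 12_000
CAPEX_PRIOR_QUARTER = 8_500
CAPEX_ANALYST_ESTIMATE = 8_250
FULL_YEAR_FORECAST = 50_000
FORECAST_INCREASE = 15_000


def percent_change(new: int, old: int) -> float:
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)


def capex_summary() -> dict:
    """Derive the gaps implied by the quoted CapEx figures."""
    return {
        "increase_vs_prior_quarter": CAPEX_THIS_QUARTER - CAPEX_PRIOR_QUARTER,
        "overshoot_vs_estimate": CAPEX_THIS_QUARTER - CAPEX_ANALYST_ESTIMATE,
        "overshoot_vs_estimate_pct": percent_change(
            CAPEX_THIS_QUARTER, CAPEX_ANALYST_ESTIMATE
        ),
        # Implied, not quoted: new forecast minus the stated increase.
        "implied_previous_forecast": FULL_YEAR_FORECAST - FORECAST_INCREASE,
    }


if __name__ == "__main__":
    for name, value in capex_summary().items():
        print(f"{name}: {value}")
```

On these inputs, spending came in about $3.75 billion above the analyst estimate, and the previous full-year forecast works out to $35 billion.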
Ultimately, the stock fell by 11% in after-hours trading,
and the report seemed to drag down other AI stocks as well,
with Nvidia losing 1% overnight.
Look, ultimately, I think we are going to see a lot of pendulum swinging between
it's a bubble and it's not, and dovishness and hawkishness,
and we kind of just got to roll along for the ride.
With that, however, we will close the headlines.
Next up, the main episode.
Sure, there's hype about AI, but KPMG is turning AI potential into business value.
They've embedded AI and agents across their entire enterprise to boost efficiency,
improve quality and create better experiences for clients and employees.
KPMG has done it themselves.
Now they can help you do the same.
Discover how their journey can accelerate yours at www.kpmg.us slash agents.
That's www.kpmg.us slash agents.
This episode is brought to you by Blitzy,
the Enterprise Autonomous Software Development Platform with Infinite Code Context.
Blitzy uses thousands of specialized AI agents that think for hours
to understand enterprise-scale
codebases with millions of lines of code. Enterprise engineering leaders start every development sprint
with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan,
then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work
autonomously, while providing a guide for the final 20% of human development work required to complete
the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating
Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native
SDLC into their org.
Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted
to AI-native.
Meet Rovo, your AI-powered teammate.
Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or
build your own agent with Studio.
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and
secure platform, so it's always working in the context of your work.
Connect Rovo to your favorite SaaS apps so no knowledge gets left behind.
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps
and delivers personalized AI insights from day one.
Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.
Know the feeling when AI turns from tool to teammate?
If you Rovo, you know.
Discover Rovo, your new AI teammate powered by Atlassian.
Get started at R-O-V, as in victory, O dot com.
Small, nimble teams beat bloated consulting every time.
Robots and Pencils partners with organizations on intelligent cloud-native systems powered by AI.
They uncover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy.
As an AWS-certified partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner.
With teams across the U.S., Canada, Europe, and Latin America,
clients gain local expertise and global scale.
As AI evolves, they ensure you keep pace with change.
And that means faster results, measurable outcomes, and a partnership built to last.
The right partner makes progress inevitable.
Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief.
Welcome back to the AI Daily Brief.
Today's topic is something I'm sure that for some of you will be kind of like, well, duh.
But I think for others, especially those of you who are trying to advocate for AI initiatives inside your
organizations, the point at the core here is really important, and especially for those who are
perhaps not as advanced in their AI journey, could represent a fairly important shift in their way
of thinking. We are now three years into the Gen AI era. In that time, pretty much every organization,
large and small, has at least started to dabble with how Gen AI could impact how they do business.
However, there is, of course, a huge breadth and range of how different companies are engaging
with AI. We have everything from nascent experiments or a sort of DIY
posture that allows employees to go figure it out for themselves, but doesn't really have
anything formal at the organization level, to on the other end of the spectrum, extremely
sophisticated and comprehensive org-wide efforts. The point of today's episode is that the stakes
for the organizations that are starting to lag are perhaps even higher than they think.
Increasingly, we are seeing how AI advantage compounds. In other words, how the enterprise AI rich
get richer and are likely to grow the gap between them and peers and competitors who are not
moving as swiftly with AI rather than see that gap shrink as the laggards start to catch up.
Like I said, for some of you, this may be intuitive, but the important thing is that we're
starting to see evidence of this. Now, before we fully get into it, over the last week, we've talked
about a couple of different reports. We've talked about the State of Enterprise AI report from
OpenAI, as well as the Menlo State of Generative AI in the Enterprise report, which is their third annual,
and in the background, we are working on the analysis of the AI ROI benchmarking survey,
which ended up having people submit and quantify more than 5,000 use cases.
As this was all happening, we also got a new Pulse Survey from EY.
And since we haven't talked about that in a previous show, I want to talk about a couple of the highlights.
This was once again a poll of about 500 U.S. senior leaders,
and much of what they found is very similar to these other reports.
Quite simply, as they put it, the early promise of AI is no longer speculative.
96% of the leaders surveyed are seeing AI-driven productivity gains, with 57% of them seeing
significant gains.
96% of those surveyed are also seeing significant measurable improvements in overall financial
performance, with only 4% saying they're seeing no measurable improvements in overall financial
performance.
Now, interestingly, there are still lots of challenges.
One that they hone in on is what they call the attribution conundrum, where they say in
some cases it's difficult to specifically attribute those productivity gains directly to
AI.
88% of leaders said that AI-driven
productivity is a key metric that leaders at their organization are evaluated on, but about two-thirds,
65% said their organization struggles to tie certain productivity gains directly to AI adoption.
Also, in a good reminder that self-reporting and qualitative studies can only tell us so much,
EY did find that there was something of an over-optimistic streak among the leaders that they
surveyed in terms of how much budget they were actually going to put towards AI. For example,
in 2024, 65% of those surveyed said that they expected their organization to invest at least
a million dollars in AI. And while it wasn't far off, in 2025, 58% said that their organizations
actually did. There was a bit more of a gap among those who anticipated investing $10 million or more.
In 2024, a little over a third, 34% of leaders, said that their organizations would spend $10 million
on AI, but the number in reality was only 23%. Now, there are some other interesting statistics
from this, but I want to put them in the context of this larger compounding idea. So I'll sum up
with a paragraph from their introduction, which is so resonant based on everything else we've been
hearing recently. What separates leaders now is not the number of tools, but the discipline of
enterprise-wide integration. Successful businesses will move from isolated experiments to enterprise
transformation, weaving AI into how the business runs and embedding responsibility from the jump.
All right, so let's come back to this idea of AI advantage compounding and put it together
across all of these different sources. So first, let's talk about the usage gap, and the idea
that leaders are actually using AI differently. OpenAI called those in the 95th percentile
of adoption intensity frontier workers and frontier organizations. These frontier workers generate
six times as many messages as the median worker, and frontier organizations generate two times
as many messages per seat as the median enterprise. And importantly, this gap widens when you look at
complex tasks. Frontier workers are 10 times as active in analysis and calculations and 17 times as active
in coding compared to the median. Now, based on OpenAI's research, they find that as organizations
move from simple use to more mature complex use,
they move more and more of their work to these custom GPTs
because they become repositories of context and knowledge.
And so when you see that frontier organizations
are sending seven times as many messages to GPTs,
it means that they're not just chatting more,
but have fundamentally integrated AI into more complex workflows.
What's more, these more complex uses
are making up a growing portion of the total overall enterprise usage.
The number of weekly users of custom GPTs and projects was up 19x.
About a fifth of all enterprise messages now are going through custom GPTs or through projects.
Now, the next interesting thing to note is that there's evidence in a bunch of places that more usage begets more value in a nonlinear way.
So previewing some of the results from the AI ROI benchmarking survey, we divided impact into eight different types of benefits:
cost savings, time savings, increased revenue, new
capabilities, improved decision-making, risk reduction, and a couple of others. And we found that
respondents whose use cases spanned more benefit types reported higher ROI.
With three representing modest ROI gains and four representing
significant gains, those whose use cases had just one benefit type had a mean ROI of 3.13. Those who
reported four benefit types had a mean ROI of 3.35, and those who reported eight benefit types
saw a mean ROI of 3.65. OpenAI also identified this phenomenon of people with more intensive
AI use getting more value. Workers who save over 10 hours a week use about eight times as much
intelligence as those reporting zero hours saved. They're also using multiple models,
more tools, and AI across more types of work. They also found that workers who engage across
more different task types report more time saved than those using fewer task types. Specifically,
workers who engaged across seven task types reported five times as much time saved as those using
only four task types. Let's double click, however, on this idea of time savings as the metric.
As I mentioned, we divided things into eight different potential benefit types. And what we found is,
on the one hand, time savings is for sure the universal entry point to AI value. More than 76% of
respondents to the AI ROI benchmarking study reported time savings as at least one of the benefits across the
use cases that they quantified. However, time savings overall has a weaker correlation with high
ROI than some other categories of benefits. The strongest predictors of high ROI were in use
cases whose primary benefit was improved decision making, new capabilities, or increased revenue,
suggesting that as individuals and organizations move up the value chain from the simple surface
layer of time savings towards deeper, more complex and sophisticated
uses of AI, they are getting differentiated, again, nonlinear ROI value as compared to those
simpler use cases, which are the domain of many of the laggard organizations.
It turns out there is also a money side of this. According to the EY survey, organizations that
invested $10 million or more in AI were far more likely to see significant productivity
gains compared to those investing less than $10 million. For those investing less than $10 million,
52% said their organization had seen significant AI productivity gains, and that number jumped
to 71% for those investing $10 million or more. And the important thing is that the big
spenders seeing big results are then immediately plowing those gains back in to get further ahead.
96% of organizations that are seeing gains are then reinvesting them. 47% are reinvesting into
expanding their existing AI capabilities, 42% are putting it into developing new AI capabilities,
and 39% are putting it back into research and development.
Only 17% are reducing headcount,
and only 24% are returning capital to stakeholders.
And this might be the scariest part for the laggards.
The leaders aren't taking profits.
They're buying more AI.
47% of them are reinvesting gains into expanding their existing AI capabilities,
creating a flywheel that makes them impossible to catch.
And I believe that this reinvestment is poised to increase
the speed at which value compounds even further.
One of the things that we learned from the Menlo
study was that only 16% of enterprise deployments could really qualify as agentic.
In other words, systems where an LLM was actually planning and executing actions,
observing feedback, and adapting its behavior.
And even those that were agentic were very, very simple.
However, the reason that real agentic deployments are
still so nascent is that, even more than copilots, they require some actual organizational
infrastructure to be built to really get those gains.
Data needs to be organized, ready and accessible.
Specific tool calling needs to be wired into the design of systems
and to be able to plug into the systems that already exist, etc., etc.
Basically, organizations are learning that to really get the most out of agentic and autonomous AI,
they have to redesign the stack to support it.
However, they're starting to do that,
and once they're able to actually deploy autonomous agents
that can do bigger, more complex chunks of work,
the compounding flywheel that increases their separation from the laggards
is just going to move faster and move them farther ahead.
Effectively, you have advantage loops that compound in an increasing fashion as they move to scale.
Individuals build AI skills, save time, discover more and more advanced use cases, and get more value.
Those skilled individuals then create organizational momentum.
They start to embed AI into more complex workflows, which allow at the organization level
for the capture of more productivity gains, which get reinvested in AI capabilities,
and increasingly build structural advantages.
Those structural advantages are then used to reshape markets.
Structural advantages turn into new benchmarks for current offerings,
in terms of how fast you can produce them, how much you can produce, or at what cost.
But it's not just current products.
Remember, 39% of leading organizations are reinvesting into R&D,
and 42% are reinvesting in new AI capabilities,
meaning that in addition to producing the current crop of products and services faster, better, cheaper,
they're also innovating new product lines.
Those new product lines are going to give them revenue advantages,
all of which leads to more investment and a compounding competitive moat.
The point ultimately is that being behind or ahead in AI is not on a linear scale.
The organizations that are behind now are likely to get farther behind.
The organizations that are ahead now are likely to get farther ahead,
which is, of course, good news for the leaders and very bad news for the laggards.
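To make the "compounding, not linear" point concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than survey data: the starting value of 100, the laggard's fixed gain of 10 per period, and the leader's 10% growth per period are hypothetical numbers, and the function names are invented for this example. It only shows the shape of the argument: a leader who reinvests gains grows in proportion to current capability, while a laggard adds a fixed amount each period.

```python
def linear_path(start: float, gain_per_period: float, periods: int) -> list:
    """Laggard: adds the same fixed gain every period (hypothetical)."""
    return [start + gain_per_period * t for t in range(periods + 1)]


def compounding_path(start: float, growth_rate: float, periods: int) -> list:
    """Leader: gains are reinvested, so growth scales with current capability."""
    values = [start]
    for _ in range(periods):
        values.append(values[-1] * (1 + growth_rate))
    return values


def gap(leader: list, laggard: list) -> list:
    """Period-by-period difference between the two paths."""
    return [a - b for a, b in zip(leader, laggard)]


if __name__ == "__main__":
    # Hypothetical inputs, not figures from any of the reports discussed.
    leader = compounding_path(100.0, 0.10, 10)
    laggard = linear_path(100.0, 10.0, 10)
    for period, diff in enumerate(gap(leader, laggard)):
        print(f"period {period}: gap = {diff:.1f}")
```

Under these assumptions the two paths are effectively level after the first period, and from then on the gap grows by a larger amount every period, which is the dynamic the episode describes.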
And to top it all off, as I said, I think it's going to get even more dramatic,
as those leaders increasingly put the infrastructure in place that allows
them to fully tap into more autonomous and agentic AI.
Anyways, guys, as I've been watching all of these enterprise surveys, it feels to me like this is
one of the most important subtextual lessons that was really worth digging out in deeper fashion.
And so I hope you found this useful.
If you did, and if you happen to find yourself in one of those laggard organizations,
tell your colleagues why you're right to advocate for more determined and concerted AI
efforts.
For now, that's going to do it for today's AI Daily Brief.
Appreciate you listening or watching as always.
And until next time, peace.
