The AI Daily Brief: Artificial Intelligence News and Analysis - Will This OpenAI Update Make AI Agents Work Better?
Episode Date: December 15, 2025
Today’s episode breaks down OpenAI’s quiet adoption of Anthropic’s “skills” mechanism and why it could meaningfully change how AI agents work in practice. The discussion explains what skills are, how progressive disclosure improves efficiency and reliability, and why modular, shareable instruction folders may matter more than building ever-more complex agents. In the headlines: fallout from the White House executive order blocking state AI regulation, GOP infighting over AI policy, Nvidia H200 export approval to China and Beijing’s response, and early benchmark results for GPT-5.2.
Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, why OpenAI is adopting the skills mechanism and how it could improve agents.
Before that in headlines, the fallout from the latest White House executive order on AI.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, KPMG, Rovo, Robots and Pencils, and Blitzy.
To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts.
If you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai.
We can send you all the information you need.
Also at aidailybrief.ai, you can find out anything else you might need to know about the podcast.
We're going to be doing a few more days of this newsletter test this week before reviewing
and seeing what the plan is for January.
For now, like I said, you can find that all on aidailybrief.ai.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around
five minutes.
Last week, after a lot of behind-the-scenes discourse, some of which spilled into very public acrimony,
President Trump signed a highly contentious order attempting to block states from passing their own AI regulations.
Now, this is one of those classic debates that's about 100 things at once.
To take the administration at face value, this is about creating a single federal rulebook
as a necessary step to ensuring the U.S. can win the AI race.
But then, of course, underneath that, there are issues of the power relationship between the federal government and states.
That's one that's been big here in the U.S. for the last 250 years or so.
And there's also the sub-story of the GOP fracturing around Trump's alliance with AI technology
companies. A draft of the order had circulated in late November, sparking outrage on both sides
of the aisle. The executive order that ended up passing on Thursday was substantively identical
to the draft. That included the controversial measure of establishing a dedicated task force
within the DOJ to start a campaign of litigation against states with their own AI laws.
The order also instructed the Commerce Department to withhold federal broadband funding from states that had, in the words of the EO, onerous AI laws.
There are three big issues that the EO brings up when it comes to state-level regulations.
First, they said by definition it creates a patchwork of 50 different regulatory regimes that makes compliance, especially for startups, particularly challenging.
Second, the White House claims, quote, state laws are increasingly responsible for requiring entities to embed ideological bias within models.
Third, they say, state laws sometimes impermissibly regulate beyond
state borders, impinging on interstate commerce. Now, of course, the Democratic side of the aisle immediately
had a lot to say about this. Scott Wiener, who has been extensively involved in state AI legislation
in California, said, it's absurd for Trump to think he can weaponize the DOJ and Commerce to undermine
those state rights. If the Trump administration tries to enforce this ridiculous order, we will see
them in court. Federal Senator Brian Schatz has already sponsored a bill that would overturn the order.
Schatz drew on the criticism that this order blocks state law and replaces it with nothing,
commenting, Congress has a responsibility to get this technology right and quickly, but states must
be allowed to act in the public interest in the meantime. Now, as I mentioned before, the order also
triggered infighting for Republicans who are worried that AI will be a losing issue in the midterms.
Writes the Washington Post, populist forces within the Republican Party mounted an extensive
campaign to derail the action after a draft of the order leaked last month, arguing that fears over
AI's potential to automate jobs would undermine the party's message to workers. Now, the Post said,
a handful of tech leaders neutralized those fears for now,
convincing the president, a longtime real estate developer,
that burdensome regulation could cripple the industry.
White House AI czar David Sacks did take to Twitter slash X
to have some conciliatory words on at least a few of the concerns from the right.
He called them the four Cs: child safety, communities, creators, and censorship.
On child safety, he said,
preemption would not apply to generally applicable state laws.
So state laws requiring online platforms to protect children from online predators
or sexually explicit material would remain in effect.
On communities, he said, AI preemption would not apply to local infrastructure. In short, preemption would not force communities to host data centers they don't want. On creators, he said, copyright law is already federal, so there's no need for preemption here. Questions about how copyright law should be applied to AI are already playing out in the courts. That's where this issue will be decided. And on censorship, he claimed, as mentioned, the biggest threat of censorship is coming from certain blue states. Red states can't stop this. Only President Trump's leadership at the federal level can. Still, it does not seem all is resolved when it comes
to AI politics on the right. The Post describes a, quote, simmering rift between the populist
and tech factions of the Republican Party, with one source saying, it feels like millions of votes
across the country just got traded for thousands of VCs and tech-rich votes in regions
Republicans will never win. Now, moving over to another recent move. Last week, the president
announced that Nvidia's previous generation H200 chips would be approved for export, the first time
that unmodified Western versions of the chips had been approved in over three years. That news was
immediately followed by reports that Beijing was meeting with tech firms and considering how
tightly to restrict access. Basically, the strategic consideration for China is how much to allow
in these new chips, which could accelerate the output of their labs, versus to continue to focus
on their domestic chip industry, which while potentially slowing down those outputs in the short
term could create long-term resilience and independence. Speaking with Bloomberg on Friday,
AI czar David Sacks said, China's rejecting our chips. Apparently they don't want them. And I think
the reason for that is they want semiconductor independence.
Now, he cited Financial Times reporting here rather than inside communications.
Still, the comments highlight that the chip strategy may be too late.
The logic of granting access to H200s was largely that the U.S. needs to get ahead of China
developing their own advanced chips.
And if NVIDIA can't flood China with their chips, then that sort of puts the strategy
in jeopardy.
Nvidia, meanwhile, said, while we do not yet have results to report, it's clear that three
years of overbroad export controls fueled America's foreign competitors and cost U.S. taxpayers
billions of dollars. Added Sacks, what you see is China's not taking them because they want to prop up
and subsidize Huawei. That was part of our calculation of selling not the best but lagging chips to China
is that you can take market share away from Huawei. But I think the Chinese government has figured
that out and that's why they're not allowing them. To that point, Bloomberg is reporting that Beijing
is preparing a $70 billion package to incentivize domestic chipmaking. Final details including
target companies are still to be determined, but this could be the largest ever state-backed investment
in semiconductors. For comparison, $39 billion was allocated to the CHIPS Act subsidies in the U.S.
And the EU is currently putting together a $46 billion package for their domestic industry.
Moving over to models, GPT-5.2 has been out for a few days and the independent benchmarking results
are in. The model is now tied for leader in the overall Artificial Analysis Intelligence Index,
nuzzling up together with Gemini 3 Pro. On their coding index, the model also tied for first place
with Gemini 3 Pro, with Claude 4.5 Opus a couple of points behind.
For any of you who follow developers on X, the difference of opinion on Opus 4.5
versus all these models is exactly the sort of reason why you need to be skeptical of the overall
value of benchmarks. On their agentic index, GPT-5.2 is in second place to Opus 4.5, but slightly
ahead of Gemini 3 Pro. Overall, all these results really do is show that with 5.2, OpenAI
now has a credible competitor to the other big labs. It is not decidedly and clearly better
than the other models, but it is a meaningful bump from GPT-5 and 5.1.
Now, recent reporting suggested that Code Red would continue until next year, and these results,
I think, helped show why. Now, one particularly interesting result was on GDPval. That benchmark,
you might remember, was developed by OpenAI and seeks to measure agentic capabilities by giving
models real-world white-collar tasks with established economic value. Unlike some other benchmarks,
it measures end-to-end task completion. Artificial Analysis recently developed an independent AI evaluator for the
tasks that allows them to include GDPval in their assessment suite. When OpenAI
announced it, they were using real-world experts in addition to an experimental AI assessor.
On that benchmark, 5.2 managed to top the leaderboards, pulling ahead of Opus 4.5 by a decent margin.
I think people are still trying to wrap their head around GDPval and come to a common sense
understanding of just how valuable the benchmark is. But again, this just further solidifies to me
that there is a very tight, clear competition with the premier models of all the major foundation
labs. We will see if OpenAI can change that with their next release, which is anticipated in January.
However, that is going to do it for today's headlines.
Appreciate you listening or watching as always, and until next time, peace.
Hello, friends, if you've been enjoying what we've been discussing on the show,
you'll want to check out another podcast that I have had the privilege to host,
which is called You Can With AI from KPMG.
Season 1 was designed to be a set of real stories from real leaders,
making AI work in their organizations,
and now season 2 is coming and we're back with even bigger conversations.
This show is entirely focused on what it's like to actually drive
AI change inside your enterprise and has case studies, expert panels, and a lot more practical
goodness that I hope will be extremely valuable for you as the listener. Search You Can with AI on Apple,
Spotify, or YouTube and subscribe today. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential
of your team with AI-powered search, chat, and agents, or build your own agent with Studio.
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform,
so it's always working in the context of your work.
Connect Rovo to your favorite SaaS apps so no knowledge gets left behind.
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps
and delivers personalized AI insights from day one.
Rovo is already built into Jira, Confluence, and Jira Service Management Standard,
Premium, and Enterprise subscriptions.
Know the feeling when AI turns from tool to teammate. If you Rovo, you know.
Discover Rovo, your new AI teammate powered by Atlassian. Get started at R-O-V, as in victory, O dot com.
AI isn't a one-off project. It's a partnership that has to evolve as the technology does.
Robots & Pencils works side by side with clients to bring practical AI into every phase:
automation, personalization, decision support, and optimization. They prove what works through applied experimentation,
and build systems that amplify human potential.
As an AWS-certified partner with global delivery centers,
Robots & Pencils combines reach with high-touch service.
Where others hand off, they stay engaged,
because partnership isn't a project plan.
It's a commitment.
As AI advances, so will their solutions.
That's long-term value.
Progress starts with the right partner.
Start with Robots & Pencils at robotsandpencils.com
slash AI Daily Brief.
This episode is brought to you by Blitzy,
the Enterprise Autonomous Software Development Platform with infinite code context.
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale
codebases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform,
bringing in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the
final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool,
pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
Visit Blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
Welcome back to the AI Daily Brief.
Today we're getting a little bit more technical than we normally do.
But there's a reason for that.
One of the big themes of 2025 was supposed to be AI agents.
And while I would argue that that came true, it was a little bit more nuanced than I think
people thought it would be going into it.
I believe that the expectation was that we would see agents proliferate across the
enterprise.
Instead, what we got was, one, coding agents becoming the most important breakout category
in AI writ large.
And two, a lot of infrastructure and standards type work around how we build agents that
set us up for that sort of maturity and proliferation in the years to come.
Now, around that, one of the things that's been interesting is
to see how companies, even very fiercely competitive companies in the space,
have frequently decided over the course of the last year
to adopt each other's standards rather than trying to compete around standards.
We saw this, of course, with MCP,
which became a standard way adopted by Google and OpenAI and Microsoft,
even though it originated with Anthropic,
to allow LLMs and AI applications to access outside information,
and now it appears that something similar might be happening with skills.
At the end of last week,
a number of folks on Twitter slash X, including Simon Willison,
noticed that Anthropic's skills mechanism was starting to show up in the OpenAI ecosystem.
So let's talk about what skills are and why this could be a big deal.
Back in October, Anthropic introduced agent skills,
which they called a new way to build specialized agents using files and folders.
And at core, files and folders are what skills are.
Specifically, Anthropic writes that skills are organized folders of instructions,
scripts, and resources that agents can discover and load dynamically to
perform better at specific tasks. The goal is to allow general purpose agents to become specialized
agents in the context of the work that they're doing at the time. And in many ways, when Anthropic
introduced this, that seemed to be the goal. Instead of developers having to build this complicated,
balkanized, and fragmented landscape of custom-designed agents for every single different use
case, by making capabilities and knowledge composable and accessible on demand, a much
less fragmented landscape of generalized agents
could access those capabilities and knowledge when needed to become specialized agents.
A skill is basically a folder or a directory that contains a file called SKILL.md.
In other words, a markdown file.
That file has a name, a description, and instructions.
When an agent that has access to skills starts up,
it loads the names and descriptions of all installed skills into its system prompt.
And then when a relevant task comes up, Claude can read the full instructions.
This is what Anthropic calls progressive disclosure.
Claude only loads context when it needs it.
In other words,
Claude doesn't have to waste a bunch of time
loading up all the instructions in each skill.
It can just sort through that name and description metadata
to figure out which skills it should be accessing for a particular task.
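To make that first layer concrete, here is a minimal Python sketch of the startup step: it scans a skills directory, reads only the name and description from each SKILL.md, and builds a short listing that could be appended to a system prompt. It assumes the metadata sits in a simple "key: value" header between two --- lines at the top of the file; the episode only says the file has a name, a description, and instructions. The function names read_metadata and build_skill_index and the folder layout are hypothetical, chosen for illustration, and are not taken from Anthropic's or OpenAI's implementation.

```python
from pathlib import Path


def read_metadata(skill_file: Path) -> dict:
    """Read only the 'key: value' header at the top of a SKILL.md file.

    Assumption: the header is delimited by '---' lines. Reading stops at the
    closing '---', so the instruction body is never pulled into memory here.
    """
    meta = {}
    with skill_file.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # no header, so no metadata
        for line in fh:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def build_skill_index(skills_root: Path) -> str:
    """Return a short listing of installed skills for the system prompt."""
    entries = []
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_metadata(skill_file)
        if "name" in meta and "description" in meta:
            entries.append(
                f"- {meta['name']}: {meta['description']} "
                f"({skill_file.parent.name}/SKILL.md)"
            )
    return "Available skills:\n" + "\n".join(entries)
```

The listing carries one line per skill plus the path to its SKILL.md, so the agent can decide later whether the full instructions are worth reading.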
So layer one of progressive disclosure
is that basic metadata of a name and a description.
The second layer of detail is the actual body of the file,
with instructions, procedural knowledge, context, whatever it may be.
If there is additional content, that can also be bundled underneath, leading to a third level
of progressive disclosure. In that announcement post, Anthropic wrote, as skills grow in complexity,
they may contain too much context to fit into a single SKILL.md or context that's relevant only in
specific scenarios. In these cases, skills can bundle additional files within the skill directory
and reference them by name from SKILL.md. These additional linked files are the third level and beyond
of detail, which Claude can choose to navigate and discover only as needed. In the example they give,
which is a comprehensive PDF toolkit for extracting text and tables,
the second layer overview includes the lines:
for advanced features, JavaScript libraries, and detailed examples,
see reference.md, and, if you need to fill out a PDF form,
read forms.md and follow its instructions.
This is that bundling of additional content.
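The second and third layers can be sketched the same way. The Python below is an illustration under assumptions, not the actual mechanism in Claude or Codex: it assumes the SKILL.md header is delimited by --- lines, and the helper names load_instructions, referenced_files, and load_reference are hypothetical. Level two returns the instruction body; level three lists the bundled files the body mentions by name and reads one only on request.

```python
from pathlib import Path


def load_instructions(skill_dir: Path) -> str:
    """Level two: return the body of SKILL.md, skipping the metadata header."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    if text.startswith("---"):
        parts = text.split("---", 2)  # ['', header, body]
        if len(parts) == 3:
            return parts[2].strip()
    return text.strip()


def referenced_files(skill_dir: Path, instructions: str) -> list[str]:
    """Level three candidates: bundled files the instructions mention by name."""
    bundled = sorted(
        p.name for p in skill_dir.iterdir()
        if p.is_file() and p.name != "SKILL.md"
    )
    return [name for name in bundled if name in instructions]


def load_reference(skill_dir: Path, name: str) -> str:
    """Read one bundled file, and only if the skill actually points to it."""
    if name not in referenced_files(skill_dir, load_instructions(skill_dir)):
        raise ValueError(f"{name} is not referenced by this skill")
    return (skill_dir / name).read_text(encoding="utf-8")
```

In the PDF example, an agent filling out a form would call load_reference for forms.md and never touch reference.md, which is the whole point of the third level.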
So like I said, sometimes skills are going to include procedural knowledge,
sometimes they're going to include background and context,
sometimes they're going to include code.
For example, instead of Claude generating code to extract PDF form fields,
a skill might include a Python script that does it reliably.
So there are a bunch of theoretical benefits of this system.
Skill files are markdown files, meaning that anyone can write them.
This allows for customization without engineering.
If you can write instructions for a human, you can write instructions that become part of a skill.
The second benefit is efficiency.
Progressive disclosure means that context is only loaded when it's needed so that the user isn't
burning tokens on irrelevant instructions.
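A back-of-the-envelope Python sketch shows why this matters. Everything here is an assumption made for illustration: the four-characters-per-token rule of thumb is a crude estimate rather than a real tokenizer, the catalog sizes are made up, and the names estimate_tokens and startup_tokens are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def startup_tokens(skills: list[dict], progressive: bool) -> int:
    """Estimated tokens placed in context at startup, before any task is known."""
    total = 0
    for skill in skills:
        # Name and description are always loaded.
        total += estimate_tokens(f"{skill['name']}: {skill['description']}")
        if not progressive:
            # Without progressive disclosure, every full body is loaded too.
            total += estimate_tokens(skill["body"])
    return total


# Hypothetical catalog: 20 skills, each with a 90-character description
# and an 8,000-character instruction body. These sizes are invented.
catalog = [
    {"name": f"skill-{i:02d}", "description": "d" * 90, "body": "b" * 8000}
    for i in range(20)
]

if __name__ == "__main__":
    print("metadata only:", startup_tokens(catalog, progressive=True))
    print("everything up front:", startup_tokens(catalog, progressive=False))
```

With these invented sizes, metadata-only startup comes to 500 estimated tokens versus 40,500 when every body is loaded up front. The real ratio depends entirely on how long the actual skills are.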
There is the composability benefit in the fact that skills stack.
You can have multiple skills working
together instead of building single-purpose agents. Then there's reliability. We just mentioned that
coding example: skills can include code that runs deterministically, instead of it being
regenerated every single time. And finally, there's portability. Institutional knowledge gets captured
in a format that persists and can be transferred, meaning that new users or agents can access it
immediately. So basically, if the model context protocol is an open standard for allowing
LLMs to connect to external tools and data sources in a uniform way, skills are a standard
for specialized instructions and context that allow LLMs or agents to perform specialized tasks
without the user having to re-explain the process every time. Now, when skills came out,
there was a lot of excitement about them. AI engineering thought leader Simon Willison,
for example, wrote a post called Claude Skills are awesome, maybe a bigger deal than MCP.
Now, Simon's core argument comes down to efficiency and simplicity. Back in October, he wrote,
Model Context Protocol has attracted an enormous amount of buzz since its initial release back in November last year.
Over time, the limitations of MCP have started to emerge.
The most significant is in terms of token usage.
GitHub's official MCP on its own famously consumes tens of thousands of tokens of context.
And once you've added a few more to that, there's precious little space left for the LLM to actually do useful work.
Simon continued,
My own interest in MCPs has waned ever since I started taking coding agents seriously.
Almost everything I might achieve with an MCP can be handled by a CLI,
or command line interface instead.
LLMs know how to call cli-tool --help,
which means you don't have to spend many tokens
describing how to use them.
The model can figure it out later when it needs to.
Skills have the exact same advantage,
only now I don't even need to implement a new CLI tool.
I can drop a markdown file in describing how to do a task efficiently,
adding extra scripts only if they'll make things more reliable or efficient.
Now, trying to simplify this as much as possible,
basically what Simon is saying is that with MCP,
you have to build something for Claude to use a tool.
With a CLI,
Claude can just use tools that already exist. But with skills, Claude can just read instructions
you wrote and figure it out. And indeed, to Simon, as he puts it, the simplicity is the point.
He writes, one of the most exciting things about skills is how easy they are to share. I expect
many skills will be implemented as a single file. More sophisticated ones will be a folder with a few more.
Something I love about the design of skills is that there is nothing at all preventing them from being
used with other models. You can grab a skills folder right now, point Codex CLI or Gemini
CLI at it and say read pdf/SKILL.md and then create me a PDF describing this project, and it will work,
despite those tools and models having no baked in knowledge of the skills system. I expect we'll see a
Cambrian explosion of skills which will make this year's MCP rush look pedestrian by comparison. The
core simplicity of the skills design is why I'm so excited about it. Now in retrospect that looks a little
prophetic. Shawn Wang, swyx, wrote, I was skeptical when Simon Willison said that Claude Skills are
awesome, maybe a bigger deal than MCP, but early indications are this is correct. He then shared a talk
from the recent AI Engineer Code Summit, which he said is the fastest talk to ever pass 100,000
views on the AI Engineer channel. The talk, by the way, was about why we should stop building
agents and start building skills. The problem they identified was intelligent agents lack expertise,
genius without experiences, they put it. The solution is a new architecture with skills. A skill,
they say, is an expert in a folder. And the new App Store for AI are the skills that they can access.
The old way then are monolithic agents that have a separate agent for each domain,
hard-coded or prompted in context, and which doesn't improve over time,
while the new way, agents plus skills, are a general agent with many skills,
packaged in simple reusable folders that enable continuous and tangible learning.
Then at the end of last week, people started to notice skills showing up in the OpenAI ecosystem.
AI techie Arun writes,
OpenAI just quietly stole Anthropic's homework and it's brilliant.
OpenAI integrated Anthropic's skills mechanism into ChatGPT and Codex,
allowing the models to dynamically manage files like spreadsheets and PDFs.
This modular approach to agent capabilities is proving to be a foundational piece of next-gen
LLMs. Simon Willison also picked up on this. On Friday, he wrote,
OpenAI aren't talking about it yet, but it turns out they've adopted Anthropic's
brilliant skills mechanism in a big way. Skills are now live in both ChatGPT and their Codex
CLI tool. This was confirmed a couple days later by Tibo at OpenAI, who wrote,
We've added experimental support for skills and it combines well with GPT-5.2.
Already seeing some cool things in the wild that leverage skills in Codex.
I think about skills as an extension of AGENTS.md with progressive disclosure.
By the way, AGENTS.md was OpenAI's lightweight markdown standard
for providing AI coding agents specifically with project-specific instructions.
So thinking in a similar domain.
Now, in Simon's new post, he wrote,
one of the things that most excited me about Anthropic's new skills mechanism back in October
is how easy it looked for other platforms to implement.
A skill is just a folder with a markdown file and some optional extra resources and scripts,
so any LLM with the ability to navigate and read from a file system should be capable of using them.
It turns out, OpenAI are doing exactly that with skills support quietly showing up in both their
Codex CLI tool and now also in ChatGPT itself.
Now, so far, people are just starting to experiment and figure out how they work in OpenAI.
But as Simon summed up, when I first wrote about skills in October, I said they're awesome,
maybe a bigger deal than MCP.
The fact that it's just turned December and OpenAI have already leaned into them in a big way
reinforces to me I called that one correctly.
Hold aside Simon's
good call, this to me is continued evidence that it matters way more to these foundation lab companies
to move at the speed of development than to own the standard. Kishan wrote, OpenAI seems comfortable
to let Anthropic create standards like MCP and skills, then adopt them later. Skills are wonderfully simple,
and I wish all the CLI agents adopt the pattern. Look, even though 2025 was a big year for agents
in a lot of ways, it's still very clear that we are so barely scratching the surface of what's possible.
and one of the things that will accelerate us heading into 2026 is the common adoption of these mutual standards.
So super interesting stuff, excited to see what people go build with this.
For now, that is going to do it for today's AI Daily Brief.
Appreciate you listening or watching, as always.
And until next time, peace.
