The AI Daily Brief: Artificial Intelligence News and Analysis - Will This OpenAI Update Make AI Agents Work Better?

Starting point is 00:00:00 Today on the AI Daily Brief, why OpenAI is adopting the skills mechanism and how it could improve agents. Before that, in the headlines, the fallout from the latest White House executive order on AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. If you are interested in sponsoring the show, send us a note at sponsors at AIDailyBrief. We can send you all the information you need.

Starting point is 00:00:44 Also at AIDailyBrief.ai, you can find anything else you might need to know about the podcast. We're going to be doing a few more days of this newsletter test this week before reviewing and seeing what the plan is for January. For now, like I said, you can find that all on AIDailyBrief.ai. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Last week, after a lot of behind-the-scenes discourse, some of which spilled into very public acrimony, President Trump signed a highly contentious order attempting to block states from passing their own AI regulations.

Starting point is 00:01:15 Now, this is one of those classic debates that's about 100 things at once. To take the administration at face value, this is about creating a single federal rulebook as a necessary step to ensuring the U.S. can win the AI race. But then, of course, underneath that, there are issues of the power relationship between the federal government and states. That's one that's been big here in the U.S. for the last 250 years or so. And there's also the sub-story of the GOP fracturing around Trump's alliance with AI technology companies. A draft of the order had circulated in late November, sparking outrage on both sides of the aisle. The executive order that ended up passing on Thursday was substantively identical

Starting point is 00:01:49 to the draft. That included the controversial measure of establishing a dedicated task force within the DOJ to start a campaign of litigation against states with their own AI laws. The order also instructed the Commerce Department to withhold federal broadband funding from states that have, in the words of the EO, onerous AI laws. There are three big issues that the EO raises when it comes to state-level regulations. First, they said, a state-by-state approach by definition creates a patchwork of 50 different regulatory regimes that makes compliance, especially for startups, particularly challenging. Second, the White House claims, quote, state laws are increasingly responsible for requiring entities to embed ideological bias within models. Third, they say, state laws sometimes impermissibly regulate beyond state borders, impinging on interstate commerce. Now, of course, the Democratic side of the aisle immediately

Starting point is 00:02:33 had a lot to say about this. Scott Wiener, who has been extensively involved in state AI legislation in California, said, it's absurd for Trump to think he can weaponize the DOJ and Commerce to undermine states' rights. If the Trump administration tries to enforce this ridiculous order, we will see them in court. U.S. Senator Brian Schatz has already sponsored a bill that would overturn the order. Schatz drew on the criticism that this order blocks state law and replaces it with nothing, commenting, Congress has a responsibility to get this technology right and quickly, but states must be allowed to act in the public interest in the meantime. Now, as I mentioned before, the order also triggered infighting among Republicans who are worried that AI will be a losing issue in the midterms.

Starting point is 00:03:09 Writes the Washington Post, populist forces within the Republican Party mounted an extensive campaign to derail the action after a draft of the order leaked last month, arguing that fears over AI's potential to automate jobs would undermine the party's message to workers. Now, the Post said, a handful of tech leaders neutralized those fears for now, convincing the president, a longtime real estate developer, that burdensome regulation could cripple the industry. White House AI czar David Sacks did take to Twitter slash X to offer some conciliatory words on at least a few of the concerns from the right.

Starting point is 00:03:39 He called them the four Cs: child safety, communities, creators, and censorship. On child safety, he said preemption would not apply to generally applicable state laws, so state laws requiring online platforms to protect children from online predators or sexually explicit material would remain in effect. On communities, he said AI preemption would not apply to local infrastructure. In short, preemption would not force communities to host data centers they don't want. On creators, he said copyright law is already federal, so there's no need for preemption here. Questions about how copyright law should be applied to AI are already playing out in the courts, and that's where this issue will be decided. And on censorship, he claimed, as mentioned, the biggest threat of censorship is coming from certain blue states. Red states can't stop this. Only President Trump's leadership at the federal level can. Still, it does not seem all is resolved when it comes to AI politics on the right. The Post describes a, quote, simmering rift between the populist and tech factions of the Republican Party, with one source saying, it feels like millions of votes

Starting point is 00:04:32 across the country just got traded for thousands of VC and tech-rich votes in regions Republicans will never win. Now, moving over to another recent move. Last week, the president announced that Nvidia's previous-generation H200 chips would be approved for export to China, the first time that unmodified Western versions of the chips had been approved in over three years. That news was immediately followed by reports that Beijing was meeting with tech firms and considering how tightly to restrict access. Basically, the strategic consideration for China is how much to allow in these new chips, which could accelerate the output of their labs, versus continuing to focus on their domestic chip industry, which, while potentially slowing down those outputs in the short

Starting point is 00:05:10 term, could create long-term resilience and independence. Speaking with Bloomberg on Friday, AI czar David Sacks said, China's rejecting our chips. Apparently they don't want them. And I think the reason for that is they want semiconductor independence. Now, he cited Financial Times reporting here rather than inside communications. Still, the comments highlight that the chip strategy may be too late. The logic of granting access to H200s was largely that the U.S. needs to get ahead of China developing their own advanced chips. And if Nvidia can't flood China with their chips, then that sort of puts the strategy

Starting point is 00:05:39 in jeopardy. Nvidia, meanwhile, said, while we do not yet have results to report, it's clear that three years of overbroad export controls fueled America's foreign competitors and cost U.S. taxpayers billions of dollars. Added Sacks, what you see is China's not taking them because they want to prop up and subsidize Huawei. Part of our calculation in selling not the best but lagging chips to China is that you can take market share away from Huawei. But I think the Chinese government has figured that out and that's why they're not allowing them. To that point, Bloomberg is reporting that Beijing is preparing a $70 billion package to incentivize domestic chipmaking. Final details including

Starting point is 00:06:12 target companies are still to be determined, but this could be the largest-ever state-backed investment in semiconductors. For comparison, $39 billion was allocated to the CHIPS Act subsidies in the U.S., and the EU is currently putting together a $46 billion package for their domestic industry. Moving over to models, GPT-5.2 has been out for a few days and the independent benchmarking results are in. The model is now tied for the lead in the overall Artificial Analysis Intelligence Index, nuzzling up together with Gemini 3 Pro. On their coding index, the model also tied for first place with Gemini 3 Pro, with Claude Opus 4.5 a couple of points behind. For any of you who follow developers on X, the difference of opinion on Opus 4.5

Starting point is 00:06:50 versus all these models is exactly the sort of reason why you need to be skeptical of the overall value of benchmarks. On their agentic index, GPT-5.2 is in second place to Opus 4.5, but slightly ahead of Gemini 3 Pro. Overall, what all these results really show is that with 5.2, OpenAI now has a credible competitor to the other big labs' models. It is not decisively and clearly better than the other models, but it is a meaningful bump from GPT-5 and 5.1. Now, recent reporting suggested that Code Red would continue until next year, and these results, I think, help show why. Now, one particularly interesting result was on GDPval. That benchmark, you might remember, was developed by OpenAI and seeks to measure agentic capabilities by giving

Starting point is 00:07:31 models real-world white-collar tasks with established economic value; unlike some other benchmarks, it measures end-to-end task completion. Artificial Analysis recently developed an independent AI evaluator for the tasks that allows them to include GDPval in their assessment suite. When OpenAI announced it, they were using real-world experts in addition to an experimental AI assessor. On that benchmark, 5.2 managed to top the leaderboard, pulling ahead of Opus 4.5 by a decent margin. I think people are still trying to wrap their heads around GDPval and come to a common-sense understanding of just how valuable the benchmark is. But again, this just further solidifies to me that there is a very tight, clear competition between the premier models of all the major foundation

Starting point is 00:08:10 labs. We will see if OpenAI can change that with their next release, which is anticipated in January. However, that is going to do it for today's headlines. Appreciate you listening or watching as always, and until next time, peace. Hello, friends, if you've been enjoying what we've been discussing on the show, you'll want to check out another podcast that I have had the privilege to host, which is called You Can With AI from KPMG. Season 1 was designed to be a set of real stories from real leaders making AI work in their organizations,

Starting point is 00:08:43 and now Season 2 is coming and we're back with even bigger conversations. This show is entirely focused on what it's like to actually drive AI change inside your enterprise, and has case studies, expert panels, and a lot more practical goodness that I hope will be extremely valuable for you as the listener. Search You Can With AI on Apple, Spotify, or YouTube and subscribe today. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.

Starting point is 00:09:24 Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com. AI isn't a one-off project. It's a partnership that has to evolve as the technology does.

Starting point is 00:10:01 Robots and Pencils works side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS-certified partner with global delivery centers, Robots and Pencils combines reach with high-touch service. Where others hand off, they stay engaged, because partnership isn't a project plan. It's a commitment.

Starting point is 00:10:26 As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale

Starting point is 00:10:49 codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80%-plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.

Starting point is 00:11:19 Visit Blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Welcome back to the AI Daily Brief. Today we're getting a little bit more technical than we normally do, but there's a reason for that. One of the big themes of 2025 was supposed to be AI agents. And while I would argue that that came true, it was a little bit more nuanced than I think people expected going into it. I believe that the expectation was that we would see agents proliferate across the

Starting point is 00:11:52 enterprise. Instead, what we got was, one, coding agents becoming the most important breakout category in AI writ large, and two, a lot of infrastructure and standards-type work around how we build agents that sets us up for that sort of maturity and proliferation in the years to come. Now, around that, one of the things that's been interesting is to see how companies, even very fiercely competitive companies in the space, have frequently decided over the course of the last year

Starting point is 00:12:18 to adopt each other's standards rather than trying to compete around standards. We saw this, of course, with MCP, which originated with Anthropic but became a standard adopted by Google, OpenAI, and Microsoft for allowing LLMs and AI applications to access outside information, and now it appears that something similar might be happening with skills. At the end of last week, a number of folks on Twitter slash X, including Simon Willison,

Starting point is 00:12:42 noticed that Anthropic's skills mechanism was starting to show up in the OpenAI ecosystem. So let's talk about what skills are and why this could be a big deal. Back in October, Anthropic introduced Agent Skills, which they called a new way to build specialized agents using files and folders. And at their core, files and folders are what skills are. Specifically, Anthropic writes that skills are organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. The goal is to allow general-purpose agents to become specialized

Starting point is 00:13:15 agents in the context of the work that they're doing at the time. And in many ways, when Anthropic introduced this, that seemed to be the big idea. Instead of developers having to build a complicated, balkanized, and fragmented landscape of custom-designed agents for every single use case, making capabilities and knowledge composable and accessible on demand would let a much less fragmented set of general agents access those capabilities and that knowledge when needed to become specialized agents. A skill is basically a folder, or directory, that contains a file called SKILL.md. In other words, a markdown file.
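To make the folder idea concrete, here is a minimal Python sketch of how an agent harness might find installed skills on disk and read their metadata. It assumes the format Anthropic describes, where each skill is a directory whose SKILL.md opens with a YAML frontmatter block holding a `name` and a `description`. The `skills/` root, the example `pdf` skill, and the helper names `parse_frontmatter` and `discover_skills` are hypothetical, and the parser deliberately handles only flat `key: value` lines rather than full YAML.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md file into (metadata, body).

    Handles only a flat block of `key: value` lines between two `---`
    delimiters; a real harness would use a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def discover_skills(root: Path) -> list[dict[str, str]]:
    """Find every SKILL.md under root and return its name, description, and path."""
    skills = []
    for skill_file in sorted(root.rglob("SKILL.md")):
        meta, _body = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            skills.append({**meta, "path": str(skill_file)})
    return skills


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf_dir = Path(tmp) / "skills" / "pdf"  # hypothetical example skill
        pdf_dir.mkdir(parents=True)
        (pdf_dir / "SKILL.md").write_text(
            "---\n"
            "name: pdf\n"
            "description: Extract text and tables from PDFs and fill in PDF forms.\n"
            "---\n"
            "# PDF toolkit\n"
            "Instructions go here.\n",
            encoding="utf-8",
        )
        for skill in discover_skills(Path(tmp) / "skills"):
            print(skill["name"], "-", skill["description"])
```

Running it prints one line per discovered skill. That compact name-and-description summary is the only part kept at this stage; the body is parsed but discarded until a task calls for it.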

Starting point is 00:13:49 That file has a name, a description, and instructions. When an agent that has access to skills starts up, it loads the names and descriptions of all installed skills into its system prompt. And then, when a relevant task comes up, Claude can read the full instructions. This is what Anthropic calls progressive disclosure: Claude only loads context when it needs it. In other words, Claude doesn't have to waste a bunch of time

Starting point is 00:14:12 loading up all the instructions in each skill. It can just sort through that name and description metadata to figure out which skills it should access for a particular task. So layer one of progressive disclosure is that basic metadata of a name and a description. The second layer of detail is the actual body of the file, with instructions, procedural knowledge, context, whatever it may be. If there is additional content, that can also be bundled underneath, leading to a third level

Starting point is 00:14:37 of progressive disclosure. In the announcement post, Anthropic wrote, as skills grow in complexity, they may contain too much context to fit into a single SKILL.md, or context that's relevant only in specific scenarios. In these cases, skills can bundle additional files within the skill directory and reference them by name from SKILL.md. These additional linked files are the third level and beyond of detail, which Claude can choose to navigate and discover only as needed. In the example they give, which is a comprehensive PDF toolkit for extracting text and tables, the second-layer overview includes a line that reads, for advanced features, JavaScript libraries, and detailed examples,

Starting point is 00:15:11 see reference.md, and another that reads, if you need to fill out a PDF form, read forms.md and follow its instructions. This is that bundling of additional content. So like I said, sometimes skills are going to include procedural knowledge, sometimes they're going to include background and context, and sometimes they're going to include code. For example, instead of Claude generating code to extract PDF form fields, a skill might include a Python script that does it reliably.
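The three levels of progressive disclosure described above can be modeled in a few lines of Python. This is an illustrative sketch, not Anthropic's or OpenAI's implementation: skill contents live in an in-memory dictionary rather than on disk, the `Skill` and `SkillContext` names are hypothetical, the model's decision about which skill and which bundled file to open is simulated with direct method calls, and the PDF skill text only loosely echoes Anthropic's example. A regular expression stands in for the model noticing file names such as reference.md in the body.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # level 1: always visible in the system prompt
    body: str  # level 2: the SKILL.md instructions
    files: dict[str, str] = field(default_factory=dict)  # level 3: bundled files


class SkillContext:
    """Records which parts of which skills have entered the context window."""

    def __init__(self, skills: list[Skill]) -> None:
        self.skills = {s.name: s for s in skills}
        self.loaded: list[str] = []

    def system_prompt_section(self) -> str:
        # Level 1: only names and descriptions are loaded at startup.
        return "\n".join(f"- {s.name}: {s.description}" for s in self.skills.values())

    def activate(self, name: str) -> str:
        # Level 2: the full body is read only once a relevant task comes up.
        self.loaded.append(f"{name}/SKILL.md")
        return self.skills[name].body

    def linked_files(self, name: str) -> list[str]:
        # Bundled files the body mentions by name, in order, without duplicates.
        skill = self.skills[name]
        mentioned = re.findall(r"\b[\w-]+\.\w+\b", skill.body)
        return [f for f in dict.fromkeys(mentioned) if f in skill.files]

    def open_file(self, name: str, filename: str) -> str:
        # Level 3 and beyond: a bundled file is read only when actually needed.
        self.loaded.append(f"{name}/{filename}")
        return self.skills[name].files[filename]


if __name__ == "__main__":
    pdf = Skill(
        name="pdf",
        description="Extract text and tables from PDFs; fill in PDF forms.",
        body=(
            "For advanced features, JavaScript libraries, and detailed examples, "
            "see reference.md. If you need to fill out a PDF form, read forms.md "
            "and follow its instructions."
        ),
        files={
            "reference.md": "Advanced usage notes.",
            "forms.md": "Step-by-step form-filling instructions.",
            "extract_fields.py": "# deterministic form-field extraction script",
        },
    )
    ctx = SkillContext([pdf])
    print(ctx.system_prompt_section())  # only this is in context at startup
    ctx.activate("pdf")  # a PDF task arrives, so the body is loaded
    print(ctx.linked_files("pdf"))  # ['reference.md', 'forms.md']
    ctx.open_file("pdf", "forms.md")  # only a form-filling task needs this file
    print(ctx.loaded)  # ['pdf/SKILL.md', 'pdf/forms.md']
```

The `loaded` log makes the efficiency argument visible: for a form-filling task only two of the skill's files ever enter context, while reference.md and the bundled script stay out unless something calls for them.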

Starting point is 00:15:34 So there are a bunch of theoretical benefits to this system. First, skill files are markdown files, meaning that anyone can write them. This allows for customization without engineering. If you can write instructions for a human, you can write instructions that become part of a skill. The second benefit is efficiency. Progressive disclosure means that context is only loaded when it's needed, so the user isn't burning tokens on irrelevant instructions. There's also the composability benefit, in that skills stack.

Starting point is 00:16:00 You can have multiple skills working together instead of building single-purpose agents. Then there's reliability: as in that PDF example, skills can include code that runs deterministically instead of being regenerated every single time. And finally, there's portability. Institutional knowledge gets captured in a format that persists and can be transferred, meaning that new users or agents can access it immediately. So basically, if the Model Context Protocol is an open standard for allowing LLMs to connect to external tools and data sources in a uniform way, skills are a standard for specialized instructions and context that allow LLMs or agents to perform specialized tasks

Starting point is 00:16:35 without the user having to re-explain the process every time. Now, when skills came out, there was a lot of excitement about them. AI engineering thought leader Simon Willison, for example, wrote a post called Claude Skills are awesome, maybe a bigger deal than MCP. Simon's core argument comes down to efficiency and simplicity. Back in October, he wrote, Model Context Protocol has attracted an enormous amount of buzz since its initial release back in November last year. Over time, the limitations of MCP have started to emerge. The most significant is in terms of token usage. GitHub's official MCP server on its own famously consumes tens of thousands of tokens of context.

Starting point is 00:17:09 And once you've added a few more to that, there's precious little space left for the LLM to actually do useful work. Simon continued, my own interest in MCPs has waned ever since I started taking coding agents seriously. Almost everything I might achieve with an MCP can be handled by a CLI, or command-line interface, instead. LLMs know how to call cli-tool --help, which means you don't have to spend many tokens describing how to use them.

Starting point is 00:17:30 The model can figure it out later when it needs to. Skills have the exact same advantage, only now I don't even need to implement a new CLI tool. I can drop a markdown file in describing how to do a task efficiently, adding extra scripts only if they'll make things more reliable or efficient. Now, trying to simplify this as much as possible, basically what Simon is saying is that with MCP, you have to build something for Claude to use a tool.

Starting point is 00:17:51 With a CLI, Claude can just use tools that already exist. But with skills, Claude can just read instructions you wrote and figure it out. And indeed, to Simon, as he puts it, the simplicity is the point. He writes, one of the most exciting things about skills is how easy they are to share. I expect many skills will be implemented as a single file. More sophisticated ones will be a folder with a few more. Something I love about the design of skills is that there is nothing at all preventing them from being used with other models. You can grab a skills folder right now, point Codex CLI or Gemini CLI at it, and say, read pdf/SKILL.md and then create me a PDF describing this project, and it will

Starting point is 00:18:26 work, despite those tools and models having no baked-in knowledge of the skills system. I expect we'll see a Cambrian explosion in skills which will make this year's MCP rush look pedestrian by comparison. The core simplicity of the skills design is why I'm so excited about it. Now, in retrospect, that looks a little prophetic. Shawn Wang, aka swyx, wrote, I was skeptical when Simon Willison said that Claude Skills are awesome, maybe a bigger deal than MCP, but early indications are this is correct. He then shared a talk from the recent AI Engineer Code Summit, which he said is the fastest talk ever to pass 100,000 views on the AI Engineer channel. The talk, by the way, was about why we should stop building agents and start building skills. The problem they identified was that intelligent agents lack expertise:

Starting point is 00:19:06 genius without experience, as they put it. The solution is a new architecture with skills. A skill, they say, is an expert in a folder, and skills are the new app store for AI. The old way, then, is monolithic agents: a separate agent for each domain, hard-coded or prompted in context, that doesn't improve over time. The new way, agents plus skills, is one general agent with many skills, packaged in simple, reusable folders that enable continuous and tangible learning. Then, at the end of last week, people started to notice skills showing up in the OpenAI ecosystem. AI techie Arun writes,

Starting point is 00:19:40 OpenAI just quietly stole Anthropic's homework and it's brilliant. OpenAI integrated Anthropic's skills mechanism into ChatGPT and Codex, allowing the models to dynamically manage files like spreadsheets and PDFs. This modular approach to agent capabilities is proving to be a foundational piece of next-gen LLMs. Simon Willison also picked up on this. On Friday, he wrote, OpenAI aren't talking about it yet, but it turns out they've adopted Anthropic's brilliant skills mechanism in a big way. Skills are now live in both ChatGPT and their Codex CLI tool. This was confirmed a couple of days later by Tibo at OpenAI, who wrote,

Starting point is 00:20:12 We've added experimental support for skills, and it combines well with GPT-5.2 too. Already seeing some cool things in the wild that leverage skills in Codex. I think about skills as an extension of AGENTS.md with progressive disclosure. By the way, AGENTS.md was OpenAI's lightweight markdown standard for providing AI coding agents with project-specific instructions. So, thinking in a similar domain. Now, in Simon's new post, he wrote, one of the things that most excited me about Anthropic's new skills mechanism back in October

Starting point is 00:20:39 is how easy it looked for other platforms to implement. A skill is just a folder with a markdown file and some optional extra resources and scripts, so any LLM with the ability to navigate and read from a file system should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself. Now, so far, people are just starting to experiment and figure out how they work in OpenAI. But as Simon summed up, when I first wrote about skills in October, I said they're awesome, maybe a bigger deal than MCP.

Starting point is 00:21:08 The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called that one correctly. Good call, Simon. This, to me, is continued evidence that it matters way more to these foundation lab companies to move at the speed of development than to own the standard. Kishan wrote, OpenAI seems comfortable letting Anthropic create standards like MCP and skills, then adopting them later. Skills are wonderfully simple, and I wish all the CLI agents would adopt the pattern. Look, even though 2025 was a big year for agents in a lot of ways, it's still very clear that we are barely scratching the surface of what's possible.

Starting point is 00:21:41 And one of the things that will accelerate us heading into 2026 is the common adoption of these mutual standards. So, super interesting stuff; excited to see what people go build with this. For now, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.

