The AI Daily Brief: Artificial Intelligence News and Analysis - How to Use Agent Skills

Episode Date: March 18, 2026

The team behind Claude Code's agent skills shares lessons on building, testing, and organizing skills, and the concept is converging across the entire AI stack, from hardcore developers to main...stream tools like Notion. Whether you're orchestrating multi-agent teams or just trying to get an AI to reliably do one task your way, skills represent a shift from ad hoc prompting to reusable, repeatable capabilities. In the headlines: Claude Cowork gets mobile control via Dispatch, China's government grows wary of OpenClaw, and Andy Jassy sees AI doubling AWS revenue.

Brought to you by:
KPMG - Agentic AI is powering a potential $3 trillion productivity shift, and KPMG's new paper, Agentic AI Untangled, gives leaders a clear framework to decide whether to build, buy, or borrow. Download it at www.kpmg.us/Navigate
Mercury - Modern banking for business and now personal accounts. Learn more at https://mercury.com/personal-banking
AIUC-1 - Get your agents certified to communicate trust to enterprise buyers - https://www.aiuc-1.com/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, how the team that designed agent skills uses agent skills, and before that in the headlines, you can now control Claude Cowork from your phone. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Blitzy, AIUC, and Mercury. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. Ad-free is just $3 a month.
Starting point is 00:00:37 If you are interested in sponsoring the show, send us a note at sponsors at aidailybrief.ai. At this point, we are firmly selling into the summer, so if you are planning campaigns in the future, it is a good time to reach out. And of course, if you need to know anything else about the ecosystem, you can also find that on aidailybrief.ai. I would once again point you to the newsletter,
Starting point is 00:00:55 which is back, and is basically the best way to get access to the links that I talk about in the show. Again, you can get that all on aidailybrief.ai. And with that out of the way, let's dive in. One of the interesting ways that you can tell what's really important to AI builders and people on the front lines is when there's a story that on the surface looks fairly small, but which is getting a disproportionate share of the conversation in AI circles. Our first story today is exactly that. On the surface, it's just a
Starting point is 00:01:26 simple new feature for Claude Cowork. In this case, it's called Dispatch, and it allows you to bring your Claude Cowork with you on the go. That said, based on the reaction, 3 million views on the announcement tweet, 9,000 bookmarks, this one is a big deal to people. In the wake of OpenClaw, companies in the agent space have either been, A, releasing their own versions of OpenClaw, that was obviously the topic of our show yesterday, or they've been slowly adding the important features of OpenClaw to their existing product suites, which has been, of course, Anthropic's approach. A couple weeks ago, we got remote control for Claude Code, which allowed users to initiate Claude sessions on their computer and then carry them onto their mobile devices where they could
Starting point is 00:02:04 control them doing whatever it was that they were doing, basically coding from the gym. Dispatch is basically that but for Cowork. The Cowork sessions are still hosted in a sandbox on your computer, meaning Claude still has the same access and protections. However, you can now kick off a Cowork session and then continue monitoring progress and providing approvals while out and about. Anthropic described the feature as like having a walkie-talkie for communicating with Claude. Cowork developer Felix Rieseberg wrote, it feels pretty magical to give Claude a mission on my computer and get occasional updates like creating reports from internal dashboards or finding me a better seat on my next flight. Everything Claude can do on your computer, files,
Starting point is 00:02:38 browser tools are reachable from wherever you go. First impressions are good. Daniel San writes, testing Cowork from my phone. The walkie-talkie analogy is spot on. Your phone becomes a remote control that talks to Claude running on your desktop. One more to the weekend testing list. Stay tuned, post incoming on how it works. Ethan Mollick writes, after using it a bit, Claude Cowork Dispatch covers 90% of what I was trying to use OpenClaw for, but feels far less likely to upload my entire drive to a malware site. He continues, what I like better: easy, much more stable and safe.
Starting point is 00:03:09 Existing connectors mean better integration with Gmail, browsers, etc., very good tool use. What is missing for me: ability to invite Claude to any channel, the heartbeat and proactivity, and multiple sessions. Right now, Dispatch is one chat. Now, for hardcore OpenClaw users, all of those things would be deal breakers, but this isn't necessarily about converting hardcore OpenClaw users, it's about bringing those types of feature sets to the full spectrum of tools for all the different
Starting point is 00:03:32 types of agent users. Indeed, I think Powell Heron gets it right when he writes, the bigger story: Code, Cowork, web, and now Dispatch are all converging towards the same thing, a persistent AI layer that follows you across devices and contexts. I think that is exactly right. This Claw-ification that we keep talking about is actually just form factor adjustments as everyone figures out the right way for people to interact with agents across a variety of different use cases and behavior patterns. Speaking of OpenClaw, one of the things that we've been tracking is the rise of OpenClaw in China. You might remember seeing a bunch of viral videos about people standing in line to get access to their first OpenClaws,
Starting point is 00:04:09 supported by some of the big Chinese tech companies, but apparently the Chinese government is now growing concerned. In recent weeks, regulators warned staff at government agencies and state-owned enterprises of the dangers of OpenClaw and advised them not to install the agent. This seems to be somewhere between a stern warning and an outright ban across different regions and entities. Last week, authorities released a list of six do's and don'ts for organizations deploying OpenClaw. Among their suggestions were using the official version and minimizing internet access and permissions. Adoption is so pervasive that the Hong Kong Monetary Authority,
Starting point is 00:04:40 which is basically their central bank, issued an official statement that they had no plans to deploy OpenClaw on their internal IT systems. Chinese media is now running OpenClaw horror stories regarding privacy leaks and financial screw-ups, with one user apparently giving their OpenClaw access to a credit card, which was promptly run up to the limit. Wendy Chang, a senior analyst at the Mercator Institute for China Studies, believes that OpenClaw has a natural cultural resonance in China. She said most people view technology as a convenience, so when something new comes out, they're more willing to try it. Some have suggested OpenClaw being free and open source has a major role to play in its popularity. Many analysts have noted that Chinese tech firms
Starting point is 00:05:13 have struggled to monetize their models among consumers, as the concept of software subscriptions is far less developed in the East. Stanford professor Graham Webster, who focuses on geopolitics and tech, but who before that was my housemate and entrepreneurial collaborator at Northwestern back 20 years ago, suggested that the rise of OpenClaw could be a flashpoint for China's AI industry. Until now, any and all experiments have been encouraged under a formal regional initiative called AI Plus. However, the clear privacy and security concerns could trigger a rethink, according to Webster, who said, it could be a moment that starts to cause the Chinese government to think about the downsides of widely available open models. It feels to me like there's an
Starting point is 00:05:48 interesting story brewing here, although I'm still not exactly sure what it is and what it says about where we are, but it's something that I'm going to continue to pay attention to. One flag related to that. While in general, optimism about AI is way higher in China than it is in the U.S., there was a huge spike in the term AI anxiety on WeChat in February, peaking in mid-March as OpenClaw mania hit a crescendo. Tony Peng of Recode China AI wrote, What is different this time is the mood. In those earlier waves, the mainstream mood was excitement, awe, and curiosity. This time, more and more people are expressing anxiety, fear, and concern. Tony argues that the most obvious reason is job insecurity. He writes, for most ordinary people in China, AI still means chatbots. Claude Code or Codex is not available.
Starting point is 00:06:29 There's no household AI agent with real penetration. Then all of a sudden, media reports are claiming OpenClaw can handle a wide range of tasks autonomously, and the gap between what people knew and what they were being told deepened the sense of being left behind. In other words, even in a place with high AI optimism, the job displacement fear persists. Now, separately, Chinese authorities are taking a second look at Meta's acquisition of Manus. From the outset, it seemed that Manus had designed their corporate structure to circumvent controls on Chinese tech exports. The company relocated their headquarters from Beijing to Singapore in July of last year, shortly after they began taking capital from U.S. venture firms.
Starting point is 00:07:04 Sources said that officials at China's National Development and Reform Commission called executives from Meta and Manus to a meeting last week to express concerns over the deal. Government actions remain unclear, but they appear to include an effort to bar Manus executives from departing China for Singapore. The New York Times discussed a range of different options that Chinese officials might pursue, including clawing back data exports or declaring the relocation unlawful. This could be a reaction to growing concerns about losing AI talent to the West. However, some analysts have suggested it's just a maneuver to create leverage ahead of trade talks later this month. Meta is trying to present themselves as unconcerned, with a spokesperson stating,
Starting point is 00:07:38 the transaction complied fully with applicable law. The outstanding team at Manus is now deeply integrated into Meta. We appreciate the appropriate resolution to the inquiry. One last one on China, Nvidia says it's restarting production as Chinese export plans get back on track. Speaking at a press conference on Tuesday, Jensen Huang said, We've been licensed for many customers in China. We've received purchase orders from many customers and we're in the process of restarting our manufacturing. Our supply chain is getting fired up. Now, the process for getting export approval for H200s has been an on-again, off-again affair
Starting point is 00:08:07 since the idea was floated by President Trump back in December. The most recent chatter from the beginning of March was that Nvidia would shut down production and reallocate the fab time to producing next-generation Vera Rubin hardware. No single catalyst was attributed to the decision, but export plans have seen multiple setbacks from both Beijing and Washington in recent months. Huang suggested on Tuesday that the squabbling within the Trump administration had been settled, commenting, President Trump's intention is that the United States should have a leadership position
Starting point is 00:08:32 in access to Nvidia's best technology. However, he would like us to compete worldwide and not concede those markets unnecessarily. Reuters, meanwhile, reported that it's all systems go from the Chinese side as well. Sources familiar with the situation confirmed that Chinese authorities had granted approval for multiple companies to purchase H200s. Earlier reports suggested demand was staggering, with multiple Chinese firms placing orders for hundreds of thousands of chips. That demand could go towards explaining Huang's new forecast that Nvidia could see a trillion dollars in sales by 2027. Lastly today, speaking of that big prediction from Jensen about revenue, Amazon CEO Andy Jassy also sees AI doubling revenue for AWS. According to Reuters sources, Jassy shared the lofty
Starting point is 00:09:12 projection with staff at a recent all-hands. He said that over the long term, AI could boost annual sales for AWS to $600 billion, double his prior estimate. Jassy said, I've been thinking for the last number of years that AWS, call it 10 years from now, could be a $300 billion annual revenue run-rate business. I think what's happening in AI that AWS has a chance to be at least double that. AWS most recently booked $128 billion in sales for 2025, 19% growth from the prior year, and while the numbers that he's throwing around seem big, the prediction might not be all that extravagant. This would represent 17% annual growth for the coming decade.
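The 17% figure can be checked with a compound annual growth rate calculation. The sketch below uses the $128 billion and $600 billion numbers quoted above; treating "the coming decade" as exactly ten compounding periods is an assumption, and the function name is illustrative only.

```python
# Implied compound annual growth rate (CAGR) for the AWS projection.
start_revenue = 128e9   # 2025 sales quoted in the episode
target_revenue = 600e9  # long-term figure attributed to Jassy
years = 10              # assumption: "the coming decade" = 10 compounding periods


def cagr(start: float, end: float, periods: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / periods) - 1


implied_growth = cagr(start_revenue, target_revenue, years)
print(f"Implied annual growth: {implied_growth:.1%}")
```

Under those assumptions the result rounds to 17%, which is slightly below the 19% growth AWS reported for 2025.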
Starting point is 00:09:47 Analyst Patrick Moorhead writes, in my view, this is the clearest signal yet that hyperscale cloud is entering a second growth phase that dwarfs the first. Net, AI is repricing the entire cloud total addressable market upward. Brock, meanwhile, points out, if AI genuinely doubles AWS revenue to $600 billion by 2036, then Amazon will emerge as one of the biggest beneficiaries
Starting point is 00:10:07 of the entire AI build-out without even having to build the models themselves. Interesting stuff going on, but that is going to do it for today's headlines. Next up, the main episode. Agentic AI is powering a $3 trillion productivity revolution, and leaders are hitting a real decision point. Do you build your own AI agents, buy off the shelf, or borrow by partnering to scale faster? KPMG's latest thought leadership paper, Agentic AI Untangled: Navigating the Build, Buy, or Borrow Decision, does a great job cutting through the noise with a practical framework to help you choose based on value, risk, and readiness.
Starting point is 00:10:44 And how to scale agents with the right trust, governance, and orchestration foundation. Don't lock in the wrong model. You can download the paper right now at www.kpmg.us slash navigate. Again, that's www.kpmg.us slash navigate. If you're looking to adopt an agentic SDLC, Blitzy is the key to unlocking unmatched engineering velocity. Blitzy's differentiation starts with infinite code context. Thousands of specialized agents ingest millions of lines of your code in a single pass,
Starting point is 00:11:12 mapping every dependency. With a complete contextual understanding of your code base, enterprises leverage Blitzy at the beginning of every sprint to deliver over 80% of the work autonomously. Enterprise-grade, end-to-end tested code that leverages your existing services, components, and standards. This isn't AI autocomplete. This is spec and test-driven development at the speed of compute. Schedule a technical deep dive with our AI experts at blitzy.com. That's B-L-I-T-Z-Y dot com.
Starting point is 00:11:37 There's a new standard that I think is going to matter a lot for the enterprise AI agent space. It's called AIUC-1, and it bills itself as the world's first AI agent standard. It's designed to cover all the core enterprise risks, things like data and privacy, security, safety, reliability, accountability, and societal impact, all verified by a trusted third party. One of the reasons it's on my radar is that ElevenLabs, who you've heard me talk about before and is just an absolute juggernaut right now, just became the first voice agent to be certified against AIUC-1 and is launching a first-of-its-kind insurable AI agent.
Starting point is 00:12:13 What that means in practice is real-time guardrails that block unsafe responses and protect against manipulation, plus a full safety stack. This is the kind of thing that unlocks enterprise adoption. When a company building on ElevenLabs can point to a third-party certification and say our agents are secure, safe, and verified, that changes the conversation. Go to aiuc-1.com to learn about the world's first standard for AI agents. That's aiuc-1.com. This episode is brought to you by Mercury, radically different banking, now available for personal accounts. I already use Mercury for my business. So when they introduced personal accounts, it made immediate sense for me. I try to bring the same level of intention to my personal finances that I bring to building companies, and most traditional banks just do not feel designed for that.
Starting point is 00:12:52 With Mercury Personal, you can toggle between business and personal in a click. You can set up sub-accounts for specific goals, automate transfers so projects and savings fund themselves, and put idle cash to work with high-yield savings, all without friction. It's built for people who care about how their money moves and want tools that actually keep up. Visit mercury.com slash personal to learn more. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.
Starting point is 00:13:18 Welcome back to the AI Daily Brief. Today we are doing a bit more of a practical hands-on style episode. It was inspired by this post from Thariq over at the Claude Code team at Anthropic called Lessons from Building Claude Code: How We Use Skills, and the context for this is that if you take away one theme from pretty much all of 2026's episodes so far, it's that we are moving into a much more agentic era of AI. Skills are a key component of how to get value out of agents. And so today we're going to first give a little bit of a background of what skills are, talk about some of these lessons and best practices from the team at Claude
Starting point is 00:14:00 and then share a few more resources where you can take the conversation further. First of all, let's talk about what skills are. The official GitHub repo calls them a simple, open format for giving agents new capabilities and expertise. Skills are folders of instructions, scripts, and resources that agents can discover and use to perform better at specific tasks. Write once, use everywhere. The background is this. As AI coding agents were getting more and more capable throughout 2025, people started to hit a very similar wall, which was basically that system prompts kept ballooning. Every new capability meant more instructions, more examples, and more edge cases crammed into a single context window. Of course, the more you try to jam into a
Starting point is 00:14:38 context window, the more you're going to have performance degradation. Having to juggle all of that knowledge all at once was crowding out space for actual execution on the task at hand. That led to agents getting slower, more expensive, and less reliable. Now, the insight that ended up driving skills was that agents don't need access to all of their knowledge all the time. What they need is to be able to load the right knowledge at the right moment. On October 16th, Anthropic officially announced skills in a blog post. The post was called Equipping Agents for the Real World with Agent Skills, and framed the issue as this. Claude is powerful, but real work requires procedural knowledge and organizational context. They write, as model capabilities improve, we can now build general
Starting point is 00:15:17 purpose agents that interact with full-fledged computing environments. Claude Code, for example, can accomplish complex tasks across domains using local code execution and file systems. But as these agents become more powerful, we need more composable, scalable, and portable ways to equip them with domain-specific expertise. This led us to create Agent Skills, organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. So what a skill actually is, is a directory anchored by a markdown file. Every skill directory is going to have a SKILL.md file that's going to have some required metadata like a name and a short description. When agents have access to skills, rather than having to have all of the context
Starting point is 00:15:57 all at once, they simply load up the name and the description. The idea of progressive disclosure in skills is to give the agent just the information that it needs in order to make good decisions without overloading its context. So basically the first layer of detail is just the short description, which means that when the agent is doing a task, it has those descriptions in mind and can go call up that skill if it seems like it would be useful. The second level of detail in this progressive disclosure regime is the actual body of the SKILL.md file.
Starting point is 00:16:25 If the agent thinks that that skill is going to be useful, it'll move from just reading the description to reading the contents of that SKILL.md. Now, the SKILL.md body is still very small. While the metadata is tiny at roughly 100 tokens per skill, even the full SKILL.md body is recommended to stay pretty small. This leads to the third level of detail in progressive disclosure. Basically, as skills grow in complexity, they also might have context that's relevant only in specific scenarios.
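A rough sketch of the three levels of progressive disclosure described above, assuming each skill is a folder whose SKILL.md starts with a simple `key: value` front matter block delimited by `---` lines. The function names, the `weekly-recap` skill, and its files are hypothetical, and this is not how any particular agent harness actually implements loading.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillMetadata:
    name: str
    description: str
    path: Path  # folder that contains SKILL.md


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (metadata, body); handles flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter: the rest is the body
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def discover(skills_root: Path) -> list[SkillMetadata]:
    """Level 1: keep only each skill's name and description in memory."""
    found = []
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            found.append(SkillMetadata(meta["name"], meta["description"], skill_file.parent))
    return found


def load_body(skill: SkillMetadata) -> str:
    """Level 2: read the SKILL.md body only once the skill looks relevant."""
    _, body = split_frontmatter((skill.path / "SKILL.md").read_text(encoding="utf-8"))
    return body


def list_resources(skill: SkillMetadata) -> list[str]:
    """Level 3: bundled files the agent may open if the body points to them."""
    return sorted(
        p.relative_to(skill.path).as_posix()
        for p in skill.path.rglob("*")
        if p.is_file() and p.name != "SKILL.md"
    )


# Demo with a hypothetical skill written to a temporary folder.
root = Path(tempfile.mkdtemp())
skill_dir = root / "weekly-recap"
(skill_dir / "scripts").mkdir(parents=True)
(skill_dir / "SKILL.md").write_text(
    "---\n"
    "name: weekly-recap\n"
    "description: Summarize merged PRs, closed tickets, and deploys.\n"
    "---\n"
    "# Weekly recap\n"
    "Run scripts/collect.py, then fill in the template.\n",
    encoding="utf-8",
)
(skill_dir / "scripts" / "collect.py").write_text("print('collect')\n", encoding="utf-8")

skills = discover(root)
print([(s.name, s.description) for s in skills])  # level 1
print(load_body(skills[0]))                       # level 2
print(list_resources(skills[0]))                  # level 3
```

The point of the design is that only the short descriptions are always present; the body and the bundled files cost context only when the agent decides to read them.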
Starting point is 00:16:52 And in fact, this is a really important part that gets missed. In the article from Anthropic's Thariq that we're going to come back to, he writes, A common misconception we hear about skills is that they are just markdown files, but the most interesting part of skills is that they're not just text files. They're folders that can include scripts, assets, data, etc., that the agent can discover, explore, and manipulate. Basically, you can bundle additional context in the form of other markdown files or references
Starting point is 00:17:17 or scripts that get linked out to from the SKILL.md file. The analogy, they say, is a well-organized manual that starts with a table of contents, then specific chapters, and finally a detailed appendix. Almost immediately, skills began being adopted outside of just the Anthropic ecosystem. OpenAI added skills support to ChatGPT, the GitHub Copilot family of coding agents adopted the standard, and other ecosystems and harnesses have jumped on board as well. The launch of OpenClaw really took the skills conversation to the next level. As people started en masse building all of these different agents,
Starting point is 00:17:49 a lot of them had common skills needs. Like, for example, understanding how to use specific tools, how to interact with certain types of file formats like documents and PDFs, or how to take specific actions like transcribing audio. A site called ClawHub quickly launched that now has something like 28,000 skills. And other people have their own collections, focused on particular use cases or areas of interest.
Starting point is 00:18:10 And yet, what Anthropic found when they actually sat back and looked was that as many skills as there were available, many if not most of them could fit into one of nine categories. Library and API reference, product verification, data and analysis, business automation, scaffolding and templates, code quality and review, CI/CD and deployment, incident runbooks, and infrastructure ops. So that's what led them to this post. Let's talk first about some of the categories in this taxonomy,
Starting point is 00:18:39 and then some of the more general best practices that Anthropic shared. I'm not going to go through all nine categories, but let's talk about a couple. One key category they found was data fetching and analysis, skills that, for example, connect your data. These skills, they write, might include libraries to fetch your data with credentials, specific dashboard IDs, et cetera, as well as instructions on common workflows or ways to get data. Another category which I can see being important to listeners of this show is business process and team automation. In other words, skills that automate repetitive workflows into one command.
Starting point is 00:19:08 They write, these skills are usually fairly simple instructions but might have more complicated dependencies on other skills or MCPs. An example might be a weekly recap skill, where merged PRs, plus closed tickets, plus deploys, come together in a formatted recap post. Another category in their key taxonomy of skills, which relates to some conversations we've been having recently, is code quality and review. Now, the conversation that we've been sharing here is one about what happens when coding agent sprawl gets so extensive that it just becomes impossible for humans to review all the code. There are some who have argued that we're already far past that point, while others cling on to the idea that humans need to have the final look.
Starting point is 00:19:42 My very strong instinct is that even if it would be better if all code that was released as products and services actually had human review, I don't think that there's any chance that that paradigm gets out of 2026. I think we're going to have to solve the problem of code review in new ways, which I'll be clear is a problem that I am not qualified to solve, but I just think that we're going to be producing such an incredibly high volume of code that at some point we'll give up the ghost on the idea of being able to review it all. That makes code quality and review skills seem all the more potentially important. This Anthropic describes as skills that enforce code quality inside of your org and help review code. Some of the examples are adversarial review,
Starting point is 00:20:15 which would spawn a fresh-eyes sub-agent to critique, implement fixes, and iterate until findings degrade into nitpicks, or a code-style skill that enforces code styling, especially styles that Claude does not do well by default. Interestingly, and I think related to that, Thariq argues that one of the highest ROI categories is verification skills. They describe this as skills that describe how to test or verify that your code is working. Verification skills are extremely useful for ensuring Claude's output is correct. It can be worth having an engineer spend a week just making your verification skills excellent.
Starting point is 00:20:45 Consider techniques like having Claude record a video of its output so you can see exactly what is tested, or enforcing programmatic assertions on state at each step. So there are more categories in the taxonomy, but that gives you a feel for what Anthropic is seeing in terms of their most valuable skills. Now, admittedly, this is from the Claude Code team, so it's going to index highly technical, whereas if you had an agent builder who is mostly focused on business processes, you'd probably see more gradations of this category for business process and team automation. Maybe even more useful, then, are Thariq and the Claude Code team's tips for actually making skills.
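One way to picture "programmatic assertions on state at each step" is a small runner that pairs every action with a check and stops at the first failed check. This is an illustrative sketch only: the `Step` structure, the `run_verified` helper, and the cart workflow are hypothetical and are not taken from the post.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]


@dataclass
class Step:
    name: str
    action: Callable[[State], None]  # mutates the shared state
    check: Callable[[State], bool]   # must hold right after the action runs


def run_verified(steps: list[Step], state: State) -> list[str]:
    """Run steps in order, asserting on state after each one."""
    passed = []
    for step in steps:
        step.action(state)
        if not step.check(state):
            raise AssertionError(f"verification failed after step {step.name!r}: {state}")
        passed.append(step.name)
    return passed


# Hypothetical workflow: each step is verified before the next one starts.
steps = [
    Step("create_cart", lambda s: s.update(cart=[]), lambda s: s.get("cart") == []),
    Step("add_item", lambda s: s["cart"].append("widget"), lambda s: s["cart"] == ["widget"]),
    Step(
        "checkout",
        lambda s: s.update(order_total=len(s["cart"]) * 5),
        lambda s: s["order_total"] == 5,
    ),
]

state: State = {}
passed = run_verified(steps, state)
print(passed)
```

Checking after every step, rather than only at the end, means a failure names the exact step where the state went wrong.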
Starting point is 00:21:17 One thing that a number of folks missed is that Anthropic actually just updated their skill creator tool. Skill creator, they write, helps you write evals, run benchmarks, and keep your skills working as models evolve, and it was meant to answer a specific challenge. Since launching agent skills last October, they wrote, we've noticed that most authors are subject matter experts, not engineers. They know their workflows but don't have the tools to tell whether a skill still works with a new model, triggers when it should, or if it actually improved after an edit. Ultimately, they write, the goal is bringing some of the rigor of software development, like testing, benchmarking, and iterative improvement, to skill authoring without requiring anyone to write code.
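To make the idea of testing a skill concrete, here is a minimal sketch of an A/B style eval: run the same prompts with and without a skill and compare pass rates. Everything here is an assumption for illustration. The two runner functions are stubs standing in for real model calls, the grader is a simple string check, and none of it reflects how the skill creator tool works internally.

```python
from typing import Callable

Runner = Callable[[str], str]  # prompt -> model output
Grader = Callable[[str], bool]  # output -> did it meet the bar?


def pass_rate(run: Runner, prompts: list[str], grade: Grader) -> float:
    """Fraction of prompts whose output passes the grader."""
    results = [grade(run(prompt)) for prompt in prompts]
    return sum(results) / len(results)


# Stub runners: placeholders for "raw model" and "model with the skill loaded".
def baseline(prompt: str) -> str:
    return f"Report for {prompt}"


def with_skill(prompt: str) -> str:
    return f"# Weekly Recap\n## Merged PRs\nReport for {prompt}"


def has_required_sections(output: str) -> bool:
    """Hypothetical grader: the team's format requires a 'Merged PRs' section."""
    return "## Merged PRs" in output


prompts = ["week 10", "week 11", "week 12"]
baseline_score = pass_rate(baseline, prompts, has_required_sections)
skill_score = pass_rate(with_skill, prompts, has_required_sections)
print(f"baseline={baseline_score:.0%} with_skill={skill_score:.0%}")
```

Rerunning the same comparison after a model update or a skill edit is what turns "it seems to work" into a number that can be tracked over time.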
Starting point is 00:21:53 Solopreneur and educator Ali Lemon actually called this out as a fairly big deal. He wrote, Anthropic shipped three upgrades to skills that fix most problems almost everyone runs into. Problem one, you had no way to measure how well your skills were actually performing. Now you can run evals that test your skill against multiple prompts and get a score. Problem two, your skills break when models update and you don't notice. With the new skill creator, you can run A/B tests comparing your skill and raw Claude. Problem number three, he writes, Claude doesn't even use your skill half the time because the description is too vague or too specific.
Starting point is 00:22:20 Now the skill creator rewrites your descriptions automatically so they trigger at the right time. Anthropic, he points out, ran this on their own skills and saw better triggering five out of six times. Now, one other note from the skill creator that I thought was valuable is the framework for organizing skills into two categories. They call those two categories, one, capability uplift, skills that help Claude do something the base model either can't do or can't do consistently, e.g. certain types of document creation. And then the second category of skills are called encoded preference, skills that document workflows where Claude can already do each piece, but the skill sequences them according to your team's processes. The distinction matters, they say,
Starting point is 00:22:56 because these two types of skills may need testing for different reasons. Capability uplift skills may become less necessary as models improve, while encoded preference skills are more durable, but only as valuable as their fidelity to your actual workflow. So back to Thariq's post, here are some of their top tips for making skills better. The first is don't state the obvious. They write, if you're publishing a skill that is primarily about knowledge, try to focus on information that pushes Claude out of its normal way of thinking. The front-end design skill is a great example. It was built by one of the engineers at Anthropic by iterating with customers on improving
Starting point is 00:23:28 Claude's design taste, avoiding classic patterns like the Inter font and purple gradients. The second tip is to build a gotcha section. In fact, Tarek argues that the highest-signal content in any skill is the gotcha section. These sections articulate common failure points that Claude runs into when using your skill. And ideally, he says you update your skill over time to capture these gotchas. A third tip goes back to that idea that people still think of skills as just a single markdown file rather than an entire folder. And Tarek says you should think of the entire file system as a form of context engineering.
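To make the folder idea and the gotcha section a bit more concrete, here is a minimal sketch in Python. It is not Anthropic's tooling. The subfolder names (scripts, templates, reference), the helper functions, and the exact wording of the Gotchas heading are all hypothetical choices for illustration, and the frontmatter fields (name, description) are assumed from the commonly described SKILL.md convention rather than taken from the episode.

```python
from pathlib import Path

# Assumed SKILL.md layout: frontmatter, instructions, then a Gotchas section
# kept at the very end of the file so new entries can simply be appended.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

{instructions}

## Gotchas
"""


def create_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create a skill folder with a SKILL.md and (hypothetical) supporting subfolders."""
    skill_dir = root / name
    for sub in ("scripts", "templates", "reference"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        SKILL_TEMPLATE.format(
            name=name,
            description=description,
            title=name.replace("-", " ").title(),
            instructions=instructions,
        ),
        encoding="utf-8",
    )
    return skill_md


def add_gotcha(skill_md: Path, gotcha: str) -> None:
    """Append one observed failure as a bullet under the trailing Gotchas section."""
    with skill_md.open("a", encoding="utf-8") as f:
        f.write(f"- {gotcha}\n")


def list_gotchas(skill_md: Path) -> list[str]:
    """Return the bullets recorded under the Gotchas heading."""
    text = skill_md.read_text(encoding="utf-8")
    section = text.split("## Gotchas\n", 1)[1]
    return [line[2:] for line in section.splitlines() if line.startswith("- ")]
```

In use, you would call create_skill once, then call add_gotcha each time the agent trips on something, so the file accumulates the failure points over time. Keeping the Gotchas section last in the file is only a convenience of this sketch. It lets new entries be appended without parsing the rest of the document.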
Starting point is 00:23:58 They also suggest you should avoid railroading Claude, i.e. give Claude the information it needs, but give it the flexibility to adapt to the situation. As Tarek puts it in the conclusion, this should be thought of more as a grab bag of useful tips than as some sort of definitive guide. That makes sense because right now, everyone is just racing to figure out how to actually engage with the new capabilities of agents, and so every bit of advice at this point is going to be at least a little bit of a work in progress. Now, one of the interesting things then is how all of these work-in-progress lessons apply to different categories of users. The most obvious is probably
Starting point is 00:24:31 the advanced agent builders who are building and maintaining complex multi-agent teams. For them, obviously, skills are essentially a modular architecture for agent capabilities. And frankly, this is kind of the audience that Tarek most wrote this post for. A level down from that are the individual power users, and my guess is a lot of you fall into this category. This is not a person who's building complex agent teams and orchestration models. Instead, they are using one or a small number of agents to get their own work done faster or better or do things that weren't possible before.
Starting point is 00:25:00 For that type of user, skills are basically reusable prompts with superpowers. The difference between a skill and a saved prompt is that a skill can include actual code, templates, reference data, and examples, not just instructions. The practical value then is, you figure out how to get the agent to do something well once, and then you package it so it works reliably every time. The stand-up post example from Tarek's post is perfect for this tier. This is an automation of a daily task you do, and the type of thing that you want to happen consistently over and over again.
Starting point is 00:25:30 This also helps demonstrate why that gotcha section can be really valuable. Every time the agent makes a mistake, you add it to the skill so it doesn't happen again, and the skill becomes a living document that gets smarter over time. This also helps you avoid being locked into one specific ecosystem. Because skills are supported by Codex, Claude Code, Cursor, etc., you're not locked into any one tool's prompting format. But what about for the mainstream user, the person who isn't even yet fully in Claude Code or Codex,
Starting point is 00:25:57 people who are using off-the-shelf tools or experimenting with Perplexity Computer or Notion custom agents. What's interesting here is that the design pattern holds, and you can see even in these simpler prosumer and consumer tools, the idea of skills as reusable capabilities infiltrating into the mainstream. In fact, earlier this week, Notion announced custom skills for Notion AI. In their announcement tweet, they write, Write a prompt, you'll use it once. Write a skill and you'll use it forever. And this is the mental model shift, even if you are not an agent builder with Claude Code. The shift is from thinking about ad hoc prompting to
Starting point is 00:26:33 reusable capabilities. For a lot of folks out there, you're not ultimately going to have to care about the full architecture of SKILL.md files and progressive disclosure and all these things. Those folks just know they can teach the AI to do a specific thing their way, give it a name, and invoke it whenever they want. For some, it'll almost be an update to custom GPTs, which for many became essential, even though they never fully took off. Now, you can see how Notion has simplified skills into their own ecosystem.
Starting point is 00:26:59 Basically, you can take any page in Notion, click the menu, and turn that page into a skill. And the point is that this concept of skills as reusable capabilities is converging across the entire AI stack, from consumer uses up to much more advanced uses, all at once. The underlying idea is that AI is less and less a one-off conversation, and more and more a library of reliable, repeatable capabilities. Skills, I think, are a useful framework for that, no matter what level you're engaging with it on. And hopefully this episode has given you a little bit of a better starting point. We might go deeper in a future operator episode, but for now, that is going to do it for today's AI Daily Brief.
Starting point is 00:27:38 Appreciate you listening or watching, as always, and until next time, peace.
