The AI Daily Brief: Artificial Intelligence News and Analysis - Are Agent Swarms the Next AI Paradigm?

Starting point is 00:00:00 Today on the AI Daily Brief, is 2026 going to be the year of AI agent swarms? Before that in the headlines, some big jumps in Anthropic's fundraising and revenue. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Zencoder, and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. To learn more about sponsoring the show, send us a note at sponsors at aidailybrief.ai. Also, if you were interested in the research that we did at the end of last year, we have our next research kicking off

Starting point is 00:00:45 soon. To keep track of all that, as well as to hear about future products we have coming, AI maturity maps, AI Opportunity Radars, and much more, go to AIDBIntel.com, where you can sign up to get that information as soon as it comes out. Now, with that out of the way, let's dive in. Welcome back to the AI Daily Brief Headlines edition. All the daily AI news you need in around five minutes. We kick off today with some fundraising and business news out of Anthropic. The company is close to finalizing their latest funding round, which could raise more than $20 billion. Reports state that Anthropic has between $10 and $15 billion in firm commitments that could be finalized early next week, including Singapore's sovereign wealth fund and Sequoia making large investments.

Starting point is 00:01:26 Anthropic has also recently doubled the size of the round from $10 to $20 billion in response to excess interest. One investor told the Financial Times that the round was five to six times oversubscribed before the size increase. In addition to venture capital and sovereign wealth, Microsoft and Nvidia have also committed to invest a total of $15 billion in the company, which is on top of the $20 billion from investment firms. The round would reportedly value Anthropic at $350 billion, almost a doubling from their Series F, which closed in September. The fundraising frenzy firmly cements Anthropic's momentum. Last year, remember, OpenAI raised $40 billion anchored by $30 billion from SoftBank, meaning that Anthropic is now neck and neck

Starting point is 00:02:03 with those figures. In addition to fundraising news, The Information has an update on Anthropic's revenue growth forecasts. They report that Anthropic updated investors in December and hiked forecasts across the board. 2026 revenue is now expected to come in at $18 billion, around a 4x increase from last year's numbers and up 20% from estimates made last summer. In 2027, Anthropic expects to generate $55 billion in revenue. For 2029, their most optimistic forecast calls for $148 billion. That forecast is particularly notable as it's $3 billion more than OpenAI's last forecast, which was made during the summer. OpenAI, of course, may have hiked expectations since then, but still very notable that Anthropic believes they could overtake OpenAI within three years. The other big number from the financial

Starting point is 00:02:46 update was Anthropic's increasing training costs. They expect to spend $12 billion on training this year, which is a 50% increase from summer projections. Their forecasts also project training costs to exceed $100 billion by 2029. These increased costs push back Anthropic's timeline for profitability by a year, with the company now expecting to flip cash flow positive by 2028. Now, one of the things that Dario and Anthropic have, of course, been weighing in on a lot is chip exports to China, with Anthropic being firmly in the camp that we should not be exporting chips to China. An update on that front, as Beijing has approved the first batch of Nvidia chip imports. Reuters reports that Chinese officials have approved the import of several hundred thousand H200s, allowing access to the advanced chips for

Starting point is 00:03:27 the first time. Sources said the first batch of approvals were primarily allocated to three unnamed tech giants. The Wall Street Journal later named Alibaba and ByteDance as two of the three receiving approval. Other enterprises are still in the queue awaiting a subsequent round of approvals, presumably including high-flying startups like DeepSeek, who may have to wait in line to set up their H200s. Reports stated that Chinese AI firms will be required to support local chipmakers as well, using their chips for some training tasks and most AI inference. Basically, it seems like officials are trying to strike a balance, allowing Chinese companies to train advanced models while also protecting domestic chipmakers. Now, this could be a huge boon to Nvidia's first quarter financials.

Starting point is 00:04:05 Several hundred thousand H200s is in the ballpark of $10 billion in sales. And that's only the first round of approvals. In Q2 of last year, when Chinese chip exports were shut down by the U.S. government, Nvidia reported a $5.5 billion write-down associated with losing Chinese sales. That implies Nvidia could see record Chinese sales this quarter simply based on this first round of approvals. Nvidia CEO Jensen Huang is currently visiting China to meet with local employees, but reports suggest that he hasn't met with any senior officials. That said, his next stop is Taiwan, where people familiar with the trip said he plans to ask suppliers to bump up H200 production to meet Chinese demand. Moving over to the training side of the house, the UK government has

Starting point is 00:04:43 expanded their AI training initiative with an ambitious new goal to upskill every worker in the country. The Department for Science, Innovation, and Technology announced on Tuesday that free AI training will be made available to every adult worker. The training will come in the form of 20-minute online courses with modules covering use cases like drafting text, content creation, and automation of administrative tasks. Technology Secretary Liz Kendall said, we want AI to work for Britain, and that means ensuring Britons can work with AI. Change is inevitable, but the consequences of change are not. We will protect people from the risks of AI while ensuring everyone can share in its benefits. New partners, including Cisco, Cognizant, and the National Health Service will join

Starting point is 00:05:19 existing partners including Amazon, Google, Microsoft, and Salesforce in the upskilling initiative. The department claimed this would be the largest targeted training program since the establishment of the Open University in the late 1960s, which delivers distance learning for higher education. They said the program had already delivered a million courses and the government would aim to retrain 10 million workers by the end of the decade. Workers that complete the training will be certified with an AI Foundations badge to give employers confidence they have basic AI skills. Now, there is a lot that we could say about this.

Starting point is 00:05:46 The cynic in me, of course, sees all of the potential challenges with this program, most of which sort of amount to a question of whether this is too little to move the needle. But we got to start somewhere. Governments need to get involved in a way that is actually helpful to people adapting to a new world rather than just trying to pretend that they have control over whether that new world exists. And so for that reason, I think this is a good thing, and I'm excited to see it hopefully go even farther than they're thinking right now. Now, our main episode today is about a new model out of China and its agent swarm capabilities, but Alibaba's Qwen team also released a new model earlier this week, specifically called Qwen3-Max-Thinking.

Starting point is 00:06:19 Now, as you can probably tell from the naming convention, this is the big flagship model from the Qwen team. Their equivalent of GPT-5.2 Pro, Gemini 3 Pro, or Opus 4.5. The model makes use of an inference technique that the Qwen team are calling heavy mode. Qwen is doing things slightly differently from existing approaches to test-time scaling, generating a response, then feeding it back into the model for improvements in a recursive loop. It appears to be generating some pretty significant gains. Qwen said that this method improved benchmark scores on GPQA, which is a PhD-level science test, from 90.3% to 92.8%.
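For readers who want to picture the "generate, then feed it back" loop described above, here is a minimal sketch. It is not Qwen's implementation: the function name `heavy_mode`, the prompt wording, the fixed number of rounds, and the `toy_model` stand-in are all assumptions made purely for illustration.

```python
from typing import Callable


def heavy_mode(prompt: str, model: Callable[[str], str], rounds: int = 3) -> str:
    """Generate an answer, then repeatedly feed it back to the model for revision.

    `model` is any callable mapping a prompt string to a response string
    (hypothetical; a real system would call an LLM API here).
    """
    answer = model(prompt)  # first draft
    for _ in range(rounds):
        revision_prompt = (
            f"Question: {prompt}\n"
            f"Previous answer: {answer}\n"
            "Improve the previous answer."
        )
        answer = model(revision_prompt)  # each pass sees the last draft
    return answer


def toy_model(text: str) -> str:
    """Toy stand-in for a model: bumps a version counter on every revision."""
    if "Previous answer: " not in text:
        return "draft v1"
    previous = text.split("Previous answer: ")[1].split("\n")[0]
    version = int(previous.rsplit("v", 1)[1])
    return f"draft v{version + 1}"


print(heavy_mode("What is GPQA?", toy_model, rounds=3))
```

The trade-off this sketch makes visible is cost: every extra round is another full model call, which may be part of why a mode like this is priced above a single-pass response.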

Starting point is 00:06:50 On LiveCodeBench, scores jump from 88% to 91.4%. Overall, the benchmarking looks pretty strong. Now, the cost is a little beefy for a Chinese open source model. Qwen3-Max-Thinking comes in at around the same cost as Claude Haiku 4.5, meaning that it's still much cheaper than models like Gemini 3 Pro or GPT-5.2, but about 10 times more expensive than DeepSeek V3.2. Now, Qwen 3 is already being used by many American companies. Airbnb CEO Brian Chesky, for example, recently said that his company was relying on

Starting point is 00:07:20 Qwen 3 as a more affordable alternative to U.S. models, meaning that you got to think that they will be watching this model release closely, although again, how it stacks up compared to Kimi K2.5, which we will talk about in our main episode, remains to be seen. Lastly, today, it's not just the Chinese labs with some interesting new product to show off. Google has released a new feature for Gemini 3 Flash called Agentic Vision. The feature leverages Gemini's state-of-the-art multimodal reasoning with code to execute unique capabilities. Writes Google, Agentic Vision introduces an agentic think,

Starting point is 00:07:50 act, observe loop into image understanding tasks. Think, the model analyzes the user query and the initial image, formulating a multi-step plan. Act, the model generates and executes Python code to actively manipulate images, such as cropping, rotating, or annotating, or analyzing them, such as running calculations, counting bounding boxes, etc. Last is observe. The transformed image is appended to the model's context window. This allows the model to inspect the new data with better context before generating a final response.
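The three steps just described can be sketched in a few lines. This is a toy illustration only, not Google's API: the image is a plain grid of numbers, `think` is a hard-coded stand-in for the model's planning, and every function name here is hypothetical. In the real feature the model writes and runs its own Python in the act step.

```python
from typing import Dict, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the sub-image starting at (top, left) with the given size."""
    return [row[left:left + width] for row in image[top:top + height]]


def think(query: str, image: Image) -> Dict[str, int]:
    """Stand-in for the planning step: choose a region worth a closer look.

    Assumption for illustration: always zoom into the bottom-right quadrant.
    """
    h, w = len(image), len(image[0])
    return {"top": h // 2, "left": w // 2, "height": h - h // 2, "width": w - w // 2}


def think_act_observe(query: str, image: Image) -> Dict[str, int]:
    context = [image]                  # context starts with the original image
    plan = think(query, image)         # Think: form a plan
    transformed = crop(image, **plan)  # Act: run code that manipulates the image
    context.append(transformed)        # Observe: append the result to the context
    # Toy "final response": total pixel value in the region just inspected.
    answer = sum(sum(row) for row in context[-1])
    return {"answer": answer, "context_images": len(context)}


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(think_act_observe("How bright is the lower right corner?", grid))
```

The point of the observe step is that the final answer is computed from the transformed image sitting in context, not from the model's first impression of the original.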

Starting point is 00:08:17 Overall, this promises to improve Gemini's ability to annotate images, perform data visualization tasks, and help with basic image analysis. Google said that the loop improves model performance by between 5 and 10% across most vision benchmarks. Still, developer experience lead Omar Sanseviero hinted at the most exciting unlock from the new feature. He showed an output of an annotated image of a table containing a spill. Gemini had identified a spill, a piece of cloth, and several other items. The annotations appear to be instructions for a robot to clean up the spill by first clearing away the items in the way, then getting the cloth and wiping up the spill. The implication, of course, being that this feature could be used to give robots

Starting point is 00:08:51 on-the-fly analysis and reasoning ability, allowing them to tackle tasks that they've never seen before. Ultimately, though, as I said, when it comes to new models, the big conversation is around Kimi K2.5, and so with that, we will wrap up the headlines and move on to the main episode. Hello, friends, if you've been enjoying what we've been discussing on the show, you'll want to check out another podcast that I have had the privilege to host,

Starting point is 00:09:16 which is called You Can With AI from KPMG. Season one was designed to be a set of real stories from real leaders, making AI work in their organizations, and now season two is coming and we're back with even bigger conversations. This show is entirely focused on what it's like to actually drive AI change inside your enterprise and has case studies, expert panels, and a lot more practical goodness that I hope will be extremely valuable for you as the listener. Search You Can With AI on Apple, Spotify, or YouTube, and subscribe today. If you're using AI to code, ask yourself, are you building software or are you just playing

Starting point is 00:09:53 prompt roulette? We know that unstructured prompting works at first, but eventually it leads to AI slop and technical debt. Enter Zenflow. Zenflow takes you from vibe coding to AI-first engineering. It's the first AI orchestration layer that brings discipline to the chaos. It transforms free-form prompting into spec-driven workflows and multi-agent verification, where agents actually cross-check each other to prevent drift. You can even command a fleet of parallel agents to implement features and fix bugs simultaneously. We've seen teams accelerate delivery 2x to 10x. Stop gambling with prompts. Start orchestrating your AI. Turn raw speed into reliable production-grade output at zenflow.free. Today's episode is brought to you by my company

Starting point is 00:10:36 Superintelligent. In '26, one of the key themes in enterprise AI, if not the key theme, is going to be how good is the infrastructure into which you are putting AI and agents? Superintelligent's agent readiness audits are specifically designed to help you figure out one, where and how AI and agents can maximize business impact for you, and two, what you need to do to set up your organization to be best able to leverage those new gains. If you want to truly take advantage of how AI and agents can not only enhance productivity, but actually fundamentally change outcomes in measurable ways in your business this year, go to besuper.ai. Welcome back to the AI Daily Brief. Today we're talking about something that has been of interest

Starting point is 00:11:20 to people for quite some time. When I first started this show all the way back in April of 2020, already there were people who were extremely interested in the way that LLMs could generate code. Now, it would take a couple of years and some significant advances in the models to actually unleash vibe coding in the way that it happened over the course of 2025, but the idea was there very early. We've similarly had interest in vast teams of agents that can coordinate amongst themselves to accomplish more things, even if the capability set hasn't fully been there. Which isn't to say that people haven't been experimenting.

Starting point is 00:11:54 Lindy released their agent swarm tool back in April of 2025, and the concept is related to something that I've talked about on this show, the Doctor Strange Theory of AI Agent Work. Now, the specific point that I've made is actually about the difference in how enterprises think agents will play out versus how I think they will play out, with the difference being that I don't think that agents are going to be one-to-one replacements for existing human work. I think that we're going to be able to deploy lots and lots of agents to scenario and war game different types of work, which while not exactly the same as agent swarms, which are more about breaking down complex tasks into specific sub-tasks, is in some ways still part of the same larger

Starting point is 00:12:30 conversation about how agents will actually work in the future. Over the last couple of days, we have started to get the first big model releases of 2026, and maybe the most significant so far is Moonshot's Kimi K2.5. While it is the agent swarm feature of K2.5 which has the most chatter, it's worth checking out the broader model as a whole. Artificial Analysis sums up the shift when they write, Moonshot's Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier, with only OpenAI, Anthropic, and Google models ahead. And indeed, the benchmarks are impressive. K2.5, for example, claims 50.2 on Humanity's Last Exam, which would put them ahead of GPT-5.2 running on high settings, Opus 4.5, and Gemini 3. On a variety of other benchmarks as well, they claim performance

Starting point is 00:13:18 that matches or exceeds these premier Western models. On the overall independent Artificial Analysis index, Kimi jumps from 11th place overall with their K2 Thinking model into fifth, only behind two iterations of GPT-5.2, Opus 4.5, and Gemini 3 Pro. And of course, the cost is cheaper than any of those models. In AA's tests, Kimi K2.5 was about four times cheaper than Opus 4.5 or GPT-5.2, but was still much more expensive than, for example, DeepSeek V3.2. One of the things that Moonshot has emphasized in their launch is the model's native multimodality.

Starting point is 00:13:52 Artificial Analysis again writes, Kimi K2.5 is the first flagship model from Moonshot to support image and video inputs. This is the first time that the leading open weights model has supported image input, removing a critical barrier to the adoption of open weights models compared to proprietary models from the frontier labs. They point out that this makes a significant difference as compared to other open weights leaders like DeepSeek's V3.2. Now, anytime we get a model out of China, of course, one aspect of the discourse is what it says for the state of the AI race. On that front, there were a number of people who took to Twitter slash X

Starting point is 00:14:23 to share examples of Kimi K2.5 claiming that it was Claude. Enrico from Big AGI says identity crisis or training set. Still, overall, even with some of the suspicion of distillation of Western models, the release of 2.5 certainly validates the recent arguments from people like Demis Hassabis that Chinese models are very, very close to the U.S. when it comes to performance, if not yet having had an example of actually pushing the frontier. As Balaz Namethi points out, however, the real value in 2.5 is not, as he puts it, pure IQ dominance. It's about how it does in an actual work environment.

Starting point is 00:14:55 He calls it less chatbot and more employee. And indeed, there are a couple things that stood out to me about the 2.5 announcement that are really impressive. One is the way that they're using this multimodal input capability in the context of coding. They show an example of taking a screen recording of a website, dumping it into Kimi and asking it to clone it, with Kimi shipping that code, including UX and interactions. If this actually works like that, it opens up a significant new frontier in AI coding that you have to imagine that everyone will race to copy very quickly. Another thing that Moonshot emphasized is how good 2.5 is at office skills,

Starting point is 00:15:29 things like financial modeling in Excel or creating high-quality PowerPoints. Now, again, this could be incredibly valuable when it comes to work, although I haven't really been able to find a ton of examples yet of people testing this out that don't just feel like paid influencer posts. One that I found that did seem to positively test out these features came from Shafi. He wrote, This new AI model Kimi from China created a full slide deck from my journal article in one single-shot prompt.

Starting point is 00:15:52 I just gave it the keyword and journal name not even the link or PDF to the article. It searched the article and found the correct one. Develop the contents after reading the paper. Created contents for 12 slides, including searching images from internet, asked for suggestions to make edits which I declined and asked it to go ahead and generated slides in a PowerPoint format. Everything happened inside my phone in five to six minutes. Since it's my own article, I know it got most of the things right.

Starting point is 00:16:13 And yet, as I said at the beginning, probably the feature that people are most excited about is this Agent Swarm parallelization. An example that Kimi gave was adapting O. Henry's short story The Gift of the Magi into a 10-minute short film. They asked it to generate a highly consistent storyboard script and embed it into an Excel file, which they said from a single prompt created a 100-megabyte Excel file generated with images with a total of 55 scenes. Simon Willison writes,

Starting point is 00:16:38 the self-directed agent swarm paradigm claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once. He gave it the prompt, I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into 10 tasks suitable for execution by parallel coding agents. He said the response was pretty good. It produced 10 realistic tasks and reasoned through the dependencies between them.
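To make "reasoning through the dependencies" concrete, here is a small sketch of how a task list with dependencies can be grouped into waves that parallel agents could pick up together. The task names below are invented for illustration and are not Kimi's actual output; the grouping uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave can run at the same time.

    `deps` maps each task to the set of tasks that must finish before it starts.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)                 # pretend the whole wave finished
    return waves


# Hypothetical breakdown of an upload plugin, for illustration only.
tasks = {
    "plugin_skeleton": set(),
    "sqlite_schema": set(),
    "s3_client": set(),
    "upload_endpoint": {"plugin_skeleton", "s3_client", "sqlite_schema"},
    "upload_ui": {"plugin_skeleton"},
    "integration_tests": {"upload_endpoint", "upload_ui"},
}

for number, wave in enumerate(parallel_waves(tasks), start=1):
    print(number, wave)
```

In this made-up example, three tasks have no prerequisites and could go to three agents at once, while the integration tests have to wait for everything else.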

Starting point is 00:17:03 Global Soul writes, tried Kimi Moonshot agent swarms and it is quite magical. Basically, they gave Kimi a list of stocks and asked it to create a report that analyzes each from a variety of different factors. They said it created individual files for each company, an overall summary, and finished the output for all companies in 10 minutes. Swyx also had an interesting experience in his testing. He writes, little detail from exploring the K2.5 agent swarm preview today.

Starting point is 00:17:28 I asked it to make a custom website for the Latent Space podcast, and despite it being trained to parallelize eagerly and having full permission to do so, it recognized that this was a noob task and did a highly competent job with one agent and refunded my credits. This thing might be AGI. I've never expected a parallel agent lab to use less than what it was trained or opted in to use. In other words, just because it could use a parallel agent structure, it recognized that for certain tasks it doesn't need that. Cline founder Saoud Rizwan explains a little bit about what's going on in the background. He writes, LLMs are trained on sequential reasoning, breaking tasks down step by step, one to-do after another. When you ask them to orchestrate parallel work,

Starting point is 00:18:05 they don't know how to split tasks without conflicts. Moonshot calls this serial collapse and solved it with reinforcement learning. They used PARL, parallel agent reinforcement learning, where they gave an orchestrator a compute and time budget that made it impossible to complete tasks sequentially. It was forced to learn how to break tasks down into parallel work for sub-agents to succeed in the environment. Simon Smith from Klick Health did a full test as well and came away pretty impressed.
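The budget idea in that quote comes down to simple arithmetic, sketched below. To be clear, this is not PARL or any part of Moonshot's training setup; the function, the durations, and the budget are made-up numbers that only show why a tight time budget rules out doing independent sub-tasks one after another.

```python
from typing import List


def fits_budget(durations: List[float], budget: float, parallel: bool) -> bool:
    """Check whether independent sub-tasks finish within a time budget.

    Sequentially, elapsed time is the sum of all durations.
    Fully in parallel (one sub-agent per task), it is the longest single task.
    """
    elapsed = max(durations) if parallel else sum(durations)
    return elapsed <= budget


subtask_minutes = [4.0, 5.0, 3.0]  # hypothetical sub-task durations
budget_minutes = 6.0               # hypothetical budget given to the orchestrator

print(fits_budget(subtask_minutes, budget_minutes, parallel=False))
print(fits_budget(subtask_minutes, budget_minutes, parallel=True))
```

With these numbers the sequential plan needs 12 minutes against a 6-minute budget, so only an orchestrator that splits the work across sub-agents can succeed.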

Starting point is 00:18:29 He writes, I've been thinking about the best way to organize agents in step-by-step workflows where each agent has skills defined by an agent skills file, and to then scale this across an enterprise. Today, Kimi dropped its K2.5 model along with Agent Swarms, and I thought, could this be it? The answer? Mostly. He then walks through how you do this. First, using Kimi, you actually use the model selector to select Agent Swarm in the same way

Starting point is 00:18:52 that you would select between, for example, instant or thinking mode. For Simon's task, he gave Agent Swarm the task of responding to an RFP, which included in his words, research, strategy, creative brainstorming, and concept development, media planning, analytics planning, high-level project planning, and consolidating everything into a final written response in a Word document. He continues, as would be familiar to users of agentic coding tools like Claude Code and Codex, Kimi turns your request into a step-by-step plan and then proceeds to work through it. Where things get interesting, however, is how it executes the plan with multiple agents.

Starting point is 00:19:22 For each step in the plan, he writes, Kimi creates a set of relevant agents. And importantly, these aren't generic agents. Agents each have roles and names. Each agent, he writes, plays a specific role, defined for it in a prompt, and even gets a name and avatar. The role description ensures the agent focuses on a specific job to be done, and the name and avatar make this extremely user-friendly. The model is then smart enough to figure out which agents can work in parallel, or in the case that an agent requires the output of a different agent, how to run them sequentially. Simon writes that you can monitor agents overall via a dashboard

Starting point is 00:19:52 with progress indicators, and also select individual agents to monitor their work. One of the important things that Simon points out is that part of the big upgrade here is not just the performance, but the user experience. He writes, When I think about something that would scale up to an enterprise, which will include a lot of users who won't be comfortable

Starting point is 00:20:08 in something like Claude Code in the terminal, this feels like it would be easily adopted. It's extremely clear and intuitive. The model gave Simon not only the final output, but also all of the intermediate outputs from each of the distinct agents. Now, Simon's big request, and his caveat, is that he wants access to connectors or MCPs

Starting point is 00:20:25 as well as agent skills, to be able to fully sync this with the larger ecosystem of data that people work in. Overall, though, he says I'm impressed. I've been waiting for something like this that makes it easy for anyone, regardless of technical expertise, to ask AI to do something and have it complete the task with multiple agents playing different roles and working collaboratively. This feels like the emerging future of humans managing teams of AI agents,

Starting point is 00:20:45 the way they currently manage teams of other humans. I honestly don't understand how Kimi got here first. There are other solutions out there for agents to work together on tasks, but everything I've seen is too technical for the average user, requiring you to use the terminal, or too rigid, requiring you to pre-build workflows. How did Kimi create such a great model with such excellent agentic capabilities and build such an intuitive interface? Now, this is the interesting question, and why it makes me feel like we are very much seeing the beginning of a broader phenomenon around these agent swarms.

Starting point is 00:21:15 In addition to K2.5, I've seen a couple people talking about Claude's new task system in this same context, and so it seems like something that's probably on the minds of those folks as well. LangChain developer Sydney Runkle is also talking about this subagents architecture, all of which makes me feel like 2026 might be the year of the agent swarm. Indeed, there's enough chatter that Ethan Mollick is making one last perhaps vainglorious attempt to steer us away from using the swarm terminology. On Monday, he tweeted, let's not call groups of agents swarms. It's both terrifying and not a useful analogy. Groups of agents should be called teams or organizations. It both describes how to structure them and also how to use them. Don't let the weird AI folk naming win again. I'm not sure where it will

Starting point is 00:21:56 land when it comes to terminology, but it really does feel like this is something new happening, and I'm excited to see how it develops. I will be testing out K2.5. Maybe we'll do a special bonus operators episode about that. For now, however, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
