The AI Daily Brief: Artificial Intelligence News and Analysis - Should You Build Single Agents or Multi Agent Systems?

Starting point is 00:00:00 Today on the AI Daily Brief, should you be building single agents or multi-agent systems? Before that on the headlines, it's apparently Settlers of Catan but with AI companies. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello friends, quick announcements. Thanks as always to our sponsors for today's show,

Starting point is 00:00:29 KPMG, Blitzy, Vanta, and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And a quick flag: as I mentioned last week, we are finalizing some sponsorship sales of fall inventory, so if you are interested in sponsoring the show, shoot me a note at NLW at Breakdown.net and I can share more information. For now, though, let's get into a very interesting behind-the-scenes look at what's actually going on with all these big AI labs. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news

Starting point is 00:00:57 you need in around five minutes. Things are getting a little feisty in AI land. We're seeing more and more aggressive competition when it comes to data, and even some companies that were previously partners are starting to be at loggerheads with one another. In particular, tensions between OpenAI and Microsoft are flaring up as OpenAI's for-profit conversion stalls out. Last month, you'll remember, OpenAI abandoned plans to ditch its nonprofit structure but remained committed to converting to a public benefit corporation. The conversion would still unlock a normal corporate structure with regular equity shares and a path to going public. Microsoft was reportedly the only party on the cap table opposed to the conversion, given their

Starting point is 00:01:37 lucrative revenue and profit-sharing agreement. Microsoft is currently entitled to a 20% share of revenues and a 49% share of profits. It's all very speculative, but the deal is capped at around $120 billion. Negotiations have been ongoing since October, when the conversion was first contemplated, but the Wall Street Journal reports that they have become more difficult in recent weeks. The Journal writes, OpenAI's executives have discussed what they view as a nuclear option: accusing Microsoft of anti-competitive behavior during their partnership. Now, that would involve seeking a federal regulatory review of their investment contract for antitrust violations. OpenAI is also considering a PR campaign to turn public opinion against Microsoft. Issuing a joint statement, however, the two companies wrote,

Starting point is 00:02:17 We have a long-term, productive partnership that has delivered amazing AI tools for everyone. Talks are ongoing and we are optimistic that we will continue to build together for years to come. Sources also told the Journal that the Windsurf acquisition was part of the problem. Currently, Microsoft has full access to all of OpenAI's IP, including models and systems, but OpenAI doesn't want Microsoft to gain access to Windsurf's technology, which would make GitHub Copilot more competitive as an AI coding product. A Windsurf spokesperson said that the acquisition was, quote-unquote, speculative, suggesting the deal is still in limbo while the dispute is ongoing.

Starting point is 00:02:50 The rest of the negotiations are about how many shares Microsoft will get post-conversion, as well as OpenAI wanting to be released from exclusivity agreements. The Information reports that OpenAI is offering Microsoft a 33% stake in the restructured company, which is a huge haircut from the 49% profit-sharing agreement. Regarding the exclusivity agreement, The Information writes, OpenAI has told investors it wants to get out of its exclusive cloud contract with Microsoft, which makes Microsoft the only cloud provider that offers OpenAI models for sale through an API. Microsoft rivals Amazon and Google could jump at the chance to host OpenAI models on their servers,

Starting point is 00:03:23 which would make it easier for their cloud customers to use them. They also note that Google has already lobbied the government to kill the exclusive deal. Microsoft is also seeking access to OpenAI's technology past the current end date of 2030, amid an ongoing debate about the definition of AGI. And lurking behind all of this is that OpenAI faces an end-of-year deadline to complete its restructuring or potentially lose out on at least $20 billion

Starting point is 00:03:45 in committed fundraising. I would characterize the community's response to this as not all that surprised. Matthew Berman writes, No one could have seen this coming. It's kind of like the Thucydides Trap. Eventually Microsoft and OpenAI will go to war. Sam Altman has ambitions far greater

Starting point is 00:04:00 than being the model provider powering Microsoft Copilot. He wants OpenAI to basically be Microsoft, Apple, and Google all rolled into one. It'll be interesting to see how this plays out. Now, in another area of competition around all of these big labs, Google is cutting ties with Scale AI following its deal with Meta. Citing five sources, Reuters reports that Google is ending its relationship with Scale after Meta's investment in the company. Google had been Scale AI's number one customer and was planning to spend about $200 million this year on data labeling. Reuters' sources added that Microsoft and xAI were also planning to end their contracts. Reportedly,

Starting point is 00:04:34 Scale brought in $870 million in revenue last year, with $150 million of it coming from Google. A Scale spokesperson declined to comment on the Google relationship but insisted the company's business remained strong. They added that the startup will continue to operate independently and safeguard customers' data. For competitors, this is a big opportunity. Labelbox CEO Manu Sharma said that they would generate hundreds of millions of dollars in new revenue from customers fleeing Scale. Garrett Lord, the CEO of Handshake, said, our demand has tripled overnight after the news. For some, this opens the question of what exactly Meta just paid $15 billion for. Signal writes,

Starting point is 00:05:06 Meta paying $15 billion for 49% implies around a $30 billion post-money valuation. But that price was set before the recent exodus: OpenAI, Google, possibly others, shifting to in-house infrastructure or rivals. If those departures crater the revenue tied to that 51%, then the implied valuation from the Meta deal is massively distorted post facto. Zuck basically ends up with a business where $15 billion got you an entire company without actually buying the whole company. Now, if you think that most of the

Starting point is 00:05:34 value of the deal seems to be about bringing CEO Alexander Wang and key members of his team in-house, you are not alone in that feeling. Indeed, details are emerging from inside the massive deal, explaining exactly how this whole thing went down. The Information reports that the deal started with Mark Zuckerberg floating a relatively modest, or at least comparatively modest, $5 billion investment into Scale back in mid-April. Even at that early stage, he was primarily negotiating to secure CEO Alexander Wang as the leader of his superintelligence team rather than the company itself. Wang countered with $20 billion, arguing that existing shareholders wouldn't go for a smaller investment that risked destroying value in the company as customers fled. The pair eventually met at $14.3

Starting point is 00:06:11 billion for just under half of the company. The report states, this ultimately represents a $12.8 billion windfall for existing stockholders via a dividend paid for by Meta's investment. That includes over a billion dollars for Wang himself, according to sources close to the company. They said the payout will vest over five years as long as Wang remains employed at Meta, and there have also been discussions of Wang taking on the title of Chief AI Officer. The Information writes, Zuckerberg's deputies were concerned that Meta was paying a lot to be a minority shareholder with few rights in a startup that was losing its founder. The startup had also recently failed to meet its own financial projections. Behind the scenes, they worked to shift the deal more in Meta's favor.

Starting point is 00:06:46 They report that if Scale sells within the next two and a half years, Meta will get fully paid back before other investors see a dime. The article also suggests that Meta is walking a tightrope between financial, regulatory, and strategic concerns. It all seems pretty clear, based on all the reporting so far, that what Zuckerberg cares about is getting back in the AI race, and he believes very strongly that Wang is the person to lead that effort. This is a fascinating report, with just a ton of detail on the hardball behind how the deal came together. At one stage, Meta asked for full voting rights,

Starting point is 00:07:15 aside from the appointment of board members. Meta also asked for a poison pill that would increase its control if Alexander Wang was found to be helping to run Scale while working at Meta. Basically, the overall tenor of all of this is that both sides knew the deal risked destroying value at the startup and needed some form of protection. Through its liquidation preference, Meta secured a return on its investment if Scale is sold off. For existing investors and staff, the protection was a lucrative payout now rather than risking the company falling off. This deal, it feels, is going to be very binary.

Starting point is 00:07:43 Either it's going to make Zuckerberg look like a genius or an absolutely insane person, and only time is going to tell which. For now, that is going to do it for today's AI Daily Brief Headlines. Next up, the main episode. Today's episode is brought to you by KPMG. In today's fiercely competitive market, unlocking AI's potential could help give you a competitive edge, foster growth, and drive new value. But here's the key.

Starting point is 00:08:07 You don't need an AI strategy. You need to embed AI into your overall business strategy to truly power it up. KPMG can show you how to integrate AI and AI agents into your business strategy in a way that truly works and is built on trusted AI principles and platforms. Check out real stories from KPMG to hear how AI is driving success with its clients at www.kpmg.us slash AI. Again, that's www.kpmg.us

Starting point is 00:08:34 slash AI. This episode is brought to you by Blitzy. Now, I talk to a lot of technical and business leaders who are eager to implement cutting-edge AI. But instead of building competitive moats, their best engineers are stuck modernizing ancient codebases or updating frameworks just to keep the lights on. These projects, like migrating from Java 17 to Java 21, often mean staffing a team for a year or more. And sure, copilots help, but we all know they hit context limits fast, especially on large legacy systems. Blitzy flips the script. Instead of engineers doing 80% of the work, Blitzy's autonomous platform handles the heavy lifting, processing millions of lines of code and making 80% of the required changes automatically. One major financial firm used Blitzy to modernize

Starting point is 00:09:15 a 20-million-line Java codebase in just three and a half months, cutting 30,000 engineering hours and accelerating their entire roadmap. Email Jack at blitzy.com with Modernize in the subject line for prioritized onboarding. Visit blitzy.com today before your competitors do. Today's episode is brought to you by Vanta. In today's business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easier and faster by automating compliance

Starting point is 00:09:57 across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieved $535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time, listeners get $1,000 off at vanta.com slash NLW. That's V-A-N-T-A.com for $1,000 off. Today's episode is brought to you by Superintelligent, specifically agent readiness audits. Everyone is trying to figure out what agent use cases are going to be most impactful for their business, and the agent readiness audit is the fastest and best way to do that.

Starting point is 00:10:41 We use voice agents to interview your leadership and team, and process all of that information to provide an agent readiness score, a set of insights around that score, and a set of highly actionable recommendations on both organizational gaps and high-value agent use cases that you should pursue. Once you've figured out the right use cases, you can use our marketplace to find the right vendors and partners. And what it all adds up to is a faster, better agent strategy. Check it out at besuper.ai or email agents at besuper.ai to learn more. Welcome back to the AI Daily Brief. Today we have something really interesting, almost a point-counterpoint on agent architectures, which is particularly pertinent as the conversation

Starting point is 00:11:23 around agentic systems becomes more mainstream and more endemic inside the enterprise. On the one hand, we have Anthropic, which published a piece called How We Built Our Multi-Agent Research System. It's a really in-depth piece that shows the architecture of their research system, featuring an orchestrator, subagents, tools, etc. And then on the flip side, we have a blog post from the creators of Devin, the AI code generation tool, with an argument against building multi-agent systems. This is obviously a hot topic. For those of you who remember my coverage of Microsoft Build, one of the things that was

Starting point is 00:11:56 really notable is that they skipped right over the stage where we were all excited about some single agent that could do a bunch of stuff and instead moved right into this multi-agent orchestration era, where they're building infrastructure for, thinking about, and pushing their customers toward more complex agentic systems, where multiple agents can come together to accomplish more complete and extensive tasks. So let's dig into both of these arguments and figure out what we have to learn from each of them. And let's start with Anthropic. Anthropic's paper talks about multi-agent systems using Anthropic's Research agent as an example. The research agent drives Anthropic's version of deep research, with the basic idea being to ensure that the system can search

Starting point is 00:12:34 out information across the web, collate it, and produce a final report. Anthropic achieves this through a multi-agent system. They write in their section titled Benefits of a Multi-Agent System, research work involves open-ended problems where it's very difficult to predict the required steps in advance. You can't hard-code a fixed path for exploring complex topics, as the process is inherently dynamic and path-dependent. This unpredictability makes AI agents particularly well suited for research tasks. The model must operate autonomously for many turns, making decisions about which directions to pursue based on intermediate findings. A linear one-shot pipeline cannot handle these tasks. And so Anthropic instead turns to a multi-agent research system.
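To make the lead-agent/sub-agent shape concrete, here is a minimal sketch in Python. It is not Anthropic's implementation: `plan`, `run_subagent`, and the "flaky" failure trigger are hypothetical stand-ins for model calls and search tools. It shows three properties described in this episode: sub-agents run in parallel, each hands back only a short summary, and a failed sub-agent is replaced by a fresh one with adjusted instructions instead of breaking the whole workflow.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubResult:
    subtask: str
    ok: bool
    summary: str  # compressed answer handed back to the lead agent


def plan(query: str, items: list[str]) -> list[str]:
    # Hypothetical planner: a real lead agent would ask a model to decompose
    # the query. Here we just create one subtask per item.
    return [f"{query}: {item}" for item in items]


def run_subagent(instructions: str, attempt: int) -> SubResult:
    # Hypothetical stand-in for a sub-agent with its own context window and
    # search tools. To exercise the retry path, any instructions containing
    # "flaky" fail on the first attempt.
    if "flaky" in instructions and attempt == 0:
        return SubResult(instructions, False, "")
    return SubResult(instructions, True, f"findings for [{instructions}]")


def run_with_retry(subtask: str, max_attempts: int = 2) -> SubResult:
    result = SubResult(subtask, False, "")
    for attempt in range(max_attempts):
        # A failed sub-agent is replaced by a fresh one with adjusted
        # instructions, rather than restarting the whole workflow.
        instructions = subtask if attempt == 0 else f"{subtask} (retry: broaden search)"
        result = run_subagent(instructions, attempt)
        if result.ok:
            return result
    return result


def lead_agent(query: str, items: list[str]) -> str:
    subtasks = plan(query, items)
    # Sub-agents run concurrently; each returns only a short summary.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_with_retry, subtasks))
    # The lead agent collates the summaries into a single report.
    return "\n".join(r.summary if r.ok else f"FAILED: {r.subtask}" for r in results)
```

For the S&P 500 board-member example, `lead_agent("board members", ["Company A", "Company B", ...])` would fan out one sub-agent per company. `pool.map` keeps results in input order, so the collated report lines up with the plan.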

Starting point is 00:13:14 The system uses a central lead agent that orchestrates the process and multiple subagents that access search tools. A user request comes to the lead agent, which then strategizes: the task is broken down into different chunks, and instructions are provided to subagents to carry out each portion of the task. The subagents can iterate on their search and go a little deeper if they need to, but they're ultimately designed to find very granular individual pieces of information. Anthropic gives the example of using the research feature to find a list of every board

Starting point is 00:13:40 member across S&P 500 companies. In that example, each sub-agent might be tasked with looking up a single company's list of board members, and once those individual results are returned, the lead agent collates them into a single report. There are lots of benefits to the method that Anthropic describes. Tasks run in parallel, with dozens of sub-agents working at the same time on different tasks, which, as you would imagine, significantly increases speed. It also means that each individual sub-agent is less critical. In single-agent systems, a single failed task could break the entire workflow, whereas with a multi-agent system, the orchestration agent can just spawn a new sub-agent with modified instructions and try again. Anthropic found that they could run tasks using

Starting point is 00:14:20 lesser models as sub-agents and actually come up with improved results. A multi-agent system with Claude Opus 4 as the orchestrator and Claude Sonnet 4 sub-agents outperformed a single Claude Opus 4 agent by 90.2% on their internal evaluations. Multi-agent systems also allow you to work around context limits. Each sub-agent can use as many tokens as it needs to complete its task, which is a small fragment of the overall workflow. You're essentially working across multiple context windows that all feed a compressed answer back to the lead agent. For complex tasks, this avoids the problem of a single agent hitting the context limit before the task is complete. Indeed, Anthropic basically says that there's nothing all that complicated about

Starting point is 00:14:59 why these multi-agent systems work. They write, multi-agent systems work mainly because they help spend enough tokens to solve the problem. We found that token usage by itself explains 80% of the improvement of multi-agent systems. In the particular use case of research on the internet, Anthropic argues that a multi-agent system is the most suitable. Now, this ability to use more tokens is also a downside. Multi-agent systems can be extremely expensive to run. They write, in our data, agents typically use around four times more tokens than chat interactions, and multi-agent systems use around 15 times more tokens than chats. They add, multi-agent systems require tasks where the value of the task is high enough to pay for the increased

Starting point is 00:15:37 performance. All right, so that is the multi-agent research system example. But what about this blog post titled simply Don't Build Multi-Agents? This one, as I said, comes from Cognition, which is mainly focused on building an AI agentic coding platform. Author Walden Yan writes, frameworks for LLM agents have been surprisingly disappointing. I want to offer some principles for building agents based on our own trial and error and explain why some tempting ideas are actually quite bad in practice. Basically, the argument here is that instead of building a multi-agent system, which they argue can be quite fragile, you should build a single agent that can reliably complete tasks end-to-end.
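A minimal sketch of the single-threaded, linear alternative follows. It assumes nothing about Cognition's actual code: `decompose` and `do_step` are hypothetical stand-ins for model calls. The point it illustrates is structural. Subtasks still get broken out, but they run one after another, and each step receives the full trace so far (the original task plus every earlier output) rather than just its own subtask string.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    # The full agent trace: the original task plus every step's output.
    task: str
    steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        return "\n".join([f"TASK: {self.task}", *self.steps])


def decompose(task: str) -> list[str]:
    # Hypothetical decomposition; a real agent would make a model call here.
    return [f"{task} / part {i}" for i in (1, 2, 3)]


def do_step(context: str, subtask: str) -> str:
    # Hypothetical stand-in for a model call. It reports how many earlier
    # steps were visible in its context (assuming one line per step), the
    # property this design is meant to guarantee.
    prior_steps = context.count("\n")
    return f"DONE {subtask} (prior steps visible: {prior_steps})"


def linear_agent(task: str) -> Trace:
    trace = Trace(task)
    for subtask in decompose(task):
        # Each step gets the entire trace so far, not just its own subtask,
        # and its output is appended for every later step to see.
        trace.steps.append(do_step(trace.render(), subtask))
    return trace
```

The trade-off is the one discussed in this episode: nothing runs in parallel, but no step ever works from a partial picture of what came before.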

Starting point is 00:16:16 Now, what they're not suggesting is some simple architecture of just using one giant agent, loading it up with context and letting it run. Instead, they're suggesting that developers try a linear handoff design. They cover this at a few different levels of complexity, with differences in how the handoff is achieved, but the basic idea is that the first agent receives the query and breaks the request down into steps, very similar to the orchestrating agent in a multi-agent system, but then it hands off the necessary context to a sub-agent to complete the first sub-task. This sub-agent then hands off the original context, plus its own additions, to the next sub-agent, and so on. And context is the key word here. In fact, the big term that

Starting point is 00:16:54 they're focused on in this piece is context engineering. They write, in 2025, the models out there are extremely intelligent, but even the smartest human won't be able to do their job effectively without the context of what they're being asked to do. Prompt engineering was coined as a term for the effort needed to write your task in the ideal format for an LLM chatbot. Context engineering is the next level of this. It's about doing this automatically in a dynamic system. It takes more nuance and is effectively the number one job of engineers building AI agents.
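Context engineering, as described here, means assembling what the model sees automatically on every turn. Below is a minimal sketch of one such policy, which also anticipates the compression variant Cognition describes for longer tasks. Everything in it is an assumption for illustration: `count_tokens` approximates tokens as words, and `summarize` is a hypothetical stand-in for a compression model that keeps only lines tagged `DECISION:`. The policy passes the full trace while it fits the budget, and otherwise compresses older steps while keeping the most recent ones verbatim.

```python
def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word. A real
    # system would use the model's own tokenizer.
    return len(text.split())


def summarize(steps: list[str]) -> str:
    # Hypothetical stand-in for the compression model: keep only lines tagged
    # as decisions. Deciding what is "key" is the genuinely hard part.
    decisions = [s for s in steps if s.startswith("DECISION:")]
    return "SUMMARY: " + " | ".join(decisions)


def build_context(task: str, steps: list[str], budget: int, keep_recent: int = 2) -> str:
    full = "\n".join([task, *steps])
    if count_tokens(full) <= budget:
        return full  # everything fits, so pass the full trace through
    # Otherwise compress the older steps and keep the most recent ones verbatim.
    split = max(len(steps) - keep_recent, 0)
    older, recent = steps[:split], steps[split:]
    return "\n".join([task, summarize(older), *recent])
```

Note that this sketch does not guarantee the compressed result fits the budget; a production system would need a fallback for that, and, as Cognition says, getting the "what is key" judgment right is where the real investment goes.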

Starting point is 00:17:23 By way of example, they show a simple schematic of a main agent that breaks down a task into two sub-tasks, spinning up two sub-agents to do those tasks, with the main agent then combining the results. They write, this is a tempting architecture, especially if you work in a domain of tasks with several parallel components to it. However, it's very fragile. The key point of failure is this. They then use the example of building a Flappy Bird clone. They write, this task gets divided into subtask one, build a moving game background with green pipes and hitboxes, and subtask two, build a bird you can move up and down. It turns out sub-agent one actually mistook your sub-task and started building a background that looks like Super Mario Bros.

Starting point is 00:18:04 Sub-agent 2 built you a bird, but it doesn't look like a game asset, and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications. The issue here, or the concern at least, is that the sub-agents don't have enough context to actually deliver the result that the task requires. The blog writes, you might think that a simple solution would be to just copy over the original task

Starting point is 00:18:27 as context to the sub-agents as well, that way they don't misunderstand their sub-task. But remember that in a real production system, the conversation is most likely multi-turn, the agent probably had to make some tool calls to decide how to break down the task, and any number of details could have consequences on the interpretation of the task. Unfortunately, they suggest that just giving the context still might not be enough, or at least doesn't address all the problems. In the second scenario, where each of the sub-agents has the context of the conversation

Starting point is 00:18:55 and actions so far, as well as its work assignment, things can still go wrong. They write, when you give your agent the same Flappy Bird cloning task this time, you might end up with a bird and background with completely different visual styles. Sub-agent 1 and sub-agent 2 cannot see what the other was doing, and so their work ends up being inconsistent with each other. This leads them to two principles of agent design. Principle 1: share context, and share full agent traces, not just individual messages. Principle 2: actions carry implicit decisions, and conflicting decisions carry bad results. The author writes, I would argue that principles one and two are so critical and so rarely worth violating that you should by default rule out any agent architectures that don't abide by them. You might think that this is constraining, but there is actually a wide space of different architectures you could still explore for your agent. And that's where we get into the idea of the single agent that breaks down the task in much the same way that the orchestration agent does, but then does the tasks sequentially and linearly, carrying the context of conversations and actions with it at each step. Now, this is not a silver bullet.
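Principle 2 is easiest to see with a toy model of the Flappy Bird example. In this minimal sketch, which is an illustration and not Cognition's code, building an asset implicitly decides its art style (the hypothetical `make_asset`). In a sequential chain, the first decision is recorded and every later step sees it. In a parallel fan-out, siblings can't see each other's work, so each falls back on its own hidden preference, and the combined result can conflict even though every individual step "succeeded".

```python
Job = tuple[str, str]  # (asset to build, style this agent would pick on its own)


def make_asset(kind: str, visible: dict[str, str], preferred_style: str) -> tuple[str, str]:
    # Building an asset implicitly decides its art style. If an earlier
    # decision is visible, reuse it; otherwise fall back to this agent's own
    # preference, a hidden choice no other agent sees.
    return kind, visible.get("art_style", preferred_style)


def run_sequential(jobs: list[Job]) -> list[tuple[str, str]]:
    decisions: dict[str, str] = {}
    assets = []
    for kind, preferred in jobs:
        asset = make_asset(kind, decisions, preferred)
        decisions.setdefault("art_style", asset[1])  # later steps see this
        assets.append(asset)
    return assets


def run_parallel(jobs: list[Job]) -> list[tuple[str, str]]:
    # Sibling sub-agents may share the parent's context but not each other's
    # work, so each starts with no visible style decision.
    return [make_asset(kind, {}, preferred) for kind, preferred in jobs]


def has_style_conflict(assets: list[tuple[str, str]]) -> bool:
    return len({style for _, style in assets}) > 1
```

With `jobs = [("background", "pixel art"), ("bird", "realistic 3D")]`, the sequential run produces two pixel-art assets, while the parallel run produces a pixel-art background and a realistic bird, which is exactly the mismatch the final agent is left to reconcile.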

Starting point is 00:19:57 As they point out, you might run into issues for very large tasks with so many subparts that context windows start to overflow. However, the author argues for a set of different approaches that can solve that problem. One example they gave of a slightly different architecture for longer tasks still uses the same chain of the original agent doing the sub-tasks, but with another LLM providing context compression along the way. Instead of handing off the full context, a side process compresses the key moments and decisions of the conversation and actions so far into a smaller, more manageable set of context

Starting point is 00:20:27 that can be used for the next part of the sequence. Cognition writes, this is hard to get right. It takes investments into figuring out what ends up being the key information and creating a system that is good at this. So what to make of these differences in approach? First of all, it's important to remember that we are talking about different sets of use cases here. Cognition is narrowly focused on coding agents,

Starting point is 00:20:48 and although the author is not strictly talking about coding agents, this is the main example they're drawing from as they talk about building a high-performance agent that can take on long tasks while retaining context throughout. Anthropic, on the other hand, is talking specifically about workflows that lend themselves to parallelization. Basically, any workflow where the subtasks are completely independent and discrete, and don't rely on one another in how they need to be recombined, could be a good candidate for using

Starting point is 00:21:14 multi-agent systems. They even note that not every use case is a good fit for the type of system they're talking about. They write, some domains that require all agents to share the same context or involve many dependencies between agents are not a good fit for multi-agent systems today. For instance, most coding tasks involve fewer truly parallelizable tasks than research, and LLM agents are not yet great at coordinating and delegating to other agents in real time. We found that multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds

Starting point is 00:21:44 single context windows, and interfacing with numerous complex tools. So when it comes to which of these approaches you or your company should take, as always, context is king and the use case matters. Are the subtasks dependent upon one another, or are they truly independent, so that they can be done in parallel? The answer to that is going to dictate which of these systems might be the better choice. The other point to remember is that this is very much a snapshot in time. The Cognition author even writes that they're optimistic about the long-term possibilities of agents collaborating with one another. It's just that right now, in their estimation, running multiple agents in collaboration at today's capability levels results in fragile systems.
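The two questions raised in this episode, whether the subtasks are truly independent and whether the task is valuable enough to justify the extra tokens, can be written down as a rough rule of thumb. This is a minimal sketch, not a method from either post. The 15x multiplier is Anthropic's reported average for multi-agent systems relative to chat. The per-token price, the chat-token baseline, and the task's dollar value are all inputs you would have to estimate yourself.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    depends_on: tuple[str, ...] = ()


def all_independent(subtasks: list[Subtask]) -> bool:
    # True only if no subtask needs another subtask's output.
    return all(not s.depends_on for s in subtasks)


def choose_architecture(
    subtasks: list[Subtask],
    chat_tokens: int,
    usd_per_token: float,
    task_value_usd: float,
    multi_agent_multiplier: float = 15.0,
) -> str:
    # Rough multi-agent cost, using Anthropic's "around 15x the tokens of a
    # chat" figure; the price per token is an assumption supplied by the caller.
    est_cost = chat_tokens * multi_agent_multiplier * usd_per_token
    if all_independent(subtasks) and task_value_usd > est_cost:
        return "multi-agent"
    return "single-agent"
```

A board-member lookup across companies (no dependencies) would come out as "multi-agent" if the task is worth more than the estimated token bill. A Flappy Bird build, where an "integrate" step depends on both the background and the bird, falls back to "single-agent" regardless of value.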

Starting point is 00:22:24 As you might imagine, there are very few areas of development right now that have more sheer tonnage of brainpower working on them than these questions of multi-agent systems. So while much of the Cognition post is well considered, and, as we discussed, Anthropic even agrees with it in some ways, the constraints of today are unlikely to be the constraints of tomorrow. Still, these are two really valuable pieces, and hopefully, even for a lay audience listening to this, they give you a better sense of how agents are being built right now and how you might want to think about them. For now, that's going to do it for today's AI Daily Brief. Until next time, peace.

