The AI Daily Brief: Artificial Intelligence News and Analysis - Should You Build Single Agents or Multi Agent Systems?
Episode Date: June 18, 2025
Companies have two main options for building agent systems. Anthropic suggests multi-agent setups, dividing tasks between separate sub-agents. Devin's creators prefer single-agent or linear setups..., maintaining context clearly and consistently for tasks like coding.
Anthropic: https://www.anthropic.com/engineering/built-multi-agent-research-system
Cognition: https://cognition.ai/blog/dont-build-multi-agents
Get Ad Free AI Daily Brief: https://patreon.com/AIDailyBrief
Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
AGNTCY - The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at agntcy.org
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The automation platform for AI experts and consultants https://useplumb.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network
Transcript
Today on the AI Daily Brief,
should you be building single agents or multi-agent systems?
Before that on the headlines,
it's apparently Settlers of Catan, but with AI companies.
The AI Daily Brief is a daily podcast and video
about the most important news and discussions in AI.
Hello friends, quick announcements.
Thanks as always to our sponsors for today's show,
KPMG, Blitzy, Vanta, and Superintelligent.
To get an ad-free version of the show,
go to patreon.com slash AI Daily Brief,
and a quick flag, as I mentioned last week,
we are finalizing some sponsorship sales of fall inventory, so if you are interested in sponsoring
the show, shoot me a note, NLW at breakdown.network, and I can share more information. For now,
though, let's get into a very interesting behind the scenes look at what's actually going on with all
these big AI labs. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news
you need in around five minutes. Things are getting a little feisty in AI land. We're seeing
more and more aggressive competition when it comes to data,
with even some companies that were previously partners starting to be at loggerheads with one another.
In particular, tensions between OpenAI and Microsoft are flaring up as OpenAI's for-profit conversion
stalls out. Last month, you'll remember that OpenAI abandoned plans to ditch their non-profit
structure, but remained committed to converting to a public benefit company. The conversion would still
unlock normal corporate structure with regular equity shares and a path to going public. Microsoft
were reportedly the only ones on the cap table in opposition to the conversion, given their
lucrative revenue and profit-sharing agreement. Microsoft is currently entitled to a 20% share of
revenues and a 49% share of profits. It's all very speculative, but the deal is capped out at around
120 billion. Negotiations have been ongoing since October when the conversion was first contemplated,
but the Wall Street Journal reports that they have become more difficult in recent weeks. The Journal writes,
OpenAI's executives have discussed what they view as a nuclear option, accusing Microsoft
of anti-competitive behavior during their partnership. Now, that would involve seeking a federal regulatory review of
the investment contract for antitrust violations. OpenAI is also considering a PR campaign to turn public
opinion against Microsoft. Issuing a joint statement, however, the two companies wrote,
We have a long-term productive partnership that has delivered amazing AI tools for everyone.
Talks are ongoing and we are optimistic that we will continue to build together for years to come.
Sources also told the Journal that the Windsurf acquisition was part of the problem.
Currently, Microsoft has full access to all of OpenAI's IP, including models and systems,
but OpenAI doesn't want Microsoft to gain access to Windsurf's technology,
which would make GitHub Copilot more competitive as an AI coding product.
A Windsurf spokesperson said that the acquisition was, quote-unquote, speculative,
suggesting the deal is still in limbo while the dispute is ongoing.
The rest of the negotiations are about how many shares Microsoft will get post-conversion,
as well as OpenAI wanting to be released from exclusivity agreements.
The Information reports that OpenAI is offering Microsoft a 33% stake in the restructured company,
which is a huge haircut from the 49% profit-sharing agreement.
Regarding the exclusivity agreement, The Information writes,
OpenAI has told investors it wants to get out of its exclusive cloud contract with Microsoft,
which makes Microsoft the only cloud provider that offers OpenAI models for sale through an API.
Microsoft rivals Amazon and Google could jump at the chance to host OpenAI models on their servers,
which would make it easier for their cloud customers to use them.
They also note that Google has already lobbied the government to kill the exclusive deal.
Microsoft is also seeking to access OpenAI's technology past the current end date of 2030,
with an ongoing debate about the definition of AGI,
and lurking behind all of this
is that OpenAI faces an end-of-year deadline
to complete its restructuring
or to potentially lose out on at least $20 billion
in committed fundraising.
I would characterize the community's response to this
as not all that surprised.
Matthew Berman writes,
No one could have seen this coming.
It's kind of like the Thucydides trap.
Eventually Microsoft and OpenAI will go to war.
Sam Altman has ambitions far greater
than being the model provider powering Microsoft Copilot.
He wants OpenAI to
be Microsoft, Apple, and Google all rolled into one. It'll be interesting to see how this plays out.
Now, in another area of competition around all of these big labs, Google is cutting ties with
Scale AI following their deal with Meta. Citing five sources, Reuters reports that Google is
ending their relationship with Scale after Meta sort of acquired them. Google had been Scale AI's number
one customer and was planning to spend about $200 million this year on data labeling. Reuters'
sources added that Microsoft and xAI were also planning to end their contracts. Reportedly,
Scale brought in $870 million in revenue last year, with $150 million of it coming from Google.
A Scale spokesperson declined to comment on the Google relationship, but insisted the company's
business remained strong. They added that the startup will continue to operate independently
and safeguard customers' data. For competitors, this is a big opportunity. Labelbox CEO Manu Sharma
said that they would generate hundreds of millions of new revenue from customers fleeing
Scale. Garrett Lord, the CEO of Handshake, said, our demand has tripled overnight after the news.
For some, this opens the question of what exactly Meta just paid $15
billion for. Signal writes,
Meta paying $15 billion for 49% implies around a $30 billion post-money valuation.
But that price was set before the recent exodus, OpenAI, Google, possibly others,
shifting to in-house infrastructure or rivals.
If those departures crater revenue tied to that 51%, then the implied valuation from
the Meta deal is massively distorted post-facto.
Zuck basically ends up with a business where $15 billion got you an entire company without
actually buying the whole company.
Now, if you think that that means that it's increasingly looking like most of the
value of the deal seems to be about bringing CEO Alexander Wang and key members of his team in
house, you are not alone in that feeling. Indeed, details are emerging from inside the massive deal,
explaining exactly how this whole thing went down. The information reports that the deal started
with Mark Zuckerberg floating a relatively modest, or at least comparably modest, $5 billion
investment into Scale back in mid-April. Even at that early stage, he was primarily negotiating to secure
CEO Alexander Wang as the leader of his superintelligence team rather than the company itself. Wang
countered with $20 billion, arguing that existing shareholders wouldn't go for a smaller investment
that risked destroying value in the company as customers fled. The pair eventually met at $14.3
billion for just under half of the company. The report states, this ultimately represents a $12.8 billion
windfall for existing stockholders via a dividend paid for by Meta's investment. That includes over a billion
dollars for Wang himself, according to sources close to the company. They said the payout will
vest over five years as long as Wang remains employed at Meta, and there have also been discussions
of Wang taking on the title of Chief AI Officer. The Information writes,
Zuckerberg's deputies were concerned that Meta was paying a lot to be a minority shareholder
with few rights in a startup that was losing its founder. The startup had also recently failed to meet
its own financial projections. Behind the scenes, they worked to shift the deal more in Meta's favor.
They report that if Scale sells within the next two and a half years,
Meta will get fully paid back before other investors see a dime. The article also suggests that
Meta is walking a tightrope between financial, regulatory, and strategic concerns.
It all seems pretty clear based on all the reporting so far
that what Zuckerberg cares about is getting back in the AI race
and he believes very strongly that Wang is the person to lead that effort.
This is a fascinating report with just a ton of hardball on how the deal came together.
At one stage, Meta asked for full voting rights,
aside from the appointment of board members.
Meta also asked for a poison pill that would inflate their control
if Alexander Wang was found to be helping to run Scale while working at Meta.
Basically, the overall tenor of all of this
is that both sides knew the deal risked destroying value at the startup and needed some form of
protection. Through their liquidation preference, Meta secured a return on their investment
if Scale is sold off. For existing investors and staff, the protection was a lucrative payout now
rather than risking the company falling off. This deal, it feels, is going to be very binary.
Either it's going to make Zuckerberg look like a genius or an absolutely insane person,
and only time is going to tell which. For now, that is going to do it for today's
AI Daily Brief headlines. Next up, the main episode.
Today's episode is brought to you by KPMG.
In today's fiercely competitive market,
unlocking AI's potential could help give you a competitive edge,
foster growth, and drive new value.
But here's the key.
You don't need an AI strategy.
You need to embed AI into your overall business strategy
to truly power it up.
KPMG can show you how to integrate AI and AI agents
into your business strategy in a way that truly works
and is built on trusted AI principles and platforms.
Check out real stories from KPMG to hear how AI
is driving success with its clients at www.kpmg.us slash AI. Again, that's www.kpmg.us
slash AI. This episode is brought to you by Blitzy. Now, I talk to a lot of technical and business
leaders who are eager to implement cutting-edge AI. But instead of building competitive moats,
their best engineers are stuck modernizing ancient code bases or updating frameworks just to keep
the lights on. These projects, like migrating Java 17 to Java 21, often mean staffing a team for a
year or more. And sure, co-pilots help, but we all know they hit context limits fast, especially
on large legacy systems. Blitzy flips the script. Instead of engineers doing 80% of the work,
Blitzy's autonomous platform handles the heavy lifting, processing millions of lines of code and
making 80% of the required changes automatically. One major financial firm used Blitzy to modernize
a 20 million line Java code base in just three and a half months, cutting 30,000 engineering
hours and accelerating their entire roadmap. Email Jack at Blitzy.com with Modernize in the subject
line for prioritized onboarding. Visit blitzy.com today before your competitors do.
Today's episode is brought to you by Vanta. In today's business landscape, businesses can't just
claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA,
GDPR, and more is how businesses can demonstrate strong security practices. The problem is that
navigating security and compliance is time-consuming and complicated. It can take months of work
and use up valuable time and resources. Vanta makes it easier and faster by automating compliance
across 35 plus frameworks. It gets you audit ready in weeks instead of months and saves you up to
85% of associated costs. In fact, a recent IDC White Paper found that Vanta customers achieved
$535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the
numbers. More than 10,000 global companies trust Vanta. For a limited time, listeners get $1,000 off at vanta.com
slash NLW. That's V-A-N-T-A.com for $1,000 off.
Today's episode is brought to you by Superintelligent, specifically agent readiness audits.
Everyone is trying to figure out what agent use cases are going to be most impactful for
their business, and the agent readiness audit is the fastest and best way to do that.
We use voice agents to interview your leadership and team, and process all of that information
to provide an agent readiness score, a set of insights around that score, and a set of
highly actionable recommendations on both organizational gaps and high-value agent use cases that
you should pursue. Once you've figured out the right use cases, you can use our marketplace
to find the right vendors and partners. And what it all adds up to is a faster, better agent
strategy. Check it out at besuper.ai or email agents at besuper.ai to learn more.
Welcome back to the AI Daily Brief. Today we have something really interesting,
almost a point-counterpoint on agent architectures, which is particularly pertinent as the conversation
around agentic systems becomes more mainstream and more endemic inside the enterprise.
On the one hand, we have Anthropic, which published a piece called How We Built Our Multi-Agent
Research System. It's a really in-depth piece that shows the architecture of their research system,
featuring an orchestrator, subagents, tools, etc. And then on the flip side, we have a blog post
from the creators of Devin, the AI code generation tool, with an argument against building multi-agent
systems.
This is obviously a hot topic.
For those of you who remember my coverage of Microsoft Build, one of the things that was
really notable is that they skipped right on over the stage where we were all excited about
some single agent that could do a bunch of stuff and instead moved right into this multi-agent
orchestration era, where they're building infrastructure for and thinking about and pushing on
their customers, more complex agentic systems, where multiple agents can come together to accomplish
more complete and extensive tasks. So let's dig into both of these arguments and figure out what
we have to learn from each of them. And let's start with Anthropic. Anthropic's paper talks about
multi-agent systems using Anthropic's Research Agent as an example. The research agent drives
Anthropic's version of deep research, with the basic idea being to ensure that the system can search
out information across the web, collate it, and produce a final report. Anthropic achieves this
through a multi-agent system. They write in their section titled Benefits of a Multi-Agent
system, research work involves open-ended problems where it's very difficult to predict the required
steps in advance. You can't hard code a fixed path for exploring complex topics as the process is
inherently dynamic and path-dependent. This unpredictability makes AI agents particularly well
suited for research tasks. The model must operate autonomously for many turns, making decisions about
which directions to pursue based on intermediate findings. A linear one-shot pipeline cannot handle
these tasks. And so Anthropic instead turns to a multi-agent research system.
The system uses a central lead agent that orchestrates the process and multiple subagents
that access search tools.
A user request comes to the lead agent, which then strategizes, the task is broken down
into different chunks, and instructions are provided to subagents to carry out each portion
of the task.
The subagents can iterate on their search and go a little deeper if they need to, but they're
ultimately designed to find very granular individual pieces of information.
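The lead-agent and sub-agent flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `run_subagent`, `plan`, `run_with_retry`, and `lead_agent` are hypothetical names, the sub-agent is a stub rather than a model with search tools, and the retry simply re-runs the same instruction.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(instruction: str) -> str:
    """Hypothetical stand-in for a sub-agent; a real one would call a model with search tools."""
    if not instruction.strip():
        raise ValueError("empty instruction")
    return f"findings for: {instruction}"


def plan(request: str, topics: list[str]) -> list[str]:
    """Hypothetical planning step: one narrow instruction per topic."""
    return [f"{request} ({topic})" for topic in topics]


def run_with_retry(instruction: str, attempts: int = 2) -> str:
    """Re-run a failed sub-task instead of failing the whole job.

    Assumption: this sketch retries the same instruction; a real lead agent
    might rewrite the instruction before trying again.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return run_subagent(instruction)
        except Exception as error:
            last_error = error
    return f"FAILED: {instruction} ({last_error})"


def lead_agent(request: str, topics: list[str]) -> str:
    """Break the request into chunks, fan out in parallel, then collate."""
    instructions = plan(request, topics)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map returns results in the same order as the instructions.
        results = list(pool.map(run_with_retry, instructions))
    return "\n".join(results)


report = lead_agent("Summarize recent filings", ["Company A", "Company B"])
print(report)
```

Each sub-agent here only ever sees its own narrow instruction, which is what makes the fan-out cheap to parallelize.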
Anthropic gives the example of using the research feature to find a list of every board
member across S&P 500 companies. In that example, each sub-agent might be tasked with looking
for a single company's list of board members, and once those individual results are returned,
the lead agent collates them into a single report. There are lots of benefits to the method that
Anthropic describes. Tasks are run in parallel with dozens of sub-agents working at the same
time on different tasks, which, as you would imagine, significantly increases speed. It also means
that each individual sub-agent is less critical. In single-agent systems, a single-failed task could
break the entire workflow, whereas with a multi-agent system, the orchestration agent can just spawn a
new sub-agent, modify the instructions, and try again. Anthropic found that they could run tasks using
lesser models as sub-agents and actually come up with improved results. A multi-agent system with
Claude 4 Opus as the orchestrator and Claude 4 Sonnet sub-agents outperformed a single Claude 4
Opus agent by 90.2% on their internal evaluations. Multi-agent systems also allow you to work around
context limits. The sub-agents can use as many tokens as they need to complete a task,
which is a small fragment of the overall workflow. You're essentially working across multiple
context windows that all feed a compressed answer back to the lead agent. For complex tasks,
this avoids having the issue of a single agent hitting the context limit before the task is
complete. Indeed, Anthropic basically says that there's nothing all that complicated about
why these multi-agent systems work. They write, multi-agent systems work mainly because they help
spend enough tokens to solve the problem. We found that token usage by itself
explains 80% of the improvement of multi-agent systems. In the particular use case of research on
the internet, Anthropic argued that a multi-agent system is the most suitable. Now, this ability to
use more tokens is also a downside. Multi-agent systems can be extremely expensive to run. They write,
in our data, agents typically use around four times more tokens than chat interactions,
and multi-agent systems use around 15 times more tokens than chats. They write,
multi-agent systems require tasks where the value of the task is high enough to pay for the increased
performance. All right, so that is the multi-agent research system example. But what about this blog post
titled simply Don't Build Multi-Agents? This one, as I said, comes from Cognition, who are mainly
focused on building an AI agentic coding platform. Author Walden Yan writes,
frameworks for LLM agents have been surprisingly disappointing. I want to offer some principles
for building agents based on our own trial and error and explain why some tempting ideas are
actually quite bad in practice. Basically, the argument here is that,
instead of building a multi-agent system, which they argue can be quite fragile,
build a single agent that can reliably complete tasks end-to-end.
Now, what they're not suggesting is some simple architecture of just using one giant agent,
loading it up with context and letting it run, but instead they're suggesting that users
should try a linear handoff design. They cover this in a few different levels of complexity
with differences in how the handoff is achieved, but the basic idea is that the first agent
receives the query and breaks the request down into steps, very similar to the orchestrating agent
in a multi-agent system, but then it hands off the necessary context to a sub-agent to complete the
first sub-task. This sub-agent then hands off the original context with their additions to the next
sub-agent and so on and so forth. And context is the key word here. In fact, the big term that
they're focused on in this piece is context engineering. They write, in 2025, the models out there
are extremely intelligent, but even the smartest human won't be able to do their job effectively
without the context of what they're being asked to do.
Prompt engineering was coined as a term for the effort needed to write your task in the ideal
format for an LLM chatbot.
Context engineering is the next level of this.
It's about doing this automatically in a dynamic system.
It takes more nuance and is effectively the number one job of engineers building AI agents.
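The linear handoff described above, where each step receives the original context plus everything added so far, can be sketched as follows. This is a minimal illustration, not Cognition's code: `Context`, `plan`, `run_step`, and `run_linear` are hypothetical names, and the planner and worker are stubs where a real system would call a model.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Everything produced so far: the original request plus every step's output."""
    request: str
    trace: list[str] = field(default_factory=list)


def plan(request: str) -> list[str]:
    """Hypothetical planner; a real one would ask a model to break the request down."""
    parts = request.split(",")
    return [f"step {i}: {part.strip()}" for i, part in enumerate(parts, start=1)]


def run_step(step: str, context: Context) -> str:
    """Hypothetical worker: it can read the request and all earlier results."""
    return f"{step} (done with {len(context.trace)} earlier results visible)"


def run_linear(request: str) -> Context:
    """Run the steps one after another, handing the growing context forward."""
    context = Context(request)
    for step in plan(request):
        context.trace.append(run_step(step, context))
    return context


result = run_linear("design the schema, write the API")
for line in result.trace:
    print(line)
```

The trade-off is visible in the loop: nothing runs in parallel, but no step ever works without knowing what came before it.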
By way of example, they show a simple schematic of a main agent that breaks down a task
into two sub-tasks, spinning up two sub-agents to do those tasks, with the main agent
then combining the results. They write, this is a tempting architecture, especially if you work in a domain
of tasks with several parallel components to it. However, it's very fragile. The key point of failure is this.
They then use an example of building a Flappy Bird clone. They write, this task gets divided into
subtask one, build a moving game background with green pipes and hitboxes, and subtask two,
build a bird you can move up and down. It turns out sub-agent one actually mistook your sub-task
and started building a background that looks like Super Mario Bros.
Sub-agent 2 built you a bird, but it doesn't look like a game asset,
and it moves nothing like the one in Flappy Bird.
Now the final agent is left with the undesirable task of combining these two miscommunications.
The issue here, or the concern at least,
is that the sub-agents don't have enough context to actually deliver the result
that the context requires.
The blog writes,
you might think that a simple solution would be to just copy over the original task
as context to the sub-agents as well,
that way they don't misunderstand their sub-task.
But remember that in a real production system, the conversation is most likely multi-turn,
the agent probably had to make some tool calls to decide how to break down the task,
and any number of details could have consequences on the interpretation of the task.
Unfortunately, they suggest that just giving the context still might not be enough,
or at least doesn't address all the problems.
In the second scenario, where each of the sub-agents has the context of the conversation
and actions so far, as well as their work assignments, you still could have problems.
They write, when you give your agent the same Flappy Bird cloning task this time, you might end up with a bird and background with completely different visual styles.
Sub-agent 1 and sub-agent 2 cannot see what the other was doing, and so their work ends up being inconsistent with each other.
This leads them to two principles of agent design.
Principle 1, share context, and share full agent traces, not just individual messages.
And principle 2, actions carry implicit decisions and conflicting decisions carry bad results.
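To make principle 1 concrete, here is a small sketch contrasting a brief that carries only the sub-task with one that carries the full trace. The `Event` type, both helper functions, and the "pixel-art style" decision are hypothetical and exist only for illustration; they are not taken from the Cognition post.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str      # e.g. "user", "tool_call", "decision"
    content: str


def narrow_brief(subtask: str) -> str:
    """Only the individual message: earlier decisions are invisible."""
    return f"Your task: {subtask}"


def full_trace_brief(subtask: str, trace: list[Event]) -> str:
    """The full trace: every prior message and decision travels with the task."""
    history = "\n".join(f"[{event.kind}] {event.content}" for event in trace)
    return f"History so far:\n{history}\n\nYour task: {subtask}"


trace = [
    Event("user", "Build a Flappy Bird clone"),
    Event("decision", "Use a pixel-art style"),  # hypothetical earlier decision
    Event("decision", "Sub-agent 1 builds the background with green pipes"),
]
subtask = "Build a bird you can move up and down"

print(narrow_brief(subtask))
print(full_trace_brief(subtask, trace))
```

With the narrow brief, the agent building the bird has no way to learn about the style decision, which is the kind of implicit conflict principle 2 warns about.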
The author writes,
I would argue that principles one and two are so critical and so rarely worth violating that you should by default rule out any agent architectures that don't abide by them. You might think that this is constraining, but there is actually a wide space of different architectures you could still explore for your agent. And that's where we get into the idea of the single agent that breaks down the task in much the same way that the orchestration agent does, but then does the tasks sequentially and linearly, carrying the context of conversations and actions with it at each step. Now, this is not a silver bullet.
As they point out, you might run into issues for very large tasks with so many subparts that
context windows start to overflow.
However, the author argues for a set of different approaches that can solve that problem.
So one example they gave of a slightly different architecture for longer tasks is still using
the same chain of the original agent doing the sub-tasks, but with another LLM providing context
compression along the way.
Instead of handing off the full context, a side process compresses the key moments and decisions
of the conversation and actions so far into a smaller, more manageable set of context
that can be used for the next part of the sequence.
Cognition writes, this is hard to get right.
It takes investments into figuring out what ends up being the key information
and creating a system that is good at this.
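The compression idea can be sketched as a rolling summary plus a short window of recent entries. This is a toy illustration under loud assumptions: `CompressedContext`, `summarize`, and `add_entry` are hypothetical names, and the summarizer just keeps the first few words of each entry, where the approach described above would use another LLM to pick out the key moments and decisions.

```python
from dataclasses import dataclass, field


@dataclass
class CompressedContext:
    summary: str = ""                                 # compressed older history
    recent: list[str] = field(default_factory=list)   # latest entries, kept verbatim


def summarize(summary: str, entries: list[str], words: int = 4) -> str:
    """Toy stand-in for an LLM summarizer: keep the first few words of each entry."""
    shortened = [" ".join(entry.split()[:words]) for entry in entries]
    parts = ([summary] if summary else []) + shortened
    return "; ".join(parts)


def add_entry(context: CompressedContext, entry: str, keep_recent: int = 2) -> None:
    """Append an entry; fold anything older than the recent window into the summary."""
    context.recent.append(entry)
    if len(context.recent) > keep_recent:
        overflow = context.recent[:-keep_recent]
        context.summary = summarize(context.summary, overflow)
        context.recent = context.recent[-keep_recent:]


context = CompressedContext()
for i in range(1, 6):
    add_entry(context, f"decision {i} with a long explanation attached")

print(context.summary)
print(context.recent)
```

The hard part the quote points to lives entirely inside `summarize`: deciding which details are safe to drop is what takes the investment.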
So what to make of these differences in approach?
First of all, it's important to remember that we are talking about
different sets of use cases here.
Cognition is narrowly focused on coding agents,
and although the author is not strictly talking about coding agents,
this is a main example that they're drawing from
as they talk about building a high-performance agent that can take on long tasks
while retaining context throughout.
Anthropic, on the other hand, are talking specifically about workflows that lend themselves
to parallelization.
Basically, any workflow where the subtasks are completely independent and discrete, and don't
rely on one another, in how they need to be recombined, could be a good candidate for using
multi-agent systems.
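One rough way to check whether a workflow fits this description is to write down which subtasks depend on which, then see how many can start at once. The sketch below uses Python's standard `graphlib` module; the function name `parallel_waves` and both example dependency maps are hypothetical and chosen only to illustrate the contrast.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; everything in one wave can run at the same time.

    `dependencies` maps each subtask to the set of subtasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical research-style job: every lookup is independent.
research = {"company A": set(), "company B": set(), "company C": set()}

# Hypothetical coding-style job: each piece builds on the previous one.
coding = {
    "background": set(),
    "bird": {"background"},
    "collisions": {"background", "bird"},
}

print(parallel_waves(research))
print(parallel_waves(coding))
```

A single wide wave suggests a good candidate for parallel sub-agents; a long chain of single-item waves suggests the work is effectively sequential.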
They even note that not every use case is good for this type of system that they're talking
about.
They write, some domains that require all agents to share the same context or involve many
dependencies between agents are not a good fit for multi-agent systems today. For instance,
most coding tasks involve fewer truly parallelizable tasks than research, and LLM agents are not
yet great at coordinating and delegating to other agents in real time. We found that multi-agent
systems excel at valuable tasks that involve heavy parallelization, information that exceeds
single context windows, and interfacing with numerous complex tools. So when it comes to which
of these approaches you or your company should take, as always, context is king and the use case
matters. Are the subtasks dependent upon one another, or are they truly independent and can be done
in parallel? The answer to that is going to dictate which of these systems might be the better
choice. The other point to remember is that this is very much a snapshot in time. The Cognition
author even writes that they're optimistic about the long-term possibilities of agents
collaborating with one another. It's just that right now, in their estimation, running multiple
agents in collaboration at today's capability levels results in fragile systems.
As you might imagine, there are very few areas of development right now that have more sheer
tonnage of brainpower on solving some of these issues than these questions of multi-agent systems.
So while much of the Cognition post is well considered, and, as we discussed, Anthropic
even agrees with it in some ways, the constraints of today are unlikely to be the constraints
of tomorrow. Still, these are two really valuable pieces, and hopefully, even for a lay audience that's
listening to this, they give you a better sense of how agents are being built right now and how you
might want to think about them. For now, that's going to do it for today's AI Daily Brief. Until next time,
peace.
