The AI Daily Brief: Artificial Intelligence News and Analysis - Walmart Blasts Past Agent Experimentation
Episode Date: July 29, 2025
Walmart has evolved from individual AI agents to "agent orchestration" - a unified system of four "super agents" that coordinate specialized sub-agents across their entire operation. This shift represents moving beyond the experimentation phase to full-scale agentic systems, featuring Sparky (customer shopping agent), Marty (partner/supplier agent), and agents for employees and developers.
Ask GPT about our Agent Readiness Audits - https://bit.ly/supersuperagent
Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
AGNTCY - The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at agntcy.org
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The automation platform for AI experts and consultants https://useplumb.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network
Transcript
Today on the AI Daily Brief, Walmart moves from agent experimentation to agent orchestration,
and before that in the headlines, reports that the forthcoming GPT-5 could be very, very good at coding.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Hello, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, Blitzy, Vanta, Plumb, and Superintelligent.
As always, to get an ad-free version of the show, go to patreon.com slash AI Daily Brief.
And if you are interested in sponsoring the show, shoot me a note at nlw@breakdown.network.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes.
GPT-5 fever is in the air and we are getting increasingly specific rumors as the much-anticipated model gets closer and closer.
Specifically, the recent scuttlebutt is that, as The Information's article puts it, OpenAI's GPT-5 shines in coding tasks.
Writes Stephanie Palazzolo, GPT-5 is almost here and we're hearing good things.
The early reaction from at least one person who's used the unreleased version was extremely positive.
GPT-5 shows improved performance in a number of domains, including the hard sciences,
but the most notable improvement comes in software engineering.
GPT-5 is not only better at academic and competitive programming problems,
but also at more practical programming tasks that real-life engineers might handle,
like making changes in a large, complicated codebase full of old code.
That nuance has been something OpenAI's models have struggled with in the past, and is one reason why
rival Anthropic has been able to keep its lead with many app developer customers.
And if you are a regular listener, you will know that honestly at this point, Anthropic's dominance of
coding use cases has to be one of the longest-running leads that we've seen in the foundation
model space since ChatGPT kicked things off a couple of years ago. Going all the way back
to Sonnet 3.5, Anthropic models really have been the default for coding use cases, although
certainly Gemini's 2.5 suite, including Flash, has challenged that more recently.
Now, in addition to that report from The Information, there have also been more suspected sightings
of GPT-5 in the wild. Last week, a new model codenamed Summit showed up on LMArena.
Professor Ethan Mollick put it through its paces with his preferred custom benchmark, posting,
Kind of amazing: the mystery model Summit, with the prompt,
create something I can paste into p5.js that will startle me with its cleverness in creating
something that invokes the control panel of a starship in the distant future, and make it better.
2,351 lines of code first time. He showed the results, and by the way, if you are listening
rather than watching, this is worth maybe jumping over to YouTube or Spotify for.
It's a ridiculously detailed control panel, unlike anything a previous model had generated.
Grok 4's version was cool but far less detailed, while six months ago the results were fairly
basic. Now, I'm not necessarily the hugest fan of custom coding benchmarks like the hexagon
test because they tend to be a little bit one-dimensional. What's nice about Ethan's prompt
is that it touches on coding, creativity, and planning all in one hit. Now, aside from how visually
impressive the result was, getting over 2,000 lines of functional code out of a one-line prompt
is no small feat. Another mystery model that people were enraptured with was called Zenith.
Justine Moore from a16z writes, I'm blown away by some of the outputs from the new mystery model on
LMArena. It's called Zenith and it seems to be good at a bunch of things, but I find the one-shot
coding of functional games (this is Minecraft) to be particularly impressive. AI Battle wrote,
The Zenith model that is being tested in LMArena is producing some amazing outputs. With just a
single prompt, it generated gun sounds, sprinting mechanics, a mini-map, and detailed textures
for a Doom-style game. Ilker tweeted, I can't believe what I just saw. While testing robot SVG drawing
on LMArena, I stumbled on the best SVG model I've seen so far. Outputs an actual animated SVG. Model
codename is Zenith. Could this be GPT-5? Well, for those who are looking to actually get in there
and test this themselves, last night, Vrasser X noted that Summit and several other codenamed variants
had been pulled. The models that have been removed include Summit, Zenith, Starfish,
Nectarine, and Lobster, to which Vrasser X responded, all the GPT-5 models have officially left
the web dev arena. The release is imminent. Get ready, folks. Now, speaking of the coding use case,
Google is testing out a new vibe coding tool of their own. Called Opal, the tool is
in some ways a little bit closer to n8n than it is to Lovable or Replit. The idea is to allow
non-technical users to create apps using natural language prompts, but the point of differentiation
is that Opal is geared towards mini apps that live in Google's AI Studio rather than big, fully
functioning experiences of the type that people are going to Lovable or Bolt for. In this way,
it's a little closer to a workflow automation tool that allows users to chain together multiple
prompts and tap Google's range of models to generate outputs. Google demonstrated the product being used
to auto-generate blog posts, tapping into text, image, and video models along the way.
Google also seems to be leaning into the social aspect of vibe coding with a remix gallery
and a set of samples to get you started.
Eric Friedman wrote,
Just tried out Opal, the new AI tool from Google.
Wow, things are changing very fast.
I've seen a bunch of startups and tools like this, but seeing Google roll it out so quickly
is wild.
And you can see in Eric's tweet an interface that will be very familiar to you if you've
used n8n or Lindy or anything like that.
VC Nabeel Hyatt writes,
Interesting experiment in the AI apps workflow builder category, even if I'm pretty sure I never
want to see another node-based graph in my life. From where I'm sitting, every single type of
use case around coding, low-code and no-code app development, etc., is going to be one of the
most significant areas of development for the foreseeable future. Moving over to Funding Land, Anthropic
is apparently fielding offers for fundraising that would value the startup at $150 billion.
The Information reports that they are now in early discussions to raise between $3 and $5 billion
and almost triple their $61 billion valuation from March.
This is a big jump up from the reported interest
that they were fielding earlier this month at $100 billion.
But of course, Anthropic has not only reached $4 billion in ARR this summer
up from a billion at the beginning of the year,
the rate at which they are growing has also increased fairly dramatically.
After the leaked memo from last week,
where CEO Dario Amodei said that they were going to be more flexible
and actually consider taking money from Gulf State investors,
the latest reporting suggests that the new interest is coming from
Abu Dhabi state-affiliated fund MGX. Now, MGX already owns a roughly 8% stake in Anthropic,
which they purchased last year in the secondary market from bankrupt crypto firm FTX.
The numbers seem high until you realize that $150 billion is a bit under 40x revenue,
which, while meaningful, is a lot less crazy than it might appear at first glance.
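For anyone who wants to check that multiple, here is a minimal Python sketch using only the two figures quoted above, a $150 billion valuation and $4 billion in ARR. The function name is purely illustrative and not from any report.

```python
def revenue_multiple(valuation_usd: float, annual_revenue_usd: float) -> float:
    """Return the valuation divided by annualized revenue."""
    if annual_revenue_usd <= 0:
        raise ValueError("annual revenue must be positive")
    return valuation_usd / annual_revenue_usd


# Figures as quoted in the episode: $150B valuation, $4B ARR.
reported_valuation = 150e9
reported_arr = 4e9

multiple = revenue_multiple(reported_valuation, reported_arr)
print(f"{multiple:.1f}x revenue")  # 37.5x revenue
```

So the "40x" in the discussion is a round-up of roughly 37.5x.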
Lastly today, an update in the talent wars. Mark Zuckerberg has installed former OpenAI
researcher Shengjia Zhao as the chief scientist for Meta's superintelligence group.
Zhao worked on multiple iterations of OpenAI's frontier models, including major research contributions
to develop reasoning for o1. He'll be reporting to chief AI officer Alexandr Wang and gives the team a leader
with stronger research credentials. Posted Zuckerberg, In this role, Shengjia will set the research
agenda and scientific direction for our new lab working directly with me and Alex.
Shengjia co-founded the new lab and has been our lead scientist from day one. Now that our recruiting
is going well and our team is coming together, we've decided to formalize his leadership role.
Shengjia has already pioneered several breakthroughs including a new scaling paradigm and distinguished himself
as a leader in the field. I'm looking forward to working closely with him to advance his scientific vision.
Now, if you are wondering how this new position fits with Yann LeCun's role as chief AI scientist for the company,
Zuck added, To avoid any confusion, there's no change in Yann's role. He will continue to be chief
scientist for FAIR. Yann himself responded, My role as chief scientist for FAIR has always been
focused on long-term AI research and building the next AI paradigms. I'm looking forward to working with
Shengjia to accelerate the integration of new research into our most advanced models.
Points out Charles Dehan, dude is three years out of grad school. Wild times, my friends, but for now,
that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode.
This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform
with Infinite Code Context. Blitzy uses thousands of specialized AI agents that think for hours
to understand enterprise-scale code bases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform,
bringing in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool,
pairing it with their coding co-pilot of choice to bring an AI-native SDLC into their org.
Blitzy is providing a limited-time, 30-day free proof of concept for qualifying enterprises.
The team will provide a 5x velocity increase on a real development project in your org.
Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC
from AI-assisted to AI-native.
That's BLITZY.com.
As a founder, you're moving fast towards product market fit, your next round, or your first big enterprise deal.
But with AI accelerating how quickly startups build and ship, security
expectations are higher earlier than ever. Getting security and compliance right can unlock growth or
stall it if you wait too long. With deep integrations and automated workflows built for fast-moving
teams, Vanta gets you audit-ready fast and keeps you secure with continuous monitoring as your models,
infra, and customers evolve. Fast-growing customers like LangChain, Writer, and Cursor trusted Vanta to
build a scalable foundation from the start. And look, as someone who lives in the world of
enterprise procurement, I love how Vanta makes it easy to get compliance right. The last thing
you need when you're trying to win that big deal is to have it scuttled by something that Vanta
has solved for over 10,000 companies. Go to vanta.com slash NLW to save $1,000 today through the Vanta for
Startups Program and join over 10,000 ambitious companies already scaling with Vanta. That's V-A-N-T-A.com
to save $1,000 for a limited time. You keep building the same AI automations for different
clients. Sound familiar? You've got workflows people actually want, but you're stuck in the
services trap, or worse, just selling one-off copies that give away all your IP. Sure, you could
vibe code your way to an entire product, but hosting, authentication, payments, support, it's a lot of
work for a little upside. Or you could just use Plumb. Plumb is the platform where technical creators like
you build an audience of paid subscribers for your AI workflows. Subscribers get updates every time
you improve things, your prompts stay protected, and you finally get recurring revenue. Ready to stop
trading time for money? Visit useplumb.com. That's Plumb with a B.
If you are a regular listener, you will have heard about Superintelligent's Agent Readiness Audits at this point.
But I wanted to tell you today about the full suite of Agent Readiness products that go beyond just the initial Readiness Report.
Over the last six months, Superintelligent has built out an entire Agent Planning Suite.
We help you move from Discovery to Planning to Implementation.
After you've completed your Agent Readiness Audits, we help you double-click on your most important use cases with what we call our Use Case Planning Reports.
These reports are going to help you understand what sort of technical preparation you need to do to be ready for a use case,
what challenges you might face in implementation, and whether you should be thinking about building, buying, partnering, or some combination.
After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for.
If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions.
Just go to bit.ly slash super agent. That's bit.ly slash super agent, all one word. And if you have any
questions, the agent can even help you book an appointment with our team. Welcome back to the AI Daily Brief.
So something interesting happened last week. Walmart made an announcement about their updated agent
strategy, which without digging into it too much, if you had asked me, I would have predicted that
it was sort of just a corporate press release, right? I wouldn't have thought that it would get all that
much attention. We've sort of moved out of the stage at least a little bit where you can talk about
your AI or agent strategy without really saying anything new and get earned media for it just because.
And yet, outlet after outlet after outlet ran stories about Walmart's agent strategy. I thought this
was really interesting and so I started to dig into it a little bit deeper and I actually think
it's an interesting bellwether of where things are headed. The TLDR is that Walmart is moving from an
agent experimentation phase to what I would call an agent orchestration phase. Basically, they're
moving from single spot agents that can do small tasks well to overarching systems where management
agents can orchestrate and coordinate subagents that do those individual tasks into a whole
that is hopefully greater than the sum of its parts. So that's what we're going to talk about today,
but just to give a little bit of background, especially for the fairly decent chunk of you who are not in the US,
it's kind of hard to overstate just how enormous Walmart is.
Walmart is the largest employer in America with over 2.1 million employees.
They are by far the world's largest retailer.
These are 2020 numbers, but more recent numbers had it at $635 billion
as opposed to Amazon's $360 billion.
And even more than that, just in terms of companies ranked by revenue,
Walmart is bigger than Saudi Aramco, PetroChina, and everyone else.
The point being that Walmart is huge.
And so maybe that's part of what's capturing attention.
The other thing that's interesting to note, though, is that despite the fact that they are in the
same space as Amazon, Walmart has in no way shirked or decided to retreat from digital battles
and is one of the most aggressive about connecting how they operate their physical retail stores
with the latest new technology.
Walmart is not just a physical retailer.
It's also an insanely complicated logistics business.
It's got one of the largest e-commerce platforms on the planet, and it also has a sprawling
white-collar organization. In other words, there is lots and lots of room for agents to come in and
find new efficiencies and help them do things differently. Now, the big news was not only
that Walmart is, to use the title of the blog post by Global CTO Suresh Kumar, all in on agents,
but also the particular way that they're thinking about agents. Kumar writes,
I believe in the power of agentic AI to transform industries. We've been building agents
fast for every aspect of the business. Once we saw how quickly teams were adopting these agents
and how helpful they were, we realized agents weren't just useful. They were essential.
However, Kumar and Walmart also found a challenge. We also recognized, he said, that multiple
agents, even if each one is useful, can quickly become overwhelming and confusing. So he writes,
we made a deliberate choice to go beyond individual tools and build a unified company-wide framework,
one that ensures every new agent we roll out makes life simpler and easier for everyone,
for consumers, for customers, for associates, and for our partners. So at the core of the
strategy they announced are four of what Kumar and Walmart are calling super agents. And in many ways,
this is basically an orchestration agent or a boss agent or a management agent or whatever you want to
call it. An agent that sits on top interfaces with its intended audience and can route whatever
they're trying to do to the right subagents. Among the four super agents, there is one for customers,
one for their team and associates, one for their partners, and one for their developers.
The grouping, then, is not by particular task. It's grouped by
particular user and, relatedly, by connectivity to particular sets of data.
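To make that pattern concrete, here is a minimal Python sketch of one audience-specific super agent that routes requests to sub-agents. Everything in it is hypothetical: the class, the sub-agent functions, and the keyword-based routing are illustrative assumptions, not anything Walmart has described about its implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A sub-agent is modeled as a plain function from request text to reply text.
SubAgent = Callable[[str], str]


@dataclass
class SuperAgent:
    """One orchestrator per audience (customers, associates, partners, developers)."""
    audience: str
    sub_agents: Dict[str, SubAgent] = field(default_factory=dict)

    def register(self, keyword: str, agent: SubAgent) -> None:
        """Attach a sub-agent that handles requests mentioning the keyword."""
        self.sub_agents[keyword] = agent

    def handle(self, request: str) -> str:
        # Naive keyword matching stands in for whatever intent detection
        # a real orchestrator would use (an assumption for illustration).
        lowered = request.lower()
        for keyword, agent in self.sub_agents.items():
            if keyword in lowered:
                return agent(request)
        return f"[{self.audience}] no sub-agent available for: {request}"


def scheduling_agent(request: str) -> str:
    """Hypothetical sub-agent for shift planning."""
    return "scheduling: draft shift plan created"


def sales_data_agent(request: str) -> str:
    """Hypothetical sub-agent for sales reporting."""
    return "sales: weekly report ready"


associate_agent = SuperAgent(audience="associates")
associate_agent.register("shift", scheduling_agent)
associate_agent.register("sales", sales_data_agent)

print(associate_agent.handle("Plan next week's shift coverage"))
```

The user only ever talks to `associate_agent`; which sub-agent does the work is decided behind that single interface.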
Now, one thing that was interesting about the reporting is that it presented it like it was a big
shift in strategy, away from individual agents and towards these super agents.
The Wall Street Journal piece, for example, was called why Walmart is overhauling its
approach to AI agents.
And all of this reporting really honed in on how things were getting so confusing with
so many different agents.
I don't think that that's exactly the story here.
And no disrespect to the Wall Street Journal or any other outlet who covered it that way.
It was clearly embedded in the narrative that Walmart was presenting.
This is really the natural evolution from moving from an agent experimentation phase
where you're going to naturally release lots of spot agents that can do very discrete things
to see how they function and see how much they improve how different processes work
into an overarching system.
And the nature of agents is such that it's going to make sense in those overarching systems
to have super agents, to use Walmart's term, to orchestrate and manage and interact with all of those
subagents that do different things. In other words, this isn't so much an overhaul as a natural
evolution, but while Walmart is naturally evolving its approach to AI agents doesn't make for as good
a headline. Also, frankly, it's not the job of everyone who's interacting with these systems
to understand what's going on behind the scenes. In fact, Walmart's strategy here in some ways
could be seen as parallel to what we're anticipating from GPT-5,
which is that in addition to hopefully improved general capability set,
a big part of the transition will be from the model selector
where you have to pick between o3 or 4o mini or 4o or whatever model you want to use
to an interface where the interface is smart enough to be able to interact with the prompt
and figure out which model is best suited to accomplishing the goal.
Basically shifting the burden from the user having to figure out how to get the
best out of the AI, to the AI being smart enough to actually figure that out for itself based on
what the user's end goal is. So of the four agents that they're building, two of them have names.
There is Sparky, which is their customer shopping agent, Marty, which is their partner agent,
interacting with suppliers, sellers, and advertisers. And then there are two, so far unnamed agents,
one for their associates and teams, which can be anything from scheduling to sales data,
and one for their developers. These agents are at various stages of rollout right now, not
all of them are fully live. Sparky, the customer-facing agent, is live right now, but will have
expanded capabilities coming in future months. Marty, which is the supplier-facing agent, is not
currently live, but is expected to launch soon, and the employee and engineer agents are expected
at some point over the next year. Still, when reporters expressed skepticism and basically wanted to
make sure that they weren't reporting on hopes and dreams, the executives that they were given access
to really hammered home that these were not big future ideas, but building on systems that
were already in place. As one executive put it to Fortune, it's not vaporware. What's more,
even if the super agents aren't ready, there's already a ton of interaction with the individual
subagents that will become part of these systems. For example, Retail Dive reported that Walmart
currently has 900,000 associates that interact with their internal conversational AI tool asking
3 million questions per week. The company also says that they've already used AI to cut
resolution time in customer support by up to 40%. They've cut fashion production timelines by
up to 18 weeks, and they've cut the time needed for shift planning by team leads from 90 minutes
to 30 minutes. That 90 minute to 30 minute cut may not seem like all that much at first glance,
but across 2.1 million employees, saving an hour at a time across every instance of needing
to plan shifts is actually going to represent tens of thousands of hours saved, if not more.
Now, as part of the announcement, Walmart is also staffing up. A day in advance of this, they announced
that former Instacart executive Daniel Danker
will become their head of global AI acceleration,
product, and design.
And to give an indication of how important they view this role,
Danker will report directly to Walmart CEO Doug McMillon.
Interestingly, Danker was a product guy at Instacart,
being most recently their chief product officer.
And companies hiring AI leads
at very high levels that report directly into the CEO
is absolutely a trend,
which I'm sure we're going to be talking about more on this show.
Now, there is a lot more that we could dig into
with this specific Walmart announcement.
One of the things that's really interesting,
for example, if we think about Sparky,
is that this is not just about providing
a better consumer experience.
It's about fundamentally rethinking it.
Forbes wrote a piece called Walmart Reveals AI Roadmap
that points to a world without search bars.
They quoted Hari Vasudev,
who's the CTO of Walmart US,
from the Retail Rewired innovation event
where they announced all of this stuff.
Hari said,
we expect that the search bar
and the conventional way of searching for items
will be replaced by this multimodal interface
and Sparky. He continued, you could basically give it a very high-level task saying, you know, I've
just moved into a new apartment. I'm looking to furnish the entire apartment within this budget, this
color scheme, and Sparky will come back and give you the entire selection that'll help you meet
exactly that need. In other words, this is a shift from keyword-based search to task-based
shopping. The goal is not the more efficient retrieval of relevant results, but agents completing entire
planning and shopping workflows. Given how many customers Walmart has, and their
size and influence in this industry, they could basically self-fulfilling-prophecy this modality
of interacting with retail into existence. Now, one other really interesting part of these announcements
was the ecosystem approach they're taking to building these tools. In the Wall Street
Journal article, they write, Walmart said it's connecting these various agents using an open-source
standard known as Model Context Protocol, or MCP. When Walmart first started building agents,
MCP wasn't super widespread. But according to CTO Kumar, the company is now
going back and making sure that even their older agents conform with the standard.
And this seems to be more than a throwaway priority.
Vasudev said during the announcement,
from the perspective of the technology and the product stack,
we're certainly building Sparky to be capable of interacting with both other agents
as well as with humans at the other end.
Now, one of the things that makes this interesting
is that right now there are different ends working towards the middle
when it comes to agentic shopping experiences.
What I mean by that is retailers like Walmart are working on their own agentic experiences,
but there's also competition for consumers to have their own agents that work on their behalf
that are basically their personal representatives and don't have anything to do with Walmart's
proprietary agents. The decisions that a company like Walmart makes around how much those
personal shopping agents will be able to interact with their proprietary tools will have a big
impact on how agentic shopping actually rolls out in practice.
Writes Forbes, rather than forcing customers into platform-specific AI experiences,
Walmart appears to be preparing for a world where multiple AI
assistants coordinate on behalf of consumers. Sparky could evolve from a shopping assistant
into the foundation for agent-to-agent commerce, where your personal AI assistant negotiates with
Walmart systems to complete purchases, compare prices, or manage subscriptions. Indeed, they go on,
this external agent capability could position Walmart as the hub for AI-mediated shopping across the
entire ecosystem, not just Walmart purchases. Now, I think that this is absolutely the right
approach at this stage. On the one hand, Walmart is of a size where it could throw its weight around
to force consumers into its own walled gardens.
But I think we simply don't know yet enough about how agentic shopping is likely to play out
for companies to take that sort of aggressive approach
without really running the risk of running counter to how things actually evolve in practice.
And a last note on this that I found really interesting.
Lest you think that this is all just being driven by the financial organization,
I noticed that back at the end of June,
a Meta senior ML engineer had posted a research paper from Walmart
on agentic RAG for personalized recommendations.
The paper is called
Agentic Retrieval Augmented Generation for Personalized Recommendation
and is by a group of Walmart global tech researchers
based in California and Washington.
The point being that Walmart is very clearly fully playing this game,
not just taking what the market is giving them,
but actually trying to push the frontier
of what AI systems in retail can do.
Now, zooming out as we wrap up here,
I think if you're trying to make sense of this,
the way to look at it is not so much
what it means for consumers,
although we do now have a great big additional playing field to understand
agentic experiences in retail and see how it works in practice.
I think instead it's to better understand that what the world's largest retailer
and largest company by revenue globally has just said
is that the future is not only agents, but complete agentic systems,
agent orchestrators, agent managers, multi-tier agent systems, in other words,
that work across entire functions.
Not to put too much pressure on you if you are an enterprise listener
who's just excitedly kicking off your first individual agent experiments,
but the biggest companies in the world are now racing past the individual agent stage
right on into the agent orchestration and agent system stage.
As always then, the subtext of this show is put simply, speed up, friends, speed up.
For now, that is going to do it for today's AI Daily Brief.
Thanks as always for listening or watching, and until next time, peace.
