The AI Daily Brief: Artificial Intelligence News and Analysis - Walmart Blasts Past Agent Experimentation

Starting point is 00:00:00 Today on the AI Daily Brief, Walmart moves from agent experimentation to agent orchestration, and before that in the headlines, reports that the forthcoming GPT-5 could be very, very good at coding. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Blitzy, Vanta, Plumb, and Superintelligent. As always, to get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And if you are interested in sponsoring the show, shoot me a note at NLW at Breakdown.net. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes.

Starting point is 00:00:46 GPT-5 fever is in the air and we are getting increasingly specific rumors as the much-anticipated model gets closer and closer. Specifically, the recent scuttlebutt is that, as The Information's article puts it, OpenAI's GPT-5 shines in coding tasks. Writes Stephanie Palazzolo, GPT-5 is almost here and we're hearing good things. The early reaction from at least one person who's used the unreleased version was extremely positive. GPT-5 shows improved performance in a number of domains, including the hard sciences, but the most notable improvement comes in software engineering. GPT-5 is not only better at academic and competitive programming problems, but also at more practical programming tasks that real-life engineers might handle,

Starting point is 00:01:27 like making changes in a large, complicated codebase full of old code. That nuance has been something OpenAI's models have struggled with in the past, and is one reason why rival Anthropic has been able to keep its lead with many app developer customers. And if you are a regular listener, you will know that honestly at this point, Anthropic's dominance of coding use cases has to be one of the most long-duration leads that we've seen in the foundation model space since ChatGPT kicked things off a couple of years ago. Going all the way back to Sonnet 3.5, Anthropic models really have been the default for coding use cases, although certainly Gemini's 2.5 suite, including Flash, has challenged that more recently.

Starting point is 00:02:04 Now, in addition to that report from The Information, there have also been more suspected sightings of GPT-5 in the wild. Last week, a new model codenamed Summit showed up on LM Arena. Professor Ethan Mollick put it through its paces with his preferred custom benchmark, posting, Kind of amazing. The mystery model Summit, with the prompt, create something I can paste into p5.js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future, and make it better. 2,351 lines of code first time. He showed the results, and by the way, if you are listening rather than watching, this is worth maybe jumping over to YouTube or Spotify for,

Starting point is 00:02:39 it's a ridiculously detailed control panel, unlike anything a previous model had generated. Grok 4's version was cool, but far less detailed, while six months ago, the results were fairly basic. Now, I'm not necessarily the hugest fan of custom coding benchmarks like the hexagon test because they tend to be a little bit one-dimensional. What's nice about Ethan's prompt is that it touches on coding, creativity, and planning all in one hit. Now, aside from how visually impressive the result was, getting over 2,000 lines of functional code out of a one-line prompt is no small feat. Another mystery model that people were enraptured with was called Zenith. Justine Moore from a16z writes, I'm blown away by some of the outputs from the new mystery model on

Starting point is 00:03:19 LM Arena. It's called Zenith and it seems to be good at a bunch of things, but I find the one-shot coding of functional games, this is Minecraft, to be particularly impressive. AI Battle wrote, The Zenith model that is being tested in LM Arena is producing some amazing outputs. With just a single prompt, it generated gun sounds, sprinting mechanics, a mini-map, and detailed textures for a Doom-style game. Ilker tweeted, I can't believe what I just saw. While testing robot SVG drawing on LM Arena, I stumbled on the best SVG model I've seen so far. Outputs an actual animated SVG, model code name is Zenith? Could this be GPT-5? Well, for those who are looking to actually get in there and test this themselves, last night, Vrasser X noted that Summit and several other codenamed variants

Starting point is 00:03:59 had been pulled. The models that have been removed include Summit, Zenith, Starfish, Nectarine, and Lobster, to which Vrasser X responded, all the GPT-5 models have officially left the web dev arena. The release is imminent. Get ready, folks. Now, speaking of the coding use case, Google is testing out a new vibe coding tool of their own. Called Opal, the tool is in some ways a little bit closer to n8n than it is to Lovable or Replit. The idea is to allow non-technical users to create apps using natural language prompts, but the point of differentiation is that Opal is geared towards mini apps that live in Google's AI Studio rather than big, fully functioning experiences of the type that people are going to Lovable or Bolt for. In this way,

Starting point is 00:04:39 it's a little closer to a workflow automation tool that allows users to chain together multiple prompts and tap Google's range of models to generate outputs. Google demonstrated the product being used to auto-generate blog posts, tapping into text, image, and video models along the way. Google also seems to be leaning into the social aspect of vibe coding with a remix gallery and a set of samples to get you started. Eric Friedman wrote, Just tried out Opal, the new AI tool from Google. Wow, things are changing very fast.

Starting point is 00:05:03 I've seen a bunch of startups and tools like this, but seeing Google roll it out so quickly is wild. And you can see in Eric's tweet an interface that will be very familiar to you if you've used n8n or Lindy or anything like that. VC Nabeel Hyatt writes, interesting experiment in the AI Apps Workflow Builder category, even if I'm pretty sure I never want to see another node-based graph in my life. From where I'm sitting, every single type of use case around coding, low-code and no-code app development, etc., is going to be some of the

Starting point is 00:05:28 most significant areas of development for the foreseeable future. Moving over to funding land, Anthropic is apparently fielding offers for fundraising that would value the startup at $150 billion. The Information reports that they are now in early discussions to raise between $3 and $5 billion and almost triple their $61 billion valuation from March. This is a big jump up from the reported interest that they were fielding earlier this month at $100 billion. But of course, Anthropic has not only reached $4 billion in ARR this summer, up from a billion at the beginning of the year,

Starting point is 00:05:58 the rate at which they are growing has also increased fairly dramatically. After the leaked memo from last week, where CEO Dario Amodei said that they were going to be more flexible and actually consider taking money from Gulf state investors, the latest reporting suggests that the new interest is coming from Abu Dhabi state-affiliated fund MGX. Now, MGX already owns a roughly 8% stake in Anthropic, which they purchased last year in the secondary market from bankrupt crypto firm FTX. The numbers seem high until you realize that $150 billion is only 40x revenue,

Starting point is 00:06:28 which, while yes, is meaningful, is a lot less crazy than it might appear at first glance. Lastly, today an update in the talent wars. Mark Zuckerberg has installed former OpenAI researcher Shengjia Zhao as the chief scientist for Meta's superintelligence group. Zhao worked on multiple iterations of OpenAI's frontier models, including major research contributions to develop reasoning for o1. He'll be reporting to chief AI officer Alexandr Wang and gives the team a leader with stronger research credentials. Posted Zuckerberg, in this role, Shengjia will set the research agenda and scientific direction for our new lab working directly with me and Alex. Shengjia co-founded the new lab and has been our lead scientist from day one. Now that our recruiting

Starting point is 00:07:06 is going well and our team is coming together, we've decided to formalize his leadership role. Shengjia has already pioneered several breakthroughs including a new scaling paradigm and distinguished himself as a leader in the field. I'm looking forward to working closely with him to advance his scientific vision. Now, if you are wondering how this new position fits with Yann LeCun's role as chief AI scientist for the company, Zuck added, to avoid any confusion, there's no change in Yann's role. He will continue to be chief scientist for FAIR. Yann himself responded, my role as chief scientist for FAIR has always been focused on long-term AI research and building the next AI paradigms. I'm looking forward to working with Shengjia to accelerate the integration of new research into our most advanced models.

Starting point is 00:07:44 Points out Charles Dehan, dude is three years out of grad school. Wild times, my friends, but for now, that is going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task.

Starting point is 00:08:20 Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding co-pilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited-time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase on a real development project in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. That's B-L-I-T-Z-Y.com.

Starting point is 00:08:58 As a founder, you're moving fast towards product market fit, your next round, or your first big enterprise deal. But with AI accelerating how quickly startups build and ship, security expectations are higher earlier than ever. Getting security and compliance right can unlock growth or stall it if you wait too long. With deep integrations and automated workflows built for fast-moving teams, Vanta gets you audit-ready fast and keeps you secure with continuous monitoring as your models, infra and customers evolve. Fast-growing customers like LangChain, Writer, and Cursor trusted Vanta to build a scalable foundation from the start. And look, as someone who lives in the world of enterprise procurement, I love how Vanta makes it easy to get compliance right. The last thing

Starting point is 00:09:38 you need when you're trying to win that big deal is to have it scuttled by something that Vanta has solved for over 10,000 companies. Go to vanta.com slash NLW to save $1,000 today through the Vanta for Startups Program and join over 10,000 ambitious companies already scaling with Vanta. That's v-a-n-t-a.com to save $1,000 for a limited time. You keep building the same AI automations for different clients. Sound familiar? You've got workflows people actually want, but you're stuck in the services trap, or worse, just selling one-off copies that give away all your IP. Sure, you could vibe code your way to an entire product, but hosting, authentication, payment support, it's a lot of work for a little upside. Or you could just use Plumb. Plumb is the platform where technical creators like

Starting point is 00:10:23 you build an audience of paid subscribers for your AI workflows. Subscribers get updates every time you improve things, your prompts stay protected, and you finally get recurring revenue. Ready to stop trading time for money? Visit useplumb.com. That's Plumb with a B. If you are a regular listener, you will have heard about Superintelligent's agent readiness audits at this point. But I wanted to tell you today about the full suite of agent readiness products that go beyond just the initial readiness report. Over the last six months, Superintelligent has built out an entire agent planning suite. We help you move from discovery to planning to implementation. After you've completed your agent readiness audits, we help you double-click on your most important use cases with what we call our use case planning reports.

Starting point is 00:11:07 These reports are going to help you understand what sort of technical preparation you need to do to be ready for a use case, what challenges you might face in implementation, and whether you should be thinking about building, buying, partnering, or some combination. After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for. If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions. Just go to bit.ly slash super agent. That's bit.ly slash super agent, all one word. And if you have any questions, the agent can even help you book an appointment with our team. Welcome back to the AI Daily Brief. So something interesting happened last week. Walmart made an announcement about their updated agent strategy, which without digging into it too much, if you had asked me, I would have predicted that

Starting point is 00:12:00 it was sort of just a corporate press release, right? I wouldn't have thought that it would get all that much attention. We've sort of moved out of the stage at least a little bit where you can talk about your AI or agent strategy without really saying anything new and get earned media for it just because. And yet, outlet after outlet after outlet ran stories about Walmart's agent strategy. I thought this was really interesting and so I started to dig into it a little bit deeper and I actually think it's an interesting bellwether of where things are headed. The TLDR is that Walmart is moving from an agent experimentation phase to what I would call an agent orchestration phase. Basically, they're moving from single spot agents that can do small tasks well to overarching systems where management

Starting point is 00:12:46 agents can orchestrate and coordinate subagents that do those individual tasks into a whole that is hopefully greater than the sum of its parts. So that's what we're going to talk about today, but just to give a little bit of background, especially for the fairly decent chunk of you who are not in the US, it's kind of hard to overstate just how enormous Walmart is. Walmart is the largest employer in America with over 2.1 million employees. They are by far the world's largest retailer. These are 2020 numbers, but more recent numbers had it at $635 billion as opposed to Amazon's $360 billion.

Starting point is 00:13:21 And even more than that, just in terms of companies ranked by revenue, Walmart is bigger than Saudi Aramco, PetroChina, and everyone else. The point being that Walmart is huge. And so maybe that's part of what's capturing attention. The other thing that's interesting to note, though, is that despite the fact that they are in the same space as Amazon, Walmart has in no way shirked or decided to retreat from digital battles and is one of the most aggressive about connecting how they operate their physical retail stores with the latest new technology.

Starting point is 00:13:50 Walmart is not just a physical retailer. It's also an insanely complicated logistics business. It's got one of the largest e-commerce platforms on the planet, and it also has a sprawling white-collar organization. In other words, there is lots and lots of room for agents to come in and find new efficiencies and help them do things differently. Now, the big announcement was not only that Walmart is, to use the title of the blog post by global CTO Suresh Kumar, all in on agents, but that the way they're thinking about agents is particularly relevant. Kumar writes, I believe in the power of agentic AI to transform industries. We've been building agents

Starting point is 00:14:24 fast for every aspect of the business. Once we saw how quickly teams were adopting these agents and how helpful they were, we realized agents weren't just useful. They were essential. However, Kumar and Walmart also found a challenge. We also recognized, he said, that multiple agents, even if each one is useful, can quickly become overwhelming and confusing. So he writes, we made a deliberate choice to go beyond individual tools and build a unified company-wide framework, one that ensures every new agent we roll out makes life simpler and easier for everyone, for consumers, for customers, for associates, and for our partners. So at the core of the strategy they announced are four of what Kumar and Walmart are calling super agents. And in many ways,

Starting point is 00:15:05 this is basically an orchestration agent or a boss agent or a management agent or whatever you want to call it. An agent that sits on top, interfaces with its intended audience, and can route whatever they're trying to do to the right subagents. Among the four super agents, there is one for customers, one for their team and associates, one for their partners, and one for their developers. The grouping then is not by particular task. It's grouped by particular user and relatedly grouped around their connectivity to particular sets of data. Now, one thing that was interesting about the reporting is that it presented it like it was a big shift in strategy, away from individual agents and towards these super agents.
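To make the routing idea above concrete, here is a minimal sketch in Python of a "super agent" that sits in front of one audience and hands each request to a subagent. It is purely illustrative: the SuperAgent class, the keyword-based routing, and the reorder_agent and returns_agent functions are hypothetical names invented for this example, not anything Walmart has described, and a production system would presumably route with a model rather than keyword matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A subagent is modeled as a plain function from a request string to a reply.
SubAgent = Callable[[str], str]


@dataclass
class SuperAgent:
    """Hypothetical orchestrator: one per audience, routing to subagents."""
    audience: str
    subagents: Dict[str, SubAgent] = field(default_factory=dict)

    def register(self, keyword: str, agent: SubAgent) -> None:
        # Associate a trigger keyword with the subagent that handles it.
        self.subagents[keyword] = agent

    def handle(self, request: str) -> str:
        # Naive keyword routing, checked in registration order.
        lowered = request.lower()
        for keyword, agent in self.subagents.items():
            if keyword in lowered:
                return agent(request)
        return f"[{self.audience}] no subagent matched"


def reorder_agent(request: str) -> str:
    # Hypothetical subagent for repeat purchases.
    return "reorder: added your usual items to the cart"


def returns_agent(request: str) -> str:
    # Hypothetical subagent for returns.
    return "returns: started a return label"


# One super agent per audience; here, the customer-facing one.
customer = SuperAgent(audience="customer")
customer.register("reorder", reorder_agent)
customer.register("return", returns_agent)

print(customer.handle("Please reorder my groceries"))
print(customer.handle("I want to return this lamp"))
```

The point of the sketch is the shape rather than the details: the person talking to the system only ever sees the one super agent for their audience, and the set of subagents behind it can grow without changing that interface.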

Starting point is 00:15:44 The Wall Street Journal piece, for example, was called why Walmart is overhauling its approach to AI agents. And all of this reporting really honed in on how things were getting so confusing with so many different agents. I don't think that that's exactly the story here. And no disrespect to the Wall Street Journal or any other outlet who covered it that way. It was clearly embedded in the narrative that Walmart was presenting. This is really the natural evolution from moving from an agent experimentation phase

Starting point is 00:16:09 where you're going to naturally release lots of spot agents that can do very discrete things to see how they function and see how much they improve how different processes work into an overarching system. And the nature of agents is such that it's going to make sense in those overarching systems to have super agents, to use Walmart's term, to orchestrate and manage and interact with all of those subagents that do different things. In other words, this isn't so much an overhaul as a natural evolution, but "Why Walmart is naturally evolving its approach to AI agents" doesn't make for as good a headline. Also, frankly, it's not the job of everyone who's interacting with these systems

Starting point is 00:16:45 to understand what's going on behind the scenes. In fact, Walmart's strategy here in some ways could be seen as parallel to what we're anticipating from GPT-5, which is that in addition to a hopefully improved general capability set, a big part of the transition will be from the model selector, where you have to pick between o3 or 4o mini or 4o or whatever model you want to use, to an interface where the interface is smart enough to be able to interact with the prompt and figure out which model is best suited to accomplishing the goal. Basically moving from the burden being on the user to figure out how to get the

Starting point is 00:17:21 best out of the AI, to the AI being smart enough to actually figure that out for itself based on what the user's end goal is. So of the four agents that they're building, two of them have names. There is Sparky, which is their customer shopping agent, Marty, which is their partner agent, interacting with suppliers, sellers, and advertisers. And then there are two, so far unnamed agents, one for their associates and teams, which can be anything from scheduling to sales data, and one for their developers. These agents are at various stages of rollout right now, not all of them are fully live. Sparky, the customer-facing agent, is live right now, but will have expanded capabilities coming in future months. Marty, which is the supplier-facing agent, is not

Starting point is 00:18:01 currently live, but is expected to launch soon, and the employee and engineer agents are expected at some point over the next year. Still, when reporters expressed skepticism and basically wanted to make sure that they weren't reporting on hopes and dreams, the executives that they were given access to really hammered home that these were not big future ideas, but building on systems that were already in place. As one executive put it to Fortune, it's not vaporware. What's more, even if the super agents aren't ready, there's already a ton of interaction with the individual subagents that will become part of these systems. For example, Retail Dive reported that Walmart currently has 900,000 associates that interact with their internal conversational AI tool, asking

Starting point is 00:18:41 3 million questions per week. The company also says that they've already used AI to cut resolution time in customer support by up to 40%. They've cut fashion production timelines by up to 18 weeks, and they've cut the time needed for shift planning by team leads from 90 minutes to 30 minutes. That 90-minute to 30-minute cut may not seem like all that much at first glance, but across 2.1 million employees, saving an hour at a time across every instance of needing to plan shifts is actually going to represent tens of thousands of hours saved, if not more. Now, as part of the announcement, Walmart is also staffing up. A day in advance of this, they announced that former Instacart executive Daniel Danker

Starting point is 00:19:18 will become their head of global AI acceleration, product, and design. And to give an indication of how important they view this role, Danker will report directly to Walmart CEO Doug McMillon. Interestingly, Danker was a product guy at Instacart, being most recently their chief product officer. And this trend of companies hiring AI leads at very high levels that report directly into the CEO

Starting point is 00:19:38 is absolutely a trend, which I'm sure we're going to be talking about more on this show. Now, there is a lot more that we could dig into with this specific Walmart announcement. One of the things that's really interesting, for example, if we think about Sparky, is that this is not just about providing a better consumer experience.

Starting point is 00:19:55 It's about fundamentally rethinking it. Forbes wrote a piece called Walmart Reveals AI Roadmap That Points to a World Without Search Bars. They quoted Hari Vasudev, who's the CTO of Walmart US, from the Retail Rewired innovation event where they announced all of this stuff. Hari said,

Starting point is 00:20:10 we expect that the search bar and the conventional way of searching for items will be replaced by this multimodal interface and Sparky. He continued, you could basically give it a very high-level task saying, you know, I've just moved into a new apartment. I'm looking to furnish the entire apartment within this budget, this color scheme, and Sparky will come back and give you the entire selection that'll help you meet exactly that need. In other words, this is a shift from keyword-based search to task-based shopping. The goal is not the more efficient retrieval of relevant results, but agents completing entire

Starting point is 00:20:39 planning and shopping workflows. Given how many customers Walmart has, and their size and influence in this industry, they could basically self-fulfilling prophecy this modality of interacting with retail into existence. Now, one other really interesting part of these announcements was the ecosystem approach they're taking to building these tools. In the Wall Street Journal article, they write, Walmart said it's connecting these various agents using an open-source standard known as Model Context Protocol or MCP. When Walmart first started building agents, MCP wasn't super widespread. But according to CTO Kumar, the company is now going back and making sure that even their older agents conform with the standard.
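For listeners who haven't looked at MCP up close, here is a small sketch in Python of what a tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with the tools/call method. The tool name search_products and its arguments are hypothetical, made up for illustration; nothing here reflects tools Walmart has actually exposed.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str,
                    arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP-style JSON-RPC 2.0 request that invokes a tool."""
    return {
        "jsonrpc": "2.0",          # MCP messages use JSON-RPC 2.0
        "id": request_id,          # lets the caller match the response
        "method": "tools/call",    # MCP method for invoking a tool
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
message = build_tool_call(
    1, "search_products", {"query": "sofa", "max_price_usd": 500}
)
print(json.dumps(message, indent=2))
```

Because every agent speaks the same message shape, an older agent brought into line with the standard can be called by a newer orchestrator without custom glue code for each pairing.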

Starting point is 00:21:19 And this seems to be more than a throwaway priority. Vasudev said during the announcement, from the perspective of the technology and the product stack, we're certainly building Sparky to be capable of interacting with both other agents as well as with humans at the other end. Now, one of the things that makes this interesting is that right now there are different ends working towards the middle when it comes to agentic shopping experiences.

Starting point is 00:21:40 What I mean by that is retailers like Walmart are working on their own agentic experiences, but there's also competition for consumers to have their own agents that work on their behalf that are basically their personal representatives and don't have anything to do with Walmart's proprietary agents. The decisions that a company like Walmart makes around how much those personal shopping agents will be able to interact with their proprietary tools will have a big impact on how agentic shopping actually rolls out in practice. Writes Forbes, rather than forcing customers into platform-specific AI experiences, Walmart appears to be preparing for a world where multiple AI

Starting point is 00:22:14 assistants coordinate on behalf of consumers. Sparky could evolve from a shopping assistant into the foundation for agent-to-agent commerce, where your personal AI assistant negotiates with Walmart systems to complete purchases, compare prices, or manage subscriptions. Indeed, they go on, this external agent capability could position Walmart as the hub for AI-mediated shopping across the entire ecosystem, not just Walmart purchases. Now, I think that this is absolutely the right approach at this stage. On the one hand, Walmart is of a size where it could throw its weight around to force consumers into its own walled gardens. But I think we simply don't yet know enough about how agentic shopping is likely to play out

Starting point is 00:22:51 for companies to take that sort of aggressive approach without really running the risk of running counter to how things actually evolve in practice. And a last note on this that I found really interesting. Lest you think that this is all just being driven by the financial organization, I noticed that back at the end of June, a Meta senior ML engineer had posted a research paper from Walmart on agentic RAG for personalized recommendations. The paper is called

Starting point is 00:23:16 Agentic Retrieval Augmented Generation for Personalized Recommendation and is by a group of Walmart global tech researchers based in California and Washington. The point being that Walmart is very clearly fully playing this game, not just taking what the market is giving them, but actually trying to push the frontier of what AI systems in retail can do. Now, zooming out as we wrap up here,

Starting point is 00:23:35 I think if you're trying to make sense of this, the way to look at it is not so much what it means for consumers, although we do now have a great big additional playing field to understand agentic experiences in retail and see how it works in practice. I think instead it's to better understand that what the world's largest retailer and largest company by revenue globally has just said is that the future is not only agents, but complete agentic systems,

Starting point is 00:23:59 agent orchestrators, agent managers, multi-tier agent systems, in other words, that work across entire functions. Not to put too much pressure on you if you are an enterprise listener who's just excitedly kicking off your first individual agent experiments, but the biggest companies in the world are now racing past the individual agent stage right on into the agent orchestration and agent system stage. As always then, the subtext of this show is, put simply, speed up, friends, speed up. For now, that is going to do it for today's AI Daily Brief.

Starting point is 00:24:30 Thanks as always for listening or watching, and until next time, peace.
