The AI Daily Brief: Artificial Intelligence News and Analysis - The Five Vectors of AI Competition
Episode Date: May 20, 2025
With Microsoft Build, Google I/O, and Code with Claude all happening this week, the battle between major AI labs is heating up. This episode breaks down how to think about the AI race across five key vectors: Consumer, Enterprise, Coding, Agents, and Benchmarks.
Get Ad Free AI Daily Brief: https://patreon.com/AIDailyBrief
Brought to you by:
KPMG - Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Vertice Labs - Check out http://verticelabs.io/ - the AI-native digital consulting firm specializing in product development and AI agents for small to medium-sized businesses.
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network
Transcript
As we prepare for a big week with events from Anthropic, Google, and Microsoft, we get into five different ways to think about broader AI competition.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Thanks to today's sponsors, Blitzy.com, Vertice Labs, and Superintelligent. To get an ad-free version of the show, go to patreon.com.
Hello, friends, quick note.
Like I said, we have a very big week coming up.
There is going to be a lot of news.
Today is a preview episode of what's going on, plus a potentially interesting framework for how
to think about AI competition. This crowded out the room for some of the headlines, but we will be
back with a normal episode with both headlines and a main episode tomorrow. I anticipate we'll probably
be talking about what was announced at Microsoft Build for the main. But in any case, that is
the story with today's episode. So without any further ado, let's dive in. Welcome back to the AI Daily
Brief. Well, friends, we have a very big week of AI news coming up, or at least, I should say,
the context and setting where we would presumably have a big week of AI news. So what's going on? Well,
first of all, we have Microsoft Build, which kicks off today in Seattle and runs throughout the week.
Then starting tomorrow, we have Google I/O, which is their annual developer conference.
Then at the end of the week on May 22nd, we have Anthropic's first developer conference,
Code with Claude.
All of this, by the way, is why OpenAI tried to front run some of this by launching Codex
last week so that they didn't miss an entire cycle of new model announcements.
In any case, what we're going to do today is take a look at the state of where these companies
are, see what we might glean about what might be coming, and the context that I want to use for that
is thinking about the competition for AI across five different vectors: consumer, enterprise, benchmarks,
coding, and agents. This is far from scientific or even comprehensive. It's just how I kind of think
about things. I will note that I think competition for developers is a really important thing,
but I think it's kind of endemic across all of these and is especially honed in on coding and agents.
And in any case, we're going to come back to that. But to start, let's talk about
where each of these companies are. And let's actually do a quick summary of the Codex announcement,
again, since it was chosen as a way to front-run the rest of this news. So what is Codex?
I talked about it a little bit on Friday, but basically it is a coding agent. It's powered by a new
model called codex-1 that's a version of the o3 reasoning model that is optimized specifically
for software engineering. OpenAI claims that codex-1 produces cleaner code than o3, is better
at following instructions, and will run iterative tests on its code until it gets the result
it wants. Codex is built for power. It can handle multiple tasks simultaneously, and people can
use their computers or browsers even while it's running. Codex is built into ChatGPT, which is different
than Claude Code or Cursor or other things like that, which immediately gives it really wide
distribution. For example, Box's Aaron Levie points out, "OpenAI Codex works on the mobile app. We're
entering a wild world where you can have AI agents coding anything while on your phone. The ability to
just have unlimited AI agents executing tasks on your behalf in the background is going to utterly
change knowledge work." Now, I wasn't using Codex, but I will say that I had a moment this weekend
where I was walking around a nearby hiking trail, and I had Lovable vibe coding something
just from Chrome on my phone, and I had OpenAI's deep research working on a different thing.
So even though it's nascent and the UI is not exactly optimized, I think Aaron's exactly right here.
The reaction to Codex is a little muted. Santiago writes,
"Literally everyone is freaking out over Codex like they didn't do the exact same thing for
Devin, Cursor, DeepSeek, and every GPT drop since 2.0. The hype cycle
resets every three weeks and we all start everything all over again. This is what we'll see over
the next few days. OpenAI employees will claim they've been using Codex for a while and it's writing all
their code. A few people will tell the story of how they casually asked Codex to finish an old project,
and it did it all and it was perfect. AI influencers will litter our feeds with '90% of people
don't know this Codex trick' and 'Cursor is dead' threads, etc., etc., etc., etc."
Now, interestingly, I actually don't think that this has been what's going on. I do think that
you should mentally filter out every AI think boy post along the lines of
these threads that he's talking about, but I don't think people are really freaking out over
Codex. I think they're excited. I think that they're trying to figure it out. Riley Brown, for example,
who's building VibeCode, wrote a long post and shared a video called "How to Test Codex as a Vibe
Coder (Non-Technical)." He says with Codex, you can spin up AI coding agents that edit your code.
This is like Devin, not Cursor. You can run these AI agents in parallel. He then goes through
how he used it with other vibe coding apps, including v0, and shared a video of his work.
At the end, he said, "P.S., this is probably not an efficient way to do it. This is just how I tested it."
This is the type of post that I've seen a lot of surrounding Codex. I think so far it's just too early to know
how it's going to fit in this whole ecosystem, although certainly it is validation that this
ecosystem is incredibly important. Another interesting thought came from Josh Tobin, who does agents
at OpenAI, who said, "My hot take is that Codex increases the value of being technical. If you can
describe precisely what you want to build, you can get a massive amount done in parallel. That's
fundamentally a technical skill." Professor Ethan Mollick writes, "Codex is neat, but I really wish that
OpenAI had gone the extra step of making it accessible to non-coders. Not that non-coders should expect
to make complex or high-quality applications with today's software engineering agents, but democratizing
the making of small tools can make a big difference." Anyways, Codex is out. It's now a part of this
ecosystem, and I'm sure we'll see it start to integrate and interact with all these other tools
soon. But what about the companies that OpenAI was trying to front-run? What can we expect from them?
Now, obviously, Microsoft is a big company that has a lot to talk about even beyond Copilot and AI,
but Copilot is obviously expected to be a core part of the story. Right now, it's not exactly
clear, though, if there's some big thing to announce, or if we're just seeing the continued
deepening of Copilot into all of Microsoft's products. Business Standard writes, "Microsoft's
Copilot AI assistant is expected to take center stage at the upcoming event. The company has been
steadily embedding Copilot across its key platforms, Windows, Office, and Azure, and further updates
are anticipated this week. New features such as semantic search abilities in Settings, File
Explorer, and the Windows search bar are likely on the way. Additionally, Microsoft may announce
enhancements to Copilot agents, a feature introduced in April designed to streamline complex
multi-step tasks using AI." Now, Business Standard also expects Windows 11 and Azure to get some airtime,
particularly around their AI dimensions, such as the Recall feature in Windows 11, but fascinatingly,
they also call out Model Context Protocol as a major potential part of this. If we see any sort
of emphasis on MCP, like if it makes it into Satya Nadella's keynote, that would certainly suggest that
Microsoft is really interested in competing around agents. I think for me, what I'm watching with
Microsoft is just how they position themselves in general in this AI battle. They're very clearly
not competing, at least right now, to push the boundaries of what's possible from a model
standpoint, but they're still the default for enterprises. And so what they do potentially carries
more heft in terms of what's available. One of the infuriating things for people who work inside
companies that are Microsoft shops is the disparity between the tools they can use in their personal
life and what they have available. So to the extent that Microsoft can close those gaps, that would be a
very powerful thing. Remember, though, Microsoft is thinking in a big zoomed out way. We've talked a lot
recently about their Work Trend Index for 2025, where they declared that this was the year that the
frontier firm is born. The frontier firm, you might remember, is where every employee becomes
an agent boss managing swarms of AI agents. And so I'm going to be watching closely to see how
Microsoft is painting a vision of how we get from where we are now to that. And then there's Google.
Unlike Microsoft, Google is still absolutely competing to be front and center and pushing the
boundaries when it comes to actual AI models. And even if one argues that they are still behind
where one might have imagined Google would be relative to these upstarts, given how much
AI talent they've had for so long, it's hard to argue that they've had anything but an extremely
excellent year since last year's I/O. Last year at this time, I know this sounds forever ago,
but it was just one year ago, Gemini was doing things
like suggesting glue as a pizza topping. But since then, Google has staged an enormous comeback.
Gemini 2.5 Pro is a benchmark leader, with many people discussing how it, for the first time,
pushed Anthropic's Claude off the top of the heap when it came to coding use cases.
Gemini's product range is competitive at every price point. Their agent previews have been
impressive. The question is one of users. The company touts 1.5 billion users for AI Overviews,
but that's just embedded in Google search, not really a telling statistic. They say they have 150
million subscribers through their Google One service, which is a 50% jump from last February,
but that's also a shared product with their data services. They claim 350 million monthly
active Gemini users, but that could include a fair number of pre-installed handsets. The double-edged
sword of a company having big existing distribution is that there's some skepticism of how rich
and deep the use actually is. This is why people don't really pay attention when Zuckerberg
touts how many people are using Meta's AI, because it's just in your face inside Instagram and
Messenger and WhatsApp. And even at that 350 million number, that's still way behind
ChatGPT. Now, it's clear that Google is not just going to concede the battle for consumer to
OpenAI. In fact, in the lead-up to I/O, we've had a host of big releases. The company launched
an updated version of Gemini 2.5 Pro that significantly improves its coding ability. We got a fascinating
next-generation tease from DeepMind about a coding agent that can optimize algorithms. They claim that
the agent has cut Google's compute by 0.7% globally through code optimization. We've also seen
some big updates to fan favorite NotebookLM, including the launch of a standalone app. Now, all of those
things could have been fodder for a major unveiling at the conference, but Google decided to roll
them out early. Meanwhile, the pre-show coverage is really dabbling around the edges. It's sort of
focused on new gadgets and features for Chrome and Android. TechRadar, for example, bundles
everything into, quote, a ton of Gemini AI news. But it doesn't really seem like they're clear on what
that might be. Look, as I said, Google has done significant work over the last year to improve their
place in the AI fight. And I'm very excited to see what they push out at I/O and beyond. I do think that
they find themselves sort of uncomfortably between pure consumer and pure enterprise. On one end of the
spectrum, we have OpenAI, who's racing up to 800 million users thanks to Ghibli images and is just
super focused on consumer, although honestly making progress on enterprise in a way that we'll
talk about in a minute. And on the other end of the spectrum, we have Microsoft, which just feels like
they have total lock-in among enterprise users. Google sits somewhere in between. They're the enterprise
choice for consumer-type smaller companies, SMEs, mid-markets, but I wonder if that middle
space is actually making it harder for them to prioritize which AI stuff to care about. And then
there's Anthropic. Back at the beginning of April, Anthropic announced that they were hosting their
first ever developer conference called Code with Claude. They wrote, Code with Claude is a hands-on
one-day event focused on exploring real-world implementations and best practices using the
Anthropic API, CLI tools, and Model Context Protocol. Now, they've given out almost no
information aside from that. And what we know for sure, and certainly if you've listened to this show,
is that Anthropic has really cemented itself as the core choice for models to power
coding tools and coding agents. The company continues to grow, and if it weren't for the just
utter juggernaut of OpenAI, their numbers would be getting way, way more attention. In terms of
expectations for what's coming this week, it's all about the new and updated models.
The Information reported last week that, according to their sources, who were people who had
used these new models, Anthropic was going to announce new versions of its two largest models,
Claude Sonnet and Claude Opus, and that these models were supposed to be able to go back
and forth between thinking and reasoning and tool use. Writes The Information, "The key point:
if one of these models is using a tool to try and solve a problem but gets stuck, it can go back
to reasoning mode to think about what's going wrong and self-correct." Also from The
Information: "For people who use the new models to generate code, the models will automatically test
the code they create to make sure it's running correctly. If there's a mistake, the models can
stop and think about what might have gone wrong and then correct it." Continuing, they write,
"The new Anthropic models are thus supposed to handle more complex tasks with less input and corrections
from their human customers. That's useful in domains like software engineering where you might
want to provide a model with high-level instructions like 'make this app faster,' and let it run
on its own to test out various ways of achieving that goal without a lot of handholding."
Now, if we get all of that from Anthropic, I think people will be extremely excited.
And I think it shows just how important right now the battle around coding is as a core part of the larger AI competition.
Today's episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context,
which, if you don't know exactly what that means yet, do not worry we're going to explain, and it's awesome.
So Blitzy is used alongside your favorite coding copilot as your batch software development platform for the
Enterprise, and it's meant for those who are seeking dramatic development acceleration on large-scale
codebases. Traditional copilots help developers with line-by-line completions and snippets, but Blitzy
works ahead of the IDE, first documenting your entire codebase, then deploying more than
3,000 coordinated AI agents working in parallel to batch-build millions of lines of high-quality
code for large-scale software projects. So then whether it's codebase refactors, modernizations,
or bulk development of your product roadmap, the whole idea of Blitzy is to provide
enterprises dramatic velocity improvement. To put it in simpler terms, for every line of code
eventually provided to the human engineering team, Blitzy will have written it hundreds of times,
validating the output with different agents to get the highest quality code to the enterprise
in batch. Projects then that would normally require dozens of developers working for months
can now be completed with a fraction of the team in weeks, empowering organizations to
dramatically shorten development cycles and bring products to market faster than ever.
If your enterprise is looking to accelerate software development, whether it's large-scale modernization,
refactoring, or just increasing the rate of your SDLC, contact Blitzy at blitzy.com, that's
B-L-I-T-Z-Y dot com, to book a custom demo, or just press get started and start using the product
right away.
Today's episode is brought to you by Vertice Labs, the AI Native Digital Consulting firm
specializing in product development and AI agents for small to medium-sized businesses.
Now, guys, this is a market that we have seen so
much interest for, so much demand for, and many times great AI dev shops and builders out there
just have so much business from the high end of the mid-market and big enterprises that this is a
group of buyers that gets neglected. Now for Vertice, AI Native means that they don't just build AI,
they use it in every step of their process. They embed agents in their workflows so that they
better know how to help you embed agents in your workflows. And indeed, what they specialize in
is building AI agents and agentic workflows that augment knowledge work, from customer support to
internal ops so that your team can focus on higher value work.
Vertice wants to ensure that this is not just another co-pilot, but something that works
end-to-end, translating business problems into working software in weeks, not quarters.
They have found that their clients typically see a 60% reduction in time and cost, with
significantly higher output than traditional technology partners.
So if you are a founder, a CTO, a business leader, or you've just got a product idea to
launch, check out verticelabs.io.
That's V-E-R-T-I-C-E Labs dot I-O.
Today's episode is brought to you by Superintelligent.
Now, you have heard me talk about agent readiness audits probably numerous times at this point.
This is our system that uses voice agents and a hybrid human AI analysis process
to benchmark your agent readiness and map your agent opportunities
and give you some really pointed, actionable next steps to move further down the path
in your agentic journey.
But we are coming up on the slow time of the year, and if you want to use this
time to get out ahead of peers and competitors. We're excited to announce something we're calling
Agent Summer. The idea here isn't that complicated. It's basically just an accelerated program to
get you agentified and fast. First of all, it's going to include an agent readiness audit,
figuring out where your biggest agent opportunities are. Next, we're going to support both your
internal change management process, helping you figure out AI policy, data readiness, things like that,
as well as doing action planning around the agent opportunities that are most relevant for you.
And finally, we're going to connect you to the right vendors to actually go and deliver
this. Now, for this, we want to work with a very small handful of companies that really want to move.
We're going to be bundling more than $50,000 of services for something that starts closer to
$30,000. And so if you want to use this summer to jump ahead on your company's agent journey,
email agent at besuper.ai with "summer" in the subject line, claim one of these limited spots,
and let's go have an agent summer.
Let's actually talk just briefly about each of these different
areas of competition. Like I said, coding is a huge one. This is one of the most essential breakout
use cases. It's a use case that enables other use cases. It has dimensions both for technical
people and developers because it's accelerating what they're able to do. And the tools that they use
to do whatever it is that they're doing tend to find their way into the tools that people use to
interact with what they're doing. But there's also the whole vibe coding piece of this, where we're also
simultaneously seeing a massive expansion of who can participate in that sort of creation. And so all of a
sudden, it's not just the developers, or at least the traditional developers who have a stake in which
of these models is best for coding, but also this new legion of vibe coders and solopreneurs.
Right now, we'll have to see if this new Codex model starts to knock Anthropic's models off the top
of the heap. Interestingly, a couple of weeks ago, when Google announced Gemini 2.5 Pro I/O edition,
there was a lot of chatter about how the benchmarks were better than things like Claude,
and there was this whole question about whether we had a new king of AI coding. And while there's
certainly been some positive buzz since then, by and large, I don't think that we've seen
habits really shift. Now, again, it's only been
a couple of weeks, but given that we are about to get another update from Anthropic, it seems likely to
me that that company retains their top dog status, at least for developers, but who knows, this
is going to be one of the most, if not the most dynamic area of this competition. It's also closely
related to this other area of competition in agents. And when I'm talking about agents,
I'm actually talking about two different things simultaneously, or I should say probably at least
two different things. One is, of course, the end agents themselves, and the other is the
platforms for building agents. Now, on the platform side, this is the other area where Anthropic
really has cemented its lead status. We did a whole show built off of Latent Space's post "Why MCP Won."
That's a good primer on how essential Model Context Protocol has become to the emerging field of
agent building. But there are other areas of the agent infrastructure stack that other people
are trying to compete for as well. For example, in the beginning of April, we got Google announcing
the Agent2Agent (A2A) protocol, which is basically an agent communication protocol.
You can tell how far MCP had come because Google, when they announced it, wrote, "A2A is an open
protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context
to agents." So they are basically trying to build a different part of the agent infrastructure stack
that MCP is not addressing. As we are watching these announcements from this week, I would say watch to see
what Anthropic says about MCP, watch to see what Google says about A2A as well as any other agent
infrastructure plays, and see if and how Microsoft talks about bringing any of these things into
their ecosystem for enterprise builders, especially through Azure. Enterprise and consumer actually
make up another part of this competition. I talked before about how Microsoft sort of has a default
pole position, and of course they use their partnership with OpenAI to anchor that position
in the early days of generative AI. Interestingly, it does appear that OpenAI is making up
some major ground with the enterprise. Now, these stats are recent, but they do represent
a particular slice of the market. This comes from Ramp's AI Index, which basically
estimates business adoption of AI products by using Ramp's card and bill pay data. Ramp is not
necessarily used by the biggest enterprises in the country, so this is going to represent
more SMEs, startups, and some small mid-markets. But at least in this cohort, OpenAI is
flourishing. There has been a massive increase in the percentage of U.S. companies that are using
OpenAI's business subscription, from a little over 15% at the end of last year, all the way up to
32.4% now. Anthropic has also jumped from around 4% at the end of the year, doubling to 8%
now, but obviously still very far behind OpenAI, and Google has absolutely fallen off a cliff.
Now again, I want to caution that this is a very specific slice of the market. It doesn't
represent everything. But the point for our purposes is that as we are thinking about AI competition,
enterprise is a very particular subset of that competition.
Now, on Consumer, we touched on it before, but here OpenAI just continues to be the absolute
total leader. It continues to be the case that for many normies, ChatGPT and AI are
synonymous. And OpenAI has recently had a burst of new users thanks to things like their
new image model and the Ghibli-style image generation meme, which exploded all over X and other
platforms. And basically, it sounds like OpenAI is somewhere around 800 million weekly active
users right now. And we don't know exactly what that number is and how much it's peeled off since
the Ghibli trend ended, but it's still so much bigger than anything else out there. What's more,
OpenAI is very clearly doubling down on their consumer lead, announcing that they were bringing in
Instacart CEO Fidji Simo as their new CEO of Applications, basically their CEO for the actual
business stuff. Now, interestingly, coming back to agents, I said that there were two aspects of agent
competition. One was the infrastructure, things like MCP and A2A, but the other side is the end agents
themselves. And of the big labs, so far, OpenAI and Google seem like the two that really want to
compete with end agent products, and OpenAI even more strongly than Google. I think that
these companies understand that the moat is in owning the customer relationship and that there's
going to be a huge amount of commoditization, volatility, and switching when it comes to models.
And so I think that when OpenAI is thinking about agents, they're not just trying to be the models that power agents.
They're also thinking about actually owning the agents themselves, having the best deep research agent,
having the best computer use agent in Operator, having the best coding agent now.
My sense is that that's a battle that they're trying to have and it is actually directly related to their leadership in consumer.
Now lastly, for the sake of completeness, as we think about AI competition, if you were just going by news articles,
you might think that it was all about benchmarks. However, as I record this, I don't think that benchmarks
have ever had a lower place in the consideration of users. Back a few months ago, in a Reddit thread in the
Llama community, a poster wrote something called "I'm starting to think AI benchmarks are useless."
"Across every possible task I can think of, Claude beats all other models by a wide margin.
I have three AI agents that I've built that are tasked with researching, writing, and outreaching
to clients. Claude absolutely wipes the floor with every other model. Yet Claude is usually beaten on benchmarks
by OpenAI and Google models."
They then get into speculating on why,
but this is definitely broadly the perception.
Now, interestingly, I think that we might be hitting sort of a floor
or a nadir on how much people don't care about benchmarks,
and I think that part of where we might go
is starting to see more specific, discrete benchmarks
or evals for particular use cases.
For example, very randomly as I was posting this,
I saw that Tiny founder Andrew Wilkinson had written,
"I just saved around $5,000 by drafting a legal agreement with Gemini 2.5 Pro, which ranks number
one on LegalBench." Now, the rest of his tweet is about the disruption coming for lawyers,
but what's interesting is that he very clearly cared about accuracy benchmarks. That did
influence his choice as a builder. And so I think that to the extent that benchmarks can
actually be useful for entrepreneurs and developers, they have some utility, it's just really not
going to be around general consumers or even general enterprises, I think, switching between models
because they scored higher on a benchmark.
Ultimately, the proof is in the pudding and it's all about practice.
So summing up, we have a very big week coming.
Microsoft Build, Google I/O, Anthropic's Code with Claude,
and OpenAI trying to needle in and get their stamp on the conversation beforehand.
And if you want to just keep a crib sheet of where they stand relative to AI competition,
I'd encourage you to think about it in these dimensions.
Again, consumer, coding, agents, enterprise, and benchmarks.
For now, that is going to do it for today's AI Daily Brief.
Appreciate you listening or watching as always.
Until next time. Peace.
