The AI Daily Brief: Artificial Intelligence News and Analysis - The Era of AI Experimentation is Over
Episode Date: May 8, 2025
Companies have moved past the phase of simply experimenting with AI. Enterprises like IBM now view AI as necessary infrastructure, integrating AI agents directly into business operations. NLW argues that there is a broader shift happening that business leaders need to take notice of.
Interested in sponsoring the show? nlw@breakdown.network
Get Ad Free AI Daily Brief: https://patreon.com/AIDailyBrief
Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months.
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, why the era of AI experimentation is over.
Before then, in the headlines, do we have a new king of AI coding?
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Thanks to today's sponsors, KPMG, Blitzy and Superintelligent, and to get an ad-free version of the show,
go to patreon.com slash AI Daily Brief.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes.
Google has just announced a new version
of its Gemini 2.5 Pro. They're calling it the I/O edition. And it is specifically aimed at coding
and apparently does so very well. So is Google's new update to Gemini 2.5 Pro the top model for
coding assistance? Well, let's discuss it. Since Cursor picked up steam late last year, there's been a
pretty strong consensus that Anthropic's Claude models are the ones to use for AI coding. There was a
brief scuffle at the end of last year with the release of o1, but Anthropic quickly answered
with Claude 3.7 Sonnet, which for many remains the standard. Google's new Gemini 2.5 Pro
I/O Edition does seem to upset the leaderboard on the benchmarks, at least, suggesting it's head
and shoulders above the competition. Google DeepMind CEO Demis Hassabis announced the launch
writing, very excited to share the best coding model we've ever built. Today, we're launching
Gemini 2.5 Pro Preview I/O edition with massively improved coding capabilities. He goes on and says,
it's especially good at building interactive web apps, and then shares a demo of an app that gets
prototyped just from a simple line drawing. The model is now ranked number one on LM Arena
in coding, as well as number one on WebDev Arena. Both of those benchmarks are subjective, with
users selecting their favorite between two competing outputs from rival models. There's been a lot
of criticism recently around how valid this method is for chatbot outputs, with humans being easy
to sway with things like emoji use and verbosity, but it does feel like these could be a better strategy
for rating the outputs from coding assistants, with there being fewer of those sort of simple triggers
shaping which output users prefer. What's more, the numbers are not particularly close. Going from
Elo scores on WebDev Arena, there's as much daylight between these two models as there was
between 3.7 Sonnet and the initial release of Gemini 2.5 Pro. On LM Arena, the model achieved the number
one ranking across all categories, which is extremely unusual. The model is proprietary, so users can
only access it through Google's web services. Cost remains the same as the older version, which is
around two-thirds the price of 3.7 Sonnet. Users can get free access through the Gemini app if they
enable Canvas 2, but you'll need to pay if you want to plug the API into your IDE.
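For a sense of what an Elo gap means in a head-to-head preference benchmark, here is a minimal sketch of the classic Elo expected-score formula. The ratings are made-up numbers for illustration, not actual arena scores, the step size `k` is an assumption, and the arenas' exact rating method may differ from textbook Elo.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A's output is preferred over B's under classic Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one head-to-head vote (k is an assumed step size)."""
    expected_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_a)
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical ratings, not real leaderboard values.
    for gap in (0, 50, 100, 200):
        p = expected_score(1400 + gap, 1400)
        print(f"gap={gap:>3}  P(higher-rated output preferred)={p:.3f}")
```

The point of the sketch is only that a rating gap maps to a preference rate: equal ratings mean a coin flip, and the larger the gap, the more often voters pick the higher-rated model's output.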
Now, early reviews are very positive. Google's Logan Kilpatrick shared a quote from Silas
Alberti, a member of the founding team of Cognition, who said, the updated Gemini 2.5 Pro
achieves leading performance on our junior dev evals. It was the first ever model that solved one of our
evals involving a larger refactor of a request routing back end. It felt more like a senior developer
because it was able to make correct judgment calls and choose good abstractions. Ramesh R vibe-coded a
Candy Crush clone, writing, one-shot coding with sound effects. The casual game industry is dead,
took it less than a minute. Pietro Schirano, the CEO of EverArt, coded up a 3D simulation of a
gorilla fighting 100 men, latching onto a current meme, and Hyperbolic Labs CTO Yuchen Jin wrote,
this model is now my top coding model. It beats o3 and Claude 3.7 Sonnet on several of my hard prompts.
Google, call it Gemini 3. Ethan Mollick did a practical test of the model's ultra-long context window,
commenting, pretty awesome results from the new version of Gemini 2.5. I changed one line of War and Peace,
inserting a sentence into book 14 chapter 10 about halfway through, where Princess Mary spoke to
Crab Man, the superhero. Gemini 2.5 consistently found this reference among 860,000 tokens.
He did note some weird quirks of prompting, adding,
If you don't tell it to read everything, sometimes it is lazy, though,
and doesn't go through the text.
AI is weird.
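The test Mollick describes is often called a needle-in-a-haystack check. A minimal sketch of it follows, assuming a hypothetical `ask_model` callable that stands in for whatever long-context model API you use. The helper names are illustrative and not from any library, and the prompt explicitly asks the model to read everything, in line with the quirk noted above.

```python
from typing import Callable


def insert_needle(haystack: str, needle: str, fraction: float) -> str:
    """Insert `needle` as its own sentence roughly `fraction` of the way through."""
    sentences = haystack.split(". ")
    fraction = min(max(fraction, 0.0), 1.0)
    # Cap the index so the original final sentence keeps its closing period.
    index = min(int(len(sentences) * fraction), len(sentences) - 1)
    sentences.insert(index, needle.rstrip("."))
    return ". ".join(sentences)


def run_trial(
    haystack: str,
    needle: str,
    keyword: str,
    ask_model: Callable[[str], str],
    fraction: float = 0.5,
) -> bool:
    """Return True if the model's answer mentions `keyword` from the needle."""
    document = insert_needle(haystack, needle, fraction)
    prompt = (
        "Read the entire text below, then quote the one sentence that does "
        "not belong in it.\n\n" + document
    )
    return keyword.lower() in ask_model(prompt).lower()


if __name__ == "__main__":
    book = "Sentence one. Sentence two. Sentence three. Sentence four."
    needle = "Princess Mary spoke to Crab Man, the superhero."
    # Stand-in "model" that just searches the prompt; swap in a real API call.
    fake_model = lambda prompt: needle if "Crab Man" in prompt else "nothing found"
    print(run_trial(book, needle, "Crab Man", fake_model))
```

Running several trials with the needle at different `fraction` values gives a rough picture of whether retrieval depends on where in the text the sentence sits.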
Now, not everyone is universally on the I/O train.
Software engineer Dylan Normandyin writes,
I'm underwhelmed by the latest Gemini 2.5 Pro update.
Seems significantly worse as a pair programmer than the previous version.
Same thing happened when we went from Sonnet 3.5 to Sonnet 3.7.
The technical ability of the AI may have improved,
but the user experience suffered.
Maybe more damning is this tweet from Signal,
who writes, Gemini is technically great, but feels like talking to a corporate help desk
that's read too many HR manuals. No edge, no warmth, no subtext. Lack of custom instructions
doesn't help either. For coding via third-party apps, it's fine, but for anything that requires
vibe, intuition, or taste, I'll take Claude or GPT every time. Still, if for some the vibes
are off, overall, it seems like a great update. And this version, of course, comes out ahead of Google's
I/O conference, which is kicking off in two weeks' time. I'm always excited to see what Google shares
at that event, and this does nothing but increase that excitement.
Next up, open source platform Hugging Face has released a free computer use agent.
Called Open Computer Agent, the free tool is similar to OpenAI's Operator in its features.
It can access the web and tackle basic agentic tasks.
However, at least currently, its performance leaves a lot to be desired.
TechCrunch reports that it got tripped up attempting to book flights and is generally pretty
sluggish.
Now, Hugging Face, for their part, said that the goal wasn't to build a state-of-the-art computer
use agent, but rather to demonstrate that open source models are becoming more capable and are cheap
to use on cloud infrastructure. One of the big blockers during this early stage of agent deployment has
been that the cost can be unworkable for anything complex. Aymeric Roucher from Hugging Face wrote,
As vision models become more capable, they become able to power complex agentic workflows.
And ultimately, it feels more like this is a proof of concept and a demonstration of the
advancements in open source agents than anything else. Lastly today, an area of AI that we haven't checked in on
for a while, AI startup Lightricks has released a powerful new video model that can run on consumer
hardware. The new model, called LTX Video, is a 13 billion parameter video model, which theoretically
operates 30 times faster than comparable models on consumer-grade GPUs. That's a big
enough jump to take video generation from impossible to functional for workstation use. It also
means that cost has collapsed, with Lightricks claiming roughly a 10x cost decrease against leading
competitors. CEO Zeev Farbman writes, the introduction of our 13 billion parameter LTX Video
model marks a pivotal moment in AI video generation with the ability to generate fast, high-quality
videos on consumer GPUs. Our users can now create content with more consistency, better quality,
and tighter control. The trick appears to be a feature called multi-scale rendering. The model
generates video in progressive layers of detail, massively increasing efficiency. Farbman explained,
it allows the model to generate details gradually. You're starting on the coarse grid, getting a rough
approximation of the scene, of the motion, of the objects moving, etc. And then the scene is kind of
divided into tiles, and every tile is filled with progressively more details. This method allows the model
to fit within the memory limits of consumer GPUs, while rival models from Luma and Runway typically
need beefier enterprise-grade hosted hardware. Farbman says that the memory limit restricts tile size,
not the overall resolution as it would with other models. Quality seems up to scratch from the
available samples. Although at this point, we're basically past the point where there's a big gap
in quality on video models, and many of the selling points have moved to cost and availability.
The model is now fully available as open source so you can try it out on Hugging Face or take
it for a spin at home if you have a reasonably powerful GPU.
For now, that is going to do it for today's AI Daily Brief Headlines edition. Next up,
the main episode.
Today's episode is brought to you by KPMG.
In today's fiercely competitive market, unlocking AI's potential could help give you a competitive
edge, foster growth, and drive new value.
But here's the key.
You don't need an AI strategy.
You need to embed AI into your overall business strategy to
truly power it up. KPMG can show you how to integrate AI and AI agents into your business
strategy in a way that truly works and is built on trusted AI principles and platforms.
Check out real stories from KPMG to hear how AI is driving success with its clients at
www.kpmg.com slash AI. Again, that's www.kpmg.com slash AI.
Today's episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context,
which, if you don't know exactly what that means yet, do not worry, we're going to explain, and it's awesome.
So Blitzy is used alongside your favorite coding copilot as your batch software development platform for the enterprise,
and it's meant for those who are seeking dramatic development acceleration on large-scale code bases.
Traditional copilots help developers with line-by-line completions and snippets, but Blitzy works ahead of the IDE,
first documenting your entire code base, then deploying more than 3,000 coordinated AI agents working in parallel
to batch build millions of lines of high-quality code for large-scale software projects.
So then, whether it's codebase refactors, modernizations, or bulk development of your product roadmap,
the whole idea of Blitzy is to provide enterprises dramatic velocity improvement.
To put it in simpler terms, for every line of code eventually provided to the human engineering team,
Blitzy will have written it hundreds of times,
validating the output with different agents to get the highest quality code to the enterprise in batch.
Projects then that would normally require dozens of developers working for months can now be completed
with a fraction of the team in weeks, empowering organizations to dramatically shorten development cycles
and bring products to market faster than ever. If your enterprise is looking to accelerate software
development, whether it's large-scale modernization, refactoring, or just increasing the rate of your
SDLC, contact Blitzy.com, that's B-L-I-T-Z-Y.com, to book a custom demo, or just press get started
and start using the product right away.
Today's episode is brought to you by Superintelligent, and more specifically, our agent readiness audits.
Every company right now is in the midst of a discovery process trying to figure out how autonomous
agents are going to change, both how they work internally, as well as the way they service their
customers, and even what products they actually offer. Agent readiness audits are the fastest,
most efficient way to find out where and how agents can have the biggest impact on your business.
We deploy a custom-designed voice agent to interview teams and
leaders, run that through a hybrid human AI analysis process to produce an agent readiness score,
plus a set of insights and actionable recommendations for both what agent use cases are likely to
drive the most value and what you need to do internally to be most ready to seize those
opportunities. After the audit, there are a variety of next steps. We can dive deep and provide
an action planning report on one or more of the specific use cases. We also provide leadership
accountability coaching to help support internal change management, or you can turn your audits
into RFPs on our marketplace.
So go to besuper.ai or email us at agents@besuper.ai
to learn more about agent readiness audits.
Welcome back to the AI Daily Brief.
Today we have an interesting show for you.
I'm going to try to take a couple of different news items from the last week or so
and bring them together to articulate or argue for a trend that I'm seeing.
And that is, in short, the shift in mentality, particularly among enterprises and businesses
when it comes to AI. Briefly put, I think that we are moving out of a period where gen
AI feels like an exciting and important, yet experimental and unproven and still unknown
force, into something where it is inevitable, essential, and omnipresent.
And my argument is that this is a sense that's more broadly held. This isn't just me arguing
something. It's something that I think you're seeing in the ether. The thing that kicked this off
for me, and why I decided to talk about this today, was a post from IBM's VP of Product for
AI platform, Armand Ruiz, who writes, the era of AI experimentation is over. It's time to operationalize
AI agents in the enterprise. Now, the specific genesis for this is that IBM is now underway with its
Think 2025 conference. And this is very much the theme. IBM rolled out a full-stack agentic
offering, including pre-built agents for HR, sales, and procurement, platforms for agent
orchestration, observability, and governance. The company also announced new partnerships
with Cerebras and Oracle to make their AI available on those platforms.
And while all of that's awesome and great and you should check out what IBM has to offer,
that's not really the point of this show.
The point is that they are now arguing in explicit and clear terms
that enterprises should be past the point of tinkering with projects,
throwing small teams at pilots,
and instead should be thinking about big structural changes to how they operate.
Now, interestingly, IBM is also putting their money where their mouth is.
They are dogfooding this in a direct way.
CEO Arvind Krishna revealed that the company has used AI agents to replace a couple of
hundred HR workers entirely.
They're also making heavy use of the technology across their entire workforce.
Now, Krishna emphasized that the adoption of agents so far has been additive rather than
viewed as a cost-cutting measure.
He said, while we've done a huge amount of work inside IBM leveraging AI and automation
on certain enterprise workflows, our total employment has actually gone up.
Because what it does is it gives you more investment to put into other areas.
This touches on a theme that I talk about frequently, which is that the fact that AI is coming for basically all of our jobs does not a priori mean that we're not going to have jobs.
There is a decision that enterprises and organizations get to make on how to reinvest the savings that they get from AI-related gains.
Some will, yes, just hack headcount.
It is inevitable, and that's going to be a part of what we discuss later.
But others are going to make a bet that the better play long term is to reinvest those savings into better products, better
services, better support, basically all the things that make them better able to compete and win
new business. So, for example, in IBM's case, they reallocated resources from HR into hiring
more salespeople and programmers. Krishna commented that for them, these are critical thinking
domains where people need to do things that face up or against other humans as opposed to just
doing rote process work. Krishna also highlighted just how fast the entire space is moving. He commented,
over the next few years, we expect there will be over a billion new applications constructed using
generative AI. AI is one of the unique
technologies that can hit at the intersection of productivity, cost savings, and revenue scaling.
Effectively, he's arguing that there is essentially no wrong way to deploy AI at the moment,
whether your intent is to cut cost, push productivity, or design new paths to growth.
The only so-called wrong way to do AI is to get stuck in infinite pilots, rather than
really thinking in at-scale, operationalized terms.
VentureBeat writes, at the heart of IBM's announcement is a recognition that organizations
are shifting from isolated AI experiments to coordinated deployment strategies that require
enterprise-grade capabilities. Ritika Gunnar, the general manager for data and AI at IBM,
said we're trying to bridge the gap from where we are today, which is thousands of experiments,
into enterprise-grade deployments, which require the same kind of security, governance,
and standards that we demand on mission-critical applications. Gunnar believes that the next big
challenge is moving from a place where you have a handful of agents doing isolated tasks
to operationalizing multi-agent systems that can generate serious ROI. She said, we really believe
that we're entering into an era of systems of true intelligence. And yet already, AI is
moving the needle. IBM says that 94% of HR requests at the company are now handled by their agents,
and they also say that they've reduced procurement times by 70% using agentic workflows.
Now, okay, again, this was presented in the context of a big sales conference, more or less,
and so one could be forgiven for being a little bit skeptical, right? It is clearly in IBM's
interest to have everyone believe that the era of AI experimentation is over. But there is
plenty of other evidence, looking around, that this sentiment is shared more broadly. We've
covered extensively the results from the recent KPMG Q1 AI Pulse survey. That survey, which focuses
on companies of a billion dollars in revenue or more, found that more than three quarters of
organizations were piloting or deploying agents currently, with another 25% exploring the
possibility. But even more than that, there's been a total shift in the ubiquity and
normalness of individual employees using these tools as well. Daily productivity tool use,
in other words, people just using ChatGPT or Copilot or whatever, is up from 22% last quarter
to 58% this quarter. Every other metric that they surveyed around this sort of regular usage was up as well.
The deployment of agents is also clearly starting to pick up. Sixty-one percent of companies said
they now have call center agents. Sixty-eight percent said they have a customer-facing AI agent,
and 66 percent said they have agents performing administrative tasks like scheduling.
Those figures were all around 20 percent in Q4. So again, big jumps. Now let's go to market
logic. You might remember about a year ago, we had this barrage of articles about how maybe
AI was kind of a bubble. This was probably best captured by the Goldman Sachs piece, Gen AI:
Too Much Spend, Too Little Benefit. Meanwhile, fast forward a year and Goldman analysts are looking at
big tech earnings where AI revenue lines of business are all growing and basically arguing that
right now is a buy-the-dip opportunity because of the pricing of AI stocks. And then there's the shift
in tonality around jobs. One of my great frustrations, as many of you well know, has been the
comfortable lies we tell ourselves. These are best expressed in phrases like,
AI won't take your job. A person using AI will take your job. And while yes, it is the case that
everyone who performs well in the AI and agent economy will be fully versed in using AI,
I believe that this is, to use a word like the kids use, cope. I think that AI is coming for a huge
portion of what we do. And the question is how fast and how well we redesign what we do to take
advantage of what AI offers, rather than clinging to the set of tasks that used to comprise
our jobs. Increasingly, you are seeing this language and this recognition actually come to market.
Over the last month, we had the CEO of Shopify write a long letter to his team talking about
the AI revolution and specifically noting that teams will have to show that they tried to use
AI and couldn't successfully do it before they get more budget for headcount. Duolingo followed
just last week, basically explicitly saying that they are going to be moving
from contractor-generated content to AI-generated content.
Now, it wasn't like this was the first move for Duolingo here.
The company had cut 10% of its contractor workforce back at the end of 2023,
and there was reportedly another round of cuts in October of 2024,
with both translators and writers being replaced with AI.
But then we got maybe the most pointed expression of this from the CEO of Fiverr.
Fiverr CEO Micha Kaufman wrote,
I've always believed in radical candor
and despise those who sugarcoat reality to avoid stating the unpleasant truth.
The very basis for radical candor is care. You care enough about your friends and colleagues to tell the
truth because you want them to be able to understand it, grow, and succeed. So here is the unpleasant
truth. AI is coming for your jobs. Heck, it's coming for my job too. This is a wake-up call.
It does not matter if you're a programmer, designer, product manager, data scientist, lawyer,
customer support rep, salesperson, or a finance person, AI is coming for you. You must understand
that what was once considered easy tasks will no longer exist. What was considered hard tasks will be the new easy,
and what was considered impossible tasks will be the new hard.
If you do not become an exceptional talent at what you do, a master,
you will face the need for a career change in a matter of months.
I'm not trying to scare you.
I'm not talking about your job at Fiverr.
I'm talking about your ability to stay in your profession in the industry.
Are we all doomed?
Not all of us, but those who will not wake up and understand the new reality fast
are unfortunately doomed.
Now, he then goes into a set of suggestions for what people can do,
and interestingly in this case,
he's not announcing some new policies alongside it.
He concludes his note,
If you don't like what I wrote, if you think I'm full of poop or just an a-hole who's trying to scare you,
be my guest and disregard this message.
I love all of you and wish you nothing but good things.
But I honestly don't think that a promising professional future awaits you if you disregard reality.
If, on the other hand, you understand deep inside that I'm right and want all of us to be on the winning side of history,
join me in a conversation about where we go from here as a company and as individual professionals.
We have a magnificent company and a bright future ahead of us.
We just need to wake up and understand that it won't be pretty or easy.
It will be hard and demanding, but damn
well worth it. This message is food for thought. I've asked Shelley to free up time on my calendar
in the next few weeks so that those of you who wish to sit with me and discuss our future can do so.
Now, this is certainly the most assertive language that we've seen around this, but I think that it
reflects a lot of what leaders in companies are thinking. So what does this all mean? Well, the good news
is that there's a difference between organizations waking up to a mindset shift and no longer questioning
whether this is the future, but now actively and assertively moving towards this future,
and on the other hand, actually being in the future. Yes, that growth line, for example, in the
KPMG survey is super strong and clear, but 58% of people using productivity tools on a daily basis
means that still 42% aren't. There is a window, there is a moment in time, and this is what the CEO
of Fiverr was articulating as well, where there is an opportunity to start to adapt. For me personally,
I find it quite encouraging that we're not
having the conversation that tiptoes into this future, but that is confronting it head on.
I think the only way that we assertively exert our control and our agency over the shape of this
future is to recognize it. And we do have agency here. What organizations don't get to decide
is how the technology is going to develop and whether it's going to change the shape of what they do
and how they do it. What they do get to decide is how proactively they transform themselves for
that new future. What they do get to decide is what their position vis-a-vis their own employees
is going to be. What they do get to decide is how they're going to reinvest the inevitable savings
that come from robots doing a bunch of the jobs that people do now. And none of those things
leads to the dystopian nightmare scenarios that people so often inevitably assume are true.
I continue to be incredibly bullish and optimistic about the future where we are all super-powered
and super-intelligent. But the TLDR is that I agree with Armand:
the era of AI experimentation, at least from a mindset perspective, and viewing it simply as
experimentation, is over. So friends, let's dive in all the way. For now, that's going to do
it for today's AI Daily Brief. Appreciate you listening or watching, as always, and until next time,
peace.
