The Changelog: Software Development, Open Source - Measuring the actual impact of AI coding (Friends)
Episode Date: July 11, 2025
Abi Noda from DX is back to share some cold, hard data on just how productive AI coding tools are actually making developers. Teaser: the productivity increase isn't as high as we expected. We also discuss Jevons paradox, AI agents as extensions of humans, which tools are winning in the enterprise, how development budgets are changing, and more.
Transcript
Welcome to Changelog & Friends, a weekly talk show about taking more walks.
Thanks to our partners at Fly.io, the public cloud built for developers and AI agents who
ship.
We love Fly, you might too.
Learn more at Fly.io.
Okay, let's talk.
Well, friends, I'm here with Damian Schenkelman, VP of R&D at Auth0, where he leads the team
exploring the future of AI and identity.
So cool.
So Damian, everyone is building in the direction of GenAI, artificial intelligence, agents,
agentic.
What is Auth0 doing to make that future possible?
So everyone's building GenAI apps, GenAI agents.
That's a fact.
It's not something that might happen, it's going to happen.
And when it does happen, when you are building these things
and you need to get them into production,
you need security, you need the right guardrails.
And identity, essentially authentication, authorization,
is a big part of those guardrails.
What we're doing at Auth0 is using our 10-plus years
of identity developer tooling to make it simple for developers,
whether they're working at a Fortune 500 company or at a startup
that just came out of Y Combinator, to build these things with SDKs,
great documentation, API-first types of products, and our typical Auth0 DNA.
Friends, it's not if, it's when, it's coming soon.
If you're already building for this stuff, then you know.
Go to Auth0.com slash AI.
Get started and learn more about Auth for GenAI at Auth0.com slash AI.
Again, that's Auth0.com slash AI.
All right, Abi, we're here to talk about measuring these AI agents that are infiltrating our organizations everywhere,
our lives. They're making us use them in some cases; in some cases we do it willingly.
But somebody's got to track these things. How are you doing it?
Well, it's still early days.
But okay, the name of the game right now is firstly being able
to understand how are we using these AI tools, how are developers incorporating them into
their workflows, how much value are we getting out of them, what's the ROI, are we spending
too much, too little, how do we right-size the amount of spend?
And then how good are these agents is the other question. And how do we measure AI?
Gosh, are you asking us? We're asking you.
I'm just setting the scene.
Setting the scene.
Just setting the scene. It's early days, you said though. It's early days. So
you've been in the business of helping organizations
understand the developer experience in terms of morale,
ability, code being committed,
how that affects the organization,
how that affects the bottom line.
So you've got organizations that essentially hire you
as a service or a consultant,
or however you wanna frame that,
and you help them determine if their teams are successful
and if code is being deployed properly and all that good stuff.
You've got to have some sort of pressure from those folks
because they're top down at this point.
They're saying, okay, developers,
you must begin to use this
because we're seeing this dramatic increase
and they've got to deploy it in ways
where they can test it and try it.
So what kind of pressure have you seen from your side?
Quite a lot. I mean, I think I can speak for all of us
that I don't think I've ever seen anything like this in the industry, where,
uh, you know, those at the top are
so bought into the promise of a new technology and are
pretty aggressively pushing it down.
And a good example of that, one thing that's really common now, is top-down
tracking of just adoption and utilization.
So lots of organizations are looking at
monthly active, weekly active, daily active usage by developers.
They're segmenting developers into different
cohorts based on whether they're super users
or low, medium, moderate adopters.
And then starting to try to study,
okay, what is that getting us?
Right, are the people who are using AI more productive?
Are they happier?
Is their code better or worse? But
yeah, the pressure is unlike anything I've really seen before.
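(As a rough illustration of the cohort segmentation Abi describes above, here's a minimal sketch. The thresholds and the 30-day window are made-up assumptions for illustration, not DX's actual definitions.)

```python
# Bucket developers into adoption cohorts from per-developer counts of
# active AI-tool usage days. Thresholds are illustrative assumptions.

def usage_cohort(active_days_last_30: int) -> str:
    """Classify a developer by AI-tool usage over the last 30 days."""
    if active_days_last_30 >= 20:
        return "super user"
    if active_days_last_30 >= 10:
        return "moderate adopter"
    if active_days_last_30 >= 1:
        return "low adopter"
    return "non-user"

# Hypothetical usage data: developer -> active days in the last 30.
usage = {"dev_a": 24, "dev_b": 12, "dev_c": 2, "dev_d": 0}
cohorts = {dev: usage_cohort(days) for dev, days in usage.items()}
print(cohorts)  # {'dev_a': 'super user', 'dev_b': 'moderate adopter', ...}
```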
And I was just talking to some researchers at one of the
prominent AI developer tool vendors, and they said that
a lot of this usage that they're seeing,
especially of agentic tools right now,
they believe is more fear driven than utility driven,
meaning that people are using a lot of these tools
even when they're not really effective right now,
because they fear that not doing so will mean
that they could become obsolete.
So that was a pretty interesting finding from-
I blame Steve Yegge.
He comes on our show, he starts telling people,
you will be replaced, you better adopt this thing right now.
The idea is dead.
The AI vendors have done a fantastic job
in their marketing of affecting the minds of leaders.
I even saw Anthropic, what, they put together an economic research organization to
study the impact of what's going to happen.
It's like the best PR stunt ever, I think.
For sure.
Well, have you guys been surveying?
Have you been collecting the data?
Do you have anything you can share,
anything definitive, or even just something that gives us a glimpse
into what's actually going down on the streets?
Yeah, so we are collecting data
from over 400 different organizations now.
It's both through surveys
as well as looking at their actual telemetry.
So DX connects to pretty much all the leading AI
coding tools today.
So whether that's Copilot, Cursor,
Windsurf, Claude Code, et cetera.
So we're ingesting that telemetry as well,
which gives us a real time view into developer usage
and utilization.
Some of what we're seeing, first of all, adoption is rising
extremely rapidly since really about three or four months ago.
I think that's when we started seeing the top-down mandates.
That's when the message became, you got to get on board or you're going to be left behind.
So we're seeing that in the data.
In terms of impact,
we see a number of really interesting things.
So first of all, on average, and keep in mind,
this is, call it, Q2 2025,
because this space is evolving very quickly.
On average, developers report saving about three hours per week
thanks to AI tools. Okay. Now, put that in context.
That's about five to 10% of their work week. So we're talking about a five to 10% boost.
Now that is a lot less than maybe what you might expect
if you were just looking at the headlines
or scrolling Reddit.
One piece of research it aligns with is Google.
I don't know if you guys saw that they came out,
it was about two or three weeks ago saying,
hey, based on our research,
we're seeing about a 10% productivity improvement
with our developers, thanks to AI.
So that's one data point.
Couple other data points,
and we can kind of dive deeper as you guys wish,
is one of the strongest relationships
we're seeing with data is actually with engagement,
meaning developer job engagement.
And I think that's really interesting because when you like hear some of these like OGs like
Kent Beck getting into AI augmented coding, like one thing you hear them talk about, maybe more so
than anything about their productivity, is how much fun they're now having.
It's just a more enjoyable paradigm of working.
And so we're seeing that reflected in the data.
You don't see that being talked about in the press.
I don't think people maybe care about that as much right now.
It's all about productivity.
The last thing I'll share is,
whereas we are seeing that around 10% lift
in developer time savings,
we're not seeing that strong of a correlation
in terms of something like code throughput.
I mean, actual rate of deliverables being shipped.
So that's a little bit perplexing.
That's a metric a lot of organizations immediately
wanna look toward: are we shipping
more PRs because of this?
Sure.
And there is a small relationship, but it's not, we can't say there's a 10% plus lift
across organizations right now.
And that raises a lot of interesting questions, too. Well, why?
Where are those time savings going?
Those are some of the interesting questions.
This research is based on Q2 of this year, is that right?
So this time window that you're speaking of
is basically just Q2.
Q1, Q2, it's really H1 data.
Okay, all of this year, 2025.
And I should add that we saw a notable rise
in a lot of those numbers compared to H2
of last year.
So particularly the adoption metrics, the time savings, those have increased materially
since H2 of last year.
I don't know about you, Jerod.
I want to talk about the fun.
I feel like we've been talking about all this productivity and the FOMO and the fear and
the slaps in the faces and the,
you're gonna lose your job, oh my gosh.
Let's talk about the fun.
Can you talk about the fun, Abi?
Like, what do you know about this Kent Beck fun aspect?
Like, what is the unlock here
that's making this paradigm shift more fun?
You know, I think when GitHub Copilot first came out,
it was your AI pair programmer, right?
And I think-
Your buddy.
Your buddy.
And I think that more interactive, more social form
of doing development work, having someone,
or in this case an AI, where you can get unblocked
when you're just in a brain funk,
or get really fast feedback on something you're trying to do
or that you just did,
I think that's more fun.
We've heard that from our engineers.
We see that out in the field when we're doing research
and you see that from folks like Kent Beck
or Gene Kim, right, who are talking about
how much fun they're having.
They haven't maybe been doing as much coding
in their careers recently, but they're getting back in
because they're having so much fun.
Nick Nisi, I was making fun of Nick
a few weeks back on the show, maybe a few months back,
because of all of his AI subscriptions.
He was confessing all of the money he was spending.
And I was saying, he was telling me how he was using it
and I kind of made fun of him and said,
well, you're just lonely.
Like you don't actually need help.
You just want someone to be there with you.
He's like, yeah, totally.
And for him, like that is the fun
as it feels more alive to just not be alone.
Now people who pair programmed in the past
or do it at their jobs know what that's like.
It can also be exhausting because you're interacting,
you're trying stuff, you're bouncing stuff off a person.
And most of us don't have that.
I mean, very few orgs buy into,
let's put two developers on one feature.
I mean, that just is a very hard sell.
And so people who do it swear by it,
but very few people will do that
because it just doesn't make sense in the leadership's eyes.
It's like, okay.
But the pair programming aspect of this
and just having someone that it's like the rubber duck,
but the rubber duck talks back and has ideas
and has information.
And that's really powerful for, I think, a lot of us.
For me personally, Adam said it, unlock.
For me, I'm just doing stuff that I wouldn't have tried before
because I just don't have time.
And I don't have two hours for this random idea I just had
where I thought, this would be nice.
Nah, that's too much work.
Like, I've done that constantly for the last 20 years.
And now I'm like, this would be nice.
I'll just go have Claude try it while I'm doing something else, and you feel like somebody
else is toiling away and you're just getting that thing done. And sometimes you throw it away, and
sometimes you use it, and sometimes it just helps you with something else. And that, for me specifically,
I know I've sung Claude Code's praises many times on the pod, because I'm just into it right now. I'm just having fun with that particular
tool. Once it was agentic and it was in my terminal and it was good enough that
I didn't really have to look at the code, as long as I wasn't gonna check it
into our main repository and have to maintain it, I'm just coding all kinds of
stuff without coding, and for me, all of a sudden, I'm having fun. Whereas prior to
this, if you go back the last two years,
it's been like a Google replacement, but there's nothing fun about replacing Google. You just get faster answers.
But your 10% is interesting, because, you know, GitHub has been claiming 50% for a long time, haven't they?
I mean, that's been their advertisement on Copilot, is 50%. 10% to me seems low, but
that's self-reporting and analytical reporting, like you're doing the data on that.
And people say three hours a week,
which would be 5% on a 60-hour week,
almost 10% on a 40-hour week.
So yeah, 10-ish percent. It's just not,
I wonder why it's not higher.
Yeah, I think first of all,
it's important to put the data from folks like GitHub
in context, a lot of the research,
when you hear some of that kind of stuff in the headlines,
a lot of that research is based on like controlled studies,
controlled experiments.
It's putting two groups of developers in a room.
It's a lab.
Yeah.
And I think that's worth putting into context.
I think there's a difference between applying these tools
to greenfield projects, side projects, small, clean codebases versus applying them to legacy
codebases, millions of lines of code, messy with different microservices.
And even in programming languages or frameworks that LLMs aren't as well suited for.
And so I think there's another interesting trend we're seeing.
In the same way that there's a big trend still today around,
hey, we need to break up our monolith for a number of reasons,
around service reliability and engineering productivity ownership.
I think there needs to be more focus right now on
all the things that have mattered for humans
as far as kind of code based readability, code based optimization.
I think the same problems hold true for LLMs.
I've been talking with larger organizations
who've kind of come into the realization
they're at a systemic disadvantage
in terms of leveraging these tools
because their code bases and systems
just aren't as optimized for agents and LLMs.
So I think that's a challenge for larger organizations.
Something I wanna mention, I'm not even qualified to really speak on this deeply,
so I just want to touch on it, expose it, is back to this fun of the buddy in the room or
the pair programmer. There's this phenomenon of human activity where we're more productive,
at least I personally am, when another human is in the room, even if they're not involved in my work.
The social nature of life, I think, bleeds into that, and I wonder if that's...
Maybe, you know, because you've got some doctors on your staff, maybe you've been exposed, or through osmosis you've learned these things.
But what do you know about, not brain activity, but more like human activity together,
just being more productive in the same room?
This is something that I've been studying personally
because whenever I am with someone else,
for some reason, I'm just able to like,
just have more energy naturally.
And I'm like, why, why is it like that?
And I wonder if that's the same thing here.
Developers tend to, in most cases, I'm not sure what the percentages are, but a large percentage work alone, solo. Sometimes they pair program, but that's more like a
particular scenario. It's usually a
solo endeavor.
A team sport, solo endeavor, right? A team sport we're all making, but a solo endeavor.
Yeah, I'm making this feature, or I'm in charge of this. And so I wonder if there's this social aspect
that really is now going to be,
for the most part here forever,
if we keep this tool in our life,
now we always have a buddy.
I wonder if that's a thing.
Yeah, I'm not sure.
I haven't seen research on that specifically.
Adam, you sound like an extrovert by the way.
Yeah.
I don't think I'm an extrovert at all.
I do not get my energy by hanging out with you all here. When I'm done here,
I'm going to go take a nap because I have to. I'm kidding. I'm not going to,
but there's some part of me that needs to decompress after exposure like this.
So I'm definitely not an extrovert. I'm more introverted,
but there's this idea of body doubling. There's this phenomenon of body doubling.
I really wish I had Mireille Reese here, who co-hosted Brain Science with me, because she
knows deeply about body doubling.
And that's essentially what you do here is you body double.
You have a mirror, you have a buddy, either a fictitious software program that can
act like a human, or literally a human in the room that doesn't interact with you, is just sort of there working alongside you, making you productive.
So body doubling is an interesting phenomenon.
So, to the 10%, and maybe there's some brain science here, and I'm not a brain scientist
either, but I've witnessed in myself more speed and productivity, but the same amount
of output, because I'm just kind of done for the day, or just happy.
You know, I wonder if there's an amount of work that a human does
in a day, and you can optimize that on the margins, and some humans are probably more
productive than others and stuff.
But for a lot of engineers, especially if you've been in the craft for so long,
you kind of have this idea of like how much you can do
in a day and then you feel like satisfied.
And I wonder if people are,
and self reporting you probably wouldn't do this
because it might be against your self interest,
but like doing what they normally do,
but just kind of doing a little bit faster and better.
And then doing something else and more, you know,
having another meeting,
an R&D session, or going for an extra walk.
I wonder if there's any of that in there
because I find myself being like, I could do more,
but I've already done what I was gonna do
and so I'm gonna take a walk.
There absolutely could be.
Another theory, like a lot of theories on it,
another theory could be that in the enterprise
or in organizations,
application of these AI tools is just still catching up
to what you were talking about,
what you're doing with Claude Code, right?
For example, we don't necessarily see Claude Code
as the leading adopted tool currently in organizations.
And in fact, some companies I talk to are,
it's on their radar.
But it's pretty new.
It's pretty new.
Another theory, I saw a really good writeup
on this recently, is when you actually back into it.
So, let's see, I might need to get the calculator out here.
Let's assume that at most companies, engineers spend 20 to 30% of their 40-hour work week writing
code. So then you take that 30%, and then you start plugging
in these numbers. Like, okay, let's say they're twice as
effective, like double the productivity.
So you take that 30% and you cut it in half,
well, what's that?
That means like a 15% net gain. Like I said,
I'm gonna get this wrong doing this live.
So when you start backing into it that way,
again, the 10%, actually,
you can see how you get there.
Even with a pretty high acceleration
of the coding part of the job,
you're kind of limited by these other factors.
And that's not even factoring in
all the other areas of friction,
as we've talked about before in the podcast
that are holding developers back
and that are still currently constraints,
even when the writing code part of their job
is greatly accelerated.
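(To make that live math concrete, here's a back-of-the-envelope sketch of the calculation Abi is doing, an Amdahl's-law-style estimate. The 30% coding share and the 2x speedup are the illustrative numbers from the conversation, not measured data.)

```python
# Overall speedup when AI only accelerates the coding portion of the job.
# Inputs are the illustrative numbers from the conversation above.

coding_share = 0.30   # fraction of the work week spent writing code
coding_speedup = 2.0  # assume AI doubles the speed of the coding part

# Time after AI, as a fraction of the original week:
# non-coding work is unchanged; coding work takes half as long.
time_after = (1 - coding_share) + coding_share / coding_speedup

overall_gain = 1 / time_after - 1
print(f"Overall productivity gain: {overall_gain:.1%}")  # ~17.6%

# With a 20% coding share, the same math gives ~11%, which is how you
# can land near the 10% figure even with a 2x coding speedup.
```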
Right.
In other words, it's not as if you're doing coding
100% of your job.
It's a smaller portion of your job,
and if you're doing that smaller portion faster,
then you're only speeding up that one thing.
And as many of us know,
who've been in the industry a long time, is the coding part.
While it can require you to sit down for six hours
and do it, it's not always the limiting factor,
it's not the problem sometimes.
Code reviews are still very time consuming,
and of course, we gotta code review this stuff, right?
Like, the agents aren't quite good enough
to just let them vibe code in the enterprise.
Now, there are people claiming there's vibe coding
going on in the enterprise,
but I think most of that's rogue
and just trying to beat your colleagues at your job,
unless you have data to the contrary.
I'm interested you said Claude Code isn't very adopted.
That makes sense.
I mean, these agentic tools especially, I mean,
Gemini CLI, like, the new CLI tools, we're talking
like the last three, four months. And so we're
not going to have data on that yet.
Gemini is like two weeks.
Yeah, exactly. Yeah. I mean, things are moving
very fast and enterprises move traditionally
slow, depending on the enterprise.
Well, friends, it's all about faster builds. Teams with faster builds ship faster and
win over the competition.
It's just science.
And I'm here with Kyle Galbraith, co-founder and CEO of Depot.
Okay, so Kyle, based on the premise that most teams
want faster builds, that's probably the truth.
If they're using CI providers with their stock configuration
or GitHub Actions, are they wrong?
Are they not getting the fastest builds possible?
I would take it a step further and say,
if you're using any CI provider with just the basic things
that they give you, which is, if you're using any CI provider with just the basic things that they give you, which is if you think about a CI provider, it is in essence a
lowest common denominator generic VM and then you're left to your own devices to
essentially configure that VM and configure your build pipeline.
Effectively pushing down to you, the developer, the responsibility of
optimizing and making those builds fast.
Making them fast, making them secure, making them cost effective, like all pushed down to you.
The problem with modern day CI providers is there's still a set of features and a set of
capabilities that a CI provider could give a developer that makes their builds more
performant out of the box, makes their builds more cost effective out of the box
and more secure out of the box.
I think a lot of folks adopt GitHub Actions
for its ease of implementation
and being close to where their source code already lives
inside of GitHub.
And they do care about build performance
and they do put in the work to optimize those builds.
But fundamentally, CI providers today
don't prioritize performance.
Performance is not a top level entity
inside of generic CI providers.
Yes, okay, friends, save your time,
get faster builds with Depot: Docker builds,
faster GitHub Actions runners,
and distributed remote caching for Bazel, Go,
Gradle, Turborepo, and more.
Depot is on a mission to give you back your dev time
and help you get faster build times
with a one line code change.
Learn more at depot.dev.
Get started with a seven day free trial.
No credit card required.
Again, depot.dev.
Who is winning?
Like who is, you know, is it Windsurf? Is it Copilot? Like from your data, what are people using the most?
Yeah, I mean, I don't want to, we're coming out with some data on that real soon.
Okay.
If you will, like, let's call it a leaderboard of AI.
Okay, give us a teaser.
I don't want to steal your thunder, but I want to hold a teaser.
Yeah, well, I'm not going to call the winners specifically.
I can kind of, I mean, definitely Cursor, Copilot,
Windsurf, and then Claude Code are what we're seeing, both in
the data, but also when we go talk to organizations about
what they're looking toward.
You know, other interesting thing,
we've all followed Cursor's astounding growth.
What's really interesting is like most organizations
are just in experimental mode right now.
So they're going in and saying,
okay, we're gonna like buy them all.
We're gonna buy them all, give everybody everything,
then we're gonna figure out what we're actually gonna do.
So it's very much up for grabs.
Everyone's talking about Cursor's momentum,
but I wrote an article last week saying,
yeah, their growth is incredible,
but who knows what's gonna happen?
12 months from now, companies are gonna say,
okay, we've been trying all
these things, we need to potentially standardize around a uniform toolchain.
With, like, Claude Code, there's even these workflows, shared
workflows. There'll be leverage in standardizing this tooling, because there's
going to be a lot of enabling work
that needs to happen to make these tools
and agents successful.
And so, yeah, it's very much up for grabs,
but the tools I mentioned are the ones currently,
I think we're seeing the most interest
in highest levels of adoption by companies.
Are you in an extreme growth mode as a result of this?
Like the DX business?
Yeah, because you've got to,
if you have a large swath of enterprises
who need to experiment,
they need to track how they experiment, right?
So they need frameworks, and that's what you've got.
I'm just curious if that has resulted in extreme growth.
Six months ago, even four months ago, I would say companies coming at us were looking for
help with all kinds of things and kind of figuring out AI was one of them.
I'd say right now, AI is the number one use case. That's the only thing.
That's the only thing anybody cares about.
Yeah. So it's data for everything from bake-offs.
As you said, hey, we're evaluating five different tools,
and we want to, with data, understand
which of these are most effective for developers.
It's putting a real, turning that into dollars and hours
and numbers: hey, what is the impact?
You know, our CFO is saying we should have 50% improvement.
What is the data telling us about what this is actually
yielding right now?
It's understanding, you know, what are the downstream effects?
So, okay, we're seeing more code throughput, faster code velocity.
Are we also seeing more defects? You know, how's that affecting developer flow state?
Are we seeing more incidents? Is the code maintainable? Is developers'
ability to then maintain this AI-generated code increasing or decreasing?
And finally, cost.
So with the consumption-based spending now,
there's a real question.
Okay, we gotta figure out, like,
how much can we spend?
What's the appropriate budget?
And then how do we think about making sure
that we're spending that money on good things,
not, like, developers screwing around,
burning tokens in ways that aren't
accretive to the business.
So yeah, Adam, it's been a really big tailwind
for the DX business for sure.
Well, it has to be one of the most divisive technology
hype cycles in human history because, I mean, maybe blockchain
was equally as divisive, because there were believers
and non-believers in blockchain,
and there was a lot of hype around blockchain
will solve every single problem.
And then other people were looking at the technology
and thinking like, well, it's really good
if you need decentralized consensus, you know,
which does have some applications, and it's finding some use cases, but not like
everyone's going to say blockchain will solve it. Right. And so you had a lot of division
there. And you have a lot of division on this because yeah, you have CFO saying we should
be 50% more productive and you have people who are boots on the ground saying like, that's
not going to happen, you know? And so that's a lot of pressure on me and my team,
which we think is unwarranted,
and we're trying all the tools.
It's just insane how much, yeah, top-down pressure
of something that nobody really knows the upside
in any sort of clear way, right?
We know a vague upside.
We can feel it, we can maybe report on it a little bit,
but we're just not sure where this train is headed.
And so it makes sense that DX, you know,
your guys' business is in high demand on that one topic
because we all wanna know.
Yeah.
You know, it's a big open question right now,
is how much are these tools gonna deliver on the promise?
So far it's looking like more than blockchain, you know?
But like you said, the jury's still out.
It is amazing when I go talk to leaders.
I think a lot of leaders, I don't know the percentage,
we haven't surveyed them,
but I think a lot of leaders really do believe
that a large portion of their engineering workforce
will be replaceable. A lot of leaders I talk to, I can hear it and I can see it in their eyes.
They believe that that is what's going to happen. And when I talk to like prospective investors,
CIOs, it's a question I get asked a lot about,
even the DX business.
How is this relevant in a world where
there's way less developers?
Right.
It's an interesting question.
AX, dude, you need AX.
Yeah, AX, yeah.
Agent experience.
So it's an interesting thought exercise.
I also talk to leaders
who strongly believe
this is not gonna result in any sort of
widespread reduction in human head count.
I think that's my personal prediction at this time.
I was talking with a leader,
I think I just heard of it, it's called Jevons paradox.
Have you ever heard of that?
Okay, I haven't heard of this one, no.
Okay, it was just shared with me this week,
but it's on Wikipedia.
It's the idea that when a resource becomes more efficient,
it actually leads to higher utilization. Meaning, I think there's an example
of something with like oil refineries
and the ability to refine oil became much more efficient.
So you'd think that the investment in that would decrease, like, oh, we can have
fewer oil refineries, because we can refine oil faster. But, you
know, it only resulted in more production, more
refineries. And so, similarly, applying that to what's
happening here: if we view these tools as making
engineers, maybe not replacing engineers,
but making the engineers significantly more productive,
you know, that law would then suggest that,
well, we're just gonna have more.
Yeah, we can get more out of each engineer,
and we're just gonna have more engineers
and have more software faster, right?
So yeah, it'll be interesting
to see how this all plays out.
I kind of jibe with that, because, I hadn't heard of this
principle or paradox before, but it does make sense
that when you make something more efficient, you tend to use more of it.
And that's kind of where I'm leaning towards.
Like you're going to have a recalibration of what a developer is
because there's going to be more people willing to do what developers do.
And so there's going to be a wider spectrum of what to do.
So degree of difficulty, easier, harder.
And then I think you're going to see the definition change, so to speak.
And you're going to see more people come into it.
You're still gonna need people to think. You know, I can't abstract this big enough,
but there's still gonna be humans thinking about the problems of humanity.
You can probably offload a lot of that to AI, but I think you still need this human
intellect, this human, I
don't know how to describe it besides what
feels right to humanity. I think you still need that in there, because it's not quite in the AI.
They're just more bits and bytes than true intelligence; it's not the same, you know.
And this shows up. So we just published this AI measurement framework, sent you guys the link.
It'd be great to include in the show notes.
One of the big questions as we developed this framework
was how do we measure agents?
So do we treat agents as people
or do we treat agents as extensions of teams and people?
In the framework, we discuss this in the paper, you know, we
advocate for treating agents as extensions of people and teams.
What that effectively means is that a developer is the manager of these agents, but we're still measuring the developer and the team, with the agents being an extension of that team, if that makes sense.
Yeah.
So thinking about how do we measure this, you kind of arrive at similar questions
to what we were just talking about
in terms of the human to agent ratio
and balance and relationship and how that will evolve.
Yeah, because the effectiveness of the humans
being extended is another factor
because I may be better or worse
at leveraging an agent than you might be.
And so this team plus three agents
versus that team plus five agents,
there's so many variables there
to actually whittle it down to
any sort of usable information.
Well, that's your job, Abi, not mine.
So I'm sure you guys will figure it out.
And you hear more about,
this is another thing when I talk to leaders,
they're thinking a lot about the number of agents. Like, what's the right ratio of humans to agents? And I think that's a
really interesting question. I also think it's not a practical question right now. I mean, you know,
working with, like, Claude Code, we're kind of talking about single-threaded
versus multi-threaded.
We're not at the point where we're really talking about,
I have one QA agent and a designer agent
and a front-end developer agent,
that's not really the paradigm.
I mean, I've seen people kind of trying to do that.
But there's some people doing that,
they're on the edge.
They're on the tip, yeah.
But that's not really the paradigm right now.
So I think right now, human extension is the right way
to think about it, but that could change.
In that paper, one of the things that you had
pull quoted actually was companies are no longer limited
by the number of engineers they can hire,
but rather the degree to which they can augment them
with AI to gain leverage.
It's kind of like what you're talking about there,
is like you're counting agents, you're counting humans,
but you really want to just like augment the human ability,
not replace it.
Although some of the leaders have been smirking,
thinking replace, replace, right?
Yeah, they're thinking we'll replace them soon.
Gosh. As soon as the data shows that we can, we will.
Bye. And thankfully, the data is not showing that.
Right. And I think the secondary effects of software quality,
understanding the bottlenecks in the SDLC, like we talked about things like code review
aren't going away. Decision-making and judgment.
Actually, when you think about it, for example,
like product management, a lot of seasoned engineering
leaders, when I talk to them, know that product management
is actually the big bottleneck, not so much
engineering velocity.
It's really like product velocity. It's decision making.
It's that life cycle from idea to code.
And I think we're going to see attention on those bottlenecks magnified,
because as we optimize the coding, we're already seeing this at DX,
companies come to us, hey, we thought we were supposed to get
50% productivity improvement.
We're not seeing that from the AI tools.
So now we're asking what really is our problem?
What really are the bottlenecks?
And so it is magnifying attention
on engineering productivity in general,
because folks are really focusing on that topic right now,
because they're expecting these tools
to be transformative in their organizations.
So I think that's an interesting trend we're seeing as well.
So this could actually result in people caring more
and investing more in the other aspects
of developer experience that people maybe haven't focused on
before, because those
constraints are being magnified.
When you solve one bottleneck, then you see the other one for what it is.
And you start trying to solve that one.
Well, friends, it's time to build the future of multi-agent software.
You can do so with Agency.
That's AGNT CY.
The Agency is an open source collective building the internet of agents.
It's a collaboration layer where AI agents can discover, connect, and work across frameworks.
For developers, this means standardized agent discovery tools, seamless protocols for inter-agent
communication, and modular components to compose and scale multi-agent workflows.
You can join CrewAI, LangChain, LlamaIndex,
Browserbase, Cisco, and dozens more. The Agency is dropping code, specs, and services
with no strings attached. Build with other engineers who care about high-quality multi-agent software.
That's agntcy.org. Add your support. Once again,
agntcy.org.
So budgets, I'm curious about budgets
because as we look at agents as extensions of engineers,
let's imagine that I'm worth $100,000 a year,
whatever plus whatever.
And if you give me one agent, maybe I'm worth 120.
And so are you spending $20,000 a year
on an agent per engineer?
Cause that saves you another engineer perhaps.
I'm probably not gonna get to two engineers,
I'm not a two-Xer, but maybe I'm 1.1X,
maybe I'm only worth 110. How much do we spend on this?
I'm sure these people are trying to figure it out
because budgets very much have to be actualized
and decided on like how much are we gonna spend towards this.
Now, when you're just trying every tool there is,
I guess the budget is, don't worry about the budget,
we gotta figure this out.
But eventually those things need to be figured out
because I don't think we're gonna get outright replaced
like these CEOs want to soon.
But certainly we're gonna be augmented
in a way where your budgeting starts to change.
Start to think about, you know, engineer plus.
What are your thoughts on that?
And have you guys done any work
with regard to pricing these things out?
In our paper, we talk about cost
and ways to think about cost.
I don't think, across the industry, we're at a point where companies are focused on this problem.
are at a point where companies are focused on this problem.
They're talking about it because they know it's coming.
They know it's the next problem they need to figure out
once they get over the kind of experimentation phase.
But your example right there, yeah,
you know, your cost is $100,000 per year; with an agent,
your cost is $120,000 per year.
Like, what does that mean?
Again, this goes back to the, well,
then should there be fewer developers, should there be fewer people?
Right. Because, like, we're offsetting that. And so, in our paper, we talk about a couple ways to think about this.
One is this idea of human-equivalent hours. And this came up in our conversation here before, like being able to measure, okay,
how much human-equivalent work did this agent do?
So not just looking at the number of PRs, right,
but how much human-equivalent work.
So if you can measure that, which is hard,
then you can take that against AI spend.
And you essentially have this idea
of an agent hourly rate.
So your spend divided by number of hours of work produced,
work done, that's your agent hourly rate.
So I think that's one interesting number
for that we're working with companies
on putting some focus on.
That's a number you can use to start to rationalize
what's the right amount of spend.
Another interesting metric is like net time gain
per developer.
So that would be, we talked about the time savings.
So how much time is AI saving you?
Well, how much are you spending
on AI? And then when you take that, you know, convert it into the equivalent of the developer's
hourly rate, are they actually saving time, or do they spend more money than the time they saved?
Right. Right. So those are two, so agent hourly rate
and net time gain per developer are two metrics that,
again, really hard to get at right now.
I mean, a lot of the vendors,
it's hard to just get the cost
and stay on top of the cost information in general.
But I think those are two good frameworks
for thinking about the net ROI and right sizing the investment.
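(Here's a rough sketch of those two metrics as formulas, with made-up example numbers; the variable names and inputs are illustrative assumptions, not DX benchmarks.)

```python
# Two rough AI ROI metrics from the conversation, with made-up inputs.

ai_spend_per_month = 500.0    # $/developer/month on AI tooling (assumed)
human_equiv_hours = 25.0      # estimated human-equivalent hours of agent work/month
hours_saved_per_month = 12.0  # self-reported savings, ~3 hrs/week (from the episode)
dev_hourly_rate = 75.0        # fully loaded developer cost, $/hour (assumed)

# Agent hourly rate: what you pay per human-equivalent hour of agent work.
agent_hourly_rate = ai_spend_per_month / human_equiv_hours

# Net time gain per developer: time saved minus the time-equivalent of the spend.
net_time_gain = hours_saved_per_month - ai_spend_per_month / dev_hourly_rate

print(f"Agent hourly rate: ${agent_hourly_rate:.2f}/hr")  # $20.00/hr
print(f"Net time gain: {net_time_gain:.1f} hrs/month")    # ~5.3 hrs/month
```

(If the net time gain comes out negative, the spend costs more than the time it saves, which is exactly the right-sizing question Abi raises.)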
That's a complicated task.
That's for sure.
But I'm glad we got smart people thinking about it.
It'll certainly be a huge concern maybe 12 to 18 months from now.
Eventually, some of these tools got to shake out, I think.
And that's what I've kind of been waiting on is like, you know, I'll let the edge lords do their edging
and then I'll just wait and see what shakes out.
But it's been a longer grind, you know,
it's probably been two and a half, almost three years
since ChatGPT changed the world.
And they're just now getting to where,
like, for the longest time, I replaced Google with it, but I wasn't going to use it for any software until this last iteration of models,
and they've all gotten to where it's like, okay, you know, we've reached a threshold, which is significant.
Going back to your Jevons paradox, that actually tracks with me, just as an n of one. Like I said earlier, I'm not replaceable in this sense,
but I'm just writing more software.
Like I'm not just doing less,
although I did confess to going
and taking a walk earlier than I would have.
But I'm also just doing more stuff
that I wouldn't have done.
Like it just unlocks me to write more software
that I wasn't gonna write.
And I imagine all around the world,
imagine every JIRA board or Pivotal Tracker
or whatever tool you're using and the Icebox,
you know, the backlog.
And there's things in that backlog that you know
they're just never gonna get worked on
because other stuff just goes in higher and replaces them.
And it's just a constant grind.
And there's so much unwritten software
that we're not gonna run out.
We're not gonna, we're just gonna hire, hire, hire.
We're gonna augment, we're gonna write more software.
Some of that software is gonna be really crappy.
We're gonna hire more security engineers
and then we're gonna hire people to replace the software.
I mean, it's gonna be just fine, I think.
That's my-
Just fine, I think.
I think we're gonna be just fine.
Now we do change how we do our work.
Absolutely, 100% change how you do your work.
And I think our teams change slightly.
I think our enterprises change.
I think less large enterprises probably,
smaller teams doing more.
Businesses don't have to grow that head count
quite as fast, but the large ones stay large.
That's just an intuition.
I don't know that I have with you guys.
Well, even with your, you know, confessing to taking a walk early, I wonder if maybe
during that walk you solved a harder problem that you hadn't been able to solve, because
you were happier, you felt more fulfilled, and maybe you actually had the brain space
to just think, you know. So that walk doesn't actually...
I like to think I did. I'm pretty sure I did.
Yeah, you probably did.
I mean, it's not indicative of less output.
That's the problem, I think, is...
And why I'm so thankful DX is here, because you got the Core 4, this kind of four degrees
of measurement across teams, and you got this newer one, the AI framework, to measure agents.
We need those checks and balances. Because I'm curious, you know, having said that, if
the DX Core 4 needs to change,
or will change, because of AI. Like, do we need to add a happiness metric in there,
or morale? I think it's kind of in there, but maybe you can speak to it more, Abi.
I'm curious if the DX Core 4 needs to be the Core 5, because we need to measure
the human contentment, I would say, as an individual.
And then that individual is part of a team.
That team is part of a culture and a culture of a company.
I'm curious if that will change because of AI.
Yeah, we talked about this last time I was on the show too.
Did we, gosh.
We have the idea of happiness encapsulated
in our developer experience index measurement.
Okay. Very intentionally not called happiness, because one of the goals of the Core
4 is to make it palatable for executives.
And, I mean, not to sound cynical, they don't want to measure it.
Not now. Back in 2021 they did, when no one could retain developers.
Right.
Right now they can't.
Funny thing, GitHub recently came out with a white paper
on how to think about measurements
and we consulted with them closely.
They incorporated a lot of the core four measurements
in their article, but one thing specifically I kept telling them
was don't call it happiness.
They're like, GitHub is all about developer happiness.
It's all about developer happiness.
I said, don't call it happiness because, you know,
the irony is if you call it happiness,
then executives won't measure it.
And so they won't care about developer happiness.
If you kind of
frame it as something else, we frame it as the Developer Experience Index, which is how you measure effectiveness,
because we believe developer happiness is part of being effective in how you build software,
then they'll measure it and it'll get improved and optimized.
And so, yeah, it's a naming problem, Adam,
but it is encapsulated in there.
In terms of like, should the Core 4 change?
That's something we've looked at closely.
And as of now, as an organizational way of thinking
about measuring productivity,
we think Core 4 still holds true.
What's different about AI is you need more, right?
There's a lot of new stuff we're measuring.
And so the AI measurement framework actually includes the Core 4 in one aspect of it,
which is understanding how is the overall organizational productivity being impacted
pre and post AI or depending on level of adoption.
So that's a lot of what we're helping our customers measure right now, is when we look
at the adoption curve in an organization, or the maturity curve, I think it's transitioning
now into less about adoption, more about maturity. So not just
are people using Copilot daily, but how are they using it? Right? Are they using it just for
autocomplete? Are they using the agentic stuff? Are they using the AI to help them create the
prompt that they feed back to the AI? Right?
So as that maturity increases,
does the organization seem more productive based on data?
That's the big question we're trying to help companies
answer right now.
What has been your adoption strategy at DX
for these tools?
Yeah, you know, we work with Netflix
and one thing that's been on my mind is how much I have not heard Netflix making a fuss about AI.
Meaning, a lot of these companies are like, you know, we need our developers using these tools,
we're measuring how much they're using them.
I haven't heard that from Netflix, which makes a lot of sense, because Netflix's
entire culture is predicated on, hey, we just hire really senior people;
developers kind of rule at Netflix, right? Because they
trust that their developers are the best in the
world, and that if there's a really useful tool for them to
use, they'll use it, and they're going to use it the right
amount. They're not going to use it more because we're measuring them.
And so that's the approach I've taken at DX.
Now, I've also encouraged Claude Code, for example.
So I encouraged a few of our engineers,
hey, I'm reading about this workflow,
you know, voice-to-text to Claude Code, to this, to that, right?
Like, one of those more edge workflows.
And so I asked a few of our engineers to go try this
workflow for a little bit and see what they think.
But we definitely haven't mandated it.
It's my understanding that everyone's using one or multiple
of the tools and doing their own experimentation.
But ultimately, yeah, I trust the engineers to use it to the degree, to the extent that it's useful.
And I think that's the, I haven't put any pressure on people. I haven't been like,
if you don't do this, you're going to become obsolete. That's not a message I'm bringing. We have some really
interesting, just candid conversations, like, where do we all think this
is going, right? And that's also relevant to DX because we're a product company. We're thinking
about products around AI tooling, AI enablement tooling. And so we're having those discussions as well. Can you say more about that?
Well, DX, the business, we've gotten to this
interesting point. We've been doing the measurement
thing for now almost four years.
It's going really well, but a few months ago...
You're getting bored, getting bored of measuring stuff.
Well, a few months ago, I met with Drew Houston,
CEO of Dropbox, and he said to me,
they've been a customer of ours, he goes,
you guys are doing diagnostics really well.
Have you thought about, what about the interventions?
Like, what about solving the things that you're measuring?
And so, that's actually a question I realized
we've gotten asked in different ways, but constantly:
like, well, can you actually help us improve?
And, you know, we've done a lot of things,
like you guys saw this partnership we have with ThoughtWorks.
So, hey, partnering with consulting companies
who can come in and help you with the transformation.
We started partnering more with different vendors.
Hey, is there a way we can loop in vendors?
We don't really wanna play favorites,
but hey, at least we can map different tools out there
to different problems, areas of the SDLC.
I think we've gotten to a point where it's like,
hey, you know what, actually there's gaps in the market
where there is no solution for some of these problems.
And should DX just go build them?
And now with AI, we're seeing a whole new generation
of problems that are being created.
I mean, like you guys,
my previous company was a Slack app, right? So, like, I got in on that right as Slack was
growing like crazy. And so it was an entire
new paradigm, and an entire generation of businesses were
built on Slack. And so I think AI, and AI engineering
specifically, so this new way of working, this new way of doing software
to me is actually a new paradigm in which,
potentially like the entire tool chain could be rewritten.
Like, the folks at GitHub and GitLab are worried
that Cursor is gonna add code review
and source control management to their product,
like, just go for it, right?
I think it completely upends the status quo.
And so, you know, at DX, I don't think,
we're not going after Cursor,
we're not going after GitHub, that's for sure.
Who are you going after?
I think we're going after the adjacent opportunities,
the ones on the margins that, you know,
the big guys aren't.
We're not trying to go to war with Cursor and Copilot.
We're trying to solve the adjacent problems we see.
Some things we've talked about today,
like how do you actually,
how do you upskill developers?
How do you optimize your code for LLMs?
How should platform engineering teams think about
sort of self-service and enablement
in the same way that if you guys have followed
things like Spotify Backstage, right?
Like, big focus on golden paths,
self-service developer enablement.
What does that look like in a post-AI world?
It's like, well, enablement on what?
Golden paths around AI tooling, AI development workflows,
shared, you know, we talked about how
Claude has workflows, literally, right?
They have workflows. So, you know,
curating that. How do you create a standardized
set of workflows so that when you hire
a developer into your organization, boom,
they have this menu of superpowers,
AI-powered superpowers that they can... So those are the types of problems. I can't get into
specifics today, but those types of adjacent problems, I think, are new constraints for
enterprises looking to deploy AI at organizational scale. So not single player mode, right? But more multiplayer mode.
How does an organization become successful with these tools
is a different set of problems.
It's like AI adoption best practices as a service.
Yeah, that's one potential opportunity, right?
That's just one of your ideas.
Yeah, that's not saying that's what we're doing in DX, but.
What else is adjacent to this?
Don't give us your solutions,
but what are the other problems?
Spill the beans, Avi.
He's gonna press you to spill those beans.
Wow.
I'm just kidding with you.
He's not pressing you.
He's just curious.
Yeah, I think what we're seeing is,
there's really two things.
One is that there's this new set of problems,
like some of the things I've hinted at.
I think there's this new generation
of problems to be solved. The other opportunity is that there's actually the pre-existing
problems that are being more magnified, meaning post-AI, it's even clearer that these are
real constraints. And in some ways, they're a limiting factor to how much value you get out of AI. Because again, if we go back to the idea of agents
as extensions of humans, then the ability of that human
and the environment in which that human is working in
is actually a more magnified limiting factor.
Because now this developer could be getting
150% leverage, whereas before there
was just 100%, to use the numbers we were talking about before.
So what are the constraints of the human?
Well, it's probably the same constraints we've had before, some new ones, and those types
of things we've been measuring for years and seen go unsolved and we think maybe we should go try to solve them.
Are you raising money? Where do I invest, Abi? Where do I invest? You sound like you're well
positioned to actually take over these adjacent markets, man.
I think GitLab, GitHub, Atlassian have a platform advantage over a Cursor.
Sure. That battle's on, right? It'll be really interesting to see,
can GitHub leverage the platform advantage that it has,
being the system of record for code, right,
and, like, developer communication, being embedded into so much of that?
Can it convert that into a really amazing platform
that can kind of win out over point solutions
that are threatening their business model?
I think similarly DX has some platform advantages, right?
Namely that we can actually measure
what's going on with all this.
You can prove what's going on.
Yeah, and so, yeah, again, I don't think,
we're not gonna jump into the battle of the AI code agents,
but we see opportunity to bring a lot of value
to our customers by helping them maximize those investments
and continue to just optimize
their overall engineering productivity.
We're almost out of time, but I gotta ask you
for predictions since you mentioned this.
Cause when you mentioned the popular agents,
you did not mention Copilot.
Cursor was mentioned, Windsurf was mentioned,
Claude was mentioned because Jerod sort of interjected it,
Claude Code at least, but Copilot was not.
Can you give me a prediction?
What do you think is gonna happen over the next year or so
given this?
To muddy the waters, you just mentioned GitHub or GitLab, the entrenched players.
What happens if they don't succeed in this transitional moment?
Yeah, if I were Cursor, I'd go for it, right? I would go for the full stack, like, be the platform.
I think the prediction I'll leave you with is I'm interested in seeing,
are we gonna end up in kind of like,
in the same way we've sort of ended in like multi-cloud,
like, is the end state of this kind of multi-agent,
like, you know, different companies will offer agents
and models that are more fine-tuned
to different types of work?
And so for the developer, are we gonna be in a world
where, based on the task or even subtask,
we're delegating to different providers and services?
And then there's an orchestration problem.
So that's another problem we're thinking about at DX.
Is that the paradigm?
And if so, there's a tooling layer needed.
Man, I want to be invited to one of these think tanks
y'all have.
I want to be in these.
I want to be a fly on the wall in the room with all this data you got, this moat you've
got just looking at the landscape and considering.
Less conjecture because they have actual data.
You know, Adam and I conjecture, but we don't have any data.
So we're just talking out of our certain body parts.
All right, Abi, we know you got a hard stop.
We'll let you go.
Thanks for stopping by again.
You're welcome anytime, man.
Yeah, thanks so much.
Always fun chatting with you guys.
Keep up the great work.
Bye, friends.
Bye.
All right, that's all we got for this week.
Thanks for changelogging with us.
Thanks for frenzing with us.
Thanks for making your way to Denver
so you can meet up with us and live show with us
and maybe even hike with us.
You are coming, right?
I hope you're coming. July 26th, the Oriental Theater, Denver, Red Rocks, oh my gosh it's gonna be so much fun.
Learn more about it at changelog.com slash live. Thanks again to BMC for these dope beats,
to our partners at Fly.io, and to our sponsors: Auth0, Auth0.com slash AI; Depot at depot.dev;
and Agency, that's agntcy.org.
Have a great weekend.
Tell your friends about the Changelog, why don't ya?
And let's talk again real soon.