The Changelog: Software Development, Open Source - Measuring the actual impact of AI coding (Friends)
Episode Date: July 11, 2025
Abi Noda from DX is back to share some cold, hard data on just how productive AI coding tools are actually making developers. Teaser: the productivity increase isn't as high as we expected. We also discuss Jevons paradox, AI agents as extensions of humans, which tools are winning in the enterprise, how development budgets are changing, and more.
Transcript
Welcome to Changelog & Friends, a weekly talk show about taking more walks.
Thanks to our partners at Fly.io, the public cloud built for developers and AI agents who
ship.
We love Fly, you might too.
Learn more at Fly.io.
Okay, let's talk.
Well, friends, I'm here with Damian Schenkelman, VP of R&D at Auth0, where he leads the team
exploring the future of AI and identity.
So cool.
So Damian, everyone is building in the direction of GenAI, artificial intelligence, agents,
agentic.
What is Auth0 doing to make that future possible?
So everyone's building GenAI apps, GenAI agents.
That's a fact.
It's not something that might happen, it's going to happen.
And when it does happen, when you are building these things
and you need to get them into production,
you need security, you need the right guardrails.
And identity, essentially authentication, authorization,
is a big part of those guardrails.
What we're doing at Auth0 is using our 10-plus years
of identity developer tooling to make it simple for developers,
whether they're working at a Fortune 500 company or at a startup
that just came out of Y Combinator, to build these things with SDKs,
great documentation, API-first types of products, and our typical Auth0 DNA.
Friends, it's not if, it's when, it's coming soon.
If you're already building for this stuff, then you know.
Go to Auth0.com slash AI.
Get started and learn more about Auth for GenAI at Auth0.com slash AI.
Again, that's Auth0.com slash AI.
All right, Abi, we're here to talk about measuring these AI agents that are infiltrating our organizations everywhere,
our lives. They're making us use them in some cases; in some cases we do it willingly.
But somebody's got to track these things. How are you doing it?
Well, it's still early days.
But okay, the name of the game right now is firstly being able
to understand how are we using these AI tools, how are developers incorporating them into
their workflows, how much value are we getting out of them, what's the ROI, are we spending
too much, too little, how do we right-size the amount of spend?
And then how good are these agents is the other question. And how do we measure AI?
Gosh, are you asking us? We're asking you.
I'm just setting the scene.
Setting the scene.
Just setting the scene. It's early days, you said though. It's early days. So
you've been in the business of helping organizations
understand the developer experience in terms of morale,
ability, code being committed,
how that affects the organization,
how that affects the bottom line.
So you've got organizations that essentially hire you
as a service or a consultant,
or however you wanna frame that,
and you help them determine if their teams are successful
and if code is being deployed properly and all that good stuff.
You've got to have some sort of pressure from those folks
because they're top down at this point.
They're saying, okay, developers,
you must begin to use this
because we're seeing this dramatic increase
and they've got to deploy it in ways
where they can test it and try it.
So what kind of pressure have you seen from your side?
Quite a lot. I mean, I think I can speak for all of us
that I don't think I've ever seen anything like this in the industry, where,
uh, you know, those at the top are
so bought into the promise of a new technology and are
pretty aggressively pushing it down.
And a good example of that, one thing that's really common now, is top-down
tracking of just adoption and utilization.
So lots of organizations are looking at
monthly active, weekly active, daily active usage by developers.
They're segmenting developers into different
cohorts based on whether they're super users
or low, medium, moderate adopters.
And then starting to try to study,
okay, what is that getting us?
Right, are the people who are using AI more productive?
Are they happier?
Is their code better or worse? But
yeah, the pressure is unlike anything I've really seen before.
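(As a rough illustration of the cohort segmentation Abi describes above, here's a minimal sketch. The thresholds and the 30-day window are made-up assumptions for illustration, not DX's actual definitions.)

```python
# Bucket developers into adoption cohorts from per-developer counts of
# active AI-tool usage days. Thresholds are illustrative assumptions.

def usage_cohort(active_days_last_30: int) -> str:
    """Classify a developer by AI-tool usage over the last 30 days."""
    if active_days_last_30 >= 20:
        return "super user"
    if active_days_last_30 >= 10:
        return "moderate adopter"
    if active_days_last_30 >= 1:
        return "low adopter"
    return "non-user"

# Hypothetical usage data: developer -> active days in the last 30.
usage = {"dev_a": 24, "dev_b": 12, "dev_c": 2, "dev_d": 0}
cohorts = {dev: usage_cohort(days) for dev, days in usage.items()}
print(cohorts)  # {'dev_a': 'super user', 'dev_b': 'moderate adopter', ...}
```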
And I was just talking to some researchers at one of the
prominent AI developer tool vendors, and they said that
a lot of this usage that they're seeing,
especially of agentic tools right now,
they believe is more fear driven than utility driven,
meaning that people are using a lot of these tools
even when they're not really effective right now,
because they fear that not doing so will mean
that they could become obsolete.
So that was a pretty interesting finding from-
I blame Steve Yegge.
He comes on our show, he starts telling people,
you will be replaced, you better adopt this thing right now.
The idea is dead.
The AI vendors have done a fantastic job
in their marketing of affecting the minds of leaders.
I even saw Anthropic, what, they put together an economic research organization to
study the impact of what's going to happen.
It's like the best PR stunt ever, I think.
For sure.
Well, have you guys been surveying?
Have you been collecting the data?
Do you have anything you can share,
anything definitive, or even just something that gives us a glimpse
into what's actually going down on the streets?
Yeah, so we are collecting data
from over 400 different organizations now.
It's both through surveys
as well as looking at their actual telemetry.
So DX connects to pretty much all the leading AI
coding tools today.
So whether that's Copilot, Cursor,
Windsurf, Claude Code, et cetera.
So we're ingesting that telemetry as well,
which gives us a real time view into developer usage
and utilization.
Some of what we're seeing, first of all, adoption is rising
extremely rapidly since really about three or four months ago.
I think that's when we started seeing the top-down mandates.
That's when the message became, you got to get on board or you're going to be left behind.
So we're seeing that in the data.
In terms of impact,
we see a number of really interesting things.
So first of all, on average, and keep in mind,
this is, call it, Q2 2025,
because this space is evolving very quickly.
On average, developers report saving about three hours per week
thanks to AI tools. Okay. Now, put that in context.
That's about five to 10% of their work week. So we're talking about a five to 10% boost.
Now that is a lot less than maybe what you might expect
if you were just looking at the headlines
or scrolling Reddit.
One piece of research it aligns with is Google.
I don't know if you guys saw that they came out,
it was about two or three weeks ago saying,
hey, based on our research,
we're seeing about a 10% productivity improvement
with our developers, thanks to AI.
So that's one data point.
Couple other data points,
and we can kind of dive deeper as you guys wish,
is one of the strongest relationships
we're seeing with data is actually with engagement,
meaning developer job engagement.
And I think that's really interesting because when you like hear some of these like OGs like
Kent Beck getting into AI augmented coding, like one thing you hear them talk about, maybe more so
than anything about their productivity, is how much fun they're now having.
It's just a more enjoyable paradigm of working.
And so we're seeing that reflected in the data.
You don't see that being talked about in the press.
I don't think people maybe care about that as much right now.
It's all about productivity.
The last thing I'll share is,
whereas we are seeing that around 10% lift
in developer time savings,
we're not seeing that strong of a correlation
in terms of something like code throughput.
I mean, actual rate of deliverables being shipped.
So that's a little bit perplexing.
That's a metric a lot of organizations immediately
wanna look toward: are we shipping
more PRs because of this?
Sure.
And there is a small relationship, but it's not, we can't say there's a 10% plus lift
across organizations right now.
And that raises a lot of interesting questions, too. Well, why?
Where are those time savings going?
Those are some of the interesting questions.
This research is based on Q2 of this year, is that right?
So this time window that you're speaking of
is basically just Q2.
Q1, Q2, it's really H1 data.
Okay, all of this year, 2025.
And I should add that we saw a notable rise
in a lot of those numbers compared to H2
of last year.
So particularly the adoption metrics, the time savings, those have increased materially
since H2 of last year.
I don't know about you, Jerod.
I want to talk about the fun.
I feel like we've been talking about all this productivity and the FOMO and the fear and
the slaps in the faces and the,
you're gonna lose your job, oh my gosh.
Let's talk about the fun.
Can you talk about the fun, Abi?
Like, what do you know about this Kent Beck fun aspect?
Like, what is the unlock here
that's making this paradigm shift more fun?
You know, I think when GitHub Copilot first came out,
it was your AI pair programmer, right?
And I think-
Your buddy.
Your buddy.
And I think that more interactive, more social form
of doing development work, having someone,
or in this case an AI, where you can get unblocked
when you're just in a brain funk,
or get really fast feedback on something you're trying to do
or that you just did,
I think that's more fun.
We've heard that from our engineers.
We see that out in the field when we're doing research
and you see that from folks like Kent Beck
or Gene Kim, right, who are talking about
how much fun they're having.
They haven't maybe been doing as much coding
in their careers recently, but they're getting back in
because they're having so much fun.
Nick Nisi, I was making fun of Nick
a few weeks back on the show, maybe a few months back,
because of all of his AI subscriptions.
He was confessing all of the money he was spending.
And I was saying, he was telling me how he was using it
and I kind of made fun of him and said,
well, you're just lonely.
Like you don't actually need help.
You just want someone to be there with you.
He's like, yeah, totally.
And for him, like that is the fun
as it feels more alive to just not be alone.
Now people who pair programmed in the past
or do it at their jobs know what that's like.
It can also be exhausting because you're interacting,
you're trying stuff, you're bouncing stuff off a person.
And most of us don't have that.
I mean, very few orgs buy into,
let's put two developers on one feature.
I mean, that just is a very hard sell.
And so people who do it swear by it,
but very few people will do that
because it just doesn't make sense in the leadership's eyes.
It's like, okay.
But the pair programming aspect of this
and just having someone that it's like the rubber duck,
but the rubber duck talks back and has ideas
and has information.
And that's really powerful for, I think, a lot of us.
For me personally, Adam said it, unlock.
For me, I'm just doing stuff that I wouldn't have tried before
because I just don't have time.
And I don't have two hours for this random idea I just had
where I thought, this would be nice.
Nah, that's too much work.
Like, I've done that constantly for the last 20 years.
And now I'm like, this would be nice.
I'll just go have Claude try it while I'm doing something else, and you feel like somebody
else is toiling away and you're just getting that thing done. And sometimes you throw it away, and
sometimes you use it, and sometimes it just helps you with something else. And that, for me specifically,
I know I've sung Claude Code's praises many times on the pod, because I'm just into it right now. I'm just having fun with that particular
tool. Once it was agentic and it was in my terminal and it was good enough that
I didn't really have to look at the code, as long as I wasn't gonna check it
into our main repository and have to maintain it, I'm just coding all kinds of
stuff without coding, and for me, all of a sudden, I'm having fun. Whereas prior to
this, if you go back the last two years,
it's been like a Google replacement, but there's nothing fun about replacing Google. You just get faster answers.
But your 10% is interesting, because, you know, GitHub has been claiming 50% for a long time, haven't they?
I mean, that's been their advertisement on Copilot, is 50%. 10% to me seems low, but
that's self-reporting and analytical reporting, like you're doing the data on that.
And people say three hours a week,
which would be 5% on a 60-hour week,
almost 10% on a 40-hour week.
So yeah, 10-ish percent. It's just not,
I wonder why it's not higher.
Yeah, I think first of all,
it's important to put the data from folks like GitHub
in context, a lot of the research,
when you hear some of that kind of stuff in the headlines,
a lot of that research is based on like controlled studies,
controlled experiments.
It's putting two groups of developers in a room.
It's a lab.
Yeah.
And I think that's worth putting into context.
I think there's a difference between applying these tools
to greenfield projects, side projects, small, clean codebases versus applying them to legacy
codebases, millions of lines of code, messy with different microservices.
And even in programming languages or frameworks that LLMs aren't as well suited for.
And so I think there's another interesting trend we're seeing.
In the same way that there's a big trend still today around,
hey, we need to break up our monolith for a number of reasons,
around service reliability and engineering productivity ownership.
I think there needs to be more focus right now on
all the things that have mattered for humans
as far as kind of code based readability, code based optimization.
I think the same problems hold true for LLMs.
I've been talking with larger organizations
who've kind of come into the realization
they're at a systemic disadvantage
in terms of leveraging these tools
because their code bases and systems
just aren't as optimized for agents and LLMs.
So I think that's a challenge for larger organizations.
Something I wanna mention, I'm not even qualified to really speak on this deeply,
so I just want to touch on it, expose it, is back to this fun of the buddy in the room or
the pair programmer. There's this phenomenon of human activity where we're more productive,
at least I personally am, when another human is in the room, even if they're not involved in my work.
The social nature of life, I think, bleeds into that, and I wonder if that's...
Maybe, you know, because you've got some doctors on your staff, maybe you've been exposed, or through osmosis you've learned these things.
But what do you know about, not brain activity, but more like human activity together,
just being more productive in the same room?
This is something that I've been studying personally
because whenever I am with someone else,
for some reason, I'm just able to like,
just have more energy naturally.
And I'm like, why, why is it like that?
And I wonder if that's the same thing here.
Developers tend to, in most cases, I'm not sure what the percentages are, but a large percentage work alone, solo. Sometimes they pair program, but that's more like a
particular scenario. It's usually a
solo endeavor.
A team sport, solo endeavor, right? A team sport we're all making, but a solo endeavor.
Yeah, I'm making this feature, or I'm in charge of this. And so I wonder if there's this social aspect
that really is now going to be,
for the most part here forever,
if we keep this tool in our life,
now we always have a buddy.
I wonder if that's a thing.
Yeah, I'm not sure.
I haven't seen research on that specifically.
Adam, you sound like an extrovert by the way.
Yeah.
I don't think I'm an extrovert at all.
I do not get my energy by hanging out with you all here. When I'm done here,
I'm going to go take a nap because I have to. I'm kidding. I'm not going to,
but there's some part of me that needs to decompress after exposure like this.
So I'm definitely not an extrovert. I'm more introverted,
but there's this idea of body doubling. There's this phenomenon of body doubling.
I really wish I had Mireille Reese here, who co-hosted Brain Science with me, because she
knows deeply about body doubling.
And that's essentially what you do here is you body double.
You have a mirror, you have a buddy, either a fictitious software program that can
act like a human, or literally a human in the room that doesn't interact with you, is just sort of there working alongside you, making you productive.
So body doubling is an interesting phenomenon.
So, to the 10%, and maybe there's some brain science here, and I'm not a brain scientist
either, but I've witnessed in myself more speed and productivity, but the same amount
of output, because I'm just kind of done for the day, or just happy.
You know, I wonder if there's an amount of work that a human does
in a day, and you can optimize that on the margins, and some humans are probably more
productive than others and stuff.
But for a lot of engineers, especially if you've been in the craft for so long,
you kind of have this idea of like how much you can do
in a day and then you feel like satisfied.
And I wonder if people are,
and self reporting you probably wouldn't do this
because it might be against your self interest,
but like doing what they normally do,
but just kind of doing a little bit faster and better.
And then doing something else and more, you know,
having another meeting,
an R&D session, or going for an extra walk.
I wonder if there's any of that in there
because I find myself being like, I could do more,
but I've already done what I was gonna do
and so I'm gonna take a walk.
There absolutely could be.
Another theory, like a lot of theories on it,
another theory could be that in the enterprise
or in organizations,
application of these AI tools is just still catching up
to what you were talking about,
what you're doing with Claude Code, right?
For example, we don't necessarily see Claude Code
as the leading adopted tool currently in organizations.
And in fact, some companies I talk to are,
it's on their radar.
But it's pretty new.
It's pretty new.
Another theory, I saw a really good writeup
on this recently, is when you actually back into it.
So, let's see, I might need to get the calculator out here.
Let's assume that at most companies, engineers spend 20 to 30% of their 40-hour work week writing
code. So then you take that 30%, and then you start plugging
in these numbers. Like, okay, let's say they're twice as
effective, like double the productivity.
So you take that 30% and you cut it in half,
well, what's that?
That means like a 15% net gain. Like I said,
I'm gonna get this wrong doing this live.
So when you start backing into it that way,
again, the 10%, actually,
you can see how you get there.
Even with a pretty high acceleration
of the coding part of the job,
you're kind of limited by these other factors.
And that's not even factoring in
all the other areas of friction,
as we've talked about before in the podcast
that are holding developers back
and that are still currently constraints,
even when the writing code part of their job
is greatly accelerated.
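(To make that live math concrete, here's a back-of-the-envelope sketch of the calculation Abi is doing, an Amdahl's-law-style estimate. The 30% coding share and the 2x speedup are the illustrative numbers from the conversation, not measured data.)

```python
# Overall speedup when AI only accelerates the coding portion of the job.
# Inputs are the illustrative numbers from the conversation above.

coding_share = 0.30   # fraction of the work week spent writing code
coding_speedup = 2.0  # assume AI doubles the speed of the coding part

# Time after AI, as a fraction of the original week:
# non-coding work is unchanged; coding work takes half as long.
time_after = (1 - coding_share) + coding_share / coding_speedup

overall_gain = 1 / time_after - 1
print(f"Overall productivity gain: {overall_gain:.1%}")  # ~17.6%

# With a 20% coding share, the same math gives ~11%, which is how you
# can land near the 10% figure even with a 2x coding speedup.
```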
Right.
In other words, it's not as if you're doing coding
100% of your job.
It's a smaller portion of your job,
and if you're doing that smaller portion faster,
then you're only speeding up that one thing.
And as many of us know,
who've been in the industry a long time, is the coding part.
While it can require you to sit down for six hours
and do it, it's not always the limiting factor,
it's not the problem sometimes.
Code reviews are still very time consuming,
and of course, we gotta code review this stuff, right?
Like, the agents aren't quite good enough
to just let them vibe code in the enterprise.
Now, there are people claiming there's vibe coding
going on in the enterprise,
but I think most of that's rogue
and just trying to beat your colleagues at your job,
unless you have data to the contrary.
I'm interested you said Claude Code isn't very adopted.
That makes sense.
I mean, these agentic tools especially, I mean,
Gemini CLI, like, the new CLI tools, we're talking
like the last three, four months. And so we're
not going to have data on that yet.
Gemini is like two weeks.
Yeah, exactly. Yeah. I mean, things are moving
very fast and enterprises move traditionally
slow, depending on the enterprise.
Well, friends, it's all about faster builds. Teams with faster builds ship faster and
win over the competition.
It's just science.
And I'm here with Kyle Galbraith, co-founder and CEO of Depot.
Okay, so Kyle, based on the premise that most teams
want faster builds, that's probably the truth.
If they're using CI providers with their stock configuration
or GitHub Actions, are they wrong?
Are they not getting the fastest builds possible?
I would take it a step further and say,
if you're using any CI provider with just the basic things
that they give you, which is, if you're using any CI provider with just the basic things that they give you, which is if you think about a CI provider, it is in essence a
lowest common denominator generic VM and then you're left to your own devices to
essentially configure that VM and configure your build pipeline.
Effectively pushing down to you, the developer, the responsibility of
optimizing and making those builds fast.
Making them fast, making them secure, making them cost effective, like all pushed down to you.
The problem with modern day CI providers is there's still a set of features and a set of
capabilities that a CI provider could give a developer that makes their builds more
performant out of the box, makes their builds more cost effective out of the box
and more secure out of the box.
I think a lot of folks adopt GitHub Actions
for its ease of implementation
and being close to where their source code already lives
inside of GitHub.
And they do care about build performance
and they do put in the work to optimize those builds.
But fundamentally, CI providers today
don't prioritize performance.
Performance is not a top level entity
inside of generic CI providers.
Yes, okay, friends, save your time,
get faster builds with Depot: Docker builds,
faster GitHub Actions runners,
and distributed remote caching for Bazel, Go,
Gradle, Turborepo, and more.
Depot is on a mission to give you back your dev time
and help you get faster build times
with a one line code change.
Learn more at depot.dev.
Get started with a seven day free trial.
No credit card required.
Again, depot.dev.
Who is winning?
Like who is, you know, is it Windsurf? Is it Copilot? Like from your data, what are people using the most?
Yeah, I mean, I don't want to, we're coming out with some data on that real soon.
Okay.
If you will, like, let's call it a leaderboard of AI.
Okay, give us a teaser.
I don't want to steal your thunder, but I want to hold a teaser.
Yeah, well, I'm not going to call the winners specifically.
I can kind of, I mean, definitely Cursor, Copilot,
Windsurf, and then Claude Code are what we're seeing, both in
the data, but also when we go talk to organizations about
what they're looking toward.
You know, other interesting thing,
we've all followed Cursor's astounding growth.
What's really interesting is like most organizations
are just in experimental mode right now.
So they're going in and saying,
okay, we're gonna like buy them all.
We're gonna buy them all, give everybody everything,
then we're gonna figure out what we're actually gonna do.
So it's very much up for grabs.
Everyone's talking about Cursor's momentum,
but I wrote an article last week saying,
yeah, their growth is incredible,
but who knows what's gonna happen?
12 months from now, companies are gonna say,
okay, we've been trying all
these things, we need to potentially standardize around a uniform toolchain.
With, like, Claude Code, there's even these workflows, shared
workflows. There'll be leverage in standardizing this tooling, because there's
going to be a lot of enabling work
that needs to happen to make these tools
and agents successful.
And so, yeah, it's very much up for grabs,
but the tools I mentioned are the ones currently,
I think we're seeing the most interest
in highest levels of adoption by companies.
Are you in an extreme growth mode as a result of this?
Like the DX business?
Yeah, because you've got to,
if you have a large swath of enterprises
who need to experiment,
they need to track how they experiment, right?
So they need frameworks, and that's what you've got.
I'm just curious if that has resulted in extreme growth.
Six months ago, even four months ago, I would say companies coming at us were looking for
help with all kinds of things and kind of figuring out AI was one of them.
I'd say right now, AI is the number one use case. That's the only thing.
That's the only thing anybody cares about.
Yeah. So it's data for everything from bake-offs.
As you said, hey, we're evaluating five different tools,
and we want to, with data, understand
which of these are most effective for developers.
It's putting a real, turning that into dollars and hours
and numbers: hey, what is the impact?
You know, our CFO is saying we should have 50% improvement.
What is the data telling us about what this is actually
yielding right now?
It's understanding, you know, what are the downstream effects?
So, okay, we're seeing more code throughput, faster code velocity.
Are we also seeing more defects? You know, how's that affecting developer flow state?
Are we seeing more incidents? Is the code maintainable? Is developers'
ability to then maintain this AI-generated code increasing or decreasing?
And finally, cost.
So with the consumption-based spending now,
there's a real question.
Okay, we gotta figure out, like,
how much can we spend?
What's the appropriate budget?
And then how do we think about making sure
that we're spending that money on good things,
not, like, developers screwing around,
burning tokens in ways that aren't
accretive to the business.
So yeah, Adam, it's been a really big tailwind
for the DX business for sure.
Well, it has to be one of the most divisive technology
hype cycles in human history because, I mean, maybe blockchain
was equally as divisive, because there were believers
and non-believers in blockchain,
and there was a lot of hype around blockchain
will solve every single problem.
And then other people were looking at the technology
and thinking like, well, it's really good
if you need decentralized consensus, you know,
which does have some applications, and it's finding some use cases, but not like
everyone's going to say blockchain will solve it. Right. And so you had a lot of division
there. And you have a lot of division on this because yeah, you have CFO saying we should
be 50% more productive and you have people who are boots on the ground saying like, that's
not going to happen, you know? And so that's a lot of pressure on me and my team,
which we think is unwarranted,
and we're trying all the tools.
It's just insane how much, yeah, top-down pressure
of something that nobody really knows the upside
in any sort of clear way, right?
We know a vague upside.
We can feel it, we can maybe report on it a little bit,
but we're just not sure where this train is headed.
And so it makes sense that DX, you know,
your guys' business is in high demand on that one topic
because we all wanna know.
Yeah.
You know, it's a big open question right now,
is how much are these tools gonna deliver on the promise?
So far it's looking like more than blockchain, you know?
But like you said, the jury's still out.
It is amazing when I go talk to leaders.
I think a lot of leaders, I don't know the percentage,
we haven't surveyed them,
but I think a lot of leaders really do believe
that a large portion of their engineering workforce
will be replaceable. A lot of leaders I talk to, I can hear it and I can see it in their eyes.
They believe that that is what's going to happen. And when I talk to like prospective investors,
CIOs, it's a question I get asked a lot about,
even the DX business.
How is this relevant in a world where
there's way less developers?
Right.
It's an interesting question.
AX, dude, you need AX.
Yeah, AX, yeah.
Agent experience.
So it's an interesting thought exercise.
I also talk to leaders
who strongly believe
this is not gonna result in any sort of
widespread reduction in human head count.
I think that's my personal prediction at this time.
I was talking with a leader,
I think I just heard of it, it's called Jevons paradox.
Have you ever heard of that?
Okay, I haven't heard of this one, no.
Okay, it was just shared with me this week,
but it's on Wikipedia.
It's the idea that when a resource becomes more efficient,
it actually leads to higher utilization. Meaning, I think there's an example
of something with like oil refineries
and the ability to refine oil became much more efficient.
So you'd think that the investment in that would decrease, like, oh, we can have
fewer oil refineries, because we can refine oil faster. But, you
know, it only resulted in more production, more
refineries. And so, similarly, applying that to what's
happening here: if we view these tools as making
engineers, maybe not replacing engineers,
but making the engineers significantly more productive,
you know, that law would then suggest that,
well, we're just gonna have more.
Yeah, we can get more out of each engineer,
and we're just gonna have more engineers
and have more software faster, right?
So yeah, it'll be interesting
to see how this all plays out.
I kind of jibe with that, because, I hadn't heard of this
principle or paradox before, but it does make sense
that when you make something more efficient, you tend to use more of it.
And that's kind of where I'm leaning towards.
Like you're going to have a recalibration of what a developer is
because there's going to be more people willing to do what developers do.
And so there's going to be a wider spectrum of what to do.
So degree of difficulty, easier, harder.
And then I think you're going to see the definition change, so to speak.
And you're going to see more people come into it.
You're still gonna need people to think. You know, I can't abstract this big enough,
but there's still gonna be humans thinking about the problems of humanity.
You can probably offload a lot of that to AI, but I think you still need this human
intellect, this human, I
don't know how to describe it besides what
feels right to humanity. I think you still need that in there, because it's not quite in the AI.
They're just more bits and bytes than true intelligence; it's not the same, you know.
And this shows up. So we just published this AI measurement framework, sent you guys the link.
It'd be great to include in the show notes.
One of the big questions as we developed this framework
was how do we measure agents?
So do we treat agents as people
or do we treat agents as extensions of teams and people?
In the framework, we discuss this in the paper, you know, we
advocate for treating agents as extensions of people and teams.
What that effectively means is that a developer is the manager of these agents, but we're still measuring the developer and the team, with the agents being an extension of that team, if that makes sense.
Yeah.
So thinking about how do we measure this, you kind of arrive at similar questions
to what we were just talking about
in terms of the human to agent ratio
and balance and relationship and how that will evolve.
Yeah, because the effectiveness of the humans
being extended is another factor
because I may be better or worse
at leveraging an agent than you might be.
And so this team plus three agents
versus that team plus five agents,
there's so many variables there
to actually whittle it down to
any sort of usable information.
Well, that's your job, Abi, not mine.
So I'm sure you guys will figure it out.
And you hear more about,
this is another thing when I talk to leaders,
they're thinking a lot about the number of agents. Like, what's the right ratio of humans to agents? And I think that's a
really interesting question. I also think it's not a practical question right now. I mean, you know,
working with, like, Claude Code, we're kind of talking about single-threaded
versus multi-threaded.
We're not at the point where we're really talking about,
I have one QA agent and a designer agent
and a front-end developer agent,
that's not really the paradigm.
I mean, I've seen people kind of trying to do that.
But there's some people doing that,
they're on the edge.
They're on the tip, yeah.
But that's not really the paradigm right now.
So I think right now, human extension is the right way
to think about it, but that could change.
In that paper, one of the things that you had
pull quoted actually was companies are no longer limited
by the number of engineers they can hire,
but rather the degree to which they can augment them
with AI to gain leverage.
It's kind of like what you're talking about there,
is like you're counting agents, you're counting humans,
but you really want to just like augment the human ability,
not replace it.
Although some of the leaders have been smirking,
thinking replace, replace, right?
Yeah, they're thinking we'll replace them soon.
Gosh. As soon as the data shows that we can, we will.
Bye. And thankfully, the data is not showing that.
Right. And I think the secondary effects of software quality,
understanding the bottlenecks in the SDLC, like we talked about things like code review
aren't going away. Decision-making and judgment.
Actually, when you think about it, for example,
like product management, a lot of seasoned engineering
leaders, when I talk to them, know that product management
is actually the big bottleneck, not so much
engineering velocity.
It's really like product velocity. It's decision making.
It's that life cycle from idea to code.
And I think we're going to see attention on those bottlenecks magnified,
because as we optimize the coding, we're already seeing this at DX,
companies come to us, hey, we thought we were supposed to get
50% productivity improvement.
We're not seeing that from the AI tools.
So now we're asking what really is our problem?
What really are the bottlenecks?
And so it is magnifying attention
on engineering productivity in general,
because folks are really focusing on that topic right now,
because they're expecting these tools
to be transformative in their organizations.
So I think that's an interesting trend we're seeing as well.
So this could actually result in people caring more
and investing more in the other aspects
of developer experience that people maybe haven't focused on
before, because those
constraints are being magnified.
When you solve one bottleneck, then you see the other one for what it is.
And you start trying to solve that one.
Well, friends, it's time to build the future of multi-agent software.
You can do so with Agency.
That's AGNT CY.
The Agency is an open source collective building the internet of agents.
It's a collaboration layer where AI agents can discover, connect, and work across frameworks.
For developers, this means standardized agent discovery tools, seamless protocols for inter-agent
communication, and modular components to compose and scale multi-agent workflows.
You can join CrewAI, LangChain, LlamaIndex,
Browserbase, Cisco, and dozens more. The Agency is dropping code, specs, and services
with no strings attached. Build with other engineers who care about high-quality multi-agent software.
That's agntcy.org. Add your support. Once again,
agntcy.org.
So budgets, I'm curious about budgets
because as we look at agents as extensions of engineers,
let's imagine that I'm worth $100,000 a year,
whatever plus whatever.
And if you give me one agent, maybe I'm worth 120.
And so are you spending $20,000 a year
on an agent per engineer?
Cause that saves you another engineer perhaps.
I'm probably not gonna get to two engineers,
I'm not a two-Xer, but maybe I'm 1.1X,
maybe I'm only worth 110. How much do we spend on this?
I'm sure these people are trying to figure it out
because budgets very much have to be actualized
and decided on like how much are we gonna spend towards this.
Now, when you're just trying every tool there is,
I guess the budget is, don't worry about the budget,
we gotta figure this out.
But eventually those things need to be figured out
because I don't think we're gonna get outright replaced
like these CEOs want to soon.
But certainly we're gonna be augmented
in a way where your budgeting starts to change.
Start to think about, you know, engineer plus.
What are your thoughts on that?
And have you guys done any work
with regard to pricing these things out?
In our paper, we talk about cost
and ways to think about cost.
I don't think, across the industry, we're at a point where companies are focused on this problem.
are at a point where companies are focused on this problem.
They're talking about it because they know it's coming.
They know it's the next problem they need to figure out
once they get over the kind of experimentation phase.
But your example right there, yeah,
you know, your cost is $100,000 per year; with an agent,
your cost is $120,000 per year.
Like, what does that mean?
Again, this goes back to the, well,
then should there be fewer developers, should there be fewer people?
Right. Because, like, we're offsetting that. And so, in our paper, we talk about a couple ways to think about this.
One is this idea of human-equivalent hours. And this came up in our conversation here before, like being able to measure, okay,
how much human-equivalent work did this agent do?
So not just looking at the number of PRs, right,
but how much human-equivalent work.
So if you can measure that, which is hard,
then you can take that against AI spend.
And you essentially have this idea
of an agent hourly rate.
So your spend divided by number of hours of work produced,
work done, that's your agent hourly rate.
So I think that's one interesting number
for that we're working with companies
on putting some focus on.
That's a number you can use to start to rationalize
what's the right amount of spend.
Another interesting metric is like net time gain
per developer.
So that would be, we talked about the time savings.
So how much time is AI saving you?
Well, how much are you spending
on AI? And then when you take that, you know, convert it into the equivalent of the developer's
hourly rate, are they actually saving time, or do they spend more money than the time they saved?
Right. Right. So those are two, so agent hourly rate
and net time gain per developer are two metrics that,
again, really hard to get at right now.
I mean, a lot of the vendors,
it's hard to just get the cost
and stay on top of the cost information in general.
But I think those are two good frameworks
for thinking about the net ROI and right sizing the investment.
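(Here's a rough sketch of those two metrics as formulas, with made-up example numbers; the variable names and inputs are illustrative assumptions, not DX benchmarks.)

```python
# Two rough AI ROI metrics from the conversation, with made-up inputs.

ai_spend_per_month = 500.0    # $/developer/month on AI tooling (assumed)
human_equiv_hours = 25.0      # estimated human-equivalent hours of agent work/month
hours_saved_per_month = 12.0  # self-reported savings, ~3 hrs/week (from the episode)
dev_hourly_rate = 75.0        # fully loaded developer cost, $/hour (assumed)

# Agent hourly rate: what you pay per human-equivalent hour of agent work.
agent_hourly_rate = ai_spend_per_month / human_equiv_hours

# Net time gain per developer: time saved minus the time-equivalent of the spend.
net_time_gain = hours_saved_per_month - ai_spend_per_month / dev_hourly_rate

print(f"Agent hourly rate: ${agent_hourly_rate:.2f}/hr")  # $20.00/hr
print(f"Net time gain: {net_time_gain:.1f} hrs/month")    # ~5.3 hrs/month
```

(If the net time gain comes out negative, the spend costs more than the time it saves, which is exactly the right-sizing question Abi raises.)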
That's a complicated task.
That's for sure.
But I'm glad we got smart people thinking about it.
It'll certainly be a huge concern maybe 12 to 18 months from now.
Eventually, some of these tools got to shake out, I think.
And that's what I've kind of been waiting on is like, you know, I'll let the edge lords do their edging
and then I'll just wait and see what shakes out.
But it's been a longer grind, you know,
it's probably been two and a half, almost three years
since ChatGPT changed the world.
And they're just now getting to where,
like, for the longest time, I replaced Google with it, but I wasn't going to use it for any software until this last iteration of models,
and they've all gotten to where it's like, okay, you know, we've reached a threshold, which is significant.
Going back to your Jevons paradox, that actually tracks with me, just as an n of one. Like I said earlier, I'm not replaceable in this sense,
but I'm just writing more software.
Like I'm not just doing less,
although I did confess to going
and taking a walk earlier than I would have.
But I'm also just doing more stuff
that I wouldn't have done.
Like it just unlocks me to write more software
that I wasn't gonna write.
And I imagine all around the world,
imagine every JIRA board or Pivotal Tracker
or whatever tool you're using and the Icebox,
you know, the backlog.
And there's things in that backlog that you know
they're just never gonna get worked on
because other stuff just goes in higher and replaces them.
And it's just a constant grind.
And there's so much unwritten software
that we're not gonna run out.
We're not gonna, we're just gonna hire, hire, hire.
We're gonna augment, we're gonna write more software.
Some of that software is gonna be really crappy.
We're gonna hire more security engineers
and then we're gonna hire people to replace the software.
I mean, it's gonna be just fine, I think.
That's my-
Just fine, I think.
I think we're gonna be just fine.
Now we do change how we do our work.
Absolutely, 100% change how you do your work.
And I think our teams change slightly.
I think our enterprises change.
I think less large enterprises probably,
smaller teams doing more.
Businesses don't have to grow that head count
quite as fast, but the large ones stay large.
That's just an intuition.
I don't know that I have with you guys.
Well, even with your, you know, confessing to taking a walk early, I wonder if maybe
during that walk you solved a harder problem that you hadn't been able to solve, because
you were happier, you felt more fulfilled, and maybe you actually had the brain space
to just think, you know. So that walk doesn't actually...
I like to think I did. I'm pretty sure I did.
Yeah, you probably did.
I mean, it's not indicative of less output.
That's the problem, I think, is...
And why I'm so thankful DX is here, because you got the Core 4, this kind of four degrees
of measurement across teams, and you got this newer one, the AI framework, to measure agents.
We need those checks and balances. Because I'm curious, you know, having said that, if
the DX Core 4 needs to change,
or will change, because of AI. Like, do we need to add a happiness metric in there,
or morale? I think it's kind of in there, but maybe you can speak to it more, Abi.
I'm curious if the DX Core 4 needs to be the Core 5, because we need to measure
the human contentment, I would say, as an individual.
And then that individual is part of a team.
That team is part of a culture and a culture of a company.
I'm curious if that will change because of AI.
Yeah, we talked about this last time I was on the show too.
Did we, gosh.
We have the idea of happiness encapsulated
in our developer experience index measurement.
Okay. Very intentionally not called happiness, because one of the goals of the Core
4 is to make it palatable for executives.
And, I mean, not to sound cynical, they don't want to measure it.
Not now. Back in 2021 they did, when no one could retain developers.
Right.
Right now they can't.
Funny thing, GitHub recently came out with a white paper
on how to think about measurements
and we consulted with them closely.
They incorporated a lot of the core four measurements
in their article, but one thing specifically I kept telling them
was don't call it happiness.
They're like, GitHub is all about developer happiness.
It's all about developer happiness.
I said, don't call it happiness because, you know,
the irony is if you call it happiness,
then executives won't measure it.
And so they won't care about developer happiness.
If you kind of
frame it as something else, we frame it as the Developer Experience Index, which is how you measure effectiveness,
because we believe developer happiness is part of being effective in how you build software,
then they'll measure it and it'll get improved and optimized.
And so, yeah, it's a naming problem, Adam,
but it is encapsulated in there.
In terms of like, should the Core 4 change?
That's something we've looked at closely.
And as of now, as an organizational way of thinking
about measuring productivity,
we think Core 4 still holds true.
What's different about AI is you need more, right?
There's a lot of new stuff we're measuring.
And so the AI measurement framework actually includes the Core 4 in one aspect of it,
which is understanding how is the overall organizational productivity being impacted
pre and post AI or depending on level of adoption.
So that's a lot of what we're helping our customers measure right now, is when we look
at the adoption curve in an organization, or the maturity curve, I think it's transitioning
now into less about adoption, more about maturity. So not just
are people using Copilot daily, but how are they using it? Right? Are they using it just for
autocomplete? Are they using the agentic stuff? Are they using the AI to help them create the
prompt that they feed back to the AI? Right?
So as that maturity increases,
does the organization seem more productive based on data?
That's the big question we're trying to help companies
answer right now.
What has been your adoption strategy at DX
for these tools?
Yeah, you know, we work with Netflix
and one thing that's been on my mind is how much I have not heard Netflix making a fuss about AI.
Meaning, a lot of these companies are like, you know, we need our developers using these tools,
we're measuring how much they're using them.
I haven't heard that from Netflix, which makes a lot of sense, because Netflix's
entire culture is predicated on, hey, we just hire really senior people;
developers kind of rule at Netflix, right? Because they
trust that their developers are the best in the
world, and that if there's a really useful tool for them to
use, they'll use it, and they're going to use it the right
amount. They're not going to use it more because we're measuring them.
And so that's the approach I've taken at DX.
Now, I've also encouraged Claude Code, for example.
So I encouraged a few of our engineers,
hey, I'm reading about this workflow,
you know, voice-to-text to Claude Code, to this, to that, right?
Like, one of those more edge workflows.
And so I asked a few of our engineers to go try this
workflow for a little bit and see what they think.
But we definitely haven't mandated it.
It's my understanding that everyone's using one or multiple
of the tools and doing their own experimentation.
But ultimately, yeah, I trust the engineers to use it to the degree, to the extent that it's useful.
And I think that's the, I haven't put any pressure on people. I haven't been like,
if you don't do this, you're going to become obsolete. That's not a message I'm bringing. We have some really
interesting, just candid conversations, like, where do we all think this
is going, right? And that's also relevant to DX because we're a product company. We're thinking
about products around AI tooling, AI enablement tooling. And so we're having those discussions as well. Can you say more about that?
Well, DX, the business, we've gotten to this
interesting point. We've been doing the measurement
thing for now almost four years.
It's going really well, but a few months ago...
You're getting bored, getting bored of measuring stuff.
Well, a few months ago, I met with Drew Houston,
CEO of Dropbox, and he said to me,
they've been a customer of ours, he goes,
you guys are doing diagnostics really well.
Have you thought about, what about the interventions?
Like, what about solving the things that you're measuring?
And so, that's actually a question I realized
we've gotten asked in different ways, but constantly:
like, well, can you actually help us improve?
And, you know, we've done a lot of things,
like you guys saw this partnership we have with ThoughtWorks.
So, hey, partnering with consulting companies
who can come in and help you with the transformation.
We started partnering more with different vendors.
Hey, is there a way we can loop in vendors?
We don't really wanna play favorites,
but hey, at least we can map different tools out there
to different problems, areas of the SDLC.
I think we've gotten to a point where it's like,
hey, you know what, actually there's gaps in the market
where there is no solution for some of these problems.
And should DX just go build them?
And now with AI, we're seeing a whole new generation
of problems that are being created.
I mean, like you guys,
my previous company was a Slack app, right? So, like, I got in on that right as Slack was
growing like crazy. And so it was an entire
new paradigm, and an entire generation of businesses were
built on Slack. And so I think AI, and AI engineering
specifically, so this new way of working, this new way of doing software
to me is actually a new paradigm in which,
potentially like the entire tool chain could be rewritten.
Like, the folks at GitHub and GitLab are worried
that Cursor is gonna add code review
and source control management to their product,
like, just go for it, right?
I think it completely upends the status quo.
And so, you know, at DX, I don't think,
we're not going after Cursor,
we're not going after GitHub, that's for sure.
Who are you going after?
I think we're going after the adjacent opportunities,
the ones on the margins that, you know,
the big guys aren't.
We're not trying to go to war with Cursor and Copilot.
We're trying to solve the adjacent problems we see.
Some things we've talked about today,
like how do you actually,
how do you upskill developers?
How do you optimize your code for LLMs?
How should platform engineering teams think about
sort of self-service and enablement
in the same way that if you guys have followed
things like Spotify Backstage, right?
Like, big focus on golden paths,
self-service developer enablement.
What does that look like in a post-AI world?
It's like, well, enablement on what?
Golden paths around AI tooling, AI development workflows,
shared, you know, we talked about how
Claude has workflows, literally, right?
They have workflows. So, you know,
curating that. How do you create a standardized
set of workflows so that when you hire
a developer into your organization, boom,
they have this menu of superpowers,
AI-powered superpowers that they can... So those are the types of problems. I can't get into
specifics today, but those types of adjacent problems, I think, are new constraints for
enterprises looking to deploy AI at organizational scale. So not single player mode, right? But more multiplayer mode.
How does an organization become successful with these tools
is a different set of problems.
It's like AI adoption best practices as a service.
Yeah, that's one potential opportunity, right?
That's just one of your ideas.
Yeah, that's not saying that's what we're doing in DX, but.
What else is adjacent to this?
Don't give us your solutions,
but what are the other problems?
Spill the beans, Avi.
He's gonna press you to spill those beans.
Wow.
I'm just kidding with you.
He's not pressing you.
He's just curious.
Yeah, I think what we're seeing is,
there's really two things.
One is that there's this new set of problems,
like some of the things I've hinted at.
I think there's this new generation
of problems to be solved. The other opportunity is that there's actually the pre-existing
problems that are being more magnified, meaning post-AI, it's even clearer that these are
real constraints. And in some ways, they're a limiting factor to how much value you get out of AI. Because again, if we go back to the idea of agents
as extensions of humans, then the ability of that human
and the environment in which that human is working in
is actually a more magnified limiting factor.
Because now this developer could be getting
150% leverage, whereas before there
was just 100%, to use the numbers we were talking about before.
So what are the constraints of the human?
Well, it's probably the same constraints we've had before, some new ones, and those types
of things we've been measuring for years and seen go unsolved and we think maybe we should go try to solve them.
Are you raising money? Where do I invest, Abi? Where do I invest? You sound like you're well
positioned to actually take over these adjacent markets, man.
I think GitLab, GitHub, Atlassian have a platform advantage over a Cursor.
Sure. That battle's on, right? It'll be really interesting to see,
can GitHub leverage the platform advantage that it has,
being the system of record for code, right,
and, like, developer communication, being embedded into so much of that?
Can it convert that into a really amazing platform
that can kind of win out over point solutions
that are threatening their business model?
I think similarly DX has some platform advantages, right?
Namely that we can actually measure
what's going on with all this.
You can prove what's going on.
Yeah, and so, yeah, again, I don't think,
we're not gonna jump into the battle of the AI code agents,
but we see opportunity to bring a lot of value
to our customers by helping them maximize those investments
and continue to just optimize
their overall engineering productivity.
We're almost out of time, but I gotta ask you
for predictions since you mentioned this.
Cause when you mentioned the popular agents,
you did not mention Copilot.
Cursor was mentioned, Windsurf was mentioned,
Claude was mentioned because Jerod sort of interjected it,
Claude Code at least, but Copilot was not.
Can you give me a prediction?
What do you think is gonna happen over the next year or so
given this?
To muddy the waters, you just mentioned GitHub or GitLab, the entrenched players.
What happens if they don't succeed in this transitional moment?
Yeah, if I were Cursor, I'd go for it, right? I would go for the full stack, like, be the platform.
I think the prediction I'll leave you with is I'm interested in seeing,
are we gonna end up in kind of like,
in the same way we've sort of ended in like multi-cloud,
like, is the end state of this kind of multi-agent,
like, you know, different companies will offer agents
and models that are more fine-tuned
to different types of work?
And so for the developer, are we gonna be in a world
where, based on the task or even subtask,
we're delegating to different providers and services?
And then there's an orchestration problem.
So that's another problem we're thinking about at DX.
Is that the paradigm?
And if so, there's a tooling layer needed.
Man, I want to be invited to one of these think tanks
y'all have.
I want to be in these.
I want to be a fly on the wall in the room with all this data you got, this moat you've
got just looking at the landscape and considering.
Less conjecture because they have actual data.
You know, Adam and I conjecture, but we don't have any data.
So we're just talking out of our certain body parts.
All right, Abi, we know you got a hard stop.
We'll let you go.
Thanks for stopping by again.
You're welcome anytime, man.
Yeah, thanks so much.
Always fun chatting with you guys.
Keep up the great work.
Bye, friends.
Bye.
All right, that's all we got for this week.
Thanks for changelogging with us.
Thanks for frenzing with us.
Thanks for making your way to Denver
so you can meet up with us and live show with us
and maybe even hike with us.
You are coming, right?
I hope you're coming. July 26th, the Oriental Theater, Denver, Red Rocks, oh my gosh it's gonna be so much fun.
Learn more about it at changelog.com slash live. Thanks again to BMC for these dope beats,
to our partners at Fly.io, and to our sponsors: Auth0, Auth0.com slash AI; Depot at depot.dev;
and Agency, that's agntcy.org.
Have a great weekend.
Tell your friends about the Changelog, why don't ya?
And let's talk again real soon.