Software at Scale 58 - Measuring Developer Productivity with Abi Noda

Episode Date: June 13, 2023

Abi Noda is the CEO and co-founder of DX, a developer productivity platform.

Apple Podcasts | Spotify | Google Podcasts

My view on developer experience and productivity measurement aligns extremely closely with DX's view. The productivity of a group of engineers cannot be measured by tools alone - there are too many qualitative factors, like cross-functional stakeholder bureaucracy or inefficiency and inherent domain/codebase complexity, that tools cannot capture. At the same time, some metrics, like whether an engineer has committed any code changes in their first week or month, serve as useful guardrails for engineering leadership. A combination of tools and metrics may provide a holistic view of, and insights into, the engineering organization's throughput.

In this episode, we discuss the DX platform and Abi's recently published research paper on developer experience. We talk about how organizations can use tools and surveys to iterate and improve upon developer experience, and ultimately, engineering throughput.

GPT-4 generated summary

In this episode, Abi Noda and I explore the landscape of engineering metrics and a quantifiable approach towards developer experience. Our discussion ranges from the value of developer surveys and system-based metrics to the tangible ways in which DX is innovating in the field.

We begin with a comparison of developer surveys and system-based metrics. Abi explains that while developer surveys offer a qualitative perspective on tool efficacy and user sentiment, system-based metrics present a quantitative analysis of productivity and code quality.

The discussion then moves to the real-world applications of these metrics, with Pfizer and eBay as case studies. Pfizer, for example, uses metrics to build a detailed understanding of developer needs, which in turn drives strategic decision-making; they have used these metrics to identify bottlenecks in their development cycle and strategically address those pain points. eBay, on the other hand, uses insights from developer sentiment surveys to design tools that directly enhance developer satisfaction and productivity.

Next, our dialogue around survey development centers on the dilemma between standardization and customization. While standardization offers cost efficiency and benchmarking opportunities, customization acknowledges the unique nature of every organization. Abi proposes a blend of both to cater to different aspects of developer sentiment and productivity metrics.

The highlight of the conversation is the introduction of DX's data platform. The platform consolidates data across internal and third-party tools in a ready-to-analyze format, giving users the freedom to build their own queries, reports, and metrics. The ability to combine survey and system data allows the unearthing of unique insights, marking a distinctive advantage of DX's approach.

In this episode, Abi Noda shares enlightening perspectives on engineering metrics and the role they play in shaping the developer experience. We delve into how DX's unique approach to data aggregation and its potential applications can lead organizations toward more data-driven and effective decision-making processes.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev

Transcript
Starting point is 00:00:00 Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications. I'm your host, Utsav Shah, and thank you for listening. Hey, welcome to another episode of the Software at Scale podcast. Joining me today is Abi Noda, the CEO and co-founder of DX, a developer productivity platform. Welcome. Thanks so much for having me, Utsav. Excited to be here. Yeah. So, Abi, first, the thing I want to talk about is the fact that you graduated from UIUC. So, virtual high five. I graduated from there as well. So, if you remember anything about your CS experience from then, I'd love to hear a quick story.
Starting point is 00:00:43 Well, it's actually funny. I actually did not ever graduate. I did attend. Okay. But yeah, I spent a few... Actually, hold on. It's a pretty hilarious story. So do you want to still go into it? I'm happy to go into it.
Starting point is 00:00:59 It's not like private or anything. No, go for it. Yeah. Okay. So it's a funny story about my college experience, because I actually switched majors late in high school, after I'd initially applied to schools. I'd initially applied as an econ and finance major to a bunch of universities, and going into my senior year I changed my mind and decided I wanted to study computer science. And growing up in Illinois, UIUC was the obvious choice to attend in terms of cost and the strength of that program.
Starting point is 00:01:34 However, before actually going to school, I ended up taking a gap year and working as a software developer. I had been able to pick up a little bit of basic PHP while in high school. And so I was able to land a summer internship going out of my senior year working on WordPress sites. So ended up taking a gap year, eventually did attend UIUC. And seven weeks in ended up dropping out with a friend of mine to go travel the world and try to start a company. We never got a company off the ground. And actually, funny enough, several of our other friends also dropped out shortly thereafter to go and start companies. I would say that our cohort of friends who dropped out of college together have been pretty successful in their entrepreneurial endeavors.
Starting point is 00:02:21 So that's good. But I unfortunately didn't get to spend as much time at UIUC as I wish I could have. Really enjoyed my time there and made a lot of lifelong friends from my experience. But yeah, my experience at UIUC was a little bit short-lived. Yeah, there's only so much research can help before a podcast. You never know what's going to come out. That's a wonderful story. I'd love to have that kind of cohort of friends who are all so ambitious like oh i just want to start my own thing make it big that's awesome and after that like i remember reading about you i don't know if you remember this from our show a long time ago from like the indie hackers blog and like that whole circle of oh pull panda was like this
Starting point is 00:03:05 bootstrap company and then it became this big thing everyone's using it and got acquired by github so like maybe a brief story on how that got started i started pull panda it was originally called pull reminders and i started that company because i actually lost my job at a horrible time. It was over Christmas vacation. And I knew no companies were hiring. There was no point to network and reach out to people or even try to send my resume around. So I decided to work on a side project that I'd had the idea for almost a year. I've been working as an engineering manager at several companies. And at each company, I had ended up spending a considerable amount of my energy as an engineering manager, just chasing down pull requests and asking developers on the team to make sure they reviewed them or followed
Starting point is 00:03:58 up with reviewers to get them through the door. And I'd had the idea that perhaps this could be automated. And I'd even tried to, for example, use Zapier to create an automated bot that would just post a reminder to the team, sort of a status dump of here are all the pull requests that need to be reviewed or need to be merged. And so the idea originated from that experience. And I worked on the first version of Pull Reminders for about a month. The idea was not to turn it into a business. It was really just to build something and put it out into the world and see what would happen. But as it turned out, there was, right from the get-go, a pretty strong amount of interest.
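As an aside, the kind of reminder bot described here is mechanically simple; below is a minimal, hypothetical sketch in Python of posting a "status dump" of open pull requests to Slack. The repository name, tokens, and webhook URL are placeholders, and this is an illustration rather than Pull Panda's actual implementation.

```python
# Illustrative sketch (not Pull Panda's actual code): post a digest of open
# pull requests that still need review to a Slack channel.
# The repository, tokens, and webhook URL below are hypothetical placeholders.
import os
import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]          # hypothetical env vars
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

def open_pull_requests(repo: str) -> list[dict]:
    """Fetch open PRs for a repository via the GitHub REST API."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/pulls",
        params={"state": "open"},
        headers={"Authorization": f"Bearer {GITHUB_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def post_reminder(repo: str) -> None:
    """Post a 'status dump' of PRs awaiting review to Slack."""
    lines = [
        f"• <{pr['html_url']}|#{pr['number']} {pr['title']}> — waiting on review"
        for pr in open_pull_requests(repo)
        if not pr.get("draft")
    ]
    if not lines:
        return  # nothing to nag about today
    message = {"text": "Pull requests needing attention:\n" + "\n".join(lines)}
    requests.post(SLACK_WEBHOOK_URL, json=message, timeout=10).raise_for_status()

if __name__ == "__main__":
    post_reminder("example-org/example-repo")  # hypothetical repository
```

Run on a schedule, something like this covers the nagging an engineering manager would otherwise do by hand.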
Starting point is 00:04:42 I had a few people reach out, even found me by SEO, which is pretty funny. Typically, a brand new website or product is not found through Google. But I guess I had just the right HTML title on my page that was really targeted toward a specific use case. And so within a couple months, I got my first paying customer who paid me $20 a month for unlimited users. And from that point on, Pull Reminders became a business. And as you know, over the course of the next year and a half,
Starting point is 00:05:17 it sort of just took off like wildfire on its own. And I don't really attribute that to anything amazing that I did. It was sort of just being in the right place at the right time with the right product and a number of fortunate circumstances such as the rise of Slack. Slack was really still in a growth phase at that point. GitHub had recently launched the GitHub Marketplace. And so Pull Reminders became one of the featured products in the GitHub Marketplace. So thanks to occurrences like that, Pull Panda really just took off on its own and became a pretty successful business. Yeah, I think it's so interesting. And is that when you were first interested in developer productivity? Or that was just when you were an engineering manager
Starting point is 00:05:56 who was trying to make sure your team was executing quickly? You mentioned that you'd been thinking about this for a year? Yeah, it's a great question. I was really thinking about the problem of how to measure productivity. That was the actual problem I was trying to figure out and solve and potentially build a product around. And Pull Reminders actually eventually became Pull Panda, which was a suite of products, one of which was a product called Pull Analytics, which provided the common Git and pull request metrics that you see across the industry today. And so really, the inspiration for going down this path was to try to tackle the problem of engineering measurement. However, the Pull Reminders feature or product was really designed to be a baby step for me to just ship something.
Starting point is 00:06:48 I kind of considered it like a warm-up lap. I wrote down in my journal, just ship this. But Pull Reminders by itself became quite a successful product. And I later added the measurement and analytics features to the bundle. Yeah, it kind of reminds me of breaking down a project into multiple milestones. Your first milestone was pretty successful. So nice work. But I think you're coming close to the actual end goal, which you had in mind then, which is figuring out
Starting point is 00:07:25 how developer productivity is measured. And that's with DX. And I really like the solution that you've come up with, at least so far, which is you have these surveys. Because I think we've tried a bunch of different times, and there's like a bunch of tools in the space. The first one I can think of, that I had heard of way back, was GitPrime, which I think got acquired, and those were all around metrics from tools, and you all took a different approach, at least in the beginning, which is it's actually a very well-targeted survey that can give you a lot of insight. So when did you think of that as the problem? Was that something you were thinking of at GitHub, and that's where like DX came out? How did everything start? Yeah. Took me a long time to arrive at the current state of what I believe is the right approach and the right solution to this problem. Back when I was working on Pull Panda, I did adopt the
Starting point is 00:08:19 approach similar to tools like GitPrime of taking in the data that was available from source control tools like GitHub and project management tools like Jira and using that data to try to produce metrics that were useful for engineering managers or, broadly speaking, engineering leadership. The story goes that I really just hit a wall with these types of metrics. I hit a wall in terms of the value I saw in them myself as an engineering leader, as well as the limited value I saw in customers who were using my products who weren't really able to do much with these metrics. The pattern I kept seeing was that companies would get really excited about the prospect of these metrics. They would purchase my product or products like
Starting point is 00:09:13 GitPrime. And a few months later, if I were to check in and ask them how things were going and what they were doing with their metrics, generally speaking, they weren't really doing anything with them. In addition, another pattern I saw was that although most organizations weren't really doing anything valuable with these metrics, they also were sometimes doing things that were harmful with them. So for example, when I worked on Pull Panda, I would regularly see people trying to export the data or reaching out with feature requests with specific types of reports that were specifically for the use case of using this type of data to evaluate the individual performance of engineers. And at this point, although I was still early in my journey, it was very evident to me that these types of metrics should not be used for those types of purposes.
Starting point is 00:10:05 And so I knew that there just had to be a different approach. The approach of just pulling data out of systems and using those metrics could only provide value in a very narrow use case of understanding code review. But if you wanted to understand bottlenecks or constraints or aspects of productivity beyond that, there really wasn't any data available within these systems. And as I continued to ponder and struggle with that problem, it started dawning on me that any question that leaders were trying to answer with this type of data could be answered in a better way. And let me give you an example. One question my customers started wanting to ask about was code review quality. They were using the data from Pull Panda to increase the speed of code reviews, but they wanted to understand if that was affecting the quality of the reviews. And the way that customers asked us to solve this problem was to calculate the number of comments per code review. And to me, this seemed like such a poor proxy for code review quality compared to,
Starting point is 00:11:22 for example, just asking your developers whether they felt that the code reviews were of quality or not. And so this pattern started to emerge as well. I found that more and more, any question that we were trying to answer with Git data was better answered by simply asking your developers. And this is before I knew anything really about psychometrics or survey-based measurement approaches. But it was just that simple idea that if you could just ask your developers, that would actually provide you the insight into a lot of these questions that leaders were trying to ask. And that's really where the idea for the current approach and the research I'm doing was born from. And it seems like there was a similar timeframe when the industry was thinking about developer
Starting point is 00:12:10 effectiveness metrics, things like DORA, like D-O-R-A, and SPACE. How is the timing related to the industry thinking about these things versus your thoughts evolving? Yeah, well, DORA and the book Accelerate came out while I was working on Pull Panda. And so when that book came out, I actually got in touch with Nicole Forsgren, who's the lead author of Accelerate and who I now work with at DX today. And I was inspired by that research and by that book. And similar to the phenomenon that I observed with customers and our product, I was very excited by the prospect of these metrics. It became clear to me, though, that there wasn't anything special about the DORA metrics
Starting point is 00:13:00 in particular, that they really sort of suffered from the same problems and limitations that the Git metrics that I was working with at the time did as well. And so I think the DORA metrics were a big thing for the industry and created a resurgence of appetite around this problem of how to measure engineering organizations. And SPACE is something that came out while I was actually working with Nicole later on at GitHub. And so she and I were working together on product solutions for companies in terms of how to measure engineering organizations, as well as how to understand and improve developer effectiveness internally at GitHub. We were undergoing a pretty big transformation, and there was a big need for measurement to guide
Starting point is 00:13:48 our progress there as well. So my experience working in this problem space has intertwined and intersected with things like DORA and SPACE at several points along the journey. Yeah. And that makes sense to me. I think those metrics, that book, Accelerate, I remember, I think it was my first job, and I was working on a developer effectiveness team, and my manager was super excited about those metrics. We did both, right? We started measuring deployment frequency, and we were also doing these developer effectiveness surveys. And I think both of those together gave us a reasonable picture of what's going on. I kind of want to move around in time.
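For reference, deployment frequency, the DORA metric mentioned here, is straightforward to compute once deploy events are recorded somewhere; a minimal sketch with an invented event format might look like the following.

```python
# Minimal sketch of computing deployment frequency (one of the DORA metrics)
# from a log of deploy events. The event format here is invented for
# illustration; real pipelines would pull this from a CI/CD system.
from datetime import date
from collections import Counter

# Hypothetical deploy events: (service, date deployed to production).
deploys = [
    ("checkout", date(2023, 5, 1)),
    ("checkout", date(2023, 5, 3)),
    ("search", date(2023, 5, 3)),
    ("checkout", date(2023, 5, 9)),
]

# Count deploys per ISO week, across all services.
per_week = Counter(d.isocalendar()[:2] for _, d in deploys)
for (year, week), count in sorted(per_week.items()):
    print(f"{year}-W{week:02d}: {count} deploys")
```

In practice the hard part is usually getting reliable deploy events out of the CI/CD system, not the arithmetic.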
Starting point is 00:14:37 So you mentioned that when tools like GitPrime existed, people were excited about that. What were people doing before tools like that existed? How did organizations try to measure productivity way back? And how did it get to where we are today? Before tools like GitPrime, I don't think there was much formal progress around this problem of how to measure. When you kind of look back at history or the history of our domain, one of the most common ways was to just try to measure the output of developers. So metrics like lines of code or function points or velocity points were sort of the de facto ways in which organizations tried to understand and measure productivity. And in fact, those are still common approaches today.
Starting point is 00:15:26 You could argue they've somewhat been superseded by things like counting the number of pull requests, but really that's a similar approach to counting commits or lines of code or velocity points. At the same time, there were many outspoken critics of this type of approach. For example, Martin Fowler wrote an article, I want to say in 2001, maybe it was a few years after that, but a long time ago, talking specifically about this practice and why it wasn't effective. There are also some famous stories and quotes from people like Bill Gates and early engineers at Apple, who pushed back mockingly against the practice of trying to measure software development using output metrics. So I think that's where things were when Git Prime
Starting point is 00:16:13 came into the picture. And in fact, one of the selling points of Git Prime was that it was an alternative to just measuring lines of code. Unfortunately, the reality was that GitPrime also did measure lines of code and in fact, really provided lots of metrics that were similar to lines of code, although they weren't exactly lines of code. And so hopefully that answers your question. Before GitPrime, I think the status quo was just measuring and counting some type of output of developers based on their activity. And do you think it's just like an organizational urge to have metrics so that there's something to track? Is that kind of where you think all of this comes from?
Starting point is 00:16:55 Why is it a good idea to try to measure story points in the first place? I mean, absolutely. I think this is driven by really sensible needs and desires of businesses and leaders. I mean, even today, tens of millions of dollars are spent by the typical organization on software engineering. So to ask how well we are doing, and how we can get better, guided by data, is a very reasonable question for leaders to ask. I think at the same time, having been in this position myself, this often is sparked by a CEO or non-technical business stakeholder asking an engineering leader or a technical leader
Starting point is 00:17:38 for measurements to report up or prove or show or demonstrate the value or productivity of their organization. And so I think the need is pretty universal across all levels of an organization, and it's definitely a need that's not going away. And in recent years with, for example, COVID, and now the tightening macroeconomic conditions, the need has only grown. It's not going away. That's for sure. Yeah. And recently, it looks like there's a new paper coming out about a developer experience framework, which you've been working on. Maybe you can tell us a little bit about the conclusion, or something that y'all have been thinking about, how all of this can be tied up into a set of questions
Starting point is 00:18:25 that you can ask developers, or into a framework that you can use to actually measure and think about developer experience or developer effectiveness. This paper is really a culmination of all the experiences and all the research done both by myself and by my fellow authors on the paper, including Nicole Forsgren, Margaret-Anne Storey, who is one of the co-authors of SPACE, and Michaela Greiler. And to convey the real goal of this paper, the other week, I was talking to
Starting point is 00:18:59 the CIO of a major bank. And he told me that internally, they were having heated discussions about the right way to measure and understand developer productivity. And of course, someone within the organization was advocating for the use of metrics like lines of code and commits and activity-based measures to do this. And he said to me that he realized that there is a better way, a different way, but that there wasn't anything concrete that he could offer as an alternative to the conventional approach of measuring development activity, development processing times, and other conventional metrics that are common today. So this paper's subheading is the developer-centric approach to measuring and improving developer productivity. And that's really what this is about. What we're trying to outline here is an approach to measuring developer productivity
Starting point is 00:20:05 that is based on the feedback and signals from developers themselves, rather than focused on a top-down view of developers' activities and processes. And so in the paper, we go deep into what that means. When we talk about the feedback and signals from developers, we give that a definition and a conceptual model around developer experience. And then we get into the more practical recommendations of how to measure developer experience and the examples of types of metrics that organizations should use. Yeah. As a business leader, you can imagine that the kind of questions I might have are, am I shipping enough features given that I have
Starting point is 00:20:46 an X large team of developers? Is that something that this framework can help me answer? No. I mean, that's a very valid question. And that's a question that is actually really difficult to answer using even the conventional metrics. And I think that's where organizations go wrong. Google recently published a research paper about the difficulty of measuring developer productivity. And in that paper, they discuss the challenge of trying to measure knowledge work (and software development is knowledge work) in the same ways that we have typically measured non-knowledge work, such as manufacturing, right? And so they give the example of coal shoveling. They talk about how you can't measure software development in the same way that we would just measure the
Starting point is 00:21:37 amount of coal that someone shovels in a given day. And so to your question of, are we delivering enough features? Or are we delivering a good number of features? That's not a question you can really answer based on counting tickets, counting commits, or counting anything. In fact, that's probably a question that is subjective within the organization. So in some ways, it does relate to the approach that we're outlining for understanding developer productivity. But our framework is really focused on understanding the root causes, the things that inhibit productivity or promote productivity. Our framework isn't about measuring the actual yield or output of an organization. Because as that Google paper discusses, that's not something that is actually feasible with knowledge work, you can't count
Starting point is 00:22:32 the output. It's not a factory where you can count widgets. Yeah, I think this is more and I think it's absolutely correct, which is it's a framework that you can use to debug your engineering organization, especially when it has, you know, self-imposed blockages or small things that are basically impeding large amounts of productivity, right? Which is exactly what you should be using something like this for. Is that fair to say? Exactly. Yeah. I would argue that developer productivity is difficult to even define. We aren't as an industry even close to being able to measure it in good ways. What this framework offers and what other previous approaches have offered is ways to understand and debug productivity and improve it. But actually
Starting point is 00:23:19 measuring productivity itself, at least our conventional definition of what that is, meaning yield or output, that's just not something that is really feasible based on all the research and collective experience we have in this industry. Yeah. An analogy I like to think of is that it's similar to product quality. Like how do you measure that a product is high quality? You can maybe try to track how many bugs you're getting in, but if the product's really slow, and customers are just leaving, that might not translate to a bug count. So it's similar in that way. What do you think? Absolutely. There's a great book called How to Measure Anything. And the subtitle of that book is How to Measure Intangibles. And quality is an example of an intangible, right? Whereas widgets
Starting point is 00:24:07 coming out of a factory are tangible, you can count them, you can see them. Quality is something that is abstract and intangible. And in the book, one of the approaches that is discussed is psychometrics, meaning that in the same way that we take out a thermometer to measure the temperature of a room, the thermometer in that case is a measurement instrument. It takes in input from the room and spits out a number. We can use humans as a measurement instrument as well. Humans can observe input. They can observe the world around them, and then provide data back. And so when you're trying to measure something like quality that is intangible, the approach is to ask humans to rate things in a rigorous way, and to use that data to produce quantitative and/or qualitative insight. And you've been thinking about developer productivity in this space for a really long
Starting point is 00:25:19 time. But while working on this, there might have been aspects that surprised you, even though you've been in the space for so long. Is there anything that comes to mind that's like that? I think one of the things that surprised me is I have arrived at this understanding of survey-based measurement as a practical and most often preferable approach to debugging developer productivity. One of the things that surprised me is how much opposition and prejudice there is against
Starting point is 00:25:53 survey-based measurement by tech leaders. A lot of tech leaders, when you bring up survey-based measurement, scoff at it. They're not interested in it. And in fact, they don't trust it. And I think that's one thing that surprised me, given that survey-based measurement is a well-established means of measurement in other fields, such as healthcare, economics, and education. So that's something I'm currently working on better understanding and working on helping educate leaders across the industry. At the same time, the leading companies in our industry like Microsoft and Google and Amazon heavily rely on survey-based measurement for insights about developer productivity. And so I think there really just is an education gap right now and a gap in the way leaders
Starting point is 00:26:44 understand and perceive the effectiveness of survey-based measurement. Yeah, I have to agree. I think even I personally was biased or thinking, what's the point of surveys? We have all of the metrics we could possibly need. We had Git metrics, CI metrics, pull request metrics, and like surveys are biased, people are going to complain. But that actually tells you about all of the issues that you just cannot obtain from any of these metrics, like, oh, a team is blocked, because they're waiting on security reviews.
Starting point is 00:27:16 And that information is in some Google Doc, hidden far away. There's just no way you're going to get that through extremely quantitative means unless you have like an organization that tracks things perfectly in a JIRA or something like that, which I don't think any organization is. So I have to agree. I think even my mind about this changed after running or like at least seeing the results of a few surveys. Absolutely. And I've spoken to leaders at companies like Google. And in fact, I was speaking with a researcher who focused on developer productivity in Google. And one thing they shared is, of course, Google is one of those companies that does have really
Starting point is 00:27:57 in-depth instrumentation across all their developer tools and systems. And so in theory, Google is an organization that could rely more heavily on system-based methods of data collection and measurement. However, one of the things that surprised me is that they told me that they use both methods. So they use survey-based methods and system-based methods. And they found that their survey-based methods provide the same information as their system-based or log-based metrics. And the takeaway from that is that companies that aren't Google, companies that don't have comprehensive logging across the entire developer tool chain, which is most companies outside of Google, Facebook, and Microsoft, those companies can and should be relying on survey-based measurement to capture these same types of metrics. So if it works for Google at their scale and their size of developer population, this approach should really work for all companies and all leaders.
Starting point is 00:28:55 One thing that might concern people is this idea of survey fatigue, which is also called out in your paper. How often should you be surveying people? Or is there like a different way you should be thinking about surveys? Survey fatigue and engagement on surveys in general is one of the sour parts of survey-based measurement. You can't get around it. And it's something that a lot of organizations do struggle with. I recently published an article sort of sharing the approximate participation rates I've heard of across the industry. And I would say across big tech companies, the average participation rate for quarterly surveys hovers around 30 to 40%. By most comparisons, that is not a good participation rate for an employee survey.
Starting point is 00:29:46 And so it raises the question of why this is the case and how organizations can improve. In my own personal experience working with a lot of organizations that use DX, we have seen sustained participation rates of 90% or above. And I don't think this can be boiled down to one feature or one aspect of how we approach developer surveys. But I will say that I think a lot of organizations make common mistakes when it comes to survey programs. And as a result, they don't see sustained success over the long term. Just some common examples of mistakes I see: one of them is just the design of the surveys themselves. Typically, surveys suffer from questionable levels of confidentiality or anonymity in the responses they capture.
Starting point is 00:30:39 Many of these surveys don't have a compelling purpose to them. They're often coming from one particular platform or tools team that's collecting feedback about their tools. It's not positioned as a global developer listening program that is capturing data about the holistic developer experience and how that data will actually be used to drive improvements. So having a clear and compelling purpose around a survey program is important. But another really important part is to actually follow through on those promises and that vision. So a lot of organizations promise the world when they deploy their survey. But after one or two surveys, developers quickly come to realize that nothing seems to be happening
Starting point is 00:31:25 with the data. Nothing seems to be happening on their team. Nothing seems to be happening with leadership. And of course, as a result, the entire survey program doesn't feel worthwhile and participation drops. And so ultimately, I don't think frequency by itself is the culprit of the challenges we see. I think frequency needs to align with the purpose and the action that surrounds a survey program and really making sure that those things are clearly defined and well executed upon. Those are really core pillars
Starting point is 00:32:01 in terms of driving sustained engagement over time with surveys. Yeah. And that last point is kind of why I'm hesitant on, you know, rolling out a survey or even a survey tool, at least at my current company, just because I don't know if I'll be able to actually fund everything that comes out of the survey, or even like 60% of it. There's a little bit of a fear in me. It's like, okay, what if there are 30 issues that get surfaced and we solve five or six of them? Is that enough? I know there's messaging we can do, but we don't have as much funding to work on this versus other things. I don't know how often this objection comes up for you, or whether this is something that you hear about. This is a concern that I've had looking across the industry and a headwind I've seen in terms of the adoption of survey-based measurement. My current view on this is that action or
Starting point is 00:33:00 specifically engineering initiatives, change initiatives spawned from surveys, are not absolutely necessary to sustain a successful program. What ultimately matters is the perception by developers that the program is worthwhile. And collecting feedback can be worthwhile even if official initiatives are not launched as a result of the feedback. For example, if the CTO of the company just simply told developers that the feedback was really valuable and that although no action can currently be taken, the feedback is still though valuable for understanding trends and informing priorities in the future. In my view, that would be enough.
Starting point is 00:33:48 And if that message were re-emphasized by other leaders and managers across the organization, that alone would make developers feel like their feedback was valued and that the exercise was worth it. So I think the point I'm trying to make is that survey results do not need to be turned into Jira tickets in order for that feedback to feel worthwhile and meaningful. A simple acknowledgement, a simple communication that values the feedback and communicates the importance of that feedback to the business, I think, is most often enough to sustain a program like this. I think that is very informative for me. And it makes sense, right? Like even
Starting point is 00:34:31 informing developers that this information is going to be useful, we may not have time to work on it currently, but it's going to inform our future priorities does make it seem worthwhile. I'm curious about how you've seen companies. So you talked about Google doing surveys as well as looking at log-based metrics. Have you seen companies that have taken things like the developer experience framework or learnings from tools like DX to make changes? I don't know.
Starting point is 00:35:07 Are there some interesting case studies or just an example that you could share? Well, absolutely. I mean, these large companies run these quarterly surveys, and actually augment the quarterly surveys with more real-time feedback as well. And these developer listening programs drive a lot of the priorities of the organization when it comes to infrastructure, developer tools, and developer productivity. Speaking personally with the organizations that I've worked with, there's a lot of action and initiatives and change that has come out of the insights gained through survey-based and system-based measurement. I think to bucket the two types of approaches that I've seen, and I believe that both approaches
Starting point is 00:35:45 in combination are ultimately necessary. But generally, you see a focus on either bottom-up improvement, which is a focus on local teams and local improvements, or, on the other hand, a focus on top-down initiatives and top-down change. So to give you an example, Pfizer is one organization we work with that has placed a huge focus on enabling their individual pods or squads to get the data out of these periodic surveys and review them as a team and drive improvements and changes at the local level all across the organization. In contrast, another organization that I've worked with, eBay, is much more focused on aggregating the insights at the enterprise level and driving larger, longer-term initiatives
Starting point is 00:36:40 through their developer productivity organization to drive change and improvement to all developers across the organization. In my view, both of these approaches are good, but in an ideal case, I think organizations would use both approaches at the same time. I think some problems in organizations lend themselves more to sweeping top-down or executive-level initiatives, whereas a lot of problems exist at the local level within teams, in a particular part of the code base, or due to a particular workflow on a particular team. And so those types of problems aren't going to be addressed by executives or developer productivity teams. Those problems need to be addressed by the local teams themselves.
Starting point is 00:37:26 How have you seen engineering leaders decide how much funding they should drive towards developer productivity, platform-y kind of teams? Engineering leaders have to think about funding, infrastructure work, security work. There's so many things to think about. What have you seen or what do you think the right approach is about how much total engineering
Starting point is 00:37:47 bandwidth should go towards developer productivity or developer effectiveness teams? This is one of those questions where the answer is, it depends. I've seen all kinds of examples from across industries. I recently spoke with the former CTO of Atlassian and Shopify, and he advocates for 50% of engineering headcount or FTE spend to go toward what he defines as platform work, which includes things like reliability, productivity, internal tools, etc. In reality, especially in the current environment, platform work is often being deprioritized and often can become second fiddle to core customer-facing work. And I think that's okay too. practice when it comes to what the right amount of investment is. It ultimately depends on where your business is at on its journey, the context it's operating in, and what the potential ROI is
Starting point is 00:38:53 of investments in developer productivity and experience. And I think one clear trend I see is that larger organizations, when they look at the amount of money that's being spent on their developer organization, when you think about even 1% or 0.5% productivity or efficiency gains multiplied across that headcount, there's potential for huge business impact in terms of increasing overall engineering capacity to be able to be focused on delivering more value and customer-facing work. And so I think that's one of the reasons why with the larger organizations like Spotify, Google, and Microsoft, there's a huge emphasis on understanding and improving developer productivity. And I think that's true even as you go down in size from those
Starting point is 00:39:42 large organizations down to organizations that are 100, maybe 200 engineers, typically, I see organizations start focusing on these types of problems when they hit around the 30 to 50 engineer count as far as headcount. Yeah, I think that makes sense to me. It's like each organization has its own unique challenges. And you also have to see how much does increased investment here improve leverage throughout the rest of the organization. Maybe a last question around surveys. I was just reading the paper again.
Starting point is 00:40:19 How standardized do surveys need to be? You know, like how, I guess, bespoke do you think surveys should be across different companies versus how standard do you think a survey could be? Could I reuse the survey that I used in a previous company in a new one and just change the name of the tools? What do you think the best practices are? I think that, first of all, developing reliable and valid survey items is actually a very involved and expensive process. And so most organizations, I think, do not have the expertise nor the resources
Starting point is 00:40:55 to design effective surveys that produce reliable measurements. And so that's one vote for standardization, in my opinion. That being said, no two organizations are the same. And in a limited survey consisting of n number of questions, it doesn't make sense that every organization would be focused on the same things. And so I think to answer your question, the ideal case is some sort of combination. I think organizations can benefit from leveraging the work of experts or other organizations who have developed accurate and reliable measurements that are survey-based. However, organizations will also benefit from thinking about their own particular needs and use cases and developing their own measurements and data points that they want to
Starting point is 00:41:45 capture. One distinct advantage of the former approach of standardization is, of course, benchmarks. And with survey-based metrics, similar to a lot of system-based metrics, it's difficult to contextualize the data and inform decisions without being able to compare your data against the data of others. And I think that's especially true when it comes to sentiment-oriented measures, things like satisfaction or ease of use of tools. These things are difficult to understand in isolation without contextualizing them against industry benchmarks or the data from peer organizations. So I think that's also one point to consider when thinking about the benefits of standardization versus investing in bespoke development. And to open it up a little bit,
Starting point is 00:42:41 I know that DX is working on getting metrics from tools directly as well. So I think DX maybe started with an approach or like a focus on surveys, but now y'all are thinking of expanding it to get metrics from GitHub and Jira. What do you think is going to be different about your approach, given all your learnings over the years? It's a great question. It has come as a little bit of a surprise to myself that I find myself on year 10 of this journey in engineering metrics, having already built solutions multiple times that deliver system-based metrics to now find myself working on the same problem again. That being said, it's clear that the solution that organizations need for getting complete insight into developer productivity, but not
Starting point is 00:43:34 just developer productivity, even just operational performance or team operations, requires more than just periodic survey-based data. And one of the opportunities that we've seen is to provide a different way of unlocking system-based metrics and insights as well. So what we're doing that I think is different than what currently exists on the market is that we're providing an unopinionated platform for system-based data. We really are providing a platform rather than metrics in a box, which is the approach that I've previously taken with this problem. And you see most companies and vendors in this space taking. There's a lot of vendors out there who offer cycle time and DORA metrics and quote-unquote space metrics in a box. They proclaim that they have research-backed
Starting point is 00:44:26 metrics that will solve all your problems as an engineering leader. And of course, we know from our experience that those claims are far from the truth. What, however, is true is that there is some value in different cases for those types of metrics. However, it really depends on the individual needs and context of each organization. And so to provide a solution that can meet organizations where they're at, what we're focused on is providing a platform that enables free access and visibility and reporting across both third-party tooling, such as GitHub, as well as bespoke internal developer tools, which more and more make up a considerable portion of the developer tool chain, especially at larger companies. So what we want to do is provide
Starting point is 00:45:16 a third-party solution to really just save organizations the time and trouble that many go through to build their own data pipelines and data warehouse specifically for engineering data. And since it's unopinionated, like, as an engineering leader, how would I approach using the platform? Like, would I be doing a bunch of setup? Or is it that the metrics kind of come in, but I get to decide how to use them? If you could just walk me through like an example. Yeah, so what we provide is a ready-to-go data schema and data connectors that are ready for analysis. So if you're an engineering leader or lead a developer platform team, you can use our
Starting point is 00:46:04 solution to quickly centralize all your data across your internal bespoke tools, as well as third-party tools like Jenkins, CircleCI, GitHub, Jira, etc. And what I mean by unopinionated is that we don't offer charts out of the box. Instead, we provide an enriched schema that enables organizations to run the queries and reports and metrics that they are interested in for their specific needs. So that's, I think, the major point of difference between our approach and the approach of a lot of organizations out there is that we've seen that many companies who purchase off-the-shelf metrics tools find themselves in a tricky place where the metrics that come out of the box aren't exactly the metrics that they actually want from the system. Or worse yet, sometimes organizations
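To make "unopinionated and ready to analyze" concrete, here is a small hypothetical example of the kind of report a platform team might build on top of such a centralized dataset, putting a system metric next to a survey metric per team. The table and column names are invented for illustration and are not DX's actual schema.

```python
# A hypothetical example of the kind of query an engineering leader might run
# once system data and survey data live in one ready-to-analyze store.
# All table/column names here are invented; they are not DX's actual schema.
import pandas as pd

# System-based data: e.g. pull request cycle time and deploy cadence per team.
system = pd.DataFrame({
    "team": ["payments", "search", "infra"],
    "median_pr_cycle_time_hours": [18.5, 42.0, 9.0],
    "deploys_per_week": [12, 3, 25],
})

# Survey-based data: e.g. self-reported ease of delivery (1-5) per team.
survey = pd.DataFrame({
    "team": ["payments", "search", "infra"],
    "ease_of_delivery_score": [3.8, 2.4, 4.5],
    "survey_participation_rate": [0.92, 0.88, 0.95],
})

# Put the two side by side: teams with slow cycle times and low sentiment
# are candidates for a closer look.
combined = system.merge(survey, on="team")
flagged = combined[
    (combined["median_pr_cycle_time_hours"] > 24)
    & (combined["ease_of_delivery_score"] < 3.0)
]
print(flagged[["team", "median_pr_cycle_time_hours", "ease_of_delivery_score"]])
```

The same data could just as easily be pushed into a BI tool like Looker or an internal developer portal, as described above.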
Starting point is 00:46:58 like the metrics that are offered, but actually want to display those metrics in other places, for example, in their Looker BI dashboard or within an internal developer portal. So with our approach, we provide an extensible platform where we really just provide the data in a ready to analyze and use format. But you can use that data in any way you want, whether it's plugging it into your existing BI tool to produce reports, or whether you want to build an internal developer portal for presenting these metrics in a specific way. So we offload the costs and complexity of engineering work that goes into building data pipelines and data warehousing solutions around these types of tools, but we don't tell you exactly what your reports should consist of. Yeah, now I'm just imagining all sorts of things like if you could slice and dice like, you know, developer NPS data on how good their developer experience is, if you could break that down and put that right next to perhaps some operational metrics, it seems like it has a
Starting point is 00:48:04 lot of potential. And I can already think of how I might use something like that. Yeah, that's a good point. And I actually completely forgot to mention that. What you just described is one of the key reasons why we're excited about this solution and the value it can bring to organizations. One of the unique things about this platform is that, in addition to centralizing data from bespoke tooling and third-party tools like GitHub, it also collates the data from surveys from our existing survey-based measurement platform. And so combining that system data alongside the survey-based data can unlock unique insights
Starting point is 00:48:47 like what you just described for customers. And that's one of the other advantages that we see in this approach of combining both survey-based and system-based measures. Yeah. Well, I am still excited to see all of the future research papers that come out. I'm getting to learn a lot. So thank you so much for being on the show. And I hope you had fun. Thanks, Utsav. Yeah, this is great. Thanks
Starting point is 00:49:11 for having me. Of course.
