The Changelog: Software Development, Open Source - The 10x developer myth (Interview)

Episode Date: March 31, 2020

In late 2019, Bill Nichols, a senior member of the technical staff at Carnegie Mellon University with the Software Engineering Institute, published his study on "the 10x developer myth." On this show we talk with Bill about all the details of his research. Is the 10x developer a myth? Let's find out.

Transcript
Starting point is 00:00:00 You know, they can't be all the skills at once. While they may have them all, they can't be them all at once. Exactly. And even more to the point, you can't teach people to be intelligent. I mean, you've got to have a certain base capability. But most of these other things, like using a tool, using Git or other revision control effectively, is a skill. It takes skill and it takes repetition. Doing good design involves a lot of skill and a lot of practice.
Starting point is 00:00:25 Bandwidth for ChangeLog is provided by Fastly. Learn more at Fastly.com. We move fast and fix things here at ChangeLog because of Rollbar. Check them out at Rollbar.com. And we're hosted on Linode cloud servers. Head to Linode.com/changelog. Linode makes cloud computing simple, affordable, and accessible. Whether you're working on a personal project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale you need to take your ideas to the next level. We trust Linode because they keep it fast and they keep it simple. Check them out at Linode.com/changelog. Welcome back, everyone. You are listening to the ChangeLog, a podcast featuring the hackers, the leaders, and the innovators in the world of software.
Starting point is 00:01:15 I'm Adam Stachowiak, Editor-in-Chief here at ChangeLog. In late 2019, Bill Nichols, a senior member of the technical staff at Carnegie Mellon University with the Software Engineering Institute, published his study on the 10x developer myth. Today, we talk with Bill about all the details of his research. Is the 10x developer a myth? Let's find out. So Bill, you are a senior member of the technical staff at Carnegie Mellon's SEI. That's the Software Engineering Institute. We're here to talk about your paper, which you published in IEEE Software back in September, called The End to the Myth of Individual Programmer Productivity. Good title, it caught my eye.
Starting point is 00:02:00 First of all, thanks for joining us. Oh, you're welcome. Thanks for having me here. Good to have you, Bill. So there is a lore, there is a myth, or maybe it's not a myth. You're here to end the myth. We're going to talk about it, about the 10Xer. Maybe even more than 10X, but definitely that particular term 10Xer has become a thing wherein certain programmers are massively more productive than other programmers.
Starting point is 00:02:27 And it's really based out of anecdotal or experiential evidence in many cases. And I thought it was interesting that you actually went and did some research, did a study, and so we thought we'd bring you on and talk about it. First of all, I'd love to hear why conduct this study and why this is something that interests you. Okay, well, let's start off with, I'm working on a research project ongoing called SCOPE. I won't go into the acronym, but it is looking for what are the factors that affect software development? What affects the cost?
Starting point is 00:02:58 What affects the productivity, the durations, and so forth? And one of the things I wanted to look at were some of the smaller micro factors that we could look at almost experimentally. And I had this data sitting around from the personal software process course we taught for 20 years. I had essentially 20 years worth of data. I think in this study I used about 10 years of it. And I wanted to see if we could find evidence for the effect of the individual programmers. And a lot of what turned up from that
Starting point is 00:03:35 study was mostly a null result. I couldn't find factors that affected programmer productivity. For example, we looked at things like experience. No real effect. To the extent we saw anything, it looked like experience actually was inversely related to productivity. Okay, I have some potential explanations for that. Yeah, well, what it probably means
Starting point is 00:04:01 is that the best programmers were being promoted out of programming into management. Well, that's definitely something. We've seen that happen, yeah. It was a very small effect, just to be clear. It was statistically significant, but very small. The other thing to keep in mind is that during this entire period, the number of programmers was growing very rapidly. So there might have been an effect based on training.
Starting point is 00:04:26 More people were going through universities, were being better trained. There are a number of things it could have been, but it wasn't a big effect. So I decided I wanted to look at this in more detail because it just seemed kind of odd. So I started looking at how big is the variation in the programmers themselves. Well let me back up and tell you a little bit about what is the the data set. When we taught the personal software process we were actually trying to teach them how to estimate work and how to improve their quality, how to reduce the number of defects they put into
Starting point is 00:05:05 tests, how to find defects before they got there, and so forth. And the program, we used programs to reinforce the learning as practicals. And we used the same programs for a number of years. So we had essentially several thousand programmers go through this course, all of whom programmed the same 10 exercises from the requirements. Now, they used their own development environment, their own programming languages, and so forth. But that was kind of a unique set of data. And I decided to try to take advantage of having this large cohort of developers who did exactly the same program and look at how they varied.
Starting point is 00:05:51 What kind of exercises are these? Are they small scope things? They're relatively small. I would characterize them as the kind of thing that a reasonably experienced programmer should be able to complete in an afternoon. Okay. experienced programmers should be able to complete in an afternoon. So somewhere between two to six hours would be the norm. If you have some trouble, it might take a little longer. We had people who sometimes took up to 10 hours, and they were outliers.
Starting point is 00:06:23 But basically, if you were thinking about, these were not terribly challenging problems. They were non-trivial, but they weren't terribly challenging. Things like you had to program a linked list and use that in a program to read in some data and sort. Some basic introductory things like that. Some of the later programs involved. I think the most complicated was the last exercise, the next to last exercise, where they had to build a matrix inversion. They had to solve a set of linear equations for multiple regression using a matrix inversion. But they were given the specifications.
Starting point is 00:06:53 I was going to say, are these things being taught in the class? These are things that they tell you the algorithm and you go ahead and implement it, right? Right. They were taught, they were given the algorithm. They were given any special knowledge. Like when they had to do a Simpsons integral, which is a numeric integration. But they were given the algorithm.
Starting point is 00:07:10 So they didn't have to do the hard analytic part, they just had to program it. And we found that pretty much anyone who is able to get through a university and work as a professional programmer can do these sorts of exercises. They shouldn't be those out-there type of problems where you have to be a genius to figure them out. In fact, it should be pretty much normal. So I think the context of this course is helpful as well because at first I was like, well, why are there people, you said there was people with an average of 3.7 years experience,
Starting point is 00:07:45 but there were people with 36 years experience. And I was thinking, why would somebody with 36 years programming experience be taking a college course? And I think the context of this personal software process. So tell us about what you were teaching. This PSP was not something I even heard of before. So maybe it's very common and I'm out of the loop. Maybe it's old-fashioned, I don't know.
Starting point is 00:08:03 But tell us about that and the kind of people that were in the class. All right. Well, PSP, Personal Software Process, wasn't all that common. But it was a technique we developed at SEI, actually mostly the work of Watts Humphrey. He was a little bit of an aside. You probably have heard of The Mythical Man Month, the book The Mythical Man Month? Yeah, Frederick Brooks. By Frederick Brooks? Okay. bit of an aside. You probably have heard of the book The Mythical Man Month by Fred Brooks.
Starting point is 00:08:26 Okay, what not everyone knows is Fred Brooks was fired from that project. The one that blew up on it. And he wrote the book about it. Watts Humphrey was the guy who took over the project and brought it in. Huh, I did not know that.
Starting point is 00:08:41 It's a little bit of history. So I worked with Watts for a number of years. It's part of what got me to go to the SEI. And I took the personal software process. It was really part of teaching developers how to plan and manage their work so that they could work on an industrial project and know how to talk to their management, negotiate schedules, bring the projects in on time.
Starting point is 00:09:05 And the two key things we taught were estimation and quality. The main things in quality were applying design techniques, then showing them how to do disciplined reviews of designs and code so that you'd find your defects before a test. And what we tried to do was show them, hey, you can actually write a program and you can get this to compile first time without getting compile warnings. Now, in the days before IDEs, that was a big thing. Yeah. And we showed them that you should be able to
Starting point is 00:09:40 get into test and it should run the test first time. It's not going to be, you're not going to get into here and run the test to find out where your problems were and fix the ones that hit. You're going to actually get it to run the first time. And the thinking here is this makes the entire development process more consistent, makes it more predictable, and gets you a much higher quality product in the end. So there were a handful of adopters, and the people who were in these PSP courses were from industry.
Starting point is 00:10:12 There were some university classes, but 90% of the data was from industry. It was from organizations that we were working with, trying to teach them how to manage software projects. And part of the training was taking the PSP course. So the developers on the team would all take the course. They'd learn these techniques for how to measure the work, how to estimate it, how to improve quality, and then we would use that to help them manage their project. And we had some very notable successes with that.
Starting point is 00:10:44 I could go into a few examples, but probably the most spectacular was the rewrite of the Bursatech trading engine. You probably haven't heard of it, but Bursatech is the stock trading engine they use in Mexico. And they were falling behind, and there were really only a handful of engines in the world. A handful of those big trading engines for stocks, derivatives, and so forth. One of them was in Mexico. And their choice was either rewrite it because it couldn't keep up with modern times, or buy the one they were using in New York or London. So they decided to rewrite it.
Starting point is 00:11:21 And they, with a relatively small team, I think maximum was a couple of dozen people, it took them a little over a year. They put together probably about 500,000 lines of code, and they had to run it in shadow mode before it went operational. In the first year of operations, it had a total of downtime that was on the order of minutes. Literally minutes of downtime. And their performance was about a factor of 10.
Starting point is 00:11:51 They could handle about 10 times the number of transactions of any system that existed at the time. So that was part of the benefit of doing things right the first way, doing things in a disciplined manner. And that was kind of where all of this PSP training was coming from. But what that really led to was I've got this large body of data from the PSP training that we could repurpose for other things. Like knowing how well people can really estimate. What were the results at the beginning
Starting point is 00:12:24 versus the end of the course? We can measure things like how many lines of code do normal programmers tend to be able to write? What's their variation? We haven't finished the study yet, but we started looking at the differences in the programming language. All these people using a variety of different languages. Some used C, C++, a few in Fortran, all sorts of boutique languages, Java, Basic. How do these differ as far as how long it took them to produce the code, how big the programs are, how defect-prone they were? Lots of different stuff. All sorts of different things you could look at observationally with this data, yeah. I know we're here to talk about productivity, but on the
Starting point is 00:13:04 estimation side, give us the summary. Is estimating a reliable thing you can do? Oh yeah, well you can get reasonably good, but there are limits. We found initially people were consistently underestimating the work involved. Act surprised. Exactly. Yeah, okay,, what we taught them was, well, if you can get a good estimate of the size, that'll be a really good proxy for how long it's going to take. It's not going to be perfect, but it's going to be a good proxy. So we focused on how to teach them size estimate. We used a t-shirt technique, you know, do a preliminary type of design, not a real design, but just what are the pieces you need? How big are these pieces? Are they bigger than a bread basket? Small, medium, large? We've got an inventory of
Starting point is 00:13:48 stuff you've already written. Put them into bins. Sort them. What does a big look like? How big is a medium? And by just throwing these into bins and aggregating them, you can get a pretty accurate result. We found that they could normally get to about, they would typically start off with a variance of about 50%, and we could get it to about 30%. Yeah. But at the end, they were centered on the average instead of way underestimating, and you didn't get the big outliers. So it was not a spectacular result, but it was pretty consistent.
Starting point is 00:14:24 We showed, yeah, this really does kind of work. It also says that the best you're ever going to do, probably on a project of about a day, is going to be somewhere around plus or minus 30%. That's just life. A lot of variation in how long it's going to take you. And these are personal estimates too, right? These are not team estimates. Right.
Starting point is 00:14:43 These are personal estimates too, right? These are not team estimates. Right. These are personal estimates. And that really should have been the hint what I was going to find when I started looking at individual productivity. Why should that have been the hint? Well, we had these high variations, these high variations in their estimates and that kind of suggests that there are other factors involved here. It's not just a matter of estimating. It really suggested, it suggested without demonstrating,
Starting point is 00:15:11 that people would sit down to work and it was probably not going to be what they thought or they might be running into days when they're slower or days when they're faster. And when I started to really dig at the data, that's where it kind of got interesting. What I did initially was I said, okay, everyone's done the same program. What was the variation of how long it took them to get that program done?
Starting point is 00:15:36 So I'm not going to complicate this by looking across different languages. Let's just look at the people who use the same language. So that's why I trimmed it down to about 13% of the data set and just looked at the C programmers. I can do this with C++ and Java with slightly smaller sets. Beyond that, it gets into really small data. But let's use the same program. Okay, now I turned out there were 494 of these developers. Okay. So I ranked them one through 494. I used 500 because it's close and it's a round number. There were essentially 500. Okay.
Starting point is 00:16:13 And I could actually, rather than looking at the size of the program and lines of code or anything, I just said how long did it take them to finish. Okay. Then I did the same thing for the second program and the third. And one of the things that really struck me was that the distribution
Starting point is 00:16:31 of the times was just remarkably consistent. That is, the ratio of the fastest programmer to the 25th percentile, the median, the 75th, they look like almost straight lines if you plot these across the 10 programs.
Starting point is 00:16:48 There was a lot of jitter when you looked at the outliers, the ones in the slowest 25%. A lot of variation there. On the edges. But everything else looked pretty consistent. It's funny how that is on any plot really. When you've got the extremes, you've got the far right, the far left, and the middle, you tend to have a lot of jitter on the fringes. Yeah, but the thing is, on the fast side, you can only go so fast.
Starting point is 00:17:16 So that was kind of constrained. You're not constrained on the slow side. So that's where you saw most of the real variation. There's no control for quality in there, because I always think of the three little pigs, you know, and the big bad wolf, like that pig who built their house out of straw was done first. So, I mean, you can schlock together a solution that's not
Starting point is 00:17:33 quality and that doesn't really mean you're being productive. It just looks productive. That is true. To the extent that we had the controls on quality, we were measuring the number of defects that turned up when they ran tests. So I could look at those independently and we were kind of guiding them along the way
Starting point is 00:17:50 that they were doing certain quality things. But you are correct. There's no guarantee that they were consistent in how carefully they were handling all of the edge conditions. And we basically told them, don't worry about all the edge conditions. Your job is to get this program to run.
Starting point is 00:18:07 Don't gold plate it. Gotcha. So we were trying to guide them to give us, if not the minimal program that would solve the requirements, get something not far from just the most basic that satisfies these requirements and runs correctly by completing these specific test cases. So all C programmers based on the subset,
Starting point is 00:18:29 all producing the same program with similar control patterns, at least, and understandably similar defect testing in terms of the test written. So as close as you possibly could be to control. Yeah. We applied a certain number of controls to it. We weren't deliberately trying to control it because this was not meant to be an experiment. Well, my next question on that front is, did they know they were being tested on this front?
Starting point is 00:18:55 Did they know they were being a data set for productivity? Obviously, they're in this course, and it's about that. That's the beauty is he backfitted. Well, there was no real intention of ever making it a test for productivity. That was something
Starting point is 00:19:09 that we looked at after the fact. That actually kind of makes it more valid because they were just moving at their regular pace versus like you telling me I'm in a race.
Starting point is 00:19:16 I'm going to race as fast as I can. Exactly. And there is no intention. The whole point of this exercise was to help them do something predictably. We wanted them to be in their normal development environment. So what you found were the, let's talk about the outliers a little bit,
Starting point is 00:19:36 because in my mind, that somewhat lends credence to the lore of the unicorn developer who is the 10Xer, who is the outlier? Isn't that kind of the mystique that people are trying to hire those outliers or find the people who really are that way? There you go. That is the mystique. Now, the funny thing is,
Starting point is 00:19:58 when I looked at the data, if I looked at these things using a lot of different techniques, some of them I borrowed from social science on looking at the variation. Tim Menzies told me, no one's going to understand this and it's too complicated. Let's try to simplify it. So that's why I started doing things by ranking them, just doing ranked analyses. It turned out the same people were not in the same position in all ten programs. Nowhere near it.
Starting point is 00:20:31 One of the plots I'm the proudest of, if you look at the plot number two in the paper, what I did was I took the programmers' median ranking that is, where did they finish among their peers on their median? They basically did 10 exercises. Was their median number 10? Was their median number... Well, I ranked their medians.
Starting point is 00:20:56 I'm sorry, what I did was I took their medians and I ranked them. I ordered them, sorted them. What I did then was I took their range. So I used basically a binomial type of thing, non-parametric. I think I used take the 10 programs, throw out the two smallest and the two largest, and then you say, what are the odds of being above this 8 out of 10 times? And that gives you like a 90% confidence range. Their true median will be somewhere in there.
Starting point is 00:21:27 And their range, this range of where you could place them was pretty close to half the sample. It was very hard. It turned out that that range turned out to be pretty close to about a quarter of the sample. And if you took their full range, it was about half the sample. So that means the same developer, take your 250 out of 500. Some days he's going to be at 250. But almost as likely going to be at about 125 or 375.
Starting point is 00:22:06 And if you go down to the very best programmers, the ones who were typically the fastest few, their numbers typically ranged up to around 100. Where you saw the most variation was out at the slow end. And you should be aware that not everyone who took the class was a professional programmer. We had some people here who were managers who hadn't programmed in a while. We had some people who might have been with the process improvement group who weren't current. So it's not too surprising that we had some outliers on the slow side.
Starting point is 00:22:40 But what the data was telling me is that, yeah, there's definitely a difference between your best programmers and your weakest programmers. It gets really hard to tell the difference between a good programmer and a pretty good one. You can kind of tell an average programmer from your very best, but you better have a few data points because there's a good chance that you're going to get them wrong if you just take one sample did you have any of the this spectrum across the years so it was like you mentioned it was over a decade right 10 years is that right so it was over a decade was there any variance in terms of like when like first five years it was better or worse? Or was the year range by any degree different? The honest answer is I haven't looked at that carefully.
Starting point is 00:23:30 It's actually a pretty reasonable question. I don't know if I have enough data points to get that accurately. But it's a very good question because if there is an effect in things like the training, that's the kind of place where it might turn up. Yeah. I was just thinking in terms of just the fact that, you know, over time things happen inside of people's lives that may not in others. And maybe, you know, first five years there could have been a, you know, global pandemic happening or not. You know, so things will change obvious human behavior, you know, so that might be an outlier or that might be an indicator of data change. Because someone could go through a divorce, you know,
Starting point is 00:24:10 have a loss of life in their family. Different factors that weigh on them personally over time. Well, nobody was in it for 10 years, though. These are different students. Right. That's a good point. So one of the things that I was looking at is the difference between what I would call between and within.
Starting point is 00:24:28 In the social sciences, you'll have differences between groups, things like socioeconomic status. And you'll have some differences between groups, but you'll have some differences within groups. My group was the individual programmer. Each programmer was his or her own group. So this was a repeated measures approach. They took 10 samples, repeated measures. And what I say within, I say, what is the variation of an individual programmer? What is that programmer's variation
Starting point is 00:24:56 across these 10 exercises? And those all were being done within a few weeks typically. So that was a very small time span. I did not get a chance to look at their behavior over time. We didn't get a chance to revisit that. But we did get a chance to look at what was the behavior of this overall super group of programmers, what was the variation there, and what was the variation of the individual programmer. And the funny thing is when I looked at the total variation, when I did things like regressions and very conventional measures variation, it was almost 50-50. That is, almost
Starting point is 00:25:34 half the variation was what was total group-wide among the 500 developers, and about half the variation was within individual developers. So it told me that the individual programmer varied from himself almost as much as the different programmers varied from each other. Now, again, to be clear, that doesn't mean that some programmers really aren't better than others. It tells you that there's a lot of noise in this data. And if you try to pick out the superior programmer from one sample or a couple of samples, you're probably
Starting point is 00:26:12 just chasing noise. And that's kind of where I had the observation that all of these experiments that people have run before and I cite them in the paper, DeMarco and Lister, some observations from Curtis, the original paper by Sackman. They all basically took a snapshot. How did the group of developers do on a program? And I said, well, look, I reproduced their data. On any one of my programs, it looks very much like what they were showing. But if I do use the
Starting point is 00:26:43 same people on another one, I get the same distribution. It's just all the people moved around. So in a sense, I can replicate their data, but it seems to indicate that it means something different, that we've been ignoring that huge variation of how the individual performs from day to day. Different programs, different problems.
Starting point is 00:27:08 Maybe you didn't sleep well the other night. A more typical problem might be you made some boneheaded programming mistake and you just couldn't see it. You didn't see it until someone walked by your desk and said, what's the problem? And they saw it right away. How many times have you programmed and that's happened to you you just get stuck for a while too much too often too much and and all i'm telling you is
Starting point is 00:27:33 take some shower time to get past yeah take a walk or a walk or a bike ride that's exactly the kind of thing we told them to do so when they did that by the the way, they stopped the clock. They were taking the time. It's not the wall clock time or the calendar time. It's literally stopwatch time. They were using a tool where they would hit a timer button. I'm working on this program. I'm going to take a break. I'm going to stop.
Starting point is 00:27:58 Gets rid of a lot of noise. So the message here is that that kind of variation, that kind of problem is normal. If you see that happening to yourself, it just means you're normal. In this new world of remote first, more and more teams are looking to build video into their apps. Everything from media publications, education and learning platforms to communities and social platforms. If you're trying to build video into your app, you're probably deciding between having full control by building yourself or faster dev cycles with an out-of-the-box platform. Well, Mux gives you the best of both worlds by doing for video what Stripe has done for payments.
Starting point is 00:28:41 In a world of complicated encoding, streaming, multiplexing, and compression, Mux simplifies all things video to an easy-to-use API to make beautifully scalable video possible for every development team. Mux lets you easily build video into your product with full control over design and user experience. Videos delivered through Mux are automatically optimized to deliver the best viewing experience, and you don't have to deal with the complaints about rebuffering or videos not playing. Get started at get.mux.com slash changelog. They're giving our listeners a $50 credit to play with. That's over an hour's worth of video content that you can upload and play with. And check out all their features, including just-in-time publishing,
Starting point is 00:29:19 watermarking, thumbnails, and GIFs. To get the credit, just mention changelog when you sign up or send an email to help at mux.com and they'll add the credit to your account again get.mux.com changelog so bill if you were to summarize your findings i know we talked about them some but if you were to summarize your findings, I know we've talked about them some, but if you were just to lay it out, here's my takeaway from my findings of doing this research and digging into this data I already had. Could you give us the executive summary? What's your take on this scenario based on what you found?
Starting point is 00:30:00 My short summary would be, there's a lot of variation in program developers, and you're going to see about as much variation with one developer within that developer as you're going to see between the developers on your group. So that's one of the things that you just have to be willing to deal with when you're going to be planning your projects, that there's going to be a lot of variation. And the big source of variation is really just the individual variance. That'll be at least as big as the problems you have with some being better than others, as long as everyone is capable,
Starting point is 00:30:39 as long as you have people capable of doing the work. What's your goal with the findings? Who are you trying to communicate primarily to? Are you trying to help teams be more efficient? Are you trying to help larger industry users, the gigantors versus the small teams? Who are you trying to really influence with this summary? That's a good question. And I wasn't really trying to influence anyone in particular, but if I were giving advice, I would say what you really need to do is focus on hiring good quality people and don't obsess on trying to find that unicorn developer.
Starting point is 00:31:19 Just to be clear, that unicorn developer might be able to do things that your normal developer can't. You may be able to find that unicorn who can solve the super difficult problem. I mean, how many Linus Torvalds in the world really are there? But if you're trying to put together a team that's going to build software, you need to find capable people. You want to be above a cut line of capability. But beyond that, you're not going to help yourself by trying to find the best 5% programmers and stacking your team. It just isn't going to work. Your best bet is to get a good set of programmers capable and then put them in an environment where they can succeed. And there are a lot of other things
Starting point is 00:32:03 you can do environmentally to make the work process more effective. This reminds me a lot of what you're suggesting here from that movie Moneyball. Anybody seen that movie Moneyball? Mm-hmm. Where the data is essentially like a lot of the teams are trying to optimize for the Lou Gehrigs, the amazing, the best ever, and not the base hitters,
Starting point is 00:32:23 not the people who can be consistent. And it sounds like what you're describing, Bill, is focus on people who can do the job and be consistent at it and get base hits. Potentially even doubles because base hits lead to more people on base and more people to use the baseball analogy fully to get back to home. That's the point of baseball is to get back to home. Right. And get a run. Well, it seems like on top of that, so one thing, I mean, I think about NBA teams because there's a strategy of just stacking all these all-stars on a single team. You know, you get five of the best players in the league on the team.
Starting point is 00:32:57 And what you find is that that can backfire because they have interpersonal problems. They all want the ball all the time, et cetera, et cetera. Sometimes it works out, but oftentimes it doesn't however i think what bill is saying here is like in the nba they have much better stats right like you can look at like the guy who shoots 65 from threes well nobody really does that really good for three three-point shooter and say well i know i got the best three point shooter but when it comes to programming there he tried. Bill tried, and what he found was a lot of noise, right? He can't even get the stats right.
Starting point is 00:33:29 That's really very perceptive. A couple of things that I would build on here is that's one of the metaphors that I've used over time. I was one of the I don't know if you guys have ever read Bill James. I was reading him back when he did his first historical baseball abstract.
Starting point is 00:33:47 And when I saw some of this measurement, I said, ah, this is what I've been looking for. Really? Because it solves the problem. And, well, let me go back to another 1980 metaphor, and that was the miracle in ice team. Oh, yeah, 1980. Herbie Brooks said, I don't want the best players, I want the best team. He was criticized for not taking some really good hockey players.
Starting point is 00:34:10 But he had a specific type of game he wanted to play. He needed certain types of skills, and he needed a certain mix of skills to make it work. And for a software development team, it's really the same thing. It's not like you have to have the Harlan Mills surgical team.
Starting point is 00:34:26 But you do need to have certain types of skills. You need to have people who know how to use different tools. You want to have people who are really good at design. We've been most effective when we had one or two people who were really, really good at reviewing work. They might have been really good at reviewing designs. They might have been really good at reviewing work. They might have been really good at reviewing designs. They might have been really good at reviewing code. And when you put those on a team, it's where you get that situation where the sum really is. The whole really is more than the sum of the parts.
Starting point is 00:34:56 So when you put together the right set of skills, you get a team that's much better than anyone individually. It would make sense, too, to sort of stack the team up with that kind of team or that kind of skill set across the board because one person can't handle that much cognitive load anyways. Exactly. They can't be all the skills at once. While they may have them all, they can't be them all at once. Exactly. And even more to the point is you can't teach people to be intelligent.
Starting point is 00:35:27 I mean, you've got to have a certain base capability. But most of these other things, like using a tool, using Git or others, revision control effectively is a skill. It takes skill and it takes repetition. Doing good design involves a lot of skill and a lot of practice. When you compose teams who have members on them who have each really worked at honing a certain skill, they can come together and you can have a more effective team because, hey, now I know someone I can go to if I have a certain type of problem.
Starting point is 00:36:00 And going back to the baseball analogy, if Babe Ruth hits a lot of home runs, they're worth a lot more if he's got a few guys in front of him getting on base. That's true. Absolutely true. What are your thoughts on people trying to derive insights from a repository of code? So rather than study the humans, study the artifact, the output. There's a lot of work going on in looking at those artifacts, and I certainly think there's a lot you can infer from them,
Starting point is 00:36:28 but you've got to be cautious too, because you don't necessarily know how it got there. All you have at the end of the day is the artifact. So what I've tended to focus on is the entire development process. What are the steps that you took to get there? Now, sometimes you can infer other things from the artifacts based on, depends on things like how many times you check them in. What kind of other information you're gathering?
Starting point is 00:36:53 Are you getting all of the defects that are recorded? Are they only being recorded after the release? Are you getting defects recorded in test? Which version of test? So I'm not going to say you can't get things from that. I'm just going to say that it is limited and the thing that you really have to be aware of is what were the
Starting point is 00:37:10 limits based on what information was recorded. There's a whole discipline of mining software repositories. A lot of that work is going on. I enjoy sometimes trolling through subreddits, especially of cryptocurrency projects,
Starting point is 00:37:27 because there's a lot of people who are fanboys of a project, which is ultimately a software project with a financial thing attached to it, and yet they are unable to judge a software project. And they'll often post how many commits they've been making this week on their GitHub repository as some sort of benchmark
Starting point is 00:37:50 that's meant like, hey guys, 105 commits this week. We're killing it. It's like, well, what is a commit? Not every commit is created equally. It's just funny because you have these people misapplying. Especially if 100 of them were typos.
Starting point is 00:38:07 Just saying in documentation. Or you just write a bot that just makes it commit every once in a while to make it look like you're working on it. There are two things I would say about that. That's a very insightful comment. The things that we have coached the development teams on is, okay, you're taking all this data, you're recording all this data. Don't use this as a contest.
Starting point is 00:38:28 Don't use this to rate people. Don't even think about it. In fact, it's better when you report this to management that you give them things like aggregated numbers that are really on point for are you on progress? Are you doing the things you said you were going to do? Because if you start focusing on your numbers, you're going to put the cart before the horse, and you're going to be doing the things you said you were going to do? Because if you start focusing on your numbers, you're going to put the cart before the horse and you're going to be doing the wrong things. So it's really a matter of you're taking these numbers
Starting point is 00:38:55 entirely as a way of getting insight into what you're doing. You don't want that to be the end in itself. Now, if you just are recording kind of data that you're doing. You don't want that to be the end in itself. Now, if you just are recording kind of data that you're using and you're seeing these numbers, then you typically will see fairly predictable distributions. So over time, you're going to see a distribution, small, medium, large, for example, and you can kind of add them up together because as long as the distributions are consistent, you'll get reproducible results. It's when you start artificially manipulating those outputs that the data becomes useless. Yeah, and how often does your own personal bias kind of play a role
Starting point is 00:39:36 in your findings? You almost have a preconceived assumption, which is your bias going in, and then you kind of... You're digging for what you want to look for. I don't know. Handle the data in such a way that you come to the outcome you assumed you would come to. You've just described one of the big problems in the replication crisis. Have you read much about that? No. Oh, okay. Look up replication crisis. There's a lot of literature on that, especially since about 2005. A Stanford professor, don't ask me to pronounce his name, Iodinus, was noting that it was very hard to find results in social sciences that replicate. And a lot of that is because a number of very predictable things.
Starting point is 00:40:17 You're going to have the file drawer effect. If you don't find what you want, your study never sees the light of day. There are all sorts of things that people call like P-hacking. P is the measure of statistical significance. So there are easy ways that if you want to get a result, you can manipulate your data to kind of nudge it in that direction. That's why you have to be wary of documentaries, because it's very easy to watch pretty much any documentary and come out a believer, not because of your own personal inherent beliefs prior to, but because they're
Starting point is 00:40:50 the whole point of the documentary is to lead you to a conclusion. And that conclusion is their conclusion. And in some cases you really can't help it because they've proven all the data to that direction versus the other. There's no argument. And the hardest part is sometimes they're actually right. And then other times they're wrong. And it's like, how do you know? How can I know? You're just so convincing. Exactly. Yeah. It's really tough. There's no argument. And the hardest part is sometimes they're actually right. And then other times they're wrong. And it's like, how do you know? How can I know? You're just so convincing.
Starting point is 00:41:08 Exactly. It's really frustrating when you have all these studies out there. And how many times have you watched the news on the latest medical study? And okay, is fat good for you or not good for you today?
Starting point is 00:41:23 Who knows? We know that one. Just kidding knows? We know that one. Just kidding. That's good, man. We do know that one. But we didn't always. Well, I mean, even in today's climate, too, when you think about how do you push back against coronavirus is, in a lot of cases, a lot of the experts say testing. Because if you don't know where it's at, you don't know how to fight it or how it's mutating or how it's affecting the
Starting point is 00:41:45 population, who it's affecting all these different factors. And it's, it's just sort of, if you don't have the test, you don't have testing in place. Yeah, exactly.
Starting point is 00:41:54 You know, you can social distance all you want, which is a great precaution. But until we have conclusive testing in any given population, you can't understand where the spread is and how it's affected the population. And you won't really know the spread is and how it's affected the population. And you won't really know, when can we stop social distancing? Exactly.
Starting point is 00:42:10 How are you going to measure it? So those all touch on a lot of the problems we have in software engineering trying to get data. You talked a little bit before. One of the metaphors I've used is you can watch baseball games. How many people are at a typical ball game? 20, 30,000? Some of the big cities I've used, is you can watch baseball games. How many people are at a typical ball game? 20,000, 30,000? Some of the big cities, maybe 50,000?
Starting point is 00:42:30 Big games, you might have a couple of million on TV. And for generations, we have evidence now that for generations, there were a lot of serious mistakes in the valuation of those players. Now, these are players who we're watching do their jobs. They're keeping meticulous records. We're reading the summaries in the newspaper the next morning and we can still make mistakes in valuation. What does that say about the rest of us? We're not doing so well.
Starting point is 00:43:01 Well, it's just the state of being, isn't it? Yeah. Yeah. Well, let's bring it back to some more studying you've done before. This study you share from, I believe you mentioned it before, Menzies from 2016, are delayed issues harder to resolve. And you're kind of revisiting this idea of cost of fixed effects throughout a lifecycle of a software project.
Starting point is 00:43:20 So kind of break into what this acronym DIE stands for and what it's all about. Well, Tim came up with the delayed issue effect. And in this case, what we were looking at were some industrial data. I had data from, I think it was about 190 different software development teams. We had all of the defect data that they had collected throughout their processes. And they were from a lot of different industries, very heterogeneous data. So it's got some of that noise. But we knew, was this defect injected in a requirements?
Starting point is 00:43:57 Was it injected during a defect design? Was it injected during the design of the product? Somewhere during the testing? Was it injected when they wrote the code? Where did you find it? Was it in one of the reviews? Was it in an inspection? Was it in a test?
Starting point is 00:44:11 Was this post-development? So we looked at the activity difference between when the defect was injected and when it was removed. Now, we happen to have stopwatch-level defect information, how long it took the developer to fix that problem. That is, once it came back to the developer's desk, how long did this actually take to fix? Upfront acknowledgement, that's not including the wasted time the user had to spend figuring it out, writing up the report, or the bookkeeping the organization might have gone through
Starting point is 00:44:43 taking in problems, but the actual developer facing time. And we found that by and large, it just wasn't growing that fast. It was growing, but typically if you found a defect during your development, it was pretty cheap. If you found it after development, it took a factor of typically a couple of times as big to fix. It took longer to find and fix if it was a test type defect compared to, say, a review defect. But once it was any kind of test or field, it was relatively flat. So one of the inferences would be that
Starting point is 00:45:20 modern development has really flattened that curve. I don't know how long you guys have been. When I started programming that modern development has really flattened that curve. I don't know how long you guys have been. When I started programming, my first program I wrote on punch cards. You beat us. I skipped that step. Most people have by now. I didn't use a CRT until I was like a sophomore in college.
Starting point is 00:45:42 CRT? What's a CRT? I'm just kidding. Cathode Ray 2. I'm just joking. But I learned how to program with Emacs. It was really hard in the early days when you had to do these builds on programs and they would take forever because you had to recompile everything based on cards
Starting point is 00:45:59 or you had to submit it to a batch system. The modern development environment with the most things at your desktop, the builds are so quick, or you had to submit it to a batch system. The modern development environment, with the most things at your desktop, the builds are so quick, a lot of things that really took a long time before have been flattened. And developer-facing time on fixing those things isn't all that different.
Starting point is 00:46:17 Yeah, for a field of defect, it can be hard to find the actual source of the problem. Yeah. But those factors of 100 to 1, 10 to 1 to 100 to 1 that came out of some studies in the 80s, we don't see anything like that, at least in this data. So it tells you that either something has changed or there was a bias in how they
Starting point is 00:46:44 collected their data. Which we've talked about. Yeah, but they collected their data in the 1980s. It was a very different world. What I find interesting about your historical touch points there, punch cards, gone. CRT is pretty much gone. Emacs, it's alive and kicking, baby.
Starting point is 00:47:00 It's alive and kicking. It sure is. Richard Stallman. So a lot of the software, I mean, we definitely have modernized quite a bit, but it's amazing how there's certain pieces of software that are still relevant today, despite all the changes in modern development techniques, Vim and Emacs being two of the staples.
Starting point is 00:47:18 Yeah, if you're a programmer, odds are you know at least one of VI or Emacs. One thing mentioned in this study, too, and I don't know if it's significant or not, is the TSP versus PSP, which is the team software process. Is it similar? Are they parallel studies? What is this? Well, the team software process is really the industrial version of personal software process. We taught people PSP so that they would learn how to do things like
Starting point is 00:47:46 record data, how to make estimates. The team software process was how to do this in a real industrial environment so that you could coordinate with your team members, you could report up to management, you could make your estimates, you could see if your project, if an overall project, was really on track. So the PSP was sort of subsumed into the team software process, but the team software process was for planning really big programs. I have some TSP data that for projects that spanned multiple years. So you would look at the data very differently and one of the things you couldn't get from that is you're not going to have the same programmers developing the same thing. It just multiple years. So you would look at the data very differently. And one of the things you couldn't get from that is you're not going to have the same programmers developing the same thing. It just doesn't happen. In the real world, everyone is developing a different piece of software.
Starting point is 00:48:36 So they aren't really apples to apples comparisons anymore, and you can't treat them as such. You have to look at the data a little differently. But they record the same types of data. How long did something take? How big was it? Were there defects? Where did I find them? Where did they come in? Are these processes still taught today,
Starting point is 00:48:54 or have they been subsumed by things like XP and Scrum and other processes? I was thinking about your personal software process and how different it is to TDD in the testing way where you say you can make your test pass the first time. Well, TDD advocates will say never do that. Like red, green, refactor, it should fail, right? So it's at odds with a few of these other processes.
Starting point is 00:49:15 I'm curious if it's still extant, if it's in the world of academia. Tell us about it. Well, it's still being used. By and large, the Agile techniques have more or less taken over the world. And I can give you some various group of programmers, you always had this surge of new guys because every cadre, every new cohort was twice as big as the last one. So you were always
Starting point is 00:49:56 dealing with these inexperienced programmers who needed to do something very simple. Now, you said something very interesting and I hear that again and again about, you know, the program, it should fail the first time. Well, the way we would teach them, and we've taught people to do that, is, okay, if you want to program that way, that's fine. But the first failure is not a failure. Yeah, you should get a red light the first time, but that's not a failure because that's what you intended it to do. you intended it to do. It's expected. It's expected.
Starting point is 00:50:27 What we're counting are mistakes. Mistakes that caused a program and caused you to do some additional rework. Not the first development. That first failure is an expected failure. So superficially, it sounds like they're at odds. They really aren't. But the mindset and the discipline is somewhat different. We were really focused on doing things like being methodical, really engaging in the design.
Starting point is 00:50:53 You see some more of these things coming out now as you move into the DevOps world. The pendulum is swinging back the other way. That's a whole other discussion. Kind of reminds me of that quote, I believe, as Thomas Edison saying, I haven't failed 10,000 times. I've only found 10,000 ways my idea didn't work. That's an optimist. Yeah. One of the things that we tried to teach was that testing alone is never enough because you can't test everything. If you're doing something like the Avionics DL-178C, where they have truly comprehensive
Starting point is 00:51:31 testing, that's extraordinarily expensive. By and large, the best testing I've found will find maybe about half the total defects. So you've got to do other things to find the edge cases. And that could be things like designing it so that they can't happen, inspecting, doing inspections on your design and your code so that you know these code conditions or these input conditions aren't going to hit or that you're controlling for them. But it is almost impossible to test quality into code.
Starting point is 00:52:04 Oh, you need the testing, but it shouldn't be your starting point in the quality. It's really your end point. As the world at large is spending even more time online, search will become a more critical lever for engaging your users and your customers and some good news out there for those fighting the fight against COVID-19 and the coronavirus. Our partners Algolia have offered their pro plan for free. If you are a developer or a team working on a COVID-19 related not for profit website or app, you can get Algolia's ProPlan totally free. Check for a link in the show notes or head to algolia.com. So Bill, we've talked about a few of your research papers. Let's talk about some vapor research here, things that you have currently brewing, which is a very interesting cost-benefit analysis of static analysis and doing that during development. This is something that you have in progress.
Starting point is 00:53:18 You want to talk about it? Sure. I wrote a technical report, but that's mostly just a data dump. Technically, I had to get a lot of the data out there so that I could publish it openly. It had to go through a lot of different approvals. The underlying problem is we have all of these commercial tools for analyzing code, finding defects. You're probably familiar with things like Poverity, Clagg, SonarQ, Clockwork, FXCop. These things are very good at finding defects.
Starting point is 00:53:49 And I wasn't really interested in comparing one to another because they often find different types of things. But the question was, how do they affect the software development process? If I'm a program manager and I have to make a decision, should I spend the money to put these things on the project and have the developers use them? Is that money well spent? So I decided I could look at this from a very narrow standpoint. I don't care about how much they spent specifically. That's a financial problem. But using these tools, how much time do they take? How many defects do they find? And what I was finding was that pretty consistently
Starting point is 00:54:27 the teams that I was following, the teams that I had data for that used these tools were consistently finding defects they were removing them, the removal costs were relatively low and from an economic standpoint at the margins they were actually finishing faster and spending less time in development. So, operationally, at least, these tools had a small but measurable improvement in the performance. Now, the flip side is this was just through the development. You're going to have higher quality product at the end because you have all these other defects that didn't have to go through test, that didn't escape from test,
Starting point is 00:55:10 and aren't going to be found in the field because they were swept out by these tools. So the bottom line in that was that the effects look like they're actually pretty modest. They aren't as enormous as you might expect, but that is stipulating that these were teams that were actually doing a pretty good job on quality already. I have some anecdotal data from a guy at the Veterans Administration, and he was ready to pull his hair out at the quality of codes that were being given to him. So he basically laid down the law. You're not going to give me this delivery until you've run it through this commercial static analysis tool and resolved every defect that it finds. It doesn't mean you have to make all the changes, but you have to run this tool and you have to dispose of all of these problems. Either it was a mistake or it wasn't.
Starting point is 00:55:59 And until you've done that, you're not done. And the first time he did that, he had like 109,000 reports on 100,000 lines of code. That's a lot. Sounds like fun. That's a lot, yeah. But he found that once... More than one per line? Is that more than one per line? More than one per line, yeah. I'm just doing the math. I'm like, wait a second.
Starting point is 00:56:23 Yeah. But he found that once it became clear that he was going to use this as an acceptance criteria, part of his acceptance criteria, they would have to build it into their process and just start getting used to it. And he found that his data was suggesting they were actually coming in faster. So I wanted to look at that atomically. So I used our TSP data, which said all of these defects, we found all of these defects, which ones specifically were being found by the static analysis. How much time was spent running these tools and finding the defects and fixing them. And I did some modeling to make some estimates on how much time was being saved downstream, because now you're not going to be finding quite as many things in test but i said okay it looks like this is not taking longer anyone who says this is going to take longer is probably wrong so operationally these things
Starting point is 00:57:17 actually speed up development so you could not make an economic case not to use them why would you not use them well Well, maybe they cost too much. That's a different argument. But operationally, the data says they work. There are other things that work too, and there's evidence that they find different things that you might find using tests or using other types of inspection techniques,
Starting point is 00:57:43 but there's no reason not to use them. Did you apply the die model to this by any chance? Because if you got these defects in there over time, other inherent costs or hidden costs that sort of like linger and hide? Yeah, good question. What I did was for each of the teams, I created a parametric model. And I've done this before.
Starting point is 00:58:04 I didn't send you a copy of this one, but I did a paper a few years back for Software Quality Professional that showed how you could model the economics of some of these things. The model included how much time you're going to spend in these different activities. How much time have you spent doing design, coding, testing? How many defects do you inject in each of these activities? Where are you finding the defects? How effective are these activities at finding and removing defects? And for every defect, how long does it take? So I did some parametric averages of the defect finds in different phases, and I used that parametric model to estimate if I find this defect in static analysis, how much time is that going to save me downstream?
Starting point is 00:58:46 Oh, I only had about a 50% chance of finding that in test, and it was going to take about three times longer. Okay, that was a net win. Yeah. But that's the way I did the modeling. And the actual parameters I put in the paper. The model is not dissimilar to some of the things you've seen from, well, you've probably heard of Kokomo. There's a variant of Kokomo called CoQalmo by one of Barry Bain's students. And it turned out that our models have a lot of commonality. They aren't
Starting point is 00:59:20 exactly alike, but they look enough alike that we could talk to each other about it. I have not heard of Kokomo. Adam, have you heard of Kokomo? Oh, Kokomo. Kokomo is a parametric software size estimator. If you look up Kokomo, Barry Boehm, he came out of TRW with this originally like in the 80s. It was, how can I estimate the cost of a big software program before I begin work? Based on Google, it says Kokomo is a constructive cost model, is a regression model based on lines of code, LLC. A number of lines of code is a procedural cost estimate model for software projects and is often used as a process of reliability predicting the various parameters associated with making a project such as size, effort, cost, time, and quality. Thanks, GeeksforGeeks.org.
Starting point is 01:00:09 Exactly. So basically with something like a Kokomo, you'll get some kind of wag about how big is the software going to be. It might start out with a function point estimate, but you have to get it into something like lines of code. You'll put a bunch of tuning parameters like what is the experience, what is the experience of the developers who are going to work on this? Is this precedented?
Starting point is 01:00:30 Various types of parameters, and you get an estimate out at the end. containment model, parameterized defect containment model, to estimate the cost associated with managing quality during the development. So you mentioned you had anecdotal evidence on the improvement side. And you had good results on the TSP data that you had. Yeah, my data suggested the anecdotal evidence was consistent with the anecdotal evidence.
Starting point is 01:01:07 Right. But what would be really cool would be the same teams with and without the static analysis. The teams that might have already been performing well
Starting point is 01:01:14 because of good practice. Yeah. Got a couple 10Xers in there. Just kidding. It's all, it's a recursive problem here. Right. Were there 10Xers?
Starting point is 01:01:25 Were there not? Did they really exist? Yeah. What you'd really like to see is some kind of longitudinal study where you introduced it. But now you're getting into why getting these things out of software is so hard. Yeah. You need to raise some money to do a longitudinal study like that. That would be a lot of work.
Starting point is 01:01:41 And getting people to cooperate with that sort of thing. Right. Well, the tooling, some of the tooling you mentioned, Kokomo and others, like I hadn't heard of before, but it reminded me of like Code Climate or other analysis tools that sort of try to infer or determine problems, essentially. You know, whether it's a particular language or a particular framework they're analyzing. And I'm just kind of curious, of those that you're studying, what kind of teams are using those kinds of tooling? Is it large-scale teams with 500-plus engineers or 100 engineers? What size teams?
Starting point is 01:02:16 What are some of the parameters there versus, say, maybe a modern team that tries to be smaller, eight people or less, one product manager, a few developers, a designer. How's the teams layout using this tooling? Okay. What I found, the teams that we were working with were typically in the modern team size. If you have more than about 10 people, it's generally because you've got multiple teams
Starting point is 01:02:41 working together. Multiple teams working together on a bigger project. So most of our data were from teams in the order of five to ten people. However, for the one study, I did have this one team that essentially had, I think it maxed out at about 40 people. So they were kind of an outlier. That was in a military application, longer term. So a lot of those people were on or off the project at various times. I'd have to go back in the data and look a lot more granular inspection
Starting point is 01:03:11 to see what were the most at any one time. But most of the teams were in the 5 to 10 range. Well, let's use Jared as an example here. Jared, you always mention how you haven't had a typical trajectory of a software developer. You haven't worked inside of a large enterprise before. You've never used tools like this? You use tools like this?
Starting point is 01:03:31 What kind of things do you use to find your bugs, to find your defects? Is it error tracking or is it users? That question presupposes that I write bugs, Adam, which is a false dichotomy. No, I fix my bugs in production. I don't know. I'm not a good software engineer,
Starting point is 01:03:46 so I just kind of roll with the punches. A team of one with very few quality controls in place is how I've been rolling. I do test-driven development to a certain degree. I'm not a red-green refactor zealot. I don't think you have to go red. But I do write tests as I go, sometimes exploratory. Sometimes when I do know the solution
Starting point is 01:04:05 i just want to like harness it as i develop it and then i i add regression tests uh when there are bugs that are fixed but i don't use any sort of static analysis i don't use very much tooling at all but i think that's probably because of how small my teams are which usually myself maybe one or two contractors and that's it. And so it's just so much easier to manage when you don't have more people. So Bill, in this case, then hearing Jared's example of his cycles, and knowing what you know about all the studies you've done, do you believe that a static analysis tool would help a Jared out there?
Starting point is 01:04:47 If you think about the kind of warnings you get in an IDE or a compiler, the typical static analysis tools simply find more. And what I've typically found is that if you use one of these tools before you start running tests, you typically are going to find some things before you get into test, and some of those might have turned up in the test. So I think you probably would find that they're likely to be helpful. Not guaranteed, because it depends if you've got to pick the right tool that's aligned with the type of environment you're going to be using. I mean, operationally, it's going to be finding the right types of bugs. So it's an indicator light.
Starting point is 01:05:27 Here's more things to pay attention to, right or wrong. Yeah, and what I've typically found is that a rising tide floats all boats. If you find a bunch of these warnings, some fraction of those were going to turn up as testing problems. Some of those were going to turn up as testing problems. Some of those were going to turn up in the field. And if you are unfortunate enough to be in a domain where you've got real security issues, some fraction of those would actually turn out to be a security violation and might actually be exploitable. For sure. But it's a small number. It's a very small number. And the best way to get something really secure into the field is make sure it doesn't have any bugs before you ship it.
Starting point is 01:06:07 So you have all of this data. We talked about your programmer productivity data, this 10 years of PSP exercise results and timings. We have this static analysis data and you're slicing and dicing it, you're analyzing it, you're putting your thought into it. I'm curious if you ever thought, what if I just put the data out there, publicized the data, made it public, open license or whatever, and invited more people to get more eyes on it.
Starting point is 01:06:37 That way maybe you get more insights from the same amount of information. I've actually done that fairly recently. If you check the citations in the paper, I think I do refer to where I put the PSP data. Okay. If I didn't put it in that paper, then
Starting point is 01:06:53 that was my mistake. I know I have put it out there since. Which paper, Bill? I think I put it in the IEEE. I believe I put the IEEE paper into the IEEE's repository. I should have also put that out somewhere on our SEI site. Well, it might be available from SEI.
Starting point is 01:07:15 It may not. There was a lot of resistance for a long time because there were concerns about privacy. What I did was with that data, I really stripped out everything that could be even vaguely identifiable. So it was real easy for the PSP data. That is out there. I think the biggest barrier
Starting point is 01:07:36 is if you haven't taken the PSP course, the data is pretty overwhelming. And it's really hard to get your head around. I found that unless I worked with someone, if they hadn't taken a PSP course, they really didn't know what to do with
Starting point is 01:07:51 the data. And the problem is even more so with the team software process data, because the data itself is noisier. You don't have those constraints. You have to understand the data to understand where the noise kind of lives and operate around it. Yeah, so it's out there.
Starting point is 01:08:08 It hasn't been out publicly available for a long time, but I've historically had very little pull. Tim Menzies is one of the few who's come to me for data. Gotcha. So the Programmer Productivity paper was published in September of 2019 or in the September-October issue. I'm curious if there's been a response that's been measurable from the community? Have people given you high fives? Have they said, oh, come on, Bill, you're full of it? What's been the response? That's a good question. I was kind of surprised at just how big the response was. It went kind of viral on places like Reddit, Stack Exchange. Ended up with thousands of hits on our SEI website when I wrote the blog. Very little came out of the IEEE, by the way.
Starting point is 01:09:01 But when I did a blog on the SEI website in January, that went viral. And then people went back and looked up the IEEE paper. And it's kind of interesting because a lot of the comments you get on the places like Reddit were mostly critical. They were bringing up things that the study didn't really aim to do. Like saying, that's not how you measure programmer productivity anyway. It's the unicorn developer can solve the more difficult problem. Okay, well, that's fine, but that's not what the study was aimed at.
Starting point is 01:09:35 I got some pushback on the other side from people like Steve McConnell about, well, what are all of the things that you haven't accounted for? What about attrition bias? I still see a lot of high rates of variation among individual developers. On your paper, why did you stop at the 5% and the 95% ranges and this and that? So yeah, there were plenty of criticism, which frankly I welcome because at least people are talking about it.
Starting point is 01:10:05 Yeah. And they're actually discussing the issues. Well, it also depends on what you're trying to get at with, you know, what the point is. What are you optimizing for? Are you optimizing for the absolute truth or, you know, some sort of indicator of truth so that we can be better at what we do? And, you know, maybe try to hire the 10Xers, good luck, or find the base hitters like we've talked about.
Starting point is 01:10:29 Yeah, and more to the point is, what question are you really trying to answer? Because one of the things that came out was when people say 10x programmer, they don't always mean the same thing. Some people mean it very literally. There are plenty of papers that study the productivity, factor of 10 productivity differences. But that's not necessarily what all the programmers mean.
Starting point is 01:10:51 They mean the 10 times, they mean the unicorn. 10 times is really kind of a metaphorical 40 days and 40 nights type of thing. Yeah, most people that I hear say the 10x developer nowadays, maybe 2020 time range, are speaking not so much as somebody who can code faster, better, stronger than anybody else, but it's really the multipliers inside of a team.
Starting point is 01:11:14 It's the person that can make everybody else around them better. It's the Scotty Pippins on the team. It's like, that team is better because Scotty Pippin, NBA, 90s Bulls reference for those out there. Look up 90s Bulls, they're pretty good. It's a player like that.
Starting point is 01:11:32 Everybody else is better because Scotty Pippen's on the team. That person is a force multiplier, even if they aren't going to finish these exercises faster than anybody else. I actually will have a lot of sympathy with that because among the things I've talked about, how do you as a manager improve the productivity
Starting point is 01:11:52 of your team? It's putting together the right team, getting the right set of skills. I'll give you another example. Have you ever heard of William Shockley? No. No. He was more or less the inventor of the transistor. Okay.
Starting point is 01:12:06 Everyone hated the guy. He founded one of the big electronics firms out in Silicon Valley. He basically made Silicon Valley what it is, along with HP. Bell Labs? He was such a terrible person to work for that the guy who founded Intel left and made a fortune because he couldn't stand working for Shockley. But Shockley wrote a short paper thinking about why are some people so much more productive than others
Starting point is 01:12:32 from writing papers? And he sat down and thought about, well, let's see. To write a good paper, you have to have a good idea. You have to be good at managing the research. You have to be really good at collecting and managing your data. You have to be really good at writing. And he put together a list of maybe six, seven items, just brainstorming, that you had to be good at. So he said, okay, what are the odds that someone's going to be better than average in these seven different categories?
Starting point is 01:13:06 They said, oh, that's going to lead you to a log normal distribution, which is kind of what we see, isn't it? Well, I would take that as, yeah, you're not going to find too many people who are better than average at seven different categories. Right. different categories. But you can build a team with people who are better than average at one or two of them. And now you've got a team that can do that sort of thing. Those are your force multipliers. Yeah, I think in today's age we see the aspects of community forming, for one, but then the multiplier of good teams. Like here but then the, the multiplier of like
Starting point is 01:13:45 good teams, like here at Chainsaw, we have a great team. You know, it's because it's not because I'm amazing or Jared's amazing. It's because the whole entire team works very well together. You know, and I, I've been a proponent, have even come from the military too, to focus on team advantages versus individual advantages. Now you'll always have that individual that will be the Scottie Pippen, the force multiplier. You're going to have that. It's going to be natural, but rather optimized for a strong team
Starting point is 01:14:09 than finding that person because that person will eventually just naturally appear given enough effort. And I don't have all of your data. I haven't surmised that from all of your data or all your insights, but that's at least my personal uninvited opinion, or maybe invited, who knows. And that's the sort of thing I would recommend when you're trying, as a manager in an organization, when you're trying to build teams, when you're trying to build a strong team, do the things that help you build a strong team.
Starting point is 01:14:37 Don't wait for the Scotty Pippen to come around. Look for the ways to put together that effective team and kind of work well Harvey Brooks beat the mighty Soviet Red Army team with a bunch of college players just shows you anything's possible really Bill thank you so much for I guess your dedication to the craft and to teaching without people like you in the game
Starting point is 01:15:02 since we're speaking metaphorically about games and teams it makes sense without people like you in the game, you know, since we're speaking metaphorically about games and teams, it makes sense. You know, without people like you in the game, you know, we need people like you. So thank you from our side and from our audience's side to do all the teaching you've done and all the insights and research you've done and for sharing that. I mean, it takes a lot of dedication, a lot of time, and we don't discount that one little bit. We thank you for that. Well, thanks for having me. You ask great questions. It's really a pleasure to talk to people who can put those challenging
Starting point is 01:15:32 and insightful questions out there. Well, that's what we aim to do around here. And, Nadi, thank you for listening. Bill, thanks for coming on today. Appreciate your time. You're welcome. All right, the next step is to sound off in the comments. If you have some thoughts, I'm sure you do on the 10 X developer myth is bills research
Starting point is 01:15:52 botched. Is it biased? Is it true? What do you think? Hey, the change law.com slash three 88, let us know. And we get asked this all the time. How can you support us? Well, the best way for you to support us is to tell your friends, literally that's the way to help send a text send a tweet literally pick up the phone and call somebody hey i listen to the changelog and you should listen too and as you know jared and i we host the changelog our beats are produced by the beat free breakmaster cylinder and we're brought to you by some awesome partners fastly linode and rollbar oh and one more thing we have a master feed that brings you all of our podcasts into one single feed. It's the easiest way to listen to everything we ship.
Starting point is 01:16:32 Head to changelove.com slash master to subscribe or search for Change Love Master in your podcast app. You'll find us. Thanks for listening. We'll see you next week.
