The Changelog: Software Development, Open Source - The 10x developer myth (Interview)
Episode Date: March 31, 2020. In late 2019, Bill Nichols, a senior member of the technical staff at the Software Engineering Institute at Carnegie Mellon University, published his study on "the 10x developer myth." On this show we talk with Bill about all the details of his research. Is the 10x developer a myth? Let's find out.
Transcript
You know, they can't be all the skills at once.
While they may have them all, they can't be them all at once.
Exactly.
And even more to the point, you can't teach people to be intelligent.
I mean, you've got to have a certain base capability.
But most of these other things, like using a tool, using Git or other revision control effectively, is a skill.
It takes skill and it takes repetition.
Doing good design involves a lot of skill and a lot of practice.
Bandwidth for ChangeLog is provided by Fastly. Learn more at Fastly.com. We move fast and fix
things here at ChangeLog because of Rollbar. Check them out at Rollbar.com. And we're hosted
on Linode cloud servers. Head to Linode.com slash ChangeLog. Linode makes cloud computing simple,
affordable, and accessible. Whether you're working on a personal
project or managing your enterprise's infrastructure, Linode has the pricing, support, and scale you need
to take your ideas to the next level. We trust Linode because they keep it fast and they keep
it simple. Check them out at Linode.com slash ChangeLog. Welcome back, everyone.
You are listening to the ChangeLog, a podcast featuring the hackers, the leaders, and the innovators in the world of software.
I'm Adam Stachowiak, Editor-in-Chief here at ChangeLog.
In late 2019, Bill Nichols, a senior member of the technical staff at Carnegie Mellon University with the Software Engineering Institute,
published his study on the 10x developer myth. Today, we talk with Bill through all the details of his research. Is the 10x developer a myth? Let's find out. So Bill, you are a senior member of the
technical staff at Carnegie Mellon's SEI.
That's the Software Engineering Institute.
We're here to talk about your paper, which you published in IEEE Software back in September,
called The End to the Myth of Individual Programmer Productivity.
A good title caught my eye.
First of all, thanks for joining us.
Oh, you're welcome.
Thanks for having me here.
Good to have you, Bill.
So there is a lore, there is a myth, or maybe it's not a myth.
You're here to end the myth.
We're going to talk about it, about the 10Xer.
Maybe even more than 10X, but definitely that particular term 10Xer has become a thing wherein certain programmers are massively more productive than other programmers.
And it's really based out of anecdotal or experiential evidence in many cases.
And I thought it was interesting that you actually went and did some research, did a study,
and so we thought we'd bring you on and talk about it.
First of all, I'd love to hear why conduct this study and why this is something that interests you.
Okay, well, let's start off with, I'm working on a research project ongoing called SCOPE.
I won't go into the acronym, but it is looking for what are the factors that affect software
development?
What affects the cost?
What affects the productivity, the durations, and so forth?
And one of the things I wanted to look at were some of the smaller micro factors
that we could look at almost experimentally. And I had this
data sitting around from the personal software process course we
taught for 20 years. I had essentially 20
years worth of data. I think in this study I used about 10 years
of it. And I wanted to see if we could
find evidence for the effect of the individual programmers. And a lot of what turned up from that
study was mostly a null result. I couldn't find factors that affected programmer productivity.
For example, we looked at things like experience.
No real effect.
To the extent we saw anything,
it looked like experience actually
was inversely related to productivity.
Okay, I have some potential explanations for that.
Yeah, well, what it probably means
is that the best programmers were being promoted
out of programming into management.
Well, that's definitely something.
We've seen that happen, yeah.
It was a very small effect, just to be clear.
It was statistically significant, but very small.
The other thing to keep in mind is that during this entire period, the number of programmers was growing very rapidly.
So there might have been an effect based on training.
More people were going through universities, were being better trained.
There are a number of things it could have been, but it wasn't a big effect.
So I decided I wanted to look at this in more detail because it just seemed kind of odd.
So I started looking at how big is the variation in the programmers
themselves. Well, let me back up and tell you a little bit about what the
data set is. When we taught the personal software process, we were actually trying
to teach them how to estimate work and how to improve their quality, how to
reduce the number of defects they put into
tests, how to find defects before they got there, and so forth. And the program, we used programs
to reinforce the learning as practicals. And we used the same programs for a number of years. So
we had essentially several thousand programmers go through this course,
all of whom programmed the same 10 exercises from the requirements.
Now, they used their own development environment, their own programming languages, and so forth.
But that was kind of a unique set of data.
And I decided to try to take advantage of having this large cohort of developers
who did exactly the same program and look at how they varied.
What kind of exercises are these? Are they small scope things?
They're relatively small.
I would characterize them as the kind of thing that a reasonably experienced programmer
should be able to complete in an afternoon.
Okay.
So somewhere between two to six hours would be the norm.
If you have some trouble, it might take a little longer.
We had people who sometimes took up to 10 hours, and they were outliers.
But basically, if you were thinking about, these were not terribly challenging problems.
They were non-trivial, but they weren't terribly challenging.
Things like you had to program a linked list and use that in a program to read in some data and sort.
Some basic introductory things like that.
Some of the later programs were more involved.
I think the most complicated was the next-to-last exercise, where they had to build a matrix inversion. They had to solve a set of linear equations for multiple regression
using a matrix inversion.
But they were given the specifications.
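To give a feel for that exercise, here is a minimal sketch of fitting a multiple regression by solving the normal equations. It is only an illustration of the kind of program described, with made-up data and numpy standing in for the hand-rolled matrix code a student would have written; it is not the actual course material.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Solve the normal equations (X^T X) b = X^T y for the coefficients b.

    The exercise reportedly had students write their own matrix-inversion /
    linear-solver code; numpy.linalg.solve stands in for that here.
    """
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    coeffs = np.linalg.solve(X.T @ X, X.T @ y)  # equivalent to inverting X^T X
    return coeffs

# Illustrative data: predict effort from two size measures.
X = np.array([[10.0, 2.0], [25.0, 5.0], [40.0, 7.0], [55.0, 9.0]])
y = np.array([4.1, 9.8, 16.2, 21.9])
print(fit_multiple_regression(X, y))  # [intercept, b1, b2]
```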
I was going to say, are these things being taught in the class?
These are things that they tell you the algorithm
and you go ahead and implement it, right?
Right.
They were taught, they were given the algorithm.
They were given any special knowledge.
Like when they had to do a Simpson's integral, which is a numeric integration.
But they were given the algorithm.
So they didn't have to do the hard analytic part, they just had to program it.
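Simpson's rule is a good example of how small these exercises were. Here is a rough sketch of a composite Simpson's rule integrator of the sort described, purely illustrative rather than the course's reference solution.

```python
def simpson(f, a, b, n):
    """Composite Simpson's rule: integrate f over [a, b] using n subintervals.

    n must be even; the weights follow the 1, 4, 2, 4, ..., 4, 1 pattern.
    """
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Example: integrate x^2 over [0, 3]; the exact answer is 9.
print(simpson(lambda x: x * x, 0.0, 3.0, 100))
```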
And we found that pretty much anyone who is able to get through a university and work as a professional programmer can do these sorts of exercises.
They shouldn't be those out-there type of problems
where you have to be a genius to figure them out.
In fact, it should be pretty much normal.
So I think the context of this course is helpful as well
because at first I was like, well, why are there people,
you said there were people with an average of 3.7 years of experience,
but there were people with 36 years experience.
And I was thinking, why would somebody with 36 years programming experience
be taking a college course?
And I think the context of this personal software process.
So tell us about what you were teaching.
This PSP was not something I even heard of before.
So maybe it's very common and I'm out of the loop.
Maybe it's old-fashioned, I don't know.
But tell us about that and the kind of people that were in the class.
All right.
Well, PSP, Personal Software Process, wasn't all that common.
But it was a technique we developed at SEI, actually mostly the work of Watts Humphrey.
This is a little bit of an aside.
You probably have heard of The Mythical Man-Month, the book The Mythical Man-Month?
Yeah, Frederick Brooks.
By Frederick Brooks?
Okay, what not everyone knows
is Fred Brooks was fired
from that project.
The one that blew up on it.
And he wrote the book about it.
Watts Humphrey was the guy who took over the project
and brought it in.
Huh, I did not know that.
It's a little bit of history.
So I worked with Watts for a number of years.
It's part of what got me to go to the SEI.
And I took the personal software process.
It was really part of teaching developers how to plan and manage their work
so that they could work on an industrial project
and know how to talk to their management, negotiate schedules,
bring the projects in on time.
And the two key things we taught were estimation and quality.
The main things in quality were applying design techniques,
then showing them how to do disciplined reviews of designs and code
so that you'd find your defects before a test.
And what we tried to do was show them, hey, you can actually write a program
and you can get this to compile first time without getting compile
warnings. Now, in the days before IDEs, that was a big thing.
Yeah. And we showed them that you should be able to
get into test and it should run the test first time.
It's not going to be, you're not going to get in there and run the test to find out where your problems
were and fix the ones that hit. You're going to actually get it to run the first time.
And the thinking here is this makes the entire development
process more consistent, makes it more predictable,
and gets you a much higher quality product in the end.
So there were a handful of adopters,
and the people who were in these PSP courses were from industry.
There were some university classes, but 90% of the data was from industry.
It was from organizations that we were working with,
trying to teach them how to manage software projects.
And part of the training was taking the PSP course.
So the developers on the team would all take the course. They'd learn these
techniques for how to measure the work, how to estimate it, how
to improve quality, and then we would use that to help them manage their project.
And we had some very notable successes with that.
I could go into a few examples, but probably the most spectacular was the rewrite of the Bursatec trading engine.
You probably haven't heard of it, but Bursatec is the stock trading engine they use in Mexico.
And they were falling behind, and there were really only a handful of engines in the world.
A handful of those big trading engines for stocks, derivatives, and so forth.
One of them was in Mexico.
And their choice was either rewrite it because it couldn't keep up with modern times,
or buy the one they were using in New York or London.
So they decided to rewrite it.
And they, with a relatively small team,
I think maximum was a couple of dozen people,
it took them a little over a year.
They put together probably about 500,000 lines of code,
and they had to run it in shadow mode before it went operational.
In the first year of operations,
it had a total of downtime that was on the order of minutes. Literally minutes of downtime.
And their performance was about a factor of 10.
They could handle about 10 times the number of transactions of any system
that existed at the time. So that was part of the
benefit of doing things right the first time, doing things in a disciplined
manner. And that was kind of where all of this PSP training
was coming from. But what that really led to
was I've got this large body of data from the PSP training
that we could repurpose for other things.
Like knowing how well people can really estimate. What were the results at the beginning
versus the end of the course? We can measure things like how many lines of code do normal programmers tend to be
able to write? What's their variation? We haven't finished the study yet, but we started looking at
the differences in the programming language. All these people using a variety of different
languages. Some used C, C++, a few in Fortran, all sorts of boutique languages, Java, Basic.
How do these differ as far as how long it took them to
produce the code, how big the programs are, how defect-prone they were?
Lots of different stuff. All sorts of different things you could look at observationally
with this data, yeah. I know we're here to talk about productivity, but on the
estimation side, give us the summary. Is estimating a reliable thing you can do? Oh yeah, well you can get
reasonably good, but there are limits. We found initially people were consistently underestimating
the work involved. Act surprised. Exactly. Yeah, okay, what we taught them was, well, if you can get a good
estimate of the size, that'll be a really good proxy for how long it's going to take. It's not
going to be perfect, but it's going to be a good proxy. So we focused on how to teach them
size estimate. We used a t-shirt technique, you know, do a preliminary type of design,
not a real design, but just what are the pieces you need? How big are these pieces? Are they
bigger than a bread basket? Small, medium, large? We've got an inventory of
stuff you've already written. Put them into bins. Sort them. What does a big look
like? How big is a medium? And by just
throwing these into bins and aggregating them, you can get a pretty accurate result.
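The binning idea is easy to picture in code. The sketch below is in the spirit of that PSP-style size estimation, but the historical sizes, bin cut-offs, and part labels are all made up for illustration.

```python
from statistics import median

# Historical parts you've already written, with their actual sizes in LOC.
history = sorted([35, 42, 60, 75, 90, 110, 150, 170, 220, 300])

# Derive what "small / medium / large" means from your own history.
third = len(history) // 3
bins = {
    "S": median(history[:third]),
    "M": median(history[third:2 * third]),
    "L": median(history[2 * third:]),
}

# Conceptual design: just the pieces you think you need and their rough sizes.
planned_parts = ["S", "S", "M", "M", "L"]

estimate = sum(bins[size] for size in planned_parts)
print(f"bins: {bins}, estimated size: {estimate} LOC")
```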
We found that they could normally get to about, they would typically start
off with a variance of about 50%, and we could get it to about 30%.
Yeah.
But at the end, they were centered on the average instead of way underestimating, and you didn't get the big outliers.
So it was not a spectacular result, but it was pretty consistent.
We showed, yeah, this really does kind of work.
It also says that the best you're ever going to do, probably on a project of about a day,
is going to be somewhere around plus or minus 30%.
That's just life.
A lot of variation in how long it's going to take you.
And these are personal estimates too, right?
These are not team estimates.
Right.
These are personal estimates. And that really should have been the hint
what I was going to find when I started looking at individual productivity.
Why should that have been the hint? Well, we had
these high variations, these high variations in their estimates
and that kind of suggests that there are other factors
involved here.
It's not just a matter of estimating.
It really suggested, it suggested without demonstrating,
that people would sit down to work
and it was probably not going to be what they thought
or they might be running into days when they're slower
or days when they're faster.
And when I started to really dig at the data,
that's where it kind of
got interesting. What I did initially was I said, okay, everyone's done the same program.
What was the variation of how long it took them to get that program done?
So I'm not going to complicate this by looking across different languages. Let's just look at
the people who use the same language. So that's why I trimmed it down to about 13% of the data set
and just looked at the C programmers. I can do this with C++ and Java with slightly smaller
sets. Beyond that, it gets into really small data. But let's use the same program. Okay,
now, it turned out there were 494 of these developers. Okay. So I ranked them one through 494.
I used 500 because it's close and it's a round number.
There were essentially 500.
Okay.
And I could actually, rather than looking at the size of the program and lines of code or anything,
I just said how long did it take them to finish.
Okay.
Then I did the same thing for the second program
and the third.
And one of the things
that really struck me
was that the distribution
of the times
was just remarkably consistent.
That is,
the ratio of the fastest programmer
to the 25th percentile,
the median,
the 75th,
they look like almost straight lines if you plot these across the 10 programs.
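That ranked analysis can be sketched in a few lines. The completion times below are simulated, not the study's data; the point is just how the percentile-to-fastest ratios are computed for each program.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: rows = 500 developers, columns = 10 exercises,
# values = hours to finish (log-normal gives a long slow tail).
times = rng.lognormal(mean=1.3, sigma=0.5, size=(500, 10))

for program in range(times.shape[1]):
    t = np.sort(times[:, program])
    fastest = t[0]
    q25, med, q75 = np.percentile(t, [25, 50, 75])
    # These percentile-to-fastest ratios were what looked "almost like
    # straight lines" across the ten programs.
    print(f"program {program + 1}: "
          f"25th/fastest={q25 / fastest:.1f}, "
          f"median/fastest={med / fastest:.1f}, "
          f"75th/fastest={q75 / fastest:.1f}")
```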
There was a lot of jitter when you looked at the
outliers, the ones in the slowest 25%.
A lot of variation there. On the edges.
But everything else looked pretty consistent. It's funny how that is on any plot
really.
When you've got the extremes, you've got the far right, the far left, and the middle,
you tend to have a lot of jitter on the fringes.
Yeah, but the thing is, on the fast side, you can only go so fast.
So that was kind of constrained.
You're not constrained on the slow side.
So that's where you saw most of the real variation.
There's no control for quality in there,
because I always think of the three little pigs, you know,
and the big bad wolf, like that pig who built their house out of straw was done first.
So, I mean, you can schlock
together a solution that's not
quality and that doesn't really mean you're being productive.
It just looks productive.
That is true. To the extent that we
had the controls on quality,
we were measuring the number
of defects that turned up when they ran tests.
So I could look at those independently
and we were kind of guiding them along the way
that they were doing certain quality things.
But you are correct.
There's no guarantee that they were
consistent in how
carefully they were handling all of the edge conditions.
And we
basically told them, don't worry about all the edge
conditions. Your job is to get this program to run.
Don't gold plate it.
Gotcha.
So we were trying to guide them to give us,
if not the minimal program that would solve the requirements,
get something not far from just the most basic
that satisfies these requirements
and runs correctly by completing these specific test cases.
So all C programmers based on the subset,
all producing the same program with similar control patterns,
at least,
and understandably similar defect testing in terms of the test written.
So as close as you possibly could be to control.
Yeah.
We applied a certain number of controls to it.
We weren't deliberately trying to control it because this was not meant to be an experiment.
Well, my next question on that front is, did they know they were being tested on this front?
Did they know they were being a data set for productivity?
Obviously, they're in this course, and it's about that.
That's the beauty of it. He backfitted.
Well, there was no
real intention
of ever making it
a test for productivity.
That was something
that we looked at
after the fact.
That actually kind of
makes it more valid
because they were just
moving at their regular pace
versus like you telling me
I'm in a race.
I'm going to race
as fast as I can.
Exactly.
And there is no intention.
The whole point of this
exercise was to help them do something predictably.
We wanted them to be in their normal development environment.
So what you found were the, let's talk about the outliers a little bit,
because in my mind, that somewhat lends credence to the lore
of the unicorn developer who is the 10Xer, who is the outlier?
Isn't that kind of the mystique
that people are trying to hire those outliers
or find the people who really are that way?
There you go.
That is the mystique.
Now, the funny thing is,
when I looked at the data,
if I looked at these things
using a lot of different techniques, some of them I borrowed from social science on looking at the variation.
Tim Menzies told me, no one's going to understand this and it's too complicated.
Let's try to simplify it.
So that's why I started doing things by ranking them, just doing ranked analyses.
It turned out the same people were not in the same position in all ten
programs. Nowhere near it.
One of the plots I'm the proudest of, if you look at
the plot number two in the paper, what I did was
I took the programmers' median ranking
that is, where did they finish among their peers on their median?
They basically did 10 exercises.
Was their median number 10?
Was their median number...
Well, I ranked their medians.
I'm sorry, what I did was I took their medians and I ranked them.
I ordered them, sorted them.
What I did then was I took their range.
So I used basically a binomial type of thing, non-parametric.
I think what I used was: take the 10 programs, throw out the two smallest and the two largest,
and then you say, what are the odds of being above this 8 out of 10 times?
And that gives you like a 90% confidence range.
Their true median will be somewhere in there.
And their range, this range of where you could place them, it was very hard to pin down.
It turned out that that range was pretty close to about a quarter of the sample.
And if you took their full range, it was about half the sample.
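The non-parametric range he describes comes straight from binomial order statistics: with ten observations, dropping the two smallest and the two largest leaves an interval that covers the true median roughly 89% of the time. A small sketch of that calculation:

```python
from math import comb

def median_coverage(n, drop):
    """Probability that the true median lies between the (drop+1)-th smallest
    and (drop+1)-th largest of n independent observations.

    Each observation falls below the true median with probability 1/2, so the
    coverage is a sum of binomial terms.
    """
    return sum(comb(n, i) for i in range(drop + 1, n - drop)) / 2 ** n

# 10 programs, throw out the two smallest and the two largest times:
print(median_coverage(10, 2))  # ~0.891, i.e. roughly a 90% confidence range
```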
So that means the same developer, take your 250 out of 500.
Some days he's going to be at 250.
But almost as likely going to be at about 125 or 375.
And if you go down to the very best programmers,
the ones who were typically the fastest few,
their numbers typically ranged up to around 100.
Where you saw the most variation was out at the slow end.
And you should be aware that not everyone who took the class was a professional programmer. We had some people here who
were managers who hadn't programmed in a while. We had some people who
might have been with the process improvement group who weren't current.
So it's not too surprising that we had some outliers on the slow side.
But what the data was telling me is that, yeah, there's definitely
a difference between your best programmers and your weakest programmers.
It gets really hard to tell the difference between a good programmer and a pretty good one.
You can kind of tell an average programmer from your very best, but you better have a few data points because there's a good chance that you're going to get them wrong if
you just take one sample.
Did you have any of this spectrum across the years? So it was, like you mentioned, over a decade, right? Ten years, is that right? So it was over a decade. Was there
any variance in terms of, like, when? Like, the first five years it was better or worse? Or was the year range by any degree different?
The honest answer is I haven't looked at that carefully.
It's actually a pretty reasonable question.
I don't know if I have enough data points to get that accurately.
But it's a very good question because if there is an effect in things like the training,
that's the kind of place where it might turn up. Yeah. I was just thinking in terms of just the fact that, you know, over time things happen inside of people's lives that may not in others.
And maybe, you know, first five years there could have been a, you know, global pandemic happening or not.
You know, so things will obviously change human behavior, you know, so that might be an outlier or that might be an indicator of
data change. Because
someone could go through a divorce, you know,
have a loss of life in their
family. Different factors that weigh on them
personally over time.
Well, nobody was in it for 10 years, though.
These are different students. Right. That's a good
point. So
one of the things that I was looking at
is the difference between what I would call between and within.
In the social sciences, you'll have differences between groups, things like socioeconomic status.
And you'll have some differences between groups, but you'll have some differences within groups.
My group was the individual programmer. Each programmer was his or her own group.
So this was a repeated measures approach.
They took 10 samples, repeated measures.
And what I say within, I say,
what is the variation of an individual programmer?
What is that programmer's variation
across these 10 exercises?
And those all were being done within a few weeks typically.
So that was a very small time span.
I did not get a chance to
look at their behavior over time. We didn't get a chance to revisit that. But we did get a chance
to look at what was the behavior of this overall super group of programmers, what was the variation
there, and what was the variation of the individual programmer. And the funny thing is when I looked at the total variation, when I did things
like regressions and very conventional measures of variation, it was almost 50-50. That is, almost
half the variation was what was total group-wide among the 500 developers, and about half the
variation was within individual developers. So it told me that the individual programmer varied from himself
almost as much as the different programmers varied from each other.
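That between-versus-within split is essentially a one-way, repeated-measures variance decomposition. The sketch below uses simulated times with equal-sized developer and day-to-day components, so the roughly 50/50 split is built in by construction; it only illustrates how the decomposition is computed, not the study's actual numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

n_devs, n_programs = 500, 10
# Simulate: each developer has a stable personal speed (between-developer
# variation) plus day-to-day noise (within-developer variation).
dev_effect = rng.normal(0.0, 1.0, size=(n_devs, 1))
noise = rng.normal(0.0, 1.0, size=(n_devs, n_programs))
log_times = dev_effect + noise

grand_mean = log_times.mean()
dev_means = log_times.mean(axis=1, keepdims=True)

ss_between = n_programs * ((dev_means - grand_mean) ** 2).sum()
ss_within = ((log_times - dev_means) ** 2).sum()
ss_total = ((log_times - grand_mean) ** 2).sum()

print(f"between-developer share: {ss_between / ss_total:.2f}")
print(f"within-developer share:  {ss_within / ss_total:.2f}")
# With equal variances for the two components, the split comes out near
# 50/50, which is roughly what the paper reports for the real data.
```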
Now, again, to be clear, that doesn't mean that some programmers
really aren't better than others.
It tells you that there's a lot of noise in this data.
And if you try to pick out the superior programmer from
one sample or a couple of samples, you're probably
just chasing noise. And that's kind of where I had the
observation that all of these experiments that people have run before
and I cite them in the paper, DeMarco and Lister, some
observations from Curtis, the original paper by Sackman.
They all basically took a snapshot. How did the
group of developers do on a program? And I said, well, look,
I reproduced their data. On any one of my programs, it looks
very much like what they were showing. But if I do use the
same people on another one,
I get the same distribution.
It's just all the people moved around.
So in a sense, I can replicate their data,
but it seems to indicate that it means something different,
that we've been ignoring that huge variation
of how the individual performs from day to day.
Different programs, different problems.
Maybe you didn't sleep well the other night.
A more typical problem might be you made some boneheaded programming mistake
and you just couldn't see it.
You didn't see it until someone walked by your desk and said,
what's the problem?
And they saw it right away.
How many times have you programmed and that's happened to
you? You just get stuck for a while. Too much. Too often. Too much. And all I'm telling you is,
take some shower time to get past it. Yeah, take a walk or a bike ride. That's exactly the kind
of thing we told them to do. So when they did that, by the way, they stopped the clock. They were taking the time.
It's not the wall clock time or the calendar time.
It's literally stopwatch time.
They were using a tool where they would hit a timer button.
I'm working on this program.
I'm going to take a break.
I'm going to stop.
Gets rid of a lot of noise.
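The timing discipline is simpler than it sounds: only time between a start and a stop counts. A toy sketch of that kind of time log (not the actual PSP tooling) might look like this:

```python
import time

class TaskTimer:
    """Toy version of a PSP-style time log: only time between start() and
    stop() counts, so breaks and interruptions don't pollute the data."""

    def __init__(self):
        self.elapsed = 0.0
        self._started = None

    def start(self):
        self._started = time.monotonic()

    def stop(self):
        if self._started is not None:
            self.elapsed += time.monotonic() - self._started
            self._started = None

timer = TaskTimer()
timer.start()
# ... work on the program ...
timer.stop()       # going for a walk: the clock stops with you
timer.start()
# ... back at the desk ...
timer.stop()
print(f"time on task: {timer.elapsed:.1f} seconds")
```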
So the message here is that that kind of variation, that kind of problem is normal.
If you see that happening to yourself, it just means you're normal.
In this new world of remote first, more and more teams are looking to build video into their apps.
Everything from media publications, education and learning platforms to communities and social platforms. If you're
trying to build video into your app, you're probably deciding between having full control
by building yourself or faster dev cycles with an out-of-the-box platform. Well,
Mux gives you the best of both worlds by doing for video what Stripe has done for payments.
In a world of complicated encoding, streaming, multiplexing, and compression,
Mux simplifies all things video to an easy-to-use API to make beautifully scalable video possible for every development team. Mux lets you easily build video into your product with full control
over design and user experience. Videos delivered through Mux are automatically optimized to deliver
the best viewing experience, and you don't have to deal with the complaints about rebuffering or videos not playing.
Get started at get.mux.com slash changelog.
They're giving our listeners a $50 credit to play with.
That's over an hour's worth of video content that you can upload and play with.
And check out all their features, including just-in-time publishing,
watermarking, thumbnails, and GIFs.
To get the credit, just mention changelog when you sign up or send an email to help at mux.com and they'll add the credit to your account. Again, get.mux.com slash changelog.
So Bill, if you were to summarize your findings,
I know we've talked about them some,
but if you were just to lay it out,
here's my takeaway from my findings of doing this research and digging into this data I already had.
Could you give us the executive summary?
What's your take on this scenario based on what you found?
My short summary would be,
there's a lot of variation in program developers,
and you're going to see about as much variation with one developer within that developer
as you're going to see between the developers on your group.
So that's one of the things that you just have to be willing to deal with
when you're going to be planning your projects, that there's going to be a lot of variation. And the big source of variation is really just the individual variance.
That'll be at least as big as the problems you have
with some being better than others, as long as everyone is capable,
as long as you have people capable of doing the work.
What's your goal with the findings?
Who are you trying to communicate primarily to?
Are you trying to help teams be more efficient? Are you trying to help larger industry
users, the gigantors versus
the small teams? Who are you trying to really influence with this summary?
That's a good question. And I wasn't really trying
to influence anyone in particular, but if I were giving advice, I would say what you really need to do is focus on hiring good quality people and don't obsess on trying to find that unicorn developer.
Just to be clear, that unicorn developer might be able to do things that your normal developer can't.
You may be able to find that unicorn who can solve the super difficult problem.
I mean, how many Linus Torvalds in the world really are there?
But if you're trying to put together a team that's going to build software, you need to find capable people.
You want to be above a cut line of capability. But beyond that,
you're not going to help yourself by trying to find the best 5% programmers and stacking your
team. It just isn't going to work. Your best bet is to get a good set of programmers capable
and then put them in an environment where they can succeed. And there are a lot of other things
you can do environmentally to make the work process more effective.
This reminds me a lot of what you're suggesting here
from that movie Moneyball.
Anybody seen that movie Moneyball?
Mm-hmm.
Where the data is essentially like a lot of the teams
are trying to optimize for the Lou Gehrigs,
the amazing, the best ever, and not the base hitters,
not the people who can be consistent.
And it sounds like what you're describing, Bill, is focus on people who can do the job and be consistent at it and get base hits.
Potentially even doubles because base hits lead to more people on base and more people to use the baseball analogy fully to get back to home.
That's the point of baseball is to get back to home.
Right.
And get a run. Well, it seems like on top of that, so one thing, I mean, I think about NBA teams because
there's a strategy of just stacking all these all-stars on a single team.
You know, you get five of the best players in the league on the team.
And what you find is that that can backfire because they have interpersonal problems.
They all want the ball all the time, et cetera, et cetera.
Sometimes it works out, but oftentimes it doesn't. However, I think what Bill is saying here is, like, in the NBA they have
much better stats, right? Like, you can look at the guy who shoots 65% from threes. Well, nobody
really does that. Really good for a three-point shooter. And say, well, I know I got the best three-point
shooter. But when it comes to programming, he tried. Bill tried, and what he
found was a lot of noise, right? He can't even get
the stats right.
That's really very perceptive.
A couple of things that I would
build on here is
that's one of the metaphors that I've
used over time. I was one of the
I don't know if you guys have ever read
Bill James. I was reading him
back when he did his first historical baseball abstract.
And when I saw some of this measurement, I said, ah, this is what I've been looking for.
Really?
Because it solves the problem.
And, well, let me go back to another 1980 metaphor, and that was the Miracle on Ice team.
Oh, yeah, 1980.
Herbie Brooks said, I don't want the best players, I want the best team.
He was criticized for not taking
some really good hockey players.
But he had a specific type
of game he wanted to play. He needed certain types
of skills, and he needed a certain mix of skills
to make it work. And for
a software development team, it's
really the same thing.
It's not like you have to have the
Harlan Mills surgical team.
But you do need to have certain types of skills.
You need to have people who know how to use different tools.
You want to have people who are really good at design.
We've been most effective when we had one or two people who were really, really good at reviewing work.
They might have been really good at reviewing designs.
They might have been really good at reviewing code. And when you put
those on a team, it's where you get that situation where the
whole really is more than the sum of the parts.
So when you put together the right set of skills, you get
a team that's much better than anyone individually.
It would make sense, too, to sort of stack the team up with that kind of team or that kind of skill set across the board because one person can't handle that much cognitive load anyways.
Exactly.
They can't be all the skills at once.
While they may have them all, they can't be them all at once.
Exactly.
And even more to the point is you can't teach people to be intelligent.
I mean, you've got to have a certain base capability.
But most of these other things, like using a tool, using Git or other revision control effectively, is a skill.
It takes skill and it takes repetition.
Doing good design involves a lot of skill and a lot of practice. When you compose teams who have members on them
who have each really worked at honing a certain skill,
they can come together and you can have a more effective team
because, hey, now I know someone I can go to
if I have a certain type of problem.
And going back to the baseball analogy,
if Babe Ruth hits a lot of home runs, they're worth a lot more
if he's got a few guys in front of him getting on base.
That's true. Absolutely true.
What are your thoughts on people trying to derive insights from a repository of code?
So rather than study the humans, study the artifact, the output.
There's a lot of work going on in looking at those artifacts,
and I certainly think there's a lot you can infer from them,
but you've got to be cautious too,
because you don't necessarily know how it got there.
All you have at the end of the day is the artifact.
So what I've tended to focus on is the entire development process.
What are the steps that you took to get there?
Now, sometimes you can infer other things from the artifacts based on,
depends on things like how many times you check them in.
What kind of other information you're gathering?
Are you getting all of the defects that are recorded?
Are they only being recorded after the release?
Are you getting defects recorded in test?
Which version of test?
So I'm not going to say you can't get things from that.
I'm just going to say that it is limited
and the thing that you really have to be aware
of is what were the
limits based on what information
was recorded.
There's a whole discipline of mining software
repositories.
A lot of that work is going on.
I enjoy
sometimes trolling through
subreddits, especially of cryptocurrency projects,
because there's a lot of people who are fanboys of a project,
which is ultimately a software project
with a financial thing attached to it,
and yet they are unable to judge a software project.
And they'll often post how many commits
they've been making this week on their
GitHub repository as some sort of
benchmark
that's meant like, hey guys, 105
commits this week. We're killing it.
It's like, well, what is a commit?
Not every commit
is created equally. It's just funny
because you have these
people misapplying.
Especially if 100 of them were typos.
Just saying in documentation.
Or you just write a bot that just makes it commit every once in a while
to make it look like you're working on it.
There are two things I would say about that.
That's a very insightful comment.
The things that we have coached the development teams on is,
okay, you're taking all this data, you're recording all this data.
Don't use this as a contest.
Don't use this to rate people.
Don't even think about it.
In fact, it's better when you report this to management that you give them things like aggregated numbers that are really on point for: are you making progress?
Are you doing the things you said you were going to do?
Because if you start focusing on your numbers,
you're going to put the cart before the horse
and you're going to be doing the wrong things.
So it's really a matter of you're taking these numbers
entirely as a way of getting insight into what you're doing.
You don't want that to be the end in itself.
Now, if you just are recording kind of data that you're using and you're seeing these numbers, then you typically will see fairly
predictable distributions. So over time, you're going to see a distribution, small, medium, large,
for example, and you can kind of add them up together because as long as the distributions
are consistent,
you'll get reproducible results. It's when you start artificially manipulating those outputs
that the data becomes useless. Yeah, and how often does your own personal bias kind of play a role
in your findings? You almost have a preconceived assumption, which is your bias going in,
and then you kind of... You're digging for what you want to look for, I don't know, handling the data
in such a way that you come to the outcome you assumed you would come to. You've just described
one of the big problems in the replication crisis. Have you
read much about that? No. Oh, okay. Look up replication crisis. There's
a lot of literature on that, especially since about 2005.
A Stanford professor, don't ask me to pronounce his name, Ioannidis, was noting that it was very hard to find results in social sciences that replicate.
And a lot of that is because a number of very predictable things.
You're going to have the file drawer effect.
If you don't find what you want, your study never sees the light of day.
There are all sorts of things that people call like P-hacking.
P is the measure of statistical significance. So there are easy ways
that if you want to get a result, you can manipulate your data
to kind of nudge it in that direction.
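A quick simulation shows why that matters. If every comparison is pure noise and you report only the best-looking one, "significant" findings show up far more often than the nominal 5%. The study sizes and counts below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

hits = 0
n_studies = 1000
for _ in range(n_studies):
    # 20 comparisons per "study", all pure noise (no real effect anywhere).
    p_values = [
        stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
        for _ in range(20)
    ]
    # The p-hack: report only the best-looking comparison.
    if min(p_values) < 0.05:
        hits += 1

print(f"'significant' studies: {hits / n_studies:.0%}")  # roughly 60-65%, not 5%
```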
That's why you have to be wary of documentaries, because it's very easy to watch pretty much any documentary
and come out a believer, not because of your own personal inherent beliefs prior to, but because
the whole point of the documentary is to lead you to a conclusion.
And that conclusion is their conclusion.
And in some cases you really can't help it because they've presented all the data in that
direction versus the other.
There's no argument.
And the hardest part is sometimes they're actually right.
And then other times they're wrong.
And it's like, how do you know? How can I know? You're just so convincing.
Exactly.
It's really frustrating when you have
all these studies out there.
And how many times have you watched
the news on the latest medical
study? And
okay, is fat good for you or not good for you
today?
Who knows?
We know that one.
Just kidding.
That's good, man.
We do know that one.
But we didn't always.
Well, I mean, even in today's climate, too, when you think about how you push back against coronavirus, in a lot of cases, a lot of the experts say testing.
Because if you don't know where it's at, you don't know how to fight it or how it's mutating or how it's affecting the
population,
who it's affecting all these different factors.
And it's just sort of,
if you don't have the tests,
you don't have testing in place.
Yeah,
exactly.
You know,
you can social distance all you want,
which is a great precaution.
But until we have conclusive testing in any given population,
you can't understand where the spread is and how it's affected the
population.
And you won't really know, when can we stop social distancing?
Exactly.
How are you going to measure it?
So those all touch on a lot of the problems we have
in software engineering trying to get data.
You talked a little bit before.
One of the metaphors I've used is you can watch baseball games.
How many people are at a typical ball game?
20,000, 30,000?
Some of the big cities, maybe 50,000?
Big games, you might have a couple of million on TV.
And for generations, we have evidence now that for generations,
there were a lot of serious mistakes in the valuation of those players.
Now, these are players who we're watching do their jobs.
They're keeping meticulous records. We're reading the summaries in the newspaper the next morning
and we can still make mistakes in valuation.
What does that say about the rest of us?
We're not doing so well.
Well, it's just the state of being, isn't it?
Yeah.
Yeah.
Well, let's bring it back to some more studying you've done before.
This study you share from, I believe you mentioned it before, Menzies from 2016,
"Are Delayed Issues Harder to Resolve?"
And you're kind of revisiting this idea of cost-to-fix effects throughout the
lifecycle of a software project.
So kind of break into what this acronym DIE stands for and what it's all about.
Well, Tim came up with the delayed issue effect.
And in this case, what we were looking at were some industrial data.
I had data from, I think it was about 190 different software development teams.
We had all of the defect data that they had collected throughout their processes.
And they were from a lot of different industries, very heterogeneous data.
So it's got some of that noise.
But we knew, was this defect injected in the requirements?
Was it injected during the detailed design?
Was it injected during the design of the product?
Somewhere during the testing?
Was it injected when they wrote the code?
Where did you find it?
Was it in one of the reviews?
Was it in an inspection?
Was it in a test?
Was this post-development?
So we looked at the activity difference between when the defect was injected and when it was removed.
Now, we happen to have stopwatch-level defect information, how long it took the developer to fix that problem.
That is, once it came back to the developer's desk,
how long did this actually take to fix?
Upfront acknowledgement, that's not including the wasted time
the user had to spend figuring it out, writing up the report,
or the bookkeeping the organization might have gone through
taking in problems,
but the actual developer facing time.
And we found that by and large, it just wasn't growing that fast.
It was growing, but typically if you found a defect during your development, it was pretty cheap.
If you found it after development, it took a factor of typically a couple of times as big to fix. It took longer to find and fix
if it was a test type defect compared to, say, a review defect.
But once it was any kind of test or field, it was
relatively flat. So one of the inferences would be that
modern development has really flattened that curve.
I don't know how long you guys have been programming.
When I started programming, my first program I wrote on punch cards.
You beat us.
I skipped that step.
Most people have by now.
I didn't use a CRT until I was like a sophomore in college.
CRT? What's a CRT?
I'm just kidding.
Cathode ray tube. I'm just joking.
But I learned how to program with Emacs.
It was really hard in the early days
when you had to do these builds on programs
and they would take forever
because you had to recompile everything based on cards
or you had to submit it to a batch system.
The modern development environment,
with most things at your desktop,
the builds are so quick,
a lot of things that really took a long time before have been flattened.
And developer-facing time on fixing those things
isn't all that different.
Yeah, for a field of defect,
it can be hard to find the actual source of the problem.
Yeah.
But those factors of 100 to 1,
10 to 1 to 100 to 1 that came out of some studies
in the 80s, we don't see anything like
that, at least in this data. So it tells you that
either something has changed or there was a bias in how they
collected their data.
Which we've talked about.
Yeah, but they collected their data in the 1980s.
It was a very different world.
What I find interesting about your historical touch points there,
punch cards, gone.
CRT is pretty much gone.
Emacs, it's alive and kicking, baby.
It's alive and kicking. It sure is.
Richard Stallman.
So a lot of the software,
I mean, we definitely have modernized quite a bit,
but it's amazing how there's certain pieces of software
that are still relevant today,
despite all the changes in modern development techniques,
Vim and Emacs being two of the staples.
Yeah, if you're a programmer,
odds are you know at least one of VI or Emacs.
One thing mentioned in this study, too, and I don't know if it's significant or not, is the TSP versus PSP, which is the team software process.
Is it similar?
Are they parallel studies?
What is this?
Well, the team software process is really the industrial version of personal software process.
We taught people PSP so that they would learn how to do things like
record data, how to make estimates. The team software process was how to do this in a real
industrial environment so that you could coordinate with your team members, you could report up to
management, you could make your estimates, you could see if your project, if an overall project, was really on track.
So the PSP was sort of subsumed into the team software process, but the team software process was for planning really big programs. I have some TSP data for projects that spanned multiple
years. So you would look at the data very differently. And one of the things you couldn't get
from that is you're not going to have the same programmers developing the same thing. It just
doesn't happen. In the real world, everyone is developing a different piece of software.
So they aren't really apples to apples comparisons anymore, and you can't treat them as such.
You have to look at the data a little differently. But they record the same types of data.
How long did something take?
How big was it?
Were there defects?
Where did I find them?
Where did they come in?
Are these processes still taught today,
or have they been subsumed by things like XP and Scrum
and other processes?
I was thinking about your personal software process
and how different it is to TDD in the testing way
where you say you can make your test pass the first time.
Well, TDD advocates will say never do that.
Like red, green, refactor, it should fail, right?
So it's at odds with a few of these other processes.
I'm curious if it's still extant,
if it's in the world of academia.
Tell us about it.
Well, it's still being used.
By and large, the Agile techniques
have more or less taken over the world. And I can give you some context: across the various groups of programmers, you always had
this surge of new guys because every cadre,
every new cohort was twice as big as the last one. So you were always
dealing with these inexperienced programmers who needed to do something very
simple. Now, you said something very interesting and I
hear that again and again about, you know,
the program, it should fail the first time. Well, the way we would teach them, and we've taught
people to do that, is, okay, if you want to program that way, that's fine. But the first failure is not
a failure. Yeah, you should get a red light the first time, but that's not a failure because
that's what you intended it to do. It's expected.
It's expected.
What we're counting are mistakes.
Mistakes that caused a program and caused you to do some additional rework.
Not the first development.
That first failure is an expected failure.
So superficially, it sounds like they're at odds.
They really aren't.
But the mindset and the discipline is somewhat different.
We were really focused on doing things like being methodical, really engaging in the design.
You see some more of these things coming out now as you move into the DevOps world.
The pendulum is swinging back the other way.
That's a whole other discussion.
Kind of reminds me of that quote, I believe, as Thomas Edison saying,
I haven't failed 10,000 times. I've only found 10,000 ways my idea didn't work.
That's an optimist.
Yeah. One of the things that we tried to teach was that testing alone is never enough because
you can't test everything. If you're doing something like the avionics DO-178C, where they have truly comprehensive
testing, that's extraordinarily expensive.
By and large, the best testing I've found will find maybe about half the total defects.
So you've got to do other things to find the edge cases.
And that could be things like designing it so that they can't happen,
inspecting, doing inspections on your design and your code so that
you know these code conditions or these input conditions
aren't going to hit or that you're controlling for them. But it is almost
impossible to test quality into code.
Oh, you need the testing, but it shouldn't be your starting
point in the quality. It's really your end point. As the world at large is spending even more time
online, search will become a more critical lever for engaging your users and your customers. And some good news out there for those fighting the fight against COVID-19 and the coronavirus.
Our partners Algolia have offered their pro plan for free.
If you are a developer or a team working on a COVID-19 related not for profit website or app, you can get Algolia's ProPlan totally free. Check for a link in the
show notes or head to algolia.com. So Bill, we've talked about a few of your research papers. Let's talk about some vapor research here, things that you have currently brewing,
which is a very interesting cost-benefit analysis of static analysis and doing that during development.
This is something that you have in progress.
You want to talk about it?
Sure.
I wrote a technical report, but that's mostly just a data dump.
Technically, I had to get a lot of the data out there so that I could publish it openly.
It had to go through a lot of different approvals.
The underlying problem is we have all of these commercial tools for analyzing code, finding defects.
You're probably familiar with things like Coverity, Clang, SonarQube, Klocwork, FxCop.
These things are very good at finding defects.
And I wasn't really interested in comparing one to another because they often find different types of things.
But the question was, how do they affect the software development process?
If I'm a program manager and I have to make a decision, should I spend the money to put these things on the project and have
the developers use them? Is that money well spent?
So I decided I could look at this from a very narrow standpoint. I don't care about
how much they spent specifically. That's a financial problem. But using these tools,
how much time do they take? How many defects do they find?
And what I was finding was that pretty consistently
the teams that I was following, the teams that I had data for
that used these tools were consistently finding defects
they were removing them, the removal costs were relatively
low and from an economic standpoint at the margins they were
actually finishing
faster and spending less time in development. So, operationally, at least, these tools had
a small but measurable improvement in the performance. Now, the flip side is this was
just through the development. You're going to have higher quality product at the end because you have all these other defects that didn't have to go through test, that didn't escape from test,
and aren't going to be found in the field because they were swept out by these tools.
So the bottom line in that was that the effects look like they're actually pretty modest. They
aren't as enormous as you might expect, but that is stipulating that these were teams that were actually doing a pretty good job
on quality already. I have some anecdotal data from a guy at the Veterans Administration,
and he was ready to pull his hair out at the quality of code that was being given to him.
So he basically laid down the law. You're not going to give me this delivery until you've run it through this commercial static analysis tool and resolved every defect that it finds.
It doesn't mean you have to make all the changes, but you have to run this tool and you have to dispose of all of these problems.
Either it was a mistake or it wasn't.
And until you've done that, you're not done. And the first time he did that, he had like 109,000 reports on 100,000 lines of code.
That's a lot.
Sounds like fun.
That's a lot, yeah.
But he found that once...
More than one per line? Is that more than one per line?
More than one per line, yeah.
I'm just doing the math. I'm like, wait a second.
Yeah.
But he found that once it became clear that he was going to use this as an acceptance criteria, part of his acceptance criteria, they would have to build it into their process and just start getting used to it.
And he found that his data was suggesting they were actually coming in faster.
So I wanted to look at that atomically.
So I used our TSP data, which said all of these defects, we found all of these defects, which ones specifically were being found by the static analysis.
How much time was spent running these tools and finding the defects and fixing them.
And I did some modeling to make some estimates on how much time was being saved downstream, because now you're not going to be finding quite as many things in test but i said okay it looks like this is not taking longer
anyone who says this is going to take longer is probably wrong so operationally these things
actually speed up development so you could not make an economic case not to use them
why would you not use them well Well, maybe they cost too much.
That's a different argument.
But operationally, the data says they work.
There are other things that work too,
and there's evidence that they find different things
that you might find using tests
or using other types of inspection techniques,
but there's no reason not to use them.
Did you apply the DIE model to this by any chance?
Because if you got these defects in there over time,
other inherent costs or hidden costs that sort of like linger and hide?
Yeah, good question.
What I did was for each of the teams,
I created a parametric model.
And I've done this before.
I didn't send you a copy of this one,
but I did a paper a few years back for Software Quality Professional that showed how you could
model the economics of some of these things. The model included how much time you're going to spend
in these different activities. How much time have you spent doing design, coding, testing?
How many defects do you inject in each of these activities? Where are you finding the defects? How effective are these activities at finding and removing defects?
And for every defect, how long does it take?
So I did some parametric averages of the defect finds in different phases, and I used that
parametric model to estimate if I find this defect in static analysis, how much time is that going to save me downstream?
Oh, I only had about a 50% chance of finding that in test, and it was going to take about three times longer.
Okay, that was a net win.
Yeah.
But that's the way I did the modeling.
And the actual parameters I put in the paper.
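To make that per-defect arithmetic concrete, here's a minimal sketch of the kind of calculation being described, written in Python. The 50% test-detection chance and the roughly 3x fix-time ratio echo the example Bill just gave; the absolute minute values and the function names are hypothetical placeholders, not the parameters from his paper.

```python
# Minimal sketch of a parametric defect-containment estimate.
# The minute values below are hypothetical placeholders; only the 50% test
# detection chance and ~3x fix-time ratio echo the example in the interview.

fix_minutes_static = 10       # average find-and-fix time when caught by static analysis
fix_minutes_test = 30         # average find-and-fix time when caught in test (~3x longer)
fix_minutes_field = 120       # average cost if the defect escapes to the field
p_caught_in_test = 0.50       # chance a test phase would have caught this defect
p_escapes_to_field = 1.0 - p_caught_in_test

def downstream_cost_avoided() -> float:
    """Expected downstream effort (minutes) saved by removing one defect early."""
    return (p_caught_in_test * fix_minutes_test
            + p_escapes_to_field * fix_minutes_field)

def net_savings_per_defect() -> float:
    """Expected savings minus the cost of finding and fixing it via static analysis."""
    return downstream_cost_avoided() - fix_minutes_static

if __name__ == "__main__":
    print(f"Expected downstream cost avoided: {downstream_cost_avoided():.0f} min/defect")
    print(f"Net savings per defect caught early: {net_savings_per_defect():.0f} min")
```

With those placeholder numbers, catching a defect in static analysis saves a bit over an hour of expected downstream effort per defect, which is the flavor of "net win" Bill describes.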
The model is not dissimilar to some of the things you've seen from, well, you've probably heard of COCOMO. There's a variant of COCOMO called COQUALMO by one of Barry Boehm's students. And it turned out that our models have a lot of commonality. They aren't exactly alike, but they look enough alike that we could talk to each other about it.
I have not heard of COCOMO. Adam, have you heard of COCOMO?
Oh, COCOMO. COCOMO is a parametric software cost estimator.
If you look up COCOMO, Barry Boehm, he came out of TRW with this originally, like in the 80s.
It was, how can I estimate the cost of a big software program before I begin work?
Based on Google, it says COCOMO, the Constructive Cost Model, is a regression model based on lines of code, LOC. It's a procedural cost estimate model for software projects and is often used as a process of reliably predicting the various parameters associated with making a project, such as size, effort, cost, time, and quality.
Thanks, GeeksforGeeks.org.
Exactly.
So basically, with something like a COCOMO, you'll get some kind of WAG about how big the software is going to be. It might start out with a function point estimate, but you have to get it into something like lines of code. You'll put in a bunch of tuning parameters, like what is the experience of the developers who are going to work on this? Is this precedented? Various types of parameters, and you get an estimate out at the end. What I used was a defect containment model, a parameterized defect containment model, to estimate the cost associated with managing quality during the development.
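Circling back to COCOMO for a moment, here's a rough sketch of its basic form in Python: effort grows as a power of estimated size in KLOC, and schedule as a power of effort. The coefficients are the classic published Basic COCOMO values; a real estimate would also multiply in Intermediate COCOMO's cost drivers (developer experience, precedentedness, and so on), and the 100 KLOC input is just a made-up example.

```python
# Basic COCOMO: effort = a * KLOC^b (person-months), schedule = c * effort^d (months).
# Coefficients are the classic published Basic COCOMO values per project mode.

COEFFICIENTS = {
    # mode: (a, b, c, d)
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kloc: float, mode: str = "organic") -> tuple[float, float]:
    """Return (effort in person-months, schedule in months) for a size in KLOC."""
    a, b, c, d = COEFFICIENTS[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule

if __name__ == "__main__":
    effort, schedule = basic_cocomo(100, "semi-detached")  # hypothetical 100 KLOC project
    print(f"Estimated effort: {effort:.0f} person-months over ~{schedule:.0f} months")
```

For that hypothetical 100 KLOC semi-detached project, the estimate comes out around 520 person-months spread over roughly 22 months.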
So you mentioned you had anecdotal evidence on the improvement side.
And you had good results on the
TSP data that you had.
Yeah, my data was consistent with the anecdotal evidence.
Right.
But what would be really cool
would be
the same teams
with and without
the static analysis.
The teams that might have
already been performing well
because of good practice.
Yeah.
Got a couple 10Xers in there.
Just kidding.
It's all,
it's a recursive problem here.
Right.
Were there 10Xers?
Were there not?
Did they really exist?
Yeah.
What you'd really like to see is some kind of longitudinal study where you introduced it.
But now you're getting into why getting these things out of software is so hard.
Yeah.
You need to raise some money to do a longitudinal study like that.
That would be a lot of work.
And getting people to cooperate with that sort of thing.
Right.
Well, the tooling, some of the tooling you mentioned, COCOMO and others, like I hadn't heard of before, but it reminded me of like Code Climate or other analysis tools that sort of try to infer or determine problems, essentially.
You know, whether it's a particular language or a particular framework they're analyzing.
And I'm just kind of curious, of those that you're studying,
what kind of teams are using those kinds of tooling?
Is it large-scale teams with 500-plus engineers or 100 engineers?
What size teams?
What are some of the parameters there versus, say,
maybe a modern team that tries to be smaller, eight people or less,
one product manager, a few developers,
a designer.
How do the teams using this tooling lay out?
Okay.
What I found, the teams that we were working with were typically in the modern team size.
If you have more than about 10 people, it's generally because you've got multiple teams
working together.
Multiple teams working together on a bigger project.
So most of our data were from teams in the order of five to ten people.
However, for the one study, I did have this one team that essentially had, I think it maxed out at about 40 people.
So they were kind of an outlier.
That was in a military application, longer term.
So a lot of those people were on or off the project at various times.
I'd have to go back in the data and do a much more granular inspection to see what the maximum was at any one time.
But most of the teams were in the 5 to 10 range.
Well, let's use Jerod as an example here.
Jerod, you always mention how you haven't had a typical trajectory
of a software developer.
You haven't worked inside of a large enterprise before.
You've never used tools like this? Do you use tools like this?
What kind of things do you use to find your bugs,
to find your defects? Is it error tracking
or is it users?
That question presupposes that I write bugs, Adam,
which is a false dichotomy.
No, I fix my bugs
in production. I don't know.
I'm not a good software engineer,
so I just kind of roll with the punches.
A team of one with very few quality controls in place
is how I've been rolling.
I do test-driven development to a certain degree.
I'm not a red-green refactor zealot.
I don't think you have to go red.
But I do write tests as I go, sometimes exploratory.
Sometimes when I do know the solution, I just want to harness it as I develop it, and then I add regression tests when there are bugs that are fixed. But I don't use any sort of static analysis. I don't use very much tooling at all, but I think that's probably because of how small my teams are, which is usually myself, maybe one or two contractors, and that's it.
And so it's just so much easier to manage when you don't have more people.
So Bill, in this case, then, hearing Jerod's example of his cycles,
and knowing what you know about all the studies you've done,
do you believe that a static analysis tool would help a Jerod out there?
If you think about the kind of warnings you get in an IDE or a compiler, the typical static analysis tools simply find more.
And what I've typically found is that if you use one of these tools before you start running tests,
you typically are going to find some things before you get into test, and some of those might have turned up in the test.
So I think you probably would find that they're likely to be helpful.
Not guaranteed, because it depends: you've got to pick the right tool, one that's aligned with the type of environment you're going to be using. I mean, operationally, it's got to be finding the right types of bugs.
So it's an indicator light.
Here's more things to pay attention to, right or wrong.
Yeah, and what I've typically found is that a rising tide floats all boats.
If you find a bunch of these warnings, some fraction of those were going to turn up as testing problems. Some of those were going to turn up in the field.
And if you are unfortunate enough to be in a domain where you've got real security issues,
some fraction of those would actually turn out to be a security violation
and might actually be exploitable. For sure. But it's a small number. It's a very small number.
And the best way to get something really secure into the field is make sure it doesn't have any bugs before you ship it.
So you have all of this data. We talked about your programmer productivity data, this 10 years of PSP exercise results and timings.
We have this static analysis data and you're slicing and dicing it, you're analyzing it,
you're putting your thought into it.
I'm curious if you ever thought,
what if I just put the data out there,
publicized the data, made it public,
open license or whatever,
and invited more people to get more eyes on it.
That way maybe you get more insights
from the same amount of information.
I've actually done that fairly recently.
If you check the citations
in the paper, I think I do refer to
where I put the PSP
data. Okay.
If I didn't put it in that paper, then
that was my mistake. I know I
have put it out there since.
Which paper, Bill? I think
I put it in the IEEE.
I believe I put the
IEEE paper into the IEEE's repository.
I should have also put that out somewhere on our SEI site.
Well, it might be available from SEI.
It may not.
There was a lot of resistance for a long time because there were concerns about privacy.
What I did was with that data, I really stripped out everything
that could be even vaguely identifiable.
So it was
real easy for the PSP
data. That is out
there. I think the biggest barrier
is if you haven't taken
the PSP course, the data
is pretty overwhelming.
And it's really hard to get
your head around. I found that
unless I worked with
someone, if they hadn't taken a PSP
course, they really didn't know what to do with
the data. And the problem is
even more so with the team software
process data, because
the data itself is noisier.
You don't have those constraints.
You have to understand the data to understand where the noise
kind of lives and operate around it.
Yeah, so it's out there.
It hasn't been publicly available for long, but I've historically had very little pull.
Tim Menzies is one of the few who's come to me for data.
Gotcha.
So the Programmer Productivity paper was published in September of 2019, or in the September-October issue. I'm curious if there's been a response that's been measurable from the community. Have people given you high fives? Have they said, oh, come on, Bill, you're full of it? What's been the response?
That's a good question. I was kind of surprised at just how big the response was.
It went kind of viral on places like Reddit, Stack Exchange.
Ended up with thousands of hits on our SEI website when I wrote the blog.
Very little came out of the IEEE, by the way.
But when I did a blog on the SEI website in January, that went viral.
And then people went back and looked up the IEEE paper.
And it's kind of interesting because a lot of the comments you get
on the places like Reddit were mostly
critical. They were bringing up things that the study didn't really aim to do, like saying, that's not how you measure programmer productivity anyway; it's that the unicorn developer can solve the more difficult problems. Okay, well, that's fine, but that's not what the study was aimed at.
I got some pushback on the other side from people like
Steve McConnell about, well, what are all
of the things that you haven't accounted for? What about attrition bias?
I still see a lot of high rates of variation among individual developers.
In your paper, why did you stop at the 5% and the
95% ranges and this and that? So yeah, there were plenty
of criticism, which frankly I welcome because at least
people are talking about it.
Yeah.
And they're actually discussing the issues.
Well, it also depends on what you're trying to get at with, you know, what the point is.
What are you optimizing for?
Are you optimizing for the absolute truth or, you know, some sort of indicator of truth
so that we can be better at what we do?
And, you know, maybe try to hire the 10Xers, good luck,
or find the base hitters like we've talked about.
Yeah, and more to the point is,
what question are you really trying to answer?
Because one of the things that came out was
when people say 10x programmer,
they don't always mean the same thing.
Some people mean it very literally.
There are plenty of papers that study productivity and find factor-of-10 productivity differences.
But that's not necessarily what all the programmers mean.
They mean the 10 times, they mean the unicorn.
10 times is really kind of a metaphorical 40 days and 40 nights type of thing.
Yeah, most people that I hear say the 10x developer nowadays,
maybe 2020 time range,
are speaking not
so much as somebody who can code faster,
better, stronger than anybody else, but it's
really the multipliers inside of a team.
It's the person that can make everybody else around
them better.
It's the Scottie Pippens on the team.
It's like, that team is better because Scottie Pippen,
NBA, 90s Bulls reference for those out there.
Look up the 90s Bulls, they're pretty good.
It's a player like that.
Everybody else is better because Scottie Pippen's on the team.
That person is a force multiplier,
even if they aren't going to finish these exercises
faster than anybody else.
I actually have a lot of sympathy with that, because among
the things I've talked about, how do you
as a manager improve the productivity
of your team? It's putting together
the right team, getting the right set of skills.
I'll give you another example.
Have you ever heard of
William Shockley?
No. No.
He was more or less the inventor of the transistor.
Okay.
Everyone hated the guy.
He founded one of the big electronics firms out in Silicon Valley.
He basically made Silicon Valley what it is, along with HP.
Bell Labs?
He was such a terrible person to work for that the guy who founded Intel left and made a fortune
because he couldn't stand working for Shockley.
But Shockley wrote a short paper thinking about
why are some people so much more productive than others
from writing papers?
And he sat down and thought about, well, let's see.
To write a good paper, you have to have a good idea.
You have to be good at managing the research.
You have to be really good at collecting and managing your data.
You have to be really good at writing.
And he put together a list of maybe six, seven items, just brainstorming, that you had to be good at.
So he said, okay, what are the odds that someone's going to be better than average in these seven different categories?
And that, he said, is going to lead you to a log-normal distribution, which is kind of what we see, isn't it?
Well, I would take that as, yeah, you're not going to find too many people who are better than average at seven different categories.
Right.
But you can build a team with people who are better than average at one or two of them. And now you've got a team that can do that sort of thing. Those are your force multipliers.
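For anyone who wants to see that argument play out, here's a tiny simulation of the idea: if overall output is the product of several independent skill factors, then the log of the product is a sum of logs, and the distribution comes out roughly log-normal, with a long right tail and very few people above average on everything. The number of factors and their spread here are arbitrary illustrations, not measured values.

```python
# Tiny simulation of Shockley's multiplicative-skills argument: the product of
# several independent skill factors is roughly log-normally distributed.
import random
import statistics

random.seed(42)
N_PEOPLE = 10_000
N_FACTORS = 7   # e.g. ideas, managing research, handling data, writing, ...

def simulated_output() -> float:
    """Multiply several independent skill factors, each with median 1.0."""
    output = 1.0
    for _ in range(N_FACTORS):
        output *= random.lognormvariate(0.0, 0.4)
    return output

outputs = sorted(simulated_output() for _ in range(N_PEOPLE))
median = statistics.median(outputs)
top_5pct = outputs[int(0.95 * N_PEOPLE)]

print(f"median output:          {median:.2f}")
print(f"95th percentile output: {top_5pct:.2f}")
print(f"ratio (95th / median):  {top_5pct / median:.1f}x")
```

With these arbitrary settings, the 95th-percentile simulated performer lands around five to six times the median, purely from multiplying a handful of modest advantages together.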
Yeah, I think in today's age we see the aspects of community forming, for one, but then the multiplier of good teams. Like here at Changelog, we have a great team.
You know, it's not because I'm amazing or Jerod's amazing.
It's because the whole entire team works very well together.
You know, I've been a proponent, having come from the military too, of focusing on team advantages versus individual advantages.
Now you'll always have that individual that will be the Scottie Pippen, the force multiplier.
You're going to have that.
It's going to be natural, but optimize for a strong team rather than finding that person, because that person will eventually just naturally appear, given enough effort.
And I don't have all of your data.
I haven't surmised that from all of your data or all your insights,
but that's at least my personal uninvited opinion,
or maybe invited, who knows.
And that's the sort of thing I would recommend when you're trying,
as a manager in an organization, when you're trying to build teams,
when you're trying to build a strong team, do the things that help you build a strong team.
Don't wait for the Scottie Pippen to come around.
Look for the ways to put together that effective team and kind of work well together.
Herb Brooks beat the mighty Soviet Red Army team with a bunch of college players.
It just shows you anything's possible, really.
Bill, thank you so much for, I guess, your dedication to the craft and to teaching.
Since we're speaking metaphorically about games and teams, it makes sense: without people like you in the game, you know, we need people like you.
So thank you, from our side and from our audience's side, for all the teaching you've done, all the insights and research you've done, and for sharing that.
I mean, it takes a lot of dedication, a lot of time, and we don't discount that one little bit.
We thank you for that.
Well, thanks for having me.
You ask great questions.
It's really a pleasure to talk to people who can put those challenging
and insightful questions out there.
Well, that's what we aim to do around here.
And to you, our audience, thank you for listening.
Bill, thanks for coming on today.
Appreciate your time.
You're welcome.
All right, the next step is to sound off in the comments. If you have some thoughts, and I'm sure you do, on the 10x developer myth: is Bill's research botched? Is it biased? Is it true? What do you think? Head to changelog.com slash 388 and let us know.
And we get asked this all the time: how can you support us? Well, the best way for you to support us is to tell your friends. Literally, that's the way to help. Send a text, send a tweet, literally pick up the phone and call somebody: hey, I listen to the Changelog and you should listen too.
And as you know, Jerod and I host the Changelog. Our beats are produced by the beat freak Breakmaster Cylinder, and we're brought to you by some awesome partners: Fastly, Linode, and Rollbar.
Oh, and one more thing, we have a master feed that brings you all of our podcasts into one single feed.
It's the easiest way to listen to everything we ship.
Head to changelog.com slash master to subscribe, or search for Changelog Master in your podcast app.
You'll find us.
Thanks for listening.
We'll see you next week.