The Changelog: Software Development, Open Source - GitHub's Open Source Survey (2017) (Interview)
Episode Date: June 9, 2017
On Friday, June 2, 2017 – GitHub announced the details of their Open Source Survey – an open data set on the open source community for researchers and the curious. Frannie Zlotnick, Nadia Eghbal, and Mikeal Rogers joined the show to talk through the backstory and key insights of this open data project, which sheds light on the attitudes, experiences, and backgrounds of those who use, build, and maintain open source software.
Transcript
Bandwidth for Changelog is provided by Fastly.
Learn more at fastly.com.
And we're hosted on Linode servers.
Head to linode.com slash changelog.
This episode of the Changelog is brought to you by our friends at Sentry.
They show you everything you need to know to find and fix errors in your applications.
Don't rely on your customers to report your errors.
That's not the way you do it.
Use Sentry.
You can start tracking your errors today for free.
They support React, Angular, Ember, Vue, Backbone,
Node frameworks like Express and Koa
and many, many other languages.
That's just JavaScript I mentioned.
View actual code and stack traces,
including support for source maps.
You can even prompt your users for feedback when front-end errors happen, so you can compare their experience to the actual data. Head to changelog.com slash sentry and start tracking your errors today for free. No credit card required. Get off the ground with their free plan, and when you're ready to expand your usage, simply pay as you go. Again, changelog.com slash Sentry. Tell them Adam from the Changelog sent you, and now on to the show.
Hello and welcome to the Changelog. This show is about getting to the heart of open source
technologies and the people who create them. And on today's show, we're talking about GitHub's recent open source survey with Frannie Zlotnick, Nadia Eghbal, and Mikeal Rogers.
You may know Nadia and Mikeal from our other podcast called Request for Commits, changelog.com slash RFC. We talk through the backstory and key insights of this open data project from GitHub, which sheds light on the broader open source community's attitudes, experiences, and backgrounds of those who use, build, and maintain open source
software. So we have a fun show today, a different show than maybe our normal episode of the Changelog. Jared's sitting out. He's got an awesome baseball game to attend with his kids. And we have this cool show called Request for Commits.
Mikeal Rogers hosts that show. And today we have a show kind of peeling back the layers of this
open source survey. And Mikeal, you're here. So might as well say hello.
Hello. It's nice to be on the changelog. Yeah, it's been a while.
It's been a while.
And on an episode, or I guess the after show of Request for Commits,
we were sort of chatting.
This was like earlier this week, I think.
And Nadia, you and Frannie have done this cool thing, and many other people too. We'll talk about that: this open source survey conducted by GitHub.
So Nadia, you can say hello as well, since you're here.
Hi.
And we also have Frannie here. Frannie, you're in data at GitHub, but what do you work on there?
I am a data scientist. I work on sort of mid to long-term research projects for internal, and in this case sometimes external, audiences.
And what is it about data that gets you excited?
I really love being the first person in the world to see something. I like to know something before anybody else does.
Interesting. And carrying on somewhat of a tradition of this show, we like to kind of
dig a little bit into somebody's backstory. So like, you know, what's your story? You're in data,
but you work at GitHub. What's some of your backstory that we can share with the audience
to kind of give some context to who you are? Well, my background is academic social science.
I was trained as a political scientist, and I got interested in computational social science,
which basically means doing social science with data sets that are too big to kind of
handle with the normal tools.
And so I started doing a little bit of CS and data mining type methodologies.
And that kind of led accidentally to working at GitHub.
I actually interviewed before I knew what a GitHub was. And yeah, I just sort of fell into it
accidentally. But it appealed to me because really the fascinating data on GitHub is all
really social data. It's about how people are working together to build things.
And so it's actually a really fantastic place to be doing data work from a social science perspective.
I have an interesting history with GitHub to some degree, because I remember sitting in San Francisco in a random office with Chris Wanstrath and Tom Preston-Werner
conducting an interview for a podcast called The Web 2.0 Show.
And this was like literally a month and a half after GitHub launched.
So it's been a trek.
So GitHub has evolved over the years.
It started out as social coding and has done all these cool things
to raise, I guess, the bar of open source
but also make it so much more accessible to the world. So GitHub has
evolved tremendously. At first launch it didn't need, or maybe it did need, data scientists, I don't know, but you can totally see a true need now. And then obviously the survey, opensourcesurvey.org
I guess that redirects to 2017 because you have plans for future versions of this.
Is that true?
Yes.
Yeah. So we're in a new world where GitHub is essentially, many might say, the epicenter of open source, and it now needs folks like Frannie to help make sense of it, see the data first, and hopefully peel back the layers of what's important out there.
One of the things that I find really interesting about this data set too is that, you know,
GitHub changed open source and now those people that are flooding in are sort of changing GitHub
a little bit too. And it's really interesting to see some insights into what they think. How did
you, like, how did the survey start? Like there's a lot of academic surveys out there. So I'm
curious, like what the motivation behind
this particular one was.
This particular survey was
kind of the
idea of
a guy who used to work on the open source team at GitHub named Arfon Smith,
who was our
open data person, program
manager for open data. And I think you guys have interviewed him before. Yeah. Yep. So he came to me several months ago, eight, nine months ago, I think maybe more, and had this idea to use the access that we had to the open source community
to gather high quality data for researchers,
and in particular academic researchers,
studying open source development and processes,
help them get better data than they're able to get otherwise.
Because you're right, there are a ton of surveys out there
because this is a fascinating domain.
It's a weird and unique form of production of public goods
that a lot of the world's critical services are based on.
And people are really interested in understanding how and why
people participate or don't in this community. But it's hard to get good data on it.
Why is that? What makes it hard to get good data?
Is it selecting the right kind of people or them self-selecting?
Yeah. So for the types of questions that people are
interested in, a large part of the problem is getting access to the right people. So this is
a community that's kind of over-surveyed. People get a lot of emails asking them to take surveys,
and at some point they get tired of doing it. And so it's hard to get the people you want to
talk to to take a survey.
So sampling is hard.
The unbiased way is really difficult to do.
So generally what people have done is they'll go through public records of open source projects
and look for people who have committed or otherwise participated in the project and then just email them.
But that means you miss this huge community of people that we
know are there who are using projects and looking at them, but not necessarily actively contributing
to them. And we think that that's a really important part of the community as well.
But unless they leave a visible artifact of having been there, there's no way for the
researchers to know that they're there.
So by having access to the traffic going to these projects, we have an ability to get to
people in a way that is virtually impossible for
most other people.
Yeah. It certainly makes sense for GitHub to conduct this survey, because otherwise, for someone on the outside like me, as you just said, it's difficult to seamlessly access those kinds of people.
Let's rewind a bit and kind of touch base on exactly what this is.
So this is an open source survey conducted by GitHub in collaboration with researchers from academia, industry, other folks from the community. The purpose was to gather high-quality insights and data
from those who are interacting with or even just checking out open-source projects.
A lot of responses, a little over 5,000 responses,
not only from GitHub's data pool, but also it seems like
random samples from other communities that aren't the GitHub platform.
And then open-sourcing that data set.
Is that right?
So it's a little over 5,500 randomly sampled respondents
from open source repos on GitHub,
and then another 500-ish responses from a non-random sample of communities off GitHub.
So we know that not all open source
is on GitHub. There's a bunch of very important projects that predate GitHub or work on other
platforms for lots of reasons. And we wanted them to be part of the sample as well. It's
harder for us to access them for obvious reasons. So that sample is non-random, but we did try to make an effort to make sure that they're
represented in the data as well.
Right. And when you say like random and non-random, that's really important to say
for what reason? As somebody who doesn't do data much, why is it so important to
make that distinction?
Yeah. So typically the way these sorts of the open source surveys have been done
is to use opt-in sampling. Basically somebody makes a survey and then they publicize it in
lots of places like Twitter or on a website. And they basically kind of broadcast that they're
doing a survey and ask people to come to them. And that means that people go out of their way to say, I want you to hear my opinion.
And the people who do that are kind of weird, right?
They're people who have really strong opinions on things
or they're people who think taking a survey
is a fun way to spend 15 minutes, which is weird, right?
There's lots of reasons why that gets you a pretty biased sample. It gets you a very opinionated sample.
It's subject to basically kind of gaming.
People will try to send the link to specific groups that they want to take the survey, but maybe not other people.
And so the way you get high-quality data that is closer to representative of the whole
community is you randomly select people, hopefully in a way that gives everybody who's there an
equal probability of being invited to take the survey.
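To make the distinction concrete, here is a minimal sketch of simple random sampling in Python. The frame of user IDs and the seed are hypothetical; this is an illustration of the idea, not GitHub's actual pipeline.

```python
import random

def invite_random_sample(sampling_frame, n, seed=2017):
    """Draw n invitees so that every member of the sampling frame has
    an equal probability of being invited (simple random sampling).

    sampling_frame: a list of user identifiers (hypothetical).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(sampling_frame, n)

# An opt-in survey, by contrast, has no sampling frame at all: whoever
# follows a broadcast link selects themselves, which over-represents
# people with strong opinions.
```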
With that in mind, you've got quite a few questions. So maybe not a 15-minute survey, maybe a half hour, 45 minutes.
This is 50 questions.
What's the rough average that you would expect for someone to dedicate towards answering this?
This is actually, it was like an 11 to 15-minute survey depending on which set of branches you hit.
So if you answered some questions in a certain way,
you'd get some additional questions. But the average time was something like 11 minutes.
And that was intentional. A really long survey is really tedious and taxing. And we wanted to,
you get better data if people don't find it really annoying to take your survey.
So we made a big effort to make sure that we used almost exclusively closed-response questions, that we wrote them in a really straightforward way, and that it was easy to take the survey as fast as possible, given the volume of stuff we wanted to cover, so that it wouldn't take up too much of people's time. And we actually had a really good completion rate. It was something like 50% of the people who started taking it finished it,
which is really high for a survey of this length.
That is really high because I've visited lots of surveys and I'm like, no.
I was nervous about that.
I was afraid people weren't going to take it because it was a really long survey.
But yeah, it turned out to be great.
Well, surveys in general are pretty tough.
You may have some experience with Node foundations.
I know you do surveys there or have done surveys there each year.
Were you involved in that at all?
Yeah, yeah.
I mean, we did the one where you kind of blast everybody and you try to get everybody to get in on the survey.
The one thing that we do to try to quantify what kinds of respondents we're getting
is that we ask a question,
how many years have you been using Node?
And we have a pretty good idea
of what the growth trajectory is.
So we know how many users in the overall community
have only been using it a year, two years, three years.
And so we know that the respondents tend to be people who are in this more experienced slice of our community. And so we don't consider the results to be, you know,
representative of the entire community,
more like representative of the people
that have more experience,
because those are the people
that end up filling out the survey.
And we have a few other questions in there
that help us kind of slice up the data
to know which section of the community that it's addressing.
We don't really have any kind of mechanism to randomly select the way that GitHub can here. So they can really look at it and get a very truly representative sample.
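As a rough sketch of the kind of check Mikeal describes, you can compare the tenure distribution of respondents against the community's estimated growth curve to see which slice actually answered. All numbers and column names below are invented for illustration.

```python
import pandas as pd

# Hypothetical survey responses: one row per respondent.
responses = pd.DataFrame({"years_using_node": [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]})

# Hypothetical share of the overall community at each tenure,
# estimated from the project's known growth trajectory.
population_share = pd.Series({1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10, 5: 0.07, 6: 0.03})

respondent_share = (
    responses["years_using_node"].value_counts(normalize=True).sort_index()
)

# Rows where respondents exceed population are over-represented slices;
# in this toy data, the long-time users answer disproportionately often.
print(pd.DataFrame({"respondents": respondent_share, "population": population_share}))
```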
Yeah, and to be fair, there's also another,
we have this problem of response bias too
where not everyone we invite actually takes it.
So some certain percentage of the people that we invited to take it actually did so. And so there's certainly ways in which the people who did decide to take it are different from the overall community. For example, a huge percentage of people say that they are maintainers or people who have actually contributed. And we know, we see in our traffic
numbers that there are many, many more people who are just visiting or using repos without actually necessarily actively contributing to them.
And they are not represented in the data in as high numbers as we see.
So there's still this sort of bias in who decides to participate. But because we invited people in a random way, it's a better sample: not necessarily precisely representative, but it is higher quality than you would otherwise get.
Let's find out where this came from, if you don't mind.
I want to go deeper into the context, some of the insights discovered from this.
But it sounds like this has been in the making for quite a while.
You mentioned Arfon Smith, also on season one of RFC.
Great episode.
We'll put that in the show notes,
so go check that out.
But this predates your getting hired there, Nadia.
So where does this begin in terms of motivation?
What was the purpose?
What were the beginnings of this effort?
Yeah, the motivation was really just
to make data available
for people to do interesting
and good research on this community.
It's good for us internally at GitHub
if people are doing research
on what makes open source
sustainable and healthy. The data is useful to us as well.
And we also have interest in making sure that the data is available not just to researchers,
but also broadly available to the people who can kind of do the most with it, which is the community itself.
So the idea was really just to use our kind of unique position
with regard to the community to be able to create high-quality data
that could then be used by other people to do interesting research
and to help people make decisions on what their communities need
and help people kind of understand the different parts of their communities.
You know, like help maintainers understand contributors,
help contributors understand users
who don't necessarily contribute, things like that.
Can you go back to the moment to some degree?
Like, was it some sort of message, an email? Was it a face-to-face conversation? What was the original context for, hey, we should do this survey, we need to know this information? And who was there, and what was going on to sort of surface this desire?
I think it was a video chat between me and Arfon. He just sent me a message, asked me to meet.
And I think at this time Arfon was doing his nomadic thing.
He was traveling around the country in a van with his family.
And so I don't remember exactly where he was,
but I was in our office and he was in a van somewhere.
Last time I caught up, well, we caught up with him for RFC.
He was in Canada.
So that was about nine months ago.
So it might have been then.
It might have been then.
I think I remember there being some trees behind him.
It looked really pretty.
Canada's got a lot of trees.
And he just pitched this idea to me.
He said, I would love it if we could do a survey of open source, do it in a high-quality way, and make it impactful, where people are using the data to make decisions about what they do, or they're publishing papers on it.
Can we do something that would provide good data, help get people interested in using our data for other types of things,
and also make sure that when people are trying their best to make data-driven decisions,
that they're using high-quality data. And I thought that sounded really interesting. I've
mostly been working on internal facing things, you know, the reports for internal people at the company.
And it sounded like a really fun opportunity to learn more about open source
because I'm not a domain expert in it.
And to do something that would have a really wide audience and potentially a
really large impact.
After the break,
we'll dive deep into the insights of this open source survey from GitHub.
With more than 50 questions, this survey covers by far one of the widest ranges of research topics GitHub has released as open data.
So we're going to highlight some of the most actionable, important insights from this after this break.
This episode of the Changelog is brought to you by Toptal, a global network of top freelance software developers, designers, and finance experts. If you're looking for contract or freelance opportunities, apply to join Toptal to work with top clients like Airbnb, Artsy, Zendesk, and more. When you join Toptal, you'll be part of a global community of developers who have the freedom and flexibility to live where they want, travel, attend Toptal events all over the world, and more. And on the flip side, if you're looking to hire developers, designers, or finance experts, Toptal makes it super easy to find qualified talent to join your team. Head to toptal.com, that's t-o-p-t-a-l.com, and tell them Adam from the Changelog sent you.
I'm kind of ready to dive into the insights a little bit. Is everybody else ready to dive into the insights?
Do it, man. Open it up, Mikeal.
So this first one about documentation is actually very interesting to me. It seems like something obvious, right? Like, of course, docs are important, but everybody says that and nobody
actually does a very good job of it. And it's interesting here because the way that it's framed,
you have a question about problems that are encountered in open source. And, you know, some of the other things you ask about other than
documentation are, you know, unresponsiveness or dismissive responses in issues and PRs and
things like that. And I know that across all the projects that are trying to take getting new
contributors seriously, there's been a huge focus on improving that flow and making things nicer and easier to get in.
And maybe not so much on documentation.
And now we're seeing this data that really shows that documentation is much, much higher.
Like it's, you know, something like, what is it, 95, 96% of respondents?
93%.
93%, wow.
That's unbelievable.
Yeah. So it looks like, more than just this question, y'all really dove into this to figure out what is going on here.
A thing that hasn't been as well studied is what prevents people who would like to contribute, but are not necessarily contributing as much or in the ways that they want to.
And so we devoted kind of a whole section of the survey
to negative experiences in open source,
which is, I mean, probably kind of a bummer to take the survey
and only get to talk about like the crappy things you encounter.
But it was where there was kind of a hole in the data that existed as I saw it.
And also kind of more actionable than, you know,
like tell us all the wonderful things you get out of doing open source.
Like those are kind of self-evident.
Like we know a lot about that already.
What we don't know are more about like the types of things that get in people's way that make it hard for them to contribute.
I like tying it to our findings that different groups which aren't traditionally super well represented in open source value those processes more.
And I thought that was an important way to tie together why documentation matters to also getting new contributors.
Yeah, you made a particular point in here about English, and how nearly a quarter of the open source community reads and writes English less than very well. So it's not saying all docs need to be in another language, but they need to be in a form of English that is not really complicated, right?
That's something actually that I personally discovered
while at Node Interactive recently, Mikeal,
when I was invited to come out there with y'all and do that.
And I was talking to Xia Lu, and she was originally from China,
but works at Autodesk and had moved to San Francisco,
but also kind of went back and forth,
talked about
the Great Firewall of China, you know, the disconnection there.
But she talked a really big deal about the difference of accessibility to docs because
of the language barrier.
But then the time it takes for things to be translated and by the time it is translated
or if it ever is, it's kind of too late.
Yeah, yeah.
The way that we've kind of solved this in the Node project
is just that a lot of people that speak other languages
are contributors or committers in some way
and they watch doc changes.
And so as doc changes happen,
they make suggestions about the language
to simplify it a little bit
to make it more easily translatable,
but also that just makes it more understandable.
I've noticed a lot of translations get out of date.
I mean, probably not as much for Node, but for smaller projects where
a relatively
common contribution is someone volunteering
to translate the docs, but then
that's not a commitment to keeping them updated
forever. So yeah, I definitely
see the value of, if you're going
to write in English, at least simplifying the language
can make a big difference.
So I saw a completely different piece of research lately.
It tried to quantify all the steps that a contributor to open source goes through,
not just the visible ones like the pull request and each comment and things like that,
but a lot of the sort of invisible steps that they do on their own,
like running the test locally and checking documentation.
And I was kind of blown away by the number of times that they check some form of documentation
in the project, right? Like they try to find things that are similar. They try to check the
documentation or the code style and all these things. And that should have made me not very
surprised when I saw that documentation was so important. But still, I don't think that I
internalized it enough. But a huge part of the process of contributing is just going back to the documentation and reading it.
Well, no one's too good for docs, right?
I mean, you're never expert enough to not need docs is my point, right?
Is that no matter how good you are, it's like, what was the document?
And how was that used?
If you can keep it all in your brain, you're superhuman, and I don't know how you got there. But even if you wrote the book about something, why remember all this stuff when you can just refer to the documentation when you need it?
I don't want to waste that brain space.
Right. Exactly. So when we
look at this, it sounds like you had 50 good questions, but from that, at least at the public website,
we got five kind of core highlighted actionable insights. One of them being documentation.
Frannie, you mentioned the negative interaction section. So there's some insights around that,
how open source is being used by the whole entire world, not just, you know, let's say
San Francisco, for example.
Using and contributing to open source happens a lot, even on the job, and open source is often the default when choosing software.
Why, at least from the public website, only share those insights?
What's the plan for future insights being shared?
That's a good question. So I think we wanted to highlight some of the more actionable things.
So things that we knew would be either really highly interesting that people would really want to know about,
so like the demographic stuff falls in that category.
Some stuff that's really actionable, like write documentation,
make sure that it's accessible to people with varying English skills. And then stuff that was kind of really strong signals in the data. It's certainly not all that there is in there, and I think a fully complete treatment of all the data in there would be just overwhelming. People would stop reading it; they wouldn't go through the whole thing. So we wanted to keep it pretty limited to some key things and then leave the rest of it open for other people to find and to research and publish on.
Because the original idea was really the point was the data,
that we would be putting out the data for the research community
and for the open source community to do what they will with.
And the kind of insight section was more of a, oh, we found all this fascinating stuff.
We want to make sure that people actually do learn these things from there.
But it's kind of a small chunk of what's in there.
I'll also add, for any of those top five or whatever insights, it wasn't like one question mapped to one insight. Each of those was a mix of probably at least five questions per section, of looking at the data and saying, oh, the documentation part is interesting, but then the non-native English speakers part is interesting too. What does that mean together? So some of that was already a little bit of an analysis, mixing together existing questions. We did go through all of
the questions
as a group just to see
given Frannie's
experience with data and
my experience with open source
and we had a few other folks in there too of
just saying given our collective
knowledge, what do we think is really interesting
and actionable here? But yeah,
it's definitely not
complete either.
I think it's a little timely too.
As we've been recording season two of RFC,
we've been able to talk to a lot more people
that have raised money for their projects,
for fairly big kind of notable projects.
And a recurring theme seems to be,
we have this money and we don't know what to do with it
because there's not a lot of really discrete,
specific things that you can spend it on. But documentation is one of those things: you can hire somebody to just write better documentation. That's a really actionable thing. And armed with this data, saying, actually, this is the most important thing, this is probably something that we should spend money on, that's really actionable and really timely.
That makes a lot of sense, because we often hear: we have money.
How do we spend it?
And do we spend it on staffing more people, putting more developers behind a project? Do we embrace or invite community into this?
Do we do events?
Do we do meetups?
Do we do swag?
It's like,
there's so many unknowns out there. And so being able to have unbiased data on what really matters certainly gives better waypoints for maintainers and community leaders in open source
to take action upon. You know, without something like this, you're sort of just shooting in the
dark. It was fun to see a lot of common wisdom, I guess, passed around and seeing how that maps to the data that we found.
And some of them were really spot on.
I was actually thinking of you, Mikeal, in the negative interactions findings.
Because you had this tweet from a while ago that was like, don't tolerate assholes.
Oh, sorry, Adam.
No, it's okay. Go ahead.
Heads up: if you're in a car with kids, get it muted or whatever. Go ahead.
I'm just repeating his tweet verbatim. So it was like, don't tolerate assholes in open source, because new people that see that will want to walk away, or something like that. You probably remember it better than I do.
Well,
there's an interesting Venn diagram that goes with it,
which is basically like, you know,
yeah, yeah, like assholes are this really small dot,
and then there's, you know, people who tolerate assholes,
and then, you know, and then a much larger bubble of nice people.
And some nice people will tolerate that kind of behavior, but far more just won't.
And so you're excluding this much bigger group
when you sort of like accommodate people who are toxic, right?
And that was totally what we had found: a smaller portion of people have personally experienced something,
but a lot more people have seen something happen.
And there's a pretty significant number of people
that stop contributing to a project when they see behavior like that.
And so I thought it was good to actually have data to say
this really does matter and that common practice
or common wisdom is actually true and useful.
It's interesting to see how much negativity is out there, though. I mean, even 18 percent of respondents having experienced that, that's enough. I mean, it's a lot. That means if you interact with open source 100 times, 18 times out of those 100 you're going to get some sort of negative reaction. I got negativity today.
Hold on. You're interpreting the data
a little bit off there.
This is of individual people, not
individual interactions.
Well, that's 18% of respondents.
Oh, I guess you're right.
You're right. I'll stand corrected then.
But I'm sure my number is just as
accurate.
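To spell out the correction: 18% is the share of people who have ever experienced a negative interaction, not the share of interactions that are negative. A toy calculation, with invented figures, shows how far apart the two rates can be.

```python
# Hypothetical community: 1,000 users with 100 interactions each.
users = 1_000
interactions_per_user = 100

# 18% of users experienced at least one negative interaction...
affected_users = 0.18 * users  # 180 people

# ...but suppose each affected user had only 2 negative interactions.
negative_interactions = affected_users * 2  # 360 interactions

per_interaction_rate = negative_interactions / (users * interactions_per_user)
print(f"{per_interaction_rate:.2%} of interactions were negative")  # 0.36%, not 18%
```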
I mean, it's definitely not great. It's definitely higher than you would want it to be, but it's also not out of line with data from similar communities, especially online communities. It's not necessarily just open source that has this problem. But I think the
visibility of open source, like everything being open
and kind of having a little bit of a viral aspect to it,
like people send really kind of fiery issues to each other just to, you know, take a look at what happened over here. So the visibility might be higher.
But this kind of people being jerks to each other on the internet is a thing that happens in a lot of places.
I think one of the insights here is that 50% of people have witnessed negative behavior, and 21% of them have actually left a project because of that. Just from the witnessing, not even necessarily being, you know, the victim of it.
Well, that's the truth. If you just witnessed the negative behavior,
you're going to assume that that's a common thing or a standard or it happens often, especially
if you see negativity go unresponded to. Allowing it, you know, that's a difference. That's why I think maybe your tweet, Mikeal, and Nadia, your mention of it, may be right on point. Because if you allow somebody to be that jerk, then you're not so much just at fault, but somebody needs to say: if you're going to participate in this, whatever this is, here are the rules we all agree to abide by, and if you don't, here are the ramifications, and here's how you can contact someone to say, hey, this is happening, can you please make it stop?
I want to re-emphasize Frannie's point
that the levels of negative interactions that we saw are not that different
from other online communities.
Because one thing that was sort of sad to see in a lot of the press reporting was, the takeaway was: open source is really terrible and toxic, and
everything is just like, you know, Linus Torvalds.
And it makes me sad, because I think, at least for me, part of the goal with this data
was to, I don't want to scare people off
from contributing to open source.
And I think it's important to highlight
that this stuff, it's not great,
but it's not different from other online communities.
I think open source gets an unfair reputation for that
and it turns a lot of people off from it.
But like none of it is great because humans are just kind of not nice to each other on the Internet, period, when left unchecked.
But being able to see that is useful.
I think one important difference, though, between GitHub and other online communities is that if you're like really into ham radios and you go on the ham radio forum um and you experience
negative behavior your choice is basically you know i stop talking about ham radios on the internet
or i continue to put up with this and on github it's literally like i just go to another project
that's not terrible and so you have this huge number of people that like actually just leave
projects because of it right yeah at the same time, you can't make or instill a change
unless you measure it, right?
That's a known thing.
So just having this data alone
shouldn't say, oh, this is an issue
or this is to scare people off, as you said.
It should be something that attracts some change.
Being aware of an issue is the way you instill some change.
Exactly.
Coming up after the break, we get into a heavy topic: dealing with negativity in open source. We also talk about maintainers having to be the police.
And an even more touchy subject, the accuracy of this research.
Is this a true representation of the overall open source community?
All this and more after the break.
This episode of The Change Log is brought to you by GoCD,
an open-source continuous delivery server from our friends at ThoughtWorks.
GoCD lets you model complex workflows, promote trusted artifacts,
see how your workflow really works, deploy any version, anytime, run and grok your tests, compare builds, take advantage
of plugins, and so much more. Check out gocd.io slash changelog to learn more. And now back to the show.
This is a tough subject, not often discussed: the impact of negativity in open source. And
sometimes when that happens people are forced to go into private channels or to enforce their
code of conduct and get into very uncomfortable situations.
Basically having to deal with these negative experiences that have real
consequences to not just the people, but also to the project.
And sometimes people just don't interact or they just don't interact publicly.
What do you think?
Yeah. They might withdraw from a project.
They might keep working on the project, but start working in kind of back channels. Instead of working publicly in the repo, they might start pinging people through email or Slack or other methods, to avoid the public microscope of attention on the work,
or maybe risk of somebody saying something really critical
and that ending up being part of their public professional record?
Well, no one wants that, right?
I mean, if we were in an issue and Mikeal called me a name,
I would probably not want to talk to him ever again for one.
And then two, I'd just not be involved in this anymore. I'm done.
Or maybe I wouldn't,
maybe I would just take it and just feel stupid.
What do you think, Mikeal? What are common interactions like? I mean, is that how you would respond?
What do I think? You want to test this out and have me call you a name on the internet?
Sure. Yeah, sounds like a wrap.
No, I think the kind of top line of this
is that people that experience negative behavior,
it's, okay, I would put it this way.
It's really important to deal with negative behavior
when it happens at the project level.
The next point in here is that, you know,
the most effective tool here is to ban people.
And then maybe even banning people liberally is a fairly good idea. One of the arguments that we hear over and over again is that, using bans and using other moderation tools, we keep creating a higher and higher bar for the kinds of behavior that triggers them. But there are a lot of other negative consequences to not dealing with it. You know, not just the person who's the victim of the behavior, but also everybody
watching can just move to another project a lot of the time, or they might, you know,
move into private channels, which is like not good for an open source project.
And so it's just like really important to deal with the behavior, right? To actually
moderate it in some way. Because if you don't,
you know, something like three times the number of people that experience it are, you know,
potentially also seeing it and they're going to do something as a result, right?
The problem with the banning and the blocking, though, to me, is that somebody's
got to be the police and somebody's got to be the bad guy, you know, or bad person, so to speak.
Which is okay,
but that just means that somebody who may not exactly want to take on the role
of being the enforcer, so to speak, has to take on that role.
And that has to happen in every single project.
And that could become an issue just generally.
Like I don't always want to be the bad person and say,
hey, you can't play anymore because
you've crossed the line here. I get it.
It's needed, but it's hard to be that person.
We asked about individual
users having the ability to block another
user. That's separate
from asking a maintainer to go in
and block somebody from a project.
Actually, the
finding is that an individual having the ability
to block another user, without having to involve somebody else, is the part that is most effective at addressing problematic behavior. Compared to sort of legal intervention, police intervention, ISPs or hosting services, bringing in these third
parties at any level is less effective than giving people the power to protect themselves on an
individual basis. And so I think the findings suggest that you want to move away from having
a kind of third party policeman and give people tooling to be able to say, you know, whether or not
this person is part of this community or not, I don't want to interact with them.
That seems to be.
So a singular-level opt-out, basically.
Yeah, because, you know, anything that relies on maintainers, first of all, as you said,
it puts somebody in this police role that they may or may not want to be in. And it also means that you need to have a really responsive maintainer in order to have any sort of ability to have this. And there's lots of projects where the maintainers are not necessarily responsive on the timeline that you would need in order to address it.
Well, it certainly adds one more notch to the job role of being a maintainer,
if that were the case.
Must also be police.
I think that it's important for you to think about these things in your project.
But that's a really good point, which is that a lot of individual maintainers
are just not engaged enough in the project to even do that.
And in fact, there's not a lot of other people engaged in that particular issue.
So it's really enough for the one user that's seeing this, or experiencing it, to just block that person.
And that kind of scales down to all of these smaller projects that don't have as much infrastructure.
I'll also just give a shout out to GitHub making improvements.
There is a beta tested feature right now for temporary bans at the org level.
So you can kick people out temporarily and it's not so much of a permaban.
And it's a nice way to say that behavior is not appropriate.
Here's a consequence, but you're not banned for life.
Yeah, I see this finding as being...
The people who should pay attention to this finding are people like us, like GitHub, like other platforms. It's a platform-level finding. This is what we need to build in order to make sure that communities have the ability for people to work in healthy and safe ways. It's not necessarily at the project level, and it's not necessarily on a maintainer to go build something to do this. We, the platforms, should make this available.
Nice.
So earlier we talked a bit about how this has been covered in the media.
In that vein, I saw an article in Wired
talking about some of the gender imbalance stuff.
And one thing that I noticed was that
the metrics that they took from the survey,
they were sort of implying were the overall GitHub metrics and that the overall GitHub users were only 3% women.
Yeah, that's not true.
I'll let you dig into that a little bit.
So can you tell us a little bit about the gender imbalance findings that you found?
Yeah, so I mean, they're not good. 95% of the people who gave a substantive answer to the gender question identified as men, and only 3% identified as women. Another 1% identified as non-binary. It is just a profound imbalance. There's no other way to talk about it.
Does this go back to the pool that the data represents? You know, going back to the random, non-random question, biased, unbiased, and this assumes that the person was on GitHub, right?
They were prompted some way to say, we have this survey, please take part.
Being that it's such a wide chasm
between those numbers, 95%, 3%, 1%,
I'm just wondering, given that big of a difference,
how confident do you feel in the accuracy of that?
If you took the same survey and expanded it across all of GitHub, and everyone who's ever interacted with GitHub answered, would that still be true?
I don't know. The way we sampled was definitely not how you would try to get all GitHub users. You had to do a really specific set of actions on a licensed open source repo that indicated sincere interest in open source in order to make it into the pool, and then we randomly sampled from that. So it's definitely in no way representative of the general GitHub user base.
Can you share what those actions might be? Maybe to kind of give folks a... Yeah.
Is that secret stuff?
It's not secret.
You had to do... If you download the data set,
there's some documentation that has more details than the website.
You had to do something like three clicks on a licensed repo, or visit three licensed projects, within 30 minutes in order to make it into the sampling frame. And that's because it's really easy to fall into a GitHub repo from Google and not really intend to be there. So we were trying to make sure that we're only getting people who seem to have a sincere interest in open source.
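As a sketch of one of the two criteria Frannie describes (visiting three licensed projects within 30 minutes), the check might look like the following. The event-log columns here are hypothetical; the documentation shipped with the released data set describes the real sampling frame.

```python
from datetime import timedelta
import pandas as pd

def in_sampling_frame(events: pd.DataFrame) -> bool:
    """events: one user's visits to licensed open source repos, with
    'timestamp' and 'repo' columns (a hypothetical log format).

    True if any 30-minute window contains visits to 3+ distinct repos.
    """
    events = events.sort_values("timestamp")
    for start in events["timestamp"]:
        window = events[
            (events["timestamp"] >= start)
            & (events["timestamp"] <= start + timedelta(minutes=30))
        ]
        if window["repo"].nunique() >= 3:
            return True
    return False
```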
To get back to the sampling part, there's a possibility that maybe women are more likely to
use open source or be interested in it, but not be contributors. And maybe they felt like when
they got invited, we didn't actually mean them. Maybe they don't consider themselves a member of the community
or something, so they thought, oh, they probably don't want to hear from me. That's possible. But an imbalance this large, I think this is probably pretty accurate for open source, and it's consistent with basically all other research that's been done on open source communities, which finds between 1% and 10%.
The reason why I asked this question isn't to deny the accuracy.
It's because it makes me sad.
It took me a few weeks to like process that finding.
Yeah. I mean, that's really sad.
If that is representative of the truth,
we've got to do a better job.
Well, understanding how the selection works now,
I'm not that surprised.
I mean, I've seen these other studies
that show between 1% and 10%.
And I mean, all of them have different issues
in their sampling,
but at no point has there ever been data that shows that open source is as good as even the rest of the industry,
which is only 22%. And just like from my own experience, when you work in these communities,
as you work your way up the kind of engagement stack from user and casual contributor,
and then eventually into leadership, the numbers just get smaller and smaller and smaller, and women become less and less visible, with a couple of individual communities as really important exceptions that should probably be studied, so we can figure out how to do this better. But yeah, I'm very unhappy about the number, but now I can see why it would be there.
So if someone has my reaction, and this is a question for all of you here, what can we as a community do?
What are some ways to fine-tune that ratio to be a bit less of a chasm between the two?
One thing that made me happy from the reaction was there were a couple of prominent open source contributors who offered to mentor people that are trying to get into open source. I think from the React community there were a couple of people. So I thought that was really nice, just for people to be aware that if the numbers are that bad, it's really important to keep an eye out for people that are interested in contributing but might need a little bit of an extra push. That's obviously not super scalable, but I thought it was just a nice human response. And for me, the documentation part of it ties really strongly back to this: document your stuff and make it as transparent as possible, so everybody understands how to get into it. That was why we did the Open Source Guides earlier this year, too,
so that it doesn't feel like open source is this big shadowy process
and you have to talk to the right people to understand how it works.
Some of that ties into also the findings around people who had given
or received help from a stranger in open source
and seeing that women were less likely to ask for help from a stranger,
because there's sort of the assumption
that I'm not allowed to do that or whatever.
So just really going out of your way to knowing that,
and it's not just women.
I mean, like there are a lot of people
who are hesitant to contribute
because they don't feel comfortable
asking for those sort of things.
So making processes really explicit and transparent
might bring more people out of the woodwork than you would expect.
That was my take on it.
This last point here too, half of contributors say that their open source work was somewhat or very important in getting their
current job or role. I mean, that's in the same area we're talking about here in terms of this data being shown on the website. But knowing how crucial open source is in general, and then also at the micro level of me or someone else getting a future job, or the dream job, so to speak, how important it is to be able to interact with open source makes it even more important to be welcoming, because it's that important to them as a person, but also to tech in general.
I've also seen data that shows that people with open source experience
make more money on average, too, than other average developers.
So it's also important there.
Yep.
I think one of the answers to this problem
you can see in this data a little bit, right?
You have the differences between men and women on some of these things.
And some of the biggest gaps are in code of conduct and welcoming community.
This is just far more important to women.
And I've certainly experienced this as like a conference organizer where you're trying to invite and get people out.
And especially when it's the first time that people are being visible,
women are much more cautious about this than men are.
Is it the code of conduct and what it says,
or is it the fact that it's there because somebody knew it was important enough
to put there and take the time to figure out what that community's conduct
should represent?
I feel like at this point, if you don't have it,
you're making a different kind of statement.
It's not like the code of conduct makes a statement.
It's actually not having it as the statement.
And that's a really negative statement.
I don't know if it's, let's use the changelog as an example.
Because we literally, as of maybe a week, two weeks ago,
three weeks ago maybe, and today is June 5th.
WWDC day by the way.
Apple good stuff, whatever.
But we just recently put a code of conduct in.
And that's not because we're idiots.
It's because we didn't think we really needed one.
We're a podcast primarily.
A group of podcasts.
A newsletter.
So we didn't really have a community.
And we've actually had a membership
slash community for a while now. And for me, my paralysis, and I'm not sure if this reflects
Jared's opinion, but for me, my paralysis around it was like, I don't know the first step to enact
one. Do I write it by hand? Do I adopt one from another community that best represents me?
Kind of going on a rabbit hole here, but the point I'm trying to make is that even us,
I would imagine we're pretty close to open source and we realize how important this is.
Only recently did we enact one.
And I don't feel like we were behind the curve.
I felt like we did it when we needed to.
You know?
You know, we wrote an open source guide about that.
Just saying.
Maybe I didn't read it yet.
That's good. One of the neat things about the code of conduct findings, and we didn't highlight this in the write-up, but I kind of wish we had,
was that it has this reputation of being controversial
because there's like some loud people on the internet who really,
really hate them.
But our findings show that they're actually not controversial.
So we allowed people to say, about all these things in that table that you're looking at, from responsive maintainers all the way down to contributor license agreements, whether it's very important to have it, all the way to doesn't matter either way, all the way to very important not to have it.
So we allow people to say, like, I really, really don't want to see this on a project,
all the way to I really want to see it. And only something like 4% of people said it was
important not to have a code of conduct. The vast majority of people either really want to see them,
they say it's important, or they're kind of indifferent about it. And that includes men.
If you throw all the women in the data out, that's still true. So they're broadly popular. You're not actually going to dissuade a lot of contributors from participating if you have one, even though there's this sort of reputation on the internet that you're taking a side, in a culture where most people actually are pretty happy to see one, or they don't care either way. So it seems like a good and easy way to make your community welcoming to the people who care about it, and it's not going to turn away other people.
I think either way, it's really about opening the door, right? Like you said, if people are indifferent, then they're not going to stay or leave because of it. But there are going to be people who don't come at all if you don't have it.
Yeah. And it's not just codes of conduct, too. I mean,
I think there's also just a broader cultural issue of, and I can only speak from my experience, I guess I'll just do that, but of sometimes wanting to have permission to do something, or feeling like you're not good enough, and unless someone says you're good enough to do a thing, then you're not okay to do it. So, for me at least,
it's not like I wouldn't speak at a conference.
It's not like if I see a code of conduct,
it's like, oh, great, this is the conference for me.
But it's more like, well, am I qualified as a speaker?
Is that okay?
And so for me, one of the big takeaways
from something like certain groups
wanting processes clearly spelled out or
documentation is just like really encouraging people when they're on the fence to just say,
yes, you can do it, just do it. And just being really supportive and encouraging of people
that might be kind of like hovering in the sidelines, unsure whether they want to participate
or not. Like everyone can play that role day to day of if you see someone with potential,
just encourage them to get out there.
Yeah.
And I think to call back to what we were talking about earlier with sort of signaling to everybody watching that, you know, you care about this kind of thing.
How you handle that pull request and that discussion around a code of conduct says a lot about the project.
Because while it isn't controversial among most maintainers, as this data shows,
it is controversial among 4chan members
who will show up and just start saying stuff.
And either you allow that to become
a giant thread that derails the pull request,
or you merge it and close it and lock it
and say, I'm sorry, but that's not our community.
Go away.
And that sends the right kind of signal, right?
That's a good takeaway. The people who are really loud about this stuff are not actually representative
of the silent majority.
Coming up after the break, we talk about the relationship
of businesses and open source, how they use it, and whether or not they encourage contributing to it. We also examine how open source has become the cultural default in many cases, what people
actually value in open source, because you might be surprised, and what you can expect from the
future of the survey. This episode of the Changelog is brought to you by our friends at Microsoft and the Azure OpenDev conference,
their upcoming no-cost live virtual conference that's focused on showcasing
open source technologies on Azure. Engineers are looking to bring more of the open source tools
they know and love to the cloud, but often need a grounding on what to look out for and what to
expect. The fastest way to learn is to see live demonstrations and get time to Q&A with experts in the field.
Microsoft is providing this at no cost.
It's a virtual event, which means you don't have to travel anywhere.
Reserve your spot today.
Head to azure.com slash open dev.
That's A-Z-U-R-E dot com slash open dev
to register for this free live event from Microsoft.
It is on June 21st, 2017.
And now back to the show.
I can't remember the exact quote you have, Mikeal,
but you had a quote on the bonus episode, the season one behind-the-scenes episode of Request for Commits, about businesses' relationship with open source and how it's severely skewed.
With so much open source contribution happening on the job, as this data shows, it kind of reflects back onto that idea
that more companies should help sustain open source.
And that doesn't mean just money.
It might mean the 10% time or the 20% time
being allowed towards open source.
We've been talking about this on Request for Commits.
What do you all think about this finding?
Are you surprised? Are you enlightened?
I have a couple of things that are surprising about this. So one is, I was really surprised by the number of people that found certain policies unclear.
That was really interesting to me.
And I'm very curious,
do they contribute anyway when it's unclear?
Do they use it anyway when it's unclear?
Yes, they do.
There are actually separate questions like, what's your employer's policy
and what do you do?
When the people who say that it's either unclear or they don't know
what the policy is, their practices
look like those of people who say their
policies are supportive of it. So in the absence of
any clear rule, people will do it.
And I think that demonstrates you kind of need to, in modern software development. Open source is pervasive. You can't not use it, and you need to fix things if they don't work. So if you have any justifiable way of doing it, you will do it. That's what I took from that.
So, see, I find that really fascinating, right?
So they're doing it anyway, but it's not like they're being encouraged to do it.
Like, I think the people that say that they have a permissive, that means that their employer has been very clear that they can and that they should, right?
It was people, permissive means either your employer is encouraging of it, which was like about a little less than half, like 46 percent, or they were accepting of it.
Like they wouldn't tell you not to do it, but they wouldn't necessarily like go out of their way to encourage it. So what is the disparity between people encouraging the use of open source versus the contribution to open source?
I'm sorry, I don't quite understand the question.
So using open source applications and using open source dependencies are incredibly high,
right? So you have the permissive, which I think covers, encourages and allows, right?
Yeah.
Yeah.
So that's really high, right?
It's in, like, the 80% range. But it drops considerably when you talk about contributing.
So clearly there's like a huge disparity here between people being strongly encouraged to use open source versus not being encouraged in the same way to contribute back to open source.
Right.
Well, specifically the question asked about IP policies on non-work contributions. So it might be that people don't know what their IP policy specifically is. It may not mean that they're not allowed to do it or not encouraged to do it, but that specific policy they're not super familiar with, or there's something about non-work contributions that's different.
We actually didn't ask about contributions on the job,
which we should have. That was an oversight. Next year, we'll ask about it.
Does that mean they're on work time, contributing to non-work open source, and they're not sure about their ability to contribute to non-work open source while working?
It could even be off work, though, because people
have employment agreements that say that
everything that they produce is owned by the company.
Whether they're working or not, or even the hardware they're using.
I think it's really telling, right?
It's not surprising that everyone's using it, but it's unclear how you're able to contribute back.
And hopefully that will change.
So should that just mean that employers should make it a bit more clear about the company's relationship with open source and the permissions around it?
Well, but to come back to your point, right,
you were saying that they should dedicate some amount of time.
And I don't think that we're at even the place where we can say
that they should be dedicating some amount of time
because what this looks like is that employers are encouraging people
to use open source and depend on open source
at a rate that they're not telling them to contribute back.
And so just getting those to the same level might be enough, without dedicating four hours or whatever. Maybe we should be setting our goals a little bit differently.
I mean, they definitely go hand in hand. If you don't have the policy and you don't know what it is,
then you're definitely not going to just contribute before you know what the policy is.
If you don't know the policy, or whether there even is one, you're not being encouraged to do so.
Yeah, I mean, it sounds like people are anyway.
And I think that that's kind of telling, too, because it looks like they're doing it because there's no other way, right?
There's no other way for them to get this bug fixed in their critical dependency unless they have this policy.
So they just kind of ignore it.
I did notice that there was a restrictive category, like people that actually are restricted from contributing and using. And that's the most offensive to me, even though it's a really small number. I can't eyeball the exact numbers from the graph, but it's either two or three times as many people, which means that there's a bunch of people that are being highly restricted from contributing to open source, but are not being restricted from using it in any way.
Which is just offensive.
Yeah.
The last one, though, is a bit telling, around the last insight listed: open source is the default.
Did you really need to do the survey
to find that out, though?
But you know what I thought was...
It's fun to back up your claims with data, though.
Right, of course. We need that.
What I thought was really interesting about this is, it wasn't that people necessarily thought open source was better. Among most parameters besides security, people actually think proprietary does better in a lot of situations, and yet the vast majority of people will still seek out open source options. Which says something a little bit more interesting to me: it's not just that people recognize that open source is better than proprietary and that's why they do it. It's that they don't even know if it's better or not, but for some reason they're just going to keep using it anyway, because it's so culturally default at this point that that's just what you do, that it doesn't even matter what the quality is.
And that kind of says something too
about the state of where we're at with open source.
Everyone is going to use it.
It's also free though.
But interestingly, people didn't value the cost as much; cost was not even one of the top things people value in choosing software, which is also really bizarre to me, because you would think that those two things would be explicitly tied.
I think it goes, I mean, this is just me guessing, but I think it goes even beyond that: people just do open source because they hear that's what they're supposed to be doing. People aren't really thinking twice about whether it even is open source, or the cost, or anything. It's just like, this is what I do.
I take this software, it comes from magical berries somewhere, and then I just put it in my software and that's it.
They heard React was cool, and so they used it. Right, exactly!
I mean, that matches up at least with anecdotally how I understand a lot of software gets made.
Yeah, stability, security, and user experience are the highest on the importance graph, yet 72% say they always seek it out. And that kind of says they'll seek it out to some degree regardless of stability, security, and user experience, because as you said, it's the cultural default.
And you can see cost is kind of like down there in the middle somewhere.
Yeah, I mean, I didn't answer the survey.
I wasn't one of the lucky ones who clicked five times
and went through the special portal. So I didn't get to answer this,
but to me, I pay for things, right? If I value them, I pay for them. I don't care how much it costs, but cost matters. You know, I'm not seeking it out for free. It's like, if it matters, you pay for it one way or another, whether it's involvement or actual dollar exchange.
Yeah.
It probably matters kind of where in someone's life cycle they are. I bet if we made this plot and split it out by whether people were students versus employed, like maybe cost would be...
That's true. Even if I didn't have the money to pay for it and I still valued it, I'd be like, well, I can't buy it, I'm a student.
Yeah, I haven't looked at that. Someone should look at that. I bet it's interesting.
So something like 20% of the data is people who are students.
Well, that's a good point right there, I think: someone should look into this.
So this data that you've pulled
back is all open source.
The GitHub repo is linked up on the website
opensourcesurvey.org, so it's not like
you can't go and find it. You can download the data
and just get started. There's a
big download button at the bottom of the site.
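For anyone who wants to try that student-versus-employed split, a minimal sketch in Python with pandas might look like the following. Note that the file name and column names here are hypothetical, so check the codebook that ships with the data for the actual field names.

import pandas as pd

# Assumes a local copy of the survey data downloaded from
# opensourcesurvey.org; "survey_data.csv" is a placeholder name.
df = pd.read_csv("survey_data.csv")

# Hypothetical columns: "employment_status" (e.g. "Student", "Employed")
# and "value_cost" (how important cost is when choosing software).
for status, group in df.groupby("employment_status"):
    # Share of each importance rating within this group.
    print(status)
    print(group["value_cost"].value_counts(normalize=True))

Normalizing within each group keeps the comparison fair even though students make up only about 20% of the respondents.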
So what we've been talking through is your findings, your insights from this. But that doesn't mean the questions end there; someone can go back to this data and pull out more insights than the five you've shared here. You've dug into quite a bit, shared a lot of details, even came up with some graphs to kind of share all the data points that we've talked through.
Yeah, the whole point is that it's an
open data project, so we hope people will use it and learn from it.
There's a lot more in here than we've covered.
A lot of really fascinating things that we found
that didn't have room for in the write-up.
So please do go analyze it and tell us what you found.
We really want to know how people are using it
because we'd like to do this again. We want to know what was useful, what didn't turn out to be useful, how people are using it.
Can we expect this to be maybe a once a year thing, a twice a year thing? I mean, open source moves fast, so everybody's just trying to keep up. So should we do this once a year, twice a year? What do you see the future of this survey becoming?
You know, so it was a ton of work.
So I've been sort of thinking like once every two years, but that could change if a lot of people are using it, a lot of people are finding either the insights or the raw data really valuable, and people have ideas about things they want to know over time or new questions they want to ask.
I think now that we've done it once, I have a pretty good idea of how we could change how we did it so that it would take less time.
Well, the insights part seems to be the most time consuming.
Actually conducting or allowing somebody to take the survey seems
to be a pretty passive type of role, actually.
You know, the hardest part was writing the survey, because I wasn't familiar with kind of the existing research on open source, so I had to go learn about it. And I had to go write a whole 54-question instrument with a lot of help from our collaborators in academia
and industry. I don't know if we've talked about them yet,
but like tons of contributions from people who are doing a lot of active
research in this field. But that was actually, I think, the most time-consuming part of it.
Well, now that it's there, and without having done all the work you've done, of course, I can have this point of view, but do you see the questions needing to change very much to continue the survey? Does it have to? Or is this something that can just sort of keep operating on the random selections, as you mentioned to Michael's question a bit earlier? Can the same 54-question survey keep going, to kind of keep gathering and keep maybe a real-time pulse on the results?
Yeah, I mean, that's a good question. So one of the things we've tried to be
conscious of is that open source is a community that's over-surveyed. And so we don't want to add to the noise, right? People in open source are constantly getting emails asking them to take surveys. Partly, we hope that we can cut down on some of those uncoordinated research efforts by providing one single high-quality data set that everybody uses, instead of everyone trying to piece together their own.
You know, we don't want to bother people with research unless we think that these things are actually changing over time. But if people want to know, say, we saw this last year and we want to know how it's changed a year later because we've invested a lot in our documentation efforts, that's certainly something that we could consider doing. I was sort of thinking that every time we do this, we would just ask completely different questions to try to open up new avenues of research.
But it kind of matters what the research community and what the open source community wants out of this data.
And obviously, hopefully, two years from now there are going to be so many different things that open source will be facing, given the pace that it changes at.
Yeah.
So before Adam closes out the changelog here,
I'm just going to do a complete takeover for RFC.
Please do.
So RFC, we focus on open source sustainability.
And I'm staring at this figure about what open source users value in the software.
So why are people actually using open source?
And so we know that there's just these widespread sustainability issues.
And the first things that they impact, and the things they impact the worst, are stability
and security, which are the most important things to people.
And conversely, the classic business models around funding open source rely on support or new features.
And support is ranked the absolute lowest thing that people care about, with innovation being second to that.
So it's just like literally the things that people care about and how we've traditionally looked at funding are at opposite ends of the spectrum.
I was hoping someone else would notice that. I'm glad you pointed that out. I think it's funny
because almost always when people talk about turning an open source project
into a quote-unquote real business or whatever, it's like, oh, we'll just
offer support and services. And it's like, actually, people don't really care.
Well, that should be a clear indicator to anybody who's going that route that it's probably, or could be, the wrong route to go.
I mean, it'd be nice to do this in a year and see if that number changes much, though, because I'm very surprised that those two are the lowest.
Let's see if support gets lower.
I don't know if it could get any lower.
It's literally right above less important.
Well, that scale is...
That's not the full scale.
I think one thing to keep in mind is that these things...
We know these things actually do vary quite a bit by community.
So actually, this particular set of questions was taken from some ongoing research by a lab over at Carnegie Mellon that studies differences in
the values of different ecosystems. And so they've done this among a number of communities and found
differences in what different open source communities value in the things that they
build and in their own processes. And so this is sort of overall,
if you aggregate all of the projects together,
this is what falls out of that.
But there's probably significant variation
between communities in what they value.
So for your individual project,
it may not be the case that support
is the least important thing.
But when you aggregate everyone together,
that's how it falls out.
Any call to action for those listening?
So we got lots of people who listen to the show,
a lot of people who care about open source.
Either they're contributors,
they desire to get into new languages,
they listen to the show for various reasons.
Across all chasms of developerhood, so to speak. You know, what core call to action could you give?
I mean, obviously go check it out, pull down the data,
play with it if you're a data scientist or anybody to sort of gather your own insights.
But what other call to actions can you give
to the listening audience?
I really hope that open source projects
use some of this stuff to figure out
how to get new contributors on board
and how to strengthen their communities
because I think there's a lot of really good insights
around that.
So opensourcesurvey.org.
And I'll also mention that at the very bottom, you can subscribe to survey updates. So you can put your email address in there and click the subscribe button.
And I guess that means that we'll obviously have to do another show like this because
this was super fun.
I mean, I love just kind of roundtabling this, you know, kind of digging through everything
and getting to hear different perspectives.
It's been a lot of fun.
So Nadia, Michael, Franny
this has been fun, thank you
thank you
The Changelog is produced by myself, Adam Stachowiak, and also Jerod Santo. We are edited by Jonathan Youngblood, and our theme music is produced by the mysterious Breakmaster Cylinder.
You can find more episodes like this at changelog.com or by subscribing wherever you get your podcasts.
Thank you to our sponsors, Sentry, Toptal, GoCD, and also Microsoft with their Azure Open Dev Conference.
Check that out.
Also, thanks to Fastly, our bandwidth partner. Head to fastly.com to learn more. And also Linode, our cloud server of choice. Head to linode.com slash changelog. And we'll see you next week. Thank you.