The Changelog: Software Development, Open Source - GitHub's Open Source Survey (2017) (Interview)

Episode Date: June 9, 2017

On Friday, June 2, 2017 – GitHub announced the details of their Open Source Survey – an open data set on the open source community for researchers and the curious. Frannie Zlotnick, Nadia Eghbal, and Mikeal Rogers joined the show to talk through the backstory and key insights of this open data project, which sheds light on the attitudes, experiences, and backgrounds of those who use, build, and maintain open source software.

Transcript
Starting point is 00:00:00 Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. And we're hosted on Linode servers. Head to linode.com slash changelog. This episode of the Changelog is brought to you by our friends at Sentry. They show you everything you need to know to find and fix errors in your applications. Don't rely on your customers to report your errors. That's not the way you do it.
Starting point is 00:00:27 Use Sentry. You can start tracking your errors today for free. They support React, Angular, Ember, Vue, Backbone, Node frameworks like Express and Koa and many, many other languages. That's just JavaScript I mentioned. View actual code and stack traces, including support for source maps.
Starting point is 00:00:50 You can even prompt your users for feedback when front-end errors happen, so you can compare their experience to the actual data. Head to changelog.com/sentry and start tracking your errors today for free. No credit card required. Get off the ground with their free plan, and when you're ready to expand your usage, simply pay as you go. Again, changelog.com/sentry. Tell them Adam from the Changelog sent you, and now on to the show. Hello and welcome to the Changelog. This show is about getting to the heart of open source technologies and the people who create them. And on today's show, we're talking about GitHub's recent open source survey with Frannie Zlotnick, Nadia Eghbal, and Mikeal Rogers. You may know Nadia and Mikeal from our other podcast called Request for Commits, changelog.com/RFC. We talked through the backstory and key insights of this open data project from GitHub, which sheds light on the broader open source community's attitudes, experiences, and backgrounds of those who use, build, and maintain open source software. So we have a fun show today, a different show than maybe our normal episode of The Changelog.
Starting point is 00:02:02 Jared's sitting out. He's got an awesome baseball game to attend with his kids. And we have this cool show called Request for Commits. Mikeal Rogers hosts that show. And today we have a show kind of peeling back the layers of this open source survey. And Mikeal, you're here, so might as well say hello. Hello. It's nice to be on the Changelog. Yeah, it's been a while. It's been a while. And on an episode, or I guess the after show of Request for Commits, we were sort of chatting. This was like earlier this week, I think.
Starting point is 00:02:35 And Nadia, you and Frannie have done this cool thing, along with many other people, and we'll talk about that. But this open source survey conducted by GitHub. So Nadia, you can say hello as well since you're here. Hi! And we also have Frannie here. Frannie, you're in data at GitHub, but what do you work on there? I am a data scientist. I work on sort of mid to long-term research projects for internal and, in this case, sometimes external audiences. And what is it about data that gets you excited? I really love being the first person in the world to see something, so I like to know something before anybody else does.
Starting point is 00:03:20 Interesting. And carrying somewhat of a tradition of this show, we like to kind of dig a little bit into somebody's backstory. So like, you know, what's your story? You're in data, but you work at GitHub. What's some of your backstory that we can share with the audience to kind of give some context to who you are? Well, my background is academic social science. I was trained as a political scientist, and I got interested in computational social science, which basically means doing social science with data sets that are too big to kind of handle with the normal tools. And so I started doing a little bit of CS and data mining type methodologies.
Starting point is 00:04:03 And that kind of led accidentally to working at GitHub. I actually interviewed before I knew what a GitHub was. And yeah, I just sort of fell into it accidentally. But it appealed to me because really the fascinating data on GitHub is all really social data. It's about how people are working together to build things. And so it's actually a really fantastic place to be doing data work from a social science perspective. I have an interesting history with GitHub to some degree, because I remember sitting in San Francisco in a random office with Chris Wanstrath and Tom Preston-Werner, conducting an interview for a podcast called The Web 2.0 Show. And this was like literally a month and a half after GitHub launched.
Starting point is 00:04:55 So it's been a trek. So GitHub has evolved over the years. It started out as social coding and has done all these cool things to raise, I guess, the bar of open source, but also make it so much more accessible to the world. So GitHub has evolved tremendously. At first launch it didn't need, maybe it did need, data scientists, I don't know, but you can totally see a true need now. And then obviously the survey, opensourcesurvey.org
Starting point is 00:05:23 I guess that redirects to 2017 because you have plans for future versions of this. Is that true? Yes. Yeah. So we're in a new world where GitHub is essentially the, many might say, the epicenter of open source, and now needing folks like Frannie to help make sense, and see data first, and hopefully peel back the layers of what's important out there. One of the things that I find really interesting about this data set too is that, you know, GitHub changed open source and now those people that are flooding in are sort of changing GitHub
Starting point is 00:05:54 a little bit too. And it's really interesting to see some insights into what they think. How did you, like, how did the survey start? Like there's a lot of academic surveys out there. So I'm curious, like what the motivation behind this particular one was. This particular survey was kind of the idea of a guy who used to work on the open source team
Starting point is 00:06:16 at GitHub named Arfon Smith, who was our open data person, program manager for open data. And I think you guys have interviewed him before. Yeah. Yep. So he came to me several months ago, eight, nine months ago, I think maybe more, and had this idea to use the access that we had to the open source community to gather high-quality data for researchers, and in particular academic researchers,
Starting point is 00:06:55 studying open source development and processes, help them get better data than they're able to get otherwise. Because you're right, there are a ton of surveys out there because this is a fascinating domain. It's a weird and unique form of production of public goods that a lot of the world's critical services are based on. And people are really interested in understanding how and why people participate or don't in this community. But it's hard to get good data on it.
Starting point is 00:07:32 Why is that? What makes it hard to get good data? Is it selecting the right kind of people or them self-selecting? Yeah. So for the types of questions that people are interested in, a large part of the problem is getting access to the right people. So this is a community that's kind of over-surveyed. People get a lot of emails asking them to take surveys, and at some point they get tired of doing it. And so it's hard to get the people you want to talk to to take a survey. So sampling is hard.
Starting point is 00:08:06 The unbiased way is really difficult to do. So generally what people have done is they'll go through public records of open source projects and look for people who have committed or otherwise participated in the project and then just email them. But that means you miss this huge community of people that we know are there who are using projects and looking at them, but not necessarily actively contributing to them. And we think that that's a really important part of the community as well. But unless they leave a visible artifact of having been there, there's no way for the researchers to know that they're there.
Starting point is 00:08:48 So by having access to the traffic going to these projects, we have an ability to get to people in a way that is virtually impossible for most other people. Yeah. It certainly makes sense to be GitHub and conduct this survey, because otherwise, on the outside, like you had just said, it's difficult to sort of seamlessly access those kind of people. Let's rewind a bit and kind of touch base on exactly what this is. So this is an open source survey conducted by GitHub in collaboration with researchers from academia, industry, other folks from the community. The purpose was to gather high-quality insights and data from those who are interacting with or even just checking out open-source projects.
Starting point is 00:09:32 A lot of responses, a little over 5,000 responses, not only from GitHub's data pool, but also it seems like random samples from other communities that aren't the GitHub platform. And then open-sourcing that data set. Is that right? So it's a little over 5,500 randomly sampled respondents from open source repos on GitHub, and then another 500-ish responses from a non-random sample of communities off GitHub.
Starting point is 00:10:04 So we know that not all open source is on GitHub. There's a bunch of very important projects that predate GitHub or work on other platforms for lots of reasons. And we wanted them to be part of the sample as well. It's harder for us to access them for obvious reasons. So that sample is non-random, but we did try to make an effort to make sure that they're represented in the data as well. Right. And when you say like random and non-random, that's really important to say for what reason? As somebody who doesn't do data much, why is it so important to make that distinction?
Starting point is 00:10:41 Yeah. So typically the way these sorts of open source surveys have been done is to use opt-in sampling. Basically somebody makes a survey and then they publicize it in lots of places like Twitter or on a website. And they basically kind of broadcast that they're doing a survey and ask people to come to them. And that means that people go out of their way to say, I want you to hear my opinion. And the people who do that are kind of weird, right? They're people who have really strong opinions on things
Starting point is 00:11:12 or they're people who think taking a survey is a fun way to spend 15 minutes, which is weird, right? Like there's lots of reasons why that gets you a pretty biased sample. It gets you a very opinionated sample. It's subject to basically kind of gaming. People will try to send the link to specific groups that they want to take the survey, but maybe not other people. And so the way you get high-quality data that is closer to representative of the whole
Starting point is 00:11:49 community is you randomly select people, hopefully in a way that gives everybody who's there an equal probability of being invited to take the survey. With that in mind, you've got quite a few questions. So maybe not a 15-minute survey, maybe a half hour, 45 minutes. This is 50 questions. What's the rough average that you would expect for someone to dedicate towards answering this? It was actually like an 11 to 15-minute survey, depending on which set of branches you hit. So if you answered some questions in a certain way, you'd get some additional questions. But the average time was something like 11 minutes.
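As a rough illustration of the random-sampling idea Frannie describes (not GitHub's actual pipeline), giving every visitor an equal probability of being invited looks like simple uniform sampling without replacement; the user list, sample size, and function name here are made up for the sketch:

```python
import random

def invite_random_sample(users, n, seed=None):
    """Return n users chosen uniformly at random, without replacement.

    Every user in the pool has an equal probability of selection,
    unlike opt-in surveys where respondents self-select.
    """
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    return rng.sample(users, n)

# Hypothetical pool standing in for accounts that visited open source repos.
visitors = [f"user{i}" for i in range(100_000)]
invited = invite_random_sample(visitors, 5500, seed=42)
print(len(invited))  # 5500 invitations, each visitor equally likely
```

This only removes selection bias at the invitation step; as discussed later in the episode, response bias (who actually chooses to answer) is a separate problem.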
Starting point is 00:12:30 And that was intentional. A really long survey is really tedious and taxing, and you get better data if people don't find it really annoying to take your survey. So we made a big effort to make sure that we used almost exclusively closed-response questions, that we wrote them in a really straightforward way, and that it was easy to take the survey as fast as possible, given the volume of stuff we wanted to cover, so that it wouldn't take up too much of people's time. And we had actually a really good completion rate. It was something like 50% of the people who started taking it finished it, which is really high for a survey of this length. That is really high, because I've visited lots of surveys and I'm like, no.
Starting point is 00:13:18 I was nervous about that. I was afraid people weren't going to take it because it was a really long survey. But yeah, it turned out to be great. Well, surveys in general are pretty tough. You may have some experience with the Node Foundation. I know you do surveys there or have done surveys there each year. Were you involved in that at all? Yeah, yeah.
Starting point is 00:13:36 I mean, we did the one where you kind of blast everybody and you try to get everybody to get in on the survey. The one thing that we do to try to quantify what kinds of respondents we're getting is that we ask a question, how many years have you been using Node? And we have a pretty good idea of what the growth trajectory is. So we know how many users in the overall community have only been using it a year, two years, three years.
Starting point is 00:14:00 And so we know that the respondents tend to be people who are in this slice of our community that is very high. And so we don't consider the results to be, you know, representative of the entire community, more like representative of the people that have more experience, because those are the people that end up filling out the survey.
Starting point is 00:14:16 And we have a few other questions in there that help us kind of slice up the data to know which section of the community it's addressing. We don't really have any kind of mechanism to randomly select the way that GitHub can here. So they can really look at it and get a very truly representative sample. Yeah, and to be fair, we have this problem of response bias too,
Starting point is 00:14:38 where not everyone we invite actually takes it. So only a certain percentage of the people that we invited to take it actually did so. And so there's certainly ways in which the people who did decide to take it are different from the overall community. For example, a huge percentage of people say that they are maintainers or people who have actually contributed. And we know from our traffic numbers that there are many, many more people who are just visiting or using repos without necessarily actively contributing to them. And they are not represented in the data in as high numbers as we see.
Starting point is 00:15:14 So there's still this sort of bias in who decides to participate. But because we invite in a random way, it's a better sample. Not necessarily precisely representative, but it is higher quality than you would otherwise get. Let's find out where this came from, if you don't mind. I want to go deeper into the context, some of the insights discovered from this. But it sounds like this has been in the making for quite a while. You mentioned Arfon Smith, also on season one of RFC. Great episode. We'll put that in the show notes, so go check that out.
Starting point is 00:15:48 But this predates your getting hired there, Nadia. So where does this begin in terms of motivation? What was the purpose? What were the beginnings of this effort? Yeah, the motivation was really just to make data available for people to do interesting and good research on this community.
Starting point is 00:16:14 It's good for us internally at GitHub if people are doing research on what makes open source sustainable and healthy. The data is useful to us as well. And we also have interest in making sure that the data is available not just to researchers, but also broadly available to the people who can kind of do the most with it, which is the community itself. So the idea was really just to use our kind of unique position with regard to the community to be able to create high-quality data
Starting point is 00:16:53 that could then be used by other people to do interesting research, and to help people make decisions on what their communities need and kind of understand the different parts of their communities. You know, like help maintainers understand contributors, help contributors understand users who don't necessarily contribute, things like that. Can you go back to the moment to some degree?
Starting point is 00:17:21 Like, was it some sort of message, an email, was it a face-to-face conversation? Like, what was the original context for, hey, we should do this survey, we need to know this information? And kind of who was there and what was going on to sort of surface this desire? I think it was a video chat between me and Arfon. He just sent me a message, asked me to meet. And I think at this time Arfon was doing his like nomadic thing. He was traveling around the country in a van with his family. And so I don't remember exactly where he was,
Starting point is 00:17:56 but I was in our office and he was in a van somewhere. Last time I caught up, well, we caught up with him for RFC, he was in Canada. So that was about nine months ago. So it might have been then. It might have been then. I think I remember there being some trees behind him. It looked really pretty.
Starting point is 00:18:15 Canada's got a lot of trees. And he just pitched this idea to me. He said, I would love it if we could do a survey of open source, do it in a high-quality way, and make it impactful, where people are using the data to make decisions about what they do, or they're publishing papers on it. Can we do something that would provide good data, help get people interested in using our data for other types of things, and also make sure that when people are trying their best to make data-driven decisions, they're using high-quality data? And I thought that sounded really interesting. I've mostly been working on internal-facing things, you know, the reports for internal people at the company.
Starting point is 00:19:26 And it sounded like a really fun opportunity to learn more about open source, because I'm not a domain expert in it, and to do something that would have a really wide audience and potentially a really large impact. After the break, we'll dive deep into the insights of this open source survey from GitHub. With more than 50 questions, this survey is by far one of the widest-ranging research efforts GitHub has done and released as open data. So we're going to highlight some of the most actionable, important insights from this after this break.
Starting point is 00:20:16 This episode of the Changelog is brought to you by Toptal, a global network of top freelance software developers, designers, and finance experts. If you're looking for contract or freelance opportunities, apply to join Toptal to work with top clients like Airbnb, Artsy, Zendesk, and more. When you join Toptal, you'll be part of a global community of developers who have the freedom and flexibility to live where they want, travel, attend Toptal events all over the world, and more. And on the flip side, if you're looking to hire developers, designers, or finance experts, Toptal makes it super easy to find qualified talent to join your team. Head to toptal.com, that's t-o-p-t-a-l.com, and tell them Adam from the Changelog sent you.
Starting point is 00:20:54 I'm kind of ready to dive into the insights a little bit. Is everybody else ready to dive into the insights? Do it, man. Open it up, Mikeal. So this first one about documentation is actually very interesting to me. It seems like something obvious, right? Like, of course, docs are important, but everybody says that and nobody actually does a very good job of it. And it's interesting here because of the way that it's framed. You have a question about problems that are encountered in open source. And, you know, some of the other things you ask about other than documentation are, you know, unresponsiveness or dismissive responses in issues and PRs and things like that. And I know that across all the projects that are taking getting new contributors seriously, there's been a huge focus on improving that flow and making things nicer and easier to get in.
Starting point is 00:21:46 And maybe not so much on documentation. And now we're seeing this data that really shows that documentation is much, much higher. Like it's, you know, something like, what is it, 95, 96% of respondents? 93%. 93%, wow. That's unbelievable. Yeah. So it looks like, you know, with more than just this question, y'all really dove into this to figure out what is going on here.
Starting point is 00:22:42 But a thing that hasn't been as well studied is what prevents people who would like to contribute but are not necessarily contributing as much or in the ways that they want to. And so we devoted kind of a whole section of the survey to negative experiences in open source, which is, I mean, probably kind of a bummer to take the survey and only get to talk about like the crappy things you encounter. But it was where there was kind of a hole in the existing data, as I saw it. And it's also kind of more actionable than, you know, tell us all the wonderful things you get out of doing open source. Those are kind of self-evident.
Starting point is 00:23:19 Like we know a lot about that already. What we don't know as much about are the types of things that get in people's way and make it hard for them to contribute. I like tying it to our findings that different groups who aren't traditionally super well represented in open source value those processes more. And I thought that was an important way to tie together why documentation matters to getting new contributors too. Yeah, you made a particular point in here about English, and how nearly a quarter of the open source community reads and writes English less than very well. So it's not saying all docs need to be in another language, but it needs to be in a form of English that is not
Starting point is 00:24:04 really complicated, right? That's something actually that I personally discovered while at Node Interactive recently, Mikeal, when I was invited to come out there with y'all and do that. And I was talking to Xia Lu, and she was originally from China, but works at Autodesk and had moved to San Francisco, but also kind of went back and forth, talked about
Starting point is 00:24:25 the Great Firewall of China, you know, the disconnection there. But she made a really big deal about the difference in accessibility of docs because of the language barrier, but then also the time it takes for things to be translated; by the time it is translated, if it ever is, it's kind of too late. Yeah, yeah. The way that we've kind of solved this in the Node project is just that a lot of people that speak other languages
Starting point is 00:24:49 are contributors or committers in some way and they watch doc changes. And so as doc changes happen, they make suggestions about the language to simplify it a little bit to make it more easily translatable, but also that just makes it more understandable. I've noticed a lot of translations get out of date.
Starting point is 00:25:06 I mean, probably not as much for Node, but for smaller projects where a relatively common contribution is someone volunteering to translate the docs, but then that's not a commitment to keeping them updated forever. So yeah, I definitely see the value of, if you're going to write in English, at least simplifying the language
Starting point is 00:25:22 can make a big difference. So I saw a completely different piece of research lately. It tried to quantify all the steps that a contributor to open source goes through, not just the visible ones like the pull request and each comment and things like that, but a lot of the sort of invisible steps that they do on their own, like running the test locally and checking documentation. And I was kind of blown away by the number of times that they check some form of documentation in the project, right? Like they try to find things that are similar. They try to check the
Starting point is 00:25:54 documentation or the code style and all these things. And that should have made me not very surprised when I saw that documentation was so important. But still, I don't think that I had internalized it enough. A huge part of the process of contributing is just going back to the documentation and reading it. Well, no one's too good for docs, right? I mean, you're never expert enough to not need docs is my point, right? No matter how good you are, it's like, where was that documented? And how was that used?
Starting point is 00:26:20 If you can keep it all in your brain, you're superhuman, and I don't know how you got there; write a book about it. It's kind of dumb not to. Like, why remember all this stuff when you can just refer to written documentation when you need it? I don't want to waste that brain space. Right, exactly. So, I mean, when we look at this, it sounds like you had 50 good questions, but from that, at least on the public website, we got five kind of core highlighted actionable insights. One of them being documentation. Frannie, you mentioned the negative interactions section. So there's some insights around that,
Starting point is 00:26:58 how open source is being used by the whole entire world, not just, you know, let's say San Francisco, for example. Using and contributing to open source happens a lot, even on the job, and it's often the default when choosing software. Why, at least on the public website, only share those insights? What's the plan for future insights being shared? That's a good question. So I think we wanted to highlight some of the more actionable things. So things that we knew would be either really highly interesting, that people would really want to know about,
Starting point is 00:27:41 so like the demographic stuff falls in that category. Some stuff that's really actionable, like write documentation, make sure that it's accessible to people with varying English skills. And then stuff that was kind of a really strong signal in the data. It's certainly not all that there is in there, and I think a fully complete treatment of all the data in there would be just overwhelming. People would stop reading it; they wouldn't go through the whole thing. So we wanted to keep it pretty limited to some key things and then leave the rest of it open for other people to find and to research and publish on. Because the original idea was really that the point was the data, that we would be putting out the data for the research community
Starting point is 00:28:33 and for the open source community to do what they will with. And the insights section was more of a, oh, we found all this fascinating stuff, we want to make sure that people actually do learn these things from it. But it's kind of a small chunk of what's in there. I'd also add, for any of those top five or whatever insights, it wasn't like one question to one insight. Each of those was a mix of probably at least five questions per section, of looking at the data and saying, oh, the documentation part is interesting, but then the non-native English speakers part is interesting; what does that mean together? And so some of that was already a
Starting point is 00:29:20 little bit of an analysis, mixing together existing questions. We did go through all of the questions as a group, given Frannie's experience with data and my experience with open source, and we had a few other folks in there too, just saying, given our collective
Starting point is 00:29:39 knowledge, what do we think is really interesting and actionable here? But yeah, it's definitely not complete either. I think it's a little timely too. As we've been recording season two of RFC, we've been able to talk to a lot more people that have raised money for their projects,
Starting point is 00:29:55 for fairly big, kind of notable projects. And a recurring theme seems to be, we have this money and we don't know what to do with it, because there's not a lot of really discrete, specific things that you can spend it on. But documentation is one of those things; you can hire somebody to just write better documentation. Yeah, that's a really actionable thing. And armed with this data to say, actually, this is the most important thing, this is probably something that we should spend money on... that's really actionable and really timely.
Starting point is 00:30:23 That makes a lot of sense, because we often hear, we have money, how do we spend it? And do we spend it on staffing more people to, you know, put more developers behind a project? Or, you know, do we embrace or,
Starting point is 00:30:38 you know, invite community into this? Do we do events? Do we do meetups? Do we do swag? It's like, there's so many unknowns out there. And so being able to like have, you know, unbiased opinions on what really matters certainly gives better waypoints for maintainers and community leaders and open source
Starting point is 00:30:58 to take action upon. You know, without something like this, you're sort of just shooting in the dark. It was fun to see a lot of common wisdom, I guess, passed around, and seeing how that maps to the data that we found. And some of it was really spot on. I was actually thinking of you, Mikeal, in the negative interactions findings. Because you had this tweet from a while ago that was like, don't tolerate assholes. Oh, sorry, Adam. No, it's okay. Go ahead. Heads up.
Starting point is 00:31:27 If you're in a car with a kid, you've got your kids around, hit mute or whatever. Go ahead. Just repeating his tweet verbatim. So it was like, don't tolerate assholes in open source, because new people that see that
Starting point is 00:31:41 will want to walk away, or something like that. You probably remember it better than I do, but yeah, well, there's an interesting Venn diagram that goes with it, which is basically like, assholes are this really small dot,
Starting point is 00:31:53 and then there's people who tolerate assholes, and then a much larger bubble of nice people. And some nice people will tolerate that kind of behavior, but far more just won't. And so you're excluding this much bigger group when you sort of accommodate people who are toxic, right? And that was totally what we had found: a smaller portion of people have personally experienced something, but a lot more people have seen something happen.
Starting point is 00:32:13 And there's a pretty significant number of people that stop contributing to a project when they see behavior like that. And so I thought it was good to actually have data to say this really does matter, and that common practice, or common wisdom, is actually true and useful. It's interesting to see how much negativity there is, though. I mean, even 18% of respondents experiencing that, that's enough. I mean, it's a lot. That means, like, if you interact with open source 100 times, 18 times out of those 100 you're going to get, you know, some sort of negative reaction. I got
Starting point is 00:32:45 negativity today. Hold on, you're interpreting the data a little bit off there. This is of individual people, not individual interactions. Well, that's 18% of respondents. Oh, I guess you're right. You're right. I'll stand corrected then.
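The correction being made here, percent of people versus percent of interactions, can be made concrete with a bit of arithmetic. A rough sketch, where the 100-interactions-per-person figure is purely hypothetical and not from the survey:

```python
# "18% of respondents experienced negativity" is a statement about people,
# not about interactions. If we assume (hypothetically) that each person
# has 100 interactions, and negative incidents hit each interaction
# independently at some rate p, then:
#   P(person sees at least one incident) = 1 - (1 - p)**n
# Solving for p given 18% of people affected:
share_of_people = 0.18   # survey figure: share of respondents affected
n_interactions = 100     # assumed interactions per person (illustrative)

p = 1 - (1 - share_of_people) ** (1 / n_interactions)
print(f"implied per-interaction rate: {p:.2%}")  # roughly 0.2%
```

Under those toy assumptions, the implied per-interaction rate is around 0.2%, which is why reading "18% of respondents" as "18 out of every 100 interactions" overstates things by roughly two orders of magnitude.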
Starting point is 00:33:02 But I'm sure my number is just as accurate. I mean, it's definitely not great. It's definitely higher than you would want it to be, but it's also not out of line with data from similar communities, especially online communities. It's not necessarily just open source that has this problem. But I think the visibility of open source, like everything being open and kind of having a little bit of a viral aspect to it, like people send really kind of fiery issues to each other just to, you know,
Starting point is 00:33:37 take a look at what happened over here. So the visibility might be higher. But this kind of people-being-jerks-to-each-other-on-the-internet is a thing that happens in a lot of places. I think one of the insights here is, like, you have 50% of people who have witnessed negative behavior, and 21% of them have actually left a project because of that. Just from the witnessing, not even necessarily being, you know, the victim of it. Well, that's the truth. It's like, if you just witnessed the negative behavior, you're going to assume that that's a common thing, or a standard, or that it happens often, especially
Starting point is 00:34:15 if you see negativity unresponded to, you know, like allowing it, that's a difference. That's why I think maybe your tweet, Michael, and Nadia, your mention of it, may be right on point. Because if you allow somebody to be that jerk, then, you know, you're not so much at fault, but somebody needs to say, you know, if you're going to participate in this, whatever this is, here are the rules we all agree to abide by, and if you don't, here are the ramifications, and here's how you can contact someone to say, hey, this is happening, can you please make it stop. I want to re-emphasize Frannie's point that the levels of negative interactions that we saw are not that different
Starting point is 00:35:07 from other online communities. Because one thing that was sort of sad to see in a lot of the press reporting was that their takeaway was, open source is really terrible and toxic and everything is just like, you know, Linus Torvalds. And it makes me sad, because I think, at least for me, part of the goal with this data was... I don't want to scare people off from contributing to open source.
Starting point is 00:35:32 And I think it's important to highlight that this stuff, it's not great, but it's not different from other online communities. I think open source gets an unfair reputation for that, and it turns a lot of people off from it. But none of it is great, because humans are just kind of not nice to each other on the Internet, period, when left unchecked. But being able to see that is useful. I think one important difference, though, between GitHub and other online communities is that if you're really into ham radios and you go on the ham radio forum and you experience
Starting point is 00:36:06 negative behavior, your choice is basically, you know, I stop talking about ham radios on the internet, or I continue to put up with this. And on GitHub, it's literally like, I just go to another project that's not terrible. And so you have this huge number of people that actually just leave projects because of it. Right. Yeah. At the same time, you can't make or instill a change unless you measure it, right? That's a known thing. So just having this data alone shouldn't say, oh, this is an issue
Starting point is 00:36:34 or this is to scare people off, as you said. It should be something that attracts some change. Being aware of an issue is the way you instill some change. Exactly. Coming up after the break, we get into a heavy topic dealing with negativity in open source. We also talked about maintainers having to be the police. And an even more touchy subject, the accuracy of this research. Is this a true representation of the overall open source community? All this and more after the break.
Starting point is 00:37:36 This episode of The Changelog is brought to you by GoCD, an open-source continuous delivery server from our friends at ThoughtWorks. GoCD lets you model complex workflows, promote trusted artifacts, see how your workflow really works, deploy any version, anytime, run and grok your tests, compare builds, take advantage of plugins, and so much more. Check out gocd.io slash changelog to learn more. And now back to the show. This is a tough subject not often discussed: the impact of negativity in open source. And sometimes when that happens, people are forced to go into private channels, or to enforce their code of conduct and get into very uncomfortable situations. Basically having to deal with these negative experiences that have real
Starting point is 00:38:29 consequences, not just to the people, but also to the project. And sometimes people just don't interact, or they just don't interact publicly. What do you think? Yeah. People might withdraw from a project. They might keep working on the project, but start working in kind of back channels. Instead of working publicly in the repo, they might start pinging people through email or Slack or other methods to avoid sort of the microscope, the public microscope of attention on the work, or maybe the risk of somebody saying something really critical and that ending up being part of their public professional record. Well, no one wants that, right?
Starting point is 00:39:18 I mean, if we were in an issue and Michael called me a name, I would probably not want to talk to him ever again, for one. And then two, I'd just not be involved in this anymore. I'm done. Or maybe I wouldn't. Maybe I would just take it and just feel stupid. What do you think,
Starting point is 00:39:36 Michael? What are common interactions to... I mean, is that how you would respond? What do I think? Like, you want to test this out and have me call you a name on the internet? Sure. Yeah, sounds like a wrap. No, I think, like, the kind of top line of this
Starting point is 00:39:49 is that people that experience negative behavior... okay, I would put it this way. It's really important to deal with negative behavior when it happens, at the project level. The next point in here is that, you know, the most effective tool here is to ban people. And then maybe even banning people liberally is a fairly good idea. And one of the arguments that we hear over and over again is that, like, you know, using bans and
Starting point is 00:40:14 using other moderation tools, we keep creating a higher and higher bar for the kinds of behavior that does that. But there are a lot of other negative consequences to not dealing with it. Like, you know, not just the person who's the victim of the behavior, but also everybody watching can just move to another project a lot of the time, or they might, you know, move into private channels, which is not good for an open source project. And so it's just really important to deal with the behavior, right? To actually moderate it in some way. Because if you don't, you know, something like three times the number of people that experience it are, you know,
Starting point is 00:40:51 potentially also seeing it, and they're going to do something as a result, right? The problem with the banning and the blocking part, though, to me, is that somebody's got to be the police and somebody's got to be the bad guy, you know, or bad person, so to speak. Which is okay, but that just means that somebody who may not exactly want to take on the role of being the enforcer, so to speak, has to take on that role. And that has to happen in every single project. And that could become an issue just generally.
Starting point is 00:41:22 Like I don't always want to be the bad person and say, hey, you can't play anymore because you've crossed the line here. I get it. It's needed, but it's hard to be that person. We asked about individual users having the ability to block another user. That's separate from asking a maintainer to go in
Starting point is 00:41:38 and block somebody from a project. Actually, the finding is that an individual having the ability to block another user, without having to involve somebody else, is the part that is most effective at addressing problematic behavior. More so than legal intervention, police intervention, ISPs or hosting services: bringing in these third parties at any level is less effective than giving people the power to protect themselves on an individual basis. And so I think the findings suggest that you want to move away from having a kind of third-party policeman and give people tooling to be able to say, whether or not this person is part of this community, I don't want to interact with them.
Starting point is 00:42:31 That seems to be. So a singular-level opt-out, basically. Yeah, because, you know, anything that relies on maintainers, first of all, as you said, it puts somebody in this police role that they probably don't want to be in, or may or may not want to be in. And it also means that you need to have a really responsive maintainer in order to have any sort of ability to do this. Yeah. And, you know, there are lots of projects where the maintainers are not necessarily responsive on the timeline that you would need in order to address it. Well, it certainly adds one more notch to the job role of being a maintainer,
Starting point is 00:43:08 if that were the case. Must also be police. I think that it's important for you to think about these things in your project. But that's a really good point, which is that a lot of individual maintainers are just not engaged enough in the project to even do that. And in fact, there aren't a lot of other people engaged in that particular issue. So it's really enough for the one user that's seeing this, or experiencing it, to just block that person. And that kind of scales down to all of these smaller projects that don't have as much infrastructure.
Starting point is 00:43:40 I'll also just give a shout-out to GitHub making improvements. There is a feature in beta right now for temporary bans at the org level. So you can kick people out temporarily and it's not so much of a permaban. And it's a nice way to say, that behavior is not appropriate, here's a consequence, but you're not banned for life. Yeah, I see this finding as... the people who should pay attention to this finding are people like us, like GitHub, like other platforms. It's a platform-level finding. This is what we need to build in order to
Starting point is 00:44:09 make sure that communities have the ability for people to work in healthy and safe ways. It's not necessarily at the project level, and it's not necessarily on a maintainer to go build something to do this, right? We, the platforms, should make this available. Nice. So earlier we talked a bit about how this has been covered in the media. In that vein, I saw an article in Wired talking about some of the gender imbalance stuff. And one thing that I noticed was that
Starting point is 00:44:41 the metrics that they took from the survey, they were sort of implying were the overall GitHub metrics, and that the overall GitHub users were only 3% women. Yeah, that's not true. I'll let you dig into that a little bit. So can you tell us a little bit about the gender imbalance findings that you found? Yeah, so I mean, they're not good. 95% of the people who gave a substantive answer to the gender question identified as men, and only 3% identified as women. Another 1% identified as non-binary. It's just a profound imbalance. There's no other way to talk about it. Does this go back to the pool that the data represents? You know, going back to the random, non-random question, biased, unbiased. And this assumes that the person was on GitHub, right? They were prompted some way to say, we have this survey, please take part.
Starting point is 00:45:48 Being that it's such a wide chasm between those numbers, 95%, 3%, 1%, I'm just wondering, given that big of a difference, how confident do you feel in the accuracy of that? Meaning, you know, if you took the same survey and expanded it across all of GitHub, and everyone who's ever interacted with GitHub answered, would that still be true? So, the way we sampled was definitely not how you would try to
Starting point is 00:46:18 get all GitHub users. You had to do a really specific set of actions on a licensed open source repo that indicated a sincere interest in open source in order to make it into the pool, and then we randomly sampled from that. So it's definitely in no way representative of the general GitHub user base. Can you share what those actions might be? Maybe to kind of give folks a... Yeah. Is that secret stuff? It's not secret. You had to do... If you download the data set, there's some documentation that has more details than the website.
Starting point is 00:46:54 You had to do something like three clicks on a licensed repo, or visit three licensed projects, within 30 minutes in order to make it into the sampling frame. And that's because it's really easy to fall into a GitHub repo from Google and not really intend to be there, right? So we were trying to make sure that we were only getting people who seemed to have a sincere interest in open source. To get back to the sampling part, there's a possibility that maybe women are more likely to use open source or be interested in it, but not be contributors. And maybe they felt like, when they got invited, we didn't actually mean them. Maybe they don't consider themselves a member of the community
Starting point is 00:47:46 or something, so they thought, oh, we probably don't want to hear from them. That's possible. But an imbalance this large, I think that this is probably pretty accurate for open source. And it's consistent with, you know, basically all other research that's been done on open source communities, which finds something like between 1% and 10%. The reason why I asked this question isn't to deny the accuracy. It's because it makes me sad. It took me a few weeks to process that finding. Yeah. I mean, that's really sad.
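The sampling-frame rule described a moment ago, roughly "at least three interactions with licensed open source repos inside a 30-minute window," can be sketched as a filter. This is an illustrative reconstruction based only on the description in the conversation, not GitHub's actual code; the function name, event shape, and thresholds are all assumptions:

```python
from datetime import datetime, timedelta

def in_sampling_frame(events, min_events=3, window=timedelta(minutes=30)):
    """events: iterable of (timestamp, repo_is_licensed) pairs, any order.

    A user enters the frame if they performed at least `min_events`
    actions on licensed repos within any `window`-long span of time.
    """
    times = sorted(t for t, licensed in events if licensed)
    for i in range(len(times) - min_events + 1):
        # In sorted order, min_events timestamps fit in the window iff
        # the first and last of them are no more than `window` apart.
        if times[i + min_events - 1] - times[i] <= window:
            return True
    return False

# A user with three licensed-repo clicks in 10 minutes qualifies;
# a stray visitor with one licensed click does not.
t0 = datetime(2017, 5, 1, 12, 0)
qualifying = [(t0, True), (t0 + timedelta(minutes=5), True),
              (t0 + timedelta(minutes=10), True)]
stray = [(t0, True), (t0 + timedelta(minutes=5), False)]
print(in_sampling_frame(qualifying), in_sampling_frame(stray))  # True False
```

The point of the sliding-window check, as described above, is to exclude people who land on a repo once from a search engine while keeping people who deliberately interact with open source projects.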
Starting point is 00:48:26 If that is representative of the truth, we've got to do a better job. Well, understanding how the selection works now, I'm not that surprised. I mean, I've seen these other studies that show between 1% and 10%. And I mean, all of them have different issues in their sampling,
Starting point is 00:48:46 but at no point has there ever been data that shows that open source is as good as even the rest of the industry, which is only 22%. And just from my own experience, when you work in these communities, as you work your way up the kind of engagement stack from user to casual contributor, and then eventually into leadership, the numbers just get smaller and smaller and smaller, and women become less and less visible, with a couple of, you know, individual communities as really important exceptions, which should probably be studied so we can figure out how to do this better. But yeah, I'm unhappy about the number, but now I can see why it would be there.
Starting point is 00:49:27 So if someone has my reaction, and this is a question for all of you here, what can we as a community do? What are some ways to fine-tune that ratio to be a bit less of a chasm between the two? One thing that made me happy from the reaction was there were a couple of prominent open source contributors who offered to mentor people that are trying to get into open source. I think from the React community there were a couple of people. So I thought that was really nice, just for people to be sort of aware that if the numbers are that bad, it's really important to keep an eye out for people that are interested in contributing but might need a little bit of an extra push. That's obviously not super scalable, but I thought it was just a nice human response. And yeah, for me, the documentation part of it ties
Starting point is 00:50:16 really strongly back to this: document your stuff and make it as transparent as possible, so everybody understands how to get into it. That was why we did the Open Source Guides earlier this year too, so that it doesn't feel like open source is this big shadowy process and you have to talk to the right people to understand how it works. Some of that ties into also the findings around people who had given or received help from a stranger in open source, and seeing that women were less likely to ask for help from a stranger
Starting point is 00:50:47 because there's sort of the assumption that I'm not allowed to do that, or whatever. So just really going out of your way, knowing that. And it's not just women. I mean, there are a lot of people who are hesitant to contribute because they don't feel comfortable asking for those sorts of things.
Starting point is 00:51:01 So making processes really explicit and transparent might bring more people out of the woodwork than you would expect. That was my take on it. This last point here too: half of contributors say that their open source work was somewhat or very important in getting their current job or role. I mean, that's in the same area we're talking about here in terms of this data being shown on the website. But knowing how crucial open source is in general, but then also at the micro level, of me or someone else getting a future
Starting point is 00:51:32 job, or the dream job, so to speak, how important it is to be able to interact with open source makes it even more important to be welcoming. Because it's that important to them as a person, but also to tech in general. I've also seen data that shows that people with open source experience make more money on average, too, than other developers. So it's also important there. Yep. I think one of the answers to this problem
Starting point is 00:52:02 you can see in this data a little bit, right? You have the differences between men and women on some of these things. And some of the biggest gaps are in code of conduct and welcoming community. This is just far more important to women. And I've certainly experienced this as like a conference organizer where you're trying to invite and get people out. And especially when it's the first time that people are being visible, women are much more cautious about this than men are. Is it the code of conduct and what it says,
Starting point is 00:52:32 or is it the fact that it's there, because somebody knew it was important enough to put there and take the time to figure out what that community's conduct should represent? I feel like at this point, if you don't have it, you're making a different kind of statement. It's not like the code of conduct makes a statement. It's actually not having it that's the statement. And that's a really negative statement.
Starting point is 00:52:54 I don't know if it's, let's use the changelog as an example. Because we literally, as of maybe a week, two weeks ago, three weeks ago maybe, and today is June 5th. WWDC day by the way. Apple good stuff, whatever. But we just recently put a code of conduct in. And that's not because we're idiots. It's because we didn't think we really needed one.
Starting point is 00:53:17 We're a podcast primarily. A group of podcasts. A newsletter. So we didn't really have a community. And we've actually had a membership slash community for a while now. And for me, my paralysis, and I'm not sure if this reflects Jared's opinion, but for me, my paralysis around it was like, I don't know the first step to enact one. Do I write it by hand? Do I adopt one from another community that best represents me?
Starting point is 00:53:45 Kind of going down a rabbit hole here, but the point I'm trying to make is that even us, and I would imagine we're pretty close to open source and we realize how important this is, only recently did we enact one. And I don't feel like we were behind the curve. I felt like we did it when we needed to. You know? You know, we wrote an open source guide about that. Just saying.
Starting point is 00:54:07 Maybe I didn't read it yet. That's good. One of the neat things about the code of conduct findings, we didn't highlight this in the write-up, but I kind of wish we had, was that it has this reputation of being controversial, because there are some loud people on the internet who really, really hate them. But our findings show that they're actually not controversial. So we allowed people to say,
Starting point is 00:54:39 to say, about all these things in that table that you're looking at, from responsive maintainers all the way down to contributor license agreements: is it very important to have, all the way through doesn't matter either way, all the way to very important not to have. So we allowed people to say, like, I really, really don't want to see this on a project, all the way to, I really want to see it. And only something like 4% of people said it was important not to have a code of conduct. The vast majority of people either really want to see them,
Starting point is 00:55:15 they say it's important, or they're kind of indifferent about it. And that includes men. If you throw all the women in the data out, that's still true. So they're broadly popular. You're not actually going to dissuade a lot of contributors from participating if you have one, even though there's this sort of reputation on the internet that you're taking a side, in a culture where most people actually are pretty happy to see one, or they don't care either way. So it seems like a good and easy way to make your community welcoming to the people who care about it, and it's not going to turn away other people. I think either way, it's really about opening the
Starting point is 00:55:56 door, right? Like you said, if people are indifferent, then they're not going to stay or leave because of it. But there are going to be people who don't come at all if you don't have it. Yeah. And it's not just codes of conduct, too. I mean, I think there's also just a broader cultural issue, and I can only speak from my experience, I guess I'll just do that, of sometimes wanting to have permission to do something, or feeling like you're not good enough, and unless someone says you're good enough to do a thing, you're not okay to do it. So it's not like, for me at least, it's not like I wouldn't speak at a conference. It's not like if I see a code of conduct,
Starting point is 00:56:31 it's like, oh, great, this is the conference for me. But it's more like, well, am I qualified as a speaker? Is that okay? And so for me, one of the big takeaways from something like certain groups wanting processes clearly spelled out, or documentation, is just really encouraging people when they're on the fence, to just say, yes, you can do it, just do it. And just being really supportive and encouraging of people
Starting point is 00:56:55 that might be kind of like hovering in the sidelines, unsure whether they want to participate or not. Like everyone can play that role day to day of if you see someone with potential, just encourage them to get out there. Yeah. And I think to call back to what we were talking about earlier with sort of signaling to everybody watching that, you know, you care about this kind of thing. How you handle that pull request and that discussion around a code of conduct says a lot about the project. Because while it isn't controversial among most maintainers, as this data shows, it is controversial among 4chan members
Starting point is 00:57:27 who will show up and just start saying stuff. And either you allow that to become a giant thread that derails the pull request, or you merge it and close it and lock it and say, I'm sorry, but that's not our community. Go away. And that sends the right kind of signal, right? That's a good takeaway. The people who are really loud about this stuff are not actually representative
Starting point is 00:57:51 of the silent majority. Coming up after the break, we talk about the relationship of businesses and open source, how they use it, whether or not they encourage contributing to it. We also examine how open source has become the cultural default in many cases, what people actually value in open source, because you might be surprised, and what you can expect from the future of the survey. This episode of The Changelog is brought to you by our friends at Microsoft and Azure Open Dev Conference, their upcoming no-cost live virtual conference that's focused on showcasing
Starting point is 00:58:46 open source technologies on Azure. Engineers are looking to bring more of the open source tools they know and love to the cloud, but often need a grounding on what to look out for and what to expect. The fastest way to learn is to see live demonstrations and get time to Q&A with experts in the field. Microsoft is providing this at no cost. It's a virtual event, which means you don't have to travel anywhere. Reserve your spot today. Head to azure.com slash open dev. That's A-Z-U-R-E dot com slash open dev
Starting point is 00:59:20 to register for this free live event from Microsoft. It is on June 21st, 2017. And now back to the show. I can't remember the exact quote you have, Michael, but you had a quote on the bonus episode, the season one behind-the-scenes episode of Request for Commits, about businesses and their relationship with open source, and how it's severely skewed. With so much open source contribution happening on the job, as this data shows, it kind of reflects back onto that idea that more companies should help sustain open source. And that doesn't mean just money.
Starting point is 01:00:09 It might mean the 10% time or the 20% time being allowed towards open source. We've been talking about this on Request for Commits. What do you all think about this finding? Are you surprised? Are you enlightened? I have a couple of things that are surprising about this. So one is, I was really surprised by
Starting point is 01:00:28 the number of people that found certain policies unclear. That was really interesting to me. And I'm very curious, do they contribute anyway when it's unclear? Do they use it anyway when it's unclear? Yes, they do. There are actually separate questions like, what's your employer's policy
Starting point is 01:00:48 and what do you do? The people who say that it's either unclear or they don't know what the policy is, their practices look like those of people who say their policies are supportive of it. So in the absence of any clear rule, people will do it. And I think that demonstrates you kind of need to in modern software development. Open source is pervasive. You can't not use it, and you need to fix things if they don't work.
Starting point is 01:01:18 So if you have any justifiable way of doing it, you will do it. That's what I took from that. So, see, I find that really fascinating, right? So they're doing it anyway, but it's not like they're being encouraged to do it. Like, I think the people that say that they have a permissive policy, that means that their employer has been very clear that they can and that they should, right? Permissive means either your employer is encouraging of it, which was a little less than half, like 46 percent, or they were accepting of it. Like they wouldn't tell you not to do it, but they wouldn't necessarily go out of their way to encourage it. So what is the disparity between people encouraging the use of open source versus the contribution to open source? I'm sorry, I don't quite understand the question. So using open source applications and using open source dependencies are incredibly high,
Starting point is 01:02:16 right? So you have the permissive, which I think covers, encourages and allows, right? Yeah. Yeah. So that's really high, right? It's in like 80%. But it drops considerably when you talk about contributing. So clearly there's like a huge disparity here between people that are being strongly encouraged to use open source versus like,
Starting point is 01:02:49 and then they're not encouraged in the same way to contribute back to open source. Right. Well, specifically, the question asked about IP policies on non-work contributions. So it might be that people don't know what their IP policy specifically is. Um,
Starting point is 01:03:03 it may not mean that they're not allowed to do it or not encouraged to do it, but like that specific policy they're not super familiar with or that there's something about non-work contributions that are different. We actually didn't ask about contributions on the job, which we should have. That was an oversight. Next year, we'll ask about it. Does that mean they're on work time, they're contributing to non-work open source
Starting point is 01:03:31 and they're not sure about their ability to contribute to non-work open source while working? It could even be off work, though, because people have employment agreements that say that everything that they produce is owned by the company, whether they're working or not, or even depending on the hardware they're using. I think it's really telling, right?
Starting point is 01:03:50 It's not surprising that everyone's using it, but it's unclear how you're able to contribute back. And hopefully that will change. So should that just mean that employers should make it a bit more clear about the company's relationship with open source and the permissions around it? Well, but to come back to your point, right, you were saying that they should dedicate some amount of time. And I don't think that we're at even the place where we can say that they should be dedicating some amount of time because what this looks like is that employers are encouraging people
Starting point is 01:04:19 to use open source and depend on open source at a rate that they're not telling them to contribute back. And so just getting those to the same level might be enough, without dedicating four hours or whatever. Maybe we should be setting our goals a little bit differently. I mean, they definitely go hand in hand. If you don't have the policy and you don't know what it is, then you're definitely not going to just contribute before you know what the policy is. If you don't know the policy, or if there even is one, you're not being encouraged to do so. Yeah, I mean, it sounds like people are anyway.
Starting point is 01:04:50 And I think that that's kind of telling, too, because it looks like they're doing it because there's no other way, right? There's no other way for them to get this bug fixed in their critical dependency unless they have this policy. So they just kind of ignore it. I did notice that there was a restrictive, like people that actually are restricted from contributing and using. And that's the most offensive to me, even though it's a really small number,
Starting point is 01:05:12 because it looks like, I can't eyeball the numbers from the graph, but it's either two or three times as many people that are, so people, that means that there's like, you know, a bunch of people that are being highly restricted from contributing to open source, but are not being restricted
Starting point is 01:05:30 from using it in any way. Which is just offensive. Yeah. The last one, though, is a bit telling around the last inside list. Open source is the default. Did you really need to do the survey to find that out, though?
Starting point is 01:05:46 But you know what I thought was... It's funded back up your claims with data, though. Right, of course. We need that. What I thought was really interesting about this is it wasn't that people necessarily thought open source was better among most parameters besides security. People actually
Starting point is 01:06:02 think proprietary does better in a lot of situations, and yet, vast majority of people will still seek out open source options which says something a little bit more interesting to me of it's not just like people recognize that open source is better than proprietary and that's why they do it it's like they don't even know if it's better or not but for some reason
Starting point is 01:06:18 they're just going to keep using it anyway because it's so like culturally default at this point that like that's just what you do that it doesn't even matter what the quality is. And that kind of says something too about the state of where we're at with open source. Everyone is going to use it. It's also free though.
Starting point is 01:06:32 But interestingly, people didn't value the cost as much as, cost was not even one of the top reasons what people value in choosing software, which is also really bizarre to me because you would think that those two things would be explicitly tied. I think it goes even, I mean, this is just me guessing, but I think it goes even beyond that of people just do open source because they hear that's what they're supposed to be doing. People aren't really thinking twice about what,
Starting point is 01:06:56 even that it is open source, or the cost or anything. It's just like, this is what I do. I take this software, it comes from magical berries somewhere, and then I just put it in my software and that's it. They heard React was cool, and so they used it. Right, exactly! I mean, that matches up at least with anecdotally how I understand a lot of software gets made. Yeah, stability, security, and user experience are the highest of the importance graph, yet 72% say they always seek it out. And that kind of says to some degree, regardless of stability, security, and user experience, because as you said, it's the cultural default.
Starting point is 01:07:34 And you can see cost is kind of like down there in the middle somewhere. Yeah, I mean, I didn't answer the survey. I wasn't one of the lucky ones who clicked five times and went through the special portal. So I didn't get to answer this, but't one of the lucky ones who clicked five times and went through the special portal. So I didn't get to answer this, but to me, I pay for things,
Starting point is 01:07:49 right? I pay for things that they, if I value them, I pay for them. I don't care how much it costs, but cost matters. But you know, like I'm not seeking it for free.
Starting point is 01:07:58 It's, it's like if it matters, you pay for it one way or another, whether it's involvement or actual dollar exchange yeah it probably matters like kind of where in someone's life cycle they are like i bet if we like made this plot and split it out by whether people were students versus employed it's true like maybe cost would be that's true like even if i didn't uh if I didn't have the money to pay for it and I still valued it, I'd be like, well, I can't buy it, so I'm a student.
Starting point is 01:08:31 Yeah, I haven't looked at that. Someone should look at that. I bet it's interesting. So something like 20% of the data is people who are students. Well, that's a good point right there, I think, is that someone should look into this. So this data that you've pulled back is all open source.
Starting point is 01:08:47 The GitHub repo is linked up on the website opensourcesurvey.org, so it's not like you can't go and find it. You can download the data and just get started. There's a big download button at the bottom of the site. So this, what we've been talking through is your findings of this. This is your insights from this.
Starting point is 01:09:03 That doesn't mean that somebody else might go and that doesn't mean the questions changed, but someone can go back to this data and kind of pull back more insights than the five you've shared here. You've dug into quite a bit, shared a lot of details, even came up with some graphs to kind of share all the data points that we've talked through. Yeah, the whole point is that it's an open data project, so we hope people will use it and learn from it.
Starting point is 01:09:28 There's a lot more in here than we've covered. A lot of really fascinating things that we found that didn't have room for in the write-up. So please do go analyze it and tell us what you found. We really want to know how people are using it because we'd like to do this again. We want to know what what was useful what didn't turn out to be useful how are people using it can we expect like maybe uh this to be a once a year thing twice a year thing i mean open source moves fast so everybody's just trying to keep up so should we do this once a year twice a year
Starting point is 01:09:59 what's the what do you see the future of this survey becoming? You know, so it was a ton of work. So I've been sort of thinking like once every two years, but if we get a lot of people are using it and a lot of people are finding either the insights or the raw data really valuable. And people have ideas about things they want to know over time or new questions they want to ask. I think now that we've done it once, I have a pretty good idea of how we could change how we did it so that it would take less time. Well, the insights part seems to be the most time consuming.
Starting point is 01:10:41 Actually conducting or allowing somebody to take the survey seems to be a pretty passive type of role actually you know the hardest part was writing the survey uh because i didn't have like i wasn't familiar with uh kind of the existing research on open source so i had to go learn about it and I had to go write a whole 50, 54 question instrument with a lot of help from our collaborators in academia and industry. I don't know if we've talked about them yet, but like tons of contributions from people who are doing a lot of active research in this field. But that was actually the,
Starting point is 01:11:24 I think the most time consuming part of it well now that the without having done all the work you've done of course i can have this point of view but now that it's there do you see the questions needing to change very much to continue the survey does it have to? Or is this something that just sort of can kind of keep operating on the random selections, as you mentioned to Michael's question a bit earlier? Can the same 50 question, 54 question survey keep going to kind of keep gathering and kind of keep maybe a real time pulse on the results? Yeah, I mean, that's a good question. So one of the things we've tried to be
Starting point is 01:12:04 conscious of is that open source is a community that's over surveyed. And so we don't want to add to the noise, right? Like people in open source are constantly getting emails asking them to take surveys. Partly, we hope that we can cut down on some of that uncoordinated research efforts by providing one single high quality data set that everybody uses instead of trying to like piece together their own. You know, we don't want to like bother people with research unless we think that these things are actually changing over time but if uh if there is like if people want to know like we saw this last year we want to know how it's changed a year later because we've invested a lot in say our documentation efforts uh that that's certainly something that we could consider doing i was sort of thinking that we would do a whole new every time we do this we just just ask completely different questions to try to open up new avenues of research. But it kind of matters what the research community and what the open source community wants out of this data. And obviously open source is, hopefully in two years from now, there are going to be so many different things that open source will be facing, given the pace that it changes at. Yeah. so many different things that open source will be facing given the pace that it changes at.
Starting point is 01:13:28 So before Adam closes out the changelog here, I'm just going to do a complete takeover for RFC. Please do. So RFC, we focus on open source sustainability. And I'm staring at this figure about what open source users value in the software. So why are people actually using open source? And so we know that there's just these widespread sustainability issues. And the first things that they impact, and the things they impact the worst, are stability
Starting point is 01:13:55 and security, which are the most important things to people. And conversely, the classic business models around funding open source rely on support or new features. And support is ranked the absolute lowest thing that people care about, with innovation being second to that. So it's just like literally the things that people care about and how we've traditionally looked at funding are at opposite ends of the spectrum. I was hoping someone else would notice that. I'm glad you pointed that out. I think it's funny because almost always when people talk about turning an open source project into a quote-unquote real business or whatever, it's like, oh, we'll just offer support and services. And it's like, actually, people don't really care.
Starting point is 01:14:39 Well, that should be a clear indicator to anybody who's going that route to say that's probably or could be the wrong route to go. I mean, it'd be nice to do this in a year and see if that number changes much, though, because I'm very surprised that those two are the lowest. Let's see if support gets lower. I don't know if it could get any lower. It's literally right above less important. Well, that scale is... That's not the full scale.
Starting point is 01:15:08 I think one thing to keep in mind is that these things... We know these things actually do vary quite a bit by community. So actually, this particular set of questions was taken from some ongoing research by a lab over at Carnegie Mellon that studies differences in the values of different ecosystems. And so they've done this among a number of communities and found differences in what different open source communities value in the things that they build and in their own processes. And so this is sort of overall, if you aggregate all of the projects together, this is what falls out of that.
Starting point is 01:15:49 But there's probably significant variation between communities in what they value. So for your individual project, it may not be the case that support is the least important thing. But when you aggregate everyone together, that's how it falls out. Any call to action for those listening? So we got lots of people who listen to this show, is the least important thing. But when you aggregate everyone together, that's how it falls out.
Starting point is 01:16:06 Any call to action for those listening? So we got lots of people who listen to the show, a lot of people who care about open source. Either they're contributors, they desire to get into new languages, they listen to the show for various reasons. It all chasms of developerhood, so to speak. You know, what any core call to action could you give?
Starting point is 01:16:25 I mean, obviously go check it out, pull down the data, play with it if you're a data scientist or anybody to sort of gather your own insights. But what other call to actions can you give to the listening audience? I really hope that open source projects use some of this stuff to figure out how to get new contributors on board and how to strengthen their communities
Starting point is 01:16:40 because I think there's a lot of really good insights around that. So opensourcesurvey.org. And I also mentioned that at the very bottom, you can subscribe to Survey Update. So you can put your email address in there, click the button subscribe. And I guess that means that we'll obviously have to do another show like this because this was super fun. I mean, I love just kind of roundtabling this, you know, kind of digging through everything
Starting point is 01:17:01 and getting to hear different perspectives. It's been a lot of fun. So Nadia, Michael, Franny this has been fun, thank you thank you the changelog is produced by myself Adam Stachowiak and also Jared Santo we are edited by
Starting point is 01:17:19 Jonathan Youngblood and our theme music is produced by the mysterious Breakmaster Cylinder you can find more episodes like this at changelog.com or by Blood and our theme music is produced by the mysterious Breakmaster Cylinder. You can find more episodes like this at changelog.com or by subscribing wherever you get your podcasts. Thank you to our sponsors, Sentry, TopTow, GoCD, and also Microsoft with their Azure Open Dev Conference. Check that out. Also, thanks to Fast fastly our bandwidth partner head to fastly.com to learn more and also
Starting point is 01:17:47 linode our cloud server of choice head to linode.com changelog and we'll see you next week Thank you.