Python Bytes - #74 Contributing to Open Source effectively

Episode Date: April 19, 2018

Topics covered in this episode:
* Contributing to Open Source effectively
* Jupyter, Mathematica, and the Future of the Research Paper
* Depression AI
* Extras
* Joke

See the full show notes for this episode on the website at pythonbytes.fm/74

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 74, recorded on April 17, 2018. I'm Michael Kennedy. And I'm Brian Okken. And I'm Matt Harrison. Yeah, that's right. We've got a special guest here on the show. So Brian and I decided to invite Matt Harrison to mix things up a little bit and bring a slightly different perspective. And we decided we're going to try this little experiment from time to time, you know, once a month, once every six weeks,
Starting point is 00:00:27 something like that. So Matt, welcome to the show. Thanks. Thanks for having me. Good to be here. Good to have you here. So before we get into the topics, I just want to say thank you to Datadog. They're sponsoring this episode. Check them out at pythonbytes.fm slash Datadog. I'll tell you more about them later. Brian, you've been running your own open source project lately, and it's been fun, right? Yeah, I started a little project. I called it Cards, but the intent was to, it's on GitHub, and the intent is to talk about how I'm going to go about testing it
Starting point is 00:00:57 using all the, what I think of as good test methodologies. But doing it in the open on an open source project, I'm getting contributors already, which is really cool. That's awesome. I don't really know the conventions. And even though I've been programming for decades, I'm kind of new to the actually contributing to open source. Well, and it's different on GitHub than it was, say, in Subversion or something like that, right? There's the whole Git flow and PRs and that whole thing, right? Yeah, and how to deal with branching and forking with GitHub
Starting point is 00:01:28 versus doing something out in the open on GitHub is one thing, but actually running it like a normal open source project is completely different. So I had a, actually, Anthony Shaw did a pull request on cards and he did a really good job on it. And one of the things is he started it with a WIP for work in progress. And I didn't know that was a convention already.
Starting point is 00:01:52 So I looked around and there's some, if people are new to this and want to learn more, I'm sure everybody out there already knows about open source stuff. But anyway. I would not make that assumption. I definitely would, I mean, it feels that way, but I think it's more of a dark matter sort of experience where, like, you see the stuff that people are doing, but that doesn't mean everyone is participating. And many people, I think, are just wanting to get into it. What do you think, Matt? I've been using open source since the mailing list days, and you had to talk over the mailing list. And so GitHub, when it first came out, was sort of a big change for me. And I think a lot of people use GitHub as just, I'm going to throw my source over the fence and see what happens to it rather than maintaining it as a real open source project and trying to include community and get contributions.
Starting point is 00:02:39 So I'm curious to see how this works. And I think, like Michael said, don't make the assumption that everyone knows how to do these things, even though some people are doing it. There's certainly a lot of people who can learn from this. There's a couple of things I wanted to throw out. This is from 2015 already, but it's an article called How to Write the Perfect Pull Request. And it kind of talks about the philosophy sort of both with approaching the pull request when you're first getting started. And it even talks about the WIP trick to tell the person that owns the repo, you're not done with it. You just kind of like early feedback on it. And then there's some advice on offering feedback to pull request submitters and also responding to that feedback as the submitter as well.
Starting point is 00:03:23 It's a short read, but there's some good tips there. And then I'm also really excited about VM's new book coming out from Pragmatic called Forge Your Future with Open Source. And it includes things like how to deal with pull requests and everything. That sounds like such an interesting book. I love it. And I'd just like to say that, like using open source and getting involved for someone who wants to get a break into learning Python or learning a library or even getting a job, that's an awesome way to do it. I agree. And one of the problems people run into is they work in a place that doesn't use Python. And so they don't have a place to actually practice Python outside of just like toy things. So contributing to open source lets you make a meaningful contribution, even though maybe you're a Java or .NET shop and they're like, Python, no.
Starting point is 00:04:12 Yeah, they might even be using Subversion or some other source control. Absolutely. Cool. This is a good one. This is a good one, Brian. So Matt, you and I both do in-person training periodically. And I just did a class last week, I guess, a short class. And it was a lot of folks coming from a traditional sort of MATLAB, Mathematica background and moving into things like Jupyter. And I think that this might be a trend. What do you think? Yeah, I think we're seeing Jupyter changing from early adopter to sort of normalcy.
Starting point is 00:04:43 I found a thing going around Twitter. Paul Romer, who's an economy professor at NYU, tweeted about his experience with Mathematica and Jupyter. And he referenced an article in The Atlantic about both of those products, Mathematica and Jupyter. And for me, like I said, I've been involved with open source for a long time. You don't often see stuff in The Atlantic or professors, economy professors. This isn't a computer science professor. This is an economy professor posting about Python and Jupyter. So really cool stuff there. This is such an interesting find.
Starting point is 00:05:18 And if you open, people should open up the Atlantic link that you're linking to here because, wow, that's a pretty provocative picture. There's like a formal paper that says the scientific paper is obsolete. And there's like a paper, like an academic paper that's literally on fire, like in an animated way, like full screen. It's crazy. Yeah, and it makes reference to the discovery of gravitational waves and how there was a paper on that, but they also, along with that, published a Jupyter notebook where you could go out on your own and you could look at their code, look at their data, and it had embedded text in it as well, and basically discover gravitational waves or go through the same sequence and reproduce their science.
Starting point is 00:06:01 So I thought that was pretty cool. A quote from Mr. Romer, he says that Jupyter is the new open source alternative to Mathematica that is well on its way to becoming the standard for exchanging research results. I agree. I think academics has been too dependent on these couple of big, really expensive lock-in type of things like Mathematica and MATLAB. I'm also thinking of journals and stuff like that. This sort of open source paper in the form of Jupyter kind of touches on both of those. Brian, what do you think? I'm actually really excited about all that. I was just listening to a topic not too long ago about how the, actually it was one of your podcasts, about the
Starting point is 00:06:38 academic journals that are, a lot of times nobody actually follows the steps along, but having the code out on Jupyter notebooks just allows everybody to go and follow along right there. And one of the main points of these articles isn't that there's a notebook per se, but the compelling reason for using Python and Jupyter is not necessarily that the technology is better, but that there's a huge community around it. So, you know, they make the argument that the Wolfram notebook might be prettier or whatnot, but you have so many people who are contributing to these open source projects. You've got Matplotlib for graphics, SymPy for symbolic math, NumPy, SciPy, Pandas, NLTK. And, you know, if you look at PyPI, there's 135,000 packages last week on that. And it's really hard to compete with that. That's super compelling.
Starting point is 00:07:29 So really cool. Yeah, it is super compelling. So speaking of community, Brian, you know I love to like pull on the Stack Overflow developer survey and like try to dig out results from the community, right? Yes. Yeah, that's always fun. So there's another one that just came out that gives us a different perspective and also is more Python focused than that one, right? That's like broad software development. So JetBrains, the PyCharm team, teamed up with the PSF just at the
Starting point is 00:07:57 end of last year to do a Python developer survey. And so the thing I'm linking to is Python Developer Survey 2017 results. But it's like December, right? So it's pretty relevant still. It's pretty fresh. They were just talking about it on their blog. And so this is a really nice piece of sort of almost journalism around data science, I think. They've really written it up nice. They show you graphs and they're like, here's the main takeaway from this section. Here's the main takeaway from that section. So how about I share some takeaways with you? Yeah. All right.
Starting point is 00:08:28 So the first one is, says of the people that they interviewed. Now, this is obviously a self-selecting crowd. But the question was, you're obviously doing Python. Is this your main language or a secondary language? And they said 80% of the people, Python is their main language. That was pretty cool. They said data analysis is actually just as popular as web development, which is pretty cool. So there's basically as many Python web developers as there are data scientists.
Starting point is 00:08:59 Does that surprise either of you? To me, Python felt like a web thing for most folks. Yeah, I mean, recently it wouldn't surprise me. But, you know, if you would have said that two years ago, it probably wasn't the case two years ago. But now, yeah, it's not surprising to me. Yeah, they also talk about the growth of Python. And Brian and I, you and I, we've touched on this a few times, but they're also confirming like we think that massive hockey stick growth is largely data science people coming in. Yeah, it could be. There's I think there's a lot of room to grow. There's a lot of people who are using
Starting point is 00:09:28 Excel and some of these tools that you mentioned who probably want to migrate to something like Python for the libraries and machine learning capabilities. Yeah. Another interesting one was Python 3 versus legacy Python. So Python 3 is at 75% usage among this group and 25% for Python 2. And if you look at the curve, that's like increasing in time, like the rate at which people are moving to Python 3. So that's really cool. Yeah, that's cool. Yeah, you wonder how much self-selecting is there, right? The legacy laggards didn't want to participate in this. That's right. I don't even know the stick in surveys. I learned everything about Python I need to know in 2008. Okay, so they also talked about where code runs,
Starting point is 00:10:11 where people run their Python code. And this is, I don't think it includes the hosted notebook type stuff, so probably not that. But 67% AWS. Brian, does that surprise you? I'm going to plead. I'm not in the field in the web sort of space to know really where it's running. My basis for sort of judging the use of AWS compared to
Starting point is 00:10:33 other platforms is when AWS goes down, what parts of the internet are no longer accessible? And they're pretty broad. Yeah. Yeah. I would think that it'd actually be a little bit higher than that. The ones that surprised me was you've got Google App Engine at 29, Heroku at 26, Digital Ocean at 23. And the last one they say is Microsoft Azure at 16. And I think that 16 is probably going to change a little bit. They've been doing a lot of hiring in the Python space and getting some prominent Python people.
Starting point is 00:11:01 So there's going to be a big push from Microsoft on that. They're definitely focused on Python in a lot of important ways. They now have Azure Notebooks. They have Brett Cannon, Steve Dower, both Python Core devs working there. They brought the guy who did the Python extension for Visual Studio Code in-house.
Starting point is 00:11:18 They're doing a bunch of cool stuff. All right, so a few more takeaways. Team size, right? You think of how big of a group do you, you know, like how large of a team do you work on? And if you think about like one of the advantages of Python is you don't need a large set of people to build something interesting.
Starting point is 00:11:35 And I think that's reflected here. So it says like team size, two to seven people, 75%, 74% of the respondents are in that two to seven group. And then eight to 12 is 16. And then basically above that, above 12 people, all the way up to like 40 or larger, is 9% of the balance, basically. So really small teams. And then operating systems, Brian, you touched on this a lot. 49% of the people are still using, are currently using Windows as their OS.
Starting point is 00:12:03 Then 19% for Linux, 15% for Mac. And like you said before, Windows often gets the short end of the stick and sort of testing and examples and stuff, but it probably shouldn't. Yeah, one of the things I want to go back to is the cloud platforms that we talked about. One of the things that's interesting there
Starting point is 00:12:20 is that clearly some people are running on multiple platforms because that's over 100%. Yeah, that's interesting. I can tell you for sure that if somebody asked me which of these platforms do you use, I would definitely check the DigitalOcean and AWS boxes. Okay. Because, for example, the main server for our podcast and the database server runs on DigitalOcean droplets. But when you interact with it and you get like say an email, especially around the training stuff, that goes through Amazon simple email notification service
Starting point is 00:12:50 and things like that. So like there's this blend of them. Yeah, I'm similar. I've done Heroku and DigitalOcean and both had stuff in S3 as well. So it's not a either or. One of the things I thought was interesting was the operating systems.
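The "over 100%" observation about the cloud platforms is simple arithmetic on the figures quoted in this episode. A minimal sketch, assuming only those five quoted percentages (the survey may have listed more platforms, and the variable names here are illustrative):

```python
# Cloud platform shares as quoted in this episode (percent of respondents).
cloud_share = {
    "AWS": 67,
    "Google App Engine": 29,
    "Heroku": 26,
    "DigitalOcean": 23,
    "Microsoft Azure": 16,
}

# If every respondent could pick only one platform, this could not exceed 100.
total = sum(cloud_share.values())

# Dividing by 100 gives the average number of these platforms picked per respondent.
avg_per_respondent = total / 100

print(f"{total}% in total, about {avg_per_respondent:.2f} platforms per respondent")
```

A total above 100 only makes sense if the question allowed multiple answers, which matches the hosts' own experience of mixing DigitalOcean, Heroku, and AWS services.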
Starting point is 00:13:03 I mean, like you said, Windows tends to get, you know, people have something in their heart against it or whatnot. But I was surprised that Mac was so low on this. Yeah. Interesting. I wouldn't have thought that at all. But you go to the, I'm telling you, dark matter developers, that's what it is. Yeah.
Starting point is 00:13:22 It's interesting. I think the story on Windows is going to get better. I believe the new version of Python is going to use MSBuild and not Visual Studio 2008 for its compilation stuff during install, which means like modern versions of Windows will be able to install stuff without like installing a 2008 version of Visual Studio, which will be real nice. All right, so before we get to the next one, I want to just tell you both a little bit about Datadog. So speaking of stuff hosted in the clouds and spanning multiple machines and things like that, Datadog is a monitoring solution that provides
Starting point is 00:13:57 visibility and tracks down issues with distributed systems involving python applications so within just a few minutes, you can find bottlenecks in your code by exploring graphs and rich dashboards, and you can visualize your whole performance across all of your apps, which when you're doing distributed programming or distributed apps, microservice type things, that's a huge deal. And you can go to pythonbytes.fm slash Datadog, do a quick little trial there, and you'll get a free Datadog t-shirt, which is pretty cute. So check them out and let them know you appreciate them supporting the show. All right, Brian, I'm a big fan of databases, especially shiny new ones.
Starting point is 00:14:35 You've got a really new one yet. Like I can't even get this one yet, but it's still pretty cool. Yeah, you can't get it. But one of our listeners, Arash, I think that's his name. Anyway, he let us know about EdgeDB. And EdgeDB has a blog post up. It says, EdgeDB, a new beginning. And at first I thought, yeah, okay, we'll keep an eye on this. And maybe we'll cover it later when we can actually play with it. Because it's a new database that's not available to use yet. However, it's going to be open source.
Starting point is 00:15:05 And the reason why I brought it up now is because it's coming from some fairly interesting people. It has some pretty powerful Python origins, right? Yeah. Well, so like, for instance, Elvis and Yury, and I'm not going to try to pronounce their last names or they will flame me, are part of this. And they're the people that brought us asyncio and uvloop. So that's pretty impressive. That's very impressive. One of the things that's interesting is looking at the kind of code that you get with this.
Starting point is 00:15:32 So they're trying to attack, the problem they're addressing is that document databases have some issues with just scalability after your project gets larger. The schemaless part of it sometimes can be hard to deal with. A lot of people deal with it fine, but they see it as part of a problem. And relational databases are growing a lot. And Postgres, for instance, is keeping up to date. But the interface, how you interact with the database, the schemas, and the underlying API to the database hasn't changed much in a long time. So they're trying to change that. And I forget what they call it, an object relational?
Starting point is 00:16:14 They call it an object database. Object relational database. Yeah. Yeah, yeah. Yeah, not like the traditional ones, they say, from the 80s. Yeah, so one of the things to look at, if you're going to look at anything, is to go to the link and look at the example. They have a new query language called EdgeQL. So they have a different way to write a schema that's fairly, doesn't really look like Python, but it's type-based and it's fairly expressive. It's pretty interesting. So instead
Starting point is 00:16:41 of saying, like, let's have a class and map that to the database, like, say, SQLAlchemy or MongoEngine might, they said, we're going to define our own data definition language, our own DSL. But it's really incredibly simple, like doing to look it up. And this is like, you want a foreign key relationship, you say link assignees goes to a user definition and the cardinality is double stars. So I'm guessing multiple many-to-many sort of thing. And that's, like, incredibly short there. So my first impression was like, really? A new schema definition language,
Starting point is 00:17:21 a new query language? Like seriously? Like it's just like, okay, well, I'm tired of SQL and I'm tired of the other ways of programming, so we're going to invent like another thing that people are going to get tired of. But it's starting to grow on me. Matt, what do you think about this? Whenever you say I'm going to invent something that's going to replace SQL, I think you hear a million developers cringe because they all know SQL, right? But I think if you can get the five minute out of the box presentation where it's like, this is a compelling reason to use it. And
Starting point is 00:17:50 you know, everyone, or at least most Python people I know want to use like an ORM and interact with the database that way. But there is this impedance mismatch with those. So if you can nail that down and have like a really smooth five minute out of the box experience with this. I think you could get a lot of people interested in them. Yeah, it's pretty interesting. I'm glad you brought it, Brian. Thanks. All right, Matt. So you're a fan of the Wizard of Oz is what I'm to draw from this next one. Yeah, yeah. Follow the yellow brick road. So I've been I do corporate training and I do consulting. And one of the things that I do when I'm doing data stuff is visualization. Visualization's pretty important.
Starting point is 00:18:28 I mean, I've literally found bugs by visualizing something that we couldn't have found just by looking at the data necessarily. And so visualization is also important in the evaluation of machine learning projects. And one of the projects that I've been liking and using recently is a project called Yellowbrick. So I guess this will take you to the Wizard of Oz if you follow it. It's not a new project necessarily, but it's a project that's alive and going and being worked on still. And what it does is it offers visualizations for various machine learning algorithms. So if you use a tool like Scikit-learn, you can go to their website and they'll have all these visualizations up there. But those aren't included in the library for Scikit-learn.
Starting point is 00:19:08 You either need to... You've got to go create them yourself, right? Yeah, you've got to either copy and paste their code or go find some Stack Overflow. So what I've been doing, I mean, I have a project on my GitHub, MLViz, that I just have my own. Here's the visualizations that I commonly use. And then I use my little library. But I'm looking to replace it with this. And I've been using this for some of my training as well recently.
Starting point is 00:19:30 So it's got visualizations for classification, regression, clustering, and text. One of the cool things about it is if you're familiar with sklearn or Scikit-learn is that it has a similar API to that. So there's a fit. You can fit your visualization, you can transform it, and then you call this method called poof, and that will pull up a Matplotlib plot for you. Poof. That's the magic method that they have. How do you spell poof? P-U-F-F? P-O-O-F, poof. Yeah. Poof. Gotcha. Yeah, perfect. Love it. So just a nice little library to, you know, one of those things that can be annoying or that you always go and copy and paste that code. And if you can just pip install this and use it and it has a great interface, it makes your life a little bit easier.
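The fit, transform, poof call order described here can be sketched with a toy class. This is a hypothetical, standard-library-only stand-in written for illustration: `ClassBalanceSketch`, its text "chart", and the sample data are assumptions, not Yellowbrick's actual classes or output, which draw with Matplotlib.

```python
from collections import Counter


class ClassBalanceSketch:
    """Hypothetical visualizer that mimics the fit / transform / poof call order."""

    def fit(self, X, y):
        # Learn what is needed for the drawing: here, how often each label occurs.
        self.counts_ = Counter(y)
        return self  # returning self follows the scikit-learn convention

    def transform(self, X):
        # This toy visualizer leaves the features unchanged.
        return X

    def poof(self):
        # Render the result; a text bar chart stands in for a real plot.
        lines = [
            f"{label:>6} | {'#' * n} ({n})"
            for label, n in sorted(self.counts_.items())
        ]
        chart = "\n".join(lines)
        print(chart)
        return chart


# Made-up sample data: five rows with one feature each, and their labels.
X = [[0.1], [0.4], [0.35], [0.8], [0.9]]
y = ["ham", "spam", "ham", "ham", "spam"]

viz = ClassBalanceSketch()
viz.fit(X, y)
viz.transform(X)
chart = viz.poof()
```

The point of the shared shape is that anyone who already knows how to call `fit` on a scikit-learn estimator can drive a visualizer the same way, without learning a separate plotting workflow.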
Starting point is 00:20:14 Yeah, absolutely. So, Brian, the next one, the last one I want to cover comes from the whole Alexa thing. We've had a couple of people write us about interesting things with, like, say, Flask-Ask and Alexa skills, right? Yeah. Yeah. So this one is a little bit of a serious one, or at least addressing a serious problem, right? It's not like putting mustaches on cats, but like it's actually trying to solve a problem. Although that would be a hard thing to do audibly on Alexa. Nonetheless, so this one is called Depression AI and it's an Alexa skill. I apologize, everybody's little device
Starting point is 00:20:52 is probably going off. It's an Amazon device skill for people who are suffering with depression. And it's open source. It's based on Flask-Ask, which I covered pretty deeply on episode 146 of Talk Python. So that's basically a way to use Flask to write these Amazon voice assistant skills, which is pretty cool. So the idea is that if you are suffering from depression, one of the
Starting point is 00:21:20 things that's really hard for people apparently who are suffering from depression is to sort of go about your normal daily routine, right? Get up, make your bed, take a shower. It's like easy to just sort of like stay sprawled out on the couch or the bed or whatever. And so it sort of helps to encourage you to keep doing those things. And it's supposed to be able to detect your moods and kind of give you some feedback. What do you think? I think that's super impressive. I mean, I have relatives who have dealt with these sorts of issues, and I don't know that they're necessarily ones who would take to technology. But anytime you can get some help or get, you know, some feedback or someone other,
Starting point is 00:22:00 you know, you're not listening to yourself, it can be a good thing. I think this is awesome to have. I think there's a lot of people that aren't somebody that wants to go talk to somebody else, but making the decision to put this in place when they're feeling good and then have it help them through the hard times, this would be great. Yeah, it's pretty cool. It won the Valley Hackathon, which I think is in Modesto, sort of outside San Francisco. But this was apparently built in like, what is that, a weekend or something, which is also a pretty big
Starting point is 00:22:31 testament. So you can do things like, it'll evaluate your mood, it actually has suicidal intervention, it has location-based recommendations, and mostly helps you with small activities. So you can say things like, Alexa, check on me, or I feel down, or help me feel better. I haven't gotten out of bed today. It'll ask you things like, have you gotten out of bed yet? Things like that. So it's pretty cool. And it's also open source and on GitHub and based on Python.
Starting point is 00:22:58 So if this is inspiring, even if it's a totally different subject area, take it and use it as an example. Well, that's it for our official news. Brian, you got anything you want to share with the world while we're here? I've got some good news and some bad news. So the good news is I went to an estate sale the other day and I bought a book called How to Be Interesting in 10 Simple Steps. So that's the good news. The bad news is I'm a really slow reader, so it might take a while to take effect. Step two. No, I've just skimmed it so far. So I haven't even started yet. Well, what's a book? Yeah, very good. It was printed a long time ago before we had eBooks. Is it one of those things on paper? It's like a tablet, but it doesn't run out of batteries. Is that
Starting point is 00:23:42 what you're telling me? I've got like another book author harassing me about physical books. How about you, Matt? You got any books lined up? I'm working on revamping my Pandas one. So big demand for Pandas, and I want to update mine to the latest version. So it's 0.17, and that's a couple years out. Sure, very cool. Oh, yeah, and we also have some news about maybe a course coming out for you
Starting point is 00:24:05 We'll leave that as a teaser, but I think a video course maybe in the near future. Yeah, yeah, maybe. I don't know. We'll have to see. We'll have to see if we can get our act together. Awesome. All right. Well, Matt, thank you for joining us and dropping in on this podcast. And Brian, thank you as always. Yeah, thanks. My pleasure. Thank you. Yep. Bye, guys. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm.
Starting point is 00:24:38 If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
