Python Bytes - #32 8 ways to contribute to open source when you have no time

Episode Date: July 1, 2017

Topics covered in this episode: Introducing Dash; Keeping Python competitive; PyPI Quick and Dirty; Minimal examples of data structures and algorithms in Python; 8 ways to contribute to open source when you have no time; NumPy receives first ever funding, thanks to Moore Foundation; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/32

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 32, recorded on June 29th, 2017. I'm Michael Kennedy. And I'm Brian Okken. And we've got a bunch of great stuff lined up for you. But first, I just want to say apologies for the slightly off audio on my end. I'm not dialing in from the Python Bytes studio in Portland, Oregon. I'm actually on the road. So Brian and I are doing it a little bit differently this week.
Starting point is 00:00:29 Yeah, it's ungodly early at 6am here. I don't know what your problem is. It's 2 in the afternoon over here in Ireland. I slept in. The magic of Skype. The magic of Skype. We live in the future. We just don't really fully appreciate it. All right, let's talk about web apps. This time you are the one bringing up a web app. Yeah, so this is pretty exciting. There's a Medium article called Introducing Dash, and Dash is a reactive web app open source project from Plotly. And it looks really exciting. The graphics and the
Starting point is 00:01:06 plots that you can do on this are kind of amazing. And it looks like an interactive real-time web page with like interactive graphs and you hook up input and output and data coming in and out. And it's really kind of hard to describe, but people should check it out because it's amazing. Yeah, it looks really, really cool. And a lot of it is done in Python, right? Yeah, so there's Python and Pandas and Flask and React and JSON and all sorts of stuff like that involved to make this stuff work.
Starting point is 00:01:42 But it ends up being some fairly impressive demos with just a handful of lines of code. Yeah, that's super cool. So basically, if you're trying to do visualizations with some of the data science tooling, you can just make that available on the web, not as pictures, but in a super interactive format, right? Which is great. Yeah. And they say it's good for data analysis, data exploration, visualization, modeling, and they also include
Starting point is 00:02:11 instrument control and reporting in what they think is a good application. I want to try this for, uh, for instrument control and visualization myself. Oh yeah. That sounds, that looks really, really cool. I kind of feel like I wish I had something to show so I could play with it, but I just don't have that much to graph these days. I used to do a lot with science, but not in the last 10 years. Maybe we could do, I don't know, plotting how much traffic our website gets or something. Yeah, actually, that would actually be kind of fun, like bandwidth by country or downloads over time. Or who knows?
Starting point is 00:02:46 We could actually play with that. That might be pretty cool. And then they include a link in this, but there's a user guide that has a gallery. And it looks like there's pricing, too. So I think it's both something you can use as a service or run yourself with the tool. Yeah, cool. That looks very, very nice. Definitely it will give you that pro touch
Starting point is 00:03:07 if you're trying to put graphs on the internet and you're using Python. Especially if you're trying to stay competitive. Yeah. You know what? There was a Python language summit back in the end of May, so almost exactly one month ago at the time of this recording. And one of the topics that came up was how do we keep Python competitive? And this has two angles, right? There's basically one angle is how do we keep Python competitive so you don't hear people going, I'm going to rewrite everything in Go or
Starting point is 00:03:37 something silly like that, which seems to be like a meme or something that's happening quite often. But also, how do we get people to move from legacy Python to modern Python? And there have been a bunch of interesting little features that have been added to Python. The async IO stuff, we've talked a lot about, you know, little language touches, like cleaner ways to generate dictionaries from sets of dictionaries, you know, union sort of thing, that kind of stuff. But a couple years ago, they really started hitting the drumbeat of, you know what, the thing that actually matters the most to people is just flat out performance. If we could make Python 3 faster than Python 2, if we could make Python 3 use less memory than Python 2, that is going to be a solid reason for these big companies with big code bases to move to Python 3 and really change that equation. And so this was sort of a conversation about how do we keep that
Starting point is 00:04:33 going at the Language Summit, from what I understand. It's not entirely clear how that all goes together. I think this was mostly based on a presentation by Victor Stinner. He's done a ton of stuff for performance in the last couple versions of Python. I think this style of approaching the problem of, like, how do we get adoption of Python 3 over Python 2, and the decision to say, well, let's focus on performance, I think that's actually working. Like, we saw this to some degree with the Instagram presentation we covered last time, right?
Starting point is 00:05:02 Yeah, so those guys got, I think, 40% less memory usage on their async tier, and they got 12% less CPU usage on their web tier. And when you talk about companies like Instagram, that's a lot. That's a lot of servers. Right. So that's really nice. Yeah. Well, and then also just some of the feedback we've gotten about people switching some applications to asynchronous within Python and asyncio, having like 10 times speed up or 100 times speed up sometimes. Yeah, that's a good point. That's a really good point. It's not about the CPU. It's just about leveraging the asyncio bit, which is so much easier. So this is kind of a summary of that conversation. Like, I don't think the Language Summit is recorded. I could be wrong, but this is a write-up of that presentation. So it's kind of nice. It says, basically, we really need to keep Python performant to be competitive with other languages, but it's not as easy to optimize as, say, optimizing C# or Java or C because of the boundary that the C API brings.
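To make the asyncio point above concrete, here is a minimal sketch of why I/O-bound code can speed up so much: the waits overlap instead of running back to back. Nothing here comes from the episode or from Instagram's code. The `fetch` coroutine is a hypothetical stand-in that sleeps instead of doing real network I/O, and the delays are arbitrary assumptions.

```python
import asyncio
import time


async def fetch(delay: float) -> float:
    """Hypothetical stand-in for an I/O-bound call; it only sleeps."""
    await asyncio.sleep(delay)
    return delay


async def run_sequential(delays):
    # Each call waits for the previous one to finish.
    return [await fetch(d) for d in delays]


async def run_concurrent(delays):
    # All calls are started together, so their waits overlap.
    return await asyncio.gather(*(fetch(d) for d in delays))


def timed(coro_func, delays):
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_func(delays))
    return list(result), time.perf_counter() - start


delays = [0.05] * 5  # five simulated requests of 50 ms each (assumed values)
seq_result, seq_time = timed(run_sequential, delays)
con_result, con_time = timed(run_concurrent, delays)
print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

The CPU does the same amount of work in both cases; only the waiting is overlapped, which matches the remark that the gain is not about the CPU.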
Starting point is 00:06:11 Basically, there are a lot of ways of working that you're forced to follow in Python to keep the C API working. And the C API is actually a really important part of the Python performance story, right? Yes. Yeah. Yeah. So if you're going to use NumPy, that's super fast. But NumPy basically is mostly written in C. So you can't break that, because you might make the Python code go faster,
Starting point is 00:06:35 but you're going to lose the ability to do the C stuff. So that's really pretty interesting. And they say it's great to compare Python 3 to Python 2 and say, oh, look, it's much faster by most benchmarks. But what you really need to do is compare it against modern languages, not languages from the year 2000. So let's try to work on this. There was some talk about the JIT implementations. We've got PyPy, which is like five times faster, but is not very compatible, mostly because of the C API, but also some other things, I think. There's Pyjion, done by Dino Viehland and Brett Cannon at Microsoft. And that's actually a really
Starting point is 00:07:17 interesting thing to bring JIT compilation to proper standard CPython, not yet another fork of it. So that's pretty interesting. And the final thing that someone proposed there was like, is there a way to use the type hints and types annotations that are appearing in Python 3 to make a slight variation of Cython, which compiles to C, that lets you write code that's closer to regular Python and leverage those type hints?
Starting point is 00:07:43 Because, you know, basically in Cython you have to say what the types are, but you would kind of do that anyway if you have the type hints in there. So there's a lot of interesting stuff just brewing, you know, for the future there. That's kind of a really interesting idea.
Starting point is 00:07:56 I like that. Like if you've got a whole, like a huge data set and it's, it's not going to change, it's going to be a fixed data type and you're declaring it with type hints anyway. Having the language be able to take advantage of that and just behind the scenes just Cythonize it or something, that would be slick.
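As a small illustration of the idea discussed above, the sketch below shows that type hints are stored on the function object at runtime, which is the information a compiler-like tool would need. It does not compile anything and does not use Cython; the `mean` function is a made-up example, and reading the hints with `typing.get_type_hints` only stands in for what such a tool might inspect.

```python
from typing import List, get_type_hints


def mean(values: List[float]) -> float:
    """Average of a list of floats; the annotations declare fixed types."""
    return sum(values) / len(values)


# The annotations are available at runtime as a plain dictionary,
# so a hypothetical optimizer could read them without extra syntax.
hints = get_type_hints(mean)
print(hints)
print(mean([1.0, 2.0, 3.0]))
```

Python itself does not enforce or act on these hints when running the function; any speedup would have to come from a separate tool that chooses to use them.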
Starting point is 00:08:16 I would love that. It would actually be pretty darn cool, wouldn't it? So, yeah, we'll see. I mean, to me, I almost see, like, could you in C or C++, you can have, like, inline assembler, right? You say this little bit, these five lines, this is assembly code, but, like, we need this. Or you can, like, inline methods.
Starting point is 00:08:36 It would be cool if you could say, here within my regular Python code, this one function, where this is the thing we do all the time, this one or two functions, you know, you do an @cython on it and it just goes. That'd be cool. Yeah. Well, this is the future I want to see. Definitely. All right. So that'd be a quick and dirty solution to make it faster, if I could just put an @cython on things. Yeah. And, man, I have a hard time not laughing when we do these. They're so bad. We should just take one episode and just see what's the worst possible thing we can do. The next article is PyPI Quick and Dirty.
Starting point is 00:09:13 It's by Hynek. And I met him at PyCon. I shook his hand and told him I loved what he's doing. And he said, oh, you're the guy that always mispronounces my name on podcasts. Anyway, sorry, Hynek. This is an awesome article. We've talked about packaging before on the podcast, but this is a really good quick write-up of how to package your code and get it ready and put it up on PyPI.
Starting point is 00:09:40 Just a little bit of history, not too much of the background. Just how do you do it today? This is how you do it today. It's opinionated because he takes basically what he does for the attrs project and talks about doing that. So that's pretty much what it is. It's about distribution. Yeah, that's cool. I love the subtitle, a completely incomplete guide to packaging a Python module and sharing it with the world on PyPI. It's beautiful. And I know that for some people, it might be a little bit frustrating that we as a community, we're not done. This is probably not the final solution for packaging. It's still being worked on. People are still coming up with ideas for how to
Starting point is 00:10:20 maybe make this easier. And it's pretty darn easy now. Yeah, it is not too bad. I've put something up on PyPI before, and I was like, really, that's it? That's actually pretty darn easy. So basically, I think the challenge here is actually creating the package, not getting it on PyPI.
Starting point is 00:10:41 Like once you've got the package, getting it on PyPI is actually like a few CLI commands. And you basically have to have an account and set up like a profile file that has your info in it. But other than that, you're kind of done. So yeah, the more we can make packaging easy and obvious, the better. And then some of the differences between getting a package ready for sharing within just a local group at work or something and getting it ready for PyPI, a lot of it is just getting all the metadata there that it's nice to have for distributions. One of the confusions as well, I think, is the word package, because that really has two meanings. In Python, a package can be just a directory with an
Starting point is 00:11:26 __init__.py file, but it also is a distribution, because PyPI is not the Python Distribution Index, it's the Python Package Index. So there's a little bit of confusion there. Yeah, that's for sure. That's for sure. Luckily, consuming them is all nice and easy. The next thing that I want to cover is basically a set of example algorithms, especially if you're looking for a new job, or you're going to do an interview. But also, if you're coming from another language, I think it's helpful to study algorithms in like simple forms. So imagine like you're super good at Java, and you know how to do, say, like a depth first tree traversal in Java.
Starting point is 00:12:08 How do I do this in Python? Right. Is it simpler? Is it harder? Whatever. Right. So there's this GitHub repository that's a minimal set of minimal examples of data structures and algorithms in Python. And there are many of them here.
Starting point is 00:12:23 The GitHub repo is just called algorithms, but it's all Python. And you look at them and it's like, here's how you would do a greatest common divisor computation in Python. And these are like the six lines of Python you write. Here's how you reverse a linked list. Here's how you would do a binary search and things like that. And so regardless, if you're looking for a new job, if you're trying to compare an implementation in another language to Python, to the Pythonic style, like there's a lot of cool stuff going on in this. This is actually pretty cool. When I saw this at first, I sort of dismissed it as, you know, just interview material. But there's some
Starting point is 00:13:01 decent things in here, like rotating an image, doing subsets, that I would definitely know how to do coming from a different language there, like in C++, but yeah, this is good. I like it. It's pretty cool, right? Yeah, to me, I think this is, you could try to solve this yourself
Starting point is 00:13:19 and then compare your solution against what's here. I feel like if I did that, I'd have a similar experience to what I had with CheckiO, their Python stuff. So that's kind of that game, that Python game, where you conquer islands by writing Python code, which is interesting. But then you can view other people's solutions to the steps in the games. And I realized, like, I have a particular style that's different than other people's style. And in some ways theirs is better, in some ways mine's better, but I think you would also get the same experience here for algorithms. Yeah, definitely. And also sometimes when you just need to be able
Starting point is 00:13:52 to do something for work, you don't want to come up with your own solution. I just want, how do I do this in Python? Exactly. Just somebody just show me. That's great. Yeah. So that's cool. And you know, it's an open source project. So if you actually want to contribute back, you look at it and you're like, oh, this is good. But actually you could write a more Pythonic implementation of a particular algorithm. You could contribute back to that, right? Yeah, yeah.
Starting point is 00:14:15 But what if you don't have time? This is one of those great transitions, folks. There's a lot of ways you could still contribute to open source if you don't have time. And I think there's a lot of people, especially, I've talked with a lot of people about open source contributions. And there's times in your life where you've got more time to devote to something, to open source, and then things happen, like a new job or a change in your job or maybe a baby or something happens where you don't have as much time, and there's ways to stay involved. There's a nice article called Eight Ways to Contribute to
Starting point is 00:14:51 Open Source When You Have No Time. I think people forget that, when they're used to contributing code, there's other ways to contribute to make a project successful. And he lists a handful of them, like bug triaging, like going through the defect reports or bug reports and adding detail or asking for more detail or cleaning those up. Those are things you can do if you've just got a few minutes. I think that's great, because one of the things that to me is a big red flag for open source projects is if I go there and there's a ton of unanswered bugs. Yeah. Not like there's a conversation and they haven't been closed necessarily, but they're like not even responded to. And even worse is pull requests.
Starting point is 00:15:38 Like, people have taken the time to spend an afternoon and write some new feature, and the people can't even be bothered to say, no, this is not good, or it's good. Like, that to me seems like a real red flag on these. So, like, this is a way to keep these projects healthy, I think, just jumping in and helping out with that kind of stuff. Yeah. And then along those same lines is mailing list support. If there's a mailing list around the project, be one of the people that answers some of the newbie questions. That's a huge help to people running the project. Documentation patches. I don't know of an open source project that doesn't have documentation holes and things that could be cleaned up with their documentation.
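For a feel of the kind of short examples described above, here is a minimal sketch of three of the algorithms mentioned: greatest common divisor, reversing a linked list, and binary search. These are not copied from the repository being discussed; the function names and the small `Node` class are assumptions made for illustration.

```python
from typing import Optional, Sequence


def gcd(a: int, b: int) -> int:
    """Greatest common divisor using Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return abs(a)


class Node:
    """A singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_list(head: Optional[Node]) -> Optional[Node]:
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    while head is not None:
        following = head.next  # remember the rest of the list
        head.next = prev       # point the current node backwards
        prev = head
        head = following
    return prev


def to_values(head: Optional[Node]) -> list:
    """Collect node values into a Python list, for easy inspection."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


def binary_search(items: Sequence[int], target: int) -> int:
    """Return the index of target in sorted items, or -1 if it is absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

For everyday work the standard library already covers two of these (`math.gcd` and the `bisect` module), so hand-written versions like the ones above are mainly useful for study or interview practice.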
Starting point is 00:16:18 Sure. Well, and there's a big tension in taking new things. So, for example, there might be a pull request that says, I want to change the way this works. And it might be like super simple to change one thing about it, but it might have like so many knock-on effects into little areas, but that are like problematic. So for example, you might want to change the way you start some new project, but if even like the steps are self-describing that happen as you like run some little like scaffolding thing, if that changes, then you've got to go change all the documentation.
Starting point is 00:16:52 You've got to go change all the samples. You've got to just like, all that stuff is like friction to prevent people from accepting pull requests. And so if you could help reduce that friction, that'd be good. I didn't even think about that. You could help the person doing the person having a pull request. You could work on their branch as well and say, hey, we need to add documentation changes to this before it gets pulled in. Yeah, for sure.
Starting point is 00:17:14 And then my favorite, actually, these are all great, but there's a bullet here for marketing. Talking about your project in the community or on social media, or blogging or podcasting about your favorite open source project. Yeah, that's cool. That's near and dear to my heart because I've been doing that with pytest on Test & Code and on the blog, trying to promote what I think is the best testing platform on the planet. But it wasn't really viewed as that before I got started. So I doubt I'm the only person to take credit for that, but I think I helped a little bit. Well, and you've taken it to a very extreme level by writing a whole book. Yeah. Oh yeah. That's not even listed in here, that you could write a book about your
Starting point is 00:18:01 project. Yeah. That's actually a good point. Like you can spread the word and education about it by writing blog posts, but you could also do video tutorials. You could do online courses about an open source project. You could write a book about it. Like, marketing is really actually super broad. And it could be that the person who's great at programming is not really as good or interested in doing that, or even maybe just their time is better spent creating features, and you could be spreading the word about it. There's a lot of good ways there. And then there's a second half of the article that talks about basically ways to find more time in your life. If you really want to try to find time, there's a couple ways, which whether they're realistic or not,
Starting point is 00:18:43 the one that amused me is if you're having trouble sleeping, why try sleeping? Just get up and work on your open source projects. That's right. Use it as a sleeping aid. You know, one of the things I think a lot of people can easily do is not watch television. If you're an average person, especially an average American, if you're looking to find more time in your life to do things like this or work on your own projects or whatever, we spend a lot of time on TV. And if you don't watch it, you find your evenings all of a sudden have some time for these kinds of things. You know, I totally see that point, but I also want to have some moderation there.
Starting point is 00:19:25 You can cut cold turkey and have a ton of free time. Yes. But when I tried to do this, I realized that was also like an hour a day or something that I was hanging out with my wife, which I'd lose if I didn't do that. Yeah. So I would moderate that and say, also just pay attention to how much time you're spending. And if you want to watch a little TV at night, go for it. But maybe put a limit on it to say, you know, when one show's done, I'm not going to try to find something else.
Starting point is 00:19:51 I'm just going to turn it off and go do something. Open source. Yeah. Absolutely. Sounds good. All right. So speaking of open source, the last thing I want to cover for us is a real open source success story. And we talked about NumPy at the
Starting point is 00:20:05 beginning. NumPy is really one of the super foundational building blocks for all the scientific data science side of Python. As we've seen and covered in a couple of ways, like some of the massive growth, a good portion of the last three or four years of massive growth in Python has to do with data science. So NumPy is like really a core pillar of that whole area, right? Yes. So there's really good news for NumPy. They have just received a $645,000 grant for the next two years to improve NumPy.
Starting point is 00:20:40 That's very exciting. That is really great. We had PyPI recently receive the $200,000 Mozilla grant. And now we have NumPy getting almost three quarters of a million dollars to make it better. So this grant comes from the Moore Foundation and is going through UC Berkeley's data science program. So Dr. Nathaniel Smith is like sort of shepherding this. You know, of course, NumPy was started by Travis Oliphant of Continuum back in 2006. And it's great to see it growing. So just another open source success project.
Starting point is 00:21:12 Yeah, definitely. That's neat. All right. Very, very good news. I don't want to, you know, don't have a whole lot more to say other than I just want to call it out that, you know, here's another great funding coming into Python and open source. Any more news for you on the book?
Starting point is 00:21:25 I'm very excited that I've got a little bit of a break because I've got all of the book turned in and it's at the point where it's gone out to a handful of actually quite a few technical reviewers who go through it and make sure I didn't make any horrible mistakes or leave out something very crucial. And I've got a great team of people set up to do that. Luckily, actually a lot of the core contributors to PyTest have agreed to help out with that, which is amazing. Very humbled by that.
Starting point is 00:21:57 That's awesome. And then, yeah, then it's out of my hands for the most part. I'm on the line for making changes if anybody comes up with something. These are all pretty picky people, so I probably will have a lot of changes. But then it's off to being ready to probably ship a physical copy September or October. That'd be cool. You can actually put it on your bookshelf. Then you'll have officially done it.
Starting point is 00:22:21 Yeah. That's awesome. All right. Well, congratulations. Not a lot of news on my end to report. I'm just hanging out here in Ireland for a short work trip. That's just awesome, man. I wish I was there with you.
Starting point is 00:22:34 Yeah, it's been fun. Definitely been fun. So, all right, well, thanks, Brian, as always, for finding all these cool things to share with everyone. And everyone, thank you for listening. Thank you. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes.
Starting point is 00:22:49 That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast
Starting point is 00:23:08 with your friends and colleagues.
