Python Bytes - #32 8 ways to contribute to open source when you have no time
Episode Date: July 1, 2017
Topics covered in this episode: Introducing Dash; Keeping Python competitive; PyPI Quick and Dirty; Minimal examples of data structures and algorithms in Python; 8 ways to contribute to open source when you have no time; NumPy receives first ever funding, thanks to Moore Foundation; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/32
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 32, recorded on June 29th, 2017.
I'm Michael Kennedy.
And I'm Brian Okken.
And we've got a bunch of great stuff lined up for you.
But first, I just want to say apologies for the slightly off audio on my end.
I'm not dialing in from the Python Bytes studio in Portland, Oregon.
I'm actually on the road, so Brian and I are doing things a little bit differently this week.
Yeah, it's ungodly early at 6am here.
I don't know what your problem is. It's 2 in the afternoon over here in Ireland.
I slept in.
The magic of Skype.
The magic of Skype. We live in the future. We just don't really fully appreciate it.
All right, let's talk about web apps. This time you are the one bringing up a web app.
Yeah, so this is pretty exciting. There's a Medium article called Introducing Dash,
and Dash is a reactive web app open source project from Plotly. And it looks really exciting. The graphics and the
plots that you can do on this are kind of amazing. And it looks like an interactive real-time
web page with like interactive graphs and you hook up input and output and data coming in and out.
And it's really kind of hard to describe,
but people should check it out because it's amazing.
Yeah, it looks really, really cool.
And a lot of it is done in Python, right?
Yeah, so there's Python and Pandas and Flask and React and JSON
and all sorts of stuff like that involved to make this stuff work.
But it ends up being some fairly impressive demos with just a handful of lines of code.
Yeah, that's super cool.
So basically, if you're trying to do visualizations with some of the data science tooling, you
can just make that available on the web, not as pictures, but in a super interactive format,
right?
Which is great.
Yeah.
And they say it's good for data analysis, data exploration, visualization, modeling, and they also include
instrument control and reporting in what they think is a good application. I want to try this
for, uh, for instrument control and visualization myself. Oh yeah. That sounds, that looks really,
really cool. I kind of feel like I wish I had something to show so I could play with it,
but I just don't have that much to graph these days.
I used to do a lot with science, but not in the last 10 years.
Maybe we could do, I don't know, plotting how much traffic our website gets or something.
Yeah, actually, that would actually be kind of fun, like bandwidth by country or downloads over time.
Or who knows?
We could actually play with that.
That might be pretty cool.
And then they include a link in this, but there's a user guide that has a gallery.
And it looks like there's pricing up there.
So I think it's both something you can use as a service or run yourself with the tool.
Yeah, cool.
That looks very, very nice.
Definitely it will give you that pro touch
if you're trying to put graphs on the internet and you're using Python.
Especially if you're trying to stay competitive.
Yeah. You know what?
There was a Python language summit back in the end of May,
so almost exactly one month ago at the time of this recording.
And one of the topics that came up was how do we keep
Python competitive? And this has two angles, right? There's basically one angle is how do we
keep Python competitive so you don't hear people going, I'm going to rewrite everything in Go or
something silly like that, which seems to be like a meme or something that's happening quite often. But also, how do we get people to move from legacy Python to modern Python?
And there have been a bunch of interesting little features that have been added to Python.
The asyncio stuff, we've talked a lot about, you know, little language touches,
like cleaner ways to generate dictionaries from sets of dictionaries, you know, union sort of thing, that kind of stuff. But a couple years ago, they really started hitting the drumbeat of, you know
what, the thing that actually matters the most to people is just flat out performance. If we could
make Python 3 faster than Python 2, if we could make Python 3 use less memory than Python 2,
that is going to be a solid reason for these big companies with big code bases to move to Python
3 and really change that equation. And so this was sort of a conversation about how do we keep that
going at the Language Summit, from what I understand. It's not entirely clear how that all
goes together. I think this was mostly based on a presentation by Victor Stinner. He's done a ton of stuff for performance in the last couple versions of Python.
I think this style of approaching the problem of, like,
how do we get adoption of Python 3 over Python 2,
and the decision to say, well, let's focus on performance,
I think that's actually working.
Like, we saw this to some degree with the Instagram presentation
we covered last time, right?
Yeah, so those guys got, I think, 40% less memory
usage on their async tier, and they got 12% less CPU usage on their web tier. And when you talk
about companies like Instagram, that's a lot. That's a lot of servers. Right. So that's really
nice. Yeah. Well, and then also just some of the feedback we've gotten about people switching some applications to asynchronous code with Python and asyncio, having like 10 times speed up or 100 times speed up sometimes.
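A rough sketch of where those speedups tend to come from, assuming the workload is dominated by waiting on I/O rather than by CPU. The `fake_io_call` helper is hypothetical and uses `asyncio.sleep` as a stand-in for a network or database call; `asyncio.run` needs Python 3.7 or later, which is newer than the Python that was current when this episode was recorded.

```python
import asyncio
import time


async def fake_io_call(delay: float) -> float:
    # Hypothetical stand-in for a network or database call. asyncio.sleep
    # hands control back to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return delay


async def run_sequentially(delays):
    # Each call waits for the previous one to finish before starting.
    return [await fake_io_call(d) for d in delays]


async def run_concurrently(delays):
    # gather() schedules all the calls at once, so their waits overlap.
    return await asyncio.gather(*(fake_io_call(d) for d in delays))


def timed(coro_func, delays):
    # Run one of the coroutines above and report (results, elapsed seconds).
    start = time.perf_counter()
    results = asyncio.run(coro_func(delays))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.05] * 10
    _, slow = timed(run_sequentially, delays)
    _, fast = timed(run_concurrently, delays)
    print(f"sequential: {slow:.2f}s, concurrent: {fast:.2f}s")
```

The concurrent version finishes in roughly the time of the longest single wait, while the sequential version pays for every wait in turn. CPU-bound code would not see this kind of gain.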
Yeah, that's a good point. That's a really good point. It's not about the CPU. It's just about leveraging the asyncio bit, which is so much easier. So this is kind of a summary of
that conversation. Like I don't think the Language Summit is recorded. I could be wrong, but this is
a write-up of that presentation. So it's kind of nice. It says, basically, we really need to keep
Python performant to be competitive with other languages, but it's not as easy to optimize as, say, optimizing C Sharp or Java or C because of the boundary that the C API brings.
Basically, there are a lot of ways of working that you're forced to follow in Python to keep the C API working.
And the C API is actually a really important part of the Python performance story, right?
Yes.
Yeah.
Yeah.
So if you're going to use NumPy, that's super fast.
But NumPy basically is mostly written in C.
So you can't break that because you might make the Python code go faster,
but you're going to lose the ability to do the C stuff.
So that's really pretty interesting.
And they say it's great to compare Python 3 to Python 2 and say, oh, look,
it's much faster by most benchmarks. But what you really need to do is compare it against modern
languages, not languages from the year 2000. So let's try to work on this. There was some talk
about the JIT implementations. We've got PyPy, which is like five times faster, but is not very compatible,
mostly because of the C API, but also some other things, I think.
There's Pyjion, done by Dino Viehland and Brett Cannon at Microsoft. And that's actually a really
interesting thing to bring JIT compilation to proper standard CPython, not yet another fork
of it. So that's pretty interesting.
And the final thing that someone proposed there was like,
is there a way to use the type hints and types annotations
that are appearing in Python 3
to make a slight variation of Cython, which compiles to C,
that lets you write code that's closer to regular Python
and leverage those type hints?
Because it actually would, you know, basically in Cython,
you have to say what the types are,
but you kind of would do that anyway,
if you have the type hints in there.
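To make the idea concrete, here is a small sketch of the information that type hints already expose at runtime. It does not compile anything and is not how Cython works; `fused_multiply_add` and `fully_annotated` are hypothetical names used only for illustration. The point is simply that a tool could read the declared types from ordinary Python code.

```python
import inspect
from typing import get_type_hints


def fused_multiply_add(a: float, b: float, c: float) -> float:
    # Ordinary Python with type hints; nothing here is compiled.
    return a * b + c


def fully_annotated(func) -> bool:
    # True when every parameter and the return value carry a type hint,
    # which is the information a hypothetical compiler would need.
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    return "return" in hints and all(name in hints for name in params)


def untyped(a, b):
    return a + b


if __name__ == "__main__":
    print(get_type_hints(fused_multiply_add))
    print(fully_annotated(fused_multiply_add), fully_annotated(untyped))
```

`typing.get_type_hints` and `inspect.signature` are standard library calls; everything a compiler would do with the result is left out here.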
So there's a lot of interesting stuff just brewing,
you know,
for the future there.
That's a kind of a really interesting idea.
I like that.
Like if you've got a whole,
like a huge data set and it's,
it's not going to change,
it's going to be a fixed data type and you're declaring it with type hints anyway.
Having the language be able to take advantage of that
and just behind the scenes just Cythonize it or something,
that would be slick.
I would love that.
It would actually be pretty darn cool, wouldn't it?
So, yeah, we'll see.
I mean, to me, I almost see it like, in C or C++,
you can have, like, inline assembler, right?
You say this little bit, these five lines, this is assembly code,
but, like, we need this.
Or you can, like, inline methods.
It would be cool if you could say here within my regular Python code,
this one function where this is the thing we do all the time,
this one or two functions, this is like,
you know, you put an @cython on it and it just goes. That'd be cool. Yeah, well, this is the future
I want to see, definitely. All right. So that'd be a quick and dirty solution to, uh, make it
faster, if I could just put an @cython on things. Yeah. And, um, man, I have a hard time not
laughing when we do these transitions. They're so bad.
We should just take one episode and just see what's the worst possible thing we can do.
The next article is PyPI Quick and Dirty.
It's by Hynek.
And I met him at PyCon.
I shook his hand and told him I loved what he's doing.
And he said, oh, you're the guy that always mispronounces my name on podcasts.
Anyway, sorry, Hynek.
This is an awesome article.
We've talked about packaging before on the podcast, but this is a really good quick write-up
of how to package your code and get it ready and put it up on PyPI.
Just a little bit of history, not too much of the background.
Just how do you do it today? This is how you do it today. It's opinionated because he takes basically what he does for the
attrs project and talks about doing that. So that's pretty much what it is. It's
about distribution. Yeah, that's cool. I love the subtitle, a completely incomplete guide to
packaging a Python module and sharing it with the world on PyPI. It's beautiful.
And I know that for some people, it might be a little bit frustrating that
we as a community, we're not done. This is probably not the final solution for packaging.
It's still being worked on. People are still coming up with ideas for how to
maybe make this easier. And it's pretty darn easy now.
Yeah, it is not too bad.
I've put something up on PyPI before,
and I was like, really, that's it?
That's actually pretty darn easy.
So basically, I think the challenge here
is actually creating the package,
not getting it on PyPI.
Like once you've got the package,
getting it on PyPI is actually like a few CLI commands. And you basically have to have an account and set up like a profile
file that has your info in it. But other than that, you're kind of done. So yeah, if we could,
the more we can make packaging easy and obvious, the better. And then some of the differences
between getting a package ready for sharing within just a local group at work or something and getting it ready
for PyPI, a lot of it is just getting all the metadata there that it's nice to have for
distributions. One of the confusions as well, I think, is the word package, because that really
has two meanings. In Python, a package can be just a directory with an __init__.py
file, but it also is a distribution, because PyPI is not the Python distribution index,
it's the package index. So there's a little bit of confusion there.
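The first meaning, a package as a directory with an `__init__.py` file, can be shown in a few lines. This is a minimal sketch that builds a throwaway package in a temporary directory and imports it; the name `demo_pkg` and the `GREETING` constant are made up for the example. Nothing here touches PyPI, which is the second meaning.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_and_import_package(name="demo_pkg"):
    # Build a directory containing __init__.py, import it, then clean up.
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("GREETING = 'hello from a package'\n")

        sys.path.insert(0, tmp)  # make the parent directory importable
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            # Import packages (as opposed to plain modules) have a __path__.
            return module.GREETING, hasattr(module, "__path__")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


if __name__ == "__main__":
    print(make_and_import_package())
```

What gets uploaded to PyPI is a built archive of a project like this plus its metadata, which is why the same word ends up covering two different things.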
Yeah, that's for sure. That's for sure. Luckily, consuming them is all nice and easy.
The next thing that I want to cover
is basically a set of example algorithms, especially if you're looking for a new job,
or you're going to do an interview. But also, if you're coming from another language, I think it's
helpful to study algorithms in like simple forms. So imagine like you're super good at Java,
and you know how to do, say, like a depth first tree traversal in Java.
How do I do this in Python?
Right.
Is it simpler?
Is it harder?
Whatever.
Right.
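As one possible answer to that question, here is a short sketch of a depth-first (pre-order) traversal in Python. The `Node` class and the sample tree are assumptions made for this illustration and are not taken from the repository discussed next.

```python
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def depth_first(node):
    # Pre-order: yield the node itself, then each subtree from left to right.
    yield node.value
    for child in node.children:
        yield from depth_first(child)


tree = Node("a", [
    Node("b", [Node("d"), Node("e")]),
    Node("c", [Node("f")]),
])

print(list(depth_first(tree)))  # ['a', 'b', 'd', 'e', 'c', 'f']
```

Generators with `yield from` keep the recursive version short, which is part of what tends to make the Python form feel simpler than the equivalent in Java.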
So there's this GitHub repository that's a set of minimal examples of data structures and algorithms in Python.
And there are many of them here.
The GitHub repo is just called algorithms, but it's all Python. And you look at them and it's like, here's how you
would do a greatest common divisor computation in Python. And these are like the six lines of
Python you write. Here's how you reverse a linked list. Here's how you would do a binary search
and things like that. And so regardless, if you're looking for a new job,
if you're trying to compare an implementation in another language to Python, to the Pythonic
style, like there's a lot of cool stuff going on in this. This is actually pretty cool. When I saw
this at first, I sort of dismissed it as, you know, just interview material. But there's some
decent things in here, like rotating an image, doing subsets,
that I would definitely know how to do
coming from a different language there,
like in C++, but yeah, this is good.
I like it.
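For a sense of scale, here are short versions of three of the algorithms mentioned: greatest common divisor, binary search, and reversing a linked list. These are sketches written for this transcript, not the code from the repository, and the `ListNode` class is an assumption made for the example.

```python
def gcd(a, b):
    # Euclid's algorithm: replace (a, b) with (b, a mod b) until b is zero.
    while b:
        a, b = b, a % b
    return a


def binary_search(items, target):
    # Return the index of target in a sorted list, or -1 if it is absent.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


class ListNode:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_list(head):
    # Walk the list once, pointing each node back at its predecessor.
    prev = None
    while head:
        following = head.next
        head.next = prev
        prev = head
        head = following
    return prev


def to_list(head):
    # Helper for inspecting a linked list as a plain Python list.
    values = []
    while head:
        values.append(head.value)
        head = head.next
    return values
```

For example, `gcd(48, 18)` gives 6, and reversing the list 1, 2, 3 gives 3, 2, 1.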
It's pretty cool, right?
Yeah, to me, I think this is,
you could try to solve this yourself
and then compare your solution against what's here.
I feel like if I did that, I'd have
a similar experience to what I had with CheckiO, their Python stuff. So that's kind of that game,
that Python game, where you like conquer islands by writing Python code, which is interesting.
But then you can view other people's solutions to the steps in the games. And I realized like
I have a particular style that's different than other people's style. And some ways theirs is better, some ways mine's better, but I think you would also get the same
experience here for algorithms. Yeah, definitely. And also sometimes when you just need to be able
to do something for work, you don't want to come up with your own solution. I just want,
how do I do this in Python? Exactly. Just somebody just show me. That's great. Yeah. So that's cool.
And you know, it's, it's an open source project.
So if you actually want to contribute back, you look at it and you're like,
oh, this is good.
But actually you could write a more Pythonic implementation of a particular algorithm.
You could contribute back to that, right?
Yeah, yeah.
But what if you don't have time?
This is one of those great transitions, folks.
There's a lot of ways you could still contribute to open source if you don't have time.
And I think there's a lot of people, especially I've talked with a lot of people about open source
contributions. And there's times in your life where you've got more time to devote
to open source, and then things happen like a new job or a change in your job or maybe a
baby or something happens where you don't have as much time and
there's ways to stay involved. There's a nice article called Eight Ways to Contribute to
Open Source When You Have No Time. I think people forget that there is, when they're used to
contributing code, there's other ways to contribute to make a project successful.
And he lists a handful of them like bug triaging, like going through the defect reports
or bug reports and trying to figure out adding detail or asking for more detail or
cleaning those up. There's a lot of things you can do if you've just got a few minutes.
I think that's great because one of the things that to me is a big red flag for open source projects is if i go
there and there's a ton of unanswered bugs yeah not like there's a conversation they haven't been
closed necessarily but they're like not even responded to and even worse is pull requests
like people have taken the time to like spend an afternoon and write some new feature and the
people can't even be bothered to say no, this is not good, or it's good. Like, that to me seems like a real red flag on
these. So like, this is a way to keep these projects healthy. I think you just jumping in
and helping out with that kind of stuff. Yeah. And then there's along those same lines is mailing
list support. If there's a mailing list around the project, be one of the people that answers some of the newbie questions.
That's huge help to people running the project.
Documentation patches.
I don't know of an open source project that doesn't have documentation holes and things that could be cleaned up with their documentation.
Sure.
Well, and there's a big tension in taking new things.
So, for example, there might be a pull request
that says, I want to change the way this works. And it might be like super simple to change one
thing about it, but it might have like so many knock-on effects into little areas, but that are
like problematic. So for example, you might want to change the way you start some new project,
but if even like the steps are self-describing that happen as you like run some little like
scaffolding thing, if that changes, then you've got to go change all the documentation.
You've got to go change all the samples.
You've got to just like, all that stuff is like friction to prevent people from accepting
pull requests.
And so if you could help reduce that friction, that'd be good.
I didn't even think about that.
You could help the person who has the pull request.
You could work on their branch as well and say, hey, we need to add documentation changes to this before it gets pulled in.
Yeah, for sure.
And then my favorite, actually, these are all great, but there's a bullet here for marketing.
Talking about your project on community or social media or blogging or podcasting about
your favorite open source project. Yeah, that's cool. That's near and dear to my heart because
I've been doing that with pytest on Test & Code and on the blog, trying to promote what I think
is the best testing platform on the planet. But it wasn't really viewed as that before I got started. So
I don't know. I doubt I'm the only person who can take credit for that, but I think I helped a
little bit. Well, and you've taken it to a very extreme level by writing a whole book.
Yeah. Oh yeah. It doesn't, that's not even listed in here is you could write a book about your
project. Yeah. That's actually a good point. Like you can spread the word and education about it by writing blog posts, but you could also
do video tutorials. You could do online courses about an open source project. You could write a
book about it. There's like, like marketing is like really actually super broad. And it could
be that the person who's great at programming is not really as good or interested in doing that,
or even maybe just their time is better spent creating features, and you could be spreading the word about it.
There's a lot of good ways there.
And then there's a second half of the article that talks about basically ways to find more time in your life.
If you really want to try to find time, there's a couple ways, which whether they're realistic or not,
the one that amused me is if you're
having trouble sleeping, why try sleeping? Just get up and work on your open source projects.
That's right. Use it as a sleeping. You know, one of the things I think you can easily,
a lot of people can easily do is not watch television. If you're an average person, especially average American,
if you're looking to find more time in your life to do things like this or
work on your own projects or whatever, we spend a lot of time on TV. And if you don't watch it,
you find your evenings all of a sudden have some time for these kinds of things.
You know, I totally see that point, but I also want to have some moderation there.
You can cut cold turkey and have a ton of free time.
Yes.
But when I tried to do this and realized that was also like an hour a day or something that
I was hanging out with my wife that if I didn't do that.
Yeah.
So I would moderate that and say, also just pay attention to how much time you're spending.
And if you want to watch a little TV at night, go for it.
But maybe put a limit on it to say, you know, when one show's done, I'm not going to try to find something else.
I'm just going to turn it off and go do something.
Open source.
Yeah.
Absolutely.
Sounds good.
All right.
So speaking of open source, the last thing I want to cover for us is a real open source success story.
And we talked about NumPy at the
beginning. NumPy is really one of the super foundational building blocks for all the
scientific data science side of Python. As we've seen and covered in a couple of ways, like some
of the massive growth, a good portion of the last three or four years of massive growth in Python
has to do with data science. So NumPy is like really a core pillar of that whole area, right?
Yes.
So there's really good news for NumPy.
They have just received a $645,000 grant
for the next two years to improve NumPy.
That's very exciting.
That is really great.
We had PyPI recently receive the
$200,000 Mozilla grant. And now we have NumPy getting almost three quarters of a million dollars
to make it better. So this grant comes from the Moore Foundation and is going through UC Berkeley's
data science program. So Dr. Nathaniel Smith is like sort of shepherding this. You know, of course, NumPy was started by Travis Oliphant, of Continuum, back in 2006.
And it's great to see it growing.
So just another open source success project.
Yeah, definitely.
That's neat.
All right.
Very, very good news.
I don't want to, you know, don't have a whole lot more to say other than I just want to
call it out that, you know, here's more great funding coming into Python and open
source.
Any more news for you on the book?
I'm very excited that I've got a little bit of a break
because I've got all of the book turned in
and it's at the point where it's gone out to a handful of
actually quite a few technical reviewers
who go through it and make sure I didn't make any horrible mistakes
or leave out something very crucial.
And I've got a great team of people set up to do that. Luckily, actually a lot of the core contributors to
PyTest have agreed to help out with that, which is amazing. Very humbled by that.
That's awesome.
And then, yeah, then it's out of my hands for the most part. I'm on the line for making changes
if anybody comes up with something.
These are all pretty picky people, so I probably will have a lot of changes.
But then it's off to being ready to probably ship a physical copy September or October.
That'd be cool.
You can actually put it on your bookshelf.
Then you'll have officially done it.
Yeah.
That's awesome.
All right.
Well, congratulations.
Not a lot of news on my end to report.
I'm just hanging out here in Ireland for a short work trip.
That's just awesome, man.
I wish I was there with you.
Yeah, it's been fun.
Definitely been fun.
So, all right, well, thanks, Brian, as always,
for finding all these cool things to share with everyone.
And everyone, thank you for listening.
Thank you.
Thank you for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.