Python Bytes - #72 New versioning: Episode 0.0.7.2 (with 72 releases)
Episode Date: April 5, 2018
Topics covered in this episode:
ZeroVer: 0-based Versioning
GitHub Security Alerts Detected over Four Million Vulnerabilities
Markdown Descriptions on PyPI
Concurrency comparison between NGINX-unit... and uWSGI
Loop better: A deeper look at iteration in Python
Misconfigured Django Apps Are Exposing Secret API Keys, Database Passwords
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/72
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 72, recorded April 4th, 2018. I'm Michael Kennedy.
And I'm Brian Okken.
And we've got some awesome stuff for you. Before we get to it, Brian, I want to say thank you to Datadog.
Check out what they're offering over at talkpython.fm slash datadog.
And they've got a bunch of cool stuff, including a cute little doggy t-shirt.
I still need to get one of those.
I know. I don't have a shirt either. I really need to do that.
All right. I guess the first thing we should talk about is versioning.
When I normally look at commercial software, they have numbers in the front like version 6 or 12 or version 3 of this software.
It's pretty rare to have three dot anything in open source, isn't it?
I don't know if it's rare, but there sure is a lot of stuff that's still at zero. We think of stuff that starts with a zero as being in beta, and it's not.
Well, like for instance, with semantic versioning, once it's at 1.0, the interface is pretty solid and you can depend on it.
But there was a website put up this month called ZeroVer to talk about zero-based versioning.
And it's sort of tongue-in-cheek.
It's from one of our friends, Mahmoud Hashemi, and some others that started it. But they kind of wanted to call out a bunch of Python projects and other projects that are perpetually starting with zero by putting up this sort of mock website to say that you don't need to do anything other than zero-based versioning.
It starts out with a down-to-earth demo. It's pretty awesome.
So it has some versions and says yes, these are good: 0.0.1, 0.1.0, 0.4.0. And then no: 1.0, 1.0.0, 2018.04.0. Like, none of these are okay.
Right. And yeah, so if you haven't figured it out, it is a joke. But, like, for instance, I guess I hadn't realized this.
Flask is one of the ones that was called out.
It's currently 0.12.2.
Come on, it's been eight years.
I think that maybe you can go to a 1.0.
I have a new solution here, a new way to solve this problem.
Just whenever you print or look at the version number, strip off all the leading zeros and dots. So 12.2 is basically what Flask is.
You know, there are some projects that a bunch of people depend on heavily, where if they completely changed the interface, the API, that would be bad. So there clearly is a point where they could bump to a 1.0.
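The joke about stripping the leading zeros and dots can be sketched in a few lines. This is only an illustration of the idea from the conversation, not anything the ZeroVer site ships; the function names are made up for this example.

```python
def strip_leading_zeros(version: str) -> str:
    """Drop every leading '0' and '.' character, as joked about above."""
    # str.lstrip takes a set of characters, so "0." removes zeros and dots
    # from the left until it reaches any other character.
    return version.lstrip("0.")


def is_zero_based(version: str) -> bool:
    """True when the first dot-separated component is exactly '0'."""
    return version.split(".")[0] == "0"


for v in ["0.12.2", "0.23", "0.0.7.2", "1.0.0"]:
    print(v, "->", strip_leading_zeros(v), "| zero-based:", is_zero_based(v))
```

Note that a version made only of zeros and dots, such as "0.0", collapses to an empty string, which is one more reason to treat this as a joke rather than a versioning policy.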
Yeah, I think eight years, absolutely, is a time frame where you could say,
you know, we're pretty stable at this point, right?
Like also Pandas in here with a 0.23, and it's 7.1 years.
And also they count the releases, right?
Like 21 releases of Flask, 75 releases
of Pandas, and it's still on a zero. One thing I would like to point out is if you go and you
look at a lot of the sort of manager folks at more commercial oriented enterprise software groups,
so people that use, like, .NET or other non-open source development ecosystems, when they see things like 0.1, they're like, oh, this thing is not ready for us to use.
We can't use this at our company.
And I think it actually sends a bit of a message that this thing's not quite ready.
I mean, I know obviously looking at this list, it doesn't mean that.
But a significant number of people, I think, interpret it that way.
And so, you know, I think it's worth considering maybe saying, all right, we're actually at a
version where we're going to call it 1.0. Like Flask is probably fully ready for 1.0.
When a version is just numbers and dots and it's not date-based, people kind of assume it's semantic versioning. So I think semantic versioning is the way to go. It's not an easy thing, though.
And that's part of the reason why they're being a little gentle with it there.
If you check out the about page, it talks about what you should really do about it. But when you're actually running a project, it's hard to decide what's big enough to flip the major digit. Django is doing that pretty well, I think. They had their 1.0, and that was stable for a long time.
They said, we're going to make a major breaking change, so we're flipping to 2.0, dropping Python 2 support, and all that kind of stuff.
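To make the "flip the major digit" decision concrete, here is a minimal sketch of semantic-version bumping. It assumes plain MAJOR.MINOR.PATCH strings with no pre-release tags, and the `bump` helper is hypothetical, written only for this illustration.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after a change of the given kind.

    part is "major" for a breaking change, "minor" for a backwards-compatible
    feature, and "patch" for a backwards-compatible fix.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.11.0", "major"))  # a breaking change, such as dropping Python 2
print(bump("0.12.2", "patch"))  # a bug fix only
```

The hard part discussed above is not the arithmetic but deciding which kind of change you actually made.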
Django is one that's not following Mahmoud's recommendation.
Recommendation, yeah.
Yeah, I love it.
I love it how he branded that, him and all the folks involved.
It's very cool.
So when we build these projects in Python and any open source system,
you basically layer on a whole bunch of external dependencies and packages and stuff.
How do you know when something has gone terribly wrong?
Like suppose you depend on Werkzeug in Flask,
and there's some huge security vulnerability in that dependency.
Do you get notified?
How do you know?
No, I don't know.
You don't know.
Right.
And so this is actually a really big problem.
It's like when you think about problems or security issues with your application, it's
not just what you have.
It's the stuff you're built upon.
I mean, the whole Equifax thing was a vulnerability in the, was it Struts?
I don't know, some foundational library in Java.
And they just didn't patch it in time, right?
So getting notified of these things is really important.
And so much of our code lives on GitHub.
And GitHub decided they're going to take some responsibility for this and try to help people.
So there's a nice article that says,
GitHub security alerts detected over 4 million vulnerabilities last year.
I think it was in the year.
Actually, it's not even the year.
It's since like November of last year or something.
So that's pretty insane.
So they launched this thing called GitHub security alerts.
Initially, it's only for Ruby and JavaScript, which is lame.
But they have Python support coming, which is why I'm talking about this.
And what it does is it looks at your GitHub repo and it says, are you using a certain dependency? Does that dependency have a known security vulnerability? If it does, then right at the top of your repo, you get this great scary warning that says your application is insecure because it depends on this thing that is insecure.
Yeah, actually, that's a great idea.
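The check described here (is a dependency you use on a list of known-vulnerable versions?) can be sketched roughly as follows. This is not how GitHub implements its alerts; the advisory data and the package names are invented for illustration, and only exact `==` pins are handled.

```python
# Hypothetical advisory data: package name -> versions known to be vulnerable.
KNOWN_VULNERABLE = {"examplelib": {"1.0.0", "1.0.1"}}


def parse_pins(requirements_text):
    """Collect name -> version for every 'name==version' line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned lines cannot be matched to an exact version
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_alerts(requirements_text, advisories=KNOWN_VULNERABLE):
    """Return (name, version) pairs that appear in the advisory data."""
    return [
        (name, version)
        for name, version in parse_pins(requirements_text).items()
        if version in advisories.get(name, set())
    ]


requirements = "examplelib==1.0.1\nothertool==2.3.0  # fine\n"
print(find_alerts(requirements))
```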
Yeah. I don't know if you get an email notice, but certainly your repo looks scary when that's the case. This happened to one of my courses, and it just came back again. Because one of my courses, the Python course, demonstrates using ElectronJS, editing an ElectronJS app, and ElectronJS had some security vulnerability. It's not actually used, but, you know, whatever, it still says your app depends upon ElectronJS and it's got this issue.
It's pretty cool. There's some good numbers and whatnot here. It says nearly half of all the displayed alerts were responded to within a week, and 30% were fixed within the first seven days.
Oh, that's great. That's a good thing that they're adding that. So you said it's coming for Python, and I see that it's planned for this year, for 2018. So that's good.
Yeah. There's not a whole lot of details about exactly when it's coming, but yeah, that will be great. They said, if you look at repositories that have had a contribution in the last 90 days, so things that are active, 98% of such repositories were patched in fewer than seven days.
Like that's insane.
That's a really big deal.
Yeah.
Yeah.
So they said they found over half a million repositories that had some kind of security vulnerability and were pretty much fixed up.
So anyway, that's all really good.
I just want to give a shout out to PyUp as well.
That's P-Y-U-P dot I-O.
I use that for my stuff, and it basically does the same thing and more for Python already. So you link it to your GitHub repo, it'll look at your requirements.txt, and if there's a new version, it'll send you a PR to upgrade your dependencies. And if there's a security alert, it'll tell you.
I don't really want to get on this tangent too far, but I started using PyUp for the cards project that I started recently. Since I'm sort of doing this project, and I can't remember who I read it from, but the packages that are intended to be used by other applications probably shouldn't have their versions pegged. So if I unpegged all my versions in a package, then PyUp.io kind of complains about that.
Yeah, that's a little bit.
It does require you to more or less pin your versions.
And you can do expressions like I want it to be this version or higher.
And I think maybe it'll upgrade it.
I don't know.
There's a little flexibility.
It's not perfect.
But for fixed apps, like my web apps, I have all the stuff pinned, and it just automatically updates because nothing depends upon it.
It's fine.
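The difference between a pinned requirement, a "this version or higher" expression, and an unpinned one can be shown with a small classifier. This is a rough sketch, not PyUp's logic: the `classify` helper is hypothetical, and it ignores extras and environment markers, which a real tool would need to parse properly.

```python
def classify(requirement: str) -> str:
    """Label one requirements.txt line as 'pinned', 'range', or 'unpinned'."""
    spec = requirement.split("#", 1)[0].strip()  # ignore trailing comments
    if "==" in spec:
        return "pinned"  # exact version, typical for a deployed web app
    if any(op in spec for op in (">=", "<=", "~=", "!=", ">", "<")):
        return "range"  # e.g. "this version or higher"
    return "unpinned"  # any version, common for reusable libraries


for line in ["flask==0.12.2", "requests>=2.18", "pytest"]:
    print(f"{line:20} {classify(line)}")
```

This matches the trade-off in the conversation: applications tend to pin everything, while packages meant to be installed alongside other packages usually leave their requirements loose.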
Yeah.
Yeah, pretty nice.
And it's free for open source stuff. Yeah, it's pretty nice.
Great. Speaking of open source, PyPI is the place where it lives, and now you can describe it better, right?
Yes, I'm very excited about this, because with the cards project I was working on, I was sort of bummed that I had to put the README in REST, or not REST, but reStructuredText.
And now you don't anymore.
That's awesome.
So readme.md and a couple other variants of that extension are now supported on pypi.org.
And we're linking to a couple articles, one of them basically describing all the steps you have to do. There's a little bit of changes you have to do to your setup.py file and a couple other things and update all your tools.
But for the most part, it just works, and that's awesome.
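As a rough idea of the setup.py change being discussed: setuptools accepts a `long_description_content_type` argument, and setting it to `text/markdown` tells PyPI how to render the description. The sketch below only builds the keyword arguments so it can run anywhere; the package name, version, and README contents are placeholders, and a real setup.py would pass the result to `setuptools.setup(**kwargs)`.

```python
import tempfile
from pathlib import Path


def build_setup_kwargs(readme_path):
    """Build the setup() arguments that carry a Markdown description."""
    long_description = Path(readme_path).read_text(encoding="utf-8")
    return {
        "name": "example-package",  # placeholder name
        "version": "0.1.0",  # placeholder version
        "long_description": long_description,
        # This is the field that marks the description as Markdown.
        "long_description_content_type": "text/markdown",
    }


# Demo with a throwaway README so the example does not depend on local files.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("# example-package\n\nA *Markdown* description.\n", encoding="utf-8")
    kwargs = build_setup_kwargs(readme)

print(kwargs["long_description_content_type"])
```

As mentioned in the episode, the packaging and upload tools also need to be recent enough to understand the new metadata.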
And then also, just recently, GitHub-flavored Markdown has been added.
Oh, yeah, that's nice.
GitHub-flavored Markdown has a little bit more, I think, from the stuff that I played with.
Like tables and...
Yes, like tables.
Strikethrough and stuff.
So that's nice.
And I'm looking forward to changing a couple of projects to utilize that.
And now the old legacy PyPI,
which I think maybe they've taken from your legacy Python.
I love it.
Yeah.
It still renders the descriptions as plain text,
but they comment, don't worry, pypi.org is really close to being the thing. So maybe this will just hasten the move away from legacy PyPI, with the descriptions looking funky. Yeah. So hopefully.
Yeah. Awesome. I'm really excited to see PyPI making some progress. It felt kind of
stale for a little while and it seems like it's really been rocking the last nine months.
To be fair, even if your Markdown gets displayed as plain text on legacy PyPI,
that's the point of Markdown: it's still readable. So that's okay.
Exactly. If it were HTML with lots of styles, that would have been different. That's right.
Yeah.
Nice. All right. So before we get to the next one, let me tell you about Datadog.
It's a monitoring solution that provides deep visibility and tracking into your distributed apps. So your application, your data layer, your servers, your services, everything. So within minutes, you'll be able to investigate bottlenecks and actually see where they are throughout your entire distributed app, which is pretty cool to put together. So if you want to visualize your Python performance today, get started with a free trial. And to also get that cool Datadog t-shirt, visit pythonbytes.fm slash datadog.
Earlier I said TalkPython.
They both work.
But pythonbytes.fm slash Datadog.
Speaking of web apps and distributed things and whatnot, I think there's a really interesting
new web server that people should start paying attention to in the Python space.
So you've probably heard of Nginx, right, Brian?
I know you don't do a ton of web stuff, but yeah.
Yeah, definitely.
Nginx is kind of like the static front-end server and load balancer thing for many web
apps.
On my sites, I have Nginx take all the requests and do the SSL stuff, and any static resources, CSS, JavaScript, images, just get sent straight back. And only the sort of data-driven stuff makes its way back to the Python web server, which in my case is uWSGI. And uWSGI is really nice. But the NGINX folks have come up with this thing called NGINX Unit. And so the thing I want to link to is this performance comparison between NGINX Unit and uWSGI.
So uWSGI is written in C.
It's like one of the best high-performance things that will run and farm out your Python application, Pyramid, Flask, whatever.
And it works really well.
But Nginx Unit is a little more flexible.
And, for example, you can configure it over a RESTful API instead
of just config files. It'll run multiple languages and versions at the same time,
improved TLS support, HTTP/2, which is cool. It'll run Python, multiple versions. It'll run Go,
Ruby, JavaScript, whatever, right? So it'll run all these things in this one server.
It's not just I'm going to run one flavor of Python.
So anyway, it's pretty cool.
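To give a feel for "configure it over a RESTful API instead of just config files": Unit takes its configuration as a JSON document sent to a control API. The sketch below only builds such a document. The port, application name, path, and module are placeholders, and the field names are written from memory of the Unit documentation, where they have changed between releases, so treat the whole thing as an assumption to check against the docs for your version.

```python
import json

# Hypothetical values throughout: port 8300, the app name "hello_flask",
# the path, and the module name are placeholders for illustration.
unit_config = {
    "listeners": {
        # Route traffic arriving on this address to the application below.
        "*:8300": {"pass": "applications/hello_flask"},
    },
    "applications": {
        "hello_flask": {
            "type": "python",
            "path": "/srv/hello_flask/",
            "module": "wsgi",
        },
    },
}

# The serialized document is what would be sent (for example with an HTTP PUT)
# to Unit's control socket; nothing is sent here.
payload = json.dumps(unit_config, indent=2)
print(payload)
```

The appeal mentioned in the episode is that this kind of change can be applied to a running server without editing files on disk.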
And the thing I wanted to look at was this comparison.
So there's this, I don't know who did it actually,
a group that put together sort of a performance analysis
and said we're going to slowly add more and more traffic, concurrent traffic,
to both of these things running more or less a Hello World Flask app.
And so pull up the pictures, and those of you who are listening,
there's a little link, you can pull up the pictures, and this really tells it all.
Do you got the pictures, Brian?
Yeah.
So if you look at that, there's a line that's pretty much flat across, and that's NGINX Unit, as you go from zero to 500 concurrent users doing 10,000 requests per second.
And it's just kind of like, got it, no problem.
uWSGI, with or without threads, is sort of a linear, slope-equals-one downward trend in performance as you add more and more traffic.
Like, as soon as you get to, you know, a couple hundred users, it goes from handling like 7,500 requests to handling 50 per second.
I mean, it really falls over.
So I thought that was pretty interesting.
This whole Nginx unit thing seems like it might be a really powerful
and new way to run some nice backend stuff.
Okay, so the high numbers are better.
You want to keep...
Yeah, those are requests per second, basically.
Yeah, so once you do 100,000 requests,
it goes to zero on uWSGI,
where it's still basically flat on NGINX Unit.
So really, really cool.
I think that's quite promising
in terms of making Python faster and scale better,
which is super important
because people move to other languages,
Go or whatever,
because, like, well, we need this concurrency.
Or you could just run something that runs it better.
So they have a little note that says it's still in
beta, not for production.
Yeah, it's pretty new. It's not quite ready.
So my message, my
takeaway is I'm going to start paying attention to this thing.
Maybe switch to it at some point, but
yeah, don't switch to it yet.
I wonder what version number it is. It doesn't say. It's got to be zero something, right?
Yeah, I don't know either what version number it is.
That's a good question.
Okay.
Cool.
Very, very funny.
All right.
Awesome.
You've got something on looping, right?
Trey Hunner, who was on the show last week.
Didn't he do last week?
He was your stand-in, your impersonator last week.
Well, he's got an article, which is a really good read,
and I'm going to not do it justice,
but it's called Loop Better,
A Deeper Look at Iteration in Python.
And, you know, I'm glancing through this,
I'm thinking, you know, I already know how to loop in Python.
But he shows a few gotcha examples of generators used in loops.
And, like for instance, even something that looks like a list comprehension, a generator expression, is a generator: you can't loop over it twice. And if you use a containment check, like is 9 in my generator, it'll work once and then it won't work the next time, because it's not in there anymore, and your collection's half the size. It's a little strange, and it just behaves weird. I mean, I don't know if I've ever run into these, but it hurt my head at first trying to figure out why they just weren't working.
So then the article goes on to describe in detail the iterator protocol and what iterators, iterables, sequences, and generators and all that good stuff are, and then goes back and looks at those gotchas again and explains, with that information, why they behave as they do. And I think this is just a well-written article that's going to make you a smarter Python programmer to read it.
Yeah, it's cool. Definitely covers a lot.
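The two gotchas mentioned here are easy to reproduce. The snippet below is written for this transcript in the spirit of the article rather than copied from it; the numbers are arbitrary.

```python
numbers = [1, 2, 3, 5, 7]

# Gotcha 1: a generator can only be looped over once.
squares = (n ** 2 for n in numbers)  # generator expression, not a list
first_pass = list(squares)   # consumes every item
second_pass = list(squares)  # nothing left to consume
print(first_pass, second_pass)

# Gotcha 2: a containment check consumes items while it searches.
squares = (n ** 2 for n in numbers)
first_check = 9 in squares   # reads 1, 4, 9 and stops at the match
second_check = 9 in squares  # resumes after 9, reads 25 and 49, finds nothing
leftover = list(squares)     # the second check used up the rest
print(first_check, second_check, leftover)
```

Swapping the parentheses for square brackets builds a real list, which can be searched and looped over as many times as you like.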
Well done, Trey. I think this is one of those concepts where if you come from a language that doesn't have generators, this concept of generators, or maybe if you just never really use them, the stuff that comes out of these generators, it looks like you just treat it like a normal collection.
But you're right.
They definitely don't behave like normal collections in a lot of ways.
And you can find these subtle bugs.
So nice to have them all covered like that.
Yeah, and one of the things, I guess I'll go on a little bit, is that with generators, it's this iterator protocol, and it's kept internal to the loop.
Python will call next on the iterator, and then eventually it gets to the end.
There's not a way to reset them.
So they're done.
Yeah, they're done. And you've got to generate...
But you can generate more. However you generated it, you can generate another one.
Yeah, pretty cool.
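What a for loop does with the iterator protocol can be written out by hand. This is a simplified sketch; `manual_loop` and `make_squares` are names invented for the example.

```python
def manual_loop(iterable, action):
    """Do by hand roughly what 'for item in iterable: action(item)' does."""
    iterator = iter(iterable)  # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # ask the iterator for the next item
        except StopIteration:
            break  # the iterator is exhausted and cannot be reset
        action(item)


def make_squares(numbers):
    """Return a fresh generator each time it is called."""
    return (n ** 2 for n in numbers)


seen = []
manual_loop(make_squares([1, 2, 3]), seen.append)
again = list(make_squares([1, 2, 3]))  # a new generator, so it works again
print(seen, again)
```

There is no reset, but wrapping the generator's creation in a function gives you a cheap way to make another one.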
So the final thing that I want to cover
is a little bit like the first one.
It's a bit of a warning,
but this is not an automated system
like GitHub saying,
hey, there's all these repos.
We're going to tell you there's this problem.
It's just something people should be aware of.
So in Django, there's these configuration files,
and there's this part where you can set debug, true or false,
and there's like a little comment by it that says,
do not set this to be true in production.
However, do you think everyone goes into it,
the big long config file, and fixes that before they push it out?
No.
No, they don't.
So the article is called,
Misconfigured Django apps are Exposing Secret API Keys and Database Passwords.
That sounds bad.
Oh, no.
No.
So it says,
Researchers have begun stumbling upon misconfigured Django apps that are exposing information like these API keys.
It could be your Stripe key, whatever.
In just like a week, they discovered 28,000 Django apps where the admin left debug mode enabled. And then, you know, you'll see screenshots of pulling up just random apps on the internet: here's the AWS secret key, here's the database password, et cetera, just listed in the debug tools. So that sounds bad, right?
Yeah. Well, especially since you probably leave that on while you're developing it so that you can look at all that stuff.
Yeah, it sounds really bad.
And it pretty much is.
It says, just skimming through a few servers, researchers found debug mode was exposing extremely sensitive information that would allow a malicious actor full access to the app owner's data.
But I like that they were really clear to emphasize this is not
a failure on the Django side. But in fact, you're just not supposed to do this in production. And
somebody on Twitter was like, it would be so awesome if there was like a comment or like a
little note in Django that said, don't put this in production. And then of course, right under that there's a screenshot of "never run this in production in debug mode." So supposedly it's not Django's fault.
However, I mean, maybe there needs to be
more than just on or off.
Maybe there needs to be a,
I'm debugging my app,
but I don't want to expose all the API keys
mode or something.
Oh yeah, for sure.
I think, or maybe just the debug stuff is off by default, and you have to turn it on, and in the act of turning it on, you go to the section and you read that. But you might never go and read that part of the config file, so you just don't know.
Right. I mean, Django is famous for getting stuff up really easy. I don't have to be a super developer, so maybe you just don't know, right?
To sort of make things worse, a security researcher, Victor Gevers, said some of these apps running Django have already been compromised. And he found one server running the Weevely web shell.
That's bad.
I mean, they were somehow able to entirely take over the computer and just SSH into it.
And so he said, I've been notifying server owners about their leaky Django apps.
At the moment, we've reported 1,822 servers, of which 143 were fixed.
Not so many, right?
Right. Yeah, or taken offline, and the taken offline part tells me that there are some people out there that just don't know how to do that yet, so they'll just take it down.
Yeah, it's like, you know what, my little toy site is not worth getting hacked, I'm just taking it off, right?
Yeah, right. Well, so I guess the takeaway: if you're running a Django site, make sure it's not in debug mode, or you could be a statistic. Don't be a statistic.
Yes. Don't be a statistic.
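One common way to act on that takeaway is to make debug mode off unless it is explicitly switched on, for example from an environment variable. The sketch below is one possible approach, not Django's own behavior: `DEBUG` is the real Django setting name, while the `DJANGO_DEBUG` variable and the `debug_enabled` helper are assumptions made up for this example.

```python
import os


def debug_enabled(environ=os.environ):
    """Return True only when the (hypothetical) DJANGO_DEBUG variable opts in."""
    value = environ.get("DJANGO_DEBUG", "")
    return value.strip().lower() in {"1", "true", "yes"}


# In a settings.py this would be the module-level setting Django reads.
# With nothing set in the environment, it falls back to False.
DEBUG = debug_enabled()

print(debug_enabled({}))                      # nothing set: off
print(debug_enabled({"DJANGO_DEBUG": "1"}))   # explicit opt-in: on
```

This is close to the "off by default and you have to turn it on" idea from the conversation: a production server that never sets the variable never exposes the debug pages.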
All right. That's it for our official six items. Brian, you got anything else?
[...] me that in episode 70 we covered Wagtail, which is a CMS written in Python.
But the Wagtail team is trying to get some new features out, and they're running a Kickstarter campaign to try to fund that. So I think it's a good thing. They're not looking for that much money, so if everybody pitches in a little bit, it'd be good. So we've got a link.
Yeah, they're pretty close to their goal, right? They've got 10 days left.
They're about halfway there.
They should get there, hopefully.
Yeah, Wagtail is one of the really nice CMSs that's based on Django.
Hopefully it's debug mode equals false.
Yeah, pretty nice stuff.
So yeah, if you care at all about Wagtail or these CMSs, go in there and help them out a bit.
I wanted to mention I've had a lot of great feedback on Test & Code. I've been doing kind of a series on getting an open source project out and all of the testing requirements around it, and talking about some of the common test design patterns.
And that's been going well.
And I've actually been learning a lot about running an open source project. You know, lately I've just been using GitHub for just, like, revision control.
But actually running an open source project, even if it's just got a couple of contributors, you learn a lot.
So hopefully I'll get some of those learnings written up sometime soon.
You definitely should.
That's a really cool project you're doing.
So keep it up.
Yeah.
You got any news?
No news right now.
Nothing to report.
But I'm always working on new projects.
I will let you know soon.
All right.
Well, thanks a lot for today.
Yeah, you bet.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured, just visit PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.