Python Bytes - #72 New versioning: Episode 0.0.7.2 (with 72 releases)

Episode Date: April 5, 2018

Topics covered in this episode: ZeroVer: 0-based Versioning; GitHub Security Alerts Detected over Four Million Vulnerabilities; Markdown Descriptions on PyPI; Concurrency comparison between NGINX-unit... and uWSGI; Loop better: A deeper look at iteration in Python; Misconfigured Django Apps Are Exposing Secret API Keys, Database Passwords; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/72

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 72, recorded April 4th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And we've got some awesome stuff for you. Before we get to it, Brian, I want to say thank you to Datadog. Check out what they're offering over at talkpython.fm slash datadog. And they've got a bunch of cool stuff, including a cute little doggy t-shirt. I still need to get one of those. I know. I don't have a shirt either. I really need to do that.
Starting point is 00:00:28 All right. I guess the first thing we should talk about is versioning. When I normally look at commercial software, they have numbers in the front like version 6 or 12 or version 3 of this software. It's pretty rare to have three dot anything in open source, isn't it? I don't know if it's rare, but there sure is a lot of stuff that's still, we think of stuff that starts with a zero as being in beta and it's not.
Starting point is 00:00:57 Well, like for instance, with semantic versioning, once it's at 1.0, the interface is pretty solid and you can depend on it. But there was a website put up this month called ZeroVer to talk about zero-based versioning. And it's sort of tongue-in-cheek. It's from one of our friends, Mahmoud Hashemi, and some others that started it. But they kind of wanted to call out a bunch of Python projects and other projects that are like perpetually starting with zero by putting
Starting point is 00:01:30 up this sort of mock website to say that you don't need to do anything other than zero-based versioning. It starts out with a down-to-earth demo. It's pretty awesome. So it has some versions and says yes, these are good: 0.0.1, 0.1.0, 0.4.0. And then no: 1.0, 1.0.0, 2018.04.0. Like, none of these are
Starting point is 00:01:57 okay, right? And yeah, so if you haven't figured it out, it is a joke. But, for instance, I guess I hadn't realized this: Flask is one of the ones that was called out. It's currently 0.12.2. Come on, it's been eight years. I think that maybe you can go to a 1.0. I have a new solution here, a new way to solve this problem. Just whenever you print or look at the version number, just strip off all the left zeros and dots, so it's like 12.2 is basically what Flask is. You
Starting point is 00:02:35 know, there are some that are very depended on by a bunch of people, where if they completely changed the interface, the API, that would be bad. So there clearly is the point where they could bump to a 1.0. Yeah, I think eight years, absolutely, is a time frame where you could say, you know, we're pretty stable at this point, right? Like also Pandas is in here with a 0.23, and it's 7.1 years. And also they count the releases, right? Like 21 releases of Flask, 75 releases of Pandas, and it's still on a zero. One thing I would like to point out is if you go and you
Starting point is 00:03:12 look at a lot of the sort of manager folks at more commercial-oriented enterprise software groups, so people that use like .NET or other non-open source development ecosystems, when they see things like 0.1, they're like, oh, this thing is not ready for us to use. We can't use this at our company. And I think it actually sends a bit of a message that this thing's not quite ready. I mean, I know obviously looking at this list, it doesn't mean that. But a significant number of people, I think, interpret it that way. And so, you know, I think it's worth considering maybe saying, all right, we're actually at a version where we're going to call it 1.0. Like Flask is probably fully ready for 1.0.
Starting point is 00:03:54 Anything that starts with it, it's just the dots and it's not date-based. People kind of assume it's semantic versioning. So I think semantic versioning is the way to go. It's not an easy thing, though. And that's part of the reason why they're being a little gentle with it there. If you check out the about page, it talks about really what you should do about it. But when you're actually running a project, it's hard to decide what's something that's big enough to flip the major digit. Django is doing that pretty well, I think. They had their 1.0, and that was stable for a long time.
Starting point is 00:04:28 They said, we're going to make a major break and change, so we're flipping to 2.0, drop Python 2 support, and all that kind of stuff. Django is one that's not following Mahmoud's recommendation. Recommendation, yeah. Yeah, I love it. I love how he branded that, him and all the folks involved. It's very cool.
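As a small illustration of the joke made in the discussion above (strip the leading zeros and dots so that Flask's 0.12.2 reads as 12.2), here is a minimal Python sketch. The function names `strip_zerover` and `is_zerover` are made up for this example and are not part of ZeroVer or any library; the check simply treats a version as zero-based when its first dot-separated component is 0.

```python
def strip_zerover(version):
    """Drop leading zeros and dots, per the joke: '0.12.2' becomes '12.2'."""
    stripped = version.lstrip("0.")
    # A version made only of zeros and dots would strip to nothing.
    return stripped or "0"


def is_zerover(version):
    """Return True when the first (major) component of the version is 0."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) == 0


for v in ["0.12.2", "0.23", "1.0", "2018.04.0"]:
    print(v, "->", strip_zerover(v), "| zero-based:", is_zerover(v))
```

Date-based versions such as 2018.04.0 are left alone by both helpers, which matches how the site lists them on the "no" side.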
Starting point is 00:04:44 So when we build these projects in Python and any open source system, you basically layer on a whole bunch of external dependencies and packages and stuff. How do you know when something has gone terribly wrong? Like suppose you depend on Werkzeug in Flask, and there's some huge security vulnerability in that dependency. Do you get notified? How do you know? No, I don't know.
Starting point is 00:05:09 You don't know. Right. And so this is actually a really big problem. It's like when you think about problems or security issues with your application, it's not just what you have. It's the stuff you're built upon. I mean, the whole Equifax thing was a vulnerability in the, was it Swing? I don't know, some foundational library in Java.
Starting point is 00:05:29 And they just didn't patch it in time, right? So getting notified of these things is really important. And so much of our code lives on GitHub. And GitHub decided they're going to take some responsibility for this and try to help people. So there's a nice article that says, GitHub security alerts detected over 4 million vulnerabilities last year. I think it was in the year. Actually, it's not even the year.
Starting point is 00:05:50 It's since like November of last year or something. So that's pretty insane. So they launched this thing called GitHub security alerts. Initially, it's only for Ruby and JavaScript, which is lame. But they have Python support coming, which is why I'm talking about this. And what it does is it looks at your GitHub repo and it says, are you using a certain dependency? Does that dependency have a known security vulnerability? If it does, then like right at the top of your repo, you get this great scary warning that says your application is insecure because it depends on this thing that is insecure. Yeah, actually, that's a great idea.
Starting point is 00:06:22 Yeah. I don't know if you get an email notice, but certainly your repo looks scary when that's the case. Like, this happened to one of my courses, and it just came back again. Because one of my courses, the Python course, demonstrates using Electron.js, editing an Electron.js app, and Electron.js had some security vulnerability. It's not actually used, but you know, whatever, it still says your app depends upon Electron.js and it's got this issue. It's pretty cool. There's some good numbers and whatnot here. It says
Starting point is 00:06:51 nearly half of all the displayed alerts were responded to within a week and 30% were fixed within the first seven days. Oh, that's great. That's good. That's a good thing that they're adding that. And it does. So you said it's coming for Python and I see that there is planned for this year for 2018. So that's good. Yeah. There's not a whole lot of details
Starting point is 00:07:13 about exactly when it's coming, but yeah, that will be great. They said, if you look at repositories that have had a contribution in the last 90 days, so things that are active, it says 98% of such repositories were patched within fewer than seven days. Like that's insane. That's a really big deal. Yeah. Yeah. So they said they found over half a million of repositories that had some
Starting point is 00:07:34 kind of security vulnerability and were pretty much fixed up. So anyway, that's all really good. I just want to give a shout out to PyUp as well, pyup.io. I use that for my stuff and it basically does the same thing and more for Python already. So you link it to your GitHub repo, it'll look at your requirements.txt, if there's a new version it'll send you a PR to upgrade your dependencies,
Starting point is 00:07:57 and if there's a security alert it'll tell you. I don't really want to get on this tangent too far, but I started using PyUp for the cards project that I started recently. Since I'm sort of doing this project, I can't remember who I read it from, but the packages that are intended to be used by other applications probably shouldn't have their versions pegged. So if I unpegged all my versions in a package, then PyUp.io kind of complains about that. Yeah, that's a little bit. It does require you to more or less pin your versions. And you can do expressions like I want it to be this version or higher.
Starting point is 00:08:32 And I think maybe it'll upgrade it. I don't know. There's a little flexibility. It's not perfect. But for like fixed apps like my web apps all have the stuff pinned and it just automatically updates because nothing depends upon it. It's fine. Yeah.
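To make the pinned versus unpinned distinction from this exchange concrete, here is a rough Python sketch that sorts requirement lines into the two groups. It is only an illustration under simplifying assumptions: `split_pinned` and the sample lines are hypothetical, the regular expression covers just the plain `name==version` form, and none of this reflects how PyUp itself inspects a `requirements.txt`.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def split_pinned(lines):
    """Split requirement lines into (pinned, unpinned) lists."""
    pinned, unpinned = [], []
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        req = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not req or req.startswith("-"):
            continue
        (pinned if PINNED.match(req) else unpinned).append(req)
    return pinned, unpinned


# Hypothetical requirements, for illustration only.
sample = ["flask==0.12.2", "requests>=2.18", "pandas", "# a comment", ""]
pinned, unpinned = split_pinned(sample)
print("pinned:", pinned)
print("unpinned:", unpinned)
```

A fixed web app would typically want everything in the first list, while a library meant to be installed alongside other packages would usually leave ranges like `>=` in place, which is the tension described above.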
Starting point is 00:08:43 Yeah, pretty nice. And it's free for open source stuff. Yeah, it's pretty nice. Great. Speaking of open source, PyPI is the place where it lives, and now you can describe it better, right? Yes, I'm very excited about this because, like the cards project I was working on, I was sort of bummed that I had to put the README in reST, or not REST, but reStructuredText. And now you don't anymore. That's awesome. So README.md and a couple other variants of that extension are now supported on pypi.org.
Starting point is 00:09:19 And we're linking to a couple articles, one of them basically describing all the steps you have to do. There's a little bit of changes you have to do to your setup.py file and a couple other things and update all your tools. But for the most part, it just works, and that's awesome. And then also, just recently, GitHub-flavored Markdown has been added. Oh, yeah, that's nice. GitHub-flavored Markdown has a little bit more, I think, from the stuff that I played with. Like tables and... Yes, tables.
Starting point is 00:09:44 Strikethrough and stuff. So that's nice. And I'm looking forward to changing a couple of projects to utilize that. And now the old legacy PyPI, which I think maybe they've taken from your legacy Python. I love it. Yeah.
Starting point is 00:10:03 It still renders the descriptions as plain text, but they comment, don't worry, PyPI.org is really close to being the thing. So maybe this will just hasten the move away from legacy PyPI, with the descriptions looking funky. Yeah. So hopefully. Yeah. Awesome. I'm really excited to see PyPI making some progress. It felt kind of stale for a little while and it seems like it's really been rocking the last nine months. To be fair, even if your Markdown gets displayed as plain text on legacy PyPI, that's the point of Markdown: it's still readable. So that's okay. Exactly. If it were HTML with lots of styles, that would have been different. That's right. Yeah.
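The setup.py change mentioned in this segment comes down to passing the README text along with a content type. The sketch below assumes a setuptools-based project: `long_description` and `long_description_content_type` are real setuptools keywords, while the helper `build_setup_kwargs`, the package name, and the version are placeholders invented for illustration.

```python
from pathlib import Path


def build_setup_kwargs(readme_path="README.md"):
    """Collect the metadata that tells PyPI the description is Markdown."""
    path = Path(readme_path)
    # Fall back to an empty description if the README is missing.
    long_description = path.read_text(encoding="utf-8") if path.exists() else ""
    return {
        "name": "example-package",  # hypothetical project name
        "version": "0.1.0",         # hypothetical version
        "long_description": long_description,
        # Tells PyPI to render the description as Markdown.
        "long_description_content_type": "text/markdown",
    }


# In a real setup.py the dictionary would be passed to setuptools:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
print(build_setup_kwargs()["long_description_content_type"])
```

For GitHub-flavored Markdown the content type can carry a variant parameter, for example `text/markdown; variant=GFM`. Older versions of setuptools, wheel, and twine may not pass this field through, which is presumably what "update all your tools" refers to.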
Starting point is 00:10:56 Nice. All right. So before we get to the next one, let me tell you about Datadog. It's a monitoring solution that provides deep visibility and tracking into your distributed apps. So your application, your data layer, your servers, your services, everything. So within minutes, you'll be able to investigate bottlenecks and actually see where they are throughout your entire distributed app, which is pretty cool to put it together. So if you want to visualize your Python performance today, get started with a free trial. And to also get that cool Datadog t-shirt, visit pythonbytes.fm slash Datadog. Earlier I said TalkPython. They both work.
Starting point is 00:11:30 But pythonbytes.fm slash Datadog. Speaking of web apps and distributed things and whatnot, I think there's a really interesting new web server that people should start paying attention to in the Python space. So you've probably heard of Nginx, right, Brian? I know you don't do a ton of web stuff, but yeah. Yeah, definitely. Nginx is kind of like the static front-end server and load balancer thing for many web apps.
Starting point is 00:11:54 On my sites, I have Nginx take all the requests, do the SSL stuff, and any static resources, CSS, JavaScript, images, just get sent straight back. And only the sort of data-driven stuff makes its way back to the Python web server, which in my case is uWSGI. And uWSGI is really nice. But the NGINX folks have come up with this thing called NGINX Unit. And so the thing I want to link to is this performance comparison between NGINX Unit and uWSGI. So uWSGI is written in C++. It's like one of the best high-performance things that will run and farm out your Python application, Pyramid, Flask, whatever.
Starting point is 00:12:37 And it works really well. But Nginx Unit is a little more flexible. And, for example, you can configure it over a RESTful API instead of just config files. It'll run multiple languages and versions at the same time, improve TLS support, HTTP2, which is cool. It'll run Python, multiple versions. It'll run Go, Ruby, JavaScript, whatever, right? So it'll run all these things in this one server. It's not just I'm going to run one flavor of Python. So anyway, it's pretty cool.
Starting point is 00:13:10 And the thing I wanted to look at was this comparison. So there's this, I don't know who did it actually, a group that put together sort of a performance analysis and said we're going to slowly add more and more traffic, concurrent traffic, to both of these things running more or less a Hello World Flask app. And so pull up the pictures, and those of you who are listening, there's a little link, you can pull up the pictures, and this really tells it all.
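For a sense of how little application code sits behind a benchmark like this, here is a hello-world sketch. It is a plain WSGI callable from the standard interface rather than the Flask app the comparison used, and it is not the benchmark's actual code. The name `application` is a common convention for WSGI servers but is still an assumption about how a given server would be configured.

```python
def application(environ, start_response):
    """Minimal WSGI app: answer every request with a fixed text body."""
    body = b"Hello, World!\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```

With an app this small, nearly all of the measured time is spent in the server itself, which is why the comparison says more about how each server handles concurrency than about Python code.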
Starting point is 00:13:36 Do you got the pictures, Brian? Yeah. So if you look at that, there's a line that's pretty much flat across for Nginx Unit as you go from zero to 500 concurrent users doing 10,000 requests per second. And it's just kind of like, got it, no problem. uWSGI, with or without threads, is sort of a linear, slope-equals-one downward trend of performance as you add more and more traffic. Like, as soon as you get to, you know, a couple hundred users, it goes from handling like 7,500 requests to handling 50 per second. I mean, it really falls over.
Starting point is 00:14:11 So I thought that was pretty interesting. This whole Nginx Unit thing seems like it might be a really powerful and new way to run some nice backend stuff. Okay, so the high numbers are better. You want to keep... Yeah, those are requests per second, basically. Yeah, so once you do 100,000 requests, it goes to zero on uWSGI,
Starting point is 00:14:29 where it's still basically flat on Nginx unit. So really, really cool. I think that's quite promising in terms of making Python faster and scale better, which is super important because people move to other languages, Go or whatever, because, like, well, we need this concurrency.
Starting point is 00:14:46 Or you could just run something that runs it better. So they have a little note that says it's still in beta, not for production. Yeah, it's pretty new. It's not quite ready. So my message, my takeaway is I'm going to start paying attention to this thing. Maybe switch to it at some point, but yeah, don't switch to it yet. I wonder what version
Starting point is 00:15:02 number it is. It doesn't say. It's got to be zero something, right? Yeah, I don't know either what version number it is. That's a good question. Okay. Cool. Very, very funny. All right.
Starting point is 00:15:13 Awesome. You've got something on looping, right? Trey Hunner, who was on the show last week. Didn't he do last week? He was your stand-in, your impersonator last week. Well, he's got an article, which is a really good read, and I'm going to not do it justice, but it's called Loop Better,
Starting point is 00:15:30 A Deeper Look at Iteration in Python. And, you know, I'm glancing through this, I'm thinking, you know, I already know how to loop in Python. But he shows a few gotcha examples of generators used in loops. And generators are, like, for instance, even a list comprehension is a generator; you can't loop twice. And if you use a containment check, like is nine in my generator, it'll work once and then it won't work the next time, because it's not in there
Starting point is 00:16:01 anymore, and your collection's half the size. And it's a little strange. And it just behaves weird. I mean, I don't know if I've ever run into these, but it hurt my head at first trying to figure out. I didn't know why they just weren't working. So then the article goes on to describe in detail really the iterator protocol and what iterators, iterable sequences and generators and all that good stuff are, and then goes back and looks at those gotchas again and explains with that information why they behave as they do. And I think this is just a well-written article that's going to make you a smarter Python programmer to read it. Yeah, it's cool. Definitely covers a lot. Well done, Trey. I think this is one of those concepts where if you come from a language that doesn't have generators, this concept of generators, or maybe if you just never really use them, the stuff that comes out of these generators, it looks like you just treat it like a normal collection.
Starting point is 00:16:59 But you're right. They definitely don't behave like normal collections in a lot of ways. And you can find these subtle bugs. So nice to have them all covered like that. Yeah, and one of the things, I guess I'll go a little bit, is that generators, it's this iterator protocol, and you keep it internally in a loop. Python will call the next operator,
Starting point is 00:17:18 and then eventually it gets to the end. There's not a way to reset them. So they're done. But you can generate, however you generate it, you can. So they... Yeah, they're done. They're done. And you got to generate... But you can generate more. However you generate it, you can generate another one. Yeah, pretty cool.
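The gotchas described in this segment are easy to reproduce. The sketch below is written for this transcript rather than taken from the article, and the variable and function names are illustrative. One clarification: the one-shot behavior belongs to generator expressions, written with parentheses. A list comprehension, written with square brackets, builds an ordinary list that can be looped over as many times as needed.

```python
numbers = [1, 2, 3, 5, 7]

# A generator expression can only be consumed once.
squares = (n ** 2 for n in numbers)
first_pass = list(squares)    # all five squares
second_pass = list(squares)   # empty: the generator is already exhausted

# A containment check consumes items up to and including the match.
squares = (n ** 2 for n in numbers)
found_first = 9 in squares    # True, after consuming 1, 4 and 9
leftover = list(squares)      # only the items after the match remain
found_again = 9 in squares    # False: nothing is left to search


def manual_loop(iterable):
    """Roughly what a for loop does: call next() until StopIteration."""
    iterator = iter(iterable)
    items = []
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        items.append(item)
    return items


print(first_pass, second_pass)
print(found_first, leftover, found_again)
print(manual_loop(numbers))
```

There is no reset on a generator, as noted above, so the only way to loop again is to build a new one, which is why `squares` is reassigned before the containment check.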
Starting point is 00:17:30 So the final thing that I want to cover is a little bit like the first one. It's a bit of a warning, but this is not an automated system like GitHub saying, hey, there's all these repos. We're going to tell you there's this problem. It's just something people should be aware of.
Starting point is 00:17:47 So in Django, there's these configuration files, and there's this part where you can set debug, true or false, and there's like a little comment by it that says, do not set this to be true in production. However, do you think everyone goes into it, the big long config file, and fixes that before they push it out? No. No, they don't.
Starting point is 00:18:02 So the article is called, Misconfigured Django apps are Exposing Secret API Keys and Database Passwords. That sounds bad. Oh, no. No. So it says, Researchers have begun stumbling upon misconfigured Django apps that are exposing information like these API keys. It could be your Stripe key, whatever.
Starting point is 00:18:20 In just like a week, they discovered 28,000 Django apps where the admin left the debug mode enabled. And then, you know, you see screenshots of pulling up just random apps on the internet: here's the AWS secret key, here's the database password, etc., etc., just listed in the debug tools. So that sounds bad, right? Yeah, well, especially since you probably leave that on while you're developing it so that you can look at all that stuff. Yeah, it sounds really bad. And it pretty much is. It says, just skimming through a few servers, researchers found apps in debug mode were exposing extremely sensitive information that would allow a malicious actor full access to the app owner's data.
Starting point is 00:19:02 But I like that they were really clear to emphasize this is not a failure on the Django side. But in fact, you're just not supposed to do this in production. And somebody on Twitter was like, it would be so awesome if there was like a comment or like a little note in Django that said, don't put this in production. And then of course, right under there's a screenshot of never run this in production in debug mode. It's supposedly not Django's fault. However, I mean, maybe there needs to be more than just on or off.
Starting point is 00:19:29 Maybe there needs to be a, I'm debugging my app, but I don't want to expose all the API keys mode or something. Oh yeah, for sure. I think, or maybe just the debug stuff is off by default and you have to turn it on
Starting point is 00:19:42 and the act of turning it on, you go to the section and you read that. But you might never go and read that part of the config file, so you just don't know, right? I mean, Django is famous for just getting stuff up really easily. I don't have to be a super developer. So maybe you just don't know, right? To sort of make things worse, a security researcher, Victor Gevers, said some of these apps running Django have already been compromised. And he found one server running the Weevely web shell. That's bad. I mean, they were somehow able to entirely take over the computer and just SSH into it. And so he said, I've been notifying server owners about their leaky Django apps.
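One way to get the "off by default, turn it on deliberately" behavior suggested here is to read the flag from the environment in the settings module. This is a sketch of a common pattern and not something Django does for you: `DEBUG` is Django's real setting name, while the helper `env_flag` and the environment variable `DJANGO_DEBUG` are assumptions made for this example.

```python
import os


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment, defaulting to off."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Only a few explicit spellings count as "on".
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# In settings.py: debug stays off unless a developer opts in locally,
# for example by exporting DJANGO_DEBUG=1 (hypothetical variable name).
DEBUG = env_flag("DJANGO_DEBUG")
```

With this arrangement a production server that never sets the variable runs with debug mode off, so forgetting to edit the config file fails safe instead of exposing keys.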
Starting point is 00:20:19 At the moment, we've reported 1,822 servers. Well, 143 were fixed. Not so many, right? Yeah. Or taken offline, which, the taken offline tells me that there's some people out there that just don't know how to do that yet, so they'll just take it down. Yeah, it's like, you know what, my little toy site is not worth getting hacked, I'm just taking it off. Right. Yeah. Well, so I guess the takeaway: if you're running a Django site, make sure it's not in debug mode or you could be a statistic. Don't be a statistic. Yes. Don't be a statistic. All right. That's it for our official six items. Brian, you got anything else? me that in episode 70 we covered Wagtail, which is a CMS written in Python.
Starting point is 00:21:06 But the Wagtail team is trying to get some new features out and they're running a Kickstarter campaign to try to fund that. So I think it's a good thing. They're not looking for that much money so if everybody pitches in a little bit it'd be good. So we've got a link.
Starting point is 00:21:21 Yeah, they're pretty close to their goal, right? They've got 10 days left. They're about halfway there. They should get there, hopefully. Yeah, Wagtail is one of the really nice CMSs that's based on Django. Hopefully it's debug mode equals false. Yeah, pretty nice stuff. So yeah, if you care at all about Wagtail or these CMSs, go in there and help them out a bit.
Starting point is 00:21:40 I wanted to mention I've had a lot of great feedback on Test & Code. I've been doing kind of a series on getting an open source project out and all of the testing requirements around it, and talking about some of the common test design patterns. And that's been going well. And I've actually been learning a lot about running an open source project. I thought, you know, lately I've just been using GitHub for just like revision control. But actually running an open source project, even if it's just got a couple of contributors, you learn a lot. So hopefully I'll get some of those learnings written up sometime soon. You definitely should. That's a really cool project you're doing. So keep it up.
Starting point is 00:22:19 Yeah. You got any news? No news right now. Nothing to report. But I'm always working on new projects. I will let you know soon. All right. Well, thanks a lot for today.
Starting point is 00:22:28 Yeah, you bet. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy.
Starting point is 00:22:51 Thank you for listening and sharing this podcast with your friends and colleagues.
