Python Bytes - #31 You should have a change log

Episode Date: June 21, 2017

Topics covered in this episode: TinyMongo; A dead simple Python data validation library; PuDB; Analyzing Django requirement files on GitHub; Changelogs; Understanding Asynchronous Programming in Python; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/31

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This time it's Python Bytes episode 31, recorded on Tuesday, June 20th, 2017. I'm Michael Kennedy. And I'm Brian Okken. And we have a bunch of cool things to talk about. Some of them are huge and some of them are kind of tiny. Let's start small, huh? Yeah, let's start small.
Starting point is 00:00:22 One of the reasons why I like following Twitter for Python news is that's where I found TinyMongo. So I saw somebody talking about it last week. That's awesome. I'm a fan of MongoDB and TinyDB. And if they could come together, that'd be even better. Right. So this is essentially an attempt to put a Mongo-like interface on top of TinyDB. It's not the exact same interface. It's his intent to always be right on top of TinyDB, and so far he's been really happy with TinyDB as the backend for TinyMongo. And so, yeah,
Starting point is 00:01:12 it just sits, it's using TinyDB as the database part, but exposes an interface that's very close to Mongo. Yeah, that's super cool. So basically, if you have code that talks to MongoDB through the PyMongo API, you could more or less adapt that really quickly to TinyMongo. And TinyDB, the backing store for this thing, more or less is like, let's create a simple document database that's really just some JSON files living on your disk. It's not a full-on production database, but if you're doing simple stuff, like really simple things, this is actually pretty sweet. There's no server, right? Right. And yeah, there's no server. I would say the other direction probably works the best. So if your end goal was to use Mongo, then TinyMongo might be a good way to start, because it isn't the full set of functionality. I don't have a complete list of what's missing. I just have the personal experience of I tried to take a Mongo application and just swap this in and I ran across a few errors and I haven't
Starting point is 00:02:20 finished debugging those yet. I'm just really excited about it because there's more than one document database that I can use in small applications. Yeah, that's cool. And then also, one of the applications for this, when I was talking with the maintainer of it, is that he's using it on Raspberry Pis even. So having a Mongo-like... That is really cool,
Starting point is 00:02:43 because you don't want to start up a whole separate server on like a Raspberry Pi, but certainly having a couple of little JSON files laying around that you have like a database interface over top of, that's cool. Yeah, definitely. So I was excited about this and I'm going to start using it right away. That's sweet. Yeah, if people are interested in TinyDB, back on episode 80 of Talk Python, many moons ago, I interviewed the guy who created TinyDB and talked about some of the use cases. And I think there's some extensions you can get, like indexing add-ons and stuff like that. So there's a lot of stuff to do with this. Pretty cool. So that sounds pretty dead simple, right? Just fire up TinyDB and off you go.
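To make the "a couple of JSON files with a database interface over top" idea concrete, here is a minimal sketch using only the standard library. It is not how TinyDB or TinyMongo are actually implemented, and it does not use their APIs: the `JsonCollection` class and its `insert_one` and `find` methods are hypothetical names chosen only to loosely echo the PyMongo style mentioned above.

```python
import json
import os
import tempfile


class JsonCollection:
    """Hypothetical toy document collection stored in one JSON file."""

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            self._save([])  # start with an empty list of documents

    def _load(self):
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def _save(self, docs):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(docs, f)

    def insert_one(self, doc):
        docs = self._load()
        docs.append(doc)
        self._save(docs)

    def find(self, query):
        # A document matches when every key in the query has an equal value.
        return [d for d in self._load()
                if all(d.get(k) == v for k, v in query.items())]


# No server involved: the "database" is just a file in a temp directory.
db_path = os.path.join(tempfile.mkdtemp(), "readings.json")
readings = JsonCollection(db_path)
readings.insert_one({"sensor": "pi-1", "temp": 21.5})
readings.insert_one({"sensor": "pi-2", "temp": 19.0})
matches = readings.find({"sensor": "pi-1"})
print(matches)
```

Rewriting the whole file on every insert is fine for a toy like this and is part of why this style of store suits small applications rather than production workloads.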
Starting point is 00:03:21 Yeah, dead simple. You know what else I want? Some dead simple validation. And so the next project I chose is called Validus. And Validus is on GitHub and it describes itself as a dead simple Python data validation library. And have you ever tried to write a regular expression to match an email or a URL or something like that? Oh, yes. Yeah. That's super fun, right? No. You think you get it working, then someone emails you like, I have a proper email address, but I can't sign up for your system. It says my email is invalid. You're like, oh, gosh. So this Validus thing kind of solves that for a class of types of data, basically simple input. So you can just import this and say
Starting point is 00:04:00 validus.isemail and give it a string and it will say yes or no. And you can ask it questions like, is it an RGB color? Is it a phone number? Is it an ISBN? Is it an IPv4 or IPv6 address? Is it a number? Is it a slug, like would it fit at the end of a URL without, you know, needing encoding? All that kind of stuff. That's pretty awesome. That's cool. I'd say it's dead simple. It's even got is Mongo ID. Nice. Yeah, yeah, that's awesome. So you know what else I like about this? It's Python 3 only, no legacy Python. 3.6, 3.3. Yeah, yeah.
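As an illustration of the kind of yes-or-no checks described here, the following sketch uses only the standard library rather than Validus itself. The function names `is_ipv4`, `is_ipv6`, and `is_slug` are hypothetical, and the slug rule (lowercase letters and digits separated by single hyphens) is an assumption made for this example, not Validus's definition.

```python
import ipaddress
import re

# Assumed slug rule: lowercase letters/digits in groups joined by single hyphens.
_SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_ipv4(value):
    """Return True if the string parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def is_ipv6(value):
    """Return True if the string parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(value)
        return True
    except ValueError:
        return False


def is_slug(value):
    """Return True if the whole string fits the assumed slug pattern."""
    return _SLUG_RE.fullmatch(value) is not None


print(is_ipv4("192.168.0.1"), is_ipv6("::1"), is_slug("you-should-have-a-change-log"))
```

Leaning on `ipaddress` instead of a hand-written regular expression avoids exactly the "it says my input is invalid" problem raised in the discussion, at least for IP addresses.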
Starting point is 00:04:33 Yeah, 3.3 and above. So it's only a Python 3 thing. So yet another sweet example of that. I have a lot of interesting stuff to say about that at the end of the show. Not Validus, but Python versus legacy Python. While this works pretty well, we may still need to jump in the debugger, right? Yeah, definitely. And I'm a command line debugger kind of person. Actually, I don't really jump into the debugger too much.
Starting point is 00:04:54 You're a last resort, a debugger of last resort type person? Yes. Yeah, definitely. Last resort. And so in episode 29, we talked about the ability to launch pdb, the Python debugger, from a failed pytest test. Somebody on Twitter, another Twitter person, KidPixo, I think. Yeah, KidPixo, he runs the Geek Cookies Italian podcast, which I was a guest on like two and a half years ago. He's a great guy. He sends us lots of good stuff. Yeah. Well, he passed this along because he said he really loves the PuDB debugger. And my first reaction is, oh my God, this thing is ugly. Because it does look like you're back in the 80s running on a 386 or something. I feel like I've dialed into a BBS. But it does have themes.
Starting point is 00:05:40 So after I played with it for a while, I switched it to a midnight theme, and it looks just like I'm in my editor. And then it's actually pretty slick. It's a lot better than pdb, and it's still small and fast. And there's some documentation in it for how you can do the same thing that we did with pytest. You can launch it whenever you hit a pytest failure. So that's pretty cool. Yeah, it's really nice. I mean, you can use it over like SSH and stuff.
Starting point is 00:06:12 So if you're SSH'd into a server, you can debug with this, but it actually has like little windows. I mean, it really does feel like I'm back on a BBS. It's awesome. Like you see your code and you can step through it. You've got like a variables window and a stack and breakpoints. And like, it's really nice.
Starting point is 00:06:28 It's like an ASCII curses-type thing. But the local, yeah, the local window of already having your listing up and also all your local variables, and that changing when you go up and down the stack, it's usually enough. So I like it. Yeah. Yeah. It definitely hits the sweet spot. Like the 80% case for debuggers. It's cool. All right. So I'm definitely gonna start using that if I need to debug anything without a windowing environment, like macOS or Linux or Windows. Okay. So the next thing that I want to talk about is a really
Starting point is 00:06:59 interesting sort of wide-ranging study that the guys at pyup.io did. So pyup.io is a cool service. I'm actually a paying customer of theirs because I really think what they're doing is awesome. And I use it for my web apps. So the idea is you basically give pyup.io access to your requirements file in your public or private GitHub repo. And if there's a new version of any requirement or transitive requirement that you depend upon, it will tell you like, hey, there's a new release of the Pyramid web framework. And here's the changelog. And actually, this one's a security update. So get in there and fix it quick. So it'll basically watch your requirements and tell you if there are any upgrades and things like that. And it'll
Starting point is 00:07:42 issue them as a pull request. So really cool. So these guys have access to all these requirements files and many other things, right? And they studied some Django requirements files on GitHub. Now, this isn't through their business. They were able to use BigQuery to just get ahold of all of the Django requirement files that are on GitHub. And they found some interesting things. And I guess this is not the private repos, probably just the public ones. But anyway, they said that Django is the most popular web framework. And it's pretty old. It's been around for 12 years, used in all sorts of different projects.
Starting point is 00:08:19 So let's look at these requirements files, which specify like all the dependencies you have to install, and see what we can get from them. So the first thing they ask is, do developers pin or freeze their requirements, right? That's where in your requirements.txt, you could say, I depend on Django and I depend on SQLAlchemy and I depend on requests. Or you could say, I depend on Django == this version, requests == that version, right? That's pinning or freezing. And they said that 64% of Django developers pin their requirements. That's interesting. And another 20% or so do ranges.
Starting point is 00:08:55 So like I'm willing to take this range of versions, but not leave it unpinned. And then some of them are just like, give me whatever I can when I ask for it. So that's interesting. Another thing that they said was pretty interesting is that Django 1.8, even though I think 1.10, 1.11 is the latest, Django 1.8 is the most popular of them. And that was pretty cool. But one of the things I really wanted to point out here is they said that what is more worrisome is 1.9, 1.7, and 1.6 are second, third, and fourth most popular on the list. Why is that a problem? None of them are receiving any security updates at all. Oh, weird.
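The three styles of requirement described above (pinned with `==`, a version range, or completely unpinned) can be told apart with a few lines of code. This is a minimal sketch for simple `requirements.txt` lines only; it is not how the pyup.io study was done, the `classify` helper is a hypothetical name, and it deliberately ignores extras, URLs, environment markers, and pip options. The version numbers in the sample are made up for illustration.

```python
import re
from collections import Counter

# Split a simple requirement line into a package name and whatever follows it.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def classify(line):
    """Return 'pinned', 'range', or 'unpinned' for one simple requirement line."""
    match = _REQ_RE.match(line)
    spec = match.group(2) if match else ""
    spec = spec.split("#", 1)[0].strip()  # drop any trailing comment
    if spec.startswith("=="):
        return "pinned"
    if spec:
        return "range"
    return "unpinned"


sample = """\
Django==1.11.2
requests>=2.0,<3.0
SQLAlchemy
"""

counts = Counter(classify(line) for line in sample.splitlines() if line.strip())
print(counts)
```

Run over a large set of files, counts like these are the kind of tally that would produce figures such as "64% pin their requirements".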
Starting point is 00:09:34 Isn't that bad? So 1.7 and 1.6 went end of life over two years ago. So if you are on the web and your application listens on a socket, you want it to have all the security patches, let me tell you. That's bad news. And here's like, if I add those up really quick, that's something like 40% of Django files they found are using these older versions.
Starting point is 00:09:55 And in fact, he said only 2% of all Django projects they could find are actually on a secure release. Among all the projects, more than 60% use Django releases with one or more known security vulnerabilities. And that's pretty intense, man, that only 2% of them are on a 100% known secure release. Well, I mean, clearly it's recommended to go make sure that you're using a secure release, but I was curious about the pinning or freezing. Is that considered best practice? So I think it depends on what you're doing. For large, complicated applications, it's definitely
Starting point is 00:10:30 considered a best practice. The idea is you want to make the upgrade in your dependencies at the time of your choosing, right? So especially with major frameworks like Django, if you're going to go from Django 1.8 to 1.9, you don't want that to just happen one day when it gets released and you happen to refresh your server, because that might have breaking changes. So you want to explicitly say, I depend on this one. Oh, there's a new one out. Let me test the new one and then explicitly change that number and have it like flip it for you. Okay. And basically that's what the PyUp service that I use does. It will automatically upgrade, like, my Pyramid web framework from 1.7 to 1.8 to 1.9, but it doesn't flip it immediately. It'll tell me and change my
Starting point is 00:11:16 requirements files as a PR and I have to like accept it, basically. Okay. Yeah, but pretty interesting stats there. Especially if you're into Django, check that out. Yeah, definitely. It's kind of concerning that there's so many. And I'm sorry to hang out on this so much, but was this projects or applications, and is there a difference? So as far as I can tell, I don't really know. Jannis, I think, the guy who wrote it, could maybe chime in in the comments if he's listening. But my understanding is basically they went and they studied the public repos that use Django. Okay. So this also may not be quite representative because companies like Pinterest that depend on Django, they're obviously not going to make their code public, right?
Starting point is 00:12:00 So they may be doing slightly different things. But still, it's interesting if you're into at least the open source side of Django. Definitely. It's cool. Speaking of open source projects, do you think they should have a changelog? Well, that's what I was curious about. Yeah. So I kind of am warming to the idea of changelogs. I appreciate other projects with changelogs. I actually asked some people back on Twitter again what they thought of them. And there's a couple of things I came across, one of which was a website called Keep a Changelog. I really like that site. It's so clear and compelling. It's great.
Starting point is 00:12:35 Yeah. Well, it also talks about how there really isn't a standard, but if there is a standard format for them, this is probably as close as you can get. And it talks about different standards in either reST or in Markdown. There's different ways to do it. And then when I was talking on Twitter about changelogs, some of the people from the pytest project piped up and said that they're using a tool called towncrier to maintain their changelog. That looks really cool, but I've never done anything with it. What's towncrier do? So what it does is you keep a separate directory within your project
Starting point is 00:13:13 so that you can have it on different branches, if you're using different branches, and then different changes go in, and you keep the changes in little snippet files so that, since they're separate files, they merge easily, because there's going to be a new file for each change. And then you go through and say, okay, I've pulled all these things in. I want to go ahead and take everything in the directory and add it to the changelog. Oh, I see. You can keep a separate file that says, these are the breaking changes, these are the new features, or whatever. Then it'll build a changelog out of them?
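The snippet-file workflow described above can be sketched in a few lines. This is not towncrier's implementation or its configuration format: the `build_section` function, the `<issue>.<type>` file naming, and the two section titles are assumptions made for this illustration, and the fragment contents are invented sample data.

```python
import os
import tempfile
from collections import defaultdict

# Assumed mapping from fragment type (the file extension) to a section heading.
SECTION_TITLES = {"feature": "Features", "bugfix": "Bug Fixes"}


def build_section(fragment_dir, version):
    """Combine one-change-per-file fragments into a single changelog section."""
    grouped = defaultdict(list)
    for name in sorted(os.listdir(fragment_dir)):
        issue, _, kind = name.partition(".")  # e.g. "101.feature"
        with open(os.path.join(fragment_dir, name), encoding="utf-8") as f:
            grouped[kind].append(f"- {f.read().strip()} (#{issue})")
    lines = [f"## {version}", ""]
    for kind, title in SECTION_TITLES.items():
        if grouped[kind]:
            lines += [f"### {title}", ""] + grouped[kind] + [""]
    return "\n".join(lines)


# Each change lives in its own small file, so branches rarely conflict.
fragments = tempfile.mkdtemp()
samples = {"101.feature": "Add a midnight theme.",
           "102.bugfix": "Fix crash on empty input."}
for filename, text in samples.items():
    with open(os.path.join(fragments, filename), "w", encoding="utf-8") as f:
        f.write(text + "\n")

section = build_section(fragments, "Unreleased")
print(section)
```

Because every change is a new file, two branches adding entries never edit the same lines, which is the merge-friendliness the discussion points to.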
Starting point is 00:13:46 Yeah. Oh, sweet. Okay. Well, it can add to your existing one. And one of the things I liked, if you're not doing something like towncrier, one of the recommendations from Keep a Changelog was to keep an Unreleased section at the top for things that you haven't put a label on or done an official supported release for yet, because those are things that, I don't know, you may end up kicking out. Yeah. They also have some things that you shouldn't do, like don't just take your git log and make that your proper changelog, things like that. Yeah. And one of the things there, when I was doing some research for this, I did see some various automated ways to do it, but with that sort of thing you're going to pull things out of
Starting point is 00:14:28 file changes, and that's not really what you want. You really want a human-moderated list of things that went in. And that's one of the reasons why I like towncrier, because it was sort of halfway in between. Yep. Yeah, it's definitely a nice way to sort of manage that human part. Because you don't want "merge conflict", "took PR", "accepted this", "I changed the spelling". Like, you know, you don't need all that noise. You just want the four things that changed. Do I want to upgrade to this or not?
Starting point is 00:14:59 Whatever. Let's just move on, right? Yeah. And then I guess I would lump this in with what we talked about last time, different decisions based on scaling. For projects that I'm the main maintainer of, I would definitely just keep a file. But if we start getting a lot of contributors, then something like towncrier totally makes sense. So, yeah, I think it's really nice. I'm gonna definitely
Starting point is 00:15:21 look into it all right last thing i want to talk about is asynchronous programming, which is something that I talk about often because I'm a big fan. This is an article called Understanding Asynchronous Programming in Python by Doug Farrell from Dan Bader's site. And we've had some of Doug's stuff on before. He does good writing. He works at Shutterfly doing Python there. So he takes some of his experience and puts it in this article,
Starting point is 00:15:49 and it's pretty cool. What I would call this, or sort of describe it as, is a very friendly introduction to asynchronous programming. So it starts out and says, let's imagine like a web server. Could it be synchronous? Sure, it'd be fine if we had a synchronous web server, and we could optimize the heck out of it. But no matter how much we optimize it, at some point you're waiting on a thing and you want to go do other stuff. For example, just like shipping the HTML back to the browser on a slow network, right? Like you want to be processing other requests and do that in the background. And so he's got something to the effect of like eight or nine examples. And to sort of start them off, he says, look, the real world is asynchronous. For example, if you're a parent, kids are a long-running task with high priority, superseding any other task you might be doing, like checkbook balancing or laundry or something like this.
Starting point is 00:16:42 So he has a lot of like analogies back to real life that are pretty cool. Then he says, okay, we're going to go through some examples, like eight examples, and build them up. Start with like a synchronous sort of job-doing program that has a queue: you put some work in the queue, it does the work. And then he says, all right, let's see how we can use generator functions with the yield keyword to get like cooperative multithreading, or cooperative concurrency, I guess, between those two functions, which is actually a really cool way to do it where there's no concurrent I/O, there's no threads, there's no multiprocessing. It's just like, let's interweave the work of these two functions or multiple functions using generators, which I thought was really a cool way to look at it. And he says, okay, well, what if some of that work is slow? That's a problem.
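The generator-based cooperative concurrency described above can be shown in a short, standard-library-only sketch. This is not the code from the article: the `worker` and `run_round_robin` names are hypothetical, and the "work" is just appending to a list so the interleaving is easy to see.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does one unit of work, then yields control back."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # cooperative hand-off: let the scheduler run another task


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them are finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished, so do not requeue it
        queue.append(task)      # not finished yet: back of the line


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

There are no threads or processes here; the tasks interleave only because each one voluntarily yields. That also shows the weakness raised at the end of the passage: a task that does something slow between yields holds up every other task.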
Starting point is 00:17:23 And then he kind of takes you on a tour of different APIs and libraries to make this work. So gevent, Twisted, Twisted callbacks. And so you can compare all these different ways of doing things. And I should throw in there some aiohttp-type things as well. But yeah, very, very cool article if you want a super gentle introduction to asynchronous programming. So this doesn't cover asyncio? Yes, exactly. Yeah. It doesn't cover, basically, the 3.5 stuff. Okay. Yeah. So this would work on any version. I really liked this article because we've been talking about asynchronous for a while and I have to admit, I have a hard time getting my head around how to think about it. I've been doing it for so long in C++, but I have a hard time
Starting point is 00:18:06 getting my head around it in Python. And this article is really a good starter. Yeah, I feel like it's definitely a good starter. I was happy to pick one of our picks this week. All right, so that's all the news that we have that we've kind of found, but you have extra credit, don't you? Yeah. Well, yeah. In episode 29, I gave the wrong credit to the wrong person for cluing me into PipCash. I'm sure they appreciated it though. Yeah. But it really was KidPixo and he reminded me that it was him. And so sorry about that. And thanks a lot for keeping us informed. Yeah, definitely. We really appreciate these ideas and these notes and these little
Starting point is 00:18:44 topics people send us. They're very nice. And then I just had, I couldn't resist, this is going to be hard to do over a podcast, but we have a link to a funny comic about Python private methods. And if you haven't seen this, check it out. It's just, it's basically a key under the mat in front of a door. I love it. I love it. That's really awesome. Yeah, that's kind of the thing.
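For readers who have not run into it, the joke rests on Python's name mangling: a double-underscore method is only renamed, not locked away. A minimal sketch, with a hypothetical `Door` class invented for this illustration:

```python
class Door:
    def __unlock(self):  # stored on the class as _Door__unlock
        return "door open"


door = Door()

# The "private" name is not reachable under its original spelling...
try:
    door.__unlock()
    found_directly = True
except AttributeError:
    found_directly = False

# ...but the mangled name is right there, like a key under the mat.
found_by_looking = door._Door__unlock()
print(found_directly, found_by_looking)
```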
Starting point is 00:19:13 It's like, it's private unless you want to look for it, then it's right there. Yeah. Nice. All right. So update us on the book. The book is coming along and taking almost all of my time. The multitasking is a hard thing. But yeah, the third beta is coming out, should be out this week with the last chapter, chapter seven. And this one is using pytest with other tools like pdb and coverage and mock and tox and Jenkins and things that I get
Starting point is 00:19:40 a lot of questions about. So I'm really happy to get this chapter out. Yeah, that's awesome. How about you? Yeah, last time we talked, I was recording and recording and recording Talk Python episodes. So now I'm kind of finishing up recording courses. So I've actually got two eight- and nine-hour courses that I've finished recording over the last couple of weeks. So I've finished recording the RESTful and HTTP Services in Pyramid course. And I've also finished writing and recording the MongoDB for Python Developers course. So I'm working on editing the final videos for those and getting those up.
Starting point is 00:20:11 So I'm really excited to get that out. Really fun. I'm really excited to take a look at that MongoDB course. That sounds very interesting. It's a cool hands-on one. We build like this database that represents a dealership and it's got like millions of records in it. We get it to where we'll like do queries in like one millisecond,
Starting point is 00:20:27 even with millions of records. It's fun. Nice. Yeah. Cool. All right. Well, that's,
Starting point is 00:20:32 that's our news for the week, Brian. Thank you so much for, as always sharing with everyone. All right. Thank you. Yep. See you all later.
Starting point is 00:20:40 Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
