Python Bytes - #54 PyAnnotate your way to the future

Episode Date: November 29, 2017

Topics covered in this episode:
The PSF awarded $170,000 grant from Mozilla Open Source Program to improve sustainability of PyPI
Dropbox releases PyAnnotate
pytest-annotate is now open-source!
Run... Python script as systemd service
pytest 3.3.0 released
Why d = {} is faster than d = dict()
Extras
Joke

See the full show notes for this episode on the website at pythonbytes.fm/54

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 54, recorded November 28th, 2017. I'm Michael Kennedy. And I'm Brian Okken. And Brian, I feel like we've got some pretty good stuff lined up for this week. What do you think? Yeah, we do. Totally. Before we get to that, though, let's just say thank you to DigitalOcean. They want you to know about Spaces at do.co/python. Spaces is awesome. It's
Starting point is 00:00:25 like AWS S3, but 10 times better, maybe even more so. I'll tell you more about that later. But Brian, you have some fantastic news for the stability of Python open source infrastructure, right? Yes. This just came out yesterday: an announcement that the Python Software Foundation has been awarded a $170,000 grant. The money came from the Mozilla Open Source Program, and it's to improve the sustainability of PyPI, which is our packaging index that everybody uses. Yeah, and we've talked about the challenges that PyPI has had previously. I've actually done an entire panel episode on Talk Python. It's a ways back. It's in the 60s, 70s range in the episode number.
Starting point is 00:01:10 But this has been a really big problem, and it's really been on the shoulders of Donald Stufft to just keep pip and PyPI running, right? There are other people involved with trying to keep it up and running, but really that's all that they have time for right now. There was effort on the new Warehouse code base, but Donald has switched jobs recently and cannot spend as much time as he was before working on it. So there's a big gap there, and we need some work. So there's a lot of people that have asked about this Warehouse thing:
Starting point is 00:01:46 I thought it was going to become the new PyPI. What's up? Still not the default. I know. You know, the site basically works. It uses the same database, so it doesn't get out of sync. And, you know, if you go to pypi.org or pypi.io, you end up there. And it's a much better experience than the funky double PyPI URL
Starting point is 00:02:05 that's at python.org. There are some administrative capabilities that, for instance, if you're pushing up a new package, you will notice you still have to go use the old API to create an account. And there are some backwards-compatible administrative capabilities
Starting point is 00:02:23 that are needed in order to get this going further. And also, it's used by so many people that we kind of have to migrate a little bit slowly and carefully. And hopefully this grant will be enough to at least get us started and get that done. So I'm excited about it. Yeah, that'd be super awesome. Maybe they can take a page out of how the Instagram folks migrated from Python 2 and the older version of Django to Python 3 and the newer version of Django, where at first it just rolled out to the internal people and then a small group and so on. It was either them or Facebook, same company, I can't remember exactly the product, but I think it was Instagram. I think they've got a pretty good plan put together. In the article that we link to,
Starting point is 00:03:08 they do talk about one of the first steps being redirecting some of the production traffic to Warehouse and then gradually migrating that over. And then again, the main thing is to try to get all the administrative capabilities up to snuff. Yeah, nice. I don't know what the timeline is like, but I'm looking forward to seeing some of those changes. You know, I'm looking forward to that red pre-production website banner thing being gone. Yeah, yeah, definitely. Because the site, at least from a consumer perspective, is really, really great.
Starting point is 00:03:37 I think they could actually take that down now and just say, admin people, if you want to maintain your package, go over here. It's still kind of a messy thing to have to try to teach people how to put up new packages. It's still a convoluted instruction set. Yep, for sure. So how often do you use type annotations? Python's a dynamic language. You might say, here's a function called register, and it has a thing called user. Maybe that's the user's email, maybe that's a user object, maybe it's something else. You could annotate that. But do you do that? I try to do it for at least the API for a package. That's what I've been using it for. Yeah, that's a really great point. I do that as well. I don't go over the top and annotate everything in my code. But I find as you cross major architectural boundaries, which hopefully you've put into your application,
Starting point is 00:04:25 you know, you've got a data access layer and you've got some other layer that's using it. If you annotate just that data access layer, that really flows a lot of good checking through. So one of the tools that has been around for a while, and it's actually, as I understand it, one of the main projects that Guido van Rossum has been working on, is mypy, which is an experimental optional type checker for Python. Yeah, yeah, it's cool, right? So basically it's like flake8 or something: you run it against your code, and if you've used these type annotations, which are just editor notes, basically. They have no runtime behavior for most frameworks. I've seen some people try
Starting point is 00:05:05 to make use of it, and it's been pretty cool what I've seen, but generally it's just a note for the editors to give you some hints. mypy will check that through as it follows the flow of your code, right? Yeah. So that's pretty good. It even works on Python 2, which doesn't support type annotations, but there's a comment style of doing it. So the big announcement is that Dropbox has just released something called PyAnnotate. So PyAnnotate builds on mypy. And instead of just going, okay, great, so you wrote this code, and then you went and added type annotations, I can tell you if it's correct, PyAnnotate will say, you wrote a bunch of code, or you inherited a bunch of code, I will annotate it for you. That is awesome.
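As a minimal sketch of the two annotation styles mentioned here: the `register` function and `User` class below are hypothetical names chosen for illustration, not code from the episode. The first function uses the Python 3 syntax; the second uses the comment form, which is also valid in Python 2 source and which type checkers such as mypy can read.

```python
from typing import Optional


class User:
    """Hypothetical user object, for illustration only."""

    def __init__(self, email: str) -> None:
        self.email = email


# Python 3 style: annotations live in the signature itself.
def register(user: User, referrer: Optional[str] = None) -> bool:
    return bool(user.email)


# Comment style: the types sit in a special comment, so the syntax
# is also acceptable to a Python 2 interpreter.
def register_legacy(user, referrer=None):
    # type: (User, Optional[str]) -> bool
    return bool(user.email)


# Annotations are stored on the function but not enforced at runtime.
print(register.__annotations__)
print(register_legacy.__annotations__)  # the comment form adds nothing at runtime
```

Neither form changes how the function runs; passing a plain string as `user` would only be flagged when a checker is run over the code.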
Starting point is 00:05:50 It's pretty cool. Yeah. So basically, if you've got some amount of code you want to annotate, what you do is you can go and like import some profiler hooks. And you can do it just on a function by function or you know, call graph by call graph section and say, start collecting annotation information here, stop there. And it generates a JSON file with all the info. And then if you want,
Starting point is 00:06:13 you can run a separate command line utility, pass it that JSON file plus your source files and it will then go put the type annotations in it. So I think this is huge and I really like it. I think it has a potential of being huge. There's a few things I'm on the fence about. Like what? Like it only does the Python 2 style comment annotations so far.
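To make the collection step more concrete, here is a toy sketch of the general idea using only the standard library's `sys.setprofile` hook. It is not PyAnnotate's code or API: the names `observed`, `collect`, and `gcd` are invented for this illustration, and the JSON layout is arbitrary.

```python
import json
import sys
from collections import defaultdict

# Maps function name -> set of observed argument-type signatures.
observed = defaultdict(set)


def _profiler(frame, event, arg):
    # While installed, this hook is invoked by the interpreter for
    # profiling events; only Python-level calls are of interest here.
    if event != "call":
        return
    code = frame.f_code
    arg_names = code.co_varnames[:code.co_argcount]
    signature = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
    observed[code.co_name].add(signature)


def collect(func, *args, **kwargs):
    """Run func with the hook installed, then always remove the hook."""
    sys.setprofile(_profiler)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)


def gcd(a, b):
    while b:
        a, b = b, a % b
    return a


collect(gcd, 12, 18)

# Dump what was seen in a JSON-friendly shape.
report = {name: sorted(list(sig) for sig in sigs) for name, sigs in observed.items()}
print(json.dumps(report))
```

A separate tool could then read a file like this and write the matching annotations back into the source, which is the second step described above.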
Starting point is 00:06:37 Yeah, that's not so amazing. Well, hold on. Let me look. Let me pull this up. So one of the things I think this is actually coming from is the fact that Dropbox is trying to move away from Python 2. I'm pretty sure that's why this whole thing exists. You're right. It does do the Python 2 style, which is kind of annoying, but I guess, you know, it wouldn't be that much work to like migrate it up. Maybe some enterprising person
Starting point is 00:07:02 will add that feature, the Python 3 style, which I think is much, much nicer. A version of PyAnnotate. Yeah, a PyAnnotate 3. Yeah. One of the comments is, pull requests accepted. Beautiful. Yeah, that's really cool. So I think the plan is, those guys have one of the largest code bases in Python, period. And it's all in Python 2. Well, I shouldn't say all, I don't know. I think much of it is in Python 2.
Starting point is 00:07:33 And so here's a great way to prepare this for some kind of automated migration or much stronger migration story. Yeah, it's definitely a step in the right direction. I think it's really cool. Yeah, very cool. Maybe somebody will take this and do something fun with it. One of the other parts of it is the little boilerplate that you've got to do to try to import your code and run it to generate that stuff. There's somebody already, Kensho Engineering, that has released a project called pytest-annotate that makes this a little bit cleaner. So with pytest-annotate, you can just run your tests against your code
Starting point is 00:08:06 without doing any hooks into your code for PyAnnotate. And it does all of the start and stop or the... The resume and stop, whatever it is, yeah. Yeah, the resume. And it generates that stuff for you with that. Again, these are all in the early phases and there's a few caveats with it,
Starting point is 00:08:28 but I played with it a little bit and it's pretty easy. There's just a couple lines of code to get annotations out of your code. It's pretty cool. Yeah, I think that's really great. And so basically you can run individual tests or all the sets of tests and everything under test will then have type annotation information available for it. Then one more command line thing and you'll put it back in code. Yeah, I tried it out. One of the things I do like about PyAnnotate is that by default, it doesn't modify your code, but it tells you what you should change.
Starting point is 00:09:02 And then if you want to have it actually write the code, you add a -w flag and it'll write it. So that's a good behavior. I like it. Yeah. It gives you the option to see what's going to happen before you actually commit. I mean, we have source control. I hope people are using source control. Yeah. But still. Awesome. So before we get to the next item, I want to tell you guys about DigitalOcean Spaces. So DigitalOcean Spaces is online object storage, file storage for your applications and all the other things you might use something like Amazon S3 for. But it's much, much more affordable: instead of being, say, $93 for the first terabyte of traffic, it's $5, and you get free inbound traffic, all sorts of really good stuff. And after that, it's still 10 times,
Starting point is 00:09:48 nine times cheaper than AWS. So really great, same APIs. You can just switch over there super easy, more or less just point your client at a different URL and you're still doing the same type of thing. So check it out at do.co/python. Speaking of server code that wants to store stuff in places and link other people to it,
Starting point is 00:10:09 have you ever created a systemd service for Linux? I have not. I haven't either. It always seemed like kind of a complicated thing that you'd have to set up. So systemd is the more modern sort of daemon service for at least Ubuntu. I think on other ones as well, but I only play
Starting point is 00:10:25 with Ubuntu. So that's a really early one that I encountered it on. And there's this guy who created a gist showing how to use a Python script as a system daemon in the systemd service. And then you can control it with service control and all those sorts of things, just like you would, say, Nginx or uWSGI or some other major built-in server component. It is super, super easy. You basically create a Python file and you create this little .service file. Those are both in the gist. Copy them into the right location, run a few command line arguments to enable them and start them, and off it goes. You can just have a little while True, go do your work in your Python script and it'll just run indefinitely
Starting point is 00:11:08 and even auto start when Linux boots. Oh, that's cool. And it's super easy, right? Are you looking at the code? I mean, it's like... Yeah, I mean, it's just a handful of lines of code. That's it. Yeah, and it's just, it's basically a configuration.
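The Python half of that setup can be sketched roughly as follows. This is not the script from the gist; `do_work`, `serve`, and the `max_ticks` parameter are hypothetical names, and `max_ticks` exists only so the loop can be demonstrated without running forever. In a real deployment, the `.service` file would point its `ExecStart` line at a script like this, and systemd would start, stop, and restart the process.

```python
import itertools
import time


def do_work(tick):
    """Placeholder for the real job; hypothetical, for illustration only."""
    return "tick {}".format(tick)


def serve(interval_seconds=5.0, max_ticks=None):
    """Loop like a daemon would. max_ticks=None means run until the process is stopped."""
    ticks = itertools.count() if max_ticks is None else range(max_ticks)
    completed = 0
    for tick in ticks:
        # Flush so output is not held in a buffer when stdout is not a terminal.
        print(do_work(tick), flush=True)
        completed += 1
        time.sleep(interval_seconds)
    return completed


# Bounded demo run; a real service script would call serve() with no limit.
completed = serve(interval_seconds=0, max_ticks=3)
```

Keeping the loop body in its own function makes the work testable on its own, separate from the scheduling.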
Starting point is 00:11:18 It's probably like eight lines of configuration, half of which is like headers. So it's really, really super easy. So if you need to have stuff running in the background and just run with your system on Linux, check this out if you want to write that in Python. Nice. Cool. Yeah, for sure. So you were talking about PyTest before. PyTest is shiny and new again, right? Yes, there's a new version came out, PyTest 3.3. And there's quite a few changes, one of which is they're not supporting
Starting point is 00:11:47 a couple versions of Python anymore. I think 2.6 and 3.3 are out now, so you have to do either 2.7 and above, or 3.4 and above. Yeah, that's right. Python 3.3 just went out of support in its own right, so those are probably tied. I'm not sure about 2.6. There's a bunch of new features which are exciting,
Starting point is 00:12:12 but the most exciting thing is just a visual thing for me, is that PyTest now displays a progress percentage while running tests. So you get along the right-hand side of your terminal window, you'll get percentage of tests done. And I imagine it's based on just the number of, it does collections first and it's probably just the number of tests.
Starting point is 00:12:34 Yeah, it probably doesn't go, okay, this one last time took 10 seconds and this one took one. So you have, you know, you've whatever, right? I don't know that for sure, but I'm guessing that. Yeah, yeah. It'd be awesome if it had kind of both, but I can totally see why that wouldn't make any sense.
Starting point is 00:12:49 Yeah, and then one of the other things that pytest has always been great about is capturing standard out and standard error and displaying those. It does that for test failures by default; you can display them all the time if you feel like it. And also you can write tests around the captured output and test against that.
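Writing a test around captured output can look roughly like this, using pytest's `capsys` fixture. The `greet` function is a made-up example, and the test function is meant to be collected and run by pytest, which supplies `capsys` automatically.

```python
def greet(name):
    print("Hello, {}!".format(name))


def test_greet_prints_name(capsys):
    # capsys is injected by pytest; readouterr() returns what was
    # written to stdout and stderr since the last read.
    greet("Brian")
    out, err = capsys.readouterr()
    assert out == "Hello, Brian!\n"
    assert err == ""
```

Saved in a file such as `test_greet.py`, this would be picked up by running `pytest` in the same directory.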
Starting point is 00:13:09 And they've added built-in support for capturing the output from the standard logging module, which is quite helpful for people using the logging module. Oh, yeah. How nice. That's pretty cool. Now I've got to go out and test my entire book to make sure that it still runs against
Starting point is 00:13:27 pytest 3.3. Ah, the joys of being an author. You're never done. Yeah, I'm pretty sure everything looks pretty compatible, so it shouldn't be an issue. That's cool. Think of it this way. Someday it'll break bad enough you have to write a version two, a second edition. Yeah, that's the
Starting point is 00:13:44 plan. Yeah, for sure. Cool. All right. So I want to wrap this episode up with something pretty straightforward, but also it kind of gives you a really unique technique. So it turns out that if you're going to create a dictionary,
Starting point is 00:13:58 as we all know, there's multiple ways to do this in Python. Same for lists, same for strings, same for tuples and so on. I could say d equals open curly, close curly; that's the sort of language way. Or there's the more type-driven way where I say d equals dict, open close parentheses, right? So you either use the curly braces or use dict; similarly list or square brackets, or set, and I guess you can do it for tuples and things like that. So there's the type way and then there's the built-in way.
Starting point is 00:14:27 It turns out that the built-in way is faster. Okay, that's kind of an interesting piece of trivia. But what's really interesting is this guy wrote an article called Why d = {} is faster than d = dict(). And he goes through the analysis and he uses the dis module, and he actually disassembles the line that uses curly braces and the line that uses dict and analyzes why the one is like 20% slower, whatever the numbers turn out to be. It's fun. It looks like just one extra bytecode or something like that.
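The disassembly comparison described here can be reproduced with a short script. This is a sketch rather than the article's code; `make_literal`, `make_call`, and `opnames` are names invented for the example, and the exact instruction names printed depend on the Python version.

```python
import dis


def make_literal():
    return {}


def make_call():
    return dict()


def opnames(func):
    """Return the bytecode instruction names for func, in order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


literal_ops = opnames(make_literal)
call_ops = opnames(make_call)

print("{}     ->", literal_ops)
print("dict() ->", call_ops)
```

The literal version builds the dictionary with a dedicated `BUILD_MAP` instruction, while the `dict()` version has to look up the name `dict` and then perform a call.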
Starting point is 00:15:05 Yeah, the main thing that makes it slow is when you use the type way, you're effectively calling a function. And when you're calling a function, it needs to load the global variables and check to see if that function is overridden in the local scope rather than in the major scope. So it can't be convinced that str or dict or whatever is what the built-in one means. So it has to kind of load up the state and check it out and then carry on. And it turns out that that makes that slower. And so this is all interesting, right? But it's kind of just like a little trivia trick. But the reason I brought up this article is if you look farther down at the end, he analyzes something that has nothing to do
Starting point is 00:15:48 with this whole dict versus curly thing. He says, let's suppose we're going to do some mathematical calculations with like math.floor and logarithms and so on. There's a way to structure it where you're using the functions directly out of, say, the globals that you've imported. So you say import math and then math.floor or math.log10 and so on. And then there's another way, to pass those into the function. The passing it in
Starting point is 00:16:17 means you get to skip that load global for really hot loops or really short functions that are called super frequently. And that's like 22% faster by just passing them in from the outside than calling them directly. So if you're really trying to optimize something, this is a super simple, non-obvious trick to get a significant speed up. That's actually to get around loading globals? Isn't that weird? Yeah. I didn't know you could get around that. So that's cool. I didn't either. Apparently you can. And I just think it's an interesting way of going like, all right, here's this incongruity. Like, why would these have any different speed?
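One common way to pass the functions in is to bind them as default arguments, sketched below. The article's exact code may differ; `digits_global`, `digits_local`, and `uses_global_lookup` are names made up for this example, and any speed difference should be measured on the workload in question (for instance with `timeit`) rather than assumed.

```python
import dis
import math


def digits_global(n):
    # math is looked up in globals, then floor/log10 as attributes, on every call.
    return math.floor(math.log10(n)) + 1


def digits_local(n, floor=math.floor, log10=math.log10):
    # floor and log10 are bound once as defaults and read as local variables.
    return floor(log10(n)) + 1


def uses_global_lookup(func):
    """Report whether func's bytecode contains a LOAD_GLOBAL instruction."""
    return any(i.opname == "LOAD_GLOBAL" for i in dis.get_instructions(func))


print(digits_global(12345), digits_local(12345))
print(uses_global_lookup(digits_global), uses_global_lookup(digits_local))
```

The trade-off is readability: the extra parameters leak into the function's signature, so this is usually reserved for code that profiling has shown to be hot.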
Starting point is 00:16:52 They're effectively doing the same thing in the end. And then using the disassembly to analyze it and then seeing, okay, well, here's the problem. How do we get around that? Let's make this other unrelated thing faster. I think that's just fascinating. Yeah, that's pretty cool. For sure.
Starting point is 00:17:04 All right. Well, that's pretty much it for our news this week, everyone. Hopefully you enjoyed it. I thought all of them were very, very cool. I do have one follow-up item for you, Brian. Okay, great. Remember I told you guys a couple of weeks ago about All Work, All Play, that weird esports championship thing that apparently has taken over? Yeah. So there's this article that came out in Ars Technica that caught my attention that's really closely related to that. And I love Ars Technica. It says F1 esports is now more exciting than the real F1. As in Formula One, like the many, many million dollar racing teams. And it says, look, watching the esports version is actually more interesting.
Starting point is 00:17:42 And they go through and they talk about why that is. It was just like the first world championship of F1. And they have the 20-minute race video with real announcers and this super excited Italian guy as one of the announcers. And if you look through the comments, I think they might be right. I think esports F1 might actually be more interesting than real F1 racing. And I love racing things, like real racing, not game racing. That's kind of cool though.
Starting point is 00:18:06 I'll have to go check this out. Yeah. Yeah. So if this sounds interesting to you guys, check it out. Watch that video for like five minutes and wait for the Italian announcer. He's awesome. All right. Great.
Starting point is 00:18:16 Well, hopefully you guys can enjoy that and find some cool stuff in the news. Brian, thank you for sharing this with everyone. Yeah. Thank you. You bet. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it
Starting point is 00:18:40 our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
