Python Bytes - #89 A tenacious episode that won't give up

Episode Date: August 4, 2018

Topics covered in this episode: tenacity; Why is Python so slow?; A multi-core Python HTTP server (much) faster than Go (spoiler: Cython); Extras; Joke. See the full show notes for this episode on th...e website at pythonbytes.fm/89

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 89, recorded August 2nd, 2018. I'm Michael Kennedy. And I'm Brian Okken. Hey, Brian. How you doing? I'm doing great. It's good to talk to you again. Yeah, you as well. And man, we got some cool stuff lined up this week. We always say it, but there's really so much happening in the Python space. It's more exciting every week. Yeah, I think it's because we're here and then people are excited about Python and then they do more and then it builds on itself. It's
Starting point is 00:00:29 this awesome feedback loop. I can totally feel it. What also is awesome is Datadog for sponsoring the show. So if you need infrastructure monitoring or monitoring of your apps, check them out at pythonbytes.fm slash Datadog. Tell you more about them later. Brian, you found some code that's pretty tenacious out there, didn't you? It just won't quit. Tenacious is a fun word. And there's a project called Tenacity that is pretty cool. We'll, of course, have a link in the show notes.
Starting point is 00:00:57 The Tenacity project says Tenacity is a general-purpose retrying library to simplify the task of adding retry behavior to just about anything. And there's a little code snippet, but the gist of it is you import retry, and it can have a lot of options, but the defaults just sort of work. You can just put this around a function, and if your function raises an exception, it just tries it again and eats the exception. So for a lot of stuff this is terrible. It's a terrible idea in a lot of places, but in a few places this might be reasonably good, especially, for instance,
Starting point is 00:01:40 I guess I'm going to come up with a for instance. You know, in places where retrying is a good idea, like maybe saving something to a file system or connecting to a service that sometimes has contention. Yeah, or even a database. Like, what if you have to restart the database server or something to that effect, right? Or you need to really quickly apply a migration, right? You could say just keep retrying the database until you get there. Yeah, and then there's a whole bunch of extra conditions you can put on it. You can say try so many times, or set a wait time, so if it doesn't work after a while, then give up. Then you can customize which exceptions it catches or doesn't catch. It even has retry on coroutines, which is kind of fun.
Starting point is 00:02:31 I haven't tried that, but I've tried the simple case. But I'm going to use it right away. We've got several conditions where we're testing devices where sometimes it takes a device a while. We're doing little Wi-Fi devices, for example. It might be asleep, so it might take a while for the thing to wake up and respond. So a number of retries is good, and we have retry code all over our code base, and so having a decorator do that for us is just pretty slick. I like it. Yeah, it's really cool. And, you know, like I said with the database thing, you wouldn't just put retry and say continue to hammer the database indefinitely if you run into any problems. But, you know, like, try five times with an
Starting point is 00:03:09 exponential backoff, just so you could basically handle five-second downtimes. Things like that would be nice. Yeah, there are definitely going to be parts of, like you said, a distributed system where you've got things that sometimes are not available for a little while. Yeah, especially stuff where you don't control everything, like other services you depend upon. Yeah. Yeah, pretty cool. So I'm going to bring a theme into this show here for the next couple of topics that I'm going to bring in.
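To make the "try five times with an exponential backoff" idea concrete, here is a minimal sketch of a retry decorator written with only the standard library. It is a hand-rolled stand-in for illustration, not Tenacity's implementation; Tenacity offers the same behavior through its own `retry` decorator with options such as `stop_after_attempt` and `wait_exponential`. The `FlakyDevice` class and the `ping_device` function are hypothetical names invented for this example.

```python
import functools
import time


def retry(attempts=5, base_delay=0.1, exceptions=(Exception,)):
    """Retry the wrapped function, doubling the wait after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: let the last error through
                    time.sleep(delay)
                    delay *= 2  # exponential backoff
        return wrapper
    return decorator


class FlakyDevice:
    """Hypothetical device that is asleep for its first few calls."""
    def __init__(self, failures_before_wake):
        self.remaining = failures_before_wake

    def ping(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("device is still asleep")
        return "pong"


device = FlakyDevice(failures_before_wake=2)


@retry(attempts=5, base_delay=0.01, exceptions=(ConnectionError,))
def ping_device():
    return device.ping()


print(ping_device())
```

Limiting both the number of attempts and the exception types is what keeps this from hammering a service indefinitely or hiding unrelated bugs, which is the caveat raised in the discussion.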
Starting point is 00:03:39 I think it's actually going to touch on every one of the things I'm bringing. So let's start with Anthony Shaw. Of course, we've got to cover an article by Anthony Shaw, right? Yeah, he writes a lot of good stuff. Yeah, he does. And one of the ones that I came across here is called Why is Python so slow? So I think it's interesting to even ask the question, like, is Python slow and explain how that is. So Anthony looks at some benchmarks comparing, say, Python to Java to C Sharp to C++ and things like that and talks about, well, in some cases it is slower,
Starting point is 00:04:11 and why might that be, right? So basically answering the question, like, Python completes a comparable operation like two to ten times slower than, say, Java or C Sharp. Why? And why can't we make it faster? So he has three main hypotheses, which are pretty interesting. One is it's the GIL, the Global Interpreter Lock. The other is it's because it's interpreted rather than compiled.
Starting point is 00:04:40 And the final one is it's a dynamic language. So those three ones are pretty interesting. What do you think? Okay, so he's not saying that these are the reasons, but these are theories. These are theories, and then he goes through each one of them and pretty deeply looks at how they work and then compares them.
Starting point is 00:05:00 So, for example, it's interpreted, not compiled. Well, what does that mean in terms of trade-off? Let's compare that to, say, the way C Sharp does things and the way C++ works. What are the tradeoffs there? So we'll go through some of them. One is it's the GIL, right? Now, modern computers, modern processors have multiple cores. And if you actually look at Moore's law,
Starting point is 00:05:32 Moore's law is still alive and well, right? The number of transistors in a chip, but people sort of correlated that indirectly to, well, that means computers are getting faster and faster and faster. But it turns out that a number of years ago, four, five, I don't know how many years ago, not too long ago, the actual clock speed no longer kept doubling along with Moore's law. It sort of went flat, more or less. And what started happening was we got two core machines and then four core machines. And I just got a new MacBook. It has six hyper-threaded cores. It's crazy, right? But on Python, if I want to take advantage of that computationally, it's super hard within one process because of the GIL. So he said, well, if you want to take advantage of modern hardware, maybe the GIL is the problem. So he talks about some of the tradeoffs there, when it matters, when it doesn't matter.
Starting point is 00:06:16 So, for example, if what you're doing is IO bound, it basically doesn't much matter, right? The GIL is released when you're waiting on like network calls and stuff like that. So in some sense, like the GIL is not the problem. If you created a bunch of threads and they all started, you know, reading, writing files, talking over the network, it should just automatically handle that. But if on the other hand,
Starting point is 00:06:37 you tried to create six threads and do computational stuff, you'll still probably get 12% CPU usage on my machine because, you know, it only really gets to run one at a time. So that's one of the theories. And I think this theory applies more in some places and less in others. And I kind of touched on that a little bit. Like if your goal is to do computational mathematical things, the GIL can really, really matter. It makes a big difference, right? Because while you're executing your Python code, it doesn't let go of the GIL. But if, say, you're building a web app,
Starting point is 00:07:10 it probably doesn't matter, right? There are some ways and you can do some things that would be better. But it doesn't really matter. So for example, if I looked at the various servers before we came on today, the training.talkpython.fm site has 16 worker processes all running parallel versions of the website handling requests. TalkPython itself has eight. I think maybe Python Bytes has eight as well.
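The difference between threads and worker processes for CPU-bound work can be sketched as follows. This is an illustrative example, assuming a standard CPython build with the GIL; the function names are made up for the sketch and the timings it prints will vary by machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    return [count_primes(limit) for limit in limits]


def run_threads(limits):
    # Threads share one interpreter, so with the GIL only one of them
    # executes Python bytecode at any moment.
    with ThreadPoolExecutor(max_workers=len(limits)) as pool:
        return list(pool.map(count_primes, limits))


def run_processes(limits):
    # Each worker process has its own interpreter and its own GIL.
    with ProcessPoolExecutor(max_workers=len(limits)) as pool:
        return list(pool.map(count_primes, limits))


def compare(limits):
    """Print wall-clock time for each strategy."""
    for runner in (run_serial, run_threads, run_processes):
        start = time.perf_counter()
        runner(limits)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

To try it, call something like `compare([200_000] * 4)` from under an `if __name__ == "__main__":` guard in a script. The threaded run would be expected to take about as long as the serial one, while the process pool can spread the work across cores, which is the same reason a web site served by several worker processes is less affected.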
Starting point is 00:07:35 So anyway, there's these eight processes. And sure, one of them may lock up something with the GIL, but there's a whole bunch of others that can leverage those other CPU cores and just keep on rocking. So if you're doing web stuff, it matters less in this sense, I think. And I mean, even if you are, there's ways to get around it. Yeah, if you can break up your algorithm and do subprocess-type parallelism. Okay, so that's the GIL. The other one is that it could be because it's an interpreted language. And
Starting point is 00:08:02 I think this one is the most interesting, probably. So it is an interpreted language, but actually, it does compile code to bytecode. It just doesn't JIT compile it, right? And so one of the main considerations around JIT-compiled languages versus not is startup time. So if our Python code is going to start and run for a while, then doing a whole bunch of JIT optimization would be maybe a little slower to start, but then faster to run. But if we want to just do some CLI stuff
Starting point is 00:08:33 that starts really quick, does a tiny thing and goes away, a whole bunch of JIT stuff might be sort of counterproductive. So there's a pretty interesting comparison against C Sharp and Java and CPython here. The other thing I think that's worth throwing in here is because of C extensions and things like that, it's an interpreted language. I think that's, that's a simplistic view. That's like, well, if you just take straight Python code and just run it on Python, you
Starting point is 00:08:58 don't interact with any libraries. But if you work with NumPy or if you work with SQLAlchemy or a whole bunch of stuff that has C extensions to make certain parts fast, well, all of a sudden it's not interpreted, right? So there's these weird blends. All right, last one. It's because it's dynamically typed. So I think this is really interesting. I think actually this is probably why. And I'm going to throw in another one unless I get distracted and forget it. So I think this is really why it's probably the slowest,
Starting point is 00:09:28 is that it's a dynamic language. And it's not that you can't make a dynamic language fast, but because it's so flexible, it's hard to know how to optimize it. Right? Like you might want to inline a function, but somebody could monkey patch that function, and then you wouldn't be inlining the right thing, for example. And then you monkey patch it only sometimes.
Starting point is 00:09:49 What do you do then, right? So things like method inlining, which can really make things faster, is super hard because that could actually change. Where say in C sharp or C++ or whatever, the method won't change. Yeah. Okay. All right. Final one.
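The monkey patching problem described above can be shown in a few lines. This is a small illustrative sketch with made-up function names: because `tax` is looked up by name on every call, an optimizer that had inlined the original body into `total` would produce the wrong answer after the rebinding.

```python
def tax(amount):
    return amount // 10


def total(amount):
    # `tax` is looked up by name in the module globals on every call,
    # so the interpreter cannot assume which function it will find.
    return amount + tax(amount)


before = total(100)


def no_tax(amount):
    return 0


tax = no_tax  # monkey patch: rebind the global name at runtime
after = total(100)

print(before, after)
```

The same call to `total(100)` gives two different results without `total` itself ever changing, which is why safe inlining needs extra runtime checks in a language this flexible.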
Starting point is 00:10:03 This is mine. I'm adding this to his. And it sort of has to do with this dynamic typing thing is everything is a reference type allocated on a heap in Python, right? Some of the stuff that makes C++ and C Sharp really, really fast is things like numbers and other stuff are allocated on the stack.
Starting point is 00:10:21 And when you work with them, you never do pointer dereferencing. You never do reference counting. You never do garbage collection or memory management. You just work with little bits on the stack. And because that little thing on the stack could become a full blown list or something just by changing what it points at. I think probably that also makes a big difference. Okay. There's also like a function, function calls are slower than they need to be. And I think that's one of the things the Python core team is working on is to try to... And luckily there have been some advances there in the latest version of Python.
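The point about every value being a heap-allocated object, rather than a few raw bytes on the stack, can be seen from Python itself. This is a rough sketch, assuming CPython; exact object sizes differ between versions and platforms, so the example only compares them instead of relying on specific numbers.

```python
import sys
from array import array

n = 1_000
boxed = list(range(n))         # a list of references to separate int objects
packed = array("q", range(n))  # one buffer of raw 64-bit signed integers

# Every Python int is a full object carrying a type pointer and a
# reference count in addition to its value.
print("one int object:", sys.getsizeof(12345), "bytes")
print("raw array item:", packed.itemsize, "bytes")

boxed_payload = sum(sys.getsizeof(i) for i in boxed)
packed_payload = packed.itemsize * n
print("int objects:", boxed_payload, "bytes; raw values:", packed_payload, "bytes")

# Rebinding a name to a value of another type is always legal, so the
# interpreter cannot reserve a fixed-size slot for it as a C compiler could.
value = 12345
value = [value]                # the same name now refers to a list
print(type(value).__name__)
```

Typed containers such as `array` (or NumPy arrays) store the raw values directly, which is one reason libraries built on them avoid much of this per-object overhead.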
Starting point is 00:10:54 I think 20% or something they got. Was that 3.7, I think, where they got that much faster? So work is being done. But yeah, it could be more, I guess. But what's really interesting is the trade-offs or sort of comparisons in the article. There's a forest-and-the-trees sort of issue I have here, which is that I don't think Python is slow. Yeah, I'm with you. And people time is way more expensive than computer time for like 98% of the applications in the world, probably.
Starting point is 00:11:27 Well, and somebody made that comment below in the comment section of this article. It said, well, if you're optimizing milliseconds versus nanoseconds, yes, maybe. If you're optimizing weeks versus months from idea to shipped, you know, Python's not slow at all. It's really fast. Yeah. And I mean, that's what I see: the development time, the maintenance time, all of those extra people-time things. Python is way faster. And with a lot of these things, we say the GIL is a problem or multiprocessing is difficult.
Starting point is 00:11:59 Well, multiprocessing isn't easy. Maybe it's easy in some languages. I mean, Go is sort of designed to do that from the start. But getting a complex algorithm in C++ to utilize multicores, that's not a trivial task either. No, it's not. It's definitely not. And I would say on the web framework side,
Starting point is 00:12:18 actually Python is really quite fast. I've compared it against other C Sharp-based, JIT-compiled things like ASP.NET and stuff for some conference talk I did comparing Python to the .NET stuff. And actually Python was not just as fast, but faster than the JIT-compiled C Sharp stuff. Yeah. And also, just for people that do think that maybe Python's speed is the problem: really measure it, and then you can optimize. There are ways in Python to optimize the parts that are slow. Yeah, and my next item will come back to this and show you some ways to make it faster.
Starting point is 00:12:51 Okay. All right. So what's this Mu thing all about? You talked about Mu before, right? It's like a simplified IDE type thing. Is that right? Right. So I'm going to highlight Mu again,
Starting point is 00:13:04 partly because I think it's a neat project that's going on and there's a lot of cool people working on it. But there's an article called Keynoting in Mu, Mu being M-U. And at EuroPython 2018, David Beazley did a talk and a demo called Die Threads. And what amused me is my first thought was, is this a German joke? And he actually addressed it early on. It's not a German joke, but it's a good demo. But he used Mu during his Python talk, and this article talks about that and also asked him why he did that. And it's just a simple thing. It's the same experience for everybody. Like, for instance, I use PyCharm, but my environment in PyCharm, all the different colors I like or the plugins I use, it's going to be different than everybody else's. It's one of those customizable things.
Starting point is 00:13:55 And so having a very simple interface like Mu works as a learning tool but also shows people exactly what you're doing. So one of the features of it is, if you've got a little script that you want to run, you can just push run at the top and then automatically a little window pops up at the bottom and shows you the output of the thing you're running. And that's just really handy. And so it's kind of fun to watch that being used in a talk. I do little demos for people at work, and I think I might try this also just to not have to answer questions like, what plugin are you using or whatever.
Starting point is 00:14:37 Yeah, just sort of keep the distractions to a minimum, huh? Yeah, and it's a real clean interface and looks like you can change the font size and it looks fun. Plus, if you haven't played around with Mu yet, it also automatically has hooks into things like running micro:bit and Raspberry Pi and stuff. MicroPython and embedded IoT things. Yeah. Yeah, very cool. Yeah, I like it. That's a nice one. So before we get on to my next performance thing, let me tell you guys about
Starting point is 00:15:11 Datadog. So Datadog is sponsoring this episode, like they have many. Thank you to them for that, of course. So they have infrastructure monitoring, distributed tracing, and they've added logging. They provide end-to-end visibility for requests across different parts of your infrastructure, as well as health and performance monitoring of your Python apps. So get started with them today for a 14-day free trial. And of course, they'll send you a cool Datadog t-shirt on it. Just go to pythonbytes.fm slash datadog and check it out. So yeah, very cool stuff from Datadog. Brian, I told you about the performance thing and whether Python is slow. I agree with you.
Starting point is 00:15:47 I generally find for what I do, it's like actually blazing. So not a problem, but it sort of depends on choosing the right infrastructure. Yeah. Yeah. So there's this interesting proof of concept done by this, I forgot the name, a European open source consortium. And the basic theme of this article is sort of exploring a response to the question of,
Starting point is 00:16:13 so I've heard Python is slow, is it? And I think it depends. So what they've done is they've created a multi-core (we talked about how that can be hard to take advantage of, but here it is) multi-core Python HTTP server that is much faster than Go. So you've heard a lot of people say, well, we're going to go to Go because its parallelism is so much better than Python's and it's fast, so we can, you know, do things fast. And of course there's that big trade-off with sort of the functionality and speed to market or speed to idea completion and so on.
Starting point is 00:16:48 But this thing is like, hey, and we could even go as fast or faster. So they compared against actually two Go web servers, and it's faster than both of them. So what it is, is these guys have gone and said, we've talked about this idea before. They said, we want to go look at all the various C-based, not Python, C-based sort of co-routine libraries that will let us write low-level HTTP servers. And it turns out that there were not that many good options. And the best option that they found was still slower than Go. So they're like, well, maybe this isn't going to work. But then it turned out they found this thing called LWAN, L-W-A-N, which is a C library.
Starting point is 00:17:25 And they used Cython to create a Cython-based web server that wraps it up and exposes it to Python. Ooh, neat. Cool, right? So basically, it's not really ready to ship or anything. It just validates the concept of creating a high-performance thing with Cython. And I think that's a big part of the article. So it says, here's some interesting things about Cython. And I think that's a big part of the article. So it says, here's some interesting things about Cython. It's both an optimizing static compiler and a hybrid language that gives you the ability to write Python code that can call back and forth with C and C++
Starting point is 00:17:55 really easy. It has static type declarations that make Python code faster because it can do like, doesn't have to treat everything like a reference type. It can, you know, put integers as integers on the stack and whatnot. And the other thing is it releases the GIL by just having like a keyword in Cython. So you can say this part actually don't need the GIL. I'm doing this in C, leave me alone. Isn't it interesting how that actually hits so many of the things that Anthony brought up, right? Yeah. The typing, optimizations, and the GIL stuff is all right there. So it generates super efficient C code that has been compiled into a Python module. So it's really great for wrapping up C libraries
Starting point is 00:18:36 and exposing them to Python. I hope they continue on with work on this. Yeah, it looks pretty awesome. Neat. Yeah, anyway, I think if you're interested in sort of checking out the performance, definitely have a look. It's, it's pretty cool.
Starting point is 00:18:48 So are you going to cover something on testing in this episode? Oh yeah. It's my turn again. It's your turn for a theme. Where am I at? I'm trying to get on here. Oh yes. I am very excited about some news that came out last week.
Starting point is 00:19:01 So I've been playing with, I started PyCharm, uh, using PyCharm a while ago. You've been using it off and on. I think you use lots of stuff now. Yeah. Yeah. I've done it for quite a while. Yeah. The PyTest support has been getting better and better in the last year or so. And, uh, the PyCharm released 2018.2, I think it was last week, and it totally beefs up support for PyTest fixtures. And I just wanted to mention that because anybody that's using both PyCharm and PyTest definitely needs to update to 2018.2 because a few things that used to not work but do now,
Starting point is 00:19:42 if you have a fixture that you're... So a fixture, if anybody's not familiar, a fixture is just a little piece of code, a little function that you declare as a fixture that you put it as the parameter to your test and it gets run before your test gets run at various levels. That's a simplification, but that works. But it shows up as a parameter, but you don't actually have to use it in your function. It just tells PyTest to run that code. Well, PyCharm used to flag that as a variable that isn't referenced.
Starting point is 00:20:13 So it gives you a warning. And you also couldn't do code completion with it if it was an object that had methods on it. And you also couldn't use it to look up where that fixture is defined, because it's often defined in a different file, in a conftest file or something. But now you can. All those things now work. So big thank you to the PyCharm team for getting that out so quickly. And yeah, I just wanted to let people know about that. Yeah. That's really awesome. PyCharm just keeps getting better and better. All right. So speaking of getting better, let's go back to Python performance in an indirect way.
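For anyone who has not seen a fixture, the description above corresponds to something like the following minimal sketch. It assumes pytest is installed; `FakeDevice` and the test name are hypothetical and exist only for this illustration.

```python
import pytest


class FakeDevice:
    """Hypothetical stand-in for a device under test."""
    def __init__(self):
        self.awake = False

    def wake(self):
        self.awake = True

    def status(self):
        return "ready" if self.awake else "asleep"


@pytest.fixture
def device():
    # Setup code: pytest runs this before any test that names `device`.
    dev = FakeDevice()
    dev.wake()
    return dev


def test_device_is_ready(device):
    # The parameter name matches the fixture, so pytest passes in the
    # fixture's return value when it runs this test.
    assert device.status() == "ready"
```

Saved as a file such as `test_device.py`, this runs with the `pytest` command. Fixtures shared across several test files usually live in `conftest.py`, which is why being able to jump from the `device` parameter to its definition is a useful editor feature.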
Starting point is 00:20:47 You got like this bone that you won't let go of, man. No, it's good. So this one actually hits on two themes that I think we've touched a lot on. One is this performance kick that I'm on today. But the other is packaging. Like we've talked about how it's not super easy to package up Python. So for example, Go compiles to a single executable binary with zero dependencies. You take that, you give it to somebody, they run it. That's not how it works for complicated apps that have
Starting point is 00:21:14 package dependencies and stuff in Python. You can't just easily go, here, double click this. You're like, yeah, good luck. Hold on. First pip install requirements. Now what else? Oh, you got the wrong version of Python. Hold on. So the whole packaging thing, you know, there are some attempts to deal with it like cx_Freeze and stuff like that. And it's somewhat working, but here's an interesting thing from Facebook called XAR. So a XAR is an executable archive, and basically what it is is you can package up a bunch of code into a single executable file that you can then mount as a separate file system that's read-only, like a CD or something, I guess. And so you can mount this and then execute it. But because it came as a whole block, basically, it's sort of a native
Starting point is 00:22:05 file system that, you know, you can read files that are next to your Python files, and you just package up your dependencies and run it. Interesting. Yeah, so it's this read-only file system which, when you mount it, looks like a regular directory to user space. The one drawback: when I saw this I was like, oh my gosh, is this it? Is this the thing that is going to make it so that we can just go, here, double click this, in Python? It turned out there's a minor little step here. This requires a one-time installation of a system-level device driver for the file system. So maybe not so much just double click, but if you're willing to install this, you know, maybe as your organization, you install it, or on your servers, for whatever reason, you're willing to install this thing called SquashFS, then XARs become these things you can just pass around and run.
Starting point is 00:22:55 So that's pretty cool. So that's a good caveat. But, you know, think of Facebook. Their goal is to make it super easy to deploy these applications onto servers and run them. Right. So they must pre-configure their servers with SquashFS and then just make this part of their deploy mechanism. So there's basically two primary use cases for these XARs. One is simply collecting a number of files for automatic, atomic mounting
Starting point is 00:23:19 somewhere in the file system. Cool. And the other is you can use this thing called the XAR exec helper, and it becomes a self-contained package of executable code and its data. So an example might be a Python app that archives all of its source code and its native libraries and configuration and all that kind of stuff. I still think that's pretty cool. Yeah. It's kind of a focused use, but still pretty cool. Yeah, but then it's a rabbit hole.
Starting point is 00:23:49 Now I got to go and read about Squash and what that is. Yes, it's true. So actually on this, it seems like it's generally, it could be a general mechanism for, I don't know, Ruby or JavaScript or Node or something like that. But they particularly call out Python on the GitHub page from the Facebook incubator. It says some of the advantages for using it for Python are it looks like regular files
Starting point is 00:24:14 on disk to Python, so that means you can just run CPython and it doesn't know any better. Same thing, it looks like regular files to you, so you don't need to use weird package import or pkg_resources tricks or anything like that to pass stuff around. It uses Zstandard compression. It doesn't require unpacking of .so files, which apparently, if you try to use one of the PEX mechanisms, was a problem. So more or less, CPython just works, basically. Your idea there of, like, okay, so there's
Starting point is 00:24:42 this dependency but if you're using it within an organization that totally makes sense, it's fine. And I, yeah, that sort of use case or within your own servers or whatever. Yeah. There's a lot of use cases where I think this would be very useful for people. Yeah. I mean this, like those places you described, there's, that's where a lot of Python is used. So it also has some interesting performance benefits, which is coming back to that. Because of the way SquashFS works, you're actually reading off a disk a smaller set of binaries because it's compressed, a smaller set of data. So there's less disk activity, and it's still really fast. So the startup time can actually be faster for your app than even native Python, which is pretty cool.
Starting point is 00:25:27 And so once it's started, it's pretty cool. There's some statistics in there that show it's either as fast or sometimes, like the second time you've interacted with that XAR, because the file system actually decompresses and caches that stuff in memory, it can actually run a little bit quicker. So there's a lot of interesting things around performance there.
Starting point is 00:25:44 And finally, these file system things, these XARs, right? They're read-only, which means the integrity of your app is guaranteed, as opposed to, say, a virtual environment or like a folder where people could mess with it or change the Python system. Like, it's read-only. So what you gave them is what they're running. That's a cool thing, too. Yeah, there's a lot of neat stuff here.
Starting point is 00:26:04 I don't think it fits my use case, but maybe some listeners, it'll work well for them. Definitely. Yeah, a lot of people seemed excited about it on the Twitter. On the Twitters. Exactly. All right, well, that's it for all of our items. You got some extras you want to throw out there? Oh, just somebody mentioned to me, speaking of Twitters, that NumPy 1.15.0 was just released recently,
Starting point is 00:26:25 and they completely overhauled their testing to use PyTest. Yay. Yay. Another win for PyTest. That's awesome. Yeah. And then you've got a couple lists of some videos that are out. Yeah, I have a couple of events.
Starting point is 00:26:38 I have two in the future and two in the past. Maybe, depending on when you listen to this, actually all of them are in the past. So SciPy 2018, the data science Python world conference, the videos for that are now out on YouTube. So if you couldn't make it to SciPy and you want to catch a bunch of the presentations there, here's a link to the videos. Also, PyOhio. We have a lot of these regional Python conferences, and somehow PyOhio has actually gained quite a bit of momentum, which is interesting with PyCon being there last year and next. Anyway, the videos for that are also out, so a bunch of good YouTube watching coming up here over the weekend.
Starting point is 00:27:17 Yeah, I've already started a couple of these. And then in the future, we've got… In the future, PyCon Canada is coming, so the call for papers on that is open for about a month. And in the future, PyBay 2018, so the regional San Francisco Python conference, is happening in a couple of weeks. And I would love to go to that, but I can't. Yeah, I still haven't decided. Are you thinking about going? I was thinking about going to PyBay, but there's a lot of stuff going on in the fall. So we'll see.
Starting point is 00:27:49 Yeah, I have daughters going to college right around then. So it'd probably be better if I helped them do that instead of just go hang out in San Francisco. All right, so the last thing is I just want to let people know that I have another course out, Building Data-Driven Web Apps with Pyramid and SQLAlchemy.
Starting point is 00:28:04 It's super fun. Nine hours of awesomeness. So there's a link there. Check that out as well if people are interested. That looks fun. Yeah, it's definitely a good one. It covers some things that I've wanted to cover for a while, like ORM migrations and managing stuff over time with databases and stuff. Pretty cool.
Starting point is 00:28:19 Cool. All right. Well, Brian, thanks as always. It's been fun. Thank you. Yep. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool.
Starting point is 00:28:43 On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
