Python Bytes - #23 Can you grok the GIL?

Episode Date: April 26, 2017

Topics covered in this episode:
- Grok the GIL - How to write fast and thread-safe Python
- The New NBA by Mark Cuban
- Ian Cordasco gets a Community Service Award from PSF
- Release of IPython 6.0
- Testing & Packaging
- AWS Lambda adds Python 3.6 support
- Extras
- Joke

See the full show notes for this episode on the website at pythonbytes.fm/23

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 23, recorded April 25th, 2017. And I'm one of your hosts, Michael Kennedy. And I'm Brian Okken. And we're here to share a bunch of cool Python stuff with you. We've got six cool items queued up and ready to go. But before we get to that, I want to say thanks to Advance Digital. They have an awesome Python job. You can check it out at python.advance.net.
Starting point is 00:00:29 And we'll talk more about that later. Right now, I want to talk about the GIL, Brian, what do you think? I think it's great to talk about the GIL. And I'm really glad, so this is an article called Grok the GIL: How to write fast and thread-safe Python. And we talk about the GIL as the reason why we can't do parallel computing and programming built right into Python. But, you know, I haven't really jumped into it a lot. And this article is from A. Jesse Jiryu Davis, who, by the way, is an excellent writer. If you want some examples of great writing, read his stuff. It's great. So he has this very lightweight introduction to what the GIL is, and I like the approach of not just giving the details of it, because most of us aren't going
Starting point is 00:01:17 to go in and start hacking on the CPython core, but a little peek into the CPython core to see that it's a mutex inside the interpreter. The GIL is the global interpreter lock. I love how he pulls out little snippets from CPython. He's got a section, behold, the global interpreter lock, and it just shows you the C code. Yeah, it's just one line. Yeah, exactly.
Starting point is 00:01:39 And you don't really need to know a lot of C to appreciate it, but there's enough to make it super concrete. You're like, this is actually the code that runs when you call into a socket, and that's how the GIL gets released, for example. Right. Yeah, talking about sockets, he really makes the point that it's focused around I/O: the lock is released around I/O, whenever you're waiting for I/O. And I think there are other places too, but that's the main place where your code will pause and let some other thread run. I like that he has a thing that says
Starting point is 00:02:14 the effect on threads is so simple you can write it on the back of your hand: one thread runs Python, while N others sleep or await I/O. And he actually has a picture of his hand. I think it's his hand. Yeah. I was wondering if that's actually his hand.
Starting point is 00:02:29 Yeah. And if he wrote it, that means he must be left-handed, because it's written on the back of his right hand, or he had somebody else write it. So I was always curious about this: what are the limitations, and how do you utilize it to have faster code?
Starting point is 00:02:43 And the gist of it is, if you've got some code that's waiting on I/O, like maybe accepting a bunch of connections or downloading a bunch of URLs, that's a great place to use multithreading, because the GIL doesn't really get in your way. In places where you really have multiple things to compute, where you really want your Python code to run at the same time, then you have to jump into multiprocessing.
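To make that I/O-bound case a bit more concrete, here is a minimal sketch, not taken from Jesse's article, of downloading several pages with a thread pool; the URLs are just placeholders. While each thread is blocked on the network, the GIL is released, so the downloads overlap even though only one thread runs Python bytecode at a time.

```python
# Minimal sketch: threads release the GIL while blocked on network I/O,
# so fetching several URLs with a thread pool overlaps the waiting time.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder URLs, purely for illustration.
URLS = [
    "https://pythonbytes.fm",
    "https://talkpython.fm",
    "https://www.python.org",
]

def fetch(url):
    with urlopen(url) as response:
        return url, len(response.read())

with ThreadPoolExecutor(max_workers=3) as pool:
    for url, size in pool.map(fetch, URLS):
        print(url, size, "bytes")
```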
Starting point is 00:03:11 And he actually gives an example of that. And it's not that bad either. So anyway, I liked the quick jump into it. And I think I'm going to be a better Python programmer for reading this. Yeah, this is really nice work. Good job, Jesse. He's a great writer. I actually had him on TalkPython on episode 69, I think, about design patterns for programmer blogs. And we did a whole session on blogging. It was great. And one of the things I like about this is he talks about cooperative multitasking,
Starting point is 00:03:41 concurrency versus parallelism, preemptive multitasking, how sometimes you still need to actually lock your Python code, even though you might think, well, this is all straight Python, it's not going to get interrupted. But there are certain mechanisms, which vary slightly between Python 2 and 3, where if you hang on to the GIL too long, it will potentially be taken from you and given to another thread. And so that might still cause what would appear to be parallel race conditions. So that's also worth reading about.
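Here is a small sketch, again not code from the article, of the kind of race being described: a shared counter bumped from several threads. The dis call at the end shows that the innocent-looking += compiles to several bytecodes, and the interpreter is free to switch threads between any two of them, which is why the explicit lock is needed.

```python
# Sketch of a classic GIL-era race: "+=" on a shared value is a
# read-modify-write spread over several bytecodes, so a thread switch
# can land in the middle of it unless we serialize with a lock.
import dis
import threading

counter = 0
lock = threading.Lock()

def increment_many(n):
    global counter
    for _ in range(n):
        with lock:          # without this, updates can be lost
            counter += 1

threads = [threading.Thread(target=increment_many, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)              # reliably 400000 because of the lock

dis.dis(increment_many)     # shows the separate LOAD/ADD/STORE steps behind "+="
```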
Starting point is 00:04:16 Yeah, and one of the things that surprised me, and I realize I don't really have to worry about this: I deal with multithreading in C++, and with C++ you have to do fine-grained locking of data structures, any data structure shared by multiple threads. But I was surprised how much you can share between threads in Python, because the GIL won't interrupt a bytecode. It'll only interrupt between bytecodes, not in the C code. So things like sorting a list will happen atomically, and you won't be interrupted in the middle of that, which surprised me. I didn't know that.
Starting point is 00:04:54 Whereas, ironically, incrementing a variable could be interrupted. Right, because it ends up being a two-step, read-modify-write operation. Yeah, exactly, exactly. And Jesse uses the dis module to look inside, which is all very good. So that's a great article. I think that's probably the most substantive thing we're covering.
Starting point is 00:05:10 Do you want to hear about something not so substantive, but pretty cool? Yes, I've got one for you. Let's talk about the NBA, as in the National Basketball Association, American basketball. So there was a pretty big deal on Twitter the other day. So Mark Cuban,
Starting point is 00:05:26 he owns the Dallas Mavericks. And he's, I don't know if he comes from tech or not. I don't really think so. But he definitely was an entrepreneur. He's, you know, a billionaire, basically. But as a billionaire owner of an NBA team, he posted a pretty interesting thing on Twitter saying, here's the new NBA. And it was a picture of him learning Python machine learning with, I think, IPython and an IPython notebook open. And he's like, I need to understand the Mavericks and the NBA. I'm on it. Wow.
Starting point is 00:05:58 It's pretty cool, right? It is pretty cool. I don't know much about basketball or Mark Cuban or any of that, but it's neat that somebody that high up is wanting to learn Python and notebooks. That was basically the main takeaway. A bunch of people like our friends over at Partially Derivative invited him to be on the show. They're like, oh, we have to hear your story. He's like, no, no, I'm just getting started. They have a team at the Mavericks.
Starting point is 00:06:20 I just want to understand what they're doing when they use machine learning to help make predictions and planning. And that's kind of cool to think of how machine learning is actually like driving these professional sports teams as well. Yeah. Very interesting. Indeed. Indeed. All right. So next up we have somebody winning an award.
Starting point is 00:06:39 How cool is that? Yeah. Ian Cordasco. It was announced that he will get the 2017 Community Service Award from the Python Software Foundation, and I think that's pretty cool. Apparently, and I didn't know this, he's been doing a lot of this stuff for a while. I mean, I was familiar with Ian; he was on Test & Code episode 13, and we talked about the Betamax library that he has for recording and playing back requests interactions. He's apparently been the election administrator for the PSF since 2015,
Starting point is 00:07:15 volunteering all that time, of course. And he is active in mentoring new coders and supporting other Python developers, with apparently a real focus on mentoring women in Python. And I think that's just pretty awesome. Yeah, that's really awesome. So congratulations, Ian. And this project that you talked about, for replaying requests, that's called Betamax? Yes.
Starting point is 00:07:39 That's an awesome name. Yeah, yeah. I guess the idea, of course, is that there's a VCR-type library in some other languages, but he chose Betamax because, well, everybody knows Betamax was better. That's right. But yeah, you should listen to it. It's a pretty interesting tool. So that was one that the community asked me to do. There were community members, listeners of Test & Code, that said, hey, could you go find Ian and talk to him about Betamax? That's awesome. We'd love to get those recommendations for all the shows,
Starting point is 00:08:10 including some stuff that we're covering here today, right? Yeah, definitely. Definitely. So if you want to work with these kinds of fun things, maybe you work at a company where you're doing Java and you dabble in Python, or you don't really get to do all the cool things you'd like, Advance Digital has a cool job offer for everyone out there who might want to make a change. I wish I was near Jersey City
Starting point is 00:08:32 because this sounds fun. It does sound fun. So, right, they're in Jersey City just across the Hudson from Manhattan there. Small, agile environment. They're mostly a Python shop, but they play with other cool technologies. They fund you guys to go to conferences, professional development, and most importantly and coolest, I think, is they run one of the 10 largest news sites by
Starting point is 00:08:53 traffic in the US, and they do it with Python. So if you want to be part of that team, and you want to play with cool stuff like that, just visit python.advance.net and check it out. So there's a couple of things coming up, Brian, that have to do with Python versus legacy Python. Remember Matthias from the IPython project, Matthias Bussonnier, I'm sorry if I mess up your name, but I think that's pretty close. He was the original guy who got us talking Python versus legacy Python instead of Python 3 versus Python 2. Oh yeah.
Starting point is 00:09:26 Right. Yeah. So he works on IPython and Jupyter and all that stuff. And he's back with a new blog post, which is my next item. And it's a pretty big deal. We just talked about Mark Cuban, the new NBA, machine learning,
Starting point is 00:09:38 IPython. And so they just released IPython 6. Okay. So that's pretty big news. That is big news. Yeah. And so people who use IPython, you know, there's a brand new version. That's awesome.
Starting point is 00:09:51 The bigger thing is that this is the first release where IPython goes Python 3 only. They've dropped support for Python 2. That's great. Or as Matthias would say, they now support Python, and they're not mixing in legacy Python with it. And what I thought was nice is, you know, it's a pretty major project, and they did a little write-up of what their experience was converting a mixed code base to Python 3 only. What were the benefits and what were the drawbacks? So let's see.
Starting point is 00:10:18 A couple of things, a couple of stats that Matthias put out. The size of the IPython code base has decreased by 1,500 lines. That's pretty solid, right? That's significant. Less code means less maintenance. Right. They said it's not just because of dropping Python 2, but a significant amount is.
Starting point is 00:10:35 And even more impressive is they added some entirely new features that required hundreds of new lines of code. So really, the amount of Python 2 support code they were able to get rid of when they went to Python 3 only is actually probably more than that.
Starting point is 00:10:52 So that's pretty cool. And they said one of the benefits they think is that contributors can spend less time worrying about, well, how does this work if we do it in Python 2? Or, and this has happened to me, you make a pull request, you submit it, it runs on the continuous integration,
Starting point is 00:11:07 and it works fine in Python 3, but then it fails in Python 2 because you forgot the B in the string or whatever, right? So they don't have to worry about that. CI runs faster. They said basically, in summary, we're totally happy, we're entirely pleased with having switched to basically have the ability to write Python 3-only code. And they're looking forward to using a lot of the improvements
Starting point is 00:11:29 in Python 3, specifically async and await, which will be cool. So an async and await REPL inside of IPython. How cool is that? That's neat. Is async and await in all of the Python 3 versions, or when did that get introduced? It came in 3.5.
Starting point is 00:11:44 Okay. The asyncio stuff was introduced in Python 3.4, I think. And then in 3.5 they were like, let's put some proper syntax on this and make it really easy.
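For anyone who hasn't seen that syntax yet, here is a tiny sketch of what the 3.5-style async and await keywords look like on top of asyncio. It's just an illustration, not something from the episode, and it sticks to the event-loop API that existed back then (asyncio.run only arrived later, in 3.7).

```python
# Minimal async/await sketch using only the standard library.
import asyncio

async def greet(name, delay):
    await asyncio.sleep(delay)   # yields to the event loop, like waiting on I/O
    return "hello, " + name

async def main():
    # Run two coroutines concurrently on a single thread.
    results = await asyncio.gather(greet("Brian", 1), greet("Michael", 1))
    print(results)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
```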
Starting point is 00:11:52 Yeah. I'm writing a little thing that I want to have available on Python 2 also, at least 2.7.
Starting point is 00:12:03 And even if I were to just do Python 3, all of the Python 3 versions, I still can't use f-strings, which I wish I could use. I know, they're so new. It's 3.6 only. Even on my production servers, it's 3.5. So it is what it is.
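For context, this is the 3.6-only feature being wished for: f-strings (PEP 498) evaluate expressions right inside the string literal. A quick illustration, not from the episode:

```python
# Python 3.6+ only: the expression inside the braces is evaluated in place.
name, version = "f-strings", 3.6
print(f"{name} arrived in Python {version}")
# On 3.5 and earlier you'd spell the same thing with str.format():
print("{} arrived in Python {}".format(name, version))
```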
Starting point is 00:12:16 That's a move in the right direction. And I think it's great that Matthias and others talked about their experience with that change. Yeah, that's awesome. Yeah, thanks Matthias. Excellent. I think I'm, to use an American expression, beating a dead horse, but we have another... Is that dead horse called source, S-R-C? Yes, yes. The other package I was talking about building up, it's for the book, but I wanted to make sure I was representing the
Starting point is 00:12:40 community correctly in how to put together a distributable package and do it correctly, at least with best practices. I know there's not really one correct way, but somebody pointed me in the direction of an article by, I'm going to probably get this wrong, I think it's Hynek, and it's called Testing & Packaging. He's basically the guy that did the attrs project, a great project that we've talked about a couple of times. And he describes how there were issues, at least with one package that wasn't using the src layout, where a bug showed up in the installed application that doesn't show up in the non-installed code. So one of the benefits of using src is you can more easily make sure that you're only testing the installed package and not the non-installed source.
Starting point is 00:13:37 And he also just shows that it's really just a two-line code change. So to do the right thing is not that much work. Right. So basically in your setup.py, in the call to setup, you set the packages to be found in the src directory and you set the package dir to be the src directory, right? Yeah. So where you would normally say find_packages, he recommends specifically saying find_packages
Starting point is 00:13:59 and then giving it where equals 'src'. But you can also just put 'src' as the first positional argument, and that works too. And then you list it in package_dir.
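A minimal sketch of those two lines in context, assuming a project whose package lives under src/mypackage/ (the project name is just a placeholder):

```python
# setup.py for a src/ layout: the importable code lives in src/mypackage/,
# while tests/, docs/, and the packaging files sit beside src/ at the top level.
from setuptools import find_packages, setup

setup(
    name="mypackage",                     # placeholder project name
    version="0.1.0",
    package_dir={"": "src"},              # tell setuptools the code root is src/
    packages=find_packages(where="src"),  # only discover packages under src/
)
```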
Starting point is 00:14:26 And then one of the things I noticed, which I don't think people have really talked about, is that the entire repository looks better. You've got all of the packaging stuff, like your setup.py and your manifest and all that, at the top level, and the stuff you really care about on a day-to-day basis is separated into subdirectories. You've got the docs in one and the tests in another, and then your source in another. And that separation just pleases my sense of organization. It just is nice. Yeah, I'm coming around to this as well. It sounds pretty solid. Anyway, I'll probably try to stop talking about that every episode, but there you go. One more article. Well, I'm not quite done beating the Python versus legacy Python horse yet. So I'm going to keep going on that one, because there's some more big news. We've heard that IPython went Python 3 only. And now, same week, last week, AWS Lambda, not goes Python 3 only, but adds Python 3.6 support. It was just
Starting point is 00:15:07 2.7. So that's a big jump, right? Wow, that's a big jump. Yeah. So that's pretty awesome. And do you have much experience with Lambda? Have you played with it?
Starting point is 00:15:15 No, I've heard a lot about it, but I haven't played with it yet. So Lambda is one of these things from AWS, from Amazon, that fits into this serverless architecture. So basically you say, here's a function. And when something happens, run this, please. So run it on a schedule. Somebody changes a database. Somebody uploads a file to S3, whatever. And it just runs.
Starting point is 00:15:40 There's no servers that you deal with. Obviously there are servers, but it just distributes your code to run when it needs to, across a whole bunch of servers, so it scales basically infinitely. You know, as long as you have infinite money, you can infinitely scale this, it's fine, right? And that's pretty cool. Yeah. So have you tried it? No, I have not had a real reason to do it. I mean, I guess there's a couple of things that I could do. Like on the websites, there's a job that runs every couple of hours that will completely re-index the database and reorganize it for super fast queries. The queries on the various websites, and I'm going to be adding this to Python Bytes too, no worries,
Starting point is 00:16:22 they, you know, run sub-millisecond, right? In order to get that, you've got to pre-compute some things. Maybe that's a perfect Lambda operation. Especially now that they have 3.6 support, I'm intrigued enough that I might give it a shot anyway, just to make up some excuse to play with it. Exactly, we need to run this. But if you're using other AWS stuff, like their database services, Dynamo or RDS, or S3, here's a way to run code really near your resources, on triggers, with no effort. And one of the things I thought was pretty cool is this announcement just came out.
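To give a rough feel for what one of these functions looks like, here is a hypothetical Python 3.6 Lambda handler sketch, not something from the episode; the event fields follow the shape of an S3 upload notification, and the names are just for illustration.

```python
# Hypothetical AWS Lambda handler: Lambda calls lambda_handler(event, context)
# whenever the configured trigger fires, e.g. an object landing in an S3 bucket.
import json

def lambda_handler(event, context):
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]   # S3 notification event shape
        key = record["s3"]["object"]["key"]
        print("New object uploaded:", bucket, key)
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```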
Starting point is 00:16:58 And Zappa, so Zappa, if you look at their page, which I linked, it's called Serverless Python Web Services. That's interesting, right? So basically, you can set it up so that using the AWS architecture, you can route web requests to these Lambda functions. But you don't really have servers or anything like that. And people have been asking for Python 3 support. And they've been saying, no, no, no, no. As soon as this dropped, they're like, yes, it has Python 3 support. So that is pretty cool as well. So you've seen things that basically are layered on top of Lambda also starting to support Python 3, which is great. Yeah, definitely. Cool. All right. Yeah. Maybe we should play with Lambda.
Starting point is 00:17:40 I don't know. Yeah. Very nice. All right. Well, that's it for the news, Brian. You got anything personally you want to share with everyone? No, although I guess I'll be in the Munich area the second week in May. If there's anybody around that wants to have a beer or something with me, hit me up. Yeah, that sounds awesome. I'm jealous. I'd love to go visit Germany. Well, I'll do that at the end of the summer, maybe. We'll see. But no news for me. I just want to say thank you, everyone, for listening. Oh, you know what? Actually, one more thing.
Starting point is 00:18:09 This is not personal news, but it falls right in here. I also saw Check.io at checkio.org. These guys have a pretty cool gamification of learning Python. They also just went Python 3. Oh, cool. So just to keep on this, hey, Python 3 is starting to really roll. I'd say it's really starting to roll this week. I use Check.io for, hopefully I won't get in trouble for this, but I've gone through a bunch of this stuff and I use them for interview
Starting point is 00:18:34 questions. Yeah, I think it's actually pretty good. And what I really like is you can solve a puzzle and then you can look at other people's solutions. And I found after solving a bunch and looking at the solutions that I unknowingly have an implicit bias towards performance over ease of reading or simplicity or whatever. And, you know, it was just, it's interesting that it uncovered that for me. Oh, that's interesting. And I totally have the opposite. I like them to be readable more than anything else. Yeah. Yeah. Funny, huh? All right. Well, thanks, Brian. Thank you everyone for listening and we'll catch you next week. And thank you. Yep.
Starting point is 00:19:07 You're welcome. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
Starting point is 00:19:27 We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
