Python Bytes - #23 Can you grok the GIL?
Episode Date: April 26, 2017
Topics covered in this episode:
Grok the GIL - How to write fast and thread-safe Python
The New NBA by Mark Cuban
Ian Cordasco gets a Community Service Award from PSF
Release of IPython 6.0
Testing & Packaging
AWS Lambda adds Python 3.6 support
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/23
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 23, recorded April 25th, 2017.
And I'm one of your hosts, Michael Kennedy.
And I'm Brian Okken.
And we're here to share a bunch of cool Python stuff with you.
We've got six cool items queued up and ready to go.
But before we get to that, I want to say thanks to Advance Digital.
They have an awesome Python job. You can check it out at python.advance.net.
And we'll talk more about that later. Right now, I want to talk about the GIL, Brian.
What do you think? I think it's great to talk about the GIL. And I'm really glad that...
So this is an article called Grok the GIL: How to Write Fast and Thread-Safe Python. And we talk about the GIL as the reason why we can't do parallel computing and programming built right into Python.
But, you know, I haven't really jumped into it a lot.
And this article is from A. Jesse Jiryu Davis, who, by the way, is an excellent writer.
If you want to have some examples of great writing, read his stuff.
It's great. So he has a very lightweight introduction to what the GIL is,
and I like the approach of not going into all the details, because most of us aren't going
to go in and start hacking on the CPython core, but he gives a little peek into the CPython core to show that it's really just a mutex,
a lock on the interpreter.
The GIL is the global interpreter lock.
I love how he pulls out little snippets from C Python.
He's got a section, behold, the global interpreter lock.
And it just shows you like the C code.
Yeah, it's just one line.
Yeah, exactly.
And you don't really need to know a lot of C to appreciate it, but there's
enough to make it super concrete. You're like, this is actually the code that runs when you call into a socket,
and that's how the GIL gets released, for example, right? Yeah, talking about sockets, and
he really talks about how the lock is focused around
I/O: whenever you're waiting for I/O. And I think there are other places too,
but that's the main place where your code will pause
and let some other thread run.
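To make that concrete, here is a minimal sketch (not code from the article) of I/O-bound work running in threads. A call to `time.sleep` stands in for a blocking download, which is an assumption for illustration; like a blocking socket call, `time.sleep` releases the GIL while it waits, so the five waits overlap instead of running back to back. The URLs are placeholders and are never fetched.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url: str) -> str:
    """Stand-in for a blocking network call (assumption: no real I/O happens)."""
    time.sleep(0.2)  # sleep releases the GIL while waiting, like a blocking socket read
    return f"contents of {url}"


def fetch_all(urls: list[str]) -> tuple[list[str], float]:
    """Run the fake downloads in threads and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        # pool.map keeps results in the same order as the input URLs.
        results = list(pool.map(fake_download, urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(5)]  # placeholder URLs
    results, elapsed = fetch_all(urls)
    # Expect roughly 0.2 s in total rather than the ~1.0 s a serial loop would take.
    print(len(results), f"{elapsed:.2f}s")
```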
I like that he has a line saying
the effect on threads is so simple
you can write it on the back of your hand:
"One thread runs Python, while N others sleep or await I/O."
And he actually has a picture of his hand.
I think it's his hand.
Yeah.
I was wondering if that's actually his hand.
Yeah.
And if he wrote it,
that means he must be left-handed because it's written on the back of his
right hand or he had somebody else write it.
So I was always curious about this.
What are the limitations,
and how do you utilize it to have faster code?
And the gist of it is,
if you've got some code that's waiting on I/O,
like maybe taking a bunch of connections or downloading a bunch of URLs,
that's a great place to use multithreading
because the GIL doesn't really get in your way.
In places where you really want your Python code itself to run at the same time,
on multiple CPU cores, then you have to jump into multiprocessing.
And he actually gives an example of that.
And it's not that bad either.
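A minimal sketch of that approach, not Jesse's exact example: a made-up, deliberately CPU-bound `count_primes` function is handed to worker processes with the standard library's `concurrent.futures.ProcessPoolExecutor`, so each process has its own interpreter and its own GIL. The limits passed in are arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_parallel(limits):
    # Each task runs in a separate process with its own interpreter and GIL,
    # so the pure-Python loops can keep several CPU cores busy at once.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":  # required on platforms that spawn worker processes
    print(count_primes_in_parallel([50_000, 60_000, 70_000, 80_000]))
```

The `__main__` guard matters here: on platforms where worker processes re-import the main module, starting the pool at module level would try to start pools recursively.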
So anyway, I liked the quick jump into it.
And I think I'm going to be a better Python programmer for reading this.
Yeah, this is really nice work.
Good job, Jesse. He's a great writer. I actually had him on TalkPython on episode 69, I think,
about design patterns for programmer blogs. And we did a whole session on blogging. It was great.
And one of the things I like about this is he talks about cooperative multitasking,
concurrency versus parallelism, preemptive multitasking, how sometimes you still
need to actually lock your Python code, even though you might think, well, this is all
straight Python, it's not going to get interrupted. But there are certain mechanisms, which vary slightly
between Python 2 and 3, where if you hang on to the GIL too long, it will potentially be
taken from you and given to another thread.
And so that might still cause what appear to be parallel race conditions.
So that's also worth reading about.
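To make the "GIL gets taken away from you" point concrete: in CPython 3, a thread that has held the GIL longer than the switch interval while another thread is waiting is asked to release it (Python 2 instead counted bytecode "ticks", tuned via `sys.setcheckinterval`). A small sketch using the standard `sys.getswitchinterval()` and `sys.setswitchinterval()` functions:

```python
import sys

# Seconds a thread may keep the GIL while another thread is waiting before it is
# asked to drop it. CPython's documented default is 0.005 (5 ms).
default_interval = sys.getswitchinterval()
print(default_interval)

# The interval can be tuned, e.g. to make forced switches less frequent.
# This only changes how often switches happen; it never makes a
# read-modify-write on shared data safe, so locks are still needed.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())

sys.setswitchinterval(default_interval)  # restore the original value
```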
Yeah, and one of the things that surprised me, and I realize I don't really worry about this in Python:
I deal with multithreading in C++, and with C++, you have to do fine-grained locking of
any data structure shared by multiple threads. But I was surprised how much you can share between threads in Python, because
the GIL won't interrupt in the middle of a bytecode. It'll only switch, yeah, between bytecodes, not
in the middle of the C code. So things like sorting a list will happen atomically,
and you won't be interrupted during that,
which surprised me.
I didn't know that.
Whereas, ironically,
incrementing a variable could be interrupted.
Right, because it ends up being like a two-step
or a read-modify-write operation.
Yeah, exactly, exactly.
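Here is a minimal sketch of the fix for that read-modify-write race: four threads bump a shared module-level counter, and a `threading.Lock` makes each `+= 1` happen as a unit. The thread and iteration counts are arbitrary choices for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment_many(times: int) -> None:
    global counter
    for _ in range(times):
        # `counter += 1` is several bytecodes (load, add, store), and a thread
        # switch can land between them; holding the lock makes the update atomic.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000, because every increment happened under the lock
```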
And Jesse uses the dis module to look inside,
which is all very good.
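In the same spirit, a small sketch that uses the standard `dis` module to disassemble an increment of a global (the function and variable names are illustrative). Exact opcode names differ between Python versions, for example `INPLACE_ADD` in older releases versus `BINARY_OP` in 3.11+, but the separate load, add, and store steps show up in each.

```python
import dis

counter = 0


def increment():
    global counter
    # Looks like one step, but compiles to several bytecode instructions.
    counter += 1


# Human-readable disassembly; it includes LOAD_GLOBAL, a load of the constant 1,
# an add instruction, and STORE_GLOBAL, with thread switches possible in between.
dis.dis(increment)

# The same instruction names collected as a list, e.g. for inspection or tests.
opnames = [instr.opname for instr in dis.get_instructions(increment)]
print(opnames)
```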
So that's a great article.
I think that's probably the most substantive thing we're covering.
Do you want something not so substantive,
but pretty cool?
Yes.
I've got one for you.
Let's talk about the NBA,
as in National Basketball Association,
American basketball.
So there was a pretty big deal on Twitter the other day. So Mark Cuban,
he owns the Dallas Mavericks. And he's, I don't know if he comes from tech or not. I don't really
think so. But he definitely was an entrepreneur. He's, you know, worth like he's a billionaire,
basically. But as a billionaire owner of an NBA team, he posted a pretty interesting thing on Twitter saying, here's the new NBA.
And it was a picture of him learning machine learning with Python, with, I think, IPython and
an IPython notebook open.
And he's like, I need to understand the Mavericks and the NBA.
I'm on it.
Wow.
It's pretty cool, right?
It is pretty cool.
I don't know much about basketball or Mark Cuban or any of that, but it's neat that somebody that high up is wanting to learn Python and notebooks.
That was basically the main takeaway.
A bunch of people like our friends over at Partially Derivative invited him to be on the show.
They're like, oh, we have to hear your story.
He's like, no, no, I'm just getting started.
They have a team at the Mavericks.
I just want to understand what they're doing when they use machine learning to help make predictions and planning.
And that's kind of cool to think of how machine learning is actually like driving these professional sports teams as well.
Yeah.
Very interesting.
Indeed.
Indeed.
All right.
So next up we have somebody winning an award.
How cool is that?
Yeah.
Ian Cordasco.
He was announced as the recipient of the 2017 Community
Service Award from the Python Software Foundation, and I think that's pretty cool. I didn't know
about a lot of the stuff that he did. I mean, I was familiar with
Ian; he was on Test & Code episode 13, and we talked about the Betamax library that he has for recording and playing back
Requests interactions. He's apparently been the election administrator for the PSF since 2015,
volunteering all that time, of course. And he is active in mentoring new coders and supporting
other Python developers, with a real
focus on mentoring women in Python.
And I think that's just pretty awesome.
Yeah, that's really awesome.
So congratulations, Ian.
And this project that you talked about, like replaying requests, that's called Betamax?
Yes.
That's an awesome name.
Yeah, yeah.
I guess the idea, of course, is that there's a VCR-type library in some
other languages, but he chose Betamax because, well, everybody knows Betamax was better. That's
right. But yeah, you should listen to it. It's a pretty interesting tool. So that was one that
the community asked me to do. There were listeners of Test & Code who said,
hey, could you go find Ian and talk to him about Betamax?
That's awesome. We'd love to get those recommendations for all the shows,
including some stuff that we're covering here today, right?
Yeah, definitely.
Definitely. So if you want to work with these kinds of fun things, maybe you work at a company
where you're doing Java and you dabble in Python, or you don't really get to do all the cool things you'd like,
Advance Digital has a cool job offer
for everyone out there
who might want to make a change.
I wish I was near Jersey City
because this sounds fun.
It does sound fun.
So, right, they're in Jersey City
just across the Hudson from Manhattan there.
Small, agile environment.
They're mostly a Python shop,
but they play with other cool technologies.
They fund you guys to go to conferences, professional development, and most importantly and coolest, I think, is they run one of the 10 largest news sites by
traffic in the US, and they do it with Python. So if you want to be part of that team, you want to
play with cool stuff like that, just visit python.advance.net and check it out. So there's a couple of things coming up, Brian,
that have to do with Python versus legacy Python.
Remember Matthias from the IPython project, Matthias Bussonnier?
I'm sorry if I mess up your name, but I think that's pretty close.
He was the original guy who got us talking Python versus legacy Python
instead of Python 3 versus Python 2.
Oh yeah.
Right.
Yeah.
So he works on IPython and Jupyter and all that stuff.
And he's back with a new blog post,
which is my next item.
And it's a pretty big deal.
We just talked about Mark Cuban,
the new NBA, machine learning,
IPython.
And so they just released IPython 6.
Okay.
So that's pretty big news.
That is big news.
Yeah.
And so people who use IPython, you know, there's a brand new version.
That's awesome.
The bigger thing is that this is the first release where IPython goes Python 3 only.
They've dropped support for Python 2.
That's great.
Or as Matthias would say, they now support Python and not mixing in legacy Python with it.
And what I thought was nice is, you know, it's a pretty major project.
They did a little write-up of their experience converting a mixed code base to Python 3 only.
What were the benefits and what were the drawbacks?
So let's see.
A couple of things, a couple of stats that Matthias put out.
The size of the IPython code base has decreased by 1,500 lines.
That's pretty solid, right?
That's significant.
Less code means less maintenance.
Right.
They said it's not just because of dropping Python 2,
but a significant amount is.
And even more impressive is
they added some entirely new features
that required hundreds of new lines of code.
So really, the amount of Python 2 code
they were able to get rid of
when they went to Python 3
is actually probably more than that.
So that's pretty cool.
And they said one of the benefits they think
is that contributors can spend less time
worrying about, well, how does this work
if we do it in Python 2?
Or this has happened to me:
you make a pull request, you submit it,
it runs on the continuous integration,
and it works fine in Python 3, but then it fails in Python 2
because you forgot the b on a string or whatever, right?
So they don't have to worry about that.
CI runs faster.
They said basically, in summary, they're entirely pleased with having switched
and having the ability to write Python 3-only code.
And they're looking forward to using a lot of the improvements
in Python 3, specifically async and await,
which will be cool.
So an async and await REPL inside of IPython.
How cool is that?
That's neat.
Is async and await in all of the Python 3 versions,
or did that get introduced later?
It came in 3.5.
Okay.
The asyncio stuff was introduced in Python 3.4, I think.
And then in 3.5, they're like,
let's put some proper syntax on this and make it really easy.
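Since async and await come up here, a minimal sketch of that syntax with made-up coroutine names. The `async def`/`await` keywords arrived in 3.5, but this snippet also uses `asyncio.run`, which was only added in Python 3.7, so it needs 3.7 or newer.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # await hands control back to the event loop while this coroutine waits.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # gather runs both coroutines concurrently on one thread and returns their
    # results in argument order, so this takes ~0.2 s rather than ~0.3 s.
    return await asyncio.gather(fetch("first", 0.2), fetch("second", 0.1))


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['first done', 'second done']
```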
Yeah.
I'm writing a little thing that I want to have available on Python 2 also,
at least 2.7.
And even if I were to just do Python 3,
supporting all of the Python 3 versions,
I still can't use f-strings,
and I wish I could use f-strings.
I know, they're so new.
It's 3.6 only.
Like even on my production servers, it's 3.5.
So it is what it is.
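For anyone who hasn't tried them yet, a tiny sketch of the f-string syntax (Python 3.6+) next to the `str.format` call that still works on 2.7 and on Python 3 before 3.6; the values are just this episode's title pieces.

```python
name = "Python Bytes"
episode = 23

# Python 3.6+ f-string: the expressions inside the braces are evaluated inline.
title = f"{name} #{episode}: Can you grok the GIL?"

# The str.format equivalent for code that must also run on older Pythons.
title_compat = "{} #{}: Can you grok the GIL?".format(name, episode)

print(title)  # Python Bytes #23: Can you grok the GIL?
```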
That's a move in the right direction.
And I think it's great that Matthias
and others talked about their experience with that change.
Yeah, that's awesome.
Yeah, thanks Matthias.
Excellent. I think I'm, to use an American expression, beating a dead horse, but we have
another... Is that dead horse called source, S-R-C? Yes, yes. The package I was talking
about building is for the book, but I wanted to make sure I was representing the
community correctly in how to put together a distributable package and do it
correctly, or at least with best practices. I know there's not really one correct way, but somebody pointed
me to an article by, I'm probably going to get this wrong, I think it's
Hynek, and it's called Testing & Packaging. He's the guy who did the attrs project, a great project
that we've talked about a couple of times. He describes issues with at least one package
that wasn't using src: a bug showed up
in the installed package that didn't show up when testing the non-installed code.
So one of the benefits of using src is you can more easily make sure that you're testing the installed package and not the non-installed one.
And he also just shows that it's really just two lines of code change.
So to do the right thing is not that much work.
Right. So basically in your setup.py, in the call to setup(),
you set packages to be found in the src directory
and you set the package directory to be src, right?
Yeah.
So where you would normally just call find_packages(),
he recommends specifically calling find_packages()
and giving it where="src".
But you can also just pass "src" as the first argument, and that works too.
And then listing it in package_dir.
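One way to see why the src layout matters for testing, as a hedged, stdlib-only sketch: it builds a throwaway flat layout and a throwaway src layout in temporary directories (the package name `mypackage` is a placeholder) and asks `importlib.machinery.PathFinder` whether `mypackage` can be found from the repository root alone. That root stands in for the current directory that, for example, `python -m pytest` puts on `sys.path`.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_package(parent: Path, name: str) -> None:
    """Create an empty, importable package directory `name` inside `parent`."""
    pkg = parent / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")


def visible_from_repo_root(layout: str) -> bool:
    """Can `mypackage` be imported straight from the repo root (no install)?"""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        if layout == "flat":
            make_package(root, "mypackage")          # repo/mypackage/
        else:
            make_package(root / "src", "mypackage")  # repo/src/mypackage/
        # Search only the repo root, as if it were the first entry on sys.path.
        return PathFinder.find_spec("mypackage", [str(root)]) is not None


print(visible_from_repo_root("flat"))  # True: tests could silently use the working copy
print(visible_from_repo_root("src"))   # False: tests need the installed package
```

With the src layout, the tests can only import the package once it has been installed (for example with `pip install .`), which is what makes them exercise the installed package. The matching `setup.py` change described here is `package_dir={"": "src"}` together with `packages=find_packages(where="src")`.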
And then one of the things I noticed,
which I don't think people have really talked about,
is the entire repository looks better.
You've got all of the package junk, like your setup and your manifest
and all that stuff at the top level.
And the stuff you really care about on a day-to-day basis is separated into subdirectories. You've got the
docs in one and the tests in another, and then your source in another. And that separation just
appeals to my sense of organization. It's just nice. Yeah, I'm coming around to this as well. It
sounds pretty solid. Anyway, I'll probably try to stop talking
about that every episode, but there you go. One more article. Well, I'm not quite done beating
the Python versus legacy Python horse yet. So I'm going to keep going on that one because there's
some more big news. We've heard that IPython went to Python 3 only. And now same week, last week,
AWS Lambda goes to, well, not Python 3 only, but it adds Python 3.6 support. It was just
2.7.
So that's a big jump, right?
Wow, that's a big jump.
Yeah.
Yeah.
So that's pretty awesome.
And do you have much experience with Lambda?
Have you played with it?
No, I've heard a lot about it, but I haven't played with it yet.
So Lambda is one of these things from AWS, from Amazon, that fits into this serverless architecture.
So basically you say, here's a function.
And when something happens, run this, please.
So run it on a schedule.
Somebody changes a database.
Somebody uploads a file to S3, whatever.
And it just runs.
There's no servers that you deal with.
Obviously there are servers, but it just distributes your code to run when it needs to across a whole bunch of servers, so it scales
basically infinitely. You know, as long as you have infinite money, you can infinitely scale this. It's
fine, right? And that's pretty cool. Yeah. So have you tried it? No, I have not had a real
reason to do it. I mean, I guess there are a couple of things that I could do.
Like on the websites, there's a job that runs every couple of hours that completely re-indexes the database and reorganizes it for super fast queries.
The queries on the various websites, and I'm going to be adding this to Python Bytes,
no worries,
run like sub-millisecond, right? In order to get that, you've got to pre-compute some things.
Maybe that's a perfect lambda operation.
Especially now that they have 3.6 support, I'm intrigued enough that I might
give it a shot anyway, just to make up some
excuse to play with it. Exactly: "we need to run this." But if you're using
other AWS stuff, like their database services, DynamoDB or RDS, or S3,
here's a way to run code really near your resources, on triggers, with no effort.
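A minimal sketch of what such a trigger can look like, for the "somebody uploads a file to S3" case mentioned earlier. Lambda's Python runtime calls a handler function with an `event` dict and a `context` object; the name `lambda_handler`, what it does, and the sample event are illustrative assumptions, and the event is a hand-written, heavily trimmed imitation of an S3 notification rather than real output.

```python
import json
import urllib.parse


def lambda_handler(event, context):
    """Hypothetical handler: list the S3 objects whose upload triggered this run."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (a space arrives as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    # Real work (re-indexing, thumbnails, ...) would go here. S3 invokes Lambda
    # asynchronously, so by default nothing reads this return value; returning
    # the list just makes local testing easy.
    return uploaded


if __name__ == "__main__":
    # Hand-written, heavily trimmed imitation of an S3 "object created" event.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "my-bucket"},
                    "object": {"key": "reports/april+2017.csv"}}}
        ]
    }
    print(json.dumps(lambda_handler(sample_event, context=None)))
```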
And one of the things I thought was pretty cool, like, this announcement just came out.
And Zappa, so Zappa, if you look at their page, which I linked, it's called Serverless Python Web Services.
That's interesting, right?
So basically, you can set it up so that using the AWS architecture, you can route web requests to these Lambda functions.
But you don't really have servers or anything like that.
And people have been asking for Python 3 support. And they've been saying, no, no, no, no.
As soon as this dropped, they're like, yes, it has Python 3 support. So that is pretty cool as well.
So you've seen things that basically are layered on top of Lambda also starting to support Python
3, which is great. Yeah, definitely. Cool. All right. Yeah. Maybe we should play with Lambda.
I don't know. Yeah. Very nice. All right. Well, that's it for the news, Brian. Do you have anything personal you want to share with everyone? No, I'm going to
be, I guess, in the Munich area the second week in May. If there's
anybody around who wants to have a beer or something with me, hit me up. Yeah. That
sounds awesome. I'm jealous. I'd love to go visit Germany. Maybe I'll do that at the end of the summer. We'll see. But no news for me.
I just want to say thank you, everyone, for listening.
Oh, you know what?
Actually, one more thing.
This is not personal news, but it falls right in here.
I also saw CheckiO at checkio.org.
These guys have a pretty cool gamification of learning Python.
They also just went Python 3.
Oh, cool.
So just to keep on this, hey, Python 3 is starting to really
roll. I'd say it's really starting to roll this week. I use CheckiO for, hopefully I won't get
in trouble for this, but I've gone through a bunch of this stuff and I use them for interview
questions. Yeah, I think it's actually pretty good. And what I really like is you can solve
a puzzle and then you can look at other people's solutions. And I found after solving a bunch and
looking at the solutions that I unknowingly have an implicit bias towards performance over ease of
reading or simplicity or whatever. And, you know, it was just, it's interesting that it uncovered
that for me. Oh, that's interesting. And I totally have the opposite. I like them to be readable more
than anything else. Yeah. Yeah. Funny, huh? All right. Well, thanks, Brian. Thank you everyone for listening and we'll catch you next week.
And thank you.
Yep.
You're welcome.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.