Python Bytes - #59 Instagram disregards Python's GC (again)
Episode Date: January 5, 2018
Topics covered in this episode: gc.freeze() and Copy-on-write friendly Python garbage collection; SpeechPy - A Library for Speech Processing and Recognition; PyBites Code Challenges: Bites of Py; How ...big is the Python Family; Dramatiq: simple task processing; Controlling Python Async Creep; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/59
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 59, recorded January 4th, 2018.
I'm Michael Kennedy.
And I'm Brian Okken.
And we got a bunch of awesome stuff lined up for you in this very first episode of 2018.
So, let's say thank you and Happy New Year to DigitalOcean.
Yeah, thanks and definitely Happy New Year. It's exciting to be back.
It's very exciting to be back.
And we, you know, the Python news doesn't stop coming.
I think if anything, it's just picking up speed.
I'm afraid we might scare people a little bit with some of your picks this time, Brian.
What?
The stuff near the end.
The stuff near the end.
So, yeah.
Okay.
Another thing that's kind of scary is turning off garbage collection.
Seems like that might be bad, right?
Right.
Well, I was actually surprised and very interested when I was listening to the Instagram talk at PyCon about turning off garbage collection.
And there's an article that they put out again.
They said that they had turned it off last year, and then they wanted to sort of, they were having memory problems, so they wanted to try to turn it back on a little bit, but they still have concerns.
Yeah, so maybe we should take a moment, just a step back and say, you described the original thing.
So why did they start down this path of turning off garbage collection in the first place?
What they found was they were running many instances of the largest Django deployment on Python in the world.
So they're running lots of servers with it.
And they found that the shared memory across multiple processes running that on a single server was completely falling apart because garbage collection was shifting stuff around.
They said, well, could we turn it off?
And it turned out that they could, but then, in this article you're referring to, they basically were losing those gains again.
And we'd talked about this, I guess, a couple of times: if you turn it off, then eventually you will run out of memory.
But if you're restarting tasks every once in a while, that completely cleans it up.
Yeah, exactly.
They were losing some of those gains, so they wanted to get some of those back.
This is really interesting, and I had to read this article about three times, but it's called "Copy-on-write friendly Python garbage collection." It's a pretty interesting story, but the end punchline is that they've got a new addition to Python that's going to go into Python 3.7, or it's already in there. It's called gc.freeze(). What happens is they get their main stuff running with all the shared objects, but before they fork off a bunch of processes, they call this gc.freeze(), and all the stuff that's in memory right now at this point doesn't get garbage collected, but everything from this point in time on will be garbage collected, which is pretty interesting.
Yeah, that's really interesting. So Python memory management is, I think, a little obscure. People don't talk about it very much, and I don't think there are a lot of good write-ups. You actually found a really fantastic write-up on the intricate details of Python memory management. The short version is most things are cleaned up through reference counting: you count the number of things pointing at an object, and when that goes to zero, it goes away. But the problem with reference counting is cycles. I have one object that points at another, that object points back at the first, they both have a count of one or higher forever, and they get leaked. And so there's this secondary garbage collection phase that goes through and looks at these items, cleans them up, and so on. So this gc.freeze() says, let's take all the stuff that exists now and just tell the garbage collector to ignore it. Don't touch it. Don't mess with it. Leave it alone. Right? And so you get basically your app into its normal working state and then freeze it one time. And then all the new stuff that would make the memory grow and grow and grow over time is going to be continually GC'd.
But the core essence of your app, the Python runtime and a bunch of things to get started, should be kind of fixed, right?
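To make the two ideas in this exchange concrete, here is a minimal sketch, assuming Python 3.7 or later (where gc.freeze() is available). The Node class and the shared_state dictionary are hypothetical stand-ins, not Instagram's code, and the fork step is only indicated in a comment so the snippet runs on any platform.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""

    def __init__(self):
        self.other = None


# Part 1: reference counting alone cannot reclaim a cycle.
gc.disable()                      # keep the automatic cycle collector out of the demo
a, b = Node(), Node()
a.other, b.other = b, a           # a points at b, b points back at a
probe = weakref.ref(a)            # lets us observe whether the object still exists
del a, b                          # the names are gone, but each refcount is still 1
alive_after_del = probe() is not None
gc.collect()                      # the cycle collector finds and frees the pair
alive_after_collect = probe() is not None

# Part 2: freeze long-lived startup state before creating worker processes.
# The collector stays disabled in the parent while startup state is built.
shared_state = {"routes": ["/", "/feed"], "settings": {"debug": False}}
gc.freeze()                       # move all tracked objects to a permanent generation
frozen_count = gc.get_freeze_count()
# A pre-fork server would call os.fork() here, and each child would call
# gc.enable(); the collector then skips the frozen objects and only
# examines objects created after the freeze.
gc.unfreeze()                     # undo the freeze so this demo leaves no lasting effect
gc.enable()
```

Disabling the collector in the parent and enabling it again in each child follows the pattern suggested in the gc module documentation; whether it helps a given deployment depends on how much memory is actually shared between processes.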
Yeah. And I think that's a pretty cool idea because that's a common model for applications: get connections up and get your normal sitting state, idle state running.
And then before you get requests in and start spawning stuff, just at that point, you're like, well, this is all the shared stuff.
We don't need to move this stuff around.
It's always going to be there.
Anyway, it's a cool idea.
And apparently it saved them.
They were at linear memory growth and they slowed that down quite a bit.
Yeah, it looks really, really interesting.
Instagram is doing amazing stuff, I think, in the Python space, in the web space.
And if any of those guys are out there listening and want to come talk about Python and Instagram on Talk Python,
they're more than welcome to come over.
It'll be fun.
And I definitely appreciate that.
They're very open about this, to say, hey, this is what we're trying.
It's not perfect yet, but it's better.
Yeah.
It's super cool.
Do you know if gc.freeze() is approved or just proposed for 3.7?
So we have a link to the pull request, and it looks like it's already in.
Oh, it is merged.
Yes, it is merged.
So this is pretty
awesome, right? We have CPython on GitHub with a pull request merged in with its comment history.
That's new, right? That's the 2017 bit of magic that it's on GitHub. Yeah. Yeah, cool. So nice
that we can actually track that. So the next thing that I want to talk about is a little bit
different. I think this will be mostly of interest for data science folks. This is a little bit lower level maybe than it sounds, but this thing's
called SpeechPy. So SpeechPy, it's a library for speech processing and recognition. So this is a
pretty interesting Python project. You can come along and basically give it some, you know, spoken words, and it can pull out
various effects and things that are sort of the essence of what you need to do speech recognition.
I think this works a little, you don't just feed it like here's, say a WAV file, and out pops text
of what it said, but it gives you what you would need to feed to a machine learning system,
basically takes the spoken words into a representation you can feed to some kind
of algorithm to actually get the text. So I think that was pretty cool. And one of the things that I
wanted to bring this up for is they have a really nice citation statement. So if you look at the
GitHub repo, like kind of near the top, it says, if you're going to use this package,
please cite it as follows. And that's interesting, because there's been some talk in the scientific
space, more true science, not data science around people want to publish their software,
they want to work on advancing software, but in the academic space, you have to publish
articles, the whole publish
or perish type of thing. And the way you get credit for your work is to be cited in other
articles. And so this is sort of showing a way to cite this work, which is not a paper, but which
is an open source project in the same sense that the person, the people who created it might get the same
level of academic credit for their thing being cited. So I think that's pretty cool.
Yeah. I don't get the syntax, but...
It must mean something. I have no idea what it is.
Okay.
I thought it's kind of neat. If you're doing machine learning, you need to turn
waveforms into something you can process. This is pretty cool. And the other thing that's kind
of nice is if you look at it here, and I think it's in the documentation or the tutorial,
they actually show you how to process WAV files with SciPy, which is also maybe cool and handy at
some point. Yeah, that's actually something I need: I need to be doing some WAV file processing.
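As a rough illustration of the first step in that kind of pipeline, the sketch below reads samples from a WAV file and cuts them into overlapping frames, which is the shape of input that feature extraction typically works on. It deliberately uses only the standard library's wave module rather than SciPy or SpeechPy, the tone is generated in memory so no audio file is needed, and the helper names and frame sizes are assumptions chosen for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second, chosen arbitrarily for the demo


def make_tone_wav(freq_hz=440.0, millis=100):
    """Build a short mono 16-bit WAV file in memory."""
    n = SAMPLE_RATE * millis // 1000
    samples = [
        int(16000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)            # 2 bytes = 16-bit samples
        writer.setframerate(SAMPLE_RATE)
        writer.writeframes(struct.pack("<%dh" % n, *samples))
    buffer.seek(0)
    return buffer


def read_samples(wav_file):
    """Return (sample_rate, samples) from a mono 16-bit WAV file object."""
    with wave.open(wav_file, "rb") as reader:
        rate = reader.getframerate()
        raw = reader.readframes(reader.getnframes())
    count = len(raw) // 2
    return rate, list(struct.unpack("<%dh" % count, raw))


def split_into_frames(samples, frame_len, stride):
    """Cut the signal into overlapping fixed-length windows."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, stride)]


rate, samples = read_samples(make_tone_wav())
frames = split_into_frames(samples, frame_len=160, stride=80)  # 20 ms windows, 10 ms step
print(rate, len(samples), len(frames))
```

A real speech pipeline would then compute features per frame; that part is left out here because it depends on the library in use.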
Well, SciPy apparently has it. Nice. How about the next one?
Well, next up we've got our friends at PyBites. Is that what they're called, PyBites? Yeah, PyBites, that's right.
They've got a new platform, and I suddenly forgot the URL, but there it is: it's "code challenges," but the "es" is after the dot. So codechalleng.es.
No, clever though.
But we've covered other things before.
Like there's a, I should have looked this up.
There's a game one that's,
they're like going through a game and doing code challenges
and there's code katas around.
This is a similar sort of thing.
So you are able to do these little code challenges. It's called Bites of Py, and they say they're self-contained 20 to 60 minute
code challenges. And you can write them and verify them in the browser. I did two of them
this morning and I had kind of a lot of fun with it. It was fun.
Nice. And you verify them by writing pytest unit tests, right?
You don't write it.
It has pre-written pytest code that checks your answers.
I see.
So you've got to do some sort of thing and then you check it in and it runs basically
the test against your code and says thumbs up, thumbs down.
Yeah.
Like for instance, on the second challenge, you have to write three different functions
to manipulate a list of names.
And it has tests for all of these.
I went ahead and just solved one at a time, for instance.
So I tried to solve the first one and then ran the test and noticed that the first one
passed and then just did that.
And looking at that with the help of the test output helped me solve the rest of them.
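The workflow described here, writing a few small functions and letting pre-written tests judge them, can be sketched roughly as follows. The function names, inputs, and tests are hypothetical and are not taken from the actual PyBites challenge; the tests use plain assert statements in the style that pytest collects.

```python
def dedup_and_title_case(names):
    """Return the unique names, title-cased, in first-seen order."""
    seen = []
    for name in names:
        cleaned = name.title()
        if cleaned not in seen:
            seen.append(cleaned)
    return seen


def sort_by_surname(names):
    """Return the names ordered by their last word."""
    return sorted(names, key=lambda full_name: full_name.split()[-1])


# Tests like these would normally live in a separate file and be run by pytest.
def test_dedup_and_title_case():
    result = dedup_and_title_case(["ada lovelace", "Ada Lovelace", "alan turing"])
    assert result == ["Ada Lovelace", "Alan Turing"]


def test_sort_by_surname():
    result = sort_by_surname(["Alan Turing", "Ada Lovelace"])
    assert result == ["Ada Lovelace", "Alan Turing"]
```

Solving one function at a time works well with this setup because each test reports independently, so a passing first test confirms progress while the others still fail.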
That's really cool.
And I also learned something by the transitive property through you.
You did?
I did.
I learned what you learned: that min takes a key, like sort and sorted do.
That way you could sort complex objects based on, like, an attribute of them.
I didn't know that.
I had just discovered that this morning.
So my solution for one of the challenges
is to try to find the name with the shortest first name.
And I went ahead and sorted the list by the length of the first name and then just picked the first element.
Their solution uses min instead of sorting the list.
You can just find the min length, which is pretty cool.
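The two approaches mentioned here can be compared side by side. This is a small sketch with made-up names; first_name_length is a hypothetical helper, not code from the challenge.

```python
names = ["Alexandra Smith", "Bo Jackson", "Christopher Lee"]


def first_name_length(full_name):
    """Length of the first word in a full name."""
    return len(full_name.split()[0])


# Approach 1: sort the whole list by the key, then take the first element.
shortest_by_sort = sorted(names, key=first_name_length)[0]

# Approach 2: let min() apply the same key and scan the list once.
shortest_by_min = min(names, key=first_name_length)

print(shortest_by_sort, "|", shortest_by_min)
```

Both pick the same name, but min() makes a single pass over the list instead of ordering every element.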
Yeah, that's really awesome.
That's got to be quicker than a full on sort. One of the things I like about these sorts of
quick challenges is you can probably do them like on your lunch break or a couple of lunch breaks
to do one of them. And, uh, they just take a browser so you could just do it on your laptop.
It's pretty fun. Yep. That's cool. You could maybe even do it on an iPad or something if
you really wanted. Yeah. Well, I don't know. I haven't tried that probably if it runs in the
browser, I bet it would. Nice. So yeah, that's really cool. I do like that you
learn these little things like, wait, Min takes a key? I didn't know that. You know, that's just,
you wouldn't think you'd pick up these little things so quickly, but you know, these little
challenges are nice like that. So before we get to the next item, I want to say thank you to
DigitalOcean. They're sponsoring this episode and many, many other episodes. They're really a big supporter of Python Bytes. So as many of you know, many of our bits of code,
our stuff on the web, and our MP3 files that get sent down to you all go through DigitalOcean.
So Python Bytes is basically delivered in all of its forms to you through DigitalOcean.
We have a bunch of servers there. They're super easy to work with, very quick, very reliable. You can create a new server,
a new droplet, they call it, in probably 30 seconds. And then you SSH in and you're off to
the races. So really, really nice and affordable. And check them out at do.co/python and let them
know that you heard about it on Python Bytes. So this end of the year thing, Brian, this is kind of when,
I mean, we're sort of on the other side of it, but this is when you get together with your family,
right? People maybe you didn't even know, like, wait, I have a second cousin from where?
Python's like that, right? Yeah. Yeah. You were talking about, like, what is the place where you
can do sort of gamified code challenges, and that's CheckiO. So the reason that I'm coming back to it is there's an article by the guys at CheckiO called "How big
is the Python family." So this is really nice, and, you know, some of you I'm sure are aware of it, but
many people I don't really think are aware of how varied Python is as a platform.
So when you say Python, typically you mean CPython.
Hopefully you mean modern Python 3.6,
not legacy 2.7 Python,
but we'll let that slide for now.
There's also things like Jython and Jython will let you write Python code,
but execute it on the JVM and interact with Java objects.
IronPython is the same thing for .NET.
There's also Python for .NET, which I think is a more up-to-date, modern variant on the same thing.
There's Cython, which is compiled, slightly different Python.
There's PyPy, which is a JIT version.
MicroPython, which is Python where your app is the operating system, in Python, on microchips, basically.
And on Talk Python, you and I talked about Grumpy, right?
Yeah.
Which is on Go.
Yeah.
So Grumpy is from the YouTube guys, which is, instead of using C to implement CPython, they said, well, what if we wrote the same thing, but in Go?
And that's kind of an interesting thing.
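Because the same source file may end up running on several of these implementations, a program can ask which one it is running on. A small sketch using only the standard library; describe_runtime is a hypothetical helper name.

```python
import platform
import sys


def describe_runtime():
    """Report which Python implementation and version is executing this code."""
    implementation = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    version = platform.python_version()
    return f"{implementation} {version}"


runtime = describe_runtime()
print(runtime)
print(sys.implementation.name)  # lowercase identifier for the same implementation
```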
So I thought this is just a nice grouping of all of these ideas, a quick paragraph or two on each of them.
You know, if you're bringing people onto your team and you're like, well, wait a minute, there's actually a lot of types of Python.
Here, check this out, right?
And also maybe a reminder to, like, give PyPy a try.
Like, they just had a big release for both Python 2 and Python 3 versions.
One of the things I like about this write-up that they did is it reminds you why some of these are around.
Like if you had to work with .NET, then working with IronPython or Python.NET might be a better thing than just trying to do it other ways.
Yeah, and one of the advantages there might be, you know, if you're working on a .NET app, but you want to add scripting.
Yeah.
Like, what are your choices?
You probably don't want to give them C#.
And even if you did, like it requires full on compilation and like, you know, how do
you deal with that?
Right.
So this could be a really nice way to plug in like scriptability into your enterprise
app, which would be pretty cool.
And one more thing I wanted to throw in on this conversation is a lot of times I'll say
Python runtime.
And I know often people say Python interpreter.
This is what the Python interpreter does. It does this and that. Well, if you look at the whole
Python family, only some of them are interpreters. Some of them are compiled execution engines,
right? Like the JVM. That's actually not a great example. But say PyPy, for example, or Cython, those two definitely are not interpreted in the traditional sense.
PyPy starts out that way, but it converts to a JIT version for the hotspots.
I often say Python runtime because I kind of feel like, you know, when you say interpreter, you really just got the mindset of CPython, which is the most popular, but not always.
What do you say?
Say interpreter? I don't usually say either. I just say Python. Yeah, there you go. Cool. So
anyway, I think this is a nice write up and good to have it all in one place. So I like the one
that you have coming up next. One of the problems I often see is I want to do some work, but I don't
care if it happens right now. I just want to like start it and let it go somewhere. I don't usually have a great answer for that.
Task processing stuff. And one of the common things people often bring up is Celery.
And to be honest, I've tried to get into Celery a couple of times,
but the learning curve on it, maybe it's just me, but I had a little bit of trouble
getting into it. I was interested when I heard an interview on Podcast.__init__
about a library called Dramatiq, or "dramatic," I'm not sure how you say it.
It's D-R-A-M-A-T-I-Q.
I'm sure, since it's task scheduling, the internals are quite complicated.
But you just declare an actor on some code,
and it's pretty easy to get started.
I thought I'd point people to it. Yeah, it's quite cool. You basically put a decorator onto a method
and then that method, instead of running locally, you can send work to it. And sending that work
actually kicks it off on, well, the example they had was RabbitMQ, I think. So there's a
producer of the work. And then there's another process that just hangs out
and consumes anything that lands on the queue.
It's pretty cool.
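The producer and consumer pattern described here can be imitated with the standard library to show the moving parts. To be clear, this is not Dramatiq's implementation and does not use its API: the actor decorator below is a hypothetical stand-in that attaches a send method, an in-process queue.Queue plays the role of the message broker, and a thread plays the role of the separate worker process.

```python
import queue
import threading

task_queue = queue.Queue()  # stands in for a broker such as RabbitMQ or Redis


def actor(fn):
    """Hypothetical decorator: adds fn.send(...) that enqueues work instead of running it."""
    def send(*args, **kwargs):
        task_queue.put((fn, args, kwargs))
    fn.send = send
    return fn


def worker():
    """Consume tasks until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        fn, args, kwargs = item
        try:
            fn(*args, **kwargs)
        finally:
            task_queue.task_done()


results = []


@actor
def count_words(text):
    results.append(len(text.split()))


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

count_words.send("hello from the queue")  # producer side: returns immediately
task_queue.join()                         # wait until the worker has processed it
task_queue.put(None)                      # ask the worker to stop
consumer.join()
print(results)
```

In a real task queue the message is serialized and sent to a broker, so the consumer can live in another process or on another machine and the work survives a restart of the producer.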
Yeah, so you can configure it, but it defaults to RabbitMQ, I think.
And there are just good defaults that work right off the bat, if you don't care.
And then you can configure it to use other things if you need to.
Apparently, well, the person that developed this, I forget his name, said it's used on quite significant
projects. I mean, it isn't a toy project, but it's pretty easy to get started and you can configure
it to be all sorts of fancy stuff if you need it to be. But one of the things I liked about the conversation is he
brought up that he intentionally kept the documentation fairly terse and small
so that when you're looking for something that you think you saw before, it's pretty easy to
find again. So that's cool. Okay. Yeah. That's an interesting point. Yeah. And it looks like
you can run it on top of RabbitMQ or Redis. Take your pick.
One final thing I want to point out that I thought was interesting is it's licensed under
AGPL, but it also has commercial licenses available upon request, which people are
always looking for ways to basically fund their open source work. And I thought that was an
interesting variation that I saw going through it. Really?
Okay, so I didn't pay attention to that.
So I'm not sure what the AGPL is.
Yeah, I actually don't know either.
But apparently you might want a commercial license instead.
Okay, so the last one I want to talk about
is a little bit similar to what you're talking about,
running async work.
But it's sort of the challenge of taking advantage of async things, but not making
that a problem for people trying to consume it who don't want to think of things that way.
So this article is called "Controlling Python Async Creep," from friend of the show Cristian Medina.
And he says, basically, if you've got some library that is written in an async way,
you're supposed to await it, but anybody who's going to call that and take advantage of that,
that caller has to also be async, and then the caller, that has to be async, so maybe way,
way down somewhere, you're trying to do something async, and it creates this sort of chain reaction
of, well, the callers of this have to be async. Well, the caller of those things have to be async and so on. It becomes, it can become quite a problem.
So he wrote this nice article basically going through three examples of where you can sort of
put a stopgap and say, okay, at this level, we're no longer worried about async,
but we're still taking advantage of it internally. So one way you can do that is you can
wait for blocks of async code. So if you've
got to contact, you know, a database, two web services, read something from the file system,
and you want to do that sort of asynchronously, you could create those pieces of work, but then wait
on them as a group. And there are some built-in ways in asyncio to do that, which is really cool.
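Waiting on a group of async operations behind an ordinary function might look like the sketch below. It assumes Python 3.7 or later for asyncio.run, uses asyncio.sleep to stand in for real I/O, and the function names are hypothetical rather than taken from the article.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for real I/O such as a database query or a web request."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def _load_all():
    # The three operations run concurrently; gather returns results in argument order.
    return await asyncio.gather(
        fetch("database", 0.02),
        fetch("web service", 0.01),
        fetch("file system", 0.0),
    )


def load_everything():
    """Plain synchronous entry point: callers never see async or await."""
    return asyncio.run(_load_all())


loaded = load_everything()
print(loaded)
```

The async code stops at load_everything, so nothing above it in the call chain has to become async.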
It's got some nice examples on that. Or you could just use a thread and then let that thread's main bit of work be the async thing, so callers don't have to deal with it.
And the most interesting, I think, is a function that can be called as an async function or as a regular function and
implements an async behavior or a synchronous behavior accordingly. So you could write a single
library, and if somebody in Python 3.6 wants to use it in a fancy async way, it becomes magically
async. But if somebody from 2.7 calls it or something like that, an older version, or they
just don't call it in this async way, it just magically is a synchronous call and doesn't use that whole stuff.
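One way to get that dual behavior is to check whether an event loop is currently running and branch on the answer. The sketch below is an assumption about how such a function could be written, not the code from the article: it relies on asyncio.get_running_loop, which needs Python 3.7 or later, so it would not run on the older versions mentioned here. The names nap and _in_event_loop are hypothetical.

```python
import asyncio
import time


def _in_event_loop():
    """True when called from code that is executing inside a running event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


def nap(seconds):
    """Sleep without blocking the loop when awaited, or block normally otherwise."""
    if _in_event_loop():
        # Async caller: hand back a coroutine for the caller to await.
        return asyncio.sleep(seconds, result="async path")
    # Regular caller: do the work right away.
    time.sleep(seconds)
    return "sync path"


async def async_caller():
    return await nap(0)


sync_result = nap(0)
async_result = asyncio.run(async_caller())
print(sync_result, "|", async_result)
```

The cost of this design is that the return type depends on the calling context, so an async caller that forgets to await gets a coroutine object instead of a result.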
Okay.
This is really an interesting way to make it possible to bring async into your package or your libraries without having the consumer of your libraries have to care about the fact that it's async.
But still make it into something they could take advantage of.
Oh, that's great.
I'm going to have to read this.
This reminds me of the, I guess,
the learning hurdle that people go through in the C++,
C and C++ world when you go from single-threaded applications to multi-threaded applications.
You have to look in all the corners.
Yeah.
It's definitely a mind shift.
Yeah, this is very much like that.
Okay.
But yeah, Cristian did a great job on this,
and I really like his solution at the end. Actually, he has it done in if statements. I feel like you could
create a decorator that would basically wrap that up, just like a magic "asyncable" or
"awaitable" decorator. It's really, really close to having some sort of decorator magic making this
even better.
Yeah. Okay, cool. All right. Well, that's all our news for the week, except for that it's not.
Well, yeah.
We have an extra one. Really quick, I just want to let people know that the PyTennessee
Conference in Nashville is coming up almost a month from now. So if you are in the Nashville
area or willing to travel there, February 10th and 11th, they've got their schedule out,
the tickets are on sale and things like that. And they even made a special discount code for Python Bytes.
We, you know, said, are you going to tell us about it?
And they said, definitely, here's the code.
So if you want to go to PyTennessee,
you can use the discount code PythonBytes,
no spaces, capital P, capital B, and you get 10% off.
Cool.
Yeah, very cool.
You have some pretty interesting news.
It's not directly Python related, but it very much affects all of us. Yeah. Right. Code's on servers, especially
in the cloud. I don't know what to do about this, but I saw it this morning. I thought
it's important enough to not ignore it. So I thought I'd drop a link. What do you think?
Like unplug all of the Internet, just go hide in a corner or something like that? It's like one of
those things like having the credit services get hacked.
You just, I guess, be aware of it and pay attention.
It's very much like the Experian.
What was that credit service?
Equifax, maybe?
Equifax.
I'm not going to say it because I don't want to say the wrong one.
But the e-credit agency, I totally, for some reason, forgetting.
I think you're right.
But yeah, basically you're told your world is
crashing down, we're sorry, moving on now. And this is kind of like that. You
quote a couple of articles; let me read what they said in the New York Times here. It said basically
there are two problems called Meltdown and Spectre that could allow hackers to steal the entire memory
contents of computers, including mobile devices, personal computers, and servers running in so-called cloud computer networks. There's no easy fix for Spectre, which could
require a redesign of the processors, according to researchers. As for Meltdown, the software
patch needed to fix the issue could slow down computers by as much as 30%. So, you know, your
AWS, DigitalOcean, whatever, server may just get 30% slower now.
Wonderful.
Yeah.
So most of the places, I think Google, Amazon, and Microsoft have all said that the servers
are pretty much changed to deal with Meltdown, but Spectre is still a problem.
I don't think there's a ton of concrete details here, at least not that I ran across.
It's sort of vague.
Apparently, not all the details about the exploit are out.
But I'd recommend people check out risky.biz, which is my favorite developer security podcast.
It's super, super good.
And those guys are going to definitely have an insightful conversation on this next time they're on deck.
In case we were too vague about it, it was a design flaw found in nearly all modern microprocessors that allows attackers to read the entire memory of a computer.
Yeah.
Bummer.
I hope you don't do anything on the internet.
Carry on now.
Okay.
So, yeah.
So the last thing, this is a more positive thing.
I think so, at least. I just announced all my courses, not all of them actually, only a few of
them, for 2018, but I announced this new deal that I'm having for all the Talk Python courses called
the Everything Bundle. So talkpython.fm/everything. And it gets you, it'll be probably 120 hours
of Python course awesomeness, including some new ones: Mastering PyCharm, Python 3: An Illustrated Tour, Introduction to Ansible, and tons more coming.
So I was just finishing some of the videos for the PyCharm course right before we chatted.
So it's almost done.
So is that going to be out this month then or soon?
That is going to be out probably next week.
Okay, cool.
Definitely soon.
Definitely soon.
It's so fun to create these courses and just, you know, keep exploring the different areas
and helping people get better with them.
So lots of fun.
Yeah.
And you do things like working with companies if they want to like get access to these for
like everybody that works there or a handful of people.
I definitely have special programs for like site licenses, things like that. I've even
talked to some universities about having the courses for like all of their students or
something like that. That would be wild. Still talking.
You'll have to increase the price for them, I guess, maybe.
I guess. But they're students, you know.
Cool.
All right, cool. Well, Brian, thanks for sharing all your news.
Yeah, thank you.
Nice to be back together after the whole holiday time off.
Yes.
All right, catch you later.
Thank you for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.