Python Bytes - #59 Instagram disregards Python's GC (again)

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 59, recorded January 4th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And we got a bunch of awesome stuff lined up for you in this very first episode of 2018. So, let's say thank you and Happy New Year to DigitalOcean. Yeah, thanks and definitely Happy New Year. It's exciting to be back. It's very exciting to be back.

Starting point is 00:00:27 And we, you know, the Python news doesn't stop coming. I think if anything, it's just picking up speed. I'm afraid we might scare people a little bit with some of your picks this time, Brian. What? The stuff near the end. The stuff near the end. So, yeah. Okay.

Starting point is 00:00:42 Another thing that's kind of scary is turning off garbage collection. Seems like that might be bad, right? Right. Well, I was actually surprised and very interested when I was listening to the Instagram talk at PyCon about turning off garbage collection. And there's an article that they put out again. They said that they had turned it off last year, and then they wanted to sort of, they were having memory problems, so they wanted to try to turn it back on a little bit, but they still have concerns. Yeah, so maybe we should take a moment, just a step back and say, you described the original thing. So why did they start down this path of turning off garbage collection in the first place?

Starting point is 00:01:19 What they found was they were running many instances of the largest Django deployment on Python in the world. So they're running lots of servers with us. And they found that the shared memory across multiple processes running that on a single server was completely falling apart because garbage collection was shifting stuff around. They said, well, could we turn it off?

Starting point is 00:01:40 And it turned out that they could, but then, in this article you're referring to, they basically were losing those gains again. And we'd talked about this, I guess, a couple times: if you turn it off, then you eventually will run out of memory. But if you're restarting tasks every once in a while, that completely cleans it up. Yeah, exactly.

Starting point is 00:01:58 They were losing some of those gains, so they wanted to get some of those back. This is really interesting, and I had to read this article about three times, but it's called Copy-on-write friendly Python garbage collection. And it's a pretty interesting story, but the end punchline is that they've got a new addition to Python that's going to go into Python 3.7, or it's already in there. It's called gc.freeze, and what happens is they get their main stuff running with all the shared objects, but before they, like, fork off a bunch

Starting point is 00:02:33 of threads, they call this gc.freeze, and all the stuff that's in memory right now at this point doesn't get garbage collected, but everything from this point in time on will be garbage collected, which is pretty interesting. Yeah, that's really, it's really interesting. So Python memory management is a little, I think it's a little obscure. People don't talk about it very much. And I don't think there's a lot of good write-ups. You actually found a really fantastic write-up on the intricate details of Python memory management. The short version is most things are cleaned up through reference counting. So number of things pointing at it, when that goes to zero, it goes away. But the problem with reference counting is cycles. I

Starting point is 00:03:15 have one object that points at another, that object points back at the first, they both have a count of one or higher forever and they get leaked. And so there's this secondary garbage collection phase that goes through and looks at these items, cleans them up and so on. So this gc.freeze says, let's take all the stuff that exists now and just tell the garbage collector to ignore it. Don't touch it. Don't mess with it. Leave it alone. Right? And so you get, like, basically your app into its, like, normal working state and then freeze it one time. And then all the new stuff that would make the memory grow and grow and grow over time is going to be continually GC'd. But the core essence of your app, Python runtime and a bunch of things to get started should be kind of fixed, right? Yeah. And I think that's a pretty cool idea because that's a common model for applications to get connections up and, and get your normal,

Starting point is 00:04:05 like sitting state, idle state running. And then before you get requests in and, and spawning stuff, just at that point, you're like, well, this is all the shared stuff.

Starting point is 00:04:15 Let's just, we don't need to move this stuff around. It's always going to be there. Anyway, it's a cool idea. And, and apparently it saved them. They were at linear,

Starting point is 00:04:24 linear memory growth and they slowed that down quite a bit. Yeah, it looks really, really interesting. Instagram is doing amazing stuff, I think, in the Python space, in the web space. And if any of those guys are out there listening and want to come talk about Python and Instagram on TalkPython, they're more than welcome to come over. It'll be fun. And I definitely appreciate that. They're very open about this to say,

Starting point is 00:04:47 Hey, this is what we're trying. It's not like perfect yet, but it's better. Yeah. It's super cool. Do you know if gc.freeze is approved or just proposed for 3.7?

Starting point is 00:04:57 So we have a link to the, the pull request, and it looks like it's already in. Oh, it is merged. Yes, it is merged. So this is pretty awesome, right? We have CPython on GitHub with a pull request merged in with its comment history.
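To make the reference-cycle problem and the freeze idea from this segment concrete, here is a minimal sketch, assuming Python 3.7 or later (where gc.freeze, gc.unfreeze and gc.get_freeze_count exist). The Node class and the two helper functions are hypothetical names invented for illustration; this is not Instagram's code, and it does not actually fork any processes.

```python
import gc


class Node:
    """Tiny object that can point at another Node (hypothetical demo class)."""

    def __init__(self):
        self.other = None


def collect_one_cycle():
    """Create an unreachable two-object cycle and return what gc.collect() reports."""
    was_enabled = gc.isenabled()
    gc.disable()                     # keep automatic collection out of the demo
    try:
        gc.collect()                 # clear out any earlier garbage first
        a, b = Node(), Node()
        a.other, b.other = b, a      # a -> b -> a: neither refcount can reach zero
        del a, b                     # no names left, but the cycle keeps both alive
        return gc.collect()          # number of unreachable objects the collector found
    finally:
        if was_enabled:
            gc.enable()


def freeze_demo(count=1000):
    """Freeze the current heap, report the frozen-object count, then undo it."""
    shared = [Node() for _ in range(count)]  # stand-in for state built before forking
    gc.freeze()                              # move all tracked objects to the permanent generation
    frozen = gc.get_freeze_count()           # includes `shared` and everything else alive
    gc.unfreeze()                            # put them back so the demo leaves no trace
    return frozen, gc.get_freeze_count()


found = collect_one_cycle()
frozen, after_unfreeze = freeze_demo()
print("objects found in the cycle:", found)
print("frozen objects:", frozen, "after unfreeze:", after_unfreeze)
```

In a pre-fork server the gc.freeze() call would sit right before the fork, and the Python documentation for gc.freeze suggests disabling the collector early in the parent and re-enabling it in each child. The exact counts printed here vary by Python version, so none are shown.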

Starting point is 00:05:11 That's new, right? That's the 2017 bit of magic that it's on GitHub. Yeah. Yeah, cool. So nice that we can actually track that. So the next thing that I want to talk about is a little bit different. I think this will be mostly of interest for data science folks. This is a little bit lower level maybe than it sounds, but this thing's called SpeechPy. So SpeechPy, it's a library for speech processing and recognition. So this is a pretty interesting Python project. You can come along and basically give it some, you know, spoken words, and it can pull out various effects and things that are sort of the essence of what you need to do speech recognition. I think this works a little, you don't just feed it like here's, say a WAV file, and out pops text of what it said, but it gives you what you would need to feed to a machine learning system,

Starting point is 00:06:05 basically takes the spoken words into a representation you can feed to some kind of algorithm to actually get the text. So I think that was pretty cool. And one of the things that I wanted to bring this up for is they have a really nice citation statement. So if you look at the GitHub repo, like kind of near the top, it says, if you're going to use this package, please cite it as follows. And that's interesting, because there's been some talk in the scientific space, more true science, not data science, around people who want to publish their software, they want to work on advancing software, but in the academic space, you have to publish articles, or the whole publish

Starting point is 00:06:45 or perish type of thing. And the way you get credit for your work is to be cited in other articles. And so this is sort of showing a way to cite this work, which is not a paper, but which is an open source project in the same sense that the person, the people who created it might get the same level of academic credit for their thing being cited. So I think that's pretty cool. Yeah. I don't get the syntax, but... It must mean something. I have no idea what it is. Okay. I thought it's kind of neat. If you're doing machine learning, you need to turn

Starting point is 00:07:20 waveforms into something you can process. This is pretty cool. And the other thing that's kind of nice is if you look at it here, and I think it's in the documentation or the tutorial, they actually show you how to process WAV files from SciPy, which is also maybe cool and handy at some point. Yeah, it's actually something I need to be doing, some WAV file processing. Well, SciPy apparently has it. Nice. How about the next one? Well, next up we've got our friends at, um, PyBites. Is that what they're called? PyBites? Yeah, PyBites, that's right. They've got a new platform, and I suddenly forgot the URL, but there it is. It's code challenges, but the ES is after the dot. So codechalleng.es. No, clever though.

Starting point is 00:08:08 But we've covered other things before. Like there's a, I should have looked this up. There's a game one where they're, like, going through a game and doing code challenges, and there's code katas around. This is a similar sort of thing. So you are able to do these little code challenges, and they say, it's called Bites of Python, Bites of Py, and they're self-contained 20 to 60 minute

Starting point is 00:08:32 code challenges. And you can write them and verify them in the browser. And I had, I did two of them this morning and I had kind of a lot of fun with it. It was fun. Nice. And you verify them by writing pytest unit tests, right? You don't write it. It has pre-written pytest code that checks your answers. I see. So you've got to do some sort of thing and then you check it in and it runs basically the test against your code and says thumbs up, thumbs down. Yeah.

Starting point is 00:08:57 Like for instance, on the second challenge, you have to write three different functions to manipulate a list of names. And it has tests for all of these. I went ahead and just solved one at a time, for instance. So I tried to solve the first one and then ran the test and noticed that the first one passed and then just did that. And looking at that with the help of the test output helped me solve the rest of them. That's really cool.

Starting point is 00:09:22 And I also learned something by the transitive property through you. You did? I did. I learned what you learned, in that min takes a key, like sort and sorted do. That way you could sort some complex object based on, like, an attribute of it.

Starting point is 00:09:36 I didn't know that. I had just discovered that this morning. So my solution for one of the challenges is to try to find the name with the shortest first name. And I went ahead and sorted the list by the length of the first name and then just picked the first element. Their solution uses min instead of sorting the list. You can just find the min length, which is pretty cool. Yeah, that's really awesome.
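The two approaches described here, sorting by first-name length versus calling min with a key, can be sketched as follows. The names list and the first_name_length helper are made up for illustration and are not taken from the PyBites challenge itself.

```python
# Hypothetical sample data; the real challenge uses its own list of names.
names = ["Alexandra Smith", "Bo Jackson", "Christopher Lee"]


def first_name_length(full_name):
    """Return the length of the first word in a full name."""
    return len(full_name.split()[0])


# Approach 1: sort the whole list by first-name length, then take the first item.
by_sorting = sorted(names, key=first_name_length)[0]

# Approach 2: let min() do a single pass using the same key function.
by_min = min(names, key=first_name_length)

print(by_sorting)  # Bo Jackson
print(by_min)      # Bo Jackson
```

Both give the same answer, but min only has to walk the list once, while sorted does the work of ordering every element.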

Starting point is 00:10:03 That's got to be quicker than a full on sort. One of the things I like about these sorts of quick challenges is you can probably do them like on your lunch break or a couple of lunch breaks to do one of them. And, uh, they just take a browser so you could just do it on your laptop. It's pretty fun. Yep. That's cool. You could maybe even do it on an iPad or something if you really wanted. Yeah. Well, I don't know. I haven't tried that probably if it runs in the browser, I bet it would. Nice. So yeah, that's really cool. I do like that you learn these little things like, wait, Min takes a key? I didn't know that. You know, that's just, you wouldn't think you'd pick up these little things so quickly, but you know, these little

Starting point is 00:10:36 challenges are nice like that. So before we get to the next item, I want to say thank you to DigitalOcean. They're sponsoring this episode and many, many other episodes. They're really a big supporter of Python Bytes. So as many of you know, many of our bits of code, our stuff on the web, and our files, our MP3 files that get sent down to you, all go through DigitalOcean. So Python Bytes is basically delivered in all of its forms to you through DigitalOcean. We have a bunch of servers there. They're super easy to work with, very quick, very reliable. You can create a new server, a new droplet, they call it, in probably 30 seconds. And then you SSH in and you're off to the races. So really, really nice and affordable. And check them out at do.co/python and let them know that you heard about it on Python Bytes. So this end of the year thing, Brian, this is kind of when,

Starting point is 00:11:26 I mean, we're sort of on the other side of it, but this is when you get together with your family, right? People maybe you didn't even know, like, wait, I have a second cousin from where? Python's like that, right? Yeah. Yeah. You were talking about, like, what is the place where you can do sort of gamified code challenges, and that's CheckiO. So the reason that I'm coming back to it is there's an article by the guys at CheckiO called How Big Is the Python Family. So this is really nice, and, you know, some of you, I'm sure, are aware of it, but many people, I don't really think, are aware of how varied Python is as sort of a platform. So when you say Python, typically you mean CPython. Hopefully you mean modern Python 3.6,

Starting point is 00:12:10 not legacy 2.7 Python, but we'll let that slide for now. There's also things like Jython, and Jython will let you write Python code, but execute it on the JVM and interact with Java objects. IronPython is the same thing for .NET. There's also Python for .NET, which I think is a more up-to-date, modern variant on the same thing.

Starting point is 00:12:33 There's Cython, which is compiled, slightly different Python. There's PyPy, which is a JIT version. MicroPython, which is Python as an, your app is an operating system in Python on microchips, basically. And on Talk Python, you and I talked about Grumpy,

Starting point is 00:12:49 right? Yeah. Which is on Go. Yeah. So Grumpy is from the YouTube guys, which is instead of using C to implement CPython, they said, well,

Starting point is 00:12:57 what if we wrote the same thing, but in Go? And that's kind of an interesting thing. So I thought this is just a nice grouping of all of these ideas, a quick paragraph or two on each of them. You know, if you're bringing people onto your team and you're like, well, wait a minute, there's actually a lot of types of Python. Here, check this out, right? And also maybe a reminder to, like, give PyPy a try. Like, they just had a big release for both Python 2 and Python 3 versions.

Starting point is 00:13:20 One of the things I like about this write-up that they did is it reminds you why some of these are around. Like if you had to work with .NET, then working with, like, IronPython or Python for .NET might be, like, a better thing than just trying to do it other ways. Yeah, and one of the advantages there might be, you know, if you're working on a .NET app, but you want to add scripting.

Starting point is 00:13:43 Yeah. Like, what are your choices? You probably don't want to give them C#. And even if you did, like, it requires full-on compilation and, like, you know, how do you deal with that? Right. So this could be a really nice way to plug in, like, scriptability into your enterprise app, which would be pretty cool.

Starting point is 00:13:57 And one more thing I wanted to throw in on this conversation is a lot of times I'll say Python runtime. And I know often people say Python interpreter. This is what the Python interpreter does. It does this and that. Well, if you look at how the whole Python family, only some of them are interpreters. Some of them are compiled execution engines, right? Like the JVM. That's actually not a great example. But say PyPy, for example, or Cython, those two definitely are not interpreted in the traditional sense. PyPy starts out that way, but it converts to a JIT version for the hotspots. I often say Python runtime because I kind of feel like, you know, when you say interpreter, you really just got the mindset of CPython, which is the most popular, but not always.

Starting point is 00:14:44 What do you say? Say interpreter? I don't usually say either. I just say Python. Yeah, there you go. Cool. So anyway, I think this is a nice write up and good to have it all in one place. So I like the one that you have coming up next. One of the problems I often see is I want to do some work, but I don't care if it happens right now. I just want to like start it and let it go somewhere. I don't usually have a great answer for that. Task processing stuff. And one of the common things is often people bring up is Celery. And to be honest, I've tried to get into Celery a couple of times, but kind of the learning curve on it, maybe it's just me, but I had a little bit of trouble

Starting point is 00:15:21 getting into it. I was interested when I heard an interview on Podcast.__init__ about a library called Dramatiq, or Dramatic, I'm not sure. It's D-R-A-M-A-T-I-Q. But it's a very, I'm sure, since it's task scheduling, it's quite complicated internals, I'm sure. You just declare an actor on some code, and it's pretty easy to get started. I thought I'd point people to it. Yeah, it's quite cool. You basically put a decorator onto a method

Starting point is 00:15:51 and then that method, instead of running locally, you can, like, send work to it. And that send actually kicks it off on, the example they had was RabbitMQ, I think. And there's, like, a producer of the work. And then there's another process that just hangs out and consumes anything that lands on the queue. It's pretty cool. Yeah, so you can configure that; it defaults to RabbitMQ, I think. And there's just good defaults that work right off the,

Starting point is 00:16:18 just if you don't care. And then you can configure it to use other things if you need to. It apparently is, well, the person that developed this, I forget his name, it's used on quite significant projects. I mean, it isn't a toy project, but it's pretty easy to get started, and you can configure it to be all sorts of fancy stuff if you need it to be. But one of the things I liked about the conversation is he brought up that he intentionally kept the documentation fairly terse and small so that when you're looking for something that you think you saw before, it's pretty easy to

Starting point is 00:16:57 find again. So that's cool. Okay. Yeah. That's an interesting point. Yeah. And it looks like you can run it on top of RabbitMQ or Redis. Take your pick. One final thing I want to point out that I thought was interesting is it's licensed under AGPL, but it also has commercial licenses available upon request, which people are always looking for ways to basically fund their open source work. And I thought that was an interesting variation that I saw going through it. Really? Okay, so I didn't pay attention to that. So I'm not sure what the AGPL is.

Starting point is 00:17:29 Yeah, I actually don't know either. But apparently you might want a commercial license instead. Okay, so the last one I want to talk about is a little bit similar to what you're talking about, running async work. But it's sort of the challenge of taking advantage of async things, but not making that a problem for people trying to consume it who don't want to think of things that way.

Starting point is 00:17:54 So this article is called Controlling Python Async Creep from friend of the show, Cristian Medina. And he says, basically, if you've got some library that is written in an async way, you're supposed to await it, but anybody who's going to call that and take advantage of that, that caller has to also be async, and then the caller of that has to be async, so maybe way, way down somewhere, you're trying to do something async, and it creates this sort of chain reaction of, well, the callers of this have to be async. Well, the callers of those things have to be async and so on. It can become quite a problem. So he wrote this nice article basically going through three examples of where you can sort of put a stopgap and say, okay, like at this level, we're no longer worried about async,

Starting point is 00:18:39 but we're still taking advantage of it internally. So one way you can do that is you can wait for blocks of async code. So if you got to contact, you know, a database, two web services, read something from the file system, you want to do that sort of asynchronously, you could create those pieces of work, but then wait on them as a group. And there's some built-in ways in asyncio to do that, which is really cool. It's got some nice examples on that. So you could just use a thread and then let that thread's main bit of work be the async thing, but you don't have to deal with it. And the most interesting, I think, is one that can be called as an async function or as a regular function, and implements an async behavior or a synchronous behavior accordingly. So you could write a single

Starting point is 00:19:33 library and if somebody in Python 3.6 wants to use it in a fancy async way, it becomes magically async. But if somebody from 2.7 calls it or something like that, an older version, or they just don't call it in this async way, it just magically is a synchronous call and doesn't use that whole stuff. Okay. This is really an interesting way to make it possible to bring async into your package or your libraries without having the consumer of your libraries have to care about the fact that it's async. But still make it into something they could take advantage of. Oh, that's great. I'm going to have to read this.
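The first approach mentioned, running several async pieces of work as a group behind an ordinary function so callers never see async, might look roughly like this. It is a sketch under assumptions, not code from the article: fetch, _gather_all and load_everything are hypothetical names, asyncio.sleep stands in for real I/O, and asyncio.run requires Python 3.7 or later.

```python
import asyncio


async def fetch(name, delay):
    """Pretend to do some I/O (database, web service, file) and return a label."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def _gather_all():
    """Run the three pieces of work concurrently and wait for them as a group."""
    return await asyncio.gather(
        fetch("database", 0.02),
        fetch("web service", 0.01),
        fetch("file system", 0.0),
    )


def load_everything():
    """Plain synchronous entry point: the async code stops creeping here."""
    return asyncio.run(_gather_all())


# The caller is ordinary code with no async or await in sight.
print(load_everything())
# ['database done', 'web service done', 'file system done']
```

asyncio.gather returns results in the order the awaitables were passed in, not the order they finished, which is why the output is stable even though the delays differ. One caveat: asyncio.run cannot be called from code that is already running inside an event loop, so this wrapper only works as a boundary for synchronous callers.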

Starting point is 00:20:07 This reminds me of the, I guess, the learning hurdle that people go through in the C++, C and C++ world when you go from single-threaded applications to multi-threaded applications. You have to look in all the corners. Yeah. It's definitely a mind shift. Yeah, this is very much like that. Okay.

Starting point is 00:20:21 But yeah, Cristian did a great job on this, and I really like his solution at the end. Actually, he has it done in if statements. I feel like you could create a decorator that would basically wrap that up, just like a magic, like, asyncable or awaitable decorator. It's really, really close to having some sort of decorator magic making this even better. Yeah. Okay, cool. All right. Well, that's all our news for the week, except for that it's not. Well, yeah. We have an extra one. Really quick, I just want to let people know that the PyTennessee Conference in Nashville is coming up almost a month from now. So if you are in the Nashville

Starting point is 00:20:55 area or willing to travel there, February 10th and 11th, they've got their schedule out, the tickets are on sale and things like that. And they even made a special discount code for Python Bytes. If we, you know, said, are you going to tell us about it? Then definitely, here's the code. So if you want to go to PyTennessee, you can use the discount code PythonBytes, no spaces, capital P, capital B, and you get 10% off. Cool.

Starting point is 00:21:21 Yeah, very cool. You have some pretty interesting news. It's not directly Python related, but it is very much affects all of us. Yeah. Right. Codes on server, especially in the cloud. I thought I don't know what to do about this, but I saw it this morning. I thought we just it's important enough to not ignore it. So I thought I'd drop a link. What do you think? Like unplug all of the Internet, just go hide in a corner or something like that? It's like one of those things like having the credit services get hacked. You just, I guess, be aware of it and pay attention.

Starting point is 00:21:49 It's very much like the Experian. What was that credit service? Equifax, maybe? Equifax. I'm not going to say it because I don't want to say the wrong one. But the e-credit agency, I totally, for some reason, forgetting. I think you're right. But yeah, basically you're told your world is

Starting point is 00:22:05 crashing down. We're sorry. Moving on now. And this is kind of like that. Let me read from what you quote, a couple articles. Let me read what they said in the New York Times here. It said, basically, there's two problems called Meltdown and Spectre that could allow hackers to steal the entire memory contents of computers, including mobile devices, personal computers, and servers running in so-called cloud computer networks. There's no easy fix for Spectre, which could require a redesign of the processors, according to researchers. As for Meltdown, the software patch needed to fix the issue could slow down computers by as much as 30%. So, you know, your AWS, DigitalOcean, whatever, server may just get 30% slower now. Wonderful.

Starting point is 00:22:46 Yeah. So most of the places, I think Google, Amazon, and Microsoft have all said that the servers are pretty much changed to deal with Meltdown, but Spectre is still a problem. I don't think there's a ton of concrete details here, at least not that I ran across. It's sort of vague. Apparently, not all the details about the exploit are out. But I'd recommend people check out risky.biz, which is my favorite developer security podcast. It's super, super good.

Starting point is 00:23:19 And those guys are going to definitely have an insightful conversation on this next time they're on deck. In case we were too vague about it, it was a design flaw found in all microprocessors that allow attackers to read the entire memory of a computer. Yeah. Bummer. I hope you don't do anything on the internet. Carry on now. Okay. So, yeah.

Starting point is 00:23:44 So the last thing, this is a more positive thing. I think of it at least. I just announced all my courses, not all of them actually, only a few of them for 2018, but I announced this new deal that I'm having for all the Talk Python courses called the Everything Bundle. So talkpython.fm slash everything. And it gets you, it'll be probably 120 hours of Python course awesomeness, including some new ones, Mastering PyCharm, Python 3: An Illustrated Tour,

Starting point is 00:24:12 Introduction to Ansible and tons more coming. So I was just finishing some of the videos for the PyCharm course right before we chatted. So it's almost done. So is that going to be out

Starting point is 00:24:22 this month then or soon? That is going to be out probably next week. Okay, cool. Definitely soon. Definitely soon. So is that going to be out this month then or soon? That is going to be out probably next week. Okay, cool. Definitely soon. Definitely soon. It's so fun to create these courses and just, you know, keep exploring the different areas and helping people get better with them.

Starting point is 00:24:34 So lots of fun. Yeah. And you do things like working with companies if they want to like get access to these for like everybody that works there or a handful of people. I definitely have special programs for like site licenses, things like that. I've even talked to some universities about having the courses for like all of their students or something like that. That would be wild. Still talking. You'll have to increase the price for them, I guess, maybe.

Starting point is 00:24:59 I guess. But they're students, you know. Cool. All right, cool. Well, Brian, thanks for sharing all your news. Yeah, thank you. Nice to be back together after the whole holiday time off. Yes. All right, catch you later. Thank you for listening to Python Bytes.

Starting point is 00:25:15 Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
