Python Bytes - #149 Python's small object allocator and other memory features

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 149, recorded September 18th, 2019. I'm Michael Kennedy. And I am Brian Okken. And this episode is brought to you by Datadog. I'll tell you more about them later. Brian, this first item that you have here, it actually sparked some philosophical sort of challenge to my way of seeing the world here. So why don't you run it by and I'll tell you about my problem. Maybe you can help me through it.

Starting point is 00:00:29 I'm curious about this now. I'm pretty sure we've covered this before, but Dropbox is kind of behind a lot of the push to do different type checking or type hinting and checking those type hints within Python. The mypy project is, I think, spearheaded by Dropbox. Yes. There's an article that they put out called Our Journey to Type Checking 4 Million Lines of Python. Wow, 4 million lines. That's a big code base.

Starting point is 00:00:57 That's a lot of Python. Yeah. I wonder how much of it's interconnected. You know, like you've got all these little utilities and nothing actually depends on it directly. Maybe they depend on the output. On the other hand, there could be like a super complicated

Starting point is 00:01:12 sort of monolith thing. It's interesting to think about that much code. That is a ton. They're leading a lot of stuff, but one of the, I like this. So why? I mean, that's not free. You don't have a huge code base

Starting point is 00:01:23 and move it to type checking. You don't get that for free. So there has to be benefits to this cost. And that's one of the things. So this article does go through some of their story of how they did it. What I really liked is it covered some of the benefits. And this isn't even that surprising. It says experience tells us that understanding code becomes the key to maintaining developer productivity, and that grows with a larger code base.

Starting point is 00:01:51 So without type annotations, basic reasoning such as figuring out what the valid arguments to a function are or the return types, that's a key one for me, becomes kind of a hard problem. And just answering those questions more quickly: what does this function return? Does it return None sometimes? Can it return None? Things like that. These become more and more of a drain as you're looking at a larger code base. I mean, it's definitely true. You spend more time reading code than writing it. So thinking about the types as you're writing it and putting those in place, especially for interfaces to functions, those are an easy win. I like it. They talked about some of the other benefits: that the type checker is actually finding subtle bugs that they wouldn't have caught easily without it. Refactoring becomes easier. And then running the

Starting point is 00:02:41 type checking is faster than running the suite of unit tests. So the feedback can be faster. And I didn't think about that aspect of it. That's pretty interesting to include type checking as part of like a TDD flow. I haven't tried that. That'd be kind of fun. And then one of the things I do know is that the IDEs such as Visual Studio Code and PyCharm allow for better completion and static error checking and a whole bunch of goodies that you get from the IDEs if you have type hints in there. But anyway, the other part of the story that I think they talk about is the improvements to mypy to fit their needs. And so if you like mypy now, it's probably because Dropbox needed it to be really good.

Starting point is 00:03:28 So anyway, it's a good article. I'm a big fan of type hinting and stuff. I think all these things here that you've laid out, I definitely think they're all true. I would say absolutely the biggest one for me is making the IDEs

Starting point is 00:03:41 and the editors just better. When I get the return value of a function that declares its return type, and I hit dot on that variable, boom, there's the list of the things that I can do. I type one or two characters, it auto completes, I just, you know, just flow. And yes, it's in the docs, what comes back from some of these things. Yes, you can go look up what arguments or what operations you can do on them. But if it's one character or two of typing and it's just always there,

Starting point is 00:04:08 it just massively improves what you're doing and your confidence and the speed and it doesn't take you out of that flow. And I really appreciate that aspect of it. One of the things I'm embracing more and more is things that can return multiple types because we definitely can do that in Python. So things that, arguments that can be set to None,

Starting point is 00:04:25 but are either None or a Boolean, or there can be an element or a list of those types of elements. Those sorts of things are great because if they're one of the types most of the time, you don't even really think about making sure that it works for the other one. For sure. So you want to hear my philosophical dilemma? Yeah, I do.
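To make the "None or a Boolean" and "an element or a list of elements" cases concrete, here is a minimal sketch using the standard typing module. The function names normalize_tags and describe_flag are hypothetical, made up for illustration; they are not from the Dropbox article.

```python
from typing import List, Optional, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Accept one tag or a list of tags and always return a list."""
    if isinstance(tags, str):
        return [tags]
    return list(tags)


def describe_flag(enabled: Optional[bool] = None) -> str:
    """Treat None ("not set") differently from an explicit False."""
    if enabled is None:
        return "unset"
    return "on" if enabled else "off"


print(normalize_tags("python"))
print(normalize_tags(["python", "typing"]))
print(describe_flag())
print(describe_flag(False))
```

With annotations like these, a type checker such as mypy can flag a caller that forgets the less common case, which is the situation described above.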

Starting point is 00:04:44 All right. So in that article it says something to the effect of: mypy is an open source project and the core team is employed by Dropbox. One of the people who is doing major work on this project is Guido van Rossum. You know? Yeah, I think he did something in Python, like created things. That's right, he created the language and whatnot. And it wasn't until, gosh, I don't know, well into the 2010s or something like that, that type hinting became a thing in the language. So Python was created, and its sort of core essence is a language without type declarations, right? So here's my philosophical debate. Would Guido have gone back and said,

Starting point is 00:05:26 in 1991, actually, a little bit of type hints should have been how Python originally came into the world? Or is this something that you have to go through? And you're like, oh, it's fine when you have 100 lines of code that don't have any type information. But if you have 4 million, all of a sudden, you're in a bad place with 4 million and hundreds of people working on it. Well, all of a sudden, these types now are super valuable because here he is working explicitly on this thing that, you know, he probably decided not to have in his original language. And there's my dilemma. I think it's the size thing. It's helpful for large projects, for tiny little things it's not. I mean, has it ever bothered you that there are no type declarations in Bash scripts? Yeah, not really, I guess.

Starting point is 00:06:12 I don't do really huge Bash applications. Yeah, there's probably some form of anti-pattern right there, isn't there? Yeah, I don't know. Maybe it's also the tooling, right? Like the editors do a lot more with that information now. It is an interesting question of why didn't it have it to begin with? If someone else was working on this, sure, okay, these are two philosophies, and they kind of come together or don't in different ways.

Starting point is 00:06:33 But it's the same person, right? So that was my thought as I was looking through this article. Yeah. But cool. I'm happy to see them doing it, and I like to bring this sort of stuff into my code as well. I think it makes it better. All right. Well, what do you got for us?

Starting point is 00:06:47 I did mention that we have these editors these days that do so much more than they did in 1991. And namely, this would be PyCharm and Visual Studio Code, right? Those are the two main ones. Obviously, there's others. But these are the main ones that are like super rich, right? Yeah. Our friend Miguel Grinberg decided he was going to put together a cool video about setting up Visual Studio Code to work with a full-fledged Flask application. So with PyCharm, I think it's pretty

Starting point is 00:07:15 straightforward, right? PyCharm kind of is what it is. You go in and, here's the project. I see that here's how I run stuff. It's sort of really clear what you do. There's a lot of stuff going on there and it's really busy, but you can look at it and see what you're supposed to do. With Visual Studio Code, I don't feel that way. I look at it and I go, like, I know that this thing can be configured and adapted to do all this amazing stuff. And it gives me no breadcrumbs or hints on how to even like take that first step.

Starting point is 00:07:43 I'm like, man, I know this thing's cool, probably, but I'm just going to edit this file and go on, right? But this is a video that also has a blog post version from Miguel. And it's actually a follow-up to doing the same thing in PyCharm about a year ago. And I think the reason he did it in PyCharm, even though I just told you how easy it was, is he's doing it in PyCharm Community, which is not officially able to support web development. It's the free version. So he's like, how do you set up a web development project in a thing that's not meant for that or officially configured for that or whatever? Anyway, so it goes through and it sort of walks you through all the steps. And you know what, it's really nice.

Starting point is 00:08:22 And I think that the grand finale, you will appreciate here, Brian. So as I think a lot of people do, so here's what we're going to do, we're going to go set up, we're going to clone the repo and create a virtual environment, we're going to install the requirements and sort of configure environment variables, maybe run some custom flask commands like flask deploy, which initializes the database, or does database migrations and all that kind of stuff in the terminal before we actually get to the editor. And this is how I work as well. How about you? Do you like start from within PyCharm or do you kind of get to it eventually? Oh, no, I, the same thing. I'm, I'm setting up, well, I've got a little extra little hooks to,

Starting point is 00:08:59 to create an environment and activate an environment. Cause I'm, I'm doing that on the command line all the time anyway. Like if I'm going to clone a repo and stuff, I'm just going to do that. So, same, and I have all these aliases and stuff that will like do multiple steps at once and make it a little bit nicer and so on. All right, so all that is in the terminal. But then he says, all right, here's what we're going to do in VS Code. You're going to open the folder, which is the thing you could do in VS Code, and it will automatically find the virtual environment. But in order for all that stuff to happen, you have to encourage Visual Studio Code to go into Python mode. So just open any Python file, and that like activates all the little subsystems that fire up like the environment

Starting point is 00:09:37 variable detection and all that kind of stuff, the virtual environment detection, and so on. And then he says, all right, now what we want to do is, how do you run the thing? So he talks about how to set up a run configuration in the debugger. So you open the debugger tab, add a configuration, and you can actually pick Flask. And it knows all about Flask. It asks you a couple of questions like, well, what's the app.py called, and things like that. So then it'll, you know, set it all up. And then you can run it in the debugger or run it without, and that's pretty nice. And then it says, finally, there's another thing about this UI that, like I said, it's kind of like water, right?

Starting point is 00:10:12 It can be whatever you want, but, you know, you don't look at water and go, I bet that could be like a sculpture of a seal if I froze it and carved it down, right? So- It's our example, but yeah, sure, go on. Yeah, right, like, okay, ice sculptures. So there's another command you can run in VS Code, and this I didn't know about: you can ask it to discover

Starting point is 00:10:30 Python tests. That's nice. Yeah, so you can say discover Python tests and it'll hunt through and find all the tests in your project, and it'll even ask what test framework you want to run. You want to run unittest or pytest or whatever. And then once you do that, like a new UI element sort of pops up and now you can run your tests in a pretty cool runner. So it's about a half hour video. It's good, I think. And there's something really nice about seeing it in action. I'm a big fan of learning through, you know, video stuff as people might imagine, since I put some time and energy into it, but it's one thing to read it. It's another to see just that sort of process gone through and explained step by step. And Miguel does a good job, and I like it.

Starting point is 00:11:10 At the end, he also talks about a limitation of handling crashing Flask applications with a debugger. And he says it's a Flask thing, not a VS Code thing. So you have to do it in both PyCharm and VS Code. But he shows you the little workaround. Yeah, basically you have to stop going through the flask run option and go to the flask.py or app.py, run it, and then override some settings in the run there.

Starting point is 00:11:38 So yeah, it's pretty straightforward, but that's definitely a nice touch as well. Yeah, and then the other thing I wanted to touch on is when he's showing how to run tests in the video, they're just sort of magically running in the background and you don't see what they're doing at the end. He doesn't cover this, but at the bottom of your VS Code window, there's some icons that show you the status of the tests. And if you click on that, that's where you go look at the output and look at the failures and whatever. So yeah, very cool. Nice. So that's a good one. Another thing that I am a big fan of is parallel programming. And you've got a few things on that one for us, huh?

Starting point is 00:12:12 There's an article called Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know. It talked about multiprocessing and threading. It did not talk about async. And I don't know if that's appropriate or not, if async is even something that would be useful for data science or not. Sometimes, not computationally, though. In any case, I liked it because a lot of people from data science are coming into programming, like we know, and they're coming in not as programmers. They're coming in from other fields. So there's a lot of background computer science knowledge that they just don't have. Or, you know, there might be gaps.

Starting point is 00:12:50 So that's one of the reasons why I picked this, because I like it. I like that it talked about some of the basic concepts of parallelism, parallel computing, how to think about it, and it has some diagrams, and then what the difference between multiprocessing and threading is in general. Specifically, threading is within one process, you've got a bunch of stuff going on, and multiprocessing is you get a bunch of processes, but there's trade-offs. And then it also talks about, specifically, that Python has a GIL, so it's a little different. Because of the GIL, it talks about how threads wait on it. You can use either one, but in general, the rule of thumb is for CPU-intensive work, you need multiprocessing.

Starting point is 00:13:35 If you're IO-bound or waiting on users, then threads are fine for that. So the surprising bit to me was the charts and some of the graphs that he has, because he sort of does some benchmarks of code running something on both CPU intensive and IO intensive work and how it speeds up with multi-processing, multi-threading. Obviously throwing more processors at it helps, or more threads. But what surprised me is that the difference between the two wasn't really that great. I thought it would be more pronounced. Basically, if you're not sure which one to use, pick one, and it'll speed up your code.

Starting point is 00:14:21 Interesting, yeah. I kind of thought it would be, even with CPU-intensive stuff, at least with what stuff he was showing, that even multithreading helped speed things up. So I think this is good. And then he goes through a couple of data, specifically data science examples, and shows the code and how to throw multiprocessing

Starting point is 00:14:40 and multithreading at data science problems. That sounds super useful. And the comparisons are interesting. These benchmarks are always so full of landmines and special cases. And I didn't use it that way. So I didn't get the right results that you said. You know, like they're just so tricky to get them right. But it is cool to have them here. I like that a lot.
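The rule of thumb from the article (threads for I/O-bound work, processes for CPU-bound work) can be sketched with the standard library's concurrent.futures. This is a minimal illustration, not the article's benchmark code; io_task, cpu_task, and run_all are hypothetical names, and the sleep is only a stand-in for real I/O.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Type


def io_task(delay: float) -> float:
    """Stand-in for I/O-bound work: sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def cpu_task(n: int) -> int:
    """Stand-in for CPU-bound work: a pure-Python loop holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_all(executor_cls: Type[Executor], func: Callable, inputs: List, workers: int = 4) -> List:
    """Run func over inputs with the given executor class; results keep input order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(func, inputs))


start = time.perf_counter()
results = run_all(ThreadPoolExecutor, io_task, [0.05] * 4)
elapsed = time.perf_counter() - start
print(f"4 I/O tasks on 4 threads took {elapsed:.2f}s")
```

Because both executors share one interface, passing ProcessPoolExecutor to run_all together with cpu_task is the CPU-bound variant; on platforms that spawn new processes, that call should sit under an `if __name__ == "__main__":` guard.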

Starting point is 00:14:59 One thing I would like to throw out there is, you know, a lot of times you have these sort of, I could do it this way or I could do it that way and we'll see what we get. And then sometimes it's this, sometimes it's that. So now you've got to know two APIs and how you combine them. And I'm a big fan of the unsync, U-N-S-Y-N-C, library, which takes the async programming model and applies it to multiprocessing, threads, and async methods and makes it all nice and clean. Just a couple decorators and they're all the same. So do you still have to pick? You have to pick at the implementation level. So imagine you have three functions. One of them is async because it actually implements async in a way it uses that. One is just a regular function you'd

Starting point is 00:15:40 like to run on a thread. One is a regular function. Sorry, one is a function that does computational stuff and one does waiting. So you just put a decorator. You say @unsync on the regular async one, and that will run on asyncio. On the one that's doing waiting stuff that would work for threads, you just say @unsync, and it automatically runs on threads if it's not an async method. On the last one, you would say @unsync(cpu_bound=True). But then once you consume those, the way you program against it, they're all the same regardless of which style it is. So it's like when you define the function, like, oh, this is a CPU-bound one. Oh, this one is actually async. So it just is async and it just adapts. It's a pretty cool library. It's 126

Starting point is 00:16:22 lines of Python in one file. And it does all that to unify all these APIs. It's great. Oh, that's cool. Yeah, so pretty cool. Anyway, yeah, this is really nice and certainly something people want to think about. It's a little bit tricky. We'll see if this is still a discussion in a couple years, right? In Python 3.9, there's talk of maybe using subinterpreters to remove the limitation of the GIL inside of single processes and all sorts of stuff. Eric Snow is working on that. So if they actually got that working, then you'd probably be better off because you can share data better, more richly, and faster within a single process. And it gets

Starting point is 00:16:58 about to get even more crazy. That's a long discussion. How much more do you have to care about blocking and stuff like that? Yeah, it brings all that stuff back in because you don't have the GIL anymore. Actually, with the subinterpreters, they're talking about a mechanism to explicitly share data in a safe way between them. So still, it's faster, though. Okay. Cool. Well, speaking of making things faster, if you're looking at your app and you wonder what's going on, it would be nice to see everything that's

Starting point is 00:17:29 going on across all the layers, across the database, across the web tier, things like that. So you should check out Datadog. They're sponsoring this episode. It's a modern, cloud-scale monitoring platform that brings together metrics and logs and distributed traces all in one place. So it auto-instruments things like Django and Flask and Postgres, which means you get to see everything across all those boundaries. And it helps you optimize your Python apps in just a few minutes. Start monitoring your environment for free and get a sweet Datadog t-shirt. Just visit pythonbytes.fm slash Datadog to get started. Nice. Well, not to be outdone by your async stuff, I also chose the async stuff here.

Starting point is 00:18:10 So remember, we talked about Starlette a little while ago. And Starlette comes from this GitHub organization called Encode, E-N-C-O-D-E. And that place is full of magic. So they have Uvicorn, which is the ASGI server. That's pretty awesome, like Gunicorn, but for async, based on the uvloop event loop, and so on.

Starting point is 00:18:33 And there's Starlette. There's also Django REST framework, and there's HTTPX, which we talked about last time. And the last thing I want to just cover is a few more things in here, because like I said, there's a lot of great stuff. There's a project just simply called ORM, right? We've got SQLAlchemy and the Django ORM. And these guys just said, you know what? We'll just, the term ORM is just free

Starting point is 00:18:54 in Python. So let's just do that, which is an async ORM. And they also have a thing called databases, which adds async support for talking to all these different databases, Postgres and whatnot. So this is a really cool project, especially this ORM one, because it's kind of like SQLAlchemy. And it's actually based on the SQLAlchemy core for building queries. And that gives you a bunch of benefits, right? That means if you already have some stuff that works with SQLAlchemy, to some degree it will be similar. It means that Alembic,

Starting point is 00:19:30 which is the tool to do database migrations on SQLAlchemy, also works with this ORM. So you can automatically just apply Alembic to it. And that's pretty cool. Wow. Yeah, it uses this databases project that I talked about for cross-database async support. And it also has this thing called TypeSystem for data validation, which is pretty cool. I hadn't

Starting point is 00:19:49 heard of that either, but yeah, it's a really sweet async API for working with databases and an ORM. So the way you create the models, it's very similar to SQLAlchemy. It's not identical, but it's similar. And then from there on, you just work with it kind of like you would do normal ORM stuff, right? Like I would say, if I'm working on an album, I might say album.objects.create, or maybe I would do some kind of filter. So I'd say track.objects.filter, and I would do something. But every one of these operations is async. So you just put await in front of it. And if you have something where you've got to scale a whole lot of concurrent data traffic, like say a website, well, this is a pretty good combo.

Starting point is 00:20:36 Okay. So like in the future, will we just have await in front of every other word? Everything. Exactly. So I was going to point out that you've got to be pretty async and await savvy to be doing that. Like there's a lot of awaiting, isn't there? Yeah. I think if you want to work

Starting point is 00:20:52 with this library, you just have to say, we're just going all in on async. And that's the way it goes, right? No, it's good. If you're already working with async, that's when you would think, hey, I wonder if there's an async ORM

Starting point is 00:21:03 that I can use. Yeah. Yeah, it looks good. And I like that it's based on SQLAlchemy Core. That means a big chunk of the database conversation and the table creation and the migrations, all that stuff is already known and proven and working really well. It's just this API that kind of sits around the side of the traditional SQLAlchemy conversation, like directly with the database. I do wish that SQLAlchemy would take this approach. I interviewed Mike Bayer about it a long time ago. And like four years ago, he said, I don't really think it's going to make that big of a difference. But I think it actually would make a huge difference. You just got to think about, you know, what is your goal, right?

Starting point is 00:21:45 If your goal is performance, it probably won't make a big difference. If your goal is scalability, it can make a tremendous difference, right? Are you trying to make an individual user's experience a little bit faster? Or are you trying to make the website not take 10 concurrent users, but 10,000, right? Like it probably might even make it a tiny bit slower for that one person, but it might make that 10 to 10,000 like no big deal. So it depends on what you're after, right? Yeah, definitely.
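The scalability point (each request is no faster, but many can wait at once) can be sketched with plain asyncio. This is not the Encode ORM's API; fake_query is a hypothetical stand-in for an awaited database call, and the sleep simulates waiting on the database.

```python
import asyncio
from typing import List


async def fake_query(user_id: int) -> str:
    """Hypothetical stand-in for an awaited database call."""
    await asyncio.sleep(0.01)  # the event loop serves other requests while this waits
    return f"row-for-user-{user_id}"


async def handle_many(user_ids: List[int]) -> List[str]:
    """Start every query, then await them together; results keep input order."""
    return await asyncio.gather(*(fake_query(uid) for uid in user_ids))


rows = asyncio.run(handle_many(list(range(100))))
print(len(rows), rows[0], rows[-1])
```

All 100 simulated queries overlap their waiting on a single thread, so the total time stays close to one query's wait rather than the sum of all of them.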

Starting point is 00:22:12 Speaking of what you're after, what's next for us? One of the things you might be after is some data on somebody else's website, like through an API. Yes. There's more and more people. And I think it's great kind of doing the data science stuff of people coming into Python and programming from just trying to get their work done. And this is a DataQuest.io blog post called Getting Started with APIs. And it's not getting started writing APIs.

Starting point is 00:22:38 It's getting started consuming them with Python. If you kind of know what all this stuff is, but you haven't really thought about the basics, that's why I picked up this post, because it's really good with the basics. It has a conceptual introduction of what web APIs are versus what a website is, kind of what the differences are, and also why have APIs at all, if people could just store the data in CSV files? That'd be easier, wouldn't it? That'd be amazing. I'd love to live in that world.

Starting point is 00:23:09 No. No, but there are a lot of data sets out there that are just CSV files sitting around. Right. It depends if it's dynamic, right? Right. Dynamic and also if you want to specify it. So with APIs, you can have, you can have parameters to your queries to say, I only want, I only want the data for this user, or they gave an example of Spotify music or something. You don't want to have like all the data for all the songs that Spotify knows about,

Starting point is 00:23:38 but you know, maybe just the songs from a particular artist or something. So things like that are good. But this is actually the first time I've seen this, and they're probably all over the place, but it talked about status codes, especially status codes for GET requests, because that's what we're doing here, retrieving things. And it had a nice list of all the descriptions and things that you might run into for error codes, including like the 301, which isn't necessarily a problem, but you're

Starting point is 00:24:05 getting redirected. So maybe you want to know about that. And then the 400 means something's wrong not on their end, but on your end. The server thinks you made a bad request. So that might be an endpoint that expects data or parameters, but you didn't send any parameters with it. Or you sent an int when it expected a string or whatever. And then it talks about endpoints and endpoints that take query parameters, endpoints being the specific APIs. So we think of a service providing an API, but it's usually not just one API.

Starting point is 00:24:40 It's usually a whole bunch of related different bits of data that you can query together or query separately for different aspects of it. And then, of course, what APIs usually return is JSON data. So it has a little bit of an explanation for what JSON looks like, and then using the json module to convert back and forth between native Python stuff and JSON. And it also talks about requests and a bunch of examples for how to pull this. So if you're getting started trying to pull some data from an API somewhere, this is a good way to get started. It's a nice blend of theory and steps, right?
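As a rough sketch of the two ideas just described, status codes and converting between JSON and native Python, here is a standard-library-only example that makes no network calls. The response body is hypothetical sample data and describe_status is a made-up helper; with the requests library you would read the code from the response object instead.

```python
import json
from http import HTTPStatus

# Hypothetical response body, shaped like what a JSON API might return.
body = '{"artist": "Example Band", "tracks": [{"title": "First"}, {"title": "Second"}]}'


def describe_status(code: int) -> str:
    """Map a numeric status code to a short, human-readable summary."""
    status = HTTPStatus(code)  # raises ValueError for codes the stdlib does not know
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return f"{code} {status.phrase} ({kind})"


data = json.loads(body)                      # JSON text -> dicts, lists, strings
titles = [track["title"] for track in data["tracks"]]
round_trip = json.dumps({"titles": titles})  # Python objects -> JSON text

print(titles)
print(round_trip)
for code in (200, 301, 400, 404):
    print(describe_status(code))
```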

Starting point is 00:25:18 It doesn't just say, well, you open up requests and you do this. It's like, here's what an API is. Here's what the HTTP verbs mean. Here's what those status codes are. Here's how you get to that and, you know, how do you like manifest that in Python and stuff. Yeah, it's nice. Yeah, but it's not at the level of like a college course lecture. It's just enough to get the concepts. Right, exactly. It's not trying to make you read the REST, RESTful, uh, dissertations and things like that. Yeah, I don't even know if it mentions REST,

Starting point is 00:25:45 even though that's what we're talking about. Cool. That's probably a good thing. That was overdone for a while. Now, last thing I want to cover is memory management in Python. This is an article entitled Memory Management in Python, but what it really is is it's a narrow slice, but a common slice of memory management in Python. So you probably don't think about memory very much in Python, huh, Brian?

Starting point is 00:26:08 No, I usually forget about it. Yeah, just forget about it. That's right. So you don't use malloc or free or new or any of these things. Definitely not delete. If you use delete, it means something else, sort of, and things like that, right? Yeah. So I think it's actually pretty interesting that the story of understanding how the runtime experience is in CPython, it's kind of opaque a little bit, right? There's not a lot written about memory management, which is why I decided to pick this thing and talk a little bit about what it covers. Because I think it doesn't really matter that you know this in some sense, right? Like your Python code will still work, but you more closely understand what your code is doing, how that might map over to like CPU architectures and caches and RAM and all that kind of stuff. And, you know, just having a high level understanding of that's good. Yeah. So here's a pretty deep detailed article, not too long,

Starting point is 00:27:02 get to it pretty quick, about memory management in Python. But it only covers, like I said, a little bit. It's really about how small object allocation and deallocation happen in Python. It doesn't talk about the GIL, which is about thread safety and memory allocation. It doesn't talk about reference counts. It doesn't talk about garbage collection for cycles, or much else. So it's all about small objects. But most things we make in Python are small objects. Even when they're big, they're really just a bunch of small things all pointed at each other, right? So if I've got like a list of a million items, and each of those items is 10 bytes, I don't have 10 million bytes.

Starting point is 00:27:42 I have this big list with a bunch of things. But then each one of those is a pointer out to its actual thing that it is, right? Even when you have strings, or even numbers, right? A lot of languages, numbers are allocated on the stack, and treated as value types and stuff. But you know, everything is an object. So every little thing that you make has to get allocated and deallocated. So understanding how these small objects get allocated, that's, that's pretty interesting. So that's what this article talks about. So I'll try to like summarize some of the stuff covered there. One of the

Starting point is 00:28:13 problems you have with memory allocation is that memory can get super fragmented, right? If I just allocate a bunch of stuff and then delete it and keep allocating it and just, just let that grow, you know, just keep adding on the end, wherever the memory is, and I want to interact with that, that can really mess up reading from RAM and getting stuff in cache to be high performance and stuff like that, right? So what Python does is it actually pre-allocates these little 256K chunks, and then it partitions those up and it plunks the small objects into those spaces, and then will potentially take them back out and then reuse those spaces that it had already allocated when it needs to make a new small thing. Okay. Okay. All right. So that's

Starting point is 00:28:58 supposed to help with memory optimization, the locality stuff, the fragmentation, and so on. So there's a special memory manager in Python called PyMalloc. On top of the general-purpose allocator, like C's malloc, there's a Python allocator, right? So there are these layers: we have RAM, we have the operating system's virtual memory management, we have C's malloc, we have this PyMem, PyMalloc thing, we have the Python object allocator that then figures out where to place these things, and then we actually have object memory. So there's a lot of stuff going on here. And they break it into three levels of organization. Okay, so for small objects, which are things that are individually 512 bytes or smaller, right, not like maybe a list that has a bunch of stuff, but each little bit smaller,

Starting point is 00:29:45 right? So those are the things we're talking about. And what happens is it gets broken into these three things called the block, the pool, and the arena. So a block is a chunk of memory of a certain size, and it only holds Python objects of a certain size. So maybe there's a block that holds 16-byte Python things. Okay. That's weird. Yeah. So the reason is that Python then knows exactly how to fill up and then reuse those blocks.
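To make the "blocks of a certain size" idea concrete, here is a minimal sketch in Python, not CPython's actual C code. It assumes the figures quoted in this episode (block sizes in steps of 8 bytes, up to 512 bytes); the names `ALIGNMENT`, `SMALL_THRESHOLD`, and `size_class` are hypothetical, and newer CPython builds may use a different step size.

```python
# Assumed figures taken from the episode, not read from CPython itself.
ALIGNMENT = 8          # block sizes go up in steps of 8 bytes
SMALL_THRESHOLD = 512  # bigger requests are not "small objects"


def size_class(nbytes):
    """Return the block size a request of nbytes would be rounded up to.

    Returns None when the request is not a small-object request.
    """
    if nbytes <= 0 or nbytes > SMALL_THRESHOLD:
        return None
    # Round up to the next multiple of ALIGNMENT.
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


for request in (1, 8, 9, 100, 512, 513):
    print(request, "->", size_class(request))
```

Rounding every request up to one of a fixed set of sizes is what lets a freed slot be reused by any later request of the same size class.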

Starting point is 00:30:17 Oh, yeah. Right? So if it's like, oh, I'm going to get a bunch of numbers, all the numbers are the same size unless they become utterly huge. So we can just allocate them into a spot, some of those numbers go away, we get another number, we drop that new number in right there, which we then point to, and so on. So there are these different blocks, each one a uniform size between 8 and 512 bytes. And then the blocks

Starting point is 00:30:40 are managed by this thing called a pool, which is usually limited to a memory page size, so four kilobytes. And then the pools are managed by these things called arenas. And these are the things that are allocated on the heap. I believe they are 256K pieces of memory, which hold 64 pools, which hold some number of blocks and things like that, right? So there's this really intricate way in which memory is grouped together and then also reused without reallocating it from the operating system. Okay.
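The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch using the sizes mentioned in the episode (256K arenas, 4K pools); `max_blocks_per_pool` and its `header_bytes` parameter are hypothetical names, and the header size passed in the last call is an assumed figure for illustration, since a real pool spends some bytes on bookkeeping.

```python
# Sizes as quoted in the episode.
ARENA_SIZE = 256 * 1024  # bytes per arena
POOL_SIZE = 4 * 1024     # bytes per pool (one memory page)

pools_per_arena = ARENA_SIZE // POOL_SIZE


def max_blocks_per_pool(block_size, header_bytes=0):
    """Upper bound on blocks of block_size that fit in one pool.

    header_bytes is an assumed per-pool bookkeeping overhead.
    """
    return (POOL_SIZE - header_bytes) // block_size


print("pools per arena:", pools_per_arena)
print("16-byte blocks per pool (no header):", max_blocks_per_pool(16))
print("512-byte blocks per pool (no header):", max_blocks_per_pool(512))
print("16-byte blocks per pool (48-byte header assumed):", max_blocks_per_pool(16, 48))
```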

Starting point is 00:31:15 Right? So even though Python might new up a bunch of objects, it actually says, well, but we already have this block that holds things of that size, and there are some spots in there, so let's fill that bad boy up. Oh, all right. Yeah. Anyway, so it's pretty interesting how all this stuff is working together. But that's the Python small object allocator.
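The "fill up the spots we already have" behavior can be illustrated with a toy model. This is a simplified sketch and not how CPython implements it: `ToyPool` is a hypothetical class that tracks slot numbers instead of real memory, and it reuses the most recently freed slot before touching a never-used one.

```python
class ToyPool:
    """Toy pool of fixed-size slots that prefers reusing freed slots."""

    def __init__(self, block_size, pool_size=4096):
        self.block_size = block_size
        self.capacity = pool_size // block_size
        self.next_unused = 0   # index of the next never-used slot
        self.free_slots = []   # slots handed out earlier and since freed

    def allocate(self):
        """Return a slot index, or None when the pool is full."""
        if self.free_slots:
            return self.free_slots.pop()  # reuse a freed slot first
        if self.next_unused < self.capacity:
            slot = self.next_unused
            self.next_unused += 1
            return slot
        return None  # a real allocator would look for another pool here

    def free(self, slot):
        """Mark a slot as available for reuse."""
        self.free_slots.append(slot)


pool = ToyPool(block_size=16)
a, b, c = pool.allocate(), pool.allocate(), pool.allocate()
pool.free(b)         # the middle slot becomes available again
d = pool.allocate()  # lands in the slot b used to occupy
print(a, b, c, d)
```

The point of the design is that no new memory is requested from the operating system while a pool of the right size class still has free slots.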

Starting point is 00:31:33 Never thought about that before. But kind of interesting. Also, I'm trying to visualize like a sports arena with 64 swimming pools in it. That's not a bad one. And then each pool is filled with exactly the same size people or creatures swimming around, something like that. Yeah. Yeah, there you go.

Starting point is 00:31:49 That makes a lot of sense. The first part of it totally made sense. The last bit, maybe not so much. All right. Well, anyway, what I like about this article is it seems like it has a lot of stuff like, here's the actual C code that defines what an arena is. Here you can see it's like a doubly linked list and how it all fits together. And it's just got some good analysis.

Starting point is 00:32:08 So have a look if you've wondered about this. All right. Well, that's it for our main items. I know, Brian, you have big news for the entire world if they live near Portland. If they live in Portland or really close to Portland. Or want to come to Portland. On September 26, I'll be speaking downtown at the Portland Python user group, and I'm still working on my talk, but I'll

Starting point is 00:32:31 be there. That'll be fun. And then I'll probably polish it more, and people have to volunteer for this other talk. So on October 6th, it's the inaugural first day of, uh, meeting of Python PDX West. So we've got a new user group for Python in town. I'm hosting it along with you. Yeah, it'd be fun. I'm really looking forward to it. Yeah, and you'll be speaking there. I will, and I'm trying to get other people to volunteer to speak, and if they don't, then it'll just be you and me speaking, but I think it'll be fun. So we've got a bunch of people signed up so far, so it's filling up fast. People should sign up. That's cool.

Starting point is 00:33:08 Maybe we could do a live Python Bytes sometime there as well at the end of the day or something. Who knows? That's a great idea. Yeah, we could have. Maybe not Tuesday, October 6th, but maybe someday we can make that happen. Maybe someday, yeah. Yeah, that's great news. If you happen to be around, definitely drop in.

Starting point is 00:33:23 That'd be great. It's on meetup.com, right? People can just sign up there. Yep, and a link in the show notes. Do you have any intention of recording, live casting, or otherwise spreading this in a farther path? It's not a bad idea. We don't have anything like that set up right away.

Starting point is 00:33:38 In the future, maybe we could do that. Probably people would be interested in watching these. But I also want to make it really accessible to people that are new to presenting as well. I'd love to have people come in and do like a talk that they're working on. It's not quite polished yet. I want it to not just be experts talking to everybody else, but I'd like it to be people working out things that they're just interested in. So I think it'd be good. Yeah, that sounds like a great philosophy for it. How about you?

Starting point is 00:34:06 Any extras? I have a couple. Presenting and speaking at PyCon 2020, which is a little earlier this year. I believe it's like in April or something. The website's up. Yeah. Yeah, so April 15th to 23rd.

Starting point is 00:34:19 So the call for proposals is now open for PyCon 2020. So if you would like a talk of yours to be considered there, then now is the time. Yeah, go ahead and submit those because you know you're only going to spend like a week writing it up anyway. So, may as well get that done right away. That's right. Do it like a Band-Aid, stop worrying about it, just get it over with. Yeah. Pull it right off. All right. Another thing, have you heard of GitBook?

Starting point is 00:34:42 Yeah, but I haven't really looked into it much. I hadn't either. I was interviewing the guy, Joe, from Masonite, the Masonite web framework. And I noticed that Masonite's documentation is written in GitBook. And so I looked at it and GitBook is pretty interesting. You can use it as kind of like almost a Basecamp project management type thing. So stuff, personal notes or things you want to track or stuff like that. But you can also use it for documentation and knowledge bases and whatnot. So it looked pretty cool. And so I thought I'd just, you know, let people know that it's out there. It's free for small teams, like with some limitations. It's

Starting point is 00:35:21 a little bit of money for non-trivial small teams, like $7 per user, but it's also free for open source and nonprofit teams, which is kind of cool. So I'm also a big fan of Read the Docs. So, you know, I'm not saying people shouldn't use that, but here's an interesting project that I ran across that I hadn't heard of. It looks nice. If people for some reason are opposed to Read the Docs, I don't know why you would be. Or if you just like this look better, here's another opportunity. So good to have options. Good to have options. Also good to have laughs.

Starting point is 00:35:50 Yeah, let's do some jokes. All right. How about you go first? Okay. So I pulled these out of a list of dad jokes you had posted somewhere on our Trello, but changed it a little bit. So what do you call a 3.14 foot long snake? I don't know.

Starting point is 00:36:05 Well, that would be a python, of course. With the Greek symbol: π-thon, yeah, π-thon. Yeah. So if it's not feet, but 3.14 inches, then what is it? It's a micropython. It's a micropython, a mu-python. Yeah, I feel like we're back in calculus or physics. Yeah.

Starting point is 00:36:22 So do you want to do some of these? Sure. So why doesn't Hollywood make more big data movies? I don't know. Why? NoSQL. This last one, it's a little bit crass. It's, I don't know, it's a little low level, but I'll see what I can do here.

Starting point is 00:36:35 So why didn't the angle bracket div get invited to the dinner party? I don't know. Why? It had no class. Oh, yeah. That's a good one. All right. Well, thanks for throwing those in there.

Starting point is 00:36:47 These are fun. Yeah. Thank you once again for talking with me on a nice Wednesday. Absolutely. See you later. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes.

Starting point is 00:36:57 That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
