Python Bytes - #149 Python's small object allocator and other memory features
Episode Date: September 25, 2019
Topics covered in this episode: Dropbox: Our journey to type checking 4 million lines of Python; Setting Up a Flask Application in Visual Studio Code; Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know; ORM - async ORM; Getting Started with APIs; Memory management in Python; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/149
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 149, recorded September 18th, 2019.
I'm Michael Kennedy.
And I am Brian Okken.
And this episode is brought to you by Datadog.
I'll tell you more about them later.
Brian, this first item that you have here, it actually sparked a sort of philosophical challenge to my way of seeing the world. So why don't you run
it by us and I'll tell you about my problem. Maybe you can help me through it.
I'm curious about this now. I'm pretty sure we've covered this before, but Dropbox is kind of
behind a lot of the push to do different type checking or type hinting and checking those type
hints within Python. The mypy project is, I think, spearheaded by Dropbox.
Yes.
There's an article that they put out called
Our Journey to Type Checking 4 Million Lines of Python.
Wow, 4 million lines.
That's a big code base.
That's a lot of Python.
Yeah.
I wonder how much of it's interconnected.
You know, like you've got all these little utilities
and nothing actually depends on it directly.
Maybe they depend on the output.
On the other hand,
there could be like a super complicated
sort of monolith thing.
It's interesting to think about that much code.
That is a ton.
They're leading a lot of stuff,
but one of the, I like this.
So why?
I mean, that's not free.
You don't have a huge code base
and move it to type checking.
You don't get that for free.
So there has to be benefits to this cost.
And that's one of the things.
So this article does go through some of their story of how they did it.
What I really liked is it covered some of the benefits.
And this isn't even that surprising.
It says experience tells us that understanding code becomes the key to maintaining developer productivity, and that grows with a larger code base.
So without type annotations, basic reasoning such as figuring out what the valid arguments to a function are, or the return types, that's a key one for me, becomes kind of a hard problem. And just answering those questions
more quickly: what does this function return? Does it return None sometimes? Can it
return None? Things like that. These become more and more of a drain as you're looking at a larger
code base. I mean, it's definitely true. You spend more time reading code than writing it. So
thinking about the types as you're writing it and putting
those in place, especially for interfaces to functions, those are an easy win. I like it.
They talked about some of the other benefits that the type checker is actually finding subtle bugs
that they wouldn't have caught easily without it. Refactoring becomes easier. And then running the
type checking is faster than running the suite of unit tests.
So the feedback can be faster. And I didn't think about that aspect of it. That's pretty
interesting to include type checking as part of like a TDD flow. I haven't tried that. That'd be
kind of fun. And then one of the things I do know is that IDEs such as Visual Studio Code and PyCharm allow for better completion and static
error checking and a whole bunch of goodies that you get from the IDEs if you have type
hints in there. But anyway, the other part of the story that I think they talk about is the
improvements to mypy to fit their needs. And so if you like mypy now, it's probably because Dropbox
needed it to be really good.
So anyway, it's a good article.
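To make the "can it return None?" question concrete, here is a minimal sketch of an annotated function. The names `_USERS`, `find_username`, and `greeting` are hypothetical and exist only for illustration; they are not from the Dropbox article.

```python
from typing import Optional

# Hypothetical lookup table, used only for illustration.
_USERS = {1: "ada", 2: "guido"}


def find_username(user_id: int) -> Optional[str]:
    """Return the username for user_id, or None if it is unknown."""
    return _USERS.get(user_id)


def greeting(user_id: int) -> str:
    """Build a greeting, handling the case where the user is unknown."""
    name = find_username(user_id)
    if name is None:
        # Without this check, a type checker such as mypy would flag
        # name.title() below, because name may be None.
        return "Hello, stranger"
    return "Hello, " + name.title()
```

The `Optional[str]` return annotation answers the question at the call site, without reading the function body, and it is also what lets an editor offer string completions on `name` after the `None` check.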
I'm a big fan of type hinting and stuff.
I think all these things here
that you've laid out,
I definitely think they're all true.
I would say absolutely
the biggest one for me
is making the IDEs
and the editors just better.
When I get the return value of a function
that declares its
return type, and I hit dot on that variable, boom, there's the list of the things
that I can do. I type one or two characters, it autocompletes, and I just, you know, just flow.
Yes, it's in the docs what comes back from some of these things. Yes, you can go look
up what arguments they take or what operations you can do on them. But if it's one or two characters of typing
and it's just always there,
it just massively improves what you're doing
and your confidence and the speed
and it doesn't take you out of that flow.
And I really appreciate that aspect of it.
One of the things I'm embracing more and more
is things that can return multiple types
because we definitely can do that in Python.
So things like arguments that can be set to None,
but are either None or a Boolean,
or where there can be a single element or a list of those types of elements.
Those sorts of things are great because if they're one of the types,
most of the time,
you don't even really think about making sure that it works for the other one.
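As a rough sketch of the "one element or a list of them, or None" case, the function below accepts all three shapes and normalizes them. The name `normalize_tags` and its parameter are hypothetical, chosen only for this example.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str], None] = None) -> List[str]:
    """Accept None, a single tag, or a list of tags; always return a list."""
    if tags is None:
        return []
    if isinstance(tags, str):
        # The single-element case is the one that is easy to forget.
        return [tags]
    return list(tags)
```

Spelling out the `Union` makes each accepted shape visible in the signature, so a reader or a type checker can see which branches the body has to cover.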
For sure.
So you want to hear my philosophical dilemma?
Yeah, I do.
All right.
So in that article it says something to the effect of: mypy is an open source project and the core
team is employed by Dropbox. One of the people who is doing major work on this project is Guido van
Rossum, you know?
Yeah, I think he did something in Python, like created it, things like that.
That's right, he
created the language and whatnot. And it wasn't until, gosh, I
don't know, well into the 2010s or something like that, that type hinting became a thing in the
language. So Python was created, and its sort of core essence is a language without type declarations,
right? So here's my philosophical debate. Would Guido have gone back and said,
in 1991, actually, a little bit of type hints should have been how Python originally came into
the world? Or is this something that you have to go through? And you're like, oh, it's fine when
you have 100 lines of code that don't have any type information. But if you have 4 million,
all of a sudden, you're in a bad place with 4 million and hundreds of people working on it. Well, all of a sudden,
these types now are super valuable because here he is working explicitly on this thing that,
you know, he probably decided not to have in his original language. And there's my dilemma.
I think it's the size thing. It's helpful for large projects, for tiny little things it's not. I mean, has it ever bothered you that there are no type declarations in Bash scripts?
Yeah, not really, I guess.
I don't do really huge Bash applications.
Yeah, there's probably some form of anti-pattern right there, isn't there?
Yeah, I don't know.
Maybe it's also the tooling, right?
Like the editors do a lot more with that information now.
It is an interesting question of why didn't it have it to begin with?
If someone else was working on this, sure, okay, these are two philosophies,
and they kind of come together or don't in different ways.
But it's the same person, right?
So that was my thought as I was looking through this article.
Yeah.
But cool.
I'm happy to see them doing it, and I like to bring this sort of stuff into my code as well.
I think it makes it better.
All right.
Well, what do you got for us?
I did mention that we have these editors these days that do so much more than they did in 1991.
And namely, this would be PyCharm and Visual Studio Code, right?
Those are the two main ones.
Obviously, there's others.
But these are the main ones that are like super rich, right?
Yeah.
Our friend Miguel Grinberg decided he was going to put together a cool video about setting up Visual
Studio Code to work with a full-fledged Flask application. So with PyCharm, I think it's pretty
straightforward, right? PyCharm kind of is what it is. You go in and, here's the project,
here's how I run stuff. It's sort of really clear what you do.
There's a lot of stuff going on there and it's really busy, but you can look at
it and see what you're supposed to do. With Visual Studio Code,
I don't feel that way.
I look at it.
I go like, I know that this thing can be configured and adapted to do all this amazing stuff.
And it gives me no breadcrumbs or hints on how to even like take that first step.
I'm like, man, I know this thing's
cool, probably, but I'm just going to edit this file and go on, right? But this is a video that
also has a blog post version from Miguel. And it's actually a follow-up to doing the same thing in
PyCharm about a year ago. And I think the reason he did it in PyCharm, even though I just told you
how easy it was, is he's doing it in PyCharm Community, which is not officially able to support web
development. It's the free version. So he's like, how do you set up a web development project in a
thing that's not meant for that or officially configured for that or whatever? Anyway, so it
goes through and it sort of walks you through all the steps. And you know what, it's really nice.
And I think that the grand finale, you will appreciate here, Brian. So as I think a lot of people do, here's what we're going to do:
we're going to clone the repo and create a virtual environment,
we're going to install the requirements and sort of configure environment variables,
maybe run some custom Flask commands like flask deploy, which initializes the database
or does database migrations and all that kind of
stuff, in the terminal before we actually get to the editor. And this is how I work as well.
How about you? Do you like start from within PyCharm or do you kind of get to it eventually?
Oh, no, the same thing. I'm setting up, well, I've got a few extra little hooks to
create an environment and activate an environment, because I'm doing that on
the command line all the time anyway. Like if I'm going to clone a repo and stuff, I'm just going to do that.
So same. And I have
all these aliases and stuff that will, like, do multiple steps at once and make it a little bit
nicer and so on. All right, so all that is in the terminal. But then he says, all right, here's what
we're going to do in VS Code. You're going to open the folder, which is the thing you can do in VS
Code, and it will automatically find the virtual environment. But in order for all
that stuff to happen, you have to encourage Visual Studio Code to go into Python mode. So just open
any Python file, and that like activates all the little subsystems that fire up like the environment
variable detection and all that kind of stuff, the virtual environment detection, and so on.
And then he says, all right, now what we want to do is, how do you run the thing? So he talks about how to set up a run configuration in the debugger. So you open
the debugger tab, add a configuration, and you can actually pick Flask. And it knows all about Flask.
It asks you a couple of questions like, well, what's the app.py called, and things like that.
So then it'll, you know, set it all up. And then you can run it in the debugger or run it without, and that's pretty nice.
And then it says, finally,
there's another thing about this UI that,
like I said, it's kind of like water, right?
It can be whatever you want,
but, you know, you don't look at water and go,
I bet that could be like a sculpture of a seal
if I froze it and carved it down, right?
So-
It's our example, but yeah, sure, go on.
Yeah, right, like, okay, ice sculptures. So there's
another command you can run in VS Code, this I didn't know about: you can ask it to discover
Python tests.
That's nice.
Yeah, so you can say discover Python tests and it'll hunt through
and find all the tests in your project. And it'll even offer which test framework you want to
run: do you want to run unittest or pytest or whatever? And then once you do that, a new UI element sort of pops up and now you can run your tests in a pretty cool runner. So it's about
a half hour video. It's good, I think. And there's something really nice about seeing it in action.
I'm a big fan of learning through, you know, video stuff as people might imagine, since I put some
time and energy into it, but it's one thing to read it. It's another to see just that sort of process gone through
and explain step by step.
And Miguel does a good job, and I like it.
At the end, he also talks about a limitation of handling
crashing Flask applications with a debugger.
And he says it's a Flask thing, not a VS Code thing.
So you have to do it in both PyCharm and VS Code.
But he shows you the little workaround.
Yeah, basically you have to stop going through the flask run option
and go to the flask.py or app.py, run it,
and then override some settings in the run there.
So yeah, it's pretty straightforward, but that's definitely a nice touch as well.
Yeah, and then the other thing I wanted to touch on is
when he's showing how to run tests in the video,
they're just sort of magically running in the background and you don't see what they're doing at the end. He doesn't cover this, but at the bottom of the screen or at the
bottom of your VS Code window, there are some icons that show you the status of the tests.
And if you click on that, you can go, that's where you go look at the output and look at
the failures and whatever. So yeah, very cool. Nice. So that's a good one. Another thing that I am a big fan of is
parallel programming. And you've got a few things on that one for us, huh?
There's an article called Multiprocessing vs. Threading in Python: What Every Data
Scientist Needs to Know. It talked about multiprocessing and threading. It did not
talk about async.
And I don't know if that's appropriate or not, or if async is even something that would be useful for data science.
Sometimes, not computationally, though.
In any case, I liked it because a lot of people from data science are coming into programming.
Like we know, they're coming in not as programmers.
They're coming in from other fields. So there's a lot of background computer science knowledge that they just don't have.
Or, you know, there might be gaps.
So that's one of the reasons why I picked this because I like it.
I like that it talked about some of the basic concepts of parallelism and parallel computing, how to think about it, with some diagrams, and then what the difference between multiprocessing and threading
is in general. Specifically, threading is where within one process you've got a bunch of stuff
going on, and multiprocessing is where you get a bunch of processes. But there are trade-offs. And then it
also talks about specifically that Python has a GIL, so it's a little different. Because of the GIL, it talks about how threads wait on each other.
You can use either one, but in general,
the general rule of thumb is: for CPU-intensive work,
you need multiprocessing.
If you're IO-bound or waiting on users,
then threads are fine for that.
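The rule of thumb can be sketched with the standard library's `concurrent.futures` module, which gives thread pools and process pools the same interface. This is an illustrative sketch, not the article's benchmark code; `cpu_task`, `io_task`, and `run_all` are hypothetical names.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n: int) -> int:
    """CPU-bound: pure-Python arithmetic that holds the GIL while it runs."""
    return sum(i * i for i in range(n))


def io_task(delay: float) -> float:
    """I/O-bound stand-in: sleeping releases the GIL, like a network wait."""
    time.sleep(delay)
    return delay


def run_all(executor_cls, func, inputs, workers=4):
    """Run func over inputs on the given executor class and collect results."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(func, inputs))
```

For the waiting workload, `run_all(ThreadPoolExecutor, io_task, [0.1] * 4)` lets the sleeps overlap. For the computing workload, `run_all(ProcessPoolExecutor, cpu_task, [200_000] * 4)` spreads the work across processes; on platforms that spawn worker processes, that call belongs under an `if __name__ == "__main__":` guard.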
So the surprising bit to me was the charts and some of the graphs that
he has, because he sort of does some benchmarks of code running something on both CPU intensive
and IO intensive work and how it speeds up with multi-processing, multi-threading. Obviously throwing more processors at it helps, or more threads.
But what surprised me is that the difference between the two wasn't really that great.
I thought it would be more pronounced.
Basically, if you're not sure which one to use, pick one, and it'll speed up your code.
Interesting, yeah.
I kind of thought it would be, even with CPU-intensive stuff,
at least with what stuff he was showing,
that even multithreading helped speed things up.
So I think this is good.
And then he goes through a couple of data,
specifically data science examples,
and shows the code and how to throw multiprocessing
and multithreading at data science problems.
That sounds super useful. And the comparisons are interesting.
These benchmarks are always so full of landmines and special cases.
And I didn't use it that way.
So I didn't get the right results that you said.
You know, like they're just so tricky to get them right.
But it is cool to have them here.
I like that a lot.
One thing I would like to throw out there is, you know, a lot of times you have these
sort of, I could do it this way or I could do it that way and we'll see what we get.
And then sometimes it's this, sometimes it's that.
So now you've got to know two APIs and how you combine them.
And I'm a big fan of the unsync, U-N-S-Y-N-C, library, which takes the async programming model and applies it to multiprocessing, to threads, and to async methods, and makes it all nice and clean.
Just a couple of decorators and they're all the same.
So do you still have to pick?
You have to
pick at the implementation level. So imagine you have three functions. One of them is async
because it actually implements async in a way that uses that. One is just a regular function you'd
like to run on a thread because it does waiting. And one is a function that does
computational stuff. So you just put a decorator, you say @unsync,
on the regular async one, and that will run on asyncio. On the one that's doing waiting stuff that
would work for threads, you just say @unsync, and it automatically runs on a thread if it's not
an async method. On the last one, you would say @unsync(cpu_bound=True). But then once you
consume those, the way you program against it, they're all the same regardless of which style
it is. So it's when you define the function that you say, oh, this is a CPU-bound one, or oh, this one
is actually async. So it just is async and it just adapts. It's a pretty cool library. It's 126
lines of Python in one file. And it does all that to unify all these APIs. It's great.
Oh, that's cool.
Yeah, so pretty cool. Anyway, yeah, this is really nice and
certainly something people want to think about. It's a little bit tricky. We'll see if this
is still a discussion in a couple of years, right? In Python 3.9 there's talk of maybe using subinterpreters
to remove the limitation of the GIL inside of single processes and all sorts of stuff.
Eric Snow is working on that. So if they actually got that working, then
you'd probably be better off, because you can share data better, more richly, and
faster within a single process. And it's
about to get even more crazy. That's a long discussion.
How much more do you have to care about blocking and stuff like that?
Yeah, it brings all that stuff back in because you don't have the GIL anymore.
Actually, with the subinterpreters, they're talking about a mechanism to explicitly share data in a safe way between them.
So still, it's faster, though.
Okay.
Cool.
Well, speaking of making things faster, if you're looking at your app and you wonder what's going on, it would be nice to see everything that's
going on across all the layers, across the database, across the web tier, things like that. So you should
check out Datadog. They're sponsoring this episode. It's a modern, cloud-scale
monitoring platform that brings together metrics and logs and distributed traces all in one place. So it
auto-instruments things like Django and Flask and Postgres, which means you get to see everything across
all those boundaries. And it helps you optimize your Python apps in just a few minutes. Start
monitoring your environment for free and get a sweet Datadog t-shirt. Just visit pythonbytes.fm
slash Datadog to get started.
Nice. Well, not to be outdone by your async stuff,
I also chose the async stuff here.
So remember, we talked about Starlette a little while ago.
And Starlette comes from this GitHub organization
called Encode, E-N-C-O-D-E.
And that place is full of magic.
So they have Uvicorn, which is the ASGI server.
That's pretty awesome, like Gunicorn,
but for async, based on the uvloop
event loop, and so on.
And there's Starlette.
There's also Django REST framework,
but there's HTTPX, which we talked about last time.
And the last thing I want to just cover
is a few more things in here,
because like I said, there's a lot of great stuff,
is there's a project just simply called ORM, right? We've got SQLAlchemy and
the Django ORM. And these guys just said, you know what? The term ORM is just free
in Python. So let's just do that, which is an async ORM. And they also have a thing called
databases, which adds async support for talking to all
these different databases, Postgres and whatnot. So this is a really cool project, especially this
ORM one, because it's kind of like SQLAlchemy. And it's actually based on SQLAlchemy Core
for building queries. And that gives you a bunch of benefits, right? That means if you already have
some stuff that works with SQLAlchemy,
to some degree it will be similar.
It means that Alembic,
which is the tool to do database migrations
on SQLAlchemy, also works with this ORM.
So you can automatically just apply Alembic to it.
And that's pretty cool.
Wow.
Yeah, it uses this databases project
that I talked about for cross-database async support. And it also
has this thing called TypeSystem for data validation, which is pretty cool. I hadn't
heard of that either, but yeah, it's a really sweet async API for working with
databases and ORMs. So the way you create the models, it's very similar to SQLAlchemy. It's
not identical, but it's similar. And then from there on, you just work with it kind of like you would do normal ORM stuff, right? Like I would
say, if I'm working on an album, I might say Album.objects.create, or maybe I would do some
kind of filter. So I'd say Track.objects.filter, and I would do something. But every one of these
operations is async.
So you just put await in front of it. And if you have something where you've got to scale a whole lot of
concurrent data traffic, like say a website, well, this is a pretty good combo.
Okay. So like in the future, will we just have await in front of every other word?
Everything. Exactly. So I was going to point out that you've got to be pretty
async and await savvy
to be doing that.
Like there's a lot of awaiting,
isn't there?
Yeah.
I think if you want to work
with this library,
you just have to say,
we're just going all in on async.
And that's the way it goes, right?
No, it's good.
If you're already working with async,
that's when you would think,
hey, I wonder if there's an async ORM
that I can use.
Yeah. Yeah, it looks good. And I like that it's based on SQLAlchemy Core. That means a big chunk
of the database conversation and the table creation and the migrations, all that stuff
is already known and proven and working really well. It's just this API that kind of sits alongside the traditional SQLAlchemy conversation,
like directly with the database. I do wish that SQLAlchemy would take this approach.
I interviewed Mike Bayer about it a long time ago, like four years ago, and he said, I don't really
think it's going to make that big of a difference. But I think it actually would make a huge
difference. You just got to think about, you know, what is your goal, right?
If your goal is performance, it probably won't make a big difference.
If your goal is scalability, it can make a tremendous difference, right?
Are you trying to make an individual user's experience a little bit faster?
Or are you trying to make the website not take 10 concurrent users, but 10,000, right?
Like it probably might even make it a tiny bit slower for that one person, but it might
make that 10 to 10,000 like no big deal.
So it depends on what you're after, right?
Yeah, definitely.
Speaking of what you're after, what's next for us?
One of the things you might be after is some data on somebody else's website, like through
an API.
Yes.
There are more and more people,
and I think it's great, kind of doing the data science stuff, people coming into Python and programming from just trying to get their work done.
And this is a DataQuest.io blog post called Getting Started with APIs.
And it's not getting started writing APIs.
It's getting started consuming them with Python.
Even if you kind of know what all this stuff is, maybe you haven't really thought about the basics. That's why I picked up this post: it's really good with the
basics. It has a conceptual introduction to what web APIs are versus what a website is, kind of
what the differences are, and why. I mean, also why have APIs if
people could just store the data in CSV files?
That'd be easier, wouldn't it?
That'd be amazing.
I'd love to live in that world.
No.
No, but there are a lot of data sets out there that are just CSV files sitting around.
Right.
It depends if it's dynamic, right?
Right.
Dynamic and also if you want to be specific about it. So with APIs, you can have parameters to your queries to
say, I only want the data for this user. Or they gave an example of Spotify music
or something. You don't want to have like all the data for all the songs that Spotify knows about,
but you know, maybe just the songs from a particular artist or something. So things like
that are good. But this is actually the first time I've seen this,
and they're probably all over the place,
but it talked about status codes, especially GET status codes,
because what we're doing here is retrieving things.
And it had a nice list of all the descriptions
and things that you might run into for error codes,
including like the 301, which isn't necessarily a problem, but you're
getting redirected. So maybe you want to know about that. And then the 400 is something's not
wrong on their end. It's wrong on your end. The server thinks you made a bad request.
So that might be an endpoint that expects data or parameters, but you didn't send any parameters
with it. Or you sent an int when it expected a string or whatever.
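The status code families mentioned here can be sketched with the standard library's `http.HTTPStatus` enum, with no network access needed. The helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Label a status code by its family: 2xx, 3xx, 4xx, or 5xx."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"  # the problem is on the caller's end
    elif 500 <= code < 600:
        kind = "server error"  # the problem is on the server's end
    else:
        kind = "informational"
    return f"{code} {status.phrase}: {kind}"
```

For example, `describe_status(301)` reports a redirect and `describe_status(400)` reports a client error, matching the "it's wrong on your end" description above.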
And then it talks about endpoints and endpoints that take query parameters,
endpoints being the specific APIs.
So we think of a service providing an API,
but it's usually not just one API.
It's usually a whole bunch of related different bits of data
that you can query together or query
separately for different aspects of it. And then of course what APIs usually return is JSON data. So
it has a little bit of an explanation of what JSON looks like and then using the json module to
convert back and forth between native Python stuff and JSON. And it also talks about requests and has a bunch of examples for how to pull this.
So if you're getting started trying to pull some data from an API somewhere,
this is a good way to get started.
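Two of the ideas above, query parameters on an endpoint and converting JSON to and from native Python objects, can be sketched with only the standard library. The endpoint URL and the response body below are made up for illustration; no request is actually sent, and the post itself uses the requests library for that part.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show how a query string is formed.
BASE_URL = "https://api.example.com/songs"


def build_url(base: str, **params) -> str:
    """Append URL-encoded query parameters to an endpoint URL."""
    if not params:
        return base
    return base + "?" + urlencode(params)


# A made-up response body of the kind an API might return.
RESPONSE_TEXT = '{"songs": [{"title": "So What", "year": 1959}]}'

url = build_url(BASE_URL, artist="Miles Davis", limit=5)
data = json.loads(RESPONSE_TEXT)   # JSON text -> dicts and lists
first_title = data["songs"][0]["title"]
round_trip = json.dumps(data)      # dicts and lists -> JSON text
```

With requests, the same parameters would typically be passed as a dictionary rather than encoded by hand; the point here is only what ends up in the URL and what comes back.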
It's a nice blend of theory and steps, right?
It doesn't just say, well, you open up requests and you do this.
It's like, here's what an API is.
Here's what the HTTP verbs mean.
Here's what
those status codes are. Here's how you get to that and, you know, how you manifest that in
Python and stuff.
Yeah, it's nice. But it's not at the level of like a college course lecture. It's
just enough to get the concepts.
Right, exactly. It's not trying to make you read the RESTful
dissertations and things like that.
Yeah, I don't even know if it mentions REST,
even though that's what we're talking about.
Cool. That's probably a good thing.
That was overdone for a while.
Now, last thing I want to cover is memory management in Python.
This is an article entitled Memory Management in Python,
but what it really is is it's a narrow slice,
but a common slice of memory management in Python.
So you probably don't think about memory very much in Python, huh, Brian?
No, I usually forget about it.
Yeah, just forget about it. That's right.
So you don't use malloc or free or new or any of these things. Definitely not delete. If you use delete, it means something else, sort of, and things like that, right? Yeah. So I think it's
actually pretty interesting that the story of how the runtime works in CPython is kind of opaque, right?
There's not a lot written about memory management, which is why I decided to pick this and talk a little bit about what it covers.
Because I think it doesn't really matter that you know this in some sense, right?
Like your Python code will still work, but it helps to more closely understand what your code is doing, how that might map over to CPU
architectures and caches and RAM and all that kind of stuff. And, you know, just having a high-level
understanding of that is good. Yeah. So here's a pretty deep, detailed article, not too long,
it gets to it pretty quickly, about memory management in Python. But it only covers, like I said, a little bit. It's really about how small object allocation
and deallocation happen in Python. It doesn't talk about the GIL, which is about thread safety and
memory allocation. It doesn't talk about reference counts. It doesn't talk about garbage collection
for cycles, or much else. So it's all about small objects.
But most things we make in Python are small objects.
Even when they're big, they're really just a bunch of small things all pointed at each other, right?
So if I've got like a list of a million items, and each of those items is 10 bytes,
I don't have one block of 10 million bytes.
I have this big list with a bunch of things.
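A list holding references rather than the objects themselves can be checked on CPython with `sys.getsizeof`, which reports the size of the list container separately from its elements. This is a rough sketch that assumes CPython; `list_footprint` is a hypothetical helper name.

```python
import struct
import sys

# Size of one C pointer on this platform (8 bytes on typical 64-bit builds).
POINTER_SIZE = struct.calcsize("P")


def list_footprint(items):
    """Return (container_bytes, element_bytes) for a list of distinct objects."""
    container = sys.getsizeof(items)                 # header plus one pointer slot per item
    elements = sum(sys.getsizeof(x) for x in items)  # the objects pointed at
    return container, elements


numbers = [float(i) for i in range(1000)]
container, elements = list_footprint(numbers)
```

The container size grows by roughly one pointer per item, while each float is its own small object allocated elsewhere. The element total would overcount if the same object appeared in the list more than once.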
But then each one
of those is a pointer out to the actual thing that it is, right? Even when you have strings,
or even numbers, right? In a lot of languages, numbers are allocated on the stack
and treated as value types and stuff. But you know, in Python everything is an object. So every little
thing that you make has to get allocated and deallocated. So understanding how these small
objects get allocated, that's pretty interesting. So that's what this
article talks about. So I'll try to summarize some of the stuff covered there. One of the
problems you have with memory allocation is that memory can get super fragmented, right? If I just
allocate a bunch of stuff and then delete it and keep allocating it and just let that grow,
you know, just keep adding on the end, wherever the memory is, and I want to interact with that, that can really mess
up things like reading from RAM and getting stuff into cache to be high performance and stuff like that,
right? So what Python does is it actually pre-allocates these little 256K chunks and then
it partitions those up and it plugs the small objects into
those spaces, and then will potentially take them back out and then reuse those spaces that
it had already allocated when it needs to make a new small thing.
Okay.
Okay. All right. So that's
supposed to help with memory optimization, the locality stuff, the fragmentation, and so on. So there's a special memory manager in Python
called pymalloc. There's a general purpose allocator, like C's malloc, and on top of that there's a Python allocator,
right? So there are these layers: we have RAM, we have the operating system's virtual memory
management, we have C's malloc, we have this PyMem, pymalloc thing, we have the Python object allocator
that then figures out where to place these things. And we actually have object memory.
So there's a lot of stuff going on here. And they break it into three levels of organization.
Okay, so for small objects, which are things that are individually smaller than 512 bytes,
right, not like maybe a list that has a bunch of stuff, but each little bit smaller,
right? So those are the things we're talking about. And what happens is it gets broken into
these three things called the block, the pool, and the arena. So a block is a chunk of memory
of a certain size, and it only holds Python objects of a certain size. So maybe there's a block that holds 16-byte Python things.
Okay.
That's weird.
Yeah.
So the reason is Python can then,
it knows how to exactly fill up and then reuse those blocks.
Oh, yeah.
Right?
So if it's like, oh, I'm going to get a bunch of numbers,
all the numbers are the same size unless they become utterly huge.
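A quick way to check the point about numbers is `sys.getsizeof`. This is a small sketch for CPython; the particular values chosen here are arbitrary, and the byte counts themselves depend on the build.

```python
import sys

# A few small integers: on CPython these should all report the same size.
small = [1, 7, 1000]
sizes = [sys.getsizeof(n) for n in small]

# A very large integer needs more internal digits, so it takes more bytes.
huge = sys.getsizeof(10**100)

print(sizes, huge)
```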
So we can just like allocate
them into the spot. Some of those numbers go away, we've got another block, we drop that new number
pointer in right there, or the number which we then point at right there, and so on. So there's
these different blocks, each one is a uniform size between eight and 512 bytes. And then the blocks
are managed by this thing called a pool, which is usually limited to a memory page size, so four kilobytes.
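The idea of uniform block sizes can be sketched as rounding each request up to a size class. The 8-byte step below is an assumption based on CPython as of this episode (newer versions use a larger alignment), and `block_size_for` is a hypothetical helper, not a real API.

```python
ALIGNMENT = 8                  # assumed step between block sizes
SMALL_REQUEST_THRESHOLD = 512  # largest request treated as a "small object"

def block_size_for(nbytes):
    """Return the block size a request would land in, or None if it is not 'small'."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None  # such requests would go to the general-purpose allocator instead
    # Round up to the next multiple of ALIGNMENT.
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT

for request in (1, 9, 28, 512, 513):
    print(request, block_size_for(request))
```

So a 28-byte request would share a pool with other requests that round up to 32 bytes.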
And then the pools are managed as these things called arenas.
And these are the things that are allocated on the heap.
I believe they are 256K pieces of memory, which hold 64 pools, which hold some number
of blocks and things like that, right?
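The arithmetic behind those numbers can be worked through in a few lines. The arena and pool sizes come from the discussion above; the 48-byte pool header is an assumption (a typical figure for a 64-bit build), so the per-pool block counts are approximate.

```python
ARENA_SIZE = 256 * 1024  # 256K arena, as described in the episode
POOL_SIZE = 4 * 1024     # one 4K memory page per pool
POOL_OVERHEAD = 48       # assumed bytes of bookkeeping at the start of each pool

pools_per_arena = ARENA_SIZE // POOL_SIZE

def blocks_per_pool(block_size):
    """How many blocks of one size fit in a pool after the assumed header."""
    return (POOL_SIZE - POOL_OVERHEAD) // block_size

print(pools_per_arena)
print(blocks_per_pool(16), blocks_per_pool(512))
```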
So there's this really intricate way in which memory is trying to be grouped together
and then also trying to be reused without reallocating it from the operating system.
Okay.
Right?
So even though Python might new up a bunch of objects,
it actually says, well, but we already have this block that holds things of that size,
and there's some spots in there, so let's fill that bad boy up.
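That reuse can be illustrated with a toy pool that hands out fixed-size slots and recycles freed ones before touching fresh space. This is a simplified model, not CPython's implementation; `ToyPool` and its methods are hypothetical names.

```python
class ToyPool:
    """A toy pool of equally sized slots with a free list for reuse."""

    def __init__(self, block_size, capacity):
        self.block_size = block_size
        self.capacity = capacity
        self.next_unused = 0   # index of the first never-used slot
        self.free_slots = []   # slots that were handed out and later freed

    def alloc(self):
        """Return a slot index, preferring a previously freed slot."""
        if self.free_slots:
            return self.free_slots.pop()
        if self.next_unused < self.capacity:
            slot = self.next_unused
            self.next_unused += 1
            return slot
        return None  # pool is full; a real allocator would move to another pool

    def free(self, slot):
        """Give a slot back so a later alloc() can reuse it."""
        self.free_slots.append(slot)

pool = ToyPool(block_size=16, capacity=4)
a, b, c = pool.alloc(), pool.alloc(), pool.alloc()
pool.free(b)
d = pool.alloc()  # reuses the slot that b occupied
print(a, b, c, d)
```

No new memory is requested for `d`; it simply lands in the hole that `b` left behind.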
Oh, all right.
Yeah.
Anyway, so it's pretty interesting how all the stuff is working together.
But that's the Python small object allocator.
Never thought about that before.
But kind of interesting.
Also, I'm trying to visualize like a sports arena with 64 swimming pools in it.
That's not a bad one.
And then each pool is filled with exactly the same size people or creatures swimming around,
something like that.
Yeah.
Yeah, there you go.
That makes a lot of sense.
The first part of it totally made sense.
The last bit, maybe not so much.
All right.
Well, anyway, what I like about this article is it seems like it has a lot of stuff from
like, here's the actual C code that defines what an arena is.
Here you can see it's like a doubly linked list and how it all fits together.
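The doubly linked list shape can be sketched in Python. This is loosely modeled on the idea only: `ToyArena`, `link`, and `unlink` are hypothetical names, and the fields are a simplified guess at what such a record might track, not the article's C struct.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)  # compare by identity; avoids recursing through the links
class ToyArena:
    address: int      # where the arena's memory would start
    nfreepools: int   # how many pools in it are still unused
    prev: Optional["ToyArena"] = field(default=None, repr=False)
    next: Optional["ToyArena"] = field(default=None, repr=False)

def link(arenas):
    """Chain arenas into a doubly linked list and return the head."""
    for left, right in zip(arenas, arenas[1:]):
        left.next = right
        right.prev = left
    return arenas[0] if arenas else None

def unlink(arena):
    """Remove one arena from the list, reconnecting its neighbors."""
    if arena.prev is not None:
        arena.prev.next = arena.next
    if arena.next is not None:
        arena.next.prev = arena.prev
    arena.prev = arena.next = None

a, b, c = ToyArena(0x1000, 3), ToyArena(0x2000, 10), ToyArena(0x3000, 64)
head = link([a, b, c])
unlink(b)  # a and c are now neighbors
print(head, head.next)
```

Having links in both directions is what makes removing an arena from the middle of the list cheap.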
And it's just got some good analysis.
So have a look if you've wondered about this.
All right.
Well, that's it for our main items.
I know, Brian, you have big news for the entire world if they live near Portland.
If they live in Portland or really close to Portland.
Or want to come to Portland.
September 26, I'll be
speaking downtown at the Portland Python user group, and I'm still working on my talk, but I'll
be there. That'll be fun. And then I'll probably polish it more, and people have to volunteer for
this other talk. So on October 6th, it's the inaugural first meeting of Python PDX West. So we've got a new user group
for Python in town. I'm hosting it along with you. Yeah, it'd be fun. I'm really looking
forward to it. Yeah, and you'll be speaking there. I will, and I'm trying to get other people to
volunteer to speak, and if they don't, then it'll just be you and me speaking, but I think it'll be
fun. So we've got a bunch of people signed up so far, so it's filling up fast.
People should sign up.
That's cool.
Maybe we could do a live Python Bytes sometime there as well at the end of the day or something.
Who knows?
That's a great idea.
Yeah, we could have.
Maybe not Tuesday, October 6th, but maybe someday we can make that happen.
Maybe someday, yeah.
Yeah, that's great news.
If you happen to be around, definitely drop in.
That'd be great.
It's on meetup.com, right?
People can just sign up there.
Yep, and a link in the show notes.
Do you have any intention of recording, live casting,
or otherwise spreading this in a farther path?
It's not a bad idea.
We don't have anything like that set up right away.
In the future, maybe we could do that.
Probably people would be interested in watching these.
But I also want to make it really accessible to people
that are new to presenting as well. I'd love to have people come in and do like a talk that
they're working on. It's not quite polished yet. I want it to not just be experts talking to
everybody else, but I'd like it to be people working out things that they're just interested
in. So I think it'd be good. Yeah, that sounds like a great philosophy for it.
How about you?
Any extras?
I have a couple.
Presenting and speaking at PyCon 2020,
which is a little earlier this year.
I believe it's like in April or something.
The website's up.
Yeah.
Yeah, so April 15th to 23rd.
So the call for proposals is now open for PyCon 2020.
So if you would like to be considered,
a talk of yours to be considered there, then now is the time.
Yeah, go ahead and submit those because you know you're only going to spend like a week
writing it up anyway. So, may as well get that done right away.
That's right. Do like a band-aid, stop worrying about it, just get it over with.
Yeah.
Pull it right off. All right. Another thing, I just, have you heard of GitBook?
Yeah, but I haven't really looked into it much.
I hadn't either. I was interviewing the guy, Joe, from Masonite, the Masonite web framework.
And I noticed that Masonite's documentation is written in GitBook.
And so I looked at it, and GitBook is pretty interesting.
You can use it as kind of like almost a Basecamp project management type thing.
So stuff like personal notes or things you want to track. But you can also use it for documentation and
knowledge bases and whatnot. So it looked pretty cool. And so I thought I'd just, you know, let
people know that it's out there. It's free for small teams, like with some limitations. It
costs a little bit of money for non-trivial small teams, like $7 per user, but it's also free for
open source and nonprofit teams, which is kind of cool. So I'm also a big fan of Read the Docs.
So it's, you know, I'm not saying they shouldn't use that, but here's an interesting project that
I ran across that I hadn't heard of. It looks nice. If people for some reason are opposed to
Read the Docs, I don't know why you would be, or just like this look better, here's another opportunity.
So good to have options.
Good to have options.
Also good to have laughs.
Yeah, let's do some jokes.
All right.
How about you go first?
Okay.
So I pulled these out of a list of dad jokes you had posted somewhere on our Trello,
but changed it a little bit.
So what do you call a 3.14 foot long snake?
I don't know.
Well, that would be a python, of course.
With the Greek symbol, pi-thon, yeah, python.
Yeah.
So if it's not feet, but 3.14 inches, then what is it?
It's a micropython.
It's a micropython, a mu python.
Yeah, I feel like we're back in calculus or physics.
Yeah.
So do you want to do some of these?
Sure.
So why doesn't Hollywood make more big data movies?
I don't know.
Why?
No sequel.
This last one, it's a little bit crass.
It's, I don't know, it's a little low level, but I'll see what I can do here.
So why didn't the angle bracket div get invited to the dinner party?
I don't know.
Why?
It had no class.
Oh, yeah.
That's a good one.
All right.
Well, thanks for throwing those in there.
These are fun.
Yeah.
Thank you once again for talking with me on a nice Wednesday.
Absolutely.
See you later.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured,
just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something
cool. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and
sharing this podcast with your friends and colleagues.