Python Bytes - #294 Specializing Adaptive Interpreters in Full Color
Episode Date: July 26, 2022
Topics covered in this episode: Specialist: Python 3.11 perf highlighter, tomli "A lil' TOML parser", Pydantic V2 Plan, pikepdf, Extras, Joke
See the full show notes for this episode on the website at pythonbytes.fm/294
Transcript
I am pulling off a very, very cool trick.
I just want to point out before we get started.
Okay.
On the TalkPython channel,
I'm doing a podcast with Anthony Shaw and Shane from Microsoft
about Azure and Python and some CLI stuff they built in FastAPI.
And at the exact same time, I'm doing this one here.
They're both streaming live.
I don't know how that's happening.
The other one was recorded two months ago, and we couldn't release it because some of the things weren't finished yet.
So I just, I hit go on that.
The real one, if you're bouncing around, the real one is here.
Okay.
So join us here.
Anyway, with that, you ready to start a podcast?
Yeah, definitely.
Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 294, recorded July 12th,
2022. I'm Michael Kennedy. And I am Brian Okken. It's just us this week, or, this today. It's
just us. Yeah. Yeah. I don't know. Dean out of the audience asks, is this a daily podcast show now?
I'm a little bit torn about it. I feel like we almost could do a daily show,
but then I think what it might take to do a daily show,
knowing how much work a weekly show is.
No, it's not a daily podcast.
No.
It might be fun to do sometime,
just do like a full week or something.
Right, exactly.
Just a super, there's so much news.
We're seeing it every day for the week.
But just like the same topics, like six days in a row.
Just do them over.
Yeah.
Exactly.
Exactly.
All right.
Am I up first this week?
You are.
Yes.
Right on.
Well, let me tell you about something special.
Specialist.
Okay.
Just last week, I believe it was, I interviewed Alex Waygood, who did the write-up for the
Python Language Summit.
And as part of the topics we were discussing, you know, the Python Language Summit and Python
this year is focusing a lot on performance and what's called the Shannon Plan.
So this is Mark Shannon's plan to make Python five times faster over five releases.
It's got a ton of support at Microsoft.
Guido van Rossum's there working on it, but they've hired like five or six
other people who are full-time working on making Python faster now. So awesome, awesome. Thank you
for that. However, one of the things that made Python 3.11 fast is some of the early work they
did. And it comes down to PEP 659, a specializing adaptive interpreter. So let me tell you about this feature, this performance improvement first, and then we'll
see what specialist is about, because it's about understanding and visualizing this behavior.
Okay.
So one of the things that is a problem with Python, because it's dynamic and its types
can change and what can be passed could vary.
I mean, you could have type hints,
but you can violate the type hints all day long
and it's fine.
So what the interpreter has to do is say,
well, we're gonna do all of our operations super general.
So if I have a function and it's called add
and it takes X and Y and it returns X plus Y,
seems easy, but is that string addition?
Is that numerical addition?
Is that some custom operator overloading
with a dunder add or whatever it is in some type?
If it fails in one way, you kind of got to reverse it.
Like there's all this unknown, right?
Yeah.
What if you knew?
What if you knew those were integers
and not classes or not strings?
You could run different code.
You wouldn't have to first figure out what
they are. Are they compatible? Do you do the add in the low level CPython internals or do you go
to like some Python class and do it? You could be much more focused. Additionally, if it was
adding for a list, you could say, well, if I know they're lists, what we just do is go list.extend and we give it the other list,
right?
We don't hunt around and figure out all this other stuff.
So that's the general idea of the specializing interpreter is it goes through and it says,
look, we don't know for sure what could be passed here, but if it looks like over and
over, we're running the same code and it's always the same types.
Is there a way we could specialize those types, right?
Is there a way that we could put specific code for adding numbers or specific code for
combining lists?
And this is called adaptive and speculative specialization.
Okay.
Okay.
And my favorite part of it, when it's performed, it's called the quickening.
Quickening is the process of replacing slow instructions
with faster variants.
So kind of like I said,
it has some advantages over immutable bytecode.
It can be changed at runtime.
Like you see, we're always adding integers.
It can use super instructions that span lines
or take multiple operands.
And it does not need to handle tracing, as it can fall back to the original bytecode for that. Okay. So there's a whole bunch of stuff going on here. Like, the example they give is you might want to specialize LOAD_ATTR.
So LOAD_ATTR is a way to say, give me the attribute that this thing contains. But what is the thing?
One of the things you might do is you might realize it's an instance attribute, and then you would call LOAD_ATTR_INSTANCE_VALUE. You might realize it's a module attribute, and you might call LOAD_ATTR_MODULE, or slot, and so on, right? But if you knew, you don't have to go through first the abstract step and then figure out which of these it is. You just do the thing that it is. Okay. So that's the idea of this PEP. This is one of the things that's making Python 3.11 faster.
Awesome.
So to the main topic.
Okay.
And I'll just, just as a note, I'm saying okay as if I understand what you just said, but most of it just went by.
It's all right.
I think we'll, let's, let's look at pictures.
Okay.
All right.
So this thing by Brandt Bucher is called Specialist,
and it's about visualizing this specializing adaptive interpreter.
Oh, okay.
Good.
Okay.
So it says Specialist uses fine-grained location information
to create visual representations of exactly where and how
CPython 3.11's new specializing adaptive interpreter optimizes your code.
And it's not just interesting,
it has actionable information. So for example, see here, and you've got to pull up the website
if you're just listening. If you see in that website, you'll see some color. You'll see green,
less green, yellow, orange, and all the way to red. So there's two aspects. There's sort of a darkness as well as a color.
So the most, like where Python could take advantage of this feature,
you see green.
Where it can't, you see red.
And imagine a spectrum.
It goes like green, yellow, orange, red.
So it's not on or off.
It's how much could it specialize, okay?
Okay.
So what you see here, for example, is it's able to take some values, an integer and a string, and then use the fact that it knows what those are to make certain things faster, like appending to an output and doing some character operations on it. Yeah. Right, it was able to replace that with a different runtime behavior because of this quickening.
All right.
So let's skip down here.
I gave you a bit of the background.
So it says, let's look at this example.
We have F to C, which converts Fahrenheit to Celsius.
And what it does is, okay, we're going to take an F and it has type hints that say float,
float.
Okay.
So, but those don't matter.
So it says we're going to take an f and subtract 32 from it, and then we're going to do simple math: we're going to take that result, that range, that size of temperature there based on zero, and then multiply it by five and divide it by nine. We all learned this in chemistry class or somewhere, or we talked about converting different measurements. Yeah, of course. Yeah, right. So these are straightforward,
but there's actually problems in here
that make it slower and prohibit Python
from quickening it as much as it can be quickened.
Okay.
So if we take this code,
it just runs F to C and C to F
and it gives us some test values and says,
just do it and tell us what happened.
We can run specialists on it and it says,
okay, this X here,
the green areas indicate regions of code
that were successfully specialized
where red areas are unsuccessful.
Like it tried and it failed.
So it says one of the problems is right at the start, the x equals f minus 32. It says, well, we can quicken operations on numerical types that are the same, but for now there's not an int-and-float variant of this; it's got to be float and float. All right, so it says, right, you could have gotten a faster operation there, but because the types didn't match, you won't. But then what it did get out is an x, which is a float, and it's going to do some stuff and it could sort of make it better. But it said, look, here's some multiplication again by an integer and a float.
So that's not quickened.
And this division is apparently never quickened.
So what can we do?
Well, with that information, you can say, well, what's the problem with subtracting 32?
Well, it wasn't a float.
What if I said 32.0?
Oh, yes.
All right.
That gets replaced by faster code.
Oh, nice.
Right?
Yeah.
So that's pretty nice. And for the return, it was adding like x plus 32 for the other direction, and now it's 32.0; that's faster. Okay, well, what else? Now you can see, when we did that part of the conversion, x times 5 divided by 9: if we put a 5.0, that gets faster still, but the divide is never quickened. Okay, well, what if we put the divide in parentheses? It doesn't really matter if it's x times 5, divided by 9, or x times (5 divided by 9), right? These are mathematically equivalent, but they're not equivalent to Python, because that second form leverages constant folding, right? 5 divided by 9 is pre-computed in Python
to be a float.
Okay.
Right, at parse time, right?
That's just how it works with constants.
If it says it can do math with constants ahead of time,
it does it.
So that becomes a float
and then float times float is now quickened, right?
Isn't this cool?
The way you can apply this
and actually make your code faster,
not just go, oh, it's interesting.
It must be quickening it there.
But it's actionable.
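Here's the gist of that walkthrough as code. The f2c rewrite (32 → 32.0, 5 → 5.0, and parenthesizing the division) is the change they're describing from the Specialist example; the co_consts check at the end is my own addition to show the constant folding, and you'd run the specialist command on the file under CPython 3.11 to see the coloring:

```python
# Before: int constants and a bare division keep 3.11 from specializing
# the float math.
def f2c(f: float) -> float:
    x = f - 32
    return x * 5 / 9


# After: float constants, and the division moved into a constant
# expression that CPython folds into a single float at compile time.
def f2c_fast(f: float) -> float:
    x = f - 32.0
    return x * (5.0 / 9.0)


# Same math either way:
print(f2c(212.0), f2c_fast(212.0))   # 100.0 100.0

# Evidence of the constant folding: the pre-computed float shows up in
# the compiled code object's constants.
print((5.0 / 9.0) in f2c_fast.__code__.co_consts)   # True
```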
It is really pretty cool.
And I'd really like to see this incorporated
into an editor or something to say,
your code will be faster if you just add a 0.0 here
or something like that.
And it's going to become a float anyway.
It doesn't matter.
It just, why would you write 32.0 when you just meant 32?
Seems more precise to say 32.
Because I'm used to doing that, to thinking, if it's, okay, well, me personally, if I know it's going to be float math, I usually do 0.0. But maybe that's not a normal thing. You're such a C programmer.
So, all right, well, I think this is really cool. This is Specialist. And, you know, I don't know if I have any code that does math at that fine-grained a level that I really care about. But maybe, you know, if you're in charge of a library where you've got a tight loop, or you do a lot of math or science stuff where it matters, this can be really useful. And what's cool is it's not like, switch to Rust or switch to C or switch to Cython and it'll take effect. No, this is like straight Python code. This is just, how do I take most advantage of what is already happening for performance boosts in 3.11 that we haven't had before? And I think it's going to be just one more workflow step. So you've got your code, your whole thing is a little bit slower than you'd like it to be. You throw a profiler on it. You see the bottleneck areas that you could improve and you think, should I rewrite some of this in Rust or C, or, you know, what should I do? Well, first off, let's try throwing this at it and have the optimizer from 3.11 help you out. And yeah, so I can definitely see that this is going to be part of people's workflow.
But yeah, profile first.
I agree that you want to profile first.
Don't do it everywhere.
Yes, exactly.
Because while it's fun to do this,
only focus where it's going to matter.
Don't optimize a bunch of stuff that doesn't.
So Brian out in the audience says, a different Brian, is there a plan to do lossless type conversion, or maybe flake8 can make this kind of suggestion?
Yeah, exactly. Yeah. I'm not really sure. You don't want to write code where you get different outputs, probably, right? But everything that was happening here, you ended up with the same outcome anyway. It's just like, well, do I do the division first or the multiplication? Or do I start with an int that results after some addition or subtraction with a float? Or do I just make them all floats, right?
I feel like it's, in most cases,
it shouldn't be changing the outcome.
So, yeah, cool.
Anyway, that's what I got for the first one.
How about you?
Well, kind of sticking with a 3.11 theme so far.
Well, we can use TOML now. But in 3.11, we are going to have tomllib be part of Python 3.11 with PEP 680. And we covered that in episode 273. And one of the things we did mention is that tomllib is based on tomli. But tomli you can use right now. So a lot of projects are switching to use tomli as their TOML parser, to read pyproject.toml or to read their own config file. And so I just wanted to highlight it. tomli is "a lil' TOML parser"; it's a cute little tagline on the project. But I was reminded of it because the Real Python people put out, it looks like it's actually Geir Arne, sorry, I'm not going to try to pronounce that last name. Real Python wrote an article called Python and TOML: New Best Friends.
And it's a very comprehensive article, but I really love at least the first three parts of it: using TOML as a config format, getting to know key-value pairs, and load TOML with Python. Because this is kind of what you're going to do with it. You're going to write config files for something. And this is a great introduction of TOML for Python. And that's kind of what we care about, right? So it goes through just getting used to what TOML looks like, what a config file looks like, talking about how it's key-value stuff, and all the keys, even if you put a number there or something, it's going to be a string; all the keys get converted to strings, even if they don't look like them. And they're UTF-8, so you can use Unicode in there as well, which is kind of neat. Put your emojis in there. Yeah, well, can you? Are emojis UTF-8? I think mostly. Okay, many of them are. Interesting. That'll be fun, to put emojis in here. I don't know.
What mode are we running? Are we running in cow mode or lizard mode?
I'll do lizard. Yeah. Okay. Well, if you're running in lizard mode, you need to check out.
Okay. I got to try that to see. I should have done that before.
Oh my gosh. I think it's almost both horrible and amazing to imagine writing config files to, like, put it in lizard mode. Do it.
Yeah. One of the things that I didn't know before reading this article, one of the things I didn't know you could do in TOML, because I just use it cursorily, I use it with pyproject.toml and that's about it. It talks about the normal how-to-read stuff, but one of the things is, oh, what was I going to talk about? Arrays. You can do arrays of things, which are neat, and tables, and arrays of tables, which are these double-bracket things. And then you can do dotted stuff, so if you have like, how was it, user and user.player, these will show up as like, you know,
sub-dictionary key things.
And so one of the things that I,
and I played with it this morning,
and it really, I should have had something to show,
but the thing I like to do is to just read it, just like this article talks about reading it, just read the TOML file into Python and print it. And it'll print out as a dictionary. And then you can create
whatever format you want for your TOML file. And then you can just see what it's going to look like.
And then you know how to access it. That's one of the best ways to do that.
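That read-and-print workflow is a few lines with tomli, or the stdlib tomllib on 3.11+. The config contents below are made up for illustration, including the dotted [user.player] table, the [[servers]] array of tables, and a Unicode value:

```python
try:
    import tomllib  # stdlib in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # pip install tomli on older Pythons

config = tomllib.loads("""
title = "demo"
1234 = "keys are always strings, even numeric-looking ones"
mood = "🦎"               # values are UTF-8, emoji included

[user]
name = "brian"

[user.player]             # dotted tables become nested dicts
level = 3

[[servers]]               # arrays of tables: double brackets
host = "a.example.com"

[[servers]]
host = "b.example.com"
""")

print(config["1234"])              # the key comes back as the string "1234"
print(config["user"]["player"])    # {'level': 3}
print([s["host"] for s in config["servers"]])
```

When reading from disk, note that tomllib.load() wants the file opened in binary mode, e.g. tomllib.load(open("pyproject.toml", "rb")).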
That's awesome. What an interesting format. That's pretty in-depth. And a blast from the past: Ashley, hey, Ashley says UTF-8 can encode any Unicode character, emoji your heart out. Oh yeah, you could do like, you know, is it in heart mode, heart equals true, heart equals false, or for the optimizer, you could do a flame emoji equals true. Exactly. So I love it. Yeah. I think, look, we have not leveraged configuration as emoji sufficiently. Oh yeah. I think pytest should rewrite all of its configs as emoji items. Just do a PR. I'm sure they'll take it.
Yeah. Yeah. All right.
Let me tell you about our sponsor for this week before we move on.
So this week is brought to you by Microsoft Founders Hub.
In fact, they are supporting a whole bunch of upcoming episodes.
So thank you a whole bunch to Microsoft for startups here.
Starting a business is hard. By some estimates, over 90% of startups go out of business within their first year.
With that in mind, Microsoft for startups set out to understand what startups need to be successful and to create a digital platform to
help overcome those challenges. Microsoft for Startups Founders Hub. Their hub provides all
founders at any stage with a bunch of free resources to help solve challenges. And you get
technology benefits, but also really importantly, access to expert guidance and skilled resources, mentorship and networking connections, and a bunch more.
So unlike a bunch of other similar projects in the industry, Microsoft for Startup Founders
Hub does not require startups to be investor backed or third party validated to participate.
It's free to apply. And if you apply and get in, then you're in. It's open to all.
So what do you get if you join or apply and then get accepted?
So you can speed up your development with access to GitHub, Microsoft Cloud, and the ability to unlock credits over time, as in you get over $100,000 worth of credits over the first year if you meet a bunch of milestones, which is fantastic.
Help your startup innovate.
Founders Hub is partnering with companies like OpenAI,
a global leader in AI research and development
to provide benefits and discounts too.
Neat.
Yeah.
Through Microsoft Startup Founders Hub,
becoming a founder is no longer about who you know.
You'll have access to the mentorship network,
giving you access to a pool of hundreds of mentors
across a range of disciplines,
areas like idea validation, fundraising,
management coaching, sales marketing,
as well as specific technical stress points.
To me, that's actually the biggest value: the networking and mentor side. So you'll be able to book a one-on-one meeting with these mentors, many of whom are founders themselves.
Make your idea a reality today with the critical support you'll get from Microsoft for Startups Founders Hub. Join the program at pythonbytes.fm slash founders hub; the link will be in your player show notes. Nice. Yeah, cool. Indeed. All right. I guess I'm up
next with this order we got and oh my goodness, Samuel Colvin take a bow because he put out a plan
for what's happening with Pydantic version two. But the reason I say take a bow is this is one detailed plan
that is really, really thought through,
thought out, backed up with a bunch
of GitHub discussions and so on.
So the idea is Pydantic started out
as an interesting idea and surprise, surprise,
a bunch of people glommed onto it,
probably more than it was originally envisioned to be.
So for example,
SQLModel from Sebastian Ramirez is like, Pydantic models are now our ORM to the database, with all the interesting stuff that ORMs have.
And Roman Wright said,
guess what?
We could do that for MongoDB as well.
Same with the Pydastic thing we recently spoke about.
And then Sebastian Ramirez also is like, hey, FastAPI, this can be both our data exchange as well as our documentation. And I was like, oh my goodness,
what's going on here? So since there's a bunch of stuff on the insides that could be better,
let's say, or maybe time to rethink this. So in this plan, it talks about what they'll add,
what they'll remove, what will change, some of the ideas for how long it will take and so on. Interesting. Yeah. Here's a, here's a pretty significant thing. I'm currently taking a kind of sabbatical after leaving my last job to work on
this, which goes until October. So that's a big commitment to, I'm going to help make Pydantic
better. So it sounds familiar. It sounds a bit like Rich and Textual and those types of
things as well. But this is a big, big commitment from Samuel and he's really doing a ton of work.
It says people seem to care about my project. It's downloaded 26 million times a month.
Wow. It's insane. Yeah. That's awesome. That's kind of incredible. It is. And so it says,
here's the basic roadmap. Implement a few
features in what's now called the Pydantic core. We just had Ashley, who as we saw is out in the audience, hey, Ashley, give a bit of a shout-out to this feature. And I also want to credit a couple of other people, because Douglas Nichols and John Fagan also let me know that this was big news coming. So thank you all for that. The Pydantic core is being rewritten in Rust,
which doesn't mean you have to know or do anything.
It just means you have to pip install something.
You get a binary compiled thing that runs a lot faster.
Okay, so more on that in a second.
First, they're working to get 1.10 out and basically merge every open PR that makes sense and close every PR that doesn't make sense, and then profusely apologize for why your PR that you spent a long time making was closed without merging. Some other bookkeeping things, start tearing the Pydantic code apart and see how many existing tests can still be made to pass, and then eventually release Pydantic V2. The goal is to have this done by October, probably by the end of the year for sure. A couple of things worth paying attention to.
There are a bunch of breaking changes in here. A lot of things are being cleaned up, reorganized, renamed, some removed. Like from_orm; people might be using that with SQLAlchemy, and that's being removed, for example, and so on. So if you depend heavily on Pydantic, especially if you build a project like Beanie that depends heavily on Pydantic, you are going to need to look at this
because some of the stuff won't work anymore.
But let's highlight a couple of things here.
Performance.
This one is really important
because this is the data exchange level for FastAPI.
This is the database transformation level.
When I do a query from the database,
what comes back comes back in some raw form
and then it's turned into a PyDantic model.
And those are computationally expensive things that happen often.
And in general, Pydantic version two is about 17 times, 1,700%, faster than V1 when validating models in a standard scenario. It says between four and 50 times faster than Pydantic V1.
That's cool, right?
Yeah.
That alone should make your ears perk up and go,
excuse me, my ORM just got 17 times faster?
Wait a minute, I'm liking this.
I know this is not the only thing that happens at ORM level,
but the ones I called out that depend heavily on it,
that's in the transformation path.
So this is important.
Yeah.
This is actually, I'm super impressed.
I have not, I normally don't even see this sort of advanced planning in commercial projects.
Yes.
Oh yeah.
You could do a whole business startup that doesn't have the amount of thought that went
into like what's happening in the next version of PyDantic.
It's ridiculous.
Yeah.
It's incredible.
I was serious when I said take a bow.
It really lays out, opens a discussion about certain things and so on.
So like another one is strict mode.
I think I even saw a comment in the chat about it.
So one of the things I actually like about Pydantic, but under certain circumstances,
I can see why you would not want it: if you have something you say is an integer field, and then you pass 123, the number, great. But if you also pass quote 123, a string, Pydantic will magically parse that for you. Like, this happens all the time on the internet. Like, a query string has a number, but query strings are always strings; there's no way to have anything but strings. Yeah. So you've got to convert them, right? So this automatically does that. But if you don't want that to happen, if you want it to say, you gave me a string, it's invalid, you can turn on strict mode, which is off by default, I believe. There's also a bunch of, go ahead. So strict mode does the conversion, or strict mode does not?
Strict mode won't do the conversion. It says, you said it's an int, you gave me a string.
Nope. Rather than, could it be an integer? Let's try that first. You know what I mean?
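A toy sketch of that lax-versus-strict distinction, in plain Python rather than Pydantic's actual API (the function name, flag, and error messages here are made up for illustration):

```python
def validate_int(value, strict=False):
    """Lax mode coerces int-looking strings; strict mode rejects them."""
    # Real ints always pass (bool is excluded since bool subclasses int).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        # Strict: you said int, you gave me something else. Invalid.
        raise TypeError(f"expected int, got {type(value).__name__}")
    try:
        # Lax: "123" from a query string becomes 123.
        return int(value)
    except (TypeError, ValueError):
        raise TypeError(f"cannot coerce {value!r} to int")


print(validate_int("123"))              # 123 — lax mode converts
print(validate_int(123, strict=True))   # 123 — real ints always pass
# validate_int("123", strict=True)      # raises TypeError in strict mode
```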
Yeah.
You know, maybe one of the things you do at the ORM level is put it in strict mode so it doesn't do as much work trying to convert stuff. I don't know if that actually would matter, but it formalizes a bunch of conversions. It has built-in JSON support and different things. Another big thing is this Pydantic core will be able to be used outside of Pydantic classes now, so you can get a significant performance improvement for stuff like adding validation to dataclasses or validating arguments and query strings, or a TypedDict or a function argument or whatever. Yeah. Let's see, next up. Strict mode, we talked about strict mode. Another one is required versus nullable. There's a little bit of ambiguity: if you said something's a string, that means it's required and it can't be None. If you said it's string-or-None, as an Optional[str] or something like that, then basically the behaviors were a little bit different.
So originally, I think this is when typing was pretty new, it said, Pydantic previously had a confused idea of required versus nullable. This mostly resulted from Sam's misgivings about marking a field as optional but requiring a value to be provided for it, while allowing it to be set to None, or something along those lines.
Anyway, there's minor changes around that. Let's see, the final one that I want to cover is namespace stuff. And this is like a whole bunch of things are now getting renamed. So for example, if you implemented or overrode validate_json, it's now model_validate_json. If you had isinstance, it's now model_isinstance. So there's a bunch of these changes all over the place that look like they're going to cause breaking changes. They're easy to fix, just change the name, but you know, it's not nothing. Also parse_file. I still love his candor here: parse_file, this was a mistake, it should have never been in Pydantic, we're removing it. Okay. Partially replaced by this other thing.
Anything else it did was a mistake.
from_orm, this has been moved somewhere else.
Schema and so on.
So there's a lot of stuff that people were using here.
So just have a look.
Try it out.
Don't just go, oh, then version 2 is out.
Is this going to work?
This is going to have some significant changes.
And another reason why it's really awesome that he goes through so much detail
is because there's going to be stuff that breaks.
So it's a breaking interface change.
And so, yeah, it's cool that it's this detailed.
And a couple of things to notice.
Let's see, somebody else in the chat mentioned,
Richard mentioned, and he has emojis in the headers.
Yeah, there's emojis in the headers.
And I got to say like the navigation
in the table of contents, very cool.
It goes to like light gray for areas you've already seen.
And then-
Oh, that's interesting.
It's a cool thing.
Yeah, it's quite cool.
I think going on and on, but two real quick things.
One, there'll be no pure Python implementation of the core.
It's always Rust, but they list out the platforms where it'll be compiled to, including WebAssembly.
Oh, nice.
They previously had some Cython in what was supposed to be pure-Python Pydantic. And so now, a kind of bonus is the Pydantic package becomes a pure Python package, whereas previously it wasn't. So they've
taken like all of that behavior and put it under this core thing that ships as a Rust binary. And
now instead of doing some Cython middle ground, it's pure Python again. So that's interesting
refactoring, I think. Yeah. Yeah. And finally, documentation.
When you get a validation error, it gives you a link to the documentation in the JSON error message.
That's pretty cool.
That's nice.
All right.
Yeah.
Anyway, that's quite a plan, isn't it, Brian?
Yeah.
Quite a plan.
All right.
Well, I'm excited for it.
Okay.
Well, next topic is a little more lighthearted.
It's about fish.
Pike, to be specific.
No, it's about PDFs.
So it's just a cool project I saw, noticed.
Pike PDF.
It's a Python library for reading and writing PDF files.
What's the big deal?
We've had these before.
But this is based on QPDF, which is a C++-based library, and it's still actively maintained. So it's probably pretty fast. Well, actually, I'm assuming it's fast if it's C++ in the background. But it's also just pretty nice and elegant to do things with. And the documentation has this nice fish, which is good; I always like cool logos. But some of the neat things that you can do with it. So it's recommending that you not use it if you're just writing PDF files; there are other things you can use for that, what was it, like ReportLab, to write PDFs.
But if you're having to read or modify PDFs, then this is where it shines.
You can do things like copy pages from one PDF to another, split and merge PDFs,
extract content out of PDFs.
Like if you're using it for data stuff,
you get a report in PDF and you're trying to pull the information out,
you can use it for that.
Or images, you can pull all the images out of a PDF file.
Or, this is kind of cool,
you can replace images in a PDF file and generate a new one without changing anything else about the file. It's kind of neat.
So just kind of a neat,
if people are working with reading
or modifying PDF files,
maybe check this one out.
Yeah, this looks great.
The fact that it's in C++,
I'm guessing it's probably standalone.
I remember I've done some PDF things before
and it felt like I had to install some OS level thing
that it shelled out to.
So this is cool.
Yeah.
And, something nice: on the readme, it has a comparison of some of the different PDF libraries that you could use.
And some of the reasons why you might want this one,
like it supports more versions.
I didn't realize that like one of these libraries
I've heard of before, PDFRW,
doesn't support the newer versions.
So bummer.
And then also password-protected files. It supports those, but not public-key ones, just normal passwords.
Straight passwords.
Yeah.
Yeah.
That's great.
So it's kind of neat.
I also like the measure of actively maintained,
the commit activity per year over the years,
something like that.
Oh, right.
That's kind of interesting.
Yeah.
It's an interesting metric.
It seems good.
I haven't really thought about it lately, but yeah.
Nice.
All right.
Yeah.
This is a great one.
Well, so that's it for our main items.
Yeah. What else you got? great one. Well, so that's it for our main items. Yeah.
What else you got?
Any extras?
Well, last week we talked about the critical packages.
Or at some recent episode.
Yeah, last week we talked about critical packages.
Yesterday or last week, depending on how you consume this.
Exactly.
Yeah. So I was surprised to find out that pytest-check, the plugin I wrote, was one of those. I'm like, really? Because it's like the top 1%. So if anybody's curious, I wanted to just highlight that a little bit. So pytest-check is a plugin that allows multiple failures per test. And one of the best ways, it's a secondary way that one of the contributors added, is you can use it as a context manager. You can say like with check, and then do an assert, and you can have multiple of those within a test. I like the one-liner even. That's, yeah, nice. And this is totally, like, Black will totally reformat this if you ran it through Black, but it's nice. You'd have to block it out anyway. I was like, how could it be? Well, I was curious where on the list it was. So there's a place, hugovk has a top PyPI packages list, and it's updated, I think, just once a month or something.
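That with check: pattern, keep running after a failed assertion and report them all at the end, can be sketched with a small context manager. This is a toy illustration of the idea, not pytest-check's real implementation:

```python
class Check:
    """Collects assertion failures instead of stopping at the first one."""

    def __init__(self):
        self.failures = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is AssertionError:
            self.failures.append(str(exc) or "assertion failed")
            return True   # swallow it: the test keeps running
        return False      # let any other exception propagate


check = Check()

with check:
    assert 1 + 1 == 3, "math is broken"
with check:
    assert "py" in "python"          # passes, records nothing
with check:
    assert [] == [0], "lists differ"

print(len(check.failures))   # 2
print(check.failures[0])     # math is broken
```

In real pytest-check, the plugin reports all the collected failures when the test finishes instead of the test passing silently.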
But you can do the top 5,000.
Yeah, it's the top 5,000 or 1,000 or 100.
And so I was curious about where on the list I was.
I'm number 1,677, so kind of far down the list.
But, hey, we're just talking. It's still in the top third of the top one percent.
That's pretty awesome.
Pytest is number 72.
That was pretty neat.
And Pydantic, which we covered, was, I just checked, 117.
But there are 57 pytest plugins that show up in the top 3,500.
So that's pretty neat.
That is pretty neat.
That's all I got for extras.
All right.
Well, I have zero extras.
So mine are finished as well.
How about a joke?
Yeah.
Great.
All right.
I told you we're coming back to it.
So this one comes from Netta,
Netta Code Girl at Netta, N-E-T-A dot M-K.
And let me just pull this one up here.
All right.
So this one is, there's this colleague here.
Can I make this?
There we go.
Make it a little bigger.
There's the two women who are developers, Netta and her unnamed friend who always has
gotten in trouble with the elevator last time, basically.
And there's this sort of weird manager looking guy that comes in and says, I tested your
chatbot, but some of its replies are really messed up.
Well, that's what testing is all about.
I'll go through the logs later, says one of the girls.
No, no, no, no, no, no, no, no, no need.
Check out the faces.
She's like, excuse me, I'm not even sure I want to open the logs now.
Yeah, yeah, don't look at the logs.
That's what testing is for. I'll go through the logs. Well, yeah, she's got some good ones in her list there. So, love it. Yeah, I like the art too. Nice art. I do too. It is. So, that was our podcast. Thanks for being here. Thank you. Yeah, you bet. See you next week. See you next time.