Python Bytes - #294 Specializing Adaptive Interpreters in Full Color
Episode Date: July 26, 2022
Topics covered in this episode: Specialist: Python 3.11 perf highlighter, tomli "A lil' TOML parser", Pydantic V2 Plan, pikepdf, Extras, Joke
See the full show notes for this episode on the website at pythonbytes.fm/294
Transcript
I am pulling off a very, very cool trick.
I just want to point out before we get started.
Okay.
On the TalkPython channel,
I'm doing a podcast with Anthony Shaw and Shane from Microsoft
about Azure and Python and some CLI stuff they built in FastAPI.
And at the exact same time, I'm doing this one here.
They're both streaming live.
I don't know how that's happening.
The other one was recorded two months ago, and we couldn't release it because some of the things weren't finished yet.
So I just, I hit go on that.
The real one, if you're bouncing around, the real one is here.
Okay.
So join us here.
Anyway, with that, you ready to start a podcast?
Yeah, definitely.
Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 294, recorded July 12th,
2022. I'm Michael Kennedy. And I am Brian Okken. It's just us this week, or, this today. It's
just us. Yeah. Yeah. I don't know. Dean out of the audience asks, is this a daily podcast show now?
I'm a little bit torn about it. I feel like we almost could do a daily show,
but then I think what it might take to do a daily show,
knowing how much work a weekly show is.
No, it's not a daily podcast.
No.
It might be fun to do sometime,
just do like a full week or something.
Right, exactly.
Just a super, there's so much news.
We're seeing it every day for the week.
But just like the same topics, like six days in a row.
Just do them over.
Yeah.
Exactly.
Exactly.
All right.
Am I up first this week?
You are.
Yes.
Right on.
Well, let me tell you about something special.
Specialist.
Okay.
Just last week, I believe it was, I interviewed Alex Waygood, who did the write-up for the
Python Language Summit.
And as part of the topics we were discussing, you know, the Python Language Summit and Python
this year is focusing a lot on performance and what's called the Shannon Plan.
So this is Mark Shannon's plan to make Python five times faster over five releases.
It's got a ton of support at Microsoft.
Guido van Rossum's there working on it, but they've hired like five or six
other people who are full-time working on making Python faster now. So awesome, awesome. Thank you
for that. However, one of the things that made Python 3.11 fast is some of the early work they
did. And it comes down to PEP 659, a specializing adaptive interpreter. So let me tell you about this feature, this performance improvement first, and then we'll
see what specialist is about, because it's about understanding and visualizing this behavior.
Okay.
So one of the things that is a problem with Python, because it's dynamic and its types
can change and what can be passed could vary.
I mean, you could have type hints,
but you can violate the type hints all day long
and it's fine.
So what the interpreter has to do is say,
well, we're gonna do all of our operations super general.
So if I have a function and it's called add
and it takes X and Y and it returns X plus Y,
seems easy, but is that string addition?
Is that numerical addition?
Is that some custom operator overloading
with a dunder add or whatever it is in some type?
If it fails in one way, you kind of got to reverse it.
Like there's all this unknown, right?
Yeah.
What if you knew?
What if you knew those were integers
and not classes or not strings?
You could run different code.
You wouldn't have to first figure out what
they are. Are they compatible? Do you do the add in the low level CPython internals or do you go
to like some Python class and do it? You could be much more focused. Additionally, if it was
adding for a list, you could say, well, if I know they're lists, what we just do is go list.extend and we give it the other list,
right?
We don't hunt around and figure out all this other stuff.
So that's the general idea of the specializing interpreter is it goes through and it says,
look, we don't know for sure what could be passed here, but if it looks like over and
over, we're running the same code and it's always the same types.
Is there a way we could specialize those types, right?
Is there a way that we could put specific code for adding numbers or specific code for
combining lists?
And this is called adaptive and speculative specialization.
Okay.
Okay.
And my favorite part of it, when it's performed, it's called the quickening.
Quickening is the process of replacing slow instructions
with faster variants.
So kind of like I said,
it has some advantages over immutable bytecode.
It can be changed at runtime.
Like you see, we're always adding integers.
It can use super instructions that span lines
or take multiple operands.
And it does not need to handle tracing, as it can fall back to the original bytecode for that. Okay. So there's a whole bunch of stuff going on here. Like, the example they give is you might want to specialize LOAD_ATTR.
So LOAD_ATTR is a way to say, give me the attribute that this thing contains. But what is the thing?
One of the things you might do is you might realize it's an instance attribute, and then you would call LOAD_ATTR_INSTANCE_VALUE. You might realize it's a module attribute, and you might call LOAD_ATTR_MODULE, or slot, and so on, right? But if you knew, you don't have to go through first the abstract step and then figure out which of these it is. You just do the thing that it is. Okay. So that's the idea of this PEP. This is one of the things that's making Python 3.11 faster.
Awesome.
So to the main topic.
Okay.
And I'll just, just as a note, I'm saying okay as if I understand what you just said, but most of it just went by.
It's all right.
I think we'll, let's, let's look at pictures.
Okay.
All right.
So this thing by Brandt Bucher is called Specialist,
and it's about visualizing this specializing adaptive interpreter.
Oh, okay.
Good.
Okay.
So it says Specialist uses fine-grained location information
to create visual representations of exactly where and how
CPython 3.11's new specializing adaptive interpreter optimizes your code.
And it's not just interesting,
it has actionable information. So for example, see here, and you've got to pull up the website
if you're just listening. If you see in that website, you'll see some color. You'll see green,
less green, yellow, orange, and all the way to red. So there's two aspects. There's sort of a darkness as well as a color.
So the most, like where Python could take advantage of this feature,
you see green.
Where it can't, you see red.
And imagine a spectrum.
It goes like green, yellow, orange, red.
So it's not on or off.
It's how much could it specialize, okay?
Okay.
So what you see here, for example, is it's able to take some values, an integer and a string, and then use the fact that it knows what those are to make certain things faster, like appending to an output and doing some character operations on it. Yeah. Right, it was able to replace that with a different runtime behavior because of this quickening.
All right.
So let's skip down here.
I gave you a bit of the background.
So it says, let's look at this example.
We have F to C, which converts Fahrenheit to Celsius.
And what it does is, okay, we're going to take an F and it has type hints that say float,
float.
Okay.
So, but those don't matter.
So it says we're going to take an f and subtract 32 from it, and then we're going to do simple math: we're going to take that result, that range, that size of temperature there based on zero, and then multiply it by five and divide it by nine. We all learned this in chemistry class or somewhere, or we talked about converting different measurements. Yeah, of course. Yeah, right. So these are straightforward,
but there's actually problems in here
that make it slower and prohibit Python
from quickening it as much as it can be quickened.
Okay.
So if we take this code,
it just runs F to C and C to F
and it gives us some test values and says,
just do it and tell us what happened.
We can run specialists on it and it says,
okay, this X here,
the green areas indicate regions of code
that were successfully specialized
where red areas are unsuccessful.
Like it tried and it failed.
So it says one of the problems is right at the start, the x equals f minus 32. It says, well, we can quicken operations on numerical types that are the same, but for now there's not an int-and-float variant of this; it's got to be float and float. All right, so it says, right, you could have gotten a faster operation there, but because the types didn't match, you won't. But then what it did get out is an x, which is a float, and it's going to do some stuff and it could sort of make it better. But it said, look, here's some multiplication again by an integer and a float.
So that's not quickened.
And this division is apparently never quickened.
So what can we do?
Well, with that information, you can say, well, what's the problem with subtracting 32?
Well, it wasn't a float.
What if I said 32.0?
Oh, yes.
All right.
That gets replaced by faster code.
Oh, nice.
Right?
Yeah.
So that's pretty nice. And for the return, it was adding like x plus 32 for the other direction, and now it's 32.0; that's faster. Okay, well, what else? Now you can see, when we did that part of the conversion, x times 5 divided by 9: if we put a 5.0, that gets faster still, but the divide is never quickened. Okay, well, what if we put the divide in parentheses? It doesn't really matter if it's x times 5, divided by 9, or x times (5 divided by 9), right? These are mathematically equivalent, but they're not equivalent to Python, because that second form leverages constant folding, right? 5 divided by 9 is pre-computed in Python
to be a float.
Okay.
Right, at parse time, right?
That's just how it works with constants.
If it says it can do math with constants ahead of time,
it does it.
So that becomes a float
and then float times float is now quickened, right?
Isn't this cool?
The way you can apply this
and actually make your code faster,
not just go, oh, it's interesting.
It must be quickening it there.
But it's actionable.
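Here's the gist of that walkthrough as code. The f2c rewrite (32 → 32.0, 5 → 5.0, and parenthesizing the division) is the change they're describing from the Specialist example; the co_consts check at the end is my own addition to show the constant folding, and you'd run the specialist command on the file under CPython 3.11 to see the coloring:

```python
# Before: int constants and a bare division keep 3.11 from specializing
# the float math.
def f2c(f: float) -> float:
    x = f - 32
    return x * 5 / 9


# After: float constants, and the division moved into a constant
# expression that CPython folds into a single float at compile time.
def f2c_fast(f: float) -> float:
    x = f - 32.0
    return x * (5.0 / 9.0)


# Same math either way:
print(f2c(212.0), f2c_fast(212.0))   # 100.0 100.0

# Evidence of the constant folding: the pre-computed float shows up in
# the compiled code object's constants.
print((5.0 / 9.0) in f2c_fast.__code__.co_consts)   # True
```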
It is really pretty cool.
And I'd really like to see this incorporated
into an editor or something to say,
your code will be faster if you just add a 0.0 here
or something like that.
And it's going to become a float anyway.
It doesn't matter.
It just, why would you write 32.0 when you just meant 32?
Seems more precise to say 32.
Because I'm used to doing that, to thinking, if it's, okay, well, me personally, if I know it's going to be float math, I usually do 0.0. But maybe that's not a normal thing. You're such a C programmer.
So, all right, well, I think this is really cool. This is Specialist. And, you know, I don't know if I have any code that does math at that fine-grained a level that I really care about. But maybe, you know, if you're in charge of a library where you've got a tight loop, or you do a lot of math or science stuff where it matters, this can be really useful. And what's cool is it's not like, switch to Rust or switch to C or switch to Cython and it'll take effect. No, this is like straight Python code. This is just, how do I take most advantage of what is already happening for performance boosts in 3.11 that we haven't had before? And I think it's going to be just one more workflow step. So you've got your code, your whole thing is a little bit slower than you'd like it to be. You throw a profiler on it. You see the bottleneck areas that you could improve and you think, should I rewrite some of this in Rust or C, or, you know, what should I do? Well, first off, let's try throwing this at it and have the optimizer from 3.11 help you out. And yeah, so I can definitely see that this is going to be part of people's workflow.
But yeah, profile first.
I agree that you want to profile first.
Don't do it everywhere.
Yes, exactly.
Because while it's fun to do this,
only focus where it's going to matter.
Don't optimize a bunch of stuff that doesn't.
So Brian out in the audience says, a different Brian, is there a plan to do lossless type conversion, or maybe flake8 can make this kind of suggestion?
Yeah, exactly. Yeah. I'm not really sure. You don't want to write code where you get different outputs, probably, right? But everything that was happening here, you ended up with the same outcome anyway. It's just like, well, do I do the division first or the multiplication? Or do I start with an int that results after some addition or subtraction with a float? Or do I just make them all floats, right?
I feel like it's, in most cases,
it shouldn't be changing the outcome.
So, yeah, cool.
Anyway, that's what I got for the first one.
How about you?
Well, kind of sticking with a 3.11 theme so far.
Well, we can use TOML now. But in 3.11, we are going to have tomllib be part of Python 3.11 with PEP 680. And we covered that in episode 273. And one of the things we did mention is that tomllib is based on tomli. But tomli you can use right now. So a lot of projects are switching to use tomli as their TOML parser, to read pyproject.toml or to read their own config file. And so I just wanted to highlight it. tomli is "a lil' TOML parser"; it's a cute little tagline on the project. But I was reminded of it because the Real Python people put out, it looks like it's actually Geir Arne, sorry, I'm not going to try to pronounce that last name. Real Python wrote an article called Python and TOML: New Best Friends.
And it's a very comprehensive article, but I really love at least the first three parts of it: using TOML as a config format, getting to know key-value pairs, and load TOML with Python. Because this is kind of what you're going to do with it. You're going to write config files for something. And this is a great introduction of TOML for Python. And that's kind of what we care about, right? So it goes through just getting used to what TOML looks like, what a config file looks like, talking about how it's key-value stuff, and all the keys, even if you put a number there or something, it's going to be a string; all the keys get converted to strings, even if they don't look like them. And they're UTF-8, so you can use Unicode in there as well, which is kind of neat. Put your emojis in there. Yeah, well, can you? Are emojis UTF-8? I think mostly. Okay, many of them are. Interesting. That'll be fun, to put emojis in here. I don't know.
What mode are we running? Are we running in cow mode or lizard mode?
I'll do lizard. Yeah. Okay. Well, if you're running in lizard mode, you need to check out.
Okay. I got to try that to see. I should have done that before.
Oh my gosh. I think it's almost both horrible and amazing to imagine writing config files to, like, put it in lizard mode. Do it.
Yeah. One of the things that I didn't know before reading this article, one of the things I didn't know you could do in TOML, because I just use it cursorily, I use it with pyproject.toml and that's about it. It talks about the normal how-to-read stuff, but one of the things is, oh, what was I going to talk about? Arrays. You can do arrays of things, which are neat, and tables, and arrays of tables, which are these double-bracket things. And then you can do dotted stuff, so if you have like, how was it, user and user.player, these will show up as like, you know,
sub-dictionary key things.
And so one of the things that I,
and I played with it this morning,
and it really, I should have had something to show,
but the thing I like to do is to just read it, just like this article talks about reading it, just read the TOML file into Python and print it. And it'll print out as a dictionary. And then you can create
whatever format you want for your TOML file. And then you can just see what it's going to look like.
And then you know how to access it. That's one of the best ways to do that.
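That read-and-print workflow is a few lines with tomli, or the stdlib tomllib on 3.11+. The config contents below are made up for illustration, including the dotted [user.player] table, the [[servers]] array of tables, and a Unicode value:

```python
try:
    import tomllib  # stdlib in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # pip install tomli on older Pythons

config = tomllib.loads("""
title = "demo"
1234 = "keys are always strings, even numeric-looking ones"
mood = "🦎"               # values are UTF-8, emoji included

[user]
name = "brian"

[user.player]             # dotted tables become nested dicts
level = 3

[[servers]]               # arrays of tables: double brackets
host = "a.example.com"

[[servers]]
host = "b.example.com"
""")

print(config["1234"])              # the key comes back as the string "1234"
print(config["user"]["player"])    # {'level': 3}
print([s["host"] for s in config["servers"]])
```

When reading from disk, note that tomllib.load() wants the file opened in binary mode, e.g. tomllib.load(open("pyproject.toml", "rb")).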
That's awesome. What an interesting format. That's pretty in-depth. And a blast from the past: Ashley, hey, Ashley says UTF-8 can encode any Unicode character, emoji your heart out. Oh yeah, you could do like, you know, is it in heart mode, heart equals true, heart equals false, or for the optimizer, you could do a flame emoji equals true. Exactly. So I love it. Yeah. I think, look, we have not leveraged configuration as emoji sufficiently. Oh yeah. I think pytest should rewrite all of its configs as emoji items. Just do a PR. I'm sure they'll take it.
Yeah. Yeah. All right.
Let me tell you about our sponsor for this week before we move on.
So this week is brought to you by Microsoft Founders Hub.
In fact, they are supporting a whole bunch of upcoming episodes.
So thank you a whole bunch to Microsoft for startups here.
Starting a business is hard. By some estimates, over 90% of startups go out of business within their first year.
With that in mind, Microsoft for startups set out to understand what startups need to be successful and to create a digital platform to
help overcome those challenges. Microsoft for Startups Founders Hub. Their hub provides all
founders at any stage with a bunch of free resources to help solve challenges. And you get
technology benefits, but also really importantly, access to expert guidance and skilled resources, mentorship and networking connections, and a bunch more.
So unlike a bunch of other similar projects in the industry, Microsoft for Startup Founders
Hub does not require startups to be investor backed or third party validated to participate.
It's free to apply. And if you apply and get in, then you're in. It's open to all.
So what do you get if you join or apply and then get accepted?
So you can speed up your development with access to GitHub, Microsoft Cloud, and the ability to unlock credits over time, as in you get over $100,000 worth of credits over the first year if you meet a bunch of milestones, which is fantastic.
Help your startup innovate.
Founders Hub is partnering with companies like OpenAI,
a global leader in AI research and development
to provide benefits and discounts too.
Neat.
Yeah.
Through Microsoft Startup Founders Hub,
becoming a founder is no longer about who you know.
You'll have access to the mentorship network,
giving you access to a pool of hundreds of mentors
across a range of disciplines,
areas like idea validation, fundraising,
management coaching, sales marketing,
as well as specific technical stress points.
To me, that's actually the biggest value: the networking and mentor side. So you'll be able to book a one-on-one meeting with these mentors, many of whom are founders themselves.
Make your idea a reality today with the critical support you'll get from Microsoft for Startups Founders Hub. Join the program at pythonbytes.fm slash founders hub; the link will be in your player show notes. Nice. Yeah, cool. Indeed. All right. I guess I'm up
next with this order we got and oh my goodness, Samuel Colvin take a bow because he put out a plan
for what's happening with Pydantic version two. But the reason I say take a bow is this is one detailed plan
that is really, really thought through,
thought out, backed up with a bunch
of GitHub discussions and so on.
So the idea is Pydantic started out
as an interesting idea and surprise, surprise,
a bunch of people glommed onto it,
probably more than it was originally envisioned to be.
So for example,
SQLModel from Sebastian Ramirez is like, Pydantic models are now our ORM to the database, with all the interesting stuff that ORMs have.
And Roman Wright said,
guess what?
We could do that for MongoDB as well.
Same with the Pydastic thing we recently spoke about.
And then Sebastian Ramirez also is like, hey, FastAPI, this can be both our data exchange as well as our documentation. And I was like, oh my goodness,
what's going on here? So since there's a bunch of stuff on the insides that could be better,
let's say, or maybe time to rethink this. So in this plan, it talks about what they'll add,
what they'll remove, what will change, some of the ideas for how long it will take and so on. Interesting. Yeah. Here's a, here's a pretty significant thing. I'm currently taking a kind of sabbatical after leaving my last job to work on
this, which goes until October. So that's a big commitment to, I'm going to help make Pydantic
better. So it sounds familiar. It sounds a bit like Rich and Textual and those types of
things as well. But this is a big, big commitment from Samuel and he's really doing a ton of work.
It says people seem to care about my project. It's downloaded 26 million times a month.
Wow. It's insane. Yeah. That's awesome. That's kind of incredible. It is. And so it says,
here's the basic roadmap. Implement a few
features in what's now called the Pydantic core. We just had Ashley, who as we saw is out in the audience, hey, Ashley, give a bit of a shout-out to this feature. And I also want to credit a couple of other people, because Douglas Nichols and John Fagan also let me know that this was big news coming. So thank you all for that. The Pydantic core is being rewritten in Rust,
which doesn't mean you have to know or do anything.
It just means you have to pip install something.
You get a binary compiled thing that runs a lot faster.
Okay, so more on that in a second.
First, they're working to get 1.10 out and basically merge every open PR that makes sense and close every PR that doesn't make sense, and then profusely apologize for why your PR that you spent a long time making was closed without merging. Some other bookkeeping things, start tearing the Pydantic code apart and see how many existing tests can still be made to pass, and then eventually release Pydantic V2. The goal is to have this done by October, probably by the end of the year for sure. A couple of things worth paying attention to.
There are a bunch of breaking changes in here. A lot of things are being cleaned up, reorganized, renamed, some removed. Like from_orm; people might be using that with SQLAlchemy, and that's being removed, for example, and so on. So if you depend heavily on Pydantic, especially if you build a project like Beanie that depends heavily on Pydantic, you are going to need to look at this
because some of the stuff won't work anymore.
But let's highlight a couple of things here.
Performance.
This one is really important
because this is the data exchange level for FastAPI.
This is the database transformation level.
When I do a query from the database,
what comes back comes back in some raw form
and then it's turned into a PyDantic model.
And those are computationally expensive things that happen often.
And in general, Pydantic version two is about 17 times, 1,700%, faster than V1 when validating models in a standard scenario. It says between four and 50 times faster than Pydantic V1.
That's cool, right?
Yeah.
That alone should make your ears perk up and go,
excuse me, my ORM just got 17 times faster?
Wait a minute, I'm liking this.
I know this is not the only thing that happens at ORM level,
but the ones I called out that depend heavily on it,
that's in the transformation path.
So this is important.
Yeah.
This is actually, I'm super impressed.
I have not, I normally don't even see this sort of advanced planning in commercial projects.
Yes.
Oh yeah.
You could do a whole business startup that doesn't have the amount of thought that went
into like what's happening in the next version of PyDantic.
It's ridiculous.
Yeah.
It's incredible.
I was serious when I said take a bow.
It really lays out, opens a discussion about certain things and so on.
So like another one is strict mode.
I think I even saw a comment in the chat about it.
So one of the things I actually like about Pydantic, but under certain circumstances,
I can see why you would not want it: if you have something you say is an integer field, and then you pass 123, the number, great. But if you also pass quote 123, a string, Pydantic will magically parse that for you. Like, this happens all the time on the internet. Like, a query string has a number, but query strings are always strings; there's no way to have anything but strings. Yeah. So you've got to convert them, right? So this automatically does that. But if you don't want that to happen, if you want it to say, you gave me a string, it's invalid, you can turn on strict mode, which is off by default, I believe. There's also a bunch of, go ahead. So strict mode does the conversion, or strict mode does not?
Strict mode won't do the conversion. It says, you said it's an int, you gave me a string.
Nope. Rather than, could it be an integer? Let's try that first. You know what I mean?
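A toy sketch of that lax-versus-strict distinction, in plain Python rather than Pydantic's actual API (the function name, flag, and error messages here are made up for illustration):

```python
def validate_int(value, strict=False):
    """Lax mode coerces int-looking strings; strict mode rejects them."""
    # Real ints always pass (bool is excluded since bool subclasses int).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        # Strict: you said int, you gave me something else. Invalid.
        raise TypeError(f"expected int, got {type(value).__name__}")
    try:
        # Lax: "123" from a query string becomes 123.
        return int(value)
    except (TypeError, ValueError):
        raise TypeError(f"cannot coerce {value!r} to int")


print(validate_int("123"))              # 123 — lax mode converts
print(validate_int(123, strict=True))   # 123 — real ints always pass
# validate_int("123", strict=True)      # raises TypeError in strict mode
```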
Yeah.
You know, maybe one of the things you do at the ORM level is put it in strict mode so it doesn't do as much work trying to convert stuff. I don't know if that actually would matter, but it formalizes a bunch of conversions. It has built-in JSON support and different things. Another big thing is this Pydantic core will be able to be used outside of Pydantic classes now, so you can get a significant performance improvement for stuff like adding validation to dataclasses or validating arguments and query strings, or a TypedDict or a function argument or whatever. Yeah. Let's see, next up. Strict mode, we talked about strict mode. Another one is required versus nullable. There's a little bit of ambiguity: if you said something's a string, that means it's required and it can't be None. If you said it's string-or-None, as an Optional[str] or something like that, then basically the behaviors were a little bit different.
So originally, I think this is when typing was pretty new, it said, Pydantic previously had a confused idea of required versus nullable. This mostly resulted from Sam's misgivings about marking a field as optional but requiring a value to be provided for it, while allowing it to be set to None, or something along those lines.
Anyway, there's minor changes around that. Let's see, the final one that I want to cover is namespace stuff. And this is like a whole bunch of things are now getting renamed. So for example, if you implemented or overrode validate_json, it's now model_validate_json. If you had isinstance, it's now model_isinstance. So there's a bunch of these changes all over the place that look like they're going to cause breaking changes. They're easy to fix, just change the name, but you know, it's not nothing. Also parse_file. I still love his candor here: parse_file, this was a mistake, it should have never been in Pydantic, we're removing it. Okay. Partially replaced by this other thing.
Anything else it did was a mistake.
from_orm, this has been moved somewhere else.
Schema and so on.
So there's a lot of stuff that people were using here.
So just have a look.
Try it out.
Don't just go, oh, then version 2 is out.
Is this going to work?
This is going to have some significant changes.
And another reason why it's really awesome that he goes through so much detail
is because there's going to be stuff that breaks.
So it's a breaking interface change.
And so, yeah, it's cool that it's this detailed.
And a couple of things to notice.
Let's see, somebody else in the chat mentioned,
Richard mentioned, and he has emojis in the headers.
Yeah, there's emojis in the headers.
And I got to say like the navigation
in the table of contents, very cool.
It goes to like light gray for areas you've already seen.
And then-
Oh, that's interesting.
It's a cool thing.
Yeah, it's quite cool.
I think going on and on, but two real quick things.
One, there'll be no pure Python implementation of the core.
It's always Rust, but they list out the platforms where it'll be compiled to, including WebAssembly.
Oh, nice.
They previously had some Cython in what was supposed to be pure-Python Pydantic. And so now, a kind of bonus is the Pydantic package becomes a pure Python package, whereas previously it wasn't. So they've
taken like all of that behavior and put it under this core thing that ships as a Rust binary. And
now instead of doing some Cython middle ground, it's pure Python again. So that's interesting
refactoring, I think. Yeah. Yeah. And finally, documentation.
When you get a validation error, it gives you a link to the documentation in the JSON error message.
That's pretty cool.
That's nice.
All right.
Yeah.
Anyway, that's quite a plan, isn't it, Brian?
Yeah.
Quite a plan.
All right.
Well, I'm excited for it.
Okay.
Well, next topic is a little more lighthearted.
It's about fish.
Pike, to be specific.
No, it's about PDFs.
So it's just a cool project I saw, noticed.
Pike PDF.
It's a Python library for reading and writing PDF files.
What's the big deal?
We've had these before.
But this is based on QPDF, which is a C++-based library, and it's still actively maintained. So it's probably pretty fast. Well, actually, I'm assuming it's fast if it's C++ in the background. But it's also just pretty nice and elegant to do things with. And the documentation has this nice fish, which is good; I always like cool logos. But some of the neat things that you can do with it. So it's recommending that you not use it if you're just writing PDF files; there are other things you can use for that, what was it, like ReportLab, to write PDFs.
But if you're having to read or modify PDFs, then this is where it shines.
You can do things like copy pages from one PDF to another, split and merge PDFs,
extract content out of PDFs.
Like if you're using it for data stuff,
you get a report in PDF and you're trying to pull the information out,
you can use it for that.
Or images, you can pull all the images out of a PDF file.
Or, this is kind of cool,
you can replace images in a PDF file and generate a new one without changing anything else about the file. It's kind of neat.
So just kind of a neat,
if people are working with reading
or modifying PDF files,
maybe check this one out.
Yeah, this looks great.
The fact that it's in C++,
I'm guessing it's probably standalone.
I remember I've done some PDF things before
and it felt like I had to install some OS level thing
that it shelled out to.
So this is cool.
Yeah.
And, something nice: on the readme, it has a comparison of some of the different PDF libraries that you could use.
And some of the reasons why you might want this one,
like it supports more versions.
I didn't realize that like one of these libraries
I've heard of before, PDFRW,
doesn't support the newer versions.
So bummer.
And then also password-protected files. It supports those, but not public-key ones, just normal passwords.
Straight passwords.
Yeah.
Yeah.
That's great.
So it's kind of neat.
I also like the measure of actively maintained,
the commit activity per year over the years,
something like that.
Oh, right.
That's kind of interesting.
Yeah.
It's an interesting metric.
It seems good.
I haven't really thought about it lately, but yeah.
Nice.
All right.
Yeah.
This is a great one.
Well, so that's it for our main items.
Yeah. What else you got? great one. Well, so that's it for our main items. Yeah.
What else you got?
Any extras?
Well, last week we talked about the critical packages.
Or at some recent episode.
Yeah, last week we talked about critical packages.
Yesterday or last week, depending on how you consume this.
Exactly.
Yeah. So I was surprised to find out that pytest-check, the plugin I wrote, was one of those. I'm like, really? Because it's like the top 1%. So if anybody's curious, I wanted to just highlight that a little bit. So pytest-check is a plugin that allows multiple failures per test. And one of the best ways, it's a secondary way that one of the contributors added, is you can use it as a context manager. You can say like with check, and then do an assert, and you can have multiple of those within a test. I like the one-liner even. That's, yeah, nice. And this is totally, like, Black will totally reformat this if you ran it through Black, but it's nice. You'd have to block it out anyway. I was like, how could it be? Well, I was curious where on the list it was. So there's a place, hugovk has a top PyPI packages list, and it's updated, I think, just once a month or something.
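That with check: pattern, keep running after a failed assertion and report them all at the end, can be sketched with a small context manager. This is a toy illustration of the idea, not pytest-check's real implementation:

```python
class Check:
    """Collects assertion failures instead of stopping at the first one."""

    def __init__(self):
        self.failures = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is AssertionError:
            self.failures.append(str(exc) or "assertion failed")
            return True   # swallow it: the test keeps running
        return False      # let any other exception propagate


check = Check()

with check:
    assert 1 + 1 == 3, "math is broken"
with check:
    assert "py" in "python"          # passes, records nothing
with check:
    assert [] == [0], "lists differ"

print(len(check.failures))   # 2
print(check.failures[0])     # math is broken
```

In real pytest-check, the plugin reports all the collected failures when the test finishes instead of the test passing silently.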
But you can do the top 5,000.
Yeah, it's the top 5,000 or 1,000 or 100.
And so I was curious about where on the list I was.
I'm number 1,677, so kind of far down the list.
But, hey, we're just talking. It's still in the top third of the top one percent.
That's pretty awesome.
Pytest is number 72.
That was pretty neat.
And Pydantic, which we covered, was, I just checked, 117.
But there are 57 pytest plugins that show up in the top 3,500.
So that's pretty neat.
That is pretty neat.
That's all I got for extras.
All right.
Well, I have zero extras.
So mine are finished as well.
How about a joke?
Yeah.
Great.
All right.
I told you we're coming back to it.
So this one comes from Netta,
Netta Code Girl at Netta, N-E-T-A dot M-K.
And let me just pull this one up here.
All right.
So this one is, there's this colleague here.
Can I make this?
There we go.
Make it a little bigger.
There's the two women who are developers, Netta and her unnamed friend who always has
gotten in trouble with the elevator last time, basically.
And there's this sort of weird manager looking guy that comes in and says, I tested your
chatbot, but some of its replies are really messed up.
Well, that's what testing is all about.
I'll go through the logs later, says one of the girls.
No, no, no, no, no, no, no, no, no need.
Check out the faces.
She's like, excuse me, I'm not even sure I want to open the logs now.
Yeah, yeah, don't look at the logs.
That's what testing is for. I'll go through the logs. Well, yeah, she's got some good ones in her list there. So, love it. Yeah, I like the art too. Nice art. I do too. It is. So, that was our podcast. Thanks for being here. Thank you. Yeah, you bet. See you next week. See you next time.