Python Bytes - #435 Stop with .folders in my ~/

Episode Date: June 9, 2025

Topics covered in this episode:
- platformdirs
- poethepoet - “Poe the Poet is a batteries included task runner that works well with poetry or with uv.”
- Python Pandas Ditches NumPy for Speedier PyArrow
- pointblank: Data validation made beautiful and powerful
- Extras
- Joke

Watch on YouTube

About the show

Sponsored by us! Support our work through:
- Our courses at Talk Python Training
- The Complete pytest Course
- Patreon Supporters

Connect with the hosts
- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Michael #1: platformdirs
- A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
- Why the community moved on from appdirs to platformdirs
- At AppDirs: Note: This project has been officially deprecated. You may want to check out pypi.org/project/platformdirs/ which is a more active fork of appdirs. Thanks to everyone who has used appdirs. Shout out to ActiveState for the time they gave their employees to work on this over the years.
- Better than AppDirs:
  - Works today, works tomorrow – new Python releases sometimes change low-level APIs (win32com, pathlib, Apple sandbox rules). platformdirs tracks those changes so your code keeps running.
  - First-class typing – no more types-appdirs stubs; editors autocomplete paths as Path objects.
  - Richer directory set – if you need a user’s Downloads folder or a per-session runtime dir, there’s a helper for it.
  - Cleaner internals – rewritten to use pathlib, caching, and extensive test coverage; all platforms are exercised in CI.
  - Community stewardship – the project lives in the PyPA orbit and gets security/compatibility patches quickly.

Brian #2: poethepoet - “Poe the Poet is a batteries included task runner that works well with poetry or with uv.”
- from Bob Belderbos
- Tasks are easy to define and are defined in pyproject.toml

Michael #3: Python Pandas Ditches NumPy for Speedier PyArrow
- Pandas 3.0 will significantly boost performance by replacing NumPy with PyArrow as its default engine, enabling faster loading and reading of columnar data.
- Recently talked with Reuven Lerner about this on Talk Python too.
- In the next version, v3.0, PyArrow will be a required dependency, with pyarrow.string being the default type inferred for string data.
- PyArrow is 10 times faster.
- PyArrow offers columnar storage, which eliminates all that computational back and forth that comes with NumPy.
- PyArrow paves the way for running Pandas, by default, on Copy on Write mode, which improves memory and performance usage.

Brian #4: pointblank: Data validation made beautiful and powerful
- “With its … chainable API, you can … validate your data against comprehensive quality checks …”

Extras

Brian:
- Ruff rules: Ruff users, what rules are you using and what are you ignoring?
- Python 3.14.0b2 - did we already cover this?
- Transferring your Mastodon account to another server, in case anyone was thinking about doing that
- I’m trying out Fathom Analytics for privacy friendly analytics

Michael:
- Polars for Power Users: Transform Your Data Analysis Game Course

Joke: Does your dog bite?

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 435, recorded June 9th, 2025, and I am Brian Okken. And I'm Michael Kennedy. And this episode is sponsored by us. So please check out all the awesome courses at Talk Python Training and lots of different options for learning pytest at pythontest.com. So check that out. And also thank you to our Patreon supporters. They've been sticking around for a long time
Starting point is 00:00:30 and we really appreciate them. If you'd like to connect with us, more about this later, but if you'd like to connect with us, please check out the links in the show notes. Links to both Bluesky and Mastodon are there. And if you would like to join us live, you can head on over to pythonbytes.fm/live, and it'll show you how to do that. There's a link at the top. It's really easy. And then also, if, when you're listening, you want to check out all the cool things that we talked about but don't want to write them down, no worries, just go ahead and sign up to be part of the newsletter and we'll
Starting point is 00:01:09 send those directly to your inbox. So with that, do you have a cool item for us, Michael? I do. I have some stuff I want to talk about, and it's an update on things I've spoken about before, but it could have been six years ago. I can't remember when we covered this, and I for some reason didn't do a search. So here's the deal. You are writing an application, and you're doing things like, hey, I need to have my app save some content in a file.
Starting point is 00:01:38 I don't want the user to access it. I want my app to be able to get to it. And that's pretty much it. Like maybe a SQLite DB file, right? That's not a thing that the users should be tracking down. You don't want to put it in their My Documents on Windows or somewhere in the user profile on Mac or Linux or something like that, right? You just need to save it somewhere. It needs to be associated with the user. They need to have permissions right there. So I previously talked about appdirs, the appdirs package,
Starting point is 00:02:03 a small Python module for determining appropriate platform-specific directories like user directories or stuff like that. So where's your config? Where's your cache directory? So on macOS that's like tilde slash capital-L Library, Caches, then you probably use app name, something like that, those kinds of things. You want to get access to that and put things in there. And it drives me nuts, Brian. I go to my user profile. If I want to look at hidden files, there's a ton of dot-this-app, dot-that-app settings in my
Starting point is 00:02:38 user profile. And they're not there for me. They're there because the designers of those apps were too lazy, usually, too lazy to put that into the appropriate location on that platform, right? There's some that maybe I should mess with, like for example, Oh My Zsh, you might want to go and tweak like your prompt, how that looks or whatever, right? You would get to that from there. But a lot of things, they just do it because, like, well, we'll just do a dot because it's hidden. People won't see that. Just drop it right in there. So I'd love for more people listening to put stuff in the right place, or a better place. Well, okay, I'm guilty of this, but I don't really think about Windows too much. I just think about, in a Linux environment, the home directory is usually the right place. Well, maybe. I mean, let me see what I got in here. I don't think so, but let me do an A.
Starting point is 00:03:29 What do I do with my hidden ones there? You know what, I can do it this way. Hold on, I'll do it this way. So for example... Well, I'm probably doing it the wrong way there too, so you're probably right. Well, maybe, but there's like Studio 3T, which is a MongoDB GUI management tool.
Starting point is 00:03:41 It has all of its settings in here. Is that something that I should be working with? Should I be seeing this? Probably not. It's like a .3t folder and it's got like my roaming history and crap in there. I've got .android for whenever I have that Android SDK doing something. I mean, I can see like I've got a .aws. Maybe it's got things in there, right? But I have a .matplotlib. For real. There's a .matplotlib in here. Why is it here? I don't know. There's a pip-audit cache. Literally. It's called cache. Should I see the cache? No, the cache is not for reading.
Starting point is 00:04:15 It should be in, there's literally a cache folder on all these platforms where it goes, and maybe on Linux it literally is just like there, okay, fine, but on the other ones it shouldn't be, right? Anyway. Okay, maybe I'm ranting a bit hard on this, but you should probably do these things correctly, and I wasn't even thinking about Windows, although Windows does have a different set of places things go, right? So this thing was great, and I recommended appdirs. But if you go up a little bit it says, this project has officially been deprecated. Oh no! No! We were so close. So it says, you may want to check out platformdirs, which is my recommendation. So moving over to platformdirs, this is under tox-dev,
Starting point is 00:04:58 t-o-x dev, so we'll group there, and it's a small Python module for determining the appropriate platform-specific directories, such as I just said. However, it's better in some ways. I don't know if you noticed before, but there's maybe three or four directories on appdirs. This one has documents, downloads, pictures, videos, music, desktop, runtime, et cetera, et cetera, lots of different places that you might want to go to depending on what platform you're on. It has better typing. It has been rewritten to use Path, as in Path objects from pathlib and all that.
Starting point is 00:05:35 It's got CI testing against all the different platforms that it supports. So very, very cool that this not only continues to live, but it's actually gotten much better. Yeah, and the tox-dev group is gonna keep it updated. So cool. Yeah, exactly.
Starting point is 00:05:51 It can't require that much support over time, right? These things are not super dynamic, but I do really appreciate them modernizing it. I think that some of the changes were that basically appdirs got out of sync with Python rather than out of sync with the platforms, right? It was just like, well, it doesn't have types. And so now we got, you know,
Starting point is 00:06:11 and it was just a bunch of stuff like that. So I'm very happy to see it and recommend platformdirs. I already used it to massive success on some projects. I was like, I want to use this and some cool caching things. And it made my app so much better. And guess what? There's not yet another dot-Michael's-app-cache in my tilde slash.
Starting point is 00:06:28 Over to you. I'll have to do this on some of my applications. It's worth considering. So I'm glad you brought it up. All right. I actually want to do a shout-out to Bob Belderbos, because he posted, I think he posted on several platforms,
Starting point is 00:06:44 but I saw it on LinkedIn. It says, I love Makefiles. They save me time and help my teams work in a more uniform way. I'd love to be able to say this. I love Makefiles also, or actually I don't love them. I'm just used to them. So I throw a Makefile into something where that makes sense,
Starting point is 00:07:02 but for a very long time, nobody else on my projects has been familiar with Makefiles. So I can't use them. That would be me. I'd be like, what's a Makefile? Why do we have this? That we do in Python.
Starting point is 00:07:12 So some Python projects, I use tox, because you can do kind of some similar stuff with tox, but it is annoying that it like creates a virtual environment. It's better with uv now because it's fast, but still. So there's a new kid on the block, or at least new to me, that Bob has introduced, and it's called Poe the Poet. And I was a little leery because I thought it would maybe
Starting point is 00:07:37 was too tied to the Poetry project. But even if it is, you don't have to use it with it. I'm using it without it, of course. But anyway, so the thing I love about this is, as an example, I'm leaving this on Bob's post because it's great. In your pyproject.toml, that's where you define the tasks. And this is great because I don't have to have an extra thing, an extra file in my project.
Starting point is 00:08:02 This makes sense to just put it there. So here's an example: a test action, you can just define it as pytest or something, or cov, you can have pytest running with cov, with coverage. The linter is Ruff, which is, it's funny, a lot of people still do this. But actually I remember Ruff better than I remember lint anymore, but you know, it's just me. But anyway, some cool type, oh, he has ty check, for ty. What? Wow, he's on top of that, that is super new. Integrates well with uv. I tried it out, it is pretty slick.
Starting point is 00:08:41 So the Poe the Poet documentation says it's a batteries included task runner that works well with Poetry or with uv. I'm definitely using it with uv. There's a bunch of cool things about how to define this, and the documentation is pretty good. So you can, like Bob said, you just add a section called tool.poe.tasks and then define a bunch of actions.
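As a rough sketch of what such a section can look like, the pyproject.toml fragment below defines a few tasks along the lines Brian describes. The task names and the exact commands are illustrative assumptions, not a copy of Bob's post; `pytest --cov` additionally assumes the pytest-cov plugin is installed.

```toml
# pyproject.toml (fragment); task names and commands are illustrative
[tool.poe.tasks]
test = "pytest"
cov = "pytest --cov"
lint = "ruff check ."

# A sequence task: the referenced tasks run one after another.
check = ["lint", "test"]

# Table form, which also allows help text for the task.
[tool.poe.tasks.typecheck]
cmd = "ty check"
help = "Run the ty type checker"
```

With poethepoet installed as a development dependency, a task would then typically be run as `poe test`, or `uv run poe test` in a uv-managed project.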
Starting point is 00:09:09 There's ways to have like built-up actions, like you would with a Makefile or tox, actually I don't know if you can do this in tox, anyway, you'd have multiple actions that run one after another. The sequences are pretty cool. You can also add help text to things, which is nice. Anyway, but having it right in, I just love that it's right in the pyproject.toml
Starting point is 00:09:32 and it works well with uv, love that. So definitely gonna switch to it for a lot of projects. And there's an example, I'm not sure where the example is right now, but in the kind of getting started, or maybe it was Bob, anyway, adding it to your development dependencies within your pyproject.toml is slick
Starting point is 00:09:54 because then, hopefully you're using development dependencies within your pyproject.toml so that new developers can just install all the tools they need just by using that, so pretty cool. Love it. Yeah, very cool. And by the way, did you catch that the logo of Poe the Poet is Edgar Allan Poe? Yeah, he's the poet. Very interesting. Okay. Next up is some data science. There's an article I'm linking to on The New Stack by
Starting point is 00:10:23 Joab Jacobs, Jackson rather, sorry. And this is that Pandas 3.0 will significantly boost performance by replacing NumPy with PyArrow as the default engine, enabling faster loading and reading of columnar data. So since 2.0, I think, Pandas has had possible support for PyArrow, which is more of an analytics type backend than NumPy, which is more database-y, like traditional rows, like here's an entry in its internal structure, making it much easier to ask questions like, what's the average or the max of this aspect of the data, right? What is
Starting point is 00:11:06 the average sales price for these million rows is way faster in a columnar orientation because you just go right down the data as it is. It makes it slower for row type of operations, but that's less common, right, for data science. It also does much, much faster reading and writing of files and all sorts of things. So I recently spoke with Reuven Lerner about the PyArrow revolution on Talk Python not too long ago and that was a lot of fun. Basically we talked about all the benefits of PyArrow and why you might consider it, but the news here is that this will be the default in 3.0 for pandas.
Starting point is 00:11:45 When is that coming out? Who knows, right? Pandas has been around for like many, many, many years and it's on version two, so maybe it's gonna take, what, another 15 years for version three to come out? I don't know. Versions are, I think the perception of versions is changing over, you know, we talked about ZeroVer
Starting point is 00:12:04 and all this stuff, but I think versions are starting to go faster these days, right? Django is on five and it was just on four and it's been around a long time. But it should be out pretty soon. So I'm not exactly sure on the timing. I know some people will know better than me, but anyway. It also may be that just switching out the backend is a big enough change that they decided that's a good reason to switch to a three. Yeah, that's a very good point. It says, over time PyArrow is becoming better and better integrated with Pandas, but using it as a backend is still experimental and isn't recommended in
Starting point is 00:12:34 production, presumably until 3.0, where it becomes the default, in which it would become recommended because you'd have to work around it to know better, right? Anyway, that's my item. Check out PyArrow, check out the conversation I had with Reuven about it to see what's coming for Pandas. Cool. All right. I wanted to talk about data science a little bit as well. So I'm going to point to a project called pointblank. It's data validation made beautiful and powerful. At least that's the sales pitch or the sub point. Anyway, so while I don't do a lot of data science, I really think this is a cool idea. Even without data science, any sort of pipeline stuff of checking
Starting point is 00:13:19 data within your pipeline makes sense and what to do with it. So here's the idea. So it says it's a powerful yet elegant data validation framework for Python that transforms how you ensure data quality. There's a chainable API that you can use to validate data against comprehensive quality checks. And it says visualized through stunning interactive reports,
Starting point is 00:13:44 which I haven't looked at the reports yet. But anyway, so I'm just at the top example. You import pointblank and then you have these validation features where you can take a data set and validate it. In this example it says, okay, on column d, validate that the values are greater than 100. And then on column c, validate that the values are less than or equal to five. And then check that certain columns exist,
Starting point is 00:14:25 make sure the date and the time exist in those columns. So you get data frames in, or data sets and data frames, and you gotta be able to make sure that they're in the right shape, the data's in the right place. And that's awesome, and what do you do if they don't? Well, from the REPL there's validation dot get_tabular_report.
Starting point is 00:14:47 That's great for when you're debugging stuff and looking at it. But what about in real time? You're not gonna wanna do that. And this is something you can use in real time to check all this because it's got these cool actions. So there's a bunch of actions as part of this where you can set up callbacks. It's kind of called
Starting point is 00:15:08 a default one that's built in to notify you on Slack. I don't use Slack anymore, but there's a function called one. So if there's any failures, you can set up your own notify function, which, I love this idea of just like, give me an API to fill this in and I can fill that in with whatever I want. I can call a REST API. I'm gonna send myself an email or whatever. Whatever. I'll hook it up. And so this idea of like, just run this all the time.
Starting point is 00:15:36 And then if there's problems with your data chain, let somebody know about it in real time. Love that idea. You can even log it. Lots of stuff there. And it looks like the interface is pretty intuitive. They've got a roadmap for additional validation methods, advanced logging capabilities, more messaging actions. So kind of a neat project. I love that it's already set up with a code of conduct and a governance system. So
Starting point is 00:16:07 pretty promising little project. So that's it for that. I really like that. I did this a lot. Yeah, I like it, too. I'm not sure, it'd be fun to hear from somebody that's used it and see, is it something that really can be used in real time? Does it slow things down too much? I imagine not, because you're gonna do a little bit of work just after you've done a lot of work in a few places, validate it a few places in the pipeline
Starting point is 00:16:37 to make sure everything's still kosher. It definitely, data science has the feel a lot of times of I'm gonna do a little bit of work to get things set up and I'm gonna do some complex calculations on lots of data rather than a bunch of tiny little function calls. I think here's a function, a bit of functionality I'm adding to like an API. It would be totally reasonable to say
Starting point is 00:16:58 I want that to be under 10 milliseconds for this API call. No one's going I need my notebook to process stuff in less than 10 milliseconds. Like if you add a little overhead, it's probably fine. This certainly looks like it's going to be useful for unit tests as well, right? Yeah. Yep. And, uh, and checking to make sure. Yeah. And that's, that's the sort of thing is like, okay, you've checked all the pieces, but this is once you put everything together, um, are we like losing something somewhere or Is some part of the process filling in nulls or treating invalid data incorrectly and putting garbage in there or something?
Starting point is 00:17:34 Yeah, it's super easy to lose pieces of data on some of these things. How's this for a tie-in on to my first extra? If you're working with Polars or something, you can say, I want to do some calculations, and what you get back is a new data frame. By default, what you get back are the columns you've asked for, like the new computed columns. But maybe what you intended is, I want to add a column to the existing data frame, not take a data frame
Starting point is 00:18:00 and transform it into a new data frame that has just the stuff you asked for. So instead of doing an operation, you would say dot with_columns, then do the operation, and it will keep the original data frame and then add on to it, right? With your "these columns exist" test, you would be able to say, oh, we've lost some of the columns. Who forgot the with_columns? Why would you know to do that, Brian? Because you take our Polars for Power Users course that we just released last week. Oh nice. Yeah, so this is out, it's by Chris Trudeau.
Starting point is 00:18:31 It's a really awesome course, Polars for Power Users. Three hours almost, just under three hours, and you can go and learn a whole bunch of cool things about how to work with Polars. And Polars uses PyArrow underneath the covers, or Arrow underneath the covers, so you can also get some cool experience with that. So people, check this out. It's been well received and continuing to go. All right. That's my only extra. I have a joke, but any extras before we do that? Oh, I'm loaded with extras this week. So let's load them up. So there's my pointblank. I was interested. So I'm, I'm, I'm used,
Starting point is 00:19:03 I use Ruff a lot, but I don't tweak it very much. There's a bunch of new rules, or not really new, but there's a lot of rules that you can turn on for Ruff for checking your code. And if you look at the rules list, there's a lot of rules that you could add. So I was interested in this discussion on Reddit about, hey, Ruff users, what are the rules you're using and what are you ignoring? This one was amusing to me, that said, I start every project and select all and ignore none. That's not realistic because some of them conflict with each other. So I think this was probably a joke, but I'm not sure. So there's some serious people that said, hey, this is my set.
Starting point is 00:19:53 It had like Pyflakes rules and pycodestyle warnings and errors. It seems like a decent set. It's just kind of a fun thing to see what other people are using to maybe check out some things. Here's somebody else that has a different way. Instead of picking which rules they use, it's a select all but ignore certain rules that don't make sense for them. And yeah, just kind of an interesting discussion. Anyway, I'm not sure if I covered this one already, but this was as of May 26, Python 3.14.0 beta 2 is out.
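The two approaches to Ruff rule selection from that Reddit discussion can be sketched as a pyproject.toml fragment. The specific rule codes are illustrative choices, not the sets posted in the thread; D203/D211 and D212/D213 are pairs of docstring rules that contradict each other, which is one reason "select all, ignore none" does not work as stated.

```toml
# pyproject.toml (fragment); the rule choices are illustrative
[tool.ruff.lint]
# Approach 1: opt in to a few rule families.
select = ["E", "F", "W"]  # pycodestyle errors, Pyflakes, pycodestyle warnings

# Approach 2 (alternative): start from everything and carve out exceptions.
# select = ["ALL"]
# ignore = ["D203", "D213"]  # each conflicts with another rule (D211, D212)
```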
Starting point is 00:20:32 So if you're testing new Python versions, make sure to keep up on the new Betas that come out. Nice, yeah, I don't know though, I don't think we covered it. Okay, and I guess sort of a, oh, I got a couple more topics. One of them is, if you happen, like just randomly wanted to switch Mastodon servers,
Starting point is 00:20:53 it's not, and not lose anything, like lose anybody that might be following you already, there is a way to do it, and we'll link to an article on how to do this on Fedi.Tips. And it's not, this is not a trivial action. So I guess don't do it on an empty stomach. Looks like a lot of steps, but I'm glad somebody pointed that out.
Starting point is 00:21:15 Why might you? Well, there's been a little bit of drama on Fosstodon lately, and I am not the expert on this, but there's enough people I trust that are migrating away from Fosstodon that I'm looking into it. I just haven't done it yet because, yeah, anyway, thoughts on Fosstodon? I have noticed people who I thought followed me. I got notifications on Mastodon that they had followed me, and I think that's why, I think they were moving accounts. Yeah, I got some too. I'm like, well, I want to follow them too.
Starting point is 00:21:49 And I was already following them. So I guess this is a public service as well. Even if you're not considering switching, if you're getting notifications from people and you're like, I thought they already followed me, it's probably because of something around this. The last thing I wanted to talk about is, I've got some, uh, I think
Starting point is 00:22:12 that my, uh, pythontest.com, something that I got. Yeah. That's my site. I don't know if I can trust the analytics that I was using before, because I think I'm getting hit hard by AI bots and stuff. So I did update my robots.txt, but I'm also checking out a new analytics package. I'm trying out Fathom Analytics, and I chose them because they don't collect data on anybody. It's just which pages were hit and which pages weren't, and they are GDPR compliant.
Starting point is 00:22:48 So, yeah, anyway, not much more of a plug than that, other than I'm trying it out and, yeah, we'll see. Yeah, that's it. Awesome. Awesome. Yeah. I'm still loving Umami analytics. Again, GDPR, even better than GDPR compliant, like no cookies whatsoever. How's that? And we self-host it, so we don't share the data with anybody, so we don't have to tell you about how we're sharing our data with people, because we're not sharing data with people, which I think is great.
Starting point is 00:23:16 Yeah. Okay, effortless, it says effortless. I'm not sure it's effortless. I've expended effort when the migrations don't work or whatever, but you know what? It's fine. It's still good. It's better than Google Analytics. All right. Are you ready for something fun as well? Yes. I'm not sure if I'd call this fun. I think it's funny in a sort of sympathy way. So this joke is a cartoon called programming humor from programming here called emotional damage, and I think, Brian, you want to act this out, you want to be the one that does the first square? Sure. Hey, there's two people looking at a dog. Does your dog bite?
Starting point is 00:23:54 No, but it can hurt you in other ways. I don't know what I was going to say. I'll do the dog. The dog says, "The feature you spent hours coding will not be deployed. Management finds it unnecessary." The person starts crying. That's so bad. Does it bite? No, but it can hurt you. Yeah. Do you know what's more rough than software having that happen with you? Hardware people. I remember, this was decades ago, but I was working with a team and somebody was retiring and they were kind of roasting him during the retirement. But he was an FPGA developer,
Starting point is 00:24:32 or no, an ASIC developer doing custom chips. And like he'd worked on like dozens of projects and only like 5% of them ever shipped. Most of them got canceled before they, you know, before the project end date. So that's rough. Yeah, the lead time is definitely longer in hardware than it is on software, right?
Starting point is 00:24:53 Like from the time you work on until the time it's out. Yeah, it's in like single digits or fractional digits of years instead of weeks or months. Yeah, wow. I definitely think about the, I watch documentaries about, you know, like the Cassini probe that went to, was that Saturn it went to?
Starting point is 00:25:11 And these people will be like, yeah, we started that 25 years ago and now I'm done with this project. Moving on to the next. It's like you get two or three projects for your life. Crazy, those things better not burn up in the atmosphere on their way out. Yeah.
Starting point is 00:25:32 Well, may the dogs that talk about unshipped features stay away from you all. Bye. Bye.
