Python Bytes - #435 Stop with .folders in my ~/
Episode Date: June 9, 2025
Topics covered in this episode:
platformdirs
poethepoet - “Poe the Poet is a batteries included task runner that works well with poetry or with uv.”
Python Pandas Ditches NumPy for Speedier PyA...rrow
pointblank: Data validation made beautiful and powerful
Extras
Joke
Watch on YouTube

About the show
Sponsored by us! Support our work through:
Our courses at Talk Python Training
The Complete pytest Course
Patreon Supporters

Connect with the hosts
Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)
Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.
Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it.

Michael #1: platformdirs
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
Why the community moved on from appdirs to platformdirs
At AppDirs: Note: This project has been officially deprecated. You may want to check out pypi.org/project/platformdirs/ which is a more active fork of appdirs. Thanks to everyone who has used appdirs. Shout out to ActiveState for the time they gave their employees to work on this over the years.
Better than AppDirs:
Works today, works tomorrow – new Python releases sometimes change low-level APIs (win32com, pathlib, Apple sandbox rules). platformdirs tracks those changes so your code keeps running.
First-class typing – no more types-appdirs stubs; editors autocomplete paths as Path objects.
Richer directory set – if you need a user’s Downloads folder or a per-session runtime dir, there’s a helper for it.
Cleaner internals – rewritten to use pathlib, caching, and extensive test coverage; all platforms are exercised in CI.
Community stewardship – the project lives in the PyPA orbit and gets security/compatibility patches quickly.

Brian #2: poethepoet - “Poe the Poet is a batteries included task runner that works well with poetry or with uv.”
from Bob Belderbos
Tasks are easy to define and are defined in pyproject.toml

Michael #3: Python Pandas Ditches NumPy for Speedier PyArrow
Pandas 3.0 will significantly boost performance by replacing NumPy with PyArrow as its default engine, enabling faster loading and reading of columnar data.
Recently talked with Reuven Lerner about this on Talk Python too.
In the next version, v3.0, PyArrow will be a required dependency, with pyarrow.string being the default type inferred for string data.
PyArrow is 10 times faster.
PyArrow offers columnar storage, which eliminates all that computational back and forth that comes with NumPy.
PyArrow paves the way for running Pandas, by default, on Copy on Write mode, which improves memory and performance usage.

Brian #4: pointblank: Data validation made beautiful and powerful
“With its … chainable API, you can … validate your data against comprehensive quality checks …”

Extras
Brian:
Ruff rules: Ruff users, what rules are using and what are you ignoring?
Python 3.14.0b2 - did we already cover this?
Transferring your Mastodon account to another server, in case anyone was thinking about doing that
I’m trying out Fathom Analytics for privacy friendly analytics
Michael:
Polars for Power Users: Transform Your Data Analysis Game Course

Joke: Does your dog bite?
Transcript
Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.
This is episode 435 recorded June 9th, 2025 and I am Brian Okken.
And I'm Michael Kennedy.
And this episode is sponsored by us. So please check out all the awesome courses at Talk Python
Training and lots of different options for learning pytest at pythontest.com.
So check that out.
And also thank you to our Patreon supporters.
They've been sticking around for a long time
and we really appreciate them.
If you'd like to connect with us, more about this later,
but if you'd like to connect with us,
please check out the links in the show notes.
Links to both Bluesky and Mastodon are there. And if you would like to join us live,
you can head on over to pythonbytes.fm slash live, and it'll show you how to do that. There's a link
at the top. It's really easy. And then also, if you're listening and you
want to check out all the cool things that we talked about but don't want to write them down, no worries, just go ahead and sign up to be part of the newsletter and we'll
send those directly to your inbox. So with that, do you have a cool item for
us, Michael? I do. I have some stuff I want to talk about, and it's an update on
things I've spoken about before, but it could have been six years ago.
I can't remember when we covered this,
and I for some reason didn't do a search.
So here's the deal.
You are writing an application, and you're doing things like,
hey, I need to have my app save some content in a file.
I don't want the user to access it.
I want my app to be able to get to it.
And that's pretty much it.
Like maybe a SQLite DB file, right?
That's not a thing that the users should be tracking down. You don't want to put it in their
my documents on Windows or somewhere in the user profile on Mac or Linux or something like that,
right? You just you need to save it somewhere. It needs to be associated with the user. They need
to have permissions right there. So I previously talked about appdirs, the appdirs package,
a small Python module for determining
appropriate platform specific directories like user directories or stuff like that.
So where's your config?
Where's your cache directory?
So on macOS that's like tilde slash capital-L Library, Caches, then you
probably use app name, something like that, those kinds of things.
You want to get access to that and put things in there. And it drives me nuts, Brian. I go to my user
profile. If I want to look at hidden files, there's a ton of dot this app, dot that app settings in my
user profile. And they're not there for me. They're there because the designers of those apps were too lazy, usually, too lazy to put that into the appropriate location on that platform, right?
There's some that maybe I should mess with, like for example, Oh My Zsh, you might want to go and tweak like your prompt, how that looks or whatever, right?
You would get to that from there. But a lot of things they just do it because like, well, we'll just do a dot because it's hidden.
People won't see that. Just drop it right in there.
So I'd love for more people listening to put stuff in the right place, or a better place.
Well, okay, I'm guilty of this, but I don't really think about Windows too much.
I just think about, like, in a Linux environment, the home directory is usually the right place.
Well, I well maybe I mean, let me see what I got in here. I don't think so, but let me do an A.
What do I do with my hidden ones there?
You know what, I can do it this way.
Hold on, I'll do it this way.
So for example,
Well, I'm probably doing it the wrong way there too,
so you're probably right.
Well, maybe, but there's like Studio 3T,
which is a MongoDB GUI management tool.
It has all of its settings in here.
Is that something that I should be working with? Should I be seeing this?
Probably not. It's like a .3t folder and it's got like my roaming history and crap in there.
I've got .android for whenever I have that Android SDK doing something. I've got,
I mean, I can see like I've got a .aws. Maybe it's got things in there, right?
But I have a .matplotlib. For real. There's a dot matplotlib in here.
Why is it here? I don't know. There's a pip-audit cache. Literally. It's called cache.
Should I see the cache? No, the cache is not for reading
It should be in there's literally a cache folder on all these platforms where it goes and maybe on Linux
It literally is just like there but okay fine, but on the other ones it shouldn't be right anyway
Okay, maybe I'm ranting a bit hard on this.
But you should probably do these correct things, and I wasn't even thinking about Windows, although
Windows does have a different set of places things go, right? So this thing was great, and I recommended appdirs.
But if you go up a little bit it says, this project has officially been deprecated.
Oh no! No! We were so close. So it says, you may want to check out platformdirs,
which is my recommendation. So moving over to platformdirs, this is under tox-dev,
t-o-x dev, so a well-known group there, and it's a small Python module for determining the appropriate platform-specific
directories, such as I just said. However, it's better in some ways. So it has, I don't know if
you noticed before, but there's maybe three or four directories in appdirs. This one has documents,
downloads, pictures, videos, music, desktop, runtime, etc., etc., lots of different places that you might
want to go to depending on what platform you're on.
It has better typing.
It has been rewritten to use Path,
as in Path objects from pathlib and all that.
It's got CI testing against all the different platforms
that it supports.
So very, very cool that this not only continues to live,
but it's actually gotten much better.
Yeah, and the tox development group
is gonna keep it updated.
So cool.
Yeah, exactly.
It can't require that much support over time, right?
These things are not super dynamic,
but I do really appreciate them modernizing it.
I think that some of the changes were that basically
appdirs got out of sync with Python
rather than out of sync with the platforms, right?
It was just like, well, it doesn't have types.
And so now we got, you know,
and it was just a bunch of stuff like that.
So I'm very happy to see it and recommend platformdirs.
I already used it to massive success on some projects.
I was like, I want to use this and some cool caching things.
And it made my app so much better.
And guess what?
There's not yet another dot Michael's app cache
in my tilde slash.
Over to you.
I'll have to do this on some of my applications.
It's worth considering.
So I'm glad you brought it up.
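To make the recommendation concrete, here is a minimal sketch of asking for a per-user cache location instead of dropping a dot-folder into the home directory. `user_cache_path` is a real platformdirs helper that returns a `pathlib.Path`. The fallback branch, the `cache_file` helper, and the app name `michaels-app` are assumptions added for illustration, and the fallback only approximates what the library returns.

```python
import os
import sys
from pathlib import Path

try:
    # Real platformdirs helper; returns a pathlib.Path for the user's cache dir.
    from platformdirs import user_cache_path
except ImportError:
    def user_cache_path(appname: str) -> Path:
        """Rough stand-in when platformdirs is missing (an approximation only)."""
        home = Path.home()
        if sys.platform == "darwin":
            return home / "Library" / "Caches" / appname
        if sys.platform == "win32":
            base = os.environ.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
            return Path(base) / appname / "Cache"
        # Linux and other Unix-like systems: follow the XDG cache convention.
        base = os.environ.get("XDG_CACHE_HOME") or str(home / ".cache")
        return Path(base) / appname


def cache_file(name: str, appname: str = "michaels-app") -> Path:
    """Return where `name` should live inside the app's cache directory."""
    return user_cache_path(appname) / name


# Only computes the path; nothing is created on disk.
print(cache_file("results.sqlite"))
```

The path is only computed here. An app would still need to create the directory (for example with `Path.mkdir(parents=True, exist_ok=True)`) before writing to it.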
All right.
I actually want to do a shout out
to Bob Belderbos because he posted,
I think he posted on several platforms
but I saw it on LinkedIn.
Says, I love Makefiles.
They save me time and help my teams work
in a more uniform way.
I'd love to be able to say this.
I love Makefiles also, or actually I don't love them.
I'm just used to them.
So I throw a Makefile into something where that makes sense,
but for a very long time,
nobody else on my projects
has been familiar with Makefiles.
So I can't use them.
That would be me.
I'd be like, what's a Makefile?
Why do we have this?
That we do in Python.
So in some Python projects,
I use tox because you can do kind of some similar stuff
with tox, but it is annoying that it, like, creates
a virtual environment.
It's better with uv now because it's fast, but still.
So
there's a new kid on the block, or at least new to me, that Bob has introduced,
and it's called Poe the Poet. And I was a little leery because I thought maybe it
was too tied to the Poetry project. But even if it is, you don't have to use it with it.
I'm using it without it, of course.
But anyway, so the thing I love about this is,
as an example, I'm leaving this on Bob's post
because it's great.
In your pyproject.toml, that's where you define the tasks.
And this is great because I don't have to have
an extra thing, an extra file in my project.
This makes sense to just put it there. So here's an example
of, like, so a test action, you can just define it as pytest or something, or
cov, you can have pytest running with cov, with coverage. The linter is Ruff,
which is, it's funny, a lot of people still do this. But actually I remember Ruff better than I remember lint anymore, but you know, it's just me.
But anyway, some cool type, oh, he has ty check, for ty.
What?
Wow, he's on top of that, that is super new.
Integrates well with uv, I tried it out, it is pretty slick.
So the Poe the Poet documentation says
it's a batteries included task runner
that works well with Poetry or with uv.
I'm definitely using it with uv.
There's a bunch of cool things,
and how to define this in the documentation is pretty good.
So you can, like Bob said, you just add a section
called tool.poe.tasks and then define a bunch of actions.
There's ways to have, like, built-up actions,
like you would with a Makefile or tox,
actually I don't know if you can do this in tox,
anyway, you'd have multiple actions that run one
after another. The sequences are pretty cool.
You can also add help text to things, which is nice.
Anyway, but having it right in,
I just love that it's right in the pyproject.toml
and it works well with uv, love that.
So definitely gonna switch it for a lot of projects.
And there's an example,
I'm not sure where the example is right now,
but in the kind of getting started,
or maybe it was Bob, anyway,
adding it to your development dependencies
within your pyproject.toml is slick
because then now if you're using,
hopefully you're using development dependencies
within your pyproject.toml so that new developers
can just install all the tools they need
just by using that, so pretty cool. Love it.
Yeah, very cool. And by the way, did you catch that the logo of Poe the Poet is Edgar Allan Poe?
Yeah, he's the poet.
Very interesting. Okay. Next up is some data science. There's
an article I'm linking to on The New Stack by Joab Jacobs, Jackson rather, sorry.
And this is that Pandas 3.0 will significantly
boost performance by replacing NumPy with
PyArrow as the default engine.
Enabling faster loading and reading of columnar data.
So since 2.0 I think Pandas has had possible support for PyArrow, which is more of an analytics
type backend than NumPy, which is more database-y, like traditional rows, like here's an entry
in its internal structure, making it much easier to ask questions like, what's the average or the max of this aspect of the data right what is
the the average sales price for these million rows is way faster in a columnar orientation because
you just go right down the the data as it is it makes it slower for row type of operations but
that's less common right for data science it also does much much faster reading and writing of files and all
sorts of things. So I recently spoke with Reuven Lerner about the PyArrow
revolution on Talk Python not too long ago and that was a lot of fun.
Basically we talked about all the benefits of PyArrow and why you might
consider it, but the news here is that this will be the default in 3.0 for
Pandas.
When is that coming out?
Who knows, right?
Pandas has been around for like many, many, many years
and it's on version two, so maybe it's gonna take,
what, another 15 years for version three to come out?
I don't know.
Versions are, I think the perception of versions
is changing over time, you know, we talked about ZeroVer
and all this stuff, but I think versions are starting to go faster these days, right? Django is on five and it
was just on four and it's been around a long time. But it should be out pretty soon. So
I'm not exactly sure on the timing. I know some people will know better than me, but
anyway.
It also may have been that just switching out the backend is a big enough change that
they decided that's a good reason to switch to a three.
Yeah, that's a very good point. It says, over time PyArrow is becoming better and better integrated
with Pandas, but using it as a backend is still experimental and isn't recommended in
production, presumably until 3.0, where it becomes the default, at which point it would become
recommended because you'd have to work around it to do otherwise, right? Anyway, that's my
item. Check out PyArrow, check out the conversation
I had with Reuven about it to see what's coming for Pandas.
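The row-versus-column trade-off described in this segment can be pictured with a toy example in plain Python. This is only an illustration of the two layouts, with made-up sales data; it does not use pandas, NumPy, or PyArrow and makes no claim about how any of them store data internally.

```python
from statistics import mean

# Row-oriented: one record per sale; every field of a sale sits together.
rows = [
    {"region": "west", "units": 3, "price": 120.0},
    {"region": "east", "units": 1, "price": 80.0},
    {"region": "west", "units": 2, "price": 100.0},
]

# Column-oriented: one list per field; all prices sit together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: the average has to visit every record and pull one field out.
avg_from_rows = mean(row["price"] for row in rows)

# Column layout: the average reads straight down a single column.
avg_from_columns = mean(columns["price"])

# A row-style operation on the column layout has to touch every column.
second_sale = {key: values[1] for key, values in columns.items()}

print(avg_from_rows, avg_from_columns, second_sale)
```

Both layouts give the same answers; the difference is how much unrelated data each kind of question has to walk past.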
Cool. All right. I wanted to talk about data science a little bit as well. So I'm
going to point to a project called Pointblank. It's data validation made beautiful and powerful. At least that's the
sales pitch or the sub point. Anyway, I don't do a lot of data science, but I really
think this is a cool idea. Even without data science, any sort of pipeline stuff of checking
data within your pipeline makes sense and what to do with it.
So here's the idea.
So it says it's a powerful yet elegant data validation
framework for Python that transforms
how you ensure data quality.
There's a chainable API that you can use to validate data
against comprehensive quality checks.
And it says visualized through stunning interactive reports,
which I haven't looked at the reports yet. But anyway, so I'm just at the top example.
You import pointblank and then you have these validation features
where you can take a data set and validate it. In this example it says,
okay, on column d,
validate that the values are greater than 100.
And then on column c,
validate that the values are less than or equal to five.
And then there's col_exists,
make sure the date and date_time columns exist.
So you get data frames in, or data sets and data frames,
and you gotta be able to make sure
that they're in the right shape,
the data's in the right place.
And that's awesome, and what do you do if they don't?
Well, there's a REPL example with validation
get_tabular_report.
That's great for when you're debugging stuff
and looking at it.
But what about in real time?
You're not gonna wanna do that.
And this is something you can use in real time
to check all this because it's got these cool actions.
So there's a bunch of actions as part of this
where you can set up callbacks. There's kind of
a default one that's built in to notify you on Slack. I don't use Slack anymore, but there's
a function called one. So if there's any failures, you can set up your own notify function, which
I love, this idea of just like, give me an API to fill this in and I can fill that in with whatever I want.
I can call a REST API.
I'm gonna send myself an email or whatever.
Whatever.
I'll hook it up.
And so this idea of like, just run this all the time.
And then if there's problems with your data chain,
let somebody know about it in real time.
Love that idea.
You can even log it.
Lots of stuff there. And
looks like the interface is pretty intuitive. They've got a roadmap for additional validation
methods, advanced logging capabilities, more messaging actions. So kind of a neat project.
I love that it's already set up with a code of conduct and a governance system. So
Pretty promising little project
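To show the shape of the idea (chained checks plus a callback that fires on failure), here is a minimal plain-Python sketch. It is not Pointblank's implementation or its API: the `Validator` class, the dict-of-lists table, and the `on_failure` hook are all hypothetical, and the method names only echo the checks mentioned in the example being read.

```python
from dataclasses import dataclass, field
from typing import Callable

Table = dict[str, list]  # column name -> list of values


@dataclass
class Validator:
    """Hypothetical chainable validator; not Pointblank's actual classes."""
    data: Table
    on_failure: Callable[[str], None] = print  # swap in Slack, email, REST, etc.
    steps: list = field(default_factory=list)

    def _add(self, label: str, check: Callable[[Table], bool]) -> "Validator":
        self.steps.append((label, check))
        return self  # returning self is what makes the calls chainable

    def col_vals_gt(self, column: str, value) -> "Validator":
        return self._add(f"{column} > {value}",
                         lambda t: all(v > value for v in t[column]))

    def col_vals_le(self, column: str, value) -> "Validator":
        return self._add(f"{column} <= {value}",
                         lambda t: all(v <= value for v in t[column]))

    def col_exists(self, columns: list) -> "Validator":
        return self._add(f"columns exist: {columns}",
                         lambda t: all(c in t for c in columns))

    def interrogate(self) -> list:
        """Run every step, notify on each failure, return the failed labels."""
        failures = []
        for label, check in self.steps:
            try:
                ok = check(self.data)
            except KeyError:  # a check referenced a missing column
                ok = False
            if not ok:
                failures.append(label)
                self.on_failure(label)
        return failures


alerts = []  # stand-in for a real notification channel
data = {"c": [1, 4, 5], "d": [150, 300, 90], "date": ["2025-06-09"] * 3}
failures = (
    Validator(data, on_failure=alerts.append)
    .col_vals_gt("d", 100)
    .col_vals_le("c", 5)
    .col_exists(["date", "date_time"])
    .interrogate()
)
print(failures)
```

Passing a different `on_failure` function is the "fill this in with whatever I want" part: the validator does not need to know whether the alert goes to a log, an email, or a REST call.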
So that's that's it for that. I'm really like I like that. I did this a lot. Yeah, I like it, too
I'm not sure it'd be fun to to hear somebody that's used it and see
Is it something that really can be used in real time?
Does it slow things down too much?
I imagine not because you're gonna do a little bit of work
just after you've done a lot of work in a few places,
validate it a few places in the pipeline
to make sure everything's still kosher.
It definitely, data science has the feel a lot of times
of I'm gonna do a little bit of work to get things set up
and I'm gonna do some complex calculations on lots of data
rather than a bunch of tiny little function calls.
I think here's a function, a bit of functionality
I'm adding to like an API.
It would be totally reasonable to say
I want that to be under 10 milliseconds for this API call.
No one's going, "I need my notebook to process stuff in less than 10 milliseconds." Like if you add a little overhead,
it's probably fine. This certainly looks like it's going to be useful for unit
tests as well, right? Yeah. Yep. And, uh, and checking to make sure. Yeah.
And that's, that's the sort of thing is like, okay, you've checked all the pieces,
but this is once you put everything together, um,
are we, like, losing something somewhere, or is some part of the process filling in nulls or
treating invalid data incorrectly and putting garbage in there or something?
Yeah, it's super easy to lose pieces of data on some of these things. How's this
for a tie-in to my first extra? If you're working with Polars or something,
you can say, I want to do some calculations
and what you get back is a new data frame.
By default, what you get back are the columns
you've asked for, like the new computed columns.
But maybe what you intended is I want to add a column
to the existing data frame, not take a data frame
and transform it into a new data frame
that has just the stuff you asked for.
So instead of doing an operation, you would say dot with_columns, then do the operation,
and it will keep the original data frame and then add on to it, right?
Then maybe, with your test that these columns exist, you would be able to say,
oh, we've lost some of the columns. Who forgot the with_columns?
Why would you know to do that, Brian? Because you take our Polars for Power Users
course that we just released last week. Oh nice. Yeah, so this is out, it's by Chris Trudeau.
It's a really awesome course, Polars for Power Users. Three hours, almost, just under
three hours, and you can go and learn a whole bunch of cool things, how to work
with Polars, and Polars uses PyArrow underneath the covers, or Arrow underneath
the covers, so you can also get some cool experience with that. So people, check this out.
It's been well received and continuing to go. All right. That's my only extra.
I have, I have a joke, but any extras before we do that?
Oh, I'm loaded with extras this week. So, uh, let's, let's load them up.
So there's my Pointblank. Um, I was interested. So I,
I use Ruff a lot, but I don't tweak it very much.
There's a bunch of new rules, or not really new, but there's a lot of rules that you can turn on for Ruff for checking your code.
And if you look at the rules list, there's a lot of rules that you could add. So I was interested in this discussion
on Reddit about, hey, Ruff users, what are the rules you're using and what are you ignoring?
This one was amusing to me that said I start every project and select all and ignore none.
That's not realistic because some of them conflict with each other. So I think this was probably a joke,
but I'm not sure. I do like, so there's some serious people that said, hey, this is my set.
It had like Pyflakes rules and pycodestyle and warnings and errors. It seems like a decent set.
It's just kind of a fun thing to see what other people are using to maybe check out some things.
Here's somebody else that has a different way.
Instead of picking which rules they use, it's a select all but ignore certain rules that
don't make sense for them.
And yeah, just kind of an interesting discussion.
Anyway, I'm not sure if I covered this one already,
but this was as of May 26, Python 3.14.0 Beta 2 is out.
So if you're testing new Python versions,
make sure to keep up on the new Betas that come out.
Nice, yeah, I don't know though,
I don't think we covered it.
Okay, and I guess sort of a,
oh, I got a couple more topics.
One of them is, if you happen,
like just randomly wanted to switch Mastodon servers,
it's not, and not lose anything,
like lose anybody that might be following you already,
there is a way to do it,
and we'll link to an article on how to do this on Fedi.Tips.
And it's not, this is not a trivial action.
So I guess don't do it on an empty stomach.
Looks like a lot of steps, but I'm glad somebody pointed that out.
Why might you?
Well, there's been a little bit of drama on Fosstodon lately, and I am not the expert on this, but there's enough people I trust
that are migrating away from Fosstodon that I'm looking into it.
I just haven't done it yet because, yeah, anyway, thoughts on Fosstodon?
I have noticed people who I thought followed me.
I got notifications on Mastodon that they had followed me, and I think that's why. I think they were moving accounts.
Yeah, I got some too.
I'm like, well, I want to follow them too.
And I was already following them.
So if you want, and so I guess this is public service as
well, even if you're not considering switching,
if you're getting notifications from people that they're,
that you've, that you're like,
I thought they all already followed me.
It's probably because of something around this.
The last thing I wanted to talk about is I was, I've, I've got some, uh, I think
that my, uh, Python, or yeah, pythontest.com, something that I got.
Yeah.
That's my site.
Um, I don't know if I can trust the analytics that I was using before,
because I think I'm getting hit hard by AI bots and stuff.
So I did update my robots.txt, but I'm also checking out a new analytics package.
I'm trying out Fathom analytics and I chose them because they don't collect data on anybody.
It's just which pages were hit and which pages weren't, and they are GDPR compliant.
So, um, yeah, anyway,
not much more of a plug than that other than I'm trying it out and, uh, yeah,
we'll see. Yeah, that's it. Awesome. Awesome. Yeah.
I'm still loving Umami analytics. Again, GDPR,
even better than GDPR compliant, like no cookies
whatsoever. How's that? And we self-host it so we don't have to, we don't share
the data with anybody, so we don't have to tell you about how we're sharing our
data with people, but we're not sharing data with people, which I think is great.
Yeah. Okay, effortless, it says effortless. I'm not sure it's effortless. I've
expended effort when the migrations don't work or whatever, but you know what? It's fine. It's still good.
It's better than Google Analytics. All right. Are you ready for something fun as well?
Yes. I'm not sure if I'd call this fun. I think it's funny in a sort of sympathy way.
So this joke is a cartoon
from Programming Humor called emotional damage, and I think,
Brian, you want to act this out, you want to be the one that does the first square?
Sure. Hey, there's two people looking at a dog. Does your dog bite?
No, but it can hurt you in other ways. I don't know what I was going to say. I'll do the dog.
The dog says, the feature you spent hours coding will not be deployed. Management finds it unnecessary.
The person starts crying. That's so bad. Does it bite? No, but it can hurt you.
Yeah. Do you know what's more rough than software having that happen with you?
Hardware people. I've got, I remember this was decades ago, but I was working with a team and somebody got a, was retiring and they were kind of roasting him
during the retirement.
But he was an FPGA developer,
or no, an ASIC developer doing custom chips.
And like he'd worked on like dozens of projects
and only like 5% of them ever shipped.
Most of them got canceled before they, you know,
before the project end date.
So that's rough.
Yeah, the lead time is definitely longer in hardware
than it is on software, right?
Like from the time you work on until the time it's out.
Yeah, it's in like single digits or fractional digits
of years instead of weeks or months.
Yeah, wow.
I definitely think about the,
I watch documentaries about, you know,
like the Cassini probe that went to,
was that Saturn it went to?
And these people will be like,
yeah, we started that 25 years ago
and now I'm done with this project.
Moving on to the next.
It's like you get two or three projects for your life.
Crazy, those things better not burn up in the atmosphere
on their way out.
Yeah.
Well, may the dogs that talk about unshipped features stay away from you all.
Bye.
Bye.