Python Bytes - #325 It's called a merge conflict
Episode Date: February 28, 2023. Topics covered in this episode: Python Parquet and Arrow: Using PyArrow With Pandas, FastAPI-Filter, 12 Python Decorators to Take Your Code to the Next Level, PyHamcrest, Extras, Joke. See the full show notes for this episode on the website at pythonbytes.fm/325
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly
to your earbuds.
This is episode 325, recorded February 28th, the last day of February in 2023.
I am Brian Okken.
And I'm Michael Kennedy.
And before we jump in, I want to thank everybody that shows up for the live stream.
If you haven't shown up for the live stream before, it's a lot of fun.
People can stop and ask questions and chat and everything,
and it's a good way to say hi.
And we enjoy having you here,
or watch it afterwards if this is a bad time for you.
Also wanna thank Microsoft for Startups Founders Hub
for sponsoring this episode.
They've been an excellent sponsor of the show,
and they've also agreed to have us
be able to play with the sponsor spots
and do some AI readings. So this one's going to be a fun one. I'm excited about it.
I am too. That's going to be fun.
So why don't you kick us off with our first topic today?
All right. Let's jump right in. You like solid code. So how about some Codesolid.com?
Has nothing to do with solid code, but it's still interesting, and it does have to do with code.
This one is something called Parquet and Arrow.
Have you heard of Apache Arrow or the Parquet file format, Brian?
I don't.
I've heard of Arrow, but I don't think I've heard of Parquet.
So when people do a lot of data science, you'll see them do things like open up Jupyter notebooks and import pandas. And then from pandas, they'll say load CSV. Well, if you think about a whole bunch of different file formats and how fast and efficient they might be stored on disk and read, how do you think CSVs might turn out? Pretty slow, pretty large, and so on. And Arrow, through PyArrow, has some really interesting in-memory structures that
are a little more efficient than Pandas, as well as it has access to this Parquet format. So does
Pandas through an add-on, but you'll see that it's still faster using PyArrow. So basically,
that's what this article that I found is about. It highlights how these things compare, and it basically asks questions like, can we use pandas DataFrames and Arrow tables together? Like, if I have a pandas DataFrame but I want to switch it into PyArrow for better performance at some point for some analysis, can I do that? Or if I start with PyArrow, could I then turn it into a DataFrame and hand it off to Seaborn or some other thing that expects a pandas DataFrame?
The answer is yes.
Short version there.
Are they better?
In which ways are they better?
Which way are they worse?
And then the bulk of the analysis here is like, we could save our data, read and write our data from a bunch of different file formats.
Parquet, but also things like Feather, ORC, CSV, and others, even Excel. What should we maybe consider using? Okay. So installing it is just pip install pyarrow, super easy, same type of story. If you want to use it with pandas, say I've got some pandas DataFrame and I want to convert it over, that's super easy. You go to PyArrow and you say pyarrow.Table.from_pandas and give it a pandas DataFrame. And then boom, you've got it in PyArrow format.
Okay.
One of the things that's interesting is that pandas has a really nice wrangling, exploration style of working with data. Okay, so I can just show the DataFrame, and it'll tell me there are 14 columns in this example, 6,433 rows, and it'll list off the headers and then the column data. If I do the same thing in PyArrow, I just get, well, it's kind of human readable, but you just get a dump of junk, basically. It's not real great. So that aspect, certainly using pandas, is nice for this kind of exploration.
Another thing about PyArrow is the data is immutable. So you can't say, every time that this value appears, actually replace it with this canonical version. You know, if you get a 'y', a lowercase 'yes', and a capital 'Yes', and you want to make them all just lowercase 'yes', you've got to make a copy instead of changing it in place. So that's one of the reasons you might stick with pandas, which is pretty interesting. But you can do a lot of really interesting parsing and performance stuff, like you would do with pandas. But if your goal is performance,
and performance measured in different ways, how much memory does it take up in computer RAM? How
much disk space type of memory does it take up? How fast is it to read and write from those? It's
pretty much always better to go with PyArrow. So for example, if I take those same sets of data,
those two sets of data from,
I think this is the New York City taxi data,
some subset of that, really common data set.
With digit grouping, it's a little over three megs of memory for the DataFrame in pandas, whereas it's just under one meg for PyArrow. So that's three times smaller, which is pretty interesting there.
Yeah.
Yeah.
The other one is if you do mathy things on it. Like, if you've got tables of numbers, you're really likely to ask about things like the max or the mean or the average and so on. Now, if you do that with pandas and you do it with PyArrow, you'll see it's about eight times faster to do math with PyArrow than it is with pandas.
That's pretty cool, right?
Yeah.
The syntax is a little grosser, but yeah.
The syntax is a little grosser.
I will show you a way to get to this in a moment that is less gross, I believe.
And then Alvaro out there does say, if you want fast DataFrames, Polars plus Parquet is the way to go. He's skating to where the puck is going to be, indeed. And Kim says, presumably the immutability plays a large part in the performance. I suppose so.
Yeah. And then also some real-time feedback here. Alvaro says, I got a broken script from a colleague. I rewrote it; in pandas it took about two hours to process, and in Polars it took three minutes. So that's a non-trivial sort of bonus there. All right, let me go over the file formats. And I think we've talked about Polars before, so I'll just reintroduce it really quickly.
So if we go and look at the different file formats, we could use Parquet. So we could say to_parquet with PyArrow and you get it out. And these numbers are all kind of insane: four milliseconds to write it versus two milliseconds to read it. If you use fastparquet, which is the engine that lets pandas DataFrames do it, it's 14 milliseconds, which is a little over three times slower, but it's still really, really fast, right?
There's feather,
which is the fastest of all the file formats
with a two millisecond save time,
which is blazing.
There's ORC.
I have no idea what ORC is.
It's a little bit faster.
Or if you want to show that you're taking lots of time and doing lots of processing,
doing lots of data science-y things, you could always do Excel, which takes about a second
almost. I mean, on a larger data set, it might take lots longer, right? You're like, oh, I'm
busy. I can't work. I'm getting a coffee because I'm saving. Well, I mean, there's some people
that really have to export it to Excel
so that other people can make mistakes later.
Yes, exactly.
Because life is better when it's all go-tos.
Yeah.
But no, you're right.
If the goal is to deliver an Excel file,
then obviously.
But this is more like considering
what's a good intermediate just storage format.
And then CSV is actually not that slow.
It's still slower, but it's only 30 milliseconds.
But the other part that's worth thinking about,
remember, this is only 6,400 rows.
The Parquet format is 191K.
The Pandas one is almost 100K more, which is interesting.
The Feather is almost half a meg.
ORC is three quarters of a meg.
Excel is half a meg.
CSV is a meg, right?
So a meg, it's almost five times file size increase.
So if you're storing tons of data and it's five gigs versus 50 gigs,
you know, you maybe want to think about storing it in a different format.
Plus you read and write it faster, right?
So these are all pretty interesting.
And Polars is the lightning-fast DataFrame library built in Rust and Python. This is built on top of Apache Arrow. I had a whole Talk Python episode on it. I'm pretty sure I've talked about Polars before on here as well. But it's got a really cool, sort of fluent programming style. And under the covers, it's using Arrow as well.
So pretty neat. Yeah, so if you're really looking to say, I just want to go all in on this, as Alvaro pointed out, I think it was Alvaro, Polars is pretty cool.
Okay, neat.
And Henry, out there with real-time feedback, says pandas is fully supporting PyArrow for all data types in the upcoming 1.5 and 2.0 releases. There was just a blog post on it on the datapythonista blog. It's not clear if they're switching to it. I believe it's NumPy at the moment as the core, but it could be supported, which is awesome. Yeah, thanks, Henry, for that update there. Well, then also, it did say you're basically starting to get native PyArrow speed with pandas by just selecting the backend in the new pandas version.
Indeed.
Awesome.
Yeah, yeah.
Very, very cool.
So lots of options here.
But I think a takeaway that's worth paying attention to here is choosing maybe Parquet as a file format, regardless of whether you're using pandas or PyArrow or whatever, right?
Because I think the default is read and write CSV.
And if your CSV files are ginormous,
that might be something you want to not do.
All right, over to you.
Well, I said I'd never heard of Parquet. And before we get to the next topic, I was thinking, like, is it butter or is it Parkay? This is an old thing from when we were kids.
That's right. That's margarine. Parkay had a little tub that talked. It was neat.
Oh, that's right, it did. It had a little mouth. Yeah.
I want to talk about FastAPI a bit.
So this topic, fastapi-filter, comes to us from Arthur, and it's actually his library. And this is pretty cool. So I'm going to pop over to the documentation quickly. What it is, is query string filters for API endpoints, so you can show them in Swagger and use them for cool things. So I'll pop over to the documentation. It says query string filters that support the backends SQLAlchemy and MongoEngine. So that's nice. We'll get to what the filters look like later, but in the Swagger interface, this is pretty neat. So let's say you're grabbing the users,
and you want to filter them by the name.
You can do a query in the name or the age less than
or age greater than or equal.
These are pretty nice.
So it says the philosophy of FastAPI filter
is to be very declarative.
You define fields that you want to be able to filter on,
as well as the type
of operator and then tie your filters to a specific model. It's pretty easy to set up.
The syntax, well, we'll let you look at it, but it's not that bad to set up the filters.
Yeah, a lot of Pydantic models, as you might expect, it being FastAPI.
Yeah. So you plug in these filters, and the built-in ones are things like not equal, greater than, greater than or equal, in, those sorts of things. But you could do some pretty complex query strings then. There are some good examples down here. So, like, the users, but ordered by descending name, or ordered by ascending id. There's plus and minus for ascending and descending, and you can have order_by, and you can filter by things like the name. Putting some filters right in your API query string is kind of an interesting idea. I don't know if it's a good idea or a bad idea, but it's interesting.
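The real library wires these filters into FastAPI's dependency injection; for its actual classes, see the fastapi-filter docs. But the core idea, field__operator pairs applied to a model, can be sketched in plain Python. Every name below is made up for illustration and is not fastapi-filter's API:

```python
# Toy sketch of the field__operator idea behind query-string filters.
OPS = {
    "lt": lambda value, arg: value < arg,
    "gte": lambda value, arg: value >= arg,
    "ilike": lambda value, arg: arg.lower() in value.lower(),
}

def apply_filters(rows, **filters):
    """Filter dicts with Django-style keys like age__lt=30."""
    for key, arg in filters.items():
        field, _, op = key.partition("__")
        # Bare field names (no __op suffix) fall back to equality.
        check = OPS.get(op, lambda value, arg: value == arg)
        rows = [r for r in rows if check(r[field], arg)]
    return rows

users = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 23},
    {"name": "alina", "age": 41},
]
young = apply_filters(users, age__lt=30)      # just Bob
als = apply_filters(users, name__ilike="al")  # Alice and alina, case-insensitive
```

The declarative payoff is that the server only whitelists fields and operators; the client composes the actual query.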
Yeah, this is a real interesting philosophy of how do I access the data in my database as an API?
Yeah.
And I would say there's sort of two really common ways, and then there's a lot of abuse of what APIs look like and what you should do.
You know, just remote procedure calls and all sorts of randomness.
But the philosophy is I've got data in a database and
I want to expose it over an API. Do I go and write a bunch of different functions in FastAPI in this
example, where I decide here's a way you can find the recent users, and you can then possibly take some kind of parameter about a sort, or maybe how recent you want the users to be? But you're writing the code that decides here's the database query, and it's generally focused on recent users, right? That's one way to do APIs. The other is I kind of want to take my database
and just make it queryable over the internet, right? And this is with the right restrictions,
it's not necessarily a security vulnerability, but it's just pushing all of the thinking about what the API is to the client side. Right? So if I'm doing Vue.js, it's like, well, we'll wrap this around our database, and you ask it any question you want with filters, where you say, give me all the users where the created date is less than such and such, or greater than such and such. You know, that would basically be like the new users, right? But it's up to the client to kind of know the data schema and talk to it. And this, you know, this is that latter style.
If you like that, awesome. You know, you can expose a relational database over SQL Alchemy
or MongoDB through Mongo Engine. And it looks pretty cool.
My thoughts on where I might use this, I mean, I'm not using this in production. But my thoughts on where I might use this, even disregarding one of Brandon's concerns. Brandon says, exposing my API field names makes me nervous.
But there's a part of your development where you're not quite sure what queries you want.
So custom writing them, maybe you're not ready to do that or it'll be a lot of back and forth.
So I think a great place for this would be when you're working with your front end and your back end code, your API code, and you're trying to figure out what sort of searches you want, and you can use something like this to have it right in the actual API query.
And then once you figure out all the stuff you need,
then you could go back if you want to
and hard code different API endpoints
with similar stuff, maybe?
I don't know.
Yeah, yeah.
And not everything's built the same, right?
Kim out there points out that many of the APIs
that he uses or builds are for in-house use only.
Yeah.
Right?
And so it's just like,
instead of coming up with very, very focused API endpoints,
it's like, well, kind of just leave it open
and people can use this service to access the data
in a somewhat safe way, like a restricted way.
Yeah.
So it's, what are you building? Like, are you putting it just on the open internet, or are you putting it, you know, inside?
That's very true.
Yeah, like I've got a bunch of projects
I'm working on that are internal
and like, who cares if somebody knows
what my data names are and stuff, so.
Right, well, and what is in it?
Are you storing social security numbers and addresses
or are you storing voltage levels for RF devices?
Exactly.
Oh, no, the voltage levels have leaked. Oh, no. Right. I mean, flexibility might be awesome.
Yeah. I mean, in the end, like, it's secretive. We don't want it to get out in public, but it's not like something that internal users are going to do anything with. So, yeah.
Yeah, yeah, exactly.
Cool.
Cool.
Well, yeah, that's really, really a nice one.
So, Brian, sponsor this week?
Yeah, Microsoft for Startups Founders Hub.
But if you remember last week, we did an ad where we asked an AI to like come up with the ad text for us.
In like an official, sort of official sounding way.
Yeah.
So this week, you pushed it through the filter and said to try to come up with the wording in a hipster voice, right?
So here we go.
Tell us about it.
With a hipster style.
I'll try.
Yo, Python Bytes fam, this segment is brought to you by the sickest
program out there for startup founders, Microsoft for Startup Founders Hub. If you're a boss at
running a startup, you're going to want to listen up because this is the deal of a lifetime.
Microsoft for Startup Founders Hub is your ticket to scaling efficiently and preserving your
runway, all while keeping your cool factor intact.
With over six figures worth of benefits,
the program is serious next level. You'll get 150K in Azure credits,
the richest cloud credit offering on the market,
access to the OpenAI APIs
and the new Azure OpenAI service
where you can infuse some serious generative AI
into your apps and a one-on-one
technical advisor from the Microsoft squad who will help you with your technical stack and
architectural plans. This program is open to all, whether you're just getting started or already killing it. And the best part, there's no funding requirement. All it takes is five
minutes to apply and you'll be reaping the benefits in no time.
Check it out and sign up for Microsoft for Startup Founders Hub
at pythonbytes.fm slash foundershub2022.
Peace out and keep listening.
It's insane the power of these AIs these days.
And you know, if you want to get access to OpenAI
and Azure and GitHub and all those things,
well, a lot of people seem to be liking that program.
So it's cool.
They're supporting us.
Yeah.
Also cool that they're letting us play with the ad.
Yes, with their own tools indeed.
Okay.
What I got next, Brian,
is stuff to take your code to the next level, brah.
Twelve? This sounds pretty interesting.
12 Python decorators to take your code to the next level.
Nice.
Decorators are awesome.
And they're kind of like a little bit of magic Python dust.
You can sprinkle onto a method and make things happen, right?
Now, about half of these are homegrown, and only some of those I'd recommend. The other half are the built-in ones that come from various places. So I'll just go through the list of 12, and you tell me what you think.
The first one that they started off with in this article doesn't thrill me. It says, hey, I can wrap this function with a thing called logger, and it'll tell me when it starts and stops. Like, yeah, no thanks. That doesn't seem interesting. But the next one, especially if you're already focused on decorators and psyched about that, is functools.wraps, right?
Definitely. If you're going to write decorators, you've got to use it.
Yeah, it's basically required.
If you create a decorator and they show you how to do that
on the screen here
and you try to interact
with the function that is decorated,
well, you're going to get funky results.
Like, what is the function's name? Well, it's the name of the decorator's wrapper, not the actual thing. What are its arguments? It's *args, **kwargs. What is the documentation? Whatever the documentation of the decorator is, and all that.
So with wraps, you can wrap it around and it'll actually kind of pass through that information, which is pretty cool.
So if you're going to do decorators wrapped, that's kind of a meta decorator here.
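A minimal sketch of the difference (the function names here are made up):

```python
import functools

def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def with_wraps(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@without_wraps
def greet_a():
    """Say hello."""

@with_wraps
def greet_b():
    """Say hello."""

print(greet_a.__name__)  # wrapper   (the funky result described above)
print(greet_b.__name__)  # greet_b   (metadata passed through)
```

The same pass-through applies to `__doc__`, which is why tooling like help() still works on decorated functions.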
Another one I think is really cool.
Not for all use cases, not really great on the web because of the scale out across process story that often happens
in deployment. But if you're doing data science-y things or a bunch of repetitive processing,
the LRU cache is like magic unless you are really memory constrained or something.
Yeah. Love LRU cache.
Yeah. You just put it on a function, you say @lru_cache, and you can even give it a max size.
And it just says, as long as given a fixed input,
you'll get the same output every time.
Then you can put the LRU cache on it.
The second time you call it the same arguments,
it just goes, you know what?
I know that answer.
Here you go.
And it's an incredibly easy way to speed up stuff that takes, like, numbers and well-known things that are not objects, but where it can be tested that, yeah, these are the same values. And if you don't care about the max size, you can just use the @cache decorator now. You don't need to have the LRU part there.
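A quick sketch of the "second call is free" behavior (the counter is just there to prove the body only ran once):

```python
import functools

calls = 0

@functools.lru_cache(maxsize=128)
def slow_square(n):
    global calls
    calls += 1  # counts how often the body actually runs
    return n * n

slow_square(4)
slow_square(4)  # same argument: answered from the cache, body not re-run
print(slow_square(4), calls)  # 16 1
```

With Python 3.9+, `@functools.cache` gives you the same thing with no size bound.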
Oh, nice. Great addition.
Next up, we have @repeat. Suppose for some reason I want to call a function multiple times. Like, what if I call this a bunch of times just for, say, load testing, or just during development?
I can't see this being used in any realistic way.
But you can just say this is one that they built.
You just wrap it and say, repeat this n number of times.
That might be useful.
Yeah.
Time it.
So time it is one that you could create that I think is pretty nice.
Like this is one of the homegrown ones that I do think is good
is a lot of times you want to know how long a function takes.
And one thing you could do is grab the time at the start with time.perf_counter, which is pretty excellent, and then at the end grab the time and print it out. But then you're messing with your code, right?
It'd be a lot easier to just go, you know what?
I just want to wrap a decorator over some function and have it print out stuff
just usually during development or debugging or something, not in production,
but like, well, how long did this take? So just yesterday I was fiddling with a function. I'm like, if I change it this way, will it get any faster? It's a little more complicated, but maybe there's a big benefit, right? And I put something like this on there, and like, yeah, it didn't make any difference, so we'll keep the simple bit of code in place.
Yeah, and if it's super fast, you can also do things like loop it, add a loop thing there so that it runs like a hundred times, and then do the division.
That's a really good point. And these are composable, right? Decorators are composable, so you could say @timeit, @repeat(1000).
Oh yeah, right. I mean, all of a sudden, repeat's starting to sound useful. They also have a retry one, for retrying a bunch of times.
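Before moving on to retries, the timing-plus-repeat combination just described might be sketched like this. These are hedged homegrown versions in the spirit of the article, not its exact code:

```python
import functools
import time

def timeit(func):
    """Print how long each call takes; a homegrown development helper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

def repeat(n):
    """Call the wrapped function n times, returning the last result."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@timeit
@repeat(1000)
def work():
    return sum(range(100))

total = work()  # one timing line covering all 1000 repeats
```

Because decorators stack, @timeit wraps the repeated version, so the printed time covers all 1000 runs and you can divide by n yourself.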
No.
Tenacity.
Don't do that.
There's some that are really, really fantastic
with many options.
Don't bother rewriting some of those
because you've got things like tenacity
that has exponential back off,
limiting the number of retries,
customizing different behaviors and plans
based on exceptions. So grab something like Tenacity. But the idea of understanding retries is kind of cool.
Thanks for reminding us about Tenacity.
I forgot about that.
Yeah, that's a good one, right?
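For the flavor of what retrying with exponential backoff means, here is a deliberately homegrown sketch (Tenacity's real decorator is richer, with stop and wait policies per its docs; don't ship this instead of it):

```python
import functools
import time

def retry(times=3, base_delay=0.01):
    """Homegrown retry with exponential backoff; Tenacity does this far better."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise  # out of attempts: let the error propagate
                    time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, ...
        return wrapper
    return decorator

attempts = 0

@retry(times=3)
def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = flaky()  # succeeds on the third try
```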
Count call.
If you're doing debugging or performance stuff,
you're just like,
why does it seem like this is getting called like five times?
It should be called once.
This is weird.
And so you could actually,
they introduced this count call decorator
that just every time a function is called,
it's now been called this many times,
which sounds silly,
but are you trying to track down
like an N plus one database problem
or other weird things like that?
You're like, if you don't really know
why something bizarre is happening a ton of times,
this could be kind of helpful.
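A sketch of that call-counting idea, with the count hung on the wrapper so it's inspectable (names made up, not the article's exact code):

```python
import functools

def count_calls(func):
    """Report how many times a function has been called; handy when debugging."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.count += 1
        print(f"{func.__name__} has been called {wrapper.count} times")
        return func(*args, **kwargs)
    wrapper.count = 0
    return wrapper

@count_calls
def fetch_user(user_id):
    return {"id": user_id}

for _ in range(5):
    fetch_user(1)  # is this really running five times? the counter will tell you
```

This is exactly the kind of thing that surfaces an N+1 query problem: a "called once" function that reports dozens of calls.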
Rate limited.
This one sounds cool as well. Like, I only want you to call this function so often per second, and you can decide what to do in this case. It says we're going to time.sleep. I'm not so sure that makes a lot of sense, but if it was asynchronous, you could await asyncio.sleep, and it would cause no overhead on the system. It wouldn't clog anything up; it would just make the caller wait. So there's some interesting variations there as well. Keep scrolling. And then some more built-in ones.
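The sleep-based variant being described might look like this (a sketch, not the article's code; an async version would swap the time.sleep for await asyncio.sleep):

```python
import functools
import time

def rate_limited(calls_per_second):
    """Sleep-based rate limiter: block until the minimum interval has passed."""
    min_interval = 1.0 / calls_per_second
    def decorator(func):
        last_call = [0.0]  # mutable cell so the wrapper can update it
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.perf_counter() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.perf_counter()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(calls_per_second=100)  # at most one call per 10 ms
def ping():
    return "pong"

start = time.perf_counter()
results = [ping() for _ in range(3)]
elapsed = time.perf_counter() - start  # roughly 20 ms or more for 3 calls
```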
Data classes. If you want to have a data class, just @dataclass the class. Brian, do you use data classes much?
Yes, quite a bit.
Nice. I like my classes to be VC funded, so I use Pydantic more often. Let's see, last week? No, congratulations to Samuel and the team there.
But I honestly, I typically use Pydantic a little bit more
because I'm often going to use it with FastAPI or Beanie
or something over the wire.
But I really like the idea of data classes too.
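For reference, the @dataclass one-liner being described (field names here are invented):

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    """__init__, __repr__, and __eq__ are all generated for free."""
    device: str
    volts: float
    tags: list = field(default_factory=list)  # mutable default done safely

m1 = Measurement("rf-probe", 3.3)
m2 = Measurement("rf-probe", 3.3)
print(m1)        # Measurement(device='rf-probe', volts=3.3, tags=[])
print(m1 == m2)  # True: field-by-field equality, not identity
```

Pydantic adds runtime validation and serialization on top of this shape, which is why it shows up with FastAPI and over-the-wire use.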
All right, a couple more.
Register.
Let me know if you know about this one.
I heard about it a little while ago, but I haven't ever had a chance to use it.
But the atexit module in Python has a way to say, when my program is shutting down, even if the user, like, Ctrl-C's out of it, I need to make sure that I delete, say, some file I created, or call an API and tell it real quick, like, you know what?
We're gone.
Or I don't know, something like that, right?
You just need, there's something you got to do on your way out,
even if it's a force exit.
Yeah.
You can go.
I have, sorry to interrupt, I have used this.
No problem.
Yeah.
Yeah, when did you use it?
What do you use it for?
Similar sort of thing.
I've got like some thing in the background that I want to make sure that we,
there's a little bit of cleanup that's done before it goes away.
But I just wanted to correct this. The article says from atexit import register and then decorate with @register. I think it looks better if you just import atexit and do the decorator as @atexit.register, because it's better documentation.
I totally agree.
I totally agree.
There's a couple of things in this article where the code is a little bit weird, like that other article I talked about that was a little bit weird.
But I agree, keeping the namespace tells you, like,
well, what the heck are you registering for, right?
I think namespaces are a good idea.
I definitely use them.
But anyway, so you can just put this decorator on a function,
and when you exit, they show an example of some loop going just while true,
and they control C out of it.
It says, hey, we're cleaning up here.
Now bye.
Which is a pretty nice way to handle it, instead of trying to catch all the use cases with exceptions and try/finallys and so on.
All right.
Property.
Give your fields behaviors and validation.
Getters, setters, and so on.
Love it.
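A small sketch of validation behind a property (class and field names are made up, riffing on the voltage example from earlier):

```python
class Probe:
    def __init__(self, volts):
        self.volts = volts  # goes through the setter below, so it's validated too

    @property
    def volts(self):
        return self._volts

    @volts.setter
    def volts(self, value):
        # Validation lives in one place; callers still just write probe.volts = x.
        if not 0 <= value <= 5:
            raise ValueError("volts must be between 0 and 5")
        self._volts = value

p = Probe(3.3)
p.volts = 1.8       # fine
try:
    p.volts = 12    # rejected by the setter
except ValueError as e:
    error = str(e)
```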
And single dispatch, which I believe we've spoken about before, where you can basically do argument overloads for functions. So you can say, here's a function, and here's the one that takes an integer, and here's the one that takes a list, and these are separate functions with separate implementations. And you do that with that @singledispatch decorator.
You know, I actually always forget about this, but I'm kind of glad I forget about it, because I think I would use it too much.
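The overload-by-argument-type idea can be sketched with functools.singledispatch (the describe function here is invented):

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback implementation for any type without a registered overload.
    return f"something else: {value!r}"

@describe.register
def _(value: int):
    return f"an integer: {value}"

@describe.register
def _(value: list):
    return f"a list of {len(value)} items"

print(describe(42))      # an integer: 42
print(describe([1, 2]))  # a list of 2 items
print(describe("hi"))    # something else: 'hi'
```

Dispatch happens on the type of the first argument only, hence "single" dispatch.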
I used to love function overloading when I was doing C, C++, C sharp type stuff.
I would really count on it.
And I thought I would miss it in Python.
And I haven't.
Well, I noticed that some people that convert to Python from C will just assume that it has function overloading, and it just doesn't work. And that's known as function erasure, the last one wins, right?
Yeah, we talked about that last time. Oh no, we talked about that when we talked on Talk Python, which maybe we'll mention at the end. But yeah, last time we talked.
Yeah. All right, those are the 12
that they put in the article. Most of them are really great. Some of them point you at things
like tenacity, which is also really good. So that's what I got. Nice. Well, I would like to
talk about testing a bit. Let's talk about PyHamcrest. So this topic is contributed by TXLs on the socials. So thanks, TXLs. So PyHamcrest, and the thought was, like, Brian talks about testing a lot, so why haven't you covered this? So what PyHamcrest is, is a matcher object, declarative rule matcher thing that helps you with asserts and stuff like that.
Have you used this?
I have not. My first thought was it was some kind of menu item on a holiday dinner. But no, I literally only heard about this because you put it in the show notes. So this is news to me.
The idea is, instead of all the plain asserts, you've got a whole bunch of assert things, like assert_that, and equal_to, and a bunch of Hamcrest matchers that you can import. So you can do things like, instead of saying assert the_biscuit == my_biscuit, you can say assert_that(the_biscuit, equal_to(my_biscuit)).
So at first, I've always thought, asserts like this, I get it for unittest, but for pytest, do we need it? Because you could just use assert in pytest.
However, I'm kind of easing up on that argument because I can see a lot of places where just really if you can make your assertions more readable in some contexts, then why not?
Sure.
And I don't know about this one, but if it's got things like go through a list and assert everything in the list is equal, right?
Yeah, or higher-order things, where it would be kind of complex to implement the test that is the thing you want to assert, like these three fields are equal across these three things, right? Then it becomes a little less obvious, and if this has a really nice story...
Well, it does. Yep. There's a whole bunch of matchers within it. Like, for objects, there's equal_to and has_length. It has has_property. has_properties is interesting, so you could assert on duck typing: hopefully it has these values or something. Numbers: close_to, greater_than, less_than. Of course, plain asserts are fine for these,
but the logical stuff, the logical and sequences
is I think where I probably might use it.
Things like all of or any of or anything,
or that's neat.
Like all of these things are true.
And you can combine this with or,
like all of these or all of those or something.
Sequences: contains, contains_inanyorder, that's kind of interesting. has_items, is_in. Again, these are things that are testable in raw Python, like just raw asserts, not too bad. But if it's more readable, sure, why not?
So there's some that are shown, especially with raising errors, like exceptions. Where did I get it? Oh, the tutorial has a bunch of cool stuff in it. Things like assert_that calling translate with args curse_word raises a LanguageError. Well, that's kind of neat. Very naughty. assert_that broken_function raises Exception. Okay. I mean, with pytest you've got the raises thing, with pytest.raises, but some people have a hard time with it, like, it's not obvious, and maybe this looks better. And this is kind of neat: you can use assertion exceptions with async methods. So it has a resolved item, so you can say assert that awaiting a resolved future results in the future raising ValueError or something.
Yeah, nice.
That's cool.
So, yeah.
So a lot of predefined matchers. And I guess it has some syntactic sugar things, like is_. So if it just sounds better to have an 'is' in there, you can add it. So assert_that the_biscuit is_ equal_to... it doesn't do anything, but it sounds better. So why not?
I guess if you wanted it to read like English, like inserting a no-op verb.
Yeah. But I guess I do want to highlight this, because why not? Since I'm writing a lot of test code, I'm used to all the different ways you can check different equivalences of values or comparisons. So I don't know how much I would use this, but I've seen a lot of people struggle with how to write an assertion. And so having some help from a library, why not? So this is pretty neat.
Yeah, this totally resonates with me.
I like it.
Well, that's our six items.
Six, four items.
Do you have any extras for us this week?
I do have a few extras.
Let me throw them in here.
First of all, it's a few weeks old.
I didn't remember to put it up here, but Python 3.11.2 is out, as well as 3.10.10 and alpha
5 of 3.12. We're getting kind of close to beta, it feels like, for 3.12, which will be exciting,
because then we'll get real visibility into what's probably going to be happening for the next version of Python. That's cool. Yeah, I'm testing for 3.12 already
with our CI builds. So nice. For example, with 3.11.2 there were 192 commits since 3.11.1.
194, rather. So that's pretty non-trivial right there. And they link over to somewhere that looks...
I don't know, just what am I supposed to learn from that? Here's the changes from 3.11 to 3.12.
So I always go to Downloads, full list of downloads, scroll down to the particular version
here, and go to Release Notes, and there you go. That's probably what they should be linking to.
And here's all the things. There are some in here that are things that you might actually
care about, like, for example, a fixed race condition while iterating over thread states
in threading.local.
You might not want that in your code.
And various other things.
Yeah, look at all these changes here.
This is a lot.
Yeah, nice.
Go team.
Yeah, go team.
You might think, oh, it's just a dot-plus-one,
plus-0.0.1 sort of thing. But now it's got some
interesting changes. As well, I haven't looked at what's happening in the others, but maybe some
of those are important enough to backport those fixes. Also, more recent, as in eight days
ago, we've got Django 4.2 beta one. And, you know, typically the philosophy is
once it hits beta, the API should be stable.
The features should be stable.
It's just about fixing bugs.
Doesn't always work out that way,
but that's generally the idea.
So basically here's your concrete look at Django 4.2.
Yeah.
Right?
And 4.2 looks exciting.
Yeah, absolutely.
So you can, you know, they've got some release notes
and various
things about what's going on. You can go check that out. So they've got psycopg 3, so Postgres
support. It now supports psycopg version 3.1.8 or higher; you can update your code to use
that as a backend. I'm still using 2, so I better... I didn't know there was a 3. Careful, Brian. psycopg2
is likely to be deprecated and removed at some
point in the future.
Comments on columns and
tables. So that's kind of neat,
in the database model. So the ORM
gets them into their migrations.
No comment on that. Yeah, no comment. Very good.
Some stuff about the
so-called BREACH attack. I have no idea; it seems
to have to do with GZip. So check that out.
Another one that's interesting is in-memory file storage and custom file stores.
This is for making testing potentially faster.
So if you're going to write some files as part of a behavior, you can say, just write
them to in-memory.
Don't have to clean them up.
And they write really fast.
Yeah.
It phenomenally speeds up testing.
It's good.
Yeah, I bet.
All right.
So, uh, there's that.
And then also, I want to give a shout out.
I'll put it like this.
I want to give a shout out to an app real quick that people might find useful, by way of a journey.
So, rewriting the TalkPython apps in Flutter, where all the APIs are Python, but we're having apps on macOS, Windows, Linux, iOS, and Android.
That's really hard to do with Python, so Flutter is what we're using.
And it's going along really well.
Here's a little screenshot for you, Brian, to show you what we've got so far.
Isn't that cool?
Yeah.
Yes, and another, like here's the little app and stuff.
So I think I'm really happy with how it's coming together.
I think it's going to be a better mobile app experience
and an existing desktop experience for like offline mode with the TalkPython courses.
Oh, cool.
Yeah. So that'll be really neat. The thing I want to tell you about is something I just applied
to it. This thing called ImageOptim. And what you can do is you can just take the top level
of your project. So I did this for say the TalkPython training website. I did this for
the mobile app. Just take the very top level project folder and just throw it on this app.
And it'll go find all the images,
all the vector graphics and everything
and minimize the heck out of them.
So for example, when I did that on the mobile app,
it went from 10 megs of image assets
to eight megs of image assets, lossless.
Like no one will know the difference
other than me that I've done it.
And it dropped 20% of the file size,
which is not the end of the world,
but given how much work it is, it's not too bad.
Well, the lossless part is the important bit. So that's pretty exciting.
Yeah, exactly. So it'll do things like if it's a PNG and it sees you're using a smaller color
palette than what it's actually holding, it's like, oh, we can rewrite that in a way that
doesn't make it actually look different, but takes up less storage. Basically, it's a wrapper over things like MozJPEG,
Pngcrush, Google's Zopfli.
I don't know how to say these things.
But there are a bunch of lossless image manipulation tools,
and it just applies those to all of them in a super easy way.
And this thing's open source itself.
Cool.
So, yeah.
Anyway, if people have websites out there,
they should consider just, like,
take your website, throw it on here,
and it'll tell you.
Make sure it's all checked into Git first,
then do this and see what it says.
It gives you a little report at the bottom,
like you saved either 10K or you saved five megs,
depending, you can decide whether to keep the changes.
Yeah, cool.
Yep, all right.
That's all my extras.
How about you?
I just have a couple.
Yesterday, I talked with you on Python Bytes...
no, on Talk Python, about pytest tips and tricks.
And I just wanted to point out that the post is available
for people to read if they want to go look through it.
And if you have comments, please, or questions,
let me know, of course.
Also in March, I think I've brought this up before,
but I'll be speaking at PyCascades.
There's a picture of me without hair.
And I did stick up a blog post on pythontest.com,
just a placeholder so that I can link the slides
and code afterwards.
So that's up.
Yeah, awesome.
And that's it.
Yeah, that's going to be a really cool talk.
I think a lot of people are interested in how you share fixtures and build them for your team or cross-project.
As well, it was really great to have you on Talk Python; we talked about a bunch of cool pytest things.
That'll be out in a few weeks for people if they don't want to watch the YouTube version.
And then we'll let people know when that's available.
Yeah, absolutely.
But hopefully they're all subscribed to TalkPython already anyway.
Of course.
I'm sure they are.
Yeah.
I'm sure they are.
How about a joke?
Are we ready?
Yes, let's do a joke.
Let's do it.
So this one, this is a quick and easy one.
And for people listening, no pictures even.
This one comes from NixCraft on Twitter.
And it says, developers, let us describe you as a group.
Groups of things sometimes have weird names.
Like a group of wolves is called a pack.
A group of crows is called a murder.
What do we think we should call a group of developers, Brian?
That's hilarious.
A group of developers is called a merge conflict.
Isn't that good? Yeah. It is. The comments are pretty good.
If you scroll down here, some of them are silly. Some are just like, yep. Yeah. Yeah. Anyway,
they're pretty good. But yeah, a group of developers is called a merge conflict. And
so true it is. You can even have a merge conflict with yourself. Be a group of developers is called a merge conflict and so true it is you can even have a
merge conflict with yourself be a group of one um how about a group of uh tech ceos with social
media accounts it'd be a lawsuit that's right an sec uh sec investigation that's right yeah
Well, fun as always.
Thank you.
Yeah.
Thanks.
Thanks everybody for showing up as always.
And let's, we'll see everybody next week.