Python Bytes - #325 It's called a merge conflict
Episode Date: February 28, 2023. Topics covered in this episode: Python Parquet and Arrow: Using PyArrow With Pandas, FastAPI-Filter, 12 Python Decorators to Take Your Code to the Next Level, PyHamcrest, Extras, Joke. See the full show notes for this episode on the website at pythonbytes.fm/325
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly
to your earbuds.
This is episode 325, recorded February 28th, the last day of February in 2023.
I am Brian Okken.
And I'm Michael Kennedy.
And before we jump in, I want to thank everybody that shows up for the live stream.
If you haven't shown up for the live stream before, it's a lot of fun.
People can stop and ask questions and chat and everything,
and it's a good way to say hi.
And we enjoy having you here,
or watch it afterwards if this is a bad time for you.
Also wanna thank Microsoft for Startups Founders Hub
for sponsoring this episode.
They've been an excellent sponsor of the show,
and they've also agreed to have us
be able to play with the sponsor spots
and do some AI readings. So this one's going to be a fun one. I'm excited about it.
I am too. That's going to be fun.
So why don't you kick us off with our first topic today?
All right. Let's jump right in. You like solid code. So how about some Codesolid.com?
Has nothing to do with solid code, but it's still interesting, and it does have to do with code.
This one is something called Parquet and Arrow.
Have you heard of Apache Arrow or the Parquet file format, Brian?
I don't.
I've heard of Arrow, but I don't think I've heard of Parquet.
So when people do a lot of data science, you'll see them do things like open up Jupyter notebooks and import pandas. And then from pandas, they'll say load CSV. Well, if you think about a whole bunch of different file formats and how fast and efficient they might be stored on disk and read, how do you think CSVs might turn out? Pretty slow, pretty large, and so on. And Arrow, through PyArrow, has some really interesting in-memory structures that
are a little more efficient than Pandas, as well as it has access to this Parquet format. So does
Pandas through an add-on, but you'll see that it's still faster using PyArrow. So basically,
that's what this article that I found is about. It highlights how these things compare, and it basically asks questions like, can we use pandas DataFrames and Arrow tables together? Like, if I have a pandas DataFrame but I want to switch it into PyArrow for better performance at some point for some analysis, can I do that? Or if I start with PyArrow, could I then turn it into a DataFrame and hand it off to Seaborn or some other thing that expects a pandas DataFrame?
The answer is yes.
Short version there.
Are they better?
In which ways are they better?
Which way are they worse?
And then the bulk of the analysis here is like, we could save our data, read and write our data from a bunch of different file formats.
Parquet, but also things like Feather, ORC, CSV, and others, even Excel. What should we maybe consider using? Okay. So installing it is just pip install pyarrow, super easy, same type of story. If you want to use it with pandas, say I've got some pandas DataFrame and I want to convert it over, that's super easy. You go to PyArrow and you say pyarrow.Table.from_pandas and give it a pandas DataFrame. And then boom, you've got it in PyArrow format.
Okay.
One of the things that's interesting is that pandas has a really nice wrangling, exploration style of working with data. Okay, so I can just show the DataFrame, and it'll tell me there are 14 columns in this example, 6,433 rows, and it'll list off the headers and then the column data. If I do the same thing in PyArrow, I just get, well, it's kind of human readable, but you just get a dump of junk, basically. It's not real great. So that aspect, certainly using pandas, is nice for this kind of exploration.
Another thing about PyArrow is the data is immutable. So you can't say, every time that this value appears, actually replace it with this canonical version. You know, if you get a 'y', a lowercase 'yes', and a capital 'Yes', and you want to make them all just lowercase 'yes', you've got to make a copy instead of changing it in place. So that's one of the reasons you might stick with pandas, which is pretty interesting. But you can do a lot of really interesting parsing and performance stuff, like you would do with pandas. But if your goal is performance,
and performance measured in different ways, how much memory does it take up in computer RAM? How
much disk space type of memory does it take up? How fast is it to read and write from those? It's
pretty much always better to go with PyArrow. So for example, if I take those same sets of data,
those two sets of data from,
I think this is the New York City taxi data,
some subset of that, really common data set.
With digit grouping, it's a little over three megs of memory for the DataFrame in pandas, whereas it's just under one meg for PyArrow. So that's three times smaller, which is pretty interesting there.
Yeah.
Yeah.
The other one is if you do mathy things on it. Like, if you've got tables of numbers, you're really likely to ask about things like the max or the mean or the average and so on. Now, if you do that with pandas and you do it with PyArrow, you'll see it's about eight times faster to do math with PyArrow than it is with pandas.
That's pretty cool, right?
Yeah.
The syntax is a little grosser, but yeah.
The syntax is a little grosser.
I will show you a way to get to this in a moment that is less gross, I believe.
And then Alvaro out there does say, if you want fast DataFrames, Polars plus Parquet is the way to go. He's skating to where the puck is going to be, indeed. And Kim says, presumably the immutability plays a large part in the performance. I suppose so.
Yeah. And then also some real-time feedback here. Alvaro says, I got a broken script from a colleague. I rewrote it; in pandas it took about two hours to process, and in Polars it took three minutes. So that's a non-trivial sort of bonus there. All right, let me go over the file formats. And I think we've talked about Polars before, so I'll just reintroduce it really quickly.
So if we go and look at the different file formats, we could use Parquet. So we could say to_parquet with PyArrow and you get it out. And these numbers are all kind of insane: four milliseconds to write it versus two milliseconds to read it. If you use fastparquet, which is the engine that lets pandas DataFrames do it, it's 14 milliseconds, which is a little over three times slower, but it's still really, really fast, right?
There's feather,
which is the fastest of all the file formats
with a two millisecond save time,
which is blazing.
There's ORC.
I have no idea what ORC is.
It's a little bit faster.
Or if you want to show that you're taking lots of time and doing lots of processing,
doing lots of data science-y things, you could always do Excel, which takes about a second
almost. I mean, on a larger data set, it might take lots longer, right? You're like, oh, I'm
busy. I can't work. I'm getting a coffee because I'm saving. Well, I mean, there's some people
that really have to export it to Excel
so that other people can make mistakes later.
Yes, exactly.
Because life is better when it's all go-tos.
Yeah.
But no, you're right.
If the goal is to deliver an Excel file,
then obviously.
But this is more like considering
what's a good intermediate just storage format.
And then CSV is actually not that slow.
It's still slower, but it's only 30 milliseconds.
But the other part that's worth thinking about,
remember, this is only 6,400 rows.
The Parquet format is 191K.
The Pandas one is almost 100K more, which is interesting.
The Feather is almost half a meg.
ORC is three quarters of a meg.
Excel is half a meg.
CSV is a meg, right?
So a meg, it's almost five times file size increase.
So if you're storing tons of data and it's five gigs versus 50 gigs,
you know, you maybe want to think about storing it in a different format.
Plus you read and write it faster, right?
So these are all pretty interesting.
And Polars is the lightning-fast DataFrame library built in Rust and Python. This is built on top of Apache Arrow. I had a whole Talk Python episode on it. I'm pretty sure I've talked about Polars before on here as well. But it's got a really cool, sort of fluent programming style. And under the covers, it's using Arrow as well.
So pretty neat. Yeah, so if you're really looking to say, I just want to go all in on this, as Alvaro pointed out, I think it was Alvaro, Polars is pretty cool.
Okay, neat.
And Henry, out there with real-time feedback, says pandas is fully supporting PyArrow for all data types in the upcoming 1.5 and 2.0 releases. There was just a blog post on it on the datapythonista blog. It's not clear if they're switching to it. I believe it's NumPy at the moment as the core, but it could be supported, which is awesome. Yeah, thanks, Henry, for that update there. Well, then also, it did say you're basically starting to get native PyArrow speed with pandas by just selecting the backend in the new pandas version.
Indeed.
Awesome.
Yeah, yeah.
Very, very cool.
So lots of options here.
But I think a takeaway that's worth paying attention to here is choosing maybe Parquet as a file format, regardless of whether you're using pandas or PyArrow or whatever, right?
Because I think the default is read and write CSV.
And if your CSV files are ginormous,
that might be something you want to not do.
All right, over to you.
Well, I said I'd never heard of Parquet. And before we get to the next topic, I was thinking, like, is it butter or is it Parkay? This is an old thing from when we were kids.
That's right. That's margarine. Parkay had a little tub that talked. It was neat.
Oh, that's right, it did. It had a little mouth. Yeah.
I want to talk about FastAPI a bit.
So this topic, fastapi-filter, comes to us from Arthur, and it's actually his library. And this is pretty cool. So I'm going to pop over to the documentation quickly. What it is, is query string filters for API endpoints, so you can show them in Swagger and use them for cool things. So I'll pop over to the documentation. It says query string filters that support the backends SQLAlchemy and MongoEngine. So that's nice. We'll get to what the filters look like later, but in the Swagger interface, this is pretty neat. So let's say you're grabbing the users,
and you want to filter them by the name.
You can do a query in the name or the age less than
or age greater than or equal.
These are pretty nice.
So it says the philosophy of FastAPI filter
is to be very declarative.
You define fields that you want to be able to filter on,
as well as the type
of operator and then tie your filters to a specific model. It's pretty easy to set up.
The syntax, well, we'll let you look at it, but it's not that bad to set up the filters.
Yeah, a lot of Pydantic models, as you might expect, it being FastAPI.
Yeah. So you plug in these filters, and the built-in ones are things like not equal, greater than, greater than or equal, in, those sorts of things. But you could do some pretty complex query strings then. There are some good examples down here. So, like, the users, but ordered by descending name, or ordered by ascending id. There's plus and minus for ascending and descending, and you can have order_by, and you can filter by things like the name. Putting some filters right in your API query string is kind of an interesting idea. I don't know if it's a good idea or a bad idea, but it's interesting.
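The real library wires these filters into FastAPI's dependency injection; for its actual classes, see the fastapi-filter docs. But the core idea, field__operator pairs applied to a model, can be sketched in plain Python. Every name below is made up for illustration and is not fastapi-filter's API:

```python
# Toy sketch of the field__operator idea behind query-string filters.
OPS = {
    "lt": lambda value, arg: value < arg,
    "gte": lambda value, arg: value >= arg,
    "ilike": lambda value, arg: arg.lower() in value.lower(),
}

def apply_filters(rows, **filters):
    """Filter dicts with Django-style keys like age__lt=30."""
    for key, arg in filters.items():
        field, _, op = key.partition("__")
        # Bare field names (no __op suffix) fall back to equality.
        check = OPS.get(op, lambda value, arg: value == arg)
        rows = [r for r in rows if check(r[field], arg)]
    return rows

users = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 23},
    {"name": "alina", "age": 41},
]
young = apply_filters(users, age__lt=30)      # just Bob
als = apply_filters(users, name__ilike="al")  # Alice and alina, case-insensitive
```

The declarative payoff is that the server only whitelists fields and operators; the client composes the actual query.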
Yeah, this is a real interesting philosophy of how do I access the data in my database as an API?
Yeah.
And I would say there's sort of two really common ways, and then there's a lot of abuse of what APIs look like and what you should do.
You know, just remote procedure calls and all sorts of randomness.
But the philosophy is I've got data in a database and
I want to expose it over an API. Do I go and write a bunch of different functions in FastAPI in this
example, where I decide here's a way you can find the recent users, and you can then possibly take some kind of parameter about a sort, or maybe how recent you want the users to be? But you're writing the code that decides here's the database query, and it's generally focused on recent users, right? That's one way to do APIs. The other is I kind of want to take my database
and just make it queryable over the internet, right? And this is with the right restrictions,
it's not necessarily a security vulnerability, but it's just pushing all of the thinking about what the API is to the client side. Right? So if I'm doing Vue.js, it's like, well, we'll wrap this around our database, and you ask it any question you want with filters, where you say, give me all the users where the created date is less than such and such, or greater than such and such. You know, that would basically be like the new users, right? But it's up to the client to kind of know the data schema and talk to it. And this, you know, this is that latter style.
If you like that, awesome. You know, you can expose a relational database over SQL Alchemy
or MongoDB through Mongo Engine. And it looks pretty cool.
My thoughts on where I might use this, I mean, I'm not using this in production. But my thoughts on where I might use this, even disregarding one of Brandon's concerns. Brandon says, exposing my API field names makes me nervous.
But there's a part of your development where you're not quite sure what queries you want.
So custom writing them, maybe you're not ready to do that or it'll be a lot of back and forth.
So I think a great place for this would be when you're working with your front end and your back end code, your API code, and you're trying to figure out what sort of searches you want, and you can use something like this to have it right in the actual API query.
And then once you figure out all the stuff you need,
then you could go back if you want to
and hard code different API endpoints
with similar stuff, maybe?
I don't know.
Yeah, yeah.
And not everything's built the same, right?
Kim out there points out that many of the APIs
that he uses or builds are for in-house use only.
Yeah.
Right?
And so it's just like,
instead of coming up with very, very focused API endpoints,
it's like, well, kind of just leave it open
and people can use this service to access the data
in a somewhat safe way, like a restricted way.
Yeah.
So it's, what are you building? Like, are you putting it just on the open internet, or are you putting it, you know, inside?
That's very true.
Yeah, like I've got a bunch of projects
I'm working on that are internal
and like, who cares if somebody knows
what my data names are and stuff, so.
Right, well, and what is in it?
Are you storing social security numbers and addresses
or are you storing voltage levels for RF devices?
Exactly.
Oh, no, the voltage levels have leaked. Oh, no. Right. I mean, flexibility might be awesome.
Yeah. I mean, in the end, like, it's secretive. We don't want it to get out in public, but it's not like something that internal users are going to do anything with. So, yeah.
Yeah, yeah, exactly.
Cool.
Cool.
Well, yeah, that's really, really a nice one.
So, Brian, sponsor this week?
Yeah, Microsoft for Startups Founders Hub.
But if you remember last week, we did an ad where we asked an AI to like come up with the ad text for us.
In like an official, sort of official sounding way.
Yeah.
So this week, you pushed it through the filter and said to try to come up with the wording in a hipster voice, right?
So here we go.
Tell us about it.
With a hipster style.
I'll try.
Yo, Python Bytes fam, this segment is brought to you by the sickest
program out there for startup founders, Microsoft for Startup Founders Hub. If you're a boss at
running a startup, you're going to want to listen up because this is the deal of a lifetime.
Microsoft for Startup Founders Hub is your ticket to scaling efficiently and preserving your
runway, all while keeping your cool factor intact.
With over six figures worth of benefits,
the program is serious next level. You'll get 150K in Azure credits,
the richest cloud credit offering on the market,
access to the OpenAI APIs
and the new Azure OpenAI service
where you can infuse some serious generative AI
into your apps and a one-on-one
technical advisor from the Microsoft squad who will help you with your technical stack and
architectural plans. This program is open to all, whether you're just getting started or already killing it. And the best part, there's no funding requirement. All it takes is five
minutes to apply and you'll be reaping the benefits in no time.
Check it out and sign up for Microsoft for Startup Founders Hub
at pythonbytes.fm slash foundershub2022.
Peace out and keep listening.
It's insane the power of these AIs these days.
And you know, if you want to get access to OpenAI
and Azure and GitHub and all those things,
well, a lot of people seem to be liking that program.
So it's cool.
They're supporting us.
Yeah.
Also cool that they're letting us play with the ad.
Yes, with their own tools indeed.
Okay.
What I got next, Brian,
is stuff to take your code to the next level, brah.
Twelve? This sounds pretty interesting.
12 Python decorators to take your code to the next level.
Nice.
Decorators are awesome.
And they're kind of like a little bit of magic Python dust.
You can sprinkle onto a method and make things happen, right?
Now, about half of these are homegrown, and only some of those I'd recommend. The other half are the built-in ones that come from various places. So I'll just go through the list of 12, and you tell me what you think.
The first one that they started off with in this article doesn't thrill me. It says, hey, I can wrap this function with a thing called logger, and it'll tell me when it starts and stops. Like, yeah, no thanks. That doesn't seem interesting. But the next one, especially if you're already focused on decorators and psyched about that, is functools.wraps, right?
Definitely. If you're going to write decorators, you've got to use it.
Yeah, it's basically required.
If you create a decorator and they show you how to do that
on the screen here
and you try to interact
with the function that is decorated,
well, you're going to get funky results.
Like, what is the function's name? Well, it's the name of the decorator's wrapper, not the actual thing. What are its arguments? It's *args, **kwargs. What is the documentation? Whatever the documentation of the decorator is, and all that.
So with wraps, you can wrap it around and it'll actually kind of pass through that information, which is pretty cool.
So if you're going to do decorators wrapped, that's kind of a meta decorator here.
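A minimal sketch of the difference (the function names here are made up):

```python
import functools

def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def with_wraps(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@without_wraps
def greet_a():
    """Say hello."""

@with_wraps
def greet_b():
    """Say hello."""

print(greet_a.__name__)  # wrapper   (the funky result described above)
print(greet_b.__name__)  # greet_b   (metadata passed through)
```

The same pass-through applies to `__doc__`, which is why tooling like help() still works on decorated functions.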
Another one I think is really cool.
Not for all use cases, not really great on the web because of the scale out across process story that often happens
in deployment. But if you're doing data science-y things or a bunch of repetitive processing,
the LRU cache is like magic unless you are really memory constrained or something.
Yeah. Love LRU cache.
Yeah. You just put it on a function, you say @lru_cache, and you can even give it a max size.
And it just says, as long as given a fixed input,
you'll get the same output every time.
Then you can put the LRU cache on it.
The second time you call it the same arguments,
it just goes, you know what?
I know that answer.
Here you go.
And it's an incredibly easy way to speed up stuff that takes, like, numbers and well-known things that are not objects, but where it can be tested that, yeah, these are the same values. And if you don't care about the max size, you can just use the @cache decorator now. You don't need to have the LRU part there.
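A quick sketch of the "second call is free" behavior (the counter is just there to prove the body only ran once):

```python
import functools

calls = 0

@functools.lru_cache(maxsize=128)
def slow_square(n):
    global calls
    calls += 1  # counts how often the body actually runs
    return n * n

slow_square(4)
slow_square(4)  # same argument: answered from the cache, body not re-run
print(slow_square(4), calls)  # 16 1
```

With Python 3.9+, `@functools.cache` gives you the same thing with no size bound.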
Oh, nice. Great addition.
Next up, we have @repeat. Suppose for some reason I want to call a function multiple times. Like, what if I call this a bunch of times just for, say, load testing, or just during development?
I can't see this being used in any realistic way.
But you can just say this is one that they built.
You just wrap it and say, repeat this n number of times.
That might be useful.
Yeah.
Time it.
So time it is one that you could create that I think is pretty nice.
Like this is one of the homegrown ones that I do think is good
is a lot of times you want to know how long a function takes.
And one thing you could do is grab the time at the start with time.perf_counter, which is pretty excellent, and then at the end grab the time and print it out. But then you're messing with your code, right?
It'd be a lot easier to just go, you know what?
I just want to wrap a decorator over some function and have it print out stuff
just usually during development or debugging or something, not in production,
but like, well, how long did this take? So just yesterday I was fiddling with a function. I'm like, if I change it this way, will it get any faster? It's a little more complicated, but maybe there's a big benefit, right? And I put something like this on there, and like, yeah, it didn't make any difference, so we'll keep the simple bit of code in place.
Yeah, and if it's super fast, you can also do things like loop it, add a loop thing there so that it runs like a hundred times, and then do the division.
That's a really good point. And these are composable, right? Decorators are composable, so you could say @timeit, @repeat(1000).
Oh yeah, right. I mean, all of a sudden, repeat's starting to sound useful. They also have a retry one, for retrying a bunch of times.
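Before moving on to retries, the timing-plus-repeat combination just described might be sketched like this. These are hedged homegrown versions in the spirit of the article, not its exact code:

```python
import functools
import time

def timeit(func):
    """Print how long each call takes; a homegrown development helper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

def repeat(n):
    """Call the wrapped function n times, returning the last result."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@timeit
@repeat(1000)
def work():
    return sum(range(100))

total = work()  # one timing line covering all 1000 repeats
```

Because decorators stack, @timeit wraps the repeated version, so the printed time covers all 1000 runs and you can divide by n yourself.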
No.
Tenacity.
Don't do that.
There's some that are really, really fantastic
with many options.
Don't bother rewriting some of those
because you've got things like tenacity
that has exponential back off,
limiting the number of retries,
customizing different behaviors and plans
based on exceptions. So grab something like Tenacity. But the idea of understanding retries is kind of cool.
Thanks for reminding us about Tenacity.
I forgot about that.
Yeah, that's a good one, right?
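For the flavor of what retrying with exponential backoff means, here is a deliberately homegrown sketch (Tenacity's real decorator is richer, with stop and wait policies per its docs; don't ship this instead of it):

```python
import functools
import time

def retry(times=3, base_delay=0.01):
    """Homegrown retry with exponential backoff; Tenacity does this far better."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise  # out of attempts: let the error propagate
                    time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, ...
        return wrapper
    return decorator

attempts = 0

@retry(times=3)
def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = flaky()  # succeeds on the third try
```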
Count call.
If you're doing debugging or performance stuff,
you're just like,
why does it seem like this is getting called like five times?
It should be called once.
This is weird.
And so you could actually,
they introduced this count call decorator
that just every time a function is called,
it's now been called this many times,
which sounds silly,
but are you trying to track down
like an N plus one database problem
or other weird things like that?
You're like, if you don't really know
why something bizarre is happening a ton of times,
this could be kind of helpful.
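A sketch of that call-counting idea, with the count hung on the wrapper so it's inspectable (names made up, not the article's exact code):

```python
import functools

def count_calls(func):
    """Report how many times a function has been called; handy when debugging."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.count += 1
        print(f"{func.__name__} has been called {wrapper.count} times")
        return func(*args, **kwargs)
    wrapper.count = 0
    return wrapper

@count_calls
def fetch_user(user_id):
    return {"id": user_id}

for _ in range(5):
    fetch_user(1)  # is this really running five times? the counter will tell you
```

This is exactly the kind of thing that surfaces an N+1 query problem: a "called once" function that reports dozens of calls.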
Rate limited.
This one sounds cool as well. Like, I only want you to call this function so often per second, and you can decide what to do in this case. It says we're going to time.sleep. I'm not so sure that makes a lot of sense, but if it was asynchronous, you could await asyncio.sleep, and it would cause no overhead on the system. It wouldn't clog anything up; it would just make the caller wait. So there's some interesting variations there as well. Keep scrolling. And then some more built-in ones.
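The sleep-based variant being described might look like this (a sketch, not the article's code; an async version would swap the time.sleep for await asyncio.sleep):

```python
import functools
import time

def rate_limited(calls_per_second):
    """Sleep-based rate limiter: block until the minimum interval has passed."""
    min_interval = 1.0 / calls_per_second
    def decorator(func):
        last_call = [0.0]  # mutable cell so the wrapper can update it
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.perf_counter() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.perf_counter()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(calls_per_second=100)  # at most one call per 10 ms
def ping():
    return "pong"

start = time.perf_counter()
results = [ping() for _ in range(3)]
elapsed = time.perf_counter() - start  # roughly 20 ms or more for 3 calls
```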
Data classes. If you want to have a data class, just @dataclass the class. Brian, do you use data classes much?
Yes, quite a bit.
Nice. I like my classes to be VC funded, so I use Pydantic more often. Let's see, last week? No, congratulations to Samuel and the team there.
But I honestly, I typically use Pydantic a little bit more
because I'm often going to use it with FastAPI or Beanie
or something over the wire.
But I really like the idea of data classes too.
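For reference, the @dataclass one-liner being described (field names here are invented):

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    """__init__, __repr__, and __eq__ are all generated for free."""
    device: str
    volts: float
    tags: list = field(default_factory=list)  # mutable default done safely

m1 = Measurement("rf-probe", 3.3)
m2 = Measurement("rf-probe", 3.3)
print(m1)        # Measurement(device='rf-probe', volts=3.3, tags=[])
print(m1 == m2)  # True: field-by-field equality, not identity
```

Pydantic adds runtime validation and serialization on top of this shape, which is why it shows up with FastAPI and over-the-wire use.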
All right, a couple more.
Register.
Let me know if you know about this one.
I heard about it a little while ago, but I haven't ever had a chance to use it.
But the atexit module in Python has a way to say, when my program is shutting down, even if the user, like, Ctrl-C's out of it, I need to make sure that I delete, say, some file I created, or call an API and tell it real quick, like, you know what?
We're gone.
Or I don't know, something like that, right?
You just need, there's something you got to do on your way out,
even if it's a force exit.
Yeah.
You can go.
I have, sorry to interrupt, I have used this.
No problem.
Yeah.
Yeah, when did you use it?
What do you use it for?
Similar sort of thing.
I've got like some thing in the background that I want to make sure that we,
there's a little bit of cleanup that's done before it goes away.
But I just wanted to correct this. The article says from atexit import register and then decorate with @register. I think it looks better if you just import atexit and do the decorator as @atexit.register, because it's better documentation.
I totally agree.
I totally agree.
There's a couple of things in this article where the code is a little bit weird, like that other article I talked about that was a little bit weird.
But I agree, keeping the namespace tells you, like,
well, what the heck are you registering for, right?
I think namespaces are a good idea.
I definitely use them.
But anyway, so you can just put this decorator on a function,
and when you exit, they show an example of some loop going just while true,
and they control C out of it.
It says, hey, we're cleaning up here.
Now bye.
Which is a pretty nice way to handle it, instead of trying to catch all the use cases with exceptions and try/finallys and so on.
All right.
Property.
Give your fields behaviors and validation.
Getters, setters, and so on.
Love it.
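A small sketch of validation behind a property (class and field names are made up, riffing on the voltage example from earlier):

```python
class Probe:
    def __init__(self, volts):
        self.volts = volts  # goes through the setter below, so it's validated too

    @property
    def volts(self):
        return self._volts

    @volts.setter
    def volts(self, value):
        # Validation lives in one place; callers still just write probe.volts = x.
        if not 0 <= value <= 5:
            raise ValueError("volts must be between 0 and 5")
        self._volts = value

p = Probe(3.3)
p.volts = 1.8       # fine
try:
    p.volts = 12    # rejected by the setter
except ValueError as e:
    error = str(e)
```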
And single dispatch, which I believe we've spoken about before, where you can basically do argument overloads for functions. So you can say, here's a function, and here's the one that takes an integer, and here's the one that takes a list, and these are separate functions with separate implementations. And you do that with that @singledispatch decorator.
You know, I actually always forget about this, but I'm kind of glad I forget about it, because I think I would use it too much.
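The overload-by-argument-type idea can be sketched with functools.singledispatch (the describe function here is invented):

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # Fallback implementation for any type without a registered overload.
    return f"something else: {value!r}"

@describe.register
def _(value: int):
    return f"an integer: {value}"

@describe.register
def _(value: list):
    return f"a list of {len(value)} items"

print(describe(42))      # an integer: 42
print(describe([1, 2]))  # a list of 2 items
print(describe("hi"))    # something else: 'hi'
```

Dispatch happens on the type of the first argument only, hence "single" dispatch.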
I used to love function overloading when I was doing C, C++, C sharp type stuff.
I would really count on it.
And I thought I would miss it in Python.
And I haven't.
Well, I noticed that some people that convert to Python from C will just assume that it has function overloading, and it just doesn't work. And that's known as function erasure, the last one wins, right?
Yeah, we talked about that last time. Oh no, we talked about that when we talked on Talk Python, which maybe we'll mention at the end. But yeah, last time we talked.
Yeah. All right, those are the 12
that they put in the article. Most of them are really great. Some of them point you at things
like tenacity, which is also really good. So that's what I got. Nice. Well, I would like to
talk about testing a bit. Let's talk about PyHamcrest. So this topic is contributed by TXLs on the socials. So thanks, TXLs. So PyHamcrest, and the thought was, like, Brian talks about testing a lot, so why haven't you covered this? So what PyHamcrest is, is a matcher object, declarative rule matcher thing that helps you with asserts and stuff like that.
Have you used this?
I have not. My first thought was it was some kind of menu item on a holiday dinner. But no, I literally only heard about this because you put it in the show notes. So this is news to me.
The idea is, instead of all the plain asserts, you've got a whole bunch of assert things, like assert_that, and equal_to, and a bunch of Hamcrest matchers that you can import. So you can do things like, instead of saying assert the_biscuit == my_biscuit, you can say assert_that(the_biscuit, equal_to(my_biscuit)).
So at first, I've always thought, asserts like this, I get it for unittest, but for pytest, do we need it? Because you could just use assert in pytest.
However, I'm kind of easing up on that argument because I can see a lot of places where just really if you can make your assertions more readable in some contexts, then why not?
Sure.
And I don't know about this one, but if it's got things like go through a list and assert everything in the list is equal, right?
Yeah, or higher-order things, where it would be kind of complex to implement the test that is the thing you want to assert, like these three fields are equal across these three things, right? Then it becomes a little less obvious, and if this has a really nice story...
Well, it does. Yep. There's a whole bunch of matchers within it. Like, for objects, there's equal_to and has_length. It has has_property. has_properties is interesting, so you could assert on duck typing: hopefully it has these values or something. Numbers: close_to, greater_than, less_than. Of course, plain asserts are fine for these,
but the logical stuff, the logical and sequences
is I think where I probably might use it.
Things like all of or any of or anything,
or that's neat.
Like all of these things are true.
And you can combine this with or,
like all of these or all of those or something.
Sequences: contains, contains_inanyorder, that's kind of interesting. has_items, is_in. Again, these are things that are testable in raw Python, like just raw asserts, not too bad. But if it's more readable, sure, why not?
So there's some that are shown, especially with raising errors, like exceptions. Where did I get it? Oh, the tutorial has a bunch of cool stuff in it. Things like assert_that calling translate with args curse_word raises a LanguageError. Well, that's kind of neat. Very naughty. assert_that broken_function raises Exception. Okay. I mean, with pytest you've got the raises thing, with pytest.raises, but some people have a hard time with it, like, it's not obvious, and maybe this looks better. And this is kind of neat: you can use assertion exceptions with async methods. So it has a resolved item, so you can say assert that awaiting a resolved future results in the future raising ValueError or something.
Yeah, nice.
That's cool.
So, yeah.
So a lot of predefined matchers. And I guess it has some syntactic sugar things, like is_. So if it just sounds better to have an 'is' in there, you can add it. So assert_that the_biscuit is_ equal_to... it doesn't do anything, but it sounds better. So why not?
I guess if you wanted it to read like English, like inserting a no-op verb.
Yeah. But I guess I do want to highlight this, because why not? Since I'm writing a lot of test code, I'm used to all the different ways you can check different equivalences of values or comparisons. So I don't know how much I would use this, but I've seen a lot of people struggle with how to write an assertion. And so having some help from a library, why not? So this is pretty neat.
Yeah, this totally resonates with me.
I like it.
Well, that's our six items.
Six, four items.
Do you have any extras for us this week?
I do have a few extras.
Let me throw them in here.
First of all, it's a few weeks old.
I didn't remember to put it up here, but Python 3.11.2 is out, as well as 3.10.10 and alpha
5 of 3.12. We're getting kind of close to beta, it feels like, for 3.12, which will be exciting,
because then we'll get real visibility into what's probably going to be happening for the next version of Python. That's cool. Yeah, I'm testing for 3.12 already
with our CI builds. So nice. For example, with 3.11.2 there were 192 commits since 3.11.1.
194, rather. So that's pretty non-trivial right there. And they link over to somewhere that looks...
I don't know, just what am I supposed to learn from that? Here's the changes from 3.11 to 3.12.
So I always go to Downloads, full list of downloads, scroll down to the particular version
here, and go to Release Notes, and there you go. That's probably what they should be linking to.
And here's all the things. There are some in here that are things that you might actually
care about, like, for example, a fixed race condition while iterating over thread states
in threading.local.
You might not want that in your code.
And various other things.
Yeah, look at all these changes here.
This is a lot.
Yeah, nice.
Go team.
Yeah, go team.
You might think, oh, it's just a dot-plus-one,
plus-0.0.1 sort of thing. But now it's got some
interesting changes. As well, I haven't looked at what's happening in the others, but maybe some
of those are important enough to backport those fixes. Also, more recent, as in eight days
ago, we've got Django 4.2 beta one. And, you know, typically the philosophy is
once it hits beta, the API should be stable.
The features should be stable.
It's just about fixing bugs.
Doesn't always work out that way,
but that's generally the idea.
So basically here's your concrete look at Django 4.2.
Yeah.
Right?
And 4.2 looks exciting.
Yeah, absolutely.
So you can, you know, they've got some release notes
and various
things about what's going on. You can go check that out. So they've got psycopg 3, so Postgres
support. It now supports psycopg version 3.1.8 or higher; you can update your code to use
that as a backend. I'm still using 2, so I better... I didn't know there was a 3. Careful, Brian. psycopg2
is likely to be deprecated and removed at some
point in the future.
Comments on columns and
tables. So that's kind of neat,
in the database model. So the ORM
gets them into their migrations.
No comment on that. Yeah, no comment. Very good.
Some stuff about the
so-called BREACH attack. I have no idea; it seems
to have to do with GZip. So check that out.
Another one that's interesting is in-memory file storage and custom file stores.
This is for making testing potentially faster.
So if you're going to write some files as part of a behavior, you can say, just write
them to in-memory.
Don't have to clean them up.
And they write really fast.
Yeah.
It phenomenally speeds up testing.
It's good.
Yeah, I bet.
All right.
So, uh, there's that.
And then also, I want to give a shout out.
I'll put it like this.
I want to give a shout out to an app real quick that people might find useful, by way of a journey.
So, rewriting the TalkPython apps in Flutter, where all the APIs are Python, but we're having apps on macOS, Windows, Linux, iOS, and Android.
That's really hard to do with Python, so Flutter is what we're using.
And it's going along really well.
Here's a little screenshot for you, Brian, to show you what we've got so far.
Isn't that cool?
Yeah.
Yes, and another, like here's the little app and stuff.
So I think I'm really happy with how it's coming together.
I think it's going to be a better mobile app experience
and an existing desktop experience for like offline mode with the TalkPython courses.
Oh, cool.
Yeah. So that'll be really neat. The thing I want to tell you about is something I just applied
to it. This thing called ImageOptim. And what you can do is you can just take the top level
of your project. So I did this for say the TalkPython training website. I did this for
the mobile app. Just take the very top level project folder and just throw it on this app.
And it'll go find all the images,
all the vector graphics and everything
and minimize the heck out of them.
So for example, when I did that on the mobile app,
it went from 10 megs of image assets
to eight megs of image assets, lossless.
Like no one will know the difference
other than me that I've done it.
And it dropped 20% of the file size,
which is not the end of the world,
but given how much work it is, it's not too bad.
Well, the lossless part is the important bit. So that's pretty exciting.
Yeah, exactly. So it'll do things like if it's a PNG and it sees you're using a smaller color
palette than what it's actually holding, it's like, oh, we can rewrite that in a way that
doesn't make it actually look different, but takes up less storage. Basically, it's a wrapper over things like MozJPEG,
Pngcrush, Google's Zopfli.
I don't know how to say these things.
But there are a bunch of lossless image manipulation tools,
and it just applies those to all of them in a super easy way.
And this thing's open source itself.
Cool.
So, yeah.
Anyway, if people have websites out there,
they should consider just, like,
take your website, throw it on here,
and it'll tell you.
Make sure it's all checked into Git first,
then do this and see what it says.
It gives you a little report at the bottom,
like you saved either 10K or you saved five megs,
depending, you can decide whether to keep the changes.
Yeah, cool.
Yep, all right.
That's all my extras.
How about you?
I just have a couple.
Yesterday, I talked with you on Python Bytes...
no, on Talk Python, about pytest tips and tricks.
And I just wanted to point out that the post is available
for people to read if they want to go look through it.
And if you have comments, please, or questions,
let me know, of course.
Also in March, I think I've brought this up before,
but I'll be speaking at PyCascades.
There's a picture of me without hair.
And I did stick up a blog post on pythontest.com,
just a placeholder so that I can link the slides
and code afterwards.
So that's up.
Yeah, awesome.
And that's it.
Yeah, that's going to be a really cool talk.
I think a lot of people are interested in how you share fixtures and build them for your team or cross-project.
As well, it was really great to have you on Talk Python; we talked about a bunch of cool pytest things.
That'll be out in a few weeks for people if they don't want to watch the YouTube version.
And then we'll let people know when that's available.
Yeah, absolutely.
But hopefully they're all subscribed to TalkPython already anyway.
Of course.
I'm sure they are.
Yeah.
I'm sure they are.
How about a joke?
Are we ready?
Yes, let's do a joke.
Let's do it.
So this one, this is a quick and easy one.
And for people listening, no pictures even.
This one comes from NixCraft on Twitter.
And it says, developers, let us describe you as a group.
Groups of things sometimes have weird names.
Like a group of wolves is called a pack.
A group of crows is called a murder.
What do we think we should call a group of developers, Brian?
That's hilarious.
A group of developers is called a merge conflict.
Isn't that good? Yeah. It is. The comments are pretty good.
If you scroll down here, some of them are silly. Some are just like, yep. Yeah. Yeah. Anyway,
they're pretty good. But yeah, a group of developers is called a merge conflict. And
so true it is. You can even have a merge conflict with yourself. Be a group of developers is called a merge conflict and so true it is you can even have a
merge conflict with yourself be a group of one um how about a group of uh tech ceos with social
media accounts it'd be a lawsuit that's right an sec uh sec investigation that's right yeah
Well, fun as always.
Thank you.
Yeah.
Thanks.
Thanks everybody for showing up as always.
And let's, we'll see everybody next week.