Python Bytes - #336 We found one of your batteries

Episode Date: May 16, 2023

Topics covered in this episode: Python's Missing Batteries: Essential Libraries You're Missing Out On; awesome-polars; Running Headless Selenium in Python (2023); Gracy; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/336

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 336, recorded May 16th, 2023. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by InfluxDB from InfluxData. We'll tell you more about them later. Be sure to connect with us over on fosstodon.org. I'm @mkennedy, Brian is @brianokken, and the show is @pythonbytes. The rights and status of the show are still undetermined, Brian, but I'm sure we'll figure that out someday. See the last show to get the joke. And join us over at pythonbytes.fm/live, usually Tuesday at 11am Pacific time, to be
Starting point is 00:00:43 part of the show. Or you can also catch the older episodes there, or of course on your podcast players. And with that, Brian, let's dig into some batteries. Okay. Well, as we know, Python is the language of batteries included, but we also have lots of cool extra packages on PyPI, actually quite a few. And one of the things that I wanted to highlight was just a handful of utility packages that are really kind of fun. You probably knew about them, but maybe forgot, and we've covered some of these in the past. So I wanted to highlight this article from Martin Heinz called Python's Missing Batteries: Essential Libraries You're Missing Out On.
Starting point is 00:01:25 And the first project he talks about is Boltons, which is actually an amazing package, but it's so big. The comment here is he could probably do an entire article just on Boltons. And I think that's wrong. I think you could do an entire book on Boltons and it would be a big book. I agree. There's a lot in there. But a few of the things that he highlighted were pretty cool that I kind of didn't know about. Boltons has a jsonutils, a timeutils, and an iterutils that he's demoing. So with jsonutils, you can just iterate with, like, a for line in jsonutils.JSONLIterator; you can iterate through JSON elements.
Starting point is 00:02:09 That's pretty amazing. That's pretty cool. I like that. The timeutils example is using a date range, timeutils.daterange, and iterating through days, which is kind of neat.
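The day-stepping idea Brian describes here can be sketched with only the standard library. This is not Boltons' implementation; the helper mentioned in the episode is `boltons.timeutils.daterange`, while `day_range` and its `step_days` parameter below are hypothetical names chosen for illustration. The sketch assumes the stop date is excluded.

```python
from datetime import date, timedelta


def day_range(start, stop, step_days=1):
    """Yield dates from start up to (but not including) stop.

    Hypothetical helper for illustration only; step_days=7 walks week by week.
    """
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    step = timedelta(days=step_days)
    current = start
    while current < stop:
        yield current
        current += step


if __name__ == "__main__":
    # Walk the first half of May 2023 one week at a time.
    for day in day_range(date(2023, 5, 1), date(2023, 5, 16), step_days=7):
        print(day.isoformat())
```

Writing it as a generator means a long range never has to be held in memory, which is also why a loop over it reads the same as a loop over a list.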
Starting point is 00:02:22 I didn't know you could do that. Kind of a cool idea to let me walk through days and get different datetimes. But anyway, there's a step size you can set, so you can walk through each week or whatever. And then he's highlighting a couple of things in iterutils. One of them is get_path, which isn't really like a file system path. It's basically saying, I've got a deeply nested structure and I want to access it without having to do all the access functions. So it's a way to get access to deeply nested things. And then there's remap, which is neat. Remap takes
Starting point is 00:03:00 a deeply nested structure and just changes something inside of it, which is kind of cool. I don't want to go through all of the details of this article, but a couple of quick highlights. There's the sh package, where you can run shell commands from Python in a fairly nice way. Data validation, actually, this is pretty neat. There's Pydantic, of course, which is awesome, but there's also this validators library, which is neat, and it can do things like validating emails or Visa card numbers or
Starting point is 00:03:39 IP addresses, just validating that strings are formatted correctly and things. It's pretty neat. Cool. And then TheFuzz is a fuzzy string matching library, which is kind of cool. I wanted to jump down to debugging. There's stackprinter, which is basically a really nice stack trace that goes with the error messages, which is kind of cool. What else? For testing, you can freeze time with the freezegun library. And then the last thing is kind of cool. I write a lot of command line applications, and I would not have thought to look for this package called tqdm. I don't know what that stands for,
Starting point is 00:04:24 but it does, like, what are these things? Progress bars, I think, for command line utilities. So tqdm comes from taqaddum, which means progress in Arabic. Of course. That wouldn't have clued me in to go search for it. I love that package. There's a lot of cool stuff here. tqdm is just my go-to for this stuff. You know, there's a lot of things like, oh, I need to go over millions of database records and make some change and do a test. And then maybe, I don't know, update some of them. And that might take a while.
Starting point is 00:05:00 I just did something where I had to do a report on a bunch of stuff on the Talk Python courses, and it took nine hours to go do a bunch of compute for each of a bunch of courses, you know, an insane amount of stuff. And I ran that and I saw several things. One, it shows you the progress; you can see it making progress. But it also tells you the rate, so it'll say, like, processing 200 records per second, for example, as it goes through the list. And it also estimates the time, which is why after five minutes I'm like, oh, this is going to take nine hours, I'm not going to wait for this. It's really nice. So can you use it if you don't really know how long something's going to take to begin with? Yes. And do you have to kind of know, like, do you have to tell it it's
Starting point is 00:05:52 10% done or it's 20% done? No, it does it all automatically, and I don't really know how. I think some things it can figure out, okay, and others, yeah, I don't know how it can actually do that, because, for example, on the screen, it has a range from zero to 100, right? And you can't ask the length of the range. Okay, right. But it somehow knows. Well, I might play with that because right now I've got an application,
Starting point is 00:06:20 that command line thing that reboots an instrument and then waits for it to finish. And I just have dots, and it'd be kind of nice to have something like this. Yeah, my prior solution was, let's put out a little dot every so often. No, that's too many dots. Let's mod it out a little bit higher.
Starting point is 00:06:37 Maybe every 20 records we'll put a dot or something like that. Exactly. Yeah, so this is nice. You can just wrap an iterator in tqdm and then loop over it and magic happens. Cool, we'll try that. Yeah.
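On the question of how a progress bar can know the total without being told: a `range` object does support `len()`, and tqdm tries `len()` on whatever it wraps when no `total=` argument is passed, falling back to a plain counter and rate when there is no length. The real usage is simply `for record in tqdm(records): ...`. The block below is a standard-library sketch of that idea, not tqdm's code; `infer_total` and `progress` are hypothetical names.

```python
import sys


def infer_total(iterable):
    """Return len(iterable) when the object supports it, otherwise None."""
    try:
        return len(iterable)
    except TypeError:
        # Generators and other one-shot iterators have no length.
        return None


def progress(iterable, total=None):
    """Yield each item while printing a simple 'done/total' counter to stderr."""
    if total is None:
        total = infer_total(iterable)
    shown_total = "?" if total is None else total
    for done, item in enumerate(iterable, start=1):
        # "\r" returns to the start of the line so the counter overwrites itself.
        print(f"\r{done}/{shown_total}", end="", file=sys.stderr, flush=True)
        yield item
    print(file=sys.stderr)


if __name__ == "__main__":
    for _ in progress(range(100)):
        pass  # the total of 100 is inferred from len(range(100))
```

For a generator, where no length exists, passing the count explicitly (the `total` parameter here, `total=` in tqdm) is what makes a percentage and time estimate possible.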
Starting point is 00:06:49 All right, well, that's pretty awesome. Want to hear about more awesome things, Brian? Yeah, let's do awesome. Let's do some awesome. Some Polars. So Polars is, as many things in Python are these days, the rustification, in a good way, of Python things. So it's kind of like pandas,
Starting point is 00:07:06 but redone in Rust with more of a fluent API that allows it to be more database query engine-like. And so what I have for us today is awesome-polars, a curated list of Polars talks, tools, examples, and articles. Now, many of these awesome lists are extensions, and there are a few things in here. Like, it talks about the Python library, and you may not know there's actually a Rust library for Polars
Starting point is 00:07:31 that you can directly use if you're integrating with Rust code, but also one for R, one for Node. It's got some things like cheat sheets. If people want to go and check out the cheat sheet, it's got actually a really nice visualization to show you what reshaping data means with concat or appending columns side by side from two data frames
Starting point is 00:07:54 with a horizontal concat flag, which I think the visualization of these things is really nice. What do you think of this, Brian? Actually, the visualization is what I'm enjoying the most with this cheat sheet. That's nice. Yeah, it's really, really nice.
Starting point is 00:08:10 And it has a bunch of tutorials and workshops. So if you are trying to get into Polars, come over here. There's maybe six or seven different examples and a bunch of blog posts, a whole bunch, how to integrate it with DuckDB or how it compares to DuckDB, and then a bunch of videos,
Starting point is 00:08:29 as well as people in the Polars community, right? Like Ritchie Vink, who created it, but also contributors, so you can follow them and ask them questions. That's kind of a nice addition of, like, on social media, who do you follow? That's pretty cool. Yeah. Yeah. It's super nice. So anyway, not a whole lot to go into there, but yeah, really, really nice. People are into Polars, put it here. Also, I kind of wanted to give it a shout out because Polars is fairly new. And if you've got something that integrates
Starting point is 00:09:00 with Polars or builds on top of Polars in a way that itself is reasonable, come over here and do a PR. I'm sure they're happy to accept it. It says, contributions welcome, exclamation point. So yeah, get in here and contribute. They're so welcome. You know what else is welcome? Our sponsor this week.
Starting point is 00:09:23 So super happy to have a sponsor for the show. As we mentioned at the top, InfluxDB. So InfluxDB is all about the time series data. This episode of Python Bytes is brought to you by InfluxData, the makers of InfluxDB. InfluxDB is a database purpose-built for handling time series data at a massive scale for real-time analytics. So developers can ingest, store, and analyze all types of time series data, metrics, events, traces, in a single platform. Let me ask you a question.
Starting point is 00:09:55 How would boundless cardinality and lightning-fast SQL queries impact the way you develop real-time applications? Maybe make them real-time, huh? InfluxDB processes large time series data sets and provides low-latency SQL queries, making it a go-to choice for developers building real-time applications and seeking crucial insights. For developer efficiency, InfluxDB helps you create IoT analytics and cloud applications using timestamp data rapidly and
Starting point is 00:10:21 at scale. It's designed to ingest billions of data points in real time with unlimited cardinality. InfluxDB streamlines building once and deploying across various products and environments, from the edge, on-premise, and to the cloud. Try it for free at pythonbytes.fm/influxdb. The link is in your podcast player show notes. Thank you to InfluxData and InfluxDB for supporting the show. All right, over to you, Brian, what's next? Well, this is a pretty quick one,
Starting point is 00:10:50 but I know that a lot of people test with Selenium. I know there's lots of other stuff you can use, like Playwright and everything like that, but still, Selenium's heavily used, and I still have some tests in Selenium. And there has been a change, so I want to just make sure everybody is aware.
Starting point is 00:11:11 Here's an article called Running Headless Selenium in Python in 2023. And the catch is basically, well, one, if you're not running headless already, why not? Headless is awesome. Basically, you can run through a web browser, but it doesn't actually open it. You just run it; there's no window anyway.
Starting point is 00:11:37 It's faster, so use headless. But if you are already using headless, there's been a change. So the change is, let's go down, scrolling down, there's an example, which is great. So Selenium 4.8.0 came out in January. And the old way to do things was, you set up your web driver and you mark headless equals true. And you can do this with both Chrome and Firefox, with a slightly different setting,
Starting point is 00:12:04 but it also had a headless equals true setup. And then you can run headless. And it was awesome. They took away this .headless. So don't do that anymore if you're using Selenium 4.8 or above. The new way is, for Chrome, you add an argument of --headless=new. And it's really add_argument. If you're listening to this, there's an options.add_argument. And then the same sort of thing with Firefox, except it isn't equals new, it's just --headless. But this
Starting point is 00:12:37 shows you an example. Why did they do this? Well, there's some description of why. There was an old way and a new way, and then Chromium had a new headless option that you can add. So we want to be able to do the new way. So they deprecated the old way to get people to use the new, more powerful one. And we're also linking to an article from Selenium, which has kind of a funny title. They wanted to get everybody's attention, so they named the article Headless is Going Away.
Starting point is 00:13:06 Yes. Which is a funny name. And then they subtitled it with, now that we have your attention, headless is not actually going away, just the convenience method to set it in Selenium. So I guess just a public service announcement: if you're using Selenium, you've got to change your code to use the new Selenium 4.8.
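A minimal sketch of the change described above, assuming Selenium 4.8 or later is installed. The flags `--headless=new` (Chrome) and `--headless` (Firefox) and the `add_argument` method are the ones named in the episode; the function names `chrome_headless_options` and `firefox_headless_options` are hypothetical, and the imports sit inside the functions only so the file can be loaded on a machine without Selenium.

```python
def chrome_headless_options():
    """Build Chrome options using the newer Chromium headless mode."""
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Replaces the older `options.headless = True` convenience setter.
    options.add_argument("--headless=new")
    return options


def firefox_headless_options():
    """Build Firefox options; Firefox takes the flag without '=new'."""
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")
    return options
```

Either object would then be handed to the driver, for example `webdriver.Chrome(options=chrome_headless_options())` or `webdriver.Firefox(options=firefox_headless_options())`, which needs the matching browser available on the machine.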
Starting point is 00:13:24 So that's it. Oh, you're on mute. So I am. I do like it. I wonder, though, why you have to pass the command line argument directly, and it doesn't just look like, oh, you said headless, that means in Chrome now pass dash dash mode, this version, you know? Because it's almost the same, but not the same across the browser platforms. Yeah, I think it's because there's different, I don't know, I haven't looked through the explanation, but I think there's other options. So it isn't necessarily just that they've changed the way you turn on headless, but there's more headless options. So they're just building it in so that you can pass in new flags. I think Chromium might end up getting more versions later or something. I don't know. Yeah, the browser space
Starting point is 00:14:12 is, it's an interesting time, isn't it? Yeah, we fought through the browser wars, we've beaten back Internet Explorer 6, only to come back and find Chromium even more dominant in certain ways. It's interesting because for usability, I'm usually using Vivaldi now, but I use probably Vivaldi and Chrome for day-to-day use. But for testing, yeah, I use Chrome and Firefox. That's what I use Firefox for, is still testing with Firefox. Yeah, absolutely. Cool. You know, just a bit of follow-up on the previous conversation about those different batteries that you talked about.
Starting point is 00:14:52 I love our audience. There's so much cool stuff going on over there. So Blaze says, I wonder if Rich does anything with tqdm. And if you want a definitive answer, how about Will McGugan in the audience says, tqdm has a rich output option. Will obviously being the creator of Rich and many other awesome things there.
Starting point is 00:15:12 So nice to follow up. Awesome. We've turned into the water cooler of Python. We sure have. All right. I have one more thing to share with you all. Let's jump into it. And that is Gracy.
Starting point is 00:15:24 So Gracy's an interesting project. It's a little bit like your first topic, Brian, in that it has a bunch of kind of utility features. And this one is around consuming APIs. So not creating APIs, but writing clients that talk to them, specifically around HTTPX, which is one of the absolute go-to HTTP libraries for doing sort of modern async style of APIs in Python, right? Yeah. So Gracy, it says, gracefully manage your API interactions.
Starting point is 00:16:00 Gracy helps you handle failures, logging, retries, throttling, and tracking for all of your HTTP interactions. And it uses HTTPX under the hood. It lets Gracy do the boring stuff so you can focus on your app, is the selling point here. So this is pretty cool. It's not super well known. It's got like 180 stars. And it's an interesting library that has a lot of cool functionality.
Starting point is 00:16:24 It feels like it could use a little bit more polish, but it's still quite neat. So let me give you some ideas here. So what you do is it's basically you model your API interactions through a class structure. It's not quite a hierarchy, but kind of use classes to come up with it. So you can come up with an endpoint here and then you create something that derives from the API base class, right? Give it a URL and then you give it a bunch of settings and the settings are where like kind of the useful stuff is.
Starting point is 00:16:57 So for example, you can say, I would like to log the request as it's going out the door, but only in debug. I'd like to log the response, and that one a little more frequently at the info level, and then you can have a custom message that goes out there. Then you also can have a parser that will parse the response as a set of functions. The first example you see here just says, by default, given any object, call .json() on it. Given the request, call .json() on it, right?
Starting point is 00:17:30 So that's kind of handy. But what you can do, if you go down a little bit to custom validators, is you can actually say, by default, just try to convert it to a JSON response. But if the status code is not found, then do something else. And you can have a series of different status codes. So by default use this parser, but if it's like a 400 bad request, then we need to parse it as something else. And that could even be, you know, maybe in a success case, you get this particular, say, Pydantic model back. But
Starting point is 00:18:05 in an error case, you have a totally different structure and you might want to parse it differently into a different Pydantic model, something along those lines. So you can do a lot of cool stuff like that there. And then you just give it the functions that you call that basically invoke the API. And of course, because it's based on HTTPX, you can await calling those functions. So yeah, anyway, it ends up with a pretty clean model for using it. What do you think? Well, yeah, it'll take some time to get your head around it because of the class-based thing.
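The per-status parsing idea described above can be sketched as a lookup table with a default. This is a standard-library illustration of the pattern only, not Gracy's API: `PARSERS_BY_STATUS`, `parse_json`, `parse_bad_request`, and `parse_response` are hypothetical names, and the `"detail"` field in the error body is an assumption made for the example.

```python
import json
from http import HTTPStatus


def parse_json(body):
    """Default parser: decode the body as JSON."""
    return json.loads(body)


def parse_bad_request(body):
    """Error parser: pull out an assumed 'detail' field from the error body."""
    return {"error": json.loads(body)["detail"]}


# Status-specific parsers; anything not listed falls back to parse_json.
PARSERS_BY_STATUS = {
    HTTPStatus.NOT_FOUND: lambda body: None,  # treat "not found" as no result
    HTTPStatus.BAD_REQUEST: parse_bad_request,
}


def parse_response(status, body):
    """Pick a parser by status code and apply it to the response body."""
    parser = PARSERS_BY_STATUS.get(status, parse_json)
    return parser(body)


if __name__ == "__main__":
    print(parse_response(200, '{"name": "pikachu"}'))
    print(parse_response(404, ""))
    print(parse_response(400, '{"detail": "bad id"}'))
```

Keeping the mapping as data rather than a chain of `if` statements is what lets a success case and an error case return entirely different shapes, such as two different Pydantic models, without touching the calling code.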
Starting point is 00:18:39 But it's all stuff that you're going to have to develop anyway. So having somebody else do the work, that's pretty good. Yeah, there's some nice examples of, like, throttling, and this might be interesting to you, Brian, it has the ability, there's a bunch of different things. It has the ability to replay certain data, right? So you can also say we're only allowing certain,
Starting point is 00:19:01 by default, any 200 category status code is considered success. You can say, no, for this one, it has to be created, like HTTPStatus.CREATED, not 200 or something like that. Or you can give it either OK or CREATED, right? You give it a set of options. That's pretty cool. You can add custom validation.
Starting point is 00:19:21 You talked about validators at your beginning as well. And if you're not using Pydantic or something that kind of does its own custom validation, you can still even add more stuff. Like not only does this have to be a string, but it has to be, I don't know, an email of this type or whatever, right? Like of this, say, the domain of our company, right?
Starting point is 00:19:42 Something like that. So you can add these custom validators. And it comes with built-in retry: how do you handle the retries? How many attempts? What do you do in terms of logging about retries and failures? You know, you can say, I want to retry three times, and if none of them work, I don't care. Just keep going. Don't break my application. Or please do. Don't raise an exception. You might say, well, why would you ever not want to break it? Like, maybe you're trying to write to some sort of audit log to say this happened. And if the server that just records what happened goes down, you don't want
Starting point is 00:20:18 to start crashing your app, right? There's, like, scenarios where you might not really care about that. Also throttling, which is pretty neat. You can say any time that the URL contains, the examples are Pokemon things, so it has, you know, a regular expression for Pokemon, I want a maximum of 10 requests for every one and a half minutes. And it has a cool output too. You know, if you print out just the rule, which is an object, it says 10 requests per 90 seconds
Starting point is 00:20:44 for URLs matching this regular expression, which is kind of nice. Oh, cool. And, yeah, the final thing, I don't really know where it is in here, but, yeah, you can also have it throw certain exceptions. So you know how it has that parser type scenario for different HTTP status codes I told you about? So you can say if it's a bad request,
Starting point is 00:21:02 please throw some exception class that you come up with. So instead of just saying bad request, it could potentially have more details. You might build parsed information into it and then raise that exception. There's some pretty neat things. And the final thing, by the way, Rich integration right there,
Starting point is 00:21:20 it requires you to install Rich if you want fancy output. It'll report on how it interacted with the API endpoint. So say you're going to do a bunch of processing, you know, I told you about, like, I'm going to transform a bunch of things, I use tqdm. But if you're going to do that, at the end you could ask, well, how did it go? And it'll give you this summary report of how much success and how much failure and what's the average latency and status codes and requests per second and all of these. And it'll do that in text form or in Rich style. Final thing, it will record and replay API
Starting point is 00:21:58 interactions for testing purposes. So if it's really tricky to mock out some complex interaction, you'd say, well, I want it to be as close to real as possible. You could just one time do those API calls and then replay them back; put it in either record mode or replay mode. And the backend that stores that could be a SQLite database or a MongoDB database that's automatically integrated. And you just give it that and say, when I talk to the server, remember what you did and store it over here. And then you can play that back for testing. Oh, wow. Cool.
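To make the record and replay idea concrete, here is a small standard-library sketch that stores responses in SQLite and serves them back without touching the network. It illustrates the concept only and is not how Gracy implements it; `ReplayStore`, `fetch`, the `mode` strings, and the table layout are all hypothetical.

```python
import sqlite3


class ReplayStore:
    """Keep one recorded (status, body) pair per (method, url) in SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "method TEXT, url TEXT, status INTEGER, body TEXT, "
            "PRIMARY KEY (method, url))"
        )

    def record(self, method, url, status, body):
        self.db.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?, ?, ?)",
            (method, url, status, body),
        )
        self.db.commit()

    def replay(self, method, url):
        row = self.db.execute(
            "SELECT status, body FROM responses WHERE method = ? AND url = ?",
            (method, url),
        ).fetchone()
        if row is None:
            raise LookupError(f"no recording for {method} {url}")
        return row


def fetch(store, method, url, mode, real_call=None):
    """In 'record' mode call the real API and save it; in 'replay' mode read it back."""
    if mode == "replay":
        return store.replay(method, url)
    status, body = real_call(method, url)
    store.record(method, url, status, body)
    return status, body
```

A test suite would run once in record mode against the real service, commit the database file, and then run in replay mode from then on, so the test data is captured from real responses instead of hand-written mocks.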
Starting point is 00:22:32 So, yeah. Anyway, people can check this out and see what they think. I think it almost looks like it was a system pretty much designed, well, one of the obvious use cases is to build a custom thing to test your application, because there's a bunch of, like, all the utilities there to really interrogate something. Absolutely. Yeah, you get that report and you get the record/replay ability, the logging. Yeah, a lot of that stuff is there. It's pretty neat. Yeah, cool.
Starting point is 00:23:05 Nice. Well, that's it for that one. Yeah, I guess that's all of our items, isn't it? It is. And for extras, I don't have any extras. Do you have any extras? I do. I just have one.
Starting point is 00:23:18 And then we'll get to our joke. So for the extras, do you know what, Brian? Look at this. Look at, here it is. You got in the app store. Yay. I got in the iOS app store too. So finally, finally, finally,
Starting point is 00:23:30 the Talk Python mobile apps are out on all of the app stores. So go get them. Just talkpython.fm/apps, I believe, will take you there. It redirects over to the training site. But yeah, they're available on iPhone, Android, tablets, iPad, Android tablets as well. Maybe more coming.
Starting point is 00:23:49 We might have even desktop apps coming pretty soon, depending on how successful we are with all this. But yeah, so this is out. People can check it out. And as a way to celebrate finally getting this done after four months of work: first of all, I wrote a blog post, maybe I'll throw the link in the notes. Yep, I'll throw it in for people. I talked about some of the design choices, about how and why we chose things like Flutter and so on as the mobile app framework. But the one thing for people to know out there, and this is a bit timely, is if you download and install the mobile app before, what day is today, Tuesday, May 16th, if you do that before May 22nd, so download the
Starting point is 00:24:34 app before May 22nd, inside the app only, the Up and Running with Git course, which is normally $39, is completely free to sort of celebrate the launch of the app. So you go in there, go find the courses, go to the free section, join the Git course, and you'll have it forever, not just for a little time. But the only way to get it is to download and install the app, which is free, and then go put the Git course into your account. I just downloaded it. I'm opening it right now.
Starting point is 00:25:01 Awesome. Awesome. So one of the things I'm excited about with this is, I mean, when I'm doing a course, like not giving a course, but learning from one, I do like to have it on my computer screen, but there's often times where I've got time to kill. So I'd like to sort of listen to some of the conversation. And yeah, I'm going to look at some of the stuff on my phone, but a lot of it is kind of following along while I'm listening. And then I'll go through and watch the same stuff later on the computer and walk through it. So I really like this addition of having a mobile app.
Starting point is 00:25:34 This is pretty cool. Thanks so much, Brian. Yeah, there's a couple of reasons why you might need it. People are like, well, why don't you just watch it on the web? Like, especially on iPhone, you can't get rid of that navigation section around the browser, so you end up watching, like, a postage-stamp-size thing, which is not ideal. It won't auto-advance because ad companies are evil and iOS blocked them from playing ads all the time, which, you know, gobbles up everyone else as well, unfortunately. Okay, so on your app it'll just jump to the next
Starting point is 00:26:06 thing then? Yeah, it just keeps playing smoothly, as you would imagine. And then the other thing that's important is you can download content offline, like if you're going on a trip or on the train. Or some people even use it if they work at, you know, government institutions that have high levels of security, like research labs and stuff. If they want to be able to take the course at their work, but their work is super restrictive about what they can interact with, you could, you know, download a whole course onto your tablet, set it next to you, and watch it at your work. Yeah, so those are the reasons why it exists. But anyway, long time coming. Super happy about it.
Starting point is 00:26:45 That's my extra. Cool. Download it. Get the Git course. All right. Well, how about a joke? Ah, this is a good one. So you may wonder, you may have friends who are like,
Starting point is 00:26:56 Brian, you do Python. You do C++. You wrote a book on pytest. Like, how did you get so good at this? So this kind of riffs on that theme. There's two developers here. First one, she says, how do you code so well? The expert developer, she says, practice.
Starting point is 00:27:17 And the first person didn't really hear it. Like, it must be an innate gift, a gift from God. It's practice. I'll never understand how some people are so talented, a mystery, practice. Yeah. Right? Yeah. What do you think? Well, this is great. And it applies to so many things, of course. But one of my daughters is dealing with this right now. She's been doing for about a year, doing aerial silks, aerial arts, and she's working on it and exercising and stuff every day. And it was really hard at first, and now she's pretty good.
Starting point is 00:27:53 And so many people have said, oh, you're just naturally talented at that. And it makes her mad because it's not natural: I've had to work at it. Yeah. Coding as well, so obviously. Obviously, yeah. It's not just coding, but coding, certainly. Yeah. Podcasting, writing blog posts, everything around what we do: practice. Yeah, absolutely. Practice. Nice. Nice way to end
Starting point is 00:28:18 it. So good. Yeah, absolutely. Very, very uplifting. We ended on a growth mindset today, Brian. Thanks for being here. Thank you. Thanks, everyone, for coming. Bye.
