Python Bytes - #306 Some Fun pytesting Tools

Episode Date: October 19, 2022

Topics covered in this episode: Awesome pytest speedup; Strive to travel without a laptop; Some fun tools from the previous testing article; Refurb; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/306

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 306, recorded October 18th, 2022. I'm Michael Kennedy. And I'm Brian Okken. Very exciting to have a whole bunch of things to share this week. Also, I want to say thank you to Microsoft for Startups for sponsoring yet another episode of this one. Brian, we've had a very long, dry summer here in Oregon, and I was afraid that we would have terrible fires
Starting point is 00:00:30 and it'd be all smoky and all sorts of badness, and there have been plenty of fires in the West, but not really around here for us this summer. We kind of dodged the bullet until today. It's a little smoky today. We smoke, go inside. Yeah. I thought we dodged it. Sadly, no. So I think it's affecting my voice a little bit, so apologies for that. We'll put that filter on you and
Starting point is 00:00:52 we'll make you sound like someone else and you'll be fine. Yeah, yeah. It's also affecting me, so who knows. But anyway, we'll make our way through. We will fight through the fire to bring you the Python news. Hopefully they get that actually put out soon. Like the post office. Yeah, let's kick it off. What's your first thing? So I've got, let's put it up. So I've got, I had to stream.
Starting point is 00:01:18 I've got Awesome pytest speedup. So this is awesome. Yeah. So actually, some people may have noticed that Test & Code is not really going on lately, and so one of the things that makes it easier for me is that when I see cool testing-related articles, I don't have a decision to make anymore. I can just say, hey, it's going to go here. Now, Test & Code will eventually pick up again, but I'm not sure when. So for now, if I find something cool like this article,
Starting point is 00:01:45 I'll bring it up here. So this is... Good thing I make you show up every week to talk about fun stuff anyway. So this is a GitHub repo, and we're kind of seeing more of people writing, instead of blogging, they just write a readme as a repo. I know, this is such a weird trend.
Starting point is 00:02:05 I totally get it and it's good, but it's also weird. But it's kind of neat that people can update it. So if they can just keep it up and you can see people. That's right. You get a PR to your blog posts. That's not normally how it goes. Yeah. I'm not sure.
Starting point is 00:02:19 But it's probably harder to throw Google Analytics at it. Right. Oh yeah. We'll see whether you should do that or not. Anyway, this comes to us from Nejc Zupan. Cool name, by the way. We'll include a link in the show notes to a talk he gave at the Plone Conference in Namur in 2022, so just recently. Anyway, he goes through best practices
Starting point is 00:02:49 to speed up your pytest suite, and he lists them all at the top here, which is nice. Hardware first. Well, first of all, when he goes into the discussion, he talks about measuring first. Before you start speeding things up, you should measure, because you want to know whether your changes had any effect. And if a change makes support a little bit weirder, you don't want to make it if the gain is only marginal. So I like that he's talking about that: at each step of the way, measure to make sure it makes a difference. Right. So first off, and I'm glad he brought this up, is check your hardware. Make sure you've got fast hardware if you have it. And I've noticed this before as well. So here we go, measure first, but
Starting point is 00:03:40 some CI systems allow you to have self-hosted runners, and it's something to consider. Whether your CI is in the cloud or you've got a server with some virtual machines around to run your test runners, they're not going to be as fast as physical hardware, if you've got some hardware lying around that you can use. So that's something to consider: throw hardware at it. And then test collection time. One of the problems with the speed of pytest is that if you run it from the top-level directory of a project and you've got tons of documentation and tons of source code, it's going to look everywhere.
Starting point is 00:04:21 So don't let it look in those places. There are ways to turn that off, with norecursedirs and by giving it the directory. I also wanted to point out, he didn't talk about this in the article, but something to use is, oh, it went away, testpaths. So use testpaths to say specifically where to look. norecursedirs essentially says avoid these directories, but testpaths pretty much says, this is where the tests are, look here. So those are good.
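To make the two collection settings concrete, here is a minimal sketch that builds pytest command-line overrides for testpaths and norecursedirs. The helper name and the directory names are hypothetical, not taken from the article; `-o` is pytest's flag for overriding an ini setting, and in a real project these values would more typically live in the project's pytest configuration file.

```python
def collection_args(test_dirs=("tests",), skip_dirs=("docs", "node_modules")):
    """Build pytest arguments that narrow where test collection looks.

    Both directory tuples are illustrative defaults, not values from the article.
    """
    return [
        # testpaths: where to collect from when no path is given on the command line
        "-o", "testpaths=" + " ".join(test_dirs),
        # norecursedirs: directories that collection never descends into
        # (setting it replaces pytest's built-in default list)
        "-o", "norecursedirs=" + " ".join(skip_dirs),
    ]
```

The resulting list could be handed to `pytest.main(collection_args())` or appended to a `python -m pytest` command. Timing collection before and after the change fits the measure-first advice from the discussion.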
Starting point is 00:04:51 Nice. Yeah, I've done that before on some projects, like on the Talk Python Training website, where it's got a ton of text files and things lying around. And I've done certain things like that to keep pytest and PyCharm and other tools from looking there. In those places there's no code, but there's a ton of stuff, and the tool is going to go hunting through it.
Starting point is 00:05:13 So I really sped up the startup time for Pyramid scanning for files that have route definitions in them, right? For URL endpoints. Because it would look through everything, apparently. Doesn't matter. At least looking for files through directories with tons of stuff. And it makes a big difference if you have a large project, for sure. Yeah, it's significant. So it's something to think about.
Starting point is 00:05:36 And documentation, too. Unless you're really testing your documentation, you don't need to look there. So: fast hardware, make collection fast. This next one is something I haven't used before, but I'll play with it: PYTHONDONTWRITEBYTECODE, an environment variable. I guess he comments that it might not make a big difference for you, but it might. So, you know, I don't know.
Starting point is 00:06:02 So Python writes the bytecode normally, and maybe it'd be faster if you didn't do that during tests. Might as well try. There's a way to disable pytest plugins, the built-in pytest plugins. You can say -p no:nose or -p no:doctest if you're not using those. I haven't noticed that it speeds things up a lot, but again, it's something to try. And then run a subset of tests. This is especially important if you're working in a TDD style. And one of the things that I think some people forget is that if you've got your tests organized well, you should be able to run a subset anyway, because the feature you're working on is in a subdirectory of everything else.
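The three ideas above (skip bytecode writing, disable unused built-in plugins, run only a subset) can be combined in one invocation. The following is a rough sketch, not the article's code: the helper name and the `tests/feature_x` path are made up for illustration, and whether a given plugin name such as `nose` still exists depends on the installed pytest version.

```python
import os
import sys


def build_pytest_run(disabled_plugins=("doctest",), extra_args=()):
    """Return (command, environment) for a trimmed-down pytest run."""
    # Copy the current environment and ask Python not to write .pyc files.
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    cmd = [sys.executable, "-m", "pytest"]
    for name in disabled_plugins:
        # "-p no:<name>" tells pytest not to load the named plugin.
        cmd += ["-p", f"no:{name}"]
    # Anything else, for example a subdirectory holding one feature's tests.
    cmd += list(extra_args)
    return cmd, env
```

A call such as `subprocess.run(*build_pytest_run(extra_args=("tests/feature_x",))[:1], env=...)` is one way to use it; more simply, unpack the pair and pass `cmd` and `env=env` to `subprocess.run`.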
Starting point is 00:06:49 And just run those when you're working on that feature, and then you don't run the whole suite. There's a discussion of that, and it applies to a set of tests or the whole suite. And then also disk access, trying to limit that, and he includes a couple of ways to ensure that. There's also a really good discussion, a fairly chunky discussion, on database access and optimizing databases, including discussion around rollback, and there was something else that I hadn't seen before. Let me see if I can remember. Yeah, there are some interesting things. I know you've spoken about it in your pytest course, about using fixtures
Starting point is 00:07:43 for setup of those common type things, right? So one of the things I'm not familiar with is truncate. Have you used the database truncate before? No. So apparently that allows you to set the whole database up, but delete all the stuff out of it, like to empty the tables. And that, I mean, if a big chunk of the work of setting up data is getting all the tables correct,
Starting point is 00:08:10 then truncate might be a good way to clean them all out and then refill them if you need to. But also, yeah, like you said, paying attention to fixtures. That's really good. And then the last thing he brings up is just run them in parallel. By default, PyTest runs single, each test one at a time. And if you've got a code base that you're testing that can allow, like you're not testing a hardware resource or something,
Starting point is 00:08:35 that can run in parallel, go ahead and turn that on. Use pytest-xdist or something else and run them in parallel. So a really good list, and I'm glad he put it together. Also a very entertaining talk, so give his talk a look. Yeah, absolutely. Brandon out in the audience says, people at work have been trying to convince me that tests should live next to the file they are testing
Starting point is 00:08:58 rather than in a test directory. I create a test directory that mirrors my app folder structure, with my tests in there. Any opinions? I don't like that, but... Neither do I, honestly. If you like it, I guess, okay. I've heard that before, but I haven't heard people in Python recommending that very often. Yeah, for me, I understand why: okay, here's the code, here's the test. Maybe the test can be exactly isolated to what is only in that file. But as soon as you start to blend things together, like, okay, well, this thing works with that class to achieve its job, it kind of starts to blur together. And, well, what if those are in the
Starting point is 00:09:43 wrong places? Well, now it's like half here. And I don't know, it just leads to, I don't know, it's like trying to go to your IDE and say, I have these seven methods, please write the tests for them. And it says test function one, test function two, test function three. You're like, no, no, no, that is not really what you're after. But I feel it kind of leads towards that. Like, well, here's the file, let's test all the things in this file, which is not necessarily the way I would think about testing. Well, also, it kind of lends itself to starting to test the implementation instead of testing the behavior. Yes, exactly. Because you might have,
Starting point is 00:10:20 if you've got a file that has no test associated with it, somebody might say, well, where is the test for that? And you're like, well, that file is just an implementation detail. It's not something we need to test, because you can't access it directly from the API. Right. It's completely covered by these two other tests. And by the way, there are other folders. Go find them. Also, the stuff you're speaking about here, making collection fast and such, is a little bit tricky that way. Potentially sharing fixtures might be a little more tricky that way, too.
Starting point is 00:10:51 I don't know. My vote is to not mix it all together. Plus, do you want to ship your test code with your product? Maybe you do, but often you don't. Is it harder if they're all woven together? That's true. Yeah. Yeah.
Starting point is 00:11:07 So anyway, that's the thing. Also, Henry Schreiner out there says, I don't like distributing tests in wheels, only sdists, so I like a test folder as well. Yeah, I'm with you. I think, Brandon, the vote here is test folder, but, you know, that's just us. Awesome. All right, well, yeah, that's good. Fine. You want to hear my first one? This is a bit of a journey.
Starting point is 00:11:33 So let's start here. So I have a perfectly fine laptop that I can take places if I need to for work. Take it to the coffee shop to work. If I'm going on like a two-week vacation, it's definitely coming with me, right? It's even if my intent is to completely disconnect, I still have to answer super urgent emails. If the website goes down, any of the many websites I seem to be babysitting these days, like I've got to work on it. Like there could be urgent stuff, right? So I just, I take it with me, but I'm on this mission to do that less, right? Cause I have a 16 inch MacBook pro it's pretty heavy. It's pretty expensive. I don't necessarily want to like take it camping with me, but what if, what if something goes wrong, Brian, I've got to fix it. Do I really want to drive the four hours back?
Starting point is 00:12:19 Because I got a message that like, you know, the website's down and everyone's upset. Can't do their courses or they can't get the podcast. No, I don't want that. So I would probably take the stupid thing and try to not get it wet. So I'm on this mission to not do that. So I just wanted to share a couple of tools and, you know, people, if they've got thoughts, I guess probably the YouTube stream chat for this would be the best or on Twitter. They could let me know, but I think I found like the right combination of tools that will let me just take my iPad and still do all the dev opsy life that I got to lead. So that it's not good for answering emails. You know, I have like minor RSI issues and I can't type on
Starting point is 00:12:56 an iPad, not even a little like keyboard that comes with it. Like I've got my proper Microsoft ergonomic sculpt and you can plug that into an iPad. But once you start taking that, you know, like, well, you might as well just take the computer. So two tools I want to give a shout out to. Prompt by Panic. Panic is a Portland company.
Starting point is 00:13:15 So shout out to the local team. Is it at the disco or? Exactly. They don't really freak out that much at the disco. I don't even panic there. But Prompt is an SSH client for iOS, in particular for iPad, but I mean, if you wanted to go extreme, you could do this on your phone. You know, how far are you going camping,
Starting point is 00:13:34 right? Or where are you going? And so this lets you basically import your SSH keys and do full-on SSH like you would in iTerm2 or Terminal or whatever. Nice. Turns your iPad into a dumb terminal. Yeah, it does. So you can easily log into the Python Bytes server over SSH and do all the things that you need to do. So if you've got to get into the server and you've got to be like, okay, well, I really have to just go restart the stupid thing
Starting point is 00:14:02 or change a connection string because who knows what, right? You could do it. It seems to work pretty well. The only complaint that I have about it is that it doesn't have Nerd Fonts. So my Oh My Posh, dude, this is serious business. Don't laugh. Without my Nerd Fonts, I can't do pls, I can't do Oh My Posh and get the cool shell prompt with all the information. No, it's all just boxes. It's rough. No, it's fine. It would be nice. But it does have cool things: if you need to press Control-Shift-something, it has a special way to pull up all those kinds of keys, so you press Control and then some other key. Or it has up arrow and down arrow if you want to cycle through your history. It's got a lot of cool features like that, so it works. I think it's going to
Starting point is 00:14:53 work. I think this is one half of the DevOps story. Okay, the other part is, oh my goodness, what if it's a code problem? Do I really want to try to edit code over this Prompt thing through the iPad in, you know, Emacs or whatever? No, I don't want to do that. So the other half is GitHub, in particular the VS Code integration into GitHub. So if you remember, here I have pulled up on screen, and this works with any public repo or your private ones, my jinja_partials thing for basically integrating HTMX with Flask. But you can press the dot.
Starting point is 00:15:34 If you press dot, it turns that whole thing into a cloud-hosted VS Code session. That's awesome, right? It even has autocomplete. So if I hit dot there, you can see the autocomplete. That's pretty cool. But how do you press dot when you're on a web page on an iPad? There is no dot, because you can't pull up the keyboard. The only way to pull up the keyboard is to go to an input field, and once you're in an input, well, that just types a dot. It doesn't do that. Like, why? All right, here's the other piece. So you go over here and you change github.com
Starting point is 00:16:07 /mikeckennedy/jinja_partials to github.dev/whatever. Boom, done. So if you've got to edit your code, you just change the .com to .dev and you have an editor. You can check it back in. In my setup, if I commit to the production branch,
Starting point is 00:16:23 it kicks off a continuous deployment, which will automatically restart the server and reinstall the things it might need, if it has a new dependency or something. I could literally just come over here, make some changes, do a PR over to the production branch, or somehow merge over to the production branch, and it's done. It's good to go. Isn't that awesome? Just edit your server live? No. Yeah, I saw somewhere, somebody was complaining about Prompt, saying,
Starting point is 00:16:51 it's really hard for me to edit my code on the server. I'm like, why would you? No, it should be hard. You don't do that. Don't do that. Yeah. So I went to try this, but I have to do the two-factor authentication
Starting point is 00:17:03 to get into my account. Yeah, yeah, yeah. You've got to do that. Brandon also says, hey, I'll buy you a keyboard case. I absolutely hear you. And you have no idea how jealous I am of people who can go and type on their laptops and type on these small things. With RSI, I would be destroyed in like an hour or two if I did it.
Starting point is 00:17:22 It's not a matter of whether I want to get the keyboard. I just can't. So anyway, it's not that bad to be me, but I'm not typing on small square keyboards. It just doesn't work. It's just something I can't do. Okay, so just no. Exactly. Because when I was 30, my hands got messed up, and they almost recovered, but not a hundred percent. Right. So I know you've got more going on than I do, though. So I just got back from four days off and I took the iPad. And I had to answer a few emails, but for me, for these short emails, the type cover thing works fine, even though those are expensive. When you add it up, oh, I want an iPad, but I also want the keyboard thing and I want the Pencil, suddenly it's almost twice as much. It is, absolutely. And, you know, for people who've been paying attention for the last two hours, Apple just released new iPads with M2s, so people can go check that out if they want to spend money. I'm happy with mine. I'm going to keep it.
Starting point is 00:18:27 All right. Before we move on to the next thing, Brian. Okay. Let me tell you about our sponsor this week. So as has been the case as usual, thank you so much. Microsoft for Startups Founders Hub is sponsoring this episode. We all know that starting a business is hard. By a lot of estimates, over 90% of startups go out of business in just the first year. There's a lot of reasons for that.
Starting point is 00:18:50 Is it that you don't have the money to buy the resources? Can you not scale fast enough? Often it's like, do you have the wrong strategy or do you not have the right connections to get the right publicity or you have no experience in marketing, lots, lots of problems, lots of challenges. And as software developers, we're often not trained in those necessary areas like marketing, for example. But even if you know that, like there's others, right? So having access to a network of founders, like you get in a lot of accelerators, like Y Combinator would be awesome. So that's what Microsoft created with their Founders Hub. So they give you free resources to a whole bunch of cloud things, Azure, GitHub, others,
Starting point is 00:19:32 as well as, very importantly, access to a mentor network where you can book one-on-one calls with people who have experience in these particular areas. Often, many of them are founders themselves and they've created startups and sold them and they're in this mentorship network. So if you want to talk to somebody about idea validation, fundraising, management and coaching, sales and marketing, all those things, you can book one-on-one meetings
Starting point is 00:19:59 with these people to help get you going and make connections. So if you need some free GitHub and Microsoft Cloud resources, if you need access to mentors and you want to get your startup going, now make your idea a reality today with the support from Microsoft for Startups Founders Hub.
Starting point is 00:20:15 It's free to join. It doesn't have to be venture-backed. It doesn't have to be third-party validated. You just apply for free at pythonbytes.fm/foundershub2022. The link is in your show notes. Thanks a bunch to Microsoft for sponsoring our show. What's next, Brian? Well, that article that I already covered about speeding up pytest had a whole bunch of cool tools in it, so I wanted to go through some of the tools from the article that I thought were neat. One of them, for profiling and timing, was a thing called hyperfine, and I don't think it's a
Starting point is 00:20:51 Python thing, but on a Mac, for example, you brew install it. One of the things it does is you give it, say, two commands, and it runs both of them, and it can run them multiple times and then give you statistics comparing them. So it's a really good comparison tool for, you know, testing your test suite to see how long it runs. May as well run it a couple of times and see. Yeah, for people who didn't see the example from that first article you covered, a lot of those were CLI flags, right?
Starting point is 00:21:26 Like -p no:nose for disabling the plugin and so on. So you could have two commands on the command line where you basically change the command-line arguments to compare those kinds of things, right? Yeah, exactly. So run the test suite a couple of times each way and just see what happens if I add these no: flags or this other flag, or with the environment variable. Actually, I don't know how you could do that in there. You can set
Starting point is 00:21:56 environment variables on the command line, maybe. Yeah, I'm sure that you can somehow. Yeah, inline an export statement or something, who knows. At the least, you can run the same command twice. You can run it, set the environment variable, and then run it again to see if it makes a difference. Yeah. So that was neat. I don't know why I've got the API reference in here. Oh, the thing I wanted to talk about was durations. So let me find that.
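For readers without hyperfine installed, the same run-it-several-times comparison can be approximated with the standard library. This is a deliberately simple stand-in, not hyperfine itself: the function name is invented here, it has no warm-up runs or outlier detection, and the `extra_env` parameter is one possible answer to the question of how to vary an environment variable between the two runs.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3, extra_env=None):
    """Run cmd several times and return (mean, stdev) of wall-clock seconds."""
    # Start from the current environment and layer any overrides on top.
    env = dict(os.environ, **(extra_env or {}))
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, env=env, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread
```

Calling it once with `[sys.executable, "-m", "pytest", "-q"]` and once more with `extra_env={"PYTHONDONTWRITEBYTECODE": "1"}` gives two (mean, stdev) pairs to compare. With only a handful of runs the difference may be within noise, which is itself useful to know.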
Starting point is 00:22:21 I think I lost it. So we did talk about durations. Oh, well. Oh, here it is. So --durations: if you give it a number, like --durations=10, pytest will give you the 10 slowest tests and tell you how slow they are. If you give it zero, it just reports all of them. But the other thing that's fairly recent, it wasn't there when I started using pytest, is --durations-min. When you give --durations zero, it reports everything, and that might be overwhelming. So you can give it a minimum duration in seconds to only include the tests that take over a second, or something like that.
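As a small illustration of how the two flags combine, here is a sketch of a helper that produces them. The helper name and its defaults are invented for this example; `--durations` and `--durations-min` are the pytest options described above.

```python
def slow_test_report_args(top_n=10, min_seconds=1.0):
    """Build pytest flags for a slowest-tests report.

    top_n=0 asks pytest to report every test; min_seconds then hides the fast ones.
    """
    return [f"--durations={top_n}", f"--durations-min={min_seconds}"]
```

For example, `slow_test_report_args(top_n=0, min_seconds=0.5)` would ask for every test slower than half a second.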
Starting point is 00:23:06 Right, right. If it's 25 milliseconds, I just don't want to see it. Yeah, I'm not going to spend time trying to speed that up. Another cool thing brought up was pyinstrument, which is a very pretty way to look at, you know, the time you're spending on different things. It's not just for testing; you could use it for other stuff. But apparently in the user guide there is a section specifically on how to profile your tests with pytest using pyinstrument. So that's a cool bit of documentation.
Starting point is 00:23:39 This doesn't actually look obvious, so maybe I'm looking at this wrong, but I'm glad they wrote this up. Yeah, it's kind of cool. Basically profiling your... oh, interesting, and you do it as a fixture. Yeah. And so you create the profiler, you start the profiler, then you yield nothing, which triggers the test to run, and then you stop the profiler and do the output. That's really cool. Yeah, pretty cool way to do that. So profiling each test.
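The create, start, yield, stop, report shape described here can be sketched with only the standard library. To be clear about the substitutions: this uses cProfile instead of pyinstrument and `contextlib.contextmanager` instead of a pytest fixture, so it is not the code from the pyinstrument user guide. In a pytest project the same generator body would sit under a `pytest.fixture` decorator (typically with autouse enabled) rather than under `contextmanager`.

```python
import cProfile
import io
import pstats
from contextlib import contextmanager


@contextmanager
def profiled(reports, top=5):
    """Profile whatever runs inside the with-block and append a text report."""
    profiler = cProfile.Profile()
    profiler.enable()            # start profiling
    try:
        yield                    # hand control to the code under test
    finally:
        profiler.disable()       # stop profiling, even if the code raised
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(top)
        reports.append(stream.getvalue())
```

The bare `yield` is the step that lets the test body run between setup and teardown, which is the part of the pattern the hosts call out.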
Starting point is 00:24:08 Yeah. It's a bit mind-bending with the coroutines. So it's kind of cool they're using it as a fixture, because the fixture is function-scoped by default, so it'll go around every test function. But if you set it up with module scope, you could just find the slow test modules in your system, which might be an easier way to speed things up. Anyway, I was thrilled that my little pytest-skip-slow
Starting point is 00:24:37 plugin that I developed showed up. I didn't even come up with the ideas for the code; that came out of the pytest documentation. But it wasn't a plugin yet. I developed this plugin while writing the second edition of the book, and it showed up in his article, which is cool. More interesting is pytest-socket, which is a plugin that turns off Python socket calls and then raises a particular exception. If you just install it, it doesn't turn things off. You have to pass in --disable-socket
Starting point is 00:25:13 to your test suite, and then it turns off access to the external world. So this is kind of a cool way to easily find out which tests are failing because your network is not connected. So go figure out if you really want to. If you want to say definitely don't talk to the network or don't talk to the database, turn off the network and see what happens. Yeah. And even if you did want part of your test suite to access the network, you could use it to make sure that there aren't other parts of your test suite that are accessing it when they shouldn't. So it'd be a cool debugging tool. And then file system stuff too: there's pyfakefs, a fake file system, so you can mock the file system. So even for things that you want to write, you don't actually have to have the files
Starting point is 00:25:55 left around. You can leave them around just long enough to test them. So you can use this. That's perfect. And then the last thing I thought was cool is a thing called BlueRacer that you can attach to GitHub CI to check merges. So if somebody merges something, you can check to see whether they've terribly slowed down your test suite. I don't think it fails on slower tests; it just reports what's going on. So yeah, it gives you a nice little report of what happened on the branch and whether the test suite slowed down. Yeah, good to know. Yeah, that's a cool project, BlueRacer. Nice. Okay. And it's automatic, which is lovely. Yeah. So nice. All right, well, I've got one more item for us as
Starting point is 00:26:48 well, Brian. Yay. So you talked about pyupgrade on the last show, I think it was. Yeah, we talked about some of these other ones. So I'm going to give a shout out to Refurb. Very active project, last updated two days ago, 1,600 stars. And the idea is basically, you can point this at your code and it says, here are the things that make it look like the old way of doing things; you should try doing it the newer way. So for example, here's something that's asking if the file name is in a list, right? Instead of checking if filename equals x or filename equals y or filename equals z, you can say if filename in [x, y, z], right?
Starting point is 00:27:35 And that's a more concise and often considered more Pythonic way. But do you need a whole list allocated just to ask that question? What about a tuple? And here we have a with open(filename) as f, then contents = f.read(). And we have the splitlines and so on. And so, well, if you're using pathlib, just say Path.read_text(). You don't need the context manager. You don't need two lines. Just do it all in one. And so on this simple little bit of code here, they just run refurb against this example Python file. And it'll say use the tuple (x, y, z) instead of the list [x, y, z] for that in case.
Starting point is 00:28:13 And then what I really like about it is it finds exactly the pattern that you're using. So it says you're using with open(something) as f, then value = f.read(). Use value = Path(x).read_text(), one line. It doesn't just say you should use Path.read_text(); it gives you, in the syntax, here are the multiple lines you wrote, do this instead. Nice, right? I don't think I've ever used read_text, so I learned something new. I hadn't either, but you know what, I do now. It also says you can replace x.startswith(y) or x.startswith(z) with x.startswith((y, z)), with a tuple, and that'll actually test either of them. Yeah, one or the other.
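The three rewrites mentioned so far can be written out as before-and-after pairs. This sketch is not Refurb's own example file; the function names and the file names in the membership test are invented for illustration, and each "new" version is meant to behave the same as its "old" counterpart.

```python
from pathlib import Path


def is_special_old(filename):
    # A list is allocated just to ask a membership question.
    return filename in ["setup.py", "conftest.py"]


def is_special_new(filename):
    # A tuple literal expresses the same check.
    return filename in ("setup.py", "conftest.py")


def read_old(filename):
    # Context manager plus read(): two lines for one idea.
    with open(filename) as f:
        return f.read()


def read_new(filename):
    # pathlib opens, reads, and closes in one call.
    return Path(filename).read_text()


def is_test_name_old(name):
    return name.startswith("test_") or name.startswith("check_")


def is_test_name_new(name):
    # startswith accepts a tuple and matches if any prefix fits.
    return name.startswith(("test_", "check_"))
```

The fourth suggestion from the discussion, calling `print()` with no arguments instead of passing an empty string, follows the same spirit: drop an allocation or a line that adds nothing.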
Starting point is 00:28:59 Okay. It says instead of printing an empty string, there's no reason to allocate an empty string. Just call print() with no arguments. That has the same effect. There's a whole bunch of things like that that are really nice here. And you can ask it to explain. You're like, dude, what's going on here? You told me to do one, two, three. What's the motivation? And you'll get kind of a help text: here's the bad version, here's the good version, here's why you might consider that. So for example, given a string, don't cast it again to a string, just use it. Maybe more important is that you can ignore errors. So you can do --ignore with a number. There's one, which I'll show you in a second, for which I've started doing that when I use
Starting point is 00:29:40 it. Or you can put a # noqa comment with a particular warning to be disabled, or you can just say, no, leave this line alone, I just don't want to hear it. So you can say a bare # noqa and it'll catch all of them. Okay. So I ran this on the Python Bytes website and we got this. There's a part where it builds up a list and then takes some things out, trying to create a unique list. I think this might be for showing some of the testimonials. It says, give me a list of a bunch of testimonials and then randomly pick some out of it. And then it'll delete the one it randomly picked and then pick another, so it doesn't get duplication. There are other things like that as well, also in the search. And so I write del x[y]
Starting point is 00:30:25 to get rid of the element, or whatever it's called, the item, and they say, you know what, on a dictionary you should just use x.pop(y). I think with del it's not entirely obvious what's going on. Sometimes it means free memory, sometimes it means take the thing out of the list, right? So they're like, okay, do this. And I got the square-bracket in warning, to use the parentheses, the tuple version, instead. And then also I had a list and I wanted to make a separate shallow copy of it. So I said list(thing). And it said, you can just do thing.copy()
Starting point is 00:30:58 and it'll create the same thing, but it's a little more discoverable what the intention is. Probably also more efficient. Probably do it all at once instead of loop over it. Who knows? Anyway, this is what I got running against our stuff like this. And you know what? I fixed it all. Cool. Except there's this one part where it's got a whole bunch of different tests to transform a string. And it's like line after line of dot replace, dot replace, dot replace, dot replace, dot replace. One of those lines is to replace tabs with spaces. Then eventually it finds all the spaces, turns them into single dashes and
Starting point is 00:31:30 condenses them and whatnot. And it says, oh, you should change x.replace("\t", " "), so tab with a space, replace that with x.expandtabs(1). I'm like, no. Maybe if it was just a single line where the only call was to replace the tabs, but there's like seven replaces. And they all make sense. Replace tabs, replace lowercase with that, like all these other things. And if you just turn one of them into expandtabs,
Starting point is 00:32:02 like, where did this come into the sequence of replacements? Like, why would you do this one thing? Yeah. And so I just put a # noqa on that one and fixed it up. But anyway,
Starting point is 00:32:12 I found it to be pretty helpful and offering some nice recommendations. People can check it out. You can just run an entire directory. You don't have to run it on one file. Just say, you know, refurb dot slash go. Yeah.
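The rewrites described in this segment can be sketched as before/after pairs in plain Python. This is only an illustration, not refurb's actual output: the sample data and variable names are made up, and the FURB123 code in the comment is just an example of the "# noqa: CODE" form, assuming that is the code refurb reports for the redundant str() call.

```python
import contextlib
import io

# Hypothetical sample data; none of these names come from the site's code.
testimonials = {"a": "Great show", "b": "Love it", "c": "So useful"}

# 1. Removing a dict entry: `del d[k]` vs. `d.pop(k)`.
#    Both remove the key; pop() also hands back the removed value.
with_del = dict(testimonials)
del with_del["a"]
with_pop = dict(testimonials)
removed = with_pop.pop("a")

# 2. Membership test: list literal vs. tuple literal.
status = "draft"
in_list = status in ["draft", "published"]
in_tuple = status in ("draft", "published")

# 3. Shallow copy: list(items) vs. items.copy().
items = [3, 1, 2]
copied_with_list = list(items)
copied_with_copy = items.copy()

# 4. Casting a str to str again is redundant. A trailing comment like the
#    one below is how a single line is opted out of one check (the code is
#    an assumption); a bare "# noqa" would silence every check on the line.
title = "python bytes"
cast_again = str(title)  # noqa: FURB123


# 5. print("") vs. print(): capture stdout to compare what each one writes.
def captured(fn) -> str:
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        fn()
    return buffer.getvalue()


out_empty_string = captured(lambda: print(""))
out_no_args = captured(lambda: print())
```

In each pair the two spellings produce the same result here, which is why a tool can suggest the second form as a drop-in replacement.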
Starting point is 00:32:27 We should run like several of these and then just do them in a loop and see if it ever settles down. Exactly. If you just keep taking its advice, does it upset the other one? Yeah, like if you pyupgrade and then refurb and then black and some others, and, yeah, autopep8, just see. The goal of this one is to modernize Python code bases. If we had Python 2 code, I suspect it would go bonkers, but we don't, so it's okay. But one of the cool things, you mentioned you weren't going to do the expandtabs, but I didn't know about the expandtabs.
Starting point is 00:32:57 So the tools like this also just teach you stuff that you may not have known about the language? Yeah, like that read_text versus a context manager and all sorts of stuff. Yeah. Yeah, so the expandtabs, where was it? It was over here. The expandtabs(1), that means replace the tab with one space. So if you wanted like four spaces for every tab,
Starting point is 00:33:16 you would just say expandtabs(4). Which is probably correct, right? Yeah, of course. Of course it is. Of course it is. All right, well, that's it for all of our items. You got anything else you want to throw out there? I don't.
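A quick way to check the expandtabs exchange above is to run it. The sketch below uses a made-up sample string. With a tab size of 1, expandtabs gives the same result as replacing each tab with one space; with a larger tab size, str.expandtabs pads to the next tab stop, so it is not the same as inserting a fixed number of spaces per tab.

```python
# Hypothetical sample string with two tabs at different columns.
text = "a\tbc\tdef"

# Tab size 1: every tab becomes exactly one space.
one = text.expandtabs(1)
replaced_one = text.replace("\t", " ")

# Tab size 4: each tab pads out to the next multiple of 4 columns,
# so the number of spaces depends on where the tab sits in the line.
four = text.expandtabs(4)
replaced_four = text.replace("\t", " " * 4)

print(repr(one))            # 'a bc def'
print(repr(four))           # 'a   bc  def'
print(repr(replaced_four))  # 'a    bc    def'
```

So the two calls are interchangeable only for a tab size of 1, which is the case discussed in the episode.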
Starting point is 00:33:29 How about you? I do, actually. All right. So let's see. I got a few things. I'll go through them quick. So another sequence of things that I think is pretty interesting. This is not really the main thing, but it's kind of starting the motivation.
Starting point is 00:33:43 So we have over on all of our sites, on Python Bytes, on Talk Python, and Talk Python Training, we have the ability to do search. So for example, over on Talk Python Training, I can say ngrok API Postman. And the results you got previously were like this ugly list that you'd have to kind of make sense of. It was really not something I was too proud of. But I'm like, I'm not inspired to figure out a different UI. But I got inspired last week and said, okay, I'm going to come up with this kind of hierarchical view showing, like, okay, if I search for, say, ngrok API Postman, I want to see all the stuff that matches that out of the 240 hours of spoken word, basically, right, on the site, and all the descriptions and titles and so on. And so like
Starting point is 00:34:26 for example this twilio course i talked about used all those things and actually has one lecture where exactly it talks about all three of those things and then others where they're in there but like one one video talks about ngrok then another one talks about an api or you know some it's not really focused right Right. And here, just in this course, like it doesn't even exist in a single chapter, but across a hundred days of web and Python, like all those words are set. Right. So I came up with this search engine and well, the search engine existed, but it wasn't running in a, it wasn't basically hosted in a way that I was real happy with.
Starting point is 00:35:01 So what I did is I took some of our advice from 2017. I said, you know what? I'm going to create a systemd service that just runs as part of Linux when I turn it on, that is going to do all the indexing and a lot of the pre-processing, so that page can be super fast. So for example, the response time for this page is effectively instant. It's like 30, 40 milliseconds, right? Even though it's doing tons of searching. So I'm going to run this Python script, series of scripts in the little app,
Starting point is 00:35:33 as a systemd service, which is excellent. And so we talked about how you can do that. And if you look, here's an example. Basically, you just create a systemd .service file and you say, like, python, space, your file with the arguments, and you can set it up and it'll just auto start and be managed by, you know, systemctl, which is awesome. So that's all neat. The main thing I really want to give some advice about, though, is these daemons,
Starting point is 00:36:03 what they look like is: while True, chill out for a while, do your thing; wait for an event, do your thing; look for a file, do your thing; then look for some more. Right? You're just going over and over in this loop, running, but often it's not busy, right? It's waiting for something. In this search thing, it's waiting for an hour or something, then it'll rebuild the search. But it could just as well be waiting for a file to appear in some kind of upload folder and then start processing that. I don't know. Right. So they almost always have this pattern of, like, while True, either wait for an event and then do it, or chill for a while and then do the thing. So my recommendation, my thought here is,
Starting point is 00:36:46 if you combine this with multiprocessing, you can often get much, much lower overhead on your server, right? So check this out. So here's an example of the search thing on Talk Python search out of Glances. Notice it's using 78 megabytes of RAM. This is in the show notes, of course. This is it just running there in the background. Before I started using multiprocessing, it was using like 300 megs of RAM constantly on the server, because it would wait for an hour and then it would load up the entire 240 hours of text and stuff and process it and do database calls and then generate a set of keyword maps, and then it would refresh those again. But normally it's just resting. It puts that stuff back in the database. But if you
Starting point is 00:37:38 like let it actually do the work, it will basically not unload those modules and unload all that other stuff that happened in there. So if you take the function that says just do the one thing in the loop and you just call that with multiprocessing, it goes from 350 megs to 70 megs, no other work. Because that little thing fires up, it does all the work, and then it shuts back down. And it doesn't get all that extra stuff
Starting point is 00:38:02 loaded into your process. Okay. Cool, right? It is cool. You could, I mean, for special cases like ours, I mean, for yours, you could just kick it off yourself, right? Or have it be part of your publish thing when you publish new show notes. Yeah, exactly. I mean, I could base it on some of that. Like, yeah, it could. It gets complicated because it's hard to tell when that happens. As you can see, like in this example, there's like eight worker processes. All right.
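The daemon pattern described above, a long-lived loop that hands each expensive run to a short-lived child process, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Talk Python service: rebuild_index, run_once_in_child, main, and the interval_seconds and max_runs parameters are all hypothetical names, and the body of rebuild_index is a placeholder for the real indexing work.

```python
import multiprocessing as mp
import time
from typing import Callable, Optional


def rebuild_index() -> None:
    """Hypothetical stand-in for the expensive job (loading text,
    querying the database, building keyword maps)."""
    words = ["word"] * 1_000_000  # simulate a large temporary allocation
    _ = len(set(words))


def run_once_in_child(job: Callable[[], None]) -> Optional[int]:
    """Run `job` in a separate process and wait for it to finish.

    Whatever the job imports or allocates lives in the child, so it is
    returned to the operating system when the child exits. The parent
    loop stays small.
    """
    proc = mp.Process(target=job)
    proc.start()
    proc.join()
    return proc.exitcode


def main(interval_seconds: float, max_runs: Optional[int] = None) -> None:
    """The 'while True: do the thing, then chill' loop.

    max_runs exists only so the sketch can terminate; a real daemon
    would leave it as None and loop forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        exit_code = run_once_in_child(rebuild_index)
        print(f"run {runs + 1} finished with exit code {exit_code}")
        time.sleep(interval_seconds)
        runs += 1


if __name__ == "__main__":
    # Short demo values; the episode describes waiting about an hour
    # between rebuilds, which would be interval_seconds=3600.
    main(interval_seconds=0.1, max_runs=2)
```

The __main__ guard matters here: with the spawn or forkserver start methods the child re-imports the module, and unguarded top-level code would run again in every child.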
Starting point is 00:38:35 So which one should be in charge of knowing that? So it's easy to just have that thing running and just like, you know, the search will be up to date and it's going. But please don't overwhelm the server by loading the entire thing and hanging on to it forever. Yeah, exactly. Yeah. So anyway, I thought that was a fun story to share. Let's do this one next. We talked about JetBrains Fleet.
Starting point is 00:38:59 Think PyCharm. PyCharm's like little cousin that is very much like VS Code, I guess, but has like PyCharm heritage. So this thing is now out of private beta, is now into public beta. So it has like Google Docs type collaboration, but it has like PyCharm source code refactoring and deep understanding that seems pretty excellent. So people can check that out. It looks pretty neat. I've done a little bit of playing with it, but not too much yet. But if you're a VS Code type of person, this might speak to you more than PyCharm. So that's out. Speaking of PyCharm, I'm going to be on a webcast with Paul Everitt on Thursday.
Starting point is 00:39:44 We're talking about Django and PyCharm tips reloaded. So just kind of a bunch of cool things you can do if you're working in a Django project in PyCharm. You want to be awesome and quick and efficient. Okay, last one. How about this? This blows me away and it's interesting. This is interesting.
Starting point is 00:40:05 So we all have got to be familiar with the GDPR. I did weeks' worth of work reworking the various websites to be officially compliant with GDPR. You know, like we weren't doing any creepy stuff, like, oh, now we've got to stop our tracking or anything like that. But there's certain things about, you need to record the opt-in explicitly and be able to associate a record, that kind of stuff, right? So some of us did a bunch of work to make our code GDPR compliant, others not so much. But the news here is that Denmark has ruled that Google Analytics is illegal. I mean, like, okay. And illegal in the sense that Google Analytics violates the GDPR and basically can't be used.
Starting point is 00:40:54 I believe France and two other countries whose names I'm forgetting have also decided that as well. And yeah, a significant number of European countries are deciding that Google Analytics just can't be used if you are going to be following the GDPR, which I think most companies in the West, at least, need to follow. Yeah. So I'm glad. I mean, in my early days of web stuff, I was using Google Analytics. Of course, a lot of people do. And it's free.
Starting point is 00:41:34 They give you all this information free. Why not? Why are they giving... Oh, it's not. Wait a minute. They're using you and your website to help collect data on everybody that uses your website. Yeah, it seems like such a good tradeoff. But yeah, I mean, you're basically giving every single action on your website, giving that information about your users, every one of their actions, over to Google, which seems like a little... I could see why that would be looked down upon from a GDPR perspective, no doubt. Just by the way, also on that, if you look over on pythonbytes.fm, the page, let's see, does it say
Starting point is 00:42:12 anything? How many blockers have we got, or how many creepy things do we have to worry about over here? Zero. Like, we don't use Google Analytics. We don't use, yeah, that's just global stats. But yeah, we don't use Google Analytics or any other form of client-side analytics whatsoever. So I'm pretty happy about that actually. But check out the video by Steve Gibson. It's an excerpt of a different podcast, but I think it's worth covering.
Starting point is 00:42:41 It's pretty interesting. Yeah, it's something to watch at least. Yeah, yeah. Ikivu points out in the audience, how can you enforce something like that? That is Google Analytics being not allowed. It's embedded in so many sites everywhere. Sometimes you don't even manage it.
Starting point is 00:42:56 You just enter an analytics ID. Yeah, it's honestly a serious problem. For example, on our Python Bytes website, if you go to one of the newer episodes, they all have a nice little picture. That picture is from the YouTube thumbnail. Like it literally pulls it straight from YouTube. The first thing I tried to do, Brian, was I said, well, here's the image that YouTube uses for the poster on the video. So I'll just put a little image where the source is youtube.com slash video poster, whatever the heck the URL is. Yeah. Even for that, Google started putting tracking cookies on all of our visitors.
Starting point is 00:43:36 Come on, Google. It's just an image. No. Yeah. And so, cookies, right? So what I had to end up doing is the website, on the server side, looks at the URL, downloads the image, puts it in MongoDB. And when a visitor comes, we serve it directly out of MongoDB with no cookies. Like, it is not trivial to avoid getting that kind of stuff in there, because even when you try not to, it shows up a lot of times, like Ikovu mentioned. Yeah, the way it gets enforced: somebody says, here's a big website, they're violating the GDPR, I'm going to report them, basically, is what happens, I think. Yeah, but I think for small fish like me or something, it's just, if a country says don't do that, maybe I won't because
Starting point is 00:44:27 they might have good reasons. So. Yeah. I mean, if you're a business, you got to worry a lot more. I don't think any individual will ever get in trouble for that, but it's also, I mean, think about how much you're exposing everybody's information. And you can't know before you go to a website whether that's going to happen. It's already happened once you get there. So I guess, see our previous conversation about ad blockers,
Starting point is 00:44:50 NextDNS, do we hate creators? No. Do we hate this kind of stuff? Yes. Yeah. Also, information is interesting. So,
Starting point is 00:44:58 but just pay attention to what you have, because you don't need Google Analytics to just find out which pages are viewed most. Absolutely. Things like that. So, yep. All right, well, that's a bunch of extras, but there they are. That's so serious though. Do we have something funny? We do. Okay, I got something. This is very much, I picked this one for you, Brian. Okay. So this has to do with testing.
Starting point is 00:45:22 Tell me what's in this picture here. Describe for our listeners. I love this picture. This is great. So it says all unit tests passing, and it is a completely shattered sink. The only thing left of the sink is the faucet is still attached to some porcelain. You can turn it on, and it goes down the drain. Actually, so you've even got integration tests passing. Yeah, you do. This is pretty, yeah, it's pretty... Not 100% coverage, but yeah. Right, not 100% coverage of the sink. Yeah, there's this sink and it's completely smashed.
Starting point is 00:45:58 There's just a little tiny chunk, a fragment of it left, but it's got the drain, and the faucet is still pouring into it. Unit tests pass. I love it. Yeah, you might even cut yourself if you tried to wash your hands in this, but, uh, but funny. You might, you might. Well, that's good fun. As always, thanks for being here. Thank you. Yeah, see you later. Thank you, everyone, for listening. Bye.
