Python Bytes - #203 Scripting a masterpiece for Python web automation

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 203, recorded October 7th, 2020. I'm Michael Kennedy. And I am Brian Okken. And this episode is brought to you by Datadog. Thank you, Datadog, for supporting us. Pythonbytes.fm slash Datadog. And a lot of cool stuff out there.

Starting point is 00:00:19 We'll tell you more about it later. Brian, can you believe we're, like, well into the 200s? Well, by three. Yeah, we're getting a good start already. Yeah. A month almost. Yeah, I guess a month because that's zero based, which is pretty awesome. Now, speaking of things that are awesome, DigitalOcean was a sponsor of the show for a while, but before they were sponsors, we actually just used them for hosting our infrastructure, and we still do. So when you download the MP3, your podcast player talks to something,

Starting point is 00:00:49 it's talking to our services on DigitalOcean and so on. And over there, we just have a set of virtual machines, some database servers, some other things, and they manage themselves as kind of a cluster. And by manage themselves, I mean I manage them. I mean, they mostly take care of themselves, but I do have to log in and take care of them. But there are different ways of hosting your apps that don't require you to actually log in and configure servers and make sure they're all good and so on. Often that's called platform as a service. We also have

Starting point is 00:01:21 Kubernetes clusters and things like that, where you just say, here's a definition of my code. Please make it go on the internet. So what I want to talk about is DigitalOcean just launched a new app platform that is a platform as a service. And like I said, I'm a fan of DigitalOcean because they're simple and straightforward and affordable and easy to use, but really high quality. So I think that it's worth pointing out this new platform that they just launched. You're comfortable with doing your own, what, droplet or whatever it is? Yeah, exactly. I'm not,

Starting point is 00:01:50 so I'm kind of looking forward to trying something like this. And I've got a ton of different apps and they have interconnections with each other that they have to care about. And, like, there's a lot of stuff where, you know, at some point it makes sense to go down that path with various things that all work together.

Starting point is 00:02:06 But if I've just got an app and I want to get it on the internet, often you don't want to deal with or worry about those things or forget to apply an OS patch. Or how many times, I mean, large-scale VC-funded professional web apps say, we're going to be experiencing downtime for the next 30 minutes or for four hours. I'm just like, what could you possibly be doing that takes four hours? It just boggles my mind that you're not able to do it better than four hours of downtime. And so platforms like this mean zero-downtime deployment and things like that. So really, really

Starting point is 00:02:42 neat. So they've announced this new app platform. I want to point out, this is not an ad. This is just something I think is cool, so I'm sharing it with you. So yeah, they came up with this new app platform that, you know, it's pretty modern. It's like, how do you get your code into it? You point it at your GitHub repository. You don't, like, log into it and do a git thing. You just say, I'm going to give you access to my source code and it will automatically deploy from that. That would be one nice way to get it over there and get it set up. But you also might want continuous deployment.

Starting point is 00:03:11 So if I push, like, how do you get a new version with zero-downtime deployments and all that? Well, you just push to a particular branch that you decide upon and it automatically notices that and does a redeploy. That's pretty sweet. So I have that for, like, Talk Python Training: if I push to a production branch, it'll automatically do the checkout, ensure the requirements are built, recreate it. I had to write that. This just happens;

Starting point is 00:03:35 this is just part of it, right? That's pretty neat. Yeah, yeah. I don't want to do that myself. I didn't either, but it was better than logging in all the time. So this is built on top of DigitalOcean Kubernetes, which is interesting, because a lot of platform-as-a-service type of things are just opaque. They're like, well, you can give us access to your code and we'll make it run. Magic. But really all this is is they'll orchestrate running your code on top of their Kubernetes clusters, which means you can, like, define Dockerfiles in your repository that are going to be part of the app that runs in Kubernetes.

Starting point is 00:04:09 You can use some of the tools actually to talk to the underlying infrastructure. So it's not a closed environment. You can actually kind of get down to the infrastructure layer a little bit more. So all these things are pretty neat. It has automatic handling of traffic spikes for simple, simple, simple apps.

Starting point is 00:04:26 For static apps, it's free. For three of them, right? For real apps, I guess, apps that run code like Python, you can pay five bucks for like a simple version, like on a shared server, or you can pay 12 bucks for a more pro version that has more features, CDN, SSL, all those kinds of things. And then if you want to scale it up,

Starting point is 00:04:49 you can pay tons, right? You can pay like $150 to run it on a huge server or a bunch of different small servers. And there's a whole scaling thing that you can do, but there's a pretty decent offering. It's still not as cheap as running it on your own, but just like you said, a lot of people don't want to run it on their own

Starting point is 00:05:03 and that's not their expertise and why should they be doing that, right? Yeah, I would tell you, like, if you were to offer to do all of my server stuff for me, I would totally buy you dinner once a month. Yeah, that's kind of the price, right? But this would be like a cheap dinner, like a Muchas Gracias type of, you know, enchiladas and a Coke, not a filet mignon. Yeah. Maybe just like a $5 gift card to Starbucks. Yeah, there you go. I could totally get two scones.

Starting point is 00:05:32 Anyway, if you were thinking about running your... I talked to so many people, students of the courses and stuff, and they're like, I got my app, but now I got to put it online. What a pain. I can't get Nginx configured right or this other thing or so on. This is another solid option now that has a nice, you know, push to a branch, deploy, run your stuff, zero downtime. You know, it's probably most comparable to Heroku, I would say, in the Python ecosystem. Yeah, yeah. All right, well, people could check this out. I think it's a cool offering. I will not be personally using it because there's

Starting point is 00:06:02 a bunch of little gotchas, like, you know, it would be better if, right? For example, I don't want to use their hosted Postgres database. I want to run a MongoDB server, which is fine, it's no problem, you can do that there. But what I do on the MongoDB server is, in order to talk to it, you have to be within a whitelist of known IP addresses that the web servers and API servers have. So there's like 10 IPs in the world that can talk to that server and no others. The thing is with these Kubernetes clusters, when you push redeploy,

Starting point is 00:06:33 it will regenerate it and rehost it potentially somewhere else. And the IP address keeps changing. So you can't do things like have a custom database server that has firewall limited, restricted, like VPN type of stuff. Those types of things don't exist. Most people probably don't care. I care, so I'm not doing it. You can't do Mongo with this thing? You can do Mongo, but you would have to have the

Starting point is 00:06:55 MongoDB database port listen on the open internet rather than be restricted to just a few IP addresses. Maybe they figured this out and it's buried in the... It's something that there's a whole conversation about, like, here's the things we're going to add, here's the things that it doesn't currently do, here's some workarounds, etc., etc. So anyway, there's a whole conversation.
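To make the allowlist problem concrete, here is a minimal sketch in Python of the kind of check a firewall rule effectively performs. The addresses are hypothetical documentation-range IPs, not anything from the show, and a real setup would enforce this at the network layer rather than in application code. The only point is that an app redeployed onto a new address no longer passes the check.

```python
import ipaddress

# Hypothetical allowlist: the fixed addresses of the web and API servers.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.10/32"),
    ipaddress.ip_network("203.0.113.11/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


# A server with a stable address passes; a container that was rescheduled
# onto a different (hypothetical) address does not.
print(is_allowed("203.0.113.10"))
print(is_allowed("198.51.100.7"))
```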

Starting point is 00:07:17 You can check it out. But if you do things like use their hosted database, which would make sense in a PaaS type of story, you don't have these problems, right? They automatically wire that stuff up. Because when you want to break the rules, you get in trouble. So, you're a fan of Shakespeare, is that right? Head down to Medford.

Starting point is 00:07:35 I've never been. Ashland, sorry, it's Ashland down there. There's a whole, like, Shakespeare week and, yeah. Is Ashland still there with the fires and all? God, I hope so. Yeah. No, I've always wanted to, but people that don't live in Oregon have no idea what we're talking about. But there's a small town in southern Oregon that does a lot of Shakespeare plays and that

Starting point is 00:07:55 sort of transition was because I want to talk about Playwright. So Microsoft put out an announcement announcing Playwright for Python. I was trying to look into this. I guess I haven't quite got whether or not Playwright was a thing before Playwright for Python or not. But in any case, it's a Microsoft thing, and it's a way to drive and test your web application easily. So it's an end-to-end testing solution. It's open source and whatnot. But in their announcement, it's a pretty cool announcement.

Starting point is 00:08:30 It gives examples and everything. So I'm going to read their pitch. The pitch for it is, with the Playwright API, you can author end-to-end tests that run on all modern web browsers. Playwright delivers automation that is faster, more reliable, and more capable than existing testing solutions. And I'm guessing "existing testing solutions" is a nice way for them to say, we are better than Selenium. Yeah, that's what I was thinking as well. So there's already

Starting point is 00:08:56 a pytest plugin. It runs on Python, and we've said that we like animated GIFs of how it works, and on their announcement page, there's a little animation. And I was actually pretty impressed with that little bit. So you can drive it even from a command line or an interactive shell. You can do some playing with it, which is nice. So a few of the benefits. Apparently, it's timeout-free automation. So Playwright automatically waits

Starting point is 00:09:27 for the user interface to be ready before you act on it again. I know there's some workarounds and there's some wrappers on top of Selenium that do that also, but this is built into the system. It's intended to stay modern with emulation of mobile viewports,

Starting point is 00:09:43 geolocation, web permissions. You can automate scenarios across multiple pages. I don't really test websites that much, but I didn't know that that was difficult before, so apparently that's easier now. Cross-platform, of course, or cross-browser, of course, because you got to test against different things. They use a Chromium driver for Chrome and Edge emulation, WebKit driver for Safari, and a Firefox driver. And supposedly the Safari rendering driver even works on Windows and Linux, so you don't actually have to have an Apple computer to do that. So, PyTest-compatible and Django-compatible.

Starting point is 00:10:22 I'm sure it's compatible with lots of other stuff too, but the examples on the announcement show pytest examples and Django examples, which is cool. They even mention that, of course, you can run this from your continuous integration server, including GitHub Actions and others. You must be happy to see that it's pytest, like natively pytest friendly, like with fixtures and whatnot.

Starting point is 00:10:45 I love that. Obviously we're to the point now where if you have a new testing tool, you may as well, in the announcement, tell people whether or not you can run it with pytest, because people are going to ask. But that's a good state to be in in the Python world, I think. So for example, the simple hello-world sort of test is just: go and make sure that you get, like, a header text on a page. So it says, define a function which takes a page, with type annotations, by the way, double props for that. So, page, and that's already a fixture from the framework in pytest, so it automatically passes that over. Setup: all you do is say it takes a page. Then page.goto(url), assert page.inner_text of h1 equals equals,

Starting point is 00:11:26 you know, the text you're looking for. There's also more like that you could do. It's like Beautiful Soup-like stuff, but there's more of the kind of drive-it. Yeah. Go ahead. That's two lines of code for a test to make sure something's on

Starting point is 00:11:37 our webpage. That's pretty cool. Yeah, that is pretty slick. And the fixture bit is neat. You can also go and, like, do a test of login. So get a new page, go to the URL, then do page.fill, give it a CSS selector for the username field, like, the input field, give it a CSS

Starting point is 00:11:55 selector for the password, fill that, and then click where the text of a button equals Login. You don't have to do the CSS stuff or anything, just find me a button or a thing or URL that has the text Login and click that, and it's off. And so, like, one of the examples here is it does that first and then it logs in and it creates a session that remembers that it's logged in for the rest of the testing. So that's like one of the setup phases, which is pretty cool. Yeah. Let me throw out one other thing. You talked about Chromium as one of the drivers, right? So a lot of times when you're doing Selenium, I don't know about this, but it looks the same.

Starting point is 00:12:27 You have to install Chromium and then there's like a little hidden one. You can also do the Firefox browser for Selenium. But I was talking to Attila from Scrapinghub on Talk Python, and he pointed out that Scrapinghub makes a browser specifically designed to be a headless browser, called Splash.

Starting point is 00:12:48 So their headline is: the headless browser designed specifically for web scraping, turn JavaScript-heavy web pages into data. So I don't know how much better that is, but it's interesting to think that you can swap out these browsers. And here's a cool example as well, something that maybe people don't know about.

Starting point is 00:13:05 Yes, I listened to that episode and thanks for reminding me. I was like, I got to check that out. Yeah, I do too, but I haven't checked it out, but it definitely looks neat. So this though, I like it. I mean, it looks at least as neat as Selenium. I don't know. Maybe it's even better. So pretty cool.

Starting point is 00:13:21 Also cool, Datadog. They're actually sponsoring the show. Unlike DigitalOcean where I just found something that I like from someone who happened to be a sponsor. But Datadog are sponsoring the show, not making them any less cool. So let me ask you a question. Do you have an app in production that's slower than you like? It's performance, maybe it's all over the place, sometimes fast, sometimes slow. Here's the important question.

Starting point is 00:13:42 Do you know why? With Datadog, you will. You can troubleshoot your app's performance with Datadog's end-to-end tracing. Get detailed flame graphs, identify bottlenecks and latency in that finicky app of yours. Be the hero that got your app back on track at your company. Get started with a free trial. And I believe they send you a t-shirt, a little cool t-shirt still, over at pythonbytes.fm slash Datadog.

Starting point is 00:14:07 So Brian, something we haven't spoken about nearly enough is asyncio and async and await. Should we touch on that a little? Sure. Okay. Yeah, we've talked about it some. Some. I believe some, maybe. So, one of the things that asyncio is for, I mean, if you look at the name, it's around waiting on IO, waiting on external things like network calls, API calls, and so on, right?

Starting point is 00:14:33 Oh, I thought it was just trying to be cool, like all the .io. It could be that, or it could just be like the Italian pronunciation. Async, yo. Async, yo. No, it's beautiful. So when I think of files, I think of IO.

Starting point is 00:14:46 Like if somebody said, what is IO? I would think file IO. That's the first thing I would say. And yet Python doesn't have built-in support for asynchronously working with file IO. That's bizarre, right? Yeah, it is. I believe there's an external package. I think I saw it somewhere on, like, awesome-asyncio or some list like that,

Starting point is 00:15:05 that somebody had built something along those lines. But there's a cool article called Asynchronously Opening and Closing Files in Asyncio by Chris Wellons. So he wrote this and said, look, asyncio has great support for networking, subprocesses, interprocess communication stuff, but no file operations like opening, reading, writing, and closing files. And if you're talking to something that might take a long time... I mean, I don't know about you, but I've got a pretty raging SSD on both my computers. So maybe I don't need this. Unless maybe you're logged in through a corporate VPN and you've mapped a network share over to your drive. And then you try to read from that, and all of a sudden your file I.O. might get super slow, right?
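The idea in the article, which the discussion that follows walks through, can be sketched like this. It is a minimal illustration and not the article's own code: the class name `aopen` is borrowed from the conversation, and everything else is an assumption. Only opening and closing are pushed to a worker thread; reads and writes on the returned file object still block.

```python
import asyncio
import functools


class aopen:
    """Async context manager that opens and closes a file on a worker thread."""

    def __init__(self, *args, **kwargs):
        # Same arguments as the built-in open(); stored until __aenter__ runs.
        self._args = args
        self._kwargs = kwargs
        self._file = None

    async def __aenter__(self):
        loop = asyncio.get_running_loop()
        # run_in_executor only forwards positional arguments, so bind the
        # keyword arguments with functools.partial. Passing None selects the
        # loop's default thread pool executor.
        opener = functools.partial(open, *self._args, **self._kwargs)
        self._file = await loop.run_in_executor(None, opener)
        return self._file

    async def __aexit__(self, exc_type, exc, tb):
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, self._file.close)
        return False  # do not swallow exceptions raised inside the block
```

Inside a coroutine this would be used as `async with aopen(path) as f:`, mirroring the ordinary `with open(path) as f:` form. The same wrap-it-in-`run_in_executor` pattern applies to other blocking calls that have no async version.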

Starting point is 00:15:46 Well, even on SSDs, file I.O. is slower than memory reads. Yeah, it's much slower. So there's certainly situations where this could be extreme, like the network one, but you're right. Even normal file I.O. can be slow if you're really looking to squeeze out the most concurrency. So basically he wrote a little article working through it and it's ridiculously short actually on how you can do this. Right. So basically he says, look, if I use

Starting point is 00:16:11 open, to open a file in Python, as a decent Pythonic bit of code, typically I would write with open thing as file I.O. object, right? File stream. Let's build that. So then we're going to call it aopen, which is an asynchronous one. And it's kind of bizarre and weird that Python has this, but it does. And I think it's neat. It has async with blocks, for when you do async things that have to be

Starting point is 00:16:35 asynchronously managed within context managers. So he said, let's write this so it implements the async with style, which is really simple. You basically implement a couple of methods: instead of dunder enter, dunder exit, you do dunder aenter, dunder aexit, and so on. Okay. And then he says, okay, well, what we're going to do is we're going to define a function that just opens a file, super easy, but then we're going to run it in an asyncio event loop by saying run_in_executor. And what that means

Starting point is 00:17:05 is asyncio will create a thread pool where it's going to run over on a background thread, and then it just runs that and lets you await it. And that's basically it. Isn't that neat? That's not much code. No. It's like the opening bit is one, two, three, it's six

Starting point is 00:17:21 lines of code, including the function name, which has to be there. The five lines of writing code. Yeah. And one of the things I like about this is not because I really want to do async file stuff. It's because it's a neat, neat little example that I can get my head around so that if I have some other process or other slow thing that I want to make asyncified, this might

Starting point is 00:17:44 be an example of how to do that. Yeah, absolutely. So I think this is super instructive and interesting. I'll also throw out that there is an aiofiles package. I think it's files, plural. Maybe it's file, no, file, singular, aiofile, which you can pip install and then just do this instead of, like, see the tutorial. But I think the value here is, like, well, what else doesn't have async support, and what could I just kick over to a thread but then integrate into asyncio event loops? Yeah, it's nice indeed. You know what is nice? Excel. Like, so many people who can't do any programming or any scripting or anything, they can just go to Excel and, like, drag-and-droppy a little, you know, a formula and paste it over, and then they're good

Starting point is 00:18:29 to go. Yeah, except... except what? So except it's 2020, that's the problem. Yeah, so this is only tangentially related to Python. Mostly it's, people, start using databases in Python, stop using Excel so much. This article, we had a lot of people actually say, did you guys see this? Yeah. So, yeah, lots of people brought this up to us. I've got an article that I picked. There's a bunch of articles also, but I picked a BBC.com article because it didn't have very many ads. So the BBC article says Excel,

Starting point is 00:19:05 why using Microsoft's tool caused COVID-19 results to be lost. Wow. So there's a, uh, apparently if you haven't heard about this, apparently there were 16,000 coronavirus cases that went unreported in England. The good news is,

Starting point is 00:19:20 well, sort of good, it only took like a few days for somebody to notice this. But there were a few days where there was some stuff not getting tracked, right? And policy was like, hey, things are getting better, we're trending down, this is amazing.

Starting point is 00:19:34 Yeah, except no. Such a bad... it just didn't read it. So apparently what you had was several commercial testing firms filling out CSVs and sending them to, I forget the name of the place, some health organization in England that was pulling all this stuff together. And they were pulling it together by putting it all in an Excel XLS template that could then be uploaded to a central system and made available to the NHS Test and Trace team, as well as other government computer dashboards. But the use of the XLS template made it so that there was a limit of about 65,000 rows. Actually, that just gives me nightmares, to think of a 65,000-row Excel spreadsheet. But apparently that's the limit.

Starting point is 00:20:26 Nobody quite noticed that they'd hit it. It didn't say anything about failing. And some people said, well, you should have used XLSX because that increases the limit by 16 times. But still, Excel for this? Of course, I was thinking, why are you doing this in Excel? And in this article they had a quote from Professor Jon Crowcroft from the University of Cambridge. He says Excel was always meant for people mucking around with a bunch of data for their small company to see what it looked like, and then when you need something more serious, you build something bespoke that works. There's dozens of other things that you could do, but you wouldn't use XLS. Nobody would start with that.
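As a rough illustration of the limits being discussed: a legacy XLS worksheet tops out at 65,536 rows (2^16) and an XLSX worksheet at 1,048,576 rows (2^20), which is where the "16 times" comes from. The sketch below is hypothetical and not how the real pipeline worked; the function name and messages are made up. It just shows a check that fails loudly instead of letting rows be dropped without any warning.

```python
XLS_MAX_ROWS = 2**16   # 65,536 rows per worksheet in the legacy .xls format
XLSX_MAX_ROWS = 2**20  # 1,048,576 rows per worksheet in .xlsx

ROW_LIMITS = {"xls": XLS_MAX_ROWS, "xlsx": XLSX_MAX_ROWS}


def check_row_limit(row_count: int, file_format: str) -> None:
    """Raise ValueError if row_count will not fit in one worksheet."""
    limit = ROW_LIMITS[file_format]
    if row_count > limit:
        raise ValueError(
            f"{row_count:,} rows exceed the {file_format} limit of {limit:,}; "
            f"{row_count - limit:,} rows would be lost"
        )


print(XLSX_MAX_ROWS // XLS_MAX_ROWS)  # ratio between the two limits

check_row_limit(50_000, "xls")        # fits, so nothing happens
try:
    check_row_limit(70_000, "xls")    # too many rows for .xls
except ValueError as error:
    print(error)
```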

Starting point is 00:21:09 Exactly. Apparently people did, though, and so people should be using Python. Yeah, that's not good. That is not good. So, I think there's a really interesting trend of moving towards things like pandas to answer these

Starting point is 00:21:25 questions, right? Yeah. I don't think that's the answer for everybody, right? Like, oh, well, Excel is kind of clumsy for you, so here's what you should do: you should learn a whole bunch of programming, right? I mean, here's a random story. One of the more frustrating things from my corporate days: when I was doing training, we would have to write proposals to send off to clients. Like, here's what we're going to cover. Here's what we're going to teach. Here's your goals. And here's the timeline and so on.

Starting point is 00:21:51 And I would send that off as a Word document and work with one of the salespeople I worked with. And they said they'd send it off to the client and something had changed. The Word doc, like a doc X, said, oh, Michael, I need you to replace this word with that word. And so she sent me the document back and asked me to replace that word with that word. I'm like, do you not know about command R or control R? Or whatever the replace hotkey is. And why would you ever send me a file and just say, I need this word to do a find and replace with that word.

Starting point is 00:22:21 But I need to do it first? I was just like... So anyway, I'm thinking of that with Excel. Like, I would never suggest that that person learn it. That said, a lot of Excel power users, I think, would do really well to adopt JupyterLab and pandas and stuff. And actually Chris Moffitt, who does Practical Business Python, just did a webcast with us. We talked about it before, but the recording's up now. You can check that out, and that'll give you some concrete tips to avoid Excel if possible. Oh, nice. Good resource. That link's in our show notes. Yeah. Would you be a fan of getting documents sent to you and being asked to do a find and replace on a word? I've totally had that happen. Yeah. Like, I sent you the doc, you could just... I mean, maybe send it back to me and say,

Starting point is 00:23:05 hey, I made some updates and here's my updates if you need to store the version. Yeah, exactly. Yeah, just make sure I did it right, maybe. But I mean, it was pretty straightforward. Anyway, let's move on. I'm sure everyone out there has a story like that of you wouldn't believe what I had to do in my corporate job.

Starting point is 00:23:25 So this next one comes to us from a listener, a person, Daniel, who's given us lots of cool feedback and ideas. And this one is called locust.io. This is actually a pretty good pairing with Playwright. Okay. So Playwright is about validating that what is on the webpage makes sense. I can go log in and press the button, and then I go to this page and this text is here, something like that. Right. As a continuous integration.

Starting point is 00:23:51 So Locust is about, okay, you know that works. What if 10 people do it at the same time? What if a hundred people do it at the same time on our current infrastructure? You hear about things like the whole healthcare debacle, where they spent hundreds of millions of dollars on code on these projects, and, like, a few people logged in and it just failed. And you just wonder, like, could you have just tried it? Maybe just seeing, like, if we call that API 10 times a second, will it actually take it, right? And so tools like this are exactly what you want.

Starting point is 00:24:25 It's really cool for just simulating, accessing a bunch of different sites. I was just thinking one good use for this may have been, sorry to interrupt, maybe the schools could have done this before they had everybody log in so that all the kids on their laptops or their tablets wouldn't have said on day one,

Starting point is 00:24:43 I don't know what's going on. It won't let me in. Yeah, the page won't load. It just keeps giving me the number 500. Is this a math class? Anyway, yeah, exactly. So you should test your code. And so I've used these before, these types of tools, and often it's like, okay, what you're going to do is open a web browser and you're going to go to the site and it'll record, like, the URLs, and you can, like, use some weird, like, selection syntax, I guess, a weird clumsy GUI. Maybe it stores it as XML, but you have, like, a UI on top of it. It's all crummy, and they probably charge you a ridiculous amount of money for this. So here's

Starting point is 00:25:17 the thing with Locust: it basically looks like you're writing, like, unit test code. So if you look, there's an example in the show notes, just check that out. So what you do is you define a user and then you give the user some tasks or some behaviors. Oh, this is the one that I was thinking of. I'm sorry, I was confusing this with your Playwright. So for example, with the user,

Starting point is 00:25:35 like you would say something like self.client.post to log in and you just give it a dictionary. Username is this, password is that. Boom, that's it. And that will actually go over there and submit the login form with that data, which is pretty awesome. And then you give it tasks.

Starting point is 00:25:53 And these are kind of like tests. Like go to the index page, do a get on slash and do a get on the JavaScript. Go to the about page and do a get on slash about. Or, you know, go click this button or go make this thing happen. And then once you have this, then you can turn that into, like, a bunch of discrete, distributed, parallel requests to see

Starting point is 00:26:11 if you get any 500 errors, timeout errors, like what the average latency is for 10 users, 100 users, a thousand users at a time. You can run it on distributed machines, so you can have it simulate millions of users if you want to run it on, like, 20 cloud VMs or something like that and turn it onto your website. What do you think? I think this is cool. And you're saying that there's a game website that's using this? There is. In the notes, when they talk about the features, they say, look, you can define user behavior in code. Just plain Python code, which is neat. It's scalable, so you can run it, like I said. And then it's battle-tested.

Starting point is 00:26:50 Because Locust has been used to simulate millions of simultaneous users on Battlelog, the web app for the Battlefield games. And so you really could say Locust is battle-tested. Nice. I don't know if anybody's seen the trailer for the Battlefield games. I've not been paying attention to it forever, or for many, many years at least. Wow, these games have come a long ways. Like, if you watch the trailer for the latest one, that's crazy, crazy stuff. But it's kind of also beside the point. I think this way of saying, like, this is what a

Starting point is 00:27:19 website user does: they log in, and then they go to this page, and they might also visit this page. And you set up things like, not just, I want to have... So when you answer questions like, how many users can we support? Typical users are not, like, pathological. They don't go to, like, your account page and hold down Command-R or Control-R and just refresh it as hard as they can, right? They'll go there and they'll spend like three or four seconds, five seconds, and then they'll go to another thing, they'll spend 10 seconds there, then they'll go off and they'll click this button, right? They'll have normal human behavior. So one of the things you set up in this class you define that represents a user on your site is the wait time. So say the wait time is between five and 15 seconds. And then you ask, can it take a million

Starting point is 00:27:58 users? It doesn't just do a million concurrent requests. It has like a million of these things randomly waiting between five to 15 seconds as they're kind of like interacting randomly with your site. Oh, cool. So you could sort of scale this then you could start with something like some long wait times and then make sure that it can handle like a thousand users or something and then gradually make it shorter so that it's hitting on your server harder. Yeah, exactly.
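A back-of-the-envelope way to see what the wait time does to load: if each simulated user pauses somewhere between 5 and 15 seconds between actions, the average pause is 10 seconds, so each user generates roughly one request every 10 seconds. The helper below is a hypothetical calculation, not part of Locust, and it assumes the waits are uniformly distributed and that response time is negligible.

```python
def expected_requests_per_second(users: int, min_wait: float, max_wait: float) -> float:
    """Approximate steady-state request rate for users who pause between actions.

    Assumes waits are uniform between min_wait and max_wait (in seconds)
    and that the server responds instantly.
    """
    mean_wait = (min_wait + max_wait) / 2
    return users / mean_wait


for users in (1_000, 100_000, 1_000_000):
    rate = expected_requests_per_second(users, min_wait=5, max_wait=15)
    print(f"{users:>9,} users -> about {rate:,.0f} requests/second")
```

Under these assumptions a million such users works out to something on the order of 100,000 requests per second rather than a million simultaneous requests, and shortening the wait range raises the rate proportionally, which is the gradual ramp-up described above.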

Starting point is 00:28:24 I think this is really neat. I don't know that I would necessarily be using it right now, but if I create something new, especially something I'm sure is going to get a lot of traffic, then I would definitely use this. It looks really neat. It's free and open source. Write it in Python. Why the heck not?

Starting point is 00:28:40 The only reason I wouldn't use it now is I've already had some really big spike events. I'm like, okay, well, everything's running at like two percent, five percent CPU. It's like, it's fine. I don't know. You can totally see, I mean, there's a huge use case for this, is that, like, people that have, they're rolling out a new app, or even if they're an existing company rolling out something new, and everything looks fine on their server, even when they're testing with like two or three consecutive tests or something. But are we ready to roll it out?

Starting point is 00:29:07 We don't know how many people are going to hit it. So they can sort of gauge that. The one that I always have in mind when I think about this is you've got some app that's been out there and it's kind of getting some traction. Your company's getting some traction in it, and the company decides we're going to run a Super Bowl ad or we're going to launch some huge marketing campaign on Black Friday that's way, way out of bounds of what we normally do. The last thing, I mean, you only get one shot for your app to work when that Super Bowl ad runs or on that Black Friday event. If it just goes down for that little bit of time,

Starting point is 00:29:42 it's not like, well, we got it up. It's fine now. You've lost that moment and that million dollar spend or whatever the heck it turns out to be. So it's like those moments where the spike is unknown, but also the time which you get to deal with it is short. Yeah. Or things like, yeah, I'm pretty sure

Starting point is 00:29:57 that the healthcare marketplace website's ready. It's fine, yeah. Sure, Mr. President, this is going to be fine. It won't, like, blemish your record for all of history. All right. Speaking of things that I'm sure are going to be fine, Hacktoberfest was such a, it's a good idea in theory, potentially. We're like in mid-October or deep into October already. I don't know how your repos did, but I got a lot of attention. Did you? Yeah. No, mine, yes, mine didn't so much. I'll tell you about that, but go ahead and tell people where we're going with this.

Starting point is 00:30:27 Okay, so Hacktoberfest. Hopefully you know about it, but if you don't, it's an interesting idea sponsored by DigitalOcean and other sponsors. Again, DigitalOcean not sponsoring this episode. Overall, it's a good idea. So the idea is to encourage people to contribute to open source by bribing them with a t-shirt and other swag. That works for geeks. We love our t-shirts. Like, how else are you going to be, like, wearing your clothes? What

Starting point is 00:30:49 do you put in your closet? Yeah. Maybe you can buy a t-shirt with a half an hour of work, but we're going to, like, have you work for like hours and just get one t-shirt. Anyway, there's always been some spam with this, people abusing it, but I think it was not as prevalent as this year. But what happened this year, and I'm going to link to a video by Anthony Sottile titled What's Wrong with Hacktoberfest. He introduces what Hacktoberfest is, some of the problems, and he recommended some solutions. We're not going to cover those today. But apparently there was a YouTuber this year, I think it was in India, that did a video on how to get a free t-shirt by doing, like, it's basically how to get free swag with not much work. And he did this video to show you how to submit a pull request to a project and only do something like update the readme to say "an awesome project" or change "it's" to "it is" or something like that.

Starting point is 00:31:48 And then do a pull request saying document or improve docs and do that for four different repos. And there you got a t-shirt. Yeah, I met many of these people. It turned into a big problem. So I was actually really thrilled with how fast DigitalOcean and whoever's working on Hacktoberfest fixed it, or at least hopefully. I'm sure people are still trying to do this, so I'm sure there's a lot of spam going on. But they changed the rules. So as of the 3rd,

Starting point is 00:32:18 they updated the rules to try to reduce the spam. One of the big things is maintainers can opt in by adding a Hacktoberfest topic to their repo. So a whole bunch of stale old repos won't get hit, hopefully. And then also you can mark any PR that's dumb as invalid and it invalidates stuff. And actually the full rules, let's see, we're going to have it in the show notes, it's a little, uh, little pseudocode. So: if you submit a PR in the month of October, and the PR is labeled as Hacktoberfest accepted by the maintainer, or you submitted it to a repo with the Hacktoberfest topic and the pull request was merged or it was approved. So you can't just submit it and get your t-shirt. It has to be like some maintainer has to say, yeah, this is good or I approve it or whatever. It's not automatic anymore.
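The rule as read out above can be sketched as a small Python function. This is only an illustration of the logic as described in the episode, not the official Hacktoberfest implementation: the `PullRequest` class and `counts_for_hacktoberfest` function are hypothetical, the exact label and topic spellings (`hacktoberfest-accepted`, `hacktoberfest`, `invalid`) are assumptions, and treating an invalid label as overriding everything else is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a pull request."""
    submitted_on: date
    labels: set = field(default_factory=set)
    repo_topics: set = field(default_factory=set)
    merged: bool = False
    approved: bool = False


def counts_for_hacktoberfest(pr, year=2020):
    """Return True if the PR would count under the rules as described."""
    # Must be submitted during October of the given year.
    if pr.submitted_on.year != year or pr.submitted_on.month != 10:
        return False
    # Assumption: a maintainer marking the PR invalid always wins.
    if "invalid" in pr.labels:
        return False
    # Path 1: the maintainer explicitly labeled the PR as accepted.
    accepted_by_label = "hacktoberfest-accepted" in pr.labels
    # Path 2: the repo opted in via its topic AND the PR was merged or approved.
    opted_in_and_reviewed = (
        "hacktoberfest" in pr.repo_topics and (pr.merged or pr.approved)
    )
    return accepted_by_label or opted_in_and_reviewed


if __name__ == "__main__":
    drive_by = PullRequest(submitted_on=date(2020, 10, 5))
    print(counts_for_hacktoberfest(drive_by))
```

The point of the two paths is that simply opening a pull request no longer counts; a maintainer has to take some action first.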

Starting point is 00:33:13 And also, if you are a maintainer and you've dealt with all the spam, sorry about that. But also, I'd like to encourage more people to do Hacktoberfest because it's a cool thing. I didn't want to bring it up before because I didn't want to encourage spam, but I think these changes will help. And if you're a maintainer, please be sure to do those notifications by November 1st because that's the deadline. Yeah, interesting. I had no idea what was going on until I saw Anthony Sottile's post or Twitter message. You know, somebody came over to some of the,

Starting point is 00:33:47 I have 222 repositories, most of which are public between the courses and various other things. So there's a bunch of opportunity to go in and make changes, right? So somebody came along to the beginner, the Python for Absolute Beginners course and said, I would like to add a few little tips for some beginners to make this slightly better. You know, we can't change anything because it needs to match what's in the video. But if you had a little section that had like some tips and they were meaningful, sure, I guess that's okay.

Starting point is 00:34:14 And then the next day I woke up and it was like 10 PRs, not necessarily all from this person, but from a bunch of different people with weird things like change the readme from this, you know, "check out our latest course" to "check out the latest course," and just changing like the word "our" to "the." And I'm like, what is going on? Then I saw Anthony's thing and I'm like, okay, close, close, close, close, close, close, close. Just straight out, like, I don't even want to talk to these people. This is super annoying. And they

Starting point is 00:34:40 weren't just making changes to the readme. They would go in, they would make changes to, like, XML configuration documents. I'm like, you can't change that. That's read by the machine, right? That's going to break something if I accept this. Not only is it annoying that I've got to deal with it, but if I were to accept that, I'm pretty sure it would break. I think maybe it was like formatting, like putting a closing node on a line above or, like, putting a space. I mean, I don't think it actually broke it, but it was really weird stuff. And I didn't understand it was coming from Hacktoberfest. I was being hacked by the Hacktoberfesters. Yeah. But it has stopped since they've made these changes, which is great. Oh, has it stopped? So most of that stuff was in the first few days? Yeah, I haven't

Starting point is 00:35:19 seen it the last couple days. I didn't realize that's probably because the rules changed. I just went through and, like, just denied everything that I saw coming in. Yeah. I wonder if they forced the takedown of that video or maybe it's gone. Yeah. Who knows?

Starting point is 00:35:31 Who knows? Well, I know that that's it for all of our main topics. Got anything else you want to throw out real quick before we wrap it up with a joke? I don't.

Starting point is 00:35:38 I could totally use a joke. But do you have any extra things? I do. There's a really cool conference. It, I believe, theoretically was supposed to be this year in Vancouver, B.C., which is an absolutely wonderful town to visit, called PyCascades. It cycles between Vancouver, Seattle, and Portland. Well, this year it's taking a diversion to cycle to the internet because 2020.

Starting point is 00:36:01 Although it's in 2021, like still planning now. So PyCascades 2021 will take place Saturday, February 20th from the world. I don't know if they're having any local stuff going on, but anyway, it's basically a virtual conference and the call for proposals is open. So if you'd like to give a presentation there,

Starting point is 00:36:20 you can do that by November 10th. Submit proposals. So that would be cool. I think talking at get-togethers like this, meetups, the smaller, not full-blown PyCon, but PyCascades and other types of events are a really good way to sort of raise your profile and stretch your comfort zone as a developer. So I encourage people to do it. Also, Patricio... Yeah, I spoke at the 2020 version that was just before the world fell apart. That's right. I was there. My daughter and I watched from the

Starting point is 00:36:50 back. It was great. Next thing, other thing: Patricio Rains, Rains, who is a researcher at the Barcelona Supercomputing Center, which, by the way, they have this virtual tour he sent me. Oh my god, it is so awesome. They have like a pop song for it. It is held inside, literally the supercomputer is inside an old cathedral. So like, you know, where all the arches are and where the sermons would have been given, like, that's where the supercomputer is. That's pretty awesome. Can we put that link in the show notes too? Yeah, yeah, I'll put it in there. Yeah, but that's not why he sent it to me. He just said, hey, I happen to work here and I use Jupyter a lot.

Starting point is 00:37:27 You spoke about Black Cell Magic and then another Black formatter plugin for Jupyter notebooks. So he said, you should also check out nbblack, nb underscore black, which works in Jupyter and JupyterLab. And there's another one

Starting point is 00:37:42 that only works in JupyterLab called the JupyterLab code formatter. So just like always, we mentioned one thing that we kind of discover and then listeners are like, that's great. And, and, and here's a bunch of other stuff. So thank you for that, Patricio. Yeah. Nice. But I love that. I like the multiple tool thing. That's fine. Yeah, indeed. All right. Let's do a joke. I've chosen some very clear ones that actually have a visual component as you will. I don't know why I do that, but that's what I've done. So I'll let you do the first one. I'll do the second one.

Starting point is 00:38:12 So for people who don't know, this is a classical programmer painting. And the idea is this is a legitimate, real painting from some museum. Typically, they're hundreds of years old. But instead of having, you know, like "Flowers in the Tide Pools" or whatever random thing the artist named it, it's renamed with a programming title. Okay, yeah. So why don't you quickly describe your picture and then tell us the title. Okay. So the picture is, it's a white, kind of a white gray background. I think it's snow or something. There's some horses running.

Starting point is 00:38:53 There's a whiteout blizzard almost. Yeah, it's horrible. Yeah. And there's some horses running, two horses running, pulling a, what, like a sled or something? I don't know. And there's somebody laying on the sled. All right. What's the title?

Starting point is 00:39:04 Delivering a Feature in the Time of a Code Freeze. This is by Anthony Petrowski, oil on wood, 1883. That's beautiful. All right. So the one that I got here, it's these three guys. They look highly skeptical, almost like they're on some kind of mission, sneaking out of, like, really tall grass on a boat in some kind of swamp. You can see them, like, really slowly sort of approaching. And the title is Red Hat Enterprise Linux Sys Admins Entering

Starting point is 00:39:37 the Docker Convention Floor. Oil on canvas, 1882. Isn't that a great one? Like, look at their face. Yeah, you've got to check this out. Click on the link in your podcast player and see it. They're like angry pirates in a canoe. Yeah, it's sort of a piratey feel to it. Like, they're like, oh, what are we doing here? We're breaking in. It's such a weird world, this Docker, Kubernetes. I love this thing of, like, programmer quotes on old paintings.

Starting point is 00:40:01 That's, uh, it's funny. Yeah. If there's ever some sort of like artwork exhibition at a PyCon, this is happening. We could probably do it virtually somehow. Try to do it at a virtual conference.

Starting point is 00:40:14 Yes. I think we could. Yeah. Yep. All right, well, thanks for being here as always and thank you everyone out there who's listening.

Starting point is 00:40:20 Yep. Bye-bye. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes.

Starting point is 00:40:25 That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast

Starting point is 00:40:43 with your friends and colleagues.
