Python Bytes - #203 Scripting a masterpiece for Python web automation
Episode Date: October 16, 2020
Topics covered in this episode:
Introducing DigitalOcean App Platform
Announcing Playwright for Python
Asynchronously Opening and Closing Files in asyncio
Excel: Why using Microsoft's tool caused Covid-19 results to be lost
locust.io
Fixing Hacktoberfest
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/203
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 203, recorded October 7th, 2020.
I'm Michael Kennedy.
And I am Brian Okken.
And this episode is brought to you by Datadog.
Thank you, Datadog, for supporting us.
Pythonbytes.fm slash Datadog.
And a lot of cool stuff out there.
We'll tell you more about it later.
Brian, can you believe we're, like, well into the 200s?
Well, by three. Yeah,
we're getting a good start already. Yeah. A month almost. Yeah, I guess a month because that's zero
based, which is pretty awesome. Now, speaking of things that are awesome, DigitalOcean was a
sponsor of the show for a while, but before they were sponsors, we actually just used them to
host our infrastructure, and we still do. So when you download the MP3,
your podcast player talks to something,
it's talking to our services on DigitalOcean and so on.
And over there, we just have a set of virtual machines,
some database servers, some other things,
and they manage themselves as kind of a cluster.
And by manage themselves, I mean I manage them.
I mean, they mostly take care of themselves, but I do have to log in and take care of them. But there are different ways
of hosting your apps that don't require you to actually log in and configure servers and make
sure they're all good and so on. Often that's called platform as a service. We also have
Kubernetes clusters and things like that, where you just say, here's a definition of my code. Please make it go on the internet.
So what I want to talk about is DigitalOcean just launched a new app platform
that is a platform as a service. And like I said, I'm a fan of DigitalOcean because
they're simple and straightforward and affordable and easy to use, but really high quality.
So I think that it's worth pointing out this new platform
that they just launched.
You're comfortable with running your own, what, droplet or whatever it is?
Yeah, exactly.
I'm not, so I'm kind of looking forward to trying something like this.
And I've got a ton of different apps, and they have interconnections
with each other that they have to care about.
There's a lot of stuff where, you know, at some point it makes sense to
go down that path with various things that all work together.
But if I just had an app and I wanted to get it on the internet,
often you don't want to deal with or worry about those things
or forget to apply an OS patch.
Or how many times, I mean, large-scale VC-funded professional web apps
say, we're going to be experiencing downtime for the next 30 minutes
or for four hours. I'm just like, what could you possibly be doing that takes four hours?
It boggles my mind that you're not able to do better than four hours of downtime.
And so platforms like this mean zero-downtime deployment and things like that. So really, really
neat. So they've announced this new app platform. I want to point out, this is not an ad. This is just something I think is cool, so I'm sharing
it with you. So yeah, they came up with this new app platform that's, you know, pretty
modern. Like, how do you get your code into it? You point it at your GitHub repository. You
don't log into it and do a Git thing. You just say, I'm going to give you access to my source
code, and it will automatically deploy from that.
That would be one nice
way to get it over there and get it set up.
But you also might want continuous deployment.
So if I push, like how do you get a
new version with zero downtime
deployments and all that? Well, you
just push to a particular branch that you
decide upon and it automatically
notices that and does a redeploy.
That's pretty sweet. So I have that for Talk Python Training: if I push to a production branch, it'll automatically
do the checkout, ensure the requirements are built, and recreate it. I had to write that. Here, this just happens.
It's just part of it, right? That's pretty neat.
Yeah, I don't want to do that myself.
I didn't either, but it was better than logging in all the time. So this is built on top of DigitalOcean
Kubernetes, which is interesting because a lot of platform-as-a-service type of things are
just opaque. They're like, well, you can give us access to your code and we'll make it run. Magic.
But really, all this is is they'll orchestrate running your code on top of their Kubernetes
clusters, which means you can define Dockerfiles
in your repository that are going to be part of the app
that runs in Kubernetes.
You can use some of the tools actually
to talk to the underlying infrastructure.
So it's not a closed environment.
You can actually kind of get down
to the infrastructure layer a little bit more.
So all these things are pretty neat.
It has automatic handling of traffic spikes
for simple, simple, simple apps.
For static apps, it's free.
For three of them, right?
For real apps, I guess, apps that run code like Python,
you can pay five bucks for like a simple version,
like on a shared server,
or you can pay 12 bucks for a more pro version
that has more features, CDN, SSL, all those kinds of things.
And then if you want to scale it up,
you can pay tons, right?
You can pay like $150 to run it on a huge server
or a bunch of different small servers.
And there's a whole scaling thing that you can do,
but there's a pretty decent offering.
It's still not as cheap as running it on your own,
but just like you said,
a lot of people don't want to run it on their own
and that's not their expertise, and why should they be doing that, right?
Yeah. I would tell you, if you were to offer
to do all of my server stuff for me, I would totally buy you dinner once a month.
Yeah, that's kind of the price, right? But this would be like a cheap dinner, like a Muchas Gracias type of, you
know, enchiladas and a Coke, not a filet mignon.
Yeah.
Maybe just like a $5 gift card to Starbucks.
Yeah, there you go.
I could totally get two scones.
Anyway, if you were thinking about running your... I talk to so many people, students of the courses and stuff,
and they're like, I got my app, but now I got to put it online.
What a pain.
I can't get Nginx configured right or this other thing or so on.
This is another solid option now that has a nice,
you know, push to a branch, deploy, run your stuff, zero downtime. It's probably most comparable
to Heroku, I would say, in the Python ecosystem.
Yeah. All right, well, people could check this
out. I think it's a cool offering. I will not be personally using it because there's
a bunch of little gotchas. For example, I don't want to use their hosted Postgres database. I want to run a MongoDB
server, which is fine. It's no problem, you can do that there. But what I do on the
MongoDB server is, in order to talk to it, you have to be within a whitelist of known IP addresses
that the web servers and API servers have.
So there's like 10 IPs in the world
that can talk to that server and no others.
The thing is with these Kubernetes clusters,
when you push redeploy,
it will regenerate it and rehost it
potentially somewhere else.
And the IP address keeps changing.
So you can't do things like have a custom database server
that has firewall limited, restricted,
like VPN type of stuff. Those types of things don't exist. Most people probably
don't care. I care, so I'm not doing it. You can't do Mongo
with this thing? You can do Mongo, but you would have to have the
MongoDB database port listen on the open internet
rather than be restricted to just a few IP addresses.
Maybe they figured this out and it's buried in the...
It's something that there's a whole conversation about,
like, here's the things we're going to add,
here's the things that it doesn't currently do,
here's some workarounds, etc., etc.
So anyway, there's a whole conversation.
You can check it out.
But if you do things like use their hosted database,
which would make sense in a PaaS type of story,
you don't have these problems, right? They automatically
wire that stuff up. Because when you want to break
the rules, you get in trouble.
So, you're a fan of Shakespeare,
is that right? Head down to Medford.
I've never been.
Ashland, sorry, it's Ashland down there.
There's a whole, like, Shakespeare week
and, yeah. Is Ashland still
there with the fires and all? God, I hope
so. Yeah. No, I hope so.
Yeah. No, I've always wanted to, but people that don't live in Oregon have no idea what we're talking about. But there's a small town
in southern Oregon that does a lot of Shakespeare plays and that
sort of transition was because I want to talk about Playwright.
So Microsoft put out an announcement announcing Playwright for Python.
I was trying to look into this.
I guess I haven't quite got whether or not Playwright was a thing before Playwright for Python or not.
But in any case, it's a Microsoft thing, and it's a way to easily drive and test your web application.
So it's an end-to-end testing solution.
It's open source and whatnot.
But in their announcement, it's a pretty cool announcement.
It gives examples and everything.
So I'm going to read their pitch.
The pitch for it is, with the Playwright API,
you can author end-to-end tests that run on all modern web browsers.
Playwright delivers automation that is faster, more reliable,
and more capable than
existing testing solutions. And I'm guessing "existing testing solutions" is a nice way for them
to say, we are better than Selenium.
Yeah, that's what I was thinking as well.
So there's already a pytest plugin. It runs on Python. And we've said that we like
animated GIFs of how it works.
And on their announcement page, there's a little animation.
And I was actually pretty impressed with that little bit.
So you can drive it even from a command line or an interactive shell.
You can do some playing with it, which is nice.
So a few of the benefits.
Apparently, it's timeout-free automation. So Playwright automatically waits
for the user interface to be ready
before you act on it again.
I know there's some workarounds
and there's some wrappers on top of Selenium
that do that also,
but this is built into the system.
It's intended to stay modern
with emulation of mobile viewports,
geolocation, web permissions. You can automate
scenarios across multiple pages. I don't really test websites that much, but I didn't know that
that was difficult before, so apparently that's easier now. Cross-platform, of course, or cross-browser,
of course, because you've got to test against different things. They use a Chromium driver for Chrome and Edge emulation,
a WebKit driver for Safari, and a Firefox driver.
And supposedly the Safari rendering driver even works on Windows and Linux,
so you don't actually have to have an Apple computer to do that.
So, pytest-compatible and Django-compatible.
I'm sure it's compatible with lots of other stuff too,
but the examples on the announcement show pytest examples
and Django examples, which is cool.
They even mentioned that, of course,
you can run this from your continuous integration server
and including GitHub Actions and others.
You must be happy to see that it's pytest,
like natively pytest friendly, like with fixtures and whatnot.
I love that. Obviously we're to the point now where if you have a new testing tool, you may as well, in the announcement, tell people whether or not you can run it with pytest, because people are going to ask.
But that's a good state to be in in the Python world, I think.
So for example, the simple hello world sort of test is just: go to a page and make sure that you get the right header text on the
page. So it says define a function which takes a page, with type annotations by the way, double props
for that. So page, and that's already a fixture from the framework in pytest, so it automatically
passes that over. For setup, all you do is say it takes a page. Then page.goto the URL, assert page
dot innerText of h1 equal,
equal,
you know,
the text you're looking for.
There's also more that you could do.
It's like Beautiful Soup-like stuff,
but there's more of the kind of drive it.
Yeah.
Go ahead.
That's two lines of code for a test to make sure something's on
our webpage.
That's pretty cool.
Yeah,
that is pretty slick.
And the fixture bit is neat.
You can also go and do a test of login.
So get a new page, go to the URL,
then do page.fill, give it a CSS selector for the username input field, give it a CSS
selector for the password field and fill that, and then click where the text of a button equals
login. You don't have to do the CSS stuff or anything, just find me a button or a thing or
URL that has the text login and click that, and it's off. And so one of the examples here is
it does that first, and then it logs in and it creates a session that remembers that it's logged
in for the rest of the testing. So that's like one of the setup phases, which is pretty cool.
Yeah. Let me throw out one other thing. You talked about Chromium as one of the drivers,
right? So a lot of times when you're doing Selenium, I don't know about this, but it looks the same.
You have to install Chromium and then there's like a little hidden one.
You can also do the Firefox browser for Selenium.
But I was talking to Attila
from Scrapinghub on Talk Python,
and he pointed out that Scrapinghub
makes a headless browser specifically designed
to be a headless browser,
called Splash.
So their headline is,
the headless browser designed specifically for web scraping.
Turn JavaScript-heavy web pages into data.
So I don't know how much better that is,
but it's interesting to think
that you can swap out these browsers.
And here's a cool example as well,
something that maybe people don't know about.
Yeah, I listened to that episode, and thanks for reminding me.
I was like, I got to check that out.
Yeah, I do too, but I haven't checked it out, but it definitely looks neat.
So this though, I like it.
I mean, it looks at least as neat as Selenium.
I don't know.
Maybe it's even better.
So pretty cool.
Also cool, Datadog.
They're actually sponsoring the show.
Unlike DigitalOcean where I just found something that I like from someone who happened to be a sponsor.
But Datadog are sponsoring the show, not making them any less cool.
So let me ask you a question.
Do you have an app in production that's slower than you like?
Its performance, maybe it's all over the place, sometimes fast, sometimes slow.
Here's the important question.
Do you know why?
With Datadog, you will.
You can troubleshoot your app's performance with Datadog's end-to-end tracing.
Get detailed flame graphs, identify bottlenecks and latency in that finicky app of yours.
Be the hero that got your app back on track at your company.
Get started with a free trial.
And I believe they send you a t-shirt, a little cool t-shirt still, over at pythonbytes.fm
slash Datadog.
So Brian, something we haven't spoken about nearly enough is asyncio and async and await. Should we touch on that a little?
Sure. Okay. Yeah, we've talked about
it some. Some. I believe some, maybe. So,
one of the things that asyncio is for, I mean,
if you look at the name,
it's around waiting on IO,
waiting on external things like network calls,
API calls, and so on, right?
Oh, I thought it was just trying to be cool,
like all the.io.
It could be that,
or it could just be like the Italian pronunciation.
Async, yo.
Async, yo.
No, it's beautiful.
So when I think of files, I think of IO.
Like if somebody said, what is IO?
I would think file IO.
That's the first thing I would say.
And yet Python doesn't have built-in support for asynchronously working with file IO.
That's bizarre, right?
Yeah, it is.
I believe there's an external package.
I think I saw it somewhere on like awesome-asyncio or some list like that,
that somebody had built something along those lines. But there's a cool article called
Asynchronously Opening and Closing Files in asyncio by Chris Wellons. So he wrote this and said,
look, asyncio has great support for networking, subprocesses, and interprocess communication stuff,
but no file operations like opening, reading, writing, and closing files. And if you're talking to something that might take a long time,
I mean, I don't know about you, but I've got a pretty raging SSD on both my computers.
So maybe I don't need this. Unless maybe you're logged in through
a corporate VPN and you've mapped a network share over to your drive. And then you try to read from
that, and all of a sudden your file I/O might get super slow, right?
Well, even on SSDs, file I/O is slower than memory reads.
Yeah, it's much slower.
So there's certainly situations where this could be extreme,
like the network one, but you're right.
Even normal file I/O can be slow
if you're really looking to squeeze out the most concurrency.
So basically he wrote a little article working through it, and it's
ridiculously short, actually, how you can do this. Right. So basically he says, look, if I
open a file in Python, as a decent Pythonic bit of code, typically I would write
with open thing as file I/O object, right? A file stream. Let's build that. So then we're
going to call aopen, which is an asynchronous
one. And it's kind of bizarre and weird
that Python has this, but it does. And I
think it's neat. It has async
with blocks for when you do
async things that have to be
asynchronously managed within context managers.
So he said, let's write
this so it implements the async
with style, which
is really simple. You basically implement a couple
of methods. Instead of dunder enter and dunder exit, you do dunder aenter and dunder aexit, and so on. Okay.
And then he says, okay, well, what we're going to do is we're going to define a function that just
opens a file, super easy, but then we're going to run it from the asyncio event loop by saying run_in_executor. And what that means
is asyncio
will create a thread pool
where it's going to run over on
a background thread, and then it just
runs that and lets you await it.
And that's basically it. Isn't that neat?
That's not much code. No. It's like
the opening bit is one, two, three, it's six
lines of code, including the function
name, which has to be there.
The five lines of writing code.
Yeah.
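A minimal sketch of the pattern being described, for anyone reading rather than listening. This is not the article's code: the names `aopen`, `AsyncFile`, `roundtrip`, and `demo` are illustrative assumptions. It only shows the two ingredients mentioned, an async context manager (`__aenter__` / `__aexit__`) and `loop.run_in_executor` to push the blocking calls onto the default thread pool.

```python
import asyncio
import os
import tempfile
from functools import partial


class AsyncFile:
    """Wraps a blocking file object; each call runs in the default thread pool."""

    def __init__(self, file, loop):
        self._file = file
        self._loop = loop

    async def read(self, *args):
        return await self._loop.run_in_executor(None, self._file.read, *args)

    async def write(self, data):
        return await self._loop.run_in_executor(None, self._file.write, data)

    async def close(self):
        await self._loop.run_in_executor(None, self._file.close)


class aopen:
    """Async counterpart of `with open(...)`: use as `async with aopen(path) as f`."""

    def __init__(self, *args, **kwargs):
        self._args = args
        self._kwargs = kwargs
        self._wrapped = None

    async def __aenter__(self):
        loop = asyncio.get_running_loop()
        # open() itself can block (think network shares), so it runs in a thread too.
        opener = partial(open, *self._args, **self._kwargs)
        file = await loop.run_in_executor(None, opener)
        self._wrapped = AsyncFile(file, loop)
        return self._wrapped

    async def __aexit__(self, exc_type, exc, tb):
        await self._wrapped.close()
        return False  # do not swallow exceptions


async def roundtrip(path, text):
    # Write the text, then read it back, awaiting each blocking step.
    async with aopen(path, "w", encoding="utf-8") as f:
        await f.write(text)
    async with aopen(path, "r", encoding="utf-8") as f:
        return await f.read()


def demo():
    # Uses a throwaway temp directory so the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.txt")
        return asyncio.run(roundtrip(path, "hello async files"))
```

The same shape works for any blocking call without async support: wrap it in a plain function, hand it to `run_in_executor`, and await the result while the event loop keeps serving other tasks.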
And one of the things I like about this is not because I really want to do async file
stuff.
It's because it's a neat, neat little example that I can get my head around so that if I
have some other process or other slow thing that I want to make asyncified, this might
be an example of how to do that.
Yeah, absolutely. So I think this is super instructive and interesting. I'll also throw
out that there is an aiofiles package. I think it's files, plural. Maybe it's file,
no, file, singular, aiofile, which you can pip install and then just do this instead of, like, see the tutorial. But I
think the value here is like, well, what else doesn't have async support, and what could I just
kick over to a thread but then integrate into asyncio event loops?
Yeah, it's nice indeed.
You know what else is nice? Excel. Like, so many people who can't do any programming or any scripting or
anything, they can just go to Excel and drag and drop a little, you know, formula and paste it over, and then they're good
to go.
Yeah, except...
Except what?
Except it's 2020. That's the problem. Yeah, so this is only
tangentially related to Python. Mostly it's: people, start using databases in Python, stop using Excel so much.
This article, we had a lot of people actually say, did you guys see this?
Yeah.
So, yeah, lots of people brought this up to us.
I've got an article that I picked.
There's a bunch of articles also, but I picked a BBC.com article because it didn't have very many ads.
So the BBC article says Excel,
why using Microsoft's tool caused COVID-19 results to be lost.
Wow.
So apparently, if you haven't heard about this,
there were 16,000 coronavirus cases that went unreported in
England.
The good news, well, sort of good, is
it only took like a few days for somebody to
notice this. But there were a few days where there was some stuff not getting tracked, right?
And policy was like, hey, things are getting better, we're trending down, this is amazing.
Yeah, except no. Such a bad... just didn't read it. So apparently what happened: you had several
commercial testing firms filling out CSVs and sending them to, I forget the name of the place,
some health organization in England that was pulling all this stuff together.
And they were pulling it together by putting it all in an Excel XLS template
that could then be uploaded to a central system and made available to the NHS
Test and Trace team, as well as other government computer dashboards. But the use of the XLS
template meant that there was a limit of about 65,000 rows. Actually, it just gives me nightmares
to think of a 65,000-row Excel spreadsheet. But apparently that's the limit.
Nobody quite noticed that they'd hit it. It didn't say anything about failing. And when people noticed,
some people said, well, you should have used XLSX because that increases the limit by 16 times.
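The numbers behind that, as a small worked sketch: a legacy XLS sheet holds 65,536 rows (2 to the 16th) and an XLSX sheet holds 1,048,576 (2 to the 20th), which is where the factor of 16 comes from. The helper names below are made up for illustration, and the 70,000-row input is an arbitrary example, not a figure from the article.

```python
XLS_MAX_ROWS = 2 ** 16   # 65,536 rows per sheet in the legacy .xls format
XLSX_MAX_ROWS = 2 ** 20  # 1,048,576 rows per sheet in the .xlsx format


def rows_lost(total_rows, max_rows=XLS_MAX_ROWS):
    """Rows that would be dropped if data were truncated at the sheet limit."""
    return max(0, total_rows - max_rows)


def check_fits(total_rows, max_rows=XLS_MAX_ROWS):
    """Fail loudly instead of truncating silently (hypothetical guard)."""
    lost = rows_lost(total_rows, max_rows)
    if lost:
        raise ValueError(f"{lost} rows would not fit in a {max_rows}-row sheet")
    return True


# Illustrative input only: 70,000 rows against each limit.
lost_in_xls = rows_lost(70_000)                   # 4,464 rows over the XLS limit
lost_in_xlsx = rows_lost(70_000, XLSX_MAX_ROWS)   # 0, it fits in XLSX
```

A check like `check_fits` before export is the cheap version of the lesson here: the failure mode in the story was silence, not the limit itself.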
But still, Excel for this? Of course, I was thinking, why are you doing this in Excel? And in this article they had a quote from Professor Jon Crowcroft from the University of Cambridge.
He says Excel was always meant for people mucking around with a bunch of data for their small company
to see what it looked like. And then when you need something more serious, you build something bespoke
that works. There's dozens of other things that you could do, but you wouldn't
use XLS. Nobody would start
with that.
Exactly.
Apparently people did, though,
and so people should be using Python.
Yeah, that's not good.
That is not good. So, I think
there's a really interesting trend of moving
towards things like
pandas to answer these
questions, right?
Yeah. I don't think that's the answer for everybody, right? Like, oh well, Excel
is kind of clumsy for you, so here's what you should do: you should learn a whole bunch of
programming. Right. I mean, here's a random story. One of the more frustrating things
from my corporate days is, when I was doing training, we would have to write proposals to send off to clients.
Like, here's what we're going to cover.
Here's what we're going to teach.
Here's your goals.
And here's the timeline and so on.
And I would send that off as a Word document and work with one of the salespeople I worked with.
And she'd send it off to the client, and something had changed.
With the Word doc, like a .docx, she said, oh, Michael, I need you to replace this word with that word.
And so she sent me the document back and asked me to replace that word with that word.
I'm like, do you not know about Command+R or Control+R?
Or whatever the replace hotkey is.
And why would you ever send me a file and just say, I need you to do a find and
replace of this word with that word?
But I need to do it first.
I was just like... So anyway, I'm thinking of that with Excel. I would never suggest that that person learn it. That said,
a lot of Excel power users, I think, would do really well to adopt JupyterLab and pandas and stuff. And
actually, Chris Moffitt, who does Practical Business Python, just did a webcast with us.
We talked about it before, but the recording's up now. You can check that out, and that'll give you some concrete tips to avoid Excel if possible.
Oh, nice. Good
resource. That link's in our show notes.
Yeah. Would you be a fan of getting documents sent to you
and being asked to do a find and replace on a word?
I've totally had that happen.
Yeah, like, I sent you the
doc, you could just... I mean, maybe send it back to me and say,
hey, I made some updates and here's my updates
if you need to store the version.
Yeah, exactly.
Yeah, just make sure I did it right, maybe.
But I mean, it was pretty straightforward.
Anyway, let's move on.
I'm sure everyone out there has a story like that
of you wouldn't believe what I had to do in my corporate job.
So this next one comes to us from a listener,
a person, Daniel, who's given us lots of cool feedback and ideas.
And this one is called locust.io.
This is actually a pretty good pairing with Playwright.
Okay.
So Playwright is about validating that what is on the webpage makes sense.
I can go log in and press the button, and then I go to this page and this text
is here, something like that. Right. As a continuous integration.
So Locust is about, okay, you know that works.
What if 10 people do it at the same time?
What if a hundred people do it at the same time on our current infrastructure?
You hear about things like the whole healthcare debacle where they spent
hundreds of millions
of dollars on code for these projects, and like a few people logged in and it just failed.
And you just wonder, like, could you have just tried it?
Maybe just seeing, like, if we call that API
10 times a second, will it actually take it? Right? And so tools like this are exactly what you want.
It's really cool for just simulating,
accessing a bunch of different sites.
I was just thinking one good use for this
may have been, sorry to interrupt,
maybe the schools could have done this
before they had everybody log in
so that all the kids on their laptops
or their tablets wouldn't have said on day one,
I don't know what's going on.
It won't let me in.
Yeah, the page won't load. It just keeps giving me the number 500. Is this a math class?
Anyway, yeah, exactly. So you should test your code. And so I've used these types of tools before,
and often it's like, okay, what you're going to do is open a web browser and you're going to go to
the site and it'll record the URLs, and you can use some weird
selection syntax, I guess, some weird clumsy GUI. Maybe it stores it as XML, but you have a UI on top
of it. It's all crummy and they probably charge you a ridiculous amount of money for this. So here's
the thing with Locust: it basically looks like you're writing unit test code. So if you look
at the, there's an example in the show notes, just check that out.
So what you do is you define a user
and then you give the user some tasks
or some behaviors.
Oh, this is the one that I was thinking of.
I'm sorry, I was confusing this with Playwright.
So for example, with the user,
like you would say something like self.client.post
to log in and you just give it a dictionary.
Username is this, password is that.
Boom, that's it.
And that will actually go over there
and submit the login form with that data,
which is pretty awesome.
And then you give it tasks.
And these are kind of like tests.
Like go to the index page,
do a get on slash and do a get on the JavaScript.
Go to the about page and do a get on slash about.
Or, you know, go click this button
or go make this thing happen.
And then once you
have this, then you can turn that into a bunch of distributed parallel requests to see
if you get any 500 errors or timeout errors, or what the average latency is for 10 users, 100
users, a thousand users at a time. You can run it on distributed machines, so you can have it
simulate millions of users if you want to run
it on like 20 cloud VMs or something like that and turn it loose on your website. What do you think?
I think this is cool. And you're saying that there's a game website that's using this?
There is. In the notes, when they talk about the features, they say, look, you can define user
behavior in code. Just plain Python code, which is neat. It's scalable, so you can run it, like I said.
And then it's battle-tested.
Because Locust has been used to simulate millions of simultaneous users
on Battlelog, the web app for Battlefield games.
And so you really could say Locust is battle-tested.
Nice.
I don't know if anybody's seen the trailer for the
Battlefield games. I've not been paying attention to it forever, for many, many years at least.
Wow, these games have come a long way. If you watch the trailer for the latest one, that's crazy,
crazy stuff. But it's kind of also beside the point. I think this way of saying, like, this is what a
website user does: they log in and then they go to this page and they might also visit this page. And
you set up things so that when you answer questions like how many users can
we support, typical users are not pathological. They don't go to your account page and hold
down Command+R or Control+R and just refresh it as hard as they can, right? They'll go there and
they'll spend like three or four seconds, five seconds, and then they'll go to another thing,
they'll spend 10 seconds there, then they'll go off and they'll click this button, right? They'll have normal human behavior. So one
of the things you set up in this class you define that represents a user on your site is the wait
time. So say the wait time is between five and 15 seconds. And then you ask, can it take a million
users? It doesn't just do a million concurrent requests. It has like a million of these things
randomly waiting between five and 15 seconds as they're kind of interacting randomly with your site.
Oh, cool.
So you could sort of scale this, then. You could start with some
long wait times and make sure that it can handle like a thousand users or
something, and then gradually make them shorter so that it's hitting your
server harder.
Yeah, exactly.
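To put rough numbers on the wait-time idea, here is a back-of-the-envelope sketch. It is not part of Locust; the function name is made up, and it assumes each simulated user sends one request per task, picks its wait uniformly between the two bounds, and that response time is small compared with the wait.

```python
def approx_requests_per_second(users, min_wait, max_wait):
    """Rough steady-state load from users who pause between requests.

    Assumes one request per task, uniformly distributed waits in seconds,
    and negligible response time relative to the wait.
    """
    mean_wait = (min_wait + max_wait) / 2
    return users / mean_wait


# A million users waiting 5 to 15 seconds average 10 seconds between requests.
million_users = approx_requests_per_second(1_000_000, 5, 15)  # 100000.0 per second

# Shortening the wait for the same 1,000 users raises the load on the server.
relaxed = approx_requests_per_second(1_000, 20, 40)  # mean wait 30 s
pushy = approx_requests_per_second(1_000, 1, 3)      # mean wait 2 s
```

So a million "realistic" users is closer to a hundred thousand requests per second than to a million simultaneous ones, and tightening the wait window is a simple dial for ramping the pressure up.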
I think this is really neat.
I don't know that I would necessarily be using
it right now, but if I create something new,
especially something I'm sure is going to get a lot of traffic,
then I would
definitely use this. It looks really neat.
It's free and open source.
Write it in Python. Why the heck not?
The only reason I wouldn't use it now is I've already had
some really big spike events.
I'm like, okay, well, everything's running at like two percent, five percent CPU. It's
fine.
I don't know. You can totally see, I mean, there's a huge use case for this:
people that are rolling out a new app, or even if they're an existing company
rolling out something new, and everything looks fine on their server even when they're testing
with like two or three consecutive tests or something.
But are we ready to roll it out?
We don't know how many people are going to hit it.
So they can sort of gauge that.
The one that I always have in mind when I think about this is you've got some app that's been out there and it's kind of getting some traction.
Your company's getting some traction in it, and the company decides we're going to run a Super Bowl ad or we're going to launch some huge marketing campaign on Black Friday
that's way, way out of bounds of what we normally do.
The last thing, I mean, you only get one shot for your app to work
when that Super Bowl ad runs or on that Black Friday event.
If it just goes down for that little bit of time,
it's not like, well, we got it up.
It's fine now.
You've lost that moment and that million dollar spend
or whatever the heck it turns out to be.
So it's like those moments where the spike is unknown,
but also the time which you get to deal with it is short.
Yeah.
Or things like, yeah, I'm pretty sure
that the healthcare marketplace website's ready.
It's fine, yeah.
Sure, Mr. President, this is going to be fine.
It won't, like,
blemish your record for all of history. All right, speaking of things that I'm sure are going to be
fine, Hacktoberfest was such a, it's a good idea in theory, potentially. We're, like, in middle October
or deep into October already. I don't know how your repos did, but I got a lot of attention. Did you?
Yeah, no, mine, yes, mine didn't so much. I'll tell you about that, but go ahead and tell people where we're going with this.
Okay, so Hacktoberfest.
Hopefully you know about it, but if you don't, it's an interesting idea sponsored by DigitalOcean
and other sponsors.
Again, DigitalOcean is not sponsoring this episode.
Overall, it's a good idea.
So the idea is to encourage people to contribute to open source by bribing them with a t-shirt
and other swag.
That works for geeks. We love our t-shirts. Like, how else are you going to be, like, wearing your clothes? What
do you put in your closet? Yeah, maybe, maybe you can buy a t-shirt with a half an hour of work,
but we're gonna, like, have you work for, like, hours and just get one t-shirt. Anyway, there's always
been some spam with this people abusing it but I think it was not as prevalent as this year. But what happened
this year, and I'm going to link to a video by Anthony Sottile titled What's Wrong with Hacktoberfest.
He introduces what Hacktoberfest is, some of the problems, and he recommended some solutions. We're
not going to cover those today. But apparently there was a YouTuber this year, I think it was in India, that did a video on how to get a free t-shirt by doing, like, it's basically
how to get free swag with not much work. And he did this video to show you how to submit a pull
request to a project and only do something like update the readme to say "an awesome project" or change "it's" to "it is" or something like that.
And then do a pull request saying document or improve docs
and do that for four different repos.
And there you got a t-shirt.
Yeah, I met many of these people.
It turned into a big problem.
So I was actually really thrilled with how fast DigitalOcean and whoever's working
on Hacktoberfest fixed it, or at least hopefully. I'm sure people are still trying to do this,
so I'm sure there's a lot of spam going on. But they changed the rules. So as of the 3rd,
they updated the rules to try to reduce the spam. One of the big things is maintainers can opt in by adding a hacktoberfest
topic to their repo. So a whole bunch of stale old repos won't get hit, hopefully. And then also
you can mark any PR that's dumb as invalid and it invalidates stuff. And actually the full rules are,
let's see, we're going to have it in the show notes, it's a little pseudocode. So if you submit a PR in the month of October and the PR is labeled as hacktoberfest-accepted
by the maintainer, or you submitted it to a repo with the hacktoberfest topic and the pull
request was merged or it was approved. So you can't just submit it and get your t-shirt.
It has to be like some maintainer has to say, yeah, this is good or I approve it or whatever.
It's not automatic anymore.
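The rule as read out above can be sketched as a small Python function. This is only an illustration of the spoken pseudocode, not the official Hacktoberfest implementation: the `PullRequest` class, its field names, and the exact spelling of the "invalid" label are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PullRequest:
    # Hypothetical record of a pull request; all field names are illustrative.
    opened_on: date
    labels: frozenset = frozenset()
    repo_topics: frozenset = frozenset()
    merged: bool = False
    approved: bool = False


def counts_for_hacktoberfest(pr: PullRequest, year: int = 2020) -> bool:
    """Return True if the PR would count under the updated rules as described."""
    submitted_in_october = pr.opened_on.year == year and pr.opened_on.month == 10
    if not submitted_in_october:
        return False
    # A maintainer can invalidate a PR (label name assumed here).
    if "invalid" in pr.labels:
        return False
    # Path 1: the maintainer explicitly labels the PR as accepted.
    accepted_by_label = "hacktoberfest-accepted" in pr.labels
    # Path 2: the repo opted in via its topic AND the PR was merged or approved.
    accepted_in_opted_in_repo = (
        "hacktoberfest" in pr.repo_topics and (pr.merged or pr.approved)
    )
    return accepted_by_label or accepted_in_opted_in_repo


if __name__ == "__main__":
    drive_by = PullRequest(date(2020, 10, 5), repo_topics=frozenset({"hacktoberfest"}))
    print("Unreviewed PR counts:", counts_for_hacktoberfest(drive_by))
```

Either way, a maintainer has to act before the PR counts, which is the point of the rule change: opening a PR is no longer enough on its own.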
And also, if you are a maintainer and you've dealt with all the spam, sorry about that.
But also, I'd like to encourage more people to do Hacktoberfest because it's a cool thing.
I didn't want to bring it up before because I didn't want to encourage spam,
but I think these changes will help.
And if you're a maintainer, please be sure to do those notifications by November 1st
because that's the deadline.
Yeah, interesting.
I had no idea what was going on until I saw Anthony Sottile's post or Twitter message. You know, somebody came over to some of the,
I have 222 repositories, most of which are public between the courses and various other things. So
there's a bunch of opportunity to go in and make changes, right? So somebody came along to the
beginner, the Python for Absolute Beginners course and said, I would like to add a few little tips
for some beginners to make this slightly better.
You know, we can't change anything
because it needs to match what's in the video.
But if you had a little section that had like some tips
and they were meaningful, sure, I guess that's okay.
And then the next day I woke up and it was like 10 PRs,
not necessarily all from this person,
but from a bunch of different people
with weird things like change the readme
from this, you know, check out our latest course to
check out the latest course, and just changing, like, the word "our" to "the." And I'm like, what is
going on? Then I saw Anthony's thing and I'm like, okay, close, close, close, close, close, close, close. Just,
just straight out, like, I don't even want to talk to these people. This is super annoying. And they
weren't just making changes to the readme. They would go in, they would make changes to, like,
XML configuration documents. I'm like, you can't change that. That's, that's read
by the machine, right? That's going to break something if I accept this. Not only is it, like, annoying that
I gotta deal with it, but if I were to accept that, I'm pretty sure it would break. I think maybe it
was, like, formatting, like putting a closing node, like, on a line above, or, like, putting
a space. I mean, I don't think it actually broke it, but it was really weird stuff, and I didn't understand it was coming from Hacktoberfest. I
was being hacked by the Hacktoberfesters. Yeah, but it has stopped since they've made these changes,
which is great. Oh, has it stopped? So most of that stuff was in the first few days. Yeah, I haven't
seen any the last couple days. I didn't realize that's probably because the rules changed. I just went
through and like just denied everything that I saw coming in.
Yeah.
I wonder if they forced
the takedown of that video
or maybe it's gone.
Yeah.
Who knows?
Who knows?
Well, I know that that's it
for all of our main topics.
Got anything else
you want to throw out real quick
before we wrap it up
with a joke?
I don't.
I could totally use a joke.
But do you have any extra things?
I do.
There's a really cool conference.
It's, I believe, theoretically was supposed to be this year in Vancouver, B.C.,
which is an absolutely wonderful town to visit, called PyCascades.
Cycles between Vancouver, Seattle, and Portland.
Well, this year it's taking a diversion to cycle to the internet because 2020.
Although it's in 2021, like still planning now. So PyCascades 2021 will take
place Saturday, February
20th from the world.
I don't know if they're having any local
stuff going on, but
anyway, it's basically a virtual conference
and the call for proposals is open.
So if you'd like to give a presentation there,
you can do that by November
10th. Submit proposals.
So that would be cool. I think talking at get-togethers like this, meetups, the smaller
not full-blown PyCon, but PyCascades and other types of events are a really
good way to sort of raise your profile and stretch your comfort zone as a developer.
So I encourage people to do it. Also, Patricio... Yeah, I spoke at
the 2020 version that was
just before the world fell apart. That's right, I was there. My daughter and I watched from the
back. It was great. Next thing, other thing: Patricio Reyes, who is a researcher at the Barcelona
Supercomputing Center, which, by the way, they have this virtual tour he sent me. Oh my God, it is so
awesome. They have, like, a pop song for it. It is held inside,
the, literally the supercomputer is inside an old cathedral. So, like, you know,
where all the arches are and where the sermons would have been given, like, that's where the
supercomputer is. That's pretty awesome. Can we put that link in the show notes too? Yeah, yeah, I'll put
it in there. Yeah, but that's not why he sent it to me. He just said, hey, I happen to work here
and I use Jupyter a lot.
You spoke about Black Cell Magic
and then another Black formatter plugin
for Jupyter notebooks.
So he said,
you should also check out nbblack,
nb underscore black,
which works in Jupyter and JupyterLab.
And there's another one
that only works in JupyterLab
called the JupyterLab code formatter. So just like always, we mentioned one thing that we kind of discover and then
listeners are like, that's great. And, and, and here's a bunch of other stuff. So thank you for
that, Patricio. Yeah. Nice. But I love that. I like the multiple tool thing. That's fine.
Yeah, indeed. All right. Let's do a joke. I've chosen some very clear ones that actually have
a visual component as you will. I don't know why I do that, but that's what I've done.
So I'll let you do the first one.
I'll do the second one.
So people who don't know, this is a classical programmer painting.
And the idea is this is a legitimate, real painting from some museum.
Typically, they're hundreds of years old, but instead of having, you know,
like, Flowers in the Tide Pools or whatever, some random thing that the artist named it, it's
renamed with a programming title. Okay. Yeah. So why don't you quickly describe your picture and then tell us the title. Okay.
So the picture is, it's a white, kind of a white gray background.
I think it's snow or something.
There's some horses running.
There's a whiteout blizzard almost.
Yeah, it's horrible.
Yeah.
And there's some horses running, two horses running, pulling a, what, like a sled or something?
I don't know.
And there's somebody laying on the sled.
All right.
What's the title?
Delivering a Feature in the Time of a Code Freeze. This is by Anthony Petrowski,
oil on wood, 1883. That's beautiful. All right, so the one that I got here, it's these three guys.
They look highly skeptical, almost like they're on some kind of mission, sneaking out of, like, really tall grass
on a boat in some kind
of swamp. You can see them, like, really
slowly sort of approaching.
And the title is Red Hat Enterprise
Linux Sys Admins Entering
the Docker Convention Floor,
oil on canvas, 1882.
Isn't that a great one? Like, look at their
face. Yeah.
You gotta check this out. Click on
the link in your podcast player and see it. They're like angry pirates in a canoe. Yeah, it's sort of a
piratey feel to it. Like, they're like, oh, what are we doing here? We're breaking in. It's such a weird
world, this Docker, Kubernetes. I love this thing of, like, programmer quotes on old paintings.
That's, uh, it's funny. Yeah. If there's ever some sort of
like artwork exhibition
at a PyCon,
this is happening.
We could probably
do it virtually somehow.
Try to do it
at a virtual conference.
Yes.
I think we could.
Yeah.
Yep.
All right, well,
thanks for being here as always
and thank you everyone
out there who's listening.
Yep.
Bye-bye.
Bye.
Thank you for listening
to Python Bytes.
Follow the show
on Twitter
via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Ocken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.