Python Bytes - #264 We're just playing games with Jupyter at this point
Episode Date: December 22, 2021. Topics covered in this episode: Jupyter Games; Canary Tokens; A reverse chronology of some Python features; Hyperactive GCs and ORMs/ODMs; Extras; Joke. See the full show notes for this episode on the... website at pythonbytes.fm/264
Transcript
Hey there, thanks for listening.
Before we jump into this episode,
I just want to remind you that this episode
is brought to you by us over at TalkPython Training
and Brian through his PyTest book.
So if you want to get hands-on
and learn something with Python,
be sure to consider our courses over at TalkPython Training.
Visit them via pythonbytes.fm slash courses.
And if you're looking to do testing
and get better with PyTest,
check out Brian's book at pythonbytes.fm slash PyTest. Enjoy the episode. Hello and welcome to Python Bytes, where we
deliver Python news and headlines directly to your earbuds. This is episode 264, recorded December
22nd, 2021. I'm Michael Kennedy. And I'm Brian Okken. And I am Kim van Wyk. Kim, welcome. You've been on
TalkPython before, but not here. Yeah, that's right. I've done a couple of TalkPythons with
you, including the one where you bravely submitted yourself to questions from your audience. The
other one, I taught them some small tools, so that was very good fun. I'm very much looking
forward to this one as well. You know, both episodes you were on were super popular. One
was about little automation tools and just cool stuff that people can pick up
and use really easily there.
And that was great.
And the Ask Me Anything was surprisingly
one of the more popular episodes as well.
So thank you for being part of that.
And you've been part of the audience for sure.
You've offered comments and feedback
as we do the live show and we're recording.
Basically so, yeah, to be honest.
Yeah, but now here you are on stage.
Thank you for being here.
Tell people a bit about yourself before we get started.
Sure.
I am a DevOps engineer at the moment.
I'm also a move engineering based in South Africa, working with a home loan provider,
a mortgage provider in the American sense.
I've been probably doing Python for close on 20 years.
So the fact that I've shaved means you can't see the gray beard, but I have been around for a while. The gray beard. We're going to come back for
some good jokes at the end about this as well. Not your beard, but just beards in general.
Awesome. That sounds like really fun stuff. So yeah, thanks for being here. Now, before we
actually get into the main content of the show, Brian, I want to do something just a little bit meta.
So I went and pulled up or created a questionnaire for people.
When we first created Python Bytes, we're like, all right, it's 20 minutes.
The time of this episode is going to be 20 minutes.
So we're just going to like knock it out, you and me real quick.
And I think it's grown a little bit.
We cover a little bit more detail.
We've added a joke.
We've added a few little extra things. We've brought on guests like Kim. And is that still in line with
what people want when they signed up? So I put together a questionnaire here that just asked
three simple questions. And I'd really appreciate if listeners could go to the show notes and just
click on the link that says this three-question Google form
or find it on our Twitter account or wherever,
but it should be in your podcast player show notes
right near the top.
And they can just click that and fill it out
and give us some quick feedback on the idea
of having a guest, on the length of the show, and so on.
So anything you want to add about that, Brian?
Just encourage people to give us feedback so we know?
Yeah, I'd love to hear feedback
because sometimes we feel a little guilty
that we're running long, but I enjoy the, a little bit more in-depth conversation. We still
don't go super deep, but I think it's a good, well, I'm flavoring the survey though. So forget
what I said. No, I'd love to hear feedback of what people think. Yeah, absolutely. Yeah. So
people can give us feedback there. We'd really appreciate it. The way people seem to be feeling so far is they kind of like the length. They
definitely like the guest format. So you're welcome here, Kim, according to listeners.
Fantastic. But yeah, I think people are generally liking it, but still, let's just
hear from everyone, because if a bunch of the people in the audience are like,
no, we really want no more than 20 minutes, and my going on about this is actually making it still longer, then it would be great to know, right?
So we'll go from there.
And with that, you know, let's play a game.
Jump in the first topic.
Yeah.
I want to talk about Jupyter games.
And the idea around this is ipycanvas with Box2D. I'll get a little bit more into it, but the
gist is that making small video games is one of the ways that a lot of
us started programming. I know that was the case for me. They were
not difficult games, but it was difficult enough with these 2D engines.
And some of that's lacking, and I haven't seen that in Jupyter before.
And Jupyter is an excellent platform for a lot of things, especially teaching people who don't have computers, if they use an iPad or something like that.
Often they can still get access to Jupyter through hosted systems. So this
article talks about writing 2D games, and mostly it's about a 2D physics engine built around a
library called Box2D, which is a C/C++ engine, but it's something that you can access
through Python. And the author...
Yeah, those kinds of physics stuff. You know, when people think of games, they think, oh, here's what
I've got to do to get the picture on the screen. Oh, that's just the start. You need physics, you
need collisions. There's so much stuff that also gets done. So this is really cool.
Yeah, things like physics and gravity and collision detection. And the examples on this page are great.
The person who wrote it is Thorsten Beier.
I think he's got a library called pyb2d, which is one of two different ways to access this Box2D system from Python.
But it's pretty cool. One of the things I like
about this article is that it has lots of pretty examples, but physics engines, even
if they're built for games, can also be used for things like an engine simulation or
even airflow simulations. So there are a lot of cool uses for this, too, that
are outside of games. But one of the incredible things is how small the programs can
be. This article has an attached hosted
notebook that has things like Angry Shapes, which is like Angry Birds, and a rocket game. And there's
a color mixing game, which I was just fascinated by. There's a bunch of colors that drop into it.
It isn't listed in the article, but if you go to the examples, it's a kind of color
mixing thing. And it's only about 70 lines of code. And with that, you can have some amazing physics examples.
And I'm pretty excited about this, actually.
So I'd like to do this.
You know, I think this makes a lot of sense in the notebook form
because you're trying to visualize certain things.
And sometimes graphs are fine,
but other times they just don't capture like flow and that kind of stuff.
And it seems like game animation would
be great. Kim, what do you think? I was also going to say, if you can get something very impressive
done in 70 lines of code, as a learning tool that's brilliant, because that's effectively a screen of
code. Yeah. Otherwise, if you're looking at hundreds and hundreds of lines,
you know, for a seasoned developer that's perfectly reasonable, but to a new person that must look
overwhelming. Yeah. If you can
fit it on a single screen and say, here it is, this is
everything you need to make this thing work,
it's quite a powerful tool. And it looks like
a lot of fun, actually. It does look fun.
Yeah. The article talks about some interesting
hoops he had to jump through using
ipyevents and
ipywidgets and the canvas
to be able to draw things and get events from
people. But this is just some fun stuff. We're showing on the screen
the thing like Angry Birds. And to be honest, the playability of it
isn't on the level of, you know, playing an Xbox or something like that.
Obviously, you probably won't hook up a controller to it. Yeah. But that you can do something like
this so quickly is pretty amazing. And also,
on the other hand, once you write it
yourself, the playability actually doesn't matter that much. I think you're looking at interacting with
the thing you wrote. Yeah, yeah. I love it. This is really cool.
Nice find, Brian.
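To give a feel for what a physics engine like Box2D takes care of on every frame, here is a minimal sketch in plain Python of gravity plus collision with a floor. It does not use Box2D or pyb2d at all. The `Ball` class, the `step` and `simulate` functions, and the constants are made up for illustration, and a real engine handles far more (rotation, friction, joints, many bodies).

```python
from dataclasses import dataclass

GRAVITY = -9.81      # m/s^2, pulling down along the y axis
RESTITUTION = 0.8    # hypothetical bounciness: 1.0 = perfect bounce, 0.0 = none


@dataclass
class Ball:
    y: float    # height of the ball's bottom edge above the floor, in meters
    vy: float   # vertical velocity, in m/s


def step(ball: Ball, dt: float) -> None:
    """Advance the ball by one time step (semi-implicit Euler)."""
    ball.vy += GRAVITY * dt      # gravity changes the velocity
    ball.y += ball.vy * dt       # velocity changes the position
    if ball.y < 0.0:             # collision detection: did it pass the floor?
        ball.y = 0.0             # collision response: put it back on the floor
        ball.vy = -ball.vy * RESTITUTION   # bounce, losing some energy


def simulate(start_height: float, seconds: float, dt: float = 1 / 60) -> list[float]:
    """Return the ball's height after every step, e.g. for drawing on a canvas."""
    ball = Ball(y=start_height, vy=0.0)
    heights = []
    for _ in range(round(seconds / dt)):
        step(ball, dt)
        heights.append(ball.y)
    return heights
```

In a notebook, the list returned by `simulate(2.0, 3.0)` could be fed to whatever drawing code you have, one height per frame.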
All right.
Let me tell you about
some really interesting
cybersecurity side of things.
So I'm going to first tell you
about this thing called a Thinkst Canary,
but that's not actually
what I want to talk about.
It's just to set the stage.
Okay.
So here's a challenge,
something that always stresses me out is what if somebody
was to break into your app, into your systems, into your cloud infrastructure or whatever,
how would you know, right? What would be the indicator? As long as they don't
trash it, and they don't lock it with a cryptolocker or other ransomware like that,
they could just cruise around in there, right? So
this company Thinkst Canary created this, I think you can put it in the cloud as like a hosted
container type thing, or you can get like a little Raspberry Pi like things and put them
physically on your network if you had a physical network. And you could say you act like a SQL
server, you act like an exchange server, you, if somebody tries to search the network and says,
show me all the active directories, you be that. Maybe we're not even using active directory
because we're not on Windows. But if somebody breaks in, they may well start looking for those
types of things. And what they'll do is they'll trigger alarms if somebody tries to interact with
them and normal things shouldn't, because only if you're like trolling around looking for them,
should it be discovered, right? So that's what this is.
And with this whole Log4Shell stuff that's going on,
it's just such a nightmare of, like,
well, we installed this app that did invoice management for us.
Did it have a Log4Shell vulnerability?
I don't know, maybe they said they fixed it.
But if somebody gets in,
it's not just that we have to patch Log4Shell
or the Log4j version.
We've also got to then know what else has been
run, because they could have installed whatever, right? Yeah. So the thing I actually want to
recommend to Python people is this thing called canary tokens. So check this out. This is
fantastic. So what you can do is you can get different things that will then trigger alarms
like emails or other sorts of stuff to you. So I can come over here and I can say,
I would like to get a URL. And if anybody visits that URL, send me an email and say,
you know, whatever message I put in here. So I could come in and say, here's a URL and send me
at Michael at TalkPython for my email and say, this is hidden in the admin section unused or
something like that. If somebody sends me an email, if I get that email,
somebody's gone in and clicked that link in the admin section of my site.
And if I didn't, it gives you like IP address
and all that sort of stuff of what comes back.
So if I didn't do it or it looks like an unknown IP,
that should be highly concerning, right?
So what else?
That URL is interesting.
I can get a DNS token. If somebody
does a DNS lookup on rollouts.pythonbytes.fm, I can get an alert about that. That'd be pretty interesting.
A unique email address, if somebody ever tries to contact that. A Word document: you get
a Word document and put it in, say, SharePoint or something dreadful like that. And if it gets opened, you'll get an email that somebody got that.
Let's see.
You've got a WireGuard VPN file.
You can create a custom EXE.
And if somebody runs your EXE, or a SQL Server instance, or you can even do a Log4Shell link directly that will run.
So if you are trying to like figure out, just put stuff in there to let you know
if somebody gets into a part they're not supposed to be in.
This is really cool. It's free.
It doesn't cost anything. It doesn't require
any setup. Put a Word document
in a folder. If it gets opened, let us
know. What do you think?
I was going to say, I've been looking for
ways to do exactly this kind of thing
because I'm totally unique
in being concerned that Log4Shell has got impacts that I can't see on our systems.
Just because your public-facing systems happen not to have used Log4Shell things doesn't mean that you're necessarily safe.
All it means is that if by some other means somebody's got into one of your internal systems, you wouldn't necessarily know that.
So I'm very much interested in this. I knew
about canaries already. Thinkst happened to sponsor the local South African PyCon ZA conference.
But canary tokens are a very funky add-on to that. Exactly. I knew about the canaries
as well. I'm like, ah, but that doesn't really apply to the world that I live in. I'm not an
enterprise. But these make a lot of sense, and they're free, which I think is cool. Yeah. Here's what it looks like if you get
a notice, it says, this is the email I got. Your Canary token was triggered. The channel was HTTP.
The token was that this is a test, the IP address of the person. So this was one of those URLs.
Somebody interacts with this URL. Let me know. Here's their user agent. Here's the message.
There's the IP and so on. So you would just get a notice like that that says somebody clicked on
something they shouldn't have had access to. Yeah. So anyway, pretty neat. Brian? Yeah, I'm not sure...
Yeah, it's actually pretty cool. Some of the things
I wouldn't even expect, like somebody cloning a website. Yeah. Didn't know that was a thing.
I'm scared now, to be honest.
I didn't realize that was something I should be worrying about.
Get an alert when a MySQL dump is loaded.
Like, okay.
How does that happen?
I don't know, but that's pretty awesome that it's possible, and also frightening.
Yeah.
Yeah.
And Sam out in the audience says, ironically, the Log4Shell one might have its own vulnerabilities.
You know, that thing's been patched a couple of times.
It's going to be a big, big problem.
Anyway, canary tokens.
I think this is broadly useful for Python people.
You could put the URL stuff inside of your app.
You could put an email inside of locations.
There's lots of stuff like the database restore type things and so on.
This looks useful. Yeah. So I'm still a little lost. You throw this, for instance, like you said, in
the admin section that you shouldn't be using, and you just know about it so you don't click it or
something? Yeah. So imagine this. Imagine in your admin section you've got, like, a search
for user button, and then next to it you could just put an export all data button.
Yeah.
And then put one of these URLs at the endpoint.
And nobody who works, you just tell everyone,
never click the export all data.
It doesn't do anything.
But if someone were to break in,
what's the first thing they're going to want?
Oh, well, let's get the export all data.
Boom.
They'll go click it and you'll know.
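As a rough sketch of that decoy button idea: the handler below is what an "Export all data" endpoint could call. Everything named here is an assumption made for illustration: `CANARY_URL` is a placeholder for whatever URL the canary token service gives you, and `trip_canary` and `export_all_data` are hypothetical helpers, not part of any framework or of the Canarytokens service.

```python
import urllib.request

# Hypothetical placeholder: paste the URL that the token service generates for you.
CANARY_URL = "https://example.invalid/your-token-here"


def trip_canary(url: str = CANARY_URL, fetch=urllib.request.urlopen) -> bool:
    """Request the canary URL so the token service sends its alert.

    Returns True if the request went out, False if it failed. Failures are
    swallowed on purpose so the decoy page behaves the same either way.
    """
    try:
        fetch(url, timeout=5)
        return True
    except Exception:
        return False


def export_all_data(fetch=urllib.request.urlopen) -> str:
    """Decoy handler for the 'Export all data' button that staff never click."""
    trip_canary(fetch=fetch)
    # Give the intruder something boring so nothing looks unusual.
    return "Export queued. You will receive an email when it is ready."
```

Because the server makes the request in this version, the alert would show the server's address rather than the visitor's. Pointing the button straight at the token URL, as described in the conversation, is the simpler variant and keeps the visitor's IP in the alert.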
They're still in.
It's bad, but at least they're not in
and just have unlimited time to be in
you know? Yeah. You can put some other stuff in too. Let's say you've got a Django website and
you load a PHP admin page or something like that, just at the same URL,
in case somebody's trying to grab that or something. Yep. Yeah, a lot of interesting
little breadcrumbs you can leave in there. Okay. Kim, that brings us to yours.
Sure.
The first topic I was going to talk about is actually two similar, but not quite the
same, pieces of software. PyAutoGUI and pywinauto are both toolkits for automating GUIs,
effectively for interacting programmatically with GUIs.
Nice.
Which is normally really hard, right?
Hey, before you go on, before you go on, could you give that like three control pluses?
Just for the watchers.
Just now it's a little bit on the small side.
Thanks.
How's that?
A little more.
Space to play with.
There you go.
Fair enough.
Well, let me just, while I remember, do it to this one as well.
They both happen to be Read the Docs documents.
So you're quite right.
Programmatically controlling a GUI can be quite a pain, particularly for GUIs that aren't particularly easy to understand.
And the reason I bring tools like this up is that there are quite a lot of use cases.
I can think of two examples off the top of my own career, and I'm sure there are hundreds more, where this kind of thing is useful and you might not know it's something you can do.
And the kind of examples I'm thinking of are particularly in, I'm sure, much enterprise software
and in industrial software.
When you get a piece of equipment, you frequently get a GUI tool that accompanies it.
Probably no API, right?
Well, no API whatsoever.
There's a tool you fire up and you set all the settings.
But because the company that supplied you the piece of equipment, they don't write software.
It's not their thing.
They either outsource the tool or the intern writes it.
And it has 50 checkboxes laid out in grid form.
And you need to set it up every single time you want to use that piece of software.
There's no ability to remember what you set.
There's nothing to do.
And I've worked with a couple of those systems.
And I see, Brian, I think you probably have as well.
We basically, there's a piece of paper
next to the computer the software is on
with a screen print of what the settings should be
so that the poor sucker has to come down and use it,
knows which of the 50 tick boxes to check
and then they have to check
that the pattern effectively matches on screen
and then they hit run.
And something like PyAutoGUI or pywinauto
is useful so that you can effectively script the startup of that app.
And you can write a small piece of Python that fires this tool up, identifies all the checkboxes, ticks the ones you've programmed in, and then either leaves it for the human to push go, or whatever it is the app does, or, for that matter, pushes go itself and then closes the app and records that it did that.
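A minimal sketch of that "tick the right checkboxes" script might look like the following. The checkbox names and screen coordinates are invented for illustration, and the sketch assumes every box starts unticked, since a click only toggles it. The clicking function is passed in, so the logic can be tried without a GUI. `run_for_real` uses `pyautogui.click` and `pyautogui.PAUSE` from PyAutoGUI, but it is not called here.

```python
# Hypothetical screen coordinates of each checkbox in the vendor tool's window.
# In practice you would measure these once (for example with pyautogui.position()).
CHECKBOX_POSITIONS = {
    "checkbox1": (120, 200),
    "checkbox4": (120, 260),
    "checkbox27": (320, 200),
}

# The settings that used to live on the piece of paper next to the computer.
WANTED = ["checkbox1", "checkbox27"]


def apply_settings(wanted, positions, click):
    """Click every wanted checkbox and return the names clicked, in order.

    Assumes every checkbox starts unticked, since a click toggles it.
    """
    clicked = []
    for name in wanted:
        x, y = positions[name]   # a KeyError here means the name is not mapped
        click(x, y)
        clicked.append(name)
    return clicked


def run_for_real():
    """Drive the real mouse. Needs PyAutoGUI installed and a desktop session."""
    import pyautogui  # imported here so the rest of the sketch runs without it

    pyautogui.PAUSE = 0.5   # pause after each call so the GUI can keep up
    return apply_settings(WANTED, CHECKBOX_POSITIONS, pyautogui.click)
```

Coordinates are the most fragile way to find a control. Where the toolkit exposes control names or IDs, as pywinauto can for many Windows apps, looking controls up by name is likely to survive window moves better.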
So that kind of use case is very powerful. And I think there are lots
of cases, particularly in enterprise software or internal software that somebody wrote for
the company that does something very useful, but it's been around for 20 years and the guy who
wrote it is not around. Nobody wants to touch it because the source is terrifying. So nobody's
going to sit down and change it. How do you even get that Visual Basic
6 or Visual Basic 5 installed again? Well, exactly. You don't even know, right? How do you even compile it now?
Exactly.
So to be able to wrap it is a very powerful thing to be able to do.
And the other kind of use case that's somewhat related, it also comes to mind, is I've spent
a large amount of my career doing industrial automation, factory-based type work.
And there, the faster you can go and the fewer steps you need a human to repeatedly do, the better for you in many ways.
The human's time is best spent actually manipulating objects and checking things rather than opening pieces of software and clicking boxes and closing them again.
So quite frequently, we've had cases on the production line where the vendor of the chip we're using has supplied this tool that does some security-related thing.
And it's a GUI tool.
And every single time you would have to open it up, you'd have to click the same two boxes, you'd have
to say yes, secure this chip, close it again, and repeat: wait for another one to arrive at your
workstation. And if you can automate it, again with a wrapping tool, nobody need even be involved at
all. Effectively, part of your production process is you wrap it: you fire up the tool, you click the two
buttons programmatically, you hit go, and you close it again and repeat.
And again, I personally have encountered situations
where that's useful, and I'd like to,
I would imagine I'm far from alone in it,
so I just thought I'd mention these things do exist.
I suspect lots of people do use them,
but for people who don't know they're there,
very useful things to be able to do.
Wrapping GUIs is, it's a bit tedious up front
because often these tools aren't
very well written. So you'll have checkbox one, checkbox four, checkbox 27, checkbox 295,
and no obvious naming consistency with what they do or how they work. But once you've figured it
out, let the computer worry, let the script worry about what those checkboxes do.
I've seen the backside of that code where you're, like, looking at some event handler
and it's like, if checkbox24.checked then do this. Like, what in the world? Who
didn't want to name this? Because they've got to program against those names. That's insane.
Well, they just do one at a time when they're working on it. Exactly, yeah. So you're working
on one feature and you go, oh, I need a checkbox. Oh, the default is checkbox 24. Then you deal with the callback handling and you just did it. So, you know, it's 24.
So you don't want to bother changing it. That's cool. Does this necessarily have a user interface, the thing that this...
I don't think these do, like, web stuff. The web automation is other tools. Well, I presume you
could automate a browser, but by the time you're doing that, you might as well be using
the tools designed for it. Yeah, Selenium or something. What I'd really hope is that
anybody who has any sort of tool that they're writing on the web... Internal tools often
get written in web frameworks, and then people forget to put IDs on
things. So yes, the best way to automate web stuff is to have an ID that you can grab onto,
but often they're just these nested div nightmares. But anyway, yeah, there's a couple of
tools that we've used pywinauto for, and
it's pretty nice. Yeah, very nice. Yeah, it seems like if you're building a GUI app, you could test
it with this, right? Sort of full-on integration tests from the outside. And also, I was talking
to somebody and they were like, well, this app that I work on doesn't have a concept of a back
button. So you drive into the menu, hit a thing, go, and then it'll take you back home. It's like 10 steps, right? I could definitely see a little toolbar thing where you press
a couple of buttons, like, get me to this scenario and I'll put the last thing in. Get me to that
scenario: do the nine steps, I'll do the tenth. Exactly. Yeah. In many ways, the way I've
mainly encountered it has been the first scenario laid out: not so much actually automating the whole of the tool, but setting the tool up so that it is in the right state for what the company needs, without having somebody have to either consult a document and risk getting it wrong, or not know which of the settings they should have because that piece of paper isn't with the computer anymore.
All that kind of thing.
It shouldn't happen, but it does.
And it's much easier to have this kind of, to have the computer worry about what the settings should be.
Ideally, the program should remember that,
but, you know, if they don't, they don't.
There's not much you can do to change that after the fact.
It's like external intelligence for a bad app.
That's right.
Well, there's also like API stuff that people forget about.
Like, I've got a device that I need to automate
connecting it to Windows
and getting the device set up or something
every time I plug one in.
And, you know, just automating that works sometimes too.
So anyway.
Oh, yeah.
All right, Brian, over to you.
Thanks.
I saw this, Brett Cannon wrote an article
called a reverse chronology of some
Python features. And I really love this article. It's pretty simple. One of the things I like about
it is just because we cover so much and we've been covering Python releases for quite a while.
I kind of forget which release got which feature. So a really brief
rundown of some of the different features
is nice.
Like last week, we were talking and saying,
well, if you're on 3.7,
why would you want to move forward?
And, you know,
I can't remember which features are in which.
So having a quick bullet list is handy.
Like, in 3.10
we got the match statement.
Of course,
we've talked about that recently,
but also better error messages.
And I'm going to pause a little bit.
Brett brings up in the introduction
that if you're one of those people
who think Python's kind of getting bloated
and they're throwing too much stuff in it,
and you wish that we had the good old days
where you could just hold all of Python in your own head,
and you'd kind of throw everything out,
well, he recommends going down this list and picking the first feature that you don't think you could live without.
And everything before that led to that.
So you can't throw that stuff out either.
It all kind of goes together. And one of the examples is the match statement,
or, what is it, pattern matching, which was sort of controversial. But
the process to get that to work involved making a new
parser for Python, or using a new parser for Python. And with that new parser, things like better
error messages are possible. So if you like better error messages, which I do, that means 3.10
and everything below kind of has to stay. But anyway, it's kind of funny. Moving on,
I forgot that the dictionary support for or-equals (|=) came in in 3.9.
So if somebody's thinking, well, why should I upgrade?
This is a good list to take a look at.
Nice.
All right.
I did the little exercise.
I've decided 3.7.
3.7 if you want.
So what was the thing in 3.7 that you can't live without?
So the dictionary preserving order, yeah.
That stuff is really nice for, like, reading and writing files and making sure that they don't diff hard,
you know what I mean? So they're in the order you put them there. All the other
stuff, I'm not hating on it. I like the walrus operator. I like some of the other things. I like the
lowercase list[int] rather than importing types. All those are great. I'm not knocking them.
I'm just saying, where would I go, oh, this starts to hurt? Where it really starts to hurt for
me is at 3.7 and below. Well, I was trying an interactive
Jupyter system the other day, looking at some data science stuff, and it was already set up.
And I tried to throw in the f-string value-equals thing to be able to quickly debug an
item, and it didn't work. Too soon. What the heck? And it turned out it was using 3.7 and not
3.8. And apparently I'm very used to that, and I don't think I could live without it.
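The f-string "value equals" feature referred to here is the `=` specifier added in Python 3.8. A small sketch, with made-up variable names, of what it does:

```python
price = 19.99
name = "widget"

# The "=" specifier emits the expression text, an equals sign, and the repr of the value.
debug_line = f"{name=} {price=}"

# A format spec can follow the "="; with a format spec, format() is used instead of repr().
rounded = f"{price * 2=:.1f}"
```

Here `debug_line` is `name='widget' price=19.99` and `rounded` is `price * 2=40.0`. On 3.7 the same lines are a syntax error, which matches the surprise described above.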
Yeah. And then a reminder also that 3.11, when it comes out in a year,
is going to have a lot of speedups.
Yeah, if that comes with a lot of the performance stuff,
then that's my new number.
Kim, where are you?
If you forced me to roll back,
I would refuse to go further than 3.6
because I must have those F-strings.
Yeah.
Because they basically just make your code
so much more attractive.
That said, while I don't necessarily use
everything that comes in the new versions,
I don't particularly have any problem with them being there.
I'm quite happy to just use the parts of Python I want.
And what really happens to me is that I don't necessarily know I can do something
until two versions later.
I probably only started doing that value-equals on 3.9, for example.
Mainly because that's probably the first time I needed it more than anything else.
I don't particularly rush forward and use the new features when they're
available,
but I'm glad that they're there
when I do ultimately want them.
Yeah.
3.6 is an interesting example that you bring up, because it's got f-strings.
It's got a whole bunch of other stuff too,
but really we can stop with f-strings.
Pretty much.
Yeah.
And then the debugging stuff. Sam in the audience says,
yes, f curly bracket name equals
is indispensable for debugging.
Oh, yeah, I'm with him.
As I say, I hadn't used it
when it first became available,
but I would really not want
to not have it available now.
Yeah.
I'm a caveman print debugger.
So, yeah.
Kim, I like your take on it.
Like, it's not going to hurt me
if I don't care about it.
I think one of the powers of Python is that you can be very successful with Python with a partial, quite partial,
understanding of what it even is. You don't need to know what a generator is, what a yield is,
what an expression is, what a class is, maybe not even how to create a function. Just write the
code top to bottom and it'll probably still do something for you. And so you can sort of bring these in when it makes sense.
Yeah, I would definitely still not teach match statements to beginners.
It's unnecessary.
Exactly.
Yeah.
Totally agree.
Whereas I would use f-strings, for example, for a beginner, because it's just so much more readable than the other stuff is.
But you're right.
You don't have to use it all because it's there. I'm sure there's people out there who feel like,
I got to use it, it's here.
But no, I agree with you.
All right.
I don't think I've ever written a walrus operator,
for example.
Sorry, you're saying.
Yeah, I actually took down the Talk Python website
or the training website,
one of them, with the walrus operator,
because I put the walrus operator in a utility script
that's not actually used by the site,
but the site scans all
the files trying to figure out where the handlers, the view methods, are, and it killed it because
I forgot that, this is way back, it was still running 3.7. So that was my first, really, oh my
gosh. But yeah, now I use it. It's good. All right. So I want to talk about something that I've actually
personally been working on lately this is a follow-up to a
TalkPython episode I did where I interviewed Mike Bayer, came on, did a great job, talked about
SQL Alchemy 2 and so on. And I mentioned that just the way that Python's GC is set up is it's
somewhat hostile to things like ORMs where they have to create a bunch of objects and return them to you
in one batch. And what do I mean by that? Well, if I'm going to do a query and it's going to return
a thousand records, like the best case scenario is it has to create a thousand classes, SQL
alchemy models, and give them back, right? If I'm asking for them as a list. Well, the way the GC
and Python works, not the reference counting, but the garbage collector
is after 700 allocations of container types, classes, dictionaries, lists, et cetera, that
do not get cleaned up 700 surviving over the cleanups over a period of time, that's going
to trigger a garbage collection.
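The thresholds being described can be read from the standard `gc` module. This is a small sketch, and `describe_gc` is a hypothetical helper name. On the CPython releases current around this episode the defaults were 700, 10, 10; other versions may report different numbers, so the code prints whatever the running interpreter uses.

```python
import gc


def describe_gc():
    """Collect the cyclic garbage collector's current settings."""
    gen0, gen1, gen2 = gc.get_threshold()
    return {
        "enabled": gc.isenabled(),
        # gen0 is the allocation surplus that triggers a collection.
        "thresholds": (gen0, gen1, gen2),
        # Current per-generation counters.
        "counts": gc.get_count(),
    }


print(describe_gc())
```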
And so I said, ah, you know, like, is there something you could do?
Is there something we could like kind of think about with ORMs?
This is not at all specific to SQLAlchemy. This just happens. I have
an example here called Python's GC and ORMs, as an app and a little conversation on GitHub.
And I said, is there something we maybe can do? Or have you guys thought about it? Because I'm not
really sure what the answer is. And he said, not so much. Sure. But here, check this out. So I
created this app. It creates 100,000 records in both a SQLite database
and a MongoDB database.
So we have like two really different examples.
And then you run a query that returns 20,000 records.
It's probably a lot.
Just in that sentence, you meant to say 100,000 records.
Yeah, if I didn't say that.
100,000 records in the database,
and it gets 20,000 of them back.
Okay?
It's probably a little extra,
but for example, if you go to, you go over to the talk Python training site over here, we've got a site map. And in this site map, there are many, many holding down the page down arrow
and you can barely see it. We've got to get like 5,000 records, 6,000 records just to like list out the number of the pages that contain transcripts for the sitemap, right?
So it's not entirely unreasonable that you would get a bunch of records back and then
do something like render a page with it, right?
Well, under this scenario, if you just run straight Python, that single query results
in 1,859 garbage collection runs
just to get one answer back.
Is that insane?
None of which is garbage.
Yeah, it's not garbage yet
because it's just being realized from the database, right?
Like it hasn't even come into existence all the way yet.
And it's just like garbage, garbage, garbage, garbage, garbage.
And it takes 900 milliseconds.
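One way to see this kind of number yourself is to count collector runs with `gc.callbacks`. The sketch below is not the app from the episode: `Record` is a hypothetical stand-in for an ORM model, there is no database involved, and the count you get depends on your Python version and settings.

```python
import gc


class Record:
    """Stand-in for an ORM model instance (hypothetical, not SQLAlchemy)."""

    def __init__(self, record_id):
        self.id = record_id
        self.fields = {"title": f"row {record_id}"}


def count_collections(build):
    """Run build() and return (result, number of GC runs that started)."""
    runs = 0

    def on_gc(phase, info):
        nonlocal runs
        if phase == "start":
            runs += 1

    gc.callbacks.append(on_gc)
    try:
        result = build()
    finally:
        # Always unhook the callback, even if build() raises.
        gc.callbacks.remove(on_gc)
    return result, runs


rows, runs = count_collections(lambda: [Record(i) for i in range(20_000)])
print(f"{len(rows)} records materialized, {runs} GC runs")
```

None of the objects created here are garbage, yet the collector still wakes up repeatedly while the list is being built.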
If you go and you tweak it in a
way that I described here, which you may or may not want to do, but if you did, if you tweak the
garbage collector, it will go from 1,800 collections to 29, 64 times less. The speed of the program is
23% faster. Okay, but it also uses less memory.
Okay.
Less garbage collection.
Less,
lots less garbage collection.
And it's not just 1800 versus 29.
Python has this 100 to 10 to 1 ratio of Gen 0, Gen 1, and Gen 2 collections.
And Gen 0 collections are pretty cheap
because it just touches new memory
and looks at it.
Gen 1 looks at like stuff that's only been inspected once.
And Gen 2 inspects the entire memory space.
For it to see, right?
So this one will also trigger, what is that?
185.
Yeah, 185 Gen 1.
So 18 Gen 2s, right?
So it's not just, oh, there's fewer. There's also like this other 29 here.
This is zero Gen 2 collections, very likely, right?
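The back-of-the-envelope arithmetic here can be written down directly. This sketch assumes, as the discussion does, that every tenth collection of one generation promotes to a collection of the next; `cascade` is a hypothetical helper name.

```python
def cascade(gen0_runs, gen1_every=10, gen2_every=10):
    """Estimate older-generation runs implied by a number of generation-0 runs."""
    gen1_runs = gen0_runs // gen1_every
    gen2_runs = gen1_runs // gen2_every
    return gen1_runs, gen2_runs


print(cascade(1859))  # (185, 18)
print(cascade(29))    # (2, 0)
```

So the tweaked run is not only 64 times fewer collections, it also very likely avoids the expensive full-heap Gen 2 passes entirely.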
So it's not just the number.
They're also like cheaper than doing that.
So this is pretty interesting.
What do you got to do?
You just say you run less frequently on allocations
and then leave everything else alone.
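A minimal sketch of that tweak, using only the standard `gc` module: raise the first threshold and pass the other two back unchanged. The value 50,000 is an illustrative assumption, not a number quoted in this transcript, and `relaxed_gc` is a hypothetical helper name.

```python
import gc
from contextlib import contextmanager


@contextmanager
def relaxed_gc(allocations=50_000):
    """Temporarily raise only the generation-0 allocation threshold."""
    old = gc.get_threshold()
    # Leave the gen1 and gen2 values exactly as they were.
    gc.set_threshold(allocations, old[1], old[2])
    try:
        yield
    finally:
        gc.set_threshold(*old)


with relaxed_gc():
    rows = [{"id": i} for i in range(20_000)]  # stand-in for a large query
print(len(rows))
```

The context manager is just one design choice that limits the change to a single block; calling `gc.set_threshold` once at application startup applies it everywhere.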
Does it make a lot of sense for absolutely everything? Probably not. There's probably some scenario with lots of cycles
that this is a problem. But anyway, this is an interesting thing to sort of consider if you
are doing some kind of API or a website or something that queries a lot of data, over 700
records, basically, you're going to absolutely incur GC when you know it's not garbage,
right? So I don't know. I thought this was interesting. I'll put it out there for people
to play with and get some feedback. It should be fun to hear about it.
I think this is very
interesting. And I'm going to, I mean, I plan on playing with the garbage collection myself,
so I'm glad you have this little sample app thing up to start playing with it.
One of the things that you can do that a lot of people don't mess with too much is not slowing down the frequency, but you can disable it and enable it.
And I'm not sure.
I'd like to play with that a little bit more to see if you can kind of kick it off or something like that.
You can disable and you can call GC collect if you need to.
So like it's there. I'm not sure if it makes sense to do it, but the switches are there.
Yeah, I mean, there's I mean, there's times where I mean, you're not going to get real time with Python, but you can you can get there's times where you know that you're not doing anything else.
So garbage collection is fine. And there's times where you're doing an event and you really want to get done with this as fast as possible.
So it might make sense to turn off GC.
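The disable and enable switches mentioned here are `gc.disable()`, `gc.enable()` and `gc.collect()`. Below is a sketch of wrapping a time-critical block with them; `gc_paused` is a hypothetical helper name, and whether this helps at all depends on the workload.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused(collect_after=False):
    """Switch the cyclic collector off for a block, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        if collect_after:
            # Optionally clean up at a moment we choose ourselves.
            gc.collect()


with gc_paused(collect_after=True):
    event_data = [[i] for i in range(10_000)]  # stand-in for urgent work
print(len(event_data))
```

Reference counting keeps working while the collector is paused, so only cyclic garbage waits for the later `gc.collect()`.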
Sure. And for people who are not super focused on this, turning off garbage collection or altering garbage collection only affects a very small portion of Python's memory.
Because the primary way is reference counting.
So reference counting, things stop referring to it, it goes away. Only in the case where there are cycles does GC even apply,
right? So that's actually, unless you've got really interesting algorithms that are super focused on
that kind of stuff, you know, you probably don't even have cycles, or very rarely do you. Yeah.
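The split between reference counting and the cycle collector can be observed with weak references. This sketch assumes CPython, where an object is freed as soon as its reference count reaches zero; `Node` and `demo` are hypothetical names.

```python
import gc
import weakref


class Node:
    pass


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the collector from running on its own during the demo
    try:
        plain = Node()
        plain_ref = weakref.ref(plain)
        del plain
        freed_by_refcount = plain_ref() is None

        looped = Node()
        looped.me = looped  # the object refers to itself: a cycle
        looped_ref = weakref.ref(looped)
        del looped
        cycle_survives_refcount = looped_ref() is not None

        gc.collect()
        cycle_freed_by_gc = looped_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, cycle_survives_refcount, cycle_freed_by_gc


print(demo())
```

The plain object disappears without any collector involvement; only the self-referencing one needs `gc.collect()`.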
Interesting. It's not a one-size-fits-all solution, but where it does fit, it's a pretty
simple thing to do that really makes a heck of a difference. Yeah, it's quite interesting. So my
musings was, well, maybe someday Python will have an adaptive GC where it runs a certain number of
times and says, oh, I ran, but I didn't actually find any garbage, any cycles. So let me back off
that threshold by a factor or two.
And then I didn't find any garbage again.
So I'll back it off.
And then I'll look, I found a bunch.
So now we got to start doing this more frequently.
And I could see like
an adaptive garbage collector
tuning these numbers.
But until then, I just cranked it up.
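As a thought experiment only, the adaptive idea can be approximated from user code with `gc.callbacks`. Nothing like this is built into Python as far as this discussion goes; the function names, the doubling factor and the bounds below are all assumptions chosen for illustration.

```python
import gc


def next_threshold(current, collected, low=700, high=100_000, factor=2):
    """Back off when a run found nothing; tighten when it found garbage."""
    if collected == 0:
        return min(current * factor, high)
    return max(current // factor, low)


def install_adaptive_gc():
    """Hook a callback that retunes the gen0 threshold after every run."""

    def on_gc(phase, info):
        if phase == "stop":
            gen0, gen1, gen2 = gc.get_threshold()
            gc.set_threshold(next_threshold(gen0, info["collected"]), gen1, gen2)

    gc.callbacks.append(on_gc)
    return on_gc  # keep this so the hook can be removed later


print(next_threshold(700, collected=0))
print(next_threshold(1400, collected=5))
```

To undo it, remove the returned callback from `gc.callbacks` and restore the original thresholds with `gc.set_threshold`.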
Yeah.
Interesting.
All right.
Yeah.
Kim, you want to take us out of here
for our main topics?
Sure.
The other topic I was going to talk about
is a tool called Docker Slim,
which basically is...
It already sounds good. I don't know what it does, but
the opposite of already is.
I want my Dockers
to be slim. Let's do it.
It's effectively, as far as I can tell,
well, not quite magic, but it certainly seems like
it. I use Docker quite extensively
at work, and because I
use a fair amount of it at work, I started using it for a lot of personal stuff as well.
And the websites I deploy of my own writing, little things running on my own systems, are all in Docker containers.
And unless you take a lot of care about it, your Docker images can end up quite large.
If you start with just a Python in an Ubuntu base, for example, you're probably looking at about a gig of Docker image before you get anything done. Now, the way Docker works, unless you have just one of those things,
if you've got more than one, you start to benefit from shared layers. So you're not having a gig
and another gig and another gig, et cetera, but still it all kind of adds up.
Docker Slim is a tool to basically look at your existing images, do some analysis and give you
back a much smaller and in many ways much more secure image. I
ran this earlier today just to kind of check that I wasn't misremembering from the last
time I used it, and I fed it an image I had, which was an incredibly simple, small little Flask API
app I had written. And it had one job: basically, whenever you sent anything to an endpoint, it
printed out what that was. I forget exactly why I needed that. I think I was having trouble figuring out some
suppliers. It wasn't documented how some supplier's web
was going to work. So basically, I set this up and I said, talk to me, and then looked at what it said.
Exactly. That's way better than trusting their outdated,
crappy, inconsistent documentation. It's just, all right, why don't you just call it?
We'll just print out the JSON document.
And then we'll go from there.
So yeah, as a side note, that was quite an easy thing to do.
But that was, I just put that into a Ubuntu-based container
running, I forget exactly what.
Presumably, I was using FastAPI.
So it would have been Python and Ubuntu and FastAPI
and et cetera.
And that was about a gig of, of image.
I fed that to Docker Slim and I ended up with 48 megs.
Um, and it still worked.
It did everything it was supposed to do.
Granted, I fed the simplest thing I had.
I mean, it had one endpoint and so forth.
I have, there's a lot of dependencies.
There's Python, there's Flask, maybe there's even uWSGI
or something running there.
Who knows?
But yeah, well, exactly.
Um, what it has done is it's closed down all sorts of other angles of attack.
It makes it sound a bit dramatic, but all sorts of ways that you could interface with
the container that you don't necessarily need.
It no longer has, for example, a...
Bash is no longer available and you can't run it in interactive mode and talk to it,
which is not necessarily a 100% positive thing.
It makes debugging a bit harder, but they do have some solutions for how you can do that with
side containers and talk to it in other ways and the like.
And if you go through their documentation, effectively, they're doing all the security stuff
and the AppArmor stuff and all sorts of things that I know are important, but I don't know enough about
to do right. I don't trust myself to do those things correctly. I can basically follow someone's
suggestions, but I have absolutely no way of knowing if the suggestions I'm following are
valid. I'm not immersed enough in this world to know what the best thing is to do. So I'm much
happier to have somebody come along and say, we've written this tool, we get this stuff.
We'll do the best we can to make it more secure. Even if it isn't 100% secure, it's far better
than I was going to achieve on my own. And I haven't used it enough to give a 100% recommendation
that this will fit every use case.
I'm sure like every tool, there's things it does well,
there's things it doesn't do well.
There's some use cases where it's maybe not so suited.
But just from a little bit of experimentation with it,
it looks like something I'm going to be inserting
into my tool chain where I can,
because the smaller the images are, the better, really.
Especially if we're all working from home,
we're pulling these things down
from servers that aren't actually
in the building that you're in anymore.
And if you're doing continuous
deployment, which means pushing those
actual images, then
you want to build that quicker.
Yeah, cool. Very nice. Yeah, one of the things that
Docker's used for that I think
a lot of web people don't think about
is cross-compiling. That's one
of the places where Docker shows up. And it's one of the places I use it is to compile on a machine
that I don't have access to. So I can have a Docker image, like I can have a Windows machine
with a Linux Docker image or something, and I can do compiling in there. So slimming that down
speeds up my compiles or I conceptually would.
So I think this is something that definitely to try
if you're using that.
Exactly.
You've reminded me in a similar vein,
Docker is the basis of our continuous integration systems.
The ultimate end result is built inside a Docker container
with all the bits we need.
That can take quite a while because it can be quite large.
You can slim that down as well.
The faster your CI is, the better for you, really.
Yeah, always.
Yeah, absolutely.
All right, well, Brian, I think that might be it.
Time for some extras.
Oh, I do want to do a quick follow-up.
I thought these were extras, but they're actually not.
They're things that I do want to point out really quick.
I actually gave a talk on this whole memory thing.
If that GC conversation sounds interesting to you over at the Python web conferences
here.
So people can check that out and also have a talk Python class that like dives into a
whole bunch of this stuff.
Nice.
I meant to include that in the before thing.
Now we're at the extras.
Let's talk about that.
What do you got?
Um, the only thing, one of the things I want to shout out is to everybody that supported the,
the PyTest book.
So pragmatic,
pragmatic,
if you just go to the main page,
there is a bestsellers link that has had a Python testing with PyTest on it
for many weeks now in the top six.
And I just wanted to thank everybody
that supported the book and helped the success of this. Also, the feedback that I got of the
technical reviewers and plus many other people going through and submitting errata is going to
make this a really solid book. And I'm really just happy to be part of a community to put this
together. So thanks. Yeah, congratulations. That's awesome.
Kim, you got anything extra you want to throw out there?
A couple of small things I was hoping to mention if we had the time.
I see we've actually got MessWithDNS up on screen.
This is a good place to start.
I just wanted to mention this little website,
MessWithDNS.net,
which Julia Evans, who on Twitter is b0rk,
and she produces a variety of excellent webzines and so forth. I think
you've actually, you've discussed her Git learning webzines before. That's the one. Yeah.
And I think there's an HR friendly one whose name I can't remember.
Oh, shucks.
The memorable one. Yeah, exactly. She released something I got. Yeah. She released
messwithdns.net recently as effectively a way to play with DNS without breaking your actual website, which isn't something I'd ever thought to look for.
But now that it's around, it's actually a brilliant idea.
There are some hard-to-understand things baked into DNS.
And what is an A record and a CNAME?
And if your TTL is a three-digit number versus a five-digit number, what's the difference or for that matter, what does TTL mean?
And it's not necessarily an explainer for all these things, but it is a way to make
these settings and see what they do without actually breaking a real website.
So effectively she's spun up a subdomain with an assigned name.
This one I happen to be on is goblin61.messwithdns.com.
The worst you can do is break goblin61.messwithdns.com,
and that will then just go away for the next person who comes along.
So it's actually a really smart, really clever idea.
Typical to Julia's thoroughness,
she's got a series of experimental suggestions on the side.
Here are some things you can try.
Here are some tutorials.
How about making a CNAME?
Or here are some weird things you can try.
What happens if you've got a very long TTL?
Or you convince three different DNS servers that your subdomain has three different IPs.
Why you would do that is a mystery to me.
But what would happen if you did is something you can explore with this site without actually breaking your real website.
And this seems like a very useful learning tool.
Yeah, absolutely.
Cool. I love it. That's fantastic.
Two other small things I just wanted to mention.
One, just because I use it all the time and I don't know how common knowledge it is,
it is possible in Python, and I don't have a webpage to open for this,
to run a small little web server.
If you do python -m http.serve or .server, I've gone blank on which it is now, to be honest.
.server.
.server, yeah.
I'm reading your notes.
I don't actually know.
I'm just going back to the notes to have a look myself.
That effectively fires up a web server in the directory you open it in
and serves up the files that are there or the subdirectories that are in there.
There's no security.
There's no attractiveness.
There's no styling.
There's no anything of the sort.
You wouldn't serve this to the public.
But if you wanted to get a file off the machine,
and I do this quite a lot to get files onto my phone,
for example, firing a web server there and then
and just pointing either a script or your own,
you know, just to send your browser to the local host
with the port you gave it,
and just download the files from there.
It's a useful thing to be able to do.
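The command-line form is `python -m http.server`, optionally followed by a port number. The same thing can be done from Python code with the standard `http.server` module, sketched below. `make_file_server` is a hypothetical helper name, and the sketch binds to 127.0.0.1 by default as a cautious assumption; reaching it from a phone would need a host such as "0.0.0.0", which exposes the directory to the whole local network.

```python
import functools
import http.server


def make_file_server(directory=".", host="127.0.0.1", port=8000):
    """Build a server that lists and serves the files under directory."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


# Typical use, left as comments so importing this file does not block:
#   server = make_file_server(".", port=8000)
#   server.serve_forever()   # stop with Ctrl+C, then server.server_close()
```

As said above, there is no security of any kind here, so this is for quick local transfers only.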
Yeah, that's a cool trick.
Directory browsing, basically, yeah.
Exactly.
And then the final little extra I just wanted to talk
about, and this is a little more tongue-in-cheek
somewhat, in both last week's
Python Bytes and on recent Talk
Python episodes, you have been speaking
a little bit about different ways of doing Git.
You were discussing doing all
your Git on the CLI, and I think
one of your
audience members at the last Python Byte suggested
the way they do Git is just mash all the buttons they can find
in VS Code. There is, I just
want to put out there, there is
a middle ground that you could be looking at. There's
a tool called Magit,
M-A-G-I-T, which is
effectively, if you're an Emacs
user and you don't know Magit, you should change that
immediately. Magit is
effectively a brilliant way of
doing, to me, a brilliant, indispensable way of doing Git inside Emacs.
Granted, it does mean you need to learn Emacs,
but in just a couple of short years after that,
you should be an expert at,
you should find Magit indispensable.
So take a couple of years to learn Emacs.
I'm not disputing that.
But once you've got the Emacs down,
Magit really is an excellent option
to look at doing your Git with.
So if you're tired of doing it on the CLI,
just set some years aside,
learn yourself some Emacs,
turn to Magit and then wonder
how you ever did anything else.
Set some years aside.
I don't think that's fair to Emacs,
but just a little bit of too much.
I'll concede Emacs is a much longer learning curve than VI,
but it's not years.
And I say this, I mean,
yeah. Yeah, and Mario and the audience
are taking credit for the VS
Code button mashing.
Right. Right on.
Cool. Yeah, that's a great recommendation.
All right, is that it for your extras?
I should just point out in terms of
being unfair to Emacs, I've been using it for more than
20 years and I find it almost impossible to use anything else.
But I'm sure it didn't take me years to learn.
It's just been a long time.
That's right.
Well, all right.
I got a few to throw out there.
Actually, just one.
I made a comment, I think on the last show, Brian,
about using emojis in my code.
Yeah.
So I wanted to bring that example up.
So here's like a little CMS thing that I got going on.
And if you return a collection, like themes are represented by these little tags in the CMS.
And if you return a collection, the comment has a list of emojis.
And if you return, if they're just like processing a single emoji, a single thing, you get that emoji.
For pages, you get a list of page emojis and so on.
Anyway, that's what I had in mind when I talked about that.
That's pretty cool to use.
Yeah.
You can sort of just scan through,
go, oh, look, there's a list of these.
This must be doing a bunch of stuff.
I don't know.
I could probably come up with something like a modifying.
I'm going to change a theme versus read a theme
or something like that.
Yeah.
Anyway, well, that brings us to the laughs.
And I hope you all enjoy Schadenfreude
because it's bad this time.
Thank you, Log4j.
Okay, so let's see. First of all, this is not Schadenfreude, this is just something about cookies. My
daughter yesterday gave me this candle. It has: a website, we use cookies to improve our performance.
And then: me, same. I just eat cookies. I thought that was really just funny for like a tech candle.
It should be a tin of cookies though. I know it should. It absolutely should. At least it should smell like
cookies. It says scented. I have no idea what scent it is, but it better smell like websites.
Maybe. Maybe. And then I just want to point out more practically, I have this add on you can get
for all the browsers. I don't care about cookies. And if it sees one of those cookie warnings,
it'll try to click it and just accept it.
Oh, this is indispensable.
That's brilliant.
Absolutely.
And then Brian Skinn starts us off
with the log4j stuff.
So if you remember, if you're aware,
log4j, the problem with log4j
is if you try to log a piece of text,
even as an argument,
if that text has JNDI colon
LDAP colon slash slash to some Java library, instead of logging it, it will execute
that Java stuff. Even if it's remotely on the internet, then it'll output the result of that,
like you're hacked or whatever. Right. So we we've all heard of little Bobby tables, right? Here's the modern day one. Hi, this is your son's school. We're having
computer trouble. Oh dear. Did he break something? Well, in a way, did you really name your son?
Curly, you know, dollar curly JNDI colon LDAP colon slash slash Evil Corp, parenthesis, parenthesis, Bobby.
Oh yes, little Bobby Jindy, we call him.
Well, we've got our servers crypto locked.
I hope you're happy.
I hope you've learned to sanitize your Log4j inputs.
Isn't that fantastic?
Yeah, I have a feeling that this is going to go on.
It'll be the next, It isn't Log4J.
It'll be something else next year.
Yeah.
Well, I mean, it's been there for 10 years.
Exactly.
It's not a new thing, unfortunately.
It's not even a vulnerability.
It's just, wait, you can actually do that on purpose?
It's a feature.
And Brian helpfully suggests this actually came from log4jmemes.com. So we got to go there for a second.
Well, of course that exists.
Of course. And oh my gosh, like look at this picture. So Brian,
will you describe this person for me on the screen? There's a person and a saying next to him.
Old white guy.
To me, he looks like a perfect sort of grandpa sort of character, right? Getting up there,
probably 70. Nothing wrong with the guy, but it says upgrading Log4J three times wasn't that stressful. Dave, 28 years old.
What else have we got in here? We've got-
I wish that was outrageously funny and not just kind of truish, but yeah.
I know. Here's like a 1940s looking picture, like a dad and some kids hanging around.
Daddy, what did you do during the great war? The Log4Shell incident. Let's see. There's a few as you go in here. How many days
since such and such accident? Zero days without a Log4j CVE. And there's like Homer running
around with like a nuclear glowing stick. You can can spend some time in this place. It's, it's probably unhealthy. There's like a grim Reaper just going through taking out technology
and it has Log4j on the Grim Reaper, you know? Let me see if I can find one more.
There's some really good ones. This one is probably good. There's a picture of a guy in
a tuxedo, says vendor, not vulnerable to Log4j. But there's a mirror, and you see the back of him, his clothes are just all gone. It says uses EOL, yeah, J4. Yeah, that one's
pretty gross. I want to get that on the screen, but yeah, these are just fantastic
here. So anyway, people can check out the memes. Thanks, Brian, for sending that in. Brian Skinn, yeah.
Yeah, I can say I am reminded, I did see one the other day, I don't know that I could put it up now,
but it's effectively that, I'd just seen it in various other memes, a chap receiving an award from his manager.
So, you know, me receiving an award from the manager for not being vulnerable to the Log4j vulnerabilities.
And the inside thinking, that's mainly because I chose not to log.
I completely forgot to log anything.
Exactly.
Oh, that's really good.
Yeah, I hear that tweet.
Today, Java runs on billions of devices.
It's not a statement of pride, but a statement of pure terror.
All right.
Well, I don't want to hit on Java too hard, but the log4j, I just cannot believe somebody thought it's a fantastic idea to execute remote code that you cannot escape.
From a logging system.
Yeah, in a logging system. It's just, what did you think you would get? So here we are, yeah, with
log4jmemes.com, if you want to scroll through it.
Let's back up and say somebody thought writing an application in Java was a good idea. No, sorry. I'll get hate mail for that one.
Yeah, don't mail Brian.
Don't email Brian.
He knows.
He knows.
All right.
Well, so Brian, that's it for the year, isn't it?
I mean, this is our last episode.
We're going to take a little bit of time off.
Yeah, some well-deserved time off.
Yeah, absolutely.
So thank you everyone for listening.
Kim, thanks for coming to join us this time.
Brian, as always, thank you.
And we'll see everybody next year.
Yeah, see you next year.
Thank you for having me, guys.
That was brilliant.
Yeah, you're welcome.
And if you're out there
and you still haven't filled out that form
and given us your feedback,
let us know.
The Google form link is at the top of the show notes.
All right, bye.
Cheers.
Thanks for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes. We're always on the lookout for sharing something cool. If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live. That's usually
happening at noon Pacific on Wednesdays over at YouTube. On behalf of myself and Brian Ocken,
this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and
colleagues.