Python Bytes - #264 We're just playing games with Jupyter at this point

Starting point is 00:00:00 Hey there, thanks for listening. Before we jump into this episode, I just want to remind you that this episode is brought to you by us over at Talk Python Training and Brian through his pytest book. So if you want to get hands-on and learn something with Python, be sure to consider our courses over at Talk Python Training.

Starting point is 00:00:17 Visit them via pythonbytes.fm slash courses. And if you're looking to do testing and get better with pytest, check out Brian's book at pythonbytes.fm slash pytest. Enjoy the episode. Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 264, recorded December 22nd, 2021. I'm Michael Kennedy. And I'm Brian Okken. And I am Kim van Wyk. Kim, welcome. You've been on Talk Python before, but not here. Yeah, that's right. I've done a couple of Talk Pythons with you, including the one where you bravely submitted yourself to questions from your audience. The

Starting point is 00:00:55 other one, I taught them some small tools, so that was very good fun. I'm very much looking forward to this one as well. You know, both episodes you were on were super popular. One was about little automation tools and just cool stuff that people can pick up and use really easily there. And that was great. And the Ask Me Anything was surprisingly one of the more popular episodes as well. So thank you for being part of that.

Starting point is 00:01:12 And you've been part of the audience for sure. You've offered comments and feedback as we do the live show and we're recording. Basically so, yeah, to be honest. Yeah, but now here you are on stage. Thank you for being here. Tell people a bit about yourself before we get started. Sure.

Starting point is 00:01:28 I am a DevOps engineer at the moment. I'm also a move engineering based in South Africa, working with a home loan provider, a mortgage provider in the American sense. I've been probably doing Python for close on 20 years. So the fact that I've shaved means you can't see the gray beard, but I have been around for a while. The gray beard. We're going to come back for some good jokes at the end about this as well. Not your beard, but just beards in general. Awesome. That sounds like really fun stuff. So yeah, thanks for being here. Now, before we actually get into the main content of the show, Brian, I want to do something just a little bit meta.

Starting point is 00:02:06 So I went and created a questionnaire for people. When we first created Python Bytes, we were like, all right, it's 20 minutes. The time of this episode is going to be 20 minutes. So we're just going to knock it out, you and me, real quick. And I think it's grown a little bit. We cover a little bit more detail. We've added a joke. We've added a few

Starting point is 00:02:25 like little extra things. We brought on guests like Kim. And is that, is that still in line with what people want when they signed up? So I put together a questionnaire here that just asked three simple questions. And I'd really appreciate if listeners could go to the show notes and just click on the link that says this three-question Google form or find it on our Twitter account or wherever, but it should be in your podcast player show notes right near the top. And they can just click that and fill it out

Starting point is 00:02:51 and give us some quick feedback on the idea of having a guest, on the length of the show, and so on. So anything you want to add about that, Brian? Just encourage people to give us feedback so we know? Yeah, I'd love to hear feedback because sometimes we feel a little guilty that we're running long, but I enjoy the, a little bit more in-depth conversation. We still don't go super deep, but I think it's a good, well, I'm flavoring the survey though. So forget

Starting point is 00:03:17 what I said. No, I'd love to hear feedback of what people think. Yeah, absolutely. So people can give us feedback there. We'd really appreciate it. The way people seem to be feeling so far is they kind of like the length. They definitely like the guest format. So you're welcome here, Kim, according to listeners. Fantastic. But yeah, I think people generally like it, but still, let's just hear from everyone, because if a bunch of the people in the audience are like, no, we really want no more than 20 minutes, and my going on about this is actually making it still longer, then it would be great to know, right? So we'll go from there. And with that, you know, let's play a game.

Starting point is 00:03:54 Jump in the first topic. Yeah. I want to talk about Jupyter games. And the idea around this is IPython Canvas, or ipycanvas, with Box2D. I'll get a little bit more into it, but the gist is that making small video games is one of the ways that a lot of us started programming. I know that was the case for me. They were not difficult games, but it was difficult enough, these 2D engines. And some of that's lacking, and I haven't seen that in Jupyter before.
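As a rough illustration of what a 2D physics engine does on every frame, here is a minimal pure-Python sketch of gravity plus a floor collision. It is not Box2D, pyb2d, or ipycanvas code; the `Ball` class, `step`, and `simulate` are hypothetical names made up for this example.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, negative means "down" (assumed units for this sketch)


@dataclass
class Ball:
    y: float                  # height above the floor, in metres
    vy: float = 0.0           # vertical velocity, in m/s
    restitution: float = 0.8  # fraction of speed kept after a bounce


def step(ball: Ball, dt: float) -> None:
    """Advance the ball by one time step (semi-implicit Euler)."""
    ball.vy += GRAVITY * dt   # gravity changes the velocity first
    ball.y += ball.vy * dt    # then the new velocity moves the ball
    if ball.y < 0.0:          # collision detection against the floor
        ball.y = 0.0          # push the ball back onto the floor
        ball.vy = -ball.vy * ball.restitution  # bounce, losing some speed


def simulate(ball: Ball, steps: int, dt: float = 1 / 60) -> list:
    """Run the simulation and return the height after each frame."""
    heights = []
    for _ in range(steps):
        step(ball, dt)
        heights.append(ball.y)
    return heights


heights = simulate(Ball(y=2.0), steps=600)  # ten simulated seconds at 60 fps
```

A real engine does this for many bodies and shapes at once, and a notebook front end would redraw the canvas from the new positions after each step.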

Starting point is 00:04:30 And Jupyter is an excellent platform for a lot of things, especially teaching people who don't have computers. If they use an iPad or something like that, often they can still get access to Jupyter through hosted systems. So this article talks about writing 2D games, and mostly it's a 2D physics engine around a library called Box2D, which is a C/C++ type engine, but it's something that you can access through Python. And the author... Yeah, those kinds of physics stuff. You know, when people think of games, they think of, oh, here's what I've got to do to get the picture on the screen. Oh, that's just the start. You need physics, you need collisions. There's so much stuff that also gets done. So this is really cool. Yeah, things

Starting point is 00:05:20 like physics and gravity and collision detection, and the examples on this page are great. The person that wrote it is Thorsten Beier. I think he's got a library called pyb2d, which is one of two different Python bindings for this Box2D system. But it's pretty cool. One of the things I like about this article is that it has lots of pretty examples, but physics engines, even if they're built for games, can also be used for things like an engine simulation or even airflow simulations. So there's a lot of cool uses for this too that are outside of games. But one of the incredible things is how small the programs can

Starting point is 00:06:12 be. So this article has an attached hosted notebook that has things like Angry Shapes, which is like Angry Birds, and a rocket game. And there's a color mixing game, which I was just fascinated by. There's a bunch of colors that drop into it. It isn't listed in the article, but if you go to the examples, it's a kind of color mixing thing. And it's only like 70 lines of code. And with that, you can have some amazing physics examples. And I'm pretty excited about this, actually. So I'd like to do this. You know, I think this makes a lot of sense in the notebook form

Starting point is 00:06:55 because you're trying to visualize certain things. And sometimes graphs are fine, but other times they just don't capture flow and that kind of stuff. And it seems like game animation would be great. Kim, what do you think? I was also going to say, if you can get something very impressive done in 70 lines of code, as a learning tool that's brilliant, because that's effectively a screen of code. Yeah. Otherwise, if you're looking at hundreds and hundreds of lines, you know, for a seasoned developer that's perfectly reasonable, but to a new person that must look

Starting point is 00:07:23 overwhelming. Yeah, yeah. If you can fit it on a single screen and say, here it is, this is everything you need to make this thing work, it's quite a powerful tool. And it looks like a lot of fun, actually. It does look fun. Yeah. The article talks about some interesting hoops he had to jump through using

Starting point is 00:07:39 ipyevents and ipywidgets and Canvas to be able to draw things and get events from people. But this is just some fun stuff. We're showing on the screen the thing like Angry Birds. And to be honest, the playability of it isn't on the level of, you know, playing an Xbox or something like that. Obviously, you probably won't hook up a controller to it. Yeah. But that you can do something like this so quickly is pretty amazing. And also, on the other hand, once you write it

Starting point is 00:08:16 yourself, the playability actually doesn't matter that much. I think you're looking at interacting with the thing you wrote. Yeah, yeah. I love it. This is really cool. Nice find, Brian. All right. Let me tell you about some really interesting cybersecurity side of things. So I'm going to first tell you

Starting point is 00:08:34 about this thing called a Thinkst Canary, but that's not actually what I want to talk about. It's just to set the stage. Okay. So here's a challenge, something that always stresses me out is what if somebody was to break into your app, into your systems, into your cloud infrastructure or whatever,

Starting point is 00:08:52 how would you know, right? Like, what would be the indicator? As long as they don't trash it, they don't lock it with a crypto locker or some ransomware like that, then they could just cruise around there, right? So this company, Thinkst Canary, created this. I think you can put it in the cloud as a hosted container type thing, or you can get little Raspberry Pi-like things and put them physically on your network, if you have a physical network. And you could say, you act like a SQL Server, you act like an Exchange server, and you, if somebody tries to search the network and says, show me all the Active Directories, you be that. Maybe we're not even using Active Directory

Starting point is 00:09:28 because we're not on Windows. But if somebody breaks in, they may well start looking for those types of things. And what they'll do is trigger alarms if somebody tries to interact with them, and normal things shouldn't, because only if you're trolling around looking for them should they be discovered, right? So that's what this is. And with this whole Log4Shell stuff that's going on, it's just such a nightmare of like, well, we installed this app that did invoice management for us. Did it have a Log4Shell vulnerability?

Starting point is 00:09:56 I don't know, maybe they said they fixed it. But if somebody gets in, it's not just that we have to patch Log4Shell or the Log4j version. We've also got to know what else has been run, because they could have installed whatever, right? Yeah. So the thing I actually want to recommend to Python people is this thing called Canarytokens. So check this out. This is fantastic. What you can do is get different things that will then trigger alarms

Starting point is 00:10:21 like emails or other sorts of stuff to you. So I can come over here and I can say, I would like to get a URL. And if anybody visits that URL, send me an email with whatever message I put in here. So I could come in and say, here's a URL, send it to michael at Talk Python for my email, and say, this is hidden in the admin section, unused, or something like that. If I get that email, somebody's gone in and clicked that link in the admin section of my site. And it gives you the IP address and all that sort of stuff of what comes back.
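To make the idea of a URL token concrete, here is a minimal sketch of the same concept built by hand: a path that no legitimate user should ever request, plus the details you would want in the alert. This is not the Canarytokens service or its API; `HONEY_PATHS`, `check_request`, and the example path are hypothetical names for illustration only.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical decoy URLs mapped to the reminder note you would want in the alert.
HONEY_PATHS = {
    "/admin/export-all-data": "Hidden in the admin section, unused",
}


def check_request(path: str, source_ip: str, user_agent: str) -> Optional[dict]:
    """Return an alert record if a decoy URL was requested, otherwise None."""
    memo = HONEY_PATHS.get(path)
    if memo is None:
        return None  # ordinary request, nothing to report
    return {
        "memo": memo,
        "channel": "HTTP",
        "source_ip": source_ip,
        "user_agent": user_agent,
        "time": datetime.now(timezone.utc).isoformat(),
    }


alert = check_request("/admin/export-all-data", "203.0.113.7", "curl/8.0")
```

In a real app the returned record would be emailed or sent to a chat channel; the hosted tokens discussed here do that part for you.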

Starting point is 00:10:55 So if I didn't do it or it looks like an unknown IP, that should be highly concerning, right? So what else? That URL is interesting. I can get a DNS token. If somebody does a DNS lookup on rollouts.pythonbytes.fm, I can get an alert for that. That'd be pretty interesting. A unique email address, if somebody ever tries to contact that. A Word document, so you get a Word document and put it in, say, SharePoint or something dreadful like that. And if it gets opened, you'll get an email that somebody got that.

Starting point is 00:11:28 Let's see. You've got a WireGuard VPN file. You can create a custom EXE, and if somebody runs your EXE, or a SQL Server instance, or you can even do directly a Log4Shell link that will run. So if you are trying to figure out, just put stuff in there to let you know if somebody gets into a part they're not supposed to be in. This is really cool. It's free. It doesn't cost anything. It doesn't require

Starting point is 00:11:52 any setup. Put a Word document in a folder. If it gets opened, let us know. What do you think? I was going to say, I've been looking for ways to do exactly this kind of thing because I'm totally unique in being concerned that Log4Shell has got impacts that I can't see on our systems. Just because your public-facing systems happen not to have used Log4Shell things doesn't mean that you're necessarily safe.

Starting point is 00:12:15 All it means is that if by some other means somebody's got into one of your internal systems, you wouldn't necessarily know that. So I'm very much interested in this. I knew about Canaries already. Thinkst happened to sponsor the local South African PyCon ZA conference. But Canarytokens are a very funky add-on to that. Exactly. I knew about the Canaries as well. I'm like, ah, but that doesn't really apply to the world that I live in. I'm not an enterprise. But these make a lot of sense and they're free, which I think is cool. Yeah. Here's what it looks like if you get a notice. This is the email I got: your Canarytoken was triggered. The channel was HTTP. The token memo was, this is a test, and then the IP address of the person. So this was one of those URLs.

Starting point is 00:13:00 Somebody interacts with this URL, let me know. Here's their user agent. Here's the message. There's the IP and so on. So you would just get a notice like that that says somebody clicked on something they shouldn't have had access to. Yeah. So anyway, pretty neat. Brian? Yeah, it's actually pretty cool. Some of the things I wouldn't even expect, like somebody cloning a website. Yeah. Didn't know that was a thing. I'm scared now, to be honest. I didn't realize that was something I should be worrying about. Get an alert when a MySQL dump is loaded.

Starting point is 00:13:35 Like, okay. Like how does that happen? I don't know, but that's pretty awesome that it's possible and also frightening. Yeah. Yeah. And Sam out in the audience says, ironically, the Log4Shell token might have its own vulnerabilities. You know, that thing's been patched a couple of times. It's going to be a big, big problem.

Starting point is 00:13:51 Anyway, Canarytokens. I think this is broadly useful for Python people. You could put the URL stuff inside of your app. You could put an email inside of locations. There's lots of stuff like the database restore type things and so on. This looks useful. Yeah. So I'm still a little lost. You throw this, for instance, like you said, in the admin section that you shouldn't be using, and you just know about it so you don't click it or something? Yeah, so imagine this. Imagine in your admin section you've got like a search

Starting point is 00:14:21 for user button and then next to it you you could just put an export all data. Yeah. And then put one of these URLs at the endpoint. And nobody who works, you just tell everyone, never click the export all data. It doesn't do anything. But if someone were to break in, what's the first thing they're going to want?

Starting point is 00:14:37 Oh, well, let's get the export all data. Boom. They'll go click it and you'll know. They're still in. It's bad, but at least they're not in with unlimited time to be in, you know? Yeah, you can put some other stuff too. Like, let's say you've got a Django website and you load a PHP admin page or something like that, just at the same URL

Starting point is 00:14:56 in case somebody's trying to grab that or something. Yep. Yeah, a lot of interesting little breadcrumbs you can leave in there. Okay. Kim, that brings us to yours. Sure. The first topic I was going to talk about is actually two similar, but not quite the same, pieces of software. PyAutoGUI and pywinauto are both toolkits for automating GUIs, effectively for interacting programmatically with GUIs. Nice. Which is normally really hard, right?

Starting point is 00:15:27 Hey, before you go on, before you go on, could you give that like three control pluses? Just for the watchers. Just now it's a little bit on the small side. Thanks. How's that? A little more. Space to play with. There you go.

Starting point is 00:15:38 Fair enough. Well, let me just, while I remember, do it to this one as well. They both happen to be Read the Docs documents. So you're quite right. Programmatically controlling a GUI can be quite a pain, particularly for GUIs that aren't particularly easy to understand. And the reason I bring tools like this up is that there's quite a lot of use cases. I can think of two examples off the top of my own career, and I'm sure there's hundreds more, where this kind of thing is useful and you might not know it's something you can do. And the kind of examples I'm thinking of are particularly in, I'm sure, much enterprise

Starting point is 00:16:11 and in industrial software. When you get a piece of equipment, you frequently get a GUI tool that accompanies it. Probably no API, right? Well, no API whatsoever. There's a tool you fire up and you set all the settings. But because the company that supplied you the piece of equipment, they don't write software. It's not their thing. They either outsource the tool or the intern writes it.

Starting point is 00:16:31 And it has 50 checkboxes laid out in grid form. And you need to set it up every single time you want to use that piece of software. There's no ability to remember what you set. There's nothing to do. And I've worked with a couple of those systems. And I see, Brian, I think you probably have as well. We basically, there's a piece of paper next to the computer the software is on

Starting point is 00:16:49 with a screen print of what the settings should be, so that the poor sucker who has to come down and use it knows which of the 50 tick boxes to check, and then they have to check that the pattern effectively matches on screen, and then they hit run. And something like PyAutoGUI or pywinauto is useful so that you can effectively script the startup of that app.

Starting point is 00:17:06 And you can write a small piece of Python that fires this tool up, identifies all the checkboxes, ticks the ones you've programmed in, and then either leaves it for the human to push go or whatever it is the app does, or for that matter pushes go itself and then closes the app and records that it did that. So that kind of use case is very powerful. And I think there are lots of cases, particularly in enterprise software or internal software that somebody wrote for the company, that does something very useful, but it's been around for 20 years and the guy who wrote it is not around. Nobody wants to touch it because the source is terrifying. So nobody's going to sit down and change it. How do you even get that Visual Basic 6 or Visual Basic 5 installed again? Well, exactly. You don't even know, right? How do you even compile it now? Exactly.
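The "tick the checkboxes you've programmed in" step can be sketched roughly as below. The checkbox names, screen coordinates, and the `apply_settings` helper are all hypothetical; the click function is passed in so the logic can be tried without a GUI present.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical screen positions (in pixels) of the vendor tool's checkboxes.
CHECKBOXES: Dict[str, Tuple[int, int]] = {
    "checkbox1": (120, 200),
    "checkbox4": (120, 260),
    "checkbox27": (320, 200),
}

# The settings that used to live on the piece of paper next to the computer.
WANTED: List[str] = ["checkbox1", "checkbox27"]


def apply_settings(
    click: Callable[[int, int], None],
    positions: Dict[str, Tuple[int, int]],
    wanted: List[str],
) -> List[str]:
    """Click every wanted checkbox once and return the names that were clicked."""
    clicked = []
    for name in wanted:
        x, y = positions[name]
        click(x, y)           # delegate the actual mouse click
        clicked.append(name)  # keep a record of what the script did
    return clicked


# Dry run with a fake click function that only records coordinates.
recorded: List[Tuple[int, int]] = []
done = apply_settings(lambda x, y: recorded.append((x, y)), CHECKBOXES, WANTED)
```

With PyAutoGUI installed and the tool open on screen, `pyautogui.click` could be passed as the `click` argument, since it accepts x and y coordinates. Clicking fixed coordinates is the crudest approach and assumes the window never moves; pywinauto can instead address controls by name on Windows.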

Starting point is 00:17:47 So to be able to wrap it is a very powerful thing to be able to do. And the other kind of use case that's somewhat related, it also comes to mind, is I've spent a large amount of my career doing industrial automation, factory-based type work. And there, the faster you can go and the fewer steps you need a human to repeatedly do, the better for you in many ways. The human's time is best spent actually manipulating objects and checking things rather than opening pieces of software and clicking boxes and closing them again. So quite frequently, we've had cases on the production line where the vendor of the chip we're using has supplied this tool that does some security-related thing. And it's a GUI tool. And every single time you would have to open it up you'd have to click the same two boxes you'd have

Starting point is 00:18:28 to say yes secure this chip close it again repeat wait for another one to arrive at your at your workstation and if you can automate it again with a wrapping tool nobody need even be involved at all effectively part of your production process is you wrap it you fire up the tool you click the two buttons programmatically you hit go and you close it again and repeat. And again, I personally have encountered situations where that's useful, and I'd like to, I would imagine I'm far from alone in it, so I just thought I'd mention these things do exist.

Starting point is 00:18:55 I suspect lots of people do use them, but for people who don't know they're there, very useful things to be able to do. Wrapping GUIs is, it's a bit tedious up front because often these tools aren't very well written. So you'll have checkbox one, checkbox four, checkbox 27, checkbox 295, and no obvious naming consistency with what they do or how they work. But once you've figured it out, let the computer worry, let the script worry about what those checkboxes do.

Starting point is 00:19:20 I've seen the backside of that code, where you're looking at some event handler and it's like, if checkbox24.checked, then do this. Like, what in the world? Who didn't want to name this? Because they've got to program against those names. That's insane. Well, they just do one at a time. Exactly. Yeah, yeah. So you're working on one feature and you go, oh, I need a checkbox. Oh, the default is checkbox 24. Then you do the callback handling and you just did it. So, you know, it's 24, and you don't want to bother changing it. That's cool. Does this necessarily have to have a user interface? I don't think these do web stuff. For web automation there are other tools. Well, I presume you could automate a browser, but I mean, by the time you're doing that, you might as well be using

Starting point is 00:20:14 the tools designed for it. Yeah, yeah, Selenium or something. What I'd really hope is that anybody who is writing any sort of tool on the web... Internal tools often get written in web frameworks, and then people forget to throw IDs on things. So yes, the best way to automate web stuff is to have an ID that you can grab onto, but often they're just these nested div nightmares. But anyway, yeah, there's a couple of tools that we've used pywinauto for, and it's pretty nice. Yeah, very nice. Yeah, it seems like if you're building a GUI app, you could test it with this, right? Sort of full-on integration tests from the outside. And also, I was talking

Starting point is 00:20:55 to somebody and they were like, well, this app that I work on doesn't have a concept of a back button. So you drive into the menu, hit a thing, go, and then it'll take you back home. It's like 10 steps, right? I could definitely see a little toolbar thing where you press a couple of buttons, like, get me to this scenario and I'll put the last thing in. Get me to that scenario, do the nine steps, I'll do the tenth. Exactly. Yeah, yeah. In many ways, the way I've mainly encountered it has been that first scenario I laid out: not so much automating the full use of the tool, but setting the tool up so that it is in the right state for what the company needs, without having somebody have to either consult a document and risk getting it wrong, or not know which of the settings they should have because that piece of paper isn't with the computer anymore. All that kind of thing. It shouldn't happen, but it does. And it's much easier to have the computer worry about what the settings should be.

Starting point is 00:21:45 Ideally, the program should remember that, but, you know, if they don't, they don't. There's not much you can do to change that after the fact. It's like external intelligence for a bad app. That's right. Well, there's also API stuff that people forget about. Like, I've got a device that I need to automate connecting to Windows

Starting point is 00:22:04 and getting the device set up or something every time I plug one in. And, you know, just automating that works sometimes too. So anyway. Oh, yeah. All right, Brian, over to you. Thanks. I saw this, Brett Cannon wrote an article

Starting point is 00:22:21 called A Reverse Chronology of Some Python Features. And I really love this article. It's pretty simple. One of the things I like about it is just because we cover so much and we've been covering Python releases for quite a while, I kind of forget which release got which feature. So a really brief rundown of some of the different features is nice. Like last week we were talking and saying,

Starting point is 00:22:50 well, if you're on 3.7, why would you want to move forward? And, you know, I can't remember which features are in which.

Starting point is 00:22:56 So having a quick bullet list helps. Like in 3.10, we got the match statement. Of course, we've talked about that recently, but also better error messages. And I'm going to pause a little bit. Brett brings up in the introduction

Starting point is 00:23:12 that if you're one of those people who think Python's getting bloated and they're throwing too much stuff in it, and you wish we had the good old days where you could just think about all of Python in your own head, well, you'd kind of throw everything out. He recommends going down this list and picking the first feature that you don't think you could live without.

Starting point is 00:23:38 And everything before that led to that, so you can't throw that stuff out either. It all kind of goes together. And one of the examples is the match statement, the pattern matching, which was sort of controversial. But the process to get that to work involved making a new parser for Python, or using a new parser for Python. And with that new parser, things like better error messages are possible. So if you like better error messages, which I do, that means 3.10 and everything below kind of has to stay. But anyway, it's kind of funny. Moving on,

Starting point is 00:24:19 I forgot that the dictionary support for or-equals (|=) came in in 3.9. So if somebody's thinking, well, why should I upgrade? This is a good list to take a look at. Nice. All right. I did the little exercise. I've decided 3.7. 3.7 if you want.
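For anyone who has not used the dictionary operators that arrived in Python 3.9, here is a short sketch. The variable names and values are made up for illustration.

```python
# Requires Python 3.9 or newer.
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000, "debug": True}

# `|` builds a new dict; on duplicate keys the right-hand side wins.
merged = defaults | overrides

# `|=` updates a dict in place, much like dict.update().
settings = dict(defaults)
settings |= overrides

print(merged)
```

On 3.8 and earlier the usual spelling is `{**defaults, **overrides}` or a call to `update()`.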

Starting point is 00:24:39 So what was the thing in 3.7 that you can't live without? So the dictionaries preserving order stuff is really nice for reading and writing files and making sure that they don't diff hard, you know what I mean? So they're in the order you put them there. All the other stuff, I'm not hating on it. Like, I like the walrus operator. I like some of the other things. I like the lowercase list[int] rather than importing from typing. All those are great. I'm not knocking them. I'm just saying, where would I go, oh, this starts to hurt? It really starts to hurt for me at 3.7 and below. Well, I was trying a Jupyter, an interactive

Starting point is 00:25:14 Jupyter system the other day, looking at some data science stuff, and it was already set up. And I tried to throw in the f-string value-equals thing to be able to quickly debug an item, and it didn't work. Too soon. What the heck? And it turned out it was using 3.7 and not 3.8. And apparently I'm very used to that, and I don't think I could live without it. Yeah. And then a reminder also that 3.11, when it comes out in a year, is going to have a lot of speedups. Yeah, if that comes with a lot of the performance stuff, then that's my new number.
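The f-string "value equals" feature mentioned here is the `=` specifier added in Python 3.8. A small sketch, with made-up variable names:

```python
# Requires Python 3.8 or newer; on 3.7 the `=` inside the braces is a SyntaxError.
total = 42
name = "Kim"
items = [1, 2, 3]

# `{expr=}` prints the expression text, an equals sign, then repr() of the value.
line = f"{total=} {name=}"
count = f"{len(items)=}"

print(line)   # total=42 name='Kim'
print(count)  # len(items)=3
```

Because the value is shown with `repr()`, strings come out quoted, which is usually what you want when debugging.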

Starting point is 00:25:49 Kim, where are you? If you forced me to roll back, I would refuse to go further than 3.6 because I must have those f-strings. Yeah. Because they basically just make your code so much more attractive. That said, while I don't necessarily use

Starting point is 00:26:04 everything that comes in the new versions, I don't particularly have any problem with them being there. I'm quite happy to just use the parts of Python I want. And what really happens to me is that I don't necessarily know I can do something until two versions later. I probably only started doing that val= on 3.9, for example. Mainly because that's probably the first time I needed it more than anything else. I don't particularly rush forward and use the new features when they're

Starting point is 00:26:27 available, but I'm glad that they're there when I do ultimately want them. Yeah. 3.6 is an interesting example that you bring up because it's got f-strings. It's got a whole bunch of other stuff too,

Starting point is 00:26:36 but really we can stop with f-strings. Pretty much. Yeah. Yeah, yeah. And then the debugging stuff, Sam in the audience says,

Starting point is 00:26:45 yes, f curly bracket name equals is indispensable for debugging. Oh, yeah, I'm with him. As I say, I hadn't used it when it first became available, but I would really not want to not have it available now. Yeah.

Starting point is 00:26:55 I'm a caveman print debugger. So, yeah. Kim, I like your take on it. Like, it's not going to hurt me if I don't care about it. I think one of the powers of Python is that you can be very successful with Python with a quite partial understanding of what it even is. You don't need to know what a generator is, what a yield is, what an expression is, what a class is, maybe not even how to create a function. Just write the

Starting point is 00:27:20 code top to bottom and it'll probably still do something for you. And so you can sort of bring these in when it makes sense. Yeah, I would definitely still not teach match statements to beginners. It's unnecessary. Exactly. Yeah. Totally agree. Whereas I would use f-strings, for example, for a beginner because it's just so much more readable than the other stuff is. But you're right.

Starting point is 00:27:40 You don't have to magically use it all because it's there. I'm sure there's people out there who feel like, I've got to use it, it's here. But no, I agree with you. All right. I don't think I've ever written a walrus operator, for example. Sorry, you were saying.

Starting point is 00:27:53 Yeah, I actually took down the Talk Python website or the training website, one of them, with the walrus operator, because I put the walrus operator in a utility script that's not actually used by the site, but the site scans all the files trying to figure out where the handlers, the view methods, are, and it killed it because I forgot that, this is way back, it was still running 3.7. So that was my first, really. Oh my

Starting point is 00:28:16 gosh. But yeah, now I use it. It's good. All right, so I want to talk about something that I've actually personally been working on lately. This is a follow-up to a Talk Python episode I did where I interviewed Mike Bayer, who came on, did a great job, talked about SQLAlchemy 2 and so on. And I mentioned that just the way that Python's GC is set up, it's somewhat hostile to things like ORMs, where they have to create a bunch of objects and return them to you in one batch. And what do I mean by that? Well, if I'm going to do a query and it's going to return a thousand records, like the best case scenario is it has to create a thousand classes, SQLAlchemy models, and give them back, right? If I'm asking for them as a list. Well, the way the GC

Starting point is 00:29:03 in Python works, not the reference counting, but the garbage collector, is after 700 allocations of container types, classes, dictionaries, lists, et cetera, that do not get cleaned up, 700 surviving the cleanups over a period of time, that's going to trigger a garbage collection. And so I said, ah, you know, like, is there something you could do? Is there something we could like kind of think about with ORMs? This is not at all specific to SQLAlchemy. This just happens. I have an example here called Python's GC and ORMs, as an app and a little conversation on GitHub.

Starting point is 00:29:34 And I said, is there something we maybe can do? Or have you guys thought about it? Because I'm not really sure what the answer is. And said, not so much. Sure. But here, check this out. So I created this app. It creates 100,000 records in both a SQLite database and a MongoDB database. So we have like two really different examples. And then you run a query that returns 20,000 records. It's probably a lot. Just in that sense, you've been in the next 100,000 records.

Starting point is 00:29:58 Yeah, if I didn't say that. 100,000 records in the database, and it gets 20,000 of them back. Okay? It's probably a little extra, but for example, if you go to, you go over to the talk Python training site over here, we've got a site map. And in this site map, there are many, many holding down the page down arrow and you can barely see it. We've got to get like 5,000 records, 6,000 records just to like list out the number of the pages that contain transcripts for the sitemap, right? So it's not entirely unreasonable that you would get a bunch of records back and then

Starting point is 00:30:33 do something like render a page with it, right? Well, under this scenario, if you just run straight Python, that single query results in 1,859 garbage collection runs just to get one answer back. Is that insane? None of which is garbage. Yeah, it's not garbage yet because it's just being realized from the database, right?

Starting point is 00:30:55 Like it hasn't even come into existence all the way yet. And it's just like garbage, garbage, garbage, garbage, garbage. And it takes 900 milliseconds. If you go and you tweak it in a way that I described here, which you may or may not want to do, but if you did, if you tweak the garbage collector, it will go from 1,800 collections to 29, 64 times less. The speed of the program is 23% faster. Okay, but it also uses less memory. Okay.

Starting point is 00:31:26 Less garbage collection. Less, lots less garbage collection. And it's not just 1800 versus 29. Python has this 100 to 10 to 1 ratio of Gen 0, Gen 1, and Gen 2 collections. And Gen 0 collections are pretty cheap because it just touches new memory and looks at it.

Starting point is 00:31:44 Gen 1 looks at like stuff that's only been inspected once. And Gen 2 inspects the entire memory space. For it to see, right? So this one will also trigger, what is that? 185. Yeah, 185 Gen 1. So 18 Gen 2s, right? So it's not just, oh, there's fewer.

Starting point is 00:32:04 There's also like this other 29 here. This is zero Gen 2 collections, very likely, right? So it's not just the number. They're also like cheaper than doing that. So this is pretty interesting. What do you got to do? You just say you run less frequently on allocations and then leave everything else alone.
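
The tweak described above, running less frequently on allocations and leaving everything else alone, can be sketched with the standard `gc` module. This is only an illustration: the helper name `raise_gen0_threshold` and the value 50,000 are assumptions, not necessarily what the sample app uses.

```python
import gc


def raise_gen0_threshold(new_threshold):
    """Raise only the first (allocation) threshold and return the old settings."""
    previous = gc.get_threshold()
    _, gen1, gen2 = previous
    # Keep the other two thresholds exactly as they were.
    gc.set_threshold(new_threshold, gen1, gen2)
    return previous


# 50_000 is an assumed value for illustration; measure before picking one.
previous = raise_gen0_threshold(50_000)
current = gc.get_threshold()
print("before:", previous, "after:", current)
```

Returning the previous thresholds makes it easy to restore them later with `gc.set_threshold(*previous)` if the change turns out not to help.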

Starting point is 00:32:23 Does it make a lot of sense for absolutely everything? Probably not. There's probably some scenario with lots of cycles where this is a problem. But anyway, this is an interesting thing to sort of consider if you are doing some kind of API or a website or something that queries a lot of data. Over 700 records, basically, you're going to absolutely incur GC when you know it's not garbage, right? So I don't know. I thought this was interesting. I'll put it out there for people to play with and get some feedback. It should be fun to hear about it. I think this is very interesting. And I plan on playing with the garbage collection myself, so I'm glad you have this little sample app thing up to start playing with it.

Starting point is 00:33:07 One of the things that you can do that a lot of people don't mess with too much is not slowing down the frequency, but you can disable it and enable it. And I'm not sure. I'd like to play with that a little bit more to see if you can kind of kick it off or something like that. You can disable and you can call gc.collect if you need to. So like it's there. I'm not sure if it makes sense to do it, but the switches are there. Yeah, I mean, you're not going to get real time with Python, but there's times where you know that you're not doing anything else, so garbage collection is fine. And there's times where you're doing an event and you really want to get done with this as fast as possible. So it might make sense to turn off GC.
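
The disable, enable, and collect switches mentioned here all live in the standard `gc` module. A minimal sketch, assuming a hypothetical `gc_paused` helper and a stand-in list of dictionaries in place of real query results:

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Turn the cyclic collector off for a block, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


with gc_paused():
    enabled_inside = gc.isenabled()
    # Stand-in for materializing a large batch of records.
    rows = [{"id": i} for i in range(20_000)]

# Kick off a full collection at a moment that suits the program.
unreachable = gc.collect()
```

Reference counting keeps working while the collector is paused, so only cyclic garbage waits for the explicit `gc.collect()` call.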

Starting point is 00:33:51 Sure. And for people who are not super focused on this, turning off garbage collection or altering garbage collection only affects a very small portion of Python's memory. Because the primary way is reference counting. So reference counting, things stop referring to it, it goes away. Only in the case where there are cycles does GC even apply, right? So unless you've got really interesting algorithms that are super focused on that kind of stuff, you know, you probably don't even have cycles, or very rarely do you. Yeah, interesting. It's not a one-size-fits-all solution, but where it does fit, it's a pretty simple thing to do that really makes a heck of a difference. Yeah, it's quite interesting. So my musings were, well, maybe someday Python will have an adaptive GC where it runs a certain number of

Starting point is 00:34:37 times and says, oh, I ran, but I didn't actually find any garbage, any cycles. So let me back off that threshold by a factor or two. And then I didn't find any garbage again. So I'll back it off. And then, oh look, I found a bunch. So now we've got to start doing this more frequently. And I could see like an adaptive garbage collector

Starting point is 00:34:52 tuning these numbers. But until then, I just cranked it up. Yeah. Interesting. All right. Yeah. Kim, you want to take us out of here for our main topics?
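
The earlier point that reference counting handles most objects and the collector only matters for cycles can be sketched as follows. This relies on CPython behavior, and the `Node` class is a hypothetical stand-in for any object that can point at another:

```python
import gc
import weakref


class Node:
    """Tiny container object that can reference another Node."""

    def __init__(self):
        self.other = None


gc.disable()  # leave only reference counting at work for the demo
try:
    # No cycle: the object is freed as soon as its last reference goes away.
    plain = Node()
    plain_ref = weakref.ref(plain)
    del plain
    freed_by_refcount = plain_ref() is None

    # A cycle: each node keeps the other alive, so reference counts never hit zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    survives_without_gc = cycle_ref() is not None

    # An explicit collection finds and frees the unreachable cycle.
    gc.collect()
    freed_by_gc = cycle_ref() is None
finally:
    gc.enable()

print(freed_by_refcount, survives_without_gc, freed_by_gc)
```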

Starting point is 00:35:01 Sure. The other topic I was going to talk about is a tool called Docker Slim, which basically is... It already sounds good. I don't know what it does, but the opposite of already is. I want my Dockers to be slim. Let's do it.

Starting point is 00:35:15 It's effectively, as far as I can tell, well, not quite magic, but it certainly seems like it. I use Docker quite extensively at work, and because I use a fair amount of it at work, I started using it for a lot of personal stuff as well. And the websites I deploy in my own writing, little things running my own systems are all in Docker containers. And unless you take a lot of care about it, your Docker images can end up quite large. If you start with just a Python in an Ubuntu base, for example, you're probably looking at about a gig of Docker image before you get anything done. Now, the way Docker works, unless you have just one of those things,

Starting point is 00:35:49 if you've got more than one, you start to benefit from shared layers. So you're not having a gig and another gig and another gig, et cetera, but still it all kind of adds up. Docker Slim is a tool to basically look at your existing images, do some analysis, and give you back a much smaller and in many ways much more secure image. I have run this. I reran it earlier today just to kind of check that I wasn't misremembering from the last time I used it, and I fed it an image I had, which was an incredibly simple, small little Flask API app I had written, and it had one job. Basically, whenever you sent anything to an endpoint, it printed out what that was. I forget exactly why I needed that. I think I was having trouble figuring out some

Starting point is 00:36:28 suppliers. It wasn't documented how some supplier's web was going to work. So basically, I set this up and I said, talk to me, and then looked at what it said. Exactly. That's way better than trusting their outdated, crappy, inconsistent documentation. It's just, all right, why don't you just call it? We'll just print out the JSON document. And then we'll go from there. So yeah, as a side note, that was quite an easy thing to do. But that was, I just put that into a Ubuntu-based container

Starting point is 00:36:55 running, I forget exactly what. Presumably, I was using FastAPI. So it would have been Python and Ubuntu and FastAPI, et cetera. And that was about a gig of image. I fed that to Docker Slim and I ended up with 48 megs. And it still worked. It did everything it was supposed to do.

Starting point is 00:37:11 Granted, I fed it the simplest thing I had. I mean, it had one endpoint and so forth. There's a lot of dependencies. There's Python, there's Flask, maybe there's even uWSGI or something running there. Who knows? But yeah, well, exactly. What it has done is it's closed down all sorts of other angles of attack.

Starting point is 00:37:28 It makes it sound a bit dramatic, but all sorts of ways that you could interface with the container that you don't necessarily need. It no longer has, for example, a... Bash is no longer available and you can't run it in interactive mode and talk to it, which is not necessarily a 100% positive thing. It makes debugging a bit harder, but they do have some solutions for how you can do that with side containers and talk to it in other ways and the like. And if you go through their documentation, effectively, they're doing all the security stuff

Starting point is 00:37:56 and the AppArmor stuff and all sorts of things that I know are important, but I don't know enough about to do right. I don't trust myself to do those things correctly. I can basically follow someone's suggestions, but I have absolutely no way of knowing if the suggestions I'm following are valid. I'm not immersed enough in this world to know what the best thing is to do. So I'm much happier to have somebody come along and say, we've written this tool, we get this stuff. We'll do the best we can to make it more secure. Even if it isn't 100% secure, it's far better than I was going to achieve on my own. And I haven't used it enough to give a 100% recommendation that this will fit every use case.

Starting point is 00:38:29 I'm sure like every tool, there's things that it does well, there's things that it doesn't do well. There's some use cases where it's maybe not so suited. But just from a little bit of experimentation with it, it looks like something I'm going to be inserting into my tool chain where I can, because the smaller the images are, the better, really. Especially if we're all working from home,

Starting point is 00:38:44 we're pulling these things down from servers that aren't actually in the building that you're in anymore. And if you're doing continuous deployment, which means pushing those actual images, then you want to build that quicker. Yeah, cool. Very nice. Yeah, one of the things that

Starting point is 00:39:00 Docker's used for that I think a lot of web people don't think about is cross-compiling. That's one of the places where Docker shows up. And it's one of the places I use it is to compile on a machine that I don't have access to. So I can have a Docker image, like I can have a Windows machine with a Linux Docker image or something, and I can do compiling in there. So slimming that down speeds up my compiles or I conceptually would. So I think this is something that definitely to try

Starting point is 00:39:29 if you're using that. Exactly. You've reminded me in a similar vein, Docker is the basis of our continuous integration systems. The ultimate end result is built inside a Docker container with all the bits we need. That can take quite a while because it can be quite large. You can slim that down as well.

Starting point is 00:39:48 The faster your CI is, the better for you, really. Yeah, always. Yeah, absolutely. All right, well, Brian, I think that might be it. Time for some extras. Oh, I do want to do a quick follow-up. I thought these were extras, but they're actually not. They're things that I do want to point out really quick.

Starting point is 00:40:03 I actually gave a talk on this whole memory thing. If that GC conversation sounds interesting to you over at the Python web conferences here. So people can check that out and also have a talk Python class that like dives into a whole bunch of this stuff. Nice. I meant to include that in the before thing. Now we're at the extras.

Starting point is 00:40:19 Let's talk about that. What do you got? Um, the only thing, one of the things I want to shout out is to everybody that supported the, the PyTest book. So pragmatic, pragmatic, if you just go to the main page, there is a bestsellers link that has had a Python testing with PyTest on it

Starting point is 00:40:41 for many weeks now in the top six. And I just wanted to thank everybody that supported the book and helped the success of this. Also, the feedback that I got from the technical reviewers, plus many other people going through and submitting errata, is going to make this a really solid book. And I'm really just happy to be part of a community to put this together. So thanks. Yeah, congratulations. That's awesome. Kim, you got anything extra you want to throw out there? A couple of small things I was hoping to mention if we had the time.

Starting point is 00:41:11 I see we've actually got MessWithDNS up on screen. This is a good place to start. I just wanted to mention this little website, MessWithDNS.net, from Julia Evans, who on Twitter is b0rk, and she produces a variety of excellent webzines and so forth. I think you've actually, you've discussed her Git learning webzines before. That's the one. Yeah. And I think there's an HR friendly one whose name I can't remember.

Starting point is 00:41:36 Oh, shucks. The memorable one. Yeah, exactly. She released something I got. Yeah. She released messwithdns.net recently as effectively a way to play with DNS without breaking your actual website, which isn't something I'd ever thought to look for. But now that it's around, it's actually a brilliant idea. There are some hard-to-understand things baked into DNS. And what is an A record and a CNAME? And if your TTL is a three-digit number versus a five-digit number, what's the difference, or for that matter, what does TTL mean? And it's not necessarily an explainer for all these things, but it is a way to make

Starting point is 00:42:11 these settings and see what they do without actually breaking a real website. So effectively she's spun up a subdomain with an assigned name. This one I happen to be on is goblin61.messwithdns.com. The worst you can do is break goblin61.messwithdns.com, and that will then just go away for the next person who comes along. So it's actually a really smart, really clever idea. Typical to Julia's thoroughness, she's got a series of experimental suggestions on the side.

Starting point is 00:42:36 Here are some things you can try. Here are some tutorials. How about making a CNAME? Or here are some weird things you can try. What happens if you've got a very long TTL? Or you convince three different DNS servers that your subdomain has three different IPs. Why you would do that is a mystery to me. But what would happen if you did is something you can explore with this site without actually breaking your real website.

Starting point is 00:42:55 And this seems like a very useful learning tool. Yeah, absolutely. Cool. I love it. That's fantastic. Two other small things I just wanted to mention. One, just because I use it all the time and I don't know how common knowledge it is, it is possible in Python, and I don't have a webpage to open for this, to run a small little web server. If you do python -m http.serve or .server, I've gone blank on which it is now, to be honest.

Starting point is 00:43:21 .server. .server, yeah. I'm reading your notes. I don't actually know. I'm just going back to the notes to have a look myself. That effectively fires up a web server in the directory you open it in and serves up the files that are there or the subdirectories that are in there. There's no security.
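
On the command line this is `python -m http.server`, optionally followed by a port number. The same module can also be driven from code. The sketch below is only an illustration: the helper name `serve_directory` and the temporary `hello.txt` file are made up for the demo, and it binds to localhost on a free port rather than exposing anything publicly.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory, port=0):
    """Serve `directory` on localhost in a background thread; port 0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "hello.txt").write_bytes(b"hello\n")
    server = serve_directory(tmp)
    host, port = server.server_address

    # Bypass any configured proxy so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/hello.txt") as response:
        body = response.read()

    server.shutdown()
    server.server_close()
```

As noted in the conversation, there is no security here, so this suits quick file transfers on a trusted network rather than anything public.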

Starting point is 00:43:37 There's no attractiveness. There's no styling. There's no anything of the sort. You wouldn't serve this to the public. But if you wanted to get a file off the machine, and I do this quite a lot to get files onto my phone, for example, firing a web server there and then and just pointing either a script or your own,

Starting point is 00:43:51 you know, just to send your browser to the local host with the port you gave it, and just download the files from there. It's a useful thing to be able to do. Yeah, that's a cool trick. Directory browsing, basically, yeah. Exactly. And then the final little extra I just wanted to talk

Starting point is 00:44:05 about, and this is a little more tongue-in-cheek somewhat, in both last week's Python Bytes and on recent Talk Python episodes, you have been speaking a little bit about different ways of doing Git. You were discussing doing all your Git on the CLI, and I think one of your

Starting point is 00:44:21 audience members at the last Python Byte suggested the way they do Git is just mash all the buttons they can find in VS Code. There is, I just want to put out there, there is a middle ground that you could be looking at. There's a tool called Magit, M-A-G-I-T, which is effectively, if you're an Emacs

Starting point is 00:44:38 user and you don't know Magit, you should change that immediately. Magit is effectively, to me, a brilliant, indispensable way of doing Git inside Emacs. Granted, it does mean you need to learn Emacs, but in just a couple of short years after that, you should be an expert at,

Starting point is 00:44:54 you should find Magit indispensable. So take a couple of years to learn Emacs. I'm not disputing that. But once you've got the Emacs down, Magit really is an excellent option to look at doing your Git with. So if you're tired of doing it on the CLI, just set some years aside,

Starting point is 00:45:07 learn yourself some Emacs, turn to Magit and then wonder how you ever did anything else. Set some years aside. I don't think that's fair to Emacs, but just a little bit of too much. I'll concede Emacs is a much longer learning curve than VI, but it's not years.

Starting point is 00:45:26 And I say this, I mean, yeah. Yeah, and Mario in the audience is taking credit for the VS Code button mashing. Right. Right on. Cool. Yeah, that's a great recommendation. All right, is that it for your extras? I should just point out in terms of

Starting point is 00:45:41 being unfair to Emacs, I've been using it for more than 20 years and I find it almost impossible to use anything else. But I'm sure it didn't take me years to learn. It's just been a long time. That's right. Well, all right. I got a few to throw out there. Actually, just one.

Starting point is 00:45:55 I made a comment, I think on the last show, Brian, about using emojis in my code. Yeah. So I wanted to bring that example up. So here's like a little CMS thing that I got going on. And if you return a collection, like themes are represented

Starting point is 00:46:08 by these little tags in the CMS. And if you return a collection, the comment has a list of emojis. And if you return, if they're just like

Starting point is 00:46:15 processing a single emoji, a single thing, you get that emoji. For pages, you get a list of page emojis and so on. Anyway, that's what I had

Starting point is 00:46:23 in mind when I talked about that. That's pretty cool to use. Yeah. You can sort of just scan through, go, oh, look, there's a list of these. This must be doing a bunch of stuff. I don't know. I could probably come up with something like a modifying.

Starting point is 00:46:32 I'm going to change a theme versus read a theme or something like that. Yeah. Anyway, well, that brings us to the laughs. And I hope you all enjoy Schadenfreude because it's bad this time. Thank you, Log4j. Okay, so let's see. First of all, this is not Schadenfreude, this is just something about cookies. My daughter yesterday gave me this candle. It has, website: we use cookies to improve our performance,

Starting point is 00:46:58 and then, me: same. I just eat cookies. I thought that was really just funny for like a tech candle. It should be a tin of cookies though. I know it should. It absolutely should. At least it should smell like cookies. It says scented. I have no idea what scent it is, but it better smell like websites. Maybe. Maybe. And then I just want to point out more practically, I have this add-on you can get for all the browsers. I don't care about cookies. And if it sees one of those cookie warnings, it'll try to click it and just accept it. Oh, this is indispensable.

Starting point is 00:47:28 That's brilliant. Absolutely. And then Brian Skin starts us off with the log4j stuff. So if you remember, if you're aware, log4j, the problem with log4j is if you try to log a piece of text, even as an argument,

Starting point is 00:47:44 if that text has JNDI colon LDAP colon slash slash to some Java library, instead of logging it, it will execute that Java stuff. Even if it's remotely on the internet, then it'll output the result of that, like you're hacked or whatever. Right. So we've all heard of little Bobby Tables, right? Here's the modern day one. Hi, this is your son's school. We're having computer trouble. Oh dear. Did he break something? Well, in a way. Did you really name your son, you know, dollar curly JNDI colon LDAP colon slash slash Evil Corp, parenthesis, parenthesis, Bobby? Oh yes, little Bobby Jindy, we call him. Well, we've got our servers crypto locked.

Starting point is 00:48:32 I hope you're happy. I hope you've learned to sanitize your Log4j inputs. Isn't that fantastic? Yeah, I have a feeling that this is going to go on. It'll be the next... it isn't Log4j, it'll be something else next year. Yeah. Well, I mean, it's been there for 10 years.

Starting point is 00:48:50 Exactly. It's not a new thing, unfortunately. It's not even a vulnerability. It's just, wait, you can actually do that on purpose? It's a feature. And Brian helpfully suggests this actually came from log4jmemes.com. So we got to go there for a second. Well, of course that exists. Of course. And oh my gosh, like look at this picture. So Brian,

Starting point is 00:49:11 will you describe this person for me on the screen? There's a person and a saying next to him. Old white guy. To me, he looks like a perfect sort of grandpa sort of character, right? Getting up there, probably 70. Nothing wrong with the guy, but it says upgrading Log4j three times wasn't that stressful. Dave, 28 years old. What else have we got in here? We've got... I wish that was outrageously funny and not just kind of truish, but yeah. I know. Here's like a 1940s looking picture, like a dad and some kids hanging around. Daddy, what did you do during the great war? The Log4Shell incident. Let's see. There's a few if you go in here. How many days

Starting point is 00:49:53 since such and such accident? Zero days without Log4j CVE. And there's like Homer running around with like a nuclear glowing stick. You can spend some time in this place. It's probably unhealthy. There's like a Grim Reaper just going through taking out technology, and it has Log4j on the Grim Reaper, you know? Let me see if I can find one more. There's some really good ones. This one is probably good. There's a picture of a guy in a tuxedo, says vendor, not vulnerable to Log4j. But there's a mirror and you see the back of him. His clothes are just all gone. It says uses EOL, yeah, J4, yeah. That one's pretty gross. I want to get that on the screen, but yeah, these are just fantastic here. So anyway, people can check out the memes. Thanks, Brian, for sending that in. Brian Skin. Yeah. I can say I am reminded, I did see one the other day, I don't know that I could put it up now,

Starting point is 00:50:45 but it's effectively that I'd just seen it in various other memes, a chap receiving an award from his manager. So, you know, me receiving an award from the manager

Starting point is 00:50:53 for not being vulnerable to the log4j vulnerabilities. And the inside thinking, that's mainly because I chose not to log in. I completely forgot to log anything. Exactly.

Starting point is 00:51:02 Oh, that's really good. Yeah, here's that tweet. Today, Java runs on billions of devices. It's not a statement of pride, but a statement of pure terror. All right. Well, I don't want to hit on Java too hard, but the Log4j, I just cannot believe somebody thought it's a fantastic idea to execute remote code that you cannot escape. From a logging system. Yeah, in a logging system. It's just, what did you think you would get? So here we are, yeah, with

Starting point is 00:51:32 log4jmemes.com, if you want to scroll through it. Let's back up and say somebody thought writing an application in Java was a good idea. No, sorry. I'll get hate mail for that one. Yeah, don't mail Brian. Don't email Brian. He knows. He knows. All right. Well, so Brian, that's it for the year, isn't it?

Starting point is 00:51:53 I mean, this is our last episode. We're going to take a little bit of time off. Yeah, some well-deserved time off. Yeah, absolutely. So thank you everyone for listening. Kim, thanks for coming to join us this time. Brian, as always, thank you. And we'll see everybody next year.

Starting point is 00:52:09 Yeah, see you next year. Thank you for having me, guys. That was brilliant. Yeah, you're welcome. And if you're out there and you still haven't filled out that form and given us our feedback, let us know.

Starting point is 00:52:17 The Google form link is at the top of the show notes. All right, bye. Cheers. Thanks for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. All right. Bye. the lookout for sharing something cool. If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live. That's usually happening at noon Pacific on Wednesdays over at YouTube. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.

Topics covered in this episode: Jupyter Games Canary Tokens A reverse chronology of some Python features Hyperactive GCs and ORMs/ODMs Extras Joke See the full show notes for this episode on the... website at pythonbytes.fm/264
