Python Bytes - #391 A weak episode

Episode Date: July 9, 2024

Topics covered in this episode: Vendorize packages from PyPI; A Guide to Python's Weak References Using weakref Module; Making Time Speak; How Should You Test Your Machine Learning Project? A Beginner's Guide; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/391

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 391, recorded July 9, 2024. And I am Brian Okken. And I am Michael Kennedy. This episode is sponsored by Code Comments, an original podcast from Red Hat. Listen to their segment later in the show. You can connect with us on Mastodon; the links are in the show notes. We're all on Fosstodon, but you can reach us from any Mastodon server. If you're listening to this later, you can also join us live by going to pythonbytes.fm/live and being part of the audience. Or if you just want to see the show later, it's all there. It's usually 10 a.m. Pacific time on Tuesdays,
Starting point is 00:00:48 but sometimes it changes. If you go to that live page, it'll tell you when the next one is. And finally, I'd really like to thank all the people who have subscribed to the mailing list, the newsletter. If you go to pythonbytes.fm, there's a newsletter link, and we'll send
Starting point is 00:01:05 you all of the links for the stuff we cover every week, right to your inbox. So even if you miss an episode, you won't miss any of the goodness. So speaking of goodness, what do you have for us, Michael? Well, you know how we all love PyPI and the ability to just go pip install a thing and make our apps so much more, right? Programming in Python becomes like clicking Lego blocks together, not algorithms class. Yeah. You know what I mean? Which is amazing. It's amazing. However, there are issues that you run
Starting point is 00:01:37 into if you use third-party packages, not the least of which is that you now probably should have a virtual environment. You definitely require some pip install commands, just stuff along those lines, right? Just the management of setup before you can even run your app. Plus, if people don't pin their versions, you're at the whims of whatever changes might happen there. So what some people do is something called vendoring, or vendorizing, dependencies. For example, if I depend on some functionality from requests (I don't know if that's a super good example, but let's say it is), I could just download the source code of requests, stick it into my code, and use it. It's probably not a great example, because requests itself has a bunch of dependencies, but for stuff that's pure Python with no other dependencies, you could have it as a third-party package, or you could just stick the code of that
Starting point is 00:02:28 somewhere in your app and refer to it with a relative import within your app, right? Yeah. And then people just pip install your package, pipx install your package, or even just get a script or a set of scripts and run it. So there's this project from M. Williamson called Python Vendorize that I want to talk about. It vendorizes packages from PyPI, as I just described: it allows pure Python dependencies to be vendorized, that is, copied into your project. It's best used for small pure Python dependencies, to avoid version conflicts with other packages that require different versions, and so on. So what you do is set up a vendorize.toml file, and in there you basically set up what your
Starting point is 00:03:14 module is, which submodule you want the code to go into, and then which PyPI packages you want. The example here is a hello package. So they create a _vendor folder (not a dunder, just a single underscore), and then they say the packages are six. Once you run it, it'll create that _vendor folder and put the six dist-info, the __init__.py, the six.py, all the stuff it needs, in there. So then in your app you can say from ._vendor import six rather than having an external dependency. What do you think?
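To make the end result concrete, here is a minimal, self-contained sketch of what a vendored layout amounts to: a package that carries its own copy of a small dependency in `_vendor/` and reaches it with a relative import. It does not run Python Vendorize itself; `greetlib` is a hypothetical stand-in for a small pure-Python dependency like six, and the files are written to a temporary directory only for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> None:
    """Write a tiny 'hello' package that carries its own copy of a dependency."""
    pkg = root / "hello"
    vendor = pkg / "_vendor"
    vendor.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (vendor / "__init__.py").write_text("")
    # The vendored copy: the dependency's source dropped into _vendor/.
    # 'greetlib' is a hypothetical stand-in for a small pure-Python package.
    (vendor / "greetlib.py").write_text(
        "def greet(name):\n    return f'Hello, {name}!'\n"
    )
    # App code reaches the copy through a relative import; no pip install needed.
    (pkg / "app.py").write_text(
        "from ._vendor import greetlib\n"
        "\n"
        "def main():\n"
        "    return greetlib.greet('Python Bytes')\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_project(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        app = importlib.import_module("hello.app")
        message = app.main()
    finally:
        sys.path.remove(tmp)

print(message)  # Hello, Python Bytes!
```

Updating the vendored code is then just a matter of replacing the files under `_vendor/`; the app's import line never changes.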
Starting point is 00:03:50 Kind of neat. Yeah, I've got questions. Like, how do you keep up with updates and things like that? Well, I believe you just run a command, right? Just run python-vendorize in the directory. I don't know if it'll re-download it, but it will create it, so worst case, delete the folder,
Starting point is 00:04:10 run it again, and that'll update it. The whole point of this is that you want less change than normal. You want to freeze it in place. And things like six, that don't really change. Or that have gotten kicked out of the standard library. Yeah, exactly. A lot of stuff that's
Starting point is 00:04:26 really super stable and pretty small, right? Because if you do something that's got a bunch of dependencies, you've then got to start vendoring their dependencies, and then it gets really wonky, right? Yeah, for small things. Actually, this is pretty cool. Neat. Well, that's my first one; I've got a bunch of other stuff, as you can see. No, I like it. There are projects I've open sourced that really weren't intended for somebody to actually use as a dependency. It's like some example code
Starting point is 00:04:56 that happens to be pip installable, but somebody would probably just copy it and run from there, and something like Vendorize would help with that. So yeah, for sure. Cool. I would like to talk about something not as strong as this, but weaker. Weak, so weak. Weak references. This is an article from Martin Heinz, "A Guide to Python's Weak References Using weakref Module". weakref is a built-in standard library module, and I've actually never played with it. I kind of knew that
Starting point is 00:05:34 Python must have weak references, but I hadn't really explored them before now, and this is a great introduction to what they are. The term weak reference might be new to you. It's a term we talk about a lot in C++ and other languages; I use strong references and weak references a lot in C++. Python also has strong references and weak references. A strong reference is just a normal reference: as long as one exists, the object stays alive. A weak reference is a way to point to something else,
Starting point is 00:06:11 but not muck up the garbage collection. So this is a great article. It starts out talking about garbage collection and how weak references and strong references affect it. So why do we care? Well, weak references are used in things like the logging module,
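The strong-versus-weak distinction fits in a few lines of standard-library code. This is a minimal sketch (the `Session` class is just an illustrative name): `weakref.ref` points at an object without keeping it alive, and `weakref.finalize` registers a callback that runs once the object is reclaimed. The "freed right away" behavior relies on CPython's reference counting; other implementations may collect later.

```python
import weakref


class Session:
    """Any ordinary class instance can be weakly referenced."""

    def __init__(self, name):
        self.name = name


events = []

s = Session("demo")
ref = weakref.ref(s)                             # does not keep s alive
weakref.finalize(s, events.append, "collected")  # runs when s is reclaimed

print(ref() is s)  # True: calling the weak reference returns the live object
del s              # drop the only strong reference
print(ref())       # None in CPython: the object was freed immediately
print(events)      # ['collected']
```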
Starting point is 00:06:38 for instance. This is a cool example because you have named... oh, where was it? Anyway, some named logging objects (sorry, it's been a rough weekend): you name one, and then if you want another one with the same name, it might be there, it might not. So it's a caching sort of thing; that's how logging uses it. But you can also use them for things like trees. If you're building a data structure where you want bidirectional links between objects, they shouldn't really be hard links in both directions. So one of those links should be
Starting point is 00:07:22 a weak reference; the link between a parent and a child in a tree structure would be a good candidate. He also talks about building an observer pattern, from the Design Patterns book, using weak references. Just some really cool stuff. I don't build a lot of data structures; there are enough data structures in Python already. But especially if you're in a CS class or you have some special needs for a data structure, weak references are built in and they might help you a lot. So yeah, they're pretty interesting. The only time I've really played with them is for the Python memory course that I created at Talk Python.
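Here is a minimal sketch of both use cases, written from scratch rather than taken from the article: a tree whose child-to-parent link is weak (the parent owns its children strongly), and an observer-pattern subject that holds its observers in a `weakref.WeakSet` so subscribing doesn't keep them alive. Class names are illustrative, and the immediate cleanup shown relies on CPython's reference counting.

```python
import weakref


class TreeNode:
    def __init__(self, value, parent=None):
        self.value = value
        self.children = []  # strong links: a parent keeps its children alive
        # Weak link upward, so child -> parent doesn't form a strong cycle.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # The parent, or None if there is none or it has been freed.
        return self._parent() if self._parent is not None else None


class Subject:
    """Observer pattern: the subject does not keep its observers alive."""

    def __init__(self):
        self._observers = weakref.WeakSet()

    def subscribe(self, observer):
        self._observers.add(observer)

    def notify(self, event):
        for observer in list(self._observers):  # snapshot before calling out
            observer.update(event)

    def __len__(self):
        return len(self._observers)


class Recorder:
    def __init__(self):
        self.events = []

    def update(self, event):
        self.events.append(event)


root = TreeNode("root")
child = TreeNode("child", parent=root)
assert child.parent is root
del root             # nothing strong points at the root any more
print(child.parent)  # None in CPython

subject = Subject()
recorder = Recorder()
subject.subscribe(recorder)
subject.notify("saved")
print(recorder.events)  # ['saved']
del recorder
print(len(subject))     # 0: the dead observer dropped out on its own
```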
Starting point is 00:08:06 Because you want to look at stuff and see: we did this, it's alive; we did this, now it's garbage collected, or now it's been deleted by reference counting. But if you hold a reference to it, then obviously it's never going away. So weak references allow you to ask questions like that. I think you can do interesting stuff with caching, too. For example, if you've got a cache and you've handed out an instance of an object, and it's still alive and parts of the app are still using it, you could keep a weak reference to it. And if someone else asks for it, you can upgrade the weak reference to a strong reference, right?
Starting point is 00:08:37 And hand that out again without recreating it. But if no one's using it, it'll get cleaned up, because a weak reference won't keep it around. So it's sort of a self-managing cache. It could be fun to make, too. That said, just like you, I don't usually make data structures these days. Python's pretty much
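The self-managing cache described here maps neatly onto `weakref.WeakValueDictionary`, which is also what CPython's logging module uses for its name-to-handler map. Below is a minimal sketch, assuming a hypothetical `Report` class as the expensive object: while any caller holds an instance, lookups return that same instance; once nobody does, the entry disappears and the next lookup rebuilds it (immediately, under CPython's reference counting).

```python
import weakref


class Report:
    """Hypothetical stand-in for an object that is expensive to build."""

    builds = 0

    def __init__(self, key):
        self.key = key
        Report.builds += 1


class WeakCache:
    """Shares live instances, and forgets them once nobody uses them."""

    def __init__(self, factory):
        self._factory = factory
        self._items = weakref.WeakValueDictionary()  # values are held weakly

    def get(self, key):
        obj = self._items.get(key)  # a strong reference again, if still alive
        if obj is None:
            obj = self._factory(key)
            self._items[key] = obj
        return obj

    def __len__(self):
        return len(self._items)


cache = WeakCache(Report)
a = cache.get("q3")
b = cache.get("q3")
print(a is b, Report.builds)  # True 1: the second caller got the live instance
del a, b                      # nobody is using it any more
print(len(cache))             # 0 in CPython: the entry vanished with the object
c = cache.get("q3")
print(Report.builds)          # 2: rebuilt on demand
```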
Starting point is 00:08:54 got something for you. Right, but there are some third-party data structure libraries people use, too, and they probably use weak references. I just haven't poked into them to find out. Yeah, exactly. Let someone else do cool stuff with it for us. But I love the idea of
Starting point is 00:09:09 the logging module using named items, doing a named-item cache sort of thing. Yeah, very cool. Do you know what else is cool? Code Comments from Red Hat. Yeah. This episode is brought to you by Code Comments,
Starting point is 00:09:25 an original podcast from Red Hat. You know, when you're working on a project and you leave behind a small comment in the code, maybe you're hoping to help others learn what isn't clear at first. Sometimes that code comment tells a story of a challenging journey to the current state of the project. Code Comments, the podcast, features technologists who've been through tough tech transitions, and they share how their teams survived that journey. The host, Jamie Parker, is a Red Hatter and an experienced engineer. In each episode, Jamie recounts the stories of technologists from across the industry who've been on a journey implementing new technologies. I recently listened to an episode about DevOps from the folks at World Wide Technology. The hardest challenge turned out to be
Starting point is 00:10:08 getting buy-in on the new tech stack rather than using that tech stack directly. It's a message that we can all relate to, and I'm sure you can take some hard-won lessons back to your own team. Give Code Comments a listen. Search for Code Comments in your podcast player or just use our link,
Starting point is 00:10:23 pythonbytes.fm/code-comments. The link is in your podcast player's show notes. Thank you to Code Comments for supporting the show. This one is Making Time Speak, from Prayson. Prayson's been on the show before; he's a friend of the show and also a former guest co-host. The idea is a human-friendly way to refer to time. You might know about things like, I think it's Arrow, which has a humanize feature that says, you know, five minutes from now, or in 10 minutes, or just now, those kinds of things. But the way this one works is that it says the time in a colloquial, spoken way. So you create a clock object and you give it a language to use, like English, German, Swahili, Dutch, I think all of those. And
Starting point is 00:11:14 then you can ask it, you know, what is 11:15? It'll say quarter past 11, or a bunch of different times. What is 7:29? Well, it says that one in Swahili, which I'm not going to get right, but it'll convert time into spoken expressions in multiple languages. It's super easy to use and pure Python, so you could vendorize it, I guess, and so on, even with plugins. So if people want to check it out and play with it, this is pretty fun. Yeah.
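The library's own API isn't spelled out here, so this is not it. It's a hypothetical, English-only sketch of the core idea: turning an hour and minute into the phrase a person would actually say, with "quarter past", "half past", and "quarter to" as special cases.

```python
_HOURS = ["twelve", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten", "eleven"]
_SMALL = ["zero", "one", "two", "three", "four", "five", "six", "seven",
          "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
          "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]


def _number_words(n: int) -> str:
    """Spell out 1..29, the only minute counts this sketch ever needs."""
    if n < 20:
        return _SMALL[n]
    return "twenty" if n == 20 else f"twenty-{_SMALL[n - 20]}"


def speak_time(hour: int, minute: int) -> str:
    """Hypothetical helper: 24-hour clock time -> colloquial English."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("expected 0 <= hour < 24 and 0 <= minute < 60")
    this_hour = _HOURS[hour % 12]
    next_hour = _HOURS[(hour + 1) % 12]
    if minute == 0:
        return f"{this_hour} o'clock"
    if minute == 15:
        return f"quarter past {this_hour}"
    if minute == 30:
        return f"half past {this_hour}"
    if minute == 45:
        return f"quarter to {next_hour}"
    if minute < 30:
        unit = "minute" if minute == 1 else "minutes"
        return f"{_number_words(minute)} {unit} past {this_hour}"
    remaining = 60 - minute  # count down to the next hour
    unit = "minute" if remaining == 1 else "minutes"
    return f"{_number_words(remaining)} {unit} to {next_hour}"


print(speak_time(11, 15))  # quarter past eleven
print(speak_time(7, 29))   # twenty-nine minutes past seven
```

A multi-language version would swap the word tables and phrase templates per language, which is presumably where a plugin mechanism earns its keep.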
Starting point is 00:11:43 I like it, very simple. But if you've got a use case like that, where you have a datetime and you want it said in a more human way, well, here you go. All right. Nice. Yeah. I am going to cover a topic that I get asked about all the time.
Starting point is 00:11:57 So I talk about testing a lot. And machine learning and AI are kind of a big thing now. So I get questions like, how do I test machine learning projects? And I have to answer, I have no idea. So I'm excited that somebody made an attempt at this. Here is an article called "How Should You Test Your Machine Learning Project? A Beginner's Guide" by Francois Porcher, published on the Towards Data Science blog. It's a cool intro that talks about, you know, some of the simple stuff.
Starting point is 00:12:30 I mean, how do you test machine learning? It's complex, but there are a lot of pieces that are pretty straightforward to test. So it's a cool introduction. The article also includes a repository that you can play with directly, which is nice; you can just follow along with the code. So what is it doing? It covers the essentials of testing a machine learning pipeline,
Starting point is 00:12:58 focusing on fine-tuning BERT for text classification on an IMDb dataset. So that's what he's using. He's using pytest and pytest-cov, which are awesome things to start with. And it goes through some of the easy stuff right away, starting with simple things like
Starting point is 00:13:20 a clean_text function: a function within the source that takes a string, makes it all lowercase, and strips it, though it might do other things too. These are great examples. In a lot of machine learning code, you've got a lot of little helper functions along the way. You may as well go test those,
Starting point is 00:13:39 and it'll get you in the habit of writing tests, too. In this case, it's just giving some examples of random text input and what the clean output should look like. These are your expectations: if I pop this data into this function, what should the output look like? So it's a great way to get started. I personally would have used a parametrized test, but I guess we're trying to teach people slowly. These are really three test cases. They could be three test functions, but it works. I'm referring to a test function that covers lowercasing capital letters and
Starting point is 00:14:14 removing extra spaces, and how it should handle the empty string. And this is actually a good point. One of the things I ask about in interviews a lot is edge cases for testing: the degenerate small cases that you might not think about. It's important to test those too. Like, what does an empty string get cleaned to? Or if I already had the word "spaces" in lowercase, how would that show up in the output? Things like that. So, good start. And then it jumps up to higher-level things.
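Brian's "I would have parametrized it" suggestion looks roughly like this. It's a sketch with a hypothetical `clean_text` (lowercase, trim, collapse whitespace), not the article's code, and to stay dependency-free it uses the standard library's `unittest.subTest`, which reports each failing case separately much as pytest's parametrization does. The table includes the edge cases mentioned: the empty string and input that is already clean.

```python
import unittest


def clean_text(text: str) -> str:
    """Hypothetical helper: lowercase, trim, and collapse runs of whitespace."""
    return " ".join(text.lower().split())


class TestCleanText(unittest.TestCase):
    # (input, expected) pairs: one table instead of one test function per case.
    CASES = [
        ("Hello World", "hello world"),                  # capital letters
        ("  too    many   spaces ", "too many spaces"),  # extra whitespace
        ("", ""),                                        # empty-string edge case
        ("spaces", "spaces"),                            # already clean: unchanged
    ]

    def test_clean_text(self):
        for raw, expected in self.CASES:
            # Each case is reported on its own if it fails.
            with self.subTest(raw=raw):
                self.assertEqual(clean_text(raw), expected)
```

Saved as `test_clean.py`, it runs with `python -m unittest test_clean`. With pytest, the same `CASES` table would feed `@pytest.mark.parametrize("raw, expected", CASES)` on a plain test function.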
Starting point is 00:14:53 He talks about a larger chunk of the script. He's got a tokenize_text function that uses a lot of sub-pieces, calling the tokenizer with certain input. How do you test that? Well, this is a great example of figuring out some example input and how you would expect it to be tokenized on the output, then looking at the length and the shape of the result. And then, you know, making sure that all values
Starting point is 00:15:27 are... I don't know what this would be. Oh, he's making sure that all values are torch tensors. I don't even really know what that means. But, you know, think about what the output should be: even if you don't know the specifics, you can have some way to describe how it should sort of look. And these are good enough tests, or possibly good tests to have anyway. So I think this is a good starting point for a discussion on your team about how to add testing to a machine learning project.
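The article's version runs a real BERT tokenizer and checks torch tensors. As a dependency-free illustration of the same idea (assert on length, shape, and element type rather than exact token ids), here is a sketch built around a hypothetical toy whitespace tokenizer. It mimics the `input_ids` / `attention_mask` keys of a Hugging Face tokenizer's output; the function names are made up for this example.

```python
def toy_tokenize(texts, max_length=8, pad_id=0):
    """Hypothetical whitespace tokenizer that pads/truncates to max_length.

    Mimics the dict-of-batches layout (input_ids, attention_mask) of a
    Hugging Face tokenizer's output; ids come from a throwaway vocab.
    """
    vocab = {}
    input_ids, attention_mask = [], []
    for text in texts:
        # Ids start at 1 because 0 is reserved for padding.
        ids = [vocab.setdefault(word, len(vocab) + 1)
               for word in text.lower().split()[:max_length]]
        pad = max_length - len(ids)
        input_ids.append(ids + [pad_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def check_tokenized_shape(texts, max_length):
    """Shape-style test: assert on structure, not on exact token ids."""
    out = toy_tokenize(texts, max_length=max_length)
    for key in ("input_ids", "attention_mask"):
        batch = out[key]
        assert len(batch) == len(texts), key                      # one row per input
        assert all(len(row) == max_length for row in batch), key  # fixed width
        assert all(isinstance(v, int) for row in batch for v in row), key
    # Real tokens are marked 1 in the mask, padding positions 0.
    for ids, mask in zip(out["input_ids"], out["attention_mask"]):
        assert all((i != 0) == bool(m) for i, m in zip(ids, mask))


check_tokenized_shape(["A great movie!", "Terrible.", ""], max_length=4)
```

With a real tokenizer, the `isinstance(v, int)` line would become a check that each value is a tensor of the expected shape.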
Starting point is 00:15:59 Yeah, it's interesting. I really would have no idea how to test machine learning. It seems like black box type stuff. So yeah, this is a lot more to work with than I would have come up with, I think. Yeah. Just getting started, taking a chunk out of it, and then where to go from there. So after you kind of have a sense of some of the easy stuff, some of the middle level stuff of testing examples and shapes and whatnot, what's left?
Starting point is 00:16:21 Well, that's where it gives a quick introduction to how code coverage works: looking at what the rest of your code is doing that maybe you want to add tests to, or maybe those are the things that you test manually or something. So anyway. Yeah, excellent. Sounds good.
Starting point is 00:16:39 Well, those are our items. Do you have any extras for us this week? I am clean out of extras. Clean out? Well, don't worry, I'll make up for it. Okay. So, wonderful news from Authy. You know Authy, the 2FA app that you can use for multi-factor authentication,
Starting point is 00:16:57 super nice because, you know, so many of the devices, or some of the apps, are locked to one platform, like Google Authenticator. You lose your phone, or you have to reformat or something? Sorry. Good luck now. You know, there's no syncing, things like that. But with Authy, you have an account. It syncs across your different devices.
Starting point is 00:17:19 One device can authorize another if you want to add a new one. It's really nice. Except now Authy is urging users to stay alert after 33 million phone numbers were leaked. How? Well, there's an unauthenticated API endpoint, and apparently it would return an error indicating whether the phone number you passed in was valid or invalid. Like, sorry, that phone number doesn't exist, or sorry, wrong password, something like that, I think, is the deal. And so somebody just hammered
Starting point is 00:17:54 it with, you know, every phone number combination they could think of and recorded the results whenever it said that phone number exists. So now they know this number is on Authy, that you have 2FA, and all of these things. From what I could tell, no real information about people was stolen, but given that they know you have 2FA and that this is your phone number, they could start sending you all sorts of spoofed messages, social engineering type things, right? Yeah. Well, and Authy recently cancelled their desktop apps. You know, Authy, or rather Twilio, the parent company, cancelled their desktop apps. It just seems like it's really in a state of disrepair, a lack of love, and a lack of confidence from Michael
Starting point is 00:18:38 at this point so i went through the super fun experience of resetting about 30 different 2FA logins. And boy, oh boy, I learned some things, Brian. I've learned that some companies make it super easy to reset. Cause my thought was like, if this is, you know, what else potentially has happened, I'm going to revoke all of my 2FA logins and set new secret keys that will generate new passwords. So even if they were able to get a hold of everything in my account, that stuff doesn't work anymore effectively, right? That was my plan. And it took like six hours or something, five hours. You go to different places and you'll see
Starting point is 00:19:15 some of them have an awesome button: reset 2FA, here's a QR code to scan, boom, you're good to go. Others say your Google Authenticator is enabled. Like, what? I don't have Google Authenticator. There are like 50 2FA apps. T-Mobile and like 10 of the other ones say, use your Google Authenticator here. Like, no, it is not.
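Why does resetting help? Authenticator-app codes are typically TOTP values: an HMAC of a shared secret (the thing the QR code carries) and the current 30-second time window. A new secret makes every code derived from the old one useless. Below is a minimal standard-library sketch of RFC 6238 TOTP on top of RFC 4226 HOTP, for illustration only; individual services may use different parameters, and production code should rely on a vetted library.

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP where the counter is the current time window."""
    return hotp(key, int(unix_time // step), digits)


# RFC 6238 test secret (ASCII "12345678901234567890"); at t=59 the 8-digit code is 94287082.
rfc_key = b"12345678901234567890"
print(totp(rfc_key, 59, digits=8))  # 94287082

# "Resetting 2FA": a fresh random secret, which a site would show base32-encoded in the QR code.
new_secret_b32 = base64.b32encode(secrets.token_bytes(20)).decode()
new_key = base64.b32decode(new_secret_b32)
print(totp(new_key, time.time()))  # unrelated to codes from the old secret
```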
Starting point is 00:19:40 It's like saying, use your Internet Explorer 6 here. Like, no, there are other browsers. Please don't just say, use your Google Authenticator, right? But you can just go, yep, this is my Google Authenticator. It's called something else, and it doesn't come from Google, but sure enough, I'm going to set this up, right? And like Christopher out in the audience is asking here, which brings me to my next recommendation: well, if not Authy, what? Because Google Authenticator is garbage. Like I said, if your phone gets messed up, you've lost all
Starting point is 00:20:10 logins forever. There's no sync; at least last time I used it, there was no way to sync it or export it or any of that stuff, right? That's bad news. So, Bitwarden. Bitwarden is awesome. This is a premium feature, so you have to have the premium version of Bitwarden, which is $10 a year, or 80 cents a month or something like that. Yeah, fine. That seems fair. But Bitwarden's cool. It's open source and multi-platform. You just scan the QR code or enter the code they give you for the 2FA, and off it goes. And because it has a browser plugin, when it says type in your 2FA code, you don't have to go pull it up. You just click the button and boom, it autofills it, which is great. I don't put it
Starting point is 00:20:40 in my 1Password, because I'm just not ready to have my 2FA logins and my passwords all stored behind one single platform, because then your 2FA is kind of toast if somebody breaks into that. So, Bitwarden for 2FA and 1Password for logins, for me, at the moment. What do you think? Well, maybe I shouldn't tell people, but yeah, I'm using Authy. They're still supporting it, aren't they? They're not supporting it on...
Starting point is 00:21:14 they used to have a desktop app. They don't have that anymore. They have iPad, iOS, and Android apps. If you have an Apple Silicon Mac, you can run the iPad version on it just like a desktop app. So it kind of feels the same, except the keyboard behaves weirdly and stuff, right? Because it doesn't expect you to have a keyboard. Maybe you're using it a lot more than I am, but it doesn't bother me to run,
Starting point is 00:21:39 run it on my phone. Yeah, well, it doesn't bother me either, but there are a bunch of different apps I use. For example, the credit card front-end system for Talk Python courses has a remember-me button. It never remembers me. Never. It has 2FA, and it never remembers the 2FA. So even if I say remember me, 20 minutes later I'm putting in the password and the 2FA, and then 20 minutes after that I'm putting the password in again. I'm like, ah. So there are a few places like that that just constantly ask for the 2FA. DigitalOcean is a little bit like that.
Starting point is 00:22:13 Every single time, you're putting in the 2FA; there's no trust-this-device sort of thing. So I end up in Authy, now Bitwarden, probably five times a day, at least, every day. So anyway, one final thing before I move off of this. After all of this, they said (I don't know how this helps; it doesn't seem like it should, but somehow), as part of our recommendation to users, it's very important that you upgrade to the latest version of Authy.
Starting point is 00:22:41 Why? Because the endpoint... I don't understand. But anyway, it says you must. And then if you go and look at the upgrade, it says you must get version 26.1.0, and what do the release notes say? "Bug fixes." Not "this is an important security update, and you need to update because we're trying to protect your privacy." They're hiding behind "bug fixes," and it's disgraceful, right? This is bad. So, all these things taken together, I'm like, you know what? It may be safe, it may not be, but I'm out.
Starting point is 00:23:08 This is not where my important things live. So. Okay. Yeah. And OFAC, also out in the audience, says, hey, if you're okay with recommending paid services, which I am, 1Password is what I migrated to from Authy. Yeah. 1Password is awesome.
Starting point is 00:23:22 But like I said, I have my logins in 1Password, so I put my 2FA in Bitwarden. All right. Woo. That was a long extra. That should have been a main item, right? Next extra. Remember a while ago I did this article, unsolicited advice for Mozilla and Firefox? Like, your good-citizen AI project probably won't save Mozilla. It probably needs more than that, and it probably won't really help Firefox either. So let's do some things that help them, because in principle I'm a fan of them. Well, one of the things I said, my main recommendation, was a privacy-focused Google Docs. Well, they didn't do it.
Starting point is 00:23:53 But Proton, the Proton Mail people, just came out with a Google Docs equivalent, with end-to-end encryption, no AI training, and all the kinds of things you would like about your data, without the negatives. So if you have Proton, there are now collaborative docs with it, which is kind of cool. And I think it looks pretty nice. So I just want to give that a shout-out. Do you know, and I probably don't want to ask, are Google Docs open to scanning for AI? Do you know? I believe the free versions are, but the business ones maybe not.
Starting point is 00:24:31 I think your business Workspace stuff is not open to that, but your personal Gmail is open to scanning for ads and stuff, whereas the business one isn't. So that's the price you pay. Last thing is the Code in a Castle event I'm doing on October 5th. The early-bird discount closes tomorrow, if you listen to this right when it comes out. So if you're interested, please check it out. It'd be super awesome to spend a week hanging out in Tuscany, doing all sorts of things together. And yeah, that's
Starting point is 00:25:00 it for my extras, Brian. That's a lot of extras, Tony. I mean, Michael. Yeah. All right. All right. Shall we have a joke? Yeah. I call this joke "I lied." It's a cartoon: a woman behind an esoteric programming sort of thing.
Starting point is 00:25:18 She's got a gun. It says, "I lied. I don't have Netflix. Take off your shoes. We're going to learn Rust." I was just thinking about all this Rust energy, like, we're converting that to Rust, we're rewriting that in Rust.
Starting point is 00:25:30 It's just like, you're doing Rust. That's what the world's doing. Sit down. So she invited someone over to, like, watch Netflix: "I lied. I don't have Netflix. Take off your shoes. We're going to learn Rust."
Starting point is 00:25:41 I thought it would catch the zeitgeist. Well, right. It's bizarre, man. It's amazing. Okay, and maybe I'll put that as the chapter art for this one, because the picture... the eyes are amazing, the desperation. That's pretty good. Yeah, yeah. I've resisted Rust so far. I mean, I'm happy that things are getting faster and whatnot, but I haven't learned it yet. We'll see. Yeah, same. All right, well, cool. Nice episode. Thanks for joining us today. You bet. Fun as always. Bye, everyone.
