Python Bytes - #391 A weak episode
Episode Date: July 9, 2024
Topics covered in this episode: Vendorize packages from PyPI; A Guide to Python's Weak References Using weakref Module; Making Time Speak; How Should You Test Your Machine Learning Project? A Beginner’s Guide; Extras; Joke.
See the full show notes for this episode on the website at pythonbytes.fm/391
Transcript
Hello and welcome to Python Bytes where we deliver Python news and headlines directly
to your earbuds. This is episode 391, recorded July 9, 2024. And I am Brian Okken.
And I am Michael Kennedy.
This episode is sponsored by Code Comments, an original podcast from Red Hat. Listen to
their segment later in the show. You can connect with us on Mastodon, of course; the links are in the show notes.
We're all on Fosstodon, but you can get to us from any Mastodon instance. And if you're
listening to this later, you can join us live by going to pythonbytes.fm slash live to be part of
the audience, or if you just want to see the show later, it's all there. It's usually 10 a.m. Pacific time on Tuesdays,
but sometimes it changes.
But if you go to that live thing,
it'll tell you when the next one is.
And finally, I'd really like to thank a lot of the people
that have subscribed to the mailing list,
the newsletter.
If you go to pythonbytes.fm, there's a newsletter link.
And we'll send
you all of the links of the stuff we cover every week and we'll just send it to your
inbox.
So even if you miss an episode, you won't miss any of the goodness.
So speaking of goodness, what do you have for us, Michael?
Well, you know how we all love PyPI and the ability to go and just pip install a
thing and make our apps so
much more, right? Programming in Python becomes like clicking Lego blocks together, not algorithms class.
Yeah. You know what I mean? Which is amazing. However, there are issues that you run
into if you use third-party packages, not the least of which is you now probably should have
a virtual environment. You definitely require some
pip install commands. Just stuff along those lines, right? Just the management of setup before
you can even run your app. Plus any potential changes: if people don't pin their versions, then
you're at the whim of whatever can happen there. So what some people do is something called vendoring, or vendorizing, dependencies.
So, for example, if I depend on some functionality from requests, I don't know if that's a super good example, but let's say it is.
I could just download the source code of requests, stick it into my code and use it.
It's probably not a great example because it itself has a bunch of dependencies. But for stuff that's pure Python with no other dependencies, you could have it as a third-party package, or you could just stick that code
somewhere in your app and refer to it with a relative import within your app, right? Yeah.
And then people just pip install your package, pipx install your package, or even
just get a script, or a set of scripts, and run it. So there's this project from M. Williamson called python-vendorize that I want to talk about.
So it'll vendorize packages from PyPI, as I just described.
It allows pure Python dependencies to be vendorized.
That is, it'll copy them into your project.
It's best used for small, pure Python dependencies, to avoid version conflicts when other packages require different versions, and so on. So
what you do is you set up a vendorize.toml file, and in there you basically set up
which sub-module of your module you want it to go into, and then
which PyPI packages you want. So the example here is a hello package. They use a _vendor folder
(not dunder, just a single underscore). And then they say the packages are six. Once you run
it, it'll create that _vendor folder, and it'll put in the six dist-info, the
__init__.py, the six.py, all the stuff it needs to basically have that there. So then in your app,
you can say from ._vendor import six
rather than having an external dependency.
What do you think?
Kind of neat.
How do you keep up?
Yeah, I've got questions.
Like how do you keep up with updates and things like that?
Well, I believe that you just run a command line, right?
Just run python-vendorize in the directory.
And I don't know if it'll re-download it, but it will create
it, so worst case, delete the folder,
run it again, and then that'll
update it. So the whole point of this is
you want less change than normal.
You want to freeze it in place.
And things like six that don't really change.
Or have gotten kicked out of the
standard library.
Yeah, exactly. A lot of stuff that's
really super stable and pretty small, right? Because if you do something
that's got a bunch of dependencies, you've then got to start doing their dependencies, and then
it gets really wonky, right? Yeah, for small things. Yeah, actually, this is pretty cool. Yeah, neat.
Neat. Well, that's what I got for you for the first one. I got a bunch of other stuff, as you can see.
No, I like it.
There's projects that I've open sourced that it really wasn't intended for somebody
to actually use as a dependency.
It's like some example code
that happens to be pip installable,
but somebody would probably take it
and just copy it and run from there
and using something like Vendorize would. So yeah, for sure. Cool. I would like to talk about something
not as strong as this, but weaker.
Weak, so weak.
So, some weak references. This is an article from Martin Heinz, A Guide to Python's Weak References Using weakref Module. weakref is a
built-in standard library module, and I actually have never played with it. I kind of knew that
Python must have weak references, but I just didn't really explore it before now. And this is a great
introduction that talks about what they are. The term weak reference might
be new to somebody; I don't know, maybe new to you. It's a term that
we talk about in C++ and other things a lot. I use it a lot in C++,
using strong references and weak references. Python also has strong references and weak
references.
A strong reference is just sort of a copy of something.
But a weak reference is a way to point to something else,
but not muck up the garbage collection.
So this is a great article.
It starts out talking about garbage collection and how
weak references
and strong references affect that.
So why do we care?
Well, it is used in things like the logging module,
for instance.
So you have named loggers. This is a cool example:
you name a logger,
and then if you wanted another one of the same name, it might
be there, it might not. So it's a caching sort of thing, that's how logging uses it. But there are also ways to use it for trees, if you're building a data
structure where you might want bidirectional links between objects.
It shouldn't really be hard links in both directions, so one of those links should be
a weak reference. The link between a
parent and a child in a tree structure would be good like that. Or other things: he talks about
building an observer pattern, from the design patterns book,
using weak references. Just some really cool stuff. I don't know, I don't build a lot of data structures; there's enough
data structures in Python already. But especially if you're in a CS class or
you have some special needs for a data structure, weak references are built in and they might help
you a lot.
So yeah, they're pretty interesting. The only time I've really played with them
is for the Python memory course that I created at Talk Python, to understand it.
Because you want to look at stuff and see: we did this, it's alive; we did this, now it's garbage collected, or now it's reference-count deleted.
But if you have a pointer to it, then obviously it's never going away.
So weak references allow you to ask questions like that.
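To make the tree idea and the "is it still alive?" question concrete, here is a minimal sketch using only the standard library's weakref.ref. The Node class and its attribute names are made up for illustration and do not come from the article. The parent holds ordinary (strong) references to its children, while each child only holds a weak reference back up, so the back-link never keeps the parent alive.

```python
import gc
import weakref


class Node:
    """Tree node: strong references down to children, weak reference up to the parent."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent = None  # will hold a weakref.ref, never the parent itself

    @property
    def parent(self):
        # Calling a weakref.ref returns the target, or None once the target is gone.
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)


root = Node("root")
leaf = Node("leaf")
root.add_child(leaf)
alive_before = leaf.parent is root  # the weak link resolves while root is alive

del root      # drop the only strong reference to the parent
gc.collect()  # not required under CPython's reference counting; kept for other interpreters
alive_after = leaf.parent is not None  # the weak link now resolves to None
```

Asking the weak reference for its target is the "did it go away yet?" check: it answers the question without the check itself keeping the object around.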
I think you can do interesting stuff with caching too. For example, like if you've got a cache and you've handed out an instance of the object
and it's still alive and people are still using it, the parts of the app are still using
it, you could have a weak reference to it.
And if someone else asks for it, you can upgrade the weak reference to a strong
reference, right?
And hand that out again without recreating it.
But if no one's using it, it'll get cleaned up because a weak reference won't keep it
around.
So it's like sort of a self-managing cache
type of structure. It could be fun to make too.
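The self-managing cache described here could be sketched with weakref.WeakValueDictionary from the standard library. The Report class, the get_report helper, and the "q3" key are hypothetical names used only for this illustration. While some part of the program still holds the object, the cache hands the same instance back; once nobody holds it, the entry disappears on its own.

```python
import gc
import weakref


class Report:
    """Hypothetical expensive-to-build object."""

    def __init__(self, key):
        self.key = key


_cache = weakref.WeakValueDictionary()  # values are held weakly
builds = 0                              # counts how often a Report is really built


def get_report(key):
    global builds
    report = _cache.get(key)  # a strong reference, or None if already collected
    if report is None:
        builds += 1
        report = Report(key)
        _cache[key] = report
    return report


first = get_report("q3")
second = get_report("q3")     # still in use, so the cache hands out the same object
shared = first is second

del first, second             # nobody is using it any more
gc.collect()                  # not required under CPython's reference counting
evicted = "q3" not in _cache  # the entry cleaned itself up

third = get_report("q3")      # has to be rebuilt, so builds goes up again
```

The design choice is that the cache never decides when to evict: the lifetime of each entry is simply the lifetime of its last outside user.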
But that said, I was thinking just
like you, I don't usually make data
structures these days. Python's pretty much
got something for you.
Right, but you know,
people are building, well, there's some
third-party library data structures they use too
and they probably use weak references.
I just haven't poked into there to find out. Yeah, exactly.
Let someone else do cool stuff with it for us.
But I love the idea of
the logging
module that uses
named items, doing something
like a cache named item thing.
Yeah, very cool. Do you know what else is
cool? Code Comments from Red Hat.
Yeah. This episode is brought
to you by Code Comments,
an original podcast from Red Hat. You know, when you're working on a project and you leave behind
a small comment in the code, maybe you're hoping to help others learn what isn't clear at first.
Sometimes that code comment tells a story of a challenging journey to the current state of the
project. Code Comments, the podcast, features technologists who've been through tough tech transitions, and they share how their teams survived that journey. The host,
Jamie Parker, is a Red Hatter and an experienced engineer. In each episode, Jamie recounts the
stories of technologists from across the industry who've been on a journey implementing new
technologies. I recently listened to an episode about DevOps from the folks at Worldwide Technology.
The hardest challenge turned out to be
getting buy-in on the new tech stack
rather than using that tech stack directly.
It's a message that we can all relate to,
and I'm sure you can take some hard-won lessons
back to your own team.
Give Code Comments a listen.
Search for Code Comments in your podcast player
or just use our link,
pythonbytes.fm slash code dash comments.
The link is in your podcast player's show notes. Thank you to Code Comments for supporting the
show. This one is Making Time Speak from Prayson. Prayson's been on the show before, a friend of
the show and also a former guest co-host. The idea is it's a human-friendly way to refer to time. You might know about
things like, I think it's Arrow, that has a humanize thing that says, you know, five minutes
from now, or in 10 minutes, or just now, those kinds of things. But the way this one works is it talks
in a sort of colloquial way of saying the time. So you create a clock object and you give
it a language to use, like English, German, Swahili, Dutch, I think all those things. And
then you can ask it, you know, what is 11:15? It'll say quarter past 11, or a bunch
of different times. What is 7:29? Well, it says that in Swahili, which I can't,
I'm not going to get that right. But it'll convert time into spoken expressions
in multiple languages. Super easy to use, pure Python, so you could vendorize it,
I guess, and so on. It even has plugins.
So super easy to use if people want to check that out and play with it.
This is pretty fun.
Yeah.
I like it. Very simple.
But if, if you've got a use case like that,
you have a date time and you want it to say it
in a more human version, well, here you go.
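As a rough illustration of the idea only, here is a toy English-only sketch of turning a clock time into a spoken phrase. It is not the API of the library discussed, and the speak_time function and its phrasing rules are assumptions made up for this example; the real project handles multiple languages and much more.

```python
def speak_time(hour: int, minute: int) -> str:
    """Toy English-only sketch; NOT the API of the library discussed."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("hour must be 0-23 and minute 0-59")
    this_hour = hour % 12 or 12        # 0 and 12 are both spoken as "12"
    next_hour = (hour + 1) % 12 or 12  # used for the "... to" phrasings
    if minute == 0:
        return f"{this_hour} o'clock"
    if minute == 15:
        return f"quarter past {this_hour}"
    if minute == 30:
        return f"half past {this_hour}"
    if minute == 45:
        return f"quarter to {next_hour}"
    if minute < 30:
        return f"{minute} past {this_hour}"
    return f"{60 - minute} to {next_hour}"


spoken = speak_time(11, 15)  # the 11:15 example from the conversation
```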
All right.
Nice.
Yeah.
I am going to cover a topic that I get asked all the time.
So I talk about testing a lot.
And machine learning and AI is kind of a big thing now.
So I get questions like, how do I test machine learning projects?
And I've got to answer, I have no idea.
So I'm excited that somebody made an attempt at this.
Here is an article called How Should You Test Your Machine Learning Project?
The Beginner's Guide, by Francois Porcher.
It's published on the Towards Data Science blog. Anyway, kind of a cool intro. It talks about, you know, some of the simple stuff.
I mean, how do you test machine learning?
It's complex, but there are a lot of pieces that are pretty straightforward to test.
So, cool introduction, and it has a project.
This article also includes a repository
that you can play with directly, which is nice.
You just follow along with the code.
So this is doing, what is it doing?
It's essentials of testing with a machine learning pipeline
focusing on fine tuning BERT for text classification
on an IMDB dataset.
So that's just what he's using.
He's using pytest and pytest-cov,
which are awesome things to start with.
And so it kind of goes through
some of the easy stuff right away,
starting with some of the simple things. Like,
it has a clean text function.
So a function within the source that takes a string
and makes it all lowercase and strips it,
but it might do other things too.
But these are great examples.
In a lot of machine learning stuff,
you've got a lot of little helper functions along the way.
You may as well go test those,
and it'll get you in the habit of writing tests too.
And in this case, it's just giving some examples
of some random
text input and what the clean output should look like. And these are your expectations:
if I pop this data into this function, what should the output look like? So it's a great way
to get started. I personally would have put this in a parametrize, but I guess we're trying to
teach people slowly. These are really three test cases. They could be three test functions, but it works.
And so I'm referring to a test function that does a test for capital letters, stripping and
removing extra spaces, and how it should handle the empty string.
And this is actually a good point. One of the things I test in interviews a lot is
the edge cases for testing. So, like, what test examples are like derivative small cases that
you wouldn't possibly think about? And it's important to test those too. Like, what does
an empty string get cleaned as? Or, a good thing, like if I already had the word spaces in lowercase, how would that end up showing up in the output?
Things like that.
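A minimal sketch of what that could look like, kept to the standard library so it runs anywhere. The clean_text function below is a hypothetical stand-in, not the article's actual helper, and collapsing repeated inner spaces is an assumption about what "removing extra spaces" means. The cases live in one table; with pytest installed, the same table could be handed to @pytest.mark.parametrize("raw, expected", CASES) so each row reports as its own test.

```python
def clean_text(text: str) -> str:
    """Hypothetical stand-in for the article's helper: lowercase, trim, collapse spaces."""
    return " ".join(text.lower().split())


# Each row: (raw input, expected cleaned output). Edge cases sit next to the ordinary ones.
CASES = [
    ("Hello World", "hello world"),       # capital letters
    ("  padded   text  ", "padded text"), # leading, trailing, and extra inner spaces
    ("", ""),                             # empty string
    ("already clean", "already clean"),   # input that needs no change
]


def test_clean_text():
    # Plain loop over the table; pytest would collect this function as written.
    for raw, expected in CASES:
        assert clean_text(raw) == expected, f"failed for {raw!r}"


test_clean_text()  # also runnable directly, without pytest
```

Keeping the cases in a table makes adding another edge case a one-line change, which lowers the cost of writing down the odd inputs as you think of them.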
So good start.
And then jumps up to higher level things.
He talks about a larger chunk of the script.
So he's got a tokenize text function that uses a lot of sub-pieces; it uses the tokenizer with certain input.
And how do you test that?
Well, this is a great example of just figuring out
some example input,
and how you would expect it to be tokenized on the output,
looking at the length and the shape of the result.
And then, you know, making sure that
all values are, I don't know what this would be. Oh, he's making sure that all values are PyTorch
tensors. I don't even really know what that means. But, you know, thinking about what the output
should be, even if you don't know the specifics, you can have some way to describe how it should
sort of look. And these are good enough tests,
or they possibly are good tests to have anyway.
So I think this is a good starting point
to start a discussion on your team
for how to add testing to a machine learning project.
Yeah, it's interesting.
I really would have no idea how to test machine learning.
It seems like black box type stuff.
So yeah, this is a lot more to work with than I would have come up with, I think.
Yeah.
Just getting started, taking a chunk out of it, and then where to go from there.
So after you kind of have a sense of some of the easy stuff, some of the middle level
stuff of testing examples and shapes and whatnot, what's left?
Well, that's where there's a quick introduction to how code coverage works,
looking at what the rest of your code is doing
that maybe you want to add tests to,
or maybe those are the things
that you manually test or something.
So anyway.
Yeah, excellent.
Sounds good.
Well, those are our items.
Do you have any extras for us this week?
I am clean out of extras.
Clean out?
Well, don't worry, I'll make up for it for you.
Okay.
So, wonderful news from Authy.
You know Authy, the 2FA password thing that you can get for multi-factor authentication?
It's super nice because, you know,
some of the apps are locked to one platform, like Google Authenticator.
You lose your phone or you have to reformat or something.
Sorry.
Good luck now.
You know, there's no syncing, things like that.
But with Authy, you have an account.
It syncs it across your different devices.
One device can authenticate another if you want to add a new one.
It's really nice.
Except now Authy
is urging users to stay alert after 33 million phone numbers were leaked. How? Well, there's an
unauthenticated API endpoint, and apparently it would return an error that would indicate whether
the phone number that you passed in
to try to authenticate with was valid or invalid. Like, sorry, that phone number doesn't exist,
or sorry, wrong password, something like that, I think, is the deal. And so somebody just hammered
it with, you know, every phone number combination they could think of and recorded the results when
it said that phone number exists. And we know that Authy has it, and we know that you have 2FA, and all
of these things. So from what I could tell, no real information about people was stolen, but given
that they know you have 2FA and they know that this is your phone number, they
could start sending you all sorts of spoof things, social engineering type things, right?
Yeah. Well, and Authy recently cancelled their desktop apps.
You know, Authy being, uh, Twilio, the parent company, cancelled their desktop apps. It just seems like
it's really in kind of a state of disrepair and lack of love, and a lack of confidence from Michael
at this point. So I went through the super fun experience of resetting about 30 different 2FA logins.
And boy, oh boy, I learned some things, Brian.
I've learned that some companies make it super easy to reset. Because my thought was, if this is, you know, what else potentially has
happened, I'm going to revoke all of my 2FA logins and set new secret
keys that will generate new passwords.
So even if they were able to get a hold of
everything in my account, that stuff doesn't work anymore effectively, right? That was my plan.
And it took like six hours or something, five hours. You go to different places and you'll see
some of them will let you, some have an awesome button, reset 2FA. Here's a QR code you scan.
Boom, you're good to go. Others say your Google Authenticator is enabled.
Like what?
I don't have a Google Authenticator.
There's like 50 apps that are 2FA apps.
T-Mobile and like 10 of the other ones say,
use your Google Authenticator here.
Like, no, it is not.
It's like, use your Internet Explorer 6 here.
Like, no, there are other browsers.
Please don't just say, use your Google Authenticator, right?
But you can just go, yep, this is my Google Authenticator.
It's called something else, and it doesn't come from Google.
But sure enough, I'm going to set this up, right?
And like Christopher out in the audience here, that is my next recommendation. Well, if not Authy, what?
Because Google Authenticator is garbage. Like I said, if your phone gets messed up, you've lost all
logins forever. There's no sync, at least last time I used
it; there's no way to sync it or export it or any of that stuff.
Right? That's bad news. So, Bitwarden. Bitwarden is awesome.
This is a premium feature. So you have to have the premium
version of Bitwarden, which is $10 a year or 80
cents a month or something. Like, yeah, fine. That seems fair. But Bitwarden's cool.
It's open source, multi-platform. You just scan stuff or enter the code that they give
you for the 2FA, and off it goes. And because it has a browser plugin, you can just click on your
name when it says type in your 2FA code. You don't have to go pull it up. You just click the button
and boom, it autofills it, which is great. I don't put it in my 1Password because
I'm just not ready to say my 2FA logins and my passwords are all stored behind one single
platform, because then your 2FA is kind of toast if somebody breaks into that. So,
Bitwarden for 2FA, 1Password for logins, for me at the moment. What do you think? Well, I'm using, maybe I shouldn't tell people,
but yeah, I'm using Authy.
So are they, they're still supporting it, aren't they?
They're not supporting it on,
they used to have a desktop app.
They don't have that anymore.
They have an iPad and iOS and Android app.
Since you have an Apple Silicon one,
you can run the iPad version on your Mac,
just like a desktop app. So it kind of feels the same, except
the keyboard behaves weirdly and stuff, right? Because it doesn't expect you to have a
keyboard. Maybe you're using it a lot more than I am, but it doesn't bother me to
run it on my phone. But yeah, well, I mean, it doesn't bother me either, but I've got, there's like a
bunch of different apps that I have. For example, the credit card front end system for Talk Python
courses. It has a remember me button. It never remembers me. Never. It has 2FA. It never
remembers the 2FA. So even if I say remember me, 20 minutes later I'm putting in the password
and the 2FA, and then 20 minutes later I'm putting in the password and the 2FA again.
I'm like, ah. So there's a few places like that that just constantly ask for the
2FA, um, digital issues a little bit like that.
Like, every single time you're putting in the 2FA, there's not like a
trust this device sort of thing.
And so I ended up, I'm probably in, used to be in Authy, now in Bitwarden,
I'm probably in that like five times a day, at least, every day.
So anyway, so one final thing before I move off of this.
After all of this, they said, I don't know how this helps.
Doesn't seem like it should help, but somehow they said, as part of our recommendation to
users, it's very important that you upgrade to the latest version of Authy.
Why?
Because the endpoint depends on that? I don't understand.
But anyway,
it says you must. And then if you go and look at the upgrade, it says you must get version 26.1.0. What does it say here? Bug fixes. Not: this is an important security update
and you need to update because we're trying to protect your privacy. They're hiding behind bug
fixes, and it's disgraceful, right? This is bad. So all these things taken together, I'm like, you know what?
It may be safe.
It may be not, but I'm out.
Like this is not where my, my important things live.
So.
Okay.
Yeah.
And OFAC also out there says, hey, if you're okay with recommending paid services, which
I am, 1Password is what I migrated to from Authy.
Yeah.
1Password is awesome.
But like I said, I have my logins in 1Password.
So I put my 2FA in Bitwarden. All right. Woo. That was a long extra. That should have been a thing, right?
Extra. Remember a while ago, I did this article unsolicited advice for Mozilla and Firefox, like
your good citizen AI project probably won't save Mozilla. It probably needs
more than that. And it won't really probably help Firefox either. So let's do some things that help
them because I'm in principle a fan of them.
Well, I said one of the things, and my main recommendation was like a privacy-focused Google Docs.
Well, they didn't do it.
But Proton, the Proton Mail people, just came up with a Google Docs equivalent, but with end-to-end encryption and no AI training and all the kinds of things you would like about your data without the negatives.
So if you have Proton, there's now collaborative docs with it, which is kind of cool.
And it looks pretty.
I think it looks pretty nice.
So just want to give that a shout out.
Do you know, I probably don't want to ask, are Google Docs open to scanning for AI?
Do you know? I believe that the freeware versions are, the free versions,
but the business ones maybe not.
I think your business workspace stuff is not open to that,
but your personal Gmail is open to scanning for ads and stuff,
whereas the business one isn't.
So that's the price you pay.
Last thing is the Code in a Castle thing I'm doing
on October 5th. The early bird discount closes tomorrow, if you listen to this right when it
comes out. So if you're interested, please check it out. It'd be super awesome to spend a week
hanging out in Tuscany, doing all sorts of things together. And yeah, that's
it for my extras, Brian. That's a lot of extras, Tony. I mean, Michael. Yeah.
All right.
All right.
Shall we have a joke?
Yeah.
This joke I called I Lied.
And it's like a cartoon.
It's a woman behind an esoteric programming sort of thing.
She's got a gun.
It says, I lied.
I don't have Netflix.
Take off your shoes.
We're going to learn Rust.
I just thought about, like, all this Rust energy, like we're
converting that to Rust.
We're rewriting that in Rust.
It's just like, you're doing Rust.
That's what the world's doing.
Sit down.
So she invited someone over to, like, watch Netflix.
Like, I lied.
I don't have Netflix. Take off your shoes.
We're going to learn Rust.
I thought it would catch the zeitgeist.
Well, right.
It's bizarre, man. It's amazing. Okay.
And maybe I'll put that as the chapter art for this one, because the picture,
the eyes are amazing. The desperation. That's pretty good. Yeah, yeah.
I've resisted Rust so far. I mean, I'm happy that things are getting faster and
whatnot, but I haven't learned it yet. We'll see. Yeah, same. All right. Well, cool. Nice episode.
Thanks for joining us today. You bet. Fun as always. Bye, everyone.