Python Bytes - #327 Untangling XML with Pydantic
Episode Date: March 13, 2023
Topics covered in this episode: pydantic-xml extension; How virtual environments work; DbDeclare; Testing multiple Python versions with nox and pyenv; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/327
Transcript
Hello and welcome to Python Bytes,
where we deliver Python news and headlines directly to earbuds.
This is Episode 327,
recorded March 13th, 2023, and I am Brian Okken.
I am Michael Kennedy.
This week's episode is sponsored by Compiler Podcast from Red Hat.
Listen to their spot later in the show and connect with the show on
Fosstodon at @pythonbytes@fosstodon.org.
And both Brian and Michael are there also:
@brianokken and @mkennedy.
You can also join us on YouTube or join us live by going to pythonbytes.fm slash live to be a part of the audience.
It's really kind of fun.
Usually it's Tuesdays at 11.
This week it's Monday, but usually it's Tuesdays at 11.
And you can watch older videos on the YouTubes as well.
So thanks, Michael, for showing up again this week.
We've got quite a few episodes under our belt.
So are you excited to get started?
We do.
I am.
You know, technology can be a tangled mess sometimes.
And not long ago, we spoke about Untangle.
And then, I believe it was Ian sent in and said, you know, that was really cool.
Yeah, it was Ian.
Thank you.
He said, I know you're a huge fan of Pydantic.
It's true.
And maybe you want to check out something that is similar to Untangle, which would let you talk to XML through Python in an object-oriented way, a little more dynamic.
So he sent in the pydantic-xml extension.
Have you heard of this, Brian?
No.
No, I hadn't either.
It's totally news to me. But the idea is basically, you know, the way Pydantic traditionally works is you point it at a JSON file or a Python dictionary, and it can create an object graph,
a hierarchy of all the pieces that it knows. So you can say it has a name and a number, but then it also
has a list of locations, and the locations are modeled as Pydantic objects, and so on. And that's
how Pydantic has worked from day one, more or less.
It's based on dictionaries because that's the way that you speak to APIs.
And so it was very closely tied to APIs and JSON exchange there.
So this one does basically the same thing, but for XML.
And it's glorious.
It's glorious with the data validation, the required versus optional,
the type conversion, all of those things. It supports dictionaries, lists, sets, tuples, unions.
It has lxml parser support for high-speed processing. You can pass in an element tree
as well, which is the xml.etree.ElementTree class, which allows you to do parsing traditionally.
So how do you get going?
Well, you create a class. With pure Pydantic you derive from BaseModel.
Here you derive from BaseXmlModel.
So it's slightly different, but that's fine.
And check this out.
In the XML document that they're talking about here, the top level node has a thing called status,
an attribute called status in XML.
Okay.
And yeah, or the product does, it's a part of it anyway.
It has two possible valid values.
It can either be running or in development.
It can't be ran or prod or anything else,
it has to be one of those two words.
So because it's Pydantic,
you can just say the type of this
is a Literal of "running" and "development".
Isn't that awesome?
And that's it.
Like, you're done validating
that that is correct.
And you set that equal to an attr,
which means it's not coming from the body
of the XML node,
but it's coming from this attribute named status down here.
So cool, huh?
Then you can have launched, which is a
numerical date. So the running ones have 2023 and 2019 as launched, but the one that's in development,
well, it doesn't have a launch date, so it's missing. So the optional aspect of Pydantic
is at play here. And then there's a title for that element. You say it's a
string, and it just comes straight out of the body of the node
because it's not set to an attribute,
it's just the base one.
I guess, presumably, you can only have one of those per node.
Okay.
Is title special or can you name it whatever you want then?
You can name it whatever you want, I'm pretty sure.
Okay.
Yeah.
Oh, yeah, it says extracted from the element text.
Nice.
Okay.
Yeah, yeah, exactly.
And so then the overall XML document, I had it reversed when I first started talking about
this.
There's a company and the company has products, right?
So there's a company class.
It has a trade name from its attribute, SpaceX in this case.
And then it has a node, website, which has the URL as its text value.
And the text value is https://spacex.com, right?
And so you can say the type is a URL and
it'll actually parse it out as a URL, not just a string, which is really cool. And then in standard
Pydantic style, it has a list of products and you give it the tag name, which is product. The node
name is product and it just loops through that list. Isn't that a clever way to parse that with
validation and data conversion and all that? Not only that, I'm really glad you walked me through it because
the first time I looked at this, I was a little bit like lost on how to think about this and
how it's building it up from different components and attributes and elements. It's pretty neat.
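A rough sketch of the mapping just described, written by hand with the standard library's xml.etree.ElementTree rather than with pydantic-xml itself. The sample document, the Product and Company dataclasses, and the parse_product and parse_company helpers are all made up for illustration; pydantic-xml would express the same attribute, text, and list mappings declaratively on the model instead of in parsing code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical sample document shaped like the one described in the episode.
XML = """
<Company trade-name="SpaceX">
    <WebSite>https://spacex.com</WebSite>
    <product status="running" launched="2019">Product A</product>
    <product status="running" launched="2023">Product B</product>
    <product status="development">Product C</product>
</Company>
""".strip()

ALLOWED_STATUS = {"running", "development"}


@dataclass
class Product:
    status: str              # from the "status" attribute, two allowed values
    launched: Optional[int]  # from the optional "launched" attribute
    title: str               # from the element text


@dataclass
class Company:
    trade_name: str          # from the "trade-name" attribute
    website: str             # from the text of the WebSite child node
    products: List[Product]  # one entry per <product> child


def parse_product(node: ET.Element) -> Product:
    status = node.get("status")
    if status not in ALLOWED_STATUS:
        raise ValueError(f"invalid status: {status!r}")
    launched = node.get("launched")
    return Product(
        status=status,
        launched=int(launched) if launched is not None else None,
        title=(node.text or "").strip(),
    )


def parse_company(xml_text: str) -> Company:
    root = ET.fromstring(xml_text)
    website = (root.findtext("WebSite") or "").strip()
    parts = urlparse(website)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a URL: {website!r}")
    return Company(
        trade_name=root.attrib["trade-name"],
        website=website,
        products=[parse_product(p) for p in root.findall("product")],
    )


company = parse_company(XML)
for product in company.products:
    print(product.title, product.status, product.launched)
```

Every check here is code you would have to write and maintain yourself, which is the work the model declaration is meant to take over.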
Yeah. If I've got to do XML again, I'm all over this. So there's a
bunch of stuff about how you handle heterogeneous collections, aliases, union types,
models. Go through it if you want. But I think, you know, this little quick getting-started
that they have right at the top of the website that I'm linking to, that's pretty good.
Yeah. Nice. Cool. Very good. Yeah. Anyway, that one's a great one. Thanks, Ian, for sending that in. I'm psyched to know about it.
Well, next, I kind of want to talk about virtual environments.
So I use venv, the virtual environment module built into Python.
In the past, I've used the virtualenv package that you can install separately.
But for, I don't know, quite a few versions of
Python, the built-in one's been pretty darn good. So I'm happy with it. Anyway, there are a lot of people
who don't really get how they work. Getting people on board with the idea that
they should use them is great, but then there's using them effectively. One of the mistakes I've
seen a lot of people make with virtual environments is
that when they go to test in CI, they try to activate the virtual environment, and you don't
really have to. You can just use the binaries directly. And so I'm really happy this article
is around. Brett Cannon wrote an article called How Virtual Environments Work.
And this is excellent.
And it's a short read.
It starts with a little history, not a lot of history, just a little, to remind people that back in the day, we had global and the working directory, your current directory.
And that's it.
There wasn't
anything else. And I kind of remember this: if I'm sharing some code, or
trying to find some on the web, then just downloading it and sticking it in my directory to
see if it works. It's just part of your code now. That's not what we have today, partly
thanks to virtual environments. So it's better now. You can still complain about them, that's fine,
but it's better now.
And then he goes on to talk about the structure.
So, and it's really, there's really not much there.
I mean, when you're building a virtual environment,
it's kind of a lightweight throwaway thing.
Don't think of it as this huge thing.
It's just a little directory and it's got a bin
and an include and the site-packages directory for the Python that you're using.
And on Windows, it's a little different, but we'll just hand wave around that.
In the Unix environment, it's mostly symbolic links.
I mean, you do have stuff installed there,
but as far as replicating the Python environment,
your Python interpreter isn't copied in there.
It's symlinked.
So you don't have to worry about that too much.
It's the site-packages and the bin and everything and how that's there.
So how does Python deal with that?
Well, it deals with it through a pyvenv.cfg.
It's a config file that tells Python, when you run Python from this virtual environment,
where the home directory should be, whether or not to include system
packages in the site-packages, and then the version and the executable and some other
stuff like the command, if you wanted to recreate it.
I don't know why that's there.
But in general, this is enough to tell Python that if you just run it from that environment, you get all the right stuff.
And so if you're putting it in a script, just use those.
But if you're using it from the shell, then, of course, you're going to activate the environment.
But the activation, he's stressing, and this is important to understand, is optional. You don't have to hit activate as long as you're calling stuff within the environment.
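A small sketch of that point, using only the standard library: it creates a throwaway environment, reads its pyvenv.cfg, and then runs the environment's own interpreter directly, with no activation step. The helper names read_pyvenv_cfg and env_python are invented for this example, and the exact set of keys in pyvenv.cfg varies by Python version, so only the long-standing ones are relied on here.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir: Path) -> dict:
    """Parse the simple 'key = value' lines of pyvenv.cfg into a dict."""
    cfg = {}
    text = (env_dir / "pyvenv.cfg").read_text(encoding="utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def env_python(env_dir: Path) -> Path:
    """Path of the interpreter inside the environment (layout differs on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    # with_pip=False keeps creation fast; symlinks mirrors the usual POSIX layout.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

    cfg = read_pyvenv_cfg(env_dir)

    # No "activate" here: calling the environment's interpreter is enough.
    result = subprocess.run(
        [str(env_python(env_dir)), "-c",
         "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    in_venv = result.stdout.strip()

print("home:", cfg.get("home"))
print("include-system-site-packages:", cfg.get("include-system-site-packages"))
print("child interpreter sees a virtual environment:", in_venv)
```

In a CI script the same idea usually shows up as calling `.venv/bin/python -m pytest` rather than sourcing an activate script first.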
And he goes on to talk about what it's really doing. What does the
activation do, though, if you're curious? It doesn't do much. It sticks some stuff in your path,
it sets a VIRTUAL_ENV
environment variable,
and it registers a deactivate shell function.
And that's about it.
It changes your prompt too,
to let you know that you've activated it,
which is cool.
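To make that list concrete, here is a minimal sketch of the two environment changes in Python. The function name activated_environ is hypothetical, and this only imitates the PATH and VIRTUAL_ENV parts; the deactivate function and the prompt change belong to the shell and are left out.

```python
import os
from pathlib import Path


def activated_environ(env_dir, base_environ):
    """Return a copy of base_environ changed the way activation changes it.

    Only the two variables mentioned above are touched: the environment's
    bin (or Scripts) directory goes to the front of PATH, and VIRTUAL_ENV
    points at the environment directory.
    """
    env = dict(base_environ)  # never mutate the caller's mapping
    bin_dir = Path(env_dir) / ("Scripts" if os.name == "nt" else "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = os.pathsep.join([str(bin_dir), old_path]) if old_path else str(bin_dir)
    env["VIRTUAL_ENV"] = str(env_dir)
    return env


base = {"PATH": "/usr/bin"}
env = activated_environ("/tmp/demo/.venv", base)
print(env["PATH"])
print(env["VIRTUAL_ENV"])
```

A dictionary like this can be passed as `env=` to `subprocess.run` when a child process expects an "activated" shell, though calling the environment's interpreter by path is usually simpler.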
And then he goes on to talk about how,
partly why he's dug into this lately is because for VS Code,
they're creating a little tiny extra extension, but you can use it anywhere you want, called microvenv.
Microvenv, I don't know.
And this is a single file, less than 100 lines, to kind of emulate all of that.
And the reason is because Debian doesn't include the virtual environments by default.
So they kind of have to work around that.
So anyway, really great summary of virtual environments.
Yeah, peeling away a little bit of the magic, letting you know what's happening in there, right?
Yeah, well, and also because it's sort of magical to some people, a lot of people are concerned about, like, trying to copy it or something.
You shouldn't have anything kept
within your virtual environment that's important. You should be able to recreate it whenever
you want, so it should be a lightweight thing. Oh, the one thing I really
wanted to highlight, and the reason why I really wanted to talk about this, was
because of a flag. So where's that flag? There's a flag, dash dash, I've got to find
it. Do you know what I'm talking about? Anyway. Okay, I'll help you search. There's
like, yeah, without pip: --without-pip. Okay, so --without-pip is an excellent thing to know about
because, oh, here it is, venv --without-pip. That will get it so that it doesn't ask you to
upgrade pip. So especially in CI and other places, you don't care about upgrading it right
now. I mean, I get it. If I'm in development mode, I do want to upgrade it.
I want to use the latest one.
But in a CI environment or a lot of automated places,
I don't need to do that.
I can just use whatever's there.
It's going to be fine.
So turning that off is awesome.
And it saves some time.
It isn't really "don't install pip" or "don't upgrade".
It just doesn't try.
So it assumes pip's already there, is all.
It uses the system pip.
So that's it.
Cool.
That's cool.
Yeah, it just falls back to the global one,
but runs it for that environment.
Yeah.
And apparently, it saves a lot of time.
So that's great.
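A quick way to see what the flag does on your own machine, sketched with the standard library's venv module, where with_pip=False is the counterpart of --without-pip. As far as the venv documentation describes it, this simply skips bootstrapping pip into the new environment, so the check below looks for pip launchers in the environment and expects none. The helper name pip_entries is made up for this example.

```python
import os
import tempfile
import venv
from pathlib import Path


def pip_entries(env_dir: Path) -> list:
    """Names of any pip launchers in the environment's bin/Scripts directory."""
    scripts = env_dir / ("Scripts" if os.name == "nt" else "bin")
    return sorted(p.name for p in scripts.iterdir()
                  if p.name.lower().startswith("pip"))


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    # Equivalent in spirit to: python -m venv --without-pip .venv
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    found = pip_entries(env_dir)

print("pip launchers in the new environment:", found)
```

Installing into such an environment then has to go through a pip that lives somewhere else, which matches the "falls back to the global one" description above.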
Very cool.
Very cool.
Well, before we move on, our sponsor.
Oh, yeah.
Let's cover our sponsor.
And I really, really appreciate Red Hat and the Compiler podcast for sponsoring this episode.
So just like you, both Michael and I are big fans of podcasts and really happy to share a new one from a highly respected open source company.
Compiler is an original podcast from Red Hat. Compiler brings together a curious team of Red Hatters
to simplify tech topics and provide insight
for a new generation of IT professionals.
The show covers topics like
what are the components of a software stack?
Are big mistakes that big of a deal?
And do you have to know how to code
to get started in open source?
Compiler closes the gap between those
who are new to
technology and those behind the inventions and services shaping our world. They bring together
stories and perspectives from the industry and simplify its language, culture and
movements in a way that's fun, informative and guilt-free. I recently listened to an episode titled Testing, PDFs, and Donkeys. It's great.
It's part of the Stack/Unstuck series.
It's a great series and it talks about the entire software tech stack, especially around web stuff.
Starting with the great stack debate,
there are episodes on front-end frameworks, fundamentals, databases and OSes.
Even OSes, just OSes and system calls.
And then it even talks about testing,
even though testing really isn't part of what you
think of as the tech stack, it's kind of part of all of it.
So I'm glad they covered it.
Especially for people either jumping into software
or software old hats like me trying on new hats,
like embedded systems or control systems,
people learning how to do web
applications, these are great overview episodes, and they're timed well. They're
timed how they need to be. Some of them are 45 minutes, some of them are 25, and I like
that flexibility. Learn more about Compiler at pythonbytes.fm/compiler. The link is in your
podcast player show notes. And thank you, Compiler, for keeping this podcast going.
Yes.
Thank you, Red Hat.
Thank you, Compiler.
Good show.
Check it out.
All right.
On to the next one, Brian.
Okay.
This one is a project by Raaid.
And if you've worked with databases in Python, especially if you're using an ORM like
SQLAlchemy, SQLModel, Peewee, any of these things.
What's really nice about those is you create classes in Python.
And then through some sort of magic, somehow there's a startup thing that makes sure the database exists,
that the database has tables that map over the classes.
So, for example, if I create a class and say it's these three columns and here's an index and this one must be unique,
it'll talk to the database and make that happen.
But for the rest of database management,
you've got to go and write stuff in SQL
or DDL data definition language or whatever that is, right?
The stuff where you create the tables
and create those types of scripts, create users.
So Raaid created DbDeclare, a declarative layer for your database that adds
on top of those types of things, like SQLAlchemy and what I described, for that kind of work.
So it's a pretty new project. People can check it out. The idea is, let me find a quick example
here. So what you can do is you can come and say, I want to create a database and it's got this name and I want to create a role.
And here they have the name of the role is a hungry user.
They have to log in.
Here's their password.
They get privileges on this certain database.
And you can model out those types of things.
And then on top of that,
you can just use SQLAlchemy itself
as part of this process.
You create a SQLAlchemy engine and you call run on that.
And you can also, it'll create the SQLAlchemy models.
There's an example a little bit further,
also linked to this, that shows how to do
basically the standard SQLAlchemy stuff
that will create the tables with the primary keys and so on.
So this one's just a short one,
but if you like the way that SQLAlchemy works.
Also, with SQLAlchemy, or with SQLModel, you get migrations through Alembic. The idea is that
this is going to be extended in the future as well to have some of those types of transformational
behaviors. But for now, it's really the extra stuff like table creation, database creation,
user roles, and management.
So pretty cool.
People can check that out if they find that useful,
if they want to stay more in Python and less in SQL scripts.
Well, especially the roles
and permissions, having that covered by that,
that's pretty cool.
That's a piece that always trips me up.
So it's pretty cool.
You don't just
have the root user?
Just have read access, full access, just run as admin? Just kidding. Yeah, yeah, exactly. It always seems
like it's covered as an advanced topic, but it's almost the first thing you need to figure
out. Yeah, exactly, how to separate user roles. Right, are we going to put this on the internet
and just let people have at it, or are we going to put a little data protection in there?
But anyway, people can check that out.
It's a good one.
Next, let's talk about nox.
So I use both tox and nox on various projects.
A lot of my open source stuff is tox-based for testing,
just because I'm used to it.
But I'm starting to use nox more and more.
And I want to cover this article by Seth Larson
called Testing Multiple Python Versions with nox
and pyenv, P-Y-E-N-V.
Now, I personally don't use pyenv, but I have before.
And one of the things that tripped me up before
is how to use it with tox and nox.
So check this article out if you want to learn more about nox. But also,
even if you're a tox user, there's the trick about how to use pyenv with it, with pyenv global.
There's an example here, it's awesome. So let's go over this a little bit. One of
the first things I wanted to do with nox, and it wasn't obvious to me from the documentation,
is just how do I set it up like I would tox
to test my stuff with multiple Python versions?
And that's the example that shows right off the bat.
You have a nox file.
It's noxfile.py.
And it's Python code.
So you import nox,
and you can set up a session for multiple Python versions.
And then within this, you're defining test, and this can be anything.
So the function names around a session are what you'll use later.
So we'll cover that in a little bit.
But then within your session, you do stuff.
You either install or run.
There's probably other stuff too,
but this is what I use is install and run.
So installing dot means installing the current project
that you're working on.
And then there's an example here for requirements files,
so a dev requirements file.
But if you're using pyproject.toml,
it could also be part of your dot install if you want.
And then run test.
And of course, run pytest.
So good job, Seth. And then it goes through how to run it. So you can either just type nox and it'll run
everything, or you can say nox -s, for --session, to run test. And if you want to run a
specific one, like just Python 3.11, you can say test-3.11. I kind of like that there's a dot in there. It's
pretty easy to understand. So I just really like how simple this is to get the basics down, the
basics of: I want to run tests on my project
over multiple Python versions. And this is pretty
clean. This is already a decent argument to switch to nox if you're on the fence
between tox and nox. I agree. And it's so nice, because not only is it clean, you get
autocomplete support from your editor. Whatever editor you're using will tell you if
you've done something wrong, right? Like, there's more support than just, well, here's an
arbitrary text file I'm typing stuff into. I hope it works.
Yeah, and I mean, I've used it also, just like I do with tox, for doing something like adding linting and coverage checks and all sorts of stuff.
Actually, that's one of the things, I'm glad you wrote this because it's a reminder.
I did want to write an example of the workflow differences
between using tox and nox, showing a side-by-side comparison of those two. So hopefully
in the future I can get that written. But one of the things that gets me is that with the run
command, you have to separate every little piece of your command. They have to be different
quoted strings. Like "pytest tests" has to be two different parameters
to the run function.
And if you have a bunch of flags,
each of the flags needs to be a different argument.
Now, some people might not care about this.
I kind of care and it bugs me
because I don't have to do that with tox.
So what I do is, since it's Python,
I just write a string with all of the things
that I want in it.
And then I use split, on space or something like that.
Yeah, I just use split on space to create an array with all of the elements.
And then when I run, I pass it to run and do the star thing so that it explodes it and passes it in all together.
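The split-and-star trick, sketched in plain Python so it runs without nox installed. The run function below is a hypothetical stand-in for a nox session's run method and only records the arguments it receives; the command strings are invented for the example. shlex.split from the standard library is shown as well, since a plain split breaks arguments that contain quoted spaces.

```python
import shlex


def run(*args):
    """Hypothetical stand-in for session.run: just return the argv it was given."""
    return list(args)


# One readable string instead of a row of separately quoted pieces.
command = "pytest tests -v --maxfail=1"

# str.split() breaks on whitespace; * unpacks the list into separate arguments.
argv = run(*command.split())
print(argv)

# shlex.split() follows shell quoting rules, so a quoted value stays one argument.
quoted = run(*shlex.split('pytest -k "slow and not network"'))
print(quoted)
```

Inside a real session function the call would look like `session.run(*command.split())`, assuming the session object accepts positional string arguments as described above.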
So, yeah, nice. And then here's the trick,
the magic trick about pyenv at the bottom: if you say pyenv global and list all of
the environments that you want to have available, it makes them available. If you're using pyenv,
they aren't by default. So you have to run this for each session or shell invocation to get it to work, for pyenv people.
But that trick works with tox also.
And the other thing I wanted to mention
was one of the things I really like about nox.
Like, this example has
PyPy3 and 3.8, 3.9, 3.10, 3.11, 3.12 all there.
By default, nox will not fail if you don't
have one of these around.
So if you only have 3.11 installed, it'll just run that.
And it'll skip the others.
You can make it fail if it doesn't have it.
But by default, it just skips them, which is kind of cool.
Tox is the reverse.
Tox, by default, will fail if it's not there.
But you can tell it to skip if it's not there.
So.
Yeah, that's cool.
One really quick thing, if people are copying and pasting from that example,
I'm pretty sure the dash R dev requirements needs to be dash R, space, dev requirements in there, just if people are copy-pasting, right?
Because the command is pip install, dash R, space, file name.
This always surprised me.
I've seen it in multiple tutorials.
I don't know if that's true.
I think you might be able to do it without the space. I don't know. Okay. Well, okay, you may be able to. We'll try. We
could try it later. Yeah. All right, quick question for you, Brian, because
I don't know the answer. Damien asked, does someone know how nox or tox work with Poetry?
Do you know? Nothing works with Poetry. I don't know, actually. So Poetry, I'm sure there's, I don't know.
Probably. You talked about the pyproject.toml integration. So I mean, it's probably more or
less, it probably works. I'm sure many people listening know. Sorry, David, I don't know either.
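On the dash R spacing question from a moment ago, here is an illustration of the usual short-option convention, using argparse from the standard library as a stand-in. This is not pip's own parser, so treat it as a hint about why the attached form often works rather than as proof of pip's behavior; the program name and file name are placeholders.

```python
import argparse

# A toy parser with a -r option that takes a value, similar in shape to
# a requirements-file flag. Names here are placeholders for illustration.
parser = argparse.ArgumentParser(prog="install-demo")
parser.add_argument("-r", "--requirement", action="append")

# Value attached directly to the short option, passed as ONE argument.
attached = parser.parse_args(["-rdev-requirements.txt"]).requirement

# Option and value passed as TWO separate arguments.
separate = parser.parse_args(["-r", "dev-requirements.txt"]).requirement

print(attached)
print(separate)
```

Both calls produce the same parsed value under this convention. The form to be wary of is a single string that contains the space, because then the space becomes part of one argument instead of separating two.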
All right.
Hold that thought.
Hold that thought.
Yeah, that's all of our things, isn't it, Brian?
Do the search.
Yeah, we were quick.
Do you have any extras for us?
Oh, I always got extras.
So let's go through here.
Remember when we talked about how much drama there was around, who was it, Google maybe? Someone was giving away like 2,000 or 4,000 YubiKeys to the top 100 or top 1,000 maintainers on PyPI,
or maintainers of the top projects on PyPI.
And that was because there was going to be a requirement
for PyPI that the very top 1% or some small percent
was required to have 2FA.
Well, guess what?
If that caused drama, wait until you hear about this.
GitHub makes 2FA mandatory next week
for anyone who is an active developer.
So basically, if you're making contributions to projects,
public projects, I believe, something like that.
So, yeah, securing the accounts of more than 100 million users.
I'm not sure exactly what the definition of an active developer is versus an active contributor, because I might contribute to the code without writing any actual software, but whatever, splitting hairs.
The only reason I really bring this up is not to, like, go into depth. That's why this is an extra. But if it was a big deal that, you know, a thousand
Python developers had to do 2FA, and it sounded like it was, what about a hundred million?
It's going to cause some drama. And then how many of those people who are contributing to PyPI are
doing so in some way or another through GitHub?
I would say the majority, probably.
Yeah, yeah.
Actually, so I don't think it's going to be drama.
Hopefully people are just cool with it.
I think that the mess-up with PyPI was the dongle thing.
I think people thought they had to have the hardware thing and they don't.
I mean, I use the software 2FA system.
And it's not just them.
I don't know about you, but I just looked, I've got like half a dozen
different things I've got to log into with Authy.
So yeah, I think I have about 30 accounts
or so that are 2FA. Yeah, and I'm happy that I do. That is not a complaint, I mean, that's not me
whining, that's me, like, going, yes! Occasionally I'm annoyed by it.
Yes.
Like right now I went to look at a thing and I have to log in.
So I don't have time to do that right now.
So occasionally it's annoying.
Well, here, let me tell you why this is annoying so often.
I'm going to take this and make it a whole episode, aren't I?
So the reason it is annoying is there's so many places. What is the point of the 2FA? The 2FA is if somebody steals your account login
information through some kind of data breach or through password reuse or whatever, that someone
else can't go and use those credentials to log in as you. They have to have the second factor. Well, here's why it's annoying. Every time I log into my credit card processor, I think almost every time I log
into DigitalOcean, it's like, hey, how are you doing? What's your 2FA factor? It's like, I've
given that to you about a hundred times in the same browser, right? It should at some point go,
you know what? They've given us
the 2FA. We trust them. I'm not concerned someone is on my computer logging into my thing. I'm
concerned about the seven other billion people who might want to log in from somewhere else,
right? So I think there should be a little bit of like, hey, if you've already logged in
on this device, maybe you don't need the 2FA every time. You could even refresh it monthly,
but not four times this morning, right?
That's when I'm like, ah, 2FA, it's driving me nuts.
So that's my rule.
Yeah, but I mean, to be fair,
GitHub doesn't do that or PyPI.
No, GitHub is great.
GitHub, I have no complaints.
Okay, because I don't have to do it every time for GitHub.
No, GitHub is really good.
Yeah, and I've been using 2FA
for GitHub for quite a long time. It's been optional
for a long time.
Or a while at least. I have a
very short memory. I'm really good with open source
because I have the same memory span
as the general
technology memory of open
source.
Nice. All right, I have
one other really quick thing. You know, Brian, we always have good luck
reaching out to our listeners about things, and this one is a little bit different. So I recently
got a brand new adventure motorcycle, as of last week, which is awesome. And I found some fun places
to take it and ride. Like, I rode up into the snow around here in the coastal range and stuff like that. I'm
looking for somewhere fun in the Northwest to go riding that's, like, not intense motocross off-road,
but, you know, would have a lot of nice types of things. Nice view? Yeah, just get out
in the woods and cruise around this summer, this spring. And so, listeners out there who know where
to ride around here that's not one of the couple of huge off-road vehicle, like, state-sponsored areas around Portland:
if people have got it, shoot it in.
And if you want to know why I kind of got this bike, how much fun it was, there's a cool video I linked to with Ben Townley and another guy, something Raymond.
I can't remember his first name.
Anyway, you can check that out.
And yeah, that's all I've got for my extras. How about you? I mean, while we're
asking for, like, contributors, we were driving around this weekend and
saw like a group of, like, 10, 20 people riding motorcycles. And we've got
a couple of Harley places around here, so there are, like, Harley groups around. But when I was a kid, I was, like, scared of these people, just people with motorcycles, mostly wearing black leather. But now, I mean, it's mostly people my age or older, you know, it's 50 to 70 year olds riding motorbikes just to hang out with their friends. Which is cool. I mean, at least that's what I see. But I think it'd be cool if I could see, like, are there e-bike
gangs? Or, like, are there e-bike groups of people, just a bunch of e-bikes riding
together? Anyway. I bet those e-bikes are awesome. Like, electric bicycles, they're so
cool. And I'm sure there are, actually.
But that'd be cool to see a picture of, like, a bunch of them. Yeah. Anyway.
Do you have to get one of those club patches for it?
Probably. Do you have any extras before?
No, I don't have any extras. I was just BSing. So let's do a joke. Awesome. All right. Let's do a joke. And boy, I didn't do this. I didn't plan this,
but boy, did it line up good. So
this one comes to us from Programming Humor on Reddit. And just check out this picture,
Brian, here really quick. Describe the picture to folks. You see this? So it's someone
logging into GitHub, and somebody's got to do a code review here in the morning.
Oh, there's over a million lines changed
and 20 deletions.
1,094,000 lines changed, 20 removed,
so not too bad there, but 2,945 files to review
and zero of those.
It's like, let's get started.
So the title is,
Anyone else having this kind of colleague?
What a way to start a Monday.
So, any guesses what they did? I'm guessing that they, like, applied Black to their project and just changed
everything.
Yeah, maybe. The comments
section is pretty good too.
Someone else suggested that maybe
the git commit message is "fixed typo" or something like that.
"Added stuff." "Small update." Yeah, that's funny. "Reformatted every line of code."
Exactly. So it's a case of the Mondays, one of the best shows ever. "Replace all spaces with tabs." Exactly. Nice. So anyway, cool. All
right, well, thanks again for this wonderful episode. I had a lot of fun. I hope everybody
else did too. Thanks, everyone, for listening. See y'all later. Bye.