Python Bytes - #300 A Jupyter merge driver for git
Episode Date: September 6, 2022
Topics covered in this episode:
Test your packages and wheels
The Jupyter+git problem is now solved
Help us test system trust stores in Python
Making plots in your terminal with plotext
jinja2-frag...ments
SLSA 3 Generic Builder for GitHub Actions GA
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/300
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is the big episode 300, recorded September 6th, 2022.
I'm Michael Kennedy.
And I'm Brian Okken.
And I'm Seth Larson.
And this episode is brought to you by Microsoft for Startups Founders Hub.
More about them later.
Seth, welcome to the show.
Thanks for having me.
This is so exciting.
I didn't realize it was going to be a 300.
Yeah, well, you hit the jackpot.
This is the big one.
The big one for at least two more years, I would say.
And Brian, how about that?
300 episodes.
That's amazing.
When did we start this?
We should look this up.
It must have been a while ago.
I don't know.
I mean, that's 5.7692307 years.
Like, that's almost six years.
It's amazing.
Actually, a reason that I'm so focused on floating point numbers and large numbers.
We're going to get to that at the end of the show.
2016.
We started November 2016.
That's pretty cool.
Yeah, absolutely.
Anyway.
Yeah.
Very cool indeed.
David says, congrats on 300.
Thank you, David.
Thank you for being here.
Indeed.
Awesome.
All right. Well, I've been thinking about wheels and packages lately.
So, yeah, you were thinking about the phrase "a rolling wheel gathers no moss," or something like that, and how it goes in programming?
No, I wasn't thinking about that at all.
All right, what were you thinking about? Tell us about it.
Okay, so I was thinking about actually using different packaging tools, because pyproject.toml is supported by tons of stuff now.
Well, by tons of stuff, I mean like three that I know of.
So we've got Flit.
Well, Poetry also, but I don't use Poetry.
Anyway, I've been using Flit and Hatch and setuptools, which are all really easy to use with pyproject.toml lately.
And I've been using the Flit method of building wheels, and Hatch, and, for setuptools, the build package. If you just pip install build, you can do python -m build to build stuff, which is fun.
But since I've been building all these, I've been using a lot of tools to check these packages and wheels, to make sure that what I expect is inside.
So there are a few tools I'm using. One is wheel-inspect. And this one is actually kind of cool. You can use it programmatically if you want. I'm not; I'm using a thing it comes with called wheel2json. If you run that and give it a wheel name, it just dumps the JSON information about the wheel.
And I've been using this to build things in different ways, dump the output into a file, and do a diff to see what's going on, to make sure that I got the description correct and everything's right.
And also because I'm curious whether all of these tools are building kind of the same thing. And they kind of are. There are slight differences, but it's neat that there are so many options now.
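The dump-and-diff workflow Brian describes can be sketched with the standard library alone. This is a minimal illustration, not how wheel2json itself works: the two dictionaries are hypothetical stand-ins for JSON produced from two builds of the same project, and `diff_json` is a helper name invented for this example.

```python
import difflib
import json


def diff_json(a: dict, b: dict, a_name: str = "a", b_name: str = "b") -> list[str]:
    """Return unified-diff lines between two JSON-serializable dicts."""
    # Sorting keys keeps the diff stable regardless of insertion order.
    a_lines = json.dumps(a, indent=2, sort_keys=True).splitlines()
    b_lines = json.dumps(b, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a_lines, b_lines, a_name, b_name, lineterm=""))


# Hypothetical summaries of the same project built with two different tools.
flit_build = {"project": "demo", "version": "1.0", "summary": "A demo package"}
hatch_build = {"project": "demo", "version": "1.0", "summary": ""}

for line in diff_json(flit_build, hatch_build, "flit.json", "hatch.json"):
    print(line)
```

An empty result means the two dumps agree; any `-`/`+` pair points at a field, such as the description, that one build tool filled in differently.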
So wheel-inspect is really cool for wheels. I'm also using a thing called check-wheel-contents, and this is kind of like a linter for wheels. It's possible to make valid wheels that don't really have anything in them, or that don't have the thing that you thought was in there. So this is a linter that goes through and gives you a whole bunch of warnings and stuff.
You can kind of look through them, like W001, wheel contains .pyc/.pyo files. Somehow you've configured it wrong to grab those. And I don't know how you would do that for a lot of stuff, but with Flit, possibly, if you accidentally threw those in your Git repo, because Flit just grabs anything that's checked in, I think, or committed.
Duplicate files, it checks for that. So it checks for a whole bunch of stuff. So this is handy just to check as well.
But the powerhouse that I'm using, of course, is just tox. I kind of wanted to cover the other ones because they're fun. But I wanted to remind people that one of the great things about tox is it builds things on its own. So when you run tox on a package, it will build the package, then install it into an environment, and then you run your tests. We think of it as more of a test runner, but it does that whole packaging loop also.
And then the fourth way, I don't have a slide for this, but the fourth way that I've been doing it is you can just push them into a Git repo, and then you can do pip install git+ and then the repo name. And pip will use your packaging tools to create the wheel before it installs it. So that's another way to check your packaging. So yeah, doing a lot of packaging.
So anyway, I'm always super paranoid whenever I configure something to do with packages. So my method tends to be: just unzip the wheel as a zip file and see what's in there, see what landed.
I didn't try that.
So what does that do?
If you just unzip it?
Way number five, Brian.
Yeah.
So does it just unpack it in place, then?
Yeah, wheels are technically zip files.
So you can unzip them and just inspect what made it in there.
Yeah, put a .zip extension on it,
and then you can just put zip tools on it, and off it goes.
So it must store the metadata somewhere then also though.
Yeah, there's a top level metadata file
that says all the things that it's about.
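Because a wheel is a zip archive, "way number five" can be sketched with only the standard library. The function below is an illustrative helper, not part of wheel-inspect or check-wheel-contents: it lists the files, reads the `METADATA` file (which lives inside the wheel's `*.dist-info` directory), and applies two simplified checks in the spirit of the linter discussed earlier (compiled files and duplicate contents). The name `inspect_wheel` and the warning wording are assumptions made for this example.

```python
import hashlib
import zipfile
from collections import defaultdict
from email.parser import Parser


def inspect_wheel(wheel):
    """Return (file names, parsed METADATA, warnings) for a wheel path or file object."""
    warnings = []
    by_digest = defaultdict(list)
    with zipfile.ZipFile(wheel) as zf:
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
        for name in names:
            # Compiled bytecode usually means something was packaged by accident.
            if name.endswith((".pyc", ".pyo")):
                warnings.append(f"compiled file in wheel: {name}")
            data = zf.read(name)
            if data:  # skip empty files such as a bare __init__.py
                by_digest[hashlib.sha256(data).hexdigest()].append(name)
        # METADATA uses an email-style header format (Name:, Version:, ...).
        meta_name = next(n for n in names if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(zf.read(meta_name).decode("utf-8"))
    for dupes in by_digest.values():
        if len(dupes) > 1:
            warnings.append("duplicate files: " + ", ".join(sorted(dupes)))
    return names, metadata, warnings
```

Calling it on a built wheel, for example `inspect_wheel("dist/demo-1.0-py3-none-any.whl")` (a hypothetical path), gives the file list to eyeball plus `metadata["Name"]` and `metadata["Version"]` to confirm what the build tool recorded.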
I love the pun in the chat we got from Pylang.
Wheelie good stuff, Brian.
Brian, that was real good stuff.
Thanks for bringing it.
Yeah. So on to the next one, for mine, huh?
Yeah.
Before we jump onto it, you see I have my race jersey on, because the Portland Grand Prix IndyCar race was here this weekend. So people listening who were close by, they missed it, but next September, be sure to go. It was really, really fun. Three days of racing.
Very nice.
Were they fast cars?
They were very fast. They had no AI. No artificial intelligence yet, from what I understand.
But if you look over on fast.ai, there's something that anybody who does proper data science
is going to be pretty jazzed about.
So Jupyter notebooks are notoriously bad citizens of source control and Git and tools like that.
The reasons are basically that whenever you have a notebook file, if you've ever run it, the output, the order in which the cells were run, and the number of times the cells were run are stored in there. That's not great if someone gets the file and runs it, someone else gets it and runs it, and then you both try to put it into source control. That's a problem, right? I mean, when you and I work on our code, we have Python files, the output goes somewhere, we check it in, the source code goes in. But with Jupyter, the outputs go in.
Not just the outputs, but the memory addresses of some of the objects in the output.
So even if it's you running it twice, you get merge conflicts, which is not the coolest
thing ever.
I suspect that this goes by the name the Jupyter plus Git problem, where really it should be
the Jupyter plus version control system
VCS, because it doesn't matter what you're using. Anything that just diffs files is going to hate
this, right? Anyway, the article and the feature that I really want to talk about is "The Jupyter+git problem is now solved," from Jeremy Howard over at fast.ai. The solution may surprise you.
So it talks a little bit about the challenges here. And it's interesting: it speaks in terms that are not really developer oriented. It speaks more in terms of end users, the way that maybe a first-year science student might experience the problem, not the way a seasoned data scientist would. Like, for example,
here's the problem. The problem is when you're collaborating with others over Git,
you literally can't load your notebook if you both try to check it in because it's broken.
Well, what does broken mean? Broken means it has merge conflicts written into it.
That's really the problem. You can easily solve this problem if you accept their changes or accept your changes, but then you're losing data, right?
So anyway, it says, okay, let's look inside.
Well, there's JSON, and then there's like the HEAD and then the SHA, like a diff error.
So I kind of already described this, but they do go into examples. When you're talking about matplotlib or something like that, you'll have things like matplotlib.axes._subplots.AxesSubplot at some memory address, right? Which is suboptimal, let's say.
Yeah, there's a lot of axes.
That's right. Then non-deterministic outputs and so on. It says, okay, we identified two categories of
problems here. And I would like to say this is only accurate
if you have zero-based index when you start counting.
So we've identified, in Michael's term,
three problems here.
One, Jupyter Notebook formats are fundamentally incompatible
with version control.
Problem zero.
Problem one, Git conflicts lead to broken notebooks.
There we go.
And many of these, almost all of these conflicts
are unnecessary because metadata,
like the environment, the machine name and stuff
that it was run on, as well as the memory address
of the objects is stored inside the file.
What do you do?
Well, there was this thing called nbdev that would allow you to
clean the file. I think it was nbdev that will let you clean it. There's other ways to clean
it within Jupyter as well. You can say, I'm only going to commit to version control the empty
version, right? You can say clear all cells and then commit that. Then that would be fine
because you're wiping all that data out. However, sometimes that data is incredibly hard to compute, right?
I have a picture. The picture comes from an hour of doing training machine models and then processing
a gig of data and then looking at this picture. If I don't clear it and I check it in, the picture's
right there. You know what I mean? Or some of the outputs are right there. So there's a huge reason
to not clear it because it might be incredibly hard to regenerate it.
Maybe on the system you're on,
you can't even run the code necessary, right?
You don't have access to the database or whatever.
So here's what they did.
There's a new nbdev, named nbdev2. The 2 is part of the name,
not a version, but the name.
And this comes from the folks at fast.ai.
And here's how it works.
It has a new merge driver for Git, okay?
Instead of like processing the files,
it says what we're going to do
is we're going to set up hooks in Git.
So when there is a merge,
our special Python code that understands notebooks
will present a different view for you.
Wow.
I know.
And there's a new save hook for Jupyter
that automatically removes the unnecessary metadata
and non-deterministic cell output.
So what you'll get is when you open up
this conflicting notebook in Jupyter,
you'll actually have the diff shown
instead of having a corrupted notebook.
Additionally, it drops out the metadata,
so these unnecessary conflicts are just kind of gone.
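To make the idea of removing "unnecessary metadata and non-deterministic cell output" concrete, here is a minimal sketch that works on a notebook's JSON directly. It is not nbdev's implementation and is deliberately cruder: it blanks all cell metadata and execution counts and strips `at 0x...` memory addresses from plain-text outputs, while keeping the outputs themselves. The function name `clean_notebook` and the regular expression are assumptions made for this example.

```python
import json
import re

# Matches the " at 0x7f..." suffix that object reprs carry.
ADDRESS = re.compile(r" at 0x[0-9a-fA-F]+")


def clean_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with run-specific noise removed."""
    nb = json.loads(json.dumps(nb))  # cheap deep copy of JSON-compatible data
    for cell in nb.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
            data = output.get("data", {})
            text = data.get("text/plain")
            if isinstance(text, list):
                data["text/plain"] = [ADDRESS.sub("", line) for line in text]
            elif isinstance(text, str):
                data["text/plain"] = ADDRESS.sub("", text)
    return nb
```

Two runs of the same notebook that differ only in execution counts and object addresses come out identical after this cleaning, which is the property that makes the resulting files merge without spurious conflicts.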
So it talks about some interesting things that you can do there.
You've got to run nbdev_install_hooks to get it set up and some other various things.
There's also a lot of history on what has been done before.
What are some of the other alternatives?
But the big takeaway is the folks over at fast.ai have been using this internally for
several months and they say it has
transformed their workflow. It's totally solved this problem. And the reason they care so much
is almost all of their work, their unit tests, their documentation, their actual code, everything
is in notebooks. They're like all in on notebooks. So having Git be a first-class citizen is obviously
important. So I recommend people check this out. Postscript side bonus here: there's
another thing called ReviewNB. ReviewNB is about reviewing, say, a GitHub pull request. So
somebody fixes a bug in a notebook and they do a PR and say, oh, you were generating this graph
wrong. You should have passed this parameter, which means a totally different thing. Wouldn't
it be nice to have a picture of the before graph and the after graph? With ReviewNB, that's exactly what you get. So you
get your code diff, but then you also get the output diff, which might be a matplotlib picture.
Isn't that cool?
That's really cool. I'd be surprised if GitHub doesn't have this eventually.
Yeah, well.
This seems like a logical next step.
Yeah, it sure does, right? Notebooks are so important.
Right, but it's not just GitHub, though. Some people are using Git just straight.
Exactly right. Or GitLab or whatever.
Yeah, this is pretty neat. And one of the things I really like about this is the part about all the other solutions that they've tried and everything. I mean, data science people are really good about covering that sort of stuff, where a lot of other people are like, hey, I came up with a problem. I solved it. Maybe some other people have solved it also, but yeah, whatever.
Exactly.
I will say this set of tools exactly solves a problem I had not that long ago.
Okay.
So this really resonates with you, huh?
This resonates with me.
Yeah.
Using notebooks for documentation
and as part of like an integration test suite,
like this is great.
Yeah.
Very cool.
Pylang in the audience says,
ah, so it looks like you can actually resolve
merge conflicts inside the notebooks
rather than traditionally ignore conflicts.
I believe so as well.
I think there's like a merge inside of Jupyter type of thing you can do.
Neat.
Yeah, I haven't totally used it. All right, anyway, if you're into data science, or that aside, if you do Jupyter and you care about source control, this looks really helpful.
Which you should care about source control.
Yes, exactly.
Yeah. So if you use Jupyter, yeah, full stop.
Cool. There you go.
Awesome.
All right.
Seth, over to you.
Before we jump into the first topic you want to talk about, though, just real quick.
We were so excited about episode 300.
I didn't give you a chance to introduce yourself properly.
So give us a quick background on you and then tell us about your item.
Yeah.
So I'm currently an engineer at Elastic, working on the language clients team.
Previously, I was the maintainer of the Elasticsearch client, which is well known within the Python community.
Now I'm doing tech leadership for that same team.
And then in terms of open source work, I am a maintainer of many different Python packages, most notably urllib3, which is the most downloaded Python package. And it's one of the dependencies of Requests and boto and a whole bunch of other really foundational packages.
That's incredible. Does it make you nervous to make changes to it?
Oh, yeah. So the very first time that I became lead maintainer and had to make a release, I actually spent multiple hours just kind of looking through the wheels and the source distributions and making sure that everything was right.
It was a tough day, honestly.
Yeah, so that chat that Brian opened with, you've been there as well, huh?
Nice.
All right, well, what's your first item for us?
Yeah, so my first item is about trust stores.
So this is about like certificates
that you use to verify HTTPS
connections. And so this is a library that me and David Glick have worked together to implement.
And it's essentially trying to solve the problem of certifi with Python and how it interacts with certificates that aren't necessarily trusted by the greater world. So for example, if you have a corporate proxy, if your company is installing a certificate on your behalf to enable it to do proxying of some sort, certifi just doesn't work with that. And you get these errors that are kind of insurmountable. You get errors that require really low-level debugging knowledge to figure out. And so we went and implemented this.
Anything that has to do with certificates.
If it goes wrong, it's just like, well, that's never going to work.
I guess we're done here.
It's just so hard to understand, right?
I'm on a campaign to make it so no one in the world needs to type verify equals false ever again.
That's my mission.
Awesome.
Also, you spoke about certifi.
Give us the background. I'm not sure we all know what certifi does.
Sure. Yeah. Certifi is essentially this: every web browser, like Chrome and Firefox and all of that, has a bundle, a group of certificates that they are marking as trusted. And they bundle those along with every single web browser, right? And so Mozilla, because it's open source, open sources its trust store. And so what certifi is, is a small, really thin Python package wrapper around that bundle. And it allows Python to make HTTPS connections to websites, essentially without having to rely on a certificate trust store being configured manually by the user.
And so a lot of times, because Python is installed on Windows or macOS but is relying on OpenSSL for a lot of its TLS, it really requires a file to be there. OpenSSL doesn't know anything about the system certificate trust store or any of that. It requires a file to be there.
And so certifi is solving that problem.
I see.
So if I went and installed it, if I was on Windows and installed it into the trusted root store or something like that, that wouldn't count.
That wouldn't be enough.
It wouldn't be enough.
Yeah. There is a whole bunch of other things that you also get by using these native operating system APIs for certificates, like auto updates.
It can be centrally managed.
So, you know, your IT department can click a button and update everyone's system trust store.
So, yeah, there's a lot of really good benefits to using the system trust store instead of this Python managed file.
And this article kind of goes into the nitty gritty of that.
But the big announcement for this project was that pip, with the version 22.2 release, added experimental support for using this library instead of certifi to verify HTTPS.
And so what this will allow people to do is try out truststore optionally, right? Instead of switching it to a default.
And if they're experiencing this class of errors
with installing Python packages
or upgrading Python packages,
they can use one flag.
It's, I believe it's listed,
either way it would be listed here.
So you do dash dash use dash feature equals truststore. And you'll recognize that use-feature flag from the 2020 resolver. That's another feature flag that they used. So this truststore feature flag is the same thing. If truststore is installed on your system, it will use that instead of certifi. And it allows you to get around the errors that you can see when you have a corporate network involved. So yeah, this is kind of
the big thing that I'm really excited about. And we're really hoping that in the future,
we can add this to Python, maybe make this a default for requests like there's a whole bunch
of different, really interesting things that we can go forward with if we can prove that, hey, this is useful to these users.
Right.
Yeah.
Yeah.
Fantastic.
So if I say dash dash use feature equals truststore, do I have to previously have installed truststore or something like that?
You do have to have previously installed truststore.
So the package is relatively new.
It's less than a year old. And so to ensure that we're able to keep things moving because it's experimental, we didn't want to bundle it with pip.
Their release cycle is a lot longer.
I collaborated with Su Ping for a good long while on this, making sure that everything was all good to go for pip, since shipping with pip is a big deal.
So, yeah, it's been a long road.
So yeah, this looks super useful. Kim out in the audience says, I'd love to never need verify false again on my internal network. Seth's mission is fantastic.
Yeah. I'm very grateful that this work is going on.
I hope that that's true, because it drives me nuts.
Is this something you have to deal with internally as well, Brian?
Yeah, because we've got an internal network, a corporate firewall, we've got the trust stores on Windows systems, and it is an issue.
So one of the ways we get around it is to have an internal PyPI. We've got a mirror inside. But sometimes I want to try out stuff that's not there. So having something like this work would be good. But it's not just PyPI. It's other places too.
So yeah, the entire outside internet is usually impacted when you have that sort of situation of a corporate proxy.
So yeah. And I'm guessing that using truststore within pip would be great for a lot of people to try, but trying out truststore for applications that depend on trusted sites would be helpful as well, right?
Yeah, so actually, in the documentation, if you're trying to use it manually with other things, we support urllib3, aiohttp, and Requests,
and I'm sure it'll work with other libraries as well.
Nice.
Like HTTPX?
Yeah, it should work with anything
that uses the standard SSL context API.
As long as it can use that API, it should work with it.
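As a rough sketch of what "anything that uses the standard SSL context API" means in practice: the snippet below builds a context from the third-party truststore package when it is installed and otherwise falls back to the standard library default. It assumes truststore is installed separately (it is not bundled with Python) and that the interpreter is recent enough for it; the helper name `make_ssl_context` is invented for this example.

```python
import ssl

try:
    import truststore  # third-party: pip install truststore
except ImportError:
    truststore = None  # fall back to the standard library below


def make_ssl_context() -> ssl.SSLContext:
    """Prefer the operating system trust store when truststore is available."""
    if truststore is not None:
        # Verifies certificates with the system's native APIs instead of a CA file.
        return truststore.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Standard library default: OpenSSL plus whatever CA file it can locate.
    return ssl.create_default_context()


ctx = make_ssl_context()
# The context can then be passed to a client that accepts one,
# e.g. urllib3.PoolManager(ssl_context=ctx).
```

Either way the result behaves like an `ssl.SSLContext` with certificate verification and hostname checking switched on, so the calling code does not need to know which trust store is behind it.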
This is great. Awesome.
Very cool.
Nice work. Thanks for coming on and sharing it.
Hopefully it makes corporate Python a little better.
You know, this was long ago,
when I first started the podcasts,
this one and Talk Python.
There was a lot of debate or discussion, I guess,
about whether Python was an appropriate enterprise software type of language.
You know, I think that debate is largely over.
And I think the reason it's over is because the data scientists said, this is not a debate.
You want us to do the job or not do the job?
That's right.
OK, well, so let's use Python.
And it kind of spread from there internally through acceptance. That said, now that it does live in these environments
that Brian described much more frequently,
it's really important to have this support.
Yeah.
It's actually really funny because,
so to put this in perspective for Java folks,
Java trust stores are like certifi,
where you have this manual thing that's shipped with Java
as opposed to just using the system.
And I got that comment on Lobsters or something that was talking about this article,
and they were just like, wow, this is like getting rid of Java trust stores. This is great. I'm like,
okay, I didn't even know that existed. That's right, we really hate it over there, and yeah, we hate this,
so this is great. I was like, okay, thank you.
Cool. All right, well, before we get to the next topic,
Brian, let's talk about our sponsor for this week and many weeks this year, Microsoft for Startups Founders Hub. If you are starting a business, doing a startup, you are a little ways along, or you're just thinking about it, you should really check this out, because Microsoft for Startups set out to understand the challenges that we all have creating startups in this digital cloud age.
And they created Microsoft for Startups Founders Hub to help solve many of them.
So that includes getting cloud resources, GitHub credits, other credits like AI credits, for example, from OpenAI that you can run your code on.
But maybe even more important than that, it has support for connecting you with mentors and experts to make sure that you go in the right direction when you're young and getting started. So, so often you see the successful startups being in places where there are a lot of mentors,
where there's these networks and people have connections to get funding, the marketing
side of things, the product market fit, all of those things are super hard.
So if you are part of Microsoft for Startups Founders Hub, you'll have access to their
mentorship network, which gives you access to hundreds of mentors across a range of disciplines,
like the ones I just named and more,
as well as up to a little bit over $100,000 worth of credits
in Azure and GitHub and OpenAI and other places
as you go through certain checkpoints
as you sort of grow within this program.
So really tons of super support
that you can get for your startup.
It doesn't have to be investor backed.
It doesn't have to be third party verified to participate.
All you have to do is go to pythonbytes.fm
slash foundershub2022, apply.
And if you're accepted,
you'll get all of this support from them.
So make your idea a reality
with Microsoft for Startups Founders Hub.
Apply today for free.
Get in, you'll get tons of support.
So very nice.
Also nice, Brian, plots.
Tell us about these plots.
Plots and command lines.
So I like command line stuff.
And actually, thanks to Will McGugan,
we've got a lot of people excited about CLIs.
But apparently Bob is also, Bob Belderbos
from the PyBites duo.
So I like this article.
So actually, I kind of skimmed the article.
Sorry, Bob.
But it's making plots in your terminal
with plotext,
if you install it,
I think it's plotext.
I can see the typo squatting happening right now.
Yeah, so if you pip install it, there's one T in the middle.
So it's P-L-O-T-E-X-T.
So he had some code where he was looking at
plotting the frequency of their blog articles on the terminal.
So he was using some of their own data to plot stuff
and he came up with, well, it's kind of cool walking through how he grabbed the data and everything, but I was looking at this plot going, oh, this is a pretty nice looking plot. I mean, it's totally blocky, of course, but it's a bar chart, so it's supposed to be blocky. So that's okay.
And so then I went over and looked at this package, this plotext.
And it's cool.
Look at all these awesome plots.
I was looking at some of the various things you can do.
It's got basic plots for, you know, just like sine waves and things like that.
But you can also do fill-in plots and then multi-color. There's kind of a lot of cool stuff you can do on the command line. And then even data streams. Look at that, it's a data stream going on in a plot in your terminal. It's pretty great. Images, even. So there's a cat image.
You can do LOLcats all day long.
Yeah, I'd say the people that put together those examples knew what the internet wants.
Cat pictures, yeah. And then even subplots. So the first example we saw, it's not actually that bad of an interface. You know, it's tedious to put together plots anyway, but this isn't too bad. But that cover image that we saw is not a combination of images. That's one plot with subplots in it.
I see. That's cool. So within one terminal window, you can do almost like a dashboard view with different plots, and they can probably be updating live.
Yeah. So this is pretty exciting. I like it. So anyway, I just wanted to say, hey, if you want to plot on the command line, you can use this.
I'm loving this terminal renaissance. It's so fun.
Yeah, it makes us feel like hackers again, you know?
It does absolutely make you feel like a hacker.
I love it.
That's so good.
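For a sense of what an article-frequency bar chart in the terminal might look like in code, here is a small sketch. It is not the code from Bob's article: the counting helper, the function names, and the sample dates are all hypothetical, and plotext has to be installed separately (`pip install plotext`), which is why it is only imported inside the plotting function.

```python
from collections import Counter
from datetime import date


def articles_per_year(published):
    """Count publication dates per year; returns (labels, values) sorted by year."""
    counts = Counter(d.year for d in published)
    years = sorted(counts)
    return [str(year) for year in years], [counts[year] for year in years]


def plot_in_terminal(labels, values):
    """Draw a bar chart in the terminal with plotext."""
    import plotext as plt  # imported here so the counting helper works without it

    plt.bar(labels, values)
    plt.title("Articles per year")
    plt.show()


# Hypothetical publication dates standing in for real blog data.
sample = [date(2020, 1, 5), date(2020, 6, 1), date(2022, 3, 9)]
labels, values = articles_per_year(sample)
```

With plotext installed, `plot_in_terminal(labels, values)` renders the blocky bar chart directly in the terminal window.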
So, all right.
On the next item.
Yeah.
Just, uh, hadn't really planned to talk about this, but just yesterday I did an episode
with Will McGugan, Seven Lessons from Building a Modern TUI Framework.
Brian, you covered that article last week on this show.
So I reached out to Will and said,
hey, we should absolutely cover this stuff
in like a deep dive.
So people-
Oh, I can't wait to listen.
This is great.
People can go check that out as well.
All right.
But let's talk about one of my very favorite things,
HTMX.
People who are not familiar with HTMX,
you really owe it to yourself to check this out.
It's what the web should have been forever,
but it wasn't for some reason.
It's like it stalled in the late, mid-90s.
I don't know.
And hyperlinks and forms are the only things
that can make requests.
You can only click on them to make it happen and so on.
Why should the entire screen have to be replaced,
every interaction and all those things?
So HTMX is awesome.
You can just put in little fragments of declarative code and it does all the cool work. You can have a class on it if
people want to check that out, but that's not the topic of today. The topic is template fragments.
So Carson Gross over there wrote this essay called Template Fragments. It said,
one way you might consider doing this is, in HTML, you very frequently
have to first show the page, and then as little sections of it update, it goes back to the server
and says, I just need the HTML block that goes into this fragment here, because somebody
moused over something else. So refresh its related item or whatever. He's a big fan of this thing
called the locality of behavior design principle, where instead of having a bunch of pieces that cling
together and reassemble themselves, if it could just all be right there, wouldn't that be
great? So he says, normally the way that you would have to do this is you would have to have your
full HTML and then a little subsection. And then that subsection has the optional element. But some frameworks, some template libraries
allow you to define a fragment.
And then when the code is requested on the server,
it can either show the whole thing
or just peel that fragment out of the HTML,
but you don't have to parse it into a bunch of small files.
Cool, huh?
It's really useful if there's no reuse.
Like if the only reason you would make that little fragment
is so that you could return it separately, this is great because basically it means you can just write the page once and it's
it can interact with different data different elements if for some reason that fragment was
being used in multiple places all of a sudden it's like code duplication and that's not ideal
but so we talked about this and hey there's some known uh implementations of this apparently django has the render block
extension i created the ginger partials and chameleon partials which i'm not really sure
i'm thinking i might actually take them out now that there's something for ginger better which
i'm about to talk about but nonetheless those are kind of sort of allow this but more more in the
second descriptive way where you have like a fragment that's separate but included.
But I was talking with Sergey of Rixies.
He said, between Ginger 2 Fragments and my Ginger Partials,
HTMLX plus Flask is so awesome.
So he created this library called Ginger 2 Fragments,
which does exactly what I described.
So in Jinja, you have blocks. Like, you might have your main HTML and you say, here's a block of main content.
With his library, what you can do is either just render the template, or you can now call render_block and name
just part of your Jinja template. And that part comes back with the data you supply to it. That's
pretty awesome, right? Like, this one paragraph is the whole response from the server if you call render_block instead of render_template.
This is, yeah, this is super great, honestly.
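For readers following along in text, the fragment idea can be sketched without any framework. The snippet below is a toy, standard-library-only illustration and is not the jinja2-fragments API: the mini-template syntax, the PAGE string, and the helper names render_template and render_block are all assumptions made up for this example.

```python
import html
import re

# A hypothetical page with one named block (syntax loosely modeled on Jinja).
PAGE = """<html><body>
<h1>{{ title }}</h1>
{% block related %}<p>Related: {{ item }}</p>{% endblock %}
</body></html>"""

BLOCK_RE = re.compile(r"\{% block (\w+) %\}(.*?)\{% endblock %\}", re.DOTALL)
VAR_RE = re.compile(r"\{\{ (\w+) \}\}")


def _fill(text, context):
    # Replace each {{ name }} with its HTML-escaped value from the context.
    return VAR_RE.sub(lambda m: html.escape(str(context[m.group(1)])), text)


def render_template(source, **context):
    # Full page: drop the block markers but keep the block bodies in place.
    return _fill(BLOCK_RE.sub(lambda m: m.group(2), source), context)


def render_block(source, block_name, **context):
    # Fragment only: return just the named block's body, filled with data.
    for match in BLOCK_RE.finditer(source):
        if match.group(1) == block_name:
            return _fill(match.group(2), context)
    raise KeyError(f"no block named {block_name!r}")


full_page = render_template(PAGE, title="Episode 300", item="HTMX")
fragment = render_block(PAGE, "related", item="HTMX")
print(fragment)
```

The first request would return full_page; a later mouse-over request would return only fragment, so the markup lives in one file instead of a page plus a separate partial.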
On Twitter, every time I see HTMX, I'm just so prepared to write a website,
because I've not had the use case for a while, but I'm very excited for the next time I will have one.
Exactly the same. I'm working on projects that have been around for six or seven years.
I'm like, if I rewrite this thing, it's getting HTMX all over it.
But I just can't bring myself quite to do it.
But yeah, it's so good.
One day.
A couple of comments from the chat.
Vincent from CalmCode says,
HTMX is the bee's knees and that CalmCode uses it a whole bunch.
I am not surprised, Vincent. Awesome. Yeah, any website I create after knowing about HTMX is likely going to be
using HTMX. If you thought the answer was Vue.js or React or something like that, you may really,
really, really want to check this out first. Well, especially if you're somebody like me. I'm
like, yeah, I want to put this interactive stuff in here, but
I'm not an expert in JavaScript, so I'm not sure. But I do know somebody
that knows a lot about HTMX. So you might know someone. You're venturing very close to getting
me off onto a very long rant about HTMX, but it's so good, because
even if you know JavaScript, wouldn't it be better to not have to think about: now I'm running
client code, now I'm running server code, now I'm running the APIs to connect the client code to
the server code; this one's in this language, it knows this; that one's in that language, in this
location, it knows that? Like, in HTMX you just write it all in one place, in one language, with the same
context and security model and everything.
Access to the database, for example.
And then you just do what you need to do.
It's perfect.
And it's not really just about thinking about two languages either.
There's a lot of people, like me, that already have to think in two languages.
I'm thinking in C++ and Python.
So thinking about it in a third language or a fourth language,
that's, it's like, you know, come on.
Having a place to stop, plus, yeah.
Yeah, yeah.
A final comment I'll make on this is
even people who are using Node.js like HTMX,
where it's the same language.
It's like, it's also just about the context
and location switch.
Oh, yeah.
That's, I hadn't heard that.
That's pretty cool.
Yeah.
Seth, it sounds like you were going to say something. Maybe I'll let you have the last word here.
Oh, no, I was honestly just going to say that the more we can stay in HTML the better, because you have to know
HTML, so you might as well stay in it, right?
Yeah, absolutely, absolutely. So, well done, Sergi. Check
out his jinja2-fragments framework. It is super new. Like, I don't know
when it got released, but it's been a couple of days; these are like two and three days on all the commits here.
It's a lot of days.
It's very, very new.
Two to three days.
Yeah.
Yeah.
Well done.
Well done.
All right.
Seth, over to you for the final one.
Sure thing.
Yeah, this article was announcing something that's been getting worked on for a while, which is generic generators for SLSA 3.
So what you're seeing there, SLSA, that stands for, if I can remember, Supply chain Levels for Software Artifacts.
So SLSA, and you pronounce it "salsa".
And it's essentially...
It's a great way to say that acronym.
Yeah, right?
Makes you hungry every time, which is the best part.
But yeah, it's basically a set of tools and standards to attest and verify the provenance
of artifacts.
So essentially, where did this thing come from?
This file, this wheel, this jar,
depending on what like ecosystem independent, whatever thing, whatever artifact you're
building, where did it come from? How was it built? And so it uses a whole bunch of different
cryptographic primitives and OpenID Connect (OIDC), which is basically magic, but it basically allows you to prove, in effect, okay, this was built
from this specific GitHub repository, this commit, this tag, and someone can then later
take this file, this artifact that got built, and then verify that that was the case.
And so this will, in the future, hopefully be used as a defense against maybe stolen credentials
on the Python package index.
That would never happen.
That would never happen, right?
That's never happened.
That has never happened other than last week.
At the time of the recording,
never has happened, I would say.
So yeah, it gives a good defense against this, right?
Because if you, let's say you have a package
and the Python Package Index knows that this package came from, you know, github.com/sethmlarson/whatever,
right? And then in the future, it receives something that doesn't come from that
GitHub repository, it can flag that and say, hey, this isn't right, like this didn't come from the
place that it came from before or wherever it's supposed to come from.
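The check described here can be sketched roughly as follows. This is a simplified illustration, not the real SLSA tooling: the provenance dictionary below uses a made-up, flattened layout (the source_repo and subjects keys are assumptions), the repository URL is hypothetical, and the sketch skips the most important step, which is verifying the cryptographic signature on the provenance itself (real projects use a tool such as slsa-verifier for that).

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    # Hex digest of the artifact bytes, as recorded in provenance subjects.
    return hashlib.sha256(data).hexdigest()


def check_artifact(artifact: bytes, name: str, provenance: dict,
                   expected_repo: str) -> list:
    """Return a list of problems; an empty list means the checks passed.

    Assumes the provenance was already signature-verified elsewhere.
    """
    problems = []
    # 1. Was it built from the repository we expect?
    if provenance.get("source_repo") != expected_repo:
        problems.append("built from an unexpected repository")
    # 2. Does the file we hold match a digest recorded at build time?
    recorded = {
        subject["name"]: subject["digest"]["sha256"]
        for subject in provenance.get("subjects", [])
    }
    if recorded.get(name) != sha256_hex(artifact):
        problems.append("artifact digest does not match provenance")
    return problems


# Hypothetical wheel bytes and provenance, for illustration only.
wheel = b"pretend these are the bytes of a wheel file"
provenance = {
    "source_repo": "github.com/example-org/example-project",
    "subjects": [
        {"name": "example-1.0-py3-none-any.whl",
         "digest": {"sha256": sha256_hex(wheel)}},
    ],
}

ok = check_artifact(wheel, "example-1.0-py3-none-any.whl", provenance,
                    "github.com/example-org/example-project")
tampered = check_artifact(wheel + b"!", "example-1.0-py3-none-any.whl",
                          provenance, "github.com/example-org/example-project")
print(ok, tampered)
```

An upload pushed straight from a stolen account would have no matching provenance, so a check like this would flag it.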
And the fact that this is generic is the big deal.
The part that ties us back to Python is that you can use it for wheel files and source distributions.
You can sign like anything.
And so, for example, one of the Python projects that is featured in here is urllib3.
I've been trying to get into this and it's been really successful.
And so urllib3 now does this, and you can actually verify that it came from a specific
repo and that the wheel came from a specific tag. And yeah, it's really interesting.
And this ecosystem is like just getting started. And so if you're like interested in anything about
like supply chain security and all of that, this is like a great place to start doing some learning
about what the future might look like.
Yeah, this is great.
When I first saw this, I thought, okay, this is cool,
but how does that really help protect
against somebody sabotaging a package?
But then again, I realized, if you think back to what happened
with some of those other packages,
somebody got ahold of the PyPI account,
not the GitHub account.
And they just
published a new version directly, not through the CI. Right, right. Yeah, so this just makes
the amount of things that need to get compromised even larger, right? No longer
do you need to only compromise the email account on PyPI; you have to also compromise GitHub, and
then if you have, you know,
GitHub environments configured,
you need to compromise a second account
to like review the deployment.
And so it just makes it even harder
to actually get that attack off essentially.
Yeah. And if you had to publish
the actual vulnerability
to a popular GitHub repository to trigger it,
it would be discovered sooner, right?
Because people are like, oh, what's J...
Oh, that's unusual.
Who has made this...
They've made this commit,
and now it's doing this URL thing over to hacksore.com.
And, right?
Like, that's just another out-in-public thing,
whereas if the direct account gets attacked,
somebody can just use Twine or
something directly to push a bad wheel up. Yeah, exactly. Yeah. The more pushing bad wheels,
you have to go through so many different hoops just to do something.
You need to flatten those bad wheels. Yes. Got to inspect them too.
Exactly. All right. Awesome. This is good stuff. Well, Brian, that's... No, do you have any more?
No, that's all of them.
Do you have any extras for us? I do. Although I'm going to try to make it quick because now
I'm hungry for some salsa. So I wanted to, I'm like super excited for this upcoming weekend.
I can't believe it. So on Saturday, on Saturday, September 10th, I will be in San Francisco.
And I've got two events going on at PyBay.
So PyBay, awesome conference.
I haven't been there before, but you were there last year or something like that?
Yeah, last year, and I absolutely loved it.
I would go this year if I wasn't on single parent duty and had kids that had to go to school.
So I'm giving two events.
So one of them is a Sharing is caring PyTest fixture edition. I'm going to talk about building. Actually, I'm just
going to talk about packaging, but it's not really about packaging. It's about sharing fixtures with
other people. And because I think that that's a bigger need than people realize. So anyway,
love fixtures. We're
going to talk about that. And then I got asked to be on this experts panel.
We've got Zac Hatfield-Dodds, me, Andy Knight, who's got a good one,
Automation Panda, that's right, Joshua Grant, and Nishat Khan. So it should be a fun panel. And it's at
seven o'clock at night. I'm like, wow, I think I really need to change my flight because I was
planning on flying out at 8 a.m. the next day and it's going to be tough. So that's going on
next weekend. I'm pretty excited. Yeah. Bylang says, good luck on the talk, Brian. Oh, thanks. So how about you? Do you have any
extras? I do, I do, a bunch. I'll make them pretty quick. So Heroku, you know, the platform-as-a-service
place, for 13 years or something they have had a free plan where people can go and create,
what are they called, dynos or something? Yeah, dynos. I don't use Heroku, so I don't know all
the terminology and how all the plans break down. But for a long time, they've had free plans.
But now they are canceling them. And you will either have to pay or delete your projects. So
that's going to affect a lot of people. They have something like 13 million.
What's the right number here?
It claims, yeah, that it's been used to develop 13 million apps.
So I bet many of those are free and are going to be suffering this.
There's an interesting discussion on Y Combinator.
So you can check that out.
I'm sure it's very civil over there in the comments, as it always would be.
Yes. But basically,
you know, Heroku was purchased by Salesforce, and they claim, and it may be true, I'm sure that it is somewhat true, that they want to cancel this because of fraud and abuse. It may be more that they have
to spend so much money to fight the fraud and abuse that it's just not worth it to them. I
don't know what it is, but however you land on the, it's a good idea, bad idea,
it's going to cost money if you want to use this.
And it's pretty pricey, by the way.
This change will roughly double the cost
of a basic plan that uses Redis
from up to $50 a month.
If you start bringing in your Redis cache
and your Postgres hosting and your dynos,
they all add up,
and then you've got to scale this one or that one
Right. That's one of the reasons I'm not using it, but not the only reason; I want a little more control
as well. But anyway, if you have a free thing running on Heroku, or you were thinking about it,
you have to think again and find something else. There's actually, at the bottom, a bunch of
platform-as-a-service things that I've never heard of. There's Porter, Railway, Render, Fly.io,
and Clever Cloud, all of these things vying for this business. They all look kind of interesting. I know nothing about
them. You can check it out. I've seen Fly.io all over the place, on Python Twitter at least. Yeah.
Okay, so if I were personally picking one, I would check that one out first. But I don't know
anything about any of them, to be honest with you. The last time I used Heroku was a long time ago.
I'd like to see some real comparisons among some of these.
There's still a place for hobby projects.
I want to try something out, or do something live,
even as a high school app or something like that. I know...
Oh good, you're going to show PythonAnywhere. I was going to. I've got to find the right link. Here we go.
So I think they still have a free tier. I don't know if they advertise it much, but
beginner is free. Yeah. The part that bothers me really isn't that it's... there's a comment in the chat about how it's hard to complain.
It's a free service, so they can do whatever they want, right?
Essentially.
Yeah.
Oh, there's that right.
That's the right one.
Yeah.
However, the jump between free and $50 a month is a big jump.
And that's my gripe about it.
So anyway.
Yep.
And not to frame this into a recommendation, but yeah, I feel like a lot of the cloud services
have really pushed how easy it is to deploy.
Because I remember initially starting with Heroku, the ease of deployment was the big
win for a lot of people.
And so, yeah, a lot of cloud services where, you know,
you pay for everything you use,
but what you use ends up being a few cents a month,
which is a lot more surmountable than $50 a month.
So, yeah, there's definitely a gap there.
There's not as much of a gap there as there was before.
Yeah, for sure.
Brian out in the audience says,
at my last company, we had to disable our free tier due to crypto miners. Yeah, of course. I'm sure. And Kim also has something, yeah, stealing the computation
there. But all right, anyway. Okay, I didn't want to go too far down that one, but for sure
check out some of the options below. DigitalOcean and Linode are also really, really
good options. This one, I'm full of rants today, potential rants.
This one comes to us from Extreme Tech.
White House, as in the US, bans paywalls on taxpayer-funded research.
It has always felt super creepy and wrong that we have the NSF, which pays billions of dollars
a year, millions for individual research projects to
come up with scientific research that all three of us and many people listening actually pay for.
I'm glad to pay it. I think this is really important. It's important for the country.
It's important for the world. And yet those results get locked up behind really expensive
for-pay scientific journals, right? Like you've
got to pay $5,000 a year to subscribe to this journal so that you can read the article that,
wait, we paid to create that and we can't even get access to it? So this article here is,
the White House has updated federal rules to close a loophole that enabled journals to keep
taxpayer-funded research behind a paywall, which I think is great.
So if you're specifically in the data science side,
I think this might be relevant to you.
Yeah, I'm curious how that's going to get implemented.
Yeah, me too.
All right, anyway, there's that.
And then, Seth, back to some of the stuff
you were talking about.
I mean, it would never happen
that someone would try to phish.
Wait, no, last week somebody tried to phish PyPI.
Maybe it was a week before when it started, but not too long ago.
So over on darkreading.com, there's an article that says,
threat actor phishing PyPI users has been identified.
JuiceLedger has escalated a campaign to distribute its information stealer
by now going after developers who publish code widely used on the Python code repository.
I don't want to go too much into it, but there's this group who had originally tried to do
typosquatting, if I'm correct. They wrote something to steal, some malware written in .NET, by the way,
which Will was joking about only running on Windows. Hey, if they use .NET Core, they could
expand out with the open source version. Anyway, I don't want to give them ideas, but they were distributing this malware through
these malicious packages. And then they said, well, what if we could get really popular ones,
hack their accounts, and then upload bad wheels? So anyway, there's a bunch of background on the
actual people behind this. So it's pretty interesting. You can check out that article
if you want. There's also an Ars Technica article, but it doesn't have as much depth as the Dark Reading
one. Nice. All right. Last one. I think this is the last one. Brian Skinn, former co-host on the
show, who always contributes many interesting things, says Python Bytes will definitely want
to check this out. This is a tweet by Steve Dower that says, we have published the
details of a critical security problem for Python. It is very rare that we have direct
vulnerabilities in Python. Like, it was all fun to have the LOLs about JNDI and Log4j,
but this is not exactly that, but it's a denial of service at that kind of scale.
So if you've ever thought,
I have a string and it needs to be an integer,
and that string came from user input,
that's really bad, it turns out,
because there's a denial of service thing
that you can do by passing very, very long strings
to that integer parsing.
Seth, you're shaking your head like, oh boy.
Yes.
Yeah, if you've been waiting to upgrade from Python 2,
now's the time to upgrade to Python 3, I would say.
Exactly.
The security support.
And you shouldn't say,
I'll just go to one of the older ones.
Like you need to get 3.10.7 ASAP.
I suspect they'll roll this back to some of the supported ones as well.
So they'll probably back port it to 3.9 and 3.8. But if you're on say 3.6, that's a problem. That's a big, big problem.
Yeah. So expect releases for 3.7 plus in the next week. This came out a few days ago. This has now
been done, but this Twitter thread is super interesting and that's what I'm linking to.
So y'all can check that out.
There was also some feedback like, what are you doing?
How dare you fix this?
The way they fixed this is they said, if you're doing base 10 parsing,
you can only use 4,300 digits.
Not the number 4,000, but places in the number, 4,000 places. That's a really large number.
If it's bigger than that, basically Python won't be able to parse it like before.
Brian, you do C++ all the time. You have to think about, is this over 32,000? Is it signed
or unsigned? Okay, it's unsigned. We can get to 64,000.
This is not that level of thinking, but you kind of do have to think about what
the heck's going on here.
I think it's a fair fix.
I do too.
People are freaking out for no reason.
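For code that has to turn user input into integers, one defensive pattern is to reject over-long strings before they ever reach int(). The sketch below is only an illustration under stated assumptions: parse_user_int is a hypothetical helper name, and the 4,300 default simply mirrors the digit limit mentioned above. It counts every character after stripping whitespace, including a sign, so it is slightly stricter than the interpreter's own check.

```python
import sys

# Mirrors the 4,300-digit limit described above (assumed default).
DEFAULT_MAX_DIGITS = 4300


def parse_user_int(text: str, max_digits: int = DEFAULT_MAX_DIGITS) -> int:
    """Parse untrusted text as a base-10 int, refusing very long input."""
    cleaned = text.strip()
    if len(cleaned) > max_digits:
        # Fail fast with a cheap length check instead of a costly parse.
        raise ValueError(
            f"refusing to parse {len(cleaned)} characters as an integer "
            f"(limit is {max_digits})"
        )
    return int(cleaned)


# Patched interpreters expose the built-in limit; older ones do not,
# so look the function up defensively rather than assuming it exists.
builtin_limit = getattr(sys, "get_int_max_str_digits", lambda: None)()

print(parse_user_int(" 42 "))
try:
    parse_user_int("9" * 5000)
except ValueError as exc:
    print("rejected:", exc)
```

A guard like this also protects interpreters that never received the fix, because the expensive conversion is never attempted.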
Yeah, this one's wild too,
because you just pass a long number.
It's not something sophisticated or anything.
It also feels almost, not Log4j,
but kind of Log4j a little bit,
where you can just do denial of service
by doing something very trivial.
Exactly.
Yeah, you just try to set your username
to jndi://hackster.com.
This is like, well, the number is a1722117,
and then boom, now it goes to the website, right?
This is denial of service versus remote code execution,
which is clearly better, but it's not good.
Yeah, just hold down the zero key for a little longer.
Exactly.
Or if you're writing Python code, you can just do times 10,000, or caret 10,000, you know, power to 10,000 or something, and send that.
Yeah, string extension really coming in handy here. rpad. Exactly. Or zfill and the right pad.
Exactly. Yeah, Piloting wants to send pi across. You know, that's going to upset it. Anyway, I upgraded my servers to 3.10.7.
They were not available from Ubuntu directly.
It was still the old 3.10.6, which is unnerving.
But because I build mine from source, I just changed the number to 3.10.7, rebuilt, and redeployed Python.
I'm good to go.
I imagine everybody listening to this podcast is on 3.7 or above if they at any chance can be.
I mean, that if they're below, it's not because they haven't tried.
Yeah.
But let me point this out.
I would say, actually, I want to follow up with a couple of things.
Because this is, maybe this should have been the main item, but whatever.
One, we've talked about the reason you should upgrade to Python 3 for a long time. And
Brian, you and I had lots of fun calling it legacy Python. Although we've had people go
into iTunes and like post negative reviews of the podcast because I had said disparaging things of
Python 2, but that's okay. I'm willing to stick by them. Oh my goodness. That is wild.
More reviews. Awesome.
If you have good things to say, also consider posting a review, not just if you're angry that
I called it legacy Python. But if you're on old legacy code, which is even 3.5, but
very seriously Python 2, because the gap to upgrade is really hard, these are the types of
things that we warned about that could be a problem. Yeah. And
there will be no fix, right? You'd better just say,
well, we're going to make sure the strings that
are really destined to be integers are really, really checked. And, you know, I mean, it's not
good. It's not good. So just one more reason to be on a shipping version of Python, even if it's just
3.7. Yeah. All right. Yeah, that's it.
Let's see. Yeah, changelog. One other really quick one. Yeah, so you can see it's actually
described quite well here. Patch by Gregory P. Smith and Christian Heimes. Feedback by a bunch
of great folks. Sebastian Ramirez said, I sent a tweet out when this got fixed saying, please be
kind to your open source contributors.
They just wrote 800 lines of code in a PR so that you can parse strings to integers.
So apparently it wasn't easy to fix.
But yeah, I agree.
Cool.
Ready for a joke?
Or actually, Seth, you got anything extra you want to throw out first?
Yeah, I had a real, hopefully quick one.
So I follow a whole bunch of game art accounts on Twitter
because I just love it, seeing what people create.
And one came by, it was using hashtag Pyxel, P-Y-X-E-L,
and that did a little ding.
I'm like, wait a second, that's Python.
And then I just went back in this developer's Twitter
a few tweets back, and they had just released Wasm support for this Python
game framework. I'm like, this is incredible. So yeah, it was a very fast
journey of, wow, Wasm is everywhere at this point. That's kind of wild that it's popping up
so fast. So yeah, version 1.8.0 of this retro game engine for Python, which has a whole bunch of really beautiful
examples. I think y'all have covered
this framework before but the Wasm support
is recent. Yeah, this is really cool.
Yeah, so apparently they have a whole bunch of demos
that you can just play in the browser
and I was really blown away that
I didn't even know this existed and suddenly
there's Wasm support for it.
Awesome. I love it.
Okay, that's a great one. Yeah. All right, how about
we close it out with a bit of a joke? Have you ever felt like you've had a hard day at work? There's
one of these problems, like parsing integers, and you're like, how could this possibly go wrong? I just don't
understand what is happening. Well, here we have a joke of a guy at a nighttime soccer game.
Apparently it's a little cool out, but he's been running really hard. So it's a picture
of a guy whose head is literally steaming, like not a little bit, a lot, a lot. I think that's a
visualization of an integer being parsed into a string right there. Exactly. The before. I'll
read what the tweet really says, and then maybe we can play with it a little. The tweet says,
just a JavaScript developer after work.
You know, like, what do you mean I have to do a new framework?
I just did a new framework last month.
I feel like this could be Christian Heimes after going,
what do you mean parsing integers is a denial of service?
I just can't.
The ints are wrong.
The ints are cursed.
Exactly.
Anyway,
I just,
I'll just leave this here for people to appreciate and we can call it a show
300.
Yeah.
Nice.
Thanks.
Yeah.
Thank you,
Brian,
Seth.
Thanks so much for being here and sharing the work you've been doing.
Yeah.
Thanks so much for having me.
Yeah.
It's been great.
Bye everyone.