Python Bytes - #300 A Jupyter merge driver for git

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is the big episode 300, recorded September 6th, 2022. I'm Michael Kennedy. And I'm Brian Okken. And I'm Seth Larson. And this episode is brought to you by Microsoft for Startups Founders Hub. More about them later. Seth, welcome to the show.

Starting point is 00:00:21 Thanks for having me. This is so exciting. I didn't realize it was going to be a 300. Yeah, well, you hit the jackpot. This is the big one. The big one for at least two more years, I would say. And Brian, how about that? 300 episodes.

Starting point is 00:00:34 That's amazing. When did we start this? We should look this up. It must have been a while ago. I don't know. I mean, that's 5.7692307 years. Like, that's almost six years. It's amazing.

Starting point is 00:00:47 Actually, there's a reason that I'm so focused on floating point numbers and large numbers. We're going to get to that at the end of the show. 2016. We started November 2016. That's pretty cool. Yeah, absolutely. Anyway. Yeah.

Starting point is 00:00:58 Very cool indeed. David says, congrats on 300. Thank you, David. Thank you for being here. Indeed. Awesome. All right. Well, I've been thinking about wheels and packages lately. So yeah, you were thinking about the phrase, a rolling wheel gathers no moss, or something like that, how it goes in programming? No, I wasn't

Starting point is 00:01:18 thinking about that at all. All right, what were you thinking about? Tell us about it. Okay, so I was thinking about actually using different packaging tools, because pyproject.toml is supported by tons of stuff now. Well, by tons of stuff, I mean like three that I know of. So we've got Flit. Well, Poetry also, but I don't use Poetry. Anyway, I've been using Flit and Hatch and setuptools, which are all really easy to use with pyproject.toml lately.

Starting point is 00:01:45 And I've been using the Flit method of building wheels, and Hatch, and with setuptools, the build package also: if you just pip install build, you can do python -m build, which is fun. But since I've been building all these, I've been using a lot of tools to check these wheels, to make sure that what's inside the packages and wheels is what I expect. So there's a few tools I'm using. One is wheel-inspect. And this one, actually, it's kind of cool. You can use it programmatically if you want. I'm not; I'm using the thing it comes with called wheel2json. If you run that and give it a wheel name,

Starting point is 00:02:32 it just dumps the JSON information about the wheel. And I've been using this to, like, use different ways to build things and then dump this into a file and do a diff to just sort of see what's going on, to make sure I got the description correct or everything's right. And just because I'm curious if all of these tools are building kind of the same thing, and they kind of are. There are slight differences, but it's neat that there are so many options now. So wheel-inspect is really cool for wheels. I'm also using a thing called check-wheel-contents. And this is kind of like a linter for wheels, because it's possible to make valid wheels that don't have really anything in them, or they don't have the thing that you thought was in there. So there's,

Starting point is 00:03:26 this is a linter that goes through and gives you a whole bunch of warnings and stuff. You can kind of look through, like, W001: wheel contains .pyc and .pyo files. Like, somehow you've configured it wrong to grab that. And I don't know how you

Starting point is 00:03:45 would do that for a lot of stuff, but with Flit, possibly, if you accidentally threw those in your Git, because Flit just grabs anything that's checked in, I think, or committed. Duplicate files, it checks for that. So it checks for a whole bunch of stuff. So this is handy just to check as well. But the powerhouse that I'm using, of course, is just tox. I kind of wanted to cover the other ones because they're fun. But I wanted to remind people that one of the great things about tox is it builds things on its own. So when you run tox on a package, it will build the package, then install it into an environment, and then you run your tests. We think of it as more of a test runner, but it does that whole packaging loop also. And then the fourth way, I don't have a slide for this, but the fourth way that I've been doing is you can just push

Starting point is 00:04:36 them into a Git repo, and then you can do the pip install git+ and then the repo name thing. And pip will use your packaging tools to create the wheel before it installs it. So that's another way to check your packaging. So yeah, doing a lot of packaging. So anyway, I'm always super paranoid whenever I configure something to do with packages. So my method tends to be just unzip the wheel as a zip file and see what's in there, see what landed. I didn't try that. So what does that do?

Starting point is 00:05:08 If you just unzip it? Way number five, Brian. Yeah. So does it just unpack it in place then? Yeah, wheels are technically zip files. So you can unzip them and just inspect what made it in there. Yeah, put a .zip extension on it, and then you can just put zip tools on it, and off it goes.
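The "just unzip it" check can also be done from Python with the standard library. What follows is a minimal sketch, not something shown on the episode: the helper names and the `demo_pkg` archive are made up for illustration, and the stand-in archive only mimics a wheel's layout so the example runs without a real build.

```python
import tempfile
import zipfile
from pathlib import Path


def list_wheel_contents(wheel_path):
    """Return the sorted file names stored in a wheel (which is a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


def read_wheel_metadata(wheel_path):
    """Return the text of the METADATA file in the wheel's .dist-info directory."""
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith(".dist-info/METADATA"):
                return wheel.read(name).decode("utf-8")
    raise FileNotFoundError("no .dist-info/METADATA entry found")


def build_demo_wheel():
    """Write a tiny stand-in archive (hypothetical package) in a temp directory."""
    path = Path(tempfile.mkdtemp()) / "demo_pkg-0.1.0-py3-none-any.whl"
    with zipfile.ZipFile(path, "w") as wheel:
        wheel.writestr("demo_pkg/__init__.py", "")
        wheel.writestr(
            "demo_pkg-0.1.0.dist-info/METADATA",
            "Name: demo-pkg\nVersion: 0.1.0\n",
        )
    return path
```

Pointing `list_wheel_contents` at a real file from a `dist/` directory would show whether stray files made it in or expected modules are missing.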

Starting point is 00:05:25 So it must store the metadata somewhere then also, though. Yeah, there's a top-level metadata file that says all the things that it's about. I love the pun in the chat we got from Pylang. Wheely good stuff, Brian. Brian, that was real good stuff. Thanks for bringing it. Yeah. So on to the next one, for mine, huh? Yeah. Before we jump onto it, you see I have my race jersey on because

Starting point is 00:05:55 the Portland Grand Prix IndyCar race was here this weekend. So people listening who were close by, they missed it, but next September, be sure to go. It was really, really fun. Three days of racing. Very nice. Were they fast cars? They were very fast. They had no AI. No artificial intelligence yet, from what I understand. But if you look over on fast.ai, there's something that anybody who does proper data science is going to be pretty jazzed about. So Jupyter notebooks are notoriously bad citizens of source control and Git and tools like that.

Starting point is 00:06:34 The reasons are basically whenever you have a notebook file, if you've ever run it, the output, and the order in which the cells were run, and the number of times the cells were run, is stored in there. That's not great if someone gets the file and runs it, someone else gets it and runs it, and then you try to put it into source control. That's a problem, right? I mean, when you and I work on our code, we have Python files, the output goes somewhere, we check it in, the source code goes in. But with Jupyter, the outputs go in. Not just the outputs, but the memory address of some of the objects used in the output. So even if it's you running it twice, you get merge conflicts, which is not the coolest

Starting point is 00:07:16 thing ever. I suspect that this goes by the name the Jupyter plus Git problem, where really it should be the Jupyter plus version control system, VCS, problem, because it doesn't matter what you're using. Anything that just diffs files is going to hate this, right? Anyway, the article and the feature really that I want to talk about is "The Jupyter+Git problem is now solved" from Jeremy Howard over at fast.ai. The solution may surprise you. So it talks a little bit about the challenges here. And it's interesting, it speaks in terms that are not really developer oriented. It speaks more in terms of end users. So like the way that maybe a first year science student might experience what the problem is, not the way a seasoned data scientist would. Like, for example, here's the problem. The problem is when you're collaborating with others over Git,

Starting point is 00:08:10 you literally can't load your notebook if you both try to check it in because it's broken. Well, what does broken mean? Broken means it has merge conflicts written into it. That's really the problem. You can easily solve this problem if you accept their changes or accept your changes, but then you're losing data, right? So anyway, it says, okay, let's look inside. Well, there's JSON, and then there's, like, the HEAD and then the SHA, like diff markers. So I kind of already described this, but they do go into examples of, like, when you're talking about matplotlib or something like that, you'll have things like matplotlib.axes._subplots.AxesSubplot at some memory address, right? Which is suboptimal, let's say. Yeah, there's a lot of axes. That's

Starting point is 00:08:55 right. Then non-deterministic outputs and so on. It says, okay, we identified two categories of problems here. And I would like to say this is only accurate if you have zero-based indexing when you start counting. So we've identified, in Michael's term, three problems here. One, Jupyter Notebook formats are fundamentally incompatible with version control. Problem zero.

Starting point is 00:09:22 Problem one, Git conflicts lead to broken notebooks. There we go. And many of these, almost all of these conflicts are unnecessary because metadata, like the environment, the machine name and stuff that it was run on, as well as the memory address of the objects is stored inside the file. What do you do?

Starting point is 00:09:42 Well, there was this thing called nbdev that would allow you to clean the file. I think it was nbdev that will let you clean it. There's other ways to clean it within Jupyter as well. You can say, I'm only going to commit to version control the empty version, right? You can say clear all cells and then commit that. Then that would be fine because you're wiping all that data out. However, sometimes that data is incredibly hard to compute, right? I have a picture. The picture comes from an hour of training machine learning models and then processing a gig of data and then looking at this picture. If I don't clear it and I check it in, the picture's right there. You know what I mean? Or some of the outputs are right there. So there's a huge reason

Starting point is 00:10:23 to not clear it, because it might be incredibly hard to regenerate it. Maybe on the system you're on, you can't even run the code necessary, right? You don't have access to the database or whatever. So here's what they did. There's a new nbdev, named nbdev2, as part of the name, not a version, but the name. And this comes from the folks at fast.ai.
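To make the trade-off concrete, here is a rough sketch of the two cleaning strategies just described, operating on a notebook that has already been loaded as a Python dict. This is not nbdev's code; the function names are invented for illustration, and the field names (`cells`, `outputs`, `execution_count`, `text/plain`) assume the common nbformat 4 notebook layout.

```python
import copy
import re

# Matches the memory address in reprs such as "<AxesSubplot at 0x7f3a2c1d5e80>".
ADDRESS = re.compile(r" at 0x[0-9a-fA-F]+")


def clear_outputs(notebook):
    """Return a copy with every code cell's outputs and execution count removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def normalize_outputs(notebook):
    """Return a copy that keeps outputs but drops run-specific noise."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
            text = output.get("data", {}).get("text/plain")
            # text/plain may be stored as one string or as a list of lines.
            if isinstance(text, list):
                output["data"]["text/plain"] = [ADDRESS.sub("", t) for t in text]
            elif isinstance(text, str):
                output["data"]["text/plain"] = ADDRESS.sub("", text)
    return cleaned
```

`clear_outputs` corresponds to "clear all cells and commit", which loses the expensive picture; `normalize_outputs` keeps the outputs and only removes the parts that change from run to run.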

Starting point is 00:10:43 And here's how it works. It has a new merge driver for Git, okay? Instead of like processing the files, it says what we're going to do is we're going to set up hooks in Git. So when there is a merge, our special Python code that understands notebooks will present a different view for you.
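For readers who have not seen a Git merge driver before: Git can hand a custom program three files (the common ancestor, your version, and the other version) and expects the merged result to be written over your version, with exit status 0 for a clean merge and non-zero for a conflict. The sketch below is a deliberately simple whole-file driver under those conventions. It is not nbdev's merge driver, the function names are hypothetical, and it only compares cell types and sources.

```python
import json
import sys


def without_run_noise(notebook):
    """Reduce a notebook to what a person edits: each cell's type and source."""
    return [
        (cell.get("cell_type"), "".join(cell.get("source", [])))
        for cell in notebook.get("cells", [])
    ]


def load(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def merge_notebooks(base_path, ours_path, theirs_path):
    """Whole-file three-way merge; returns 0 if clean, 1 if both sides changed."""
    base, ours, theirs = load(base_path), load(ours_path), load(theirs_path)
    base_key = without_run_noise(base)
    ours_key = without_run_noise(ours)
    theirs_key = without_run_noise(theirs)

    if ours_key == theirs_key or theirs_key == base_key:
        return 0  # our file already holds the right result
    if ours_key == base_key:
        # Only the other side changed the code, so take their notebook.
        with open(ours_path, "w", encoding="utf-8") as handle:
            json.dump(theirs, handle, indent=1)
        return 0
    return 1  # real conflict: both sides edited cell sources differently


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(merge_notebooks(*sys.argv[1:4]))
```

If this were saved as a script, it could be registered by defining a merge driver in Git config whose command passes `%O %A %B`, and by assigning that driver to `*.ipynb` in `.gitattributes`. A notebook-aware tool does this per cell rather than per file, which is what makes the resulting notebook still openable.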

Starting point is 00:11:00 Wow. I know. And there's a new save hook for Jupyter that automatically removes the unnecessary metadata and non-deterministic cell output. So what you'll get is when you open up this conflicting notebook in Jupyter, you'll actually have the diff shown

Starting point is 00:11:17 instead of having a corrupted notebook. Additionally, it drops out the metadata, so these unnecessary conflicts are just kind of gone. So it talks about some interesting things that you can do there. You've got to run nbdev_install_hooks to get it set up and some other various things. There's also a lot of history on what has been done before. What are some of the other alternatives? But the big takeaway is the folks over at fast.ai have been using this internally for

Starting point is 00:11:43 several months and they say it has transformed their workflow. It's totally solved this problem. And the reason they care so much is almost all of their work, their unit tests, their documentation, their actual code, everything is in notebooks. They're like all in on notebooks. So having Git be a first-class citizen is obviously important. So I recommend people check this out. Postscript side bonus here is there's another thing called ReviewNB. ReviewNB is about reviewing, say, a GitHub pull request. So somebody fixes a bug in a notebook and they do a PR and say, oh, you were generating this graph wrong. You should have passed this parameter, which means a totally different thing. Wouldn't

Starting point is 00:12:22 it be nice to have a picture of the before graph and the after graph? With ReviewNB, that's exactly what you get. So you get your code diff, but then you also get the output diff, which might be a matplotlib picture. Isn't that cool? That's really cool. I'd be surprised if GitHub doesn't have this eventually. Yeah, well. This seems like a logical next step. Yeah, it sure does, right? Notebooks are so important. Right, but it's not just GitHub, though. So some people are using Git just straight. So exactly. Right, right. Or GitLab or whatever. Yeah. Yeah, this is pretty neat. And one of the things I really like about this is the part, like, all the other solutions that we've tried and everything. I mean, data science people are really good about covering that sort of stuff, where a

Starting point is 00:13:08 lot of other people are like, hey, I came up with a problem. I solved it. Maybe some other people have solved it also, but yeah, whatever. Exactly. I will say this set of tools exactly solves a problem I had not that long ago. Okay. So this really resonates with you, huh? This resonates with me.

Starting point is 00:13:26 Yeah. Using notebooks for documentation and as part of like an integration test suite, like this is great. Yeah. Very cool. Piling on the audience says, ah, so it looks like you can actually resolve

Starting point is 00:13:38 merge conflicts inside the notebooks rather than traditionally ignore conflicts. I believe so as well. I think there's like a merge inside of Jupyter type of thing you can do. Neat. Yeah, that's it. I haven't totally used it. All right. Anyway, if you're into data science, or that aside, if you do Jupyter and you care about source control, this looks really helpful. Which you should care about source control. Yes, exactly.

Starting point is 00:14:01 Yeah, so if you use Jupyter, yeah, full stop. Cool. There you go. Awesome. All right. Seth, over to you. Before we jump into the first topic you want to talk about, though, just real quick. We were so excited about episode 300. I didn't give you a chance to introduce yourself properly. So give us a quick background on you and then tell us about your item.

Starting point is 00:14:20 Yeah. So I'm currently an engineer at Elastic, working on the language clients team. Previously, I was the maintainer of the Elasticsearch client, well known within the Python community. Now I'm doing tech leadership for that same team. And then in terms of open source work, I am a maintainer of many different Python packages, most notably urllib3, which is the most downloaded Python package. And it's one of the dependencies of Requests and Boto and a whole bunch of other really foundational packages. That's incredible. Does it make you nervous to make changes to it? Oh, yeah. So the very first time that I became lead maintainer and had to make a release,

Starting point is 00:15:01 I actually spent multiple hours just kind of looking through the wheels and the source distributions and making sure that everything was right. It was a tough day, honestly. Yeah, so that topic that Brian opened with, you've been there as well, huh? Nice. All right, well, what's your first item for us?

Starting point is 00:15:18 Yeah, so my first item is about truststore. So this is about the certificates that you use to verify HTTPS connections. And so this is a library that me and David Glick have worked together to implement. And it's essentially trying to solve the problem of certifi with Python and how it kind of interacts with certificates that aren't necessarily trusted by the greater world. So for example, if you have a corporate proxy, if your company is installing a certificate on your behalf to enable it to do proxying of some sort, certifi just doesn't work with that. And you get these errors that are kind of insurmountable. You get errors that require

Starting point is 00:16:01 really low-level debugging knowledge to figure out. And so we went and implemented this. Anything that has to do with certificates, if it goes wrong, it's just like, well, that's never going to work. I guess we're done here. It's just so hard to understand, right? I'm on a campaign to make it so no one in the world needs to type verify=False ever again. That's my mission. Awesome.

Starting point is 00:16:22 Also, you spoke about certifi. Give us the background. I'm not sure we all know what certifi does. Sure. Yeah. certifi is essentially, every web browser like Chrome and Firefox and all of that, they have a bundle, a group of certificates that they are marking as trusted. And they kind of bundle those along with every single web browser, right? And so Mozilla, because it's open source, it open sources its trust store. And so what certifi is, is it's a small, really thin wrapper Python package around that bundle. And it allows Python to make HTTPS

Starting point is 00:17:00 connections to websites essentially without having to rely on a certificate trust store being configured manually by the user. And so a lot of times, because Python is installed on Windows or macOS but is relying on OpenSSL for a lot of its TLS, it really requires a file to be there. Like, OpenSSL doesn't know anything about the system certificate trust store or any of that. It requires a file to be there. And so certifi is solving that problem.

Starting point is 00:17:31 I see. So if I went and installed it, if I was on Windows and installed it into the trusted root store or something like that, that wouldn't count. That wouldn't be enough. It wouldn't be enough. Yeah. There is a whole bunch of other things that you get also by using these native operating system APIs for certificates, like auto updates.

Starting point is 00:17:50 It can be centrally managed. So, you know, your IT department can click a button and update everyone's system trust store. So, yeah, there's a lot of really good benefits to using the system trust store instead of this Python-managed file. And this article kind of goes into the nitty gritty of that. But the big announcement for this project was that pip actually, with the version 22.2 release, added support, experimental support, for using this library instead of certifi to verify HTTPS. And so what this will allow people to do

Starting point is 00:18:25 is try out truststore optionally, right? Instead of switching it to a default. And if they're experiencing this class of errors with installing Python packages or upgrading Python packages, they can use one flag. I believe it's listed, either way it would be listed here.

Starting point is 00:18:44 So you do --use-feature=truststore. And you'll recognize that use-feature flag from the 2020 resolver. That's another feature flag that they use. So this truststore feature flag is the same thing. If truststore is installed on your system, it will use that instead of certifi. And it allows you to get around the errors that you can see when you have a corporate network involved. So yeah, this is kind of the big thing that I'm really excited about. And we're really hoping that in the future, we can add this to Python, maybe make this a default for Requests. Like, there's a whole bunch of different, really interesting things that we can go forward with if we can prove that, hey, this is useful to these users.
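Outside of pip, the same idea can be tried in your own code, since the library exposes a context compatible with the standard `ssl.SSLContext` API. Here is a small sketch, assuming the `truststore` package is installed and a Python version it supports; the helper name and the fallback to Python's default context are choices made for this example, not something the project prescribes.

```python
import ssl


def system_trust_context():
    """Build a client TLS context, preferring the operating system trust store."""
    try:
        import truststore  # third-party: pip install truststore
    except ImportError:
        # Fallback for this sketch: the CA locations this Python build knows about.
        return ssl.create_default_context()
    return truststore.SSLContext(ssl.PROTOCOL_TLS_CLIENT)


# Example use with the standard library (not executed here, needs network):
#   import http.client
#   conn = http.client.HTTPSConnection("example.com", context=system_trust_context())
#   conn.request("GET", "/")
```

Libraries that accept an SSL context, such as urllib3's `PoolManager(ssl_context=...)`, can be handed the same object.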

Starting point is 00:19:28 Right. Yeah. Yeah. Fantastic. So if I say --use-feature=truststore, do I have to previously have installed truststore or something like that? You do have to have previously installed truststore. So the package is relatively new. It's less than a year old. And so to ensure that we're able to keep things moving, because it's experimental, we didn't want to bundle with pip.

Starting point is 00:19:52 Their release cycle is a lot longer. I collaborated with Tzu-ping for a good long while on this, making sure that everything was all good to go for pip, since shipping with pip is a big deal. So, yeah, it's been a long road. So yeah, this looks super useful. Kim out in the audience says, I'd love to never need verify=False again on my internal network. Seth's mission is fantastic. Yeah. Yeah, I'm very grateful that this work is going on. I hope that that's true, because it drives me nuts. Is this something you have to deal with internally as well, Brian? Yeah, because we've got internal network, corporate firewall, we've got the trust stores on Windows

Starting point is 00:20:35 systems, and it is an issue. So, I mean, one of the ways we get around it is to have an internal PyPI. We'll get a mirror inside. Yeah. But sometimes I want to try out stuff that's not there. So having something like this work would be good. But it's not just PyPI, it's other places too. So yeah, the entire outside internet is usually impacted when you have that sort of situation of a corporate proxy. So yeah. And I'm guessing that this trust

Starting point is 00:21:10 store, I mean, using it within pip would be great for a lot of people to try it, but trying out this trust store for applications that depend on trusted sites, that would be helpful as well, right? Yeah, so actually in the documentation, if you're trying to use it manually with other things, we support urllib3, aiohttp, Requests, and I'm sure it'll work with other libraries as well. Nice. Like HTTPX? Yeah, it should work with anything

Starting point is 00:21:38 that uses the standard SSL context API. As long as it can use that API, it should work with it. This is great. Awesome. Very cool. Nice work. Thanks for coming on and sharing it. Hopefully it makes corporate Python a little better. You know, there's, this was long ago when I first started the podcast,

Starting point is 00:21:58 this one and TalkPython. There was a lot of debate or discussion, I guess, whether Python was an appropriate enterprise software type of language. You know, I think that debate is largely over. And I think the reason it's over is because the data scientists said it's this is not a debate. You want us to do the job or not do the job? That's right. OK, well, so let's use Python.

Starting point is 00:22:20 And it kind of spread from there internally through acceptance. That said, now that it does live in these environments that Brian described much more frequently, it's really important to have this support. Yeah. It's actually really funny because, so to put this in perspective for Java folks, Java trust stores are like certifi, where you have this manual thing that's shipped with Java

Starting point is 00:22:44 as opposed to just using the system. And I got that comment on Lobsters or something that was talking about this article, and they were just like, wow, this is like getting rid of Java trust stores. This is great. I'm like, okay, I didn't even know that existed. That's right. We really hate it over there. And yeah, we hate this, so this is great. I was like, okay, thank you. Cool. All right. Well, before we get to the next topic, Brian, let's talk about our sponsor for this week and many weeks this year, Microsoft for Startups Founders Hub. If you are starting a business, doing a startup, you are a little ways going, or you're just thinking about it, you should really check this out, because Microsoft for Startups set out to understand the challenges that we all have creating startups

Starting point is 00:23:28 in this digital cloud age. And they created Microsoft for Startups Founders Hub to help solve many of them. So that includes getting cloud resources, GitHub credits, other credits like AI credits, for example, from OpenAI that you can run your code on. But maybe even more important than that, it has support for connecting you with mentors and experts to make sure that you go in the right direction when you're young and getting started. So often you see the successful startups being in places where there are a lot of mentors, where there's these networks and people have connections to get funding, the marketing side of things, the product market fit, all of those things are super hard. So if you are part of Microsoft for Startups Founders Hub, you'll have access to their mentorship network, which gives you access to hundreds of mentors across a range of disciplines,

Starting point is 00:24:26 like the ones I just named and more, as well as up to a little bit over $100,000 worth of credits in Azure and GitHub and OpenAI and other places as you go through certain checkpoints as you sort of grow within this program. So really tons of super support that you can get for your startup. It doesn't have to be investor backed.

Starting point is 00:24:47 It doesn't have to be third party verified to participate. All you have to do is go to pythonbytes.fm slash foundershub2022, apply. And if you're accepted, you'll get all of this support from them. So make your idea a reality with Microsoft for Startups Founders Hub. Apply today for free.

Starting point is 00:25:04 Get in, you'll get tons of support. So very nice. Also nice, Brian, plots. Tell us about these plots. Plots and command lines. So I like command line stuff. And actually with the thanks of Will McGugan, we've got a lot of people excited about CLIs.

Starting point is 00:25:21 But apparently Bob is also, Bob Belderbos from the PyBites duo. So I like this article. So actually, I kind of skimmed the article. Sorry, Bob. But making plots in your terminal with plotext, if you install it,

Starting point is 00:25:40 I think it's plotext. I can see the typo squatting happening right now. Yeah, so if you pip install it, there's one T in the middle. So it's P-L-O-T-E-X-T. So he had some code where he was looking at plotting the frequency of their blog articles on the terminal. So he was using some of their own data to plot stuff, and it's kind of cool walking through how he grabbed the data

Starting point is 00:26:10 and everything. But I was looking at this plot going, oh, this is a pretty nice looking plot. I mean, it's totally blocky, of course, but it's a bar chart, so it's supposed to be blocky. So that's okay. And so then I went over and looked at this package, this plotext. And it's cool. Look at all these awesome plots. I was looking at some of the various things you can do. It's got basic plots for, you know, just like sine waves and things like that. But you can also do fill-in plots and then multi-color. This is

Starting point is 00:26:47 kind of a lot of cool stuff you can do on the command line. And then even data streams, which, look at that, it's a data stream going on in a plot in your terminal. It's pretty great. Images even. So there's a cat image. You can do LOLcats all day long. Yeah, I'd say the people that put together those examples knew what the internet wants: cat pictures. Yeah. And then even subplots. So the first example we saw, it's not actually that bad of an interface. You know, it's tedious to put together plots anyway, but this isn't too bad. But that cover image that we saw is not a combination of images. That's one plot with subplots in it. So I see, that's cool. So within one terminal window, you can

Starting point is 00:27:38 do almost like a dashboard view with different plots, and they could probably be updating live. Yeah. So this is pretty exciting. I like it. So anyway, I just wanted to say, hey, if you want to plot on the command line, you can use this. So I'm loving this terminal renaissance. It's so fun. So yeah, it makes us feel like hackers again, you know? It does absolutely make you feel like a hacker. I love it. That's so good. So, all right. On to the next item.
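As a rough idea of what the blog-frequency chart involves, here is a short sketch: count items per month with the standard library, then hand the labels and counts to plotext's bar chart. The dates are made up and the function names are invented for this example; this is not the code from Bob's article. plotext is imported lazily so the counting part runs even where it isn't installed.

```python
from collections import Counter
from datetime import date


def posts_per_month(publish_dates):
    """Count items per 'YYYY-MM' bucket, returned in chronological order."""
    counts = Counter(d.strftime("%Y-%m") for d in publish_dates)
    return dict(sorted(counts.items()))


def plot_in_terminal(counts):
    """Draw a bar chart in the terminal (requires: pip install plotext)."""
    import plotext as plt  # lazy import: only needed when actually plotting

    plt.bar(list(counts.keys()), list(counts.values()))
    plt.title("Posts per month")
    plt.show()


# Made-up sample dates, for illustration only.
SAMPLE_DATES = [
    date(2022, 7, 4),
    date(2022, 7, 18),
    date(2022, 8, 1),
    date(2022, 8, 15),
    date(2022, 8, 29),
    date(2022, 9, 5),
]
```

Calling `plot_in_terminal(posts_per_month(SAMPLE_DATES))` in a terminal with plotext installed should draw three bars, one per month.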

Starting point is 00:28:11 Yeah. Just, uh, hadn't really planned to talk about this, but I just yesterday did an episode with Will McGugan, seven lessons from building a modern TUI framework. Brian, you covered that article last week on this show. So I reached out to Will and said, hey, we should absolutely cover this stuff in like a deep dive. So people-

Starting point is 00:28:29 Oh, I can't wait to listen. This is great. People can go check that out as well. All right. But let's talk about one of my very favorite things, HTMX. People who are not familiar with HTMX, you really owe it to yourself to check this out.

Starting point is 00:28:43 It's what the web should have been forever, but it wasn't for some reason. It's like it stalled in the late, mid-90s. I don't know. And hyperlinks and forms are the only things that can make requests. You can only click on them to make it happen and so on. Why should the entire screen have to be replaced,

Starting point is 00:29:00 every interaction and all those things? So HTMX is awesome. You can just put in little fragments of declarative code and it does all the cool work. You can have a class on it if people want to check that out, but that's not the topic of today. The topic is template fragments. So Carson Gross over there wrote this article, this essay called Template Fragments. It said, one way you might consider doing this is in HTML, you very frequently have to first show the page and then as little sections of an update, it goes back to the server and says, I just need the code, the HTML block that goes into this fragment here because somebody

Starting point is 00:29:34 moused over something else. So refresh its related item or whatever. He's a big fan of this thing called locality of behavior design principle, where instead of having a bunch of pieces that cling together and reassemble themselves, like if it could just all be right there, wouldn't that be great? So he says, normally the way that you would have to do this is you would have to have your full HTML and then a little subsection. And then that subsection has the optional element. But some frameworks, some template libraries allow you to define a fragment. And then when the code is requested on the server, it can either show the whole thing

Starting point is 00:30:12 or just peel that fragment out of the HTML, but you don't have to parcel it into a bunch of small files. Cool, huh? It's really useful if there's no reuse. Like if the only reason you would make that little fragment is so that you could return it separately, this is great, because basically it means you can just write the page once and it can interact with different data, different elements. If for some reason that fragment was being used in multiple places, all of a sudden it's like code duplication, and that's not ideal.

Starting point is 00:30:38 But so, we talked about this, and hey, there are some known implementations of this. Apparently Django has the render block extension. I created jinja-partials and chameleon-partials, which, I'm not really sure, I'm thinking I might actually take them out now that there's something better for Jinja, which I'm about to talk about. But nonetheless, those kind of, sort of allow this, but more in the second way I described, where you have a fragment that's separate but included. But I was talking with Sergi Pons Freixes. He said, between Jinja2 Fragments and my jinja-partials, HTMX plus Flask is so awesome.
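
The template-fragment idea from the essay can be sketched in a few lines of standard-library Python. This is only a toy illustration, not how Jinja or any of the libraries mentioned actually work: the `<!-- fragment:... -->` markers, the `render_page`, `render_fragment`, and `handle` names, and the sample page are all assumptions made up for this sketch. The one real detail is that HTMX sends an `HX-Request: true` header with its requests, which a server can use to decide whether to return the whole page or just the fragment.

```python
import re
from string import Template

# Toy page: the fragment lives inline in the full page, marked by
# hypothetical comment markers (not a real Jinja or HTMX syntax).
PAGE = """<html><body>
<h1>$title</h1>
<!-- fragment:details -->
<p id="details">$name costs $price</p>
<!-- endfragment -->
</body></html>"""

FRAGMENT_RE = r"<!-- fragment:{name} -->\n(.*?)\n<!-- endfragment -->"


def render_page(context):
    """Render the whole page, fragment included."""
    return Template(PAGE).substitute(context)


def render_fragment(name, context):
    """Render only the named fragment, pulled out of the same page source."""
    pattern = FRAGMENT_RE.format(name=re.escape(name))
    match = re.search(pattern, PAGE, re.DOTALL)
    if match is None:
        raise KeyError(name)
    return Template(match.group(1)).substitute(context)


def handle(headers, context):
    """Return the fragment for HTMX requests, the full page otherwise."""
    if headers.get("HX-Request") == "true":
        return render_fragment("details", context)
    return render_page(context)


if __name__ == "__main__":
    ctx = {"title": "Shop", "name": "Widget", "price": "5 USD"}
    print(handle({"HX-Request": "true"}, ctx))
```

The point of the design is locality: the fragment is written once, inside the page it belongs to, and the server chooses how much of that one template to send back.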

Starting point is 00:31:21 So he created this library called Jinja2 Fragments, which does exactly what I described. So in Jinja, you have blocks, like you might have your main HTML and you say, here's a block of main content. With his library, what you can do is you can say either just render the template, or you can now render block and name just part of your Jinja template. And that part comes back with the data you supply to it. That's pretty awesome, right? Like this one paragraph is the whole response from the server if you call render block instead of render template. This is, yeah, this is super great. Honestly, on Twitter, every time I see HTMX, I'm just like, I am so prepared to write a website, because I've not had the use case for a while, but I'm very excited for the next time I will have.

Starting point is 00:32:02 Exactly the same. I'm working on projects that have been around for six or seven years. I'm like, if I rewrite this thing, it's getting HTMX all over it. But I just can't bring myself quite to do it. But yeah, it's so good. One day. A couple of comments from the chat. Vincent from CalmCode says, HTMX is the bee's knees and that CalmCode uses it a whole bunch.

Starting point is 00:32:26 I am not surprised, Vincent. Awesome. Yeah. Yeah, any website I create after knowing about HTMX is likely going to be using HTMX. If you thought the answer was Vue.js or React or something like that, you may really, really, really want to check this out first. Well, especially if you're somebody like me that, I'm like, yeah, I want to put this interactive stuff in here. I don't really feel, I'm not an expert in JavaScript though. So I'm not sure. And so, but I do know somebody that knows a lot about HTMX. So you might know someone. You're venturing very close to getting me off onto like a very long rant about HTMX, but it's so good, because even if you know JavaScript, wouldn't it be better to not have to think about, now I'm running

Starting point is 00:33:10 client code, now I'm running server code, now I'm running the APIs to connect the client code to the server code, this one's in this language, it knows this, that one's in that language in this location, it knows that? Like in HTMX, you just write it all in one place in one language with the same context and security model and everything. Access to the database, for example. And then you just do what you need to do. It's perfect. And it's not really just about thinking about two languages either.

Starting point is 00:33:35 There's a lot of people, like me, that already have to think in two languages. I'm thinking in C++ and Python. So thinking about it in a third language or a fourth language, that's, it's like, you know, come on. Having a place to stop, plus, yeah. Yeah, yeah. A final comment I'll make on this is even people are using Node.js like HTMX,

Starting point is 00:33:56 where it's the same language. It's like, it's also just about the context and location switch. Oh, yeah. That's, I hadn't heard that. That's pretty cool. Yeah. Seth,

Starting point is 00:34:05 it sounds like you were gonna say something. Maybe I'll let you have the last word here. Oh no, I was honestly just gonna say that, like, the more we can stay in HTML the better, because you have to know HTML, so you might as well stay in it, right? Yeah, absolutely. Absolutely. So, uh, well done, Sergi. Check out his Jinja2 Fragments framework. It is super new. Like, I don't know when it got released, but in a couple days, these are like two and three days

Starting point is 00:34:29 on all the commits here. It's a lot of days. It's very, very new. Two to three days. Yeah. Well done. Well done.

Starting point is 00:34:36 All right. Seth, over to you for the final one. Sure thing. Yeah, this, this article

Starting point is 00:34:41 was announcing something that's been getting worked on for a while, which is generic generators for SLSA 3. So what you're seeing there, SLSA, that stands for, if I can remember, it is Supply Chain Levels for Artifacts, Supply-chain Levels for Software Artifacts. So SLSA, and you pronounce it Salsa. And it's essentially... It's a great way to say that acronym. Yeah, right? Makes you hungry every time, which is the best part.

Starting point is 00:35:13 But yeah, it's basically a set of tools and standards to attest and verify the provenance of artifacts. So essentially, where did this thing come from? This file, this wheel, this jar, depending on what, like, ecosystem independent, whatever thing, whatever artifact you're building, where did it come from? How was it built? And so it uses a whole bunch of different, like, cryptographic primitives and OpenID Connect, OIDC, which is basically magic, but it basically allows you to prove, in effect, okay, this was built from this specific GitHub repository, this commit, this tag, and someone can then later

Starting point is 00:35:53 take this file, this artifact that got built and then verify that that was the case. And so this is kind of like in the future, hopefully be used as like a defense against maybe like stolen credentials on the Python package index. That would never happen. That would never happen, right? That's never happened. That has never happened other than last week. At the time of the recording,

Starting point is 00:36:16 never has happened, I would say. So yeah, it gives a good defense against this, right? Because, let's say you have a package and the Python Package Index knows that this package came from, you know, github.com/sethmlarson/whatever, right? And then in the future, if it received something that doesn't come from that GitHub repository, it can flag that and say, hey, this isn't right, like this didn't come from the place that it came from before or wherever it's supposed to come from. And the fact that this is generic is the big deal.
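
The check described here, "does this file match what was attested, and did it come from the repository we expect", can be sketched roughly as follows. This is a deliberately simplified illustration and not the SLSA tooling itself: the `provenance` dictionary layout, its `subjects` and `source_repo` keys, and the `check_provenance` helper are hypothetical names invented for this sketch. Real SLSA provenance is a signed attestation, and verifying that signature (the cryptographic part mentioned above) is left out entirely.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def check_provenance(artifact: bytes, provenance: dict, expected_repo: str) -> bool:
    """Return True if the artifact matches the provenance and the expected source.

    `provenance` is a simplified, hypothetical structure for illustration only.
    Signature verification of the provenance itself is NOT performed here.
    """
    digest = sha256_hex(artifact)
    # The artifact must be one of the subjects the provenance vouches for.
    digest_ok = any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in provenance.get("subjects", [])
    )
    # The build must have come from the repository we expect.
    repo_ok = provenance.get("source_repo") == expected_repo
    return digest_ok and repo_ok


if __name__ == "__main__":
    wheel = b"pretend these are the bytes of a wheel file"
    provenance = {
        "subjects": [{"name": "example.whl", "digest": {"sha256": sha256_hex(wheel)}}],
        "source_repo": "github.com/example/project",  # placeholder repository
    }
    print(check_provenance(wheel, provenance, "github.com/example/project"))
    print(check_provenance(b"tampered bytes", provenance, "github.com/example/project"))
```

An upload pushed straight from a stolen account would fail a check like this, because there would be no matching provenance tying the file to the expected repository.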

Starting point is 00:36:48 The part that ties us back to Python is that you can use it for wheel files and source distributions. You can sign like anything. And so, for example, one of the Python projects that is featured in here is urllib3. I've been trying to get into this and it's been really successful. And so urllib3 now does this, and you can actually verify that it came from a specific repo and that the wheel came from a specific tag. And yeah, it's really interesting. And this ecosystem is like just getting started. And so if you're like interested in anything about like supply chain security and all of that, this is like a great place to start doing some learning

Starting point is 00:37:25 about what the future might look like. Yeah, this is great. When I first saw this, I thought, okay, this is cool, but how does that really help protect against somebody sabotaging a package? But then again, if you think, and I realize if you think back to what happened with some of those other packages,

Starting point is 00:37:40 somebody got ahold of the PyPI account, not the GitHub account. And they just published a new version directly, not through the CI. Right, right. Yeah, so this just makes the amount of things that need to get compromised even larger, right? Like, right, no longer do you need to only compromise the email account on PyPI, you have to also compromise GitHub. And then if you have, you know, GitHub environments configured,

Starting point is 00:38:06 you need to compromise a second account to like review the deployment. And so it just makes it even harder to actually get that attack off essentially. Yeah. And if you had to publish the actual vulnerability to a popular GitHub repository to trigger it, it would be discovered sooner, right?

Starting point is 00:38:26 Because people are like, oh, what's J... Oh, that's unusual. Who has made this... They've made this commit, and now it's doing this URL thing over to hacksore.com. And, right? Like, that's just another out-in-public thing, whereas if the direct account gets attacked,

Starting point is 00:38:43 somebody can just use Twine or something directly to push a bad wheel up. Yeah, exactly. Yeah. No more pushing bad wheels, you have to go through so many different hoops just to do something. You need to flatten those bad wheels. Yes. Got to inspect them too. Exactly. All right. Awesome. This is good stuff. Well, Brian, that's... No, do you have any more? No, that's all of them. Do you have any extras for us? I do. Although I'm going to try to make it quick because now I'm hungry for some salsa. So I wanted to, I'm like super excited for this upcoming weekend.

Starting point is 00:39:17 I can't believe it. So on Saturday, September 10th, I will be in San Francisco. And I've got two events going on at PyBay. So PyBay, awesome conference. I haven't been there before, but you were there last year or something like that? Yeah, last year, and I absolutely loved it. I would go this year if I wasn't on single parent duty and had kids that had to go to school. So I'm giving two events. So one of them is Sharing is Caring, pytest fixture edition. I'm going to talk about building. Actually, I'm just

Starting point is 00:39:51 going to talk about packaging, but it's not really about packaging. It's about sharing fixtures with other people. And because I think that that's a bigger need than people realize. So anyway, love fixtures. We're going to talk about that. And then, um, and then I got asked to be on this experts panel. There's, uh, we've got, uh, Zac Hatfield-Dodds, me, Andy Knight, which is, um, he's Automation Panda. That's right. Uh, Joshua Grant and Nishat Khan. So it should be a fun panel. And it's at seven o'clock at night. I'm like, wow, I think I really need to change my flight because I was planning on flying out at 8 a.m. the next day and it's going to be tough. So that's going on

Starting point is 00:40:39 next weekend. I'm pretty excited. Yeah. Bylang says good luck on the talk, Brian. Oh, thanks. So how about you? Do you have any extras? I do, I do, a bunch. I'll make them pretty quick. So Heroku, you know, the platform as a service place, they for 13 years or something have had a free plan where people can go and create, what are they called, dynos or something? Yeah, dynos. I don't use Heroku. So I don't know all the terminology and how all the plans break down. But for a long time, they've had free plans. But now they are canceling them. And you will either have to pay or delete your projects. So that's going to affect a lot of people. They have something like 13 million. What's the right number here?

Starting point is 00:41:31 Claims, yeah, that it's been used to develop 13 million apps. So I bet many of those are free and are going to be suffering this. There's an interesting discussion on Y Combinator. So you can check that out. I'm sure it's very civil over there in the comments as it always would be. Yeah. But basically, you know, Heroku was purchased by Salesforce. They claim, and it may be true, I'm sure that it is somewhat true, they want to cancel this because of fraud and abuse. It may be more that they have to spend so much money to fight the fraud and abuse that it's just not worth it to them. I

Starting point is 00:42:01 don't know what it is, but however you land on the, it's a good idea, bad idea, it's going to cost money if you want to use this. And it's pretty pricey, by the way. This change will roughly double the cost of a basic plan that uses Redis from up to $50 a month. If you start bringing in your Redis cache and your Postgres hosting and your dynos,

Starting point is 00:42:22 they all add up, and then you've got to scale this one or that one, right? Um, one of the reasons I'm not using it, but not the only reason. I want a little more control as well. But anyway, so if you have a free thing running on Heroku, or you were thinking about it, you have to think again, find something else. There's actually, at the bottom, there's a bunch of, um, platform as a service things that I've never heard of. There's Porter, Railway, Render, Fly.io, and Clever Cloud, all of these things vying for this business. They all look kind of interesting. I know nothing about them. You can check it out. I've seen Fly.io all over the place, on Python Twitter at least. Yeah.

Starting point is 00:42:54 Okay, so if I were personally picking one, I would check that one out first, but I don't know anything about any of them, to be honest with you. The last time I used Heroku was a long time ago. I'd like to see some real comparisons among some of these. There's still a place for hobby projects. I want to try something out, or do something live, even as a high school app or something like that. I know, um, oh good, you're gonna show PythonAnywhere. I was going to. I gotta find the right link. Here we go. Um, so I think they still have a free tier. I don't know if they advertise it much, but

Starting point is 00:43:36 Beginner's free. Yeah. The part that bothers me really isn't that it's... there's a comment in the chat about, it's hard to complain. It's a free service, so they can do whatever they want, right? Essentially. Yeah. Oh, there's that, right. That's the right one. Yeah. However, the jump between free and $50 a month is a big jump.

Starting point is 00:44:05 And that's my gripe about it. So anyway. Yep. And not to frame this into a recommendation, but yeah, I feel like a lot of the cloud services have really pushed how easy it is to deploy. Because I remember initially starting with Heroku, the ease of deployment was the big win for a lot of people. And so, yeah, a lot of cloud services where, you know,

Starting point is 00:44:27 you pay for everything you use, but what you use ends up being a few cents a month, which is a lot more surmountable than $50 a month. So, yeah, there's definitely a gap there. There's not as much of a gap there as there was before. Yeah, for sure. Brian out in the audience says, at my last company, we had to disable our free tier due to crypto miners. Yeah, of course. I'm sure. And Kim also has something. Yeah, stealing the computation

Starting point is 00:44:52 there. But all right, anyway. Um, okay, I didn't want to go too far down that one, but for sure check out some of the options below. Uh, DigitalOcean and Linode are also really, really good options. This one, I'm full of rants today, potential rants. This one comes to us from ExtremeTech. White House, as in the US, bans paywalls on taxpayer-funded research. It has always felt super creepy and wrong that we have the NSF, which pays billions of dollars a year, millions for individual research projects, to come up with scientific research that all three of us and many people listening actually pay for.

Starting point is 00:45:33 I'm glad to pay it. I think this is really important. It's important for the country. It's important for the world. And yet those results get locked up behind really expensive for pay scientific journals, right? Like you've got to pay $5,000 a year to subscribe to this journal so that you can read the article that, wait, we paid to create that and we can't even get access to it? So this article here is, the White House has updated federal rules to close a loophole that enabled journals to keep taxpayer-funded research behind a paywall, which I think is great. So if you're specifically in the data science side,

Starting point is 00:46:07 I think this might be relevant to you. Yeah, I'm curious how that's going to get implemented. Yeah, me too. All right, anyway, there's that. And then, Seth, back to some of the stuff you were talking about. I mean, it would never happen that someone would try to phish.

Starting point is 00:46:22 Wait, no, last week somebody tried to phish PyPI. Maybe it was a week before when it started, but not too long ago. So over on darkreading.com, there's an article that says, threat actor phishing PyPI users has been identified. JuiceLedger has escalated a campaign to distribute its information stealer by now going after developers who publish code widely used on the Python code repository. Don't want to go too much into it, but there's this group who had originally tried to do typosquatting, if I'm correct. They wrote some thing to steal, some malware written in .NET, by the way,

Starting point is 00:46:56 which Will was joking about it only running on Windows. Hey, if they use .NET Core, they could expand out, the open source version. Anyway, I don't want to give them ideas, but they were distributing this malware through these malicious packages. And then they said, well, what if we could get really popular ones, hack their accounts, and then upload bad wheels? So anyway, there's a bunch of background on the actual people behind this. So it's pretty interesting. You can check out that article if you want. There's also an Ars Technica article, but it doesn't have as much depth as the Dark Reading one. Nice. All right. Last one. I think this is the last one. Brian Skinn, former co-host on the

Starting point is 00:47:36 show, who always contributes many interesting things, says Python Bytes will definitely want to check this out. This is a tweet by Steve Dower that says, we have published the details of a critical security problem for Python. It is very rare that we have direct vulnerabilities in Python. Like it was all fun to have the LOLs about, um, JNDI and Log4j, but this is not exactly that, but it's a denial of service at that kind of scale. So if you've ever thought, I have a string and it needs to be an integer, and that string came from user input,

Starting point is 00:48:13 that's really bad, it turns out, because there's a denial of service thing that you can do by passing very, very long strings to that integer parsing. Seth, you're shaking your head like, oh boy. Yes. Yeah, if you've been waiting to upgrade from Python 2, now's the time to upgrade to Python 3, I would say.

Starting point is 00:48:32 Exactly. The security support. And you shouldn't say, I'll just go to one of the older ones. Like you need to get 3.10.7 ASAP. I suspect they'll roll this back to some of the supported ones as well. So they'll probably backport it to 3.9 and 3.8. But if you're on say 3.6, that's a problem. That's a big, big problem. Yeah. So expect releases for 3.7 plus in the next week. This came out a few days ago. This has now

Starting point is 00:48:58 been done, but this Twitter thread is super interesting and that's what I'm linking to. So y'all can check that out. There was also some feedback like, what are you doing? How dare you fix this? The way they fixed this is they said, if you're doing base 10 parsing, you can only use 4,300 digits. Not the number up to 4,000, but places in the number, 4,000 places. That's a really large number. If it's bigger than that, basically Python won't parse it the way it did before.
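
For code that turns user input into integers, the same idea can be applied defensively in your own code regardless of interpreter version. The sketch below is an assumption-laden illustration rather than the CPython patch: `parse_user_int` and `MAX_DIGITS` are hypothetical names, and the 4,300 figure simply mirrors the default mentioned above. On interpreters that include the fix, `sys.get_int_max_str_digits()` reports the built-in limit; the `hasattr` guard keeps the sketch runnable on older versions.

```python
import sys

MAX_DIGITS = 4300  # mirrors the default limit discussed above


def parse_user_int(text: str, max_digits: int = MAX_DIGITS) -> int:
    """Parse untrusted text as a base-10 int, rejecting absurdly long input first."""
    stripped = text.strip()
    digits = stripped.lstrip("+-")
    # Check the length before calling int(), so the expensive conversion
    # never runs on an oversized string.
    if len(digits) > max_digits:
        raise ValueError(
            f"refusing to parse {len(digits)} digits (limit {max_digits})"
        )
    return int(stripped)


if __name__ == "__main__":
    # Interpreters with the fix expose the built-in limit; older ones do not.
    if hasattr(sys, "get_int_max_str_digits"):
        print("interpreter limit:", sys.get_int_max_str_digits())

    print(parse_user_int(" 42 "))
    try:
        parse_user_int("9" * 10_000)  # the "hold down a key" style of input
    except ValueError as exc:
        print("rejected:", exc)
```

A length check like this costs almost nothing, which is why a cap on digit count is a reasonable trade: very few applications need to parse numbers with thousands of decimal digits from user input.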

Starting point is 00:49:26 Brian, you do C++ all the time. You have to think about, is this over 32,000? Is it signed or unsigned? Okay, it's unsigned. We can get to 64,000. This is not that level of thinking, but you kind of do have to think about what the heck's going on here.

Starting point is 00:49:42 I think it's a fair fix. I do too. People are freaking out for no reason. Yeah, this one was really, this one's wild too because you just pass a long number. Like it's not something sophisticated or anything.

Starting point is 00:49:53 This is, it also, it feels almost, not Log4j, but kind of Log4j a little bit, where you can just do denial of service by doing something very trivial. Exactly. Yeah, you just try to set your username

Starting point is 00:50:06 to jndi://hackster.com. This is like, well, the number is a1722117, and then boom, now it goes to the website, right? This is denial of service versus remote code execution, which is clearly better, but it's not good. Yeah, just hold down the zero key for a little longer. Exactly. Or if you're writing Python code, you can just do times 10,000, caret 10,000, you know, power to 10,000 or something, and send that.

Starting point is 00:50:31 Yeah, string extension really coming in handy here. rpad. Exactly. Or, uh, zfill and the right pad. Exactly. Yeah, piloting wants to send pi across. You know, that's going to upset it. Anyway, I upgraded my servers to 3.10.7. They were not available from Ubuntu directly. It was still the old 3.10.6, which is unnerving. But because I built mine from source, I just changed the number to 3.10.7, rebuilt and redeployed Python. I'm good to go. I imagine everybody listening to this podcast is on 3.7 or above if they at any chance can be. I mean, if they're below, it's not because they haven't tried.

Starting point is 00:51:13 Yeah. But let me point this out. I would say, actually, I want to follow up with a couple of things. Because this is, maybe this should have been the main item, but whatever. One, we've talked about the reason you should upgrade to Python 3 for a long time. And Brian, you and I had lots of fun calling it legacy Python. Although we've had people go into iTunes and like post negative reviews of the podcast because I had said disparaging things of Python 2, but that's okay. I'm willing to stick by them. Oh my goodness. That is wild.

Starting point is 00:51:44 More reviews. Awesome. If you have good things to say, also consider posting a review, not just if you're angry that I called it legacy Python. But if you're on old legacy code, which is even 3.5, but very seriously Python 2, because the gap to upgrade is really hard, these are the types of things that we warned about that could be a problem. Yeah. And there will be no fix, right? You'd better just say, well, we're going to make sure the strings that are really destined to be integers are really, really checked. And, you know, I mean, it's not good. It's not good. So just one more reason to be on a shipping version of Python, even if it's just

Starting point is 00:52:21 3.7. Yeah. All right. Yeah, that's it. Uh, let's see. Yeah. Changelog. Uh, one other really quick. Yeah. So you can see it's like actually described quite well here. Patch by Gregory P. Smith and Christian Heimes. Feedback by a bunch of great folks. Sebastián Ramírez said, I sent a tweet out when this got fixed saying, please be kind to your open source contributors. They just wrote 800 lines of code in a PR so that you can parse strings to integers. So apparently it wasn't easy to fix. But yeah, I agree.

Starting point is 00:52:54 Cool. Ready for a joke? Or actually, Seth, you got anything extra you want to throw out first? Yeah, I had a real, hopefully quick one. So I follow a whole bunch of game art accounts on Twitter because I just love it, seeing what people create. And one came by, it was using hashtag Pyxel, P-Y-X-E-L, did a little ding.

Starting point is 00:53:16 I'm like, wait a second, that's Python. And then I just went back in this developer's Twitter a few tweets back, and they just released Wasm support for this Python, like, game framework. I'm like, this is incredible. Um, so yeah, it was a very fast journey of, wow, Wasm is everywhere at this point. That's kind of wild that it's popping up so fast. So yeah, version, uh, 1.8.0 of this, um, retro game engine for Python, which they had a whole bunch of really beautiful examples. I think y'all have covered this framework before, but the Wasm support

Starting point is 00:53:50 is recent. Yeah, this is really cool. Yeah, so apparently they have a whole bunch of demos that you can just play in the browser and I was really blown away that I didn't even know this existed and suddenly there's Wasm support for it. Awesome. I love it. Okay, that's a great one yeah all right how about

Starting point is 00:54:07 we close it out with a bit of a joke? Have you ever felt like you've had a hard day at work? There's one of these problems, like parsing integers. You're like, how could this possibly go wrong? I just don't understand what is happening. Well, here we have a joke of a guy at a nighttime soccer game. Apparently it's a little cool out, but he's been running really hard. So it's a picture of a guy whose head is literally steaming, like not a little bit, a lot, a lot. I think that's a visualization of like integer being parsed into a string right there. Exactly. The before. I'll read what the tweet really says. And then maybe we can play with it a little. It says, the tweet says, just a JavaScript developer after work.

Starting point is 00:54:45 You know, like, what do you mean I have to do a new framework? I just did a new framework last month. I feel like this could be Christian Heimes after going, what do you mean parsing integers is a denial of service? I just can't. The ints are wrong. The ints are cursed. Exactly.

Starting point is 00:55:04 Anyway, I just, I'll just leave this here for people to appreciate and we can call it a show 300. Yeah. Nice. Thanks. Yeah.

Starting point is 00:55:12 Thank you, Brian, Seth. Thanks so much for being here and sharing the work you've been doing. Yeah. Thanks so much for having me. Yeah. It's been great.

Starting point is 00:55:19 Bye everyone.
