Python Bytes - #262 So many bots up in your documentation

Episode Date: December 9, 2021

Topics covered in this episode: pytest 7.0.0rc1, PandasTutor, Apache Airflow, textwrap.dedent, pip-audit, Extras, Joke. See the full show notes for this episode on the website at pythonbytes.fm/262...

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 262, recorded December 8th, 2021. Oh my gosh, it's almost winter. I'm Michael Kennedy. And I'm Brian Okken. And I'm Leah Cole. Yay. Yay. So great to have you here. Thanks for being here on the show.
Starting point is 00:00:19 Yeah, happy to be here. You and I got a chance to discuss Airflow over on TalkPython a couple months ago, something like that. Yeah. Yeah. But now we'll probably do a little more Airflow over here for people who are unfamiliar with that, but also just whatever you're interested in. So yeah, great to have you here. Why don't you tell people a quick bit about yourself before we jump into the topics? Sure. So I'm Leah and I am a developer relations engineer in Google Cloud. And specifically, I work on Cloud Composer, which is our hosted managed product of the popular Apache Airflow project, which we'll talk about a little bit later. And in addition to writing samples and content for that, I also work with a group of fellow engineers, and we maintain all Python samples for Google Cloud and make sure that they stay tested, up to date and are healthy and are getting reviewed for new samples.
Starting point is 00:01:14 And that's a lot of fun. That kind of fell into my lap and has been a good time. Yeah. I remember Python being one of the original two supported languages on Google Cloud, right? It had sort of a special place. Yeah. Now it's one of seven, I think. Yeah. Cool. Well, that sounds like such a fun job. I've always imagined DevRelations type of jobs to be super fun. Maybe slightly less fun in COVID because the travel and
Starting point is 00:01:36 the conferences and all those kinds of things are part of it, but still a fun job, right? Still a good time. Every day is a little bit different. You kind of never know what's going to happen, and that's part of what I like about it. Yeah, awesome. Oh, cool. Brian, I don't even know what you're going to cover, so I don't know what's going to happen. Why don't you let us know? You don't know what I'm going to cover? Well, I'm not looking at my docs yet. Oh, okay. Sorry, fighting a cold. I am super excited: pytest 7 release candidate 1 is out. Oh, that's excellent. That's big news. It is. The last release for pytest was 6. They've done the other dot releases, but the 6.2 or 6.0 came out, or 6.2, I don't know, I lost track. We use 6.2.4 for our GCP samples. Oh, you do? We do. Well, I think it was, I wrote this down.
Starting point is 00:02:27 The 6.2.0 was released in December 2020. So it's been, we're ready for a new one. So 7.0 is out, the release candidate at least. And so because it's a release candidate, to install it, you have to do pip install pytest==7.0.0rc1. We've got that in the show notes. It's also on the release announcement page for pytest. But I wanted to go through some of the cool features
Starting point is 00:02:56 that I'm really excited about. There's a lot of great things in there. There's some little improvements with the approx thing. So one of the things that pytest has is approx. So floating point numbers, if you're comparing them, you should never do equal, but you can do equal approx with pytest, and it's really pretty cool. That's cool. I didn't know that, because in any science you're doing, double equals is the kiss
Starting point is 00:03:21 of death for floating point math comparison. Yeah. Well, for pytest approx, the docs now reference the NumPy comparisons, which is nice because NumPy has some really cool features around that. But pytest does it out of the box. And now, also within mappings and dicts and others, it handles decimal types, which is nice. Decimal types, of course, are very useful when working with money and other things that need to be exact decimals. One of the things that's really cool is the sequences are compared better. So if you have like a list of numbers and you compare against an approximate list of other numbers, I didn't know
Starting point is 00:04:05 you could do this. It will tell you which index was wrong. And actually not by how much, but what the expected was. And that's pretty neat. So those are the little minor features. Most of these are kind of minor, but major for somebody, right? So one of the things I like is, some people have mentioned fixtures, or sometimes when people use a lot of fixtures, they don't know where the fixtures are. Well, there's a couple of flags, --fixtures-per-test and --fixtures. Both those flags are helpful to find out what fixtures you have available. And now, by default, they print the location of the path along with the fixture name
Starting point is 00:04:46 and you can also do a verbose option that prints out the full docstring, which is pretty handy. A couple of things that I'm really excited about: pythonpath, that's been added, and that was a feature I added to the project, which is fun. Nice. It's cool to see the contributions you're making coming back out. Yeah, it's cool. And then there's a bunch of other features that I contributed to by just saying, this is a little weird, can we fix this? And somebody else volunteered to fix it. So it's nice. That's the best kind of contribution. Yeah. One of the improvements in the docs, which is kind of fun, is there's an auto-generated list of, so I've got the changelog going on here, and I gotta come back to this, there's an auto-generated
Starting point is 00:05:31 list of plugins, and there's 963 right now. We'll refresh it. Nope, still 963. But that's a lot. When I first started writing the beta, or the second edition, of the pytest book, I noticed this and I wrote it down, but the number keeps changing. So I took out the number. I'm like, it's a lot. It's a lot of cool plugins. One of the things that you'll notice when you go to the changelog is it starts with breaking changes and then deprecations. And I think this is because people, when they upgrade, want to know if it's going to break their code or not. I have tested a bunch of stuff and upgraded from 6 to 7 and I haven't noticed a lot. There was, like, a 6.1 to
Starting point is 00:06:15 6.2, I can't remember what, there was one break a while ago in the 6.x that messed up some plugin authors, but I haven't noticed any problems. So please try these out. I wish they would do the features first and not the breaking changes. I suspect it's the people working deep in the guts, like the plugin authors, that hit these deprecations and not just people doing assert this equals that type of work. Yeah, right.
Starting point is 00:06:40 One of the things that I didn't list, but I think a lot of people are excited about: more of the objects within pytest that people are using are type hinted now, so that you can do type hints with those objects. Oh, that's nice. Yeah. That's really nice. So, fun.
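The approx behavior Brian describes can be tried with a few small tests. This is a minimal sketch, assuming pytest is installed; the test names and numbers are made up for illustration, and only `pytest.approx` itself comes from the discussion above.

```python
import pytest


def test_float_sum():
    # Plain equality fails because of binary floating point rounding.
    assert 0.1 + 0.2 != 0.3
    # approx compares within a small tolerance instead.
    assert 0.1 + 0.2 == pytest.approx(0.3)


def test_sequence():
    # approx also accepts a list; on failure pytest reports the mismatched index.
    assert [0.1 + 0.2, 0.2 + 0.4] == pytest.approx([0.3, 0.6])


def test_mapping():
    # Dict values are compared key by key.
    assert {"total": 0.1 + 0.2} == pytest.approx({"total": 0.3})
```

Saved in a file such as `test_approx.py` (a hypothetical name), these would normally be collected and run by the `pytest` command.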
Starting point is 00:06:57 Leah, do you use pytest? Are some of these changes exciting? We do. We use pytest on our Python samples. And actually, the one that was most exciting to me was the fixtures. Figuring out where fixtures are is definitely something that comes into play for me, especially when we're maintaining something that was written a while ago by someone who might not be working on that code anymore. Yeah, yeah, nice. Yeah, this is great. I love the pip-installable RC1. That's great. And before we move on, just taking a step back,
Starting point is 00:07:28 Roman Right, author of Beanie. Hey, Roman, out there in the audience says, hey, I'm a big fan of Google Cloud. Oh, thank you. For sure. Well, I've got some fun stuff to talk about next year. I want to talk about this thing that David Smith, former guest co-host here on Python Bytes,
Starting point is 00:07:44 sent over and said this looks cool. Sam Lau and Philip Guo released this thing called Pandas Tutor. This is cool. Yeah, previously Philip had built Python Tutor at pythontutor.com. Now there's pandastutor.com, and it's all about just helping you understand what the code does. So it basically says, look, there's this code here. Like imagine you've got a list of dogs that have a breed, a type, where a type is like a herding dog or a toy dog that goes in a purse, longevity, size, weight, and so on. And you've got that as a data frame. If you wrote dogs where the size == medium, then sort values on type,
Starting point is 00:08:27 then group by type, and then show the median. Well, what is that actually doing? Like, how do I understand that, right? As somebody learning pandas, imagine I don't really have a database background. And so I'm not sort of trying to map that over to like, okay, there's the where clause, there's the order by clause, and you know, like that kind of business, right? So what is happening when I write that code, either because I'm coming across it for the first time, or, which happens to me a lot, I wrote it two years ago and understood it perfectly then, and I have no idea what it does now. Yeah, that happens way too often, right? So what you do is you can go and run this code over in Pandas Tutor and you say visualize, it says running
Starting point is 00:09:06 code, please wait. And so what they do is they put a CSV bit of text in here, it's like a triple-quoted string, and then use pandas to read it, and then just do that one line. So that's a nice way to kind of get data in there. And the way to think about this is steps. It shows you what is the first step and what is the second step and so on. So when you go, you'll see that it has the code that we were talking about. But then right now, the effective where clause, the filter, is regular font, and the rest is gray, it's like faded into the background. And so you can actually see what the starting data frame was, and the ending data frame, and then how it got in there. And you can use the mouse over like, so what they're saying is the type is medium. So if you hover over like
Starting point is 00:09:44 a large or a small dog, there's just no arrow. But if you hover over medium, it shows you where in the result that that thing landed. Isn't that cool? That's wild. Isn't that wild? And so then you can see size has all the values on the left and then the size is grouped on the right and it shows medium, medium, medium, medium, because that's all that's in there. Now, when I first looked at this, I'm like, there's a bunch of stuff on the screen. What's going on? I noticed the arrows, but then what it took me a minute to realize is there's multiple steps. So the next thing, if you scroll down shows the same code at the top, but now the sort values
Starting point is 00:10:16 type is highlighted, right? That's the next part of what looks like one expression in pandas. And so now it highlights the column that it's sorting on. And you can actually see the arrows pointing to how they were reordered in the result because you're sorting by type. So it's non-sporting, non-sporting, non-sporting, non-sporting, and then sporting, sporting, and working, working, and so on. So that was step two. We have a group by, this one's interesting. It doesn't have arrows, it has colors. So the group by type, again, non-sporting, sporting, so on, you end up with these groups. Like here's a blue, a blue box of all the non-sporting dogs, the bulldog, the poodle, the French bulldog is so cute. Then you've got the golden retriever and the Labrador and the
Starting point is 00:10:55 boxer, right? So these are grouped into the colors. And then finally you do the median and it shows how those groups reduce down to statistics. Like the longevity of a non-sporting dog is less than a sporting dog apparently, but they're also lighter. So anyway, what do you all think? Oh my gosh, I love this. This is nice, right? I'm a very visual learner, so I really appreciate this. And especially if you're working with data that you kind of aren't sure what it does and or the code, like that's pretty incredible. I'm filing this away. It's going to go into my team's group chat pretty much as soon as we're done recording.
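The four steps walked through above (filter, sort, group, median) can be written out as one chained pandas expression. This is a sketch, assuming pandas is installed; the rows and numbers in the CSV text are made up for illustration and are not the dataset shown on pandastutor.com.

```python
import io

import pandas as pd

# Hypothetical sample data; the values are invented for this sketch.
csv_text = """\
breed,type,longevity,size,weight
Bulldog,non-sporting,6.0,medium,50
Poodle,non-sporting,12.0,medium,55
French Bulldog,non-sporting,9.0,medium,27
Golden Retriever,sporting,12.0,medium,60
Labrador Retriever,sporting,12.5,medium,67
Boxer,working,9.0,medium,66
Chihuahua,toy,16.5,small,5
Great Dane,working,7.0,large,140
"""

dogs = pd.read_csv(io.StringIO(csv_text))

result = (
    dogs[dogs["size"] == "medium"]          # step 1: keep only medium dogs
    .sort_values("type")                    # step 2: order rows by type
    .groupby("type")[["longevity", "weight"]]  # step 3: one group per type
    .median()                               # step 4: reduce each group
)
print(result)
```

Selecting the two numeric columns before calling `median()` avoids asking pandas to take a median of text columns such as `breed`.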
Starting point is 00:11:29 In fact, yeah, it's awesome. I think it's really good. You know, there's so many people who are presented a notebook or presented some kind of result. And they're like, I need to understand what that means so I can keep following. And I think, you know, throw it into here or something like this would be really helpful.
Starting point is 00:11:42 Well, and for a lot of people that have spent a lot of time with databases, it might be obvious what these things do. But for people that don't spend a lot of time with SQL, it's not obvious. And so this is really nice. Yeah, definitely. Or if you're like trying to take some example that you have with their example data and trying to translate it to your own data. That's something that customers do all the time for us. It's something I do a lot, too. Just seeing how it behaves with your stuff. Oh, man. You didn't write it, but you want to use it.
Starting point is 00:12:15 So how much applies? Exactly. Yeah. Yeah. So this is quite cool. Dean out in the live stream. Hey, Dean. Says Pandas Tutor looks awesome.
Starting point is 00:12:25 And Robert Robertson also loving it. It's nice. So very cool. Indeed. All right. Over to you, Leah. All right. So yeah, my first thing today is Apache Airflow.
Starting point is 00:12:34 So Airflow is a project that is part of the Apache Software Foundation. It's a workflow orchestration tool that originated at Airbnb, I want to say in like 2014. And then pretty shortly after became part of the ASF. And it became a top level Apache project in, I would say early 2019. It's been a little while now, which is very exciting. So you can use it to author these workflows as directed acyclic graphs or DAGs of tasks, which is pretty cool. And it's most commonly used with workflows that are like pretty static, not super frequently changing or slowly changing, just so that you can see how the workflow goes over time, and that allows you some clarity and continuity in your workflows. I've always sort of wondered what the role of these workflow type systems were
Starting point is 00:13:27 until I realized, you know, if you're going to build a full end-to-end type of workflow without a framework, there's a lot of coordination. And what if this fails? Where do you restart? What do you do? And then the analogy for me is kind of like Flask or some web, like all I got to do is write this little thing
Starting point is 00:13:42 and everything else will come together to make sure these four lines of my Python code run, they run reliably, if they fail, it gets dealt with, right? It allows people to not have to understand the whole system and just go, I need you to load up this file and put it into that database. Can you write that code? And that's all you got to know to be part of some complex thing, right? Yeah, it's, I mean, it's not the most glamorous thing, but it is extremely useful. I mean, I did a summer internship when I was doing my bachelor's where I wrote a cron job that ingested some data every night. And the only way I knew if it failed was if I looked in the target folder where it's supposed to end up.
Starting point is 00:14:20 And if the data wasn't there, no files. Whoops. That sucked. I'm sure a lot of people have dealt with that. And this is actually like a really common Airflow workflow, which is the extract, transform, and load the ETL workflow, which is where you have data somewhere that you want to get. You want to do something to it or maybe not.
Starting point is 00:14:40 Maybe you just want to extract and load it. And you want to put that result somewhere else, either locally or in the cloud for all of that. And Airflow lets you do all of that. And you can see the history of these jobs. There's a UI where you can see, did it fail? It has a helpful error message if it failed. It's not just, oh gosh, the data's not there.
Starting point is 00:15:00 What do I do? Yeah, you've got a really cool UI where it shows all the parts of the workflow running and, yeah, whether or not they finish successfully and stuff, right? Yeah. And it got a makeover fairly recently, so there's a lot of improvements. Yeah, that's super cool. Another thing maybe you could talk about really quick is the connectors. I don't remember if that's exactly the right terminology. There's a, yeah, there's a name for them. Tell people about that. That's also good to know. So these connectors that you're thinking of, I mean, we can use the word connector
Starting point is 00:15:27 to describe what it does. So there are these things called operators in Airflow and an operator executes a single task. And so that might be executing a bash script or executing a Python script. But we also have these connectors that are grouped by providers, which might be your cloud provider
Starting point is 00:15:45 or other software providers that allow you to execute code there. So for example, we have a ton of GCP operators. One example might allow you to create a Dataproc cluster or then like run a job on that Dataproc cluster and maybe tear it down when you're done. And we, there are providers that have operators for all the major clouds and more. You can do, there's one that like sends a Slack message when it's done. So it's, if you can dream it, it might be there.
Starting point is 00:16:16 And if not, you can make it there. That's awesome. What's GCP? GCP is Google cloud platform or GCP might be a dated acronym. Sorry. Don't know. Yeah.
Starting point is 00:16:30 Yeah. So one of the advantages, I think, of that that's really cool is you don't necessarily have to know all those APIs. Like if I was going to connect Slack to GCP to like Azure Blob Storage to like some hosted database, I don't have to learn all those things. I can just sort of click it together. Yeah, you just have to, there's a small amount of setup you have to do for auth, which is understandable.
Starting point is 00:16:50 You can't just like publicly go to your Azure blob thing to grab your data. But once you've set up that connection, then your operators can talk to those things. And if you use, so you can run or host Airflow yourself. And there are a few different ways to do that.
Starting point is 00:17:06 And then Amazon and Google both have managed hosted providers. And there's a company, Astronomer, that also does managed hosted ones. And so if you're in an Amazon or a Google, the advantage there is that the connections with those operators might be a little bit simpler from the auth and networking perspective. But other than that, you can still, like if you're running in Cloud Composer, which is Google's Airflow, you can still be using the Amazon
Starting point is 00:17:32 or the Microsoft operators to pull data from over there. That's really common. And you see it all the time and bring it, do some stuff in Google Cloud and either put it back in the other cloud or leave it in Google Cloud. That's totally normal. And people are doing that all the time. Right on. Yeah. Cool, cool. I think this is neat to people for whom that would make sense. You're like trying to do these sort
Starting point is 00:17:54 of running in the background, schedule jobs, or there's triggers as well. Like a file has been uploaded or landed here. Yeah. Let's talk about that. So that's actually, I had written down this one example, but I'll adapt it slightly since you mentioned triggers. So that's another common type of operator, these sensors where you wait for a certain condition to be true. And they're used in data analytics workflows all the time. So like one example workflow might be waiting for a particular file to appear in a cloud storage or an S3 bucket. So you'd use one of those sensors to wait for that to happen. And then you want to do something to that data. So let's say you then create a Dataproc cluster that is going to run a PySpark job on that cluster. And then you
Starting point is 00:18:38 can store the results in BigQuery at the end and then delete the cluster and like send a Slack message when the job is done. That's a very common ETL thing, including that sensor. Yeah, that sounds pretty nice. Yeah, definitely seems interesting and quite useful. Yeah. Thoughts before we move on? I have a question. If you wanted to get started with something like this, I was trying to look for tutorials and getting started and stuff like that.
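The example workflow Leah describes (wait for a file, create a cluster, run a PySpark job, store the results, delete the cluster, send a Slack message) is a chain of tasks with dependencies. The sketch below is not Airflow code and does not use Airflow's API; it is a plain-Python illustration of the DAG idea using the standard library's `graphlib` (Python 3.9+). The task names and the `run` helper are hypothetical, and the actions are no-op placeholders.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# The names are hypothetical labels for the steps described above.
workflow = {
    "wait_for_file": set(),
    "create_cluster": {"wait_for_file"},
    "run_pyspark_job": {"create_cluster"},
    "store_results": {"run_pyspark_job"},
    "delete_cluster": {"store_results"},
    "send_slack_message": {"delete_cluster"},
}


def run(workflow, actions):
    """Run tasks in dependency order and record a status for each one.

    Stops at the first failure, so the status log shows where a restart
    would have to begin instead of failing silently.
    """
    statuses = {}
    for task in TopologicalSorter(workflow).static_order():
        try:
            actions[task]()
            statuses[task] = "success"
        except Exception as exc:
            statuses[task] = f"failed: {exc}"
            break
    return statuses


# Placeholder actions; a real workflow would call cloud APIs here.
actions = {name: (lambda: None) for name in workflow}
statuses = run(workflow, actions)
print(statuses)
```

An orchestrator like Airflow adds scheduling, retries, sensors, and a UI on top of this basic idea of ordered tasks with recorded outcomes.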
Starting point is 00:19:05 Does it make sense? Or is it too confusing for somebody? You said you could run it on your own machine. Does that make sense to try it that way? Or should you try it with a... Okay. You totally can do it on your own machine. And there's this really wonderful environment that can be found in the Airflow repository that's called Breeze,
Starting point is 00:19:26 and it's a Dockerized version of it. It shouldn't be run in production, but if you're looking to try it out or if you're looking to contribute to Airflow, we highly recommend that everyone check out the Breeze environment. Right now, I have the community page pulled up where you can join the dev list and the Slack if you have questions. But if you were to go to the GitHub repo, you would see Breeze right on that first page. Okay, cool. Thanks. Yeah, great question. Thank you. Yeah, very good one. All right, Brian, are you gonna give us a tutorial on Airflow or what have we got going next? Yeah, so I was looking through the tutorials in Airflow and I noticed that right away one of the examples used dedent.
Starting point is 00:20:08 How about that for a connection? Nice connection. Totally well planned. Very cool. Dedent was suggested. It's a textwrap tool. It's suggested by Michael Rogers-Villet. It's a small utility, but it's super useful. And I kind of forget that it's... I mean, I use it all the time, but I forget to mention it to people. But it comes up a lot. And the idea around dedent is
Starting point is 00:20:32 you've got something, oh, I think I lost my dedent thing. See if I can find it. There it is. The idea is you've got a multi-line string, like here we've got hello world and some multiple lines and there's different spacing. But as you notice, I want to define it within a test function or within some other function. And so there's this extra space at the beginning that's in the string. It's in the multi-line string. And we don't want that. We want it to be just, like
Starting point is 00:21:06 nothing at the beginning or the same amount chopped off. So one of the options that people have used before is to just define a multi-line string outside of the function. You just do it outside of the function. Then it's just against the left side of your editor or whatever, in column zero. And you don't have to worry about it. But it does bother some people that you've got this variable defined outside of your function when you're just using it within one function. So dedent is the answer. So what dedent does is it just takes a multi-line string and strips off all the common white space at the beginning. That's it. But it's super useful. They've got a little example that we're showing here,
Starting point is 00:21:45 but I think this is not a great example. So I wrote a new example. Oops, fell asleep. And so the idea really is I've got a function that either prints stuff or has some output, and I want to be able to compare that string, and I want my comparison to be in the function. So I use dedent to just
Starting point is 00:22:07 write it right in my function, and then I don't have the spaces. And then, yeah, anyway, so this is a pytest example of how you could test an output string. So anyway. This really sounds like a classic example of, there's a problem, this really bothered me, and so I wrote something to fix it. And it's wonderful, like a time-honored open source reason to make something. But I also want to remind people that dedent is not the only thing in textwrap, and textwrap has a whole bunch of other cool tools. It's not huge. It's just about a five-minute read to peruse what's in textwrap, so that next time you need to manipulate some text, it's useful. So, nice. Yeah, maybe wrapping. Yeah, like wrapping. Well, it does things like, if you've
Starting point is 00:22:51 got a huge string and you want to be able to, like, one of the things is to shorten it. So if you've got a huge string but you really only have like eight characters to show something. Like ellipsize it. Yeah, it does that for you. Oh, that's nice. That's there too. That's good because I've written that code. It wasn't fun. Okay. It didn't feel useful either. I'm like, okay, great, it works.
Starting point is 00:23:11 But here we go. Some audience feedback. Anthony out there. Hey, Anthony says, it's really useful. Used it many times. Nice. Quite cool. Mm-hmm.
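Both textwrap helpers mentioned here are in the standard library. The sketch below is not Brian's example from the show notes; the `greeting` function and the test are hypothetical, written only to show `textwrap.dedent` keeping an expected multi-line string inside a test function, plus `textwrap.shorten` for the truncation case.

```python
import textwrap


def greeting():
    # Hypothetical function under test that returns multi-line output.
    return "Hello\n  World\n"


def test_greeting():
    # The expected string is indented to match the surrounding code.
    # dedent strips the whitespace common to every line, so the relative
    # indentation of "World" is preserved.
    expected = textwrap.dedent("""\
        Hello
          World
        """)
    assert greeting() == expected


# shorten collapses whitespace and truncates on word boundaries,
# appending the placeholder when text has to be dropped.
short = textwrap.shorten("Hello world!", width=10, placeholder="...")
print(short)
```

The backslash right after the opening triple quote keeps the string from starting with an empty line.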
Starting point is 00:23:20 All right. This next one comes to us from Dan Bader. You might know him from Real Python and other things. He and I were chatting and he said, hey, have you heard about pip-audit from Trail of Bits? And I was sure that I had, and I thought we had talked about it, but then I realized, no, I don't believe we have. So I must've just heard about it somewhere else and we haven't covered it before. So the idea is we've heard about a lot of issues with supply chain vulnerabilities, things getting into pip, but also RubyGems and npm and so on. Sometimes that's somebody trying to be evil and putting in some typosquatting thing. Or, you know, worse than that would be if the GitHub account of a maintainer got hacked and somebody published something to the real package, right? So however,
Starting point is 00:24:06 things might get into your dependencies. If something is going on bad there, it's better to know than to not know. So this pip audit is all about that. It audits Python environments as in virtual environments and dependency trees for known vulnerabilities. So that's one of the things that's interesting is when you pip install things, you might be very good about saying, oh, I pip installed Flask and I pip installed Pandas. So those are going into my requirements file or my pyproject.toml. But did you remember to pin their versions so that things like GitHub will say your version is wrong? Because if it just sees Flask and the recent version doesn't have a problem, it's not going to tell you. But the one you have installed may.
Starting point is 00:24:50 Also, the transitive closure of the dependencies. So Flask depends on itsdangerous, which depends on, I don't know. But if there's something down that chain that has a problem, you may have not put that in your requirements file and you may not be tracking it. I might be paying careful attention to Flask. I might not care anything about itsdangerous, but that's where the problem is, right? Yeah. So this tool from Trail of Bits, which is a security company, basically solves that problem.
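The pinning point can be checked mechanically before running any audit. The sketch below is not how pip-audit works and is not part of it; `unpinned_requirements` is a hypothetical helper that flags requirement lines lacking an exact `==` pin, and the package versions in the sample text are illustrative only. It deliberately ignores option lines such as `-r other.txt` and does not handle URLs containing `#`.

```python
import re

# A name, optional extras in brackets, then an exact "==" version pin.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;#]+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Illustrative requirements text; the versions are made up for the sketch.
example = """\
flask==2.0.1
pandas
itsdangerous>=2.0  # transitive dependency of Flask
"""

print(unpinned_requirements(example))
```

A check like this only covers what is listed in the file; dependencies that were never written down still need a tool that resolves the full tree.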
Starting point is 00:25:11 And it lets you just type pip-audit. And for me, it's -r requirements.txt or whatever. And from what I can tell, what it does is it will go create its own virtual environment where it one by one installs each package, looks at the things that come out of that process and then scans those. So it's not just looking at, oh, you say you have Flask and that's 2.0.1.
Starting point is 00:25:36 Great. You're good to go. It actually installs it because who knows what the setup.py process is doing and all those kinds of things. And then it scans that and it gives you a report. So for like the Talk Python training site, we have, I don't know, 30 dependencies or something. And it sat there and it took, I don't know, probably took two minutes to go through and it said, everything's good to go. So that was good to hear, but it's pretty neat, really easy to use. It's like an external tool, like black or something. So it's a very
Starting point is 00:26:04 good candidate for pipx. And then it's just globally available to point at any environment. What do you all think? Oh, this is so cool. I heard about it because one of my colleagues, Dustin Ingram, I think has been involved with it. Or either it's his Twitter that I found out about it from. But he also has a really good talk from PyCon this past year about the supply chain vulnerabilities. That's worth checking out if you're wanting to get an idea of why this is important. Yeah. Yeah. We've highlighted a few
Starting point is 00:26:32 examples over the years, but it's definitely something you want to pay attention to. And that's cool that Dustin was talking about it. He works, I think he's still working with the PyPA and works on the PyPI.org and all those kinds of things. So very cool. Warehouse. Brian, what do you think? I think this is cool. I'm going to start using it right away.
Starting point is 00:26:52 This is nice. Yeah. I already used it once as well, and everything seems good. So here, look, I even called it Flask as an example. So here on this particular version, there was this security vulnerability from 2019. And same with, I guess, Jinja and all those were good. But yeah, it gives you a nice description of what went wrong. And in this case, it's a denial of service attack and whatnot.
Starting point is 00:27:14 So I definitely recommend people pin versions, definitely, in your requirements. But what do you all think of including hashes? I think that's something Dustin talked about in his talk. And at the time, I was like, oh, that sounds like a good idea. And it's not something I've started doing yet. Yeah, exactly. That's exactly what I think. It sounds like a good idea.
Starting point is 00:27:35 And I'm not doing it yet. So anyway. But that sounds like it's a me problem more than anything else. Also, it seems like a good idea. You know, I might be missing a step. It feels like the challenge you're going to run into there, what you're preventing against
Starting point is 00:27:53 is a man in the middle attack. Somebody can intercept what's happening with PyPI.org and sneak in some kind of broken hacked version. I don't know. I don't necessarily trust what goes into PyPI.org, but I trust PyPI.org.
Starting point is 00:28:08 So I'm not super, it's not my biggest worry. There's like 10 other worries that make me have a hard time sleeping at night about running stuff on the internet that precedes that. So I haven't worried about it, but maybe I should. It's in the queue of things to worry about. Well, for instance, with this audit, you can,
Starting point is 00:28:26 you can pin your stuff and then have it check every once in a while, install everything and check things. I don't see why it couldn't be a CI step. I was actually just going to say that pip-audit,
Starting point is 00:28:39 I need to bring it to my, like, samples maintaining group to talk about who wants to implement it and how soon we're going to do it. And whose pager rings when it finds a problem. Yes. Yeah, look at that, pagers from back in the day. All right.
Starting point is 00:28:52 Well, that's all I got for that one. We're off to Leah. I'm so glad you mentioned pinning requirements because that's actually a great segue for managing samples for GCP. So what I have open right now for Google Cloud is an example documentation page. I picked Cloud Composer because it's what I work on. And I want to give an example of where this code lives that I'm talking about that I work with this group to maintain.
Starting point is 00:29:19 So this is a page that's about using a particular Airflow operator. And if you were to scroll on it, you will see these code samples. And they are all stored in GitHub and then embedded in our docs. So you can click view on GitHub on any one of them, and it will take you to the linked repository. You can look at the history, look at everything in context. So we have thousands of samples for all of the Google Cloud products just for Python. But we have them in other languages too. And they're located across hundreds of repos. This happens to be one repo that has samples for multiple products, but we have other repos where things are stored too. So to ensure that there's consistency and that my group
Starting point is 00:30:07 of engineers, my colleagues and I actually have time to do our work and function as humans outside of work too, we use a lot of automation. So we use a lot of bots to do things like keep our dependencies up to date, check for license headers, auto-assign PRs for reviewing, syncing repositories with centralized configurations, and even more, which is pretty great. And this is actually where the pinning requirements comes in. We very strongly believe in pinning requirements because it makes the samples easier to maintain and test against. And it's easier to go back to the product and say, hey, you just pushed a release candidate for your product and it broke your samples. It wasn't supposed to. What gives? Rather than finding out mysteriously when getting a customer issue. So then to keep it up to date,
Starting point is 00:30:59 we use a bot. And these are some pull requests recently opened by the bot for some dependencies. They get double-checked by a human to make sure everything looks good, and merged. It's pretty great. And then we actually have a team of engineers in DevRel that works on making GitHub bots that we use. And that is totally open source. So you can see some of the ones that we use. Like we have our license header one. The sync repo settings allows us to have a single source of truth for our configuration for all of our Python repos. And then it makes sure it gets synced across all of them. It's pretty great. I really don't know how I would function without all of my bot friends.
Starting point is 00:31:41 This is super cool. I can just imagine how much work it is to keep all of those different things in sync. And I have worked recently on projects where I'm like, okay, I got to integrate this library. I'm going to go to the documentation and I try to use the one or two functions that the whole thing does. And it's like, nope, that parameter doesn't exist. Or you're missing some print. You're like, come on, at least just keep the signature right, you know? And of course, it's something like star args, star star kwargs. It's not like, oh, I can
Starting point is 00:32:10 just look in my IDE and see, oh, yeah, it says it takes, like, use security, use SSL, yes or no. Like, no, it's unknown without the documentation, basically. Yeah. This is awesome. Thank you. I think so, too. I'm very grateful to it. And yeah, for our dependency bot, we do use an external one. I know, I think GitHub is the one that does Dependabot. We in particular use WhiteSource Renovate bot. It's what we were using when I started. And that works very well, too. And they're very nice and responsive to issues.
Starting point is 00:32:38 Oh, that's fantastic. Yeah. Yeah. Dependabot was fairly new and then it was bought quite recently by GitHub. So I can imagine you were all doing something before then, yeah? Probably. But I know I have friends who use that too and they're great. Using a dependency bot, I would say if you need a starter bot for any of them, the dependency bot is a great place to start. Yeah, that's fantastic. I recently switched to pip-tools and pip-compile to generate my requirements with pinned versions and stuff.
Starting point is 00:33:04 Nice. But before that, I was all about Dependabot telling me if something new was out and syncing that. Nice. Yeah, pip-tools rocks. Love pip-tools. Yeah, it definitely does. Brian, there's a lot of cool automation here. What do you think? I'm excited about looking through all these. I love looking at bots because the whole idea of a bot is like the Unix philosophy of do one thing and do it well. Yeah. And yeah, I love that. And have something else do it and not you do it.
Starting point is 00:33:32 Oh, yeah. All of our bots are based on like, oh, gosh, we're doing this one thing over and over and we're not doing it well because we're doing it manually. So how can we, like, use automation to make sure we're doing it consistently and to save a lot of time? Like one of the things you've got in here that's shown right now is label sync. So one of the interesting things about different groups' workflows is to have different labels that mean different things. But when you open a new repo, it doesn't have all those labels. So being able to sync those labels across an organization. Like needs triage, good first contribution, all those kinds of things, right?
Starting point is 00:34:12 Yeah. As I said, we have hundreds of repos just for Python. And we use things like we have labels that say what API something belongs to. And that helps with the auto assign bot to make sure that issues and PRs get routed to the right team. Otherwise you're having a human do all that triage, which is fine, but doesn't scale super well in our use case.
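To make the label-sync idea concrete, here is a minimal sketch of the comparison such a bot might perform: given the labels an organization wants everywhere and the labels a repo currently has, work out what to create and what to update. This is not the actual bot's implementation. The function name, the name-to-color mapping, and the sample labels are all assumptions.

```python
from typing import Dict, List, Tuple


def plan_label_sync(
    desired: Dict[str, str], existing: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare desired labels (name -> color) with a repo's existing ones.

    Returns (to_create, to_update), each sorted by label name. Labels that
    exist only in the repo are left alone in this sketch.
    """
    to_create = sorted(name for name in desired if name not in existing)
    to_update = sorted(
        name
        for name in desired
        if name in existing and existing[name] != desired[name]
    )
    return to_create, to_update


# Hypothetical organization-wide labels and one repo's current labels.
org_labels = {"needs triage": "d93f0b", "good first issue": "7057ff"}
repo_labels = {"good first issue": "ffffff", "bug": "d73a4a"}
print(plan_label_sync(org_labels, repo_labels))
```

Keeping the comparison as a pure function leaves the actual API calls to a separate step, which makes the plan easy to review before anything changes in a repo.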
Starting point is 00:34:36 Yeah. And adding a label to an issue or something is really easy. So having a bot that looks at label changes and just does an action based on that is a brilliant use of time. Yep.
Starting point is 00:34:48 Highly recommend. Yeah, fantastic. This is great. And you have an install link next to all of them. Does that mean I just click that and install it into one of my repos? I believe that is the intent. And if it doesn't work,
Starting point is 00:35:00 you should open an issue on this repo because my colleagues are very responsive. Fantastic. Now we just need bots to generate bots. Honestly, if my colleagues told me they were working on that in this repo, I wouldn't be surprised, but I don't know. The meta bot. Yeah.
Starting point is 00:35:15 Fantastic. All right. Well, how about some extras? Brian, you got anything extra you want to share while we're here before we call it a show? No, just I'm fighting a cold. Hopefully that'll all be over. Yeah, maybe some sort of audit thing. We'll check your health status.
Starting point is 00:35:31 We can run that against you. Leah, anything else you want to share with us? Oh, I mean, on Twitter earlier, we were talking about HTTP status codes, and it reminded me of still my forever reference for HTTP status codes is http.cat. Yes, http.cat is fantastic. It's so good. It is so good. Let me share a few non-funny things and then we'll mix that in with our joke. Please do.
Starting point is 00:35:55 Fantastic. All right. The first one has to do with, speaking of GitHub, another cool GitHub thing. You know, you could press a dot and that would do certain things. This only works if you're signed in. But now there's a command palette. This idea of command palettes is becoming popular in UIs. We've got it in VS Code.
Starting point is 00:36:14 We've got it in, like, Superhuman, the email. And often you get them by pressing Command-K or Control-K. And now you have that for GitHub. So if I were on a repo where I could do stuff to it, I could hit Command-K. And then it will say, what do you want to do? Search or jump to. I can go to pages, issues. I could look for, let's see, look for the app. If I just type "app", it'll search for those. I can search for all sorts of things here. And boom, it takes me and shows me all the apps. Isn't that cool? That's so cool.
Starting point is 00:36:45 Command palette. Yeah. That's now a thing. That's beautiful. And you can just, you, I mean, no mouse. I'm here. I'm in this repo, the top level, command K down arrow two times to enter. I'm on the issues.
Starting point is 00:36:56 Oh my gosh. Love to see it. Yeah. So that's a good one. The other one, the other extra is Python 3.10.1 is out, released December 6th. So as in two days ago. Wow. It's got a fun little snake with a hat on. Love it.
Starting point is 00:37:11 That's really about 3.10. So let me describe. I can cover the entire release for you. So Python 3.10.1 is the newest major release of the Python programming language. It contains many features and optimizations. So now you all know what's in it. It's very vague. Apparently it has 300 commits of changes and fixes. One thing I wanted to know: are there security updates? Yes or no. Like, should I install
Starting point is 00:37:39 this if I'm curious or should I install this now before tomorrow? Because someone's going to start poking around. I would love if it would say that. There's a great thing about the major features, but that's just 3.10, not the point release. So anyway, still good. Yeah, we've been having fun making all of our GCP samples, making sure they're 3.10 compatible,
Starting point is 00:38:00 It's all waiting for certain dependencies to be ready. But a lot of fun. Very exciting to see. Yeah, that's awesome. Well, you can look at the changelog. So if you look at the changelog, you can see 3.10.1 and stuff. I can.
Starting point is 00:38:15 Go up a little bit. Full changelog there, maybe? Yeah. Yeah, that's true. I can go to the changelog there and check that out. But having a security thing would be an idea. Yeah, just like. A TLDR. Yeah, exactly.
Starting point is 00:38:25 Exactly. Cool. Well, and also a lot of people didn't want to try 3.10 until we got one patch release. So now we have one patch release. So there's no excuse. So now it's, quote, safe. I have been running it for a day in production and it seems okay. Put it on one site to see if it would hang in there.
Starting point is 00:38:41 It seems fine. So we're all good. Yeah. All right. The samples that are using it are doing fine. They've had passing periodic builds for a while. Yeah, fantastic. All right.
Starting point is 00:38:48 Are you ready for some cat jokes later? Sorry. Yes. I mean, we started our conversation off today talking about cats. It's true. Before we hit record. So I feel like we should round that out.
Starting point is 00:38:59 Definitely. So first of all, httpstatuses.com is a fantastic place to go learn about the real meaning or the official meaning, let's say, of status codes. So for example, there's 100 Continue. And if you want details, you click on that and it actually pulls this all up. Even shows you, like, the enum in Python. Oh my gosh, I love that.
Starting point is 00:39:19 Isn't that cool? Yeah. It gives you the meaning, like 100 Continue: the initial part of a request has been received and has not yet been rejected by the server. The server intends to send a final response eventually. And so there's other ones like 200 OK, 201 Created. Let's see what else I should point out. 304, cache, Not Modified. 400 Bad Request. 404 Not Found. 403 Forbidden. 500 Internal Server Error. Yeah, 418 I'm a teapot.
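The Python enum mentioned here is `http.HTTPStatus` in the standard library. A small sketch of looking up the codes read out above follows. The `describe` helper and its output format are just for illustration, and 418 is only present in newer Python versions (3.9 and later), so unknown codes are handled rather than assumed.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return 'value NAME: phrase' for a status code, if Python knows it."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes missing from this Python version's enum end up here.
        return f"{code} (not in this Python version's HTTPStatus enum)"
    return f"{status.value} {status.name}: {status.phrase}"


for code in (100, 200, 201, 304, 400, 403, 404, 418, 500, 502):
    print(describe(code))
```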
Starting point is 00:39:49 Yeah. And 502 Bad Gateway. Okay. So let's do yours first, Leah. Please. I put out this joke and you said, this is good, but oh my goodness, cats. Yeah. So when I was doing my computer science degree, a friend shared with me http.cat when we were learning about HTTP status codes.
Starting point is 00:40:09 And if you go there, you will find one cat per HTTP status code representing what is going on. And I'm not going to lie to you, in my professional career, I still use it as a reference because it's my favorite one. And you can even, if you go to, like, http.cat slash 200, it returns a JPEG of a cat that's like, okay. Yeah, exactly. And you can do that for all of the status codes. 201, the cat has walked through some wet cement and that's 201 Created, for a footprint. Let's see what else we got in here. Some good ones. 404, not modified. 304, sorry. The 404, the cat is hiding under some wrapping.
Starting point is 00:40:50 Not found. Fantastic. Yeah, I love this. I had not heard about this and it's glorious. Thank you. Well, wait, is there a 418? There is. Of course there is.
Starting point is 00:40:59 I'm a teapot. A kitten in a teapot. Literally inside of a teapot. All right. So I saw this joke by Breen, who is John Breen, and thought, that's really funny. What he did is he put his own personal take on what status codes mean. And I thought they were hilarious. But I thought, let me take a shot at this as well, a little more Python focused.
Starting point is 00:41:22 So I linked my tweet. I put this set of colloquial meanings of the HTTP status codes. All right, you're all ready for this? Yeah, dude. So 200 is "What's up? All good." 201: "Hello, creator." 304, not modified or cached, is "Same old, same old." 403, permission denied: "Get off my lawn, kids." 404 is just, there's no message. It's just not there. That's the message, but it's just blank. 500 is "We're bad at APIs." Server error. 400 is "You're bad at APIs." Yes. Though the real cardinal sin of APIs is 200, but in the body there's a JSON that says error and a reason. Oof. 200, but with error text: "We're really bad at APIs."
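For the "200, but with error text" case, a client sometimes ends up defensively checking the body even when the status code says everything is fine. A minimal sketch follows, assuming a hypothetical payload shape of `{"error": ..., "reason": ...}`. Real APIs differ, so the field names here are placeholders.

```python
import json
from typing import Optional


def hidden_error(status_code: int, body: str) -> Optional[str]:
    """Return an error reason if a 2xx response body still reports an error.

    Assumes a hypothetical payload shape like {"error": ..., "reason": ...}.
    """
    if not 200 <= status_code < 300:
        return None  # a non-2xx status already signals the failure
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON, so nothing to inspect in this sketch
    if isinstance(payload, dict) and payload.get("error"):
        return str(payload.get("reason") or payload["error"])
    return None


print(hidden_error(200, '{"error": true, "reason": "quota exceeded"}'))
```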
Starting point is 00:42:10 Yeah. 502, we're bad at deployment or DevOps because part of the infrastructure can't get to the other part. And Brian's favorite, 418, is it already April again? Yeah. Because the reason is that was actually put into the spec as an April Fool's joke and they left it. I'm a teapot. I love that.
Starting point is 00:42:28 They left it. I do too. I do too. It's like a, like import this, just stuff. That's fun. That should just always be there.
Starting point is 00:42:36 Yeah. Yeah. What's the harm? What's the harm? Just leave it there. Anthony, in the live stream, has some feedback for you, Leah: "Status codes
Starting point is 00:42:45 using cats? Well, I never." I mean, where there's internet, there are cats, no? Oh, of course. Why we created
Starting point is 00:42:52 the internet in the first place is for cats. Exactly. All right. Well, I think that's it for our show. Brian, thanks for being here,
Starting point is 00:43:00 as always. Leah, thanks for joining us. Thanks for having me. Thanks for listening, everyone. Yeah, you bet. See y'all later. Bye.
