Python Bytes - #220 What, why, and where of friendly errors in Python

Episode Date: February 11, 2021

Topics covered in this episode:

- We Downloaded 10,000,000 Jupyter Notebooks From Github – This Is What We Learned
- pytest-pythonpath
- Thinking in Pandas
- Quickle
- what(), why(), where(), explain(), more() from the friendly-traceback console
- Bandit Extras
- Joke

See the full show notes for this episode on the website at pythonbytes.fm/220

Transcript
Discussion (0)
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 220, recorded February 10th, 2021. I'm Michael Kennedy. I'm Brian Okken. And we have a special guest, Hannah. Welcome. Hello.
Starting point is 00:00:14 Hannah Stepanek, welcome to the show. It is so great to have you here. Thank you. I'm happy to be here. Yeah, it's good to have you. It's so cool. The internet is a global place. We can have people from all over.
Starting point is 00:00:23 So we've decided to make it an all Portland show this time. We could do this in person, actually. Well, not really, because we can't go anywhere. But theoretically, geographically, anyway. Yeah. So all three of us are from Portland, Oregon. Very nice. Before we jump into the main topics, two quick things. One, this episode is brought to you by Datadog. Check them out at PythonBytes.fm slash Datadog. And Hannah, just want to give people a quick background on yourself. Yeah, so I'm Hannah. I have written a book, which is weird to say, about pandas. But I also just go around and give talks at various conferences on Python.
Starting point is 00:01:00 So yeah, I gave ReArchitecting a Legacy Codebase recently. That sounds interesting and challenging. Yeah. What was the legacy language? Was it Python or something? It was Python. It was a Flask web application, and then also the front end of it was Vue, like Vue.js. Oh yeah. So yeah, that's been a fun project. That was through work. As developers, you're pretty much always working with some form of legacy code. It just depends on how legacy it really is. Well, what could be cutting edge in one person's viewpoint might be super legacy in another, right? Like, it's Python 3.5, you wouldn't believe it. Right. Yeah, very cool. Well, it's great to have you here. I think maybe we'll start off with our first topic, which is sort of along the lines of the data science world, some tie-ins to your book. And of course, this one comes from JetBrains: We Downloaded 10 Million Jupyter Notebooks. I almost said 10,000. 10 million Jupyter Notebooks from GitHub.
Starting point is 00:02:09 Here's what we learned. So this is an article or analysis done by Elena Guzaharina. And yeah, pretty neat. So they went through and downloaded a whole bunch of these notebooks and just analyzed them. And there's many, many of them are publicly accessible. And a couple of years ago, there were 1.2 million Jupyter notebooks that were public. As of last October, it was eight times as many, 9.7 million notebooks available on GitHub.
Starting point is 00:02:36 That's crazy, right? Wow. Yeah. So this is a bunch of really nice pictures and interactive graphs and stuff. So I encourage people to go check out the webpage. So for example, one of the questions was, well, what language do you think is the most popular for data science just by judging
Starting point is 00:02:52 on the main language of the notebook? Hannah, you wanna take a guess? Oh yeah, Python, for sure, without a doubt. That's for sure. The second one, I'm pretty sure no one who's not seen this, there's no way they're going to guess it's Nan. We have, we have no idea.
Starting point is 00:03:10 Like I, we look, we can't tell what language this is in there. Um, but then the other contenders are R and Julia and often people say, oh yeah, well, Julia, maybe I should go to Julia from Python. Well, maybe, but that's not where the trends are. Like there's 60,000 versus 9 million, you know, as the ratio. I don't know what that number is, but it's a percent of a percent type of thing. Wow. They also talk about the Python 2 versus 3 growth or difference.
Starting point is 00:03:34 So in 2008, it was about 50% was Python 2. And in 2020, it's a Python 2 is down to 11%. And I was thinking about this 11%. Like, why do you guys think people, there's still 11 there um hanging around i mean i would guess speaking of legacy applications um probably it just hasn't been touched but um yeah yeah those are very likely the ones that were like the original 2016-17 ones that were not quite there uh they're still public right github doesn't get rid of them uh the other one is i was thinking, you know, a lot of people do work on Mac or maybe even on some Linux machines that just came at the time with Python 2. So they're just like, well, I'm not going to change anything.
Starting point is 00:04:13 I just need to view this thing. I don't have Python. Problem solved, right? They didn't know that there's more than one Python. There's a good breakdown of the different versions. Another thing that's interesting is looking at the different languages, not language, different libraries used during this. So like NumPy is by far the most likely used. And then a tie is Pandas and Matplotlib and then Scikit-learn and then OS actually for traversing stuff. And then there's a huge long tail. And they also talk about combinations like Pandas and NumPy are common and then Pandas and then like Seaborn, Scikit-learn, Pandas, NumPy, Matplotlib, and so on as a combo. And so that's really interesting, like what sets of tools data scientists are using. Yeah. And then another one is they looked at deep learning libraries,
Starting point is 00:04:53 and PyTorch seems to be crushing it in terms of growth, but not necessarily in terms of popularity. So it grew 1.3 times or 130%, whereas TensorFlow is more popular, but only grew 30% and so on. So there's a lot of these types of statistics in there. I think people will find interesting if they want to dive more into this ecosystem. You know, it's one thing to have survey and go fill out the survey, like ask people, what do you use? You know, what platform do you run on? Vue.js or Linux? Like, okay, well, that's not really a reasonable question, but I guess Vue.js, you know, like, but if you just go and look at what they're actually doing on places like github i think you can get a lot of insight yeah sure yeah i know i use um like i'll go to github pretty frequently like i work when i'm you know just like browsing
Starting point is 00:05:34 like i wonder how you do this thing or like what's the most common way to do this or yeah absolutely just look up like what's the most popular so it's a pretty good uh sign if a lot of people are using it it is one thing i should probably make more better use of is I know they started adding dependencies like, oh, if you go to Flask, it'll show you Flask is used in these other GitHub repos and stuff. Like you could find interesting little connections. I think, oh, this other project uses this cool library I know nothing about, but if they're using it, it's probably good. Yeah, for sure. Yeah. I love the dependency feature of looking who's using it. Yeah, absolutely. So Brian, you're going to cover something on testing this time?
Starting point is 00:06:09 Yeah. I wanted to bring up something we brought up before. So there's a project called PyTest Python Path, and it's just a little tiny plugin for PyTest. And we did cover it briefly in way back in episode 62. But at the time I brought it up as so okay, so the I brought it up as a way to, to, to just shim, like be able to have your test code, see your source code, but as just like a shortcut, like a stopgap until you actually put together like proper packaging for your source code. But the more I talk to real life people who are testing all sorts of software and hardware, even there's there that that's a simplistic view of the world. So thinking of everybody is working on on packages is is not real.'s applications for instance um that that they're never going to
Starting point is 00:07:07 set up hold their code together as a package and that's that's that's legitimate so if you have an application and your your source code is in your source directory and your test code is in your test directory um it's just your tests are just not going to be able to see your source code right off the bat right so what's more um tricky is depending on how you run it, they will or they won't. Yeah. Right. Right. If you say run it with PyCharm and you open up the whole thing and it can like put together
Starting point is 00:07:34 the paths, you're all good. But if you then just go into the directory and type PyTest, well, maybe not. It doesn't work. And it just confuses a lot of people. And so more and more, I'm recommending people to use this little plugin. And really, the big benefit is it gives you, it does a few things, but the biggie is just you can add a Python path setting within your PyTest Any file, and you stick your any file at the top of your project and then you just give it a relative path to where your source code is like source or
Starting point is 00:08:10 SRC or something else and then PyTest from then on will be able to see your source code. It's a really simple solution. It's just That's way better than what I do. I don't think it's a stopgap I think it's awesome. Yeah, I totally agree. What's way better than what I do. I don't think it's a stop gap. I think it's awesome.
Starting point is 00:08:25 So, yeah, I totally agree. What I do a lot of times is certain parts of my code. I'm like, this is going to get imported. So for me, the real tricky thing is Alembic, the database, database migration tool and the tests and the web app. And usually I can get the tests in the web app to work just fine running them directly. But for some reason, Alembic always seems to get weird, like working directories that don't line up in the same way so it can't import stuff so a lot of times i'll put at the top of some file you know go to the python path and add you know get the directory
Starting point is 00:08:55 name from dunder file and go to the parent add that to the python path and now it's going to work from then on basically and uh this seems like a nice one although it doesn't help me with olympic but still but it it might you might be able to add the olympic path right to it so yeah yeah for sure pretty cool so it's a yeah go ahead oh i was just gonna say yeah like this is something i like pretty much every time i set up a new project like i always have to screw with the python path i always like run it initially and then it's like oh can't find blah blah blah and i'm like oh here we go again but i usually always run my projects from docker though so i just you know hard code that stuff like just directly yeah once you get it set up yeah that's cool yeah um nice i dream of days when i can use docker again i have an m1 mac
Starting point is 00:09:42 and it's in super early, early beta stages. Oh no. Yeah, it's all good. I don't mind too much because I don't use it that much, but it's so cool. Brian, it says something about.pth, I'm guessing path files. Do you know anything about this? I have no idea what those are. Oh,.pth files. So there's, yeah, there's, there, there are a way to, I don't know a lot. I don't know the detail, the real big details, but it's, it's a way to have a, you can have a list of different paths within that file. And if you import it or don't import it, if you include it in your path, then Python, I think includes all of the contents into anyway, I'm actually, I'm blown smoke.
Starting point is 00:10:25 I don't know the details. Okay. Sorry. Yeah. But apparently you can have a little more control with ETH files, whatever those are. Very cool. Yeah. I don't know much about that either.
Starting point is 00:10:32 Yeah. Unfortunately. I mean, I've been using OS.path. So what do I know? All right. Speaking of what do I know? I could definitely learn more about pandas and that's one of your items here, Hannah. Yeah. definitely learn more about pandas and that's uh one of your items here hannah yeah so um i thought
Starting point is 00:10:46 uh maybe i just give like a little snippet of kind of like some of the stuff i talk about in the book um fantastic so yeah uh here we go uh so if we're looking at pandas in terms of like the dependency hierarchy um well and i guess I should start at the beginning. So what is pandas? If you're not familiar with it, it's a data analysis library for Python. So it's used for doing big data operations. And so like, if we look at the dependency hierarchy
Starting point is 00:11:18 of pandas, it kind of goes like pandas, which is dependent on NumPy, which deep down is dependent on this thing called BLOS, which is Basic Linear Algebra Subprograms. Right. And wasn't there something with BLOS and Windows and a Windows update and a certain version, I think recently? I can't remember.
Starting point is 00:11:35 I feel like there was some update that made that thing that wasn't working. Yeah, usually. A big challenge around NumPy and versioning and stuff to make it work. Yeah, usually the the blast library is built into your os already um and it just points at that but um if you're using um something like anaconda i think by default like it installs intel mlk um and uses that but yeah if you're using like linux or just like out of the box whatever's Windows, which is what it is if you like pip install it, then yeah, there could certainly be issues with like dependencies mismatches. Yeah. So, and I've like greatly simplified this, but in terms of kind of like the languages
Starting point is 00:12:19 and walking down that dependency hierarchy, you start out in Python with pandas. And then NumPy is partially Python and partially C. And then BLOS is pretty much always written in assembly. And if you don't know what assembly is, it's basically like a very, very, very, like probably the lowest level language you can program in. And it's essentially like CPU instructions for your processor. And so I've taken this just like basic example here and I'm going to kind of like roll with it. So if we're doing just like a basic addition in pandas, say like we have column A
Starting point is 00:12:57 and we want to add that with column B and like store it back into column C. Like a traditional linear algebra vector addition. Traditional vector math. So pandas, like if you, if you look at these operations, each, each of these like additions on a per row basis is independent, meaning like you could conceivably run like each of those additions for each row, like in in parallel like there's no reason why you have to go like row by row um right and that's essentially like what kind of like big data analysis libraries are like at their core is they they like understand this conceptually and try to parallelize things as
Starting point is 00:13:38 much as possible um and so that's kind of like the first like fundamental understanding that you have to have like when working with pandas is like you should be doing things in parallel as much as you can um which means understanding the api and understanding like which functions in the api will let you do things in parallel um so like if we're just not using pandas at all um say like we're just inventing our own sort of like technique for this like you might think well, each of these rows could be broken up into a thread. We could say thread one is going to run the first row addition, and then thread two is going to run the second row, et cetera. But you might find that we'll run into issues with this in terms of the GIL. The GIL is otherwise known as the global interpreter lock in python uh prevents us
Starting point is 00:14:26 from really like running a multi-threaded app uh operation like in parallel yeah basically python can run the rule is it can run one python opcode at a time yeah and that's it right it doesn't matter if you've got you know 16 cores 16 cores. It's one at a time. Yeah. Yeah. And this like is really terrible for, yeah, for like trying to do things in parallel. Right. So like that, that kind of use case is out like pandas and NumPy and all that stuff is not going to be able to use multi-threading. And so, and like, I just want to point out like Python,
Starting point is 00:15:08 like at its core has this like fundamental problem, which is why they went with the GIL. So like Python manages memory for you. And how it does that is it keeps track of references to know when to free up memory. So like when memory can be like completely destroyed and somebody else can use it essentially. And like that's something. Otherwise you've got to do stuff like Brian sometimes probably has to do with C and like free and all those things. Right. Yeah, exactly. Yeah. Yeah. So like C you have to do this with yourself with like Malik and free and all that stuff. But with Python, it does it for you. But that comes at a cost, which means like every single object in order to kind of like avoid this threading problem, they came up with the gill, which basically says you can only run one thread at a time
Starting point is 00:16:12 or like one opcode at a time, as you said. And attempts have been made to remove it. Like Larry Hastings has been working on something called the galectomy, the removal of the gill for a while. And the main problem is if you take it away, the way it works nowill for a while. And the main problem is, if you take it away, the way it works now is you have to do lock on all memory access, all variable access, which actually has a bigger hit than a lot of the benefits you would get, at least in the single threaded case. And I know Guido said, like, we really don't want to make changes to this if it's
Starting point is 00:16:39 going to mean slower single threaded Python, but probably not for a while. Yeah, yeah, yeah. And that is a big problem. So like, I mean, if generally what people use like instead of threads in Python is they use like multi-process and they spin up multiple Python processes, right? And like that truly kind of like achieves the parallelism. But anyways, I digress. Uh, so, um, so we can't use the gill, but what's interesting to note is when you're, uh, running NumPy at its very low level in C, like when you enter and look at the C files, it actually is not subject to the gill anymore because you're in C. Uh, and so you can potentially run, you know, multi-threaded things in C and call it from Python. But beyond that, if we look at BLOS, BLOS has built-in parallelization for hardware parallelization.
Starting point is 00:17:39 And how it does that is through vector registers. So if you're not familiar with like the architecture of CPUs and stuff, like at its core, you basically only have like, only can have a certain small set, maybe like three or four values in your CPU at any one time that you're running like ads and multiplies on. And like how that works is you load those values like into the CPU from memory. And that load can be quite time consuming. It's really just based on like how far away your memory is from your CPU at the end of the day,
Starting point is 00:18:16 like physically on your board. Right. Right. Is it in cache? Is it in? Yes. Yeah. And that's why we have caches.
Starting point is 00:18:21 So like caches are like memory that's closer to your CPU. Consequently, it's also smaller. But that's why we have caches. So caches are memory that's closer to your CPU. Consequently, it's also smaller. But that's how you can kind of, you might hear people say, oh, so-and-so wrote this really performant program and it utilizes the size of the cache or whatever. So basically, if you can load all of that data into your cache and run the operations on it
Starting point is 00:18:44 without ever having to go back out to memory, you can make all of that data into your cache and run the operations on it without ever having to go back out to memory, you can make a really fast program. Yeah, it could be like 100 times faster than regular memory. Yeah. Yeah. And so essentially, that's what Bloss is trying to do underneath and NumPy is they're trying to take this giant set of data and break it into chunks and load those chunks into your cache and operate on those chunks and then dump them back out to memory and load the
Starting point is 00:19:13 next trunk. Yeah, very cool. Thanks for pointing that out. I didn't realize that BLAST leveraged some of the OS native stuff, nor that it had special CPU instruction type optimizations. That's pretty cool. Yeah. Yeah. So it has, on top of the registers, it also has these things called vector registers, which actually can hold multiple values at a time in your CPU. So we could take this simple example of the addition, and we could actually,, we can't run those like per row calculations in parallel with threads. We can with vector registers. And the limitation there is that the memory has to be sequential when you load it in.
Starting point is 00:19:58 This is definitely at a level lower than I'm used to working at. How about you, Brian? But yeah, so anyways, this is just like kind of the stuff that I talk about to working at. How about you, Brian? But yeah, so anyways, this is just like kind of the stuff that I talk about in my book. It's not necessarily about like how to use pandas, but it's about like kind of like
Starting point is 00:20:14 what's going on underneath pandas. And then like once you kind of like build that foundation of understanding, like you can understand like better how pandas is working and like how to use it correctly and what all the various functions are doing. Fantastic. Yeah. So people can check out your book,
Starting point is 00:20:28 got a link to it in the show notes. So very nice. It's offering me the European, the Euro price, which is fine. I don't mind. So. Yeah. So like, I mean, it's on Amazon too. It's on a lot of different platforms, but I figured I'd just point directly to the publishers. Yeah, no, that's perfect. Perfect.
Starting point is 00:20:45 Quick comment. Roy Larson says, NumPy and Intel MKL cause issues sometimes, particularly on Windows, if something else in the system uses Intel MKL. Okay. Yeah. Interesting. I have no experience with that, but I can believe it. Intel has a lot of interesting stuff.
Starting point is 00:21:01 They even have a special Python compiled version, I think, for Intel. If you use potentially, I'm not sure they have some high performance version. Yeah. Yeah. Yeah, they do. Yeah. Nice. Also in Portland, you can keep it in Portland. There we go. Now, before we move on to the next item, let me tell you about our sponsor today. Thank you to Datadog. So they're sponsoring Datadog. And if you're having trouble visualizing latency, CPU, memory bottlenecks, things like that in your app, and you don't know why, you don't know where it's coming from or how to solve it, you can use Datadog to correlate logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python app. Plus, they have a continuous profiler that allows you to find the most resource consuming parts of your production
Starting point is 00:21:43 code all the time at any scale with minimal overhead. So you just point out your production server, run it, which is not normally something you want to do with diagnostic tools, but you can with their continuous profiler, which is pretty awesome. You'll be the hero that got that app back on track at your company.
Starting point is 00:21:56 Get started with a free trial at pythonbytes.fm slash datadog, or just click the link in your podcast player show notes. Now, I'm sure you all have heard that working with Pickle has all sorts of issues, right? Pickle is a way to say, take my Python thing, make a binary version of bits that looks like that Python thing so I can go do stuff with it, right?
Starting point is 00:22:16 That's generally got issues, not the least of which actually are around the security stuff. So to unpickle something, to deserialize it back is actually potentially running arbitrary code. So people could send you a pickle virus. I don't know what that is, like a bad, a rotten pickle or whatever. That wouldn't be good. So there's a, uh, a library I came across that solves a lot of the pickle problems.
Starting point is 00:22:39 It's supposed to be faster than pickle and it was cleverly named quickle. Neither of you heard of this thing? No. Yeah. It's cool. Right? So here's the deal. It's a fast serialization format for a subset of Python types. You can't pickle everything, but you can pickle like way more say than JSON. And the reasons they give to use it are it's fast. If you check out the benchmarks, I'll pull those up in a second. It's one of the fastest ways to serialize things in Python. It's safe, which is important. Unlike pickle, deserializing a user provided message does not allow arbitrary code execution. Hooray. That seemed like a minimum bar, like, oh, I got stuff off the internet. Let's try to execute that. What's that going to do? Oh, look, it's reading all my files. That's nice. All right. But also it's flexible
Starting point is 00:23:26 because it supports more types. And we'll also learn about a bunch of other libraries while we're at it here, which is kind of cool. A bunch of things I heard of like MSG pack or well, JSON, you may have heard of that. And the other main problem you get with some of these binary formats is you can end up where in a situation where you can't read something, if you make a change your code, like, so imagine I've, I've got a user object and I've pickled them and put them into a Redis cache. We upgrade our web app, which adds a new field to the user object. That stuff is still in cache. After we restart, we try to read it. Oh, that stuff isn't there anymore. You can't use your cache anymore. Everything's broken, et cetera, et cetera. So it has a concept of schema evolution, having
Starting point is 00:24:05 different versions of like history. So there's ways that older messages can be read without errors, which is pretty cool. Yeah, that's nice. Yeah, neat, huh? I'll pull up the benchmarks. There's actually a pretty cool little site here. It shows you some examples on how to use it. I mean, it's incredibly simple. It's like, dump this as a string, read this, you know, deserialize this. It's real simple. So, but there's quite an interesting analysis, a live analysis where you can click around and you can actually look at like load speed versus reads, like serialize versus deserialize speed, how much memory is used and things like that. And it compares against pickle tuples, protobuf, pickle itself,
Starting point is 00:24:41 OR JSON, MSG pack, quickle and quickle structs there's a lot of things i i mean i knew about two of those i think that's cool but these are all different ways and you can see uh like in all these pictures generally at least the top one where it's time shorter is better right so you can see if you go with their like quickle structs it's quick rule of thumb maybe four or five times faster than pickle which i presume is way faster than JSON, for example. You'll also see the memory size, which actually varies by about 50% across the different things. Also speed of load and a whole bunch of different objects
Starting point is 00:25:12 and so on. So yeah, you can come check out these analysis here. Let's see all the different libraries that we had. Yeah, I guess we read them all off basically there, but yeah, there's a bunch of different ways which are not pickle itself to do this kind of binary serialization which is pretty interesting i think it does proto buff that's pretty cool actually i want to try this out it looks neat yeah yeah
Starting point is 00:25:34 it looks really right and one of the things i was just looking at the source code i love that they use pytest to test this of course you should use pytest um but um the i can't believe i'm saying this but this would be the perfect package to test with a gherkin syntax don't you think because it's a pickle oh my gosh you've got to use the gherkin syntax so yeah you definitely should and roy uh threw out another one like uq foundation uh dill package uh deals with many of the same issues but because it's binary and has all the same uh sort of versioning challenges you might run into as well dill the dill package that's funny yeah pretty good pretty good all right so anyway like you know i'm kind of a fan of json these days i've had enough xml with custom namespaces in my life
Starting point is 00:26:22 that i really don't want to go down that path and XSLT and all that. But, you know, I've really shied away from these binary formats for a lot of these reasons here. But, you know, this might make me interested. If I was going to say, throw something into a cache, the whole point is put it in the cache,
Starting point is 00:26:36 get it back, read it fast. This might be decent. Yeah. Yeah. It definitely seems to address a lot of the concerns I have with pickle for sure. Yeah. And I don't, did I talk about the types somewhere in here? We have to, yeah, here's,
Starting point is 00:26:47 there's quite a list of types. You know, one's really nice date time. I can't do that with JSON. Why is I in the world? Doesn't JSON support some sort of time information? Oh, well, but you've got most of the fundamental types that you might run into. All right. So quick, give it a quick look. All rightrian what you got here um well i was actually uh reading a different article uh but uh the it came we i think we've talked about um friendly traceback it's a package that just sort of tries to make your tracebacks nicer but but well i didn't realize it had a console built in. So I was pretty blown away by this. So there's a, you know, it's not trivial to get set up.
Starting point is 00:27:30 It's not that terrible. But you have to start your own console, start the REPL, import friendly traceback, and then do friendly traceback start console. But at that point, you have just like the normal console but you have better tracebacks and then also you have all these different cool functions you can call like uh what uh what where why um and explain and more and basically if something goes wrong while you're playing with python you can interrogate it and ask like for more information and that's just pretty cool the the why is really great so if you have the one of the examples i saw before and i'm i think i might start using this when teaching people is uh we often have like exceptions like you assigned to none or
Starting point is 00:28:19 you assigned to something that can't be assigned or you, you, you didn't match up the bracket and the parentheses or something like that correctly. And you'll get like just syntax error and it'll point to the syntax error, but you might not know more. So you can just type why a W H Y with parentheses. Cause it's a, or yeah, because it's a function and it'll tell you why.
Starting point is 00:28:42 Why? It's like the great storytelling, right? The five whys of a bug. Yeah. The five Ws of a bug. Yep. You can say what, like to repeat what the error was. Why will tell you why that was an error.
Starting point is 00:28:57 And then specifically what you did wrong. And then where will show you if you've been asking all sorts of questions and you lost where the actual traceback was you can say where and it'll point to directly to it and i think this is going to be cool i think i'll use this when trying to teach especially kids but really just people new to python tracebacks can be very helpful for them like even i know like i sometimes have to look up like certain error messages that I'm not familiar with. So yeah, that would be super helpful. I could just do it right in the console.
Starting point is 00:29:28 Yeah, I totally agree. You're going to have to help me find a W that goes with this. But I want what would be effectively Google open-close privacy. You know, because so often you get this huge trace back and you've got these errors. And if you go through and you select it, like, for example, the error you see on the screen, unbound local error, local variable greetings in quotes referenced before assignments. Well, the quotes means oftentimes in search, like it must have the word greeting. And that's the one thing that is not relevant to the Googling of it.
Starting point is 00:30:00 Right. So if I'm a beginner and I even try to Google that, I might get a really wrong message. So if you could say, Google this in a way that is most likely going to find the error, but without carrying through like variable details, file name details, but just the essence of the error, that would be fantastic. Now, how do we say that with W? You could just say, whoa. Or maybe WWW. you just say whoa or or maybe www or or wtf i mean come on there's a lot of wtf but wouldn't that be great and so that's also part of this package you see um at their main site where you've got these really cool uh like visualized stuff right where it's sort of more
Starting point is 00:30:42 tries to tell you the problem of the error with the help text and whatnot. Yeah. Yeah. This is cool. Also uses rich, which is a cool library we talked about as well. I love rich. I include rich in everything now, even just, just to print out simple, better tables. It's great. Yeah, for sure. Hannah, do you see yourself using this or is it, are you more, more in notebooks? Oh no. I, I mean, I usually use like the PDB debugger. So yeah, I mean, I'm not sure if this as it is would be like a problem. It would depend on how much information it has about like obscure errors from dependent libraries, which is usually what I end up looking at these days. But yeah, I mean, conceivably, yeah, that could be helpful.
Starting point is 00:31:25 Yeah, if we get that WTF feature added, then it's gonna go. Oh yeah, for sure, gosh. Speaking of errors, let's cover your last item, last item of the show. Woo-hoo, yeah. So I, at work, work in the security org and I write automation tools tools for them which means
Starting point is 00:31:46 sometimes the repos that we work on get to be like test subjects for new like requirements and such and so recently our org was exploring like static code analysis, looking for like security vulnerabilities in the code. And so I ran across Bandit and I integrated Bandit into our... We don't have time to go through these old legacy code and fix these problems. Oh, wait, this is what it means? Oh, sorry. Yes, we can do that right now. That's the kind of report you got from Bandit? Yeah, exactly. So yeah, we integrated Bandit into our legacy code base. And we actually, it's funny you say that
Starting point is 00:32:29 because the bug that I found using Bandit was actually from the legacy code. That does not surprise me. Yeah. So it was a pretty stupid error. It was pretty obvious if you were doing code review, but because it was legacy code and it was a pretty stupid like error. Like it was pretty obvious, like if you were doing code review, but because it was legacy code and it was like already there, I just like never noticed.
Starting point is 00:32:52 But it was basically like issuing like a request with like no verify. So it was like an unverified like HTTP request. And Bandit was like, no. This, this, this broken SSL certificate keeps breaking it. I just told
Starting point is 00:33:05 to ignore it oh yeah yeah well and i honestly like i think that might have been why it was there in the first place because i i know like the oh like several years ago like had some certificate issues so yeah that might be and it was it was like an internal talking to internal so it was like maybe even a self-signed certificate that nothing trusted, but they get technically there. Yeah. It was like, we'll just,
Starting point is 00:33:29 we'll just do that. But yeah. So bandit is basically like, like a linter, but it looks for security issues. So you could just like pip install it and then just like run it on your code and it will find a bunch of different potential security issues like just by like statically analyzing your code.
Starting point is 00:33:48 And I've pretty much like come to the opinion that like, why haven't I done this on all of my other projects? Like I should be doing this on every single project. Like, cause you know, like as like a developer, I always run like Lint and Black and stuff like that. So I figured, you know, I as, as like a developer, I always run like lint and black and stuff like that. Um, so I figured, you know, I should probably be running bandit too.
Starting point is 00:34:10 Yeah. Cool. Yeah. Well, very nice. That's a good recommendation for people as well. And it's got a lot of cool, you can go and actually see the list of the things that it tests for and even has test plugins as well, which is pretty cool. Yeah.
Starting point is 00:34:21 Yeah. So you can like make your, make your own if you want. Um, and it has like all the common linter sort of like functionality like ignore these files or like ignore these rules or even you know like ignore this rule on this particular line stuff like that yeah absolutely which is pretty sweet i love that things like bandit are around because um thankfully developing web stuff is becoming easier and easier but it's then now the barrier to to entry is lower you still have to have all the security concerns that you had before that normal i mean usually people were just had more experience but they would make mistakes anyway but now i think
Starting point is 00:35:00 this is one of the reasons why i love this is because people new to it might be terrified about the security part, but having a bandit on there looking over their shoulder is great. Yeah. Yeah. Like don't publish with the debug setting on and or anything like that. Simple, obvious stuff. And like, honestly, like having worked in the security org for about a year now, like I've come to the understanding that a lot of security issues stem from just like basic, like, duh, sort of misconfigurations. So like something like this is perfect. And I really, really like that you added, you wrote in the show notes, some pre commit, how to hook this up with pre commit,mit because I think having it in pre-commit or in a CI pipeline is important because like you guys were joking about often security problems come in because somebody's just trying to fix something that broke yeah but they don't really realize how many other
Starting point is 00:35:56 things it affects so yeah yeah besides down just we got to make it work quick just just turn on the debug thing we'll just look real quick and then you forget to turn it off or whatever. Yeah, for sure. Yeah. Yeah. Just stupid human errors. Nice. All right. I want to go back real quick, Brian, because you're mentioning a friendly trace back.
Starting point is 00:36:15 Got a lot of stuff. So let me just do a quick audience reaction. Robert says, it is cool, Brian. John Sheehan says, I was just thinking of something the same would be cool as a great teaching concept. Anthony says, I was just thinking of something the same would be cool. It's a great teaching concept. Anthony says, super useful. John says, I've been doing more demo code in the console rather than ID. And this looks like it would help.
Starting point is 00:36:33 W how to fix it. W wow how. I love it, Robert. Very good. Zach says, what is this magic? This looks amazing. And so on. All right.
Starting point is 00:36:44 Well, thanks, everyone. I'm glad you all like that. So that's it for our main items. Brian, you got any extras you want to throw out there? You were doing some of the climate change? Or what are you doing this week? Yeah, I'm sharing a room with some people. Just a sec.
Starting point is 00:37:01 I did do two meetups with Noah and then with the Aberdeen Python meetup I gotta interrupt you really quick did all that talk that Hannah did about bandit viruses get you? I just I'm sorry
Starting point is 00:37:20 sorry about that did all this talk with Hannah that Hannah had about viruses and in hacking and stuff with Bandit, did it make you nervous and you had to put on your mask? No, I just, I'm in a group meeting in their group room and somebody came in. It's okay. I'm just teasing. Carry on.
Starting point is 00:37:39 That's funny. I also wanted to look like a Bandit. Yeah, exactly. But I was thrilled that Noah asked me to speak to them. That was funny. I also wanted to look like a bandit. Yeah, exactly. But I was thrilled that Noah asked me to speak to them. That was neat. And then the Python Aberdeen people. But they mentioned that Ian from the Python Aberdeen group said that he had an arrangement with you, Michael,
Starting point is 00:37:59 that when the pandemic is over, you're going to go over and you're going to do a whiskey tour or something like that. I don't know the details, but it sounds good to me already. If that happens, I want to go along. It's a Python bites out and let's do it. And then we have a, uh, there are PDX West meetup tomorrow. You're going to speak. That's kind of exciting. Yeah. It's going to be fun and people people, it's virtual, so people can attend however. I'm also, I've got feedback from both you and Matt Harrison gave me some feedback. So I'm updating my training page on testing code. So because I really like working with teams, so and anybody else wants to give me feedback on my training page, maybe I'd love to hear it. So yeah, or maybe they even want to have some high-test training for their team.
Starting point is 00:38:46 Yeah, I mean, testing is something that I think teaching a team at a time is a great thing because people can really, I don't know, we can talk about their particular problems, not general problems. It's good. Yeah, for sure. Well, you also need more of a team buy-in on testing, right?
Starting point is 00:39:01 Because if one person writes code and won't write the test, and another person is like really concerned about making the test fast, it's super frustrating. And the person who doesn't wanna run the tests keeps breaking the build. But like, you know, anyway,
Starting point is 00:39:13 it's a team sort of sport in that regard. Yep. Yeah. All right, awesome. So I got a couple of quick things. Pep 634, structural pattern matching in Python has been accepted for Python 3.10. That's like, imagine a switch case that has about a hundred different options that's what it is yeah with like like reg x not quite but sort
Starting point is 00:39:32 of like style like you can have like these patterns and stuff that happen in the cases i don't know how to feel about this like if if uh let me put a perspective like if the walrus operator was controversial like this is like this is like a way bigger change to the language. So I didn't know it. It's both awesome and terrifying. Yes, exactly. Yeah, I was going to say, I'm kind of surprised. Yeah, yeah.
Starting point is 00:39:52 So my Hannah, like this got accepted. It seemed to be sort of counter to the simplicity of Python. Like I did not at all against having a simple switch statement that does certain things, but this seems like a lot. I may come to love it. One thing that maybe would help me come to a better understanding
Starting point is 00:40:06 and acceptance was if the pep page had at least one example of it in use. Like the whole page that talks about all the details says, I don't believe there's a single code sample ever. Well, there's a tutorial page as well. Oh, is there? There's the tutorial page. Okay, maybe that's where I should be going to check it out, yeah. But it still sort of feels like a five-barrel foot gun.
Starting point is 00:40:24 Yeah, it does. Well, but the page that of feels like a five barrel foot gun. Yeah, it does. Well, but the page that I'm looking like the thing that I'm listening to the official pep, I don't think it has, uh, does it have a tour?
Starting point is 00:40:32 Yeah, no, you're right. It does. It does. Um, somewhere down. Yeah.
Starting point is 00:40:35 Pep six 36. Yeah. It's a different pep. That is the tutorial for the pep. Interesting. I didn't realize that it's kind of meta. Honestly. Anyway,
Starting point is 00:40:42 I, to me, I'm a little surprised. It was accepted. Fine. Um, I know people worked really hard on it and congratulations. A lot of people really want it. It's kind of meta, honestly. Anyway, to me, I'm a little surprised it was accepted. Fine. I know people worked really hard on it. And congratulations.
Starting point is 00:40:48 A lot of people really want it. It comes from Haskell. So Haskell had this pattern matching alternate struct thing. I don't know. I just feel like Haskell and Python are far away from each other. So that's my first impression. I will probably come to love it at some point. PyCon registration is open. So if you want to go to PyCon, you want to attend and be more part of it than just like
Starting point is 00:41:05 watching the live stream on YouTube, be part of that. I think I'm going to try to make a conscious effort to attend the virtual conference, not just catch some videos. So, uh, you can do that. PyCon is awesome. Like just, I, my first conference was PyCon and then I went to other conferences and I was like, what are wrong with these conferences? Like, why do they suck so much?
Starting point is 00:41:25 I know, I feel the same way. I know. It's really, really special. I'm sure the virtual one will be good. I can't wait for the in-person stuff to come back because it really is an experience. For sure. Yeah, it's a whole nother experience in person.
Starting point is 00:41:37 I consider it basically my geek holiday where I get away and just get to hang out with my geek friends. I happen to learn stuff while I'm there. Totally. And then Python Web Comp is coming up, and registration is open for that as well. And I suppose probably PyCascades, which Brian and I are on a panel at there as well.
Starting point is 00:41:54 Oh, nice. I put a link into an hour of code for Minecraft, which has to do with programming Minecraft with Python. If people are looking to teach kids stuff, that looks pretty neat. My daughter's super into Minecraft. I don't do anything with it. But if you are and you want to make it part of your curriculum, that's pretty cool. Hannah, anything you want to throw out there before we break out the joke? Nope. I'm good. Awesome. Do it. All right. So this one, we have something a little more interactive for everyone. We've got a song about PEP8, about writing clean code. This is
Starting point is 00:42:25 written and produced sung by Leon Sandoy. It goes by Lemon. Him and his team over at Python Discord, he runs Python Discord and apparently it was a team effort creating this. And the reason I'm covering it is a bunch of people sent it over. So Michael Rogers Valet sent it over. So you should cover
Starting point is 00:42:41 this. Dan Bader said, check this out. Alan McElroy said, hey, check out this thing. So, all right. I actually spoke to Lemon and said, hey, do you mind if we play this? He said, no, that'd be awesome. Give us a shout out, of course. So we're going to actually play the song as part of this. In the live stream, you get the video.
Starting point is 00:42:56 On the audio, you get, well, audio. So I'm going to kick this off and we'll come back. And I'd love to hear Brian and Hannah's thoughts. Here we go. You don't need any curly braces Just four spaces, just four spaces Wildcard imports should be avoided In most cases, in most cases Try to make sure there's no trailing white space It's confusing, it's confusing. Trailing commas go behind list items. Get blamed, titans. Get blamed, titans. And comments are important as long as they're maintained
Starting point is 00:44:05 When comments are misleading, it will drive people insane Just try to be empathic, just try to be a friend It's really not that hard, just adhere to to Pepeade Pepeade Constance should be named all capital letters and live forever live forever
Starting point is 00:44:44 and camel cases not for Python Never ever, never ever And never use a bare exception Be specific, be specific No one likes the horizontal scroll bar Keep it succinct, keep it succinct And comments are important as long as they're maintained When comments are misleading
Starting point is 00:45:25 It will drive people insane Just try to be empathic Just try to be a friend It's really not that hard Just adhere to Pepaid Pepaid Peppate. Peppate. Peppate. Peppate. That was amazing. I can sympathize with so much of what he's saying.
Starting point is 00:46:14 I'm just having flashbacks to a discussion I had with my teammate about comments. And being like, no, this comment doesn't actually describe what the code is doing it's worse than having no comment it really is it really is yeah or like if it describes like literally what the code is doing and not like exactly you know kind of like high level background or anything other than the why the why is important yeah i love it so two things lemon and team well done on the song and man you got a great voice that's actually it was beautiful and funny yeah it was amazing all right well brian we probably should wrap it up yeah yeah all right well hannah thanks so much for being here it's good to have you on the show and brian thanks as always everyone
Starting point is 00:47:00 thanks for listening thanks bye bye thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
