Python Bytes - #220 What, why, and where of friendly errors in Python
Episode Date: February 11, 2021Topics covered in this episode: We Downloaded 10,000,000 Jupyter Notebooks From Github – This Is What We Learned pytest-pythonpath Thinking in Pandas Quickle what(), why(), where(), explain(), mo...re() from friendly-traceback console Bandit Extras Joke See the full show notes for this episode on the website at pythonbytes.fm/220
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly
to your earbuds.
This is episode 220, recorded February 10th, 2021.
I'm Michael Kennedy.
I'm Brian Okken.
And we have a special guest, Hannah.
Welcome.
Hello.
Hannah Stepanek, welcome to the show.
It is so great to have you here.
Thank you.
I'm happy to be here.
Yeah, it's good to have you.
It's so cool.
The internet is a global place.
We can have people from all over.
So we've decided to make it an all Portland show this time. We could do this in person, actually. Well,
not really, because we can't go anywhere. But theoretically, geographically, anyway.
Yeah. So all three of us are from Portland, Oregon. Very nice. Before we jump into the
main topics, two quick things. One, this episode is brought to you by Datadog. Check them out at
PythonBytes.fm slash Datadog. And Hannah, just want to give people a quick background on yourself.
Yeah, so I'm Hannah.
I have written a book, which is weird to say, about pandas.
But I also just go around and give talks at various conferences on Python.
So yeah, I gave ReArchitecting a Legacy Codebase recently.
That sounds interesting and challenging. Yeah. What was the legacy language? Was it Python or something? It was,
it was Python. It was like a Flask web application. And then also the front end of it was Vue, like Vue.js.
Oh yeah.
So yeah, that's been a fun project that was through work. As developers, you're pretty much always working with some form of legacy code. It just depends on how legacy it really is.
Well, what could be cutting edge in one person's viewpoint might be super legacy in another, right? Like it's Python 3.5, you wouldn't believe it.
Right. Yeah, very cool. Well, it's great to have you here. I think maybe we'll start off with our first topic, which is sort of along the lines of the data science world, some tie-ins to your book. And of course, it comes from JetBrains: We Downloaded 10 Million Jupyter Notebooks.
I almost said 10,000.
10 million Jupyter Notebooks from GitHub.
Here's what we learned.
So this is an article or analysis done by Elena Guzaharina.
And yeah, pretty neat.
So they went through and downloaded a whole bunch
of these notebooks and just analyzed them.
And many, many of them are publicly accessible.
And a couple of years ago, there were 1.2 million Jupyter notebooks that were public.
As of last October, it was eight times as many, 9.7 million notebooks available on GitHub.
That's crazy, right?
Wow.
Yeah.
So this is a bunch of really nice pictures and interactive graphs and stuff.
So I encourage people to go check out the webpage.
So for example, one of the questions was,
well, what language do you think is the most popular
for data science just by judging
on the main language of the notebook?
Hannah, you wanna take a guess?
Oh yeah, Python, for sure, without a doubt.
That's for sure.
The second one, and I'm pretty sure no one who hasn't seen this is going to guess it: it's NaN.
We have no idea. We looked, and we can't tell what language those notebooks are in.
But then the other contenders are R and Julia, and often people say, oh yeah, well,
Julia, maybe I should go to Julia from Python.
Well, maybe, but that's not where the trends are.
Like there's 60,000 versus 9 million, you know, as the ratio.
I don't know what that number is, but it's a percent of a percent type of thing.
Wow.
They also talk about the Python 2 versus 3 growth or difference.
So in 2018, about 50% was Python 2.
And in 2020, Python 2 is down to 11%.
And I was thinking about this 11%. Like, why do you guys think there's still 11% hanging around?
I mean, I would guess, speaking of legacy applications, it probably just hasn't been touched.
Yeah, those are very likely the original 2016-17 ones that never quite got migrated. They're still public, right? GitHub doesn't get rid of them. The other thing I was thinking: you know, a lot of people work on a Mac or maybe even on some Linux machine that just came with Python 2 at the time. So they're just like, well, I'm not going to change anything. I just need to run this thing, and I've got Python. Problem solved, right? They didn't know that there's more than one Python.
There's a good breakdown of the different versions.
Another thing that's interesting is looking at the different libraries, not languages, used in these notebooks. NumPy is by far the most commonly used, then it's roughly a tie between Pandas and Matplotlib, then scikit-learn, and then os, actually, for
traversing stuff. And then there's a huge long tail. And they also talk about combinations like
Pandas and NumPy are common and then Pandas and then like Seaborn, Scikit-learn, Pandas,
NumPy, Matplotlib,
and so on as a combo. And so that's really interesting, like what sets of tools data scientists are using. Yeah. And then another one is they looked at deep learning libraries,
and PyTorch seems to be crushing it in terms of growth, but not necessarily in terms of popularity.
So it grew 1.3 times or 130%, whereas TensorFlow is more popular, but only grew 30% and so on. So there's
a lot of these types of statistics in there that I think people will find interesting if they want to dive more into this ecosystem. You know, it's one thing to have a survey and ask people to fill it out: what do you use, what platform do you run on, Vue.js or Linux? Okay, that's not really a reasonable question. But if you just go and look at what people are actually doing in places like GitHub, I think you can get a lot of insight.
Yeah, for sure. I know I'll go to GitHub pretty frequently when I'm just browsing, like, I wonder how you do this thing, or what's the most common way to do this, and just look up what's most popular. It's a pretty good sign if a lot of people are using it.
One thing I should probably make better use of is that I know they started adding
dependencies like, oh, if you go to Flask, it'll show you Flask is used in these other GitHub repos
and stuff. Like you could find interesting little connections. I think, oh, this other project uses
this cool library I know nothing about, but if they're using it, it's probably good.
Yeah, for sure.
Yeah. I love the dependency feature of looking who's using it. Yeah, absolutely. So Brian, you're going to cover something on testing this time?
Yeah. I wanted to bring up something we brought up before. So there's a project called pytest-pythonpath, and it's just a little tiny plugin for pytest. We did cover it briefly way back in episode 62.
But at the time I brought it up as a way to just shim things, to be able to have your test code see your source code, as a shortcut, a stopgap until you actually put together proper packaging for your source code. But the more I talk to real-life people who are testing all sorts of software, and hardware even, the more I see that that's a simplistic view of the world. Thinking that everybody is working on packages just isn't realistic. There are applications, for instance, where they're never going to set up their code as a package, and that's legitimate. So if you have an application and your source code is in your source directory and your test code is in your test directory, your tests are just not going to be able to see your source code right off the bat, right?
And what's more tricky is that, depending on how you run it, they will or they won't.
Yeah.
Right.
Right.
If you say run it with PyCharm and you open up the whole thing and it can like put together
the paths, you're all good.
But if you then just go into the directory and type PyTest, well, maybe not.
It doesn't work.
And it just confuses a lot of people.
And so more and more, I'm recommending people to use this little plugin.
And really, it does a few things, but the big benefit is you can add a python path setting within your pytest ini file. You stick that ini file at the top of your project and give it a relative path to where your source code is, like source or src or something else, and then pytest from then on will be able to see your source code. It's a really simple solution.
That's way better than what I do. I don't think it's a stopgap; I think it's awesome.
So, yeah, I totally agree.
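For listeners who want to try what Brian describes, a minimal sketch might look like the following. The setting name python_paths and the src/ layout are assumptions here, so double-check them against the pytest-pythonpath README for your version.

```
# pytest.ini at the top of the project, with application code assumed to live in src/
[pytest]
python_paths = src
```

With that in place, running pytest from the project root (or anywhere under it) should let tests import modules from src/ without any packaging or sys.path hacks.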
What I do a lot of times is certain parts of my code.
I'm like, this is going to get imported.
So for me, the real tricky thing is Alembic, the database migration tool, plus the tests and the web app. And usually I can get the tests and the web app to work just fine running them directly. But for some reason, Alembic always seems to get weird, like working directories that don't line up in the same way, so it can't import stuff. So a lot of times I'll put something at the top of some file: go to the Python path, get the directory name from dunder file, go up to the parent, and add that to the Python path, and now it's going to work from then on, basically. And this seems like a nice one, although it doesn't help me with Alembic. But it might; you might be able to add the Alembic path right to it.
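As a rough sketch of the sys.path workaround Michael is describing here, not his exact code, just the general shape of it:

```python
import sys
from pathlib import Path

# Add the project root (the parent of this file's directory) to the import
# path so sibling packages resolve no matter where the process is launched from.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
```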
Yeah, for sure. Pretty cool. So, yeah, go ahead.
Oh, I was just going to say, this is something where pretty much every time I set up a new project, I always have to screw with the Python path. I always run it initially and then it's like, oh, can't find blah blah blah, and I'm like, oh, here we go again. But I usually run my projects from Docker, though, so I just hard-code that stuff directly.
Yeah, once you get it set up.
Yeah, that's cool.
Nice. I dream of days when I can use Docker again. I have an M1 Mac, and Docker support there is in super early beta stages.
Oh no.
Yeah, it's all good. I don't mind too much because I don't use it that much, but it's so cool.
Brian, it says something about .pth, I'm guessing path files. Do you know anything about this?
I have no idea what those are. Oh, .pth files.
So, yeah, they are a way to... I don't know the real big details, but it's a way to have a list of different paths within that file, and if you include it in your path, then Python, I think, includes all of the contents. Anyway, I'm actually blowing smoke here; I don't know the details.
Okay.
Sorry.
Yeah.
But apparently you can have a little more control with .pth files, whatever those are.
Very cool.
Yeah.
I don't know much about that either.
Yeah.
Unfortunately.
I mean, I've been using OS.path.
So what do I know?
All right.
Speaking of what do I know?
I could definitely learn more about pandas and that's one of your items here, Hannah.
Yeah, so I thought maybe I'd just give a little snippet of some of the stuff I talk about in the book.
Fantastic.
So yeah, here we go. If we're looking at pandas in terms of the dependency hierarchy... well, I guess I should start at the beginning.
So what is pandas?
If you're not familiar with it,
it's a data analysis library for Python.
So it's used for doing big data operations.
And so like, if we look at the dependency hierarchy
of pandas, it kind of goes like pandas,
which is dependent on NumPy,
which deep down is dependent on this thing called BLAS,
which is Basic Linear Algebra Subprograms.
Right.
And wasn't there something with BLAS and Windows and a Windows update and a certain version,
I think recently?
I can't remember.
I feel like there was some update that made that thing that wasn't working.
Yeah, usually.
A big challenge around NumPy and versioning and stuff to make it work.
Yeah, usually the BLAS library is built into your OS already, and it just points at that. But if you're using something like Anaconda, I think by default it installs Intel MKL and uses that. But yeah, if you're using Linux, or just out-of-the-box whatever's on Windows, which is what you get if you pip install it, then yeah, there could certainly be issues with dependency mismatches.
Yeah. So, and I've like greatly simplified this, but in terms of kind of like the languages
and walking down that dependency hierarchy, you start out in Python with pandas.
And then NumPy is partially Python and partially C.
And then BLAS is pretty much always written in assembly.
And if you don't know what assembly is, it's basically like a very, very, very, like probably the lowest level language you can program in.
And it's essentially like CPU instructions for your processor. And so I've taken this just like basic example here
and I'm going to kind of like roll with it.
So if we're doing just like a basic addition in pandas,
say like we have column A
and we want to add that with column B
and like store it back into column C.
Like a traditional linear algebra vector addition.
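For reference, the column addition being described here looks like this in pandas; a minimal sketch with made-up column names, plus a deliberately slow row-by-row version for contrast:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Vectorized: the whole columns are handed down to NumPy's C code in one call,
# which is where the low-level parallelism discussed below can kick in.
df["c"] = df["a"] + df["b"]

# Row by row in pure Python: same result, but every addition is a separate
# interpreted step, so none of that low-level machinery applies.
df["c_slow"] = [row.a + row.b for row in df.itertuples()]
```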
Traditional vector math. So in pandas, if you look at these operations, each of these additions on a per-row basis is independent, meaning you could conceivably run each of those additions for each row in parallel; there's no reason you have to go row by row.
Right.
And that's essentially what big data analysis libraries are at their core: they understand this conceptually and try to parallelize things as much as possible. So that's kind of the first fundamental understanding you have to have when working with pandas: you should be doing things in parallel as much as you can, which means understanding the API and which functions in the API will let you do things in parallel. So if we're not using pandas at all, say we're just inventing our own sort of technique for this, you might think, well, each of these rows could be broken up into a thread.
We could say thread one is going to run the first row addition, and then thread two is going to run
the second row, et cetera. But you might find that we'll run into issues with this in terms of the
GIL. The GIL, otherwise known as the global interpreter lock in Python, prevents us from really running a multi-threaded operation in parallel.
Yeah, basically the rule is Python can run one Python opcode at a time, and that's it, right? It doesn't matter if you've got 16 cores. It's one at a time.
Yeah.
Yeah.
And this is really terrible for trying to do things in parallel.
Right. So that kind of use case is out; pandas and NumPy and all that stuff is not going to be able to use multi-threading.
And I just want to point out, Python at its core has this fundamental problem, which is why they went with the GIL. Python manages memory for you, and how it does that is it keeps track of references to know when to free up memory, when memory can be completely destroyed and somebody else can use it, essentially.
Otherwise you've got to do stuff like Brian sometimes probably has to do with C, with free and all those things.
Right, yeah, exactly. In C you have to do this yourself with malloc and free and all that stuff. But with Python, it does it for you. That comes at a cost, and to avoid threading problems with every single object's reference counting, they came up with the GIL, which basically says you can only run one thread at a time, or one opcode at a time, as you said.
And attempts have been made to remove it. Larry Hastings has been working on something called the Gilectomy, the removal of the GIL, for a while. And the main problem is, if you take it away, the way it works now, you have to lock on all memory access, all variable access, which actually has a bigger hit than a lot of the benefits you would get, at least in the single-threaded case. And I know Guido said we really don't want to make changes to this if it's going to mean slower single-threaded Python. So probably not for a while.
Yeah, yeah. And that is a big problem. So generally, what people use instead of threads in Python is multiprocessing: they spin up multiple Python processes, and that truly achieves the parallelism. But anyways, I digress. So we can't use threads because of the GIL, but what's interesting to
note is, when you're running NumPy at its very low level in C, when you go in and look at the C files, it's actually not subject to the GIL anymore, because you're in C. And so you can potentially run multi-threaded things in C and call them from Python. Beyond that, if we look at BLAS, BLAS has built-in hardware parallelization.
And how it does that is through vector registers. So if you're not familiar with the architecture of CPUs and stuff: at its core, you can basically only have a certain small set, maybe three or four values, in your CPU at any one time that you're running adds and multiplies on. And how that works is you load those values into the CPU from
memory.
And that load can be quite time consuming.
It's really just based on like how far away your memory is from your CPU at
the end of the day,
like physically on your board.
Right.
Right.
Is it in cache?
Is it in?
Yes.
Yeah.
And that's why we have caches.
So caches are memory that's closer to your CPU. Consequently, it's also smaller.
But that's how you can kind of,
you might hear people say,
oh, so-and-so wrote this really performant program
and it utilizes the size of the cache or whatever.
So basically, if you can load all of that data into your cache and run the operations on it without ever having to go back out to memory, you can make a really fast program.
Yeah, it could be like 100 times faster than regular memory.
Yeah.
Yeah.
And so essentially, that's what BLAS and NumPy are trying to do underneath: take this giant set of data, break it into chunks, load those chunks into your cache, operate on those chunks, and then dump them back out to memory and load the next chunk.
Yeah, very cool. Thanks for pointing that out. I didn't realize that BLAS leveraged
some of the OS native stuff, nor that it had special CPU instruction type optimizations. That's pretty cool. Yeah. Yeah. So it has, on top of the registers, it also has these things called vector registers,
which actually can hold multiple values at a time in your CPU. So we could take this simple example
of the addition, and while we can't run those per-row calculations in parallel with threads, we can with vector registers.
And the limitation there is that the memory
has to be sequential when you load it in.
This is definitely at a level lower
than I'm used to working at.
How about you, Brian?
But yeah, so anyways, this is just kind of the stuff that I talk about in my book.
It's not necessarily about like how to use pandas,
but it's about like kind of like
what's going on underneath pandas.
And then like once you kind of like
build that foundation of understanding,
like you can understand like better
how pandas is working
and like how to use it correctly
and what all the various functions are doing.
Fantastic. Yeah. So people can check out your book,
got a link to it in the show notes. So very nice.
It's offering me the European, the Euro price, which is fine.
I don't mind. So.
Yeah. So like, I mean, it's on Amazon too.
It's on a lot of different platforms,
but I figured I'd just point directly to the publishers.
Yeah, no, that's perfect.
Perfect.
Quick comment.
Roy Larson says, NumPy and Intel MKL cause issues sometimes,
particularly on Windows, if something else in the system uses Intel MKL.
Okay.
Yeah.
Interesting.
I have no experience with that, but I can believe it.
Intel has a lot of interesting stuff.
They even have a special compiled version of Python for Intel, I think. I'm not sure, but potentially some high-performance version.
Yeah. Yeah. Yeah, they do. Yeah. Nice. Also in Portland, you can keep it in Portland. There we
go. Now, before we move on to the next item, let me tell you about our sponsor today. Thank you to
Datadog. They're sponsoring this episode. And if you're having trouble visualizing latency, CPU,
memory bottlenecks, things like that in your app, and you don't know why, you don't know where it's
coming from or how to solve it, you can use Datadog to correlate logs and traces at the level of
individual requests, allowing you to quickly troubleshoot your Python app. Plus, they have a
continuous profiler that allows you to find the most resource consuming parts of your production
code all the time at any scale with minimal overhead.
So you just point it at your production server, run it,
which is not normally something you want to do
with diagnostic tools,
but you can with their continuous profiler,
which is pretty awesome.
You'll be the hero that got that app back on track
at your company.
Get started with a free trial
at pythonbytes.fm slash datadog,
or just click the link in your podcast player show notes.
Now, I'm sure you all have heard that working with Pickle
has all sorts of issues, right?
Pickle is a way to say, take my Python thing,
make a binary version of bits that looks like that Python thing
so I can go do stuff with it, right?
That's generally got issues,
not the least of which actually are around the security stuff.
So to unpickle something, to deserialize it back is actually potentially running arbitrary
code.
So people could send you a pickle virus.
I don't know what that is, like a bad, a rotten pickle or whatever.
That wouldn't be good.
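To make that danger concrete, here's the classic, deliberately harmless illustration of why calling pickle.loads on untrusted bytes is risky. The class is purely hypothetical, and the payload only prints a message rather than doing anything nasty:

```python
import pickle

class Sketchy:
    # pickle uses __reduce__ to learn how to rebuild the object; returning a
    # callable plus arguments means loads() will execute that call.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling!",))

payload = pickle.dumps(Sketchy())
pickle.loads(payload)  # prints the message: deserialization executed code
```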
So there's a library I came across that solves a lot of the pickle problems.
It's supposed to be faster than pickle and it was cleverly named quickle.
Neither of you heard of this thing?
No. Yeah, it's cool, right? So here's the deal: it's a fast serialization format for a subset of Python types. You can't serialize everything with it, but you can handle way more than with, say, JSON.
And the reasons they give to use it are it's fast. If you check out the benchmarks, I'll pull those
up in a second. It's one of the fastest ways to serialize things in Python. It's safe, which is important. Unlike
pickle, deserializing a user provided message does not allow arbitrary code execution. Hooray.
That seemed like a minimum bar, like, oh, I got stuff off the internet. Let's try to execute that.
What's that going to do? Oh, look, it's reading all my files. That's nice. All right. But also it's flexible
because it supports more types. And we'll also learn about a bunch of other libraries while we're
at it here, which is kind of cool. A bunch of things I'd heard of, like msgpack, or, well, JSON,
you may have heard of that. And the other main problem you get with some of these binary formats
is you can end up in a situation where you can't read something if you make a change to your code. So imagine I've got a user object and I've pickled it and put it
into a Redis cache. We upgrade our web app, which adds a new field to the user object. That stuff
is still in cache. After we restart, we try to read it. Oh, that stuff isn't there anymore. You
can't use your cache anymore. Everything's broken, et cetera, et cetera. So it has a concept of
schema evolution, having
different versions of like history. So there's ways that older messages can be read without
errors, which is pretty cool. Yeah, that's nice. Yeah, neat, huh? I'll pull up the benchmarks.
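A rough sketch of the basic usage Michael describes next, going by the project's docs at the time: the dumps and loads names mirror pickle, though the exact set of supported types is worth confirming against Quickle's README.

```python
import quickle
from datetime import datetime

record = {"user": "hannah", "roles": ["author", "speaker"],
          "recorded": datetime(2021, 2, 10)}

blob = quickle.dumps(record)    # bytes, much like pickle.dumps
restored = quickle.loads(blob)  # unlike pickle, loading can't run arbitrary code
```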
There's actually a pretty cool little site here. It shows you some examples on how to use it. I
mean, it's incredibly simple. It's like, dump this as a string, read this, you know, deserialize
this. It's real simple. So, but there's quite an
interesting analysis, a live analysis where you can click around and you can actually look at
like load speed versus reads, like serialize versus deserialize speed, how much memory is
used, and things like that. And it compares against pickled tuples, protobuf, pickle itself, orjson, msgpack, quickle, and quickle structs. There's a lot of things; I mean, I knew about two of those, I think. That's cool. But these are all different ways, and you can see in all these pictures, generally at least in the top one where it's time, shorter is better. So you can see if you go with their quickle structs, it's, quick rule of thumb, maybe four or five times faster than pickle, which I presume is way faster than JSON, for example.
You'll also see the memory size,
which actually varies by about 50% across the different things.
Also speed of load and a whole bunch of different objects
and so on.
So yeah, you can come check out these analysis here.
Let's see all the different libraries that we had.
Yeah, I guess we read them all off basically there,
but yeah, there's a bunch of different ways
which are not pickle itself
to do this kind of binary serialization, which is pretty interesting, I think.
It does protobuf. That's pretty cool, actually. I want to try this out. It looks neat.
Yeah, it looks really nice. And one of the things, I was just looking at the source code, I love that they use pytest to test this. Of course you should use pytest. But I can't believe I'm saying this: this would be the perfect package to test with a Gherkin syntax, don't you think? Because it's a pickle.
Oh my gosh, you've got to use the Gherkin syntax. So yeah, you definitely should. And Roy threw out another one, the UQ Foundation dill package, which deals with many of the same issues, but because it's binary, it has all the same sort of versioning challenges you might run into as well.
Dill, the dill package. That's funny. Yeah, pretty good.
All right, so anyway, you know, I'm kind of a fan of JSON these days. I've had enough XML with custom namespaces in my life that I really don't want to go down that path, and XSLT and all that.
But, you know, I've really shied away
from these binary formats
for a lot of these reasons here.
But, you know, this might make me interested.
If I was going to say,
throw something into a cache,
the whole point is put it in the cache,
get it back, read it fast.
This might be decent.
Yeah. Yeah.
It definitely seems to address
a lot of the concerns
I have with pickle for sure.
Yeah. And I don't,
did I talk about the types somewhere in here? Yeah, here: there's quite a list of types. One that's really nice is datetime. I can't do that with JSON. Why in the world doesn't JSON support some sort of time information? Oh, well. But you've got most of the fundamental types that you might run into. All right. So, Quickle,
give it a quick look.
All right, Brian, what have you got here?
Well, I was actually reading a different article, but it came up. I think we've talked about friendly-traceback before. It's a package that just sort of tries to make your tracebacks nicer. But I didn't realize it had a console built in. So I was pretty blown away by this.
So there's a, you know, it's not trivial to get set up.
It's not that terrible.
But you have to start your own console, start the REPL, import friendly traceback, and then do friendly traceback start console.
But at that point, you have just the normal console, except you have better tracebacks, and you also have all these different cool functions you can call, like what(), where(), why(), explain(), and more(). Basically, if something goes wrong while you're playing with Python, you can interrogate it and ask for more information, and that's just pretty cool. The why() is really great. One of the examples I saw before, and I think I might start using this when teaching people, is that we often have exceptions like you assigned to None, or you assigned to something that can't be assigned to, or you didn't match up the bracket and the parenthesis or something like that
correctly.
And you'll get like just syntax error and it'll point to the syntax error,
but you might not know more.
So you can just type why, W-H-Y, with parentheses,
because it's a function, and it'll tell you why.
Why?
It's like the great storytelling, right?
The five whys of a bug.
Yeah.
The five Ws of a bug.
Yep.
You can say what, like to repeat what the error was.
Why will tell you why that was an error.
And then specifically what you did wrong.
And then where(): if you've been asking all sorts of questions and you lost where the actual traceback was, you can say where() and it'll point directly to it. And I think this is going to be cool. I think I'll use this when trying to teach, especially kids, but really just people new to Python.
Tracebacks can be very helpful for them. Even I sometimes have to look up certain error messages that I'm not familiar with. So yeah, that would be super helpful. I could just do it right in the console.
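Putting Brian's description into concrete terms, a minimal sketch of starting the friendly console; the function names come from what's discussed here, but check the friendly-traceback docs for your installed version.

```python
# In a plain Python REPL:
import friendly_traceback
friendly_traceback.start_console()

# From inside the friendly console, after something blows up you can call:
#   what()     - restates which exception was raised
#   why()      - explains the likely cause in plain language
#   where()    - points back at the exact spot in the traceback
#   explain()  - the whole story in one go
```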
Yeah, I totally agree.
You're going to have to help me find a W that goes with this.
But I want what would be effectively Google open-close privacy.
You know, because so often you get this huge trace back and you've got these errors.
And if you go through and you select it, like, for example, the error you see on the screen,
unbound local error, local variable greetings in quotes referenced before assignments.
Well, the quotes means oftentimes in search, like it must have the word greeting.
And that's the one thing that is not relevant to the Googling of it.
Right.
So if I'm a beginner and I even try to Google that, I might get a really wrong message. So if you could say, Google this in a way that is most likely going to find the error,
but without carrying through like variable details, file name details, but just the
essence of the error, that would be fantastic. Now, how do we say that with W?
You could just say, whoa. Or maybe www(). Or wtf()... I mean, come on, there's a lot of WTF. But wouldn't that be great?
And that's also part of this package. You see at their main site where you've got this really cool visualized stuff, right, where it sort of tries to tell you the problem of the error with the help text and whatnot.
Yeah. This is cool. It also uses Rich, which is a cool library we talked about
as well. I love Rich. I include Rich in everything now, even just to print out simple, better tables. It's great.
Yeah, for sure. Hannah, do you see yourself using this, or are you more in notebooks?
Oh, no, I mean, I usually use the pdb debugger.
So yeah, I mean, I'm not sure if this as it is would be like a problem.
It would depend on how much information it has about like obscure errors from dependent
libraries, which is usually what I end up looking at these days.
But yeah, I mean, conceivably, yeah, that could be helpful.
Yeah, if we get that WTF feature added,
then it's gonna go.
Oh yeah, for sure, gosh.
Speaking of errors, let's cover your last item,
last item of the show.
Woo-hoo, yeah.
So at work, I work in the security org and I write automation tools for them, which means sometimes the repos that we work on get to be test subjects for new requirements and such. And so recently our org was exploring static code analysis, looking for security vulnerabilities in the code.
And so I ran across Bandit and I integrated Bandit into our...
We don't have time to go through this old legacy code and fix these problems. Oh,
wait, this is what it means? Oh, sorry. Yes, we can do that right now.
That's the kind of report you got from Bandit?
Yeah, exactly. So yeah, we integrated Bandit into our legacy code base.
And we actually, it's funny you say that
because the bug that I found using Bandit
was actually from the legacy code.
That does not surprise me.
Yeah.
So it was a pretty stupid error. It was pretty obvious if you were doing code review, but because it was legacy code and it was already there, I just never noticed. But it was basically issuing a request with no verify.
So it was like an unverified like HTTP request.
And Bandit was like,
no.
This broken SSL certificate keeps breaking it, so I just told it to ignore it.
Oh yeah. Well, and honestly, I think that might have been why it was there in the first place, because I know the org, several years ago, had some certificate issues. So yeah, that might be it. And it was internal talking to internal.
So it was maybe even a self-signed certificate that nothing trusted, but it technically got them there.
Yeah. It was like, we'll just do that. But yeah.
So bandit is basically like,
like a linter,
but it looks for security issues.
So you could just like pip install it and then just like run it on your
code and it will find a bunch of different potential security issues
like just by like statically analyzing your code.
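As a hedged sketch of the workflow Hannah describes: the file name and the requests call below are made-up examples of the kind of thing a scan flags, not her actual code.

```python
# insecure_fetch.py - the sort of legacy code a Bandit scan complains about
import requests

def fetch_internal(url):
    # Turning off certificate verification is exactly the kind of finding
    # a static security scan surfaces.
    return requests.get(url, verify=False)

# Typical usage from the shell:
#   pip install bandit
#   bandit -r src/
# Individual findings can be suppressed with a trailing "# nosec" comment.
```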
And I've pretty much like come to the opinion
that like, why haven't I done this
on all of my other projects?
Like I should be doing this on every single project.
Because, you know, as a developer, I always run lint and Black and stuff like that. So I figured I should probably be running Bandit too.
Yeah.
Cool.
Yeah.
Well, very nice.
That's a good recommendation for people as well.
And it's got a lot of cool stuff. You can go and actually see the list of the things that it tests for, and it even has test plugins as well, which is pretty cool.
Yeah. So you can make your own if you want. And it has all the common linter sort of functionality, like ignore these files, or ignore these rules, or even, you know, ignore this rule on this particular line, stuff like that.
Yeah, absolutely, which is pretty sweet. I love that things like Bandit are around, because thankfully developing web stuff is becoming easier and easier, but now the barrier to entry is lower and you still have all the security concerns that you had before. I mean, usually people just had more experience, and they would make mistakes anyway. But now, and this is one of the reasons I love this, people new to it might be terrified about the security part, but having Bandit there looking over their shoulder is great.
Yeah. Like don't publish with the debug setting on, or anything like that. Simple, obvious stuff.
And honestly, having worked in the security org for about a year now, I've come to the understanding that a lot of security issues stem from just basic, duh sort of misconfigurations. So something like this is perfect.
And I really like that you wrote in the show notes how to hook this up with pre-commit, because I think having it in pre-commit or in a CI pipeline is important. Because, like you guys were joking about, often security problems come in because somebody's just trying to fix something that broke, but they don't really realize how many other things it affects.
Yeah, yeah. It comes down to, we've got to make it work quick, just turn on the debug thing, we'll just look real quick, and then you forget to turn it off or whatever.
Yeah, for sure.
Yeah.
Yeah.
Just stupid human errors.
Nice.
All right. I want to go back real quick, Brian, because you were mentioning friendly-traceback.
Got a lot of stuff.
So let me just do a quick audience reaction.
Robert says, it is cool, Brian.
John Sheehan says, I was just thinking the same thing; it would be cool as a great teaching concept.
Anthony says, super useful.
John says, I've been doing more demo code in the console rather than ID.
And this looks like it would help.
W how to fix it.
W wow how.
I love it, Robert.
Very good.
Zach says, what is this magic?
This looks amazing.
And so on.
All right.
Well, thanks, everyone.
I'm glad you all like that.
So that's it for our main items.
Brian, you got any extras you want to throw out there?
You were doing some of the climate change?
Or what are you doing this week?
Yeah, I'm sharing a room with some people.
Just a sec.
I did do two meetups with Noah
and then with
the Aberdeen Python meetup
I've got to interrupt you really quick.
I'm sorry, sorry about that.
Did all this talk Hannah had about viruses and hacking and stuff with Bandit make you nervous, so you had to put on your mask?
No, I'm just in a group meeting in their group room and somebody came in.
It's okay.
I'm just teasing.
Carry on.
That's funny.
I also wanted to look like a Bandit.
Yeah, exactly.
But I was thrilled that Noah asked me to speak to them.
That was neat.
And then the Python Aberdeen people.
But they mentioned that Ian from the Python Aberdeen group
said that he had an arrangement with you, Michael,
that when the pandemic is over, you're going to go over
and you're going to do a whiskey tour or something like that.
I don't know the details, but it sounds good to me already.
If that happens, I want to go along. We'll make it a Python Bytes outing; let's do it. And then there's the PDX West meetup tomorrow. You're going to speak. That's kind of exciting.
Yeah, it's going to be fun, and it's virtual, so people can attend regardless. I've also got feedback from both you and Matt Harrison, so I'm updating my training page on testing code, because I really like working with teams. And if anybody else wants to give me feedback on my training page, I'd love to hear it. Or maybe they even want some pytest training for their team.
Yeah, I mean, testing is something that
I think teaching a team at a time
is a great thing because people can really,
I don't know, we can talk about their particular problems,
not general problems.
It's good.
Yeah, for sure.
Well, you also need more of a team buy-in on testing, right?
Because if one person writes code
and won't write the test,
and another person is like really concerned
about making the test fast,
it's super frustrating.
And the person who doesn't wanna run the tests
keeps breaking the build.
But like, you know, anyway,
it's a team sort of sport in that regard.
Yep. Yeah.
All right, awesome.
So I got a couple of quick things.
Pep 634, structural pattern matching in Python
has been accepted for Python 3.10.
That's like, imagine a switch-case that has about a hundred different options. That's what it is.
Yeah, with sort of a regex-like style. Not quite, but you can have these patterns and stuff that happen in the cases. I don't know how to feel about this. Let me put it in perspective: if the walrus operator was controversial, this is a way bigger change to the language. So, I don't know.
It's both awesome and terrifying.
Yes, exactly.
Yeah, I was going to say, I'm kind of surprised.
Yeah, yeah.
Same, Hannah. I'm a bit surprised this got accepted. It seemed to be sort of counter to the simplicity of Python. I'm not at all against having a simple switch statement that does certain things, but this seems like a lot.
I may come to love it.
One thing that maybe would help me
come to a better understanding
and acceptance was if the pep page had at least one example of it in use.
Like, the whole page that talks about all the details: I don't believe there's a single code sample anywhere.
Well, there's a tutorial page as well.
Oh, is there?
There's the tutorial page.
Okay, maybe that's where I should be going to check it out, yeah.
But it still sort of feels like a five-barrel foot gun.
Yeah, it does.
Well, the page that I'm looking at, the thing that I'm linking to, the official PEP, I don't think it has... does it have a tutorial?
Yeah, no, you're right. It does. Somewhere down... yeah. PEP 636.
Yeah, it's a different PEP. That is the tutorial for the PEP.
Interesting, I didn't realize that. It's kind of meta, honestly. Anyway, to me, I'm a little surprised it was accepted. Fine.
I know people worked really hard on it.
And congratulations.
A lot of people really want it.
It comes from Haskell.
So Haskell had this pattern matching alternate struct thing.
I don't know.
I just feel like Haskell and Python are far away from each other.
So that's my first impression.
I will probably come to love it at some point.
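For a taste of what the new syntax looks like, here is a small sketch in the spirit of the PEP 636 tutorial examples; it needs Python 3.10 or later, and the command grammar is made up for illustration.

```python
def handle(command: str) -> str:
    match command.split():
        case ["go", direction]:
            return f"moving {direction}"
        case ["drop", *items]:
            return f"dropping {', '.join(items)}"
        case ["quit" | "exit"]:
            return "bye"
        case _:
            return "sorry, I don't understand"
```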
PyCon registration is open. So if you want to go to PyCon, you want to attend and be more part of it than just like
watching the live stream on YouTube, be part of that.
I think I'm going to try to make a conscious effort to attend the virtual conference, not
just catch some videos.
So, uh, you can do that.
PyCon is awesome.
My first conference was PyCon, and then I went to other conferences and I was like, what is wrong with these conferences?
Like, why do they suck so much?
I know, I feel the same way.
I know.
It's really, really special.
I'm sure the virtual one will be good.
I can't wait for the in-person stuff to come back
because it really is an experience.
For sure.
Yeah, it's a whole nother experience in person.
I consider it basically my geek holiday
where I get away and just get to hang out with my geek friends.
I happen to learn stuff while I'm there.
Totally.
And then Python Web Conf is coming up,
and registration is open for that as well.
And I suppose probably PyCascades,
which Brian and I are on a panel at there as well.
Oh, nice.
I put a link into an hour of code for Minecraft,
which has to do with programming Minecraft with Python.
If people are looking to teach kids stuff, that looks pretty neat.
My daughter's super into Minecraft. I don't do anything with it. But if you are and
you want to make it part of your curriculum, that's pretty cool. Hannah, anything you want
to throw out there before we break out the joke? Nope. I'm good. Awesome. Do it. All right. So
this one, we have something a little more interactive for everyone. We've got a song about PEP8, about writing clean code. This is
written and produced
sung by Leon Sandoy.
He goes by Lemon.
Him and his team over at Python Discord, he runs
Python Discord and apparently it was a team
effort creating this. And the reason I'm covering it is
a bunch of people sent it over. So Michael
Rogers Valet sent it over. So you should cover
this. Dan Bader said, check this out. Alan
McElroy said, hey, check out this thing.
So, all right.
I actually spoke to Lemon and said, hey, do you mind if we play this?
He said, no, that'd be awesome.
Give us a shout out, of course.
So we're going to actually play the song as part of this.
In the live stream, you get the video.
On the audio, you get, well, audio.
So I'm going to kick this off and we'll come back.
And I'd love to hear Brian and Hannah's thoughts.
Here we go. You don't need any curly braces
Just four spaces, just four spaces Wildcard imports should be avoided
In most cases, in most cases Try to make sure there's no trailing white
space It's confusing, it's confusing. Trailing commas go behind list items. Get
blamed, titans. Get blamed, titans. And comments are important as long as they're maintained
When comments are misleading, it will drive people insane
Just try to be empathic, just try to be a friend
It's really not that hard, just adhere to PEP 8
PEP 8
Constants should be named in all capital
letters and
live forever
live forever
And camel case is not for Python
Never ever, never ever
And never use a bare exception
Be specific, be specific
No one likes the horizontal scroll bar
Keep it succinct, keep it succinct
And comments are important as long as they're maintained
When comments are misleading
It will drive people insane
Just try to be empathic
Just try to be a friend
It's really not that hard
Just adhere to
PEP 8
PEP 8, PEP 8, PEP 8, PEP 8.
That was amazing.
I can sympathize with so much of what he's saying.
I'm just having flashbacks to a discussion I had with my teammate about comments.
And being like, no, this comment doesn't actually describe what the code is doing
it's worse than having no comment.
It really is, it really is.
Yeah, or if it describes literally what the code is doing and not, you know, the high-level background or anything other than the why. The why is important.
Yeah, I love it. So two things: Lemon and team, well done on the song, and man, you've got a great voice. It was beautiful and funny.
Yeah, it was amazing.
All right, well, Brian, we probably should wrap it up.
Yeah.
All right. Well, Hannah, thanks so much for being here. It's good to have you on the show. And Brian, thanks as always. Everyone, thanks for listening.
Thanks. Bye.
Bye.
Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.