Python Bytes - #112 Don't use the greater than sign in programming
Episode Date: January 11, 2019
Topics covered in this episode: nbgrader (0:56), profanity-check (3:22), Python Dependencies and IoC (9:05), A Gentle Introduction to Pandas (16:59), Don't use the greater than sign in programming (18:38), Extras, Joke
See the full show notes for this episode on the website at pythonbytes.fm/112
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 112, recorded January 9th, 2019.
I'm Michael Kennedy.
And I'm Brian Okken.
Hey Brian, how you doing?
I am great. It's a wonderful January.
We're starting to get back into the swing of things. The news is starting to flow again.
Yes.
Yeah, absolutely.
Now, before we get into it, I just want to say thank you to Datadog for sponsoring the show, as they do many of our shows. We'll tell you more about them later. Right now, I want to just think back to what it was like to have my programming and computer science assignments graded. They were like, here is an algorithm, write the output with a pencil on a piece of paper. We've come a long way from there, right?
Yeah. I mean, I even remember, I guess, turning in floppy disks and code printouts and stuff like that.
Right, because what are you going to do, go for them?
First thing I want to talk about is a thing called nbgrader. That's short for notebook grader. I just ran across this, and it's just so totally cool. I'm just going to read their little thing, and there was an article about it in the Journal of Open Source Education. The beginning of the summary is:
nbgrader is a flexible tool for creating and grading assignments in the Jupyter Notebook. nbgrader allows instructors to create a single master copy of an assignment, including tests and canonical solutions.
From the master copy, a student version is generated without the solutions, thus obviating the need to maintain two separate versions.
nbgrader also automatically grades submitted assignments by executing the notebooks and storing the results of the tests in a database.
After auto-grading, instructors can manually grade free responses and provide partial credit using the formgrader Jupyter Notebook extension.
Finally, instructors can use nbgrader to leave personalized feedback for each student submission, including comments as well as detailed error information.
That sounds super useful.
I totally want to play with it, even though I'm not a teacher.
We're also linking to the nbgrader documentation, which has a little intro video on how it all works.
And wow, it just looks totally cool.
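As a rough illustration of the master-copy idea in that summary, a pair of autograded cells might look like the sketch below. The `### BEGIN SOLUTION` and `### END SOLUTION` comments are the markers nbgrader uses to delimit what gets stripped from the student version; the `squares` exercise and its test values are made up for illustration, and the cell metadata that nbgrader attaches through the notebook toolbar is not shown.

```python
# Cell 1: an autograded answer cell in the instructor's master copy.
# `squares` is a hypothetical exercise, not one taken from the nbgrader docs.
def squares(n):
    """Return the squares of the integers 1..n as a list."""
    ### BEGIN SOLUTION
    return [i ** 2 for i in range(1, n + 1)]
    ### END SOLUTION


# Cell 2: an autograder test cell. If any assert raises, the cell earns no
# autograded points; an instructor can still award partial credit by hand.
assert squares(1) == [1]
assert squares(4) == [1, 4, 9, 16]
```

In the generated student version, the lines between the two markers are replaced by a placeholder stub for the student to fill in, which is how one notebook can serve as both the solution and the assignment.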
That seems like an awesome way to grade computer science stuff.
And you could grade pretty much anything that is reasonable
to compute within a Jupyter notebook, right?
So I guess the people would have to have some Python
or some sort of skill where they interact with it.
But maybe that could be really simple,
like just put an answer or a number or something into a cell
that then gets stored and checked.
But the thing that was kind of a concern for me
as you're describing it was,
well, what if there's like a super simple mistake you make and then the answer is way off.
So you just get it wrong.
Right.
But the fact that you can go back and give partial credit and evaluate it, that sounds pretty cool.
Like a lot of the stuff, if you got tests in place where it just checks their code and all the people that got it right, you don't have to really go back and double check that stuff. Maybe spot check to make sure they're not all writing the same answer or
something, but it looks like a lot of fun. Yeah, that's cool. I was a TA in college and
had to grade a lot of calculus tests and stuff. This seems really lovely compared to the alternative,
honestly. This is great. Sometimes when people are doing their assignments, they can get pretty upset. Things aren't working out. It's really frustrating.
Yeah, they might even swear.
They might, and they might do it in a public forum, or maybe they do it in a GitHub commit that is going to be public, and you don't want it there. So you might want to check that.
There's a couple of ways, actually, to check for profanity in Python, and there's a new library called profanity-check.
So what's cool about this, I mean, obviously you could say, does it have these seven words or whatever?
But this one takes AI and applies it to this problem, basically.
Wow.
It takes a linear SVM model trained on 200,000 human-labeled samples of clean and profane text. So this string is bad. This sentence is good. This phrase is bad. This phrase is good.
And then it uses that to understand how similar whatever you're looking at is to one of these bad phrases.
Isn't that cool?
Yeah, very.
So one of the problems with a lot of the systems out there that are more simple is they just
have like explicitly bad words.
But as you can imagine, there are many, many bad words that you might forget or there's
some slightly different way of saying some other thing and they fall through.
So this one turns out to catch a lot of them.
And it's also super, super fast.
So there's another one out there called profanity-filter,
which is more sophisticated than a lot of these,
you know, like just are these words in here, checks.
This one is similar,
but because it creates this model and just uses the result,
it's actually like 300 to 400 times faster than the other one.
That's cool.
If you have 300 to 400 times faster, not percent, times,
like 13 seconds versus 24 milliseconds type of difference,
that's pretty awesome.
And the speed really matters,
especially if the amount of text you're filtering is huge.
Right, or a whole bunch of stuff real time or something like that.
And so it's super simple to use. It has basically two functions: predict, which tells you whether or not something is bad, and one that gives the probability. So you can call predict and give it some text, and it'll give you a zero or one, or you can say give me the probability, and it'll say, we think this is 76.3% bad. Do with that what you will. So you can take it as black and white or gray, and then just decide how gray you'll let it get.
Okay, so I'm redoing one of my websites. Maybe I'll do this on my own blog posts, just curious to see what my confidence level is that they're clean.
Yeah. I think a lot of people don't have this problem,
but if your problem is to take user input
and evaluate it for this characteristic,
like that would be a complete pain, right?
And so here's a pip install,
one-liner sort of thing you can do
that will help a lot of things.
Yeah, neat.
Yeah, yes, indeed.
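The "decide how gray you'll let it get" idea can be sketched as a threshold over per-text probabilities. To keep the sketch runnable without installing anything, the scorer here is a toy word-list stand-in, which is exactly the naive approach the library improves on and is not how profanity-check works. The names `toy_scorer`, `flag_texts`, and the sample comments are all hypothetical. With the real library installed, its `predict_prob` function, which takes a list of strings and returns one probability per string, should be usable as the `score` argument.

```python
from typing import Callable, List, Sequence

# A scorer maps a batch of strings to one probability per string.
Scorer = Callable[[Sequence[str]], Sequence[float]]


def toy_scorer(texts: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a trained model (NOT profanity-check itself)."""
    flagged = {"darn", "heck"}  # placeholder word list for the demo
    scores = []
    for text in texts:
        words = text.lower().split()
        hits = sum(word.strip(".,!?") in flagged for word in words)
        # Fraction of flagged words, used here as a fake "probability".
        scores.append(hits / len(words) if words else 0.0)
    return scores


def flag_texts(
    texts: Sequence[str], score: Scorer = toy_scorer, threshold: float = 0.5
) -> List[str]:
    """Return the texts whose score is at or above the threshold."""
    return [text for text, prob in zip(texts, score(texts)) if prob >= threshold]


comments = ["What a lovely notebook", "Darn heck!", "Well, heck."]
print(flag_texts(comments))                 # lenient: flags anything half gray
print(flag_texts(comments, threshold=0.9))  # strict: only near-certain cases
```

Passing the scorer in as a parameter keeps the thresholding logic independent of whichever model produces the probabilities.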
All right, what's the next one?
Something we've never talked about on this show, right?
We've actually, of course, talked about packaging quite a bit.
So dealing with packages, if you're dealing with Python a lot, like the difference between a module and a package in the file system and then an installable package that you can distribute, that all just becomes second nature.
We don't even really think about it anymore.
But as I'm working with different people, and different people are starting to work in Python around you, sometimes you have somebody that you need to explain this to, and it's hard for me to remember what it was like to not know all this stuff. So I bookmarked this article called An Introduction to Python Packages for Absolute Beginners.
And it's just a nice, gentle discussion about somebody trying to share some code, and then it describes modules and packages and using packages and installing and what import means and a bunch of stuff like that.
Yeah.
So I think this would be good either to hand around or just review before you go explain
it to somebody.
Right.
We get so excited about jumping in and talking about Poetry or Pipenv or all these other things.
And it's just like, wait, what are these?
You know, when you're new, it's like, what are these things?
Like, how do I make a package?
You know, how do I share it?
You know, people probably start out with just one giant Python file.
And that's the whole app, just crammed into the one file even.
Right, and people share the code by just emailing it around or copying it into different repos and stuff. And yeah, there are better ways.
To me it's a little annoying that the word package has multiple meanings, because Python calls just a directory with an __init__.py in it a package.
But that's not what PyPI is full of.
Right.
Distributions. Wheels and all that stuff, right?
Yeah.
Like, yeah, a whole other level.
I do agree that those are, like, oddly the same and different.
Yeah.
Yeah.
It's definitely confusing.
So this is good.
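For anyone explaining the first two meanings to a newcomer, here is a small self-contained sketch. It writes a single-file module and a directory-style package into a temporary folder and imports both. The names `demo_greetings` and `demo_toolbox` are invented for the demo. The third meaning, an installable distribution such as a wheel on PyPI, is not shown.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # A module is a single .py file.
    (root / "demo_greetings.py").write_text(
        "def hello():\n    return 'hello from a module'\n"
    )

    # A package, in the import sense, is a directory with an __init__.py.
    pkg = root / "demo_toolbox"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "shout.py").write_text(
        "def shout(text):\n    return text.upper() + '!'\n"
    )

    # Make the temporary folder importable, then import both kinds of thing.
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        greetings = importlib.import_module("demo_greetings")
        shout_module = importlib.import_module("demo_toolbox.shout")
        module_result = greetings.hello()
        package_result = shout_module.shout("hello from a package")
    finally:
        sys.path.remove(str(root))

print(module_result)
print(package_result)
```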
So if you're confused about how your app is working, we know a company that can help, right, Brian?
Yes, we do.
Datadog.
So Datadog sponsored the show,
as I said at the opening.
They're a cloud-scale monitoring platform
that brings together all your metrics,
logs, distributed traces,
all into one place.
And it will auto-instrument things like
Django or Flask or Postgres
and let you track requests
across those different pieces of infrastructure
and put them all back
together to know why it was slow, where it was working, things like that. So that's pretty
awesome. Check them out at pythonbytes.fm/datadog. Go do a free trial and they will send
you a cool Datadog t-shirt. So definitely check them out. It helps support the show.
Plus the t-shirts are cool.
And the t-shirts are very cool. They have a cute little dog on them.
Now I'm going to bring up something on here that we don't spend a whole lot of time on, and maybe it's even a little bit controversial. What do you think?
I'm looking forward to talking about this.
Yeah, I figured you are. I figured you have an opinion one way or the other. So the idea is, in Python we can usually get away with replacing our dependencies.
Like if we're talking to a database or a web service, we can kind of cancel that out so we can test our code by doing some sort of patch operation or something to that effect, right?
We can get it out of the way.
But this guy named Yasha Gutzir, hopefully I got that closely right, sent us a message that said, hey, I've been reviewing all of the Python dependency injection and IoC, inversion of control, containers around Python.
And I know that some folks say it's not even necessary, but on large apps, I think there's a lot of value in making your dependencies more explicit.
Interesting.
Yeah. So he sent us a big long list of all the options, basically. And he did a bunch of good research for us.
Awesome.
Yeah.
So I'll just read off a couple of them here.
We got five or six.
So we have one called Dependency Injector, which apparently requires some tricks to get installed on Windows, and he couldn't get it quite working, but it looks pretty good.
I'm kind of mediocre on that one.
There's Injector, which is fairly Java-esque.
There's Pinject, probably P-I-N-J-E-C-T, something like that.
And this one had kind of gone unmaintained for like five years, a long time.
But now there's new folks working on it, so that's kind of cool, and it seems like it's doing a lot.
There's Python Inject, which has got some really nice testing features.
It's got built-in mocking and things like that.
Are you starting to notice a similarity in the names?
Yeah.
There's another one that's just here more for completeness' sake, DIPy, but it only works on Python 3.4, apparently.
So the comment here is like, you know, this is legacy, so I can't really be touching on this.
And then the next two I think are really quite good.
Okay.
There's Serum, which I think actually is a pretty interesting thing to look at, because it is primarily driven through class decorators.
Okay.
So what you do is you go to some class and you say, this class is a dependency. So you put @dependency on the class definition.
And then later on, you can put @inject on top of either a function or a class, and if the class has, say, a class-level log field, it will automatically be set to an instance of that dependency based on the type annotation. There's an interesting way that it uses type annotations and class decorators to link that back together.
Yeah, okay.
And then the final one is this thing called Haps. Haps is pretty cool, and it's really lightweight and quite simple. It's also based on type annotations, so a lot of them are taking advantage of, I think it's probably 3.6. Either 3.5 or 3.6, but I think it's 3.6 because of some of the ways it's using type annotations. But the point is, it's using the modern features of Python 3 to help figure out a lot of the configuration and how stuff wires together.
Okay.
That's the survey that Yasha gave us, and thank you for that.
That was cool.
Now, you want to have a quick chat about whether Python needs dependency injection?
What do you think?
I'm still confused as to what the problem is that it's trying to solve, is my thing.
I hear you. And I think, let me try to talk about the other side, although I find myself not doing this very often. So for what it's worth, I don't do dependency injection a lot.
Let's do it in a couple of steps. I think the fundamental starting point is it's trying to write object-oriented Python, or even functions, following the open-closed principle, which is one of the SOLID principles.
And it's pretty interesting, this principle.
It says that software entities like classes and functions should be open for extension but closed for modification, which is like, what the heck does that mean?
Basically, I should be able to change the behavior of this class or this function without touching the source code to modify it. Which kind of sounds like, wait, how do you do that? How's that possible? But imagine it has a logging feature. Instead of just internally creating a logger, if you could pass in the logger, you could pass in different loggers, changing the behavior of how it logs, right? So open-closed principle, that's how it works, right? That's, I think, the general motivation for all of these frameworks.
Yeah.
Because they're like, that's cool.
I want to do that.
It's good for testing
because I could pass in like a fake logger
or like a mock database.
I could pass that in, right?
And not touch the database.
And I think that's generally a good feature,
a good way to do things.
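The pass-in-the-logger idea needs no framework at all. Here is a minimal sketch using plain constructor arguments; `OrderService` and `ListLogger` are hypothetical names made up for the example, and only the standard library is used.

```python
import logging


class ListLogger:
    """Hypothetical fake logger that records messages instead of emitting them."""

    def __init__(self):
        self.messages = []

    def info(self, msg, *args):
        # Mimic the %-style formatting that logging.Logger.info performs.
        self.messages.append(msg % args if args else msg)


class OrderService:
    """Hypothetical service whose logging behavior is chosen by the caller."""

    def __init__(self, logger=None):
        # Default to a real logger, but accept any object with an .info() method.
        self.logger = logger if logger is not None else logging.getLogger("orders")

    def place_order(self, item, quantity):
        self.logger.info("placing order: %s x%d", item, quantity)
        return {"item": item, "quantity": quantity}


# Production-style use: the default logger from the standard library.
OrderService().place_order("widget", 2)

# Test-style use: inject the fake and inspect what was logged.
fake = ListLogger()
order = OrderService(logger=fake).place_order("gadget", 3)
print(fake.messages)
```

The class is never edited to change how it logs, which is the "open for extension, closed for modification" part. The cost shows up when many such parameters have to be threaded through several layers of constructors.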
The problem is, if you do that at low-level stuff and at all the different layers of your app, at the top you've got to pass like 20 things to the top-level things so they can distribute them down as they create all the objects further down the graph.
Right.
So then people have come up with IoC containers, where things get registered: when I need one of these, really give it one of those. And then I create this object by giving it three of these things at once. And that starts to get really hard in my mind to know, okay, what is being done here? Like, I see a bunch of abstract types and I can't even tell.
An example is, like, you don't know what database you're going to use. You can use the injection thing, but it kind of ripples through a whole bunch of layers of code, and that is the part that I don't like. Whereas another way to do this is to kind of bypass all of the middle stuff, and I think Flask does this sort of a thing, and this is a common design: instantiate the real objects at an application level and just set those where they need to be set.
So there's like a, whatever the real database is.
Right. Go look up the service for the database, and everybody can ask that thing to give it the database, right?
And then everybody just uses the same interface, and we don't need to pass it through all the levels of constructors and stuff. It can just kind of bypass all of that, I guess, because that's how I generally do things. And then for testing, yeah, I'm okay with patching and monkey patching and stuff like that.
So I hear you. I think in Python it is certainly something that's more open for debate, because we do have these alternative ways to accomplish the same thing, like monkey patching. Now, I don't know, I'm kind of a fan of the open-closed principle in general, but I do think it can just become too much when you put it all together. And certainly I've worked on some applications that did this all over the place, and it was some of the most frustrating code I've ever had to work through, because at every step you're like, there's four things working together and I don't know what any of them are right now because of some configuration setting somewhere. Other than that, I don't know, I'm on the fence. Some parts of this I think are cool and some I think can go too far. But I guess, you know, check out Haps if this kind of stuff is interesting to you. It is pretty well done.
I think that one of the places for it is if people are really used to using this kind of a model and then coming to Python, yeah, you can do it here too.
It's just I'm not sure I'm there.
Yeah, I think there's simpler things than IoC containers, but this podcast is probably a little short if we're going into them.
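One of those simpler things, the application-level object plus patching in tests, can be sketched with the standard library's `unittest.mock`. The `App`, `RealDatabase`, and `greeting_for` names are hypothetical, and this is not a claim about how Flask itself is structured.

```python
from unittest import mock


class RealDatabase:
    """Hypothetical production dependency."""

    def fetch_user(self, user_id):
        raise RuntimeError("would talk to a real database")


class App:
    """Hypothetical application object that owns the real dependencies."""

    database = RealDatabase()


def greeting_for(user_id):
    # Code anywhere in the app asks the application object for the database,
    # so nothing has to be threaded through layers of constructors.
    user = App.database.fetch_user(user_id)
    return f"Hello, {user['name']}!"


# In a test, temporarily swap the database for a mock. The original
# attribute is restored automatically when the with-block exits.
fake_db = mock.Mock()
fake_db.fetch_user.return_value = {"name": "Ada"}
with mock.patch.object(App, "database", fake_db):
    patched_greeting = greeting_for(42)

fake_db.fetch_user.assert_called_once_with(42)
print(patched_greeting)
```

The trade-off is that the dependency is implicit: nothing in the signature of `greeting_for` says it needs a database, which is the explicitness the dependency injection libraries are after.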
But it's certainly an interesting thing to think about, and here's a bunch of options.
Yep. Cool.
You know, after all that, Brian, I feel like I just need something gentle, like a gentle conversation about a soft, fuzzy animal.
Yeah. Like a gentle introduction to pandas.
Yes. Well, maybe not an animal, but yeah, something gentle. Tell us about pandas.
So this is another kind of a newbie thing, but we're starting to use pandas DataFrames at work. And I really kind of needed a "pretend I'm just starting out", which I am, "and tell me how these things work." And so it's called A Gentle Introduction to Pandas, but it's really a gentle introduction to the data structures Series and DataFrame. The Series are interesting, but I think it's just a precursor to try to jump you into DataFrames. That's where the real fun starts to happen. It goes through about half the article talking about arrays and Series, how you create Series from arrays and dictionaries, and I didn't know you could create a Series from just a scalar and give it a bunch of different index labels, and it'll fill it in. That's pretty cool.
Oh, that is cool.
Yeah, but then it jumps into DataFrames and then talks about sorting and slicing and how you select things by label or position, and then how easy it is to get the statistics on columns, and then how to get things in and out of DataFrames.
So importing and exporting.
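Here is a short sketch of the operations just listed, assuming pandas is installed. The data is made up, and the code is not taken from the article.

```python
import io

import pandas as pd

# A Series from a scalar: the value is repeated for every index label.
filled = pd.Series(5, index=["a", "b", "c"])

# A Series from a dictionary: the keys become the index.
prices = pd.Series({"apple": 1.5, "pear": 2.0})

# A DataFrame from a dictionary of columns (made-up data).
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [82, 95, 71]},
    index=["x", "y", "z"],
)

by_score = df.sort_values("score", ascending=False)  # sorting
by_label = df.loc["y", "score"]                      # select by label
by_position = df.iloc[0, 1]                          # select by position
stats = df["score"].describe()                       # column statistics

# Export to CSV text, then import it back into a new DataFrame.
csv_text = df.to_csv()
round_trip = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(filled)
print(by_score)
print(stats)
```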
And then where you take it from there
depends on your problem space,
but this is kind of a really good
why do we call these things data frames
and why do we care about them?
If you need to understand them, this is a decent
article. Yeah, if you need to understand them, 15 minutes.
This is kind of a no-fluff
keep it simple one. Nice little article by Wilson Busaca. Well done. Let's see. Medium tells me it's
a five-minute read, but I bet Medium's not taking into account the code. So 15 minutes, how about
that? Yeah, I think so. Right. So this last one I have for you, Brian, I think it's going to be
a little bit of a shock. It'll come across a little bit weird at first,
but the more you look at it,
the more it starts to sound appealing, let's say.
Yeah.
All right, so I'm going to give you some advice.
I'm going to tell you a bit about it.
So the advice, you know, you also get all sorts of advice,
like don't format your code like this.
Don't have a bunch of multiple,
if this is equal to this value and that value and that value,
maybe do an in test. So there's like sort of Pythonic ways to do conditionals and whatnot.
The advice here is to never, not almost never, says don't use the greater than sign in programming.
Yeah.
It's crazy, right?
It seems like kind of a bold statement. I'm like, well, we have it. It must be useful somewhere.
It must be useful. And why would we not want to use it? So this is an article by a good friend
of mine, Llewellyn Falco, who I've known for a long time, but someone else sent me this article,
which I thought was a pretty interesting coincidence. And Llewellyn has a really interesting way of looking at straightforward stuff and then just getting it down to its essence.
So he says, like, let's look at this problem.
Let's suppose I want to check whether a number, let's call it x, it's a variable, is between 5 and 10.
There are a lot of ways that we can do this.
We could say x greater than 5 and 10 greater than x, or we could say x greater than 5 and x less than 10, right? Those are
equivalent. But why should you choose one over the other? Well, he lists off these six different
ways of doing this. He says, actually, here's all the ways. Oh, no, wait, look, one of them is wrong.
Go back and figure out which one is wrong. And it's not very obvious. You kind of got to go through and think through every little bit, right? So he says, look, if you remove the greater than sign, there's actually only two ways to say this: x less than 10 and 5 less than x, which is kind of weird, or 5 less than x and x less than 10. So in that last one, it's cool, because the variable you're trying to test between 5 and 10 is literally between the 5 and the 10 in that statement.
In the text, it's between, and it's actually between.
Yeah.
So here you can test this containment interval bit completely with no greater than.
That's how I code.
I think of, especially with numbers, I think that all of the comparisons need to kind of be on the number line.
Yes.
You can think about them easier.
I've never really seen it put in place as a rule, kind of a rule of thumb, of just don't use the greater than sign.
Yeah, it's really interesting.
And this analogy back to the number line is perfect, because it's like, well, where do you want the variable to be relative to this?
So if you want it to be between, then as you say, 5 less than x and x less than 10, right? So it's between. If you want to test that it's outside, you could do the same thing: x less than 5 or 10 less than x, and you put the variable outside the numbers, right? So you can do this number line sort of relative bit with both, you know, and, and or, and contained in and not contained in, and things like that.
We'd kind of be remiss if we didn't mention that this article is referencing all programming languages. If you're doing Python, of course, you would just say 5 is less than x is less than 10. You don't need the and.
Nice. And also somebody said, okay, I follow you on this. This is great, and I'm with you, but how do you say I would like all the numbers greater than one without the greater than sign?
And so the answer is, of course, one less than x.
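Written out in Python, the number-line forms discussed here look like the following sketch. The function names and the default bounds of 5 and 10 are just for illustration.

```python
def is_between(x, low=5, high=10):
    # Number-line order: low, then x, then high.
    return low < x and x < high


def is_between_chained(x, low=5, high=10):
    # Python's chained comparison says the same thing without `and`.
    return low < x < high


def is_outside(x, low=5, high=10):
    # x sits to the left of low or to the right of high.
    return x < low or high < x


def all_above(values, floor=1):
    # "All the numbers greater than one", written as floor < x.
    return [x for x in values if floor < x]


for x in (4, 5, 7, 10, 11):
    # The two "between" spellings always agree.
    assert is_between(x) == is_between_chained(x)
    print(x, is_between(x), is_outside(x))

print(all_above([0, 1, 2, 3]))
```

Note that with strict comparisons the endpoints 5 and 10 are neither between nor outside, the kind of edge case that is easier to spot when everything reads left to right.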
Yeah, that's why it's more of a rule of thumb, I think, because there's times where it just doesn't look right, and you have to go for maintenance. If it just looks weird, then change it.
I brought this in because I thought it was interesting. When I first read it, I'm like, well, that's dumb advice. What is this? And then I read it, and I'm like, actually, no, this makes a lot of sense a lot of the time. But I agree, if you have one thing, you want to say x is greater than one, you know, don't twist it around just so you don't have to have the greater than sign. Just say the most straightforward thing. But if you're doing more complicated comparisons, then I think it's worthwhile.
Yeah. Like I would say, for instance, a series of if clauses, if you're not really testing both ends, if you're doing like, if x is greater than the max, then do something.
And if all the comparisons have x on the left, I wouldn't change it just because of this.
But, you know, anyway.
All right, Brian, well, that's it for all of our main topics.
I got a few extras to share with everyone while we're here,
just really quick and short things.
And, of course, not to be forgotten is our joke,
but you got any extras to share with everyone?
I did mention last time that I was having some issues
with PythonTesting.net.
I think I mentioned that, but
with SSL and stuff, but that's all
resolved and fixed.
So if I go over here
and I pull this up in Chrome,
is it going to tell me that it's secure?
It should. Nice.
Yeah, testing code over SSL. Beautiful.
It's still kind of a WordPress thing is what I use.
And I'm not thrilled with that.
So I have a side project going on to convert that to something else.
But it's not urgent anymore.
Yeah, that's good.
Well, you'll have to give us the full report once you get it all fixed up.
Okay, so you said you got a bunch of stuff for us.
I do.
I'll go through them quick.
First of all, there's a new Python podcast, which is pretty exciting.
And this one is focused
on teaching Python.
And do you know
what the name of it is?
I think it's probably
Teaching Python.
Yeah, it is.
So, Teaching Python
is by Kelly Paredes
and Sean Tibor.
Sorry about messing up the names.
But they're doing a podcast.
These are two middle school teachers
who are learning
and teaching Python
to their students
and basically documenting that journey.
So if you're interested in that,
especially if you're a teacher or you work with kids,
I think this will be really, really helpful for you.
So you can check that out.
I'm about halfway through the backlog so far
and I really like it so far.
Yeah, they're doing a nice job.
One of the things that had kept people
from using GitHub
for their private work was that you had to pay for private repositories on GitHub,
no matter what. Yes. Right. So people would use Bitbucket because Bitbucket had
free private repositories. Well, GitHub decided we're also going to have free private repositories.
So if you're working on projects that they have to stay private or you just want them private,
you can now use GitHub
without paying anything. There's been some weird
reactions to it, but they're just sort of
following the model of Bitbucket
and GitLab now,
so I don't think there's anything weird
going on. Exactly. Competition
is a good thing, and here we have it.
It's not entirely free. It's not like GitHub decided they're not going to make money anymore. You can only have three contributors to the private repository, and so there's limits and things like that, but still pretty cool for most things.
Yeah, right.
Also, very quickly, some early details about EuroPython are available, and it's looking pretty sweet. I'd love to go. I don't know if I'll be able to.
Yeah, me too.
Yeah, so they just announced EuroPython.
It's going to be in Basel, Switzerland, July 8th to 14th.
And it looks great.
So I put a link to the conference site there.
I don't think they have call for papers or anything like that out yet,
but it should be out pretty soon.
Another thing that has been lacking in the world
is good data center support in Africa. So I know this because
I use AWS to deliver the video course content, like the actual videos. And I have streaming
servers all over the place, like in Brazil or Mumbai or whatever, but there's just no way to
do that in Africa. So the big news is there's an AWS data center coming to South Africa,
which is pretty cool for anyone that wants to be closer to that part of the world.
And finally, Pandas is dropping legacy support.
No more Python 2 in Pandas.
Oh, cool.
Yeah, and that's coming out like this month.
So it should be good.
Yeah, this is the year that a ton of projects are dropping Python 2.
Yeah, for sure.
So one more major thing.
We already covered how cool Pandas is.
It's not going to support legacy Python anymore.
All right.
You ready for the joke?
Yeah.
Can I click on it now?
You can click on it.
This is a visual one, but I can describe it to you folks.
Now, I just got to do a quick little bit of history here for people who maybe have not seen Harry Potter. So this is the Harry Potter joke, and there's a point in the Harry Potter movie, I think this might be the first one, where Harry Potter has to get on this long table and is battling, I don't know, someone, something, and all the other students are standing around, and somebody conjures a snake, a serpent. And in the real show, Harry starts speaking to the thing in its native tongue, which apparently is a freaky thing to do, and people were all freaked out. It was called Parseltongue, something like that, right? That he could speak snake. So with that, here's the joke. There's a picture. Harry's fighting the snake in that environment.
And he says, import os, current_path = os.getcwd(), and just starts speaking out Python commands at the snake.
And Hermione says, I didn't know Harry spoke Python.
And Ron Weasley says, yeah, he's a parser tongue.
That's terrible.
It's really bad.
It's really bad.
But there it is.
And Nick Spirit sent that to us.
So thank you, Nick, for finding that joke and letting us share it here.
Yeah, very nerdy.
Yep.
He's a parser tongue.
Well, I think we're going to leave it at that, Brian.
Thanks for being here.
Yeah, thank you.
Yeah, bye.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.