Python Bytes - #112 Don't use the greater than sign in programming

Episode Date: January 11, 2019

Topics covered in this episode: [play:0:56] nbgrader [play:3:22] profanity-check [play:9:05] Python Dependencies and IoC [play:16:59] A Gentle Introduction to Pandas [play:18:38] Don't use the... greater than sign in programming Extras Joke See the full show notes for this episode on the website at pythonbytes.fm/112

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 112, recorded January 9th, 2019. I'm Michael Kennedy. And I'm Brian Okken. Hey Brian, how you doing? I am great. It's a wonderful January. We're starting to get back into the swing of things. The news is starting to flow again. Yes.
Starting point is 00:00:19 Yeah, absolutely. Now, before we get into it, I just want to say thank you to Datadog for sponsoring the show, as they are many of our shows. I'll tell you more about them later. Right now, I want to just think back to what it was like to have my programming and computer science assignments graded. They were like, here is an algorithm, write the output with a pencil on a piece of paper. We've come a long way from there, right? Yeah. I mean, I even remember like, I guess, turning in floppy disks and code printouts and stuff like that. Right, because what are you going to do, go for them? First thing I want to talk about is a thing called NBGrader. So that's short for notebook grader. And I just ran across this. This is just so totally cool. I'm just going to read their little thing, and there was an article about it in the Journal of Open Source Education. The beginning of the summary is: NBGrader is a flexible tool for creating and grading assignments
Starting point is 00:01:16 in the Jupyter Notebook. NBGrader allows instructors to create a single master copy of an assignment including tests and canonical solutions. From the master copy, a student version is generated without the solutions, thus obviating the need to maintain two separate versions. NBGrader also automatically grades submitted assignments by executing the notebooks and storing the results of the test in a database. After auto-grading, instructors can manually grade free responses and provide partial credit using the FormGrader Jupyter notebook extension. Finally, instructors can use NBGrader to leave personalized feedback for each student submission, including comments as well as detailed error information. That sounds super useful. I totally want to play with it, even though I'm not a teacher.
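As a rough illustration of the master copy idea read out above, a graded cell might look something like the sketch below. The `### BEGIN SOLUTION` and `### END SOLUTION` markers follow the convention described in the nbgrader documentation for the region stripped from the student version; the `mean` exercise and its tests are hypothetical, made up only for this example.

```python
# Solution cell in the instructor's master copy (hypothetical exercise).
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    ### BEGIN SOLUTION
    return sum(values) / len(values)
    ### END SOLUTION


# Autograded test cell: these assertions run against each submission.
assert mean([1, 2, 3]) == 2
assert mean([10]) == 10
```

In the student version, the lines between the two markers would be replaced with a placeholder for the student to fill in, while the test cell stays the same.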
Starting point is 00:02:06 We're also linking to the NBGrader documentation that has a little intro video on how it all works. And wow, it just looks totally cool. That seems like an awesome way to grade computer science stuff. And you could grade pretty much anything that is reasonable to compute within a Jupyter notebook, right? So I guess the people would have to have some Python or some sort of skill where they interact with it.
Starting point is 00:02:30 But maybe that could be really simple, like just put an answer or a number or something into a cell that then gets stored and checked. But the thing that was kind of a concern for me as you're describing it was, well, what if there's like a super simple mistake you make and then the answer is way off. So you just get it wrong. Right.
Starting point is 00:02:52 But the fact that you can go back and give partial credit and like evaluate it, that sounds pretty cool. Like a lot of the stuff, if you got tests in place where it just checks their code and all the people that got it right, you don't have to really go back and double check that stuff. Maybe spot check to make sure they're not all writing the same answer or something, but it looks like a lot of fun. Yeah, that's cool. I was a TA in college and had to grade a lot of calculus tests and stuff. This seems really lovely compared to the alternative, honestly. This is great. Sometimes when people are doing their assignments, they can get pretty upset. Things aren't working out. It's really frustrating. Yeah, they might even swear. They might, and they might do it in like a public forum, or maybe they do it in like a GitHub commit that is going to be public and you don't want it there. And so you might want to check that.
Starting point is 00:03:39 And there's a couple of ways actually to check for profanity in Python. And there's a new library called profanity-check. So what's cool about this, I mean, obviously you could say, does it have these seven words or whatever? But this one takes AI and applies it to this problem, basically. Wow. It takes a linear SVM model trained on 200,000 human labeled samples of clean and profane text. So this string is bad. This sentence is good. This phrase is bad. This phrase is good. And then it uses that to understand how similar whatever you're looking at is to something like one of these bad phrases. Isn't that cool?
Starting point is 00:04:27 Yeah, very. So one of the problems with a lot of the systems out there that are more simple is they just have like explicitly bad words. But as you can imagine, there are many, many bad words that you might forget or there's some slightly different way of saying some other thing and they fall through. So this one turns out to catch a lot of them. And it's also super, super fast. So there's another one out there called profanity-filter,
Starting point is 00:04:51 which is more sophisticated than a lot of these, you know, like just are these words in here, checks. This one is similar, but because it creates this model and just uses the result, it's actually like 300 to 400 times faster than the other one. That's cool. If you have 300 to 400 times faster, not percent, times, like 13 seconds versus 24 milliseconds type of difference,
Starting point is 00:05:14 that's pretty awesome. And the speed really matters, especially if the amount of text you're filtering is huge. Right, or a whole bunch of stuff real time or something like that. And so it's super simple to use. It has basically two functions: predict whether or not something is bad, or give the probability. So you can call predict and give it some text and it'll give you like zero or one, or you can say give me the probability and it'll say, we think this is, you know, 76.3
Starting point is 00:05:41 bad. Do with that what you will. So you can take it as black and white or gray and then just decide how gray you'll let it get. Okay, so I'm redoing one of my websites. Maybe I'll do this on my own blog posts, just curious to see what my confidence level is that they're clean. Yeah. I think a lot of people don't have this problem, but if your problem is to take user input and evaluate it for this characteristic, like that would be a complete pain, right? And so here's a pip install,
Starting point is 00:06:13 one-liner sort of thing you can do that will help a lot of things. Yeah, neat. Yeah, yes, indeed. All right, what's the next one? Something we've never talked about on this show, right? We've actually talked to, of course, talked about packaging quite a bit.
Starting point is 00:06:30 So dealing with packages, if you're dealing with Python a lot, like the difference between a module and a package in the file system and then an installable package that you can distribute, that all just becomes second nature. We don't even really think about it anymore. But as I'm working with different people and different people are starting to work in Python around you, sometimes you have somebody that you need to explain this to, and it's hard for me to remember what it was like to not know all this stuff. So I bookmarked this article called An Introduction to Python Packages for Absolute Beginners.
Starting point is 00:07:10 And it's just a nice, gentle discussion about somebody trying to share some code and then describes modules and packages and using packages and installing and what import means and a bunch of stuff like that. Yeah. So I think this would be good either to hand around or just review before you go explain it to somebody. Right. We get so excited about jumping in and talking about Poetry or Pipenv or all these other things. And it's just like, wait, what are these?
Starting point is 00:07:31 You know, when you're new, it's like, what are these things? Like, how do I make a package? You know, how do I share it? You know, people probably start out with just like one giant Python file. And like, the whole app is just crammed into the one file even. Right, and people share the code by just emailing it around or copying it into different repos and stuff. And yeah, there are better ways. To me, it's a little annoying that the word package has multiple meanings, because Python calls just a directory with an init in it, that's a package.
Starting point is 00:08:08 But that's not what PyPI is full of. Right. Distributions. Wheels and all that stuff, right? Yeah. Like, yeah, a whole other level. I do agree that those are, like, oddly the same and different. Yeah. Yeah.
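To make the first meaning of "package" concrete (a directory with an `__init__.py` in it, as opposed to the distributions on PyPI), here is a minimal sketch that builds one in a temporary directory and imports from it. The `shapes` package and its `area` module are hypothetical names invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A package in the file-system sense: a directory with an __init__.py.
    pkg_dir = Path(tmp) / "shapes"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")

    # A module inside the package: just a .py file.
    (pkg_dir / "area.py").write_text(
        "def square(side):\n"
        "    return side * side\n"
    )

    # Make the parent directory importable, then import the module.
    sys.path.insert(0, tmp)
    try:
        area = importlib.import_module("shapes.area")
        result = area.square(3)
    finally:
        sys.path.remove(tmp)

print(result)
```

Nothing here is installable or published; turning this directory into something `pip` can install is the second, separate meaning of the word.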
Starting point is 00:08:17 It's definitely confusing. So this is good. So if you're confused about how your app is working, we know a company that can help, right, Brian? Yes, we do. Datadog. So Datadog sponsored the show, as I said at the opening. They're a cloud-scale monitoring platform
Starting point is 00:08:30 that brings together all your metrics, logs, distributed traces, all into one place. And it will auto-instrument things like Django or Flask or Postgres and let you track requests across those different pieces of infrastructure and put them all back
Starting point is 00:08:45 together to know why it was slow, where it was working, things like that. So that's pretty awesome. Check them out at pythonbytes.fm slash Datadog. Go do a free trial and they will send you a cool Datadog t-shirt. So definitely check them out. It helps support the show. Plus the t-shirts are cool. And the t-shirts are very cool. They have a cute little dog on them. Now I'm going to bring up something on here that we don't spend a whole lot of time on, and maybe it's even a little bit controversial. What do you think? I'm looking forward to talking about this. Yeah, I figured you are. I figured you have an opinion one way or the other. So the idea is in
Starting point is 00:09:18 python we can usually get away with replacing our dependencies. Like if we're talking to a database or a web service, we can kind of cancel that out so we can test our code by doing like some sort of patch operation or something to that effect, right? We can get it out of the way. But this guy named Yasha Gutzir, hopefully I got that closely right,
Starting point is 00:09:43 sent us a message that said, hey, I've been reviewing all of the Python dependency injection and IoC, inversion of control, containers around Python. And I know that some folks say it's not even necessary, but on large apps, I think there's a lot of value in making your dependencies more explicit. Yeah. So he sent us a big long list of all the options, basically. And he did a bunch of good research for us. Interesting.
Starting point is 00:10:07 Awesome. Yeah. So I'll just read off a couple of them here. We got five or six. So we have one called dependency injector, which apparently requires some tricks to get installed on Windows, but he couldn't get it quite working, but it looks pretty good. I'm kind of mediocre on that one.
Starting point is 00:10:23 There's injector, which is fairly Java-esque. There's Pinject, probably P-I-N-J-E-C-T, something like that. And this one had kind of gone unmaintained, but there's, for like five years, a long time. But now there's new folks working on it, so that's kind of cool, and it seems like it's doing a lot.
Starting point is 00:10:45 There's Python Inject, which has got some really nice testing features. It's got built-in mocking and stuff and things like that. Are you starting to notice a similarity in the name? Yeah. There's another one that's just here more for completeness sake, DIPy, but it only works on Python 3.4 apparently. So appreciate the, the comment here is like,
Starting point is 00:11:08 you know, this is a legacy. So I, I can't really be touching on this. Like there's no good. And then the, the next two I think are really quite good. Okay.
Starting point is 00:11:15 There's serum, which I think actually is a pretty interesting thing to look at because what it does is it primarily is driven through class decorators. Okay. So what you do is you go to like some class here and you say, this class is a dependency. So you put an @dependency onto the class definition. And then later on, you can put an @inject on top of either a function call or a class. And if the class has, say, a log field, a class-level log field, it will automatically be set to an instance of that dependency based on the type annotation. There's an interesting way that it kind of uses
Starting point is 00:12:00 type annotations and class decorators to link that back together. Yeah. Okay. And then the final one is this thing called Haps. And Haps is pretty cool, and it's really lightweight and quite simple. Also based on type annotations. So a lot of them are taking advantage of, I think it's probably 3.6, either 3.5 or 3.6, but I think it's 3.6 because of some of the ways it's using type annotations. But the point is using the modern features of Python 3 to help figure out a lot of the configuration and how stuff wires together. Okay. That's the survey that Josje gave us, and thank you for that. That was cool.
Starting point is 00:12:38 Now, you want to have a quick chat about whether Python needs dependency injection? What do you think? I'm still confused as to what the problem is that it's trying to solve, is my thing. I hear you. And I think, let me try to talk about the other side, although I find myself not doing this very often. So for what it's worth, I don't do dependency injection a lot. So I think the fundamental, let's do it a couple of steps. I think the fundamental starting point is it's trying to write object-oriented Python or even functions following the open-closed principle, which is one of the SOLID principles.
Starting point is 00:13:11 And it's pretty interesting, this principle. It says that software entities like classes and functions should be open for extension but closed for modification, which is like, what the heck does that mean? Basically, I should be able to change the behavior of this class or this function without touching the source code to modify it. Which kind of sounds like, wait, how do you do that? How's that possible? But imagine it has like a logging feature. Instead of just internally creating one, if you could pass in the logger, you could pass in different loggers, changing the behavior of how it logs, right? So open-closed principle, that's how it works, right? That's, I think, the general motivation
Starting point is 00:13:46 for all of these frameworks. Yeah. Because they're like, that's cool. I want to do that. It's good for testing because I could pass in like a fake logger or like a mock database. I could pass that in, right?
Starting point is 00:13:56 And not touch the database. And I think that's generally a good feature, a good way to do things. The problem is, if you do that at low-level stuff and at all the different layers of your app, at the top, you've got to like pass like 20 things to the top level things. So it can like distribute them down as it creates all the objects further down the graph. Right.
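The "pass in the logger" idea from the discussion above can be sketched without any container library at all. Everything here is hypothetical and for illustration only: `OrderService`, `ListLogger`, and the message format are invented names, not taken from any of the libraries in the survey.

```python
import logging


class ListLogger:
    """Hypothetical fake logger for tests: records messages in a list."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


class OrderService:
    """Receives its logger instead of creating one internally."""

    def __init__(self, logger):
        self.logger = logger

    def place_order(self, item):
        self.logger.info(f"order placed: {item}")
        return f"confirmed: {item}"


# Production wiring: pass in a real standard-library logger.
real_service = OrderService(logging.getLogger("orders"))

# Test wiring: pass in the fake and inspect what was logged.
fake_logger = ListLogger()
test_service = OrderService(fake_logger)
confirmation = test_service.place_order("widget")
```

The behavior of `OrderService` changes with the logger it is given, and its source is never edited. The cost mentioned in the conversation shows up when many such dependencies have to be threaded through every layer by hand.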
Starting point is 00:14:14 So then people have come up with IOC containers, which like get registered for what I need. One of these, I really give it one of those. And then I create this object by giving it three of these things at once. And that starts to get really hard in my mind to know like, okay, what is being done here? Like I see a bunch of abstract types and I can't even tell. An example of like, you don't know what database you're going to use. Another, you can use the injection thing, but it kind of ripples through a whole bunch of layers of code, and that is the part that I don't like. Whereas another way to do this is to kind of bypass all of the middle stuff and at a top level have, and like Flask, I think Flask does this sort of a
Starting point is 00:14:59 thing, and this is a common design, is to instantiate the real objects at an application level and just set those where they need to be set. So there's like a whatever the real database is. Right. Go look up the service for the database and everybody can ask that thing to give it the database. Right. And then everybody just uses the same interface and we don't need to pass it through all the levels of constructors and stuff. It can just kind of bypass all of that, I guess, because that's how I generally do things. And then for testing, yeah, I'm okay with patching and monkey patching and stuff like that. So I hear you. I think in Python it is certainly something that's more open for debate, because we do have these alternative ways to accomplish the same thing
Starting point is 00:15:45 like monkey patching. Now, I don't know, I'm kind of a fan of the open-closed principle in general, but I do think it can just become like too much when you put it all together. And certainly I've worked on some applications that did this all over the place, and it was some of the most frustrating code I've ever had to like work through, because at every step you're like, there's four things working together and I don't know what any of them are right now because of some configuration setting somewhere. Other than that, I don't know, I'm on the fence. Some parts of this I think are cool and some I think can go too far. But I guess, you know, check out Haps if this kind of stuff is interesting to you. It is pretty well done. I think that one of the places for it is if people are really used to using this kind of a model
Starting point is 00:16:29 and then coming to Python, yeah, you can do it here too. It's just I'm not sure I'm there. Yeah, I think there's simpler things than IOC containers, but this podcast is probably a little short if we're going into them. But it's certainly an interesting thing to think about, and here's a bunch of options. Yep. Cool. You know, after all that, Brian, I feel like I just need something gentle, like a gentle conversation about like a soft, fuzzy animal. Yeah. Like a gentle introduction to pandas. Yes. Well, maybe not an animal, but yeah, something gentle. Tell us about pandas. So this is another kind of a newbie thing, but we're starting to use Pandas DataFrames at work.
Starting point is 00:17:05 And I really kind of needed a pretend I'm just starting out, which I am, and kind of tell me how these things work. And so it's called a gentle introduction to Pandas, but it's really a gentle introduction to the data structures Series and DataFrame. The Series are interesting. I think it's just a precursor to try to jump you into DataFrames. That's where the real fun starts to happen. It goes through about half an article talking about arrays, Series, how do you create Series from arrays and dictionaries. And I didn't know you could create a Series from just a scalar and give it an index and it'll like fill it in. That's pretty cool. Oh, that is cool. Yeah. But then it jumps into
Starting point is 00:17:50 DataFrames and then talks about sorting and slicing and how do you select things by label or position, and then how easy it is to get the statistics on columns, and then how to get things in and out of DataFrames. So importing and exporting. And then where you take it from there depends on your problem space, but this is kind of a really good why do we call these things data frames
Starting point is 00:18:15 and why do we care about them? If you need to understand them, this is a decent article. Yeah, if you need to understand them, 15 minutes. This is kind of a no-fluff keep it simple one. Nice little article by Wilson Busaca. Well done. Let's see. Medium tells me it's a five-minute read, but I bet Medium's not taking into account the code. So 15 minutes, how about that? Yeah, I think so. Right. So this last one I have for you, Brian, I think it's going to be a little bit of a shock. It'll come across a little bit weird at first,
Starting point is 00:18:46 but the more you look at it, the more it starts to sound appealing, let's say. Yeah. All right, so I'm going to give you some advice. I'm going to tell you a bit about it. So the advice, you know, you also get all sorts of advice, like don't format your code like this. Don't have a bunch of multiple,
Starting point is 00:19:02 if this is equal to this value and that value and that value, maybe do an in test. So there's like sort of Pythonic ways to do conditionals and whatnot. The advice here is to never, not almost never, says don't use the greater than sign in programming. Yeah. It's crazy, right? It seems like kind of a bold statement. I'm like, well, we have it. It must be useful somewhere. It must be useful. And why would we not want to use it? So this is an article by a good friend of mine, Llewellyn Falco, who I've known for a long time, but someone else sent me this article,
Starting point is 00:19:34 which I thought was a pretty interesting coincidence. And Llewellyn has a really interesting way of like looking at straightforward stuff and then just getting it down to its essence. So he says like, let's look at this problem. Let just getting it down to its essence. So he says, like, let's look at this problem. Let's suppose I want to check whether a number, let's call it x, it's a variable, is between 5 and 10. There are a lot of ways that we can do this. We could say x greater than 5 and 10 greater than x, or we could say x greater than 5 and x less than 10, right? Those are equivalent. But why should you choose one over the other? Well, he lists off these six different
Starting point is 00:20:14 ways of doing this. He says, actually, here's all the ways. Oh, no, wait, look, one of them is wrong. Go back and figure out which one is wrong. And it's like not very obvious. You know, you kind of got to go through and think through every little bit. Right? So he says, look, if you remove the greater than sign, there's actually only two ways to say this: x less than 10 and 5 less than x, which is kind of weird, or 5 less than x and x less than 10. So in that last one, it's cool because the variable you're trying to test between 5 and 10 is literally between the 5 and the 10 in that statement. It's in text, it's between, and it's actually between. Yeah.
Starting point is 00:20:51 So here you can test this containment interval bit completely with no greater than. That's how I code. I think of, especially with numbers, I think that all of the comparisons need to kind of be on the number line. Yes. You can think about them easier. I've never really seen it as put in place as a rule, kind of a rule of thumb of just don't use the greater than sign. Yeah, it's really interesting. And this analogy back to the number line is perfect because it's like, well, where do you want the variable to be relative to this?
Starting point is 00:21:22 So if you want it to be between, then as you say, like 5 less than x and x less than 10, right? So it's between. If you want to test that it's outside there, you could do the same thing: x less than 5 or 10 less than x, and you put the variable outside the numbers, right? So you can do this number line sort of relative bit with both, you know, and and or, and containment and not contained in and things like that. We'd kind of be remiss if we didn't mention that this article is referencing all programming languages. If you're doing Python, of course, you would just say 5 is less than x is less than 10. You don't need the and. Nice. And also somebody said, okay, I'm all for, I follow you on this. This is great, and I'm with you,
Starting point is 00:22:05 but how do you say I would like all the numbers greater than one without the greater than sign? And so the answer is, of course, one less than X. Yeah, there's times where it's a little, that's why it's more of a rule of thumb, I think, because there's times where it just doesn't look right, and you have to go for maintenance. If it just looks weird, then change it.
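Written out in Python, the number-line comparisons from this part of the conversation might look like the sketch below. The function names and the default bounds of 5 and 10 are chosen here only for illustration; the comparisons themselves are the ones described in the discussion.

```python
def is_between(x, low=5, high=10):
    # Language-neutral form: the variable sits between the two bounds.
    return low < x and x < high


def is_between_chained(x, low=5, high=10):
    # Python's chained comparison: same meaning, no "and" needed.
    return low < x < high


def is_outside(x, low=5, high=10):
    # The variable sits outside the bounds, on either end of the line.
    return x < low or high < x


def is_greater_than_one(x):
    # "All the numbers greater than one" without the greater than sign.
    return 1 < x
```

Reading each expression left to right matches the number line: smaller values on the left, larger values on the right, and the variable placed where it is expected to fall.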
Starting point is 00:22:34 I brought this in because I thought it was interesting. And when I first read it, I'm like, well, that's dumb advice. What is this? And I read it, I'm like, actually, no, this makes a lot of sense a lot of the time. But I agree, if you have one thing, you want to say X is greater than one, you know, don't twist around so you don't have to have the greater than sign, just say the most straightforward thing. But if you're doing more complicated comparisons, then I think it's worthwhile. Yeah. Like I would say, for instance, a series of if clauses, if you're not really testing both ends, if you're doing like if X is greater than the max, then do something.
Starting point is 00:23:07 And if all the comparisons have X on the left, I wouldn't change it just because of this. But, you know, anyway. All right, Brian, well, that's it for all of our main topics. I got a few extras to share with everyone while we're here, just really quick and short things. And, of course, not to be forgotten is our joke, but you got any extras to share with everyone?
Starting point is 00:23:23 I did mention last time that I was having some issues with PythonTesting.net. I think I mentioned that, but with SSL and stuff, but that's all resolved and fixed. So if I go over here and I pull this up in Chrome, is it going to tell me that it's secure?
Starting point is 00:23:40 It should. Nice. Yeah, testing code over SSL. Beautiful. It's still kind of a WordPress thing is what I use. And I'm not thrilled with that. So I have a side project going on to convert that to something else. But it's not urgent anymore. Yeah, that's good. Well, you'll have to give us the full report once you get it all fixed up.
Starting point is 00:23:58 Okay, so you said you got a bunch of stuff for us. I do. I'll go through them quick. First of all, there's a new Python podcast, which is pretty exciting. And this one is focused on teaching Python. And do you know what the name of it is?
Starting point is 00:24:10 I think it's probably Teaching Python. Yeah, it is. So, Teaching Python is by Kelly Paredes and Sean Tibor. Sorry about messing up the names. But they're doing a podcast.
Starting point is 00:24:21 These are two middle school teachers who are learning and teaching Python to their students and basically documenting that journey. So if you're interested in that, especially if you're a teacher or you work with kids, I think this will be really, really helpful for you.
Starting point is 00:24:34 So you can check that out. I'm about halfway through the backlog so far and I really like it so far. Yeah, they're doing a nice job. One of the things that had kept people from using GitHub for their private work was that you had to pay for private repositories on GitHub, no matter what. Yes. Right. So people would use Bitbucket because Bitbucket had
Starting point is 00:24:55 free private repositories. Well, GitHub decided we're also going to have free private repositories. So if you're working on projects that they have to stay private or you just want them private, you can now use GitHub without paying anything. There's been some weird reactions to it, but they're just sort of following the model of Bitbucket and GitLab now, so I don't think there's anything weird
Starting point is 00:25:17 going on. Exactly. Competition is a good thing, and here we have it. It's not entirely free. It's not like GitHub decided they're not going to make money anymore. You can only have three contributors to the private repository, and so there's limits and things like that, but still pretty cool for most things. Yeah, right. Also, very quickly, some early details about EuroPython are available and it's looking pretty sweet. I'd love to go. I don't know if I'll be able to. Yeah, me too. Yeah, so they just announced EuroPython. It's going to be in Basel, Switzerland, July 8th to 14th.
Starting point is 00:25:50 And it looks great. So I put a link to the conference site there. I don't think they have call for papers or anything like that out yet, but it should be out pretty soon. Another thing that has been lacking in the world is good data center support in Africa. So I know this because I use AWS to deliver the video course content, like actual the videos. And I have streaming servers all over the place, like in Brazil or Mumbai or whatever, but there's just no way to
Starting point is 00:26:22 do that in Africa. So the big news is there's an AWS data center coming to South Africa, which is pretty cool for anyone that wants to be closer to that part of the world. And finally, Pandas is dropping legacy support. No more Python 2 in Pandas. Oh, cool. Yeah, and that's coming out like this month. So it should be good. Yeah, this is the year that a ton of projects are dropping Python 2.
Starting point is 00:26:47 Yeah, for sure. So one more major thing. We already covered how cool Pandas is. It's not going to support legacy Python anymore. All right. You ready for the joke? Yeah. Can I click on it now?
Starting point is 00:26:59 You can click on it. This is a visual one, but I can describe it to you folks. Now, I just got to do a quick little bit of history here for people who maybe have not seen Harry Potter. So this is the Harry Potter joke. And there's a point in the Harry Potter movie, I think this might be the first one, where Harry Potter has to get on this like long table and is battling, I don't know, someone, something, and all the other students are standing around. And somebody like conjures a snake, a serpent, and Harry, in the real show, Harry starts speaking to the thing in its native tongue, which apparently is a freaky thing to do, and people were all freaked out. And it
Starting point is 00:27:37 was called Parseltongue, something like that, right? That he could speak snake. So with that, here's the joke. So there's a picture. Harry's fighting the snake in that environment. And he says, import os, current path equals os dot get current working dir, and just starts speaking out Python commands at the snake. And Hermione says, I didn't know Harry spoke Python. And Ron Weasley says, yeah, he's a parser tongue. That's terrible. It's really bad.
Starting point is 00:28:06 It's really bad. But there it is. And Nick Spirit sent that to us. So thank you, Nick, for finding that joke and letting us share it here. Yeah, very nerdy. Yep. He's a parser tongue. Well, I think we're going to leave it at that, Brian.
Starting point is 00:28:18 Thanks for being here. Yeah, thank you. Yeah, bye. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm.
Starting point is 00:28:32 If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
