Python Bytes - #194 Events and callbacks in the Python language!

Episode Date: August 10, 2020

Topics covered in this episode:
- An Introduction to Mutation Testing in Python
- asynq
- Redis: Beyond the Cache
- littletable
- pytest-timeout
- Events
- Extras
- Joke

See the full show notes for this episode on the website at pythonbytes.fm/194

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 194, recorded August 5th, 2020. I'm Brian Okken. And I'm Michael Kennedy. And this episode is brought to you by us, and we'll tell you more about what we're shilling later in the day. I want to talk to you about mutants. Mutants? Like Mutant Ninja Turtle type things, or what are we looking at here? Sure, Mutant Ninja Turtles. No, mutation testing. So I really... I think I'm warming to mutation
Starting point is 00:00:33 testing, and it's kind of a neat thing, and I think we've covered it before. But this article is from Moshe Zadka, and it's called "An Introduction to Mutation Testing in Python." There are a handful of, um, I think there's like two or three different mutation testing libraries. mutmut is one of them, and that's what this article uses. And if people are not familiar with mutation testing, here's the problem: you can use code coverage tools like coverage.py to show how much of your code your tests are covering. But even if you get to 100% coverage, it doesn't mean that you're really testing everything.

Starting point is 00:01:10 And so mutation testing, what it does is it takes your code under test and does some modifications. It modifies portions of your source code to simulate potential bugs. Like, for example, it'll replace a greater-than comparison with greater-than-or-equal. Those sorts of edge cases are often where we muck up; if there's no test around the boundary condition, you know, there'll be a problem. So every little change is considered a mutant, and it generates all these different mutants, and it does it in a fairly smart way. It can test your code fairly quickly without too many mutants.

Starting point is 00:01:47 And then it runs your test suite on each mutant, and the idea is your test suite should kill all of the mutants. So in this article he shows an example of three methods and one test case and 100% code coverage. But he runs mutmut, and 16 of the mutants survive, and then he talks about how to fix that. So it's a really good, quick article. Yeah, this is interesting.
Starting point is 00:02:12 And I like the emoji legend used for the output. Yeah, it's a cute library. Yeah, it is. You know, one thing that I don't understand about mutation testing is... I understand, okay, we're going to change the value of a variable, or the way we're doing a comparison: if it was less than, we're going to make it greater than and see if your tests still pass. Those kinds of things totally seem reasonable. But if it goes, and I don't know if it does, maybe, you know, if it goes and says, well, you're doing a print statement, so we changed part of the print string. Who's testing for that, right?

Starting point is 00:02:47 That seems like it would survive. Yeah, I'm not sure. So it seems like there are certain things, like I would just never care to test for the output of a print statement where a static string changes. To me, that just is not something I care to test, right? But I feel like the sort of general case

Starting point is 00:03:04 of mutation testing, you go, well, here's a variable that I need to change around. Let's change a string and see if the tests still pass. So, I don't know, maybe it's just inappropriate for those types of scenarios. Maybe you only test stuff at a lower level where you don't have a bunch of print statements. But you know, you've got logging

Starting point is 00:03:19 and all kinds of things. So, I don't know. But still, I do like the idea. I think mutmut and some of the others have ways to specify which kinds of mutants to generate. So I don't know if it handles the print statement sort of example, but I'm sure there's a way to say, yeah, I don't really care about that.

Starting point is 00:03:37 Don't modify string values, for instance. Yeah, yeah. Like don't modify constants or something, maybe. Who knows? Yeah. Cool. All right.
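To make that boundary-condition point concrete, here is a small hypothetical sketch (not the code from Moshe's article): a function with 100% line coverage where the mutant that turns >= into > survives until the boundary value itself gets a test. With the tool installed (pip install mutmut), you would typically run mutmut run and review survivors with mutmut results.

```python
# app.py -- hypothetical example, not from the article
def is_adult(age):
    # mutmut can flip ">=" to ">", creating a boundary-bug mutant
    return age >= 18


# test_app.py
from app import is_adult

def test_is_adult_full_coverage():
    # Every line runs, so coverage reports 100%, yet the ">" mutant
    # survives because the boundary value 18 is never checked.
    assert is_adult(30)
    assert not is_adult(5)

def test_is_adult_boundary():
    # Testing the boundary itself kills that mutant.
    assert is_adult(18)
```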
Starting point is 00:03:46 Well, you know, that looks really interesting. And Moshe does a great job writing up these types of things. We feature him a lot. Very cool. Next up, I want to talk about asynchronous programming. Oh, nice. Yeah. So maybe we've covered this before.

Starting point is 00:03:59 Now, we've covered async a lot, but I don't believe we've covered asynq. I don't think so. I don't think so either. So this is from Quora, and it is not brand new, I just want to be really upfront: this has been around since 2016, but it's pretty interesting. And the idea is, so much of what asynchronous programming, especially asyncio-style async and await programming, is about

Starting point is 00:04:24 is scaling while you're waiting, scaling the latencies, right? So, you know, like I'm going to call the Stripe service and it's going to take, you know, half a second to return, and so I want my web server to be able to go and do other stuff, other requests, instead of waiting for half a second while we're checking out some person or whatever, right? But they've got a different use case. What they're doing is they're running, I don't know if I said this is from Quora, they're running Quora.com, which is a really cool Q&A site. I actually think Quora does a great job of having solid, thoughtful answers. Not always right, but thoughtful at least, which is pretty cool. But what they do is they don't talk directly to their

Starting point is 00:05:05 database, because that would be too slow. Okay, we'll start with that. What they're doing is they're talking to Memcached, or, you know, Redis or whatever, but they're using Memcached to store a bunch of pre-computed query results so they don't have to keep going back to the database. Like, for example, when you go view a question, they want to show the names of the people who upvoted the question, right? So it's kind of a complicated query: I need to go, here are the IDs, maybe we store the IDs of the upvoters, then we're going to do a query, a join over on the user table, and get their names back, and then we're going to show it. That sounds expensive for lots of data. So what they do is they basically store those answers.
Starting point is 00:05:48 Like, this user goes to this thing, in Memcached. But a lot of the latency around this actually has to do with the network call. Like, it's pretty close, it's, you know, one millisecond or something, but they've got to go get those names over and over, right? Because the way that you store stuff in Memcached is: this ID has this name. You've got 50 upvoters.

Starting point is 00:06:09 It's like, give me the name of this person, give me the name of that person. So there's a way to do a, you know, a batch get: here are all these IDs, go get me all the associated names. And they've got this dependency tree of these sorts of questions they have to answer. So what they've done is they've come up with this thing called asynq, and it's all about batching asynchronous requests and converting them from a bunch of individual calls into one massive call.

Starting point is 00:06:32 Oh, okay. So they can do what looks like asynchronous programming, say, go get me all these things, and instead of doing a bunch of individual async-and-await-type calls, the system looks at that and goes, okay, what that means is: turn that into one giant query where all of these IDs go to all those things and then come back. Oh, that's right. Yeah, it's pretty neat. So it's basically this way to write code that will

Starting point is 00:06:55 take what would be a bunch of small independent requests and turn it into a one-shot request for talking to things like caching servers and whatnot. Yeah, yeah. So apparently this is a core component of Quora's architecture, and yeah, it's all about batching up these calls. I didn't know Quora was Python on the back end. Oh, yeah, yeah. They've got a really interesting engineering blog where they talk about all sorts of stuff. So this was written up on their engineering blog, about how they went from what they were doing before this, which was you'd have to write several functions that would prepare the things, and then you could ask for it because the results would be cached locally, and all sorts of funky stuff. So there's a great write-up on sort
Starting point is 00:07:36 of the whole use case of this. So this, like I said, is from 2016, so it predates async and await, and so they use the yield keyword, which is a more foundational way to break up functions into parts that run separately. So basically you decorate a function, and then you yield out the various steps, and before it executes all those, it looks at them, figures out what it has to do, and then it batches it up and does it all at once. Pretty wild. Yeah, neat. Yeah. So I thought this was, you know, kind of interesting. I think, just looking at the patterns here, I feel like it's a tiny bit limited because of what it was targeted at, at least back then. When they came up with it, and it's still active, I think they built it to run on Python 2.

Starting point is 00:08:26 Remember, this is 2016, and they've been running for a while. So some of their APIs... like, for example, they don't use the async and await keywords, I think because those didn't exist yet. They supported Python 3.4, where asyncio was, but async and await didn't come along until just a tiny bit later, I don't think. So anyway, take it with a bit of a grain of salt, but I think, you know, this will be a pretty interesting thing that people can adopt and use for these types of scenarios. Certainly, if it powers Quora, it's probably pretty good.
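To picture the yield-based batching idea, here is a rough, self-contained sketch of the concept. This is not asynq's actual API, just the general pattern: generators yield the keys they need, and a tiny scheduler turns those into one batched backend call.

```python
# A rough sketch of the batching concept (NOT asynq's real API): each task
# yields the key it needs, a tiny scheduler collects the keys, makes ONE
# batched backend call, and feeds the results back into the generators.

def display_name(user_id):
    # Instead of fetching immediately, yield the key and wait for the answer.
    name = yield user_id
    return f"user {user_id} -> {name}"

def run_batched(tasks, fetch_many):
    pending = [(task, next(task)) for task in tasks]   # each task yields one key
    results = fetch_many([key for _, key in pending])  # one round trip for all keys
    finished = []
    for task, key in pending:
        try:
            task.send(results[key])        # resume the generator with its answer
        except StopIteration as done:
            finished.append(done.value)    # the generator's return value
    return finished

def fake_fetch_names(ids):
    # Stand-in for a Memcached/Redis multi-get: one call, many keys.
    return {i: f"name-{i}" for i in ids}

print(run_batched([display_name(i) for i in (1, 2, 3)], fake_fetch_names))
# ['user 1 -> name-1', 'user 2 -> name-2', 'user 3 -> name-3']
```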
Starting point is 00:08:55 Yeah, neat. Cool. Absolutely. Yep. Another thing that's cool is Talk Python Training. Thank you. We've got a lot of stuff going on over there. Actually, we've got a ton of new courses coming: a course for people who generally live in Excel and should be adopting the Python data science tools, so that's coming really soon. We've got a Getting Started with Data Science. And actually, last time we spoke, I said, hey, I'm writing this course.

Starting point is 00:09:18 I started writing a course called Python Memory Management. I finished it, I recorded it, it's like a five-hour course, it's going to be awesome. So now I'm on to writing a new course, Python Design Patterns. Oh, nice. Yeah, so that'll be out in a few weeks as well.

Starting point is 00:09:31 How about you? Yeah, I just wanted to highlight again, I have the URL pytestbook.com set up to go directly to the errata, because I get a lot of people asking, hey, your pytest book, is it still good? It was, like, 2017. Still valid? Yes, it's still valid, but there are a few gotchas, and I list them out, very easy to read, at pytestbook.com. It directs you to an errata page to show you it's just a couple of tweaks to the source code you've got to make; you've got to pin TinyDB and a couple of other things.

Starting point is 00:10:05 And we'll try to get those changes out to the download link on Pragmatic as soon as possible. That's still in the works. But there's also a link, if you have any issues, to the official Pragmatic errata page where you can ask questions. And if you run into anything else, I'd love to hear about it. And I'm excited that lately, a lot of the people who've been contacting me to say they're excited about reading the book
Starting point is 00:10:31 are machine learning people. So it's kind of neat to see data science and machine learning people add testing to their workflows. That's exciting. Absolutely. So I have a final call to action for people out there. If you want to make sure that we have the time and energy to keep creating stuff like this podcast
Starting point is 00:10:46 and the other things we're doing, you don't necessarily have to get our stuff, but how about recommending it, right? If your company needs to get up to speed on Python, recommend that your company buy the courses for that team. Or if a company is doing a bunch of testing, have everyone on that team or the engineering group get Brian's book. That would be great.
Starting point is 00:11:07 Yeah, and then individually, too, remind people that we do have a Patreon campaign going, so people can contribute a buck or two a month. That would be great. Yeah, now that we don't go anywhere, we don't buy coffee. Yeah. Next, I want to talk about something that sort of ties into your async thing. Yeah, for sure. That's interesting.
Starting point is 00:11:26 They use Memcached, but I wanted to talk about Redis. So I've not used Redis myself, but I know that a lot of people do, for caching and for other things. And so this is an article, it's actually on the Redis site, called "Redis: Beyond the Cache" in Python, by Guy Royce, I think. I knew that Redis did more than just being a cache in front of a back-end database,

Starting point is 00:11:54 but this is kind of neat. These are good, clear examples of Python code using Redis for more than just caching. So the first example talks about how to use it as a queue, so you can set it up as a fast queuing system. And apparently there's a couple of calls named rpush and blpop. And actually, to tell you the truth, I picked this article because of blpop.

Starting point is 00:12:16 I think that's one of the best function names ever. I don't know what it means, but maybe back-of-the-list pop? Not sure, but it's good. I thought you picked it because the code example about putting stuff into queues felt close to home. Did it? Yeah, it's about Bigfoot sightings: we've got a sighting near the Columbia River, and people were chased by a tall hairy creature, and so on. So, like, asynchronously adding Bigfoot sightings from the general Pacific Northwest. Yeah, that's good. Sorry, carry on, didn't mean to derail you. No, no, it's good. So, using it

Starting point is 00:12:54 as a queue, using it in a pub/sub model. Apparently there are functions like publish and psubscribe, so you can do publish-and-subscribe models, data streaming, using it as a search engine. The search engine seems a little more hardcore, because it looks like it's almost SQL-like queries that you're using, but apparently you can do that. And of course, you can also use it as your primary in-memory database if you want to, as long as you don't need to store it somewhere else later. You know, I guess I'm just winging it here. I don't know how you hook up a Redis database to a normal database,

Starting point is 00:13:31 but I know you database people know how to do that. But I probably would use it. I like the idea of using it as a queue system for, like, multi-threads and multi-processes. That sounds kind of fun. This is a really cool article, because I just often think of Redis as a cache, right? But yeah, there's a bunch of neat stuff here.
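As a minimal sketch of the queue idea, assuming the redis-py client (pip install redis) and a Redis server on localhost: RPUSH appends to the right end of a list, and BLPOP is a blocking pop from the left end, which is what lets a consumer simply wait for work.

```python
# Minimal queue sketch with redis-py (pip install redis); assumes a Redis
# server running locally. RPUSH appends to the right of a list, BLPOP does
# a *blocking* pop from the left, so a consumer can just wait for work.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Producer: push a Bigfoot sighting onto the queue.
sighting = {"location": "Columbia River", "notes": "tall hairy creature"}
r.rpush("sightings", json.dumps(sighting))

# Consumer: block for up to 5 seconds waiting for the next item.
item = r.blpop("sightings", timeout=5)
if item is not None:
    queue_name, payload = item
    print(queue_name, json.loads(payload))
```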
Starting point is 00:13:50 And so often you think, oh, I'll just write this cool data structure, we'll just do this thing, and it's great. And you're like, oh wait, but hold on. When I deploy that to the web server, it forks off like 10 copies of uWSGI, and so I'm going to have like 10 separate copies of the data, and all this. There are just certain times you're like, I just need a thing to

Starting point is 00:14:10 hold this stuff, and Redis seems pretty cool for that. Yeah. And in the examples, apparently there are a bunch of different Python libraries to access Redis, and this one uses aioredis, because there are async and await calls to access everything. Yeah, it's beautiful. It's a real nice example of async and await as well there. Yeah. So I'm sure, Brian, you've heard of Little Bobby Tables. Yeah, of course. I think we've brought it up on this show. Yeah, I don't know if we've actually,
Starting point is 00:14:36 have we featured it as a proper joke? I don't know what we have. Nonetheless, this one is no joke. This is just littletable. What was I thinking? I know what I was thinking. I was curious, and I didn't want to commit as much effort as it turned out to be into having a broad discussion about this, but I thought, okay, well, we have dictionaries, and so I can go and find a single thing, passing a certain key, and then get the thing back, or not,

Starting point is 00:15:01 right? So if I've got, I don't know, users, I could have the user ID, and then the user object comes back if I index the dictionary like that. Totally simple, right? Yeah. What if we wanted to ask that question two ways on the same data structure? What if I wanted to say, give me the user by ID, and give me the user by email? So one possible way, I guess, is you could just cram all the IDs and all the emails into the dictionary. But then things like, you know, enumerating over dict.items breaks, because every now and then it's integers or it's strings, and then there's a duplicate of every user in .items or .values. So it's not really a great option. So I

Starting point is 00:15:45 said, does Python have a structure that is not a database, because I do not want to do database stuff; if I wanted to do that, I would just use a database. A thing that is lightweight, in memory, and easy to use, that lets me put something like a user in there, but then be able to ask, give me the user by ID, give me the user by email, and that is fast, right? So dictionaries work because they're indexed and they have insanely, you know, near O(1) type performance on getting back the content that's in there, right? So I want to be able to do that with both email and ID.
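For reference, the roll-your-own baseline for exactly that ask is one store behind two dictionaries, one per lookup key. A tiny hypothetical sketch (the User shape and names are made up):

```python
# One store, two dictionary indexes, so lookups by id and by email are
# both roughly O(1). A hand-rolled baseline, not a library.
from dataclasses import dataclass

@dataclass
class User:
    id: int
    email: str
    name: str

class UserStore:
    def __init__(self):
        self._by_id = {}
        self._by_email = {}

    def add(self, user):
        self._by_id[user.id] = user
        self._by_email[user.email] = user

    def by_id(self, user_id):
        return self._by_id.get(user_id)

    def by_email(self, email):
        return self._by_email.get(email)

store = UserStore()
store.add(User(1, "sam@example.com", "Sam"))
print(store.by_id(1).name, store.by_email("sam@example.com").id)
```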
Starting point is 00:16:14 a one type of performance on getting back the content that's in there, right? So I want to be able to do that both with email and ID, not. I'm going to go on this rant some more later. I'm actually trying to pull together all the responses i got because i got a bunch of things given back to me a lot of people suggested pandas but i want to store non-tabular data so i'm not sure pandas which is tabular ish
Starting point is 00:16:36 makes sense nonetheless one thing i did come up with that's probably the closest to what i was asking for without me doing any work which i'm not against doing work but if something exists you know let me pip install it right is this thing called light table by paul mcguire sorry not like a little table little table and it gives you a schemaless in-memory thing that's kind of like a dictionary but gives you orm x like access to the objects okay okay so it's like think of like an in-memory database basically that you don't have to go create table you know set column type name this to you know varchar 16 type and like you don't have to actually define the table like full-on database right you just say it you know, put these things in it,
Starting point is 00:17:25 like you would a dictionary, and then you can access all the elements. What do you think? I think I'd like to try to solve your problem also. It's a fun programming problem, right? But this thing is pretty cool because it lets you do like greater than queries. It has indexes on all of the columns or the columns that you say you want them on.
Starting point is 00:17:42 Like all you do is say, it's like creating a dictionary and say, I'm going to put in a thing dictionary and say i'm going to put in a thing by id i'm going to put a thing by email and put in a thing by city and i want to index for all of those so it's like dictionary like speed which is pretty cool it even does like in memory joins and all sorts of stuff so uh yeah okay yeah and the result of like a query can be like another little table so i could like do a filter and select only a couple of columns and then out comes a little baby little table a little even littler little table anyway i thought this was a pretty cool thing because it lets you kind of do database like stuff without the effort right do it dynamically some people said hey you should just use sql or sqlite i'm like yeah sqlite's cool but then i've got to come up with a full-on
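Based on a quick read of the littletable docs, usage looks roughly like the sketch below; treat the exact method names as assumptions rather than a verified API.

```python
# Rough littletable sketch (pip install littletable); method names here are
# based on a quick read of its docs, so treat the details as assumptions.
from types import SimpleNamespace
import littletable as lt

users = lt.Table("users")
users.create_index("id", unique=True)      # dictionary-speed lookup by id
users.create_index("email", unique=True)   # ...and by email

users.insert(SimpleNamespace(id=1, email="sam@example.com", city="Portland"))
users.insert(SimpleNamespace(id=2, email="kim@example.com", city="Bend"))

print(users.by.id[1].email)                  # unique index lookup
print(users.by.email["kim@example.com"].id)

# Queries return another (littler) table, so they can be chained or filtered.
portlanders = users.where(city="Portland")
print(len(portlanders))
```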
Starting point is 00:18:30 There are also some other options, but littletable looks good. Yeah, I'll have to get an example, get your actual problem statement again, and try it, but this looks neat. Yeah, absolutely. Yeah, well, I'll come back to that for sure as well, because I got so many good recommendations and ideas that I think it's probably worth just doing a segment on that. But littletable. Nice. This is something I'm surprised we didn't talk about already.

Starting point is 00:18:58 Maybe we have, but I've forgotten: pytest-timeout. This was a listener suggestion, and I think it's pretty much an essential plugin for any test suite that you're running, especially if it's not something you're watching while it runs. So if it's something running on a server or continuous integration, or if it's a long-running test suite, it's a very simple-to-use plugin. And what you want to make sure is that

Starting point is 00:19:26 none of your tests run longer than a certain number of seconds. For all the people out there scratching their heads thinking, wow, there's a test that runs longer than a second? Yes, there are tests that run longer than a second, especially if they're trying to talk to hardware or external things, and that thing might not be there and it's just waiting. Yeah, there's more to testing than unit testing. There's also system testing. But anyway, this one's great because you can set up a configuration in the

Starting point is 00:19:52 config file. You can throw one number in to say, say you have like five minutes or something like that, or even just down to like three minutes: I want to make sure nothing runs longer than this, just to make sure that the server doesn't sit spinning all night long.
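A rough sketch of what that looks like, assuming the plugin is installed with pip install pytest-timeout: the global limit goes in the pytest config, and the marker shown here is how individual slow tests get more room, which comes up next.

```python
# Sketch of pytest-timeout usage (pip install pytest-timeout).
# The global limit typically lives in pytest.ini / setup.cfg / tox.ini, e.g.:
#
#   [pytest]
#   timeout = 300            # no single test may run longer than 5 minutes
#
# Individual slow tests can then ask for a bigger budget with a marker:
import pytest


@pytest.mark.timeout(600)    # this hardware-ish test gets 10 minutes
def test_slow_external_device():
    ...


def test_normal_thing():
    # Covered by the global 300-second limit from the config file.
    assert 1 + 1 == 2
```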
Starting point is 00:20:09 And then, well, let's say you even tighten it closer to try to kill off a test if it's running longer than a certain amount, but there are maybe two of your tests, or a few of them, that are longer. You can put a decorator on those particular tests and give them more time, and then keep the rest of them shorter. It's very easy to operate, and just kind of a must-have for

Starting point is 00:20:31 long test suites. Yeah, that's super cool. Yeah, I mean, sometimes you'd just rather have the test fail if it's taking way, way, way too long, and you're like, I'm pretty sure this is going to fail, just not right away. I would recommend just trying it out: look at the timing of your tests and stuff, and then set it so that it actually kills one of your tests in the middle, or stick a sleep or a spin in there or something like that, just to verify it does,

Starting point is 00:20:55 because it is sort of operating-system dependent, and there's some configuration allowed in the plugin to use either signals or process killing; there are different ways to, um, stop a test that's going too long. So test it before you deploy it, but it's a good thing. Do a meta-test. Yeah, a test of your test. Exactly. Super cool. Okay, that's a great one, and, you know, the use case is straightforward. I have got one for you that has got me really, really excited.
Starting point is 00:21:26 It's called Events. So in Python, we have functions as first-class objects, right? You can pass a function around super easily. Like, if some part of your program is going to run and you want a function called when it's done, you can pass that function in; it does its work, and it can call it, right? You have this kind of observer-style programming. Right, yeah. What requires programming on your behalf is to have that happen for more than one thing. Like, I would like parts of my program to subscribe

Starting point is 00:21:57 to being notified about events, and one or more of them get called when this thing happens. So a friend of mine, Nicola Iarocci, put together a really cool project called Events. And the idea is that it adds event subscription and callbacks to the Python language in a super simple way. So, given a thing that has an event: if I want my function to be called by it, say the event is on_change, I would say my_class.on_change += the function to call. And if there's already one

Starting point is 00:22:33 there, it's just going to add it to the list of all the functions that'll be called when that event fires. And if at some point I decide I don't want to hear about it anymore, I just go my_class, or my_object, .on_change -= the function I want to take out of that subscription list. And that's it. Oh, that's neat. Isn't that slick? And then to call it, you just say object.on_change and pass the arguments, and all those functions get called in order. Oh, this is cool.
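Here is a short sketch of how that reads in code, using the Events package (pip install events); the Thing class and the on_change event name are made up for illustration.

```python
# Sketch using the Events package by Nicola Iarocci (pip install events).
# The Thing class and the on_change event name are made up for illustration.
from events import Events

class Thing:
    def __init__(self):
        # Restricting the allowed event names means a typo fails loudly.
        self.events = Events(("on_change",))
        self._size = 0

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        self._size = value
        self.events.on_change(value)      # fire: every subscriber is called in order

def log_change(value):
    print(f"size changed to {value}")

thing = Thing()
thing.events.on_change += log_change      # subscribe
thing.size = 42                           # prints: size changed to 42
thing.events.on_change -= log_change      # unsubscribe
thing.size = 7                            # nothing printed
```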
Starting point is 00:22:57 Yeah. So if you have to do any sort of observer design pattern, event subscription stuff, this is super, super nice. And it's inspired by the C# language's event keyword, which is based on delegates, basically function pointers. It doesn't really matter if you know about that or care about it, but if you know about the C# version, this basically brings that to the Python language. Yeah, I kind of want to build up a finite state machine using this. It's cool, right? I mean, it could make it really readable. Yeah. I have a gist that I'm working on,

Starting point is 00:23:28 or I have some code I'm working on that I'll post as a gist people can check out, that is a lot better than what they have in the documentation. So the documentation takes this raw event source and shows you how you can subscribe and unsubscribe to it. But what I've got is something that's like,

Starting point is 00:23:42 here's how you have a class, right? Like, you know, a thing on the screen, and then you could subscribe to when the location changes or the size changes, or, you know, those kinds of things. It's more of a natural programming analogy. So I'll put up the gist for that; I'm just working on a few things to see if I can make it even slightly better. I'm seeing if I can use descriptors so that the event triggering happens behind the scenes without you even having to program it as well. So, like right now, from the outside,

Starting point is 00:24:10 using it is really easy, but you do have to sort of know when something's changed and then call that, raise that event. I think I can use descriptors to maybe make it seamless on both sides, but I'm still playing with that. Now, do you know if all of the events get

Starting point is 00:24:25 called by the thing changing, the thing making the event happen? Yes, they do. Yeah, okay. Yeah, yeah. So they get called by whatever decides to raise the event; that's the thing that's doing the calling. The events just basically manage what functions are to be called, and in what order, and then you call it and it just delegates on to them. Also, you get to arbitrarily pick the parameters that get passed along, but it seems like a good idea to say this event always takes these kinds of arguments, and whatever. There's not a lot of structure there. The only real safety you do get is, when you create it, you can say these are the only allowed events,

Starting point is 00:25:05 because it's kind of just full-on dynamic programming. But you can say: these three things, you can subscribe and unsubscribe and call; anything else, we're going to say it doesn't exist. So that's pretty nice. Yeah, yeah. Yeah, it provides a little safety. Cool.
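For the descriptor idea, here is a generic sketch (not Michael's gist) of how assigning to an attribute can fire an event automatically, so the class author never has to remember to raise it:

```python
# A generic sketch of the descriptor idea being described (not Michael's
# gist): assigning to the attribute fires on_change automatically.
class Observed:
    """Data descriptor that raises on_change when the attribute is set."""
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
        obj.on_change(self.name, value)   # fire behind the scenes

class Shape:
    size = Observed()
    location = Observed()

    def __init__(self):
        self._subscribers = []

    def subscribe(self, fn):
        self._subscribers.append(fn)

    def on_change(self, *args):
        for fn in self._subscribers:
            fn(*args)

shape = Shape()
shape.subscribe(lambda attr, value: print(f"{attr} changed to {value}"))
shape.size = 10          # prints: size changed to 10
shape.location = (3, 4)  # prints: location changed to (3, 4)
```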
Starting point is 00:25:20 Yeah. Well, that's our six items. Do you have any extras for us? Not really. I sort of already talked about it; I was going to talk about it here, but I mentioned it when we talked about what we're doing and how people can support us: I finished the Python Memory Management course. The thing is so cool. It's a five-hour course just diving into the internals of Python memory management

Starting point is 00:25:38 algorithms. And what I thought I would create was something like "understanding Python memory management," but there's actually a ton of techniques I discovered that let you run your code in a way that's like, well, now it uses half as much memory and it's 30% faster, and stuff like that. So I didn't think there would be a lot of actionable stuff coming out of it, but there is, which I think is pretty cool, actually. Oh, nice. Yeah. How about you? I'm pretty excited that pytest 6 is out. A couple of weeks ago, we talked about 6 being in sort of a beta release, but it's out now. And I wanted to mention that episode 125 of Test & Code
Starting point is 00:26:15 walks through those changes. This is due to the miracles of time travel. This has not been recorded yet, but it will be recorded and released by last week. Perfect. Time travel. I love it. You've chosen the perfect joke. So the only question I have for you before we do the joke is, am I the school administrator IT person or am I the mom? Oh, you be the mom. Okay.
Starting point is 00:26:40 Okay. So the phone rings, I pick it up. Yeah. Hi, this is your son's school. We're having some computer trouble. Oh, dear. Did he break something? In a way. Did you really name your son Robert? Robert, single quote, parentheses, semicolon, drop table, students, semicolon, minus, minus.
Starting point is 00:27:01 Oh, yes. Little Bobby Tables, we call him. Well, we've lost this year's student records. I hope you're happy. And I hope you've learned to sanitize your database inputs. Be on the lookout for that SQL injection, baby. I love it. This is so good. This is absolutely one of the most classic computer jokes there is. Yeah, I love it, because it probably would actually work. It reminds me of the guy who got his license plate to be the characters N-U-L-L, null. Yeah, I heard about that. Yeah, and he ended up getting all the automated, you know, you-drove-through-a-traffic-light sort of

Starting point is 00:27:40 tickets for all the records that were null. Yeah. Any time they didn't have data, it went to him. Any police officer that forgot to enter the license plate, it would go to him. He thought he would get out of it because they wouldn't be able to send it to him. But oh, no. That's hilarious. Awesome. Awesome. All right.
Starting point is 00:28:01 Well, great to chat with you as always. All right. You too. Bye. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
Starting point is 00:28:19 We're always on the lookout for sharing something cool. This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
