Python Bytes - #194 Events and callbacks in the Python language!
Episode Date: August 10, 2020
Topics covered in this episode: An introduction to mutation testing in Python; asynq; redis: Beyond the Cache; LittleTable; pytest-timeout; Events; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/194
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 194, recorded August 5th, 2020.
I'm Brian Okken.
And I'm Michael Kennedy.
And this episode is brought to you by us, and we'll tell you more about what we're shilling later in the day.
I want to talk to you about mutants.
Mutants? Like Mutant Ninja Turtle type things, or what are we looking at here?
Sure, Mutant Ninja Turtles. No, so mutation testing. I think I'm really warming to mutation
testing, and it's kind of a neat thing. I think we've covered it before, but this article is from
Moshe Zadka and it's called An Introduction to Mutation Testing in Python. There are a
handful of, I think there's like two or three different, mutation testing libraries.
mutmut is one of them, and that's what this article uses.
And so if people are not familiar with mutation testing, here's the problem.
So you can use code coverage tools like coverage.py to show how much of your code your tests are covering.
But even if you get to 100% coverage,
it doesn't mean that you're really testing everything.
And so mutation testing, what it does is it takes your code under test
and it does some modifications.
So it modifies portions of your source code to simulate potential bugs.
Like, for example, it'll replace a greater-than comparison with greater-than-or-equal,
or the other way around. Those sorts of edge cases are often where we muck up.
If there's no boundary test around the boundary condition, you know, there'll be a problem.
So every little change is considered a mutant and it generates all these different mutants and it
does it in a fairly, fairly smart way. It can test your code fairly quickly with not too many mutants.
And then it runs your test suite on the mutant and the idea is
your test suite should kill all of the mutants. So in this article
he shows an example of three methods and one test case
and 100% code coverage. But he runs mutmut
and 16 of them survive
and then talks about how to fix that.
So it's a really good, quick article.
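The gap between coverage and real testing can be sketched without any tooling. This is a hand-made illustration, not output from mutmut and not the article's example: the function names are made up, and the "mutant" is written by hand to mimic the kind of comparison swap described above.

```python
def can_vote(age: int) -> bool:
    """Original code under test."""
    return age >= 18


def can_vote_mutant(age: int) -> bool:
    """The same function after a hand-made mutation: >= became >."""
    return age > 18


def weak_test(func) -> bool:
    """Executes every line (100% coverage) but never checks the boundary."""
    return func(30) is True and func(5) is False


def boundary_test(func) -> bool:
    """Adds the boundary case, which is what tells the two versions apart."""
    return weak_test(func) and func(18) is True


# The weak test passes for both versions, so the mutant "survives".
print(weak_test(can_vote), weak_test(can_vote_mutant))
# The boundary test passes only for the original, so the mutant is "killed".
print(boundary_test(can_vote), boundary_test(can_vote_mutant))
```

A mutation testing tool automates the part done by hand here: it generates the altered versions and reruns the existing test suite against each one.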
Yeah, this is interesting.
And I like the emoji legend it uses for the output.
Yeah, it's a cute library.
Yeah, it is.
You know, one thing that I don't understand about mutation testing
is, I understand, okay, well, we're going to change like the value of a variable, or if we're doing a comparison and it was less than,
we're going to make it greater than and see if your tests still pass. Those kinds of things
totally seem reasonable. But if it goes, and I don't know if it does, maybe you know, if it goes
and says, well, you're doing a print statement, so we changed part of the print string. Who's testing for that, right?
That seems like it would survive.
Yeah, I'm not sure.
So it seems like there's certain things,
like I would just never care to test
for the output of the print statement
where the static string changes.
To me, that just is not something I care to test, right?
But I feel like in the sort of general case
of mutation testing, you go, well, here's a
value that I need to change
around. Let's change a string and see if the
tests still pass. So, I don't know, maybe
it's just inappropriate for those types of
scenarios. Maybe you only test stuff
at a lower level where you don't have a bunch
of print statements. But you know, you've got logging
and all kinds of things. So, I don't know. But still,
I do like the idea.
I think mutmut and some of the others have ways to specify
which kinds of mutants to generate.
So I don't know if it does
the print statement sort of example,
but I'm sure that there's ways to say,
yeah, I don't really care about.
Don't modify string values, for instance.
Yeah, yeah.
Like don't modify constants or something, maybe.
Who knows?
Yeah.
Cool.
All right.
Well, you know, that looks really interesting.
And Moshe does a great job writing up these types of things.
We feature him a lot.
Very cool.
Next up, I want to talk about asynchronous programming.
Oh, nice.
Yeah.
So we, maybe we've covered this before.
No, we've covered this a lot, but I don't believe we've covered asynq.
I don't think so.
I don't think so either.
So this is from Quora, and it is not brand new.
So I just want to be really upfront.
This has been around since 2016, but it's pretty interesting.
And the idea is, so much of what asynchronous programming,
especially asyncio-style async and await programming, is about
is scaling while you're
waiting, scaling the latencies, right? So, you know, I'm going to call the Stripe service and it's
going to take, you know, half a second to return, and so I want my web server to be able to go and just
do stuff, you know, other requests, instead of waiting for half a second while we're checking
out some person or whatever, right? But they've got a different use case. What they're doing is they're running,
I don't know if I said this is from Quora, they're running Quora.com, which is a really cool Q&A
site. I actually think Quora does a great job of having solid, thoughtful answers. Not always right,
but thoughtful at least, which is pretty cool. But what they do is they don't talk directly to their
database, because that would be too slow. Okay, we started on that, but what they're doing is they're
talking to memcached, or, you know, Redis or whatever, but they're using memcached to store
a bunch of pre-computed query results so they don't have to keep going back to the database.
Like, for example, when you go view a question, they want to show the names of the people who upvoted the question, right? So it's
kind of a complicated query, right? Here's the IDs, maybe we store the IDs of the
upvoters, then we're going to do a query, a join over on the user table, and get their names back, and
then we're going to show it. That sounds expensive for lots of data.
So what they do is they basically store those answers.
Like this user goes to this thing in memcached.
But a lot of the latency around this
has to do actually with the network call.
Like it's pretty close.
It's like, you know, one millisecond or something,
but they've got to go get those names over and over, right?
Because the way that you store stuff in memcached is this ID has this name.
You've got 50 upvoters.
It's like, give me the name of this person, give me the name of that person.
So there's a way to do like a, you know, a batch get like, here's all these IDs, go get
me all the associated names.
And they've got like this dependency tree of these sorts of questions they have to answer.
So what they've done is they've come up with this thing called asynq,
and it's all about batching asynchronous requests
and converting them from a bunch of individual calls
into one massive call.
Oh, okay.
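The batching idea can be sketched with plain generators. This is not asynq's actual API: `FakeCache`, `upvoter_name`, and `run_batched` are hypothetical names for illustration, and the cache is an in-memory stand-in that only counts round trips.

```python
from typing import Dict, Generator, List


class FakeCache:
    """Hypothetical stand-in for a cache server; counts round trips."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data
        self.round_trips = 0

    def get_multi(self, keys: List[str]) -> Dict[str, str]:
        # One call, many keys: this is the "batch get".
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def upvoter_name(user_id: int) -> Generator[str, Dict[str, str], str]:
    """Yield the cache key this task needs, then receive the batch result."""
    key = f"name:{user_id}"
    results = yield key
    return results[key]


def run_batched(cache: FakeCache, tasks: list) -> List[str]:
    """Collect every task's key, fetch them in one call, resume each task."""
    keys = [next(task) for task in tasks]  # run each task up to its yield
    results = cache.get_multi(keys)        # a single round trip for all keys
    answers = []
    for task in tasks:
        try:
            task.send(results)             # hand the batch back to the task
        except StopIteration as done:
            answers.append(done.value)     # the task's return value
    return answers


cache = FakeCache({"name:1": "Ada", "name:2": "Grace", "name:3": "Guido"})
names = run_batched(cache, [upvoter_name(i) for i in (1, 2, 3)])
print(names, cache.round_trips)
```

Each task is written as if it fetched a single value, while the scheduler decides how the requests actually reach the cache.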
So they can do what looks like asynchronous programming,
say, go get me all these things,
and instead of doing a bunch of individual async and await type calls,
the system looks at that and goes, okay, what that means is
turn that
into one giant query where it's like, all of these IDs go to all those things, and then return them
back. Oh, that's right. Yeah, it's pretty neat. So it's basically this way to write code that will
take what would be a bunch of small independent requests and turn it into like a one-shot request
for talking to things like caching servers and whatnot. Yeah. Yeah, so apparently
this is like a core component of Quora's architecture, and yeah, it's all about batching up these calls.
I didn't know Quora was Python on the back end. Oh yeah, yeah, they've got a really interesting Python
blog, like an engineering blog, where they talk about all sorts of stuff. So this was
written up on their engineering blog, about sort of how they went from what they were doing before this, which was you'd have to
write several functions that would prepare the things, and then you could ask for it because
they would be cached locally, and all sorts of funky stuff. So there's a great write-up on sort
of the whole use case of this. So this, like I said, is from 2016, so it predates async and await, and so they use the yield keyword, which
is sort of a more foundational way to break up functions into parts that run. So basically
you decorate a function and then you yield out the various steps, and then before it executes all
those, it looks at them, figures out what it has to do, and then it batches it up and does
it all at once. Pretty wild. Yeah, neat. Yeah. So I thought this was, you know, kind of interesting. I think it's a little
bit, just looking at the patterns here, I feel like it's a little tiny bit limited, because it's
targeted at, I believe, at least back then, right? So when they came up with it, and
it's still active, I think they built it for running on Python 2.
Remember, this is 2016, and they'd been running for a while.
So some of their APIs, I don't think, like for example,
they don't use the async and await keywords,
I think because those didn't exist.
Like, they supported Python 3.4, where asyncio was,
but async and await didn't come along until just a tiny bit later,
I don't think. So anyway, a bit of a grain of salt, but I think, you know, this will be a pretty interesting
thing that people can adopt and use for these types of scenarios. Certainly if it powers Quora,
it's probably pretty good. Yeah, neat. Cool. Absolutely. Yep, another thing that's cool is
Talk Python Training. Thank you. We've got a lot of stuff going on over there. Actually, we've got a ton of new courses coming.
A course for people who generally live in Excel
and should be adopting the Python data science tools.
So that's coming really soon.
We've got a getting started with data science.
I just actually, last time we spoke, I said,
hey, I'm writing this course.
I started writing a course called Python Memory Management.
I finished it.
I recorded it.
It's like a five-hour course.
It's going to be awesome.
So now I'm on to writing a new course, Python Design Patterns.
Oh, nice.
Yeah, so that'll be out in a few weeks as well.
How about you?
Yeah, I just wanted to highlight again,
I have the URL pytestbook.com set up to go directly to the errata,
because I get a lot of people asking,
hey, your pytest book, is it still good? It was
like in 2017. Still valid? Yes, it's still valid. But there's a few gotchas. And I list them out,
very easy to read, at pytestbook.com. It directs you to an errata page to show you. It's just a
couple tweaks to the source code you've got to make. You've got to pin TinyDB and a couple other things.
And we'll try to get those changes out to the download link on Pragmatic as soon as possible.
That's still in the works.
But there's also a link to, if you have any issues,
there's a link to the official Pragmatic errata page where you can ask questions.
And if you have run into anything, I'd love to hear about it.
And I'm excited that
lately a lot of the people that have been contacting me
and said they're excited about reading the book
are machine learning people.
So it's kind of neat to see data science
and machine learning people add testing to their workflows.
That's exciting.
Absolutely.
So I have a final call to action for people out there.
If you want to make sure that we have the time and energy
to keep creating stuff like this podcast
and the other things we're doing,
you don't necessarily have to get our stuff,
but how about recommending it, right?
If your company needs to get up to speed on Python,
recommend that your company buy the courses for that team.
Or if a company is doing a bunch of testing,
have everyone on that team or the engineering group get Brian's book.
That would be great.
Yeah, and then individually, too, remind people that we do have a Patreon campaign going.
So people can contribute a buck or two a month.
That would be great.
Yeah, now that we don't go anywhere, we don't buy coffee.
Yeah.
Next, I want to talk, this sort of ties into your async thing.
Yeah, for sure.
That's interesting.
But they use memcached, but I wanted to talk about Redis.
So I've not used Redis myself,
but I know that a lot of people do for caching and for other things.
And so this is an article.
It's actually on the Redis site,
but it's an article called Redis Beyond the Cache in Python
by Guy Royse, I think.
I knew that Redis did more than just a cache for a back-end database,
but this is kind of neat.
So these are good, clear examples of Python code
using Redis for more than just caching.
So the first example talks about how to use it as a queue,
so you can set it up as a fast queuing system.
And apparently there's a couple calls called rpush and blpop.
And actually, to tell you the truth,
I picked this article because of blpop.
I think that's one of the best function names ever.
I don't know what it means, but maybe back of the list pop?
Not sure, but it's good.
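For reference, BLPOP is the blocking variant of LPOP: it pops from the head of a list and waits if the list is empty. The queue pattern might look like the sketch below. It assumes a client object exposing `rpush` and `blpop` the way the synchronous redis-py client does (the article itself uses aioredis), and the function names and queue name are made up for illustration.

```python
import json
from typing import Any, Optional


def report_sighting(client: Any, queue: str, sighting: dict) -> None:
    """Producer: append a JSON-encoded sighting to the tail of the list."""
    client.rpush(queue, json.dumps(sighting))


def next_sighting(client: Any, queue: str, timeout: int = 1) -> Optional[dict]:
    """Consumer: wait up to `timeout` seconds for an item at the head of the list."""
    reply = client.blpop(queue, timeout=timeout)
    if reply is None:  # timed out with nothing queued
        return None
    _key, payload = reply  # BLPOP replies with (list name, value)
    return json.loads(payload)


# Against a real server, assuming the redis-py package and a local Redis:
#   import redis
#   client = redis.Redis(decode_responses=True)
#   report_sighting(client, "bigfoot:sightings", {"where": "Columbia River"})
#   print(next_sighting(client, "bigfoot:sightings"))
```

Pushing on the right and popping on the left gives first-in, first-out ordering, which is what makes the list behave as a queue.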
I thought you picked it because
the code example about putting stuff into queues here felt close to
home. Did it? Yeah, it's about Bigfoot sightings, and we've got a sighting near the Columbia River,
and people were chased by a tall hairy creature and so on. So, like, asynchronously adding Bigfoot sightings from the general Pacific
Northwest. Yeah, that's good. Sorry, carry on, didn't mean to derail you. No, no, it's good. So, using it
as a queue, using it in a pub/sub model. Apparently there's functions like publish and psubscribe,
so you can do publish and subscribe models. Data streaming. Using it as a search engine.
The search engine seems like a little more hardcore, because it looks like it's
almost like SQL queries that you're using, but apparently you can do that. And of course you
can also use it as your primary in-memory database if you want to, as long as you don't need to store
it somewhere, or use some later thing. You know, I guess I'm just winging it here.
I don't know how you hook up a Redis database
to a normal database,
but I know you database people know how to do that.
But I probably would use it.
I like the idea of using it as a queue system
for like multi-threads and multi-processes.
That sounds kind of fun.
This is a really cool article
because I just often think of Redis as cache, right?
But yeah, there's a bunch of neat stuff here.
And so often you think like,
oh, I'll just write this cool data structure.
We'll just do this thing and it's great.
And you're like, oh wait, but hold on.
When I deploy that to the web server,
it forks off like 10 copies of uWSGI.
And so I'm going to have like 10
separate DB copies and all this. There's just certain times you're like, I just need a thing to
hold this stuff, and Redis seems pretty cool for that. Yeah, and the examples use, apparently
there's a bunch of different Python libraries to access Redis, and this one uses aioredis,
because there's async and await calls to access everything. Yeah, it's beautiful. It's a real nice example of async and await as well there.
Yeah.
So I'm sure, Brian, you've heard of Little Bobby Tables.
Yeah, of course.
I think we've brought it up on this show.
Yeah, I don't know if we've actually,
have we featured it as a proper joke?
I don't know that we have.
Nonetheless, this one is no joke.
This is just littletable.
I didn't know what I was thinking.
I know what I was thinking. I was curious. I didn't want to commit as much effort as it turned out to
be into having like a broad discussion about this, but I thought, okay, well, we have dictionaries, and
so I can go and find a single item by passing a certain key and then get the thing back or not,
right? So if I've got, I don't know, users, I could have the user
ID, and then the user object comes back if I index the dictionary like that. Totally simple,
right? Yeah. What if we wanted to ask that question two ways on the same data structure?
What if I wanted to say, give me the user by ID, and give me the user by email? So one possible
way, I guess, is you could just cram all the IDs and all the emails into the dictionary.
But then things like, you know, enumerating over dict.items() break, because, you know, every now and then the key is an integer or it's a string.
And then there are duplicates of the users in .items() or .values().
So it's not really a great one. So I
said, does Python have like a structure that is not a database? Because I do not want to do database
stuff. If I wanted to do that, I would just use a database. A thing that is lightweight and
in memory and easy to use, that lets me put something like a user in there, but then be able to ask, give me the user by ID,
give me the user by email.
That is fast, right?
So dictionaries work because they're indexed
and they're insanely fast, like near,
you know,
O(1) type of performance
on getting back the content that's in there, right?
So I want to be able to do that
both with email and ID.
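The two-key lookup being asked for can be sketched by hand with one dictionary per key. This is a plain-Python illustration of the problem, not littletable's API; `User` and `UserDirectory` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class User:
    id: int
    email: str
    name: str


class UserDirectory:
    """Keeps one dict per lookup key, both pointing at the same User objects."""

    def __init__(self) -> None:
        self._by_id: Dict[int, User] = {}
        self._by_email: Dict[str, User] = {}

    def add(self, user: User) -> None:
        self._by_id[user.id] = user
        self._by_email[user.email] = user

    def by_id(self, user_id: int) -> User:
        return self._by_id[user_id]

    def by_email(self, email: str) -> User:
        return self._by_email[email]

    def __iter__(self) -> Iterator[User]:
        # Iterate one index only, so each user appears exactly once.
        return iter(self._by_id.values())


users = UserDirectory()
users.add(User(1, "ada@example.com", "Ada"))
users.add(User(2, "grace@example.com", "Grace"))
print(users.by_id(2).name, users.by_email("ada@example.com").name, len(list(users)))
```

Both lookups stay dictionary-fast and iteration has no duplicates, at the cost of writing and maintaining the bookkeeping yourself, which is the work a ready-made library would take over.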
I'm going to go on this rant some more later.
I'm actually trying to pull
together all the responses I got, because I got a bunch of things given back to me. A lot of people
suggested pandas, but I want to store non-tabular data, so I'm not sure pandas, which is tabular-ish,
makes sense. Nonetheless, one thing I did come up with that's probably the closest to what I was
asking for without me doing any work, which I'm not against
doing work, but if something exists, you know, let me pip install it, right, is this thing called lite
table by Paul McGuire. Sorry, not lite table, littletable. And it gives you a schemaless
in-memory thing that's kind of like a dictionary but gives you ORM-like access to the objects. Okay. Okay. So think of it like an
in-memory database, basically, where you don't have to go create table, you know, set column type, name
this to, you know, varchar(16) type. You don't have to actually define the table like a
full-on database, right? You just say, you know, put these things in it,
like you would a dictionary, and then you can access all the elements.
What do you think?
I think I'd like to try to solve your problem also.
It's a fun programming problem, right?
But this thing is pretty cool
because it lets you do like greater-than queries.
It has indexes on all of the columns,
or the columns that you say you want them on.
Like, all you do is, it's like creating a dictionary, and say, I'm going to put in a thing by ID, I'm going to put in a thing by email, and put in a
thing by city, and I want an index for all of those. So it's like dictionary-like speed, which is pretty
cool. It even does like in-memory joins and all sorts of stuff. So, yeah. Okay. Yeah, and the result
of like a query can be like another little table,
so I could do a filter and select only a couple of columns, and then out comes a little
baby little table, an even littler little table. Anyway, I thought this was a pretty cool
thing because it lets you kind of do database-like stuff without the effort, right? Do it dynamically.
Some people said, hey, you should just use SQL or SQLite. I'm like, yeah, SQLite's cool, but then I've got to come up with a full-on
schema for defining the thing, and that gets to be a pain. There's also some other options, but
littletable looks good. Yeah, I'll have to get an example, get your actual problem statement again,
and try it, but this looks neat. Yeah, absolutely.
Yeah, well, I'll come back to that for sure as well.
Because I got so many good recommendations and ideas that I think it's probably worth just doing a segment on that.
But littletable.
Nice.
This is something I'm surprised we didn't talk about already.
Maybe we have, but I've forgotten.
pytest-timeout.
This was a listener suggestion.
And I think it's pretty
much an essential plugin for any test suite that you're running, especially if it's not something
you're running where you're watching it. So if it's something running on a server or continuous
integration or something, or if it's a long-running test suite. It's a very simple-to-use plugin.
And what you want to make sure is that
none of your tests run longer than a certain number of seconds. All the people out there that
are like scratching their head thinking, wow, there's a test that runs longer than a second.
Yes, there are tests that run longer than a second. Especially if they're trying to talk
to hardware or external things and that thing might not be there and it's just waiting. Yeah,
there's more to testing than unit testing.
There's also system testing.
But anyway, this one's great because
you can set up a configuration in the
config file. You can throw one number in to say
like, say you have like
five minutes or something like that
or even just down to like
three minutes. I want to make sure
nothing runs longer
than this and just to make sure that the server
doesn't just sit spinning all night long.
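A suite-wide limit like that might be set up as sketched below, together with the per-test marker that pytest-timeout provides for tests that legitimately need longer. The `timeout` ini option and the `timeout` marker come from the plugin; the test names and the numbers are made up for illustration.

```python
# pytest.ini (or the [pytest] section of tox.ini), a suite-wide limit in seconds:
#
#   [pytest]
#   timeout = 300
#
# Per-test override, using the marker that pytest-timeout provides.
import time

import pytest


def test_quick_check():
    # Falls under the suite-wide limit from the config file.
    assert 1 + 1 == 2


@pytest.mark.timeout(600)  # this one legitimately needs longer
def test_slow_hardware_handshake():
    time.sleep(0.01)  # placeholder for a slow external call (hypothetical)
```

The same limit can also be passed on the command line as `--timeout=300`, which is handy for trying a value before committing it to the config file.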
And then, well, let's say you even tighten it closer
to try to kill off a test
if it's running longer than a certain amount.
But there's like maybe two of your tests that are longer
or a few of them that are longer.
You can put a decorator on those particular tests
and give them more time
and then the rest of them shorter. It's very easy to operate and just kind of a must-have for
long test suites. Yeah, that's super cool. Yeah, I mean, sometimes you'd just rather have the test
fail if it's taking way, way, way too long and you're like, I'm pretty sure this is going to
fail, but not right away. I would recommend just trying it out, and kind of look at the time of your
tests and stuff, and then
set it so that it actually kills one of your
tests in the middle,
or stick a spin in there or something like that,
just to verify it does,
because it is sort of operating system dependent,
and there's some configuration allowed in
the plugin to be able to use
either signals or kill commands
or process killing. There's
different ways to stop a test that's going too long. So test it before you deploy
it, but it's a good thing. Do a meta test. Yeah, test of your test. Exactly. Super cool. Okay, that's a
great one, and, you know, the use case is straightforward. I have got one for you that has got me really, really excited.
It's called events.
So in Python, we have functions as first class objects, right?
You can pass a function around super easy, right?
Like if there's some part of your program that's going to run
and you want to get a function called when it's done,
you can pass that function, it does its work, and it can call it, right?
You have this kind of observer style programming, right? Yeah. What requires programming on your behalf
is to have that happen for more than one thing. Like, I would like parts of my program to subscribe
to being notified about events, and one or more of them get called when this thing happens. So a friend of mine, Nicola Iarocci,
put together a really cool project called Events.
And the idea is that it adds event subscription
and callbacks to the Python language in a super simple way.
So you go to a thing that is an event.
If I want my function to be called by it,
if I want the event on_change,
I would say my_class.on_change += some function to call, and if there's already one
there, it's just going to add it to the list of all the functions that'll be called when that event
fires. And if at some point I decide I don't want to hear about it anymore, I just go to my class
dot, or my object dot, on_change -= the function I want to take out of that subscription list.
And that's it.
Oh, that's neat.
Isn't that slick?
And then to call it, you just say object dot on change and you pass the arguments and then all those functions get called in order.
Oh, this is cool.
Yeah.
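The subscribe, unsubscribe, and fire mechanics described here can be sketched in a few lines of plain Python. This is a hand-rolled illustration of the pattern, not the Events library's implementation; `EventSlot` and `Widget` are hypothetical names.

```python
from typing import Any, Callable, List


class EventSlot:
    """A list of callbacks that supports += / -= and is fired by calling it."""

    def __init__(self) -> None:
        self._handlers: List[Callable[..., Any]] = []

    def __iadd__(self, handler: Callable[..., Any]) -> "EventSlot":
        self._handlers.append(handler)
        return self  # augmented assignment rebinds the attribute to this result

    def __isub__(self, handler: Callable[..., Any]) -> "EventSlot":
        self._handlers.remove(handler)
        return self

    def __call__(self, *args: Any, **kwargs: Any) -> None:
        # Copy the list so a handler may unsubscribe while the event fires.
        for handler in list(self._handlers):
            handler(*args, **kwargs)


class Widget:
    def __init__(self) -> None:
        self.on_change = EventSlot()


log = []


def audit(reason):
    log.append(("audit", reason))


def redraw(reason):
    log.append(("redraw", reason))


widget = Widget()
widget.on_change += audit
widget.on_change += redraw
widget.on_change("moved")    # both handlers run, in subscription order
widget.on_change -= audit
widget.on_change("resized")  # only redraw runs now
print(log)
```

The whole trick is that `+=` and `-=` map to `__iadd__` and `__isub__`, and calling the object maps to `__call__`.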
So if you have to do any sort of observer design pattern, event subscription stuff like this, it's super, super nice. And it's inspired
by the C# language's event keyword, which is based on delegates, basically function pointers.
It doesn't really matter if you know about that or care about it, but if you know about the C#
version, this basically brings that to the Python language. Yeah, I kind of want to build up a
finite state machine using this. It's cool, right? I mean, it could make it really readable.
Yeah.
I have a gist that I'm working on
or I have some code I'm working on.
I'll post as a gist that people can check out
that is like a lot better
than what they have in the documentation.
So the documentation takes like this raw event source
and shows you how you can subscribe
and unsubscribe to it.
But what I've got is something that's like,
here's how you have a class, right?
Like, you know, a thing on the screen, and then you could subscribe to when the location
changes or the size changes or, you know, those kinds of things, and it's more of like a natural
programming analogy. So I'll put up the gist for that. I'm just working on a few things to see if
I can make it even slightly better. I'm seeing if I can use descriptors so that the event triggering
happens behind the scenes
without you even having to program it as well.
So like right now, from the outside,
using it is really easy,
but you do have to sort of like know
when something's changed
and then call that, raise that event.
I think I can use descriptors
to maybe make it seamless on both sides,
but I'm still playing with that.
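One way the descriptor idea could work is sketched below. This is a guess at the general technique, not the gist being described and not part of the Events library; `Observed`, `Sprite`, and the `subscribers` list are hypothetical.

```python
class Observed:
    """Descriptor that notifies the owner's subscribers when the value changes."""

    def __set_name__(self, owner, name):
        self.public = name          # attribute name as seen by callers
        self.private = "_" + name   # where the value is actually stored

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        old = getattr(obj, self.private, None)
        setattr(obj, self.private, value)
        if old != value:
            # The assignment itself raises the event; callers never do it by hand.
            for handler in list(obj.subscribers):
                handler(self.public, old, value)


class Sprite:
    x = Observed()
    y = Observed()

    def __init__(self, x, y):
        self.subscribers = []  # must exist before the first assignment below
        self.x = x
        self.y = y


changes = []
sprite = Sprite(0, 0)
sprite.subscribers.append(lambda name, old, new: changes.append((name, old, new)))
sprite.x = 10  # fires: the value changed
sprite.x = 10  # silent: same value
sprite.y = 5
print(changes)
```

With this shape, code that sets `sprite.x` does not need to know that anyone is listening.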
Now, do you know if all of the events get
called by the thing making the event happen? Yes, they do. Yeah. Okay. Yeah, yeah. So they
get called by whatever decides to raise the event. That's the thing that's doing
the calling. The events just basically manage what are the functions to be called, in what order,
and then you call it and it just
delegates on to them. Also, you get to just arbitrarily pick the parameters that
get passed along, but it seems like a good idea to say this event always takes these kinds
of arguments and whatever. There's not a lot of structure there. The only real safety you do get
is, when you create it, you can say these are the only allowed events,
because it's kind of just full-on dynamic programming.
But you can say these three things,
you can subscribe and unsubscribe and call.
Anything else, we're going to say it doesn't exist.
So that's pretty nice.
Yeah, yeah.
Yeah, it provides a little safety.
Cool.
Yeah.
Well, that's our six items.
Do you have any extras for us?
Not really.
I sort of talked
about it. I was going to talk about it here, but I talked about it when we talked about what we
were doing and how people can support us. I finished the Python memory management course. The thing is
so cool. It's a five-hour course just diving into the internals of like Python memory management
algorithms. And what I thought I would create was something that was like, understanding Python
memory management, but there's actually a ton of techniques I discovered that actually let you run your code
in a way that's like, well, now it uses half as much memory and it's 30% faster and stuff like
that. So I didn't think there would be a lot of actionable stuff coming out of it, but there is,
which I think is pretty cool, actually. Oh, nice. Yeah. How about you? I'm pretty excited that
pytest 6 is out. A couple of weeks ago, we talked about 6 being in sort of a beta release,
but it's out now.
And I wanted to mention that episode 125 of Test & Code
walks through those changes.
This is due to the miracles of time travel.
This has not been recorded yet,
but it will be recorded and released by last week.
Perfect. Time travel. I love it. You've chosen the perfect joke. So the only question I have
for you before we do the joke is, am I the school administrator IT person or am I the mom?
Oh, you be the mom.
Okay.
Okay.
So the phone rings, I pick it up.
Yeah. Hi, this is your son's school. We're having some computer trouble.
Oh, dear.
Did he break something?
In a way.
Did you really name your son Robert?
Robert, single quote, parentheses, semicolon, drop table, students, semicolon, minus, minus.
Oh, yes.
Little Bobby Tables, we call him.
Well, we've lost this year's student
records. I hope you're happy. And I hope you've learned to sanitize your database inputs.
Be on the lookout for that SQL injection, baby. I love it. This is so good. This is absolutely one
of the most classic computer jokes there is. Yeah, I love it because it probably would actually work. It reminds me of the guy who said that
he got his license plate to be the characters N-U-L-L, null. Yeah, I heard about that. Yeah, and he
ended up getting all the, like, automated, you know, you drove through a traffic light sort of
tickets for all the records that were null. Yeah. Anytime they didn't have data, it went to him.
Any police officer that forgot to enter the license plate, it would go to him.
He thought he would get out of it because they wouldn't be able to send it to him.
But oh, no.
That's hilarious.
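On the Bobby Tables joke, the usual defense is to bind values as parameters rather than pasting them into the SQL text. A minimal sketch with the standard library's sqlite3 module follows; the table and column names are made up to match the joke.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# Parameter binding sends the value separately from the SQL text,
# so the quote and semicolons are stored as plain characters.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

rows = conn.execute("SELECT name FROM students").fetchall()
print(rows)
```

The table survives and the odd name is stored as ordinary data, which is all the school's system needed to do.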
Awesome.
Awesome.
All right.
Well, great to chat with you as always.
All right.
You too.
Bye.
Bye.
Thank you for listening to Python Bytes. Follow the show on Twitter at Python Bytes.
That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm.
If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
This is Brian Okken, and on behalf of myself and Michael Kennedy,
thank you for listening and sharing this podcast with your friends and colleagues.