Python Bytes - #73 This podcast comes in any color you want, as long as it's black

Episode Date: April 12, 2018

Topics covered in this episode:
- Set Theory and Python
- Trio: async programming for humans and snake people
- black: The uncompromising Python code formatter
- gain: Web crawling framework based on asyncio...
- Generic Functions in Python with Singledispatch
- Unsync: Unsynchronizing async/await in Python 3.6
- Extras
- Joke

See the full show notes for this episode on the website at pythonbytes.fm/73

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 73, recorded April 4th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by Datadog. Check them out at pythonbytes.fm slash datadog. I'll tell you more about them later. Right now, I want to relive my math education, Brian, and my set theory,
Starting point is 00:00:27 and all these sorts of fun number theory things. Right. So there's an article called, I think I truncated it, Set Theory and Python Tips and Tricks. Actually, that's one of the classes I loved. I loved the discrete math class I had in college that was early in my computer science days to talk and learn about set theory. And then you use it a lot. It's a useful tool, but there's a lot of people that come to Python that haven't taken that or are really not sure what set theory is. It's really not that complicated. And this is a good introduction article on really what set theory is or some of the set theory concepts and then how to do that in Python, including things like checking to see if something's in a set
Starting point is 00:01:09 and unions and intersections and differences and things. And it's just a lot of fun. Yeah, I think this is awesome. And this is one of those things where the right data structure can just make all the difference in terms of performance and simple code, right? Like if you try to model this with lists, which you totally could do, then you're writing, basically implementing,
Starting point is 00:01:31 is this set contained within that set? What is the intersection? The actual performance of checking if something's in there, making sure you don't get duplicates, all that stuff that just falls apart. But if you use sets, well, then it flies, right? And anytime you need something distinct, all right, I want a distinct number of IDs or usernames or emails or whatever where there might be duplication. Set. It's all about the set.
Starting point is 00:01:52 Right. And, yeah, that's the thing to take away if you're not familiar with sets. Sets are a thing. They're a container that only has unique elements in it. So if you're adding words, you're going through an article and you add words to a set, if the word "the" is already in there, it's only going to be added once. Right. If you had a list that contained every word in order of a play or something, you threw it into a set, that would just give you all the words used distinctly.
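For anyone following along, here's a quick sketch of the set operations being described; the sample words and numbers are just made up:

```python
# Deduplicating a list of words by throwing it into a set.
words = ["the", "cat", "sat", "on", "the", "mat", "the"]
unique_words = set(words)           # "the" only appears once

print("cat" in unique_words)        # fast membership check -> True

# Basic set-theory operations on two sets of numbers.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a | b)        # union        -> {1, 2, 3, 4, 5, 6}
print(a & b)        # intersection -> {3, 4}
print(a - b)        # difference   -> {1, 2}
print({3, 4} <= a)  # subset check -> True
```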
Starting point is 00:02:21 It's a simple data structure built into Python and it's a good thing to know how to use. Yeah, I love it. And it's super, super fast for containment, a little bit like dictionaries in that sense. So nice one. I'm glad you found that. The syntax for how to do math, set theory math, is not obvious. It's not complicated, but it's something that's good to review so that you know what those are. Yeah, very, very nice. So the next thing that I want to talk about, in fact, you're going to notice a little bit of a trend this week, Brian. I'm on some kind of like rant on async programming this week. So all three of my items have to do with async programming in one way or another, generally each one of them and like improving what already exists.
Starting point is 00:03:03 Oh, great. Yeah, so the first one is called Trio, T-R-I-O, and it says async programming for humans and snake people. So it's interesting, right? Like in Python 3.4 we got asyncio, and in 3.5 we got async and await, the keywords that built upon the asyncio foundation, which is really coroutines and functions and stuff like that. And so this guy who created this is like, this thing already exists, but the API for it's really crummy. And you'll hear this as like a theme over and over with slightly different takes. But so like, why does this exist? Right? So it says basically, the Trio project's goal is to produce a production-quality, async/await-native I/O library for Python.
Starting point is 00:03:47 And like all the other stuff, it allows you to do sort of IO-bound stuff in parallel in really nice ways. But the API on how you do that is quite a bit nicer, and it supports advanced concepts like canceling a task that started while you're still waiting for it. Right? Things like that.
Starting point is 00:04:07 Like if you're doing a web service call in a database transaction and the database fails, you want to roll back or cancel the web call or vice versa. Something like that. So it really tries to distinguish itself by being, like, really focused on this usability thing. And it's built, like, entirely from the ground up. And what's really interesting is they have this, like, these are our sources for inspiration. A link, like a GitHub issue. Maybe it's in the wiki.
Starting point is 00:04:33 I can't remember. But a big, long part where they talk about all the places. There's a lot of Erlang. There's C Sharp. There's Go that makes a big appearance there. But in particular, it's based on David Beazley's Curio, which is kind of a similar project as well. Interesting, right?
Starting point is 00:04:47 Yeah, and I like that the API is actually pretty nice. Things like start_soon. Yeah, exactly. I love the things you can do. So like asyncio stuff is really fairly complicated. It's like three, four lines of code you always have to do just to transition from synchronous mode into the asynchronous world.
Starting point is 00:05:05 You've always got to create this async loop and then begin running an async method to do it. But here you just say trio.run, you give it a function, boom, done. If you want to like pause but not block the rest of the program, instead of saying time.sleep, which will totally block that thread, you can say trio.sleep. And that will basically release the main thread to go do more work, as if you're doing IO. Nice. That's nice. But like you said, the main part is you can create an async with block, which already is like mind bending, and create this thing called a nursery. And then you go to the nursery and say, start this task soon, start that task soon. And then the with block will wait and block at the very end
Starting point is 00:05:45 until all of the things you started within it are done. Yeah, and you know, the nursery thing confused me at first, until I remembered that we're doing child tasks. That's right, you get your children from a nursery. Nice. That's right, you put them in the nursery, they can grow up, and when they're done, then they're out. But this adaptation of the async with block is really, really interesting, which I believe requires Python 3.6. So they have a bunch of tools for like inspecting and debugging your programs and the async flow, how this stuff is working, which is, I think, a really nice addition. One of the problems that you run into with this is, if you've got, say, an async Postgres data access layer, it's probably built on asyncio, not on Trio. So even though they are effectively the same,
Starting point is 00:06:31 they're not compatible, right? So there's this other project called trio-asyncio that puts in a compatibility layer, so anything that supports asyncio can run on Trio. Oh, cool. Yeah, so this is a really cool project. I'm super impressed with this. I'll have to check that out too, but just a slight correction. It says 3.5 and above and also PyPy.
Starting point is 00:07:01 Yes, that's true, but I think the async with block, I'm not sure that that structure itself is supported in Python 3.5. I don't think so. I know more things became async in 3.6, like async generator expressions and comprehensions came in 3.6, so it might be a slightly different context there. And whatever, f-strings came in 3.6, so nothing exists to me before 3.6. 3.5, you're dead to me.
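As a rough sketch of the pattern being described here, using Trio's documented run, sleep, open_nursery, and start_soon API; the task names and delays are made up:

```python
import trio

async def child(name, delay):
    # trio.sleep pauses this task without blocking the rest of the program
    await trio.sleep(delay)
    print(f"{name} finished after {delay} seconds")

async def main():
    # The nursery is where child tasks live; the async with block
    # doesn't finish until every task started in it is done.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child, "task one", 1)
        nursery.start_soon(child, "task two", 2)
    print("all child tasks are done")

# One call to go from synchronous code into the async world.
trio.run(main)
```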
Starting point is 00:07:43 Awesome. So I've been hearing a lot on the social medias about this thing called Black. Yeah. So I thought we'd better cover it, even though it's been around for a little while, not like a long time, but it is, you're right. It has had a lot of social media attention. So Black is the uncompromising Python code formatter, but I thought it was just sort of an amusing take on something, like, so we have linters and everything, but they just tell you what is wrong or what doesn't comply with PEP 8 standards or other standards. And this one just goes and changes your code for you and doesn't tell you anything. Well, I'm not sure if it tells you or not. Is it like a black box? I'm not sure why the name Black, but it is amusing that after you run it on your code, your code is blackened code, which actually makes me just hungry because I really like blackened salmon.
Starting point is 00:08:13 because I really like blackened salmon. Yeah, a good little sauce on it, yeah. It is an interesting take to say, if you're going to say our code needs to follow certain standards, if this one works for you, just put that in your tool chain and it'll just automatically format everything and you don't need to argue about it anymore. Make it part of like a GitHub, a Git check-in hook or part of your automated build that just formats it and checks it back in. The GitHub repo has some amusing stuff in it also.
Starting point is 00:08:42 So poke around in there because, like, for instance in the tests, there's a comments file and it has examples of comments that should be removed. And it's sort of funny stuff, like some comment about why this function doesn't work and it's still in production anyway. Things like that that should just be removed. It's funny. It's a good read.
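As a quick illustration of the formatting side of it, here's roughly the kind of change Black makes; the exact output can vary by version, and the function here is just a made-up example:

```python
# Before running Black:
def make_user(name,email,admin=False):
    return {'name':name,'email':email,'admin':admin}

# After running Black (PEP 8 spacing, strings normalized to double quotes):
def make_user(name, email, admin=False):
    return {"name": name, "email": email, "admin": admin}
```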
Starting point is 00:09:00 Yeah, I think this is pretty interesting. I haven't done anything with it, but it's definitely worth checking out, especially if you have a team of people and you want to try to make it continuously format stuff the same, right? I don't really have an opinion on it other than it's interesting. Sounds pretty good. All right, before we get to the next one, let me tell you about Datadog. So if you run any sort of distributed app, understanding how a request flows from one part through the whole thing,
Starting point is 00:09:26 what the performance of those are, what the bottlenecks around those are, can be really tricky. So you can plug in Datadog in just a few minutes, you'll be able to investigate those bottlenecks and explore dashboards that show you where you're spending your time in the app. And you get to visualize your Python performance, super easy and nice. Get a free trial and a free Datadog t-shirt with a cute little dog on it, so just check them out at pythonbytes.fm slash datadog. We've got to get our shirts, right, Brian? Yeah, definitely. Definitely. I'm looking at PyCon. I'm going to try to get a shirt from them at PyCon. I'm going to try to get one before that. Nice. Back onto my rant on async stuff. So there's this thing called gain.
Starting point is 00:10:07 And the point of gain is you can give it a base URL, a set of regular expressions of types of links in there to traverse and follow, and then just tell it to go. And it will basically spider an entire site. Think Google Web Spider, but yours. Yeah, and it's all based on asyncio, uvloop, and aiohttp, which is pretty cool. So all you got to do is you define a class
Starting point is 00:10:37 that has the CSS selectors and what you want to do, like save the data to a file or a database or whatever. How many concurrent connections you want it to go spider on with its async aspect, where to start the URL, things to match like anything on the page or anything that's under the catalog section or whatever. And you say go. And then you wait a little bit and all sorts of stuff has been downloaded, processed and saved. Right. And it's very terse. I mean, you don't really have to put that much code in place to get this done. No, it's like 10, 15 lines of code,
Starting point is 00:11:08 and you've completely analyzed somebody's entire website structure. Pretty cool. Yeah. And because it's based on asyncio and aiohttp, it should totally fly. Neat. Yeah, very neat. So not a whole lot to do on that one, but if you're doing screen scraping, web analysis of more than just one page,
Starting point is 00:11:26 this is pretty cool because you can sort of just set up patterns and say, go forth and do that. I was thinking it would be fun to do something like that to hook up with a website you're running, to even attach it as a post-deploy check to make sure the links work,
Starting point is 00:11:42 using requests or something to grab them. Yeah, that's a good point. Like link validation to make sure every link on the page works correctly. Things like that, right? Yeah. And set up a notifier or something like that to let you know if something's broken. All right.
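For reference, here's a sketch of the kind of gain spider being described, loosely along the lines of the project's README; the URL, selectors, and regular expressions are placeholders, and the exact API may differ slightly:

```python
from gain import Css, Item, Parser, Spider


class Post(Item):
    # CSS selectors for the pieces of each page you care about.
    title = Css('.entry-title')

    async def save(self):
        # What to do with each scraped item: a file, a database, whatever.
        with open('posts.txt', 'a+') as f:
            f.write(self.results['title'] + '\n')


class PostSpider(Spider):
    concurrency = 5                       # how many concurrent connections to spider on
    start_url = 'https://example.com/blog/'
    parsers = [
        Parser(r'https://example.com/blog/page/\d+/'),            # pages to follow
        Parser(r'https://example.com/blog/\d{4}/[\w-]+/', Post),  # pages to scrape
    ]


PostSpider.run()
```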
Starting point is 00:12:12 So what is this next one you got with these decorators, single dispatch? Yeah, actually. So this is an article called Generic Functions in Python with Single Dispatch. And I didn't know this was a thing. Apparently in Python 3.4, this decorator was added, a singledispatch decorator. And we'll talk about it and read it, but you kind of need to see the code. You can decorate a function with singledispatch and that makes that function the default function, then you can use a decorator to register other functions to be the non-default. Oh, this is interesting. Yeah, so that you can have one function name that calls different functions based on the type of the first element in the parameter list. So it's basically like declarative function
Starting point is 00:12:46 overloading polymorphism based on the type, which we don't have in Python. Right, which apparently we do have in Python. I just didn't know about it. It just requires decorators, but it's built in. Built in decorators. Yeah. Yeah, as well.
Starting point is 00:12:58 So you've got like one function and you say singledispatch, and then you have other functions that just have doc strings, but you would basically wrap them with, if this one takes a list, set, or tuple, call this version. If it takes a dictionary, call this other version. Yeah, this is interesting.
Starting point is 00:13:13 It's a little non-obvious, but it's interesting. I took the example out of the article and trimmed it down. So those doc strings are just to make our code example and our notes small. But yeah, it has an example of building your own fprintf function that can print differently. The default is just to print the string representation. But for instance, lists and sets and dictionaries can be printed differently and having elements on each line. I'm sure there's other reasons. But I know I've run across times where I wished Python had function overloading,
Starting point is 00:13:48 and it doesn't. I've implemented function overloading with if isinstance of this type, elif isinstance of this type, you know, which is not really great, but it's what you got, right? But apparently you've got this as well, which is pretty awesome.
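Here's a trimmed-down sketch in the spirit of that example; the function name and the exact print formats are just for illustration:

```python
from functools import singledispatch

@singledispatch
def fprint(obj):
    # Default behavior: just print the normal string representation.
    print(obj)

@fprint.register(list)
@fprint.register(set)
@fprint.register(tuple)
def _(obj):
    # Simple containers: one element per line.
    for item in obj:
        print(item)

@fprint.register(dict)
def _(obj):
    # Dictionaries: one key/value pair per line.
    for key, value in obj.items():
        print(key, '->', value)

fprint("plain string")     # default version
fprint([1, 2, 3])          # list/set/tuple version
fprint({"a": 1, "b": 2})   # dict version
```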
Starting point is 00:14:03 Yeah. All right, you ready for another rant? Another thing on async? Final one. We haven't talked about async for a while. No, let's talk about that. So there's this thing by a guy named Alex Sherman, and he wrote a library called Unsync, and an article
Starting point is 00:14:18 called Unsynchronizing Async and Await in Python 3.6. So he says, I'm a big fan of async and await, but there's two major problems with this. First of all, it's difficult to do fire-and-forget async stuff, right? You can't just go to an async function and call it and let it run. You have to do this weird sort of series of creating the async loop, blocking on the async loop. So you create a loop and you ensure the future by giving it a function, a loop function, to call. But it's really not obvious to just run a basic asynchronous thing from a synchronous task. Okay. So if you look at the article, like right at the top, it links sort of that code there. You also can't block.
Starting point is 00:15:02 You can't say, well, I've gotten this thing back from an async thing, I just want to stop here and just wait until its answer comes back. It'll throw an exception, right? So this is all kind of weird. This guy says, hey, well, what can we do about this? So he kind of solves it, in a sense. He says, you know, C Sharp had this idea of async and await and these tasks that run almost identical to what Python has. The way they fixed it was by creating this ambient thread pool that will capture it and run it. Basically, the asyncio loop is like this thing behind the scenes you never see. And internally, Python or C Sharp would like find it, just put it in like the default one. And they said Python didn't take this approach. And his hunch is the maintainers didn't want to add an ambient thread pool to their language, which makes sense.
Starting point is 00:15:37 He says, I, however, am not a Python maintainer, and I did add an ambient thread pool to mine. And here's how it works. So you just take any async function and you put an @unsync decorator on it (we also have a big decorator theme going on here), and then you just call it. And it sounds real simple. So what it does is it will basically wrap it up and do all that asyncio initialization stuff for you, and then you can wait on the result or not wait on the result, however you like.
Starting point is 00:16:15 That alone is pretty cool. But then if you put that on a regular function, not an async one, it'll cause it to run on a thread pool thread, on a thread pool executor. If you flip a bit on your unsync decorator and say it's CPU bound, it'll actually run it on the process pool executor in a separate process, so you can get around the GIL. Oh, interesting. And it's all just one decorator. And it'll like sort of manage those dependencies, as in: is it async, how does it run, where does it run? It's really pretty slick on how it detects the different ways in which asynchronicity can be manifest in Python.
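A small sketch of how that looks, assuming unsync's documented decorator and cpu_bound flag; the functions, delays, and numbers here are made up:

```python
import asyncio
import time
from unsync import unsync

@unsync
async def fetch_data(delay):
    # Async function: unsync runs it on its own ambient event loop
    # in a background thread, so you can call it from plain sync code.
    await asyncio.sleep(delay)
    return "async result"

@unsync
def blocking_work(delay):
    # Regular function: unsync runs it on a thread pool instead.
    time.sleep(delay)
    return "thread result"

@unsync(cpu_bound=True)
def crunch_numbers(n):
    # cpu_bound=True moves it to a process pool, working around the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Calls return immediately with future-like objects; .result() blocks until done.
    tasks = [fetch_data(1), blocking_work(1), crunch_numbers(1_000_000)]
    print([task.result() for task in tasks])
```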
Starting point is 00:16:50 To be fair, my first thought is that this might be, if I'm writing an asynchronous library, that synchronizing my asynchronous library, for instance, might be helpful during like just a functional test, for example. Yeah, you want to wait for it to go. Yeah, definitely. So I think there's a lot of interesting use cases for this. And it definitely provides a lot of flexibility. It's not, it doesn't have a huge number of GitHub stars. I think it's pretty new. But, you know, people can think about it. And maybe there's even some tie-ins, like maybe somehow Trio and its trio-asyncio could plug together with this. I don't know. But a lot of interesting news around the asyncio space, or async/await space.
Starting point is 00:17:33 Very amusing code examples, things like return "I hate event loops" and naming his event loop "annoying event loop". Yeah, he's got some great naming. And then his async function that he calls by putting the unsync decorator on it, its return value is "I like decorators". Yeah, it's pretty lighthearted. It's nice, but it's a cool project. People can check it out and see if it works for them. All right. Well, thanks.
Starting point is 00:17:58 Yeah. You got anything else to share with us, Brian? I'm out of news for the week. No, I'm out as well. All right. How about that? Well, thanks for finding all these things and sharing them. And thanks, everyone, for listening.
Starting point is 00:18:09 Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
