Python Bytes - #153 Auto format my Python please!

Episode Date: October 23, 2019

Topics covered in this episode: Building a Python C Extension Module; What’s New in Python 3.8 - docs.python.org; UK National Cyber Security Centre (NCSC) is warning developers of the risks of sticking with Python 2.7, particularly for library writers; Pythonic News; Deep Learning Workstations, Servers, Laptops, and GPU Cloud; Auto formatters for Python; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/153

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 153, recorded October 16, 2019. I'm Brian Okken. And I'm Michael Kennedy. This week's episode is sponsored by DigitalOcean. We'll talk more about them later. But first, Michael, could you extend my knowledge a bit? Yeah, by like extending the entire Python ecosystem, maybe?
Starting point is 00:00:23 Yeah. Yeah. So there's actually a cool Real Python article called Building a Python C Extension Module. So Brian, you know how to write C code, right? Yes. Or at least that's the theory. I used to know how. Yeah. I have this really awesome former self of mine that was super good at C++.
Starting point is 00:00:40 I kind of remember that person that I was. I used to be able to write a lot of C. That was my main job, to write C and 3D stuff and OpenGL and things like that, right? So it's definitely the main way to extend Python these days. And there's other options, like there's some cool Rust options and whatnot. But primarily people know C, it runs everywhere, has light runtime requirements. You're already running CPython probably, so you already have those requirements met, right?
Starting point is 00:01:11 So extending your code with some kind of C extension gives you a couple of options. One is clearly performance. I love to talk about Python performance because one, it always surprises me, and two, like people are usually wrong about it. They say Python is slow. Like I was just reading something on Quora about why, like, compare C# to Python, and somebody's like, well, you can't even compare them. C# is 50 times faster. Well, that's true for certain operations, unless
Starting point is 00:01:35 maybe part of that it's done in C and then probably Python is faster because like now it's down in like NumPy and doing it in C, which is actually faster, right? There's just, it gets really interesting. So one reason you might care about writing a C module is just for performance. And I think that's what most people think of, but there's also like low level operating system APIs or other C APIs, like some library you can get, you might want to use that happens to be written in C and doesn't have a Python way to talk to it, right? Yeah, there's lots of stuff with DLLs that are available with C header files, but you don't have a Python binding. Exactly.
Starting point is 00:02:10 And I bet you have a lot of experience with that with all of your devices and stuff like that. Yep. Yeah. Okay, so those are the two main reasons I can think of for writing C extensions. I mean, obviously throw some Cython at it if it's a performance thing to give it a try. But there's a cool tutorial on Real Python. And it talks about how you can, you know, like things you'll be able to do is like import C functions within Python, pass arguments from Python to C, raise correct exceptions in your C code.
Starting point is 00:02:37 So they bubble back up into your Python code as a proper, like, ValueError-type exception or something like that. All sorts of cool things, and even how to test and distribute it. So let me just sort of talk through the process, and then if people really care, they can go read the big long article, right? So if you want to basically get access to some C functionality, or if you want to just write your implementation in C to some degree, first thing you got to do is go and figure out... let's suppose you want to call some C function, right? So the article uses fputs, which puts a string into a file pointer, right? Like, basically writes a string to a file in C. So you have to write a function, which is pretty interesting because it returns... you have to start talking in the CPython language,
Starting point is 00:03:23 not Python, right? So everything that gets passed around is a Python object pointer, or the return value is like a PyObject pointer, right? So you pass these things around. And first of all, you declare whatever inbound arguments you're really expecting. And you get basically passed a single pointer that is the arguments to your function, but it's really a tuple. So there's PyArg_ParseTuple: you give it the arguments, a format thing, and you give it the address of the pointers. You pass them by reference, basically. And then you just do your CPython code.
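The point that every value crossing the Python/C boundary is a Python object, plain numbers included, can be poked at from Python itself. The sketch below is not the article's code. It assumes CPython and uses the standard library's ctypes.pythonapi handle to call PyLong_FromLong, the C API function that turns a C long into a Python int object. The value 13 is a made-up stand-in for a C-level result such as a byte count.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
# This is CPython-specific and only a way to peek at the API from Python;
# a real extension module would call PyLong_FromLong from C instead.
py_long_from_long = ctypes.pythonapi.PyLong_FromLong
py_long_from_long.argtypes = [ctypes.c_long]   # the plain C long going in
py_long_from_long.restype = ctypes.py_object   # the PyObject* coming back

# 13 stands in for a C-level result such as a byte count (made-up value).
result = py_long_from_long(13)
print(type(result), result)
```

In a real extension the same call would be the last line of the C wrapper, so that Python code calling the function receives an ordinary int.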
Starting point is 00:03:55 In this case, the function that they're wrapping, fputs, returns the number of bytes copied when it does that. And so this function wants to return the bytes copied. But you can't just return an integer or long, no, no, because everything in Python is a PyObject at the C level, PyObject star, even numbers. So you have to convert from a long with PyLong_FromLong, which is a function that you get from the Python.h C header file. Okay, it's actually pretty simple. There's like some weird non-obvious structure at the beginning of the function
Starting point is 00:04:29 so that it can be called by Python. And the return value is weird, but everything else in the middle is straight C. So you don't really have to think about what's going on. The GIL will protect you from race conditions, all that kind of stuff. Yeah, and actually one of the things I love about this article is that it's using a fairly simple example
Starting point is 00:04:44 so that you're not distracted by the example. It's just the boilerplate junk that you got to learn about. Yeah, absolutely. Which is probably the thing you don't know, even if you know C, right? Yeah. It says also there's a few other things that are necessary if you actually want to use this code and not just write it and compile it in C, is you have to write a definition for your module in C and the methods that it contains. So there's a few C functions that you call there. And then you have to build it for Python,
Starting point is 00:05:11 which you basically create a setup.py file and use distutils and it will compile and create the right base library and install it for you. Okay. Pretty cool, huh? Pretty cool, yeah. One of the issues with this is that a lot of times when you need to do this, it isn't a hardcore C compiler person or a hardcore CPython person that needs to do this. It's just your casual user that happens
Starting point is 00:05:38 to have a use case where they need to connect Python to C. And so this is great. Yeah, and it's super approachable. And like you said, the examples are pretty straightforward. Obviously you're writing C, which puts you in a different category of hard, right? I mean, free, malloc, pointers, pointers by reference, like all that kind of stuff that you learn when you learn C. But that's the world you got to live in when you go down and you take the blue pill or whatever it is. Is the blue one the good one? I think... no, I always forget. I know that there's a pill that's good and there's a pill that's bad, it keeps the facade. But yeah, probably the... I don't know. Do you know what else is good?
Starting point is 00:06:16 Documentation? Documentation? No, Python. Python 3.8. Python 3.8 is good. But also Python 3.8. For even the URL, sorry. Python 3.8 dropped just this week. So it is no longer beta. It is final and you can download it from the front page. The default is Python 3.8.0 now when you download it. So yay. Yes, that's awesome. We've talked about a lot of stuff.
Starting point is 00:06:44 On this podcast, we've talked about things going into 3.8. Like the walrus operator, of course, that's come up a lot of times. Those are assignment expressions, positional-only parameters, and f-strings. F-strings have the little equal sign so you can debug with them easier. Right, f-strings have been here since 3.6, but now they have this like self-documenting short print statement thing, right? Yeah, and it takes longer to describe than to show, and it's cool. What I wanted to highlight is the What's New in Python 3.8 document that came out from, that's at docs.python.org, and it's a really great summary
Starting point is 00:07:15 of all the stuff that's in 3.8. It does have all of those new things, all those big hitters, but it also has some stuff that I was surprised by, that I hadn't heard of before. One of them is, we've talked about a lot of async stuff, and now you can type python -m asyncio and it launches an async-native REPL. That is so cool, and I had no idea that that was there. I guess it would have been a pain in the butt before to work with async stuff
Starting point is 00:07:45 over there in the REPL, right? Yeah, I guess. Now you can just, because I often drop into the REPL to try something out. Now you can try out async stuff in there. So that's cool. Yeah, that's super cool. A couple other things that'll just help you while writing Python, a couple new warnings and messages for things that you might do wrong. So you're not supposed to use is or is not to compare non-objects like strings or integers or something. It's just like, if x is 3, don't do that. But apparently the warning wasn't very good and so now the warning is better. It tells you to use double equal or not equal. So that's cool. And then one of the things that I often get, because I do a lot of parameterized tests, is if you've got a list with tuples inside, or basically a list of lists or a tuple of tuples, and you forget the commas between some of the things, because maybe they're on a new line or something.
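Since the hosts note that these features take longer to describe than to show, here is a small sketch of the Python 3.8 additions named so far: assignment expressions, positional-only parameters, and the f-string equal sign. It needs Python 3.8 or later. The names values, scale, and debug_line are made up for illustration and do not come from the episode or the What's New document.

```python
values = [4, 8, 15, 16, 23, 42]

# Assignment expression ("walrus"): bind count inside the condition itself.
if (count := len(values)) > 5:
    summary = f"{count} values"

# Positional-only parameters: names before "/" cannot be passed by keyword.
def scale(value, factor, /):
    return value * factor

# f-string "=" specifier: expands to the expression text plus its repr.
debug_line = f"{count=}"

# Compare against literals with == or !=; writing "count is 6" is the
# pattern that Python 3.8 now warns about.
has_six = count == 6

print(summary)
print(scale(2, 3))
print(debug_line)
```

Calling scale(value=2, factor=3) raises a TypeError, which is the whole point of the slash: the parameter names stay private to the function and can be renamed later without breaking callers.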
Starting point is 00:08:39 The warning was weird before, but now it is a more helpful message. So I love things like that. Yeah. You know, it drives me crazy if those are strings, like if you're creating a JSON document or something like that, or a multi-line list of strings. You forget a comma, it just concatenates them, even though they're on separate lines. I'm like, oh, really? That's the default behavior? I understand where it comes from, but it drives me crazy. That's probably still there. Yeah. I don't see how you would fix that without changing what that means. Yeah. This one, it took me a while to get my head around, but I didn't know that this was an issue before: iterable unpacking. So if you packed a bunch of stuff into a variable, you can unpack it with star variable name. You can't return that in a return
Starting point is 00:09:22 statement, or you couldn't before, out of a tuple. So you had to put parentheses around it before you return it. But now that's gone away, you can just return it. Yeah, there's a lot of good stuff in here, actually. And you just did an episode on it, didn't you? Yep, episode 91 of Test & Code. I just read through the entire article, and it's still just 20 minutes. I didn't read through everything, but it highlighted all the stuff that I thought was cool. Super. You know, something else that's cool is DigitalOcean. I love DigitalOcean.
Starting point is 00:09:49 This episode is sponsored by DigitalOcean. And Python Bytes' infrastructure runs on DigitalOcean, thanks to Michael putting that all together. And it's quite solid, and we're super happy with it. But did you know that not all web applications and services have the same memory and CPU demands? It's shocking, isn't it? Shocking. Anyway, so DigitalOcean has embraced this diversity in their droplet structure, which is cool, with the ratio of memory to CPU powers in droplets. The general purpose droplets have a ratio of four gigabytes of memory per CPU, and you can scale those up.
Starting point is 00:10:23 They added, not too long, a couple years ago, I think, CPU optimized ones. So they doubled the number of CPUs per the amount of memory, and that's great for CPU bound tasks. But there's some applications like high performance databases or in-memory caches or data processing of large sets that a lot of memory might be a really great thing. So there's now a memory optimized droplet that reverses that structure and makes it like eight gigabytes of memory per CPU. It's pretty cool.
Starting point is 00:10:52 Yeah, very cool. Yeah, use the right kind of droplet for the right service that you're using and try it out at pythonbytes.fm/digitalocean, and they'll give you a $50 credit for new users. You and I have mentioned that folks should put legacy Python where it belongs, in the past. Last time we spoke about 35 million lines of Python code at JPMorgan Chase and their journey to work on that.
Starting point is 00:11:16 And that's all interesting. But we just recently got this announcement from the UK's NCSC, the National Cyber Security Centre. Wow. Yeah. And they're warning developers of the risk of sticking with Python 2, particularly library writers. Okay. That's interesting, right? That they actually go so far as to call that out as a thing. So they say, look, this could be like basically companies that continue to use Python 2 past its end of life could be tempting or setting the environment for another WannaCry or even an Equifax incident. So Equifax was a horrible data breach. Basically, it's one of these companies that gathers up so much private data.
Starting point is 00:12:00 They know stuff about my financial past that I have forgotten and don't even know. They're like, oh, did you know you had this account in California? Like I did. Oh, okay. Well, I guess I do. Right. They know all of that. And it was broken into.
Starting point is 00:12:12 Why? Because there was a vulnerability in Apache Struts, which is an open source framework. People at Struts are like, guys, this is super bad. You just have to send like a special HTTP request to the server and it's owned. Right. Well, the folks at Equifax got that message, but they didn't really want to get around to like upgrading it to the new version because, hey, it's kind of hard to upgrade this thing. It's like a new version, which probably
Starting point is 00:12:34 is old and was slightly incompatible or something. Anyway, that's where Equifax came from, running an old version of one of these frameworks, not Java itself, but like the web framework on top of it. Anyway, there's some cool quotes in here. They say, if you're still using Python 2.x, it's time to port your code to Python 3. If you continue to use unsupported modules, you are risking the security of your organization and data as vulnerabilities will sooner or later appear, which nobody's fixing. Okay, that's one. One interesting quote. Another one is, if you maintain a library that other developers depend upon, you may be preventing them from updating to 3. And by holding back other developers, you're indirectly, and likely unintentionally, increasing the security risk of basically all the computers in the world.
Starting point is 00:13:19 Yeah. Yeah. So, I mean, we've said this before, right? You and I have said this, but if the NSA or the NCSC, they come out and publicly call out Python 2 like this, well, that maybe carries more weight than Python Bytes. Not that we don't carry some weight, I'm sure. Yeah, it actually makes me think though, like let's say you have a library that now works on both Python 2 and 3 and somebody else is depending on it and
Starting point is 00:13:46 they're also depending on another library that is 2 only. They're going to stick with 2. Yeah. But for instance, you could push them if you stopped supporting Python 3, or Python 2. It's a good question. Like, in six months, do we have an obligation to actually cut Python 2 out of our libraries? I mean, I don't have any libraries people care about, but maybe to force people to upgrade. Maybe you could do some help. Yeah. Most of these changes have been more self-serving or self-centered, right? Like NumPy and Django, all those folks dropped Python 2 not because they're like, we're going to fix the world, but like, we don't want to maintain this stuff. We want to just move forward and use the cool features, and we can't right now.
Starting point is 00:14:26 Yeah. Yeah, pretty cool. I guess one other kind of interesting thing to call out from this report, article, whatever you call it, is that they said that Python's popularity makes updating the code imperative, which I thought was pretty interesting. It's like Python is so successful.
Starting point is 00:14:41 It's so broadly deployed. We can't just ignore this. It's not like Adobe Flash. It's now running an old version. We should deal with it, right? This is one of the really important parts of the computer infrastructure that they called out. Interesting. Yeah, I mean, there's a Hacker News lookalike site called news.python.sc. I don't know what SC stands for. Yeah, it looks a lot like Hacker News, but it's just got Python stuff on it, and it's pretty neat.
Starting point is 00:15:19 So I thought, oh, that's cool, we should talk about it. But one of the neat things about it is he put it all together relatively quickly in like a week or so. And it's built on Django. And all of it's open source. So you can take it and look at how it's done and everything. Plus it's up and it's live and you can post stuff. It's neat.
Starting point is 00:15:39 And I thought, yeah, maybe we'll cover this. And then while I was thinking about covering it, we got like two or three other people tell us about this new news site. So I think people are using it. It's kind of fun. What do you think? I like it. It definitely looks like Hacker News, but more Pythonic colors. And, you know, looking through this,
Starting point is 00:15:58 these are all legitimately interesting things here. I'm like, yeah, oh, yeah, I read about that. That was cool. And, oh, I didn't know about that, but interesting. Yeah, I feel like this is great. And even if it doesn't take off, I think it's cool to have an example of a working model of something simple, with people being able to vote things up and down. And that's kind of a neat model to say, there's a working website, a working user model, how can I emulate
Starting point is 00:16:24 that in Python? Yeah, super cool. I'm definitely going to start checking it out as one of my new sources in addition to Reddit and Twitter and other places. Yeah, like we don't have enough to do. I know. Now you just gave me work, man. Come on, it's homework.
Starting point is 00:16:36 So you've heard that most people are moving to the cloud and data science is moving to the cloud. There's all sorts of interesting stuff happening up there. But a lot of times this type of work, especially training machine learning models and stuff, is super, super intensive. So if you've got a laptop, some of the GPU processing and other really interesting things are inaccessible to you. Like, for example, my MacBook is super killer, but it's got, you know, like 12 cores if you count the hyperthreads, and it's got 32 gigs of RAM, but it has an ATI, not an NVIDIA, graphics card. So you can't use CUDA on it, for example, right? So what do I do? I go to the cloud. Well, if you're really into deep
Starting point is 00:17:18 learning and you really want to do like data science with GPUs, there's this place called Lambda, this company called Lambda that is creating these deep learning workstations, servers, and laptops. Have you heard about this? Huh? No. Just to be clear, this is like a super commercial product, right? These are like servers that you buy and we have no, this is not like a product placement. I just ran across this and thought, dang, this is interesting. So I thought I would just talk about it. So they're selling GPU accelerated TensorFlow, PyTorch, Keras, and other types of pre-configured machines. They say, just plug in and start training. You're good to go. And they talk about how you can save a bunch of money, right? You don't run on the cloud.
Starting point is 00:18:01 The cloud can save you money for short work, but if you got to do it over a long time, it can get expensive. So they offer a TensorBook, which is a GPU-training-capable laptop, for $2,900. That's a pricey laptop, right? Yeah. Actually, it's less expensive than my MacBook. But if you were going to do GPU stuff, you know, this is a really cool option to be able to do it on the go or be mobile. Then they also have a workstation, which is called Lambda Quad, which has four GPUs in it. And this one is $21,000. Okay.
Starting point is 00:18:36 That's a lot. But if you go and grab a second-tier GPU-enabled EC2 instance, specifically a p3.8xlarge, that's over $12 an hour, which comes out to close to $9,000 a month in cloud bills if you were to run it all the time. Obviously, probably not all the time.
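The cloud-versus-workstation arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in the episode. It assumes an always-on instance, a 30-day month, and a flat $12 hourly rate (the episode says "over $12", so this is a floor), and it ignores power, hosting, and depreciation for the workstation.

```python
HOURLY_RATE_USD = 12.0     # "over $12 an hour" in the episode, used as a floor
WORKSTATION_USD = 21_000   # quoted price of the four-GPU workstation
HOURS_PER_MONTH = 24 * 30  # assumption: always on, 30-day month

# Cost of leaving the cloud instance running for a full month.
monthly_cloud_cost = HOURLY_RATE_USD * HOURS_PER_MONTH

# Instance hours after which the cloud bill matches the workstation price.
break_even_hours = WORKSTATION_USD / HOURLY_RATE_USD
break_even_days = break_even_hours / 24

print(f"Monthly cloud cost: ${monthly_cloud_cost:,.0f}")
print(f"Break-even: {break_even_hours:,.0f} hours, about {break_even_days:.0f} days always-on")
```

Under these assumptions the monthly bill lands a little under the "close to $9,000" figure, and the workstation price is matched after roughly two and a half months of continuous use. At lower utilization the break-even point moves out proportionally.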
Starting point is 00:18:56 So $21,000 is a lot, but a $9,000 monthly bill for AWS is also a lot. Yeah, it's something to pay attention to as your bill starts getting bigger. Maybe a dedicated hardware makes sense. Anytime I run across something like this, if it were Alienware for gaming laptops or the Apple MacBook Pro or whatever,
Starting point is 00:19:14 it's like, all right, well, what if you're all in? What if you turn all the knobs to 11? What could you get? They have this thing called the Lambda Hyperplane, which has eight Tesla V100 GPUs. And it starts at, it's not the final price, it starts at $114,000. Oh, nice.
Starting point is 00:19:30 That's without the pinstriping. Yeah, exactly. That's not even the leather bound keyboard or whatever. I don't know. Anyway, if you're into deep learning and you need GPUs for computational stuff, data science and whatnot. This is actually pretty cool. Yeah.
Starting point is 00:19:49 Also, I'm sure there's applications where you really don't want to use the cloud. You want to use in-house computers and not go out, or the connection is bad. You're sticking some data in the middle of nowhere or something and you can't get to the internet. Right. If you got terabytes of data, that takes days to upload. So, you know, maybe it's better to just run it locally. Who knows? Black has been a big hit. And yeah, I like Black. Yeah, a lot of people do. Oh yeah. One of the
Starting point is 00:20:11 things... so I ran across an article. It's not a new article, but it's all still relevant. It's auto formatters in Python, and, big shock, Black's in there. But one of the things I liked about it is they spent a little bit of time talking about why you want to use Black or something else. And I'm finding this more and more as a team lead, that it's not great to have... like, if you're doing code reviews, you don't want to have style be part of the code review. Yeah. It's way better to have a tool just dictate the style, so people can argue with the tool instead of arguing with each other. Yeah, it's like in the code review, the people there, I'm sure they feel like, well, I have
Starting point is 00:20:51 to make a constructive or critical comment about something. It shouldn't be, why are you indenting like that? Or why is there not a space by those commas? Like, that's the stuff machines can agree upon and just do for us, right? Like, have architectural or algorithmic conversations, right? Yeah, you should be using three double quotes there instead of one. So get off the style police and use an auto formatter instead. I love black.
Starting point is 00:21:13 A lot of people do. But there's reasons for some people, like an established code base or other predefined style guide, that maybe it's too much. It does do things that sometimes I don't like it to do. So there's a couple other options. And this article talks about autopep8 and YAPF. Now autopep8 essentially just does PEP 8, or uses pycodestyle, the utility, to detect PEP 8 violations and just change the code. You can do both with it. It does less than Black, but it doesn't do much more. So if really you're just trying to stick to PEP 8, maybe that'd be better to use. And the other end of it is YAPF, which is a
Starting point is 00:21:59 tool out of... and I don't know how to say that. Yapf? Yep. It's probably Yet Another Python Formatter. Yeah, it probably is. It's a Google tool. I think it's cool. I think it's good if you want... it's got a lot of knobs and dials, a lot of customization. So if Black doesn't have enough controls for you and you really want to tweak it to be your personal company's code style, this might be great for you.
Starting point is 00:22:25 In the documentation, it says it takes away some of the drudgery in maintaining your code, and the ultimate goal is that the code it produces is as good as the code that a programmer would write if they were following the style guide. That sounds pretty good, honestly. One of the interesting things when I was researching this story is, I didn't know this about Black: after it's changed your code, it does a check to see if the reformatted code still produces a valid abstract syntax tree that is equivalent to the original. That's pretty cool. I didn't know it did that. Yeah, that is cool. Like, so running it through the Python parser and turning it into bytecode and then just seeing if the essence is the same, which, yeah, I mean,
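The idea behind that safety check can be sketched with the standard library's ast module. This is not Black's actual implementation, which handles more cases. It is a minimal illustration, with a made-up helper name same_ast, of comparing the syntax trees of the code before and after formatting. Only parsing is involved here, no bytecode.

```python
import ast

def same_ast(before: str, after: str) -> bool:
    """Return True if both sources parse to the same syntax tree.

    ast.dump omits line and column numbers by default, so differences
    in whitespace, line breaks, and quote style do not affect the result.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))

original = "x = {  'a':1,\n   'b':2 }\n"
reformatted = 'x = {"a": 1, "b": 2}\n'   # layout and quotes changed only
altered = 'x = {"a": 1, "b": 3}\n'       # a value changed: different meaning

print(same_ast(original, reformatted))
print(same_ast(original, altered))
```

The first comparison reports the two sources as equivalent and the second does not, which is the property a formatter wants to guarantee: layout may change, the program may not.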
Starting point is 00:23:06 because you don't actually want to change the meaning of the way the code actually gets interpreted. It's just formatting, right? So if the meaning changes, it's like, well, that might be a problem. Yeah. The other thing I wanted to highlight this article for is it took a few code examples
Starting point is 00:23:17 and just did the, what does Black change it into? And what does YAPF change it into? And what does autopep8 change it into? Oh, that's sweet. I like that. Yeah. Very, very cool. All right, well, that's all of our main items. You got anything else you want to throw out there while we're here? No, you? Yes, a couple things. I'm getting excited for PyCon US. It's earlier this year, in April at some point, I'm guessing. But the announcement I want to
Starting point is 00:23:40 make is that the applications for financial aid are open, and they'll be accepting them through January 31st, 2020. So 30 days into a world with only Python 3. The Python Software Foundation and PyLadies are making this financial aid possible, and check it out. Yeah, so like PSF is contributing $130,000 towards that. And yeah, it's pretty good. So if you're thinking, hey, I would really love to go to PyCon and make some connections, kind of new to this world, use some networking and learn more about it, but I just can't justify the expense or afford it, maybe do that. Yeah, nice. Indeed,
Starting point is 00:24:14 indeed. And I'm working on some new courses. I got one that's all done and recorded, just getting edited. Another one, I spent like six contiguous hours recording videos yesterday. That doesn't sound like a lot of time if you haven't done it, but six straight hours recording, that's a lot. So I'm really, really excited about what's coming out. We'll share more with it when I can. Very exciting. Oh, yeah.
Starting point is 00:24:33 Now, sometimes we have really short jokes. I see that you have one. We got a short joke that was contributed by Eric Nelson. Thanks, Eric. It is a math joke. The joke is, "i is as complex as it gets. jk." The letter i.
Starting point is 00:24:48 Yeah. I love it. I love it. I studied a bunch of complex analysis and things like that when I was doing math. And yeah, I like it. Yeah. We have another one too that it's long. It's long and I'm not going to be able to do justice to it.
Starting point is 00:25:00 So you have to check this out. So you know the song American Pie, right? Yes. "I drove my Chevy to the levee, but the levee was dry." That sort of song. Yeah. You can sing it. No, no, I can't sing it. I could recite it. If I sing it, it's not going to be singing. It's always something else. There's another one. One of our listeners, I only know his username on Reddit, I'm afraid, I can't find the tweet in time. Anyway, he said, hey, you inspired me to write this song
Starting point is 00:25:25 called American Pie, American P-Y. And it's basically the story of like legacy Python done to American Pie, the song. Yeah, it's pretty awesome. It's really, really well done. I'll just like recite a little bit here, one of the verses. So bye-bye to your legacy pies,
Starting point is 00:25:44 made decisions about division, so you'll have to revise. And Unicode is official, it's not a bunch of byte lies. Singing, that'll be the day it dies. That'll be the day it dies. It's really good. Yeah. People should check it out. If somebody can perform this and give it to us, he's given us permission to take that and put it on the air. If it's good enough, man, we'd love it. That'd be awesome. I cannot do this. I want somebody to sing it, because it includes the phrase "I was a crusty old fart coding guy." Yes, I know. You could be a YouTube sensation if you just take this chance here. Jump on it. Yes. And if you do, let us know. Yeah, for sure, let us know. That'd be awesome. All right. Well, yeah, this is really a nice song and a
Starting point is 00:26:24 nice, nice job there. Well done on that. And Brian, thanks for everything. Thanks for being here. Thank you. Yep. You bet. Bye. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
