Python Bytes - #153 Auto format my Python please!
Episode Date: October 23, 2019
Topics covered in this episode: Building a Python C Extension Module; What’s New in Python 3.8 - docs.python.org; UK National Cyber Security Centre (NCSC) is warning developers of the risks of sticking with Python 2.7, particularly for library writers; Pythonic News; Deep Learning Workstations, Servers, Laptops, and GPU Cloud; Auto formatters for Python; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/153
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 153, recorded October 16, 2019.
I'm Brian Okken.
And I'm Michael Kennedy.
This week's episode is sponsored by DigitalOcean.
We'll talk more about them later.
But first, Michael, could you extend my knowledge a bit?
Yeah, by like extending the entire Python ecosystem, maybe?
Yeah. Yeah. So there's actually a cool Real Python article
called Building a Python C Extension Module.
So Brian, you know how to write C code, right?
Yes.
Or at least, in theory. I used to know how.
Yeah.
I have this really awesome former self of mine
that was super good at C++.
I kind of remember that person that I was.
I used to be able to write a lot of C.
That was my main job: writing C and 3D stuff and OpenGL and things like that, right? So C is definitely
the main way to extend Python these days. There are other options, like some cool
Rust options and whatnot. But primarily people know C, it runs everywhere, and it has light runtime
requirements. You're probably already running CPython,
so you already have those requirements met, right?
So extending your code with some kind of C extension gives you a couple of options.
One is clearly performance.
I love to talk about Python performance
because one, it always surprises me,
and two, like people are usually wrong about it.
They say Python is slow.
Like, I was just reading something on Quora comparing C# to Python, and somebody said, well, you can't
even compare them, C# is 50 times faster. Well, that's true for certain operations, but if
part of the work is done in C, then Python is probably faster, because now it's
down in something like NumPy doing it in C, which is actually faster, right? It gets
really interesting. So one reason you might care about writing a C module is just performance. And I think that's
what most people think of, but there are also low-level operating system APIs or other C APIs,
like some library you might want to use that happens to be written in C and doesn't
have a Python way to talk to it, right?
Yeah, there's lots of stuff in DLLs that's available with C header files, but you
don't have a Python binding.
Exactly.
And I bet you have a lot of experience with that with all of your devices and stuff like
that.
Yep.
Yeah.
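One alternative for the case Brian describes, a C library with a header file but no Python binding, is the standard library's ctypes module, which the episode doesn't cover. It calls the C function without compiling an extension. This minimal sketch assumes a POSIX system (Linux or macOS), where ctypes.CDLL(None) exposes symbols already loaded into the Python process, including the C library's strlen; on Windows you would load a specific DLL by name instead. The wrapper name c_strlen is made up for illustration.

```python
import ctypes

# Assumption: a POSIX platform. CDLL(None) wraps the running process itself,
# so C library functions such as strlen are visible through it.
libc = ctypes.CDLL(None)

# Declare the C signature, which is what the header file would tell you:
#     size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical helper: call C strlen on the UTF-8 bytes of ``text``.

    strlen counts bytes, not characters, so non-ASCII text counts
    more than once per character.
    """
    return strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("Python Bytes"))  # 12
```

The argtypes and restype lines play the role of the header file. Get them wrong and you get garbage or a crash instead of a Python exception, which is part of why a compiled extension like the one in the article can be the sturdier choice.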
Okay, so those are the two main reasons I can think of writing C extensions.
I mean, obviously, if it's a performance thing, throw some Cython at it and give it a try.
But there's a cool tutorial on Real Python.
It covers things you'll be able to do, like import C functions within Python, pass arguments from Python to C, and raise correct exceptions in your C code,
so they bubble back up into your Python code as a proper ValueError-type exception or something like that.
All sorts of cool things,
and even how to test and distribute it. So let me just sort of talk through the process, and if
people really care, they can go read the big long article, right? So suppose you want to basically get
access to some C functionality, or you want to write part of your implementation in
C. First thing you've got to do is figure out which C function you want to call, right? The article uses fputs, which puts a string into a file
pointer, right? It basically writes a string to a file in C. So you have to write a function,
which is pretty interesting, because you have to start talking in the CPython API,
not Python, right? So everything
that gets passed around is a Python object pointer, and the return value is a PyObject
pointer, right? So you pass these things around. First of all, you declare
whatever inbound arguments you're really expecting. And you get passed basically a single pointer
that holds the arguments to your function, but it's really a tuple.
So there's PyArg_ParseTuple: you give it the arguments, a format string, and the addresses of your variables.
You pass them by reference, basically.
And then you just write your C code.
In this case, the function they're wrapping, fputs, returns the number of bytes copied when
it does that.
And so this function wants to return the bytes copied,
but you can't just return an integer or a long. No, no, because everything in Python is a PyObject
at the C level, a PyObject star, even numbers. So you have to convert the long with PyLong_FromLong,
which is a function you get from the Python.h C header file.
Okay, it's actually pretty simple. There's some weird, non-obvious structure
at the beginning of the function
so that it can be called by Python.
And the return value is weird,
but everything else in the middle is straight C.
So you don't really have to think about what's going on.
The GIL will protect you from race conditions,
all that kind of stuff.
Yeah, and actually one of the things I love about this article
is that it uses a fairly simple example,
so you're not distracted by the example. It's just the boilerplate junk
that you've got to learn about.
Yeah, absolutely. Which is probably the thing you don't know,
even if you know C, right?
Yeah.
It also says there are a few other things that are necessary
if you actually want to use this code and not just write it and compile it in C.
You have to write a definition for your module in C
and the methods that it contains.
So there are a few C functions that you call there.
And then you have to build it for Python:
you basically create a setup.py file
and use distutils, and it will compile it
into the right shared library and install it for you.
Okay.
Pretty cool, huh?
Pretty cool, yeah.
One of the issues with this is that, a lot of times, the person who needs to do this isn't a hardcore C compiler person
or a hardcore CPython person. It's just your casual user who happens
to have a use case where they need to connect Python to C, and so this is great.
Yeah, and it's super approachable, and like you said, the examples are
pretty straightforward. Obviously you're writing C, which puts you in a different category of hard,
right? I mean, free, malloc, pointers, passing by reference, all that kind of stuff that
you learn when you learn C. But that's the world you've got to live in when you go down there and take
the blue pill or whatever it is. Is the blue one the good one? I think...
No, I always forget. I know there's a pill that's good and there's a pill that's, like, bad,
that keeps the facade. But yeah, probably the... I don't know.
Do you know what else is good?
Documentation?
Documentation, no. Python 3.8. Python 3.8 is good, but also... Python 3.8. Or even the URL, sorry.
Python 3.8 dropped just this week.
So it is no longer beta.
It is final and you can download it from the front page.
The default is Python 3.8.0 now when you download it.
So yay.
Yes, that's awesome.
We've talked about a lot of stuff on this podcast that's going into 3.8, like the walrus operator, of course; that's come up a lot of times. Those are assignment expressions,
positional-only parameters, and f-strings. F-strings have the little equal sign so you can debug with
them more easily.
Right, f-strings have been here since 3.6, but now they have this self-documenting,
short print statement thing, right?
Yeah, and it takes longer to describe than to show, and it's cool.
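A quick sketch of the three 3.8 features just mentioned. It needs Python 3.8 or later, and the function names count_words and scale are made up for illustration.

```python
import re


def count_words(text):
    # Assignment expression (the "walrus operator", PEP 572):
    # bind n and test it in the same expression.
    if (n := len(re.findall(r"\w+", text))) > 3:
        return f"long: {n} words"
    return f"short: {n} words"


def scale(value, factor, /, *, offset=0):
    # Positional-only parameters (PEP 570): value and factor cannot be
    # passed by keyword, and offset must be passed by keyword.
    return value * factor + offset


episode = 153
# Self-documenting f-string: "{episode=}" renders as "episode=153".
print(f"{episode=}")
print(count_words("Auto format my Python please"))  # long: 5 words
print(scale(2, 3, offset=1))  # 7
```

Calling scale(value=2, factor=3) raises a TypeError, which is the point of the / marker: the parameter names stay private, so a library can rename them later without breaking callers.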
What I wanted to highlight is the
What's New in Python 3.8 document
that's at docs.python.org,
and it's a really great summary
of all the stuff that's in 3.8.
It does have all of those new things,
all those big hitters,
but it also has some stuff that I was surprised by,
that I hadn't heard of
before. One of them: we've talked about a lot of async stuff, and now you can type python -m
asyncio and it launches an async-native REPL. That is so cool, and I had no idea that was there.
I guess it would have been a pain in the butt before to work with async stuff
over there in the REPL, right?
Yeah, I guess. I often drop into the
REPL to try something out, and now you can try out async stuff in there. So that's cool.
Yeah,
that's super cool.
A couple other things that'll just help you while writing Python:
a couple new warnings and messages for things that you might do wrong. You're not supposed to use is or
is not to compare against literals like strings or integers. It's like, if x is
3, don't do that. Before, nothing warned you about it, and now the compiler gives you a SyntaxWarning.
It tells you to use double equals or not equals. So that's cool. And then one thing that
I often run into, because I write a lot of parameterized tests, is if you've got a list with tuples inside, or basically a list of lists or a tuple of tuples, and you forget the commas between some of the items, maybe because they're on a new line or something.
The error was weird before, but now there's a more helpful message. So I love things like that.
Yeah. You know, it drives me crazy if those are strings, like if you're creating a JSON document or something, or a multi-line list of strings, and you forget a comma:
it just concatenates them, even though they're on separate lines. I'm like, oh, really? That's
the default behavior? I understand where it comes from, but it drives me crazy.
That's probably still there.
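The two warnings Brian mentions, plus the string case Michael describes, can be reproduced by compiling small snippets while capturing warnings. This is a minimal sketch assuming Python 3.8 or later; compile_warnings is a made-up helper, and because the exact warning text varies between versions, it simply returns whatever the compiler reports.

```python
import warnings


def compile_warnings(source):
    """Hypothetical helper: compile ``source`` and return SyntaxWarning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)]


# Identity check against a literal: the compiler suggests "==" instead.
print(compile_warnings("x = 3\nif x is 3:\n    pass\n"))

# A missing comma between tuples: (1, 2) (3, 4) parses as a call,
# so the compiler asks whether you missed a comma.
print(compile_warnings("pairs = [(1, 2) (3, 4)]\n"))

# Adjacent string literals are joined at compile time without complaint.
words = [
    "legacy"   # missing comma here
    "python",
]
print(words)  # ['legacypython']
```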
Yeah. I don't see how you would fix that without changing what it means.
Yeah. This one took me a while to get my head around, but I didn't
know this was an issue before: iterable unpacking. So if you packed a bunch of
stuff into a variable, you can unpack it with star variable name. You couldn't do that in a bare return
statement before, so you had to put
parentheses around the tuple before you returned it. But now that restriction is gone; you can just return it.
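A minimal illustration of the relaxed unpacking rule (Python 3.8+). Before 3.8, both unparenthesized forms below were syntax errors; the function names are made up for illustration.

```python
def head_and_tail(first, *rest):
    # Since Python 3.8, starred unpacking works directly in a bare
    # return tuple. Earlier versions required parentheses:
    #     return (first, *rest)
    return first, *rest


def prefixed(values):
    # The same relaxation applies to yield.
    yield 0, *values


print(head_and_tail(1, 2, 3))       # (1, 2, 3)
print(next(prefixed([4, 5])))       # (0, 4, 5)
```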
Yeah, there's a lot of good stuff in here, actually. And you just did an episode on it, didn't you?
Yep, episode 91 of Test & Code. I just read through the article, and it's still just 20
minutes. I didn't read through everything, but I highlighted all the stuff that I thought was cool.
Super.
You know, something else that's cool is DigitalOcean.
I love DigitalOcean.
This episode is sponsored by DigitalOcean.
And Python Bytes' infrastructure runs on DigitalOcean, thanks to Michael putting that all together.
And it's quite solid, and we're super happy with it.
But did you know that not all web applications and services have the same memory and CPU demands?
It's shocking, isn't it?
Shocking.
Anyway, DigitalOcean has embraced this diversity in their droplet lineup, which is cool, by offering different ratios of memory to CPU power.
The general purpose droplets have a ratio of four gigabytes of memory per CPU, and you can scale those up.
A while ago, a couple of years ago,
I think, they added CPU-optimized ones. Those double the number of CPUs for the same amount of memory,
and that's great for CPU-bound tasks. But there are some applications, like high-performance databases,
in-memory caches, or processing large data sets, where a lot of memory is a really great
thing. So there's now a memory-optimized droplet
that reverses that structure
and makes it like eight gigabytes of memory per CPU.
It's pretty cool.
Yeah, very cool.
Yeah, use the right kind of droplet
for the right service that you're using
and try it out at pythonbytes.fm/digitalocean,
and they'll give you a $50 credit for new users.
You and I have mentioned that folks should put legacy Python where it belongs: in the past.
Last time we spoke about 35 million lines of Python code at JPMorgan Chase
and their journey to work on that.
And that's all interesting.
But we just recently got this announcement from the UK's NCSC,
the National Cyber Security Centre.
Wow.
Yeah. And they're
warning developers of the risk of sticking with Python 2, particularly library writers. Okay.
That's interesting, right? That they actually go so far as to call that out as a thing. So they say,
look, companies that continue to use Python 2 past its end of life could be setting the stage for another WannaCry or even an Equifax incident.
So Equifax was a horrible data breach.
Basically, it's one of these companies that gathers up so much private data.
They know stuff about my financial past that I have forgotten and don't even know.
They're like, oh, did you know you had this account in California?
Like I did.
Oh, okay.
Well, I guess I do.
Right.
They know all of that.
And it was broken into.
Why?
Because there was a vulnerability in Apache Struts, which is an open source framework.
The people at Struts were like, guys, this is super bad.
You just have to send a specially crafted HTTP request to the server and it's owned.
Right.
Well, the folks at Equifax got
that message, but they didn't really get around to upgrading to the new version
because, hey, it's kind of hard to upgrade this thing; the new version was probably
slightly incompatible or something. Anyway, that's where the Equifax breach came from: running
an old version of one of these frameworks, not Java itself, but the web framework on top of it. Anyway, there are some cool quotes in here. They say, if you're still using Python 2.x,
it's time to port your code to Python 3. If you continue to use unsupported modules,
you are risking the security of your organization and data as vulnerabilities will sooner or later
appear, which nobody's fixing. Okay, that's one. One interesting quote. Another one is, if you maintain a library that other developers depend upon,
you may be preventing them from updating to Python 3.
And by holding back other developers, you're indirectly, and likely
unintentionally, increasing the security risk of basically all the computers in the world.
Yeah.
Yeah.
So, I mean, we've said this before, right?
You and I have said this, but if the NSA or the NCSC comes out and publicly calls
out Python 2 like this, well, that probably carries more weight than Python Bytes.
Not that we don't carry some weight, I'm sure.
Yeah, it actually makes me think, though. Let's say you have a library that now works
on both Python 2 and 3, and somebody else is depending on it, and
they're also depending on another library that is Python 2 only. They're going to stick with 2.
Yeah.
But you could push them, for instance, if you stopped supporting Python 2. It's a
good question: in six months, do we have an obligation to actually cut Python 2 out of our libraries? I mean, I don't have
any libraries people care about, but if it forces people to upgrade, maybe that could help.
Yeah, most of these changes have been more self-serving or self-centered, right? Like NumPy
and Django, all those folks dropped Python 2 not because they're like, we're going to fix the world, but
like, we don't want to maintain this stuff, we want to just move forward and use the cool features,
and we can't right now.
Yeah.
Yeah, pretty cool.
I guess one other kind of interesting thing to call out
from this report, article, whatever you call it,
is that they said that Python's popularity
makes updating the code imperative,
which I thought was pretty interesting.
It's like Python is so successful.
It's so broadly deployed.
We can't just ignore this. It's not like Adobe Flash. If it's running an old version, we should deal with it, right? This is one of the really important parts of the computing infrastructure, and they called that out.
Interesting.
Yeah. There's a Hacker News lookalike site
called news.python.sc.
I don't know what SC stands for.
Yeah, it looks a lot like Hacker News,
but it's just got Python stuff on it, and it's pretty neat.
So I thought, oh, that's cool, we should talk about it.
But one of the neat things about it is he put it all together
relatively quickly in like a week or so.
And it's built on Django.
And all of it's open source.
So you can take it and look at how it's done and everything.
Plus it's up and it's live and you can post stuff.
It's neat.
And I thought, yeah, maybe we'll cover this.
And then while I was thinking about covering it, two or three other people told us about this new news site.
So I think people are using it.
It's kind of fun.
What do you think?
I like it.
It definitely looks like Hacker News, but more Pythonic colors.
And, you know, looking through this,
these are all legitimately interesting things here.
I'm like, yeah, oh, yeah, I read about that.
That was cool.
And, oh, I didn't know about that, but interesting.
Yeah, I feel like this is great. And even if it doesn't
take off, I think it's cool to have a working example of a
simple site where people can vote things up and down.
And that's kind of a neat model to say,
there's a working website, a working user model; how can I emulate
that in Python?
Yeah, super cool.
I'm definitely going to start checking it out as one of my new sources in addition to
Reddit and Twitter and other places.
Yeah, like we don't have enough to do.
I know.
Now you just gave me work, man.
Come on, it's homework.
So you've heard that most people are moving to the cloud and data science is moving to
the cloud.
There's all sorts of interesting stuff happening up there.
But a lot of times this type of work, especially training machine learning models and stuff, is super, super intensive. So if you've got a laptop, some of the GPU processing
and other really interesting things are inaccessible to you. For example, my MacBook is super killer,
but it's got, you know, 12 cores if you count the hyperthreads,
and it's got 32 gigs of RAM, but it has an ATI, not an Nvidia, graphics card, so you can't use
CUDA on it, for example, right? So what do I do? I go to the cloud. Well, if you're really into deep
learning and you really want to do data science with GPUs, there's this company called Lambda
that is creating these deep learning workstations, servers, and laptops. Have you heard about this?
Huh? No.
Just to be clear, this is like a super commercial product, right? These are like
servers that you buy and we have no, this is not like a product placement. I just ran across this
and thought, dang, this is interesting. So I thought I would just talk about it. So they're selling GPU accelerated TensorFlow, PyTorch, Keras, and other types of
pre-configured machines. They say, just plug in and start training. You're good to go.
And they talk about how you can save a bunch of money, right? You don't run on the cloud.
The cloud can save you money for short work, but if you've got to do it over a long time, it can get expensive. So they offer a TensorBook, which is a GPU-training-capable laptop,
for $2,900. That's a pricey laptop, right?
Yeah. Actually, it's less
expensive than my MacBook. But if you were going to do GPU stuff, you know, this is a really
cool option to be able to do it on the go or be mobile.
Then they also have a workstation,
which is called Lambda Quad, which has four GPUs in it.
And this one is $21,000.
Okay.
That's a lot.
But if you go and grab a second-tier GPU-enabled EC2 instance,
specifically a p3.8xlarge,
that's over $12 an hour,
which comes out to close to $9,000 a month
in cloud bills
if you were to run it all the time.
Obviously, probably not all the time.
So $21,000 is a lot,
but a $9,000 monthly bill for AWS is also a lot.
Yeah, it's something to pay attention to
as your bill starts getting bigger.
Maybe dedicated hardware makes sense.
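Rough break-even arithmetic using only the numbers quoted above: a $21,000 workstation versus about $12 an hour in the cloud. This sketch assumes 730 hours in a month and ignores power, maintenance, resale value, and reserved or spot pricing, so treat it as an order-of-magnitude check rather than a quote.

```python
# Figures quoted in the episode (late 2019); real prices vary by region and over time.
WORKSTATION_PRICE = 21_000   # Lambda Quad, USD, one-time purchase
CLOUD_RATE_PER_HOUR = 12.0   # p3.8xlarge on demand, "over $12 an hour"
HOURS_PER_MONTH = 730        # assumption: average hours in a month (8760 / 12)


def monthly_cloud_cost(utilization=1.0):
    """Cloud bill for one month at the given fraction of hours in use."""
    return CLOUD_RATE_PER_HOUR * HOURS_PER_MONTH * utilization


def breakeven_months(utilization=1.0):
    """Months of cloud usage that cost as much as buying the workstation."""
    return WORKSTATION_PRICE / monthly_cloud_cost(utilization)


for use in (1.0, 0.5, 0.25):
    print(f"{use:.0%} use: ${monthly_cloud_cost(use):,.0f}/month, "
          f"break-even after {breakeven_months(use):.1f} months")
```

At full-time use the cloud bill is $8,760 a month and passes the workstation price after about 2.4 months; at 25% utilization it takes about 9.6 months.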
Anytime I run across something like this,
if it were Alienware for gaming laptops
or the Apple MacBook Pro or whatever,
it's like, all right, well, what if you're all
in? What if you turn all the knobs to 11?
What could you get?
They have this thing called the Lambda Hyperplane,
which has eight
Tesla V100 GPUs.
And it starts at, it's not the final price, it starts at $114,000.
Oh, nice.
That's without the pinstriping.
Yeah, exactly.
That's not even the leather bound keyboard or whatever.
I don't know.
Anyway, if you're into deep learning and you need GPUs for computational stuff,
data science and whatnot.
This is actually pretty cool.
Yeah.
Also, I'm sure there's applications where you really don't want to use the cloud.
You want to use in-house computers and not go out or the connection is bad.
You're sticking some data in the middle of nowhere or something and you can't get to the internet.
Right.
If you got terabytes of data, that takes days to upload.
So maybe it's better to just run it locally.
Who knows?
Black has been a big hit.
Yeah, I like Black.
Yeah, a lot of people do. So I ran across an article. It's not a new article, but it's all still relevant. It's
Auto formatters for Python, and big shock, Black's in there. But one of the things I liked about it
is that they spent a little bit of time talking
about why you'd want to use Black or something else. And I'm finding this more and
more as a team lead: if you're doing code reviews, you don't
want style to be part of the code review. It's way better to have a tool dictate
the style, so people can argue with the tool instead of arguing with each other.
Yeah, it's like if the code review, the people there, I'm sure they feel like, well, I have
to make a constructive or critical comment about something.
It shouldn't be, why are you indenting like that?
Or why is there not a space by those commas?
Like, that's the stuff machines can agree upon and just do for us, right?
Like, have architectural or algorithmic conversations, right?
Yeah, you should be using three double quotes there instead of one.
So stop being the style police and use an auto formatter instead.
I love Black.
A lot of people do.
But for some people, like those with an established code base or another predefined style guide, maybe it's too much.
It does do things that sometimes I don't like it
to do. So there are a couple other options, and this article talks about autopep8 and YAPF.
autopep8 essentially just does PEP 8: it uses pycodestyle, the utility,
to detect PEP 8 violations and then changes the code. You can do both with it.
It does less than Black; it doesn't do much more than that. So if you're really just trying to
stick to PEP 8, maybe that'd be better to use. And at the other end of it is YAPF, which is a
tool out of... and I don't know how to say that. Yap-f?
Yep. It's probably "yet another Python formatter."
Yeah, it probably is.
It's a Google tool.
I think it's cool.
I think it's good if you want.
It's got a lot of knobs and dials, a lot of customization.
So if Black doesn't have enough controls for you and you really want to tweak it to match your company's code style, this might be great for you.
In the documentation, it says it takes away some of the drudgery of maintaining your code, and the ultimate goal is that the code it produces is as good as
the code a programmer would write if they were following the style guide. That sounds pretty
good, honestly. One of the interesting things I found while researching this story: I
didn't know this about Black. After it's changed your code, it does a check
to see if the reformatted code still produces a valid abstract syntax tree that is equivalent
to the original.
That's pretty cool. I didn't know it did that.
Yeah, that is cool. So it runs the code
through the Python parser, turns it into a syntax tree, and then just sees if the essence is the same,
which, yeah, makes sense,
because you don't actually want to change the meaning
of the way the code actually gets interpreted.
It's just formatting, right?
So if the meaning changes,
well, that might be a problem.
Yeah.
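A minimal sketch of that kind of safety check using the standard library's ast module. It is not Black's actual implementation (Black also normalizes a few things, such as docstring whitespace, before comparing); the helper same_ast is made up to show the idea that layout-only changes leave the syntax tree unchanged.

```python
import ast


def same_ast(original: str, formatted: str) -> bool:
    """Hypothetical helper: True if both sources parse to identical ASTs.

    ast.dump omits line and column attributes by default, so pure layout
    changes (spacing, line breaks, quote style) do not affect the result.
    """
    try:
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(formatted))
    except SyntaxError:
        # A formatter that produces unparsable code has definitely failed.
        return False


print(same_ast("x=[1,2]", "x = [1, 2]"))   # True: only spacing changed
print(same_ast("f('a')", 'f("a")'))        # True: quote style is not in the AST
print(same_ast("x = 1", "x = 2"))          # False: the meaning changed
```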
The other thing I wanted to highlight this article for
is it took a few code examples
and just showed:
what does Black change it into?
And what does YAPF change it into?
And what does autopep8 change it into?
Oh, that's sweet. I like that.
Yeah, very, very cool.
All right, well, that's all of our main items. You got anything
else you want to throw out there while we're here?
No, you?
Yes, a couple things. I'm getting excited for
PyCon US. It's earlier this year, in April at some point, I'm guessing. But the announcement I want to
make is that the applications for financial aid are open, and they'll be accepting them through January 31st, 2020.
So 30 days into a world with only Python 3.
The Python Software Foundation and PyLadies are making this financial aid possible,
and check it out.
Yeah, so like PSF is contributing $130,000 towards that.
And yeah, it's pretty good.
So if you're thinking, hey, I would really love to go to PyCon and make some connections, I'm kind of new to this world, I'd like to do some networking and learn more
about it, but I just can't justify the expense or afford it, maybe do that.
Yeah, nice.
Indeed, indeed. And I'm working on some new courses. I've got one that's all done and recorded, just getting
edited. For another one, I spent six contiguous hours recording videos yesterday. That doesn't
sound like a lot of time if you haven't done it,
but six straight hours of recording, that's a lot.
So I'm really, really excited about what's coming out.
I'll share more about it when I can.
Very exciting.
Oh, yeah.
Now, sometimes we have really short jokes.
I see that you have one.
We got a short joke that was contributed by Eric Nelson.
Thanks, Eric.
It is a math joke.
The joke is, I is as complex as it gets.
J-K.
A letter I.
Yeah.
I love it.
I love it.
I studied a bunch of complex analysis and things like that when I was doing math.
And yeah, I like it.
Yeah.
We have another one too, but it's long.
It's long and I'm not going to be able to do justice to it.
So you have to check this out.
So you know the song American Pie, right?
Yes.
Drove my Chevy
to the levee, but the levee was dry. That sort of song.
Yeah, you can sing it.
No, no, I can't sing it. I could recite it. If I sing it, it's not going to be singing, it'll be something else. So
one of our listeners, I only know his username on Reddit, I'm afraid, and I can't find the
tweet in time, anyway, said, hey, you inspired me to write this song
called American Pie, American P-Y.
And it's basically the story of like legacy Python
done to American Pie, the song.
Yeah, it's pretty awesome.
It's really, really well done.
I'll just like recite a little bit here,
one of the verses.
So bye-bye to your legacy Py,
made decisions about division, so
you'll have to revise, and Unicode's official. It's not a bunch of bytes. Singing, that'll be
the day it dies. That'll be the day it dies. It's really good.
Yeah. People should check it out.
If somebody can perform this and give it to us, he's given us permission to take that and put it
on the air. If it's good enough, man, we'd love it. That'd be awesome. I cannot do this.
I want somebody to sing it, because it includes the phrase "I was a crusty old fart coding guy."
Yes, I know. You could be a YouTube sensation if you just take this chance here and jump on it.
Yes, and if you do, let us know.
Yeah, for sure, let us know. That'd be awesome. All right, well, yeah, that's a really nice song, and
nice job there. Well done
on that. And Brian, thanks for everything. Thanks for being here.
Thank you.
Yep. You bet. Bye.
Bye.
Thank you for listening to Python Bytes. Follow the show on Twitter at Python Bytes. That's
Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news
item you want featured, just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
This is Brian Okken, and on behalf of myself and Michael Kennedy,
thank you for listening and sharing this podcast with your friends and colleagues.