Python Bytes - #47 PyPy now works with way more C-extensions and parking your package safely
Episode Date: October 12, 2017
Topics covered in this episode: WTF Python?; Python Exercises; Exploiting misuse of Python's "pickle"; A Complete Beginner's Guide to Django; pypi-parker; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/47
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 47, recorded October 11th, 2017.
I'm Michael Kennedy.
And I'm Brian Okken.
And we've got a bunch of cool stuff lined up for you.
So, hey Brian, how's it going?
It's going really good.
Yeah, yeah, great.
Hey, before we get to your first item, I want to say thanks to DigitalOcean.
They've sponsored a bunch of episodes coming up and they're really supporting the show.
And the thing they want me to tell you about is Spaces, which is like Amazon S3, but like literally three times better.
And you get a two-month trial.
So check it out at do.co slash Python, and we'll talk more about that later.
How about fast?
Fast Python, Brian.
What do you think?
I'm excited.
So PyPy is a fast implementation of Python, and it's good to see that there's still work coming out.
And one of the exciting bits of news just recently is that version 5.9, at least the PyPy 2.7 version of this release, now supports Pandas and NumPy, which is super exciting.
That's actually a really big deal because they had not been supported.
That's one of the things that was a challenge with PyPy.
Like it was great.
It was much faster in many ways.
It was like five times faster than regular CPython.
However, it didn't support a lot of the C extensions.
You couldn't integrate things like NumPy and stuff.
So it was like you get a subset of Python that's super fast, but there might be things you want to do that you can't.
And, oh, by the way, a lot of those are computational libraries, exactly where people care about speed.
Yeah.
So it's awesome to see that coming on.
So NumPy and Pandas are coming on board, and I'm sure that eventually they'll come to the 3.5
branch as well.
Yeah, for sure. And you also have notes about Cython as well, right?
Yeah. Part of what helps with this is that the release includes Cython 0.27.1, which supports a lot more Cython projects on PyPy.
I'm not sure what the Cython story was before this release, but that's pretty exciting.
Yeah, that's cool. The other part is that CFFI has been updated, and the C-API extensions for many, many projects
now work with PyPy, whereas previously they did not.
And so it's not just Pandas and NumPy.
Those are the headline ones.
But there's a bunch of things that previously couldn't work with PyPy
because of the C extensions.
Well, guess what? Now they can. That's pretty awesome.
Yeah, and then another bit of news with this release
is a JSON parser optimized
for both memory and speed, which should help people trying to pull in JSON. So that's good.
Yeah, that's awesome. I think people use JSON every now and then. Not really sure.
With all the microservices, the network is just buzzing with those JSON messages.
So that's really cool. And that's all pretty straightforward. I want to show you some stuff that is not straightforward. There's this project on GitHub that has really taken off, and a ton of people are contributing to it. Let me pull up the main page and see: there are 17 contributors doing a lot of work on this project, and it has about 3,600 stars. It's called WTF Python. If you've seen the Wat video about JavaScript and Ruby, which is hilarious, you know Python is lucky in that there aren't that many weird edge cases, but this repository shows you that there actually are some weird cases. So have you seen this, Brian?
No, I haven't. This is pretty funny.
Yeah. I pulled out four items, but there's a bunch, and this is super active on GitHub. I'm getting all these notifications from it, which is cool. Like, one is about skipping lines. You say value equals 11, then value equals 32. What is value? It's 11. Huh, what is going on here? There's another one in the same section that's similar: 'e' == 'e' gives False. And things like that. It's about encoding and some interesting stuff. Each one of these has a really simple, you know, three or four lines of code, and then the explanation, and the explanation, I think, is where this gets interesting. So another one is modifying dictionaries. These are super good ways to trick people.
Like, create a dictionary with one item,
go through for each item in it,
delete that item and add a new one,
and then print that out.
How many times did that loop run, do you think?
I have no idea.
It's either one or error or something
is what I would guess, right?
But the answer is eight.
Exactly eight.
You're like, what?
Why does it run eight? Why doesn't it run once, or forever, or zero times, or error out? Those are the options: zero, one, or infinity. Eight doesn't make any sense. But if you look at the implementation, dictionaries are pre-allocated. Because you're typically adding stuff, they want to grow in a doubling sort of way, rather than having to reallocate and copy things around every time you add something. So what they do is pre-allocate a certain number of slots, and this trick leverages assigning into those new slots until it runs out.
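The dictionary puzzle can be replayed directly. This is a minimal sketch, assuming the loop described above (delete the current key, insert key + 1). How many times it runs is a CPython implementation detail, so the count of eight depends on the version, and newer CPython releases detect the mutation and raise RuntimeError instead; the hypothetical `run_puzzle` function therefore records how far it got. The safe pattern is to iterate over a snapshot such as `list(x)`.

```python
def run_puzzle():
    """Replay the WTF Python dictionary puzzle and report what happened."""
    x = {0: None}
    seen = []
    try:
        for i in x:
            del x[i]          # remove the key currently being visited
            x[i + 1] = None   # insert a new one, so the size never changes
            seen.append(i)
    except RuntimeError as exc:  # newer CPython notices the keys changed
        return seen, exc
    return seen, None


def safe_version():
    """Iterate over a snapshot of the keys, so mutating the dict is fine."""
    x = {0: None}
    for i in list(x):
        del x[i]
        x[i + 1] = None
    return x


seen, error = run_puzzle()
print("iterations:", len(seen), "error:", error)
print(safe_version())  # {1: None}
```

Whatever the iteration count on a given interpreter, it is the same on every run, which is exactly why it looks so mysterious.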
So this is crazy. I'll give you one more example. Let's go with is: is is not what it is. If you say a equals 256, b equals 256, then a is b is True. However, if you say a equals 257 and b equals 257, a is b is False. Do you know why? It's another crazy one.
This is insane. And the reason is, I believe, that the small integers, from -5 up to 256, are pre-allocated for performance reasons.
And every time you literally say the number seven, that points to this pre-allocated, flyweight-pattern type thing.
But beyond that, these get allocated on demand.
So you're basically asking, is the pointer to this 257 equal to the pointer to that other 257?
And nothing ties those two objects together, so the answer is False.
So there's tons of this craziness going on here. That's pretty fun.
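To see the small-integer cache for yourself, the sketch below builds the numbers at run time with `int()`. That matters because equal literal constants compiled in the same block can be merged into one shared object, so a script containing `a = 257` and `b = 257` may well report `a is b` as True and hide the effect. This is CPython behavior, not a language guarantee, which is the real lesson: compare numbers with `==`.

```python
import platform

# Build the integers at run time so the compiler cannot merge equal constants.
a, b = int("256"), int("256")
c, d = int("257"), int("257")

print(platform.python_implementation())
print(a == b, c == d)  # True True: equal values, always
print(a is b)          # True on CPython: 256 comes from the small-int cache
print(c is d)          # False on CPython: each 257 is a separate object
```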
Yeah, that's nice.
So I think this is a fun project. I really commend the people working on it. It's great.
And I definitely want to do something with this later. I just haven't figured out
quite what the details are yet, but there's got to be something fun here.
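The lookalike examples from WTF Python, the value assignment that seems to get skipped and the 'e' == 'e' comparison that gives False, both come down to Unicode: the Cyrillic letter ie (U+0435) renders like a Latin e but is a different code point. A small sketch using the standard `unicodedata` module; the `\u0435` escape is written out so the lookalike is visible in the source.

```python
import unicodedata

latin_e = "e"          # U+0065
cyrillic_e = "\u0435"  # U+0435, renders just like "e"

print(latin_e == cyrillic_e)         # False
print(unicodedata.name(latin_e))     # LATIN SMALL LETTER E
print(unicodedata.name(cyrillic_e))  # CYRILLIC SMALL LETTER IE

# Identifiers may contain non-ASCII letters, so these two assignments
# bind two *different* names that happen to look the same on screen.
namespace = {}
exec("value = 11\nvalu\u0435 = 32", namespace)
print(namespace["value"])  # 11
```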
So this makes me feel like I should go practice my Python. Maybe I'm not as good as I thought I was, because that dictionary thing going eight times kind of threw me for a loop for a bit.
Anything in WTF Python would be evil to bring up at a job interview.
It'd be very evil.
Yeah. But if they answered it, think of that.
Yeah, that'd be good. I ran across this recent article called Python Exercises. I've done this before: either trying to brush up on my Python skills, or trying to find some decent questions to ask at an interview. A lot of the questions out there seem to be generic questions for any language that just happen to be done in Python. This is a collection of questions where some of them are pretty easy to start off with, like basic syntax stuff, but there are some that actually check Python specifically, and some use of the standard library. I think it's a nice collection. It covers the basics, of course, and then some text processing and OS integration, and decorators, generators, and you can get into quite a few things. I think it's a nice set. It's not too huge. It's a good one to look at.
Yeah, yeah. And they don't seem too trivial. They're like, given this set of data, parse it into a CSV file, start a subprocess, things like that. It's pretty nice, actually.
Yeah. And then at the end, the last thing they talk about is testing, which I very much appreciate. Before I bring somebody in for an interview, I've started sending out code examples and asking them to solve some coding problem, but also to write a test to prove it works. I think that's a good thing to add.
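As an illustration of pairing an exercise with a test, here is a hypothetical exercise in the spirit of the article (turning a list of records into CSV text) together with a pytest-style test that proves it works. The names `records_to_csv` and `test_records_to_csv` are made up for this sketch and are not taken from the Python Exercises article.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Hypothetical exercise: render a list of dicts as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def test_records_to_csv():
    # The candidate's test: known input, exact expected output.
    rows = [{"name": "Ada", "lang": "Python"}, {"name": "Guido", "lang": "Python"}]
    assert records_to_csv(rows, ["name", "lang"]) == (
        "name,lang\nAda,Python\nGuido,Python\n"
    )


if __name__ == "__main__":
    test_records_to_csv()  # pytest would discover and run this on its own
    print("test passed")
```

Asking for the test alongside the solution shows whether the candidate thinks about exact expected behavior, not just getting something that looks right.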
Absolutely.
Yeah, that's really cool.
Great that they include that at the end as well.
So I've got another thing you should test for.
Before I tell you about it, though, I want to tell you about Spaces.
So Spaces is DigitalOcean's new service, which lets you basically store files on the
internet and either privately or publicly pass them around, right?
So kind of like
Amazon S3, but much, much more affordable. Instead of charging you nine cents per gigabyte,
they charge you one cent. And you can use exactly the same tools. For example, I use Transmit
on my Mac to manage all my stuff in the cloud, and I love it. When I switched to DigitalOcean
Spaces, which I did just because I saw the offer, before we even talked about this,
I thought, this is so much better.
I just pointed Transmit at it
and it just kept on working.
It said, hey, there's an S3 thing over here
and here's the key.
So if you are using S3
or some other sort of shared cloud storage
for files and things like that,
you definitely should check out
DigitalOcean Spaces at do.co slash
Python and check it out.
There's a two-month free trial, and then it's really, really affordable and straightforward.
I love it.
Nice.
The audio you're listening to right now came straight out of there.
So beautiful.
Have you heard of Pickle?
Oh, yeah.
Not the gherkins, but the built-in way to serialize stuff.
I don't remember why, but I try to avoid it
because I've heard there's problems.
Yeah, there's two major problems with Pickle.
One of them is it stores a binary representation
of your objects.
And so if you do things like rename a field
or maybe even reorder stuff, right?
If you add a field, remove a field,
there's all sorts of stuff where like
just the versioning of your classes or your data,
if that changes, you can no longer properly deserialize these things.
It's not great.
So that can be a problem, and that's probably reason enough to use JSON or some other format.
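A small sketch of the versioning problem, assuming a hypothetical `User` class whose attribute is renamed after some objects were already pickled. pickle stores a reference to the class by name plus the instance's attribute dictionary, and it does not call `__init__` when loading, so old data comes back carrying the old field names.

```python
import pickle


class User:
    def __init__(self, name):
        self.name = name


blob = pickle.dumps(User("Ada"))  # saved while the old class was current


# Later, the class is "upgraded" and the field is renamed.
class User:
    def __init__(self, full_name):
        self.full_name = full_name


restored = pickle.loads(blob)  # finds User by name, which is now the new class
print(type(restored) is User)          # True
print(vars(restored))                  # {'name': 'Ada'}: the old field survives
print(hasattr(restored, "full_name"))  # False: code expecting the new field breaks
```

Nothing fails at load time; the mismatch only shows up later, wherever the code reaches for `full_name`.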
However, right in the documentation it says,
Warning: the pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
All right. So I think people see this and go, okay, that looks bad, let's get out of here, and they just bail, as they should.
I think even the versioning stuff alone is already an issue. I think there was a case where somebody was caching stuff, and when they were switching from Python 2 to Python 3, the in-memory representation of, like, datetime or some part of the data was different, and the pickled data started to conflict.
Anyway, this article I want to talk about is called Exploiting Misuse of Python's Pickle.
So if you've ever read that warning and gone, huh, that sounds bad,
I can kind of imagine what that might look like,
I'm going to stay away from it: this one shows you exactly how to do bad things. And the bad things begin with, let's create a remote shell and start executing code, and maybe even log in
remotely over SSH to this machine, just by sending a little bit of binary data, like 50 bytes,
100 bytes, something super small, over to this machine.
And then we'll just log in and go from there.
That sounds bad, right?
Yeah, geez.
So the idea is when you unpickle something,
there's a few hooks where you can run arbitrary Python code.
And so they say, well, let's just use subprocess.Popen and create a shell for us.
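Here is a harmless sketch of the hook being described, using a hypothetical `Payload` class. pickle calls `__reduce__` when dumping, and the (callable, arguments) pair it returns is written into the byte stream; `pickle.loads` then imports that callable and calls it. The article's version returns something like `subprocess.Popen` with a shell command; this sketch swaps in the built-in `eval` on a harmless arithmetic expression, which is enough to show that loading the bytes runs code chosen by whoever produced them.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object."""

    def __reduce__(self):
        # Whatever callable is returned here gets called by pickle.loads().
        # A real attack would return a process-spawning callable instead.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())
print(len(data), "bytes")    # a tiny payload
result = pickle.loads(data)  # runs eval("6 * 7") on the *loading* side
print(result)                # 42, not a Payload object
```

The loading side never imports or defines `Payload`; the bytes alone are enough, which is why the documentation warns against unpickling untrusted data.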
So you just put that command in your dunder reduce,
I think it's called. And then you've got shells, and that's bad. So for those of you out there
wondering what this warning is about exactly, and why you should be super scared: here's why. Great
little example. Super approachable.
Yeah. Wacky.
Yeah. Wacky. So if I was running, like, a Django website, I probably wouldn't want to use that as my exchange format between my services, right?
No, and there are so many other, better formats anyway. So JSON, JSON, JSON.
Yeah, for sure. All right. So what have you got next for us?
I've got a complete beginner's guide to Django. Awesome. This is a seven part series. And it
looks like six parts are done already. And the seventh part is coming up soon. And it kind of goes through quite a bit of Django.
I know there's already a lot of Django tutorials out there.
But the interesting thing that I think makes this one stand out is that it has kind of an academic feel to it.
And if that's kind of your thing, you might like this.
Well, it has a chalkboard, it has a beaker, and it has a Superman flying. So these are all good signs. Yeah. Well, it has some like comic like drawings in it too
and stuff. Yeah, yeah, yeah. Actually, I think this is really nice. The graphics are wonderful.
They've got little wireframes to help you design the web pieces, some nice graphics for file
structure. It seems super approachable to me. I kind of got lost with some of the UML diagrams
and whatnot, but it's well written. People should check it out if you want to learn Django. So
maybe. Yep, absolutely. And it's based on Python, not legacy Python. So this is all good as well.
Yeah. So if you're looking to pick up Django, that's a good place to do it.
All right. So do you remember when we talked about the malicious packages being uploaded to PyPI?
Yes.
Do you remember what they were targeting? Like, how were they getting people to install them?
Well, there were a couple of ways. They were naming packages after standard library modules on PyPI, and then also misspellings.
Exactly. So we have a new GitHub project called pypi-parker.
So this is a cool project by a guy named Matt.
He sent this over and said, hey, you should check this out.
I don't think a lot of people know about it yet, but it's really cool.
So the idea is, you know, we had this debate about how people check and verify what gets uploaded to PyPI.
Should there be like a committee that reviews it?
And all that sounded really bad.
And so he's created this library that says, look, the self-serve ability of people to just upload things to PyPI, this is a good thing. Let's not get rid of it. Let's just try to solve this
typosquatting problem. So what he's done is create this thing called pypi-parker, and it's an extension to distutils.
So it's a separate command that you can run on it.
So if I were, like, Kenneth Reitz, and I had created Requests,
I could run setup.py and give it, I think it's park, as the command.
And it will actually generate additional packages that I can upload to PyPI.
And there'll be the various reasonable misspellings of requests.
And when you import them, it'll raise an error, an import error, that says,
no, no, no, this thing that you pip installed, you misspelled that.
Go get the real one over here.
So it gives them a help message and all that kind of stuff. So it does two things. One, it gives ownership of these
misspellings to the original package owner. And two, for the people who accidentally try to use
those, it gives them a warning saying you've misspelled this, and here's what you should actually
be looking for. I think that's great.
Yeah, that's cool.
Yeah. So well done, Matt.
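To make the idea concrete, here is a small sketch of the two halves described: generating plausible misspellings of a package name, and the kind of stub module a parked package could ship so that importing it fails with a helpful message. This is not pypi-parker's code or configuration; `typo_variants` and `STUB_TEMPLATE` are hypothetical names, and real tooling would consider more kinds of typos.

```python
from typing import Set


def typo_variants(name: str) -> Set[str]:
    """Hypothetical helper: one-character deletions and adjacent swaps."""
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])  # drop one character
    for i in range(len(name) - 1):
        # swap two neighbouring characters
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping identical letters reproduces the name
    return variants


# Hypothetical body for a parked package's __init__.py.
STUB_TEMPLATE = '''raise ImportError(
    "You installed {typo!r}, which looks like a misspelling. "
    "You probably want: pip install {real}"
)
'''

print(sorted(typo_variants("requests"))[:5])
try:
    # Simulate importing the parked "reqests" package.
    exec(STUB_TEMPLATE.format(typo="reqests", real="requests"))
except ImportError as exc:
    print(exc)
```

The owner of the real package would upload one such stub per variant, which both claims the names and points confused users in the right direction.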
If you're a package owner, check this out. It might be helpful.
Since I'm not writing so much anymore,
I'm thinking about writing a couple new open source projects.
So I'll probably be in that boat soon.
Yeah, nice.
So you should use PyPI Parker and then give us a report.
Okay.
Awesome.
That's our six items for the week.
So hopefully everyone enjoyed them.
Brian, what else is going on?
Well, I'm just getting ready for Halloween, actually.
I know.
Houses around here are getting scary.
A lot of creatures and various cobwebs. But
I have not been as busy as you have lately. What have you been up to?
I have just released a brand new course, and you can find it at freemongodbcourse.com, and that should give you pretty much
all you need to know about it. So I have this paid course, which is like a seven-hour, super in-depth
thing, and I wanted to come up with a way for people to get started with Python,
get started with MongoDB.
And then if you want to learn more,
you can take the paid course or things like that.
So just drop over at freemongodbcourse.com and sign up.
There's really no strings attached.
You just have to create an account, and then you can go take the class.
Oh, another thing I wanted to point out.
This is maybe not worth a whole item, and this is not my thing.
This is just something I saw from Donald Stufft, who runs PyPI.
Yeah.
And the website and all that kind of stuff.
He sent out a tweet that said Python 3 usage has doubled in the past year,
according to download stats on PyPI.
Yeah.
Oh, that's cool.
Yeah.
So legacy Python is definitely on the downward trend,
even though it's still the majority of things that get downloaded.
Yeah, so way to go, Donald, for putting that out there,
and nice to see that trend continuing.
All right, well, thank you, everyone, for listening.
Brian, thanks for finding these things and sharing them with everyone.
Yeah, thank you.
Thank you for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm. If you have a news item you want featured,
just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something
cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and
sharing this podcast with your friends and colleagues.