Python Bytes - #44 pip install malicious-code
Episode Date: September 20, 2017
Topics covered in this episode: Ten Malicious Libraries Found on PyPI * PyPI migration to Warehouse is in progress * Live coding in a presentation * Notable REST / Web Frameworks * tox * flake8-tidy-...imports * deprecated imports * Extras * Joke
See the full show notes for this episode on the website at pythonbytes.fm/44
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 44, recorded on September 19th, 2017.
I'm Michael Kennedy.
And I'm Brian Okken.
And it's been a big news week, hasn't it, Brian?
Yeah, very big.
Yeah, we've got, I would say, the most listener feedback and requests to cover a particular topic,
which we're going to jump right into as the first thing. But before we do, let's just say thanks to Datadog.
They are sponsoring this episode, as they have some others, and they've got some great tools
and even a way to get a free t-shirt at pythonbytes.fm slash datadog. So we'll talk more about them later.
Why don't you tell everyone what the big news is?
Apparently there are malicious libraries on PyPI.
Right, so pip install virus.
Not so joyful as the pip install anti-gravity would make it, right?
It actually, I think, scared people more than the real threat, but let's talk about it. Yeah, you know what?
I didn't see what the actual malicious code was, other than sort of proof-of-concept stuff. So I don't know how big of a deal this is in terms of actual viruses and malicious code,
but it certainly shows the door is open for somebody to sling in some very bad things.
So the story is that there were a number of malicious libraries found on PyPI.
So these are basically packages that you would pip install, but they
either did some sort of typosquatting, or they grabbed the name of something that was already in
the standard library. So for example, people might try to use urllib and didn't import it,
right? And so they get an error, something like cannot find library urllib. And so then they go type pip install urllib.
Well, guess what?
That actually goes out to PyPI and grabs a thing.
And I think there's a misspelling like urlib with one L, not two.
But they would grab those things and they would put those packages up there.
And to be even more devious, what they did is they actually took the implementation
and put it into those libraries.
So that it would actually work like it should.
So you might not notice it, right? You pip install the thing, you import the thing, it works.
But the problem is that setup.py, the actual setup code that executes while the package
is installing, was where the viruses or the malicious code lived.
And so that's bad. I looked into it, and the code that they were putting in there had this little
note that said proof of concept, no harm, no foul or something. But it was collecting your username
and your host IP address and sending that to some server in China.
Absolutely.
So I think the best write-up on this was done by Dan Goodin, I think, on Ars Technica.
So that's the primary link here to that article.
And the conversation, I find Ars Technica to be the best place for the comments to actually
be really meaningful.
So there's a great bunch of things in there.
But let's cover a little bit more of the details. There's a Slovak security authority that
actually discovered these packages. They then sent a message to the
Python Packaging Authority, who took those down right away. All right. So those are supposed to
be gone, but that obviously doesn't get them off of your servers
or off of your developer workstation if you pip installed something bad, right?
And there's actually a message from the PSF.
They did an official response to this.
And, you know, we talked several times
about the fragility of PyPI
and just how we're depending upon this thing
that there's really not a lot of resources put into.
All right, we talked about Donald Stufft,
and I've had him on Talk Python and things like that.
And so the PSF said, this is just a part of what they said,
is unlike some language packaging management systems,
PyPI does not have any full-time staff devoted to it.
It's a volunteer-run project with only two active admins.
As such, it doesn't currently have the resources for some of the proposed solutions,
such as actively monitoring new projects, like inspecting code as it gets uploaded. Historically, and by necessity,
we've relied on a reactive system to take down potentially malicious projects as we've become
aware of them. Does that make you feel better, Brian? Not really. No. No, it doesn't make me
feel very good either. It's like, well, if someone notices a virus, of course, we'll take that down.
But other than that, like, good luck is basically what they're saying.
So there's some interesting comments, like I said, on that Ars Technica article.
And I've linked to four of them.
One of them, this is actually, I've been thinking about how you deal with this.
Like, do you digitally sign these things?
And then, like, everyone's going to get a key.
And then how do you know when a bad actor's key gets used?
They'll just regenerate it.
There's a lot of issues with getting like trusted keys, right?
Sort of SSL style.
But this guy, girl goes by Hugh, Hugh, Hugh.
Says, what if pip gets more paranoid?
So if you say pip install a thing
and there's a very slight misspelling
or slight change to that name that is much more
popular, instead of just installing it, pip would give you a list of things and say, it looks like
you might be trying to install this other thing that's way more popular than this thing. And that
might be really interesting. Like if the thing you're installing has two downloads and the
thing you were trying to get had half a million downloads, you know, maybe it will just say, like,
error, you need to force it, or something to that effect. So what do you think
about that? I'm a little uneasy with that. Giving preference to popular projects just because
they're popular. I don't know, maybe we're swinging too far. Yeah, possibly.
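The "more paranoid pip" idea from that comment can be sketched in a few lines. This is only an illustration of the suggestion, not anything pip does: the download counts, the function name, and the thresholds below are all made-up assumptions, and the name matching uses the standard library's difflib.

```python
import difflib

# Hypothetical download counts; real numbers would have to come from
# the package index's own statistics.
POPULAR_PACKAGES = {
    "requests": 500_000,
    "urllib3": 400_000,
    "numpy": 300_000,
}


def suggest_popular_alternative(requested, downloads=POPULAR_PACKAGES,
                                cutoff=0.8, ratio=100):
    """Return a similarly named, far more popular package, or None."""
    requested_count = downloads.get(requested, 0)
    candidates = [name for name in downloads if name != requested]
    # get_close_matches returns the best-scoring names first.
    for name in difflib.get_close_matches(requested, candidates, n=3, cutoff=cutoff):
        # Only warn when the lookalike is overwhelmingly more popular.
        if downloads[name] >= ratio * max(requested_count, 1):
            return name
    return None


suggestion = suggest_popular_alternative("reqeusts")
if suggestion is not None:
    print("Did you mean", suggestion + "? Re-run with an explicit override to continue.")
```

Brian's concern still applies to a sketch like this: the popularity ratio is exactly the part that favors established projects over new ones with similar names.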
There's actually some stats on all the downloads of the bad packages. They were not really bad.
They were like really quite small numbers. There's some graphs and stuff. There's a person in the comment section
that said their name was Stastag and said, I'm sitting on a lot of the misspellings of common
package names. So that's pretty cool. Apparently they've created packages that do nothing
that are named like typos, so typosquatters can't actually do this nefarious stuff
with them. There's an undergrad, I think in Germany, who studied this capability and actually said
there's this problem. You know, it was like a year ago they had sort of said, look, this can be a real
problem. But, you know, we could, I guess, feel a little bit better in that he also did the same
thing to Ruby and he also did the same thing to npm for Node.js.
So it's kind of a common theme
that there's this challenge
across all the official package repositories.
Yeah, and one of the notes also was that
people like trying to pip install
something that's part of the standard library.
It shouldn't come from PyPI.
Absolutely.
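To make the standard library name clash concrete, here is a minimal sketch of checking whether a proposed package name matches a standard library module. It is not how Warehouse implements its rule; the function name is hypothetical, and it leans on sys.stdlib_module_names, which only exists on Python 3.10 and later, with a tiny illustrative fallback set for older interpreters.

```python
import sys


def shadows_standard_library(package_name):
    """Return True if the name matches a top-level standard library module."""
    # sys.stdlib_module_names is available on Python 3.10+. The fallback
    # set is a small illustrative sample, not a complete list.
    stdlib_names = getattr(
        sys,
        "stdlib_module_names",
        frozenset({"urllib", "asyncio", "unittest", "json"}),
    )
    return package_name.lower() in stdlib_names


for name in ["urllib", "asyncio", "requests"]:
    print(name, shadows_standard_library(name))
```

A check like this only looks at top-level module names, so a backport such as mock, which lives at unittest.mock in Python 3, would not be flagged by it.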
So there has been a change to Warehouse: new packages that have the same name as standard library
packages have to go through an approval process. Yeah. And you link to pull request
2409 on PyPI slash warehouse. And that conversation is pretty interesting. So yeah, you can see how people are
talking about solving the problem, which ones are there, and how to deal with ones that are already there
but are actual backports. So like somebody wants to bring asyncio to Python 2 or to a
lower version of Python 3, then maybe they put that package up there, and it would look like one
of these bad named things. But they said the solution that they're considering is basically you can't create new ones without some sort of admin being involved
to say, yeah, I see what you're doing and it's okay. But the ones that exist, they won't like
kill them off or anything. Yeah. And one of the big examples of that is, for instance,
mock is in the standard library as of Python 3, but in Python 2, it was separate. So I guess mock is really part
of the unittest library. Right, but it has a legitimate place both in the standard library
for Python 3 and on PyPI. Yeah, there are some legitimate backports that show up. So there's
legitimate reasons to have the same name. So that's a pretty nice segue to this news that
Jonas Neubert sent us about the new version of PyPI, which is called Warehouse,
and it might be finally moving. Yeah, and actually, so this was great. Jonas sent us an email,
and essentially he did almost all of my research for me, which I love that. Thank you, Jonas.
Feel free to do that, anybody. So he was talking about the research he did for a blog post he wrote, which we have a link to, called Publishing Your First PyPI Package, By and For the Absolute Beginner.
And it's a pretty nice, quick article. Anyway, one of the things he talked about when he emailed us is that things have changed, and so a lot of the tutorials that are out there aren't valid anymore.
For instance, let's see, pypi.org is no longer read-only, like it was when we were just playing with it.
Now it's really where you go to publish packages.
You write to there.
The old APIs at pypi.python.org/pypi are disabled, so you have to use the new one.
Right, and if you have one of those hidden .pypirc files where you can configure, like, your package username, password, URL, and so on, you have to change that URL, right?
If you've already built packages and pushed them up before, some of this will make sense and some of it won't.
But if you read Jonas' article, all of it will make sense.
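For anyone updating an existing .pypirc, here is a rough sketch of what the changed file might contain, generated with the standard library's configparser so it stays in Python. The username is a placeholder, the helper name is made up, and the repository URL is the Warehouse upload endpoint as documented at the time; treat Jonas's article and the PyPI documentation as the authority if they differ.

```python
import configparser
import io


def build_pypirc(username):
    """Return .pypirc text that points uploads at the Warehouse endpoint."""
    config = configparser.ConfigParser()
    config["distutils"] = {"index-servers": "pypi"}
    config["pypi"] = {
        # Assumed upload endpoint for Warehouse; verify against current docs.
        "repository": "https://upload.pypi.org/legacy/",
        "username": username,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(build_pypirc("example-user"))
```

The password is left out on purpose in this sketch, since storing it in a plain text file is a choice each person has to make for themselves.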
Yeah, absolutely.
And he also had some good news, like Markdown support
is coming for README.md files.
Yeah, yeah.
That would be great.
Yeah, I'm looking forward to it.
I refuse to write reStructuredText.
So when I need it, I convert it from Markdown.
There you go.
Yeah, yeah, that's great.
So this is good news.
A couple of things.
One of the other things that I thought was interesting is that apparently, I didn't know this, but you could change some aspects on the old API, some aspects of your project, like the description or something.
There was a way to change that through the web interface or through the API without changing your package itself. And a lot of those have been closed down
and you really have to just re-upload your stuff
if you want to make quite a few changes.
And I actually think that's the way you should do it anyway,
so that's all right.
Yeah, that sounds good to me.
I've been long waiting for PyPI.org to be the thing.
It's just a nicer interface.
It's built in Pyramid, which is kind of cool.
I know that it's like a huge
revision of a very, very old
and sort of kludgy code, so it will also open up
PyPI for more
contributions and
collaboration with other people.
Yeah, and I think it's totally usable now. I'd really
like to have them take down the red notification at the top
that makes it look like a warning.
And I don't think we need that anymore.
Yeah, it feels like it's going to go pretty soon.
But yeah, definitely that should move to the old one
and it should just stay.
It should be gone from the new one, right?
I'm ready for the switch to happen.
I understand that pip actually references, you know,
pypi.org and such for its URLs internally on some things. So it's
kind of there anyway, but, I don't know, it feels a little gradual.
And apparently the one holdout is you have to, right, currently still you have to create your
user account on the old website.
Maybe that's why that red bar is still there.
Maybe.
Maybe. All right. So last week we had a lot of fun talking about David Beazley's Fun of Reinvention, right?
Yeah, I love that.
Yeah, I loved the talk too.
If anybody hasn't watched that, go back and watch that.
Yeah, we'll basically link to it again because it was awesome.
One of the things he did really well was he had these really cool live, he was live coding
during the presentation and he had some cool backgrounds and stuff and we have no idea how to do what David did.
We asked him and he won't share it yet.
Yeah, and if anyone knows,
go to pythonbytes.fm slash 44
and add a comment at the bottom
so we can all figure out how that cool trick was done.
Yeah, definitely.
But for now, you can do live coding.
I like live coding in a presentation,
but it can go wrong if things go wrong.
So I went out, I have a presentation that's coming up,
and I was thinking about whether I wanted to do this.
And so I found a few links talking about it, about advice.
One of them is basically advice for live coding,
and it's basically practice a lot and have a backup plan.
I guess that's the real meat of it.
And then also, one thing is,
while you're coding a lot, it might be fun for you just to code, but you have to talk at the
same time. So if you can't talk and code at the same time, maybe it's not for you.
So if you want to have the same effect but not live code, there's a couple of other articles
called Not Quite Live Coding and Avoiding Live Coding. They're kind of cool. They talk about basically how you
can use, like, GitHub labels or Git labels to pull in new parts of your code if you want to walk through it.
And my favorite, right, you can basically go from tag to tag to tag and then talk about the
new code that's appeared without actually typing it. Although I'm with you, I'm for the live coding,
that is the most legit, but these are fallbacks and I think that's not bad. The last one is supposedly a bit of work.
I'm going to have to try this out, is doing a fade in. So you've got all your code showing up
on a slide, but instead of showing a huge eye chart of a whole bunch of code where nobody knows,
really, are they supposed to just read all the code at once, you fade in the code a snippet at a time,
highlight the piece that you're talking about.
And then for the next slide or the next fade in,
fade in the new piece of code.
And I hadn't actually seen how to do that before,
but it talks about using Reveal.js
and some other tricks to do that.
Yeah, that's a really nice effect.
If you're going to have code up there or or even lots of text in any sort of presentation,
definitely don't just blast it all up there.
Let it come in piece by piece or somehow indicate the little sections you're talking about,
and that definitely makes it more engaging for sure.
I brought this up also today because I was curious about your choice.
It sounds like you like live coding as well. Yeah, I'm definitely for the live
coding, if people do it well. Like when it goes bad it kind of makes me squirm and be
uncomfortable, but done well, I think, as an audience member, if you see something
being presented and then you actually saw every step of it and then in the end you see the outcome, you're like, well, I saw every bit of it. There was nothing that was crazy
there. And now, now it's doing this. Like, I feel like I could totally do that. There's nothing in,
you know, sort of scary about it anymore once you see it done live. And I think a lot of times you
can skip over that and just sort of like fling pieces of code together. And then you're like,
well, yeah, but those were slides. Maybe this is way harder than it sounds.
You know, if you see it done live, you kind of know how hard it is.
Yeah, I agree.
I think I'm going to opt for something almost there first.
Yeah, of course.
And I'd also like to hear from my listeners to see,
I'd like to hear like some live coding horror stories
and also some tips for how to do some Python live coding.
If anybody has any cool tools to share, that'd be great.
Yeah, sounds awesome.
All right, before we get to our next topic, let's talk about Datadog.
So they're sponsoring the show and they're doing really cool stuff.
So if you have performance issues or bottlenecks in your application, that may be in your code, but it might be just somewhere in the whole stack that you're using.
So let's say you have a Python web app running Flask,
and it's built upon Mongo,
and it's all on Ubuntu running Nginx and uWSGI.
With Datadog, you can actually monitor all of those pieces as a whole.
So that's super powerful if you want to understand
really why your app's slow, not just why your Python code is slow.
So they have a great getting started tutorial,
and you can check that out and get a free Datadog t-shirt.
So just visit pythonbytes.fm slash Datadog and see what they've got to offer.
It's pretty cool.
That's cool.
Yeah, and thank you, Datadog, for keeping the show rolling.
All right, speaking of web, let's talk a little bit about REST.
Okay.
All right, so I mentioned Flask.
I mentioned Pyramid.
There's Django, of course.
Those are the three sort of high-level web frameworks.
And they're great.
They're good for building web applications.
There's extensions, or even they themselves are good for building RESTful services.
But there's two really interesting web API frameworks in Python that a listener suggested we talk about, and I'm excited to talk about them.
So there's these two called, one is Falcon and one is Hug.
First of all, those are pretty good names for frameworks, right?
Yeah, they're pretty good.
I've heard of Hug, but I've never heard of Falcon.
Yeah, so I just had the Falcon guys on Talk Python To Me last week on episode 129, and that is a super low-level, really high-performance
RESTful framework. So they call it a bare-metal Python web API framework for building very fast backends
and microservices. And they don't see it as competing with those frameworks I've mentioned,
but they see it as more complementary, like you write your app in that, and if you need that
super fast little service, you use this. And it even works on PyPy for an extra speed boost. So that's cool, and you can use
Falcon, and it's really, really low level. And then there's Hug, which is actually a web service
RESTful API framework built upon Falcon. So Hug is using Falcon for its low-level capabilities, but then Hug is like a simplification on top of those APIs.
So you can do really interesting stuff with Hug. Like, you just put a decorator onto a function, and all of a sudden it becomes an API that you can work with. It might be a method on a class, but you can work with that really simply. And one of the unique
things about it is it comes with built-in self-documenting APIs, right? So you
can ask it what your functions are and it'll give you a description. And you
can expose them in different ways. So maybe I have an API that I can access over HTTP,
but I could also make that a Python package that exposes that API, or make it a command line thing that exposes that.
And those are all the same bits of code just exposed differently with Hug.
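The "one function, several front ends" idea can be illustrated without any framework. The sketch below is a toy using only the standard library; it is not Hug's API or implementation, and the names expose, REGISTRY, call_from_query, call_from_cli, and describe are all invented for the illustration.

```python
from urllib.parse import parse_qs

REGISTRY = {}  # hypothetical registry of exposed functions


def expose(func):
    """Register a plain function so several front ends can call it."""
    REGISTRY[func.__name__] = func
    return func


@expose
def happy_birthday(name, age="1"):
    """Say happy birthday to a user."""
    return "Happy {} birthday, {}!".format(age, name)


def call_from_query(name, query_string):
    """Call a registered function the way an HTTP-style front end might."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    return REGISTRY[name](**params)


def call_from_cli(argv):
    """Call a registered function the way a command line front end might."""
    name, *args = argv
    return REGISTRY[name](*args)


def describe():
    """Map each exposed name to its docstring, a stand-in for self-documentation."""
    return {name: func.__doc__ for name, func in REGISTRY.items()}


print(call_from_query("happy_birthday", "name=Brian&age=44"))
print(call_from_cli(["happy_birthday", "Michael"]))
print(describe())
```

The function itself stays an ordinary Python function, which is also what makes it easy to import and call directly from other code or from tests.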
Oh, that's cool.
Yeah, that's pretty neat, right?
Yeah, I got to try that out.
So if you're building RESTful services, give these two things a look depending on which level you want to work at.
They're kind of neat.
All right, but you might want to test those, right?
You should test them. So if you are testing them, you might want to test them in multiple environments. And so tox would be a good thing. Yeah, we
had a nice conversation with some listeners on Twitter, like, hey, what is tox? Will you tell
us what tox is? So Brian, tell me what tox is. Well, yeah, first off,
we're going to give a little sneak peek on what tox is, but I think it does quite a bit.
So I reached out to one of the tox developers, Oliver Bestwalter, and he has agreed to come on Test & Code to have a longer conversation.
We haven't scheduled that yet, but we'll let you know when it's up.
But for now, tox, and this is a quote from Oliver: the name of the tox automation project derives from testing out of the box.
I didn't know that before I read this.
But it aims to automate and standardize testing in Python.
It's conceptually above pytest or whatever else you use and serves as a command line front end.
I think of it as similar to something like a
Travis CI, but something that you could do on the command line.
Right. It lets you pick different versions of Python. So you could say Python 2.7 and Python
3.5. And it basically depends upon pytest or something like that, right? It'll orchestrate
running your tests on pytest in those environments, for example.
Yeah. And one of the things that I really like about it is when you are distributing something,
it's not just your code that you need to test.
It's also the packaging and installation process and all of that.
You want to make sure that all that works.
And so essentially what it does, in the normal use model,
is you list a handful of Python versions.
And then what tox will do is use your setup.py file to create a source distribution,
and then create a virtual environment, and then install dependencies, and then
install your package, and then run the tests, and then do all of that for
each of the different Pythons. So it's using different versions of Python to run the
setup all the way through running the tests.
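The sequence just described can be written down as a small plan generator. This mirrors the description in the conversation rather than tox's actual internals; the function name and the step wording are assumptions, while the py27 and py35 style names follow the environment naming tox users will recognize.

```python
def plan_tox_like_run(python_versions, test_command="pytest"):
    """List, per interpreter, the steps described above."""
    plan = []
    for version in python_versions:
        env_name = "py" + version.replace(".", "")  # "2.7" becomes "py27"
        steps = [
            "build a source distribution from setup.py",
            "create a virtual environment for Python " + version,
            "install dependencies",
            "install the package from the source distribution",
            "run " + test_command,
        ]
        plan.append((env_name, steps))
    return plan


for env_name, steps in plan_tox_like_run(["2.7", "3.5"]):
    print(env_name, "->", "; ".join(steps))
```

Seeing the steps laid out like this also shows where the time goes: packaging and environment creation happen before a single test runs.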
Yeah, that's really cool.
And that's really, if you let it do all that,
you have to wait for it.
It's slower because you're creating
that distribution every time and other things.
But there's a couple of links in the show notes
on some tips and patterns so that you can speed things up if
you need to. But just having this ability just at your desktop in the command line is really great
for testing your stuff. Yeah, that's really cool. And I believe there was something to do with Python 2
and that original vulnerability stuff that people discovered on PyPI, right? Like the vulnerable code
only ran on Python 2 or something, right? And that's how they discovered it? I think that's the case.
I don't have it pulled up either, but yeah. I don't have a source to verify that, but like on Twitter,
somebody said, oh yeah, and we found this because of tox and testing this stuff on Python 3.
Yeah, that's beautiful. All right, awesome. So last one, I want to talk about legacy Python a
little bit as well. So there's flake8, right? Which is a linter that looks at your
code and tells you what you're doing right and wrong, things like that. There's a, I think it's
a plugin, called flake8-tidy-imports. And so one of our listeners said, hey, I added this cool
feature to tidy-imports. And I thought it was pretty cool, so I thought I'd highlight it here. People who are moving to Python 3, you might want to check this out. So you can declare Python
2 to 3 as a banned module import in flake8, and then it'll go through and actually find any of
the modules that would have worked in Python 2 but not in Python 3. For example, mock, right?
So you used to say import mock,
but now you would just use import unittest.mock as mock
or something like this, right?
So it would actually give you that warning.
Like in Python 3, you don't use mock anymore.
You use unittest.mock.
And it gives you like a nice useful message,
not just this was not, you shouldn't use this anymore,
but here's the thing to use instead as you do this upgrade.
So it kind of shames people a little bit for using the old stuff, which is good.
Yeah, I really like it. Actually, I use that as well.
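As a small illustration of the replacement import mentioned for mock, here is what the Python 3 spelling looks like in use. The function under test and the fake client are hypothetical; only unittest.mock itself comes from the standard library.

```python
# Python 3: mock ships in the standard library under unittest.
from unittest import mock


def fetch_greeting(client):
    """Hypothetical function under test: asks a client for a name."""
    return "Hello, {}!".format(client.get_name())


# A Mock stands in for the real client, so no network or database is needed.
fake_client = mock.Mock()
fake_client.get_name.return_value = "Brian"

greeting = fetch_greeting(fake_client)
print(greeting)
```

Code that still has to run on Python 2 can keep using the separate mock package from PyPI, which is one of the legitimate same-name backports discussed earlier in the episode.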
That's great. Very nice. And I have a bonus one for us, actually. I want to throw it in really
quick. So Jesse Davis from MongoDB, who did the PyMongo driver and stuff like that, is actually the
organizer for PyGotham. So that is the Python conference in New York City.
And he's really into helping and mentoring people, especially people who are new speakers.
So he's running this project where he's trying to raise money to hire a speaking coach to work with and mentor first-time speakers who he's getting to come speak at PyGotham.
And he's trying to raise $1,200.
And it turns out, as of today, he's reached his goal,
but I'm sure that he could do more if he had some more money.
So I'm linking to his,
his article called help me offer coaching to first time PyGotham speakers,
which I thought was a cool project.
And I'm happy to spread the word for Jesse cause you know,
it's great to have more people coming in to the community.
Yeah. I think that things like this are awesome, and I like covering it anyway. And I asked him to
maybe write up something after the conference, just to hear how that goes. I'd like to
hear from the people that got coached and how the process went, if it helped things. Yeah, that'd
be really cool, a sort of retrospective. Like, was this actually useful? What did you learn? To see if it's something we should be doing as a community. Yeah.
And then other conferences, and I don't have any links right now, but some conferences do like mentors for submitting your proposal. So a talk proposal, they'll have a mentor program so you can work with somebody to build up your proposal in the first place.
Yeah, that's kind of the first step to being a first-time speaker.
Okay, cool.
Awesome. Well, good job, Jesse. How about you? What other news you got? Have you forgotten
about your book and you're just like relaxing, living life again?
It's printing. No, I haven't forgotten. But I am relaxing a lot more and there's sunshine
outside. I'm going outside more, which is good. Not sunshine
today. You're actually seeing the outside. Yeah, but I'm seeing the outside. Yeah, that's awesome.
But the physical, you can order them now. Apparently they're printing and shipping,
so that's awesome. Yeah, very good. Very good. That's great to hear. So I remember last week,
I talked about adding switch to Python and I said, I'll put it up on GitHub. Yeah, and you did. I did. And I would say about 75% of the people said it was awesome.
So cool.
And 25% of the people said, please, no, don't do this.
But, you know, you can't please everyone, and it's not changing the language.
It's just a package on GitHub.
You can do whatever you want with it.
So anyway, it was actually in the top Python trending packages on GitHub
out of all Python
packages.
Sorry, repos.
Wow, really?
Yeah, yeah.
Last week, it was pretty awesome.
That's great.
And it had like 175 comments on Reddit or something.
So it's an interesting set of conversations that comes up around it.
So that was a follow-up to last week where I talked about that.
And then also, I'm writing a free MongoDB course that's going to complement my
paid MongoDB course, right? Like a short one. That's an intro sort of thing.
So people can, there's a link at the bottom of the show notes.
People can sign up to get notified. That'll probably be out.
I finished writing that this week, like this morning,
and I'll probably have that out in a few weeks. That's great. Yeah.
Should be fun. All right. Well, Brian,
thanks for doing all the research
or having our listeners do some research for you.
It was really fun to talk about this.
And if you guys have thoughts,
especially on the PyPI security thing,
go to pythonbytes.fm slash 44
and add your thoughts at the bottom.
This is kind of a big deal.
Yeah, and thanks everybody for helping
come up with ideas for the show.
We always appreciate it.
Yep, keep it coming.
Very much appreciated.
All right.
Bye, Brian.
Bye, everyone.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured, just visit pythonbytes.fm and send
it our way. We're always on the lookout for sharing something cool. On behalf of myself
and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with
your friends and colleagues.