Python Bytes - #44 pip install malicious-code

Episode Date: September 20, 2017

Topics covered in this episode: Ten Malicious Libraries Found on PyPI * PyPI migration to Warehouse is in progress * Live coding in a presentation * Notable REST / Web Frameworks * tox * flake8-tidy-...imports * deprecated imports * Extras * Joke. See the full show notes for this episode on the website at pythonbytes.fm/44

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 44, recorded on September 19th, 2017. I'm Michael Kennedy. And I'm Brian Okken. And it's been a big news week, hasn't it, Brian? Yeah, very big. Yeah, we've got, I would say, the most listener feedback and requests to cover a particular topic, which we're going to jump
Starting point is 00:00:25 right into as the first thing. But before we do, let's just say thanks to Datadog. They are sponsoring this episode, as they have some others, and they've got some great tools and even a way to get a free t-shirt at pythonbytes.fm/datadog. So we'll talk more about them later. Why don't you tell everyone what the big news is? Apparently there are malicious libraries on PyPI. Right, so pip install virus. Not as joyful as pip install antigravity would make it, right? It actually, I think, scared people more than the real threat warranted, but let's talk about it. Yeah, you know what?
Starting point is 00:00:55 I didn't see what the actual malicious code was, other than sort of proof-of-concept stuff. So I don't know how big of a deal this is in terms of actual viruses and malicious code, but it certainly shows the door is open for somebody to sling in some very bad things. So the story is that there were a number of malicious libraries found on PyPI. So these are basically packages that you would pip install, but they either did some sort of typosquatting, or they grabbed the name of something that was already in the standard library. So for example, people might try to use urllib and didn't import it, right? And so they get an error, cannot find library urllib. And so then they go type pip install urllib. Well, guess what?
Starting point is 00:01:49 That actually goes out to PyPI and grabs a thing. And I think there's a misspelling like urllib with one L, not two. But they would grab those names and they would put those packages up there. And to be even more devious, what they did is they actually took the real implementation and put it into those libraries, so that it would actually work like it should. So you might not notice it, right? You pip install the thing, you import the thing, it works. But the problem is that the setup.py, the actual setup code that executes during
Starting point is 00:02:20 setup, like when this is installing, that was where the viruses or the malicious code lived. And so that's bad. I looked into it, the code that they were putting in there, and it was this little thing, it said proof of concept, no harm, no foul or something. But it was collecting your username and your host IP address and sending that to some server in China. Absolutely. So I think the best write-up on this was done by Dan Goodin, I think, at Ars Technica. So that's the primary link here, to that article. And I find Ars Technica to be the best place for the comments to actually
Starting point is 00:03:01 be really meaningful. So there's a great bunch of things in there. But let's cover a little bit more of the details. There's a Slovak security authority that actually discovered these packages and then sent a message to the Python Packaging Authority, and they took those down right away. All right. So those are supposed to be gone, but that obviously doesn't get them off of your servers or off of your developer workstation if you pip installed something bad, right? And there's actually a message from the PSF.
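The point above, that taking a package down from PyPI does not remove it from machines where it was already installed, can be checked locally. What follows is a minimal sketch rather than an official tool: it compares the distributions visible to the current interpreter against a watch list. The function names and the WATCH_LIST contents are assumptions made for illustration (only urllib is named in this episode); the real list would come from the advisory itself. It relies on importlib.metadata, which needs Python 3.8 or newer.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name the way PyPI compares names (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_flagged(installed_names, flagged_names):
    """Return the sorted, normalized installed names that are on the flagged list."""
    flagged = {normalize(name) for name in flagged_names}
    installed = {normalize(name) for name in installed_names}
    return sorted(installed & flagged)


def installed_distribution_names():
    """List the names of distributions visible to the current interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata is missing or broken
            names.append(name)
    return names


# Hypothetical watch list: replace with the names from the actual advisory.
WATCH_LIST = {"urllib"}

if __name__ == "__main__":
    hits = find_flagged(installed_distribution_names(), WATCH_LIST)
    print("flagged packages installed:", hits or "none")
```

Run it once per virtual environment, since each environment has its own set of installed distributions.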
Starting point is 00:03:31 They did an official response to this. And, you know, we've talked several times about the fragility of PyPI and just how we're depending upon this thing that there's really not a lot of resources put into. All right, we talked about Donald Stufft, and I've had him on Talk Python and things like that. And so the PSF said, this is just a part of what they said,
Starting point is 00:03:50 unlike some language package management systems, PyPI does not have any full-time staff devoted to it. It's a volunteer-run project with only two active admins. As such, it doesn't currently have the resources for some of the proposed solutions, such as actively monitoring new projects, like inspecting code as it gets uploaded. Historically, and by necessity, we've relied on a reactive system to take down potentially malicious projects as we've become aware of them. Does that make you feel better, Brian? Not really. No. No, it doesn't make me feel very good either. It's like, well, if someone notices a virus, of course, we'll take that down.
Starting point is 00:04:25 But other than that, like, good luck is basically what they're saying. So there's some interesting comments, like I said, on that Ars Technica article. And I've linked to four of them. One of them, this is actually, I've been thinking about how you deal with this. Like, do you digitally sign these things? And then, like, everyone's going to get a key. And then how do you know when a bad actor's key gets used? They'll just regenerate it.
Starting point is 00:04:45 There's a lot of issues with getting, like, trusted keys, right? Sort of SSL style. But this person, who goes by Hugh, Hugh, Hugh, says, what if pip gets more paranoid? So if you say pip install a thing and it's a very slight misspelling or slight change of something that is much more popular, instead of just installing it, it'll actually give you a list of things and say it looks like
Starting point is 00:05:12 you might be trying to install this other thing that's way more popular than this thing. And that might be really interesting. Like, if the thing you're installing has two downloads and the thing you were trying to get had half a million downloads, you know, maybe it will just say, like, error, you need to, you know, force it or something to that effect. So what do you think about that? I'm a little uneasy with that. Giving preference to popular projects just because they're popular. I don't know, maybe we're swinging too far. Yeah, possibly. There's actually some stats on all the downloads of the bad packages. They were not really bad. They were, like, really quite small numbers. There's some graphs and stuff. There's a person on the comment section
Starting point is 00:05:49 that said their name was Stastag and said, I'm sitting on a lot of the misspellings of common package names. So that's pretty cool. Apparently they've created packages that do nothing and whose names are typos, so typosquatters can't actually do this nefarious stuff with them. There's an undergrad, I think in Germany, who studied this capability and said, actually, there's this problem. You know, it was like a year ago that they had sort of said, look, this can be a real problem. But, you know, we could, I guess, feel a little bit better in that he also did the same thing to Ruby, and he also did the same thing to npm for Node.js. So it's kind of a common theme
Starting point is 00:06:28 that there's this challenge across all the official package repositories. Yeah, and one of the notes also was about people trying to pip install something that's part of the standard library. It shouldn't come from PyPI. Absolutely. So there has been a change to Warehouse so that new packages that have the same name as standard library
Starting point is 00:06:51 packages have to go through an approval process. Yeah. And you linked to pull request 2409 on pypa/warehouse. And that's pretty interesting, that conversation. So, yeah, I see how people are talking about solving the problem, which ones are there, and how to deal with the ones that are already there. But some of those are actual backports. So, like, somebody wants to bring asyncio to Python 2 or to, like, a lower version of Python 3, then maybe they put that package up there, and it would look like one of these badly named things. But they said the solution that they're considering is basically that you can't create new ones without some sort of admin being involved to say, yeah, I see what you're doing and it's okay. But the ones that exist, they won't, like, kill them off or anything. Yeah. And one of the big examples of that is, for instance,
Starting point is 00:07:38 mock is in the standard library as of Python 3, but in Python 2, it was separate. So I guess mock is really part of the unittest library. Right, but it has a legitimate place both in the standard library for Python 3 and on PyPI. Yeah, there are some legitimate backports that show up. So there are legitimate reasons to have the same name. So that's a pretty nice segue to this news that Jonas Neubert sent us about the new version of PyPI, which is called Warehouse, and it might be finally moving. Yeah, and actually, so this was great. Jonas sent us an email, and essentially he did almost all of my research for me, which I love. Thank you, Jonas. Feel free to do that, anybody. So he was writing an article. He was talking about the research he did for a topic when he wrote a blog post, which we have a link to, called Publishing Your First PyPI Package, By and For the Absolute Beginner.
Starting point is 00:08:35 And it's a pretty nice, quick article. He talks about, well, anyway, one of the things he talked about when he emailed us is that things have changed. And so a lot of the tutorials that are out there aren't valid anymore. For instance, let's see, pypi.org is no longer read-only, as it was when we were just playing with it. Now it's really where you go to publish packages. You write to there. The old APIs at pypi.python.org/pypi are disabled, so you have to use the new one. Right, and if you have one of those hidden .pypirc files that you can configure, like your package, username, password, URL, and so on, you have to change that URL, right? If you've already done packages and pushed them up before, some of this will make sense and some of it won't.
Starting point is 00:09:23 But if you read Jonas' article, all of it will make sense. Yeah, absolutely. And there's also some good news, like Markdown support is coming for README.md files. Yeah, yeah. That would be great. Yeah, I'm looking forward to it. I refuse to write reStructuredText.
Starting point is 00:09:40 So when I need it, I convert it from Markdown. There you go. Yeah, yeah, that's great. So this is good news. A couple of things. One of the other things that I thought was interesting is that apparently, I didn't know this, but you could change some aspects on the old API, some aspects of your project, like the description or something. There was a way to change that through the web interface or through the API without changing your package itself. And a lot of those have been closed down and you really have to just re-upload your stuff
Starting point is 00:10:09 if you want to make quite a few changes. And I actually think that's the way you should do it anyway, so that's all right. Yeah, that sounds good to me. I've long been waiting for pypi.org to be the thing. It's just a nicer interface. It's built in Pyramid, which is kind of cool. I know that it's like a huge
Starting point is 00:10:28 revision of very, very old and sort of kludgy code, so it will also open up PyPI for more contributions and collaboration with other people. Yeah, and I think it's totally usable now. I'd really like to have them take down the
Starting point is 00:10:43 red notification at the top that makes it look like a warning. I don't think we need that anymore. Yeah, it feels like it's going to go pretty soon. But yeah, definitely that should move to the old one and it should just stay there. It should be gone from the new one, right? I'm ready for the switch to happen.
Starting point is 00:10:58 I understand that pip actually references, you know, pypi.org and such for its URLs internally. So it's kind of there anyway, but, I don't know, it feels a little gradual. And apparently the one holdout is that, currently, you still have to create your user account on the old website. Maybe that's why that red bar is still there. Maybe. Maybe. All right. So last week we had a lot of fun talking about David Beazley's Fun of Reinvention, right?
Starting point is 00:11:28 Yeah, I love that. Yeah, I loved that talk too. If anybody hasn't watched that, go back and watch that. Yeah, we'll basically link to it again because it was awesome. One of the things he did really well was that he was live coding during the presentation, and he had some cool backgrounds and stuff, and we have no idea how to do what David did. We asked him and he won't share it yet. Yeah, and if anyone knows,
Starting point is 00:11:50 go to pythonbytes.fm/44 and add a comment at the bottom so we can all figure out how that cool trick was done. Yeah, definitely. But for now, you can do live coding. I like live coding in a presentation, but it can go wrong if things go wrong. So I went out, I have a presentation that's coming up,
Starting point is 00:12:07 and I was thinking about whether I wanted to do this. And so I found a few links with advice about it. One of them is basically advice for live coding, and it's basically practice a lot and have a backup plan. I guess that's the real meat of it. And then also, one thing is, while you're coding a lot, it might be fun for you just to code, but you have to talk at the same time. So if you can't talk and code at the same time, maybe it's not for you.
Starting point is 00:12:34 So if you want to have the same effect, but not live code, there's a couple of other articles called Not Quite Live Coding and Avoiding Live Coding. They're kind of cool. One talks about basically how you can use, like, GitHub labels or git labels to pull in new parts of your code as you want to walk through it. That's my favorite. Right, you can basically go from, like, tag to tag to tag and then talk about the new code that's appeared without actually typing it. Although I'm with you, I'm for the live coding, that is the most legit, but these are fallbacks and I think that's not bad. The last one is supposedly a bit of work. I'm going to have to try this out: doing a fade-in. So you've got all your code showing up on a slide, but instead of showing a huge eye chart of a whole bunch of code where nobody knows
Starting point is 00:13:21 really whether they're supposed to just read all the code at once, you fade in the code a snippet at a time and highlight the piece that you're talking about. And then for the next slide or the next fade-in, you fade in the new piece of code. And I hadn't actually seen how to do that before, but it talks about using Reveal.js and some other tricks to do that. Yeah, that's a really nice effect.
Starting point is 00:13:44 If you're going to have code up there, or even lots of text, in any sort of presentation, definitely don't just blast it all up there. Let it come in piece by piece or somehow indicate the little sections you're talking about, and that definitely makes it more engaging for sure. I brought this up also today because I was curious about your choice. It sounds like you like live coding as well, at least. Yeah, I'm definitely for the live coding, if people do it well. Like, when it goes bad, it kind of makes me squirm and be uncomfortable, but done well, I think, as an audience member, if you see something
Starting point is 00:14:20 being presented and you actually saw every step of it, and then in the end you see the outcome, you're like, well, I saw every bit of it. There was nothing that was crazy there. And now it's doing this. Like, I feel like I could totally do that. There's nothing, you know, sort of scary about it anymore once you see it done live. And I think a lot of times with slides you can skip over that and just sort of fling pieces of code together. And then you're like, well, yeah, but those were slides. Maybe this is way harder than it sounds. You know, if you see it done live, you kind of know how hard it is. Yeah, I agree. I think I'm going to opt for something almost there first.
Starting point is 00:14:54 Yeah, of course. And I'd also like to hear from our listeners. I'd like to hear some live coding horror stories and also some tips for how to do some Python live coding. If anybody has any cool tools to share, that'd be great. Yeah, sounds awesome. All right, before we get to our next topic, let's talk about Datadog. So they're sponsoring the show and they're doing really cool stuff.
Starting point is 00:15:18 So if you have performance issues or bottlenecks in your application, that may be in your code, but it might be just somewhere in the whole stack that you're using. So let's say you have a Python web app running Flask, and it's built upon Mongo, and it's all on Ubuntu running Nginx and uWSGI. With Datadog, you can actually monitor all of those pieces as a whole. So that's super powerful if you want to understand really why your app's slow, not just why your Python code is slow. So they have a great getting started tutorial,
Starting point is 00:15:48 and you can check that out and get a free Datadog t-shirt. So just visit pythonbytes.fm/datadog and see what they've got to offer. It's pretty cool. That's cool. Yeah, and thank you, Datadog, for keeping the show rolling. All right, speaking of web, let's talk a little bit about REST. Okay. All right, so I mentioned Flask.
Starting point is 00:16:04 I mentioned Pyramid. There's Django, of course. Those are the three sort of high-level web frameworks. And they're great. They're good for building web applications. There are extensions, or even they themselves are good, for building RESTful services. But there are two really interesting web API frameworks in Python that a listener suggested we talk about, and I'm excited to talk about them. So there are these two: one is called Falcon and one is called Hug.
Starting point is 00:16:32 First of all, those are pretty good names for frameworks, right? Yeah, they're pretty good. I've heard of Hug, but I've never heard of Falcon. Yeah, so I just had the Falcon guys on Talk Python To Me last week on episode 129, and that is a super low-level, really high-performance RESTful framework. So they call it a bare-metal Python web API framework for building very fast backends and microservices. And they don't see it as competing with those frameworks I've mentioned, but they see it as more complementary, like you write your app in one of those, and if you need that super fast little service, you use this. And it even works on PyPy for an extra speed boost. So that's cool. You can use
Starting point is 00:17:11 Falcon, and it's really, really low level. And then there's Hug, which is actually a RESTful web service API framework built upon Falcon. So Hug is using Falcon for its low-level capabilities, but then Hug is like a simplification on top of those APIs. So you can do really interesting stuff with Hug. Like, you just put a decorator onto a function, and all of a sudden it becomes an API that you can work with. It might be a method on a class, but you can work with that really simply. And one of the unique things about it is that it comes with built-in self-documenting APIs, right? So you can ask it what your functions are and it'll give you a description. And you can expose them in different ways. So maybe I have an API that I can access over HTTP, but I could also make that a Python package that exposes that API, or make it a command-line thing that exposes the same API. And those are all the same bits of code, just exposed differently with Hug. Oh, that's cool.
Starting point is 00:18:14 Yeah, that's pretty neat, right? Yeah, I've got to try that out. So if you're building RESTful services, give these two things a look, depending on which level you want to work at. They're kind of neat. All right, but you might want to test those, right? You should test them. So if you are testing them, you might want to test them in multiple environments. And so tox would be a good thing. Yeah, we had a nice conversation with some listeners on Twitter, like, hey, what is tox? Will you tell us what tox is? So Brian, tell me what tox is. Well, yeah, first off, we're not going to like,
Starting point is 00:18:42 we're going to give a little sneak peek on what tox is, but I think it does quite a bit. So I reached out to one of the tox developers, Oliver Bestwalter, and he has agreed to come on Test and Code to have a longer conversation. We haven't scheduled that yet, but we'll let you know when it's up. But for now, tox, and this is a quote from Oliver: the name of the tox automation project derives from testing out of the box. I didn't know that before I read this. But it aims to automate and standardize testing in Python. It's conceptually above pytest or whatever else you use and serves as a command-line front end. I think of it as similar to something like a
Starting point is 00:19:26 Travis CI or something that you could do on the command line. Right. It lets you pick different versions of Python. So you could say Python 2.7 and Python 3.5. And it basically depends upon pytest or something like that, right? It'll orchestrate running your tests with pytest in those environments, for example. Yeah. And one of the things that I really like about it is that when you are distributing something, it's not just your code that you need to test. It's also the packaging and installation process and all of that. You want to make sure that all that works.
Starting point is 00:19:57 And so essentially what you do, in the normal use model, is list a handful of Python versions. And then what tox will do is use your setup.py file to create a source distribution, and then create a virtual environment, and then install dependencies, and then install your package, and then run the tests, and then do all of that for each of the different Pythons. So it uses different versions of Python to run everything from the setup all the way through running the tests. Yeah, that's really cool.
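The list of Python versions described above lives in a tox.ini file next to setup.py. As a rough sketch only, the snippet below uses Python's configparser to generate a minimal tox.ini for the two versions mentioned in the conversation. The helper name build_tox_ini is made up for illustration; envlist, deps, and commands are tox's own setting names, and choosing pytest as the only dependency and command is an assumption about the project.

```python
import configparser
import io


def build_tox_ini(python_envs, deps, command):
    """Return the text of a minimal tox.ini for the given environments."""
    config = configparser.ConfigParser()
    # [tox] envlist: the Python versions tox should build and test against.
    config["tox"] = {"envlist": ",".join(python_envs)}
    # [testenv]: what to install into each virtualenv and what to run there.
    config["testenv"] = {"deps": "\n".join(deps), "commands": command}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# py27 and py35 are tox's names for Python 2.7 and Python 3.5 environments.
tox_ini_text = build_tox_ini(["py27", "py35"], ["pytest"], "pytest")
print(tox_ini_text)
```

Saving that text as tox.ini and running tox in the same directory would, for each environment, go through the sequence described above: build the source distribution, create the virtual environment, install the dependencies and the package, and run the command.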
Starting point is 00:20:27 And really, if you let it do all that, you have to wait for it. It's slower because you're creating that distribution every time, and other things. But there's a couple of links in the show notes on some tips and patterns you can use to speed things up if you need to. But just having this ability right at your desktop on the command line is really great
Starting point is 00:20:51 for testing your stuff. Yeah, that's really cool. And I believe there was something to do with Python 2 and that original vulnerability stuff that people discovered on PyPI, right? Like the vulnerable code only ran on Python 2 or something, right? And that's how they discovered it? I think that's the case. I don't have it pulled up either, but yeah. I don't have a source to verify that, but on Twitter somebody said, oh yeah, and we found this because of tox and testing this stuff on Python 3. Yeah, that's beautiful. All right, awesome. So last one, I want to talk about legacy Python a little bit as well. So there's Flake8, right? Which is a linter that checks your
Starting point is 00:21:30 code and tells you what you're doing right and wrong, things like that. There's a, I think it's a plugin, called flake8-tidy-imports. And so one of our listeners said, hey, I added this cool feature to tidy imports. And I thought it was pretty cool, so I thought I'd highlight it here. People who are moving to Python 3, you might want to check this out. So you can declare python2to3 as a banned module import in Flake8, and then it'll go through and actually find any of the modules that would have worked in Python 2 but not in Python 3. For example, mock, right? So you used to say import mock, but now you would just use import unittest.mock as mock or something like this, right?
Starting point is 00:22:12 So it would actually give you that warning. Like, in Python 3, you don't use mock anymore. You use unittest.mock. And it gives you a nice useful message, not just you shouldn't use this anymore, but here's the thing to use instead as you do this upgrade. So it kind of shames people a little bit for using the old stuff, which is good. Yeah, I really like it. Actually, I use that as well.
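The mock example above comes down to one import line. As a small sketch, assuming code that still has to run on both Python 2 and Python 3 during a migration, a fallback import keeps the rest of the module unchanged. The fetch_status helper and the "/status" path are made up for illustration.

```python
try:
    # Python 3.3+: mock ships in the standard library as unittest.mock.
    from unittest import mock
except ImportError:
    # Python 2: fall back to the separate "mock" backport from PyPI.
    import mock


def fetch_status(client):
    """Hypothetical helper under test: ask a client for a status code."""
    return client.get("/status").status_code


# Replace the real client with a Mock and script its response.
fake_client = mock.Mock()
fake_client.get.return_value.status_code = 200

status = fetch_status(fake_client)
fake_client.get.assert_called_once_with("/status")
print(status)
```

Once Python 2 support is dropped, the try/except collapses to the single `from unittest import mock` line, which is the replacement the linter message points to.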
Starting point is 00:22:29 That's great. Very nice. And I have a bonus one for us, actually. I want to throw it in really quick. So Jesse Davis from MongoDB did the PyMongo driver, stuff like that. He actually is the organizer for PyGotham. So that is the Python conference in New York City. And he's really into helping and mentoring people, especially people who are new speakers. So he's running this project where he's trying to raise money to hire a speaking coach to work with and mentor first-time speakers who he's getting to come speak at PyGotham. And he's trying to raise $1,200. And it turns out, as of today, he's reached his goal, but I'm sure that he could do more if he had some more money.
Starting point is 00:23:11 So I'm linking to his article called Help Me Offer Coaching to First-Time PyGotham Speakers, which I thought was a cool project. And I'm happy to spread the word for Jesse because, you know, it's great to have more people coming into the community. Yeah. I think that things like this are awesome and I like covering it anyway. And I asked him to maybe write up something after the conference, just to hear how that goes. I'd like to hear from the people that got coached and how the process went, if it helped things. Yeah, that'd
Starting point is 00:23:40 be really cool, a sort of retrospective. Like, was this actually useful? What did you learn? To see if it's something we should be doing as a community. Yeah. And then other conferences, and I don't have any links right now, but some conferences do, like, mentors for submitting your proposal. So for a talk proposal, they'll have a mentor program so you can work with somebody to build up your proposal in the first place. Yeah, that's kind of the first step to being a first-time speaker. Okay, cool. Awesome. Well, good job, Jesse. How about you? What other news have you got? Have you forgotten about your book and you're just, like, relaxing, living life again? It's printing. No, I haven't forgotten. But I am relaxing a lot more and there's sunshine outside. I'm going outside more, which is good. Not sunshine
Starting point is 00:24:25 today. You're actually seeing the outside. Yeah, but I'm seeing the outside. Yeah, that's awesome. But the physical books, you can order them now. Apparently they're printing and shipping, so that's awesome. Yeah, very good. Very good. That's great to hear. So I remember last week, I talked about adding switch to Python and I said, I'll put it up on GitHub. Yeah, and you did. I did. And I would say about 75% of the people said it was awesome. So cool. And 25% of the people said, please, no, don't do this. But, you know, you can't please everyone, and it's not changing the language. It's just a package on GitHub.
Starting point is 00:24:58 You can do whatever you want with it. So anyway, it was actually in the top Python trending packages on GitHub out of all Python packages. Sorry, repos. Wow, really? Yeah, yeah. Last week, it was pretty awesome.
Starting point is 00:25:09 That's great. And it had like 175 comments on Reddit or something. So it's an interesting set of conversations that comes up around it. So that was a follow-up to last week where I talked about that. And then also, I'm writing a free MongoDB course that's going to complement my paid MongoDB course, right? Like a short one. That's an intro sort of thing. So there's a link at the bottom of the show notes. People can sign up to get notified. That'll probably be out soon.
Starting point is 00:25:37 I finished writing that this week, like this morning, and I'll probably have that out in a few weeks. That's great. Yeah. Should be fun. All right. Well, Brian, thanks for doing all the research or having our listeners do some research for you. It was really fun to talk about this. And if you guys have thoughts, especially on the PyPI security thing,
Starting point is 00:25:54 go to pythonbytes.fm/44 and add your thoughts at the bottom. This is kind of a big deal. Yeah, and thanks to everybody for helping come up with ideas for the show. We always appreciate it. Yep, keep it coming. Very much appreciated.
Starting point is 00:26:07 All right. Bye, Brian. Bye, everyone. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send
Starting point is 00:26:25 it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
