Python Bytes - #158 There's a bounty on your open-source bugs!

Episode Date: November 27, 2019

Topics covered in this episode: GitHub launches 'Security Lab' to help secure open source ecosystem; pybit.es now has some test challenges; pyhttptest - a command-line tool for HTTP tests over RESTfu...l APIs; xarray; Animated SVG Terminals; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/158

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 158, recorded November 20th, 2019. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by DigitalOcean. DigitalOcean's awesome. Check them out at pythonbytes.fm slash DigitalOcean. Tell you more about that later. But Brian, I find that Python is making its way into all these different areas, not just traditional computer science or maybe data science.
Starting point is 00:00:28 Right. There's an article that I saw that's kind of interesting. I mean, there's not a lot of details, but essentially it's saying that Python is replacing Excel in banking and investing. The real title is Python Already Replaced Excel in Banking, but we've got some interesting quotes from here, so I'm just going to read it out. This is from the article. If you wanted to prove your mettle as an entry-level banker or trader, it used to be the case that you had to know all about financial modeling in Excel. Not anymore. These days it's all about Python, especially on the trading floor. And it goes on to talk about how a lot of different modeling that used to be done in
Starting point is 00:01:11 smaller cases in Excel, but it would take like a few minutes to run the Excel modifications and analysis. Now they can do even like way more data and have it done in like a second or two. So it does make sense, in cases where split-second decisions change how you react to the market, that you'd want to have speed and ease. So Python makes sense to me. Yeah, that's really interesting. I'm sure it's using a lot of the data science stuff like NumPy and whatnot to make that fast.
Starting point is 00:01:41 Deep down below, the whole trading, the algorithmic trading, the high-speed trading, all that kind of stuff, the latency that those folks care about is crazy, right? Like if you could get it from four milliseconds to three milliseconds, we'd really appreciate that, right? And they'll actually like rent servers that are nearly co-located to the stock market to reduce the actual latency
Starting point is 00:02:01 or set up alternate direct connections over microwaves. There's all kinds of crazy stuff. And so if you can go from minutes to seconds, that already seems like it would make a big difference to these folks. Yeah. And also being able to go from minutes to seconds and while incorporating more data. Yeah. Super cool.
Starting point is 00:02:17 I'm imagining like walking through the trading floor and seeing some guy in a hoodie sitting with a laptop on the floor. I mean, like, I don't understand this, but, you know, whatever. Five years ago, that person would have been arrested. Now people are like, hey, I need some help, man. Can you give me some advice on this trade? Yeah. I have a little personal experience with this.
Starting point is 00:02:37 Python replacing Excel in banking and trading. Can't talk about the details, but I did teach a class to a bunch of folks working on the European stock market. And they actually couldn't even take the class during the day because they had to be there while the market was open. So we had the class in the evening for a week over there. And they were all really into learning Python because they had been trying to analyze how their day went and do this kind of analysis that you're talking about in Excel. And they're just like, we can't do this anymore. We have to get like better tools.
Starting point is 00:03:06 And Python was the answer for them as well. Pretty cool. Oh, that's great. Interesting. Yeah. Another thing that I think is really, really good news is something that GitHub just announced.
Starting point is 00:03:16 GitHub has announced a ton of things while you were not with us last week when we recorded in Florida, we talked about how GitHub has added code navigation to all the source code there, much of the source code. You go in there and click on functions and classes and say go to definition in Python, and that's pretty awesome.
Starting point is 00:03:36 So give it a week, and GitHub launches Security Lab to help secure the open source ecosystem. Wow. So you've probably heard about bug bounties, like these bounties paid out to security researchers before, I would guess. Yeah. Yeah, so it's pretty much like that, is my understanding of it. So it's like a bug bounty program to go and find bugs in open source
Starting point is 00:03:58 libraries. But what's kind of cool is it seems like the folks paying out that money are not the open source projects, right? Like Apple might pay out a huge amount of money, like $100,000 for finding a big vulnerability in iOS, or Microsoft might, or whoever. But who's going to pay to find that security bug in Flask or wherever it is, right? It seems like this is to pay for those types of things. So it says organizations as well as individual security researchers can join a bug bounty program, with rewards of up to $3,000 available to compensate bug hunters for the time they put into searching for vulnerabilities in open source projects. Oh, that's neat. Cool, right? Yeah. Yeah, so apparently this has been in beta for a little while. When was it exactly? A little while, not very long. Anyway, the founding members
Starting point is 00:04:50 who were part of it have already found, reported, and helped fix more than 100 security flaws across the open source ecosystem. That's pretty cool. Another thing that's interesting is the bug report, in order to count, must contain a CodeQL, like SQL but CodeQL or something? I don't know. CodeQL, which is an open source tool that GitHub released at the same time. Remember
Starting point is 00:05:18 we talked about their semantic code analysis engine? And what it does is basically this is a query that runs against source code that will uncover the vulnerabilities in dependent projects. Okay. So if I find a bug in Flask, I don't know if there is one, but let's just say, I'm just picking a random project. I find a bug in Flask and I submit this. I submit a query to GitHub so that they can go find all the projects that depend on Flask that have out-of-date versions of Flask that need to also subsequently receive warnings
Starting point is 00:05:47 to get their stuff updated. So do they then notify all the other maintainers? Yes. So if you look at that article, there's some screenshots of what it gets. So the actual project will get an automated pull request that fixes the security vulnerability. Maybe it bumps the requirements pinned version to
Starting point is 00:06:05 something where it's fixed or something, right? It gets the PR to automatically fix it. And then there's also a button where they can publish an advisory out from that repository to dependent repositories. And they could also request a CVE, which is like an official vulnerability number, to be recognized as an actual issue. So GitHub became, what was the term they used, a CVE Numbering Authority, a CNA, of course, so that they can actually issue these vulnerability numbers to be understood and referenced as unique IDs across the security landscape. Interesting. Yeah.
Starting point is 00:06:47 So all this stuff is integrated into GitHub. So GitHub researchers find the issue in the main project. The main project gets a PR. The main project can then also push out these warnings to other folks and request CVEs for their projects. That's pretty cool, right? Yeah. Open source is growing up.
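To make the "find dependents pinned to an out-of-date version" idea concrete, here is a minimal Python sketch. It is not CodeQL and not how GitHub does it; the package name `examplelib`, the version numbers, and both function names are hypothetical, and only exact `==` pins are handled.

```python
# Hypothetical sketch: flag pinned requirements older than a patched release.
# Not CodeQL and not GitHub's mechanism; names and versions are made up.

def parse_version(text):
    """Turn '1.0.2' into (1, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_outdated_pins(requirements_text, package, fixed_version):
    """Return pinned versions of `package` that are older than `fixed_version`."""
    outdated = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # only exact pins are handled in this sketch
        name, _, version = line.partition("==")
        if name.strip().lower() == package.lower():
            if parse_version(version.strip()) < parse_version(fixed_version):
                outdated.append(version.strip())
    return outdated


example_requirements = """
examplelib==1.0.2  # hypothetical vulnerable pin
requests==2.22.0
"""

outdated_pins = find_outdated_pins(example_requirements, "examplelib", "1.1.0")
print(outdated_pins)
```

A real tool would also need to handle version ranges, pre-release tags, and other dependency file formats, which this sketch deliberately ignores.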
Starting point is 00:07:03 Yeah, it totally is. And it seems like it's pretty solid for all the folks working on it. It doesn't seem like it requires much of the maintainers. It's more like there's this bug bounty program, from what I can tell. And also they threw in there right at the end of this, GitHub also updated the token scanning, an in-house service that scans for like API keys, like AWS
Starting point is 00:07:26 access keys or whatever that have been accidentally left inside a source code. Oh, that's good. That's really good. Yeah. It'd be pretty nice to like, uh, you probably didn't mean this. Click this button to make this go away. Anyway, I think this is really cool. I think this is like, this is just plumbing to make open source more secure and I like that. Yeah, and also just to be able to have companies put money at open source projects to keep them fixed
Starting point is 00:07:52 and it's not necessarily trying to get the official maintainer to do it, but to have some incentive for everybody else to watch these things. So that's great. Absolutely. Yeah, these bug bounty programs have been working really well for the industry,
Starting point is 00:08:08 and it's cool to see GitHub putting that in there. Also cool is DigitalOcean, not just for sponsoring the show, but because they have awesome infrastructure and awesome product, and we use them for our stuff. So let me tell you about a new thing that they have generally available,
Starting point is 00:08:22 memory-optimized droplets. And if you have a memory heavy workload, basically this is the best way to get tons of memory in a droplet or a virtual machine. So you can get eight gigs of RAM for each dedicated CPU. And then it goes from two CPUs all the way up to enough to get you 256 gigs of RAM, whatever that math works out to be. And it's really good for high-memory applications like high-performance SQL or no-SQL databases and memory caches like Redis or Indexes,
Starting point is 00:08:54 some kind of large data analysis runtime, something like that. So check those out at pythonbytes.fm slash DigitalOcean. Really good stuff over there. Lots of cool things coming. Brian, what you got next for us? Well, we have a couple friends of ours, Bob Belderbos and
Starting point is 00:09:09 Julian Sequeira. They run a thing called PyBites and PyBites Challenges. Not affiliated with Python Bytes, just sounds similar. It's the I versus the Y. It's not even close to the same thing. It's P-Y-B-I-T dot E-S.
Starting point is 00:09:26 Anyway, I enjoy it. It's a challenges platform where you can just sort of, there's a few of them for free, but it is a paid service. It's one of those things where they give you kind of a written assignment and some test code already there, and it checks to see, and then you have to fill in the body of a function to make all the tests pass. It's kind of a brain teaser sort of thing.
Starting point is 00:09:51 It's a fun way to keep up, make sure that you're practicing out-of-the-box Python stuff that you don't normally do. That's what I use it for. But the news is they just added test coverage, or tests, testing. So in the past, you didn't write the tests, they wrote them to evaluate your code.
Starting point is 00:10:08 But they've added a few test challenges where they write the code, and you have to write the test code to check that code. And it's kind of cool, but they were, they actually talked to me about this as well, as to try to pick my ideas, but they came up with it on their own. How do you evaluate if the
Starting point is 00:10:25 test code is good? So you evaluate if your source code is good by running tests, but the other way around is a little difficult. Yeah. How do you test the tests? Yeah. So they did it a couple of ways. They're using coverage.py to make sure that you're hitting 100% coverage. And, you know, yes, it's debatable for a large project whether you should get 100% coverage, but for a small function or some small bit of code, you should be able to hit 100% coverage. That's a nice thing. The other one is mutation testing. So there's a couple projects we've heard of, mutmut and MutPy, M-U-T-P-Y. And I think we talked about this earlier, but Ned Batchelder did write an article about his experience with mutmut, but PyBites is using MutPy, and what it does is it
Starting point is 00:11:14 takes the source code and changes something about it. And MutPy works at the level of the abstract syntax tree, and it changes, for instance, a division operator to a multiplication, or changes a string to some other string or something, and then it runs the tests again. And the idea is you want your tests to be able to... It makes a whole bunch of mutants of the code, and you want the tests to be able to kill off all the mutants except for the original.
Starting point is 00:11:44 That's how they're testing it. It's kind of a neat idea, but it's fun to play with. It is an interesting question to ask, how do you test the test? And I think this is pretty creative. Well done, Bob and Julian. I haven't used mutation testing a lot. I've tried it out, but I haven't used it for projects. The idea of using it in a training situation is a novel thing I haven't heard of
Starting point is 00:12:05 before. And I think that's a cool idea, to be able to try to test somebody's test code. Yeah, I agree. And like you said, 100% code coverage for a project that's real is challenging. I think also maybe mutation testing for a project that's real is tricky, because maybe it changes, like, you know, the print statement that shows what the title of the app is, and who cares? Like, no one's going to check for that, right? Right. But in this case, where pretty much it's a very small, focused bit of code and you're supposed to test it, presumably any changes to that are going to appear in the couple of tests you write. Yep. Nice. Now, speaking of tests, I feel like I stole this one from you, Brian, just out of the universe. I mean, so I want to talk about pyhttptest.
Starting point is 00:12:46 So this one comes from Florian Dallas, or Dahlitz, sorry, and he actually sent in two things for this week, and they were both excellent, so I'm going to cover them. This is a command-line tool for HTTP tests against RESTful APIs. Okay. All right, so the idea is basically I want to test some RESTful endpoint, and instead of going over and saying, okay, I'm going to create, I'm going to get requests, I'm going to do a GET, I'm going to get the dictionary, I'm going to verify that this thing is in the dictionary and so on, what you basically do is you just write a simple little JSON document
Starting point is 00:13:21 for each test that you want to run. Oh, cool. Yeah, so then it has things like what is the name of the test, what HTTP verb do you want to use, what is the URL combination between host and endpoint, the headers you need to pass, a query string you need to pass, and then you get back a report. It actually gives you a cool report in a columnar-style validation
Starting point is 00:13:41 that lets you assert things about it. Yeah, there's a handful of these types of things and I think it's kind of a neat way to describe API testing. Yeah, it seems really cool. There's a bunch of neat little libraries that are used as well like Tabulate, which is a cool way to print the tabular data that they're
Starting point is 00:13:57 showing there and things like that. Yeah, I like this project. If your job is to test a bunch of HTTP endpoints, this is pretty cool. Yeah, neat. Nice. All right, what else? What's next?
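As a rough illustration of the "describe the test as data" approach, here is a minimal Python sketch that turns such a description into a request object using only the standard library. It is not pyhttptest's code: the field names mirror the items listed in the episode (name, verb, host, endpoint, headers, query string), but the exact schema, the example host, and the `build_request` helper are assumptions. No network call is made.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Field names follow the items mentioned in the episode; the exact schema is assumed.
test_case = {
    "name": "TEST: list users",
    "verb": "GET",
    "host": "https://api.example.com",
    "endpoint": "users",
    "headers": {"Accept": "application/json"},
    "query_string": {"limit": 5},
}


def build_request(case):
    """Translate a declarative test description into a urllib Request."""
    url = urljoin(case["host"].rstrip("/") + "/", case["endpoint"].lstrip("/"))
    if case.get("query_string"):
        url += "?" + urlencode(case["query_string"])
    return Request(url, headers=case.get("headers", {}), method=case["verb"])


request = build_request(test_case)
print(request.get_method(), request.full_url)
```

The appeal of this style is that adding a test means adding another small document rather than more code; a runner would send each request and tabulate the results.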
Starting point is 00:14:09 Oh, next. xarray. This was suggested by a listener. I think it's Guido Imperiale. Yep, I agree. Thanks, Guido. Sent it in. We haven't covered it before,
Starting point is 00:14:20 and actually I didn't know about it before. People in the data science community probably do because it seems pretty powerful. But the gist of it is it uses and builds on top of NumPy and pandas and Dask to offer N-dimensional arrays. You can do N-dimensional arrays in pandas already, I believe, but one of the neat things about these is that they've got labels on them. So they're self-describing and they've got indexes. There's a few data types within it. So there's the xarray DataArray. The DataArray is the N-dimensional array,
Starting point is 00:14:58 but it has metadata like names and labels for the dimensions. And you can also have coordinates and attributes. And coordinates are essentially like the tick elements for the different axes. And then attributes, the DataArray doesn't really do anything with the attributes, but it's a way to consistently keep data with data. So if you have to keep track of some extra things like, you know, where was this data collected or really anything, you can add them as an attribute. And then a Dataset is a
Starting point is 00:15:32 dictionary-like collection of DataArray elements. I was playing with this and it's pretty darn cool. One of the nice things about using it is just keeping all of the dimension names together. So if you have a multi-dimensional array, even just like a three-dimensional array, it's sometimes hard to keep track of, you know, which axis is which, and this is all together. But it's not just packaged together. You can also do things like use the label names and the axis names and even axis elements at the coordinates. They don't actually need to be numbers. For instance, you could have the months of the year or the letters of the alphabet be coordinates. You can use those as selectors to be able to select
Starting point is 00:16:20 rows and columns, and those return different DataArray elements. The DataArray elements also can be used in algorithms. They can just be passed directly to pandas algorithms. So these are pretty cool. Yeah, it looks a little bit like it's taken some of the features from NumPy, some of the features from pandas, some of the features from Dask, and sort of brings them together into one package. So when I was going through some of the tutorials, I was to get somebody to talk about this. It was like a three-dimensional array in, I think it's in pandas, used to be considered a Panel. But when I went to look at the Panel information, it looks like Panels are being
Starting point is 00:16:56 deprecated for something else. So even in the pandas documentation, it was pointing to this xarray project. Oh, interesting. I think the people in the pandas community are definitely familiar with it. But if you're using pandas kind of on the side and you're not really in it all the time, this might be helpful. Now, previously you spoke about Bob Belderbos, and I said we got this item from Florian Dahlitz. I'm going to bring those two things together in this next one. So Bob had introduced us to Carbon, remember that? Yeah, it's like sort of beautiful screenshots for colored code, right? Code, it's like a mock
Starting point is 00:17:32 faux little like shell or whatever editor. Like you don't use screenshots of real editors, you just create that with Carbon at carbon.now.sh, and that's cool, but those are generally static. So Florian sent in this thing called termtosvg, and it's a cool way to create animated terminal GIFs. So instead of going all the way to create like full-on screencasts of your screen, you can run this in your terminal, and then you just do whatever you want to do in the terminal, and it captures it perfectly into SVG. And then you can convert that out to some kind of animated thing. Like, I guess the SVG itself is animated, so you just show that in the browser or wherever you want to put it. Isn't that cool? Yeah, very cool. You basically just type termtosvg once you have it installed, and it starts recording. You do a bunch of stuff, and then there's a way to get out of its recording status.
Starting point is 00:18:28 So it's pretty cool. It produces lightweight, clean-looking animations or you can even do still frames if you want for a project page. Carbon is cool because I can put in the text and the code I want to show up, but maybe it doesn't have here is what the progress bar
Starting point is 00:18:44 and then the install steps with the spinner look like. It doesn't naturally capture what actually happens when that code or those terminal commands execute. So this panel, it has color themes, animation controls, all sorts of good stuff. And yeah, it's pretty cool. So there's probably, if this sounds interesting, you want to check out the examples.
Starting point is 00:19:09 So there's a whole page of examples, and there's a bunch of different stuff happening. You can just look through there. And I think there's also templates that configure how it records and stuff. So there's a bunch of predefined templates that you can go play with to get started from. That'd be really cool for like a tutorial site or something.
Starting point is 00:19:23 Yes, exactly. Yeah. Or even if you have a project, right? Like if you're the maintainer of pipx, it'd be cool to use this to create a way to show how awesome pipx is, like this step,
Starting point is 00:19:34 then this step and then boom, right? Just put that right in your GitHub readme. Yeah. I love it when there's little animated things in the readme. So when you go to GitHub,
Starting point is 00:19:42 you just see that. Yeah. You and I, we spend an inordinate amount of time jumping into new projects and going, is it interesting? Yes or no? Why is it interesting, right? And this
Starting point is 00:19:53 kind of stuff is the thing that just goes, after 10 seconds, I knew I wanted to learn about it, right? It really makes a difference, and it's easy. Yeah, very cool. Definitely check this out. Yeah, for sure. All right, yeah, so that's a good one. You can check that out, termtosvg. Be cool. All right, well, that's it for our main items. What else you got? I have one bit of extra news, which is that pytest 5.3.0 was released the other day, and there's some cool features, and if you're, you know,
Starting point is 00:20:22 pytest nerds, definitely check it out. But I wanted to bring it up because I think a lot of people that just use pytest and are using it with continuous integration systems should pay attention to this, because the JUnit XML output, they've changed the default, so the default format. The XML output has an old version and a new version. The new version has some more information, but they wanted to make sure that people know about this. So if you run it, you'll get a warning, and it's not really a warning. It's just to make you aware that there's a particular format
Starting point is 00:20:55 that's being deprecated. So eventually in the 5.4 release, they won't support the old format. So if you see this, I encourage anybody using pytest and continuous integration to read the change log and understand what's going on and make sure they're ready to either pin pytest or change their system. Yeah, it's a good thing to put on people's radar for sure. Okay. How about you, Michael? Any extra bits? Yeah, I got a bunch for you. Actually, a couple of things. PyCon.
Starting point is 00:21:25 PyCon's awesome. We love that each year. And this year it's going to be in Pittsburgh for the first of its two years in that city. And PyCon registration is now open. You can go and register, get your ticket before it sells out. Oh, cool. Yeah, that comes to us from Jacqueline Wilson. So thank you very much for sending that in.
Starting point is 00:21:43 And then also I saw, I can't remember where I saw this, somewhere, actually I think somewhere funky like Flipboard or something. So Facebook has now decided that Microsoft's Visual Studio Code is their default development platform. That's a little surprising to me. Yeah, interesting. Yeah, that's an article on ZDNet. And they're also helping Microsoft improve the remote development experience in VS Code.
Starting point is 00:22:07 Cats, dogs, all live in the same place. Okay. Yeah, this is cool. I suspect that things like Vim and Emacs and stuff probably have a strong representation there. But apparently, it's all about Visual Studio Code over there now. Anything else? Yes, two more things. Very exciting. So if the release
Starting point is 00:22:26 schedule lines up correctly and the future extends as I expected, this should be Wednesday before Thanksgiving, right? And that would mean the day or two after that is going to be Black Friday. So I just want to point out that Talk Python Training is going to have a really awesome Black Friday sale. Get a whole bunch of stuff on buying all of the courses, but also we're doing some special things to support the PSF and other stuff, some surprises in there that I suspect people won't guess at. And there's no way people are going to guess what is there. So check it out over at training.talkpython.fm. But you've got to act right away because it's only going to be there for like four days. It's a big deal. So check that out.
Starting point is 00:23:08 And also we have a new course coming, Python for the .NET Developer. So, so many people are coming from C# and the .NET world over into the Python space. I thought it would be cool to create a course that kind of gives them a big hug and holds their hand and helps them step over that divide. So it's like, do you know about ASP.NET? Here's Flask, and here's how you use it in Python. Do you know about Entity Framework? Here's SQLAlchemy, and here's how you use it in Python. Like all the things that they need or they love from C# and .NET, here's the Python equivalent and why it's awesome and how it works. Is that one that you did, or did somebody else do that? No, no, I did that one. Because you're like the perfect person for that.
Starting point is 00:23:46 Exactly. I spent so many years doing C# and now I'm all about Python. So exactly. I figured like, why don't I try to think back to the way it was for me many years ago and like sort of extend that experience back to other people. It's probably not going to be out yet. It may be out at the time that people hear this, but it's coming really soon. So I'll just put it out there as that.
Starting point is 00:24:08 That's nice. Hey, speaking of Black Friday, I do not have any insider knowledge, but Pragmatic Publishers often does a Black Friday sale too. It's usually fairly steep. So if you've not picked up the pytest book yet, and really, if you're listening to this and you haven't read it yet, what's going on? Come on. If you haven't, maybe check out pragprog.com and see if there's a sale. Definitely. I'm sure there will be. It would be surprising if there weren't. Awesome. How about a joke or two or three? I like three jokes. Okay. It's a good number. So this one, first one is more of just a geeky STEM type of joke, but I think people will like it. So I love soda drinks, you know, Coca-Cola, Dr. Pepper, root beer, things like that. So this one, I try to not drink too much, but I do like it. But here's how that world can clash together with math.
Starting point is 00:24:56 What do you get when you put root beer into a square glass? I don't know, what? Beer. Beer. I don't even get it, but it's funny. If you take root of beer and you square it, okay? Okay. Like the square root of beer, and then you put it in a square glass. Okay, that was bad. What's your next one here? Okay, what do you call an optimistic front-end developer? I don't know,
Starting point is 00:25:18 what? You call them a stack-half-full developer, that is. Now, also, I was going to tell a version control joke, but they're only funny if you get them. Get, G-I-T. Awesome. Those are both good. I like them. Yeah. Great. Cool.
Starting point is 00:25:35 Well, thanks again for having a nice conversation this week. Yeah, you bet. Thanks as always. See you later, Brian. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S.
Starting point is 00:25:47 And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
