Python Bytes - #158 There's a bounty on your open-source bugs!
Episode Date: November 27, 2019
Topics covered in this episode: GitHub launches 'Security Lab' to help secure open source ecosystem; pybit.es now has some test challenges; pyhttptest - a command-line tool for HTTP tests over RESTfu...l APIs; xarray; Animated SVG Terminals; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/158
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 158, recorded November 20th, 2019.
I'm Michael Kennedy.
And I'm Brian Okken.
And this episode is brought to you by DigitalOcean.
DigitalOcean's awesome. Check them out at pythonbytes.fm slash DigitalOcean.
Tell you more about that later.
But Brian, I find that Python is making its way into all these different areas, not just traditional computer science or maybe data science.
Right.
There's an article that I saw that's kind of interesting.
I mean, there's not a lot of details, but essentially it's saying that Python is replacing Excel in banking and investing.
The real title is Python Already Replaced Excel in Banking,
but we've got some interesting quotes from here, so I'm just going to read it out.
This is from the article. If you wanted to prove your mettle as an entry-level banker or trader,
it used to be the case that you had to know all about financial modeling in Excel. Not anymore.
These days it's all about Python, especially on the trading floor. And it goes on to talk about how a lot of different modeling used to be done in
smaller cases in Excel, but it would take like a few minutes to run the Excel modifications and
analysis. Now they can do way more data and have it done in like a second or two. So it does make sense, in cases where split-second decisions
change how you react to the market,
that you'd want to have speed and ease.
So Python makes sense to me.
Yeah, that's really interesting.
I'm sure it's using a lot of the data science stuff
like NumPy and whatnot to make that fast.
Deep down below, the whole trading, the algorithmic trading,
the high-speed trading, all that kind of stuff,
the latency that those folks care about is crazy, right?
Like if you could get it from four milliseconds
to three milliseconds, we'd really appreciate that, right?
And they'll actually like rent servers
that are nearly co-located to the stock market
to reduce the actual latency
or set up alternate direct connections over microwaves.
There's all kinds of crazy stuff.
And so if you can go from minutes to seconds, that already seems like it would make a big
difference to these folks.
Yeah.
And also being able to go from minutes to seconds and while incorporating more data.
Yeah.
Super cool.
I'm imagining like walking through the trading floor and seeing some guy in a hoodie sitting
with a laptop on the floor.
I mean, like, I don't understand this, but, you know, whatever.
Five years ago, that person would have been arrested.
Now people are like, hey, I need some help, man.
Can you give me some advice on this trade?
Yeah.
I have a little personal experience with this.
Python replacing Excel in banking and trading.
Can't talk about the details, but I did teach a class
to a bunch of folks working on the European stock market.
And they actually couldn't even take the class during the day because they had to be there while the market was open.
So we had the class in the evening for a week over there.
And they were all really into learning Python because they had been trying to analyze how their day went and do this kind of analysis that you're talking about in Excel.
And they're just like, we can't do this anymore.
We have to get better tools.
And Python was the answer for them as well.
Pretty cool.
Oh,
that's great.
Interesting.
Yeah.
Another thing that I think is really,
really good news is something that GitHub just announced.
GitHub has announced a ton of things while you were not with us last week
when we recorded in Florida,
we talked about how GitHub has added code navigation to all the source code there,
much of the source code.
You go in there
and click on functions and classes
and say go to definition in Python,
and that's pretty awesome.
So give it a week,
and GitHub launches Security Lab
to help secure the open source ecosystem.
Wow.
So you've probably heard about bug bounties,
these bounties paid out to security researchers,
before, I would guess. Yeah. Yeah. So it's pretty much like
that, is my understanding of it. So it's like a bug bounty program to go and find bugs in open source
libraries. But what's kind of cool is it seems like the folks paying out that money are not the open source projects, right?
Like Apple might pay out a huge amount of money, like $100,000 for finding a big vulnerability in iOS, or Microsoft might, or whoever.
But who's going to pay to find that security bug in Flask or wherever it is, right?
It seems like this is to pay for those types of
things. So it says organizations as well as individual security researchers can join a bug
bounty program, with rewards of up to $3,000 available to compensate
bug hunters for the time they put into searching for vulnerabilities in open source projects. Oh,
that's neat. Cool, right? Yeah. Yeah. So apparently this has been in beta for a little while. When was it exactly? A little while, not very long. Anyway, the founding members
who were part of it have already found, reported, and helped fix more than 100 security flaws
across the open source ecosystem. That's pretty cool. Another thing that's interesting is that the bug report, in order to count, must contain a CodeQL query.
CodeQL, like SQL but for code, is an
open source tool that GitHub released
at the same time. Remember
we talked about their semantic code analysis
engine? What it does is basically this: it's a
query that runs against
source code that will uncover the vulnerabilities in dependent projects.
Okay.
So if I find a bug in Flask, I don't know if there is one, but let's just say I'm just picking a random project.
I find a bug in Flask and I submit this.
I submit a query to GitHub so that they can go find all the projects that depend on Flask, that have out-of-date versions of Flask, and that need to subsequently receive warnings
to get their stuff updated.
So do they then notify all the other maintainers?
Yes.
So if you look at that article,
there's some screenshots of what it gets.
So the actual project will get an automated pull request
that fixes the security vulnerability.
Maybe it bumps the pinned version in the requirements to
something where it's fixed or something, right? It gets the PR to automatically fix it. And then
there's also a button where they can publish an advisory out from that repository to
dependent repositories. And they could also request a CVE, which is like an official vulnerability
number, to be recognized as an actual issue. So GitHub became, what was the term they used, a
CVE Numbering Authority, a CNA, of course, so that they can actually issue these vulnerability
numbers to be understood and referenced as unique IDs across the security landscape.
Interesting.
Yeah.
So all this stuff is integrated into GitHub.
So GitHub researchers find the issue in the main project.
The main project gets a PR.
The main project can then also push out these warnings to other folks and request CVEs for
their projects.
That's pretty cool, right?
Yeah.
Open source is growing up.
Yeah, it totally is. And it seems like it's pretty solid
for all the folks working on it.
It doesn't seem like it requires much of the maintainers.
It's more like there's this bug bounty program
from what I can tell.
And also they threw in there right at the end of this,
GitHub also updated the token scanning,
an in-house service that scans for like API keys, like AWS
access keys or whatever that have been accidentally left inside a source code.
Oh, that's good. That's really good.
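To make the token-scanning idea concrete, here is a minimal sketch in Python. It is not GitHub's scanner. The function name `find_leaked_keys` and the sample text are made up for illustration, and the pattern assumes the common shape of a long-term AWS access key ID (the prefix AKIA followed by 16 uppercase letters or digits). The key in the sample is the placeholder value used in AWS documentation, not a real credential.

```python
import re

# Assumed pattern: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_leaked_keys(text):
    """Return (line_number, key) pairs for every key-shaped string in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Hypothetical source file contents with a key accidentally left in.
sample = (
    "import os\n"
    'AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"\n'
    'print("hello")\n'
)

for line_number, key in find_leaked_keys(sample):
    print(f"line {line_number}: possible AWS access key ID {key}")
```

A real service would cover many providers' token formats and notify the provider or the repository owner; this only shows the pattern-matching step.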
Yeah. It'd be pretty nice to like, uh, you probably didn't mean this. Click this button
to make this go away. Anyway, I think this is really cool. I think this is like,
this is just plumbing to make open source more secure and I like that. Yeah, and also just
to be able to have
companies put money at
open source projects to keep them fixed
and it's not necessarily
trying to get the
official maintainer to do it, but
to have some incentive for
everybody else to
watch these things. So that's great.
Absolutely.
Yeah, these bug bounty programs have been working really well for the industry,
and it's cool to see GitHub putting that in there.
Also cool is DigitalOcean,
not just for sponsoring the show,
but because they have awesome infrastructure
and awesome product,
and we use them for our stuff.
So let me tell you about a new thing
that they have generally available,
memory-optimized droplets.
And if you have a memory heavy workload, basically this is the best way to get tons of memory in a droplet or a virtual
machine. So you can get eight gigs of RAM for each dedicated CPU. And then it goes from two CPUs all
the way up to enough to get you 256 gigs of RAM, whatever that math works out to be. And it's really
good for high-memory applications
like high-performance SQL or NoSQL
databases and memory caches like
Redis or Indexes,
some kind of large data analysis
runtime, something like that. So check those out
at pythonbytes.fm slash DigitalOcean.
Really good stuff over there.
Lots of cool things coming.
Brian, what you got next for us?
Well, we have a couple friends of ours,
Bob Belderbos and
Julian Sequeira.
They run a thing called PyBites
and PyBites Challenges.
Not affiliated with Python Bytes,
just sounds similar. It's the I
versus the Y. It's not even close to the same thing.
It's P-Y-B-I-T
dot E-S.
Anyway, I enjoy it.
It's a challenges platform where you can just sort of,
there's a few of them for free, but it is a paid service.
It's one of those things where they give you kind of a written assignment
and some test code already there, and it checks to see,
and then you have to fill in the body of a function
to make all the tests pass.
It's kind of a brain teaser sort of thing.
It's a fun way to keep up,
make sure that you're practicing out-of-the-box Python stuff
that you don't normally do.
That's what I use it for.
But the news is they just added test coverage,
or tests, testing.
So in the past, you didn't write the tests,
they wrote them to evaluate your code.
But they've added a few test challenges
where they write the code,
and you have to write the test code to check that code.
And it's kind of cool, but they were,
they actually talked to me about this as well,
as to try to pick my ideas,
but they came up with it on their own.
How do you evaluate if the
test code is good? So if you evaluate if your source code is good by running tests, but the
other way around is a little difficult. Yeah. How do you test the tests? Yeah. So they did it a
couple of ways. They're using coverage.py to make sure that you're hitting 100 percent
coverage. And, you know, yes, it's debatable for a large project whether you should get 100% coverage,
but for a small function or some small bit of code, you should be able to hit 100%
coverage. That's a nice thing. The other one is mutation testing. So there's a couple of projects
we've heard of, mutmut and MutPy, M-U-T-P-Y. And I think we talked about this earlier, but Ned Batchelder did write an
article about his experience with mutmut. But PyBites is using MutPy, and what it does is it
takes the source code and changes something about it. And MutPy works at the level of the
abstract syntax tree, and it changes, for instance, a division operator to a multiplication,
or changes a string to some other string or something,
and then it runs the tests again.
And the idea is you want your tests to be able to...
It makes a whole bunch of mutants of the code,
and you want the tests to be able to kill off all the mutants
except for the original.
That's how they're testing it.
It's kind of a neat idea, but it's fun to play with.
It is an interesting question to ask, how do you test the test?
And I think this is pretty creative.
Well done, Bob and Julian.
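The mutation-testing idea described above can be sketched with Python's standard `ast` module. This is an illustration only, not how MutPy or mutmut are implemented: the `average` function, the `DivToMult` transformer, and both tests are hypothetical. It builds one mutant by swapping division for multiplication and checks whether each test "kills" it, meaning the test passes on the original code and fails on the mutant.

```python
import ast

SOURCE = """
def average(total, count):
    return total / count
"""


class DivToMult(ast.NodeTransformer):
    """Mutation operator: replace every `/` with `*`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Div):
            node.op = ast.Mult()
        return node


def load(tree):
    """Compile a module AST and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<generated>", "exec"), namespace)
    return namespace


def passes(namespace, test):
    """Run one test function against the `average` in a namespace."""
    try:
        test(namespace["average"])
    except AssertionError:
        return False
    return True


def weak_test(average):
    # 4 / 1 and 4 * 1 give the same value, so this cannot spot the mutant.
    assert average(4, 1) == 4


def strong_test(average):
    # 6 / 3 is 2 but 6 * 3 is 18, so the mutant fails here.
    assert average(6, 3) == 2


original = load(ast.parse(SOURCE))
mutant_tree = ast.fix_missing_locations(DivToMult().visit(ast.parse(SOURCE)))
mutant = load(mutant_tree)

weak_kills = passes(original, weak_test) and not passes(mutant, weak_test)
strong_kills = passes(original, strong_test) and not passes(mutant, strong_test)
print("weak test kills the mutant:", weak_kills)
print("strong test kills the mutant:", strong_kills)
```

A surviving mutant, as with the weak test here, is the signal that the test suite is not checking enough, which is what makes the approach usable for grading someone's test code.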
I haven't used mutation testing a lot.
I've tried it out, but I haven't used it for projects.
The idea of using it in a training situation is a novel thing I haven't heard of
before, and I think that's a cool idea, to be able to try to test somebody's test code. Yeah, I
agree. And like you said, 100% code coverage for a project that's real is challenging. I think also
maybe mutation testing for a project that's real is tricky, because maybe it changes, like, you know, the
print statement that shows what the title of
the app is, and who cares? Like, no one's going to check for that, right? Right. But in this case, where
it's pretty much a very small, focused bit of code and you're supposed to test it, presumably
any changes to that are going to appear in the couple of tests you write. Yep. Nice. Now, speaking
of tests, I feel like I stole this one from you, Brian, just out of the universe. I mean, so I want to talk about pyhttptest.
So this one comes from Florian Dahlitz, and he actually sent in two things for
this week, which were both excellent, so I'm going to cover them. This is a command-line tool
for HTTP tests against RESTful APIs. Okay. All right, so the idea is basically I want to test some RESTful endpoint,
and instead of going over and say, okay, I'm going to create,
I'm going to get requests, I'm going to do a get,
I'm going to get the dictionary, I'm going to verify,
like this thing is in the dictionary and so on,
what you basically do is you just write a simple little JSON document
for each test that you want to run.
Oh, cool.
Yeah, so then it has things like what is the name of the test,
what HTTP verb do you want to use,
what is the URL combination between host and endpoint,
the headers you need to pass, a query string you need to pass,
and then you get back a report.
It actually gives you a cool report in a columnar-style validation
that lets you assert things about it.
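As a rough sketch of what such a test document might look like, the Python below builds one and prints it as JSON. The key names mirror the fields listed in the conversation (name, verb, host, endpoint, headers, query string) and are an assumption, as are the example host and values; check the pyhttptest README for the exact schema before relying on them.

```python
import json

# Hypothetical test case; key names are assumed from the fields described above.
test_case = {
    "name": "TEST: List all users",
    "verb": "GET",
    "host": "https://api.example.com",
    "endpoint": "users",
    "headers": {"Accept": "application/json"},
    "query_string": {"limit": 5},
}

# Serialize to the JSON text you would save in a file and hand to the tool.
document = json.dumps(test_case, indent=2)
print(document)
```

The appeal of this style is that each test is plain data, so adding a case means adding a small document rather than writing request and assertion code by hand.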
Yeah, there's a handful of these types of things
and I think it's kind of a neat way to describe
API testing.
Yeah, it seems really cool.
There's a bunch of neat little libraries that are used as well
like Tabulate, which is a cool way
to print the tabular data that they're
showing there and things like that.
Yeah, I like this project.
If your job is to test a bunch of
HTTP endpoints, this is pretty cool.
Yeah, neat.
Nice.
All right, what else?
What's next?
Oh, next.
xarray.
This was suggested by a listener.
I think it's Guido Imperiale.
Yep, I agree.
Thanks, Guido.
Sent it in.
We haven't covered it before,
and actually I didn't know about it before.
People in the data science community probably do
because it seems pretty powerful. But the gist of it is it uses and
builds on top of NumPy and pandas and Dask to offer N-dimensional arrays. You can do
N-dimensional arrays in pandas already, I believe, but one of the neat things
about these is that they've got labels
on them. So they're self-describing and they've got indexes. There's a few data types within it.
So there's the xarray DataArray. The DataArray is the N-dimensional array,
but it has metadata like names and labels for the dimensions. And you can also have coordinates and attributes.
And coordinates are essentially like the tick elements
for the different axes.
And then attributes, the DataArray doesn't really do anything
with the attributes, but it's a way to
consistently keep data with data.
So if you have to keep track of some extra things like,
you know, where was this data collected or really anything, you can add them as an attribute. And then a Dataset is a
dictionary-like collection of DataArray elements. I was playing with this and it's pretty darn cool.
One of the nice things about using it is just keeping all of the dimension names
together. So if you have a multi-dimensional array, even just like a three-dimensional array,
it's sometimes hard to keep track of, you know, which axis is which, and this is all together.
But it's not just packaged together. You can also do things like use the label names and the axis names and even the axis elements, the coordinates.
They don't actually need to be numbers. For instance, you could have
the months of the year or the letters of the alphabet be
coordinates. You can use those as selectors to be able to select
rows and columns, and those return different DataArray elements.
The DataArray elements also can be used in algorithms. They can just be
passed directly to pandas algorithms. So these are pretty cool.
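A small sketch of the pieces just described, assuming NumPy and xarray are installed. The city names, months, and temperature values are invented for illustration. It shows named dimensions, non-numeric coordinates, free-form attributes, label-based selection, and a Dataset holding a DataArray.

```python
import numpy as np
import xarray as xr

# A 2-D DataArray with named dimensions, string coordinates, and attributes.
temps = xr.DataArray(
    np.array([[10.0, 12.0, 15.0], [20.0, 22.0, 25.0]]),
    dims=("city", "month"),
    coords={"city": ["Portland", "Austin"], "month": ["Jan", "Feb", "Mar"]},
    attrs={"units": "degC", "source": "made-up numbers"},
)

# Select by label instead of by integer position; the result is a DataArray.
feb = temps.sel(month="Feb")

# Reduce over a dimension by name rather than by axis number.
mean_by_city = temps.mean(dim="month")

# A Dataset is a dictionary-like collection of DataArrays.
dataset = xr.Dataset({"temperature": temps})

print(feb.values)
print(mean_by_city.sel(city="Austin").item())
print(list(dataset.data_vars))
```

Referring to `month="Feb"` or `dim="month"` is the point of the labels: the code keeps working even if the axis order of the underlying array changes.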
Yeah, it looks a little bit like it's taken some of the features from NumPy, some of the features from pandas,
some of the features from Dask, and sort of brings them together into
one package. So when I was going through some of the tutorials, I was to get somebody
to talk about this. A three-dimensional array in, I think it's in pandas, used to be
considered a Panel. But when I went to look at the Panel information, it looks like Panels are being
deprecated for something else. So even in the pandas documentation, it was pointing to this
xarray project. Oh, interesting.
I think the people in the pandas community are definitely familiar with it.
But if you're using pandas kind of on the side and you're not really in it all the time, this might be helpful.
Now, previously you spoke about Bob Belderbos, and I said we got this item from Florian Dahlitz.
I'm going to bring those two things together in this next one.
So Bob had introduced us to Carbon. Remember that?
Yeah. It's like beautiful screenshots for colored code, right? It's like a mock,
faux little shell or whatever editor. Like, you don't use screenshots of real editors, you just
create that with Carbon at carbon.now.sh. And that's cool, but those are generally static. So Florian sent in this thing
called termtosvg, and it's a cool way to create animated terminal GIFs. So instead of going all
the way to create full-on screencasts of your screen, you can run this in your terminal,
and then you just do whatever you want to do in the terminal, and it captures it perfectly into SVG. And then you can convert that out to some kind of animated thing.
Like, I guess the SVG itself is animated, so you just show that in the browser or wherever you
want to put it. Isn't that cool? Yeah, very cool. You basically just type termtosvg once you have it
installed, and it starts recording. You do a bunch of stuff, and then there's a way to get out of its recording status.
So it's pretty cool.
It produces lightweight, clean-looking animations
or you can even do still frames
if you want for a project page.
Carbon is cool
because I can put in the text and the code I want to show up,
but maybe it doesn't have
here is what the progress bar
and then the install steps with the spinner look like.
It doesn't naturally capture what actually happens
when that code or those terminal commands execute.
So this panel, it has color themes,
animation controls, all sorts of good stuff.
And yeah, it's pretty cool.
So there's probably, if this sounds interesting,
you want to check out the examples.
So there's a whole page of examples,
and there's a bunch of different stuff happening.
You can just look through there.
And I think there's also templates
that configure how it records and stuff.
So there's a bunch of predefined templates
that you can go play with to get started from.
That'd be really cool for like a tutorial site or something.
Yes, exactly.
Or even,
or if you have a project,
right?
Like if you're the maintainer of pipx,
it'd be cool to use this to create a way to like show how awesome pipx is
like this step,
then this step and then boom,
right?
Just put that right in your GitHub readme.
Yeah.
I love it when there's little animated things in the readme.
So when you go to,
to,
to GitHub,
you just see that.
Yeah.
You and I, we spend
an inordinate amount of time
jumping into new projects and going,
is it interesting? Yes or no?
Why is it interesting, right? And this
kind of stuff is the thing that just goes,
after 10 seconds,
I knew I wanted to learn about it, right? It really makes
a difference, and it's easy. Yeah,
very cool. Definitely check this out. Yeah, for sure.
All right, yeah, so that's a good one. You can check that out, termtosvg. Be cool. All right, well,
that's it for our main items. What else you got? I have one bit of extra news, which is that pytest 5.3.0
was released the other day. There's some cool features, and if you're, you know,
pytest nerds, definitely check it out. But I wanted to bring it up because I think a lot of people that just use pytest
and are using it with continuous integration systems should pay attention to this,
because of the JUnit XML output: they've changed the default format.
The XML output has an old version and a new version.
The new version has some more information,
but they wanted to make sure that people know about this.
So if you run it, you'll get a warning, and it's not really a warning.
It just says, it's just to make you aware that there's a particular format
that's being deprecated.
So eventually in the 5.4 release, they won't support the old format.
So if you see this, I encourage anybody using
pytest and continuous integration to read the changelog and understand what's going on
and make sure they're ready to either pin pytest or change their system.
Yeah, it's a good thing to put on people's radar for sure.
Okay. How about you, Michael? Any extra bits?
Yeah, I got a bunch for you. Actually, a couple of things. PyCon.
PyCon's awesome.
We love that each year.
And this year it's going to be in Pittsburgh for the first of its two years in that city.
And PyCon registration is now open.
You can go and register, get your ticket before it sells out.
Oh, cool.
Yeah, that comes to us from Jacqueline Wilson.
So thank you very much for sending that in.
And then also I saw, I can't remember where I saw this, somewhere, actually I think somewhere
funky like Flipboard or something.
So Facebook has now decided that Microsoft's Visual Studio Code is their default development
platform.
That's a little surprising to me.
Yeah, interesting.
Yeah, that's an article on ZDNet.
And they're also helping Microsoft improve the remote development experience in VS Code.
Cats, dogs, all live in the same place.
Okay.
Yeah, this is cool.
I suspect that things like Vim and Emacs and stuff probably have a strong representation there.
But apparently, it's all about Visual Studio Code over there now.
Anything else?
Yes, two more things.
Very exciting. So if the release
schedule lines up correctly and the future extends as I expect, this should be the Wednesday before
Thanksgiving, right? And that would mean the day or two after that is going to be Black Friday. So
I just want to point out that Talk Python Training is going to have a really awesome Black Friday sale. Get a whole bunch of stuff on buying all of the courses, but also we're
doing some special things to support the PSF and other stuff, some surprises in there that I suspect
people won't guess at. There's no way people are going to guess what is there. So check
it out over at training.talkpython.fm. But you've got to act right away because it's only going to be there for like four days.
It's a big deal.
So check that out.
And also we have a new course coming, Python for the .NET Developer.
So, so many people are coming from C Sharp and the .NET world over into the Python space.
I thought it would be cool to create a course that kind of gives them a big hug and holds their hand and helps them step over that divide.
So it's like
do you know about ASP.NET? Here's Flask, and here's how you use it in Python. Do you know about Entity
Framework? Here's SQLAlchemy, and here's how you use it in Python. Like all the things that they
need or they love from C Sharp and .NET, here's the Python equivalent and why it's awesome and
how it works. Is that one that you did, or did somebody else do that? No, no, I did that one. Because you're like the perfect person for that.
Exactly.
I spent so many years doing C Sharp and now I'm all about Python.
So exactly.
I figured like, why don't I try to think back to the way it was for me many years ago and
like sort of extend that experience back to other people.
It's probably not going to be out yet.
It may be out at the time that people hear this, but it's coming really soon.
So I'll just put it out there as that.
That's nice. Hey, speaking of Black Friday, I do not have any insider knowledge, but Pragmatic Publishers often does a Black Friday sale too. It's usually
fairly steep. So if you've not picked up the pytest book yet, and really, if you're listening
to this and you haven't read it yet, what's going on? Come on. If you haven't, maybe check out pragprog.com and see if there's a sale. Definitely. I'm sure there will be. It
would be surprising if there weren't. Awesome. How about a joke or two or three? I like three
jokes. Okay. It's a good number. So this one, first one is more of just a geeky STEM type of
joke, but I think people will like it. So I love soda drinks, you know, Coca-Cola, Dr. Pepper, root beer, things like that.
So this one, I try to not drink too much, but I do like it.
But here's how that world can clash together with math.
What do you get when you put root beer into a square glass?
I don't know, what?
Beer.
Beer.
I don't even get it, but it's funny.
If you take the root of beer and you square it. Okay, okay.
Like the square root of beer, and then you put it in a square glass. Okay, that was bad. What's
your next one here? Okay. What do you call an optimistic front-end developer? I don't know,
what do you call them? A stack half full developer, that is. Now, also, I was going to tell a version control joke, but they're only funny if you get them.
Get GIT.
Awesome.
Those are both good.
I like them.
Yeah.
Great.
Cool.
Well, thanks again for having a nice conversation this week.
Yeah, you bet.
Thanks as always.
See you later, Brian.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured, just visit PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.