Python Bytes - #41 Python Concurrency From the Ground Up and Caring for our Community
Episode Date: August 31, 2017
Topics covered in this episode:
lolviz
Odo for data transforms
Python Concurrency From the Ground Up
FAT Python: the next chapter in Python optimization
sshuttle
Node.js forks again – this time i...t's a war of words over codes of conducts
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/41
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 41, recorded August 30th, 2017.
I'm Michael Kennedy, and yes, I'm back from vacation.
Thank you, Brian Okken, for covering, and to all of our guest co-hosts.
And it's time to immediately start repaying Brian for keeping things rolling while I was gone. He's in Germany for some work business,
and we have a new guest co-host.
Welcome, Miguel Grinberg.
Hello, and thank you for having me, Michael.
Happy to be here.
Yeah, I'm happy to have you here.
It's going to be fun to talk about the items you've chosen for the week
and what I've got as well.
So let's just kick it off first by saying thank you to Rollbar,
who's bringing you this episode. So check them out at pythonbytes.fm slash Rollbar and get some
good monitoring for your app. We'll talk more about that later. But let's jump to your first
item, LOLViz. I would like to call it LOLViz. But actually, this is not anything to laugh about.
So this is a Python package that generates graphical
representations of very commonly used Python data structures. So far, they support five different
structures, and one of them is a list of lists, which gives the LOL name to the package. But
they also support dictionaries, lists, linked lists, and binary trees.
And what they do is basically use Graphviz, which you need to install on your machine,
to generate these graphical representations.
And one of the coolest things is that it integrates with Jupyter.
So if you're doing this from a notebook in Jupyter, then it'll render these graphical representations right in your prompt.
So it's super cool, especially I imagine for you and for me, people who used to teach Python.
I'm having tons of ideas to use this myself.
Yeah, so am I.
I'm like, this can definitely be used to help people who are new to these data structures and new to these ideas.
And honestly, new to the concept of pointers and reference types at all.
I think it's really a great way to learn it.
The linked list one is particularly interesting to help people who are not familiar with pointers to visualize how that works.
One thing I need to mention is that if you try to pip install this in Python 3
today, it's going to fail. I submitted a bug and the author told me that a fix is coming pretty
soon with additional features. So I'm glad to hear that he's responsive and that he's working
on making more improvements. Yeah, this thing came out really recently. I don't remember how
old it is, but it's all quite new, like a couple of days. And it's already got 200 stars on GitHub. And it looks like it's going well.
Yeah, it's going pretty well.
Yeah, even if you're not teaching it, I think looking at some of the different representations of these data structures that it draws is probably helpful for people who just haven't really thought a lot about it as well.
So definitely worth checking out.
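To make the idea concrete, here is a minimal sketch of what a tool like this does for a linked list: walk the nodes and emit Graphviz DOT text that Graphviz can then render. This is not lolviz's own API; the `Node` class and the `linked_list_to_dot` function are hypothetical names made up for illustration, and the sketch only produces the DOT text rather than an image.

```python
class Node:
    """A hypothetical singly linked list node, used only for this sketch."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def linked_list_to_dot(head):
    """Return Graphviz DOT text with one box per node and one arrow per link.

    Assumes the list is acyclic and that values contain no double quotes.
    """
    lines = ["digraph linked_list {", "    rankdir=LR;", "    node [shape=record];"]
    index = 0
    node = head
    while node is not None:
        lines.append(f'    n{index} [label="{node.value}"];')
        if node.next is not None:
            # The arrow is the "pointer" from this node to the next one.
            lines.append(f"    n{index} -> n{index + 1};")
        node = node.next
        index += 1
    lines.append("}")
    return "\n".join(lines)


head = Node("a", Node("b", Node("c")))
dot = linked_list_to_dot(head)
print(dot)
```

Saving the printed text to a file and running it through Graphviz's `dot` command would draw three boxes joined by two arrows, which is the kind of picture that helps with pointers and references.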
All right.
So moving from visualizing different data structures to transforming data structures, the thing I want to cover first this week is Odo.
So this is a user or listener recommendation, as was LOLViz.
So Odo migrates data between all these different formats, and it knows how to read, let's say,
a pandas DataFrame, and then convert that to a MongoDB table or collection and save it just like
that. And it's literally one line, like: Odo, here's your data frame, go to that
database. Or here's your Postgres, go to CSV, or the reverse. So if you find yourself pulling in data
from one source and trying to convert it or save it, more or less in the same shape, over to another
source, then check out this thing called Odo.
Odo, I wonder if it takes a list of lists, an LOL, and writes a CSV file. That would be something
that I could find useful tons of times. Yeah, absolutely. That would be really, really cool.
And it does take lists and transform them. It probably does, right. So I'm not entirely sure
how flexible it is in that regard, but I think there's also extensions. So you can write
extensions. So just to give you the rundown, let's see, it'll work with pandas DataFrames, with a list, with JSON
files, CSV files, Postgres. Yeah, so you could take like a CSV file, load it into Postgres,
or Postgres into JSON, or it even works with like converting into MongoDB, like I said,
so like Pandas to Mongo or reverse. So not a whole lot more to it than that,
but it seems really handy if that's the problem you're trying to solve.
Yeah, very nice.
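Whether Odo handles the list-of-lists-to-CSV case is left open in the conversation. For comparison, here is a minimal sketch of that one conversion using only the Python standard library, not Odo. The `rows_to_csv` function name and the sample rows are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of lists to CSV text using the stdlib csv module."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


table = [
    ["package", "purpose"],
    ["lolviz", "visualize data structures"],
    ["odo", "migrate data between formats"],
]
csv_text = rows_to_csv(table)
print(csv_text)
```

The appeal of a tool like Odo, as described above, is that the same one-line shape is meant to work across many source and target pairs, where this sketch covers exactly one.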
Yeah, so tell us about concurrency.
You chose this item, and it's not exactly new,
but it's certainly something we haven't covered and is really amazing.
And I agree with you that this is one of the really interesting things to cover in concurrency.
Right, so this is a PyCon talk from a couple of years ago from the one and only Dave Beazley. I mean, who else,
right? And the interesting thing, as a speaker, is that I find interesting not only the content
but also the way he presented this talk. This is a live coding session, the entire thing. It's Dave's terminal,
there are no slides, and he's speaking
while coding. He starts from
an empty file and actually codes everything
in the talk.
I think he just fires up
Emacs and says, let's do this, right?
Right, yeah,
just two or three terminal sessions,
one with Emacs and the other two
with Bash, and it's all done there. And he goes on to
cover concurrency and the different ways you can do concurrency with threads, with processes,
shows all the problems with both approaches, how the global interpreter lock messes all this up
and complicates things. And then in the second portion of the talk, he goes and builds an asynchronous framework, pretty much
like asyncio, a small version of it, a minimal version, using Python generators without any other
additional libraries, all in Python standard code. So it's pretty amazing. And it's only 45 minutes.
The amount of knowledge that's in those 45 minutes, it's unbelievable.
Yeah. And I really love the style. Like, well done, David. So certainly the style of we're
going to build this up from scratch. I'm not going to just show you a bunch of slides and
talk about it, but I'm going to just show you how it's built. Really makes it feel accessible.
You're like, well, if he could literally do it from scratch in 45 minutes, like I saw everything
that went into it, it was pretty understandable. It really is, really is well done.
So yeah, if people are, you know, thinking about Python concurrency or generators or asyncio and all these things.
It's actually even good for networking, because he builds a little server. He doesn't even use, you know, anything like Flask or Django or anything. He builds a little server using, he calls it the socket framework, just using network sockets.
Yeah.
That was like part of the demo.
It was just like part of it, right?
Right, yeah, it's super awesome.
Yeah, so certainly if this sounds interesting to you,
be sure to watch that video.
It's on YouTube.
We'll link to it in the show notes.
And you'll learn a bunch,
especially conceptually what's going on
in all this asyncio stuff.
Super cool.
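For a taste of the generator-based approach described above, here is a minimal sketch of a round-robin scheduler in which every task is a plain generator and each `yield` hands control back to the scheduler. It is not the code from the talk, which also waits on sockets; the `countdown` and `run` names are made up for illustration.

```python
from collections import deque


def countdown(name, n, log):
    """A task is a plain generator; each bare yield gives up control."""
    while n > 0:
        log.append((name, n))
        n -= 1
        yield


def run(tasks):
    """Round-robin scheduler: run each task up to its next yield, then requeue it."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # the task finished, so it is not requeued
        ready.append(task)


log = []
run([countdown("a", 2, log), countdown("b", 3, log)])
print(log)
```

The two tasks interleave without threads or processes, which is the core trick; a real framework adds a way for tasks to say what they are waiting for, such as a socket becoming readable.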
So before we move on to the next segment, though,
let me tell you about the sponsor for the show, Rollbar.
So Rollbar lets you basically just type pip install rollbar,
type a few commands either in your config file
or in code in your web app,
and it will continuously monitor your site,
your web application for any sorts of errors
and not just tell you if something happened,
but capture all the details.
What was the logged in user when they ran into an error?
So who is your customer who's having this problem?
What's the call stack?
What other errors have you experienced like this?
What are the local variables passed to the function
when it failed?
Like you can probably fix the error
without even debugging it or running your code.
You just look and go, oh, I see what's wrong.
So I use Rollbar a lot, love it. If you want to check it out, check it out at pythonbytes.fm slash rollbar.
All right. So next up, I want to talk about some optimizations. This
concurrency stuff that you brought up is certainly a sort of a form of optimization, but this is kind
of the future, trying to push CPython,
the main default Python implementation further. And this is an article on Medium by a friend of
the show, Anthony Shaw, covering work that Victor Stinner has done. And Victor Stinner,
have you been following what he's doing? He's like killing it on performance.
He is. Yes, absolutely. I've been to his talk this past PyCon as well.
Yeah, so there's just so many things
that he's doing that are amazing.
And he did give some good talks at PyCon as well.
So this project is called FAT Python,
the next chapter in Python optimization.
So like I said, article by Anthony Shaw.
And he basically highlights this FAT Python project
that was started by Victor Stinner back in 2015.
And the idea was, let's try to make it possible to apply better static optimizers for Python.
So one of the big challenges that you have with optimizing Python is it's super dynamic.
You can't necessarily just look at the code and say, well, it has this structure, so we're going to change it around because you could go and dynamically add methods, functions, variables, whatever, right?
Yeah, so that makes it a big challenge.
So there's actually three PEPs, Python Enhancement Proposals, that chain together to try to make things a little bit better that Victor's working on as part of this project.
So one is PEP 511,
which is a proposal to add a process to optimize ASTs.
So ASTs, abstract syntax trees,
are basically what Python builds up
before it starts to execute your code, right?
It can build up and...
It's basically a tree representation
of the code that you write,
in a form that's easier to be interpreted.
Right, like an object-oriented representation of what your code's going to do, not just the text.
The idea is it could be possible to have an optimizer look at that AST and say, okay, this looks like pandas code and
you're doing this particular anti-pattern that is slow. So maybe we could change things around behind the scenes, without you
even knowing it, to optimize or to fix that, right? Maybe people run linters that say,
you're doing this thing that's not amazing. What if we could have an optimizer that would just
make it fast?
Right. It would just make the change for you.
Exactly. The proposal is to basically
create some kind of hook
for creating these optimizers.
And this might be built into CPython.
It might even be something you pip install,
like pip install the NumPy optimizer
or whatever for my runtime,
which is really, really cool.
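To show what "an optimizer looking at the AST" can mean in practice, here is a minimal sketch that uses the standard library `ast` module to fold constant additions before the code is compiled. It does not use the hook proposed in PEP 511 and is not FAT Python's optimizer; the `FoldAddition` class is a hypothetical example, and it assumes Python 3.8 or later, where literals are parsed as `ast.Constant` nodes.

```python
import ast


class FoldAddition(ast.NodeTransformer):
    """Replace `<number> + <number>` with the precomputed result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first, bottom-up
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


tree = ast.parse("x = 1 + 2 + 3")
tree = ast.fix_missing_locations(FoldAddition().visit(tree))

# The assignment's right-hand side is now a single constant node.
folded_value = tree.body[0].value.value

namespace = {}
exec(compile(tree, "<folded>", "exec"), namespace)
print(folded_value, namespace["x"])
```

This particular rewrite is safe because number literals cannot be changed at runtime. Rewrites that depend on names such as `len` are where the guards discussed next come in.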
So that also brings us to PEP 509.
Like I said, Python is really hard to optimize
because everything is mutable at runtime.
And there are these things called guards that verify that, since the last time the optimizer processed this bit of the structure, it hasn't changed.
PEP 509 is a proposal to add a private version to dictionaries, which makes it possible to implement a different type of guard that is much faster.
Then we have PEP 510, which proposes a public API in CPython
for specializing a function's code with guards.
So the idea is you put it all together,
you take this optimizer,
it generates a new high-performance version.
It replaces the code that would normally run
with this optimized version.
And as long as it doesn't change,
it can run that optimized code, or otherwise fall back. So taken together, all
three of these create this FAT Python thing, which is really great. You can download this and run
it. You have to compile CPython, like a special branch of it, to make it work, and the article
talks about that. But the results are really good. So for example, a basic function that you just
call and it returns a string is 24% faster than in Python 3.6.
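The guard idea can be sketched in pure Python. This is only an illustration under stated assumptions: the version tag proposed in PEP 509 lives inside CPython's C implementation and is not exposed to Python code, so the `VersionedDict` class and `specialize` function below are hypothetical stand-ins that mimic the behavior.

```python
class VersionedDict(dict):
    """Hypothetical dict that bumps a counter whenever an item is set or deleted.

    Only __setitem__ and __delitem__ are tracked here; a real implementation
    would have to cover every mutating method.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


def specialize(namespace, slow, fast):
    """Run `fast` while the namespace is unchanged, otherwise fall back to `slow`."""
    guard_version = namespace.version

    def call(*args):
        if namespace.version == guard_version:  # the guard: one integer comparison
            return fast(*args)
        return slow(*args)

    return call


namespace = VersionedDict(len=len)


def slow():
    # Original code: look up `len` at call time, then call it.
    return namespace["len"]("abc")


def fast():
    # Specialized code: the result was precomputed assuming `len` is the builtin.
    return 3


get_length = specialize(namespace, slow, fast)
before = get_length()
namespace["len"] = lambda s: 42  # someone replaces `len` at runtime
after = get_length()
print(before, after)
```

Once the namespace changes, the guard fails and the original code runs again, so the dynamic behavior is preserved. The point of a version tag is that the check costs one comparison instead of re-inspecting the dictionary.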
I was skimming through the article and the PEPs,
and I could not find, for this implementation,
what are the gotchas, if there are any.
It doesn't look like there are, which is great.
But I was wondering if any code can run or...
I think, I guess the gotchas would probably be like,
can the optimizer deal with it?
And are you 100% sure the optimizer is not going to make a change that has a behavioral change, right?
Right.
Yeah, that was what I was looking for.
I mean, are there any cases where it can make a mistake at this point?
Yeah, I think this is basically cracking the system open so that these things could be plugged in.
But then you'd probably have to look at the gotchas of the optimizers, you know what I mean?
Right.
Yeah, so it's pretty good, like a 24% improvement in function call speed, and that's 46% faster than Python 2.7.
Like, that is a big deal. One of the big drawbacks of Python is that function calls are really
slow, and so people don't necessarily break their code into as small functions as they should.
And so this might really alleviate some of that.
I think it's great.
Yeah, awesome project.
Yeah, keep up the good work, Victor.
All right, so suppose I'm sitting in a coffee shop.
I'm sure it's fine if I just, you know, go log into my bank.
Right.
Things like that.
Right, yeah.
So, I mean, we all hear that it is insecure to go online on a coffee shop or hotel, airport, you know, all of those.
Yeah, hotels and airports scare me way more, especially hotels these days. I don't know why.
Right. So, you know, very few people really understand, you know, what's the problem.
You may think that if you only access sites that are HTTPS, so secure sites, that you'll be fine, and you're really not,
not completely. All the content that you transfer to the site is going to be encrypted. But there are
other things that happen before you get the connection that are not encrypted, like for
example, DNS searches. So if you're in a coffee shop, getting to a lot of sites, it's very easy
for someone on the same network to find out what sites you visit, even though they cannot see what data you transfer to them.
Right. And if somehow they happen to be able to alter that DNS, which probably has no verification because it's unencrypted, then they send you to their Google.com.
They can send you to some other place. Right, exactly. So, you know, it's very important that you take security when you go on a public Wi-Fi spot, you know, very seriously.
And I wanted to mention this tool that so many times I talk to people and I mention it and they don't know it.
And it's super great.
It's called sshuttle.
And typically the solution that you're told is that you need to pay for a VPN service.
And sshuttle proclaims itself as the poor man's VPN.
And a nice advantage it has over regular VPNs is that it doesn't require any software to be installed on the remote machine, the secure machine that you use for your connection.
So the way it works is basically you need to have SSH
access into a secure machine. And in my case, I have it right here in my home. I have a little,
this is a C.H.I.P. machine, these are $9 computers, or it could be a Raspberry Pi, you know, any of those.
And the only thing you need there is SSH, right? So you get it by default if you install the Linux distribution.
And then from anywhere you are, you use sshuttle to create a secure encrypted tunnel into this secure machine that you have in your home.
And then everything that you do goes through the tunnel, and then it's forwarded to the secure machine, and it happens on your secure machine, including DNS searches.
So that's really cool. So if I run sshuttle and I go to, like, gmail.com, it will actually funnel the
traffic through, say, my Raspberry Pi?
Right. On the coffee shop network, there's only going to be
a connection to your Raspberry Pi, and then the Raspberry Pi will make the connection to Gmail
and then forward the results back into your connection
in the coffee shop through an encrypted channel.
Oh, that's really cool.
Everything you do is absolutely private.
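As a rough sketch of the setup just described, the snippet below assembles an sshuttle command line that sends all traffic, including DNS, through an SSH server. The `build_sshuttle_command` helper and the `pi@home.example.com` address are made up for illustration. The `--dns` and `-r` options and the `0/0` "everything" subnet are sshuttle's own, but check the manual for your installed version before relying on them.

```python
import shlex


def build_sshuttle_command(remote, subnet="0/0", tunnel_dns=True):
    """Build the argument list for an sshuttle tunnel (hypothetical helper).

    remote: the SSH login of the trusted machine, e.g. a Raspberry Pi at home.
    subnet: which destinations to route through the tunnel; 0/0 means all of them.
    """
    command = ["sshuttle"]
    if tunnel_dns:
        command.append("--dns")  # also send DNS queries through the tunnel
    command += ["-r", remote, subnet]
    return command


command = build_sshuttle_command("pi@home.example.com")
print(shlex.join(command))

# To actually open the tunnel, sshuttle must be installed locally and will ask
# for elevated privileges:
#     import subprocess
#     subprocess.run(command, check=True)
```

Nothing beyond an SSH login is assumed on the remote side, which matches the point made above about not installing VPN software on the secure machine.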
Yeah, that's super cool.
And it's written in Python, yeah?
And it is written in Python.
It recently had support for Python 3, yay.
Nice, that's cool.
Because that used to be a problem in the past,
but now it works on Python 3 as well.
All right, well, that's really cool.
I'm definitely going to check that out.
All right, so last thing is,
I want to talk about something
that initially might surprise people about the topic,
because I want to talk about Node.js.
So Node.js, while Python developers are sometimes
web developers that also play with Node.js,
I don't want to talk about it for that reason.
Mostly, I consider Node.js like a similar new modern ecosystem and environment similar to Python's, right? It's very open source.
There's a lot of people excited and contributing to it, things like that. But Node.js has been in
the news for the wrong reason this week. And basically, a bunch of the people who are in charge of the steering committee for Node.js
quit like a third of the committee quit and just said we're done with this because there's been
like a huge rift in the community apparently and so I kind of want to talk about this community
aspect and sort of give thanks for how how well things are going at Python and the Python ecosystem relative to Node.js.
So basically there was, I don't want to get into name calling or whatever, you guys go read the
articles, but there was some guy who was on the committee who was making decisions that a lot of
people disagreed with. He was very much against the code of conducts for reasons that may or may
not be valid. I don't know. But basically, they felt like he was not representing the community well. And the way the board worked,
or the system to enforce the code of conduct worked was they would look at individual
things. And they would say, is this sufficiently bad, say, to like have this guy removed from
being in charge. And they said that any individual thing was not a
big deal, it was not big enough to take that action. But if you add up like 10 or 20 of these
things, all of a sudden there's a pattern of behavior that says,
you know, maybe this guy's not representing us as well as we want or something like that. And so
they decided not to remove the guy. A bunch of people said,
that's it. We've tried for a long time to sort of fix this. And we're so fed up with it that
we're leaving. We're no longer going to be on the committee. Some of the people just said,
I'm done with Node.js. Like these are former like steering committee folks on Node.js. Like I'm done
with Node.js entirely. And actually one of the, maybe the biggest thing that came out of this,
not the people on the board, but they said moments after the failed leadership vote,
Kat Marchán pushed the button and created Ayo.js, a new open source fork of Node.js.
And this has happened before: there was some problem with Node.js, the community, and they
created this thing called io.js. This is A-Y-O.js, but phonetically they're the same, right?
So you had some pretty interesting insights on this, I thought.
Yeah, one of the things that you said, I mean, first of all,
just give thanks that things are working better here
and we seem to have a better balance.
But you also pointed out that we have a single leader
who ultimately decides, right?
We have Guido.
Right.
We have a single person that sets the tone for the community, right?
And I believe they don't.
Yeah, yeah.
They vote on things, and many people vote,
and clearly they're very divided on their roles.
Some people put technical over community,
and clearly for some other people
it's the reverse of that. So yeah, I think we're lucky that at least, you know, we have a model
that we know and that we follow.
Right. Yeah, I think it's really
important that, you know, Guido's open to taking feedback from all the folks involved in the
community but in the end he makes the decisions.
And I think, you know, credit to him.
He's made a lot of good decisions that keep people engaged.
Yeah.
So, yeah, here's a Node.js item for Python Bytes.
And really the story is just, you know, look at what's going on over here and let's make sure that this doesn't happen to our community as well.
Let's hope not, yeah.
All right.
Well, Miguel, those are the items for this week that
we had picked out. Anything else that you've got going on? You just did an amazing Kickstarter, right?
Yeah, that was actually a little experiment. You know, for many years I wanted
to update my Flask Mega-Tutorial, the first tutorial I wrote, more than four years ago, and
the amount of work is so big that I always
had to give preference to work, right?
Got to pay the bills. Kids got to go to college.
Yeah. So I decided to give Kickstarter a try. And basically, I converted that task into work.
That is really cool. So you have this mega tutorial, this Flask tutorial that you've done,
and it's really elaborate. And you basically ran the Kickstarter and said, look, I want to update this tutorial.
Who's in on helping me do this?
And the community really responded, right?
It responded in a huge way.
Basically, the modest funding goal that I set was met the first day.
And then from then on, I decided to start coming up with ideas
to expand the tutorial, add more content to it.
It got funded in one day?
Yes.
The 100% funding was in the first day.
But then it got something like, I don't know,
maybe 600% funding in the end.
It ended up pretty good.
And I have some serious work to do
because I have three completely new topics to add that I will be working on as part of this rewrite.
That's really cool.
So super excited about that.
Yeah, that's great. Congratulations. And how do people find more out about this?
They can go to my blog. And the blog is blog.miguelgrinberg.com. And they're going to find a little Kickstarter badge there.
And from there, they can go to the Kickstarter page
and find out what I'm planning to do if they're interested.
Awesome. Yeah, we'll throw the link in the show notes as well.
Thank you.
Cool. So I just have one piece of news as well myself,
other than, hey, I'm back from vacation.
That's awesome.
While I was on vacation,
I ended up finding a little bit of time to release my latest course,
which is building RESTful APIs
with Pyramid. And so this is like an eight hour course digging into like the whole what is REST,
like how do you work with HTTP status codes and verbs and all that, and then making this all
happen in Pyramid. And it was really fun to write. So if that sounds cool to you, check it out at
training.talkpython.fm. I'll probably put that in the show notes as well.
All right, Miguel, thank you so much
for filling in while Brian's having his beer over in Germany. Actually, I don't know what he's doing,
but I hope he had a beer.
Yeah, thank you for having me.
Yeah, it was great. Talk to you later.
Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's
Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a
news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout
for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank
you for listening and sharing this podcast with your friends and colleagues.