Python Bytes - #136 A Python kernel rather than cleaning the batteries?
Episode Date: June 25, 2019
Topics covered in this episode: Voilà!; Toward a “Kernel Python”; Use __main__.py; The CPython Bytecode Compiler is Dumb; You can play with EdgeDB now, maybe; 16 Python libraries that helped a healthcare startup grow; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/136
Transcript
Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.
This is episode 136, recorded June 19th, 2019. I'm Michael Kennedy.
And I'm Brian Okken.
And this episode is brought to you by Datadog. Check them out at pythonbytes.fm slash datadog.
Get a cool shirt. More on that later. Brian, how you been?
I am doing well.
Good, good. Same here. Rolling into summer. Kids are home. Working from home is now chaos, but it's all right.
Yeah, it's a little cooler today too. It's nice.
Yeah, it's beautiful. So you've got some kind of magic trick lined up for us for this first thing. What's going on here?
Well, actually, I think it's just going to be me trying to pronounce a French word. So I think... is it... I know it's not "voila." Voila? Do you pronounce the V?
I don't know.
Voila?
I think so.
Voila?
Viola.
No.
No.
Voila is a new project.
It's a newly announced project from the Jupyter Notebook people.
So the idea is Jupyter Notebooks in standalone applications or dashboards.
So people that are used to working with notebooks
and they want to share what they found with other people
and you want to have people be able to interact with it
and have it be a little bit interactive
but not allow people to change your code,
how do you do that?
That's where...
Voila!
Steps in.
It's a pretty cool project.
I was playing around with it a little bit.
You can have custom widgets, or at least I was reading about custom widgets,
to set up your page, and even templates and grid layouts.
But you don't have to be a web developer. It's like drag-and-drop stuff. This is cool.
I like the idea of presenting the interactive plots and graphs and stuff,
and also letting people run the code and use selectors and things,
but not letting them change the code. So that's pretty cool.
Yeah, you just take your notebook and you turn it into a web application. That's pretty awesome. And it has nice restrictions, like
it does not permit execution of arbitrary code by consumers of the dashboard. It's language agnostic,
so you could have C++ code up there
in addition to Python or whatever Jupyter does.
That's pretty sweet.
Yeah, I could see a lot of people using this,
even for dashboards for peeking into databases and stuff.
Nice.
So voila, and you have a website.
Voila.
I have a Jupyter notebook, and here it is.
Cool.
Awesome.
So I don't think we covered this.
I intentionally didn't want to go too far into it because it didn't seem super productive.
But there was a presentation back at the Language Summit called something like cleaning up Python's dead batteries, or something to that effect.
Did you catch that?
Yeah.
So the idea was there are some modules and parts of Python that are outdated.
Their existence puts pressure on the core developers in ways that don't let them focus on what they need to be doing.
It also makes it super hard for people to contribute.
Like, for example, there's a colorsys module in the Python standard library.
I'm going to pick on this thing a lot during this little segment. And it has the very important purpose of converting colors between
coordinate systems.
Super important to have that in the standard library,
right?
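For anyone who hasn't bumped into colorsys, here is a small sketch of the kind of conversion it does. It uses only the standard library; the choice of pure red as the sample color is just for illustration.

```python
import colorsys

# Pure red in RGB; colorsys works with floats in the range 0.0 to 1.0.
red_rgb = (1.0, 0.0, 0.0)

# Convert to HSV (hue, saturation, value) and back again.
red_hsv = colorsys.rgb_to_hsv(*red_rgb)
round_trip = colorsys.hsv_to_rgb(*red_hsv)

print(red_hsv)     # (0.0, 1.0, 1.0)
print(round_trip)  # (1.0, 0.0, 0.0)
```

The module also offers similar function pairs for the YIQ and HLS systems.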
I don't know.
Maybe, maybe not. At one point, you know,
the goal of Python's standard library was really to come with
everything you need, because installing extra stuff, like downloading it, running the setup, and all that, was really
tricky. Right now everything's a pip install away. So it looks a little bit weird. And if,
say, you wanted to fix or change the colorsys module, it's not the same as if you want to go
contribute to some random thing on GitHub. No, you've got to be a core developer. There are a lot of steps to go through. It only ships every 16 months,
or sorry, 18 months, for new content and changes. So how much do you really care
to make a contribution or change to colorsys? Probably not very much, for all those reasons,
right? It's slow on purpose. It's hard to make changes to, and so on. But the problem is
it has things like colorsys that probably don't make a lot of sense to be there anymore.
So Amber Brown and some other folks were making a case that maybe we should take some stuff away.
And it was pretty controversial. There was some like heated disagreements at the actual
presentation and stuff. And I don't care about that. I don't want to go into it.
But that approach was we have stuff in Python that maybe shouldn't be there.
Let's talk about what we can take out.
And Glyph wrote a cool article called Toward a Kernel Python.
And I've talked about this before.
I don't know if I talked about it on Python Bytes or on Talk Python.
But I think there should be some kind of subset of Python that is defined to be like the minimum subset of Python that is guaranteed to be everywhere.
So, for example, if you work with PyPy, you get one variant of mostly Python.
If you work with CircuitPython or MicroPython, you get another variant of mostly Python,
but not all of it, right?
If you work with Brython or some of the JavaScript versions
that run in the browsers,
again, similar subset, but not the same.
So if we had like a smaller,
sort of essential Python standard language definition
and library that was like, I don't know what's
the right number, but a smaller amount that you could guarantee was identical across all
those platforms and then opt into bringing the other stuff in.
So Glyph's main idea was like, could we say instead of like, take what we have and hack
away a few things that don't make sense, rather trim it down to this kernel, to this
essence.
And then, I don't know,
pip install the rest of the libraries or something like that, the rest of the standard library. You want the networking stack, you pip install networking. I don't know, I'm just making up
parts that we would do that for. But that was his idea. Basically, there's a PEP, PEP 594,
that's about removing obviously obsolete and dead stuff from the standard library.
And that's all well and good.
But it actually turns out that having things like colorsys in the standard library means
that the core devs have to deal with a bunch of stuff.
So he runs Twisted.
So he talks about how Twisted is doing on keeping up with PRs.
And let's look at CPython over there.
They have 429
tickets currently awaiting review over on GitHub, I think it's on GitHub. He says the oldest PR
awaiting review hasn't been touched since February 2nd, 2018, which is almost 500 days old. But when you look
at the PRs, there are 25 that are out unaddressed or whatever: 14 were about the standard
library and 10 were about CPython. So why are the core devs having to deal with this stuff when
there's typically a replacement? Like, there is a built-in HTTP library, but people just use requests
or aiohttp or whatever, right? There should be a way to maybe create this essence of
it and then bring more in.
What do you think?
What do you think?
I definitely think there's an idea there.
Like you said, there's kind of like a Venn diagram.
There's a common set that most people need or you'll need for lots of different domains.
But then for web stuff, you're going to need different things than for, like, working with audio files or working with text files. The different problem domains are going to use different bits, and you don't need everything else.
Yeah.
Interesting idea.
The install question is an interesting one, though. How do beginners install stuff? How do you deal with that?
Yeah, absolutely.
And he does address that.
And I do think it is a challenge. He says, look, probably the stuff you go get when you
download off python.org or you brew install Python, that probably should just be everything,
right? But it doesn't mean that it can't be comprised of smaller things that potentially
ship on a different schedule. If you're going to install like Visual Studio or something like that,
you always have options of, do you want to install this stuff and this other stuff also?
Yeah.
You can opt out and we could do something similar to that with Python of,
like, do you want to install the web stuff?
Do you want to install the audio stuff and whatever?
Right.
You could have these full distributions that you install.
But if you look at, say, Linux, for example,
if you go to an empty Ubuntu machine
and you try to create a virtual environment,
depending on the version you have, it may not work.
You may get an error that says,
you need to apt install python3-venv.
It's like, wait, that part of Python wasn't shipped?
Okay, well, we'll do that.
Or if you even try to pip install something,
it might say you have to apt install python3-pip, right?
So they've already done this on Linux.
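As a rough sketch of how you might check which pieces a given Python install actually came with, you can ask the import system whether a module can be found. The helper name stdlib_parts_available and the list of modules checked here are made up for illustration; only importlib.util.find_spec is a real standard library call.

```python
import importlib.util

def stdlib_parts_available(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# venv and ensurepip are the kind of pieces a distro may package separately;
# json is included as a module that is normally always present.
report = stdlib_parts_available(["json", "venv", "ensurepip"])
for name, present in report.items():
    print(f"{name}: {'available' if present else 'missing'}")
```

What it prints depends on how your Python was packaged, so no expected output is shown.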
There's other examples as well.
For example, .NET Core in the Microsoft space
is basically like this, right?
You use their package management system
to bring in significant parts of what is their standard
library.
Yeah, I do think it's a problem for beginners. I think it makes it harder, right? It's
like saying, well, we just run JavaScript over in Node and it's easy. But then you see all this
RequireJS and all these patterns, and you're like, why is this so hard? What happened to print
hello world? You know, I do think there's a danger. But if the standard way that people get Python is they get this big bundle, maybe those bundles don't all have to be maintained by the same
team, which is the core developers, right? Maybe they could be broken apart and then brought back
together through something like pip and then, more importantly, upgraded sooner than every 18
months. That seems like a good idea. And this seems like a way better approach than saying, well, what can we hack out of
the system?
Could we hack out colorsys?
Yes or no?
Let's talk about that.
Well, right.
And also, there's some stuff like colorsys.
It's surprising that it needs to be there.
And then there's stuff that's not in the core or in the standard library,
and you're like, why is that not there?
Like setuptools and wheel.
I'm always surprised that I have to pip install wheel to create a wheel.
Yeah. And speaking of pip, Glyph did mention that when you get Python, it comes with pip.
Installing Python lets you type pip install a thing, but pip is actually maintained by a
different group and shipped on a different cycle, right? That's PyPA, not the core developers,
for example.
So it's like that a little bit. But if it was more like that, you could make a change.
You could join the team for, I don't know, the networking or colorsys or whatever you want,
work on that, and maybe push changes more rapidly than the core CPython runtime.
Right. We could do something like the PyPA, but do like the Python standard library authority or something like that.
Yeah, exactly. It's a pretty interesting idea. And it definitely seems like the outcome will be better than trying to just hack away at a few dead batteries, if you will.
Yep.
Cool. This next one that you're talking about, I recently ran into this as an error. I'm like, wait, what did I type wrong? And then I realized, this might be a cool feature. Why don't you tell people about this?
Like, for instance, pytest comes as a standalone script.
You can just write pytest on the command line.
But you can also say python -m pytest.
And with a lot of pip-installable things, you can do that.
You can say -m, the thing name, and it works.
I didn't know how to do this, actually.
I never really thought, how do I do this?
And how do I figure it out?
And all it takes is putting a dunder main file (__main__.py) in your project.
I didn't know it was that easy.
So I'm linking to an article that pretty much says
we use the convention of if dunder name equals dunder main,
then run the main program or something.
But you can use the dunder main dot py file itself in your project.
And that dash m thing just works.
And that's pretty simple, right?
I'm like, it can't really be that easy.
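Here's a minimal sketch of that idea, if you want to try it without setting up a real project. It builds a throwaway package in a temp directory (hello_pkg is a made-up name), drops a __main__.py into it, and runs it both with -m and by pointing Python at the directory. It skips the flit and install steps entirely and just relies on the current directory being importable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_demo():
    """Create a tiny package with a __main__.py and run it two ways."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "hello_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text('print("hello from __main__")\n')

        # python -m hello_pkg: finds the package and executes its __main__.py.
        via_module = subprocess.run(
            [sys.executable, "-m", "hello_pkg"],
            cwd=tmp, capture_output=True, text=True, check=True,
        ).stdout.strip()

        # python /path/to/hello_pkg: running a directory also looks for __main__.py.
        via_path = subprocess.run(
            [sys.executable, str(pkg)],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return via_module, via_path

if __name__ == "__main__":
    print(run_demo())
```

Both runs should print the same greeting, since both end up executing the same __main__.py file.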
So this morning I did a little flit project, a flit-based project, and then threw a dunder main in there with just a print statement.
And I installed it and went somewhere
else, and sure enough, it works. Just awesome. So neat.
Yeah, that's super cool. I ran across
this by accident, by telling Python to run a directory instead of a file.
It said it couldn't find dunder main dot py. I'm like, wait, it was looking for it? That's
pretty cool. Maybe there's something to be done here.
Yeah, that's great. I'm glad you pointed that out. If you have
a dunder main in there, you can just say python and the directory name and it works. I have not
verified that, but it seems like the error message would indicate that it might work. Yeah, I mean, it
was a full path. It wasn't just a standard directory either.
Okay. Yeah, so pretty cool. Neat.
Yeah, I don't know how useful it is
to people, but I'll run across
it every once in a while. Yeah, that and entry points are
always really nice. Now, speaking of nice,
Datadog is supporting our show, and they've
got some nice products, so let me tell you about them really
quickly. So
they're a cloud-scale monitoring
platform that unifies metrics, logs,
traces, all that kind of stuff,
monitors your Python apps in real time, helps you find bottlenecks with detailed flame graphs, trace requests
as they cross over service boundaries.
All right.
So if you have microservices, you're talking to database or queuing, things like that.
Plus, it also does automatic instrumentation for popular frameworks like Django, AsyncIO
and Flask, so you can
quickly get started without too much setup.
So get started today.
There's a 14-day free trial at pythonbytes.fm slash datadog, and you get a cool Datadog
t-shirt, which is always fun.
Thanks, Datadog.
Mine's a nice purple color, and my kids always comment when I wear it.
They like the shirt.
Yeah, I love it.
So this next one I want to talk about, the name might sound a little derogatory, but it's not really meant that way. So maybe
simplistic sounds better. But the thing I want to talk about is this article by Chris Wellons
entitled The CPython Bytecode Compiler is Dumb. Simplistic is maybe better. But what you might
not know, depending on how much you dig into it, is there's an excruciatingly small amount of
optimization when CPython runs your code. So there's a compilation step, actually, right? The
name "bytecode compiler" implies some kind of compilation. So when you run your code, you probably
see the dunder pycache folders, and in there you have the .pyc files. So that's taking your source
code, turning it into bytecode, and putting it there. But then instead of, say, compiling that into machine
instructions, the interpreter takes that and feeds it through this ginormous switch statement
that's like 3000 lines long in the CPython runtime. And it just goes, well, what case is
this? We're jumping to that, right? It's pretty wild. There's a file called ceval.c, check it out.
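To give a feel for that "what case is this?" loop, here's a toy sketch in Python. It is not how CPython is actually written (the real thing is C and handles far more opcodes); the run function, the tuple-based program format, and the four opcodes handled are all simplifications made up for this example, with names borrowed from real bytecode.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    local_vars = {}
    for opcode, arg in program:
        # The chain of comparisons below stands in for the big switch statement.
        if opcode == "LOAD_CONST":
            stack.append(arg)
        elif opcode == "STORE_FAST":
            local_vars[arg] = stack.pop()
        elif opcode == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return None

# Roughly "n = 41; return n", spelled out one instruction at a time.
program = [
    ("LOAD_CONST", 41),
    ("STORE_FAST", "n"),
    ("LOAD_FAST", "n"),
    ("RETURN_VALUE", None),
]
print(run(program))  # 41
```

Every instruction costs one trip around the loop, which is why dropping instructions that do nothing useful would make a difference.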
However, there's very little optimization that happens here. So Chris decided to compare this against
Lua and one other similar language, maybe Ruby. I can't remember what the other one that he compared
against was. But he talked about, you know, like, if I write this code, what happens to it? So there
are optimizations like what are called peephole optimizations and a few memory allocation optimizations in CPython, but they're pretty limited.
So if you look at some examples, like let's take an example where we have a function.
It's called foo, defines two variables, x equals zero, y equals one, return x.
That seems simple, right, Brian?
Yeah, except for y is not needed, but yeah, sure.
Exactly.
So when you see that, if y is unused,
and this is not making any change,
it's just literally creating a variable,
which is effectively an entry in a module name lookup
or a locals lookup, right?
Why does that need to be done?
It doesn't seem like a whole lot's happening.
So for example, the CPython bytecode
compiler could just go, well, forget that line. And it could say, well, x never changes
its value. So why don't we just inline that to say, basically, the whole function is return
zero, right? Inline the x, drop the y, it's good to go. But if you go and throw that into the
disassembler, you'll see that, no, that's not
what happens. It literally just takes it step by step by step. Okay, wild, right? So it loads the
constant, stores it into a local variable called x, loads the next constant, stores it into a local variable called
y, then it loads the value for x again, and then it calls return value, as the bytecode
instructions, right? Instead of just load constant zero, return value.
Like it could be a lot quicker.
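If you want to see this for yourself, the standard library's dis module will show the bytecode. The sketch below rebuilds the foo function as described and lists its instruction names. The exact opcode names vary between CPython versions (newer releases add or rename some instructions), so treat the output as something to look at rather than a fixed listing.

```python
import dis

def foo():
    x = 0
    y = 1  # never used, but still compiled
    return x

# Collect the opcode names the compiler produced for foo.
opnames = [instruction.opname for instruction in dis.get_instructions(foo)]
print(opnames)

# Count the stores into local variables: one for x and one for y.
store_count = sum("STORE_FAST" in name for name in opnames)
print("stores to locals:", store_count)

# For the full human-readable listing, call dis.dis(foo).
```

On the versions this was written against, both stores are still there and y is still a real local variable, which is the point being made in the episode.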
So I find that this is, it's honestly a little bit surprising.
I mean, Python is 25 years old and it doesn't take steps like this.
Now, Darius Bacon did point out that Guido himself said Python is about having the simplest,
dumbest compiler imaginable, and there are some links
to that, so some references there if you want to go check that out. So I think, you know, maybe it's
by design to keep it simple, so it's easy for people to contribute to. But it certainly seems like there
could be a layer between parsing the bytecode and executing the bytecode that says, we're not running in a debugger or
something like that, so let's go crazy and just, you know, convert stuff like that to return
zero. There's also a bunch of other interesting examples in there. This is just one that's really
obvious, that's good for talking, that I pulled out.
And optimization levels are something that
CS people have been doing with compilers for a long time. So it's not like we'd have to invent it ourselves.
Right.
Yeah, C has had plenty of it.
C#, the JIT-compiled languages,
their JIT compiler is a place where a lot of that happens,
things like this, right?
Did you say that there was a comparison to other languages?
Do other languages do more optimization?
No, not really.
They're all pretty much the same.
For better or worse, they're all the same.
So there's an interesting point that he makes
that I do want to just comment on real quick.
It says, so the consensus seems to be that
if you want or need better performance,
don't use Python.
Go use another language.
I'm like, dude, no, you were like so close.
You were so close as well.
You could... maybe don't use CPython, right? Maybe use PyPy. But the most obvious
optimization to me that just can change the game is Cython. Yeah, you could write like one or two
slow functions in Cython and boom, it goes to machine instructions and it's, you know,
near the speed of C.
So I like the article.
I don't like that it says,
oh, if things are a little bit slow, just run away.
Like, no, there's probably a package that has a C extension that already works better
or there's a data structure you should be using
that would be better or there's Cython or on and on.
There's a lot of improvements before Python's not the answer.
Or be aware that this is doing this and do your own optimization.
And for the most part, use a profiler and really tell where the optimization needs to be.
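As a small sketch of the "use a profiler" advice, here's one way to do it with cProfile and pstats from the standard library. The slow_sum function and the profile_report helper are made-up names for this example, and the workload is deliberately silly.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """A deliberately plain loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_report(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, buffer.getvalue()

if __name__ == "__main__":
    value, report = profile_report(slow_sum, 100_000)
    print(value)
    print(report)
```

The report lists where the time went per function, which is usually enough to find the one little bit worth speeding up. The timings themselves will differ from machine to machine.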
Exactly.
Like only 10% of your code needs to be fast at all, if that much, most of the time, right?
It's usually like one little bit like, oh, if this were faster, it would change the game.
Yeah.
I mean, I learned assembly in college and I'm glad I've never had to use it.
For sure.
So final comment here.
Brett Cannon, when I interviewed him recently, he's on the steering council, a core developer, and so on.
He did talk about how would adoption of Python 3 change? How would adoption of Python
in general change if we could make Python two or four times faster than it is today?
Like most of the time, it doesn't matter. But if it could be faster in some interesting ways,
what would that mean, right? In terms of upgrading more quickly to the new versions and just
general, like people not saying, oh, I have to use Go because I need async I/O or something like this.
So it seems like if optimizations are this absent from the
compiler, there's some low-hanging fruit to do some simple CS compiler optimizations and
make stuff faster, right? I mean, you could make this silly foo method probably three times as
fast, right?
Because you could drop most of the bytecode operations.
Yeah, definitely.
It's worth looking at.
It's interesting that there are some people thinking about performance.
Yeah.
I mean, we had that conversation around the idea of Rust and maybe what if we use Rust instead of C in certain situations.
But this seems like low-hanging fruit already right
here. And regardless of whether it lands in Rust or C when it's executed, not executing code is a
lot faster than executing it. Cool. What's the next one? EdgeDB is something that came up on
my radar a couple of years ago. I saw that EdgeDB people had a booth at one of the PyCons,
and they were talking about it, but at the time,
it wasn't around for people to actually play with. So the other day, I saw an article called
A Path to a 10x Database put out by the EdgeDB people. One, there was a download link, which I
was happy for. There's an alpha one available. And people that are following along and kind of
excited about what they're doing,
they've published a roadmap of the features they have done, what they're working on,
and it's kind of cool. I'm looking forward to being able to play with it more.
So people that don't know what it is, they call it a next generation relational database.
It's based on Postgres. I don't know what that means, if they're using Postgres or if they're using the design of Postgres as a base.
I haven't dug that far.
But it features a different kind of data model and an advanced query language.
And there are a whole bunch of features built into it already.
And I'm pretty excited about a lot of it.
But the thing that really excites me is that they completely replaced SQL, the query language.
It's a different kind of language, and it looks more natural to me.
I mean, people aren't really writing SQL a lot of times because they're using, like, what, SQLAlchemy or something like that.
And partly those things exist because people don't want to write SQL.
But maybe if we had a better query language,
we wouldn't need the middle layer so much.
Yeah, it definitely looks interesting.
I don't have a real good sense of how it compares to both of those.
The query syntax does look nice.
The joins look super cool.
Or the subqueries possibly is more like an analogy.
But yeah, it looks really neat.
It sort of sells itself as a hybrid
between document databases like Mongo
and relational ones like Postgres.
So yeah, it's cool to see innovation there for sure.
Outside of Mongo, seeing some innovation on the SQL side
or the relational side is nice.
We'll see if it's really a 10x improvement.
But yeah, we can't just stick around for nothing. I did actually try to play with it, because I'm like, I want to play with this
because it's got Python bindings. But I couldn't get it to install on my Mac. So, yeah.
Yeah, well, that's what you have to suffer for being out on the cutting edge, Brian. Price you pay.
Yep. All right,
so this last one is going to be just a quick roundup of some stuff.
Then maybe this combo will help some folks.
This guy, Waqas Younas, worked for a healthcare startup in the U.S.
And he wrote this cool blog post called 16 Python Libraries That Helped a Healthcare Startup Grow.
Oh, neat.
Yeah, so it's just like a paragraph or two
about different packages or even modules
that they use to kind of solve some problem
within their startup.
So we have Paramiko,
which lets you basically issue commands over SSH
to other servers.
So on my computer,
I could use Python and talk over SSH
to run processes or copy files between
servers, anything I can SSH to. That's pretty cool. The built-in csv module, you know, that's
always good for parsing CSV files. Really nice. You mentioned SQLAlchemy. So they use SQLAlchemy
as well. Requests and Beautiful Soup for APIs and web scraping. I like to say that every website is
an API, even if it doesn't know it. So if it doesn't have an API, it has data and you just
have to do the right request to it. Now it's an API. Here's one for you, Brian: testscenarios,
which is a PyUnit extension for dependency injection. So that's kind of cool. Dependency
injection is not that huge of a thing that people make use of in Python, but you know, it has its place, I guess.
HL7, a simple library for parsing Health Level 7 files into Python objects. That's cool. I
suppose doing that yourself probably is not fun. So having a library that does it is great.
Python phonenumbers, which is
a library for parsing, formatting, and validating international phone numbers. That's pretty sweet.
It's based on a Google library. It's like a Python port of it. Gevent for networking and
asynchronous code. python-dateutil for parsing date times. Anytime I have to work
with date times, I'm like, okay, this project now requires python-dateutil, because parsing date times sucks without it, right? But with this you just say
parse, and the right answer just seems to always come out. It's great. So Matplotlib for
graphs. python-magic. Have you heard of python-magic?
I don't know.
I hadn't. And so what you can do
is you can give it a file, some random binary file or even text file, and it'll tell you what file type it is.
Oh, neat.
So, like, suppose somebody gives you an image and they've named it .jpg, but it's really a PNG.
Like, you could feed it to something like python-magic and it would say PNG.
So you can give it, like, PDFs or zip files, and it'll tell you, like, what file it is.
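To show the general idea of sniffing a file type from its contents rather than its name, here's a hand-rolled sketch. To be clear, this is not python-magic's API and not how libmagic is implemented; it just checks the well-known leading bytes of four formats, and the sniff function and SIGNATURES table are made up for this example.

```python
# Leading "magic" bytes for a few common formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}

def sniff(data: bytes) -> str:
    """Guess a file type from the first bytes of its content."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"

# Pretend this came from a file named photo.jpg that is really a PNG.
mislabeled = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
print(sniff(mislabeled))  # PNG image
```

A real library knows about far more formats and looks deeper than the first few bytes, so reach for that in actual projects.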
Okay, but Magic?
That seems like a bad name for it.
Yeah, it's python-magic. Well, I mean, the reason is it's based on a thing called libmagic. But yeah,
the criticism just transitively follows to libmagic, I guess. All right, another
one: Django. Obviously that doesn't need a lot of introduction, but yeah, they must use Django.
Boto, which is the API for interacting with all things AWS. So if you're doing anything with that, that's super cool.
Like, I use Boto for automatic transcoding, like re-encoding videos from my courses in different formats,
or downloading, say, MP3s to a caching server, stuff like that.
And then finally Mailgun for sending email and Twilio's Python API,
both of those for sending reminders, one over email, one over text. But it's kind of a cool combo
of things, right?
Mailgun's just a great name. We've started to use Paramiko at work too,
but for the SSH features. It's good.
Oh yeah, nice. Yeah, I just feel like, you know, you
don't necessarily have to pick what he picked, but it's cool to see just how those all fit together and think of, like, well, you know, what packages run this company, basically.
Articles like this are neat, seeing different people solving different problems.
What are they using from Python?
Yeah, exactly.
All right.
So that's it for our main items.
Got any extras you want to share with us?
No.
Do you?
I thought I didn't, but I'm going to share one thing with you all.
I just recorded an episode with the United States Digital Service for Talk Python.
Okay.
What is that?
I hadn't heard of that either.
But this is like a little stealth startup type thing inside the government.
President Obama set it up, and it was basically the tech team brought in to solve the healthcare.gov
crashing problems.
Like that whole big failure to launch for the Obamacare stuff, there was a group of people
brought in to fix it, and they did. Then they're like, well, why can't we just apply this to all
the other broken things in the government? And so it's a really cool service where you
can go do like a three-month tour of duty at the U.S. Digital Service and not even have to
leave your job, just take unpaid leave to go fix something in the government or whatever. It's pretty cool. I
have an episode coming out, but I hadn't heard of it and I thought that was kind of cool,
so I thought I'd throw it out there.
Yeah, it is neat.
It is for sure. It's almost a joke. It's not exactly a joke. It's more maybe mocking,
but how can you tell the difference between machine learning and AI, Brian?
I don't know.
If it's written in Python, it's probably machine learning.
If it's written in PowerPoint, it's probably AI.
Written in PowerPoint?
Like as in it's just like a presentation with ideas, but no code and no implementation yet.
Oh, okay.
Yeah.
Okay.
Got it.
So basically if
it's real, it's machine learning. If it's like, we're going to use magic, computer magic to solve
this problem, it's AI and it's in PowerPoint. That's funny. I have a question for you. It's
not a joke, but way back in the dark ages when I was going to college, all the AI work was done
in like Lisp. Yes, it was. Are there still people doing AI in Lisp or is that not a thing anymore?
Do you know?
I think people are still doing it, but I don't think the neural network people have stayed
there, right?
I think the neural network people mostly have moved to Python and things like TensorFlow
and the other GPU-based things.
But I'm sure that there's like different kinds of...
Yeah, yeah,
because you're right, you had to be a Lisp programmer if you wanted to do anything with
AI. And AI was always this amorphous, weird thing. Like, you don't really know what it is, but
probably if we can set up a blind chat with it over IRC, then it might seem like it's alive,
and then it'll be AI, right? You know, the whole Turing test and all that stuff.
And now it's like, that's cute.
Car, drive here.
The car will go out, you know, like that's with Python and TensorFlow, and the Lisp
one's doing the chat, right?
Like, I feel like that's kind of where it is, but I'm sure people are still doing cool stuff with Lisp
that I don't know or really want to read the code for.
Okay.
I'm glad there's probably not any Lisp programmers that listen to this.
No, but we're probably going to be posted in a negative way on some Lisp forum.
Sorry about that.
Sorry.
All right.
Well, thanks for being here, Brian, and thanks for sharing everything.
Thank you.
You bet.
Bye, everyone.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.