Python Bytes - #179 Guido van Rossum drops in on Python Bytes
Episode Date: April 30, 2020
Topics covered in this episode: New governance model for the Django project; missingno; Announcements from the language summit; Codes of Conduct and Enforcement; Myths about Indentation; Parsers and Li...bCST; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/179
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 179, recorded April 21st, 2020.
I'm Michael Kennedy.
And I'm Brian Okken.
And Brian, I'm super honored to have Guido van Rossum on the show.
Guido, welcome to Python Bytes.
Hello, glad to be here.
Yeah, it's really great to have you here.
It's going to be wonderful to hear your opinion, your perspective on some of these things that we're sharing this week. So welcome to the show. And this episode is brought to you
by Datadog. Check them out at pythonbytes.fm. More on that later. Brian, what do you got?
What's up first? Well, I've been thinking a lot about community lately, actually. And one of the
things that came out recently, this was a little bit ago, but it's still fairly new, is the Django
Project announced a new governance model.
It's been going on, I mean, I think they've been working on it for a couple years, since at least 2018.
Some of the specifics are interesting.
They had a core team, and they dissolved the core team, and they mainly have a new role called a merger,
who has commit access but only merges pull
requests. So most of the changes could happen in the pull requests. And the discussion happens
there. There is a technical board also that was kept to kind of make some technical decisions
if it's necessary, but apparently it hasn't been necessary for a while. I think it's
interesting that they switched the governance model midstream. And then also the rationale around it, I think is interesting.
And the rationale is around trying to get more people contributing to it. So they had like their
core team that hadn't really changed for a long time. And people that were set up as core people
really weren't contributing much anymore. Anyway, I just thought that was interesting that the reason around
changing the governance was around trying to get new people in.
Yeah, I think that's a great idea because Django's been around for a long time
and it's a fairly stable project, so I think it's kind of hard to jump in.
I mean, it's a little bit like Python itself, Guido.
Right. I'm thinking that sort of maybe five years in the future, Python could consider a similar
move, or maybe we'll know that this was not the right move by then from Django's experience.
And of course, the situation for the two projects is somewhat different.
But we definitely also feel the pain of sort of not getting enough new contributors.
But we only fairly recently, like early last year, we changed our governance structure
completely.
So it's a little early to start considering changing it again, probably.
Right, of course.
We're just starting to see the outcome of the decisions and the releases that are actually
going through that model, right?
Yeah.
We've been working with the steering council model
for, say, 16 months now.
Yeah, I guess so.
3.8 definitely came out under that model.
Yeah.
The thing that Python did that I think is kind of interesting,
and I don't know if you started it,
but the notion of having more core mentors
to try to mentor new core developers,
I think that's an interesting thing.
You can't really make people be mentors,
but that's an interesting way to get more core developers on.
We have a few people who are very active as mentors
in addition to being active as core devs,
and it really does make a difference.
Yeah, we don't have enough mentors
to mentor everyone who wants to become a core dev.
Yeah, so I think that's really great.
I mean, it's one thing to write web apps in Django or to write Python code.
It's an entirely different thing to write Django or write Python, right?
It's a very different skill set.
And so I think that mentor model is really a great bridge.
Yeah, that'd be cool.
So speaking of things I think are going to be really helpful,
but in a much simpler way,
this is sort of a data science topic for everyone out there.
And one of the problems in data science is
you can end up with very large data sets, complicated data.
But every now and then there might be a None
where you expected an integer,
or there might be an empty string where you expected a date or something like that.
And understanding how that data is or how complete it is,
where is it more incomplete than less complete, right?
Or less, more or less and so on.
So there's this cool project called missingno,
which I think is missing number, right?
Shortened.
And the idea is it's a missing data visualization module for Python. And you two can see the picture in the
show notes and folks who listen to this, they can go back and see it in the show notes as well.
But it's a really cool and simple little library, but it's not just show me a quick graph. It
actually does some pretty powerful analysis. So what you can do is if you've got like some pandas data,
you can just go to it and say msno.matrix
and give it a sample of your data.
And it gives you these really cool graphs
of like vertical, either black or white bars
or bars that are like kind of zebra stripe,
depending on whether or not there's missing data.
It shows you which parts,
which columns are more complete or incomplete. And it even has a little
graph on the side that tells you the likelihood or the correlation of a row being incomplete,
right? Like you might have a missing address on one line, but in another one has a missing
phone number, or it could be more likely that those are both missing at the same time. There's
like a little graph to visualize that kind of stuff. What do you guys think?
I think it's very cool. I'm not a data anything person myself.
So yeah, to indicate how much I am not in the target audience for this module.
The whole time I read your modules, I had the grouping wrong. I thought it was
the missing data visualization module.
And I thought, well, that's kind of cool that they say there's something missing,
and this clearly is the one that's... Now it's turned up, but it's actually visualizing missing data,
which actually I understand what that is.
I've seen a spreadsheet or two,
and I can actually even understand the little example chart that you pasted into
the notes without understanding anything else around it.
Yeah, it's so wonderful, because that's why I actually think I like this and I chose it: you can just look at that picture
and go, oh, I basically get a sense for what this data is like. It's complete, it's not complete, it's
mostly incomplete on this column or whatever.
And yeah, it's really nice.
And I suspect you could, if you had data, say, in a database or a file or something,
you could probably just read that into a Pandas data frame and then throw it out here and visualize database missing data or file missing data or whatever.
But it's really nice.
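As a rough illustration of what a missing-data summary computes, here is a minimal standard-library sketch. It is not missingno's implementation: the sample rows and the helpers `is_missing`, `completeness`, `nullity_correlation`, and `text_matrix` are all made up for this example. With missingno itself, the corresponding pictures would come from calls such as `msno.matrix(df)` and `msno.heatmap(df)` on a pandas DataFrame.

```python
from math import sqrt

# Hypothetical sample data: None marks a missing value.
rows = [
    {"name": "Ada", "address": "1 Main St", "phone": "555-0100"},
    {"name": "Bob", "address": None, "phone": None},
    {"name": "Cy", "address": "9 Elm Rd", "phone": "555-0199"},
    {"name": "Dee", "address": None, "phone": None},
    {"name": None, "address": "3 Oak Ave", "phone": "555-0142"},
]
COLUMNS = ["name", "address", "phone"]


def is_missing(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""


def completeness(rows, column):
    """Fraction of rows that have a value in the given column."""
    present = sum(not is_missing(row[column]) for row in rows)
    return present / len(rows)


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the 'is missing' flags of two columns."""
    xs = [float(is_missing(row[col_a])) for row in rows]
    ys = [float(is_missing(row[col_b])) for row in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a column that is always or never missing has no correlation
    return cov / sqrt(var_x * var_y)


def text_matrix(rows, columns):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if is_missing(row[c]) else "#" for c in columns) for row in rows]


for line in text_matrix(rows, COLUMNS):
    print(line)
for column in COLUMNS:
    print(column, completeness(rows, column))
print("address/phone:", nullity_correlation(rows, "address", "phone"))
```

In this made-up data the address and phone are always missing together, so their nullity correlation comes out at (approximately) 1, which is the "both missing at the same time" case described above.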
Yeah, for large data sets, one of the things you've got to do is decide, when you're cleaning
it up, what to do with the missing data. I mean, there's some Nones or whatever, and there's some strategies to
either fill it in with interpolated data or something, or just throw those incomplete rows
completely away. But you don't really know how much data you're throwing away
without visualizing it. So this is pretty cool. I think this is great.
Yeah, and it has other visualizations as well. It has heat maps, which are like correlations, you know, so like the
address and phone number correlated kind of things I was talking about. It has bar charts, and the most
interesting or unique visualization is the dendrogram, which I had never heard of, but this
uses a hierarchical clustering algorithm from SciPy, actually, and it creates this kind of hierarchical tree of relationships of missing data. So if
you are worried about cleaning up data or stuff like that, or visualizing how good your data
is, you could throw it at this real quick and get some great answers.
Yeah, that's cool.
Yeah. All right. Well,
Guido, you have been busy with the Language Summit recently, right? What's the news there?
Yes, well, normally the Language Summit basically is an in-person meeting where about 50 people
who are mostly but not exclusively core devs get together a day or two before the actual Python conference.
Since the conference was canceled...
This would have been in Pittsburgh, right?
It would have been in
Pittsburgh this year, right. Obviously, the conference was cancelled and the language summit
was too. And then the two organizers thought, well, okay, this sounds like the kind of meeting
that we can actually try to do on Zoom. You can't have a whole conference on Zoom, but you can
probably have a meeting with 50 people on Zoom. And they tweaked the format a bit so that, I mean, you can't be on Zoom for an entire day. I find Zoom incredibly intense. And after an hour of Zooming, yeah. Yeah. User interface sucks. Privacy probably sucks.
But it clearly serves its purpose.
So we had it spread over two different days.
And then in addition, because nobody was traveling to Pittsburgh, we spread it out in time.
One day, it was really early for me so that we could also have participants from Europe. And one day it was really late for
me so that we could have some people from Australia join us. One of the organizers lives in Poland
and he was there till the end on both days. So I don't know how he slept.
Yeah, so as usual, the format wasn't actually all that different. It's typically like half hour slots for various topics that are important to either get information
to core devs and usually also get feedback from core devs.
And we pretty much stuck to that format.
The one big thing that you miss, of course, is all the whispering to the guy who
was sitting next to you or during the break, quickly grabbing three other people and having
a little huddle about a topic. Yeah, that's what's so powerful about in-person conferences.
Yeah, we missed the entire hallway track, but it was still good to have sort of short presentations and Q&A sessions, and the Q&A sessions actually
worked really well. There was a little tool that you can use to sort of moderate questions,
and Łukasz was running the moderation tool, and nobody was asking spam questions, so
all he had to do was just click OK for every question, I think.
Yeah.
That tool is much more structured than the chat channel on Zoom could be. And sort of raising your hand on Zoom and waving doesn't really work if there are 50 people,
because there's no way to see more than 16 people or so at a time.
Yeah.
So anyway, the first day, each day, there were like maybe five topics and a few miscellaneous
things.
Shall I just go over each day briefly, see if I can sort of run them all off?
Yeah, I would say just maybe touch really quickly on just the things that you felt like
really might make an impact going forward, potentially.
Just a one-liner: the guy who originally implemented f-strings gave a talk
about whether maybe all strings should become f-strings, and the general sentiment was that
that would have been nice in Python 1.0 or so, but there is no way. It would just break too much code.
It's going to break too much. I totally hear that, though, because so often I'm typing in a string.
I'm like, oh, I need to put a variable here, but I've typed 20 characters in that.
I got to go back to the beginning, but not the beginning of the line.
Cause maybe that's what I got to get to the beginning of the string and then go, maybe
we could even put the F at the end.
Who knows?
But yeah, I would love to see it, but it's, I totally understand.
You can't do that without breaking stuff.
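A small sketch of why turning every string into an f-string would break existing code. The variable names are invented for illustration; the only point is that plain strings today treat braces as ordinary characters, and plenty of code depends on that.

```python
import re

name = "world"

# Today these two literals mean different things.
plain = "Hello, {name}"        # braces are just characters
formatted = f"Hello, {name}"   # braces are evaluated immediately

# Existing code relies on plain strings keeping their braces:
template = "{greeting}, {name}!"        # filled in later by str.format
later = template.format(greeting="Hi", name=name)
json_text = '{"port": 8080}'            # literal JSON text
three_digits = re.compile(r"\d{3}")     # regex repetition count

print(plain)
print(formatted)
print(later)
print(bool(three_digits.fullmatch("123")))
```

If every literal were treated as an f-string, `template` would try to evaluate `greeting` on the spot, and the `{3}` in the regex would be replaced by the text `3`, so the pattern would silently change meaning.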
There are downsides to automatically doing it too, because curly braces are useful for all sorts of things besides formatting. So that was
sort of the opening salvo. Then my two co-conspirators on the PEG parsing project gave a talk about how
we're going to hopefully introduce a new parser in Python 3.9.
And we've been coding for like almost a year now, probably. It started out as a little hobby
project of mine and gradually became more serious and more people started helping out.
And the last few months we've been doing heavy engineering work to actually prepare for the integration.
But we didn't have steering council approval yet.
We made it a PEP and we sort of said, well, this is a nice thing, but we're not going to do this unless there is sort of clear consensus or at least general agreement that we are going to do this. And so very soon after the summit,
the steering council actually had a meeting
and approved a bunch of PEPs, and ours was one of them.
And then the last two days, I've been stressing out
because we wanted to get the new parser in the Alpha 6 release,
which is going out tomorrow.
And so we're now in the last,
the very last stretches of preparing for Alpha 6,
and we're just deleting or disabling tests
that are still failing, where we know how to fix them,
but we just don't have the time.
Right.
That's exciting that this project is going to be in there.
That's great.
Yeah.
So that's the new parser.
And if all goes well, nobody will notice a thing.
Ideally.
What are the effects?
Is it going to speed things up or make things more maintainable?
It's going to sort of open up the grammar for future changes to the language
that we currently can't do because the old LL(1) parser holds us back.
Okay.
That's sort of the main motivation.
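To give a feel for the difference, here is a toy sketch of the PEG style of parsing: alternatives are tried in order, and a failed alternative backtracks to where it started. This is not CPython's parser; the `Parser` class and the two-alternative `statement` rule are invented for illustration. As written, the rule is not LL(1), because both alternatives begin with a NAME token, so a one-token-lookahead parser would need the grammar rewritten (left-factored) first.

```python
class Parser:
    """Toy PEG-style recursive-descent parser over whitespace-separated tokens."""

    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0

    def expect(self, predicate):
        """Consume and return the next token if it satisfies predicate, else None."""
        if self.pos < len(self.tokens) and predicate(self.tokens[self.pos]):
            token = self.tokens[self.pos]
            self.pos += 1
            return token
        return None

    def name(self):
        return self.expect(str.isidentifier)

    def literal(self, text):
        return self.expect(lambda token: token == text)

    def at_end(self):
        return self.pos == len(self.tokens)

    def statement(self):
        # statement: NAME '=' NAME | NAME
        start = self.pos

        # Alternative 1: an assignment.
        target = self.name()
        if target and self.literal("="):
            value = self.name()
            if value and self.at_end():
                return ("assign", target, value)
        self.pos = start  # backtrack and try the next alternative

        # Alternative 2: a bare name.
        name = self.name()
        if name and self.at_end():
            return ("name", name)
        self.pos = start
        return None


print(Parser("x = y").statement())
print(Parser("x").statement())
print(Parser("x =").statement())
```

The ordered-choice-plus-backtracking structure is what lets a PEG grammar be written the way the language is naturally described, rather than being reshaped to fit a single token of lookahead.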
Super.
There was one interesting talk about something called HPy,
which is a proposal for a new, more portable API,
and in particular focused on other Python implementations
besides CPython. As you may know,
PyPy has been struggling for over a decade with
compatibility with extension modules. And the HPy proposal is basically instead of pointers to
objects, you have handles, which is a pointer to a pointer to an object. And there's a whole
API around handles that is equivalent to the existing API, but it allows different styles
of garbage collection. For example, you could implement a garbage collector that moves objects
behind your back occasionally. Right, you might get a generational compacting garbage collector
because you could update the value of the pointer pointer without changing the actual pointer,
right? Yeah, yeah, that's actually really exciting. Yeah. And it's still in early stages, I believe. But it looks pretty promising. Eric Snow gave a
lightning talk about a sort of a retrospective of all his work on multi-core support, which is
now beginning to conclude. Well, maybe it's too soon to call it a conclusion, but we're going to have
subinterpreters with a much better API, either in 3.9 or in 3.10. There's a PEP around that, PEP 554,
which will definitely be moving forward. But whether it's considered mature enough
to land in 3.9 is not entirely clear.
Yeah, Eric's work is very interesting there.
Yeah, and in 3.10 we will probably have separate GILs per subinterpreter. That is going to be a
major new thing. Let's see, what else do we have? Well, so the next day I gave a talk about the
future of typing, which... oh yeah, there's one detail. You might remember that we introduced
something called from __future__ import annotations, which made it so that annotations
are no longer evaluated at runtime. You can still introspect them, but you'll just get the string
containing the annotation expression back. Well, that's going to be the default in 3.9, most likely. There's still a little
debate about that, but there was like a two-thirds preference for just making that the default in
3.9, and various people argued effectively that nobody should notice any difference.
I'm really excited or happy to have typing in the language. It makes such a difference for the right
use case, you know, on defining the boundary of APIs or making the editor understand something
better when it otherwise wouldn't.
If you're maintaining tens of thousands of lines of Python
code or more, type annotations really make a difference.
Yeah, for sure.
I still don't recommend
teaching them to beginners, though.
Oh, really? Okay.
It depends on what kind of beginners you have.
If they're sort of recuperating Java programmers, maybe you should introduce them.
But if they're like actually blank slate, this is the first time they're programming ever.
I wouldn't bother them with annotations.
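For the postponed-annotations behaviour mentioned a moment ago, here is a minimal sketch of what "you just get the string back" looks like. The `greet` function is a made-up example. Because the future import has to be the first statement of a module, the sketch builds a tiny module from a string rather than using the import in place.

```python
import typing

# Hypothetical module source; the future import must come first in a module.
MODULE_SOURCE = '''
from __future__ import annotations

def greet(name: str, times: int = 1) -> str:
    return ", ".join(["Hello " + name] * times)
'''

namespace = {}
exec(MODULE_SOURCE, namespace)
greet = namespace["greet"]

# With the future import, annotations are stored as unevaluated strings.
print(greet.__annotations__)

# typing.get_type_hints evaluates those strings back into real objects.
print(typing.get_type_hints(greet))
```

Code that only reads annotations through `typing.get_type_hints` sees no difference; code that inspects `__annotations__` directly is where the change becomes visible.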
Yeah, I kind of agree with that. Yeah. available time to actually implement the design. And I'm sure that when you're halfway through implementation,
all sorts of interesting issues with the design will crop up.
So the design is not final until it's been implemented.
Okay, last two topics.
Zac Hatfield-Dodds gave a very good talk about what he calls property-based testing, which really is about the tool named Hypothesis
that introduces a testing approach that I think was first developed in academia for Haskell
that works in a completely different way than your typical unit test based testing. Right. The tool decides, right? Instead of examples.
The tool generates test cases,
and I've never played with it myself,
but the talk sort of made me very excited
to play around with it more.
And it actually,
even though it's a very different approach
than unittest- or pytest-based testing,
it will still integrate with that.
I mean, you can write a unit test and then put some decorator on top of it that produces test data.
And Hypothesis has all kinds of really advanced stuff for exploring enormous spaces of possible input data and quickly finding bugs.
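The core idea of generating test cases instead of hand-picking examples can be sketched with the standard library alone. This is a deliberately naive stand-in, not Hypothesis: `check_property`, `random_text`, and the run-length functions are invented for this example, and there is no shrinking or smart exploration of the input space. With Hypothesis itself, the generator would be replaced by a decorator such as `@given(st.text())` on an ordinary test function.

```python
import random


def run_length_encode(text):
    """Collapse runs of equal characters into (char, count) pairs."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs):
    """Expand (char, count) pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)


def random_text(rng):
    """Generate a short random string over a tiny alphabet, so runs are likely."""
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))


def check_property(prop, generate, runs=200, seed=0):
    """Call prop on generated inputs; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def round_trips(text):
    """The property: decoding an encoded string gives the original back."""
    return run_length_decode(run_length_encode(text)) == text


counterexample = check_property(round_trips, random_text)
print("counterexample:", counterexample)
```

The test states a property that should hold for every input ("encode then decode is the identity") and leaves the choice of inputs to the tool, which is the shift in approach described above.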
Do you think we'll get to a place where we are able to use Hypothesis for
some of the testing for the standard library?
That was one of the propositions that Zach made.
I think it's still early for that.
I think it's much easier to introduce Hypothesis in sort of a new project where you haven't yet written all the code and all the tests than it is to retrofit it in a large, mature, or maybe even somewhat dementing project. So it'll be a while before we'll have hypothesis-based testing for the standard library, just like it'll be a while
before we'll have annotations in the standard library rather than annotations sort of separate
from the standard library. The last talk I want to highlight, and then I'm really done with this,
is also a very good talk by Russell Keith-Magee about the state of BeeWare and Python for mobile.
And one of his suggestions was that we adopt some of his mega patches that he's currently
been maintaining for several Python releases that would make Python at least compile out
of the box or nearly out of the box for the important mobile platforms.
That'd be cool.
Yeah, it'd be so wonderful to have Python as an option for mobile.
It really would bust open the doors and create even more growth.
Many people believe that sort of mobile platforms are obviously continuing to grow in importance
and to grow in power.
And we'd be crazy if we didn't support Python on those.
And it may be very important for Python's very survival.
Yeah.
Yeah.
I saw the Black Swan talk that Russell Keith-Magee gave,
and it was compelling.
He is an amazing speaker for sure.
Yeah.
Yeah.
That's what I have.
Great.
Thank you so much for that insight.
That was,
that was awesome.
A lot of people don't get to see the behind the scenes.
They just see what's announced when it comes out, right?
Before we move on, let me tell you about our sponsor, Datadog.
This episode is brought to you by Datadog.
So let me ask you a question.
Do you have an app in production that's slower than you like?
Is its performance all over the place, sometimes fast, sometimes slow?
Now, here's the important question.
Do you know why?
With Datadog, you will.
You can troubleshoot your app's performance with Datadog's end-to-end tracing, use detailed flame graphs, identify bottlenecks and latency in that
finicky app of yours. So be the hero that got the app back on track at your company. Get started
with a free trial over at pythonbytes.fm slash Datadog. Get a cool t-shirt as well. Brian, you've
got another one that kind of ties into your first one, right? But it's sort of the other side of the
coin, maybe? I don't know what's been happening in the Python world that you
sort of orbit in that might make you think about these things, but tell us about it.
No, I've just been thinking about community and codes of conduct and enforcement for codes of
conduct. No reason, really, just kind of an interesting topic. One of the things I've been
thinking about is, especially when researching this, the codes of conduct and enforcement of it and how we treat people.
I first thought it was really important for open source projects.
And it definitely is because people have the option to just leave and get out of the project.
So you really want to treat people well so they stick around and have it be welcoming to other people.
But I don't think industry is really that different.
I think that people have the ability to just get another job
or work on a different project.
So I think these are important for industry as well.
I took a look at two sets of codes of conduct and the enforcement of those.
So the PSF has a code of conduct.
I'm not going to read them all out,
but there's things like being open and being friendly.
And in there, there's
a list of inappropriate behaviors as well that's covered. Now, the Django code of conduct
also has all of these. When you read them, there are differences, but they kind
of sound the same. One of the things they highlight in the Django one is be careful with your choice of words,
and they include examples of harassment, speech,
and exclusionary behavior that's not appropriate.
One of the big differences I saw was the enforcement.
So the PSF is a two-thirds majority vote enforcement sort of thing to make sure if something happens,
like if they want to kick somebody out or put them on probation or something.
I think that's really important because if you require 100% majority and somebody who is on the team that decides is potentially part of the problem, then what do you
do, right? It's really tricky. I mean, if people are just going to abandon a project, right, you
would rather have just a strong majority make a decision. I also think that the
PSF has probably got a larger working group on this, and it's, I guess, maybe
harder to get a hold of people. Maybe it's easier to get two-thirds when maybe you can't even
reach all 100% of the group. But anyway, the other interesting difference is the PSF code of conduct seems to, I know it does cover online interaction as well as events
like the conferences and meetups and stuff.
But possibly, at least I think that maybe its focus might be more on events,
whereas the Django code of conduct is specifically targeted towards online interactions.
I would say for the PSF that sort of historically,
events were the first place where codes of conduct were introduced,
but we've been using them for online forums more and more in the past few years.
Okay.
One of the interesting things with the Django one is that
a single person on the committee can act without collaborating with anybody else.
If it's an ongoing problem or if there's a threat involved or something, they still have to go through the process of notifying everybody else.
But there is an interesting thing that one person on the committee can intervene right away. I'm not saying one is better than the other.
I just think it's interesting. And I think it's important for new projects to think about not just
their code of conduct, but how they're going to enforce it and what the timeline is. So the Django
one also includes some timelines, which is interesting. And I would really like to make
sure that projects kind of practice, maybe figure out what they're going to do if they need
to enact one of these things without, you know, before it becomes a problem, they know what
they're going to do. Yeah, there's a lot of stuff going on with some projects out there. So having
a couple of examples and side by side comparisons, I think is great. I was interested to find out
our meetup, like the Python meetup that we started, which is on hold right now, unfortunately, because of the virus and quarantine and stuff. But because we were getting support from the Python
Software Foundation to help pay for the meetup fees and stuff, we had to list a code of conduct
on our meetup page and stuff like that. Yeah, that makes a lot of sense, but I didn't realize that.
Yeah, the PSF has been doing that for a few years now.
Yeah, that's really great. All right,
this next one I want to cover.
It goes back a ways, but I think it's really fun.
And it's something that also, I think, ties together well with our special guest here.
And this is an article about myths about indentation.
And Guido, I picked this one because you were talking about this on Twitter just the other day.
What was the motivation to throw that out there?
That is a good question. I was just going to volunteer the answer because apparently I had
a link to that article on my homepage in some odd corner. And I have a very, very sort of
really old homepage. It's moved to GitHub Pages, but it looks like Web 1.0. And because it really
is, I just edit raw HTML.
It blends in right with Netscape, huh?
So someone reported to me a broken link, which happens like, I don't know, once every
four years or so. Someone reported a broken link. Oh wait, it wasn't even on my homepage. It was on
an old blog that I can no longer edit at artima.com. I'm very glad that that blog is still online.
But so because I got the report of the broken link, I decided, oh, I'm sure I can still
find on archive.org where that link used to point.
And sure enough, it was there.
And I thought, oh, that's actually still a neat little article.
So I thought, okay, tweet of the day or
tweet of the week.
Yeah, I agree, and I think it's interesting as well. And just to give you a sense
of why it might have disappeared, it was one of those types of sites where the domain or the URL
included a tilde username path, like, you know, you used to get in university or whatever, way
back when. So anyway, this one is myths about indentation for Python. And for people who come
from a C-oriented language, I think Python could come across a little bit funky. I actually want
to share a little story of just sort of my journey with it and how I came to love this.
But I think this is really interesting for people having the debate about is significant white space
useful? Is it weird? Is it good? I did a ton of
C++ and then C# development, and then JavaScript development. It was all about
the curly brace languages, lots of symbols. And then I came to learn Python, and I loved Python
right away, but it was weird to me. I felt kind of naked. Like, if I'd write an if statement, I'm like,
I need some little parentheses to kind of hold the code in place, and why don't they need to be there?
And I need a curly brace to say when this block of code is done and whatnot. It just took a
little bit of getting used to. But I knew that it was the right thing for me because when I went back
to work on some older projects, I'm like, why are there symbols everywhere? What is all this stuff I
have to keep typing? This is like a broken language. And it just took a couple of weeks for me to
make that switch, to feel like it was broken to go back to work in languages I'd been doing for like 10 years. So well done
with the white space, Guido.
Thanks.
Yeah. But so let's cover some of the things mentioned really
quick in the article. One is that white space is significant in Python source code, and actually,
no, not in general, is the answer. It's significant on the left. So as much as you indent stuff,
that really means things. But between variables, like whether you have a=7 or a
space equals space 7, doesn't matter. You can have tons of spaces in there, right? Like in any other
language, spaces kind of don't matter, except for on the left. So that's cool. And also the amount of indentation doesn't really matter, right?
You could have five spaces for any code suite that you want,
or you could have 18, or you could go with a standard four.
I recommend the four, but you know.
And then also if you have something that defines
like a list comprehension or an array creation or a dictionary,
then all of a sudden the spacing doesn't matter anymore, right? As soon as you have like an open square bracket and then you
have a bunch of stuff and then close square bracket, spacing doesn't matter in there. So
I think this is interesting to think about as folks debate that maybe within their teams.
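The points just listed can be checked directly. A minimal sketch, with made-up function names: spacing inside a line is ignored, any consistent indentation width is accepted, and layout inside brackets is free-form.

```python
# Spaces inside a line are not significant.
a=7
b   =   7

# Any indentation width works, as long as each block is consistent.
def four():
    return "four spaces"

def two():
  return "two spaces"

def eighteen():
                  return "eighteen spaces"

# Inside brackets, line breaks and indentation are free-form.
numbers = [
        1,
  2,
            3,
]

print(a == b, four(), two(), eighteen(), numbers)
```

All of this runs without complaint, though four spaces remains the usual convention.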
It also, you could say it forces you to use a certain indentation style. Well, yes and no.
If you wanted to write it single statement per line, then yeah. There's a cool
example that they gave in the article, which is like: if one plus one equals two, then new line, print
foo, new line, print bar, new line, X equals 42. You can also put them on
one line with semicolons. If you're really missing your semicolons from your language, you
could do that. The thing that's interesting here, I think this is probably the most significant part
of this article or this write up is if you look at it, it looks right.
And when it gets parsed, it is right.
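A small sketch of that "looks right, parses right" property, using the standard `ast` module. The snippets being parsed are illustrative only (names like `ready` and `first` are never executed). It shows that the multi-line and one-line semicolon forms produce the same tree, that the indented statements really are the body of the `if`, and that inconsistent indentation is rejected rather than silently reinterpreted.

```python
import ast

# The multi-line form and the one-line semicolon form parse to the same tree.
multi_line = "if 1 + 1 == 2:\n    print('foo')\n    print('bar')\n    x = 42\n"
one_line = "if 1 + 1 == 2: print('foo'); print('bar'); x = 42\n"
same_tree = ast.dump(ast.parse(multi_line)) == ast.dump(ast.parse(one_line))
print("same tree:", same_tree)

# What is indented under the if is exactly what belongs to the if.
good = """\
if ready:
    first()
    second()
after()
"""
tree = ast.parse(good)
if_node = tree.body[0]
print("statements inside the if:", len(if_node.body))
print("top-level statements:", len(tree.body))

# Indentation that does not line up with any enclosing block is an error.
bad = """\
if ready:
        first()
    second()
"""
error = None
try:
    compile(bad, "<example>", "exec")
except IndentationError as exc:
    error = exc
print("rejected:", type(error).__name__)
```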
There's an example of some C code that looks visually wrong because it's indented differently, but it's going to parse.
But the way you see it when you read it is not what's actually happening. And I think there was a problem like this. Well, I think
it was in some, either Objective-C, it was something with Apple in there. It was really bad.
There was
an infamous Apple vulnerability, I think it might even have been on the iPhone, where someone had
added a second statement to a block, but it wasn't a block because there were no curlies. Right. It started out with a single conditional line, like if something, indent,
do the thing. And then they just indented, but they didn't put the curly braces in. And
yeah, it took so long for people to find it because visually it looked like what
Python would actually mean, right? It looked like those two things were part of the if block,
but because the white space didn't matter, it actually didn't. And so that's really interesting. I'm not
going to go through everything. I'll put it in the show notes. But another one that I thought is like,
I just don't like it. And that's fine. People can not like it, but it has a lot of advantages. Like
in that example before, if you had that wrongly indented Python code, it would not parse. It's
an error to have it not look right. And rather than just not be right. So it has a lot of advantages and people can really quickly get used to not having to write
all those symbols. And then you go back and you're like, this code is hard to read. It's just full
of curly braces, semicolons, parentheses everywhere. I always thought those were just,
that is what builds programming languages. To have a programming language, you had to have that.
And then once I experienced Python and I went back, it kind of broke my mental model of the world. I'm
like, you don't actually have to have those things, so why are they there? Anyway, what do you think
about this article? You must like it somewhat, because you hunted it down and tweeted it, right?
It's old news for me, because I didn't even invent the white space thing for Python. That was sort of
handed to me on a silver platter
by one of my mentors in the early 80s.
Yeah, back in the ABC days.
And in those days, it was an innovation.
There was like one other language that had this
and Knuth had once said that he thought it would be a good idea,
but he had never actually implemented the language
or even experienced the language that implemented it.
He just thought that it would be a good idea.
Right, right.
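To make the earlier point about wrongly indented code concrete, here is a minimal sketch. The `parses` helper and the two snippets are made up for illustration and are not taken from the article. A line that dedents to a column matching no open block is rejected before anything runs, while a line at the same level as the if body really is part of that body, so what you see is what executes.

```python
def parses(source: str) -> bool:
    """Return True if Python can compile the source, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# The second statement dedents to a column that matches no open block.
inconsistent = (
    "if ready:\n"
    "        first()\n"
    "    second()\n"
)

# Both statements sit at the same level, so both belong to the if block.
consistent = (
    "if ready:\n"
    "    first()\n"
    "    second()\n"
)

print(parses(inconsistent))  # False
print(parses(consistent))    # True
```

The snippets are only compiled, never executed, so the undefined names `ready`, `first`, and `second` do not matter here.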
The only thing that was a stumbling block for me was when I first started looking at Python, the editor I was using, I think it was an Emacs something at the time.
I'm not sure what I was using. But with the C++ code I was using, I had it set up so that if I double-clicked
on the closing bracket,
it would jump to the top of the block.
And I really liked that feature.
And for some reason, that's the reason
why I didn't like the white space thing at first.
Like, how do I get back?
But then I just went, okay,
I'm going to, like, beginner's mind,
just open mind, just embrace it
and learn it as a new thing.
And, like, a week later, I didn't
even miss it. So yeah.
And of course the newer editors like PyCharm and stuff, at the
bottom they have little breadcrumbs of, you know, here's the class, here's the function, here's the
if, here's a while, whatever. And you can jump between them just like you were talking about,
but for the entire hierarchy of, I don't know, the tokens or whatever.
Yeah, and I tend to
write smaller functions now, so it's not as much of a deal.
This is probably a good thing that it was hard.
I was thinking that if you needed the editor to help you find the top of the block,
it must be pretty far away.
It's 4,000 lines. I hate scrolling so much. These functions are hard.
How interesting. All right. Guido, do you have one more you want to share with us?
Well, yeah, you gave me some homework.
I didn't really do it, but there's like, and of course, this has to do with parsing.
And so this may be a fairly esoteric library.
But if you're writing a program that sort of does some manipulation of your code, and
maybe it converts four-space indents to two-space indents
or three-space indents or whatever, or maybe you're writing something like Black, which
is the sort of Python code reformatting tool, but you don't like the way Black handles certain things,
or maybe you're writing some other thing that does analysis of source code.
Maybe you're writing a linter.
There are a couple of tools that you can use.
And it turns out that one of them is in the standard library.
There's something called lib2to3, which is a little hard to pronounce.
It has the digit 2 and then the word T-O and then the digit three in the name.
That is tricky.
That is something I wrote probably over 15 years ago,
or at least the core of it,
which is yet another LL(1) parser,
but this one's written in Python
rather than in C, like the original one.
And actually, Black ended up using lib2to3,
except I think Lukasz had one issue that he couldn't figure out how to do with Black, and so he ended up vendoring a copy of lib2to3 and then butchering
it a little bit, which is how these things happen. I mean, if you look at what pip vendors, that's
pretty scary, but there are good reasons for that too. But if you're
writing your own, you should probably not use lib2to3, and not just because it's going to go out of
style once the PEG parser arrives. There are much better tools, and the one that I discovered a few
months ago, it's actually written by some folks at Facebook, mostly. It's called LibCST, and they
have unique capitalization. It's a capital L, then lowercase i b, and then CST is all
uppercase. And so it's a library for manipulating concrete syntax trees. And like lib2to3, it actually shares some code with lib2to3. I think underneath is a
parsing library called parso, which itself is a butchered version of lib2to3. At least that's how
it started. These tools are things that can parse Python code, but they produce a syntax tree that
is the opposite of an abstract syntax tree.
It's a very concrete syntax tree.
And that means that every space, every comment, every bit of indentation is preserved
or at least can be recovered from the information in that syntax tree.
And contrast that with the typical abstract syntax tree
which in the end doesn't even remember
where the parentheses are.
Right, right.
It just takes us, well, here's some conditional statement.
Here's the two things we're testing, right?
So this sounds much more useful
if you want to do like a code analysis type of thing
to say this thing you're doing here,
you should do it in this other way or transform it over,
but kind of preserve things like comments and style.
Yeah.
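The point about the abstract syntax tree forgetting parentheses and comments can be checked with the standard library's `ast` module. This is a small sketch with made-up source strings; it shows only what the abstract tree discards and does not use LibCST itself.

```python
import ast

# Same statement, once with redundant parentheses and a comment, once without.
with_extras = "total = (price + tax)  # parentheses and a comment\n"
bare = "total = price + tax\n"

# ast.dump() prints the tree structure without line or column positions,
# so two sources that differ only in layout produce identical dumps.
same_tree = ast.dump(ast.parse(with_extras)) == ast.dump(ast.parse(bare))
print(same_tree)  # True
```

Because the two trees are indistinguishable, a tool working only from the abstract tree has no way to write the comment or the parentheses back out, which is the gap a concrete syntax tree fills.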
And so LibCST has a really sort of solid underlying model.
And they thought a lot about various transformations
they want to apply because the typical way these tools work
and lib2to3 itself started out that way as well, is you read your source code using this customized parser. It gives you a concrete syntax tree. Then in that syntax tree, you're actually going to systematically rename a parameter or move things around or insert.
In the 2to3 world, of course, it's used to turn things like iteritems into items and iterkeys into keys.
And you can make those kinds of changes.
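As a rough illustration of that read, transform, write-back loop, here is a sketch that does the iteritems and iterkeys renames with the standard library's `ast` module rather than lib2to3 or LibCST. The `RenameDictMethods` class and the sample source are hypothetical, and the rename is deliberately naive: it touches every attribute with those names, which a real fixer would be more careful about. `ast.unparse` needs Python 3.9 or newer.

```python
import ast


class RenameDictMethods(ast.NodeTransformer):
    """Rename Python 2 style dict iterator methods to their Python 3 names."""

    RENAMES = {"iteritems": "items", "iterkeys": "keys"}

    def visit_Attribute(self, node: ast.Attribute) -> ast.AST:
        self.generic_visit(node)  # visit nested attribute accesses first
        node.attr = self.RENAMES.get(node.attr, node.attr)
        return node


source = "for k, v in d.iteritems():  # walk the dict\n    print(k, v)\n"
tree = RenameDictMethods().visit(ast.parse(source))
result = ast.unparse(tree)  # regenerate source text from the abstract tree
print(result)
```

The rename works, but the comment from the original line does not survive the round trip, since an abstract tree never stored it. That loss is what the concrete-tree tools discussed here are designed to avoid.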
And so LibCST also supports that. It sort of has a slightly better API because 15 years ago when I started lib2to3,
I didn't realize what an important tool it was going to be.
And some of the way the white space is attached to nodes
is exactly backwards from the way that is the most convenient
to think about it and work with it.
All right, cool.
Well, this sounds like it'll be really helpful
for people building tools like Black
or looking at code analysis and stuff.
Right.
Lukasz had a, I think it was the 2019 talk, PyCon talk, where he described how Black uses
both concrete syntax trees and the abstract syntax tree.
It's a pretty fascinating talk for a very low-level dive into these concepts.
It wasn't until I watched that talk that I realized
that Black compares the before and after abstract syntax tree
to make sure that your code is guaranteed to run the same
so you don't really have to test for that.
He's already testing for it.
So that's pretty interesting.
Yeah, that's very cool.
That is a very neat feature.
And it's actually an important trick in general
for people who are doing transformations to have some abstract way of double checking that your transformation left things in a decent state.
Yeah, it's cool.
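The double-checking trick can be sketched in a few lines with the standard library. This is only an approximation of the idea, and Black's own check may well differ in its details; the `same_ast` helper and the sample strings are made up for illustration.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if two source strings parse to the same abstract syntax tree."""
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = { 'a':1,\n      'b':2 }\n"
reformatted = "x = {'a': 1, 'b': 2}\n"   # layout changed, meaning unchanged
broken = "x = {'a': 1, 'b': 3}\n"        # a value was altered by mistake

print(same_ast(original, reformatted))  # True
print(same_ast(original, broken))       # False
```

A formatter that runs a check like this after rewriting a file can refuse to save the result whenever the trees differ.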
Yeah, very cool. All right. Well, thanks for LibCST. We know that's a great one. Now, that's it for our main topics. So just some really quick things at the end that I want to throw out there for people.
One, Adam, who goes by Codependent Coder on Twitter,
sent a message over and said,
hey, Django no longer supports Python 2 at all,
which is pretty awesome because 1.11 has left long-term support,
leaving only 2.2.12 onward,
which has only Python 3 support.
So yay for modern Python making its way through.
That's good. And then last time we talked about 90% of coding is Googling and that's okay,
or it's not. And we didn't really feel like that was our experience, right? As people have been
around for a while. But I got to tell you, this last week, I've been doing nothing but pandas,
Altair visualization, Jupyter notebooks, and graphics, because I'm building a whole
set of dashboards for the Talk Python courses and whatnot.
Basically, the dashboards that I should have built a while ago. I Googled a lot.
A whole lot. But that's the thing. It was like a two or three
day blip of like, wow, I'm Googling 25-30%
of my time because I don't know anything about
these things, and how do I get this thing to line up with that bar? But now I'm back to just kind of
mostly not doing that anymore, even after a few days. So I think generally what we said is true,
but I do think there are these blips of, like, wow, I'm diving into something new, it's like mad
search scrambling. But then I'm back to sort of using, like, more memory coding. I don't know what you call not-Google coding.
Yeah, you've got to understand what you're doing, and that means you can't just
Google for examples and copy and paste them in, because then you combine the examples and
you have no idea what you're doing, and of course it doesn't work.
At best it's frustrating, right?
You're like, this worked, that worked, but together they don't work. And you just don't even know why, right? Yeah, so for sure.
But yeah, so anyway, it's a follow-up on our conversation last week.
Brian, what do you got to throw out there for everyone?
I'm going to say this on this show just to make sure
I do it. There's like three days left for me to record my talk.
Yeah, this is like forcing yourself to commit to it, so you're going to do it?
Yes, definitely. So, the PyCon talk,
I really do want to get it online.
It's important stuff.
It's about parameterization.
I talked a couple episodes ago about having trouble switching back and forth at home, with all this working from home stuff, between Mac and Windows.
I finally figured out the whole using Command and Control.
So thank you to everybody.
But apparently there's this really simple thing.
Apple lets you just
swap them on a keyboard, so that's what I'm doing, and it works great. And then also, I had
promised that I was going to have my cards project be able to work and publish to PyPI or the Test
PyPI. It doesn't work with setuptools_scm because I'm using Flit. So if somebody's got a way to figure out how to just
somehow change the version
string or bump that every time
you merge or something like that,
that'd be great. But otherwise,
right now I don't think there's a way to automatically
push to PyPI
if you're using Flit.
Because it says that one's already uploaded.
Maybe there's a GitHub action that will just
randomize that or something.
Because the version is embedded in the source code.
And the trick that people are using with setuptools is
the version is based on the version in GitHub.
And you can't do that with Flit.
So at least I haven't figured it out.
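One way the "change the version string" step might look is sketched below. It assumes the version lives in the module as a `__version__ = "X.Y.Z"` line with three numeric parts; the `bump_patch` helper, the regular expression, and the sample module text are all hypothetical, and nothing here has been tried against the cards project or an actual upload.

```python
import re

# Matches a line such as: __version__ = "0.4.2"
VERSION_RE = re.compile(r'^__version__\s*=\s*"(\d+)\.(\d+)\.(\d+)"', re.MULTILINE)


def bump_patch(module_source: str) -> str:
    """Return module_source with the patch number of __version__ incremented."""

    def replace(match):
        major, minor, patch = (int(part) for part in match.groups())
        return f'__version__ = "{major}.{minor}.{patch + 1}"'

    new_source, count = VERSION_RE.subn(replace, module_source, count=1)
    if count == 0:
        raise ValueError("no __version__ line in X.Y.Z form was found")
    return new_source


example = '"""Project tracking."""\n__version__ = "0.4.2"\n'
print(bump_patch(example))
```

A script built on this would still have to read the module file, write the result back, and commit the change before publishing, so it only covers the bump itself and not the automation around it.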
But that's okay.
I'll probably do something else.
That's my extras.
Guido, anything else?
Even though I said it's hard to imagine PyCon going online,
it actually is going online.
At least some of it is.
The first talk by the conference chair, Emily Morehouse,
has been posted and many more will follow.
Yeah, her welcome was really nice.
The other thing, and as you mentioned,
Django no longer supports Python 2 at all.
Well, that's just fine because the very last release of Python 2,
2.7.18 was released a few days ago.
Yeah, that's great.
That must be kind of a load off of your shoulders
to finally have that in the rear view mirror.
I'm very happy and I'm sad, of course,
that we can't have an absolutely wild and crazy party in Pittsburgh like we were planning.
Yeah, a big celebration on Zoom.
It's just not the same.
Just have to have a bigger one next year.
That's one I don't know how to pull off.
Well, that's really good.
All right.
You guys ready for a really quick joke?
All right.
So here's a quick joke sent to us by Derek Chambers.
And he may have even made this up for us.
This goes back to the sub-interpreters and the multiple GILs and all that.
You guys know how you can borrow money concurrently?
With async IOUs.
That's a terrible joke.
That's a bad joke.
Oh, that is very groan-worthy.
Very groan-worthy.
Excellent.
Most of our jokes actually are around here, but that's how it goes.
Yeah, and keep them coming.
Keep sending us your bad jokes.
Yeah.
That's right.
That's right.
Python dad jokes, that should be a whole separate category.
They absolutely should.
They should.
Well, Guido, it was really an honor to have you on the show.
Thanks for coming and sharing your perspective on all this.
Glad to be back.
Yeah, and Brian, thanks as always. Good to be here with you.
Cheers. Yep. Bye, everyone.
Bye. Thanks, both of you.
Thank you for listening to Python Bytes. Follow the show
on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes
at PythonBytes.fm.
If you have a news item you want featured, just visit
PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Ocken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.
