Python Bytes - #179 Guido van Rossum drops in on Python Bytes

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 179, recorded April 21st, 2020. I'm Michael Kennedy. And I'm Brian Okken. And Brian, I'm super honored to have Guido van Rossum on the show. Guido, welcome to Python Bytes. Hello, glad to be here. Yeah, it's really great to have you here.

Starting point is 00:00:21 It's going to be wonderful to hear your opinion, your perspective on some of these things that we're sharing this week. So welcome to the show. And this episode is brought to you by Datadog. Check them out at pythonbytes.fm. More on that later. Brian, what do you got? What's up first? Well, I've been thinking a lot about community lately, actually. And one of the things that came out recently, this was a little bit ago, but it's still fairly new, is the Django Project announced a new governance model. It's been going on, I mean, I think they've been working on it for a couple years, since at least 2018. Some of the specifics are interesting. They had like a core team that they dissolved the core team, and they mainly kind of have a new role called a merger person,

Starting point is 00:01:02 which they have commit access, but they only merge pull requests. So most of the changes could happen in the pull requests. And the discussion happens there. There is a technical board also that was kept to kind of make some technical decisions if there's if it's necessary, but apparently it hasn't been necessary for a while. I think it's interesting that they switched the governance model midstream. And then also the rationale around it, I think is interesting. And the rationale is around trying to get more people contributing to it. So they had like their core team that hadn't really changed for a long time. And people that were set up as core people really weren't contributing much anymore. Anyway, I just thought that was interesting that the reason around

Starting point is 00:01:47 changing the governance was around trying to get new people in. Yeah, I think that's a great idea because Django's been around for a long time and it's a fairly stable project, so I think it's kind of hard to jump in. I mean, it's a little bit like Python itself, Guido. Right. I'm thinking that sort of maybe five years in the future, Python could consider a similar move, or maybe we'll know that this was not the right move by then from Django's experience. And of course, the situation for the two projects is somewhat different. But we definitely also feel the pain of sort of not getting enough new contributors.

Starting point is 00:02:26 But we only fairly recently, like early last year, we changed our governance structure completely. So it's a little early to start considering changing it again, probably. Right, of course. We're just starting to see the outcome of the decisions and the releases that are actually going through that model, right? Yeah. We've been working with the steering council model

Starting point is 00:02:47 for, say, 16 months now. Yeah, I guess so. 3.8 definitely came out under that model. Yeah. The thing that Python did that I think is kind of interesting, and I don't know if you started it, but the notion of having more core mentors to try to mentor new core developers,

Starting point is 00:03:03 I think that's an interesting thing. You can't really make people be mentors, but that's an interesting way to get more core developers on. We have a few people who are very active as mentors in addition to being active as core devs, and it really does make a difference. Yeah, we don't have enough mentors to mentor everyone who wants to become a core dev.

Starting point is 00:03:27 Yeah, so I think that's really great. I mean, it's one thing to write web apps in Django or to write Python code. It's an entirely different thing to write Django or write Python, right? It's a very different skill set. And so I think that mentor model is really a great bridge. Yeah, that'd be cool. So speaking of things I think are going to be really helpful, but in a much simpler way,

Starting point is 00:03:48 this is sort of a data science topic for everyone out there. And one of the problems in data science is you can end up with very large data sets, complicated data. But every now and then there might be a none where you expected an integer, or there might be a empty string where you expected a date or something like that. And understanding how that data is or how complete it is, where is it more incomplete than less complete, right?

Starting point is 00:04:16 Or less, more or less and so on. So there's this cool project called missingno, which I think is "missing number," right? Shortened. And the idea is it's a missing data visualization module for Python. And you two can see the picture in the show notes and folks who listen to this, they can go back and see it in the show notes as well. But it's a really cool and simple little library, but it's not just show me a quick graph. It actually does some pretty powerful analysis. So what you can do is if you've got like some pandas data,

Starting point is 00:04:46 you can just go to it and say msno.matrix and give it a sample of your data. And it gives you these really cool graphs of like vertical, either black or white bars or bars that are like kind of zebra striped, depending on whether or not there's missing data. It shows you which parts, which columns are more complete or incomplete. And it even has a little

Starting point is 00:05:06 graph on the side that tells you the likelihood or the correlation of a row being incomplete, right? Like you might have a missing address on one line, but in another one has a missing phone number, or it could be more likely that those are both missing at the same time. There's like a little graph to visualize that kind of stuff. What do you guys think? I think it's very cool. I'm not a data anything person myself. So yeah, to indicate how much I am not in the target audience for this module. The whole time I read your modules, I had the grouping wrong. I thought it was the missing data visualization module. And I thought, well, that's kind of cool that they say there's something missing,

Starting point is 00:05:52 and this clearly is the one that's... Now it's turned up, but it's actually visualizing missing data, which actually I understand what that is. I've seen a spreadsheet or two, and I can actually even understand the little example chart that you pasted into the notes without understanding anything else around it. Yeah, it's so wonderful, because that's why I actually think I like this and I chose it: you can just look at that picture and go, oh, I basically get a sense for what this data is like. It's complete, it's not complete, it's mostly incomplete on this column or whatever.
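Editor's sketch: the idea behind the matrix view can be illustrated with plain Python, without the library itself. The rows, the column names, and the helper functions `completeness` and `nullity_matrix` below are hypothetical and are not part of missingno; the closing comment shows the `msno.matrix` call mentioned in the episode, assuming pandas and missingno are installed.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"name": "Ada", "address": "12 Elm St", "phone": "555-0100"},
    {"name": "Bob", "address": None, "phone": None},
    {"name": "Cy", "address": "9 Oak Ave", "phone": None},
    {"name": None, "address": None, "phone": None},
]


def completeness(rows):
    """Return the fraction of non-missing values for each column."""
    columns = list(rows[0].keys())
    return {
        col: sum(row[col] is not None for row in rows) / len(rows)
        for col in columns
    }


def nullity_matrix(rows):
    """Render one text line per row: '#' for present, '.' for missing."""
    columns = list(rows[0].keys())
    return [
        "".join("#" if row[col] is not None else "." for col in columns)
        for row in rows
    ]


report = completeness(rows)
picture = nullity_matrix(rows)

for line in picture:
    print(line)
print(report)

# With pandas and missingno installed, the graphical version would come from
# something like:
#   import pandas as pd
#   import missingno as msno
#   msno.matrix(pd.DataFrame(rows))
```

Even in this text form, the address and phone columns are visibly emptier than the name column, which is the kind of at-a-glance reading described above.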

Starting point is 00:06:26 And yeah, it's really nice. And I suspect you could, if you had data, say, in a database or a file or something, you could probably just read that into a pandas DataFrame and then throw it out here and visualize database missing data or file missing data or whatever. But it's really nice. Yeah, for large data sets, one of the things you've got to do is decide, when you're cleaning it up, what to do with the missing data. I mean, there's some Nones or whatever, and there are some strategies to either fill it in with interleaved data or something, or just throw those rows away completely. But, I mean, you don't really know how much data you're throwing away if you

Starting point is 00:07:00 without visualizing it. So this is pretty cool. I think this is great. Yeah, and it has other visualizations as well. It has heat maps, which are like correlations, you know, like the address and phone number correlated kind of things I was talking about. It has bar charts. And the most interesting or unique visualization is the dendrogram, which I had never heard of, but this is a hierarchical clustering algorithm from SciPy, actually, and it creates this kind of hierarchical tree of relationships of missing data. If you are worried about cleaning up data or stuff like that, or visualizing how good your data is, you could throw it to this real quick and get some great answers. Yeah, that's cool. Yeah. All right. Well, Guido, you have been busy with the Language Summit recently, right? What's the news there? Yes, well, normally the Language Summit basically is an in-person meeting where about 50 people who are mostly but not exclusively core devs get together a day or two before the actual Python conference.

Starting point is 00:07:59 Since the conference was canceled... This would have been in Pittsburgh, right? It would have been in Pittsburgh this year, right. Obviously, the conference was canceled and the language summit was too. And then the two organizers thought, well, okay, this sounds like the kind of meeting that we can actually try to do on Zoom. You can't have a whole conference on Zoom, but you can probably have a meeting with 50 people on Zoom. And they tweaked the format a bit so that, I mean, you can't be on Zoom for an entire day. I find Zoom incredibly intense. And after an hour of Zooming, yeah. Yeah. User interface sucks. Privacy probably sucks. But it clearly serves its purpose.

Starting point is 00:08:49 So we had it spread over two different days. And then in addition, because nobody was traveling to Pittsburgh, we spread it out in time. One day, it was really early for me so that we could also have participants from Europe. And one day it was really late for me so that we could have some people from Australia join us. One of the organizers lives in Poland and he was there till the end on both days. So I don't know how he slept. Yeah, so as usual, the format wasn't actually all that different. It's typically like half hour slots for various topics that are important to either get information to core devs and usually also get feedback from core devs. And we pretty much stuck to that format.

Starting point is 00:09:41 The one big thing that you miss, of course, is all the whispering to the guy who was sitting next to you or during the break, quickly grabbing three other people and having a little huddle about a topic. Yeah, that's what's so powerful about in-person conferences. Yeah, we missed the entire hallway track, but it was still good to have sort of short presentations and Q&A sessions, and the Q&A sessions actually worked really well. There was a little tool that you can use to sort of moderate questions, and Łukasz was running the moderation tool, and nobody was asking spam questions, so all he had to do was just click OK for every question, I think. Yeah.

Starting point is 00:10:30 That tool is much more structured than the chat channel on Zoom could be. And sort of raising your hand on Zoom and waving doesn't really work if there are 50 people, because there's no way to see more than 16 people or so at a time. Yeah. So anyway, the first day, each day, there were like maybe five topics and a few miscellaneous things. Shall I just go over each day briefly, see if I can sort of run them all off? Yeah, I would say just maybe touch really quickly on just the things that you felt like really might make an impact going forward, potentially.

Starting point is 00:11:01 Just a one-liner: the guy who originally implemented f-strings gave a talk about whether maybe all strings should become f-strings, and the general sentiment was that that would have been nice in Python 1.0 or so, but there is no way; it would just break too much code. It's going to break too much. I totally hear that, though, because so often I'm typing in a string and I'm like, oh, I need to put a variable here, but I've typed 20 characters in that. I've got to go back to the beginning, but not the beginning of the line. I've got to get to the beginning of the string and then go... Maybe we could even put the f at the end.

Starting point is 00:11:36 Who knows? But yeah, I would love to see it, but I totally understand. You can't do that without breaking stuff. There are downsides to automatically doing it too, because curly braces are useful for all sorts of things besides formatting. So that was sort of the opening salvo. Then my two co-conspirators on the PEG parsing project gave a talk about how we're going to hopefully introduce a new parser in Python 3.9. And we've been coding for like almost a year now, probably. It started out as a little hobby project of mine and gradually became more serious and more people started helping out.
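Editor's sketch: on the earlier point that curly braces are useful for things besides formatting, the snippet below shows a few ordinary strings whose braces would be reinterpreted if every string literal behaved like an f-string. The variable names and sample values are hypothetical.

```python
import json
import re

name = "Guido"

# Today a plain string keeps its braces, so it can act as a template
# that is filled in later with str.format().
template = "Hello, {name}!"
later = template.format(name="Brian")

# An f-string interpolates immediately from the surrounding scope.
eager = f"Hello, {name}!"

# Braces that are not placeholders at all: a JSON snippet and a regex
# quantifier. If all strings were f-strings, these braces would be read
# as replacement fields instead of literal text.
json_text = '{"answer": 42}'
pattern = r"\d{3}-\d{4}"

print(later, eager, json.loads(json_text), bool(re.fullmatch(pattern, "555-0100")))
```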

Starting point is 00:12:19 And the last few months we've been doing heavy engineering work to actually prepare for the integration. But we didn't have steering council approval yet. We made it a PEP and we sort of said, well, this is a nice thing, but we're not going to do this unless there is sort of clear consensus or at least general agreement that we are going to do this. And so very soon after the summit, the steering council actually had a meeting and approved a bunch of PEPs, and ours was one of them. And then the last two days, I've been stressing out because we wanted to get the new parser in the Alpha 6 release, which is going out tomorrow.

Starting point is 00:13:02 And so we're now in the very last stretches of preparing for Alpha 6, and we're just deleting or disabling tests that are still failing, where we know how to fix them but we just don't have the time. Right. That's exciting that this project is going to be in there. That's great.

Starting point is 00:13:18 Yeah. So that's the new parser. And if all goes well, nobody will notice a thing. Ideally. What are the effects? Is it going to speed things up or make things more maintainable? It's going to sort of open up the grammar for future changes to the language that we currently can't do because the old LL(1) parser holds us back.

Starting point is 00:13:42 Okay. That's sort of the main motivation. Super. There was one interesting talk about something called HPy, which is a proposal for a new, more portable API, focused in particular on Python implementations other than CPython. As you may know, PyPy has been struggling for over a decade with

Starting point is 00:14:07 compatibility with extension modules. And the HPy proposal is basically instead of pointers to objects, you have handles, which is a pointer to a pointer to an object. And there's a whole API around handles that is equivalent to the existing API, but it allows different styles of garbage collection. For example, you could implement a garbage collector that moves objects behind your back occasionally. Right, you might get a generational compacting garbage collector because you could update the value of the pointer pointer without changing the actual pointer, right? Yeah, yeah, that's actually really exciting. Yeah. And it's still in early stages, I believe. But it looks pretty promising. Eric Snow gave a lightning talk about a sort of a retrospective of all his work on multi core support, which is

Starting point is 00:14:58 now beginning to conclude. Well, maybe it's too soon to call it a conclusion, but we're going to have subinterpreters with a much better API, either in 3.9 or in 3.10. There's a PEP around that, PEP 554, which will definitely be moving forward. But whether it's considered mature enough to land in 3.9 is not entirely clear. Yeah, Eric's work is very interesting there. Yeah. And in 3.10 we will probably have separate GILs per subinterpreter. That is going to be a major new thing. Let's see, what else do we have? Well, so the next day I gave a talk about the future of typing, which... Oh yeah, there's one detail. You might remember that we introduced something called from __future__ import annotations, which made it so that annotations

Starting point is 00:15:52 are no longer evaluated at runtime. You can still introspect them, but you'll just get the string containing the annotation expression back. Well, that's going to be the default in 3.9, most likely. There's still a little debate about that, but there was like a two-thirds preference for just making that the default in 3.9, and various people argued effectively that nobody should notice any difference. I'm really excited or happy to have typing in the language. It makes such a difference for the right use case, you know, on defining the boundary of APIs or making the editor understand something better when it otherwise wouldn't. If you're maintaining tens of thousands of lines of Python code or more, type annotations really make a difference. Yeah, for sure. I still don't recommend

Starting point is 00:16:42 teaching them to beginners, though. Oh, really? Okay. It depends on what kind of beginners you have. If they're sort of recuperating Java programmers, maybe you should introduce them. But if they're like actually blank slate, this is the first time they're programming ever, I wouldn't bother them with annotations. Yeah, I kind of agree with that. Yeah. available time to actually implement the design. And I'm sure that when you're halfway through implementation, all sorts of interesting issues with the design will crop up. So the design is not final until it's been implemented.
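Editor's sketch: the behavior of `from __future__ import annotations` described earlier in the typing discussion can be observed directly. The function `greet` is hypothetical, and its source is executed from a string only so that the `__future__` import sits at the top of its own module, as Python requires; in a real file the import would simply be the first statement.

```python
import typing

# Hypothetical module source with postponed evaluation of annotations.
source = '''
from __future__ import annotations

def greet(name: str, times: int = 1) -> str:
    return ("Hello, " + name + "! ") * times
'''

namespace = {}
exec(source, namespace)
greet = namespace["greet"]

# The annotations are stored as the source text of each expression...
raw = greet.__annotations__

# ...and typing.get_type_hints() evaluates them on request.
resolved = typing.get_type_hints(greet)

print(raw)
print(resolved)
```

Code that only reads `__annotations__` sees strings, while code that needs the actual types can ask `typing.get_type_hints` to resolve them.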

Starting point is 00:17:34 Okay, last two topics. Zac Hatfield-Dodds gave a very good talk about what he calls property-based testing, which really is about the tool named Hypothesis, which introduces a testing approach that I think was first developed in academia for Haskell and that works in a completely different way than your typical unit-test-based testing. Right. The tool decides, right? Instead of examples. The tool generates test cases, and I've never played with it myself, but the talk sort of made me very excited to play around with it more.

Starting point is 00:18:16 And it actually, even though it's a very different approach than unit test or PyTest based testing, it will still integrate with that. I mean, you can write a unit test and then put some decorator on top of it that produces test data. And Hypothesis has all kinds of really advanced stuff for exploring enormous spaces of possible input data and quickly finding bugs. Do you think we'll get to a place where we are able to use Hypothesis for some of the testing for the standard library?
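Editor's sketch: the idea of property-based testing can be hand-rolled with the standard library, as below. This is not Hypothesis itself; the encoder, decoder, and `check_round_trip` helper are hypothetical. With Hypothesis installed, the same property would typically be written as a test function decorated with `@given(st.text())`, and the tool would also shrink any failing input to a minimal example, which this sketch does not do.

```python
import random


def run_length_encode(text):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=500, seed=0):
    """Property: decoding an encoded string gives back the original string."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        # Instead of hand-picked examples, the inputs are generated.
        assert run_length_decode(run_length_encode(text)) == text, text
    return trials


checked = check_round_trip()
print(f"property held for {checked} generated inputs")
```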

Starting point is 00:18:52 That was one of the propositions that Zac made. I think it's still early for that. I think it's much easier to introduce Hypothesis in sort of a new project where you haven't yet written all the code and all the tests than it is to retrofit it in a large, mature, or maybe even somewhat dementing project. we'll have Hypothesis-based testing for the standard library, just like it'll be a while before we'll have annotations in the standard library rather than annotations sort of separate from the standard library. The last talk I want to highlight, and then I'm really done with this, is also a very good talk by Russell Keith-Magee about the state of BeeWare and Python for mobile. And one of his suggestions was that we adopt some of his mega patches that he's currently been maintaining for several Python releases that would make Python at least compile out

Starting point is 00:20:01 of the box or nearly out of the box for the important mobile platforms. That'd be cool. Yeah, it'd be so wonderful to have Python as an option for mobile. It really would bust open the doors and create even more growth. Many people believe that sort of mobile platforms are obviously continuing to grow in importance and to grow in power. And we'd be crazy if we didn't support Python on those. And it may be very important for Python's very survival.

Starting point is 00:20:30 Yeah. Yeah. I saw the Black Swan talk that Russell Keith-Magee gave, and it was compelling. He is an amazing speaker for sure. Yeah. Yeah. That's what I have.

Starting point is 00:20:38 Great. Thank you so much for that insight. That was, that was awesome. A lot of people don't get to see the behind the scenes. They just see what's announced when it comes out, right? Before we move on, let me tell you about our sponsor, Datadog. This episode is brought to you by Datadog.

Starting point is 00:20:51 So let me ask you a question. Do you have an app in production that's slower than you like? Is its performance all over the place, sometimes fast, sometimes slow? Now, here's the important question. Do you know why? With Datadog, you will. You can troubleshoot your app's performance with Datadog's end-to-end tracing, use detailed flame graphs, identify bottlenecks and latency in that finicky app of yours. So be the hero that got the app back on track at your company. Get started

Starting point is 00:21:14 with a free trial over at pythonbytes.fm slash Datadog. Get a cool t-shirt as well. Brian, you've got another one that kind of ties into your first one, right? But it's sort of the other side of the coin, maybe? I don't know what's been happening in the Python world that you sort of orbit in that might make you think about these things, but tell us about it. No, I've just been thinking about community and codes of conduct and enforcement for codes of conduct. No reason, really, just kind of an interesting topic. One of the things I've been thinking about is, especially when researching this, the codes of conduct and enforcement of it and how we treat people. I first thought it was really important for open source projects.

Starting point is 00:21:51 And it definitely is because people have the option to just leave and get out of the project. So you really want to treat people well so they stick around and have it be welcoming to other people. But I don't think industry is really that different. I think that people have the ability to just get another job or work on a different project. So I think these are important for industry as well. I took a look at two sets of codes of conduct and the enforcement of those. So the PSF has a code of conduct.

Starting point is 00:22:19 I'm not going to read them all out, but there's things like being open and being friendly. And in there, there's a list of inappropriate behaviors as well that's covered. Now, the Django code of conduct also has all of these. There are differences, but when you read them, they kind of sound the same. One of the things they highlight in the Django one is be careful with your choice of words, and they include examples of harassment, speech, and exclusionary behavior that's not appropriate.

Starting point is 00:22:54 One of the big differences I saw was the enforcement. So the PSF is a two-thirds majority vote enforcement sort of thing to make sure if something happens, like if they want to kick somebody out or put them on probation or something. I think that's really important because if you require 100% majority and somebody who is on the team that decides is potentially part of the problem, then what do you do, right? It's really tricky. I mean, if people are just going to abandon a project, right, you would rather have just a strong majority make a decision. I also think that the PSF has probably got a larger working group on this, and it's, I guess, maybe harder to get a hold of people. Maybe it's easier to get two-thirds when maybe you can't even

Starting point is 00:23:36 reach all 100% of the group. But anyway, the other interesting difference is the PSF code of conduct seems to, I know it does cover online interaction as well as events like the conferences and meetups and stuff. But possibly, at least I think that maybe its focus might be more on events, whereas the Django code of conduct is specifically targeted towards online interactions. I would say for the PSF that sort of historically, events were the first place where codes of conduct were introduced, but we've been using them for online forums more and more in the past few years. Okay.

Starting point is 00:24:20 One of the interesting things with the Django one is that a single person on the committee can act without collaborating with anybody else. If it's an ongoing problem or if there's a threat involved or something, they still have to go through the process of notifying everybody else. But there is an interesting thing that one person on the committee can intervene right away. I'm not saying one is better than the other, or I just think it's interesting. And I think it's important for new projects to think about not just their code of conduct, but how they're going to enforce it and what the timeline. So the Django one also includes some timelines, which is interesting. And I would really like to make sure that projects kind of practice, maybe figure out what they're going to do if they need

Starting point is 00:25:05 to enact one of these things, you know, before it becomes a problem, so they know what they're going to do. Yeah, there's a lot of stuff going on with some projects out there. So having a couple of examples and side-by-side comparisons, I think, is great. I was interested to find out our meetup, like the Python meetup that we started, which is on hold right now, unfortunately, because of the virus and quarantine and stuff, but because we were getting support from the Python Software Foundation to help pay for the meetup fees and stuff, we had to list a code of conduct on our meetup page and stuff like that. Yeah, that makes a lot of sense, but I didn't realize that. Yeah, the PSF has been doing that for a few years now. Yeah, that's really great. All right, this next one I want to cover.

Starting point is 00:25:46 It goes back a ways, but I think it's really fun. And it's something that also, I think, ties together well with our special guest here. And this is an article about myths about indentation. And Guido, I picked this one because you were talking about this on Twitter just the other day. What was the motivation to throw that out there? That is a good question. I was just going to volunteer the answer because apparently I had a link to that article on my homepage in some odd corner. And I have a very, very sort of ready old homepage. I've moved it to GitHub Pages, but it looks like web 1.0. And because it really

Starting point is 00:26:23 is, I just added raw HTML. It blends in right with Netscape, huh? So someone reported to me a broken link, which happens like, I don't know, once every four years or so. Someone reported a broken link. Oh wait, it wasn't even on my home page; it was on an old blog that I can no longer edit at artima.com. I'm very glad that that blog is still online. But so because I got the report of the broken link, I decided, oh, I'm sure I can still find on archive.org where that link used to point. And sure enough, it was there. And I thought, oh, that's actually still a neat little article.

Starting point is 00:27:02 So I thought, okay, tweet of the day or tweet of the week. Yeah, I agree, and I think it's interesting as well. And just to give you a sense of why it might have disappeared, it was one of those types of sites where the domain or the URL included a tilde-username path, like you used to get in university or whatever way back when. So anyway, this one is myths about indentation for Python. And for people who come from a C-oriented language, I think Python could come across a little bit funky. I actually want to share a little story of just sort of my journey with it and how I came to love this. But I think this is really interesting for people having the debate about is significant white space

Starting point is 00:27:42 useful? Is it weird? Is it good? I did a ton of C++ and then C# development, and then JavaScript development. It was all about the curly brace languages, lots of symbols. And then I came to learn Python, and I loved Python right away, but it was weird to me. I felt kind of naked. Like if I'd write an if statement, I'm like, I need some little parentheses to kind of hold the code in place, and why don't they need to be there? And I need a curly brace to say when this block of code is done and whatnot. It just took a little bit of getting used to. But I knew that it was the right thing for me because when I went back to work on some older projects, I'm like, why are there symbols everywhere? What is all this stuff I

Starting point is 00:28:18 have to keep typing? This is like a broken language. And it just took a couple of weeks for me to make that switch, to feel like it was broken to go back to work in languages I'd been doing for like 10 years. So well done with the white space, Guido. Thanks. Yeah, but so let's cover some of the things mentioned really quick in the article. One is that white space is significant in Python source code, and actually, no, not in general, is the answer. It's significant on the left. Right. So as much as you indent stuff, that really means things, but between variables, like whether you have a=7 or a = 7, doesn't matter. You can have tons of spaces in there, right? Like any other language, spaces kind of don't matter except for on the left. So that's cool. And also the amount of indentation doesn't really matter, right?

Starting point is 00:29:06 You could have five spaces for any code suite that you want, or you could have 18, or you could go with a standard four. I recommend the four, but you know. And then also if you have something that defines like a list comprehension or an array creation or a dictionary, then all of a sudden the spacing doesn't matter anymore, right? As soon as you have like an open square bracket and then you have a bunch of stuff and then close square bracket, spacing doesn't matter in there. So I think this is interesting to think about as folks debate that maybe within their teams.

Starting point is 00:29:35 It also, you could say it forces you to use a certain indentation style. Well, yes and no. If you wanted to write it single statement per line, then yeah. There's a cool example that they gave in the article, which is like: if 1 + 1 == 2, then new line, print foo, new line, print bar, new line, x = 42. You can also put them on one line with semicolons. If you're really missing your semicolons from your language, you could do that. The thing that's interesting here, I think this is probably the most significant part of this article or this write up is if you look at it, it looks right. And when it gets parsed, it is right.
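Editor's sketch: the points about indentation width, spacing inside brackets, and semicolons can be checked with the standard `ast` module. The source strings below are hypothetical stand-ins modeled on the article's example, and `same_structure` is a helper invented for this sketch.

```python
import ast

# The same program with different indentation widths and spacing around "=".
four_spaces = "if 1 + 1 == 2:\n    x = 42\n"
seven_spaces = "if 1 + 1 == 2:\n       x=42\n"

# Inside brackets, line breaks and indentation are free-form.
bracketed = "values = [\n 1,\n        2,\n   3,\n]\n"

# Several simple statements on one line, separated by semicolons.
one_liner = 'if 1 + 1 == 2: print("foo"); print("bar"); x = 42\n'


def same_structure(a, b):
    """True if two sources parse to the same syntax tree (positions ignored)."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(same_structure(four_spaces, seven_spaces))

bracket_ns = {}
exec(bracketed, bracket_ns)
print(bracket_ns["values"])

one_liner_ns = {}
exec(one_liner, one_liner_ns)
print(one_liner_ns["x"])
```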

Starting point is 00:30:10 There's an example of some C code that looks visually wrong because it's indented differently, but it's going to parse. But the way you see it when you read it is not what's actually happening. And I think there was a problem like this. Well, I think it was in some, either Objective-C... It was something with Apple in there. It was really bad. There was an infamous Apple vulnerability, I think it might even have been on the iPhone, where someone had added a second statement to a block, but it wasn't a block because there were no curlies. Right. It started out with a single conditional line, like if something, indent, do the thing. And then they just indented, but they didn't put the curly braces in. And it took so long for people to find it because visually it looked like what Python would actually mean, right? It looked like those two things were part of the if block,

Starting point is 00:31:03 but because the white space didn't matter, it actually didn't. And so that's really interesting. I'm not going to go through everything. I'll put it in the show notes. But another one that I thought is like, I just don't like it. And that's fine. People can not like it, but it has a lot of advantages. Like in that example before, if you had that wrongly indented Python code, it would not parse. It's an error to have it not look right. And rather than just not be right. So it has a lot of advantages and people can really quickly get used to not having to write all those symbols. And then you go back and you're like, this code is hard to read. It's just full of curly braces, semicolons, parentheses everywhere. I always thought we used to, those were just, that is what builds programming languages. To have a programming language, you had to have that.
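Editor's sketch: the contrast with the C example can be shown by asking Python to parse two small sources. In the first, the second indented statement really is part of the `if` block, so the structure matches what the eye sees; in the second, the indentation matches no enclosing block and the parser rejects it. The sources and the `parses` helper are hypothetical.

```python
import ast

# Two statements indented under the if: both belong to the block.
well_formed = "if True:\n    print('foo')\n    print('bar')\n"

# The second statement is indented to a level that matches no enclosing block.
misaligned = "if True:\n    print('foo')\n  print('bar')\n"


def parses(source):
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


block_length = len(ast.parse(well_formed).body[0].body)
print(block_length, parses(well_formed), parses(misaligned))
```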

Starting point is 00:31:43 And then once I experienced Python and I went back, it kind of broke my mental model of the world. I'm like, you don't actually have to have those things, so why are they there? Anyway, what do you think about this article? You must like it somewhat because you hunted it down and tweeted it, right? It's all news for me because I didn't even invent the white space thing for Python. That was sort of handed to me on a silver platter by one of my mentors in the early 80s. Yeah, back in the ABC days. And in those days, it was an innovation.

Starting point is 00:32:13 There was like one other language that had this and Knuth had once said that he thought it would be a good idea, but he had never actually implemented the language or even experienced the language that implemented it. He just thought that it would be a good idea. Right, right. The only thing that was a stumbling block for me was when I first started looking at Python, the editor I was using, I think it was an Emacs something at the time. I'm not sure what I was using. But with the C++ code I was using, I had it set up so that if I double-clicked

Starting point is 00:32:45 on the closing bracket, it would jump to the top of the block. And I really liked that feature. And for some reason, that's the reason why I didn't like the white space thing at first. Like, how do I get back? But then I just went, okay, I'm going to, like, beginner's mind,

Starting point is 00:33:00 just open mind, just embrace it and learn it as a new thing. And like a week later, I didn't even miss it. So yeah. And of course the newer editors like PyCharm and stuff, at the bottom they have little breadcrumbs of, you know, here's the class, here's the function, here's the if, here's a while, whatever. And you can jump between them just like you were talking about, but like the entire hierarchy of, I don't know, the tokens or whatever. Yeah, and I tend to write smaller functions now, so it's not as much of a deal.

Starting point is 00:33:28 This is probably a good thing that it was hard. I was thinking that if you needed the editor to help you find the top of the block, it must be pretty far away. It's 4,000 lines. I hate scrolling so much. These functions are hard. How interesting. All right. Guido, do you have one more you want to share with us? Well, yeah, you gave me some homework. I didn't really do it, but there's like, and of course, this has to do with parsing. And so this may be a fairly esoteric library.

Starting point is 00:33:55 But if you're writing a program that sort of does some manipulation of your code, and maybe it converts four-space indents to two-space indents or three-space indents or whatever, or maybe you're writing something like Black, which is the sort of Python code reformatting tool, but you don't like the way Black handles certain things, or maybe you're writing some other thing that does analysis of source code. Maybe you're writing a linter. There are a couple of tools that you can use. And it turns out that one of them is in the standard library.

Starting point is 00:34:35 There's something called lib2to3, which is a little hard to pronounce. It has the digit 2 and then the word T-O and then the digit 3 in the name. That is tricky. That is something I wrote probably over 15 years ago, or at least the core of it, which is yet another LL(1) parser, but this one's written in Python rather than in C, like the original one.

Starting point is 00:35:00 And actually, Black ended up using lib2to3, except I think Lukasz had one issue that he couldn't figure out how to do with Black, and so he ended up vendoring a copy of lib2to3 and then butchering it a little bit, which is how these things happen. I mean, if you look at what pip vendors, that's pretty scary, but there are good reasons for that too. But if you're writing your own, you should probably not use lib2to3, and not just because it's going to go out of style once the PEG parser arrives. There are much better tools, and the one that I discovered a few months ago is actually written by some folks at Facebook, mostly. It's called LibCST, and they have unique capitalization. It's a capital L, then lowercase i b, and then CST is all

Starting point is 00:35:55 uppercase. And so it's a library for manipulating concrete syntax trees. And like lib2to3, it actually shares some code with lib2to3. I think underneath is a parsing library called parso, which itself is a butchered version of lib2to3, at least that's how it started. These tools are things that can parse Python code, but they produce a syntax tree that is the opposite of an abstract syntax tree. It's a very concrete syntax tree. And that means that every space, every comment, every bit of indentation is preserved or at least can be recovered from the information in that syntax tree. And contrast that with the typical abstract syntax tree

Starting point is 00:36:48 which in the end doesn't even remember where the parentheses are. Right, right. It just takes us, well, here's some conditional statement. Here's the two things we're testing, right? So this sounds much more useful if you want to do like a code analysis type of thing to say this thing you're doing here,

Starting point is 00:37:04 you should do it in this other way or transform it over, but kind of preserve things like comments and style. Yeah. And so LibCST has a really sort of solid underlying model. And they thought a lot about various transformations they want to apply, because the typical way these tools work, and lib2to3 itself started out that way as well, is you read your source code using this customized parser. It gives you a concrete syntax tree. Then in that syntax tree, you're actually going to systematically rename a parameter or move things around or insert. In the 2to3 world, of course, it's used to turn things like iteritems into items and iterkeys into keys.
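
To make the contrast concrete, here is a rough sketch of the `iteritems` to `items` rename done with the standard library's abstract syntax tree (the `ast` module), not with lib2to3 or LibCST. The sample source and the `RenameIterItems` class are invented for illustration, and `ast.unparse` needs Python 3.9 or newer. The rename works, but the comment and the redundant parentheses do not survive, which is exactly the information a concrete syntax tree keeps.

```python
import ast

SOURCE = (
    "for key, value in (settings.iteritems()):  # walk every option\n"
    "    print(key, value)\n"
)


class RenameIterItems(ast.NodeTransformer):
    """Rewrite `<expr>.iteritems` attribute accesses to `<expr>.items`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)  # handle nested attribute chains first
        if node.attr == "iteritems":
            node.attr = "items"
        return node


tree = RenameIterItems().visit(ast.parse(SOURCE))
rewritten = ast.unparse(tree)  # regenerate source text from the abstract tree

print(rewritten)
```

The regenerated text calls `.items()` as intended, yet the trailing comment is gone, so a tool that must preserve the author's formatting cannot be built on this approach alone.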

Starting point is 00:37:54 And you can make that kind of change. And so LibCST also supports that. It sort of has a slightly better API because 15 years ago when I started lib2to3, I didn't realize what an important tool it was going to be. And some of the way the white space is attached to nodes is exactly backwards from the way that is the most convenient to think about it and work with it. All right, cool. Well, this sounds like it'll be really helpful

Starting point is 00:38:22 for people building tools like Black or looking at code analysis and stuff. Right. Lukasz had a, I think it was the 2019 talk, PyCon talk, where he described how Black uses both concrete syntax trees and the abstract syntax tree. It's a pretty fascinating talk for a very low level depth into these concepts. It wasn't until I watched that talk that I realized that Black compares the before and after abstract syntax tree

Starting point is 00:38:50 to make sure that your code is guaranteed to run the same so you don't really have to test for that. He's already testing for it. So that's pretty interesting. Yeah, that's very cool. That is a very neat feature. And it's actually an important trick in general for people who are doing transformations to have some abstract way of double checking that your transformation left things in a decent state.
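
The general trick can be sketched with the standard `ast` module: parse the code before and after the transformation and compare the two trees. This is a simplified illustration of the idea, not Black's actual implementation; the `same_ast` helper and the sample strings are hypothetical.

```python
import ast


def same_ast(before, after):
    """Return True if both source strings parse to structurally identical ASTs."""
    # ast.dump omits line and column numbers by default, so layout differences vanish.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = { 'a':1,\n      'b':2 }\n"
reformatted = 'x = {"a": 1, "b": 2}\n'  # only whitespace and quote style changed
broken = 'x = {"a": 1, "b": 3}\n'       # a value changed, so behavior changed

print(same_ast(original, reformatted))  # True
print(same_ast(original, broken))       # False
```

A formatter can run a check like this after every rewrite and refuse to write the file if the comparison fails.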

Starting point is 00:39:16 Yeah, it's cool. Yeah, very cool. All right. Well, thanks for LibCST. We know that's a great one. Now that's it for our main topic. So just really quick things at the end that I just want to throw out there for people. One, Adam, who goes by Codependent Coder on Twitter, sent a message over and said, hey, Django no longer supports Python 2 at all, which is pretty awesome because 1.11 has left long-term support, leaving only 2.2.12 onward, which has only Python 3 support.

Starting point is 00:39:43 So yay for modern Python making its way through. That's good. And then last time we talked about 90% of coding is Googling and that's okay, or it's not. And we didn't really feel like that was our experience, right? As people who have been around for a while. But I got to tell you, this last week, I've been doing nothing but pandas, Altair visualization, Jupyter notebooks, and graphics because I'm building a whole set of dashboards for the Talk Python courses and whatnot. Basically, the dashboards that I should have built a while ago. I Googled a lot. A whole lot. But that's the thing. It was like a two or three

Starting point is 00:40:19 day blip of like, wow, I'm Googling 25-30% of my time because I don't know anything about these things, and how do I get this thing to line up with that bar? But now I'm back to just kind of mostly not doing that anymore, even after a few days. So I think generally what we said is true, but I do think there's like these blips of like, wow, I'm diving into something new, it's like mad search scrambling, but then I'm back to sort of using like more memory coding. I don't know what you call not-Google coding. Yeah, you gotta understand what you're doing, and that means you can't just Google for examples and copy and paste them in, because then you combine the examples and you have no idea what you're doing, and of course it doesn't work. At best, it's frustrating, right?

Starting point is 00:41:01 You're like, this worked, that worked, but together they don't work. And you just don't even know why, right? Yeah, so for sure. But yeah, so anyway, it's a follow-up on our conversation last week. Brian, what do you got to throw out there for everyone? I'm going to say this on this show just to make sure I do it. There's like three days left for me to record my talk. Yeah, this is like forcing yourself to commit to it, so you're going to do it? Yes, definitely. So PyCon talk, I really do want to get it online. It's important stuff.

Starting point is 00:41:29 It's about parameterization. I talked a couple episodes ago about having trouble switching back and forth at home with all this working from home stuff between Mac and Windows. I finally figured out the whole using command and control. So thank you to everybody. But apparently there's this really simple thing. Apple lets you just swap them on a keyboard, so that's what I'm doing and it works great. And then also I had promised that I was going to have my cards project be able to work and publish to PyPI or the test

Starting point is 00:41:57 PyPI. It doesn't work with setuptools_scm because I'm using Flit. So if somebody's got a way to figure out how to just somehow change the version string or bump that every time you merge or something like that, that'd be great. But otherwise, right now I don't think there's a way to automatically push to PyPI if you're using Flit.

Starting point is 00:42:19 Because it says that one's already uploaded. Maybe there's a GitHub action that will just randomize that or something. Because the version is embedded in the source code. And the trick that people are using with setuptools is the version is based on the version in GitHub. And you can't do that with Flit. So at least I haven't figured it out.

Starting point is 00:42:37 But that's okay. I'll probably do something else. That's my extras. Guido, anything else? Even though I said it's hard to imagine PyCon going online, it actually is going online. At least some of it is. The first talk by the conference chair, Emily Morehouse,

Starting point is 00:42:55 has been posted and many more will follow. Yeah, her welcome was really nice. The other thing, and as you mentioned, Django no longer supports Python 2 at all. Well, that's just fine because the very last release of Python 2, 2.7.18 was released a few days ago. Yeah, that's great. That must be kind of a load off of your shoulders

Starting point is 00:43:16 to finally have that in the rear view mirror. I'm very happy and I'm sad, of course, that we can't have an absolutely wild and crazy party in Pittsburgh like we were planning. Yeah, a big celebration on Zoom. It's just not the same. Just have to have a bigger one next year. That's one I don't know how to pull off. Well, that's really good.

Starting point is 00:43:36 All right. You guys ready for a really quick joke? All right. So here's a quick joke sent to us by Derek Chambers. And he may have even made this up for us. This goes back to the sub-interpreters and the multiple GILs and all that. You guys know how you can borrow money concurrently? With async IOUs.

Starting point is 00:43:57 That's a terrible joke. That's a bad joke. Oh, that is very groan-worthy. Very groan-worthy. Excellent. Most of our jokes actually are around here, but that's how it goes. Yeah, and keep them coming. Keep sending us your bad jokes.

Starting point is 00:44:10 Yeah. That's right. That's right. Python dad jokes, that should be a whole separate category. They absolutely should. They should. Well, Guido, it was really an honor to have you on the show. Thanks for coming and sharing your perspective on all this.

Starting point is 00:44:22 Glad to be back. Yeah, and Brian, thanks as always. Good to be here with you. Cheers. Yep. Bye, everyone. Bye. Thanks, both of you. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes

Starting point is 00:44:38 at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.


Topics covered in this episode: New governance model for the Django project missingno Announcements from the language summit. Codes of Conduct and Enforcement Myths about Indentation Parsers and Li...bCST Extras Joke See the full show notes for this episode on the website at pythonbytes.fm/179
