Python Bytes - #191 Live from the Manning Python Conference

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 191, recorded July 14th, 2020. I'm Michael Kennedy. And I'm Brian Okken. And welcome special guest, Ines. Hi. It's great to have you here. So I want to kick this off with a cool IoT thing.

Starting point is 00:00:19 Now, IoT and Python, they've got a pretty special place. Because when I think about Python, I think of it as not being something that sort of competes with assembly language and really, really low level type of programming for small devices. But amazing people put together MicroPython, which is a reimplementation of Python that runs on little tiny devices. And we're talking like $ dollar microchip type devices, right? Have either of you played with these? No, I haven't, but I've been seeing a bit of this from my brother. He's pretty amazing. He's a bit younger than me, he's an event

Starting point is 00:00:54 technician, and he recently taught himself programming and everything just so he can build stuff on these tiny Raspberry Pis. I don't know, he's doing super advanced stuff. It's been really interesting to see him learn to program, and he's also incredibly good. He has amazing instincts about programming even though he's never done it before. So I've been kind of watching this from afar, and it made me really want to build stuff. So I'm very curious. Yeah, I've done the CircuitPython on some of the Adafruit stuff. Exactly. So I always just want to build these things.

Starting point is 00:01:25 I'm like, what could I think of that I could build with these cool little devices? I just, in my world, I don't have it. Maybe if I had a farm, I could like automate, you know, like watering or monitoring the crops or if I had a factory, but I just don't live in a world that allows me to automate these things.

Starting point is 00:01:41 Do you have pets? Maybe you can build something for pets. We generally don't have pets, but we are fostering kittens for the summer. So I could put a little device onto one of the kittens potentially. GPS tracker. Yeah. So in general, you have to get these little devices, right? At the US PyCon, we got the Circuit Playground Express, which is that little circular thing. It's got some 10 LEDs and a bunch of buttons and other really advanced things like motion sensors and temperature and so on. Probably the earliest one of these that was a big hit was the BBC micro:bit, where I think every seventh grader in the UK got it.

Starting point is 00:02:23 Some grade around that scale got one of these. And it really made a difference in kids seeing themselves as a programmer. And interestingly, especially women were more likely to see programming as something they might be interested in, in that group where they went through that experience. So I think there's real value to work with these little devices, but getting a hold of them can be a challenge, right? You've got to physically get this device. That means you have that idea of, I want to do this thing and then I have to order it from Adafruit or somewhere else and then wait for it to come. And my experience has been, I'll go there and I'm

Starting point is 00:02:53 like, oh, this is really cool. I want one of these. Oh, wait, no, it's sold out right now. You can order it again in a month, right? So getting it is a challenge. And also if you're working in a group, say you want to teach a high school class or a college class or something like that, and you want everyone to have access to these, well, then all of a sudden, the fact that maybe it costs $50 wasn't a big deal. But if it's $50 times 20 or 100 kids, then all of a sudden, well, maybe not. So I want to talk about this thing called Device Simulator Express. So this is a

Starting point is 00:03:26 plugin or extension or whatever the things are, I think it's extensions that VS Code calls them, that makes VS Code do more stuff. And it's an open source, free device simulator. So what you can do is you just go to the Visual Studio Code extensions thing and you type device, which is probably sufficient, but Device Simulator Express, and it'll let you install this extra thing inside of VS Code that is really quite legit. So it gives you a simulated Circuit Playground Express, a simulated BBC micro:bit, and the most impressive to me is the CLUE from Adafruit, which actually has a screen that you can put graphics on. So really, really cool way to get these little IoT devices with CircuitPython. So Adafruit's fork of MicroPython on there. What do you guys think? See that picture? Look how cool that is.
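As a rough illustration of the kind of code.py one might try in such a simulator, here is a minimal sketch that lights more of the Circuit Playground Express's 10 LEDs as the temperature reading rises. The temperature range, the colour, and the function names are assumptions made for this example, not anything from the episode. The import in `run_on_device` follows Adafruit's Circuit Playground library; the simulator may instead expect the older `from adafruit_circuitplayground.express import cpx` form, so treat that line as an assumption too.

```python
# Sketch: light more of the 10 NeoPixels as the temperature rises.
# The thresholds below are illustrative assumptions.

NUM_PIXELS = 10               # the Circuit Playground Express has 10 LEDs
COLD_C, HOT_C = 15.0, 35.0    # assumed temperature range mapped onto the ring


def pixels_for_temperature(temp_c: float) -> int:
    """Return how many LEDs (0..10) to light for a temperature in Celsius."""
    fraction = (temp_c - COLD_C) / (HOT_C - COLD_C)
    fraction = max(0.0, min(1.0, fraction))   # clamp to the range [0, 1]
    return round(fraction * NUM_PIXELS)


def run_on_device() -> None:
    """Main loop for a real or simulated board; call this from code.py."""
    import time
    # Hardware library: only importable on CircuitPython or in a simulator.
    from adafruit_circuitplayground import cp

    cp.pixels.brightness = 0.1
    while True:
        lit = pixels_for_temperature(cp.temperature)
        for i in range(NUM_PIXELS):
            # Orange for lit pixels, off for the rest.
            cp.pixels[i] = (255, 40, 0) if i < lit else (0, 0, 0)
        time.sleep(0.5)
```

Keeping the mapping in a plain function means it can be checked on a laptop without any board, while `run_on_device()` is the only part that needs the hardware library.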

Starting point is 00:04:19 Yeah, so you can write Python in one tab and then just have the visualization in the other. That's pretty cool. Yeah. Yeah, exactly. And it's very similar to, say, what you might do with Xcode and iPhones, where you have an emulator that looks quite a bit like it or what you would do on the Android equivalent. I actually think this is a little bit better than the device because it's actually larger,

Starting point is 00:04:40 right? Like the device is really small, but here it could be like a huge thing on your 4K monitor with a little CLUE device. So you can simulate Circuit Playground Express, BBC micro:bit and the CLUE in here, and we just say new project and it'll actually write the boilerplate code for the main.py or code.py or whatever it's called that the various thing is going to run. And like you said, Ines, on one half it's got the code and the other half does the device that you can interact with. I was thinking that a couple of cases that would be great is,

Starting point is 00:05:12 like you were saying, trying to get a hold of it, but you might not even know if the concept that you're going to use is really going to work for the device you're thinking of. So this would be a good way to try it out, to try out whether the thing you're thinking of trying for your house or whatever would actually work for this device. The other thing was you brought up education and that it's big. I was thinking about a couple of conferences where they tried to do the display

Starting point is 00:05:36 and try to have like a camera or something. Sometimes it works and sometimes it doesn't. This way you could just do a tutorial or a teaching scenario and everybody could see it because it's just going to be displayed on your monitor. Right, your standard screen sharing would totally work here. That's a good point as well. Yeah, and it doesn't have to be all or nothing. Actually, what's really interesting is this thing isn't just an emulator, but you can do debugging. You can set a breakpoint and step through it running on the device,

Starting point is 00:06:07 simulate it, or you can actually run it. If you had a real device plugged in, you can run it on there as well and then do debugging and breakpoints and stuff on the actual device. So it's like you tested here. I always admire people who actually use like the proper debugging features. I know VS Code has like so much of this

Starting point is 00:06:20 and I'm always like, I should use this more, but I'm like, okay, print. Print, print. Yeah, there's some really cool libraries that will actually do that. I can't remember what it's called, but Brian and I recently covered one that would actually print out a little bit of your code and the variables as they change over time. It was like the height of the print debugging world. That was really, really cool. I wish I could remember. Do you remember, Brian? No, we actually covered a couple of them. I know, that's a problem, we cover thousands of things in here. So another thing that's interesting is, okay, so you see the device, some of them have buttons and they have

Starting point is 00:06:53 lights, and you can imagine maybe you could touch the button. But they also have things like temperature, gyrometer type things, or like moving it, or motion sensing, or even if you shake it. This thing has little ways to simulate all that stuff. So you can have a temperature slider that freaks it out and says, hey, the temperature is actually this on your temperature sensor and so on. So all the stuff that the devices simulate

Starting point is 00:07:15 are available here. Oh, that's cool. Yeah. So I actually had the team over on TalkPython not long ago. So people can check that over at talkpython.fm. And yeah, I'm also really excited about what you got coming here next, Brian. What is that? Yeah, well, speaking of, I guess, debugging versus test, we didn't really talk about testing. Anyway, I'm really excited. We should have talked about

Starting point is 00:07:35 testing. Yeah, so I was thinking that I hardly ever use a debugger for my source code, but I use a debugger all the time when I'm debugging my tests. I don't know. It's just something different about it. But I've been running a lot of tests and debugging a lot of tests lately because PyTest 6, the candidate release, is out. Now, by the time this episode airs, I don't know if the release candidate will be released

Starting point is 00:08:03 or just the release candidate still, but you can install it. We'll have instructions in the show notes, but essentially you just have to say 6.0.0rc1 and you'll get it. So there's a whole bunch of stuff that I'm really excited about. There's a lot of configuration

Starting point is 00:08:19 that you used to be able to put in lots of places, in your pytest.ini or your setup.cfg or tox.ini or something. PyTest 6 will support pyproject.toml now. So if you jumped on the TOML bandwagon, you can stick your PyTest configuration in there too. There's a lot of people excited about the type annotations.

Starting point is 00:08:38 So the 6.0 is going to support type annotations. So it actually was a lot of work. There was a volunteer that went through and added type annotations to a bunch of it, especially the user facing API. And why this is important is if you're type checking, you're running mypy or something over your source and everything,

Starting point is 00:08:58 your project, why not include your tests? But if PyTest doesn't support types, it doesn't really help you much. So it will now, so that's a really, really cool addition. This is basically the API of PyTest itself is now
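For readers who want to see what moving that configuration into pyproject.toml might look like, here is a minimal sketch. The table name `[tool.pytest.ini_options]` is the one PyTest 6 reads; the option values and the `tests` directory are assumptions chosen for illustration.

```toml
# pyproject.toml
[tool.pytest.ini_options]
minversion = "6.0"      # refuse to run on PyTest versions older than 6.0
addopts = "-ra"         # extra command line options applied to every run
testpaths = ["tests"]   # assumed location of the test files
```

The same keys that used to live under `[pytest]` in pytest.ini go in this table.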

Starting point is 00:09:14 annotated with types? Yes, and a lot of the internal code as well. So they actually went through and did a lot. It was a lot of work. If you look at the conversation chain, it was a month, several month project.

Starting point is 00:09:28 Wow. What does that mean for compatibility? Does that make PyTest like 3.6 only and above? I think the modern versions of PyTest

Starting point is 00:09:35 really already are 3.6 and above. I'm not sure about that. Right. So then the door was opened to use that because otherwise it

Starting point is 00:09:42 would... I mean, it would be a weird move to like release a completely new version with Python 2 backwards compatibility. Like, you wouldn't do that, right? I mean, I think the message it sends is not great. I totally agree. There is a pinned version of PyTest, I don't remember which one it is,

Starting point is 00:10:02 that still supports 2.7 if you're on it, but no new features are going in there. The thing I'm really excited about is a little flag they've added called no header (--no-header). So don't use this. Most people don't use this. When you run PyTest, it prints out some stuff like the version of Python, the version of PyTest,

Starting point is 00:10:22 all the plugins you're using, a bunch of information about it. All this stuff is really important for logging. If you're capturing the output to save somewhere or do a bug report or something, that information is great to help other people understand it. What I don't like about that is that it's not helpful if you're writing tutorials or if you're writing code to put on a slide or something. All that extra stuff just takes up space and it distracts. Yeah, like I've had students say, like, I ran it, I think, PyTest in PyCharm. And it has like some kind of output just stating where it is and what it's doing.

Starting point is 00:10:58 They're like, this didn't work for me. I'm like, well, that was just random output from the tool. You're not actually supposed to try to run that part. You know what I mean? But it's, I mean, I saw why they saw that. But at the same time, like, the ability to just say, like, these details don't matter in the long term is great. Yeah, so I'm excited about that, to trim it down. There was a plugin called TLDR.

Starting point is 00:11:18 Too long, didn't read. But it actually didn't take as much of the header off as I wanted. So I had my own tool that would do this, but now I've got this, which is great. So with a lot of the configuration, there is a chance for human error if you type something wrong, like you type a variable name wrong. And so I really like this new flag called strict config (--strict-config), which will throw an error if the PyTest section of your configuration

Starting point is 00:11:46 has something that it doesn't recognize. And it probably is just you misspelled some variable or something. Yeah, that's good to know. And not to, I can't remember the version, but it was, I think it was in PyTest 5. They added some code highlighting stuff that... Yeah, that's super cool. I discovered that just the other day. I like just somehow updated all my dependencies in some environment and suddenly PyTest output was colored and I was like, the diff comparisons on PyTest are wonderful, but apparently they didn't do recursive comparisons of data classes and

Starting point is 00:12:28 attrs classes, but now they do. So that's neat. There's a whole bunch of new features. There's fixes. I ran through some of the features I really liked. There are deprecations and it's a large list of breaking changes and deprecations.

Starting point is 00:12:42 That's why they went to a new number, PyTest 6. But I went through the whole list and I didn't see anything that was like, oh, that's going to stop me, I'm going to have to change something. Okay, that's good to know. I mean, if you say, oh, there was nothing that we're using, I feel confident that maybe there's nothing in my code either. And I knew that somebody was going to ask, is my PyTest book still valid? Yes, it is. I'm going through it right now. I haven't gone through the whole thing yet to make sure. The side that is not compatible is not the book. The book's fine. It's, um, I have a plugin that now is broken. So pytest-check still works, but if you depend

Starting point is 00:13:18 on xfail, wow, this is a corner case. But if you depend on pytest-check and the xfail feature of it, it doesn't work right now. So I'll have to fix that. So you would say xfail fails temporarily? Yeah. It actually marks everything as a pass. So if you mark xfail. Oh, wow.

Starting point is 00:13:36 That's like xfail-ception. It's really bad. Anyway, I'll have to get back to that. Yeah, this is really exciting that PyTest 6 is out. Super cool. I know that there were some waves, some uncertainty in the ecosystem. So it sounds like that got ironed out.

Starting point is 00:13:53 Things are going strong. New versions coming out. I even saw that Guido had retweeted the announcement and said, yay, type annotations coming in PyTest. Of course, he's been all about type annotations these days. We'll come back to that later in the show, actually. So Ines, I know you work a lot with text, but are you frustrated with it? What's the story of this name here? Oh, my point of the day. Yeah. I thought I'd present something from my space, obviously. Awesome. Yeah. There's this new framework that I came across. And it's called TextAttack. Yeah. And it's a framework for adversarial attacks and data augmentation for

Starting point is 00:14:30 natural language processing. So what are adversarial attacks? You might have actually seen a lot of examples of it. For instance, an image classifier that predicts a cat or some other image, even though you show it complete noise and you somehow trick the model. Or you might've seen people at protests wearing like funny shirts or masks to trick facial recognition technology. So really to trick the model into, you know, not recognizing them.

Starting point is 00:14:57 Or the famous example of Google Translate suddenly hallucinating these crazy Bible texts. If you just put in some complete gibberish, like just gah, gah, gah, gah, and then it would go like, the Lord has spoken to the people, stuff like that. That's amazing. I include a link to an article by a researcher

Starting point is 00:15:16 who explains why this happened and shows the example. It's pretty fascinating. But I think it all comes down to the fundamental problem of how do you understand a model that you train? And what does it mean to understand your model? And how does it behave in situations when it suddenly gets to see something that it doesn't expect at all? Like, gah, gah, gah, what does it do? And the thing with neural network models is you can't just look at the weights. They're not linear. You can't just look at what your model is. You have to actually run it. And so that library, TextAttack, lets you actually try out different types of attacks from the academic

Starting point is 00:15:55 literature and different types of inputs that you can give a model to see whether it produces something that you're like not happy with, or that's like really weird and exposes some problems in your model. And it also lets you then, because normally what's the goal? The goal is, well, you do that and then you find out, oh, damn, like if I suddenly feed it this complete nonsense, or if I feed it Spanish text, it like goes completely in the wrong direction and suddenly predicts stuff that's not there.

Starting point is 00:16:22 And if you deployed that model into a context where it's actually used, that would be pretty terrible. And there are much worse things that can be happening. So you can also create more robust training data by replacing words with synonyms. You can swap out characters and just see how the model does. So I thought that was very cool. And yeah, in general, I think adversarial attacks, it's a pretty interesting topic. And yeah.
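To give a feel for the two kinds of perturbation mentioned here, synonym replacement and character swaps, the following is a toy sketch in plain Python. It does not use TextAttack's API at all: the synonym table, the word-counting "model", and every function name are assumptions made up for illustration.

```python
import random
from typing import Callable, Dict, List

# Toy synonym table and sentiment word lists; assumptions for illustration only.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["excellent", "wonderful"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}
POSITIVE = {"great", "excellent", "wonderful"}
NEGATIVE = {"bad", "poor", "awful"}


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Swap one random pair of neighbouring characters (a simulated typo)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def replace_with_synonyms(text: str, rng: random.Random) -> str:
    """Replace every word that has an entry in SYNONYMS with a random synonym."""
    return " ".join(
        rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in text.split()
    )


def add_typos(text: str, rng: random.Random) -> str:
    """Apply one adjacent-character swap to every word."""
    return " ".join(swap_adjacent_chars(w, rng) for w in text.split())


def toy_sentiment(text: str) -> str:
    """Stand-in 'model': counts known positive and negative words."""
    words = text.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def prediction_survives(
    text: str,
    perturb: Callable[[str, random.Random], str],
    model: Callable[[str], str],
    rng: random.Random,
) -> bool:
    """True if the model gives the same label before and after perturbation."""
    return model(perturb(text, rng)) == model(text)


rng = random.Random(0)
sentence = "a great movie"
print(replace_with_synonyms(sentence, rng))
print(add_typos(sentence, rng))
print(prediction_survives(sentence, replace_with_synonyms, toy_sentiment, rng))
print(prediction_survives(sentence, add_typos, toy_sentiment, rng))
```

The word-counting model here happens to survive synonym swaps but loses its prediction under typos, which is the kind of weakness such perturbations are meant to expose; the perturbed sentences could equally be added to the training data as augmentation.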

Starting point is 00:16:48 Yeah, it's super interesting. So the idea is basically you've trained up a model on some text and for what you've given it, it's probably working. But if you give it something you weren't expecting, you want to try that to make sure that it doesn't go insane. Yeah, exactly. And it can expose very unexpected things like the Bible text, for example, that sounds really bizarre when you first hear it. But one explanation for that would be that, well, especially it happens in low resource languages where, you know, we don't have much text, and especially not much text translated into other languages. But there's one type of text that has a lot of translations available. And that's the Bible. And so they're parallel corpora where you have one text,

Starting point is 00:17:27 one line in English, one line in Somali, for example. And then people train their models on that. But one thing that also is very specific about Bible text is that Bible text has some words that really only occur in the Bible text. It uses some really weird words. So what your model might be learning is, if I come across a super unexpected word that's really, really rare,

Starting point is 00:17:48 that must be Bible. And also the objective is, you want your model to output a reasonable sentence. So the model's like, well, okay, if that's the rare word, then the next word needs to be something that matches. And then you have this bizarre sentence from the Bible, even though you typed in ga-ga-ga.

Starting point is 00:18:04 And that happens. Yeah, how funny. Yeah. Yeah, so it looks like they have actually a bunch of trained models already at the TextAttack model zoo, they call it, I guess. Yeah, everything's called the model zoo. Yeah, and so you can just take these

Starting point is 00:18:22 and run it against it, like the movie reviews from Rotten Tomatoes or IMDb or the new set or Yelp, and just give it that kind of data and see how it comes out, right? Exactly, yeah. I think that's pretty cool. And yeah, and then you can actually,

Starting point is 00:18:34 you can also generate your own data or load in your data and generate data that maybe produces a better model or like covers things that your model previously couldn't handle at all. So that's the data augmentation part. Yeah, that's all very important. And I think it's also very important to understand

Starting point is 00:18:50 the models that we train and, you know, really try them out and think about like, what do they do and how are they going to behave in like a real world scenario that we care about? Because, yeah, the consequences. As soon as you're making decisions on this data, right? On these models. Yeah.

Starting point is 00:19:15 I guess as soon as a human is convinced that the model works and they start making decisions on it, right, that could go bad if the situation changes or the type of data. And especially if the model is bad. Like I'm always saying, well, people are always scared of these dystopian futures where we have AI that can, I don't know, know anything about us and predict anything and works. But the real dystopia is if we have models that kind of don't work and are really shit, but people believe that they work, that's much more. It's not even about whether they work, it's about whether people believe it.

Starting point is 00:19:38 And then that's where it gets really bad. Yeah, and that's way more likely. Yeah, yes. It's a more difficult world to test this sort of stuff to figure out. What does it mean for a model to be bad? How do you tell if it's bad? And models can both be working with some data sets and produce gibberish with,

Starting point is 00:20:00 or yeah, I guess in this case, the reverse, not produce gibberish if you pass in gibberish. Yeah. Actually, yeah, I just realized it ties in very well with the PyTest point earlier. And just like, yep, machine learning is quite special in a way that it's code plus data. Code, you can test, you can have a function. And you're like, yay, that comes in.

Starting point is 00:20:19 That's what I expect out. Easy, write a test for it. You know, it's not that easy. Testing is hard, but like fundamentally, yeah. It's somewhat deterministic. Yeah. Right. Like, and even if it's not, there's like something you can, you know, test around it and it's

Starting point is 00:20:32 much harder with the model. Yeah. Yeah, for sure. All right. Before we get to the next item, just want to let you know, this episode is brought to you all by us over at TalkPython Training. We have a bunch of courses. You can check them out and we're

Starting point is 00:20:46 actually featured in the Humble Bundle that's running, the Python Humble Bundle, right now. So if you go to talkpython.fm/humble2020, you can get $1,400 worth of Python training, tools and whatnot for 25 bucks. So that's a pretty decent deal. And Brian, you mentioned your book before. Tell people about your book real quick. Yeah. So Python Testing with PyTest is a book I wrote, and it's still very valid, even though it was written a few years ago. The intent was the 80% of PyTest that you will always need to know for any version of PyTest. And I've had a lot of feedback from people saying a weekend of skimming this makes it so that they understand how to test. It's a weekend worthwhile.

Starting point is 00:21:27 Yeah, absolutely. And Ines, you want to talk a little bit about Explosion just to let people know? Yeah, so I mean, some of you who are listening to this might know me from my work on spaCy, which is an open source library for NLP in Python, which I'm one of the core developers of. And yeah, that's all free open source. And we're actually just working on the nightly version

Starting point is 00:21:45 or the pre-release of spaCy 3, which is going to have a lot of exciting features. I might also mention a few more things later on. And yeah, so maybe that's already going to be out by the time this podcast officially comes out. Maybe not. I don't want to overpromise. But yeah, you can definitely try that out. And we also recently released a new version of our annotation tool Prodigy, which comes with a lot of new features for annotating relations, audio, video. And the idea here is, well, once you get serious about training your own models, you usually want to create your own data sets for your very specific problems. And often, the first idea you have

Starting point is 00:22:23 might not be the best one. It's a continuous process. You want to develop your data. And Prodigy was really designed as a developer tool that lets you create your own datasets with a web app, a Python backend, you can script. That's our commercial tool. That's how we make money. And it's very cool to see a growing community around this. So yeah, that's what we're doing. We have some more cool stuff planned for the future. So stay tuned. Yeah, people should check it out. Actually, you and I talked on TalkPython 202

Starting point is 00:22:50 about building a software business and entrepreneurship. You had a bunch of great advice. So people might want to check that out as well. Do you actually know these episode numbers by heart or did you look that up before? Some of them I know, but that one I used the search. Okay. I remember you were on there.

Starting point is 00:23:03 I remember what it was about, but not the number. I just put together that I know two people from Explosion, so that's interesting. Yeah, Sebastian. Yeah, he was on your podcast recently, which I feel really bad. I wanted to listen to this because he advertised it with like, it will tell the story,

Starting point is 00:23:19 true story behind his mustache, which I really wanted to know. But then I was like, I'll need to listen to this on the weekend, and I forgot. So yeah, if he's listening, I'm sorry, I need to know this. So I will listen. Excellent. So don't spoil it. Doing great work on FastAPI. All right. Speaking of people that have been on all the podcasts as well, Brett Cannon, he recently wrote an interesting article called What is the core of the Python programming language? And he's legitimately asking as a core developer, what is

Starting point is 00:23:49 not the maybe lowest level, but what is the essence, I guess is maybe the way to think about it. I only just got the core core pun. It did not occur to me when I first read the article. I feel really embarrassed now. To be fair, English is not my first language, but still

Starting point is 00:24:05 it's not about that. Anyway, sorry for interrupting. When I first read it, I was thinking like, okay, we're going to talk about what is the lowest level and yeah, okay, it's probably C and ceval.h, ceval.c and so on. But really the thing is, Brett has been thinking a lot about

Starting point is 00:24:21 WebAssembly and what does that mean for Python in the broad sense. He and I talked about it on Talk Python, I think at the very last PyCon event. We did a live conversation there about that. And it's important because there's a few areas where Python is not the first choice, maybe not the second, sometimes not even the 10th choice of what you might use to program some very important things like maybe mobile, maybe the web, the front end part of the web, importantly, I mean. So there's a few really important parts of technology where Python doesn't have much reach, but all of those areas

Starting point is 00:25:02 support WebAssembly these days, right? And if you have something in C, you can compile it to WebAssembly. So there's some thought about like, well, what can we do potentially to make a WebAssembly runtime for Python so that Python magically, almost instantly gets access to what was just the JavaScript front-end framework space, and what is, you know, mobile. iOS and Android and all those things allow you to directly run JavaScript as part of your app. So how would we make that happen? So it's pretty important, right? If we could solve that problem, like, Python is already so popular and its growth is so incredible. Like, what if we could say, oh yeah, and now it's an important language on mobile

Starting point is 00:25:46 and it's an important front-end language framework. Like that would just take it to the next level or maybe a couple levels up if you do them both. And WebAssembly seems to be one of the keys to kind of bridge that gap, right? So Brett talks about in this article how for so long we've just had CPython as what we think of when we have Python. Sometimes people use

Starting point is 00:26:06 PyPy as a partially JIT compiled version, sometimes faster version of Python, but not always, because of the way it interacts maybe with C libraries that you might be using through packages and so on. And really, a lot of Python's dynamic nature makes it hard to do outside of an interpreter, where, to be clear, WebAssembly is a compiled language, right? So if you're going to put it over there, maybe it's going to require it to be compiled. So this is a really interesting thing to go through and read and think about with Brett. He talks about things like, well, how much of the Python language would you have to implement and still consider it to be valid Python? Like we

Starting point is 00:26:45 talked about MicroPython, and usually people don't look at that and go, that's not Python, that's fake, right? No, it's Python, but it's not as much Python, right? You don't have all the same APIs on MicroPython as you do on regular Python. So questions like, do you still need a REPL? Could you live without locals, right? The ability to ask what the local variables are and so on. So he said he didn't really have a great answer. It's more of a philosophical, like, we need to solve this. But I do want to share some of my thoughts on this. And I feel like maybe what we could do is we could come up with like a standard Python language definition that is a subset of full Python, right? Here's the essence. Like, okay, we have to be able to create classes, we have to be able to create functions,

Starting point is 00:27:31 you have to define strings. Probably you want type annotations, but do you need eval? Maybe, maybe not, right? So if you could have a subset of the language that was smaller, as well as the standard library, because do you really need to parse CSS hex colors everywhere? Probably not. It's a very underused part of the library, but it's in there, right? So if we could narrow it down, maybe it would be easier to think about how does it go to WebAssembly, how does it go to some kind of JavaScript runtime or something like that. And if it sounds crazy, you know, the .NET people did this. They have a .NET Standard class library language. They got it running on WebAssembly.
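As a small illustration of the two features singled out in this discussion, here is a sketch of what `locals()` and `eval()` do at run time. The function names are made up for the example; the point is only that both depend on information (variable names, source text) that an ahead-of-time compiler would otherwise be free to discard.

```python
def describe_locals() -> dict:
    """Return a snapshot of this function's local variables via locals()."""
    width = 3
    height = 4
    # locals() needs the names 'width' and 'height' to exist at run time.
    return dict(locals())


def evaluate(expression: str, **names: int) -> int:
    """Evaluate an expression string at run time against the given names."""
    # eval() compiles and runs source text while the program is running,
    # so the code that executes is not known before the program starts.
    return eval(expression, {"__builtins__": {}}, names)


print(describe_locals())
print(evaluate("width * height", width=3, height=4))
```

A subset of the language that left out these two features would still cover classes, functions, and strings, which is roughly the trade-off being described.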

Starting point is 00:28:11 So there's an example of it out there and something that's kind of sort of similar, right? So I think this would just open stuff up if you could get Python in these places. What do you guys think? Initially, I was never so sold on WebAssembly and especially WebAssembly and Python until I watched

Starting point is 00:28:27 Dave Beazley live code a compiler at PyCon India, I think it was. And I was like, this is kind of fun. It's just also fun to watch Dave Beazley live code a compiler. Yeah, for sure. Classic. So that did get

Starting point is 00:28:44 me thinking. I do think one question we should ask ourselves is, well, do we really need Python to do all of the things in the browser? Does this really have a benefit that actually makes a difference? There are a lot of things people use Python for that just wouldn't work in that way. And that's also, I think, part of what made Python so popular in the first place. Like, for instance, you know, all the interactive computing environments. That's why people want to use Python for data science,

Starting point is 00:29:14 IPython, Jupyter notebooks, that sort of stuff. That's why, you know, Python as a dynamic language made so much sense to people. And that's what made it popular. And large-scale processing, like a lot of the type of stuff we're working on, it's like, yeah, there's stuff that you can run in the browser, but it's never going to be viable to run large-scale information extraction in the browser, because you want to run that on a machine for like a few hours. But I think there are a lot of opportunities also in the machine learning space for privacy-preserving technologies that already exist. I think, from what I understand, Mozilla is working on some features built into the browser where, you know, you can have models predicting things without it being sent to someone's server. And I think that's

Starting point is 00:29:54 obviously very powerful. That's an interesting idea, right? Yeah, yeah. Because if you could have a little bit of machine learning, yeah, but you don't have to give up the data privacy aspect of it. That's pretty cool. Yeah. So I think for that, there's a lot of potential here for running Python in a browser. Yeah. Well, we start getting used to saying what is Python is what is the CPython implementation. And we got to remember CPython is the reference implementation for the language spec. And I think, I guess we're kind of getting at, maybe we need to split it up and have a core language spec and an extended one or something. I don't know.
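
As a small illustration of the "Python the language versus CPython the implementation" distinction, the standard library can report which implementation is running a script. This is only a sketch for mainstream interpreters such as CPython or PyPy; smaller implementations may not ship the platform module at all.

```python
import platform
import sys

# Name of the interpreter running this script, for example "CPython" or "PyPy".
implementation = platform.python_implementation()

# Version of the Python language that this interpreter implements.
language_version = platform.python_version()

# sys.implementation.name is the lowercase identifier for the same interpreter.
print(f"{implementation} ({sys.implementation.name}) implementing Python {language_version}")
```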

Starting point is 00:30:31 Where would you divide the line? Because we've seen, like you said, we've seen things like CircuitPython and other things. And we've actually talked about several smaller languages based on Python that just try to be the same syntax. But at which point is it not Python anymore? And there's at least some of the stuff, like, I could totally see having a distribution of Python that doesn't have a REPL still count. I could totally see not having IDLE, for instance. If something doesn't ship with IDLE, is it still Python? I think so. And because of IDLE,

Starting point is 00:31:05 then you need tkinter, or you need Tk stuff in there. And there's a lot of stuff that maybe I would be in like, you know, could you live without locals? Most of the time, probably. I actually think, since the web and mobile are such a big part of our lives and will be for a while, this might be a decent dividing line: whether or not it's for WebAssembly, maybe we should split the division at whatever we need to implement a WebAssembly version of Python. And anything above that line is an extended version of Python or something. Yeah, that's a good point. All right, I don't want to go too long in this section because I want to make sure we get the others.

Starting point is 00:31:47 But I do want to leave you with just some thoughts. What if shipping Python was just shipping a single binary and a thing that ran it? You could do that with WebAssembly. Maybe two WebAssemblies, the runtime plus the code. What if all the browsers had the capability to plug in alternate runtimes through WebAssembly? So right now you have a JavaScript engine, but what if, like, say Firefox and Edge and whatnot came up with a way to say, here's a WebAssembly API to plug in alternate runtimes, Python, Ruby, .NET, Java, you name it, and then shipped with the latest version of each of those runtimes? So you just

Starting point is 00:32:24 don't have to down, like the big problem now is you can do it, but you still got to download like 10 megs per page, which is not a good idea. So anyway, I think there's a ton of interesting things that open up if this were possible. So I'm glad Brett's still on this and hopefully he keeps thinking about it.

Starting point is 00:32:41 Brian, I still need to learn pathlib. You got any ideas on how I do that? Really, you're not using pathlib? I'm just stuck in the os.path world. I just really need to get with the times. Help me out. Okay, so pathlib is, I mean, I know, yeah, you're like some kind of animal. So, no offense to os.path, but you know. No, I really love pathlib a lot. But I've got to tell you that the documentation for pathlib doesn't cut it as an introduction. You can find what you're looking for if you know what you're looking for. But I agree with Chris May. So Chris May wrote a post called Getting Started with Pathlib. I guess it's kind of

Starting point is 00:33:23 he's got a little PDF field guide that you can download, but he has a little bit of a blog post introducing it. But I downloaded it. It's like nine or 10 pages, and it's actually a really good introduction to pathlib. So I really like it. The big thing with os.path versus pathlib is pathlib creates Path objects. So there's a class that represents a path that you have methods on. And it makes it different

Starting point is 00:33:47 for when you're dealing with this. With os.path, it's just strings. So it's manipulating strings that represent paths. So the object's different. I like it. Actually, I switched just for the ability to build up paths with just the slash operator.
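
The strings-versus-objects difference can be sketched in a few lines. The directory and file names here are invented for the example, and PurePosixPath is used only so the printed result looks the same on every operating system; everyday code would normally use Path.

```python
import os.path
from pathlib import PurePosixPath

# os.path: a path is a plain string, combined with helper functions.
string_path = os.path.join("project", "data", "raw.csv")
print(type(string_path).__name__)  # str

# pathlib: a path is an object, and "/" joins segments onto it.
base_dir = PurePosixPath("project")
object_path = base_dir / "data" / "raw.csv"
print(object_path)  # project/data/raw.csv

# Because it is an object, the pieces are available as attributes.
print(object_path.name, object_path.suffix)
```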

Starting point is 00:34:03 Yeah, it's really interesting how they've overridden division. Yeah. But I think it's a good example of where this makes sense. It's a reasonable use case. It looks good. It's defensible. There are other cases where you're like,

Starting point is 00:34:13 oh, did you really have to like overload these operators? But they're fine. I think that's very valid. Yeah. Yeah. And things like how do you find parts of a path, when you have to parse paths, that's where pathlib really shines for me. So if you want to find the parent of something or the second level parent,

Starting point is 00:34:32 there are ways to do that in pathlib, and in os.path you're stuck with trying to split things and stuff, and it's gross. I mean, there are operations to do it, but it's very good to have this relative, I don't know, just all these operators like parent. And then one of the things that took me a while to figure out was, I was used to trying to find the absolute path of something, and in pathlib, finding the absolute path is the resolve method. So you say resolve and it finds the absolute path for you. You can find the current working directory, you can go up and down folders, you can use globs, you can find parts of path names and stuff.
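
Here is a short sketch of the operations just mentioned: parent, second-level parent, resolve, the current working directory, and globs. The folder layout is invented and built in a temporary directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory tree so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    report = root / "reports" / "2020" / "summary.txt"
    report.parent.mkdir(parents=True)
    report.write_text("hello")

    # parent is one level up; parents[1] is the second-level parent.
    parent_name = report.parent.name
    grandparent_name = report.parents[1].name
    print(parent_name, grandparent_name)

    # Parts of the file name itself.
    print(report.stem, report.suffix)

    # resolve() gives an absolute path, also collapsing ".." and symlinks.
    messy = report.parent / ".." / "2020" / "summary.txt"
    same_file = messy.resolve() == report.resolve()
    print(same_file)

    # Path.cwd() is the current working directory.
    print(Path.cwd().is_absolute())

    # rglob() searches the tree recursively with a glob pattern.
    found = sorted(p.name for p in root.rglob("*.txt"))
    print(found)
```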

Starting point is 00:35:09 And it's just a really comfortable thing. So I think you should give it a whirl. And it's not like it's going to change your life a lot. But the next time you're programming and you're like, okay, I've got to have a base directory and some other directory, I'll reach for pathlib instead of os.path. Yeah, I guess it has been there since 3.4, so I should get with the times. Yeah, so I mean, before, I could see the objection of like, oh, you have to backport it. And also, I think what I like as well is a lot of integrations that, like, you know, automatically can perform checks whether the path exists, stuff like that. Or for me as a library author,

Starting point is 00:35:45 you know, you're writing stuff for users and you want to give them feedback. And for instance, in a library like Click or Typer, which is the modern type hint version CLI interface, which was also built by my colleague, Sebastian, you can just say, hey, this argument is a path. What you get back from the command line is a path. It will check that the path exists via pathlib.

Starting point is 00:36:07 So it does like, you know, a whole bunch of magic there. Yeah, that is super cool. Yeah, or you can say it can't be a directory and then you write your CLI, user passes in an invalid path and you don't even have to do any error handling. It will automatically, before it even runs your code, say, nope, that argument is bad.
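
The idea of rejecting a bad path before your own code runs can be sketched with only the standard library. This is not Typer's or Click's API; it swaps in argparse with a custom type function, and the names existing_file, build_parser, and the "train" program name are hypothetical.

```python
import argparse
import tempfile
from pathlib import Path


def existing_file(raw_value: str) -> Path:
    """Convert a CLI string to a Path, rejecting missing paths and directories."""
    path = Path(raw_value)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"{raw_value!r} does not exist")
    if path.is_dir():
        raise argparse.ArgumentTypeError(f"{raw_value!r} is a directory, expected a file")
    return path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="train")  # hypothetical CLI name
    parser.add_argument("config", type=existing_file, help="path to an existing file")
    return parser


parser = build_parser()

# A real file is accepted and arrives as a Path object, not a string.
with tempfile.NamedTemporaryFile(suffix=".cfg") as handle:
    args = parser.parse_args([handle.name])
    print(type(args.config).__name__, args.config.suffix)

# A missing file is rejected by the parser itself; argparse prints the
# message to stderr and exits, so no error handling is needed in user code.
try:
    parser.parse_args(["no-such-file.cfg"])
except SystemExit as exit_signal:
    rejected_exit_code = exit_signal.code
    print("rejected with exit code", rejected_exit_code)
```

The validation lives in the argument's type, so the function that does the real work can assume it always receives a usable Path.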

Starting point is 00:36:24 So that's pretty cool as well. That's awesome. And you don't have to care about Unix versus Mac or PC or something like that. Yeah, I mean, Windows, I mean, no offense to Windows, but it's always the handling paths, and Windows is always the classic story. Also, as a library author,

Starting point is 00:36:38 we're supporting all operating systems, but like, well, Windows just does it a bit differently, and you cannot assume that a slash means a slash. Yeah, for sure. All right, well, the final item is yours, Enos, and it's definitely interesting. So if you're working in the machine learning, data science side of things, it might not be enough to just back up your algorithms and your code, right? Yeah, machine learning is code and data. So yeah, so this is something we discovered a while ago and that we're now using internally. So we currently, as I mentioned before, we're working on version three of spaCy. And one of the big features is going to be a completely new optimized way for training your custom models,

Starting point is 00:37:19 managing the whole end-to-end workflows from pre-processing to training to packaging, and also making the experiments more reproducible. You want to train a cool model and then send it over to your colleague and your colleague should be able to run the same thing and get the same results. Sounds really basic, but it's pretty hard in general in machine learning. So our spaCy stuff will also integrate with a tool called DVC, which is short for Data Version Control, and which we've started using internally for our models. And DVC is basically an open-source tool for version control,

Starting point is 00:37:51 specifically for machine learning and for data. So, you know, you can check your code into a Git repo as you're working on it, but you can't just check your data sets and models and artifacts into Git, or your model weights. So it's very, very difficult normally to keep track of changes in your files. Most people just end up with this directory of files somewhere, and it can be very frustrating.

Starting point is 00:38:13 And so you could think of DVC as Git for data. And the command line usage is actually pretty similar. So you type git init and dvc init to initialize it, and then you can do dvc add to start tracking your assets and add them. So I think if you're familiar with Git, as abstract as it can be at times, you will also kind of find it easy to get into DVC. And it basically lets you track any assets like datasets, models, whatever, by adding meta files to your repository. So you always have like the checksum in there

Starting point is 00:38:46 and you always have these checkpoints of the asset, even though you're not actually checking that file into your repo. And that means you can always go back, fetch whatever it was from your cache and rerun your experiments. And it also builds this really cool dependency graph. So you can really have these complex pipelines with different steps. And then you only have to rerun one step if some of the inputs to it have changed. So in machine learning, you'd often have a pipeline. You download your data, then you

Starting point is 00:39:17 pre-process it, then you convert it to something, then you train, then you run an evaluation step. And everything sort of depends on each other. And that can make things like really hard. And you never know, you usually have to run everything very clean from scratch. Because yeah, if something changes, your whole results change. So if you set up your pipelines with DVC, it can actually decide whether something needs to be rerun. Or it can also know what needs to be rerun to reproduce exactly what you're trying to do. So that's pretty cool. Yeah, that could save you a ton of time and money if you're doing it in the cloud.
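
The "only rerun a step if its inputs changed" idea can be sketched in plain Python. This is not DVC's implementation or file format; the function names, the in-memory lockfile dict, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path):
    """Hash a file's contents; a small meta record can store this instead of the file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_is_stale(inputs, recorded):
    """A stage must rerun if any input's checksum differs from the recorded one."""
    return any(recorded.get(str(p)) != file_checksum(p) for p in inputs)


def run_stage(name, inputs, recorded, action):
    """Run `action` only when the inputs changed, then record the new checksums."""
    if not stage_is_stale(inputs, recorded):
        return f"{name}: skipped (inputs unchanged)"
    action()
    recorded.update({str(p): file_checksum(p) for p in inputs})
    return f"{name}: ran"


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.txt"
    clean = Path(tmp) / "clean.txt"
    raw.write_text("Some Text\n")
    lockfile = {}  # stand-in for the small meta files checked into Git

    def preprocess():
        clean.write_text(raw.read_text().lower())

    history = [
        run_stage("preprocess", [raw], lockfile, preprocess),  # first run
        run_stage("preprocess", [raw], lockfile, preprocess),  # nothing changed
    ]
    raw.write_text("New Text\n")  # the data set changes
    history.append(run_stage("preprocess", [raw], lockfile, preprocess))
    print(history)
```

Only checksums are recorded, which is why the large files themselves never need to go into the repository.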

Starting point is 00:39:51 Yes, exactly. Yeah. And you know, you can share it with other people. I think it definitely solves a problem that's very real. And yeah, the people making DVC, they've also recently released a new tool that I have not personally checked out yet. But it looks very interesting. It's called CML, which is short for continuous machine learning. And that's really more of the CI, which kind of is logically the next step, right? You manage everything in your repo, and then you obviously want to run automated tests and continuous integration. So the preview looked really cool. It showed kind of a GitHub Action where you can submit a PR

Starting point is 00:40:25 with like some changes to your code and your data. And then you have the bot commenting on it and it shows like accuracy results and a little graph and how stuff changes. So it's really like these code coverage bots that you've probably seen where like you change some lines and then it tells you,

Starting point is 00:40:41 oh, coverage has gone up or down, and the new view of your code. So that's what it looks like. So I think, yeah, I'm pretty excited about this, and it's definitely already been solving a problem for us. And yeah. How does it store the large files? I know it has this cache. Is that a thing that you host? Does it have a hosted thing that's kind of like GitHub? I'm not sure. You can probably connect it to some cloud, but normally you have that locally. It also has a cool thing where you can actually download files via the tool. And then depending on where you're fetching it from, if it's a cloud storage bucket or however they call it, locally, as like, you know, so it's like kind of a drive you have access to locally, and then you can

Starting point is 00:41:30 just sort of type gs blah blah and then the path, and really work with it like a local file system. And that's pretty nice. So you can, you know, you can work with private assets, because the thing is, a lot of toy examples assume that, oh, you just download a public data set and then you train your model and then you upload it somewhere. But that's not very realistic, because most of the time the data you have can't just go in the cloud publicly. So yeah. But I don't even know exactly how it works in detail, but it can basically tell, I think from the headers or something, whether the file you're downloading has changed and whether there's something new.

Starting point is 00:42:05 Yeah. With a normal version control, one of the reasons we use it is to try to find what's different. Do you do diffs on data? I don't know. Maybe. I mean, I'm not sure if there's... I think the main diff is more like around the results that you get. Because, I mean, diffing large data set,

Starting point is 00:42:23 diffing weights, you kind of can't. That's really where we are, you know, the other problem, where you need to run the model to find out what it does, and then you're diffing accuracies rather than weights. Okay. I don't know if it does actual diffing of the data sets, but often the thing that changes is really the models. Like, you know, you have your whole data, and then you change things about your code, and something changes, and you want to keep track of what it is or how it manifests. Yeah, it's really cool to see them working on this. Yeah. And also, in spaCy 3, we'll hopefully have a pretty neat

Starting point is 00:42:54 integration where, you know, if you want, it's not mandatory, but if you say, hey, that's cool, that's how I want to manage my assets, you can just run that in a spaCy project, and then it just automatically tracks everything, and, you know, you can check that into Git and share it, and other people can download it. So yeah, I'm pretty excited about that. It works pretty well so far. Yeah, everything you can do to make it a little easier to work with spaCy and just make it reproducible. Yeah, and it's just, the things are hard. Like, I'm not a fan of these, oh, one click, everything just magically works. Like, it looks nice and it's a nice demo,

Starting point is 00:43:28 but like once you actually get down to like the real work, like things need to be a bit modular. Things need to be customizable. Otherwise you're always hitting edge cases or you have these leaky abstractions. So yeah, I think things should be easy to use, but you can't just magically cover everything by just providing one button.

Starting point is 00:43:45 That's just not going to work. Yeah, because when it doesn't work, it's not good anymore. Yeah, exactly. Yeah. All right. Well, that's our six items that we go in depth into. But at the end,

Starting point is 00:43:56 we always just throw out a couple of really quick things that maybe we didn't have time to fit into the main section. And I want to talk about two things that are pretty exciting. One is if you care about podcasts as a catalog of a whole bunch of things, I don't know how many podcasts there are. There's probably over a million podcasts these days. One of our listeners, Anton Zhiyanov, wrote a cool Python package that will let you search the iTunes directory and query it. It's basically a Python API into the iTunes podcasting directory. You know, some people think that you've got to be part of the Apple ecosystem to care about iTunes, but really that's just the biggest, like, directory,

Starting point is 00:44:37 kind of Yahoo circa 1995 style of listing of podcasts. So if you care about digging in and researching podcasts, check that out. That's pretty cool. And then, yeah. And then I've also, I'm such a big fan of f-strings. How about you two?

Starting point is 00:44:51 Yes. Yes. F. That's right. Yeah. I'm finally, I'm finally working in like Python three only. I remember,

Starting point is 00:44:58 I think last time I was on the podcast, I was basically saying how, like, oh, all these modern things, they're so nice. I wish I could use them more, but we're still supporting Python 2,

Starting point is 00:45:06 but like, no, everything I write now, 3.6, yes. And I've talked previously about a tool called flynt, F-L-Y-N-T, which lets you run against an old code base and convert all the various Python 2 and 3 styles of formatting magically into Python 3. I think that was actually the episode I was... Yeah, you might've been, right? Like, I wish I could run this. Right. Yeah. And yeah, I ran that against like 20,000 lines of Python. I found like just a couple errors reported and

Starting point is 00:45:33 they got fixed. So that's nice. But the thing that's bugged me endlessly about f-strings is I'll be halfway through writing the string and I'm like, oh yeah, I want to put data here. So I've got to go back to the front of the string, not necessarily back to the front of the line, but maybe the string is being passed to a function. So I go back to the first quote, put the f, go back forward, and then start typing out the thing I actually wanted. Right. Or maybe I'll f-string something when really I'm not going to put data. Right. So it's like you're halfway through and you want it to become an f-string. Well, PyCharm is coming with a new feature where if you start writing a regular string and pretend like it's an f-string,

Starting point is 00:46:08 it'll automatically upgrade it to an f-string. Yes. Halfway through. Yes. Without leaving. So you just say curly variable. It's like,

Starting point is 00:46:15 Oh, okay. That means that's an f-string, and the f appears at the front. Yes. Nice. So that is pretty awesome. Anyway, those are my two quick items.
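
For anyone who has not switched yet, here is a small sketch of the formatting styles being discussed, with made-up values. It shows the two older styles next to the f-string, and what happens when the f prefix is forgotten, which is the situation the editor feature is meant to catch.

```python
name = "Python Bytes"
episode = 191

# Older styles: percent formatting and str.format().
percent_style = "%s episode %d" % (name, episode)
format_style = "{} episode {}".format(name, episode)

# f-string: the leading f lets expressions sit directly inside the braces.
fstring_style = f"{name} episode {episode}"

# Without the f prefix, the braces are just literal characters.
plain_string = "{name} episode {episode}"

print(fstring_style)
print(percent_style == format_style == fstring_style)
print(plain_string)
```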

Starting point is 00:46:22 Enos, I'm also excited about the one you got here. Yeah. Awesome is awesome. Yeah, I had one, which is something coming to 3.9 or in 3.9, which is PEP 585. And you can use, when you use type annotations, you can now use the built-in types like list and dict as generic types.

Starting point is 00:46:40 So that means no more from typing import List with a capital L. Yes, yes. So you just literally... But I mean, when I first saw it, I'm like, that looks strange. But like, yes, I'm so excited about this. It'll probably be years until I can just use it all across my code bases, because... True, yeah. But like, yay, that's in 3.9. Yeah, it's in 3.9. I'm already using 3.9 and I didn't know that you can do this. Yeah. And Guido is one of the guys on the PEP making this happen. Like I said, he's really into typing. Oh, that's great. This is really cool, because it was super annoying to say, oh, you have this new import just because you want to use type annotations on a collection,
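
A minimal sketch of what PEP 585 allows, assuming Python 3.9 or newer. The count_words function is invented for the example; the last lines use typing.get_origin and typing.get_args to show that the lowercase form carries the same origin and arguments as the capitalized typing.List form.

```python
from typing import List, get_args, get_origin


# Built-in list and dict are used directly as generic types (Python 3.9+).
def count_words(lines: list[str]) -> dict[str, int]:
    """Count how often each whitespace-separated word appears."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


word_counts = count_words(["spam eggs", "spam"])
print(word_counts)

# Both spellings describe "a list whose items are str".
print(get_origin(list[str]) is get_origin(List[str]))
print(get_args(list[str]) == get_args(List[str]))
```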

Starting point is 00:47:20 right? Now you don't have to. There's actually a bunch of the collection stuff and iterators and whatnot like this. You know, the collections module, a bunch of stuff in there. It's really nice. And they're compatible, like lowercase list of str is the same as capital List of str, I believe. All right, Brian, what you got? Oh, I just wanted to, I'll drop a link in the show notes. Test & Code 120 is where I interviewed Sebastian Ramirez, from Explosion also.

Starting point is 00:47:49 And, talking about FastAPI and Typer, because I'm kind of in love with both of those. They're really cool. Yeah, absolutely. All right. Well,

Starting point is 00:47:57 uh, that's a cool one. Definitely going to check that out. And you can find out why he has the cool mustache. That's right. All right. So we always end the show with a joke, and I thought we could do two jokes today. So I think, Enos, do you want to talk about this first one? Oh yeah. I mean, I'm not even sure it counts as a joke per se, but it's more of a humorous situation, I guess. Right. Yeah, it ties in. Well, it's Sebastian again. Like he had this very viral tweet the other day

Starting point is 00:48:26 where he posted about some experience. I can just read it out because I think it needs to kind of stand on its own. So he writes, I saw a job post the other day. It required four plus years of experience in FastAPI. I couldn't apply as I only have 1.5 plus years of experience since I created that thing. And then he says, maybe it's time to reevaluate plus years of experience since I created that thing. And then he says,

Starting point is 00:48:45 maybe it's time to reevaluate that years of experience equals skill level. So, and this was, I was like, it resonated with people so much. I was actually surprised to see, like everyone was like, oh yeah, HR, like apparently this seems to be this huge issue,

Starting point is 00:48:59 obviously, that, like, most job ads are not written by the people who actually work with the technologies, and where you have, yeah. Actually, yeah, this is awesome. And this tweet actually just got covered on DTNS, the Daily Tech News Show, I guess it is. Alongside another posting that said

Starting point is 00:49:17 you needed eight years of Kubernetes experience for another job. But of course, Kubernetes has only been around for four years. Yeah, when you say this went viral, it had 46,000 retweets and 174,000 likes. That's like, that's got some traction. I feel like this might be a problem. Yeah, yeah. I was surprised that, like, so many people are like, yeah, that's a big deal. And I mean, it is true, like, tech hiring sort of seems to be broken. And it's also, it's a bit different in my case, I guess. But like, I don't qualify for most roles using the tech that I write.

Starting point is 00:49:50 And in some cases that's justified because I'm not a data scientist. Just because I write developer tools for data scientists doesn't mean I can do the job. But in other cases, I'm like, there's kind of a ridiculous amount of arbitrary stuff you're asking for in this job ad. Maybe that's needed, maybe not.

Starting point is 00:50:03 But like, it centers around like a piece of software that I happen to have written, and I do not qualify for your job at all. Like, that's insane. The last time I wrote a job description, I intentionally left off the college degree requirement, because all of the other requirements I was listing in there, either they had it from college plus experience or they had it just from experience. So I was fine with that. By the time it actually went live, somebody in HR had added a college degree requirement to it. I just couldn't get away with not listing that, I guess.

Starting point is 00:50:37 Yeah. Master's degree in spaCy preferred. Yeah, but I guess another problem is, well, look, if HR writes these job ads with these bullshit requirements, then, well, who applies? It's either people who are like, yeah, whatever, or people who are full of shit.

Starting point is 00:50:52 And then that's the sort of culture you're fostering. And it might not even be the engineer's fault who voted on his job description, but who applies to that? You're going to make me lie about my FastAPI experience. Yeah, or people just apply to anything. I'm like, yep, I have 10 years experience in everything. Great. And they're like, perfect. That's what we're looking for.

Starting point is 00:51:10 You're hired. And then you wonder, why is our company culture so terrible? Well, I actually did have somebody apply to a job and say they have multiple years of experience in any new language coming up. Nice. It looks like we're just about out of time. Let me give you one more joke for it. Brian, will you describe

Starting point is 00:51:33 this picture and then I'll read what it says? There's a poorly drawn horse, I think, a zebra horse that has white on the back end and black on the front end. And the text says, I defragged my zebra. I don't even know if people defrag drives anymore. So this is only going to resonate with the folks that have been around for a while. I saw that there was this great video I came across on YouTube where you can actually watch like a live defrag session, like, I don't know, Windows 95.

Starting point is 00:51:58 And it's like, I don't know, it takes a few hours. And, you know, you can kind of bring back that nostalgia and just put it on your TV and just sit there and you're like, yeah. It's like the aquarium you would put on your TV. But for tech. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S.

Starting point is 00:52:16 And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.


Topics covered in this episode: VS Code Device Simulator; pytest 6.0.0rc1; What is the core of the Python programming language?; Extras; Joke. See the full show notes for this episode on the website ...at pythonbytes.fm/191
