Python Bytes - #240 This is GitHub, your pilot speaking...

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 240, recorded July 1st, 2021. How time does fly. I'm Michael Kennedy. And I'm Brian Okken. And I'm Chris Moffitt. Hey, Chris. Welcome to the show. Thank you. Great to be here. Yeah, it's great to have you here.

Starting point is 00:00:19 We've had you talking about the missteps of Excel and how the Python data toolchain can make that better over on Talk Python a few times. But this is your first time on Python Bytes, right? It is, yes. Yeah, exciting to have you here. Definitely, definitely. But maybe you want to go ahead and kick us off. I want to talk about subclassing today.

Starting point is 00:00:38 Hynek wrote an article called Subclassing in Python. And dealing with classes is just everywhere in Python. Even if you're not using classes, Python itself has all sorts of classes and objects that you're using all the time, whether you know it or not. But when you start getting into larger design,

Starting point is 00:00:58 there is a question around composition versus inheritance and stuff. So I really like this article that Hynek put together, because I think people should think about the ramifications more. So the general gist is he prefers composition over inheritance, and I do too. But then he goes through when to use inheritance if you have to, and sometimes you do in Python. For instance, the greatest example I know of is when you're having exception hierarchies.

Starting point is 00:01:34 And it's really easy to build up exception hierarchies in Python. There's like nothing there except for the class definitions and their inheritance. And that's it. The easiest class you've ever created. Class, exception name, pass. Yeah. But it's useful to do that.
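
To make the "class, exception name, pass" idea concrete, here is a minimal sketch of an exception hierarchy. The names (`AppError`, `ConfigError`, `MissingKeyError`, `read_setting`) are made up for illustration and do not come from the article.

```python
class AppError(Exception):
    """Base class for every error this hypothetical package raises."""


class ConfigError(AppError):
    """Something is wrong with the configuration."""


class MissingKeyError(ConfigError):
    """A required configuration key is absent."""


def read_setting(config: dict, key: str) -> str:
    # Raise the most specific error we can; callers decide how broadly to catch.
    if key not in config:
        raise MissingKeyError(key)
    return config[key]


def describe_setting(config: dict, key: str) -> str:
    try:
        return read_setting(config, key)
    except AppError as exc:
        # Catching the base class also catches every subclass.
        return f"failed: {type(exc).__name__}"
```

The classes carry no behavior of their own. The inheritance chain is the whole point, because it lets a caller catch `AppError` broadly or `MissingKeyError` narrowly.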

Starting point is 00:01:52 But then if you want to go further, there's other design patterns and stuff, especially from the C++ world where people might be thinking, well, I want to do something similar in Python and stuff. And so this is actually kind of a really great article discussion about it. It's pretty long. I don't want to summarize it too much, but I'll jump into the three types. So he talks about three types of subclassing that often happen. The first is subclassing for code sharing. And the short answer is people are trying to follow the DRY principle and share code. And it ends up being a bad idea, essentially. And there's a bunch of references for it. And I think if you don't believe me or him, read this article and read a bunch.
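
As a rough illustration of the code-sharing point (not an example from the article), the sketch below contrasts inheriting from a helper just to reuse one method with holding that helper as an attribute. `Logger`, `InheritingOrderService`, and `OrderService` are hypothetical names.

```python
class Logger:
    """Tiny helper whose behavior other classes want to reuse."""

    def __init__(self):
        self.lines = []

    def log(self, message: str) -> None:
        self.lines.append(message)


class InheritingOrderService(Logger):
    """Inheritance for code sharing: the service "is a" Logger only to get log()."""

    def place(self, item: str) -> str:
        self.log(f"placed {item}")
        return item


class OrderService:
    """Composition: the service "has a" logger that is passed in explicitly."""

    def __init__(self, logger: Logger):
        self._logger = logger

    def place(self, item: str) -> str:
        self._logger.log(f"placed {item}")
        return item
```

With composition, the service's public surface stays small (it exposes `place` and nothing from `Logger`), and a different logger can be swapped in for tests.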

Starting point is 00:02:37 He's got a whole bunch of linked articles, too, that discuss it. But I kind of agree. The second type is abstract data types or interfaces. In a lot of languages, they're called interfaces. And this is kind of a neat use of it, and there's a bunch of things. And I thought, okay, yeah, you definitely will use inheritance and composition for data type stuff.
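
One way Python can express an interface without any subclassing is `typing.Protocol` (Python 3.8+). This is a minimal sketch with hypothetical names (`Quacker`, `Duck`, `RobotDuck`), not code from the article.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Quacker(Protocol):
    """Anything with a quack() method satisfies this interface."""

    def quack(self) -> str: ...


class Duck:
    def quack(self) -> str:
        return "Quack"


class RobotDuck:
    def quack(self) -> str:
        return "beep"


def make_noise(thing: Quacker) -> str:
    # A static type checker verifies the argument structurally;
    # neither class inherits from Quacker.
    return thing.quack()
```

`@runtime_checkable` is only needed if you also want `isinstance` checks. Those checks look at method presence, not signatures.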

Starting point is 00:03:06 But in his discussion, he talks about some of the cool things that Python has that allow you to have these sort of hierarchies without actually doing subclassing. So there's some cool features of Python, like the Protocol syntax that came in recently in typing.Protocol. Yeah, Protocol is like formal duck typing,

Starting point is 00:03:31 which is an odd thing to combine, but yes. Yeah, but it's kind of really cool how it's put together in Python. So I like that. And then lastly is specialization. And that's where kind of the exception hierarchies come in. But also he's got a great discussion about structuring data classes that have common elements. And I think that's an interesting discussion too. And I think I already said this, the summary

Starting point is 00:03:57 really, it's really hard to summarize this article other than it's good to think about your design, especially if you're going to try to bring subclassing into it. So let's do that. Yeah, awesome. I haven't had a chance to dive into this article, but I do want to read it and explore it. You know, it touches on a couple of things, like it touches on namespaces and modules,

Starting point is 00:04:15 which I think is pretty interesting. So many people coming from C++, C#, Java, etc., like all these really strongly OOP languages, especially C# and Java, where everything has to be a class. You'll see people creating classes just for things like static variables and so on, right? Or static functions. If you just have a bunch of static functions, you know what works really well for that? A module that has functions in it, right? That's

Starting point is 00:04:37 the same thing. You import module. Then you say module dot function name is the same as from module import class, class dot static function name, right? Like it's just a layer that doesn't really need to be there. So the article touches on that, which I think is neat. Like sometimes you just don't need those. And then also the composition over inheritance. I think composition over inheritance is a really important thing to think about because so often people say, well, you can't use OOP because it's horrible in all these ways. And you end up with like a robotic duck that can't quack. I don't know. Like you end up with these weird situations. If you like derive too many things, then you put a weird

Starting point is 00:05:11 specialty on the end. You're like, a duck is an animal, and then like, but it has wings, but wait, now it's a robot. Now why does it eat water? You know, it's like, what happened to it, right? But the composition allows for you to keep things much more tight and small in the inheritance stack, but still put them together in meaningful ways. So anyway, yeah, I want to see more about this. This looks great. And I'm coming from a standpoint of, I'm a C++ person as well, and I've done both extremes. I've gone way down the inheritance hierarchy thing and had like seven deep in a hierarchy, maybe not seven, but like five deep, and it gets to be a nightmare. So then I went to the other direction and didn't do any inheritance at all in a design,

Starting point is 00:05:52 but there's problems there too. So yeah, thinking about it and doing it smartly is what you need to do. Yeah, so it's often like salt. It's really good when you have some, but then you go, like, salt's great, I'm going to have that for dinner. Like, no, you shouldn't do that. I think the other thing that's really important about this is depending on how long you've been working in Python, sometimes you kind of get stuck in a rut and you're always doing the same thing. And the language has evolved and grown over time. And so I think articles like this kind of force you to take a step back and see if you're using all the new features in a way that maybe isn't idiomatic. Also, quick comment out there from the live stream. Paul says, first time watching the live

Starting point is 00:06:35 stream. Hey, Paul. Weird seeing everyone when they say the intro. Indeed, it is kind of weird. But I want to highlight this one to say hi to Paul. Thanks for being here. But also, if you're listening, you're like, hey, I'd kind of like to see what's on the screen while you're all talking about this. Just follow us on YouTube. There's a live stream link right on pythonbytes.fm. So it's easy to sign up for that. And also Sam out there in live stream following up on that says currently

Starting point is 00:06:56 maintaining a library with deep templated class hierarchies. It's very hard to keep track of it all. Yeah, I hear you. That's for sure. All right, let's switch over to the next one. Now, I tried to resist this, Brian, I promise I did, but

Starting point is 00:07:13 i've ended up with an extra extra with seven more extras here all about it and it just had to become a main item because otherwise we'd be here for hours it's not the idea of the show so we've got an extra extra here all about it nine extras uh not the idea of the show. So we've got an extra, extra, you're all about it. Nine extras. Let's pull them up. Extra number one. We've talked about Pyodide. I had a whole talk Python episode on Pyodide, which is an interesting thing.

Starting point is 00:07:34 It is this project by Mozilla where you take Python and you run it in the browser. And then you take many of the data science packages like NumPy and Matplotlib and stuff and compile them into the browser. And then you basically have client-side Python data science, which is really interesting. This project is being spun out as its own topic, as its own project. It's no longer under Mozilla. Usually that doesn't sound good to me. It kind of sounds like it's been orphaned. I have no idea what the status of Pyodide is. People can check that out, but it's no longer under Mozilla. It's its own separate thing, as they say. So it's cruising out there. And also it didn't get compiled to JavaScript. It got compiled to WebAssembly, which is interesting because that's faster. All right. That's number one. Number two, I just,

Starting point is 00:08:19 as in a couple hours ago, released a brand new course, Python Powered Chat Apps with Twilio and SendGrid. So the idea is if you want to have some kind of chatbot, but a lot of that conversation has to involve your database and your data and verifying things, like the app that we built here is a tech-savvy

Starting point is 00:08:38 bakery where you can order cakes by sending it a WhatsApp message. And then it'll say, hey, you want a cake? Well, here's the menu. And it actually gets the menu from our flask app. And then they pick something off the menu. And then once they pick all the details, they said like this, okay, great. We send it back to our website and figure out how much that's going to cost. They order it goes back.

Starting point is 00:08:57 Once they accept it, they get like a customized pretty email. It goes back to the backend, the bakers bake it, and it sends them another message to let them know. So if you want to build that kind of workflow with Twilio and SendGrid, check it out. This course is super fun. It's six hours and it's a hundred percent free. So people can check that out if you're trying to build that kind of thing. That'll be a lot of fun. So links in the show notes there. I had something. Yes. If I can't afford free, can I get a discount code? I will give anyone

Starting point is 00:09:26 listening 50% off that. So I have this really cool tweet, and Twitter is broken, from what I can tell, for everything that's not the homepage. So something went wrong, but let me describe it. So when you look at it in the show notes, you'll be able to see there's a really cool tweet from Nick Mall, who was a guest on the show last week. Oh, you got it. How can you get this to work?

Starting point is 00:09:48 You've got some sort of magic. All right. Well, so thanks for putting it on your screen. So here we have, um, Will McGugan, uh,

Starting point is 00:09:55 showing an animation of basically this really cool, like, collapsible sidebar and scrolling within sub-windows inside of Textual. Right, we talked about Textual as well. And it's just such a cool graphic that says, like, wow, you can build some pretty amazing applications there. What do you think? Will's just knocking it out of the park with this. It's fun to watch him go so fast. Absolutely. Well done there, Will. I'll switch back to mine for a moment. Okay, ours technically works on mine. So remember we did an episode and I titled it something like FLoC No or something like that. So FLoC, Federated Learning of Cohorts, is something that Google was trying to do so that they can replace third-party cookies. Why?

Starting point is 00:10:40 Because people are running ad blockers or, like I am right now, a VPN that at the network level blocks all the ad tracking and third-party cookies. So they're just basically not working very well anymore. So they're going to cancel third-party cookies in Chrome, which means they're canceling them for the Internet. But because they're Google and they're based on ad revenue, primarily, they can't just go, and we canceled tracking. Hooray. We're all winning on privacy, right? It has to be replaced with some other form of tracking, which they call this federated learning of cohorts.

Starting point is 00:11:15 But the federated learning of cohorts has all these almost more negative consequences. And I don't want to go too much into that because we went into quite a lot of detail. But for example, you could say, I would like to target lesbians who just got divorced. You run an ad on that. People show up on your site. They sign up. You have an email. And now guess what?

Starting point is 00:11:35 Not only do you know what their email is, you know that they're in this group. And maybe this is the very first time you've ever met them, right? So really weird, creepy stuff that you could pull out with this. Anyway, the big news is Google delays the rollout till 2023 because, you know what, people don't like it. They're not super keen about it. So there's a whole bunch of people who are against that. But you're saying they're just delaying it, not stopping it? Yes, for now. This is a great article that people should check out. Like, let me read the first sentence or two. Google's plan to upend web advertising and user tracking

Starting point is 00:12:10 by dropping third-party cookie support in Chrome has been delayed. Most browsers block third-party tracking cookies now, as does the VPN like I mentioned, but Google, the world's largest advertising company, wasn't going to follow suit without protecting its business model first. But there's a lot of challenges with this. A lot of people have come out against it. And yeah, it's not going to work out super well. So they decided to delay it. That's what they said. Stage two starts mid-2023. Google says it's received substantial feedback, including from us. And other companies out there are like, we kind of want to keep tracking too, but we're not really excited about this, so we're just going to not say anything. Like Apple, Opera, Mozilla, Microsoft. Yeah. What else should I share about this? Anyway, yeah, they've received substantial feedback. So hooray, I think, for now. One thing that we don't

Starting point is 00:13:03 talk very often about in Python is, what if you want to ship your code to somebody and it has sensitive algorithms in it, right? It's not that common, but you could get py2exe or py2app to bundle up your code and give it to somebody. For example, Dropbox does something to this effect, right? That's Python code running up in your little menu bar. There's other ones as well. But you might want to encrypt how that works or protect how that works so that people can't just open the .py files and look around. So there's this thing called Source Defender. I'll be clear, this is a paid commercial product.

Starting point is 00:13:35 I have no affiliation, but they sent me a message, hey, we're doing this thing, what do you think? It looks kind of interesting. I think it's going to be a pretty limited set of people who actually care about this. Like if you're running on Docker, you're running on the server, you probably don't care. Maybe you do, but probably not. But if you'd like to be able to encrypt your source code, so it's much harder to see and then ship that to somebody, you can use this thing as part of their

Starting point is 00:13:56 paid service. So that's kind of cool. People check that out. Let's see. Oh, there's a plate noise. I want that. So I was recently interviewed on a day in the life in a work from home Pythonista, which is a cool series being done by the folks in the Philippines. If you want a tour of the behind the scenes studio and all the work from home stuff, people can check that out. Python 3.9.6 was just released.

Starting point is 00:14:16 We can check out the changelog and see what's happening there. There's a security fix in the HTTP client for what I think is like a denial of service. It sounds like it avoids an infinite loop sort of thing. So that might matter to people. Probably not, but maybe it does. Then a bunch of changes that are happening here, including platform-specific ones. So if you're running Python 3.9, and why wouldn't you be, update to that. Because you're running 3.10. Yes, that's right. You're already ahead of the world. You're in the future.

Starting point is 00:14:46 So also we had Calvin on from Six Feet Up a while ago and we talked about the conference that he was putting together. Well, the videos from that conference are out as a YouTube playlist so people can check that out. I don't remember how many videos there are. Let's click on it and see.

Starting point is 00:15:02 There are 61 videos, including one on the Python memory deep dive talk that I gave. So if people want to check that out, they can. Let's see. Oh, this one. Check this out, Brian. Have you seen this? Did you know you can pip install Python bytes?

Starting point is 00:15:16 Yeah. You can literally pip install Python bytes because Scott Stoltzman created this for us as a joke. He was listening to one of our episodes. I can't remember what we talked about. This was episode 239, but we must've talked about packaging and pip and things like that.

Starting point is 00:15:33 So he created a package called Python Bytes. And what it does is basically you give it a number, like 240, and it would download this version as an MP3 file and put it right next to, you know, whatever the working directory is. So if you want to pip install Python bytes and then pythonbytes.downloadepisode instead of using a podcast player,

Starting point is 00:15:52 we're all for that. You can check that out. Yeah, that's it. That's extra, extra, extra, extra. Well, many, many extras. Yeah, so the Python bytes package is just sort of, it was for fun, but it also, it's really small. And one of the things I like about it is it's

Starting point is 00:16:07 just a really cool example of, like, with Python you've got something that downloads MP3 files off of a feed somewhere. It's that easy. That's pretty cool. Yeah, that's fantastic. Absolutely. All right, let's see a couple things from the live stream. Sam says, things have happened with Mozilla the last two years that really shook my confidence with them. I am still a big fan of Firefox and I support their mission. But yeah, I want to see them succeed. Let's see, another one from the live stream. Antonio said, hey guys, have you mentioned Kivy before? Hey, GUIs and Kivy, there you go. I watched a video about it this week. It's a GUI that's compatible with many things, including the mobile devices.

Starting point is 00:16:45 My feeling is that Kivy is more about building almost game-like interactions, whereas a lot of GUIs people want, they want like, here's a text box. I type in the text box, here's a button I drop in, you know? So, but yeah, pretty cool. Well, let's see. Kim Venwick says, as an aside,

Starting point is 00:17:03 shipping a Docker image won't obfuscate the Python. The image can be taken apart in files like that. That's see. Kim Venwick says, as an aside, shipping a Docker image won't obfuscate the Python. The image can be taken apart in files like that. That's true. They absolutely can. I was just thinking like you're probably just running on like a container service.

Starting point is 00:17:13 But yeah, if you're shipping it to someone, it's the same. Nick Harvey on the live stream says, could just send the .pyc files with no .py. It's not foolproof, but it does require more work. You're right.
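
As a rough sketch of the ".pyc with no .py" idea, the snippet below compiles a source file to bytecode, deletes the source, and imports the module from the bytecode alone. It uses only the standard library. The helper name `load_sourceless` and the module name `licensed_module` are hypothetical, and this is an illustration of the mechanism, not a protection scheme.

```python
import importlib.machinery
import importlib.util
import pathlib
import py_compile
import tempfile


def load_sourceless(source_code: str, module_name: str = "licensed_module"):
    """Compile source to a .pyc, remove the .py, and import from bytecode only."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        src = tmp_path / f"{module_name}.py"
        pyc = tmp_path / f"{module_name}.pyc"
        src.write_text(source_code)

        # Write the bytecode next to the source instead of into __pycache__.
        py_compile.compile(str(src), cfile=str(pyc), doraise=True)
        src.unlink()  # from here on, only the bytecode file exists

        loader = importlib.machinery.SourcelessFileLoader(module_name, str(pyc))
        spec = importlib.util.spec_from_file_location(
            module_name, str(pyc), loader=loader
        )
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module


module = load_sourceless("def check():\n    return 'ok'\n")
print(module.check())
```

A `.pyc` file is tied to the Python version that produced it, and the standard library's `dis` module can still disassemble the functions it contains, so this only raises the effort required.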

Starting point is 00:17:24 You'd basically be down to like dis.dis and reading the bytecode. Yeah, for sure. Let's see. Final one. Rayhan says, if it ends up running code on your machine, you can read it. It's about putting enough barriers that people won't bother. Yeah, that's definitely true. I mean, you think of C++ and things like that being completely opaque. And yet people take that apart all the time. But there is also a difference from I'm literally shipping you the source files here

Starting point is 00:17:48 to, you know, cause then you could go in like, oh, here's where the license check is. Let's just, you know, command slash comment that out. All right, now we're ready to run. Yeah. Right. You want to make it a little bit of a challenge, at least I suspect. Anyway, thanks for all the feedback out there, everyone. That's the everything. Extra, extra nine times. All right, Chris, what's your first one here? All right. So the first one is from Andreas Kanz, I think is how you pronounce it.

Starting point is 00:18:15 And it's a library called klib, I believe. I wasn't sure if it's pronounced K-lib or clib, but I think it's clib. And it's for automated cleaning of pandas data frames. I guess I should even say it's a little bit more than just cleaning. It's automated analysis. And, you know, I'll be the first to say I'm a little skeptical about some things that try and automate the process, but I was playing around with it. And there's some pretty cool things that it does. Probably the best way to learn about it is the Towards Data Science article that he wrote, which gives a pretty nice overview of what it does. It has, as I mentioned, some pretty nice cleaning features as well as analysis features. So I was going to describe a couple of things. The first one that I thought was really interesting is this function called data_cleaning, and you can control what it does. So it can clean the column names, it can convert data types, it can drop missing values. So one of the things that pandas does is it's not really aggressive about the data types

Starting point is 00:19:32 that it uses. So when you read in data, it will just kind of assign it maybe to a float or an object. But if you want, you can get in there. And if it's a column, let's say, that has only values less than 100, converting it to a small integer type saves memory. If you save enough memory, then you can actually speed up your code. And so this goes behind the scenes and takes your data frame and converts it essentially to the smallest NumPy type that can store the values. And then, you know, I took a random data set and sure enough, it did reduce the memory footprint quite a bit, which I thought was pretty interesting because it's one of those things that is very tedious to do on your own by hand.
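
To show what doing this by hand looks like, here is a minimal sketch in plain pandas using `pd.to_numeric(..., downcast=...)`. It is not klib's implementation (the speakers had not looked at that code), and the helper name `shrink_numeric_columns` and the sample data are made up.

```python
import pandas as pd


def shrink_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose numeric columns use the smallest dtype that fits."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Picks the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # float32 is the smallest float type the downcast will choose.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


df = pd.DataFrame({"age": [23, 45, 67], "price": [1.5, 2.25, 3.0]})
small = shrink_numeric_columns(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Downcasting floats trades precision for memory, so it is worth checking that `float32` is accurate enough for the column before applying it blindly.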

Starting point is 00:20:18 Does it do, like if you have the same string, does it just create a pointer to one copy instead of having that many times, stuff like that? It can do that by converting to a category type. That's essentially what pandas is doing. When you create a category, it does that to kind of string to like a list conversion. And it's, you know, it's pretty effective. And yeah, I've used the category piece before, but I haven't actually gone in and tried to, you know, shorten up the numeric columns,

Starting point is 00:20:47 which is really useful. The other thing... Can you just convert them to all the integers and then it'll just be shorter so you don't have to worry about the size? I'm just teasing.

Starting point is 00:20:55 Yeah, yeah. You probably knew that's the point. Yeah, no, no. But, I mean, it does even do, it's like, it can do even like

Starting point is 00:21:01 int16s or int32s or int8 maybe. Oh, yeah, interesting. Like, it'll shrink to the size that'll fit, like, these are all under 256, so we'll go to one byte. Exactly, exactly. You know, and I haven't looked at the code to see how it actually figures it out, but I had a fairly large data frame and it was pretty quick. The other one that was interesting is clean_column_names. So I think there are some other libraries out there that will, like, strip spaces or special characters from column names. But what this one will actually do is, if you have a column name that has, let's say, camel case,

Starting point is 00:21:35 it'll convert it to all underscore, or it will just essentially normalize all of your column names, which, you know, you could have a debate about whether you want to do that. But when you have a data frame that has a lot of columns, and you're just looking at it the first time, that can really be helpful. And then the other function that works pretty well is for cleaning duplicate data or empty data. So if you have a lot of columns that have no values in them, or, you know, maybe 90% of the values are empty, you can set thresholds and just clean that all out. So I was playing around with it and I was pretty impressed. And I kind of wanted to call it out because the documentation right now is mostly around the Jupyter notebooks that he has.
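
The camel-case-to-underscore normalization described above can be approximated with two regular expressions. This is a hedged sketch of the general idea, not klib's actual rules, and `normalize_column_name` is a hypothetical helper.

```python
import re


def normalize_column_name(name: str) -> str:
    """Turn names like 'unitPrice' or ' Order ID ' into snake_case."""
    # Insert an underscore at each lower-or-digit to upper-case boundary.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


columns = ["unitPrice", " Order ID ", "CustomerID", "Total($)"]
print([normalize_column_name(c) for c in columns])
```

On a data frame, the same function could be applied with `df.columns = [normalize_column_name(c) for c in df.columns]`. Note that two different original names can normalize to the same result, so checking for collisions afterward is a good idea.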

Starting point is 00:22:30 So I think, you know, it would be nice if we could get some more docs in there and some more examples. But overall, I was really impressed with the library. And I think people should kind of take a look at it and see if it's something they want to use for some of their own processes. Yeah, some of them sound interesting, even if you don't have to trust it, right? Like the shrink to the smallest data type, for example, or normalize column names. Those don't seem as risky as, you know, clean it up. Exactly. Find the wrong data.

Starting point is 00:23:03 Exactly. And then I forgot to mention, it also has some nice correlation plots. And some of these things you can already do with Seaborn or Matplotlib, but I found that it gives you a little more control and it's just a little bit easier to do it. There are certainly other tools out there that do this as well. Oh, and then the categorical data plots, I thought was a nice summary of the data and gives you some nice graphs and it helps you understand where you've got some missing values. But yeah. The visualizing the missing data is a really interesting feature. Yeah. And there is another pandas library called missingno that does this and does it well, but I think

Starting point is 00:23:45 this is a unique combination. Especially some of the memory-saving features that it has are pretty neat. Yeah, so the cleaning features, though, there's a lot of parameters to it, so it looks like you have a lot of control. And, I mean, again, this is open source, so it isn't that magical. You can just look at the source and see what it's doing. Exactly, yeah, and that was one of the things I was looking at, is like data_cleaning, I think, is kind of the top level, and you can just run that wide open and it'll do everything.

Starting point is 00:24:17 And it actually prints out a pretty nice summary of what it does, but you can also go in there and specify parameters, like you said, to control it so that maybe it doesn't rename the columns or drop some of the missing data. The other thing that I tried to play with that seemed really interesting is this pool_duplicate_subsets. And essentially what it tries to do, and I had a little bit of trouble with this because I think I threw too much data at it, but it tries to, maybe if you have 10 columns of data,

Starting point is 00:24:49 it says, well, you know what? Four or five of them are very heavily correlated. So we're going to drop them and just give you the four or five that are actually most useful. And so I think that's some interesting tools to use when you get some data that maybe you haven't worked with before.

Starting point is 00:25:05 Yeah. Yeah. Very nice. What a, what a good find. Brian, you got the next one. Sure.

Starting point is 00:25:10 Yeah. Just a second. So I wanted to remind people to, every once in a while, look at functools, because I've experienced functools as kind of an interesting library that's built in that kind of grows with you. So if you're new to Python and you look at it, it's going to be confusing. It's like all intermediate stuff in there. But as you learn and experience more Python programming, come back to it every once in a while because there's stuff in there that you'll use that you didn't think about before. So I'm going to go

Starting point is 00:25:50 through a few things. And actually I wanted to call out, there was an article by Martin Heinz that I read that kind of reminded me to go through and look at this. So I want to shout out to him. Thanks. We've talked about some of this stuff before. We talked about function overloading and using singledispatch as one of the ways you can do function overloading in Python, which is cool. And that's part of functools. And hopefully people are familiar with wraps.

Starting point is 00:26:19 Wraps is a way to create decorators that act like the thing that you decorated. And so if you're writing decorators, make sure you check out wraps and then caching as well. We talk, I think I'm sure we've talked about LRU cache, but I'm sure I have. Yeah.
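
A small sketch of the two functools features mentioned so far: `wraps` for well-behaved decorators and `singledispatch` for overloading on the first argument's type. The function names (`count_calls`, `add`, `describe`) are made up for illustration.

```python
import functools


def count_calls(func):
    """Decorator that counts calls while keeping func's name and docstring."""

    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    """Return a + b."""
    return a + b


@functools.singledispatch
def describe(value):
    # Fallback used when no more specific implementation is registered.
    return f"object: {value!r}"


@describe.register
def _(value: int):
    return f"int: {value}"


@describe.register
def _(value: list):
    return f"list of {len(value)} items"
```

Without `functools.wraps`, `add.__name__` would report `"wrapper"` and the docstring would be lost, which confuses debuggers and documentation tools.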

Starting point is 00:26:35 Yeah. So that's in functools, the caching. And new in 3.9, there's just a simple cache. You don't have to say LRU cache.

Starting point is 00:26:46 And it's just a convenience wrapper around LRU cache. But there's no max size. So you don't want to use that for things where you actually want to throw items away. But caching is super cool. So check that out. When I first saw the LRU cache, I'm like, whoa, I got to go figure out what this LRU is. And it's not like, rather than just like cache the response. I guess the other question, though, you might have is like, well, what if you pass two arguments or sets of arguments?
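
Here is a small sketch of both decorators. `lru_cache` keeps a bounded number of results and evicts the least recently used one, while `cache` (Python 3.9+) is unbounded. Results are keyed by the call's arguments, which must be hashable. The functions `square` and `fib` are illustrative.

```python
import functools


@functools.lru_cache(maxsize=2)
def square(n):
    return n * n


@functools.cache  # same as lru_cache(maxsize=None); requires Python 3.9+
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


square(1)  # miss: computed and stored
square(2)  # miss: computed and stored
square(1)  # hit: served from the cache
square(3)  # miss: cache is full, so the least recently used entry (2) is evicted
print(square.cache_info())
print(fib(30))
```

`cache_info()` is handy for checking whether a cache is actually being hit, and `cache_clear()` empties it.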

Starting point is 00:27:14 How are those? Yeah. So either way, it's kind of not 100% totally obvious what's going to happen. Yeah, it's very cool. Yeah. So there's a bunch of caching stuff in there, like the LRU cache. But then you can also cache a property. And actually, the property one I hadn't used before, but I was playing with it this morning, and it's really cool.

Starting point is 00:27:33 So, like, for instance, if you've got a data class or any class that has a bunch of stuff, and you have an expensive read on one of those things, because you have to calculate the value, you can throw a cached_property on it and it looks pretty cool. One of the neat things about it is it only reads it once and then it caches the value of the property. And if you need it to refresh, you call delete on it, which is kind of weird, but kind of cool also. It's odd to call delete on something that you want to still be there, and it'll just reread it next time. So that's how that works. That is weird. That's definitely weird. Total ordering, I didn't realize was there. So if you have some data type that you want to be able to compare, you can use total_ordering to define equal and one other operator, and then you get

Starting point is 00:28:32 all of the comparison operators. You can use that. And then the last ones I wanted to highlight are partial and partialmethod, which are kind of neat in that, like, let's say you've got a function that takes a whole bunch of arguments, but you want to pre-fill some of those in and create a new function that has some of the arguments pre-filled in. You can do that with this, and it's pretty neat.
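
The three features just described can be sketched in one short example: `cached_property` (including the `del` refresh), `total_ordering`, and `partial` / `partialmethod`. The classes `Report`, `Version`, and `Cell` are hypothetical, and `Cell` follows the pattern shown in the functools documentation.

```python
import functools


class Report:
    def __init__(self, values):
        self.values = list(values)
        self.computations = 0

    @functools.cached_property
    def total(self):
        # Runs once; the result is then stored on the instance.
        self.computations += 1
        return sum(self.values)


@functools.total_ordering
class Version:
    """Defines == and <; total_ordering fills in <=, > and >=."""

    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


def power(base, exponent):
    return base ** exponent


# partial pre-fills arguments and returns a new callable.
square = functools.partial(power, exponent=2)


class Cell:
    def __init__(self):
        self.alive = False

    def set_state(self, state):
        self.alive = state

    # partialmethod is the same idea for methods defined in a class body.
    set_alive = functools.partialmethod(set_state, True)
```

Deleting the attribute (`del report.total`) drops the cached value, so the next access recomputes it. Until then, later changes to `values` are not reflected in `total`.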

Starting point is 00:29:00 Yeah. Okay. Interesting. I see, you partially supply some of the arguments, but not all of them. Yeah. So, yeah, just a shout-out to this. These are intermediate or advanced topics, but as you learn more Python, come back to this every once in a while and you might just find it useful. Yep, indeed. I was like, how did I miss this cached property thing? Like, surely I would have paid attention to that, because so often these properties are computed things, but they often don't change, right? You get something back from the database, it has time stored in seconds, you want to know how many days it is,

Starting point is 00:29:38 so something happens, you might have a days property, right? But that's probably not good. So having that cache is cool if you're sure it's not going to change. But I'm like, how did I miss it? It's new in 3.8, so it's not super old. Yeah. And like Chris said, one of the reasons to revisit a lot of these things and pay attention to the news on Python is because the language changes like this. So yeah, for sure. Kim out there in the live stream says it's also worth looking at itertools from time to time. Definitely. Great.
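The seconds-to-days idea can be sketched with functools.cached_property like this. It's a rough illustration, not code from the episode; the Session class and its attribute names are hypothetical. It shows both points made above: the value is computed once, and del on the attribute clears the cached value so the next read recomputes it.

```python
from functools import cached_property


class Session:
    """Hypothetical record that stores a duration in seconds."""

    SECONDS_PER_DAY = 86_400

    def __init__(self, seconds):
        self.seconds = seconds
        self.computations = 0  # counts how often days is really computed

    @cached_property
    def days(self):
        self.computations += 1
        return self.seconds / self.SECONDS_PER_DAY


s = Session(172_800)
print(s.days)  # 2.0 (computed)
print(s.days)  # 2.0 (served from the cache)

s.seconds = 86_400
print(s.days)  # still 2.0: the cached value is now stale

del s.days     # drop the cached value
print(s.days)  # 1.0 (recomputed on the next read)
print(s.computations)  # 2
```

The stale read in the middle is the reason for the caveat about only caching things you're sure won't change.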

Starting point is 00:30:06 Indeed. It's in the same level of complexity, but for collection. I just kind of like that. You wouldn't first go there, but eventually like, oh yeah, this is what I wanted. I just didn't know it. Speaking of things you didn't know it, let me scare you all a little, make you all delighted.

Starting point is 00:30:19 I don't know. You tell me how you take to this. So let me set the stage. GitHub has a little bit of source code, much of it actually public, right? Like it's public repos and whatnot. So it can be analyzed and talked about and shared or used to train an artificial intelligence, which is pretty crazy. And if you look at the artificial intelligence around text, there's the GPT-3 stuff, which is like scary good text-based AI. Well, they decided, what if, you know, our parent company

Starting point is 00:30:47 also makes this editor? What if we had an AI based on understanding the source code from GitHub, like all the source code from GitHub, and put it into VS Code, and then it did stuff? Have you all seen this? It's called GitHub Copilot. Yeah, yeah, yeah. I haven't tried it. I was going to put the link in there and you beat me to it. Oh yeah, I was on top of it. So if you go over here, it actually works for TypeScript, Go, Ruby, Python, and a couple of other languages. It says it works for many languages, but it's best on those, of course. But if you just look at their home page, copilot.github.com, they've got this little animation, and it says, I'm going to write a function called parse_expenses.

Starting point is 00:31:27 And it takes some kind of text. And you put a docstring, literally a docstring in Python. It says: parse the list of expenses and return the list of tuples (date, value, currency); ignore lines starting with hash; parse using datetime. Here's some examples. Tab. And then it writes the code that does that. And let's see, what is it going to do? It's in the middle of the animation. It creates a list of expenses,

Starting point is 00:31:51 it goes through each line of the split text, and if the line starts with hash (this is all Python code), it continues on in your loop. Otherwise, date, value, currency equals split it. And then it knows how to parse the date one, convert the value to a float, and then store the currency as a string. And it's not just that sometimes it'll do this. You can actually get alternate implementations by tabbing through its recommended solutions, which is pretty crazy. So this is powered by OpenAI's, it's called Codex or something like that. I don't see it right here right now. Anyway, I'll probably run across it in a second. That's what it's powered by. And it says things like, you're the pilot. So with GitHub Copilot, you're always in charge. You can cycle through

Starting point is 00:32:34 alternative suggestions and choose which to accept or reject and then manually edit the suggested code. Oh yeah. And it learns from you. So I don't know, this is wild, y'all. This is pretty wild stuff here. What do you think? I think it's really impressive. I mean, it will be interesting to see what it's like when you use it in real life. And I think that there could certainly be limitations. But I don't know about you, but whenever I'm programming, there's always these things where I just need to go and look at the documentation or look at Stack Overflow to refresh my memory. Right, like I've got to connect to SQLAlchemy and I totally forgot how to do those three steps for that connection string sequence, right? Exactly.

Starting point is 00:33:16 Yeah. Someone was throwing a little shade at that example that you're walking through, because they said, well, why are you storing the currency as a float? It should be a decimal, because if you store a currency as a float, you're going to have all the rounding issues. Well, that's how Superman makes all his money, or the evil villain in Superman. What was it? One of those shows. Yeah, Richard Pryor in one of the original Superman movies. Yeah. Yeah. And it's not just based on the docstring. Like, the example I first spoke about was you write a complex docstring and then say, do that thing.
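Here is a rough sketch of the parse-expenses function described above, with the Decimal suggestion applied. This is not Copilot's output; it's a hand-written illustration, and the input format (one "YYYY-MM-DD value currency" entry per line) and the sample data are assumptions.

```python
from datetime import datetime
from decimal import Decimal


def parse_expenses(expenses_string):
    """Parse expenses into a list of (date, value, currency) tuples.

    Assumed line format: 'YYYY-MM-DD value currency'.
    Blank lines and lines starting with '#' are ignored.
    """
    expenses = []
    for line in expenses_string.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        day, value, currency = line.split()
        expenses.append(
            (
                datetime.strptime(day, "%Y-%m-%d").date(),
                Decimal(value),  # exact decimal value instead of a float
                currency,
            )
        )
    return expenses


sample = """\
# food
2016-01-02 -34.01 USD
2016-01-03 2.59 DKK
"""

rows = parse_expenses(sample)
print(rows[0])

# The rounding issue behind the float criticism:
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

Passing the original string to Decimal matters here; building a Decimal from a float would carry the float's rounding error along with it.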

Starting point is 00:33:51 But you can do it based just on function name. You can just type a meaningful function name. What was the example they used? I can't remember. But yeah, so you basically just write a docstring, a comment, a function name, or even some code to give more context to it. And then off it goes. So yeah, Codex, that's the name of the AI system behind it.

Starting point is 00:34:14 So basically, this is a plugin for VS Code, but a really nice one. So here's some examples we'll all be familiar with. So, fetch tweets. And the example here is you literally write def fetch_tweets_from_user, tab, and then what it autocompletes with is, oh, you're going to need to pass the username in. And then here's how you authorize with Tweepy, set up the API credentials. And then here's the code you write. Oh, yeah. And

Starting point is 00:34:39 here's your return. Or I want to do a scatter plot. And you write import matplotlib.pyplot as plt, draw_scatterplot, tab, and then boom, there it is. Or memoization. I wanted to point this one out because of what you're covering, Brian. It says, oh, here's how you memoize a function, which is to say, if it's passed a set of arguments, it's always going to return the same answer, so just give that answer. Like, remember that these arguments equal this return value once it's run. And it shows how to create a complex decorator that is going to have a function that remembers the values using caching, when it could just go @functools.cache. You know what I mean? So there's things like that that are missing, right? Because you could achieve the exact same outcome with the functools.cache decorator, right, instead of trying to write

Starting point is 00:35:24 a bunch of code that re-implements that. But anyway, pretty wild thing. I don't know really how to feel about that, but thinking about this today, it's kind of freaking me out, but it's also kind of cool. Yeah, I wanted to point out a comment

Starting point is 00:35:34 that people have been making in relation to this, which is, you know, I wish we could just specify what we wanted the computer to do, and it just does it. And we already have that. It's called code.
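To make the memoization point from a moment ago concrete, here's a hedged side-by-side sketch: a hand-rolled memoize decorator of the general kind described (written for illustration, not the code Copilot generated) next to the one-line functools.cache version. The Fibonacci functions are just stand-ins for any function whose result depends only on its arguments.

```python
from functools import cache, wraps


def memoize(func):
    """Hand-rolled memoization keyed on positional arguments only."""
    results = {}

    @wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)  # first call with these arguments
        return results[args]             # later calls reuse the stored answer

    return wrapper


@memoize
def fib_manual(n):
    return n if n < 2 else fib_manual(n - 1) + fib_manual(n - 2)


@cache  # same effect with no custom decorator to write or maintain
def fib_cached(n):
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print(fib_manual(30), fib_cached(30))  # 832040 832040
```

The standard-library version also handles keyword arguments and exposes cache_info() and cache_clear(), which the hand-rolled sketch does not.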

Starting point is 00:35:50 So anyway. Yeah. I don't think this, you know, people often say things like, I remember hearing this 20 years ago: oh, this low-code thing where you create these little boxes that do stuff and you drag and drop between them, we're not going to need programmers anymore. We're all just going to become draggy-droppies.

Starting point is 00:36:06 And then, like, you programmers won't be needed. The business people will just draggy-drop their way to the future. And that never, ever happened, right? Because people have got to put them in production. They've got to debug them. They've got to scale them and so on. Yeah. Yeah.

Starting point is 00:36:20 I think the same thing here. Like, sure, it wrote it once, but you can't have a write-only experience for your code. You have to understand your code and be able to evolve your code and work with... This might power you into a solution faster, but I don't think it escapes the need for people doing meaningful software work. The person that pointed out,

Starting point is 00:36:39 and several people pointed out, the example of using floats and money, that does highlight one of the problems with something like this, though, that everybody needs to be careful of: the code that's generated. You were already carefully thinking about it when you were creating it. But if something else creates it, you've got to scrutinize that to make sure it's really doing the right thing. And so you're code reviewing some AI code while you're coding your own stuff. That's just a different part of your brain. You

Starting point is 00:37:11 got to make sure that you're really paying attention. Yeah. Yeah. And I was even looking at that matplotlib example, and I would argue that's not really the way you should do a scatter plot in matplotlib, because you should use the object-oriented interface in matplotlib. I mean, the code will work, but I wouldn't advocate that you use that code. And so to your point, I think it will be interesting to see if it does learn from your own coding style. So does it start to recognize those things that you're always doing, you know, like you said, connecting to a database or fetching a file or doing a certain pandas function? Will it start to learn that? And I thought I read something about it adapting to you and learning from what you're doing, but I have no idea what that actually means.

Starting point is 00:37:57 Yeah, hopefully it's paying attention. So if it generates something and you change it to the different method and everybody else is doing that also, maybe they'll stop suggesting the old one and start suggesting the new one. Yeah. I'll, you know, Chris, your point about having to, um, maybe it was you, Brian, sorry. Whoever said about, you've got to like criticize this and you didn't write it. So you basically have to study it and then, then understand or understand it and study

Starting point is 00:38:20 it to make sure it's doing the right thing. You know, I, a couple of years ago, I don't know, a while ago I was river floating and broke my hand on some rocks, broke my finger in a bunch of places. And like my fingers were completely wrapped up all the way to the very tips. There was no like, Oh, little pecking typing while my hand healed. It was like, nope, no one handed really slow. So to keep things going, I used voice to text to try to at least keep email flowing for a month or something. And what I found was I could write pretty decent emails. It's hard to stop and think in whole sentences the way the little tools like it to work, but you can get it to work pretty well. But the mistakes it makes that are phonetically correct,

Starting point is 00:39:01 but actually wrong for what you mean, like "their" and "they're," or something that sounds like what you said but is actually not what you meant to say, are incredibly hard to catch. It's much harder to understand and edit than you would think. And so with things like this, like, well, I wanted it to do that, and I hit tab, and OK, it's doing it. I feel like there's going to be a lot of blind spots. Yeah, well, it did what it says it did. And I typed the thing and it seems right. And, like, how do you really know? It just seems like it's the same type of situation. It's going to be harder than normal code to check because you didn't have to think through it to create it, you

Starting point is 00:39:34 know. Yeah. A couple comments from the live stream. Rehan, don't give them ideas, says, "Greetings, Dr. Falken. Gosh, where have you been? Let's play thermonuclear war," as a docstring. And Nick says, I can't help but think of Microsoft Tay. Microsoft Tay was this really cool bot that was super good at adapting to stuff, and they put it on Twitter, but people decided to be mean to it instead of teach it. I think on, like, Japanese Twitter, it became a very kind and intelligent bot, but on, like, English Twitter, it got turned into a racist, horrible creature right away.

Starting point is 00:40:12 And they actually had to cancel the project. So yeah. And then Arthur says, so the next April Fool's Day prank: everyone start writing terrible code that influences the AI. And this is why English Tay went down the tubes. Let's see.

Starting point is 00:40:30 And then Sam: for goodness sakes, don't train it on GitHub code. It'll arbitrarily turn on debug mode. Yeah, perhaps. Yeah, Kim thinks this is both very impressive and vaguely unsettling. And that captures what I was thinking. Rahan, will you go and talk to the marketing people for me? I'm good with people. That's what I do.

Starting point is 00:40:58 Yeah. Okay. Another thing that's not mentioned here explicitly, but I think is interesting is this code is coming from GitHub. Yeah. When I go and I'm saying like, I'm working on super secret commercial project for large organization that has lots of people trying to scrutinize it. And I hit memoize tab. It's going to write some amazing code. Oh, by the way, was that GPL? Where did that code come from? Right? Like what's the license of the code that was on GitHub that I just now all of a sudden grab something that turned, you know, like if I was doing this on Windows and I hit tab, is Windows now open source? I don't know. That's a really interesting point. And you would think if it was a small startup, someone will probably sue them. But, you know, this is Microsoft now. Yeah, exactly. Yeah. Yeah. Anyway, so I'm I'm I agree

Starting point is 00:41:41 with Kim. This is both very impressive, and if this is the start, like, where will it go? It would be very amazing. But it's also vaguely unsettling at the same time. And I don't know how I feel about it, other than I wish it was in PyCharm so I could play with it more often. All right. Chris, you got the last one? I do. So this is another library, called Kats.

Starting point is 00:42:03 And it's a time series analysis library. And it's made by the same... Well, it's from Facebook. And a lot of people may have heard of Prophet, which is a library for time series forecasting. And one of the things that's interesting to me about Prophet and Kats is I think time series forecasting is something that's really common in the business world. I mean, you think about trying to forecast sales or maybe inventory movements or stock prices, a whole bunch of different use cases for it. And I think in general, most organizations don't have a group of PhDs that are really sophisticated in their analysis. So people use Excel and kind of come up with their own approaches. And that's why I thought Prophet

Starting point is 00:42:52 was interesting. And I think this is interesting because it does come from Facebook and you have to assume that they've got a lot of smart people that are doing a lot of forecasting. And they've taken some of the things that Prophet was good at and added some additional tools. So before I go too much into Kats, one thing I wanted to mention is I did write an article about Prophet, but this gentleman, Peter Cotton, also wrote an article about Prophet, essentially questioning how good it was. And this is a really long, really well thought out article. And some of the math and some of the concepts are way over my head.

Starting point is 00:43:37 But I do encourage people, if you're looking at time series forecasting, take a look at this. But what Kats does is, instead of just doing forecasting with Prophet, it has a couple of different models that you can use. You can also pull out features from your time series data with this library. And there's a whole bunch of other utilities to build ensemble models and other approaches for time series forecasting. This is another one where it is relatively new, so there's not a whole lot of documentation, but there's a whole bunch of different Python notebooks, Jupyter notebooks, I mean. And one of the things I think is interesting, from a forecasting perspective, is you can use Prophet, but with the same API also use, I think, SARIMA and Holt-Winters, as well as some other ensemble models. You can backtest, you can tune your hyperparameters. And it's also got several algorithms for change point detection.

Starting point is 00:44:58 And a lot of this, like I said, is I'm not an expert on the math, but I am interested in how you figure out how to take these tools and apply them to those real world business problems. And so I think it's really great when we have some of these libraries out there that are developed by really smart people that do understand the state of the art that can maybe make it a little simpler for others to apply to their own unique challenges. Yeah, this looks really nice to bundle these all together. What's a type of problem you might answer with this? I think so one, one example could be help me figure out what my blog or my website traffic is going to look like in six months from now. So I need to figure out do I need to, you know, resize my servers or, um, upgrade my, my disk space or

Starting point is 00:45:47 what's my AWS bandwidth bill going to be? Exactly. Um, you, you know, the, the other one that I think it's probably used a lot in inventory. So trying to figure out, okay, what do I think sales is going to look like? What do I, what do I need to reorder so that I actually have enough product so that we don't stock out? I think those are some pretty common use cases. A lot of the examples here are the airline flight data. So anything that you have that's over a period of time, typically kind of on a daily basis over multiple years, you can then start to forecast out what the future, what those future numbers would look like. Then you have this magic prediction power for the executives.

Starting point is 00:46:31 Exactly. And I think what's interesting about is most of these, I think most times when people do prediction in Excel, it's kind of, you put the numbers in there and kind of do your, your linear line. But this, these tend to give you more error bars,

Starting point is 00:46:46 so you can give a range. So I think a prediction like this is much more valuable when you say it could be between 100 and 110 versus it's going to be 101.5. Because when you give a single number, it conveys a lot more precision than is really there. Yeah, that makes a lot of sense. A comment from the live stream: Sam Morley says, when I was experimenting with time series data, I managed to get better results with a fairly simple, naive ARIMA model than I did using Prophet.

Starting point is 00:47:14 Yep, and I think that's exactly what this article, I don't know if he's read this article, but the article from Dr. Cotton, that's essentially what he says: some of the simpler models did outperform Prophet. Yeah. Interesting. Well,

Starting point is 00:47:30 cool. All right, Brian, is that it for all of our items? It is. Got any extra stuff you want to throw out there? Oh, I just had a,

Starting point is 00:47:37 just a quick one. Somebody on Twitter last week asked, why did I write a second edition of the book? And so I thought, well, that's a reasonable question. So at pytestbook.com, you can go and I've added a "why a second edition" section. So you can go read that. New built-in fixtures, new flags, package-scope fixtures, f-strings, types, all sorts of good things are available that weren't available then, right? Yep. Yeah. There's all sorts of reasons.

Starting point is 00:48:07 Always good to see pathlib there. I love pathlib. It makes my life so much easier when dealing with files. Yeah. I finally made the move. I've put down os.path and I'm now all about pathlib. Loving it. That's good.
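For anyone who hasn't made that move yet, a tiny sketch of the difference (not from the episode; the directory and file names are made up, and a temporary directory is used so nothing is left behind):

```python
import os.path
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # os.path style: plain strings passed through helper functions
    old_style = os.path.join(tmp, "reports", "summary.txt")
    print(os.path.basename(old_style))  # summary.txt

    # pathlib style: one object with operators, attributes and methods
    new_style = Path(tmp) / "reports" / "summary.txt"
    new_style.parent.mkdir(parents=True)  # create the reports directory
    new_style.write_text("hello\n")

    print(new_style.name)    # summary.txt
    print(new_style.suffix)  # .txt
    contents = new_style.read_text()

print(repr(contents))  # 'hello\n'
```

Both styles describe the same location; pathlib just keeps the path and the operations on it in one place.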

Starting point is 00:48:18 Yeah. Chris, anything else you want to throw out there? I was going to throw out one other. I was doing some research on working with units of measure. And there's a library called unyt that allows you to do things like convert kilometers to miles. But it works with NumPy, it works with all the scientific stack. I hadn't heard of that one, I thought it was kind of interesting, and I wanted to put that out there. Next time you need to actually do something with units and convert back and forth, you might want to consider that. And then it looks like there's a lot of physics and chemistry type things, like the mass of the earth and the radius of the earth, as constants,

Starting point is 00:49:05 probably pi and e and all those things. Yes, exactly. And I think when you start getting into it, there's probably a temptation just to code it all yourself, just put those constants in there. But when it starts to get more complicated, I think something like this could be really useful. And then there's another library called Pint, and there we go, which also works with units and has a little bit different approach. And so I think it's good to take a look at both of them. And if you have a need, then you can decide which API is going to work best for your unique situation. Yeah, that's cool. I haven't looked at unyt, but I love Pint. And I think the name is so good because my wife will ask me like,

Starting point is 00:49:46 how many ounces are in a pound or how many pints are in a liter? I'll be like, or even in, I don't know, a quart or vice versa. I'm like, I have no idea. These are such messed up volume measures. And so it's like, here's the thing that takes the thing you don't really know about and allows you to convert it to the others in a safe way. It's good. Exactly. I got one more quick extra to throw out as well for you, Chris. I forgot to mention that you are the author of the Move From Excel to Python with Pandas course over at Talk Python Training, which is a really popular course. Basically, it's an intro to pandas course disguised as solving problems you might solve with Excel, right? Exactly. Yeah. Yeah. No, thanks. And I've had a lot of good

Starting point is 00:50:25 feedback from folks, so hopefully it's interesting to the listeners that haven't had a chance to check it out. I might have to buy that for my boss. Maybe you can get that discount code. Yeah, yeah, get the discount code. All right, you ready for some jokes? My Twitter came back, so I can show the Twitter joke now. All right, so Dean, who is often on the live stream, but I don't see him today, sent a joke over and said, do you know how they say async in Italian? It's "a-sync-io," or asyncio, which I thought was a pretty good one. Asyncio. I love Italian. All right, you guys got another one out there? I saw one in the notes, another joke, but I didn't see who put it there. I've got one.

Starting point is 00:51:08 So does anyone know why cryptocurrency engineers aren't allowed to vote? No, I don't know. Because they're miners. Oh, that's a good dad joke. Yeah, it is. Absolutely. Well, on that high note, let's call it a show. What do you say? Yep. Brian, thanks as always. Chris, thanks for joining us this time. Thank you very much. Really

Starting point is 00:51:31 appreciate it. Yeah. Bye, everyone. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.


Topics covered in this episode: Subclassing in Python Redux * Extra, Extra, Extra7, Hear all about it!** klib Don’t forget about functools GitHub Copilot Kats Extras Joke See the full show not...es for this episode on the website at pythonbytes.fm/240
