Python Bytes - #186 The treebeard will guard your notebook

Episode Date: June 18, 2020

Topics covered in this episode:
sidetable - Create Simple Summary Tables in Pandas
tabulate
treebeard - ci for notebooks
Upcoming features in venv/virtualenv
PEP 582 now!
awesome pyproject.toml pro...jects
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/186

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 186, recorded June 10th, 2020. And I'm Brian Okken. And I'm Michael Kennedy. And this episode is actually brought to you by us. And we'll talk more about some of the ways you can support myself and Michael a little later in the show. But first, let's side table that for a little bit. Side table it. Yeah, let's put it to the side
Starting point is 00:00:26 and talk about sidetable. Yeah, so sidetable is something that I noticed as a new project from Chris Moffitt. And longtime listeners of the show will definitely know that I'm inspired by visuals. And this is one of those that's really nice, right? Like not long ago when Guido was on the show, we talked about a missing number visualizer for pandas. So you could have a quick view of just like, I got this data. I just need to really quickly see kind of what it looks like, what's missing,
Starting point is 00:00:56 correlate missing elements and whatnot. And so side table is in this general Zen of things. It's like, I've loaded up some data. I just want to quickly ask some questions and get a sense of what's going on. Like I've got a Pandas data frame and I want to be able to say, can you just break this down by like,
Starting point is 00:01:13 show me the top 20% of this and then group the other stuff into just like an other category. Also, instead of just getting like a plain text output, you get a cool, like alternating row color, nice table with extra information, and whatnot. And it's usually something really, really simple. Like, I could go to the data frame and say, just give me the frequency of state and just, you know, group it by that or something.
Starting point is 00:01:38 And it's, it does a group on those and a whole bunch of cool stuff. So really, really neat visualization. There's a picture in the show notes that shows you without it and with it. And given that the nicer version requires even less typing than the not nice version, I kind of like it. Yeah, and just out of the box, having just like the alternating gray and white stripes is good.
Starting point is 00:02:02 Yeah, absolutely. So basically all you have to do is a pip install, of course, but then import sidetable and it adds an stb accessor to data frames, to pandas data frames. And then you can ask it questions like frequency.
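A rough, pandas-free sketch of the kind of summary being described: count each value, add percentages and a running total, and fold the small tail into a single "others" row. The function name freq_table, its thresh and other_label parameters, and the sample states are made up for illustration and are not sidetable's implementation. With sidetable installed, the real call would be along the lines of df.stb.freq(['state']).

```python
from collections import Counter


def freq_table(values, thresh=0.8, other_label="others"):
    """Return rows of (value, count, percent, cumulative_percent).

    Values are listed most-frequent first; once the cumulative share
    reaches `thresh`, everything left is folded into one `other_label` row.
    """
    counts = Counter(values).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for index, (value, count) in enumerate(counts):
        running += count
        rows.append((value, count, 100 * count / total, 100 * running / total))
        # Stop early only if there is still a tail left to group.
        if running / total >= thresh and index < len(counts) - 1:
            rest = total - running
            rows.append((other_label, rest, 100 * rest / total, 100.0))
            break
    return rows


# Hypothetical sample data: one state abbreviation per record.
states = ["OR"] * 5 + ["WA"] * 3 + ["CA", "ID"]
for value, count, pct, cum_pct in freq_table(states):
    print(f"{value:<8}{count:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```

The threshold plays the role of "show me the top 20% and group the other stuff" from the conversation: raising it keeps more individual rows, lowering it pushes more into the grouped row.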
Starting point is 00:02:14 There's other stuff that you can also ask. There's like a bunch of different functionality there. So really nice for exploring new data sets. And it's basically a supercharged version
Starting point is 00:02:23 of pandas value counts with a little crosstab mixed in. So yeah, it's easy to use, and if you're working with pandas, especially in a Jupyter context, you know, that's really where this makes sense. Give it a shot. I think it looks great. Good job, Chris. Yeah. And no, I totally didn't even intend to do another table one back to back. We're kicking it off with all the tables. Yeah, which one you got here? So this was a suggestion from Tom McDermott. And the tabulate package, it's not intended for Jupyter stuff.
Starting point is 00:02:57 It's intended for just standard-out sort of things. So you want to pretty-print tabular data in Python for a command-line utility. Actually, I've been using this for years. I was like, I'm sure we've covered this. And I looked it up and I don't think we have, or at least I can't find it. I don't remember us covering it either. And it's really sweet. It generates nicely formatted tables, but in ASCII. So like before I said, you know, sidetable is awesome if you're going to be doing this within Jupyter, but this is like if you're doing it
Starting point is 00:03:27 within just a terminal command line app. By default, you've just got like a matrix of like a list of lists or a list of tuples or something to represent the rows
Starting point is 00:03:37 and you just want to print it with tabulate. It just does it automatically, but I usually use it with headers, so you pass in the header information separately. And by default it just prints the headers, and then dashed lines, and then your columns underneath. But it also spaces it correctly, because, I mean, actually trying to get that right by yourself
Starting point is 00:04:02 by hand is just a pain, trying to figure out how wide things are supposed to be and whatever. So this just does it, and it's great. Not only does it do it by text. The example that you have in the show notes really illustrates the nuance here. So it's got a list of planets with their radii and masses. And for the sun, it has it in scientific notation, like 1.989 times 10 to the 9th. And then for the other ones, it's like 5,973.6. It aligns the decimal places, not all to the right.
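To make the "figuring out how wide things are supposed to be" part concrete, here is a minimal hand-rolled sketch of the layout step: size each column to its widest cell, print the headers, a dashed rule, then the rows. The function format_table and the sample radii are illustrative only; this is not tabulate's implementation, and it right-aligns numbers rather than aligning on the decimal point. With tabulate installed, the equivalent call would be print(tabulate(planets, headers=[...])).

```python
def format_table(rows, headers):
    """Lay out rows under headers, sizing each column to its widest cell."""
    cells = [[str(cell) for cell in row] for row in rows]
    widths = [
        max([len(header)] + [len(row[i]) for row in cells])
        for i, header in enumerate(headers)
    ]
    # Right-align a column only when every value in it is a number.
    numeric = [
        all(isinstance(row[i], (int, float)) for row in rows)
        for i in range(len(headers))
    ]

    def render(line):
        parts = [
            text.rjust(width) if is_num else text.ljust(width)
            for text, width, is_num in zip(line, widths, numeric)
        ]
        return "  ".join(parts).rstrip()

    rule = "  ".join("-" * width for width in widths)
    return "\n".join([render(headers), rule] + [render(line) for line in cells])


# Sample values in the spirit of the planets example mentioned above.
planets = [("Sun", 696000, 1.9891e9), ("Earth", 6371, 5973.6), ("Moon", 1737, 73.5)]
print(format_table(planets, ["Planet", "R (km)", "mass (x 10^29 kg)"]))
```

Even this small version needs two passes over the data (one to measure, one to print), which is the bookkeeping the library takes off your hands.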
Starting point is 00:04:35 I mean, it's glorious. Yeah, the alignment is neat. I really appreciate that. So you have control over some of your number formatting and your alignment. But also, if you're outputting for different things, there are multiple different formats, including a simple Markdown-type table. But it also does GitHub-flavored Markdown tables, and pipes, which just look nice, they just kind of make it look like boxes. And there's Jira style and MediaWiki and HTML and
Starting point is 00:05:03 just plain, if you don't want any sort of stuff in between, just spaces in between. It looks nice. That's cool. So you could output this in Jira format and paste it into a Jira issue as, like, here's what we're doing now, or here's the problem, or here's the data. Yeah, definitely. Cool. It's a good one if you're keeping track of tables. Yeah. Wow. Another good thing is all the stuff that you and I have to offer people to learn more information about lots of stuff. Yeah, absolutely.
Starting point is 00:05:29 We have the podcast, but we also have other things as well. Yeah. So if you want to support what we're doing, one of the things you can do is become a Patreon supporter. So there's a link on the page where you can throw a couple bucks at us a month if you want. But also I've got a book.
Starting point is 00:05:43 If anybody was not aware of that, there's a pytest book. You've written a pytest book? That's awesome. Yeah, I did. It's good. I really like it. And another podcast called Test & Code. I'd love to have more people go check that out and suggest what you want. So I'd like to have people tell me about what other topics should be covered there. Yeah. You also offer quite a few learning opportunities for people. Yeah, absolutely. The main thing for me, if you want to support me, like obviously we have the Patreon and that's great,
Starting point is 00:06:12 but if you want to support us and get something back, you could take one of our courses over at Talk Python Training. We're doing all sorts of cool stuff there. We've got like 120 hours of Python courses and exercises beyond that. But we recently just kicked off a cohort thing where people can go through as groups. So that's something I'm trying to put together. And, you know, there will probably be more opportunities to do that as well. So, yeah, check it out if you want to learn Python.
Starting point is 00:06:37 That's where I recommend people go. Yeah. I want to bring something up about your courses. There's a lot of the courses that are, there's a lot of content there. And it's wonderful information. One of the things I really love, especially in this working from home environment where I don't often have a lot of time, is the way you've broken up all the courses into little tiny pieces. So there's a table of contents so you can go through the course and see what you've seen and see what you haven't, but you can keep track of what you
Starting point is 00:07:03 haven't. And there's often just, if you've got like three to five minutes, you can fit in a little extra video. Yeah. Thank you so much. That's awesome. And I like that you've done that. Yeah. I really want to try to make the courses have meaning as a reference afterwards as well. And like, nobody wants to go back and scan a 30 minute video for that 30 second clip you're looking for. Yeah. That's good. Awesome. You know what else is really good? Tree beards. Yeah, for real, tree beards are pretty awesome. Is that like a neck beard? But for a tree.
Starting point is 00:07:31 Okay. Okay, yeah. So I actually have no idea the relationship of the neck beard to the tree beard, but treebeard is continuous integration for notebooks, which is pretty cool actually. That is cool. So this was recommended by Brian Skinn
Starting point is 00:07:43 and it's continuous integration for a particular subset of notebooks. Those are the notebooks that are Binder-ready. So if you're not familiar with Binder, I recently did a Talk Python episode on this and came to appreciate Binder way more than I originally did. So Binder is a place where you can basically go to Binder, point it at your GitHub repo or some repo, say, here's the notebook, here's the dependencies files and everything. And then you just click a button and say, let me run this on Binder. Because if you go to GitHub, you see possibly the output from the notebook, but that's like cached, what was run the last time. If you want to actually run it and play with it, you can click launch a binder. It'll fire up a little Docker image somewhere
Starting point is 00:08:29 magically in the cloud and it'll just run it. So you basically configure the repo to describe to Binder what it needs to run successfully, right? So that's how this works: treebeard basically says if there's something that can be run on Binder, then it will use that same functionality to automatically install the dependencies, which could be like Conda or pip or whatever. And then it'll run the notebook using that cool library called papermill, which sort of converts notebooks into kind of function-type things. It'll upload the output and do an nbconvert on the notebook to save it and create like a version
Starting point is 00:09:08 stamped last run of your notebook that you can go back through your continuous integration and see the history of the outputs saved as HTML which is pretty awesome. And it integrates with a GitHub app that'll like push notifications back to your repo. It integrates with Slack.
Starting point is 00:09:24 It has all kinds of interesting things like this. So really a neat mechanism to make sure that your code just keeps running if it's a notebook. Yeah, it's even got secret management. So if you have to connect with different things with passwords and stuff. Don't you just put those in the notebook? No. No? Darn it.
Starting point is 00:09:43 Yeah, no, that's really cool. Secret management and all kinds of stuff. And basically, when I first saw this, I thought, okay, well, what's the criteria of success, right? Like, how do I write a test to indicate a successful notebook experience? The way it works is basically it runs all the cells, and if all the cells run without exceptions, then it's successful. So it's not like it's making assertions, but it's kind of like a smoke test.
Starting point is 00:10:09 So we think it's probably okay. That's not bad for a starter. But you could, I mean, conceptually you could put an assert in there and that would throw an exception. Exactly. Right.
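The success criterion described here (every cell runs, nothing raises) can be sketched in a few lines. This is only an illustration of the idea using plain strings as stand-in cells; it is not how treebeard or papermill are implemented, and smoke_test and the sample cells are hypothetical names.

```python
import traceback


def smoke_test(cells):
    """Run code cells in order in one shared namespace.

    Returns (True, None) if every cell runs, otherwise (False, index)
    for the first cell that raised an exception.
    """
    namespace = {}
    for index, source in enumerate(cells):
        try:
            exec(source, namespace)
        except Exception:
            traceback.print_exc()
            return False, index
    return True, None


# Hypothetical notebook: the last cell is an explicit check, so a wrong
# result fails the run instead of passing silently.
cells = [
    "values = [1, 2, 3]",
    "total = sum(values)",
    "assert total == 6, 'unexpected total'",
]
print(smoke_test(cells))
```

Because an assert raises AssertionError when it fails, adding a checking cell like the last one turns the plain smoke test into a real test without any extra tooling.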
Starting point is 00:10:16 You could build in the test that like some layer in there, like have a, even a Python file that you import that like does the tests. I don't know, whatever. There's, there's a lot of options. So you write,
Starting point is 00:10:24 you could make your notebook report out or yeah, make some cells that'll blow up if things go wrong for sure. Yeah. Somebody should get ahold of us and tell us why beard. Yeah. Because trees generally don't have beards. Well, okay. We live in Oregon, so they're often very mossy.
Starting point is 00:10:41 So that's true. They've got that little moss thing. If it's just right, actually, it could totally do that for sure. Yeah. Okay. So one thing that surprised me, Brian, that seems to keep coming up and up and,
Starting point is 00:10:50 both of us are talking about it next. Like, I feel like we've aligned perfectly. So far, dude, we're both talking about virtual environments. You go first. Okay.
Starting point is 00:11:00 So there's a couple of things. Just in episode 184, we discussed virtualenv and venv. And actually, I learned quite a bit to find out that virtualenv is still pretty cool and fast. But that was in 184. But we had people get a hold of us and say, hey, there's more information that you guys don't know. And I love that. Please keep it coming. If we do half the story, give us the rest of it. In Python 3.9, venv, the built-in one, has a cool new flag called --upgrade-deps for upgrading your dependencies. It's not all of your dependencies, but it's for virtual environments. Every time you create one, we commented that you have to upgrade pip.
Starting point is 00:11:42 And this new flag allows it. So when you create a new virtual environment, it automatically upgrades setuptools and pip for you. Yeah, that's just nice. That's in Python 3.9. I tried it out already. I tried it on beta 1. Beta 3 is already out, so you can try it out if you want. The other news is that virtualenv is getting something new, and it's not there yet. I'm not sure when it's coming, but I think it's soon. It's getting a feature called periodic update, which is super cool.
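On the command line the flag is used as python -m venv --upgrade-deps followed by the target directory. The same option is exposed through the standard library's venv.EnvBuilder as the upgrade_deps keyword, so a minimal sketch from Python might look like the following. The helper name make_env is made up for illustration, and actually creating the environment needs network access to fetch the newer packages, so the sketch only builds the configured builder.

```python
import sys
import venv

# upgrade_deps was added to EnvBuilder in Python 3.9; on older versions
# the keyword does not exist, so only pass it when it is available.
options = {"with_pip": True}
if sys.version_info >= (3, 9):
    options["upgrade_deps"] = True

builder = venv.EnvBuilder(**options)


def make_env(path):
    """Create a virtual environment at `path` using the builder above."""
    builder.create(path)
```

Calling make_env(".venv") would then do the create-and-upgrade in one step instead of creating the environment and running a separate pip upgrade afterwards.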
Starting point is 00:12:15 So one of the things, so virtualenv, since it's separate from your Python, you can have it make virtual environments for multiple Pythons, for instance. But it also keeps its own cache of new pip, new setuptools, and new wheel, that package you need if you're creating wheels. And so it has those upgraded already, but with the periodic update, it will just have this extra thing that in the background goes off and checks to see if there's new ones around. So whenever you actually need to create a new virtual environment, it'll automatically have an updated one that it can install right away, which is neat. Yeah, that's pretty cool. Nice. And if you don't want it to go off and do it in the background, you can manually say, okay, right now I want you to go off and upgrade it. Okay, that's a cool idea. So you've got a better chance of having updated stuff
Starting point is 00:13:06 if you're working without an internet connection at the moment or something. It already had kept its own version of it that would upgrade it. So it's newer than if you're using venv, but I'm excited about it. And one of the other things I wanted to mention is I kind of complained that the prompt is different.
Starting point is 00:13:27 And I got a little bit of the skinny about why the prompt is different in virtualenv versus venv. And it had to do with the prompt formatting on different operating systems being different, which is weird. But they coalesced it and made it a single prompt. And sometimes you actually want to not have a space, you might not want to have those parentheses, so there may be reasons to not have the parentheses and space. So there's reasoning behind it. It just still annoys me, but that's okay. It's cool to actually know why, though. That's really nice. Yeah. So all these things that make working with virtual environments better are great. But how about we just don't have
Starting point is 00:14:11 virtual environments? But we still do? Wouldn't that be better? I don't know. So let me tell you what I'm thinking. So a while ago, in the 3.8 time frame, there was a proposal called PEP 582. And PEP 582 is put together by a bunch of folks, Steve Dower and four or five other people, I'm forgetting, Donald Stufft. And I know there's two other folks that I'm forgetting. Sorry about that. But anyway, it was put together. And the idea is that it proposes to add a mechanism to automatically recognize a __pypackages__ directory and prefer importing packages installed in there over global packages. So the idea is you just go to your project and say, at the top of your project, here's the top of my project. And then when you pip install stuff, it will put things there. You won't have to activate a virtual environment because you're not changing anything
Starting point is 00:15:05 outside the global system. It's just going to drop it in right there. Basically, this is how Node.js works, right? So if I npm install a thing, it just traverses up the directory until it finds node_modules. And it's kind of like that, right? So it says, if you have this folder here, we're going to automatically install stuff there, and then Python will automatically know to look there. So if you're anywhere in a subfolder without even activating the virtual environment, and you type Python something to run a command,
Starting point is 00:15:36 as long as you're in the folder structure, it's going to use that environment. Oh, that's pretty cool. Yeah, that's pretty cool, right? So the motivation, at least, is that every time someone's new to Python, they're like, well, I can't install this thing, it says access denied. You're like, you know, permission denied? Well, okay, let me talk to you about virtual environments and why you need
Starting point is 00:15:53 them and also how to activate the environment on the different shells and the different platforms, like Windows versus POSIX, you know, source versus no source, and bin versus Scripts is different. And so that's kind of a pain. So the idea, also, every time you open up a new terminal or command prompt, you've got to reactivate it. Like, for all of these things,
Starting point is 00:16:15 I have aliases that make this happen, right? Yeah. So the idea here is that you don't have to worry about any of that stuff. You just have to, like, init your Python project somehow. I don't remember seeing how that was supposed to happen.
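A minimal sketch of the lookup as it is described in the conversation: walk up from the current folder until a __pypackages__ directory is found, then put its library folder at the front of the import path. The function names are hypothetical, the version-specific lib subfolder is an assumption about the draft's layout, and the walk up through parent folders follows the Node-style description given here rather than the exact wording of the PEP.

```python
import sys
from pathlib import Path


def find_pypackages(start):
    """Walk upward from `start` and return the nearest __pypackages__ lib dir.

    Returns None if no folder on the way up contains __pypackages__.
    """
    start = Path(start).resolve()
    # Assumed layout: __pypackages__/<major>.<minor>/lib
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    for folder in [start, *start.parents]:
        candidate = folder / "__pypackages__"
        if candidate.is_dir():
            return candidate / version / "lib"
    return None


def prefer_local_packages(start="."):
    """Put the local package directory ahead of the global ones on sys.path."""
    lib_dir = find_pypackages(start)
    if lib_dir is not None and str(lib_dir) not in sys.path:
        sys.path.insert(0, str(lib_dir))
    return lib_dir
```

Inserting at position 0 is what makes locally installed packages win over globally installed ones, which is the "prefer importing packages installed in there" behavior mentioned above, with no activation step involved.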
Starting point is 00:16:28 But once that __pypackages__ folder is there, it's like, well, that's the top of the project. We're going to install there. And you presumably could have like a fallback one at the top of your user profile or something along those lines. Yeah, you have that. So that's for the packages.
Starting point is 00:16:42 But what about, in virtual environments, you can also have local scripts that come along, entry points. Yeah. Do you know if it deals with that? I don't know. I don't know about it. It's possible. I didn't read every word of it. So it's in draft mode, but I was a little confused because it says its version is Python 3.8. I'm like, well, 3.8 shipped. It should either be closed or fixed or published. That seems weird. So I sent a message to Steve Dower just a moment ago on Twitter, and he
Starting point is 00:17:10 said that Kushal Das, one of the folks proposing it, I think the primary guy still working on it, the text itself hasn't been updated before the 3.8 release, which is why the header is still a little bit out of date. So it's probably more like a 3.10 thing or something,
Starting point is 00:17:26 but it's still pretty cool. If you want to try to live in this world and see what it's like, David O'Connor has this thing called pyflow, and pyflow basically does this. It integrates with pyproject.toml. Well, man, we lined it up good this week. And you go through, instead of saying pip install, you say pyflow install. Instead of saying
Starting point is 00:17:46 python run, or python script, you say pyflow script, because it has to re-initialize that every time because it's not actually changing something. Anyway, it's interesting. I would like to see something kind of like this. I think it's pretty neat. There's also some interesting possibilities around direnv that I'm looking into. Just talking to someone, Chris, who has got some cool ways to have direnv automatically activate virtual environments, which would be kind of cool as well. So there's a lot of stuff happening here. It still kind of blows my mind there's so much action around something that feels like it's just, I don't know, so plumbing and foundational. Yeah, but like you said, it is plumbing and foundational, but it's also
Starting point is 00:18:29 one of those tripping things. It's like the loose stone on the sidewalk that trips up all the new people all the time. So actually, so far what we've managed to do is spray paint a yellow line on both sides of it. You know, somebody needs to shave that bad boy down, but right now at least it's got a little marker on it. And I just want to say thanks to louise your beer uh herbier ever on here uh sit that over and let me know about this whole project. So thank you for that. Yeah, that's nice. Yeah. So speaking of pyproject.toml, I really love, I kind of like this. I like awesome lists. So awesome lists are a thing.
Starting point is 00:19:10 We've covered many of them in the past. There's even a Python Bytes awesome list. Yeah. This one is awesome pyprojects, pyproject.toml projects. So this is one of the great things about different sorts of source code lists, to go and look at examples. So this is a list of other projects that are out there that already use pyproject.toml so you can look to see how other projects are doing it. So if you want to figure it out for your own project, this is helpful. For instance, a lot of the testing and formatting stuff came along early.
Starting point is 00:19:41 So coverage.py is in there, pytest, tox, Black, isort. I knew all of those. Ward was a new one to me. So Ward is apparently a way to test things with, like, string-named tests instead of function names. I haven't really played with it much other than looking at the documentation,
Starting point is 00:20:00 but it looks neat. But there's code analysis like Pylint and unimport. And the really long titled wemake-python-styleguide, which is a linter and other stuff, but it's pretty cool. And then it has a couple links to articles about pyproject.toml
Starting point is 00:20:16 and then what I think is also neat is a list of projects that are discussing switching to pyproject.toml, so you can... Oh yeah, that's probably pretty interesting if you're trying to decide yourself, right? Yeah, to figure out what sort of discussions are going on in other projects as to why to switch and why not. So yeah, for sure. Pretty cool. Very cool. Yeah, I think people should switch. I'm using it everywhere
Starting point is 00:20:41 because it's just sort of easier. What confused me for a little while was that I thought it was something you needed Flit or Poetry to be using, but you can use pyproject.toml with setuptools projects also. Okay. Interesting. Yeah, I didn't know about that. Yeah, I kind of thought it was tied to some of these higher-order management things like Poetry and Flit and so on. Yeah. Yep. Cool, cool. And like you said, there's a Python Bytes awesome list if people like awesome lists. Sorry I put that at the end there. People can check that out. Thanks, Jack, for doing that. Yeah. So that's our six items. Michael,
Starting point is 00:21:16 anything extra to share with us? I got something for everyone. I got two things, actually: one follow-up and one new thing. First of all, we had Calvin on a while ago, a couple shows ago. Was that last show, the show before? I think a couple shows ago. And we were talking about secrets, and he's in your camp: he doesn't put them in the notebook or right there in the source code. He's doing something else. But what he talked about is actually using 1Password as like a vault, right? So 1Password has awesome encryption and security. And so a lot of the challenges revolve around, well, if I'm going to put them somewhere else, if I just put them straight into environment variables, well, people can grab them there. So maybe I want to put them some other place where it's like encrypted
Starting point is 00:22:02 or something, right? So he talked about his mechanism of finding all those environment variables at launch and then, just as you run your virtual environment, injecting them there, but storing them in 1Password instead of just on the file system or something like that. So he did a blog post about how he's doing that, and so I'm going to just link back to that. Yeah, nice. That looks pretty cool. And also, I want to give a shout out to Talk Python. Specifically, the last episode, at least at the time of recording. It'll probably not be by the time we publish this. But nonetheless, just recently, you were a guest on Talk Python where we talked about 15 awesome pytest plugins,
Starting point is 00:22:39 mostly, a few extensions like using with or alongside, but mostly pytest plugins, and went through things like pytest-sugar and freezegun and all sorts of fun stuff. So if people can't get enough of us, they can hear you being a guest over there talking about pytest the entire time. Yeah. It's nice.
Starting point is 00:22:55 Yeah. That was fun. Thanks for coming on there. I like to hear myself talk so much that we also cross-posted that on Test & Code as well. So yeah, sounds good. And one of the things, so as an extra bit, did you know that I wrote a book?
Starting point is 00:23:08 Yes. Yeah, I've heard of that. No, it's a great book. I have it. I published through Pragmatic Publishers, and I just wanted to bring up that Pragmatic has a shiny new website. So the Pragmatic site is a little different, and there's an FAQ up there if people want to know why or what's different about it. And for the most part, it looks a lot the same to me, but the entire back end is different. And yeah. Yeah, cool. Faster? So faster is always nice. Makes it happy and I should work with. All right, have a joke. Let's pretend we're roommates. You can be the first person and I'll be the second person. Okay. Okay. Stop by the store on the way home from work.
Starting point is 00:23:48 Please stop at the market and buy one bottle of milk. If they have eggs, bring six. I came back with six bottles of milk. Why the hell did you buy six bottles of milk? It's just the two of us. What do you think, man? Because they had eggs. Obviously, taking this programming logic a little strong, right? Stop by the store, get a bottle of milk. If they have eggs, get six.
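For anyone who wants the joke written down as code, a literal-minded reading of the instructions might look like this sketch. The function name literal_shopper is made up; the point is only that "six" attaches to the one item named so far.

```python
def literal_shopper(store_has_eggs):
    """Follow the shopping instructions exactly as a program would."""
    bottles_of_milk = 1        # "buy one bottle of milk"
    if store_has_eggs:         # "if they have eggs..."
        bottles_of_milk = 6    # "...bring six" (of the only item named)
    return bottles_of_milk


print(literal_shopper(store_has_eggs=True))   # 6
print(literal_shopper(store_has_eggs=False))  # 1
```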
Starting point is 00:24:13 Cool. That's funny. Pretty good one. Takes a little bit of thinking. So glad we have it written down for people. Yeah, yeah. We can go back and study it, right? Yeah. All right.
Starting point is 00:24:21 Well, thanks a bunch, huh? Cool. All right, thank you. Yep, bye. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter at Python Bytes.
Starting point is 00:24:30 That's Python Bytes, as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
