Python Bytes - #185 This code is snooping on you (a good thing!)

Episode Date: June 12, 2020

Topics covered in this episode:
- MyST - Markedly Structured Text
- direnv
- Convert a Python Enum to JSON
- Pendulum: Python datetimes made easy
- PySnooper - Never use print for debugging again
- Fil: A New Python Memory Profiler for Data Scientists and Scientists
- Extras
- Joke

See the full show notes for this episode on the website at pythonbytes.fm/185

Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 185, recorded June 4th, 2020. I'm Michael Kennedy. And I am Brian Okken. And this episode is brought to you by Datadog. More on that later. Check them out at pythonbytes.fm slash datadog. Brian, I feel like we're all working from home.
Everyone's life is scrambled. Even like my sleep schedules are scrambled. Like some crazy stuff happened and I slept from like 6 to 9:30, and I was up for like four hours, and I slept in. Like it's just, it's weird. Don't we need more structure in our life? Nice, nice intro. Yes, more structure.
Yeah. I'm a fan of Markdown also. Believe it, trust me, it's not a tangent. We have a repo that we want to point people to called MyST. It's got to be called Myst, don't you think? Oh yeah, definitely. MYST, which is Markedly Structured Text.
What this is, is a fully functional Markdown parser for Sphinx. It's Markdown plus a whole bunch of stuff from reStructuredText. MyST allows you to write Sphinx documentation entirely in Markdown, and things that you could do in restructured text,
but could not do in Markdown have been put in. There's a new flavor of Markdown, so you can do all of your directives and all sorts of cool things. Anything you could do in restructured text with Sphinx, you can now do in Markdown. It's based on CommonMark and some other tools, so they're standing on other tools that are already doing things really well and just extending them a bit. But this is pretty powerful. One of the things I like about this is, I particularly don't use a lot of Sphinx, but this also includes a standalone parser.
So you can see how somebody's extended Markdown for these extra directives, and even use some of them in your own code if you want. Yeah, this looks really, really nice. Like restructured text is good and all, but I don't know. If I'm going to write something like restructured text, my heart just wants to write Markdown, I gotta tell you. Yeah, me too. And I think one of the things that was holding a lot of people back is some of the extra directives, like information boxes and other things like that, that you can't necessarily do in Markdown off the shelf, but some extensions are nice. I played with it a little bit. I didn't pull it down with Sphinx. I just pulled it down so that I could run some Markdown through it and some of the extra directives to see what it has. So for instance, some of the directives, like I tried an information box: you can have structure around putting an information box somewhere, and what you end up with is a div that has a class on it. Oh, nice. If you're not using Sphinx, then you'll have to use your own CSS, I guess, to style it. But it puts in enough hooks for you to be able to do that. That's really nice. I do wish you could sort of indicate CSS styles in Markdown because, wow, that would be the end of what you need HTML for, for many, many things. That would be nice.
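To make that concrete, here is a minimal sketch of what wiring MyST into a Sphinx project can look like, based on the myst_parser extension described in the MyST documentation; option names vary between releases, so treat it as an illustration rather than a drop-in config.

```python
# conf.py -- minimal sketch of enabling MyST in a Sphinx project.
# Assumes the myst_parser Sphinx extension; check the MyST docs of your
# installed version for the extra options and syntax extensions it offers.
project = "My Docs"
author = "Me"

extensions = [
    "myst_parser",  # lets Sphinx parse .md files as MyST Markdown
]

html_theme = "alabaster"  # any Sphinx theme works; MyST only changes parsing
```

With that in place, a Markdown page can use fenced directive blocks (a note or admonition block, for example) for the kinds of things that previously required reStructuredText.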
So last week, you brought up direnv. We were talking about how do you store your secrets? How do you activate and configure different environments? I think I even said something about specifying where Python was running. I don't remember what the context was exactly, but you're like, direnv. And actually, I've been meaning to cover this. Dunder Dan, I linked him on Twitter, don't know what his last name is, thanks Dan, sent this over to us as a recommendation. And I'm like, yeah, like you brought it up, it seems definitely cool. So let me tell you about direnv, D-I-R-E-N-V. So it's an extension that goes into your shell. And normally what you do is you open your shell and it runs your .bashrc, .zshrc, whatever, and it sets up some stuff. Or if you're over on Windows, it works a little bit different. But I think direnv is only for the POSIX-type systems. Anyway, it'll set up some values that you put in there, like environment variables and whatnot. And that's just global, right? You can also set things up so that when you activate a virtual environment it exports other values. That's pretty cool. But what it doesn't really do is allow you to have like a hierarchy of values. So if I'm in the subdirectory over here, I want this version of Python active, or this version of where the Flask app lives. And then if I change to another directory, I want it to automatically go, well, that means different values, and direnv basically does that. Oh, nice. Yeah. So as you go into different parts of your folder system, it'll look for a certain file, .envrc. And if it finds that,
it'll automatically grab basically all the exports and then jam them into whatever your shell is. And it's also cool because it's not a shell, right? It's not like, well, here's a shell that has this cool feature. It works with Bash, Zsh, tcsh, Fish, and others, right? So it's basically a hook that gets installed. Like, I use Oh My Zsh because, oh my gosh, it's awesome. And then I would just plug this into it, and as I do stuff with Zsh, it will just apply its magic. Yeah, and so one of the things you can do with this is to automatically set a virtual environment if you go into special directories. That's not the only thing it can do, but that's one of the reasons why a lot of people use it, right? You basically, well, I guess you can't do aliases, you can't change what python means, but you can say where the Python path is. Yeah, yeah, that's one of the things that's a limitation of this that people should be aware of. The way to think of it is not as a sub rc, right? It's not a sub .bashrc where it runs aliases and all sorts of stuff. The way it works is it runs a little tiny hidden Bash shell, it imports the .envrc as if it were a bashrc, it captures what the exported variables are, throws away that shell, and then jams those into whatever active shell you have, like Zsh or Bash or Fish or whatever. Yeah, I would probably use this all the time if I weren't somebody that used both Windows and Mac, yeah, and Linux frequently. So you'll probably, I bet somebody could come up with this thing for Windows as well. It's just got to be a totally-from-scratch, different type of thing, right? People have already pointed me to Windows versions of it, but it's one of those things of like, you got to jump through hoops to make it work. And it's just not, for me, it's not solving a big enough problem that I have that I need to jump through the hoops. But I agree. I agree.
It is cool, but it's not like life-changing in that regard. I guess one more thing to point out is you don't have to go to the directory where the .envrc file is. It looks up the parent directories until it finds one. So you have this hierarchy, like I'm down here in the, you know, views part of my website, and at the top level of that Git repo I have one of these .envrc files. It would find that and activate it for you. So that's pretty cool. It's kind of like Node.js, where the node modules live, in that regard. That's pretty cool. Yeah, that's a really nice feature. Yeah, for sure.
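For a concrete picture, here is a tiny sketch of what such a file could look like. The .envrc name, the layout python3 helper, and the direnv allow step come from direnv's documentation; the exported variable names are just made-up examples.

```bash
# .envrc -- direnv loads this automatically when you cd into the directory
# (or any directory below it). Run `direnv allow` once to approve the file.

# Helper from direnv's stdlib: create/activate a virtualenv for this project.
layout python3

# Made-up example values; anything exported here lands in your shell's
# environment while you're inside this directory tree, and is unloaded
# again when you leave it.
export FLASK_APP=app.py
export DATABASE_URL="sqlite:///dev.db"
```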
Also nice: Datadog. So before we get to the next thing, let me talk about them real quick. They're supporting the show, so thank you. They've been sponsors for a long time. Please check them out, see what they're offering. It's good software and it helps support the show. So if you're having trouble visualizing bottlenecks and latency in your app, and you're not sure where the issues are coming from or how to solve it, you can use Datadog's end-to-end monitoring platform. With their customizable built-in dashboards, you can collect metrics and visualize app performance in real time. They automatically correlate logs and traces at the individual request level, allowing you to troubleshoot your apps and track requests across tiers. Plus, their service map automatically plots the flow of these requests across your application architecture, so you can understand dependencies and proactively monitor performance of your apps. So be the hero that got that app
at your company back on track. Get started with a free trial at pythonbytes.fm slash Datadog. You can get a cool shirt. All right, Brian, what's next? Yep, thanks, Datadog. I had a problem. So my problem was a little application that had a database. I was using TinyDB just for development. You could use Mongo, similar. It's a document database. I threw some data into it, no problems. But one of the values, I decided to change it to use Python enums, because I thought enums are cool. I don't use them very often. I'll give these a shot because they seemed perfect.
And then everything blew up, because I couldn't save it to the database, because enums are not serializable by default. So I'm like, there's got to be an easy workaround for this. And I first ran into questions or topics about creating your own serializer. That just didn't seem like something I wanted to do. You could do it, but it's not so fun, right? Yeah. Well, so I ran across a little short article written by Alexander Hultnér called Convert a Python Enum to JSON. And I didn't need it converted to JSON, but I did need it serializable. And the trick is just, when you use enums, you do from enum import the capital-E Enum type, and then you have a class that derives from that, and then you have your values. Well, if you also derive from not just Enum but another concrete type like int or str, and in my case I used str so that my string values would be stored, now it is serializable, and it works just the same as it always did before. It's just that it uses the serializer from the other type, and it just works. Incredible. So for instance, I'm going to put a little example in the show notes about using a Color with just red and blue. If you just derive from Enum, you can't convert it to JSON because it's not serializable. You can either do an IntEnum, which is a built-in one, or combine str and Enum, and now it serializes just to the strings red and blue, if those are the values. And then that's what's stored in your database too. It's really handy for debugging to be able to have these readable values as well. Yeah, this is really cool. It's a little bit like abstract base classes versus concrete classes or something like that, right? You've got the sort of general Enum, but if you do the IntEnum, then it has this other capability, which is cool. Yeah, multiple inheritance. str, Enum is the one you went for, right? Yeah, so the multiple inheritance is the thing that Alexander recommended in his post. That's what I'm using. It works just fine. But I was interested to find out that in the Python documentation, IntEnum is almost just there as an example to say, we realized that it might not be integers that you want, you might want something else. There's an example right in the Python documentation on using multiple inheritance to create your own type. It doesn't talk about serializability there, but that's one of the benefits. Yeah, it seems like it works anyway. Awesome. How much time did it take you to figure that out? Was it a long time? No, I don't know. 10 minutes of Googling.
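Here is a minimal sketch of that red/blue example, mixing a concrete type into the Enum's bases so the standard json module can serialize it:

```python
import json
from enum import Enum, IntEnum


class Color(str, Enum):
    # Mixing in str means each member really is a string underneath, so
    # json.dumps emits the readable value -- and that readable value is
    # what ends up in a document database like TinyDB or Mongo.
    RED = "red"
    BLUE = "blue"


class Priority(IntEnum):
    # IntEnum is the built-in equivalent for integer-valued members.
    LOW = 1
    HIGH = 2


print(json.dumps({"color": Color.RED, "priority": Priority.HIGH}))
# {"color": "red", "priority": 2}

# By contrast, a class deriving from Enum alone is rejected by json.dumps
# with "TypeError: Object of type ... is not JSON serializable".
```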
Yeah, that's pretty cool. Well, you could compute it with Python, of course, but you know, the datetimes in Python and time spans, they're pretty good actually, but they're a little bit lacking. There's certain types of things you might want to do with them, and so there's a couple of replacement libraries, and one that Tucker Beck sent over is called Pendulum. That's pretty cool. Have you played with Pendulum? I haven't, but I like the name. Yeah, I do too. It's really good. I've played with Arrow, so this is a little bit like Arrow, but it doesn't seem like it tries to solve exactly the same problem. It's just like, let's make Python datetimes and time deltas better, which is kind of the goal of both of them. So it's more or less a drop-in replacement for the standard datetime. So you can create time deltas, which are pretty cool. Like I could say pendulum.duration, days equals 15, and I have this duration, and it has more properties than the standard datetime or the timedelta. You get like total seconds or something like that, but that's, you know, that's not that helpful. So this one has like duration.weeks, duration.hours, and so on, which is pretty cool. You can ask for the duration in hours, like the total number of hours, not just the hours part, you know, like three hours and two days or whatever. But you also have this cool, human-friendly version, so I can say duration in words and give it a locale, say like locale is US English, and it'll say that's two weeks and one day. Nice. You can also, like, let's suppose I'm trying to do some work with calendars or some kind of difference, say the time from here to there. I want to do something for every weekday that appears, right? So skip Saturday and Sunday, but if it's like from Thursday to Wednesday, I need to go Thursday, Friday, Monday, Tuesday, Wednesday, yeah? So I could say pendulum.now, and then I could go from that and subtract three days. So that would be a period of three days.
And that gives you what they call a period, which is a little bit different. And then I can go to it and say, convert yourself to weekdays, okay? Right? Interesting. Then you can loop over it. You can say for each day, or each time period, in this period, and it would go, you know, over the weekdays that are involved in that time span. That's pretty cool. Yeah, because that would not be so much fun to do yourself, right? There's a bunch of stuff that it does, and I don't want to go read all the capabilities and everything, but that gives you a sense. If these are the kinds of problems you're trying to work through, and you're like, man, this is a challenge to do with a built-in one, check out Pendulum. Also, check out Arrow. I think we've covered Arrow a long time ago. If we haven't, I'll cover it at some point. It's a good one. Yeah, and actually, I don't think it's a matter of which one's the best, either. It's whatever seems to speak to you and has an API that thinks like you do. Yeah. It's good that lots of people have solved things like this. Yep, absolutely.
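A small sketch of those two ideas, using the duration and period API that Pendulum documented around its 2.x releases; names may differ in newer versions, so check the docs for the release you install:

```python
import pendulum

# A duration with friendlier accessors than datetime.timedelta.
dur = pendulum.duration(days=15)
print(dur.weeks)                  # 2
print(dur.in_hours())             # 360 -- the total, not just an "hours" field
print(dur.in_words(locale="en"))  # e.g. "2 weeks 1 day"

# Subtracting two datetimes gives a period you can inspect and iterate.
start = pendulum.now().subtract(days=3)
period = pendulum.now() - start
print(period.in_weekdays())       # how many Mon-Fri days fall in the span
for day in period.range("days"):  # walk the span one day at a time
    print(day.to_date_string())
```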
All right, well, what's this next one? Are you trying to be like a private detective, or what's going on with this? Yeah, a private detective, looking into and spying on your code. So this was sent in by a Twitter account called pylang, and this is PySnooper. The claim is: never use print for debugging again. And I have to admit, I am one to lean on the print statement every once in a while, especially if, sometimes I don't really want to use a breakpoint because I've got some code that's getting hit a lot, and I really do want to see what it looks like over time. So one of the things that people often do is throw a print statement somewhere on a line just to say, hey, I'm here. The other thing they do is print out a variable right after an assignment so that they can see when it changes: it was this, and now it's that. Yeah. So this is exactly kind of what it does. So by default, you can throw a decorator onto a function. That's the easiest way to apply PySnooper: decorate a function. And now every time that function gets run, you get a play-by-play log of your function. And what it logs is the parameters that get passed to your function. It logs all the output of your function, but also every line of the code of the function that gets run. And every time a variable changes, it logs its new value. And then even at the end, it tells you the elapsed time for the function. So that's quite a bit. If that's great for you, great. But if it's too much information, you can also isolate it with a with block and just take a section of your function under test
and just log a subset. And then if local variables are not enough and you're changing some global variable, you can tell it to watch that as well. Anyway, it's a pretty simple API, and there's actually quite a few times I think I'll probably reach for this. When I first saw this, I'm like, ah, yeah, it's kind of cool. There's a lot of these replacements where I think, you know what, you've got PyCharm, or you've got VS Code, you're better off just setting a breakpoint, and the tooling is so much better than, say, PDB or something like that, right? Yeah. This, though, this solves a problem that always frustrates me when I'm doing debugging, which is you're going around and you've got to keep track in your mind: okay, this value was that, now it's this, and then it became that. Like, sort of the flow of data. At any frozen point you can see really well with the visual debuggers, right, like PyCharm and whatnot, what the state is. You can even see what's changed, but not like: this list was empty, then this was added, then this was added, and here's how it evolved over time. You know, people should check out the README for this, because that view of it is like, there's a loop where it shows going through the loop four times, and all the values and variables build up, so you can just review it and see how it flows. I think it's pretty sweet, actually.
Yeah, one of the other things that I forgot to mention is, if you're debugging a process on a server, maybe you've got a small service that's running, then instead of standard out, you can pipe these logs to a file and review them later. Yeah, definitely for a server as well, it would be nice to flip that on. And I guess with a conditional, you could probably even code up, say, do you feel like you're running into trouble? Turn on the PySnooper for a minute and then turn it off.
You know, like there's probably options there. But yeah, you definitely wouldn't want to attach a real debugger to, like, production. Dude, why isn't the site working? Oh, somebody's got to go back to their desk and hit F5 or continue or whatever. Yeah, that's not going to go well.
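A minimal sketch of those knobs, following the patterns in PySnooper's README: the decorator, the with block, watch=, and sending output to a file. The functions and the log path here are just examples.

```python
import pysnooper


@pysnooper.snoop()  # logs arguments, every executed line, and variable changes
def number_to_bits(number):
    bits = []
    while number:
        number, remainder = divmod(number, 2)
        bits.insert(0, remainder)
    return bits


number_to_bits(6)

# Too noisy? Snoop just one block, watch a global, and send the output to a
# file instead of stderr -- handy when the code is running on a server.
counter = 0


def do_work():
    global counter
    with pysnooper.snoop("/tmp/snoop.log", watch=("counter",)):
        counter += 1
        return counter * 2


do_work()
```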
So I have something that's pretty similar to follow this up with. You know, this is about debugging and seeing how your code is running. Like per usual, we talk about one tool and people are like, oh yeah, but did you know about...? So we've talked about Austin, and we've talked about some of the other cool debuggers and profilers. And so over on PyCoder's, they talked about Fil, F-I-L, which is a new memory profiler for data scientists and, well, general scientists. And you might wonder, why do data scientists, right, you know, biologists, why can't they just use a regular memory profiler? Like, why is Austin not their thing, right? And it may or may not be. It may answer some great questions for them. Obviously they do a lot of computational stuff, and making that go much faster lets them ask more questions, right? So maybe profilers in general are things they should pay attention to. But you know, when they talk about this, they say, look, there's a really big difference between servers and, like, data pipeline or sort of imperative, just top-to-bottom, we're-just-going-to-run-scripts code, right? And that's what scientists and data scientists do a lot: like, I just need to do this computation and get the answer. So with servers, if you're worried about memory, remember this is a memory profiler, what you're worried mostly about is, you know, this has been running for three hours, now the server's out of memory. That's a problem, right? It's probably an issue of a memory leak somewhere. Something is hanging on to a reference that it shouldn't, and it builds up over time, like cruft, and it just eventually wears it down, and it's just bloated, you know, with too much memory, right? So that's the server problem. And I think that's what a lot of the tooling is built for. But data pipelines, they go and they just run top to bottom. And they, for the most part, don't really care about memory leaks, because they're only going to run for 10 seconds. But what they need to know is, if I'm using too much memory, what line of code allocated that memory? Like, I need to know what line, where I'm using too much memory, and how can I maybe use a generator instead of a full list or something like that, right? So that's what the focus of this tool is. It's going to show you exactly what your peak memory usage is and what line of code is responsible for it. This is actually pretty cool. It is, right? At first I thought, what is this?
Like, why do they need their own thing? But as I'm looking through, I'm like, yeah, this is actually pretty cool. And if you go to the site, you can actually see they give you this graph, like a nice visualization of, here are the lines of code, and then it's more red or less red depending on how much memory it's allocated. Oh, wow. Yeah. And then the total amount, and you can dive into, like, okay, well, I need to see this loop or this sub-function that I'm calling, how much is it? So you can navigate through this visual, like, pink graph of, like, memory badness, I guess, I don't know, memory usage. Yeah, it's not bad, right? No. Yeah, and when you're staring at code, it's not obvious where the huge array might get generated or used. Yeah, and the example they have here, it's like, okay, well, they have a function called make big array, okay, so probably you might look there. And there's also things like using NumPy, like, okay, here we're creating a bunch of stuff with NumPy, and you might say, well, here's the NumPy thing that we're doing that makes too much. But you could be doing a whole bunch of, you know, NumPy and Pandas work, and one line is actually responsible. You're probably pretty sure it has to do with Pandas, but you're not sure where exactly, right? So you could, you know, dig into it and see. I think it's cool. Yeah, we thought we were using arrays and suddenly we have this huge matrix, accidentally. Exactly. Why is all this stuff still in here? Yeah, cool. Well, anyway, if you're doing data science and you care about memory pressure, this thing seems super easy. It even has, like, a try-it-on-your-own-code thing on the website, which I don't know what that means. That's crazy. I'm not uploading my code there, but it's fine.
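To see what that looks like locally, here is a made-up, allocation-heavy script; the pip install filprofiler and fil-profile run commands are the ones Fil's documentation describes, while the script itself is just an example:

```python
# big_alloc.py -- a toy script to profile. Run it under Fil with:
#
#     pip install filprofiler
#     fil-profile run big_alloc.py
#
# Fil then writes an HTML report showing which lines owned the memory at the
# moment of peak usage.
import numpy as np


def make_big_array():
    # ~80 MB of float64s; this line should light up in the report.
    return np.ones((10_000, 1_000))


def process(data):
    # Doubling allocates a second copy, which is what pushes the peak higher.
    return data * 2


if __name__ == "__main__":
    result = process(make_big_array())
    print(result.sum())
```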
All right, well, Brian, that's it for our main items. You got anything? I don't. I've just been trying to get through the day lately. Yeah, I hear you.
Well, I have one really quick announcement and then an unannouncement, in a sense. So I sent out a message to a ton of people, so the unannouncement is for them. What I'm trying to do is create some communities for students going through the courses to go through them together, and I'm calling these cohorts, right? So I set up like a beginner Python cohort and a web Python cohort, and put like 20 or 30, I had 20 or 30 slots, let's say, for people to go through over, like, three or four, three months or so, where they all work on the same part of the course at the same time. And they're there to help each other. There's private Slack channels and other stuff around it. So that's really fun. But it turns out that after one day of having that open, I got many hundreds of applicants for like 20 spots. So I had to stop taking applications. So if people got those messages and are like, I want to apply, but it looks like the form is down, it's because there's an insane number of applicants per spot. So those will come back, and people can sign up to get notified. There's a link in the show notes. But I just want to say, that's what I was doing, which is fun, but for those of you who didn't get a chance to apply because it got closed right away, that's why. And that's for training at talkpython.fm? Yes, exactly. So there's certain courses, and if you got one of the courses and you want to go through it with a group of students all on the same schedule, this was like a free thing that I was doing to try that out. Yeah. Right, I think it's a neat idea. Yeah, thanks. Yeah, people seem to like it. Yeah, too many. But yeah, we've got to give it a try, get it dialed in, then we can open up some more groups. Yeah. All right, well, I've got a joke
I kind of like for you here. I love this one. Are you ready for it? Yeah. You want to be, why don't I be the junior dev? You can be the senior dev. So the junior dev and senior dev are having a chat. And I feel like you may be a little skeptical of what I've done here. Let's just do this. Sorry, I want you to hit me with a question. Okay. So where did you get the code that does this? Where did you get the code from?
Oh, I got it from Stack Overflow. Was it from the question part or the answer part? Isn't that so good? It's like, people say copying from Stack Overflow is bad. I think this is the real question: you definitely don't want to copy from the question part. Yeah, but actually, I've never heard anybody, you know, spell that out. You know, I know, look up stuff on Stack Overflow, but at the top, with the question, don't copy that. That's the code that somebody's saying, this doesn't work. Yeah, exactly. That's funny. Yeah. This is a good one. It's too funny. All right. Well, thanks as always. Great to chat with you and share these things with everyone. Thank you.
Yeah. Bye-bye. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.
