Python Bytes - #311 Catching Memory Leaks with ... pytest?

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 311, recorded November 22nd, 2022. And I am Brian Okken. I'm Michael Kennedy. And I'm Murilo Cunha. So welcome, Murilo. So tell us a little bit about yourself before we jump into the topics. TL;DR: I'm a machine learning engineer at a data and AI consultancy company called dataroots. I'm from Brazil, but I actually live in Belgium. And I guess that's all.

Starting point is 00:00:28 That's it. Thanks for having me. Thanks for showing up. It's great to have you here. Well, Michael, why don't you kick us off with the first topic? All right, let's kick it off. I've got some fun stuff. Let's see what Murilo thinks about this.

Starting point is 00:00:40 This is a little bit mathy, what I got going on here. That is not the right screen. How about that screen? So this comes in from one of the big friends of the show, Brian Skinn, and he sent me a tweet and it just says, what? @PythonBytes. And it's a quote tweet from somebody here saying, holy, latexify is the sexiest thing I've ever seen. And look at this. So I studied a ton of math, and the symbols of mathematics are really important and they communicate stuff really, really quickly. You can scan over and you see the symbol for the real numbers, or you can see the symbol for subset or infinite sum. And you're like, I know what that means.

Starting point is 00:01:30 When you translate that into Python or into computer code, it usually becomes something kind of gnarly looking, right? So the example here on this tweet has a function called solve, and it's solving the quadratic equation, I guess, just for one variation of the root, not the plus minus, but that's fine. It just says like negative b plus math dot sqrt of b star star two. It's like symbol soup, right? So this latexify thing: LaTeX is the language of expressing those symbols the way mathematicians would have written them in the 16th century or whatever, like the fancy flowing sum symbols and integral symbols and whatnot. And so what this does is you just put a decorator onto that Python function.

Starting point is 00:02:11 You say latexify.with_latex. When you show that function in a notebook, it shows the formal mathematics of it. Wow. Like there's one that was doing, like I said, the quadratic equation. Another one that says if x is zero return one, else return math dot sin of x divided by x. And then the symbols are this sort of branching equation, you know, like what you would write in LaTeX conceptually. What do you think? Wow.

Starting point is 00:02:40 Is that insane? This is great. But it just changes the repr of the function, I guess, right? Like if you call the function, it's all fine. Yeah, exactly. It doesn't change the function at all. It changes the repr or the str.

Starting point is 00:02:51 So if you do this outside of a notebook, what it prints out, let me see if I can somehow communicate this back. So if you print it out, what it returns, do I have it here? No. Yes. There. No, that's not it. Sorry, I don't have it. What it prints out is the LaTeX escape codes. So it'll say, like, backslash frac of, you know... I don't know how to write LaTeX. I did a little bit when I was studying math, and then I said, that's something I never need to remember, and, you know, shut it out

Starting point is 00:03:23 of my brain. Never again. Yeah, like that. Why do I need to know this? I don't need to know this. But yeah, so the repr is just the LaTeX escape codes, and then the notebooks see that and render it as LaTeX. That's pretty cool. And one of the nice things about this is you might have the math that you're trying to convert to code, and then you can check your answer. You can just see, did I get it right in code? So yeah, it's pretty cool. That's really interesting.

Starting point is 00:03:52 Yeah, because you round trip it, right? Yeah, I'm assuming people are doing this on their own code. So they're, you know, I guess the question is about the inverse. Yeah, right. It's like, hey, if I have the math symbols, could I turn this

Starting point is 00:04:05 into a Python function? I mean, I don't see why I can't go both ways. Sure. True. But I still think it would be easier

Starting point is 00:04:10 to write the Python function than the LaTeX code for rendering it. Yeah, that's true. I think it's a pretty niche use case. Well,

Starting point is 00:04:17 you know, I'm sure. Well, I'm sure someone's going to find a use, a cool use case for it too, right?

Starting point is 00:04:22 Yeah. This is pretty interesting. We've got a couple of live comments. So Madison, hey Madison, out in the audience. Madison's been on the show before. I'm blown away by how libraries like this are able to make math approachable. I wonder how this could be used with auto-generated documentation. Very cool. I agree. And Henry also says, I'm guessing it's working on the bytecode like Numba, but compiling it into a human language. Yeah, compiling it into the LaTeX escape codes. Which is not human.

Starting point is 00:04:50 Which is the opposite of human-readable, but it is text, right? Related to this, just... Oh, yeah. Henry: SymPy. Okay. It's using inspect.getsource and parsing the AST. Yeah, perfect.
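To make Henry's point concrete, here is a minimal sketch of the "parse the AST, emit LaTeX" idea using only the standard library. It is not latexify's actual implementation: the `to_latex` helper is a hypothetical name, it handles only a handful of node types, and it ignores operator precedence. A real decorator would obtain the source text with `inspect.getsource`; a plain string is used here to keep the sketch self-contained.

```python
import ast


def to_latex(node):
    """Translate a tiny subset of Python expression AST nodes to LaTeX.

    Hypothetical helper for illustration only; precedence and most
    node types are deliberately not handled.
    """
    if isinstance(node, ast.Expression):
        return to_latex(node.body)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "-" + to_latex(node.operand)
    if isinstance(node, ast.BinOp):
        left, right = to_latex(node.left), to_latex(node.right)
        if isinstance(node.op, ast.Div):
            return rf"\frac{{{left}}}{{{right}}}"
        if isinstance(node.op, ast.Pow):
            return f"{left}^{{{right}}}"
        if isinstance(node.op, ast.Mult):
            return f"{left} {right}"
        if isinstance(node.op, ast.Add):
            return f"{left} + {right}"
        if isinstance(node.op, ast.Sub):
            return f"{left} - {right}"
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "sqrt"
    ):
        return rf"\sqrt{{{to_latex(node.args[0])}}}"
    raise NotImplementedError(ast.dump(node))


# One root of the quadratic formula, as discussed in the episode.
source = "(-b + math.sqrt(b**2 - 4*a*c)) / (2*a)"
latex = to_latex(ast.parse(source, mode="eval"))
print(latex)
```

The printed string is the kind of escape-code text a notebook would then render as typeset math.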

Starting point is 00:05:03 Another thing that's amazing, if people check out like the SymPy stuff, it does some really, really interesting things. Like if you go say to like calculus, you take a limit here, it'll do similar outputs as well, right? So you could put in this and it'll actually express it as symbolic math and it won't lose precision

Starting point is 00:05:25 because it solves it symbolically. And you can say like, you know, factor this equation. So that's kind of related, but this just says given any arbitrary Python function not written in the symbolic form, just turn it into LaTeX, which is pretty amazing. So anyway, thank you, Brian Skinn, for pointing that out. That is pretty neat.

Starting point is 00:05:42 One final comment. I could not get it to install on my Apple Silicon Mac. Maybe that detail matters, but I couldn't get it to pip install from PyPI. I had to pip install git+ the GitHub URL

Starting point is 00:05:55 and then it would install. I don't know why, but if people want to play with it, that might be necessary. Okay. Yeah. Over to you, Brian. All right.

Starting point is 00:06:03 Well, while we're talking about math, I'm often working in the measurement world, where we care about prefixes a lot. And, you know, a lot of people deal with big numbers or small numbers. This was actually suggested to us by Avram, and I think he either works on this or it's his project. It's a project called Prefixed. And what this does is it provides a class called Float, capital F, that derives from the built-in float. And it supports scientific and IEC prefixes (the latter I'm not familiar with). So things like the scientific k and things like that. If you go look at all the metric prefixes,

Starting point is 00:06:54 there's some new ones, but you've got n, k, mega, giga, things like that. So it adds these on when you print them. It acts just like a normal float. Most of the time you can use it in math equations and everything. The interesting thing is, if it is used in a math equation, the result will be one of these prefixed Float types. But then the nice thing about it is when you convert it to a string, it includes the little prefix, or the suffix or whatever, the little micro or k or M or something like that. So I think this is actually super helpful. I'm going to use this right away, because I use a lot of big and small numbers, and reporting

Starting point is 00:07:43 out just the huge thing or just the float is sometimes horrible to compare with. So this is pretty cool. It's very clever. I love how simple the idea is. So you can just f-string one of these Floats and say colon .2h, and the h tells it to use kilo or micro or mega or whatever suffix is needed. That's cool.
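To illustrate the idea behind that format code, here is a rough standard-library sketch that picks a metric prefix for a number. It is not the Prefixed library's implementation or API: `si_format` and the `_PREFIXES` table are hypothetical names, and edge cases such as rounding up across a prefix boundary are not handled.

```python
import math

# Exponent (power of ten) -> metric prefix symbol. Illustrative table only.
_PREFIXES = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n", -6: "µ",
    -3: "m", 0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E",
    21: "Z", 24: "Y",
}


def si_format(value, digits=2):
    """Format a number with the nearest metric prefix (hypothetical helper)."""
    if value == 0:
        return f"{0:.{digits}f}"
    # Round the power of ten down to a multiple of three.
    exponent = math.floor(math.log10(abs(value)) / 3) * 3
    # Clamp to the range covered by the table above.
    exponent = max(-24, min(24, exponent))
    scaled = value / 10 ** exponent
    return f"{scaled:.{digits}f}{_PREFIXES[exponent]}"


print(si_format(3250))       # kilo range
print(si_format(2_500_000))  # mega range
print(si_format(0.0042))     # milli range
```

With the library itself, the same result would come from formatting a `Float` with the `h` format code described above rather than from a helper like this.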

Starting point is 00:08:07 And then there's the byte example where they said, well, I'm going to use the capital B for bytes, but that's after the formatting of the number. And then the k comes in from the Float thing. So that's pretty cool. One of the other things that he passed along is there's some new prefixes. So these are apparently new scientific prefixes, the first new ones in the last 30 years, apparently. So we have 10 to the 21st,

Starting point is 00:08:40 which is zetta, and 10 to the 24th, which is yotta, and then the negative ones are zepto and yocto. So these are fun. But why now? Why did they decide they need to? They have more money now and they need to come up with new prefixes? Exactly. I'm not sure why we need new prefixes, but our microscopes can now see smaller things. We don't have words for things that are this small. But national debt, maybe? Yeah, very possible. But also Avram notes that Prefixed does handle these new ones. So cool.

Starting point is 00:09:21 Good job. Cool. One thing in Python, too: you can put the underscore, right? Like if you put an underscore on the thousands, that's also something that makes it easier, I think, to read the numbers. That's what I was... yeah, like the digit grouping. Yeah, yeah. Do you do that a lot? Not a lot, but whenever I can, I do. I think it makes it easier to distinguish how big the number is. I guess I always forget to. I know it's there, but I never use it. I think usually it's like when I'm counting the zeros

Starting point is 00:09:48 with my finger on the screen, I'm like, no, no. Maybe I just put an underscore there. It makes everyone's life easier. Yeah, I've really started doing that a lot the last couple of years, but before then I didn't. Cool. Well, what is next? Murilo, what you got for us?
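The digit-grouping underscores mentioned above are plain Python syntax (3.6 and later), so a short example covers them. The variable name is just for illustration.

```python
# Underscores in numeric literals are ignored by the interpreter;
# they only help humans count the zeros.
big = 1_000_000
print(big == 1000000)

# The same grouping is available when formatting numbers for output.
print(f"{big:_}")   # group with underscores
print(f"{big:,}")   # group with commas

# int() also accepts underscore-grouped strings.
print(int("1_000_000"))
```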

Starting point is 00:10:03 I think that's me. Yeah. dbt, have you ever... First, you gotta accept some cookies, hold on. Oh, my bad. I'm just kidding. No, I'm just teasing. Cookie things drive me crazy, man. I don't know. Yeah, yeah, yeah. I think it's crazy how now it's popping up everywhere, and then you see the data gathering all the time, and this and this, and it's like, okay. Yeah, yeah. But maybe dbt, have you ever heard of dbt? Is this something, cause in the data world,

Starting point is 00:10:29 in my field, it's super popular, but I don't know if it's a bubble as well. I've never heard of it. Michael never heard of it? Yeah, I think I've heard of it, but I couldn't tell you what it does, so. I was basically in the same spot. Yeah, tell us about it.

Starting point is 00:10:42 No, it's a really cool tool. It's open source as well. They have their cloud option, I guess, right? So you can pay and they host it. Maybe a disclaimer as well that I never, I always see it and I always want to use it, but I haven't found the use case. So I don't have first hand experience here,

Starting point is 00:10:56 but basically the way I would describe is that they add best practices around SQL projects. So why am I mentioning this on Python Bytes? It's built with Python. Yay. And the other thing too is that they actually mix Jinja with SQL stuff, right? So you can actually do for loops. You can do stuff like that.

Starting point is 00:11:15 So you don't have to repeat every time and just change the variable. They also have these like reference macros and stuff. So you can actually say, okay, this comes from that table that is on that file. And this comes from this. So you can actually chain a lot of these dependencies, right? Like there's a lot of projects that you have these ETL stuff, right? So you just have to basically transform at each step. And with dbt, they actually keep track of what depends on what, and you can say,

Starting point is 00:11:36 oh, I want the freshest data here and you execute everything that needs to be executed there. Wow. Yeah. So it's super cool. They actually support a lot of data platforms here, right? So you see BigQuery, Databricks, Snowflake, all these things as well. More things they do: they even have some data validation stuff, which in my field is a big thing too. You know, maybe you have an ID column that needs to be unique and cannot be null, and you want to make sure that that always happens, and if it doesn't happen, you want to be flagged, right? So that's super cool. What else? Ah, you also have some built-in documentation. So once you have the dependencies, you can say, oh, show me the DAG, you know, show me where the

Starting point is 00:12:14 data comes from and what depends on what. So that's also super cool. And recently they actually started supporting... so an SQL file kind of corresponds to a model, right? Oh, cookies again. And so they have SQL models, so that's the one, but they also started supporting Python models, right? So this is very tied to data. So now you can actually mix and match, right? You can say this step, this transformation, is in SQL, but this one is actually Python, right? They don't run anything on the machine, they actually send it to the cloud. So Snowflake has Snowpark, which is Python on Snowflake. BigQuery has Spark, and Databricks as well, right?
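The dependency tracking described here, where asking for fresh data in one model runs everything upstream of it first, can be sketched with the standard library's `graphlib`. This is only an illustration of the idea, not how dbt is implemented; the model names and the `upstream` and `build_order` helpers are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the models it depends on.
deps = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_orders": {"stg_orders", "stg_customers"},
    "revenue_dashboard": {"customer_orders"},
}


def upstream(model, deps):
    """Collect a model plus everything it transitively depends on."""
    needed = {model}
    stack = [model]
    while stack:
        for parent in deps.get(stack.pop(), ()):
            if parent not in needed:
                needed.add(parent)
                stack.append(parent)
    return needed


def build_order(target, deps):
    """Return the models to run, dependencies first, to refresh target."""
    needed = upstream(target, deps)
    subgraph = {m: deps.get(m, set()) & needed for m in needed}
    return list(TopologicalSorter(subgraph).static_order())


order = build_order("customer_orders", deps)
print(order)
```

Refreshing `customer_orders` runs the raw and staging models first and leaves the downstream `revenue_dashboard` untouched, which is the behaviour described in the discussion.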

Starting point is 00:12:50 So basically you can mix and match. This transformation is here, this transformation is there, but everything is like in a nice, put in one place. And because it's on Git as well, you can have CICD. I think also you mentioned,

Starting point is 00:13:00 I think it was you, Brian, that mentioned SQLFluff. And SQLFluff actually came from a dbt project as well. So, and it's all in Python. So super cool. Nice. Wow, that's really neat.

Starting point is 00:13:09 So what do the Python models look like? Are they straight Python classes or are they Pydantic or? Maybe I'm a bit lazy, cause I just watched the video and they were showing here how it works. Cause they're also doing a comparison, right? Maybe this is, no, this doesn't work, does it? Yeah, it works.

Starting point is 00:13:26 This works? Yeah, it works. Okay. This is, but the quality is horrible. That's okay. But in a nutshell, you have this. Yeah. You define a function.

Starting point is 00:13:34 Yeah, you define a function that takes a dbt and a session, and then you create a reference. So a reference is basically a table, right? And then from that point on, you can say to_pandas, and then you can just basically use the pandas API to transform that, right? So there's still some caveats, right? Because pandas is not super performant, depending on how much data you have and whatnot. So sometimes you probably still want to stick to the SQL stuff. But then it opens a lot of possibilities there too, right?
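As a rough sketch of the shape just described, a dbt Python model is a function that receives a `dbt` object and a `session`, pulls in an upstream table with `dbt.ref`, and returns a data frame. The model name `stg_orders` and the column `order_id` are made up for illustration, and `to_pandas()` assumes a Snowpark-style data frame; other platforms may expose a differently named conversion method.

```python
def model(dbt, session):
    # Ask dbt to materialize the result as a table.
    dbt.config(materialized="table")

    # "stg_orders" is a hypothetical upstream model; dbt.ref also records
    # the dependency so this model runs after it.
    orders = dbt.ref("stg_orders").to_pandas()

    # From here on it is ordinary pandas: drop repeated order rows.
    return orders.drop_duplicates(subset=["order_id"])
```

dbt calls this function itself on the data platform, so nothing here runs on the local machine, which matches the caveat about pushing work to the warehouse.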

Starting point is 00:13:57 So even stuff like deploying machine learning models on the SQL infrastructure and everything. So yeah, so it's kind of the same old, same old story. You know, even if you're working with an ORM, sometimes you don't want to bring all that data back to make some minor change and then put it back. You would just do a sort of an update statement instead of pulling back 10,000 models, changing something, and calling save 10,000 times, right? Like it's probably that kind of trade-off, but it's really cool that you can bring it back into Python this way. What are you using it for in your work, or what are you interested in using it for? Well, I think we have a lot of these ETL pipeline

Starting point is 00:14:29 stuff, right? We have some data here and then we want to basically clean it up and make sure it's all uniform and put it in a dashboard, calculate some KPIs and whatnot, right? And so business people can see: are we doing better, are we making more money or not, kind of. Yeah, and a lot of the time it's just SQL, right? It's also more accessible for a lot of people, so we stick to SQL. But there are also limitations, right? Before, what I've seen is people just kind of go in the UI and execute stuff ad hoc, right? So no versioning, nothing. And I think this kind of puts everything in one place. You can even add CI/CD, because of the CLI tool and everything, and just kind of make sure that everything goes through that versioned method, let's say.

Starting point is 00:15:05 I mean, and again, yeah, if you need something more fancy, right, then you can throw some Python stuff in there. But usually we try to avoid it, to be honest. I can imagine. Let's see here. Hold on. Yeah, the models, the way you express the code,

Starting point is 00:15:18 it's like, it's really nice looking for SQL, which is surprising, right? This code you write like with customers as select these fields from this, this table. Yeah. And they have, they also have like the different macros and like people can write different macros. So like the describe function in Pandas, someone can just have written that and you can import that. And like, it's, it's really nice to share like all these things as well. So super cool. Really, really eager to, to give it a try, to be honest. I've been just like, try and scratch that. Where's the next project that we could use this on?

Starting point is 00:15:49 Indeed, indeed, indeed. Yeah. All right. Brian, anything you want to add before we jump over to talking about our sponsor real quick? Yeah, let's talk about our sponsor. All right. So today's episode of Python Bytes is brought to you by Microsoft for Startups Founders Hub. Microsoft for Startups set out to understand what startups need to be successful and created a digital platform to help you overcome those challenges.

Starting point is 00:16:13 And they came up with Microsoft for Startups Founders Hub. The Founders Hub provides all founders at any stage with free resources to help solve startup challenges. The platform provides access to expert guidance, skilled resources, mentorship, and networking connections, technology benefits, and so much more. Founders Hub is truly open to all. You don't need to be investor backed, but you can be. Speed up development with free access to GitHub and the Microsoft Cloud. You can unlock credits over time, and there's also discounts and benefits from innovative companies partnering with Founders Hub, such as OpenAI. You'll have access to their mentorship network, which includes hundreds of mentors across a range of disciplines. Need advice on marketing, fundraising, idea validation? There's tons

Starting point is 00:16:55 of topics, including management and coaching. You'll be able to book one-on-one meetings with the mentors, many of whom are former founders themselves. It's no longer about who you know. Get critical support you need from Microsoft for Startup Founders Hub and make your ideas a reality today. Join the program by visiting pythonbytes.fm slash foundershub2022. That link is also in your show notes. Thanks, Microsoft, for keeping us going strong.

Starting point is 00:17:22 All right. What have I got next? This one is a chain of really cool things. So Roman Right of Beanie fame and other things tweeted about this project that Pablo Galindo Salgado has been working on. So Pablo was the release manager for Python 3.11. He was part of the live stream of the release, which was all fun. But he also, I believe, works at Bloomberg, where they work on Memray, and I think we spoke about Memray quite a while back, Brian. It's a memory profiling tool. Murilo, do you use profilers and that kind of stuff in your world? No, I haven't used them much. Haven't had the need, to be honest, not yet. I feel like so far, I try to keep it simple. So a lot of times profilers are about performance, like how fast did this code run? And if it's slower, should I look at this loop or that loop? Or, you know, where do you spend your time making it faster?

Starting point is 00:18:15 Because it's really surprising when you look at code, like this part looks complicated. So that must be the slow part. Like, no, that doesn't matter. Nothing you do to that will make any difference. You got to look over here, right? That kind of stuff. But Memray, as the name would suggest, is more about memory profiling and like talking about, you know, how many of these different

Starting point is 00:18:32 things have you allocated and those kinds of things. What is coming? Well, first, let me pull up, we have a pytest plugin, which is super cool. So with the pytest plugin, you can do two things. Now you can say pytest --memray tests, and it'll tell you things. You can actually set limits on how much memory can be allocated for a certain operation. And if it exceeds that, it'll say, oh my gosh, there's something wrong. This thing is way overusing the memory we expected. So that's an error. But it also gives you a cool emoji-filled summary, I guess. Like total memory allocated, the number of allocations, a histogram of allocation sizes. So Python memory has size classes. We've talked about its blocks, arenas.

Starting point is 00:19:22 One other term that I'm forgetting that that it uses to organize these data structures. And then you can actually get it overall then for individual tests. And so it'll tell you like the different things that were allocated. And anyway, it's pretty insane. Okay. So you can get that report and then you can also, where's the other one? I think it's, there was a, there's a place where you put a decorator and you just say on this test, I,

Starting point is 00:19:47 if it exceeds this amount of allocation, that should fail the unit test. It's just a pytest.mark. Oh, cool. Memory limit or something. I don't know if it's limit memory or memory limit. I can't remember exactly what it's,

Starting point is 00:19:57 what it's called. You can say, if this test exceeds one megabyte of memory allocation, then that's a failed test, which is pretty cool, right, Brian? That's really great. So they got a, yeah,

Starting point is 00:20:08 they have a limit memory decorator and a check leaks decorator. That's the one. So the check leaks is the new thing. And so what you can do now is you can say pytest.mark.checkleaks as a decorator on your test. And if there's a memory leak in the code that runs during that, it will let you know. Wow. I don't know if anyone else has tried to track down memory leaks.
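To show what "a leak during the test" means in practice, here is a small standard-library sketch that measures how much memory is still held after a function returns. It uses `tracemalloc`, not Memray, and does not reproduce the plugin's markers; `retained_bytes`, `leaky`, and `tidy` are hypothetical names for illustration.

```python
import gc
import tracemalloc

_cache = []  # module-level list that accidentally keeps data alive


def leaky():
    # Each call parks roughly 1 MB in a global, so it is never freed.
    _cache.append(bytearray(1_000_000))


def tidy():
    # The buffer is local, so it is released when the function returns.
    data = bytearray(1_000_000)
    return len(data)


def retained_bytes(func):
    """Return how many traced bytes remain allocated after calling func."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        func()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


print(retained_bytes(leaky))
print(retained_bytes(tidy))
```

A test could then assert that the retained amount stays under some threshold, which is roughly the check the leak decorator is described as automating.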

Starting point is 00:20:28 I would rather track down like a multi-threaded race condition than a memory leak. I don't want anything to do with memory leaks. This is no fun. And so if I can deal with a decorator, let's do it. Well, and also you're decorating your tests. So you're not having to modify your code at all to do this. I mean the code under test. You're modifying your test code, if at all. Or it looks like it gives you some benefits even with no modification. It's pretty cool. Yeah, maybe pardon my ignorance here, but when would I worry about memory leaks in Python? I think, so imagine you're writing pandas, right? And you've written a bunch of C code

Starting point is 00:21:05 that's getting imported and you know there's a memory leak in there somewhere. And it's just like, okay, well, I don't really know how to. But then it's more like the C part is the bandage. You can also have memory leaks in the sense that you expected there to be no more things allocated

Starting point is 00:21:24 after the function was called, but you could have assigned it to a global variable or you could have, you know, stored it, held onto a reference in some way that you weren't expecting. So it's not a leak in the super traditional sense, but it could build up if you're doing something wrong in Python, but certainly outside of that. So I think this is pretty cool. Really any long running service is going to have, you're going to be concerned about it. There's a lot of Python applications that are short running

Starting point is 00:21:50 and it just cleans up after itself when it's done. So there's cases, long running services, also things like maybe you care about, things that are using large amounts of data and need all of the data that they can get a hold of without wasting any. Or that's important as well. Makes sense. I'm also wondering- Yeah, if you're right at the limit. Yeah.

Starting point is 00:22:12 No, sorry. Go ahead. Go ahead. Yeah. If you're right at the limit of like, I'm using 15 and a half gigs and I don't have more than that. So I need that. Or like, I just checked the TalkPython training site has been running for seven days in one

Starting point is 00:22:23 hour. Yeah. Like if it had a memory leak, even if it's 100 kilobyte here and there, it could turn out to be a big hassle. Okay, cool. I'm wondering if you could use this for edge device stuff, if you want to limit the memory

Starting point is 00:22:36 because we know the edge device won't have that much. That's actually a really good point, because if you're on one of these CircuitPython little boards, they've got like 256K of RAM, and that's very different than 16 gigs, isn't it? Yeah, yeah. Right. Yeah, so you could test your application on a larger computer and limit how much memory you give it. So it's kind of... Right. Yeah, I think you would want to do that with the limit then, rather than the check leaks. But still, yeah. Yeah, but it's the same. Cool.
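The "run it on a big machine but fail if it uses more than the small device has" idea can be sketched with `tracemalloc` as well. This is a stand-in written for illustration, not the pytest plugin's marker: the `limit_memory` decorator below is a hypothetical helper, and the 256 KiB budget simply echoes the board size mentioned above.

```python
import functools
import tracemalloc


def limit_memory(max_bytes):
    """Fail if the wrapped function's peak traced memory exceeds max_bytes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracemalloc.start()
            try:
                result = func(*args, **kwargs)
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
            if peak > max_bytes:
                raise AssertionError(
                    f"peak {peak} bytes exceeds limit of {max_bytes} bytes"
                )
            return result
        return wrapper
    return decorator


@limit_memory(256 * 1024)
def small_job():
    # Stays far below the budget.
    return sum(range(1000))


@limit_memory(256 * 1024)
def big_job():
    # Allocates about 1 MB at once, well over the budget.
    return len(bytearray(1_000_000))


print(small_job())
```

Calling `big_job()` raises `AssertionError`, which is the "failed test" outcome described for the memory limit.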

Starting point is 00:23:06 Yeah, awesome. All right, let's see. A couple of comments from the audience. Gareth out there. Hey, Gareth. Says, I ended up writing Docker containers that swapped out every couple hours to solve it. I mean, that's actually what a lot of people do. They're like, you know what?

Starting point is 00:23:16 If it runs more than 12 hours, it's a problem. So we just tell it to recycle itself. And then Madison says, this is so cool. I need memory profiling all the time with some of the data I do work with regularly. So people, people are digging it. Cool. Yeah. Very cool. So thank you, Roman. I know you didn't send that in to us on purpose, but you shared it with us anyway. Thanks. Nice. Over to you, Brian. Okay. Before I get onto the next topic, I want to point out that Henry Schreiner, I'm going to paraphrase him by saying, Brian, you dork. You didn't even read the article.

Starting point is 00:23:46 Yes, you're right, Henry. Sorry. So the new prefixes: I was showing the previous new ones, from '91, when they added yocto and zepto. These are not the new ones. The new ones are down here with ronna, quetta, ronto, and quecto. Yes, the reason why those sounded familiar is because they've been around. These new ones, they're the new ones.

Starting point is 00:24:11 Okay, so thanks, Henry, for clarifying that. But on to the next topic: Will McGugan says, please steal my source code. So Will McGugan wrote an article called Stealing Open Source Code from Textual, and he says, I would like to talk about a serious issue with free and

Starting point is 00:24:32 open source software: stealing code. You wouldn't steal a car, would you? And then actually he has this funny video that he embeds about how digital piracy really is like stealing. And it's sort of a funny video. But the comment is real, that you can steal code from open source projects as long as you're allowed to. So please read the MIT license, or read the license, to make sure that you can. And in a lot of cases, you can.

Starting point is 00:25:03 So an example that I use a lot: I'll think of something that I want to do, like I'm interacting with a library, and maybe I don't quite get how to do that from the documentation. I can search GitHub for projects that use that library as an example. And so that's a way to look at other source code for how to interact with a project that maybe doesn't have the greatest documentation. You can see how it's done. I've honestly never thought to do that. That's a great idea.

Starting point is 00:25:33 I'll go look at the tests and stuff. I'm like, these tests suck. There's not a single one that shows me this use case that I'm looking for. This is brilliant. I do that a lot with PyTest plugins because I look at how other plugins are testing their stuff and I'm like, oh, how do they do it? So the warning there is he's not advocating for piracy.

Starting point is 00:25:55 Open source code gives you explicit permission to use it. And if you're actually just copying the whole thing, you probably should reference it and use the same license, or if you're copying large chunks. But the MIT license, for example,

Starting point is 00:26:11 says it's about substantial copying. So a little bit of copying is fine. And Will says Textual has some cool stuff in it that you might want to look at. He points out some things you might want to steal. The first is loop first and last. So he's got a loop iterator

Starting point is 00:26:25 that, well, he's got a couple versions of it, that will not only iterate through things but it'll note which one's the first and the last. So if you need to do something different on the first and the last one, you can do that. He tweeted recently, or tooted or whatever, about the LRU cache as well. So Python's got a built-in LRU cache, but everything's global, so there's limits on how you can interact with it. So he has a more flexible LRU cache. He's got a Color class that looks pretty cool that you can convert to different color representations. That's pretty neat. And then, you know, he's been working on a ton of geometry stuff,

Starting point is 00:27:06 2D geometry. So he's like, you might want to use this for whatever 2D geometry you're using. So here's there. So kind of cool reminder that open source, one of the benefits of open source is you get to see the source

Starting point is 00:27:17 and learn from people. I like it. I love your idea. You've never done that. I'm like, it might dance. I just can't figure this out. Oh, how are other people using it? I just get frustrated going to a new library.
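The first-and-last loop helper mentioned a moment ago is small enough to sketch. The version below is an independent illustration of the pattern, not a copy of Textual's source, and the exact name and tuple layout are assumptions.

```python
def loop_first_last(values):
    """Yield (is_first, is_last, value) for each item in values."""
    iterator = iter(values)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input yields nothing
    first = True
    # Stay one item behind so we know when the current one is the last.
    for value in iterator:
        yield first, False, previous
        first = False
        previous = value
    yield first, True, previous


# Example: join items with commas but handle the ends differently.
for is_first, is_last, word in loop_first_last(["red", "green", "blue"]):
    prefix = "[" if is_first else ""
    suffix = "]" if is_last else ", "
    print(prefix + word + suffix, end="")
print()
```

Because it only looks one item ahead, the helper works on any iterable, including generators whose length is not known up front.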

Starting point is 00:27:30 This one sucks. I can't do this. I'm going to find another one. It's not good enough. Murilo, are you an open source thief? Do you do this kind of stuff? I have to admit, yes. Yes, I am.

Starting point is 00:27:39 Stack Overflow thief, open source thief, especially in the early, early days, right? But I think with the Rich stuff too, it's very inviting for you to steal code, because even on the Rich package, right, if you do python -m rich.table or whatever, it always shows some really nice stuff on the terminal, right? And I was like, how does he do that? I think for every component he had a little demo that you can just run, and it's very tempting. Even if he didn't want people to steal stuff from him,

Starting point is 00:28:08 I feel like you have a hard time just keeping the thieves away, you know? Yeah. Yeah, very cool. And funny too. I like it. Good job. Good job, Will.
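
To make the LRU cache point from earlier concrete, here is a small sketch of a cache that lives on an object you control, rather than being attached globally to a function the way functools.lru_cache is. This is not Textual's implementation; the class name LRUCache and its methods are assumptions chosen for illustration, built only on the standard library.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache (illustrative sketch only)."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __len__(self) -> int:
        return len(self._data)

    def get(self, key, default=None):
        """Return the cached value and mark the key as most recently used."""
        if key not in self._data:
            return default
        self._data.move_to_end(key)
        return self._data[key]

    def set(self, key, value) -> None:
        """Store a value, evicting the least recently used entry if full."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the oldest entry

    def discard(self, key) -> None:
        """Remove a single key, something a decorator cache does not offer."""
        self._data.pop(key, None)

    def clear(self) -> None:
        self._data.clear()


cache = LRUCache(maxsize=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.set("c", 3)  # cache is full, so "b" is evicted
```

Because each cache is an ordinary object, you can keep several of them, inspect their size, or drop one entry without clearing everything.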

Starting point is 00:28:16 Where are we at now? All right, off to Murilo's final item. Yes. This one I had not heard of either and it looks pretty interesting. Yeah, I mean, I think it's one of the things that I saw and I was like, yeah, this makes so much sense. How come I didn't think of this before? But this is Shed.

Starting point is 00:28:33 I mean, this is a podcast, right? So maybe, um, it basically, I think it's related to bikeshedding: shed your legacy code, right? So it's like a superset of Black, right? They call it Black++ here. So they say here, maximally opinionated auto-formatting tool, right?

Starting point is 00:28:51 So it's all about convention over configuration, which is also something that I can subscribe to. They have no configuration options, but basically it's a bundling of a lot of tools, right? So they have Black here, but they also have isort with the Black profile, so it doesn't clash. They also have pyupgrade, which I think you guys mentioned a couple of times, right? And autoflake as well. Autoflake I actually didn't know before, but basically it removes unused imports and unused

Starting point is 00:29:21 variables from your Python code. So it's kind of like, yeah, that's all I wanted. I was like, I wish I had this last week. There you go. Yeah. But yeah, it's the one-stop shop, and it even does blacken-docs, right? So if you have docstrings or Markdown or anything, it will take that and Black-format it for you. So I was like, yeah, this is what I wanted.

Starting point is 00:29:40 Okay. Hold on. blacken-docs. This is new to me too. All right. Yeah, oh, let's see.

Starting point is 00:29:46 So this is, yeah, yeah. Run Black on Python blocks, sample code blocks. Yes. So if you have reStructuredText, Markdown, even docstrings, it will format that for you. In the.

Starting point is 00:29:56 Oh, like blackening your README, for instance. Yes. Yes, yes. Ooh, okay. Yeah. This is good. Indeed. So I have some stuff to talk about at the very end

Starting point is 00:30:07 just a little bit about blogging and writing and some platforms and stuff, and that's all in Markdown. I could run this against all of my code samples on my blog to basically auto-format all code in the blog. That's awesome. Yeah, yes. The next time I write a book, I'm totally going to use that. Yeah. Or if you're doing a book. Yeah. I mean, absolutely. So literally just yesterday, or the day before, I was cleaning up some code. I finally got, you know, I don't keep it clean the whole time.

Starting point is 00:30:37 I get it to work and then, you know, I look at what I did that was stupid, and there might be some imports laying around that I thought I needed. Because you add an import and then you take that code out, but you sometimes forget to take the import out. So I ran Black on everything, of course. And then I ran Flake8 and I'm getting errors. I'm like, shoot, why didn't Black just take those out? So now I've got Shed and I take those out.

Starting point is 00:31:00 It does it all, right? Like it's great. Because maybe it's the same, right? Like you run Flake8, it's like, ah, yeah, unused variable. Ah, okay. Then you have to go there one by one. It feels like there should be a nicer way. Yeah, I mean, you have to pay attention to that because your unused

Starting point is 00:31:13 variable might be a typo or something. You might think you're using it. That's true. Yeah, or it's like a global variable a module is supposed to share with something else and it's a library. But in general, I mean, you could probably put like a # noqa or something on it.

Starting point is 00:31:29 Well, I mean, yeah. And also you're testing, so your test will catch it if you delete too much. Yeah. All right, well, really, really good one. Take your code out to the shed and whip it into shape behind the shed. That's it.
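
The unused-import cleanup discussed above can be sketched with nothing but the standard library's ast module. This is only an illustration of the idea, not how autoflake or Flake8 are implemented; the function name unused_imports is an assumption, and the check is deliberately naive (it ignores __all__, names used only inside strings, and re-exports, which is exactly where the caution about libraries applies).

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in source.

    Hypothetical helper for illustration: a real tool handles many more
    cases (scopes, __all__, conditional imports, noqa comments).
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be checked this way
                    imported.add(alias.asname or alias.name)
    # Every bare name that appears anywhere in the module counts as "used".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


example = (
    "import os\n"
    "import sys\n"
    "from math import sqrt, pi\n"
    "print(sys.argv, sqrt(2))\n"
)
print(unused_imports(example))
```

A report like this still needs a human look before deleting anything, since an apparently unused name can be a typo elsewhere or something a library intends to share.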

Starting point is 00:31:43 All right. All right, well, Brian, what else we got? Extras? I got some extras. You got some extras? Who should go first? Uh, you go first. Okay. Well, the thing that I've been working on is pytest-check. And I've been talking about this for like a month because I've been slowly pulling this into shape. It's almost a complete, not really a rewrite, but I moved everything around and the code's a lot easier to read. And so it makes me happy. And I also changed the API. So I wanted to mention to everybody that you can

Starting point is 00:32:16 either use from pytest_check import check to get this check object, or you can use the check object as a fixture. And either way you get access to everything in the library. That's the only thing you have to do. And for people unfamiliar, pytest-check is a library that allows you to have multiple failures per test. You know, normally the recommendation is try to fail on one thing. But sometimes you need lots of data. And I just threw in a little example that uses both. So, like, it's using httpx

Starting point is 00:32:45 to grab the status code, and as long as the status code is 200, then I can check a whole bunch of stuff. I can check to make sure the redirect and encoding are right and check for some stuff inside the, I mean, these could be multiple tests, but really you're checking multiple parts of one thing. And for scientific work, the measurement work that I do, I'm often checking tons of aspects of a waveform, and it's really just making sure the waveform's right, and that rightness is multiple checks. So use that. Anyway, I didn't intend to break anybody, but I did break Brian Skinn. So Brian came up at the beginning of the article. But he tagged me in a GitHub issue on his project and I looked at it and I'm like, oh, I didn't intend to break

Starting point is 00:33:30 that. So I fixed it this morning. So hopefully, if anybody gets broken by this, I was not intending to break anybody. Just let me know and I'll try to fix it. That looks great. How about you, Murilo? I know you have some as well. I'll let you go as well. Sorry. I don't know. Yes. Maybe. Yeah, I feel like I should have opened that. I didn't have the link up here. But talking about breaking stuff,

Starting point is 00:33:54 Flake8 is not on GitLab anymore. And I actually didn't have issues with that because with pre-commit, right, you have to specify the repo, and I was already on GitHub. But I actually heard from some people, there was a lot of noise that Flake8 is not on GitLab anymore. And then there was also this video from Anthony, who is maintaining, right, pre-commit and Flake8. He was

Starting point is 00:34:13 explaining a bit what the motivation was for going from GitLab to GitHub. And yeah, what's relatable is that sometimes you break people's code, but it's not intentional, right? But sometimes people can get very heated over these things. So yeah, just maybe a public service announcement, you know, change your Git repo to

Starting point is 00:34:31 GitHub now if you're using Flake8 as a pre-commit hook. Yeah. And you also had Mastodon.py, right? Yes, yes, yes.

Starting point is 00:34:37 That I did. I just, sorry, I flipped the order. Because I thought it was a segue there. Yeah,

Starting point is 00:34:43 yeah, yeah. I wish I knew about this like a week ago or so. That would have been awesome. Yeah, you covered Toot, I think, right? Yes, we covered Toot. That's right.

Starting point is 00:34:52 Yeah, yeah. So this is, to be very honest, I wasn't the one that found this. It was my boss. So shout out to Bart, if you're listening right now. But this is basically just a wrapper around the Mastodon API, right? So you don't have to do requests.

Starting point is 00:35:04 You can usually have like a nice client library there to do all these things. So if you want to play around, create some bots, you know, whatever, then yeah, there's a nice convenient package now for you to do it. This is really cool. And it has, you know what?

Starting point is 00:35:17 Documentation that says what functions it has. I love it. Documentation? Just read the code. It doesn't have to be much. Like the seven or eight lines of code that are in the README give you a really good boost,

Starting point is 00:35:28 but it lets you register your app, which is one of the things: if you go to the website, it'll show you which apps are registered for your access keys on Mastodon, but it won't let you create one on the website. So here's like a simple create_app, and you just give it,

Starting point is 00:35:41 you know, your instance name and what file to save the access tokens over to, and boom, you're good to go. Yeah. Have you guys already done stuff with Mastodon? Yeah, you know, on the Stream Deck, the thing that controls the stream, I already wrote that thing. When I push the one button, it sends out the message automatically that this live stream is starting. And yeah, it uses a little bit of Toot and mostly just the straight API with HTTPX. But if I'd known about this, you know, I would have used it. Now we know.

Starting point is 00:36:08 Yeah, I know. Thanks for sharing that. Anything else you want to share before we move on? Yes. So there are a couple more things. But this one, this is the Brazilian in me that couldn't resist. The World Cup started. I don't know.

Starting point is 00:36:20 Are you guys soccer fans or not at all? So we've got a fun soccer team here. I go see them with the kids and stuff in town. Yeah, so I'm also in machine learning, so a lot of data, and this time of the year, you know, there's a lot of, oh yeah, the AI models are predicting this. This one is from Oxford.

Starting point is 00:36:37 So I just wanted to give a quick shout out here. So they have a video on YouTube as well, which is cool. They explain the math. And I will go out on a limb here and say they use Python because they even mentioned Matplotlib and whatnot. But this is basically just a big excuse to say that they predict Brazil to win. So, you know, if this doesn't happen, it's all rigged.

Starting point is 00:36:53 The math supports this. So Brazil must win this World Cup. And anything that is not that, I'm going to be extremely disappointed. This is really cool. People are always looking for realistic examples to learn and explore libraries and tools, and this, you know, if you're into soccer and you care about the World Cup, this is great. Yeah, I think, yeah, people are very creative. I feel like there's a lot of uses for it. Well, I'm sure this will happen because there's

Starting point is 00:37:20 absolutely no corruption in soccer. So, yeah. Yeah, for sure. Yeah, not at all, not at all. Cool. Should I just keep going, or do you want to take over? If you got more items. Yeah, I have two more. Sorry, I know you said I could have more than two. So you can just wait. That's what this whole section is about. One: so for me as a data scientist or machine learning engineer, we use a lot of notebooks, right? And I think they have their place in data science, but there are some tools that don't play very nicely with them, right? And I think in Git diffs or PRs, they don't play so nicely, right? So this is, I think it's public preview, I want to say, but I haven't actually seen this, but now GitHub is going to start supporting notebook diffs. So if you have a

Starting point is 00:38:04 pull request, they're going to have a nicer rendering of the notebook here and you can actually see what the differences are. And I think before there was a tool called ReviewNB that you could add to GitHub. But yeah, now they're just going to start supporting it. So I haven't seen how it looks, but I'm pretty excited about this too. One less headache for me. Yeah, that's excellent, because before the diff would just be like, here's the diff of the JSON file. You're like, no, that's not what I wanted. And also JSON is just JSON, like just key-value, so if you just change the order of some keys, it's just like, yeah, you have a lot of changes, but you don't care. Yeah. Oh, this looks really useful. Yeah. And maybe one last, if that's okay. Yeah, just pull this here. This is Lancer. So it's another CLI tool. I talked about linting before, right?

Starting point is 00:38:45 So this is another kind of linting. And I say kind of because... So, you know, Black... Some definition of linting or cleanup, yeah. So this is like Black, almost like Black, but it's the opposite. So instead of making your code look nice, it would just make it like a hideous,

Starting point is 00:39:02 but working mess, right? So these are some of the features. It turns all your comments into Pitbull lyrics or something safe for work, depending on what you want. It takes all your variable names and turns them into, like, animal sounds and horribly similar-looking characters, so like bark_bark_0OO0O. It adds white spaces. It adds completely irrelevant comments, and the code still runs after these improvements. So here's an example. You have here some comments and everything.

Starting point is 00:39:34 So before, like nicely formatted and then afterwards you see some comments like bada bing, bada boom, you know. There's nothing like Miami Heat, some alpha characters in your variable name. So pretty good stuff. Again, I must say I haven't used this, but this is a tool that I'm not as excited to use. I mean, there's always times that you need to send out your code to different places and you would rather share it less than more.

Starting point is 00:40:00 Thinking of like, if you make a desktop app and you've got to send out the code for that or whatever, and you would want to obfuscate it, you want to make it harder for people to just pick it up. So you could hit it with this. They'd be like, yeah, no, no, we're just not doing that. So my favorite ones on the screen are the adding obvious comments, like, uh, setting the value of some. Like, that's good. That wasn't in the original and it's just funny. I mean, that's actually not gibberish, it's just useless. It's really good. The comments out in the live stream are really great as well. People are enjoying it. One of them is: it's great for Twitter employees.

Starting point is 00:40:48 You can maximize your lines of code for review as it's coming up. Then you just print it out and you take it and, sidebar: if somebody says print out your code so we can review it, they're not equipped to review the code that you may have written. Like, if the word print is involved in evaluating code, like, no. All right. I just, I don't think so. So leave that where it is, but you could

Starting point is 00:41:05 you could put this on top like, yeah, I'm kind of funky when I write code, it's a little different, let's get used to, yeah. It's a farm, it's a code farm, oink oink oink. You can have two sets of books, kind of. You've got your real repo, and then you use this to port it into the actual one that you submit. And you're like, I understand it. I don't know what your problem is. It works on my machine. I don't know.

Starting point is 00:41:33 I kind of want to run this on a large code base. Something really complicated. Squash all the commits. Force push. Like Textual. Release it as Textual oink oink or something. Yes, I love it. Cool. All right, well, this was a good find. Awesome. Thanks. All right, I'll make mine quick here. So a new YouTube video: I talked about how you can install the

Starting point is 00:41:58 mastodon web app on your ipad as a native app as well as on your desktop. So if you're doing that kind of stuff, not there. Basically, they just released Mastodon 4 a couple of days ago, and all the apps don't have features like edit and some of the other features that are there because they're like months behind. And so if you install the web app as an app, then guess what? It looks like an app. It acts like an app, but it has like zero latency. So as soon as something is released on the website, you get it, which is pretty cool. So people can check that out. I saw Madison in the audience sent over a call for proposals or calling all Pythonistas, if you will, for PyCascades. So PyCascades is back in person this year in Vancouver, BC. It goes from Vancouver to Seattle to Portland and cycles through that there.

Starting point is 00:42:46 But so this year it's gonna be in Vancouver. So if you wanna go up there and talk, be part of the conference, good conference. So the call for proposals is open there. Yeah, but it's not open for very much longer. So jump on that. I don't remember what the date is, but. It closes Wednesday the 30th.

Starting point is 00:43:03 So what is that? Yeah, eight days. Yeah. Next Wednesday. Yeah. Eight days. And Madison, in the audience: Thank you. It's in person this time, and we really value first-time speakers and atypical talks. So get out there and put yourself out there and get into public speaking. It's not a huge conference, but it's, you know, it's big enough. A couple hundred people, three, four hundred people, fun time. This is just really quick and fun. You know, if you're on a Mac, you're not as likely to get viruses sent your way

Starting point is 00:43:31 that would actually be able to do something, like 90% of viruses are written for Windows. But what's a really interesting fact, if you do have a Mac, it turns out 50% of all macOS malware comes from one single app. Can you believe that? What is it, Safari?

Starting point is 00:43:45 No, it's MacKeeper. So if you have MacKeeper, it like organizes your files and it'll clean up your junky cache and stuff, but apparently it has to take over so many permissions, and it can get, I guess, plug in, or I don't know what it does, but people can plug into this and make it do all sorts of horrible stuff. So 50% of all malware is written for MacKeeper. So if you have MacKeeper, maybe un-have it. I recently, as of Sunday, launched a new website that I hope will bring me back to writing some more. We'll see how that goes, but here I'm trying a new philosophy on blogging, Brian. I don't know how you feel about it, but I have a blog. I've been doing it for a long time, but like,

Starting point is 00:44:32 I looked, the last article I wrote was like 2020. I'm like, oh, that's not so good. And the reason is I would always try to write like 2,000-word posts that are really, and I'm like, but I could post to Twitter and Mastodon all day. And it's like, I can just do that. That's no problem. I don't fall behind on Twitter. That's because these really should be super short posts. So I've got this new website that I wrote with just super short, you know, fits-on-a-page type of articles that people can go and check out. So. Yeah. Some people are promoting, like, today-I-learned things. But sure. And why not? I mean, if you think it's going to be a

Starting point is 00:45:10 thread, write a blog post. Exactly. Yeah. Yeah. So cool. So all of these are written, and this is all based on Hugo, which is a, I just learned about it, but a ridiculously cool static site generator. Either of you played with Hugo? I use it. I love it. So pythontest.com is written on Hugo. It's ridiculous, right? No, Murilo, you haven't? Sorry. No, I haven't used it, but I heard of it.

Starting point is 00:45:33 Yeah, I heard nice things. Yeah, so you basically just go to your directory of Markdown files and images. You just run hugo -D server or whatever. And then as you write, you have your web page open in your browser and it automatically sees the Markdown file changes or the CSS changes, regenerates it, and refreshes your browser. So your browser could be just over there, and as you make changes it instantly refreshes, so you don't even go and refresh the page to see how that, you just

Starting point is 00:45:58 write and the browser just watches and reloads. It's cool. Yeah. And so you got it so that you just push your changes to GitHub or your repo and it just appears on your website? Exactly. Yeah, exactly. So that was my next thing: then I set up a Netlify free account with CD and SSL, custom domain name, push it. It just has a prod branch that I connected it to, and when I push to prod, boom, it just goes there instantly. So anyway, people are looking at that. That is super cool. Push to prod. Oh, that's kind of cool.

Starting point is 00:46:28 I just, I just edit on prod. So I just log in, edit over SSH. Yeah. Just enter the server. The server is the backup. Anyway, I have stuff on the screen, but then no more backups. That's just stuff I pulled up while we're talking. So no more extras.

Starting point is 00:46:42 I mean, so yeah, fun stuff. People, check out the blog website and the video, and apply for speaking at PyCascades. Nice. Well, I feel like Lancer also was already really funny, but do you have anything else funny for us? I do. Although I somehow forgot to pull them up on the screen. So give me just a second here. There's two, these are really good. Okay. These are pretty epic. So this one, I think, Murilo, you'll really like this one, because it has to do with algorithms and data science, and it's called Messing with the Algorithm. And it shows this dude here, don't mind the thing at the bottom.

Starting point is 00:47:19 I have no idea what that's about. But see, there's this guy whose face is blurred out, in the UK, I think. I can't remember where this was. No, Berlin. And he's got a wagon, like a little red wagon that you pull behind you, full of 99 phones. Now, what he did is he got them all running Google Maps and left them open and started walking down the street real slow. And notice his neighborhood is now red on the map, and he got it. So it thinks there's a traffic jam and it'll send cars around his neighborhood. Nice.

Starting point is 00:47:49 I want to get one of these so bad. And whenever I take my dog for a walk, just walk with the wagon behind me too. No cars. Yeah. So good. Isn't it? Yeah.

Starting point is 00:47:59 This guy's so ahead of our time. He's just like, Oh, he's so brilliant. Yeah. And for his neighbors. Yeah. The next one here is going to take a little bit of a, I got to set the stage.

Starting point is 00:48:09 Give me a second to set the history. You've heard about these motivational posters. You go to like a dentist's office, it'll be like an eagle soaring over like a sunset. Like if you don't spread your wings, you'll never soar as high as you could or something cheesy like that. Yeah. Well, there's this company called. Yes, exactly.

Starting point is 00:48:24 There's a company called Despair, and Despair creates these, but like in reverse. They're called the Demotivators. Yeah. Nice. So have you seen these? No. Okay. So here's one, like: Solutions. And what does it say? It has like a Rube Goldbergian type looking thing here. And it says: Solutions. This is what happens when the problem solver gets paid by the hour. It's just out of control. Here's one, what is this, a frog with a snail on its head, says: Collaborate. So the best of us have to carry the rest of us. It's just like, they're really... All right, so that brings us to, I feel like this is a Brian Skinn show a little bit. This tweet that he shared here, and it has the Latexify thing, but recursion. And for the recursion, it has that demotivator. It's a picture that says: Recursion. Here we go again. And then embedded in that is: Recursion. Here we go again. It's like that, you know, like your screen share,

Starting point is 00:49:22 you see your own screen. Yeah. So it's kind of like that poster, but for recursion. Yeah. I kind of feel bad for people that don't get the recursion joke, because they can't even look it up because it's redirected. It just keeps going. Like the definition is the definition. That's right. Nice.

Starting point is 00:49:39 All right. Well, that's what I got for y'all. Well, thanks everybody. And thanks Michael, of course. And thanks Murilo for coming on the show. Thanks for having me. It was great. Yeah, you bet. Bye everyone. Bye.

Topics covered in this episode: Latexify prefixed dbt Memray pytest plugin Stealing Open Source code from Textual Shed Extras Joke See the full show notes for this episode on the website at pyth...onbytes.fm/311
