Python Bytes - #311 Catching Memory Leaks with ... pytest?
Episode Date: November 24, 2022
Topics covered in this episode: Latexify, prefixed, dbt, Memray pytest plugin, Stealing Open Source code from Textual, Shed, Extras, Joke
See the full show notes for this episode on the website at pyth...onbytes.fm/311
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly
to your earbuds. This is episode 311, recorded November 22nd, 2022. And I am Brian Okken.
I'm Michael Kennedy.
And I'm Murilo Cunha.
So welcome, Murilo. So tell us a little bit about yourself before we jump into the topics.
TL;DR: I'm a machine learning engineer at a data and AI consultancy company called Dataroots.
I'm from Brazil, but I actually live in Belgium.
And I guess that's all.
That's it.
Thanks for having me.
Thanks for showing up.
It's great to have you here.
Well, Michael, why don't you kick us off with the first topic?
All right, let's kick it off.
I've got some fun stuff.
Let's see what Murilo thinks about this.
This is a little bit mathy, what I got going on here.
That is not the right screen. How about that screen?
So this comes in from one of the big friends of the show, Brian Skinn, and he
sent me a tweet and it just says, what? @PythonBytes. And it's a quote tweet from somebody
here saying, holy, latexify is the sexiest thing I've ever seen. And look at this. So I studied a ton of math,
and the symbols of mathematics are really important and they communicate stuff like
really, really quickly. You can scan over and you see the symbol for the real numbers,
or you can see the symbol for subset or infinite sum. And you're like, I know what that means.
When you translate that into Python or into computer code, it usually becomes something kind of gnarly looking, right? So the example here on this tweet has a function called solve,
and it's solving the quadratic equation, I guess, just for one variation of the root,
not the plus minus, but that's fine. It just says like negative B plus math dot square root B star star two.
It's like symbol soup, right?
So this latexify thing: LaTeX is the language of expressing those symbols the
way mathematicians would have written them in the 16th century or whatever, like the
fancy flowing sort of, you know, sum symbols and integral
symbols and whatnot. And so what this does is you just put a decorator onto that Python function.
You say latexify.with_latex. When you show that function in a notebook, it shows the
formal mathematics of it. Wow. Like there's one that was doing, like I said, the quadratic equation.
Another one that says if x is zero, return one, else return math sine of x divided
by x.
And then the symbols are like this sort of branching equation, you know, like what
you would write in LaTeX, conceptually.
What do you think?
Wow.
Is that insane?
This is great.
But it just changes the repr of the function, I guess, right?
Like if you call the function, it's all fine.
Yeah, exactly.
It doesn't change the function at all.
It changes the repr or the str.
So if you do this outside of a notebook, what it prints out, let me see if I can somehow
communicate this back.
So if you print it out, what it returns, do I have it here?
No.
Yes.
There. No, that's not it. Sorry, I don't have it. What it prints out is the LaTeX escape codes. So it'll say like backslash frac of,
you know, it's weird. I don't know how to write LaTeX. I did a little bit when I was
studying math, and then I said that's something I never need to remember and, you know, shut it out
of my brain. Never again. Yeah, like that. Why do I need to know
this? I don't need to know this. But yeah, so the repr is just the LaTeX escape codes,
and then the notebooks see that and then they render it as LaTeX. That's pretty cool. And then
one of the nice things about this is you might have the math that you're
trying to convert to code and then you can like check your answer.
You can just see, did I get it right in code?
So yeah, it's pretty cool.
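The "it changes the repr, not the function" behaviour can be sketched in plain Python. This is only an illustration of the idea, not latexify's implementation: `LatexWrapped` and `with_fixed_latex` are hypothetical names, and the LaTeX string is typed by hand instead of being generated. The one outside convention it relies on is that Jupyter looks for a `_repr_latex_` method when it displays an object.

```python
import math


class LatexWrapped:
    """Wraps a function: calling is unchanged, only the display differs."""

    def __init__(self, func, latex):
        self._func = func
        self._latex = latex

    def __call__(self, *args, **kwargs):
        # Calls go straight through to the original function.
        return self._func(*args, **kwargs)

    def __repr__(self):
        # Outside a notebook you just see the raw LaTeX markup.
        return self._latex

    __str__ = __repr__

    def _repr_latex_(self):
        # Jupyter checks for this method and renders the result as math.
        return f"$$ {self._latex} $$"


def with_fixed_latex(latex):
    """Hypothetical decorator that attaches a hand-written LaTeX string."""
    def decorator(func):
        return LatexWrapped(func, latex)
    return decorator


@with_fixed_latex(r"\mathrm{solve}(a, b, c) = \frac{-b + \sqrt{b^{2} - 4ac}}{2a}")
def solve(a, b, c):
    return (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)


print(solve(1, -3, 2))  # the function still computes a root as before
print(solve)            # printing shows the LaTeX markup instead
```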
That's really interesting.
Yeah, because you round trip it, right?
Yeah, I'm assuming people are doing this
on their own code.
So they're, you know,
I guess the question is about the inverse.
Yeah, right.
It's like, hey, if I have the math symbols, could I turn this into a Python function?
I mean, I don't see why it can't go both ways.
Sure.
True. But I still think it would be easier to write the Python function than the LaTeX code for rendering it.
Yeah, that's true.
I think it's a pretty niche use case.
Well, you know, I'm sure someone's going to find a use, a cool use case for it too, right?
Yeah.
This is pretty interesting.
We've got a couple of live comments. So Madison, hey Madison, out in the audience. Madison's been on
the show before. I'm blown away by how libraries like this are able to make math approachable.
I wonder how this could be used with auto-generated documentation. Very cool. I agree. And Henry also
says, I'm guessing it's working on the bytecode like Numba, but compiling it into a human language.
Yeah, compiling it into the LaTeX escape codes.
Which is not human.
Which is the opposite of human-readable, but it is text, right?
Related to this, just...
Oh, yeah.
Henry.
SymPy.
Okay.
It's using inspect.getsource and parsing the AST.
Yeah, perfect.
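As a rough sketch of what "parsing the AST" can mean here, the snippet below walks a parsed expression and emits LaTeX for a handful of node types. It is an assumption-laden toy, not latexify's code: `to_latex` and `_paren` are hypothetical names, only a tiny subset of Python is handled, and the source is passed in as a string where a real tool would obtain it with `inspect.getsource`.

```python
import ast


def _paren(node):
    """Render a node, parenthesizing sums and differences to keep precedence."""
    text = to_latex(node)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
        return rf"\left({text}\right)"
    return text


def to_latex(node):
    """Translate a tiny subset of Python expressions into LaTeX markup."""
    if isinstance(node, ast.Expression):
        return to_latex(node.body)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "-" + _paren(node.operand)
    if isinstance(node, ast.Call) and len(node.args) == 1:
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == "sqrt":
            return rf"\sqrt{{{to_latex(node.args[0])}}}"
    if isinstance(node, ast.BinOp):
        left, right, op = node.left, node.right, node.op
        if isinstance(op, ast.Div):
            return rf"\frac{{{to_latex(left)}}}{{{to_latex(right)}}}"
        if isinstance(op, ast.Pow):
            simple = isinstance(left, (ast.Name, ast.Constant))
            base = to_latex(left) if simple else rf"\left({to_latex(left)}\right)"
            return f"{base}^{{{to_latex(right)}}}"
        if isinstance(op, ast.Mult):
            return rf"{_paren(left)} \cdot {_paren(right)}"
        if isinstance(op, ast.Add):
            return f"{to_latex(left)} + {to_latex(right)}"
        if isinstance(op, ast.Sub):
            return f"{to_latex(left)} - {_paren(right)}"
    # Anything outside the supported subset is rejected rather than guessed at.
    raise NotImplementedError(ast.dump(node))


source = "(-b + math.sqrt(b**2 - 4*a*c)) / (2*a)"
print(to_latex(ast.parse(source, mode="eval")))
```

Working from the syntax tree rather than from bytecode is what keeps the variable names and the shape of the expression available for rendering.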
Another thing that's amazing, if people check out like the SymPy stuff,
it does some really, really interesting things.
Like if you go say to like calculus,
you take a limit here,
it'll do similar outputs as well, right?
So you could put in this
and it'll actually express it as symbolic math
and it won't lose precision
because it solves it symbolically.
And you can say like, you know, factor this equation.
So that's kind of related,
but this just says given any arbitrary Python function
not written in the symbolic form,
just turn it into LaTeX, which is pretty amazing.
So anyway, thank you, Brian Skinn, for pointing that out.
That is pretty neat.
One final comment.
I could not get it to install
on my Apple Silicon Mac.
Maybe that detail matters,
but I couldn't get it to pip install
out of PyPI.
I had to pip install git+
the GitHub URL,
and then it would install.
I don't know why,
but if people want to play with it,
that might be necessary.
Okay.
Yeah.
Over to you, Brian.
All right.
Well, while we're talking about math,
I'm often working in the measurement world, where we care about prefixes a lot.
And, you know, a lot of people deal with big numbers or small numbers.
And this was actually suggested to us by Avram.
And I think he either works on this or it's his project. It's a project called prefixed. And what this does is it provides a class
called Float, capital F, that derives from the built-in float.
And it supports scientific (SI) and IEC prefixes, the latter of which I'm not familiar with. So
scientific ones like k and things like that. If you go look at all the metric prefixes,
you've got, there's some new ones, but n, k, mega, giga, things like that. And
it adds these on when you print them. So it acts just like a normal
float. Most of the time you can, you know, use it in math equations and everything. The interesting
thing is, if it is used in a math equation, the result will be one of these
prefixed float types. But then the nice thing about it is when you convert it to a string, it
includes the little prefix thing, or the suffix or whatever, the little micro or
k or M or something like that. So I think this is actually super helpful. I'm going to
use this right away because, you know, I use a lot of big and small numbers, and reporting
out just the huge thing or just the float is sometimes horrible to compare with.
So this is, this is pretty cool.
It's very clever.
I love how, how simple the idea is.
So you can just f-string one of these floats and say colon .2h, and that'll convert it,
and the h tells it to be either, you know, kilo or micro or mega or, you know,
whatever suffix is needed.
That's cool.
And then there's the byte example where they said,
well, I'm going to use the capital B for bytes,
but that's after the formatting of the number.
And then the K comes in from the float thing.
So that's pretty cool.
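To make the idea concrete without depending on the library, here is a minimal standard-library sketch of a float subclass whose format spec understands a trailing `h`. It is not the prefixed package's code: `PrefixedFloat` and `SI_PREFIXES` are hypothetical names, only a few prefixes are covered, and the real `Float` class does considerably more.

```python
# (factor, prefix) pairs, largest first; a deliberately short table.
SI_PREFIXES = [
    (1e9, "G"), (1e6, "M"), (1e3, "k"), (1.0, ""),
    (1e-3, "m"), (1e-6, "µ"), (1e-9, "n"),
]


class PrefixedFloat(float):
    """A float whose 'h' format code appends an SI prefix."""

    def __format__(self, spec):
        if not spec.endswith("h"):
            # Any other format spec behaves exactly like a normal float.
            return super().__format__(spec)
        number_spec = spec[:-1] + "f"      # ".2h" is formatted like ".2f"
        value = float(self)
        for factor, prefix in SI_PREFIXES:
            if abs(value) >= factor:
                return format(value / factor, number_spec) + prefix
        return format(value, number_spec)  # zero, or smaller than the table covers

    # Arithmetic returns the subclass again, so results keep the formatting.
    def __add__(self, other):
        return type(self)(float(self) + float(other))

    def __mul__(self, other):
        return type(self)(float(self) * float(other))


size = PrefixedFloat(3250)
print(f"{size:.2h}")
print(f"{size * 1000:.2h}B")  # the unit letter is added after the number
```

Putting the behaviour in `__format__` is what lets the unit (the capital B here) stay outside the number formatting, as in the byte example described above.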
One of the other things that he passed along is there's
some new prefixes. So these are apparently new scientific prefixes, the
first new ones in the last 30 years, apparently. So we have 10 to the 21st,
which is zetta, and 10 to the 24th, which is yotta, and then the negative ones are zepto and yocto.
So these are fun. But why now? Why did they decide to, like, do they have more money now
and they need to come up with new prefixes, or...?
Exactly. I'm not sure why we need new prefixes, but
our microscopes can now see smaller things. We don't have words for things that are this small.
But national debt, maybe?
Yeah, very possible.
But also Avram notes that prefixed does handle these new ones.
So cool.
Good job.
Cool.
One thing in Python, too: you can put the underscore, like if you
put an underscore on the thousands. That's also something that makes it easier, I think, to
read the numbers too.
Yeah, like the digit grouping. Yeah. Do you do that a lot?
Not a lot, but whenever I can, I do. I think it makes it easier to distinguish how
big the number is, I guess.
I always forget to. I know it's there, but I never use it.
I think usually it's like when I'm counting the zeros
with my finger on the screen, I'm like, no, no.
Maybe I just put an underscore there.
It makes everyone's life easier.
Yeah, I've really started doing that a lot
the last couple of years, but before then I didn't.
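For reference, the digit grouping being discussed looks like this. The variable name is just an example; the underscore syntax in literals and the `_` and `,` format options are standard Python.

```python
# Underscores in numeric literals are ignored by the interpreter.
population = 8_000_000_000
assert population == 8000000000

# The same grouping is available when formatting numbers for output.
print(f"{population:_}")   # grouped with underscores
print(f"{population:,}")   # grouped with commas

# int() also accepts underscores in strings.
print(int("1_000"))
```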
Cool.
Well, what is next?
Murilo, what you got for us?
I think that's me.
Yeah.
dbt, have you ever...
You gotta accept some cookies, hold on.
Oh, my bad.
I'm just kidding. No, I'm just teasing.
Cookie things drive me crazy, man.
Yeah, yeah, yeah. I think it's crazy how
now it's popping up everywhere, and then you see the data gathering all the time and
this and this, and it's like, okay. Yeah, yeah.
But maybe dbt, have you ever heard of dbt?
Is this something, cause in the data world,
in my field, it's super popular,
but I don't know if it's a bubble as well.
I've never heard of it.
Michael never heard of it?
Yeah, I think I've heard of it,
but I couldn't tell you what it does, so.
I was basically in the same spot.
Yeah, tell us about it.
No, it's a really cool tool.
It's open source as well.
They have their cloud option, I guess, right?
So you can pay and they host it.
Maybe a disclaimer as well that I never,
I always see it and I always want to use it,
but I haven't found the use case.
So I don't have first hand experience here,
but basically the way I would describe
is that they add best practices around SQL projects.
So why am I mentioning this on Python Bytes? It's built with Python.
Yay.
And the other thing too is that
they actually mix Jinja with SQL stuff, right?
So you can actually do for loops.
You can do stuff like that.
So you don't have to repeat every time
and just change the variable.
They also have these like reference macros and stuff.
So you can actually say, okay,
this comes from that table that is on that file.
And this comes from this.
So you can actually chain a lot of these dependencies, right? Like there's a lot of projects that you have these ETL stuff, right? So you just have to basically transform
at each step. And with dbt, they actually keep track of what depends on what, and you can say,
oh, I want the freshest data here and you execute everything that needs to be executed there.
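The "keep track of what depends on what" idea can be sketched with the standard library. This is not how dbt is implemented; it is a toy dependency graph with made-up model names, using `graphlib.TopologicalSorter` (Python 3.9+) to work out which upstream models have to run, and in what order, before a chosen model is fresh.

```python
from graphlib import TopologicalSorter

# Hypothetical models mapped to the models they read from.
dependencies = {
    "stg_customers": set(),
    "stg_orders": set(),
    "customer_orders": {"stg_customers", "stg_orders"},
    "revenue_dashboard": {"customer_orders"},
}


def upstream_of(target, deps):
    """Collect the target plus everything it depends on, directly or not."""
    needed, stack = set(), [target]
    while stack:
        model = stack.pop()
        if model not in needed:
            needed.add(model)
            stack.extend(deps[model])
    return needed


def build_order(target, deps):
    """Return the needed models in an order that respects dependencies."""
    needed = upstream_of(target, deps)
    subgraph = {model: deps[model] & needed for model in needed}
    return list(TopologicalSorter(subgraph).static_order())


# Refreshing customer_orders runs both staging models first
# and leaves the downstream dashboard alone.
print(build_order("customer_orders", dependencies))
```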
Wow.
Yeah. So it's super cool. They actually support a lot of data platforms here,
right? So you see BigQuery, Databricks, Snowflake, all these things as well. Another thing that they
do: they even have some data validation stuff, which in my field is a big
thing too. You know, like maybe you have an ID column that needs to be unique and cannot be null, and you
want to make sure that that always happens, and if it doesn't happen, you want to be flagged, right? So that's super cool. What else? Ah, you also have some built-in documentation.
So once you have the dependencies, you can say, oh, show me the DAG, you know, show me where the
data comes from and what depends on what. So that's also super cool. And recently, actually,
they actually started supporting, so like an SQL file kind of corresponds to a model, right?
Oh, cookies again. And so they have SQL models,
so that's the one, but they also started supporting Python models, right? So this is very
tied to data. So now you can actually mix and match, right? You can say this step, this transformation, is
in SQL, but this one is actually Python, right? And they don't run anything on your machine;
they actually send it to the cloud. So Snowflake has Snowpark, which is Python on Snowflake itself.
BigQuery has Spark, and Databricks as well, right?
So basically you can mix and match.
This transformation is here,
this transformation is there,
but everything is like in a nice,
put in one place.
And because it's on Git as well,
you can have CICD.
I think also you mentioned,
I think it was you, Brian,
that mentioned SQLFluff.
And SQLFluff actually came
from a dbt project as well.
So, and it's all in Python.
So super cool.
Nice.
Wow, that's really neat.
So what do the Python models look like?
Are they straight Python classes or are they Pydantic or?
Maybe I'm a bit lazy
because I just watched the video,
and they were showing here how it works,
because they're also doing a comparison, right?
Maybe this is... no, this doesn't work, does it?
Yeah, it works.
This works?
Yeah, it works.
Okay.
But the quality is horrible.
That's okay.
But in a nutshell, you have this.
Yeah.
You define a function.
Yeah, you define a function that takes a dbt and a session, and then you create a reference.
So a reference is basically a table, right?
And then from that point on, you can say to_pandas, and then you can
just basically use the pandas API to transform that, right?
So there's still some caveats, right?
Because pandas is not super performant, depending on how much data you have and whatnot.
So sometimes you probably still want to stick to the SQL stuff.
But then it opens a lot of possibilities there too, right?
So even stuff like deploying machine learning models on the SQL infrastructure and everything.
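In code, the Python model described above might look roughly like the sketch below. The `model(dbt, session)` signature, `dbt.config` and `dbt.ref` follow dbt's Python model convention; the `to_pandas()` call assumes a Snowflake (Snowpark) target, and the `orders` model and `ORDER_ID` column are made-up names for illustration.

```python
def model(dbt, session):
    """A dbt Python model: dbt calls this and materializes what it returns."""
    dbt.config(materialized="table")

    # dbt.ref() points at another model; "orders" is a hypothetical upstream model.
    orders = dbt.ref("orders")

    # Assumes a Snowflake/Snowpark target, where the reference can be pulled
    # into a pandas DataFrame. Other platforms hand back other DataFrame types.
    orders_df = orders.to_pandas()

    # From here on it is ordinary pandas; "ORDER_ID" is a made-up column name.
    return orders_df.drop_duplicates(subset=["ORDER_ID"])
```

The file only defines the function; dbt supplies the `dbt` and `session` objects when it runs the model on the data platform, which is why converting to pandas is where the performance caveat mentioned above comes in.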
So yeah, so it's kind of the same old, same old story. You know,
even if you're working with an ORM, sometimes you don't want to bring all that data back
to make some minor change and then put, you would just do a sort of an update statement
instead of pull back 10,000 models, change something and call save 10,000 times, right?
Like it's probably that kind of trade-off, but it's really cool that you can bring it back into
Python this way. What are you using it for in your work or
like, what are you interested in using it for?
Well, I think we have a lot of these ETL pipeline
things, right? We have some data
here and then we want to basically clean it
up and make sure it's all uniform and put it in a dashboard, calculate some KPIs and whatnot, right?
And so business people can see: are we doing better, are we making more money or not, kind of.
Yeah, and a lot of the time it's just SQL, right? It's also more accessible for a lot of people, so we stick to SQL. But there
are also limitations, right? Before, what I've seen is people just kind of go in the UI and
just execute stuff ad hoc, right? So no versioning, nothing. And I think this kind of puts everything
in one place. You can even add CI/CD because of the CLI tool and everything, and just kind of make sure
that everything goes through that versioned method, let's say.
I mean, and again, yeah,
if you need something more fancy, right,
then you can throw some Python stuff in there.
But usually we try to avoid it, to be honest.
I can imagine.
Let's see here.
Hold on.
Yeah, the models, the way you express the code,
it's like, it's really nice looking for SQL,
which is surprising, right?
This code you write like with customers as select these fields from this, this table. Yeah. And they have, they also have like
the different macros and like people can write different macros. So like the describe function
in Pandas, someone can just have written that and you can import that. And like, it's, it's really
nice to share like all these things as well. So super cool. Really, really eager to, to give it a
try, to be honest. I've been just like, try and scratch that.
Where's the next project that we could use this on?
Indeed, indeed, indeed.
Yeah.
All right.
Brian, anything you want to add before we jump over to talking about our sponsor real quick?
Yeah, let's talk about our sponsor.
All right.
So today's episode of Python Bytes is brought to you by Microsoft for Startups Founders Hub.
Microsoft for Startups set out to understand what startups need to be successful and created a digital platform to help you overcome those challenges.
And they came up with Microsoft for Startups Founders Hub.
The Founders Hub provides all founders at any stage with free resources to help solve startup challenges.
The platform provides access to expert guidance, skilled resources,
mentorship, and networking connections, technology benefits, and so much more. Founders Hub is truly open to all. You don't need to be investor backed, but you can be. Speed up development with free
access to GitHub and the Microsoft Cloud. You can unlock credits over time, and there's also
discounts and benefits from innovative companies partnering with Founders Hub, such as OpenAI.
You'll have access to their mentorship network, which includes hundreds of mentors across a range
of disciplines. Need advice on marketing, fundraising, idea validation? There's tons
of topics, including management and coaching. You'll be able to book one-on-one meetings with
the mentors, many of whom are former founders themselves. It's no longer about who you know.
Get critical support you need from Microsoft for Startups Founders Hub
and make your ideas a reality today.
Join the program by visiting pythonbytes.fm/foundershub2022.
That link is also in your show notes.
Thanks, Microsoft, for keeping us going strong.
All right.
What have I got next? This one is a chain of really cool things. So Roman Right, of Beanie fame and other things, tweeted about this project that Pablo Galindo Salgado has been working on. So Pablo was the release manager for Python 3.11. It was part of the live stream of releasing that, which was all fun. But he also, I believe, works at Bloomberg, where they work on Memray. And I think we spoke about Memray quite a while back,
Brian. It's a memory profiling tool. Murilo, do you use profilers and that kind of stuff in
your world?
No, I haven't used them much. Haven't had the need, to be honest, not yet. I feel like so far, there's no... I try to keep it simple.
So a lot of times profilers are about performance,
like how fast did this code run?
And if it's slower, should I look at this loop or that loop?
Or, you know, where do you spend your time making it faster?
Because it's really surprising when you look at code,
like this part looks complicated.
So that must be the slow part.
Like, no, that doesn't matter.
Nothing you do to that will make any difference.
You got to look over here, right?
That kind of stuff. But Memray, as the name would suggest,
is more about memory profiling and talking about, you know, how many of these different
things have you allocated and those kinds of things. What is coming? Well, first, let me
pull up, we have a pytest plugin, which is super cool. So with the pytest plugin, you can do two things. Now you can
say pytest --memray tests, and it'll tell you things like, you can actually set limits on how
much memory can be allocated for a certain operation. And if it exceeds that, it'll say,
oh my gosh, there's something wrong. This thing is way over the memory we expected. So that's an error. But it also gives you a cool emoji-filled summary, I guess.
Like total memory allocated, the number of allocations, a histogram of allocation sizes.
So Python memory has size classes.
We've talked about its blocks, arenas,
and one other term that I'm forgetting that it uses to organize these data structures.
And then you can actually get it overall, then for individual tests.
And so it'll tell you the different things that were allocated.
And anyway, it's pretty insane.
Okay.
So you can get that report and then you can also, where's the other one?
I think there's a place where you put a decorator and you just say, on this test,
if it exceeds this amount of allocation, that should fail the unit test.
It's just a pytest.mark.
Oh, cool.
Memory limit or something.
I don't think it's a limit or memory limit.
I can't remember exactly
what it's called.
You can say,
if this test exceeds
one megabyte of memory allocation,
then that's a failed test,
which is pretty cool, right, Brian?
That's really great.
So they've got, yeah,
they have a limit memory decorator
and a check leaks decorator.
That's the one.
So the check leaks is the new thing.
And so what you can do now is you can say
pytest.mark.check_leaks as a decorator on your test.
And if there's a memory leak in the code
that runs during that, it will let you know. Wow. I don't know if anyone else has tried to track down memory leaks.
I would rather track down like a multi-threaded race condition than a memory leak. I don't want
anything to do with memory leaks. This is no fun. And so if I can deal with a decorator, let's do it.
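A sketch of what the memory-limit marker looks like in a test file, assuming the pytest-memray plugin is installed and pytest is run with `--memray`. The `limit_memory` marker is the plugin's "limit memory decorator" mentioned above; `build_table` and the test itself are hypothetical examples. The leak-checking marker is left out here because its exact arguments are not given in the discussion.

```python
import pytest


def build_table(rows):
    """Hypothetical function under test: builds a list of small dicts."""
    return [{"id": i, "squared": i * i} for i in range(rows)]


# With pytest-memray active (pytest --memray), this test is reported as
# failed if it allocates more than the stated limit while it runs.
@pytest.mark.limit_memory("1 MB")
def test_build_table_stays_small():
    table = build_table(100)
    assert table[-1] == {"id": 99, "squared": 9801}
```

Only the test is decorated; the code under test is untouched, which is the point made in the conversation that follows.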
Well, and also you're decorating your tests. So you're not having to modify your code at all to do this.
I mean, the code under test.
You're modifying your test code, if at all.
Or it looks like it gives you some benefits even
with no modification. It's pretty cool.
Yeah, maybe pardon my ignorance here, but when would I worry
about memory leaks in Python?
I think, so imagine you're writing pandas,
right and you're you've written a bunch of C code
that's getting imported
and you know there's a memory leak in there somewhere.
And it's just like, okay, well,
I don't really know how to.
But then it's more like the C part is the bandage.
You can also have memory leaks
in the sense that you expected
there to be no more things allocated
after the function was called,
but you could have assigned it to a global variable or you could have, you know, stored it, held
onto a reference in some way that you weren't expecting.
So it's not a leak in the super traditional sense, but it could build up if you're doing
something wrong in Python, but certainly outside of that.
So I think this is pretty cool.
Really any long running service is going to have,
you're going to be concerned about it. There's a lot of Python applications that are short running
and it just cleans up after itself when it's done. So there's cases, long running services,
also things like maybe you care about, things that are using large amounts of data and need
all of the data that they can get a hold of without wasting any.
Or that's important as well.
Makes sense.
I'm also wondering-
Yeah, if you're right at the limit.
Yeah.
No, sorry.
Go ahead.
Go ahead.
Yeah.
If you're right at the limit of like, I'm using 15 and a half gigs and I don't have
more than that.
So I need that.
Or like, I just checked, the Talk Python Training site has been running for seven days and one
hour.
Yeah.
Like if it had a memory leak,
even if it's 100 kilobyte here and there,
it could turn out to be a big hassle.
Okay, cool.
I'm wondering if you could use this for edge device stuff,
if you want to limit the memory
because we know the edge device won't have that much.
That's actually a really good point
because if you're on one of these little CircuitPython boards,
they've got like 256K of RAM, and that's very different than 16 gigs, isn't it?
Yeah, yeah, right.
Yeah, so you could test your application on a larger computer and limit how much
memory you give it, so it's kind of...
Right. Yeah, I think you would want to do that with the limit
then rather than the check leaks, but still, yeah.
Yeah, but it's the same.
Cool.
Yeah, awesome.
All right, let's see.
A couple of comments from the audience.
Gareth out there.
Hey, Gareth.
Says, I ended up writing Docker containers that swapped out every couple hours to solve it.
I mean, that's actually what a lot of people do.
They're like, you know what?
If it runs more than 12 hours, it's a problem.
So we just tell it to recycle itself.
And then Madison says, this is so cool.
I need memory profiling all the time
with some of the data I do work with regularly. So people, people are digging it. Cool. Yeah. Very
cool. So thank you, Roman. I know you didn't send that in to us on purpose, but you shared it with
us anyway. Thanks. Nice. Over to you, Brian. Okay. Before I get onto the next topic, I want
to point out that Henry Schreiner, I'm going to paraphrase him by saying, Brian, you dork. You didn't even read the article.
Yes, you're right, Henry.
Sorry.
So the new prefixes: I was showing the previous new ones from '91, when they added yocto and zepto.
These are not the new ones.
The new ones are down here with ronna, quetta, ronto, and quecto.
Yes, the reason why those sounded familiar
is because they've been around.
These ones, they're the new ones.
Okay, so thanks, Henry, for clarifying that.
But on to the next topic:
Will McGugan says,
please steal my source code.
So Will McGugan wrote an article called
Stealing Open Source code from Textual, and he says, I would like to talk about a serious issue with free and
open source software: stealing code. You wouldn't steal a car, would you? And then actually
he has this funny video that he embeds about how digital piracy really is like stealing. And it's sort of a funny video.
But the comment is real,
that you can steal code from open source projects
as long as you can.
So please read the MIT license
or read the license to make sure that you can.
And in a lot of cases, you can.
So I'm going to give an example. Something that I
do a lot is, I'll think of something that I want to do, like I'm interacting with a library,
and maybe I don't quite get how to do that from the documentation. I can search GitHub
for projects that use that library as an example. And so that's a way to look at other
source code for how to interact with a project that maybe doesn't have the greatest documentation.
You can see how it's done.
I've honestly never thought to do that.
That's a great idea.
I'll go look at the tests and stuff.
I'm like, these tests suck.
There's not a single one that shows me this use case that I'm looking for.
This is brilliant.
I do that a lot with pytest plugins
because I look at how other plugins are testing their stuff
and I'm like, oh, how do they do it?
So the warning there is he's not advocating for piracy.
Open source code gives you explicit permission to use it.
And if you're actually just copying the whole thing,
or if you're copying large chunks,
you probably should reference it and use the same license.
But the MIT license, for example,
is talking about substantial copying.
So a little bit of copying is fine.
And Will says Textual has some cool stuff in it that you might want to look
at.
He points out some things you might want to steal. First, loop first and last.
So he's got a loop iterator,
he's got a couple of versions of it, that will not only iterate through things, but
it'll note which one's the first and which one's the last. So if you need to do something
different on the first and the last one, you can do that. He tweeted recently, or tooted
or whatever, about the LRU cache as well. So Python's got
a built-in LRU cache, but everything's global, so there's limits on how
you can interact with it. So he has a more flexible LRU cache. He's got a color class that looks
pretty cool that you can convert to different color representations. That's pretty neat. And then,
you know, he's been working on a ton of geometry stuff,
2D geometry.
So he's like,
you might want to use this
for whatever 2D geometry you're using.
So it's all there.
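Two of the helpers mentioned above are small enough to sketch from the description alone. These are independent illustrations, not Textual's source: `loop_first_last` is written here from scratch as a generator that flags the first and last items, and `LRUCache` is a hypothetical per-instance cache built on `collections.OrderedDict`.

```python
from collections import OrderedDict


def loop_first_last(values):
    """Yield (is_first, is_last, value) for each item in an iterable."""
    iterator = iter(values)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    first = True
    for value in iterator:
        # We only know an item is not the last once we have seen its successor.
        yield first, False, previous
        first = False
        previous = value
    yield first, True, previous


class LRUCache:
    """A small least-recently-used cache you can create, inspect and clear."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used

    def clear(self):
        self._data.clear()


for is_first, is_last, word in loop_first_last(["alpha", "beta", "gamma"]):
    marker = "first" if is_first else "last" if is_last else "middle"
    print(marker, word)
```

Because each `LRUCache` is an ordinary object, different parts of a program can hold separate caches and clear them independently.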
So kind of cool reminder that open source,
one of the benefits of open source
is you get to see the source
and learn from people.
I like it.
I love your idea.
You've never done that.
I'm like, it might dance.
I just can't figure this out.
Oh, how are other people using it?
I just get frustrated going to a new library.
This one sucks.
I can't do this.
I'm going to find another one.
It's not good enough.
Murilo, are you an open source thief?
Do you do this kind of stuff?
I have to admit, yes.
Yes, I am.
Stack overflow thief, open source thief,
especially in the early, early days, right?
But I think with the Rich
stuff too, it's very inviting for you to steal code, because even on the Rich package, right,
like if you do python -m rich.table or whatever, it always shows some really nice stuff
on the terminal, right? And I was like, how does he do that? I think
for every component he had a little demo that you can just run, and it's very tempting.
Even if he didn't want people to steal stuff from him,
I feel like he'd have a hard time
just keeping the thieves away, you know?
Yeah.
Yeah, very cool.
And funny too.
I like it.
Good job.
Good job, Will.
Where are we at now?
All right, off to Murilo's final item.
Yes.
This one I had not heard of either
and it looks pretty interesting.
Yeah, I mean, I think it's one of the things where I saw it and
I was like, yeah, this makes so much sense.
How come I didn't think of this before? This is Shed.
Ah, man, this is a podcast, right?
So basically, I think it's related to bike shedding, and shedding
your legacy code, right?
So it's like a superset of Black, right?
They call it Black++ here.
maximally opinionated auto-formatting tool, right?
So it's all about convention over configuration,
which is also something that I can subscribe to.
They have no configuration options,
but basically it's a bundling of a lot of tools, right?
So they have Black here,
but they also have isort with the Black profile, so it doesn't clash. They also have
pyupgrade, which I think you guys mentioned a couple of times, right? And autoflake as well.
autoflake I didn't know actually before, but basically it removes unused imports and unused
variables from your Python code. So it's kind of like, yeah, that's all I wanted.
I was like, I wish I had this last week.
There you go.
Yeah.
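As a rough illustration of the unused-import cleanup described here, the sketch below uses only the standard library `ast` module to list names that are imported but never referenced. It is not how autoflake or shed is implemented; the function name `find_unused_imports` and the sample source are made up for this example, and it ignores cases such as `__all__` re-exports.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return names that are imported but never referenced in `source`."""
    tree = ast.parse(source)
    imported = set()  # names bound by import statements
    used = set()      # names that appear anywhere else in the code
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds only the top-level name `a`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


# Hypothetical sample: `os` is imported and then forgotten.
SAMPLE = """\
import os
import sys
from math import sqrt

print(sqrt(16), sys.argv)
"""

print(find_unused_imports(SAMPLE))
```

A real tool would then rewrite the file to drop those imports; this sketch only reports them.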
But yeah, it's the one-stop shop, and it even does, like, blacken-docs, right?
So if you have docstrings or Markdown or anything, it will take that.
It will Black-format that for you.
So I was like, yeah, this is what I wanted.
Okay.
Hold on.
blacken-docs.
This is new to me too.
All right.
Yeah, oh, let's see.
So this is, yeah, yeah.
Run Black on Python blocks,
sample code blocks.
Yes.
So if you have reStructuredText,
Markdown, even docstrings,
it will format that for you.
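To make the idea concrete, here is a minimal sketch of formatting Python code blocks embedded in Markdown. It is not blacken-docs itself: the regex, the `format_markdown` helper, and the stand-in formatter (a round trip through `ast.unparse`, which needs Python 3.9+ and drops comments) are all assumptions chosen so the example runs with the standard library alone.

```python
import ast
import re

FENCE = "`" * 3  # built this way so the fence characters never appear literally here

# Matches a fenced block opened with the `python` info string and captures its body.
PYTHON_BLOCK = re.compile(
    rf"(?P<open>^{FENCE}python\n)(?P<body>.*?)(?P<close>^{FENCE}$)",
    re.DOTALL | re.MULTILINE,
)


def normalize(source: str) -> str:
    """Stand-in formatter: parse and re-emit the code (comments are lost)."""
    return ast.unparse(ast.parse(source)) + "\n"


def format_markdown(text: str) -> str:
    """Rewrite every Python code block in `text`, leaving the prose untouched."""
    def replace_block(match):
        return match["open"] + normalize(match["body"]) + match["close"]

    return PYTHON_BLOCK.sub(replace_block, text)


# Hypothetical README fragment with a messy code sample.
DOC = f"Intro\n\n{FENCE}python\nx=1;y = [1,2 ,3]\n{FENCE}\n\nOutro\n"
print(format_markdown(DOC))
```

Swapping `normalize` for a call into a real formatter is the part a tool like the one discussed would handle.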
In the.
Oh, like blackening your README,
for instance.
Yes.
Yes, yes.
Ooh, okay.
Yeah.
This is good.
Indeed. So I have some stuff to talk about at the very end,
just a little bit about blogging and writing and some platforms and stuff, and that's all in
Markdown. Like, I could run this against all of my code samples on my blog to basically auto-format
all the code in the blog. That's awesome.
Yeah, yes. The next time I write a book, I'm totally going to use that.
Yeah.
Or if you're doing a book.
Yeah.
I mean, absolutely.
So I literally just like yesterday, the day before I was cleaning up some code, I finally got, you know, I kind of, I don't do it clean the whole time.
I get it to work and then I like, you know, then I look at what I did stupid and there's,
there might be some imports laying around that I thought I needed.
Because you add an import and then you take that code out.
But you sometimes forget to take the import out.
So I ran Black on everything, of course.
And then I ran Flake8 and I'm getting errors.
I'm like, shoot, why didn't Black just take those out?
So now I've got shed and I take those out.
It does it all, right?
Like it's great.
Because maybe it's the same, right?
Like you run Flake8, it's like, ah, yeah, unused variable.
Ah, okay. Then you have to go there one by one.
It feels like there should be a nicer way.
Yeah, I mean,
you have to pay attention to that because your unused
variable might be a typo
or something. You might think you're
using it.
That's true. Yeah, or it's like a global
variable the module is supposed to share
with something else and it's a library.
But in general, I mean, you could probably put, like, a
# noqa or something on it.
Well, I mean, yeah.
And also you're testing,
so your test will catch it if you delete too much.
Yeah.
All right, well, really, really good one.
Take your code out to the shed and whip it into shape
behind the shed.
That's it.
All right.
All right, well, Brian, what else we got? Extras?
I got some extras.
You got some extras. Who should go first?
Uh, you go first.
Okay. Well, the thing that I've been working on is pytest-check.
And I've been talking about this for like a month, because I've been slowly pulling this into shape.
It's almost a complete, not really a rewrite, but I moved
everything around and the code's a lot easier to read. And so it makes me happy. And I also
changed the API. So I wanted to mention to everybody that you can
either use from pytest_check import check to get this check object, or you can stick the check
object in as a fixture. And either way you get access to everything in the library.
That's the only thing you have to do.
And for people unfamiliar, pytest-check is a library that allows you to have multiple failures per test.
You know, normally the recommendation is try to fail on one thing.
But sometimes you need lots of data.
And I just threw in a little example that uses both.
So it's using httpx
to grab the status code, and as long as the status code is 200, then I can check a whole bunch of
stuff. I can check to make sure the redirect and encoding are right and check for some stuff
inside. I mean, these could be multiple tests, but it really is that you're checking multiple
parts of one thing. And for scientific work, the
measurement work that I do, I'm often checking, like, tons of aspects of a waveform, and it's
really just making sure the waveform's right, and that rightness is multiple checks. So use that.
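The pattern described here, one hard requirement followed by several checks that should all be reported, can be sketched with the standard library alone. This is not pytest-check's implementation or API (the plugin provides its own `check` object, by import or as a fixture, as mentioned above); `SoftChecks`, `check_response`, and the fake response dictionary are hypothetical names used only to show the idea without a network call.

```python
class SoftChecks:
    """Collect assertion failures instead of stopping at the first one."""

    def __init__(self):
        self.failures = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None and issubclass(exc_type, AssertionError):
            self.failures.append(str(exc) or "assertion failed")
            return True  # swallow the failure so later checks still run
        return False  # any other exception propagates normally


def check_response(response: dict) -> list:
    # Hard requirement: without a 200 the remaining checks are meaningless.
    assert response["status_code"] == 200, "request failed"

    check = SoftChecks()
    with check:
        assert not response["is_redirect"], "unexpected redirect"
    with check:
        assert response["encoding"] == "utf-8", f"encoding was {response['encoding']}"
    with check:
        assert "Example" in response["text"], "expected text missing"
    return check.failures


# Stand-in for an HTTP response, so the sketch runs offline.
fake = {"status_code": 200, "is_redirect": True, "encoding": "latin-1", "text": "Example Domain"}
print(check_response(fake))
```

Both the redirect and the encoding problem show up in one run, which is the "lots of data from one test" benefit being described.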
Anyway, I didn't intend to break anybody, but I did break Brian Skinn. So Brian came up at the
beginning of the article, but he tagged me in a GitHub issue on his project and I looked at it and I'm like, oh, I didn't intend to break
that. So I fixed it this morning. So hopefully, if anybody gets broken by this, I was not
intending to break anybody. Just let me know and I'll try to fix it.
That looks great. How about you,
Murilo? I know you have some as well. I'll let you go as well. Sorry. I don't know.
Yes.
Maybe.
Yeah, I feel like I should have opened that.
I didn't have the link up here.
But talking about breaking stuff,
Flake8 is not on GitLab anymore.
And I actually didn't have issues with that,
because with pre-commit, right,
you have to specify the repo.
I already was on GitHub,
but I actually heard from some people
that they heard a lot of noise that Flake8 is not on GitLab anymore. And then
there was also this video from Anthony, who is maintaining, right, pre-commit and Flake8. He was
explaining a bit what the motivation was for going from GitLab to GitHub.
And yeah, what's relatable is that sometimes you break people's code, but
it's not intentional, right? But sometimes people can get very heated over these things.
So yeah, just maybe a public service announcement, you know:
change your Git repo to GitHub now if you're using Flake8 as a pre-commit hook.
Yeah.
And you also had Mastodon.py,
right?
Yes,
yes,
yes.
That I did.
I just,
sorry,
I flipped the order.
Cause I thought it was,
it was,
it was a segue there.
Yeah,
yeah,
yeah.
It was,
I wish I knew about this like a week ago or so.
That would have been awesome.
Yeah, you covered Toot, I think, right?
Yes, we covered Toot.
That's right.
Yeah, yeah.
So this is, to be very honest,
I wasn't the one that found this.
It was my boss.
So shout out to Bart, if you're listening right now.
But this is basically just a wrapper
around the Mastodon API, right?
So you don't have to do requests.
You can usually have like a nice client library there to do all these things.
So if you want to play around,
create some bots, you know, whatever,
then yeah, there's a nice convenient package
now for you to do it.
This is really cool.
And it has, you know what?
Documentation that says what functions it has.
I love it.
Documentation?
Just read the code.
It doesn't have to be much,
like the seven or eight lines of code
that are in the readme
like gives you a really good boost,
but it lets you register your app,
which is one of the things
if you go to the website,
it'll show you which apps are registered
for your access keys on Mastodon,
but it won't let you create one on the website.
So here's like a simple create app
and just give it a,
you know, your instance name
and what file to save the access tokens over to, and boom,
you're good to go.
Yeah. Have you guys already done stuff with Mastodon?
Yeah. You know, on the Stream Deck, the thing that controls the stream, I already wrote that thing.
When I push the one button, it sends out the message automatically that this live stream is starting. And yeah, it
uses a little bit of toot and mostly just the straight API with httpx.
But if I'd known about this, you know, I would have used it.
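For a sense of what "the straight API" involves, the sketch below builds, but deliberately does not send, the HTTP request for posting a status to a Mastodon server. It uses the standard library rather than httpx or Mastodon.py; the instance URL, the token, and the helper name `build_status_request` are placeholders invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_status_request(instance_url: str, access_token: str, text: str) -> Request:
    """Build (but do not send) a request that would post `text` as a status."""
    body = urlencode({"status": text}).encode("utf-8")
    return Request(
        url=f"{instance_url.rstrip('/')}/api/v1/statuses",
        data=body,
        headers={"Authorization": f"Bearer {access_token}"},
        method="POST",
    )


# Placeholder instance and token; nothing is sent over the network here.
req = build_status_request("https://mastodon.example", "TOKEN", "The live stream is starting!")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; a client library wraps this kind of plumbing, plus app registration and token storage, behind a few method calls.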
Now we know.
Yeah, I know.
Thanks for sharing that.
Anything else you want to share before we move on?
Yes.
So there are a couple more things.
But this one, this is the Brazilian in me that couldn't resist.
The World Cup started.
I don't know.
Are you guys soccer fans or not at all?
So we've a fun soccer team here.
We go see, I go see with the kids and stuff in town.
Yeah, so I'm also in machine learning.
So a lot of data and like this time of the year,
you know, there's a lot of like,
oh yeah, the AI models are predicting this.
This one is one from Oxford.
So I just wanted to give a quick shout out here.
So they have a video on YouTube as well,
which is cool.
They explain the math.
And I will go out on a limb here and say they use Python
because they even mentioned Matplotlib and whatnot.
But this is basically just a big excuse to say that they predict Brazil to win.
So, you know, if this doesn't happen, it's all rigged.
The math supports this.
So Brazil must win this World Cup.
And anything that is not there, I'm going to be extremely disappointed.
This is really cool.
People are always looking for, like, realistic examples
to learn and explore libraries and tools. And this, you know, if you're into soccer and you care
about the World Cup, this is great.
Yeah, people are very
creative. I feel like there's a lot of uses for it.
Well, I'm sure this will happen, because there's
absolutely no corruption in soccer.
So yeah. Yeah, for sure. Not at all, not at all. Cool.
Should I just keep going, or do you want to take over?
If you've got more items, yeah.
I have two more.
Sorry, I know you said I could have more than two.
So you can just, wait, that's what this
whole section is about.
One: so for me as a data scientist or machine learning engineer, we use a lot of
notebooks, right? And I think they have their place in data science, but there are some tools
that don't play very nicely with them, right? In Git diffs or PRs, they don't play so
nicely, right? So this is, I think it's in public preview, I want to say, but I haven't actually
seen this, but now GitHub is going to start supporting notebook diffs. So if you have a
pull request, they're going to have a nicer rendering of the notebook there and you can actually see what the
differences are. And I think before there was a tool called ReviewNB that you could add to GitHub.
But yeah, now they're just going to start supporting it. So I haven't seen how it looks,
but I'm pretty excited about this too. One less headache for me.
Yeah, that's excellent, because
before, the diff would just be like, here's the diff of the JSON file. You're like, no, that's not what I was...
And also JSON is just JSON,
like, just key-value. So if you just change the order of some keys, it's like, yeah, you have a
lot of changes, but you don't care.
Yeah. Oh, this looks really useful.
Yeah. And maybe one last, if that's okay?
Yeah.
Just pull this up here. This is Lancer. So it's another CLI tool. I talked about linting before, right?
So this is another kind of linting.
And I say kind of because...
So, you know, black...
Some definition of linting or cleanup, yeah.
So this is like black, almost like black,
but it's the opposite.
So instead of making your code look nice,
it would just make it like a hideous,
but working mess, right?
So these are some of the features. It turns all your comments into Pitbull lyrics, or something safe for work,
depending on what you want. It takes all your variable names and mixes them into, like, animal sounds
and horribly similar-looking characters,
so like bark_bark_0OO0O. It adds white space. It adds completely irrelevant comments.
And the code still runs after these improvements.
So here's an example.
You have here some comments and everything.
So before, like nicely formatted
and then afterwards you see some comments
like bada bing, bada boom, you know.
There's nothing like Miami Heat,
some alpha characters in your variable name.
So pretty good stuff.
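The variable-renaming trick can be sketched with the standard library `ast` module. This is not Lancer's code: the `obfuscate` function, the sound list, and the naming scheme are invented for illustration, and the sketch only handles simple variables assigned in the source (no function arguments, attributes, or `global` statements). It needs Python 3.9+ for `ast.unparse`.

```python
import ast

SOUNDS = ["bark", "oink", "moo", "quack"]


def obfuscate(source: str):
    """Rename every variable the code assigns; return (new_source, mapping)."""
    tree = ast.parse(source)

    # Pass 1: only names the code itself assigns are safe to rename,
    # which leaves builtins such as `print` alone.
    assigned = sorted(
        {
            node.id
            for node in ast.walk(tree)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
        }
    )
    mapping = {}
    for index, name in enumerate(assigned):
        sound = SOUNDS[index % len(SOUNDS)]
        mapping[name] = f"{sound}_{sound}_{index}"

    # Pass 2: rewrite every reference to those names.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree), mapping


ORIGINAL = "total = 2\ncount = 3\nresult = total * count\nprint(result)\n"
ugly, mapping = obfuscate(ORIGINAL)
print(ugly)
```

The renamed program computes the same value as the original, which is the "hideous but working" property being described.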
Again, I must say I haven't used this, but this is a tool that I'm not as excited to use.
I mean, there's always times that you need to send out your code to different places and you would rather share less than more.
Thinking of, like, if you make a desktop app and you've got to send out the code for that or
whatever, and you would want to obfuscate it, you want to make it harder for people to just pick
it up. So you could hit it with this. They'd be like, yeah, no, we're
just not doing that.
So my favorite ones on the screen are the ones adding obvious
comments, like setting the value of some... like, that's good. That wasn't in the original,
and it's just funny. I mean, that's actually not gibberish, it's just useless.
It's really good. The comments out in the live stream are really great
as well. People are enjoying it. One of them is: it's great for Twitter employees.
You can maximize your lines of code for review as it's coming up.
Then you just print it out and you take it and sidebar.
Like if somebody says print out my code so we can review it,
they're not,
they're not equipped to review the code that you may have written. Like if the word print involves in his value in code, like, no.
All right.
Just, I don't think so.
So leave that where that is. But you could,
you could put this on top, like, yeah, I'm kind of funky when I write code. It's a little different.
Let's get used to it. Yeah, it's a farm. It's a code farm. Oink, oink, oink.
You can have two
sets of books, kind of. You've got your real repo and then you use this to port it into the actual
one that you submit.
And you're like, I understand it.
I don't know what your problem is.
It works on my machine.
I don't know.
I kind of want to run this on a large code base.
Something really complicated.
Squash all the commits.
Force push.
Like Textual.
Release it as Textual oink oink or something.
Yes, I love it. Cool. All right, well, this was a good find. Awesome, thanks.
All right,
I'll make mine quick here. So, a new YouTube video: I talked about how you can install the
Mastodon web app on your iPad as a native app, as well as on your desktop. So if you're doing that kind of stuff, not there.
Basically, they just released Mastodon 4 a couple of days ago, and all the apps don't have features
like edit and some of the other features that are there because they're like months behind.
And so if you install the web app as an app, then guess what? It looks like an app. It acts like an
app, but it has like zero latency. So as soon as something is released on the website, you get it, which is pretty cool.
So people can check that out. I saw Madison in the audience sent over a call for proposals
or calling all Pythonistas, if you will, for PyCascades. So PyCascades is back in person this
year in Vancouver, BC. It goes from Vancouver to Seattle to Portland and cycles through that there.
But so this year it's gonna be in Vancouver.
So if you wanna go up there and talk,
be part of the conference, good conference.
So call for proposals are open there.
Yeah, but they're not open for very much longer.
So jump on that.
I don't remember what the date is, but.
It closes Wednesday 30th.
So what is that?
Yeah, eight days. Yeah. Next Wednesday. Yeah.
Eight days. And Madison, in the audience: thank you. It's back in person this time,
and we really value first-time speakers and atypical talks. So get out there and put yourself
out there and get into public speaking. It's not a huge conference, but it's, you know,
big enough. A couple hundred people, three, four hundred people, fun time. This is just really quick and fun.
You know, if you're on a Mac,
you're not as likely to get viruses sent your way
that would actually be able to do something
like 90% of viruses are written for Windows.
But what's a really interesting fact,
I just, if you do have a Mac,
it turns out 50% of all macOS malware
comes from one single app.
Can you believe that?
What is it? Safari?
No, it's MacKeeper. So if you have MacKeeper, it, like, organizes your files and it'll
clean up your junky cache and stuff. But apparently it has to take over so many permissions,
and it can get, I guess, plugins, or I don't know what it does, but people can, like, plug
into this
and make it do all sorts of horrible stuff. So 50% of all malware is written for MacKeeper.
So if you have MacKeeper, maybe un-have it. I recently, as of Sunday, launched a new website
that I hope will bring me back to writing some more. We'll see how that goes, but here I'm trying a new philosophy on blogging, Brian. I don't
know how you feel about it, but I have a blog. I've been doing it for a long time, but
I looked, and the last article I wrote was like 2020. I'm like, oh, that's not so good.
And the reason is I would always try to write, like, 2,000-word posts that are really... and I'm like,
but I could post to Twitter and Mastodon all day.
And it's like, I can just do that. That's no problem. I don't fall behind on Twitter.
That's because these really should be super short posts. So I've got this new website that
I wrote, with just super short, you know, fits-on-a-page type of articles that people can
go and check out.
Yeah. Some people are promoting, like, Today I Learned
things. But sure, and why not? I mean, if you think it's going to be a
thread, write a blog post. Exactly. Yeah. So cool. So all of these are written and
this is all based on Hugo, which is a, just learned about it, but a ridiculously cool static
site generator. Either of you played with Hugo? I use it. I love it.
So pythontest.com is written on Hugo.
It's ridiculous, right?
No, Murilo, you haven't?
Sorry.
No, I haven't used it, but I heard of it.
Yeah, I heard nice things.
Yeah, so you basically just go to your directory of markdown files and images.
You just run hugo -D server or whatever.
And then as you write,
you have your web page open in your browser
and it automatically sees the Markdown file changes or the CSS changes, regenerates it, and refreshes your
browser. So your browser could be just over there, and as you make
changes it instantly refreshes. So you don't even go and refresh the page to see how that looks. You just
write, and the browser just watches and reloads. It's cool.
Yeah. And so you've got it so that you just
push your changes to GitHub or your repo, and it just appears on your website?
Exactly, yeah.
Exactly. So that was my next thing: then I set up a Netlify free account with CD and SSL, custom
domain name. It just has a prod branch that I connected it to, and when I push to prod, boom,
it just goes there instantly. So anyway, people are looking at that.
That is super cool.
Push to prod.
Oh, that's kind of cool.
I just, I just edit on prod.
So I just log in, edit over SSH.
Yeah.
Just enter the server.
The server is the backup.
Anyway, I have stuff on the screen, but then no more backups.
That's just stuff I pulled up while we're talking.
So no more extras.
I mean, so yeah, fun stuff.
People check out the blog website and the video, and apply for speaking at PyCascades.
Nice. Well, I feel like Lancer also was like already really funny, but do you have anything
else funny for us or. I do. Although I somehow forgot to pull them up on the screen. So give
me just a second here. There's two. These are really good. Okay, these are pretty epic. So this one is called, I think, Murilo, you'll really
like this one, because it has to do with algorithms and data science, and it's called
Messing with the Algorithm. And it shows this dude here. Don't mind the thing at the bottom.
I have no idea what that's about. But see, there's this guy whose face is blurred out, in the UK, I think. I
can't remember where this was. No, Berlin. And he's got a wagon, like a little red wagon that
you pull behind you, full of 99 phones. Now, what he did is he got them all running Google Maps
and left them open and started walking down the street real slow. And notice his neighborhood is
now red on the map. So it thinks there's a traffic jam and it'll send cars around his
neighborhood.
Nice.
I want to get one of these so bad.
And whenever I take my dog for a walk,
just walk with the wagon behind me too.
No cars.
Yeah.
So good.
Isn't it?
Yeah.
This guy's so ahead of our time.
He's just like,
Oh,
he's so brilliant.
Yeah.
And for his neighbors.
Yeah.
The next one here is going to take a little bit of a, I got to set the stage.
Give me a second to set the history.
You've heard about these motivational posters.
You go to like a dentist's office, it'll be like an eagle soaring over like a sunset.
Like if you don't spread your wings, you'll never soar as high as you could or something
cheesy like that.
Yeah.
Well, there's this company called.
Yes, exactly.
There's a company called Despair, and Despair creates these, but, like, in reverse.
They're called the Demotivators. Yeah. Nice. So have you seen these? No. Okay. So
here's one, like, Solutions. And what does it say? It has a Rube Goldbergian-looking thing
here. And it says: Solutions. This is what happens when the problem solver gets paid by the hour. It's just out of control.
Here's one. What is this? It's a frog with a snail
on its head. It says: Collaborate. So the best of us have to carry the rest of us.
They're really... all right. So that brings us to, I feel like this is a Brian Skinn show a little bit, this tweet that he shared here. And it has the latexify thing, but recursion. And for the
recursion, it has that Demotivator. It's a picture that said: Recursion. Here we go again. And then
embedded in that is: Recursion. Here we go again. It's like that, you know, when you screen share and
you see your own screen. Yeah. So it's kind of like that poster, but for recursion.
Yeah.
I kind of feel bad that people, people that don't get the recursion joke, cause they can't
even look it up because it just, it's redirected.
It just keeps going.
Like the definition is the definition.
That's right.
Nice.
All right.
Well, that's what I got for y'all.
Well, thanks everybody.
And thanks Michael, of course.
And thanks, Murilo, for coming
on the show.
Thanks for having me. It was great.
Yeah, you bet. Bye, everyone.
Bye.