Python Bytes - #281 ohmyzsh + ohmyposh + mcfly + pls + nerdfonts = wow
Episode Date: April 28, 2022. Topics covered in this episode: Take Your Github Repository To The Next Level 🚀️; Fastero; Watchfiles; Slipcover: Near Zero-Overhead Python Code Coverage; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/281
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 281, recorded April 27th, 2022.
I'm Brian Okken.
I'm Michael Kennedy.
And I'm Anna Astori.
Welcome, Anna.
Thank you.
Before we jump in, tell us a little bit about yourself.
Yeah, definitely.
So I'm a data engineer, at least at the moment.
By training, I'm a linguist; I've done both theoretical linguistics and computational linguistics.
So I'm really into how information is encoded in our brains and how we share that information.
And that's why I work in tech.
Nice. Since I got my master's in computational linguistics, I worked at Amazon, in the
Alexa AI org, for a while.
I first worked as a language engineer, actually, so it was more on the linguistic side
of things, dealing with extracting the semantics and the meaning out of
the data for Alexa.
Then gradually I switched over to
data processing and have been in the
role of data engineer for about
three, four years now.
I'm currently with Decathlon, which is
the worldwide sports retailer.
Still working with lots and lots of data
there.
That is fascinating.
Yeah, it's been a ride.
Yeah, it's really neat how we can speak to our devices these days and they kind of actually work, do amazing things, right?
Like I know when Alexa first came out, and Siri especially, it was like, ah, I don't really want to, that thing just doesn't get it. And now I talk to my devices all the time.
It's amazing.
Yeah.
There are some things that are really sophisticated;
sometimes I can't even believe how far we've actually gotten,
so it's pretty exciting. Yeah, and sometimes, I admit, there are things where
you think, really, you can't do that? But having worked on it, I
realized that something that might seem kind of trivial to me from a professional standpoint
takes so much work in the AI
of the actual device, and sometimes you just don't get to all the little
corners, right?
So one of the things that I got to work on at some point
was actually helping Alexa
know when she needs to stop:
when to stop talking about things
and telling you about things,
like whatever she found on Wikipedia right now.
So yeah, it's funny.
Fantastic.
So, well, for our first item,
Michael, do you want to kick it off? I will definitely kick it off. Let's take it to the next level with this one. This is an article by Eluda called "Take Your GitHub Repository to the Next Level," and there are kind of 13 levels, but, you know, I guess
it's a spectrum; you decide which level you want to take it to. So here are basically 13 ideas on how your GitHub repository can be better.
So there was a topic I was going to cover,
but after I explored it more,
I decided, eh, not so much.
But as part of it, there was a conversation
about some WebAssembly stuff in Python,
and I checked it out.
It's really cool.
They're like, we're going to use this library.
This is the fundamental thing that makes it work.
And I go to the GitHub repo for that,
and it says, here's how you build it.
And that's it. I'm like, wait, okay, great. But why do I want it?
What can I do with it? How do I use it? I don't care how to build it. That's the last thing I need; I'll just download the WASM file. But what do I do with it once I get it? Right? There was just
none of that. And so, you know, this article helps you think through those ideas.
Oh, nice.
So number one, and you know,
it's Python friendly because it starts at step zero rather than one: make your project more
discoverable. Now, every one of these comes with a recommendation, a bit of a description,
and then examples, which is cool. Nice. So for example, this one says what you can do to
help people find your project: if the name of your
project does not clearly describe what it is, you can put tags on it. So things like refactoring
or science might be something you put on there that's not immediately obvious
from the name, right? So you can tag subject areas and whatnot, and they have some examples. For example,
there's this thing called Well app, which is a mindfulness app for the Mac. Of course, it's for the Mac, isn't it? So it has
tags such as macOS, productivity, happiness, mental health, but also Flutter and web app, if
people wanted to check out a Flutter web app, right? Okay, so there are other
examples as well. That's step zero. Step one is choose a name that sticks.
Something that's available on PyPI, something that people can Google, something that people
want to say, one that doesn't sound silly or unprofessional when they use it.
You wouldn't call your web app Fancy Pants Server, right? You wouldn't say, well,
our Fancy Pants Server is really scaling today. You wouldn't want to speak that way necessarily, so don't name it that way, right?
So choose a name that sticks. And that we can say on air.
Yes, exactly. And one that's somewhat
predictable to pronounce, maybe, because that's also a challenge.
So there's some examples of like... Anna, what do you think?
Yeah, absolutely.
Just thinking about the name,
something that I ran into today,
particularly with Python:
some of the services, applications, and libraries
have "py" in the name, and sometimes we don't know
if it's pronounced "pie" or "pee" in that case.
It's all confusing,
and then you're talking to somebody else
who's talking about the same thing,
and you're both constantly confused.
So yeah.
Yeah, I agree.
It matters a lot.
Let's see.
So some of the things are
conduct a thorough internet search for the name,
avoid hard to spell names,
get the .dev or .io domain
if you really, really care about it.
Is it some random small little package,
or are you trying to create the next FastAPI?
A name that conveys some meaning. I was thinking about Jupyter,
for example. Jupyter is pretty interesting because it's kind of hard to spell, but once you know it, you just know it. And it very clearly works well in a search. There was probably no
existing domain name that's a misspelled planet, you know? I mean, it was probably a
really good choice,
even though it kind of breaks the "avoid hard to spell names" rule at first.
Yeah, but it's easier to search, right?
So yeah, yeah.
So the example they give for this one is Size Limit.
And what does it do?
It calculates the real cost to run your JavaScript app or lib.
Keep good performance.
It'll show an error in a PR if the cost, basically file size, exceeds the limit.
That's cool.
The next one, I'm all about this.
Display a beautiful cover image.
So if you go to a repo and it's just the text, that's not amazing.
You want some color and you don't necessarily have to have an amazing logo.
So they come back to this Well app, and it's just a W with a little connection smile or something under it. One thing I did learn about this, though, that I thought was interesting: how do they
center this image, but not have it go all the way across the readme? If you go to the readme
and you actually look at it, apparently GitHub will let you put full HTML inside of your readme
for the segments that need lots of formatting. I thought they wouldn't. I know some Markdown
does fall back that way, but I didn't think GitHub did.
Anyway, apparently, yes, you can.
Also, this one's quick.
Badges like, is CI passing?
What's the license?
And so on.
Is there a YouTube link,
like a YouTube channel that shows people how to use it?
Some more of those as examples.
Write a convincing description in a paragraph or two.
Things like, what is this repo or project?
How does it work?
Who will use it?
What is the goal?
And so on, right?
Real simple one.
And again, they come back to the size limit.
It's a performance tool that'll crash your CI
if it's too big.
Here we go.
Getting to the ones that Brian and I love.
Record visuals to attract users.
Yes.
So you might think there's no UI aspect,
but here's a full-on CLI example, Create
Go App CLI.
And all it does, imagine this, it creates Go apps on the CLI.
It's a good name that conveys what it does.
But if you go to see it, it's like, how do I create one?
It has the options, but then under it, it has an animated GIF doing the things that create
the app and showing you the file structure that results, and so on.
Then there's a full video and documentation for it, and so on.
So that's pretty awesome.
And how about you, Anna?
Brian and I are always trying to quickly jump into a project and figure out: what is it about?
Is it polished, and so on?
But, you know, that's because we run this podcast.
How do you see these sorts of pictures and animations for repos?
Yeah, that's super helpful.
I really like the idea with the animation, just basically taking you through the kinds
of things that this particular app, for instance, can do.
That's super helpful.
More and more people are doing it.
I don't think it's super popular yet.
I don't know about you guys,
but I haven't seen it pop up that many times.
Yeah.
Yeah, it definitely looks nice.
Yeah, I really like it as well.
All right, let's see.
Another one is create a practical usage guide:
how to use it with some examples,
some templates, and answers to common questions,
like an FAQ. Can I use it on Windows?
Does it require admin support? I don't know, something like that. Build
a community. This is probably further down the line, but do you have a
Discord community for your project? Or you can even just enable Discussions on the GitHub repository.
I end up with people opening issues on my various repositories saying, I have
a question. Okay. A question is not an issue. An issue is a thing that is wrong or a thing to be
improved, but traditionally they don't have another way to communicate. But GitHub now has,
in addition to issues, a Discussions section that's more open-ended. I think that's
off by default, if I remember correctly, at least on the older ones. So I go and turn that on.
Code of conduct.
That's all good.
Contributor guidelines.
Choose a license, the right license.
Remember, if you don't choose a license at all, that means it's unlicensed and people can't really use it.
So add a roadmap, create GitHub releases.
One thing that I didn't pull up that's pretty cool is Release Drafter.
I'm not sure if you all are familiar with this,
but this is a pretty cool thing as well.
Release Drafter drafts your next release notes as PRs are merged into master
or main, depending on how you set up your repo.
That's pretty cool.
Customize your social media preview.
So if somebody shares your project,
you can control what is shown
in that little Twitter card
or other cards.
So apparently that can be customized
inside of your GitHub repository
and launch a website.
Off it goes.
You can use GitHub Pages,
or Netlify, which is really
easy and free for static sites, and so on.
So anyway,
there's a bunch of things people can do
to take their repo to the next level.
What do you all think?
I think it's great. Yeah, I love this list. It looks very nice. I don't do any of these things, and I
probably should. So I might have a picture, I have a usage guide. Oh, there's also one that talks about
how to install it that I somehow skipped, but most things don't need much there. One of the things
that I see a lot, and I don't know if this covers it, is documentation that's on Read the Docs, which is great.
But I still think a quick start, a little "this is how you install it and this is how you can do a little bit of something with it,"
should be in the README, even if you have other documentation, because I don't want to have to go to the documentation just to see if this is the right project for me.
So, yeah, this is great.
So we have a question: how does one create a CLI animated GIF?
I don't know if this article covers that, but I don't think so.
OK, I'll have to research that and get back to you.
Yeah. Well, Alvaro, what I do is I'll use Camtasia,
and you can record a Camtasia video of just the window,
and then there's different output options like just audio
or just the video or an animated GIF.
Oh, okay, cool.
So that's one of them.
Jeremy Page points out there are a few tools to record that, like asciinema.
I don't know.
Like ASCII cinema, basically.
I don't know how to say that.
It's often used. Pretty cool.
And Dean says, "You know the hook of names."
Exactly.
I'm at a loss on that one.
Claudio, who I just had on Talk Python, has a blog post about many of those things,
and he has Release Drafter and badges. Yeah, I covered that on Talk Python just recently, in the episode about Hypermodern Python.
Awesome. Well, that's probably way more than people want to know about their GitHub repository,
but so often GitHub repositories these days serve as your CV or your resume when you go to apply for
developer jobs. And if yours looks like what they described here, rather than a bunch of things with weird commit messages and
nothing else, that's going to make a different impression. Or if you want people to adopt it
and start using it. Yeah. And if you don't, then don't put this stuff in. Yeah, exactly.
All right, Brian, let's go faster. Well, let's go faster. Speaking of CLI, so this is a fun tool.
We're talking about Fastero.
Faster, I don't know.
Fastero, I'm going to go with that.
So it's like timeit on the command line,
but it's pretty neat.
This is by Arian Wasi,
and we've covered something of his before. It
was the type explainer thing. Oh, right. I don't remember its exact name, but it was a type explainer where
you put a type hint in there and it would humanize what it meant. So this is a simple
little tool, but I'm loving it already. It times stuff, but it also
compares times. We're showing the website here, but I can't
tell what they're timing, so let's just pull up the documentation. It does have a bunch of
examples. So if you run fastero with two code snippets, and in this example
we're showing either a plain string or an f-string, it times those. So that's
pretty neat. If you run those two code snippets, it'll run both of them a whole
bunch of times and do some statistics. In this example, it's running them 20,000 and 50,000 times,
no, 20 million and 50 million. Wow. And then it shows you a little progress bar
and then who wins. If you're not comparing two things,
it'll just show one with the same graphics, but you can also do more than two. I did three or
four, just trying this out, to time different things and compare them.
And often that's why I'm timing something: I'm comparing two things and I want to see which
one's faster. So this is a really cool feature. You can either pass in code snippets or you can
give it two Python file names and it'll run both of those. It's got a
whole bunch of really cool features, actually. One of the things I like is that if you've got a code snippet that needs some setup, but
the setup part isn't the part you're timing, you can give it some setup code to run before it
does the timed part. So that's pretty neat. Anyway, just a really nice-looking command line interface
timing tool. Yeah, that's
very cool. So you can sort of isolate the thing that you really want to time from the setup
you don't really care about. Yeah, I haven't tried the setup part, but it's cool that it's in
there. The documentation is pretty thorough, actually, with quite a bit of
customization available.
That's cool.
Yeah, I agree that it's nice, that setup stuff.
Because so often, if I want to profile some web app or something,
the thing I want to profile is dwarfed by just loading up the framework
and scanning all the files.
And you're like, all right, now I've got to hunt down that little fragment
that actually represents what I'm really after.
So, pretty cool.
Yeah, maybe I'll try one of those sometime.
Yeah.
And you can pass in strings of Python or you can pass in files.
Yeah.
And when I saw the strings bit, I'm like, all right,
there's a good use case for semicolons in Python.
Well, you can use them.
Yeah.
So.
Exactly.
It makes you feel better.
Awesome.
That's a good one.
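Fastero's own command-line flags aren't shown in the episode, so as a rough stand-in, here is a minimal sketch of the same idea with the standard library's timeit module: time two snippets against a shared setup string (which is not part of the measurement) and report which one is faster. The snippets and the names CANDIDATES, SETUP, and compare are hypothetical, chosen to mirror the plain string versus f-string comparison Brian describes; this is not how Fastero itself is implemented.

```python
import timeit
from typing import Dict

# Setup runs before each timing and is not measured. Semicolons let a single
# string hold several statements, which is handy for command-line style input.
SETUP = "name = 'Python'; version = 3"

# Hypothetical snippets mirroring the string vs. f-string comparison.
CANDIDATES = {
    "concatenation": "'Hello ' + name + ' ' + str(version)",
    "f-string": "f'Hello {name} {version}'",
}


def compare(candidates: Dict[str, str], setup: str = SETUP, number: int = 200_000) -> Dict[str, float]:
    """Return the best observed seconds-per-run for each snippet."""
    results = {}
    for label, stmt in candidates.items():
        # Take several samples; the minimum is the least disturbed by other activity.
        samples = timeit.repeat(stmt, setup=setup, number=number, repeat=5)
        results[label] = min(samples) / number
    return results


if __name__ == "__main__":
    timings = compare(CANDIDATES)
    for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
        print(f"{label:>14}: {seconds * 1e9:7.1f} ns per run")
    print("Winner:", min(timings, key=timings.get))
```

Taking the minimum of several repeats, rather than the mean, is the convention the timeit documentation recommends, since slower samples are usually caused by other processes rather than the code under test.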
All right.
Anna, on to you.
What's your first one here?
Yeah.
So I wanted to talk a little bit about, well, data, my line of business.
And I was just thinking about something that could be really interesting, especially for
the part of our audience that works on data science projects.
In general, when you're collecting data,
in most cases you get some kind of noisy data that you need to clean
up and filter in some way.
And I imagine we have a pretty large international audience as well. Also, if you're working with data from social media, which is very popular right now,
one of the questions you have to solve is identifying the human language of the data you're working with.
And then you might want to filter out the pieces of data that are, for example, not
in English. If you're going through a social media post or something, right, you get
that little "translate this to your language" button at the end if for some reason the popular
post is in Spanish or something, right? Exactly, yeah. And some of the platforms, or rather their APIs, do provide this kind of filtering on their back
end. I know Twitter does that, but as far as I know, sometimes it's not really that reliable.
I guess, again, I could imagine it's not really their ultimate
goal, so maybe they're not putting as much love and care into this question.
So that's something that I had to deal with a few times
as well. A couple of libraries that I have
worked with are langid and langdetect.
There are a few more out there, and
these ones have been around for a while, actually.
langid hasn't been actively worked on for a few years now,
but it's still one of the benchmark libraries for this kind of question.
And both of those are super neat, actually.
So langid is really popular.
And one of the things that I really liked about it is that it covers a lot of languages.
I've actually seen different numbers depending on the documentation that I was using,
either on PyPI or on the GitHub page.
At some point, I saw it was covering 97,
and I think their GitHub page says 97.
97 is a lot of languages.
I couldn't name 97 languages.
I'm a linguist.
I would have trouble naming, you know,
97 languages off the top of my head.
I definitely don't speak 97 languages.
And one of the nice things about it
is that you can use it as
a standalone module, or as a command line tool, for instance. But you can also use it as a
web service. So that's really neat. And one of the neater things that was
really helpful when I was trying it out for some of my projects was that
when you try to identify the human language using langid,
it outputs a score for its calculation, which is typically in log space, so you get these funky numbers at the end,
truly speaking.
But the good thing is that you can actually convert them to the kind of confidence scores that
data scientists especially are used to.
And that comes in super handy when you're trying to filter
the data, because you know these kinds of tools are obviously not 100% reliable. You can use that score to decide: okay,
I'm taking this answer and I'm relying on it.
Or maybe I'll just drop this piece of data altogether, because it looks like
the language identifier isn't actually very sure what language it is,
if you're targeting a specific language.
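Converting log-space scores into confidence scores like this is essentially a softmax. Here is a minimal standard-library sketch, assuming you already have one log-space score per candidate language; the scores in the example are made up for illustration and are not real langid output. With langid itself, its README documents creating a LanguageIdentifier with norm_probs=True, which returns normalized probabilities directly and is the more practical route.

```python
import math
from typing import Dict


def normalize_log_scores(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert per-language log-space scores into probabilities that sum to 1.

    This is a softmax. Subtracting the largest score before exp() keeps the
    numbers in a safe range even when raw scores are large negative values.
    """
    top = max(log_scores.values())
    weights = {lang: math.exp(score - top) for lang, score in log_scores.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Hypothetical raw scores for one short text (made up, not real langid output).
raw = {"en": -120.0, "de": -122.0, "fr": -125.0}
probs = normalize_log_scores(raw)
print({lang: round(p, 3) for lang, p in probs.items()})
# {'en': 0.876, 'de': 0.118, 'fr': 0.006}
```

Without the max-subtraction step, scores around -5000 would all underflow to zero in exp() and the division would fail; shifting by the maximum gives the same probabilities while avoiding that.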
This is
wild.
So you basically might say
we're 80% sure it's English, but it
might also be Spanish
or something. Exactly, yeah.
English can easily be confused
with maybe German or sometimes
French, just because so much vocabulary circulates between those languages.
So the identifier is not going to be 100% sure that
this is the language. And the funny thing is, I'm not so sure about
langid... yeah, langid is also statistical, actually,
if I'm remembering correctly.
And so is langdetect.
And sort of the flip side of that is that
it works better
the bigger the piece of data that you're feeding into it:
the more confident it's going to be.
Like, right, that's how statistics work.
Yeah.
That's how machine learning works, generally speaking. And if you're working specifically with short tweets or social media posts, if
it's a really short phrase or sentence interspersed with emojis and stuff, it's probably not going
to be super confident. So the bigger the data, the more confident it is and the better the performance of the language
identifier will be.
So that's something to keep in mind when you're working with that kind of data and trying to filter
by language.
Yeah, that makes sense.
If you have one word or something, it's very hard to go off.
Yeah, exactly.
So this being one file is insane. It acts
as a web server and does all sorts of stuff.
Crazy. This is crazy.
Yeah, and that's
something that I really like about it:
it's a pretty lightweight, sort of
well-isolated, low-dependency kind of package,
which is fascinating. It's based on
a not super
sophisticated naive Bayes algorithm,
if I'm remembering correctly. So yeah, that's really, really fun. It's really
nice. It works so nicely. And the other one, langdetect, I happened to find a little bit more robust
when I got to work with human language data in my projects. It's also
really neat and easy to use.
The great thing is that the basic usage is very straightforward.
It's one of those packages where, when you discover it,
you know immediately what it's doing and how it's doing it, and you can really understand
in five minutes whether it's going to suit your project when
you put it in.
Sure.
So the main methods are detect and detect_langs.
You can either just call it on a piece of data and get the most probable language, the one the package thinks it is,
or you can have it return the list of possible languages.
It's actually going to order them.
So you get maybe English, and then there's a tiny probability
that it's going to be maybe German, or
something like that, and then you can
decide for yourself.
And, yeah,
so overall, from my experience,
langdetect works a little bit better
than langid,
but that's just empirical.
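Here is a hedged sketch of that filtering step: keep a text only when the top-ranked guess is the target language and clears a probability threshold, and drop everything else, including texts the detector is unsure about. To keep it runnable without extra installs, the detector is passed in as a function, and FAKE_GUESSES is hypothetical output, not a real langdetect result. langdetect's detect_langs returns a ranked list of objects with lang and prob attributes, so it should be able to slot in as guesses_for in real use, ideally after setting DetectorFactory.seed for reproducible results and catching the exception langdetect raises for texts with no usable features.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One ranked guess: a language code and its probability."""
    lang: str
    prob: float


def filter_by_language(
    texts: Iterable[str],
    guesses_for: Callable[[str], List[Candidate]],
    target: str = "en",
    min_prob: float = 0.9,
) -> Tuple[List[str], List[str]]:
    """Split texts into (kept, dropped) based on the top-ranked guess.

    guesses_for(text) must return candidates sorted most likely first.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for text in texts:
        guesses = guesses_for(text)
        top = guesses[0] if guesses else None
        if top is not None and top.lang == target and top.prob >= min_prob:
            kept.append(text)
        else:
            dropped.append(text)  # wrong language, or the detector is unsure
    return kept, dropped


# Hypothetical detector output for illustration; not produced by a real library.
FAKE_GUESSES = {
    "Great game last night, the keeper was unbelievable!": [Candidate("en", 0.97), Candidate("de", 0.03)],
    "Gran partido anoche": [Candidate("es", 0.99)],
    "ok 👍": [Candidate("en", 0.55), Candidate("fr", 0.45)],
}

kept, dropped = filter_by_language(FAKE_GUESSES, lambda t: FAKE_GUESSES.get(t, []))
print("kept:", kept)
print("dropped:", dropped)
```

The short emoji post is dropped even though English is its top guess, which matches Anna's point that very short texts rarely produce confident results.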
Yeah, that's great.
It seems super useful for anyone that needs to parse text and can't be sure it's all in
one language.
Yeah, so if anyone out there is working on some kind of data science project, working
with general language data, I would highly recommend them. And probably one of the reasons langdetect is a little bit more confident and robust, I
think, is that it covers fewer languages. I think it's 55 languages total, compared to
97 for langid. Yeah.
Yeah. Interesting.
Nice. Well, Michael,
tell us about our sponsor
for this episode
before we move on.
It's a podcast. Amazing.
So this episode of Python
Bytes is sponsored by the Compiler
Podcast from Red Hat. Just like
everyone out there, just like you, Brian
and I
are both fans of podcasts,
listening to podcasts all the time and stuff.
That's why we started some: we like them.
And so I'm happy to share a new one
from a highly respected open source company,
Compiler, an original podcast from Red Hat.
With more and more of us working from home
or being more disconnected,
it's important to keep our human connection with technology.
Compiler unravels industry topics, trends and things you've always wanted to know about tech through interviews with the people who know best.
So on Compiler, you'll hear a chorus of perspectives from diverse communities behind the code.
These conversations include questions like, what is technical debt?
What are tech hiring managers actually looking for?
Hint, see item one to some degree.
And do you need to know how to code to get started with open source?
All right.
I was a guest on Red Hat's previous podcast
called Command Line Heroes,
and that was a super produced and polished podcast.
It was a really cool experience.
And so Compiler follows in that excellent tradition and that polished style. So I checked out episode 12,
how we should handle failure, which I found really interesting. I really value their conversation
about making space for developers to fail so they can learn without fear of making mistakes,
you know, like taking down the production website and so on, right? People grow through
experimentation, but they also fail if they try new things.
So you got to make sure that they get a chance to grow.
So learn about the compiler podcast
at pythonbytes.fm slash compiler.
The link is in your podcast player's show notes,
right at the top.
You can listen to it in all the places you would expect.
So thanks to Compiler Podcast
for keeping this podcast going strong.
And Brian, I also just real quickly want to point out: I
know people can just go to their podcast app, whether that's Pocket Casts or Overcast or whatever,
and type in Compiler and search, but please visit pythonbytes.fm/compiler. There's a place to
subscribe with all of your various podcast destinations; that way they know it came from us
rather than just out of the ether. So if you're going to subscribe or check them out,
please do it through that link, just so people know. Nice. Yeah. So how about we talk
about watching some things, like files? Yeah, we were listening, so now we're watching. We were listening,
now we're going to watch, but watch them for changes, not watch what they are. So this one comes
to us from Samuel Colvin of Pydantic fame. So, you know, there's some pretty cool experience behind developing this API.
And the idea is it's a simple, modern, and high-performance way to watch files for changes.
So there's a lot of reasons you might want to do that.
You might want to say, if somebody drops a file into this directory, I'm going to kick
off a job to load it up and process it in some kind of batch processing. Or I want my web framework to automatically
restart if any of the files in here get changed, right? Any of the Python files or
whatever. So you could use it for things like that. But the modern part's pretty interesting.
It hooks into the underlying file system, the underlying OS notification systems,
and that's done through the Rust notify library.
So basically it's a low latency, high performance,
native non-polling way of watching the files.
It just goes to the operating system and says,
hey, in this directory tree, if anything changes,
call the callback.
Nice.
That's pretty awesome.
Yeah.
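To make the "non-polling" point concrete, here is the naive alternative that watchfiles avoids: a standard-library watcher that rescans the whole tree on a timer and diffs file modification times. This is a sketch with hypothetical names (snapshot, diff, poll), not part of watchfiles. Its cost grows with the size of the tree and it only notices changes on the next tick, whereas watchfiles' watch() and awatch() wait on notifications from the operating system instead.

```python
import os
import tempfile
import time
from typing import Dict, Iterator, Set, Tuple

Snapshot = Dict[str, int]
Changes = Tuple[Set[str], Set[str], Set[str]]


def snapshot(root: str) -> Snapshot:
    """Map every file under root to its modification time in nanoseconds."""
    mtimes: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.stat(path).st_mtime_ns
            except FileNotFoundError:
                pass  # the file vanished between listing and stat
    return mtimes


def diff(before: Snapshot, after: Snapshot) -> Changes:
    """Return (added, modified, deleted) paths between two snapshots."""
    added = set(after) - set(before)
    deleted = set(before) - set(after)
    modified = {p for p in set(before) & set(after) if before[p] != after[p]}
    return added, modified, deleted


def poll(root: str, interval: float = 0.5) -> Iterator[Changes]:
    """Yield change sets forever by rescanning the whole tree every interval."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)  # changes are only noticed on the next tick
        current = snapshot(root)
        changes = diff(previous, current)
        if any(changes):  # at least one of the three sets is non-empty
            yield changes
        previous = current


if __name__ == "__main__":
    # Quick demo: create a file, take a snapshot, add another file, diff.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("first")
        before = snapshot(root)
        with open(os.path.join(root, "b.txt"), "w") as f:
            f.write("second")
        added, modified, deleted = diff(before, snapshot(root))
        print("added:", sorted(os.path.basename(p) for p in added))
    # Real use would loop forever, e.g. with a hypothetical handle() function:
    # for added, modified, deleted in poll("./incoming"):
    #     handle(added)
```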
So there are real simple uses here. Like I can say from watchfiles import watch, and then just for changes
in watch(some_path), and then you can process those changes. So here's an example of an app that just
starts, and its job is to pick things up as they change here. That might be an example of what I
said about kicking off something to load it and parse it and decide what to do, and then maybe pass it to Celery for background work, right?
On the other hand, you might want to do other things in your app while you're watching for changes as well. If it's asyncio-based, you can just kick off the
watching bit and await the changes, and then do other async processing, like FastAPI
or web calls with HTTPX or database calls with Beanie or whatever
other asyncio things. And it sort of lets you run them in parallel, which is cool, right?
Yeah. And then if you want to go even further, you can kick off a separate process and say, start
a process that will watch for changes here and then call back this function if those
things change.
So that's pretty cool too.
So there's all these different ways in which you can use it.
But yeah, it's pretty neat.
It's based on this Rust library, and it seems pretty powerful.
There's also a CLI, and I did want to point out one other thing over here. I thought this might impress you, Brian.
Definitely.
I can run a command line watchfiles command that will say, watch this directory, and if anything
changes, rerun the failing tests.
That's very cool.
That's cool, right? So you just run watchfiles and pass it the string pytest --lf,
which is pytest's rerun-the-failing-tests option, so it reruns them whenever anything changes.
I think that's neat.
The command line stuff is actually cool.
I'd check it out just for the command line usage.
But the ability to use it programmatically too with an API, that's impressive.
And I'm very happy they included that.
Yeah, absolutely.
If you're going to use it through the CLI, this is the
perfect pipx install type
thing, right? pipx install watchfiles,
and then it's not really tied to any of your
projects. It's just always there. Anna, what do you think?
Yeah, that looks super
neat. It just made
me immediately think about
file triggers, which are
built into most cloud storage
services these days and are just widely used.
Yeah.
Yep.
I can imagine like all the possible ways that it can be used.
Yeah, that's really neat.
I wonder if in their documentation,
they actually provide any popular use cases or anything.
They might not do that, but I'm curious if they actually do.
Yeah, I didn't see any in particular.
I just took a couple of examples on how you might use it and all.
But yeah.
Yeah.
There's an older project called watchgod.
I don't know anything about that one,
but I'm glad I didn't learn about it, because now there's a new one called watchfiles.
But if you're using the old one, this is the successor to it.
It's a funny name, but I can see why some people might not want to use it. So, yeah. Well, see item one, right? Pick a name that people are willing to talk about.
Exactly. Yeah, well, I want to talk about a new tool as well, for coverage. Hopefully all of us are
familiar with coverage.py. It's maintained by Ned Batchelder, and it's a really cool tool.
But there's a new guy on the scene, and the new guy is Slipcover.
I actually heard about Slipcover through the coverage.py Twitter account, which was interesting,
though not surprising.
Ned's a pretty open-minded guy.
So Slipcover is coverage, but it's pretty new.
Some of these commits came in just within the last week or so.
So it's still at, I think, version 0.1.1 or something like that.
There was even a new one out this morning.
So why would you want to use something different?
Well, the big selling point is that it's really fast. It uses a different
process for getting the coverage information, and it supposedly has only
about 3% overhead, whereas, depending on your code, coverage.py can sometimes slow down your code significantly.
And if you've got a really long-running test suite, even a 20% slowdown hurts, and sometimes
coverage can make it twice as slow. So if you've got a five-minute test suite, that makes
it 10 minutes, and that's a little painful. So this might be worth checking out. It's quite a bit faster.
I tried it against Flask as an example. Flask has a pretty tight test suite anyway, but
straight pytest on my machine took about 2.7 seconds, and with coverage it was about 4.3 seconds.
And then with Slipcover,
it was just a little slower than plain pytest.
So pytest alone was 2.7 seconds, and with Slipcover it was 2.88.
So just a tiny bit more, and you get coverage information.
That's pretty cool.
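Those Flask timings can be turned into overhead percentages with a little arithmetic. This only restates the single run on one machine described above; overhead_pct is a hypothetical helper name.

```python
def overhead_pct(seconds, baseline):
    """Percent extra wall-clock time relative to the baseline run."""
    return (seconds - baseline) / baseline * 100


# Flask test-suite timings quoted in the episode (seconds, one machine).
baseline = 2.7  # plain pytest
for label, seconds in [("coverage.py", 4.3), ("Slipcover", 2.88)]:
    print(f"{label}: {overhead_pct(seconds, baseline):.1f}% overhead")
# prints:
# coverage.py: 59.3% overhead
# Slipcover: 6.7% overhead
```

So on this short suite, Slipcover added roughly 7% while coverage.py added roughly 59%, a bit above the approximately 3% figure mentioned earlier but still far lower.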
It is in the early stages, though.
There are some kinks to work out still.
So I would try it out and watch this space. I think they're doing some really cool things, and it's definitely worth watching. But, for
instance, I ran into issues on projects that use pytest plugins. I don't know why, but the plugins
don't get loaded. For instance, I tried to run this Flask example with xdist so that I could
run all the tests in parallel, to see if it sped up parallel runs, and it didn't recognize the
parallelism. So I'm not sure what's going on there, but I am in communication with Juan, one of
the maintainers, to let him know what I found out. I'm not just griping without trying to make it better.
I'd love to have this be a really cool tool.
It looks neat.
Yeah, go ahead, Anna.
Yeah, and the near-zero overhead
is mostly due to how they manage to collect the coverage information.
They explain it in their documentation.
It's really interesting.
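For a sense of where coverage overhead comes from, here is a minimal sketch of the classic trace-function approach: sys.settrace makes a Python callback run on every executed line. Traditional tools such as coverage.py have been built on this tracing hook (coverage.py adds a C tracer to make it faster). This is not Slipcover's technique, which is designed to avoid paying that per-line cost. The names collect_line_coverage and classify are hypothetical.

```python
import sys
from collections import defaultdict


def collect_line_coverage(func, *args, **kwargs):
    """Call func(*args, **kwargs) and record every line executed in new frames.

    Returns (result, hits), where hits maps a filename to the set of line
    numbers that ran. A Python callback fires on every executed line,
    which is the per-line cost low-overhead tools try to avoid.
    """
    hits = defaultdict(set)

    def local_tracer(frame, event, arg):
        if event == "line":
            hits[frame.f_code.co_filename].add(frame.f_lineno)
        return local_tracer  # keep receiving events for this frame

    def global_tracer(frame, event, arg):
        # Called with a "call" event for each new frame; opt in to line events.
        return local_tracer

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, hits


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
```

Calling collect_line_coverage(classify, 5) records the if line and the final return, but never the return "negative" line. Lines that never appear in hits are exactly what you look at when hunting for dead code.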
Yeah.
Yeah, with such a low overhead, I'm tempted to think of a more diabolical use of it. Say I'm handed some crummy old
app that doesn't really have tests, and I've got to figure out what part of it is dead. I don't
know if you've ever picked up some old app that's evolved and evolved, and there's just stuff people
don't take out because they're afraid to. Just run this in production for a while.
Oh, yeah.
And just go, okay,
these things don't look like they're doing it.
There might be some case I need to track down,
but this gray area over here that's not touched,
let me look for things to delete over here.
That'd be kind of fun.
That's my favorite use of coverage
is looking for dead code.
Yeah, exactly.
Before we move off this, Brian,
Avara asks, does it have a pytest plugin?
I know you said it doesn't run plugins, but this is the reverse question.
Um, I don't think so. You're running Slipcover and pytest at the same time, so I don't
think you really need a pytest plugin for it. It does work with pytest, so you can run your pytest suite with it.
Nice.
Just not the bells and whistles yet.
Right.
But I'm sure they'll get there.
Yep.
I would love to circle back to the data.
It may sound like a broken record,
but that's my favorite topic.
No, it's great to have you on to talk about it
because Brian and I don't live in the data science world, right?
So it's really cool.
Well, you're welcome
in our world. There's a lot
of fun stuff happening here.
Well, actually, if you think about it,
let's go back to the very beginning, right?
Even before trying to wrangle
the data and trying to uncover
any interesting information in it,
you have to get it somehow.
And sometimes, particularly if you're working on some sort of side project on your own,
you want to try out a new tool, or if you're doing a machine learning modeling project, you usually need some very specific data to
work on.
And how do you get the data?
Well, you have to actually go and find some examples of the data on your own.
And so something I wanted to talk about today was web crawling and web scraping,
and a couple of tools for that. One that is quite popular,
and it's actually an industrial-grade kind of tool,
is Scrapy. I've heard it pronounced both "Scrappy" and "Scrape-y,"
so take your pick of either variant.
Yeah.
And it is a pretty great tool.
So one of the great things
from the get-go about it
is that it actually has
built-in shell,
so you can just go ahead
and sort of try out things
in the CLI,
get the response from a URL,
for instance,
and then poke around in it
and then test out the behavior,
which is really nice,
and then see what kind of things
you might want from there.
And if you actually go ahead and use it for your project
to get some data, it provides all sorts
of real-life functionality.
To begin with, it gives you
a choice between CSS selectors
and XPath for selecting the content of the pages,
and XPath is obviously a little bit more flexible.
All the bells and whistles.
It's more fragile though,
because if they make any change to the page, it's...
That also, yeah.
Yeah.
But still, yeah, well, it's part of the game.
Yeah, that's right.
Yeah, and then another really nice thing about it
is that they do a lot of heavy lifting for you
in terms of scaffolding and templating.
You can run the startproject command,
and right away you have the whole structure and the boilerplate
code.
You just fill in certain pieces: item processing, which goes in the pipelines module,
the settings, and so on.
And there you go.
You have a huge amount of work already done for you.
And another nice thing is that they also provide you with
numerous choices for exporting and storing the data, in a few places,
and in the formats that you would want to use.
All the typical standard things like CSV, JSON, etc.
Sure.
And some less frequent options, too.
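Scrapy's feed exports handle this for you (for example, scrapy crawl followed by your spider's name and -o items.json). As a rough, standard-library-only illustration of what the two most common formats look like for the same scraped items, here is a sketch; export_items and the sample fields are hypothetical, not Scrapy's API.

```python
import csv
import io
import json


def export_items(items, fieldnames):
    """Serialize a list of scraped-item dicts as (JSON text, CSV text)."""
    as_json = json.dumps(items, indent=2)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()      # first row: the column names
    writer.writerows(items)   # csv terminates rows with \r\n by default
    return as_json, buffer.getvalue()
```

JSON keeps nested structure, while CSV flattens each item into one row with a fixed set of columns, which is why it needs fieldnames up front.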
Yeah, another thing that's pretty interesting about this whole project is that there's a web-scraping-as-a-service company behind it, right?
It used to be called Scrapinghub, and now it's Zyte, Z-Y-T-E.
You can basically go in there,
sign up, and hand off one of these spiders,
and it'll run it on their servers,
try to avoid getting blocked, all that crazy stuff.
Exactly, yeah.
That's why it's so elaborate.
Like I was saying before, they really put
a lot of blood, sweat, and tears into all sorts
of functionality, covering all the corners of what
you might want from a web crawling tool.
And another example that I found particularly useful
is the LinkExtractor
class.
It really gets into the nitty-gritty parts of the tool: you can extract further
links from the page, but only the ones that adhere to a particular pattern,
for instance.
And the list that you get is already deduplicated. So once again, it saves you so much of the dirty work.
So that's really great.
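Scrapy's LinkExtractor accepts allow patterns (regular expressions) and returns unique links by default. The sketch below imitates just that behavior with the standard library's html.parser so it runs without Scrapy installed; extract_links and its parameters are hypothetical names, not Scrapy's API.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, allow):
    """Return absolute URLs matching the regex `allow`, deduplicated, in order."""
    pattern = re.compile(allow)
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()

    seen = set()
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links like /products/1
        if pattern.search(url) and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

For example, on a page that links to /products/1 twice, /about once, and https://example.com/products/2, an allow pattern of r"/products/\d+" yields just the two product URLs, once each.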
And they actually provide ways to interact with the pages as well.
There's a FormRequest class you can use, which provides some functionality for interacting with the page,
but I haven't used it much myself, so I'm not entirely sure how capable it is.
But it's probably well done as well.
And another library that I wanted to touch on briefly today was Robox. That's actually something new for me. It's something I'm
in the process of exploring, so I haven't had a chance to work a whole lot with it yet.
But it's been really, really interesting, and I would be happy to hear
from somebody else who has tried it out. First of all, it's built
on top of HTTPX and BeautifulSoup, BeautifulSoup 4, rather.
Those are super popular in the data processing line of work,
and particularly in web scraping.
But they've added some really useful functionality,
and it looks like it allows even more
of this interaction with the pages
in a very neat and clean way.
You can find examples
right there in the documentation.
It looks so nice and clean and straightforward. It looks lovely.
So yeah, I'm really excited about this package and hoping to have an opportunity to test it out soon.
Yeah, Robox looks very interesting. It looks very Selenium-like, where you can
actually control the page: fill in the comments with this,
fill in the first name with that, and then submit. The other thing that's cool about it is that it has async support for doing all this.
Exactly.
You can scale it.
Yeah.
Oh, that's fantastic.
Awesome.
Thanks.
Nice.
Well, where are we at now?
We have extras.
Extras.
Extra, extra, extra.
Hear all about it.
I only got one.
How many you got?
I got zero.
Zero.
All right.
Anything else you want to give a quick shout out to while we're here?
No.
Okay, cool. Well, I wanted to
tell you all about my terminal adventures, I suppose we'll call them. So I've been using
Oh My Zsh, which is amazing. I love Oh My Zsh. But I also started playing with Oh My Posh
and pls and some of these other things.
And I thought, oh, well, how am I going to decide
between, say, Oh My Posh and Oh My Zsh?
Well, it turns out, Brian, you don't have to decide.
You get both.
So here's a little animated video I'll throw up
for people who are watching.
I'll put it in the links as well.
So here you can see this cool prompt,
which is all driven by Oh My Posh,
but you can also see autocomplete
into local git branches through Oh My Zsh, for either branch or checkout. And then on top of
that, we've got McFly. We talked about McFly before; it
gives you autocomplete into your history with a sort of Emacs-style, AI-powered completion. Then pls is an ls replacement that is
developer friendly, with little icons for the file types, and it uses .gitignore to hide stuff
that you don't want to see. It's Python friendly, too: it understands venvs and
de-emphasizes them, and all that kind of stuff. So anyway, people have been trying to decide
between these things, and it turns out they all go well together.
You don't have to decide.
That's pretty cool.
Yeah.
Yeah.
Yeah, I really like Oh My Zsh,
and that looks even... yeah.
Yeah, all the stuff works,
and you don't have to give up any of it.
The only thing you lose
is the Oh My Zsh prompt,
and that prompt is not all that great,
honestly. I mean,
I know you can customize it,
but I think the Oh My Posh one is better,
and it's pretty amazing. So people who are listening can check out the little video.
I'll find a way to link to it in the show notes so you all can check it out.
Okay. That's my extra.
Yeah. How about a joke? So we're all starting to go back out to dinner, to
restaurants. COVID's over, I hear. Not necessarily, but here's one from a slightly different perspective.
It says, hello, I'm your server today.
Brian, can you just describe for people listening what's in this picture?
There are two robots sitting down at a restaurant, and there's a server rack next to them.
Okay, yeah. And the caption is, "When you go out for a byte," B-Y-T-E.
The server rack is by the table
where the robots are drinking,
and it says, "My name is DHX005972,
and I will be your server this evening."
That's funny.
Love this one.
Thanks.
All right, that's what I got for our joke today.
Nice.
Well, thanks, Anna, for joining us today.
Thank you.
Thanks for having me.
Yeah, it was great.
Thank you, Brian, as always, and everyone out there listening.
Thanks so much.
Have a good one, everyone.