Python Bytes - #281 ohmyzsh + ohmyposh + mcfly + pls + nerdfonts = wow

Episode Date: April 28, 2022

Topics covered in this episode: Take Your Github Repository To The Next Level 🚀️ Fastero Watchfiles Slipcover: Near Zero-Overhead Python Code Coverage Extras Joke See the full show notes fo...r this episode on the website at pythonbytes.fm/281

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 281, recorded April 27th, 2022. I'm Brian Okken. I'm Michael Kennedy. And I'm Anna Astori. Welcome, Anna. Thank you. Before we jump in, tell us a little bit about yourself.
Starting point is 00:00:18 Yeah, definitely. So I'm a data engineer, at least at the moment. By training, I'm a linguist, so I've been doing both theoretical linguistics and computational linguistics. So I'm really about how information is encoded in our brains and how we share this information. And that's why I work in tech. Nice. Since I got my master's in computational linguistics, I worked at Amazon, at the Alexa AI org, for a while. I first worked as a language engineer, actually, so it was more on the linguistic side
Starting point is 00:00:55 of things, dealing with extracting the semantics and the meaning, really, out of the data for Alexa. Then gradually I switched over to just data processing, and I've been in the role of data engineer for about three or four years now. I'm currently with Decathlon, which is the worldwide sports retailer.
Starting point is 00:01:16 Still working with lots and lots of data there. That is fascinating. Yeah, it's been a ride. Yeah, it's really neat how we can speak to our devices these days and they kind of actually work, do amazing things, right? Like I know when Alexa first came out, and Siri especially, it was like, ah, I don't really want to, that thing is so not getting it. And now I talk to my devices all the time. It's amazing. Yeah.
Starting point is 00:01:43 There are some things that are really sophisticated that they have. You know, sometimes I can't even believe where we're actually getting, so it's pretty exciting. Yeah. And sometimes, I admit, there are several things where you're like, really, you can't do that? Yeah. But having worked on that, I realized that sometimes it's just kind of a thing that, you know, from a professional standpoint, might seem kind of trivial to me, but I realized that there's so much work behind these things and the AI of the actual device that sometimes you just don't get to, you know, all the little corners, right?
Starting point is 00:02:23 So one of the things that I got to work on at some point was actually helping Alexa kind of know when she needs to stop. When she needs to stop talking about things and telling you about things, like whatever she found on Wikipedia right now. So yeah, it's funny. Fantastic.
Starting point is 00:02:42 So, well, for our first item, Michael, do you want to kick it off? I will definitely kick it off. Let's take it to the next level with this one. So this is an article by Eluda called Take Your GitHub Repository to the Next Level. And there's kind of 13 levels, but, you know, I guess it's a spectrum. You decide which level you want to take it to. So here are basically 13 ideas on how your GitHub repository can be better. So there was a topic I was going to cover, and after I explored it more, I decided, eh, not so much. But as part of it, there was a conversation
Starting point is 00:03:15 about some WebAssembly stuff in Python, and I checked it out. It's really cool. They're like, we're going to use this library. This is the fundamental thing that makes it work. And I go to the GitHub repo for that, and it says, here's how you build it. And that's it. I'm like, wait, okay, great. But why do I want it?
Starting point is 00:03:32 What can I do with it? How do I use it? I don't care about how do I build it. Like that's the last, I'll just download the WASM file, but what do I do with it once I get it? Right. It was just none of that. And so that's kind of, you know, this article helps you think through those ideas. Oh, nice. So number one, and you know, it's Python friendly because it starts with zero step zero rather than one, make your project more discoverable. Now, every one of these comes with a recommendation, a bit of a description, and then examples, which is cool. Nice. So for example, this one says what you can do is to help people find your project if the name of your
Starting point is 00:04:05 project does not carefully describe what it is, you can put tags, basically. So like refactoring or science or things like that might be something you put on there that's not immediately obvious from it, right? So you can tag subject areas and whatnot, and they have some examples. So for example, there's this thing called Well App, which is like a mindfulness app for the Mac. Of course, it's for the Mac, isn't it? So it has tags such as macOS, productivity, happiness, mental health, but also Flutter and web app if people wanted to check out a Flutter web app, right? Okay, so that's, you know, there's other examples as well. That's step zero. Step one is choose a name that sticks. Something that's available on PyPI, something that people can Google, something that people
Starting point is 00:04:52 want to say. It doesn't sound silly or unprofessional if they were to use it. You wouldn't call your web app Fancy Pants Server, right? You wouldn't say, well, our Fancy Pants Server is really scaling today. You wouldn't want to speak that way necessarily, so don't name it that way, right? So choose a name that sticks. And that we can say on air. Yes, exactly. And is somewhat predictable in the pronunciation maybe because that's also a challenge. So there's some examples of like... Anna, what do you think? Yeah, absolutely.
Starting point is 00:05:26 Just thinking about the name, something that I ran into today, particularly with Python: some of the services or applications, and libraries as well, have "py" in the name, and sometimes we don't know how to pronounce it in that case. It's all confusing, and then you're talking to somebody else
Starting point is 00:05:45 who's talking about the same thing. They're like constantly confused. So yeah. Yeah, I agree. It matters a lot. Let's see. So some of the things are conduct a thorough internet search for the name,
Starting point is 00:05:56 avoid hard-to-spell names, get the .dev or .io domain if you really, really care about it. Is it some random small little package or are you trying to create the next FastAPI? A name that conveys some meaning. I was thinking about Jupyter, for example. Like Jupyter is pretty interesting because it's kind of hard to spell, but once you know it, you just know it. And it very clearly works well in a search. There's probably no domain name that's like a misspelled planet type of thing. You know, I mean, it was probably a
Starting point is 00:06:24 really good choice, even though it kind of breaks the maybe-hard-to-spell rule at first. Yeah, but it's easier to search, right? So yeah, yeah. So the example they give for this one is Size Limit, that's the name. And what does it do? It calculates the real cost to run your JavaScript app or lib, to keep good performance.
Starting point is 00:06:40 It'll show an error in a PR if the cost, basically file size, exceeds the limit. That's cool. The next one, I'm all about this. Display a beautiful cover image. So if you go to a repo and it's just the text, that's not amazing. You want some color, and you don't necessarily have to have an amazing logo. So they come back to this Well App, and it's just a W with a little connection smile or something under it. One thing I did learn about this though, that I thought was interesting, like how do they
Starting point is 00:07:08 center this image, but not have it go all the way across the readme? If you go to the readme and you actually look at it, apparently GitHub will let you put full HTML inside of your readme for the segments that need lots of formatting. I thought they wouldn't. I know some Markdown does fall back that way, but I didn't think GitHub did. Anyway, apparently, yes, you can. Also, this one's quick. Badges like, is CI passing? What's the license?
Starting point is 00:07:32 And so on. Is there a YouTube link, like a YouTube channel that shows people how to use it? Some more of those as examples. Write a convincing description in a paragraph or two. Things like, what is this repo or project? How does it work? Who will use it?
Starting point is 00:07:47 What is the goal? And so on, right? Real simple one. And again, they come back to the size limit. It's a performance tool that'll crash your CI if it's too big. Here we go. Getting to the ones that Brian and I love.
Starting point is 00:07:57 Record visuals to attract users. Yes. So you might think there's no UI aspect, but here's a full-on CLI example that is Create Go App CLI. And all it does, imagine this, it creates Go apps on the CLI. It's a good name that conveys what it does. But if you go to see it, it's like, how do I create one?
Starting point is 00:08:18 It has the option, but then under it, it has an animated GIF doing the things that create the app and showing you the tree structure that results, you know, the file structure that results and so on. Then a full video and documentation for that thing and so on. So that's pretty awesome. Anna, how about you? Brian and I are always trying to quickly jump into a project and figure out what is it about? Is it polished and so on? But, you know, that's because we run this podcast.
Starting point is 00:08:42 How do you see this sort of pictures and animations for repos? Yeah, that's super helpful. I really like the idea with the animation, just basically taking you through the kinds of things that this particular app, for instance, can do. That's super helpful. More and more people are doing it. I don't think it's super popular yet. I don't know about you guys,
Starting point is 00:09:08 but I haven't seen it a whole lot of times. Yeah. Yeah, it definitely looks nice. Yeah, I really like it as well. All right, let's see. Another one is create a practical usage guide, like how to use it with some examples, some templates, answer common questions,
Starting point is 00:09:24 like an FAQ: can I use it on Windows, or does it require admin support? I don't know, something like that. Build a community. So maybe you have a, this is probably further down the line, but like, do you have a Discord community for your project? Or you can even just enable discussions on the GitHub repository. I'll end up with people opening issues on my various repositories saying, I have a question. Okay. A question is not an issue. An issue is the thing that is wrong or a thing to be improved, but they don't have another way to communicate traditionally. But GitHub now has, in addition to issues, they also have a discussion section that's more open-ended. So I think that's
Starting point is 00:10:00 off by default, if I remember correctly, at least on the older ones it is. So I go and turn that on. Code of conduct. That's all good. Contributor guidelines. Choose a license, the right license. Remember, if you don't choose a license at all, that means it's unlicensed and people can't really use it. So add a roadmap, create GitHub releases. One thing that I didn't pull up that's pretty cool is Release Drafter.
Starting point is 00:10:25 I'm not sure if you all are familiar with this, but this is a pretty cool thing as well. Release drafter drafts your next release notes as PRs are merged into master or main depending on how you set up your repo. That's pretty cool. Customize your social media preview. So if somebody shares your project, you can control what is shown
Starting point is 00:10:45 in that little Twitter card or other cards. So apparently that can be customized inside of your GitHub repository. And launch a website. Off it goes. You can use GitHub Pages, or Netlify is really easy for,
Starting point is 00:10:57 easy and free for static sites and so on. So anyway, there's a bunch of things people can do to take their repo to the next level. What do you all think? I think it's great. Yeah, I love this list. It looks very nice. I don't do any of these things, and I probably should. So I might have a picture. I have a usage guide. Oh, there's also one that talks about how to install it that I somehow skipped, but most things don't need that.
Starting point is 00:11:20 One of the things that I see a lot is, I don't know if this covers it, but I see documentation that's on Read the Docs, which is great. But I still think a quick start, or a little like, this is how you install it and this is how you can do a little bit of something with it, that should be in the README, even if you have other documentation, because I don't want to have to go to the documentation just to see if this is the right project for me. So, yeah, this is great. So we have a question: how does one create a CLI animated GIF? And I don't know if this article covers that, but I don't think so. OK, let us research that and get back to you. Yeah. Well, Alvaro, what I do is I'll use Camtasia,
Starting point is 00:12:08 and you can record a Camtasia video of just the window, and then there's different output options like just audio or just the video or an animated GIF. Oh, okay, cool. So that's one of them. Jeremy Page points out there are a few tools to record that, like asciinema. I don't know. Like ASCII cinema, basically.
Starting point is 00:12:30 I don't know how to say that. It's often used. Pretty cool. And Dean. You know the hook of names. Exactly. I'm at a loss on that one. Claudio, who I just had on Talk Python, has a blog post about many of those things. And he has a better release drafter and badges. Yeah, I covered that on Talk Python just recently, about Hypermodern Python.
Starting point is 00:12:49 Awesome. Well, that's probably way more than people want to know about their GitHub repository, but so often GitHub repositories these days serve as your CV or your resume when you go to apply for developer jobs. And if you end up at somewhere that looks like what they described here, rather than a bunch of things with like weird commit messages and nothing like that's going to make a different impression. Or if you want people to adopt it and start using it. Yeah. And if you don't, then don't put this stuff in. Yeah, exactly. All right, Brian, let's go faster. Well, let's go faster. Speaking of CLI, so this is a fun tool. We're talking about Fastero. Faster, I don't know.
Starting point is 00:13:31 Fastero, I'm going to go with that. So this is a, it's like timeit on the command line. So, but it's pretty neat. So this is by Arian Mollik Wasi, and we've covered something of his before. So it was the type explainer thing. Oh, right. I don't remember its exact name, but type explainer, where you put a typed thing in there and it would humanize what those meant. So this is a simple little tool, but I'm loving it already. So one of the things, it either times stuff, but it also
Starting point is 00:14:06 compares times. So like in this, we're showing the website here, but I can't tell what they're timing. So let's just pull over in the documentation. It does have a bunch of examples. So if you ran fastero with two code snippets, and in this example it's just showing either a string or an f-string, timing those. So that's pretty neat. And so those two code snippets, if you run those, it'll run both of those a whole bunch of times and do some statistics. Like in this example, it's running it 20,000 and 50,000 times, but no, 20 million and 50 million. Wow. And then it shows you a little progress bar, and then who wins. But if you're not comparing two things,
Starting point is 00:14:56 it'll just show one with the same graphics, but you can do more than two. I did like three or four, just trying this out to time different things and compare them. And often that's why I'm timing something: I'm comparing two things and I want to see which one's faster. So this is a really cool feature. You can either pass in code snippets or you can give it Python file names and it'll run both of those things. There's kind of a whole bunch of really cool features, actually. And one of the things I like is, if you've got a code snippet that needs some setup, but the setup part isn't the part you're timing, you can give it some setup code to run before it does the timed part. So that's pretty neat. Anyway, just a really nice looking command line interface
Starting point is 00:15:44 timing tool. Yeah, that's very cool. So you can sort of isolate the thing that you really want to time from the setup thing you don't really care about. Yeah, I haven't tried the setup part, but it's cool that it has it in there. The documentation is pretty thorough, actually, as well. Quite a bit of customization available. That's cool. Yeah, I agree that it's nice, that setup stuff. Because so often, if I want to profile some web app or something,
Starting point is 00:16:13 the thing I want to profile is dwarfed by just loading up the framework and scanning all the files. And you're like, all right, now I've got to hunt down that little fragment that actually represents what I'm really after. So, pretty cool. Yeah, maybe I'll try one of those sometime. Yeah. And you can pass in strings of Python or you can pass in files.
Starting point is 00:16:30 Yeah. And when I saw the strings bit, I'm like, all right, there's a good use case for semicolons in Python. Well, you can use them. Yeah. So. Exactly. It makes you feel better.
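The compare-with-setup idea described above can be tried with the standard library's timeit module, which Fastero was likened to. This is only a sketch of the idea, not Fastero itself: the snippet labels, the `compare` helper, and the loop counts are illustrative assumptions.

```python
import timeit

# Setup code runs before each timing run and is not part of the measurement.
SETUP = "name = 'world'"

# Two snippets to compare; labels and contents are illustrative only.
SNIPPETS = {
    "concatenation": "'hello, ' + name",
    "f-string": "f'hello, {name}'",
}


def compare(snippets, setup="pass", number=100_000, repeat=5):
    """Return {label: best seconds per loop} for each snippet."""
    results = {}
    for label, stmt in snippets.items():
        # repeat() returns one total per run; the minimum is the least noisy.
        totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
        results[label] = min(totals) / number
    return results


results = compare(SNIPPETS, setup=SETUP)
fastest = min(results, key=results.get)
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{label:15s} {seconds * 1e9:8.1f} ns per loop")
print(f"fastest: {fastest}")
```

Because `name` is defined in the setup string, neither snippet pays for creating it, which is the isolation the hosts describe.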
Starting point is 00:16:41 Awesome. That's a good one. All right. Anna, on to you. What's your first one here? Yeah. So I wanted to talk a little bit about, well, data, my line of business. And I was just thinking that something that could be really interesting, especially for
Starting point is 00:16:57 that part of our audience that works with data science projects. Well, in general, you're collecting data. You definitely, in most cases, get some kind of noisy data that you need to clean up and filter out in some way. And particularly, I imagine we have a pretty large international audience as well. And also, on the other hand, if you're working with data from social media, which is very popular right now, one of the questions that you have to solve there is to identify the human language of the data that you're working with. And then you want to filter out the pieces of data that, for example, are not in English, if you're going through social media posts or something. Right, you get
Starting point is 00:17:52 that little "translate this to your language" button at the end if for some reason the popular post is in Spanish or something, right? Exactly, yeah. And some of the platforms, or their APIs rather, do provide this kind of filtering on their back end. I know Twitter does that, but also, as far as I know, sometimes it's not as reliable really. I guess, again, I could imagine that maybe it's not really the ultimate goal for them, so maybe they're not putting as much love and care into this question. So that's something that I had to deal with a few times also. And a couple of libraries that I have worked with are langid and langdetect.
Starting point is 00:18:40 There are a few more out there, and these ones have been out there for a while, actually. And langid hasn't actually been worked on actively for a few years now, but it's still kind of one of those benchmark libraries for this kind of question. And both of those are super neat, actually. So langid is really popular. And one of the things that I really liked about it is that it actually covers a lot of languages. So, I've actually had different pieces of information depending on the documentation that I was using,
Starting point is 00:19:19 either at PyPI or at the GitHub page. So at some point, I saw it was covering 97, and I think their GitHub page is saying 97. 97 is a lot of languages. I couldn't name 97 languages. I'm a linguist. I would have trouble naming, you know, 97 languages off the top of my head.
Starting point is 00:19:39 I definitely don't speak 97 languages. And some of the nice things about it is that you can use it as sort of like a standalone module, like a command line tool, for instance. But you can also use it as a web service. So that's really neat about it. And some more neat things that were really helpful when I was trying it out for some of the work projects: when you try to identify the human language using langid, it actually outputs the weight and the calculations on it,
Starting point is 00:20:20 which are, very typically, in log space. You have these funky numbers in the end, truly speaking. But the good thing is that you can actually convert them to more familiar confidence scores that data scientists especially are used to. And that actually comes in super handy, because sometimes when you're trying to filter out the data, you know that these kinds of tools are obviously not 100% reliable. You can also use this score to decide: okay, I'm taking this answer and I'm relying on that. Or maybe I'll just drop this piece of data altogether because it looks like
Starting point is 00:20:59 the language identifier is not actually super sure what kind of language it is, if you're targeting a specific language. This is wild. So it basically might say we're 80% sure it's English, but it might also be Spanish or something. Exactly, yeah.
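The conversion from log-space scores to confidence values, and the keep-or-drop decision built on it, can be sketched in plain Python. This assumes the raw scores behave like unnormalized log probabilities, so a softmax-style normalization is appropriate; the function names and the example scores are hypothetical and not taken from any library.

```python
import math


def normalize_log_scores(log_scores):
    """Turn {language: log-space score} into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_scores.values())
    exps = {lang: math.exp(score - top) for lang, score in log_scores.items()}
    total = sum(exps.values())
    return {lang: value / total for lang, value in exps.items()}


def confident_language(log_scores, threshold=0.8):
    """Return the best language, or None if its probability is below threshold."""
    probs = normalize_log_scores(log_scores)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None


# Hypothetical scores for one short text; real values depend on the library.
scores = {"en": -42.0, "de": -44.0, "fr": -47.5}
probs = normalize_log_scores(scores)
print({lang: round(p, 3) for lang, p in probs.items()})
print(confident_language(scores, threshold=0.8))
print(confident_language(scores, threshold=0.95))
```

Returning None for a low-confidence answer makes it easy to drop uncertain records in a filtering step rather than trusting a weak guess.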
Starting point is 00:21:19 English can be easily confused with maybe German or sometimes French, just because of so much of the vocabulary circulating between those languages. So yeah, the identifier is not going to be like 100% sure that, you know, this is the language. And the thing is, I'm not so sure about langid... yeah, langid is also statistical, actually, now that I'm remembering. And so is langdetect as well.
Starting point is 00:21:50 And sort of the flip side of that is that it actually works very well. The bigger the piece of data that you're feeding into it, the more confident it's going to be. Like, right, that's how statistics work. Yeah. That's how machine learning works, generally speaking. And if you're working specifically with this kind of short tweet, social media post, if it's a really short phrase, sentence, interspersed with emojis and stuff, it's probably not going
Starting point is 00:22:23 to be super confident. So the bigger the data, the more confident, the better the performance of the language identifier will be. So something to keep in mind when you're working with this kind of data and you're trying to filter by language. Yeah, that makes sense. If you have one word or something, it's very hard to go off. Yeah, exactly. So this being one file, sorry, this being one file is insane. Like it acts
Starting point is 00:22:46 as a web server and does all sorts of stuff. Crazy. This is crazy. Yeah, and it's something that I really like about it. Pretty lightweight, sort of well-isolated, low-dependency kind of package, which is fascinating. Based on that,
Starting point is 00:23:02 kind of not super sophisticated, naive Bayes algorithm, if I'm remembering it correctly. So yeah, that's really, really fun. It's really nice. It works so nicely. And the other one, langdetect, I happened to find a little bit more robust when I got to work with human language data in my project. And it's also really neat and easy to use. The great thing about the basic usage is it's very straightforward. It's like one of those packages you discover.
Starting point is 00:23:50 You know immediately what it's doing, how it's doing it, and you really can understand in five minutes if it's going to be something that's going to fit well in my project when I put it in. Sure. So the main functions are detect and detect_langs. So you can either just call it on a piece of data and get the most probable language that the package thinks it is. Or you can have it return the list of possible languages. So it's actually going to order them.
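A small sketch of how the ranked output might be used for filtering. The `detect_langs` function and `DetectorFactory.seed` come from langdetect, which is a third-party package and is imported lazily here so the rest of the example runs without it; `pick_language`, `ranked_languages`, and the 0.9 cutoff are hypothetical helpers chosen for illustration.

```python
def pick_language(candidates, min_prob=0.9):
    """candidates: list of (lang, prob) pairs, most probable first."""
    if not candidates:
        return None
    lang, prob = candidates[0]
    return lang if prob >= min_prob else None


def ranked_languages(text):
    """Rank languages for text with langdetect (third-party, imported lazily)."""
    from langdetect import DetectorFactory, detect_langs

    DetectorFactory.seed = 0  # make results repeatable across runs
    return [(guess.lang, guess.prob) for guess in detect_langs(text)]


# Hand-written candidates, so this part runs without langdetect installed.
print(pick_language([("en", 0.86), ("de", 0.14)]))
print(pick_language([("en", 0.86), ("de", 0.14)], min_prob=0.8))
# With langdetect installed:
#   pick_language(ranked_languages("This is a short English sentence."))
```

Keeping the decision logic separate from the library call makes the threshold easy to tune against your own data.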
Starting point is 00:24:28 You can get maybe English, and then there's a tiny fraction of probability that it's going to be maybe German, or something like that, and then you can decide for yourself. And, yeah, so, overall, from my experience, langdetect works, and identifies languages, a little bit better
Starting point is 00:24:43 than langid, but that's sort of just empirical. Yeah, that's great. It seems super useful for anyone that needs to parse text and can't be sure it's all in one language. Yeah, so if anyone now is working on some kind of data science project, working with natural language data, I would highly recommend it. And probably one of the reasons why langdetect is a little bit more confident and robust is that it covers fewer languages. So I think it's 55 languages total compared to
Starting point is 00:25:19 97 for langid. Yeah. Yeah. Interesting. Nice. Well, Michael, let me tell you about our sponsor for this episode. Before we move on, it's a podcast. Amazing. So this episode of Python
Starting point is 00:25:38 Bytes is sponsored by the Compiler Podcast from Red Hat. So everyone out there, just like you, Brian and I, we're both fans of podcasts, listening to podcasts all the time and stuff. That's why we started some, we like them. And so I'm happy to share a new one
Starting point is 00:25:51 from a highly respected open source company, Compiler, an original podcast from Red Hat. With more and more of us working from home or being more disconnected, it's important to keep our human connection with technology. Compiler unravels industry topics, trends and things you've always wanted to know about tech through interviews with the people who know best. So on Compiler, you'll hear a chorus of perspectives from diverse communities behind the code. These conversations include questions like, what is technical debt?
Starting point is 00:26:19 What are tech hiring managers actually looking for? Hint, see item one to some degree. And do you have to know how to code to get started with open source? All right. I was a guest on Red Hat's previous podcast called Command Line Heroes, and that was a super produced and polished podcast.
Starting point is 00:26:42 It was a really cool experience. And so Compiler follows along in that excellent tradition and that polished style. So I checked out episode 12, how we should handle failure, which I found really interesting. I really value their conversation about making space for developers to fail so they can learn without fear of making mistakes, you know, like taking down the production website and so on, right? People grow through experimentation, but they also fail if they try new things. So you got to make sure that they get a chance to grow. So learn about the Compiler podcast
Starting point is 00:27:11 at pythonbytes.fm/compiler. The link is in your podcast player show notes, right at the top. You can listen to it in all the places that you would think. So thanks to the Compiler podcast for keeping this podcast going strong. And Brian, I also just real quickly want to point out, I know people can just go to their podcast app, whether that's Pocket Casts or Overcast or whatever,
Starting point is 00:27:30 and type in Compiler and search, but please visit pythonbytes.fm/compiler, and there's a place to subscribe with all of your various podcast destinations. That way they know it came from us rather than just out of the ether. So if you're going to subscribe or check them out, please do it through that link, just so people know. Nice. Yeah. So how about we talk about watching some things, like files? Yeah, we were listening, so now we're watching. We were listening, now we're going to watch. But watch them for changes, not watch what they are. So this one comes to us from Samuel Colvin of Pydantic fame. So, you know, there's pretty cool experience behind developing this API. And the idea is it's a simple, modern, and high-performance way to watch files for changes.
Starting point is 00:28:16 So there's a lot of reasons you might want to do that. You might want to say, if somebody drops a file into this directory, I'm going to kick off a job to like load it up and process it in some kind of batch processing. Or I want to have my web framework automatically restart if any of the files in here get changed, right? Any of the Python files or whatever. So you could use it for things like that. But the modern part's pretty interesting. It hooks into the underlying file system, the underlying OS notification systems, and that's done through the Notify Rust library. So basically it's a low latency, high performance,
Starting point is 00:28:53 native non-polling way of watching the files. It just goes to the operating system and says, hey, in this directory tree, if anything changes, call the callback. Nice. That's pretty awesome. Yeah. So there's real simple uses here. Like I can say from watchfiles import watch, and then just for changes
Starting point is 00:29:09 in watch some path, then you can process those changes. So here's an example of an app that just starts, and its job is to, as things change here, pick them up. That might be an example of what I said about kicking off something to like load it and parse it and decide what to do and then maybe pass it to Celery for background work, right? On the other hand, you might want to do other things while you're watching for changes as well in your app. If it's asyncio-based, you can just kick off the watching bit and await the changes, and then do other async processing like FastAPI or web or database calls, you know, web with HTTPX or database calls with Beanie or whatever other asyncio things. And it sort of lets you run them in parallel, which is cool, right? Yeah. And then if you want to go even further, you can kick off a separate process and say, start
Starting point is 00:30:06 a process that will watch for changes here and then call back this function if those things change. So that's pretty cool too. So there's all these different ways in which you can use it. But yeah, it's pretty neat. It's based on this Rust library and it seems pretty powerful. There's also a CLI, and I did want to point out one other thing over here. I thought this might impress you, Brian. Definitely.
Starting point is 00:30:30 I can do a command line watchfiles command that will say watch this directory and if anything changes, rerun the failing tests. That's very cool. That's cool, right? So you just do watchfiles and you give it the string pytest --lf, which is pytest rerun the failing tests, if anything changes. I think that's neat. The command line stuff is actually cool. I'd check it out just for the command line usage.
Starting point is 00:30:55 But the ability to use it programmatically too with an API, that's impressive. And I'm very happy they included that. Yeah, absolutely. If you're going to use it through the CLI, this is the perfect pipx install type thing, right? pipx install watchfiles and then it's not really tied to any of your projects. It's just always there. Anna, what do you think?
Starting point is 00:31:14 Yeah, that looks super neat. It just made me immediately think about file triggers, which are one of the things that are built into most of these. They're just widely used, and we now see them in cloud storage as well. Yeah. Yep.
Starting point is 00:31:30 I can imagine like all the possible ways that it can be used. Yeah, that's really neat. I wonder if in their documentation, they actually provide any popular use cases or anything. They might not do that, but I'm curious if they actually do. Yeah, I didn't see any in particular. I just took a couple of examples on how you might use it and all. But yeah.
Starting point is 00:31:50 Yeah. There's an older project called watchgod. I don't know anything about that one. But I'm glad I didn't learn about it because now there's a new one called watchfiles. But if you're using the old one, this is the successor to that as well. It's a funny name, but I could see why some people might not want to use it. So, yeah. Well, I can see item one, right? Pick a name that people are willing to talk about. Exactly. Yeah. Well, I want to talk about a new tool as well, for coverage. So hopefully all of us are familiar with coverage.py. It's maintained by Ned Batchelder, a really cool tool.
Starting point is 00:32:25 But there's a new guy on the scene, and the new person on the scene is Slipcover. So Slipcover, and actually I heard about Slipcover through the coverage.py Twitter account, which was interesting. And so not surprising, though. Ned's a pretty open-minded guy. But so Slipcover is coverage, but it's pretty new. So some of these commits, it's just within the last week or so that this came in. So it's still at, I think the version is 0.1.1 or something like that. We even just got a new one out this morning.
Starting point is 00:33:03 So why would you want to use something different? Well, the big selling point of this is it's really fast. It uses a different process for getting the coverage information. And it supposedly has only a 3% overhead, whereas, depending on your code, coverage.py can sometimes slow down your code significantly. And if you've got a really long-running test suite, even 20% slower hurts, but sometimes coverage can make it like twice as slow. So if you've got a five-minute test suite, that makes it 10 minutes, and that's a little painful. So this might be worth checking out. It's quite a bit faster. I tried it against Flask as an example, and the Flask numbers, so Flask has got a pretty tight test suite anyway, but so just
Starting point is 00:33:58 straight pytest on my machine, it was like 2.7 seconds; with coverage it was about 4.3 seconds. And then with Slipcover, it was just a little slower than just pytest. So pytest 2.7, with Slipcover 2.88. So just a little tiny bit more and you get coverage information. That's pretty cool. It is in the early stages though.
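To put the quoted timings in terms of overhead, the arithmetic can be worked out as below. This is only a sketch using the three numbers mentioned in the episode (2.7 s, 4.3 s, 2.88 s from a single informal run of the Flask test suite); `overhead_pct` is a hypothetical helper name.

```python
def overhead_pct(baseline_s, measured_s):
    """Extra run time relative to the baseline, as a percentage."""
    return (measured_s - baseline_s) / baseline_s * 100


baseline = 2.7  # plain pytest, in seconds

print(f"coverage.py: {overhead_pct(baseline, 4.3):.0f}% overhead")   # 59%
print(f"Slipcover:   {overhead_pct(baseline, 2.88):.0f}% overhead")  # 7%
```

On this one run Slipcover lands near 7% rather than the advertised 3%, which is still far below the roughly 59% measured for coverage.py; a single short run is too noisy to read more into it than that.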
Starting point is 00:34:20 There are some kinks to work out still. So I would try it out and watch this space. I think they're doing some really cool things, definitely worth watching. But, for instance, I ran into issues on projects that use pytest plugins. I don't know why, but the plugins don't get loaded. So, for instance, I tried to run this Flask example but with xdist so that I could run all the tests in parallel to see if it sped up parallel runs. It didn't recognize the parallelism, so I'm not sure what's going on there. But I am in communication with Juan, one of the maintainers of this, to let him know what I found out. I'm not just griping and not trying to make it better.
Starting point is 00:35:06 I'd love to have this be a really cool tool. It looks neat. Yeah, go ahead, Anna. Yeah, and so the near-zero overhead is mostly due to how they managed to provide that. We talk about it in the world of presentation. It's really interesting. Yeah. Yeah, with such a
Starting point is 00:35:26 overhead, I'm tempted to think of a more diabolical use of it. Like, I'm handed some crummy old app that doesn't really have tests. I've got to figure out, well, what part of this is dead? Because I don't know if you've ever picked up some old app that's evolved and evolved and there's just stuff people don't take out because they're afraid to. Just run this in production for a while. Oh, yeah. And just go, okay, these things don't look like they're doing it. There might be some case I need to track down,
Starting point is 00:35:52 but this gray area over here that's not touched, let me look for things to delete over here. That'd be kind of fun. That's my favorite use of coverage, looking for dead code. Yeah, exactly. Before we move off this, Brian, Avara asks, does it have a pytest plugin?
Starting point is 00:36:06 I know you said it doesn't work to run plugins, but this is the reverse question. I don't think so. So you're running Slipcover and pytest at the same time. I don't think you really need a pytest plugin for it. It does work with pytest, so you can run pytest operations with it. Nice. Just not the bells and whistles yet. Right. But I'm sure they'll get there. Yep.
Starting point is 00:36:33 I would love to circle back to the data. It may sound like a broken record, but that's my favorite topic. No, it's great to have you on to talk about it because Brian and I don't live in the data science world, right? So it's really cool. Well, you're welcome in our world. There's a lot
Starting point is 00:36:51 of fun stuff happening here. Well, actually, if you think about it, it actually is the beginning, right? Even before trying to wrangle the data and trying to uncover any interesting information from the data, you have to get it somehow. And sometimes, particularly if you're working on some sort of side projects on your own,
Starting point is 00:37:14 you want to maybe try out a new tool or maybe if you're doing like a machine learning project modeling approach, you usually need some very specific data to work on. And how do you get the data? Well, you have to actually go and maybe find some examples of the data on your own. And so something I wanted to talk about today was actually web crawling and web scraping and a couple of tools for that. So one that is quite popular and it's actually like an industrial grade kind of tool is, well, Scrapy; there are a couple of ways to pronounce it,
Starting point is 00:37:54 and I've heard both variants. Yeah. And it is a pretty great tool. So one of the great things from the get-go about it is that it actually has a built-in shell, so you can just go ahead
Starting point is 00:38:13 and sort of try out things in the CLI, get the response from a URL, for instance, and then try to poke around in it and then test out the behavior, which is really nice, and then see what kind of things
Starting point is 00:38:26 you might want from there. And if you actually go ahead and use it for your module to get some data, it provides all sorts of real-life functionality. To begin with, for instance, there's a choice between using either CSS selectors for the content of the pages or XPath, which is obviously a little bit more flexible.
Starting point is 00:38:55 All the colors and trails. It's more fragile though, because if they make any change to the page, it's... That also, yeah. Yeah. But still, yeah, well, it's part of the game. Yeah, that's right. Yeah, and then some other really nice things about it
Starting point is 00:39:10 is that actually they do a lot of heavy lifting for you in terms of scaffolding and templating. So they have built-in commands, like startproject, and you can run that. And right away, you have the whole structure and also the boilerplate code. And you just fill in certain pieces, for item processing, which is in the pipelines module, the things in the settings, etc.
Starting point is 00:39:39 And there you go. You have a huge amount of work already pre-done for you. And then some other nice things about it is that they also provide you with numerous choices for exporting the data and for storing the data as well in a few places, and the formats that you would love to use for it. All the typical standard things like CSV, JSON, etc. Sure. Some more, some less frequent options, really.
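As a rough illustration of the "fill in certain pieces" step, an item pipeline in a generated Scrapy project is a plain class with a `process_item(self, item, spider)` method. The sketch below assumes dict-style items with hypothetical `text` and `tags` fields; the class name `NormalizeQuotePipeline` is made up for the example.

```python
class NormalizeQuotePipeline:
    """Hypothetical item pipeline: tidies two fields on each scraped item."""

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item. Returning the item hands it
        # to the next pipeline stage and, eventually, to the exporter.
        item["text"] = " ".join(item.get("text", "").split())  # collapse whitespace
        item["tags"] = sorted({tag.strip().lower() for tag in item.get("tags", [])})
        return item
```

In a real project this would be switched on through the `ITEM_PIPELINES` setting in settings.py, with an entry such as `"myproject.pipelines.NormalizeQuotePipeline": 300` (the `myproject` path is an assumption). The export format is usually chosen on the command line, for example `scrapy crawl <spider> -O items.json`, or with a `.csv` filename for CSV output.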
Starting point is 00:40:17 Yeah, another thing that's pretty interesting about this whole project is that there's a web-scraping-as-a-service company behind it, right? It used to be called Scrapinghub, now it's Zyte, Z-Y-T-E. And you can basically go in there and just sign up and hand it one of these spiders and it'll just run it on the different servers, try to avoid getting blocked, all that crazy stuff. Exactly, yeah. So therefore, it's so elaborate.
Starting point is 00:40:44 And they really put a lot of, just like I was talking before, a lot of blood and tears, like all sorts of functionality, like covering all those corners of what you might want from a web crawling tool. And some other examples that I found particularly useful, for instance, is the LinkExtractor class. It's like really getting to like the ingredient parts of the tool where you can extract further
Starting point is 00:41:13 links from the page, but only those ones that, you know, adhere to a particular pattern, for instance. And the list that you get is already deduped. So once again, it takes a little bit of the dirty work off your plate. So that's really great. And they do provide ways to interact with the pages as well. There's a FormRequest class that you can use, and it does provide some functionality where you can interact with the page, but I haven't used it as much myself, so I'm not entirely sure how fascinating it is.
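The link extraction behavior described above (keep only links that match a pattern, and return them already deduplicated) can be illustrated with a standard-library sketch. This is not Scrapy's implementation, which handles far more cases; `extract_links` and `_AnchorCollector` are hypothetical names used only to show the idea.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, allow):
    """Return absolute links matching the `allow` regex, without duplicates."""
    collector = _AnchorCollector()
    collector.feed(html)
    pattern = re.compile(allow)
    seen = set()
    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if pattern.search(url) and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

In Scrapy itself the equivalent would be along the lines of `LinkExtractor(allow=r"/blog/\d+").extract_links(response)`, imported from `scrapy.linkextractors`; the `/blog/` pattern here is just an example.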
Starting point is 00:41:53 But it is probably well done as well. And another library that I wanted to touch on briefly today as well was Robox. That's actually something new for me. That's something I'm in the process of exploring, so I haven't had a chance to work a whole lot with it yet. But it's been really, really interesting and I would be happy if I got to hear from somebody else who has tried it out or something. Because in the first place, it's built on top of HTTPX and Beautiful Soup, Beautiful Soup 4, rather. They're super popular in the data processing line of work and particularly in web scraping.
Starting point is 00:42:43 But they've added some really useful functionality and it looks like it allows even more of this interaction with the pages in a very neat and clean way. You can probably find examples somewhere here in the documentation. It looks so nice and clean and straightforward. It looks lovely. So yeah, I'm really excited about this package and hoping to have an opportunity to test it out soon.
Starting point is 00:43:13 Yeah, Robox looks very interesting. It looks very Selenium-like, where you could actually control the page. It's more like, fill in the comments with this, fill in the first name with that, and then submit. The other thing that's cool about it is it has async support for doing all this. Exactly. You can scale it. Yeah. Oh, that's fantastic. Awesome.
Starting point is 00:43:33 Thanks. Nice. Well, where are we at now? We have extras. Extras. Extra, extra, extra. Hear all about it. I only got one.
Starting point is 00:43:41 How many you got? I got zero. Zero. All right. Anna, anything else you want to give a quick shout out to while we're here? No. No. Okay, cool. Well, I wanted to tell you all about my terminal adventures, I suppose we'll call them. So I've been using Oh My Zsh, which is amazing. I love Oh My Zsh. But I also started playing with Oh My Posh
Starting point is 00:44:02 and pls and some of these other things. And I thought, oh, well, how am I going to decide between, say, Oh My Posh and Oh My Zsh? Well, it turns out, Brian, you don't have to decide. You get both. So here's a little animated video I'll throw up for people who are watching. I'll put it in the links as well.
Starting point is 00:44:18 So here, you can see this cool prompt, which is all driven by Oh My Posh, but you can see autocomplete into local Git branches through Oh My Zsh for either branch or checkout. And then on top of that we can do pls, which is amazing. Oh, and McFly, we talked about McFly before, which gives you autocomplete into your history and sort of an Emacs-style editor type of AI complete. Then pls for an ls replacement that is developer friendly, with little icons for the file types, and it uses .gitignore to hide stuff that you don't want to see, and it's Python friendly, like it understands venvs and
Starting point is 00:44:58 de-emphasizes them and all that kind of stuff. So anyway, people have been trying to decide between these things. It turns out they all go well together. You don't have to decide. That's pretty cool. Yeah. Yeah. Yeah, I really like Oh My Zsh, and that looks even, yeah.
Starting point is 00:45:14 Yeah, all the stuff that works, you don't have to give up any of it. The only thing that isn't there is the prompt and the prompt is not all that great. Honestly, I mean, I know you can customize it, but I think it's better in Oh My Posh,
Starting point is 00:45:24 which is pretty amazing. So people who are listening, they can check out the little video. I'll somehow find a way to link to that in the show notes. So you all can check it out. Okay. That's my extra. Yeah. Yeah. How about a joke? So we're all starting to go back out to dinner, restaurants, COVID's over, I hear. Not necessarily, but here's one from a slightly different perspective. It says, hello, I'm your server today. Brian, can you just describe for people listening what's in this picture? There's two robots at a restaurant sitting down, and there's a server rack next to them.
Starting point is 00:46:03 Okay. Like a server rack next to them. Yeah, okay. And the subtitle is, when you go out for a byte, B-Y-T-E. The server is by the table where the robots are drinking. He says, my name is DHX005972 and I will be your server this evening. That's funny. Follow this one. Thanks.
Starting point is 00:46:19 All right, that's what I got for our joke today. Nice. Well, thanks, Anna, for joining us today. Thank you. Thanks for having me. Yeah, it was great. Thank you, Brian, as always, and everyone out there listening. Thanks so much.
Starting point is 00:46:32 Have a good one, everyone.
