Python Bytes - #250 skorch your scikit-learn together with PyTorch

Episode Date: September 15, 2021

Topics covered in this episode:

- Exciting New Ways To Be Told That Your Python Code is Bad
- GitHub Readme Stats
- Nox
- Two tools for dealing with text
- MPIRE (MultiProcessing Is Really Easy)
- skorch
- Extras
- Joke

See the full show notes for this episode on the website at pythonbytes.fm/250

Transcript
Starting point is 00:00:00 Hey there, thanks for listening. Before we jump into this episode, I just want to remind you that this episode is brought to you by us over at TalkPython Training and Brian through his PyTest book. So if you want to get hands-on and learn something with Python, be sure to consider our courses over at TalkPython Training.
Starting point is 00:00:17 Visit them via pythonbytes.fm slash courses. And if you're looking to do testing and get better with PyTest, check out Brian's book at pythonbytes.fm slash PyTest. Enjoy the episode. Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 250, recorded September 15th, 2021. I'm Michael Kennedy. And I'm Brian Okken. And I am Prayson.
Starting point is 00:00:42 Brayson, welcome to Python Bytes. Yeah, it's a pleasure. I've been looking so much forward to joining you guys. Yeah, you've been somebody out there who's been giving us a lot of good ideas and topics and helping us learn about new things. So you've been a big supporter of the show, and now you are part of the show. Yeah, hurrah. Hurrah.
Starting point is 00:01:01 Yeah, hurrah. I've been looking so much for it. Like for the first time I saw, oh, we can take part in this. I go like, oh, I should try to just get myself in there. And here I am. Yeah, here you are. Thanks for, thanks for doing that. That's really nice.
Starting point is 00:01:17 Tell people a bit about yourself before we dive into Brian's first topic. Yes. Well, well, my name is Preysen Daniel and I'm originally from Tanzania, but living in Denmark, married with three awesome kids. Currently, I'm a principal data scientist at NTT Data Business Solution
Starting point is 00:01:41 here in Copenhagen. And yeah, so accidentally became a data scientist and somehow discovering that I was really, really good at it. Then I just started climbing my way up thanks to the Python community and everything that is out there. Yeah, awesome. Congratulations. Nice to see you finding your way in the data science world. Very cool.
Starting point is 00:02:05 Accidentally becoming a data scientist. That's interesting. Exactly. Congratulations. Nice to see you finding your way in the data science world. Very cool. Yeah. Accidentally becoming a data scientist. That's interesting. Exactly. Yeah. All right, Brian, have people been doing things wrong? I think so. Including race conditions with screen sharing. Yeah. So I just couldn't resist this article. There's an article out called exciting new ways to be told that your Python
Starting point is 00:02:25 code is bad, which is just a great title. And the, the, the gist is there's two new pilot errors. So it's pretty simple. There's, but it made me think about my code a little bit. And the first one is, is an error to tell you to consider ternary expressions. So if you've got like if condition, and then you assign a variable in both the if clause and the else clause, and it's a short thing, maybe use a conditional expression instead and do all in one line, like say, and one of the examples in the blog post says x equals four if condition else five. So ternary operators are pretty cool and they're pretty easy to read in Python. But I was just curious what you thought. Is this is a ternary expression easier to read or more difficult?
Starting point is 00:03:23 Well, for me, I think this is pretty nice. I'm always on the edge about the ternary condition, the value if condition else other value. A lot of times it starts to stretch out to be a little bit verbose and then it's kind of, you know, it's not entirely obvious. One thing I recently learned about, I don't know how it took me so long, is the simpler version of that, like variable or other option at, without the, the, if else just the thing or that thing. Right. So for example, if you try to get a user back, um, and you just want to return the user or you want to return, uh, maybe you want to check if they're admin, if they are, you return
Starting point is 00:04:00 them. Otherwise you might turn them back. You could say something to the effect of if I say user or result equals user or false or something like that. It's not a totally good example here. But this super short version of value where you kind of have
Starting point is 00:04:15 the return value and the test and then the alt fallback, the else piece, it wouldn't work in the example I have here, but that one I actually started to really like because it's so concise. I don't know. I think I'm very traditional. I like reading my code up going down.
Starting point is 00:04:32 So whenever it started stretching sideways to me, I go like, oh, okay. I think I just love the flow of if, then I now have to look down for the else, right? But now I have to look the else from the other side. Then, yeah, but one-liners are good in some places, but in most of the cases, out of readability, I usually just try to avoid them.
Starting point is 00:04:53 Yeah, I do as well. The one thing I was thinking is interesting on the data science side, Preysen, is a lot of times you're trying to take, instead of statements, multiple lines, you're trying to create little expressions that you can put together in like little list comprehensions and other types of things and these these one
Starting point is 00:05:09 liners become really valuable there yeah yeah definitely definitely mostly when we're using lambdas everywhere right yes exactly yeah exactly so they then the next error condition is funny i think and it's just the while is used so it's just a warning to say you have a while in your code. And this, the comment really is there's, it's just not really usually good to have a while because it can like never terminate. You can, there's no, it's not guaranteed to terminate if you've got a while loop. So I thought that was interesting. I, I actually was just thinking about this the other day is that I can't even remember the last time I've used a while loop in some code.
Starting point is 00:05:51 So I think this is actually pretty good just to warn people they've got a while loop. It's pretty strong. It's a pretty strong warning to say you have used this language construct. That's a problem i i certainly think it's i'm on board with the the zen of the idea that most of the time a while means you're doing it wrong most of the time you could probably iterate over a collection or you could enumerate and then iterate over the index and the value but there are times where you actually need to test for something and then break out and to put it as a full-on warning just for its existence yeah to me it seems a bit too far but it's it's interesting
Starting point is 00:06:32 to say the first one um yeah i think these are both sort of in the eye of the beholder a bit yeah yeah i actually been in like in our team or in my whole existence i i think we're using while only once and this is on the computer vision. So you are trying to capture videos from the camera and then do analysis with them. So it says while there's a frame, keep on doing this. And of course you always have to catch some way to go out of this while loop.
Starting point is 00:07:02 But I think that's the only time we use while. And we usually want people to say never use while except when we're doing computer vision. Interesting. Yeah. Especially if you've got things like pandas and stuff where maybe you shouldn't even be looping at all. No, no. Not at all. Not at all. Yeah. Interesting. Interesting. A couple of thoughts from the live stream. So Sam Morley out there says,
Starting point is 00:07:23 X equals Y or Z is really handy for setting instance variables in a class where they're using nones. I totally agree. Chris May, hey Chris, says, Turner is a great idea if it's simple. Else, not so much. Yeah. Nice. Clever. Brandon Brainerd out there agrees with you,
Starting point is 00:07:39 that the traditional if-else is probably easier to read. Henry Schreider says, Turner is much better for type checking as well. Okay. Yeah, probably because the type reference is more obvious there. So yeah, pretty neat, pretty neat.
Starting point is 00:07:54 Also speaking of neat stuff, what if you could have all sorts of little placards and things about your readme? So here is a project I want to tell people about called GitHub Readme Stats. placards and things about your readme. So here is a project I want to tell people about called GitHub readme stats. And GitHub readme stats is pretty interesting. It's comes to us from Poma. So thank you, Poma for sending that in. And the description says it's it dynamically, dynamically generated stats for your GitHub readmes. But I feel like that scope is actually
Starting point is 00:08:23 way too short. It's dynamically generated little placards for wherever you want to put them on the internet. You might want to put them on a project's readme so the project can describe itself more dynamically, but you might also want to put it on your about page on your blog or something like that. So give you all a sense of what's going on here. If you come down here, you can have these different, there's a whole bunch of different options. You can get like a GitHub stats card, you can get extra pins, you can get the languages. Like for example, we could say what the languages you are most likely to use across all of your repositories, the walk of time, week stats, there's a bunch of themes and visualizations and stuff. So I think the best way
Starting point is 00:09:06 to get a sense of this is to see an example. So I put a couple of projects in my own self in here to kind of pick on me. So here's an image that I could add. I'll zoom that in. So I have this Python switch package that I created a while ago when Python didn't have anything like a switch statement. So I wanted to add a switch statement to the Python language, so I did. And apparently, here are the stats of it. These are live. If I refresh it, it'll regenerate it.
Starting point is 00:09:30 And it gives you a little bit of info about the project, like the name and its little description. It's mostly Python. As it says, it has 238 stars and 18 forks, which is pretty awesome. So all I got to do to get that is go up here and say, I want to get the pin and I want to have the username be Mike C. Kennedy and the repo be Python dash switch.
Starting point is 00:09:50 And then this returns an image that I can put, like I said, anywhere, right? If you put this as the image source, it'll go. It's not just like it'll only render on GitHub. It'll go wherever you put it. So I think that that's pretty cool. Another example would be your stats. I'll refresh this because there's a little animation. I can get my Michael Kennedy's GitHub stats.
Starting point is 00:10:08 Apparently I have an A++, but a two thirds closed red ring. I'm not totally sure what the ring means, but kind of a cool little graphic here. Apparently I've got 3.5 thousand stars, which surprises me. A lot of commits, 73 PRs, 103 issues, 23 repositories I contributed to. I don't know if that's this year or maybe this year. Who knows? Or total. Anyway, that's kind of cool, right? 73 PRs, 103 issues, 23 repositories I contributed to. I don't know if that's this year or maybe this year, who knows? Or total. Anyway, that's kind of cool, right? You could put that on your blog or somewhere where you're trying to talk about yourself,
Starting point is 00:10:33 like you're trying to get hired or you do consulting or something. And then the third one here is you can say your most used languages. So apparently I have most used JavaScript, which is very much not true. But I've probably committed a ton of like node modules to some projects that I don't actually want to have to, you know, re NPM install. I want to just make sure they're there for like a course or something like that. Right. But it'll show you through the breakdown of your various languages and whatnot. So that gives you kind of a sense of what these are all about, what the idea of this thing is. You generate these little cards and you can put them,
Starting point is 00:11:06 like I said, wherever you want. What do you think? Like on a resume page. Yeah. I really love it, but it's kind of sad because most of our time is spent in GitLab and all this other, and all our commits are done there. And then when I come to my GitHub, it looks so empty,
Starting point is 00:11:23 and it makes my heart ache. What has Prez been doing? He hasn't committed anything for a week. Yeah. Yeah. So it's really really awesome.
Starting point is 00:11:31 Yeah. Cool. Yeah I guess it really only works for GitHub and that's where it's really handy but still pretty nice. Do you know if the stats
Starting point is 00:11:37 are only on public repos or are they public and private? It's a good question. So you can choose as a user if you go down here and like the stuff that shows in your contributions,
Starting point is 00:11:47 in your GitHub profile, you can check whether you want public and private contributions to appear in that little green of how much contributions have you made this year by day. So maybe it depends on whether you've checked that or not. You know what I mean? Probably.
Starting point is 00:12:05 But it might not. Cool. Anyway, yeah, pretty cool little project. Bryson, you're up next. What you got? Yes, yes, yes. So I got this one here. Actually, this is something that has been covered.
Starting point is 00:12:18 Not covered, covered, but been mentioned. So I could see it in thenotes as when I searched through. Actually, Brian, you covered it in episode 182 with Hypermodern Python. I think it's just a name that was there. Yeah, but it was not mentioned. I think it's just been, oh, this could be used in this Hypermodern Python way of doing awesome stuff. And then in episode 248, it was mentioned again with hyper modern python cookie cutter but it's just like a footnote of oh it use uh nox instead of tox so this is really really an
Starting point is 00:12:55 awesome tool that we've been using uh recently because we uh when we do machine learning we are encountering a lot of problems where we have to test how our models are performing and how are they ethical. So the test, when we do tests of our pipelines, we're not just testing that the models are accurate or they are doing the things that they're doing, like the API. It's actually, you cannot just ping our API, you need to have keys and all those. We actually also have to test about the ethicalness of our models. So like if we say our models does not segregate
Starting point is 00:13:34 between, let's say, gender. So we test, we have counterfactual tests where we send different genders and see what are the models responding. Are they responding with similar results? So when we say it doesn't segregate between sexual orientation, then we send different inputs where it pretends to be either straight or homosexual and just try to say, do we receive the same results? So we've been trying to run this very in an automatic way. And before that, we use a lot of talks.
Starting point is 00:14:13 But the problem is, the way of defining your talks is just not Pythonic. You don't write this Pythonic way of doing things. It's similar to, we had this issue with Make. I really could not debug Make. So whenever I made a Make file, I copied from someone else and then changed some things because anything I touched, then I have a syntax error. Oh, this thing is not in the right place.
Starting point is 00:14:40 And then I came across Evoke, which it was almost like Pythonic. I can write everything in a Python way. So this Knox is actually similar to what Evoke did to Make, but it's doing exactly to Tux. So in this case, you can create simple pipelines like this one here, where it creates a session, installs the package that needs to be installed,
Starting point is 00:15:08 and then run whatever experiments you're trying to run. This is really handy, at least we found it really handy because you can select that it actually use the Conda environment, like the Conda world has been used a lot in data science. So you can say first create a conda virtual environment, install these packages and then test them. So what I like about this tool,
Starting point is 00:15:32 it's almost similar to PyTest. Like if you know how PyTest works, then you know how this guy works because there's a parameterization and whenever you run tests, you can select which part of station needs to be run. Like in PyTest, we use the dash K, run this kind of test. And here you use the same thing, dash K, rather only this kind of builds, right? So it is dope. We really, really enjoy that.
Starting point is 00:16:01 Like you can pass in a invariant variable, but I actually wanted to show you the coolest part here. Yeah, this does look nice. It's just amazing. I cannot, I mean, the guy who created this, I just give him all the thumbs up with everything that they have come up with. So it's really, really handy. If you're not using it
Starting point is 00:16:25 or if you're using Tux, you should probably consider changing to Nux. That's cool. You can, for example, write that you have a test and then say, I want this as a decorator, sort of parameterized. I want this to run on 2736, 3738, and then it'll do that, right?
Starting point is 00:16:43 Yeah. So you can see it's like this example here, right? So you can see we are parameterizing a different Django. So we want it to first install this version and then run the tests, right? And then later it will come and take this version and run the test. But then in the command line,
Starting point is 00:17:01 you can actually just select it to run only the test with this guy and skip this guy here. So it's really, I mean, it's the ability that it gives you, it's incredible. So if I could see, so you can see like here, right here, right? This is exactly what like it goes into the PyTest-ish world. I see. So you can run it and say, don't run the linter, or just lint it, don't run the test, or test.
Starting point is 00:17:29 You can even put Python expressions, it looks like, test and not lint, for example. It's just insanely great. Nice. Brian, what do you think of this? Oh, I really like Knox. It's neat.
Starting point is 00:17:44 The parameterize, the use of parameterize is really cool. And the example of using a couple different jangos is good, but you can also build up matrices of testing easily with a couple. You can stack these, so you can have two parameterize together. It's a pretty cool project. I just really love talks, so I haven't switched.
Starting point is 00:18:10 But I know that there's like Invoke also. People are using Invoke for automation, but people are using Knox for more than just automating testing. You can automate really whatever you want to. It's just running a command, right?
Starting point is 00:18:27 Nice. Yeah. Tracy, you've got a lot of comments from the live stream on this one. Henry Schreider says, I love Knox. Tox is mired in backwards compatibility defaults. It is hard to tell what's actually doing,
Starting point is 00:18:39 whereas Knox is simple. It doesn't hide or guess stuff. It's just programmed like PyTest. Sounds great. Sam Morley says, this is the only way to write a makefile. I mean, I had that one. Yeah. Henry also says the PyPA projects have some very powerful Knox files, CI build, wheel, pip, and so on, which is good. And then Sam Morley also has a question for you. Can it also, Knox, run external tools?
Starting point is 00:19:10 For example, build a C extension or run a C test suite? Oh, I don't know, Brian. I don't know that either. I assume so. It definitely can because Python has subprocess, but can it do it without you forcing that into it? But you could technically, Python, call this other command, right? Well, there's an example in the tutorial of calling cmake.
Starting point is 00:19:36 Yeah, I saw the cmake as well. So that probably counts, right? Yeah. Yeah, I think that would count. So it's just running a command. Yeah. Of course. And then, Brian, Brandon out there has a comment for you. New lights's just running a command. Yeah. Yeah. Of course. And then Brian,
Starting point is 00:19:45 Brandon out there has a comment for you. New lights look great. I agree with him. I actually need to adjust my camera a little bit, which is a little bit off on the lights. Very cool. All right. Let's see.
Starting point is 00:19:56 I think Brian, you got the next one. Oh, okay. I forgot what I was talking about. Yeah. So I've got the old document there. So I've got a couple of things I wanted was talking about. Yeah, so I've got the old document there. So I've got a couple of things I wanted to talk about. So this is one of those extra, extra, extra things, but there's just two.
Starting point is 00:20:12 A couple of things around dealing with text. And I've been playing with my blog a little bit lately, not really writing much, which is a problem, but actually dealing with some of the old things. Well, what you wrote looks really good now. Well, I'm doing some automate, trying to automate some of the parsing of some of the old stuff. So I grabbed a whole bunch of blog posts from WordPress and which, yeah, nobody needs to throw eggs at me. I'm already switching and using Hugo now. But I've got a whole bunch of files
Starting point is 00:20:47 that I automatically generated Markdown files, but there's problems with them. So I have to keep track of them. So I've got some scripts. So a couple of tools are helping me. Python front matter is a really pretty, it's a package that's, it's just a really small package,
Starting point is 00:21:04 but all it does is really takes like YAML style front matter stuff and parses those. You could just load it. So you load, I'm using a Markdown file. So the example shows a text file. And you can get at all the pieces of the file, like the content and stuff. But for instance, I can grab the title. You can look at what the pieces of the file, like the content and stuff. But for instance, I can grab the title,
Starting point is 00:21:27 you can look at what the keys are. But so for blog posts, I've got, you know, tags and the date, and it's all converted to Python objects. So if I have a date listed in a blog post, it'll show up as a date time object. So you can do math on it and all sorts of stuff. So this is pretty cool. It's really small, but super handy for what I need. So it's good. Yeah, this looks nice.
Starting point is 00:21:56 The other tool I wanted to talk about, which is even a tinier use case, I think, is called FTFY, fixes text for you. And really it just takes bad unicode conversions and makes them good so it takes like common problems with unicode conversions and uh fixes them in like where it looks like you have greek or russian letters or something instead of a space or apostrophe or something like that yeah like the one of the first example a quick example there's like yeah like this weird ae character and really it was intended to be a check mark so it just converted it to the the proper what it was i'm not sure how it's doing this but it's pretty neat that is very cool um the the this gets me all the time with stuff like
Starting point is 00:22:42 goes from word if i'm converting from word or something, um, or copying, copy and pasting, uh, or other things. There's a lot of different quote marks that word processors put in and T, it's this weird, ugly, big Unicode thing. Yeah, so just replacing that with an apostrophe is a good idea. Yeah, nice. Does it change single quotes to double quotes and stuff like that as well? I don't know. Should it? I don't know if it should either i'm not sure okay yeah this is cool so you just run this across like your markdown files or something like that yeah so i'm not using it really for the blog stuff but
Starting point is 00:23:36 there's there was some other text parsing i was doing where i was scraping some information from somewhere and it just was just gross uh it was a had a bunch of gross unicode stuff in it and i just wanted to you know have something easy to just convert it quickly and this does the trick yeah very cool nice one nice finds so i'd follow up on that i was playing with my oh my posh shell and the new windows terminal and the new windows power shell on windows 11 earlier this week trying to set up some testing over there and i found they have all these cool themes that show you all kinds of neat stuff so you can see like uh the git branch you're on and they've got these little cool arrows and all these colors and they'll even do certain things for like showing
Starting point is 00:24:22 the version of the Python virtual environment that's active in the prompt and stuff like that if you activate the virtual environment and all that had a bunch of weird blocks and like squiggly junk like that and so it's not exactly the same problem I'm going to talk more about this later but there I found that there's this place called nerd fonts and apparently Hoshel is tested on nerd fonts but nerd fonts is full of all these amazing developer fonts that have font ligatures and all sorts of cool stuff. And they're all free. There's like 50 developer fonts and terminal fonts and stuff. So yeah, one more, one more thing along those
Starting point is 00:24:56 lines to check out. Very neat. But what I wanted to talk about is stealing this idea from Pracin that he was going to cover, but I got to it, got to it before. So there's this new project that recently is making traction. It's, it's been around for a couple of months, even, I guess it's about two years old, honestly, but somehow it got discovered and is now getting some traction called Empire, M-P-I-R-E. And the idea is it's a Python package for easy multiprocessing. It's like the multiprocessing module, but faster, better, stronger. It's like the bionic one.
Starting point is 00:25:32 So the acronym stands for multiprocessing is really easy. I love that thought. And it primarily works around taking multiprocessing pools, but then adding on some features that make it more efficient. For example, instead of creating a clone,
Starting point is 00:25:49 a copy of every object that gets shared across all the multiprocessing, it'll actually do copy and write. So it won't make a copy of the objects you're just reading. It'll only make a copy of the ones you're changing. So if you start like 10 sub-processes, you might not have to make copies, 10 copies of that, which can make it faster.
Starting point is 00:26:06 It comes with cool like progress bar functionality and insight to how much progress it's made. It's also supposed to be faster. I'll talk about it in a second. But it has map, map unordered, and things like that, iterative maps. The copy on write I talked about, which is cool. Each worker has its own state and some like startup shutdown type of behaviors you can add to it. It has integration with TQDM, the progress bar.
Starting point is 00:26:33 What else does it have? Like I said, some insights. It has user-friendly exception handling, which is pretty awesome. You can also do automatic chunking to break up blocks of queues across sub-processes and multiprocessing, including numpy arrays. You can adjust the maximum number of tasks or restart them after a certain
Starting point is 00:26:53 number, restart the worker processes after a certain amount of work. So in case there's like a memory leak or it's just hasn't cleaned it up, you can sort of work on that and create pools of these workers with like a daemon option. So they're just up and running and they grab the work. Let's see, it can be pinned to a specific or a range, specific CPU or a range of CPUs, which can be useful for cache invalidation. So if you're getting a lot of like thrashing and moving across different CPUs, then the caches have to read different data, which is of course way, way, way slower. So a bunch of neat things. I'll show you a quick example. So in the docs, if you pull their page up, there's a multi-processing example. So
Starting point is 00:27:35 you write a function and then you say with pool processes equals five as pool, pool.map and give the function and the data interval and it runs each one through there with the empire one it's quite simple similar as you just create a empire worker pool and you specify the number of jobs it says the difference of the code are small you don't have to relearn anything but you get things like all the stuff i talked about the more efficient shared objects the progress bar if you want you can just say progress bar equals true and you automatically get a cool little TQDM progress bar. You get startup and shutdown methods for the workers so you can like initialize them and what else you need to do. So yeah, pretty cool little project and the benchmarks show it down here at the bottom in the fast area. So you all can check that out. Grayson, what did you like about this? Well, I think it's also going to transition really well
Starting point is 00:28:29 to the other topic that I have is I like when one creates an API that you can just easily plug to your existing code. So you can just import this as this and do not change the entire code and then you take care of that. You know, like writing your code in a way that one can just flag and play. That's the amazing thing. So it's easy that you don't have to relearn a lot of stuff, but it just gives you the power that you need.
Starting point is 00:28:56 So this is why we moved toward this one. So we gain the power without changing much of our code. Yeah. Yeah, definitely. I love that as well. You know, I think of like HTTPX and requests for a while and I think they diverged at some point.
Starting point is 00:29:10 But yeah, let's see some feedback from audience real quick. I'll jump back to the nerd fonts. Chris says they're amazing. Henry Schreiner says, Fish Shell plus Fisher plus Oh My Fish. Then the theme Bob the Fish plus Sauce Code Proerd font is fantastic
Starting point is 00:29:26 oh my gosh I have no idea I've explored this yet you're going to send me on a serious rat hole I'm going to be losing like the rest of the day to just fiddle with that I'm afraid well I keep on missing my terminal every time I start fiddling around
Starting point is 00:29:42 right? that's right because I'm using a VSL Windows subs around, right? That's right. Because I'm using a WSL, Windows Subsystem Linux, right? So whenever I fix something, then I get it right. And before I know it, I broke it again. But yeah, it looks really awesome. Yeah, fantastic. And then on topic, what I was most recently talking about,
Starting point is 00:30:01 Chris Mace's, whoa, Empire looks nice. Alvaro asked asked will it help to get logging working in multi-processing i don't know that'll make any change i mean it really is mostly still multi-processing so probably not yeah yeah very cool all right grayson i think you got the last one here yes yes yes so i have this awesome uh tool here It's called Scotch. It's really like a mixture of Scikit-learn and Touch. This is really, really cool, as we were talking about building an API
Starting point is 00:30:35 that it's easy to integrate. So if someone already knows Scikit-learn and a bit of Touch, then you don't really need to to learn anything in this tool because everything just fits in together so basically um in when you're using scikit-learn so if you are not familiar with scikit-learn it's just this uh what we call it the must-have toolkit for data scientists because here they they have created a really good tool with a really good API where you can build an entire pipeline from cleaning your data to building
Starting point is 00:31:13 interesting models and everything like that. But the biggest problem which we keep on experiencing when working with scikit-learn is when it comes to neural networks: you really don't have a lot of power to customize your networks in the way that you would like. It's very limited with the inputs that you already have here. And in most cases someone says, well, just create your own neural network classifier or regressor and then wrap it in the scikit-learn wrapper. But sometimes one does not want to do that. The nice thing is, some folks just came up with this project, which is really, really neat. So basically, I think mostly, maybe I should shamelessly show you an example in one of my gists, which, I know this is a shameless way to do it, but it's easier, like giving a demo on how it works, right?
Starting point is 00:32:20 So like if you're using scikit-le psychic learn you are very familiar with all these other tools that someone needs to have like the way to split your data etc etc but then right the pipeline and the pipelines and all that stuff yeah okay but the coolest thing is instead of using one of the psychic learn models you can create your own custom neural net right so this will be like a neural network where we decided what how many uh we decided how many nodes we want in the first layer, how many nodes do we want in the second layer. And here we can build as many interesting net as we see fit, right?
Starting point is 00:32:56 And then basically here, we just do the calling of it. So this is the very standard PyTorch way of creating your net. The awesome part is that now, with this net, forgetting about all this process, we can see, we just create this net, wrap it up like this, and now we are using it as part of our pipeline. So you can see, I will just go down right here, so I am having my preprocessor, scikit-learn-ish, and I'm having my net. And the coolest thing is, now I just call this thing as I would with any scikit-learn model, with my classifier.fit on this. And later I will do my classifier.predict on these things.
Starting point is 00:33:38 So this example is, we're trying to predict the species of penguin given the data that we have. So this whole thing is really, really cool because it hides the whole fuss of, when you do it in pure PyTorch, you will have to write this for-loop with the optimizer, stepping up, stepping down, all these things. But here, it's just transformed into the scikit-learn world, where you just do fit, which just trains your model.
Starting point is 00:34:09 And now you can just do predict, as if you're predicting with any other scikit-learn tool. So skorch is a really, really neat tool that just does that. It allows you to connect your PyTorch net with the scikit-learn pipeline. So this is really, really awesome. So I would just encourage people to take a look at it. I love the idea of it, that basically you can create these PyTorch models and do what you need to do to set them up, and then just hand them off to the rest of the scikit-learn world. And I can see some really interesting uses for this.
Starting point is 00:34:45 Like I've got some library and it can either integrate with PyTorch or it can integrate with scikit-learn and it just uses this little wrapper to pass it around. I like it. Yeah, yeah. So just for me, it's like it just gave me this ability
Starting point is 00:34:58 to create these more extended algorithms and then just continue using my scikit-learn pipelines. So that's the coolest thing, that I don't have to change my code, because I just want to replace one line, and that is the model. So I get the model from skorch and then pass it in in place of my ordinary model, something like logistic regression. Now I'm using a net. Love it.
Starting point is 00:35:28 Nice. Brian, what do you think? You like this pattern? Yeah, I do. I like the pattern of being able to use, not have to change your entire tool chain, just to change one piece. It's nice and clean. I like it as well. So that's it for our main items.
Starting point is 00:35:44 Brian, I've got one. I feel like I should have let you have this one, but I grabbed this little extra thing I wanted to throw out there because I thought it would make you happy. Neat, can't wait. Yeah, so Marco Gorelli sent over this thing and said, if you want to work in JupyterLab, right, I know that one of your requirements for working with tools and shells and stuff is that they're Vim-ish. You can do Vim keyboard things to it. I'm excited. Yeah. So he's sending this thing called JupyterLab-Vim, which is Vim notebook cell bindings for JupyterLab. So if you're editing a notebook cell, you can do all of your magic Vim keys to make all the various changes and whatnot
Starting point is 00:36:22 that you want. So yeah. cool. What do you think? I'm definitely going to try this. Yes. Yeah, awesome. All right, let's see. What else do I have? I got, oh yeah, this, nevermind my picture. I didn't really intend to put that up there,
Starting point is 00:36:35 but I just want to point out that I'm going to be speaking and the reason the picture is there is the conference, the Pi Bay conference that's running next month. They featured my talk that I'm doing. So that's why there's a picture of me, but the Pi Bay 2021 that's running next month. They featured my talk that I'm doing. So that's why there's a picture of me, but the Pi Bay 2021 food truck edition, they have rented out an entire like food cartopia type place with a bunch of these pods and having a conference outdoors and putting up multimedia like
Starting point is 00:37:00 TVs and stuff for each pod. So even if you're not at the, like a great line of sight, you can still see the live talks, but sit outside and, you know, drink and eat food, cart food in California.
Starting point is 00:37:10 Sounds fun. So I'm going to be talking about, uh, what, what did I say? My title of my talk was, is going to be HTMX plus flask, modern Python web apps,
Starting point is 00:37:18 hold the JavaScript. So I'm looking forward to giving that talk in there. So people, if they're generally in that area, they might want to check that out. That just sounds fun. Yeah. Yes, indeed. All right. That's it for my extra items.
Starting point is 00:37:31 You got any extras, Brian? No. How about you, Prasen? Yes, I got one. I had to actually search if this one has been covered and I was surprised that it has not been covered. I don't think it has. What is this? It's, you know, there's something called py.inf. So we've been using py.inf to, of course,
Starting point is 00:37:53 one can say, why don't you just use always.inf, then get whatever that is. Why do we need to install another package just to get the environment variable or something? But this is pretty neat. It's quite a recent project, I think, and it's rising slowly. And there's a lot of contributors and it's very promising. So what it does, I think I can just bring it somewhere here.
Starting point is 00:38:22 It allows you to do all this type conversion, casting, etc. Like, you can say, I'm going to get my DEBUG here, and then I will set the default, and also I will do the casting here. So this is really, really neat. So often when you're reading config files, everything
Starting point is 00:38:42 is a string, and then you're like, oh, this one is a datetime, so I've got to parse it. This one is a float, so I've got to parse it. Yeah. Okay. Yeah. But it's even more than that.
Starting point is 00:38:51 So there's another way where you can say from decouple import auto config. So it goes and search where is that.env file is. So otherwise you can just tell where the environment variable is. But it's just, it's just neat. It's very simple. It does what you want it to do. So I will really encourage people to look at it. It's, I just, we, I've just changed every places where I've been using .inv or always.inv with, with this one. And it's just helped me clean some unnecessary steps in my code. That's pretty cool.
Starting point is 00:39:27 Yeah, yeah, great, great idea. Definitely check that one out. All right, well, I think that's it for all of our items. Well, what do you think? Should we do a joke? Definitely. I love it because I've almost forgotten
Starting point is 00:39:38 what the joke is. So it's going to be new to me as well. All right, so the joke is called Adoption. This comes from monkeyuser.com. And you've heard about the Python idea of, you came for the language, but you stayed for the community. Well, what if it is a little bit different? What if actually people get brought in unwillingly? And then it's kind of an open field, you know, think gazelle or something. There are a couple of developers just running, and there's one who is fixated on a butterfly, who doesn't actually see that there's a bunch of, like, a pack of Python developers coming to adopt them. It says, a pack of Python developers, spotting a junior dev away from its pack, initiate their conversion assault. Ah, yeah.
Starting point is 00:40:25 This is good. Silly, silly. Man, I'm that way even for non-programmers. And my family just sort of rolls their eyes every time this happens. But every time I get somebody young coming over, either in college or high school or just out of college, I'll say, so if you haven't done it already, no matter what your field is, you really should learn how to code. And while you're at it, why not just choose Python? So I'm trying to make Python developers out of every person I meet.
Starting point is 00:40:56 I think that's doing them a favor. It'll be their superpower amongst all their non-developer friends. Yeah. Beautiful. That's funny. Brian, thanks as always. And Prayson, really great to have you on the show this week, and thanks for being here. Yeah, thank you, Michael. Thank you, Brian. Thank you. You bet. Bye.
Starting point is 00:41:14 Bye. Thanks for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. Get the full show notes over at PythonBytes.fm. If you have a news item we should cover, just visit PythonBytes.f in B-Y-T-E-S. Get the full show notes over at PythonBytes.fm. If you have a news item we should cover, just visit PythonBytes.fm and click Submit in the nav bar.
Starting point is 00:41:30 We're always on the lookout for sharing something cool. If you want to join us for the live recording, just visit the website and click Livestream to get notified of when our next episode goes live. That's usually happening at noon Pacific on Wednesdays over at YouTube. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
