Python Bytes - #270 Can errors really be beautiful?

Episode Date: February 10, 2022

Topics covered in this episode:
A Better Pygame Mainloop
awesome sqlalchemy
ThreadPoolExecutor in Python: The Complete Guide
Chaining comparison operators
Create Beautiful Tracebacks with Python's Exception Hooks
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/270

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 270, recorded February 9th, 2022. I'm Michael Kennedy. And I'm Brian Okken. And I'm Dean Langsam. Dean, so great to have you on the show. Thank you. So often you help me with that start in the live chat.
Starting point is 00:00:19 I know you're a big participant in the show, so we pulled you in and now here you are. Welcome. Thank you. Thank you. I've been a fan actually since episode one. I've been hearing this weekly. That goes back years, like five years. Yeah, it's about five years. I remember I moved apartments back then and I listened to Python. I didn't know Python as well back then and I actually grew with the show. So that's very nice. That's fantastic. That's incredible. We've heard that from other people.
Starting point is 00:00:47 And that's just mind-blowing to me. But yeah, it's cool. Yeah, I was taking intro to data science classes in Coursera while listening to the show. And now other people call me a senior Python. So that was very nice. That's fantastic. And it does go fast. So awesome. Thank you so much for joining us on the show. It's awesome. Before we get into it, I also want
Starting point is 00:01:12 to say this episode is brought to you by Datadog. Check out their awesome stuff at pythonbytes.fm/datadog. I'll tell you more about that later. Right now, Python, I want to hear about a better Pygame loop. Brian, tell us about it. Yeah. So this is an article from Glyph. Pygame is a package that's used for game programming a lot. And programming games is definitely, I think, one of the things I tried to do early on when I was a developer. And it's something that I encourage a lot of new developers to try out, things like simple games, because it's fun to learn coding that way. And it's anyway,
Starting point is 00:01:57 it's a big part of learning programming and the programming space. And with Python, it's pretty easy with Pygame. And there's a lot of tutorials out there. But one of the things that Glyph points out is a lot of the tutorials have this sort of simple "while 1" loop, the main loop of a game, where you just spin and wait for events and then handle the event or draw things or whatever and then go back and, you know, keep going.
Starting point is 00:02:25 And this just happens forever. A "while 1" loop in programming is a busy loop, and it's generally something that has some issues. So Glyph is pointing out that some of the issues with this are that it wastes power, for one. Your CPU is just spinning all the time when you're really not going to get events that fast. And then also there's a thing that I didn't know about called screen tearing, which is when you're drawing the screen at the same time you're writing to the screen buffer.
Starting point is 00:02:58 Right. You're not waiting for the vsync, 60, 100 frames a second, whatever it is. Right. Yeah. So that can cause glitches in the game and it doesn't look as good. Pygame does allow a vsync option,
Starting point is 00:03:11 but apparently there's some problem with that. So the article walks through both of these problems, and the vsync fix and the problems with that,
Starting point is 00:03:25 but the end result really is, it's actually an interesting discussion about what's really going on in Pygame. And he talks about how there are really three jobs going on, the drawing and the game logic and the input handling, all at once. And since there are three things, it's probably a good idea to do maybe async stuff, so things can work together. And the solution he came up with is still, I mean, it's definitely a larger loop, but it's not that big of a loop. It's more complicated, and it's an async version with some sleeps in there, with some delays possibly, but a better loop for gaming. And it's not that complicated, and actually
Starting point is 00:04:12 if you're learning gaming while programming, hearing about these sorts of issues and trying and learning how to solve them, it's probably just going to make you a better developer faster. So I think it's a good thing to look at this. Yeah, this looks really interesting. This game loop stuff, you know, it's so often very much the same. And there's like these core elements like process input, you know, if the key's down or if there's a joystick attached, draw the scene, do the hit detection and AI or game logic.
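Editor's sketch: the three jobs Brian lists (input handling, game logic, drawing) can be modeled as three cooperating asyncio tasks that sleep between frames instead of spinning in a busy loop. This is a minimal illustration under stated assumptions, not Glyph's actual code and not a Pygame API: the Game class, the "quit" event, and the 60 FPS frame time are all hypothetical stand-ins.

```python
import asyncio

FRAME_SECONDS = 1 / 60  # assumed target frame time (60 FPS)


class Game:
    """Hypothetical stand-in for real game state; not part of Pygame."""

    def __init__(self):
        self.running = True
        self.ticks = 0
        self.frames = 0
        self.events = asyncio.Queue()


async def handle_input(game):
    # Wait for events instead of polling for them in a busy loop.
    while game.running:
        event = await game.events.get()
        if event == "quit":
            game.running = False


async def update_logic(game):
    while game.running:
        game.ticks += 1  # hit detection, AI, and so on would go here
        await asyncio.sleep(FRAME_SECONDS)


async def draw(game):
    while game.running:
        game.frames += 1  # a real game would render and flip the display here
        await asyncio.sleep(FRAME_SECONDS)


async def main(run_seconds=0.1):
    game = Game()
    tasks = [
        asyncio.create_task(handle_input(game)),
        asyncio.create_task(update_logic(game)),
        asyncio.create_task(draw(game)),
    ]
    await asyncio.sleep(run_seconds)
    await game.events.put("quit")  # simulate the player closing the window
    await asyncio.gather(*tasks)
    return game


if __name__ == "__main__":
    finished = asyncio.run(main())
    print(finished.ticks, finished.frames)
```

Each task yields control while it sleeps or waits on the queue, so the CPU is idle between frames rather than spinning.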
Starting point is 00:04:43 And it's almost always the same. Like this looks great as a way to tell me what I should be doing. And maybe the next step would be create a class that I just override, do the AI logic, draw the screen and just let that like not even be something I ever see. So this is ripe for a little bit of hiding away
Starting point is 00:05:02 even this cool stuff. That's true. Maybe Pygame can extend a better built-in loop to hook into or something. Yeah. Yeah, I always think about... I don't actually do a lot of gaming in Python, but I always think about browsers,
Starting point is 00:05:18 which are also kind of a loop that runs forever and renders stuff on your screen. And I think, well, the front-end guys got it so easy, right? They just write the code and the browser does it for them. And I'm not sure if it works exactly the same, but maybe if someone manages to implement something that's like, just write your game and put it in this thing,
Starting point is 00:05:43 maybe this could attract more people into writing small games in Python. Yeah, absolutely. And my thought is if you just sort of abstract that away, it's just 2D stuff, right? Which it's pretty easy to get into. I just listened to or watched a Netflix series called High Score, which is the history of video games going like way back to the Atari 2600 and Asteroid and whatnot. And there's this woman in here talks about how she got so inspired about just text-based games. So if you're learning to program, right, I definitely think games are a fun way. And often I think people might perceive that as like, well, I've got to write, you know, Angry Birds or something, which is fine. You can write that and that's super fun, but you can do a lot of stuff with just sort of text-based little fun story adventure type stuff as well.
Starting point is 00:06:32 I got to check out that Netflix series. That sounds great. Yeah, yeah. I was just helping a friend writing like this small game and he's written this like with one thread and everything for this school project. And then he told me, well, but how do I show a score that updates with the game?
Starting point is 00:06:50 And then I thought, no, for that you'll need multiple threads, a Pygame loop maybe, and stuff like that. So if that could have been easier on him while learning Python, this could have been awesome. Yeah, absolutely. There's a lot of nice comments out in the live stream. Anthony says, I teach Pygame in my code club after school class. Smart kids, Pygame is great. So is Arcade, which is an alternative, an OpenGL based alternative to Pygame. That's very cool.
Starting point is 00:07:18 I do think having something visual for people when they're learning, it just reinforces things so much, right? Like writing that API backend that talks to a database is great when you see the next three steps down the line, how it's going to enable something. But when you're getting started, you need quick feedback. Absolutely. All right.
Starting point is 00:07:37 Well, let's talk about something else that's awesome here. I want to talk about SQLAlchemy. SQLAlchemy has been getting a lot of attention lately, and that's super cool. Mike Bayer released SQLAlchemy 2, which was the first async API version. So now you can use async and await with SQLAlchemy, which opens up lots of possibilities. Sebastián Ramírez released SQLModel, which is like a marriage of Pydantic and SQLAlchemy, which is also super neat. But there are many other things that you can do with SQLAlchemy that are really handy.
Starting point is 00:08:10 So as all the awesome lists go, here's one for a curated list of SQLAlchemy. Now, first, just a word of warning: from what I can tell, including the PR that I added yesterday, all the way back to the one in June 2020, it doesn't seem to be getting a whole lot of love, which is unfortunate. So it seems like it might be sort of stalled out. But that said, it's still a really good list of things. So I'll pull out a couple that I think are nice here. Which ones did I want to highlight? The first one is called Continuum, SQLAlchemy-Continuum. And this is versioning. So imagine you would like to have a history or a record of changes to your database.
Starting point is 00:08:52 Like maybe this is some sort of financial thing. And if you see changes, you want to be able to say this person made this change on this date when they said, you know, update, get the record, make a change and, you know, call commit on the SQLAlchemy session. So what this does is it will create versions of inserts, updates, and deletes. It won't store those if there's not actually a change. It supports Alembic migrations. You can revert data objects and so on. So if you want that, SQLAlchemy-Continuum, it's just one of the many, many, many things in here, which is pretty awesome. Another one I wanted to highlight is UTC.
Starting point is 00:09:30 So one of the challenges that people often run into is when you're storing stuff in the database, dates in particular, what time is that? Is that the time of the user who might be in a different time zone than the API endpoint that it was running at. Right. So it might be nice to be able to store like time zone aware things and store them as UTC values. So they're always the same. And then you can convert them back to like the time zone, which is pretty cool.
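Editor's sketch: the idea Michael describes, storing every timestamp as UTC and converting back to the user's zone on the way out, can be shown with the standard library alone. This does not use the SQLAlchemy extension's own API; the helper names to_utc and to_local and the UTC-05:00 user are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(aware_dt):
    """Normalize a timezone-aware datetime to UTC before storing it."""
    if aware_dt.tzinfo is None:
        raise ValueError("naive datetime: the time zone is ambiguous")
    return aware_dt.astimezone(timezone.utc)


def to_local(utc_dt, user_tz):
    """Convert a stored UTC value back to the user's time zone for display."""
    return utc_dt.astimezone(user_tz)


# Hypothetical user five hours behind UTC (a fixed offset keeps this stdlib-only).
user_tz = timezone(timedelta(hours=-5))
submitted = datetime(2022, 2, 9, 9, 30, tzinfo=user_tz)

stored = to_utc(submitted)         # 14:30 UTC, the value that goes in the database
shown = to_local(stored, user_tz)  # 09:30 at UTC-05:00, the value the user sees
```

Rejecting naive datetimes up front is a design choice for this sketch: it avoids guessing whether a value was meant as server time or user time.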
Starting point is 00:09:58 Another one is SQLAlchemy-Utils, which is pretty cool. So it's got things like choice type, which I'm guessing is basically enum, but country, JSON, URL, UUID, all of these different data types, data ranges, all kinds of stuff, ORM helpers, utility classes, and different things like that. So that's kind of a grab bag of them. Let's see. One also is called File Depot. There's cool stuff for processing images. You've got File Depot, which is a framework
Starting point is 00:10:29 for easily storing and serving files out of your database on the web, as well as SQLAlchemy-ImageAttach, which is specifically about storing images
Starting point is 00:10:37 in your database, which, by the way, we do, Brian, on Python Bytes. Cool. You know, if you go to any page, any episode page,
Starting point is 00:10:45 and you see like that, that watch it on YouTube, that little thumbnail, we go get that dynamically from YouTube and then serve it up so we don't have to depend on YouTube. So anyway, that's pretty cool. Let's see, maybe two more.
Starting point is 00:10:58 There's Searchable. So if you want to add full-text search to your model, you can use this, and it only supports Postgres, because I'm sure it depends upon some core element there. But you can also do another one from isql as well, which is pretty cool. And then the last one is Schema Display, which generates basically graphs of your models and how they relate to each other, stuff like that, which is kind of neat. Nice.
Starting point is 00:11:26 What do y'all, what do y'all think? Cool stuff, right? Yeah. Very cool. Yeah. So I,
Starting point is 00:11:32 if you're really bought into SQLAlchemy, you owe it to yourself to just flip through this list to just go like, wait, it can do that? I had no idea that it could do that. Right. And just sort of see what are the other things that people built on top of here that I think would be super, super helpful. And by the way, my PR was really to say,
Starting point is 00:11:52 there's a layer called thin abstractions. And it says, you know, under the thin abstractions, we really should have us some SQLModel, because that thing is super popular straight out of the gate, right? So people should check this out. It's already got almost 7,000 stars and it's, what, a month old or something? That's crazy. Yeah, maybe six weeks, but really, really new. Yeah. But the author, I mean, yeah, exactly. I know. Brandon in the audience says there should be a meta awesome list, like an awesome list of awesome lists. I'm sure there is. There is, I'm sure. And yeah, quite fun. I definitely recommend people check that out. All right. Dean, that brings us to your first item. Tell us about it. Yeah, so at work, I needed to write something that required threading.
Starting point is 00:12:45 And I was very afraid of threading at the beginning. Basically, what we needed to do, we have some mechanism. I'm a data scientist, and we need to take many queries at once and get them as pandas DataFrames and save them to disk and later take all of them and work with them. And instead of sending them sequentially, I wanted to send a bunch of them together. And the bonus thing I found is that
Starting point is 00:13:10 when you release them to a threading, if you don't lock the threads or you don't wait for the threads, you can actually still work with the Jupyter notebook while waiting for the queries. So that was my main reasoning. And eventually after I've written most of the code, I got this blog post called
Starting point is 00:13:27 ThreadPoolExecutor in Python: The Complete Guide. So this is basically by Jason Brownlee. He's also the guy from Machine Learning Mastery, so I'm very familiar with him. It's a very long blog post, so you could kind of read it as an e-book or just access the stuff you need because it's like, I don't know, a two-hour read maybe. And he explains everything from the beginning.
Starting point is 00:13:53 He explains what are Python threads, how to work with them. Then he introduces the Threadpool executor, which is a more convenient way to use threads. He explains about the lifecycle of what does he do, how to do it then with a context manager and stuff like that. And eventually what he talks about that other people do not when you search for a threading tutorial is actually about the complete lifecycle and then the usage patterns. And then he explains about IO bound versus CPU bound and everything. And he finishes off with the common questions. So this is
Starting point is 00:14:30 like the link I've saved because I will forget it in a week, but the next time I need to, I just know I can come back to this and like read the common questions part. And yes, there's a question, the questions like, how do you stop running? There's a lot there, yeah. There is a lot there in this article, isn't there? Yeah, it's a lot. It's a lot. But the thing is, you can come back later and just take the stuff you need. Like, I remember, I know I'm working, then I can ask myself, how do you set a chunk size in map?
Starting point is 00:14:57 Well, it says there that you don't, because that's for the process pool. But then I have another question. Maybe, how do you cancel a running task? And the answer is that, uh. So I think that's a good thing to have, to quickly access when you need to. And he finishes off with, like, what's the difference from asyncio, from threading.Thread, from ProcessPoolExecutor. So that is a very helpful guide, very complete. And the entire blog, actually, it's an entire blog dedicated to the thread pool executor and the process pool executor.
Starting point is 00:15:34 I love that it's covering the thread pool and process pools because it's easy for things to just completely get out of control. As you throw more work at it, stuff can completely back up. So if you just say, create me a new thread and run that, and then another place, create me more threads, and I got a bunch more. Oh, look, now I have a thousand items to process.
Starting point is 00:15:54 Create a thousand threads. Each thread takes a lot of context switching to switch between, and they take a decent amount of memory and all sorts of stuff, right? Through the thread pool, you can say, queue up the work and run 10 at a time. Same for processes, which sort of sets an upper bound on how much concurrency you can
Starting point is 00:16:11 deal with, right? Yeah. Yeah. This is cool. So you talked about solving some problems in Jupyter Notebook using this. What in particular were you trying to do? So basically I can send, I don't know, a thousand queries. And once they get, like, we have big data
Starting point is 00:16:28 and then I have a query that takes a part of it, like after maybe some group-bys and limitations and stuff like that. And I want to take the data frame and save it. And then once I have the entire data from all the queries, I want to join them or maybe do some, I don't know, some processing and then join everything. The thing is after like 10 of those came back,
Starting point is 00:16:51 I have a sample of my data that I can work with and try to manage and then have code written while the other stuff is still running. I want to have that. I can play with it. So if I release the other things to the threads and they work in the background, the main thread of the Jupyter notebook is open. And you can start working on the same notebook. Before then, I used to open a notebook that's querying stuff, open a notebook that I'm playing with and like see that the file paths are the same. So I'm not confused with like some other directory of the other versioning of this data. And now it just works. Oh, that's really cool. And you can also add a thread for, I don't know, some visualizations of what's finished, what's errored, what's like everything. Fantastic. Yeah, that sounds really good.
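Editor's sketch: the workflow Dean describes, handing queries to background threads while the main thread stays free, might look roughly like this. The run_query function is a hypothetical stand-in for a slow database call, and the worker count of 4 is an arbitrary assumption; only the concurrent.futures calls are real library API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query_id):
    """Hypothetical stand-in for a slow database query."""
    time.sleep(0.05)
    return {"query": query_id, "rows": query_id * 10}


# No `with` block here: leaving one would wait for every query to finish.
executor = ThreadPoolExecutor(max_workers=4)
futures = {qid: executor.submit(run_query, qid) for qid in range(8)}

# submit() returns immediately, so the main thread (the notebook) is free.
first = futures[0].result()  # block only on the one result wanted right now
finished = [qid for qid, f in futures.items() if f.done()]

executor.shutdown(wait=True)  # in a notebook this might be called much later
```

The max_workers cap is what gives the upper bound on concurrency discussed earlier: eight queries are queued, but at most four run at a time.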
Starting point is 00:17:45 I'm sure there's a lot of concurrency and parallelism in the data backend. It's just how do you access that from Python, right? So how do you issue all those commands? Excellent. All right. Let's see. Brian, anything you want to add
Starting point is 00:17:58 before I talk about Datadog? No, just some comments like Sam, morally concurrent futures is a much less painful way to work with them at a higher level. So maybe we could get an article on concurrent futures on the episode sometime. Yeah, for sure. So the thread pool executor gets you back futures. And then part of what's explained in the blog post is how to work with futures like as completed or sequentially or like you decide your strategy but you work with the futures nice
Starting point is 00:18:33 okay cool yeah nice and of course requisite shout out to unsync which is all sorts of awesome for this stuff. Unifies the API for direct threads, for processes and async IO. But what I want to tell you all about now is Datadog. Datadog is really awesome. You should really have insight into your applications. And that's what Datadog brings you. So Datadog is real-time monitoring that unifies metrics, traces, logs into one tightly integrated platform. Their APM empowers developers to identify anomalies and resolve issues, especially around performance and begin collecting stack traces, visualize them as flame graphs and organizing
Starting point is 00:19:18 them into profile types such as CPU bound, IO bound, and so on. And teams can even search specific profiles and correlate them to distributed traces to find things across different parts of your infrastructure and microservices and identify slow or underperforming code and then make it faster. Plus you can use their APM live search.
Starting point is 00:19:37 You can search across the full stream of all the traces over the last 15 minutes. So try Datadog APM for free with a 14-day trial, and then Datadog will send you one of these very cute doggy t-shirts, which who wouldn't want one of those, right? So visit pythonbytes.fm/datadog or just click the link in your podcast player show notes to get started. Thanks, Datadog. And Brian, back to you. Back to me. I'm going to apologize to whoever tweeted this, but somebody tweeted out a link to this article, talking about chaining operators. So this is an article by Rodrigo Serrao. Pydon'ts?
Starting point is 00:20:22 Yeah. So I don't know what the Pydon'ts are about. I don't know, maybe he started blogging about things you shouldn't do in Python. But anyway, this article is called Chaining comparison operators, and I use chaining all the time. Mostly I use it for simple things like, oh, let me find one, a is less than b, less than c. So ranges, like my x value is between min and max. Yeah, that's really nice. Yeah.
Starting point is 00:20:52 My hint on that, just a tip for anybody doing that: always do them less than. Don't do greater than, because it's hard to do that. Anyway, so keep them like that. But this article is talking about other stuff. So this is pretty easy to think about, like the less than operator. So a is less than b, less than c. Is it really the same as a is less than b and b is less than c? It is that combination. That's what chained operators are. And the importance there is it doesn't really work for some operations. And he gets into the equal operator. So you can do a equals b equals c, which means
Starting point is 00:21:35 they're all equal. Great. What about not equal? Does that work the same way? And it doesn't, because if you've got, like, a is not equal to b is not equal to c, it doesn't mean they're all different, because a and c still could be the same and have that pass. So this article, if you're working with chained expressions, which I think you should if you're doing complicated things, I like it better than having a bunch of ands in there, as long as you can keep it readable. But this article talks through some of the gotchas and things to watch out for,
Starting point is 00:22:14 like side effects and non-constants and things like that. So great discussion of chained operators. I hadn't even thought of doing this. Not equal to. This seems wrong. It just looks wrong. Yeah. But yeah, don't do chained not-equal. Even if that's what you meant, that a is not equal to b and b is not equal to c, but it's okay for a and c to be equal,
Starting point is 00:22:39 that would be a terrible expression because it's confusing. So don't do that. It is, yeah. My favorite one of these chainings is, you know, 7 less than x less than 10. Yeah, something like that. That's nice. My favorite is converting x if x is not None else y to just x or y. Boom. That's so clean and so nice. And coming from a C++ and C# background, I never thought that was possible. And that's great. Dean, what do you think about this? I love it.
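Editor's sketch: the points from this segment can be checked in a few lines. The variable names and values are arbitrary. The last pair of lines is an editor's caution rather than something said on the show: x or y falls back on any falsy x, so it only matches the longer conditional when x can never be a value such as 0 or an empty string.

```python
a, b, c = 1, 2, 1

# A chain like 0 <= a < 10 means (0 <= a) and (a < 10).
in_range = 0 <= a < 10              # True

# Chained != only compares neighbours, so a and c can still be equal.
chained_ne = a != b != c            # True, although a == c
all_distinct = len({a, b, c}) == 3  # False

calls = []


def middle():
    calls.append("called")
    return 5


# The middle operand of a chain is evaluated once, not twice.
result = 1 < middle() < 10          # True, and len(calls) == 1

# `x or y` falls back on any falsy x, not only on None.
x, y = 0, 42
with_or = x or y                             # 42
with_none_check = x if x is not None else y  # 0
```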
Starting point is 00:23:13 I use it a lot. It didn't always work. I think it's still not working with Pandas data frames or Pandas series and arrays. And I do wait for this to finally work. Arrays, when you do an array, like in NumPy or Pandas, when you do an array, it's less than some number.
Starting point is 00:23:31 It returns a new array, like a Boolean array with true and false. And last time I checked, it was a few months ago, but the last time I checked, it didn't work. I couldn't do one is less than the series is less than two
Starting point is 00:23:44 and get the Boolean array. So I'm waiting for this. But I love the concept a lot. I hadn't really considered the integration into pandas. I'm not sure how would you implement that with the regular data model of
Starting point is 00:23:59 __eq__, or is this something else? LT, LTE, possibly. I'm not sure either. Yeah, there's probably some magic method, and it might just expand out to less than and then and, you know, like the two tests basically. It probably does. Oh, cool. We should ask Brett Cannon to do a deep dive into, he's pulling apart all the different parts of Python syntax, right? Yeah. All right. I want to give a quick shout out to Rich because it's one of our episodes,
Starting point is 00:24:32 so we talk about Rich. I was going to talk about Anthony Shaw, but I didn't have enough information. So I mean, he's the other person who needs a shout out in every show. So I want to talk about this article highlighting some tools by Martin Heinz. Yeah, Martin Heinz. Well, Creating Beautiful Tracebacks with Python's Exception Hooks. So two things that I want to point out here. One, Python has an exception hook mechanism, which is pretty cool. So what you can do is you can create a function that has this signature of exception type, the actual exception and the traceback. So three arguments. And if you have a function like that, you can just go to the
Starting point is 00:25:12 sys and say sys.excepthook equals that function, not calling it, of course, just passing the function as the value. And then whenever there's an exception, this will be called by Python. That's pretty cool, right? Yeah. So depending on what you want to do, like you could say, well, we're going to store all the errors. Like, let's imagine here's a scenario where you might make use of this. I'm going to create an app and I'm going to send it out. I'm going to use py2app or py2exe or just, you know, let people install it somehow. And then when it runs, it's going to run on their computers, but I want to gather up all the exceptions of all the users across the company or the research team or whatever. You could have this submit this error along with other details right back to a database over an
Starting point is 00:25:55 API, right? And then you could do like analytics, like, well, here's the most common error and so on. Of course you could use Sentry or something like that, but maybe you're trying to gather some specific information that's different, right? So that's one of the types of things you could do with this. So I got a question before I go on. Yeah. So this doesn't catch the exception. It just, it doesn't interrupt the flow.
Starting point is 00:26:16 It just, it just gets called when it happens. It doesn't, it doesn't catch the exception. It lets you basically change what kind of output comes from Python. So if you just wanted to print out like, here's a file where there was an error and here's the error message. Okay. Like you could do that, right? Or the type and then the message.
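Editor's sketch: a hook of the kind Michael describes, printing only the file, line, type, and message instead of the full traceback. The function names format_short and short_excepthook are made up for this example; sys.excepthook and its three-argument signature are standard Python. The hook only runs for exceptions nobody caught, and the program still exits afterwards.

```python
import sys


def format_short(exc_type, exc, tb):
    """Build a one-line summary: file, line, exception type, and message."""
    # Walk to the innermost frame, where the exception was actually raised.
    while tb is not None and tb.tb_next is not None:
        tb = tb.tb_next
    if tb is None:
        location = "<unknown>"
    else:
        location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
    return f"{location}: {exc_type.__name__}: {exc}"


def short_excepthook(exc_type, exc, tb):
    # Python calls this for uncaught exceptions instead of printing a traceback.
    print(format_short(exc_type, exc, tb), file=sys.stderr)


sys.excepthook = short_excepthook  # assign the function itself, do not call it
```

The same hook is where an application could send the error details to a logging service or database, as in the scenario described above.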
Starting point is 00:26:36 I'm just noticing the example doesn't rethrow it. So you don't have to do that then. No, I don't believe so. And I'm not 100% sure. I think the process still ends i if it's just a regular running a script rather than a web app i think it still ends uh but anyway sorry a different kind of output yeah yeah no you just don't get the standard print output that python gives you right so you could say avoid printing the trace back
Starting point is 00:27:01 if you wanted. You could just say this file on this line had this error. Okay. Nice. Right. Okay. So it's easy enough to do. Like, for example, they have this function that they call that causes an error, and all you see when this crashes is there's a traceback, you know, this file, this line, in this module, here's the error message, right? Instead of the huge stack trace that might scare people. Okay. So I mean, obviously you can use try and except, but this is global, right? So even if some library is calling something and you're not catching it, like, right, it's catching everywhere. Okay. So then you could do more work about breaking that apart, and they talk about doing that. But the real interesting part is if you go and look at some options. So there are five, I believe there are five libraries mentioned in here that do really cool stuff for solving this. The first one
Starting point is 00:27:50 is Will McGugan's Rich library. So you can just go from rich.traceback import install and then say install(show_locals=True), and then this also basically installs one of those global exception hooks. But with the benefit being when you get the errors, what you get is a nice rich output. It's super pretty. It's pretty and it's useful. I mean, it's color highlighted so you can see where the error happened, but it also will print out in a really nice way with formatting and highlighting the locals, right? So, well, what values were passed to that function when it crashed?
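Editor's sketch: the Rich setup Michael reads out, written as code. The import and the show_locals=True argument come from the discussion; the ImportError guard and the divide function are additions for this example so that it runs even where Rich is not installed.

```python
import sys

try:
    # Rich is a third-party package (pip install rich).
    from rich.traceback import install
except ImportError:
    install = None  # this sketch then keeps Python's default hook

if install is not None:
    # Replaces sys.excepthook with Rich's handler; show_locals=True also
    # prints the local variables of each frame.
    install(show_locals=True)


def divide(total, count):
    return total / count  # with count=0 this raises ZeroDivisionError


# Uncommenting the next line would crash and trigger the installed hook:
# divide(10, 0)
```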
Starting point is 00:28:27 Well, here's a little table of those and so on. So this is really easy to identify and at the very bottom, like a nice clear way to like, okay, what happened? So you can do this super simple version here. And there's also some manual ways to make Rich print this type of stuff. Number two is better_exceptions,
Starting point is 00:28:46 which does similar stuff. You can see that it doesn't quite take over the look and feel so much, but it basically colorizes the standard look and feel of errors, so you can see, you know, which function, which error, and so on. So that's pretty good. And there's pretty_errors. Check out pretty_errors. This looks pretty good, right? It's got a lot of bold and highlights. You can really call out the error messages and the functions involved and the modules involved. Here's one for you, Dean.
Starting point is 00:29:13 The one built into IPython. It has ultratb, for Ultra Traceback. And this is pretty nice, right? Actually, the IPython one's pretty good. Yeah, the IPython one is really nice. And also I was planning to talk about it in the extras, but in IPython 8, which is pretty new, they even have this improved with some coloring
Starting point is 00:29:34 of exactly where the error happened. I think this uses the 310 part or something like that. Oh, awesome. Yeah, that's cool. We'll hear more about it when we talk about IPython 8 as well. Cool. Yeah, so that's built in, kind of if you're already on the data science stack.
Starting point is 00:29:51 And then finally, stackprinter, which you can give a traceback and it will print that out. So you can sort of do like Rich, you can say set exception hook and give it a theme like dark or whatever. And then it does this pretty nice printout as well. So these are all great. I'm personally liking the Rich tracebacks version best, but this is really nice.
Starting point is 00:30:13 Yeah, Connor out there in the audience says, wow, using show_locals=True would have saved me hours and hours of time. And you and me both. Yeah, I feel the pain. I do too. Because a lot of times you're like, I know it crashed and it
Starting point is 00:30:25 says NoneType does not have attribute whatever, but like, why is it None? I need to go back three levels, right? Like, yes, it's so good. And then you find out you just forgot to return from the function. Yes, exactly. I was just debugging a test failure the other day, and pytest has the option to show locals. You can show locals with a crash or with every failure. And I forgot that the particular thing I was testing had variables that were storing thousand-element arrays. It just, like, went on. I believe the,
Starting point is 00:31:10 I believe Rich has a way to truncate variables, where it'll do an ellipsis or something like that. I think. I mean, yeah, I'm not a hundred percent sure, because I've been looking at all five of these. Will's in the chat.
Starting point is 00:31:22 We'll have to ask him. You'll have to give us a shout out, Will. I think truncate is out there, right? I'm not 100% sure. I'm wondering how I could actually, so, I talk with databases
Starting point is 00:31:32 and sometimes the errors from the databases are like this big Java trace, and then you need to, like, go a lot up. Sorry, some noise here. Sorry. You need to go a long way up in the browser to actually see the error.
Starting point is 00:31:51 And if I could just shut it down and just get the Python stuff. Yeah. I don't know what setting you'd set for that. But certainly with this mechanism, you could set it up so that if the word Java appears, you just stop. You just stop going back. And Will says, yes,
Starting point is 00:32:10 that's right. Thank you, Brian, for pulling that up. Yeah, you can truncate it so the printing won't go completely insane. Because it could be gigabytes. I mean, it could be out of control, right? Even if they have a reasonably large limit, sometimes it's just like, oh, I forgot that huge array was there and it's hard to see stuff.
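To make the "show locals, but truncate them" idea concrete, here is a minimal standard-library sketch. It is not how Rich or pytest implement it; `frame_locals_report` and its limits are hypothetical names and values chosen for illustration.

```python
import reprlib
import traceback


def frame_locals_report(exc, max_len=40, max_items=5):
    """Map each function in the traceback to its locals, with truncated reprs."""
    short = reprlib.Repr()
    short.maxlist = max_items    # lists longer than this end in "..."
    short.maxstring = max_len    # cap for strings
    short.maxother = max_len     # cap for arbitrary objects
    report = {}
    for frame, _lineno in traceback.walk_tb(exc.__traceback__):
        report[frame.f_code.co_name] = {
            name: short.repr(value) for name, value in frame.f_locals.items()
        }
    return report


def load():
    big = list(range(1000))   # the "forgot that huge array was there" case
    result = None             # e.g. a helper that forgot to return something
    return result.value       # raises AttributeError on None


try:
    load()
except AttributeError as exc:
    report = frame_locals_report(exc)

# The huge list is cut down to its first few items, and the None is visible.
print(report["load"])
```

Seeing `result` as `None` right next to the error is the part that saves the "go back three levels" hunt described above.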
Starting point is 00:32:28 Yeah, absolutely. Yeah. All right. Over to you, Dean. Speaking of testing, Brian was talking about testing stuff and looking at the color and so on. Yeah, so I thought, Brian, this would be up your alley. So it's called Ways I Use Testing as a Data Scientist. It's by Peter Baumgartner.
Starting point is 00:32:46 And I'm a data scientist, but I also love testing. The thing about testing with data science is sometimes it's not that clear what you should test for, right? Because some things we do are stochastic, and then you can't actually test for an exact result, or stuff like that. So this blog post talks about the art of testing, because sometimes it's not clear what you should test, and the more experience you get, the better you can see what's coming your way. And he talks about data validation, and he throws in many packages that could help you, packages like Pandera and Great Expectations, that I think we've talked about
Starting point is 00:33:32 before in the podcast. And also, NumPy has some stuff like isclose, which checks for two numbers that are close to each other, or array_equal, and assert_frame_equal in pandas for data frames. So he talks a lot about that. He also talks about using assert in your code. Even if you have some ad hoc analysis, use assert within the code. Don't think about the tests later.
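A small sketch of the NumPy helpers named here, assuming NumPy is installed. The values are made up for illustration; the point is the difference between exact and tolerance-based comparison.

```python
import numpy as np

a = 0.1 + 0.2

# Exact comparison is False because of floating-point rounding.
exact_scalar = (a == 0.3)
# isclose compares within a tolerance instead.
close_scalar = bool(np.isclose(a, 0.3))

x = np.array([0.1 + 0.2, 1.0])
y = np.array([0.3, 1.0])

exact_array = np.array_equal(x, y)   # element-wise exact equality
close_array = np.allclose(x, y)      # element-wise equality within tolerance

print(exact_scalar, close_scalar, exact_array, close_array)
```

`np.isclose` and `np.allclose` accept `rtol` and `atol` arguments when the default tolerances are too loose or too strict for the data at hand.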
Starting point is 00:34:00 Just think, where could this thing hurt me? He gives an example. Maybe if I'm trying to join two data frames and I think they have the same shape, I want to check that they have the same IDs, so that way I know that the join works correctly. He asserts that the length of the IDs is the same within the two data frames. This is not even real testing, we would say.
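The join example might look roughly like the following sketch, assuming pandas is installed. The helper name `checked_join`, the `id` column, and the sample frames are hypothetical and not taken from the blog post.

```python
import pandas as pd


def checked_join(left, right, key="id"):
    """Inner-join two frames, asserting along the way that no rows go missing."""
    # Same row count is necessary but not sufficient.
    assert len(left) == len(right), "row counts differ"
    # The key values themselves must match, regardless of order.
    assert set(left[key]) == set(right[key]), "key values differ"
    merged = left.merge(right, on=key, how="inner")
    # Catches duplicated keys that would silently multiply rows.
    assert len(merged) == len(left), "join dropped or duplicated rows"
    return merged


users = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
scores = pd.DataFrame({"id": [3, 1, 2], "score": [0.5, 0.7, 0.9]})
merged = checked_join(users, scores)

# Same shape, but id 4 replaces id 3: the second assert fires.
bad_scores = pd.DataFrame({"id": [1, 2, 4], "score": [0.7, 0.9, 0.1]})
try:
    checked_join(users, bad_scores)
    mismatch_caught = False
except AssertionError:
    mismatch_caught = True
```

These are plain `assert` statements inside analysis code, with no test framework involved, which is the point being made in the episode.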
Starting point is 00:34:25 It doesn't use some testing framework. It just says, write it within your code. It then continues to Hypothesis, which basically bombards your functions with a lot of inputs to try to make them fail. It continues with some other packages and eventually goes into pytest and shows how it would work with pytest and with an approach that I haven't heard of, but it sounds
Starting point is 00:34:54 good. Arrange, act, assert. Arrange the data, then act on the thing you want to check, and then just assert that the result is equal or almost equal to the thing you wanted to check for. Yeah, it's such an easy mistake to make, like this number == that number. But when you're doing science or data science
Starting point is 00:35:18 I'm glad he talks about structure because a lot of people that get into testing get these giant tests that do a little work, test something, do a little more work, test something. And then if it breaks, you're not sure where the failure is. So this sounds fascinating. And actually, I'm not sure how I missed it, but I really want a way to compare an array for almost equal. So I'm going to have to go read that.
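As a sketch of the arrange, act, assert structure with a tolerance-based check, here is a small pytest-style test using only the standard library. The `mean` function is a hypothetical stand-in for whatever is under test.

```python
import math


def mean(values):
    """Function under test (hypothetical example)."""
    return sum(values) / len(values)


def test_mean_of_floats():
    # Arrange: set up the input data.
    values = [0.1, 0.2, 0.3]

    # Act: run exactly one thing.
    result = mean(values)

    # Assert: compare with a tolerance rather than ==,
    # since floating-point rounding can make an exact check fail.
    assert math.isclose(result, 0.2, rel_tol=1e-9)


# pytest would collect this automatically; calling it directly also works.
test_mean_of_floats()
```

Keeping one act step per test is what makes a failure easy to locate, compared with the long "do a little work, test something" style described above.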
Starting point is 00:35:49 Yeah, so NumPy and Pandas both have mechanisms for that. It's pretty great. Nice. Cool. Yeah. Very nice. I know this will be helpful to people. I always wonder about testing data science stuff and machine learning things and so on where you get small perturbations, but they're fine, right?
Starting point is 00:36:04 It's off by one millionth of some unit, but that's totally good. Those are equal. But it takes, I think, an extra level of thinking about it, right? So many people focus on, well, how do you get rid of your dependencies, and how do you make sure that you don't talk to the real database when you do this?
Starting point is 00:36:20 So it's right. And that's one aspect that people focus on, but this working with like science-y type stuff is its own specialty. Yeah. I think that the entire community is, it's a fairly new community, although it's not as new as it was. And I'm not sure like we're on top of how to do tests in machine learning. Like many, we have many packages for that. We have many theories for that, but I'm not sure like we have like actually one solid good way and maybe we shouldn't have, but it's a debate. Yeah, for sure.
Starting point is 00:36:56 Same with the rest of the software world. So welcome. Thanks. Yeah. And Sam out in the live stream says, NumPy has an assert_array_almost_equal in numpy.testing. Nice. I just learned there's a numpy.testing.
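A minimal sketch of the helper Sam mentions, assuming NumPy is installed. The arrays are invented for illustration.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

expected = np.array([1.0, 2.0, 3.0])

# A tiny perturbation passes at 6 decimal places.
actual = expected + 1e-9
assert_array_almost_equal(actual, expected, decimal=6)

# A larger difference raises AssertionError with a readable diff.
try:
    assert_array_almost_equal(expected + 0.01, expected, decimal=6)
    large_diff_caught = False
except AssertionError:
    large_diff_caught = True
```

The NumPy documentation suggests `numpy.testing.assert_allclose` for more consistent floating-point comparisons, so that may be the better default in new tests.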
Starting point is 00:37:10 That's cool. Yeah. Awesome. All right. Dean, while you have your screen up, do you have any extras you want to talk about? I know IPython 8 was a thing. Yeah.
Starting point is 00:37:20 So IPython 8 was released like last month after three years of waiting for a major version. It has a lot of new features, but this is the extra part, so I won't go over them. Just two and a half things I wanted to mention. It says that it's less code, and I love that. Once you get better in a programming language, you understand that you shouldn't write more code,
Starting point is 00:37:43 you should delete code. And that's what those guys do. And the way they could have done that is by hiring a person through the NumFOCUS small development grants. And I think this is important. It's been talked about a lot after the Log4j stuff. It's been talked about like, well, those are three guys who worked tirelessly. They have their full-time jobs and they couldn't fix the Log4j stuff maybe as quickly as some other people wanted. But then you realize that they got donations of like a few hundred dollars within 10 years.
Starting point is 00:38:18 And then after Log4j, suddenly they got a thousand. So I think this shows you how money could help open source. And if you use some package in a company, in some corporate setting, maybe try and think how you can give back money. Or even if you give back code, if you free up your developers to actually contribute, this is awesome. And the half thing I'll just mention, because it's about the traceback: it shows that you can now see it's colored. You can see on the screen,
Starting point is 00:38:54 the part where the error actually was, it's colored now. So it's very nice to see. The example shows you call the function three times, but it fails on just one of the inputs, so it shows you which of the three calls failed. Right. You call the same thing, like foo of zero plus foo of one plus
Starting point is 00:39:18 foo of two, and it's the middle one that failed. Not just line seven, but the second invocation, with the value one, is where it failed. That's awesome. Yeah, exactly. And the same, well, that's right, I was going to say the same thing for indexing into, what is that, a data frame or something like that? You're chaining together bracket zero, bracket one, bracket zero, and it's the second one, trying to get to the one of zero, that was the one that failed there. Those are really hard to come back and find
Starting point is 00:39:46 if you're not in a debugger. Like, well, which one of these failed? Like, great. Array index out of bounds on line three. Well, there's three of those happening. Which one? Yeah. Yeah, that's cool.
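The situation being described can be reproduced with a few lines. This sketch only shows why a line number alone is ambiguous; it does not reproduce IPython 8's highlighting, and `foo`, `total`, and `failing_frames` are hypothetical names.

```python
import traceback


def foo(x):
    return 10 / (x - 1)   # only x == 1 triggers ZeroDivisionError


def total():
    # Three calls on one line: a plain line number cannot say which one failed.
    return foo(0) + foo(1) + foo(2)


def failing_frames():
    """Return (function name, line number) for each frame of the failure."""
    try:
        total()
    except ZeroDivisionError as exc:
        return [(f.name, f.lineno) for f in traceback.extract_tb(exc.__traceback__)]
    return []


frames = failing_frames()
print(frames)
```

The `total` entry carries a single line number that covers all three `foo` calls. Tracebacks that mark the exact sub-expression, as shown in the IPython 8 example, remove that ambiguity.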
Starting point is 00:39:57 And another thing is a tweet by Victor Stinner. He's a core dev. And he says, I mean, it's now time to deprecate the standard library urllib module. And this has brought out a lot of haters and fans. And I'm not sure what my opinion is yet. I'm not a heavy user of urllib. But it opened up a debate, like we know how to do. Yeah, that's really interesting. There are certain things in the standard library where you're like, yeah, I know that's there and you could
Starting point is 00:40:30 use it, but you probably shouldn't use it. There are so many better external choices that are so good that it would be kind of silly to fight them, right? That's sort of the recommendation here. Yeah, but also, some people don't like it. There are people there who say they hate dependencies, and sometimes you can do most of the work with the standard lib. And some of the tweets say, maybe deprecate the major parts that requests can do, but there are some other parts that are actually really needed. So maybe deprecate half of it.
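The episode does not say which parts are "really needed". One plausible candidate, offered here purely as an assumption, is `urllib.parse`, which handles URL building and parsing without any network access. A small sketch with a made-up URL:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build a query string from a dict (values are converted to strings).
query = urlencode({"q": "python bytes", "page": 2})
url = f"https://example.com/search?{query}"

# Split the URL back into its components.
parts = urlsplit(url)

# Parse the query string into a dict of lists.
params = parse_qs(parts.query)

print(url)
print(parts.netloc, parts.path, params)
```

Nothing here fetches anything, which is part of why this kind of functionality is often treated separately from the HTTP client side of the module.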
Starting point is 00:41:04 Yeah. I'm not sure how I feel about deprecating it, but, you know, it's one thing to say there are better choices and we, as a community, recommend you probably just don't use this. But to deprecate it means that for people who would rather go with fewer dependencies, you're giving them warnings that they shouldn't be doing this when maybe, you know, it's unlikely it's going to actually vanish. Right. There's a fallacy, though, that I think some people have: that if they don't have a dependency and they're using something in the standard library,
Starting point is 00:41:36 it's more solid. But I don't know if there are that many people working on urllib right now. And some of the other parts that maybe people want to stop supporting. That's something very valid. Python still is an open source project, and we can make those decisions. Yeah, Victor actually says there are
Starting point is 00:41:57 four-year-old security issues in urllib, so maybe it's better to use something outside of it. Yeah, people want it to stay, but there's these issues. Yeah, I wonder if there's a way to go, well, let's look at some of the libraries that are out there, try to bring them in and just use their core
Starting point is 00:42:16 to replicate that functionality. Not to say, you know, like you could, let's just pick on requests, like bring requests in, like vendor a little bit of it in, so it does what urllib does, and just go, look, okay, this is the latest, greatest that we've got. And everyone's been looking at requests already. I don't know.
Starting point is 00:42:33 It'd be interesting. Yeah. And then Brandon out in the audience points out, there are also maybe environments where you can't install dependencies for security reasons. And so having things like urllib allows you to do more with Python. Yeah. But if there's security problems with urllib that, yeah, anyway,
Starting point is 00:42:50 just in some of the functions, you don't call those. No, I'm just kidding. All right, Brian, how about you? Extras? Just one extra. I brought this up last week: I'm currently not writing a book. So I want to write more blog posts. So one of the things I wanted to make sure of with my blog, I've migrated to pythontest.com, and now it has a blog setting. And I like that it looks pretty
Starting point is 00:43:19 too. Instead of just pulling everything over from my old WordPress blog, I'm trying to edit it. So I'm up through 2012. I'm going to go oldest to newest and gradually bring things in. So that's one of my side projects I'm working on. That's a great side project. Nice. What's that running on? Is that like some static site generator or other hosted thing? It's Hugo, hosted by a free Netlify account. Yeah, Netlify is pretty awesome. All right. I got a couple of things
Starting point is 00:43:52 I want to give a quick shout out to. Yeah, Brandon had the same question, but we got it. All right. First of all, I have two new Python Shorts, two new videos there.
Starting point is 00:44:03 I've got Beyond the List Comprehension, so basically set and dictionary comprehensions, stuff like that. Nice picture. Thank you. It's just a screenshot out of an animation. And then Combining Dictionaries, the Python 3.10 Way is the title of the article. It really should be 3.9, but I kind of want to communicate, if you're on the latest Python, how should you be doing it? The features that are actually in there came out in 3.9. Anyway, the pipe stuff: dictionary one, pipe, dictionary two,
Starting point is 00:44:30 pipe, dictionary three, which is all fun. And then I wanted to talk about a feature over on pypi.org. I don't even know how I found this. Probably just by accident, like I bumped the keyboard or something. But if I'm over here
Starting point is 00:44:40 and you just want to search for something, forward slash, now you can search. What? So they now have Vim in the browser. Exactly. So if you're on pypi.org and you want to search, forward slash, yes. So that's pretty cool. Yep.
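For readers who have not seen the comprehension and dictionary-pipe syntax mentioned a moment ago, here is a short sketch. It requires Python 3.9 or later for the `|` operator, and the sample data is invented.

```python
words = ["pygame", "rich", "numpy", "rich"]

# Set comprehension: duplicates collapse automatically.
unique_lengths = {len(w) for w in words}

# Dictionary comprehension: key and value on either side of the colon.
length_by_word = {w: len(w) for w in words}

defaults = {"theme": "dark", "show_locals": False}
env = {"show_locals": True}
cli = {"theme": "light"}

# Dict merge with | (Python 3.9+): later operands win on duplicate keys.
settings = defaults | env | cli

print(unique_lengths, length_by_word, settings)
```

The three source dictionaries are left unchanged; `|` builds a new dictionary, while `|=` updates one in place.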
Starting point is 00:44:56 All right. That's it for the extras. Nice. I don't even remember what my joke is, so that's good. It'll be fine. I mean, it's a little bit slow. You all ready? Yeah. All right. Here we go. Oh yeah, this is another one of these sort of frustration type of things. That's great. This comes from the Programming Humor Twitter account, you know,
Starting point is 00:45:17 twitter.com slash programming humor, and there's a lot of good stuff in there. Some that I really liked I didn't necessarily want to put on the show, but this one is a developer, really frustrated, sucking in their lips, pulling on their cheeks, and going, oh, I hate this job. I hate my life. Why is this happening to me? Nevermind.
Starting point is 00:45:36 I misspelled a variable. Good to go. Yeah. Linting. Yeah. Linting is good. Linting, indeed. Indeed. Indeed.
Starting point is 00:45:45 If you just flip through the programming humor one, it's pretty good. You know, this eight-year-old is learning Python after dealing with the syntax bug. She asked, if the computer knows it's missing a semicolon here, why won't it add it itself? I don't know. I really don't know. Yeah. Yeah. Yeah. And like, so he follows up and says what he meant. He meant colon,
Starting point is 00:46:08 not semicolon, but so many people are like semicolon. We need to use a semicolon for Python. Exactly. There are uses. They are rare though. All right. Well,
Starting point is 00:46:21 fantastic. That last one. See? Yeah. It shall not be spoken, but it's good, right? Okay. There's a lot of good stuff. I recommend people go flip through that Twitter account.
Starting point is 00:46:33 Nice. Brian, thank you. As always, it's good to be back with you. It's good to be back. And Dean, thanks for coming on this side of the presentation and joining us for the show. Thanks for having me. Thanks for listening to Python Bytes. Follow the show on Twitter via @pythonbytes.
Starting point is 00:46:50 That's Python Bytes as in B-Y-T-E-S. Get the full show notes over at pythonbytes.fm. If you have a news item we should cover, just visit pythonbytes.fm and click submit in the nav bar. We're always on the lookout for sharing something cool. If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live. That's usually happening at noon Pacific on Wednesdays over at YouTube. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
