Python Bytes - #246 Love your crashes, use Rich to beautify tracebacks

Starting point is 00:00:00 Hey there, thanks for listening. Before we jump into this episode, I just want to remind you that this episode is brought to you by us over at Talk Python Training and Brian through his pytest book. So if you want to get hands-on and learn something with Python, be sure to consider our courses over at Talk Python Training.

Starting point is 00:00:16 Visit them via pythonbytes.fm slash courses. And if you're looking to do testing and get better with pytest, check out Brian's book at pythonbytes.fm slash PyTest. Enjoy the episode. Welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 246, recorded August 11th, 2021. I'm Michael Kennedy.

Starting point is 00:00:38 And I'm Brian Okken. And I'm David Smith. Hey, David Smith. Welcome. So good to have you here. It's good to be here. Yeah, you've been a suggester of topics, I believe. You've sent in some ideas and thoughts for us. And well, we're going to get a good dose of that today for sure.

Starting point is 00:00:54 But honestly, if I'd known that you were going to open this up, I probably would have hoarded some of those, because it was a little bit of a scramble to be like, oh yeah, I already gave them that tip. We already gave them that tip. So yeah, I had to dig a little bit. You've already shared all your favorites. Well, your loss is our gain, because you've made it easier for us in the past.

Starting point is 00:01:09 So thanks for sharing those things. And yeah, thanks for being here. It's going to be great to have you. Definitely. That was great. Yeah, do you want to give the quick elevator pitch on you? What should people know about you? Well, I'm a recent tech convert, I'll say.

Starting point is 00:01:22 Over the last 10 years, I've been working in the manufacturing space, either in quality engineering or manufacturing engineering. And over the last couple of years, I've been using Python a lot more heavily. I used to do a lot of VBA in Excel, which was painful. And I got a suggestion from one of our equipment suppliers to say, hey, use Python. It's really, really nice. I resisted doing it because I didn't want to learn something new. It seemed intimidating because it's a programming language. I'm not a programmer, but I finally caved when it came to trying to automate plotting, which is pretty painful in Excel. Yeah, once I started on it and had something useful working in a couple hours,

Starting point is 00:01:59 I was hooked, and then I started looking for more and more resources, found your show, and got more and more into it from there. I started digging into the web, and it's just been, I'd say, an upward spiral from there. And probably about two and a half weeks ago, I started my first, I guess, official tech role, in a similar kind of domain, doing engineering work for an automotive supplier. So it's been really exciting to be able to use Python full time as part of my job, because, you know, in the bits and pieces of time I got to use Python before, those were always the parts

Starting point is 00:02:30 I liked the most. So I'm happy to be doing it, you know, on purpose. Awesome. Congratulations. I wish I could do it full time. I remember my first full-time software development job.

Starting point is 00:02:41 I was like, I can't believe they're paying me to do this. I better figure this stuff out before they fire me. I can't believe I'm doing this. It was so great. So good. All right, well, congratulations and happy to have you here.

Starting point is 00:02:50 Brian, I feel like we should document this. Definitely should document it, and test our docs too. So, something that came up recently was from Vincent Warmerdam. I think we've had him on the show. A couple episodes ago, yeah.

Starting point is 00:03:08 Yeah. So Vincent announced that he's got a library called mktestdocs. And I kind of love this. So the idea is it's a bunch of utilities that you can use to help test your documentation. It doesn't do it right out of the box; you have to create your own test files to do this. But the idea, like the first example that he shows in his readme, is that you've got a Markdown file,

Starting point is 00:03:40 and it's got some Python code blocks in it. And you can make a test that goes through, reads the Markdown, grabs the Python code, and runs it. And if there are any problems with it, if there are any exceptions, it fails the test. This is just brilliant. There are examples in here for doing it with docstrings and even class docstrings. And then Vincent even did, he does calmcode, and he did a little calmcode video on how to use this. Yeah, and you're putting that in the show notes

Starting point is 00:04:14 for people, right, to check out? Yep, there's a link to the tutorial with the video. The suggestion, or the use case that he was talking about at first, was that maybe you're using MkDocs for documentation, and therefore you've got a bunch of Markdown. But my use case is going to be blogs. So I think that's a huge use case, actually. Yeah, I've got Python code in my blog source code.

Starting point is 00:04:40 It's Markdown files. I totally want that; it's on my to-do list to try this, to make sure that the blog content is accurate. So that is super cool. You know, one more thing that you might find interesting: I think this is a more true software engineering type of solution, but another, sort of WYSIWYG, as-you-work style of solution is PyCharm. If you have a Markdown file and you have Python code in there, it'll highlight the errors and actually show you if, like, symbols are missing and stuff.

Starting point is 00:05:11 So if you had the Markdown associated with the sample code and then you, like, do stuff with your little examples, it may actually show you the errors live as well. Oh, that's cool. Yeah. I mean, that's not a CI, keep-it-fixed sort of thing, but it's an as-you-type kind of thing. Yeah. And the other comment he had: I normally don't put assertions in documentation, but the comment in the readme is that if you

Starting point is 00:05:37 put asserts in there, they'll get checked also. So you've got, like, unit tests built into your documentation. Super cool. David, what do you think? It's interesting. I'm just trying to figure out, are you doing, like, a parametrized test and looking at your inputs versus outputs for the code that's in the documentation, or how do you actually know it's testing correctly?

Starting point is 00:05:57 Is it just checking that it's valid Python? There's the little code snippet that we're showing on the screen and in the chat, and there's also a link in the show notes to the readme. The parametrization it uses, like in this example, is me saying: go look in my docs folder. And everything it finds in there that's a Markdown file will show up as a parametrization of the test. So this test will run once per file: if I've got three Markdown files in there, the test will run three times.
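To answer David's question in code: the technique doesn't compare inputs to outputs. It extracts each Python code block from the Markdown and executes it, so "passing" means no exception was raised, and asserts are how you check actual values. The following is a hand-rolled, standard-library sketch of that idea, not mktestdocs's own API (the library ships its own helpers, such as check_md_file). The function names are hypothetical, and running each block in a fresh namespace is an assumption about how you'd want examples isolated.

```python
import pathlib
import re
from typing import List

# The fence marker is built at runtime so this snippet can itself sit inside
# a Markdown code fence without accidentally closing it.
FENCE = "`" * 3
# Matches a python-tagged fence and captures everything up to the closing fence.
PYTHON_BLOCK = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def grab_python_blocks(markdown: str) -> List[str]:
    """Return the source of every python-tagged code block, in document order."""
    return PYTHON_BLOCK.findall(markdown)


def check_markdown(markdown: str, source: str = "<markdown>") -> int:
    """Run each Python block; any exception (including a failed assert) propagates.

    Returns the number of blocks executed.
    """
    blocks = grab_python_blocks(markdown)
    for i, code in enumerate(blocks):
        # Fresh globals per block, so examples can't lean on each other's state.
        namespace = {"__name__": "__docs__"}
        exec(compile(code, f"{source} (block {i})", "exec"), namespace)
    return len(blocks)


def markdown_files(root: str = "docs") -> List[pathlib.Path]:
    """All .md files under root: the list you'd parametrize one test over."""
    return sorted(pathlib.Path(root).glob("**/*.md"))
```

In a pytest suite, that last helper is what you'd hand to @pytest.mark.parametrize("fpath", markdown_files(), ids=str), with a test body that calls check_markdown(fpath.read_text(), str(fpath)). pytest then reports one test per Markdown file, which is the shape Brian describes.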

Starting point is 00:06:36 This is the most comprehensive and yet extremely short test I've seen in a really long time. Three lines, and it will basically traverse a tree of Markdown files, that hierarchy type of thing. Oh, I do tons of really tiny tests, so yeah. Yeah, nice, nice, nice. All right, Avaro, welcome to the live stream. Happy to have you here. Let's see, let's move on to the next one. Speaking of listeners giving us ideas and helping us out here, I want to talk about something that I've been hanging on to for a little while, since March, but I finally decided it's time to talk about it, and that is creating queues: out-of-process, sort of asynchronous

Starting point is 00:07:18 queue processing. So if I've got, say, a web app or an API, or even if I'm testing a bunch of hardware and I want to kick off a bunch of jobs, I don't necessarily want to block on all of them. I might want to push them down so other things can work on them. You know, if you've ever tried to send a thousand emails in order, synchronously, it turns out that times out your web request. Don't do that. So a better idea would be to push them to a queue and have some sort of background process go, oh, there are new emails to send, let me jam those on down the line. So Scott Hacker sent over a pointer to this library, a small but cool little one, called QR3. And QR3 is a queue for Redis. And the three means Python 3, because there used to be a QR that wasn't

Starting point is 00:08:05 Python 3 compatible. So this is like a reimagining of that for Python 3, or just a compatibility port that got moved over. So it's pretty cool. Let's check it out. The API, or the usage, is quite simple, as you could imagine. So all you've got to do is,

Starting point is 00:08:24 well, it's built upon redis-py. You've got to have Redis installed. That could be, you know, wherever. It could even be Redis as a service on some of these cloud platforms; run it in Docker, run it locally. Then you have redis-py, and then you just go over and you create a queue.

Starting point is 00:08:37 So you just say queue and you give it a name and then some server connect info like location and authentication and whatnot. And then all you've got to do is you push items to it. They could be just really simple things like a bunch of email addresses you're going to send. But it could also be really complicated. Like, for example, it could be, say, Pydantic models that store all the data that you need to process that request. So that's pretty cool.

Starting point is 00:09:01 By default, it gets data over to Redis through cPickle. And cPickle is better than pickle, but it still has issues and other restrictions. One of the restrictions is that you can't put certain types of objects in: it wouldn't make sense to serialize a database connection that has an open socket, or a thread, or some weird thing like that, right? But most of the sort of messages you'd send over, here's the data you need to process, all that stuff would work. And you can also create your own serializer on a per-queue basis, which is kind of cool. So if you said, I only want to work with

Starting point is 00:09:37 Pydantic models, you could put in the sort of from-dictionary, to-dictionary transformation with the validation and all that kind of stuff. I personally would not use cPickle, because one of the things you can run into is this: you upgrade your version of Python on one server but not the other, because you're in the process of going from one to the other, and something has a different structure in memory and gets put over there, and the other ones can't read it. There are always these challenges with pure binary matches. So I don't know that I would do that. I'd probably serialize as JSON or something and deserialize it back. But anyway, it's pretty cool.
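To make the push/pop flow and the per-queue serializer idea concrete, here is a minimal sketch. It is not QR3's API: RedisQueue and InMemoryListClient are hypothetical names. The sketch assumes a client object that exposes redis-py's lpush and rpop methods, pushing on one end and popping from the other so the result is first in, first out. An in-memory stand-in is included so the snippet runs without a Redis server; in real use you'd pass something like redis.Redis(host=..., password=...) instead. Following Michael's suggestion, it defaults to JSON rather than pickle.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class InMemoryListClient:
    """Stand-in for a redis-py client: just lpush/rpop on named lists.

    With real redis-py, rpop returns bytes unless decode_responses=True;
    json.loads accepts bytes, so the queue below works either way.
    """

    def __init__(self) -> None:
        self._lists: Dict[str, List[Any]] = {}

    def lpush(self, name: str, *values: Any) -> int:
        lst = self._lists.setdefault(name, [])
        for value in values:
            lst.insert(0, value)  # LPUSH adds at the head (left)
        return len(lst)

    def rpop(self, name: str) -> Optional[Any]:
        lst = self._lists.get(name)
        return lst.pop() if lst else None  # RPOP takes from the tail (right)


class RedisQueue:
    """Hypothetical FIFO queue with a pluggable, per-queue serializer."""

    def __init__(
        self,
        name: str,
        client: Any,
        serialize: Callable[[Any], Any] = json.dumps,
        deserialize: Callable[[Any], Any] = json.loads,
    ) -> None:
        self.name = name
        self.client = client
        self.serialize = serialize
        self.deserialize = deserialize

    def push(self, item: Any) -> None:
        self.client.lpush(self.name, self.serialize(item))

    def pop(self) -> Optional[Any]:
        raw = self.client.rpop(self.name)
        return None if raw is None else self.deserialize(raw)


# Usage with the stand-in; with a real server you'd pass redis.Redis(...) instead.
emails = RedisQueue("emails", InMemoryListClient())
emails.push({"to": "a@example.com", "subject": "Welcome"})
emails.push({"to": "b@example.com", "subject": "Welcome"})
first = emails.pop()  # the first item pushed comes out first
```

Because plain data goes over the wire as JSON, a producer and a consumer on different Python versions can still read each other's messages. For model objects, you would pass the model's own to-JSON and from-JSON methods as serialize and deserialize.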

Starting point is 00:10:09 What do you guys think? This looks nice. I actually haven't used queues in Python before, but it's on my to-do list, because, I mean, when you're designing complex systems, breaking them up into different processes with queues going back and forth is a cool way to do it. Yeah, I'm kind of inspired by this. I kind of want to do more stuff with queues as well. David?

Starting point is 00:10:27 Oh, it seems like a really clean, simple way to use queues. I'm with Brian, I haven't really used them in a Python context before. But the examples he gave are perfect. You know, emails take a long time, so you don't want to be tying up your main application; you dump those off into a background task. And this looks really, really simple to use. So it seems like it'd be worth a try, for sure. Yeah, for sure. Other things would be, say, you need to generate a report that takes 30 seconds: kick off the generation, then check whether it's in the database, and just do some sort of Ajax poll until it's there, or whatever. It has some more features. So it has a queue, which is first in, first out, as you can

Starting point is 00:11:05 imagine. It has a capped, I'd call it a capped collection. I feel like it should be called a capped queue, because it's implemented behind the scenes as a capped collection. They also list "bounded queue" as another AKA. So the idea is, if you're doing analytics and logging and you're trying to eventually process that and save it to the database, you might say, you know what, we really don't want this queue to get more than 100,000 items at a time, because we should be writing this to the database, and if something goes wrong, it can completely wreck the server. So you can create these capped queues where you're like,

Starting point is 00:11:34 I'm going to start throwing away old stuff if we don't get to it in time. There's a deque, which to me sounds like getting stuff out of a queue. But oh no, it's a double-ended queue. The idea is you can basically put stuff onto the front or the back, and you can pop stuff off the front and the back. So you could, for example, put low-priority items on the back, or if something's really important, you could kick it right up to the front of the queue. And then you can also do a stack. You can also do a

Starting point is 00:12:05 priority queue, which is like sort of pretty close to what I described, but you can't jump ahead of the things that have a similar priority, right? Like if there's super urgent and then low, you can put like a super urgent new thing at the front of the super urgent ones, but it would appear before all the others, things like that. So this is all pretty neat. What I really like about this is obviously Python has queues built in, right? Like that's just a data type. A list itself could basically be a queue. You can pop stuff off the front and Shazam, you have a queue. But this is out of process, right? This means if you have to scale out for your worker processes in any sort of API, or you want it to be able to be durable across app restarts, things like that. And if you

Starting point is 00:12:44 think, oh, I'm not going to scale out across, I'm not having multiple servers. Like almost every Python web app and web API runs with multiple worker processes at a minimum. So yeah, you're scaling out. Anyway, I think this is pretty useful. And if you're all about Redis, this is cool. Redis seems nice.
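The semantics of the structures Michael ran through (capped queue, double-ended queue, stack, priority queue) can be illustrated with these in-process standard-library analogues. This only shows the behavior; it is not QR3's API, and it doesn't give you the out-of-process, shared-across-workers property that Redis provides. (In Redis itself, a capped list is commonly built by following LPUSH with LTRIM.) The tie-breaking counter in the priority queue is one choice among several: it keeps items of equal priority in arrival order, so nothing jumps ahead of its peers.

```python
import heapq
import itertools
from collections import deque

# Capped (bounded) queue: once full, appending silently drops the oldest item.
recent_events = deque(maxlen=3)
for event in ["e1", "e2", "e3", "e4", "e5"]:
    recent_events.append(event)
# recent_events now holds e3, e4, e5; e1 and e2 were discarded.

# Double-ended queue ("deck"): push or pop at either end.
jobs = deque(["normal-1", "normal-2"])
jobs.append("low-priority")  # goes to the back
jobs.appendleft("urgent")  # jumps to the front
next_job = jobs.popleft()  # "urgent"

# Stack: last in, first out.
stack = ["a", "b", "c"]
top = stack.pop()  # "c"

# Priority queue: lowest number is served first; the counter breaks ties so
# items sharing a priority come out in the order they were added.
counter = itertools.count()
pq = []
for priority, task in [(2, "report"), (1, "urgent-a"), (2, "email"), (1, "urgent-b")]:
    heapq.heappush(pq, (priority, next(counter), task))
order = [heapq.heappop(pq)[2] for _ in range(len(pq))]
# order == ["urgent-a", "urgent-b", "report", "email"]
```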

Starting point is 00:12:58 I'm kind of inspired to do something like this with MongoDB, but I'm also busy. So probably not right away. And John Sheehan out there in the live stream is telling me that he learned a few years ago that deque is pronounced "deck." So yeah, double-ended. Yeah.

Starting point is 00:13:11 All right. So, deck. Thanks. And then Teddy out in the live stream says, I'm not too familiar with queues, but how would it work if your queue process executes Python code? Wouldn't it end up being processed sequentially because of the Python GIL?

Starting point is 00:13:27 So are you ending up with, like, serial processing because of this? I think it depends on just how you create the workers, right? So there are two ends that you build. One end puts stuff in the queue. Then you literally build the other end, which goes to the queue and says, give me the next item. And the queue is stored in Redis, which obviously can support multiple clients. So if you just scaled out the consumers of the queue messages, the things running the jobs, then you would escape the GIL, right? Because you would have multiple processes. You can have multiple things feeding the queue as well. Yes. Multiple web requests or something. Yeah, absolutely, absolutely. All right.
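To connect this to Teddy's GIL question: because the queue lives in Redis, concurrency comes from how many consumer processes you start, not from threads inside one interpreter. Each consumer is essentially a loop like the hypothetical run_worker below. You'd launch several copies as separate OS processes, each with its own Redis connection and therefore its own GIL. The sketch assumes a client with redis-py's brpop method, which blocks until an item arrives or the timeout expires and returns a (key, value) pair, or None on timeout. ListBackedClient is a stand-in so the loop runs without a server, and the JSON message format is an assumption.

```python
import json
from typing import Any, Callable, List, Optional, Tuple


def run_worker(
    client: Any,
    queue_name: str,
    handle: Callable[[Any], None],
    idle_timeout: int = 5,
    max_jobs: Optional[int] = None,
) -> int:
    """Pop and handle jobs until the queue stays empty for idle_timeout seconds.

    max_jobs is an optional, illustrative cap (e.g. to recycle a worker
    process after N jobs). Returns the number of jobs handled.
    """
    handled = 0
    while max_jobs is None or handled < max_jobs:
        popped = client.brpop(queue_name, timeout=idle_timeout)
        if popped is None:  # timed out: nothing left to do right now
            break
        _key, raw = popped
        handle(json.loads(raw))
        handled += 1
    return handled


class ListBackedClient:
    """Stand-in with the shape of redis-py's brpop, for running without Redis."""

    def __init__(self, items: List[str]) -> None:
        self.items = list(items)  # index 0 is the head (left), -1 the tail (right)

    def brpop(self, key: str, timeout: int = 0) -> Optional[Tuple[str, str]]:
        # Real BRPOP would block for up to `timeout` seconds; this returns at once.
        return (key, self.items.pop()) if self.items else None
```

Starting N copies of a script that calls run_worker (by hand, or via whatever process manager you already use) gives you N jobs in flight at once, independent of the GIL.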

Starting point is 00:14:05 David, what you got for us? All right, well, are either of you heavy pandas users? I'm a pandas admirer, and I use it a little bit, but I always feel like when I come to pandas, I know there's way more I should be doing with it, and it's so cool, but I don't use it as much as I should. Well, I used pandas pretty heavily in my previous job to do a lot of analysis, especially on one-dimensional data sets. When I first started using pandas, I was doing a lot of really bad things like iterrows, that type of thing. And the more you learn about it, the better you get at doing set-type operations. But even in the last couple of months, you'd think I'd have everything down, but the API is huge.

Starting point is 00:14:47 And I always have these aha moments where I learn about something like transform. And, you know, once I realized what you could do with transform, it simplified so many things that I was doing. And the first item I have is an article called "25 Pandas Functions You Didn't Know Existed." And I don't normally like these articles because they almost feel a little bit clickbaity,

Starting point is 00:15:06 but this one actually had a handful of aha moments for me. So I thought I would go ahead and share it. I have them listed in the show notes, the aha moments for me. First, between is a really nice one. I think you'd consider it a method on a Series, and it basically allows you to simplify logic: instead of trying to say greater than or equal to blank and less than or equal to blank, you can just say

Starting point is 00:15:29 between values, very similar to the BETWEEN operation you would do in a SQL query. Styler, I had no idea existed. You can actually apply styles to the tables coming out of pandas. I do a lot to try to make my notebooks really, really pretty so that I can convert them to HTML or another format and share them with the business. The business doesn't typically like notebooks, but I'm trying, because I can't stand the intermediate step of copying to a PowerPoint, and this

Starting point is 00:15:58 would definitely help. You can do gradients. You have a bunch of different functions behind that. Options is another one I've kind of played with a little bit. There's one in here that I wanted to try before the show, but I hadn't had a chance: you can change the plotting backend in pandas from Matplotlib to something else. So at some point, I'm going to try changing it to Plotly, because that's my preferred plotting library for most things. convert_dtypes is really nice. If you know you have a categorical type set of information, you can dramatically reduce how much memory is taken.

Starting point is 00:16:30 Mask was a nice one. It basically allows you to quickly convert particular values, or values that meet a criterion, to another value. I was often doing this in multiple stages; this would clean up that code significantly. nsmallest and nlargest also could have been very helpful. Essentially, it's similar to a max or a min, but instead of just pulling a single value, you could pull, in this case, five. And clip, and at_time.
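Here is a small sketch of a few of these functions on a made-up DataFrame; the column names and values are invented for illustration. One nuance that's worth knowing: convert_dtypes switches columns to pandas' nullable dtypes (for example, int64 becomes Int64). The large memory savings for repeated labels come from astype("category"), which stores each distinct label once plus a small integer code per row. Those savings show up on large columns with few distinct values, not on a six-row toy frame like this one.

```python
import pandas as pd

# Made-up data purely for illustration.
sales = pd.DataFrame(
    {
        "customer": ["a", "b", "c", "d", "e", "f"],
        "revenue": [120, 950, 40, 610, 300, 780],
        "region": ["east", "west", "east", "west", "east", "west"],
    }
)

# between: inclusive on both ends by default, i.e. (s >= 100) & (s <= 700)
mid_tier = sales[sales["revenue"].between(100, 700)]

# nlargest: top-N rows by a column, instead of sort_values(...).head(...)
top3 = sales.nlargest(3, "revenue")

# mask: replace values where the condition is True (cap anything above 800)
capped = sales["revenue"].mask(sales["revenue"] > 800, 800)

# clip: bound values to a range in a single call
clipped = sales["revenue"].clip(lower=100, upper=800)

# convert_dtypes: switch to nullable dtypes (e.g. int64 -> Int64)
typed = sales.convert_dtypes()

# category dtype: repeated labels stored once, with integer codes per row
regions = sales["region"].astype("category")
```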

Starting point is 00:17:03 Cool. So if I want to see the five largest revenue-producing customers in my data frame, I could just quickly do that. Yeah. Yep. And, like with anything else in pandas, there are a couple of other methods you could use to get that done too. But it's just so much cleaner to do diamonds.nlargest(5, "price").

Starting point is 00:17:21 It's just very clean and fast, instead of having multiple lines to do a transformation, and then another transformation, and then another change. So I wanted to suggest this article. Like I said, I've been doing pandas for a couple of years, and I still have these aha moments. Some of them maybe aren't quite aha moments for me, but they may be aha moments for someone else, because everybody probably knows 20 percent, and maybe a slightly different 20 percent, of the pandas API. Yeah, this is really neat. I love these types of things. I mean, it's super easy to just scan through and decide whether or not something is really helpful to you. The pandas one that had the biggest oh-my-goodness moment for me was web scraping and, like,

Starting point is 00:18:01 pulling HTML tables and turning those into data frames. Obviously, you can go with requests and Beautiful Soup and do something, but then you still end up with just a table of HTML. With pandas, you can say read_html, and then, just give me table three as a data frame. It's ridiculous, right? Pandas has some really nice IO tools around CSVs, Parquet, most of the data format types, and even some of the less common ones. It's a really nice library overall. But yeah, like I said, there are always some aha moments, and it's nice to have an article that highlights several of them for me. Yeah, super cool. So go ahead, Brian. The one that jumps right out at me was the number one. I didn't know

Starting point is 00:18:46 that you could just use ExcelWriter with pandas. That's pretty cool. And I think there's another wrapper around ExcelWriter that kind of simplifies converting a data frame to Excel, but I think ExcelWriter lets you do some more intricate things with Excel. Yeah, that's pretty cool. Yeah, that's super cool. All right, before we move on, really quick from the live stream: I liked when you asked if anyone uses pandas and likes it. Dean Langsam just said YES, all caps. Beautiful. But then he also pointed out this project that he built, which gives you live tips while you work with pandas in notebooks. It's called dovpanda. So I literally am just checking this out now,

Starting point is 00:19:26 but as you work with it, you can see here, it gives you little tips, like, oh, by the way, did you know you can concatenate like this? If you specified axis=1,

Starting point is 00:19:33 you get, you know, such and such, and it gives you little tips and tricks as you work with it. So people can check that out. Yeah. Yeah. So, aha moments.

Starting point is 00:19:43 Exactly. Exactly. Thanks, Dean. Brian? I do love some FastAPI, and I love Rich, and I'm looking forward to what you're going to do by trying to put these together. Yeah, well, I've been watching Rich, of course, and FastAPI a lot.

Starting point is 00:19:58 And so this article is by Hayden Kodelman, I think, and it's "FastAPI and Rich Tracebacks in Development." So the idea is that one of the cool things Rich has is these awesome tracebacks and logging. They're just beautiful. And I mean, if you can say a traceback is beautiful, it's probably because of Rich. They look pretty great.

Starting point is 00:20:22 And the logging is pretty good. So I'm just going to scroll down to some of these examples at the bottom. It's kind of tiny, but the logging is nice and colorized and stuff. And then with the tracebacks and exceptions, one of the things is there's a highlighted line number. It highlights the actual file name and puts the stuff you don't really need to care about right away in lower, you know, more muted colors. And it's just kind of a nice way to do it. You get syntax highlighting,

Starting point is 00:20:54 like keyword highlighting, in your code. Yeah, in the code that's in the stack trace of a crash, in the traceback. And so we've seen some examples of how to use the Rich tracebacks from other programs, but I hadn't seen it actually written up by somebody else. And so this is nice.

Starting point is 00:21:14 FastAPI is awesome for building web APIs. But how do you do this? How do you get your application to do this? I'm not going to scroll through all of this, but the gist of it is that there are really only a few steps. This post walks through all of it with all the code. For the most part, you create a dataclass with the logger configuration, and then you need a function that will either

Starting point is 00:21:43 install Rich as a handler or apply the production log configuration. I like that he puts this switch in place. So the idea around this is, when you're debugging, you're going to use these nice

Starting point is 00:21:55 tracebacks. But when you're running in production, it's not going to use that; it's just going to do the default logging. And then you have to call logging.basicConfig with the new settings. And then there's a little note that if you're using Uvicorn, you probably want

Starting point is 00:22:11 to override the logger for that. And that's it; that really sets it up. And it's got all the code in place so that your FastAPI application can have these lovely logs and tracebacks during development. Yeah, that's super neat. David, are you a fan of either of these frameworks? I haven't had a chance to use Rich too much. I have been watching Textual pretty closely on Twitter, because it's just phenomenal what he's been able to do. How do you have a docking, scrolling side thing in a terminal window?

Starting point is 00:22:38 What's going on here? I do. I love FastAPI. I built my wife's website using Flask and I liked how FastAPI was similar to Flask in a lot of ways. But, you know, some of the syntax was a little bit cleaner, although with the newer version of Flask, it kind of borrows some of the same syntax.

Starting point is 00:22:54 And it's just got a lot of really good necessities built in. The API documentation is really, I think that's kind of clutch when you're learning a new framework, too, because you don't have to do curl commands or anything like that. You can just bring up a webpage and poke at it, you know, visually, which is pretty nice. So no, I really like FastAPI. Other than building some small toy things, I just haven't had a really compelling reason to use it yet. So yeah.

Starting point is 00:23:19 Yeah. Very cool. Toys are compelling reasons, I think. Definitely. Definitely. Maybe some Arduino thing could run a FastAPI server. Who knows?
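The steps Brian described (a dataclass holding the logger configuration, a function that picks Rich in development or a plain production setup, a call to logging.basicConfig, and an override for Uvicorn's loggers) might look roughly like the sketch below. It is not the article's code, and LoggerConfig, get_logger_config, setup_logging, and route_uvicorn_logs are hypothetical names. The Rich pieces it relies on, rich.logging.RichHandler with rich_tracebacks=True and rich.traceback.install(), are real Rich APIs. They're imported only in development mode, so a production install doesn't need Rich at all.

```python
import logging
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoggerConfig:
    """Hypothetical container for the settings passed to logging.basicConfig."""

    handlers: List[logging.Handler] = field(default_factory=list)
    fmt: str = "%(message)s"
    date_format: str = "[%X]"
    level: int = logging.INFO


def get_logger_config(dev: bool) -> LoggerConfig:
    """Rich handler and tracebacks in development; plain stdlib output otherwise."""
    if dev:
        # Imported lazily so production deployments don't need Rich installed.
        from rich.logging import RichHandler
        from rich.traceback import install

        install()  # prettify tracebacks for uncaught exceptions
        return LoggerConfig(
            handlers=[RichHandler(rich_tracebacks=True)],
            level=logging.DEBUG,
        )
    return LoggerConfig(
        handlers=[logging.StreamHandler()],
        fmt="%(asctime)s %(levelname)s %(name)s: %(message)s",
        date_format="%Y-%m-%d %H:%M:%S",
    )


def setup_logging(dev: bool) -> LoggerConfig:
    """Apply the chosen config to the root logger and return it."""
    config = get_logger_config(dev)
    logging.basicConfig(
        level=config.level,
        format=config.fmt,
        datefmt=config.date_format,
        handlers=config.handlers,
        force=True,  # replace handlers configured earlier (Python 3.8+)
    )
    return config


def route_uvicorn_logs(config: LoggerConfig) -> None:
    """One possible override: give Uvicorn's loggers the same handlers."""
    for name in ("uvicorn", "uvicorn.access"):
        uv_logger = logging.getLogger(name)
        uv_logger.handlers = list(config.handlers)
        uv_logger.propagate = False  # avoid printing each line twice via the root
```

In a FastAPI app, you'd call setup_logging once at startup, with dev driven by whatever environment variable or settings object you already use. That wiring is an assumption here, not the article's exact approach.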

Starting point is 00:23:27 All right. So let me talk about some good news. Good news, good news. We've had a couple of things we've covered about some visionary sponsors coming on to support Python and the PSF and so on, which is fantastic, right? I've certainly whinged a lot about people running,

Starting point is 00:23:44 you know, multi-billion-dollar-revenue companies and doing nothing really to give back other than maybe a PR or something. But we've got Microsoft, we've got Bloomberg, we've got Google as visionary sponsors, right? And one of the things that made possible is the CPython developer in residence. I don't know if it's directly related to one of those

Starting point is 00:24:04 or if it's just sort of that everything came together. But recently the PSF said they were going to have a developer-in-residence position, and well-known community member and friend of the show Łukasz Langa applied and got hired. He's now the

Starting point is 00:24:19 developer in residence. This is a little bit old news, since it's from last month, but I wanted to make sure we gave it a quick shout-out, because I think it's pretty interesting that there's now a developer-side person inside the PSF making sure things keep moving. So the PSF has seven, eight, nine, I don't know, something like that, full-time employees, including this position, right? I haven't got recent updates. So there's a bunch of people who work there,

Starting point is 00:24:45 but to my knowledge, this is the first like developer person rather than marketing, legal, whatever, right? All that, the sort of business director, administrative side. So this is pretty interesting. Apologies to everybody that works at the PSF that's like, don't forget me. Yeah, no, no, no.

Starting point is 00:25:01 Those are super important, but it's interesting that there hasn't been a Python developer type of role within that group, is all I'm saying. So they put that out, and Łukasz Langa is now part of it. And there are some interesting takeaways here. Let me just give a bit of a quote on how Łukasz decided to position this and how he sees it. He said, I don't really want this to be like, hey, I am the, you know, the appointed CEO of Python, so listen to what I have to say, right? But he's

Starting point is 00:25:36 incredibly hopeful for Python because of this, and wanted to apply for it, and so on. He says, I think it's a role with transformational potential for the project. In short, I believe the mission of the developer in residence, the DIR, is to accelerate the developer experience of everybody else. And that includes not just the core team but, most importantly, the drive-by contributors submitting pull requests and creating issues on the tracker. So he's hoping that with this role, he can do things like make sure that there's a steady review of the stream of PRs and issues so they don't get stale and there's no backlog, triage the issues, be present in the official communication channels to unblock people if

Starting point is 00:26:19 they get stuck trying to contribute, keep CI and the test suites in a usable state and make them run quickly, and keep tabs on where the work is most needed in the projects that are most important. So it sounds to me almost like he's the technical person in the room, helping the community keep moving and making sure of things like: oh, many people are having a problem trying to do a PR because they can't get CPython to build; let's make that incredibly simple for them. Yeah. I like his attitude about where he's going with this. So, yeah. Yeah. In case I didn't point it out, Łukasz is also the creator of Black, the Black formatter, which I know we've talked about in a hundred thousand variations here. So that's great. David, how do you feel about this?

Starting point is 00:27:02 I think it's great. Any full-time person they can have working for the PSF or on Python directly is going to help increase stability. And I like his approach too, where he's going to try to increase throughput by maximizing everybody else's efficiency. I think that's a... It'd be easy to say, oh, I'm going to work on these features or on this,

Starting point is 00:27:20 but he's most concerned about making development for Python as ergonomic as possible, which I think will ultimately create more throughput and, you know, a better Python in the long run. Yeah. And absolutely props to the PSF, because it's easy to hire somebody and say, here's what I want you to produce for us. It's harder to hire somebody and say, I want you to be an enabler of other people, because that's hard to measure, right? Yeah. One of the interesting things he's doing, and I'm not sure if he's going to keep this up, but it looks like he has so far, is that he puts out weekly report posts on what he's been doing. So this, I can't

Starting point is 00:27:56 imagine having that much public scrutiny over what my work week looks like but i mean brian why did you spend so much time working on CI? Come on. So, it's pretty impressive, and it's cool that he's doing that. The entire Python world is watching. No pressure or anything. Yeah, he did say he was a little nervous about this, because

Starting point is 00:28:20 this is the first year of this position, and so the success or failure he has will influence whether it continues and, you know, what happens in the future. So, super cool. Let me get a little feedback from the audience here. Sam Orley says, good for Łukasz, he's great. I watched a bunch of videos he did on YouTube about making music with asyncio. I haven't seen those; I have to check them out. And Dean out in the live stream says, CEO of Python reminds me of a known joke in my country, where this famous newscaster was shouting, get me the person in charge of the internet. Get me the

Starting point is 00:28:55 person in charge of the internet. That's great. Dean, you have to let us know what country that is. That's awesome. All right, Brian, you're with the next one? What's that? You're next. No, you already did this, right? Yeah, David's next. I got to keep track of what's happening here. David, you're next. Okay.

Starting point is 00:29:16 So my next item is a library or framework, I'm not sure which one it falls under, called Dagster. It is a data orchestrator for machine learning, analytics, and ETL. It was my first attempt at building any data pipeline. It's based in Python, so you programmatically build up your pipeline using Python, and you use different decorators depending on what you're building in the pipeline, whether that's a solid or configuration. It took a little bit to kind of wrap my

Starting point is 00:29:48 head around it. I think that had more to do with just understanding how pipelines are typically constructed in industry, but once I got my head wrapped around it, it was really simple to use. I felt like I could produce things pretty quickly. One really nice thing is that they allow you to essentially work on your pipeline locally, then deploy to production, to something like Kubernetes, or you can deploy to Airflow or Dask or whatever underlying engine you want to run your pipeline. And there's very little transition there. You're not developing something locally and having to completely change it for a cluster or larger scale. And another really nice feature it has is a UI called

Starting point is 00:30:33 Dagit. So you could do everything via the command line if you want to, but it does come with a really nice UI that gives you an overview of your pipeline. It allows you to test it using the playground. You can update your configuration in the playground. You can look at previous runs to see if they passed or failed. It gives detailed logging and error messaging. This by itself is pretty nice on top of an already very nice tool. I can give a quick demo too. So this is, I think, the first part of their

Starting point is 00:31:10 tutorial, where you have multiple solids. These represent different pieces of processing. And then, like I said, you can use the playground. It'll check all of your configuration to make sure it's correct before it lets you run anything. So if you have something misconfigured, it's not going to blow up halfway through, you know, a 30-minute job. And then when you like that... Oh, no, no. So I'll probably forgo the real

Starting point is 00:31:37 time demonstration. I think my terminal probably died, is what that was. But yeah, it will actually show a run in sequence and show the different pieces that are completing and feeding into the other pieces too. It's not so much for this one because it's a very small, quick pipeline, but if you have longer SQL queries or something like that, it'll actually show in real time how it's processing.
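Dagster's own decorators aren't shown here. As a rough, hypothetical illustration of the idea David describes, small Python functions registered with a decorator and run in order so each step's output feeds the next, here is a plain-Python sketch. The `step` decorator, the `STEPS` registry, and `run_pipeline` are made up for this example and are not Dagster's (or Airflow's) API; the inline rows stand in for a real CSV or SQL source.

```python
from typing import Any, Callable

# Hypothetical registry of pipeline steps, kept in definition order.
# Real orchestrators such as Dagster track a dependency graph instead of a flat list.
STEPS: list[Callable[[Any], Any]] = []


def step(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Register func as the next step of a linear pipeline."""
    STEPS.append(func)
    return func


@step
def load_rows(_: Any) -> list[dict[str, str]]:
    # Stand-in for "load the CSV": inline rows instead of a file.
    return [{"day": "mon", "revenue": "120"}, {"day": "tue", "revenue": "80"}]


@step
def parse_revenue(rows: list[dict[str, str]]) -> list[int]:
    # Convert the text column to integers.
    return [int(row["revenue"]) for row in rows]


@step
def total_revenue(amounts: list[int]) -> int:
    # The "report" at the end of the pipeline.
    return sum(amounts)


def run_pipeline(initial: Any = None) -> Any:
    """Pass each step's return value straight into the next step."""
    value = initial
    for func in STEPS:
        value = func(value)
    return value


print(run_pipeline())  # 200
```

Each step stays tiny and testable on its own; scheduling, retries, config validation, and running the steps somewhere other than your laptop are the kind of work an orchestrator takes over.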

Starting point is 00:31:59 You can get a visual intuition of what's going on on top of everything else too. Yeah. There are a couple of resources around this too, if you want someone who explains it a little bit better than I do: the Data Engineering Podcast had an episode, and Software Engineering Daily also did an episode about Dagster. So, you know, that's kind of where I first learned about it

Starting point is 00:32:19 and there's a lot of really good information in those podcasts. Yeah, these data pipeline frameworks are super interesting. I've certainly realized just how valuable they can be. Dean asks, David, how does this compare to Airflow? Do you have any idea?

Starting point is 00:32:32 Have you tried? Have you looked at either? I haven't used Airflow. This is my first stab at any kind of data pipeline. And in my current job, we're not using Airflow or Dagster;

Starting point is 00:32:42 we're using one of the cloud-based tools. I think Airflow is more draggy, droppy, more visual, but I could be wrong about that. One thing I really like about Dagster, at least compared to what I'm currently using, is that you can programmatically create these interfaces. And technically, the tool I'm using now has an API that you can throw JSON against to create your different resources and everything.

Starting point is 00:33:06 But it's nice having Python code because that works a little bit better with my brain than a lot of the draggy, droppy stuff. Yeah, yeah. I did have the Airflow folks on the show, on Talk Python, not this show,

Starting point is 00:33:19 a little while ago. It's not out yet, but last week maybe? And they pointed out that it's mostly, it's like pretty much all Python here as well. So you program it in Python over on Airflow and then you have similar visual tools to actually see what's happening, but you can't interact with it through those things.

Starting point is 00:33:39 You can just like kind of watch it and debug it and stuff, from my understanding. So I would put them in a pretty similar category. I would say one thing that's pretty interesting is... actually, that's not what I would pull out. The Airflow GitHub is what I wanted to sort of point out. I was really surprised to learn that Airflow has 22,000 stars on GitHub, which kind of blew my mind. I thought of it as this little framework that people might use; apparently it's popular. I'm not really sure about Dagster, I guess I could look as well. I think it's relatively new, so I'd be surprised if it were quite as popular as Airflow. But one nice thing that

Starting point is 00:34:09 it can do: if you have Airflow pipelines that you're using, you can use that server to run Dagster, too. It can basically give you something that's compatible with Airflow if you need to do that. So there are a couple of different ways, I think, that you can translate it. So it seems like a pretty interesting tool. And like I said, I had developed a small pipeline in my previous job as my first stab at pipelines, to eliminate an Excel sheet that was doing a bunch of horrible, awful SQL queries.

Starting point is 00:34:41 I could just imagine that people were trying to do this with Excel and it was probably wrong. Oh, it was. Not necessarily incorrect, but it was wrong to do it. Well, it was, it was interesting. Excel is just very interesting to reverse engineer. It's a lot of go-to statements. It's ubiquitous, but it's definitely, as far as, you know, programming production

Starting point is 00:35:01 systems, not a good tool. So. Yeah. Yeah. Very cool. All right. So I got some more real-time updates here. Teddy says, I know one of the big differences with Airflow is

Starting point is 00:35:09 that you can use the output of a task as the input of the next task. From what I understand, Dagster is kind of a second-generation data orchestrator, unsure which generation Airflow would be, but here we go. And Airflow mostly assumes you store and load data in each task, even though Airflow has something called XCom, which allows you to pass the output as input of the next. Okay. Interesting. Yeah. Thanks for all that background info there. I haven't used either, but I definitely think they're both neat. And I feel there are a lot of places that are just like, well, how else are we going to do it? Of course we're going to use that spreadsheet, right? And if they had tools like this, it would be very

Starting point is 00:35:44 empowering. One of the things I find very interesting about these frameworks is that what you usually end up building is a little piece, like load the CSV into the database, or run the report that gets me the revenue for the day. What you end up building are very, very small pieces. And you don't have to worry about the reusability, the reproducibility, the durability. You just go, I'm going to build an incredibly small bit of Python, and we'll just click it in as part of this workflow, which really seems to empower people, almost like the microservices story but for data processing, without all the hard deployment side of things. I hope that, if they don't already have it, they put a tool connected

Starting point is 00:36:23 with Dagster called Dagnabit, because it needs to be there. Maybe some sort of capture tool or something. Dagnabit would be good. Yeah, yeah, I love the UI bit of it as well. All right, quick bit of follow-up. I guess, Brian, do you want to start? You got any extras today? I've got just a vanity extra. So one of the things that we noticed, Will mentioned Textual, we talked about Textual briefly. The stars on Textual are just going through the roof. I love the graph.

Starting point is 00:36:58 Is this the XKCD format of Matplotlib or something? What is this? I have no idea what it is. Yeah, that's great though. Anyway, show us the other pictures. Yeah, the stars are insane. It's like a vertical line on a graph. One of my own projects has a similar trajectory.

Starting point is 00:37:16 So I wanted to just highlight that. It's looking up too. Of course, I only have 16 stars. Will has like 3,000. A little different, but still, look. It's kind of the same, don't you think? Yeah, that's awesome.

Starting point is 00:37:34 It's 16 stars most of my repos. You just got to extrapolate it a little bit. No, that's really cool. Awesome. David, do you have any extra stuff you want to throw out? Sorry, Brian. I had one extra. I didn't load it on my screen over here. Let me see if I can pop it over real quick. This isn't Python, but I know SQL and Python tend to play together a lot.

Starting point is 00:37:53 Are you going to go back to some nostalgic time on the internet where you opened up a DOS prompt and typed win to start Windows? What is this? This is Modern SQL. It's a really fantastic slideshow that goes through a lot of updates, so if you're still doing SQL the old-fashioned way, it shows you how you can replace that with, you know, better, cleaner, more concise versions. There are so many things in here that I was doing horrible, hacky tricks to get working that you could take

Starting point is 00:38:22 care of in one line of SQL. Even with some of the newer things I've learned, there are just so many great, I don't know if you call them tools or methods or what. I've found Python and SQL tend to work together a lot, especially in the data space. If you're like me and have some self-taught SQL experience, something like this can be very helpful to learn some of the, I guess, better practices for different things

Starting point is 00:38:48 that you might want to try to do with SQL. No, this is great, because I learned SQL in the 90s. So it's changed a lot since then. And I was just thinking the same thing, Brian. It's been at least 10 years since I've tried to refresh my SQL skills. So there's probably a lot of stuff where, oh,

Starting point is 00:39:05 you shouldn't do this. Like, oh, why do you do this? If you use this other keyword, it's more efficient, safer, faster.
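The slideshow isn't reproduced here, but as one illustrative example of the kind of rewrite it's about (our choice, not taken from the slides), here is a running total computed two ways with Python's built-in sqlite3 module: an older correlated-subquery style and a window function. It assumes the bundled SQLite is 3.25 or newer, the release that added window functions; the `sales` table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 120), (2, 80), (3, 50)])

# Older style: a correlated subquery re-sums the table for every row.
old = conn.execute(
    """
    SELECT s.day,
           (SELECT SUM(t.amount) FROM sales AS t WHERE t.day <= s.day) AS running
    FROM sales AS s
    ORDER BY s.day
    """
).fetchall()

# Window function: the running total is a single expression.
new = conn.execute(
    "SELECT day, SUM(amount) OVER (ORDER BY day) AS running FROM sales ORDER BY day"
).fetchall()

print(new)  # [(1, 120), (2, 200), (3, 250)]
```

Both queries return the same rows here; the window version states the intent ("sum so far, in day order") directly instead of rebuilding it with a self-join.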

Starting point is 00:39:10 Come on. Yeah. I'm kind of jealous of the people learning SQL now. Yeah. How about you, Michael? Got any extras? I've got some follow-up,

Starting point is 00:39:20 some follow-up from last time. This comes to us from John Hagan. And I think I'm probably the one who said this. I said, oh, there's a really cool thing I like: being able to use lowercase d dict and lowercase l list as type hints rather than from typing import capital L List or capital D Dict, right? I said, oh, that's coming in 3.10, fantastic. He's like, you know, that's in 3.9, so it's kind of already out. Oh, right. Okay. But he did point out some things that are coming that are neat. So for example, previously, if I wanted something potentially optional, where it could be None or it could be a list, and the list, if it is a list, has strings, you had to say Optional

Starting point is 00:39:59 bracket List bracket str. And those are all capitals because they have this parallel type implementation over in typing, right? In Python 3.9, I can now say Optional of lowercase l list, bracket str. And you might think, who cares if it's a lowercase or uppercase L? Well, the difference is you don't have to do an import and explain to people who don't know that code, like, oh, you've got to go import this other type thing to say the type. Yes, I know list is right there, but you can't use list. You've got to do something else, right? So that's the feature I was excited about, that I said was in 3.10, but it's in 3.9. So hooray. But he also pointed out that the union syntax was simplified. It used to be you would have a similar syntax for union as for optional. You

Starting point is 00:40:39 would say Union, bracket, one thing, comma, the other thing. But now you can just say type one, pipe (vertical bar), type two. And this actually allows us to model optional without importing Optional. So instead of Optional of List of str, we can just have list of str pipe None. Yeah, this is cool. And I'm glad somebody pointed it out, because the 3.10 announcements don't say anything about optional, but in effect they do. You don't have to use it anymore. But are you going to start using this?

Starting point is 00:41:11 The pipe thing? Well, yeah. And the optional thing. Because I started to. And then I realized that if I start using that, then my code is 3.10 only. Yes, exactly. Which depends on the scenario, right? So for, say, Talk Python Training,

Starting point is 00:41:26 all the code behind that, I control the server. Yeah, nobody's looking at it. It's easy for me to make it the brand-new thing. If I were going to build an example app for a course, then I would be hesitant to use this right away. I might wait a year or two

Starting point is 00:41:40 because I don't want people to have a bad experience, like, well, I have 3.9, that's pretty new, that should work. Like, nope, that doesn't work, because I didn't want to say that word Optional, right? Yeah. And if it was an open source project, I guess it would depend on whether I wanted to support older versions, probably even longer there. Wait, I don't know, what do you think? Yeah, I'm always thinking, for a library specifically, you'd probably want to almost stick with the 3.5 to 3,

Starting point is 00:42:05 at least for a while, to kind of flush out people that are using some of the older versions of Python. Yeah, I think 3.9, I'm using 3.9 on everything now, but I think for a lot of people, that's still pretty aggressive to have a 3.9 or higher requirement for a library.

Starting point is 00:42:21 Yeah, I agree. A couple of bits of real-time feedback out there: Sam and Dean both say there are dunder future imports that you can do now that will enable some of this stuff already, so like from __future__ import pipe. I don't know if that's true or if it's a joke. Well, I do know that the dunder future stuff does support the newer type information. I don't know about for pipe.
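For the dunder future question, here is a minimal sketch of what PEP 563's `from __future__ import annotations` (available since Python 3.7) does: it stores every annotation as a string instead of evaluating it, so lowercase generics like `list[str]`, and even `dict[str, int] | None`, parse inside annotations on older interpreters. There is no `__future__` feature for the pipe itself, and anything that evaluates the annotations at runtime (for example `typing.get_type_hints`) still needs an interpreter that supports the syntax. The `tally` function is just an illustration.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings


def tally(words: list[str], counts: dict[str, int] | None = None) -> dict[str, int]:
    """Count occurrences of each word, optionally adding to existing counts."""
    counts = {} if counts is None else counts
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


# The annotation is kept as text; nothing evaluated list[str] at definition time.
print(tally.__annotations__["words"])  # list[str]
print(tally(["py", "bytes", "py"]))  # {'py': 2, 'bytes': 1}
```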

Starting point is 00:42:48 Yeah. Okay. We can do some after coding on this. Coding after the recording and we'll know. Oh, Dean says he's kidding. Yeah. But you really can. Thank you.

Starting point is 00:43:01 You really can do some of this other type information with the dunder future imports. Okay. Are you ready for a joke? Yeah. All right, Brian. So you're going to have to help me along here. Okay. So there's two developers staring very worried at a screen.

Starting point is 00:43:18 They have one section, then a big, long, quiet section, and then some more. So you be the very first person, and I'll be the second person here. Okay. Okay. I hope it works. Do not hope. Pray.

Starting point is 00:43:32 Pray it works. Have you ever been there, just in this situation where you're just like, oh, it must work. If this doesn't work, we're done. Yeah. Yeah, not so much on the software side of things, but when I was a manufacturing engineer, there were so many times we'd be troubleshooting a machine on a Saturday for eight hours straight, and you think you made it, and everybody's just holding their breath, crossing their fingers: work, work, because I want to go home someday. So yeah. I mean, I remember how

Starting point is 00:43:59 Go ahead, Brian. No, I definitely feel this when you're working on C++ code, because you have to wait for it to compile, then load it, and then test it and stuff like that. But even with Python stuff, I still feel this when I'm working on CI tools, because with continuous integration, you have to, you know,

Starting point is 00:44:18 you're not sure if you got it right, the syntax right, the YAML right or whatever, until you push it and see what happens. Yeah. Yeah, CI is a good point. You have so little visibility in there if it's not working. Just one more bit of real-time follow-up on mine here. If you come over here and look at PEP 585, it does say, on the implementation of some of these new features under typing, this is the one that's

Starting point is 00:44:41 coming out, well, that came out in 3.9, it says you can say from __future__ import annotations and then start using lowercase l list and things like lowercase d dict. Who knows? I know Dean said he was joking, but maybe you really can't get the pipe to come out that way. But at least you can do these sort of 3.9-level changes going back to 3.7, it looks like. Okay, all right, cool. Well, that was a lot of fun. Yeah, it was. I had another one, but I'm going to save it. Good.

Starting point is 00:45:08 All right, well, I'm looking forward to hearing about it next week. David, thank you for joining us. Thank you for having me. Yeah, yeah. And thanks for all the tips and stuff you've had

Starting point is 00:45:16 throughout the years. And yeah, it's really good to have you here. And congratulations on your first dev job. That's fantastic. That is fantastic. And thanks, Dean,

Starting point is 00:45:27 for correcting us in real time. That's awesome. It's good. Yeah, absolutely. Yeah, thank you, everyone. And oh, Sam does sadly show us that import pipe from the future doesn't work. But yeah, thanks, everyone.

Starting point is 00:45:40 See you all later. Bye. Well, thank you. Thanks for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. Get the full show notes over at PythonBytes.fm. If you have a news item we should cover,

Starting point is 00:45:55 just visit PythonBytes.fm and click Submit in the nav bar. We're always on the lookout for sharing something cool. If you want to join us for the live recording, just visit the website and click Livestream to get notified of when our next episode goes live. That's usually happening at noon Pacific on Wednesdays over at YouTube. On behalf of myself and Brian Ocken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.


Topics covered in this episode: mktestdocs; Redis powered queues (QR3); 25 Pandas Functions You Didn’t Know Existed; FastAPI and Rich Tracebacks in Development; Dev in Residence; Dagster; Extras; Joke ... See the full show notes for this episode on the website at pythonbytes.fm/246
