Python Bytes - #246 Love your crashes, use Rich to beautify tracebacks
Episode Date: August 11, 2021
Topics covered in this episode: mktestdocs; Redis powered queues (QR3); 25 Pandas Functions You Didn’t Know Existed; FastAPI and Rich Tracebacks in Development; Dev in Residence; Dagster; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/246
Transcript
Hey there, thanks for listening.
Before we jump into this episode,
I just want to remind you that this episode
is brought to you by us over at TalkPython Training
and Brian through his pytest book.
So if you want to get hands-on
and learn something with Python,
be sure to consider our courses over at TalkPython Training.
Visit them via pythonbytes.fm/courses.
And if you're looking to do testing
and get better with pytest,
check out Brian's book at pythonbytes.fm/pytest.
Enjoy the episode.
Welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 246, recorded August 11th, 2021.
I'm Michael Kennedy.
And I'm Brian Okken.
And I'm David Smith.
Hey, David Smith. Welcome.
So good to have you here.
It's good to be here.
Yeah, you've been a suggester of topics, I believe.
You've sent in some ideas and thoughts for us.
And well, we're going to get a good dose of that today for sure.
But honestly, if I'd known that you were going to open this up,
I probably would have saved some of those,
because it was a little bit of a scramble to be like,
oh yeah, I already gave them that tip.
We already gave them that tip.
So yeah, I had to dig a little bit.
You've already shared all your favorites.
Well, your loss is our gain, because you've made it easier for us in the past.
So thanks for sharing those things.
And yeah, thanks for being here.
It's going to be great to have you.
Definitely.
That was great.
Yeah, want to give us the quick elevator pitch on you?
What should people know about you?
Well, I'm a recent tech convert, I'll say.
Over the last 10 years, I've been working in the manufacturing space, either in quality engineering or manufacturing engineering. And
over the last couple of years, I've been using Python a lot more heavily. I used to do a lot
of VBA in Excel, which was painful. And I got a suggestion from one of our equipment suppliers:
hey, use Python, it's really, really nice. I resisted because I didn't want to learn
something new. It seemed intimidating because it's a programming language,
and I'm not a programmer, but I finally caved when it came to trying to automate plotting,
which is pretty painful in Excel.
Yeah, once I started on it and had something useful working in a couple of hours,
I was hooked. Then I started looking for more and more resources, found
your show, and got more and more into it from there.
I started digging into the web, and it's just been, I'd say, an upward spiral from there.
And about two and a half weeks ago, I started my first, I guess, official tech role, in a similar kind of domain, for an automotive supplier doing engineering work. So it's been really exciting to be able to use Python full time as part of my job,
because, you know,
the bits and pieces of time
I got to use Python before
were always the parts
I liked the most.
So I'm happy to be doing it,
you know, on purpose.
Awesome.
Congratulations.
I wish I could do it full time.
I remember my first full-time
software development job.
I was like, I can't believe
they're paying me to do this.
I'd better figure this stuff out
before they fire me.
I can't believe I'm doing this.
It was so great.
So good.
All right, well, congratulations and happy to have you here.
Brian, I feel like we should document this.
Definitely should document it and test our docs too.
So one of the things I'd like to tie,
did I just try to edit?
There we go.
Something that came up recently was Vincent Warmerdam.
I think we've had him on the show.
A couple episodes ago, yeah.
Yeah.
So Vincent announced that he's got a library called mktestdocs.
And I kind of love this.
So the idea is it's a bunch of utilities that you can use to help test your documentation.
It doesn't do it right out of the box.
You have to create your own test files to do this.
But the idea, like the first example that he shows in his README,
is that you've got a Markdown file,
and it's got some Python code blocks in it.
And you can make a test that goes through, reads the Markdown, grabs the Python code,
and runs it.
And if there are any problems with it, if there are any exceptions, it fails the test.
This is just brilliant.
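A minimal, standard-library-only sketch of the idea described above: pull the Python-tagged fenced blocks out of a Markdown string and execute each one, so any exception, including a failed assert, fails the check. This is not mktestdocs' own code; in practice you'd use the helpers the library ships (its README shows `check_md_file` used inside a pytest-parametrized test), and the function names here are hypothetical.

```python
import re
import textwrap
from typing import List

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly

# Non-greedy: capture everything between a python-tagged opening fence and the next closing fence.
PY_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def grab_python_blocks(markdown: str) -> List[str]:
    """Return the source of every python-tagged fenced block in a Markdown string."""
    return [textwrap.dedent(block) for block in PY_BLOCK.findall(markdown)]


def check_markdown(markdown: str) -> None:
    """Run each block in its own fresh namespace. Any exception propagates,
    including AssertionError from an assert written in the docs, which is
    what would make a surrounding test fail."""
    for source in grab_python_blocks(markdown):
        exec(compile(source, "<markdown>", "exec"), {})


if __name__ == "__main__":
    sample = "\n".join([
        "Some prose.",
        FENCE + "python",
        "total = sum([1, 2, 3])",
        "assert total == 6  # asserts in docs become checks",
        FENCE,
    ])
    check_markdown(sample)
    print("all Python blocks in the sample ran cleanly")
```

Running each block in a separate namespace keeps examples independent; a variant that shares one namespace would let later blocks build on earlier ones.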
There are examples in here for doing it with docstrings, even class docstrings. And Vincent also runs calmcode,
and he did a little calmcode video on how to use this.
Yeah, and you're putting that in the show notes
for people, right, to check out?
Yep, there's a link to the tutorial with the video.
The suggestion, or the use case he was talking about
at first, was that maybe you're using MkDocs for documentation.
Therefore, you've got a bunch of Markdown.
But my use case is going to be blogs.
So I think that's a huge use case, actually.
Yeah, I've got Python code in my blog source code.
It's Markdown files.
Trying this is on my to-do list,
to make sure that the blog content is accurate. So that is super cool.
You know, one more thing that you might find interesting: I think this is more of a true software engineering type of solution,
but another, sort of WYSIWYG, as-you-work style of solution is PyCharm. If you have a Markdown
file with Python code in it,
it'll highlight the errors and actually show you if symbols are missing
and stuff.
So if you had the Markdown associated with the sample code, and then you
do stuff with your little examples,
it may actually show you the errors live as well.
Oh, that's cool.
Yeah.
I mean, that's not a CI, keep-it-fixed sort of thing,
but it's an as-you-type kind of thing.
Yeah. And the other comment he had: I normally don't put asserts checking that things are valid in documentation, but the README points out that if you
put asserts in there, they'll get checked too. So you've got unit tests built into your
documentation.
Super cool. David, what do you think?
It's interesting. I'm just
trying to figure out:
are you doing a parametrized test
and looking at your inputs versus outputs
for the code that's in the documentation,
or how do you actually know it's
testing correctly?
Is it just checking that it's valid Python?
The little code snippet we're showing on the screen is in the
chat, and there's also a link to the README in the show notes.
The parametrization it uses in this example says, go look in my docs folder.
And every Markdown file it finds in there shows up as a parametrization of the test.
So this test will run once per file.
So if I've got three Markdown files in there, the test will run three times.
This is the most comprehensive and yet extremely short test I've seen.
Really.
Three lines, and it will basically traverse a whole Markdown
file hierarchy type thing.
Oh, I do tons of really tiny tests, so yeah.
Yeah, nice, nice, nice. All right,
Avaro, welcome to the live stream. Happy to have you here. Let's see, let's move on to the next one.
Speaking of listeners giving us ideas and helping us out here, I want to talk about
something that I've been hanging on to for a little while, since March, but I finally decided
it's time to talk about it, and that is creating queues: out-of-process, sort of asynchronous
queue processing. So if I've got, say, a web app or an API, or even if I'm testing a bunch of hardware and I want to kick off a bunch of jobs, I don't necessarily want to block on all of them.
I might want to push them down so other things can work on them.
You know, if I'm going to send a bunch of emails, if you've ever tried to send a thousand emails in order synchronously, it turns out that times out your web request.
Don't do that.
So a better idea would be to push them to a queue and have some sort of background process go, oh, there are new emails
to send, let me jam those on down the line. So Scot Hacker sent over a pointer to this library,
a small but cool little one called QR3. QR3 is a queue for Redis, and the 3
means Python 3, because there used to be a QR
that's not Python 3 compatible.
So here's like a reimagining of that for Python 3
or just a compatibility that got moved over.
So it's pretty cool.
Let's check it out.
The API and implementation or the usage
is quite simple as you could imagine.
So all you've got to do is, well,
it's built on redis-py.
You've got to have Redis installed.
That could be, you know, wherever.
It could even be Redis as a service
on one of these cloud platforms,
or run it in Docker, or run it locally.
Then you have redis-py,
and then you just go over and you create a queue.
So you just say queue and you give it a name
and then some server connect info
like location and authentication and whatnot.
And then all you've got to do is you push items to it.
They could be just really simple things like a bunch of email addresses you're going to send.
But it could also be really complicated.
Like, for example, it could be, say, Pydantic models that store all the data that you need to process that request.
So that's pretty cool.
Its default way of serializing the data you send over is cPickle.
cPickle is the faster C implementation of pickle, but it still has issues and restrictions.
Some of the restrictions are you can't put certain types of objects.
Like it wouldn't make sense to serialize a database connection that has an open socket
or a thread or some weird thing like that, right?
But most of the sort of message, here's the
data you need to process, you would send over, all that stuff would work. And you can also create your
own serializer on a per-queue basis, which is kind of cool. So if you said, I only want to work with
Pydantic models, you could put in the to-dictionary and from-dictionary transformation with the
validation and all that kind of stuff. I personally would not use cPickle, because one of the things you can run into is: you upgrade
your version of Python on one server but not the other, because you're in the process of going from
one to the other, and something has a different structure in memory and gets put over there.
Then the other ones can't read it. There are always these challenges with pure binary
matches. So I don't know that I would do that. I'd probably serialize as JSON or something
and deserialize it back.
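A sketch of the "serialize as JSON instead of pickle" suggestion. It assumes QR3 accepts a per-queue serializer object exposing pickle-style `dumps` and `loads`; that interface, and the `EmailJob` message type, are assumptions for illustration rather than QR3's documented API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EmailJob:
    """Hypothetical message type: just the data a worker needs to send one email."""
    to: str
    subject: str


class JsonSerializer:
    """Pickle-like dumps/loads pair that writes plain JSON text, so producers and
    consumers on different Python versions can still read each other's messages."""

    @staticmethod
    def dumps(obj) -> str:
        return json.dumps(obj, sort_keys=True)

    @staticmethod
    def loads(data):
        # Redis clients typically hand back bytes; json.loads accepts str or bytes.
        return json.loads(data)


job = EmailJob(to="someone@example.com", subject="Welcome")
wire = JsonSerializer.dumps(asdict(job))                       # what would be stored in Redis
restored = EmailJob(**JsonSerializer.loads(wire.encode("utf-8")))
print(restored == job)  # True
```

The trade-off is that only JSON-friendly data survives the trip, so richer objects (Pydantic models, dataclasses) need an explicit to-dict and from-dict step like the one above.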
But anyway, it's pretty cool.
What do you guys think?
This looks nice.
I actually haven't used queues in Python before,
but it's on my to-do list
because I mean, designing complex systems,
breaking it up into different processes
with queues back and forth is a cool way to do it.
Yeah, I'm kind of inspired by this. I kind of want to do more stuff with queues as well. David?
Oh, it seems like a really clean, simple way to use queues. I'm with Brian, I haven't really
used it in a Python context before. But like the examples he gave are perfect. You know,
emails are, they take a long time. So you don't want to be binding up your main application,
you dump those off into a background task. And this looks really, really simple to use. So, you know, it seems like it'd be worth a try for sure.
Yeah, for sure. Other things are like you need to generate a report that takes 30 seconds,
you know, kick off the generation and then see if it's in the database and just do some sort of like
Ajax poll until it's there or whatever. It has some more features. So it has a queue,
which is first in first out, as you can
imagine. It has what they call a capped collection. I feel like it should be called a capped
queue, even though it's implemented behind the scenes as a capped collection. They also say bounded
queue is another AKA. So the idea is if you're doing, like, analytics and logging, and you're trying
to eventually process that and save it to the database, but you want to say, you know what,
we really don't want this queue to get more than 100,000 items at a time
because we should be writing this to the database.
And if something goes wrong, it can completely wreck the server.
So you can create these capped queues where you're like,
I'm going to start throwing away old stuff if we don't get to it in time.
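The capped (bounded) behavior described above can be seen in-process with the standard library's `collections.deque` and its `maxlen` argument. This only illustrates the drop-the-oldest semantics; QR3's capped queue lives in Redis, and its exact eviction details aren't covered here.

```python
from collections import deque

# Once maxlen items are held, each append silently discards the oldest item
# at the opposite end, so memory stays bounded even if consumers fall behind.
events = deque(maxlen=3)
for n in range(5):
    events.append(f"event-{n}")

print(list(events))  # ['event-2', 'event-3', 'event-4']
```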
There's a deque, which to me sounds like dequeuing, getting stuff out of a queue.
But oh no, it's a double-ended queue.
A double-ended queue.
Anyway, the idea is you can put stuff onto the front or the
back, and you can pop stuff off the front and the back. So you could, for example, put low-priority
items on the back, or if something's really important, you could kick it right up to the
front of the queue. And you can also do a stack. You can also do a
priority queue, which is sort of pretty close to what I described, but you can't jump ahead of
the things that have the same priority, right? Like if there's super urgent and then low, you
can put a new super-urgent thing behind the other super-urgent ones, but it would appear
before all the others, things like that. So this is all pretty neat. What I really like about this:
obviously, Python has queues built in, right? Like that's just a data type. A list itself could
basically be a queue. You can pop stuff off the front and shazam, you have a queue. But this is
out of process, right? That matters if you have to scale out your worker processes in any sort
of API, or you want the queue to be durable across app restarts, things like that. And if you
think, oh, I'm not going to scale out,
I don't have multiple servers:
almost every Python web app and web API
runs with multiple worker processes at a minimum.
So yeah, you're scaling out.
Anyway, I think this is pretty useful.
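For comparison with the Redis-backed versions, here are in-process analogues from the standard library: `collections.deque` for the double-ended queue, and a `heapq`-based priority queue. The `PriorityQueue` class is a hypothetical helper; its insertion-counter tie-breaker makes items with the same priority come out first-in, first-out, which is one reasonable reading of "you can't jump ahead of things with the same priority", not necessarily how QR3 orders ties.

```python
import heapq
import itertools
from collections import deque

# Double-ended queue ("deck"): push or pop at either end in O(1).
jobs = deque(["normal-1", "normal-2"])
jobs.append("low-priority")   # goes to the back of the line
jobs.appendleft("urgent")     # jumps straight to the front
print(jobs.popleft())         # urgent
print(jobs.pop())             # low-priority


class PriorityQueue:
    """Lower number = higher priority. A monotonically increasing counter breaks
    ties, so a new urgent job lands behind earlier urgent jobs but ahead of
    every lower-priority job."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, item, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        # The smallest (priority, insertion order) tuple comes out first.
        _, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self) -> int:
        return len(self._heap)


pq = PriorityQueue()
pq.push("report", priority=5)
pq.push("page-oncall", priority=0)
pq.push("restart-db", priority=0)
print([pq.pop() for _ in range(len(pq))])  # ['page-oncall', 'restart-db', 'report']
```

These live in one process's memory, which is exactly the limitation an out-of-process queue like QR3 removes.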
And if you're all about Redis, this is cool.
Redis seems nice.
I'm kind of inspired to do something like this with MongoDB,
but I'm also busy.
So probably not right away.
And John Sheehan out there in the live stream
is telling me that he learned a few years ago
that deque is pronounced "deck."
So yeah, double-ended.
Yeah.
All right.
So deck.
Thanks.
And then Teddy out in the live stream says,
I'm not too familiar with queues,
but if your queue processes execute Python code,
wouldn't they end up being processed sequentially because of the Python GIL?
So are you ending up with serial processing because of this?
I think it depends on just how you create the workers, right?
So there are two ends that you build. One end puts stuff in the queue.
Then you literally build the other end, which goes to the queue and says, give me the next item.
And that's stored in Redis, which obviously can support multiple clients. So if you just scaled out the consumers
of the queue messages, the things running the jobs, then you would escape the GIL, right?
Because you would have multiple processes. You can have multiple things feeding the queue as well.
Yes, multiple web requests or something. Yeah, absolutely, absolutely. All right,
David, what have you got for us?
All right, well, are either of you heavy pandas users?
I'm a pandas admirer, and I use it a little bit, but I always feel like when I come to pandas, I know there's
way more I should be doing with it. It's so cool, but I don't use it as much as I should.
Well, I used pandas pretty heavily in my
previous job to do a lot of analysis, especially on one-dimensional data sets. And it always
happens: when I first started using pandas, I was doing a lot of really bad things, like
iterrows and that type of thing. The more you learn about it, the better you get at doing
set-based operations. But even in the last couple of months, you'd think I'd have everything down, but the API is huge.
And I always had these aha moments
because I learned about something like transform.
And, you know, once I realized what you could do with transform,
it simplified so many things that I was doing.
And the first item I have is an article
called "25 Pandas Functions You Didn't Know Existed."
And I don't normally like these articles
because they feel a little bit clickbaity,
but this one actually had a handful of aha moments for me.
So I thought I would go ahead and share it.
So I have them listed in the show notes,
the aha moments for me.
First, between is really nice.
It's a method on a Series, and it basically allows you to simplify logic:
instead of saying greater than or equal to this and less than or equal to that, you can just say
between these values, very similar to the BETWEEN operator in SQL. Styler, I had
no idea existed. You can actually apply styles to the tables coming out of pandas. I do a lot to try to make my notebooks really, really pretty
so that I can convert them to
HTML or another format and share them
with the business. The business doesn't typically like
notebooks, but I'm trying, because
I can't stand the intermediate step of copying
to PowerPoint, and this
would definitely help. You can do
gradients, and there are a bunch
of different functions behind that.
Options is another one I've kind of played with a little bit.
There's one in here that I wanted to try before the show, but I hadn't had a chance:
you can change the plotting backend on pandas from Matplotlib to something else.
So at some point, I'm going to try changing it to Plotly, because that's my preferred plotting library for most things.
convert_dtypes is really nice. If you know you have a categorical type set of information, you can dramatically reduce how much memory is taken.
mask was a nice one. It basically allows you to quickly convert, it's somewhere down here,
particular values, or values that meet a criterion, to another value.
I was doing this oftentimes in multiple stages.
This would clean up that code significantly.
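A small sketch of three of the ideas mentioned, on made-up data: `Series.between`, `Series.mask`, and storing low-cardinality text as the `category` dtype. Note that `convert_dtypes()` itself converts columns to the best nullable dtypes; the memory savings for categorical data come from the `category` dtype, shown here via `astype`. The -999.0 sentinel is a hypothetical "sensor error" value.

```python
import pandas as pd

# Hypothetical measurement data; -999.0 stands in for a made-up "sensor error" sentinel.
df = pd.DataFrame(
    {
        "part": ["A", "B", "C", "D", "E"],
        "temp_c": [18.5, 22.0, 25.5, 30.1, -999.0],
    }
)

# between(): replaces (s >= 20) & (s <= 30); both bounds are inclusive by default.
in_spec = df[df["temp_c"].between(20, 30)]

# mask(): wherever the condition is True, swap in another value (NaN here).
df["temp_c"] = df["temp_c"].mask(df["temp_c"] == -999.0, float("nan"))

# Low-cardinality text stored as the "category" dtype keeps small integer codes
# plus one copy of each distinct label, which usually saves a lot of memory.
line_name = pd.Series(["east", "west"] * 5_000)
line_cat = line_name.astype("category")
print(line_name.memory_usage(deep=True), line_cat.memory_usage(deep=True))
```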
nsmallest and nlargest also could have been very helpful.
Essentially, it's similar to a max or a min,
but instead of pulling just a single value, you could pull, in this case, five.
And clip, and at_time.
Cool.
So if I want to see the five largest revenue-producing customers in my data frame,
I could just quickly do that.
Yeah.
Yep.
And like with anything else in pandas, there are ways you could use a couple of other
methods to get that done too.
But it's just so much cleaner to do diamonds.nlargest(5, "price").
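A sketch of `nlargest` and `nsmallest` on a hypothetical customer-revenue table, standing in for the diamonds example mentioned on air; the data is made up.

```python
import pandas as pd

# Hypothetical customer-revenue data, for illustration only.
sales = pd.DataFrame(
    {
        "customer": ["acme", "globex", "initech", "umbrella", "hooli", "stark"],
        "revenue": [120_000, 87_500, 45_000, 310_000, 99_000, 150_000],
    }
)

# One call each, instead of sort_values(...).head(n); rows come back ordered by revenue.
top3 = sales.nlargest(3, "revenue")
bottom2 = sales.nsmallest(2, "revenue")

print(top3["customer"].tolist())     # ['umbrella', 'stark', 'acme']
print(bottom2["customer"].tolist())  # ['initech', 'globex']
```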
It's just very clean and fast, instead of having multiple lines to do a
transformation and then another transformation and then another change. So I wanted to suggest this
article. Like I said, I've been doing pandas for a couple of years and I still have these moments,
and in this article, well, some of them maybe aren't quite aha moments for me. They may be aha moments
for someone else, because everybody probably knows 20 percent, and maybe a slightly different
20 percent, of the pandas API.
Yeah, this is really neat. I love these types of things. I mean, it's super
easy to just scan through and decide whether or not it's really helpful to you. The one
for me, the pandas one that had the biggest oh-my-goodness moment, was web scraping:
pulling HTML tables and turning those into data frames. So obviously
you can go with requests and Beautiful Soup and do something, but then you
still end up with just a table of HTML. With pandas, you can say read_html and then just give
me table three as a data frame. It's ridiculous.
Right, pandas has some really nice IO tools
around CSVs, Parquet, most of the data format types, and even some of the less common ones.
It's a really nice library overall. But yeah, like I said, there are always some aha moments,
and it's nice to have an article that highlights several aha moments for me.
Yeah, super cool. Go ahead, Brian.
The one that jumps right out at me was the number one. I didn't know
that you could just use ExcelWriter with pandas. That's pretty cool. And I think there's another
wrapper around ExcelWriter that kind of simplifies converting a data frame to Excel, but I think
ExcelWriter lets you do some more intricate things with Excel. Yeah, that's pretty cool.
Yeah, that's super cool.
All right, before we move on, really quick from the live stream: I liked it when you asked if anyone
uses pandas and likes it. Dean just said yes, all caps. Beautiful. But then he also
pointed out this project that he built, a gives-you-live-tips-while-you-work-with-pandas-in-notebooks type thing called dovpanda.
So I'm literally just checking this out now,
but as you work with it, you can see here,
it gives you little tips, like, oh, by the way,
did you know you can concatenate like this?
If you specified axis=1, you'd get, you know, such and such.
It gives you little tips and tricks as you work with it.
So people can check that out.
Yeah.
Yeah.
So, aha moments.
Exactly.
Exactly.
Thanks Dean.
Brian, I do love some FastAPI, and I love Rich,
and I'm looking forward to what you're going to do
by trying to put these together.
Yeah, well, I've been watching Rich, of course,
and FastAPI a lot.
And so this article is by Hayden Kotelman, I think,
and it's "FastAPI and Rich Tracebacks in Development."
So the idea is that one of the cool things that Rich has
is these awesome tracebacks and logging.
They're just beautiful.
And I mean, if you can say a traceback is beautiful,
it's because of Rich, probably.
They look pretty great.
And the logging is pretty good.
So I'm just going to scroll down to some of these examples at the bottom.
It's kind of tiny, but the logging is nice and colorized and stuff.
And then with the tracebacks and exceptions, one of the things is there's a highlighted line number.
It highlights the actual file name and puts in lower, you know, more muted colors
the stuff you don't really need to care about right away.
And it's just a nice way to do it.
You get syntax highlighting,
like keyword highlighting, in your code.
Yeah.
The code that is in the stack trace of a crash, in the traceback.
And so we've seen some examples of how to use the rich tracebacks
from other programs,
but I haven't seen it actually written up
by somebody else.
And so this is nice.
Using FastAPI is awesome
for building web APIs.
But how do you do this?
How do you get your application to do this?
And so I'm not going
to scroll through all of this, but the gist of it is there are really only a few steps. This
post walks through all of it with all the code, and for the most part you create a
data class with the logger configuration, and then you need a function that will either
install Rich as a handler or
use the production log configuration.
I like that he puts
this switch in place.
So the idea is that when you're debugging,
you're going to use
these nice tracebacks.
But when you're running in production,
it's not going to use that.
It's just going to do
the default logging.
And then you have to call logging.basicConfig
with the new settings. And then there's a little note that if you're using Uvicorn, you probably want
to override its logger as well. And that's it; that sets it up. And it's got all the code in
place so that your FastAPI application can have these lovely logs and tracebacks during development.
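A minimal sketch of the dev/production switch described above, assuming Rich is installed for the development branch (it uses Rich's `RichHandler` with `rich_tracebacks=True`). The `LoggerConfig` dataclass, its field names, and the Uvicorn override are hypothetical stand-ins for the article's code, not a copy of it; Uvicorn may also reconfigure its own loggers at startup, so where you call this matters.

```python
import logging
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoggerConfig:
    # Hypothetical container; the article's own field names may differ.
    handlers: List[logging.Handler] = field(default_factory=list)
    fmt: str = "%(levelname)s %(name)s: %(message)s"
    date_fmt: str = "%Y-%m-%d %H:%M:%S"
    level: int = logging.INFO


def get_logger_config(dev: bool) -> LoggerConfig:
    if dev:
        # Imported lazily so production doesn't need Rich installed.
        from rich.logging import RichHandler

        # Rich renders level, time, and path itself, so keep the format minimal.
        return LoggerConfig(
            handlers=[RichHandler(rich_tracebacks=True)],
            fmt="%(message)s",
            level=logging.DEBUG,
        )
    return LoggerConfig(handlers=[logging.StreamHandler()])


def setup_logging(dev: bool) -> LoggerConfig:
    config = get_logger_config(dev)
    logging.basicConfig(
        level=config.level,
        format=config.fmt,
        datefmt=config.date_fmt,
        handlers=config.handlers,
        force=True,  # replace handlers from any earlier setup (Python 3.8+)
    )
    # Assumed override: send Uvicorn's loggers to the same handlers, without
    # also propagating to the root logger (which would print everything twice).
    for name in ("uvicorn", "uvicorn.access"):
        logger = logging.getLogger(name)
        logger.handlers = list(config.handlers)
        logger.propagate = False
    return config
```

Call `setup_logging(dev=True)` locally to get Rich output and tracebacks; the production branch keeps a plain `StreamHandler`.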
Yeah, that's super neat. David, are you a fan of either of these frameworks?
I haven't had a chance to use Rich too much.
I have been watching Textual pretty closely on Twitter
because it's just phenomenal what he's been able to do.
How do you have a docking scrolling side thing
in a terminal window?
What's going on here?
I do.
I love FastAPI.
I built my wife's website using Flask
and I liked how FastAPI was similar to Flask in a lot of ways.
But, you know, some of the syntax was a little bit cleaner,
although with the newer version of Flask,
it kind of borrows some of the same syntax.
And it's just got a lot of really good necessities built in.
The API documentation, I think
that's kind of clutch when you're learning a new framework, too,
because you don't have to do curl commands or anything like that.
You can just bring up a web page and poke at it, you know, visually, which is pretty nice.
So no, I really like FastAPI.
I just, you know, other than building some small toy things, haven't had a really compelling reason to use it yet.
So yeah.
Yeah.
Very cool.
Toys are compelling reasons.
I think.
Definitely.
Definitely.
Maybe some Arduino thing could run a fast API server.
Who knows?
All right.
So let me talk about some good news.
Good news, good news.
We've had a couple of things we've covered
about some visionary sponsors coming on
to support Python and the PSF and so on,
which is fantastic, right?
I've certainly whinged a lot about people running,
you know, multi-billion-dollar revenue companies
and doing nothing really to give back
other than maybe a PR or something.
But we've got Microsoft, we've got Bloomberg,
we've got Google as visionary sponsors, right?
And one of the things that that made possible
is the CPython developer in residence.
I don't know if it's directly related to one of those
or if it's just sort of like that sort of brought it all
together. But recently the PSF
said they're going to have a developer
in residence position and
well-known community member,
friend of the show, Łukasz Langa,
has applied
and got hired. He's now the
developer in residence. This is a little bit old news,
it's from last month, but
I wanted to make sure we gave it a quick shout-out, because I think it's going to be pretty interesting to know
that there's a developer-side person inside the PSF making sure things keep moving. So the PSF has
seven, eight, nine, I don't know, something like that. I haven't got recent numbers,
including this position, full-time employees, right? So there's a bunch
of people who work there,
but to my knowledge, this is the first developer position,
rather than marketing, legal, whatever, right?
All of that, the sort of business, director,
administrative side.
So this is pretty interesting.
Apologies to everybody that works at the PSF
that's like, don't forget me.
Yeah, no, no, no.
Those are super important,
but it's interesting that there's not been a Python developer type of role within that group is all I'm saying.
So they put that out.
Łukasz Langa is now part of it.
And there are some interesting takeaways here.
So basically, let me just give a bit of a quote here for how Łukasz decided to sort of position this and how he sees
it. He said, I don't really want this to be like, hey, I am the, you know, the appointed CEO of
Python, so listen to what I have to say, right? But no, he's incredibly
hopeful for Python because of this, and that's why he wanted to apply for it. He
says, I think it's a role with transformational
potential for the project. In short, I believe the mission of the developer in residence, the DIR,
is to accelerate the developer experience of everybody else. And that includes not just the
core team but, most importantly, the drive-by contributors submitting pull
requests and creating issues on the tracker. So he's hoping that with this role, he can do things like make sure that there's a steady
review of the stream of PRs and issues so they don't get stale and there's not a backlog,
triage the issues, be present in the official communication channels to unblock people if
they get stuck trying to contribute, keep CI and test suites in a usable state and make
them run quickly,
and keep tabs on where the work is most needed in the projects that are most important.
So it sounds to me almost like he's the technical person in the room to help the community keep moving, just making sure, oh, everyone's having a problem. Many people are having
a problem trying to do a PR because they can't get CPython to build. Let's make that incredibly simple for them, things like that. Yeah. I like his attitude of where he's going with this.
So, yeah. Yeah. In case I didn't point it out, Łukasz is also the creator of Black,
the Black formatter, which I know we've talked about in a hundred thousand variations here.
So that's great. David, how do you feel about this?
I think it's great. Any full-time person they can have working for the PSF
or on Python directly is going to help increase stability.
And I like his approach too,
where he's going to try to increase throughput
by maximizing everybody else's efficiency.
I think that's a...
It'd be easy to say like,
oh, I'm going to work on these features or on this,
but he's most concerned about making development for Python
as ergonomic as possible,
which I think ultimately will create more throughput and, you know, a better, better
Python in the long run. Yeah. And absolutely props to the PSF because it's easy to hire somebody and
say, here's what I want you to produce for us. It's harder to hire somebody and say, I want you
to be an enabler of other people, because it's hard to measure that, right?
Yeah, one of the interesting
things that he's doing, and I'm not sure if he's going to keep this up, but it looks
like he has so far, is he puts out weekly report posts of what he's been doing. I can't
imagine having that much public scrutiny over what my work week looks like. I mean, Brian,
why did you spend so much time working on CI? Come on.
So, it's pretty impressive, and
it's cool that he's doing that.
The entire
Python world is watching. No pressure
or anything. Yeah, he did say
he was a little nervous about this, because
this is the first year
of this position, and so the success or
failure he has
will influence whether it continues and, you know, what happens in the future. So,
super cool. Let me get a little feedback from the audience here. Sam Orley says, good
for Łukasz, he's great. I watched a bunch of videos he did on YouTube about making music with asyncio.
I haven't seen those, I'll have to check them out. And Dean out in the live stream says, CEO of Python reminds me of a known joke in my country,
where this famous newscaster was shouting, get me the person in charge of the internet. Get me the
person in charge of the internet. That's great. Dean, you have to let us know what country that
is. That's awesome. All right, Brian, are you up with the next one? What's that?
You're next.
No, you already did this, right?
Yeah, David's next.
I got to keep track of what's happening here.
David, you're next.
Okay.
So my next item is a library or framework,
I'm not sure which one it falls under, called Dagster.
It is a data orchestrator for machine learning, analytics, and ETL.
It was one of the first tools I tried for any data pipeline. It's based in Python,
so you programmatically build up your pipeline using Python
and different decorators depending on if you're building a solid,
or depending on what you're building in the pipeline,
or if you're doing configuration, use different decorators um it took a little bit to kind of wrap my
head around it i think it had more to do with the just kind of understanding how pipelines are
typically constructed in industry but once i got my head wrapped around it it was really simple to
use i felt like i could produce things pretty quickly um one really nice thing that they do
is they you know allow you you to essentially work on your
pipeline locally, then deploy to production to like a Kubernetes, or you can deploy to Airflow
or Dask or whatever underlying engine you want to run your pipeline. And there's very little
transition there. You're not developing something local and having to completely change it for
like a cluster or larger scale. And another really nice feature it has is a UI called
Daggett. So you could do everything via the command line if you want to, but it does come
with a really nice UI that allows you to see an overview of your pipeline. It allows you to test it using the playground.
You can update your configuration in the playground.
You can look at previous runs to see if they pass or fail.
It gives detailed logging and error messaging.
This by itself is pretty nice on top of an already very nice tool.
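To make the decorator idea concrete, here is a library-free sketch, not Dagster code: the step decorator, the STEPS registry, and run_pipeline are hypothetical names invented for illustration, and Dagster's real decorators and execution model are much richer. It only shows the core pattern David describes, small processing steps declared with a decorator and chained so each step's output becomes the next step's input.

```python
from typing import Any, Callable, Dict, List

# Hypothetical step registry for illustration only; this is NOT Dagster's API.
STEPS: List[Callable[[Any], Any]] = []


def step(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Register a function as the next step of the pipeline, in definition order."""
    STEPS.append(func)
    return func


@step
def extract(_: Any) -> List[Dict[str, Any]]:
    # Stand-in for reading rows from a file or database.
    return [{"part": "A", "defects": 2}, {"part": "B", "defects": 0}]


@step
def transform(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Keep only the rows that actually recorded defects.
    return [row for row in rows if row["defects"] > 0]


@step
def load(rows: List[Dict[str, Any]]) -> int:
    # Stand-in for writing a report; here it just returns how many rows were "loaded".
    return len(rows)


def run_pipeline(steps: List[Callable[[Any], Any]]) -> Any:
    """Run each step in order, passing each step's output to the next step."""
    result: Any = None
    for fn in steps:
        result = fn(result)
    return result


print(run_pipeline(STEPS))  # prints 1
```

A real orchestrator adds what this leaves out, such as branching dependency graphs instead of a straight line, typed inputs and outputs, retries, and a UI for watching runs.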
I can give a quick demo too.
So this is, I think, the first part of the
tutorial they have, where you have multiple solids.
So these represent different pieces of processing.
And then, like I said, you can use the playground.
It'll check all of your configuration, everything
to make sure it's correct before it lets you run anything.
So if you have something misconfigured,
it's not going to blow up halfway through, you know, a 30-minute
job. And then when you run that... Oh, no, no. So I'll probably forgo the
real-time demonstration. I think my terminal probably died; that's what that was. But yeah, it will
actually show a run in sequence and show
the different pieces that are completing and
feeding into the other piece too.
It's not so much for this because it's a very small,
quick pipeline, but if you have
longer SQL queries or something like that,
it'll actually show in real-time how it's processing.
You can get a visual intuition for what's going
on, on top of everything else.
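The up-front configuration check David mentions is what keeps a misconfigured job from dying halfway through a 30-minute run. A minimal sketch of the idea, assuming a plain dict config and a hypothetical REQUIRED_KEYS schema (this is not Dagster's config system): validate everything first, and refuse to start if anything is wrong.

```python
from typing import Any, Dict, List

# Hypothetical schema for illustration: required keys and their expected types.
REQUIRED_KEYS: Dict[str, type] = {"source_path": str, "batch_size": int}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config is valid."""
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    return errors


def run_job(config: Dict[str, Any]) -> str:
    problems = validate_config(config)
    if problems:
        # Refuse to start at all, rather than failing partway through the job.
        raise ValueError("; ".join(problems))
    return f"processing {config['source_path']} in batches of {config['batch_size']}"
```

Calling run_job({"source_path": "data.csv"}) raises ValueError("missing key: batch_size") before any work begins.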
Yeah. There are a couple of resources around this too.
If you want someone who explains it a little better
than I do, the Data Engineering Podcast had an episode,
and Software Engineering Daily also did an episode
about Dagster.
So, you know, that's kind of where I first learned about it
and there's a lot of really good information
in those podcasts.
Yeah, these data pipeline frameworks are super interesting.
I've certainly realized
just how valuable they can be.
Dean asks,
David, how does this compare to Airflow?
Do you have any idea?
Have you tried?
Have you looked at either?
I haven't used Airflow.
This was my first stab at any kind of data pipeline.
And in my current job,
we're not using Airflow or Dagster;
we're using one of the cloud-based tools.
I think Airflow is more draggy-droppy, more visual,
but I could be wrong about that.
One thing I really like about Dagster,
at least compared to what I'm currently using,
is that you can create these interfaces programmatically.
And technically, the tool I'm using now has an API that you can throw JSON against
to create your different resources and everything.
But it's nice having
Python code because that works a little bit better with
my brain than a lot of the draggy, droppy
stuff.
Yeah, yeah.
I did have the
Airflow folks on TalkPython,
not this show, a little while ago, last week maybe.
It's not out yet. And they
pointed out that it's
pretty much all Python there as well.
So you program Airflow in Python,
and then you have similar visual tools
to see what's happening,
but you don't really interact with it through those tools.
You can just kind of watch it
and debug it, from my understanding.
So I would put them in a pretty similar category.
I would say one thing that's pretty interesting is... actually, that's not what
I wanted to pull up. The Airflow GitHub repo is what I wanted to point out. I was really
surprised to learn that Airflow has 22,000 stars on GitHub, which kind of blew my mind. I thought of
it as this little framework that people might use. Apparently it's popular. I'm not really
sure about Dagster; I guess I could look as well.
I think it's relatively new, so I'd be surprised if it were quite as popular as Airflow. But one nice thing
it can do: if you have Airflow pipelines that you're
using, you can use that server to run Dagster too. It can basically give you something that's
compatible with Airflow if you need to do that. So there are a couple of different ways, I think, you can translate it.
So it seems like a pretty interesting tool.
And like I said, I had developed a small pipeline
in my previous job as kind of my first stab at pipelines
to eliminate an Excel sheet
that was doing a bunch of horrible, awful SQL queries.
I could just imagine that people are trying to do this
with Excel and it was probably wrong.
Oh, it was.
Not necessarily incorrect, but it was wrong to do it.
Well, it was interesting.
Excel is very interesting to reverse engineer.
It's a lot of go-to statements.
It's ubiquitous, but as far as programming production
systems go, it's definitely not a good tool.
So.
Yeah.
Yeah.
Very cool.
All right.
So I got some more real-time updates here. Teddy says, "I know one of the big differences with Airflow is
that you can use the output of a task as the input of the next task. From what I understand,
Dagster is kind of a second-generation data orchestrator." Unsure which generation
Airflow would be, but here we go: "And Airflow mostly assumes you store and load data in each task,
even though Airflow has something called XCom, which allows you to pass the output
as input of the next." Okay. Interesting. Yeah. Thanks for all that background info there.
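Teddy's distinction can be sketched in plain Python. This is neither Airflow nor Dagster code, and the task functions are hypothetical. In the first style, one task's return value is handed straight to the next; in the second, each task writes its result to storage and the next task loads it back, with a temporary JSON file standing in for whatever storage a real deployment would use.

```python
import json
import os
import tempfile
from typing import List


def extract() -> List[int]:
    # Hypothetical first task.
    return [3, 1, 2]


def summarize(values: List[int]) -> int:
    # Hypothetical second task that consumes the first task's output.
    return sum(values)


# Style 1: the output of one task is passed directly as the input of the next.
def run_in_memory() -> int:
    return summarize(extract())


# Style 2: each task persists its output, and the next task loads it back.
def run_store_and_load() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "extract_output.json")
        with open(path, "w") as f:
            json.dump(extract(), f)  # task 1 stores its result
        with open(path) as f:
            values = json.load(f)  # task 2 loads it back
        return summarize(values)


print(run_in_memory(), run_store_and_load())  # prints 6 6
```

The store-and-load style costs extra I/O, but it lets each task run in a separate process or on a separate machine, which is one reason orchestrators lean on it.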
I haven't used either, but I definitely think they're both neat. And I feel
there are a lot of places that are just like, well, how else are we going to do it? Of course
we're going to use that spreadsheet, right? If they had tools like this, it would be very
empowering. One of the things I find very
interesting about these frameworks is that what you end up building is the little piece,
like load the CSV into the database, or run the report that gets me the revenue for the day.
What you end up building are very, very small pieces, and you don't have to worry about the
reusability, the reproducibility, the durability. You just go, I'm going to build an incredibly small bit of Python and just click it
in as part of this workflow. That really seems to empower people, almost like the microservices
story, but for data processing, without all the hard deployment side of things.
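As an example of the kind of "incredibly small bit of Python" Michael describes, a load-the-CSV-into-the-database step might be no bigger than this. It's a sketch using only the standard library (csv and sqlite3); the load_csv function, the readings table, and its sensor/value columns are assumptions for illustration.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load CSV rows (with a header line) into a 'readings' table; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["sensor"], float(r["value"])) for r in reader]
    conn.executemany("INSERT INTO readings (sensor, value) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load_csv(conn, "sensor,value\na,1.5\nb,2.0\n"))  # prints 2
```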
If they don't already have it, I hope they put a tool connected
with Dagster called Dagnabbit, because
it needs to be there. I think maybe some sort of capture tool or something. Dagnabbit would be good.
Yeah, yeah, I love the UI bit of it as well. All right, quick bit of follow-up, I guess. Brian,
do you want to start? You got any extras today? I've just got a vanity extra. So one of the things that we noticed,
and Will mentioned it too:
we talked about Textual briefly,
and the stars on Textual are just going through the roof.
I love the graph.
Is this the XKCD format of Matplotlib or something?
What is this?
I have no idea what it is.
Yeah, that's great though.
Anyway, show us the other pictures.
Yeah, the stars are insane.
It's like a vertical line on a graph.
One of my own projects has a similar trajectory.
So I wanted to just highlight that.
It's looking up too.
Of course, I only have 16 stars.
Will has like 3,000.
A little different,
but still, look.
It's kind of the same, don't you think?
Yeah, that's awesome.
16 stars is most of my repos.
You just got to extrapolate it a little bit. No, that's really cool.
Awesome. David, do you have any
extra stuff you want to throw out? Sorry, Brian.
I had one extra. I didn't
load it on my screen over here.
Let me see if I can pop it over real quick.
And this isn't Python, but I know SQL and Python tend to play together a lot.
Are you going to go back to some nostalgic time on the internet
where you opened up a DOS prompt and typed win to start Windows?
What is this?
This is Modern SQL.
It's a really fantastic slideshow that goes through
a lot of updates. So if you're still doing SQL the old-fashioned way, it shows you how you can replace
that with better, cleaner, more concise versions. There are so many things in here
that I was doing with just horrible, hacky tricks to get to work that you could take
care of in one line of SQL. And even with some of the newer things I've learned,
there are just so many great,
I don't know if you call them tools or methods or what.
But I've found Python and SQL tend to work together a lot,
especially in the data space.
If you're like me, where you have some self-taught SQL experience,
something like this can be very helpful to learn some of the
better practices for different things
that you might want to do with SQL.
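The slideshow isn't reproduced here, but as one illustration of the kind of modernization David means: a running total, which older SQL often handled with a correlated subquery, can be written directly with a window function. This sketch runs both versions through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the sales table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 5), (3, 20)])

# Old style: a correlated subquery recomputes the sum for every row.
old_style = conn.execute(
    """
    SELECT day,
           (SELECT SUM(s2.amount) FROM sales AS s2 WHERE s2.day <= s1.day) AS running
    FROM sales AS s1
    ORDER BY day
    """
).fetchall()

# Modern style: a window function expresses the same running total directly.
window_style = conn.execute(
    """
    SELECT day,
           SUM(amount) OVER (ORDER BY day) AS running
    FROM sales
    ORDER BY day
    """
).fetchall()

print(window_style)  # [(1, 10), (2, 15), (3, 35)]
```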
No, this is great because I learned SQL like in the 90s.
So it's changed a lot since then.
And I was just thinking the same thing, Brian.
Like it's been at least 10 years
since I've tried to refresh my SQL skills.
So there's probably a lot of stuff where it's like,
oh, you shouldn't do this.
Why would you do this? If you use this other keyword,
it's more efficient, safer, faster.
Come on.
Yeah.
I'm a little jealous of the people learning SQL now.
Yeah.
How about you, Michael?
Got any extras?
I've got some follow-up from last time.
This comes to us from John Hagan.
And I think I'm probably the one who said this. I said, oh, there's this really cool thing I like: being able to use
lowercase-d dict and lowercase-l list as type hints, rather than from typing import capital-L
List or capital-D Dict, right? I said, oh, that's coming in 3.10, fantastic. He's like, you know,
that's in 3.9, so it's kind of already out. Oh, right. Okay. But he did point out some things that are coming that
are neat. So for example, previously, if I wanted something optional, where it could
be None or it could be a list, and if it is a list it has strings, you had to say Optional[List[str]].
And those are all capital because they have this parallel type implementation over
in typing, right? In Python 3.9, I can now say Optional[list[str]], with a lowercase-l list. And
you might think, who cares if it's lowercase or uppercase L? Well, the difference is you don't
have to do an import and explain to people who don't know that code, oh, you've got to go
import these other type things to say the type. Yes, I know list is right there, but you can't
use list; you've got to do something else, right? So that's the feature that I was
excited about, that I said was in 3.10 when it's really in 3.9. So hooray. But he also pointed out that the union
operators were simplified. It used to be you would have a similar syntax for union as for optional. You
would say Union[one thing, the other thing]. But now you can just say type one, pipe (vertical bar), type two.
And this actually allows us to model optional without importing Optional.
So instead of Optional[List[str]], we can just have list[str] | None.
Yeah, this is cool.
And I'm glad somebody pointed it out, because the 3.10 announcements don't say anything about
Optional, but in effect they do.
You don't have to use it anymore.
But are you going to start using this?
The pipe thing?
Well, yeah.
And the optional thing.
Because I started to.
And then I realized that if I start using that, then my code is 3.10 only.
Yes, exactly.
Which depends on the scenarios, right?
So for, say, TalkPython training,
the code behind all that,
I control the server.
Yeah, nobody's looking at it.
It's easy for me to make it the brand new thing.
If I were going to build
an example app for a course,
then I would be hesitant to use this right away.
I might wait a year or two,
because I don't want people
to have a bad experience,
like, "Well, I have 3.9. That's pretty new. That should work." Nope, that doesn't work, because I didn't
want to say the word Optional, right? Yeah, and if it was an open source project, I guess it would
depend on whether I wanted to support older versions, probably even longer there. David, I don't know what
you think? Yeah, I'm always thinking that for a library specifically, you'd probably want to stick
with the older syntax,
at least for a while,
to accommodate people
who are using some of the older versions of Python.
Yeah, I think 3.9,
I'm using 3.9 on everything now,
but I think for a lot of people,
that's still pretty aggressive
to have a 3.9 or higher requirement for a library.
Yeah, I agree.
A couple of bits of real-time feedback out there:
Sam and Dean both say there are __future__ imports that you can do now that will enable
some of this already, so like from __future__ import pipe. I don't know if that's
true or if it's a joke.
Well, I do know that the __future__ stuff does support the newer type information.
I don't know about pipe.
Okay.
Yeah.
Yeah.
Okay.
We can do some after-coding on this.
Coding after the recording, and we'll know.
Oh, Dean says he's kidding.
Yeah.
But you really can.
Thank you.
You really can do some of this other type information with the dunder future import.
Okay.
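On the __future__ question: with from __future__ import annotations (PEP 563, available since Python 3.7), annotations are stored as strings instead of being evaluated, so lowercase list[str], and even list[str] | None, are accepted inside annotations on 3.7 and later. There is no from __future__ import pipe, though, and anything that evaluates those annotations at runtime (for example typing.get_type_hints) still needs a Python version that supports the syntax. A minimal sketch with a hypothetical greet function:

```python
from __future__ import annotations  # must come before any other code in the module


# These annotations are not evaluated at definition time, so this parses on 3.7+.
def greet(names: list[str] | None = None) -> dict[str, str]:
    return {name: f"Hello, {name}!" for name in names or []}


# The annotation is kept as a plain string.
print(greet.__annotations__["names"])  # list[str] | None
```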
Are you ready for a joke?
Yeah.
All right, Brian.
So you're going to have to help me along here.
Okay.
So there are two developers staring very worried at a screen.
They have one section, then a big, long, quiet section,
and then some more.
So you be the very first person, and I'll be the second person here.
Okay.
Okay.
I hope it works.
Do not hope.
Pray.
Pray it works.
Have you ever been there, just in this situation where you're just like, oh, it must work.
If this doesn't work, we're done.
Yeah.
Yeah, not so much on the software side of things, but when I was a
manufacturing engineer, there were so many times we'd be troubleshooting a machine on a Saturday
for eight hours straight, and you think you've made it, and everybody's just holding their breath, crossing
their fingers, "Work, work," because I want to go home someday. So yeah, I mean, I remember how...
Go ahead, Brian. No, I definitely feel this when you're working on C++ code
because you have to wait for it to compile
and then load it and test it
and stuff like that.
But even with Python stuff,
I still feel this when I'm working on CI tools
because the continuous integration,
you have to, you know,
you're not sure if you got it right,
the syntax right, the YAML right or whatever
until you push it and see what happens.
Yeah.
Yeah, CI is a good
point. You have so little visibility in there if it's not working. Just one more bit of real-time
follow-up on mine here: if you come over here and look at PEP 585, it does
say, for the implementation of some of these new features under typing (this is the one that
came out in 3.9), that you can say from __future__ import annotations and then start using lowercase-l list and lowercase-d dict. I know
Dean said he was joking, and maybe you really can't get the pipe to come out that way,
but at least you can do these sort of 3.9-level changes back to 3.7, it looks like.
Okay, all right, cool. Well, that was a lot of fun.
Yeah, it was.
I had another one,
but I'm going to save it.
Good.
All right, well,
I'm looking forward
to hear about it next week.
David, thank you for joining us.
Thank you for having me.
Yeah, yeah.
And thanks for all the tips
and stuff you've had
throughout the years.
And yeah, it's really good
to have you here.
And congratulations
on your first dev job.
That's fantastic.
That is fantastic.
And thanks, Dean,
for correcting us in real time.
That's awesome.
It's good.
Yeah, absolutely.
Yeah, thank you, everyone.
And oh, Sam does sadly show us
that import pipe from the future doesn't work.
But yeah, thanks, everyone.
See you all later.
Bye.
Well, thank you.
Thanks for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
Get the full show notes over at PythonBytes.fm.
If you have a news item we should cover,
just visit PythonBytes.fm and click Submit in the nav bar.
We're always on the lookout for sharing something cool.
If you want to join us for the live recording,
just visit the website and click Livestream
to get notified of when our next episode goes live. That's usually happening at
noon Pacific on Wednesdays over at YouTube. On behalf of myself and Brian Ocken, this is
Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
