Python Bytes - #165 Ranges as dictionary keys - oh my!
Episode Date: January 21, 2020
Topics covered in this episode: iterators, generators, coroutines; requests-toolbelt; Pandas Validation; qtpy; pylightxl; python-ranges; Extras; Joke
See the full show notes for this episode on the website at pythonbytes.fm/165
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 165, recorded January 16th, 2020.
I'm Michael Kennedy.
And I'm Brian Okken.
And this episode is brought to you by DigitalOcean.
They're a great supporter of the show.
Check them out at pythonbytes.fm slash DigitalOcean.
Get $100 credit for new users.
More on that later.
Brian, we've got a lot of stuff to get through, and I want to just, let's start iterating
through it, man.
Okay, let's iterate through it.
Also, I can't believe that it's halfway through January, whatever.
Okay, so first off, let's talk here about iterators, generators, and coroutines.
So I'm linking to an article, that's pretty much what it's called, by Mark McDonald.
And when I Googled this relationship between coroutines and generators, apparently everybody else knows this is a thing, but I missed out somehow.
This article is a really good introduction to all of these concepts and how they all work together.
So it starts... well, okay, I've got to start out with a beef.
It starts out trying to do a gentle introduction to the iterator protocol, with, like, dunder iter and dunder next.
I just want people to stop doing that.
Okay, muscle through it, but skip that part.
It should be an appendix, I think, because people don't do that anymore.
Okay, next it talks about generators, which are the same thing as this iterator protocol, sort of, but using the yield keyword.
I know there's differences, but this is how I do it.
I use yield for generators.
It's so beautiful because you take the code that's not generator style, and then you just throw in yield instead of like list append or set.add or whatever you're going to do to gather up the results.
Just replace that with yield.
Boom, you're done.
It's usually less code.
I love it.
It's great.
I'm a big fan.
Like, for instance, you just throw things into a loop and put yield in there or yield
the things you have, whatever works.
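Here is a minimal sketch of the "swap append for yield" move described above; the function names are illustrative only. Both versions produce the same squares, but the generator hands each one out as it is computed instead of building a list first.

```python
def squares_list(numbers):
    """Classic style: gather everything into a list, then return it."""
    results = []
    for n in numbers:
        results.append(n * n)
    return results


def squares_gen(numbers):
    """Generator style: the append line simply becomes a yield."""
    for n in numbers:
        yield n * n


print(squares_list([1, 2, 3]))       # [1, 4, 9]
print(list(squares_gen([1, 2, 3])))  # [1, 4, 9]
```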
Unbounded generators, it talks about, which means don't convert these to lists, because
they don't stop.
It is possible to write a for loop that doesn't stop,
and therefore there's a way to do a generator that doesn't stop.
Right, if you're working on an infinite series,
some kind of series that you use a generator for,
it might not stop.
Yeah, I mean, there's legitimate reasons to do this.
Or maybe it does have an end,
but it doesn't fit in memory and stuff like that.
So beware.
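A small example of the "beware" point, assuming a hypothetical `naturals` generator: it never finishes, so you take a bounded slice with `itertools.islice` rather than calling `list()` on it.

```python
from itertools import islice


def naturals(start=0):
    """Unbounded generator: yields start, start + 1, ... forever."""
    n = start
    while True:
        yield n
        n += 1


first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
# list(naturals()) would never return, so always bound it (islice, zip, or a break).
```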
Generator expressions, you know,
for some reason I just forget about.
They're like list comprehensions, but you put parentheses instead of brackets, and then
it's a generator expression.
They're smooth, right?
I mean, they don't have those sharp edges of those square braces.
Smooth.
Oh, wow.
That was bad.
Okay.
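To make the "parentheses instead of brackets" point concrete: the list comprehension builds every element right away, while the generator expression computes values only when asked. When a generator expression is the sole argument to a call, its own parentheses can be dropped.

```python
numbers = range(10)

squares_list = [n * n for n in numbers]  # list comprehension: all ten values exist now
squares_lazy = (n * n for n in numbers)  # generator expression: nothing computed yet

print(next(squares_lazy))          # 0, only the first square has been computed
print(sum(n * n for n in numbers)) # 285
```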
The reason why I highlighted this article really isn't for this stuff so far.
It's a couple things.
It talks about how generators can use other generators, nesting them with yield from.
And this is cool. I didn't know this was a thing.
So let's say bar and baz are generators.
You can define a new function, foo, that yields from each of these. And it just goes
through one. And then when it's exhausted, it goes through the other. Really slick. Did you know this
was a thing? Yeah, this was added after the yield keyword was added. So yield was there for a while.
And then what you would have to do before if you wanted one of these, you'd have to write a for
loop that goes through every item in the sub generator and then just yield that out.
But now you can just say yield from that thing.
It's been a few versions that it came in.
I can't remember exactly when.
But yeah, it's a bit of a new feature.
Maybe 3.5, maybe 3.4.
I can't remember.
But yeah, this is great.
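For the record, `yield from` arrived in Python 3.3 via PEP 380. Here is a sketch of the foo/bar/baz example from the article as described, with placeholder values, next to the older explicit-loop version it replaces. For plain iteration the two behave the same; `yield from` additionally forwards `send()` and `throw()` to the sub-generator.

```python
def bar():
    yield 1
    yield 2


def baz():
    yield "a"
    yield "b"


def foo():
    yield from bar()  # run bar until it's exhausted...
    yield from baz()  # ...then move on to baz


def foo_old_style():
    # What you had to write before yield from existed.
    for item in bar():
        yield item
    for item in baz():
        yield item


print(list(foo()))  # [1, 2, 'a', 'b']
```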
The place that I've used this most is recursive generators.
You're writing a generator and it's going through some data structure,
but then you get to the point where you're like,
well, I need to call it again,
but with a different node in a tree
or something like that.
Instead of having to loop over that to yield,
you just say yield from basically the recursive call.
It's beautiful.
Oh, yield from with a recursive call.
Nice.
That hurts my head thinking about it.
Yeah, man, think about,
you know how painful it was to learn recursion
and how funky it is to learn about generators.
You mash them together and then the brain explodes.
Yeah, it's great.
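Here is a minimal sketch of the recursive-generator pattern Michael describes, assuming a hypothetical `Node` tree type: each call yields its own value, then hands iteration off to the recursive call on each child with `yield from`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node used only for this example."""
    value: int
    children: list = field(default_factory=list)


def walk(node):
    """Depth-first, pre-order traversal yielding every value in the tree."""
    yield node.value
    for child in node.children:
        yield from walk(child)  # recurse without an explicit inner for/yield loop


tree = Node(1, [Node(2, [Node(4)]), Node(3)])
print(list(walk(tree)))  # [1, 2, 4, 3]
```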
Okay, the article goes on and talks about the relationship between coroutines and generators.
Usually yield just hands a value out of your function.
But you can also put a yield on the right-hand side of an assignment,
a variable assignment from a yield,
and that's one of the syntax features
that coroutines use.
And I got to admit, I got lost at this point.
So this is kind of a call to action to everybody.
I'd really like to have a coroutine tutorial
that could show me how to use coroutines
for stuff that I might actually use that isn't async-related. And can we skip the iterator protocol?
Or make it an appendix, like you said.
Yeah.
Do you use coroutines?
I mean, they look neat.
I just don't know how to use them.
I use generators all the time and I use async methods, which ultimately are fancy wrappers around coroutines, but I don't use coroutines directly.
Not knowingly anyway.
Okay, cool.
I'll have to play with it a little bit.
Yeah, nice.
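Since Brian asked for a coroutine example that isn't async-related, here is a minimal sketch of a generator-based coroutine using the `value = yield ...` assignment form. The name `running_average` is hypothetical. `send()` resumes the paused generator, and the `yield` expression evaluates to whatever was sent. The coroutine has to be "primed" with `next()` first so it is paused at its first `yield`.

```python
def running_average():
    """Generator-based coroutine: receives numbers via send(), yields the running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # pause here; send(x) resumes with value = x
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)             # prime it: run up to the first yield
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
print(avg.send(30))   # 20.0
```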
Something that I use a lot is requests.
You probably use requests a lot as well.
Yeah, lots of people do.
Yeah, and requests is one of these things.
Last time you spoke about PyPI stats, was it pypistats.org or something like that?
And requests was certainly right near the top.
It was not number one on the list of things being used, but it was near the top, which
means it can't take too much change, right?
There can't be too many features or changes made to it.
So it would be nice to have something that makes working with requests nicer that can
change more quickly.
So there's this thing that I came across called requests-toolbelt.
Yeah, so requests-toolbelt is, well, a toolbelt of useful classes and functions to make working with requests easier.
And it really does, at the moment, four things.
But I think if people are out there and they're like, I always have to do this with requests.
It's like these five lines.
I got to make sure I remember to do this right.
It would be awesome to just, you know, extend this.
So this is a small project by someone.
I can't remember.
I don't think it really says a meaningful name on it.
Yeah, no, it's just under requests.
Actually, this is not a small project.
But I think it would be cool to take those ideas:
if you see patterns that you're repeating with the requests library, fold them in here. So let me give you the rundown on the four things it does.
First of all, if you're going to do multipart form-data encoding, like I have an image file and I want
to upload it to the server, to the API, that's annoying, right? It's not super easy. But with
this thing, it's really easy to basically say, here's a file stream. That is field two. It's whatever it is, right? It's binary
image data or it's text. And then you just say, here's my data, this multipart form encoder.
And boom, it's just uploading files and doing all the stuff it has to do.
That's incredible. Just a few lines of code.
Yeah. It's really, really nice. And you don't have to think about like, how do I do multi-part encoding again?
Just give it a file stream.
You're good.
The next one is the user agent constructor.
So you have to set a header, User-Agent, but then, like, how do you construct that in
a meaningful way?
There's a class, or a method, I think it's just a method, that takes some arguments
and generates a compliant user agent string
for, like, your API app or whatever. So that's cool: user agent constructor. Sometimes you have to,
when you're working with other systems, conform to certain SSL protocols, right? We have TLS
version 1.0, 1.2, and newer versions coming along. There are different versions of TLS, which is the
successor to SSL, right? So they have an SSL adapter that lets you explicitly say,
I want to use TLS 1.2 or 1.0 or something like that, if you need to.
Oh, wow. Okay.
That's cool. And then one thing that you can do with requests is you can create a session
and then it'll start talking over it. It probably reuses the connection.
I'm not entirely sure of all the things it does.
But one of the things the session does is it'll remember cookies and things like that.
Well, maybe you want to make a series of requests using a request session
that doesn't actually carry the cookies from request one to two to three and so on.
So one of the classes in here is a ForgetfulCookieJar. If you set the requests session's cookie container to the ForgetfulCookieJar, it
implements the protocol, but it always forgets its cookies, obviously. So it's a cool
way to still use sessions but clear out cookie persistence across calls.
Is there a reason to use sessions without cookies?
Well, some websites behave differently if they think they've already seen you,
or things like that, right?
Yeah, like maybe I want to test the login function,
both working and not working, and then I want to try "I forgot my password," but I don't
want it to know that I've already logged in during that sequence, or something like that. It could
be some
series that you're testing or playing with.
Okay, so like if you've got to log in, your session
login is still valid, but you have to go-
Yeah, yeah. Or maybe you're going to some sort of
paywalled ad place, and it's like, well, you can come here three times, but if you come here more than
three times this month, we're going to show you the paywall. You know what I mean? You're like, well,
you're using cookies for that.
And my cookie jar is forgetful.
I don't know.
I don't personally have a reason for it, but I can imagine reasons that people might use that for like automation or whatnot.
I predict that we will hear other people telling us the reasons now.
Yeah, absolutely.
They definitely might.
So people can visit pythonbytes.fm/165,
and down at the bottom
they can tell us why they're doing it. There's a cool comment section.
Yeah. All right, speaking of cool,
let me tell you about DigitalOcean. They're doing all sorts of good stuff. They're offering a hundred
dollars credit for new users. It was 50; it's back to 100. Yay, that's great. All of our
infrastructure and stuff runs on DigitalOcean, and it's been just perfect for years. So that's great.
One of the things they recently released is memory heavy workload droplets.
So memory focused droplets. So you can get up to eight gigs of RAM for each dedicated CPU.
And it goes from two CPUs all the way up to, is that 32?
256 gigs of RAM available on your VM,
which is kind of ridiculous if you really need that.
But maybe you've got a workload that does.
So it's really good for high-memory apps like high-performance SQL or NoSQL databases
and memory caches like Redis,
maybe some data analysis of lots of data,
stuff like that.
So check them out at pythonbytes.fm
slash digitalocean.
Get $100 credit for new users and support the show.
Speaking of data science, what do you got, Brian?
What's next?
Yeah, speaking of data science,
Pandas is used by lots of folks.
Not just data science,
but I know the data analysis people use Pandas quite a bit.
And in episode 162, you weren't with us for that,
but we covered a project called Bulwark.
Yeah, I listened into that episode as well,
and you and Ollie did a great job.
That was fun.
And we had a listener suggestion about another package called Pandas Validation,
and then I was just looking around to see if there's other projects.
One of the others I found was Pandera.
So I'll try to briefly talk about these, but Pandas Validation,
Lance tells us that it lets you
create a template for your data frame, how it should look, and then it validates your entire
data frame against the template. So if you have a data frame with the first column being a string,
the second column being dates, and then an address, you can use a mixture of built-in
validation types to ensure that your data conforms to that.
So that looks pretty cool.
Yeah, this is really nice.
It's a little bit like, tiny bit like JSON schema or something.
So you've got these pandas data frames or time series that it's just full of whatever.
And then you can throw on top of it a cool validation.
And it's all at once against the whole collection, right?
Yeah.
And then Pandera is, I think, a similar sort of project that lets you set up types and properties for different columns of a data frame and perform validation, sort of a schema validation thing, also. So they're all kind of solving a similar problem, but when I was looking at them, the APIs and how you use them are all very different between Bulwark, Pandas Validation, and Pandera.
Yeah, they are.
I'd really like to hear if there is a common approach or if Pandas Validation, DataFrame
Validation is just not something that's catching on yet or what people are using.
I'd love to hear that.
Yeah.
And I just noticed at the bottom of Pandera, they list other data validation libraries,
and other pandas-specific ones like opulent-pandas and pandas_schema
and pandas-validator and table_enforcer and so on.
So apparently this is like a whole hole you
can go down into that I was not even aware
of. But I got to say the
Pandera API where you basically
define a column, a
data type, and then a
lambda function that you give it that
does the validation. That's super cool.
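To show the "column, data type, lambda check" shape Michael likes, here is a plain-pandas sketch. It is a hypothetical hand-rolled validator written in the same spirit, not Pandera's or Pandas Validation's actual API. Each column gets a dtype predicate from `pandas.api.types` and a lambda that returns a boolean Series, and the whole frame is checked at once.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical schema: column -> (dtype predicate, row-level check returning a boolean Series)
SCHEMA = {
    "name": (ptypes.is_string_dtype, lambda s: s.str.len() > 0),
    "signup": (ptypes.is_datetime64_any_dtype, lambda s: s.notna()),
    "age": (ptypes.is_integer_dtype, lambda s: s.between(0, 150)),
}


def validate(df: pd.DataFrame, schema: dict) -> list:
    """Return a list of problems; an empty list means the frame passed."""
    problems = []
    for column, (dtype_ok, check) in schema.items():
        if column not in df.columns:
            problems.append(f"missing column {column!r}")
            continue
        series = df[column]
        if not dtype_ok(series):
            problems.append(f"{column!r} has unexpected dtype {series.dtype}")
            continue
        passed = check(series).to_numpy(dtype=bool)
        if not passed.all():
            problems.append(f"{column!r} failed its check at rows {list(df.index[~passed])}")
    return problems


df = pd.DataFrame({
    "name": ["Ada", "Grace"],
    "signup": pd.to_datetime(["2020-01-16", "2020-01-21"]),
    "age": [36, -1],
})
print(validate(df, SCHEMA))  # one problem: 'age' failed its check at row 1
```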
I love that.
Yeah, it looks pretty clean.
Yeah, it looks incredibly flexible without getting, like, out of control.
Yeah. Speaking of out of control, you know what's a little bit out of control?
GUIs for Python.
Yeah, and I don't mean it this way: I'm not actually complaining this time about their
absence or something like that. But one of the best libraries for building GUIs in Python
has got to be Qt, pronounced "cute," right?
Yeah.
And I was inspired at the Python meetup that you're
running out in West Portland, when we saw Ogi Moore give a presentation on how he used fbs, the fman build system, plus PyInstaller, plus Qt to build,
you know, nice packaged GUI apps that he could distribute around. And that was
really cool. So one of the things, though, that drives me crazy is, like, we've got PyQt5,
we've got PySide2, PyQt4, we have PySide, we have Qt for Python, we have all these different
things, right?
I think Qt for Python might be the next version of PyQt5 and so on.
And I just don't know where to start, right?
I'm looking at this going, oh, my goodness.
Like, you see different examples doing different things.
And so I ran across something called QtPy, Q-T-P-Y.
Yeah, or "cutie pie."
I wanted to say "cutie pie," but I don't know.
Cutie pie? No, "cute pie," actually.
You know, one of the things about a lot of these libraries is
they're really cool little proofs of concept, but in practice, how real are they? How supported
are they? And so on. One thing that seems real and supported is Anaconda, the Anaconda
distribution, and with that comes the Spyder IDE, the whole Anaconda/Continuum data science
IDE thing, right? And yeah, this QtPy is the foundation of what they're doing to write that.
Okay.
At least it's in their GitHub repo. So it provides a uniform layer to support all those different
libraries that I complained about, with a single uniform API. It's like an adaptation layer on top
of all those things: it figures out which one you're actually running against, and then it just
adapts. So you write your code once, and then you can run it in all these different ways, or you can see
different examples.
Yeah, it's nice.
Yeah, it's cool, right? So this is created by the Spyder development
team, and there's not a whole lot to it. Basically, it's nice. Different libraries, and they're not exactly compatible. So quite cool, I think.
Yeah, and also during the presentation at the meetup,
is it Ogi?
Yeah.
Mentioned that just he uses that
and then if there's a problem with one of these packages,
just uninstall it and install one of the other ones
and you don't have to change your code at all.
Yeah, it's cool.
Just works.
One of the other things I thought was neat
is at the bottom of the readme,
they've got sponsors, like different sponsors at the bottom and become a sponsor.
I have not seen an open source project do that before.
It's an interesting idea.
Yeah, it is definitely an interesting idea.
I haven't seen that either.
Yeah.
So maybe I'll try that on my little open source project.
Well, they also have the GitHub sponsor at the top.
Are you using the GitHub sponsor?
No.
That's something
people can turn on.
I think that's really cool
that GitHub did that,
that people can now
sponsor projects
like through GitHub
instead of negotiating
some deal separately
with everyone.
Yeah.
I wonder if they're
tied together.
Oh, I'll have to look into it.
Anyway.
Yeah, yeah.
Check it out.
All right.
So, yeah.
What's next?
Well, I want to shed
some light on spreadsheets.
They can be a dark place if you get sucked down into VBA or too far down there.
Yeah.
So, actually, we got an email from Victor Kiss.
I think it's Victor Kiss, K-I-S.
He said he's got his very first open source project, but it looks darn cool.
It's called pylightxl, and it's an Excel spreadsheet thing
that you can read and write spreadsheets with.
So it's a lightweight, zero-dependency, minimal-functionality reader/writer.
Other than the standard library, there's no outside dependencies,
and you can read and write modern XLSX and XLSM files
with a very simple interface for getting access to the different sheets inside there
and rows and columns and stuff.
Actually, it looks pretty cool.
Yeah, it looks totally useful if all you've got to do is get in there and get the data.
I don't know if it does things like let you change, say, conditional formatting
or other weirdness.
But definitely, if you just want to open up an Excel worksheet, not a CSV, but a full-on XLS, and get at the data or the rows or whatever it is you're after, it's quite neat.
If you go to the link that you're linking to and just scroll down a bit, there's a little animated GIF.
And I think it tells you pretty much all you need to know.
You just watch it for a second, and it's like, here's the
few steps to go work with this Excel file.
It's cool. He's already got documentation up
with the API, but I found the
best way to get up to speed really
quick is to look at his examples. He's got a handful
of examples for how to do
different things, and it's like, oh my gosh,
I could just, if I needed to read
Excel from Python I could
get started in a few minutes with this
so
it's very cool
and no dependencies that's kind of nice as well
I never really thought about why
that would be important, but he lists
a few reasons. One of them is that if
you're going to compile it into
another installer or something using
PyInstaller,
not having any DLL or other dependencies makes this easier.
And then he even says that
the library is just a few source files.
So if you don't even want to install this as a package,
if you just want to copy this stuff into your own source,
that's an option.
So,
right.
Yeah.
Just vendor it, and then
you don't have dependencies either.
Yeah, you're not getting updates, but, you know.
Yeah, it's the trade-off.
I'm going to tell you about this other thing, and at first it might not sound very
exciting, but I'm actually pretty excited about it. I think this is quite cool. It's a clever
little library, and this suggestion comes to us from Aiden Price. He
told us about a project he's working on using something called python-ranges.
Okay.
Okay, so
we have range, like the built-in range. You can say, you know, start equals whatever, end equals whatever,
and it goes from the start, integer-wise, up to but not including the upper bound. But you can't use that range in more
meaningful ways. For example, if I had a range of zero to a hundred, I can't easily ask, is x in
there, if x is a number? Or if I have two ranges and I want to intersect them, how do I do
that? This library takes that basic idea and adds a lot
of set operations. You can ask for the intersection of ranges, you can ask whether or not they're
mutually exclusive, things like that. So all the set operations you can do on it. But then it also
extends that so you can have a range set, which is a bunch of different ranges, or even a range
dictionary.
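For context, the built-in `range` does support `in` for integers, and answers it without iterating, but it is a sequence of integers: `50.5 in range(0, 100)` is `False`, and there is no intersection operation. The sketch below is a hypothetical hand-rolled half-open `Interval` class, not the python-ranges API, showing the kind of membership and set-style operations the library adds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical half-open interval [start, end) over any ordered numbers."""
    start: float
    end: float

    def __contains__(self, x) -> bool:
        return self.start <= x < self.end

    def intersection(self, other: "Interval") -> Optional["Interval"]:
        lo, hi = max(self.start, other.start), min(self.end, other.end)
        return Interval(lo, hi) if lo < hi else None  # None means no overlap

    def isdisjoint(self, other: "Interval") -> bool:
        return self.intersection(other) is None


print(50.5 in range(0, 100))     # False: built-in range only contains integers
print(50.5 in Interval(0, 100))  # True
print(Interval(0, 100).intersection(Interval(50, 150)))  # Interval(start=50, end=100)
```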
So why would you care about that?
So what you could do with a range dictionary
is you can use ranges as keys.
So if the example they give-
That's crazy.
I know, but here's the example they give.
It's probably abusing the concept of a dictionary,
but it's really useful.
So if you had an if statement that said,
for, say, taxes or something like that,
let's just say if your income is zero to 10,000,
you're in bracket A.
If you're in 10,001 to 20,000, you're in bracket B and so on.
You had like a huge if, else if, else if, else if
to test for that condition.
You can create a range dictionary
where the keys are the ranges zero to 10,000,
10,001 to 20,000,
and so on.
And then like some information about it is the value.
And then you could just take a number like 37,215 and get it from the
dictionary.
Say, I want to get that from the dictionary, and it'll return the matching value.
So it'll basically do the test, like, is this item in this range, as part of the key match of the dictionary.
That's brilliant.
That's cool.
Isn't that cool? It's got to be abusing the idea of the dictionary, really, but it's pretty cool.
Yeah.
Yeah, so it's almost like a switch statement in a sense. You could take that if/else chain and replace it with this flat statement of these ranges, and then it does the comparison in the data structure.
Yeah. Sweet.
So there's a bunch of stuff that you can do with these ideas. They've got some good examples, but that little example I gave you is probably the simplest one to tell you about, because it gives you a good sense of why you might actually use this.
A lot of times you look for these blocks or these ranges, and it's really cool to be able to sort of test in here.
You could even do really interesting stuff like, I want to know, is this thing in any of these five ranges?
You could just create one of these range sets or these range dictionaries and just ask, is this number in this set?
If it is, it's in one of the five ranges that are in there.
There's really cool ways to layer these together.
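The tax-bracket idea can be sketched without the library. This is a hypothetical `SimpleRangeDict` built on `bisect`, not python-ranges' `RangeDict`. It maps non-overlapping half-open ranges to values. The bracket boundaries and the "C" bracket are invented for the example, and because the ranges are half-open, 10,000 lands in B here rather than A as in the spoken version. The same structure answers "is this number in any of these ranges?" with `in`.

```python
import bisect


class SimpleRangeDict:
    """Hypothetical sketch: non-overlapping half-open ranges [start, end) -> value."""

    def __init__(self, items):
        # items: iterable of ((start, end), value); keep entries sorted by start for bisect
        self._entries = sorted(((s, e, v) for (s, e), v in items), key=lambda t: t[0])
        self._starts = [s for s, _, _ in self._entries]

    def __getitem__(self, x):
        i = bisect.bisect_right(self._starts, x) - 1  # last range starting at or before x
        if i >= 0:
            start, end, value = self._entries[i]
            if start <= x < end:
                return value
        raise KeyError(x)

    def __contains__(self, x):
        try:
            self[x]
        except KeyError:
            return False
        return True


brackets = SimpleRangeDict([
    ((0, 10_000), "A"),
    ((10_000, 20_000), "B"),
    ((20_000, 50_000), "C"),
])
print(brackets[37_215])  # C
print(-5 in brackets)    # False
```

The lookup is a binary search over range starts, so it stays fast even with many ranges, which is what replaces the long if/elif chain.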
Yeah, and especially if you've got that all over the place.
For instance, I'm thinking hardware stuff.
Yeah, it's got to be in there.
There's a bunch of numbers and frequencies and whatnot, right?
Right.
So if I've got different power levels, for instance,
they'll have different attenuators that'll kick in at different power levels.
But I don't want those power level numbers to be hard-coded all over my code.
So having some central place where I put those in place so that I can just throw in a number
and it gets based on that, I know what the attenuation is or something, that'd be great.
It's cool.
Yeah, that's cool.
It also, it occurs to me, this might be useful for testing, right?
Because then your assert statement could have a little bit of ambiguity, right?
If there's like, well, as long as it's in this range, it's okay.
But if it's not, then it's not.
And so maybe that's also an interesting way to simplify testing.
Yeah.
Yeah.
Okay.
Cool.
Cool.
Well, anyway, I think that's a much more interesting project than it just sort of sounds like.
It's like, well, Python has range built in, whatever.
But no, this is cool.
Yeah.
Yeah.
That's it for our main items.
So what else do you want to tell folks about?
Well, I spent some time last night.
I think I brought this up, I don't know, last time or the time before,
that I have a few open source projects, not many,
but one of them was lacking some work
because it had a bunch of support requests or whatever you call them, issues.
So pytest-check, I went in last night,
I went and cleaned all those up and solved a couple
minor problems. But one of the things that I ran into that was interesting, and I mean,
I just kind of wanted to highlight it, is about plugins for pytest. There are other plugins for pytest,
and some of them don't work together very well because of the way they use, and abuse, pytest.
I'm definitely abusing pytest hook functions with pytest-check.
Intentionally, what it does is it allows you to check certain things within your test,
but not fail right away so that you can continue on.
And then if any of the checks fail, it actually fails the entire test and tells you all of
the failures.
It fails them at the end, not as it hits the first one, right?
Yes, but to get away with that, the only way I could figure out is to hook into the report
function, which happens much later after the test completes.
Well, so there's a whole bunch of other plugins that allow you to rerun tests if they fail.
There's rerunfailures, there's flaky, there's retry, and there's a handful of others.
Most of them are not compatible with pytest-check
because of timing: when they check
to see if something failed versus when I'm checking.
So I guess I just want to point out
that if you want that to happen,
rerunfailures works; flaky and retry don't.
Nice.
Oh, that's really cool.
I wonder if you could, like, monkey patch
flaky or retry to force it to check later or something like that.
Maybe. But also, I actually
commented in the defect report that it doesn't work with flaky, and I said, well, I think I should
try, and I had a comment from somebody that said you're just going to kill yourself if you think
that you're going to try to make it compatible with all the plugins out there.
So as long as there is a workaround, it's fine to say,
if you need this to work with something like this, use this other plugin; not my problem.
It seems cold, but open source is a side project.
Yeah, absolutely.
Cool. Well, I've got a couple of short ones here.
Jeremy Schendel sent in just a quick message that Pandas is now 1.0.
It had been living on 0.x versions, ZeroVer, for a long time,
but it has migrated over to semantic versioning,
and it has a couple of new cool features.
So we were already speaking about Pandas earlier.
If you're using Pandas or whatever, hey, Pandas 1.0 is out.
That's a big deal.
Probably also means a lot for the stability of the API.
Yeah, it's good.
For the PyCharm fans out there, myself included,
friend of the show, Anthony Shaw,
has created a PyCharm plugin called Python Security.
So we'll link to that.
And basically what it does is it goes through,
much like when you're working with PyCharm,
it automatically tells you,
oh, you're doing a type mismatch.
You're passing an int and it expects a string,
or you're calling this function and it takes two arguments,
but you're giving it three.
It does all that checking in real time.
This one is for security.
So it checks for unsafe loading of YAML files,
remote code execution in Flask,
man in the middle with requests or HTTPX,
and debug configs in Flask and Django.
So that's kind of cool.
You want that? Get that and install it.
Nice.
Yeah. And then finally, I have my Python for Decision Makers course that
sort of talks about whether or not you should adopt Python at your organization, and how to position it.
So I did a webcast on that. It's already passed, and it went really well, but
the recording of it is out, so I'll link to the recording if people want it. You've got to, like, register for the thing, but then you just watch the recording.
Oh, I'll have to check that out. Nice.
Yeah. Yeah. It was fun. A lot of fun. A lot of good conversations there. All right. I don't
know about this joke, but I'm going to do it anyway. You ready?
Yeah.
You've heard about optimists and pessimists and a glass, right? A glass is either half full or
half empty depending on which side of that divide you land on, right?
Yep.
Well, there's a third angle here.
And for the engineer,
you don't see the glass as half full or half empty.
No, the glass is twice as large as it needs to be.
Exactly.
It's all about capacity planning.
Come on.
Yeah.
Okay.
So I don't have a joke,
but I came up with a little bit of a brain teaser this morning.
Okay.
Nice.
Let's have it.
Yeah.
When is 90 greater than 100?
When is 90 greater than 100?
Yeah.
Well, there's a couple places.
One, which I was informed on Twitter, is when you're comparing string literals.
True.
Yeah.
Yeah.
If you're going to say "90" less than "100", it's false.
Yeah.
Okay.
The other one is microwave times.
So 100.
Nice.
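Both answers to the brain teaser can be checked in a few lines. String literals compare character by character, so `"90"` versus `"100"` is decided by `'9'` versus `'1'`. The microwave part assumes a typical keypad where the last two digits are seconds and anything before them is minutes. The `microwave_seconds` helper is hypothetical. Under that assumption, typing 100 gives one minute, while 90 gives ninety seconds.

```python
def microwave_seconds(keypad: str) -> int:
    """Hypothetical keypad model: last two digits are seconds, leading digits are minutes."""
    minutes, seconds = divmod(int(keypad), 100)
    return minutes * 60 + seconds


print("90" < "100")                                       # False: '9' sorts after '1'
print(microwave_seconds("90"), microwave_seconds("100"))  # 90 60
```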
Anyway.
Very good.
That's it.
All right.
Well,
you've left people with something to think about and yeah,
thanks for being here.
Thanks.
Bye.
Yeah.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured,
just visit PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast
with your friends and colleagues.