Python Bytes - #107 Restructuring and searching data, the Python way
Episode Date: December 7, 2018
Topics covered in this episode:
[0:52] glom: restructuring data, the Python way
[5:31] Scientific GUI apps with TraitsUI
[7:49] Pampy: The Pattern Matching for Python you always drea...med of
[11:28] Google AI better than doctors at detecting breast cancer
[15:37] 2018 Advent of Code
[16:56] Red Hat Linux 8.0 Beta released, now (finally) updated to use Python 3.6 as default instead of 2.7
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/107
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 107, recorded December 5th, 2018.
I'm Michael Kennedy.
And I'm Brian Okken.
And this episode is brought to you by DigitalOcean.
Check them out at pythonbytes.fm slash DigitalOcean.
Huge supporters of the show, great product, and you get $100 free credit for new users.
So check them out.
I'll tell you more about them later.
But Brian, how you been?
I'm doing really good.
Good.
So I hear you're working on your stand-up act.
No.
No?
Your stand-up comedy?
No, but I do find lots of things funny.
And we've got the first topic turned into a Twitter discussion
that ended in a joke.
And so I'm going to share that later in the show.
Right.
But like good jokes, punchlines go at the end, right?
Yeah.
So the topic I want to talk about is Glom, which I'd actually heard about.
It's a package started by Mahmoud Hashemi, who brought us ZeroVer and other great things.
It's a package to try to reshape data.
So if you've got JSON, or really any data structure that's in one shape, and you need it in another shape or you need some of it out, that's what glom is written for.
But it's written to be used kind of like a regular expression is.
It's a general purpose tool that you can use to translate from one thing to another.
And some of the cool things about it are that it's like a path-based,
you can access things with a path-based access.
Like, as an example, if you were going to have a 3D dictionary,
you'd have to pass in...
A dictionary of dictionaries of dictionaries, sort of thing.
Or maybe two levels and then an item.
So it's sort of a lot of brackets
and colons and brackets
and quotes and stuff to specify that.
So they've got a shorthand version
that you can say like
a.b.c or something like that instead
of all the brackets.
It's a fairly simple interface to think about. It's glom, and then you have
target data, a target specification, and then you've got some other things that you can do, like
default. So if there's some data that's missing, there's a lot of Python ways to do
this sort of thing, but glom is rather complete. It does a lot of neat things. And one
of the neat things it does is, as you're specifying the "from" portion of your data transformation, sometimes something might not be there.
Like, if you were expecting element c in a really nested dictionary, and that element just doesn't exist, you might get something weird in normal Python, like the famous "'NoneType' object is not subscriptable."
And it doesn't tell you anything about what went wrong.
So one of the things glom does is give you better error messages.
Like "could not access c, part two of the path a.b.c," which is like, oh, well, that's way more useful than something on this line was, you know, None, basically.
Yeah, exactly.
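To make the path idea concrete, here is a minimal standard-library sketch of dotted-path lookup with a default and a more helpful error. It is not glom's implementation: `get_path`, `PathError`, and the sample `data` are hypothetical names used only for illustration. With glom itself, the call described in the episode would be `glom(data, 'a.b.c')`.

```python
# Hypothetical stand-in for the idea discussed above; this is NOT glom's code.
_MISSING = object()


class PathError(LookupError):
    """Raised when one segment of a dotted path cannot be followed."""


def get_path(target, path, default=_MISSING):
    """Follow a dotted path such as 'a.b.c' through nested dictionaries."""
    current = target
    for index, part in enumerate(path.split(".")):
        try:
            current = current[part]
        except (KeyError, TypeError, IndexError):
            # A default, if given, replaces the error entirely.
            if default is not _MISSING:
                return default
            raise PathError(
                f"could not access {part!r}, part {index} of path {path!r}"
            ) from None
    return current


data = {"a": {"b": {"c": "d"}}}
print(get_path(data, "a.b.c"))                # same value as data["a"]["b"]["c"]
print(get_path(data, "a.x.c", default=None))  # missing key, so the default comes back
```

The error names the failing segment and its position, which is the kind of message the hosts contrast with a bare "'NoneType' object is not subscriptable".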
And then they also built in,
since it's being used in production,
Mahmoud is using it at work as well.
It's got a bunch of cool things
like built-in data exploration
and debugging features.
So when things do go wrong,
you can sort of interactively
try to figure out what went wrong.
That's really cool.
I love this, the way that it works.
It seems really nice.
I feel like you could almost do a little like a minor tweak
to it to make it even cooler where you can do straight
attribute access. So you say
glom(data, 'a.b.c').
It feels to me like you could extend it to say
glom(data).a.b.c
and have it understand
that and sort of apply it
so it doesn't look like function calls.
It looks more like attribute access
once you sort of glomify an object, who knows.
But either way, I still think this is really nice,
especially if you're working with data that comes,
like you said, in nested dictionaries or things like that,
where you haven't built like some sort of object structure
to pack it into with like Marshmallow or something.
You're just like, I'm going to work with this dictionary and it's kind of painful.
This seems like it takes a lot of the pain away.
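The attribute-access idea floated above could be sketched roughly like this with the standard library. `Dotted` is a hypothetical wrapper invented for illustration; it is not part of glom, and it only handles nested dictionaries.

```python
class Dotted:
    """Hypothetical wrapper: read nested dict keys as attributes."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so _data resolves normally.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(f"no key {name!r} in wrapped data") from None
        # Wrap nested dictionaries so a chain like .a.b.c keeps working.
        return Dotted(value) if isinstance(value, dict) else value


data = {"a": {"b": {"c": "d"}}}
print(Dotted(data).a.b.c)
```

One trade-off of this design is that keys which are not valid Python identifiers, or which clash with real attributes, still need the string-path form.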
Yeah, I have a use case right now that we're pulling JSON out of.
We took an off-the-shelf JSON reporter for PyTest that reports all the test output in JSON.
And it's nice, but it reports way more
than we care about.
So we're going to use this to,
or something like this to translate
from what we're getting to a data structure
that's easier to work with.
Yeah, that's quite cool.
Super nice.
So I think there's this topic I want to bring up.
Let me just know if we've covered it before.
It has to do with GUIs and Python.
So who's doing stand-up now?
I think you're doing the stand-up.
Yeah, I know.
Pretty much.
Oh, my gosh.
So long ago, you and I, we started down this path on this journey of exploring what we
thought were the UI frameworks, like wxPython, the Phoenix release, and Qt for Python coming along.
Those were like the big pieces of news, and there still are.
But it seems like every week somebody's like, oh, I know you guys have talked about 26 other cool UI frameworks, but do you know about X?
Yeah.
Right?
And, you know, even the guy behind PySimpleGUI is doing all sorts of cool stuff since we started talking about it on the show.
And there's a lot of cool things happening here.
Yeah, you picked out a really neat one.
This one is really focused on scientific computing GUIs in Python.
And it's really, really simple.
It's not for building super complicated things.
The idea is I've got some object or set of objects, and I would like to create a GUI around it. So, for example, they have this camera concept, and the camera has a gain and an exposure and some functions and stuff like that. Like you take a picture based on those settings. And what you can do, it's a little bit like SQLAlchemy in that you specify these are the traits of this object, and then you use this thing called TraitsUI from Enthought, and you can upgrade that to a form, basically. So you can say show the camera, and it pops up a form that says what is the gain, what is the exposure, and you can even control the widgets that go there. So like an up-down numerical thing and so on.
You can pack on graphs through this Chaco thing, also from Enthought.
And it's just a really simple way to take an object, show it to the user in a GUI form, and get their values back.
It's pretty cool.
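The declare-the-fields-then-generate-a-form idea can be sketched with the standard library alone. This is not the TraitsUI API: `Camera`, `form_fields`, and the default values are hypothetical, and a real TraitsUI class would render actual widgets rather than printed rows.

```python
from dataclasses import dataclass, fields


@dataclass
class Camera:
    """Hypothetical object with the two settings mentioned in the episode."""
    gain: float = 1.0
    exposure: int = 100


def form_fields(obj):
    """Describe each declared field as (label, type name, current value)."""
    return [
        (
            f.name.capitalize(),
            getattr(f.type, "__name__", str(f.type)),
            getattr(obj, f.name),
        )
        for f in fields(obj)
    ]


# Print one "form row" per declared field.
for label, type_name, value in form_fields(Camera()):
    print(f"{label} [{type_name}]: {value}")
```

The point of the sketch is only that the class declaration carries enough information (names, types, defaults) to build a form without hand-writing one.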
And so the mindset kind of is, again, a lot of people are using Python for whom programming isn't their main job. So this is something where they need access to, let's say, a device interaction or something like this example, and they need to be able to control it through an interface. And it doesn't have to be beautiful, but actually this looks pretty good. It doesn't look terrible.
And what's cool is the foundational framework will actually find its way to select wxPython, or PySide, which is the Qt for Python variant, or PyQt5. So it'll cycle through the known frameworks and basically say, well, I found wxPython, so we're using that, for example, which is really cool because a lot of those frameworks are much better looking than, say, Tkinter by default.
Yeah. That's cool.
Yeah. So if you ship your little app, like you PyInstaller it with wxPython, it'll use that. You PyInstaller it with Qt for Python, it'll do that.
That's really cool. Now I kind of want to go out and see if I can write an oscilloscope interface with this, but I've got other things to do.
Oh, come on. You got a few hours, don't you?
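The "cycle through the known frameworks" behavior could look roughly like this. It is a sketch of the general technique, not how TraitsUI actually selects its backend; `pick_toolkit` and the candidate list are assumptions, with module names chosen to match the toolkits mentioned above.

```python
from importlib.util import find_spec


def pick_toolkit(candidates=("wx", "PySide2", "PyQt5", "tkinter")):
    """Return the first candidate module that is importable, else None."""
    for name in candidates:
        # find_spec locates a top-level module without importing it.
        if find_spec(name) is not None:
            return name
    return None


print(pick_toolkit())
```

The result depends on what is installed, which is the point: the same application code picks up whichever toolkit was shipped with it.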
Yeah. Awesome. All right. Well, what's next?
Another one about taking data from one format and putting it in another one. I found another tool that I figured I'd cover in the same episode because I'm comparing them at the same time. And so this one is called Pampy, P-A-M-P-Y. It's "the pattern matching for Python you always dreamed of."
That's their tagline.
It's a very small, focused library, and it's got a neat interface that's pretty easy to catch on to.
It's got a really interesting interface, yeah.
Yeah, so the example that we're going to stick in the show notes is you just say from pampy import match and underscore.
So they're reusing underscore as a thing.
And so you give it a pattern of known, like a known data structure pattern.
And then you put these blanks in the places where you expect other values.
And then you call match with any data you want.
And then this pattern, and then it spits out as many variables as you've put underscores in, if they match. So you can just sort of go through a whole bunch of data and pull out just the bits you need, as long as they match the pattern.
This is kind of similar to the one you had before, but it's like regular expressions applied to hierarchical structures of data, in a weird way. So let me see if I can try to visualize this for folks. So if you have a variable that is a list, and in the list you have one, and then the next item is actually the list two, three, and then four, you can say match a list of one, then a list that contains an underscore and a three, and then an underscore.
And whenever you run it through that, it'll say, well, we found a match and the values for the two underscores were two and four.
That's pretty cool.
And then the last thing you pass in is what to do if you find a match.
And so you can pass in a function that takes that many parameters, or a lambda expression or something if you want, and it'll call your function with those parameters and do whatever.
So yeah, you can also just write a function that returns the value so you can capture it, which is kind of cool as well.
Yeah, very nice. I like it. It's one of those things that I think looks really cool and I think would be really useful, but I would forget to use it. You know, so I guess I've got to remember to use this thing next time that I have a situation where it would be a really good fit.
And where, you know, it's a match for the problem I'm solving.
Nice.
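The list example from the discussion can be reproduced with a small standard-library sketch of wildcard matching. This is not Pampy's implementation: `ANY`, `capture`, and `match_once` are hypothetical names, with `ANY` playing the role of Pampy's underscore. In Pampy itself, as described in the episode, you would import `match` and `_` from `pampy` and pass the value, the pattern, and the action.

```python
class _Wildcard:
    """Marks a blank in a pattern; stands in for Pampy's underscore."""


ANY = _Wildcard()


def capture(value, pattern):
    """Return the values found under each ANY, or None if value does not fit."""
    if pattern is ANY:
        return [value]
    if isinstance(pattern, list):
        if not isinstance(value, list) or len(value) != len(pattern):
            return None
        found = []
        for item, sub_pattern in zip(value, pattern):
            sub = capture(item, sub_pattern)
            if sub is None:
                return None
            found.extend(sub)
        return found
    # Anything else is a literal that must compare equal.
    return [] if value == pattern else None


def match_once(value, pattern, action):
    """Call action with the captured values when the pattern fits."""
    captured = capture(value, pattern)
    return None if captured is None else action(*captured)


# One, then the list two-three, then four, matched against one, [blank, 3], blank.
print(match_once([1, [2, 3], 4], [1, [ANY, 3], ANY], lambda a, b: (a, b)))
```

The action receives one argument per blank, in the order the blanks appear in the pattern.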
But it's one of those things also I like.
I like to see more packages that are just small, sharp tools for one use case, or use them for whatever.
But I mean, I use screwdrivers for all sorts of stuff, but you know.
Yeah, the little back end part is good for beating stuff in, like nails and whatnot.
Yeah, I think that's a great point. All right, now before we get on to the next one, which has some pretty practical applications actually, I just want to tell you all about DigitalOcean. So one of the features I've been really happy with lately is their idea of projects, because you go to some of these cloud providers and there's just tons of assets.
There's servers, there's IP addresses, there's load balancers.
They're all just spread in there.
And you don't know which one goes with which.
Maybe you've got a QA environment or a staging environment and a production one.
Which goes with which?
Unless you've named it really carefully.
And even then it's hard.
So at DigitalOcean, you can go create a project,
like a production Python Bytes server project,
and put the servers and the floating IP addresses and all that in there.
Same for staging and so on.
So they've got all sorts of cool features.
If you check them out at pythonbytes.fm slash DigitalOcean,
you'll get $100 credit for new users
and it's definitely working out well for us. You guys should check them out.
Speaking of getting checked out, sometimes people get sick, or they may be sick, and you have to go to the doctor, and the doctor takes some kind of picture and says, I looked at this scan and either you're okay or you're not okay. Right?
It turns out, though, that analyzing pictures for patterns is something that AI can do really well, right?
Yeah.
Yeah. So Google recently, in this article, it's so funny, it says they took this off-the-shelf AI and they pointed it at mammogram scans to try to detect breast cancer. And what they found out was a couple of things that were super, super interesting.
First, this thing they called LYNA was able to correctly identify tumorous regions 99% of the time, the AI was.
That's amazing.
I mean, it's not 100%,
but it is much better than doctors.
I can't remember what the doctor percentage was, but it was way off.
If you have, if it's really a bad case, then it's pretty easy.
But this is like early detection, right?
And catching cancer early is the key.
And this is like much, much better than doctors did.
So that's really great.
So I guess the first, the question is, does this mean that all the radiologists and their jobs and the cancer pathologists, their jobs are just gone, right?
Is that what it means, right?
Because that could be what AI means, say, for truck drivers or taxi drivers.
But you always think of those as kind of low-end jobs.
But is that really, do people who have medical degrees, are they in danger of being like kicked out of a job by AI?
I honestly am on the fence.
I don't really know.
This is not a great sign for that skill because computers are getting so good at it.
But one good sign is they did a second trial where they took six pathologists and they let them do diagnosis with and without the AI's assistance.
And they said with the assistance, the doctors found it easier to detect these small problems
and it only took half as long.
Yeah, well, that's what I was going to say.
I mean, like it says 99% of the time,
but that's not a real statistic.
We want to know like how many false positives,
how many false negatives.
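A small worked example shows why a single percentage is not enough. The counts below are invented purely for illustration and do not come from the Google study; `rates` is a hypothetical helper.

```python
def rates(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of real cancers that were caught
    specificity = tn / (tn + fp)  # share of healthy scans correctly cleared
    return accuracy, sensitivity, specificity


# Made-up numbers: 1,000 scans, 10 with cancer, and a "model" that never flags anything.
accuracy, sensitivity, specificity = rates(tp=0, fp=0, tn=990, fn=10)
print(accuracy, sensitivity, specificity)
```

In this made-up case the accuracy is 99% even though every cancer is missed, which is why the false-negative and false-positive counts matter.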
There's going to be gray area where like the computer says,
yep, there's cancer there.
And I'm 100% sure or, you know, close.
All those cases, the doctor probably would have found it also.
But having the computers do it is going to be better.
And then the gray area is we're going to always need doctors to look at the stuff that's like questionable,
like 50% chance that there might be.
And they can look at it and go, yeah, maybe we should redo the test or something or whatever. I don't know about other countries, but I think all of us have a shortage of doctors. If we can have the same doctors do 10 or 100 times more patients with the help of AI, then go for it. Let's do it.
Yeah, I think that's the real bright point here, is to have more doctors, and not just having more doctors, but having doctors more evenly distributed. In a large country like the US, there's very rural parts and there's very urban parts. And the access to doctors you have in a big city versus, you know, a hundred miles from a big city in a tiny town, that is not the same, right? But I can easily see taking a scan at your local doctor's, shooting it up to the cloud, it says this, you jump on a Zoom meeting with another doctor for five minutes, who says, hey, here's what the AI says, I checked it over, I agree, here's what we're going to do. Either you come to the city for treatment, or actually, you're fine, you just hang out. So in the democratization of this for people,
I think this is really good.
Yeah, and speeding things up too.
It might be that on the walk back from the scanning area
of your doctor's office back to your normal room,
in that time maybe we could have an answer for you
instead of having to call you later tomorrow or something.
Yeah.
It's all good.
Yeah, it's definitely good.
All right.
So this next one, is this like a little bit like 100 Days of Code?
What is this?
I think it is, but it's like Christmassy.
So this is the Advent of Code.
And this has been around since 2015.
And it's at adventofcode.com. It's a set of fun code challenges that they reveal one per day for 25 days in December.
And you've got just small programming puzzles covering a wide variety of skill sets.
They're geared from easy to hard, and there's not a particular programming language you have to use.
So I've heard a lot of people say they solve them in their most comfortable language.
But then you've also got puzzles from past years available too.
If you're learning a new language, you can try to solve these puzzles in the new language as well.
Yeah, I really like it.
That's pretty cool.
And the fact that it comes one a day is pretty sweet.
Yeah, and it says it doesn't need a lot of computational power, so it should be accessible.
Yeah, and then we've also put in a link to a GitHub repo that's called Awesome Advent of Code, which is a whole bunch of extra resources, like links to where people have posted their solutions in particular languages or things like that. So if you're really into it, you can check that out also.
Yeah, I love it. And it's quite timely.
Yeah, I guess people are maybe a couple of days behind. I'll have to do a few in a row, right? Being December 5th, but that's okay. All right, the last one is a nice year-end type of thing as well. And it has to do with the sunsetting of legacy Python, which most people agree, I think, is a good thing, right?
Definitely.
Yeah, definitely.
So when I think of some of the holdouts for legacy Python, Python 2, if you will, it's often these enterprises.
They have big code bases.
They don't really want to change them.
They don't have a large motivation to change them.
They're often using something like Red Hat Linux because they want the stability of that,
the long-term support of that. So the news is Red Hat Linux 8 is now updated for Python 6.
Sorry, 3.6 by default. 6 would be awesome. That'd be a huge announcement. Now, 3.6 by default
instead of 2.7. So that's pretty interesting, right?
Yes, very interesting.
By default, yeah.
I think I'm linking to the Reddit page.
Yeah, I'm linking to the Reddit discussion
that then links to the main article
because there's some funny stuff in there.
And I think, Brian,
I don't know if this comes from us in any way
or maybe Matthias who started this way back when,
but the very first comment was
just simply correcting the title to say,
no, you know, you didn't mean to say 2.7. You meant to say legacy Python.
Yes. Keep going, people. Keep going.
Yes. So yeah, it's pretty cool. They said they have only limited support for Python 2.7, and also no version of Python will be installed by default.
So you've got to install 3 as well.
But that's what most of the stuff defaults to.
Actually, that's kind of cool because then with nothing installed by default,
we can probably get better statistics, because it's hard to tell.
If it just comes with your installation,
then we don't really know what people are choosing.
Right.
Absolutely.
Yeah.
So there's a couple of comments that are interesting.
It says Python 2.7 is available as a package, but it'll have a shorter life.
And the reason it's still available is to facilitate a smoother transition to Python 3.
That's one.
And they also say customers are advised to use python3 or python2 directly, because
the shebangs, or sorry, hashbangs, that you put at the front of the file to say this
should be executed in bash, this should be executed in Python, well, now you have to specify a major
version. You can't just say python up there. That's actually an error.
You'll see you have to say python2 or python3 if you want this to actually run,
because they want you to opt in and not just choose some sort of default thing.
It's pretty cool.
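As a sketch of what "specify a major version" means in practice, the script below starts with an explicit python3 line, and the hypothetical helper `names_major_version` checks whether a given first line does the same. The regular expression is a simplification for illustration, not Red Hat's own check, and the interpreter paths are assumptions.

```python
#!/usr/bin/env python3
import re

# Matches an interpreter name such as python2, python3, or python3.6,
# but not a bare "python".
_VERSIONED = re.compile(r"python[23](\.\d+)?$")


def names_major_version(first_line):
    """Return True if a #! line names python2 or python3 explicitly."""
    if not first_line.startswith("#!"):
        return False
    # Check every word so "#!/usr/bin/env python3" and trailing flags both work.
    return any(_VERSIONED.search(word) for word in first_line[2:].split())


print(names_major_version("#!/usr/bin/python3"))
print(names_major_version("#!/usr/bin/python"))
```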
Yeah.
So another step towards the present future.
Yeah.
So I've never seen hashbang before.
Yeah, I usually see it as shebang, but they say hashbangs here, yeah.
Okay.
It must be the enterprise term, maybe.
Cool. Well, that's pretty much all the news we have for this week. There's tons more; we're never covering all the items, there's so much going on. But that's our news. I do want to throw out one thing here, and I know, Brian, I'm still waiting for that punchline. So before we get to that, though, I want to say thanks to Brian McCullough over at Techmeme.
So Techmeme is a website that's got like all the latest news on tech, which is pretty cool.
And they have a podcast called the Techmeme Ride Home.
You can check that out.
So the reason I'm bringing this up here is it's a pretty cool show.
It's kind of like Python Bytes, but more for like general tech.
You know, like, oh, Google acquired this company or this thing's happening to the iPhone or whatever.
Right. So it's good analysis.
That's well done. It's about the same length.
But the reason I'm calling him out and saying thank you is he actually covered Python Bytes as the first recommended podcast on his show.
So I just want to say thanks, Brian.
And you guys can check out their show as well.
Yeah, definitely. And because they did that, which is a really cool call out too.
Thanks, Brian.
But I listened to a couple episodes and I kind of liked it.
It's nice.
Yeah, it's nice.
I like it.
It's a good sort of a cousin of the show, if you will.
Yeah.
All right.
All right.
Tell me about this punchline, man.
Okay.
So I had heard of glom before, but I heard about it a lot more when I had Mahmoud on Test & Code.
And we talked about glom,
but we mostly were talking about how difficult it was to test it
because if you're using a high-level construct,
you don't have to write very much code for it.
So your code can be 100% covered,
but you really haven't covered all the cases yet.
So how do you deal with that?
So we talked about that.
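The gap between line coverage and case coverage is easy to reproduce. In this hypothetical example (`average` is not from glom or its tests), a single test executes every line of the function, yet an obvious input still breaks it.

```python
def average(values):
    """One line of logic, so one passing test reaches 100% line coverage."""
    return sum(values) / len(values)


# This single check executes every line of average().
assert average([2, 4, 6]) == 4

# Full line coverage, but the empty case was never exercised.
try:
    average([])
except ZeroDivisionError:
    print("empty input is an uncovered case")
```

The more a library lets you express in one line, the more input cases hide behind that one covered line.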
And then Anthony Shaw got on Twitter and started talking about some of the ways we could increase
the coverage of Glom. And then I pointed out holes in his solution. And then he replied with this
joke. And the joke originally came from Brenan Keller. And it's: a QA engineer walks into a bar.
He orders a beer, orders zero beers, orders 9,999,000 beers,
orders a lizard, orders minus one beers,
orders a random set of characters.
Okay, now the first real customer walks in and asks where the bathroom is.
The bar bursts into flames, killing everyone.
I love it. It's so perfect.
Anyway, it has nothing to do with anything. It's just funny.
Yeah, no, it's really good. I like it. It's great. Thanks for sharing.
And thanks for doing this podcast with me. It's been fun.
It's fun as always. We're going to keep it rolling strong into 2019 for sure.
Catch you later.
All right. Bye.
Bye.
Thank you for listening to Python Bytes.
Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured, just visit PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.