Python Bytes - #107 Restructuring and searching data, the Python way

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 107, recorded December 5th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by DigitalOcean. Check them out at pythonbytes.fm slash DigitalOcean. Huge supporters of the show, great product, and you get $100 free credit for new users. So check them out.

Starting point is 00:00:23 I'll tell you more about them later. But Brian, how you been? I'm doing really good. Good. So I hear you're working on your stand-up act. No. No? Your stand-up comedy?

Starting point is 00:00:35 No, but I do find lots of things funny. And we've got the first topic turned into a Twitter discussion that ended in a joke. And so I'm going to share that later in the show. Right. But like good jokes, punchlines go at the end, right? Yeah. So the topic I want to talk about is Glom, which I'd actually heard about.

Starting point is 00:00:57 It's a package started by Mahmoud Hashemi, who brought us ZeroVer and other great things. It's a package to try to reshape data. So if you've got JSON, or really any data structure that's in one type and one shape, and you need it in another shape or you need some of it out, that's what glom is written for. But it's written to be used kind of like a regular expression is. It's a general purpose tool that you can use to translate from one thing to another. And one of the cool things about it is that you can access things with path-based access. As an example, if you were going to have a 3D dictionary,

Starting point is 00:01:41 you'd have to pass in... A dictionary of dictionaries of dictionaries, sort of thing. Or maybe two levels and then an item. So it's sort of a lot of brackets and colons and brackets and quotes and stuff to specify that. So they've got a shorthand version that you can say like

Starting point is 00:01:57 a.b.c or something like that instead of all the brackets. It's a fairly simple interface to think about. It's glom, and then you have target data, a target specification, and then you've got some other things that you can do, like a default. So if there's some data that's missing, there are a lot of Python ways to do this sort of thing, but glom is rather complete. It does a lot of neat things. And one of the neat things it does is, as you're specifying the "from" portion of your data transformation, sometimes something might not be there.

Starting point is 00:02:30 Like, if you were expecting element C in a really nested dictionary, and it's not there, or that element just doesn't exist, you might get something weird in normal Python, like the famous "'NoneType' object is not subscriptable." And it doesn't tell you anything about what went wrong. So one of the things glom does is give you better error messages, like "could not access C, part two of the path a.b.c," which is like, oh, well, that's way more useful than something on this line was, you know, None, basically. Yeah, exactly. And then they also built in, since it's being used in production, Mahmoud is using it at work as well.

Starting point is 00:03:10 It's got a bunch of cool things like built-in data exploration and debugging features. So when things do go wrong, you can sort of interactively try to figure out what went wrong. That's really cool. I love this, the way that it works.
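
Editor's sketch: the dotted-path access, the default, and the clearer error message described above can be illustrated with a few lines of standard-library Python. This is not glom's implementation; `get_path` and `PathError` are hypothetical names made up for the illustration, and the real glom calls are only shown in the comments.

```python
# Stdlib-only sketch of dotted-path access; this is NOT glom's implementation.
# With glom installed, the equivalent calls would be:
#     from glom import glom
#     glom(target, "a.b.c")
#     glom(target, "a.b.c", default=None)

_MISSING = object()  # sentinel, so that None can still be used as a real default


class PathError(LookupError):
    """Hypothetical error type that names the part of the path that failed."""


def get_path(target, path, default=_MISSING):
    """Walk nested dictionaries along a dotted path such as "a.b.c"."""
    current = target
    for index, part in enumerate(path.split(".")):
        try:
            current = current[part]
        except (KeyError, TypeError, IndexError):
            if default is not _MISSING:
                return default
            raise PathError(
                f"could not access {part!r}, part {index} of path {path!r}"
            ) from None
    return current


target = {"a": {"b": {"c": "d"}}}
print(get_path(target, "a.b.c"))  # the value stored at the nested path
print(get_path({"a": {"b": None}}, "a.b.c", default="n/a"))  # falls back to the default
```

Without a default, the second lookup raises `PathError` and the message says which part of the path could not be reached, which is the kind of diagnostic the hosts are praising.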

Starting point is 00:03:23 It seems really nice. I feel like you could almost do a minor tweak to it to make it even cooler, where you can do straight attribute access. So you say glom, parenthesis, data, and then the string a.b.c. It feels to me like you could extend it to say glom of data, dot a.b.c,

Starting point is 00:03:40 and have it understand that and sort of apply it so it doesn't look like function calls. It looks more like attribute access once you sort of glomify an object, who knows. But either way, I still think this is really nice, especially if you're working with data that comes, like you said, in nested dictionaries or things like that,

Starting point is 00:03:59 where you haven't built some sort of object structure to pack it into with, like, Marshmallow or something. You're just like, I'm going to work with this dictionary and it's kind of painful. This seems like it takes a lot of the pain away. Yeah, I have a use case right now that we're pulling JSON out of. We took an off-the-shelf JSON reporter for pytest that reports all the test output in JSON. And it's nice, but it reports way more than we care about.

Starting point is 00:04:28 So we're going to use this to, or something like this to translate from what we're getting to a data structure that's easier to work with. Yeah, that's quite cool. Super nice. So I think there's this topic I want to bring up. Let me just know if we've covered it before.

Starting point is 00:04:45 It has to do with GUIs and Python. So who's doing stand-up now? I think you're doing the stand-up. Yeah, I know. Pretty much. Oh, my gosh. So long ago, you and I, we started down this path on this journey of exploring what we thought were the UI frameworks, like wxPython, the Phoenix release, and Qt for Python coming along.

Starting point is 00:05:07 Those were like the big pieces of news, and they still are. But it seems like every week somebody's like, oh, I know you guys have talked about 26 other cool UI frameworks, but do you know about X? Yeah. Right? And, you know, even the guy behind PySimpleGUI is doing all sorts of cool stuff since we started talking about it on the show. And there's a lot of cool things happening here. Yeah, you picked out a really neat one. This is a really scientific-computing-focused Python GUI thing.

Starting point is 00:05:37 And it's really, really simple. It's not for building super complicated things. The idea is I've got some object or set of objects, and I would like to create a GUI around it. So, for example, they have this camera concept, and the camera has a gain and an exposure and some functions and stuff like that. Like you take a picture based on those settings. And what you can do, it's a little bit like SQLAlchemy in that you specify these are the traits of this object, and then use this thing called Traits UI from Enthought, and you can upgrade

Starting point is 00:06:12 that to a form, basically. So you can say show the camera and it pops up a form. It says what is the gain, what is the exposure, and you can even control the widgets that go there. So like an up-down numerical thing and so on. You can pack on graphs through this Chaco thing, also from Enthought. And it's just a really simple way to take an object, show it to the user in a GUI form, and get their values back. It's pretty cool. And so the mindset kind of is, again, a lot of people are using Python for whom programming isn't their main job. So this is something where people need access to, you know, let's say a device interaction or something like this example, but they need to be able to control it and use their interface. And it doesn't have to be beautiful, but it,

Starting point is 00:06:59 but actually this looks pretty good. It doesn't look terrible. And what's cool is the foundational framework, it'll actually find its way to select like wxPython or PySide, which is the Qt for Python variant, or PyQt5. So it'll cycle through the known frameworks and basically say, well, I found wxPython, so we're using that, for example, which is really cool because a lot of those frameworks are much better looking than, say, Tkinter by default. Yeah. Yeah. That's cool. Yeah. So if you ship your little app, like you PyInstaller it with, you know, wxPython, it'll use that. You PyInstaller it with Qt for Python, it'll do that. That's really cool. Now I kind of want to go out and see if I can write, like, an oscilloscope interface with this, but I've got other things to do. Oh, come on. You got a few hours, don't you?
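
Editor's sketch: the "cycle through the known frameworks and use the first one found" behavior can be illustrated with the standard library alone. This is only an illustration of the idea, not how Traits UI actually does it; the candidate import names, their order, and the `pick_toolkit` helper are assumptions.

```python
# Sketch of "use the first GUI toolkit that is installed".
# This is an illustration only, not how Traits UI is actually implemented.
from importlib.util import find_spec

# Import names for the toolkits mentioned above; names and order are assumptions.
CANDIDATES = ("wx", "PySide2", "PyQt5")


def pick_toolkit(candidates=CANDIDATES, fallback="tkinter"):
    """Return the first importable module name, or the fallback if none is found."""
    for name in candidates:
        if find_spec(name) is not None:
            return name
    return fallback


print("GUI toolkit to use:", pick_toolkit())
```

Probing with `find_spec` checks availability without importing the toolkit, so nothing heavy is loaded until a choice has been made.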

Starting point is 00:07:45 Yeah. Awesome. All right. Well, what's next? Another one, taking data from one format and putting it in another one. I found another tool that I figured I'd cover in the same episode because I'm comparing them at the same time. And so this one is called Pampy, P-A-M-P-Y. It's the pattern matching for Python you always dreamed of. That's their tagline. It's a very small, focused library, and it's got a neat interface that's pretty easy to pick up. It's got a really interesting interface, yeah. Yeah, so the example that we're going to stick in the show notes is you just say from pampy import match and underscore. So they're reusing underscore, or using it as a thing.

Starting point is 00:08:30 And so you give it a pattern of known, like a known data structure pattern. And then you put these blanks in the places where you expect other values. And then you call match with any data you want. And then this pattern, and then it spits out as many variables as you've put underscores in, if they match. So you can just sort of go through a whole bunch of data and pull out just the bits you need, as long as they match the pattern. This is kind of similar to the one you had before, but it's like regular expressions applied to hierarchical structure of data in

Starting point is 00:09:05 like a weird, weird way. So let me see if I can try to visualize this for folks. So if you have a variable that is a list, and in the list you have one, and then the next item is actually the list two, three, and then four, you can say match, you know, a list of one, comma, a list that contains an underscore and a three, and then an underscore. And then whenever you run it through that, it'll say, well, we found a match and the values for the two underscores were two and four. That's pretty cool.
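
Editor's sketch: the match just described, written out. The commented call follows the example as read on the show; the runnable part is a stdlib-only imitation of placeholder matching, not pampy's implementation. `ANY`, `capture`, and `match_one` are hypothetical names, and the final argument is a callable that receives the captured values.

```python
# Stdlib-only sketch of placeholder matching; this is NOT pampy's implementation.
# With pampy installed, the call described above would be:
#     from pampy import match, _
#     match([1, [2, 3], 4], [1, [_, 3], _], lambda a, b: (a, b))

ANY = object()  # hypothetical stand-in for pampy's `_` placeholder


def capture(value, pattern, found):
    """Return True if value fits pattern, appending placeholder hits to found."""
    if pattern is ANY:
        found.append(value)
        return True
    if isinstance(pattern, list):
        if not isinstance(value, list) or len(value) != len(pattern):
            return False
        return all(capture(v, p, found) for v, p in zip(value, pattern))
    return value == pattern


def match_one(value, pattern, action):
    """Call action with the captured values, or return None when nothing matches."""
    found = []
    if capture(value, pattern, found):
        return action(*found)
    return None


result = match_one([1, [2, 3], 4], [1, [ANY, 3], ANY], lambda a, b: (a, b))
print(result)  # (2, 4): the two values that sat where the placeholders were
```

Returning None on a failed match is a simplification chosen for this sketch; pampy itself may behave differently when nothing matches.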

Starting point is 00:09:33 And then the last thing you pass in is what to do if you find a match. And so you can pass in a function that takes that many parameters, or a lambda expression or something if you want, and it'll call your function with those parameters and do whatever. So yeah, you can also just write a function that returns the value so you can capture it, which is kind of cool as well. Yeah. Yeah, very nice. I like it. It's one of those things that I think looks really cool, and I think would be really useful, but I would forget to use it. You know, so I guess I've got to remember to use this thing next time that I have like a situation

Starting point is 00:10:09 where it would be a really good fit. And where, you know, it's a match for the problem I'm solving. Nice. But it's one of those things also I like. I like to see more packages that are just small, sharp tools for one use case or use them for whatever.

Starting point is 00:10:23 But I mean, I use screwdrivers for all sorts of stuff, but you know. Yeah, the little backhand part is good for beating stuff in, like nails and whatnot. Yeah. Yeah, I think that's a great point. All right, now before we get on to the next one, which has some pretty practical applications actually, I just want to tell you all about DigitalOcean. So one of the features I've been really happy with lately is their idea of projects, because you go to some of these cloud providers and there's just tons of assets. There's servers, there's IP addresses, there's load balancers. They're all just spread in there.

Starting point is 00:10:54 And you don't know which one goes with which. Maybe you've got a QA environment or a staging environment and a production one. Which goes with which? Unless you've named it really carefully. And even then it's hard. So at DigitalOcean, you can go create a project, like a production Python Bytes server, a project and put the servers and the floating IP addresses and all that in there.

Starting point is 00:11:13 Same for staging and so on. So they've got all sorts of cool features. If you check them out at pythonbytes.fm slash DigitalOcean, you'll get $100 credit for new users and definitely working out well for us. You guys should check them out. Speaking of getting checked out, sometimes people get sick or they may be sick and you have to go to the doctor and the doctor takes some kind of picture and says, I looked at this, this scan and either you're okay or you're not okay. Right. It turns out though, that analyzing pictures for patterns is something that AI can do really well, right? Yeah. Yeah. So Google recently took

Starting point is 00:11:51 in this article, it's so funny. It says, well, they took this off-the-shelf AI and they pointed it at mammogram scans to try to detect breast cancer. And what they found out was a couple of things that were super, super interesting. First, this thing they called LYNA was able to correctly identify tumorous regions 99% of the time, the AI was. That's amazing. I mean, it's not 100%,

Starting point is 00:12:21 but it is much better than doctors. I can't remember what the doctor percentage was, but it was way off. If you have, if it's really a bad case, then it's pretty easy. But this is like early detection, right? And catching cancer early is the key. And this is like much, much better than doctors did. So that's really great. So I guess the first, the question is, does this mean that all the radiologists and their jobs and the cancer pathologists, their jobs are just gone, right?

Starting point is 00:12:49 Is that what it means, right? Because that could be what AI means, say, for truck drivers or taxi drivers. But you always think that it's kind of low-end jobs. But do people who have medical degrees, are they in danger of being kicked out of a job by AI? I honestly am on the fence. I don't really know. This is not a great sign for that skill because computers are getting so good at it. But one good sign is they did a second trial where they took six pathologists and they let them do diagnosis with and without the AI's assistance.

Starting point is 00:13:23 And they said with the assistance, the doctors found it easier to detect these small problems and it only took half as long. Yeah, well, that's what I was going to say. I mean, like it says 99% of the time, but that's not a real statistic. We want to know like how many false positives, how many false negatives. There's going to be gray area where like the computer says,

Starting point is 00:13:41 yep, there's cancer there. And I'm 100% sure or, you know, close. All those cases, the doctor probably would have found it also. But having the computers do it is going to be better. And then the gray area is we're going to always need doctors to look at the stuff that's like questionable, like 50% chance that there might be. And they can look at it and go, yeah, maybe we should redo the test or something or whatever. I don't know about other countries, but I think all of us have a

Starting point is 00:14:11 shortage of doctors. If we can have the same doctors do 10 or 100 times more patients with the help of AI, then go for it. Let's do it. Yeah, I think that's the real bright point here is to have more doctors and not just having more doctors, but having doctors more evenly distributed. In a large country like the US, there's very rural parts and there's very urban parts. And the access to doctors you have in a big city versus, you know, a hundred miles from a big city in a tiny town, that is not the same, right? But I can easily see taking a scan at your local doctors, shooting it up to the cloud, it says this, you jump on a Zoom

Starting point is 00:14:51 meeting with another doctor for five minutes, it says, hey, here's what the AI says, I checked it over, I agree, here's what we're going to do. Either, you know, you come to the city for treatment, or actually, you're fine, you just hang out. So I think in the democratization of this for people, I think this is really good. Yeah, and speeding things up too. It might be that on the walk back from the scanning area of your doctor's office back to your normal room, in that time maybe we could have an answer for you

Starting point is 00:15:22 instead of having to call you later tomorrow or something. Yeah. It's all good. Yeah, it's definitely good. All right. So this next one, is this like a little bit like 100 Days of Code? What is this? I think it is, but it's like Christmassy.

Starting point is 00:15:36 So this is the Advent of Code. And this has been around since 2015. And it's at adventofcode.com. It's just sort of fun code challenges that they reveal one per day for 25 days in December. And you've got just small programming puzzles covering a wide variety of skill sets. But they're sort of geared from easy to hard, and there's not a particular programming language you have to use. So I've heard a lot of people say they solve them in their most comfortable language. But then also you've got puzzles of past years available too. If you're learning a new language, you can try to solve these puzzles in a new language as well.

Starting point is 00:16:20 Yeah, I really like it. That's pretty cool. And the fact that it comes one a day is pretty sweet. Yeah, and it says it doesn't need a lot of computational power, so it should be accessible. Yeah, and then we've also put in a link to a GitHub repo that's called Awesome Advent of Code, which is a whole bunch of extra resources, like links to where people have posted their solutions in particular languages or things like that. So if you're really into it, you can check that out also. Yeah, I love it. And it's quite timely. Yeah, I guess people are maybe a couple of days behind. They'll have to do a few in a row, right? Being December 5th, but that's okay. All right, the last one is a nice year-end type of thing as well. And it has to do with the sunsetting of legacy Python, which most people agree, I think, is a good thing, right? Definitely.

Starting point is 00:17:08 Yeah, definitely. So when I think of some of the holdouts for legacy Python, Python 2, if you will, it's often these enterprises. They have big code bases. They don't really want to change them. They don't have a large motivation to change them. They're often using something like Red Hat Linux because they want the stability of that, the long-term support of that. So the news is Red Hat Linux 8 is now updated for Python 6. Sorry, 3.6 by default. 6 would be awesome. That'd be a huge announcement. Now, 3.6 by default

Starting point is 00:17:41 instead of 2.7. So that's pretty interesting, right? Yes, very interesting. By default, yeah. I think I'm linking to the Reddit page. Yeah, I'm linking to the Reddit discussion that then links to the main article because there's some funny stuff in there. And I think, Brian,

Starting point is 00:17:56 I don't know if this comes from us in any way or maybe Matthias who started this way back when, but the very first comment was just simply correcting the title to say, no, you know, you didn't mean to say 2.7. You meant to say legacy Python. Yes. Keep going, people. Keep going. Yes. So yeah, it's pretty cool. They said they have only limited support for Python 2.7, and also no version of Python will be installed by default. So you've got to install 3 as well.

Starting point is 00:18:25 But that's what most of the stuff defaults to. Actually, that's kind of cool because then with nothing installed by default, we can probably use some statistics better, because it's hard to tell. If it just comes with your installation, then we don't really know what people are choosing. Right. Absolutely. Yeah.

Starting point is 00:18:48 So there's a couple of comments that are interesting. It says Python 2.7 is available as a package, but it'll have a shorter life. And the reason it's still available is to facilitate a smoother transition to Python 3. That's one. And they also say customers are advised to use python3 or python2 directly, because the shebangs, or sorry, hashbangs, that you put at the front of the file, like to say this should be executed in Bash, this should be executed in Python, well, now you have to specify a major version. You can't say, like... Python 2, yeah. You can't just say python up there. That's actually an error.

Starting point is 00:19:23 You'll see you have to say python2 or python3 if you want this to actually run, because they want you to opt in and not just choose some sort of default thing. It's pretty cool. Yeah. So another step towards the present future. Yeah. So I've never seen hashbang before. Yeah, I usually see it as shebang, but they say hashbangs here, yeah.
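
Editor's sketch: a script written with the advice above in mind starts with a shebang that names a major version. The small checker below is a hypothetical helper for illustration, not a Red Hat tool; which forms it accepts is an assumption.

```python
#!/usr/bin/env python3
# Sketch: check whether a script's first line names a versioned Python.
# The helper name and the accepted forms are assumptions for illustration.
import posixpath
import re

# "python2", "python3", or "python3.6" count as versioned; bare "python" does not.
VERSIONED_PYTHON = re.compile(r"python[23](\.\d+)?")


def shebang_is_versioned(first_line):
    """Return True if the shebang line asks for python2 or python3 explicitly."""
    line = first_line.strip()
    if not line.startswith("#!"):
        return False
    parts = line[2:].split()
    if not parts:
        return False
    program = posixpath.basename(parts[0])
    # With "#!/usr/bin/env python3" the interpreter is env's first argument.
    if program == "env" and len(parts) > 1:
        program = parts[1]
    return VERSIONED_PYTHON.fullmatch(program) is not None


for example in ("#!/usr/bin/python3", "#!/usr/bin/env python2", "#!/usr/bin/python"):
    print(example, "->", shebang_is_versioned(example))
```

Only the last example, the bare `python`, is reported as unversioned; that is the form the hosts say no longer works by default.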

Starting point is 00:19:43 Okay. It must be the enterprise term, maybe. Cool. Well, that's pretty much all the news we have for this week. There's tons more. We're always not covering all the items, there's so much going on, but that's our news. I do want to throw out one thing here, and I know, Brian, I'm still waiting for that punchline there. So before we get to that, though, I want to say thanks to Brian McCullough over at Techmeme. So Techmeme is a website that's got all the latest news on tech, which is pretty cool. And they have a podcast called The Long Ride Home.

Starting point is 00:20:15 You can check that out. So the reason I'm bringing this up here is it's a pretty cool show. It's kind of like Python Bytes, but more for like general tech. You know, like, oh, Google acquired this company or this thing's happening to the iPhone or whatever. Right. So it's good analysis. That's well done. It's about the same length. But the reason I'm calling him out and saying thank you is he actually covered Python Bytes as the first recommended podcast on his show. So I just want to say thanks, Brian.

Starting point is 00:20:41 And you guys can check out their show as well. Yeah, definitely. And because they did that, which is a really cool call-out too, thanks, Brian, I listened to a couple episodes and I kind of liked it. It's nice. Yeah, it's nice. I like it. It's a good sort of a cousin of the show, if you will.

Starting point is 00:20:55 Yeah. All right. All right. Tell me about this punchline, man. Okay. So I had heard of glom before, but I heard about it a lot more when I had Mahmoud on Test & Code. And we talked about glom,

Starting point is 00:21:07 but we mostly were talking about how difficult it was to test it because if you're using a high-level construct, you don't have to write very much code for it. So your code can be 100% covered, but you really haven't covered all the cases yet. So how do you deal with that? So we talked about that. And then Anthony Shaw got on Twitter and started talking about some of the ways we could increase

Starting point is 00:21:30 the coverage of glom. And then I pointed out holes in his solution. And then he replied with this joke. And the joke originally came from Brenan Keller. And it's: a QA engineer walks into a bar. He orders a beer, orders zero beers, orders 9,999,000 beers, orders a lizard, orders minus one beers, orders a random set of characters. Okay, now a first real customer walks in and asks where the bathroom is. The bar bursts into flames, killing everyone. I love it. It's so perfect.

Starting point is 00:22:08 Anyway, it has nothing to do with anything. It's just funny. Yeah, no, it's really good. I like it. It's great. Thanks for sharing. And thanks for doing this podcast with me. It's been fun. It's fun as always. We're going to keep it rolling strong into 2019 for sure. Catch you later. All right. Bye. Bye. Thank you for listening to Python Bytes.

Starting point is 00:22:26 Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
