Python Bytes - #137 Advanced Python testing and big-time diffs

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 137, recorded June 26th, 2019. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by Rollbar. I'll tell you all more about them later. For now, Brian, I always wonder about, you know, you hear that Python is an efficient and expressive language, and if you write code in C++, it'll be a lot longer. But, you know, how can you quantify that? Well, you can set up a whole bunch of people to write the same thing in a whole bunch of languages.

Starting point is 00:00:31 Well, that's awesome that people did that. It seems like a lot of work, but yeah, I guess that's cool. Tell us about it. Like, this is your first item, right? So this is an article called Comparing the Same Project in Rust, Haskell, C++, Python, Scala, and how do you pronounce that? OCaml?

Starting point is 00:00:48 O-C-A-M-L? OCaml, I think, yeah. So this was written up by Tristan Hume, and this is about a university project, which is kind of a neat project. Basically, they had to implement a big chunk of Java. So it's a Java to x86 compiler, as part of a compiler class, and they basically had to set up teams of people to do it, and they could pick any language they wanted, which is kind of cool because, you know, people will be better at

Starting point is 00:01:19 different languages, so let them use what they're good at. Yeah, let them use what they're good at, because then they'll do it properly and not just try to cram it in. They'll have the most efficient use of that language for sure. Up to three people on a team, and it was a multi-month project. And then also tests were added. So this is kind of a neat part of the process, which I think is an awesome way to teach people: have some published tests, like, you're going to have to run these test cases and they have to pass, but then also have some secret ones where people don't know what tests their code is going to be tested against, which is kind of nice because they'll have to make sure their implementation is robust without knowing,

Starting point is 00:02:00 without the test cases. It's kind of neat. Yeah, that's cool. So I do love it that there's unknown tests. Like these are the specifications. You can kind of get close with these tests, but to pass, you actually have to just work. That's totally like real life. You know, you'll write down some specifications, and there's some specifications that are not written down. They're just supposed to be known. And then there's other things that people, once they see the implementation, they'll go, oh yeah, I wish it did this also. So I think that's a cool idea. And they weren't shooting for lines of code or complexity or anything.

Starting point is 00:02:33 They were just trying to finish the project. So this analysis was done afterward. They had a baseline implementation in Rust, written by two people that were familiar with Rust. And then they compared everything against that. So there was another Rust team that chose different design decisions, and they had like three times the code. So these are all just comparing lines of code. The Haskell implementation was about equal, but depending on how you measure it, 1 to 1.6 times the code.

Starting point is 00:03:04 Same for the OCaml. C++ was bigger, about 1.4 times the baseline. And Scala was a little bit less with about 70% of the lines of code. The big outlier was Python, which had a lot of standouts. The Python implementation was approximately half the size, plus it was written by one person, and it had extra features and passed all the secret tests plus others. Somebody excellent at programming, of course,

Starting point is 00:03:37 used some of the metaprogramming techniques. And anyway, kind of a fun article. One of the things I forgot to mention, one of the hindrances, was that they were only supposed to use standard libraries, and not any parsing libraries, even if they were part of the standard library. So even the parsing had to be kind of built up from scratch. Yeah, how interesting. I wonder if that would make things like Python even better. Possibly. I don't know about Rust, maybe as well, but like C++ doesn't have parsing libraries built in that I know of. Things like that, right?

Starting point is 00:04:09 There's a lot of mini language parsing libraries around Python. So, I mean, it'd be interesting to do that with, you know, go wild and use whatever's available. Right. Like maybe take this project and then go, all right, well, what if we hit it with all the pip installable things? What happens then, right? Yeah, exactly. Yeah, it sounds like a super intense project though, right? Like deep, deep into the language, right? I mean, on one, you're writing the compiler, you're understanding Java,

Starting point is 00:04:35 you're compiling to x86, you're doing metaprogramming. Like there's a lot of stuff going on here. It's a pretty cool article. Cool. If that last one really connected with, like, your deep geek outlets, like go really hard into the language, this next one is going to connect with your, I just want it to work really quick and easy.

Starting point is 00:04:51 Yeah. So this one is really nice. If I was a data scientist, I might use matplotlib, or just any kind of person who wanted some visualization of data, I might use matplotlib for that. And that's great. Except, at least, I personally can't make my matplotlib look super good, right? Like if I used Excel, I could put the data in there and I'd highlight the stuff and I would say, okay, insert chart.

Starting point is 00:05:11 And I would pick the kind and then I would go and I would like right click and edit the chart. And I would like maybe drag it around to size it correctly. Double click on the axes to change the axes. But in matplotlib, you just write code and the picture comes out, right? Yeah. And I know you can do all this stuff, but it's not obvious. And you have to look every little thing up and tweak it. Yeah. So, there's this project that we heard about from one of our listeners. And I can't remember. I'm trying to remember who it was. Oh, here it is.

Starting point is 00:05:40 This is from Lee Wagner. So, thank you, Lee, for sending it in because this is killer. So there's this project called Pylustrator for styling your matplotlib plot. So you just do your matplotlib thing, much like Excel, where you can drag and drop and arrange your different plots. You can go to the properties and edit the axes and the colors and just all the kind of stuff that you might do. It even has the cool design layout stuff where it'll help you equally space stuff between each other. So it puts those little bars to say, right there, if you drag and drop it, they'll be equally spaced, or like align the tops and the sides. Yeah.

Starting point is 00:06:29 And with that, the start thing, you can even fill it with some of your data to begin with. So you kind of know the data you want to plot, because that's going to affect how you're going to design it. So you can pre-fill it and then drag it around and design it. It's just totally cool.

Starting point is 00:06:45 It's totally cool. I'm glad they have. So the link we're going to show has a little embedded video. And that's where, I mean, talking about it, you're like, yeah, I think this might be useful. But you watch this video and you're like, oh my God, I need to use this right away. Yes. I had the exact same experience. I'm like, ah, kind of interesting. Oh, look at the video. Oh my God, it's amazing. Yeah. So this is super cool. And obviously you don't save your changes to like an Excel workbook. What you do is you save your changes, and you can actually call save in Pylustrator. And what it'll do is it'll put the configuration in Python back into the file that ran it. So that's pretty wild actually. Yeah. And then you comment out the Pylustrator lines. You don't have to import it later because it's not a dependency on

Starting point is 00:07:27 your project afterwards. Right, it's just a little design tool. So it's super cool if anyone's doing matplotlib and they want to have it styled, especially if you're doing more than one plot and you want to put them side by side. This is super cool. So check that out. I'm definitely a fan. Another thing I'm a fan of, Brian, MongoDB. Love it. Since you and I are paying attention to a lot of projects, there's a lot of different release cycles, and we kind of decided early on that we weren't going to try to track everybody's releases because that might get boring to people.

Starting point is 00:07:56 However, we covered MongoDB 4 because it came out with transactions, which was a big thing. But 4.2 is out, and I'm kind of excited about a couple of features that it came out with. So the transactions are there, but now they're multi-doc, they're distributed transactions, so they're transactions that cross sharded clusters and replica sets, and that's just really cool. Yeah, that's super cool. Yeah, I mean, you could use a cool transactional set before, but you're kind of limited, right? And now it's like,

Starting point is 00:08:29 no matter what crazy cluster with scaling and sharding and replication you have set up, you just do a transaction and it's all good. Pretty cool. They're a good idea anyway. But with testing,

Starting point is 00:08:38 you can set up a complex database full of stuff. And then at the beginning of your test, start a transaction. And then after your test, and you roll this into a fixture, you can just roll back and your next test has the same data. It saves time. So that's cool. Yeah.

Starting point is 00:08:54 And it's probably got isolation if for some reason they ran in parallel or whatever. Yeah, it's really cool. Yeah. The other feature that's pretty amazing is the field level encryption. And this is encryption done on a per field basis on the client side. So the server doesn't do that. It's not doing it on the server. So system administration can be done without having to make sure everybody

Starting point is 00:09:20 signs NDAs and all that stuff that you can just, you can manage your database without even being exposed to any of the secret stuff. Yeah, that's awesome. Like most databases, most of what's in them is not sensitive, but there's often like a little bit that is, that's really, you don't want, you don't want anyone to get access to. And yeah, this is really cool because like you said, it's done in the library that talks to MongoDB. So in PyMongo for the Python folks. And you just set the encryption key or decryption key over there. And the server cannot decrypt it.

Starting point is 00:09:58 So if somebody breaks into the server or you lose it, or it's like you set it up on the cloud for like testing, you forget that it's there. All the kind of random stuff that happens to databases. It doesn't matter in terms of this encrypted stuff because literally the database doesn't know how to read it. It's the drivers on the client side that have the keys. So with GDPR stuff, if the customer says, hey, delete my stuff,

Starting point is 00:10:18 that's always been an issue with databases. It might be in a whole bunch of tables, but if you destroy the customer key, the data might still be there, but it's unreadable to anybody. So it may as well be garbage. Absolutely. And it gets to be really tricky because even if you set up the right code to delete all the customer data out of your database, what about the backup that somebody made when the older admin was hired and they stored that in the S3 buckets, so it was offsite, right? How do you delete the data

Starting point is 00:10:51 out of there? You know what I mean? But if it's encrypted, then you can just throw away the encryption key and then it's just gobbledygook. Yeah, cool. Pretty cool. I like it. Speaking of cool, Rollbar, happy to have them come along and sponsor the show. We use Rollbar on PythonBytes.fm, among other things. So if anything goes wrong, and it's kind of fortuitous, I guess, I woke up this morning with a ton of Rollbar messages because there was a data center failure that caused some connectivity between MongoDB and PythonBytes.fm.

Starting point is 00:11:27 How about that for a funny thing? So some network card broke, right? And like the site couldn't talk to the database server, so it was freaking out. How do I know? Nobody complained to me. They probably should. Like, Michael, your site's down. What's going on?

Starting point is 00:11:40 It's really messed up. But I just opened up my email. I'm like, whoa, there's a lot of Rollbar stuff going on here. So if you want to be notified right away, even when users don't tell you, check them out. They have a free tier, they have some great paid tiers. Visit pythonbytes.fm slash rollbar. Super easy to integrate into Python, into the web frameworks. They've just got like one or two lines you enter, or maybe a little configuration, a few settings, off you go. It's really, really nice.

Starting point is 00:12:05 So check them out, pythonbytes.fm slash rollbar. Nice. So kind of like Pylustrator, that sounds kind of useful and interesting. This next one also sounds useful and interesting, but like Pylustrator, as you look into it, you're like, whoa, this thing does a lot, man. Look at it go. So there's this project that was recommended by Francois Leblanc. Thank you for that, Francois. And it's called deep difference, well, it's just called DeepDiff. And so it does

Starting point is 00:12:32 deep differences and search of any Python object graph. So I've got an object which holds a list, that list points to a bunch of objects, those have other pointers. Like I want to know, is this thing somehow referenced by that? Let me do a search on it. Where is it? Is this giant crazy data structure the same or different than this other giant crazy data structure? And you could compare them. So that's pretty cool. So it has DeepDiff, it has DeepSearch, and then it also has DeepHash.

Starting point is 00:13:01 So if I've got some giant crazy data structure, you would like to know that if the data is the same across two of those, that the hash result is identical. And if any part of the data changes, that the hash then changes. Oh, yeah, possibly, right. So you will do that on object graphs that are not even hashable themselves. Really? Yeah. So that's pretty wild. There's just a lot of nice touches in here that kind of made me realize like, wow, this is wild. So for example, it'll give me the differences in a list, ignoring orders and duplicates, right? Just what is the essence of this data? Or you can say, is any data repeated in this list or in this dictionary or something like that?
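To make the deep hash idea concrete, here is a minimal sketch using only the standard library. It is not DeepDiff's implementation or API; the function name deep_hash and the sample data are made up for illustration. It assumes the structure is built from dicts, lists, tuples, sets, and simple values whose repr is stable.

```python
import hashlib


def deep_hash(obj):
    """Return a hex digest that depends only on the content of obj.

    Hypothetical helper for illustration; not part of DeepDiff.
    """
    if isinstance(obj, dict):
        # Sort the per-item digests so key order does not matter.
        parts = sorted(deep_hash(k) + deep_hash(v) for k, v in obj.items())
        tag = "dict"
    elif isinstance(obj, (list, tuple)):
        # Order matters for sequences, so keep it.
        parts = [deep_hash(item) for item in obj]
        tag = type(obj).__name__
    elif isinstance(obj, (set, frozenset)):
        parts = sorted(deep_hash(item) for item in obj)
        tag = "set"
    else:
        # Leaf value: fall back to its repr.
        parts = [repr(obj)]
        tag = type(obj).__name__
    payload = tag + "(" + ",".join(parts) + ")"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = {"name": "Jeff", "scores": [1, 2, {"bonus": 3}]}
second = {"scores": [1, 2, {"bonus": 3}], "name": "Jeff"}
changed = {"name": "Jeff", "scores": [1, 2, {"bonus": 4}]}

print(deep_hash(first) == deep_hash(second))   # True: same content
print(deep_hash(first) == deep_hash(changed))  # False: one nested value differs
```

Note that the built-in hash() would raise TypeError on any of these dictionaries, which is the gap a content-based hash fills.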

Starting point is 00:13:47 You can exclude certain types. Maybe I want to know the data is the same, but they're both using a thread object, and the thread object is different, so of course they're going to be different. But say, don't check on the thread object. Just check the other stuff. So you can explicitly opt in or out data types that you might use. You can say, I'd like to compare these things, but only to like four significant digits because I computed them slightly differently. And maybe they're, you know, I can't get them like to the decimal accuracy to be exactly the same,

Starting point is 00:14:15 just the way they're done. Right. You can exclude parts of your object tree that you've got for comparison. I mean, this is insane. Being able to do, like, significant digits in a deep data structure, that's amazing. That's really cool for a lot of stuff I work with. Yeah, I can imagine. Exactly. And you know what? I bet this would be really good to mix in with testing. Like, you create your test data and then you deep diff it against the result. Yeah, exactly, because there may be noise in the system and, you know, some of the signals are noisy. So, yeah, this is awesome. Cool.
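The idea of comparing nested data while tolerating small numeric noise can be sketched in plain Python. This is an illustrative toy, not DeepDiff itself: the function deep_diff, the MISSING sentinel, and the digits parameter are assumptions made up here, and digits simply rounds floats to that many decimal places before comparing.

```python
MISSING = object()  # hypothetical marker for "key not present on this side"


def deep_diff(old, new, digits=None, path="root"):
    """Return a list of (path, old_value, new_value) differences."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new), key=repr):
            child = f"{path}[{key!r}]"
            if key not in new:
                changes.append((child, old[key], MISSING))
            elif key not in old:
                changes.append((child, MISSING, new[key]))
            else:
                changes.extend(deep_diff(old[key], new[key], digits, child))
        return changes
    same_length_lists = (
        isinstance(old, list) and isinstance(new, list) and len(old) == len(new)
    )
    if same_length_lists:
        changes = []
        for index, (a, b) in enumerate(zip(old, new)):
            changes.extend(deep_diff(a, b, digits, f"{path}[{index}]"))
        return changes
    if digits is not None and isinstance(old, float) and isinstance(new, float):
        # Treat floats as equal when they agree after rounding.
        if round(old, digits) == round(new, digits):
            return []
    return [] if old == new else [(path, old, new)]


expected = {"signal": [0.123412, 2.5], "label": "run-1"}
actual = {"signal": [0.123438, 2.5], "label": "run-2"}

strict = deep_diff(expected, actual)
tolerant = deep_diff(expected, actual, digits=4)
print(len(strict))   # 2: the label and the noisy float
print(tolerant)      # [("root['label']", 'run-1', 'run-2')]
```

In a test, asserting that the tolerant diff is empty gives a readable failure message, because the diff lists exactly which paths disagree.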

Starting point is 00:14:46 It's super simple, but, yeah, it's pretty cool. So if that sounds like problems you're trying to solve, it sounds like you are, Brian, then I think it's definitely worth having a look at. Yeah, thanks. Yeah, you bet. See, we just do this podcast to help each other out. People can listen in. Yeah.

Starting point is 00:15:01 Speaking of testing. Josh Peak is somebody that, I'm sure we met him before at a previous PyCon, but he stopped by at PyCon this last year. He was in a situation at work where he was asked to do complex tasks, and he knew that testing and good coding practices would help the entire process and make it go smoothly. So this is sort of a start to finish summary of it, but it's not that long of a read. He talks about his learning journey, in which he includes some great podcasts, including ours. Also, an awesome book on testing, and I know the author for that one. Not just plugging our own stuff, he's got some great stuff in here. He starts off with just a basic, for people new to testing, what a basic test function looks like and having good structure. But then he talks about how he wanted to, you know, do static analysis and code style. So he uses Black within his testing. And when he was talking about using Pylint, I don't use Pylint every day. So I didn't know that there was,

Starting point is 00:16:19 it's a very comprehensive check, but it takes some time for large codebases. I didn't know that. But he has a cool hack that he puts in place for check-in tests, only lint modified files. Oh, that's cool. Because of course, if they're unmodified, then why would they have a different outcome?

Starting point is 00:16:39 Right. And then he incorporates Flake8 to do docstring testing, to make sure that people are using consistent docstring styles. He covers all of his tox.ini configuration changes. He was trying to increase his code coverage, so he includes coverage.py, but then also has a cov-fail-under flag that he adds for testing, to make sure that if code coverage drops below a certain point, it fails the test. And then

Starting point is 00:17:12 just generally gradually ratchet that up. His target was 75%. So it even goes into fixtures and mocks and spies and stubs, and then even a cool tool called pytest-vcr, which records your network interactions and then replays those for future test runs. And he saw a 10x speedup in that. That's really cool.

Starting point is 00:17:35 There's so much cool stuff in here. pytest-vcr, that's really cool. I think the only problem with it is like maybe a lot of folks using it have no idea what VCR means. Oh, yeah, that's true. I mean, even, yeah. Yeah, but no, it's awesome that you just record the network interactions and don't have to depend on anything at all. I love it.
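The record-and-replay idea can be shown with a small standard-library sketch. This is not the pytest-vcr API; the helper replay_or_record, the JSON "cassette" file, and fake_fetch are all hypothetical names used only to illustrate why replayed tests run faster and need no network.

```python
import json
import os
import tempfile


def replay_or_record(cassette_path, request_key, fetch):
    """Return the saved response for request_key, calling fetch() only on a miss."""
    cassette = {}
    if os.path.exists(cassette_path):
        with open(cassette_path, encoding="utf-8") as handle:
            cassette = json.load(handle)
    if request_key not in cassette:
        cassette[request_key] = fetch()  # the slow, real call happens once
        with open(cassette_path, "w", encoding="utf-8") as handle:
            json.dump(cassette, handle)
    return cassette[request_key]


calls = []


def fake_fetch():
    """Stand-in for a real network request; counts how often it runs."""
    calls.append("network")
    return {"status": 200, "body": "hello"}


with tempfile.TemporaryDirectory() as folder:
    # One cassette file per test keeps recordings independent of test order.
    path = os.path.join(folder, "test_homepage.json")
    first = replay_or_record(path, "GET /", fake_fetch)
    second = replay_or_record(path, "GET /", fake_fetch)

print(first == second)  # True: the replay matches the recording
print(len(calls))       # 1: the second call never touched the "network"
```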

Starting point is 00:17:53 And the recordings are done based on a per test basis. So if you rerun an individual test, it only plays back the recording for that portion. It doesn't have order dependency built in, which is cool. Yeah, super cool. I love it. Yeah, that's a really nice article, Josh. Well done. The last one I want to talk about was sent over by Kevin Books. Now, we've covered a few of the language sort of language level learning things recently, we talked about the CPython byte compiler, either last time or the time before that, how it doesn't really optimize stuff. And maybe there's some opportunities there,

Starting point is 00:18:31 but more just to understand what's going on. So Kevin sent in a message, said, hey, I'm basically a C, C++ guy. And I saw the del keyword in Python and it threw me for a loop, because del seems like delete in C++, which means free memory. But it doesn't necessarily mean that in Python. So it even seems like some of the books out there are kind of being a little misleading, at least according to Kevin's reading of them. So I thought I'd just pull up an article that he sent over and then talk a little bit over some of the uses for del. Great. I don't use it. So this would be good. Yeah. So the context where I know del is I want to get something out of a list or I want to get

Starting point is 00:19:13 something out of a dictionary. Right. Okay. And it's a little bit weird. It's like the in keyword, right? A lot of times I would expect some operator to be on the object I'm modifying, right? Like list or, you know, string dot in or something, and you give it the value, right? But you say string space in space, the variable, right? So it's a little bit funky that you apply it not on the object, but as a keyword in the language, and del's like that, right? So if I have a dictionary and I want to remove a key, not set it to nothing, but make it not be in the keys collection, you can say del, dictionary, bracket, key, as if you're accessing that value, but putting the del there takes it out. Oh, okay. Yeah, and you also do that for lists. So I can go in and remove something from a list if I want.

Starting point is 00:19:59 There's a remove function on the list, but somewhat confusingly, potentially, it's by value, right? So I could say, remove Jeff from the list, and Jeff will no longer be in that list wherever he appeared. But if I want to say, remove the third thing, there's no remove at or anything like that,

Starting point is 00:20:17 right? I can't pass two, that's not a value, right? So del will let me remove that. You can also use pop for that, I believe, on the list, but del's a little more general purpose.
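A short sketch of the container cases just described, with made-up sample data. One detail worth knowing: list.remove takes out only the first matching value, so a later duplicate stays in the list.

```python
ages = {"jeff": 41, "ana": 35}
del ages["jeff"]          # removes the key itself, not just its value
print(ages)               # {'ana': 35}

names = ["ana", "jeff", "kim", "jeff"]
names.remove("jeff")      # by value: only the first "jeff" is removed
print(names)              # ['ana', 'kim', 'jeff']

del names[1]              # by position: removes "kim"
print(names)              # ['ana', 'jeff']

popped = names.pop(0)     # also by position, but hands the item back
print(popped, names)      # ana ['jeff']
```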

Starting point is 00:20:26 And you can also delete slices. So I could say, go to this list and take out everything from 2 to 5. Yeah. You know, 2 colon 5, like that. All right, so these are all pretty interesting. Now, I'm linking over to the official docs that talk about it. And this article that kind of talks through some of these examples and shows you how to use it. You can also delete a variable out of like a local or a global

Starting point is 00:20:47 namespace. So if there's a variable that's been defined and you want it to not be defined, I can say del space variable name. And now it's as if I didn't do that line that defined it, right? That created it. Does it remove it from the namespace? It removed, yeah. It doesn't free the memory necessarily, but it takes it out as a global variable. Okay. Interesting. Or a local one, right?
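Here is a small sketch of the slice and name cases, using made-up variable names. After del removes a name, using it raises NameError, just as if the assignment had never run.

```python
numbers = [0, 1, 2, 3, 4, 5, 6]
del numbers[2:5]          # drops the items at positions 2, 3, and 4
print(numbers)            # [0, 1, 5, 6]

greeting = "hello"
del greeting              # unbinds the name from its namespace

still_defined = True
try:
    greeting              # looking the name up now fails
except NameError:
    still_defined = False
print(still_defined)      # False
```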

Starting point is 00:21:08 Yeah. So does it actually free any memory, right? It depends, right? So if I have it in the global namespace, let's say it's a global, right? Obviously, the thing that is the value of that variable, it's taking up some memory. If nothing else is pointing at it, right, it's still going to be around because that global variable is pointing at it. But if you call del on that variable, you'll drop that one reference to it, putting the reference count to zero and

Starting point is 00:21:34 freeing it up. So theoretically, you could free up memory using del. Similarly, if it's in a list or a dictionary, and the only place that has a reference to it is that container itself, and you delete it out of there, it goes away, right, memory-wise. But if something else is pointing at it, then obviously it's not going to go away. Yeah. We also talked about how the CPython bytecode compiler is dumb, dumb as in not super optimizing, maybe on purpose. And I think you could also, you know, if you're really dealing with memory issues and you're like, I really wish this thing would just go away sooner in this one little edge case, you could probably use del to put in some of the optimizations

Starting point is 00:22:10 that you might hope that the compiler itself might do but doesn't. Like dereference a thing as soon as it's used within a function before you can get to the end or things like this. Yeah, okay. So is it for memory? Sort of, not really. But maybe as a side effect. Yeah.
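The reference-counting behavior can be observed with a weak reference, which watches an object without keeping it alive. This sketch assumes CPython, where an object is reclaimed as soon as its last reference goes away; other interpreters may reclaim it later. The Payload class and variable names are made up for illustration.

```python
import weakref


class Payload:
    """Stand-in for some large object."""


big = Payload()
probe = weakref.ref(big)   # observes the object without owning it
alias = big                # a second strong reference

del big                    # removes one name, but alias still points at it
alive_after_first_del = probe() is not None
print(alive_after_first_del)   # True

del alias                  # last reference gone, so CPython frees the object
freed_after_second_del = probe() is None
print(freed_after_second_del)  # True on CPython
```

The same pattern applies inside a long function: deleting a large local once it is no longer needed lets CPython release it before the function returns, provided nothing else refers to it.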

Starting point is 00:22:28 This has been a long time, but I do remember it tripping me up because I was like, it seems a lot like delete, which should have a matching new to it. Exactly. Exactly. We've both done the C++ thing, right? Like, where's the new that goes with del? I've never seen a new. Anyway, it's pretty cool.

Starting point is 00:22:44 There's a couple of links here. There's the official documentation, there's the article, Understanding Python's del, and then there's the reference to that bytecode compiler people can check out. Yeah, in C++ I don't think there's a way to remove a name from a namespace. Yeah, I don't think so either, right? Yeah, so you can like make it point at null, but that's about it, right? Yeah. But I mean, you got to think about it, right? Like classes, you could delete a field out of a class, right? Because it's just a dictionary, right? So much of Python is built on like dictionaries, right? Like the variables, their variable names are the keys in the dictionary and their values

Starting point is 00:23:17 are their value. So you just take it out of the global dictionary effectively, right? Yeah. Okay, cool. Pretty sweet. So those are our main items for today. You got anything else you want to chat about, Brian? I'm just, I'm glad it's summer. It's starting to feel nice, feels like summer, but other than that, not much. How about you? Summer's

Starting point is 00:23:34 awesome. It makes programming hard because programming is indoors, although some of my friends and I who work from home, we try to get out and program in like a coffee shop or a cafe by a lake or something. And periodically, we have the weird experience of getting a sunburn while writing code. And yeah, we've dubbed it a code burn. And it's kind of a badge of honor. That's funny. Cool. Yeah. So there's actually a couple of things I want to throw out here. We recently had Max Sklar from the Local Maximum podcast. And afterwards, he had me on to his podcast. So

Starting point is 00:24:05 I'll be on episode 73, which should be out. Not yet, but thanks to time shifting, when this episode comes out, it should already be out. I'll put a link to that. Josh Thurston sent over a cool video of the popularity of languages on Stack Overflow over time as a bar chart race. I didn't know about bar chart races, but these are basically animated bar charts over time. And you just watch the bars grow and shrink. And it's really cool. Python is kind of like a little tiny consideration at the bottom. And obviously, we know that Python is crushing it on popularity and Stack Overflow and all those things. So it's like a minute and a half video. I think everyone will appreciate watching it if they just got a minute to kill. No, it's a fun video.

Starting point is 00:24:47 And one of the things I enjoy about it is early on, you see the Java bar going up and down based on the time of the year because it was used in education a lot. That totally made sense. Exactly. You're like, oh, there's a huge spike in September. I wonder why.

Starting point is 00:25:04 Maybe a bunch of people got a job. No, like CS101 is now back in session. Yeah. Exactly. Then the last one I want to throw out is this thing called Pynsource. So this comes to us from Anders Klint. It's basically a UML diagram creation tool for Python code. So you give it some Python files. It will generate a UML diagram that shows the relationship of all the classes in there. Oh, that's cool.

Starting point is 00:25:31 Yeah, it's pretty cool. There's a free, maybe even open source version. And then there's also a paid version. So you can buy it. I'm actually not a huge fan of UML. But if you have Python code and you think a UML diagram would help describing it, this thing's pretty cool actually. And it's a little GUI app.

Starting point is 00:25:50 There's a bunch of screenshots. You can check it out and see if it'll help you, but it looks pretty neat. And it does proper UML, not just like sort of visualization of classes. So that's kind of nice. My favorite use of these kinds of diagrams is to print them out and pin them to your wall, your cubicle wall, so that other programmers think that you're smarter than they are. Absolutely. Put some little cryptic notes on them, like as if, you know, you're marking them up.

Starting point is 00:26:16 Yeah, absolutely. Love it. Yeah, so you can do this with your project. Yes, this huge thing is our project. Anyway, it's pretty cool. And there's a free version, like I said. So maybe it'll help some folks out there. All right. You ready for some jokes, Brian?

Starting point is 00:26:27 Yes, definitely. All right. You've heard about the glass being half full and half empty and like, oh, I'm a half empty sort of person. I kind of see the world as slightly negative. Yes. So here's the developer version. So we have an optimist who says the glass is half full. We have the pessimist who says the glass is half empty. And we have the programmer who says the glass is twice as large as necessary. Yes, definitely. So I wanted to extend that with the pragmatist that says that I'm just allowing enough room for requirements oversight, scope creep, and schedule overrun.

Starting point is 00:26:59 That's right. It's perfect. I love it. And then you have this other one about software startups. Yeah, man. It's not really any startup, but I watched The Upside with Kevin Hart last night, and it was a joke that I couldn't help but share. I can't remember the characters, but Kevin's character said, would you invest in my business idea? And the other guy says, that seems too niche. Kevin: What's niche mean? Oh, it's the girl version of nephew. It's terrible. I love it. That's bad.

Starting point is 00:27:33 If you got to ask, that's a pretty good answer. Yeah. Cool. Well, thanks for putting all the cool topics together, as always, and being here. Thank you. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S.

Starting point is 00:27:48 And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
