Python Bytes - #211 Will a black hole devour this episode?

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 211, recorded December 2nd, 2020. I'm Michael Kennedy. And I'm Brian Okken. And we have a special guest. Matthew Feickert, welcome. Yeah, thanks so much for having me on. Yeah, it's great to have you here. You've been over on Talk Python before, right? Yeah. Talking about some cool high-energy physics and all that kind of stuff.

Starting point is 00:00:24 Yeah, I looked that up last night just to try and remind myself. That was episode 144. I was on with my colleagues, Michela Paganini and Michael Kagan, to talk with you about machine learning applications at the LHC. Yeah, and you do stuff over with CERN at the Large Hadron Collider and things like that. Yeah, yeah. So I'm a postdoctoral researcher at the University of Illinois at Urbana-Champaign.

Starting point is 00:00:52 And so there I split my time between working on the ATLAS experiment and working as a software researcher at the Institute for Research and Innovation in Software for High Energy Physics, IRIS-HEP. And so ATLAS is this huge, five-story-tall particle detector that lives a hundred meters underground at CERN's Large Hadron Collider. That's just outside beautiful Geneva, Switzerland. And so there I work with a few thousand of my closest colleagues and friends to try and look for evidence of new physics and make precision measurements of physics we do know about. And then my IRIS-HEP work is kind of focused on working in an interdisciplinary and inter-experimental team to try and improve the necessary cyberinfrastructure and software for us to be able to use in upcoming

Starting point is 00:01:34 runs of the Large Hadron Collider and in what we call a high-luminosity run, which is going to be way more collisions than normal. Have you guys ever turned it up to full power yet? No. So the design luminosity, or the design energy, of the LHC is at something called 14 tera-electron volts, 14 TeV, and we've been intentionally running at a lower operating energy for the last couple of years, just a little bit below that. But in the late 2020s, we're gonna suck the entire Earth into it and that kind of stuff? You know, no experimental evidence of black hole creation yet. But kind of the cool thing is that if we even did make a black hole at the LHC, due to something called Hawking radiation, it would evaporate well

Starting point is 00:02:16 before it could actually ever do anything interesting gravitationally. But yeah, it's really exciting. Really, I'm joking, but it's such a cool place, such cool technology. I mean, that's right out at the edge of physics these days. And the technology side is neat too. Yeah, no, it's super fun. Well, welcome over to Python Bytes. Yeah, it's great to be here. Yeah, it's great to have you.

Starting point is 00:02:37 Thanks for coming. And Brian, I think let's start with another one of my favorite topics. Farms? I love farming. You see the bumper sticker: no farms, no food. I like food a lot, so I love farms. No, no, but the FARM stack. We've heard of the LAMP stack and other stacks. Like, a lamp is not as useful as a farm, right? Farm sounds more useful. So tell us about FARM. So, Aaron Bassett. I'm not sure, I think he's one of the spokespeople for Mongo or something, an advocate or something like that.

Starting point is 00:03:09 Anyway, he wrote this article, and I think there have been some talks given too, but this is a nice article. It's called Introducing the FARM Stack, which is FastAPI, React, and MongoDB. So I really actually appreciated the article and the code with it, because there's a little to-do CRUD app on GitHub that they've put together. And the article describes basically all of the pieces of the application using a little to-do app. But with FastAPI, you've got this interactive documentation mode where you can interact with the application almost immediately. You don't have to really do much

Starting point is 00:03:58 to put it all together. And then for all your endpoints, you can actually interact with them, send data, do queries, and there's a little animated GIF to show how that's done. But the article then goes through and says basically how the endpoints and routes get hooked up, and then uses Uvicorn to set up an async event loop and get that going, and shows how easy it is to connect to a database. And then defining models and how easy it is to set up a schema. And then it kind of talks through the code. You do have to write code for the endpoints, and it shows really how easy those are with all of these pieces. The React application is kind of a minimal React application.

Starting point is 00:04:49 I'm not sure why they included that, but it's kind of a neat addition. There's a React application running that just sort of shows some of the interaction with the CRUD app, and it gets updated while you're changing things through the interactive API. And I liked the demonstration of working with an API, changing things, and seeing it show up, having a React app at the other end. It's kind of a fun way to experiment with an API. This is a really neat thing.

Starting point is 00:05:26 And one of the other major stacks that's been used around Mongo is the MEAN stack. And the FARM stack is way nicer than the MEAN stack, not just because it uses Python and not JavaScript, but there are some interesting things here. One of the examples is actually kind of blowing my mind, in that it's an if statement using the walrus operator awaiting an asynchronous call in an API method.

Starting point is 00:05:48 Like the walrus operator and async, the await keyword, I've never seen those together. And it's kind of like, it's inspiring. It's nice. It's good. Yeah. It's such succinct code as well. It's super nice. I mean, it uses FastAPI, which is fantastic. It's using Motor, which is MongoDB's
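To make the walrus-plus-await pattern from that example concrete, here is a minimal sketch using only the standard library. It is not the article's code: `find_one`, `show_task`, and the in-memory `_TASKS` dictionary are hypothetical stand-ins for a Motor query against MongoDB and a FastAPI route handler.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory stand-in for a MongoDB collection.
_TASKS = {"1": {"_id": "1", "name": "write show notes", "completed": False}}


async def find_one(task_id: str) -> Optional[dict]:
    """Pretend to query the database asynchronously."""
    await asyncio.sleep(0)  # yield to the event loop, as a real driver call would
    return _TASKS.get(task_id)


async def show_task(task_id: str) -> dict:
    # Await the lookup, bind the result to `task`, and test it in one expression.
    if (task := await find_one(task_id)) is not None:
        return task
    return {"error": f"Task {task_id} not found"}


if __name__ == "__main__":
    print(asyncio.run(show_task("1")))
    print(asyncio.run(show_task("42")))
```

The parentheses around the assignment expression are required; without them the walrus would bind the result of the whole `is not None` comparison rather than the awaited document. This needs Python 3.8 or later.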

Starting point is 00:06:06 officially supported Python async library, because you need an async-capable library in order to do async things against MongoDB. You know, this actually comes from the developer blog at MongoDB. There also are some ORM-like things, some ODMs, object document mappers, that also support async and await for MongoDB. So if you're more into the ORM style, you might check that out. But other than that, this looks pretty neat to me. Yeah. And I do know that a lot of people use the ORMs, but I appreciated the example without an ORM, because you throw an ORM example in there and then people that don't use that particular one get lost. So, yeah. Matthew, do

Starting point is 00:06:45 you guys do anything with MongoDB, any of these kinds of things, FastAPI? Yeah, I have some friends that do. Personally, I'm not too versed in Mongo, but I've heard about it on the show and many, many times elsewhere. Just paging through the article as Brian was talking about it, it is pretty impressive. So it's really concise. Like, here are your four lines to completely implement the API, that type of thing, right? Asynchronous, fast, all the cool stuff. Yeah. There was a case study of MongoDB being used at the Large Hadron Collider, but that was many years ago, and I don't know if it still is. I've completely forgotten where that is. Yeah, cool, cool. So the next thing I want to talk about is another programming

Starting point is 00:07:31 language. Last time, Brian, or maybe two times ago, I went on and on about .NET and C#, because Anthony Shaw had done that work on Pyjion to get Python to run on .NET. And we were like, well, why are we talking about C# on this podcast? Well, I want to talk about something even more advanced: AppleScript. Wow, cutting edge. Yes, it's like the CMD shell script of Apple. Have you ever programmed in AppleScript? It's painful. No, I've not. It's like you say, tell this application to, like, make a command. Oh, it's bad news bears. Let me tell you.

Starting point is 00:08:09 So, what I've come across is this thing called py-applescript. Now, this is not brand new, but it's brand new to me. And there's a lot of talk about Macs and people maybe getting new Macs. So, I thought I would say, hey, look, here's a cool way to automate your Mac, or Macs within your company or whatever, with Python instead of this dreaded NSAppleScript. So basically it's a Python wrapper around NSAppleScript, allowing Python scripts or applications to communicate with AppleScript and AppleScriptable applications. So apps which basically implement AppleScript and let you do that.

Starting point is 00:08:52 So scripts get compiled either from source or they can be loaded from disk. Some of these ideas are from AppleScript: the standard run handler and user-defined handlers can be invoked with or without arguments. The arguments and responses to and from AppleScript are automatically converted, either from AppleScript to Python types, like a Python string versus an AppleScript one, or vice versa, right? So you don't have to do the type coercion, which is cool. And they're persistent. So you can call your handlers multiple times and the script retains its state like AppleScript would. And it also has no dependency on the legacy appscript library

Starting point is 00:09:25 or the so-called flawed Scripting Bridge framework, or the limited osascript executable. So that's pretty cool. If you want to automate things on your Mac, you obviously could use Bash. But if you're talking to some kind of application that implements one of these scripting interfaces, like for example, you want to tell this other application to grab something out of the clipboard and then tell it to do something, right? You couldn't reasonably do that with Bash. Once it starts up, you kind of want to go back and forth with it. So it sounds like AppleScript might be the thing to do.

Starting point is 00:09:58 Pretty cool, huh? Yeah. Yeah. I mean, not a lot to it. Like, if you've got to script your Apple macOS stuff, do it with Python. You don't have to do it with that AppleScript stuff. No, it's neat. Yeah.

Starting point is 00:10:11 Yeah. So, Matthew, you probably brought something to do with physics, data science, I'm guessing. What's your first one here? Yeah, a bit. So we currently live in this really nice age of having awesome CI services and all these really nice metrics for all your GitHub projects and everything. I'm thinking of, like, coverage. So if you're using pytest and making sure that you're reporting your coverage, you have all these really great services to also track your coverage and report that in

Starting point is 00:10:39 a shiny badge. But let's say you're developing some tool or some library and you have some sort of performance metric that you care about. Let's say, like, the speed of evaluation for certain expensive functions. And you actually want to try and track that through the entire history of your code base. And that's not something that's traditionally very easy to do. So recently, I was really happy to find... So if you're making changes, if you're going to be adding some feature or you're refactoring it so it's easier to write, but you're not sure if that makes it faster or slower, this would sort of give you that information

Starting point is 00:11:13 from week to week or something like that. Exactly. Yeah. So you might go ahead and say, okay, well, I have some tests that make sure that this function evaluates in under some period of time, if it's an expensive function for your test. But let's say you actually want to track, across different parameterizations, how that function is actually performing across your whole code base.

Starting point is 00:11:39 So I've recently found this super cool tool written in Python called airspeed velocity. And so from the docs: asv, airspeed velocity, is a tool for benchmarking Python packages over their lifetime. So it deals with runtime, memory consumption, and even custom-computed values. And the results are then displayed in a super nice web front end that's interactive and basically just requires static web page hosting. So it's pretty impressive. And if you click on the docs, you can see that it's developed by a community of people, but led by Michael Droettboom (I'm probably getting your name wrong, very sorry) and Pauli Virtanen. But if you look, he's the guy

Starting point is 00:12:26 who was behind Pyodide at Mozilla. Oh, really? Oh, okay. Yeah, that's a super cool project. So, yeah, for sure. Yeah. And so, I mean, if you look at the other people that are on the contributor list, you can spot a lot of names that are common in the SciPy and Jupyter ecosystem. So you already know that this is a nice community-built tool. And then also, as some example cases, they give current projects that are using it, like NumPy and SciPy and Astropy. So pretty well-established projects. And just as kind of an example,

Starting point is 00:13:01 if you click on the SciPy project and go to the interpolate function there, you can look at a very nice visualization of the evaluation time on the vertical axis across a whole bunch of parameterizations, such as CPython version and number of samples that are being run. And you can see this for the entire lifetime of the code base, and you can zoom in on any section just with the mouse. And something I think is super, super cool is that if you're looking at the plot and you see that, oh, there's one commit where all of a sudden things go funky and the evaluation time just jumps up, you can just click on that node and it immediately opens up that commit in GitHub. I think it's super awesome that you don't have to go and search through your commit history to figure out where that corresponds to.
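As a rough illustration of what asv tracks, a benchmark suite is an ordinary Python class: methods whose names start with `time_` are timed, and a `params` list makes asv run each benchmark once per value, which is where the per-parameter curves in the dashboard come from. The sketch below follows those documented conventions, but the class name, the `n_samples` parameter, and the bisect workload are invented for illustration and are not taken from SciPy's suite.

```python
import bisect
import random


class TimeSortedLookup:
    """Hypothetical asv-style benchmark suite (names are illustrative)."""

    # asv runs every benchmark in the class once per value listed here.
    params = [100, 10_000]
    param_names = ["n_samples"]

    def setup(self, n_samples):
        # Runs before each timed call and is not included in the timing.
        rng = random.Random(0)
        self.data = sorted(rng.random() for _ in range(n_samples))
        self.queries = [rng.random() for _ in range(100)]

    def time_bisect_lookup(self, n_samples):
        # Methods prefixed with "time_" are the ones that get timed.
        for query in self.queries:
            bisect.bisect_left(self.data, query)


if __name__ == "__main__":
    # Manual smoke run; under asv the tool discovers and drives the class itself.
    suite = TimeSortedLookup()
    for n in TimeSortedLookup.params:
        suite.setup(n)
        suite.time_bisect_lookup(n)
```

In a real project a file like this would typically live in a `benchmarks/` directory next to an `asv.conf.json`, and commands such as `asv run` and `asv publish` would collect results per commit and build the static dashboard.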

Starting point is 00:13:48 It's just boom, right? I'm looking at it, and it shows the SHA from GitHub. Yeah, the unique identifier of the commit. That's crazy. Yeah. So, on a project that I'm working on, we've been interested in trying to have this sort of metric tracking for some of our work. So this is something where I'm actively looking at how we might be able to deploy it for one of my projects with my co-authors.

Starting point is 00:14:13 But it's openly developed on GitHub. It's up on PyPI as well, so just pip install asv. And then I think something that's very cute and very Pythonic is that when you go to the reporting dashboard for the different libraries that you're actually benchmarking, it will say, up at the top, the airspeed velocity of an unladen X. So the airspeed velocity of an unladen NumPy or an unladen SciPy. So, you know, keeping very true to Python's roots there. There's some Monty Python, the show, zen in there for sure. Exactly. Yeah, this is impressive. I mean, Brian, how do you see this fitting into testing and stuff? I actually love this. I could use this

Starting point is 00:14:57 right away. Performance is always something we care about, and benchmarking systems. And, you know, with testing, it's something you forget about sometimes. You're running stuff and it still works, but over time things slow down, and it's good to know that. Yeah, and if this could just be automatic and just part of your CI, you just go back and see the updates. That'd be very cool. Definitely.

Starting point is 00:15:28 Yeah. I don't think that this is something that at the moment, and I'm happy to be corrected about this, I don't think at the moment there is some way that this is currently being given as like a CI service. But I think that this is something that you could like set up and run for yourself pretty easily. Yeah, you could probably plug it in. Yeah. Yeah, exactly. But you could probably do some kind of web hook when there's a check-in, automatically kick it off and then save a result, right?

Starting point is 00:15:54 You could just hook into GitHub Actions and then have it just call you back and start your, you know, let's take a record of this or whatever. Yeah. Yeah, very cool. This is a great idea. Yeah. Something else that I haven't really investigated yet, but that I'm looking into, is if this can also be used to do GPU benchmarking. So let's say you have a library

Starting point is 00:16:13 where you can use the APIs to transparently move from CPU to GPU, like you have something like JAX or TensorFlow or PyTorch. Then this might be kind of a nice way, if it's based on those, to be able to benchmark your GPU performance as well. Yeah. Well, and that's one of the things you might not test, right? If it could run either way, you might just run it on your machine, whichever one of those it is, and forget to try the other style, right?

Starting point is 00:16:40 Exactly. Yeah. And I don't think there are too many CI services that are going to generously give you some really nice GPUs to be doing benchmarking on. Yeah, that's for sure. For sure. All right, now before the next item, let me tell you about our sponsor. This episode is brought to you by Techmeme, the Techmeme Ride Home podcast. For two years, they've been recording episodes every single day. So they're Silicon Valley's favorite tech news podcast. And you can get them daily, 15 to 20 minutes, exactly by 5 p.m. Eastern, all the tech news you want.

Starting point is 00:17:15 But it's not just headlines, much like Python Bytes, actually. It's a very similar show, but for the broader tech industry. You could have a robot read the headlines or just flip through them. But it has the context and the analysis all around it. So it's like tech news as a service, if you will. So the folks over at Techmeme, they're online all day reading to catch you up. Just search your podcast app for Ride Home and subscribe to the Techmeme Ride Home podcast. Or just visit pythonbytes.fm/ride to subscribe. I have a theory, a hypothesis,

Starting point is 00:17:46 about this. I think it would probably actually be a ton of work to put together a show daily on a timeline like that, but it's great that they're doing it. Do you have any other hypotheses, Brian? Yes. My hypothesis is that there are not enough examples out in the world of how people are using Hypothesis in the field, in real-world applications. So I'm excited that Parsec put one together. So Parsec... Well, let's take a real quick step back just for people who don't know.

Starting point is 00:18:16 What is Hypothesis? Oh, okay, right. Hypothesis is a testing framework. Well, it's not really; it attaches to other testing frameworks, so you can use it with unittest or pytest. You probably should use it with pytest. But it's a way, instead of writing a declarative single test or test case... It's property-based testing.

Starting point is 00:18:37 So you describe, kind of... It's not like, I expect one plus two equals three. It's, I expect that if I add two integers and they're both positive, the result is going to be greater than both of them. You know, you have these properties that describe what the answer is. And the examples that Hypothesis and other tutorials on how to use Hypothesis have given are more of these a-plus-b sorts of things. They're simplistic things. And I do see a lot of value in Hypothesis, and I know a lot of people are using it, but there haven't been a lot of good descriptions of a real-world example of how it's being used. Because I'm probably not going to,

Starting point is 00:19:26 I don't have those little tiny algorithm things. I've got big chunks of stuff, and Hypothesis does have to run the test many times. So how do you do this effectively on a large project? So I love seeing this article. So Parsec is a client-side encrypted file sharing service. I'd never heard of them before this blog, but it sounds cool.

Starting point is 00:19:49 It's cool. They describe themselves as the zero-trust file sharing service like Dropbox, where it's end-to-end encryption for Dropbox. Yeah. You could share the files, but it only matters if you actually have the key, right? Right. Actually, I have no idea. Sure. I suspect so, yeah. It sounds like a cool service, actually.

Starting point is 00:20:07 It sounds pretty neat. So they describe what they're doing there, and some of the problems. It's a large, four-year-old asynchronous Python project. And then they describe this RAID redundancy algorithm that they need. It's fairly complex, with a bunch of servers and a bunch of data stores going on. And what they need to test is things like whether the blocks can be split into chunks, and whether the blocks can be rebuilt from the chunks that were split up before. And then whether you can rebuild them if you've got missing chunks.
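To show what such round-trip properties look like in code, here is a small standard-library sketch. It is neither Parsec's algorithm nor Hypothesis itself: the XOR-parity scheme, the function names, and the hand-rolled random loop are all assumptions chosen so the example runs without dependencies. With Hypothesis, the random loop would be replaced by a `@given` decorator with strategies for the block and chunk count, and failing inputs would be shrunk automatically.

```python
import random
from typing import List, Optional


def split_with_parity(block: bytes, n_chunks: int) -> List[bytes]:
    """Split block into n_chunks equal-size chunks plus one XOR parity chunk."""
    size = -(-len(block) // n_chunks) or 1  # ceiling division, at least 1 byte
    padded = block.ljust(size * n_chunks, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_chunks)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]


def rebuild(chunks: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original block; at most one chunk may be None (missing)."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("can only recover from one missing chunk")
    if missing:
        size = len(next(c for c in chunks if c is not None))
        recovered = bytearray(size)
        # XOR of all surviving chunks (data and parity) equals the lost chunk.
        for chunk in chunks:
            if chunk is not None:
                for i, byte in enumerate(chunk):
                    recovered[i] ^= byte
        chunks = list(chunks)
        chunks[missing[0]] = bytes(recovered)
    # Drop the parity chunk and the zero padding.
    return b"".join(chunks[:-1])[:length]


def check_round_trip(trials: int = 200, seed: int = 0) -> None:
    """Hand-rolled property check over random block sizes and chunk counts."""
    rng = random.Random(seed)
    for _ in range(trials):
        block = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 64)))
        n_chunks = rng.randrange(1, 8)
        chunks = split_with_parity(block, n_chunks)
        # Property 1: rebuilding from all chunks returns the input.
        assert rebuild(chunks, len(block)) == block
        # Property 2: it still does after losing any single chunk.
        lost = rng.randrange(len(chunks))
        damaged = [None if i == lost else c for i, c in enumerate(chunks)]
        assert rebuild(damaged, len(block)) == block


if __name__ == "__main__":
    check_round_trip()
```

The point of the sketch is the shape of the assertions: nothing checks a specific expected value, only that decode(encode(x)) gives back x for whatever chunk size, chunk count, and missing chunk the generator picks.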

Starting point is 00:20:46 And so this all sounds fairly, you know, yeah, I can understand how you could try to test that, but there are a lot of variables in there. How big is the chunk size? How many chunks? How much stuff should be missing? And all that sort of stuff. And then they're thinking, yeah, Hypothesis would be good for that. The normal tutorials talk about a stateless way to test with Hypothesis, but they're saying that for them, the stateful method that is supported is very useful because they're an asynchronous system, and they describe how to do that. It's actually a fairly complex description, and it's kind of a lot to get through,

Starting point is 00:21:28 but it's neat that the power's there. So it walks through exactly how they set up a test like this. And this is something I think the testing community considering Hypothesis has been missing. So this is great.

Starting point is 00:21:45 They end with some recommendations, which

Starting point is 00:21:51 is great. So the recommendation is about which parts of your system you should throw Hypothesis at. That's a really good question, because you don't want to throw it at everything, right?

Starting point is 00:22:01 Because there is some expense to set it up and also to run everything. So they describe it like this: if the piece you're testing is kind of an encoder/decoder thing, like theirs is, where you're splitting things into chunks and then rebuilding things, Hypothesis is a no-brainer for that, because you can compare: is my input the same as the encoded-then-decoded output? The other case is if you have a simple oracle, like it's simple to test the answer, but it's complex to come up with the answer.

Starting point is 00:22:36 I'm not sure exactly what that is, but, you know, some of the cases are: I've got a complex system, and there are properties about the output that are easy to describe. The other one is, yeah, I guess it's similar: if it's hard to compute but easy to check. Well, one example that just jumps out at me right away is anytime you have a file format. I'm going to save this thing, be able to save and load these files, right? Because all you've got to do is load up a whole bunch of random data and say: save, load, is it the same? If it's not, that's a problem. Yeah, yeah. And actually,

Starting point is 00:23:10 I have talked with some people that have thrown this at some of the standard library modules, just on the side, to test, because there's a lot of standard library stuff that's kind of an encoding/decoding sort of thing, or two-way conversions. Yeah, cool. Yeah, this is super nice. I'm going to have to really dig into this article in more detail. I remember the first time I learned about Hypothesis was when one of the core devs gave a talk at SciPy 2019,

Starting point is 00:23:42 and it just blew my mind then. And so this is so cool to see this very, very interesting application here. Yeah. Yeah. It seems like there are a lot of uses in data science. Data science seems tough to test, like that scientific computation side, because with slight variations you might not get perfect equality, but it's close enough, right? It's like, well, it's off, but it's, you know, 10 to the negative 10th or something off, right? That doesn't actually matter, but the equality fails. Yeah, you end up using NumPy's approximate comparison schemes quite a bit in

Starting point is 00:24:18 your, yeah, in your pytest. I can imagine. Very cool. All right. Next one. Brian, I told you about this last time: I'm still waiting on my Mac mini, right? I ordered the Apple M1 Mac mini, maxed out, and I'm a little bit jealous. My daughter is getting a new Mac mini, or rather a MacBook Air. She doesn't know about it, but it's supposed to show up tomorrow, and mine's still weeks away. And I don't think that that's very fair. But if you are an organization

Starting point is 00:24:50 that depends on cloud computing, and what organizations don't these days, right? They almost all do. It was just announced at re:Invent that AWS is going to be offering Mac instances as a type of VM. So until now, you've been able to get Windows and Linux. That's it. So for all those people out there who are offering some kind of iOS app, even if they're not like a

Starting point is 00:25:11 Mac shop, they still have to have Macs around, because you can't compile and sign your IPA, or whatever the iPhone app format is. You can't create those without a Mac. So there are all these Macs that are around for, you know, CI/CD or checking those things and whatnot. So now you can go to AWS and say, I'll take a Mac mini, please. That's pretty cool. That's cool. Yeah. So you can do your tests up there. They don't have M1 yet. Those are the Intel ones, but the M1 chips are coming later, so you'll be able to do it. What's interesting about this offering from AWS is that, as with basically any cloud service, you would imagine it's a VM, right? But when you say I want one of these,

Starting point is 00:25:50 you actually get a dedicated Mac mini. That's, you get pure hardware. Well, that's why you can't get yours because Amazon bought them all. They did. They had a huge truck full of them. Well, they bought the Intel one. So those were on sale, I bet anyway.

Starting point is 00:26:07 But no, they have some interesting, what do they call it, Nitro? I think they call it their Nitro service or something like that, which allows them to virtualize actual real hardware. So this is pretty neat. You can sign up. The billing is interesting. You have to pay for at least one day's worth if you get it, which I think is like 24. If you're going to run it continuously all the time, this is one pricey sucker. Like, the Mac mini you can get now is $700. This is $770 a month. Oh, okay. So if what you need is a couple of Mac minis, and you need them on all the time, you're probably better off just buying a few and sticking them in a closet,

Starting point is 00:26:46 especially the M1s. But if you just need one on demand every now and then, or you need to burst into them or something like that, that could be interesting. Yeah. Yeah.

Starting point is 00:26:53 If you're back old school and you only release like once every three months. Well, there was some conversations like, well, if your data is already stored in S3 and you have like huge quantity of data and what you need to run is actually running like some video processing on the Mac, you

Starting point is 00:27:09 could do it right by the data instead of transferring it, that kind of stuff. Things like that might be interesting. I don't know. I would go ahead and throw out there also that this is all interesting. I have links to this kind of stuff and whatnot, like the blog post announcing it and so on. But there's also this thing called MacStadium. And if you look at MacStadium, it's pretty interesting. You go over there and say,

Starting point is 00:27:29 give me a dedicated bare metal Mac mini in their data center, $60 a month. So you can actually get a decent one for a decent price over there. So if you just want one running all the time, it might be good. But the thing is, if you're already deeply integrated with AWS, maybe this is a good thing.
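The buy-versus-rent reasoning above is simple arithmetic, sketched here with only the figures quoted in the episode ($700 to buy a Mac mini, about $770 a month for a continuously running AWS Mac instance, about $60 a month at MacStadium). These are the hosts' approximate late-2020 numbers, not current pricing, and the comparison deliberately ignores power, hosting, and maintenance costs for an owned machine.

```python
# Figures as quoted in the episode (approximate, late 2020); not current pricing.
MAC_MINI_PURCHASE = 700      # one-time cost, USD
AWS_MAC_PER_MONTH = 770      # running continuously, USD per month
MACSTADIUM_PER_MONTH = 60    # dedicated bare metal Mac mini, USD per month


def total_cost(months: int) -> dict:
    """Cumulative cost of each option after the given number of months."""
    return {
        "buy": MAC_MINI_PURCHASE,
        "aws": AWS_MAC_PER_MONTH * months,
        "macstadium": MACSTADIUM_PER_MONTH * months,
    }


def months_until_buying_wins(monthly: int, purchase: int = MAC_MINI_PURCHASE) -> int:
    """First whole month at which cumulative rental exceeds the purchase price."""
    return purchase // monthly + 1


if __name__ == "__main__":
    print(total_cost(12))
    print("AWS:", months_until_buying_wins(AWS_MAC_PER_MONTH), "month(s)")
    print("MacStadium:", months_until_buying_wins(MACSTADIUM_PER_MONTH), "month(s)")
```

On these numbers, an always-on AWS instance costs more than the hardware within the first month, which matches the point that the AWS offering makes more sense for on-demand or burst use than for a machine that runs all the time.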

Starting point is 00:27:48 Yeah. Yeah. Matt, is there anything you... Yeah, go ahead. I was just going to say, this seems pretty interesting. I mean, I know one of the reasons that I love using GitHub Actions

Starting point is 00:27:56 and Azure Pipelines is the ability to be able to get access to Mac VMs for builds. But if you... I could also see this being really interesting and useful if you have some very huge application or some very large stack that you want to be able to do CI or tests on, that this could be really, really nice,

Starting point is 00:28:18 especially if you don't just want to be pounding and destroying one Mac over and over and over again. This is nice, especially if you have a distributed team. Yeah. Which, every team is basically a distributed team. Yeah. Yeah. Welcome to 2020. One thing that's interesting about this is you can literally press a button, or even just go through the AWS API, probably the Boto API, and you can just make a new Mac instantly. Like, within seconds, you can have a clean, pre-configured Mac.

Starting point is 00:28:46 You can create AMIs, the Amazon machine image, which are like, I install a bunch of stuff and get it set up and then like save it so I can respawn new machines from it. Those are pretty interesting options

Starting point is 00:28:58 that you don't get from just having a Mac mini in the closet. You know, push a button, make a brand new one, try this, throw it away, make it a different way, throw it away. Like, there are some use cases here that could be interesting. That said, I won't be using it. I'm just going to buy a Mac mini, if I can ever get it. All right, Matthew, what's this last one

Starting point is 00:29:13 you got for us? Yeah, I don't have any clever transition, but all right. So maybe, I don't know about you, but I end up having to deal with a lot of JSON serializations of different statistical models and sometimes also getting CSVs of different data sets that I want to be doing analysis on. And your first instinct might just be to say, okay, I'm just going to open this up in Pandas and start to get to work on it. But if you kind of are used and comfortable to working in the Linux command line, kind of ecosystem of data tools, you might be itching a little bit

Starting point is 00:29:50 and wanna kind of just peek inside at the command line level and kind of get to work there. And so in that case, you might be really interested in this tool called VisiData. So VisiData is written on- This is blowing my mind actually. Yeah, it's like when I saw this, my jaw was kind of on the floor. So we'll make sure this is linked in the show notes because it has some really cool videos.

Starting point is 00:30:12 But so from the docs, VisiData is described as data science without the drudgery. So it's an interactive multi-tool for tabular data. It combines clarity of spreadsheets with efficiencies of being at the terminal and also, you know, the power of Python 3 in a really lightweight utility that can handle millions of rows with ease. I can attest to that personally. I've opened up like four gigabyte CSV files before and it just, you know, drops right in and starts asynchronously loading like a champ. In addition to that, it supports kind of a really astounding number of file formats. Currently on the website, it says it supports 42 different file formats.
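As a rough illustration of peeking into a large CSV without reading all of it, here is a small standard-library sketch. It is not how VisiData loads files internally; it only shows the idea of looking at the header and the first few rows while leaving the rest unread. The function name `peek_rows` and the sample data are invented for the example.

```python
import csv
import io
from itertools import islice


def peek_rows(lines, n=5):
    """Return the header and the first n data rows, reading no further."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(islice(reader, n))


# Hypothetical stand-in for a large file. With a real file, pass
# open(path, newline="") instead of the StringIO object.
sample = io.StringIO(
    "run,event,energy_gev\n"
    "1,100,45.2\n"
    "1,101,91.1\n"
    "2,7,125.3\n"
)
header, rows = peek_rows(sample, n=2)
print(header)
print(rows)
```

Because `csv.reader` pulls lines lazily from the file object, only the first few lines are consumed, which is why this stays cheap even on multi-gigabyte inputs.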

Starting point is 00:30:52 So it supports things that you would expect like CSV and JSON. But then it also supports things like Jira. I guess like whatever Jira uses for their sort of like tabular stuff. It also can like read MySQL. And I guess it can also even deal with PNG, the image file format, which I was,

Starting point is 00:31:12 you know, impressed by. So this is all openly developed. The output is a terminal, right? Yeah. Like text. Yeah.

Starting point is 00:31:21 Yeah. So this is all openly developed on GitHub by a guy named Saul Pwanson, I think. And if you go to the VisiData website, it also has plenty of links to live demos of him doing kind of interactive examples of visualizations. There's one lightning talk that he's given at, I think, PyCascades 2018 or something like that, where he's able to basically find complaints about rodents and then filter on rat complaints and then plot that inside of VisiData, still on the terminal, to basically make a visualization of like rodent distribution in the New York City boroughs. So I thought that was, you know, quite amusing and really cool. It's also, you know, this is a

Starting point is 00:32:23 Python application. So you might not want to, you know, continuously install this in every single virtual environment you make. So, I mean, it is up on PyPI, so you can just do pip install visidata. But since it's an application, you probably also want it just kind of as a generic tool on your machine. So it's distributed through a lot of nice common package managers. So if you're on Linux, they've got it on Apt, as well as things like Nix and Guix. But I didn't see it on Yum. So if you're on Fedora or CentOS, you might be a little bit out of luck.

Starting point is 00:32:57 You might have to do it manually. It's, of course, on Homebrew and even conda-forge. And it's not listed there, but a very, very cool tool that's been featured on the show before, which is pipx by Chad Smith. Yeah, pipx is awesome. It's so good. I love it. I tested this last night.

Starting point is 00:33:14 I just fired up a Python 3.8 Docker container and went ahead and installed pipx and then used pipx to install VisiData and was able to drop right into VisiData as expected. So it's very, very cool, and just the power that you can have with it, I think, is worth checking out for anybody who is doing data analysis with tabular data. This is super cool. I love when people build these tools that are kind of, you don't really expect them to be so powerful. And you talked about how you just dropped in and grabbed some random data and started answering questions. And that's super neat.
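The rodent-complaint exploration described above boils down to a filter followed by a frequency count and a quick plot. A plain-Python sketch of that workflow, using only the standard library, might look like the following. The column names, the sample rows, and the helper names `count_by_borough` and `text_histogram` are all invented for illustration; VisiData does this interactively rather than through code like this.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names and rows are invented.
COMPLAINTS_CSV = """complaint,borough
Rat Sighting,BROOKLYN
Noise,QUEENS
Rat Sighting,MANHATTAN
Mouse Sighting,BRONX
Rat Sighting,BROOKLYN
"""


def count_by_borough(lines, keyword):
    """Count rows per borough whose complaint text contains the keyword."""
    reader = csv.DictReader(lines)
    return Counter(
        row["borough"]
        for row in reader
        if keyword.lower() in row["complaint"].lower()
    )


def text_histogram(counts):
    """Render one line per key, most frequent first, one '#' per occurrence."""
    return "\n".join(f"{key:<10} {'#' * n}" for key, n in counts.most_common())


rat_counts = count_by_borough(io.StringIO(COMPLAINTS_CSV), "rat")
print(text_histogram(rat_counts))
```

The same two steps, filter then count, are what the lightning talk performs with a few keystrokes in the terminal.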

Starting point is 00:33:46 Yeah. Yeah, the number of inputs. And because it's open source and because of all the other examples of data types, I think even if you have a different data type, it shouldn't be too hard to modify this to handle something different. I do notice, and I'm excited about it, it does have pcap files for packet capture. These are for communication packets. Talking to all your devices and all your hardware at your company, right? Well, this is like even Wi-Fi packets and cellular packets. That's how we debug those. So nice. It's very cool. Yeah, very cool. And pipx is great. I install a bunch of apps

Starting point is 00:34:23 like Glances, which is a fantastic, like, visualize the state, you know, like top but way, way better. HTTPie, which is a much, much better curl. But the most important thing I install that way is pyjokes. So now I can type pyjoke on my command line and they're always right there. So speaking of which, let's move on to our extras. That's all of our main topics. Brian, you got anything this week? Oh, I did. I haven't dropped them in. Where'd my extras go? Yeah. Well, you got it. I just wanted to bring up that PyCon 2021 is going to be virtual and there's a website up. It's us.pycon.org/2021. Um,

Starting point is 00:35:07 and, uh, there's not a lot there yet, but you can check out what's going to happen. It's not surprising. They have to start planning it, and they may as well plan it as a virtual event.

Starting point is 00:35:20 Um, I was kind of hoping that we would have it live, but I understand. Yeah. I mean, PyCon is my geek holiday. I love it. It's both work, but it's also just such a nice getaway to connect with everybody, you, everyone else we know from the community, listeners. I'm going to miss not having it. Yeah. I'm glad. Do you attend? Sorry, Brian. Yeah, it's good that they... I always check whenever they

Starting point is 00:35:43 announce the date to make sure it doesn't overlap Mother's Day. Oh, yeah. That's not good. Yeah. So I have unfortunately not attended PyCon yet in person, or, I mean, well, it was canceled this year, so maybe I'll attend this year remote. But I'm a regular attendee of the SciPy conference,

Starting point is 00:36:06 which this, so this past year, SciPy 2020 was moved online. And I thought that the organizers did a fantastic job of actually running it online while still, you know, keeping kind of that SciPy community feel. So that was helped a lot also by, you know, plenty of bad puns. So I think that might be something that still comes through for PyCon 2021, maybe. Yeah, absolutely. One of the live listeners, Muhammad, asked if it's going to cost money or if it's going to be free this year to attend. Did you notice anything, Brian? I haven't looked. I'm looking around and I don't know that it costs anything. From what I can tell, I don't see any pricing. What I saw was sponsor information to get sponsors to sign up

Starting point is 00:36:51 to be part of whatever they're doing there, but I can't tell. Yeah. If somebody knows, throw it in the chat or, you know, visit pythonbytes.fm/211 and put it in the comments down there. All right, I got a couple here. First of all, we're trying out live streaming here and I think it's going pretty well. It seems like it's working out. There's a bunch of people watching. So if you want to get notified and we happen to keep doing this, just visit pythonbytes.fm/youtube and it should have like the scheduled upcoming live stream. You can like get notified there. So maybe we'll keep doing this. It's been fun. Thanks for everyone out there who's watching right now. And in addition to PyCon,

Starting point is 00:37:29 which you just announced or mentioned the announcement of, that is the main way that the PSF is funded. But they're also doing a dedicated offering sort of fundraiser thing with six companies to help raise some money for the PSF and Talk Python Training is being part of that. And 50% of the revenue of a certain set of our courses that are sold during the month of December goes directly to the PSF. And people who buy those courses through the PSF fundraiser also get like 20% of a discount. So there's a link in the show notes for people to take some of our courses and donate to the PSF. If you'd rather just directly donate, that's fine. But if you're

Starting point is 00:38:09 looking to get some of our courses anyway, you can do it this way and support the PSF. They're hoping to raise $60,000. Hopefully we can do that for them and we'll see. And Brian, you announced BigPyCon. Another thing that got announced is SmallPyCon, PyCascades, Cascades being the mountain range that connects Portland, Seattle, and Vancouver. And traditionally this conference is cycled between those three cities. I don't even remember anymore what it was supposed to be this year. I think it's supposed to go back to Vancouver, but it's not going to Vancouver because nobody's going anywhere. So PyCascades is online and those do cost money. It's $10 for

Starting point is 00:38:43 students, $20 for individuals and $50 for professionals to support that conference. But I'll link to that one since that's one of our local conferences, if you will. Yeah, they often push what's going on, kind of try new things. So it's a neat conference. Yeah. Yeah. I enjoy my time there as well. All right, Matthew, what have you got for us? Anything else you want to give a shout out to? Yeah, just a few items. So Advent of Code 2020 has started now. It's day two, but there's still plenty of time to get involved with that if you want to. And for those of you who might not know, Advent of Code is just an annual kind of coding challenge that takes place every December. And it's just basically 25 days of fun and interesting programming challenges.

Starting point is 00:39:26 So it's always a great opportunity to try and brush up on your Python and maybe learn about some interesting collections that you might not have known about in the standard library. So that's going on right now, worth checking out, I think. And then I'm going to sneak in some very small physics-related follow-up to Python Bytes episode 205, in which Awkward Arrays were talked about. So the lead developer of Awkward Array is my friend and colleague Jim Pivarski, who is one of my Scikit-HEP co-collaborators, as well as also a member of IRIS-HEP. And as of today, which is recording day, December 2nd, an Awkward v1.0 release candidate is up on PyPI. So by the time that this goes live, if you just do pip install awkward, you should get the Awkward 1.0 release instead of

Starting point is 00:40:20 having to do... No more awkward1. Exactly. No more awkward1, no more awkward0. All that jazz. It's so good to have the actual install statement be awkward itself. Exactly. So that's a nice little tidbit. And I think that there's some nice links in episode 205 if people want to learn more about Awkward.

Starting point is 00:40:38 But that's kind of a backbone of kind of the Pythonic ecosystem for physics right now. And then finally, I just want to give some kudos to Python Bytes as well, specifically for making full transcripts of the shows available to view on pythonbytes.fm. Not only is this, I think, like a cool idea in general, but I think this also makes the show more inclusive to the deaf Python community, which is definitely out

Starting point is 00:41:02 there. And one of my good friends and co-authors is deaf. And I know that he definitely appreciates this. So good job on you guys for being more inclusive of the wider community. Oh, that's so cool. I didn't know anybody was utilizing it. Yeah, that's awesome. Thank you. I think it's absolutely critical for that because the format is only audio.

Starting point is 00:41:24 But a lot of folks have reached out and said they also appreciate it if they're English as a second language. And they're not as good with English as well. So that also helps, I think. They're like, what was I saying again? What a weird word. Awkward array? Why would they talk about that? It doesn't make sense.

Starting point is 00:41:40 Yeah, transcripts and closed captioning is just more inclusive for everyone. So that's awesome. Yeah, thanks. All right. Well, let's wrap it up with a joke, huh, Brian? Yeah. All right. So you guys, I'm going to need your help here.

Starting point is 00:41:55 I'm going to let, Matthew, I'm going to let you pick. Do you want to be Windows or Apple? I'll be Windows. All right. Brian, you be Apple. So the idea is like the title here is how to fix a computer, any computer. So instructions for Windows. Go ahead, Matthew. So step one, reboot. And then the flowchart goes to did that fix it? If no, proceed to step two. Step two, format your hard drive and then reinstall Windows. Lose all of your files and quietly weep. Brian, Apple doesn't have that problem. There's some totally different solution there.

Starting point is 00:42:28 Okay. For Apple, it's step one, take it to an Apple store. Did that fix it? If no, proceed to step two. Step two is buy a new Mac, overdraw your account, and quietly weep. That's me right now. All right. I got the Linux fix. It's so easy. It's totally like, you don't need those things. So you learn to code in C++. You recompile the kernel. You build your own microprocessor out of spare silicon you have laying around. You recompile the kernel again. You switch distros. You recompile the kernel again, but this time using a CPU powered by the reflected light from Saturn. You grow a giant beard. You blame Sun Microsystems. You turn your bedroom into a server closet

Starting point is 00:43:08 and spend 10 years falling asleep to the sound of whirring fans. You switch distros again. You abandon all hygiene. You write a regular expression that would make any other programmers cry blood. You learn to code in Java. You recompile again,

Starting point is 00:43:20 but this time while wearing your lucky socks. Did that fix it? No. Proceed to step two. Revert back to using Windows or Mac, quietly weep. There's really no good outcome here. They all end in quietly weep.

Starting point is 00:43:33 As a Linux user for the better part of a decade, I can neither confirm nor deny how accurate that last part is. Yeah, they all have their own special angle. It just takes longer to get there with Linux to get to your destination, I guess. Yeah. All right.

Starting point is 00:43:50 Well, that's fun as always. And everyone watching on YouTube, thanks for being here live and everyone listening. Just thank you for listening. Matthew, thanks for joining us. Hey, thanks so much for having me. This was really fun. Yeah, yeah.

Starting point is 00:44:00 Great for the items you brought. Enjoyed them. And Brian, thanks as always, man. Thank you. It's been fun. Yep, yep. See ya. Bye.

Starting point is 00:44:07 Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening

Starting point is 00:44:29 and sharing this podcast with your friends and colleagues.

Python Bytes - #211 Will a black hole devour this episode?

Topics covered in this episode: Introducing FARM Stack - FastAPI, React, and MongoDB py-applescript airspeed velocity visidata Extras Joke See the full show notes for this episode on the website... at pythonbytes.fm/211
