Python Bytes - #95 Unleash the py-spy!

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 95, recorded September 12th, 2018. I'm Michael Kennedy. And I'm Brian Okken. Hey Brian, how you doing this fine, fine Wednesday? I am excellent. Nice. It's also excellent that Datadog is sponsoring the show. So before getting further, tell them thank you at pythonbytes.fm/datadog. There's a cool shirt if you go there and follow along. We'll talk more about that later.

Starting point is 00:00:25 You know, I feel like summer's coming to an end, and I've been quite lazy all summer. I'm not sure I'm ready to get back into the main swing of things, but it's upon us. Yeah. And do you know who else is lazy? Programmers are lazy. Productively lazy. They put lazy to good use. Yes.

Starting point is 00:00:42 They make lazy a virtue. And so this was our segue from nothing into the first item, which is Dataset. And Dataset is a Python package that bills itself as databases for lazy people. And this is actually something I totally want to try because it looks fun. Their premise is programmers are lazy. Oh, it says at first, I'll just read some of the top. Although managing data in relational databases has plenty of benefits, they're rarely used in day-to-day work with small and medium scale data sets. But why is that?

Starting point is 00:01:22 It's because people are lazy and they'll throw it in JSON or CSV instead. Oh, they say the answer is programmers are lazy. They'll use the easy solution. And I guess I can't disagree. I've used JSON format as essentially a local database before. But this is kind of cool. It's built on top of SQLAlchemy, so it can work with any SQL-style database, and it's just really easy. It looks kind of like a NoSQL database. It's kind of hard to describe,

Starting point is 00:01:52 of course, over the air, but it's pretty simple and worth checking out, I think. Yeah, I like it. It does automatic schema creation and upserts, and it has query helpers like distinct and stuff like that. So if you were to say, I'm just going to use an in-memory dictionary or other things like that, it's kind of nice that it helps with some of those things. So you just said a couple of terms that I don't even know what those are. So, upsert. Upsert is: I have a record and I'm going to try to save it to the database. If it does not exist in the database, an update would fail, right? But if it already exists, an insert would fail because there'd be a duplicate key. An upsert says,

Starting point is 00:02:30 hey, data access layer, take this thing and save it. If it doesn't exist, put it in there as a new thing. If it does, make an update and set the values to the new one. Okay. And it also deals with sparse stuff. So one of their initial examples is, let's say you've got people or something, and then the first person, you give them a name and an age, and you can insert that. The second person comes along, and you give them a name, an age, and a gender. And then you can, you know, search easily. And, yeah, like you said, it deals with the schema for you already, and you don't have to deal with that. Yeah. And the example that you have in the show notes here uses SQLite as the backend database, but it uses the memory connection version. So you can just load it up with data and then query it and work with it. And then when your app shuts

Starting point is 00:03:14 down, it just goes away. Maybe you output stuff to a JSON file or whatever. So you don't even have to store the database necessarily. Yeah. To be able to use some queries on information, that's interesting. But you can also just play with it this way and then turn it into a file-backed database as well, even with SQLite or with something else. Yeah, absolutely. Yeah, it's a good find. It's quite interesting. I like it. Yeah. So I have a question for you, Brian. Okay. Why is NumPy, not this thing that we're going to talk about next, but NumPy itself, faster than regular Python? Do you know?
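For the upsert behavior and the in-memory SQLite setup described in the Dataset segment, here is a minimal sketch using only the standard library's sqlite3 module. It is not Dataset's own API. The `people` table, the `upsert_person` helper, and the sample names are assumptions made up for illustration, and the schema is declared by hand here, whereas Dataset is described as creating it for you.

```python
import sqlite3


def upsert_person(conn, name, age=None, gender=None):
    """Update the row keyed by name if it exists, otherwise insert a new one."""
    cursor = conn.execute(
        "UPDATE people SET age = ?, gender = ? WHERE name = ?",
        (age, gender, name),
    )
    if cursor.rowcount == 0:
        # Nothing was updated, so the record is new: insert it.
        conn.execute(
            "INSERT INTO people (name, age, gender) VALUES (?, ?, ?)",
            (name, age, gender),
        )


# ":memory:" keeps the whole database in RAM; it disappears when the process exits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, age INTEGER, gender TEXT)")

upsert_person(conn, "Ada", age=36)                      # new row, no gender (sparse)
upsert_person(conn, "Grace", age=45, gender="female")   # new row with all fields
upsert_person(conn, "Ada", age=37)                      # existing row, gets updated

rows = conn.execute("SELECT name, age, gender FROM people ORDER BY name").fetchall()
print(rows)
```

Swapping `":memory:"` for a file path would give the file-backed variant mentioned in the conversation.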

Starting point is 00:03:47 I think because it's got stuff compiled in C. Exactly. Because it's written in C and it can even do parallelism and stuff. It could take advantage of cores. Like my new laptop is ridiculous. It has 12 cores. Nice. That's a lot, right?

Starting point is 00:03:58 So maybe, you know, maybe like we could actually take advantage of that with NumPy. Well, if you have NumPy code, this thing that I'm going to tell you about takes it to another level. It's called CuPy, I think is how you say it, because I think it's based on CUDA. So CuPy is what I'm going to go with. And its full name is CuPy GPU NumPy. And the idea is it's an API-compatible library with NumPy. So all the features that NumPy has, you can call the same functions on CuPy, as I called it. But instead of running on your, you know, six, four, whatever cores you have on your machine, it runs them on the GPU cores, which is insane. Oh, wow. Okay. Isn't that cool? And I looked, and I did a quick little search, just

Starting point is 00:04:47 like, hey, what's a modern machine learning or data science type GPU you might get? So a pretty standard one might be the GeForce GTX 1080 Ti. Okay. These things are getting super expensive because of all the Bitcoin miners and stuff. But anyway, you get one of these things and it has 3,584 cores. Three thousand. And you can run your code in parallel on all of those. Wow. So instead of, you know, oh my gosh, I can't believe I have 12, no, you have 3,500. And the only line of code you have to change is, instead of import numpy as np, which is the very common thing that people do, you would say import cupy as np. That's it. And now you're running on these CUDA cores doing GPU-backed data science. Okay. Do you remember what CUDA is? I don't know what CUDA stands for. I bet it's an acronym because it's all capitalized.
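The "change one import" idea can be sketched as follows. This is a hedged illustration, not code from the show notes: the `load_array_module` helper is hypothetical, and it falls back to NumPy (or to nothing at all) so the snippet still runs on a machine without CuPy or a CUDA-capable GPU. Only calls that both libraries share (`arange`, `sum`) are used.

```python
import importlib


def load_array_module():
    """Return cupy if it is usable, otherwise numpy, otherwise None."""
    for name in ("cupy", "numpy"):
        try:
            module = importlib.import_module(name)
            module.arange(1)  # fails early if CuPy is installed but no GPU is usable
            return module
        except Exception:
            continue
    return None


# The rest of the code is written once against the shared NumPy-style API.
xp = load_array_module()
if xp is not None:
    values = xp.arange(1_000_000, dtype=xp.float64)
    total = float((values * 2).sum())
    print(xp.__name__, total)
```

Whether the GPU version is actually faster depends on the workload and on the cost of moving data to and from the device, so it is worth measuring rather than assuming.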

Starting point is 00:05:45 Yeah. There's a lot. There's like layers upon layers of acronyms here. I don't actually know what CUDA cores are. Okay. Yeah. I didn't mean to put you on the spot. It's the mechanism of parallelism on the GPU, basically. Okay.

Starting point is 00:05:59 Nice. But I don't actually know more than that. Yeah. Isn't that cool? I like it. Yeah, yeah. So it's really cool. It has this compatible API.

Starting point is 00:06:07 I threw a little code sample in the show notes there. And if for some reason you're like, you know, I actually need to customize how my code runs on the GPU, which is a thing sometimes people do, you can program against the CUDA cores and CUDA kernels and things like that. You can actually embed C++ code in your Python code, and CuPy will actually compile that down to a CUDA binary, which is even cooler. Okay. I was just curious. So I'm really not a hardware guy, so bear with me. You said you have 12 cores. Is it on a laptop that you run in? Yeah. It's a new MacBook Pro. So it's an Intel Core i9, maxed out.

Starting point is 00:06:47 And it's really six cores that are each hyper-threaded is how it works. But the OS sees them as 12. So are there GPUs on a normal laptop or on your laptop? Or is this GPUs just something that... Okay.

Starting point is 00:06:58 No, no, there's a pretty high-end one on my MacBook. It's not as high-end as this, not even close, but maybe half or something, I would guess, in terms of performance. That's a pretty bad estimate because I don't really know. But yeah, you could run this on a laptop. Yeah, I'm just curious if CuPy would speed up things just on a laptop or something, or if it would... I would think so. I mean, you've got to have an algorithm that's well adapted to GPUs, but if you did, then I would think so. Yeah. Okay. Well, this is neat for the right... the people that really care about it

Starting point is 00:07:30 really care about it. So this is cool. Yeah, absolutely. And I mean, you can go and get GPU clusters on AWS or on DigitalOcean or things like that. Yeah. Okay. And so you could actually ship your code up there even if you don't have one. Final note on this one is there was a PyCon 2018 presentation on this. And so I'm going to link to the presentation as well if people want to watch 30 more minutes of this. I think I would. Yeah, I actually do too. It looks really interesting. Yeah.

Starting point is 00:07:58 All right. I'm feeling a theme coming on. In episode 84, we did touch on... somebody called in, or, well, we actually don't have phone lines, somebody contacted us and said, hey, you should cover pre-commit. And we did talk about pre-commit in episode 84, but we just sort of talked about what it was. But today, I ran across this fairly cool article called Automate Python Workflow Using Pre-Commits. I like this kind of an article, actually, of, okay, here's these

Starting point is 00:08:26 cool tools, using pre-commit, Black, and Flake8. How do I put that in my day-to-day workflow? And how does it really work? And this is from LJ Miranda. So good job, LJ. It's got a great graphic at the start telling you that you've got changes. When you add something with git add, it goes to staging, and then when you do a commit, pre-commit will intercept that part and kick off whatever pre-commit hooks you've got set up. If all of those pass, then it lets the commit happen, and if they don't, it kicks it back. And then it shows you how to deal with all of the different configuration that is available with pre-commit.
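The gate described above (staged changes go through every hook, and the commit only proceeds if all of them pass) can be sketched in plain Python. This is a toy model of the control flow only. It is not how pre-commit is implemented or configured, and the `try_commit` function and both checks are hypothetical stand-ins for real hooks such as Black and Flake8.

```python
def no_trailing_whitespace(files):
    """Hypothetical hook: pass only if no line ends with spaces or tabs."""
    return all(
        line == line.rstrip()
        for text in files.values()
        for line in text.splitlines()
    )


def short_lines(files, limit=79):
    """Hypothetical hook: pass only if every line fits within the limit."""
    return all(
        len(line) <= limit
        for text in files.values()
        for line in text.splitlines()
    )


def try_commit(staged_files, hooks):
    """Run every hook on the staged files; allow the commit only if all pass."""
    failed = [hook.__name__ for hook in hooks if not hook(staged_files)]
    return {"committed": not failed, "failed_hooks": failed}


hooks = [no_trailing_whitespace, short_lines]
clean = {"app.py": "print('hello')\n"}
messy = {"app.py": "print('hello')   \n"}

clean_result = try_commit(clean, hooks)
messy_result = try_commit(messy, hooks)
print(clean_result)
print(messy_result)
```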

Starting point is 00:09:07 I like this. It's a good starter. If you're still not quite on board with pre-commit, this is a good article to read. Yeah, pre-commit's pretty cool. And that's a Python package that you install that then manages all the rest, which I think is great. Yeah. In this article, there's a little video, and I think it's an animated GIF or something, a little short demo video that runs. I don't know how to do this. This is neat. So it shows it in action. Yeah, yeah, that's really cool. I like those little autoplay GIFs that'll

Starting point is 00:09:35 animate stuff, because sometimes it's like, you know, if you could just see it happening, it would be so much easier to grok than little pictures trying to tell me. Yeah, and I also don't mind. Something like that is fine if it has an actual video to play, but don't give me a half-hour video. A little couple-minute video at most is great. Yeah. A half-hour GIF, probably not the way to go. I don't even know if you can do that.

Starting point is 00:09:59 So one way to go that is good, though, is to check out Datadog. So this episode is sponsored by Datadog, as I said, and I really appreciate them supporting the podcast. Datadog is a monitoring platform that brings metrics, logs, requests, traces, all that kind of stuff into one place across different systems and computers and all sorts of stuff. So you can use their trace search and analysis, which lets you break down Python application performance using high cardinality attributes like show me what this customer has done across my application or show me all the behaviors for this URL and really easy to troubleshoot

Starting point is 00:10:35 your app. So start doing that with your Python apps today with a free trial and Datadog will send you a free t-shirt, which has a cute little dog on it. So visit them at pythonbytes.fm slash datadog to get started. So you were talking earlier about that cool little GIF thing. And I think you can do it with Camtasia. Like you can record. Okay.

Starting point is 00:10:53 So I think you can do it with Camtasia. You can record basically a screencast and export it as a GIF, which is already pretty cool. Oh, okay. But this next item has a really nice little animated GIF thing going on as well, because it's super good to see it in action. So have you heard about py-spy? I have not. So py-spy is interesting for a couple of reasons.

Starting point is 00:11:15 It's interesting because it's a cool tool that people can use in some places that they could not previously do so. It's a Python profiler, so you can hook it up to your Python application, and it'll tell you where your Python app is spending its time, what functions and what it's doing and things like that. And it acts kind of like the Unix top command, which will take over your screen and it'll show you a list that's kind of updating every couple of seconds what's happening. That's pretty cool. So I can hook up this profiler and it'll live show me sort of the equivalent of like a process report

Starting point is 00:11:46 like a CPU usage report, but it'll say, right now you have these various functions that have run recently, and we'll put the most expensive ones on top, things like that. Oh, neat. That's cool, right? And so you can watch that little GIF thing and see it going. This is written by Ben Frederickson, and it's just taken off. I think it was started in July or something like that. It's already got 2,000 GitHub stars. So what's even cooler, though, is it'll let you visualize your Python app's time without restarting or modifying your code in any way.

Starting point is 00:12:15 And it can attach to running processes and then start to profile them. Oh, nice. So normally profiling happens by, I run a profiler, which runs my code, and there's a bunch of stuff. Or maybe in reverse, I'll write some code like import cProfile and I'll call a function, start profile, save profile, export, etc. Right? It's really invasive. So if you do it from the outside, like the profiler runs your app, you can't do it in production. It makes it slow, all sorts of

Starting point is 00:12:41 stuff. If you do it the other way, you're doing all sorts of, you know, writing code to change it. With this, you just say, hey, there's a random Python program. I'm going to go profile that, and it'll just attach to it. Nice. Yeah, you can just give it a process ID. Yeah, exactly. Give it a PID. And what's cool is, because of that, you can use it in production. I could log into my web server that's getting pounded on, not responding correctly or whatever, and I could actually begin to profile it without wrecking my thing or slowing it down or restarting it or whatever. Or any long-running process. While the problem is happening, you can just attach to it and figure out what's wrong. That's the key thing, because maybe restarting it and rerunning it takes like four hours to get into that weird state. You never know, right? Yeah. Oh, yeah, this is cool.
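For contrast, the "invasive" in-process style described above looks roughly like this with the standard library's cProfile and pstats modules. The `slow_sum` function is a made-up workload. The point of the sketch is that the program itself has to be edited to start, stop, and report the profile, which is exactly what an attach-by-PID profiler avoids.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Made-up workload so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# The application code has to be changed to wrap the work in a profiler.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Format the collected statistics, most expensive (cumulative) calls first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```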

Starting point is 00:13:25 Sweet. That's pretty trick. So it's written in Rust, actually, but it's pip installable. So all sorts of cool things. And then Ben even goes into how it works. So there's a section on how py-spy works.

Starting point is 00:13:39 So I'll just read you this, and tell me if this sounds like a program you would have written. It's not one I would have. py-spy works by directly reading the memory of the Python program using the process_vm_readv system call on Linux, the vm_read call on OS X, or the ReadProcessMemory call on Windows. And then it just analyzes the memory over and over. That's crazy, right? But it knows enough about Python to go, well, that means X.

Starting point is 00:14:08 And it just, you know, off it goes. So there's a bunch of more details on how he actually makes it work. I'll link to that section as well. It's a pretty cool profiler. And I really like the attach to running processes without affecting them. That's pretty unique, I think. And so I wanted to highlight it. Yeah, nice.

Starting point is 00:14:25 And it can do icicle graphs, which I don't know why that would be neat, but it looks neat. Yeah, those are cool. I get it. Sometimes, you know, just visually, some things are really out of whack. You're like, what is that big bar from? Oh, that's a radar sort. Why are we calling that a thousand times? Yeah, things like that. Let's just sort it once. All right. What do you got next? I've got SymPy, which is just sort of fun. SymPy is a, well, I'm just going to read a little bit here too. Symbolic computation. So like you're in math class or something. We realized early on with programming that if you punch things into the calculator too fast, it just mucks things up because you have rounding and various things like that.

Starting point is 00:15:07 So symbolic computation deals with the computation of mathematical objects symbolically. This means that mathematical objects are represented exactly, not approximately. And math expressions with unevaluated variables are left in symbolic form. And SymPy allows you to do that with Python. And it's sort of blasted cool. I've got a little example of doing an integration of the sine of x squared over negative infinity to positive infinity. And it will tell you what the answer is. And these sorts of symbolic math manipulations, for a lot of people, boy, if I had to do this by hand, I'd be in trouble. I did not do that well in math.

Starting point is 00:15:52 And so being able to do this programmatically is cool. And the introduction and the website is pretty awesome, too. It has a bunch of live, it's got an engine in the back that runs it, so you can try the examples out and pop up a little window and do it interactively. So this is neat. Yeah, there's a ton of cool stuff that comes out of this. So, for example, you can say x, y equals symbols x and y. And then after that, you can do algebraic expressions, like truly algebraically.

Starting point is 00:16:23 So like expression equals X plus two Y, not in quotes or anything, just like as if it were regular math. And then you could like add one to that expression and it'll reform the equation and stuff like that. You can ask it to do integration. Like the example you have in our show notes is to integrate sine of X squared

Starting point is 00:16:40 from minus infinity to positive infinity. Instead of giving you the answer of, oh my gosh, what is that, like 1.5 dot dot dot, you know, it just says that's square root of 2 pi over 2, like, the exact answer. That is pretty awesome. You know, we just wrecked the whole math experience for so many of our listeners who are students. They're like, you know what, that calculus class, I just solved that problem. Well, I would have loved this while I was taking calculus. Yeah, for sure. Yeah, you could totally check your work. Like, there's no answers in the book? Oh yeah? Really? Hold on. That's pretty awesome. So if you can take your laptop to your tests, you're set. Yeah, probably

Starting point is 00:17:22 not likely. All right, so this next one that I found, Brian, it's pretty cool. So something that I've been digging into lately behind the scenes, and I'm going to be talking more and more about probably in the next few weeks, is async programming in Python. Like, I've really been doing a lot with that lately, and we'll have some cool stuff to share pretty soon. But that means I'm running across all this cool async stuff. So you've heard of WSGI, "whiskey," which is the Web Server Gateway Interface. That's how Pyramid, Flask, Django, all those things work. None of them do a great job of supporting async programming because fundamentally this WSGI interface is synchronous. It can't be made async. Yeah. So there's this other framework called ASGI,

Starting point is 00:18:06 for Asynchronous Server Gateway Interface, I guess, that allows these frameworks to be asynchronous. So the thing I'm talking about this week is Starlette, which is an ASGI web framework, and I like its little subtitle, "the little ASGI framework that shines." It's cute. It is cute. So it's basically intended for building high-performance asyncio services. So if you have anything that talks to a database, to caches, to file systems, things like that, even calls other web services or microservices. Super easy to build. The API is basically Flask, like a Flask-ish API.

Starting point is 00:18:52 And you create like a web method. You say async def, regular view method, and you go do a bunch of stuff. And it has cool support for response types. So you can have a file response object that you just send back to the framework that's based on aiofiles, which is an asyncio file-based thing. And there's a lot of nice integration like that. Okay.
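To show what an async gateway interface looks like underneath a framework, here is a bare ASGI-style application driven by a tiny hand-written harness, using only asyncio. It is a sketch of the interface in its current single-callable form, not Starlette's API and not necessarily the form in use when this episode was recorded. The `call_app` harness and its fake `receive` and `send` callables are assumptions for illustration; a real server such as Uvicorn would supply them.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI HTTP app: a coroutine taking scope, receive, and send."""
    assert scope["type"] == "http"
    # Any awaiting here (database, cache, other services) does not block
    # the event loop, so other requests can be served in the meantime.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI"})


async def call_app():
    """Hypothetical stand-in for a server: call the app once and record output."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/"}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app())
print(messages)
```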

Starting point is 00:19:10 You're just interested in this, or do you have an application that you're going to try to? No, I'm building a course on it. Oh, okay. I'm trying to make a nice, well-rounded async concurrent programming Python course. Well, yeah. So I've been building tons of little apps and stuff. So here we go. Here's one of them. If you want to build an app that is way more scalable, you know, 10 times more scalable than regular web apps on the same hardware and whatnot, it's pretty easy to do if mostly what that web app is doing is waiting, right? You can just, you know, the,

Starting point is 00:19:41 basically, the asyncio web frameworks can just adapt to that more easily because they're not blocking while they're waiting. I also discovered a couple of cool things while looking into this. One is they say you should install the UltraJSON package, for which the pip install name is ujson. And that is basically a drop-in replacement for the built-in json module that is between 50% and three times faster. So if you're doing a lot of JSON, you can just use UltraJSON,

Starting point is 00:20:12 and that's pretty awesome. Yeah. Okay. I'll have to check that out too. Yeah. So all you have to do is import ujson as json, and then that makes your code faster. Of course, it has to be there, right?
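The "it has to be there" caveat is commonly handled with a guarded import, sketched below. The sample payload is made up. The sketch assumes only the `dumps` and `loads` calls that both modules provide; other keyword arguments of the standard json module are not necessarily supported by ujson, so a true drop-in swap is worth testing on real data.

```python
try:
    import ujson as json  # third-party UltraJSON, used when it is installed
except ImportError:
    import json  # standard library fallback, always available

# The rest of the code uses the name "json" and does not care which one it got.
payload = {"episode": 95, "topics": ["py-spy", "SymPy"]}
encoded = json.dumps(payload)
decoded = json.loads(encoded)
print(decoded)
```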

Starting point is 00:20:25 Yeah. That's pretty sweet. The other thing is you've maybe heard of Gunicorn for the traditional web frameworks. There's Uvicorn, which is based on uvloop and Gunicorn, which is also pretty awesome for these async web frameworks. Well, it's cool, and I get the name,

Starting point is 00:20:43 but eventually, if everybody starts using that, people forget where that came from, and it's just going to be a weird word, Uvicorn. I know. Uvicorn. No, it's Uvicorn. You've got to understand where the name comes from. Come on. Get it together. Well, that's it for our items.

Starting point is 00:21:00 I do have some extra stuff to share. How about you? Just one thing I wanted to point out, if I remember it. Okay, cool. I'll go first. You can think. So really big news. You and I, we had a good time at PyCon, right?

Starting point is 00:21:10 Oh, yeah. I can tell you when we're going to have a good time again. If we can go to a tutorial, it's May 1st and 2nd. If you want to go to talks, it's May 3rd, 4th, and 5th. And if you want to do the sprints, it's May 6th, 7th, 8th, and 9th. So basically the announcement is that the PyCon dates are out. Yes. And I don't think it's over Mother's Day this year. I hope it's not. I also have a quick little follow-up. You talked about the pre-commit package. Also,

Starting point is 00:21:42 another listener, Matthew Layman, sent in some notes about how his team is using it, and basically talked about how they're using pre-commit, the Python package, so that their Flake8 and Black and other things that automatically run during continuous integration also automatically run when people do git commits. So they have fewer failing builds, which is pretty awesome. And it has a couple of nice links. So I threw that in there at the end. And then finally,

Starting point is 00:22:10 you talked about the Gang of Four patterns last week, right? Yeah. So John Tosher, I think is right, sent us a message pointing out another talk from PyCon AU called You Don't Need That, which is pretty cool. And it basically talks about how if you study the Gang of Four patterns, a lot of what they were doing was because they were using Smalltalk or Java

Starting point is 00:22:32 or C++. And in Python, here's a new way that you just basically don't need that pattern. So pretty cool talk, and I'll link to the video for that. Yeah, if you translated the Gang of Four book directly to Python it would be like a pamphlet. That's right. Nice. Do you remember your item? I did not. So, save it for next week.

Starting point is 00:22:54 Yeah, save it for next week. We'll do this again next week, right? Yeah, maybe we should just do it every week. Yeah, all right, deal. We'll do it every week. Okay, cool. Cool. All right, well, thanks for doing the show this week. No, thank you. You bet. Bye. Thank you for listening to Python Bytes.

Starting point is 00:23:10 Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank

Starting point is 00:23:30 you for listening and sharing this podcast with your friends and colleagues.
