Python Bytes - #109 CPython byte code explorer

Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 109, recorded December 10th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by DigitalOcean. Thank you, thank you, DigitalOcean. Tell you more about them later. Right now, Brian, how is everything going? It's going really well. How about you?

Starting point is 00:00:21 Oh, it's super. I'm starting to think about the year in review, like what were the most amazing Python stories of the year and things like that. So, looking forward to sharing those with everyone. Actually, yeah, that'd be great. Yeah, so you and I actually did an episode along with Dan Bader on Talk Python, which will drop here on this channel as well, for the year in review in Python news, which is like this, but with more stuff and more depth. So that'll be a good thing for, you know, all those people traveling for the holidays, right?

Starting point is 00:00:49 Yeah. Give them something to listen to. All right. They may be stuck in an ice storm in Chicago O'Hare, but they can listen to some good Python News. Yeah. That's right. Speaking of good Python News, what do you got to kick us off this time?

Starting point is 00:01:00 I have "Python descriptors are magical creatures." That sounds awesome. Yeah, I actually kind of approached this article thinking, yeah, I know what descriptors are and stuff, and properties. It's talking about properties of objects in Python. But this is a really great article. It's by Pablo Arias, and it talks about how you can add getters, setters, and properties to objects. So instead of calling a function like get_version, you can just have version, and you can use object.version. You can assign to that and that'll call the setter, and if you read from it, that'll call the getter,

Starting point is 00:01:46 and you can have custom functions for that. It's one of the cool things about Python. And one that I'm glad that it's been highlighted, because some people forget this is around, especially if you come from a language that doesn't have this sort of thing. C, Java, that kind of stuff, right? Yeah. These are pretty neat, and they make it look like an attribute of the object, but it's actually a function that gets called. And it's a way you can actually migrate. You can start a system where it really is just data that's sitting there. And if you want to intercept it and say, you know, actually, when somebody assigns to this, I want to do some work, or I don't want to really store this data, I want to calculate it on the fly. I mean, you can turn those into getters and setters

Starting point is 00:02:26 and the calling code doesn't need to know. Yeah, I really like this because often the API makes the most sense as sort of fields, just setting the attributes, right? Like user.name, user.firstname or something like that. But what if you want validation, right? Like the name can't contain white spaces or other weird stuff.

Starting point is 00:02:44 You want to strip that off, or the username is always lowercase, and things like that. So properties are perfect for that, right? You can validate it. You can raise an exception that says you can't have a None value here. It has to be a non-empty string, all kinds of good stuff. But the consumer doesn't care. They don't have to know. Yeah, and I personally have actually used get and set methods before,

Starting point is 00:03:05 the getters and setters, but there's a deleter also, and I don't think I use that very much. It's probably a neat thing to stick in place if you're doing this anyway, to make sure that if it's invalid for somebody to try to delete an attribute, you can intercept that. So yeah, you're like, no, you always have a name. You can't delete it. Yeah, but this is a good general introduction to how to use these, so people can clean up their code a little bit, make it look a little less Java-y.
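A minimal sketch of the getter, setter, and deleter pattern discussed above. The `User` class, its `name` attribute, and the validation rules are hypothetical, chosen only to mirror the examples mentioned in the conversation (non-empty string, stripped, always lowercase); they are not taken from the article.

```python
class User:
    """Hypothetical class; the names and rules are illustrative only."""

    def __init__(self, name):
        # Assignment goes through the setter below, so validation runs here too.
        self.name = name

    @property
    def name(self):
        # Called on read: user.name
        return self._name

    @name.setter
    def name(self, value):
        # Called on assignment: user.name = "..."
        if not isinstance(value, str) or not value.strip():
            raise ValueError("name must be a non-empty string")
        self._name = value.strip().lower()

    @name.deleter
    def name(self):
        # Called on: del user.name
        raise AttributeError("a user always has a name; it cannot be deleted")


user = User("  Michael ")
print(user.name)  # the value as normalized by the setter
```

The calling code only ever touches `user.name`, so a plain attribute could be swapped for this property later without changing any callers.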

Starting point is 00:03:35 Yep, I totally agree. So the next one is I want to talk about a survey. So we've talked about the JetBrains Python survey, and data science featured heavily in it. But they also did a separate data science survey for just data scientists and asked data science questions only. So they polled about 1,600 people who are data scientists based in the US, Europe, China, and Japan, to figure out what's the story, what's the zen, and how are people feeling in the data science space right now.

Starting point is 00:04:08 And so it wasn't just for Python. It was just for data scientists. But you can imagine that there are many Python things happening in the data science world, right? So one of the key takeaways was that most people currently use Python, and they assume that Python will remain the primary programming language for at least five years.

Starting point is 00:04:27 Yeah. And that's essentially forever in computer time. That's right. Like, if you're planning past five years, you've got either a lot of faith in where things are going, or you're doing it wrong. Those actually could be the same thing. All right. And they also talked about what are the main tools people are using for machine learning stuff.

Starting point is 00:04:46 And they said Keras is the main one for professional developers. Whereas if you're an amateur data scientist, you're more likely to use Microsoft Azure machine learning services rather than libraries. So you're like, just make this a model. Teach it stuff. Figure that out later. Whereas the pros, in quotes, are actually doing the straight API stuff. And remind me what Keras is? Keras is a machine learning framework.

Starting point is 00:05:11 Okay. Yeah. So it's sort of comparable to Azure ML, but Azure ML is a service. Like machine learning as a service. I haven't ever used it, though. Okay. So let's see. Main programming languages.

Starting point is 00:05:20 Obviously, there are other languages. And if you look back just a couple of years, right, R was a machine learning and data science language that was more popular than Python was for data science. But now Python is 57%. R is only 15%. Some people say Julia is the next big language for data scientists. So they asked these 1,600 people about Julia, and the number of people using it was 0%. So that's not super compelling for Julia, I guess. At least amongst this data, this statistical set. Yeah, yeah, yeah. And honestly, I forgot how they found this set of people. So I'm sure they talk about it in the write-up.

Starting point is 00:06:03 And then finally, when you talk about IDEs and editors, there were three standout main things people used. Obviously, Jupyter, Jupyter Notebooks, Jupyter Lab was 43%. PyCharm was 38%. And RStudio was 23%. So that's pretty interesting. Yeah. Yep. All right.

Starting point is 00:06:21 So if you're in the data science space, maybe this will help you keep the pulse of what's going on there. I want to highlight a little tool. So like I talked about properties just as a nice technique that people should make sure they understand, another thing I've run across is memoization. Not memorization, but memoization, with no R. This is a technique for when you've got a function, or some work that you need to do, that's dependent on input,

Starting point is 00:06:53 only dependent on the input parameters, but getting your answer is computationally intensive, and you often get a lot of the same parameters coming in. Memoization is a technique to basically just save the data: calculate it once, and if you get passed the same arguments again, just return the answer that you've already calculated. This technique can make your code incredibly fast. Like if you have some function that you're calling with a relatively bounded set of inputs,

Starting point is 00:07:31 and it's at all computationally expensive, or it goes to a service and it gets an answer back. Like you said, if the input is the only thing that drives it, it's not like, well, what's the weather at the zip code? Because that could always change. But it's like, what's the limit of this integral when passing in this lower bound, like discrete integral or something, right?

Starting point is 00:07:49 It's always going to give you the same answer back. So you can actually go to the function, even with the functools module built into Python, and say: if this function gets the same arguments, don't run it again, just give the answer back. It's kind of stored in memory or somewhere, right? And that only works in process.
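A small sketch of in-process memoization using the standard library's `functools.lru_cache`. The function `slow_square` and the `call_count` counter are made up for illustration; they stand in for any expensive computation whose result depends only on its arguments.

```python
from functools import lru_cache

call_count = 0  # counts how often the real work actually runs


@lru_cache(maxsize=None)
def slow_square(n):
    """Hypothetical stand-in for an expensive, input-only computation."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(7)   # computed by running the function body
second = slow_square(7)  # same argument, served from the in-memory cache
print(first, second, call_count)
print(slow_square.cache_info())
```

The cache lives inside the running process, so it is gone as soon as the process exits.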

Starting point is 00:08:03 Yeah. One of the things I wanted to highlight is a project called cache.py that saves all this stuff off to a file. This would be helpful, especially if you've got a command line tool that gets called lots of times. It isn't going to be able to store everything in memory. So being able to save it in a file might be helpful. The interface is just a decorator to say,

Starting point is 00:08:25 hey, this function, you can cache the results. So you throw a decorator, it's just cache.cache, onto your function and it just works. And there's a whole bunch of customization you can do. You can say how long the cache is good for and where the file should be and things like that, but the default just kind

Starting point is 00:08:45 of works pretty good too. Yeah, I really like this. So the thing is, the built-in stuff only works in memory, and so once the process is done, it's done. But like you said, if this is a command line tool you're stringing together and you want it to keep that data for a certain amount of time, or just always keep it, so that it's like, well, if you pass me seven, the answer is always going to be this, right? Yeah, it's great that that'll keep it on the file system. And it uses pickle, right? I'm not sure. Yeah, let's see. Yeah, it currently uses pickle and inspect under the hood, making it not portable. So you can't take your cache file and move it to Windows when you ran it on Linux or something, I believe. Yeah, because of, you know, memory structure and different versions of Python and so on. So, remind me, what was the

Starting point is 00:09:29 built-in one that works in memory? It's in functools, and it's lru_cache, I believe. Okay, functools lru_cache. Yeah. Yeah, I brought this up also mostly because I know a lot of people learn on the job or teach themselves to program. I'm not bragging that I have a computer science degree, but this is one of those topics that you probably don't come up with on your own. It's a clever thing and a nice, useful tool for your toolbox, but it's not something that's obvious. It wasn't obvious to me until I learned about it. Yeah, same here. I think the first time I learned about this was when I started studying design patterns and stuff like that. And somehow it came up in there. I'm like, oh, that's pretty clever. Yeah. When you are working with code

Starting point is 00:10:11 and it's slow, to me it seems like there are two things that are really, really powerful, where you can just go, oh, well, now it's a hundred times faster. That's cool. And that was like one line of code. You know, one is when you're using the wrong kind of data structure. Like if you're using a list, but you really should use a set because you're testing for membership on a big set, something like that, or dictionaries or whatever. The other one is this kind of caching, right? Like if you're doing something and it takes a long time, even if it's going out to the internet and calling a service, like if you think that data changes once a day, it'd be totally great to put like a one-minute cache on that if you're calling it a bunch of times.
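To make the file-backed, time-limited caching idea concrete, here is a rough sketch of a decorator that stores results in a pickle file. This is not the implementation or API of the cache.py project mentioned above; `disk_cache`, `max_age_seconds`, `expensive`, and the file name are all hypothetical, and the sketch ignores thread safety, file locking, and keyword arguments.

```python
import functools
import os
import pickle
import tempfile
import time


def disk_cache(path, max_age_seconds=None):
    """Memoize a function in a pickle file so results survive across runs.

    Illustrative only: arguments must be hashable and picklable, and the
    cache file should be trusted, since unpickling untrusted data is unsafe.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Load whatever earlier calls (or earlier runs) have stored.
            store = {}
            if os.path.exists(path):
                with open(path, "rb") as f:
                    store = pickle.load(f)

            entry = store.get(args)
            if entry is not None:
                saved_at, value = entry
                fresh = (max_age_seconds is None
                         or time.time() - saved_at < max_age_seconds)
                if fresh:
                    return value

            # Cache miss or expired entry: do the real work and save it.
            value = func(*args)
            store[args] = (time.time(), value)
            with open(path, "wb") as f:
                pickle.dump(store, f)
            return value
        return wrapper
    return decorator


cache_path = os.path.join(tempfile.mkdtemp(), "demo_cache.pkl")
calls = []  # records each time the real function body runs


@disk_cache(cache_path, max_age_seconds=60)
def expensive(n):
    calls.append(n)
    return n * 2


a = expensive(7)  # runs the function and writes the cache file
b = expensive(7)  # read back from the file, the body does not run again
print(a, b, calls)
```

Because the results live in a file, a second run of the same script pointed at the same path would also skip the computation until the entry is older than `max_age_seconds`.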

Starting point is 00:10:44 Yeah. And like you said, it can make a massive speed improvement. And it's sort of obvious, you know, after you see it. You're like, well, yeah, duh, I didn't even think of that. Absolutely. So I think this is a cool one because it takes that idea and it just makes it easy to carry it across different processes or different runs of the same process. Okay. So before we get on to the next one, let me just tell you all about DigitalOcean. They're doing all sorts of cool stuff. Our infrastructure runs on it. Really, really nice and reliable.

Starting point is 00:11:12 One of the things I want to highlight this time is their work with Kubernetes. Docker, and coordinating Docker, orchestrating Docker stuff with Kubernetes, is a big deal these days. And so they're launching a new Kubernetes cluster over at DigitalOcean. So a really nice way to manage and deploy your container workloads in the cloud. And if you go to pythonbytes.fm/digitalocean and you're a new user, you get $100 credit to Kubernetes all the way if you want. You can run a lot of Kubernetes for $100 in the cloud.

Starting point is 00:11:42 So that's pretty awesome. Yeah, very cool. Yeah, so check them out, pythonbytes.fm/digitalocean. They're big supporters of the show, and they keep us going strong each week, don't they? Yeah, I'm very grateful for them. Yep. The next one I want to tell you about is a really short video. Last week, I covered an hour and a half video about being an expert on Python.

Starting point is 00:11:59 How about we cut this down to like a four-minute one? So I think this one is really good for people who are getting into data science and they have a little bit of a challenge. If you're an expert, this is definitely not the video for you. But this is called Setting Up the Data Science Tools. And so it's part of a larger video series. But it basically shows you how to set up the Anaconda distribution, TensorFlow, Keras, Jupyter, all those things. And it actually talks about using conda, conda virtual

Starting point is 00:12:27 environments, creating notebooks, and switching between virtual environments. So if you've been mostly working with pip, or you see examples in pip and you want to do more Anaconda stuff, this is a great video. And especially if you want to install some of these tools and get going and you're kind of new, this is a great way to get going. That's awesome. Yeah, cool. Yeah, it's great. I was just talking to somebody who was really new to Python and super eager to get going, but he was having a problem

Starting point is 00:12:51 because he was working on a computer that he didn't have admin access to. And so when he would try to pip install something, it would try to put it in the system-wide location. You shouldn't do that, but if you wanted to make it happen, for sure, you could do sudo, but he wasn't allowed to, you know,

Starting point is 00:13:07 basically run as admin to do that. Right? So I'm like, oh, you just need to use a virtual environment. Then you can do whatever you want to your machine. It's like, oh, wonderful. Right? So I think, you know, it might sound like old hat to folks that have been doing it for a long time, but when you're new to it,

Starting point is 00:13:21 like that's not obvious, right? Like my Python won't install. Well, if you had a virtual environment, it would, or if you did these other steps, it would, right? Right, and also for somebody like me who is used to virtual environments, it's still not obvious how to do that in an Anaconda environment.

Starting point is 00:13:39 Exactly. I have to look it up every time because I'm all about pip. And I was like, wait a minute, it's a different way to activate. It's like a global activate command. Where's the list? How do I know what exists?

Starting point is 00:13:47 Yeah, it's different. So I'm sure I could actually use this as well. Cool. Beginner means beginner to Anaconda and data science tools. Not true beginner, right? Yeah. Awesome. All right.

Starting point is 00:13:56 Speaking of data science, I bet data scientists draw a lot of graphs, right? Yeah. Well, lots of people draw a lot of graphs. Last time I tried to use Boca, or Bokeh, I keep saying that wrong. You don't need to email me that I'm saying it wrong. I know it's Bokeh. It's B-O-K-E-H. It is a very powerful charting tool.

Starting point is 00:14:15 I believe it's not the simplest interface to figure out as a newbie. And it's not like Matplotlib is super easy either, but a lot of people know about it. But Bokeh, yeah, it's not bad. It's just if you're a beginner, maybe there's an easier way. And this is the easier way. One of the easier ways is a package called Chartify that simplifies a lot of the defaults. And it's built on top of Bokeh. So if you've got some data and you want to throw it into a chart, this is a nice way to do it. It fills out a whole bunch of the defaults so that it starts out fairly pretty. So it's simplifying the API into Bokeh for newbies. Oh, that's great. I do find it a little overwhelming because you can do everything,

Starting point is 00:15:00 right? You can specify so much detail. Sometimes I'm just like, you know, I could just use a histogram. Wouldn't that be awesome? Can we just do a histogram? Yeah, and if I've got a bunch of different, you know, I want to be able to pick the colors fairly easily, and I don't really care, but I just want it to look nice. Yeah, they also have a bunch of nice examples, example notebooks and stuff that walk you through using it. So yeah, it's a great little resource. Speaking of Jupyter and examples and notebooks and stuff, I want to stick with that for the last one here. And it's called the CPython bytecode explorer. Most people probably know this at least at some level, but I'm sure not everyone does. When you run your Python code, it loads it up and it

Starting point is 00:15:40 compiles it to bytecode. And you're like, wait, what? Python's interpreted. It's not compiled. So it compiles your source code into bytecode. And those bytecodes are interpreted by CPython, like a big loop that just runs. It goes, okay, what's the next bytecode? Let's do that. So understanding what those bytecodes are, how complex is something? Is it an atomic operation or does it take multiple steps? All of those things you might wonder about. So this was sent to us by Anton Helm and it's created by this guy named Jeremy Tuloup. And what it is, is it's a plugin for JupyterLab, not Jupyter Notebooks, but the more full-featured JupyterLab. And what it does is it lets you look at the byte

Starting point is 00:16:24 code of the various operations that you're working on. So if you pull up that thing, Brian, the link there, you can see there's a little animated GIF that shows you what's happening. So it's creating like an a, a b, and a c equals a plus b. And on the right, as you type, it just shows you the bytecode of those. So I think this is a great way to explore working with Python if you want to understand more of this low-level bytecode thing. Yeah, this would be awesome just in teaching, especially if you're going to talk about how names work in Python. This would be kind of fun to use to see how it all

Starting point is 00:17:04 points to the same thing and whatnot. So yeah, another example that's cool. If you go to the very bottom, there's a bunch of little animated GIFs here. And the very bottom one shows the two operations looping over just the numbers zero to nine. And you can either do this by a while loop, you create the while loop, and you have less, you know, i less than 10, i plus equals one, or you could just say for i in range of zero to 10. And they show it side by side

Starting point is 00:17:31 comparing the disassembled bytecode of both of them. And surprise, surprise, the for in loop is a lot fewer bytecode operations. So it's probably faster. That's cool. Cool, right?
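The same while-versus-for comparison can be tried without the JupyterLab plugin by using the standard library's `dis` module. This is a rough sketch; the function names are made up, and the exact instructions and counts depend on the CPython version, so no specific numbers are claimed here.

```python
import dis


def sum_with_while():
    total = 0
    i = 0
    while i < 10:
        total += i
        i += 1
    return total


def sum_with_for():
    total = 0
    for i in range(10):
        total += i
    return total


def instruction_count(func):
    # Static count of bytecode instructions in the compiled function,
    # not the number of instructions executed at run time.
    return sum(1 for _ in dis.get_instructions(func))


while_count = instruction_count(sum_with_while)
for_count = instruction_count(sum_with_for)
print("while loop instructions:", while_count)
print("for loop instructions:", for_count)

# Print the full disassembly of one of them for a closer look.
dis.dis(sum_with_for)
```

A shorter instruction listing is a hint rather than proof that the for loop is faster; timing both versions with `timeit` would be the way to check.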

Starting point is 00:17:43 There's even a demo that shows that you can have Python 3.6 and Python 3.7 running side by side. Yeah, in the same JupyterLab view, you can have different versions of Python with the same code to understand how bytecodes have evolved over time. That's trippy. Yeah, I know. So if you want to understand bytecodes, this is pretty trippy here. So yeah, like you said, if you're teaching people about this kind of stuff, I think this would be an awesome resource. Yeah. Nice.
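To see how bytecode differs between interpreter versions without the plugin, one option is to run the same small script under each Python you have installed. The sketch below compiles the `a`, `b`, `c = a + b` snippet from the demo and lists its opcode names; the exact list is version dependent (for example, older releases emit `BINARY_ADD` where 3.11 and later emit `BINARY_OP`), so the output is deliberately not shown.

```python
import dis
import sys

# The snippet from the animated demo: two assignments and an addition.
source = "a = 1\nb = 2\nc = a + b\n"
code = compile(source, "<example>", "exec")

print("Python", sys.version_info[:2])
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)

# Running the compiled code object confirms it behaves like the source.
namespace = {}
exec(code, namespace)
print(namespace["c"])
```

Saving this as a file and running it with two different interpreters gives a rough, text-only version of the side-by-side view.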

Starting point is 00:18:11 Cool. Yeah. Really good to just dig in and understand it. That's it for our six items this week, Brian. But I was wondering, how is the internet made? Is it like a factory? Are there internet trees? Yeah. I was contemplating whether or not to bring this up, but it's too late now. Yeah, I saw on, I'm a little addicted to Twitter, so

Starting point is 00:18:34 somebody passed around this little video called How the Internet Is Made, and we're going to put a link to it. It's hard to describe, but it's just this complete silliness of these old-time videos of how things are made and stuff. It gets shipped from here to there and gets rolled across the field with barrels and stuff. And it's bizarre, but it made me laugh so hard. It's like an old-timey silent movie with subtitles. It's like a documentary on how the internet is made. So it starts out, it has lots of gears and cambers and things, then eventually it's put into wheelbarrows, if I understand this correctly. Yeah, and it starts in Austria, I believe. So the internet is mined in Austria. It's put into a special internet wheelbarrow, which is pretty trippy. It's like a hovercraft. It's mixed up into like a gray goo and then it's shipped off

Starting point is 00:19:24 along these pipes. Now, anyway, it's a good joke. People can check it out, but it's much more visual. It does reference both Austria and Ireland. I think it's Ireland, even though the map always points to Italy. I didn't notice that. I'm like, this is so off. Pretty awesome. So people, if you need a good laugh, you know, click on that link. It's silent, so it's not going to upset folks at work, right? It's all about just the visuals. Well, I think it was a good one, Brian, and I'm glad I forced you to put it in there. So next week, we've got a kind of a year in review thing that you're putting in, right?

Starting point is 00:20:02 Yeah, absolutely. So you and I had recorded a Talk Python year in review, top 10 Python stories of the year, not just of the week. And that's coming out next time. So be sure to check that out and it'll be a lot of fun. Yeah, nice. Okay. All right, well, thank you for doing all of this

Starting point is 00:20:17 this year with me, Brian. Yeah, thank you. You bet. Bye. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
