Python Bytes - #109 CPython byte code explorer
Episode Date: December 18, 2018
Topics covered in this episode:
1:01 Python Descriptors Are Magical Creatures
3:38 Data Science Survey 2018 JetBrains
8:04 cache.py
11:54 Setting up the data science too...ls
14:03 chartify
15:23 CPython byte code explorer
Extras
Joke
See the full show notes for this episode on the website at pythonbytes.fm/109
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 109, recorded December 10th, 2018.
I'm Michael Kennedy.
And I'm Brian Okken.
And this episode is brought to you by DigitalOcean.
Thank you, thank you, DigitalOcean. Tell you more about them later.
Right now, Brian, how is everything going?
It's going really well. How about you?
Oh, it's super. I'm starting to think about the year in review, like what were the most
amazing Python stories of the year and things like that, so I'm looking forward to sharing those
with everyone.
Actually, yeah, that'd be great.
Yeah, so you and I actually did an episode along with
Dan Bader on Talk Python, which will drop here on this channel as well, for the year in review in
Python news. It's like this, but more stuff and more
depth.
So that'll be a good thing for, you know, all those people traveling for the holidays,
right?
Yeah.
Give them something to listen to.
All right.
They may be stuck in an ice storm at Chicago O'Hare, but they can listen to some good
Python news.
Yeah.
That's right.
Speaking of good Python news, what do you got to kick us off this time?
I have "Python Descriptors Are Magical Creatures."
That sounds awesome.
Yeah, this article is actually kind of a neat approach. I was thinking,
yeah, I know what descriptors and properties are. It's talking about properties of
objects in Python, and it's a really great article. It's by Pablo Arias, and it talks about how you can add getters and setters
and properties to objects, so instead of calling a function like get_version,
you can just have version: you use object.version, and assigning
to that calls the setter, and reading from it calls the getter,
and you can have custom functions for that.
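A minimal sketch of the getter/setter idea using Python's built-in `property`. The `Package` class and its `version` attribute are hypothetical, picked to mirror the get-version example; they are not taken from the article.

```python
class Package:
    """Hypothetical example: `version` reads like an attribute but runs methods."""

    def __init__(self, version: str) -> None:
        self._version = version  # real storage lives in a "private" attribute

    @property
    def version(self) -> str:
        # Runs on read: pkg.version
        return self._version

    @version.setter
    def version(self, value: str) -> None:
        # Runs on assignment: pkg.version = "2.0"
        self._version = value.strip()


pkg = Package("1.0")
pkg.version = " 2.0 "  # goes through the setter, which strips whitespace
print(pkg.version)     # goes through the getter; prints 2.0
```

The caller never writes `pkg.get_version()` or `pkg.set_version(...)`; the plain attribute syntax is routed to the two methods.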
It's one of the cool things about Python.
And one that I'm glad that it's been highlighted,
because some people forget this is around,
especially if you come from a language that doesn't have this sort of thing.
C, Java, that kind of stuff, right?
Yeah.
These are pretty neat. They make it look like an attribute of the object, but it's actually a function that gets called. And it's a way you can migrate: you can start a system where it really is just data sitting there, and later, if you want to intercept it and say, you know, actually, when somebody assigns to this, I want to do some work, or I don't really want to store this data, I want to calculate it on the fly, you can turn those into getters and setters
and the calling code doesn't need to know.
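A sketch of that migration path, using hypothetical `RectangleV1` and `RectangleV2` classes: the first stores `area` as plain data, the second computes it in a property, and the calling code (`rect.area`) is identical for both.

```python
class RectangleV1:
    """First version: `area` is plain data, computed once and stored."""

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height
        self.area = width * height  # goes stale if width or height changes later


class RectangleV2:
    """Later version: `area` is computed on every read; callers don't change."""

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    @property
    def area(self) -> float:
        # No stored value at all: calculated on the fly from the current fields
        return self.width * self.height


for cls in (RectangleV1, RectangleV2):
    rect = cls(2, 3)
    rect.width = 4
    print(cls.__name__, rect.area)  # same calling code for both versions
```

This prints `RectangleV1 6` (the stale stored value) and `RectangleV2 12`.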
Yeah, I really like this
because often the API makes the most sense
as sort of fields, just setting the attributes, right?
Like user.name, user.firstname or something like that.
But what if you want validation, right?
Like the name can't contain whitespace
or other weird stuff.
You want to strip that off, or the username is always lowercase, and things like that.
So properties are perfect for that, right?
You can validate it.
You can raise an exception that says you can't have a None value here.
It has to be a non-empty string, all kinds of good stuff.
But the consumer doesn't care.
They don't have to know.
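A sketch of the validation case, assuming a hypothetical `User` class whose rules (strip, lowercase, non-empty, no internal whitespace, no `None`) are the ones mentioned in the conversation rather than anything from the article:

```python
class User:
    """Hypothetical example: validate and normalize a username on every assignment."""

    def __init__(self, username: str) -> None:
        self.username = username  # routed through the setter, so it's validated too

    @property
    def username(self) -> str:
        return self._username

    @username.setter
    def username(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError("username must be a string, not None or another type")
        cleaned = value.strip().lower()  # strip surrounding spaces, force lowercase
        if not cleaned:
            raise ValueError("username must be a non-empty string")
        if any(ch.isspace() for ch in cleaned):
            raise ValueError("username cannot contain whitespace")
        self._username = cleaned


user = User("  MKennedy ")
print(user.username)  # mkennedy
```

Because `__init__` assigns through the property, an invalid value is rejected at construction time as well as on later assignments.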
Yeah, and I personally have used getters and setters before,
but there's also a deleter, and I don't think I use that very much. It's
probably a neat thing to stick in place if you're doing this anyway: if
it's invalid for somebody to delete an attribute, you may want to intercept that.
So yeah, you're like, no, you always have a name.
You can't delete it.
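A sketch of the deleter, assuming a hypothetical `Person` class where deleting `name` should be refused. `@name.deleter` registers the function that runs on `del person.name`:

```python
class Person:
    """Hypothetical example: a person always has a name, so deletion is refused."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        self._name = value

    @name.deleter
    def name(self) -> None:
        # Runs on `del person.name`; refuse instead of removing the attribute.
        raise AttributeError("a Person always has a name; it can't be deleted")


person = Person("Brian")
try:
    del person.name
except AttributeError as err:
    print(err)
print(person.name)  # still Brian
```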
Yeah, but this is a good general introduction to how to use these
and so people can clean up their code a little bit,
make it look a little less Java-y.
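The article's title refers to descriptors, the protocol `property` itself is built on: a class attribute whose type defines `__get__` and/or `__set__` takes over attribute access on instances. A hedged sketch of a hand-written, reusable descriptor (the `Positive` and `Order` names are made up for illustration; `__set_name__` requires Python 3.6+):

```python
class Positive:
    """A reusable data descriptor that only accepts values greater than zero."""

    def __set_name__(self, owner: type, name: str) -> None:
        # Called once when the owning class is created; pick a storage slot.
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself, e.g. Order.quantity
        return getattr(instance, self.private_name)

    def __set__(self, instance, value) -> None:
        if value <= 0:
            raise ValueError(f"{self.public_name} must be positive, got {value!r}")
        setattr(instance, self.private_name, value)


class Order:
    """Hypothetical class: both fields share one validation rule."""

    quantity = Positive()
    price = Positive()

    def __init__(self, quantity: int, price: float) -> None:
        self.quantity = quantity  # calls Positive.__set__
        self.price = price


order = Order(3, 9.99)
print(order.quantity, order.price)  # 3 9.99
```

Where a `property` handles one attribute, a descriptor class like this packages the getter/setter logic once and reuses it for any number of attributes.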
Yep, I totally agree.
So the next one is I want to talk about a survey.
So we've talked about the JetBrains Python survey
and that data science featured heavily in it.
But they also did a separate data science survey for just data scientists and asked data science questions only.
So they polled about 1,600 data scientists based in the US, Europe, China and Japan,
to figure out what's the story, what's the zen,
and how are people feeling in the data science space right now.
And so it wasn't just for Python.
It was just for data scientists.
But you can imagine that there are many Python things happening
in the data science world, right?
So one of the key takeaways was that currently most people use Python,
and they assume that Python will remain
the primary programming language for at least five years.
Yeah.
And that's essentially forever in computer time.
That's right.
Like, if you're planning past five years, you've got either a lot of faith in where
things are going, or you're doing it wrong.
Those actually could be the same thing.
All right.
And they also talked about what are the main tools people are using for machine learning stuff.
And they said Keras is the main one for professional developers.
Whereas if you're an amateur data scientist, you're more likely to use Microsoft Azure machine learning services rather than libraries.
So you're like, just make this a model.
Teach it stuff.
Figure that out later.
Whereas the pros, in quotes, are actually doing the straight API stuff.
And remind me what Keras is?
Keras is a machine learning framework.
Okay.
Yeah.
So it's sort of comparable to Azure ML, but Azure ML is a service.
Like machine learning as a service.
I haven't ever used it, though.
Okay.
So let's see.
Main programming languages.
Obviously, there are other languages.
And if you look back just a couple years, right, R was a machine learning and data science language that was more popular than Python for data science. But now Python is at 57% and R is only at 15%. Some people say Julia is the next big language for data scientists. So they asked these 1,600 people about Julia,
and the number of people using it was 0%.
So that's not super compelling for Julia, I guess.
At least amongst this data, this statistical set.
Yeah, yeah, yeah.
And honestly, I forgot how they found this set of people.
So I'm sure they talk about it in the write-up.
And then finally, when you talk about IDEs and editors, there were three standout main things people used.
Obviously Jupyter, meaning Jupyter Notebooks and JupyterLab, was 43%.
PyCharm was 38%.
And RStudio was 23%.
So that's pretty interesting.
Yeah.
Yep.
All right.
So if you're in the data science space, maybe this will help you
keep a pulse on what's going on there.
I want to highlight a little tool.
So just like I talked about properties as a nice technique
that people should make sure they understand,
another thing I've run across is memoization,
not memorization, but memoization with no R. This is a technique for when you've got a
function, or some work you need to do, that depends only on its input parameters, but computing the answer is
computationally intensive, and you often get the same parameters coming in again and again.
Memoization basically means you save the result:
calculate it once, and if you get passed the same arguments again,
just return the answer that you've already calculated.
This technique can make your code incredibly fast.
Like if you have some function that you're calling
with a relatively bounded set of inputs,
and it's at all computationally expensive,
or it goes to a service and it gets an answer back.
Like you said, if the input is the only thing that drives it,
it's not like, well, what's the weather at the zip code?
Because that could always change.
But it's like, what's the limit of this integral
when passing in this lower bound,
like discrete integral or something, right?
It's always going to give you the same answer back.
So you can actually go to the function,
even with functools, which is built into Python,
and say, if this function
gets the same arguments, don't run it again,
just give the answer back.
It's kind of stored in memory or somewhere, right?
And that only works in process.
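The standard-library version of this is `functools.lru_cache`. A sketch with a hypothetical `discrete_integral` function standing in for the "same lower bound, same answer" case. The cache lives in the process's memory and disappears when the process exits:

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # keep up to 128 distinct argument combinations in memory
def discrete_integral(lower: int, upper: int) -> int:
    """Hypothetical pure function: sum of x**2 for integers x in [lower, upper)."""
    return sum(x * x for x in range(lower, upper))


print(discrete_integral(0, 1000))      # computed: 332833500
print(discrete_integral(0, 1000))      # same arguments: answered from the cache
print(discrete_integral.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```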
Yeah.
One of the things I wanted to highlight is a project called cache.py
that saves all this stuff off to a file.
This would be helpful, especially if you've got a command line tool
that gets called lots of times.
It isn't going to be able to store everything in memory.
So being able to save it in a file might be helpful.
The interface is just a decorator that says,
hey, this function, you can cache its results.
So you throw a decorator called cache.cache
onto your function and it just works,
and there's a whole bunch of customization you can do.
You can say how long the cache is good for
and where the file should be and things like that,
but the default just kind of works pretty well too.
Yeah, I really like this. The thing is, the built-in stuff only works in memory, so once the process is done, it's done. But like you said, if this is a command line tool you're stringing together and you want it to keep that data for a certain amount of time, or just always keep it, because if you pass me seven the answer is always going to be this, right?
Yeah.
It's great that it'll keep it on the file system. And it uses pickle, right?
I'm not sure. Let's see. Yeah, it currently uses pickle and inspect under the hood, making it not portable.
So you can't take your cache file and move it to Windows when you ran it on Linux or something, I believe.
Yeah, because of memory structure and different versions of Python and so on.
So remind me, what was the built-in one that works in memory?
It's in functools, and it's lru_cache, I believe.
Okay, functools.lru_cache.
Yeah. I brought this up also mostly because I know a lot of people learn on the job or teach themselves to program.
I'm not bragging that I have a computer science degree, but this is one of those topics that you probably don't come up with on your own.
It's a clever thing and a nice, useful tool for your toolbox, but it's not something that's obvious.
It wasn't obvious to me until I learned about it.
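To illustrate the cache.py idea of persisting results to disk, here is a hypothetical `disk_cache` decorator built on `pickle`. It is not cache.py's API (cache.py's own decorator is `cache.cache`, with its own options for expiry and file location); it's a sketch of the same technique, assuming deterministic functions and arguments whose `repr()` identifies them uniquely.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir: str):
    """Hypothetical decorator (not cache.py's API): persist results as pickle files.

    Assumes the wrapped function is deterministic and that repr() of its
    arguments uniquely identifies them (ints, strings, tuples of those).
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a stable file name from the function name and its arguments.
            key_src = repr((func.__module__, func.__qualname__, args, sorted(kwargs.items())))
            key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
            path = os.path.join(cache_dir, key + ".pickle")

            if os.path.exists(path):
                # Cache hit from this run or an earlier one: skip the real work.
                # Only load files you wrote yourself; unpickling untrusted data is unsafe.
                with open(path, "rb") as f:
                    return pickle.load(f)

            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


# Usage: a fresh temporary directory so the demo starts with an empty cache.
demo_dir = tempfile.mkdtemp()


@disk_cache(demo_dir)
def slow_cube(n: int) -> int:
    return n ** 3


print(slow_cube(5))  # computed, then written to demo_dir
print(slow_cube(5))  # read back from the pickle file
```

Pointing `cache_dir` at a fixed location instead of a temp directory is what lets a command line tool reuse answers across separate runs. Pickle output depends on the protocol version and on the pickled classes being importable at load time, which is one reason such cache files don't travel well between machines or Python versions.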
Yeah, same here.
I think the first time I learned about this was when I started studying design patterns and stuff like that, and somehow it came up in there. I'm like, oh, that's pretty clever.
Yeah. When I'm working with code
and it's slow, it seems to me there are two things that are really, really powerful, where you
can just go, oh, well, now it's a hundred times faster. That's cool. And that was like one line
of code. One is fixing the wrong kind of data structure, like if you're using a
list but you really should use a set because you're testing for membership on a big collection,
or dictionaries or whatever. The other one is this kind of caching, right?
Like if you're doing something that takes a long time, even if it's going out to the internet
and calling a service, if you think that data changes once a day, it'd be totally great
to put a one-minute cache on that if you're calling it a bunch of times.
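A sketch of the "one-minute cache" idea: memoization with an expiry time. `ttl_cache` and `fetch_report` are hypothetical names; `fetch_report` stands in for a slow service call and returns a placeholder string.

```python
import functools
import time


def ttl_cache(seconds: float):
    """Hypothetical decorator: memoize results, but only trust them for `seconds`."""

    def decorator(func):
        cache = {}  # maps args -> (time stored, result)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()  # monotonic clock: unaffected by system clock changes
            if args in cache:
                stored_at, result = cache[args]
                if now - stored_at < seconds:
                    return result  # still fresh: skip the slow call
            result = func(*args)  # missing or expired: do the real work
            cache[args] = (now, result)
            return result

        return wrapper

    return decorator


@ttl_cache(seconds=60)  # data changes about once a day, so a minute of staleness is fine
def fetch_report(day: str) -> str:
    # Placeholder for a slow web-service call.
    return f"report for {day}"
```

The only difference from plain memoization is the timestamp check: an entry older than `seconds` is treated as missing and recomputed.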
Yeah. And like you said, it can make a massive speedup.
And it's sort of obvious, you know, after you see it. You're like, well, yeah,
duh, I didn't even think of that.
Absolutely. So I think this is a cool one
because it takes that idea and just makes it easy to carry across different processes
or different runs of the same process. Okay. So before we get to the next one, let me just tell you all about DigitalOcean.
They're doing all sorts of cool stuff.
Our infrastructure runs on it.
Really, really nice and reliable.
One of the things I want to highlight this time is their work with Kubernetes.
Orchestrating Docker containers with Kubernetes is a big deal these
days.
And so they're launching a new Kubernetes cluster over at DigitalOcean.
So a really nice way to manage and deploy your container workloads in the cloud.
And if you go to pythonbytes.fm slash DigitalOcean and you're a new user,
you get $100 credit to Kubernetes all the way if you want.
You can run a lot of Kubernetes for $100 in the cloud.
So that's pretty awesome.
Yeah, very cool.
Yeah, so check them out, pythonbytes.fm/digitalocean.
They're big supporters of the show, and they keep us going strong each week, don't they?
Yeah, I'm very grateful for them.
Yep.
The next one I want to tell you about is a really short video.
Last week, I covered an hour and a half video about being an expert on Python.
How about we cut this down to like a four-minute one?
So I think this one is really good for people who are getting into data science and
are having a bit of a challenge.
If you're an expert, this is definitely not the video for you.
But this is called setting up the data science tools.
And so it's part of a larger video series.
But it basically shows you how to set up the Anaconda distribution, TensorFlow, Keras,
Jupyter, all those things, and it actually talks about using conda virtual
environments, creating notebooks, and switching between virtual environments. So if you've been
mostly working with pip, or you see examples using pip, and you want to do more Anaconda stuff,
this is a great video. And especially if you want to install some of these tools and
you're kind of new, this is a great way to get going.
That's awesome.
Yeah, cool.
Yeah, it's great.
I was just talking to somebody
who was really new to Python
and super eager to get going,
but he was having a problem
because he was working on a computer
that he didn't have admin access to.
And so when he would try to pip install something,
it would try to put it in the system-wide location,
which requires admin rights.
You shouldn't do that,
but if you really wanted it to happen,
you could use sudo, but he wasn't allowed to, you know,
basically run as admin to do that.
Right?
So I'm like, oh, you just need to use a virtual environment.
Then you can install whatever you want without touching the system.
It's like, oh, wonderful.
Right?
So I think, you know, it might sound like old hat to folks
that have been doing it for a long time, but when you're new to it,
like that's not obvious, right?
Like, my Python packages won't install.
Well, if you had a virtual environment, it would,
or you did these other steps, it would, right?
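A sketch of the virtual-environment fix using the standard library's `venv` module (the shell equivalent is `python -m venv <dir>`). The environment is created in a directory you own, so installing into it needs no admin rights. `make_env` and `in_virtualenv` are hypothetical helper names:

```python
import os
import sys
import tempfile
import venv


def make_env(path: str, with_pip: bool = True) -> str:
    """Create a virtual environment at `path`; return the path of its Python executable."""
    # Everything is written inside `path`, a directory you own, so no sudo/admin is needed.
    venv.create(path, with_pip=with_pip)
    if sys.platform == "win32":
        return os.path.join(path, "Scripts", "python.exe")
    return os.path.join(path, "bin", "python")


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    demo_env = os.path.join(tempfile.mkdtemp(), "demo-env")
    print(make_env(demo_env, with_pip=False))  # skip pip so the demo is quick
    print("running inside a virtual environment:", in_virtualenv())
```

After creating an environment with pip, running `<env>/bin/python -m pip install <package>` (or `<env>\Scripts\python.exe` on Windows) installs into that environment only. Conda environments, as covered in the video, are managed with conda's own commands instead.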
Right, and also somebody like me
that is used to virtual environments,
it's still not obvious how to do that
in an Anaconda environment.
Exactly.
I have to look it up every time
because I'm all about pip.
And I was like, wait a minute,
it's a different way to activate.
It's like a global activate command.
Where's the list?
How do I know what exists?
Yeah, it's different.
So I'm sure I could actually use this as well.
Cool.
Beginner means beginner to Anaconda and data science tools.
Not true beginner, right?
Yeah.
Awesome.
All right.
Speaking of data science, I bet data science, data scientists draw a lot of graphs, right?
Yeah.
Well, lots of people draw a lot of graphs.
Last time I tried to use Boca or Bokeh, I keep saying that wrong.
You don't need to email me that I'm saying it wrong.
I know it's Bokeh.
It's B-O-K-E-H.
It is a very powerful charting tool.
I believe it's not the simplest interface to figure out as a newbie.
And it's not like Matplotlib is super easy either, but a lot of people know
about it. But Bokeh, yeah, it's not bad. It's just if you're a beginner, maybe there's an easier way.
And this is the easier way. One of the easier ways is a package called Chartify that simplifies a lot
of the defaults. And it's built on top of Bokeh. So if you've got some data and you want to throw it into a chart, this is a nice way to do it.
It fills in a whole bunch of the defaults so that charts start out fairly
pretty. So it simplifies the Bokeh API for newbies.
Oh, that's great. I do find Bokeh a little overwhelming because you can do everything,
right? You can specify so much detail. Sometimes I'm just like, you know, I could just use
a histogram. Wouldn't that be awesome? Can we just do a histogram?
Yeah. And I want to be able to pick the colors fairly easily, or I don't
really care, I just want it to look nice.
Yeah, they also have a bunch of nice example
notebooks and stuff that walk you through using it, so yeah, it's a great little resource.
Speaking of Jupyter and examples and notebooks and stuff, I want to stick with that for the last one here.
And it's called the CPython bytecode explorer. Most people probably know this at least at some
level, but I'm sure not everyone does. When you run your Python code, CPython loads it up and
compiles it to bytecode. And you're like, wait, what? Python's interpreted. It's not
compiled. Well, it compiles your source code into bytecode, and those bytecodes are then interpreted
by CPython, which is basically a big loop that just runs. It goes, okay, what's the next bytecode?
Let's do that. So it helps to understand what those bytecodes are. How complex is something? Is it
an atomic operation, or does it take multiple steps?
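You can look at this bytecode without any plugin using the standard library's `dis` module. In this sketch, `opnames` is a hypothetical helper. A single line like `c = a + b` compiles to several instructions (load `a`, load `b`, add, store `c`). Exact opcode names vary by version; for example, the add is `BINARY_ADD` before Python 3.11 and `BINARY_OP` from 3.11 on.

```python
import dis


def opnames(source: str) -> list:
    """Compile `source` as a module, like CPython does, and list its opcode names."""
    code = compile(source, "<snippet>", "exec")
    return [instr.opname for instr in dis.get_instructions(code)]


snippet = "c = a + b"
dis.dis(compile(snippet, "<snippet>", "exec"))  # full human-readable disassembly
print(opnames(snippet))
```

Compiling doesn't need `a` and `b` to exist; the names are only looked up when the bytecode actually runs.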
All of those things you might wonder about. So this was sent to us by Anton Helm, and it's created
by Jeremy Tuloup. It's a plugin for JupyterLab, not Jupyter
Notebooks, but the more full-featured JupyterLab, and it lets you look at the bytecode
of the various operations you're working on. So if you pull up the link there, Brian,
you can see there's a little animated GIF that shows what's happening. It creates
an a, a b, and c equals a plus b, and on the right, as you type, it shows you the
bytecode for those. So I think this is a great way to explore
Python if you want to understand more about this low-level bytecode thing.
Yeah, this would be awesome just in teaching, especially if you're going to talk about
how names work in Python. This would be kind of fun to use to see how it all
points to the same thing and whatnot.
So yeah, another example that's cool: if you go to the very bottom, there's a bunch of
little animated GIFs, and the very bottom one shows two ways of looping over
the numbers zero to nine. You can either do it with a while loop, where you create the loop
and have, you know, i less than 10,
i plus equals one,
or you could just say for i in range of zero to 10.
And they show it side by side
comparing the disassembled bytecode
of both of them.
And surprise, surprise,
the for in loop
is a lot fewer bytecode operations.
So it's probably faster.
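A sketch of the while-versus-for comparison with `dis`, using the two loops described above. Exact counts differ between Python versions, so it prints them rather than promising numbers. The main difference is per iteration: the `while` loop executes its comparison, increment, and jump as separate bytecodes every time around, while the `for` loop's counting happens inside `FOR_ITER` on a `range` iterator implemented in C.

```python
import dis

WHILE_LOOP = """
i = 0
while i < 10:
    i += 1
"""

FOR_LOOP = """
for i in range(10):
    pass
"""


def opnames(source: str) -> list:
    """Compile `source` as a module and list the opcode names it contains."""
    code = compile(source, "<loop>", "exec")
    return [instr.opname for instr in dis.get_instructions(code)]


for label, src in (("while", WHILE_LOOP), ("for", FOR_LOOP)):
    ops = opnames(src)
    # Static count for the whole snippet; numbers vary across Python versions.
    print(f"{label}: {len(ops)} instructions: {' '.join(ops)}")
```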
That's cool.
Cool, right?
There's even a demo that shows you can have Python 3.6 and Python 3.7 running side by side.
Yeah.
In the same JupyterLab view, you can have different versions of Python with the same code, to understand how bytecodes have evolved over time.
That's trippy.
Yeah, I know. So if you want to understand bytecodes, this is pretty trippy. And like you said, if you're teaching people about this kind of stuff, I think this
would be an awesome resource.
Yeah.
Nice.
Cool.
Yeah.
Really good to just dig in and understand it.
That's it for our six items this week, Brian.
But I was wondering, how is the internet made?
Is it made in a factory?
Is it like, are there internet trees?
Yeah. I was contemplating whether or not to bring this up, but it's too late now. I'm a little addicted to Twitter, so I saw
somebody passing around this little video called "How the Internet Is Made," and we're going to put
a link to it. It's hard to describe, but it's just complete silliness built out of old-time videos of how things are made
and stuff. It gets shipped from here to there and gets rolled across the field in barrels and
stuff. It's bizarre, but it made me laugh so hard.
It's like an old-timey silent movie with subtitles, like a documentary on how the internet is made. It starts out with lots of gears and cambers and things, and then eventually it's put into wheelbarrows, if I
understand this correctly.
Yeah, and it starts in Austria, I believe. So the internet
is mined in Austria, and it's put into a special internet wheelbarrow, which is pretty trippy.
It's like a hovercraft. It's mixed up into a gray goo, and then it's shipped off
along these pipes.
Anyway, it's a good joke. People can check it out, but it's much more visual. It does reference
both Austria and Ireland, I think, even though the map always points to
Italy.
I didn't notice that. I'm like, this is so off. Pretty awesome. So people, if you need a good laugh, you know, click on that link.
It's silent, so it's not going to upset folks at work, right?
It's all about just the visuals.
Well, I think it was a good one, Brian, and I'm glad I forced you to put it in there.
So next week, we've got a kind of a year in review thing that you're putting in, right?
Yeah, absolutely.
So you and I had recorded a Talk Python year in
review, top 10 Python stories of the
year, not just of the week. And
that's coming out next time. So be sure to
check that out and it'll be a lot
of fun. Yeah, nice. Okay.
Alright, well, thank you for doing all of this
this year with me, Brian. Yeah, thank you.
You bet. Bye.
Bye.
Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured, just visit PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.