Python Bytes - #191 Live from the Manning Python Conference
Episode Date: July 22, 2020
Topics covered in this episode: VS Code Device Simulator; pytest 6.0.0rc1; What is the core of the Python programming language?; Extras; Joke
See the full show notes for this episode on the website ...at pythonbytes.fm/191
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 191, recorded July 14th, 2020.
I'm Michael Kennedy.
And I'm Brian Ocken.
And welcome special guest, Ines.
Hi.
It's great to have you here.
So I want to kick this off with a cool IoT thing.
Now, IoT and Python, they've got a pretty special place. Because when I think about Python, I think of it as not being something that sort of
competes with assembly language and really, really low level type of programming for small
devices.
But amazing people put together MicroPython, which is a reimplementation of Python that
runs on little tiny devices.
And we're talking like $ dollar microchip type devices, right?
Have either of you played with these?
No.
No, I haven't, but I've been seeing a bit of this from my brother. He's pretty amazing. He's a bit younger than me, he's an event technician, and he recently taught himself programming just so he can build stuff on these tiny Raspberry Pis. I don't know, he's doing super advanced stuff. It's been really interesting to see him learn to program, and he's also incredibly good. He has amazing instincts about programming even though he's never done it before. So I've been kind of watching this from afar, and it made me really want to build stuff. So I'm very curious.
Yeah, I've done the CircuitPython on some of the Adafruit stuff.
Exactly. So I always just want to build these things.
I'm like, what could I think of
that I could build with these cool little devices?
I just, in my world, I don't have it.
Maybe if I had a farm, I could like automate,
you know, like watering or monitoring the crops
or if I had a factory,
but I just don't live in a world
that allows me to automate these things.
Do you have pets?
Maybe you can build something for pets.
We generally don't have pets, but we are fostering kittens for the summer. So I could put a little
device onto one of the kittens potentially. GPS tracker. Yeah. So in general, you have to get
these little devices, right? You've got the US PyCon, we got the Circuit Playground Express,
which is that little circular thing.
It's got some 10 LEDs and a bunch of buttons and other really advanced things like motion sensors and temperature and so on.
Probably the earliest one of these that was a big hit was the BBC micro:bit, where I think every seventh grader in the UK got it.
Some grade around that scale got one of these.
And it really made a difference in kids seeing themselves as a programmer. And interestingly, especially women were more likely to see programming as something they
might be interested in, in that group where they went through that experience.
So I think there's real value to work with these little devices, but getting a hold of
them can be a challenge, right?
You've got to physically get this device.
That means you have that idea of, I want to do this thing and then I have to order it from Adafruit
or somewhere else and then wait for it to come. And my experience has been, I'll go there and I'm
like, oh, this is really cool. I want one of these. Oh, wait, no, it's sold out right now.
You can order it again in a month, right? So getting it is a challenge. And also if you're
working in a group of say, like you want to teach a high school class or a
college class or something like that, and you want everyone to have access to these,
well, then all of a sudden, the fact that maybe it costs $50 wasn't a big deal.
But if it's $50 times 20 or 100 kids, then all of a sudden, well, maybe not.
So I want to talk about this thing called Device Simulator Express.
So this is a plugin or extension, I think it's extensions that VS Code calls them, that makes VS Code do more stuff. And it's an open source, free device simulator. So what you can do is you just go to the Visual Studio Code extensions thing and you type Device, which is probably sufficient, but Device Simulator Express, and it'll let you install this extra thing inside of VS Code that is really quite legit. So it gives you a simulated Circuit Playground Express, a simulated BBC micro:bit, and the most impressive to me is the CLUE from Adafruit, which actually has a screen that you can put graphics on.
So really, really cool way to get these little IoT devices with CircuitPython, so Adafruit's fork of MicroPython, on there. What do you guys think? See that picture? Look how cool that is.
Yeah, so you can write Python in one tab and then just have the visualization
in the other.
That's pretty cool.
Yeah.
Yeah, exactly.
And it's very similar to, say, what you might do with Xcode and iPhones, where you have
an emulator that looks quite a bit like it or what you would do on the Android equivalent.
I actually think this is a little bit better than the device because it's actually larger,
right?
Like the device is really small, but here it could be like a huge
thing on your 4K monitor with a little CLUE device. So you can simulate Circuit Playground Express, BBC micro:bit, and the CLUE in here. And we just say new project and it'll actually write the boilerplate code for the main.py or code.py or whatever it's called that the various things are going to run. And like you said, Ines, on one half it's got the code
and the other half has the device that you can interact with.
I was thinking that a couple of cases that would be great is,
like you were saying, trying to get a hold of it,
but you might not even know if the concept that you're going to use
is really going to work for the device you're thinking of.
So this would be a good way to try it out,
to try out whether the thing you're thinking of trying for your house or whatever would actually work for this device.
The other thing was you brought up education and that it's big.
I was thinking about a couple of conferences
where they tried to do the display
and try to have like a camera or something.
Sometimes it works and sometimes it doesn't.
This way you could just do a tutorial or a teaching scenario, and everybody could see it because it's just going to be displayed on your monitor.
Right, your standard screen sharing would totally work here. That's a good point as well. Yeah, and it doesn't have to be all or nothing. What's really interesting is this thing isn't just an emulator, but you can do debugging. You can set a breakpoint and step through it running on the device, simulated, or you can actually run it.
If you had a real device plugged in,
you can run it on there as well and then do debugging and breakpoints
and stuff on the actual device.
So it's like you test it here.
I always admire people who actually use
like the proper debugging features.
I know VS Code has like so much of this
and I'm always like, I should use this more,
but I'm like, okay, print.
Print, print.
Yeah, there's some really cool libraries that will actually do that. I can't remember what it's called, but Brian and I recently covered one that would actually print out a little bit of your code and the variables as they change over time. It was like the height of the print debugging world. That was really, really cool. I wish I could remember. Do you remember, Brian?
No, we actually covered a couple of them.
I know, I know. That's a problem. We cover thousands of things in here. So another thing that's interesting is, okay, so you see the device. Some of them have buttons and they have lights, and you can imagine maybe you could touch the button. But they also have things like temperature, gyrometer type things, or like you move it, or motion sensing, or even like if you shake it,
this thing has little ways to simulate all that stuff.
So you can like have a temperature slider
that freaks it out and says,
hey, the temperature is actually this
on your temperature sensor and so on.
So all the stuff that the devices simulate
are available here.
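The sensor-slider idea can be sketched in plain Python. This is not the extension's API and not CircuitPython; `FakeBoard` and `color_for_temperature` are hypothetical names made up for illustration. The only point is that device logic written against a small interface can be fed hand-set sensor values, much as a simulator's temperature slider does.

```python
from dataclasses import dataclass


@dataclass
class FakeBoard:
    """Hypothetical stand-in for a board; the temperature is set by hand,
    much like dragging a slider in a simulator."""
    temperature: float = 20.0  # degrees Celsius


def color_for_temperature(celsius: float) -> tuple:
    """Map a temperature to an (R, G, B) color: blue when cold, red when hot."""
    if celsius < 15.0:
        return (0, 0, 255)
    if celsius < 28.0:
        return (0, 255, 0)
    return (255, 0, 0)


board = FakeBoard()
for reading in (5.0, 21.5, 35.0):
    board.temperature = reading  # "move the slider"
    print(reading, color_for_temperature(board.temperature))
```

Keeping the decision logic in a plain function like this means it can be exercised without any hardware at all; the thresholds here are arbitrary.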
Oh, that's cool.
Yeah.
So I actually had the team over on TalkPython not long ago.
So people can check that over at talkpython.fm.
And yeah, I'm also really excited about what you've got coming here next, Brian. What is that?
Yeah, well, speaking of, I guess, debugging versus test. We didn't really talk about testing.
Anyway, I'm really excited. We should have talked about testing.
Yeah, so I was thinking that I hardly ever use a debugger for my source code, but I use a debugger all the time when I'm debugging my tests.
I don't know.
It's just something different about it.
But I've been running a lot of tests and debugging a lot of tests lately
because PyTest 6, the release candidate, is out.
Now, by the time this episode airs,
I don't know if the final release will be out
or just the release candidate still,
but you can install it.
We'll have instructions in the show notes,
but essentially you just have to say 6.0.0 RC1
and you'll get it.
So there's a whole bunch of stuff
that I'm really excited about.
There's a lot of configuration
that you used to be able to put in lots of places
in your pytest.ini
or your setup.cfg
or tox.ini or something,
PyTest 6 will support pyproject.toml now.
So if you jumped on the toml bandwagon,
you can stick your PyTest configuration in there too.
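As a rough illustration of what that looks like, the sketch below writes a small pyproject.toml into a throwaway directory. The table name `[tool.pytest.ini_options]` is the one pytest 6 reads from pyproject.toml; the option values are only example choices, and the release candidate itself can be installed with `pip install pytest==6.0.0rc1`.

```python
import tempfile
from pathlib import Path

# Example pytest settings as they would appear in pyproject.toml.
# The option values below are illustrative, not recommendations.
PYPROJECT_TOML = """\
[tool.pytest.ini_options]
minversion = "6.0"
addopts = "-ra"
testpaths = ["tests"]
"""

# Write the file into a temporary directory so the example touches nothing real.
project_dir = Path(tempfile.mkdtemp())
config_path = project_dir / "pyproject.toml"
config_path.write_text(PYPROJECT_TOML)

print(config_path.read_text())
```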
There's a lot of people excited about the type annotations.
So the 6.0 is going to support type annotations.
So it actually was a lot of work. There was a volunteer that went through and added type annotations to a
bunch of it,
especially the user-facing API.
And why this is important is if you're type checking,
you're running mypy or something over your source and everything
in your project,
why not include your tests?
But if PyTest doesn't support types, it doesn't really help you much.
So it will now, so that's
a really, really cool addition.
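A minimal sketch of what "including your tests" in type checking can mean: the test function below is annotated like any other code, so a checker such as mypy can verify its calls. The function names are hypothetical, and the example avoids importing pytest so it runs on its own.

```python
from typing import List


def total_length(words: List[str]) -> int:
    """Return the combined length of all words."""
    return sum(len(word) for word in words)


def test_total_length() -> None:
    # Annotating the test lets a type checker verify this call
    # against the signature of total_length above.
    words: List[str] = ["py", "test"]
    assert total_length(words) == 6


# pytest would collect the test automatically; it is called here
# only so the example runs standalone.
test_total_length()
```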
This is basically the API
of PyTest itself is now
annotated with types? Yes, and
a lot of the internal code as well.
So they actually went through and did a lot.
It was a lot of work.
If you look at the conversation chain,
it was a
month, several month
project.
Wow.
What does that mean for
compatibility?
Does that make PyTest
like 3.6 only and
above?
I think the modern
versions of PyTest
really already are 3.6
and above.
I'm not sure about
that.
Right.
So then the door was
opened to use that
because otherwise it
would...
I mean, it would be a
weird move to like release a completely new version
with Python 2 backwards compatibility.
Like, you wouldn't do that, right?
I mean, I think the message it sends is not great.
I totally agree.
There is a pinned version of PyTest, I don't remember which one it is,
that still supports 2.7 if you're on it,
but no new features are going in there.
The thing I'm really excited about
is a little flag they've added called --no-header.
So don't use this.
Most people don't use this.
When you run PyTest, it prints out some stuff
like the version of Python, the version of PyTest,
all the plugins you're using,
a bunch of information about it. All this stuff is really important for logging. If you're capturing the output to save
somewhere or do a bug report or something, that information is great to help other people
understand it. What I don't like about that is that it's not helpful if you're writing tutorials or
if you're writing code to put on a slide or something.
All that extra stuff just takes up space and it distracts.
Yeah, like I've had students say, like, I ran it, I think, PyTest in PyCharm.
And it has like some kind of output just stating where it is and what it's doing.
They're like, this didn't work for me.
I'm like, well, that was just random output from the tool.
You're not actually supposed to try to run that part.
You know what I mean?
But it's, I mean, I saw why they saw that.
But at the same time, like, the ability to just say, like, these details don't matter in the long term is great.
Yeah, so I'm excited about that, to trim it down.
There was a plugin called TLDR.
Too long, didn't read.
But it actually didn't take enough of the header off than I wanted.
So I had my own tool that would do this,
but now I've got this, which is great.
So a lot of the configuration, there is a chance for human error
if you type something wrong and you type a variable name wrong.
And so I really like this new flag called --strict-config,
which will throw an error if you have the PyTest section of your configuration
has something that it doesn't recognize.
And it probably is just you misspelled some variable or something.
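The two flags mentioned here, `--no-header` and `--strict-config`, are real pytest 6 command-line options. The helper below is a hypothetical convenience wrapper, not part of pytest; it only assembles and prints the command line, without running it.

```python
import shlex
import sys
from typing import List


def build_pytest_command(for_slides: bool, strict: bool = True) -> List[str]:
    """Assemble a pytest command line (hypothetical helper, not part of pytest)."""
    command = [sys.executable, "-m", "pytest"]
    if for_slides:
        # Hide the banner with the Python version, pytest version and plugins.
        command.append("--no-header")
    if strict:
        # Turn configuration warnings, such as unknown keys, into errors.
        command.append("--strict-config")
    return command


print(shlex.join(build_pytest_command(for_slides=True)))
```

The same flags could instead go into the `addopts` setting of the configuration file, so they apply on every run.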
Yeah, that's good to know.
And not to, I can't remember the version, but it was, I think it was in PyTest 5.
They added some code highlighting stuff that...
Yeah, that's super cool.
I discovered that just the other day.
I just somehow updated all my dependencies in some environment and suddenly PyTest output was colored, and I was like...
The diff comparisons on PyTest are wonderful, but apparently they didn't do recursive comparisons of dataclasses and
attrs classes,
but now they do.
So that's neat.
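To make the nested case concrete, here is a small sketch with one dataclass holding another. The class names are invented for illustration. When a comparison like this fails, a recursive diff can point at the nested field that differs instead of only saying the outer objects are unequal.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point


def test_segments_match() -> None:
    expected = Segment(Point(0, 0), Point(3, 4))
    actual = Segment(Point(0, 0), Point(3, 4))
    # If `end` differed, a recursive diff would report the nested field
    # rather than only that the two Segment objects differ.
    assert actual == expected


# Called directly so the example runs without pytest installed.
test_segments_match()
```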
There's a whole bunch of new features.
There's fixes.
I ran through some of the features I really liked.
There are deprecations, and it's a large list of breaking changes and deprecations. That's why they went to a new number, PyTest 6.
But I went through the whole list and I didn't see anything that was like, oh, that's going to stop me, I'm going to have to change something.
Okay, that's good to know. I mean, if you say, oh, there was nothing that we're using, I feel confident that maybe there's nothing in my code either.
And I knew that somebody was going to ask, is my PyTest book still valid? Yes, it is.
I'm going through it right now. I haven't gone through the whole thing yet to make sure. The side that is not compatible is not the book. The book's fine. I have a plugin that now is broken. So pytest-check still works, but, wow, this is a corner case, if you depend on pytest-check and the xfail feature of it,
it doesn't work right now.
So I'll have to fix that.
So you would say xfail fails temporarily?
Yeah.
It actually marks everything as a pass.
So if you mark xfail.
Oh, wow.
That's like xfail-ception.
It's really bad.
Anyway, I'll have to get back to that.
Yeah, this is really exciting that PyTest 6 is out.
Super cool.
I know that there were some waves,
some uncertainty in the ecosystem.
So it sounds like that got ironed out.
Things are going strong.
New versions coming out.
I even saw that Guido had retweeted the announcement
and said, yay, type annotations coming in PyTest.
Of course, he's been all about type annotations these days. We'll come back to that later in the show, actually. So Ines,
I know you work a lot with text, but are you frustrated with it? What's the story of this
name here?
Oh, my point of the day. Yeah. I thought I'd present something from my space,
obviously.
Awesome.
Yeah. There's this new framework that I came across, and it's called TextAttack. It's a framework for adversarial attacks and data augmentation for
natural language processing. So what are adversarial attacks? You might
have actually seen a lot of examples of it. For instance, an image classifier that predicts
a cat or some other image, even though you show it complete noise and you somehow trick the model.
Or you might've seen people at protests
wearing like funny shirts or masks
to trick facial recognition technology.
So really to trick the model into,
to like, you know, not recognize them.
Or the famous example of Google Translate
suddenly hallucinating these crazy Bible texts.
If you just put in some complete gibberish,
like just gah, gah, gah, gah,
and then it would go like,
the Lord has spoken to the people, stuff like that.
That's amazing.
I include a link to an article by a researcher
who explains why this happened and shows the example.
It's pretty fascinating.
But I think it all comes down to the fundamental problem
of how do you understand a model that you train? And what does it mean to understand
your model? And how does it behave in situations when it suddenly gets to see something that it
doesn't expect at all? Like, gah, gah, gah, what does it do? And the thing with neural network
models is you can't just look at the weights. They're not linear. You can't just look at what your model is. You have to actually run it. And so this library,
TextAttack, lets you actually try out different types of attacks from the academic
literature and different types of inputs that you can give a model to see whether it produces
something that you're like not happy with, or that's like really weird and exposes some problems in your model.
And it also lets you then, because normally what's the goal?
The goal is, well, you do that and then you find out,
oh, damn, like if I suddenly feed it this complete nonsense,
or if I feed it Spanish text,
it like goes completely in the wrong direction
and suddenly predicts stuff that's not there.
And if you deployed that model into a context
where it's actually used, that would be pretty terrible.
And there are much worse things that can be happening.
So you can also create more robust training data
by replacing words with synonyms.
You can swap out characters and just see how the model does.
So I thought that was very cool.
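The character-swapping idea can be illustrated without any library. The sketch below is plain Python and deliberately does not use TextAttack's API; `swap_adjacent_chars` and `perturbations` are hypothetical helpers that generate slightly corrupted variants of a sentence, which could then be fed to a model to see how it reacts.

```python
import random


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Swap one random pair of neighboring characters inside one word."""
    words = text.split(" ")
    # Only words with at least two characters can have a swap.
    candidates = [i for i, word in enumerate(words) if len(word) >= 2]
    if not candidates:
        return text
    index = rng.choice(candidates)
    word = words[index]
    pos = rng.randrange(len(word) - 1)
    words[index] = word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
    return " ".join(words)


def perturbations(text: str, count: int, seed: int = 0) -> list:
    """Generate `count` perturbed variants of `text`, reproducibly."""
    rng = random.Random(seed)
    return [swap_adjacent_chars(text, rng) for _ in range(count)]


for variant in perturbations("the movie was surprisingly good", count=3):
    print(variant)
```

Seeding the random generator keeps the variants reproducible, which matters if a failing input needs to be looked at again later.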
And yeah, in general, I think adversarial attacks, it's a pretty interesting topic. And yeah.
Yeah, it's super interesting. So the idea is basically you've trained up a model on some text
and for what you've given it, it's probably working. But if you give it something you
weren't expecting, you want to try that to make sure that it doesn't go insane.
Yeah, exactly. And it can expose very unexpected things like the Bible text,
for example, that sounds really bizarre when you first hear it. But one explanation for that
would be that, well, especially it happens in low-resource languages where, you know, we don't have
much text, and especially not much text translated into other languages. But there's one type of text
that has a lot of translations available. And that's the Bible. And so there are parallel corpora where you have one text,
one line in English, one line in Somali, for example.
And then people train their models on that.
But one thing that also is very specific about Bible text
is that Bible text has some words that really only occur in the Bible text.
It uses some really weird words.
So what your model might be learning is,
if I come across a super unexpected word
that's really, really rare,
that must be Bible.
And also the objective is,
you want your model to output a reasonable sentence.
So the model's like, well, okay,
if that's the rare word,
then the next word needs to be something that matches.
And then you have this bizarre sentence from the Bible,
even though you typed in ga-ga-ga.
And that happens.
Yeah, how funny.
Yeah.
Yeah, so it looks like they have actually
a bunch of trained models already
at the TextAttack model zoo, they call it, I guess.
Yeah, everything's called the model zoo.
Yeah, and so you can just take these
and run it against it,
like the movie reviews from Rotten Tomatoes or IMDb
or the new set or Yelp,
and just give it that kind of data
and see how it comes out, right?
Exactly, yeah.
I think that's pretty cool.
And yeah, and then you can actually,
you can also generate your own data
or load in your data and generate data
that maybe produces a better model
or like covers things that your model
previously couldn't handle at all.
So that's the data augmentation part.
Yeah, that's all very important.
And I think it's also very important to understand
the models that we train and, you know,
really try them out and think about like,
what do they do and how are they going to behave
in like a real world scenario that we care about?
Because, yeah, the consequences.
As soon as you're making decisions on this data, right?
On these models.
Yeah.
I guess as soon as a human is convinced that the model works and they start making decisions on it, right, that could go bad if the situation changes or the type of data. And especially if the model is bad.
Like I'm always saying, well, people are always scared of these dystopian futures where we have AI that can, I don't know, know anything about us and predict anything and works.
But the real dystopia is if we have models
that kind of don't work and are really shit,
but people believe that they work,
that's much more.
It's not even about whether they work,
it's about whether people believe it.
And then that's where it gets really bad.
Yeah, and that's way more likely.
Yeah, yes.
It's a more difficult world to test this sort of stuff to figure out.
What does it mean for a model to be bad?
How do you tell if it's bad?
And models can both work with some data sets and produce gibberish with others, or, yeah, I guess in this case, the reverse: not produce gibberish if you pass in gibberish.
Yeah.
Actually, yeah, I just realized it ties in very well with the PyTest point earlier.
And just like, yep, machine learning is quite special in a way that it's code plus data.
Code, you can test, you can have a function.
And you're like, yay, that comes in.
That's what I expect out.
Easy, write a test for it.
You know, it's not that easy.
Testing is hard, but like fundamentally, yeah.
It's somewhat deterministic.
Yeah.
Right.
Like, and even if it's not, there's like something you can, you know, test around it and it's
much harder with the model.
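One way to "test around" a model is to check that its prediction stays the same when the input changes in ways that should not matter. The sketch below uses a deliberately naive keyword classifier as a stand-in for a trained model; `toy_sentiment` and `is_stable` are hypothetical names, and a real model would be swapped in where the toy one is.

```python
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "terrible"}


def toy_sentiment(text: str) -> str:
    """A deliberately naive keyword model, standing in for a trained one."""
    tokens = text.lower().split()
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


def is_stable(model, original: str, variants: list) -> bool:
    """True if every variant gets the same label as the original input."""
    expected = model(original)
    return all(model(variant) == expected for variant in variants)


original = "the food was great"
print(is_stable(toy_sentiment, original, ["the food was excellent"]))  # synonym swap
print(is_stable(toy_sentiment, original, ["the food was graet"]))      # typo
```

The synonym keeps the label, while the typo drops the toy model to "neutral", which is exactly the kind of brittleness such a check is meant to surface.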
Yeah.
Yeah, for sure.
All right.
Before we get to the next item, just want to let you know, this episode is brought to
you all by us over at TalkPython Training.
We have a bunch of courses.
You can check them out, and we're actually featured in the Humble Bundle that's running, the Python Humble Bundle, right now. So if you go to talkpython.fm/humble2020, you can get $1,400 worth of Python training, tools, and whatnot for 25 bucks. So that's a pretty decent deal. And Brian, you mentioned your book before. Tell people about your book real quick.
Yeah. So Python Testing with PyTest is a book I wrote, and it's still very valid,
even though it was written a few years ago. The intent was the 80% of PyTest that you will
always need to know for any version of PyTest. And I've had a lot of feedback from people saying
a weekend of skimming this makes it so that they understand how to test.
It's a weekend worthwhile.
Yeah, absolutely.
And Ines, you want to talk a little bit about Explosion just to let people know?
Yeah, so I mean, some of you who are listening to this
might know me from my work on spaCy,
which is an open source library for NLP in Python,
which I'm one of the core developers of.
And yeah, that's all free open source.
And we're actually just working on the nightly version
or the pre-release of spaCy 3, which is going to have a lot of exciting features. I might also
mention a few more things later on. And yeah, so maybe that's already going to be out by the time
this podcast officially comes out. Maybe not. I don't want to overpromise. But yeah, you can
definitely try that out. And we
also recently released a new version of our annotation tool Prodigy, which comes with a lot
of new features for annotating relations, audio, video. And the idea here is, well, once you get
serious about training your own models, you usually want to create your own data sets for
your very specific problems that solve your problems. And often, the first idea you have
might not be the best one. It's a continuous process. You want to develop your data. And Prodigy was really designed
as a developer tool that lets you create your own datasets with a web app, a Python backend,
you can script. That's our commercial tool. That's how we make money. And it's very cool to see
a growing community around this. So yeah, that's what we're doing. We have some more cool stuff
planned for the future.
So stay tuned.
Yeah, people should check it out.
Actually, you and I talked on TalkPython 202
about building a software business and entrepreneurship.
You had a bunch of great advice.
So people might want to check that out as well.
Do you actually know these episode numbers by heart
or did you look that up before?
Some of them I know, but that one I used the search.
Okay.
I remember you were on there.
I remember what it was about, but not the number.
I just put together that I know two people from
Explosion, so that's interesting.
Yeah, Sebastian.
Yeah, he was on your podcast
recently, which I feel really bad.
I wanted to listen to this because he
advertised it with like, it will tell the story,
true story behind his mustache,
which I really wanted to know.
But then I was like, I'll need to listen to this on the weekend, and I forgot. So yeah, if he's listening, I'm sorry. I
need to know this, so I will definitely listen.
Excellent. So don't spoil it.
He's doing great work on FastAPI. All right. Speaking of people that have been on all the podcasts as
well, there's Brett Cannon. He recently wrote an interesting article called What is the core
of the Python programming language?
And he's legitimately asking as a core developer, what is
not the maybe lowest
level, but what is the essence, I guess
is maybe the way to think about it.
I only just got the core core
pun. It did not occur
to me when I first read the article.
I feel really embarrassed now.
To be fair, English is not my first language, but still
it's not about that.
Anyway, sorry for interrupting.
When I first read it, I was thinking like, okay, we're going to
talk about what is the lowest level,
and yeah, okay, it's probably C and ceval.h,
ceval.c, and
so on. But really the
thing is, Brett has been thinking a lot about
WebAssembly and what does that
mean for Python in the
broad sense. He and I talked about it on Talk Python, I think at the very last PyCon event. We did a live
conversation there about that. And it's important because there's a few areas where Python is not
the first choice, maybe not the second, sometimes not even the 10th choice
of what you might use to program some very important things like maybe mobile,
maybe the web, the front end part of the web, importantly, I mean. So there's a few really
important parts of technology where Python doesn't have much reach, but all of those areas
support WebAssembly these days,
right? And if you have something in C, you can compile it to WebAssembly. So there's some thought
about like, well, what can we do potentially to make a WebAssembly runtime for Python so that
Python magically, almost instantly, gets access to what was just the JavaScript front-end framework space, and what is,
you know, mobile? iOS and Android and all those things allow you to directly run JavaScript as
part of your app. So how would we make that happen? So it's pretty important, right? If we could solve
that problem. Like, Python is already so popular and its growth is so incredible. What if we
could say, oh yeah, and now it's an important language on mobile
and it's an important front-end language framework?
Like that would just take it to the next level
or maybe a couple levels up if you do them both.
And WebAssembly seems to be one of the keys
to kind of bridge that gap, right?
So Brett talks about in this article
how for so long we've just had CPython
as what we think of when we think of Python. Sometimes people use
PyPy as a partially JIT-compiled, sometimes faster version of Python, but not always,
because of the way it interacts with C libraries that you might be using through packages
and so on. And really, a lot of Python's dynamic nature makes it hard to do
outside of an interpreter, where to be clear, WebAssembly is a compiled language, right? So
if you're going to put it over there, maybe it's going to require it to be compiled.
So this is a really interesting thing to go through and read and think about with Brett.
He talks about things like, well, how much of the Python language would you have to implement
and still consider it to be valid Python? Like, we
talked about MicroPython, and usually people don't look at that and go, that's not
Python, that's fake, right? No, it's Python, but it's not as much Python, right? You don't have
all the same APIs on MicroPython as you do on regular Python. So questions like, do you still
need a REPL? Could you live without locals, right? The ability to ask what the local variables are and so on. So he said he didn't really have a
great answer. It's more of a philosophical, like, we need to solve this. But I do want to share some
of my thoughts on this. And I feel like maybe what we could do is we could come up with like a
standard Python language definition that is a subset of full Python, right? Here's the
essence. Like, okay, we have to be able to create classes, we have to be able to create functions,
you have to define strings, probably you want type annotations, but do you need eval? Maybe, maybe not,
right? So if you could have a subset of the language that was smaller, as well as the standard library, because do you really need
to parse CSS hex colors everywhere? Probably not. It's a very underused part of the library, but
it's in there, right? So if we could narrow it down, maybe it would be easier to think about how does
it go to WebAssembly, how does it go to some kind of JavaScript runtime or something like that.
And if it sounds crazy, you know, the .NET people did this.
They have a .NET Standard class library language.
They got it running on WebAssembly.
So there's an example of it out there
and something that's kind of sort of similar, right?
So I think this would just open stuff up
if you could get Python in these places.
What do you guys think?
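For readers less familiar with the two dynamic features named in that discussion, `locals()` and `eval()`, here is a small sketch of what each does in ordinary CPython. The function name `describe_order` is made up for illustration; the example says nothing about how a reduced Python would treat these features.

```python
def describe_order(quantity: int, unit_price: float) -> dict:
    total = quantity * unit_price
    # locals() exposes the function's local variables as a dictionary.
    snapshot = dict(locals())
    return snapshot


# eval() compiles and runs an expression supplied as a string at runtime.
result = eval("quantity * unit_price", {}, {"quantity": 3, "unit_price": 2.5})

print(describe_order(3, 2.5))
print(result)
```

Both features need the names of variables to be available while the program runs, which is part of why they are harder to support outside of an interpreter.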
Initially, I was never so sold on WebAssembly
and especially WebAssembly
and Python until I watched
Dave Beazley live code a compiler
at PyCon
India, I think it was. And I was like,
this is kind of fun.
It's just also fun to
watch Dave Beazley live code a compiler.
Yeah, for sure.
Classic. So that did get
me thinking.
I do think one question I think we should ask ourselves is,
well, do we really need Python to do all of the things in the browser?
Does this really have a benefit that actually makes a difference?
That's A. And B,
there are a lot of things people use Python for that just wouldn't work in that way.
And that's also, I think, part of what made Python so popular in the first place. Like, for instance, you know,
all the interactive computing environments. That's why people want to use Python for data science,
IPython, Jupyter Notebooks, that sort of stuff. That's why, you know, Python as a dynamic language
made so much sense to people. And that's what made it popular. And large scale processing,
like a lot of the type of stuff we're working on. It's like, yeah, there's stuff that you can run
in the browser, but it's never going to be viable to run large-scale information extraction in the
browser, because you want to run that on a machine for like a few hours. But I think there are a lot
of opportunities also in the machine learning space for privacy-preserving technologies that already
exist. I think, from what I understand, Mozilla is working on some features built into the browser, where, you know,
you can have models predicting things without it being sent to someone's server. And I think that's
obviously very powerful.
That's an interesting idea, right?
Yeah, yeah.
Because if you could have a little bit of machine learning, but you don't have to give up the data privacy aspect
of it. That's pretty cool.
Yeah. So I think for that, there's a lot of potential here for running Python in a browser.
Yeah.
Well, we've gotten used to saying that what Python is, is whatever the CPython implementation is.
And we've got to remember CPython is the reference implementation for the language spec.
And I guess what we're kind of getting at is, maybe we need to split it up and have a core language spec and an extended one or something.
I don't know.
Where would you draw the line?
Because, like you said, we've seen things like CircuitPython and others.
And we've actually talked about several smaller languages based on Python that just try to keep the same syntax.
But at which point is it not Python anymore? There's at least some stuff. Like, I could totally see a distribution of Python that doesn't have a REPL still counting. I could totally see not having IDLE, for instance. If something doesn't ship with IDLE, is it still Python? I think so. And because of IDLE, you need tkinter, or you need Tk stuff in there. And there's a lot of stuff like that. Like, you know, could you live without locals()? Most of the time, probably.
I actually think, since the web and mobile are such a big part of our lives, and will be for a while, this might be a decent dividing line: whether or not it's for WebAssembly, maybe we should draw the line at whatever we need to implement a WebAssembly version of Python.
And anything above that line is an extended version of Python or something.
Yeah, that's a good point.
All right, I don't want to go too long in this section because I want to make sure we
get the others.
But I do want to leave you with just some thoughts.
What if shipping Python was just shipping a single binary and a thing that ran it?
You could do that with WebAssembly.
Maybe two WebAssemblies, the runtime plus the code.
What if all the browsers had the capability to plug in alternate runtimes through WebAssembly? So right now you have a JavaScript engine, but what if, say, Firefox and Edge and whatnot came up with a way to say, here's a WebAssembly API to plug in alternate runtimes, Python, Ruby, .NET, Java, you name it, and then shipped with the latest version of each of those runtimes? So you just don't have to download it. The big problem now is you can do it, but you've still got to download like 10 megs per page, which is not a good idea.
So anyway, I think there's a ton of interesting things
that open up if this were possible.
So I'm glad Brett's still on this
and hopefully he keeps thinking about it.
Brian, I still need to learn pathlib.
You got any ideas on how I do that?
Really, you're not using pathlib?
I am such a... I'm just stuck in the os.path world. I just really need to get with the times. Help me out.
Okay, so pathlib is...
I mean, I know. Yeah, you're like, some kind of animal.
No offense to os.path, but no, I really love pathlib a lot. But I've got to tell you that the documentation for pathlib doesn't cut it as an introduction. You can find what you're looking for, but only if you know what you're looking for. And I agree with Chris May. So Chris May wrote a post called "Getting Started with Pathlib". He's got a little PDF field guide that you can download, and a little bit of a blog post introducing it.
I downloaded it. It's like nine or 10 pages, and it's actually a really good introduction to pathlib.
So I really like it.
The big thing with os.path versus pathlib is pathlib creates Path objects.
So there's a class that represents a path, and you have methods on it.
And it makes it different when you're dealing with this.
With os.path, it's just strings.
So it's manipulating strings that represent paths.
So the object's different.
I like it.
Actually, I switched just for the ability to build up paths with just the slash operator.
Yeah, it's really interesting
how they've overridden division.
Yeah.
But I think it's a good example of where this makes sense.
It's a reasonable use case.
It looks good.
It's defensible.
There are other cases where you're like,
oh, did you really have to like overload these operators?
But they're fine.
I think that's very valid.
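To make the strings-versus-objects point concrete, here is a minimal sketch using only the standard library. The directory and file names are made up for illustration, and PurePosixPath is used only so the printed result is the same on every operating system; in everyday code you would normally just use Path.

```python
import os.path
from pathlib import PurePosixPath

# os.path works on plain strings and returns a plain string.
joined_str = os.path.join("project", "data", "raw.csv")

# pathlib builds a path object; "/" is the overloaded division operator.
# The names "project", "data" and "raw.csv" are illustrative only.
base = PurePosixPath("project")
joined = base / "data" / "raw.csv"

print(type(joined_str).__name__)    # the os.path result is a str
print(joined)                       # the pathlib result prints as a path
print(joined.name, joined.suffix)   # and it carries attributes for its parts
```

Because the result is an object rather than a string, pieces such as the file name and the suffix are available as attributes instead of having to be split out by hand.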
Yeah.
Yeah.
And things like, how do you find parts of a path? When you have to parse paths, that's where pathlib really shines for me.
So if you want to find the parent of something, or the second-level parent, there's ways to do that in pathlib. In os.path, you're stuck with trying to split things and stuff, and it's gross.
I mean, there are operations to do it, but it's very good to have all these attributes, like parent. And then one of the things that took me a while to figure out: I was used to trying to find the absolute path of something, and in pathlib, finding the absolute path is the resolve method. So you say resolve, and it finds the absolute path for you.
You can find the current working directory, you can go up and down folders, you can use globs, you can find parts of path names and stuff.
And it's just a really comfortable thing. So I think you should give it a whirl.
And it's not like it's going to change your life a lot. But the next time you're programming and you're like, okay, I've got to have a base directory and some other directory, reach for pathlib instead of os.path.
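The operations mentioned here (parent, second-level parent, resolve, the current working directory, globs, and parts of a name) can be sketched roughly as follows. The folder layout is invented for the example and is created in a temporary directory so the code can run anywhere. One detail worth knowing: resolve() also follows symlinks and collapses ".." segments, while Path.absolute() only makes the path absolute.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/docs/notes/{a.txt, b.txt, c.md}
    base = Path(tmp).resolve()
    notes = base / "docs" / "notes"
    notes.mkdir(parents=True)
    for file_name in ("a.txt", "b.txt", "c.md"):
        (notes / file_name).write_text("example")

    target = notes / "a.txt"

    # Going up: the parent, and the second-level parent.
    parent_name = target.parent.name
    grandparent_name = target.parents[1].name

    # resolve() turns a messy path into a clean absolute one.
    messy = notes / ".." / "notes" / "a.txt"
    same_file = messy.resolve() == target

    # Globs and parts of the file name.
    txt_names = sorted(p.name for p in notes.glob("*.txt"))
    name_parts = (target.stem, target.suffix)

# The current working directory is itself an absolute Path object.
cwd_is_absolute = Path.cwd().is_absolute()

print(parent_name, grandparent_name, same_file, txt_names, name_parts)
```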
Yeah, I guess it has been there since 3.4, so I should get with the times.
Yeah, so I mean, before, I could see the objection of, oh, you have to backport it.
And also, what I like as well is a lot of integrations that can automatically perform checks, like whether the path exists, stuff like that. Or for me as a library author, you're writing stuff for users and you want to give them feedback.
And for instance, in a library like Click or Typer, which is the modern, type-hint-based CLI library and which was also built by my colleague Sebastián, you can just say, hey, this argument is a path.
What you get back from the command line is a path.
It will check that the path exists via pathlib.
So it does, you know, a whole bunch of magic there.
Yeah, that is super cool.
Yeah, or you can say it can't be a directory, and then you write your CLI, the user passes in an invalid path, and you don't even have to do any error handling.
It will automatically, before it even runs your code, say, nope, that argument is bad.
So that's pretty cool as well.
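The idea of a command-line argument that arrives as an already validated path can be sketched with only the standard library. This is not how Click or Typer implement it; it is an argparse-based stand-in, and the names existing_file, build_parser, and the "wordcount" program are hypothetical.

```python
import argparse
from pathlib import Path


def existing_file(value: str) -> Path:
    """Convert a command-line string into a Path, rejecting bad input early."""
    path = Path(value)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"{value!r} does not exist")
    if path.is_dir():
        raise argparse.ArgumentTypeError(f"{value!r} is a directory, expected a file")
    return path


def build_parser() -> argparse.ArgumentParser:
    # "wordcount" is a made-up program name for this sketch.
    parser = argparse.ArgumentParser(prog="wordcount")
    parser.add_argument("source", type=existing_file, help="file to read")
    return parser


def main(argv=None) -> int:
    # By the time this line returns, args.source is a Path to an existing file.
    args = build_parser().parse_args(argv)
    print(len(args.source.read_text().split()))
    return 0
```

Calling main(["some_file.txt"]) runs the word count, while a missing path or a directory makes argparse print an error and exit before any of the program's own logic runs, which is the behavior described above.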
That's awesome.
And you don't have to care about Unix versus Mac or PC
or something like that.
Yeah, I mean, no offense to Windows, but handling paths on Windows is always the classic story.
Also, as a library author, we're supporting all operating systems, but, well, Windows just does it a bit differently, and you cannot assume that a slash means a slash.
Yeah, for sure. All right, well, the final item is yours, Ines, and it's definitely interesting. So if you're working on the machine learning and data science side of things, it might not be enough to just back up your algorithms and your code, right?
Yeah, machine learning is code and data.
So yeah, this is something we discovered a while ago and that we're now using internally.
So, as I mentioned before, we're currently working on version three of spaCy. And one of the big features is going to be a completely new, optimized way of training your custom models,
managing the whole end-to-end workflows from pre-processing to training to packaging,
and also making the
experiments more reproducible. You want to train a cool model and then send it over to your colleague
and your colleague should be able to run the same thing and get the same results.
Sounds really basic, but it's pretty hard in general in machine learning. So our spaCy stuff will also integrate with a tool called DVC, which is short for Data Version Control,
and which we've started using internally for our models.
And DVC is basically an open-source tool for version control,
specifically for machine learning and for data.
So, you know, you can check your code into a Git repo
as you're working on it, but you can't just check your data sets
and models and artifacts into Git or your model weights.
So it's very, very difficult normally
to keep track of changes and your files.
Most people just end up with this directory
of files somewhere, and it can be very frustrating.
And so you could think of DVC as Git for data.
And the command line usage is actually pretty similar.
So you type git init and dvc init to initialize it,
and then you can do dvc add
to start tracking your assets
and add them. So I think if you're familiar with Git, as abstract as it can be at times, you will also find it kind of easy to get into DVC. And it basically lets you track any assets, like datasets, models, whatever, by adding meta files to your repository. So you always have, like, the checksum in there
and you always have these checkpoints of the asset,
even though you're not actually checking that file into your repo.
And that means you can always go back,
fetch whatever it was from your cache and rerun your experiments.
And it also builds this really cool dependency graph.
So you can really have these complex pipelines
with different steps. And then you only have to rerun one step if some of the inputs to it have
changed. So in machine learning, you'd often have a pipeline. You download your data, then you
pre-process it, then you convert it to something, then you train, then you run an evaluation step.
And everything sort of
depends on each other. And that can make things like really hard. And you never know, you usually
have to run everything very clean from scratch. Because yeah, if something changes, your whole
results change. So if you set up your pipelines with DVC, it can actually decide whether something
needs to be rerun. Or it can also know what needs to be rerun to reproduce
exactly what you're trying to do. So that's pretty cool.
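The "only rerun a step if its inputs changed" idea can be illustrated with a small toy. This is not DVC's implementation or file format; it is a sketch that records a checksum per input in a JSON file and skips a stage when nothing has changed. The names run_stage, checksum, and pipeline.lock.json are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """Fingerprint a file by its contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_stage(name, deps, action, lockfile: Path) -> bool:
    """Run `action` only if the checksums of `deps` differ from the recorded ones."""
    lock = json.loads(lockfile.read_text()) if lockfile.exists() else {}
    current = {str(dep): checksum(dep) for dep in deps}
    if lock.get(name) == current:
        return False  # inputs unchanged, skip the stage
    action()
    lock[name] = current
    lockfile.write_text(json.dumps(lock, indent=2))
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    raw = root / "raw.txt"
    clean = root / "clean.txt"
    lockfile = root / "pipeline.lock.json"
    raw.write_text("Hello World\n")

    def preprocess():
        # Stand-in for a real preprocessing step.
        clean.write_text(raw.read_text().lower())

    first = run_stage("preprocess", [raw], preprocess, lockfile)   # no record yet
    second = run_stage("preprocess", [raw], preprocess, lockfile)  # nothing changed
    raw.write_text("Hello Again\n")
    third = run_stage("preprocess", [raw], preprocess, lockfile)   # input changed
    final_output = clean.read_text()

print(first, second, third)
```

A real pipeline chains several such stages, where the output of one is a dependency of the next, so a change early on triggers exactly the downstream steps that depend on it.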
Yeah, that could save you a ton of time and money if you're doing it in the cloud.
Yes, exactly. Yeah. And you can share it with other people. I think it definitely solves a problem that's very real. And yeah, the people making
DVC, they've also recently released a new tool that I have not personally checked out yet. But
it looks very interesting. It's called CML, which is short for Continuous Machine Learning.
And that's really more of the CI, which is kind of logically the next step, right? You manage
everything in your repo, and then you obviously want to run automated tests and continuous
integration. So the preview looked really cool. It showed kind of a GitHub Action where you can
submit a PR
with like some changes to your code and your data.
And then you have the bot commenting on it
and it shows like accuracy results
and a little graph and how stuff changes.
So it's really like these code coverage bots
that you've probably seen
where like you change some lines
and then it tells you,
oh, coverage has gone up or down
and the new view of your code
So that's what it looks like. So yeah, I'm pretty excited about this, and it's already been solving a problem for us.
How does it store the large files? I know it has this cache. Is that a thing that you host? Does it have a hosted thing that's kind of like GitHub?
I'm not sure. You can probably connect it to some cloud, but normally you have that locally. It also has a cool thing where you can actually download files via the tool. And then, depending on where you're fetching it from, if it's a cloud storage bucket or however they call it, it's kind of like a drive you have access to locally, and then you can just type gs://, blah, blah, and then the path, and really work with it like a local file system.
And that's pretty nice. So you can work with private assets. Because the thing is, a lot of toy examples assume that, oh, you just download a public dataset, and then you train your model, and then you upload it somewhere. But that's not very realistic, because most of the time the data you have can't just go in the cloud publicly. I don't even know exactly how it works in detail, but I think from the headers or something it can tell whether the file you're downloading has changed and whether there's something new.
Yeah.
With a normal version control,
one of the reasons we use it is to try to find what's different.
Do you do diffs on data?
I don't know.
Maybe.
I mean, I'm not sure if there's... I think the main diff is more like around the results that you get.
Because, I mean, diffing large datasets, diffing weights, you kind of can't. That's really where we are with the other problem, where you need to run the model to find out what it does, and then you're diffing accuracies rather than weights.
Okay.
I don't know if it does actual diffing of the datasets, but often the thing that changes is really the models. You have your whole data, and then you change things about your code, and something changes, and you want to keep track of what it is or how it manifests.
Yeah, it's really cool to see them working on this.
Yeah. And also, in spaCy 3, we'll hopefully have a pretty neat integration where, if you want, it's not mandatory, but if you say, hey, that's cool, that's how I want to manage my assets, you can just run that in a spaCy project and then it automatically tracks everything. And you can check that into Git and share it, and other people can download it. So yeah, I'm pretty excited about that. It works pretty well so far.
Yeah, everything you can do to make it a little easier to work with spaCy and just make it reproducible.
Yeah. And it's just that these things are hard. I'm not a fan of these, oh, one click, everything just magically works.
Like, it looks nice and it's a nice demo,
but like once you actually get down to like the real work,
like things need to be a bit modular.
Things need to be customizable.
Otherwise you're always hitting edge cases
or you have these leaky abstractions.
So yeah, I think things should be easy to use,
but you can't just magically cover everything
by just providing one button.
That's just not going to work.
Yeah, because when it doesn't work,
it's not good anymore.
Yeah, exactly.
Yeah.
All right.
Well, that's our six items that we go in depth into.
But at the end,
we always just throw out a couple of really quick things
that maybe we didn't have time to fit into the main section.
And I want to talk about two things that are pretty exciting.
One is, if you care about podcasts as a catalog of a whole bunch of things... I don't know how many podcasts there are. There's probably over a million podcasts these days. One of our listeners,
Anton Zhiyanov, wrote a cool Python package that will let you search the iTunes directory and
query it. It's basically a Python API into the iTunes podcast directory.
You know, some people think that you've got to be part of the Apple ecosystem
to care about iTunes, but really that's just the biggest like directory,
kind of Yahoo circa 1995 style of listing of podcasts.
So if you care about digging in and researching podcasts, check that out.
That's pretty cool.
And then I'm also such a big fan of f-strings. How about you two?
Yes.
Yes.
F.
That's right.
Yeah, I'm finally working in, like, Python 3 only. I remember, I think last time I was on the podcast, I was basically saying, oh, all these modern things, they're so nice. I wish I could use them more, but we're still supporting Python 2. But no, everything I write now, 3.6, yes.
And I've talked previously about a tool called flynt,
F-L-Y-N-T, which lets you run against an old code base
and convert all the various Python 2
and 3 styles of string formatting magically into f-strings.
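For anyone who has not seen the older styles side by side, here is a small illustration of the kind of rewrite being described. It is written by hand, not produced by the tool, and the variable names are made up.

```python
name, count = "Brian", 3

# Older styles that still work in Python 3.
percent_style = "%s has %d items" % (name, count)
format_style = "{} has {} items".format(name, count)

# The f-string equivalent (Python 3.6+): the values are named inline.
f_string = f"{name} has {count} items"

print(f_string)
```

All three expressions build the same text; the f-string just keeps the variable names next to where they appear in the output.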
I think that was actually the episode I was...
Yeah, you might've been, right? Like, I wish I could run this. Right. Yeah. And yeah, I ran that against like 20,000 lines of Python. I found just a couple of errors reported, and they got fixed. So that's nice.
But the thing that's bugged me endlessly about f-strings is I'll be halfway through writing the string and I'm like, oh yeah, I want to put data here. So I've got to go back to the front of the string, not necessarily back to the front of the line, but maybe the string is being passed to a function. So I go back to the first quote, put the f, go back forward, and then start typing out the thing I actually wanted. Right. Or maybe I'll f-string something when really I'm not going to put data in it. So it's like you're halfway through and you want it to become an f-string.
Well, PyCharm is coming with a new feature where if you start writing a regular string and pretend like it's an f-string, it'll automatically upgrade it to an f-string.
Yes.
Halfway through.
Yes.
Without leaving.
So you just say curly, variable. It's like, oh, okay, that means that's an f-string, and the f appears at the front.
Yes.
Nice.
So that is pretty awesome. Anyway, those are my two quick items.
Ines, I'm also excited about the one you got here.
Yeah. Awesome is awesome.
Yeah, I had one, which is something coming to 3.9
or in 3.9, which is PEP 585.
And you can use, when you use type annotations,
you can now use the built-in types like list
and dict as generic types.
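As a quick illustration of what PEP 585 allows, the following sketch annotates a function with the built-in list and dict directly. It needs Python 3.9 or newer, and the function name word_lengths is made up for the example.

```python
# Requires Python 3.9+: the built-in list and dict accept type parameters,
# so nothing has to be imported from the typing module.
def word_lengths(words: list[str]) -> dict[str, int]:
    """Map each word to its length."""
    return {word: len(word) for word in words}


print(word_lengths(["spam", "eggs"]))
```

On older versions the same annotations would have been written as List[str] and Dict[str, int], with both names imported from typing.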
So that means no more from typing import List with a capital L.
Yes, yes, yes.
So you just literally... I mean, when I first saw it, I was like, that looks strange. But yes, I'm so excited about this. It'll probably be years until I can just use it all across my code bases, because...
True, yeah.
But yay, that's in 3.9.
Yeah, it's in 3.9? I'm already using 3.9 and I didn't know that you can do this.
Yeah, yeah. And Guido is one of the guys on the PEP making this happen. Like I said, he's really into typing.
Oh, that's great. This is really cool, because it was super annoying to say, oh, you have this new import just because you want to use type annotations on a collection, right? Now you don't have to. There's actually a bunch of the collection stuff and iterators and whatnot like this, you know, the collections module, a bunch of stuff in there. It's really nice. And they're compatible. Like, lowercase list of str is the same as capital List of str, I believe.
All right, Brian, what you got?
Oh, I just wanted to... I'll drop a link in the show notes. Test & Code 120 is where I interviewed Sebastián Ramírez, also from Explosion, talking about FastAPI and Typer, because I'm kind of in love with both of those. They're really cool.
Yeah, absolutely. All right. Well, that's a cool one. Definitely going to check that out.
And you can find out why he has the cool mustache.
That's right.
All right. So we always end the show with a joke, and I thought we could do two jokes today. So Ines, do you want to talk about this first one?
Oh yeah. I mean, I'm not even sure it counts as a joke per se, but it's more of a humorous situation, I guess.
Right.
Yeah, it ties in. Well, it's Sebastián again. He had this very viral tweet the other day where he posted about some experience.
I can just read it out, because I think it needs to kind of stand on its own.
So he writes, "I saw a job post the other day. It required four plus years of experience in FastAPI. I couldn't apply as I only have 1.5 plus years of experience since I created that thing." And then he says, "Maybe it's time to reevaluate that years of experience equals skill level."
And it resonated with people so much. I was actually surprised to see everyone was like, oh yeah, HR. Apparently this seems to be this huge issue, obviously, that most job ads are not written by the people who actually work with the technologies, and where you have, yeah.
Actually, yeah, this is awesome.
And this tweet actually just got covered on DTNS,
the Daily Tech News Show, I guess it is.
Alongside another posting that said
you needed eight years of Kubernetes experience
for another job.
But of course, Kubernetes has only been around for four years.
Yeah, when you say this went viral, it had 46,000 retweets and 174,000 likes. That's got some traction.
I feel like this might be a problem.
Yeah, I was surprised that so many people were like, yeah, that's a big deal. And I mean, it is true, tech hiring sort of seems to be broken. And it's a bit different in my case, I guess.
But I don't qualify for most roles using the tech that I write.
And in some cases that's justified, because I'm not a data scientist.
Just because I write developer tools for data scientists doesn't mean I can do the job.
But in other cases, I'm like, there's kind of a ridiculous amount of arbitrary stuff you're asking for in this job ad.
Maybe that's needed, maybe not.
But it centers around a piece of software that I happen to have written, and I do not qualify for your job at all. That's insane.
The last time I wrote a job description, I intentionally left off the college degree requirement, because all of the other requirements I was listing in there, either they had it from college plus experience or they had it just from experience.
So I was fine with that.
By the time it actually went live,
somebody in HR had added a college degree requirement to it.
I just couldn't get away with not listing that, I guess.
Yeah.
Master's degree in spaCy is preferred.
Yeah, but I guess another problem is, well, look, if HR writes these job ads with these bullshit requirements, then, well, who applies?
It's either people who are like, yeah, whatever,
or people who are full of shit.
And then that's the sort of culture you're fostering.
And it might not even be the engineer's fault
who voted on his job description,
but who applies to that?
You're going to make me lie about my fast API experience.
Yeah, or people just apply to anything.
I'm like, yep, I have 10 years experience in everything.
Great. And they're like, perfect. That's what we're looking for.
You're hired. And then you wonder,
why is our company culture so terrible?
Well, I actually did have somebody apply to a job and say they have multiple years of experience in any new language coming up.
Nice. It looks like we're just about out of time. Let me give you one more joke for it. Brian, will you describe
this picture and then I'll read what it says? There's a poorly drawn horse, I think, a zebra horse, that is white on the back end and black on the front end. And the text says, "I defragged my zebra."
I don't even know if people defrag drives anymore.
So this is only going to resonate with the folks that have been around for a while.
I saw that there was this great video I came across on YouTube
where you can actually watch like a live defrag session,
like, I don't know, Windows 95.
And it's like, I don't know, it takes a few hours.
And, you know, you can kind of bring back that nostalgia
and just put it on your TV and just sit there and you're like, yeah.
It's like the aquarium you would put on your TV.
But for tech.
Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.