Python Bytes - #79 15 Tips to Enhance your Github Flow
Episode Date: May 25, 2018. Topics covered in this episode: pytest 3.6.0 * Hello Qt for Python * MongoDB 4.0.0-rc0 available * Pipenv review, after using it in production * Pandas goes Python 3 only * Extras * Joke. See the full show notes for this episode on the website at pythonbytes.fm/79
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 79, recorded May 23rd, 2018. I'm Michael Kennedy.
And I'm Brian Okken.
Hey, Brian, how you doing?
I'm doing great.
Nice. I think, as always, we've got a bunch of fun stuff to talk about.
And we wouldn't be doing it without DigitalOcean.
A couple reasons, but DigitalOcean is sponsoring this episode and a bunch of the upcoming ones. So thank you to DigitalOcean. Get $100 off or $100
credit at pythonbytes.fm slash DigitalOcean for new customers. Tell you more about that later.
I would be totally surprised, Brian, if you wanted to cover something about, say,
testing or PyTest. Yeah, yeah. So PyTest 3.6.0 just got announced. So 3.6.0 for PyTest.
And this is like an inside baseball kind of release
because there's not a lot that if...
I think 80% of the people using PyTest won't see a difference.
But however, this was a big deal for the team.
Essentially, it's a revamp of the implementation of the marker system and the data type that was used to hold the markers.
So there was a couple other things.
That's the big thing that's going on in the 3.6.0 release is a reworking of the markers in it.
And there's a list on their release notes of all the different defects that they fixed with this. The takeaway for a lot of people is if you were using,
if you're writing a plugin or something, or using the plugin features and calling get_marker to find out which markers are applied to a particular function, get_marker is deprecated. There's a new API: there's iter_markers and get_closest_marker. And yeah, we'll have a link in the show notes to read more on that.
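To make the semantics concrete, here's a small runnable sketch. The method names match the new pytest API, but the Marker and Node classes below are toy stand-ins, not pytest code, so the behavior can be shown without a test session: get_closest_marker returns the marker applied nearest the test, while iter_markers walks all of them.

```python
class Marker:
    def __init__(self, name, **kwargs):
        self.name, self.kwargs = name, kwargs

class Node:
    """Toy stand-in for a pytest item: markers are ordered closest-first."""
    def __init__(self, markers):
        self._markers = markers

    def iter_markers(self, name=None):
        # Yield every marker, optionally filtered by name.
        return (m for m in self._markers if name is None or m.name == name)

    def get_closest_marker(self, name):
        # First match is the one applied nearest the test (e.g. the
        # function-level marker wins over the class-level one).
        return next(self.iter_markers(name), None)

# A test function marked timeout(seconds=5) inside a class marked
# timeout(seconds=30): function-level is "closer".
item = Node([Marker("timeout", seconds=5), Marker("timeout", seconds=30)])
print(item.get_closest_marker("timeout").kwargs["seconds"])          # 5
print([m.kwargs["seconds"] for m in item.iter_markers("timeout")])   # [5, 30]
```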
So most of it's a plugin writers change,
the API change, but it's exciting.
And I'm excited for the team to get that out
because kind of like the Django 2 release,
it's about maintenance and going forward.
And so that's great.
There is one more feature, from a couple of other things: PyTest is supporting the breakpoint() functionality coming in Python 3.7.
And that is brought to you by our friend, Anthony Shaw. So he put that in.
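For context on what that support means: Python 3.7's built-in breakpoint() (PEP 553) just delegates to sys.breakpointhook, which is exactly the seam a tool like a test runner can hook to drop you into its own debugger instead of plain pdb. A minimal sketch of the mechanism; the fake_debugger function here is made up for illustration:

```python
import sys

calls = []

def fake_debugger(*args, **kwargs):
    # Stand-in for whatever a tool installs in place of pdb.
    calls.append((args, kwargs))

# Swap the hook; breakpoint() now routes here instead of starting pdb.
sys.breakpointhook = fake_debugger
breakpoint()
print(len(calls))   # 1
```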
Oh, nice. Yeah. He's doing a lot of work on Python 3.7 because
rumor is he may be doing a course on Python 3.7. So awesome. He was able to bring it over here.
And a couple other smaller things like the, apparently I had never run into this, but if you have an assertion failure on equality and the only thing different is white space, it's kind of hard to tell.
So, they now escape characters, too.
So, you can see what the white space difference is a little bit better, which is kind of cool.
I've never run into that.
That's a little hard to tell when it's printed out.
You're like, those are the same.
No, no.
That's two spaces, not one right there.
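The fix is essentially what repr() does for you: once the whitespace is escaped, the difference stops being invisible. A tiny illustration, not pytest's actual diff code:

```python
left = "value:\t1"   # a tab between the colon and the 1
right = "value: 1"   # a single space

print(left == right)   # False
# Printed plainly, the two strings can look nearly identical in a terminal:
print(left)
print(right)
# Escaped, the difference jumps out -- which is what pytest 3.6 now does
# in its assertion diffs:
print(repr(left))    # 'value:\t1'
print(repr(right))   # 'value: 1'
```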
Yeah, so for the main point, I wanted to get this out to as many people as possible.
So if you are depending on reading markers in your internal code, pay attention to this.
So that's it.
Yeah, it sounds like a nice cleanup of the internal APIs for extension writers.
And that's always good, because that probably means more extensions are likely to be built.
Yep, sounds good. So, have we talked about GUIs, Python GUIs, on this podcast? I'm not sure if we have. We probably should.
Yeah, we probably should. Yeah, let's do that. So there's a lot of stuff going on. You know, part of the reason I went on that rant is because stuff needs to be happening there, but also because some things are happening.
Like we had the wxPython Phoenix release, which is kind of a rebirth of wxPython, which is really great.
Well, we also have the same thing going on for Qt.
So for a while, there was a sort of a split.
There was PyQt.
There was PySide.
There was PySide2.
There are all these ways.
They depended on different versions of Qt,
and it was kind of just generally a mess.
So the Qt company is now officially making something called Qt for Python, which as far as I can tell is more or less like a rebirth of PySide2 for what that's worth.
So it's really nice that the company that makes Qt,
the cross-platform GUI framework,
is really dedicating itself to Python.
Yeah.
One of the things that I think is cool about the Qt space
is they have the Qt designer,
and I think that's really nice and important
for a heavy visual way to design the UI.
I know you can write code and say the position is 20, 20 and it stretches this wide, but like that is not the same as draggy
droppy, press the button. You know what I mean? So I got a lot of, I'm pretty excited about this,
let's say. So that's really cool. They basically are keeping it super similar to the Qt C++ API,
where that makes sense. So if you read documentation about the C++ API, which is the native language for Qt, and you replace the pointer dereference, the dash-arrow, with a dot, you know, that may well be the Python API.
Okay, which is good. But some of the problem, some of the drawbacks, let's say that are like,
it doesn't necessarily leverage the Pythonic features.
So like maybe you call a function to do a thing rather than put a decorator on to something
else, things like that.
One thing that is nice is a lot of these UI frameworks are super painful to install, right?
You can install them on the system and then they don't work so well.
Maybe there's like some big long compilation step, like wxPython takes forever to pip install onto Ubuntu, the last time I tried doing that.
So they're planning on shipping a wheels version of Qt,
which before you had to get like some separate installer or something.
So that'll be pretty sweet
that you'll just be able to pip install your thing
and it'll come with the foundational stuff you need.
That's exciting.
Yeah, that is pretty cool, right?
So, I mean, I really hope that the company behind Qt putting a big effort into this is going to mean, like, finally a polished version.
So, we'll see.
I think the licensing might still be GPL and LGPL.
So, as a combination, take your pick.
I'm not sure what the variations are exactly there, but I don't know, I'd like to see something more permissive. But who knows. Still nice to see some progress here.
So, do you know, I was trying to find it, do you know the projected release date for the official Qt for Python?
They're about, so the article I'm linking to is a blog post called Hello Qt for Python.
And they say they're working on a technology preview.
So that's all I've seen,
but they don't seem to have any further information that I easily found.
It may be somewhere else.
Yeah.
So it'll be,
it does say it'll be available under GPL, LGPL, and commercial licenses.
It talks about when development started and stuff like that,
but it doesn't seem to have a release date.
So there it is.
All right, cool.
Nice. Well, speaking of sneak peeks on things,
we've talked about MongoDB, the 4.0 release that's coming.
We've talked about that before, but now you can play with it. So the 4.0.0-rc0 is now available. It's the very zeroth version of the release candidate.
Yes. There's a lot of zeros there. Yeah. So that is out and ready for testing so people can
actually get their hands on it and try working with it. Again, there are a lot of new features, but the big news is ACID transactions, multi-document ACID transactions.
Yes, that's a pretty big deal. And I actually don't know if this is a big deal, well, there's a lot of things here, but non-blocking secondary reads. I don't even know if I know what that means.
So the idea with the non-blocking secondary reads is, one of the ways you can set up MongoDB is in what's called a replica set. So there's a primary thing that you read and write to, and then there are other ones which are constantly just staying in sync with that server.
Okay.
There are a couple of benefits to that. Like, you could put them into, say, different data centers. The primary thing is, if for some reason the main server, the primary server, fails, it'll automatically switch to one of the secondary ones. So it's kind of like a failover redundancy sort of thing as well. But you can configure it in a way that you can say, I would like to read from the non-primary databases as a way of adding read scalability.
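For reference, the opt-in Michael is describing is a client-side read preference. In a MongoDB connection string it looks something like this; the host names and replica set name here are made up for illustration:

```
mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=rs0&readPreference=secondaryPreferred
```

With secondaryPreferred, reads go to secondary members when one is available, spreading read load across the replica set instead of concentrating it on the primary.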
So, like, if I have five servers in the cluster, if I don't do anything, I can only talk to the primary one as a single server, and I get no boost of concurrency, let's say. But if you say, I want to read from the others, well, then all of a sudden you're sort of farming that across all five servers, you know, the primary plus the other four, right? That used to block for consistency reasons, and now apparently they found a different way to ensure consistency, maybe because of the transactions. Okay, anyway, that's a long explanation for what I think that means.
No, that makes sense. And that's cool. At least I know that there are a lot of people that choose a SQL database over a document database mostly because of the lack of transactions. And so that's one of the reasons why I brought this up, because I'm excited about transactions.
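Since the excitement is really about the atomicity guarantee, here's what a multi-statement transaction buys you, sketched with the stdlib sqlite3 module rather than MongoDB. The principle is the same one multi-document transactions bring to 4.0: every write in the transaction lands together, or none do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on an exception
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 50 "
                     "WHERE name = 'bob'")
        raise RuntimeError("simulated crash before the transaction commits")
except RuntimeError:
    pass

# Both updates were rolled back together: no half-finished transfer.
print(conn.execute("SELECT balance FROM accounts WHERE name = 'alice'")
          .fetchone()[0])   # 100
```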
Yeah, I think that's super exciting as well
for the reason you just said.
What I do think is interesting is
as people get to more serious
applications, they get to a place where often they give up transactions anyway for sort of concurrency,
right? Like, you know, if I go to Amazon, it's not like, and I go to order something,
it's not just going to lock all of Amazon while I interact with, you know, my order.
What it's going to do is, like, say, we'll place the order. And if it happens to be that actually the thing you ordered sort of sold out just at the moment that you pressed it, you'll get like a message or something, right? Like,
hey, sorry, we couldn't fulfill it or whatever. Here's your refund. So there's a lot of these
sort of compensation things that get put into like high scalable stuff. I just grabbed Amazon as an example. I don't
really know how they work. But you know, there's a lot of these large sites that sort of don't use
full on transactions in the same sense that other ones do. So it's pretty interesting. It's
interesting in that I don't really think transactions are something I'm going to be
using in any of my sites. They just don't really seem to be necessary with a few possible exceptions. I'll get to what those might be in a little bit. But
yeah, I think you're right that when people feel they need it, or there are a few situations where
you really do need it, this is super interesting. One other thing that's kind of cool that's not a
4.0 thing, but it's in a 3.6, which is the one right before this, as far as I know, is actually the streaming API. So if I've got like, say, WebSockets, or something that I want
notification of like push of change to the database, you could like run a query and say,
I want to stream new results that hit this query. And then as stuff is inserted to the database that
matches, it'll get pushed out to you instead of repolling the database.
So suppose I connect to a chat server and I set up WebSockets.
You could like literally subscribe to like these change streams on like the conversation record.
And you would just get them pushed back down instantly without any polling end to end.
Okay.
That's pretty cool.
It's kind of like what RethinkDB's primary feature was. I guess
where I would probably use transactions a lot, and it's not really transactions, but because of
transactions, you can do this, is I believe 4.0 also includes rollback checkpoints. For instance,
you can grab a replica of a big database or something. And like, for instance, for like during testing,
you can have a starting point,
do a whole bunch of transactions on it,
query it, and then roll back to a previous state.
Yeah, that is pretty cool.
And I think maybe that secondary non-blocking read stuff
has to do with that as well.
You sort of begin a transaction and you start reading.
Yeah, anyway.
Yeah, yeah, very cool.
So I'm glad to see that that's coming along.
I feel like the NoSQL document database world
and the relational world are kind of like merging.
They're getting closer to each other in a lot of ways, right?
We have Postgres getting JSON stuff.
We get MongoDB getting transactions,
and they're all kind of sort of growing
and intersecting in interesting ways. Speaking of interesting, DigitalOcean is pretty interesting. They're
doing a lot of good stuff for us. So like the files that you're getting, when you download
the podcast, the website, all that stuff is running on DigitalOcean servers. And I'm super,
super happy customer of theirs. And they're sponsoring the show as well. So one of the
things that's cool, maybe I mentioned this a while ago, Brian, is their sort of one-click app server configuration. So if I want to create,
say, a server with MongoDB all configured, I can go there, say, create me a droplet with
this version of Mongo or with this other web framework set up, and it'll automatically create
all the server configuration and have everything set up and ready to go within like 60 seconds. So really, really nice. And probably the biggest thing: if you are not using DigitalOcean, you can get a $100 credit by going to pythonbytes.fm/digitalocean.
So that's a pretty good deal.
Yeah, that's great.
Yeah, awesome. So if you're looking for nice, affordable, fair, and very fast server hosting, check them out: pythonbytes.fm/digitalocean.
So, Michael, have we talked about pipenv in the show before?
If I recall correctly, I think we were confused about pipenv.
I was confused about pipenv, that pipenv became sort of the officially recommended way from the Python Packaging Authority to manage packages.
And I'm like, oh, when did that happen?
That was pretty interesting.
So there's been a lot of debate.
And you said there was kind of a coarse Reddit thread.
Like, imagine Reddit was unkind to people.
Could you imagine?
Right, yeah.
That's unfortunate.
But I think it's too bad that that kind of stuff happens.
And maybe we should all just speak up like, hey, that comment is out of bounds, right?
Anyway, I'm not going to link to it.
I don't want to encourage it.
But I do want to link to this thing called Pipenv Review, After Using It in Production.
So there's this team that used pipenv in production since November 2017.
So what is that? A little over half a year, maybe almost exactly half a year.
And this sort of comes, they talk about, this is what worked for us.
This is what wasn't working so well for us.
And in the end, they're like, at no point did anyone in the team ever mention getting rid of pipenv, which actually is a pretty strong statement, apparently. So, like, no one said, no, we've got to get rid of this, it's just not quite working in some way.
So here, I'll give you the rundown. The article starts off pretty accurately. It says the current state of Python's packaging is awful. I don't think anyone would disagree with that. The problem is recognized, and there are many attempts to solve the mess. And pipenv was the first, and it did get a lot of traction, but not everyone loved it. And he said one of the areas where pipenv can be a challenge is for libraries. So pipenv is more built for managing the dependencies of an application. But if you're a library author, it doesn't necessarily make a lot of sense.
Yeah, I'm on the fence on that.
Sure.
Sorry, I forgot the guy's name.
The reason he said this was basically that supporting multiple environments goes against pipenv's philosophy.
So they want a deterministic, reproducible application environment. But, you know, if you're going to do that for, say, PyPy and Python 2.7 and Python 3.6 or whatever, well, then it doesn't really work potentially, right? Because it wants exact hashes of the exact libraries, and if those don't match, then you're out of luck, right? So that's a challenge. I think that's the primary challenge.
Yeah, yeah. And I agree with that.
And it's just partly, I think, a miscommunication.
Pipenv was never intended to work for every library's sort of use.
Because libraries, by definition, they don't have their dependencies pinned.
It's at the application level where you pin your dependencies. So you say there's this miscommunication, and I definitely think you're right.
Because when I looked at pipenv on GitHub, I really saw that as the statement:
pipenv is the officially recommended tool from PyPA for managing Python application dependencies,
where really the word 'application' should have been bolded, underlined, and all caps.
Something to that effect, right?
Right.
So pretty interesting.
But yeah, I think generally their review of it was good.
So I'll try to give you the quick rundown here.
So Pipfile and Pipfile.lock really are superior to requirements.txt by a ton. And the guy said, hey, I first disliked having flake8 and a security-checking tool all built into one thing, but I think it's actually great. Installing from private repositories, that works really well. Creating a new Pipfile is easy. No problems introducing pipenv to new users, or installing from a mixture of indexes and git repos. That was all really good. Virtualenv is much easier to get into and understand.
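For anyone who hasn't seen one, a minimal Pipfile looks roughly like this; the package names are just examples:

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.6"
```

pipenv install --dev installs both groups, pipenv lock pins exact versions and hashes into Pipfile.lock, and pipenv run pytest executes a command inside the environment without activating it.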
Now, let's see, dependencies can be easily installed into a system like Docker. And finally,
like I said, no one proposed getting rid of it. There were just a few edge cases, mostly around the library side of things. So yeah, pretty good. But if you're thinking about using pipenv in production, you know, check this article out. It's got some good discussion and a lot of follow-up as well.
I want to add that, for development, I am going to start using it. I haven't been, but I'm going to, not from the standpoint of handling all of the library dependencies, because the setup.py does that, but more for the transitive dependencies and also, mostly, the developer dependencies. So pipenv has a developer feature where you can either create the environment for running or create the environment for development, and those can be different. And traditionally we've had a requirements_dev or something like that, but you kind of have to know it's there. So for that reason, I'm going to try pipenv. The other reason is pipenv run, to be able to run in the environment without activating the environment, which is going to be useful for things like Jenkins runs. I'm going to give it a shot. I don't have a report yet, but I'm going to start using this as well.
Yeah, sounds good. You're going to have to give us a report after a while.
Yeah, definitely.
Nice. All right. So you've got some stuff for GitHub Flow, the whole sort of working in GitHub, PRs, submitting issues, open source goodness.
Yeah.
I've got a development team that's going through a lot of changes in our development workflow.
But one of the things is using Git more.
And we're using GitLab at work.
But this is so a lot of these some of these I use GitHub for open source projects, of course.
But here's an article called 15 Tips to Enhance Your GitHub Workflow or GitHub Flow.
And a lot of these apply to both Git and GitHub and GitLab.
Some of them are GitHub only.
But there are some things you just sort of need to know about the culture around Git and GitHub and GitLab, things that aren't obvious from the start.
So I like having an article that calls out a lot of these things.
Like one of the talks about, I'm not using projects yet, but I'd like to try to use projects to prioritize issues and maybe track progress and plan for what's going in which release and stuff.
Maybe if that's built in, might as well try it.
Using tags on issues, I've started using that.
I know we have tags on a lot of open source projects like Help Wanted and things like that.
There's some standard ones.
Getting to know those are good.
Templates are something that really – so a lot of this stuff isn't stuff
I know about yet. It's stuff I want to
start using. Templates are something
like if somebody does a pull
request against your project,
having some predefined
stuff filled in
for them to know what to fill in.
And the default template is
sometimes kind of lame for certain
projects.
Like I've got a library that the default one asks for like operating system.
Well, I don't really care.
It's not going to affect the library I'm using.
If the issue is really hard to reproduce, I'll ask somebody and say, hey, I'm trying to reproduce it here and I can't reproduce it.
Anyway, there's a whole bunch of great things. One of the things I didn't know about at first was squashing pull requests and squashing commits.
That's something that is totally foreign if you're coming to Git from other revision control systems. So there's just a good list of a whole bunch of goodies.
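On the templates tip mentioned above: dropping a file at .github/ISSUE_TEMPLATE.md (or per-type templates under .github/ISSUE_TEMPLATE/) pre-fills new issues on GitHub. The fields below are just an illustration of what a library might ask for instead of the generic defaults:

```markdown
## What happened?

A short description of the bug or request.

## Minimal code to reproduce

(paste the smallest snippet that shows the problem)

## Versions

- library version:
- Python version:
```

A PULL_REQUEST_TEMPLATE.md in the same directory does the equivalent for pull requests.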
Yeah, that's really cool. And I like the automated tests and checks on pull requests. Like, that's really nice. Like, if I do a PR to someone else's repo and my PR automatically gets tested, flake8'ed or whatever they're, you know, wanting to have checked, right? That can tell me right away, before they get back to me, oh, there's a step I missed. Let me fix that, and then, you know, resubmit the PR, or just update the PR and have it rerun. Okay, now everything's good. And I'm sure that on the other side of things, if someone is running a project and it's already passing all that before they even get to it, they can take it more seriously.
Yeah. And that helps you even if you're splitting up branches, so you can have tests running on multiple branches, which is nice if you have a long-running development feature. And then one of the things I want to play with here, there's a discussion about pre-commit hooks, and hooking things like Black up to your pre-commit hook to make sure the styling is correct.
Oh, nice. Yeah, instead of asking, just change it.
Yeah. Your styling is wrong, you need to break that line. Fine, we did that for you.
Yeah. That's pretty cool. All right. So the last one I got, Brian,
is just a feel-good story, Python versus legacy Python, that type of thing. So pandas goes Python 3 only. No more legacy Python for pandas. Wow. That's cool. That's a pretty big deal. Like
pandas in the data science space is one of the true foundational items.
Maybe it's more popular than any of the others.
I feel like people almost always start with pandas,
and then once they get their data processed, they, like, move to another library.
So pandas going Python 3 only is really, really awesome.
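The usual mechanism for a drop like this, for what it's worth, is the python_requires packaging metadata, so pip running under Python 2 keeps resolving to the last compatible release instead of installing something broken. In setup.cfg it's one line; the exact floor version here is illustrative:

```ini
[options]
python_requires = >=3.5
```

The same keyword can be passed directly to setup() in setup.py.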
I got this off of Twitter from Randy Olson.
Thank you for that.
And basically, they're following NumPy's lead.
Remember, NumPy is going Python 3
only. So officially starting
January 1st, 2019,
which is not that far
away, seven months-ish, six months,
pandas will drop support for
legacy Python, and this includes
no backports of security or bug
fixes. The final release will
be the day before, and that
one's going to support Python 2, and we're just going to leave it there, apparently.
Yeah. So I feel like data science has got a little bit of an edge on the Python 3 story for everyone, partly because they've come into the ecosystem as a large group more recently than, say, the web developers or the automator folks who have been around for a long time. The data science stuff has really exploded 2012 and onward, so it was a slightly easier choice, I think.
Yeah, I think so. Yeah, pretty cool. All right, well, that's it for our news. Anything personal you want to share?
No, I'm just excited to get back to, like, podcasting and stuff. It's been good.
I know. It was a lot of fun to do the live one, though, at PyCon, right? Like, nobody cheered for us today. Not that I'm hurt. Anyway, it was so fun to just be in the room and get the...
Yeah. And, like, nobody laughed at my jokes.
Maybe they did. We'll just never know.
Maybe we need, like, one of those fake audience tracks.
No, that'll take away from the real ones. We'll do some more live ones. We're talking about it, right?
Yeah, definitely. It was so fun. We want to do more. Yeah, maybe we can do some more. We'll figure that out.
So, are you excited? It's GDPR Eve.
Yeah, the only what? Well, yeah, GDPR. I don't really know how that affects me, but I'm telling people that's why I forget their names so quickly: I'm complying with GDPR.
Oh, man. I have very mixed feelings about GDPR.
I'm a fan of privacy and respecting data stuff.
I'm not a fan of some of the ways in which they're going about it.
I mean, it's a tech requirement written by non-technical people, for starters.
Do you have to change, for instance, the courses site of yours?
I've been doing nothing but 10 hours a day of GDPR programming all week.
Oh, geez.
Yeah, and I'm not done.
I got one or two more days.
And what drives me crazy about this is I'm an American company, 100% in America.
And that Europe has these rules that apply to us. And it's not about Europe or America, like, what if India decides later that they have other rules that are inconsistent with what I've done for GDPR, and then Brazil has others? I just think it's kind of crazy to say lawmakers in one country can impose their will on all of the world through these laws. It's kind of funky, but I'm going to do it, because you pretty much have to.
So basically, the reason I'm throwing this out there
is if you run a site where you've got, say, a mailing list
or people buy stuff or you collect user data,
just be sure to be really careful and look into this.
And also, we talked about environments
and we talked about pipenv and various other bits of packaging.
So I just want to give a quick shout-out to the xkcd Python environment cartoon, which came out a few weeks ago. That would be xkcd.com/1987. It's just about the sort of madness: my Python environment has become so degraded that my laptop has been declared a Superfund site. It's got Homebrew Python, it's got the OS Python, Anaconda, it's got pip, another pip, easy_install.
Okay, it's pretty good, right?
Yeah, yeah. I think we'll probably, I'm going to link to Kenneth Reitz's, Reitz, I should just stop trying to pronounce names,
his PyCon talk because
there was a lot of stuff in there about like the
history of packaging that I didn't know
about so it's a good listen
yeah you should definitely link to that that's awesome
All right, well,
thank you, Kenneth, for working on pipenv,
and thank you, Brian, for sharing everything with all of our listeners.
Thank you.
Thank you for listening to Python Bytes.
Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at PythonBytes.fm.
If you have a news item you want featured,
just visit PythonBytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing
this podcast with your friends and colleagues.