Python Bytes - #196 Version your SQL schemas with git + automatically migrate them
Episode Date: August 27, 2020. Topics covered in this episode: * Surviving Django (if you care about databases) * Python Numbers and the Flyweight design pattern * What Are Python Wheels and Why Should You Care? * Pandas_Alive * How To Use the Python Map Function * Version your SQL schemas with git + automatically migrate them * Extras * Joke. See the full show notes for this episode on the website at pythonbytes.fm/196
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 196, recorded August 19th, 2020.
And I am Brian Okken.
And I'm Michael Kennedy.
And actually, we have a sponsor this week, Datadog. Thank you, Datadog.
Yeah.
Thanks, Datadog.
First off, I want to talk about Django a little bit.
I've always heard Django is super easy, and that's why people like it, because it's really easy to get started and it has all these things
that make working with Django easy
and so on, right? Yeah, I think there's
a lot going for it. The community seems
pretty awesome. There's a lot
of tutorials, there's a lot of expertise that
they can help you out. So there's an
interesting article by Daniele Varrazzo
called Surviving Django
if you care about databases.
So, I mean, Surviving Django
right off the start,
that's an odd title
for an article about Django.
It's going to be kind of hard to summarize,
but basically the take on it is
a little bit of a,
he has a different take on
how to deal with databases
than normally is taught around Django.
And it's an interesting perspective,
but the gist of it really centers around
that there's a lot of parts of Django
that seem to be database agnostic.
So you could use MySQL or Postgres or something else.
But he says, kind of in reality, people don't do that.
People don't really switch databases that much.
So if you really want to utilize the database and some of the great things about whatever database you pick, maybe not being database agnostic is good. Also, he
talks about how to set up schemas and database migrations
using the database, not using the built-in Django stuff.
It seems a little bit more like, why would I do that?
It seems more technical than I want to do with Django.
But there is some reasoning around it.
And then he also shows exactly how to do this, how to do migrations, how to do schemas.
And it really doesn't look that bad.
It's an interesting take. I was curious how the rest of the Django community would feel about this, but after the article, there are comments on it.
There's a really nice civilized discussion between the author and somebody named Paolo Melchiorre, I think, and Andrew Godwin.
I've definitely heard of Andrew before. He and some others were talking about that take, and one interesting comment was that well-written articles like this, which point out some of the possible pitfalls with Django,
are a good way to bring those up, because, you
know, there are a lot of fans of Django who really aren't going to talk about the bad parts. And this isn't necessarily a bad part, it's just something to be aware of.
Another really interesting comment by Andrew was, I agree that at some point in a project
or company's life, when it's big enough, SQL migrations are the way to go instead of the
Django migrations.
Migrations in the out-of-box state are mostly there to supplement rapid prototyping.
Like a lot of Django, it can be removed or ignored progressively if and when you outgrow the single set of design constraints that were chosen for them. So all the agnostic stuff might be good early on, and then maybe you slowly go towards using your database more later.
Yeah.
That's an interesting take.
Yeah, that's cool.
A bit of a practicality beats purity on both ends there.
This article also made me really appreciate the Django community
because this was not a flame war.
This was a civilized discussion about a technical topic.
What, on the internet?
Yeah.
It was great.
Yeah, that's really cool.
However, a few comments.
One, I've switched from one database backend to another
three or four times on major projects
as you're like, you know what?
This is just not doing it or it's outgrown this or whatever.
So it happens.
But at the same time,
like that's usually not MySQL to Postgres.
It's usually like relational to non-relational
or something massive
where it's going to require rewrite anyway.
So I do like the idea of saying
you have this capability to be completely agnostic,
but you're working with the lowest common denominator there.
And that's usually not the best choice if you're writing an application. Maybe if you're working on a
library, tons of people are going to use it in ways you don't anticipate. But if it's an application,
you know how it's going to be used most often. Yeah. Also, some of those speed
improvements you can get out of a database, you really can't do too much of with the agnostic
front end. You kind of need to
know the specifics of that database. So yeah, pretty cool. All right, for this next one, I want to talk
about an interesting pattern that Python uses, I guess an interesting technique. So you know the id
function, right? You can say id of a thing and it'll give you a number back, and it basically tells you
what it is, like where it is
in memory. Are you familiar with this? I guess I don't use this. Yes, if you want to know, like, if
I'm given two variables, are they actually referring to the same object, or do they just
have the same value, right? Like if I had a dictionary and I want to know, is it the same
dictionary, or does it just have the same keys and the same values for those keys?
You can say id of one thing and id of the other.
And in CPython, that'll actually give you the memory address.
But in all Python, that gives you a unique identifier that is guaranteed to be different if they're different objects, the same if it's the same object, right?
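A minimal sketch of the identity check described here, using two dictionaries with the same keys and values. The variable names and data are made up for illustration.

```python
# Two dictionaries with equal contents are still two separate objects.
first = {"show": "Python Bytes", "episode": 196}
second = {"show": "Python Bytes", "episode": 196}
alias = first  # a second name for the same object, not a copy

same_value = first == second             # compares keys and values
same_object = id(first) == id(second)    # compares identity
alias_is_first = id(alias) == id(first)  # both names point at one object

print(same_value, same_object, alias_is_first)
print(first is alias)  # "is" performs the same identity comparison
```

In everyday code the `is` operator is the usual way to ask this question; comparing `id()` values is mostly handy when exploring.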
Okay.
Okay.
So one of the things that Python does that's really interesting, and this is all research I've pulled up from working on my Python memory management course that is probably out
by the time that this comes out, but you don't have to take that to care about this. So one of the
things that's really interesting in Python is everything is a pointer, right? Allocated on the
heap, including numbers and strings and other
small stuff that might be allocated on the stack in languages like C# or C++ or
whatever, right? So numbers in Python are way more expensive than they are in languages that
treat them as value types rather than reference types. So for example, the number four uses 28 bytes of memory in Python,
whereas the number four could use one, two, four, or eight
in the languages that treat them as value types,
depending if they're like shorts or longs or whatever.
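One way to check the size claim yourself is `sys.getsizeof`. This is only a quick sketch; the 28-byte figure is what a typical 64-bit CPython build reports, and other builds or implementations may differ.

```python
import sys

# Size in bytes of the int object itself, as reported by the interpreter.
# A typical 64-bit CPython build reports 28 for the number four, but
# that is an implementation detail and not guaranteed everywhere.
size_of_four = sys.getsizeof(4)
print(size_of_four)
```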
So there's this cool design pattern called the flyweight pattern,
and I'll just give you the quick rundown on that.
So flyweight is a software design pattern.
A flyweight is an object that
minimizes memory usage by sharing as much data as possible with similar objects, right? So that's
from Wikipedia, I'll link over to that. In Python, Python does that for numbers. So if you compute,
like through some mathematical function, if you compute the number 16, and then some other way
you compute the number 16, and then somewhere else you parse a string, the number 16.
Those are all literally the same 16 in memory.
Okay.
Okay?
Because 16 is pretty common.
But if you computed 423 the three different ways,
that would be three copies of 423.
So Python uses this flyweight pattern for the numbers from negative five to 256,
and you'll only ever have one of those in the language, in the runtime. But beyond 256 or below
negative five, those are always recreated. Isn't that interesting? It is very interesting. Yeah, so,
yeah, it doesn't matter how they come out. Basically, if the runtime is going to generate the number,
say, seven as an integer, it's going to use the same seven, which is pretty cool. I actually have
some example code that people can play with. It creates two lists of a whole bunch of numbers
in separate ways and then says, you know, are these the same number or not? Which is pretty cool. I was just
playing with it right now. So if you assign x to one, you can do an id of both x and one, and it'll show
up as the same number. But if you assign x to minus 10, x and minus 10 are different ids. Isn't that
funky? Yeah, it's because the numbers in Python are extra expensive, so Python takes special care to
not recreate these very common numbers. And apparently very common means negative five to 256 inclusive.
Anyway, I thought that might be interesting to people,
this flyweight design pattern concept,
and then applied to the numbers might be interesting.
And there's a little example code that I included it there.
So it's not quite an article, but it's like an idea with some code.
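This is not the example code from the show notes, just a small sketch of the same idea. It assumes CPython, where the shared range of negative five to 256 is an implementation detail. The helper name `same_object` is made up, and the numbers are parsed from strings at runtime so the compiler cannot quietly reuse one constant.

```python
def same_object(text):
    """Parse the same text twice and report whether both results
    are one shared int object (identity, not equality)."""
    first = int(text)
    second = int(text)
    return first is second

# Values inside the shared range should report True on CPython,
# values outside it should report False.
for text in ["-6", "-5", "16", "256", "257", "423"]:
    print(text, same_object(text))
```

Because this is an interpreter detail, real code should compare numbers with `==` and never rely on `is`.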
Yeah.
So can you, I mean, as a user,
can I use the flyweight pattern in Python for other stuff? You totally should. Yeah, like imagine you've got some objects you're creating, and
instead of recreating them over and over when they're being used in a lot of places, you could totally
create some kind of shared lookup for certain common ones. Like maybe you're creating
states, and the state has a bunch of information about it, like U.S. states or
countries or something. But then you often have to go, all right, what state is this? Give me that
information, right? You don't need to necessarily recreate that. You could just create 50 states,
keep them in memory, and never allocate them again. Okay. I guess caching and memoization are
ways to do something similar, but with only like one thing at a time. Exactly. The big important thing here to make this
work correctly is they have to be immutable, right? Because if one person gets the state Georgia
and it has certain values, then another person gets it and says, oh, it has a new county, let's add that,
and it's like, wait a minute, I've now not created a different thing, you know.
So it's got to be immutable, which is why it works for numbers, and you could do it for strings and things like that.
Okay, cool. Yeah, pretty cool.
Something else that's really cool is Datadog.
So thank you, Datadog, for sponsoring
this episode. Let me ask you a question.
Do you have an app in production
that's slower than you like? Is its
performance all over the place, sometimes fast,
sometimes slow. Now here's an
important question. Do you know why?
With Datadog you will.
You can troubleshoot your app's performance
with Datadog's end-to-end tracing.
Use the detailed flame graphs
to identify bottlenecks and latency
in that finicky app of yours.
Be the hero
that got the app back on track
with your company. Get started today
with a free trial at
pythonbytes.fm
slash datadog.
Awesome.
Thanks, Datadog.
You know what else is awesome?
What is awesome?
Pip installing a thing
that when I pip install something
and it happens right away
and it's not like
30 seconds of compile time,
like, say, uWSGI is,
to get the thing installed
and I don't have to have
like MSBuild
or VCVarsBat
set up right
or whatever.
Yeah. So I'm definitely grateful for wheels.
There were fewer wheels in the world when we started this podcast.
I'm pretty sure.
Yep.
Most of the common packages, a lot of them have migrated to distributing wheels.
And package authors have had to care about this a lot. And so I want to talk about this article that's on the RealPython blog from Brad Solomon called
What Are Python Wheels and Why Should You Care?
One of the things I really love about this is, like I said, a lot of package authors
have already gone through this and understand some of the ramifications.
But as a normal, casual user of pip install,
we don't really think about it.
But the first half of this article talks about
kind of what the user's perspective is,
and it's kind of a nice look.
When you say pip install something,
and it's cool because as an example,
I'm glad they list an example,
and it's a particular version of uWSGI,
because most packages are
wheels now. But if you install something that is not a wheel, it's probably a tarball. I don't
know if there are other options than tarballs, but anyway, a tarball is something that
ends in .tar.gz, so it's tarred and gzipped, and that's a whole bunch of Unix speak that you don't
really have to care about. But it downloads this blob of stuff and
then unpacks it, and then pip
calls setup.py and some other
stuff to build the wheel after you
download it, and then it
labels it, and then it installs
it. There's a whole bunch of steps in there,
plus it's calling setup.py,
so there could be really any code in there,
and so that's kind of creepy.
The difference is, often,
if you actually have a wheel instead of the tarball, pip install will just pull this down and
install it, and it doesn't call setup.py. That's really nice, actually, because one of the things I think a
lot of people don't realize until they're like, oh wait, what just happened? When you pip install
something, you're running semi-arbitrary code off of the internet.
That's not ideal.
Right.
With the wheels, you don't have to run it, because basically that runs the setup.py in the sdist version, I believe.
So this is really nice that wheels can cut out that Python execution bit.
It cuts that out.
Plus, also, I'm not sure what the technology is here.
I think it's probably just that it's already precompiled and there are operating system specifics.
But wheels tend to be smaller than the tarballs, so they download a lot faster.
Wheels have a bunch of stuff in the name.
And it's not just random stuff.
It's specific stuff.
But it talks about what distribution it is.
It's got the version number.
It's got, like, maybe build identifiers and which Python it's for.
If it's a Python 2 versus Python 3 or a specific version.
And then the platform is one of the important bits.
So if you have compiled code, then
there's kind of a different CI pipeline to try to build all those
wheels. But on the user end, we don't have to care about it.
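To make the "stuff in the name" concrete, here is a small sketch that splits a wheel filename into its dash-separated fields: distribution, version, an optional build tag, Python tag, ABI tag, and platform tag. The function name and the example filenames are made up for illustration, and this is not how pip itself parses names.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None  # the build tag is optional and usually absent
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in: " + filename)
    return {
        "distribution": distribution,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

# Hypothetical filenames: one with compiled code, one pure Python.
compiled = parse_wheel_filename("example_pkg-1.0.0-cp38-cp38-manylinux1_x86_64.whl")
pure = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")
print(compiled["platform"], pure["platform"])
```

The platform field is the part that differs per operating system, which is why a package with compiled code tends to publish a whole list of wheels.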
So one of the different things,
one of the interesting bits about moving towards wheels
is there's often a whole bunch of packages up there.
And that's something that users will see
if they look at what downloads are available.
There'll be this whole slew of stuff.
And for the most part, you don't have to care about that.
If you do pip install, it'll just pick the right one for your operating system.
However, it's good to be aware of those, because if you are creating a cache of stuff at your office or something, you may want to cache more of those depending on what operating systems are being used around.
So that little discussion I think is pretty cool. Absolutely. Anyway, I'm not going to get too much into it. This
is a good article for, yeah, I use wheels, but what are they? It doesn't get too deep
into it, but it's nice. Yeah, well, wheels are definitely nice, and another solid article from
Real Python. So very nice. You know what else is good? Pandas. I've heard that pandas
does a lot of cool stuff. Now, actually, pandas is really, really cool. You can do a whole bunch of
interesting things with it. And Jack McKew, he's been on fire lately. He's created all these different
projects that he keeps sending over, and I'm like, oh, is this not the one from before? He's like, no, this is another
one I created. And a lot of them are cool. One of the things he created was Awesome Python Bytes,
so a hat tip to Jack on that. That's cool, like all the awesome stuff that we happen to have covered periodically.
But this one is called Pandas_Alive. To get the experience of this one, you need to open
it up and just scroll through the readme on the GitHub page and just look at the animations. So
you've probably seen these racing histograms or racing bar charts that show stuff happening
over time, like here's the popularity of web browsers all the way back from 1993, where it was
Mozilla and then Netscape and then IE and then, you know, whatever, and you see them growing
and moving over time. So this is a package where, if you have a pandas data frame in a really
simple format, where the columns are basically the different
things you want to graph, and they're all arranged by a common date, and they just have
numbers, you can turn that into a really cool bar chart race type of thing or line graph race,
where it's just this animation of those over time, over the dates that you have in there. Oh, I really
like this. Isn't this cool? Yeah, and, I
mean, the race charts and stuff, those are cool, but then you can also do the
line graphs, like growing, zooming. Yeah, yeah, you can do line graphs and you can do other
types of things, little scatter plot type things. You can also do pie charts, but you can even
have them together. So you can have maps.
So if you want to have a map evolving over time with different countries or counties
fading in and out, you could have those two graphs animated side by side at the same time.
So you could have the chart of the bars as well as the map all animated together in one
graph.
Cool.
Seems pretty awesome.
Well done, Jack.
It's based on, I believe, Matplotlib.
And basically it'll render a bunch of different Matplotlib renderings
into an animated GIF.
So all you have to do is just go like dataframe.plot_animated,
give it a file name, and then this happens.
Oh, that's cool.
So then you can just generate this GIF and then put it wherever.
Exactly.
You can put it on your website.
You can put it wherever you want. You could share it on your website. You can put it wherever you want.
You could share it on Twitter, I guess, even.
Right?
But it doesn't require like a JavaScript backend running something
and your Jupyter notebook and then all that kind of stuff to wire up.
Like, no, it's just an animated GIF that comes out.
Neat.
This is mesmerizing.
I could just watch these all day.
You could watch it for quite a while.
So, yeah.
Anyway, I really think that's a cool project if you want
to visualize data over time, which, you know, there are a lot of good reasons to do. One of the
things it has there is animated maps, but maps are something else also. There's also a map function,
which has nothing to do with geographic maps. You probably learned Python a long time ago, but
do you remember being surprised by map at all? Yeah, map and all those things, they always
confuse me, and I've always tried to basically
avoid them.
And I've successfully mostly done that.
But I know also, yeah, yeah, I also
know how useful they can be, so tell us about it.
This is an article from Kathryn
Hancox, How to Use the
Python Map Function, and
I'm sure people have heard of
the map function.
It's an extremely useful thing. It's a built-in, and what it does, if you're
not familiar with it, is it takes two or more parameters. The first parameter to map is
the function that you want to apply, and then, let's say, you give it as the second argument
an iterable, like a list or something. It takes that function that you passed in and applies it to absolutely every element of the iterable. You might use it to apply some quick thing, like if I want to do x
times two or x squared or something like that, and apply that to every element, you can do that, and
you can make one list into another. I think it's good for people to read about them every
once in a while if they're not using them often, because they do come in handy in places.
All the time, for me at least. So it's not an obvious thing
if you're not used to this sort of a function
from other languages. I wasn't
coming from C and maybe
Perl has something like this but I never used it.
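A short sketch of that basic use: pass a function and an iterable, and get each element transformed. The sample list is made up, and `list()` is only there to show the results all at once.

```python
numbers = [1, 2, 3, 4]

# map(function, iterable) applies the function to every element.
doubled = list(map(lambda x: x * 2, numbers))
squared = list(map(lambda x: x ** 2, numbers))

print(doubled)
print(squared)
```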
So that's the normal use of
applying it. One of the things I like
about this tutorial is it goes through a
few different things. So
applying lambdas to a list or
an iterable, and then the function you
apply doesn't have to be a lambda. It could be your own user-defined function, or it could be
a built-in function that you map to it. I want to warn people, the part where she's talking about
the user-defined function, it's oddly complex for some reason. I'm not sure why this was made so
complex, because user-defined functions just work like any other function that you're using for map. But one of the things that I got out of it is I had forgotten that map applies the function to the iterable one element at a time, and it doesn't do it ahead of time. I was like, really? And I had to prove it to myself by putting a print
statement or something in a function. But what happens is, let's say I've got an
iterable hooked up to grab a huge data chunk out of a stream or something. I can apply some
function to each element as I'm pulling it out, using map to do that. So I can iterate over map. Map returns a map object, which,
whatever, it doesn't matter. It's just that every element you get, if you use it as an iteration,
is the answer after you apply the function. It's like a custom generator type thing. Yeah, yeah. And
then if you want it as something solid, you can convert it to a list or a tuple or something
like that if you want to do
everything. I'm done with generators, throw it in a list.
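The "prove it to myself" experiment can be sketched like this. Instead of a print statement, the hypothetical `tracked_double` function records each call in a list so it is easy to see when the work actually happens.

```python
calls = []

def tracked_double(x):
    """User-defined function that records each call so we can see when it runs."""
    calls.append(x)
    return x * 2

lazy = map(tracked_double, [10, 20, 30])
calls_before = list(calls)    # snapshot: nothing has been called yet

first = next(lazy)            # pulling one element runs the function once
calls_after_one = list(calls)

rest = list(lazy)             # "throw it in a list" consumes the remaining elements

print(calls_before, first, calls_after_one, rest)
```

Note that a map object can only be consumed once, so `rest` holds just the elements that were not already pulled out with `next`.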
There's some
honesty here too. The other thing I
often forget about map
is that you can map it across, if you have
a function that takes multiple
arguments, you can pass it
multiple iterables and it'll take
element-wise each one.
So like the nth element out of each
list, and pass those to the function, and then return the answer to that, which is cool.
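A quick sketch of the multiple-iterable form, using the built-in `pow` since it takes two arguments. The lists are made up, and the comprehension at the end is the equivalent spelling with `zip`.

```python
bases = [2, 3, 4]
exponents = [5, 2, 3]

# Calls pow(2, 5), pow(3, 2), pow(4, 3): the nth element of each list.
powers = list(map(pow, bases, exponents))

# With lists of different lengths, map stops at the shortest one.
short = list(map(pow, [2, 3, 4], [2]))

# The same element-wise result written as a list comprehension.
powers_comprehension = [b ** e for b, e in zip(bases, exponents)]

print(powers, short, powers_comprehension)
```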
The other thing, a good comment in this, because it's a similar problem area, is that comprehensions
kind of do the same thing. So when would you use map versus a comprehension? And the advice in this article is
comprehensions are very useful for smaller datasets,
but often for large datasets,
map can be more powerful.
So that's reasonable.
And sometimes you want to do operations
that if you had to go over different collections of data
would make a really nasty looking comprehension and stuff.
So, yeah, cool.
You also can do like pandas type of things a little bit,
like multiplying vectors, right? Like if I've got two lists and I want to have the pieces put together,
like that power example that's in there, right? It'll take the first element of the first one
and the first element of the second one, and then apply the function and generate a
new list, effectively, as if you had sort of done vector multiplication, which is cool. Or like cross, I don't know, cross multiplication.
Yeah.
I often use map also when I want to muck with something
and it seems a little cleaner to me to iterate through something.
If I know I'm looking for something
and I'm not going to get the end of the data
or I'm using endless data.
Nice.
So we spoke earlier about databases
and I've got another one for us.
This cool thing called AutoMigrate.
It's a project called AutoMigrate.
Okay.
So what it does is it's kind of like you talked about Django migrations,
and we also have SQLAlchemy migrations with Alembic.
But some people, either they're not using an ORM at all,
in which case those tools are useless,
or they want to very carefully write the SQL scripts that control their databases.
Some people, there's a group of DBAs that manage the database, and that's that.
We're not going to run just random tooling against the database.
We're going to run scripts that are very carefully considered. So this AutoMigrate thing, what it will do is, if you have those DDL, data definition language,
scripts that say create table, add column, and so on, all you have to do is have the script that
says, here's how we create something from scratch, and you put that into GitHub. And then you
make changes to it.
Like to add a column,
I go and edit the create table thing and I just type in the new column in
there.
And what this will do is it'll look at your git history and it'll do diffs
on the create table statements and it will generate the migration scripts
from that.
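To give a feel for the idea, here is a deliberately naive sketch that compares two versions of a create table statement and emits the ALTER TABLE statements between them. This is not the project's implementation, every function name here is made up, and the comma splitting would break on types like numeric(10,2). It also takes the two versions as plain strings rather than reading them from git history.

```python
import re

def parse_columns(create_sql):
    """Return {column_name: definition} for a very simple CREATE TABLE statement."""
    body = create_sql[create_sql.index("(") + 1 : create_sql.rindex(")")]
    columns = {}
    for part in body.split(","):
        part = part.strip()
        if part:
            name, _, definition = part.partition(" ")
            columns[name] = definition.strip()
    return columns

def table_name(create_sql):
    """Pull the table name out of 'CREATE TABLE name (...)'."""
    match = re.search(r"create\s+table\s+(\w+)", create_sql, re.IGNORECASE)
    return match.group(1)

def diff_to_migration(old_sql, new_sql):
    """List the ALTER TABLE statements that turn the old table into the new one."""
    table = table_name(new_sql)
    old_cols = parse_columns(old_sql)
    new_cols = parse_columns(new_sql)
    statements = []
    for name, definition in new_cols.items():
        if name not in old_cols:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {definition};")
    for name in old_cols:
        if name not in new_cols:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements

# Two hypothetical versions of the same schema file, before and after an edit.
OLD = "CREATE TABLE users (id integer primary key, name text)"
NEW = "CREATE TABLE users (id integer primary key, name text, email text)"

for statement in diff_to_migration(OLD, NEW):
    print(statement)
```

A real tool needs a proper SQL parser and has to handle changed column types, indexes, and constraints, which this sketch ignores.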
Oh,
that's really cool.
That's neat.
Right?
So all you've got to do is maintain the,
here's how I create the database, and it'll actually go,
well, to go from this version to that version, here's the script that
would actually do it. It'll do all that stuff for you. Nice. Yeah, so if that's your flow, if your flow
is to work with these DDL files, these SQL files, this seems like a great tool. Now, they do say, oh,
this is way better than an ORM or something, because in those, like Alembic, what you
have to do is you have to go and write the migration scripts: here's how you migrate up,
here's how you migrate down. But they left out a little important thing, --autogenerate,
which looks at all of your classes and your database and goes, here's the difference, we
automatically wrote that for you. Which I think is way nicer even than this project. So I think
Alembic is better, but the big
requirement there is you are using SQLAlchemy. If you're not using SQLAlchemy to do these migrations,
but you're using these scripts instead to define your database, like I'm sure a
lot of, especially the larger companies where there's a database team or DBAs and so
on, are doing, then this seems like a really cool
project for it. That said, the converse is actually pretty cool. So what it can do is it can look at a
database and it will generate your SQLAlchemy files for you. That's pretty cool. That's nice.
Yeah, it'll generate ORM definitions from SQL, right, using the SQLAlchemy generator, which is
pretty awesome. So you can say, here are my
create table scripts,
generate me the corresponding SQLAlchemy
thing to match that.
So in that direction, it's pretty awesome
also. So which does that? This one,
this auto-migrate, it'll look at your
DDL, like create these table scripts, and
it'll turn it into Python SQLAlchemy
classes. But the reverse,
it was saying like,
oh, it's painful to use Alembic in the other direction.
But if you use the auto-generate feature of Alembic,
then it's also not painful.
But there's certainly a couple of use cases that are pretty awesome here.
One, like starting from all the create stuff,
like given a database,
just ramp me up to getting a SQLAlchemy set of classes
that'll talk to it as quick as possible,
that's really cool.
Then, if I've got a schema change,
is there a version number
that's stored in the database somewhere
to say which version of the schema is being used?
Yeah, I have no idea about this thing.
With SQLAlchemy and Alembic,
there is a version number.
It says I'm version, I'm hash.
And then all the migrations
one of those is the hash, and each migration says the one that came before me is this and the one
that comes after me is that. So they can look at an existing database and say, you're version X? Yes,
exactly, for Alembic. I have no idea about this thing. This thing could potentially look at the
table, basically run, like, script this create table stuff for me, and then look at that
compared to what it has, potentially. I have no idea if it's that smart, though. Okay. Yeah, but it looks
like it could be handy for a lot of folks. Well, I've had a rough week, so I've got no extra stuff.
No extra stuff? No extra stuff. I don't have too much either. I have a little bit. I just want to
give a shout out that we have a ton of new courses coming, and I want to
just encourage people, if they're interested in these, to go to training.talkpython.fm slash get
notified and put their email there if they haven't created an account or signed up there before,
because we have Moving from Excel to Python with Pandas coming out, we have Getting Started
with Data Science coming out, we have Python Memory Management Tips coming out. All three of those will probably be out within a couple of weeks. And then Getting Started
with Git and Python Design Patterns as well. So there's a bunch of cool stuff. If you want to
hear about any of those, just be sure to get on the mailing list. Oh wow, that's cool.
If I didn't talk to you every week, I would totally get on this mailing list.
Awesome. But actually, I think I'm already on it. I'm sure
you are. Because you do talk to me, though, you get jokes, definitely. But everybody who listens gets them
also. That's right. This is a fun game to play. The idea is you take some actual legitimate classical
painting, and, you know, like if you go to an art gallery, it'll say, you know, Flowers in
Bloom, oil on canvas, Monet, 19, or, you know, 1722 or something like that, on a little placard
underneath. So the game is to reinterpret these paintings in modern tech speak. Okay. Yeah, so here,
I'll do the first one. I put three in the show notes that people
can check out. I'll describe this to you, then I'll read the little thing. So there's like a
ship that seems to be on fire with some extremely strong guys trying to drag the ship
out of the water, maybe. No, they're pushing it into the water, and a bunch of folks on the edge
sitting off. It's like a Viking ship. I think
they're actually cremating somebody, setting out. Anyway, it's this historical picture, and
the placard says: Engineers remove dead code after dropping a feature flag. Sir Frank Bernard
Dicksee, 1893, oil on canvas. You want to do the next one? Oh, sure. Pull it up. Oh, okay. Okay, how do I describe
this? This is a picture, it's a Picasso picture of, like, an abstract violin.
Yeah, yeah, it's hard to tell really what's going on. It kind of looks like a violin, and the title is CSS Without Comments.
That's good. Pablo Picasso, 1912. All right, the last one, the last one we'll do. By the way,
there's hundreds of these, and they're all really good. So this one is a little disturbing. There's a person
who looks deathly ill with a bunch of, like, gargoyles over them, and a priest with a crucifix, kind of glowing,
apparently trying to ward off the gargoyles. And the placard says: Experienced developer
deploys hotfix on production. Francisco Goya, oil on canvas, circa 1788. That's good. Yeah, so
there's just so many of these, you can go through them all day. It's really fun. Didn't PyCon do that once?
Like one of the PyCons?
I think you might have been with us.
I know Chris Medina, Kelsey Hightower, and I were walking around the Portland Art Museum.
Like basically playing this game.
We were like coming up with the placards.
It was fun.
And were you there for that?
You might have been.
No, I wasn't.
I missed that one.
But that was good.
I remember that when we could go to conferences.
If there were people around you, other people close?
It was weird.
Actually, we don't need anybody to contact us
and tell us that we have no idea when different painters were alive.
But thanks.
And cool, good for you if you know it.
Awesome.
Yeah, these are really good.
If you enjoy this kind of stuff,
there's hundreds of fun pictures to go through.
And I think it's also amusing that we often pick
visual jokes for an audio format.
So, sure, why not?
Do it hard. That's what burgers do.
That's right. Let's do it with abstract art.
Yeah, it's funny.
Anyway. Awesome. Alright, well, thanks, Brian.
Thank you. Bye.
Thank you for listening to Python Bytes. Follow the show on Twitter
at Python Bytes.
That's Python Bytes as in B-Y-T-E-S.
And get the full show notes at pythonbytes.fm.
If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
This is Brian Okken, and on behalf of myself and Michael Kennedy,
thank you for listening and sharing this podcast with your friends and colleagues.