Python Bytes - #30 You are not Google and other ruminations
Episode Date: June 15, 2017. Topics covered in this episode: Problems and Solutions are different at different scales; Introducing NoDB - a Pythonic Object Store for S3; Elizabeth for mock data; What's New In Python 3.7; Hypothesis Testing; Heroku switching default to v3.6.1; Extras; Joke. See the full show notes for this episode on the website at pythonbytes.fm/30
Transcript
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
This is episode 30, recorded on June 13, 2017. I'm Michael Kennedy.
And I'm Brian Okken.
And we have tons of good stuff to share with you today. I'm very excited about all of our items.
But first, we have a brand new sponsor. I want to say thank you.
That's exciting.
It's super exciting. So I want to welcome Datadog to Python Bytes as a sponsor.
And Datadog has this cool thing where if you do a little test integration with them, you
actually get a free shirt.
So I'll tell people how to get a free shirt later.
Wonderful.
Cool.
I heard that not everybody writes software like at the scale of Google.
Yeah, I think that's probably true, especially of me.
But there's a couple articles that I want to talk
about today. Actually, mostly one. There's an article by, I think it's Ozan Onay, which is a pretty cool name, called You Are Not Google. And he goes on to say that you're also not Amazon
and you're also not LinkedIn. Now, Google, Amazon, and LinkedIn all have applications that might look similar to normal folks' applications. But the scale is definitely different. So just
be aware of that. And I guess it's a reaction to people chasing a lot of the shiny new technologies
like asynchronous IO and other things. And I'm guilty of that as well. But
when looking at solutions, he presents a model called UNPHAT, I think that's how you say it, U-N-P-H-A-T, which is: try to understand the problem first, enumerate multiple possible candidate solutions, read papers or articles about the solution that you're going to try, look at the historical context of why the solution came to be in the first place, list out the advantages and disadvantages, and don't forget to think, make sure that solution really fits your problem. And anyway, I think it's a good discussion about how a lot of the architectures and stuff that people write about, and how Google and others do things, might not apply to you.
So that's just a heads up.
I think it's a super interesting article, and it's really interesting to think about computation at like a massive, massive scale,
like some of these data center things and the incredible failover and possibly like
global redundancy that some of these companies like Google operate at. And people read these
things and are like, oh, we're going to do our startup. And it's going to have, like, if we're lucky, we'll have 1,000 users after, like, three months. Well, maybe you don't need to architect around the same things that, say, Google or LinkedIn or Amazon are architecting around. Maybe you need to architect around,
let's get this thing working as fast as possible. And then we'll deal with the scale later. So
there's a quote that I think is pretty nice. It says: the thing is, there's like five companies in the world that run jobs as big as Google. For everybody else, you're doing all this
IO and fault tolerance and things you don't really need. It's really just this idea of like,
it's cool to study these patterns, but these patterns were created within a context of a
problem. Do you have that same problem? It's like the recurring theme. I really like it.
Yeah. There's also, even if you are huge, even if your problems are big, they still might not
be the same.
And one of the things he talks about is maybe the, like your data store, maybe the number
of writes is more important than how often it's read or vice versa.
So even at large scales, the problem might be different than somebody else's.
Yeah. For example, Amazon optimized for write tolerance on the database that backs their shopping cart.
But is that your primary concern?
If it is, maybe do what they're doing.
If it's not, then maybe that's not the best choice for a general-purpose database, right?
For example.
Yeah, yeah.
Anyway, it's a good read.
Speaking of databases,
oh, let's talk about your microservice one as well that ties into this.
Oh, just along the same line,
I ran across another article
that just is called Enough with Microservices,
and it's a similar type of article.
It's also well-written.
It's by Adam Drake, and we'll have a link up.
Mostly it's similar sort of discussion
about microservices
and dependencies and how that's a complexity that adds to your cost. So make sure that you're aware
of that before you try to jump into it. Yeah. I recently had Miguel Grinberg on TalkPython.
I haven't released that episode yet because I have like three months of backlog of stuff
recorded that's got to get out, but it's on its way. And it's a really good episode
around microservices. So if you're really thinking about microservices, check that out. It's really
enlightening. Actually, I learned a lot from talking to him. But I think one of the takeaways
is that switching to microservices makes your application simpler than it otherwise would be.
Instead of having one complex application, you now have, like, say, six very simple applications, but your DevOps and deployment and coordination and your infrastructure story gets way more complicated. So you're kind of pushing the complexity of your application around, and does it make sense to push programming complexity into infrastructure and DevOps complexity? I think it
depends on your organization, how many people are on your team, how complex is your app.
But certainly, a lot of small apps probably shouldn't be microservices.
It's interesting that you bring that up with what your organization looks like, because
there's a lot of small startups and small organizations and sometimes just individual people, where you really have to pay
attention to what you like to spend your time on and what your skills are.
Because like when you just said that, I was like, wow, I better be very careful about
that because DevOps is not my strong suit.
So exactly.
I'd rather have the complexity in my app and carefully factor that thing rather than push
it to a bunch of servers that coordinate.
Right.
Yeah.
Yeah. Yep.
Yep. Pretty, pretty interesting. So speaking of databases and things that are complex
and things that scale super high, let's actually not talk about a database with this thing called
NoDB.
Okay. I haven't heard of this at all. So tell me about NoDB.
So NoDB is a Pythonic object store that uses Amazon S3 as the backend. As a programming interface, it looks like
a simple NoSQL database. But what it's actually doing, instead of running a server or something, is talking to S3 and storing your objects there. So you can, like, insert into the database, and you configure, like, your connection string, if you will, for the database: here's my S3 account, and here's the bucket I want to store it into. Think folder.
And then you just like insert, query, update and delete from this database.
But what actually happens is it stores it over there.
And I believe the default is actually to pickle the Python object.
So you even get, like, type preservation. You could insert a customer and then, boom, out comes a customer with its functions and everything.
Okay, well, interesting.
Yeah, interesting, right? So this was done by Rich Jones. He released it in April, and it sort of ties in with some of the serverless architecture, right?
Like this is the guy that works on Zappa.
We talked about Zappa last time,
which lets you run web applications on Amazon Lambda,
which is already pretty interesting.
So this is like another, it's like you don't have a web server,
so maybe you get away with not having a database.
And it can handle a decent amount of load,
but it's not like a full-on super database.
It's more like for prototyping and things like that.
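To make that concrete, here's a minimal sketch of what using NoDB looks like, based on my reading of the project's README at the time; the bucket name is made up and the exact attribute and method names may differ a bit between versions.

```python
from nodb import NoDB

# Point the "database" at an S3 bucket you own (this bucket name is made up).
nodb = NoDB()
nodb.bucket = "my-nodb-bucket"
nodb.index = "name"   # which field of the object to use as the lookup key

# Save a plain Python object; by default it gets pickled and written to S3.
user = {"name": "brian", "likes": ["pytest", "python"]}
nodb.save(user)

# Later, load it back by the indexed value.
loaded = nodb.load("brian")
print(loaded["likes"])
```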
Okay, cool.
Yeah, so some of the examples he said it might be good for
is prototyping, like I said,
but also storing API event responses for like replay.
So if you are doing microservices
and you want to store all the traffic
that goes back and forth,
you could just do that here really easily,
capturing logs, simple data, like, hey, add your email to this thing.
One of the more interesting things is if you're doing Lambda, AWS Lambda, you can have triggers
that call the function based on S3 events.
This file was changed.
This bucket had a new thing added.
And what that means is you could insert into the database and it would call an AWS Lambda
function as a result of that.
So you could like insert this thing and the act of storing it also kicks off some action.
Oh, neat.
That's actually pretty cool.
Yeah, it's pretty neat.
Plus the article has a nice picture of a fish skeleton.
I like,
yes.
Yeah.
Pictures are important.
The one that you're talking about next also has a cool picture.
Cool logo.
Yeah.
I've heard of Elizabeth a few times.
Yeah.
I do have to admit that the logo did bring me into this a little bit.
So,
yeah,
we talked about Faker before,
which would let you
create like test data that looked real, like give me an address, give me an email, things like that.
This is like a competitor to Faker, huh? Yes. And, so, if you haven't listened to the Faker episode, that was episode 25. I looked it up this time. It's definitely a competitor to Faker. They even have some comparisons on their project page, and it looks like one of the main features that they're going for with Elizabeth is performance. Apparently,
it's faster than Faker. Yeah, Faker's kind of slow. I mean, Faker's really nice, but it is...
I tried to generate a database that had like a couple million entries with Faker, and it was a
little... It took a while, let's just say. Yeah. Yeah. I haven't tried anything huge.
I wonder, I'm curious as to how Elizabeth compares, but it's definitely in a similar space. I think it's just another project. Maybe it fits better for your project. And the articles were really well written.
So we're linking up a two-part Medium article, and it looks like the same person also wrote a pytest plugin, and the pytest plugin is actually pretty darn cool. It allows you to, within a test, bring in different parts of the fake data as a fixture.
Yeah, that's really cool. Yeah,
it definitely looks nice. I feel like it's in some ways complementary to Faker. I'm not sure you would use both in the same thing, but you get slightly different data, so depending on what you're after, one or the other may be better. It's a slightly different model of how to pull the data out, so I think it's good for people to try both and see which style works best for them. It also does different localizations as well.
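For a feel of the two styles, here's a rough sketch; the Faker calls are the standard API, while the Elizabeth provider and method names are from memory of its docs at the time, so treat those as assumptions. Faker hangs everything off one object, Elizabeth groups data by provider class, each taking a locale.

```python
from faker import Faker
from elizabeth import Personal, Address  # provider names assumed from the docs of the era

# Faker: one object, lots of methods, locale picked up front.
fake = Faker("en_US")
print(fake.name())
print(fake.email())
print(fake.address())

# Elizabeth: separate provider classes, each taking a locale code.
person = Personal("en")
print(person.full_name())
print(person.email())
print(Address("en").address())
```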
Yeah. The localization always is pretty impressive to me, actually.
Yeah. I wouldn't want to try to do that project myself, but I'm glad it's around.
I'm glad it exists.
Yes.
Are you ready to hear about how to get a free t-shirt?
I really want a free t-shirt, actually.
Actually, yeah. So Datadog, those guys came along and said, hey, we'd love to sponsor and support the show and get the word out about our project that we got here.
So Datadog is, we've talked about Rollbar before, right?
And Rollbar monitors your application for errors.
Well, Datadog kind of does something a little bit similar on a grand scale.
So Datadog will look at your application and all the layers of infrastructure on it.
Let's suppose we have, say, a Flask app.
We could integrate Datadog
and it will give us metrics about that Flask app,
but it'll also tell you about the Nginx web server
and your database and the Linux machine that it's running on
and basically the entire stack
of your application from the servers, the database servers, the web server, all those things, and
put all that stuff together. So you can have a really holistic view of what you're doing.
And you can even integrate it with all these different things. It'll integrate with things
like AWS, it integrates with Rollbar if you use those guys.
It integrates with many, many different things that you might already be using.
So it's super powerful.
It integrates with Postgres, with MongoDB, and so on.
So very, very cool.
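As a rough idea of what the Flask integration looks like, here's a sketch using Datadog's ddtrace library; the middleware class, import path, and signature are from memory of the docs around this time, so treat them as assumptions and check the current docs before copying anything.

```python
from flask import Flask
from ddtrace import tracer
from ddtrace.contrib.flask import TraceMiddleware  # import path assumed from docs of the era

app = Flask(__name__)

# Wrap the app so each request is timed and reported to the local Datadog agent.
TraceMiddleware(app, tracer, service="my-flask-app")

@app.route("/")
def index():
    return "Hello from a traced Flask app"

if __name__ == "__main__":
    app.run()
```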
Companies like Zendesk and Salesforce and even PagerDuty use it.
If you haven't heard of Datadog, if you haven't tried it, go to pythonbytes.fm slash datadog.
And they've got this little thing you
try it out and you get a free t-shirt. So pythonbytes.fm slash datadog. Support the show
and get a shirt. I think this shirt's cute also. Yeah, yeah, it's nice. So thanks, Datadog. And
you know what? Let's talk about what's coming in Python. I feel like my next two items actually
are both sort of future looking Python things.
So I feel like we just talked about Python 3.6, didn't we?
We've been talking about it since the beginning, yeah.
Yeah, it's been out for, I guess it's been out for a while now.
And so they're starting to talk about what's coming in Python 3.7.
Okay.
I haven't looked at all, so I'm interested.
Yeah, I kind of wanted to highlight that.
There's a whole bunch of things that I put here that are interesting. Two that I think are really, like, super interesting, and I'll just touch on the other ones. The first one is an optimization. Okay, so Python works by having a bunch of opcodes and then interpreting those opcodes in this, like, giant switch statement in this file called ceval.c. It's basically a loop and a switch statement, and it looks at the opcodes and figures out what to do. So they've added two new opcodes, LOAD_METHOD and CALL_METHOD. And it allows them to skip instantiating a few objects. And it results in method calls in Python 3.7 potentially being 20% faster than in Python 3.6.
Oh, cool.
So one of the big sort of trade-offs that you make in Python is function calls are relatively expensive compared to other operations.
And we obviously want to write smaller functions and break our code apart for usability and readability.
But that can make things slow.
So having faster functions can actually make a really big difference in Python.
Okay, neat. So 3.6 optimized dictionaries a lot, and we might optimize function calls in 3.7.
Yeah, absolutely. Absolutely. So there are some new things in the modules, like there's a new remainder function in math. And there's the dis module, which is the disassembly module. If you haven't done this, it's pretty cool. You can say import dis, and then, I think it's dis.dis. And you give it, like, a function or a class or something, and it'll show you the opcodes, kind of like that LOAD_METHOD and CALL_METHOD I was talking about.
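If you want to poke at this yourself, here's a tiny sketch; the exact bytecode you see depends on the interpreter version, since LOAD_METHOD and CALL_METHOD only show up on 3.7 and later, and math.remainder only exists there too.

```python
import dis
import math

# New in 3.7: math.remainder(), the IEEE 754-style remainder.
print(math.remainder(7, 3))   # 1.0
print(math.remainder(8, 3))   # -1.0, because 8 is closer to 3 * 3 = 9

class Greeter:
    def greet(self, name):
        return "Hello, " + name

def call_it(g):
    # A plain method call: on 3.6 this compiles to LOAD_ATTR + CALL_FUNCTION,
    # on 3.7 to the new LOAD_METHOD + CALL_METHOD pair.
    return g.greet("world")

dis.dis(call_it)   # prints the bytecode for call_it
```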
Another really interesting thing that's coming in 3.7 is async context manager.
So a context manager is a thing you can use in a with block, right?
Like a file handle, database transaction, those types of things.
Well, you can have asynchronous context blocks.
And this async context manager lets you basically make the setup and teardown steps in those context managers asynchronous, which is pretty cool.
Oh, that's cool.
Yeah.
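Here's a minimal sketch of the shape of that, using the new contextlib.asynccontextmanager decorator from 3.7; the connection object is a stand-in so the example runs on its own.

```python
import asyncio
from contextlib import asynccontextmanager  # new in Python 3.7

class FakeConnection:
    """Stand-in for something with async setup/teardown, like a DB connection."""
    async def close(self):
        await asyncio.sleep(0)  # pretend to tear down asynchronously

@asynccontextmanager
async def open_connection(host):
    await asyncio.sleep(0)      # pretend the setup itself awaits something
    conn = FakeConnection()
    try:
        yield conn              # the body of the async with block runs here
    finally:
        await conn.close()      # asynchronous cleanup

async def main():
    async with open_connection("example.com") as conn:
        print("got connection:", conn)

asyncio.run(main())             # asyncio.run() is also new in 3.7
```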
One more that's kind of out there is that now functions can have more than 255 arguments. Apparently that was a limit that was bothering
someone. And they said, well, let's make it possible for functions to have more than,
you know, like 300 arguments because 250 wasn't enough.
Yeah, I run into that all the time.
I do too.
It's really frustrating.
Why would you need that?
I have no idea, especially when you've got star args and star star kwargs.
So anyway, it's now a thing or it's going to be a thing in 3.7.
Yeah, interesting. It looks like you wrote down bytes.fromhex and bytearray.fromhex.
Yeah, so those are conversion functions that will parse hexadecimal strings into bytes. And the change is that they used to raise an error if there was whitespace other than plain spaces in the string, which really didn't affect what the value was, but they wouldn't accept it. So now they basically ignore all the ASCII whitespace for you. And so they're a little more tolerant of inputs.
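A quick sketch of the difference, with the version behavior as I understand the changelog; worth double-checking on your own interpreter.

```python
# Hex copied out of a multi-line dump: the newline makes 3.6 raise ValueError,
# while 3.7 ignores all ASCII whitespace and parses it cleanly.
data = bytes.fromhex("deadbeef\n  cafe")
print(data)  # b'\xde\xad\xbe\xef\xca\xfe'
```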
Okay, cool. That'll matter for some people.
More tolerance is always good in my opinion.
Yeah.
I would love it if there was like an army of people or things that could go test my code and find errors for me.
Yeah.
Well, I was really glad.
So there's an article called Unleash the Test Army.
It is about Hypothesis.
And I'm glad this came around because, since I talk about testing a lot, I get questions about Hypothesis a lot.
And I have never used it.
I know that you've had... David MacIver?
Yeah. I think you've had him on the show.
Yep, I have. On TalkPython episode 67.
Oh, you're ready too. Did you look that up?
No, I was talking about it last night, actually.
It's somebody's experience with working with Hypothesis. It's a good introductory article to kind of tell you what
it is. So Hypothesis is a testing framework that will really just come up with a lot of different ways to throw data at your code. You set it up so that it throws different data at your code. And it's more of a
unit test type thing, I think. You have to define the input and output of your functions and whatnot to
make it work. It's really pretty quick about being abusive and getting at where the problem areas
might be. This is the first article that I've read that kind of explained how to get into it quickly
because hypothesis doesn't look like something that you can really just pick up right off the
bat, but this is a short introduction. One of the things I like is at the end, he talks about his conclusions with working with it.
And one of the conclusions he came up with is that it forced him to pin down his function specifications and really to consider special cases.
So really think about the interface to the function you're going to test.
What are the good parameters?
What is the expected behavior?
And what are the bad outputs?
And what do those look like?
Making you think about your interfaces is a good thing.
So if hypothesis helps people think about interfaces, great.
Yeah, I think it's really, hypothesis is interesting.
I haven't had a chance to do a ton with it,
but basically instead of choosing examples like,
well, let's see what's an edge case.
If the register value is false
and the email address looks like this and the price looks like that. That seems like a good
example. Let's pass that to my test and see what happens, right? So instead of doing that, you can
go to hypothesis or just write a regular test, but then add on to it this decorator that says,
okay, that thing is like an email address. That thing is a Boolean.
And these are some numbers.
Here's their range.
Go after it.
And it'll just do a bunch of different examples and record which examples worked and which ones failed and things like that, and store that in a file.
And it's pretty cool.
It can find those edge cases and other things you might forget about.
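To make that concrete, here's a small sketch of what a Hypothesis test looks like; the function under test is made up, but the @given decorator and the strategies are the real API.

```python
from hypothesis import given, strategies as st

def apply_discount(price, percent):
    """Made-up function under test: price after a percentage discount."""
    return price * (1 - percent / 100)

@given(
    price=st.floats(min_value=0, max_value=10_000),
    percent=st.integers(min_value=0, max_value=100),
)
def test_discount_never_increases_price(price, percent):
    # Hypothesis generates lots of (price, percent) pairs, leaning on edge
    # cases like 0 and the bounds, and shrinks any failing example it finds.
    assert apply_discount(price, percent) <= price
```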
And this example of kind of do it in an interactive way, like you're not really sure how you should test your,
I mean, you've written some tests,
but you're not really sure what inputs to throw at it,
which test cases,
and making you think about where the edges are
and the different corner cases.
I think that's a good thing.
That is a good thing.
The edges and corner cases are a super important part
of unit testing, I think.
Yeah, I'm still trying to figure out exactly what level of the development process and what level of testing this makes the most sense at.
But there's definitely algorithmic pieces in your code that might be a little confusing.
I don't think this would make sense to throw at every unit test in your system, but there's definitely places where this would make sense.
Yeah, well, it's cool. People should check it out, and it's an approachable article for sure.
The last thing is one of these Python versus legacy Python things, and chalk up one more win
for Python. So most people have heard of Heroku. Heroku is a platform as a service cloud provider.
Kenneth Reitz works there, for example. So his unofficial title is something to the effect of like
Python Overlord at Heroku.
That's like on his business card or something.
And so anyway, he and the crew there basically make it
so you can say, here's my app and here's my requirements.txt.
Run this, please.
And until recently, the default has been
when you say run this Python app,
it's like, cool, you mean 2.7, right?
And you could run it on Python 3,
but you had to like configure it explicitly.
If you said nothing, it ran on Python 2.
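For reference, the way you pinned the version explicitly, at least with the Python buildpack as I understood it at the time, was a runtime.txt file at the root of your repo whose entire contents are a single version line like the one below; leave the file out, and after this change new apps pick up the 3.6.1 default.

```
python-3.6.1
```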
The big news is on, what is that, June 20th, 2017,
Heroku is switching the default to Python 3.6.1.
Wow.
So, hooray for Python 3.
So now if you go to Heroku and you say run this,
it's going to be like, awesome, Python 3, right?
That's what you wanted.
And so this thing that I'm linking to
basically links over or displays their blog post.
And their blog post is super short that talks about it.
It just says basically what I said, effective Tuesday.
The default runtime is now Python 3.6.1.
Yeah, so if you've already got a job running there, it won't switch, right?
Exactly, no.
It is only for new projects.
So in the Reddit thing, there's a few interesting quotes.
Somebody said, lots of new projects start out on Heroku all the time,
so this is really great news for Python 3 adoption.
Someone else said, Python 3 is really happening.
Yay!
I was actually a little worried about the future of Python for a while, but I feel like it's all downhill from here.
Yeah, apparently people that don't listen to our podcast.
That's right.
Our listeners know better.
I mean, there's a lot of these examples, right?
We've got all the new frameworks that are exciting.
We also have Django 2 dropping support for Python 2, and ironically those numbers match up, but the newest version of Django is going to be Python 3 only, and things like that. It's really starting to, you know, pick up speed.
Yeah. One thing about that comment there that was interesting is that a lot of new projects start out on Heroku. So it must be people starting out a project and then later grabbing different server
solutions or something. I haven't done a lot with Heroku to be honest, but I think it's really
simple to basically just wire up a git repository, do a push to it, and it'll just start running your app magically. So it's really, really easy to get started. And then maybe as you grow, maybe, like, costs become a concern or you just want more control or whatever, but it's super easy to get started.
And however you get started
on whichever version of Python
is probably where you're going to stay.
So that's good news.
Yeah, great.
Well, cool.
Yeah, very cool.
And that's it for the news, Brian.
You got anything else you want to share?
No, no.
So, wow, number 30 in the can almost.
30, yeah, that's awesome.
I'm finishing up the last chapter this week, chapter 7 for Python testing.
So that's going to be done soon.
Yeah, yeah, very, very cool.
One of these days, the book will be a thing that you've done in the past instead of a constant job of yours.
Yeah, yeah.
And hopefully, I can't wait until it's an actual physical copy.
So it'll be good to have a stack of copies with that. Yeah, that's awesome to hear you're
making progress. And so thanks for covering this news with me. How about you? Do you have like now
four months of podcasts ready? I have about three months of podcasts that I've recorded. I'm going
to go on vacation for a while in the later half of the summer. So I'm trying to make sure that everything is going to be smooth, no interruptions. And so I have, I think, 13 to
14 episodes of TalkPython already recorded. There's tons of interesting stuff. I'm really
looking forward to sharing. I don't want to hold it back, but I've got to dole them out week over
week or it won't solve the problem. How about this? And as for this podcast, if we,
we haven't really decided yet,
but if we do a break,
we'll definitely let people know before that happens
so that they're not just hanging out there waiting.
Yeah, absolutely.
We'll try to,
we'll try to keep it rolling,
but we might,
we might miss a week or two
with some,
some trips there.
Okay.
In the summer.
All right.
Well,
thanks for sharing your news with everyone
and thank you to Datadog.
Get your t-shirt, pythonbytes.fm slash Datadog. Thanks, Brian. See you next week.
Thank you. Yep.
Get the full show notes at pythonbytes.fm. If you have a news item you want featured,
just visit pythonbytes.fm and send it our way.
We're always on the lookout for sharing something cool.
On behalf of myself and Brian Okken,
this is Michael Kennedy.
Thank you for listening and sharing this podcast with your friends and colleagues.