Python Bytes - #30 You are not Google and other ruminations

Episode Date: June 15, 2017

Topics covered in this episode:
Problems and Solutions are different at different scales
Introducing NoDB - a Pythonic Object Store for S3
Elizabeth for mock data
What’s New In Python 3.7
Hypot...hesis Testing
Heroku switching default to v3.6.1
Extras
Joke

See the full show notes for this episode on the website at pythonbytes.fm/30

Transcript
Starting point is 00:00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 30, recorded on June 13, 2017. I'm Michael Kennedy. And I'm Brian Okken. And we have tons of good stuff to share with you today. I'm very excited about all of our items. But first, we have a brand new sponsor. I want to say thank you. That's exciting. It's super exciting. So I want to welcome Datadog to Python Bytes as a sponsor. And Datadog has this cool thing where if you do a little test integration with them, you
Starting point is 00:00:31 actually get a free shirt. So I'll tell people how to get a free shirt later. Wonderful. Cool. I heard that not everybody writes software like at the scale of Google. Yeah, I think that's probably true, especially of me. But there's a couple articles that I want to talk about today. Actually, mostly one. There's an article by, I think it's Ozan Onay, which is a
Starting point is 00:00:53 pretty cool name, called You Are Not Google. And also he goes on to say that you're also not Amazon and you're also not LinkedIn. Google, Amazon, and LinkedIn may all have applications that look similar to normal folks' applications, but the scale is definitely different. So just be aware of that. And I guess it's a reaction to people chasing a lot of the shiny new technologies like asynchronous IO and other things. And I'm guilty of that as well. But when looking at solutions, he presents a model called UNPHAT, I think, or maybe it's pronounced un-fat. I'm not sure. U-N-P-H-A-T, which is: try to understand the problem first. Enumerate multiple possible candidate solutions, read papers or articles about the
Starting point is 00:01:48 solution that you're going to try. Look at the historical context of why the solution came to be in the first place, list out the advantages and disadvantages, and don't forget to think, make sure that solution really fits your problem. And anyway, I think it's a good discussion about how a lot of the architectures and stuff that people write about, and how Google and others do things, might not apply to you. So that's just a heads up. I think it's a super interesting article, and it's really interesting to think about computation at like a massive, massive scale, like some of these data center things and the incredible failover and possibly like global redundancy that some of these companies like Google operate at. And people read these
Starting point is 00:02:39 things and are like, oh, we're going to do our startup. And it's going to have like, if we're lucky, we'll have 1000 users after like three months. Well, maybe you don't need to architect around the same things that, say, Google or LinkedIn or Amazon are architecting around. Maybe you need to architect around, let's get this thing working as fast as possible. And then we'll deal with the scale later. So there's a quote, and it's pretty nice, that says the thing is, there's like five companies in the world that run jobs as big as Google. For everybody else, you're doing all this IO and fault tolerance and things you don't really need. It's really just this idea of like, it's cool to study these patterns, but these patterns were created within a context of a
Starting point is 00:03:19 problem. Do you have that same problem? It's like the recurring theme. I really like it. Yeah. There's also, even if you are huge, even if your problems are big, they still might not be the same. And one of the things he talks about is maybe the, like your data store, maybe the number of writes is more important than how often it's read or vice versa. So even at large scales, the problem might be different than somebody else's. Yeah, for example, Amazon optimized for write tolerance on the database that backs their shopping cart. But is that your primary concern?
Starting point is 00:03:56 If it is, maybe do what they're doing. If it's not, then maybe that's not the best database for a general database, right? For example. Yeah, yeah. Anyway, it's a good read. Speaking of databases, oh, let's talk about your microservice one as well that ties into this. Oh, just along the same line,
Starting point is 00:04:10 I ran across another article that just is called Enough with Microservices, and it's a similar type of article. It's also well-written. It's by Adam Drake, and we'll have a link up. Mostly it's similar sort of discussion about microservices and dependencies and how that's a complexity that adds to your cost. So make sure that you're aware
Starting point is 00:04:34 of that before you try to jump into it. Yeah. I recently had Miguel Grinberg on TalkPython. I haven't released that episode yet because I have like three months of backlog of stuff recorded that's got to get out, but it's on its way. And it's a really good episode around microservices. So if you're really thinking about microservices, check that out. It's really enlightening. Actually, I learned a lot from talking to him. But I think one of the takeaways is that switching to microservices makes your application simpler than it otherwise would be. Instead of having one complex application, you now have, say, six very simple applications. But your DevOps and deployment and coordination and your infrastructure story gets way more complicated, and so you're kind
Starting point is 00:05:16 of pushing the complexity of your application around and does it make sense to push programming complexity into infrastructure dev complexity? I think it depends on your organization, how many people are on your team, how complex is your app. But certainly, a lot of small apps probably shouldn't be microservices. It's interesting that you bring that up with what your organization looks like, because there's a lot of small startups and small organizations and sometimes just individual people that you really pay attention to what you like to spend your time on and what your skills are.
Starting point is 00:05:48 Because like when you just said that, I was like, wow, I better be very careful about that because DevOps is not my strong suit. So exactly. I'd rather have the complexity in my app and carefully factor that thing rather than push it to a bunch of servers that coordinate. Right. Yeah. Yeah. Yep.
Starting point is 00:06:05 Yep. Pretty, pretty interesting. So speaking of databases and things that are complex and things that scale super high, let's actually not talk about a database with this thing called NoDB. Okay. I haven't heard of this at all. So tell me about NoDB. So NoDB is a Pythonic object store that uses Amazon S3 as the backend. So as a programming interface, it looks like a simple NoSQL database. But what it's actually doing instead of running a server or something is it's talking to S3 and storing your objects there. So you can like insert into the database, and you configure it so that your connection string, if you will, for the database is here's my S3 account.
Starting point is 00:06:48 And here's the bucket. I want to store it into a folder, think folder. And then you just like insert, query, update and delete from this database. But what actually happens is it stores it over there. And I believe the default is actually to pickle the Python object. So you even get like type preservation, like you could insert a customer and then, boom, out comes a customer with its, like, functions and everything. Okay. Well, interesting. Yeah, interesting, right? So this was done by Rich Jones. He released it in April, and it sort of ties in with some of the serverless architecture, right?
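As a rough illustration of the idea described here (pickled Python objects stored under a key in a bucket), the following is a toy, in-memory stand-in. It is not NoDB's actual API and it never talks to S3: the class `ToyObjectStore`, its methods, and the `Customer` class are hypothetical names invented for this sketch, and a plain dict plays the role of the bucket.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical record type used only for this illustration."""
    name: str
    email: str

    def greeting(self) -> str:
        return f"Hello, {self.name}!"


class ToyObjectStore:
    """Hypothetical stand-in for an S3-backed object store (not NoDB itself)."""

    def __init__(self, index):
        self.index = index   # name of the attribute used as the lookup key
        self._bucket = {}    # a dict standing in for an S3 bucket

    def save(self, obj):
        # Serialize the whole object with pickle and file it under its key.
        key = str(getattr(obj, self.index))
        self._bucket[key] = pickle.dumps(obj)

    def load(self, key):
        # Unpickling gives back an instance of the original class.
        blob = self._bucket.get(key)
        return None if blob is None else pickle.loads(blob)

    def delete(self, key):
        self._bucket.pop(key, None)


store = ToyObjectStore(index="name")
store.save(Customer("Jeff", "jeff@example.com"))
loaded = store.load("Jeff")
print(type(loaded).__name__, loaded.greeting())
```

Because the object is pickled rather than flattened to JSON, what comes back is a `Customer` with its methods intact, which is the type preservation mentioned above. The usual pickle caveat applies: only unpickle data you trust.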
Starting point is 00:07:26 Like this is the guy that works on Zappa. We talked about Zappa last time, which lets you run web applications on Amazon Lambda, which is already pretty interesting. So this is like another, it's like you don't have a web server, so maybe you get away with not having a database. And it can handle a decent amount of load, but it's not like a full-on super database.
Starting point is 00:07:45 It's more like for prototyping and things like that. Okay, cool. Yeah, so some of the examples he said it might be good for is prototyping, like I said, but also storing API event responses for like replay. So if you are doing microservices and you want to store all the traffic that goes back and forth,
Starting point is 00:08:02 you could just do that here really easily, capturing logs, simple data like, here, fill in your email in this thing. One of the more interesting things is if you're doing Lambda, AWS Lambda, you can have triggers that call the function based on S3 events. This file was changed. This bucket had a new thing added. And what that means is you could insert into the database and it would call an AWS Lambda function as a result of that.
Starting point is 00:08:33 So you could like insert this thing and the act of storing it also kicks off some action. Oh, neat. That's actually pretty cool. Yeah, it's pretty neat. Plus the article has a nice picture of a fish skeleton. I like, yes. Yeah.
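To make the trigger idea a bit more concrete, here is a minimal sketch of a Lambda-style handler that reacts to an S3 "object created" notification. The `handler(event, context)` shape and the `Records` / `s3` / `bucket` / `object` layout follow the standard S3 event notification format; the bucket name, the key, and the trimmed-down sample event are made up for illustration, and how NoDB names its keys is not covered in the episode.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (event name, bucket, key) for each S3 record in the event."""
    seen = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        seen.append((record["eventName"], bucket, key))
    return seen


# Hypothetical, heavily trimmed sample event for local experimentation.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "customers/jeff+smith"},
            },
        }
    ]
}

print(handler(sample_event, None))
```

In a real deployment the body of the loop is where the follow-up action would go, for example sending a welcome email when a new customer object is stored.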
Starting point is 00:08:48 Pictures are important. The one that you're talking about next also has a cool picture. Cool logo. Yeah. I've heard of Elizabeth a few times. Yeah. I do have to admit that the logo did bring me into this a little bit. So,
Starting point is 00:09:01 yeah, we talked about Faker before, which would let you create like test data that looked real, like give me an address, give me an email, things like that. This is like a competitor to Faker, huh? Yes. And it's, um, so, uh, if you haven't listened to Faker, that was on episode 25. I looked it up this time. It's definitely a competitor to Faker. They even, uh, have some comparisons. And it looks like on their project page, one of the main features that they're going for for Elizabeth is performance. Apparently,
Starting point is 00:09:33 it's faster than Faker. Yeah, Faker's kind of slow. I mean, Faker's really nice, but it is... I tried to generate a database that had like a couple million entries with Faker, and it was a little... It took a while, let's just say. Yeah. Yeah. I haven't tried anything huge. I wonder, I'm curious as to how Elizabeth compares, but it definitely is a similar space, but I think it, um, it's just another project. Maybe it fits better for your project. It's, uh, and the articles were really well written. So we're linking up a two-part Medium article and there's also a,
Starting point is 00:10:06 it looks like the same person wrote a pytest plugin so that you can, and the pytest plugin is actually pretty darn cool. It allows you to, within a test, be able to, as a fixture, you can bring in different parts of the fake data. So. Yeah, that's really cool. Yeah, it definitely looks nice. I feel like it's in some ways complementary to Faker. I'm not sure you would use both in the same thing, but you can kind of get slightly different data. So depending on what you're after, one or the other may be better. It's a slightly different model of how to pull the data out. So I think it's good for people to try both and see which style works best for them. It also does a different
Starting point is 00:10:43 localization as well. Yeah. The localization always is pretty impressive to me, actually. Yeah. I wouldn't want to try to do that project myself, but I'm glad it's around. I'm glad it exists. Yes. Are you ready to hear about how to get a free t-shirt? I really want a free t-shirt, actually. Actually, yeah. So Datadog, those guys came along and said, hey, we'd love to sponsor and support the show and get the word out about our project that we got here.
Starting point is 00:11:12 So Datadog is, we've talked about Rollbar before, right? And Rollbar monitors your application for errors. Well, Datadog kind of does something a little bit similar on a grand scale. So Datadog will look at your application and all the layers of infrastructure on it. Let's suppose we have, say, a Flask app. We could integrate Datadog and it will give us metrics about that Flask app, but it'll also tell you about the Nginx web server
Starting point is 00:11:39 and your database and the Linux machine that it's running on and basically the entire stack of your application from the servers, the database servers, the web server, all those things, and put all that stuff together. So you can have a really holistic view of what you're doing. And you can even integrate it with all these different things. It'll integrate with things like AWS, it integrates with Rollbar if you use those guys. It integrates with many, many different things that you might already be using. So it's super powerful.
Starting point is 00:12:10 It integrates with Postgres, with MongoDB, and so on. So very, very cool. Companies like Zendesk and Salesforce and even PagerDuty use it. If you haven't heard of Datadog, if you haven't tried it, go to pythonbytes.fm slash datadog. And they've got this little thing you try it out and you get a free t-shirt. So pythonbytes.fm slash datadog. Support the show and get a shirt. I think this shirt's cute also. Yeah, yeah, it's nice. So thanks, Datadog. And you know what? Let's talk about what's coming in Python. I feel like my next two items actually
Starting point is 00:12:41 are both sort of future looking Python things. So I feel like we just talked about Python 3.6, didn't we? We've been talking about it since the beginning, yeah. Yeah, it's been out for, I guess it's been out for a while now. And so they're starting to talk about what's coming in Python 3.7. Okay. I haven't looked at all, so I'm interested. Yeah, I kind of wanted to highlight that.
Starting point is 00:13:01 There's a whole bunch of things that I put here that are interesting. Two that I think are really worth, like, super interesting, and I'll just touch on the other ones. The first one is an optimization. Okay, so Python works by having a bunch of opcodes and then interpreting those opcodes in this, like, giant switch statement in this file called ceval.c. And it basically is a loop and a switch statement. And it looks at the opcodes and it figures out what to do. So they've added two new opcodes, LOAD_METHOD and CALL_METHOD. And it allows them to skip some instantiation of a few objects.
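To see the opcodes being talked about, the standard-library `dis` module can disassemble a small function. This is only a sketch: the `shout` function is a made-up example, and the exact opcode names depend on the interpreter version. On Python 3.7 a method call like this one is expected to show `LOAD_METHOD` and `CALL_METHOD`, while later releases reworked the call opcodes again, so other versions may print different names.

```python
import dis


def shout(text):
    # A tiny function containing one method call, text.upper().
    return text.upper()


# Print a human-readable listing of the bytecode for shout().
dis.dis(shout)

# Collect just the opcode names, e.g. to compare between Python versions.
opnames = [instruction.opname for instruction in dis.get_instructions(shout)]
print(opnames)
```

Running the same snippet under two different Python versions is a quick way to see how the bytecode for a method call has changed.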
Starting point is 00:13:39 And it results in potentially methods in Python 3.7 being 20% faster than Python 3.6. Oh, cool. So one of the big sort of trade-offs that you make in Python is function calls are relatively expensive compared to other operations. And we obviously want to write smaller functions and break our code apart for usability and readability. But that can make things slow. So having faster functions can actually make a really big difference in Python. Okay, neat. So 3.6 optimized dictionaries a lot, and we might optimize function calls in 3.7. Yeah, absolutely. Absolutely. So there's some new modules, like there's a new remainder function
Starting point is 00:14:21 in math, the dis function, which is a disassembly function. If you've ever, if you haven't done this, it's pretty cool. You can say import dis. I think it's dis.dis, module.disassemble. And you give it like a function or a class or something, and it'll show you the opcodes, kind of like that load method call method I was talking about. Another really interesting thing that's coming in 3.7 is async context manager. So a context manager is a thing you can use in a with block, right? Like a file handle, database transaction, those types of things. Well, you can have asynchronous context blocks. And this async context manager lets you basically make the instantiation step in those context managers asynchronous,
Starting point is 00:15:06 which is pretty cool. Oh, that's cool. Yeah. One more that's kind of for the crazy book is now functions can have more than 255 arguments. Apparently that was a limit that was bothering someone. And they said, well, let's make it possible for functions to have more than, you know, like 300 arguments because 250 wasn't enough. Yeah, I run into that all the time. I do too. It's really frustrating. Why would you need that?
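Going back to the async context manager item for a moment: assuming the feature being described is `contextlib.asynccontextmanager`, which was added in Python 3.7, a minimal sketch looks like the following. The `open_connection` function and the `events` list are hypothetical names for illustration, and `asyncio.sleep(0)` stands in for real asynchronous setup and teardown work such as opening a network connection.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


@asynccontextmanager
async def open_connection(name):
    # Setup can await, e.g. a network handshake (simulated here).
    await asyncio.sleep(0)
    events.append(f"open {name}")
    try:
        yield name
    finally:
        # Teardown can await too, and runs even if the body raises.
        await asyncio.sleep(0)
        events.append(f"close {name}")


async def main():
    async with open_connection("db") as conn:
        events.append(f"use {conn}")


asyncio.run(main())
print(events)
```

The decorator turns an async generator into something usable in an `async with` block, in the same way `contextlib.contextmanager` does for ordinary `with` blocks.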
Starting point is 00:15:34 I have no idea, especially when you've got star args and star star kwargs. So anyway, it's now a thing or it's going to be a thing in 3.7. Yeah, interesting. It looks like you wrote down bytes.fromhex and bytearray.fromhex. Yeah, so those are conversion functions that will parse hexadecimal strings into bytes. And the change is that it used to have an error if there was white space on the beginning or end, which really didn't affect what the thing was, but it wouldn't accept them. So now they basically strip off all the white space for you. And so it's a little more tolerant of inputs. Okay, cool. That'll matter for some people. More tolerance is always good in my opinion.
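A quick sketch of the more tolerant parsing, assuming Python 3.7 or later, where `bytes.fromhex` and `bytearray.fromhex` skip all ASCII whitespace (tabs and newlines included) rather than only spaces. The hex strings are arbitrary sample values.

```python
# Assumes Python 3.7+: tabs, newlines and spaces are all ignored.
spaced = bytes.fromhex("  2ef0\tf1f2\n")
compact = bytes.fromhex("2ef0f1f2")
print(spaced == compact)

# bytearray has the same classmethod and returns a mutable result.
mutable = bytearray.fromhex(" 2e f0 ")
print(mutable)
```

Whitespace is only skipped between whole bytes; splitting the two hex digits of a single byte (for example `"2 e"`) still raises `ValueError`.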
Starting point is 00:16:20 Yeah. I would love it if there was like an army of people or things that could go test my code and find out what the errors are for me. Yeah. Well, I was really glad. So there's an article called Unleash the Test Army. It is about Hypothesis. And I'm glad this came around because since I talk about testing a lot, I get questions about Hypothesis a lot. And I have never used it.
Starting point is 00:16:47 I know that you've had- David MacIver? Yeah. I think you've had him on the show. Yep, I have. On TalkPython episode 67. Oh, you're ready too. Did you look that up? No, I was talking about it last night, actually. It's somebody's experience with working with Hypothesis. It's a good introductory article to kind of tell you what it is. So Hypothesis is a testing framework that will really just come up with a lot of different ways to throw. You set it up so that it throws different data at your code. And it's more of a
Starting point is 00:17:19 unit test type thing, I think. You have to define the input and output of your functions and whatnot to make it work. It's really pretty quick about being abusive and getting at where the problem areas might be. This is the first article that I've read that kind of explained how to get into it quickly because hypothesis doesn't look like something that you can really just pick up right off the bat, but this is a short introduction. One of the things I like is at the end, he talks about his conclusions with working with it. And one of the conclusions he came up with is that it forced him to pin down his function specifications and really to consider special cases. So really think about the interface to the function you're going to test. What are the good parameters?
Starting point is 00:18:03 What is the expected behavior? And what are the bad outputs? And what do those look like? Making you think about your interfaces is a good thing. So if hypothesis helps people think about interfaces, great. Yeah, I think it's really, hypothesis is interesting. I haven't had a chance to do a ton with it, but basically instead of choosing examples like,
Starting point is 00:18:22 well, let's see what's an edge case. If the register value is false and the email address looks like this and the price looks like that. That seems like a good example. Let's pass that to my test and see what happens, right? So instead of doing that, you can go to hypothesis or just write a regular test, but then add on to it this decorator that says, okay, that thing is like an email address. That thing is a Boolean. And these are some numbers. Here's their range.
Starting point is 00:18:48 Go after it. And it'll just do a bunch of different examples and record which examples worked and which ones failed and things like that and store that in a file. And it's pretty cool. It can find those edge cases and other things you might forget about. And this example of kind of do it in an interactive way, like you're not really sure how you should test your, I mean, you've written some tests, but you're not really sure what inputs to throw at it,
Starting point is 00:19:11 which test cases, and making you think about where the edges are and the different corner cases. I think that's a good thing. That is a good thing. The edges and corner cases are a super important part of unit testing, I think. Yeah, I'm still trying to figure out exactly what level of the development process and what level of testing this makes the most sense at.
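To give a flavor of the "describe the inputs, then go after it" style discussed above, here is a hand-rolled sketch using only the standard library. It is not Hypothesis and does not use its API; Hypothesis additionally shrinks failing inputs and remembers them between runs. The function under test (`normalize_price`), the helper `check_property`, and the generator `any_int` are all hypothetical names made up for this illustration.

```python
import random


def normalize_price(cents):
    """Function under test: clamp a price in cents to 0..100_000."""
    return max(0, min(cents, 100_000))


def check_property(prop, make_input, runs=500, seed=0):
    """Call prop on generated inputs and return the inputs that falsify it."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    failures = []
    for _ in range(runs):
        value = make_input(rng)
        if not prop(value):
            failures.append(value)
    return failures


def any_int(rng):
    # Input description: any integer in a deliberately wide range.
    return rng.randint(-10**9, 10**9)


def in_range(cents):
    # Property: the result always lands inside the allowed range.
    return 0 <= normalize_price(cents) <= 100_000


def idempotent(cents):
    # Property: normalizing twice is the same as normalizing once.
    once = normalize_price(cents)
    return normalize_price(once) == once


print(check_property(in_range, any_int))
print(check_property(idempotent, any_int))
```

Writing the properties is where the thinking about interfaces happens: you have to say what should hold for every input, not just for the handful of examples you happened to pick.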
Starting point is 00:19:32 But there's definitely algorithmic pieces in your code that might be a little confusing. I don't think this would make sense to throw at every unit test in your system, but there's definitely places where this would make sense. Yeah, well, it's cool. People should check it out, and it's an approachable article for sure. The last thing is one of these Python versus legacy Python things, and chalk up one more win for Python. So most people have heard of Heroku. Heroku is a platform as a service cloud provider. Kenneth Reitz works there, for example. So his unofficial title is something to the effect of like Python Overlord at Heroku. That's like on his business card or something.
Starting point is 00:20:12 And so anyway, he and the crew there basically make it so you can say, here's my app and here's my requirements.txt. Run this, please. And until recently, the default has been when you say run this Python app, it's like, cool, you mean 2.7, right? And you could run it on Python 3, but you had to like configure it explicitly.
Starting point is 00:20:36 If you said nothing, it ran on Python 2. The big news is on, what is that, June 20th, 2017, Heroku is switching the default to Python 3.6.1. Wow. So, hooray for Python 3. So now if you go to Heroku and you say run this, it's going to be like, awesome, Python 3, right? That's what you wanted.
Starting point is 00:20:56 And so this thing that I'm linking to basically links over or displays their blog post. And their blog post is super short that talks about it. It just says basically what I said, effective Tuesday. The default runtime is now Python 3.6.1. Yeah, so if you've already got a job running there, it won't switch, right? Exactly, no. It is only for new projects.
Starting point is 00:21:17 So in the Reddit thing, there's a few interesting quotes. Somebody said, lots of new projects start out on Heroku all the time, so this is really great news for Python 3 adoption. Someone else said, Python 3 is really happening. Yay! I was actually a little worried about the future of Python for a while, but I feel like it's all downhill from here. Yeah, apparently people that don't listen to our podcast. That's right.
Starting point is 00:21:36 Our listeners know better. I mean, there's a lot of these examples, right? We've got all the new frameworks that are exciting. We also have Django 2 dropping support for Python 2, and ironically those numbers match up, but the newest version of Django is going to be Python 3 only, and things like that. It's really starting to, you know, pick up speed. Yeah, one of the, that comment there was interesting, is that a lot of new projects start out on Heroku, so it must be people starting out a project and then later grabbing different server solutions or something. I haven't done a lot with Heroku to be honest, but I think it's really
Starting point is 00:22:10 simple to basically just wire up a Git repository, do a push to it, and it'll just start running your app magically. So it's really, really easy to get started. And then maybe as you grow, maybe like costs become a concern or you just want more control or whatever, but it's super easy to get started. And however you get started on whichever version of Python is probably where you're going to stay. So that's good news.
Starting point is 00:22:34 Yeah, great. Well, cool. Yeah, very cool. And that's it for the news, Brian. You got anything else you want to share? No, no. So, wow, number 30 in the can almost. 30, yeah, that's awesome.
Starting point is 00:22:44 I'm finishing up the last chapter this week, chapter 7 for Python testing. So that's going to be done soon. Yeah, yeah, very, very cool. One of these days, the book will be a thing that you've done in the past instead of a constant job of yours. Yeah, yeah. And hopefully, I can't wait until it's an actual physical copy. So it'll be good to have a stack of copies with that. Yeah, that's awesome to hear you're making progress. And so thanks for covering this news with me. How about you? Do you have like now
Starting point is 00:23:14 four months of podcasts ready? I have about three months of podcasts that I've recorded. I'm going to go on vacation for a while in the later half of the summer. So I'm trying to make sure that everything is going to be smooth, no interruptions. And so I have, I think, 13 to 14 episodes of TalkPython already recorded. There's tons of interesting stuff. I'm really looking forward to sharing. I don't want to hold it back, but I've got to dole them out week over week or it won't solve the problem. How about this? And as for this podcast, if we, we haven't really decided yet, but if we do a break, we'll definitely let people know before that happens
Starting point is 00:23:50 so that they're not just hanging out there waiting. Yeah, absolutely. We'll try to, we'll try to keep it rolling, but we might, we might miss a week or two with some, some trips there.
Starting point is 00:23:59 Okay. In the summer. All right. Well, thanks for sharing your news with everyone and thank you to Datadog. Get your t-shirt, pythonbytes.fm slash Datadog. Thanks, Brian. See you next week. Thank you. Yep.fm. If you have a news item you want featured,
Starting point is 00:24:27 just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
