Coding Blocks - Designing Data-Intensive Applications – Scalability
Episode Date: December 9, 2019. We continue to study the teachings of Designing Data-Intensive Applications, while Michael's favorite book series might be the Twilight series, Joe blames his squeak toy chewing habit on his dogs, and... Allen might be a Belieber.
Transcript
You're listening to Coding Blocks, episode 121.
Subscribe to us, leave us a review on iTunes, Spotify, Stitcher, and more using your favorite podcast app.
And check us out at CodingBlocks.net where you can find show notes, examples, discussion, and a whole lot more.
Send your feedback, questions, and rants to comments at CodingBlocks.net.
Follow us on Twitter at CodingBlocks or head to www.CodingBlocks.net and find all our social links there at the top of the page.
With that, I am www.AlanUnderwood.
I'm Joe Zack, who does not smack his lips anymore.
Oh.
Oh, right.
And well, first, I want to know, like, how long did you have to practice that www and I'm Michael Outlaw?
It actually worked.
I don't know.
I didn't practice at all.
And it just somehow worked once.
Once.
This episode is sponsored by educative.io.
Level up your coding skills quickly and efficiently, whether you're just starting, preparing for an interview, or just looking
to grow your skill set.
All right.
So today we're continuing on talking about designing data-intensive applications.
And we're still kind of laying the groundwork here talking about scalability and things
like load parameters and performance numbers, which I think is actually
surprisingly cool. I was surprisingly happy with this section here. So I hope you will enjoy it.
Cool. So I guess with that, we'll get into a little bit of podcast news.
You know, we shrink it back down this time. Last time we actually,
we had more than we had in a while. So first off, we'd like to start with the reviews.
So, Hey, thank you.
Everybody that actually wrote in a review.
Cause we were kind of sad last time.
Like we were like, Oh man, I hope there's at least one.
This will be like the first time in a hundred-something episodes that we didn't have any.
We had one.
So, or no, we had two last time.
Right.
So at any rate, uh, everybody came out in droves and did it this time.
So here we go.
Hot Reload Jalapeno, that was
one, on Stitcher, by the way.
Leonique,
not sure on that one,
anonymous, somebody chose not to put anything,
and Juke0815.
Alright, over on iTunes
we got Bobby Richard, which I think is the
Bobby Richard maybe that... You said it wrong.
That is a strike.
Is it really Richard?
It is Richard.
It really is.
Oh, my gosh.
I'm so sorry, Bobby.
I thought people were just messing with you.
They pronounce your name like that.
No, he's from Bayou Land.
Yes.
Oh, man.
Okay.
Well, sorry for calling you out and then underlining it and bolding it and highlighting it.
Bobby Richard.
Sean Needs New Glasses. Teshi Nguyen, Vassal07, and TheGenJohn.
I really appreciate those reviews.
You know, it keeps the wheels turning.
And so big thank you to that.
Also, I've got to say big thank you to Waffling Tailors for having me on the podcast.
I mentioned it last time, but they just released, I think it was a three-part series,
that I was a guest on talking about old-school and new-school video games. And the second part in the series actually contains a really cool intro with some music provided by yours truly.
And, yeah, it was really funny and awesome.
You should go check it out.
I need to download that.
I absolutely do.
I haven't listened to a podcast in a while.
You got to mash that sub, sub, sub.
I do need to sub, sub, sub, sub.
All right.
So this one was a bit of news.
It has nothing to do with programming.
This is more of just, hey, you know, be careful what you think you're getting for free.
So there was a report that came out today.
This one came from Forbes.
I'm sure it's all over the web everywhere.
But there are 8 million people that have apparently downloaded this free mobile jump VPN because they think that they're going to be able to anonymously do stuff online.
Well, guess what?
They're shipping all of your personal data, location, information, everything over to China, which, you know, regardless of whether
it's China or anywhere else, just be aware that, you know, typically not getting things for free,
right? Well, it goes back to that old thing that if it's free, then you are the product.
Right. Yeah. That's what it is, right? So just, you know, be aware of the things that you download.
I know.
So being that the three of us work for a security company, I think we're probably more paranoid
than most people.
We were probably more paranoid than most people prior to joining a security company.
Probably why we got to a security company.
Right.
Right.
And, and now it's even worse.
But like every time that I go to install an app on my phone on Android, I look at it and I'm like, you want what permissions?
Nah.
Yeah.
There was a story that was kind of similar to this one.
I want to say it was like over the summer where Facebook.
Do you recall this story where Facebook was doing something similar where they were giving out a VPN service and they were targeting kids.
Man, that's ridiculous.
And their defense was, oh, well, the parents had to give consent.
But then the people who were like, you know, some of the backlash was like, yeah, but some of the parents have no idea what they're signing up for.
The kids are just like, hey, will you click OK?
Right, click OK, accept, whatever.
Yeah, man, when you start reading the fine print on just about any of this stuff, it hurts my head. So, a complete tangent, because we're at the very beginning of the show and that's where you do tangents, and throughout: you know, I've been researching security systems and stuff, and one of the things that drives me crazy is, if you buy any of these security systems that have cameras on them, you're basically signing over that video, right?
Like, you know, that basically if you get a ring system or a nest or any of them, there are people on the other end that have access to that stuff.
And, you know, they're looking at it so that they can better their machine learning or they're looking at it so that they can find out facial detection.
Like, again, just this kind of stuff drives me crazy.
So, you know, the key point here is, especially with a lot of new technology and you guys saw the FBI report about all the smart TVs that were just bought over the holidays.
They're saying, hey, these things are probably spying on you.
So, you know, I'll step off my paranoid soapbox now.
But yeah, just, you know, pay attention.
Like, look at stuff.
And you know why you're stepping off your paranoid soapbox?
Because you're paranoid that it might break.
Well, that's just because I'm overweight.
I need to fix that.
Wait a minute.
Now you're making it sound like I'm a jerk because of what I said.
Oh, no, no, no.
That's my own self-conscious awareness.
Right.
So the next thing that I've got here is, you know, I mentioned last time, begrudgingly, because I don't like telling people that I'm out of town and all that kind of stuff, but I will be out and able to say hi to some people. So, you know, James again, Zach Braddy reached out. So, you know, Jamie... I'm definitely hoping to meet up with a lot of people that we've interacted with over the years; I'm super excited about it.
So this is just messaging redundancy so that they absolutely will know that you will be out.
I will be out there. Right. So again, January 31st is when my particular session is, at 11:40 in the morning.
So, you know, come out and learn about some streaming technologies with some SQL Server and some Kafka and all that goodness.
All right.
Well, awesome.
So I wanted to mention that we're going to be doing another book giveaway.
So I just picked the winner right now when I should have been paying attention to recording.
So I'll be sending you a message.
And that's how we do over here in Coding Blocks land. So if you are interested in winning a copy of the book we're talking about today, Designing Data-Intensive Applications, then go ahead and drop a comment there and let us know what you agree with, disagree with, or, I don't know, what you're eating for dinner over in the comments there, and you might win a book. Yeah.
Hey, you know, and it is awesome that people write in and they're like, hey, I love the show, whatever. I mean, you can butter us up. That's fine, but you don't have to, right? Like you said, you can tell us what you're eating for dinner, whatever. I mean, we just like to see that people come up there and comment. So that's excellent.
And also I have a question for you guys.
So we've done several books.
Where would you rank this one in terms of books that you've read that you
enjoy?
Like,
what is this?
It's number one for me now.
Number one.
Okay.
I think what about you?
We haven't even got to the good parts.
I feel like, I know this part's good, but it gets better. This is laying the groundwork for the cool stuff to come, and even the groundwork is really good.
I think you guys might be ahead of me then, because I haven't gotten to anything that's making me think, you know, it's number one.
Okay. So I mean, let's be honest, this is no...
Oh my God. See, I thought you were going to go with The Art of Unit Testing. I figured that was your number one.
Uh, no. There's a book right there by you, I can't remember the name of it. It was like Real-World Machine Learning. That was a really fun one to read.
Okay. But we didn't do it on the show.
I didn't... Oh, you mean which one we've done on the show? Yeah.
Well, in general. We didn't do The Art of Unit Testing on the show.
No, that's fair.
Okay, yeah.
So where would this fall in all your books?
Like I said, it's too early for me.
I'm apparently not as far into it as you guys are because you guys are way more excited.
I'm not saying I'm not enjoying the book.
Don't get me wrong.
That sounded kind of negative after I heard it.
No, no, not at all.
No, this is number one for me, too.
Like, I truly am enjoying the ever-living heck out of this book.
So, yeah.
Hey, don't curse on our show.
Right?
I did not.
So, what is your current leader, though?
Yeah, what's your very favorite book?
The Twilight series?
No, that was a joke.
No, it really wasn't.
It totally wasn't.
There's always a little bit of truth in any
lie.
Yeah. Wow.
No.
No, honestly,
I don't know. I don't know.
I don't know.
Is Clean Code in there?
I guess.
I mean, they're all up there, right?
There's good things about all of them.
But none of them just jump out at you?
This is the first one that I sat down and I'm reading it like a novel.
I read it and I'm like, man, I really don't want to stop
reading, but I'm really tired.
I need to.
Another replication strategy?
Bring it on.
Yeah, I
guess I don't
know that I ever think about books like that.
Okay, fair enough.
Well, let's jump into this. I mean, there's a
bunch of favorites. So, I don't know.
Maybe that's not a fair statement, too, because there's a bunch of favorites, but I never
think of it as the one.
I couldn't even do this with music if we tried.
I was going to say, so this is like asking you what your favorite movie is or what your
favorite album is.
Yeah.
Okay, that's fair.
That's fair.
I get it.
All right.
Well, who's kicking us off here?
Here.
You want to do it, Joe?
Yeah. So today we're talking about scalability.
And they do a really good job of kind of breaking up how to talk about scalability.
So we're going to kind of intro it a little bit.
And then we're going to talk about some of the actual numbers and parameters that you can look at to actually measure this.
So it's getting a little more scientific here. It's kind of nice to actually have something you can measure. So much of what we talk about in programming is really hard to put hard numbers on; even when you're doing static analysis or something, it's hard to really say something's truly better than it was before. And so it's nice to have that kind of thing here. So that was, I think, a lot of fun and kind of refreshing to talk about in this chapter.
But the reason I should say that we're even talking about scalability is because increased load is a common reason for the degradation of reliability, which is what we talked about last time.
And scalability is the term used to describe a system's ability to cope with increased load.
So I've got a dog squeaking a toy here.
Are you chewing on a squeaky toy while you – or maybe you're just playing with a squeaky toy while you talk to us?
Yeah, I'm going to – give me a second here.
That's so awesome.
All right.
So I don't – it didn't quite sound like he finished that thought, but scalability is a term used to describe a system's ability to cope with increased load.
Yep.
So just to get that out there.
And we have here that it can be tempting to say that something scales or doesn't scale, but referring to a system as scalable really means how good your options are.
Yep.
So we got a couple of questions here to ask of your system.
If the system grows in any particular way with more data, more users, more usage,
what are your options for coping with that or dealing with that growth?
And how easy is it to add computing resources?
And that's really what scalability boils down to.
All right. Go ahead. You back?
Yeah, I'm back. I got the squirrel.
Okay.
So we're safe for five minutes until we get another one.
So
describing loads.
We need to figure out some numbers
in order to describe the load before we can truly answer
any questions about it.
And load parameters are basically the name that we're going to use to refer to the metrics that make the most sense for our system.
And it's essentially a measure of stress.
And I really want to kind of emphasize that load parameters are a measure of stress.
They're not performance.
So this doesn't tell you how fast things are doing.
These are your load parameters.
These are just the way you tell how much you're dealing with, essentially.
So we're talking about number of requests and that kind of thing, right?
Exactly.
So if you're a web server,
requests per second tells you how much load you're under.
If you're a database,
maybe your read-write ratio is really important. Maybe
just your number of reads per second. Or maybe if you're doing a video game, maybe like polygons
shown on the screen at once or something like that kind of would define basically how much
stress your system is under. So load parameters, stressors. That's the way I've kind of chosen to
think about it. And so you even have a note up here also that says, hey, different parameters may matter more for your particular situation, right?
So in one case, number of requests per second might be the load parameter that you care the most about, right?
In another situation, it might be the speed of reading from disk or writing to disk or, you know, there could be any number of things that are very specific to the situation
you're dealing with.
And you might even do like a ratio, like I mentioned, a read-write ratio,
or like if you do a cache,
maybe you care about the hit-miss ratio in order to kind of define like how
many responses am I missing the cache on?
Because that kind of tells me a little bit about the way my application
is behaving and how it's kind of going down the non-optimal path. And so this is a way of helping
me describe the state and the stress that my application or system is under.
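As a rough illustration of what they mean by load parameters, here's a minimal Python sketch; the counter names and numbers are made up for the example, and the point is only that these are measures of stress on the system, not of how fast it's responding.

```python
from dataclasses import dataclass

@dataclass
class LoadSnapshot:
    """A point-in-time picture of load parameters (stress), not performance."""
    requests: int        # total requests seen in the window
    reads: int           # database reads in the window
    writes: int          # database writes in the window
    cache_hits: int
    cache_misses: int
    window_seconds: float

    @property
    def requests_per_second(self) -> float:
        return self.requests / self.window_seconds

    @property
    def read_write_ratio(self) -> float:
        return self.reads / max(self.writes, 1)

    @property
    def cache_hit_ratio(self) -> float:
        lookups = self.cache_hits + self.cache_misses
        return self.cache_hits / max(lookups, 1)

# Hypothetical numbers, just to show the shape of the thing.
snapshot = LoadSnapshot(requests=120_000, reads=90_000, writes=10_000,
                        cache_hits=70_000, cache_misses=20_000,
                        window_seconds=60.0)
print(f"{snapshot.requests_per_second:.0f} req/s, "
      f"read/write ratio {snapshot.read_write_ratio:.1f}, "
      f"cache hit ratio {snapshot.cache_hit_ratio:.0%}")
```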
Yeah, I have this old thing that we used, that the three of us used. I don't know if you recall the formula, though, from a previous site that we worked on, with the number of concurrent users, like how we would try to calculate the number of concurrent users.
I vaguely remember it, yeah.
Do you remember it, Joe?
Nope.
So, I wrote it down years ago because I was like, oh, that's an interesting way of thinking about it, where we were keeping up with the, uh, the stressors, I guess, or the, what do you call it, the load?
Load parameters?
Parameters. There you go. Thank you.
We were looking at the average think time, like how much time a person might stay on a given page. So the average think time times whatever target pages per second we wanted... pages per second divided by CPUs, times CPUs. I'm trying to describe this math formula because part of it is a division. So, okay, because it's all commutative since it's multiplication: pages per second divided by CPUs, that answer times average think time, times CPUs, and that would be the number of concurrent users that we were estimating we could handle.
Oh yeah, that's a good point, because it's really hard to tell if someone's an active user or not, like whether they closed the browser or whatever.
And there's some tricks for kind of maybe getting that info from the client side, but you can't really trust any of that stuff.
So it's really hard to know if someone's truly active or not.
And so you've got to kind of cobble this stuff together.
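For what it's worth, the back-of-the-envelope estimate being described above sounds a lot like Little's Law (concurrent users is roughly arrival rate times the time each user spends per page). Here's a rough sketch of that kind of calculation; the numbers and variable names are made up, and since everything is multiplication, the CPU terms cancel out:

```python
# Rough sketch of the back-of-the-envelope estimate described above; every number
# here is hypothetical. Note the CPU terms cancel, so it reduces to
# throughput * think time, which is essentially Little's Law.
target_pages_per_second = 50.0  # target page throughput for the whole site
cpus = 8                        # number of CPUs (cancels out below)
avg_think_time_seconds = 30.0   # how long a user sits on a page before the next click

# (pages per second / CPUs) * average think time * CPUs
concurrent_users = (target_pages_per_second / cpus) * avg_think_time_seconds * cpus
simplified = target_pages_per_second * avg_think_time_seconds

print(f"Estimated concurrent users: {concurrent_users:.0f} (simplified: {simplified:.0f})")
```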
Yeah, and the reason why I bring that up is because Alan mentioned about the different parameters for your system, right?
And so that's what we were trying to target then at that time, right?
But, you know, today, to your point, it might be something else.
You might go after a different, you know, metric.
Yeah, so take a minute to think about, like, the system that you're working on right now
and think about what load parameters, what stressors are most important to you right now,
and are you tracking them?
Oh, and even then, like, that was what we started with, right? But I remember we eventually started moving towards how long it was taking for the entire thing to load.
Right, right. Do you remember that?
I do.
So, you know, it was no longer about, hey, how many concurrent users can we have hit a particular web server; it was, hey, how fast can we just load the page, period, right? Because that was...
Well, glad you brought that up. Because now we're kind of
slipping into the other set of parameters here, our numbers. So we talked about the stressors,
which is like number of users, amount of traffic, page loads per minute, things like that.
Now, when you start talking about performance, the book is very careful to use the word performance numbers. It doesn't actually say parameters. It says performance numbers.
And it kind of gives you two categories here, two ways to look at describing performance.
One is how does performance change when you increase a load parameter without changing resources?
So as the concurrent users go up, how do my performance numbers change? And the other way is how much do you need to increase your resources
to maintain your current level of performance while increasing a load parameter? So I would
say if you're talking about like response times or page load times, now we're talking about basically performance numbers.
And in that case, we're talking about, you know, as the number of concurrent users goes up, that's our load parameter.
How is our page load time affected?
Presumably it gets worse.
Yeah, this is very much like the scientific method, right? Like change one parameter, see what that does, and then see what you can change on the other side, one parameter. And that way you can actually find out where the load is actually affecting your system and how you mitigate that load. And then over time, we evolved to like, okay, we care more about like,
let's go after performance and target that metric instead.
Yep.
Well, we started with one because we didn't know where to start with, right?
Right.
So that seemed like a – sorry to interrupt you, Joe.
But that seemed like a reasonable thing.
And then we transferred over.
There's actually something pretty cool coming up in a little while that we'll sort of refer back to this with the whole starting with –
Yeah, there's the survey.
But the assumptions part of this.
So go ahead, Joe.
I was going to say that it's very important to note that the load parameters are a big part of your performance numbers, because you can't really express performance numbers in a meaningful way without also including load parameters. So it doesn't make sense to say,
oh, my page load time is 200 milliseconds. You kind of need to say it's 200 milliseconds when
we have X number of users or X number of queries or X number of otherwise. Otherwise, it's kind of
like, what are you really describing?
Are you describing how well your site or application does
when it's optimally performing?
Because who cares about that?
I want to know how it works in production.
I want to know how it works when it's getting beaten up.
I don't want to know how it runs in a lab.
Right, or how it runs with one user, right?
Like one user on there doesn't really make any sense.
Yeah, exactly.
Yeah.
Yeah, I mean, in our case, though, we weren't talking lab environment; we were talking about what the real-world environment was doing.
But there has to be some sort of load that determines it, right?
Because... well, yeah, I get what you're saying, though, right? Like, okay, you tell me, hey, in the production environment, this is the current load time of the page. Right. But to Joe's point, it's like, well, if you don't know how many concurrent hits you're currently getting, then what does that number really mean? Right.
So yeah, but I'm trying to be fair to us, too. Like, hey, let's not be too mean to ourselves here. You know, we weren't doing this in a lab environment.
No, I think we used a lot of that stuff, and we backed into the load numbers based off, you know, actual performance.
Well, I don't know that we actually even looked at the load at that time.
You don't think so?
No. Very possible.
No. Okay. So we were idiots.
No, no.
It's not bad.
It's just the book is providing kind of like a formal structure here.
Yeah.
Say like when we're talking about scalability, specifically scalability, we're basically
studying what we can do with our resources in order to improve our performance.
In order to do that, we have to understand what our stressors are and also the things
that we care about measuring in order to know how to scale.
So the point is that, knowing this now, we would look at this differently. Back then, at that particular time, we would just say, okay, hey, this is what the various tools that were available at the time told us, you know, Page Insights and Google Analytics. And I think Pingdom had some page tools; they could tell you various speeds and whatnot. Those were signals that could say, hey, we're doing hits from these various servers from around the world, these are the average times that we're seeing, blah, blah, blah, as well as numbers that we might see in, like, dev tools, for example. Right. But today, the point that Joe is making is that we wouldn't just look at that one number, because what does that one number mean?
Right, right. Although, you don't have the context for it. We were probably doing it for marketing reasons, primarily, right? We want to know, if we improve the performance, how well does that convert to actual sales? In this case, we're specifically focusing on scalability.
So if we're saying, like, how many web servers do we need to run? Do we need to provision? You know,
in a cloud world, it doesn't really make so much sense. But if we're talking specifically about,
like, we're having a big sale come up next week, how much is that going to cost us in terms of
hardware? That's when you really need to start understanding your load parameters and performance numbers
when you're specifically trying to figure out what that is going to mean for your resources
and basically your costs.
And those performance numbers basically, they measure how well your system is responding.
So we've got some examples here.
Throughput is something I've seen used a lot
and basically means like records processed per second.
So this could be in terms of like real-time streaming
or it can be in terms of a batch process,
like how long does it take to do a batch job
with a million records in it.
And response time is another one that we mentioned.
Now that you mention it, the book's got kind of a cool side note here. I just smacked twice, I'm sorry, Holly. Uh, the book has a cool little side note here about latency versus response time.
And I didn't know,
I never really thought about it.
I always kind of use them like interchangeably.
And what they say here is that latency is how long a request is sitting idle, awaiting service, but not doing anything. Either the database server or application server that it's trying to hit is too busy to handle it, or, you know, there's some sort of other process that's kind of tying it up. But latency is how long it's waiting idle. Response time is the total time it takes for the client, so the observer, to see the response, which includes any latency.
But it just kind of gives you more of an idea about what your bottleneck is. I don't know
that's necessarily material to what we're kind of studying here, but I just thought it was a cool
side note. Well, no, I think it is material, and we'll probably get into some of the reasons here
shortly, but it is because understanding what could cause your latency versus what can cause your response times to be high, that can actually change how you attack the problem, right, is basically what it boils down to.
Okay, yeah, that makes sense.
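To make that distinction concrete, here's a tiny sketch with made-up numbers: the latency is the time the request sat in a queue doing nothing, while the response time the client observes also includes the service time and any network transit.

```python
# Hypothetical numbers illustrating the distinction.
queue_wait_ms = 150  # latency: time the request sat idle, awaiting service
service_ms = 50      # time spent actually doing the work
network_ms = 10      # transit to and from the client, which the client also sees

response_time_ms = queue_wait_ms + service_ms + network_ms
print(f"latency (waiting idle): {queue_wait_ms} ms")
print(f"response time seen by the client: {response_time_ms} ms")
```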
And we're going to dive into a bit more about these numbers here.
But first…
This episode is sponsored by educative.io. Every developer knows that being
a developer means constantly learning new frameworks, languages, patterns, and practices.
But there's so many resources out there. Where should you go? Meet educative.io. Educative.io
is a browser-based learning environment, allowing you to jump right in and learn as quickly as possible without needing to set up and configure your local environment.
The courses are full of interactive exercises and playgrounds that are not only super visual, but more importantly, they're engaging.
And the text-based courses allow you to easily skim the course back and forth like a book, so there's no need to scrub through hours of video just to get to the parts that you really want to focus in on.
Yeah, and just when you thought that it couldn't get any better, they've now introduced subscriptions.
So check this out.
For a limited time, they're offering 50% off their new subscription price.
And with that, once you subscribe at that price, you're locked in to that subscription price for as long as you remain a subscriber.
So it's basically like you can head to educative.io slash coding blocks and get 20% off any single course, or you can subscribe and you're essentially getting 50% off of every course. And I want to mention again, we talked about it a little bit on the show here, but Grokking the System Design Interview has been one of the favorite courses I've ever taken
just of all time. It's been really great. And it really goes hand in hand with a lot of things that
we have been talking about lately with the book. So I definitely recommend you check that one out
if you're looking for somebody to get started. Yep. So start your learning today by going to educative.io slash coding blocks. That's educative.io, E-D-U-C-A-T-I-V-E dot IO slash coding blocks and get 20% off any course. Hey, if you haven't already left us a review, we would greatly appreciate it if you did, as we said earlier.
So you can head to www.codingblocks.net slash review where you can find some helpful links there.
And with that, we will head into my favorite portion of the show, Survey Says.
All right.
This one's going to be fun.
So let's see.
A few episodes back, we asked, is DevOps a dot, dot, dot, a job title, hiring now, or a job function, Get back to work.
All right.
So I don't remember who went first.
So Eeny, Meeny, Allen goes first.
All right.
Your name starts with A, so that probably didn't help you.
There we go.
You probably got that a lot growing up too, right?
Well, no.
So it was one of two extremes.
Now, Joe definitely went last.
Well, so mine was A.
So if we went by first name, I was there.
But if it went by last names, I was typically in the end of the line, which Joe would have actually beat me.
He would have been right behind me.
Joe got you there.
Don't even try it.
Right.
So I'm really excited to see what this one is because I was the,
the fence setter on this one. Uh, you know, I don't know. I think it could totally be a title.
I think it's also a job function. I'm going to go with people want to get paid for it.
So let's call it a job title hiring now. And I'm going to go, I have no confidence in my number
here. I'm going to say 35%.
I mean there's only two options.
One is going to be super heavy and the other one is not.
Okay.
Well, I happen to know that a job function won with a minimum of 51%.
Wait, you know that?
I know based on the comments received.
Man, that's just the passionate people that reach out.
The more passive people. And I enjoy it.
It was fun.
It was very funny and I heard some really good arguments
for it and so I really appreciate it.
There was a lot of conversation about this particular show.
It was really good and slack.
This one struck a nerve.
This one was good. I am curious
but I think only the hyper-passionate reached out to you.
The rest of them just went up here and voted.
So that's why I'm going to win.
All right.
So Alan says job title 35%.
Yep.
Joe says job function with 51%.
Yeah, he's going to lose by Price is Right rules.
No, he can't.
I don't think he can.
Anyways, go ahead.
Okay.
Do I need a drum roll?
No, I don't need a drum roll.
That was a horrible request.
I should have never asked for a drum roll.
Here comes the helicopter.
I'm sorry.
We just blew out some speakers.
Joe wins.
Really?
Yeah, of course.
By how much?
Of course, Joe wins.
Come on.
What was the percentage?
Even with the Price is Right rules thrown into it, Joe wins.
Was it like 52%?
I'm even going to give the half part of this.
82.5% of our listeners are correct by saying it's a job function.
Thank you for all four of you that voted.
What?
No.
82% so you can figure out exactly how many people voted.
Wow, that's impressive, man.
Yeah, so most of our audience is right.
I'm good with that. But there are other people who said, I want to get paid.
That's awesome.
So, I can do this survey or I could give you a little bit of a joke.
We need a joke.
We'll do a joke.
Okay.
So, Mike RG from Slack, if you've ever been on our Slack channel, you might have seen
one or two posts from him.
He's very active in our Slack.
He shared a tweet from my dad jokes with me. Oh, actually, no, was that a tweet? Or... oh yeah, it was a tweet that was sent via Reddit. Okay, got it.
What do Spanish programmers code in?
C?
Um, I don't know.
Okay, the answer per the tweet is C++, but I would have accepted C, which is what Joe said. So yes, well done, Joe. Well done, sir.
I got one!
Alright, so for
this episode's survey,
we ask, with the
new year coming, what kind
of learning resolution do you plan
on setting?
And this, prepare
yourself, this is a long one.
So many options. I plan to learn, dot dot dot, and your choices are: a new language, like Rust, Go, or LOLCODE. No, seriously, LOLCODE.
Yeah, LOLCODE. Unfortunately, it's a thing.
Yeah, there were some great ones in the bizarre language category too. Like, well, one that we... I don't... Brain, we'll just say Brain F is one, if you haven't seen that one. ArnoldC was another one, but I was like, okay, we'll keep it with LOLCODE.
Or another option here.
How about a new JavaScript framework like React or Angular, but probably ExtJS. Infrastructure things
like Docker or Kubernetes. Is virtual
PC still a thing? Higher level concepts
like machine learning and AI so I can prepare myself for Skynet.
Or more about an OS, maybe a new OS or
just get better with the current one.
A new database system, DB2, here I come.
Streaming data solutions like Kafka or Kinesis,
and depending on which part of Long Island you're from,
it might be Kafka instead of Kafka.
It is Kafka.
Yeah.
Search solutions like Elastic or Azure Search.
I can't even say it.
Azure Search.
I don't know.
I need to Google it some more.
I was wondering if you could chuckle on that one.
That's pretty good.
Yeah.
Algorithms.
I need to go back to basics.
How does Bellman Ford work again?
Or data structures because I want to go way back to basics. How does Bellman Ford work again? Or data structures because I want to go way back to basics.
Or all about cloud services.
I hear AWS is a thing.
I like this one.
This has got a good smattering of different approaches here.
Yeah.
Yeah, I'm going to go for like 7 out of 10.
Ooh, should I make it multiple choice?
Oh, that would be interesting.
Yeah, why not?
Because then you see what people are truly interested in and they're not forced into a box.
Yeah, that makes the results interesting.
All right, I'm in.
I'm in.
Multiple choice.
Do you have to pick?
Yeah, as many as you want.
Yeah.
But be truthful, right?
Like be truthful.
What are you actually going to spend time on?
Well, I mean, why would you lie on this? Like, who are you really lying to?
No, I guess what I'm saying is, if you answer this, answer it like, hey, I really do plan on looking at these, not, hey, what am I interested in, because I think most of us are interested in a little bit of everything.
Oh yeah, yeah, yeah.
You know what I'm saying? So pick the ones that you think you'll actually go after.
Right.
I wonder, hmm, can we even do multiple choice?
I'm sure we can.
Yeah, so you should go to the book club and see what we figure out.
No, don't email us.
I was totally kidding.
Oh, yeah, we can totally do multiple choice.
Done.
All right, beautiful.
Done.
What a silly question, Michael.
Of course you can do multiple choice.
There we go.
Gosh. All right, fine. Done. What a silly question, Michael. Of course you can do multiple choice. There we go. Gosh.
All right.
Fine.
Let's get in.
We talked about numbers like 35% and 51%.
So let's talk about more numbers.
Yeah.
And so I really like the idea of them kind of referring to them as numbers, even though
it was confusing.
I really wanted to call them parameters for that kind of symmetry there.
But the reason they call it numbers is because it's generally a set of numbers that you talk about. So when you talk about response time, it doesn't make sense to
really give just one number because there's actually a lot of numbers that are really
important for different purposes. Things like the minimum response time, max, average, median,
and percentiles, which we'll talk a little bit more about here in a minute; that's really interesting. But sometimes the outliers are really important. Like, maybe the average is totally fine. The example I liked here was basically FPS: if you're playing a video game, maybe the average FPS is 59, which is nice and comfortable and enjoyable. But if whenever you fight the boss it drops down to 10 frames per second, then people are going to be mad.
It's going to be nearly unplayable and frustrating and a bad experience for everyone.
So even though 99% of the time it's 60 or above,
if it drops down to 10 when things matter, then that's a really big deal.
And so that's why sometimes it makes more sense to look at different numbers
and things like averages are much too simplistic.
So it's common to get a suite of numbers here when we're talking about
performance.
I bet,
I bet,
um,
Google Stadia is probably super interested in things like this right now,
or Stadia.
I can't remember what it's called.
If you guys are aware of what that thing is.
Yeah, I've heard kind of mixed reviews, but I want one.
Very mixed reviews. Like, some people, people like you guys, you and Outlaw, who love your PC gaming, are addicted to the high resolution and all that kind of stuff, and I've heard that you just don't get that same crispness, right? So I don't know.
But this whole thing matters a lot to them, right?
Yeah.
I mean, now that we're talking about gaming, we totally should have brought this up in the news section that Halo is out on the PC now.
And so now we're going to have to make it a regular thing that we have a coding blocks halo night because we did that for the release night of halo it had a blast and i think we're gonna have to
make that a regular thing all right literally looks like i'm gonna be buying it now yeah i'm
probably more apt to play that than i am rocket league i loved rocket league but i mean spoon
killed us right like spoon made it not even fun.
I get destroyed.
Don't get me wrong.
Yeah, we'll get destroyed in Hanglo, too.
I'm fine with that.
For whatever reason, I don't mind dying, but I cannot stand not hitting that ball.
Like, there's something that kills me about that.
Anyways.
Awesome.
Well, okay, we'll play Oddball, then, and that'll make you feel better.
Oh, I love Oddball. That one's
fun. All right.
All right, so
average may not make sense, just like we
mentioned with the game example there, and that
kind of applies to everything too.
And so a lot of places, especially
the book mentions Amazon,
looks at percentiles.
For example, you take the median,
which is essentially the middle value of
a set of numbers, and you sort all the times and grab the one in the middle. Then the median is
known as the 50th percentile. So that's what half the people have worse and half the people have
better. So someone like Amazon might say, we have an objective of 200 millisecond response times for the 99th percentile, which means 99% of people have 200 milliseconds or better.
And only 1% of the users or requests are going to be worse than that.
Now, that's a very extreme number.
I don't know how realistic that is, but it just kind of gives you an idea of how you can kind of use these different sliders and kind
of take different stabs of things in order to adjust that. So maybe my FPS example,
we might have an objective saying never drops below 30 and FPS is above 50 for the 99th percentile.
And that might be like a really good experience. And that's
something that you can measure over time and graph. And if something drops below that, then you can
throw an alert. Hey, and by the way, backing up real quick on this whole, the average may not be
your best measure. If you want to see the typical response time of something that's really important
to know, right? Because you could be severely top loaded on, you know, some really
bad response times, right? Let's say that most of your response times are all 100 milliseconds at
the P50 or the 50th percentile, but your extreme bad ones were five minutes, right? Those are going
to skew your results so that your average no longer tells you what a typical response time is.
And that's why a lot of times you're looking at P50. That's why you sort the entire set and you say, hey, this is middle of
the road right here. You know, 50% of my users get better than a hundred millisecond response time.
And the other 50 get worse, right? Like it's the middle of the road. So that's better for your
typical rather than using average because average can absolutely be skewed
just by large or super small numbers, right?
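Here's a minimal sketch of pulling that suite of numbers out of a set of response times. The sample data is invented, with a few extreme outliers thrown in, to show how the average gets dragged around while the median (p50) stays put and the high percentiles expose the tail:

```python
import math
import random
import statistics

random.seed(42)

# Hypothetical response times: mostly ~100 ms, plus a handful of five-minute disasters.
response_times_ms = [random.gauss(100, 15) for _ in range(10_000)]
response_times_ms += [300_000.0] * 20

def percentile(samples, p):
    """Nearest-rank percentile: the value that p% of the samples fall at or below."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

print(f"average: {statistics.mean(response_times_ms):9.1f} ms   <- dragged way up by the outliers")
print(f"median : {statistics.median(response_times_ms):9.1f} ms  (p50, the 'typical' request)")
for p in (95, 99, 99.9):
    print(f"p{p:<4} : {percentile(response_times_ms, p):9.1f} ms")
```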
Yeah, they say actually, and I totally agree with it,
that average is almost never the number you want to use.
Right.
Because it's so easily skewed.
Like using example, Amazon again,
the average salary for someone at Amazon is probably really good.
I don't know.
Call it $300,000 a year, which is insane.
But the median salary with all the warehouse workers and drivers and cafeteria workers and janitors and everything else might be more like, I don't know, $30,000 a year.
But it's so skewed by Jeff Bezos there at the top that it can wreck that whole number and make it essentially meaningless, and, you know, maybe even harmful, because it doesn't give you that information. So something like the percentile or the median is going to be much more useful to you.
Yep. And what they say here is, to find the outliers, typically people are looking at the 95th, 99th, and 99.9th percentiles.
So here's the thing: if you go back to statistics, I mean, you guys probably remember this, right? Like, there was one standard deviation and then two standard deviations away. I want to say this falls into the, um, one standard deviation, right? Because I think two was like 87-point-something and one was like... Oh yeah, I can't remember exactly, but for whatever reason, it triggered all that. But basically what we're saying is, if you are looking at the 95th percentile, then that means that 95% of people are falling under whatever that is. If response time is your measure, then most people are falling under that, right?
And so that last 5% of people are kind of not getting the best experience.
Yeah, so, how to say this: the 0.1% would be on the far ends of the tail of that standard deviation. That would be like the 99.7, right? So 66 percent are going to fall within the first deviation.
Okay.
In a standard distribution, we're talking about a bell curve, right? And then 95 percent will be within two standard deviations.
There we go.
And then the third will be the 99.7.
Yeah, yeah. So 68 was one standard deviation. Oh, sorry, I said 66; you're right, 68. Okay, cool.
So, and I was going to bring up standard deviation too, because you were talking about the averages and everything, and, you know, going back to your Bezos example: all the more reason why the average can be less meaningful to you is in the case where you do have skewed results, like whether that distribution is skewed either positively or negatively, because, you know, the chances are it's going to be skewed one way or the other, right?
And so that's why that average is going to be not as –
Misleading.
Yeah.
Yeah, unless you have a really good distribution,
an even distribution on both sides, average is probably –
in almost every case, you shouldn't use it.
Like, you know –
Yeah, if you take all the raw data, chances are going to be slim that you're going to
have a normal distribution.
Yep.
Right?
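For the curious, the numbers they're reaching for are the 68-95-99.7 rule for a normal distribution; a quick simulated check, just for illustration:

```python
import random

random.seed(1)

# Quick empirical check of the 68-95-99.7 rule for a normal distribution.
samples = [random.gauss(0, 1) for _ in range(100_000)]
for k in (1, 2, 3):
    within = sum(1 for x in samples if abs(x) <= k) / len(samples)
    print(f"within {k} standard deviation(s) of the mean: {within:.1%}")
```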
And so this next thing that I thought was really cool, and it's good to hear, and I
think this is one of the reasons why I like the book so much, is they use a lot of concrete
examples, right?
And one of the things they mention is Amazon describes their response times in P999.
So three nines, right?
99.9%.
Because even though when you look at it like this, that only affects one in 1,000 people,
one in 1,000, they care about it.
And why?
Because these slowest response times typically happen for the customers that have the most
purchases. If you have a that have the most purchases.
If you have a customer with the most purchases, what does that mean?
They're probably a really valuable customer, right?
So they do care about that one in 1,000 because they want those people to continue having a good experience on their site. But the thing that's interesting is they talk about, additionally,
they don't care about four nines because trying to get those response times down for the four nines
is hyper expensive, right? Like you're talking about one in 10,000 people experiences a slowdown.
And now how do you even pinpoint that?
Right?
Like it might be an environmental thing.
There could be a network router out somewhere in between here and in that one person that
had the problem out of 10,000 people, right?
Like there's so many variables that they're really difficult to solve for that.
It's probably just not worth spending the time and the money on that.
And actually, uh, I wanted to give a shout out to Devin Goble over in the Slack channel as well.
We even had a conversation when we talked about on last episode of hardware failures,
and we were talking about S3 has 11 nines of uptime, not reliability, but uptime,
and things like that.
Most companies aren't going to spend that time and money on that
because to get to even four or five nines of reliability on your hardware
is cost prohibitive.
It's cheaper to just buy new hardware than it is to try and make your software
and everything so incredibly bulletproof that you'll never experience a hardware failure. So, really interesting stuff.
That's why those SLAs were always interesting.
Yeah, yeah. And I mean, do you remember crazy SLAs? Like, you know, it didn't have to be the 90s necessarily, but even in the early 2000s, you'd have like four or five nines of reliability, and you're like, really? And they were never true.
Nope.
Their nines are not my nines.
You ever see that article?
No.
Just because your service page says green doesn't mean that I'm not down.
Right.
Yeah.
And one other thing that I kind of skipped over here in the middle of these other things that I was talking about is response times and latencies
and all that are really important because so many companies out there and being that we've all
worked on e-commerce platforms, like it's a big deal, right? Like every increase in a response
from a hundred milliseconds to 200 milliseconds, like there have been measured studies to where
it's like, Hey, if you wait more than X number of milliseconds, a certain percentage of people leave.
Right.
And if it takes longer than a second, then people might even leave the search results or whatever.
Like there have been tons of studies on basically people's patience and the user experience.
And there's a massive drop off after you reach some threshold.
Basically, we don't have any patience.
Nobody has patience.
Yeah.
Right?
Yeah.
But you might.
So, I mean, I see you already crossed it off in the show notes, but I apologize.
No.
Because I was like looking up something.
But I wanted to hear what the answer was about this, the Amazon, the one in 1,000 users.
Oh, what about it?
Because with the answer about,
because the slowest response times
would be the customers who purchased the most.
Yeah, so they, yeah.
But I didn't understand,
because I was trying to find that in the book.
I didn't remember, I couldn't remember it from the book.
And I was like, wait a minute,
I don't understand. Just because they're doing their queries against...
Yeah, so basically they try to make sure that their response times are under 200 milliseconds, I don't remember if that was the exact number, but for P999. So for 99.9 percent of users, they want their response times to be under that threshold. And the reason is that the one person out of 1,000 that had the slowdown probably had the most orders; theirs is going to load slower because they have more data to retrieve from their databases and their services and all that. And they wanted to make sure that at least those, those thousand...
Okay, so let me put this in different words: if you only had, like, one item in your cart, then your cart would load fast, right? But if you had 20 things in your cart, then, okay, fine. It's fine, Alan, if it loads a little bit slower.
Okay. Right, right. There's a name for it, actually. The book says it's called tail latency amplification.
And the idea is basically the bigger orders, the more intensive projects have more calls.
So they're more likely to hit these conditions.
So it's not necessarily that having 100 throws you off a cliff all of a sudden, but just having that much more stuff going on, that many more service calls or whatever happening, you're more likely to hit those bad conditions.
So just that alone is more likely to cause problems.
And that's why they make that buy now button so easy. So you just have like a bunch of single item purchases rather than one purchase with 20 items.
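A quick sketch of why that happens, under the simplifying assumption that each backend call independently has a 1% chance of landing in the slow tail: the more calls a single page or order fans out to, the more likely at least one of them is slow, and the user ends up waiting on the slowest one.

```python
# Tail latency amplification, roughly: if serving one page fans out to n backend
# calls, and each call independently has a 1% chance of landing in the slow tail
# (worse than its p99), the chance the user sees a slow page is 1 - 0.99^n.
p_slow_single_call = 0.01  # assumed probability that any one call exceeds its p99

for n_calls in (1, 5, 20, 100):
    p_user_sees_slow = 1 - (1 - p_slow_single_call) ** n_calls
    print(f"{n_calls:3d} backend calls -> {p_user_sees_slow:5.1%} of page loads hit the slow tail")
```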
Yep. And they like to call out that the percentiles are typically used in what are called SLOs or SLAs.
So we mentioned SLAs. SLOs are service level objectives. That's what they're trying to target, right?
SLAs are what you've contractually agreed to with whomever's using your products that, hey, we're going to have this amount of reliability or uptime or whatever. And there's a big difference, right?
Like we talked about – did we talk about reliability last time?
Or are we going to talk about – yeah, okay.
So there's a difference between reliability and uptime.
Like we won't get into the super nitty-gritty details right now,
but there are SLAs for different types of
things, right? Like I said, S3, I want to say that their, what is it? Their uptime is 11 nines,
but it doesn't necessarily mean that there's never a failure in data, right? So the service
is available, but your data may not be in a good state for whatever reason.
So, you know, there's definitely going to be some legalese there.
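As a tiny illustration of checking an objective like the one they describe, here's a hedged sketch; the 200 millisecond threshold and 99.9% target are just the example numbers from the discussion, not anything from a real contract:

```python
def meets_slo(response_times_ms, threshold_ms=200.0, target_fraction=0.999):
    """True if at least target_fraction of responses beat threshold_ms.

    threshold_ms=200 with target_fraction=0.999 is the 'p999 under 200 ms'
    style of objective discussed above; both numbers are just examples.
    """
    if not response_times_ms:
        return True  # no traffic in the window, nothing violated
    fast_enough = sum(1 for t in response_times_ms if t <= threshold_ms)
    return fast_enough / len(response_times_ms) >= target_fraction

# Hypothetical measurement window: one 600 ms outlier out of every seven requests,
# which is way more than the 0.1% the objective allows.
window = [120, 95, 180, 150, 199, 600, 110] * 200
print(meets_slo(window))  # False
```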
Am I the only one, though, that when you heard SLO, you thought of ELO?
I did not think that.
I don't even know what ELO is.
I do now.
Electric Light Orchestra?
Yes.
Especially this time of year.
Oh, this is where the dude put out the thing with Beethoven playing in the lights?
No, no, no.
That's Trans-Siberian Orchestra.
No, no.
I'm not talking about the band.
I'm talking about the thing that people put out in front of their house and it plays music
and the lights play to it.
Is that what we're talking about?
No.
Apparently not.
No.
Electric Light Orchestra is a band.
Oh, okay. No, I didn't. ELO. No. Electric Light Orchestra is a band. Oh, okay.
No, I didn't.
ELO.
No.
From like the 70s.
Oh, okay.
That's why.
I mean, I'm a 2000s kid.
I don't know.
Oh, right.
Right.
Yeah.
I mean, sure.
I should have said Bieber.
Yes.
There we go.
Okay.
Now you got it.
Bieber-o.
There we go.
I'm on board.
All right, so they mentioned queuing delays are a big part of response times in the higher percentiles,
and that kind of has to do with how these calls kind of stack up on each other. So if one goes a little off, then it can kind of cascade and add up. And servers can only process a finite amount of things in parallel, and the rest of the requests are basically queued, or, you know, they'll have some sort of pool going on. And what that means is that a relatively small number of requests could be responsible for slowing many things down. So, you know, if you have one request out of 100 that goes poorly, and it happens whenever a certain amount of conditions kind of lines up, then as your service gets bigger and bigger and you get more and more users, that condition is more likely to hit. And if you've got, I don't know, a thread pool of 20 or something, which is tiny, but you have 20, then the first time it hits, it gets stuck a little longer, keeps adding up, and next thing you know you've got 20 of these slow responses in there clogging the pipes.
Well, even worse than that: if it was the first thing to hit the queue, it's slowing down everything that's waiting behind it.
Right. And that's what they call head-of-line blocking.
So yeah, yeah, definitely. So that first slow one came in, and then let's say a hundred requests came in behind it. Those things may require almost no work, but they're waiting on that first one to finish.
And so that's where the whole latency versus response time stuff comes in, right?
So the latency is how fast did this thing actually service the request?
Well, they're all sitting there waiting for that first slow thing to actually finish what it's doing before they can even start. The best way to visualize the head of line blocking is just think about your local DMV or your post office because we've all been in that situation where you're like, oh, come
on already.
The sloth.
Zootopia.
That might be one of my favorite scenes in any movie ever.
That's awesome.
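Here's a little sketch of that head-of-line blocking effect with a single worker and made-up service times: the requests behind the slow one need almost no work, but their response times balloon because they spend so long sitting in the queue.

```python
# One worker, FIFO queue, all requests arrive at the same instant (made-up numbers).
# The first request is slow; everything queued behind it pays for that.
service_times_ms = [2000] + [10] * 5  # one 2-second request, then five 10 ms ones

clock_ms = 0
for i, service_ms in enumerate(service_times_ms):
    queue_wait_ms = clock_ms              # latency: time spent sitting in the queue
    clock_ms += service_ms
    response_ms = queue_wait_ms + service_ms
    print(f"request {i}: service {service_ms:>4} ms, "
          f"waited {queue_wait_ms:>4} ms, response time {response_ms:>4} ms")
```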
So for this reason, it's important to make sure that you're measuring
client side response times to make sure that you're getting the full picture because that
basically gives you that kind of end to end number. And so I think that's a great point.
So we talked about how load parameters may be different for you and your organization,
things that matter for you. Same thing goes here, different performance numbers may matter for you, but in most cases, you're going to want something that represents
the kind of end-to-end user experience. This next piece was really interesting to me. I don't think
it was anything I'd ever thought about, or I don't even know that I'd done it wrong or right in the
past, but they were saying, hey, if you're doing load testing,
you need to make sure that when you're sending requests,
you're not waiting for the other one to come back in every situation,
right?
Because how are you going to test out this ahead of the line blocking?
If you're always waiting for the response to come back.
So while you send out requests,
you don't need to be waiting.
You need to send more requests so that you can also try and trigger this thing to where things start queuing up and your wait times start growing
for that reason.
I've heard people argue that load tests are often not accurate or not very good because they're not... you know, in this case we're trying to make it more realistic, but in some cases it's just checking out a thousand times back to back, and it's not really indicative of real human behavior, so we shouldn't pay much attention to the results. But I think that even though it is different, and obviously you can't really mimic true, accurate user behavior, I don't think that's a reason to discard the results, because sometimes you're going to find things in load testing that you're coincidentally not hitting in production yet, and that may come to a head one day. So I think those results are still really good. And of course you can have, you know, better or worse load testing results, but I think it can be tempting sometimes to throw out results that you don't like or kind of write them off, because, you know, it's easy. I think you've got to be really careful with that, because you can hit things like head-of-line blocking easier sometimes when you're load testing than when you've got real production traffic.
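That "don't wait for the previous response before sending the next request" advice is sometimes called open-loop load generation. Here's a rough asyncio sketch of the idea, with a stand-in coroutine instead of a real HTTP call and made-up rates, so treat it as an outline rather than a finished tool:

```python
import asyncio
import random
import time

async def fake_request():
    """Stand-in for a real HTTP call; most are fast, a few land in the slow tail."""
    start = time.monotonic()
    await asyncio.sleep(random.choice([0.05] * 9 + [2.0]))  # ~10% slow responses
    return time.monotonic() - start

async def open_loop_load(rate_per_second=20, duration_seconds=3):
    """Fire requests on a fixed schedule; never wait on earlier responses to send the next."""
    interval = 1 / rate_per_second
    tasks = []
    for _ in range(int(rate_per_second * duration_seconds)):
        tasks.append(asyncio.create_task(fake_request()))
        await asyncio.sleep(interval)  # pace the sending, not the responses
    response_times = sorted(await asyncio.gather(*tasks))
    print(f"sent {len(response_times)} requests, "
          f"p50={response_times[len(response_times) // 2]:.2f}s, "
          f"max={response_times[-1]:.2f}s")

asyncio.run(open_loop_load())
```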
Yeah. This reminds me of that saying, don't let perfect get in the way of good, right? Like, a lot of times as developers, we do it all the time, and everybody we know that's passionate about development: somebody will want to introduce something new, and everybody's got five million reasons why you shouldn't do it. And in reality, you don't even know until you try it, right? Like, so don't say, well, I'm not going to load test because it's not perfect. It will get you some information, right? It may not be the complete picture, but some information is more than likely better than none.
Right. And logging and monitoring, especially reading the book, emphasizes just how important that is, and how many good decisions you can make, and how scary it is to not know how things are going in your application or system.
For applications that make multiple service calls to complete a single screen or a page, slow response times become particularly critical because the user experience is not good.
Even if you're just waiting for one small part of it, sometimes that's enough to make the thing just totally not usable.
So the slowest offender can ultimately determine the user experience.
I think this has gotten a little bit better in recent years with things like microservices and people splitting things up.
And so even if my cart indicator
is still spinning, I can still search.
And that was less common a few
years ago. And things are getting
better. But overall, as an end user,
I've been happier with more microservice-y
type actions.
Yeah, but that doesn't mean that it couldn't still be that
one microservice bringing the whole thing down, right?
Right. If authentication
is your microservice and
it decides to tank for whatever reason,
you know, head-of-line blocking or whatever,
you know,
then yeah, everything else could go down.
Yeah, the other 50 requests on the page came
back quick, but that one didn't. You can't do anything,
right? It really
depends on which one it is.
It does bring to mind, though, you guys have heard of Svelte JavaScript?
Yeah, the incredibly disappearing framework.
Yeah, so, sidebar.
So, it's interesting because it's actually made to solve this particular problem. Because I know you guys have gone to websites like, I mean, any news website, CNN, Fox, any of them.
Sports sites are notoriously bad about this.
NFL, NBA.com, all those.
You go and you load a page and you look at all the requests that are hitting off that page, especially for ads and everything else.
By the time it's done, it's served up 300 requests, right? Svelte, the whole reason it was
developed, it was actually developed by a guy that is a, an online journalist or something.
I don't want to, to say it completely wrong, but he basically wrote articles and he would have to
embed images and that kind of stuff. And the problem that he
ran into was, you know, sometimes these frameworks, like let's say that you did it in react or you did
it in angular. If you just had an article that needed to be plopped on a page, or you had an
ad that needed to be plopped on the page, he didn't want an entire framework to load just to
plop that one, you know, React module on there or that one Angular module.
So like what Joe said, it's sort of a disappearing framework so that it's a compile time type
thing, right?
You build your stuff.
It's sort of so you make the content that you want to show up on this news site or whatever.
When you build it, it sort of does tree shaking of its own and it builds in exactly what it needs
and only what it needs. And then that way you can embed that on the page and there's no frameworks
being loaded. It's just the content and whatever little script pieces it needs. So it's really
interesting. Um, but it reminded me of this simply because, you know, it's helping solve some of
those problems. Like, especially, like I said, news sites are just horrendous at this kind of thing,
and this was designed to fix some of those problems.
I want to say the Times, but I can't find it.
Say what?
I thought he worked for the Times or something, but yeah, I read some stuff about it too.
It kind of reminded me of like scaffolding.
So if you've got these tools while you're building it, then they kind of ultimately churn
out
just the minimum amount of static code you need to
run, which kind of reminded me of some
JAMstack-y stuff I had been
reading about at the time.
Does being in tech sometimes
just make you feel like
you're not in tech?
Like, how long has this thing been
around? I haven't heard about this yet.
And now you're both talking about like,
oh, yeah, no, I've known about this for like 18,000 years.
How do you not already know about it?
Like, I learned this thing like when I was a child.
I don't even remember how I came across it.
I mean, it's one of those things that I think people will take to extremes
because that's what we do, right?
Like, oh, drop everything.
Don't do React anymore.
Do Svelte, right?
Oh, don't do Svelte.
Do this.
I think it's just like many things.
And what this book is really about is you choose the right tool for the job.
If you have something that you need to be ultra lightweight because you need to make sure that,
you know, whatever page is loading on something that you don't control,
you want it to be as static and lightweight as possible.
It's a great tool for it.
Would you build an entire site with this?
I don't know. Maybe not.
I mean, because React and Angular buy you some really nice stuff.
So I don't know. It's really interesting.
It's too much to know, ultimately.
So no matter what, there's going to be tons of things you've never heard of, things that you may have heard of and have no real experience with, and you probably have all wrong.
Oh, dude, I saw something about AWS because there's a big battle between AWS and Azure and Google to try and be the cloud of all clouds, right?
They just released something like 170 some odd new services.
I saw some article the other day, like a hundred and, okay, let's say that I read that wrong.
Right.
And let's say that it's not new services.
All right.
Let's just say they have 175 total services.
Like, how do you grok all that?
Well, we were, I think it was the last episode,
right? That we were, or at least I was reminiscing about the days of the AWS console where like,
you know, it used to be when you would log into the console, like you saw all of the services
that were available for you in one screen. Now you don't. You see like, here's the last six that
you use. Anything else, you've got to go search for. Yeah, it's actually... I think I said it wrong. I think it might be 170. Okay. Yeah, Amazon's cloud business now has over 175 different services for customers to use. That's up from more than 100 services two years ago and 140 last year. So, I mean, the thing is, don't get me wrong. Like, thank you for doing this, because I'm sure that they're adding value.
But, oh, my God, how do you keep up with it?
Well, I mean, it's just basically like an operating system, right?
Like they're just expanding the capabilities of the operating system more and more and getting more and more granular with it.
So it's just going to – it'll keep going.
And they're wrapping more and more software, right? Like really,
if,
when you boil down what any of these cloud services are,
they take a lot of,
probably in a lot of cases,
open source software, and then they wrap it to make it easy to use.
Right.
Well,
no,
it might not even be software though.
It might just be like concepts like DNS,
for example,
route 53.
Yep.
Good point.
Was,
was just like,
you know,
a DNS service.
And that's why I'm saying like,
you know, if you think about your operating system and all the little things that it does, you could make a similar comparison. Things that used to be something you would either have a separate app for, or you would go to a separate web page for, are now just built into the operating system. If it knows where you are, there's a built-in app for that. And now there's notifications. Yeah, there's all kinds. You're
right. You're absolutely right. I mean, I guess the thing is, it's just like, man, the stuff is
all growing out of control. Like you said, is there any time you're a developer and you're like, oh, my God, like I must be way far behind everybody else.
But at the same time.
I didn't know about Svelte.
So, yeah.
Thanks for bringing that up again.
There you go.
But I didn't know about 175 services.
And the funny part is I'm in AWS and I click something and I see the scroll bar like 20 miles long and I'm like, oh, yeah, I don't even know what I'm looking for.
So, you know, it is crazy. Anyways, back to this stuff. So we talked about, a little while ago, this whole notion of compounding slow requests. That's called tail latency amplification, right? So as some of these things slow down on the tail end of it, it actually slows everything down, because it's amplifying as more requests queue up and all that.
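To put some rough numbers on that amplification idea: if a single backend call is slow some small percentage of the time, a page that has to wait on many backend calls in parallel is gated by the slowest one, so the odds of hitting at least one slow call climb fast. A quick back-of-the-envelope sketch with made-up numbers:

```kotlin
fun main() {
    val pSlow = 0.01 // assume 1% of individual backend calls are "slow"
    for (calls in listOf(1, 10, 50, 100)) {
        // Chance that at least one of the parallel calls is slow:
        // 1 minus the chance that every single one of them is fast.
        val pAtLeastOneSlow = 1 - Math.pow(1 - pSlow, calls.toDouble())
        println("$calls backend calls -> ${"%.1f".format(pAtLeastOneSlow * 100)}% of page loads hit a slow call")
    }
}
```

With those made-up numbers, one call is slow 1% of the time, but a page fanning out to 100 calls ends up waiting on a slow one roughly 63% of the time.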
It sounds like it should be finance related.
The way you said that compounding slow request.
Well,
it is with tail latency amplification.
Well,
I was thinking of like,
I was making a reference to like compounding interest was what I was
thinking when you said it.
Yes.
Annuities.
That's what these are.
Okay.
Next thing I mentioned is that monitoring
response times can actually be a little
bit dangerous because the monitoring
itself can be a little bit expensive and have an effect
on the ultimate
resource utilization. I love
this. I love this one.
Observing
the behavior can change it.
You ever look at how much stuff
or I guess I don't really know how to look at it,
but I've always wondered how with Google Analytics running on every page,
tracking where my mouse goes, heat maps, not just Google Analytics,
but a lot of services now track every little thing you do,
where your eyeballs are looking on the page.
And so that stuff has a very real cost,
and so it may be worth not necessarily tracking everything, and only doing, you know, a little bit, or tracking every hundredth request or something like that. And so it's worth considering, if you think that your monitoring is going to be that expensive and have that profound of an impact. Have you guys ever been in a situation where you're like, all right, I need to add some logging to this application so I can figure out what's going on? Right? So obviously I don't have enough logging, because I can't figure out what's going on. So then you're like, okay, add some logging, add some logging, add some logging.
And then you're like, you got to run your app.
You're like, okay, I think now I can figure out where it's going.
And then you're like, okay, let's run it again.
Wait, why is it so slow?
What just happened?
Yeah.
Like you go from one extreme to the other extreme.
But you found that problem.
Yeah.
Well, or that or the water never boiled because you were looking at it.
Right.
Yeah, that's true too.
So there is one thing here that's really interesting. They mentioned that the amount of logging can be problematic, not just because you've already got so much, so it could actually inadvertently impact the performance of your application, right? But the whole thing is, if you're trying to do aggregates every minute, let's say, and you're doing it on all the data, right? And think about somebody like an Amazon at scale. Right. Like, I forget how much, but they lose so many millions of dollars for every second that they're down.
Right. So if you imagine the amount of traffic that is and if you're logging all that and you're trying to do some sort of aggregation to say, you know, these are the
response times for the last 10 million events that happened in the past minute, you know,
you're going to be crashing servers. Like you can't keep up with that amount of data. And that's
why he was saying is be selective, right? Like figure out another algorithm to do it. And there's
a few of them that are listed here. Yeah, I want to highlight that too, because keeping an average, for example, is actually really easy, and that's probably why it's so common. If you need to know the average, you need to know how many samples you've taken, and you need to know the sum; there are two numbers to track there. Every time you get a new sample, you increment the sample count and you add to the sum. Getting min and max is also very cheap. If you want to know the median, then the naive way to do it is to just keep track of every sample you've ever taken and then grab the one in the middle.
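A small sketch of that bookkeeping, just to illustrate the difference being described; this isn't any particular library:

```kotlin
class RunningStats {
    private var count = 0L
    private var sum = 0.0
    private var minSeen = Double.POSITIVE_INFINITY
    private var maxSeen = Double.NEGATIVE_INFINITY
    private val allSamples = mutableListOf<Double>() // only needed for exact percentiles

    fun record(sample: Double) {
        // Mean, min, and max only need a few running numbers: O(1) memory.
        count++
        sum += sample
        if (sample < minSeen) minSeen = sample
        if (sample > maxSeen) maxSeen = sample
        // The naive exact median/percentile keeps every sample ever taken.
        allSamples += sample
    }

    fun mean() = if (count == 0L) Double.NaN else sum / count
    fun min() = minSeen
    fun max() = maxSeen

    // Naive percentile: sort everything and grab the sample at that rank.
    // Fine for small data sets, painful at scale, which is where approximations
    // like forward decay, t-digest, and HDR histograms come in.
    fun percentile(p: Double): Double {
        require(allSamples.isNotEmpty()) { "no samples recorded yet" }
        val sorted = allSamples.sorted()
        val index = ((p / 100.0) * (sorted.size - 1)).toInt()
        return sorted[index]
    }
}
```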
And so, like Alan said, there's a couple of algorithms that kind of get around that by making use of better data structures or kind of compressing the data a little bit.
So we've got here forward decay, t-digest, and HDR histogram.
Those are all above my head, but I know enough about trying to keep medians to know that it can be a pretty big pain in the butt, actually.
And not even just medians.
I should say percentiles.
Yep.
And this is one thing that's important.
And we should all know this as people who have done math.
Averaging percentiles makes no sense, right?
Because a percentile of a set of 10,000 and trying to
average it with a percentile of a set of five, it doesn't make sense. They don't work. So what
they point out here, and I don't think I even knew this, by the way, I knew that you couldn't
do it that way, but I didn't know this. So averaging percentiles is meaningless. Rather,
you need to add the histograms. So that sounds cool. I've never done it. I don't even know how
you do it, but it's something I'll probably look into because that's really interesting.
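Here's one way to picture the "add the histograms" idea, with completely made-up bucket boundaries; real implementations like HDR histogram choose their buckets much more carefully. The key is that per-bucket counts from different machines or time windows can simply be summed, and then you read the percentile off the combined counts, which stays meaningful in a way that averaging each machine's p99 does not:

```kotlin
// Made-up latency buckets in milliseconds.
val bucketUpperBoundsMs = listOf(10L, 25L, 50L, 100L, 250L, 500L, 1000L, Long.MAX_VALUE)

// A histogram here is just a count of samples per bucket.
typealias Histogram = LongArray

fun emptyHistogram(): Histogram = LongArray(bucketUpperBoundsMs.size)

fun record(h: Histogram, latencyMs: Long) {
    val bucket = bucketUpperBoundsMs.indexOfFirst { latencyMs <= it }
    h[bucket]++
}

// "Adding the histograms": merging is just element-wise addition of the counts.
fun merge(a: Histogram, b: Histogram): Histogram =
    LongArray(a.size) { i -> a[i] + b[i] }

// Read a percentile off the merged histogram: walk the buckets until we've
// covered the requested fraction of all samples, then report that bucket's bound.
fun percentileUpperBound(h: Histogram, p: Double): Long {
    val total = h.sum()
    var seen = 0L
    for (i in h.indices) {
        seen += h[i]
        if (seen >= total * p / 100.0) return bucketUpperBoundsMs[i]
    }
    return bucketUpperBoundsMs.last()
}
```

So instead of averaging two p99s that came from different sample sizes, you merge the bucket counts once and compute the percentile over everything.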
It's kind of almost like a bucketizing, I think. I don't know. I haven't really looked into them,
but that's how I imagine it. They kind of divide things up into certain segments in order
to kind of measure them. Well, I remember reading a white paper a while back that was related to a machine learning process that Amazon was using where they were doing mini batches in order to be able to do like real time.
They were trying to get closer to real time inference.
And so basically the way you could think about the mini batches is it was almost
like taking averages of averages kind of a situation. Like, yeah, because, to your point about, well, you can't process all of the data. Like, did you even give a number, Alan? I don't think you did, but, you know, you were just saying, like, hey, if you're, like, the size of Amazon and you're trying to do these aggregates, he's like, you can't do this across all the data in real time. No, you can't, but you could take, like, okay, hey, here's a batch of whatever that amount of data is, get some average from it.
Okay. Now we can go to the next data, next batch, get some average from it. We can take those two
averages together, you know, and you can keep going like that. But, I mean, I'm grossly oversimplifying it because, you know, again, the particular white paper was more about machine learning, not necessarily about, you know, log metrics or whatever, you know, or related to, you know, your application's performance.
It's all good stuff.
Yeah, great time to be a dev. There's so much to learn and so much to try and know. Oh, for sure. For sure. So we talked a little bit about scalability. We talked about the ways to describe load, and we kind of talked about load parameters being your stressors. We talked about how to measure performance by using performance numbers.
So now we're going to talk about what you do to actually cope with load,
basically how we retain good performance when our load increases.
Another example here, an application that was designed for 1,000 concurrent users is probably not going to handle an order of magnitude jump up to 10,000 concurrent users without any changes.
And so it's often necessary to kind of rethink your architecture a little bit
every time your load is significantly increased.
Go ahead.
Nope, all you.
No, so that was one thing that I thought was really interesting is they said,
hey, every time there's an order of magnitude increase,
you probably have to rethink your architecture.
And they said maybe even more than that, right?
So even in stepwise increases, right? Maybe you designed it to work for a thousand
users and you get to 1500 and it just tanks. So that's not an order of magnitude. Well, I mean,
I guess in that case it maybe is, but, but you see what I'm saying, right? Like it's not massively
bigger, but that little bit puts you over the top of what is acceptable. And so you even have to rethink there, and then you have to rethink again when you get to 3,000, etc., right?
Yeah, that's a really good point. So if you can say, like, we've got a big sale coming up and we expect 10 times more traffic, then, you know, it's trouble time. That's a threshold that's known for having this kind of big, bad jump, so you're probably going to need some additional hardware, either scaling up, to basically add more resources to an individual machine, or scaling out by adding more machines. And so if someone tells you something in marketing, we're having a big,
huge Black Friday sale, we can expect a hundred times or more our normal traffic.
Whoa, Nelly. At that point, I can't even imagine making that big of a jump without doing some
serious testing.
It's pretty hard to spitball that, I would imagine.
Dude, so you guys know that I'm a huge fan of Costco.
I think, Joe, you are too, right?
I enjoy it.
Not as much as you, though.
I don't know of anybody that enjoys Costco as much as I do.
Hey, our buddy Stuart.
That's not even possible.
Our buddy Stuart might like it as much, if not more than I do.
But here's the thing.
So Costco sent out a Black Friday email on Black Friday morning.
And as I do, I clicked that email and the entire site died.
And I'm like, oh, come on, man.
How good was this sale?
Apparently, I don't know.
It couldn't have been that good because nobody could get to it.
Seriously, all day long, you could not log in.
And if you know anything about Costco.com, you can't see the prices unless you log in because it's a member-only thing.
Couldn't log in all day.
And I tried many times because I like Costco.
What did I tell you about authentication service?
Your microservice is the authentication that dies.
Yep.
So this whole scale,
this whole load thing is exactly what you're talking about.
You guys know, you ever... so they used to have Woot-Offs, right? And anytime someone would say something about a Woot-Off, like everybody would be over there clicking and refreshing, and their servers would die every single time. Right? Like everybody wanted that bag of crap that you could get on a Woot-Off. It never happened because the page would never load.
So, man, I haven't thought about Woot in forever. Right? I just went there, and... Yeah, me too. And I'm like, oh, I actually want that. See? So look at the sushi shirt. The what? So this is a funny sushi shirt. It's Godzilla eating sushi. I'm like, ooh. Oh, no. No, this was like an Under Armour shirt that I just love.
Oh, but I do see the Godzilla one, yeah.
I just brought some joy back into your lives.
Yeah.
Okay, episode's over.
I got to shop here.
Hold on.
Yeah, this whole – hey, wait.
Who knocked off these checkboxes?
We haven't talked about these yet.
So, yeah.
I was just getting back to the whole thing that,
you know,
being prepared for this is way harder than you think,
right?
Like Costco.com has been around for a minute.
Woot.com has been around for a minute.
Like this is still not a super simple problem to solve.
So that said the things that were marked off that we hadn't actually talked
about yet,
scaling up. Um, typically when we talk about scaling, you hear of two different ways. They're scaling up
and they're scaling out. And we've talked about this in the past before, but scaling up is adding
more hardware resources to a single machine to handle additional load. So more memory, more CPU,
more whatever, right? That's what that is. Scaling out is, hey, it's way cheaper to add more machines, so just add what we call commodity hardware, right? And all commodity meant was it wasn't like the old Sun SPARC or RISC systems or anything like that. It was, you just go buy a piece of hardware off the shelf from Dell or whomever else, you throw it in your rack, and you're good to go.
So that's scaling up.
You go from like 16 gigabytes of RAM to 32 gigabytes of RAM.
It's a lot cheaper than buying a whole new laptop.
Yeah.
Right?
So that's the case where scaling up, you can get much better performance for much cheaper than scaling out.
But there's going to come to a point as you try to go up to, I don't know, 128 gigs of RAM, where it's definitely
going to be cheaper to buy a whole other laptop.
And so that's kind of the deal there is that scaling up essentially has a limit.
It definitely has a physical limit in terms of like, you know, physics and electrons and
how fast you can get, but also just in terms of pricing, like it gets to be really expensive
at the top end of that curve, while scaling out should be pretty close to a linear scaling of price.
So, you know, a laptop costs $1,000.
Three laptops cost $3,000.
Three times your RAM is not going to cost three times as much.
Right.
But let's also say that when we talk about scaling out costing you just that much, we're talking about hardware costs, right?
Because now you're going to start incurring other costs.
Yeah, like if you only wanted to have, I don't know, three gigs of RAM, right?
That's not a big deal.
And then there's going from three gigs to 256 gigs, or terabytes, of RAM.
The machine alone that could support, you know,
an insane amount of RAM is already going to start off expensive,
let alone the RAM for it.
Yeah.
I mean, I've looked at even for desktop builds, you know,
hey, I want to put 128 gigs in this.
Dude, you start running into some cash, because now you're doubling the size of the chips on the individual memory sticks. And, like, the price doesn't scale linearly, right? Like going from
64 gig to 128 gig isn't twice the price. It's four or five times the price in many cases.
And that's what we're talking about. There's that threshold that once you step over it, it's like,
okay, you know, now we're getting into expensive territory.
Yeah.
We have from our last episode's show notes, the statement where we said, as time has marched on, single machine resiliency has been deprioritized in favor of elasticity, i.e. the ability to scale up or down more machines, which I guess now we should really say scale out.
Scale out, yeah.
Yeah.
Yeah, I'm bad about that.
And to move it on here, so scaling up versus scaling out,
one is not necessarily better than the other.
And we should talk about why now.
Because if we're talking about the cloud world,
everybody's like, ah, scale out, right?
But there are situations where that doesn't make sense. And they, and they pointed out in the book and they
do a beautiful job of it. Scaling up can be much simpler and easier to maintain,
but there is a limit to the power available on a single machine, as well as the cost ramifications
of creating an uber powerful single machine. So think about your traditional RDBMSs, right? SQL Server, Oracle, that kind of stuff. Like, it's really easy to manage one box, right? And if you can get all the power in that one box, it sure does make your life a whole lot easier than if you're trying to manage some sort of cluster with failover and all that kind of stuff, right? Scaling out can be much cheaper in hardware costs, but this
cost starts going up in developer
time and maintenance because you've got to keep this
infrastructure running. There's communication between
them. There's balancing
and complexity, latency.
There's so many other things that
you get there. So now you're truly
balancing out
where you're spending your money and your
time and your effort.
Yeah, but the hope, though... man, I hate to say that one. Yeah, I mean, I get where the book is coming from with the "one is not necessarily better." I get that, especially depending on, like, what your needs are, right? Like, we often joke about Alan wanting to build something that can support a billion users from day one as he's still, like, proof of concept in this thing. Right. So, okay, in that case, yeah, I get it. Scaling up. Right. There is an economy of scale there, that once you do figure out that complexity and how to handle that infrastructure, adding that next machine to the configuration isn't going to cost you that much more effort, right? Does that make sense? Like, it does. Like, I guess... okay, I guess what I'm trying to say here is, if you were to picture the curve, right, it starts out really steep in the beginning, but then, as time goes on down the line, the curve gets closer to zero. It depends on your needs, though. Right. Like, or at least that's the hope.
So I've brought this up in the past, right? And I think we have a link to it, and I'll put it in the show notes here too.
But the Stack Exchange, or I think they renamed themselves again to Stack Overflow, but their architecture diagram, if you look at it, they have one SQL server.
All right.
If you want to really be realistic, they have two, but it's just for high availability,
right?
Like if one goes down, the other one's still up.
So they've decided, Hey, we only need one of these things, but then we have other things
backing it like a Redis cache and that kind of stuff.
Right.
So again, it just depends on what you need.
And we say that so many times on this show, right?
Like there's no perfect answer to a lot of things and there's not here either.
So rather than picking scale out over scale up,
pick the one that best suits your needs.
Stack Overflow maintains a pretty beefy SQL server,
but then they back that with other services that allow the site to run
extremely fast, right?
Yeah, I'll have a link to this in the show notes.
But yeah, they do 1.3 billion page views per month.
And they have, where was it, four SQL servers organized as two clusters.
Each has one and a half terabytes of RAM with a database size of, well, let me rephrase
that. They have four SQL servers, but two of them are for Stack Overflow. The other two are for
Stack Exchange, Careers, and Meta. So the Stack Overflow one alone is one and a half terabytes of RAM, 2.8 terabyte database size, at least as of the time of this, right?
4% CPU usage.
That's crazy.
And their peak.
It peaks at 15%.
Yes.
Yeah, that part's crazy.
And they get 528 million queries per day.
That's unbelievable.
And their peak is 11,000 queries a second.
Yeah.
So again, this is one box, right?
Now granted, 1.5 terabytes of RAM probably costs as much as maybe some houses out there,
right?
Like let's not downplay that.
But this is a situation where they said, hey, let's scale up the hardware here because scaling this infrastructure out is going to be very difficult, right?
So, again, in Stack Overflow, if you listen to the show, you probably have been there a time or two.
So, yeah, I mean, you got to know the use case
So I was looking at how much 1.5 terabytes of RAM will cost you. How much is it? Is that going to be in our next show? If you've got to ask... That'll be in the next shopping spree episode. One and a half terabytes of RAM.
Dude.
But I'm not going to get it unless it's got LEDs on it.
In your server.
Oh, man.
I mean, like I want my server to be blingy.
That's an extra 50,000 for the bling.
I bet I wouldn't be surprised.
Not at that level.
So, yeah, the next thing we have is there's also
what's called elasticity.
This is basically
when systems
can resize on their own
based off some sort of metrics.
You set something up and say, hey, if I
see 60% CPU utilization
hit, then add another
box and then distribute the
load. And when it goes down, then it automatically shrinks back down, and that's why it's called elastic, right? Just like, I don't know, jogging pants or something, right? Like as you get fatter, they grow; as you get shorter, they shrink. Hey, let's keep it real. I was going to bring up like a Stretch Armstrong.
Okay, there's that too.
That might be an older reference though.
Yeah.
I don't like your – well, I guess your waistband reference though would be kind of fitting considering we just got past Thanksgiving.
We did.
Here in the States.
Yes, people are shrinking back down now, hopefully.
Yeah.
Or they will be in January when everybody stops eating for a week or two.
Well, no.
January you shrink back down because for the first two weeks you join the gym.
Right, exactly.
I'm going to skip lunch today.
So, yeah, again, that all typically happens with some sort of criteria.
Now, there are systems where people manually do these things, right?
Like, oh, man, Black Friday just hit.
We need to add another server.
Or maybe you know that because Black Friday is coming,
you anticipate the need.
So you're like, okay, let's go ahead
and scale up a little bit extra, right?
So that, because there might be,
we've talked about this as it relates to like
Lambda and Azure functions and things like that. Like just the cost, the latency that you might hit for the spin up time for that service.
Right.
So because you might anticipate a big in the case of like an e-commerce thing with Black Friday, because you might anticipate that extra load, you could go ahead and scale that up in advance.
Yep.
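Just to make that elasticity idea concrete, here's a toy sketch of the kind of policy loop being described. Everything here is hypothetical: the thresholds, the metric, and the Cluster interface are stand-ins, not any real cloud provider's API:

```kotlin
import kotlin.math.max
import kotlin.math.min

// Hypothetical hooks into your platform; real autoscalers (and their bills)
// live behind provider APIs, not an interface like this.
interface Cluster {
    fun averageCpuPercent(): Double
    fun instanceCount(): Int
    fun setInstanceCount(n: Int)
}

fun autoscaleOnce(
    cluster: Cluster,
    scaleOutAbove: Double = 60.0, // add a box when CPU crosses this
    scaleInBelow: Double = 25.0,  // remove one when load drops back down
    minInstances: Int = 2,        // floor, e.g. for availability
    maxInstances: Int = 20        // ceiling, e.g. to cap the surprise bill
) {
    val cpu = cluster.averageCpuPercent()
    val current = cluster.instanceCount()
    val desired = when {
        cpu > scaleOutAbove -> current + 1
        cpu < scaleInBelow -> current - 1
        else -> current
    }
    cluster.setInstanceCount(min(maxInstances, max(minInstances, desired)))
}
```

For a known event like Black Friday, the manual version of this is basically raising minInstances ahead of time instead of waiting for the CPU metric, and the spin-up latency, to catch up.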
And this reminds me of cloud services.
Like I think a lot of times when people think about cloud services and AWS, Azure, Google, whatever,
they think, oh, it automatically does everything for me, right?
So I can just turn on this little elastic button and say auto-size,
I'll give it some thresholds.
But I came across a use case that actually demonstrates this very well.
So in Amazon, there's AWS Kinesis and there's
Kinesis Firehose. And this is for loading data into the cloud, usually going into an S3 bucket
or maybe somewhere else. But the interesting thing is if you use Firehose, it sizes for you.
You don't ever have to worry about it, right? It is fully elastic. If you send it way too much data, it will grow to meet your needs.
If you use Kinesis itself, you actually have to plan for the load.
So it's still very much a manual person having to go in there and make decisions on, you know, how many partitions do I have?
How much data am I pushing in?
So even though they're both basically services that do essentially the same thing, one's managed by you and the other one's managed by the framework itself.
I don't know why this just came to mind, but you were talking about Amazon's 175 services.
Maybe it was 175,000 services.
But have you heard of – I don't know if we talked about this – have you heard of AWS Snowmobile? Yes. Joe, so your comment about Firehose in Kinesis is what made me think about it, because you're talking about putting in a bunch of data. So they have a service called AWS Snowmobile where, like, imagine if you have a lot of data that you want to load into AWS.
So like a gigabyte.
Let's say exabytes.
Right, right, right. Petabytes, exabytes, a lot of data.
Then what they will do is they will drive a truck to you, right? And this is basically like a data center on wheels they will drive it to you
connect it to your network and your power and you can transfer your data to the truck and then they
will drive it back to an aws data center and make it available for you on the network pretty cool
how many hundreds of dollars do you have to spend per month to get them to do that? I think that's a call for pricing.
That's awesome.
Yeah, you could transfer 100 petabytes per snowmobile.
Man, that's a lot of data, right?
But, I mean, that's cool.
That's really, really cool.
So, getting back to this with the whole auto elastic versus a person doing it,
doing it manually might actually be simpler and it could protect you from a big bill, right? Like
if you had some things in their auto scale and you didn't expect Black Friday for you to sell
your widget, you know, a billion times, you might not have been prepared for that bill that's going
to come your way when it scaled up
100 servers for you, right? So doing it manually can actually save some of that. Yeah, that's a really good point. When we're talking about scalability, especially in this section, it's really important, when you kind of talk about judging a system, to understand what your
scaling options are whether it's up or out,
and how that affects basically the resources of the system under load,
so basically those performance numbers, and how much it costs you, literally.
How much money or resources it takes.
That's how you truly measure a system.
How well does it balance all those needs?
Yep. So I already mentioned, you know, for a long time the RDBMS typically ran on a single machine. They might have had a failover or something, but they typically weren't clustered or anything like that. So you
know what they pointed out in the book is as these distributed systems are becoming more common and the abstractions that are built on top of them are getting better, we're going to start seeing more of these commonplace, right?
Like there for a while, it was really hard to do distributed computing, which it still is kind of hard.
I mean, if you mess with it much, but things, the tooling's getting way better.
And so it might become more common that instead of starting out with a single machine, you just start out with a scalable infrastructure.
You look like you're about to say something there, Joe.
No, I'm getting attacked by dogs.
Oh, sorry.
You took the squirrel away.
I took the squirrel away.
Now, yeah, they got tired of fighting with each other.
Now they're fighting me.
And now you're suffering.
That's awesome.
Yep.
So this book actually talks about scalability as well as maintainability.
And they definitely, they're opposing forces, right?
Like if you really want to think about it, you make it more scalable.
Well, it's probably going to be a lot harder to keep that entire thing running
perfectly. Well, I mean, here's an easy way to think about this, you know, in the conversation we just had about, like, scaling up versus out, right? And, you know, with the complexity of that, right? Because if you have, say, two machines that you've scaled up, right?
And, well, I don't know, would you consider that up or out?
Because you have more than one.
But let's say that, you know, you manually configured those, right?
So it's not elastic.
It will be a lot easier when the time comes that you need to debug something,
right? You know, I only got two machines to go to, right? So it's a lot easier to do that.
Whereas if you're dealing with elastic machines, right, that might scale up to like, let's say it
auto scaled out to a thousand of those things, right? You can't think about a system where it's like,
oh, it's possible for me to log onto one of those. No, no, no. You can't think of those systems in
that kind of way at all. You're going to have to think about them in regards of, well, they're
ephemeral. There's nothing there that's going to matter. Everything that I would need for
debugging purposes, like logging or logs, for example, is going to need to
be not on that box.
It's going to need to be shipped somewhere else, right?
So that's part of the complexity that comes with it.
So that's the trade-offs that you're making.
Yep.
You know, if I only have the one or two machines, I can be a little bit lazy, maybe.
Probably shouldn't be, but I could be.
Right.
In the short term, you might be.
Yeah, in the short term. because it'll be more cost effective,
right?
It might be just fine.
Yeah,
it's fine.
Alan,
it's,
it's interesting.
So they call this in the book, and we didn't have this in the notes here, but they call that the shared-nothing architecture, right? Like when you have a thousand machines, right? Like that's basically what you're saying, is these things are sort of standalone things that can just operate on their own. But this also brings it back around to the whole Kubernetes thing, right? Like that's kind of why Kubernetes is sort of a big deal.
It sort of has like a managed pipeline for its logs and all that,
because if you did have scaling set up inside it for containers,
it could spin up a thousand of these things, right?
And it's sort of elastic.
You don't really have to think or care about it.
And it's got these things built into it already that help you manage and view logs and do that kind of stuff.
So one of the reasons why I'm a big fan of Kubernetes is because they thought about a lot of the things that are really painful in the VM world, you know?
I'll say, you know, there's no such thing as a generic one-size-fits-all scalable architecture.
And the book, in many ways,
is all about just those trade-offs that you have to make,
whether it's in terms of scalability or maintainability
or whatever, there's always trade-offs to have to make.
But we'll say that, like you said, Alan,
just kind of echoing you,
I think Kubernetes is so popular now
because that's probably about the closest thing we have for being a kind of one-size-fits-all scalable architecture.
It's not there.
There's a lot of different complexities and things that aren't fully met or that you just have to solve somewhere else.
And that onus goes on you.
But I do think that's kind of like our current best hope in that direction. Now, do you think that that's why Kubernetes is popular, or do you think it's more that you can have these kind of cloud-like concepts without being specific to a cloud environment? That too, because I think it's a package. Yeah, I think this is a way for kind of managing my architecture, and it works locally and out in the cloud.
And it gives me one kind of big box that I can kind of sort of almost get everything into.
Yeah, I think it's both.
I agree.
I think part of it is, once people realize what containers can do for them and how they can ease their lives, then they realize, oh, well, that's great, I've got that. And, you know, something like Docker Compose, right? Like, oh, I can stand it all up, but wait, if it dies, it's dead, like that's it. Then Kubernetes is that next evolution of, oh, no, I've got something that can keep it alive for me, right? Like, I think it's a little bit of everything.
Like when you start digging deep into it, like I talked about the log stuff,
you know,
all logs are exposed in a centralized way.
So plugging in some sort of dashboard on top of it's not hard to do.
As a matter of fact,
there's a Pluralsight course I can link in the description here that shows you
how to take your Kubernetes thing.
And you can basically route everything through Prometheus and then have a Grafana dashboard and see your entire infrastructure.
You can see how much CPU load is happening on each one of your nodes, how much utilization is happening everywhere.
And it's all because they basically, you know, Google and everybody else has had this problem for years.
Like, hey, I need to be able to monitor this stuff. They built it into the, to the platform. So I
think it's a combination of everything, but anyways. Um, yeah. So getting back to this,
like the one size doesn't fit all, right? The next thing is, your problems could be reads. They could be writes. It could be any number of things, right? The volume of data, the complexity of the data, whatever. And they had a really good example that I basically put down here, mostly verbatim, because it really demonstrates the problem. A system that handles 100,000 requests per second, at a kilobyte in size, is very different from the needs of a system that handles three requests per minute,
each with a file size of two gigs, right?
Same throughput, same amount of data coming through,
but very different requirements in terms of what they need in order to perform well.
It's like a Dropbox versus a shopping site, right? That's a great point.
Totally, totally different needs in order to make those things perform well.
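For what it's worth, the arithmetic behind that example works out to roughly the same data rate on both sides, which is the whole point: same throughput, completely different system. A quick sanity check:

```kotlin
fun main() {
    val kilobyte = 1_000.0          // bytes
    val gigabyte = 1_000_000_000.0  // bytes

    val manySmall = 100_000 * kilobyte    // 100,000 requests/second at ~1 KB each
    val fewHuge = 3 * 2 * gigabyte / 60.0 // 3 requests/minute at ~2 GB each, per second

    println("Many small requests: ${manySmall / 1_000_000} MB/s") // ~100 MB/s
    println("A few huge requests: ${fewHuge / 1_000_000} MB/s")   // ~100 MB/s
}
```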
So, designing a scalable system based off bad assumptions can be both a waste of time
and even worse, counterproductive.
As we kind of mentioned, it's hard to really get that stuff going before you really know
what you're doing.
In the early stage of applications, it's often more important to be able to iterate quickly.
So if we kind of run into it, like we try to scale it too far ahead of time,
you can get kind of tangled up in the weeds of hypotheticals.
And it's just really hard until you kind of know what you're going after.
In that educative course that I always like to talk about,
the system design interview,
it was really kind of striking to me
how important the numbers and the performance metrics came in to play in all of those systems
like when they talk about doing a dropbox or a twitter how important it is to to kind of know
and have some estimates based on you know reads and writes and throughput basically the parameters
that matter because those had really big effects on
how you set things up.
So I thought it was really happening and it ties in really well with the book.
Yeah.
It's,
it's really interesting to think about that if you make the wrong assumption
on what load parameters matter,
how you could completely go the wrong way.
Right.
It's... I mean, yeah, look at the start of this conversation, right, where I said that, you know, we were focusing first on going after concurrent users, right, and how we had calculated that. And then we used that to decide, okay, this is how much infrastructure we need. And then, if you recall, we ended up scaling that back down by half, right? Right, over time. Right.
Oh, yeah, a good example, too, if you've heard about how Twitter's architecture has changed over time, because when they first came out, they didn't predict some of the problems that they ended up having.
So they changed the architecture.
And the example I'm thinking specifically is they didn't really anticipate
or hadn't planned for these kind of super tweeters that have bazillions of followers
like the Kanye West and the Taylor Swifts of the world
who have billions of people following them.
So whenever they had a tweet, their original architecture meant it was writing out things to all these people's feeds,
which is totally fine if each person had an average of 500 followers and totally terrible if people had 13 million.
So they had to kind of figure out how to balance that and adapt that as they went on.
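A very rough sketch of that fan-out-on-write idea, with hypothetical in-memory types; this is nothing like Twitter's actual code, it just shows why one tweet from a huge account turns into a huge number of writes:

```kotlin
// Hypothetical in-memory stand-ins for the follower graph and per-user
// home-timeline caches.
val followersOf = mutableMapOf<String, MutableList<String>>()   // author -> followers
val homeTimelines = mutableMapOf<String, MutableList<String>>() // user -> cached tweets

// Fan-out on write: one post becomes one write per follower.
// ~500 followers means ~500 small writes; 13 million followers means 13 million.
fun postTweet(author: String, text: String) {
    for (follower in followersOf[author].orEmpty()) {
        homeTimelines.getOrPut(follower) { mutableListOf() }.add("$author: $text")
    }
}

// Reading a home timeline is then just a cheap cache lookup.
fun readTimeline(user: String): List<String> = homeTimelines[user].orEmpty()
```

The fan-out-on-read alternative skips those writes and gathers tweets from everyone you follow at read time instead, and the hybrid the book describes fans out for most accounts while special-casing the celebrity ones.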
And I don't think you can really avoid that unless your service just doesn't take off at all.
That was actually one of the parts of the book that I think just hooked me when they were talking about the various different approaches they tried and which one stuck and why.
And they had to think about the problem several different ways to even come up with a solution that seemed like it would make sense.
So, yeah, that was awesome.
All right.
Well, we'll have plenty of links for this in the resources we like section.
And with that, we will head into Alan's favorite portion of the show.
It's the 15 billion tip of the week.
Right.
Hey, I only have one tonight.
Oh.
Yeah.
Somebody's slacking.
I am slacking.
Aha.
Good pun.
So here's the thing.
If you ever have to go set up your Coding Blocks Slack specifically, because that's what you care about, right? Obviously, if you ever need to go set that up on another computer, it can be really frustrating. And not just Coding Blocks, right? Like, we're probably all a part of five or more Slacks. So one of the things that's really irritating is if you go to slack.com. And first I'd say, install the desktop app. So there's that. But if you go to slack.com and you click to sign in at the top right, or you're... I think I'm signed in. I need to sign out, sign into another workspace. So if you do sign in, it'll ask you, hey, which workspace are you wanting? So you have to know the URL or the name of it, right? So in our case, it's codingblocks.slack.com. So you can type in coding blocks. A better way to do this,
everybody, is down below that block where it says sign into your workspace. Underneath the continue,
there's a thing that says find your workspace. Click that. It'll allow you to drop in your email address and then it will send
a link to your email address that you can click and it'll have all the workspaces that you
registered under that email address and you can just launch each one of them. And if you have the
desktop app installed, it'll say, hey, do you want to open this in the app? And if it does, it'll
automatically add that workspace to it. So you don't have to remember the name of all the things
that you've joined. You can just click some links and have everything set up. Do it that way. It's
so much better. And because I have like three or four email addresses that are registered to these,
I can just do it a few times, have it send me the things, and then I can pop open all the ones.
Why don't you just have these in your password manager so you can just go to Slack?
I do, but here's the problem.
So in my password manager, you know, if it's all with my main email address and I have
like freaking 20 entries with my email address.
No, no, no, no.
You're searching your password manager wrong.
I'm not searching my password manager at all.
I type in my email address and I get an email sent to me and I just click the links.
It's easier.
So here's the thing.
I don't want to have to remember the names of all the sites that I'm on, right?
Like codingblocks.com and all that.
So it's just easier.
I prefer easy over anything nowadays.
Just about.
I was just offering an alternative, though, because if you were to just go into a new tab and in your password
manager,
you can search it and you can just type in
slack.com and it would pull up all the different
slacks that you're a part of
and then click that and it would automatically
take you to it. Yeah, I want to do that.
Because I don't want to have to receive an email.
I want to get an email, man. I want to click a link.
That's all I want to do. I want to click a link
and it'll tell me the five that I'm joining.
Hey, launch.
Done, done, done.
So, yes, I like mine.
Hey, you know how you were talking, was it last episode or maybe it was the episode back, I think, we were talking about like the Slack shortcuts.
Oh, yeah.
The shortcut keys.
Yeah. And one that I, like, stumbled on, where your response was like, what magic did I just do? It was related to the console tip in the last episode, I think it was. Yeah. You know, when I was in Slack and I did a Ctrl-T, or if you're on a Mac, Cmd-T... Is that the threads? No, it brings up a search. It'll search across comments and more, and it'll just search against all of it.
Oh, you were going to try and open a new tab somewhere.
Yeah.
Yeah, I didn't realize I was in Slack.
And so I was like, you know, let me open up a new tab.
And then Slack came up to the screen, and I'm like, whoa, what is this?
What am I looking at?
The accidental things are like, Oh, that's amazing.
Yeah.
And then I felt bad cause I was like,
Oh,
we talked about all the slack shortcuts and I didn't realize that was one.
Um, okay. So, for my tip of the week... okay, you know, I talked about back in... what episode was that? One twelve. I was like, oh, hey, here's this cool thing that I had just found. Remember how we talked about Svelte just now, and I was like, oh, you ever feel like you're dumb in technology? Okay. So I was like, hey, there's this cool thing with Microsoft, in .NET, where you could use the CodeDOM to, you know, create and manage code.
And everybody jumped on me and was like,
Outlaw, what are you doing, man?
That's like so 2005.
2005 called.
They want their compiler back.
It's all about Roslyn these days.
So fine, use Roslyn.
So I'll include a link, but yeah,
I've been working in some pretty cool stuff here lately where it's like, if you're familiar with like,
the closest thing that I can think of it as
is like eval kind of statements,
like in Perl, for example,
where you could like just take in some arbitrary string
and then like execute it.
And so that's the type of thing
I've been working on here lately.
And it's all been using like Roslyn APIs.
But, you know, doing that obviously in a C sharp kind of world, right.
And where you can take in some string that you know is code.
So you could compile it and then load that assembly,
all of this happening in memory
and so it's pretty cool. So I'll include a link to Roslyn. So yeah, I guess my tip of the week would be to use Roslyn and not listen to my comment from episode 112 about CodeDOM, because I might have been code dumb. Oh, terrible.
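Roslyn itself is C#-only, so purely as a loose analogue in the Kotlin used for the other sketches here: the standard JSR-223 scripting API can compile and evaluate a string of code at runtime, assuming the kotlin-scripting-jsr223 dependency is on the classpath. This is not Roslyn and not what was actually built, it's just the same "eval a string of code in memory" idea:

```kotlin
import javax.script.ScriptEngineManager

fun main() {
    // Requires org.jetbrains.kotlin:kotlin-scripting-jsr223 on the classpath;
    // without it, getEngineByExtension("kts") returns null.
    val engine = ScriptEngineManager().getEngineByExtension("kts")
        ?: error("Kotlin scripting engine not found on the classpath")

    // Arbitrary code arrives as a string and is compiled and run in-process.
    val source = "listOf(1, 2, 3).sumOf { it * it }"
    val result = engine.eval(source)

    println("eval(\"$source\") -> $result") // -> 14
}
```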
So, it's that time of year. Terrible. We got the Charles Barkley terrible. Everyone's doing advent calendars, and one I talked about doing last year was Advent of Code. I didn't make it all the way through last year, and I think it's pretty tough to make it all the way through realistically. But the cool thing about this site is that a new problem gets released every night at midnight Eastern. It may be different in your time zone, I'm not sure how that works exactly, but a new problem comes out at midnight my time, which is, you know, sleep time. But that's okay, because I can do it at my leisure. And unless I'm competing for points with people who do these problems in like two seconds and are just way above my level, I have no chance anyway, so it doesn't matter. And so you can go back to even previous years and do problems. So if you're having a problem with, let's say, the third problem this year, you could actually just go back to previous years and try doing some of the earlier problems and kind of practice up on some of these problems. I think it's just a cool way of doing things. They are hard, though, if you're
not used to doing these types of problems, because, as I've kind of talked about, solving programming challenges is almost like its own skill that's kind of different from day-to-day working. But there's a bunch of them out there. I think this is the third or fourth year, and each one has 25 problems and two parts to each, so that's a lot of potential problems out there. They're pretty cool and innovative, and they help you save Christmas.
So you should go to adventofcode.com
and check it out.
So what if you just wait till Christmas Day
to sit down and do it?
There's not many people in the world
who could do all the problems in one day.
I would like to see that.
I might actually try this. I didn't do it... like, do it. The problems kind of ebb and flow. I don't know that they necessarily get harder every day; they kind of tend to give you an easier day on days that follow a harder one. But there's some really cool ones. I've done some interesting data structures. There's like a circular... you can implement it however you want, but I ended up doing like a circular linked list and some other cool stuff, some other cool trees, last year. And this year's going okay. It's never quite as good as I want, because I'll get hung up on something stupid, like spend a half hour on something that should have taken zero seconds. But that's what it's like to code, right? That's what happens when you try and scale it.
Yeah, the other... yesterday I did like a sweet little Kotlin trick where I did a lazy initializer. It was setting a value lazily whenever you asked for it, and so that way I was able to save myself some time by not calculating values unless you actually needed them. So I made this cool little premature optimization, and lo and behold, I ended up making some modifications to the object, not realizing that, of course, lazy is only evaluated once. It's just evaluated lazily. So it wasn't returning values for my changes to the object.
That's what I get for not being immutable anyway.
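For anyone who hasn't hit that one, a minimal sketch of the gotcha being described: by lazy computes the value the first time it's read and caches it, so later changes to the underlying state never show up:

```kotlin
class Elf(var strength: Int) {
    // Computed the first time it's read, then cached forever after.
    val powerLevel: Int by lazy { strength * 10 }
}

fun main() {
    val elf = Elf(strength = 3)
    println(elf.powerLevel) // 30 -- the lazy block runs here

    elf.strength = 9        // mutate the object afterwards...
    println(elf.powerLevel) // still 30 -- lazy only ever evaluates once
}
```

A plain computed property, val powerLevel get() = strength * 10, or just keeping the object immutable, avoids the surprise.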
But anyway, it was just kind of a fun little thing that, you know,
you do run into problems like that at work or whatever,
but I don't really do too many circular link lists or BFS trees or whatever at work on a normal day.
So it's fun.
Very cool.
All right.
Well, just to kind of sum things up real quickly,
scalability is a term we use to describe a system's ability to cope with increased load.
And we study it and look at it because load is frequently a cause of reliability
problems and outages.
Load parameters are how we measure our system's load and the performance numbers are measures
of how well our system is doing.
And we can judge systems basically on how well they scale related to kind of the balance
of load parameters, performance numbers, and resource costs we talked about.
And don't forget to leave a comment if you want to win this book,
because it's awesome.
Oh,
I was totally paying attention.
Um,
yeah,
well,
I was actually,
I thought that Joe was going to say this part,
so I was kind of waiting on him.
And then I was going to say to send your feedback, questions, and rants to Slack.
And to be sure to follow us on Twitter at CodingBlocks or head over to some website, CodingBlocks.net.
Now we're all jacked.
See, we're all blocked up.
That's what happens.
And then you'll find some links there at the top of the page.
Head-of-line blocking is what just occurred.
That was the latency, and here's the response times.
Yeah.
I think you covered it all.
Yeah, we got it.
It was good.
It was good.
Listen to the end of last episode if you want to hear what we normally say.
Okay.
Good call.
No, we got to say it.
What?
No.
No, no. Fine. Yeah, we're done. No, we got to say it. What? No. No, no, no.
Fine.
Yeah, we're done.
No, we're not.
Subscribe to us on iTunes, Spotify, Stitcher, or more using your favorite podcast app.
Be sure to leave us a review at www.codingblocks.net/review.
Yeah, while you're up there,
look at our show notes, examples, discussions, and more.
And to interview our Christian, write us a slide.
We can follow us on Twitter, iTunes, or head over to CodeBoss.
And we can find all our social links at the top of the page.
You didn't really say any words there, man.
I don't even know what just happened.
It was a blur.
Yeah, it was.