Coding Blocks - Designing Data-Intensive Applications – Partitioning

Episode Date: November 8, 2021

We crack open our favorite book again, Designing Data-Intensive Applications by Martin Kleppmann, while Joe sounds different, Michael comes to a sad realization, and Allen also engages "no take backs"....

Transcript
Starting point is 00:00:00 You're listening to Coding Blocks, episode 171. Subscribe to us on iTunes, Spotify, Stitcher, and more using your favorite podcast apps. And if you can leave us a review, we would greatly appreciate it. And visit us at codingblocks.net where you can find show notes, examples, discussion, and more. Send your feedback, questions, and rants to comments at codingblocks.net, or hit me up on Slack at Joe. Follow us on Twitter at CodingBlocks, or head to www.codingblocks.net and find all our social
Starting point is 00:00:36 links there at the top of the page. With that, I am the real Alan Underwood. And I'm Jay to the Z, LMNOP. And I'm Michael Outlaw. Right, yeah. That's good. This episode is sponsored by Datadog, the cloud-scale monitoring and analytics platform for ensuring the health and performance of your databases.
Starting point is 00:01:04 And Linode, simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. All right. So in this particular episode, we're going back to, I think, what is probably our favorite book, collectively speaking, which is Designing Data-Intensive Applications. And this time we're talking about partitioning. Now, we have a sit-in for Jay-Z today. How are you doing?
Starting point is 00:01:33 We have a special guest. Joe's had a bit of a toothache. So if you've ever had one, you know how those can kind of tear you down. So he's absent. Good call. Good call. Good call. All right. Hey, and so as we like to do.
Starting point is 00:01:51 I can't believe that you would, like, you spoiled it. Like maybe nobody would have noticed. Maybe everybody would have thought my impersonation was spot on and it would have went the entire time. And then afterwards, somebody would have been like, wait a minute, Joe wasn't there. Yeah, I do feel a little bad about that, you know. But I am curious, though, because I had to miss an episode recording a few episodes back, and the show was uncharacteristically short while I was gone. And so now I was like,
Starting point is 00:02:28 Oh, am I the reason why the show is that long? So now the question is going to be, Alan, like, is the show normal length now that I'm back but Jay-Z isn't here? Or is it also uncharacteristically short because, you know, a third of us are missing? Yeah, I don't know.
Starting point is 00:02:44 It might've just been a shorter topic. I don't know. We'll find out tonight. If this goes an hour and a half, we'll know. I don't know. Well, I think the last one was much shorter than that though, wasn't it? It's like an hour, I think. Yeah. Right. Yeah, I don't know. I felt bad. I wouldn't feel bad. Hey, so as we like to do, we like to thank those who took the time to leave us a review. And seeing as how Outlaw is the chosen speaker of names, he's gonna go ahead and do this one. Okay, so from iTunes, we have Woham321, and thank you. Yes, very much a thank you. And also, we didn't put it in the notes here,
Starting point is 00:03:33 but because we were talking about designing data-intensive applications, we always like to do this. If you want to leave a comment on this particular show, which will be at www.codingblocks.net slash episode 171. Or if you're in a good podcast player, you can probably just click the link there from the show notes in that and go straight to it. Drop us a comment and you'll be entered for a chance to win a copy of this,
Starting point is 00:03:58 our favorite book. Yep. And like Alan said, we're going to be talking about partitioning tonight. So if you're following along in the book with us, this would be page 199 in the book. But if you're on a Kindle, it's like 6% through. So position 11,000,128. Yeah. Out of like 8 billion. Yeah. You got a bit to go, but you're getting there. You're progressing.
Starting point is 00:04:29 Right. Hey, so this partitioning thing, man, this is actually a really good chapter because, I mean, we've been dealing with a lot of this ourselves in our work days. And so this one kind of hits real close to home on some of the things that we've had to deal with, dealing with a lot of data. So first, I think we should talk about what partitioning is and what it is also known by, which is a little irritating, maybe. Isn't that true of all things technology, though? Everybody has to put their little spin on it to make it their own. Yeah, I would agree with that. This one, I feel like for such a fundamental thing
Starting point is 00:05:07 on what it does though, like throwing so many different terms out for the same thing is really a shame, because it is confusing. So like a couple of these: it's called a shard in MongoDB or Elasticsearch or SolrCloud. It's called a region in HBase, a tablet in Bigtable. I mean,
Starting point is 00:05:28 okay, a vnode in Cassandra or Riak. Is that Riak, or Riok? Riak. I thought it was Riak, but okay, that's what I always said. And then a vBucket in Couchbase. They could have all just said shard or partition. Yeah. I mean, at least I'm more willing to accept shard, just because I guess I'm more familiar with it, and so that's why I'm more forgiving of that one. But some of these other ones, like a vBucket and a vnode, I'm like, come on.
Starting point is 00:06:01 Or a tablet? Yeah. Like, come on. Well, I thought that one, they were just being cute because it was Bigtable. So, like, well, who's really right? Right. Yeah. I just, whatever, it's partitions. So, but then, did you notice, though, there's no relational databases mentioned? But yet you can do these same types of concepts in like a Postgres, right? Yeah. I mean, you can do it in Postgres. You can do it in SQL Server. I would imagine Oracle and all them have partitioning as well, right?
Starting point is 00:06:31 Yeah. You can partition is what I was trying to get at. Yeah. Cool. So what is it? What is a partition? You want to take a stab at this one? Reading the Webster's Dictionary or just how I would define it? I would take a simple approach of just saying it's a way of splitting a data set into smaller chunks.
Starting point is 00:06:59 Again, this is not like a Webster's Dictionary version. I don't know if you've gathered that already. But yeah, if I had to think about it, like if I was trying to describe this to, you know, a five-year-old, right: if you already had a table with a truckload of data in it, maybe you would want to partition that data off by, well, like if it was enterprise data, right, maybe you'd want to partition it by customer. And so each customer, technically it's one table,
Starting point is 00:07:30 but it's basically each customer has their own mini table of data in it, would be one way to think about it. I'll give you not the Webster's definition, but kind of what they said in this book, and it's very similar to what he had right there. The one thing they wanted you to keep in mind, though, is when we talked about replication in the past, that was making a copy of data,
Starting point is 00:07:52 right? Like, so if you, let's say you had a Postgres database or something, you make a copy of that and you put it on another drive somewhere. That's not what it is. Partitioning is spreading the data out over multiple storage sections. Doesn't necessarily have to be different drives.
Starting point is 00:08:08 Like what Outlaw said a second ago, it can be chunks. But you do it either because the data won't all fit on a single storage mechanism, or because you need it to be faster in some way. And where this can tie into, like, get confused with replication, though, is that in the case of replication, you're putting an exact copy on another system, right?
Starting point is 00:08:31 In the case of partitioning, it's not an exact copy because it would be unique data, but it could be on a separate system. But let's not mix up the fact that partitioning is, it's not like it's mutually exclusive from replication. You can totally replicate those partitions, but the reason you partition the data is different than the reason you replicate the data. Yeah, I would say you replicate the data because you want redundancy. You want resiliency in the case of an outage. You partition the data because you want to increase your IO to the data. Either increase the IO or because you straight up can't store it on a single drive because it's just too big. That's a great point. Yeah. So typically the records, they do talk about the partitions and they say that, hey, you usually have these records. They are stored on exactly one partition. So what they mean by that is if you have a record, a row, a document, depending on your data storage, right, whether it's a NoSQL database or a relational database or something, you aren't going to split a record or a document across partitions.
Starting point is 00:09:45 You're going to put the whole document in one partition. And just like Outlaw said a minute ago, each partition is basically its own little mini database. I like to think of it as its own mini table. Yeah, you could probably think about it like that. I think the reason why they said database in the book is because it handles a lot of operations, like indexing and things like that behind the scenes, right? But I mean, you're typically like within a quote database, you can partition multiple things in that thing. And you still access it as one thing and you don't necessarily know or care how those reads or writes are being
Starting point is 00:10:31 distributed. You just do the one query and get the results back. And so that's why in my mind it's kind of like I think of it as more as mini tables. Because I really like that enterprise example because then if you were to think about like if your application served Fortune 500 companies and so they might have a truckload of data and you were to partition that by each of those companies, right? Then you could still access that one thing like select star from users or whatever. And,
Starting point is 00:11:10 and read just your tenant's, your customer's data,
Starting point is 00:11:19 without, you know, it's still the same database connection, and it's just like a portion of the table. Right. So we've already touched on it a little bit here, but why partition? Well, one thing is scalability, right?
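Before getting into scalability, the partition-by-customer idea from that enterprise example can be sketched in a few lines of Python. This is just an illustration of the concept under discussion, not any particular database's implementation, and the customer and user names are invented:

```python
from collections import defaultdict

class PartitionedTable:
    """One logical table, physically split into a mini table per customer."""

    def __init__(self):
        # customer -> list of rows; each partition is its own mini table
        self.partitions = defaultdict(list)

    def insert(self, row):
        # A row lives in exactly one partition, chosen by its customer.
        self.partitions[row["customer"]].append(row)

    def select_for(self, customer):
        # Reading one tenant's data only touches that tenant's partition.
        return self.partitions[customer]

table = PartitionedTable()
table.insert({"customer": "acme", "user": "alice"})
table.insert({"customer": "globex", "user": "bob"})
table.insert({"customer": "acme", "user": "carol"})
```

From the application's point of view it's still just the one table; the split by customer is an internal detail, which is the "mini table" idea above.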
Starting point is 00:11:42 So they say different partitions can be put on completely separate nodes. And when we're talking about nodes here, we're actually talking about compute instances, right? So, I mean, if you're coming from a Kubernetes world, you're going to have nodes in Kubernetes, and those are basically your VMs or the machines that different pods can run on, right? So your pods are your little compute instances, and your node is the actual VM or the overall machine
Starting point is 00:12:08 that those pods can run on. And so they want to call it out that the data can actually live on different machines. Yeah, and you know, one of the things we were talking about, like the examples here that didn't make this list when you were talking about shard and region and all that, was Kafka. But I guess it's because, you know, it's just partitioning in Kafka; it doesn't have some kind of special name. But like a lot of this conversation, in my mind I'm thinking Kafka as we go through a lot of this. Well, Kafka is really interesting because their partitioning, as far as I know, you might have done a little bit more deep diving on this than I have.
Starting point is 00:12:49 But I think in Kafka, partitioning is literally just different folders on the same disk. And they do that on purpose so they don't spread it out across different nodes, I don't believe, because they want to ensure incredibly fast writes and reads. No, your partitions can totally be spread across different nodes, because then you have in-sync replicas for that, for a given partition within a topic. But the replicas can be spread across nodes, but I thought all the writes were on the same disk. I thought all the partitions were on the same disk.
Starting point is 00:13:29 Let's not care about what the underlying structure is for the broker. If you had three brokers and you have a Kafka topic of, I don't know, orders, then whatever your partitioning scheme is, because with Kafka, the partition is going to be dependent on your keying for it. And that keying is going to determine
Starting point is 00:13:56 like which partition it's going to go on to. And you can have those partitions spread across to where, like, you know, if you had three partitions for that topic, each one of those brokers could be the leader for that particular one. And then you could still have a replication factor as well. So maybe you have like three partitions and a replication factor of two, meaning that for any one partition, there's two copies of it, a primary and a secondary, right? Right. That way your reads could be served by either, but the writes,
Starting point is 00:14:32 whoever's the in-sync replica for that given partition, once it's decided, hey, based on the keying mechanism, this is where that data should go, then that leader for that partition would take the write and then distribute that out to the read replicas. Okay. So getting back into this where they're talking about separate nodes, what they're saying is the reason you do this with the separate nodes is you can take extremely large data sets and you spread them out across a number of disks. And then that also means that your queries can be distributed across many processors, is what they called it.
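The deterministic keying described above can be sketched like so. One caveat: Kafka's actual default partitioner hashes the key bytes with murmur2, so the MD5 here is just a stand-in to keep the sketch self-contained, and the topic and key names are made up:

```python
import hashlib

NUM_PARTITIONS = 3  # e.g. an "orders" topic spread over three brokers

def partition_for(key: str) -> int:
    # Deterministic: the same key always hashes to the same partition,
    # so a reader knows exactly which partition holds a given key's data.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every write for one key lands on one partition...
assert partition_for("customer-42") == partition_for("customer-42")

# ...while many different keys spread out across all three partitions.
targets = {partition_for(f"customer-{i}") for i in range(100)}
```

That deterministic mapping is exactly why the key can't be random: the read side has to be able to recompute it to know where to look.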
Starting point is 00:15:17 Right. So when you do this, and I think, by the way, this goes back to why they called the partitions little mini databases: it's because the queries are actually issued to the node that has the partition. And so that thing is processing the query on that particular partition, which is pretty cool when you think about it. And then, I mean, sorry to interrupt, but just to elaborate on this scale capability for a moment: you know, let's face it, the types of databases a lot of us are going to work on, we probably might not be working on a database at that type of scale necessarily. But, you know, if you think about like a Facebook, or a LinkedIn comes to mind since they created Kafka, or that's where
Starting point is 00:16:06 Kafka got its roots, you know, or like a Google or YouTube kind of scale, where you're trying to do some kind of a search across something that large, right? Then, you know, you can't possibly have one drive system, and it definitely wouldn't be a single drive. But even if you had an array of drives, you're not going to be able to have it in one data center and still meet all your networking needs and latency needs, and then replications. It wouldn't happen. So you'd have to spread that across. And then, like you said, because the physical disk is hosted by a separate machine, then you get to spread the I/O and the CPU load of performing that query across multiple machines. So it's really a super cool concept when you think about it, but also really complex to execute. So like the people who put this stuff together, I mean, hats off to them, man. Cause this is some deep
Starting point is 00:17:14 computer science stuff, you know? Yeah. It's not easy. And I mean, right. No, totally not. And to call it out explicitly: so if you needed faster processing, right, like you're issuing these queries, then you might spread your data out across more partitions, meaning it could go across more nodes, right? And a good example of this, we did an episode on Elasticsearch a while back, but the way Elasticsearch works, and I may be butchering this a little bit in terms of terminology, but the way it works generally is you issue a query and it'll go to something like a master node. And then you have a bunch of different data nodes. Let's say you have
Starting point is 00:17:55 10 of them. And let's say that you issue a query saying, I want the top 100 results back sorted by last name, right? What that master node is going to do is issue that same query to all 10 of the other data nodes out there. They're each going to grab the top 100 records ordered by last name, and then it's going to bring together those 100 times 10, so those thousand results, back, and then sort those at the master node to get the true top 100 ordered by name. So you just issued that query basically 11 times, right? You did it across all the data nodes, and you got it back to the master node, put all those things together, and then did that query again to sort it and get back your other one. So this is a good example of, and it's cool if you think about it, that will speed up your queries on those separate ones, right? Because they've got less data to work with,
Starting point is 00:18:58 but you might think, okay, well, crap, I'll just make a hundred of these nodes, right? But then think about that. Now you're having to issue that query a hundred times, 101 times really, bring all that data back to the master node, sort that there. So there's a trade-off at some point, right? Like you hit this inflection point to where, yeah, you get more processing power, but now you're spending a lot more time putting results back together. You have network traffic, all that.
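That scatter-gather flow can be sketched roughly like this. It's the general pattern rather than Elasticsearch's actual code, and the ten nodes and fake last names are invented for the example:

```python
import heapq
import random

random.seed(7)
# Ten "data nodes", each holding its own partition of last names.
nodes = [[f"name{random.randint(0, 99999):05d}" for _ in range(1000)]
         for _ in range(10)]

def node_top_n(names, n):
    # Each data node answers the query against only its own partition.
    return heapq.nsmallest(n, names)

def scatter_gather(nodes, n=100):
    # The "master" fans the query out, gets 10 x 100 partial results
    # back, then re-sorts the combined 1,000 rows for the true top 100.
    partials = [node_top_n(node, n) for node in nodes]
    merged = [name for partial in partials for name in partial]
    return heapq.nsmallest(n, merged)

result = scatter_gather(nodes)
```

The correctness hinge is that anything in the global top 100 must already be in its own node's top 100, so the master never misses a result by only seeing the partials.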
Starting point is 00:19:27 It's not an easy problem. Yeah. And it really, it also depends on the system too. So going back to the Kafka example, right? In my example, I gave you three brokers. You actually can't try to create a topic that has more partitions than you have brokers, cause Kafka is like, well, where am I supposed to put this? Because in the Kafka world, you know, when you set that up, it knows that you're supposed to be spreading
Starting point is 00:19:56 that across multiple brokers. In like a Postgres world, though, it doesn't care, right? Like you could just have the one Postgres server and partition that table or shard it however many times you wanted, based off of whatever your structure was. Postgres isn't going to care. Wait, hold up.
Starting point is 00:20:16 You just said you can't have more partitions than brokers. You can. Like, you could... Oh, I'm sorry. I'm thinking of the replication factor. I'm sorry. Yeah. The replication factor. Yeah. You can't, don't mix those two up. Yep. Sorry. Oh, whoa. Thank you for catching that. The replication factor can't exceed the number of brokers, which
Starting point is 00:20:37 makes sense, because how could you say, hey, replicate this thing across four brokers, but I only have three in my environment? But even going back to the constraints that you were talking about with how you would do those partitions: in that example that I gave, I purposely had a lower replication factor than the partition count. Because in like a Kafka world, for example, if you were to set those things equal, then you really don't get the economies of scale that you want, right? Because what that would mean is that for every write operation, you're writing to every node, and you don't want to do that. You really wanted to distribute it out. So, going back to your point on the flip side,
Starting point is 00:21:25 where it was on the query side, right? Like, you want to have, I'm trying to figure out a way to say it, like, some of the burden be taken up by some of the nodes, but the other nodes available to take on other requests. Right. Yeah. And it's interesting. I mean, seeing as how we're talking about Kafka, like one of the things, and especially if you're dealing with big data,
Starting point is 00:21:52 you'll probably be looking at Kafka at some point. But one of the things you have to balance out there too is you create a number of partitions so that you can spread out the writes and you get more performance there, but it also allows you to process that data in parallel, right? So most stream processing technologies out there can work on a partition at a time. So a stream application could work on more than one partition, but you can't have more stream applications working on the data than there are partitions.
Starting point is 00:22:33 So for instance, if you were to split up your data into 10 partitions, the most you could parallelize those workloads is having 10 at a time run. And it's because all the data, like Outlaw said a little while ago, the data is typically stored in a partition based off a keying strategy. So all the data for a particular customer would exist in one partition. So if you were to split that data up across multiple partitions, then you're going to have a problem trying to aggregate data for that customer and that kind of stuff, because you won't have it all going to the same workers.
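That cap on parallelism can be sketched with a simple round-robin assignment of partitions to stream workers. This mirrors the idea being described, not any specific consumer-group protocol:

```python
def assign_partitions(num_partitions, num_workers):
    # Each partition goes to exactly one worker (a worker can own many),
    # so workers beyond the partition count have nothing to do. That's
    # why parallelism tops out at the number of partitions.
    assignment = {w: [] for w in range(num_workers)}
    for p in range(num_partitions):
        assignment[p % num_workers].append(p)
    return assignment

# 10 partitions, 4 workers: every worker stays busy.
busy = assign_partitions(10, 4)

# 10 partitions, 12 workers: two workers just sit idle.
capped = assign_partitions(10, 12)
```

And because one partition never goes to two workers, all of a given key's records, say one customer's data, flow through a single worker, which is what makes per-customer aggregation workable.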
Starting point is 00:22:56 Now, here's something to think about too. When you get into this type of world where you want to partition your data and everything, it's typically because you have larger data sets that you're dealing with, you know, for one reason or another. It might not be at the size of Facebook or Google or LinkedIn or whatever, but you still have large sets of data, and you want your users or customers or whatever to be able to, you know, not necessarily be impacted by
Starting point is 00:23:25 another, you know, user or customer. But you need to know your data well enough to know, well, how many partitions might I expect to have? Because one thing that isn't often thought about, or I don't really hear talked about often: depending on that system, because it's going to, more often than not, put a new file on some disk, be it on the same node or on a different node, you're adding more files to the system. And by default, let's say you're in like a Linux type of environment, there's a max inode count, like there's a max file count that you can have. And you can go in and you can increase that
Starting point is 00:24:19 if you know that you need to. But that's where I'm saying you have to have an idea. It's like, okay, well, because this is going to be my partitioning strategy, then I'm expecting that these will be the total number of partitions that I might have. Because here's where it can totally bite you in a Kafka world. And I'm sorry to keep picking on Kafka, but you know, we love it. Because for any partition, there's actually two files that get written in a Kafka world. And I don't know about the underlyings, you know, the underpinnings of like a Postgres or something like that. But I got to imagine that there
Starting point is 00:24:56 might, the point being is that there might be more than one file that's getting written. And so there's like an amplification kind of factor there to worry about. I'm not sure how Postgres deals with it specifically, but I was kind of thinking in my mind, like, oh, I wonder how it deals with the write-ahead logs for that type of situation. But I know specifically for Kafka, it writes out two files per partition in a directory. But I think that's mitigated. Like what you're talking about, the inodes, we've run into that before in some stuff that we've done. But I think in Kafka, and maybe even in Postgres and a lot of these other ones, it's writing to single big files, right? So it might write out two files per partition, but it's not like it's writing out a new file per data piece that comes in; that gets appended to another file set.
Starting point is 00:25:44 Well, if you were creating partitions on the fly, yeah, then you get into trouble there, you know, then,
Starting point is 00:25:52 then based off of your keying mechanism, you could get bit there. But where you could also get bit is, not only is there the limit on, you know, those files, like literally the inode count
Starting point is 00:26:06 there, which, it is configurable in Linux, but in the case of like a Kafka, it has a file handle open for each one of those files. So you're reserving, even though it might be a small amount of memory, you have memory in use just to keep that file handle open, because Kafka is going to always keep those file handles open so that it can constantly read and write to them. Yep. So, to get back around to this whole thing, this notion of splitting this data up across different nodes because it's larger, some examples of these are NoSQL databases and like Hadoop data warehouses,
Starting point is 00:26:45 right? So NoSQL databases could be like Mongo, or, you know, that's probably the most popular one out there. And then the Hadoop stuff, a lot of times you'll see that they called out data warehouses, but even data lakes, right? Like if,
Starting point is 00:26:57 if you're using some sort of technology like that. And they also called out that these types of data partitions are typically set up to service either analytical workloads or transactional. You're typically not going to do both with one, because you're going to store data differently in those situations, right? And we've talked about it in the past, like relational databases are usually the hammer and everything's the nail. They'll use the hammer for everything. Yeah, because you'll look at it and you're like, well, I have the data here. I can just run some reports off my transactional database.
Starting point is 00:27:38 And you can, but it's not going to be the most efficient, and you're probably going to end up hurting your transactional throughput if you do that. So it's like the difference that we've discussed between like OLTP versus OLAP type databases, right? Like columnar versus relational type storage mechanisms, or row-based, I guess. And so we mentioned it earlier, like when you're doing this partitioning, each record belongs to a single partition, but you can still replicate that stuff, right? For fault tolerance or even for speed. Like when we talked about Postgres in the past, right? Like you have your read replicas out there, and that's perfectly fine. So it can be used for both fault tolerance as well as speed. I mean, if you think about it, like just, you know, put your creativity hat on for a moment. Like if all,
Starting point is 00:28:29 if you were able to have three replicas of your given partition, right. And each one of those, obviously, you know, by replica, I mean, it's on a physically different machine.
Starting point is 00:28:42 Right. And maybe in front of those three nodes, there's some kind of a connection manager, right? But if all your writes are able to go to whoever's serving as the primary, and you could traffic all your reads to the other two, then right away your reads get the benefit of being able to be like potentially twice as fast, right? Because now,
Starting point is 00:29:07 you know, depending on what the kind of mechanism is that we're talking about, you know, maybe it's like an Elasticsearch, where it can aggregate the search and combine them all together. Or maybe it's just some other kind of strategy, like, you know, particular customers might go to an individually different particular node. But the point being is that in that type of situation, this is where the scalability of this whole conversation comes in, right? Because now you can have one thing service the writes and potentially others service the reads. Yeah, so you don't have contention for that same stuff. So this is, Outlaw,
Starting point is 00:29:51 I actually alluded to this with Kafka earlier. So a single node can store more than one partition, right? So if you have 10 partitions on your data, all 10 could be on that same node. Five of them could be on there, three of them, whatever. It doesn't have to be a fixed number. But this is what's interesting right here: nodes can be leaders for partitions of some of the data and followers on partitions of others. So I think you were mentioning that earlier with Kafka, right? Like maybe broker one is the leader for partition one, right? And then broker two might be the leader for partition two, which would make broker one a follower for partition two,
Starting point is 00:30:31 right? So they can sort of alternate who is the leader for doing the writes when they're getting this data. Yeah. Yeah. Which is, which now,
Starting point is 00:30:43 now going back to that creativity hat situation: now you're distributing your writes. So not only can you distribute your reads, but you can distribute the writes now, because you can have different nodes acting as the leader for different partitions, which is a super cool concept. Yeah, you're kind of unbottling your bottleneck, which used to be how fast your disk could do your reads and writes. But if you're writing to 10 different disks, now you've reduced that bottleneck, right? And potentially you could add even more if you needed to. And this is where they did call out that partitioning is independent of replication in terms of what they are. And they said even the implementation, right, like different implementations of replication can be used with your partitioning. So more or less, they kind of stand on their own. And I did call out, so I don't typically put in the notes like a particular diagram or anything. But this one was interesting, and it's kind of a busy diagram. But if you have the book, look at figure 6-1, and it shows this whole
Starting point is 00:31:53 leader follower scheme to where it's partitioned across multiple nodes. And so you can kind of see, you know, hey, this one's a leader over here, but it's following this. And it is kind of a spaghetti diagram, but it'll help you understand what we're talking about here. It's basically like the same thing that you just described with the two nodes in the two partitions, except they've blown it out to four nodes and three partitions. Lots of lines. Yeah, so you have a lot of lines there.
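The alternating leader/follower layout in that diagram, and in the broker-one-leads-partition-one exchange above, can be sketched as a round-robin placement. This is a toy illustration, not Kafka's actual assignment logic, and the broker names are made up:

```python
def place_partitions(partitions, nodes, replication_factor=2):
    # Node i leads partition i (mod node count); the next node over
    # follows it. Every node ends up leading some partitions and
    # following others, so the writes get spread around too.
    placement = {}
    for p in range(partitions):
        leader = nodes[p % len(nodes)]
        followers = [nodes[(p + i) % len(nodes)]
                     for i in range(1, replication_factor)]
        placement[p] = {"leader": leader, "followers": followers}
    return placement

layout = place_partitions(3, ["broker-1", "broker-2", "broker-3"])
```

In this layout, broker-1 leads partition 0 but follows partition 2, the same alternation the conversation walked through with two brokers.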
Starting point is 00:32:23 Yep. And so now here's what's interesting. So we've talked about what partitioning is. We talked about what it's supposed to try and solve. But here is the key part, and this is sort of what throws the wrench in it because it all sounds amazing what we've been saying here, right? Like you have these partitions. It's faster.
Starting point is 00:32:40 You can grow. But the goal in partitioning is to try and spread the data around as evenly as possible. Because if it's not spread around evenly, it's called skewed, and skew causes problems. Yeah. This is where, like, so far in this conversation, I'd mentioned, like, well, maybe you would partition by your customer, like in the Fortune 500 example. But that might not be the strategy you want to use, because let's say in that example, not all of your Fortune 500s are created equally. So, you know, the Fortune 500 company that ranks number one on that list might be a lot busier than the company that ranks 500 on that list. Or if it were just the list of the employees that were coming in for a given, um, company in that scenario, then obviously the company that has the most employees is going to get,
Starting point is 00:33:51 is going to be skewed to them, which they actually call a hotspot in the partitioning strategy, because you're going to end up reading and writing to that partition more than you would one of the other ones. So you typically want to come up with a keying strategy. Again, I'm coming at this from, like, I pretty much focused entirely on this chapter coming from a Kafka point of view, which is probably a fault. But in a Kafka world, you want that keying strategy to be, uh, deterministic, right? So, you know, you want to have a keying strategy that makes sense. It can't be random, because in order for the read to work, it has
Starting point is 00:34:40 to be able to deterministically know which partition to go to for it. But you want that, that keying strategy to have something about it that would cause it to spread over the data. So for example, I think in the book they gave an example of, let's say that your keying strategy was on date, right? Well, and let's just say it was just like you were getting all of the website, all of the web logs from Amazon or Google. And as they're streaming those logs to you, you're writing them in and you're partitioning based on date. But then what that means is that your partition would have a hotspot on whatever the given day is, because that's a hundred percent where all of your writes are going. And if all of your writes are only being served by that single node, then, you know, that one node is getting a beating for that day. And then the next
Starting point is 00:35:36 day it's another node and, you know, et cetera, et cetera. And now your, your reads might be distributed across, assuming you were to do multi-day reads in that scenario. But if you were to only do a single-day read, then again, you end up hammering just one node. And so that's why you want to try to come up with a strategy that evenly spreads that. Yeah. And to call it out, these hotspots, that's where everything, all the activity is going, right? But that, another thing to keep in mind, typically, if you're running systems like this, where you have multiple nodes, you're paying a decent chunk of money for these nodes, right? So if you got VMs spun up in a Kubernetes cluster or whatever, and you got these beefy servers there, these hotspots
Starting point is 00:36:20 make it to where you're using one node at 90% utilization and the other ones are sitting around doing nothing, right? So really the goal is to spread it out to where you're not bottlenecking anything and you're utilizing the CPU and the IO bandwidth that you have, right? So that is why you want to distribute this stuff. And so here's where they start. I love it that this book does this like, hey, well, one way to get around this is to do this. Right. And they say, hey, you can try and use putting data on random nodes. And then what I love about this book is they're like, OK, so that's great. But let me tell you the problem with that. Right. So they walk you down because you'll go, oh, yeah, that sounds good. And then I mean, that's pretty much the problem that I described a moment ago. Right. Cause then like you, you create, like, let's say if you were like,
Starting point is 00:37:09 okay, fine, I'll just create like a random, uh, you know, universally unique ID, right, a UUID, or a GUID in like a C# world, then, um, you know, okay, fine. You, you created this globally unique identifier and you use that as your keying mechanism. But now how do you deterministically recreate that based off of the data? You can't. So now when you want to read this record,
Starting point is 00:37:35 you're like, oh, let me read every partition I've got to go find it. Exactly. And that's what he's saying there is the biggest thing. And, you know, that needs to sink in a little bit. If you can't deterministically know where that data is stored, like which partition on which node, then that means you have to issue a query to every single node to see, like, hey, do you have it? So now it might be faster,
Starting point is 00:38:05 but you're actually wasting processing power because now you're making all your nodes do work when really you should be able to go straight to a node and say, Hey, give me this record. I think, I think this might've been the part of the book where they had talked about an example of like a dictionary or an encyclopedia set of books where like, if you had each letter of the alphabet as a separate book, right. Then, you know, on the surface,
Starting point is 00:38:36 you might think like, okay, well, that sounds fine, right? It's deterministic, right? I know that if the topic starts with this particular letter, I know exactly where to go and find it, you know? And, um, but the problem is that some letters are more, uh, common and more heavily used in the English alphabet than other letters are. So then you're going to have hotspots on, like, the letter T or S or whatever. So those books might be, like, super thick, but a letter like Z or, you know, Q might be a really thin book, right? And so the idea is that you want all of these books to be the same size.
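That encyclopedia skew is easy to see with a quick sketch. The word list here is made up, but the shape of the result is the point: partition by first letter and some "volumes" get fat while others stay nearly empty:

```python
from collections import Counter

words = ["tiger", "tea", "tree", "time", "table", "toast",
         "stone", "sun", "sand", "salt", "storm",
         "quartz", "zebra"]

# "Partition" by first letter, one encyclopedia volume per letter.
volume_sizes = Counter(word[0] for word in words)
print(volume_sizes.most_common())
# The T and S volumes are thick; Q and Z hold almost nothing.
# That uneven spread across partitions is exactly what skew means.
```
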
Starting point is 00:39:21 And so even in the case of that single letter, you might actually split that out into multiple books to be like, uh, TA through TH is in this book and TI, uh, through, you know, TZ is in another book, you know, whatever. Right. Yep. Um, so, so that you can distribute those loads. But then if you were to think about what you were saying before about having it randomly, if you didn't deterministically know, okay, well, this topic starts with T, I know to go to this particular book, right?
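The cost of losing that determinism is easy to sketch. Here the "cluster" is just a list of Python dicts, which is obviously a huge simplification, but it shows the read-path difference between routing straight to one node and having to ask everybody:

```python
import hashlib

NUM_NODES = 4
nodes = [dict() for _ in range(NUM_NODES)]  # toy cluster: one dict per node

def node_for(key):
    # Deterministic routing: hash the key, mod by the node count,
    # so a given key is always written to (and read from) the same node.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES

def write(key, value):
    nodes[node_for(key)][key] = value

def read_direct(key):
    # One lookup against exactly one node.
    return nodes[node_for(key)].get(key), 1

def read_scatter_gather(key):
    # With random placement you'd have no idea where the key lives,
    # so every node gets asked; a miss costs a trip to all of them.
    asked = 0
    for node in nodes:
        asked += 1
        if key in node:
            return node[key], asked
    return None, asked

write("order-123", "widget")
print(read_direct("order-123"))            # ('widget', 1)
print(read_scatter_gather("no-such-key"))  # (None, 4)
```

The second read is the "hunt through every encyclopedia" case: every node burns cycles answering a question only one of them could ever say yes to.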
Starting point is 00:39:59 If you instead just had a random GUID for that book or for that particular topic, and all of your books were now just, like, this set of GUIDs are here, or this set of GUIDs are there, and this set of GUIDs are there. And say, like, okay, find me the book for this particular Lamborghini. And you're going, like, oh, I guess I have to go hunt through every encyclopedia I have. In order to take advantage of the scalability, you need to be able to pinpoint the direction you want to go to for the read. And I beat that dead horse. I think, I think they'll understand that. This episode is sponsored by Datadog, the unified monitoring platform for increasing visibility into your PostgreSQL databases.
Starting point is 00:41:10 Create custom drag-and-drop Postgres dashboards within seconds so you can visualize highly granular data and custom metrics in real time. Datadog's 450-plus turnkey integrations make it easy to correlate metrics from your Postgres servers with other services throughout your environment. Datadog provides real-time service maps, algorithmic alerts, and end-to-end application tracing so you can monitor your systems proactively and detect issues before your customers do. And boy, does that matter, right? So, you know, when we talk about Datadog here, like, uh, you ever heard of that whole thing about, like, a single pane of glass to see all of your stats of what's going on? Datadog is the king of this when it comes to monitoring. And they've got how-to articles and documentation and blogs on, like, any topic you want. And we've been pretty much focused on, like, database-type topics here in
Starting point is 00:42:12 this particular episode, as we talk about partitioning and whatnot. They have, you know, if you wanted to do Postgres, Kafka, Cassandra, whatever it is, they have, you said it, 450-plus turnkey integrations. All of those technologies that I just said are in there. Specifically, like, okay, Postgres, just to monitor the vacuum processes, just to, you know, see how it's doing that, you know, in terms of, like, uh, you know, compaction. They have everything. Datadog is just amazing. I'm telling you, if you haven't already given their blog, if nothing else, give their blog a view and you'll be thoroughly impressed. I promise.
Starting point is 00:42:56 Yeah, and if you go out and check out their blog, then you'll probably also want to go ahead and start monitoring today with a 14-day free trial and receive a free Datadog t-shirt after creating just one dashboard. So go ahead and visit datadoghq.com slash codingblocks. Again, that's datadoghq.com slash codingblocks to learn more about how you can start monitoring your databases with Datadog. Okay. Well, you know, it's that part of the show where I like to ask you to do me a favor. Now, this is the part where it gets weird, right? Because in the past, I don't know, man, Jay-Z and Alan,
Starting point is 00:43:33 they would do stuff like, hey, listener, we would greatly appreciate it if... And it's just like, I ain't got time for that. It's not time for the show. No, no. I can't do this radio late night. You're listening to the sweet, smooth sounds of www.codingblocks.net.
Starting point is 00:43:52 No. We got a call in from Michael out there. Yeah. He would like to request that. Yeah. So if you would, we would greatly appreciate it if you would take time out of your busy day to leave us a review. We really do appreciate reading them.
Starting point is 00:44:07 Every time we read them, they put a smile on our face. It really does mean a lot. And that's your way of, like, hey, if you ever thought, like, man, I'd like to buy these guys a coffee or a beer, this would be your way of being able to virtually do that for us.
Starting point is 00:44:24 So with that, we head into my favorite portion of the show. Survey says... All right. So, uh, we kind of can't do this because, uh, Jay-Z isn't here, and it would be weird if, you know, we're the only ones, Alan. I think I would be the winner today. I'm pretty certain that I'm going to win today. And yet you're going to lose because I'm not going to do it. Well, that kind of stinks, man. I'm not going to lie. That's an extra kind of sting right
Starting point is 00:44:53 there, isn't it? That's an extra kind of hurt. Yeah. Now, because I was just thinking, we'll hold off on doing that one until Jay-Z is back with us. But what we can do is that we were, you know, thinking about like, well, what could the survey be for this particular episode? And I would hate to like not do this one since, you know, it is so specific to
Starting point is 00:45:16 the topic. So here's your survey for this episode. Have you ever had to partition your data? And your choices are: evermore, like, always; or, on occasion, it's just another tool in my toolbox; or, once, I don't want to talk about it; or, nope, does that mean my data set is small?; or, nope, not my job. I'm curious on this one. I really am interested in finding out how many people have to deal with this kind of stuff. Yeah, I mean, because it's actually kind of surprising. You know, like, I know that, like, we might be tempted to think, like, no, probably not. But we do.
Starting point is 00:46:04 So. This episode is sponsored by Linode. Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. Develop, deploy, and scale your modern applications faster and easier. Whether you're developing a personal project or managing larger workloads, you deserve simple, affordable, and accessible cloud computing solutions.
Starting point is 00:46:30 You can get started on Linode today with a $100 free credit for listeners of CodingBlocks. You can find all the details at linode.com slash codingblocks. Wait, wait, wait.
Starting point is 00:46:48 You were talking about developing your personal projects. I mean, how much easier is that if you got $100 free credit to get started with it, right? And Linode has data centers around the world with the same simple and consistent pricing regardless of the location. It's super easy to get started. Yeah, totally. And that's one of the location. It's super easy to get started. Yeah, totally. And that's one of the things like we've, we've used Linode and just their dashboards and, and the integration, the ability to go in and launch things quickly and easily get things spun up and, and turned off as quickly and easily as you want makes it to where you can get started
Starting point is 00:47:22 and running on the platform in no time. Yeah. And I mean, you, you aren't joking, because it's like, they actually have a marketplace. You can go to the marketplace and you say, you know what? I want to spin up a Postgres server, and you can just click a button and you got a Postgres server, right? In the Linode cloud, super simple. You want Splunk, or you want, heaven help you, if you wanted to build your own Jenkins server. But, you know, I don't know. Maybe GitHub Actions just doesn't do it for you.
Starting point is 00:47:59 And you're like, you know what? I deserve some pain. I mean, luxury, or what would it be? At any rate, you decided to go Jenkins, and you know, I'm not going to fault you for it. You did it. You did it. And, you know, kudos. Uh, you'll be a better, better person for it, um, I think. Yeah. Yeah. Oh yeah, definitely. You definitely will be a better person for it. So at any rate, the point is Linode has got a super simple platform, a super simple interface to work with.
Starting point is 00:48:26 They've got Kubernetes right there at your fingertips, Linux virtual machines to save you money. You can spread that $100 credit pretty far. Yeah. So go ahead and choose the data center nearest you. You'll also receive 24/7, 365-days-a-year human support with no tiers or handoffs, regardless of your plan size. So you don't have to be a huge corporate customer to get that kind of love. You can choose shared and dedicated compute instances,
Starting point is 00:48:56 or you can use your $100 in credit on S3-compatible object storage, which will go a long way, managed Kubernetes, or even more. If it runs on Linux, it runs on Linode. Visit linode.com slash codingblocks and click on the create free account button to get started.
Starting point is 00:49:29 All right. So we talked about the problems with these random keys, right? And what they cause you to do, having to go out and query everything. So one of the ways that you might be able to solve this problem, and, and Outlaw basically talked about it right before, 'cause he jumped ahead. Oh, sorry. I couldn't find the kill switch on his mic beforehand. But anyway, this whole partitioning by a key range, right? So, um, now this is interesting, because they actually go into multiple different ways about this. You can assign a contiguous range of keys on a particular partition, right? So like you said earlier, maybe you do TA through TH, right? Like, that's one way to do it. And it was that section of the book. Yeah, it was. Yeah. I mean, you were literally just ahead of it. And so he already talked about all the benefits of it, right?
Starting point is 00:50:09 Like you know exactly where to go. If you're looking for a particular Lamborghini, you know exactly which book to go to because it's sorted by the letters, right? Now, this is what's kind of interesting. So you can either have these partition boundaries be determined manually, so somebody goes in and sets it because, hey, we know our data, right? You got a set of data scientists or people that are really familiar with it and they set them. Or you can have the system actually choose it for you, which is kind of cool, honestly. So this would be like going back to the book example, like somebody had to decide, oh, hey, this book is getting a little too thick.
Starting point is 00:50:46 Let's split this off here. And T.I. starts a new book. I'm sure there's a rap joke in here somewhere. I keep searching for it. Did T.I. actually start a book? He did. I'm not sure if you were aware, but he did. Oh, that's interesting.
Starting point is 00:51:08 I did not know that. It was the Encyclopedia Britannica series that starts at TI and goes to TZ. Awesome. Well, some of these databases that actually do automatic partitioning are BigTable. That's Google's. No. Yeah, BigTable is Google.
Starting point is 00:51:24 HBase, RethinkDB, and MongoDB, they all will do that. And the partitions keep these keys sorted to allow for fast lookups. We talked about this back in an episode when we were talking about SSTables and LSM trees. And this is a way to make sure that not only can you go to the partition you're looking for the data in, but once you get there, you can get that data super fast. Now, we need to, like, just focus for a moment on this, because this part here was kind of like mind-blown, right? Because you think about creating an index for your data, right? And now we're getting super meta, because now we're going to create indexes for the index
Starting point is 00:52:08 to say like, well, we've split the index apart, you know, dynamically and decided like, you know, A and B are over here and C through D are over there, right? And we decided to split, you know, we had E, F and G together, but we split those apart into three separate ones, right? Like it's like super cool that it's happening, but totally, you know, getting meta now, like, okay, we're going to index your index.
Starting point is 00:52:34 It's like, this is like if Xzibit had to, like, design your database, you know, this is what would happen. Yo dawg, I heard you like indexes. So I'll put an index in your index. That's what happens. Hey, but I think by this point now, though, you're probably more convinced that a partition is a little mini database, right? Because it's storing its own little indexes on indexes. And, yeah, I mean, it's crazy.
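A minimal sketch of that key-range lookup, with invented volume boundaries (including a TI volume, in honor of the joke). The point is that sorted boundaries turn "which partition?" into a binary search rather than a scan of every partition:

```python
import bisect

# Each partition holds a sorted, contiguous range of keys, like
# encyclopedia volumes. The boundaries are made up for illustration.
boundaries = ["f", "s", "ti"]  # splits the key space into 4 ranges
volumes = ["A-E", "F-R", "S-TH", "TI-Z"]

def partition_for(key):
    # bisect finds which range the key falls into in O(log n),
    # so a reader goes straight to one volume instead of scanning all.
    return bisect.bisect_right(boundaries, key.lower())

for word in ["aardvark", "lamborghini", "sunflower", "tiger"]:
    print(word, "->", volumes[partition_for(word)])
```

A real system would keep this boundary list as metadata (set by hand or rebalanced automatically), but the lookup itself is this cheap.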
Starting point is 00:53:00 Okay. I mean, I will buy that, because of the overhead required to do it. But I still, in my mind, like, I'm thinking of, like... You know, can I take back the one thing? Yeah, I'm sorry, I didn't realize you were going to engage no-take-backsies. I'm done. I can't, I can't, I can't possibly undo that. That's right. You're there now. All right. So we have gotten him to admit it. So, um, so the, the other thing that was interesting. So we talked about this earlier, like, if you were to try and use timestamp as, as your partitioning strategy, it wouldn't work, right? Because you're hotspotting one thing.
Starting point is 00:53:44 Well, one of the things that they said is, okay, well, let's say that this is big in IoT, right? Like if you have sensors, like I think Outlaw, you used to work for a company that did like air conditioning, HVAC stuff, and they had sensors all over the place, right? To find out when systems were failing, to find out what the temperatures were, all that kind of stuff, right? One way that you can handle this is, okay, well, you have data coming in that's time sensitive. Just take the sensor name and prefix it to that timestamp. And then that way, you've got the sensor data on their own nodes and partitions, but you still have it ordered in a way that you can get to that data quickly. Well, I thought it would still be like, it would be
Starting point is 00:54:28 distributed a little bit more evenly, right? Did I misread this part? Because like, it was a combination of the sensor name plus the date, and so like, that one sensor might actually be split across multiple partitions. It could. It totally could. What they were saying is if the ordering of this was important to where, like, let's say that you're querying a day's worth of data at once by doing it this way, by putting the sensor name plus the timestamp on the end of it as your key, then you'll be able to run aggregates because all that data will live on a single partition
Starting point is 00:55:05 when you go to pull it. But the thing that it fixes is, let's say that you have a thousand sensors. You don't want all those thousand sensors to be on one node. So when you go to query and aggregate that stuff, it's going to destroy a single node. So this will allow you to distribute that query across multiple nodes where that sensor data would live. But what am I missing here then? Because in that scenario, you might as well only have just the sensor name. And why do you need the date at that point? Because I thought the purpose was to spread it evenly across all of the partitions.
Starting point is 00:55:45 And that's why you were combining the two. Well, that's actually what they say. The problem with this is you end up hotspotting some things as well, because you're putting a lot of that same data on a single node. So the difference is, if you were to do it just by the date time, then all data goes on a single node, right? If you do it by the sensor name plus the time, then each sensor's data is going on a single one and it keeps those timestamps together. So,
Starting point is 00:56:14 you know, it's not perfect, but it's a little bit better. I guess when I, I guess, uh, yeah, I'm probably thinking of like,
Starting point is 00:56:23 if you were to, uh, maybe create some kind of a hash out of that, but you probably wouldn't want to do that, create any kind of a hash off of a timestamp. That'd be an awful idea, but that's probably like where, where in my mind I was getting confused with thinking that like, uh, the combination of the two would spread it across because otherwise, you know, if you, if you had all of a single IoT device's data going to a single partition, then your assumption is that all of your IoT devices are going to equally submit data. And I don't know if that would always be the case. No, they wouldn't.
Starting point is 00:57:00 And that's a good point. Actually, one thing to point out is I said timestamp. It's not the whole timestamp. So you would do sensor name plus maybe year, month, day, hour, and minute, right? You probably wouldn't go all the way down to the second. So for a single sensor, for the same minute, all the data would live on one. Maybe for the next minute, it jumps to another node. So by doing it this way, you're
Starting point is 00:57:25 jumping around to multiple partitions, right? Now here's what's crazy. That might solve your write problem, right? That may actually distribute your writes out pretty well across all the different partitions for all your different sensors. What stinks, though, is now you want to go run a query for maybe one sensor. Well, maybe it doesn't live on the same node all the time. Or maybe you want to query the data for all your sensors, and now you've got to go across all nodes. There's all kinds of weird problems that come in, to where you kind of have to know your use case of how you're going to use that data after you write it, to know how you need to distribute it for later access.
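That sensor-plus-truncated-timestamp compound key can be sketched like this. The hash-mod-N partitioner and all the names are assumptions for illustration, not what any particular database does internally:

```python
from datetime import datetime, timezone
import hashlib

NUM_PARTITIONS = 4

def partition_for(key):
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def compound_key(sensor_id, ts):
    # Sensor name first, timestamp truncated to the minute: one sensor's
    # readings within a minute share a key (so they land together, which
    # helps per-sensor, per-minute aggregates), while different sensors
    # fan out across partitions instead of hotspotting "today".
    return f"{sensor_id}|{ts:%Y-%m-%d %H:%M}"

minute = datetime(2021, 11, 8, 9, 30, tzinfo=timezone.utc)

# Same sensor, same minute -> always the same partition.
assert partition_for(compound_key("sensor-7", minute)) == \
       partition_for(compound_key("sensor-7", minute))

# 100 different sensors in that same minute spread across the cluster.
used = {partition_for(compound_key(f"sensor-{i}", minute)) for i in range(100)}
print("partitions written this minute:", sorted(used))
```

And this is exactly where the trade-off above bites: a query for one sensor across many minutes now touches many partitions.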
Starting point is 00:58:15 it from the point of view of like the file handles and things like that. But, you know, knowing how you would want to read that data would matter, you know, as part of your partitioning strategy, like you said, you know, cause especially if you know, if you knew that you had to do like, uh, aggregations across it, you know, then that's going to, uh, factor in on how you might want to distribute that, uh, you know, that data across the partitions. Right. Because like we talked about earlier, if you have inefficient or ineffective partitioning for what your aggregate use case is afterwards,
Starting point is 00:58:52 you may be issuing the query against all nodes, which takes away from what you're trying to do in the first place. Right. And yet somehow Elasticsearch pulls off all of this and it's just magic. I don't even know how it managed to do what it does. And yet somehow Elasticsearch pulls off all of this and it's just magic. I don't even know how it managed to do what it does. Yeah, so they talk about it in here that's interesting. So like one of the ways that they sort of help with hotspotting is you have the ability in Elasticsearch to have like sort of these warm or hot nodes where your most recent data that you're going to query a lot lives. Then you're going to have a colder node where you put stuff that
Starting point is 00:59:31 doesn't get queried as often and you don't care if it's as fast. So they have ways to mitigate that, that they've thought about, like your different data nodes have different settings that you can do. And typically you'll store those on different types of drives too, right? So your cold data, you might put on a slow spinning drive, your hot data, you might have on some sort of, you know, new SSD or something. Well, I was going to say, like, I use, uh, only SSDs for everything. So like, you know, just, uh, you know, RAID arrays of SSDs. And, you know, in last episode, I had learned about this, uh, new SSD from Corsair that had amazing stats on it. Did you realize, by the way, like a total tangent, that you can get that SSD as an 8 terabyte?
Starting point is 01:00:16 No, I didn't. How much is that thing? That is insane. Well, it's hard to tell because it's not showing up on what's grayed out. So the 4 terabyte version of it is $680. That's not terrible. Maybe I can't. Oh, I was able to click the 8 terabyte.
Starting point is 01:00:38 Woo! $1,500? Can I borrow some money? How much is it? You're pretty close, man. You are, you are pretty close.
Starting point is 01:00:47 You are $50 off. Oh yeah. It is 1450, but I could order it now and it'll be here in, uh, you know, six days. So dude,
Starting point is 01:00:58 get two of those and put them in a RAID. Oh my goodness. Like, that's the stuff dreams are made of, right? Uh, it's ridiculous. But in all honesty, I mean, in enterprise systems, they, they spend some money on some SSDs, right? Like, and they will RAID them on enterprise-type stuff. But you know, that's your true crazy Kafka brokers out there, they're probably running some sick hardware that costs, you know, more than a house.
Starting point is 01:01:27 Custom made hardware. Right. Seriously. I mean, you remember like the, the, do you ever see like the first pictures of the hardware that Google ran on where it was just like whatever they could get their hands on?
Starting point is 01:01:38 Oh no. Like when they were early, early, early, early, early, early starting days before you'd ever heard the name. Right.
Starting point is 01:01:45 Yeah. It was like every piece of random hardware that they could get their hands on, like just building out the system. It's pretty, pretty interesting. See if I can find a picture of that while you're. All right. So our next one is what if you partitioned by hashing the key? So this is one of the ways to avoid the skew in the hotspots, right? So, and this is actually what a lot of systems use to try and get around the problem.
Starting point is 01:02:13 So what they say, and there are some really interesting things all throughout this entire section here, is a good hashing function will take the data and it'll make it evenly distributed, which sounds perfect, right? One of the things that they call out here, and I'd venture to say a lot of people who are hung up on security probably think about this a lot, but hashing algorithms don't have to be cryptographically strong, not for keying purposes, right? So you're not going to need the latest, you know, AES-256 encryption or whatever. It doesn't matter. You don't need that. Even an MD5 hash is perfectly fine, because it doesn't need to be, you know, secure. Additionally, they do call out Cassandra uses Murmur3, and
Starting point is 01:03:10 Voldemort, which until I read this book I'd never even heard of, uses Fowler-Noll-Vo, which I'd never heard of either. Um, so just to give you an idea, like, MD5 is perfectly fine for this. Mongo uses it, and that system has done pretty well as far as I know. I'm so afraid you're going to say that other system a third time and then... I just sent you a picture. I'll include it in the show notes as well. The early days of Google. That's amazing. The first computer at Stanford that was used to house it was a custom-made enclosure made from Mega Bloks.
Starting point is 01:03:48 Yeah, a.k.a. the off-brand Legos. Right. I was thinking, like, Mike RG is going to be so upset. Google is now dead to him. Yeah. Isn't that so cool, though? There you go. That's pretty amazing. We'll have a link to that in
Starting point is 01:04:07 the show notes, because, uh, that is pretty cool to look at. I would just assume that I would melt the, the Mega Bloks if I were trying to, like... I would have never thought, like, hey, let me build this. I would have, like, cut some wood first. Right. Before I was like, hey, let me, let me build this out of plastic. Right, yeah. Let me melt my homework assignment. Yeah. No, not so much. Hey, so one of the cool things that they pointed out here that I never really would have even thought about was they said that not all programming languages have suitable hashing algorithms. Or I thought it was that not all of them are the same, though, right? Not that they're necessarily suitable. No, no, suitable. So they're not all the same, but here's
Starting point is 01:04:54 why. So it says, because the hash will change for the same key. Oh, so this, this is the part where it was like, what? It doesn't even make sense. So Java, their Object.hashCode, and Ruby's Object#hash were called out specifically. They, for the same object, won't necessarily create the same hash each time you call it. And what you said earlier, you need deterministic keys to know where you're going. So if your hash changes and you're, and you're using that as your key on where to put it, that'll kind of mess you up if you go to look for that record next time.
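For what it's worth, Python has the same trap as the Java and Ruby examples, and it's easy to see. This sketch spawns a few fresh interpreters and compares their built-in hash() of the same string; it assumes hash randomization is on, which is Python's default:

```python
import hashlib
import os
import subprocess
import sys

# Python's built-in hash() for strings is salted per interpreter
# process (hash randomization), so the "same" key can hash differently
# on every run. Useless for deciding which partition a key lives on.
env = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
cmd = [sys.executable, "-c", "print(hash('user-42'))"]
runs = {subprocess.run(cmd, capture_output=True, text=True, env=env).stdout.strip()
        for _ in range(5)}
print("distinct hash('user-42') values across 5 interpreter runs:", len(runs))

# A stable hash like MD5 gives the same bytes in every process, on
# every machine, in any language with a correct MD5 implementation.
stable = int.from_bytes(hashlib.md5(b"user-42").digest()[:4], "big")
print("md5-based routing value:", stable)
```

That's the whole argument for reaching for something like MD5 or Murmur3 for keys instead of whatever the language's default hash happens to be.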
Starting point is 01:05:34 Right. Yeah. I thought, well, but I didn't read that. Those like, um, I don't know enough about the Java Object.hashCode.
Starting point is 01:05:42 I just assumed that like, maybe, you know, that's an instance specific assumed that like maybe, you know, that's an instance specific thing. So like, you know, the memory address might play a part in it. Cause like,
Starting point is 01:05:52 you know, in the .NET world, there's a thing, I forget what it's called, where if you were to
Starting point is 01:06:01 implement your own IEquatable or whatever equality type of check, right? One of the things you can start with is the hash code, to say, hey, is this even the exact same thing? So that's why I took it that way. I mean, you know,
Starting point is 01:06:20 that's where I thought they were going with some of that. But also, in my mind, I was thinking, well, depending on how you decide to hash, if you know that you're going to have clients across languages, then you want a consistent, deterministic hashing algorithm that is available in every one of your languages. Because you could end up picking
Starting point is 01:06:55 a hashing algorithm that might only be available in Java, and then all your .NET clients, for example, can't query,
Starting point is 01:07:05 or Python clients, whatever they might be, right? So, two things, because that reminds me of something that actually happened. I'll share it in a second. But what you just said made me realize, I think the reason why Object.hashCode in Java isn't consistent is exactly what you said. It's a memory thing, right? In Java, you have to be very careful about whether you're doing this == something or this.equals(something), because it's actually checking to see, is this the same exact object,
Starting point is 01:07:35 or does this object have the same value, right? And those are two very different things. And that's probably why the hash code's different, because they want to check to see, are you referencing the same object in memory? So that's probably a very good call. The other thing you just said, about cross-language: we ran into this in Kafka, and it was really interesting. We were trying to do deterministic keying, like what you said, and we used the same hashing algorithm in both C# and Java. And guess what we ran into that actually messed it up? The difference between big-endian and little-endian microprocessor architectures. It created a different value on a different system, so you actually have to be aware of some stuff like that.
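Here's a small, hypothetical Python sketch of that byte-order trap: the digest bytes are identical everywhere, but reading them big-endian versus little-endian yields different integers, and therefore different partitions.

```python
import hashlib

def pick_partition(key: str, num_partitions: int, byteorder: str) -> int:
    """Digest the key, then read the first 4 bytes in the given byte order."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], byteorder=byteorder) % num_partitions

# The raw bytes are the same either way; only the interpretation differs.
raw = bytes([0x01, 0x00, 0x00, 0x02])
assert int.from_bytes(raw, "big") == 0x01000002      # 16777218
assert int.from_bytes(raw, "little") == 0x02000001   # 33554433

# So a big-endian reader and a little-endian reader of the same digest
# can disagree on which partition a key belongs to.
big = pick_partition("lamborghini", 100, "big")
little = pick_partition("lamborghini", 100, "little")
```

Pinning the interpretation to one canonical byte order in every client, like the explicit `byteorder` argument above, is what sidesteps the architecture difference.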
Starting point is 01:08:28 Even if the algorithm looks like it's the same, you might have some architectural things that will throw a wrench into your plans. And that's fun to find. Another big one to think about, too, depending on how you're going to do your key,
Starting point is 01:08:44 and then subsequently use those keys to create hashes, is whether it matters if you have a hash collision. For example, you said Mongo uses MD5, but if you just used MD5, then you can have a hash collision. For those wondering what I mean by that, the idea is you could have two different pieces of data, run them through the same algorithm, and they could produce the same result. Now, it's supposed to be rare, but if you look at MD5, technically it is possible. And if the data that you're trying to key is super sensitive, then you might not want Alan's hash to equal the same as mine,
Starting point is 01:09:38 and then Alan is able to see my data, or vice versa. Right. So that's a really important thing to call out there: it might not matter if you have hash collisions if you're just using the hash to determine which partition to write to. But if you're trying to use it as a unique identifier for your record, true, yeah, that could be an issue, right? So you do need to be aware of that. If you're not using something that is going to guarantee you basically unique results, then you could have that problem. Yeah, that's a fair point.
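A quick hypothetical Python sketch of that distinction, using a deliberately collision-prone toy hash so the collision is guaranteed:

```python
def bucket(key: str, n: int) -> int:
    """Toy hash: sum of the key's bytes, mod n. Collides on anagrams."""
    return sum(key.encode()) % n

# "alan" and "nala" are anagrams, so they collide by construction.
assert bucket("alan", 10) == bucket("nala", 10)

# Harmless for partition routing: both records share a partition, but
# each is still stored and looked up under its real key, so nothing is lost.
partition = {"alan": "alan's data", "nala": "nala's data"}
assert partition["alan"] == "alan's data"

# Dangerous as an identifier: if the hash IS the key, the second write
# silently overwrites the first.
by_hash = {}
by_hash[bucket("alan", 10)] = "alan's data"
by_hash[bucket("nala", 10)] = "nala's data"
assert by_hash[bucket("alan", 10)] == "nala's data"  # alan's data is gone
```

Same mechanics with a real hash like MD5, just with a far smaller chance of collision; whether that chance is acceptable depends on whether the hash only routes the record or uniquely identifies it.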
Starting point is 01:10:10 But all of this goes back to the same advice that we've said multiple times now: know your data. Know your data and your use cases for how you plan to read and write it. If you just take some time up front to think through some of that, it can pay dividends later on. Totally. So the next thing they say is that you can set up these partition boundaries, which we talked about earlier, right? Like A through B or whatever. You can do this, and it can even be done pseudo-randomly. They call it consistent hashing, but they say that consistent hashing doesn't work well for databases. So maybe it's perfectly fine for things like Kafka, but it's not going to work for Postgres or Mongo or something like that. It says that while hashing the keys buys good distribution, you lose the ability to
Starting point is 01:11:07 do these range queries on known nodes. So now your range queries all have to run against all the nodes, which is what we talked about earlier, right? If you know exactly what node to go to, awesome. If you don't, then you have to query them all, and now you're potentially wasting processing. Maybe it's faster, maybe it's way more inefficient; you kind of have to know your data. Yeah, just to expand on that example I gave earlier with the Lamborghini: if instead of your index being based on letters of the alphabet, where you could simply pull all the L's and
Starting point is 01:11:48 immediately start there. If instead your keying mechanism was based on license plates, then you'd have to look at every license plate and say, is this one a Lamborghini?
Starting point is 01:12:01 No. Is this one? No. That's a great example. Yeah, it's crazy how that works. And that's why it doesn't work well for databases, right? Because databases, you're typically looking up data by properties. And if it's some sort of random hash property, it's not going to do you much good. They even said, and this is interesting too, they said some databases don't even allow for range queries on the primary key.
Starting point is 01:12:26 So like your hash or whatever. So Riak, Couchbase, and Voldemort. Oh, you said it. There we go. It's over. Him and Beetlejuice are going to show up now. Dude, I love it that they go so deep in this book. So this is where they say Cassandra actually kind of does it really well, because they do a combination of key strategies.
Starting point is 01:12:48 So they use the first column of a compound key for hashing, and then the other columns in that compound key are used for sorting the data. That gives you sort of the best of both worlds there. Now, I was trying to wrap my head around what they meant here, though, and I didn't dig into it enough, so apologies there. I was trying to think, okay, what does that mean? What if the piece of data that I know and want to return is the first thing? It's kind of lost. You can't sort on that then. Yeah, well, I guess you're probably right. So they call it out here. They say this means you
Starting point is 01:13:39 can't do a range query over the first portion of the key, or in this case, I put K. Yeah, I have a K. But if you specify a fixed key for the first column, then you can do a range query over the other columns. So the way that data is stored in Cassandra is a little bit different, right? So yeah, I don't know, man. I'm not sure exactly how useful that is. I mean, I know Cassandra is popular because you can put massive amounts of data on it and it's horizontally scaled out. Right. But I guess it makes sense that you wouldn't want to query every single node in a Cassandra cluster to find some data.
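For anyone who wants to see the shape of it, here's a rough in-memory Python sketch of that compound-key idea. It's not real Cassandra, and the user and date columns are invented, but it shows why a fixed first column plus a range over the second is cheap:

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4
partitions = defaultdict(list)  # partition id -> rows kept in sorted order

def partition_of(partition_key: str) -> int:
    """Hash only the first column of the compound key to pick a partition."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def insert(user_id: str, posted_at: str, post: str) -> None:
    # Rows stay sorted by (user_id, posted_at) within the partition.
    bisect.insort(partitions[partition_of(user_id)], (user_id, posted_at, post))

def posts_for(user_id: str) -> list:
    # Fix the first column and you get a cheap range scan over the rest:
    # one partition to visit, rows already in date order.
    rows = partitions[partition_of(user_id)]
    return [(when, post) for (uid, when, post) in rows if uid == user_id]

insert("outlaw", "2021-11-01", "first post")
insert("outlaw", "2021-11-08", "episode 171 is out")
insert("alan", "2021-11-05", "hello")
```

What you can't do here, and this is the trade-off they're describing, is ask for the newest posts across all users without visiting every partition.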
Starting point is 01:14:19 If I remember right, that is the whole thing with Cassandra: you're typically going straight to a record or a set of records, and then you can sort within there. So, yeah, I don't know, man. And, hey, shout-out to a past sponsor, because DataStax is all about Cassandra, right? I just haven't had an opportunity yet to dive into Cassandra like we have other storage technologies. So as I was reading that part,
Starting point is 01:14:52 I'm like, well, okay. They gave an example here, and I forgot about this, so maybe this will help out a little bit. I'm just going to read what I wrote. An example would be storing all posts on social media by a user ID. That user ID is your hashing column, and then the updated date is an additional column in the compound key. Then you can easily go retrieve all the posts by that user, sorted by the post date. Right. So that makes sense if you think about it like that. Right. Especially if you have, well, I mean,
Starting point is 01:15:29 I guess anybody. Like, if you're Facebook, you can go to Outlaw's page and say, hey, show me all his posts, sorted. And you can do that fast. You can do the same thing for mine or anybody's.
Starting point is 01:15:38 I guess where I got tripped up on that, though, is because when I thought about compound keys, maybe they don't mean compound keys in the same way I thought about it. And this might be where I went wrong, because I read compound key and in my mind I translated that to composite key. Oh. And by composite key, just to define what I mean by that term: instead of only having one thing that uniquely identifies the record, maybe you have multiple things that identify it. So it's not enough to just know my first name; you'd have to know my first name, middle name, and last name.
Starting point is 01:16:20 All of those things together might be used as a composite key. And maybe that's not what they mean here in the case of compound key. Yeah, it sounds like it's different. It sounds like the first portion is truly the key to get to that partition, and the next one is to sort the data on that partition. So that's where I got tripped up, because I saw that too. But then I'm like, well, if that's the case,
Starting point is 01:16:49 why would you call it a compound key? Because those other parts really aren't technically a key; they're just other attributes of the data, are they not? Yeah, I guess. I don't know. Maybe the difference here is that... I think I'm answering my own question, because maybe the difference is that with Cassandra, if you go back to your date example, some of my partition data might be on one partition for a given day,
Starting point is 01:17:21 and some of my other data might be on another one for a given day. No, I don't think so. Yeah, I mean, what this implies is you can easily go and find all your posts, ordered by date, right? Because we can hash your name or whatever, get to that partition, and order that data. What you can't do is say, hey, show me all the newest posts, show me the top 100 newest posts, because you can't do a range query on that; the things are actually stored by the hash of your name or the user ID. So yeah, I mean, that's one of the things about Cassandra that I've always
Starting point is 01:17:58 found interesting. It's very much like a key-value store in that regard, in terms of how you get straight to the data. So I'm positive that somebody listening to this has some serious Cassandra chops and can explain it to us in the show notes for this episode. If you go to codingblocks.net slash episode 171, you can give us the explanation there. And not only are you doing us a favor by explaining it, but you'd be entering yourself for a chance to win a copy of the
Starting point is 01:18:31 book. I like what you did there, sir. Thank you. All right. So hashing is used to help prevent these hotspots, but there are situations where they can still occur. We still haven't found the perfect end-all, be-all solution, right? They gave the example of, what if you had a popular social media personality with millions of followers, such as myself, such as yourself? If we were to say that whole user ID goes on the one partition, all your data is on one partition. I thought you were going somewhere else with that. Oh, I don't know where I would have been going. Because of the way you paused, I thought for sure you were going to say all your data are belong to us.
Starting point is 01:19:17 That was like episode 13. That was a long time ago, yeah. That's been a minute. So here's what stinks: most systems cannot handle that type of skew, which kind of makes sense. You told it, hey, we're going to partition by this key, so that key goes there. So what they said here is, in the case that something like this happens, it's up to the application itself, so your code, to try and fix the skew. And they gave some examples of ways to do it, like appending a random two-digit number to the key, which would spread that record out over 100 partitions.
Starting point is 01:19:56 Right. So outlaw-01, outlaw-02, or whatever. Every piece of data that came in, you put that random two-digit number on there, and you're doing that in your application code. Now, this is probably going to be controversial, and, you know, you heard Joe say it here, but I was thinking about this. In this scenario, there's nothing to say that one strategy has to be your end-all, be-all for every use case. And so maybe your
Starting point is 01:20:18 celebrity data, or whoever has the high follower count or whatnot, maybe they're off on their own set of partitions that have their own strategy, and you and I are in a different strategy. I mean, you know, it's a big one. Don't worry. That's right. Yeah, we'll be okay. But honestly, though, think about it: there's the 99-percenters, and then there's the one-percenters in terms of the follower counts and whatnot, right? I don't know, maybe it makes sense,
Starting point is 01:21:28 but also maybe it's a horrible idea. I mean, look, anytime you have a fork in the road in your code logic, we both know it's really hard to maintain and hard to reason about: hey, why did this happen over here? And, going back to appending the two-digit number, that's amazing for spreading out your writes. But then you introduce the problem that we talked about before, where, hey, if I want
Starting point is 01:21:52 to see all of Outlaw's posts, because, you know, the whole world wants to see them, you now have to query 100 nodes to get all of his posts, as opposed to the one node you would have hit earlier. So it's a trade-off, right? Again, we've said it multiple times: you have to know your data and the use cases and how that stuff is going to get used.
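A minimal Python sketch of that two-digit-suffix trick and its read-side bill; the key names here are hypothetical:

```python
import random

NUM_SUFFIXES = 100  # two decimal digits: 00 through 99

def write_key(user_id: str) -> str:
    """Spread a hot key's writes by appending a random two-digit suffix."""
    return f"{user_id}-{random.randrange(NUM_SUFFIXES):02d}"

def read_keys(user_id: str) -> list:
    """The price of the spread: reads must fan out to every suffix."""
    return [f"{user_id}-{i:02d}" for i in range(NUM_SUFFIXES)]

# Writes scatter across up to 100 keys (and so up to 100 partitions)...
assert write_key("outlaw") in read_keys("outlaw")
# ...but reading everything back means querying all 100 of them.
assert len(read_keys("outlaw")) == 100
```

Notice the write path is cheap and the read path pays for it, which is why you'd only do this for the handful of keys that are actually hot.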
Starting point is 01:22:19 I mean, even this scenario, this option here, as well as the one I just described, both of them have this one pain point: there's this inflection point where that person crosses a threshold, and now you need to move them to this other strategy, right? Well, do you just forget the old data, or do you move it along too?
Starting point is 01:22:55 I mean, I'm assuming that's what you do, right? But your use case may vary. So you would have that kind of problem with the migration. In my mind, as I was reading that part, I was thinking of Twitter specifically, right? Because you create your Twitter account, things are slowly building over time, and then eventually you start gaining momentum and blowing up. Right. And now it's, oh, we've got to change the whole strategy for how we handle Michael's Twitter account, because, of course, it's a big deal.
Starting point is 01:23:28 I don't know if you're following. So, I mean, do you do that? Like, when you're reading something, do you immediately try to relate it to something you already know as you're reading it? Oh, totally. The Twitter thing is great because there's a big difference. Twitter pops into my mind a lot, because when you look at how the system operates, it's got to be super hard. Because you have
Starting point is 01:24:00 the feed, right? Where you just get new posts that come in nonstop. That's one sort of data; that's just the timestamps when they come in. And then if, like you said, you click on Outlaw and you want to see his posts, there's no way that data is indexed the same way that main feed is, because sorting by a timestamp is way different than sorting by somebody's key and then a timestamp. We've talked about this in the past, too: a lot of times, with these systems that you interact with, especially when the data gets large and you need things to be fast, that data is stored in multiple different formats and multiple different technologies
Starting point is 01:24:44 to give you what you want. So we just talked about clicking on your name to see your posts. You have your main feed where it all just comes in. What if you do a search now? Right, that's also a different storage mechanism, more than likely.
Starting point is 01:25:01 But even creating that feed, specifically for Twitter, could be a big deal. Because whoever the celebrity is, if everybody's following Dwayne The Rock Johnson, for example, I don't know how many followers he has on any given platform, but let's just say, to pick on him, if he had like 5 million followers, that's probably too little. Say he had 50 million followers; I don't know what's realistic here. That means that as you're pulling up your feed as Alan, as one of those 50 million followers, you're wanting to query his index, wherever his data is, but so are the other 50 million followers that he has, too. So that's where some of this can matter, especially when you think about the hot spotting that we were talking about with the skewing. The reads to his
Starting point is 01:26:05 updates are going to be a big deal, right? Because of how many followers he has. So as I was thinking through the Twitter examples, it was like, well, what if you had to develop a Twitter today? Because here's something, to Twitter's credit, that's kind of amazing to think about. Do you remember it used to be pretty commonplace to see the Twitter fail whale? When's the last time you saw the Twitter fail whale?
Starting point is 01:26:42 Right. Exactly my point. Yeah, I can't tell you the last time I saw it. So credit to them for what they did here, right? And obviously, other social platforms have similar types of challenges, problems that they're trying to solve, but
Starting point is 01:27:03 that's where this stuff matters: what your data partitioning and storage strategies are going to be. Yeah, it's really mind-boggling when you start diving into all this stuff. So I think Jay-Z put these things down here. You want to tell us about them, Jay-Z? Yeah. So we just had a couple of examples here, like the sensor data that we kind of already talked about.
Starting point is 01:27:34 New readings coming in from IoT devices and whatnot, and users can view the real-time data of those and pull reports for historical data. And then Michael already gave the example of multi-tenant SaaS platforms, where you might want to partition based on the customer. And then we hadn't really talked about this one, but the giant e-commerce product catalog, like an Amazon.
Starting point is 01:28:02 And then we kind of already talked about social media in the way of Twitter, but I wrote down Facebook here, like the Facebook users, and
Starting point is 01:28:11 if you think about the graph of composing that home screen when you go to look at it. I don't know why, but you just reminded me of Bill Murray from Stripes. That's not at all out of left field. I don't know why; it really is. I don't know,
Starting point is 01:28:27 but that was really funny. Cool. So we've covered a page and a half from this particular chapter. We are fast. Yeah, we were fast,
Starting point is 01:28:40 but hopefully we painted the picture for you, right? What partitioning is for, why you'd use it, and some of the challenges you face in trying to find the right solution for what exactly it is you're trying to do. So obviously, a resource we like is the book. And again, go leave a comment on this particular page and get a chance to win a copy. It is one of our very favorites. Yep. Yeah, and with that, we head into Alan's favorite portion of the show.
Starting point is 01:29:11 It's the tip of the week. So I'm actually excited about this one. I couldn't find online documentation for this, but I just noticed it one day. I don't know if you've ever worked in a really large Java project, like an enterprise-level Java project that's got like 5 billion files in it, and you go to do a search in Visual Studio Code and you almost have anxiety about clicking something and going away from it, because you lose your search context, right?
Starting point is 01:29:45 Or maybe you need to search for something else, and then you're like, man, I don't want to lose this search. There's something you can do. If you go into Visual Studio Code and do a Command+Shift+F, which is find in all files, not just the file that you're in, but all the files that are available in your workspace. When you do your search, right up underneath the search box there's a little ellipsis, and there's a thing that says, hold on, I will tell you: it says Open
Starting point is 01:30:17 in editor. This is so amazing. It'll tell you how many results. Like the one I'm looking at, where I was testing this out on my local here: it'll open up a new file, and in the tab name it'll say Search: and whatever you searched for. Then in the file itself, it will have each file where a match is and the line numbers where the searches showed up. And what you can do, on Mac, is hold down Command. I'd imagine on Windows you can hold down Control. It is. And you can actually click on the file and it'll take you there. So you won't lose that search. That search tab is
Starting point is 01:30:59 still there, so you can go back to it and then Command-click on something else. It is a fantastic way to work without losing your place, because that's another thing that I hate about that particular search panel on the left: it's easy to lose track. Which file did I click on? Where did I go? With this, you can do it in that one file and sort of keep track of where you've been. It's absolutely fantastic. So that is my tip for this one. Yeah, that's pretty awesome. I've never tried it, but I think the thing I would like the most about it is, I'll navigate back to the Explorer and then come back to that search, like just click on the magnifying glass, and your search is still executed.
Starting point is 01:31:47 And if you did click on any file in there, when you navigate back to the Explorer and then come back to the search, it's still highlighted, in like a muted color. But, like I said, until you just said this, I hadn't tried it. Now that I have, I'm like, oh man, this is amazing. Because a point being undersold here is that you could execute multiple searches and just open each result in an editor. So if you had a need for keeping multiple things around, you would know what each one is. And, like you said,
Starting point is 01:32:25 you get some context around the result, too, so you can kind of see where it is. So that's super cool. I never bothered to click that. Yeah.
Starting point is 01:32:36 I saw it one day because whatever it was that I was doing was really getting on my nerves. And I saw it and I was like, what is this? And I clicked it. I was like, oh man,
Starting point is 01:32:52 game changer. Yeah, that is amazing. So yeah, that's mine. All right, well, super cool. All right, so for my tip of the week, I've got a couple for you. The first one, maybe your mileage is going to vary, but I thought it was pretty cool when I found it. If you have an iPad Pro, for example, you've seen those Magic Keyboards that Apple has for the iPads. It's not just the large Pros that have them, there's different sizes, but I'm going to focus on the super expensive one. The $130 keyboard? The $350 keyboard, sir. No. Man. Yeah, this is up there with, like, a Moonlander, right? But basically it turns your iPad Pro, like the 12.9-inch iPad Pro, into almost a laptop, right? With a keyboard and trackpad and all that kind of stuff. And when it opens up, unlike other cases where the glass is sitting on the same surface level as the keyboard, now it's up higher. So your fingers as you're typing aren't
Starting point is 01:33:59 necessarily blocking your view of it or anything. That $350 keyboard right now on Amazon is $243. Yeah, that's why I thought I would call it out, because that's a big deal. And for those in the United States, Best Buy often will match Amazon, and if you are a Best Buy rewards member, then you could get your points from that, too.
Starting point is 01:34:29 Or whatever other store. I'm sure for Alan, it's probably Costco, if we're being honest.
Starting point is 01:34:36 I love me some Costco. But the only thing about Best Buy, or most of these places, is it has to be sold by Amazon, and it has to be in stock. This is shipped and sold by Amazon. But it's out of stock. Well, that just happened during the course of this talk, because it was in stock at the start of this show. That's a killer price. Yeah. And it's been like that for
Starting point is 01:35:03 a few days now, so I'm kind of questioning it; Amazon might keep it at that, I don't know. So now I'm sad that it's out of stock. At any rate, that was a stupid one then. Fine, whatever. It's still good; if it comes back in stock, that'll be amazing. Maybe it'll go all the way through Black Friday. Well, let's see. It is in stock in white, but it's only $30 off. Or, no, sorry, $20 off in white. Technically $19 if we're being more accurate.
Starting point is 01:35:38 Whatever. That's what we do; the details matter. At any rate, oh, that keyboard is good for the third, fourth, and fifth generation of the 12.9-inch iPad Pro. So kind of a cool deal; hopefully it'll come back in stock. Now, here's the other one. This one's super cool, and I know that you're going to like this. I am honestly trying to learn this thing, so I can't tell you a whole lot from experience yet, but it's called Room EQ Wizard. And the idea here is that if you're setting up
Starting point is 01:36:14 some fancy-pants speakers, this thing can sample your room to tell you what's wrong or right, or it can help you do it. You can provide your own calibration, or you can use specific microphones and other equipment that they've already calibrated the software for. But this is where my head's at now: trying to use this properly and set up my room. That's actually very exciting. So yes, I thought you might like this.
Starting point is 01:36:59 I'm trying to think, okay. So, for those listening, let me just read what this thing is. It is free software for room acoustic measurement, loudspeaker measurement, and audio device measurement. The measurement and analysis features help you optimize the acoustics of your listening room, studio, or home theater, and find the best locations for your speakers, subwoofers, and listening position. So from right there, you can kind of get an idea of what this thing does. And like I said, it's free. It runs on Windows since, I think it was, XP. So if you're still running Windows XP, first of all,
Starting point is 01:37:40 I'm going to give you a link to some hardware that we've talked about for maybe running Windows 10. But it also runs on everything from Mac OS 10.11, which I forget which one that is, up to the current macOS, as well as Linux. So, point being, a variety of different platforms; surely you use a platform that this thing can run on. Yeah, it's super cool. Like I said, I'm still learning it myself.
Starting point is 01:38:18 So, you know, you can learn it with me. I like it. Or I'm sure there's some listeners who are like, oh, that's old hat. I already know it.
Starting point is 01:38:38 So, yeah. We hope you've learned a lot about partitioning. We have been thoroughly enjoying this book, and like Alan said, we will definitely have a link to it in the resources-we-like section. If you haven't already, you can subscribe to us on all the major platforms, wherever you like to find your podcasts: iTunes, Spotify, Stitcher, wherever. And if you happen to be listening to this on not your preferred platform, let us know and we'll figure out
Starting point is 01:39:09 what we can do to get on that other platform. And, like I said earlier, we do greatly appreciate the reviews. So if you head to www.codingblocks.net slash review, you can find some helpful links. Yep. Hey, and while you're up there at CodingBlocks.net, make sure you check out our show notes. They're extensive. We have examples, discussions, and more. And send your feedback, questions, and rants to the Slack channel, which is amazing. So go to CodingBlocks.net slash Slack.
Starting point is 01:39:37 If you're not already a member, join it. There are just tons of awesome people up there. Yeah, and be sure to follow us on Twitter at CodingBlocks, or head to CodingBlocks.net and you can find all our social links there at the top of the page. And be sure to follow me on Slack at Joe.
