Coding Blocks - Hierarchical Data cont’d – Path Enumeration and Closure Tables

Starting point is 00:00:00 You're listening to Coding Blocks, episode 29. Subscribe to us and leave us a review on iTunes, Stitcher, and more using your favorite podcast app. Visit us at CodingBlocks.net where you can find show notes, examples, discussion, and more. And send your feedback, questions, and rants to comments at CodingBlocks.net. Follow us on Twitter at CodingBlocks. Head over to Facebook at Facebook.com slash CodingBlocks. Or head to www.CodingBlocks.net and find all our social links there at the top of the page. And with that, welcome to CodingBlocks. I'm Alan Underwood.

Starting point is 00:00:31 I'm Joe Zach. And I'm Mike Woutlaw. And let's go ahead and get started with some news. First, we have some new reviews and... Yep, big shout out to Sean Lee, Mike Mike, and Dead Iron. Thank you guys very much and thanks for mentioning the bidet I've been trying to forget about that and Mike

Starting point is 00:00:52 Mike that was well done holding us at ransom for the review that was that was a good job well now don't give everyone else an idea what are you doing don't spread that and Sean Lee also I think that was from Stitcher, right? Yes.

Starting point is 00:01:05 They were saying that they were very selective about the podcast they listened to, and they really enjoyed the algorithm one we did. So thank you very much for the kind review and for putting us into your highly selective list of listening material. Yeah. That's awesome. Very nice. Awesome.

Starting point is 00:01:21 We've said it before, and we'll say it again. Thank you guys for taking the time to write those reviews. We really do appreciate it. Yep. So let's get into this. I thought this would be interesting. So we've done a couple of surveys over the past couple of episodes. Let's start with, I'm going to quiz you two guys. Regarding the first one we did, it was regarding the frequency of the podcast, which would you prefer most? And we had a couple options in there, you know, one hour to hour and a half

Starting point is 00:01:52 versus shorter episodes that were released more often in like 10 to 15 minute length segments or a show that was around once a week but 30 minutes in length. Which one of those do you think fared well what about you joe um just gut feel yeah i'm gonna guess uh once a week hour long like this american life yeah that wasn't one of the three that i said but okay but yeah i'm thinking

Starting point is 00:02:19 i'm thinking the once a week was probably pretty high up there and then only because i had seen the original one i was surprised as many people like the format that we had right now which was longer every so so once you once i tabulated everything down and like took some of the free form ones and broke them down into like you know uh one of these one of these categories, right, to just have a smaller set of items to talk about. It broke down into the current format being tied with shows released once a week at 30 minutes in length. Interesting. So the one hour, you both mentioned once a week at an hour.

Starting point is 00:03:02 Well, I said once a week. I didn't specify the hour. Okay. Well, that one was a fairly popular write-in. I guess it was an oversight on our part that we didn't put that in as one of the default choices. But it was a popular write-in. However, it was as popular, the same amount of popular, as the shorter episodes that were 10 to 15 minutes in length interesting so i thought that was a an interesting take that you know our you guys

Starting point is 00:03:34 our audience were uh mostly consistent you know they mostly fell into two very popular camps and then the ones that didn't fall into those two popular camps fell into like two other consistent camps they chose he said it was an even divide among among those so i guess now we need to figure out what our takeaway from that is whether we're going to start trying to do some some more frequent episodes or whatnot i mean uh well it sounds like that there's not you know the sorry joe but it sounds like um you know people aren't gonna be upset if there was some more right um but a lot of people did like the current format too so a lot of people to figure something out yeah yeah yeah we're tied at 43 you know so it's a big drop off to number

Starting point is 00:04:19 three yeah yeah the first two were tied at 43 sorry i didn't get into that and then and then the other two options one of those was a just a popular write-in uh those were it's you know roughly six and a half percent so um yeah it was a big drop so let's get into the second poll our survey which was you know years of programming experience you You guys have any guesses as to which one you think? There were five options. Less than one year, one to two years, two to five years, six to ten years, or 11 plus years of programming experience. If you're listening along, you can play along with us and say your vote out loud while these two guys ponder.

Starting point is 00:05:06 I honestly think that probably one to two years was the most popular. And to be fair, I did not look at these results at all. Thank you for not. I have no idea. Thank you for not cheating like Joe did. But I'll tell you why. Because most of the people who write in to us are the ones that, and I think it might be those people that are more curious because they haven't been in the field yet, but that's why... So you're saying

Starting point is 00:05:26 one to two. One to two would have been the majority. Joe, what do you think is the most popular? I could have flipped the coin. I really couldn't have predicted anyway, but I was surprised to see what the answer was. Okay. Well, yeah. He totally cheated.

Starting point is 00:05:41 Yeah, the majority of our audience is 11 plus years. Really? Yeah, the majority of our audience is 11 plus years. Really? Yeah, the second most popular bucket was the two to five year mark. So it was 33% for 11 plus years and 24% for the two to five years. The one to two year range that you thought would be most popular was the absolute lowest. You know what's awesome? It was actually tied for the lowest.

Starting point is 00:06:06 It was tied at 13% and it was tied with less than one year. You know what's awesome about that to me? It actually excites me because that means that there's a few things going on. One, people are genuinely looking to better their programming experience, even though they have 11 plus years of experience, right? Like, that's amazing. And the other thing that I find really awesome about that is that, you know, you people who are listening right now, you're actually finding what we're doing valuable enough to even, you know, to listen and with that much experience, go up there and take our polls. So I,

Starting point is 00:06:48 I've, I, that's flattering and awesome all at once. Yeah. In the, in that poll obviously hasn't been available as long as the, the one regarding the frequency of the show. So it'd be interesting to come back and see,

Starting point is 00:07:01 you know, uh, over time as time goes on. Um, but yeah. Awesome. Thank you very much for going and filling in those, those surveys guys. That's, that's excellent. I say guys, gals as well.

Starting point is 00:07:12 So everyone, thank you. Uh, so next up on the list. So right before we started recording tonight, we got an email from Lou and he's kind of got an interesting situation because he is he's later in his career and he's getting ready to go back to school from scratch and essentially go through a four-year school to get into programming and he's kind of wondering you know what is this this is a bad thing is this something that we've seen i mean do people actually come out as junior level programmers later in their career and here's my take on it and and both these guys will also fill fill in on this but i don't think there's ever a time to to that's too late to get into something

Starting point is 00:07:59 if you have a passion for it and you and you truly have a drive to do it, the only thing that I think would really ever hold any back is if you financially can't afford to take that step back. If you are a master, I don't know. It sounds like he's fortunate enough that he can. Yeah, if you can, do it. Good for him. My take on it is this.

Starting point is 00:08:22 There is never too many people that do a fantastic job at anything. So if you come out of this thing and you become an excellent programmer, you hone your skills, you're thorough, you think through problems well, you do all that, there's room for you. And it doesn't matter what your age is. If you're somebody who's easy to get along with, you communicate well, and you become skilled at your craft absolutely man and i will say you mentioned not getting into the field until

Starting point is 00:08:52 you know closer to when you graduate my recommendation would be try and start getting work a little bit earlier on even if it's a junior level type thing or go to one of those sites we mentioned previously, like topcoder.com or, or project oiler or some of those things. Yeah. Oiler. Um, so do some of those things because you would be shocked at how much better you'll get and how much faster it'll happen if you get involved in doing that kind of stuff. And then you can put those kind of things on your resume like you, or even putting code into like a GitHub project. So I'll let these guys carry on with this, but that's my two cents on it. Yeah.

Starting point is 00:09:37 I mean, so as a 21 year old, I will say that, you know that my take on it is that it's definitely – I feel like there's never a wrong time if you feel like you need to reinvent your life. I don't feel like there's ever a wrong time to do that. If you feel like you need to do it, if it's something that you want to do, if it's something you're passionate about, go for it. What frustrates me, though, and the problem is, is that there's, you know, there have been people that have put data together to form, to back it up. And I'm not saying like, I've bothered to investigate how accurate the data is or whatever, but like, here's an example

Starting point is 00:10:20 that I quickly Googled that, that bothers me. And it's a quote, it's maybe a well-known quote from Mark Zuckerberg. And he says, I want to stress the importance of being young and technical. Young people are just smarter. Why are most chess masters under 30? I don't know. Young people just have simpler lives lives we may not own a car

Starting point is 00:10:46 we may not have family simplicity in life allows you to focus on what's important and i kind of understand what he's getting at but it's frustrating to say that like well if you're not in your 20s then you can't solve technical problems yeah that's not so much an issue of getting a late start as it is, uh, just not being the age anymore. And it kind of implies that people are going to age out of the peak of their career early.

Starting point is 00:11:12 Well, and that's kind of like what, you know, going back to Lou's email though. I mean, he was talking about like companies going after, uh, you know,

Starting point is 00:11:19 being willing to hire someone, um, you know, in their upper ths to early 40s was the age range that he had questioned. And that's why it bothers me. Like when you see, here's the head of a major dot-com company, right, making a statement like that, and you're just like, wow.

Starting point is 00:11:41 Yeah, what's he say behind closed doors? Exactly. I'm sure the official HR legal answer is that, no, we'll take anyone. But he's kind of vocally saying, being quite public about it. I will say if you take it a step further than that, I have seen and heard of companies, like when you start getting closer to retirement age,

Starting point is 00:12:01 when you're in your 50s, then it becomes a question, right? Like, do we want to bring on this guy that we know that in three years getting closer to retirement age when you're in your 50s then it becomes a question right like do we want to bring on this guy that we know that you know in three years he's basically going to be leaving that's always a question i think the age range that he's talking about right here that he'd be in that's not usually the question but there are companies like the facebooks of the world that think that you know hey um the younger generation is where it's at probably for several reasons one they're cheaper two they they don't have all the other things they don't have families that are in the way so

Starting point is 00:12:31 they can work 16 hour days and they think it's cool that there's a playstation 4 in the break room right i've definitely been in those environments where they're like oh my god they let us spend the night here and right you know we got xbox yeah and so i mean it's it's a different there are definitely places you're going to walk into that all have different vibes but again i will go back to like there there is always a place for somebody who is good at what they do and fits on a team well because there are plenty of people out there who are not good at what they do and don't and don't meld into a team properly and i agree and that's why i worded it the way the way i did was that if it's something you're passionate about and you want to reinvent your

Starting point is 00:13:14 life you know go for it yeah you know i mean don't just do something you know half-heartedly because you think like that it might be better it should be something that you're truly interested in so that you'll have that drive to want to continue bettering. I mean, I look at it this way. I've been doing this career for quite a while, and I still study new topics all the time. So just because I'm not submitting homework doesn't mean I'm not still studying. Right. Has to be something you're passionate about.

Starting point is 00:13:51 Yep. And also want to stress the importance of networking, you know, especially with, you know, a resume that's going to look a little different than the people you're competing with. to go out there and shake hands and get to know people at meetups and also being able to show something, some sort of side project or some sort of something on GitHub or somewhere else just to kind of show that you do have a passion for this and that you're for real. That's a really good point. I mean, even someplace like Stack Overflow, right, where you get cred for answering questions or that kind of stuff. Yeah, that can never be stressed enough is the importance of networking. It, it really is underutilized by a lot of people, especially in the tech industry. So yeah, excellent, excellent email. Excellent question.

Starting point is 00:14:37 Hopefully we've given you something to walk away with positively on that. And the next thing I wanted to do is in going into the whole education thing. It's been a few years because I am also 21 since I took the course on algorithms back in college. And I, I don't know if you guys know about this, but there's, there's sites out there to where you can actually take university level courses for free. And one of the sites out there is called Coursera. And there is an algorithms part one course that I'm just going to take, uh, that starts in July,

Starting point is 00:15:14 I think in just a few days. So I, we're going to leave a link in the show notes and anybody else who wants to join me and maybe, you know, see, see how well they do in algorithms. It's going to go over things like big O notation.

Starting point is 00:15:26 What's also important to note, though, that for those that aren't familiar with Coursera, this is a Stanford University class. It's major. And you also get, it's not like you get college credit for taking it, but you actually get a certificate of completion and that kind of stuff. Like it's real. You have to do assignments. They get graded. Um, you'll have quizzes. You'll have, you basically watch videos throughout the week. And I think they said that it's about six hours a week of coursework and it

Starting point is 00:15:56 lasts six weeks. So anybody else who wants to polish up on their algorithm skills, uh, you know, come join me on this. And like I said, we'll leave the link in the show notes. And, and one cool thing about it too, is any of you guys who are interested in interviewing at the Microsoft's, the Google's, the Amazon's of the world, something like this is fairly much crucial.

Starting point is 00:16:17 So, you know, that's it. Yeah. Great. All right. Also, I wanted to mention that I got the

Starting point is 00:16:27 notes out of order a little bit here but independent 6 is out we actually just heard so I think they fixed a couple of your gripes Alan they did which is awesome so the two actually one of them that both Outlaw and myself shared

Starting point is 00:16:43 was theming so independent 5 or yeah it was five right yeah independ five did not honor your theme you could change it at dark you change it whatever you wanted it didn't care it was just going to stay with its um white theme and then the other thing was the whole double clicking on the column dividers in a grid it would just collapse everything down to nothing they fixed that that as well. So, so these guys listen, I mean, can you say that about a lot of companies out there writing software that you use day to day? No, it's cool. It's a coder tool written by a coder, you know, so he eats his own dog food and that's really cool. Yeah, that's most excellent. So if you guys

Starting point is 00:17:21 haven't checked it out and you, you need a static analysis tool to tell you how good or bad your code may be and areas to improve upon definitely check out independ and if you're curious to hear our previous uh review and conversation about this this product you can go back to episode 15 it was titled independence on how good your code is yep very nice okay so i think that brings us to the topic so this week we're talking about on how good your code is. Yep. Very nice. Okay, so I think that brings us to the topic. So this week we're talking about other solutions for storing hierarchical data.

Starting point is 00:17:52 In episode 28, we mentioned two, adjacency lists and this and that. So we're going to be probably comparing back to those a little bit. So feel free to go back an episode if you want to catch up on that. And this time we're talking about two new strategies for solving the same problem or similar problems, uh, path enumeration and closure tables.

Starting point is 00:18:13 Yep. So let's go ahead and get rolling on this one. Path enumeration is also referred to as materialized path. I've seen that written in a few places. So this goes both for the same. And this one's actually fairly simple. If you think about like a bread chrome, that's essentially what this is. So if you have a hierarchy of items, which again, continuing on the hierarchical path here, then essentially the way the data is stored is it will be your root node slash the sub node slash the

Starting point is 00:18:48 sub node slash the sub node or and it doesn't have to be slashes it could be periods i've seen them pipe delimited i the delimiter is not important it's just the fact that it's basically stored in a string parent slash child slash grandchild slash etc great grandchild yeah and so the concept's super easy i mean you guys have probably seen this stuff written out i mean it's very similar to what you see in your windows explorer right if you go to your drive you see c colon and then a folder and then another folder it's just it sub all the way down so i i mean other you know in all of our when we were prepping for the show like that analogy that you just came up with windows explorer never once came up and it is the most beautiful way to visualize what a materialized path or path enumeration is it is exactly what it is colon

Starting point is 00:19:41 slash program files slash microsoft visual studio 12 slash common slash ide slash okay so i've gone there a few times so if we imagine like a database of all our you know representing our file system basically then i would have a record for say you know my host file and associated with that i would have a field that contains the full path to that item and that's really nice because i can say things like select star from files where path like C program files. Yep. Right?

Starting point is 00:20:12 Yep. And then also you could even take it a step further. Now here's where it kind of sucks though because indexes don't play nicely with this. But you could actually say where like hosts, right? And then it would go get you all the paths that have a host file as the very last string in there. So, so you could do that again. SQL indexes usually aren't smart enough to be able to, to reverse up like that. You'd actually need a full text index or something to,

Starting point is 00:20:42 to make that fully efficient. But yeah, I mean, it's really easy to query. It's well, but see, here's the thing. And I was really hoping that maybe one of you two guys would have like a, to be able to talk me into it.

Starting point is 00:20:54 Cause I've never been a fan of this, you know, anytime I've seen it in a, in a table for any reason where it was trying to, you know, the purpose was to just flatten out the hierarchy and I've never been a fan of it.

Starting point is 00:21:11 You say that it's easy to query and I would agree that it would be easy to write the query, but from an optimization or from a performance point of view, you're taxing the server because it's doing a bunch of string manipulation to do that search.

Starting point is 00:21:31 So while the query might be easy to write, the work that the server is doing isn't as easy as other hierarchical data models that we've talked about. Yep. It's not normalized at all. And there's no integrity. There's no referential integrity Yep. It's not normalized at all. And there's no integrity. There's no referential integrity checks. It's a string. So my question to you guys then is,

Starting point is 00:21:51 talk me into why I would want to use path enumeration. I won't. Yeah. Yeah, I mean, it is convenient. I've done it. You know, that's fair. I asked for it. Yeah, there's been times I've done this before.

Starting point is 00:22:06 I consider it an anti-pattern in databases, but there's been times when I've popped a common delimited list into a field for convenience, and sometimes that's all you need to do. And it's not great. It's not scalable. It's a lot of extra data, and it's not great for – there's no integrity. Indexes are kind of yucky, but it's quick and dirty. Well, and I'll give you another thing that if you take it a step further,

Starting point is 00:22:30 when you just start talking about database architecture and how it works behind the scenes, if this list gets big, now you're going to have your data splitting across page files. Like your performance can really start taking a major hit. Oh, what if you decide to change the middle of that structure? Yeah, and that's... You lop off one of those tree nodes, and now you have to go back and rewrite all those strings. So here's why I will not ever talk anybody into doing this one.

Starting point is 00:22:58 If you look at it from the perspective of maybe a product catalog in an e-commerce situation let's say that you have 10,000 products that all existed under some subcategory and all of a sudden you're going to get rid of that subcategory and you're going to put it under another one typically the way these things are maintained are through triggers so if you change what category it belongs to it's going to go back through and automatically update all those fields. It is horrible because now you made that one change where you just moved all those products from 1.3. This is you trying to talk me into it?

Starting point is 00:23:32 No, no, no. I'm telling you why I would never. Oh. So if you moved all these 10,000 and something else, it's now going to update all 10,000 of those products. And it's going to have to re-query to find out the entire structure for each one of those to then update it and the problem is okay that sounds bad enough but when you start introducing that into something like a transactional system you start locking those tables and so everybody else is trying to access that data is now shut out so i don't know

Starting point is 00:24:02 i don't know did we give like an actual concrete example but I don't know if we did or didn't, but let's just say for simplicity, you have three records, okay? A record one, ID record one, ID record two, and three. And one is the parent, two is a child of one, and three is a child of two.

Starting point is 00:24:23 So your path enumeration might look like one slash two slash three on record three on record three and record two would be one slash two. Yeah. And record one would just be one one. Correct. Right. So, so there's this column that could be on the same table as those records or it

Starting point is 00:24:43 might be in some other column, but or some other table. But the point is, is that all of that data would be in the same table as those records or it might be in some other column but or some other table but the point is is that all that data would be in the one column so kind of like how we talked about adjacency list where every record knew its immediate parent in a path enumeration every record knows its entire lineage that is the one and only pro that I can come up with. Oh, there's actually a really good one. A huge advantage over nested sets and adjacency lists is you can actually have multiple lineages.

Starting point is 00:25:16 So if you break this item out from its lineage, you can actually have a breakout table and have multiple lineages. So in a file system example, like we talked about, you could have linked files or soft links. In an e-commerce example, if you had a gaming mice, you could have that under electronics, computers, accessories, mice, and you could also have it under toys and games.

Starting point is 00:25:40 Okay, okay. That's cool. There's another example. All right, so fine. Multiple parents and cool. There's another tree. So, all right. So fine. Multiple parents and every child knows its entire lineage. But what absolutely sucks about it is let's say that you do have this lineage and it's five deep,

Starting point is 00:25:56 right? And you want to know, instead of just the ID of it's of the middle category, you want to know the name of that category. You don't have to parse that string. Oh, right. You have to split that string on whatever the delimiter is and say,

Starting point is 00:26:10 okay, now I have to assume that's an ID. And I say assume because there's no referential integrity there. Right. And now you're going to have to say, okay, that was supposed to be a number. Let me try and join over to this category table to get it and hope that nothing blows up.

Starting point is 00:26:23 See, here's, in my mind, just, and this is true for any type of code that you're ever going to run into in your life. In my mind, if you're doing any kind of serious string manipulation in order to do something like you have to manipulate, manipulate some string and start, you know,

Starting point is 00:26:40 you're running a regular expression against it to start getting out tokens so that you can decide what you need to do or, you know, what actions to take. Like that's a serious code smell. Something's wrong. And, and that's,

Starting point is 00:26:51 this is what path enumeration feels like to me is that just like you said, you gotta have some regular expression to break that apart into its separate tokens so that you can get what's the, the third, uh, you know, ancestor above this particular record right it feels wrong yeah and it is i i would never ever ever recommend doing this so so when would you but that's the point is that there's there must be some value to it or else it wouldn't be

Starting point is 00:27:22 a thing right no one would do it it wouldn't exist unless somebody found value in here's a particular use case where path enumeration will just rock your world i'll give you a few if you don't have a database system that supports something like a cte or any kind of hierarchical data set type stuff my sql is an example you can basically you can get away with doing this because let's think about it in SQL Server in the olden days before you did have CTEs. The olden days. The olden days back in 7. The way that you would do this kind of thing is a lot of times this is what you'd see because there was no built-in

Starting point is 00:27:59 way to do it and the only way to do what you needed to do was to write some sort of stored proc that would loop over things with a cursor usually and then insert into some temporary table to get your hierarchy. So, Joe, I think what he's telling us, and you tell me if this is what you took away from that, Joe, is that this is an outdated way to do it. Yeah, if only there was a way to solve this problem and store these associations but maintain referential integrity. Woo-hoo! All right. So is that your way of telling me that we should segue off of this? I think that was a sagoo.

Starting point is 00:28:36 Yeah, I think so too. Well, before we jump into that, though, let me take one quick moment to say that uh you know you've probably heard me ask for this before but if you haven't already we greatly appreciate it if you would take the time to go to stitcher or itunes wherever you prefer to find your podcast and drop us a review it helps others to find us and we greatly appreciate that and when you take the time to do that, we actually see the results of it. And we really appreciate it. And it really makes us feel good when we read some of these results.

Starting point is 00:29:11 Yeah, it's huge. So please do. And to make it easier on you, you can go to www.cuttingblocks.net slash review and then pick your poison. iTunes, Stitcher, whatever. So please do that.

Starting point is 00:29:22 Also, I do want... We don't touch on this very much but so our show notes like we take pride and put a lot of blood sweat and tears into doing the show notes on this and we try to make it in a way that whether you're using an iphone or or an android device or a tablet of some sort most of those like i i know you guys use the podcast app on iOS. Yeah, the built-in iOS podcast app. I use Pocket Cast on Android. And if you go to the show notes

Starting point is 00:29:52 on any one of the episodes, we've got it set up to where you can click the links easy to go look at whatever we're talking about. Yeah, you can read the full-on show notes straight from the podcast that you download. Yeah, you don't even have to go up to the website. We're not trying to tell you to stay away from the website, but we want to make it easy on you.

Starting point is 00:30:09 We want you to have the information. Yeah, so you can go there. And these polls and these surveys that we mentioned, you can actually have the links there in the actual show notes as well. So if you want to be able to participate but you forget or whatever you know while you're driving just go i'm kidding not while you're driving you could go to the show notes and literally we'll have the links there and everything else so so if you weren't aware that you can see those directly in your in your pod listener uh you probably can just just explore a little bit and uh for those that do like to hit up the site you know or if you haven't already be

Starting point is 00:30:45 sure to hit up some feedback on uh on each episode too you can go down there and leave uh your comments directly to a particular show yep so all right so to uh kind of bring us back to the topic we were just talking about uh what's it called path enumeration crap enumrap enumeration? Oh, yeah. Path. Sorry. Yeah, we're not a fan of this one, but you basically associate the breadcrumbs with your record node, and it's easy for querying, but it's gross to maintain, and it just feels dirty. No, it's not easy to query. It's easy to write the query.

Starting point is 00:31:16 It's easy to write the query. I like that. There's a difference. There is. Okay. Alright, so that moves into, if only there was a better way. And there is. Coincidentally enough, that works with any database system out there. I don't care what it supports.

Starting point is 00:31:35 We're talking to you, MySQL. The closure table method. Wait, is this a JavaScript thing? It could be. So let's see if I can, if I can somewhat put this into, I want to go three levels deep. Like what Michael said a little while ago, just, just to keep things simple, starting off level one is the top level two is a child of level one and level three is a child of level two. All right. The way a closure table works is level. Two is a child of level one and level three is a child of level two. All right.

Starting point is 00:32:06 The way a closure table works is this. It is a separate table. It's off by itself. And I'm going to, in the same vein that, that, uh, Michael told me on the nested set,

Starting point is 00:32:18 I'm going to go ahead and throw in this additional information on this one as well, though, in a closure table, you have an ancestor column a descendant column and then you don't have to but you add in like a depth or a level column and that's easy for querying but essentially what you've got now is this this table is going to be every single combination of every record and its associated descendants or ancestors

Starting point is 00:32:47 that that exist period so first off let's do the big o because we actually had somebody ask about big o notation on the nested set model and we went looking for it i think out he's gonna go look for it while i while i talk through this i forgot to i forgot to even mention that at the start of the show so with here's here's the one downside to a closure table is given big o notation it could actually be o n squared all right that is absolute worst case scenario which almost never happens because that means everything would always be at the very bottom level but here's how it. But that's a pretty bad one, though. It would be bad, but it doesn't ever happen that way.

Starting point is 00:33:30 O of N squared is bad, but it won't typically ever happen that way. It's usually quite a bit less than that. So just to make sure I understand it, we're talking about basically storing the association between my node and everything in the hierarchy. So if we're looking at the 1, 2, 3 example,

Starting point is 00:33:48 that means I've got a row in there for 1, 3, a row in there for 1, 2, and a row in there for 2, 3. Well, let's take it all from the top because you missed a few. So you're going to have 1, 1 to itself with a level of 0. Then 1, 2, level of one,

Starting point is 00:34:07 because that's his child, then one, three, level of two, because it's his grandchild, but then you're also going to go down to the next one. So hold on, because so far, we've only described the hierarchy for one part of the node, and he's already added three records.

Starting point is 00:34:24 Where the ancestor is one right so so all these were ancestor one all right now if you wanted so now you've also got to do this for ancestor two and three so now you're also going to have a record for two two level of zero because it's itself two three level of one because that's itself to 3 level of 1 because that's it's child so now wait we had 3 records now we're at 5 now we're at 5 records

Starting point is 00:34:52 now for that bottom record number 3 you've also got 3 3 level 0 so now you've got 6 records for that 1 child all the way at the bottom now it sounds bad right It does sound a little bit bad, but here's the deal. The reason O N squared, if you assume that everything was always

Starting point is 00:35:14 at the bottom, that's why you get N squared, because basically it would have to do it for everything up the chain. You're not always going to have everything down at the bottom level. So that's not going to be it. It's going to to it's definitely going to run into a decent number of records but let's talk about why it actually is cool this way first off you're storing typically typically if you're doing guids you probably just go shoot your server right now but typically you're storing integers right for referential lookups so querying these things is uber fast, incredibly fast. Inserting is actually pretty easy. And assuming that you don't have a crazy number of hierarchy that's 100 levels deep,

Starting point is 00:35:59 then the inserts are also going to be pretty quick as well. That's the problem, it seems like, with the closure table, though, is that it really is, if you have a crazy tall tree, then this is going to break down in a performance point of view. No, it probably won't break down except for rights. Your reads are going to be still just incredibly fast, but your rights could run into an issue. But again, keep in mind, this table is not unlike the adjacency list or some of these other methods we've talked about. This is usually a standalone table

Starting point is 00:36:31 that is literally ancestor, descendant, level. It's three columns. So, and these columns are typically integers, not always. If you have some sort of key that's not a surrogate key, then it might be something like a guid or it could be a string or something but let's say for for simplicity's sake that all your records were using integer indexes right so basically you have three columns and they're all

Starting point is 00:36:57 numbers so searching this even if this was an incredibly tall table is going to be stupid fast because you can index it. You can index it based off ancestor off descendant, and then you could also tack a level on to each one of those. And now you can query this thing 20, which ways from Sunday and it's going to be just fast. So, so just to put this into different words, right?

Starting point is 00:37:23 If, if in case if it's not already clear. So what this is doing is it's storing every path from each node to each of its descendants. Correct. Right? So there's a couple things that I really like about the closure table. Number one, like you mentioned, simple integers, assuming you're using integers for your ID columns, that you

Starting point is 00:37:47 can easily index. I like the fact that you can get the referential integrity there. That's awesome, right? I like that, like the nested set model, you can easily get an entire tree. Yep. Right? Where this one has an advantage over the Nesta set model, though, is that you can, without having to do any kind of crazy math, you can just easily get, here's the hierarchy. Yep. Right?

Starting point is 00:38:18 Where you basically had three columns, an ancestor column, a descendant column, and a depth column. Yep. had three columns an ancestor comma descendant column and a depth column so whatever your uh whatever node that you want the tree down from you can just easily query say select star from whatever your closure table name is where ancestor id equals the node that you're interested in yep the node on the tree that you're interested in and then you can get all of it from there. Yep. It's, it's real easy. And, and let's just to give some examples of what, what Mike was just talking about. So one of the things that's kind of cool is let's say that you want all the descendants of something. The query is select all from tree paths. Let's assume that

Starting point is 00:39:02 we named our closure table, tree paths, select all from tree paths. Let's assume that we named our closure table tree paths. Select all from tree paths where ancestor equal N, whatever the number was. That gives you all its descendants because you're saying, hey, give me everything where I am the ancestor, right? Super easy. And then if you wanted to order it, you basically say, hey, order by the level. Done. Now you actually have a nice little thing that says, all right, so here was my child, my grandchild, my great grandchild, whatever. Super easy to do.

Starting point is 00:39:36 Now let's take the inverse. Let's say, hey, we want all the ancestors of this particular thing. Select all from tree paths where descendant equal in. So where I am a child, give me everything. Because there's a relationship to every single record walking all the way up the tree down to whatever that item is it's very easy and then you could again just order by the level in this case you probably do level descending to get it going in that order up from you so again super easy now here's an interesting one though let's say you you're going to insert an entry. And so Joe and I were talking about this beforehand. And one of the things that you'll see a lot online are comment systems, right? Like a, like a form or something, because if,

Starting point is 00:40:16 when you have a commenting system, right? Like the first person who comments, he's comment number one. If somebody replies, well, that's a child of that first comment. And if somebody replies to him, then that's a child of that comment. So, so that's why a lot of times you'll see that. Well, one of the things that's interesting is, all right, so you insert that comment into your comment table, right? And let's say it's comment 10. Well, the way that you would insert this into your closure table now, because now it needs to be able to relate itself to every single comment that was up above it in the hierarchy you'd say insert into tree paths and then you'd say your ancestor descendant and level now instead of just a values type insert you're going to say select

Starting point is 00:40:56 ancestor and then whatever that that new common id is so 10 and then do level plus one from tree pass where descendant equal whatever the the immediate parent to that comment was and then that is literally going to insert the records for all the parent records all the way up and then you'd also have to insert itself as well so you'd say insert into tree pass 10 10 10, 0, right? So it's pretty simple. Now, here's the thing. Like we said, if this is a really tall tree or a really deep hierarchy,

Starting point is 00:41:32 that could be 100 records if you're 100 levels deep, right? Oh, actually, it's much worse than that. I just pulled some numbers for fun. Oh, yeah, it would be worse than that. Yeah, it's much worse. And this is where that N squared comes in. So basically, in our simple example of 1, 2, 3 with the depth of 3, there were 6 records. Now, if we had 10 records with that same depth structure, so 10 is a child of 9, is a child of A, is a child of 7, then you end up with 220 records.

Starting point is 00:42:00 Woo! 100 records. But that's not N squared yet. Well, it's actually pretty close. So if you had 100 records, right? Now we're looking at 10,102. Actually, wait.

Starting point is 00:42:16 10 would be worse than N squared. What did you say it was? N is 10. And we end up with 220 records. Yeah, that's worse. Yeah. And that's where like... The depth.

Starting point is 00:42:32 Yeah, because I was going to ask you, if we had to decide... Okay, so we talked about Ness' set model in the last one. Hold up, though. What did you say? What was that other one after 10? 1,000, yes.

Starting point is 00:42:43 Oh, good God. He said, oh, for 1,000 oh geez what that would be i have no idea i do some math i'm gonna say a million uh half a million it's basically it's n times n plus one divided by two it's just a basic summation uh but yeah we're looking at 500,500. 500,500. Easy for you to say. Okay. So here's the part. Before we go too much further.

Starting point is 00:43:12 All right. Does anybody have a tree that's 1,000 levels deep? Right? Yeah. If you do, we need to talk. Yeah, that's 1,000 items with a depth of 1,000. Yeah. Okay.

Starting point is 00:43:24 Yeah. So before we start bashing on the closure table, I did want to come back with one comment that you haven't said yet. And that is that closure tables allow for multiple parents. Yes, much like the other one that we were

Starting point is 00:43:37 talking about. The path enumeration. Yeah, you can do as many as you want. So now let's bash on closure table. Okay. It could be actually many as you want. So now let's bash on Closure Table. Okay. Because it could be actually much worse than this. So in the last episode, we talked about adjacency list and Ness's set models, right? And I think we, at least Alan and I, are big fans of the Ness's set models. Joe's kind of indifferent.

Starting point is 00:44:03 Is that fair to say? Right. So my question, though, would be, I'm going to make you pick models, Joe's kind of indifferent. Is that fair to say? Right. So my question though would be, I'm going to make you pick now, Joe. Because you really care about adjacency list versus Nessa set model, but closure table versus Nessa set model. I'm putting you on the spot.

Starting point is 00:44:18 Yeah, if it's a true hierarchy, I'm definitely going Nessa sets. It's easy to query. It's easier to maintain. But if I need a true graph if i need cycles then i think that uh closure tables is my solution wait you said the nest set it's easier to maintain well i think he means in terms of like the batch process that's going to do it yeah well yeah you know easier is tough to define but to me it's just a lot less data and it's more human understandable it's like every time you make a change, it's not 100 inserts or updates.

Starting point is 00:44:47 No, so the nested set model is... Yeah, you can't do that with a nested set model. Yeah, they're both horrible. Yeah. Thanks, Joe. You know, let's just screw it. Let's just do path enumeration. It supports cycles.

Starting point is 00:45:00 It's simple. And string parsing isn't that hard. And that's probably why it's a thing that concludes this episode thanks all for listening right right no so let me hear your take on it alan so same question to you here's my thing i do closure pretty much just about every time the only thing that would drive me away from it is if I knew I was going to have a tree depth of of greater than I don't know 50 if you got if you got there then it starts becoming a thing because here's the deal if if it's performance that you need but it's transactional I don't think there's even a question you have to do the closure if it's something that you can do in batch like we talked

Starting point is 00:45:45 about with the nest set we've both done them before they are super fast to query and actually very easy to write queries against unless you're doing inserts and updates and whatnot my thing is space is really cheap nowadays right and ssds are on the on the market and people are using them in servers so well let me let me clear up one thing because when we're talking about the big o SSDs are on the market and people are using them in servers. Well, let me clear up one thing because when we're talking about the big O notation, we're talking like O of N squared rows for closure table, right? That was the worst case in terms of the number of rows. So when we're talking about big O notation, we're not talking about in terms of complexity to create it or to query it, right?

Starting point is 00:46:24 So that comes back to a whole nother conversation right good point so so if we're talking about in terms of space since that's what you're talking about then nessus set models is less way better it's going to be oh event yeah it's going it's going to be whatever it's one to one but it has this many rows you got this many rows but it does not support multiple hierarchies and you can get cyclic uh issues no no. You can have multiple hierarchies within a Nessus set table. What you can't have is multiple parents from a node. Multiple parents. My bad.

Starting point is 00:46:50 So both closure tables and Nessus set models will support multiple hierarchies can coexist in that same table. But the parents. Except closure table will allow multiple parents. Parents. And here's one other good thing. So we talked about before like if you're trying to batch process something like a net set model and there's some sort of you know something references a parent that that it shouldn't have you get into the circular dependency it's going

Starting point is 00:47:16 to crash everything out right you don't necessarily have that same issue over here you build those records on the fly and then it's done. So querying it, I don't think you run into that same circular breakdown that you would with a nest set model. But here's my thing. Space is cheap. The nested set model, or not the nest set, the closure table is easier to insert records into, easier to delete from, easier to update, and super stupid, easy to query. And it's a very small table and you don't have to batch process it like in my opinion it wins it pretty much across the board except for space if you're talking about space and space is at a premium then then this may not be your key. But in just about every other situation, it seems like it wins.

Starting point is 00:48:08 Okay, so I'm going to put you back on the spot again. Let's say you have 500,000 records. That's a lot of records. You were talking about a tree, if your tree was 1,000 deep. But what we're talking about here is it doesn't necessarily have to be a deep tree, right? So you have 500,000, whether that's,

Starting point is 00:48:31 um, 500,000 products or in the, when we talked about Nessa set models, we talked about, uh, employee hierarchies. So it could be 500,000 employees. You have 500,000 records.

Starting point is 00:48:42 Which one? If there are if they're multiple parents then there's no question right like you basically gotta go that way but let's say if it's five hundred thousand what do we say in the level is five deep unknown you gotta have at least some sort of you're walking into it you don't know you have some process you're either going to create a process that's going to have at least some sort of cap. You're walking into it. You don't know. You have some process. You're either going to create a process that's going to look at some data and flatten it into a closure table or it's going to flatten it into a nested set model. So the first thing I would do is I'd find out how many

Starting point is 00:49:13 levels deep it was. And then after that, after that, I would say, I mean, I would probably and this is. Go back to your little calculator that you had there, Joe. What's 500,000?

Starting point is 00:49:30 No, no, but that's levels deep. That's not fair. So in all fairness, like truly what I would do. No, no, no, that was supposed to be based on the number of rows, not based on the depth. Oh, come on. You said an employee. Are you going to have 500,000 people and they all report to the person above them? Yeah, that's not going to happen that's not gonna happen no no but what i'm saying is uh wasn't the o of n squared it wasn't refer n wasn't the depth was it it was the number it's the number of rows but it usually

Starting point is 00:49:54 assumes that the deeper you go the more rows it creates but but at any rate so it honestly trying to solve that problem what i would do is the thing, I would probably try to do the closure table, find out how many records it ends up with overall, given whatever depth, unknown depth and unknown number of reports at whatever level. I would do that. If it only hit a couple million rows, I'd probably stick with it. 500,000 squared is going to be 250 billion.

Starting point is 00:50:23 Yeah, that's not going to happen. Yeah, I have the number about it. I just don't know how to say it. So, actually, I got it. Okay. 125ity billion. Zero, zero, zeroity, deep, zippity. I feel like a Googleplex is going to come in.

Starting point is 00:50:44 You can't Google how to save this number. But, I mean, seriously, that's like one of those things to where you don't know what's going to fall out until you do it. So the way that I would approach it is I would probably attempt to close your table first, find out, hey, what's the damage, right? What's the amount of space this is taking up on disk? After I index it, how fast does it run? Because, let's be real right any any database where i feel like the the advantage to ness's set models assuming you don't have multiple parents this is where the advantage to the ness's set model comes in because i don't

Starting point is 00:51:15 have to know all that i don't have to take the time to say like it's also been for rows let me figure out how many levels there are let me let me do a test run and see like how long did this thing run how many rows did it create how much space does that take up in fairness that would be another thing that i would take into into consideration as well is hey is this something that doesn't change is it not transactional if it's not transactional maybe then that's what is the way to go because it's going to be faster who cares if it takes two minutes for it to create all the records however with the closure table if it is transactional that's going to be the first thing i look at because i can't i can't wait there are some additional requirements but i wasn't specifying that right so i i mean i like

Starting point is 00:51:56 the idea of being able to using straightforward sql to create update delete and query these things on the fly right that's the beauty of the closure table um whereas with the nested set it's not you you basically do have to create some sort of process to update these things now that being said there is something really nice to be said for hey you have a one-to-one relationship with every single record right that's that's excellent fast and and easy to actually just look at and kind of see what's going on yeah i mean what about you joe did you give up on all of it are you doing a document database structure now yeah i'm actually looking at other strategies for solving hierarchical data. I mean, there's...

Starting point is 00:52:48 And this particular show... Is it called Solr? We haven't... Well, I mean, there you go. That's actually one thing. But we haven't even... We weren't even going to talk about it on this particular show.

Starting point is 00:53:00 But there's always something like a document database, right? To where you can literally store whatever you want as subnodes of something, and then you don't even have to worry about any of it. We should talk about search engines. That'd be a good show, I think. Yeah, I think it would be. I mean, well, let me, so you teased it a moment ago about the big O for adjacency list and net set model. And so what I want to say was that coincidentally,

Starting point is 00:53:29 bigocheatsheet.com was the tip of the week in that episode. And bigocheatsheet.com has all of the operations for adjacency list already spelled out in Big O notation. Nice. So normally when you talk about big O notation, you usually use N, but these were not. So for example, storage for an adjacency list was the O of absolute V plus absolute E as an example. But then there were other ones that were kind of easy, like adding a vertex was O of one or adding an edge was O of one, right? So you're not going to get much better than O of one.

Starting point is 00:54:19 No. Now, this is where my research became more difficult was as it related to a nested set model. What was it in terms of big O, uh, notation for nested set model? And I couldn't find anything out there. But if you think about what a nested set model creates and I'm, you know, I'm open to feedback, but as I was thinking about it,

Starting point is 00:54:53 what I came to in my mind that it, that a nested set model actually ends up being is an NRE tree or a KRN tree, depending on which one you prefer to call it, right? It's not a binary tree, but it's, it's an NRE or KRE tree, which is essentially the same as a binary tree, except they can have multiple nodes at each level. Well, in a binary tree. Okay. Let's be clear a binary tree you have at most two nodes uh per any given you know one per parent right and the left is always lower and the right is always higher so so in a in our tree there is you know zero one and two are valid, but it could also be three, four, five, six, you know, whatever. There's no set number of how many nodes any one branch could have on it, right? And I couldn't find any place where, like, where in big O documentation, you know, notation where this was documented. What I did find, and this is also was already on the Big O cheat sheet,

Starting point is 00:56:05 is that specific to binary search, the operations were more often than not, for access, search, insertion, and deletion, binary search in Big O notation is O of log of n right for the worst case scenarios for those same operations it's o of n right so that's a that's in terms of binary search now both of those scenarios are not bad right um they're you know the worst case scenario is you do one for every. Yes. Yeah. It's not O of one, but at the same time, it's definitely not O of N squared either. Right. But that's just storage. But that's for a binary tree. Right. If you have taken the time to do calculations to calculate what its O notation is, then maybe you have a better answer than me.

Starting point is 00:57:12 That was the best I could come up with is that it's got to be somewhere close to that. I know it's not the same as binary, but I feel like that's going down a good path. But it really depends on... But that's the tough part too, though, because we're talking about SQL, right? It's not a programming challenge. Well, but these could be in-memory structures too.

Starting point is 00:57:39 They don't have to necessarily be... You could have your Nessus set model just as an in memory structure you know in your program right you could have your you could represent your closure table or your path enumeration or your jcc list or your binary search tree like you could represent all of these things as just in memory objects yeah i don't know how it how it translates though it i mean i get what you're saying like if you're going to do that programming wise how you could actually do that search but when you think about

Starting point is 00:58:10 it about how databases work right where they index things so that they know how to quickly find particular elements like with the nested set model you're just saying hey where it's between this and this and so i don't really know how to put that into, I don't know how it equates is what, is what I'm trying to get at. Like that's a very fast operation in a database. And I don't know the complexity because I'd have to, I'd have to dig way deeper than what I ever have to find out exactly how the database is storing that,

Starting point is 00:58:44 that information and how it would equate to doing it in memory and something like C sharp or Java or whatever. So I don't know. Nested sets are stupid fast and they're highly efficient. I don't know what the big O notation for it is. Yeah. So, so I guess I'm putting the request out there that if anybody else has an

Starting point is 00:59:04 answer as to the big O notation for nested set model, I would love to know that. Yeah. So we were actually asked that question on episode 28. So if you go to www.cuttingblocks.net slash episode 28, it's down there at the bottom of the page. If you know what it is, you can reply directly to him we i don't think we have yet so uh anyways yeah that's that was uh that was an interesting thing that came out of that particular episode yep but anyways that's uh that pretty much that wraps up that section of it we've uh we've pretty much gone over in depth all the hierarchical data structures that maybe we're aware of.

Starting point is 00:59:46 Not all. There's a bunch, but they start to get weirder after this. Those are the primary ones that we've looked at. There's other combinations where you combine a Ness's set model with an adjacency list or whatever.

Starting point is 01:00:01 There's some permutations that you can get into. Yeah. Now it is time for our tip of the what tip of the week the weeks oh and i do love somebody left us a review and actually called us out for that that was awesome oh yeah well like you don't call us out every time. Well, you know. We try to get into this. It's funny. We need to change that. Anyways, all right. I like it just the way it is. Tip of the week.

Starting point is 01:00:31 All right. So I'm going first this time. This is Joe. My tip, we actually got feedback on that too, saying our names more often. Tip of the week. So mine comes from Scott Hanselman again. He had a nice tip a couple days ago about using or setting up Chrome incognito mode as your browser default in Visual Studio. So when you hit debug, it pops up Chrome in incognito mode, which is going to give you a new session. It's going to clear all your cookies, all that sort of stuff.

Starting point is 01:01:01 So it's a nice way to get a fresh browser and a fresh session when you're debugging. Well, that's really cool. Huh. All right. You mean you don't use IE8? Right. All right.

Starting point is 01:01:16 So my tip of the week, I don't remember what I was doing that required, like, large data sets, but I was... That's always the best way to start a story. I mean, that's... So there I was. Yeah, right? I mean, I don't know.

Starting point is 01:01:30 Anyways, I ended up coming across several resources for creating data sets. And one of them that I thought was really cool was Amazon.com has exposed some public large data sets. Things like NASA NEXT. There's the Human Genome Project. Common Crawl Corpus. So this corpus of web crawl data composed over 5 billion web pages. So these are all data sets that are freely available.

Starting point is 01:02:03 So if you're trying to mock something up and you, and you need some data to be able to test some things out with, this is a fairly cool little resource where you can go and get real data for free. So that's on a aws.amazon.com slash creepy public dash data dash sets. What's creepy. I said, it sounds creepy.

Starting point is 01:02:25 It's, I mean, it's kind of cool. One of them is the U.S. Census data. Yep, from 2000, right? 1980, 1990, and 2000. Yep, so pretty cool stuff if you ever need some real fake data or some real data to play with for mocking some stuff up. All right, and so here's a get one for you. But I'm going to warn you to use this sparingly.

Starting point is 01:02:53 I was kind of nervous about even talking about this one. But let's say you've done all your commits, and let's assume that you haven't already pushed anything. Nothing has been exposed out in the wild. You've done all your commits locally while you've been working on something and now you're ready to bring in the latest version of some remote branch. Let's just assume it's master and you're ready to package your unit of work up and send it on its way. Well, you can do a rebase and your pull all at one step

Starting point is 01:03:33 so that you can grab the latest from that remote branch, in this case master, and replay your commits on top of it to make, when you do your final push to the remote repository, it looks like just one simple little clean bubble of commits rather than seeing your commits interspersed through other commits that other team members may have done and the commit the command is this get pull dash dash rebase equal true and then origin and master so get space pull space dash dash rebase equal true space origin space master, and that'll do it. That's actually really cool. And then you would do your push,

Starting point is 01:04:33 and then that way when all your team members see your commit that you made when you do your pull request, it'll just look like one clean set of commits that have been built on top of it. Now I say, oh God, please use this sparingly because if for example use it smartly okay let's say that you're working in some branch and you have already pushed that branch to the origin then you don't want to do this. Right. Okay? You only want to do this if your stuff has never seen the light of day anywhere else. Because if you don't, then it's going to end up erasing things that were previously public. And so there'll be no...

Starting point is 01:05:22 Well, not necessarily erasing. I wouldn't word it that way. Okay. What could happen is, let's say you're working let's say that alan and i are working on a on a branch and and we're sharing that branch on some remote repository and so we've already got a bunch of stuff in there and alan has based commits off of things that are already there if i were to come back back and do some rebasing on it, it would change his history

Starting point is 01:05:48 of what he's been creating commits off of. But there's a bigger problem too, though, is that if you have already done this push in this example that we're talking about, you couldn't even, if I did this git pull rebase, git pull rebase true or master command and then tried to push something that has already been published, that push is going to fail because it's going to tell me I have to do a pull first in order to make it match because there's already been history that's been pushed remotely. Interesting. been history that's been that's been pushed remotely interesting so that's why i say like you you definitely want to use it sparingly and only when history has not already been

Starting point is 01:06:31 published externally yeah oh and uh one thing that's important to know is when you rebase it doesn't squash all your commits into one. So any commit messages you make that might be somewhat... You better mean them. Yeah, might be a little offensive. You know, like, I can't believe this. Are you telling me that you used whatthecommit.com to get your best commit messages? I might have gone one up on that. But yeah, just know that all your commit messages will still be there

Starting point is 01:07:06 even if you thought they were just going to be your local copy. The random commit for the evening is its secret. Nice. Yep, that's I think we've hit it all, right? That was our tips

Starting point is 01:07:21 of the week. Wrap it up with a little summary. We hope you've enjoyed the last two shows about different ways of storing hierarchical data. Last episode, episode 28, we talked about adjacency lists and nested sets. Just to do a real quick recap on those, adjacency lists is when your node stored the parent ID, which was simple but scaled poorly.

Starting point is 01:07:43 Nested sets, we were just talking about that, but each node keeps track of its place in a hierarchical tree, and it's fast to query, harder to maintain. The two we talked about this time, path enumeration, you basically store the breadcrumbs in simple text and you parse it out, which is pretty gross. And then there's closure tables, which is similar,

Starting point is 01:08:01 but with data integrity and can potentially, but maybe not end up in lots of extra rows. Yeah, so with that, subscribe to us on iTunes, Stitcher, or more using your favorite podcast app. Be sure to leave us a review as well on those platforms. Tell a friend or three. Make sure you spread the word. We really do appreciate that. Yep, and contact us with a question or three make sure you spread the word we really do appreciate that yep and contact

Starting point is 01:08:26 us with a question or a topic leave your name and however you'd like to be mentioned on the show and visit us at codingblocks.net where you can find our show notes examples discussions and more yeah and send your feedback questions and rants to comments at codingblocks.net and make sure to follow us on Twitter at Coding Blocks. And that is it. We'll be back soon with another episode. Soonish. To celebrate the launch of version six of Indipend, they are offering a 20% off discount through the end of July 2nd. I know that's only a few days away, but if you head over to www.independ.com and you enter in the offer code INDEPENDV6, coding blocks, all one word, then you can get 20% off their latest version of their static analysis tool.

Your Ad Here

Coding Blocks - Hierarchical Data cont’d – Path Enumeration and Closure Tables

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.