Coding Blocks - Data Structures – (some) Trees

Starting point is 00:00:00 You're listening to Coding Blocks, episode 97. Subscribe to us and leave us a review on iTunes, Stitcher, and more using your favorite podcast app. And check us out at CodingBlocks.net where you can find show notes, examples, discussion, and other stuff. Send your feedback, questions, and rants to comments at CodingBlocks.net. Follow us on Twitter at CodingBlocks or head to www.CodingBlocks.net and find all our social links there at the top of the page. With that, I'm Alan Underwood. I'm Jacek. And I'm Michael Outlaw.

Starting point is 00:00:31 This episode is brought to you by O'Reilly Software Architecture Conference. Have you got any plans this February? No? Well, now you do. This February, the 3rd through the 6th in New York City, O'Reilly is hosting the Software Architecture Conference. I mean, you want an excuse to visit New York City anyways, right? So, I mean, you're welcome.

Starting point is 00:00:53 The O'Reilly Software Architecture Conference is the only conference that focuses exclusively on software architecture and the evolution of that role. So if you want to dive into technical weeds by covering complex topics from microservices to domain-driven design with different learning styles available, ranging from 50-minute sessions to two-day training courses. Learn how to communicate, wait, scratch that, sell complex technical topics and their merit to both upper management and technical teams with training courses like the architectural elevator. Well, wait, I'm not an architect yet, you say? Well, a special networking experience called Architectural Kadas is where you can go to practice being a software architect by breaking up into smaller groups and working together with other people on a project that actually needs development. Yeah, network with people using the same tech stack that you use to gain personal insight

Starting point is 00:01:47 that you can apply to your own environment. The O'Reilly Software Architecture Conference covers the skills and tools every software architect needs. Use the code BLOCKS, that's B-L-O-C-K-S, during your registration to get 20% off of most passes to the O'Reilly software architecture conference. All right.

Starting point is 00:02:10 And we're back talking about trees tonight and, uh, tries to, and we'll get into that here in a second. Uh, differences between, uh, yeah,

Starting point is 00:02:19 for a spit podcast. So, um, I'm going to read the iTunes, uh, reviews. Really big. Thank you to sound shop and Tim Graff.

Starting point is 00:02:25 You know, really appreciate that. And from Stitcher, we have K Springs, Trickermand, Delden, and Flow. Very nice. And I've got something that I want to bring up, and I think it was maybe Coding Blocks Episode 74. I don't remember off the top of my head. Of course, you remember the numbers right off the top of your head. Actually,

Starting point is 00:02:46 I'm going to go there real quick and see, but here's the thing. You remember when we were doing clean architecture and we were talking about the whole thing where it was really cool that they coded to an interface and so they could swap out the storage and all that kind of stuff, right? Yeah. It was the story where he never,

Starting point is 00:02:58 he never coded to a specific database and instead he assumed file system and then he just kept going with that assumption until they eventually realized they never actually needed it yes and so they could just plug in whatever they wanted right so it is episode 74 so christian satnick he's actually written in a couple of times and right after we did that episode he's like you know hey i see that you guys were impressed by that he's like but wait a second you know i'm paying for this super awesome technology out here why would i code to some generic interface and not to not take advantage of the technology? That's a good point. And it's funny. So Joe and I just recently ran into a similar situation where it's like, okay, the interfaces that we might have had in place were very tailored towards a specific technology. They weren't written to take advantage of that exact technology, but when they were created, they definitely had that in mind, right? So some of the limitations of what was there

Starting point is 00:03:58 and some of the ways that you know that you'd have to do things were part of that interface and then we were going to switch over to a new technology and it had all kinds of awesome things in it that basically made things easier and faster but in order to take advantage of that you'd have to redo your interfaces right you'd have to redo those things that you were trying to make generic to make them awesome. So, so in other words, the abstraction wasn't abstract enough. Not even that, like there's, there's just a certain level of. Well, part of it was more hat kept it in mind is what I was getting at. Well, it was made in a very generic way. And by doing – so it was very abstract, I guess, going back.

Starting point is 00:04:51 It wasn't that it wasn't abstract enough. It was very abstract. But doing it that way meant that you could not take advantage of the new thing you wanted to plug in. Okay. Right? And it was – we actually had this conversation joe and i did where we were like well crap what do we do man do we do we just code it to this provider interface that we did previously which seems really stupid why even move over to this new technology if we're not

Starting point is 00:05:17 going to take advantage of everything it has to offer or at least a lot of what it has to offer and i i don't know what what have we ended up landing with there? I mean, it definitely made me think about some things because coding to the least common denominator, you start losing advantages. And then, you know, we talked about the example of like the file system database. Like maybe the reason they stuck with file system

Starting point is 00:05:38 was because if they just dumbed down the interface of the SQL database so much that it wasn't worth switching, then it wasn't really worth moving over for. And so you got to kind of include that in your calculations. But it's really hard because when you talk about coding to interface or something like that, a persistence layer, you're assuming that it's like a logical kind of easy modular swap. And it may not be the case if you're going to something completely different. So I think about like if you're making a video game, you don't want to totally abstract around like the graphics rendering engine for example like if you're doing stuff in direct x 11 or direct x 2 or maybe a 2d

Starting point is 00:06:13 engine you know how crazy do you want to get like if you try to make your game so it'll work in a 2d engine or a 3d engine it just doesn't even make sense right so there's going to be technical decisions that you make that either limit or greatly increase your capacity. And so it's tough. And I don't really know what the right answer is. But I think in that case, like in our case, where things just have wildly different capabilities, I think it makes sense to just swap over and maybe treat that new thing and create new interfaces around that so you can swap it out with something that has more of a feature parity with it. I don't know if that's the textbook answer, though. I mean, I was going to say, well, now I lost my train of thought on it, of what I was going to say.

Starting point is 00:06:58 Well, while you get that back. It'll be a minute. So I'll draw two parallels that I, that I think are interesting. So we've talked about search in the past. We did a whole episode on search, right? And we talked about elastic search specifically. Now here's a good example of where that doesn't necessarily translate perfectly. If you look at something like AWS's cloud search and you try and compare it directly to Elasticsearch, the basic search features are there. But Elastic bolts on a ton more, right? Like you've got things like crazy types of aggregation that you can do.

Starting point is 00:07:40 You've also got machine learning type stuff that is built in if you want to take advantage of that. So if you code to the lowest common denominator between those two things, you are basically just cutting off a ton of functionality that you could take advantage of in Elasticsearch versus CloudSearch. Does that mean one technology is better than the other? Not really, per se. It's more along the lines of, hey, they built this one out is better than the other? Not really per se. It's more along the lines of, hey, they built this one out to do more, right? Yeah, I mean, okay, so I remember what I was going to say, but your comment there just made me think that like another example too could be like if you were not even go to like search engines. It could be something as simple as like Oracle versus SQL Server. Yeah. And instead of being able to take advantage of the features that T-SQL might provide for you,

Starting point is 00:08:28 instead you're like, well, I wanted the ability to plug in either, so I'm just going to use ANSI SQL. Yeah. Just to keep, I'm going to dumb down all the SQL statements. So there's inefficiencies that you're interjecting because you're trying to dumb it down.

Starting point is 00:08:42 But the previous thought that I had that eluded me a moment ago, though, it almost makes a case for basically the anti-clean architecture approach, which is YAGNI. You ain't going to need it. Just code it the way you need it. Yeah. Why bother to assume you're going to switch this thing out?

Starting point is 00:09:05 Because the reality is, are you? Were you? Right. And it's hard. It's really hard to make those decisions up front because you don't know. Right? And I do want to point out, like, we had a great thread going here on the episode 74 in that he's also seen this happen in other places. And, and ironically, Joe and I have also talked about this and actually outlaw you and I probably have as well is you'll see this being done in the cloud also. Right.

Starting point is 00:09:31 You've, you've done something on premise. You have an application that works a certain way and, and you're like, well, I just want to make this, this thing, this square peg fit in this round hole up in the cloud. Right. And that's, man, that's not the way to do it. Like if you ever take the time to start looking at cloud services and what they offer you, if you want to get bang for your buck and actually take advantage of what the cloud can give you when you're using something like AWS or Azure, Google Cloud, you want to look at the services they offer, their managed services, and figure out how you can take advantage of it. It shouldn't be, hey, I'm going to go drop my app

Starting point is 00:10:12 in a VM up in the cloud and run it because there's several things, right? You're not going to get the bang for your buck because you're going to pay a lot for that compute. Scaling is going to be an issue. Scaling, you're not going to get that for free because now you've got to manage it all yourself. Whereas if you use some of these cloud services, you get that. So it was just, it was a really great conversation because the same things that apply in terms of code also apply when you start looking at infrastructure, like cloud infrastructure versus on-prem infrastructure and all that. So just I wanted to bring it up because, I mean, it was such a good, you know, we talk about things and I love it when people come back with a differing opinion on things because we all thought it was cool, right?

Starting point is 00:10:56 Hey, man, you know, Uncle Bob Martin created this interface that he could basically plug a MySQL thing in, a file system thing in, and it is cool. But he also had a very simplistic thing that he was dealing with, and that makes the decision a little bit easier. When you start looking at complexities of systems, then you've got to start figuring out. So maybe that's the takeaway here. Maybe that's the deciding factor is how complex is the thing that you're thinking about abstracting? Maybe the more complex it is and the more efficiencies that you could gain from adhering to it, maybe that's when you decide. But then that sounds so counterintuitive because it sounds like that's the exact case that you want to – I don't know, man.

Starting point is 00:11:40 You know what? This is all hard. We're done. It is hard. I think that's the answer though, right? CodingFlox didn't even make it to 100 episodes. We're shutting it down. We're done. Yeah, it is. It is hard. I think that's the answer though, right? CodingFox didn't even make it to 100 episodes. We're shutting it down. It's done. Episode 97.

Starting point is 00:11:49 The conclusion after 97 episodes is this is hard. Right. It's all hard. So, yeah. I think you just have to recognize the decisions that you're making and realize that if you're committing to a technology or committing to a certain framework or feature or whatever, that you just have to know what you're doing and kind of keep an eye on it, I guess. Yeah. Well, making an educated decision is definitely helpful.

Starting point is 00:12:13 But also, if you make that educated decision, don't be afraid when the new one comes up to not look at it and go, you know what? Maybe this generic interface doesn't make sense, right? Like if you're going to be switching to a new technology or whatever, don't just stick to it because you're being dogmatic about it. There needs to be a good reason, right? And I think that's the answer is there are no hard, fast rules and how you should do things. It's what's going to add the most value to you and what is going to produce a nice usable product and something that you can maintain. So I don't know. Yeah, it is hard. So at any rate,

Starting point is 00:12:53 that was my tangent. I had to bring it up because it was such a great conversation and I appreciate Christian like interacting back and forth on that one because this thread's been going for months now. Ever since we had the thing, like he'll have a thought and he'll post it up there and then I'll have a thought and I'll post it up there. Yeah, I've noticed this conversation's going on where there's long gaps where you've been thinking about that one for a while. That's awesome. Let's get into the meat of this episode and talk everything or something about trees.

Starting point is 00:13:26 No? Yeah. And I did a little bit of research on kind of why trees matter. I've kind of been a little scared of them, even though I've used them in various cases, just because they kind of seem like a hardcore design pattern compared to something like a design pattern. Data structure compared to something like an array or something that I use every single day, multiple times a day, like hash tables and array, like day in, day out. Like I'm slinging those things.

Starting point is 00:13:48 Like those are my nails and screws, but trees aren't so much. And whenever I get to a tree, it becomes something that I ended up Googling a little bit about and end up kind of spending some extra time with. And so I've always kind of been a little, uh, little,

Starting point is 00:14:01 uh, we're going to say like respective of them. Is that a word? Hesitant? Standoff-ish, yes. A little hesitant? Yeah, I give them a little wide berth, give them some extra consideration.

Starting point is 00:14:13 And that's because they're one of the most important data structures. And there's certain types of problems that are really difficult, almost impossible to solve without tree data structures. And they're also really efficient for a certain kind of task, so we're going to dive into some of that. Do they all involve

Starting point is 00:14:27 interview questions? Because that's what comes to mind. Absolutely. It's impossible to solve interview questions without an understanding of trees. You're just, but that is so true. If you go to an interview at one of the big companies, big tech companies out there, I can almost guarantee that

Starting point is 00:14:43 there will be some tree questions in there. Yeah. I also know who writes this stuff. They kick it back and be like, ha, try this one. This is a part of my senior dissertation. Oh, man. I haven't thought about it since. And part of the reason it's so hard and part of the reason that you don't really –

Starting point is 00:15:02 I kind of mentioned giving them some respect. It's because a lot of times you need recursive algorithms to work with them. Not always, but they just kind of lend themselves to those types of problems and that tends to come up a lot, which means anytime you've got recursive solutions and you've also potentially got stack overflows, if you're not careful, if your language doesn't support efficient tail recursion. And it's also just can be a little bit difficult to think like that

Starting point is 00:15:24 if you're not used to doing that stuff and i'm definitely not doing recursion as much as i am getting stuff out of hash tables and looping through arrays hey back up there real quick what is uh tail recursion oh gosh uh i'll have a hard time explaining it but so we know a little bit about we've talked about in past episodes about how um whenever you call a function it will uh it will allocate information and size for the frame for basically all the local variables and value types in that function and it adds it to the stack right so the stack trace and when that function is finished it will remove that item off the stack and you'll get that memory reclaimed that's really awesome because it's really efficient because whenever that function's gone, it just cleans it up.

Starting point is 00:16:07 You don't have to worry about a garbage collector or any of that stuff. So stack is traditionally a really fast way of doing things. And what tail recursion means is that there's something in the language that can recognize that the stack frame is unnecessary for the current iteration of the function. It can say, I don't need any data from a previous call of this. And so I can go ahead and do another recursive call. And instead of adding another frame onto the stack, I can just use this existing one because I can see that there's no information that I'm going to need from that frame. And I'm sure I'm butchering this. I probably should have read

Starting point is 00:16:49 up on that a little bit in order to kind of explain it a little bit better. But that's the gist of it is that you don't pop or that's a bad term. You don't push a frame onto the stack for every single call. It's like a sliding window is what you're saying. It's basically reusing that frame as you go down through the – when you're recursing. Yeah, it doesn't even slide, though. It's just the same one. So if you're doing Fibonacci sequence in like a crappy – the standard kind of recursive call, you're going to get frame, frame, frame, frame, frame, frame with all those local variables on there in the stack until you finish. And then you can go and reclaim that memory. So if you're doing Fibonacci number for a really big number

Starting point is 00:17:25 and you're using a kind of a naive basic interpretation or programming, then you're very likely going to hit a memory cap. And that's why a lot of times you'll see like Project Euler or some like Code Wars type stuff or Advent of Code will kind of mess with you a little bit on things like that where if you try to do a recursive solution, you're going to pop the stack. You're going to blow the stack. With tail recursion, it all sticks there.

Starting point is 00:17:50 So some languages don't support it. A lot of languages don't support it. I know Python doesn't. JavaScript I don't think does. But certain languages like Lisp and definitely all the functional languages do support tail recursion. That's a big selling feature because it saves you a ton of memory and it's hyper efficient.

Starting point is 00:18:06 Very nice. Thank you for backing up. What are we talking about? That's episode 97. Yeah. Yeah. We agreed on everything was hard and we're done. Yeah.

Starting point is 00:18:17 I think we can all agree that recursion just makes everything a little bit more mind bendy and hard to work with, right? It does. And you have to not just is the problem harder. It's like you said. You have to worry about things like stack overflows and all that. So it's not just worrying about the problem. It's worrying about what language you're using and what features of the language are available. Yeah.

Starting point is 00:18:40 When there's a recursive problem where you're kind of debugging something, like when you're done, the solution might look really eloquent and might be really easy to understand. But if you made some little mistakes, a little edge case, like you could destroy your whole network, burn your house down. Yep. It can be really difficult to even debug because these things are kind of spinning off in weird ways. So it can be difficult to work with. And Alan, actually, our most popular coding box host ever, has talked about storing hierarchical data in a database. You know what's so crazy about that? I looked at that. I wrote that back in 2014.

Starting point is 00:19:17 It's still number one. Yeah, it's insane. It was a product attribute database type thing. So, yeah, kind of interesting. I plan on updating that at some point in the future. Yeah, and it dealt with the problem of storing hierarchical tree type data in a traditional relational database, which is not easy to do. And that's another reason that makes along with them with those existing persistence mechanisms that are so common on our tool belt, and things get hard.

Starting point is 00:19:50 If you ever worked with product categories or something on the e-commerce site, and the product could be in multiple categories, and the category can be in multiple categories, and things just get really weird really fast, then it's really easy to blow that stack if you're not careful. Yep. There are a couple reasons why programmers like trees, though. Mainly, I think, for me, is that it just models real-world hierarchical data. I can't fix the fact that we need trees.

Starting point is 00:20:15 I can't fix that we have directory structures and e-commerce websites and org structures for companies. I can't do anything about those. I need to model those. And this is, I think, the best way of doing it because every other way is really awkward. But you mentioned in the article, if you're curious how to do that sort of stuff in a SQL database,

Starting point is 00:20:33 and that's a great rundown of common techniques that you can use that are actually really efficient for doing that. But yeah, I have to agree. That's much more awkward than having a node with a left and a right child. Oh, you're actually talking about a different article. Yeah, you're talking about the one where we had the episode about the different ways of storing hierarchical data in a database.

Starting point is 00:20:55 That was using the – Well, that was actually episodes that we did on that. Oh. Yeah, that's not – We did a couple – I mean, there's the database article that you wrote that was just about categorical data. Yep. But then we did a couple episodes way back when on hierarchical data, I think, was the name of the series. Yep.

Starting point is 00:21:18 And it was only like two episodes, so maybe me calling it a series is a bit much. But Ness's set model was one of them. Path enumeration, closure tables. Yep. I think you just hit all of them. That's amazing that you just pulled that out. Yeah, I was trying to remember even the name of one of them. I thought there was at least a fourth one. There might have been.

Starting point is 00:21:38 But yes, so yeah, going back to it. But what Joe was talking about was actually those episodes where we talked about that kind of stuff. And it's pretty cool stuff, and it's different ways to approach it in a database, hierarchical data. So, at any rate, back to what you were saying, Joe. Yeah, so it's really good for modeling certain things in the real world. Also, it can be really memory and processor efficient for different kinds of problems, which we'll get into. And I really like the idea of it being um good good enough for good enough type solutions like if you have a tree problem a lot of times i'll

Starting point is 00:22:10 be using like machine learning or other places where you can get kind of close enough so you can set like something like a max depth or a max number of search if you're looking for some sort of value and you can give up at a certain point so these algorithms really fit nicely with this kind of notion like of um like game trees or something if you've got like a chess engine they're really famous for kind of trying to prune data or trying to give up after a certain level of iteration because it's probably good enough to make a decision on even if it's not optimum and it works out really well because we've got that depth thing that we can ignore all right Just to circle back one quick moment, for those that were curious about the hierarchical data episodes,

Starting point is 00:22:47 it was episode 28 and 29. Golly. And adjacency list was the fourth one that I forgot. Man, it's been a while back that we did that. Those were good episodes too. If you're ever looking at hierarchical data in a database, like we go deep on those. There's some good stuff.

Starting point is 00:23:04 Yeah. And I do want, so one other thing about this, the trees, hierarchical data in a database like we go deep on those there's some good stuff yeah i and i do want so one other thing about this the trees why do they matter one of the things that has stuck with me from the imposter's handbook that i think is probably it might be the most important thing that they said in the book to me was just knowing the these types of things that are available helps you make a good decision when you have to go do something, right? So knowing about trees, if you have a problem that comes up that you typically would have solved with an array or something like that, if you know about the other types of data structures, then you might make a better decision up front and know that the complexity of what you're about to do is much harder or much easier or whatever if you choose the right data structure, right? So I think that's just understanding what trees are and why you use them is going to be

Starting point is 00:23:54 major in being able to do things as you go forward. Yeah, having an idea about what the data is that you're trying to model in your computer's memory, right? Yep. And then that can help you understand what might be the best data structure available to you. Yes. If you're trying to solve a tree problem without a tree, then you're in for a world of hurt. Seriously. Boom, crushing hurt. How would you do that? And pretty much all the solutions that we're going to talk about are algorithms.

Starting point is 00:24:24 A lot of them can be done iteratively, but it's just kind of ugly. The recursion works out really nicely. I got a nice list of common uses of trees here. We mentioned hierarchical data, but one thing I thought was kind of cool is we mentioned doing it in the persistence layer where

Starting point is 00:24:39 you've got categories. We do it in middleware with C Sharp or some sort of language that's running on a server server where you're dealing with like directory structures and stuff like that but also it's really common in the ui where you've got like navigation trees or or something like that you know like control panels that kind of let you expand nodes and do things so i thought it was kind of a cool thing where we have like very data-centric uses for this and also very visual use cases that kind of map nicely. So I thought it was kind of cool to mention.

Starting point is 00:25:10 And interview questions, of course. That's the real number one reason to get familiar with trees. Another cool one I always forget about is actually structured files. So like XML, that's a tree, like an HTML document. You've got HTML with a body tag in it and the body can be made of you know more and more divs like thousands and billions of divs and they all all go in a tree directory so when you're browsing that source explorer whatever in chrome developer tools you can kind of expand those nodes in order to drill in further

Starting point is 00:25:36 same with jason jason jason well jason is the uh the scarier version right uh search engines make use of in a couple cases we'll talk about um uh sorted list of data one thing i thought was kind of weird is a wikipedia listed a workflow for compositing digital images for visual effects does that make any sense to you? Maybe. Because if you're compositing in any kind of visual application, you're just layering things. Yeah. I don't know.

Starting point is 00:26:14 Yeah, I mean, that was kind of a – are you thinking like Photoshop layers? Is that what you're thinking of? Yeah. I kind of had a similar thing in my mind. But I don't see it as a tree. I see more as a layer of things. But I guess if you were to make the... If you go back to the directory structure

Starting point is 00:26:28 on your computer in File Explorer, then the, quote, directory structure in the layers within Photoshop can be the same thing because you could nest all kinds of groups. Yeah, you can. And you can also link various different ones. Let me show you the CodingBlocks logo documents that I've got. PSTs I've got. Let me tell you, brother,

Starting point is 00:26:44 there's definitely a directory tree there to see how that logo is made and all the different variants I have for it. That's awesome. That's awesome. Yeah, so I thought that was kind of interesting. That's definitely not a use case I would have thought of. Router algorithms, if you're familiar with some of those,

Starting point is 00:27:00 remember back to networking class days, if you had that, then trees come in handy. And we're going to go in depth on a couple different types of trees. But I did want to go ahead and mention the ones that I saw in the Wikipedia page just because there were so many. And I was really surprised at all the different types. Just the categories. So of the categories, I've got seven.

Starting point is 00:27:23 Binary trees, bee trees, heaps, trees, which is kind of annoying, but I kind of think of them as like valley trees. So, they're like the standard trees. I don't appreciate them having a type of trees named trees. So, I'm going to put a citation needed tag in there or something. Multi-way trees, space partitioning trees, and application specific trees those are the seven types and i counted up the actual individual items in each category and i came up with 115 oh yeah yeah like it's insane man i mean some of those have like i remember it was uh i think binary maybe there was like 21 different types of binary tree.

Starting point is 00:28:07 Yeah, that's ridiculous. And then when I was doing research. So we're going to start going through all of them, starting alphabetically. Yep. Because that's how all the good things are ordered. Saddle up, people. Hopefully you didn't look at the length of this before you hit play. It's going to be a long haul, guys.

Starting point is 00:28:25 But what's funny is when I started diving into the individual research items, I was looking at tries specifically. We'll get into that. There were, I don't know, 8 or 9 or 10, 12, something different kinds of tries that were all very specific to certain applications. So even this 115 number is way too small. There's way more than that if you start looking at subtle variations right and that's really what they all are right it's just very

Starting point is 00:28:49 specific changes to each one to make them fit a particular need a little bit better but you know i mean it kind of made me question that too though i mean okay so obviously we're not going to go into detail on 115 different types. No. We'll give you enough to like, you know, wet the palette and, you know, you can come back with your own curiosity. But, you know, I mean, in fairness, though, like when we talked about arrays, for example, and lists, we didn't go over all of them for that, too. But there's like two dozen different versions of arrays if you look at the same Wikipedia page. We'll have a link to it in the show notes. But if you look at the same Wikipedia page, there's probably two dozen different

Starting point is 00:29:31 arrays types that are mentioned there. We didn't go into the detail on all of this. It has been nuts, but it's kind of cool to know that that stuff exists. So if you're having a problem with one, you can do a little bit of research and see what else is done. It's also really cool too if you look up common common solutions to problems then you can see sometimes like oh use a sparse reverse index you know try and then you can do a little bit of research on

Starting point is 00:29:53 that it's really cool that we live in a world where you can look up something so specific and niche and find find an example a lot of them are named for people all the way so if you want to get a tree named after you apparently it's pretty freaking easy so just twiddle some bits and you know store a letter instead of a number or something and call it the joe tree now we're going to call it the jam tree isn't aren't you all about jam nowadays i am all about jam yeah but you know thanks but but thanks to dance to die though i can't help but think of jam and think of joe allen michael that's right that's what I'm saying. It has to be that. It's the Jam tree. Glanks kind of ruined my life. So, G thanks G Flanks for that.

Starting point is 00:30:33 Turning me on to the Jam train. But like you mentioned, Alan, the basic structure of a tree is very simple in most cases. You can just kind of imagine like a thin wrapper node that has a value that represents the data you care about and a set of children nodes. And there are other ways. Particularly, you'll see sometimes arrays will be stored in – or trees will be stored in an array. And I thought that was kind of interesting. And I've seen that usually for what they call full trees, which it means basically every branch is taken for sure,

Starting point is 00:31:05 so there's not going to be a bunch of sparse data in your arrays. I thought that was kind of cool, particularly in the heap use case, which is one of the four types that we're going to be talking about tonight. And you can have metadata, of course, associated with those nodes, like the file system example. You've got your data, which might be the name of the path that you're in, but you might have permission data about that or some other various information like the file size of everything underneath it that might be useful. And so, you know, we get a little loosey-goosey

Starting point is 00:31:34 with those definitions, but for the most part, that's what it is. So how can you have so many different types with such a simple value in children. So I looked at the different types, and I tried to figure out what was so different about them, and I basically came down to four things that I could find. There's basically either constraints on the data structure, like how many children, or there are rules about how you interact with the tree. How many nodes a given parent node can point to, as an example of a constraint?

Starting point is 00:32:07 And sometimes like the binary tree will say certain values go to the left, certain values go to the right. The red-black tree is kind of cool too. I don't know if we're talking about that tonight, but it's kind of a version where you store an additional piece of information. You basically paint the nodes as you go across them as one color or another in order to kind of inform you and decisions you make and then these variations have really big downstream effects so there are actually three sorry about that so there's constraint on the data structures stored metadata and rules for interacting with the trees which i thought the rules was kind of weird because that's like the algorithm so it's like we have different names for trees based on the algorithms that are associated with it. Well, but I mean, like it depends on how you're defining that, though.

Starting point is 00:32:51 Like where would you put in? OK, because you kind of hinted on there are some rules about where you can put something in the tree. Right. And then there are then there are trees where they have no such rules. So if the tree has those rules, then are you considering that a constraint? That's a good point. Or are you considering that a rule for interacting with it? Yeah, it's really a part of the data structure itself, right? Because it's not something that you have alongside it. And that's why we don't call heaps binary trees if they only have two children. Well, we'll get into heaps in more detail.

Starting point is 00:33:27 Yes, we will. Yeah, like we consider the hashing algorithm that accompanies the hash table as part of the data structure, right? Right. So I guess, yeah, so it is part of the data structure. That's why you have many of them. That's funny. Like I've never really thought about it.

Starting point is 00:33:40 I always think of them as being very distinct, but they're really not. There's a lot of gray area where those things intermingle. Yeah, it's the rules. It's the rules that make the data structures different for the most part. I mean, there's some storage things, but yeah, it's pretty cool.

Starting point is 00:33:56 Some of the storage things are huge. Massive. Let's get into some of this common terminology, but yeah, we'll get to it. Yeah, and what I thought was really interesting and why I wanted to bring up the common terminology is that a lot of the common terminology deals with the tree as a whole or information about the nodes that aren't directly always associated with the node itself. Like you may not know that or a node may not know that it's in root, the topmost point in the tree. It may just be the topmost point in the tree. It doesn't know.

Starting point is 00:34:29 There's nothing special about it. There's nothing in the node necessarily that says, hey, I'm the root. There's no is root flag. But it just happens to be the node that you've got a pointer to, and that's what makes it special to you. Oh, man. I totally know. We've got to come up with our own tree now so that Node knows who it is. And anytime you ask it, it says IAM Groot.

Starting point is 00:34:50 And it would be the Groot tree. That's funny. I programmed a heap earlier, and that was like the first thing I did. It's like if index is zero, it would print out IAM root. That's awesome. That's awesome. That's awesome. So the terminology that I found on FreeCodeCamp, which is awesome

Starting point is 00:35:10 by the way, root is the topmost node of the tree. Edge is whenever you've got two nodes that have basically a line kind of between them. So that's going to be the pointer that links between them. Child is a node that has a parent. Parent is a node that has an edge to a child node.

Starting point is 00:35:29 Leaf is really important. It's a node that does not have a child. Oh, I was talking about this with Robert over at Slack earlier. I really hate that we talk about trees as if they're these things that spring from the ground and go up, and they have leaves up in the skies, and they start from the root and go up. Man, whenever you see these things rendered or drawn or in a diagram, they're always going down like you're looking at the roots of the tree. But... We even talk about the depth.

Starting point is 00:35:56 But hold on. In fairness, though, you don't ever get the trunk of the tree when you're talking about a programming tree. It's just the way it forms out from the top, right? That's why it's a tree. It looks like an evergreen coming down, depending on the type of tree. Well, or, I mean, more specifically, it would look like the root system, but we don't talk about it in terminology that's familiar to the root. Right, yeah, in the ground. No, we're always talking about the leaves up in the sky. Yeah, we're always talking about the

Starting point is 00:36:26 top portion of it. Yeah, they definitely mix terminology, and I'm sure that we've just made things way better for you by going that route. Yeah. But yeah, that was it. So I just want to mention those terms because if we're talking about a really large tree and that good enough type solution,

Starting point is 00:36:42 then we're going to talk about ducking out at a certain depth. So I just wanted to cover that real quick. So if we say something like that, that we at least mentioned it and you weren't confused about it at all. Yeah. So the picture we're trying to paint is the root node is at the top. When we said is at the top most, it just fans out down below it, right? That's really what it looks like. And we'll try to keep that consistent. Yes, we'll try. All right. Starting with, let's talk about the tree growing up no i'm kidding all right so the first one we're going to jump into here is the binary tree and so what is it so unlike arrays or lists or queues or stacks or any of those trees are not linear data structures right so we've already talked about that. They're hierarchical. So a binary tree is nothing more than a tree whose elements have at most two child elements. So

Starting point is 00:37:31 they're usually referred to as the left and the right nodes. So if you have a root node, it has to have one or two child nodes. It can't have any more. And then each one of those child nodes can have one or two other child nodes and it can just keep fanning on down like that so uh the root node as we said before is the top most it has no parent um we already said this about the leaf nodes i think we covered a bunch of stuff up above that um is now sort of just repeating here so i'm going to try and skip a lot of that. I mean, well, let's just say like, I mean, cause you kind of hit it on this, but of all the types,

Starting point is 00:38:11 this one specifically gets its name from the fact that any node is only going to have at most two nodes, which is why it's always like, if you're traversing this tree, it's a binary decision as to which direction you're going to go. Do you go left or do you go right as you're going down, as you're traversing down the tree? So here's where I think it gets a little bit interesting. And this, this sort of goes into what they were saying is there's just so many different types for everything, right? So there's types of binary trees. You have what's called a full binary tree. And basically that's saying that every node has either two children or no children.

Starting point is 00:38:48 So basically it's full, right? There's no just one node hanging out by itself. You have the complete binary tree. And that says that every level is completely filled with nodes except for the last level. They can basically fill up everything except for one right node. So if you look at the tree, it would almost be like a perfect cone, or not cone, what's the shape of that? Like a pyramid shape going all the way down.

Starting point is 00:39:23 And it would be completely full except for maybe one node on the right right like wherever that thing stops it's just done so that seems like such a specific tree example right yeah and i'm sure there's a use for it everything is filled except for one yeah i i'm sure there's like complete well it's not not everyone except for one so let's say that that bottom row could have a potential of 20 nodes. You could stop at number three. So, so if you, if you think about like,

Starting point is 00:39:50 if you were to see something painting that, that tree on a screen, right? Level one, it's going to put one tick level two, then it's going to do the left tick, then the right tick. And then level three,

Starting point is 00:40:00 it's going to go, you know, left, right, left, right, left, right.

Starting point is 00:40:03 So it's going to keep drawing it out. Like you type it on the screen, right right when you get to that last row you can stop basically anywhere you want but there can't be a gap and then be more nodes to the right of it so basically wherever you stop you're done is is what is what that means okay so it doesn't necessarily mean that it's you know all of the items minus one at the end. It's just, hey, where you stop, you're done. There's no more nodes to the right of that. There's no more of a lower descent, descendant. No lower.

Starting point is 00:40:33 It has to end at the same level. So there would never be – so basically, if you think about the – if you thought – Okay, so there's no other, let's say, peers at your level that aren't of your same heritage. They cannot go. Like once you stop, there's no more to the right. So if you think about a Christmas tree, something like that, and you just shaved off the bottom half right of it, that it would be like that last row just stopped. Right. You wouldn't have like one arbitrary Christmas globe decoration.

Starting point is 00:41:08 Hanging off to the right. Hanging off in the middle of that gap. Couldn't happen. That couldn't happen in this. Right. But that's what I'm saying. That's a very specific definition and name for this type of use of a tree. It's really bizarre.

Starting point is 00:41:21 Go ahead. I just want to mention it's really important to heaps actually because it's very important to the heap that it maintains a uniform depth. So you can get in some trouble with certain types of trees where your tree ends up being a stick. Because let's say you've got a binary tree and every node to the right is bigger, and you happen to get a list that's in order. So now every node just goes to the right, and you've got a straight line instead of a tree. And then your searches and stuff are going to be negatively impacted to that because the depth is equal to the amount of your data. So if you need to get the last node in the tree,

Starting point is 00:41:50 then you're traversing through every single node, which is really bad. But heap avoids that by always filling in the tree uniformly. So it kind of fills up like a Christmas tree. So you start at the top and then you go from the left to the right and just fill up first one then two then four and then eight and so on across it and you're eventually going to stop it doesn't have to be completely symmetrical because you may add you know an even or odd amount of items to the tree and so that's where it kind of stops and there's like you took that razor and just kind of shaved off part of

Starting point is 00:42:18 that bottom bottom row on the crystal street because it stops at some point but if you were to add a new one you know right where it would go, it's going to go right there and fill in the part that you shaved off. Yeah. So it's, yeah, I mean, the next one that we, oh, yeah, binary heap is actually an example of that complete binary tree. So just what you were saying, right, where it's filling that stuff in. There's one called the perfect binary tree. And this says that all the internal nodes, right?

Starting point is 00:42:47 So anything between the root and the bottom most level is the internal nodes. Every, uh, all the internal nodes have two children and all the leaf nodes are at the same level. So this is slightly different than the complete binary tree in that they all go down to that last level, but you could have gaps in between those nodes, right? That's really all that means. So you could have little ornaments hanging off the bottom of various different parts of the tree. And they have different lengths of strings

Starting point is 00:43:20 so that they all end up right at the bottom. No, you couldn't have gaps in the perfect binary tree. The perfect one, because the node above it would have to have two children. I'm talking the bottom level. The bottom level. Yeah, the bottom level. The bottom level, but the node above it would have to have two children. They're all filled in.

Starting point is 00:43:35 Yeah. So the bottom row is completely filled in. There's no gaps because otherwise the node above it wouldn't have two children. No, no, no, no, no, no. Only the internal nodes need to have two children. Which that would be an internal node. If you had a tree of four nodes, right, and your third row, right, that fourth row has to be completely filled in because in order for the third row, which is an internal node,

Starting point is 00:44:02 for all of it to have two nodes. That means that the fourth row has all node has all of the nodes too. I don't think it's right because I think, I think the point that they point that they make here is that all the leaf nodes are at the same level. So basically wherever those leaf nodes end, it's, it's that bottom row. They all have to be on that same bottom row. So if it's four they're all there but i think there can be gaps in between everything above it has to be completely filled in though um i'm sure maybe i misunderstood then i i wonder if there's a visualization somewhere i would hope so yeah so they so if you google that you'll see that they do have some of these to where the nodes on the very last level can be sparse.

Starting point is 00:44:45 So the one furthest down, they can be spread out however they want to be. But everything up above it has to be completely filled in. So, yeah, it's kind of interesting. It's a different take on that same piece. And I don't really know what the use the use of that one will be necessarily, but it exists for a reason. Just don't know what it is. You've got the balance binary tree,

Starting point is 00:45:12 and this is the height of the tree is O log N where N is the number of nodes. So this goes back to maths, right? So if the height of the tree is eight, then the number of, no, let's say that the number of nodes is eight, then the height of it's going to be three, right? Because it's going to be two to the third power. So yeah, that's really important for like searching. If you're going down, you know, left to right, because that means at most you're doing that many comparisons.

Starting point is 00:45:45 Right. So in that case, the depth maps to the number of operations that you have to perform. Yeah. So your depth is not going to get huge as a number of your nodes increases. And so that's really nice for, for being able to do that. Probably the breadth first search,

Starting point is 00:46:00 right? Uh, more or less. So that's a, a lot of algorithms that you'll that you'll look up the time complexity and it'll be like, oh, it's O log M. You're like, M? M as in Mario? Why? Everything else

Starting point is 00:46:11 uses N. Why you got to make it weird? But it's because the arrangement of the data actually has an effect on the runtime because it could affect potentially the depth of the tree or whatever the arrangement. So, they have to define it in terms of that, which is pretty loosey-goosey. Yep.

Starting point is 00:46:28 Makes me uncomfortable. And one of the key benefits of this thing is they say it does have great performance because they have O log in time for both search, inserts, and deletes, which is a big deal, right? Like if you're in log in time, I think we said before it's the second best thing you can do so yeah if you have a tree with a million nodes a balanced tree with a million nodes at most you're gonna have to do 20 operations to find an item in that tree yeah that's crazy yeah that's that's incredible man i'm still tripped up on this perfect binary tree.

Starting point is 00:47:06 Because like, yeah, man. Did you see some of the images though? Yeah, but everything that I said supported what I was saying. Because even they're saying that the formula for it is that a perfect binary tree has two to the n plus one minus one nodes. So they say if the height of the node, if the height is zero, so you just have one node, right? Versus if it's one and you have three nodes, so, you know, because you'd have one parent with two children versus if you had two, then you'd have seven. So we're saying that it's full from top to bottom. Right.

Starting point is 00:47:39 That's what I was, that was my understanding is it is, that's why it's perfect. It's the perfect binary tree. Everything is full. Okay. I'll take that back then. Then I don't know why they would have stated it like they did. Why wouldn't they just say everything's got to all the way to the last level? Unless I'm really messed up here.

Starting point is 00:48:00 No, I think you're right. There's one example that i'm surprised that uh i was kind of surprised you didn't include here was the binary search tree uh we'll talk about that for oh we're coming in a moment i think i might have put it somewhere did i put it did i add it i might have i might not have um any rate rate, if I didn't, then I'll cover it quickly. Well, we'll come back to it. Yeah. So this one I think I added just because it sounded cool.

Starting point is 00:48:33 The degenerate or pathological tree. I don't know why I did that. No, I like it better that way. The degenerate or pathological tree. Pathological tree. So each node has exactly one child until you get to the leaf node obviously that's uh that's sort of like what you were talking about where you kind of end up with this big stick going all the way down yeah crooked stick yeah it doesn't really

Starting point is 00:48:56 make much sense so um but you have to eventually at that point yeah it is it is a linked list there's there's no difference there's no reason to use the tree at that unless unless you were to say like maybe the maybe you make an a-frame and the very top most parent node has two and then everything below it but then it's not the same thing right that seems weird yeah yeah so that one's kind of bizarre a lot of these are so based on like your use case though right right so the pros of the binary tree when ordered it can provide sped up search over like a linked list but it's still slower than an ordered array which probably isn't much of a surprise which we should clarify maybe this is where you meant for binary search to come in because like the difference between the binary tree and the binary search tree is that there's no rule about how the things are ordered in a binary tree.

Starting point is 00:49:50 Where the keys live. The nodes can be in any particular order. That's not what you care about. Right. But in a binary search tree, you do care that they are in a specific order. Yeah. Like if I remember right, and I guess I didn't type this in anywhere, but as you go down,

Starting point is 00:50:07 right, the, the keys are always ordered as you're going down so that the numbers are, are going up down the left side, right? No, the numbers to the left would be smaller than the numbers to the right in a binary search tree.

Starting point is 00:50:21 Right. And, and that's for searching exactly being able to get the things quickly be able to make those decisions do i go left or right as i get down this tree um which if you go back to our binary search algorithm conversation now you could really see how you could put these two things together like this is where this is where understanding the data structure and how you might use it and how you're going to interact with it matters because if you know that you're going to be searching this a lot right then you can know like oh hey there's this binary search algorithm and you know maybe i could

Starting point is 00:50:53 get some efficiencies of scale here if i were to like store this in a binary search tree to begin with right right yep yeah uh i want to kind of highlight there too like the reason that you would use a binary search tree instead of an array is because remember arrays are like fixed size and most languages, most implement to the left or to the right or and same with adding items. It's a pain in the butt. But with the tree, it's really easy to insert here. So that's why it's just as efficient for searching as a sorted array. But it's much faster for removing items and adding items. That's the primary advantage that I'm aware of.

Starting point is 00:51:39 Yeah. And to expand on what you said, right, like the shifting in the array is expensive because it's having to move every single item. If you add something in the middle, everything to the right of it has to be shifted over, right? You can get into some interesting decisions, though, that you have to make during that cutaway, during that cut, for example. Yeah. Because if you cut a node out of the middle, now you have to look at the two children and decide which one of those should take over as the parent. Right. And then what happens if, I guess in theory, then the one to the right should already be greater than the one to the left of the new.

Starting point is 00:52:17 Well, assuming that the one from the right was the one that went up to the top. But it could have ripple effects all the way back. I guess he would always be the one that would go up to the top then in a binary search tree. In a binary search tree, you could have ripple effects as you're resorting those things, right? No, no, no. I think it would work out to where it would always be the right is moving up as the new parent. Am I wrong? I'd have to look it up. You'd have to swap in for the parent, but then you'd have to compare it with the left just in case it's bigger.

Starting point is 00:52:48 So it could get a little shifty. Yeah, I think it could have a ripple effect, but I'd have to look at it right now. I can't think of it off the top of my head. This is why the recursive thing kind of hurts. So one of the... Just had a stack overflow in my head. I did. Uh, so another one of the pros is it can be performant on the inserts and deletes into

Starting point is 00:53:09 the tree. It's faster than inserting into the middle of the array, like you said, but slower than a linked list because you might have to move some things around. What's more than one address that you're moving around and you might have to make some decisions like we were saying. Right. And here's one super big key thing that trees have going for them that things like arrays don't, there's no limit to the number of nodes other than the amount of memory you've

Starting point is 00:53:35 got or amount of this storage space that you've got, because all that's happening is each node has pointers to the other nodes, to the child nodes or to the parents. Like that's one thing that I don't think I did much digging in. And it might depend on the implementation of the tree. You always know about your children. Do the children know about their parents? Is it just something where you're always traversing down?

Starting point is 00:53:59 Or is it like a linked list where it has a pointer back and forth? I think it depends on the specific implementation. But in a nutshell, you can add as many items as you want because it's just a pointer to the next node as opposed to memory that you have to allocate for something like an array. Yeah, I don't think I've ever had a parent kind of link back up like a double A linked tree. I'm sure there's use for it, but I never really come up because i i tend to almost always do dfs just by default and so it just kind of really programs really nicely i think and so that's good i've definitely had experience where we did both yeah where you where you knew your parent now where it where trees can get interesting is in situations where you want to know the breadth, right? That can get interesting.

Starting point is 00:54:51 Well, your DOM or your HTML DOM that you mentioned earlier is a perfect example of where you have both up and down, right? Like you can say, give me the child nodes and give me the parent node. So, you know. It's so weird. I've never, I mean, when you said the DOM as an example of it, like immediately I was like, oh yeah, I mean, I can see what you're talking about inside of like

Starting point is 00:55:13 the Chrome DevTools, for example, right? But prior to this conversation right now tonight, that never occurred to me to think of it in that regard. Oh, that's interesting. I just always thought of it as structured data. I never thought about it as like the visualization that, you know,

Starting point is 00:55:31 they do inside of like a DevTools or something. You guys, speaking of which, this should be a tip. You guys remember that there was a Firefox plug-in, or maybe it was even built into it. We did do this as a tip. A 3D? Yeah, the 3D thing where you can actually see the layers split. Oh, man.

Starting point is 00:55:48 I'm going to find that episode real quick. It was so cool. I remember I gave that one as a tip. I'm pretty sure it was me. It was so amazing. And that was early days, man. Yeah. Yeah, that was really cool.

Starting point is 00:55:58 It's old. I only did it like that week. Kind of like live sharing VS Code. Like, this is awesome. Totally forgot about it. Yeah. Man. So the cons, I didn't really have any cons of the tree.

Starting point is 00:56:10 I mean, it's almost like a raise, right? Like, as long as you use them when you should for binary trees, I should say, is, I mean, they have a purpose. Did you have any kind of con that you want? Well, yeah. I mean, like, you kind of said about not knowing what kind you're going to use, how you're going to use it, because if you just use a plain binary tree and you have to search that thing, that is going to be potentially a bad experience for you, right, if you're doing that a lot. Yeah, I'll give that to you. I mean, I guess it's knowing the use case. But you could basically say that about every single data structure we've ever talked about.

Starting point is 00:56:48 I found it. It was called Tilt 3D. It was a Firefox add-on. And what episode was this? It wasn't as far back as I thought it was. That was surprising. It was episode 25. That's not as far back. Dude, that's... We just talked

Starting point is 00:57:03 about the hierarchical data being episode 28 and 29. When I said far back, I was thinking like single digits. Oh, okay. Not 25. I mean, pretty soon we're going to be able to say things in like double digits. Like, oh my God, way back in double digit counting. Hey, in fairness, back in the day, episode 25, we were releasing what, like one a month. So that was probably like two years.

Starting point is 00:57:30 Yeah, but that was back uphill in the snow you know oh man actually you say that jokingly it actually was two years in it was right because because we started in 2013 right that one we released it at the end of march in 2015 oh man that's awesome yep so a year and a half let's give it a year and we'll call it we Yep. So a year and a half. Let's give it a year. We'll call it a year and a half. Hey, you know what? In fairness, we had actually put out a survey. We were like, do you guys like the frequency of the show? And people were like, we want more of it.

Starting point is 00:57:54 Yeah. Everybody was like, I want 30-minute chunks seven days a week. And we were like, okay, we see your 30-minute chunks seven days a week, and we erase you three hours every two weeks. That's right. We've got one minute and two, another con, at least compared to sort of the arrays for the binary search trees, is that trees can be infinite, which is awesome, but they're much less memory efficient. It's like an array. You've basically got one pointer and allocated memory, and you can do those index operations to find whatever piece you need of it.

Starting point is 00:58:27 For a tree, you're looking at a pointer per node, and you've got a little bit of, you know, just overhead of having that kind of wrapper object around it, which is significant if you've got a whole lot of nodes. And so memory-wise, you're going to end up using a lot more with trees, even if you don't have to pre-allocate. So it's something to consider. Good call. Good call.

Starting point is 00:58:43 Hey, here's a neat thing about the binary trees too, though. Each level of the tree represents its logarithmic value. Oh, right. Yeah. Okay. So I sound super smart saying that. Yeah, totally. And I'm totally – should be giving credit to Rob Connery from the Impostor's Handbook for having read the sentence that he wrote.

Starting point is 00:59:07 But it makes me sound smart when I say it though, right? But yeah, so that first node, the parent, the top, the root, right? He's the zero, so log of one would be zero. That's right. And then you get down to the one where we said eight. That's level three. Yeah. That's right. And then you get down to the one where we said eight, that's level three. Yeah. That's very nice.

Starting point is 00:59:28 So we have, when to use is basically a good examples. Like when you need to do comparisons on nodes and you're not sure exactly how many, and you're going to have basically you need a tree and seven array. Yeah. I mean, I have some of the similar things down here that you mentioned earlier

Starting point is 00:59:45 like a file system is is a pretty easy one to see you know you got your c drive then you folder under that and your files under that whatever where those aren't uh binary though yeah because because you can oh you're right you're right so now you're talking about like just trees i mean i don't know that we we specify this but like if there's only two nodes, then it's binary. We're going to talk about it as binary. But then if there's more than that, then it's, I don't know how to pronounce this. K-R-E or N-R-E. That's when it can be a variable number of nodes.

Starting point is 01:00:22 So like in your file explorer example, that's what that would be, right? Yeah. I don't even know why I put that there. Maybe you only have two files on your file. Copy and paste. Yeah, I think I did. Man, that's terrible. All right.

Starting point is 01:00:36 This episode is sponsored by Discover.Bot. Discover.Bot is an online community for bot creators. Amazon Registry Services Incorporated created Discover.bot is an online community for bot creators. Amazon Registry Services Incorporated created Discover.bot to serve as a platform agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. And on its own, a good idea isn't as powerful as it could be. But when a good idea is shared, then it gains strength and momentum. It becomes capable of changing things in a way that is both small and large. A good idea shared becomes an innovation. You got to say it with conviction. It gains strength.

Starting point is 01:01:18 Gains strength. Discover.bot aims to sit at the intersection of ideas and innovation. They want to help people turn their experiences, discoveries, stories, advice, and knowledge into part of a shared canon that moves everyone forward. For veterans and beginners alike, Discover.bot is a placecks. That's discover.bot slash codingblocks to learn about how to get started on your next great bot. All righty. So it's that time to ask you to please leave us a review. If just a logarithm of our listeners would leave a review, log base two, then I don't know what that would be. It's probably not enough.

Starting point is 01:02:04 We probably need way more reviews than that. We need O of N. I don't know what that would be it's probably not enough number we probably need way more reviews than that we need to oven i don't know what you're talking about we need yeah we need we really want an exponential we want two to the to the listeners we've gotten so so addicted to reading those positive reviews and we just need more of them because we've gotten to a point where we really depend on them to get us through those Mondays. So please, if you could just leave us a review. We really appreciate it. We tried to make it easy for you. So if you go to codingblocks.net slash review, there's like links and some screenshots and stuff that'll kind of

Starting point is 01:02:36 help guide you to like Stitcher or iTunes or whatever. And we really appreciate that. So thank you for getting us addicted to your good reviews. Yeah. And also to too, share us. Tell a friend. Spread the word about the show. I mean, we know some of you have spread the word about the Slack channel. But, you know, maybe mention hey while you're at it. You know, give it a listen, too.

Starting point is 01:02:59 It won't hurt. That's awesome. You know who you are. All right. So with that, it's time for my favorite portion of the show, Survey Says. All right. So a couple episodes ago, we asked, what do you want to focus on improving in 2019? And your choices are front end.

Starting point is 01:03:26 There's a three piece service for everything now or back end because flex box done or persistence is king data, data, data. I did that for you. That's nice. Next choice. Algorithms and data structures. Can't go wrong with fundamentals. Clean code.

Starting point is 01:03:55 Master the tactical before the strategic. Or architecture. I've ascended to higher levels of... I can't even say it. You got to do that again because you were going good there. Architecture. I have ascended to higher levels of abstraction. I can't even say the word now.

Starting point is 01:04:16 I don't know. Thanks. Thanks. Yeah. Or DevOps. Good luck doing anything without me thanks i'll have to take care of that i think we got some some weird stuff going on over here yeah and i i'm not a part of it so i'm with you dear listener yes i don't know what's going on blame alan yeah it's my fault. All right.

Starting point is 01:04:45 I assume it's something terrible since I'm not saying it. So assume the worst right now. It's pretty much there. Curse words or nudity right now, just so you know. Curse words or nudity. Hey, you can't see that I don't have pants on underneath the camera view. The camera stops here and then that's it. It's all shoulders up here.

Starting point is 01:05:06 Yeah, this is like the news. I'm starting to guess it before we get that explicit tag. All right. All right, so Joe, you go first. I really don't know. It's so hard. I'm going to say front end because I'm Mr. Jamstack. G flanks.

Starting point is 01:05:22 All right, Alan. all right alan i i'm going to go with man i really i i don't know on this one i'm gonna say algorithms the data structures can't go wrong with fundamentals and we'll go with 25 because i don't think anybody's going to feel good about most of these. Alright. Front end 25%? No. No, no. Wait. Dang it. Sorry. Now I'm totally distracted. Pause. Joe said

Starting point is 01:05:55 front end. Yeah, I said front end. He's upset about it right now. I said algorithms. Alright. Well, you're both wrong. It was architecture. Really? Surprisingly. Yeah.

Starting point is 01:06:11 Everybody's ascended. Yeah. Ascended to higher levels of abstraction. I don't know why I couldn't say it before. What was the percentage? 30%. Well, 29, really. That's pretty good.

Starting point is 01:06:25 If we're rounding, it would be 29. Everybody's moving on up in the world. I like it. So what were two and three? Clean code followed by DevOps. Wow. All right. Yeah.

Starting point is 01:06:37 But, I mean, it was like close after you get past the first two. Then, I mean, clean code and architecture are walking away with it. That's pretty cool. All right. Good to know. All right. So, for this episode, we ask, hey, how good were the holidays to you?

Starting point is 01:06:58 And your choices are, awesome, I got everything I hoped for. Or, I got things I didn't even ask for. Or it was good. I got everything I purchased. Or glad to be back at work. Or I didn't get squat. We know which one Joe's voting for. Bah humbug yeah

Starting point is 01:07:26 man we're gonna have to fix that this episode is brought to you by Clubhouse Clubhouse is the first project management platform for software development teams that brings

Starting point is 01:07:37 everyone together so that teams can focus on what matters which is creating products that their customers love while designed to be developer first and that's important. While designed to be developer first, and that's important, it's designed to be developer first. The UI is simple and intuitive

Starting point is 01:07:50 enough for all teams to enjoy using. Now, when I mentioned that the UI is developer first, here's how developer first it is. There is a button on your story that you can click to actually get hints for how to create your branch based on the story point. Yeah, it's a, the UI is phenomenal. You log in and you can immediately see your work queue, your active tasks,

Starting point is 01:08:16 your upcoming due dates and your activity feed. Yeah. It's easy for people on any team to focus in on their work on a specific task or project while also being able to zoom out to see the work that contributes to the bigger picture using the fast interface. With a simple API and robust set of integrations, Clubhouse also seamlessly integrates into the tools you're already using every day, like Slack and GitHub, for example, and getting out of your way so that you can focus on delivering quality software on time.

Starting point is 01:08:49 And you could sign up for two free months of Clubhouse by visiting clubhouse.io slash coding blocks. And again, that URL is clubhouse.io slash coding blocks. And you get two free months and you can see why companies like Elastic, Full Story, and Launch Darkly really love clubhouse oh man i i got bee trees and eggs too i forgot about that all right so i'll try and blow through this one even though this one's way more complicated than the binary tree so bee trees Bee trees. What is it? It is a self-balancing search tree. And it gets all kinds of crazy.

Starting point is 01:09:29 And we kind of talked about the stick or that pathological bad one. This is basically kind of like a binary tree in that we have some information about it. But it's really good at making sure that stick pathological case doesn't happen. Right. It tries to make it fat as it's going down through there, and I think I actually have a note for that here. But one thing to think about with the B-tree, though, is don't think about the binary tree conversation we just had.

Starting point is 01:09:54 Right. No, it is not a binary tree. Don't think about one node pointing to, at most, two nodes. Instead, think about it as each node could be several values. Yeah. And we'll get into how it sort of figures that out here in a second. So it's, there's also a name for it, an AVL tree or an implementation of it. It's the Adelson, Velsky and Landis tree. So as Joe said earlier, you can just name them after everybody. He just came out of those names so easily, made it look so simple.

Starting point is 01:10:25 Oh, yeah. You get some practice. I'm well-traveled. Do you want my version of these names? Yes, please. Adele's on. Velsky and Land. I is.

Starting point is 01:10:42 Hopefully none of them listen. I don't think they do. All right. They're probably all past, man. Well, that's wrong. Oh, sure. Man, programming hasn't been around all that long. Trees have been around since like 1700s.

Starting point is 01:11:01 I'm going to have to look it up. All right. So the primary idea behind a B-tree is reducing the number of times that the disk is accessed. So basically try and keep as much in memory as possible. That is the predominant factor of this thing and we'll understand why. Well, I mean, I'll just go ahead and skip to it. They're primarily used for databases and it makes sense when you start thinking about the fewer times that you have to access this and the faster it will be.

Starting point is 01:11:29 So that's it. Most operations for search, insert, delete Maxman. They require O of H disc access where H is the height of the tree. So this kind of goes back to it where you want it to be a fairly shallow tree, like not all that tall, but super wide because the shallower the tree is, the less access, the fewer times you have to access the disc. And as I mentioned, this is done by creating

Starting point is 01:12:01 the fat tree. So, you know, you can either like that name or hate it. So if we were to go back to the database example that you gave, then like, help me understand it in that analogy. Okay. Cause, cause I mean like I can see the, the B tree example,

Starting point is 01:12:18 like from a Wikipedia article, for example, I'm like, okay, yeah, I get that. But real world though, I want to understand from a real world point of view.

Starting point is 01:12:26 So I think maybe the next few bullet points will help paint that in. So they say typically the B tree node is the same size as the block size on the disc. Now, I don't know exactly how that translates to SSDs, but in spinning disc, you have block sizes, right? You have those on SSDs too, though. Yeah, I've never really looked into them.

Starting point is 01:12:49 I mean, it's just how you format it. You can change your block size. Oh, okay. I've never really looked into what SSDs are. I mean, I never have. I've always just taken the defaults. Okay. Even in RAID applications.

Starting point is 01:13:03 So, which Mike knows way more about this things than probably most people because he had to deal with them a lot in in a previous life there are people that know way more but so so that said though you have a block size on there 4096 whatever you're going to make this thing and that's what the size of your node is going to be. And as you go down and you add more of these things, those are also going to be that same size per node. All right. And that is what's going to allow it to reduce the number of times it actually has to go because you're going to keep a lot of data in each one of those. Now, this also goes back to one of the other types where it says all the leaf nodes are at the same level, right?

Starting point is 01:13:47 So you're not going to have this tree that's got, you know, some things that are hanging down to level eight and then others that are up at level three. They're all going to stop there at the same place because remember, this is a self-balancing thing. It's trying to fill this stuff in as it goes. So would this be an example of the, which one was it, The complete binary or the perfect

Starting point is 01:14:08 binary? It wouldn't be perfect. I don't think it'll be perfect because you may not have enough data to fill in all the nodes. You might not have enough data to fill in the node, but you have the node. And that's the difference. That's true. That's true. So I think, except it's not binary. Yeah, it's not binary. But I think that's the analogy that you're trying to make, though, is that you're not going to have it staggered. So if your tree was three levels deep,

Starting point is 01:14:37 then every node would point to something on that third level, right? I think so. I think it would break it up. You wouldn't have like only half of them point to something on that third level, right? I think so. I think it would break it up. You wouldn't have like only half of them point to something on that because otherwise you'd just rebalance the tree. Yes. Because again, your goal here is to keep your height less because it's an O of H operations.

Starting point is 01:14:59 Right. Right. Understanding. Yes, I think that's right. And we'll get into some of the formulas of what actually dictates it. And unfortunately, some of this stuff is much better with a visualization. And there's a very good one on Geeks for Geeks. Wait a minute.

Starting point is 01:15:18 We got this. I don't know if you heard about our ability to explain a drawing. Oh, man. Here we go. Hey, we're not going to do it, but check it out. If you were to draw the left side, no. Oh, man, you still got me, too, because I thought you really were. No, no.

Starting point is 01:15:35 All right, so the minimum degree T, T depends on the block size of the disk, every node must contain T minus one keys. So this is where I don't know if you're going to fill in that bottom row completely because you're going, so you have nodes that are based off the block size of the storage, right? So that's the size of your node. And then your node must have the number of the block size minus one for the keys that are stored in that, right? So every node is basically, if you think of it, almost like an array of values in it, right? And in that node, you're going to fill that thing up until you get to the block size. So if the block size is 4096, then you're going to have 4095 keys in that thing.

Starting point is 01:16:30 Right. And so I don't know, would you just have nodes over there with empty values on the other one? I wouldn't think so. So I thought maybe I was, maybe my understanding was wrong. Cause I thought at that point you would rebalance across the nodes. I thought that's how it worked.

Starting point is 01:16:48 So I don't know how it works when all the data is not completely full. And I don't think that I saw enough of the visualizations to be able to describe it perfectly. The root may contain at minimum one key, which sort of makes sense. You need something in the root. All nodes, including the root, may have at most two times the block size minus one key.

Starting point is 01:17:21 So... Yeah, I'm walking through a visualization right now. It's actually really cool just to kind of see it. I set my max degrees to four. And so I insert one. It's fine. Two, it's fine.

Starting point is 01:17:31 Three, it's fine. Four, oh, that's the max. So we're going to split up. We've got one note and one. Where are you seeing this? What's the site for that? CS.USFCA something, something. I'll have it in the show notes.

Starting point is 01:17:46 Oh, B-Tree. I see it. There's a search for it. Apparently this is a big one. B-Tree visualization. Yeah, I just keep inserting different nodes and it kind of pops in and it's just got this nice thing. Basically, it kind of highlights each node and pops it in. But what it does is, just like you said, is it tries to keep everything balanced

Starting point is 01:18:01 and really tries to optimize for keeping that depth low, but keeping that width high. So it makes really fat trees, just like you said. Okay. Yeah. So it's dropping it in. Oh, this is awesome. And it has a very good visualization. So it did balance it out. So like what I was saying, you'll end up with two nodes on the second level as soon as it has to drop down to it. And then when it needs to split it now, this is where things get interesting, right?

Starting point is 01:18:27 Like as you add more to this thing, the keys that are on the first level, the values in the first node have to come before the first key, right? That are on the first child node have to be values that are less than the first key. Then as I added this, the second node that was added on that child level, the values have to be after that first key,

Starting point is 01:18:52 but before the second key, right? And this is the way it does it. It keeps adding these things in order and filling out these nodes as it goes down. Now, the interesting thing is they don't all have to have the same number of values in them. Right. And I guess that's, that's where, that's where this is a little bit different. Again, this is not a binary tree, right? This, this, you can have multiple nodes on the second level that, that aren't, you know, one to two going all the way down. And it looks to me like the structure that you get out of the tree is going to change based on the order in which you insert things. So I can't just tell you, like, I can't give you a tree and have you easily tell me, like, the order I inserted things in.

Starting point is 01:19:33 But you can see that it does keep things wide and kind of, I guess, the converse is true then. Like, if I want to recreate the same structure from the same data, I need to put it in in the same order. Yeah, this little thing that you found here is amazing because it does a really nice job of visualizing this as you go. So like I did some crazy things, right? Like my first value that I plugged in was five. Then the next one I did was six. Then the next one I did was 11 and it completely restructured the tree. Like it changed the nodes on the second level, bump some stuff up to the first level. So this thing, when it's re when they say it's rebalancing, it's actually moving the keys around to make sure that for search purposes

Starting point is 01:20:15 and for not having to access the disc as much, it's putting it in as good an order as it can so that you can get to your results quickly. Yeah. I'll, I'll be sure to include a link to this in the show notes because it is a very cool visualization of it. Yeah. So you can see things constantly being moved around as it's constantly reshuffling the tree. Yeah, and it wouldn't even be worth me trying to describe to you exactly how it's divvying all these things out. Except we're going to.

Starting point is 01:20:45 He can try if he wants. If you up the number, I forget what they call it. If you up the maximum degrees there and just kind of keep popping values in, it's not intuitive if you're just kind of watching visually. Because I'll expect, oh, it's going to shift now. It's all going to be even. And it goes and puts something in some weird spot that I didn't expect because it's actually basing it off the values and not what I'm seeing is

Starting point is 01:21:07 like a visual representation. Right. But it feels like an oddity. So it's kind of like a little disorienting if you're not paying attention to the values that you're typing in. Yeah. You've got it. I mean,

Starting point is 01:21:16 again, it goes back to the size. So that degree thing that it was talking about and that changes everything. So I, that it was talking about and that changes everything. So, uh, so one of the other key pieces of this is all the keys are sorted in order within the node. So as, as we were going through this visualization and adding things in every single node, the values that you typed in, whether it was one, five, 12, whatever, they're going to be in order within the node. The B trees grow and shrink from the root node.

Starting point is 01:21:52 And that's what you're talking about, Joe, where things you thought were going to happen didn't. And it's because everything sort of goes back up to that root node to figure out, hey, how do I need to rebalance this tree? And so you'll see some weird things happen, and you've kind of got to know the algorithm behind the scenes in order to even be able to guess it. Yeah, it looks like if you smash everything down and kind of do it serial left to right,

Starting point is 01:22:13 then it always ends up being in exactly the same order. So if you want to print this thing out sequentially, then what you would do is just BFS algorithm. So you go breadth first. No, I'm sorry. I totally said it wrong. Depth first. So you go all the way down to the left and you go up and then go right as you just keep going up and you'll eventually print out all the numbers in order of their values. Yep. And one of the reasons why this thing is popular for things like databases

Starting point is 01:22:39 is because the time complexity to search and insert and deletes are O log of N. So very fast. As Joe mentioned earlier, if you had a million records, that's only 20 operations to get to that value. So big, big deal in terms of performance. And you can't say that about a binary search tree because depending on how you got that data, if I entered sequential data into a binary search tree,

Starting point is 01:23:05 then I'm going to get a straight line. And that's going to be O of log N. It's the worst case scenario. Sorry, I said O of log N. O of N. So that would be terrible. For a million nodes, it would be a million comparisons potentially. Yeah, it could be crazy slow.

Starting point is 01:23:19 I think we've got to pause the show. I'm so mesmerized by this animation. It's amazing, isn't it? I can't, I can't stop. I just, I just can't quit you. His tree's like 50 layers deep now. Oh no, I've been playing around with like the different degrees that you can set this thing on and like it resets it constantly.

Starting point is 01:23:38 Like, yeah, it's, it's so cool. So how does it work? The search, it's similar to a binary search tree which is really cool uh if you're searching for a key you recurse down the nodes if the if the node is a non-leaf node and it has the key then the key itself or the node itself is returned from the search if the non-leaf node doesn't have the key then the child node that is the child of the first key with the created value. So basically you're looking to see which node do I go to? Do I go to the one to the

Starting point is 01:24:10 left or to the right as you're going down? You're basically comparing the key value and then going to the best possible child node from that point, right? And it'll just keep going down until it either finds the node that contains a key or it'll just return node if it gets to the to the final leaf node um the this is like i didn't put in notes for every bit of this because this is where things get a little bit complicated so an insert on this thing it always starts at a leaf node it does a binary search to find the node where the insert should happen. So let's say that you have a whole string of numbers in there that you want. Now you're going to insert 15, right? It's going to do a binary search to find the node where

Starting point is 01:24:56 15 belongs. And if there's room, then it'll just put it in that node, right? So if it hasn't met that degree size, that block size or whatever it is, it'll just put it in that node, right? So if it hasn't met that degree size, that block size or whatever it is, it'll just put it in that node and it won't have to reshuffle anything. If that thing is full, then magic starts to happen, right? And it's not magic. It's just a lot of changing. So it will evenly split the node into two nodes at that point. And it's going to try and find the median value that was in that node, right? So you had 10 numbers in there. It's going to try and pick one that was in the middle.

Starting point is 01:25:42 And then it's going to split off into two other nodes, add the values to the left of that median into the one node, and then add the values to the right of that median in the other node. And then it's going to try and make that median the parent node. And then it's going to try and make that median the parent node. And then it's going to insert that value wherever it needs to go. So there's a whole lot of things that happen there. And then by the way, if that fills up another node that it happens to be doing, then that splitting continues all the way up the tree. So it, it ends up being a pretty,

Starting point is 01:26:04 I guess it can be a heavy operation doing that. But the thing is it's optimized for search afterwards because it's sorting all that stuff as it goes. It sounds like, I mean, the way you're describing it, plus with that visualization, it sounds like the emphasis is it tries to act more like a binary tree more often than not. And when it can't, that's when it'll try to prefer putting the non-binary parts of that at the leaf side until eventually that starts bubbling its way up and then it can eventually have to you know get into a situation where it can resort and it'll can work itself back out into more binary like form is that sort of i just don't want us to confuse the fact that it's binary with it because it's totally not just two nodes right yeah exactly yeah but but it's binary with it because it's, it's totally not just two nodes, right? Yeah, exactly. Yeah.

Starting point is 01:27:05 But, but it's trying to prefer it is what I'm, is what I'm getting at. Like you're, does that make sense? It's, it's so weird. It like looking at this thing visually, it sort of reminds me going back to our episode where we talked about the

Starting point is 01:27:17 database stuff where the, what was the name of the one where you had a left node and a right and then things in between the nested set model the nested set model so it's it's sort of like when you have keys if you have key one and key 10 in your root node right then actually i mean the difference about the nested set model though and the tree conversation though is that the nested set model has that in-between kind of concept that you're talking about there, though. Actually, let me split that. I think if I say it like this, it might help.

Starting point is 01:27:53 If you have keys 5 and 10 in your root node, then let's picture that there's three nodes below the root node. Anything that was less than 5 is going to be in the first node. Anything that was between five and 10 would be in that middle child node. And anything that is greater than 10 would be in that third rightmost node. So it basically tries to split the nodes up so that each node falls either before or after a key in the previous node in the parent node yeah i mean i after you know i took another look back here at the wikipedia article and i really think it maybe that that binary example that i was trying to make there it works great

Starting point is 01:28:41 if your if your degree for each level is like three degrees before you split, right? But it sounds like if you're not going to do something that small, then forget about it. It's not going to – it might mimic the binary tree or try to more often as much as it can with such a small degree. But that's probably not going to be your real world use case for the binary. I mean, I'm sorry, the the B tree, the B tree. I will say one of the really cool things is the visualizations on this. When you do it, it shows it going through every step of it. Right.

Starting point is 01:29:20 So if you do something that was going to cause that bottom one to split, then you're going to see it do that one and then create that and go up and do that and then go up again and that's where i was really trying to like to to what i was trying to the point i was trying to make is that at least in that that one visualization it was favoring putting things on the leafs until they would bubble up but But yeah, I think with the small degree, that's probably why it looked that way. And I know it's great listening to us talk about a visualization, but it's totally worth going here because what it does, and I don't know if you guys noticed this, Joe, this is amazing that you found this, is when you type in a value to plug in, and if you type in enough values,

Starting point is 01:30:05 it'll be easier to see because it goes pretty slow. But as you type in a value and you say insert, you're going to see it do the binary, or not the binary, you're going to see it do the search. Deciding where to put it. To go find the node, and then it's going to try and figure out, hey, can I put it into this node?

Starting point is 01:30:21 And then if the node is at maximum degrees, then it'll go ahead and re-split it. Yeah. Split and then potentially refactor all the way back up to the root. Right. It's really cool. And it does it in a fairly slow manner so you can watch it. And I mean, I kind of regret bringing up the whole binary conversation now. I'm not going to lie to you. But, but yeah, so at any rate, the bee trees are super important for that reason, right? They are amazing. Let's see here. I've actually got the pros. They're better, more consistent

Starting point is 01:30:53 search times than binary search because of the balancing of the tree, the rebalancing that happens every time you insert a value. I just popped this one in there i was looking at this um there are self-balancing binary trees like red black trees but the the major pro the reason why you'd use a b tree over one of those is that it's optimized for batch inserts so you imagine if you got like a really big max degree like say a thousand and then if you're writing you know

Starting point is 01:31:22 chunks of like 500 at a time then then the chances of you having to, uh, rebalance tree are less and it's more efficient, uh, balance because you have to do it less often. And that's the primary reason. That's the reason those databases and disks use B trees instead of like a red black tree.

Starting point is 01:31:37 Hmm. Interesting. And then I think, uh, you put the con in here as well, right? Yeah. Just the, the max degree kind of thing.

Starting point is 01:31:46 So that's more of a difference between this and a binary search tree type thing. Just because the rebalancing kind of thing. It's a little bit complicated. It does take a performance. It kind of reminds me a little bit of when garbage collection runs. I know it's not as intense as that, but it's kind of like, everything's great, everything's great, everything's great. Oh, hold on.

Starting point is 01:32:03 All the presses, I got to do some work. Okay, everything's great, everything's great. everything's great. Oh, hold on. All the presses, I got to do some work. Okay, everything's great, everything's great. You know what it brings to mind is SQL Server. Like, we've all worked with it a long time. One of the things that you'll find out is if you're doing a lot of updates to tables that have a ton of data in them, the performance can be absolutely awful because what happens is you get these page splits and it's probably because of things like this B tree storage behind the scenes where it's like, Oh,

Starting point is 01:32:29 well I need more data. It doesn't fit the block size. I need to now do a page split and move this data into two different spots. I was going to say, I was going to say something similar to that because I was thinking of like, uh, cause I was thinking like, okay,

Starting point is 01:32:42 what's when you said that it's more consistent search times in a binary search because of the balancing. And then I'm like, yeah, when you said that it's more consistent search times than a binary search because of the balancing, and then I'm like, yeah, but you've got to take a hit then on that balancing, so where does that come in? But then the database kind of analogy came to mind, similar to what you were talking about with when you have to run the statistics, what's it called? Ah, dang it. In DB2, it would just be called run statistics, right? It's actually recalculating statistics. But there's the fragmentation that you're trying to uh re recoup defrag the indexes and the statistics you rebuild this yeah rebuild the indexes yeah those kind of things that's um maybe that's where like the databases aren't always taking that hit maybe well i bet they gotta be doing the rebalancing

Starting point is 01:33:27 as needed though i'm sure that they've trees maybe that's when you maybe that's when your your database starts to perform slow is because uh it because of that fragmentation the height of the tree has gotten too tall maybe that's what part part of it is. Yeah. I don't know what kind of efficiencies they've built in behind the scenes. Obviously, they've done things over the years, right? But yeah, I do know and I remember for years reading that, oh, databases use B-tree indexes. And I was like, oh, that's cool. I don't have any clue what that means. But, you know, that's fine.

Starting point is 01:34:00 The A-tree. Yeah, I just threw out the number 1,000. But for all I know, it could be like 40 million. You know, I don't know. Right. That'd be interesting. So, yeah, I think you also, you put in these additional notes right here, right?

Starting point is 01:34:15 Yeah, I just want to mention that B-trees are well suited to storage systems because, like, if you're just doing, you know, simple kind of data sets, and like we said before a million times, it really doesn't matter. But if you've got a need for a tree, you need to do a lot of searches, but you also need those dynamic inserts. So an array is kind of out of the question.

Starting point is 01:34:32 Then B-tree is there for you. It's got your back because it's got the ability to write those blocks of kind of contiguous data easily. So it's good for disks and database systems and file systems. All right. So if you're still with us, still awake, we have a bunch of resources that we like listed in the show notes. If you like double tap this episode

Starting point is 01:34:52 or head to the website at codingblast.net slash episode 97. Is correct. This will be 97. All right. And it looks like I'm still talking. You are. It is now my favorite part of the show.

Starting point is 01:35:09 It's the tip of the week. Yep. And my tip of the week comes from Both Zoli, who put a tip into cpu.show slash tips about plant UML. So when I first read this tip, I was immediately turned off because of UML because I've had bad experiences as a lot of people have. But they mentioned that there was a VS Code extension, so I tried it out, and it's way cooler than I thought. It is a way of generating diagrams with a simple kind of markup language.

Starting point is 01:35:43 So there's a DS a dsl little simple stuff so you know that if you ever seen those traditional like cryptography kind of diagrams where there's like alice talking to bob you can do similar charts to that by saying like hey alice alice arrow bob uh alternate successful uh, else some kind of failure. It'll draw a little boxes. I'm not doing a good job of describing it, but what I want to impart to you is that you have a simple tree like structure in your market file. It looks kind of like Jason or YAML. You pop a couple of couple arrows in there you throw a couple labels you can do some coloring and some other fancy stuff and at the end of it you get a drawing

Starting point is 01:36:33 rendered of the words that you typed and you can actually zoom in and stuff too and because of the code extension what i do is i create a file with the extension wwsd i change you know the words bob to uh outlaw and when i save my diagram saved so i don't have to be messing around some sort of like you know art program just to do a simple diagram i can do this stuff i can check it in it's just text i can even do like loops and like simple cool things like this and i I can do it all in VS Code, which I love. And it generates these nice little drawings, which are really convenient for showing people rather than trying to explain stuff in an email. And I hate messing with diagramming type programs. So this is really nice.

Starting point is 01:37:16 Yeah, these are like use case diagrams. Oh, man. Class diagrams. Really cool. Yeah. Well, but this is just all in text, though. Yeah. I'm looking at this.

Starting point is 01:37:31 I'm like, man, I wouldn't want you to try to describe anything complicated, like giant in text only form a diagram. I don't know, man. I don't like doing the diagrams. That's what I was going to say. Have you ever done a diagram and you're like, oh, crap, I forgot a box, and then you go there and you move the boxes up, but the arrows didn't know. I don't like doing the diagram. That's what I was going to say. Have you ever done a diagram and you're like, oh, crap, I forgot a box. And then you go there and you move the boxes up, but the arrows didn't move. So you go move the arrows and then the text is not aligning for some reason.

Starting point is 01:37:53 Yeah, imagine if you did that now here. No, it redraws for you. No, but it's so cool. Like what he showed me a little while ago, the text, you just insert something in the middle of it and a redraw diagram for you it's yeah real time it's like the markup uh pane so you get a pane to the right and your changes are made like as soon as you save the document boop shows up on the right it's it's sort of mind-boggling i'm gonna have to see this thing in action because apparently alan was made into a convert and yeah i'm gonna need to be made into one. He showed me in like two minutes and I was like, okay, that's pretty super cool.

Starting point is 01:38:29 Yeah, I closed the pane and I forget how to do that. I forget how to get it. Oh, there we go. I got it up. It's awesome. It is very nifty. All right, so mine is going to come from a pane. Oh, here, he's showing you.

Starting point is 01:38:44 He's showing you. He's got it up. Yeah, so I'm tired, so I didn going to come from a pane. Oh, here, he's showing you. He's showing you. He's got it up. Yeah, some tires. I didn't do a very good example, but you can see, like, I started to change the bobs to outlaws, and whenever I save, I didn't do a very good job of replacing. So it's kind of filling some stuff in for me. But you can just kind of see.

Starting point is 01:39:01 You saw that? I was drawing this cool diagram. This is kind of like what I thought of as being a complicated example. And the diagram looks pretty complicated, but it's only 20 lines. And I feel like it's pretty readable. I'd take you an hour to make that in Vizio or something. Oh,

Starting point is 01:39:14 come on. Yeah, man, especially if you wanted to add something in the middle there, though, that's what I was saying. Yeah. He could add a line right there.

Starting point is 01:39:20 Just copy and paste or, or, you know, do a few of them. Yeah. You just keep adding them. It's a nested there. Just copy and paste or do a few of them. Yeah. You just keep adding them. It's a nested tree. No, but I mean like another

Starting point is 01:39:30 not swim lane that you got there, but another actor, I guess. Oh, don't be asking complicated questions. We don't know. Let's see. So you want like maybe Alice talking to Alan? I mean, something like that.

Starting point is 01:39:48 Sure. I don't know. Yeah. We'd have to spend more time on it. Hey, there's Alan. Boom. They're not quite talking together though. Oh, because I didn't give it a relationship.

Starting point is 01:39:59 I mean, it's impressive. I mean, seriously, some of these diagrams would take forever to make. Like Joe said, I forgot to put something in here. Now I've got to move these 30 blocks over 12 pixels. I mean, this is definitely like a Vizio, but for people that just want to stay in. They want to code. VS Code. Right.

Starting point is 01:40:22 Which, I mean, VS Code is being used for everything. We never have to leave VS Code. Right. Which, I mean, VS Code is being used for everything. We never have to leave VS Code. If you ever find yourself with a reason to leave VS Code, think about it. It's a Swiss Army IDE now, isn't it? All right. And yet still in here. But it's cool. Lightweight enough.

Starting point is 01:40:42 Yeah, it's cool. And who has Vizio, man? Come on. Windows 95 called. Dude, I downloaded Vizio today. hear, but it's cool. Lightweight enough. Yeah, it's cool. And who has Vizio, man? Come on. Windows 95 called. Dude, I downloaded Vizio today. I was so excited about it because you know the pain. If you're using Gliffy and you accidentally do alt and then the left arrow, oh, man. You accidentally hit that back button keystroke?

Starting point is 01:41:03 Oh, dude. Oh, there it is. I've actually had to walk away from my computer before because that's happened. I'm like, I got nothing left today. So, yeah, mine came from just debugging a multi-threaded app that has just caused me some major pain recently. So searching for the best way to do some debugging on a C sharp app that had some threads and some parallel task running and all that kind of stuff. And I'm going to share the URL to this thing. You'll have to go to the show notes,

Starting point is 01:41:42 do it because it's way too long to say, but here's the cool part on this particular page that I'm sharing with you, they have C-sharp code that you can just copy and paste and stick it into a console app. They've also got C++ code that you can do the same with. If that's your language of choice, they've got VB. And I think that's all of them. All right. It will tell you to go do this. And then they have a little tutorial below this thing that will walk you through all kinds of coolness so that you can see what's going on in various threads. Like you run this thing.

Starting point is 01:42:22 They've got set debug points in the code that will stop there. You can open up your parallel task window. You can open up your parallel threads window. You can open up your stacks window. And as you click things, it'll show you and describe in this document what you're looking at throughout this. It is absolutely amazing. And it, it totally saved my life trying to figure out what was going on in this app, where it looked like I was getting a deadlock in some threads. I could actually see the very last call stack of a particular thread.

Starting point is 01:42:57 I could bounce between threads. It would show me the tasks that were associated with them. It will even show you parent child hierarchy, dependency trees of the threads that are currently running and attached to each other. Dude, it was, it was kind of impressive when I first saw it, because I didn't know that visual studio had this ability in it. And honestly, you can't even debug the stuff any other way. Cause if you've ever tried to put break points in an application where it's multi-threaded, you'll see that you just bounce around all over the place. And this makes it to where you can make sense of it. Or what's worse is that you step in and then you hit the step over or step in whatever next.

Starting point is 01:43:41 And all of a sudden you went back up a line or two. Wait, what just happened? And it's because you're in another thread different thread executing that same function yeah this this thing is amazing man it'll actually show you like midways down the page on this you can actually see the chaining the tree of where the thread started and how they chained off as a matter of fact this last thing that you were talking about, the tries, they have something very similar to that. So like here, they'll show S, A, S, B, S, C for threads A, B, and C for class S. And then it'll show where it branched off and went other directions.

Starting point is 01:44:16 So it's like the starting points compressed into what everything has in common, and then it branches off of those. So, uh, at any rate, just, it was an amazing resource that helped me. Like, it's so good that I plan on doing a video just to walk through some of the things that, that, that they show on that page. So yeah, check that one out. All right. So, uh, here's my tip of the week.

Starting point is 01:44:46 Don't judge. So there you go. Laughing. So you're in SQL server management studio and maybe you have like, you know, several queries that you're running and, uh, you know,

Starting point is 01:45:02 you get back this count at the bottom and you're like, Oh, that's the aggregate count. I never realized. Shut up. I never. Stop it. I never realized that if you click on any one of those individual result sets that the count changes to the count of that particular result set.

Starting point is 01:45:19 Stop laughing. Stop it. I didn't. Why would would i why would i assume you do it like i saw this tip in the worksheet here in our show notes and i was like it never dawned on me that somebody wouldn't realize that i guess because i've looked at that stuff for so many years but you saw that i was like i totally get why it's not like i haven't looked at it for so many years i just thought like well why would i care like i don't normally go clicking around in the cell and when i are in that you know when i have clicked into there i'm not also looking to

Starting point is 01:45:56 see oh did the record change i'm just like looking at the data and focused on that so i never noticed it until a co-er pointed it out. And I'm like, yeah, I didn't know. How long is that? Thank you. Really?

Starting point is 01:46:11 Wow. Nope. My people. Hey, don't assume, don't assume because you've thought of something that the other people, I mean, it's just one of those things that I've noticed forever.

Starting point is 01:46:22 That's, I mean, yeah, it's, it's, I never did. So, yeah.

Starting point is 01:46:27 So if you execute multiple queries in SQL management studio, you get back by default, an aggregate record count for all of those queries. But if you click into any individual one, uh, then the count will be specific to that result set. That particular grid that you're looking at. I'm blown.

Starting point is 01:46:47 That's awesome. All right. And then one other quick one. Oh, yes. If you are on a Mac and not booted into Boot Camp, use the option key every now and then and see what other crazy options are out there. So a little bit of like how the sausage is made. We got some new monitors, some nice shiny new 4K monitors.

Starting point is 01:47:14 Only the scaling options for us were limited by default. It was like, oh, 1080p. On our 4K. Or it was either like, you know, you get all the pixels or four of the pixels. No scaling. Right? There's nothing in between. And it turns out like, oh yeah, if you just press the option key, you see these other scaling options. And the reason why I call that out though is that both Alan and I were like,

Starting point is 01:47:45 Oh yeah, we should have known better, man. Everything, you know, anytime you're in a menu on OS 10 or Mac OS, press the option key just to see like what else pops up. Yeah.

Starting point is 01:47:55 Like finder is a perfect example. Like if you were to click the click, open up finder and go to the go menu and you hold down the alt button, you'll see library show up right i was going to mention the same one yeah i mean it's ridiculous they've even got it on the view like i don't even know why this is a thing like there's some parts of mac os that are just really irritating to me like the view menu you hold down alt and the arrange by changes to sort by like come on man like really well basically another way to say

Starting point is 01:48:25 this is every menu something changes when you press that every single one of them and that's why except for help uh well yeah the help is not all that helpful usually yeah um but yeah that's why when when you said that about the scale thing like if it the scale button you have to hold down alt while you click that scale button and then the new options show up and i was like i feel like such an idiot you're supposed to alt everything in mac os i don't know why they hide functionality like that but they do yeah part of their minimalist approach yes all right so uh we hope you've enjoyed this conversation about trees and tree like things uh if you haven't already maybe a friend lets you listen to it on their device or pointed you to it by a link or something,

Starting point is 01:49:10 be sure to subscribe to us on iTunes, Stitcher, and more using your favorite podcast app. And as Joe mentioned earlier, if you haven't left us a review, you can find some helpful links by heading to www.codingblocks.net slash review. And while you're up there, go ahead and check out

Starting point is 01:49:27 our incredibly extensive show notes, our examples, our discussions, and more. And you got some feedback, questions, or rants, I don't know, drop a comment on the episode maybe or take it to the Slack.

Starting point is 01:49:38 And make sure to follow us on Twitter and you can reach out to us if you have questions on how to get to that Slack and we'll gladly help you with that. And go over to the website, codingblocks.net, where you can reach out to us if you have questions on how to get to that Slack and we'll gladly help you with that. Go over to the website, codingbox.net where you can find a bunch of social links and show notes and other stuff

Starting point is 01:49:52 at the top of the page.

Coding Blocks - Data Structures – (some) Trees

We ring in 2019 with a discussion of various trees as Allen questions when should you abstract while Michael and Joe introduce us to the Groot Tree....

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.

Your Ad Here

Coding Blocks - Data Structures – (some) Trees

We ring in 2019 with a discussion of various trees as Allen questions when should you abstract while Michael and Joe introduce us to the Groot Tree....

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.