Coding Blocks - Data Structures – Heaps and Tries

Episode Date: January 21, 2019

We dig into heaps and tries as Allen gives us an up to date movie review while Joe and Michael compare how the bands measure up....

Transcript
Discussion (0)
Starting point is 00:00:00 You're listening to Coding Blocks, episode 98. Subscribe to us and leave us a review on iTunes, Stitcher, and more using your favorite podcast app. And check us out at CodingBlocks.net where we can probably show this example of discussion a lot more. Oh, you're trying to mess it up for all the two-times listeners, huh? There's been some controversy lately, so I just want to, you know, buck with the status quo. Alright, so I'm going to go the way that I believe that I sound. Send your feedback questions. I can't even listen.
Starting point is 00:00:31 That's too painful. And rant to comments at coding blocks.net. Hey, did you, okay, hold on and follow us on Twitter at coding blocks or head to www.codingblocks.net. Follow, find all our social links there at the top of the page. I'm Alan Underwood.
Starting point is 00:00:49 I'm Zach. And I'm Michael Outlaw. This episode is brought to you by O'Reilly Software Architecture Conference. Have you got any plans this February? No? Well, now you do. This February, the 3rd through the 6th
Starting point is 00:01:07 in New York City, O'Reilly is hosting the Software Architecture Conference. I mean, you wanted an excuse to visit New York City anyways, right? So, I mean, you're welcome. The O'Reilly Software Architecture Conference is the only conference that focuses exclusively on software architecture and the evolution of that role. So if you want to dive into technical weeds by covering complex topics from microservices to domain-driven design with different learning styles available, ranging from 50-minute sessions to two-day training courses. Learn how to communicate, wait, scratch that, sell complex technical topics and their merit to both upper management and technical teams with training courses like the Architectural Elevator. Well, wait, I'm not an architect yet, you say. Well, a special networking experience called Architectural Kadas is where you can go to practice being a software architect by breaking up into smaller groups and working together with other people on a project that actually needs development.
Starting point is 00:02:06 Yeah, network with people using the same tech stack that you use to gain personal insight that you can apply to your own environment. The O'Reilly Software Architecture Conference covers the skills and tools every software architect needs. code BLOCKS, that's B-L-O-C-K-S, during your registration to get 20% off of most passes to the O'Reilly Software Architecture Conference. By the way, because we did all that, did you guys ever watch the, what's the new movie that came out? It's got the rabbit, it's in the fox, and...
Starting point is 00:02:43 Oh, yeah. Into the Spider-Verse. No, come on, man. What's up on a Deadpool? Man, you guys really bumblebee. Officer hops. Anyways, there's a scene in this movie that I cannot think of that is absolutely amazing. And it's, they go to the DMV to get some information
Starting point is 00:03:08 and the people working behind the counter are all sloths. It's my favorite part of the movie. It is so painful. And it's so slow. So that's amazing. And that movie is, should I tell you? Or should I leave it as an exercise for the
Starting point is 00:03:24 lister? Zootopia. I got it as an exercise for the lister? Zootopia. I got it. Wait a minute. My movie didn't just come out. Well, you know, relatively recently. Hold on. Within the past decade. We're good.
Starting point is 00:03:36 Your movie reviews don't count anymore. No, it's pretty terrible, right? All right. So a little bit of podcast news coming up here. So first, big thanks to some iTunes shoutouts, iTunes reviews. So shoutout for Pendelgeist and Bozzoli. Bozzoli, sorry we were late. We appreciate your patience with you, and thank you so much for installing iTunes, leaving the root, and then probably deleting iTunes and telling them in the review that you're going to do it.
Starting point is 00:04:04 That's awesome. That was baller. That was boss. So thank you very much. Mike, you want to do these? Okay, sure. From Stitcher. Now, I totally didn't take a moment to practice like my normal practice,
Starting point is 00:04:17 so these are going to come out really raw. JBZ Cooper, Terrence, and Saltire Steve? Saltire Steve? I would have said it kind of like satire, so Saltire Steve. Okay.
Starting point is 00:04:37 I think that's how I would have gone with it. A little salty satire, I like it. Alright, a few things I want to mention. One, I gave up on Advent of Code. I hate giving up on coding challenges. I like it. Alright, a few things I want to mention. One, I gave up on Advent of Code. I hate giving up on coding challenges. I feel like I'm giving in. I mean, I tell you, I spent eight hours plus on problem number 15, so I apologize to everyone
Starting point is 00:04:54 on the Advent of Code channel. I'm throwing it in. That's December's news. Who cares? It's January. It's really cool coding challenges, though. I got to do a lot of cool A-star algorithms or like singular circular circular linked lists and a lot of cool data structures and things like we would talk about in the show like next uh problem but oh this is a heap so that was really convenient so i thought it was really cool but i just uh when i
Starting point is 00:05:17 added up the time on it and then realized like oh i'm gonna be spending 50 hours this month on these problems i gave up but i'm curious to know if anyone else did it out what their experiences were and you know like obviously the problem wasn't designed to take eight plus hours i'm just a adult and couldn't figure out how to get that last one over the line there but um wait wait are you saying adults have no capability of doing these things oh so i meant dolt like d-o-l-t like the. Got you. Yeah, 1993 called and they want their insult back. I thought you were basically saying fresh college grads would get this in a heartbeat. And of course, you're in.
Starting point is 00:05:52 Okay. All right. Not that. I'm just old. Yeah. So I don't know. But I would be interested in finding more things like that that aren't so time intensive. Because I'd like to be able to do challenging problems that make me feel accomplished when I finish.
Starting point is 00:06:06 But I can't be doing two hours of homework every day and going to work, right? That's a bit of a crazy. So the code caught us on Code Wars, you're past that now? No, I still really like those, although I haven't done them in a long time. I've been slacking and I need to get back to it.
Starting point is 00:06:22 But I like those because so many of them, you can set the difficulty level. So you can do 10 minute problems or 20 minute problems or hour. I've never tried to go super hard on the problems but I like the idea that you can kind of dial it back. And if I disappear for a year and come back to it
Starting point is 00:06:37 then I can do that and I don't feel like I'm giving up on Santa which is how I feel about Advent of Code. You can't flip any more of those calendar doors open anymore now? Yeah, I ruined Christmas because of problem 15. Well, yeah. Like I said, it's January now, so who cares? Yeah, that's right.
Starting point is 00:06:58 And speaking of January, Orlando DevFest is coming up, and I wanted to mention that for a couple different reasons. One is it's just a really cool conference, and it's in my backyard. And two is because it totally snuck up on me. I totally missed that this was going to happen until basically a month before the conference. And so it was kind of upsetting me to think, oh, there's this really cool conference going on right where I live. It's got a bunch of topics that I'm interested in. I'm looking at like Flutter and DevOps and I'll just, I'll coach those sorts of cool stuff. And I totally almost missed it completely. If it
Starting point is 00:07:35 wasn't for a meetup, I would never have found out about it. And it's like, if I, someone who goes to a lot of meetups, who pays attention, who specifically looks for conferences in the Southeast to go to, if I almost miss this, what else am I missing? So I was curious how people find out about conferences because there doesn't seem to be one good spot to do it. I really don't know either. I usually find out at meetups or what you guys tell me. Yeah.
Starting point is 00:07:59 Yeah. I mean, it really kind of is sad, right? Yeah. Yeah. It's just kind of crazy sad, right? Yeah. Yeah. It's just kind of crazy to me. So, I don't know. I was just wondering, like, if someone's out there that has a solution to this problem, then I would love to hear it. Because otherwise, you know, I've got a little bit of time off work coming up.
Starting point is 00:08:15 So, who knows? I might try to build something crazy. Hey, speaking of which, before you move on to the next one, didn't we both put in for a talk down at Orlando? Yeah. Orlando Cogame, which happens in March. So, we're both there. We're actually getting into Booth this time. move on to the next one didn't we both put in for a talk down at orlando yeah orlando cooking which happens in march so we'll both there we're actually getting into booth this time so we're gonna be hanging out uh we'll probably have some stickers too so if you're in the area you should definitely come by and hang out with us because otherwise we're just gonna be like kind of staring awkwardly into nothing yeah but so did you ever hear back on whether or not you got
Starting point is 00:08:44 accepted for the talk? I'm assuming not, right? I have not put mine in yet. What do you mean assuming not? Wow! That was harsh! I haven't actually been in my talk for that yet. I will be, but they aren't announcing whether you got in or not until January. Oh, okay. So
Starting point is 00:08:59 shots fired, not fired. Okay, good. I was just saying that... We're good for now. Even that's the thing. With conferences, when are the call for proposals available? When are the deadlines ending? It's like this secret weird network where the speakers all kind of whisper each other. I don't know if it's supposed to be secret info or whatever, but I would like to see that information be readily available and consistent. Yeah, it kind of stinks too, though, because if they're telling us in January, right, then that means that you've got two months to prepare for a talk, which, as we know, blows by faster than what anybody hopes for. So anyways.
Starting point is 00:09:31 All right. Cool. And so that's Orlando DevFest, so January 19th. But I also want to mention a contest that we're going to have coming up here pretty soon. So if you're not familiar with React Amsterdam, go Google it because it's the biggest React conference in the world. So they've got tons of developers, really cool talks coming up in April. And we're going to have a free ticket to give away. So if you're on a mailing list, then that's just an example of the type of contest that we like to do.
Starting point is 00:09:57 And we're going to try to do even more next year. If you're not on the mailing list, you can go to codingbox.net and sign up there on the top right. Or if you're on a phone, maybe scroll down to the bottom. But somewhere there's a mail sign up. and we try to keep that list clean. We try to pretty much only really do contests on it. There may be an occasional other thing, but I also try to have a little fun with those contest emails. So you should join and let me know what you think.
Starting point is 00:10:18 Yep. All right. So now it is time to move on to heaps. What you got, Joe? Yeah, heaps are one of my favorites. They're kind of like specialized binary trees. So they have two nodes, left and right. But they're kind of special because they keep data sorted-ish.
Starting point is 00:10:36 So like a binary search tree, you always know every value to the left is smaller and every value to the right is bigger. But with the heap, depending on if it's a min or max tree, I'm just going to focus on max trees. It's basically max heaps. It's basically the same thing. Just reverse the greater than less than sign. So I'm just going to stick with max. Every child is going to have a value that is greater than you. Now your left may be bigger than your right or your might be right might be bigger than your left. And that doesn't matter. All that matters is that it's bigger than you.
Starting point is 00:11:15 And that keeps things close to sorted but not perfectly sorted, which has actually some interesting properties. It makes slower lookups because – a lot slower lookups because you have to look at every value because you really don't know where any specific value is going to end up a tree other than it's going to be you know down or up well you said you have to look at every value you look at every value till you find it right till you find it so you could get lucky and find it really fast or you could get really unlucky and find it you know the very last thing that that was in it yep worst case scenario you're going all the way through every single node. Okay. So it's not, it's just flat out, not good for searching.
Starting point is 00:11:49 However, because of this kind of interesting property where we don't really worry too much about things being perfect. What this means is that like in the max case, we're going to have our biggest value in the tree is always going to be up. So the biggest value is going to be the root. So in a case where you only care about getting the maximum value out of a data set, this is a great data structure. And you may think, when is this ever going to come up? But it actually does in a few interesting cases. Primarily, the use case I'm used to seeing is
Starting point is 00:12:23 priority queues. So so if let's say we're looking at like tasks or we've got some sort of um stuff that needs to happen and it needs to happen in a particular order based on whoever's got the priority or if we're doing like a breath first search or we've got a tree and then we've got this other tree this heap over on the side which we use to keep track of what needs to go next then what we do is we make sure that that top most value is always the next one that we're going to grab and then we can just pluck it off and because we don't really care about whether it's to the left or the right it makes that balancing cheaper than it was for the b trees or the self-balancing binary search
Starting point is 00:13:01 trees or binary trees so it's kind of a compromise there, where we say it's okay to have things sorted-ish, and we're going to deal with worse searches, but we're going to have really nice rebalancing. And the algorithm itself is actually really cool. And we're going to be basically focusing on the main use cases there so min max trees uh max heaps there are actually 18 different types listed on wikipedia which is just nuts and they're all slightly different let's see but because you're not so strict on
Starting point is 00:13:40 the ordering it's quicker to insert the data and then keep it balanced. And so you get linked list type times. It's actually sorry, I got a little lost on my notes. I got so excited talking about these heaps. I got lost on my own notes that I wrote. You did bounce around a little bit. Yeah, it'd be alright.
Starting point is 00:14:00 So the downside, like I mentioned, is that it's slower for lookups. You have to look at each one, which stinks. But another cool property of probably the killer feature and the one that makes it particularly suited for those priority queues is that it's really efficient to remove the top node, remove the root. And so do you want to explain that a little bit? Yeah. Because the entire tree...
Starting point is 00:14:37 Can I punt on that and get to it when we talk about the how it works section, which is kind of right now? I'm going to come back to that. We come back to of right now. I'm going to come back to that. How about we come back to it right now? I'm going to take the long way to get there. It's one of my specialties. My coding block superpower.
Starting point is 00:14:57 The deal here is that heaps are typically stored in an array. This is unlike most trees where we kind of set some of those main advantages of trees over arrays is that we can have those dynamic sizes and we can kind of insert them in place and we typically don't store them in data structures like arrays. But this is a special case where heaps are almost always stored in arrays and that's because of the way that you fill them in. And this is a binary tree
Starting point is 00:15:23 although you won't frequently see it mentioned with other binary trees, just because it's got these own kind of special properties. And it's so tuned for plucking root nodes out that it's kind of considered its own beast. And it's so common for that one specific use case that it's basically considered its own category. But say you take that memory, you allocate it for an array, you say size 1000, 1024, who cares. And when you go to actually fill in the values of the tree, what you do is just pop your I've got to stop using that word. You push the value that you're inserting
Starting point is 00:16:03 into your heap into the first available spot in that array, the first non-null spot. And let's say if we start with the number 25, we go ahead and throw that in the first index of the array. So array 0 becomes 25. We throw in a second number 1. Now 1 is going to be less than 25. We throw in a second number one. Now one is going to be less than 25. So we know that in our max tree where we want the max value to be the root, it's going to go after. So we're just going to go ahead and put it in the next index. So now our array reads 25, one. Now, if we add one that's bigger, 99, then what we do is we go ahead and pop it in, push it into the third index there. We add it. So
Starting point is 00:16:48 now our tree is out of order. This is not good. So when we add it in there, it's important that we check the parent and say, Hey, are we still good? In this case, we're not, we've got 25 at the root, but we've got a child of 99, which is bigger. And we want a max tree here, a max heap. So what we're going to do is swap those values. And if this were a bigger tree, if it was, say, you know, 100 nodes or something, then you would just keep going with that. So you always just insert into the first available index in your array,
Starting point is 00:17:19 and you compare that index with its parent and say, hey, which one of us is bigger? And then depending on that result, you're going to swap them and you just keep checking it until you bubble up. But there's some interesting properties, like I mentioned, because it's a binary tree, this is really important. Because it's binary, you can easily compute the value of your parent or the value of your children from any index. So if you give me index five, I can say, okay, my parent is going to be, so I don't know the reverse of it, like two N minus one, something, some sort of a computed number. That's going to be a fixed math.
Starting point is 00:17:59 There's division involved, but it's not anything crazy. There's no exponential growth because it's always going to be a balanced tree, like we talked about with the binaries. So I can always do just a little bit of math, a quick calculation to figure out where my parent is, my child is, and I can use that to either bubble up the values when I insert new nodes, or here's the finally the answer to your question. When I take the maximum value off, I chop the root off that array. What I do is I go ahead and grab the last filled in item in the array and I push it into that array at index zero. So I've moved the last into the first position. And then from there, I just bubble down. I say, hey, are you bigger than any of your children? If so, swap. Then if that node is bigger than any of its children, swap. If that node is bigger than any of its children, swap. And in both cases, whether you're bubbling up or bubbling down, you're only ever looking at the logarithm of your data set.
Starting point is 00:19:10 So basically the depth of your tree. And that's because we fill it in. We always fill in the next empty spot in the array. You always end up with a perfectly filled in tree. Like we talked about that Christmas tree with a little bit shaved off. So does that make any sense? I think sort of. The only thing that I'm not getting off the top of my head here is we're talking about multiple.
Starting point is 00:19:40 It's multiple arrays, right? It's one array. Okay, so it's one array. And then, so this whole parent-child notion then, how does that live? I'm guessing then the first index of the array is the root. Yep. Okay, so then the child is what? The left child is index one, and the right child is index 2.
Starting point is 00:20:07 Now, the children for index 1 are going to be 3 and 4 and the children for 2 are going to be 5 and 6. Okay, now I'm with you. So, I can take any arbitrary note. I say like,
Starting point is 00:20:22 hey, what's 2 going to be? And the results for 2 were I think it's like 2 plus n minus 1 and 2 plus n plus 2. I forget the actual equation. But there's a simple mathematical equation that I programmed during my lunch break today. But it's on another computer and I can't see the formula I came up with now. But it's a really easy calculation that given any nodes, you can find its parent in like a, you know,
Starting point is 00:20:48 a plus and a minus and maybe a divide, or you can find the children in the same way. Okay. Got to reverse the equation. Yeah. I'm with you now. So basically the array, as you're moving to the right through the array.
Starting point is 00:21:01 So if the array starts on the left and as you move to the right, then basically you've got the root, then child one, then child two, and then child one's first child, then child one's second child, etc. Yeah, you're just going to keep walking through it. Okay, that makes sense. Yeah, and in the notes I actually wrote down like a really precise way of saying that i planned on saying but i just got so excited talking about the heaps because
Starting point is 00:21:29 they're one of my favorites that i kind of blew through that and so sorry if that was confusing uh really hoped it wasn't but the the deal there is the root is always going to be your max value in a max heap or your min value in a min heap and so if you want to chop that sucker off, you just take the value out of it, save it, do whatever you want to, and then pop the last item from the array that's been filled in, the last non null value, pop into that first spot. And then you just do a couple of comparisons with your children to see and your grandchildren, your grandchildren, until you get things in the right spot. And this thing is not going to be in order when you're done. Probably not.
Starting point is 00:22:06 And we don't care about that. All we care about is that your max value is always in that first position. It's kind of a funny case. I actually found, much like you found, a super cool visualization for the B tree. I found one that's really awesome for this as well. Oh, very nice. This is a binary heap plus priority queue. So it's a little bit more complicated, but it's doing something very similar.
Starting point is 00:22:34 Yeah, okay. Awesome. And so priority queues, you know, kind of the obvious example there is like I've got a bunch of tasks and some are more important than others. So I always want to make sure that the most important task is in that first position in the array so what i do is when i do any sort of insert even if i know it's a high priority is just go ahead and throw it in the last open or yeah the last i guess i should say the first null spot in that array the first non non-filled in area non-filled and you can in like languages like c that don't really have the null thing that we've got on, like C-sharp, then you can basically just keep a pointer that keeps track of the last.
Starting point is 00:23:12 And whenever you remove an item, you decrement the pointer. You add an item, you increase it. But you basically just put it in the first available spot. This keeps the tree balanced. And then when you bubble up, you're only going to be doing a maximum of log n procedures. And once you find a value that is correct, you don't have to check anymore. You just need to go up until you don't have to swap anymore, and then you're done. So similar to the previous site visualization that we used from USFCA, there was another one that they had from MinHeap.
Starting point is 00:23:47 I mean, I know that Joe was wanting to focus on the max heap, but again, same thing, just reverse your direction for the comparison. So you were talking about the example you found was in regards to like a priority queue, but this one you can see the array and how it's building the tree and you see how it reshuffles the tree and the array as items get thrown in sweet so i'll be sure i'll include links for both yeah and i'm just gonna look up real quick so that equation was so i don't think i completely
Starting point is 00:24:27 had understood until i saw the visualization which is again i mean it's harder to describe some of these things but what you were saying that i that i finally realized was when you were saying you insert the value into that last position what it then does is look at the tree's parent its parent to see if it's bigger than its parent. And if it doesn't, it swaps it. So every time you're doing an insert, it could potentially look at its parent and swap with its parent all the way up to the root if that happened to be the case.
Starting point is 00:24:59 Well, and another way that didn't dawn on me until I saw the visualization too is that if you think about this in tree structure, it's always favoring the left side of the tree. If you visualize the tree. It's always favoring the left side of the tree until all of the nodes are full, and then it moves its way to the right before it goes on to another level. And by full, this is a binary. So it just means if there's two nodes, the move on to the next one. So,
Starting point is 00:25:34 and so as part of that favoring the left, that's when it'll make the decision of, Hey, do I need to swap with the parents? So, so by it going to the very end of, you know, the first null value in the array, that is going to be whatever the leftmost node available is on the tree.
Starting point is 00:25:52 And when it gets there, then it'll start comparing to its parent to figure out, do I stay where I am or do we need to swap? So this is the University of San Francisco website. And I got to give them props, man. They did some nice things with these visualizations. These are awesome. Yeah, I just looked up that equation. So in a zero-based array, if you give me any node index, so we'll say like number 7, then its children are going to be located at two times seven,
Starting point is 00:26:27 14 plus one. So the nodes and plus two, so that the, my children, if I'm node seven are going to be at 15, 16. And so if I'm at 15 or 16, I just reverse that equation and I'm going to get back to seven.
Starting point is 00:26:41 If you either, either one, 15 or 16, because I'm going to round down. So it's literally an O of one operation to go find the parent or the children. Yeah. Now you may have to find multiple parents in the case that you keep having to swap. But the worst case scenario, you're looking at the logarithm.
Starting point is 00:26:57 So with one million nodes, worst case scenario is 20 operations to remove the top node from this tree. Right. You're not going to get top node from this tree. Right. You're not going to get that with any other tree. Yeah, which is also swapping them as they go, right, basically. Yep. And it's keeping it perfectly filled in, too, the best it can, depending on the values.
Starting point is 00:27:15 Right. No, that's really cool. And I like that the whole way it kind of achieves that and is able to minimize those is just kind of by keeping it kind of closely sorted. It doesn't even have to be fully sorted. It's just as long as it's kind of close, then we can be a little loosey-goosey with how we rebalance and we save a ton of steps and we get you most of the way there and it works out pretty dang well. Very nice. So it's one of my favorites. So I was very happy about that. And so the pros for this one is you get a fast insert on average and O log N for the worst case. And if you remember the binary search tree, which is a thing that is often compared to, it's going to have O log N for the average case and O N for the worst case.
Starting point is 00:27:57 So much faster inserts than binary trees, binary search trees anyway. And you've got that fast removal of the root node, which you don't get in any other case, but is a very specialized need that you're going to have. Now, the downside is that slow search. At worst, you're looking through every single element. This is not a data structure you want to use if you're doing any sort of searching. You're better off with almost any other kind of tree
Starting point is 00:28:20 that we've talked about. So when should you use it? When you need to quickly insert data and you only care about getting that, either that min or talked about. So when should you use it? When you need to quickly insert data and you only care about getting that either that min or max value. And of course, there's other 18 variations, but for the most part, it all kind of just deals with how it calculates that most important object. And don't use a heap when you need to search for arbitrary values.
Starting point is 00:28:44 And of course, I mentioned that priority queue with the BFS. Do you guys remember the deal with the priority queues and the BFS? No. I think that's the one we said where, like, you're going to borrow money, and you don't want to go to the grandchildren. You want to start with the parents. So you, like, kind of go to your children first and say, hey, give me $20. You know, if your child says no, so you go to your next child and say, hey, you give me 20 bucks or else I'm going to go ask your children.
Starting point is 00:29:08 And they say no. Then you go to the grandchildren. But basically you start with the nodes that are closest to you. And there's no way to do that without having another data structure in order to keep track of what you need to go to next. And the most common solution for that is to use a heap. So here's a silly question. Is the.NET heap actually using a heap data structure?
Starting point is 00:29:36 Do we know? Why is it named that? You know, I didn't even think about that at all. So you would use it when you want to insert data quickly and you only care about the min or max value. So that's surprising to me. Oh, but how about this? If you just wanted to quickly get the max value being like the next memory address, I don't know how that would work with different sizes though. So I don't know. Yeah. I'm not sure. I was looking for it and i couldn't
Starting point is 00:30:07 quickly come up with anything but it just seems odd that it would be named the same you know yeah i bet it is i just don't know what the the specific the specific advantage of having it being a heap versus something else yeah so anyway i really wish I would have looked that up. Sidebar. Anyway. Okay. Well, if you know, leave us a comment. Chumac, I'm looking
Starting point is 00:30:30 at you. Really good comments. I'm not even finding the class. Yeah. There isn't one. Oh, okay. I thought you were
Starting point is 00:30:39 saying that there was one. No, no. I'm saying like when you talk about the stack versus the heap or whatever. Oh, you're talking
Starting point is 00:30:44 about the memory. The actual, what type of data when you talk about the stack versus the heap or whatever. Oh, you're talking about the memory. What type of data structure they use behind the scenes for the heaps. And why do they use that particular one? Yeah. Why do they use that name? So this question is really across any language then. Yeah, maybe.
Starting point is 00:31:00 Like the Java garbage collection. See, for example, if you allocate memory, you're getting on the heap and you're saying like, oh, why do they call it the heap? Is this some kind of relationship? Is this a heap data structure? Right, yeah. Yeah, I'm curious. I didn't find anything in my quick Googling.
Starting point is 00:31:16 So, all right. Yeah, that's really interesting. I really want to know. Like the whole reason you would use that to me is like you've got stuff coming in fast. Okay, we've got that we want it loosely uh organized so that we can do that quick insert and we never want to search i never want to search memory which seems right yeah it seems like it fits i don't just it's weird that it's got this special property where it's really good about popping off either like the min or max or whatever the most important value is yeah i mean that's the only thing that i'm not certain of is usually the whole point of the heap in in a managed language is the ability to quickly garbage
Starting point is 00:31:56 collect that stuff and i don't think it matters about the max or the min or any of that kind of stuff right it's just is it still used as the reference to it gone? If it's gone, clean it up, kill it. The way it constructs the heap would be interesting. So you allocate a 18 kilobyte string, then it's going to go ahead and put that in a lower spot. And so I would think that would always be arranged such that
Starting point is 00:32:19 basically either the most memoryful item is at the top or the least, either way. And maybe it does the least, and so it's easier for it to kind of go through those bottom nodes and look for things to clean up in a smarter way. I'm not really sure, but that would have been a really good thing to look up tonight.
Starting point is 00:32:34 Sorry. Hey, I got a question, though. You weren't about to ask something. No, no, I wasn't. So you said about if you had a million nodes in it, right? I want to make sure I understand. It wasn't that it was log of a million as is the maximum.
Starting point is 00:32:52 Correct? The maximum number of comparisons you would do is the logarithm of the number of nodes. Which is also the height of the tree. It's the same. But the number of nodes is not equal to the height. No, the height is the logarithm of the number of nodes. So if you have a million nodes, you're going to have a height of 20,
Starting point is 00:33:15 and that's the most number of comparisons you're going to do whenever you insert or remove a node. I wanted to make sure because I thought you said like logarithm of a million. I might have. And that's what threw me off. And I was like, wait a minute, really? But okay. I think I'm with you now.
Starting point is 00:33:32 This episode is sponsored by Discover.Bot. Discover.Bot is an as a platform agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. And on its own, a good idea isn't as powerful as it could be. But when a good idea is shared, then it gains strength and momentum. It becomes capable of changing things in a way that is both small and large. A good idea shared becomes an innovation. You got to say it with conviction. It gains strength.
Starting point is 00:34:15 And strength. Discover.bot aims to sit at the intersection of ideas and innovation. They want to help people turn their experiences, discoveries, stories, advice, and knowledge into part of a shared canon that moves everyone forward. For veterans and beginners alike, Discover.bot is a place for learning, teaching, and talking. Head to Discover.bot slash coding blocks. That's Discover.bot slash codingcks to learn about how to get started on your next great bot. Alright, it's that time again where I'm going to ask you and thank you for leaving us a review
Starting point is 00:34:52 because it means the world to us. It's the biggest thing that you can do and it's the thing that we want most in 2019 and every year prior. We want reviews. We really appreciate when we get them. iTunes, Stitcher, if you go to codingblocks.net slash review, we're going to have links to a couple of others.
Starting point is 00:35:07 It's really important for us to help grow the show and it really means a lot and it really helps us out. So if you could go there, leave a review, just a small one with lots of stars, that would be fantastic. Yep. And as we've said before,
Starting point is 00:35:22 Hey, share us with a friend, spread the, spread the word. We, we would with a friend. Spread the word. We would greatly appreciate that. And with that, I want to take a moment. We played this little game before, and I thought it was a lot of fun, about how much data is generated every minute. And there is a new version of it that's available.
Starting point is 00:35:43 And I thought, hey, we could play it again, but with the new version of it that's available and i thought hey we could uh we could play it again but with the new version so we won't oh man i should have paid attention to the old version yeah right right uh and and these are like new categories too some of some of them are new categories so uh i wasn't going to like repeat the ones that we've already done. So, every minute of every day for the year 2018, right? How many packages did Amazon ship? Every minute of every day. Wow. How many people are in the u.s 25 255 million 300 million i thought yeah somewhere in the block so i'm gonna say 500 million packages now every minute of every
Starting point is 00:36:36 day is that is that crazy all right okay let me do some maths. I'm going to go. Every minute. Let's say. Every minute. I'm going to go with a thousand packages. All right. Well, I like these answers. 50 million every minute. 30,000.
Starting point is 00:37:00 A thousand every minute. So we go from one extreme to the next. 30,000 for me. I think 30,000 a minute. Your spot. You one extreme to the next. 30,000 for me. I think 30,000 a minute. Your spot, you're really close, man. Really? Super duper close. Even by survey says rules, you would have won. The answer was 1,111.
Starting point is 00:37:17 Nice. Wow. Yeah. Every minute of every day. That's how many packages they're shipping. That is crazy. Okay. So we all love Giphy.
Starting point is 00:37:32 So hang on real quick there. Sorry. So I just wanted to see how much that was. And so that was 1.5 million. So that's like in a day. Sorry, in a day. So you kind of think about like in terms like what we said, you know,5 million. So that's like in a day, sorry, in a day. So you kind of think about like in terms of like what we said, you know,
Starting point is 00:37:47 300 million, that's, that's like every other person, half of the people in the U S get a package every day. That's crazy. This. Yeah, that's just,
Starting point is 00:37:57 that's all right. Sorry. It's kind of a big deal. I don't know if you've heard of Amazon. There's this thing called Amazon. Okay. So, uh, we all love Giphy,
Starting point is 00:38:09 right? Oh, man, I was so wrong. I'm so wrong, man. I'm sorry. What's 1.5? I was going to say, yeah. Half of people get a package every day. I'm so sorry. That's so wrong. I'm so wrong. Joe's kind of here tonight.
Starting point is 00:38:27 He's sort of partly here. It's one in 150 people get a package every day. So I'm sorry. And that's like third diversion. I'm just going to shut up now. Okay. Okay. So Giphy, big part of our Slack community.
Starting point is 00:38:48 We all, we all know it. Well, is it not Jiffy? Sorry, go ahead. Okay. So how many,
Starting point is 00:39:02 how many gifs do you think Jiffy serves up every minute of every day of 2018? Ooh. And this is, this is just through slack or is this Jiffy or Giffy or whatever it is? Just how much they serve up. Because even if it's coming from slack, they're still serving it. So anywhere.
Starting point is 00:39:18 Wow. So what was the parameters? What was the time parameter there? You said every minute, every, all of these questions are every minute of every day for 2018.
Starting point is 00:39:29 How many? I'll go with 10,000. 10,000. Man. 1,000. Wait, served up. Jeez.
Starting point is 00:39:44 20,000. All right, so $10,000 and $20,000. Yep. Well, I'm going to give it to Joe. He's technically the closest. Worked for it. He's technically the closest. You know I'm proud of you two.
Starting point is 00:40:00 I just got to say, because in the history of all these games, I don't recall you guys ever just gotta say because in the history of all these games i don't recall you guys ever like literally one upping the other person and saying like ten thousand and one you know i don't recall that ever happening but uh yeah one million three hundred and eighty eight 388,889 per minute? Every minute of every day of 2018. Wow. Right? That's nuts. Let's see.
Starting point is 00:40:34 How do they make any money? I know! Yeah, really. Okay, so, let's see. They own the backbone to the interwebs. That's why. All right. Docker.
Starting point is 00:40:51 Kubernetes. That's why. Cloud. All right. How much Bitcoin is created? New Bitcoin. Every minute. Oh, gosh. Two. Wait wait bitcoin still exists i mean they exist they're just not worth much
Starting point is 00:41:12 i'll go with three i'm gonna one-up you on this one three all right because not much happened just as soon as i gave you guys a compliment man yeah but you gave us a fractional question we can't do much with this one all right joe takes it again two to one wow it no no i'm saying your score is two to one joe's ahead two to one oh okay so it was less than two that the new bitcoin every minute of every day 1.25 dang it man wow all. I was actually surprised it was that high. Yeah. That part... Would you consider how much energy is wasted on creating? That's ridiculous, man.
Starting point is 00:41:53 Alright, go ahead. This is making me sad. It is. How about new Reddit comments? 1.21 gigabytes. And these are comments we're talking about. What's bigger than gigabytes?
Starting point is 00:42:12 I think it's 33 Hoobastanks. Great, Scott! 1.21 Nickelbacks. That's too many Nickelbacks, man. All right. So how many new comments on Reddit? You go first, Joe. I thought I'd answer. I'm going to say say one million no uh a hundred thousand per minute okay a hundred thousand
Starting point is 00:42:51 i go two hundred thousand joe is now leading the game three to one. Come on, man. 1,944. Oh, people need to start writing more comments. The world's not filled with enough vitriol so far. Let's get some more up there. I tried to give you guys a hint. We're not talking about upvoting something. We're not talking about a new post. We were talking about comments.
Starting point is 00:43:19 Man, it's Reddit. We should know. All of ours usually just have 10 negative ones on there. Yeah. Oh, man. I'm wondering now. i'm doing pretty good with this quiz like how did i not get that google job so many years ago they asked me how many ping pong balls fit in the airplane i'm like oh wait 300 was it a c-130 hold on hold on c130, 700. All right. How many Uber rides do users take? Or did they take?
Starting point is 00:43:52 Every minute of every day, 2018. 30. Every minute? I'm going to say 60. Come on. You know what this means, man. Doggone it. Did I just go down 4-1?
Starting point is 00:44:08 It's 4-1. Dang it. Wow, and look at what you're losing to. Just running away with it. 1,389 rides. That's pretty close. You're not thinking. 1,000 per minute, man.
Starting point is 00:44:24 You got to think globally, man. I did. We're talking every minute of the day. I thought 60 was pretty good. I was like, hmm. Yeah, I know. That's a lot of rides. Okay, if it were 60, how would people be able to do that as a job?
Starting point is 00:44:39 That's 60? People are traveling every minute of the day. I was trying to average it out to like 8 to 5, right? And then, of course, the drinking hours. So maybe 11 to 1. Yeah, I'm not so good with numbers past 100. All right. I'll give you one last of a new one.
Starting point is 00:44:58 And then I'm curious to see if you guys wanted to see the changes from 2017. So how many songs did Spotify stream? Per minute. Per minute of every day of 2018? 10 million. One million. The score is now 5-1. Yes.
Starting point is 00:45:30 Whatever, man. This game's rigged. 750,000 songs. Wow. Yeah. They need to step their game up. It should be closer to 10 million. All right.
Starting point is 00:45:43 So there were some curious ones in here too. Like, uh, I think we did the text one, one like text messages sent last time. Yeah, this was curious. Last time it was 15.2 million text messages sent for 2017. It went down for 2018. Is it because of FaceTime and, and hangouts and that kind of stuff?
Starting point is 00:46:02 They don't attribute it in this, in this particular, uh, graph I'm looking at, but it went down to 12.9 million. So I thought that was a pretty good drop, all things considered. People didn't stop text messaging. They're using Facebook Messenger and things like that. I guarantee you they're just not counting that. But it's not like Facebook Messenger just came out in 2018.
Starting point is 00:46:25 Like what new communication technology came out in 2018 that would have changed that? I think people are just moving away from SMS, period. I think they've started sending Giphy messages instead. That's probably right. That's probably right. Yeah, they don't have filters in text messages like my snaps do. I got a question i got a question uh same vein how many uh hangout messages are sent per minute in the year 2020 this is a little forward thinking yeah zero yeah
Starting point is 00:46:56 well what about aloe uh that's the thing that's replacing it yep that'll be a lot it'll be zero zero yeah that's right because they can that one too well what about google plus then oh yeah zero okay okay i'm beating now on three for three saying it i see where you're going give me another one i'll get this one. How many times per minute will I go to reader.google.com just to check and see? Just in case. Five. I'll say 0.5 per minute. Just in case.
Starting point is 00:47:33 They brought it back. Listen, if you're an intern at Google, I want to talk to you. I got a 10 spot. If you just plug that server back in. A 10 spot. You realize where they're working, they can't even buy a sandwich for that right yeah that's true here here was another interesting one uh that we talked about last time i believe we talked about the amount of data used that americans used and uh it was measured in gigabytes
Starting point is 00:48:02 and again it was still measured in gigabytes but uh some quick math here, and I can easily make that into terabytes for you if you'd prefer. So last time we said it was 2.6 terabytes every minute of every day for 2017. For 2018, it went up to 3.1 terabytes. So a 50% jump almost. Yeah, that's big. Yeah. And it's just growing yeah uh it's too much people it's just too much yeah scale it back well i think did we talk about the youtube one was that one that we talked about last time call maybe i don't know hit us with the stats, quick. YouTube videos watched per every minute of every day for 2017 was 4.1 million.
Starting point is 00:48:51 But for 2018, it went up to 4.3. So not that big a jump. That's already everyone in the world watching YouTube. That also corresponds highly with how many toilet flushes per minute. So strong correlation there. Anyway. Wow, this took a dark turn. We'll never play that game again.
Starting point is 00:49:22 Those balloons popped on balloons. Yes. All right. So, uh, with that, let's head into my favorite portion of the show. Survey says, all right. And,
Starting point is 00:49:36 uh, so we won't have a survey to report on from, you know, a previous survey to report on because, Hey, guess what? I said it was January, but actually I was lying.
Starting point is 00:49:45 We're recording ahead. So that's why Joe was talking about Advent calendar stuff in January, if you were curious. See. So, but we do have a survey for this episode, which is how much time do you spend coding outside of work? And your choices are nada. I don't write code for free. Or, until I go to bed.
Starting point is 00:50:14 Or, I take the weekends off, otherwise I'm writing code. Or lastly, like a sine wave, I go through phases. Sometimes more, sometimes less. This episode is brought to you by Clubhouse. Clubhouse is the first project management platform for software development teams that brings everyone together so that teams can focus on what matters, which is creating products that their customers love. While designed to be developer first, and that's important,
Starting point is 00:50:43 it's designed to be developer first. The UI is important, it's designed to be developer first. The UI is simple and intuitive enough for all teams to enjoy using. Now, when I mentioned that the UI is developer first, here's how developer first it is. There is a button on your story that you can click to actually get hints for how to create your branch based on the story point. Yeah, the UI is phenomenal. You log in and you can immediately see your work queue, your active tasks, your upcoming due dates, and your activity feed. Yeah, it's easy for people on any team to focus in on their work on a specific task or project
Starting point is 00:51:21 while also being able to zoom out to see the work that contributes to the bigger picture using the fast interface. With a simple API and robust set of integrations, Clubhouse also seamlessly integrates into the tools you're already using every day, like Slack and GitHub, for example, and getting out of your way so that you can focus on delivering quality software on time. And you could sign up for two free months of Clubhouse by visiting clubhouse.io slash coding blocks. And again, that URL is clubhouse.io slash coding blocks and you get two free months and you can see why companies like Elastic, Full Story, and LaunchDarkly really love Clubhouse.
Starting point is 00:52:03 So I want to talk about tries and i'm going to column tries even though it's somewhat controversial looked it up on try as you may tries to spell t-r-i-e-s and if you look it up in wikipedia or other spots you'll see that it was developed by somebody and you know like a million years. And it was named after the word retrieve tree. The person was French. So this is a tree that's spelled like tries and is technically pronounced trees and nobody pronounces it that way. So we're going to call it tries. We're going to call it tries.
Starting point is 00:52:38 We're going to be wrong. Just like everybody else, because that's how language works and English works. And actually we're going to get into that. I mean, just look at Jeff's. So tries also have other names and sometimes the names are kind of specialized versions. Like you'll often see them referred to as Radix trees,
Starting point is 00:52:59 which are technically a more specific version of a tribe, but whatever. We'll just want to go ahead and mention Radix trees, prefix trees, digital trees. If you see any of those things, you're all in the same ballpark. And they are a specialized tree where the nodes don't matter. That, like I saw them on Wikipedia,
Starting point is 00:53:18 it was like instantly derailed. Like, what do you mean the nodes don't matter? Right? Yeah. Yeah. And this is one where you can stash some data in the nodes if you need to do some calculations or you need some metadata or something that's specific to your kind of business-y domain kind of use case, whatever. But really, the data that we care about is going to be associated with the edges. So like a binary tree, you would have like a left and a right.
Starting point is 00:53:42 Or in a B tree, you would have an array there of children. In this case, you would have, I don't know what the data structure would be, but I would think that I would do some sort of hash table here where I have a key that represents each of the edges. And then the value in my hash table would be a node where maybe i keep some other stuff and it more importantly has another hash table with these keys and so this seems pretty goofy and it doesn't even sound that different because it's one-to-one right aside from the root, every single node has one edge leading to it. So if this is one-to-one, why do we even care
Starting point is 00:54:31 if we associate that value with the edge or if it's just on the stupid node, right? What's the difference? I mean, this sounds kind of like a directed graph example. That's what it sounds like. Yeah, where it's weighted. We've got weights associated with those edges, where it's weighted. Yeah.
Starting point is 00:54:45 Yep. Yeah, we've got weights associated with those edges, and it's important because they may not necessarily be the same value. So, one direction is five, the other direction is seven. So, it's really important to have those on the edges. And in this case, it's a little bit different. But the reason that we don't say that the value is associated with the node is because the nodes don't really have any sort
Starting point is 00:55:06 of importance aside from notating the existence of the edge. And so if I pluck a node out and then re-add it, it's not even, it's not even cynical in this case because we don't care about the nodes, we care about the relationship. So it doesn't make sense for me to say, hey, give me a node out of it and put one back in because I'm not really doing or adding anything. It's care about the relationship. So it doesn't make sense for me to say, hey, get me a node out of it and put one back in because I'm not really doing or adding anything. It's all about that relationship. So I don't say, hey, here's a node. I say add a relationship for like the letter 7 or the letter S. And actually this is frequently associated with alphabetical type things.
Starting point is 00:55:46 Yeah, it's so unfortunate here because the way the Wikipedia article versus, like, going back to the Impostor's Handbook, for example, like, their visualizations are in contrast to one another.
Starting point is 00:56:03 Oh, really? Okay. Yeah, because the way... Okay, so let's look at the Wikipedia article for a moment. So they're consistent with how you're describing it, that the edge is what the value is, and the node is the relationship. So if you were to walk down in this particular... You'll have to...
Starting point is 00:56:22 Dear listener, there'll be a link for you to click on. But, you know, you can go down that the topmost parent node is always going to be blank. If we think back to the example that he gave where it's like words and alphabetical kind of use cases, right? So no letters. So parent, then you find your first letter. So maybe you go down to T. So you're at that first node level there for T. Then you're like, okay, well, the relationship to E.
Starting point is 00:56:54 So T represented the line that pointed to that first node. Then you say E as your second letter. There's a line that represents that. And the next node down is the relationship of a particular word. And so you can formulate words by traversing the tree in a particular path if there is such a word, right? You know who has a good visualization? Is it California? University of San Francisco. Yeah, I'll drop it in here.
Starting point is 00:57:50 And so I guess the part that made sense was kind of what you were just saying is you end up with a tree of a word. Like the interesting thing is if you go to this, this particular visualization on the try or tree is if you try and punch in a number and hit insert, it doesn't work. If you use letters, it'll draw a node. And so I was really confused because I did ABC and a created separate nodes. It's not until you do multiple letters that you'll see the tree grow. So basically if you have 20 letters, you'll have a 20 depth tree, right? Or 21 nodes actually, because you're going to have your root node and then the 20 additional child, child, child, grandchild, great grandchild, et cetera. See, this is also still so confusing. Their representation of it matches the imposter's handbook. So you can draw it either way. But the reason I like the definition of having that stuff sitting on the edges is because later when we're going to talk about briefly about compressing the trees, that's where you basically take any node that's parent only has one child and you compress it. And then in that case, you're changing the
Starting point is 00:59:07 value that would be associated in the node. And that's different from every other tree that we've talked about where you give it a node that has a value and it does something with it. In this case, the tree actually owns the node's values. So you wouldn't even, you don't even provide the nodes to the tree you give it a word so like even type in like abcde that's not really a good use case for this try typing in or reset the tree type the word code enter it then do coder enter it then do coding enter it then do i don't know cod enter it and what's it's going to create a tree based on those letters that kind of um well it was the commonality being a big prefix right it'd be better if you know you had too so many that were like the same word but like if you did like impose impasse imposter right like you
Starting point is 00:59:58 would see some variation among that tree like they the the top of the tree would remain the same I and P, for example, but then you would start to see it differentiate below that. Yeah. And in the case where you actually compress that tree, you could take that I and M and say, you know what? Every single child from here has I and M as a prefix. Why do I have two nodes for this? I can save space by saying this node is I M. And then I don't split that node up until I get some reason, you know, some other combination there that starts with an I. And that compression is actually really important to saving space and also speeding things up. And that's why it kind of boggles my mind to think about the nodes having values. But, you know, I think either way you slice it, it's the same thing.
Starting point is 01:00:41 So conceptually, it doesn't really matter if too much if it's the edges or the nodes, because the tree or the try in this case, owns the values of those nodes. So it is responsible for changing it also dealing with the implementation details of like, finding things or adding things to it. And so you're pretty far removed from the actual values of the nodes, and you don't really get to do much with it as a user of this data structure. Now, kind of going back to what we hinted at at the very beginning, though, when we were talking about the space implications of this, and you kind of mentioned this, too, in regards to the compression. The space complexity of the try is where there can be significant gains here
Starting point is 01:01:25 because you're not having to restore the entire word. You only have to store parts of the word. Yeah, so the example I said, like code, coder, coders, coding. I think I did a little bit of math down here. Once again, I'm lost in my examples. Yeah, well, we'll get to it here in a minute. But basically, you don't store the letters uniquely, but you do store the combinations uniquely. So like the example of code, coders, coding, whatever,
Starting point is 01:01:56 then the COD prefix is only stored once for all of those different words. So it's much, much less space requirements than storing all those words inside an array or dictionary. But you've got to remember there's a pointer that's going to be going to each one of those. So if your data doesn't have a lot of duplication, then it could potentially actually be worse than storing in something like an array or a hash table.
Starting point is 01:02:23 And I've got an example that I think is going to help a lot here. So if you think about a spell checker for like a, you're working on like a word document or something like that, and English is a mess, bringing it back around. It doesn't really have very good rules. Like, you know, we can say Q is always followed by a U. I is not always before E because there's that weird C case. And Y isn't always even a value. And even aside from that, there's all sorts of words that we're adding all the time that have numbers in them. And there's actually words in the dictionary that are multiple words. They have spaces in them.
Starting point is 01:02:59 They have accents in them. We have words from other languages. There's just so much crazy stuff that you can't really define the English language with just a simple set of rules. And so what typically companies like Microsoft or whoever makes the grammarly makes these dictionaries and spell checkers is they just have a big fat dictionary with all the words. There might be some shortcuts there, but that's kind of the gist of it. And so I went to look up how many words there were in English. And just like anything else, you know, it's complicated. Is run and runs the same word or not? Different people counted different ways and things get weird. This dictionary in 1970 had this many, but this
Starting point is 01:03:39 other one had another. But the best number I could kind of come up with for our case where we do count things differently is about a million. And that's why I knew offhand that 20 was the logarithm of 1 million because I had looked it up for this example. For a build and a spell checker. Yeah, the logarithm is
Starting point is 01:03:57 20 of 1 million. No, it would be 20 levels. 20 depth. Which also happens to be the logar. 20 depth. 20 depth. Which also happens to be the logarithm base 2 of 1 million. No. Yeah. Oh, wait.
Starting point is 01:04:12 Maybe I'm thinking base 10. Yeah, base 2 of 1 million is 20. Actually, it's 19th century. 19th century, yeah. So, if we were to get a big array keep it sorted because we know we can do that binary search on there we're going to have a million values in there and i looked up the average size of the word and i actually i was curious just that the overhead is string so i looked that up too and john ski actually had a really great stack overflow uh overflow question that he answered
Starting point is 01:04:41 and came up with some numbers so i actually put that together and I figured out that if you were to store the entire English language in an array, you'd be looking at about 30 megabytes or 32 megabytes, which is actually not that bad, except you think it's, you know, text. So that's kind of crazy. But that means in that big array that, you know, the first word is probably a, and the second word is probably aardvark and the third word is whatever. Uh, obviously there's a lot of repetition. You know, I mentioned code coding, codes, coder. Um, there's a lot of the, coders, coding, then, uh, let's see. I got the math in here somewhere. Okay, here we go. So if we were to store those words separately,
Starting point is 01:05:35 we'd be looking at a total of 26 characters, but using the try, it's only going to be 10. And you can imagine that's just a contrived example. If you were to think about every word in those 1 million, there's a lot like automobile, automaton, automation. How many cases are there where there's a significant prefix that could potentially be reduced? And aside from the size, that's really only kind of a side note. That's not the primary reason to use this data structure. It's nice that you can save a lot of room when you've got redundant data, and it's definitely important. But it's also really efficient for looking stuff up in certain cases.
Starting point is 01:06:16 Like we mentioned, this is a spell checker, right? So if, again, our naive hash table or array example, you hit save or you type in a word, we would see the word break. We would go through and we would do a binary search and it would take a logarithm amount of time to figure out whether or not your word is in our dictionary. We could give you the squiggles or not. That's a fast lookup. I can't dog on the searches for that, or the hash tables and the arrays, because they do a really good job at just using a lot of space. However, if we were to do this with a try, first, I don't know what it would be. I'd have to run actually a dictionary of a million words through to figure out what the size was. But what's really cool about it are the other benefits. Like because I've got this thing
Starting point is 01:07:05 stored in order of the word, I can tell you as you're typing, as soon as you type a wrong letter. So you start typing C O D F. That's when you get the squiggles and the time complexity for me to do that is constant. So it's O of one. It doesn't get any faster than that because I should say I was kind of using the example where you look up per letter and kind of as we go along. But ultimately, a better way to say that is that the search time is dependent upon the size of the word that you're searching for. So that's C-O-D-F. I'm looking at four as I start at the C. I go down to my child, which, remember, has the key of the letter. So O-D.
Starting point is 01:07:59 And then when I see that there's no child with the letter of F as the key, then I know that's an invalid word word and I can duck out right then. So as you're typing. The next lookup is constant time is basically what we're saying. It's for every key you're getting O of one lookup. So number of letters. And then you get down to that last one and you have immediate results because you had the pointer to that last character that you had done. Yeah.
Starting point is 01:08:24 It's trivial for me to look up as you're typing to see whether the word is misspelled or not. And from there, you can see it's not too crazy to say, like, you typed C-O-D-F. We don't know what that is. However, here are a couple words that you might have meant. Coder, coding, coders, whatever, because those are the things that it's easy for us
Starting point is 01:08:41 to kind of traverse the tree down and see what you might have been looking for instead. And you can imagine some of that metadata that I mentioned storing might involve storing some of the most commonly used words that involve that prefix. So just thinking out loud here, this is actually better than a hash table because your hash table example works great if you're searching for a complete word. But if you're doing a partial word like COD, it falls apart because now you've actually got to search the hash table. You're scanning the table as opposed to going directly to a key, right? So this is actually better if you're trying to do a partial lookup on something.
Starting point is 01:09:25 Yeah, I mean, you pretty much can't do it with a hash table. You can't iterate through your keys unless you've got a secondary data table, in which case you're cheating. So I'm not counting that data structure. But if you've got an array, you can do it, but you just have to start with the CODs and then loop through all of them until you get to COE or something that's past that, which is a big pain in the butt. And it's so slow there because you're looking at an order N operation, especially that you type that first letter A, then that's going to be miserable in an array or data structure. So you just don't do it. That's why this is a much, much better data structure
Starting point is 01:09:56 specifically for that type of thing. That's pretty cool. And it's often used for autocomplete, autosuggest type stuff. And English is a great example because it does have a high amount of redundancy. And so you can get that kind of savings. And you can see how it's almost got this kind of like notion of compression kind of built in, right? Because, you know, every word that comes in, we've keep the prefixes only once. And so we're saving that data size, but we also save that when we're doing our lookups. And it's also really easy to iterate through. Like we mentioned the hash table. Like if you want to say like, oh, let's start with the COD key or any keys like COD and then go to the next, go to the next, go to the next, you just can't do it. They're not sorted like that. You have to store those in another data structure and then look them up that way.
Starting point is 01:10:45 But this data structure is in alphabetical order by kind of definition, right? So I could start with C-O-D-I-N-G, and as long as I insert those values to the right every time one comes in or we keep those letters in order, then it's really easy to say, okay, well, what are the next five in alphabetical order? That's pretty nifty. So I did a little bit of research on this, and I did see that there
Starting point is 01:11:09 is one better algorithm. They actually notated this in the wiki that said actually, if you're doing spell checker, this is a great example for tries, but there's actually one better, more specific data structure. So I thought that was kind of cool. I didn't look too much into it, but I think that what it does is basically it's more finely tuned for kind of keeping track of common terms
Starting point is 01:11:28 and stuff like that. So it's got some built-in support for doing that sort of thing. But the name is pretty funny. It's DOFSA for short. So if you know people referring to it as DOFSA for short, then the real name has to be just miserable right it's a deterministic acyclic finite state automaton so if you're doing a spell checker autocomplete type thing then that's probably what you're going to want to look at you're going to need a spell checker just to spell this thing yeah yeah it's like i mentioned too like tries can compress really well so our example the words that i gave coder coders coding whatever they always had the first three letters like well what if our data structure saw that and said okay well you know what uh let me just go ahead and
Starting point is 01:12:18 combine those because c only has one child the o only has one child the d and it's not till we get past d that there's any sort of variation so let let's just pull them all in together. So I've got one node that's now COD, and then the things branch off of that. So now I've dramatically reduced the size of my tree and the number of comparisons that I need to do. And they say that in practice, this works out really well. So I came up with some contrived examples like in my english language dictionary probably not good because that's like every word combination is a bunch of bad ones but you know the q and u i know that there are words that are q and not followed by u but if there were that would be a good example of something that we can press or if there's like say anytime the word ends in y that the the letter n
Starting point is 01:13:03 comes first or something like that depending on your your set of words, then that's a shortcut you could take. And there's some really interesting math behind that. Where I was thinking, though, that it might happen is maybe more towards the end. Once you get more towards the end of a particular path on the tree, that's where you might have compression more often, right? Oh, yeah. At least in our spell check mean at least it makes sense yeah and actually this is used a lot of times for um things like ip addresses or phone numbers or other kind of areas where um
Starting point is 01:13:35 you know the phone number obviously the case is going to be we're going to have a lot more area codes that are the same depending on your kind of purpose like if you're operating like in a local area but the ipaddress is the opposite where... Oh wait, no, that's not true. No, like router lookup. The prefix is often similar. But yeah, in English, I think the ends would be there. They also use it for binary, so it's
Starting point is 01:13:56 not just for English language, but that's kind of the most common example just because it's easy to give. Yeah, that's cool. And the compressive strategies do get complicated because it also means that whenever you're looking up values that you have to kind of know that there could be more than one. You need to know to split stuff up when you get new
Starting point is 01:14:13 terms in. So that's not always so great. It's still pretty cool. When do you make the decision that, okay, now is the time to compress? But I guess, like you said, those compression strategies can get complicated. And I guess deciding when to compress is part of it. You could do as you go along.
Starting point is 01:14:32 But you can imagine if you're in a case where you're loading a dictionary for the first time. Yeah, that sounds awful. That would be terrible because you're going to be compressing. Yeah. So it's something you want to do periodically whenever you get like a new batch of data in or something. Yeah. So as far as pros, this is often used in place of a hash. It's got a faster worst-case lookup than an imperfect hash table,
Starting point is 01:14:53 and perfect hash tables are kind of hard. So it's a little bit faster in cases where we know something about the data, or funnily enough, in cases where you don't know anything about the data, and so we can't guarantee a perfect hash. There's no, you know, bad hash functions that we have to deal with. We don't have to worry about that. It's, we don't have a hash function, which also means we don't have to worry about collisions. So it just kind of takes out some of the overhead and that overhead that I'm talking about isn't
Starting point is 01:15:20 even included in the, the big O times because they kind of consider that constant. So it's just kind of a nice reduction of overhead for a try over a traditional hash table. And you can get that ordered listing of values which you cannot do with a hash table at all. So if you need that, you need to be able to kind of like find a key and then move on past that, then this is your data structure and a hash is not all. So if you need that, you need to be able to kind of like find a key and then move on past that,
Starting point is 01:15:45 then this is your data structure and a hash is not a way to go. And despite those pros, it can be slower than traditional hash lookups, particularly if the random access time of the medium is bad. And this is actually a case where they said in spinning drives,
Starting point is 01:16:00 tries cannot be so good because they're not so good at hopping all over the place like you can with SSD or RAM. So the physical medium actually has a bearing on this. And it's not good when there would be long meaningless change. So we talked about storing numbers in this
Starting point is 01:16:17 and binary even, and that's fine. But if you've got floats in there where you've got these long kind of meaningless changes where chains were like 2.333 isn't really different from 2.3 then the try is not going to help you on that the data structure has any doesn't have any sort of support for kind of recognizing those two things kind of mean the same thing and just ignoring so not good for floats it can end up taking more space in a hash table in certain situations because of those pointers
Starting point is 01:16:47 there. So if you've got a lot of values with no redundancy, then you're going to actually end up spending more space than you would in a traditional hash table. So ultimately, you know, you should use it when you need those fast inserts and you need efficient sorting and you've got lots of redundant data. And search
Starting point is 01:17:04 engines actually use this really, really effectively a lot of times for like auto completes and stuff like that. All right. Well, with that, we'll have a lot of resources, a lot of links for you in this episode. So be sure to check out the show notes and you can see the resources we like section there. They'll have all kinds of references for you. And with that, we can head into Alan's favorite portion of the show. It's the tip of the week. All right. It looks like I'm up first this time. And so I was kind of thinking about maybe putting some stickers on my laptop, but there's this kind of this weird phase you go when you start
Starting point is 01:17:40 stickering a laptop where it's like you put one on it's fine then you put two on it you know it's just kind of sometimes this is like this little trash trashy trashy uh in between period where like you don't have a lot of stickers on but it's kind of like you know it's a little bit and maybe there's like one that's overlapping another and it's just this awkward like kind of adolescent phase that things go to and you just need more dev stickers but sometimes you know you go through a dry spell and so i was looking on amazon and i noticed that you could just buy tons of dev stickers so we're gonna have an affiliate link so if you click this link we'll get like you know a nickel or something and uh there's a ton of stickers that you can buy for like eight bucks ship free on prime and it's good stuff there's like lin of stickers that you can buy for like eight bucks ship free on prime.
Starting point is 01:18:25 And it's good stuff. There's like Linux stickers, get stickers, Java stickers. There is no cloud stickers. And some of the really funny stuff, PHP, geek inside,
Starting point is 01:18:36 like a whole bunch of stickers. So you can get like nine bucks. And of course, if you keep Googling and like looking around, there's like actually tons of stickers you can buy on Amazon. And some are like really cool. Like, I don't i don't know i guess i'm out of sticker game and have been since like elementary school um but there's a lot of cool really vibrant neat stickers i'm seeing like puffy stickers like there's freaking hologram stickers no js stickers
Starting point is 01:19:02 all just all sorts of stuff so they're really cheap and just go get some stickers. If you want to, if you want to do that and avoid that kind of like that weird kind of, you know, awkward phase with your laptop stickering. But I mean, we, we know everyone's favorite is going to be the coding box sticker.
Starting point is 01:19:20 Yes. Oh yeah. And speaking of which we, we forget to mention this all the time. If you want a coding block sticker,locks sticker, drop us a note. Head up to CodingBlocks.net slash swag and drop us a note, and we'll get some stickers out to you. And with that, nobody's stickering up their laptops. That's crazy talk.
Starting point is 01:19:39 Says the guy who just put a sticker on his. Well, I mean, it was an important one. It's the CodingBlocks one. It's the coding blocks one. It's the only thing that I deface my laptops with. I finally saw a sticker in mine. That's ridiculous. Wait, that's the only one? I know.
Starting point is 01:19:52 It's the only one. You put a sticker mule? Is that the only sticker that you put on there? You didn't put the coding blocks? I'm going to send him some goop off. That's ridiculous, man. Actually, I bought the sticker pack from dev.to. So I was thinking about going all the way, but I just wanted to kind of try one and see if it was going to rub off or whatever.
Starting point is 01:20:08 But actually, it's been great. So I think I might go all in. I have an idea of a sticker you should put on your laptop, sir. Everyone knows that a coding box sticker would be more than worth a thousand nickelbacks. I have about a thousand cling box stickers in my closet but i don't want to waste one you know that's man that's it we have a talk with this guy this relationship is over all right all right i've got two one tip is because i literally experienced some massive pain in the past couple of weeks. So quick background,
Starting point is 01:20:47 had a console application that was supposed to just do some stuff, right? And it's a process that could be long running and it just goes and goes and goes until it's done. The problem is it didn't go and go and go until it done. It go to go to go to until it stopped doing anything and it would stop logging. It would stop telling you anything. It would just stop, but it didn't, it didn't quit. It just didn't do anything. So what I found through much heartache and just hours and hours of boring looking is there is an option on windows when you open up a command prompt window if you right click that thing and you go to properties there's something called quick edit mode if that's turned on it's really cool because it's what allows you to do like block copies inside a
Starting point is 01:21:40 console thing so imagine you're tailing an output of a log file that streaming stuff across. If you click in there, it'll actually pause the windows so that you can, you know, copy something. And then if you hit a key afterwards, it'll keep going. The weird part is this was running as a background task. And for some reason it would would act like quick edit mode had engaged and it would just stop. So that said, I did find something to where if you write console apps that are going to be doing long running processes or something like that in C sharp, there is a stack overflow here. And I wanted to give this one a special shout out because the guy that put the answer up here that is really awesome, very elegant, it works perfectly. Like I can launch it and I can go look at the properties of the window and I can see that quick edit mode is disabled.
Starting point is 01:22:33 This is the guy who wrote independent. So Patrick, the guy that we've talked about his product on here before, put one of the just – it's just a beautiful answer. He's got the entire solution that he's got right there. And a huge thank you for putting it together because it saved me some pain because you actually have to reach into the underlying operating system, DLLs and stuff to, to do these changes because it's not exposed natively through the.net
Starting point is 01:23:00 framework. So really cool stuff. It's also worth pointing out though, sorry to interrupt. It's worth pointing out that hit that answer from him. While it is not the accepted solution answer to the question, it actually has more upvotes than the one that is given the credit. Yep. The one that was given the credit had the nuts and bolts information,
Starting point is 01:23:22 but you would have had to have done quite a bit more research on your own to, to put this thing together. And he was like, you know, I really like to be able to copy and paste code that works. So here you go. So that, I mean, just awesome answer. So thank you from Patrick who did that. It's amazing stuff. And he does give credit that his solution was inspired from the original. Yep. So if you ever run across that, like seriously days of my life wasted, I'm not exaggerating. It was,
Starting point is 01:23:49 it was highly frustrating. So the next thing, so I love programming that, that solves some sort of unique problem. That's fun, right? Like I like big data problems because to me that's unique and it's, and it's interesting and they're hard to solve and it's challenging.
Starting point is 01:24:07 So probably a lot of you guys haven't even heard of this guy, a guy named Mark Rober. He literally wrote software and did engineering project, or I don't guess he wrote software, but he was an engineer for NASA, right? He did rocket science type stuff. Well, this dude has one of the most amazing YouTube channels that there is period bar, none, just really cool stuff. But there's one specific video that you absolutely must go check out where he created something and they had to write some code to make this happen. And I'm not going to spoil it. I'm just going to
Starting point is 01:24:43 leave the link here in the show notes. And you absolutely need to go check this out because it will, beyond a shadow of a doubt, it will put a smile on your face because it is just, it's creative, it's genius, and it's where I like to see engineering minds go, right? Writing code and writing and creating things that solve unique problems.
Starting point is 01:25:05 Can we at least give a hint that it's a revenge story? That's perfect. Yeah, we can leave it there. Yes, it's a revenge story. Okay. All right. And so with that, hey, if you have a tip for us that you would like to share, you can go to Joe's favorite place on the internet, cb.show slash tips and submit your tip for the tip of the week there. Because I bring that up
Starting point is 01:25:34 because I pulled from that and want to say thanks to the pigeon fighter because the tip of the week that I'm giving this week is the semantic colorizer plugin for Visual Studio, Visual Studio 2015 and up. And this editor extension will give you distinctive colors. So what's really nice about it, really, aside from like your normal kind of keyword type of colorization, the thing that really draws me to it that I like is the colors for things like parentheses and curly braces and things like that and brackets and whatnot. And I think we've talked about a similar extension for Visual Studio code called, I think it's Bracketizer. If I remember right. So if you're familiar with Bracketizer, this would be like that, but for Visual Studio. Very nice.
Starting point is 01:26:31 Yeah, it looks like it does a lot more. Super cool. So again, thank you to the Pigeon Fighter. All right. So we hope you've enjoyed this episode on heaps and tries. And as Joe mentioned earlier, if you would do us a favor and leave us a review, we'd greatly appreciate it. You can find some helpful links there by visiting www.codingbox.net slash review, as well as we would appreciate it if you spread the word,
Starting point is 01:27:06 you know, tell a friend. You can subscribe to us on iTunes, Stitcher, or more using your favorite podcast app in case somebody did happen to share a link with you and you're listening to us that way. Definitely. And while you're up there, make sure you check out our show notes.
Starting point is 01:27:19 I mean, put a ton of time and share all the information up there with you. And also check out our examples discussions and more and send your favorite questions right to the slack channel going with the select the comment be sure to follow us up to our account go ahead over the box you can find the social links at the top page you got a mouthful all right sometimes i get excited then i go a little fast we got one and a half hoop stanks down turn it down so what's bigger a hoop of stank or a nickelback
Starting point is 01:27:49 that's the show title right there 1.21 hoop of stakes great scott

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.