Coding Blocks - Data Structures – Heaps and Tries
Episode Date: January 21, 2019. We dig into heaps and tries as Allen gives us an up-to-date movie review while Joe and Michael compare how the bands measure up....
Transcript
You're listening to Coding Blocks, episode 98.
Subscribe to us and leave us a review on iTunes, Stitcher, and more using your favorite podcast app.
And check us out at CodingBlocks.net where you can find show notes, examples, discussion, and more.
Oh, you're trying to mess it up for all the two-times listeners, huh?
There's been some controversy lately, so I just want to, you know, buck with the status quo.
Alright, so I'm going to go the way that I believe that I sound.
Send your feedback, questions...
I can't even listen.
That's too painful.
...and rants to comments@codingblocks.net.
Hey,
did you,
okay,
hold on. And follow us on Twitter at @CodingBlocks or head to www.codingblocks.net
and find all our social links there at the top of the page.
I'm Allen Underwood.
I'm Joe Zack.
And I'm Michael Outlaw.
This episode is brought to you by
O'Reilly Software Architecture Conference.
Have you got any plans this February?
No?
Well, now you do.
This February, the 3rd through the 6th
in New York City, O'Reilly is hosting the Software Architecture Conference. I mean,
you wanted an excuse to visit New York City anyways, right? So, I mean, you're welcome.
The O'Reilly Software Architecture Conference is the only conference that focuses exclusively on
software architecture and the evolution of that role. Dive into the technical weeds covering complex topics from microservices to domain-driven design, with different learning styles available, ranging from 50-minute sessions to two-day training courses.
Learn how to communicate, wait, scratch that, sell complex technical topics and their merit to both upper management and technical teams with training courses like the Architectural Elevator.
Well, wait, I'm not an architect yet, you say.
Well, a special networking experience called Architectural Katas is where you can go to practice being a software architect by breaking up into smaller groups and working together with other people on a project that actually needs development.
Yeah, network with people using the same tech stack that you use to gain personal insight
that you can apply to your own environment.
The O'Reilly Software Architecture Conference covers the skills and tools every software
architect needs. Use the code BLOCKS, that's B-L-O-C-K-S, during your registration to get 20% off of most passes
to the O'Reilly Software Architecture Conference.
By the way, because we did all that, did you guys ever watch the, what's the new movie
that came out?
It's got the rabbit, and the fox, and...
Oh, yeah.
Into the Spider-Verse.
No, come on, man.
Once Upon a Deadpool?
Man, you guys... Bumblebee?
Officer Hopps.
Anyways, there's a scene in this movie that I cannot think of that is absolutely amazing. And it's, they go to the DMV
to get some information
and the people working behind
the counter are all sloths.
It's my favorite part of the
movie. It is so painful.
And it's so slow. So that's amazing.
And that movie
is, should I tell you?
Or should I leave it as an exercise for the
listener?
Zootopia.
I got it.
Wait a minute.
My movie didn't just come out.
Well, you know, relatively recently. Hold on.
Within the past decade.
We're good.
Your movie reviews don't count anymore.
No, it's pretty terrible, right?
All right.
So a little bit of podcast news coming up here.
So first, big thanks to some iTunes shoutouts, iTunes reviews.
So shoutout for Pendelgeist and Bozzoli.
Bozzoli, sorry we were late.
We appreciate your patience, and thank you so much for installing iTunes, leaving the review, and then probably deleting iTunes and telling them in the review that you're going to do it.
That's awesome.
That was baller.
That was boss.
So thank you very much.
Mike, you want to do these?
Okay, sure.
From Stitcher.
Now, I totally didn't take a moment to practice like my normal practice,
so these are going to come out really raw.
JBZ Cooper, Terrence,
and Saltire
Steve?
Saltire
Steve?
I would have said it kind of like satire, so Saltire
Steve. Okay.
I think that's how I would have gone with it. A little salty
satire, I like it.
Alright, a few things I want to mention.
One, I gave up on Advent of Code.
I hate giving up on coding challenges. I feel like
I'm giving in. I mean, I tell you, I spent
eight hours plus on
problem number 15, so I apologize to everyone
on the Advent of Code channel.
I'm throwing it in.
That's December's news.
Who cares? It's January.
It's really cool coding challenges, though.
I got to do a lot of cool A-star algorithms, circular linked lists, and a lot of cool data structures and things like we would talk about on the show. Like, the next problem was, oh, this is a heap, so that was really convenient. So I thought it was really cool, but when I added up the time on it and realized, oh, I'm going to be spending 50 hours this month on these problems, I gave up. But I'm curious to know if anyone else did it and what their experiences were. And, you know, obviously the problem wasn't designed to take eight-plus hours. I'm just a dolt and couldn't figure out how to get that last one over the line there.
Wait, wait, are you saying adults have no capability of doing these things?
Oh, I meant dolt, like D-O-L-T.
Got you. Yeah, 1993 called and they want their insult back.
I thought you were basically saying fresh college grads would get this in a heartbeat.
And of course, you're in.
Okay.
All right.
Not that.
I'm just old.
Yeah.
So I don't know.
But I would be interested in finding more things like that that aren't so time intensive.
Because I'd like to be able to do challenging problems that make me feel accomplished when I finish.
But I can't be doing two hours of homework every
day and going to work, right?
That's a bit of a crazy.
So the code katas
on Code Wars,
you're past that now?
No, I still really like those, although I haven't done them
in a long time. I've been slacking and I need to get back to it.
But I like those because so many of them, you can set
the difficulty level. So you
can do 10 minute problems or 20 minute problems
or hour. I've never tried to go
super hard on the problems but I like the idea
that you can kind of dial it back. And
if I disappear for
a year and come back to it
then I can do that and I don't feel like I'm
giving up on Santa which is how I feel about
Advent of Code.
You can't flip any more of those calendar doors open anymore now?
Yeah, I ruined Christmas because of problem 15.
Well, yeah.
Like I said, it's January now, so who cares?
Yeah, that's right.
And speaking of January, Orlando DevFest is coming up,
and I wanted to mention that for a couple different reasons.
One is it's just a really cool conference, and it's in my backyard.
And two is because it totally snuck up on me.
I totally missed that this was going to happen until basically a month before the conference.
And so it was kind of upsetting me to think, oh, there's this really cool conference going on right where I live.
It's got a bunch of topics that I'm interested in. I'm looking at like Flutter and DevOps and
all those sorts of cool stuff. And I totally almost missed it completely. If it
wasn't for a meetup, I would never have found out about it. And it's like, if I, someone who goes
to a lot of meetups, who pays attention, who specifically looks for conferences in the
Southeast to go to, if I almost miss this, what else am I missing?
So I was curious how people find out about conferences
because there doesn't seem to be one good spot to do it.
I really don't know either.
I usually find out at meetups or what you guys tell me.
Yeah.
Yeah.
I mean, it really kind of is sad, right?
Yeah.
It's just kind of crazy to me.
So, I don't know.
I was just wondering, like, if someone's out there that has a solution to this problem, then I would love to hear it.
Because otherwise, you know, I've got a little bit of time off work coming up.
So, who knows?
I might try to build something crazy.
Hey, speaking of which, before you move on to the next one, didn't we both put in for a talk down at Orlando?
Yeah, Orlando Code Camp, which happens in March. So we'll both be there. We're actually getting a booth this time, so we're going to be hanging out. We'll probably have some stickers too. So if you're in the area, you should definitely come by and hang out with us, because otherwise we're just going to be kind of staring awkwardly into nothing.
Yeah. But so, did you ever hear back on whether or not you got
accepted for the talk? I'm assuming
not, right? I have not put mine in yet.
What do you mean assuming not? Wow!
That was harsh!
I haven't actually put in my talk for that yet.
I will be, but they aren't announcing
whether you got in or not until
January. Oh, okay. So
shots fired, not fired. Okay, good.
I was just saying that... We're good for now.
Even that's the thing. With conferences, when are the call for proposals available?
When are the deadlines ending?
It's like this secret weird network where the speakers all kind of whisper each other.
I don't know if it's supposed to be secret info or whatever, but I would like to see that information be readily available and consistent.
Yeah, it kind of stinks too, though, because if they're telling us in January, right, then that means that you've got two months to prepare for a talk, which, as we know, blows by faster than what anybody hopes for.
So anyways.
All right.
Cool.
And so that's Orlando DevFest, so January 19th.
But I also want to mention a contest that we're going to have coming up here pretty soon.
So if you're not familiar with React Amsterdam, go Google it because it's the biggest React conference in the world.
So they've got tons of developers, really cool talks coming up in April.
And we're going to have a free ticket to give away.
So if you're on a mailing list, then that's just an example of the type of contest that we like to do.
And we're going to try to do even more next year.
If you're not on the mailing list, you can go to codingblocks.net and sign up there on the top right.
Or if you're on a phone, maybe scroll down to the bottom.
But somewhere there's a mail sign up. and we try to keep that list clean.
We try to pretty much only really do contests on it.
There may be an occasional other thing,
but I also try to have a little fun with those contest emails.
So you should join and let me know what you think.
Yep.
All right.
So now it is time to move on to heaps.
What you got, Joe?
Yeah, heaps are one of my favorites.
They're kind of like specialized binary trees.
So each node has a left and a right child.
But they're kind of special because they keep data sorted-ish.
So like a binary search tree, you always know every value to the left is smaller and every value to the right is bigger.
But with the heap, depending on if it's a min or max tree, I'm just going to focus on max trees, max heaps. It's basically the same thing, just reverse the greater-than, less-than sign. So I'm just going to stick with max.
Every node is going to have a value that is less than its parent. Now, your left child may be bigger than your right, or your right might be bigger than your left.
And that doesn't matter.
All that matters is that your parent is bigger than you.
And that keeps things close to sorted but not perfectly sorted, which actually has some interesting properties. It makes for slower lookups, a lot slower lookups, because you have to look at every value, because you really don't know where any specific value is going to end up in the tree other than it's going to be, you know, down or up.
Well, you said you have to look at every value. You look at every value till you find it, right?
Till you find it. So you could get lucky and find it really fast, or you could get really unlucky and find it as, you know, the very last thing that was in it.
Yep, worst case scenario you're going all the way through every single node.
Okay.
So it's not,
it's just flat out,
not good for searching.
However,
because of this kind of interesting property where we don't really worry too much about things being perfect.
What this means is that like in the max case,
we're going to have our biggest value in the tree is always going to be up.
So the biggest value is going to be the
root. So in a case where you only care about getting the maximum value out of a data set,
this is a great data structure. And you may think, when is this ever going to come up?
But it actually does in a few interesting cases. Primarily, the use case I'm used to seeing is
priority queues. So if, let's say, we're looking at tasks, or we've got some sort of stuff that needs to happen and it needs to happen in a particular order based on whoever's got the priority, or if we're doing like a breadth-first search, or we've got a tree and then we've got this other tree, this heap, over on the side which we use to keep track of what needs to go next. Then what we do is we make sure that that topmost value is always the next one that we're going to grab, and then we can just pluck it off. And because we don't really care about whether it's to the left or the right, it makes that balancing cheaper than it was for the B-trees or the self-balancing binary search trees or binary trees. So it's kind of a compromise there,
where we say it's okay to have things sorted-ish,
and we're going to deal with worse searches,
but we're going to have really nice rebalancing.
And the algorithm itself is actually really cool.
And we're going to be basically focusing on the main use cases there, so min-max trees, uh, max heaps. There are actually 18 different types listed on Wikipedia, which is just nuts, and they're all slightly different. Let's see. But because you're not so strict on the ordering, it's quicker to insert the data and then keep it balanced. And so you get linked-list-type times.
It's actually
sorry, I got a little lost on my notes.
I got so excited talking about these heaps.
I got lost on my
own notes that I wrote. You did bounce around a little bit.
Yeah, it'd be alright.
So the downside, like I mentioned,
is that it's slower for lookups.
You have to look at each one, which stinks.
But another cool property of probably the killer feature and the one that makes it particularly suited for those priority queues is that it's really efficient to remove the top node, remove the root.
And so do you want to explain that a little bit?
Yeah.
Because
the entire tree...
Can I punt on that and get
to it when we talk about the how it works
section, which is kind of right now?
I'm going to come back to that.
How about we come back to it right now?
I'm going to take the long way to get there.
It's one of my specialties.
My coding block superpower.
The deal here is that heaps are typically
stored in an array.
This is unlike most trees where
we kind of said some of those main advantages of trees over arrays is that we can have those dynamic sizes
and we can kind of insert them in place and we typically don't store them in
data structures like arrays. But this is a special case where heaps are almost
always stored in arrays and that's because of the way that you fill them in.
And this is a binary tree
although you won't frequently see it mentioned with other binary
trees, just because it's got these own kind of special properties. And it's so tuned for plucking
root nodes out that it's kind of considered its own beast. And it's so common for that one specific
use case that it's basically considered its own category. But say you take that memory, you allocate it for an array, you say
size 1000, 1024, who cares.
And when you go to actually fill in the
values of the tree, what you do is just pop your
I've got to stop using that word. You push the value that you're inserting
into your heap into the first available spot in that array, the first null spot.
And let's say if we start with the number 25, we go ahead and throw that in the first index of the array.
So array 0 becomes 25.
We throw in a second number 1.
Now 1 is going to be less than 25. So we know
that in our max tree where we want the max value to be the root, it's going to go after. So we're
just going to go ahead and put it in the next index. So now our array reads 25, one. Now,
if we add one that's bigger, 99, then what we do is we go ahead and pop it in, push it into the third index there. We add it. So
now our tree is out of order. This is not good. So when we add it in there, it's important that
we check the parent and say, Hey, are we still good? In this case, we're not, we've got 25 at
the root, but we've got a child of 99, which is bigger. And we want a max tree here, a max heap.
So what we're going to do is swap those values.
And if this were a bigger tree,
if it was, say, you know, 100 nodes or something,
then you would just keep going with that.
So you always just insert into the first available index in your array,
and you compare that index with its parent
and say, hey, which one of us is bigger? And then depending
on that result, you're going to swap them and you just keep checking it until you bubble up.
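To make that concrete, here is a minimal sketch of that insert-and-bubble-up step for an array-backed max heap. This is not code from the show; the class name, the starting capacity, and the helper names are all made up for illustration.

```java
// A minimal max-heap insert sketch: the heap lives in a plain array, new values
// always go into the first open slot, then bubble up toward the root.
import java.util.Arrays;

public class MaxHeapInsertSketch {
    private int[] items = new int[16]; // backing array (grows as needed)
    private int size = 0;              // index of the first open slot

    public void insert(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size] = value;   // drop it into the first available spot
        siftUp(size);          // then bubble it up until the parent is bigger
        size++;
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;       // parent index is simple math
            if (items[parent] >= items[i]) {
                break;                      // parent is already bigger: done
            }
            int tmp = items[parent];        // otherwise swap and keep going up
            items[parent] = items[i];
            items[i] = tmp;
            i = parent;
        }
    }

    public static void main(String[] args) {
        MaxHeapInsertSketch heap = new MaxHeapInsertSketch();
        heap.insert(25);
        heap.insert(1);
        heap.insert(99); // 99 bubbles past 25 to the root
        System.out.println(Arrays.toString(Arrays.copyOf(heap.items, heap.size)));
        // prints [99, 1, 25]
    }
}
```

Notice the parent index is just (i - 1) / 2, which is the same quick bit of math the hosts get to a little later.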
But there's some interesting properties, like I mentioned, because it's a binary tree,
this is really important. Because it's binary, you can easily compute the value of your parent
or the value of your children from any index. So if you give me
index five, I can say, okay, my parent is going to be, so I don't know the reverse of it, like
two n minus one, something, some sort of a computed number. That's going to be fixed math.
There's division involved, but it's not anything crazy. There's no exponential growth because it's always going to be a balanced tree, like we talked about with the binaries. So I can always do just a
little bit of math, a quick calculation to figure out where my parent is, my child is, and I can
use that to either bubble up the values when I insert new nodes, or here's the finally the answer to your question. When I take the
maximum value off, I chop the root off that array. What I do is I go ahead and grab the last
filled in item in the array and I push it into that array at index zero. So I've moved the last into the first position.
And then from there, I just bubble down. I say, hey, are you bigger than any of your children?
If so, swap. Then if that node is bigger than any of its children, swap. If that node is bigger
than any of its children, swap. And in both cases, whether you're bubbling up or bubbling down, you're only ever looking at the logarithm of your data set.
So basically the depth of your tree.
And that's because we fill it in.
We always fill in the next empty spot in the array.
You always end up with a perfectly filled in tree.
Like we talked about that Christmas tree with a little bit shaved off.
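And the other half, chopping the root off, might look something like this sketch: grab the max, move the last filled-in item to index zero, and bubble it down. Again, the names and the standalone-method shape are assumptions for the example, not anything from the episode.

```java
// A sketch of removing the root from an array-backed max heap: grab the max,
// move the last filled-in value to index zero, then bubble it down.
public class MaxHeapExtractSketch {

    // Removes and returns the max (heap[0]); 'size' is the count of filled-in
    // slots. The caller treats size - 1 as the new size afterwards.
    public static int extractMax(int[] heap, int size) {
        int max = heap[0];
        heap[0] = heap[size - 1]; // last filled-in item goes to the root slot
        siftDown(heap, size - 1, 0);
        return max;
    }

    private static void siftDown(int[] heap, int size, int i) {
        while (true) {
            int left = 2 * i + 1;    // children are simple math away
            int right = 2 * i + 2;
            int biggest = i;
            if (left < size && heap[left] > heap[biggest]) biggest = left;
            if (right < size && heap[right] > heap[biggest]) biggest = right;
            if (biggest == i) break; // bigger than both children: done
            int tmp = heap[i];       // otherwise swap and keep bubbling down
            heap[i] = heap[biggest];
            heap[biggest] = tmp;
            i = biggest;
        }
    }

    public static void main(String[] args) {
        int[] heap = {99, 25, 42, 1, 7}; // already a valid max heap
        int size = heap.length;
        System.out.println(extractMax(heap, size)); // 99
        size--;
        System.out.println(extractMax(heap, size)); // 42
    }
}
```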
So does that make any sense?
I think sort of.
The only thing that I'm not getting off the top of my head here is we're talking about multiple.
It's multiple arrays, right?
It's one array.
Okay, so it's one array.
And then, so this whole parent-child notion then, how does that live?
I'm guessing then the first index of the array is the root.
Yep.
Okay, so then the child is what?
The left child is index one, and the right child is index 2.
Now, the children
for index 1
are going to be 3 and 4
and the children
for 2 are going to be
5 and 6.
Okay, now I'm with you.
So, I can take any arbitrary node. I say like,
hey, what's 2 going to be?
And the result for 2 was, I think it's like 2 n plus 1 and 2 n plus 2.
I forget the actual equation.
But there's a simple mathematical equation that I programmed during my lunch break today.
But it's on another computer and I can't see the formula I came up with now.
But it's a really easy calculation that given any nodes,
you can find its parent in like a,
you know,
a plus and a minus and maybe a divide,
or you can find the children in the same way.
Okay.
Got to reverse the equation.
Yeah.
I'm with you now.
So basically the array,
as you're moving to the right through the array.
So if the array starts on the left and as you move to the right,
then basically you've got the root, then child one,
then child two, and then child one's first child,
then child one's second child, etc.
Yeah, you're just going to keep walking through it.
Okay, that makes sense.
Yeah, and in the notes I actually wrote down like a really precise way
of saying that I planned on saying, but I just got so excited talking about the heaps, because they're one of my favorites, that I kind of blew through that. And so sorry if that was confusing. I really hope it wasn't. But the deal there is the root is always going to be your max value in a max heap or your min value in a min heap. And so if you want to chop that sucker off, you just take the value out of it, save it, do whatever you want to, and then pop the last item from the array that's been filled in, the last non-null value, pop it into that first spot. And then you just do a couple of comparisons with your children, and your grandchildren, and your great-grandchildren, until you get things in the
right spot. And this thing is not going to be in order when you're done.
Probably not.
And we don't care about that.
All we care about is that your max value is always in that first position.
It's kind of a funny case.
I actually found, much like you found, a super cool visualization for the B tree.
I found one that's really awesome for this as well.
Oh, very nice.
This is a binary heap plus priority queue.
So it's a little bit more complicated, but it's doing something very similar.
Yeah, okay.
Awesome.
And so priority queues, you know, kind of the obvious example there is like I've got a bunch of tasks and some are more important than others.
So I always want to make sure that the most important task is in that first position in the array. So what I do is, when I do any sort of insert, even if I know it's a high priority, is just go ahead and throw it in the last open... or, yeah, I guess I should say the first null spot in that array, the first non-filled-in area. And in languages like C that don't really have the null thing that we've got in, like, C#, then you can basically just keep a pointer that keeps track of the last.
And whenever you remove an item, you decrement the pointer.
You add an item, you increase it.
But you basically just put it in the first available spot.
This keeps the tree balanced.
And then when you bubble up, you're only going to be doing a maximum of log n procedures.
And once you find a value that is correct, you don't have to check anymore.
You just need to go up until you don't have to swap anymore, and then you're done.
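For what it's worth, most languages ship a heap-backed priority queue so you rarely hand-roll this. Here is a rough sketch of the tasks-with-priorities case using Java's built-in PriorityQueue; the Task type and the priorities are invented for the example, and PriorityQueue is a min heap by default, so the comparator is reversed to get max-heap behavior.

```java
// Using a heap-backed priority queue for the "most important task first" case.
import java.util.Comparator;
import java.util.PriorityQueue;

public class TaskQueueSketch {
    record Task(String name, int priority) {}

    public static void main(String[] args) {
        // Reverse the comparison so the highest priority comes out first.
        PriorityQueue<Task> queue =
                new PriorityQueue<>(Comparator.comparingInt(Task::priority).reversed());

        queue.offer(new Task("write docs", 2));
        queue.offer(new Task("fix prod outage", 99));
        queue.offer(new Task("rename variable", 1));

        while (!queue.isEmpty()) {
            // Prints: fix prod outage, write docs, rename variable
            System.out.println(queue.poll().name());
        }
    }
}
```

Under the hood, that offer/poll pair is doing exactly the bubble-up and bubble-down described above.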
So similar to the previous site visualization that we used from USFCA, there was another one that they had for a min heap.
I mean, I know that Joe was wanting to focus on the max heap,
but again, same thing, just reverse your direction
for the comparison. So you were talking about the example
you found was in regards to like a priority
queue, but this one you can see the
array and how it's building the tree and you see how it reshuffles the tree and the array
as items get thrown in.
Sweet. So I'll be sure to include links for both.
Yeah. And I'm just going to look up that equation real quick. So, I don't think I completely had understood until I saw the visualization, which, again, I mean, it's harder to describe some of these things. But what you were saying that I finally realized was, when you were saying you insert the value into that last position, what it then does is look at its parent to see if it's bigger than its parent.
And if it is, it swaps them.
So every time you're doing an insert,
it could potentially look at its parent and swap with its parent all the way up
to the root if that happened to be the case.
Well, and another way that didn't dawn on me until I saw the visualization too is that if you think about this in tree structure, it's always favoring the left side of the tree.
If you visualize the tree.
It's always favoring the left side of the tree until all of the nodes are full, and then it moves its way to the right before it goes on to another level.
And by full,
this is a binary.
So it just means if there's two nodes,
then move on to the next one.
So, as part of that favoring the left, that's when it'll make the decision of, hey, do I need to swap with the parent? So by it going to the very end, you know, the first null value in the array, that is going to be whatever the leftmost node available is on the tree.
And when it gets there, then it'll start comparing to its parent to figure out, do I stay where I am or do we need to swap?
So this is the University of San Francisco website.
And I got to give them props, man.
They did some nice things with these visualizations.
These are awesome.
Yeah, I just looked up that equation.
So in a zero-based array, if you give me any node index, so we'll say like number 7, then its children are going to be located at two times seven plus one and two times seven plus two. So my children, if I'm node seven, are going to be at 15 and 16. And if I'm at 15 or 16, I just reverse that equation and I'm going to get back to seven, either one, 15 or 16, because I'm going to round down.
So it's literally an O of one operation to go find the parent or the children.
Yeah.
Now you may have to find multiple parents in the case that you keep having to swap.
But the worst case scenario, you're looking at the logarithm.
So with one million nodes, worst case scenario is 20 operations to remove the top node from
this tree.
Right.
You're not going to get that with any other tree.
Yeah, which is also swapping them as they go, right, basically.
Yep.
And it's keeping it perfectly filled in, too, the best it can,
depending on the values.
Right.
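For reference, that node-7 arithmetic is just this (zero-based array; the method names are made up, but the formulas are the standard ones):

```java
// Parent/child index math for an array-backed binary heap (zero-based).
public class HeapIndexMath {
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }
    static int parent(int i)     { return (i - 1) / 2; } // integer division rounds down

    public static void main(String[] args) {
        System.out.println(leftChild(7) + ", " + rightChild(7)); // 15, 16
        System.out.println(parent(15) + ", " + parent(16));      // 7, 7
    }
}
```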
No, that's really cool.
And I like that the whole way it kind of achieves that
and is able to minimize those is just kind of by keeping it kind of closely sorted. It doesn't even have to be fully sorted. It's just as long as it's kind
of close, then we can be a little loosey-goosey with how we rebalance and we save a ton of steps
and we get you most of the way there and it works out pretty dang well. Very nice. So it's one of my
favorites. So I was very happy about that. And so the pros for this one is you get a fast insert on average and O log N for the worst case.
And if you remember the binary search tree, which is a thing that is often compared to, it's going to have O log N for the average case and O N for the worst case.
So much faster inserts than binary trees, binary search trees anyway. And you've got that fast removal of the root node, which you don't get in any other case,
but is a very specialized need
that you're going to have.
Now, the downside is that slow search.
At worst, you're looking through every single element.
This is not a data structure you want to use
if you're doing any sort of searching.
You're better off with almost any other kind of tree
that we've talked about.
So when should you use it?
When you need to quickly insert data and you only care about getting either that min or max value. And of course, there are those other 18 variations,
but for the most part, it all kind of just deals with
how it calculates that most important object.
And don't use a heap when you need to search for arbitrary values.
And of course, I mentioned that priority queue with the BFS.
Do you guys remember the deal with the priority queues and the BFS?
No.
I think that's the one we said where, like, you're going to borrow money,
and you don't want to go to the grandchildren.
You want to start with the parents.
So you, like, kind of go to your children first and say, hey, give me $20.
You know, if your child says no, so you go to your next child and say, hey, you give me 20 bucks or else I'm going to go ask your children.
And they say no.
Then you go to the grandchildren.
But basically you start with the nodes that are closest to you.
And there's no way to do that without having another data structure
in order to keep track of what you need to go to next.
And the most common solution for that is to use a heap.
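That give-me-the-closest-node-first idea is easier to see in code. Here is a rough sketch of a best-first walk driven by a heap-backed priority queue; the graph, the names, and the weights are all invented, and this leans closer to Dijkstra than a plain BFS, but the heap plays the same role: the cheapest-to-reach node is always sitting at the top.

```java
// A best-first walk: the priority queue (a heap underneath) always hands back
// the unvisited node with the smallest known distance.
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class BestFirstSketch {
    record Edge(String to, int weight) {}
    record Visit(String node, int distance) {}

    public static void main(String[] args) {
        Map<String, List<Edge>> graph = new HashMap<>();
        graph.put("you",        List.of(new Edge("child A", 20), new Edge("child B", 5)));
        graph.put("child A",    List.of(new Edge("grandchild", 1)));
        graph.put("child B",    List.of(new Edge("grandchild", 50)));
        graph.put("grandchild", List.of());

        PriorityQueue<Visit> next =
                new PriorityQueue<>((a, b) -> Integer.compare(a.distance(), b.distance()));
        Set<String> seen = new HashSet<>();
        next.offer(new Visit("you", 0));

        while (!next.isEmpty()) {
            Visit current = next.poll();              // cheapest-to-reach node, O(log n)
            if (!seen.add(current.node())) continue;  // skip nodes we've already settled
            System.out.println(current.node() + " @ " + current.distance());
            for (Edge e : graph.get(current.node())) {
                next.offer(new Visit(e.to(), current.distance() + e.weight()));
            }
        }
    }
}
```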
So here's a silly question.
Is the.NET heap actually using a heap data structure?
Do we know?
Why is it named that?
You know, I didn't even think about that at all.
So you would use it when you want to insert data quickly
and you only care about the min or max value. So that's surprising to me. Oh, but how about this?
If you just wanted to quickly get the max value being like the next memory address,
I don't know how that would work with different sizes though. So I don't know.
Yeah. I'm not sure. I was looking for it and i couldn't
quickly come up with anything but it just seems odd that it would be named the same you know
Yeah, I bet it is. I just don't know what the specific advantage of having it be a heap versus something else would be.
Yeah. So anyway, I really wish I would have looked that up. Sidebar.
Anyway.
Okay.
Well, if you know,
leave us a comment.
Chumac, I'm looking
at you.
Really good comments.
I'm not even finding
the class.
Yeah.
There isn't one.
Oh, okay.
I thought you were
saying that there was
one.
No, no.
I'm saying like when
you talk about the
stack versus the heap
or whatever.
Oh, you're talking
about the memory.
What type of data structure they
use behind the scenes for the heaps.
And why do they use
that particular one? Yeah.
Why do they use that name?
So this question is really across
any language then. Yeah, maybe.
Like the Java garbage collection.
C, for example.
If you allocate memory, you're getting it on the heap, and you're saying like, oh, why do they call it the heap?
Is this some kind of relationship?
Is this a heap data structure?
Right, yeah.
Yeah, I'm curious.
I didn't find anything in my quick Googling.
So, all right.
Yeah, that's really interesting.
I really want to know.
Like the whole reason you would use that to me is like you've got stuff coming in fast.
Okay, we've got... we want it loosely organized so that we can do that quick insert, and we never want to search. I never want to search memory. Which seems right.
Yeah, it seems like it fits. It's just weird that it's got this special property where it's really good about popping off either, like, the min or max or whatever the most important value is.
Yeah, I mean, that's the only thing that I'm not certain of, is usually the whole point of the heap in a managed language is the ability to quickly garbage collect that stuff. And I don't think it matters about the max or the min or any of that kind of stuff, right? It's just, is it still used? Is the reference to it gone? If it's gone, clean it up, kill it.
The way it constructs the heap
would be interesting. So you allocate an
18 kilobyte string, then
it's going to go ahead and put that in a lower spot.
And so I would think that would always be arranged
such that
basically either the most
memoryful item is at the top
or the least, either way.
And maybe it does the least,
and so it's easier for it to kind of go through those bottom nodes
and look for things to clean up in a smarter way.
I'm not really sure,
but that would have been a really good thing to look up tonight.
Sorry.
Hey, I got a question, though.
You weren't about to ask something.
No, no, I wasn't.
So you said about if you had a million nodes in it, right? I want to make sure
I understand. It wasn't that it was log
of a million
is the maximum.
Correct? The maximum
number of comparisons you would do is
the logarithm of the number of nodes.
Which is also the height of
the tree. It's the same.
But the number of nodes is not equal to the height.
No, the height is the logarithm of the number of nodes.
So if you have a million nodes, you're going to have a height of 20,
and that's the most number of comparisons you're going to do
whenever you insert or remove a node.
I wanted to make sure because I thought you said like logarithm of a million.
I might have.
And that's what threw me off.
And I was like, wait a minute, really?
But okay.
I think I'm with you now.
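If you want to sanity-check that million-nodes-means-about-twenty claim, it's one line of math (there's no built-in base-2 log here, so divide by log 2):

```java
// Height of a complete binary tree with n nodes is roughly log2(n).
public class HeapHeightCheck {
    public static void main(String[] args) {
        double nodes = 1_000_000;
        double height = Math.log(nodes) / Math.log(2); // log base 2
        System.out.println(height);                    // ~19.93, so about 20 comparisons worst case
    }
}
```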
This episode is sponsored by Discover.Bot.
Discover.bot is a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together.
And on its own, a good idea isn't as powerful as it could be.
But when a good idea is shared, then it gains strength and momentum.
It becomes capable of changing things in a way that is both small and large.
A good idea shared becomes an innovation.
You got to say it with conviction.
It gains strength.
And strength.
Discover.bot aims to sit at the intersection of ideas and innovation. They want to help people turn their experiences, discoveries, stories, advice,
and knowledge into part of a shared canon that moves everyone forward. For veterans and beginners
alike, Discover.bot is a place for learning, teaching, and talking. Head to Discover.bot
slash coding blocks. That's Discover.bot slash codingblocks to learn about how to get started on your next
great bot. Alright, it's that time again
where I'm going to ask you and
thank you for leaving us a review
because it means the world to us. It's the
biggest thing that you can do and it's the thing that
we want most in 2019
and every year prior.
We want reviews. We really appreciate
when we get them. iTunes, Stitcher, if you go to
codingblocks.net slash review,
we're going to have links to a couple of others.
It's really important for us to help grow the show and it really means a lot
and it really helps us out.
So if you could go there,
leave a review,
just a small one with lots of stars,
that would be fantastic.
Yep.
And as we've said before,
Hey,
share us with a friend,
spread the,
spread the word.
We would greatly appreciate that.
And with that, I want to take a moment.
We played this little game before, and I thought it was a lot of fun, about how much data is generated every minute.
And there is a new version of it that's available.
And I thought, hey, we could play it again, but with the new version.
Oh, man, I should have paid attention to the old version.
Yeah, right, right. And these are like new categories, too. Some of them are new categories, so I wasn't going to repeat the ones that we've already done. So, every minute of every day for the year 2018, right?
How many packages did Amazon ship?
Every minute of every day.
Wow. How many people are in the U.S.? 255 million? 300 million, I thought. Yeah, somewhere in that ballpark. So I'm going to say 500 million packages.
Now, every minute of every day? Is that crazy?
All right. Okay, let me do some maths. I'm going to go... Every minute.
Let's say.
Every minute.
I'm going to go with a thousand packages.
All right.
Well, I like these answers.
50 million every minute.
30,000.
A thousand every minute.
So we go from one extreme to the next.
30,000 for me.
I think 30,000 a minute.
Your spot, you're really close, man.
Really?
Super duper close.
Even by survey says rules, you would have won. The answer was 1,111.
Nice.
Wow.
Yeah.
Every minute of every day.
That's how many packages they're shipping.
That is crazy.
Okay.
So we all love Giphy.
So hang on real quick there.
Sorry.
So I just wanted to see how much that was.
And so that was 1.5 million.
So that's like in a day.
Sorry, in a day.
So you kind of think about it in terms of, like, what we said,
you know,
300 million,
that's,
that's like every other person,
half of the people in the U S get a package every day.
That's crazy.
This.
Yeah,
that's just,
that's all right.
Sorry.
It's kind of a big deal.
I don't know if you've heard of Amazon.
There's this thing called Amazon.
Okay.
So,
uh, we all love Giphy,
right? Oh, man, I was so wrong.
I'm so wrong, man.
I'm sorry. What's 1.5?
I was going to say,
yeah.
Half of people get a package every day.
I'm so sorry. That's so wrong.
I'm so wrong. Joe's kind of here tonight.
He's sort of partly here.
It's one in 150 people get a package every day.
So I'm sorry.
And that's like third diversion.
I'm just going to shut up now.
Okay. Okay.
So Giphy,
big part of our Slack community.
We all,
we all know it.
Well,
is it not Jiffy?
Sorry,
go ahead.
Okay.
So how many,
how many gifs do you think Jiffy serves up every minute of every day of 2018?
Ooh.
And this is,
this is just through slack or is this Jiffy or Giffy or whatever it is?
Just how much they serve up.
Because even if it's coming from slack,
they're still serving it.
So anywhere.
Wow.
So what was the parameters?
What was the time parameter there?
You said every minute,
every,
all of these questions
are every minute of every day
for 2018.
How many?
I'll go with
10,000.
10,000.
Man.
1,000.
Wait, served up.
Jeez.
20,000.
All right, so 10,000 and 20,000.
Yep.
Well, I'm going to give it to Joe.
He's technically the closest.
Worked for it.
He's technically the closest.
You know I'm proud of you two.
I just got to say, because in the history of all these games,
I don't recall you guys ever, like, literally one-upping the other person and saying, like, ten thousand and one, you know. I don't recall that ever happening. But, uh, yeah. 1,388,889 per minute?
Every minute of every day
of 2018.
Wow. Right? That's nuts.
Let's see.
How do they make any money?
I know!
Yeah, really.
Okay, so, let's see.
They own the backbone to the interwebs.
That's why.
All right.
Docker.
Kubernetes.
That's why.
Cloud.
All right.
How much Bitcoin is created?
New Bitcoin.
Every minute.
Oh, gosh. Two. Wait, wait, Bitcoin still exists?
I mean, they exist, they're just not worth much.
I'll go with three. I'm going to one-up you on this one: three.
All right. Because not much happened. Just as soon as I gave you guys a compliment, man.
Yeah, but you gave us a fractional question. We can't do much with this one.
All right, Joe takes it again, two to one.
Wow.
No, no, I'm saying your score is two to one. Joe's ahead two to one.
Oh, okay. So it was less than two, the new Bitcoin every minute of every day?
1.25.
Dang it, man. Wow. I was actually surprised it was that high.
Yeah. That part... Would you consider how much
energy is wasted on creating?
That's ridiculous, man.
Alright, go ahead. This is making
me sad. It is.
How about
new
Reddit comments?
1.21 gigabytes.
And these are comments we're talking about.
What's bigger than gigabytes?
I think it's 33 Hoobastanks.
Great, Scott!
1.21 Nickelbacks.
That's too many Nickelbacks, man.
All right.
So how many new comments on Reddit?
You go first, Joe.
I thought I'd answer. I'm going to say... one million. No, uh, a hundred thousand per minute.
Okay, a hundred thousand. I'll go two hundred thousand.
Joe is now leading the game three to one.
Come on, man.
1,944.
Oh, people need to start writing more comments.
The world's not filled with enough vitriol so far.
Let's get some more up there.
I tried to give you guys a hint.
We're not talking about upvoting something.
We're not talking about a new post.
We were talking about comments.
Man, it's Reddit.
We should know.
All of ours usually just have 10 negative ones on there.
Yeah.
Oh, man. I'm wondering now, I'm doing pretty good with this quiz. Like, how did I not get that Google job so many years ago? They asked me how many ping pong balls fit in the airplane, and I'm like, oh wait, 300. Was it a C-130? Hold on, hold on. C-130? 700.
All right. How many Uber rides do users take?
Or did they take?
Every minute of every day, 2018.
30.
Every minute?
I'm going to say 60.
Come on.
You know what this means, man.
Doggone it.
Did I just go down 4-1?
It's 4-1.
Dang it.
Wow, and look at what you're losing to.
Just running away with it.
1,389 rides.
That's pretty close.
You're not thinking.
1,000 per minute, man.
You got to think globally, man.
I did.
We're talking every minute of the day.
I thought 60 was pretty good.
I was like, hmm.
Yeah, I know.
That's a lot of rides.
Okay, if it were 60, how would people be able to do that as a job?
That's 60?
People are traveling every minute of the day.
I was trying to average it out to like 8 to 5, right?
And then, of course, the drinking hours.
So maybe 11 to 1.
Yeah, I'm not so good with numbers past 100.
All right.
I'll give you one last new one.
And then I'm curious to see if you guys wanted to see the changes from 2017.
So how many songs did Spotify stream?
Per minute.
Per minute of every day of 2018?
10 million.
One million.
The score is now 5-1.
Yes.
Whatever, man.
This game's rigged.
750,000 songs.
Wow.
Yeah.
They need to step their game up.
It should be closer to 10 million.
All right.
So there were some curious ones in here too. Like, uh,
I think we did the text one,
one like text messages sent last time.
Yeah,
this was curious. Last time it was 15.2 million text messages sent for 2017.
It went down for 2018.
Is it because of FaceTime and,
and hangouts and that kind of stuff?
They don't attribute it in this,
in this particular, uh, graph I'm looking at,
but it went down to 12.9 million.
So I thought that was a pretty good drop, all things considered.
People didn't stop text messaging.
They're using Facebook Messenger and things like that.
I guarantee you they're just not counting that.
But it's not like Facebook Messenger just came out in 2018.
Like what new communication technology came out in 2018 that would have changed that?
I think people are just moving away from SMS, period.
I think they've started sending Giphy messages instead.
That's probably right.
That's probably right.
Yeah, they don't have filters in text messages like my snaps do.
I got a question, I got a question. Uh, same vein. How many, uh, Hangouts messages are sent per minute in the year 2020? This is a little forward-thinking.
Yeah, zero.
Yeah.
Well, what about Allo? Uh, that's the thing that's replacing it.
Yep. That'll be a lot. It'll be zero.
Zero, yeah, that's right, because they canceled that one too.
Well, what about Google Plus then?
Oh, yeah, zero.
Okay, okay, I'm batting three for three now. I see where you're going. Give me another one, I'll get this one. How many times per minute will I go to reader.google.com just to check and see?
Just in case.
Five.
I'll say 0.5 per minute.
Just in case.
They brought it back.
Listen, if you're an intern at Google, I want to talk to you.
I got a 10 spot.
If you just plug that server back in.
A 10 spot.
You realize where they're working, they can't even buy a sandwich for that right
Yeah, that's true. Here was another interesting one that we talked about last time, I believe. We talked about the amount of data that Americans used, and it was measured in gigabytes. And again, it was still measured in gigabytes, but some quick math here, and I can easily make that into terabytes for you if you'd prefer.
So last time we said it was 2.6 terabytes every minute of every day for 2017.
For 2018, it went up to 3.1 terabytes.
So almost a 20% jump.
Yeah, that's big.
Yeah. And it's just growing. Yeah. It's too much, people. It's just too much. Yeah, scale it back.
Well, I think... did we talk about the YouTube one? Was that one that we talked about last time?
Maybe. I don't know. Hit us with the stats, quick.
YouTube videos watched per minute of every day for 2017 was 4.1 million.
But for 2018, it went up to 4.3.
So not that big a jump.
That's already everyone in the world watching YouTube.
That also corresponds highly with how many toilet flushes per minute.
So strong correlation there.
Anyway.
Wow, this took a dark turn.
We'll never play that game again.
Those balloons popped on balloons.
Yes.
All right. So, uh,
with that,
let's head into my favorite portion of the show.
Survey says,
all right.
And,
uh,
so we won't have a survey to report on from,
you know,
a previous survey to report on because,
Hey,
guess what?
I said it was January,
but actually I was lying.
We're recording ahead.
So that's why Joe was talking about Advent calendar stuff in January,
if you were curious.
See.
So, but we do have a survey for this episode,
which is how much time do you spend coding outside of work?
And your choices are nada. I don't write code for free.
Or, until I go to bed.
Or, I take the weekends off, otherwise I'm writing code.
Or lastly, like a sine wave, I go through phases.
Sometimes more, sometimes less.
This episode is brought to you by Clubhouse.
Clubhouse is the first project management platform for software development teams
that brings everyone together so that teams can focus on what matters,
which is creating products that their customers love.
While designed to be developer first, and that's important,
the UI is simple and intuitive enough for all teams to enjoy using.
Now, when I mentioned that the UI is developer first, here's how developer first it is.
There is a button on your story that you can click to actually get hints for how to create your branch based on the story point.
Yeah, the UI is phenomenal.
You log in and you can immediately see your work queue, your active tasks,
your upcoming due dates, and your activity feed.
Yeah, it's easy for people on any team to focus in on their work on a specific task or project
while also being able to zoom out to see the work that contributes to the
bigger picture using the fast interface.
With a simple API and robust set of integrations, Clubhouse also seamlessly integrates into
the tools you're already using every day, like Slack and GitHub, for example, and getting
out of your way so that you can focus on delivering quality software on time.
And you could sign up for two free months of Clubhouse by visiting clubhouse.io slash coding
blocks. And again, that URL is clubhouse.io slash coding blocks and you get two free months
and you can see why companies like Elastic, Full Story, and LaunchDarkly really love Clubhouse.
So I want to talk about tries, and I'm going to call them tries even though it's somewhat controversial. Look it up, try as you may. Tries is spelled T-R-I-E-S, and if you look it up on Wikipedia or other spots, you'll see that it was developed by somebody, you know, like a million years ago. And it was named after the word retrieval.
The person was French.
So this is a tree that's spelled like tries and is technically pronounced trees and nobody pronounces it that way.
So we're going to call it tries.
We're going to call it tries.
We're going to be wrong.
Just like everybody else,
because that's how language works and English works.
And actually we're going to get into that.
I mean, just look at GIFs.
So tries also have other names and sometimes the names are kind of
specialized versions.
Like you'll often see them referred to as Radix trees,
which are technically a more specific version of a trie,
but whatever.
We'll just want to go ahead and mention Radix trees,
prefix trees, digital trees.
If you see any of those things,
you're all in the same ballpark.
And they are a specialized tree where the nodes don't matter.
Like, when I saw that on Wikipedia,
I was instantly derailed.
Like, what do you mean the nodes don't matter?
Right?
Yeah.
Yeah.
And this is one where you can stash some data in the nodes if you need to do some calculations or you need some metadata or something that's specific to your kind of business-y domain kind of use case, whatever.
But really, the data that we care about is going to be associated with the edges.
So like a binary tree, you would have like a left and a right.
Or in a B tree, you would have an array there of children. In this case, you would have, I don't know what the data structure would be, but I would
think that I would do some sort of hash table here where I have a key that represents each of the
edges. And then the value in my hash table would be a node where maybe I keep some other stuff, and it more importantly has another hash table with these keys. And so this seems pretty goofy, and it doesn't even sound that different, because it's one-to-one, right? Aside from the root, every single node has one
edge
leading to it. So if this is
one-to-one, why do we even care
if we associate that value with
the edge or if it's just on the stupid node, right?
What's the difference?
I mean, this sounds kind of like a
directed graph example.
That's what it
sounds like. Yeah, where it's weighted.
Yep.
Yeah, we've got weights associated with those edges, and it's important because they may
not necessarily be the same value.
So, one direction is five, the other direction is seven.
So, it's really important to have those on the edges.
And in this case, it's a little bit different.
But the reason that we don't say that the value is associated with the node is because
the nodes don't really have any sort
of importance aside from notating the existence of the edge. And so if I pluck a node out and then
re-add it, it's not even, it's not even meaningful in this case, because we don't care about the nodes, we care about the relationship. So it doesn't make sense for me to say, hey, give me a node out of it and put one back in, because I'm not really doing or adding anything.
It's all about that relationship.
So I don't say, hey, here's a node.
I say add a relationship for, like, the number 7 or the letter S.
And actually this is frequently associated with alphabetical type things.
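As a rough sketch of that hash-table-per-node idea, something like the following would work. The field names, the isWord flag, and the startsWith helper are assumptions for illustration, not the show's or the book's implementation, and a real one would also handle deletes and compression.

```java
// A minimal trie sketch: each node holds a hash table keyed by the edge
// character, plus a flag marking "a word ends here."
import java.util.HashMap;
import java.util.Map;

public class TrieSketch {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node(); // the root carries no character itself

    public void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.isWord = true;
    }

    public boolean contains(String word) {
        Node node = walk(word);
        return node != null && node.isWord;
    }

    public boolean startsWith(String prefix) { // cheap prefix lookups fall out for free
        return walk(prefix) != null;
    }

    private Node walk(String s) {
        Node current = root;
        for (char c : s.toCharArray()) {
            current = current.children.get(c);
            if (current == null) return null;
        }
        return current;
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        trie.insert("code");
        trie.insert("coder");
        trie.insert("coding");
        System.out.println(trie.contains("code"));  // true
        System.out.println(trie.contains("cod"));   // false: no word ends there
        System.out.println(trie.startsWith("cod")); // true
    }
}
```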
Yeah, it's so
unfortunate here because
the way
the Wikipedia
article versus, like, going
back to the Impostor's Handbook, for example,
like, their visualizations are
in contrast to one another.
Oh, really? Okay.
Yeah, because the way...
Okay, so let's look at the Wikipedia article for a moment.
So they're consistent with how you're describing it,
that the edge is what the value is,
and the node is the relationship.
So if you were to walk down in this particular...
You'll have to...
Dear listener, there'll be a link for you to click on.
But, you know, you can go down that the topmost parent node is always going to be blank.
If we think back to the example that he gave where it's like words and alphabetical kind of use cases, right?
So no letters.
So parent, then you find your first letter.
So maybe you go down to T.
So you're at that first node level there for T.
Then you're like, okay, well, the relationship to E.
So T represented the line that pointed to that first node.
Then you say E as your second letter.
There's a line that represents that. And the next node down is the relationship of a particular word.
And so you can formulate words by traversing the tree in a particular path if there is such a word, right?
You know who has a good visualization?
Is it California?
University of San Francisco.
Yeah, I'll drop it in here.
And so I guess the part that made sense was kind of what you were just saying is you end up with a tree of a word. Like the interesting thing is if you go to this, this particular visualization on the try or tree is if you try and punch in a number and hit insert, it doesn't work.
If you use letters, it'll draw a node. And so I was really confused, because I did A, B, C and it created separate nodes. It's not until you do multiple letters that you'll see the tree grow. So basically
if you have 20 letters, you'll have a 20 depth tree, right? Or 21 nodes actually, because you're
going to have your root node and then the 20 additional child, child, child, grandchild,
great grandchild, et cetera. See, this is also still so confusing. Their representation of it matches the imposter's handbook. So you can draw it either way.
But the reason I like the definition of having that stuff sitting on the edges is because later when we're going to talk about briefly about compressing the trees,
that's where you basically take any node that's parent only has one child and you compress it. And then in that case, you're changing the
value that would be associated in the node. And that's different from every other tree that we've
talked about where you give it a node that has a value and it does something with it. In this case,
the tree actually owns the node's values. So you wouldn't even, you don't even provide the nodes to the tree you give it a word
so like even type in like abcde that's not really a good use case for this try typing in or reset
the tree type the word code enter it then do coder enter it then do coding enter it then do
i don't know cod enter it and what's it's going to create a tree based on those letters that kind of
um well it was the commonality being a big prefix right it'd be better if you know you had too so
many that were like the same word but like if you did like impose impasse imposter right like you
would see some variation among that tree like they the the top of the tree would remain the same I and P, for example,
but then you would start to see it differentiate below that.
Yeah. And in the case where you actually compress that tree, you could take that I and M and say,
you know what? Every single child from here has I and M as a prefix. Why do I have two nodes for
this? I can save space by saying this node is I M. And then I don't split that node up until I get some reason, you know, some other combination there that starts with an I.
And that compression is actually really important to saving space and also speeding things up.
And that's why it kind of boggles my mind to think about the nodes having values.
But, you know, I think either way you slice it, it's the same thing.
So conceptually, it doesn't really matter if too much if it's the edges or the nodes, because the tree or the try in this case, owns the values of those nodes. So
it is responsible for changing it also dealing with the implementation details of like,
finding things or adding things to it. And so you're pretty far removed from the actual values
of the nodes, and you don't really get to do much with it as a user of this data structure.
Now, kind of going back to what we hinted at at the very beginning, though,
when we were talking about the space implications of this,
and you kind of mentioned this, too, in regards to the compression.
The space complexity of the try is where there can be significant gains here
because you're not having to store the entire word.
You only have to store parts of the word.
Yeah, so the example I said, like code, coder, coders, coding.
I think I did a little bit of math down here.
Once again, I'm lost in my examples.
Yeah, well, we'll get to it here in a minute. But basically, you don't store the letters uniquely,
but you do store the combinations uniquely.
So like the example of code, coders, coding, whatever,
then the COD prefix is only stored once for all of those different words.
So it's much, much less space requirements
than storing all those words inside an array or dictionary.
But you've got to remember there's a pointer
that's going to be going to each one of those.
So if your data doesn't have a lot of duplication,
then it could potentially actually be worse
than storing in something like an array or a hash table.
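To put a rough number on that, here is a small sketch that builds a bare-bones trie out of a handful of words and compares how many nodes it created against the raw character count. The word list and the counting are just for illustration, and, as mentioned, if there's very little shared prefix, the per-node pointer overhead can make the trie the worse deal.

```java
// Rough space comparison: trie nodes created vs. raw characters stored.
import java.util.HashMap;
import java.util.Map;

public class TrieSpaceSketch {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
    }

    public static void main(String[] args) {
        String[] words = {"code", "coder", "coders", "coding"};
        Node root = new Node();
        int nodesCreated = 0;
        int rawCharacters = 0;

        for (String word : words) {
            rawCharacters += word.length();
            Node current = root;
            for (char c : word.toCharArray()) {
                if (!current.children.containsKey(c)) {
                    current.children.put(c, new Node());
                    nodesCreated++;              // only new prefixes cost a node
                }
                current = current.children.get(c);
            }
        }
        System.out.println("characters stored flat: " + rawCharacters); // 21
        System.out.println("trie nodes created:     " + nodesCreated);  // 9 (c-o-d shared)
    }
}
```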
And I've got an example that I think is going to help a lot here.
So if you think about a spell checker, like when you're working on a Word document or something like that, and English is a mess, bringing it back around. It doesn't really have very good rules. Like, you know, we can't say Q is always followed by a U. I is not always before E, because there's that weird C case.
And Y isn't always even a vowel.
And even aside from that, there's all sorts of words that we're adding all the time that have numbers in them.
And there's actually words in the dictionary that are multiple words.
They have spaces in them.
They have accents in them.
We have words from other languages.
There's just so much crazy stuff that you can't really define the English language with just a simple set of rules.
And so what companies like Microsoft, or whoever makes Grammarly, typically do for these dictionaries and spell checkers is they just have a big fat dictionary with all the words. There might be some shortcuts in there, but that's kind of the gist of it.
And so I went to look up how many words there were in English. And just like anything else,
you know, it's complicated. Is run and runs the same word or not? Different people counted
different ways and things get weird. This dictionary in 1970 had this many, but this
other one had another. But the best number I could kind of come up with for our case where
we do count things differently is about a million. And that's why
I knew offhand that 20 was roughly the logarithm of 1 million, because I had looked it up for this example, for building a spell checker.
Yeah, the logarithm of 1 million is 20.
No, it would be 20 levels.
20 deep.
Which also happens to be the logarithm base 2 of 1 million.
No.
Yeah.
Oh, wait. Maybe I'm thinking base 10.
Yeah, base 2 of 1 million is 20.
Actually, it's more like 19.9.
19.9, yeah.
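For what it's worth, that number is easy to sanity check. This isn't from the show notes, just a quick check of the arithmetic:

```python
import math

# Binary searching a sorted array of ~1,000,000 words takes about
# log2(1,000,000) comparisons in the worst case.
print(math.log2(1_000_000))  # ~19.93, so roughly 20 levels deep
```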
So, if we were to get a big array and keep it sorted, because we know we can do that binary search on there, we're going to have a million values in there. And I looked up the average size of a word, and I was actually curious about the overhead of a string, so I looked that up too, and Jon Skeet actually had a really great Stack Overflow question that he answered and came up with some numbers. So I put that together, and I figured out that if you were to store the entire English language in an array, you'd be looking at about 30 or 32 megabytes, which is actually not that bad, except when you remember that it's, you know, just text. So that's kind of crazy. But that means in that big array, you know, the first word is probably a, and the second word is probably aardvark, and the third word is whatever.
Uh, obviously there's a lot of repetition.
You know, I mentioned code, codes, coder, coders, coding. Um, let's see, I've got the math in here somewhere. Okay, here we go. So if we were to store those words separately, we'd be looking at a total of 26 characters, but using the trie, it's only going to be 10.
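Assuming the word list here is code, codes, coder, coders, and coding (the set that makes the 26-versus-10 math work out), you can reproduce both numbers with a quick sketch. Counting unique prefixes stands in for counting trie nodes:

```python
words = ["code", "codes", "coder", "coders", "coding"]

# Storing every word separately costs one slot per character.
flat_chars = sum(len(w) for w in words)

# A trie stores each distinct prefix path only once, so count the
# unique prefixes (one node per prefix, ignoring the root).
prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}

print(flat_chars)     # 26
print(len(prefixes))  # 10
```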
And you can imagine that's just a contrived example.
If you were to think about every word in those 1 million, there's a lot like automobile, automaton,
automation. How many cases are there where there's a significant prefix that could potentially be
reduced? And aside from the size, that's really only kind of a side note. That's not the primary
reason to use this data structure.
It's nice that you can save a lot of room when you've got redundant data, and it's definitely important.
But it's also really efficient for looking stuff up in certain cases.
Like we mentioned, this is a spell checker, right?
So if, again, our naive hash table or array example, you hit save or you type in a word, we would see the word break.
We would go through and we would do a binary search and it would take a logarithm amount of
time to figure out whether or not your word is in our dictionary. We could give you the squiggles
or not. That's a fast lookup. I can't dog on the binary searches for that, or the hash tables and the arrays, because they do a really good job; they just use a lot of space. However, if we were to do this with a trie, first, I don't know what the size would be. I'd have to actually run a dictionary of a million words through it to figure that out. But what's really cool about it are the other benefits. Because I've got this thing stored in the order of the word, I can tell you, as you're typing, as soon as you type a wrong letter. So you start typing C-O-D-F. That's when you get the squiggles, and the time complexity for me to do that is constant.
So it's O of one.
It doesn't get any faster than that. Although, I should say, I was kind of using the example where you look up per letter as you go along. But ultimately, a better way to say it is that the search time is dependent on the size of the word that you're searching for. So that's C-O-D-F. I'm looking at four letters. I start at the C, I go down to my child, which, remember, has the key of the letter, so O, then D. And then when I see that there's no child with the letter F as the key, then I know that's an invalid word, and I can duck out right then.
So as you're typing.
The next lookup is constant time, is basically what we're saying. For every key you're getting an O of 1 lookup.
So, the number of letters.
And then you get down to that last one and you have immediate results, because you had the pointer to that last character that you had found.
Yeah.
It's trivial for me to look up as you're typing
to see whether the word is misspelled or not.
And from there, you can see it's not too crazy to say,
like, you typed C-O-D-F.
We don't know what that is.
However, here are a couple words that you might have meant.
Coder, coding, coders, whatever,
because those are the things that it's easy for us
to kind of traverse the tree down
and see what you might have been looking for instead.
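Here's a rough sketch of that per-letter walk and early bail-out. The nested-dict trie and the helper names are made up for illustration, not from any particular library:

```python
def build_trie(words):
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True  # end-of-word marker
    return root


def is_valid_prefix(root, prefix):
    # Walk one child per typed letter; the work is O(len(prefix)),
    # and we can bail out on the first letter that has no child.
    node = root
    for letter in prefix:
        if letter not in node:
            return False  # e.g. the F in "codf" -- squiggle time
        node = node[letter]
    return True


trie = build_trie(["code", "coder", "coders", "coding"])
print(is_valid_prefix(trie, "cod"))   # True
print(is_valid_prefix(trie, "codf"))  # False, detected at the F
```

The key point is that the cost tracks the length of what you've typed, not the million words sitting in the dictionary.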
And you can imagine some of that metadata that I mentioned storing might involve storing some of the most commonly used words that involve that prefix.
So just thinking out loud here, this is actually better than a hash table because your hash table example works great if you're searching for a complete word.
But if you're doing a partial word like COD, it falls apart
because now you've actually got to search the hash table.
You're scanning the table as opposed to going directly to a key, right?
So this is actually better if you're trying to do a partial lookup on something.
Yeah, I mean, you pretty much can't do it with a hash table. You can't iterate through your keys in order unless you've got a secondary data structure, in which case you're cheating, so I'm not counting that. But if you've got an array, you can do it; you just have to start with the CODs and then loop through all of them until you get to COE or something that's past that, which is a big pain in the butt. And it's so slow, because you're looking at an order N operation. Especially if you've only typed that first letter, A, that's going to be miserable in an array.
So you just don't do it.
That's why this is a much, much better data structure
specifically for that type of thing.
That's pretty cool.
And it's often used for autocomplete, autosuggest type stuff.
And English is a great example because it does have a high amount of redundancy, and so you can get that kind of savings. And you can see how it's almost got this notion of compression kind of built in, right? Because, you know, for every word that comes in, we keep the prefixes only once. And so we're saving on data size, but we also save when we're doing our lookups.
And it's also really easy to iterate through. Like we mentioned the hash table.
Like if you want to say like, oh, let's start with the COD key or any keys like COD and then
go to the next, go to the next, go to the next, you just can't do it. They're not sorted like that.
You have to store those in another data structure and then look them up that way.
But this data structure is in alphabetical order by kind of definition, right?
So I could start with C-O-D-I-N-G, and as long as I insert those values in the right spot every time one comes in, or we keep those letters in order, then it's really easy to say, okay, well, what are the next five in alphabetical order?
That's pretty nifty.
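And that ordered, prefix-driven traversal is cheap to sketch too. Again, this is only an illustration with made-up helper names: walk down to the prefix node, then do a depth-first walk that visits children in sorted order, so completions come back alphabetically.

```python
def build_trie(words):
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True  # end-of-word marker
    return root


def words_with_prefix(root, prefix, limit=5):
    # Walk down to the node for the prefix first.
    node = root
    for letter in prefix:
        if letter not in node:
            return []
        node = node[letter]

    # Depth-first walk, visiting children in sorted key order,
    # so results come back alphabetically.
    results = []

    def walk(node, so_far):
        if len(results) >= limit:
            return
        if node.get("$"):
            results.append(so_far)
        for letter in sorted(k for k in node if k != "$"):
            walk(node[letter], so_far + letter)

    walk(node, prefix)
    return results


trie = build_trie(["cod", "code", "coder", "coders", "coding"])
print(words_with_prefix(trie, "cod"))
# ['cod', 'code', 'coder', 'coders', 'coding']
```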
So I did
a little bit of research on this, and I did see that there
is one better algorithm.
They actually noted this in the wiki, saying that if you're doing a spell checker, this is a great example for tries, but there's actually one better, more specific data structure.
So I thought that was kind of cool. I didn't look too much into it, but
I think that what it does is basically it's more finely tuned for kind of keeping track of common terms
and stuff like that. So it's got some built-in support for doing that sort of thing.
But the name is pretty funny. It's DAFSA for short. So if you hear people referring to it as DAFSA for short, then you know the real name has to be just miserable, right? It's a deterministic acyclic finite state automaton.
So if you're doing a spell checker, autocomplete type thing, then that's probably what you're going to want to look at.
You're going to need a spell checker just to spell this thing.
Yeah.
Yeah. And it's like I mentioned, too, tries can compress really well. So our example, the words that I gave, code, coder, coders, coding, whatever, they always had the same first three letters. Well, what if our data structure saw that and said, okay, you know what, let me just go ahead and combine those, because the C only has one child, the O only has one child, then the D, and it's not till we get past the D that there's any sort of variation. So let's just pull them all in together. So I've got one node that's now COD, and then the things branch off of that. So now I've dramatically reduced the size of my tree and the number of comparisons that I need to do. And they say that in practice, this works out really well.
So I came up with some contrived examples. In my English-language dictionary they're probably not great, because pretty much every letter combination shows up in there somewhere. But, you know, the Q and U thing. I know that there are words where Q is not followed by U, but if there weren't, that would be a good example of something that we could compress. Or if, say, anytime a word ends in Y, the letter N comes right before it, or something like that, depending on your set of words, then that's a shortcut you could take.
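As a sketch of that compression idea (this is the general radix-tree trick, not any particular implementation), you could collapse single-child chains after the trie is built: any node with exactly one child and no word ending at it gets merged with that child, so C-O-D becomes a single COD node.

```python
def build_trie(words):
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True  # end-of-word marker
    return root


def compress(node):
    # Merge any child that has exactly one child of its own and no
    # word ending there, e.g. c -> o -> d collapses into "cod".
    for key in list(node):
        if key == "$":
            continue
        child = compress(node[key])
        grandkeys = [k for k in child if k != "$"]
        if len(grandkeys) == 1 and "$" not in child:
            only = grandkeys[0]
            node[key + only] = child[only]
            del node[key]
    return node


trie = compress(build_trie(["code", "coder", "coders", "coding"]))
print(list(trie))         # ['cod']
print(list(trie["cod"]))  # ['e', 'ing']
```

The flip side, which comes up again in a moment, is that lookups now have to handle multi-letter keys, and inserting a new word can force a merged node to be split back apart.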
And there's some really interesting math behind that.
Where I was thinking, though, that it might happen is maybe more towards the end.
Once you get more towards the end of a particular path on the tree,
that's where you might have compression more often, right?
Oh, yeah.
At least in our spell check example, I mean, at least it makes sense, yeah. And actually this is used a lot of times for things like IP addresses or phone numbers or other kinds of areas where, you know, with phone numbers, obviously the case is going to be that we're going to have a lot more area codes that are the same, depending on your purpose, like if you're operating in a local area. But the IP address is the opposite,
where... Oh wait, no, that's not true.
No, like router lookup.
The prefix is often similar.
But yeah, in English, I think the ends would be where it happens. They also use it for binary, so it's not just for the English language, but that's kind of the most common example just because it's easy to give.
Yeah, that's cool.
And the compression strategies do get complicated, because it also means that whenever you're looking up values, you have to know that a node could hold more than one letter. You need to know to split stuff back up when you get new terms in. So that's not always so great.
It's still pretty cool.
When do you make the decision that,
okay, now is the time to compress?
But I guess, like you said, those compression strategies can get complicated.
And I guess deciding when to compress is part of it.
You could do it as you go along.
But you can imagine if you're in a case where you're loading a dictionary for the first time.
Yeah, that sounds awful.
That would be terrible because you're going to be compressing.
Yeah.
So it's something you want to do periodically whenever you get like a new batch of data in or something.
Yeah.
So as far as pros, this is often used in place of a hash table. It's got a faster worst-case lookup than an imperfect hash table, and perfect hash tables are kind of hard. So it's a little bit faster in cases where we know something about the data, or, funnily enough, in cases where we don't know anything about the data and so can't guarantee a perfect hash. There are no bad hash functions that we have to deal with; we don't have a hash function at all, which also means we don't have to worry about collisions. So it just takes out some of the overhead, and that overhead I'm talking about isn't even included in the big O times, because they kind of consider it constant. So it's just a nice reduction of overhead for a trie over a traditional hash table.
And you can
get that ordered listing of values
which you cannot do with a hash table at all.
So if you need that, if you need to be able to kind of find a key and then move on past it, then this is your data structure, and a hash is not the way to go.
And despite those pros,
it can be slower than traditional hash lookups,
particularly if the random access time
of the medium is bad.
And this is actually a case where they said that on spinning drives, tries can be not so good, because spinning drives aren't so good at hopping all over the place like you can with an SSD or RAM. So the physical medium actually has a bearing on this. And it's not good when there would be long, meaningless chains.
So we talked about storing numbers in this
and binary even, and that's fine.
But if you've got floats in there, where you've got these long, kind of meaningless chains, where 2.333 isn't really different from 2.3, then the trie is not going to help you there. The data structure doesn't have any sort of support for recognizing that those two things kind of mean the same thing and just ignoring it. So, not good for floats. It can also end up taking more space than a hash table in certain situations, because of those pointers
there. So if you've got a lot of
values with no redundancy, then you're going to
actually end up spending more space than you would
in a traditional hash table.
So ultimately, you know, you should
use it when you need those fast inserts
and you need efficient sorting and you've got lots
of redundant data. And search
engines actually use this really, really effectively a lot of times for like auto completes and stuff like that.
All right.
Well, with that, we'll have a lot of resources, a lot of links for you in this episode.
So be sure to check out the show notes and you can see the resources we like section there.
They'll have all kinds of references for you.
And with that, we can head into Alan's favorite portion of the show. It's the tip of the week.
All right. It looks like I'm up first this time. And so I was kind of thinking about maybe putting some stickers on my laptop. But there's this kind of weird phase you go through when you start stickering a laptop, where you put one on, it's fine, then you put two on, and sometimes there's this little trashy, in-between period where you don't have a lot of stickers on, but it's kind of a little bit, and maybe there's one overlapping another, and it's just this awkward, kind of adolescent phase that things go through. And you just need more dev stickers, but sometimes, you know, you go through a dry spell. So I was looking on Amazon, and I noticed that you could just buy tons of dev stickers. So we're going to have an affiliate link, so if you click this link we'll get, like, you know, a nickel or something. And there's a ton of stickers that you can buy for like eight bucks, shipped free on Prime.
And it's good stuff.
There's like Linux stickers,
Git stickers,
Java stickers.
"There is no cloud" stickers.
And some of the really funny stuff,
PHP,
geek inside,
like a whole bunch of stickers.
So you can get them for like nine bucks.
And of course,
if you keep Googling and like looking around,
there's like actually tons of stickers you can buy on Amazon.
And some are like really cool. Like, I don't know, I guess I'm out of the sticker game and have been since, like, elementary school, but there's a lot of cool, really vibrant, neat stickers. I'm seeing, like, puffy stickers, there's freaking hologram stickers, Node.js stickers, just all sorts of stuff. So they're really cheap; just go get some stickers, if you want to do that and avoid that kind of weird, awkward phase with your laptop stickering.
But I mean,
we,
we know everyone's favorite is going to be the coding box sticker.
Yes.
Oh yeah.
And speaking of which we,
we forget to mention this all the time.
If you want a Coding Blocks sticker, drop us a note.
Head up to CodingBlocks.net slash swag and drop us a note, and we'll get some stickers out to you.
And with that, nobody's stickering up their laptops.
That's crazy talk.
Says the guy who just put a sticker on his.
Well, I mean, it was an important one.
It's the CodingBlocks one. It's the coding blocks one.
It's the only thing that I deface my laptops with.
I finally saw a sticker in mine.
That's ridiculous.
Wait, that's the only one?
I know.
It's the only one.
You put a Sticker Mule on?
Is that the only sticker that you put on there?
You didn't put the coding blocks?
I'm going to send him some goop off.
That's ridiculous, man.
Actually, I bought the sticker pack from dev.to.
So I was thinking about going all the way, but I just wanted to kind of try one and see if it was going to rub off or whatever.
But actually, it's been great.
So I think I might go all in.
I have an idea of a sticker you should put on your laptop, sir.
Everyone knows that a Coding Blocks sticker would be more than worth a thousand Nickelbacks.
I have about a thousand Coding Blocks stickers in my closet, but I don't want to waste one, you know.
That's... man, that's it. We have to have a talk with this guy.
This relationship is over.
All right, all right, I've got two. One tip is because I literally experienced some massive pain in the past couple of weeks. So, quick background: I had a console application that was supposed to just do some stuff, right? And it's a process that could be long running, and it just goes and goes and goes until it's done. The problem is, it didn't go and go and go until it was done. It went and went and went until it stopped doing anything. It would stop logging, it would stop telling you anything. It would just stop, but it didn't quit. It just didn't do anything. So what I found, through much heartache and just hours and hours of boring looking around, is there's an option on Windows when you open up a command prompt window. If you right-click that thing and you go to Properties, there's something called quick edit mode. If that's turned on, it's really cool, because it's what allows you to do, like, block copies inside a console window. So imagine you're tailing the output of a log file that's streaming stuff across. If you click in there, it'll actually pause the window so that you can, you know, copy something. And then if you hit a key afterwards, it'll keep going. The weird part is, this was running as a background task, and for some reason it would act like quick edit mode had engaged and it would just stop.
So that said, I did find something where, if you write console apps that are going to be doing long-running processes or something like that in C#, there is a Stack Overflow answer here. And I wanted to give this one a special shout-out, because the answer the guy put up is really awesome, very elegant, and it works perfectly. Like, I can launch it, and I can go look at the properties of the window and see that quick edit mode is disabled. This is the guy who wrote NDepend. So Patrick, the guy whose product we've talked about on here before, put up just a beautiful answer. He's got the entire solution right there. And a huge thank you for putting it together, because it saved me some pain, because you actually have to reach into the underlying operating system DLLs and stuff to do these changes; it's not exposed natively through the .NET framework. So really cool stuff.
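For anyone curious what reaching into the OS looks like, here's a rough sketch of the same idea. This is not Patrick's C# answer, just the equivalent console-mode dance done from Python via ctypes on Windows. The Win32 functions and flags (GetStdHandle, GetConsoleMode, SetConsoleMode, ENABLE_QUICK_EDIT_MODE) are real; treat the rest as illustrative:

```python
import ctypes

# Win32 console-mode constants
STD_INPUT_HANDLE = -10
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080


def disable_quick_edit():
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)

    mode = ctypes.c_uint32()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return  # not attached to a console, e.g. running as a service

    # Clear the quick-edit bit; ENABLE_EXTENDED_FLAGS is required for
    # the change to take effect.
    new_mode = (mode.value & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS
    kernel32.SetConsoleMode(handle, new_mode)


if __name__ == "__main__":
    disable_quick_edit()
```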
It's also worth pointing out though,
sorry to interrupt.
It's worth pointing out that that answer from him, while it is not the accepted answer to the question, actually has more upvotes than the one that was given the credit.
Yep.
The one that was given the credit had the nuts and bolts information,
but you would have had to have done quite a bit more
research on your own to, to put this thing together. And he was like, you know, I really
like to be able to copy and paste code that works, so here you go. So, I mean, just an awesome answer. So thank you to Patrick for doing that. It's amazing stuff. And he does give credit that his solution was inspired by the original. Yep. So if you ever run across that,
like seriously days of my life wasted,
I'm not exaggerating.
It was,
it was highly frustrating.
So the next thing,
so I love programming that,
that solves some sort of unique problem.
That's fun,
right?
Like I like big data problems because to me that's unique and it's,
and it's interesting and they're hard to solve and it's challenging.
So probably a lot of you guys haven't even heard of this guy, a guy named Mark Rober. He literally wrote software and did engineering projects, or, I guess I don't know that he wrote software, but he was an engineer for NASA, right? He did rocket-science-type stuff.
Well, this dude has
one of the most amazing YouTube channels that there is period bar, none, just really cool stuff.
But there's one specific video that you absolutely must go check out where he created something and
they had to write some code to make this happen. And I'm not going to spoil it. I'm just going to
leave the link here in the show notes.
And you absolutely need to go check this out
because it will, beyond a shadow of a doubt,
it will put a smile on your face
because it is just, it's creative, it's genius,
and it's where I like to see engineering minds go, right?
Writing code and writing and creating things
that solve unique problems.
Can we at least give a hint that it's a revenge story?
That's perfect.
Yeah, we can leave it there.
Yes, it's a revenge story.
Okay.
All right.
And so with that, hey, if you have a tip for us that you would like to share, you can go to Joe's favorite place on the internet,
cb.show slash tips, and submit your tip for the tip of the week there. I bring that up because I pulled from that, and I want to say thanks to the Pigeon Fighter, because the tip of the week that I'm giving this week is the Semantic Colorizer plugin for Visual Studio, Visual Studio 2015 and up.
And this editor extension will give you distinctive colors.
So what's really nice about it, really, aside from like your normal kind of keyword type of colorization, the thing that really draws me to
it that I like is the colors for things like parentheses and curly braces and things like
that, and brackets and whatnot. And I think we've talked about a similar extension for Visual Studio Code called, I think it's Bracketizer, if I remember right. So if you're familiar with that one, this would be like that, but for Visual Studio.
Very nice.
Yeah, it looks like it does a lot more.
Super cool.
So again, thank you to the Pigeon Fighter.
All right.
So we hope you've enjoyed this episode on heaps and tries. And as Joe mentioned earlier,
if you would do us a favor and leave us a review, we'd greatly appreciate it. You can
find some helpful links there by visiting www.codingblocks.net slash review, as well as
we would appreciate it if you spread the word,
you know, tell a friend.
You can subscribe to us on iTunes, Stitcher,
or more using your favorite podcast app
in case somebody did happen to share a link with you
and you're listening to us that way.
Definitely.
And while you're up there,
make sure you check out our show notes.
I mean, we put a ton of time into those and share all the information up there with you. And also check out our examples, discussions, and more, and send your feedback, questions, and rants right to the Slack channel, or leave a comment. Be sure to follow us on our accounts, or head over to CodingBlocks.net, where you can find the social links at the top of the page.
That was a mouthful.
All right, sometimes I get excited and then I go a little fast. We got one and a half hoop stanks down. Turn it down.
So what's bigger, a hoop of stank or a Nickelback?
That's the show title right there.
1.21 hoops of stank.
Great Scott!