Programming Throwdown - Hash Maps
Episode Date: August 4, 2021In this duo episode, Jason and Patrick introduce us to the world of hash maps, from buckets and hash functions, to differences between open and closed addressing, to minimal perfect hashes an...d locality sensitive hashing. A familiarity with hash maps is an oft-overlooked but highly sought-after skill, and it can be a valuable asset for those eyeing a career in programming.Along with the main topic, Jason and Patrick also talk about some of their latest interests: books, gadgets, tools and games.This episode touches on the following key topics and ideas:00:01:27 Playing games with Oculus Quest: Acron, Racket: Nx, Gorn, Superhot 00:11:05 News: “I Made a Water Computer” by Steve Mould00:14:56 colinfurze00:15:52 News: Comprehensive guide to Attention Mechanisms00:21:53 News: Starship SN1500:25:18 News: MailSync now Open source (GPL)00:28:34 Jason’s Book of the Show: Elon Musk00:32:04 Patrick’s Book of the Show: Ready Player Two00:33:40 Jason’s Tool of the Show: Datadog00:38:44 Patrick’s Tool of the Show: I Expect You to Die 00:40:30 Escape rooms00:45:39 Sudoku00:48:35 Hash maps: the promise and idea00:50:59 Hash Functions00:52:34 Examples of hash functions: Cryptographically Secure and Non-Crypto01:01:05 Load Factors01:03:43 Open vs Closed Addressing01:15:10 Minimal Perfect Hash 01:16:25 salts01:19:00 Locality Sensitive HashingResources mentioned in this episode:ToolsMailsync http://mailsync.sourceforge.net/Mailspring https://getmailspring.com/Datadog https://www.datadoghq.com/SHA https://en.wikipedia.org/wiki/Secure_Hash_AlgorithmsMD5 https://en.wikipedia.org/wiki/MD5MurmurHash https://github.com/aappleby/smhasheraxxHash https://cyan4973.github.io/xxHash/MapReduce https://www.ibm.com/analytics/hadoop/mapreduceBooksElon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future by Ashlee Vance Ready Player Two by Ernest ClineGadgetsOculus Quest 2 https://www.oculus.com/quest-2/Oculus Link https://www.oculus.com/accessories/oculus-link/GamesAcron: Attack of the Squirrels! https://www.resolutiongames.com/acronRacket: Nx https://www.oculus.com/experiences/quest/2255408847836468Gorn https://www.oculus.com/experiences/quest/3349689215139117Superhot https://www.oculus.com/experiences/quest/1921533091289407I Expect You to Die https://www.oculus.com/experiences/quest/1987283631365460The Legend of Zelda: Breath of the WIld https://www.zelda.com/breath-of-the-wild/Videos:I Made A Water Computer And It Actually Works https://www.youtube.com/watch?v=IxXaizglscwcolinfurze YouTube channel https://www.youtube.com/user/colinfurzeArticles:Comprehensive guide to Attention Mechanisms https://www.analyticsvidhya.com/blog/2019/11/comprehensive-guide-attention-mechanism-deep-learning/Starship SN15 https://www.space.com/spacex-starship-sn15-launch-landing-successMailSync is now Open Source (GPL) https://community.getmailspring.com/t/a-free-open-source-future-for-mailspring/484If you’ve enjoyed this episode, you can listen to more on Programming Throwdown’s website: https://www.programmingthrowdown.com/Reach out to us via email: programmingthrowdown@gmail.comYou can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM Join the discussion on our DiscordYou can also help support Programming Throwdown through our Patreon ★ Support this podcast on Patreon ★
Transcript
Discussion (0)
Hey everybody, we're here with a duo episode.
We've moved on to two episodes a month.
I'm sure people have noticed by now, which is really, really cool.
And I probably talk about that too much, but I just can't tell you how excited I am that we have an actual producer and we can up the amount of content that we get out to everybody,
which is really cool.
Exciting.
Yeah.
So this episode, we're going to talk about Hashmaps, which is really cool. Exciting. Yeah.
So this episode, we're going to talk about HashMaps, which is really cool.
So before we get into the actual show, I just had like a pre-show.
I wasn't sure if HashMaps include hash tables or like what the umbrella term is.
I think HashMaps kind of cover everything.
I think they include, I know hash maps and hash sets are basically the same thing.
Yeah, that's true.
But then hash table, I mean,
does that just mean you're using an array?
Yeah, I don't know either.
We'll have to, we'll look it up.
We'll look it up.
Okay.
But yeah, our show is going to be all about hashing
and all the different things you can do with hashing
and all the clever stuff that it unlocks. So we're really about that and i actually another thing i'm excited about is i got
an oculus quest uh one of the quest twos and i think i think in our very first show i think i
talked about or maybe it was one of the first few episodes how i got a cell phone and how that was
kind of a big deal and i was such such a, or smartphone, sorry, smartphone.
I was a late adopter to the whole smartphone thing. Um, and I'm also a late adopter here.
So I feel like I've been consistent, but, but I got a quest to, so I got, you know,
Beat Saber and a lot of things that, that everyone knows about. And it's been fun.
I've been exercising a lot with it. One of the things I was able to do with it that I didn't
expect is, um, I've had a racing wheel.
So it's like a USB wheel that actually clamps to my desk.
And it comes also with foot pedals.
And so I'm really into racing games.
And so I can play like F1 and Project Cars and Dirt and these other racing games with an actual steering wheel.
And I've had that for a little while.
But I actually was able to play a VR racing game with the wheel and the quest.
So the quest has this thing called Oculus Link.
And so you can connect the quest to the computer and use it, have your computer drive it kind of like the rift.
So you can use the quest in this way.
And so with that whole like Frankenstein kind of of setup here with the wheel and the computer and
the link and the quest i was able to actually do a racing game with like physical controls and vr
and it was awesome like it was really really fun i was racing go karts and you could actually like
as you pass someone you could look or if someone's behind you and you want to see how close they are
you could turn your head and like look at them and then turn back it was really fun and even just the feeling of like you
kind of like instinctively lean when you turn uh the steering wheel and so when i lean you know my
head actually leans and the view changes and it was it was super super fun i have to admit it was
a total blast yeah i also got an oculus quest 2 There's some, like I don't use Facebook personally,
like nothing against Facebook.
It's just like, it's not something
that I engage with regularly.
So it was a little awkward that it's so heavily linked
between sort of like Facebook,
which you can go read about all on the internet
and people have very strong opinions.
Yeah, so I just set this up.
I could talk to this.
When I started it, it said,
hey, do you want to create a Facebook account
or an Oculus account? And I said, I'll create an Oculus account and try and keep things
separate. So that way I can use key pass of a separate password, et cetera. So I went through
like the whole signup flow. And then at some point really early on, I think it's like, I went to get
my first game or something. They're like, uh, yeah, actually about that. You're going to have
to link these two accounts. And the one you just created is going to get deleted in a year. I was like, what?
Yeah. But aside from that, I also have enjoyed it. My kids have really enjoyed it. We do it in
small doses and we cast from the headset to the TV so other people can watch.
Wait, what? Wait, wait. How do you do that?
There's an option, like a cast option on the main
main menu so you can either you from your phone if you have it linked to your phone you can just
watch it on your phone or tablet um but then you can also send it if you have a chromecast i think
it also might support the what is the apple tv airplay i've not tried that yet but yeah that
it's pretty cool experience i actually have the tool, I won't spoil it, is an Oculus Quest game that I put as tool of the show
before we decided to talk about VR at the beginning.
So I'll leave that for then.
But there's been a couple of games I've really enjoyed
being able to play and being immersive.
One that was really different was,
I think it's called Acron,
which is like a permutation of the letters of acorn
about a tree and nut-stealing squirrels. was I think it's called Acron, which is like a permutation of the letters of acorn,
about a tree and nuts stealing squirrels. And the interesting thing is that the person wearing the VR goggles is a tree that can throw,
you know, sort of, I guess, like bombs and balls and whatever at squirrels that are running towards
it, trying to eat the nuts that are at the base of the tree
and take them back to their sort of base.
But instead of having only bots do this,
you can have other people on their phones
download the game and take control of the squirrels.
So they become the squirrels and you take turns,
or at least we do, we take turns.
Who's the tree?
You have giant tree limb arms
and you're picking up these things
and throwing them at the squirrels. And the other people in the room are playing with you, right? They're
the squirrels trying to come and steal your nuts. Whoa, that sounds awesome. And so it's super cool,
you know, just like an experience that you wouldn't otherwise have. The only kind of like
downside that I've gotten is one is like the battery life isn't super long, you can easily,
you know, taking turns with everyone,
eat through the batteries.
And then the second thing is
you actually need a fairly large space
for the games that don't have you stationary.
For the games where you actually like
move or walk around the space,
which I find to be kind of like
the most immersive and most cool.
You actually need a fairly decent space
that's clear of objects.
So like a living room or something.
You need probably a good
I don't know what they recommend, like eight feet by eight feet or 10 feet by 10 feet. But it's
actually surprisingly difficult, at least was in my house to like, find such a space. And so we had
to kind of like make an arrangement where we move some of the furniture to be able to, you know,
have a big enough space to really fully enjoy it. Yeah, yeah yeah i was playing this game called i think it's called rocket rx but it's it's effectively like uh it's like solitaire solitaire tennis where you're
surrounded you're encased in like a sphere and so you hit the ball off the sphere at different points
to hit you know various targets and stuff that are like overlaid on the sphere right and so you're
and the harder you swing well it's not's not, if you swing hard enough,
then the ball will actually skate along the surface.
So for example, if you hit it hard enough,
then you don't have to like hit the target directly.
You just have to be kind of a leading the target
and the ball will kind of go through it.
And so it kind of encourages you to swing
a little bit hard.
I don't think the harder you swing, the better it is, but you have to swing hard enough. And so I
had drawn my guardian, which is what they call the sort of border you set up before you start
playing. I had drawn it kind of right up to the wall and not really thinking about it. Oh no.
Yeah. By the time the guardian warned me, like a split second later, I'd like punch the wall. It wasn't a big deal, but yeah,
you easily like kind of lose track
of where you are spatially.
And that was one of the things that surprised me.
I've only played it for a few days now,
but surprised me was how often the Guardian
is actually, you know, warning me,
like how often I get far away
from where I was originally standing.
I also got a game called
gorn g-o-r-n which is a uh it's like a roman coliseum and um there's a bunch of like weapons
on the ground and so you actually reach down to the ground pick them up and uh and you have to
sort of fight other people in this coliseum and uh it is i mean all of these games are a huge huge workout
i mean i was sweating afterwards it was a lot of fun oh this is it feels like uh that movie
wally you know like all the people like riding around in their you know chairs and drinking
their slurpees like we're we can't go outside and exercise you know it's too much work so we're
gonna put on these goggles and stay inside and exercise. It's
just kind of, we're getting there. It's pretty wild. I mean, it's, it's, yeah, it's interesting
because it's definitely more of a workout than playing like Quake or something. So in a sense,
it's like a step in the right direction as far as not ending up like a person in WALL-E.
But yeah, you're right. It is. Actually, I saw a game in the store that said VR VR,
where I guess you go around putting on VR goggles in VR.
So there is, I think it's this Superhot.
So I think Superhot, which I believe is available.
We should say there's lots of VR goggles out there.
I think this one's gaining momentum
towards being kind of more
commonplace than just early adopters.
Well,
I think the quest is the only one that's self-contained,
right?
At least that I know of.
I think there are others,
but they're not at the same performance.
Like they were much cheaper ones further back,
like years ago.
Again,
we're late to the party.
Yep.
Yep.
So in super hot,
the,
which is an interesting game.
They have,
I believe it's there a scene where
you reach up and pull down vr goggles onto your head to use your controllers to reach up and grab
it and pull it down onto your head and it's sort of that same surreal thing wait i have on vr goggles
and i'm reaching for vr goggles in vr to put on my vr dog oh it's amazing. I look forward to all of these little cheeky puns in these games as people start adopting
these more and more.
Yeah, I mean, another thing I did just for the experience.
Yeah, I don't think I would do it again, but it was actually pretty interesting was I watched
an episode of Family Guy in VR.
And so you're just like in this, like, I think you're in like some kind of a tree house or something, whatever that, you know, environment is when you start the quest for the first time you're, you're in this environment and there's basically just a giant, enormous TV that's just suspended in midair and it's playing family guy.
And, uh, yeah, there's something just interesting about that experience.
Like, like, I don't think I would do it again.
Yeah.
It's like, you might as well just watch the TV of course. Right. But, uh, there's just something kind of interesting about that experience. Like, I don't think I would do it again. You know, it's like you might as well
just watch the TV, of course, right?
But there's just something
kind of interesting about that.
Part of it, too, is I was just surprised
how comfortable it was.
That really shocked me, actually.
I think that having the strap
that goes over your head
and, you know, just getting
all those straps, you know,
tightened the right way
actually made a really big difference.
I think it's time for news.
Go for it. What's your news?
All right. My first one is a YouTube video called The Water Computer, or I guess it's called I Made a Water Computer by Steve Mould, which I guess he's a fairly famous YouTuber.
I keep coming across YouTubers who have like a million subscribers and I'm like, I heard of this person before yeah i wonder what the percentile is there like uh i i
bet there's there's there's got to be over a hundred thousand maybe everyone's probably groaning
like oh of course i already saw this video i knew this guy well i didn't i never heard of it so i
came across this video by steve mold saying i made a water computer. And he is doing it with another sort of
like math YouTuber, Matt Parker, which Matt Parker, I've watched his videos before. So I'm not sure I
don't watch all of them. But I'm not sure how I miss this. Anyways, the guy has an interesting
mechanism for building logic gates out of sort of plastic and doing the computation instead of
electricity, you use water. And I just thought
it was pretty cool. So in college, I took some amount of classes in digital logic,
discrete logic, these kind of things about building up an entire CPU out of just gates.
This is pretty common in a lot of electrical engineering or computer engineering curriculum.
So I'm not unique, but I always just come back to it as like a sort of very enjoyable way to to kind
of think about these like individual gates and how that they grow up up up and eventually you're
playing vr on a headset so just to be clear like this doesn't mean a waterproof computer this is
literally a computer that uses water instead of electricity oh okay so yeah i was about to get
there thank you for making sure to clarify that yeah so this is like a class of people who have done what I guess what you call like physical computing, where instead of having like silicon logic gates, or it used to be, I guess, vacuum tube gates, you actually make anything that by having two physical objects, and then if both of them are in the quote unquote one state, then you know,
and you have an AND gate, then you get some sort of output. So I've seen them built of Legos,
of Tinker toys, of now somebody with, you know, flowing water. These are always just like super
fascinating to me. And in theory, I mean, there's some nuance to using water for doing your actual
computation. But in theory, you could build up a fully functioning computer using this.
I mean, it would get unreasonably large.
I mean, modern silicon has an enormous amount
of transistors in it.
So you would end up, you know, it wouldn't be practical,
but it's just a fun, thoughtful way
to see the kinds of things people are able to do.
And so you actually build something they can add to,
oh, I don't have it in front of me,
two four-bit numbers or two five bit numbers. And then the output computation
is a given by water flowing through this series of valves. So if you've never seen it before,
you know, check that one out or just in general, you know, the fascination around
using physical objects to do. Yeah. So when you go to take your SATs and they tell you,
you can't bring your calculator, just be like's all right i got this guys uh can you please define calculator for me um
yeah i just it's okay i don't need a calculator but can you bring me like 10 jugs of water
yeah can i have a 55 gallon barrel of water by my desk. Hey, wait, guys, it's still dripping. It's still dripping. Hang on. I need 10 more seconds. Oh, my gosh. Yeah, man, that brings back so many memories of
I think that SAT you could have a regular calculator, but the ACT you couldn't have
anything. I'm sure it's changed now. I haven't kept up with it. But I remember,
yeah, there being like a specific list of approved calculators you could use.
Yeah, that's right. That's right.
Cool.
Yeah, that sounds awesome.
I'll give this a shot.
Actually, you know, I was thinking about it the other day.
A lot of the videos that are in my YouTube history are videos that you've posted.
Really?
So yeah, this is yet another creator.
I guess another way of putting it is maybe not the history because that's too shallow,
but like too recent.
But a lot of the people I subscribe to are people that you found for me ah so uh cool i'll have to return the favor
maybe next show i'll do uh are you have you heard of colin fursey yes yeah okay okay that's that's
one person that i found on my own so just for people who haven't heard of him on the show he
does uh he just makes crazy things he's like a mechanical engineer extraordinaire and just builds all sorts of wild things the most maybe not the most recent
episode but a recent episode he made a car uh waterproof um and the interior of the car is
totally waterproof and so then he filled it up full of water and he had like a car that was also
a hot tub what okay i've not seen this one yet. Yeah, you got to check that out.
It's a convertible and it's a hot tub.
And while you're driving,
there's like water splashing out of the car and everything.
All right, my news,
one of my news is this really cool guide
to something called attention mechanisms.
And so I'm going to need to kind of, you know,
explain and build up to this.
But, you know, as a lot of people know if you want to do
what people call supervised learning which means someone gives you a label so they say like okay
here's a picture this is a dog here's a picture this is a cat here's a picture it's a dog they
give you a bunch of those you can do supervised learning and you could train a machine learning
model to predict the output.
So you could give that model a new image and it will tell you whether it's a dog or a cat.
Actually, it'll give you the probability that it's a dog, the probability that it's a cat.
Right. And so, you know, you might have, let's say a table full of data in Excel or something,
and then you have this one column and that one column in the table, maybe sometimes it's missing.
And so you want to try and predict the value of that column when you don't have it.
And so you train a machine learning model to do that.
You take all the other columns, you pass them as inputs.
That's relatively straightforward.
You'd have some kind of neural network or some kind of model to do that.
Now, when you have images, that's more tricky because an image, every single pixel,
and really every channel of every pixel is a separate input. And so if you just use a regular
neural network or some other model, it's just going to have too, it's too expressive, too open
ended, and it can't really learn effectively. And it won't be able to handle different sized images
and stuff like that.
And so for images,
they've made this thing called a convolutional network.
And so the convolutional network's kind of nice.
Think of it as like a roving eye.
So it's like a little eye
that can look at a piece of the image
and that eye kind of moves across the image
doing the same thing
at all the different spots in the image.
And so if you know how to detect a dog, you know, in one spot of an image, that same dog detector
will run everywhere else. And so you could end up with the output could be, let's say another image
that's some black and white image. And the more intense the image image the more likely there's a dog there or not and
that image is created by this this eye that kind of like roves over the image doing the same thing
over and over again and because you only have to learn that eye once then it makes it practical
to to to build these models and so it's cool for images but then you you have this you have like a
lot of other modalities that you want to maybe learn from, right? And one might be language. So you might say, well, here I have a sentence and I
want to predict whether this sentence is, I don't know, makes people happy or sad. And so you have
a bunch of sentences and then you have a bunch of, you have another column that tells you whether
the person was happy or sad and you want to train the same type of model but how do you do that with a sentence right it's not really clear like how to
how to you know what sort of you know uh model or what sort of topology you need to do that
and the thing that's especially challenging about sentences is that they can go on forever
just and other than the image you know if there's, you know, the dog will probably just be in one spot of the image.
Right.
Or if the dog is going to be over the whole image, you could shrink the image down.
There's tricks you can do where you only have to look at, let's say, a certain spot of the image.
But for sentences, you can't do that, especially for certain languages and things like that.
But even probably across all the languages, like the first part of the sentence might be really important when you're reading the end of the sentence. And so the sort
of structure that they've designed to do things based off language is called LSTM, which stands
for long short-term memory. And it's similar to this roving eye. I mean, it's a different technology,
but it's similar in the sense that it uses a certain mechanism to kind of go over the whole
sentence and be able to sort of build up knowledge as it goes through the sentence.
So with all that groundwork, LSTMs are similar to convolutional nets that they didn't work
for a really long time.
And recently, people have been able to get LSTMs to actually work and be really accurate and train models that are really accurate.
It's through something called an attention mechanism.
So the problem with LSTMs are...
So this LSTM, think of it like an accumulator, like a battery, right?
And things are coming into this battery, like knowledge is coming into this battery.
And this battery is sort of storing this knowledge.
And then when you finish storing the knowledge and the battery now has kind of everything
in it, now you can start draining the battery, draining that knowledge.
So you could imagine if you're, let's say, translating English to French,
you would store all this English knowledge and then you'd have another model that kind of drains it in French. And if in French you would have reversed the order of the words in a sentence,
that's okay because the battery kind of has everything, right? The problem that they were
having was as you store new things, they were kind of erasing the older things.
And it's very hard actually
to kind of keep storing more and more and more words
without destroying like the very first word you put in.
And so the attention mechanism
is a way to prevent that from happening.
And so the article gets a lot more detailed than that,
but this guide is really cool
because it goes through the whole history
and explains it kind of with pictures.
And so if you've ever wondered
how machine learning works on words,
check out this guide.
It's got a lot of good info.
You make my news article seem so shallow.
Well, that's probably the deepest one I've ever linked. So
sample size of one. Yeah, that's super fascinating. I think that was a great overview.
I knew all of those words. My news article is another abrupt 180 degree different direction,
which is the successful launch and landing of Starship SN15. So we have like a little bit, I guess Jason was mentioning
about being a late adopter to smartphones.
We have a history as well
as covering some amounts of space articles
and making space predictions.
This happened a couple of weeks ago
while we're recording this.
So it's sort of old news,
but I just want to celebrate.
I think it's super cool.
This is a SpaceX prototype
and they were able to launch it out of Texas
up to, I don't remember where they go,
up to about 10 kilometers or something
and then land it back just next to where it took off from.
But just the whole thing flying and launching it,
it looks so ridiculous
and to actually see them do it successfully.
They've managed to sort of land before,
but there was always problems of one kind
or another and so this is the first time they sort of took off landed no major issues and in fact
they're looking to potentially do it again i guess they're doing inspections and stuff so oh i was
gonna ask about that because i thought they had done it before but yeah maybe there were problems
you know it might have looked like it landed, but something broke or something.
Yeah.
And there was also the, they did tests earlier, which were not as high and weren't, didn't have aerodynamic surfaces.
So they did some like tests of the engines and stuff.
This is the first time they sort of sent up what would end up being the final stage, like
the orbital part of the thing.
And now it looks like they're, based on filings, they're thinking about launching up to
orbit from Texas and splashing down somewhere around Hawaii sometime soon, which would require
I think them to stack two stages together. So this thing's huge. If you've not, go look at the
like diagrams for how big the final thing is going to be compared to like a person or existing
launch vehicles. And it's it's absolutely insane.
Wow, it's amazing. Yeah, the so how far are we away from being able to get a ride on one of those
as a passenger? You and me or an arbitrary rich person? Let's say let's say an arbitrary rich
person. I mean, I think they're going to fly several civilian launches this year.
I think SpaceX is doing one on the Falcon.
Their current ones that go up to the space station, they're going to launch some civilians,
which will sort of kick off another set of civilian space travel, which hasn't happened
for a few years.
But I think on Starship, I would say probably still sort of two or three years away.
I mean, there's the Japanese person who bought a trip around the
moon on one of these starships, and they haven't set an exact date yet. But I think they're saying
it's going to be closer than you think. Wow, it's amazing. Now you and I affording it?
Yeah, that's a that's a whole nother question. Yeah. Yeah. Maybe if i if uh yeah we stop you know buying so much junk food
maybe we can get there how much dogecoin did you buy oh gosh
oh yeah yeah last show um or a couple shows ago a few years ago we're talking about you know like
fractures or banking and those things, the last duo episode. And since
then, people have been trading their dollars for dog money. So I think that just kind of tells you
everything you need to know. So okay, my second news article is on MailSync. So I actually use MailSpring. If you've ever gotten an email from me
recently, you'll actually see this little footer because I've never bothered to change the default,
which is great advertising for them. You'll see this little footer saying MailSpring,
the best free email app or something like that. But I actually love MailSpring. We should definitely
do it as a tool to show some time if I haven't already. But it's a nice desktop email app.
It has a lot of really nice features.
It collects all your inboxes.
I just feel like it has a great interface.
I'm a big fan.
And for a while, the MailSpring kind of front end, which is all in Electron, was open source.
But the back end, which actually handled ingesting mail from a trillion
different sources, they handle Exchange, they handle Gmail, they do all of that. That was all
closed source. And so I don't remember if MailSpring is on Linux. I know it's on Mac and
Windows. I'm sure it's on
Linux too, but, but in general, like you always, when somebody is closed source, you always kind
of wonder how long it's going to survive. And, uh, and now they recently decided to open source
the whole backend, which is really cool. Um, not only does it mean mail spring will be around for
forever, but, um, also, um, if you ever needed to do that, like if you needed to ingest
mail, and basically MailSync basically converts email into think of it as like a SQL light table.
And so if you ever needed sort of a database representation of your email, and one that sort
of is kept kept in sync and kept alive, kept alive. MailSync does exactly that.
And so I think it's a really nice thing to have for the community. So I'm really excited about
all of that. I think email is still an area where there's a lot of low-hanging fruit. I mean,
I spend a ton of time on email. Something you know, organized email. I do think there is something to that. And this will sort of open up that where people can write new front
ends and new mail experiences without having to learn, you know, 10 different protocols.
Oh man, that's cool.
Yeah. There are some email clients that are premium, like there's superhuman and there's
hay, but I haven't really tried them
yet. But yeah, there's got to be, I think, a better way to do email, like something that's suggested.
Like, I don't want something that automatically puts things in different folders because you
never really know what it's going to do. But something that somehow like just suggested
ways for me to do email better. I think that could be cool.
I mean, I was, I much preferred inbox, Google's Gmail client that they experimented with a while.
I'm so still disappointed. I still feel I'm worse at email using the Gmail app than I was using inbox. I too still have a hole in my soul. Yeah. I actually switched. I, I, I found
mail spring when inbox was shut down.
Ah, okay. Well, we're not going to get on that. It's gonna make me sad. It's time for.
Yeah, it's true. Inbox was so good, but it is what it is.
Time for book of the show. All right. My book of the show,
which actually, you know, maybe this might be the first time ever, but I already read this book. I'm not suggesting a book and walking someone off a plank.
But yeah, this book is self-titled.
It's Elon Musk.
And so it's all about Elon Musk.
I think it's awesome.
Actually, I found the book fascinating.
I think that it really did a good job of you know covering the good and bad stuff actually it's
really interesting the foreword is or maybe the first chapter preface or whatever whatever is um
is from the author first person talking about um the negotiation he had to go through with elon to
to write the book and originally elon wanted all these footnotes and you know i guess he's like
he has a sort
of engineer mindset where he wants to just correct the record and wants everything to compile right.
But they kind of went back and forth on that a lot. But as you can imagine, someone like that's
really concerned about their image and everything, right? That's not a surprise. But yeah, I felt
like the book was extremely well balanced. It had a lot of really good detail. It covered all the, no pun intended, all different chapters in Elon Musk's life.
So he started off as just an intern at some software company and then just doing a bunch
of coding.
One thing I noticed was, and one thing you kind of take for granted,
or at least I kind of took for granted, is the support that he received from his family.
So there's a bit of trivia here. Elon Musk, one of his things he's known for is having
kind of a rough childhood. His parents, I guess, were really strict. And I'm not taking anything
away from that. But his parents actually funded the first
company. And, and actually when I've read about other, uh, like tech billionaires, actually,
you kind of see that pattern over and over again. And so I was thinking, it made me think a lot
about, um, like, like that sort of community, like the Elon Musk's and the Mark Zuckerberg's
and the Bill Gates and what have you. And a lot of these people are kind of like in a position where they can take some of those really big risks. So like, you know, someone in all of these cases, someone offered them millions of dollars for their company, and they turn them down. And I feel like that would be pretty extraordinary to do that. But then when you see the background, it's like, okay, well, actually these people already had millions of dollars usually like through their family. So,
so that, that kind of like made more sense, but still, I mean, the, the, you know, Elon Musk's
a total genius. Uh, I felt like, uh, yeah, I have a ton of respect for him after reading the book.
And, uh, um, yeah, I thought it was a fantastic read. It actually is very exciting. I was really
interested in it the whole time. So it doesn't, you know, a lot of these can tend to drag on because they might say, well, here's sort of the middle years where nothing happened. And there really wasn't a lot of that. The book was super entertaining all the way through.
Nice. I've actually not read any Elon Musk's biography. I not too much studied him as a person yeah you know i i it's uh i know very little also about a lot of i don't really
study a lot of people but for some reason like i was really just drawn to reading this biography
the only other biography i've read is the steve jobs biography and uh yeah there's there's just
a lot of really interesting parallels parallels there and uh yeah it's a solid book nice my book of the show is ready player two
this is the follow-up to ready player one um by ernest klein uh it's set in the same universe as
ready player one i guess it's a low-hanging low-hanging fruit not a super deep book but
enjoyable i enjoyed as much as ready player one but definitely pretty solid i i'm not getting
through as many books uh nowadays as i used to yep same here but definitely pretty solid. I'm not getting through as many books nowadays as I used to. Yep, same here. But this one was nice, easy. I actually listened to it,
but a nice, easy listen. And I kind of did it in the background. I thought it was good. You know,
the same kind of thing. If you liked the first one, you probably like the second one. If you
didn't like the first one, I wouldn't bother with the second one. Yeah, I mean, it's the same kind
of like pop culture laden,
you know, exploration of the 80s and a little bit more music bent maybe than the first one.
But yeah, a good read. I always struggle. I, you know, don't unlike a biography, I don't want to
spoil the fiction too much. So I don't really want to say too much about what the book is.
But yeah, Ready Player Two by Ernest Cline.
Cool. Yeah, you recommended the first one a while back, and I gave it a read and I thought it was
fabulous. So yeah, I'll definitely I think I already bought this book as soon as I saw it,
but I haven't read it yet. Yeah, my backlog is getting pretty big. So I got to start spending
some dedicated time getting through all these books I want to read. Yeah, I noticed that I've kind of caught myself switching to podcasts because, you know,
I don't have the long commute or anything. And so but I think I'm going to try to get back into,
you know, doing full books, but just in bite sized pieces and see how that works out.
Is it time for tool of the show? Tool of the show.
My tool of the show is Datadog, which is an amazing tool.
It is super, super fun.
So the way Datadog works is, so you're developing, let's say, some kind of app.
It could be an Android app, could be a website with some web backend, or even on the front
end on the browser.
Anything you develop, you're going to give it to somebody.
And that person is going to run it when you're not around. And so a bunch of logs are going to be generated, right? Now, like if it crashes, that's a whole nother thing. And I think I already
mentioned Sentry in a previous episode. So if it crashes, you know, you have Breakpad, you have
Crashpad, you have Sentry, you have a million different tools, Rollbar, etc. But then there's also just, is this healthy?
In general, you don't want to crash websites or apps, right?
But you still want to know, for example, how many people are opening your app, clicking
on, let's say, the settings and then leaving the whole app?
Well, that would probably be good to know.
Or just anytime you have any type of log, like sometimes you have a log, you know, should never get here. Right. And it, and it fires. Right. So most apps, sites,
et cetera, are generating logs. And, you know, it's great to have those sort of aggregated.
And so a data dog does is they have a bunch of clients for a whole bunch of different languages. I actually ended up writing my own client because they didn't have a C++ with your application ID and a bunch of logs.
And then what Datadog does is it collects all of that. And then it presents you with this really
nice web UI where you can, you can actually, there's something called live tail and you can
actually see logs as they come in. And then you can write rules. And they actually have this sort of way
to automatically find patterns in your logs. So if you have a bunch of logs that says,
you know, received 26 packets, received 31 packets, received 100 packets, right,
all of those lines are going to be different, but they all have the same pattern. It's just received,
you know, N packets. Datadog will automatically
pull out that pattern and then group all of those logs together. To be honest, I haven't ever used
any of the premium features. I've only ever used the free tier, which makes me think that they
probably should maybe charge more or something, but nothing I've talked about costs any money.
They do all of this for free. And then you can then write rules on top of that.
So you can say, for example, all the logs that have the severity level of error, I want all of those to get written to an S3 bucket.
So S3 is the Amazon AWS file system.
And so that's one of the rules I have set up.
So whenever an error log happens on Eternal Terminal,
I've actually gone through and looked at all the error logs
to make sure that there's nothing personal,
like someone's name or IP address or anything like that.
And so I know all these error logs are safe.
And then whenever one of those error logs are written,
it will get sent to Datadog and
then it gets dumped through an S3 bucket.
So all of that is super nice.
The other part of it is Datadog, as I was saying, kind of alluding to earlier, they're
using your own kind of infrastructure.
So like Datadog's not actually storing anything.
They have a way where you give them permission to a folder on your AWS.
And so you're, you know, effectively you're paying for all the hosting costs and stuff yourself.
And because of that, there's no limits.
Like they can just take tons and tons of data.
And if they have to write a lot of data, you're going to pay for it through Amazon, not through them.
So it's an amazing service for free. I have to look up what the, I think for premium, they'll let you do alerts and
more fancy things. But yeah, definitely check that out if you're building anything that's going to be
looked at by other people. Super cool. Yeah, they're like a publicly traded company and stuff
now, right? That's a good question. I don't know.
This is super cool to like hear you say like kind of like their niche on top of other people's services, I guess is kind of cool.
Yeah, actually, that was I felt like a really good sort of business decision, because in my case, you know, it's a it's an open source project.
I'm not too worried about if they get hacked, these sort of error logs, which, as I said, have no personal real data on them.
But, you know, imagine you're like a company that's logging customer information or whatever.
The fact that, you know, it's passing through their system briefly and then it's gone is kind of nice from a security standpoint.
And it means for them they don't really have a big footprint.
My tool of the show is uh
already teased it not a tool but a game i expect you to die uh for i played it on the quest 2 but
i believe it's actually on a lot of vr platforms so real quick the data dog is publicly traded
and actually that company is has a market cap of 28 billion billion. So that is a much, I kind of, I thought it was
just a small company, I guess I was totally off the mark there. But I think if you're going to
buy a Dogecoin instead, buy a share of Datadog. That's my advice. Shares are a million dollars piece um so my game is i expect you to die i guess it's a little bit like a escape room
kind of feel you unlike the ones i was saying where you move around the room here you're
actually supposed to be sort of seated but all of the scenarios make sense with you being stationary
so either you're like sitting or standing in kind of one spot and moving your body sort of like rotating around the room to do various items and there's various puzzles
and it's themed as you're sort of a spy going on missions but there's not a ton of explanation
and various things that you do in trying to to solve the puzzles cause you to die. So you may be like in a car and there's poison gas outside.
And if you break the window somehow,
the poison gas is going to come in and kill you.
But that's not the only way there might be a grenade.
And if you drop the handle off the grenade,
the grenade will explode.
And so you aren't expected,
at least I guess based on the title,
I assume you're not expected to not to die.
So you die and then you try again and and you you sort of figure out the puzzles uh puzzle rooms isn't
interesting i've never done an actual puzzle room like in real life like i've never gone to one
oh my family loves them they're so fun like escape rooms you mean oh that's what i mean
yeah sorry i don't even know the name i don't't, this is not my... Oh, they are so good.
But I sense it's probably something similar where there's a kind of language to them
that you have to figure out.
Like if you just went to one never having heard of it before,
you're not going to necessarily do so well
because you don't, the premise can be,
you may need to know a certain trope
or a certain way of thinking for it to start to make sense.
The same is true of a lot of things. So I feel this one does a good balance of understanding
where the puzzle is. So sometimes when I play games that are claimed to be sort of escape room
themed, I don't even, I can't figure out where the puzzle is. I understand I'm supposed to do
something, but I don't get that you're supposed
to just tap around the screen
or you're supposed to bash open things
to look what's inside.
Those aren't intuitive to me.
But this game, I feel like,
strikes a really nice balance of like,
it's clear what things I can affect.
And then by like working through
the interactions between them,
I can see where the puzzle is and I can see how to solve the puzzle.
I don't always get it right the first time, but each time I can continue to make progress.
And so I find it for my naive level of sophistication.
I don't have to like go consult a walkthrough repeatedly to like keep making progress.
I'm able to work it through on my own.
And it's at just the right difficulty level where I don't get bored of it.
And I don't just cruise right through it either. So I very much enjoyed and the theming is kind of
cool to a little bit James Bondy a little bit, you know, the spy thing. And it has a little bit
of story without being too much I find to be a really good experience. And each level is sort
of bite sized enough that for me, like sitting and doing one VR session, you know, I can get
through kind of like one level and that's about the right for me. Cool and doing one vr session you know i can get through kind of like
one level and that's about the right for me cool it makes sense and so yeah i guess one of the
challenges with that i mean puzzle games in general it's like uh you never really know as a
puzzle game designer like how you're supposed to deal with someone not figuring it out that seems
like the hardest part right like with these rooms, what they've done is,
like for example, we did one where,
and we've done a bunch,
but we did one where we just,
there was just a trick that we didn't get.
I think it was some kind of language trick
where there was like four different cat names
and based on the cat names, that was the clue.
Like you just had to know some factoid
that none of us had really heard of.
And so we were really stuck.
And so they have this thing where you can press a button
and basically they'll kind of give you,
I think there's two levels of hints.
But yeah, kind of getting that right,
I think especially for VR, it's really important
because you don't want to take the headset off
and then go look on Wikipedia andipedia back you know and so uh
actually how do you die in in these games like does the screen just go black yeah i think it
like i don't remember exactly i think believe that like fades to red you know and then it's just like
but whatever when it happens like a explosion of a grenade or like gas coming in and you starting
to like black out and you have a few seconds to rack because sometimes you can stop it but sometimes like a grenade exploding you can't um and then
it fades right and you go to like a sort of death screen where it you know makes some funny quip
about you sucking and not figuring it out or so much wasted training on you or you know some
some such something and then you can replay the level but there are achievements as well and the
achievements sometimes require you to die in certain ways or whatever.
Oh, yeah, that makes sense.
Yeah, I used to play a game when I was really little called Lost Vikings, which was a puzzle game.
And one thing that I thought was pretty interesting was if you lost, if like you died three times trying to finish the level, then on the fourth time they would offer you to skip it.
Oh. then on the fourth time they would offer you to skip it and uh oh yeah i feel like that that
actually is probably the hardest part of all these games is figuring out what to do there
they do that in some super mario games like the newer ones if you die too many times they give
you like a special power up or whatever that that makes it a little easier yep yep that makes sense
um but yeah i so i've really enjoyed it you know, I think I want to get more into these puzzle games,
but like you pointed out,
they need to be kind of like well-designed
or else it just becomes frustrating.
I still find myself frustrated at some of the puzzle games
where you need to like do an action repeatedly,
even though it's not resulting in anything happening.
Like, oh, you click this character and the character burps
and you have to kick the character
and they just keep burping in exactly the same way.
But on the 13th burp, they belch out a potion in their belch bubble or something.
And it's like, what?
Like, why would I sit here and click the same thing, getting the same result?
If you don't give me some indication that progression is being made, I'm going to go do something else.
Yeah, yeah, exactly.
Like there should be a way to just using logic and observations to not have to guess. I think that's that's really important. Yes. Yeah. Yeah. Actually, they I used to do the those would you'd guess but it would just take forever or like Sudoku. So because much more common example, like you shouldn't be doing a lot
of guessing in Sudoku. Like if you're doing a lot of guessing, no, no guessing. Properly designed
Sudoku puzzle requires zero guessing. I thought there were some were like there would be maybe
one guess. It may need a very advanced technique or whatever, right? Like if it's a hard puzzle,
like you may not know the deduction method that's needed and therefore you would have to guess if
it's like above your level but uh like a any like now i don't want to say like you can make whatever
sudoku puzzle you want right as long as if it has a unique solution and it could require guessing
though but they're like when the algorithms design them they or the designers
design them they're supposed to know here's the only kind of techniques you know and it should be
forward solvable using only deduction oh interesting yeah that makes a lot of sense
i think uh i think yeah there was this other puzzle game i used to really enjoy
and um i i watched a um actually a podcast. There's a podcast called Game Design.
I can't remember the exact name of it.
It was Game Design Podcast.
And one of the inventors,
the person who wrote that game was on that podcast.
And it's actually really clever.
They just kind of,
and Sudoku probably works somewhat similar,
is it kind of worked backwards from the answer.
But you have to work backwards
in a way where it's not ambiguous.
But yeah, I think that sounds awesome. awesome actually i'll give that a shot i'm looking for another quest game to
pick up yeah um this is one where like i pretty much like the kids will play a level i've already
played and once they learn like you know kind of thing but it's over their head but they enjoy
watching me die like oh daddy sometimes i do it on purpose sometimes like i realize there's a novel
way to die and i'll let myself die just for kicks.
But they always think it's hilarious.
Oh, daddy, you suck.
Like you can't, like you didn't realize
that grenade was going to kill you.
Like how dumb are you?
No, I mean, I'm being hyperbolic,
but it's an enjoyable family experience that way.
Yeah, my son does the same thing with me and Zelda.
So he, I made it to this one temple in Zelda
where it really was too
advanced it's one of these ones you have to battle a robot people have played the breath of the wild
there's some of these temples where you fight battles but the game is non-linear so you could
get there at any point and this one robot i just ended up basically if he hit me at all i just died
instantly and so it made it made this battle way harder than it really needed to be by doing it
so early. And yeah, my son actually was filming it on his iPad for his school share day. And he's
like, dad's going to die again. He's going to die. Watch. He's going to die. And then I died.
He's like, yep, dad died again. There's a game over screen. I was like, honestly,
it's the first time in probably 25 years where I wanted to throw the controller for 35 years or something.
But I was like, you know, having the audience support
actually didn't help that much.
All right.
Hash maps.
Hash maps.
Yeah, I'm going to kick it off.
So, you know, before, you know, when I was in college and learning about data structure and algorithms, of course, I learned about, you know, Hashmaps.
And, you know, oh, cool.
Like, we're going to talk about, you know, what makes them interesting in a minute.
But it never really clicked to me.
And I was mentioning this to Jason in the pre-show.
I don't think we'll get to it at the end because we're already going along. But when I went to start, you know, looking at sort of what you call Silicon Valley companies,
big companies, like interviewing there, like I kept reading, oh, HashMaps, you got no HashMaps,
you got no, I hadn't really used HashMaps.
And I thought they were overplaying it.
But sure enough, all my interviews, like some solutions needed familiarity with HashMaps
and intricacies of HashMaps.
And then showing up to, you know, my first jobs there, like, sure enough, like hash maps were
like super common. When you get to kind of like really big data, I was really surprised I
underestimated the importance of hash maps early in my computer science career.
Yep, same here.
So the promise of a hash map is insertion and lookup being constant time
so the idea is if you think of like a binary tree we talked about trees i think in our last duo
episode then you get log of in insertion right the insertion if you just insert into a sorted array
for instance uh you and you don't use binary search,
you'd have to sort of like go through each element in the array to find where you need
to insert it.
So that would be like linear time.
If your stuff is a well-formed binary tree, then you need log of in like visits to nodes
before you find where to insert.
But HashMaps promised to blow that away by saying, I'll give you a constant time insertion
and a constant time lookup.
So if you're trying to see if this key is in your HashMap,
you can do that in sort of constant time.
That doesn't mean zero time.
It just means that it doesn't really matter
how many items are in your map,
that no matter how big your map
grows within bounds, you would still get sort of this algorithmic constant time. And so that's a
super, super powerful thing. And we're going to talk a little bit about the various ways that can
be accomplished. Cool. Yeah, totally makes sense. Yeah, I think, I think it actually will surprise a lot of people, like how ubiquitous hash functions are, they're really just everywhere. But But yeah, at a high level, what's going on here is, so you know, we talked about trees, right? And, you know, trees are super nice. The challenges with trees are that you have to, you know, keep them balanced, right? And so that could be really expensive.
And even if you don't have to keep them balanced,
putting something or finding something in a tree takes logarithmic time.
And so that doesn't sound like a big deal,
but when you start putting lots and lots of things
into the database,
it can actually become a really big problem, right?
So imagine like log base two of of a million is
oh geez i kind of uh kind of put myself on the spot here is what like maybe six or something or
the log base you know log base two of a million is what like 14 or something anyways it could
easily end up taking like 14 times as long the internet says says it's 20. Oh, 20. Well, there you go. All right. It shows
how well I could do a log in my head. But yeah. And so imagine, you know, like you have an app
and now it takes 20 times as long to run. It could be, it could end up being a really big deal. And
so the challenge with hashing is figuring out sort of the hash function. And what that means is figuring out
sort of some type of way
to represent something really concisely
or really like ultimately represented
with just a number.
And so there's been a bunch of work in this field.
I mean, it would take a really long time
for us to really dive deep into it.
But at a high level,
there's really two types of hash functions that are
important. The first type is like cryptographically secure, and the second is not secure, right? So
for example, if you're trying to hash like credit card numbers, well, maybe, like maybe, you know,
someone, let's say someone has a, like has a person's name, they're trying to find their credit card number, and maybe they have somehow they have some access to the database.
And so they have all these hashed credit card numbers.
If you do something that's not cryptographically secure, then, you know, they can use various statistical tricks to try to reverse engineer that.
Or maybe they can narrow down.
If they have a list of credit card numbers,
they can narrow down which one it is. And so if you're storing something really sensitive,
using a cryptographically secure one has some nice guarantees, right? Or if speed is not super,
super critical. And so that's why something like Git, for example, Git will use SHA-256,
which is cryptographically secure. Now, you might not be
storing passwords in Git. In fact, you probably should never be storing passwords in Git.
But they do that because it's on almost all the computers and that's not the thing that's going
to slow you down if you're using Git. You're not blocked by the hash. Jason, there's one more point
there too that I was kind of unaware of because I don't build these kinds of services. But another reason to make
sure to use cryptographic hashes is if you have sort of a public service in some way or an API
that isn't only trusted users, I guess, that is backed by a hash map and the keys in some way are like basically user defined. So I'm creating a
username or I'm creating something that I can kind of control and if someone can deduce what
hash function you're using and insert entries which all collide, all go to the same hash,
then they can basically cause a denial of service. We'll talk about why in sort
of a minute. But that's another reason why if you have a public API of some nature, where you're
going to back it by a hash, you want to be really careful that someone can't cause like force hash
collisions to occur, and thereby like slower denial of service your your service. Yeah, yeah,
that makes sense. Yeah, I think, yeah. So, so just,
yeah. Using Git is a good example. Like, so let's say someone, um, you know, creates like 10 million
Git commits that all have the same content in them. Right. Well, and you're running like a Git,
uh, website, like you're, you're, you're hosting Git for other people as your job. Right. So you're
probably going to write some rule that says, look, if this person makes, you know, the same commit all these times,
you know, they all have the same content that I'm going to block them
or, you know, I'm going to, yeah, block them and then send an email
saying, you know, your account's been compromised or what have you
or something bad has happened, right?
On the flip side, maybe someone is creating a whole bunch of different commits
with different content because they are writing some script that's automatically converting all the tabs to spaces and all their
files or something. And for whatever reason, it's creating a lot of commits. And so that might be
normal, right? And so getting to Patrick's point, someone could actually, if it's not a cryptographically
secure hash function, they could actually figure out a way to be making a lot of different changes,
but they're all hashing to the same hash and creating all sorts of problems there.
And so it'd be very hard to fix that. But there are times, I mean, a good example of a time to use
hash that's not crypto secure is when you're doing these big data
ETL things. So when you're doing like a MapReduce or Spark or one of these things, so just to like
not sidetrack too much, but, but the way MapReduce works is you have this enormous volume of data
somewhere. And let's say you just want to build a histogram of it. So you want to
count the number of times you see Patrick, you want to count the number of times you see the
word Jason, you want to count the number of times you see the word programming throwdown.
So the way MapReduce will do this is, you know, it will have a ton of different machines
that are going to just take a small piece of this massive data set and count the Patrick's and
the Jason's and the programming throwdowns. And so then it's going to hash all of those names,
right? So Patrick is going to hash to some kind of number. Jason's going to hash some kind of
number. Programming throwdown is going to hash to a number. And then it's going to, based on that
number, send it to what's called a reducer. And so if you have, let's say 10 reducers,
you might say, well, all those numbers that end in a zero go to the first reducer. All those
numbers that end in a one go to the second reducer. I guess you'd need 11 reducers, but,
but anyway, so right. So then you, you use that to the reducer is going to collect all that information. And so a single reducer knows it's going to have
all the Jasons
because JSON is always going to hash to the same thing.
And so that's how MapReduce works.
And, but for MapReduce, you know,
as I said, you're processing these massive data sets
and you know, it's not really a cryptography thing.
Like there's not really a cryptography thing. Like there's not really any
concern about that. And so MapReduce, at least back when I was working on a long time ago,
use Murmur. Now it might be using XXHash, which has been proven to be a lot faster. But, you know,
that's something where, you know, actually saving that bit of time by using Murmur instead of SHA will actually really matter because the computation there is so intense.
And so the hash in HashMaps is needed because ultimately what you're trying to do is take some key.
And we should mention, I guess, HashMaps and HashSets are kind of interchangeable. Like it's just kind of the underlying fundamentals
are the same as just what data type you use,
whether it's just the key only or the key plus some data.
And so what you're trying to do with the hash
is take arbitrary data, mix it up, combine it,
do something in some way to go from an arbitrary,
any input size of bytes.
Imagine a string and you don't know the length of the string right it could be Bob and it could be
Sir Ferdinand the Great I don't know like something really big and you want
to combine it all down and let's say you want only one byte to represent all of
that data how do you mix it up so that your keys are as evenly distributed as
possible across your key space. Now the
other thing you have is you have the hash function which goes from an
arbitrary number of bytes usually like for the ones Jason was mentioning to you
know 4 bytes or 8 bytes for 64 bits or 256 bytes would be what 16 bytes? No?
Yeah that's true. So you end up having, you know, say, let's say four bytes, right?
But that's still two to the 32. That's a lot of like keys. And the getting constant time,
you have something called a bucket. And a bucket is taking a key and having,
you can think of it as an array of a size. So if you is 32 bytes. There we go. Thank you. That makes sense.
And so you have these buckets, and the buckets are like slots in your array. So say I have,
I want to make my array small to begin with. So I have seven buckets, seven elements in my array.
And I have a four byte hash result, right right so i need to get from four bytes down to
seven um and the way you do that typically i mean there's other variation is is a modulus
and if you do the modulus of you know two to the 32 whatever the value is modulus seven then it'll
just keep wrapping around and you'll find which slot to uh put your in. Now there is interplay as well that we won't get into about
how to choose that when you want more slots than seven, like what do you choose as your next? Do
you choose eight? Do you choose another prime number? Seven's prime. And there's some trade-offs
there. But ultimately you're taking this hash function and fitting it into your buckets of some size.
Now, the size is determined in part by one of the design factors that can end up sort of breaking the constant time assumption by something called the load factor. So the load factor is how many items you have stored in your hash map divided by your current number of buckets. So if you have buckets that are size seven,
and you have 14 items you tried to store in seven buckets, then clearly you're going to have at
least some collisions, and your load factor would be two. Now, when you have collisions, two things
mapping to the same slot, there's a variety of ways of handling it um the we'll talk about that in in
sort of a minute but in all of the cases the bigger your load factor gets in some hash maps
they try to keep or must keep the load factor under one um in other cases you could allow it
to grow above one but in both times once the load factor gets too high, you start breaking your constant time assumption.
You end up going to linear time, which is worse than logarithmic time.
And so you have to be really careful to play this load factor.
Now, where the load factor game comes into play is that you end up using more memory
because the smaller your load factor is, you're effectively wasting memory.
So if I say, oh, I'm just going to create a, you know, billion entry array, and I'm just going to hash everything to it. And I'm not going to worry about hash collisions. Okay, cool. Except that
billion element array is really big in memory, and it's mostly sitting empty. So there's a trade
off between memory usage and speed.
Yeah.
So, I mean, you know, what makes the hash table or hash map or hash set, what makes these things so fast is that hashing is turning some content into a number and that number
is then used as an index.
And so if you have an array, you know, getting an item from that array from its index is super fast. And so you're kind of turning like you're doing sort of an association, right? That's turning, you know, strings or whatever you want into indices and then just putting it into that spot. But yeah, as Patrick said, the problem is that only makes sense if the array has already been totally allocated in memory and you can go to spot five by doing some arithmetic.
And so for that whole thing to work, there has to be something, even if it's just empty in spot four and spot three and spot two and spot one, like that space has to be occupied. So we were kind of getting at this load factor hashing about changing your data into a sort of number,
modulizing that number by the number of buckets you have, and then looking in that slot.
Now, what happens if two things map to the same slot?
So it turns out there's roughly two strategies for dealing with it.
The first one is called closed addressing.
So an example of this is the C++ STL library for their unordered map and unordered set
use closed addressing.
And there's some variation and complexity about what they do,
but roughly what ends up happening
is if you have a collision,
because you could have seven buckets
and you insert your first item in index one and then
your second thing hashes and you also insert it in text one so you have only two things but they
collided and what would happen in the stl library and these closed addressing is you would just make
a linked list so you would take put them in the same bucket but but there would be a list. And so when you go to do a lookup,
you get the bucket and you say,
does my key match the key at the head of the linked list?
If not, go traverse the linked list
until you either find the match
or you find out that it's not in your set.
So that's closed addressing.
Yeah, actually, one thing that I wanted to just mention there is, yeah, you have to keep the
original key because when you hash something, you've kind of destroyed the original value.
Like if I, yeah, if I hash Jason, it's going to give me, you know, 13 or something like
that.
And so now Jason is gone.
Right.
And so, and so in the hash table, when you go to spot 13, you actually still need to
store the key again.
And that's one thing that kind of confuses a lot of new people when they see hash tables.
And so, yeah, so that's to Patrick's point.
Like if you're looking for JSON, just because you got to 13, it doesn't mean that you found JSON because it could have been something else hashed to 13.
So that's where you would actually look through that list and say, OK, is there an actual JSON here? And if that list gets really long, that's going to take a long time.
Hot tip here too, is I think every one of these things we're calling out, I've heard personally
in an interview I've been in, done or listened to. So if you're going to do an interview for a job
in Silicon Valley, like every one of these caveats I think we've mentioned so far has come up, at least in my experience. So the nuances here, if for no other reason than
interviewing ability are important, but they do actually impact when you get into the subtleties
of how your HashMap is performing. Because I do know a lot of people who just slap in a HashMap
and assume it works and then move on. And they don't actually measure to see
if it's an improvement or if they should tune it. Yeah. One of the things that I was really
fascinated by, again, working, so, you know, MapReduce, like working on MapReduce was definitely
the time where I was really exposed to a lot of these things because that scale is when you'll really start to see
all of the challenges get exposed.
And MapReduce switched from using QuickSort to MergeSort.
And the reason is because, you know,
in QuickSort and MergeSort have the same runtime on average.
But if you pick bad pivots which could
happen because the data is just structured a certain way and and you're expecting totally
random or arbitrarily distributed data there's actually times where quick sort would you know
consistently for this one map reduce quick sort would take almost linear time and and you know
this is over you know like billions and billions of items so it
just it would basically never finish and so um and so and so we end up switching everything to
merge sort and so i think um and so that's that's that's one of these challenges with hash tables
too is is um you know especially if you're doing something at scale, is you want to monitor the number of collisions
and also something called the smearing,
which is just another word for the entropy, really.
But you want to measure the entropy of your hash table.
So in other words, you want to see that a lot of the items in the hash table,
all the cells in the hash table are either empty
or filled with one
or two or three things.
And yeah, just keep a constant eye on it.
And you're right, it does come up a lot in interviews,
trying to understand that whole,
like understand all of those dynamics.
So if we gave the name closed addressing,
it doesn't take much, I guess,
to figure out the alternative is open addressing. So if you end up with a collision and you're using
an open addressing hash map then you use some mechanism for finding another
bucket, another slot in your array to go look at to see if your key is there. So
the easiest one to kind of think about is when inserting,
if I insert and there's already something there, then I just do my bucket plus one. And then I look,
is there something there? And if so, go bucket plus two, right? And I just keep walking down.
And you could do linear probe, quadratic probe, right? You're just basically looking and kind of
iterating through all of the buckets, looking for one that's open. And then at accessing, you just do kind of like
the opposite, you would go to your array, if that item, if the bucket is empty, you're done.
But if the bucket has an item, and it doesn't match your key, then you have to follow sort of
the same probing as the insertion algorithm use. And there's a variety
of these. And these actually, and I'll talk about why in a second, but there's there's cuckoo hashing,
Robin Hood hashing. So like in Robin Hood hashing, you measure how far away from your original slot
each element is. And you use that to decide whether to keep looking or to displace the elements that they are put yours in
and then move that one further down
so that you're trying to get your keys
as close to their original bucket as possible.
And the reason why these end up mattering
and why this versus like the linked list approach
from closed addressing matters
is because again, algorithmically,
these are constant time insertion and lookup on
average but in practice you have problems with cache efficiency in a linked list if i don't pay
attention i go look up one key and then it's not correct and so i have to traverse the linked list
but that item in memory could be at an entirely different place in ram just based on wherever
i got it from my allocator off the heap and I have to wait for the processor to go through all of its caches out to RAM,
pull in that value, and access it, right? So the cache efficiency there is very different than if
I have two slots in an array, which will be consecutive in memory, and I just need to look
at the first one and then the second one,
because the way the processor works is it doesn't just fetch your value from RAM and into cache.
It fetches, it's called sort of a cache line.
It fetches a bunch of memory all at once into the cache.
And then so it expects that your next lookup is close to your first lookup when you're accessing memory.
And so these open addressing ones try to leverage that realization
to make much more cache efficient ways of accessing the hash buckets.
So yeah, one question, when you're doing the open addressing,
how do you know when to stop?
Like, let's say the hash table is full, right?
Yep. So you stop when either you get to an
empty bucket or when you have if it's completely full and you've gone all the way around but these
are much more sensitive so closed addressing and i i might someone will tell me i'm wrong that's
fine closed addressing you can have a load factor over one open addressing you could never have a load factor over one
right it'll crash or something right because it just won't be able to insert like yeah you have
no you can't insert the last thing as soon as you go over one and if you have one then you're right
any checking for any value not in the set will require you to iterate all buckets. Yeah, the only alternative
would be you'd have to also store the hash. So like in every bucket, you'd have to store
what the hash would have been if there were no collisions. Yes, there are. So there are also
hybrid hash maps, I think that kind of start to use rehashing methods. So rather than
using a single hash, you use a different seed and you hash again, or multiple bucket lists.
But yeah, so Oh, no, I was saying something a little different. I was saying like,
so let's say you're doing linear probing, right? And so but but let's say in the hash table,
along with the key, you're also storing the hash of the key. Or I guess you could just do it on the fly. So like if you're doing the linear probing and you're looking for Apple and Apple hashes to two, and then you go to the next one and that item, whatever that is, if it doesn't hash to two, then you
know to stop, even if the hash table is full.
But then like, you're either like doing a lot of hashes or you'd have to somehow like
store the hash or something.
Yeah.
Does that make sense?
Maybe.
I don't want to get into a design review of your algorithm.
I feel something else could
be hashed to three. So you could have some dog was hashed at three.
Yeah, but I'm saying like, okay, so let's say the thing you're searching for,
your query hashes to two, right? And you go to that spot and there's something that hashes to
two, but it's not your item, right? So then you go to the next spot and if that thing in that next spot
doesn't hash to two then you're done right but i'm saying something else could have already been
there yeah oh and then you have to jump over it yes oh and actually actually now i think about it
so this is this is uh yeah this is bringing back a lot of memories so actually when you when you
when you um if
something if there's something that doesn't hash to two in that spot that you're in right now it
could be that apple is still there it just yes got there too late yes and like someone else oh yeah
okay so yeah you're right you have to go through the entire hash table worst case yeah see i would
totally fail actually just just to circle back something you said earlier, I've actually never had a hash table kind of question,
but you could tell I would totally fail that.
But I think it is super important.
I think, you know, don't panic
if like your background is in something else.
I don't know if you'll get maybe these kinds of questions,
but it's still really good to know.
Okay, I've seen good interviews and bad interviews.
I won't speak to bad interviews, but in good interviews, I think they're actually hoping you don't know the answer because they don't care. They want you to be able like Jason
just did, like, Oh, have this idea. Oh, wait, this idea doesn't work. And then, you know,
be able to make the progression to refining your idea or discovering something that's already in
literature that maybe your interview already knows about. But they're letting you sort of do it from first principles and seeing how your brain
unpacks the problems. Yeah, it's a good call. So if you really, really want the cheat code,
then there's something called the minimal perfect hash. So the idea here is if you know up front
all the items you're going to hash, say 397 of them, then you make exactly
397 buckets and you find the hash function which one-to-one maps all of
your input keys to your bucket list with zero collisions. These are pretty cool
because you have a perfectly space efficient answer. And the only
problem is, how do you search the space of hash functions to find such a perfect hash? And there's
a class of ones that you can narrow in on and how to structure this problem. It's a very interesting,
sort of research topic. I've never seen it used in practice, but it is a really cool idea that
you could have the best of all worlds, a constant time lookup, no wasted space, a load factor of one
and guarantee that you would instantly know if a key is in your set or not. It's just really
awesome. The problem is it becomes very, very expensive to find this perfect hash.
Yeah, that makes sense.
Yeah, I mean, the only way I could think to do it as a person who has no experience would
be to just try adding salt and just keep doing that until, but that's going to take forever.
I mean, the odds that...
You've just discovered Bitcoin mining.
Yeah, I guess so.
Yeah, I think, I mean, the odds that you eventually hit that right hash are just
Oh, for people who don't know, a salt is, a salt is very simple. It's something that you add
to all of your keys that just make the hashing work differently. Oh, yeah, yeah. Okay. And the
reason why, correct me if I'm wrong on this, or feel free to fill in the holes here, but I think the reason why they use salt is not to try and find better hash functions,
but the idea is the salt is something that you as the programmer know, but someone on
the outside doesn't know.
And so if someone, let's say, gets a hold of your database, but they don't know your
salt because they don't have the source code, then there are certain tricks
that they can't use. Like for example, let's say someone has a database full of hashed passwords.
Well, they know that a lot of people out there make their password the word password.
So they'll just run like the hash function. They'll like look at all the popular hash functions
and run them all on the word password and see the numbers that come out of that or the hash function, they'll like look at all the popular hash functions and run them all
on the word password and see the numbers that come out of that or the hash, let's say that comes out
of that. And, um, and just see like, okay, which of these hashes is in your database a lot. Okay.
It's this one. Therefore you're using this type of hash function. And then knowing that allows
them to do a bunch of other things. But if you've put a salt in, now you don't have a whole bunch of hashed words of password.
Now you have a bunch of hashes of, you know, my cool salt password and my cool salt Abba
and my cool salt Patrick and whatever.
And just like, you know, even though you added the same thing to every key, it's going to
cause the hashes to be totally different. Yep. That's right. And so, yeah, I think, uh, so getting
back to Patrick saying like, you could just try like adding, trying different salts until you get
one that hashes these 300 things in a 300 unique like integers after you've modded them. Uh, but
that will take forever. Not forever. Just a long time.
Not literally forever.
But yeah, really, really, really long time.
Yeah, I mean, definitely.
I mean, intractably long time.
Like not long as in come back in a week.
Long as in like your whole lifetime.
Like more than that.
Another thing that I've used hashing a lot for
which is like very different
from the traditional hashing, but still really useful, is locality-sensitive hashing.
And this is kind of the opposite of everything we just talked about for goals for hashing.
In locality-sensitive hashing, you actually want things that are similar to hash to the same number, right? So imagine a map, right? Like a map
of United States, right? And you have a bunch of points on the map. And so maybe all the points
represent different businesses or something, right? And so in this case, you want to hash
the points that are close together to the same number
because that number is going to represent, let's say, a file on disk that has all the
names of the businesses.
And so when you're looking on Google Maps, you're probably looking at a whole area.
And so you're probably looking at a bunch of businesses, but they're all close together
spatially.
You want to hash all of those to the same number so that all the names of the businesses come from
the same file on disk, right? And so that's locality sensitive hashing. And so in that case,
you're not worried about collisions. It's kind of a different use case, but you still get a lot of benefit.
You get a lot of the order one type benefits.
So imagine if you now have a bunch of these points from your map,
or maybe even you have a region of points, right?
And you can feed this through this locality sensitive hash,
maybe feed the boundaries or Monte Carlo it or what have you. And it'll say, okay, the hashes are three, four, and five.
And then you can load those files, which might have businesses that are outside of the range,
but they're all going to be kind of in that area because of the way you've done the hashing.
And that's how a lot of these kind of spatial databases work.
Awesome.
Yeah, this is a really good episode.
This is very fun.
Yeah. I mean, hash tables and trees, you know, it's sort of like, uh, there's that saying that,
uh, you know, with a, uh, uh, with a linear saw and a hammer, like you could do just about
anything. Like you build a whole house just with a linear saw and a hammer. I feel like trees and
hash tables are kind of like that for us. Like trees and hash tables, you can build just about...
Things like arrays and algebra, that's all primitives, right?
But with those two data structures, you could build just about everything.
And a 55-gallon drum of water.
Yeah, that's right. You could pass the computer science GRE
with a 55-gallon drum of water,
a water computer, and trees and hashtags.
Music by Eric Feindler. music by eric barnmeller programming throwdown is distributed under a creative commons attribution share alike 2.0 license you're free to share copy distribute transmit the work to remix adapt the
work but you must provide an attribution to patrick and i and share alike in kind