Programming Throwdown - Hash Maps

Episode Date: August 4, 2021

In this duo episode, Jason and Patrick introduce us to the world of hash maps, from buckets and hash functions, to differences between open and closed addressing, to minimal perfect hashes an...d locality sensitive hashing. A familiarity with hash maps is an oft-overlooked but highly sought-after skill, and it can be a valuable asset for those eyeing a career in programming.Along with the main topic, Jason and Patrick also talk about some of their latest interests: books, gadgets, tools and games.This episode touches on the following key topics and ideas:00:01:27 Playing games with Oculus Quest: Acron, Racket: Nx, Gorn, Superhot 00:11:05 News: “I Made a Water Computer” by Steve Mould00:14:56 colinfurze00:15:52 News: Comprehensive guide to Attention Mechanisms00:21:53 News: Starship SN1500:25:18 News: MailSync now Open source (GPL)00:28:34 Jason’s Book of the Show: Elon Musk00:32:04 Patrick’s Book of the Show: Ready Player Two00:33:40 Jason’s Tool of the Show: Datadog00:38:44 Patrick’s Tool of the Show: I Expect You to Die 00:40:30 Escape rooms00:45:39 Sudoku00:48:35 Hash maps: the promise and idea00:50:59 Hash Functions00:52:34 Examples of hash functions: Cryptographically Secure and Non-Crypto01:01:05 Load Factors01:03:43 Open vs Closed Addressing01:15:10 Minimal Perfect Hash 01:16:25 salts01:19:00 Locality Sensitive HashingResources mentioned in this episode:ToolsMailsync http://mailsync.sourceforge.net/Mailspring https://getmailspring.com/Datadog https://www.datadoghq.com/SHA https://en.wikipedia.org/wiki/Secure_Hash_AlgorithmsMD5 https://en.wikipedia.org/wiki/MD5MurmurHash https://github.com/aappleby/smhasheraxxHash https://cyan4973.github.io/xxHash/MapReduce https://www.ibm.com/analytics/hadoop/mapreduceBooksElon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future by Ashlee Vance Ready Player Two by Ernest ClineGadgetsOculus Quest 2 https://www.oculus.com/quest-2/Oculus Link https://www.oculus.com/accessories/oculus-link/GamesAcron: Attack of the Squirrels! https://www.resolutiongames.com/acronRacket: Nx https://www.oculus.com/experiences/quest/2255408847836468Gorn https://www.oculus.com/experiences/quest/3349689215139117Superhot https://www.oculus.com/experiences/quest/1921533091289407I Expect You to Die https://www.oculus.com/experiences/quest/1987283631365460The Legend of Zelda: Breath of the WIld https://www.zelda.com/breath-of-the-wild/Videos:I Made A Water Computer And It Actually Works https://www.youtube.com/watch?v=IxXaizglscwcolinfurze YouTube channel https://www.youtube.com/user/colinfurzeArticles:Comprehensive guide to Attention Mechanisms https://www.analyticsvidhya.com/blog/2019/11/comprehensive-guide-attention-mechanism-deep-learning/Starship SN15 https://www.space.com/spacex-starship-sn15-launch-landing-successMailSync is now Open Source (GPL) https://community.getmailspring.com/t/a-free-open-source-future-for-mailspring/484If you’ve enjoyed this episode, you can listen to more on Programming Throwdown’s website: https://www.programmingthrowdown.com/Reach out to us via email: programmingthrowdown@gmail.comYou can also follow Programming Throwdown on Facebook | Apple Podcasts | Spotify | Player.FM Join the discussion on our DiscordYou can also help support Programming Throwdown through our Patreon ★ Support this podcast on Patreon ★

Transcript
Discussion (0)
Starting point is 00:00:00 Hey everybody, we're here with a duo episode. We've moved on to two episodes a month. I'm sure people have noticed by now, which is really, really cool. And I probably talk about that too much, but I just can't tell you how excited I am that we have an actual producer and we can up the amount of content that we get out to everybody, which is really cool. Exciting. Yeah. So this episode, we're going to talk about Hashmaps, which is really cool. Exciting. Yeah.
Starting point is 00:00:49 So this episode, we're going to talk about HashMaps, which is really cool. So before we get into the actual show, I just had like a pre-show. I wasn't sure if HashMaps include hash tables or like what the umbrella term is. I think HashMaps kind of cover everything. I think they include, I know hash maps and hash sets are basically the same thing. Yeah, that's true. But then hash table, I mean, does that just mean you're using an array?
Starting point is 00:01:15 Yeah, I don't know either. We'll have to, we'll look it up. We'll look it up. Okay. But yeah, our show is going to be all about hashing and all the different things you can do with hashing and all the clever stuff that it unlocks. So we're really about that and i actually another thing i'm excited about is i got an oculus quest uh one of the quest twos and i think i think in our very first show i think i
Starting point is 00:01:36 talked about or maybe it was one of the first few episodes how i got a cell phone and how that was kind of a big deal and i was such such a, or smartphone, sorry, smartphone. I was a late adopter to the whole smartphone thing. Um, and I'm also a late adopter here. So I feel like I've been consistent, but, but I got a quest to, so I got, you know, Beat Saber and a lot of things that, that everyone knows about. And it's been fun. I've been exercising a lot with it. One of the things I was able to do with it that I didn't expect is, um, I've had a racing wheel. So it's like a USB wheel that actually clamps to my desk.
Starting point is 00:02:09 And it comes also with foot pedals. And so I'm really into racing games. And so I can play like F1 and Project Cars and Dirt and these other racing games with an actual steering wheel. And I've had that for a little while. But I actually was able to play a VR racing game with the wheel and the quest. So the quest has this thing called Oculus Link. And so you can connect the quest to the computer and use it, have your computer drive it kind of like the rift. So you can use the quest in this way.
Starting point is 00:02:40 And so with that whole like Frankenstein kind of of setup here with the wheel and the computer and the link and the quest i was able to actually do a racing game with like physical controls and vr and it was awesome like it was really really fun i was racing go karts and you could actually like as you pass someone you could look or if someone's behind you and you want to see how close they are you could turn your head and like look at them and then turn back it was really fun and even just the feeling of like you kind of like instinctively lean when you turn uh the steering wheel and so when i lean you know my head actually leans and the view changes and it was it was super super fun i have to admit it was a total blast yeah i also got an oculus quest 2 There's some, like I don't use Facebook personally,
Starting point is 00:03:26 like nothing against Facebook. It's just like, it's not something that I engage with regularly. So it was a little awkward that it's so heavily linked between sort of like Facebook, which you can go read about all on the internet and people have very strong opinions. Yeah, so I just set this up.
Starting point is 00:03:40 I could talk to this. When I started it, it said, hey, do you want to create a Facebook account or an Oculus account? And I said, I'll create an Oculus account and try and keep things separate. So that way I can use key pass of a separate password, et cetera. So I went through like the whole signup flow. And then at some point really early on, I think it's like, I went to get my first game or something. They're like, uh, yeah, actually about that. You're going to have to link these two accounts. And the one you just created is going to get deleted in a year. I was like, what?
Starting point is 00:04:09 Yeah. But aside from that, I also have enjoyed it. My kids have really enjoyed it. We do it in small doses and we cast from the headset to the TV so other people can watch. Wait, what? Wait, wait. How do you do that? There's an option, like a cast option on the main main menu so you can either you from your phone if you have it linked to your phone you can just watch it on your phone or tablet um but then you can also send it if you have a chromecast i think it also might support the what is the apple tv airplay i've not tried that yet but yeah that it's pretty cool experience i actually have the tool, I won't spoil it, is an Oculus Quest game that I put as tool of the show
Starting point is 00:04:47 before we decided to talk about VR at the beginning. So I'll leave that for then. But there's been a couple of games I've really enjoyed being able to play and being immersive. One that was really different was, I think it's called Acron, which is like a permutation of the letters of acorn about a tree and nut-stealing squirrels. was I think it's called Acron, which is like a permutation of the letters of acorn,
Starting point is 00:05:08 about a tree and nuts stealing squirrels. And the interesting thing is that the person wearing the VR goggles is a tree that can throw, you know, sort of, I guess, like bombs and balls and whatever at squirrels that are running towards it, trying to eat the nuts that are at the base of the tree and take them back to their sort of base. But instead of having only bots do this, you can have other people on their phones download the game and take control of the squirrels. So they become the squirrels and you take turns,
Starting point is 00:05:38 or at least we do, we take turns. Who's the tree? You have giant tree limb arms and you're picking up these things and throwing them at the squirrels. And the other people in the room are playing with you, right? They're the squirrels trying to come and steal your nuts. Whoa, that sounds awesome. And so it's super cool, you know, just like an experience that you wouldn't otherwise have. The only kind of like downside that I've gotten is one is like the battery life isn't super long, you can easily,
Starting point is 00:06:03 you know, taking turns with everyone, eat through the batteries. And then the second thing is you actually need a fairly large space for the games that don't have you stationary. For the games where you actually like move or walk around the space, which I find to be kind of like
Starting point is 00:06:16 the most immersive and most cool. You actually need a fairly decent space that's clear of objects. So like a living room or something. You need probably a good I don't know what they recommend, like eight feet by eight feet or 10 feet by 10 feet. But it's actually surprisingly difficult, at least was in my house to like, find such a space. And so we had to kind of like make an arrangement where we move some of the furniture to be able to, you know,
Starting point is 00:06:39 have a big enough space to really fully enjoy it. Yeah, yeah yeah i was playing this game called i think it's called rocket rx but it's it's effectively like uh it's like solitaire solitaire tennis where you're surrounded you're encased in like a sphere and so you hit the ball off the sphere at different points to hit you know various targets and stuff that are like overlaid on the sphere right and so you're and the harder you swing well it's not's not, if you swing hard enough, then the ball will actually skate along the surface. So for example, if you hit it hard enough, then you don't have to like hit the target directly. You just have to be kind of a leading the target
Starting point is 00:07:19 and the ball will kind of go through it. And so it kind of encourages you to swing a little bit hard. I don't think the harder you swing, the better it is, but you have to swing hard enough. And so I had drawn my guardian, which is what they call the sort of border you set up before you start playing. I had drawn it kind of right up to the wall and not really thinking about it. Oh no. Yeah. By the time the guardian warned me, like a split second later, I'd like punch the wall. It wasn't a big deal, but yeah, you easily like kind of lose track
Starting point is 00:07:48 of where you are spatially. And that was one of the things that surprised me. I've only played it for a few days now, but surprised me was how often the Guardian is actually, you know, warning me, like how often I get far away from where I was originally standing. I also got a game called
Starting point is 00:08:05 gorn g-o-r-n which is a uh it's like a roman coliseum and um there's a bunch of like weapons on the ground and so you actually reach down to the ground pick them up and uh and you have to sort of fight other people in this coliseum and uh it is i mean all of these games are a huge huge workout i mean i was sweating afterwards it was a lot of fun oh this is it feels like uh that movie wally you know like all the people like riding around in their you know chairs and drinking their slurpees like we're we can't go outside and exercise you know it's too much work so we're gonna put on these goggles and stay inside and exercise. It's just kind of, we're getting there. It's pretty wild. I mean, it's, it's, yeah, it's interesting
Starting point is 00:08:50 because it's definitely more of a workout than playing like Quake or something. So in a sense, it's like a step in the right direction as far as not ending up like a person in WALL-E. But yeah, you're right. It is. Actually, I saw a game in the store that said VR VR, where I guess you go around putting on VR goggles in VR. So there is, I think it's this Superhot. So I think Superhot, which I believe is available. We should say there's lots of VR goggles out there. I think this one's gaining momentum
Starting point is 00:09:23 towards being kind of more commonplace than just early adopters. Well, I think the quest is the only one that's self-contained, right? At least that I know of. I think there are others, but they're not at the same performance.
Starting point is 00:09:35 Like they were much cheaper ones further back, like years ago. Again, we're late to the party. Yep. Yep. So in super hot, the,
Starting point is 00:09:43 which is an interesting game. They have, I believe it's there a scene where you reach up and pull down vr goggles onto your head to use your controllers to reach up and grab it and pull it down onto your head and it's sort of that same surreal thing wait i have on vr goggles and i'm reaching for vr goggles in vr to put on my vr dog oh it's amazing. I look forward to all of these little cheeky puns in these games as people start adopting these more and more. Yeah, I mean, another thing I did just for the experience.
Starting point is 00:10:13 Yeah, I don't think I would do it again, but it was actually pretty interesting was I watched an episode of Family Guy in VR. And so you're just like in this, like, I think you're in like some kind of a tree house or something, whatever that, you know, environment is when you start the quest for the first time you're, you're in this environment and there's basically just a giant, enormous TV that's just suspended in midair and it's playing family guy. And, uh, yeah, there's something just interesting about that experience. Like, like, I don't think I would do it again. Yeah. It's like, you might as well just watch the TV of course. Right. But, uh, there's just something kind of interesting about that experience. Like, I don't think I would do it again. You know, it's like you might as well just watch the TV, of course, right?
Starting point is 00:10:46 But there's just something kind of interesting about that. Part of it, too, is I was just surprised how comfortable it was. That really shocked me, actually. I think that having the strap that goes over your head and, you know, just getting
Starting point is 00:11:00 all those straps, you know, tightened the right way actually made a really big difference. I think it's time for news. Go for it. What's your news? All right. My first one is a YouTube video called The Water Computer, or I guess it's called I Made a Water Computer by Steve Mould, which I guess he's a fairly famous YouTuber. I keep coming across YouTubers who have like a million subscribers and I'm like, I heard of this person before yeah i wonder what the percentile is there like uh i i bet there's there's there's got to be over a hundred thousand maybe everyone's probably groaning
Starting point is 00:11:33 like oh of course i already saw this video i knew this guy well i didn't i never heard of it so i came across this video by steve mold saying i made a water computer. And he is doing it with another sort of like math YouTuber, Matt Parker, which Matt Parker, I've watched his videos before. So I'm not sure I don't watch all of them. But I'm not sure how I miss this. Anyways, the guy has an interesting mechanism for building logic gates out of sort of plastic and doing the computation instead of electricity, you use water. And I just thought it was pretty cool. So in college, I took some amount of classes in digital logic, discrete logic, these kind of things about building up an entire CPU out of just gates.
Starting point is 00:12:15 This is pretty common in a lot of electrical engineering or computer engineering curriculum. So I'm not unique, but I always just come back to it as like a sort of very enjoyable way to to kind of think about these like individual gates and how that they grow up up up and eventually you're playing vr on a headset so just to be clear like this doesn't mean a waterproof computer this is literally a computer that uses water instead of electricity oh okay so yeah i was about to get there thank you for making sure to clarify that yeah so this is like a class of people who have done what I guess what you call like physical computing, where instead of having like silicon logic gates, or it used to be, I guess, vacuum tube gates, you actually make anything that by having two physical objects, and then if both of them are in the quote unquote one state, then you know, and you have an AND gate, then you get some sort of output. So I've seen them built of Legos, of Tinker toys, of now somebody with, you know, flowing water. These are always just like super
Starting point is 00:13:17 fascinating to me. And in theory, I mean, there's some nuance to using water for doing your actual computation. But in theory, you could build up a fully functioning computer using this. I mean, it would get unreasonably large. I mean, modern silicon has an enormous amount of transistors in it. So you would end up, you know, it wouldn't be practical, but it's just a fun, thoughtful way to see the kinds of things people are able to do.
Starting point is 00:13:41 And so you actually build something they can add to, oh, I don't have it in front of me, two four-bit numbers or two five bit numbers. And then the output computation is a given by water flowing through this series of valves. So if you've never seen it before, you know, check that one out or just in general, you know, the fascination around using physical objects to do. Yeah. So when you go to take your SATs and they tell you, you can't bring your calculator, just be like's all right i got this guys uh can you please define calculator for me um yeah i just it's okay i don't need a calculator but can you bring me like 10 jugs of water
Starting point is 00:14:16 yeah can i have a 55 gallon barrel of water by my desk. Hey, wait, guys, it's still dripping. It's still dripping. Hang on. I need 10 more seconds. Oh, my gosh. Yeah, man, that brings back so many memories of I think that SAT you could have a regular calculator, but the ACT you couldn't have anything. I'm sure it's changed now. I haven't kept up with it. But I remember, yeah, there being like a specific list of approved calculators you could use. Yeah, that's right. That's right. Cool. Yeah, that sounds awesome. I'll give this a shot.
Starting point is 00:14:48 Actually, you know, I was thinking about it the other day. A lot of the videos that are in my YouTube history are videos that you've posted. Really? So yeah, this is yet another creator. I guess another way of putting it is maybe not the history because that's too shallow, but like too recent. But a lot of the people I subscribe to are people that you found for me ah so uh cool i'll have to return the favor maybe next show i'll do uh are you have you heard of colin fursey yes yeah okay okay that's that's
Starting point is 00:15:17 one person that i found on my own so just for people who haven't heard of him on the show he does uh he just makes crazy things he's like a mechanical engineer extraordinaire and just builds all sorts of wild things the most maybe not the most recent episode but a recent episode he made a car uh waterproof um and the interior of the car is totally waterproof and so then he filled it up full of water and he had like a car that was also a hot tub what okay i've not seen this one yet. Yeah, you got to check that out. It's a convertible and it's a hot tub. And while you're driving, there's like water splashing out of the car and everything.
Starting point is 00:15:52 All right, my news, one of my news is this really cool guide to something called attention mechanisms. And so I'm going to need to kind of, you know, explain and build up to this. But, you know, as a lot of people know if you want to do what people call supervised learning which means someone gives you a label so they say like okay here's a picture this is a dog here's a picture this is a cat here's a picture it's a dog they
Starting point is 00:16:18 give you a bunch of those you can do supervised learning and you could train a machine learning model to predict the output. So you could give that model a new image and it will tell you whether it's a dog or a cat. Actually, it'll give you the probability that it's a dog, the probability that it's a cat. Right. And so, you know, you might have, let's say a table full of data in Excel or something, and then you have this one column and that one column in the table, maybe sometimes it's missing. And so you want to try and predict the value of that column when you don't have it. And so you train a machine learning model to do that.
Starting point is 00:16:52 You take all the other columns, you pass them as inputs. That's relatively straightforward. You'd have some kind of neural network or some kind of model to do that. Now, when you have images, that's more tricky because an image, every single pixel, and really every channel of every pixel is a separate input. And so if you just use a regular neural network or some other model, it's just going to have too, it's too expressive, too open ended, and it can't really learn effectively. And it won't be able to handle different sized images and stuff like that.
Starting point is 00:17:26 And so for images, they've made this thing called a convolutional network. And so the convolutional network's kind of nice. Think of it as like a roving eye. So it's like a little eye that can look at a piece of the image and that eye kind of moves across the image doing the same thing
Starting point is 00:17:43 at all the different spots in the image. And so if you know how to detect a dog, you know, in one spot of an image, that same dog detector will run everywhere else. And so you could end up with the output could be, let's say another image that's some black and white image. And the more intense the image image the more likely there's a dog there or not and that image is created by this this eye that kind of like roves over the image doing the same thing over and over again and because you only have to learn that eye once then it makes it practical to to to build these models and so it's cool for images but then you you have this you have like a lot of other modalities that you want to maybe learn from, right? And one might be language. So you might say, well, here I have a sentence and I
Starting point is 00:18:30 want to predict whether this sentence is, I don't know, makes people happy or sad. And so you have a bunch of sentences and then you have a bunch of, you have another column that tells you whether the person was happy or sad and you want to train the same type of model but how do you do that with a sentence right it's not really clear like how to how to you know what sort of you know uh model or what sort of topology you need to do that and the thing that's especially challenging about sentences is that they can go on forever just and other than the image you know if there's, you know, the dog will probably just be in one spot of the image. Right. Or if the dog is going to be over the whole image, you could shrink the image down.
Starting point is 00:19:12 There's tricks you can do where you only have to look at, let's say, a certain spot of the image. But for sentences, you can't do that, especially for certain languages and things like that. But even probably across all the languages, like the first part of the sentence might be really important when you're reading the end of the sentence. And so the sort of structure that they've designed to do things based off language is called LSTM, which stands for long short-term memory. And it's similar to this roving eye. I mean, it's a different technology, but it's similar in the sense that it uses a certain mechanism to kind of go over the whole sentence and be able to sort of build up knowledge as it goes through the sentence. So with all that groundwork, LSTMs are similar to convolutional nets that they didn't work
Starting point is 00:20:02 for a really long time. And recently, people have been able to get LSTMs to actually work and be really accurate and train models that are really accurate. It's through something called an attention mechanism. So the problem with LSTMs are... So this LSTM, think of it like an accumulator, like a battery, right? And things are coming into this battery, like knowledge is coming into this battery. And this battery is sort of storing this knowledge. And then when you finish storing the knowledge and the battery now has kind of everything
Starting point is 00:20:37 in it, now you can start draining the battery, draining that knowledge. So you could imagine if you're, let's say, translating English to French, you would store all this English knowledge and then you'd have another model that kind of drains it in French. And if in French you would have reversed the order of the words in a sentence, that's okay because the battery kind of has everything, right? The problem that they were having was as you store new things, they were kind of erasing the older things. And it's very hard actually to kind of keep storing more and more and more words without destroying like the very first word you put in.
Starting point is 00:21:16 And so the attention mechanism is a way to prevent that from happening. And so the article gets a lot more detailed than that, but this guide is really cool because it goes through the whole history and explains it kind of with pictures. And so if you've ever wondered how machine learning works on words,
Starting point is 00:21:36 check out this guide. It's got a lot of good info. You make my news article seem so shallow. Well, that's probably the deepest one I've ever linked. So sample size of one. Yeah, that's super fascinating. I think that was a great overview. I knew all of those words. My news article is another abrupt 180 degree different direction, which is the successful launch and landing of Starship SN15. So we have like a little bit, I guess Jason was mentioning about being a late adopter to smartphones.
Starting point is 00:22:09 We have a history as well as covering some amounts of space articles and making space predictions. This happened a couple of weeks ago while we're recording this. So it's sort of old news, but I just want to celebrate. I think it's super cool.
Starting point is 00:22:20 This is a SpaceX prototype and they were able to launch it out of Texas up to, I don't remember where they go, up to about 10 kilometers or something and then land it back just next to where it took off from. But just the whole thing flying and launching it, it looks so ridiculous and to actually see them do it successfully.
Starting point is 00:22:40 They've managed to sort of land before, but there was always problems of one kind or another and so this is the first time they sort of took off landed no major issues and in fact they're looking to potentially do it again i guess they're doing inspections and stuff so oh i was gonna ask about that because i thought they had done it before but yeah maybe there were problems you know it might have looked like it landed, but something broke or something. Yeah. And there was also the, they did tests earlier, which were not as high and weren't, didn't have aerodynamic surfaces.
Starting point is 00:23:12 So they did some like tests of the engines and stuff. This is the first time they sort of sent up what would end up being the final stage, like the orbital part of the thing. And now it looks like they're, based on filings, they're thinking about launching up to orbit from Texas and splashing down somewhere around Hawaii sometime soon, which would require I think them to stack two stages together. So this thing's huge. If you've not, go look at the like diagrams for how big the final thing is going to be compared to like a person or existing launch vehicles. And it's it's absolutely insane.
Starting point is 00:23:46 Wow, it's amazing. Yeah, the so how far are we away from being able to get a ride on one of those as a passenger? You and me or an arbitrary rich person? Let's say let's say an arbitrary rich person. I mean, I think they're going to fly several civilian launches this year. I think SpaceX is doing one on the Falcon. Their current ones that go up to the space station, they're going to launch some civilians, which will sort of kick off another set of civilian space travel, which hasn't happened for a few years. But I think on Starship, I would say probably still sort of two or three years away.
Starting point is 00:24:22 I mean, there's the Japanese person who bought a trip around the moon on one of these starships, and they haven't set an exact date yet. But I think they're saying it's going to be closer than you think. Wow, it's amazing. Now you and I affording it? Yeah, that's a that's a whole nother question. Yeah. Yeah. Maybe if i if uh yeah we stop you know buying so much junk food maybe we can get there how much dogecoin did you buy oh gosh oh yeah yeah last show um or a couple shows ago a few years ago we're talking about you know like fractures or banking and those things, the last duo episode. And since then, people have been trading their dollars for dog money. So I think that just kind of tells you
Starting point is 00:25:13 everything you need to know. So okay, my second news article is on MailSync. So I actually use MailSpring. If you've ever gotten an email from me recently, you'll actually see this little footer because I've never bothered to change the default, which is great advertising for them. You'll see this little footer saying MailSpring, the best free email app or something like that. But I actually love MailSpring. We should definitely do it as a tool to show some time if I haven't already. But it's a nice desktop email app. It has a lot of really nice features. It collects all your inboxes. I just feel like it has a great interface.
Starting point is 00:25:53 I'm a big fan. And for a while, the MailSpring kind of front end, which is all in Electron, was open source. But the back end, which actually handled ingesting mail from a trillion different sources, they handle Exchange, they handle Gmail, they do all of that. That was all closed source. And so I don't remember if MailSpring is on Linux. I know it's on Mac and Windows. I'm sure it's on Linux too, but, but in general, like you always, when somebody is closed source, you always kind of wonder how long it's going to survive. And, uh, and now they recently decided to open source
Starting point is 00:26:35 the whole backend, which is really cool. Um, not only does it mean mail spring will be around for forever, but, um, also, um, if you ever needed to do that, like if you needed to ingest mail, and basically MailSync basically converts email into think of it as like a SQL light table. And so if you ever needed sort of a database representation of your email, and one that sort of is kept kept in sync and kept alive, kept alive. MailSync does exactly that. And so I think it's a really nice thing to have for the community. So I'm really excited about all of that. I think email is still an area where there's a lot of low-hanging fruit. I mean, I spend a ton of time on email. Something you know, organized email. I do think there is something to that. And this will sort of open up that where people can write new front
Starting point is 00:27:31 ends and new mail experiences without having to learn, you know, 10 different protocols. Oh man, that's cool. Yeah. There are some email clients that are premium, like there's superhuman and there's hay, but I haven't really tried them yet. But yeah, there's got to be, I think, a better way to do email, like something that's suggested. Like, I don't want something that automatically puts things in different folders because you never really know what it's going to do. But something that somehow like just suggested ways for me to do email better. I think that could be cool.
Starting point is 00:28:11 I mean, I was, I much preferred inbox, Google's Gmail client that they experimented with a while. I'm so still disappointed. I still feel I'm worse at email using the Gmail app than I was using inbox. I too still have a hole in my soul. Yeah. I actually switched. I, I, I found mail spring when inbox was shut down. Ah, okay. Well, we're not going to get on that. It's gonna make me sad. It's time for. Yeah, it's true. Inbox was so good, but it is what it is. Time for book of the show. All right. My book of the show, which actually, you know, maybe this might be the first time ever, but I already read this book. I'm not suggesting a book and walking someone off a plank. But yeah, this book is self-titled.
Starting point is 00:28:51 It's Elon Musk. And so it's all about Elon Musk. I think it's awesome. Actually, I found the book fascinating. I think that it really did a good job of you know covering the good and bad stuff actually it's really interesting the foreword is or maybe the first chapter preface or whatever whatever is um is from the author first person talking about um the negotiation he had to go through with elon to to write the book and originally elon wanted all these footnotes and you know i guess he's like
Starting point is 00:29:24 he has a sort of engineer mindset where he wants to just correct the record and wants everything to compile right. But they kind of went back and forth on that a lot. But as you can imagine, someone like that's really concerned about their image and everything, right? That's not a surprise. But yeah, I felt like the book was extremely well balanced. It had a lot of really good detail. It covered all the, no pun intended, all different chapters in Elon Musk's life. So he started off as just an intern at some software company and then just doing a bunch of coding. One thing I noticed was, and one thing you kind of take for granted,
Starting point is 00:30:06 or at least I kind of took for granted, is the support that he received from his family. So there's a bit of trivia here. Elon Musk, one of his things he's known for is having kind of a rough childhood. His parents, I guess, were really strict. And I'm not taking anything away from that. But his parents actually funded the first company. And, and actually when I've read about other, uh, like tech billionaires, actually, you kind of see that pattern over and over again. And so I was thinking, it made me think a lot about, um, like, like that sort of community, like the Elon Musk's and the Mark Zuckerberg's and the Bill Gates and what have you. And a lot of these people are kind of like in a position where they can take some of those really big risks. So like, you know, someone in all of these cases, someone offered them millions of dollars for their company, and they turn them down. And I feel like that would be pretty extraordinary to do that. But then when you see the background, it's like, okay, well, actually these people already had millions of dollars usually like through their family. So,
Starting point is 00:31:08 so that, that kind of like made more sense, but still, I mean, the, the, you know, Elon Musk's a total genius. Uh, I felt like, uh, yeah, I have a ton of respect for him after reading the book. And, uh, um, yeah, I thought it was a fantastic read. It actually is very exciting. I was really interested in it the whole time. So it doesn't, you know, a lot of these can tend to drag on because they might say, well, here's sort of the middle years where nothing happened. And there really wasn't a lot of that. The book was super entertaining all the way through. Nice. I've actually not read any Elon Musk's biography. I not too much studied him as a person yeah you know i i it's uh i know very little also about a lot of i don't really study a lot of people but for some reason like i was really just drawn to reading this biography the only other biography i've read is the steve jobs biography and uh yeah there's there's just a lot of really interesting parallels parallels there and uh yeah it's a solid book nice my book of the show is ready player two
Starting point is 00:32:07 this is the follow-up to ready player one um by ernest klein uh it's set in the same universe as ready player one i guess it's a low-hanging low-hanging fruit not a super deep book but enjoyable i enjoyed as much as ready player one but definitely pretty solid i i'm not getting through as many books uh nowadays as i used to yep same here but definitely pretty solid. I'm not getting through as many books nowadays as I used to. Yep, same here. But this one was nice, easy. I actually listened to it, but a nice, easy listen. And I kind of did it in the background. I thought it was good. You know, the same kind of thing. If you liked the first one, you probably like the second one. If you didn't like the first one, I wouldn't bother with the second one. Yeah, I mean, it's the same kind of like pop culture laden,
Starting point is 00:32:46 you know, exploration of the 80s and a little bit more music bent maybe than the first one. But yeah, a good read. I always struggle. I, you know, don't unlike a biography, I don't want to spoil the fiction too much. So I don't really want to say too much about what the book is. But yeah, Ready Player Two by Ernest Cline. Cool. Yeah, you recommended the first one a while back, and I gave it a read and I thought it was fabulous. So yeah, I'll definitely I think I already bought this book as soon as I saw it, but I haven't read it yet. Yeah, my backlog is getting pretty big. So I got to start spending some dedicated time getting through all these books I want to read. Yeah, I noticed that I've kind of caught myself switching to podcasts because, you know,
Starting point is 00:33:29 I don't have the long commute or anything. And so but I think I'm going to try to get back into, you know, doing full books, but just in bite sized pieces and see how that works out. Is it time for tool of the show? Tool of the show. My tool of the show is Datadog, which is an amazing tool. It is super, super fun. So the way Datadog works is, so you're developing, let's say, some kind of app. It could be an Android app, could be a website with some web backend, or even on the front end on the browser.
Starting point is 00:34:02 Anything you develop, you're going to give it to somebody. And that person is going to run it when you're not around. And so a bunch of logs are going to be generated, right? Now, like if it crashes, that's a whole nother thing. And I think I already mentioned Sentry in a previous episode. So if it crashes, you know, you have Breakpad, you have Crashpad, you have Sentry, you have a million different tools, Rollbar, etc. But then there's also just, is this healthy? In general, you don't want to crash websites or apps, right? But you still want to know, for example, how many people are opening your app, clicking on, let's say, the settings and then leaving the whole app? Well, that would probably be good to know.
Starting point is 00:34:48 Or just anytime you have any type of log, like sometimes you have a log, you know, should never get here. Right. And it, and it fires. Right. So most apps, sites, et cetera, are generating logs. And, you know, it's great to have those sort of aggregated. And so a data dog does is they have a bunch of clients for a whole bunch of different languages. I actually ended up writing my own client because they didn't have a C++ with your application ID and a bunch of logs. And then what Datadog does is it collects all of that. And then it presents you with this really nice web UI where you can, you can actually, there's something called live tail and you can actually see logs as they come in. And then you can write rules. And they actually have this sort of way to automatically find patterns in your logs. So if you have a bunch of logs that says, you know, received 26 packets, received 31 packets, received 100 packets, right,
Starting point is 00:35:57 all of those lines are going to be different, but they all have the same pattern. It's just received, you know, N packets. Datadog will automatically pull out that pattern and then group all of those logs together. To be honest, I haven't ever used any of the premium features. I've only ever used the free tier, which makes me think that they probably should maybe charge more or something, but nothing I've talked about costs any money. They do all of this for free. And then you can then write rules on top of that. So you can say, for example, all the logs that have the severity level of error, I want all of those to get written to an S3 bucket. So S3 is the Amazon AWS file system.
Starting point is 00:36:43 And so that's one of the rules I have set up. So whenever an error log happens on Eternal Terminal, I've actually gone through and looked at all the error logs to make sure that there's nothing personal, like someone's name or IP address or anything like that. And so I know all these error logs are safe. And then whenever one of those error logs are written, it will get sent to Datadog and
Starting point is 00:37:06 then it gets dumped through an S3 bucket. So all of that is super nice. The other part of it is Datadog, as I was saying, kind of alluding to earlier, they're using your own kind of infrastructure. So like Datadog's not actually storing anything. They have a way where you give them permission to a folder on your AWS. And so you're, you know, effectively you're paying for all the hosting costs and stuff yourself. And because of that, there's no limits.
Starting point is 00:37:35 Like they can just take tons and tons of data. And if they have to write a lot of data, you're going to pay for it through Amazon, not through them. So it's an amazing service for free. I have to look up what the, I think for premium, they'll let you do alerts and more fancy things. But yeah, definitely check that out if you're building anything that's going to be looked at by other people. Super cool. Yeah, they're like a publicly traded company and stuff now, right? That's a good question. I don't know. This is super cool to like hear you say like kind of like their niche on top of other people's services, I guess is kind of cool. Yeah, actually, that was I felt like a really good sort of business decision, because in my case, you know, it's a it's an open source project.
Starting point is 00:38:20 I'm not too worried about if they get hacked, these sort of error logs, which, as I said, have no personal real data on them. But, you know, imagine you're like a company that's logging customer information or whatever. The fact that, you know, it's passing through their system briefly and then it's gone is kind of nice from a security standpoint. And it means for them they don't really have a big footprint. My tool of the show is uh already teased it not a tool but a game i expect you to die uh for i played it on the quest 2 but i believe it's actually on a lot of vr platforms so real quick the data dog is publicly traded and actually that company is has a market cap of 28 billion billion. So that is a much, I kind of, I thought it was
Starting point is 00:39:08 just a small company, I guess I was totally off the mark there. But I think if you're going to buy a Dogecoin instead, buy a share of Datadog. That's my advice. Shares are a million dollars piece um so my game is i expect you to die i guess it's a little bit like a escape room kind of feel you unlike the ones i was saying where you move around the room here you're actually supposed to be sort of seated but all of the scenarios make sense with you being stationary so either you're like sitting or standing in kind of one spot and moving your body sort of like rotating around the room to do various items and there's various puzzles and it's themed as you're sort of a spy going on missions but there's not a ton of explanation and various things that you do in trying to to solve the puzzles cause you to die. So you may be like in a car and there's poison gas outside. And if you break the window somehow,
Starting point is 00:40:09 the poison gas is going to come in and kill you. But that's not the only way there might be a grenade. And if you drop the handle off the grenade, the grenade will explode. And so you aren't expected, at least I guess based on the title, I assume you're not expected to not to die. So you die and then you try again and and you you sort of figure out the puzzles uh puzzle rooms isn't
Starting point is 00:40:31 interesting i've never done an actual puzzle room like in real life like i've never gone to one oh my family loves them they're so fun like escape rooms you mean oh that's what i mean yeah sorry i don't even know the name i don't't, this is not my... Oh, they are so good. But I sense it's probably something similar where there's a kind of language to them that you have to figure out. Like if you just went to one never having heard of it before, you're not going to necessarily do so well because you don't, the premise can be,
Starting point is 00:40:59 you may need to know a certain trope or a certain way of thinking for it to start to make sense. The same is true of a lot of things. So I feel this one does a good balance of understanding where the puzzle is. So sometimes when I play games that are claimed to be sort of escape room themed, I don't even, I can't figure out where the puzzle is. I understand I'm supposed to do something, but I don't get that you're supposed to just tap around the screen or you're supposed to bash open things
Starting point is 00:41:30 to look what's inside. Those aren't intuitive to me. But this game, I feel like, strikes a really nice balance of like, it's clear what things I can affect. And then by like working through the interactions between them, I can see where the puzzle is and I can see how to solve the puzzle.
Starting point is 00:41:46 I don't always get it right the first time, but each time I can continue to make progress. And so I find it for my naive level of sophistication. I don't have to like go consult a walkthrough repeatedly to like keep making progress. I'm able to work it through on my own. And it's at just the right difficulty level where I don't get bored of it. And I don't just cruise right through it either. So I very much enjoyed and the theming is kind of cool to a little bit James Bondy a little bit, you know, the spy thing. And it has a little bit of story without being too much I find to be a really good experience. And each level is sort
Starting point is 00:42:19 of bite sized enough that for me, like sitting and doing one VR session, you know, I can get through kind of like one level and that's about the right for me. Cool and doing one vr session you know i can get through kind of like one level and that's about the right for me cool it makes sense and so yeah i guess one of the challenges with that i mean puzzle games in general it's like uh you never really know as a puzzle game designer like how you're supposed to deal with someone not figuring it out that seems like the hardest part right like with these rooms, what they've done is, like for example, we did one where, and we've done a bunch,
Starting point is 00:42:50 but we did one where we just, there was just a trick that we didn't get. I think it was some kind of language trick where there was like four different cat names and based on the cat names, that was the clue. Like you just had to know some factoid that none of us had really heard of. And so we were really stuck.
Starting point is 00:43:10 And so they have this thing where you can press a button and basically they'll kind of give you, I think there's two levels of hints. But yeah, kind of getting that right, I think especially for VR, it's really important because you don't want to take the headset off and then go look on Wikipedia andipedia back you know and so uh actually how do you die in in these games like does the screen just go black yeah i think it
Starting point is 00:43:32 like i don't remember exactly i think believe that like fades to red you know and then it's just like but whatever when it happens like a explosion of a grenade or like gas coming in and you starting to like black out and you have a few seconds to rack because sometimes you can stop it but sometimes like a grenade exploding you can't um and then it fades right and you go to like a sort of death screen where it you know makes some funny quip about you sucking and not figuring it out or so much wasted training on you or you know some some such something and then you can replay the level but there are achievements as well and the achievements sometimes require you to die in certain ways or whatever. Oh, yeah, that makes sense.
Starting point is 00:44:06 Yeah, I used to play a game when I was really little called Lost Vikings, which was a puzzle game. And one thing that I thought was pretty interesting was if you lost, if like you died three times trying to finish the level, then on the fourth time they would offer you to skip it. Oh. then on the fourth time they would offer you to skip it and uh oh yeah i feel like that that actually is probably the hardest part of all these games is figuring out what to do there they do that in some super mario games like the newer ones if you die too many times they give you like a special power up or whatever that that makes it a little easier yep yep that makes sense um but yeah i so i've really enjoyed it you know, I think I want to get more into these puzzle games, but like you pointed out,
Starting point is 00:44:47 they need to be kind of like well-designed or else it just becomes frustrating. I still find myself frustrated at some of the puzzle games where you need to like do an action repeatedly, even though it's not resulting in anything happening. Like, oh, you click this character and the character burps and you have to kick the character and they just keep burping in exactly the same way.
Starting point is 00:45:06 But on the 13th burp, they belch out a potion in their belch bubble or something. And it's like, what? Like, why would I sit here and click the same thing, getting the same result? If you don't give me some indication that progression is being made, I'm going to go do something else. Yeah, yeah, exactly. Like there should be a way to just using logic and observations to not have to guess. I think that's that's really important. Yes. Yeah. Yeah. Actually, they I used to do the those would you'd guess but it would just take forever or like Sudoku. So because much more common example, like you shouldn't be doing a lot of guessing in Sudoku. Like if you're doing a lot of guessing, no, no guessing. Properly designed Sudoku puzzle requires zero guessing. I thought there were some were like there would be maybe
Starting point is 00:45:59 one guess. It may need a very advanced technique or whatever, right? Like if it's a hard puzzle, like you may not know the deduction method that's needed and therefore you would have to guess if it's like above your level but uh like a any like now i don't want to say like you can make whatever sudoku puzzle you want right as long as if it has a unique solution and it could require guessing though but they're like when the algorithms design them they or the designers design them they're supposed to know here's the only kind of techniques you know and it should be forward solvable using only deduction oh interesting yeah that makes a lot of sense i think uh i think yeah there was this other puzzle game i used to really enjoy
Starting point is 00:46:39 and um i i watched a um actually a podcast. There's a podcast called Game Design. I can't remember the exact name of it. It was Game Design Podcast. And one of the inventors, the person who wrote that game was on that podcast. And it's actually really clever. They just kind of, and Sudoku probably works somewhat similar,
Starting point is 00:46:58 is it kind of worked backwards from the answer. But you have to work backwards in a way where it's not ambiguous. But yeah, I think that sounds awesome. awesome actually i'll give that a shot i'm looking for another quest game to pick up yeah um this is one where like i pretty much like the kids will play a level i've already played and once they learn like you know kind of thing but it's over their head but they enjoy watching me die like oh daddy sometimes i do it on purpose sometimes like i realize there's a novel way to die and i'll let myself die just for kicks.
Starting point is 00:47:26 But they always think it's hilarious. Oh, daddy, you suck. Like you can't, like you didn't realize that grenade was going to kill you. Like how dumb are you? No, I mean, I'm being hyperbolic, but it's an enjoyable family experience that way. Yeah, my son does the same thing with me and Zelda.
Starting point is 00:47:41 So he, I made it to this one temple in Zelda where it really was too advanced it's one of these ones you have to battle a robot people have played the breath of the wild there's some of these temples where you fight battles but the game is non-linear so you could get there at any point and this one robot i just ended up basically if he hit me at all i just died instantly and so it made it made this battle way harder than it really needed to be by doing it so early. And yeah, my son actually was filming it on his iPad for his school share day. And he's like, dad's going to die again. He's going to die. Watch. He's going to die. And then I died.
Starting point is 00:48:17 He's like, yep, dad died again. There's a game over screen. I was like, honestly, it's the first time in probably 25 years where I wanted to throw the controller for 35 years or something. But I was like, you know, having the audience support actually didn't help that much. All right. Hash maps. Hash maps. Yeah, I'm going to kick it off.
Starting point is 00:48:42 So, you know, before, you know, when I was in college and learning about data structure and algorithms, of course, I learned about, you know, Hashmaps. And, you know, oh, cool. Like, we're going to talk about, you know, what makes them interesting in a minute. But it never really clicked to me. And I was mentioning this to Jason in the pre-show. I don't think we'll get to it at the end because we're already going along. But when I went to start, you know, looking at sort of what you call Silicon Valley companies, big companies, like interviewing there, like I kept reading, oh, HashMaps, you got no HashMaps, you got no, I hadn't really used HashMaps.
Starting point is 00:49:16 And I thought they were overplaying it. But sure enough, all my interviews, like some solutions needed familiarity with HashMaps and intricacies of HashMaps. And then showing up to, you know, my first jobs there, like, sure enough, like hash maps were like super common. When you get to kind of like really big data, I was really surprised I underestimated the importance of hash maps early in my computer science career. Yep, same here. So the promise of a hash map is insertion and lookup being constant time
Starting point is 00:49:48 so the idea is if you think of like a binary tree we talked about trees i think in our last duo episode then you get log of in insertion right the insertion if you just insert into a sorted array for instance uh you and you don't use binary search, you'd have to sort of like go through each element in the array to find where you need to insert it. So that would be like linear time. If your stuff is a well-formed binary tree, then you need log of in like visits to nodes before you find where to insert.
Starting point is 00:50:21 But HashMaps promised to blow that away by saying, I'll give you a constant time insertion and a constant time lookup. So if you're trying to see if this key is in your HashMap, you can do that in sort of constant time. That doesn't mean zero time. It just means that it doesn't really matter how many items are in your map, that no matter how big your map
Starting point is 00:50:46 grows within bounds, you would still get sort of this algorithmic constant time. And so that's a super, super powerful thing. And we're going to talk a little bit about the various ways that can be accomplished. Cool. Yeah, totally makes sense. Yeah, I think, I think it actually will surprise a lot of people, like how ubiquitous hash functions are, they're really just everywhere. But But yeah, at a high level, what's going on here is, so you know, we talked about trees, right? And, you know, trees are super nice. The challenges with trees are that you have to, you know, keep them balanced, right? And so that could be really expensive. And even if you don't have to keep them balanced, putting something or finding something in a tree takes logarithmic time. And so that doesn't sound like a big deal, but when you start putting lots and lots of things into the database,
Starting point is 00:51:39 it can actually become a really big problem, right? So imagine like log base two of of a million is oh geez i kind of uh kind of put myself on the spot here is what like maybe six or something or the log base you know log base two of a million is what like 14 or something anyways it could easily end up taking like 14 times as long the internet says says it's 20. Oh, 20. Well, there you go. All right. It shows how well I could do a log in my head. But yeah. And so imagine, you know, like you have an app and now it takes 20 times as long to run. It could be, it could end up being a really big deal. And so the challenge with hashing is figuring out sort of the hash function. And what that means is figuring out
Starting point is 00:52:25 sort of some type of way to represent something really concisely or really like ultimately represented with just a number. And so there's been a bunch of work in this field. I mean, it would take a really long time for us to really dive deep into it. But at a high level,
Starting point is 00:52:42 there's really two types of hash functions that are important. The first type is like cryptographically secure, and the second is not secure, right? So for example, if you're trying to hash like credit card numbers, well, maybe, like maybe, you know, someone, let's say someone has a, like has a person's name, they're trying to find their credit card number, and maybe they have somehow they have some access to the database. And so they have all these hashed credit card numbers. If you do something that's not cryptographically secure, then, you know, they can use various statistical tricks to try to reverse engineer that. Or maybe they can narrow down. If they have a list of credit card numbers,
Starting point is 00:53:25 they can narrow down which one it is. And so if you're storing something really sensitive, using a cryptographically secure one has some nice guarantees, right? Or if speed is not super, super critical. And so that's why something like Git, for example, Git will use SHA-256, which is cryptographically secure. Now, you might not be storing passwords in Git. In fact, you probably should never be storing passwords in Git. But they do that because it's on almost all the computers and that's not the thing that's going to slow you down if you're using Git. You're not blocked by the hash. Jason, there's one more point there too that I was kind of unaware of because I don't build these kinds of services. But another reason to make
Starting point is 00:54:09 sure to use cryptographic hashes is if you have sort of a public service in some way or an API that isn't only trusted users, I guess, that is backed by a hash map and the keys in some way are like basically user defined. So I'm creating a username or I'm creating something that I can kind of control and if someone can deduce what hash function you're using and insert entries which all collide, all go to the same hash, then they can basically cause a denial of service. We'll talk about why in sort of a minute. But that's another reason why if you have a public API of some nature, where you're going to back it by a hash, you want to be really careful that someone can't cause like force hash collisions to occur, and thereby like slower denial of service your your service. Yeah, yeah,
Starting point is 00:55:03 that makes sense. Yeah, I think, yeah. So, so just, yeah. Using Git is a good example. Like, so let's say someone, um, you know, creates like 10 million Git commits that all have the same content in them. Right. Well, and you're running like a Git, uh, website, like you're, you're, you're hosting Git for other people as your job. Right. So you're probably going to write some rule that says, look, if this person makes, you know, the same commit all these times, you know, they all have the same content that I'm going to block them or, you know, I'm going to, yeah, block them and then send an email saying, you know, your account's been compromised or what have you
Starting point is 00:55:37 or something bad has happened, right? On the flip side, maybe someone is creating a whole bunch of different commits with different content because they are writing some script that's automatically converting all the tabs to spaces and all their files or something. And for whatever reason, it's creating a lot of commits. And so that might be normal, right? And so getting to Patrick's point, someone could actually, if it's not a cryptographically secure hash function, they could actually figure out a way to be making a lot of different changes, but they're all hashing to the same hash and creating all sorts of problems there. And so it'd be very hard to fix that. But there are times, I mean, a good example of a time to use
Starting point is 00:56:21 hash that's not crypto secure is when you're doing these big data ETL things. So when you're doing like a MapReduce or Spark or one of these things, so just to like not sidetrack too much, but, but the way MapReduce works is you have this enormous volume of data somewhere. And let's say you just want to build a histogram of it. So you want to count the number of times you see Patrick, you want to count the number of times you see the word Jason, you want to count the number of times you see the word programming throwdown. So the way MapReduce will do this is, you know, it will have a ton of different machines that are going to just take a small piece of this massive data set and count the Patrick's and
Starting point is 00:57:05 the Jason's and the programming throwdowns. And so then it's going to hash all of those names, right? So Patrick is going to hash to some kind of number. Jason's going to hash some kind of number. Programming throwdown is going to hash to a number. And then it's going to, based on that number, send it to what's called a reducer. And so if you have, let's say 10 reducers, you might say, well, all those numbers that end in a zero go to the first reducer. All those numbers that end in a one go to the second reducer. I guess you'd need 11 reducers, but, but anyway, so right. So then you, you use that to the reducer is going to collect all that information. And so a single reducer knows it's going to have all the Jasons
Starting point is 00:57:49 because JSON is always going to hash to the same thing. And so that's how MapReduce works. And, but for MapReduce, you know, as I said, you're processing these massive data sets and you know, it's not really a cryptography thing. Like there's not really a cryptography thing. Like there's not really any concern about that. And so MapReduce, at least back when I was working on a long time ago, use Murmur. Now it might be using XXHash, which has been proven to be a lot faster. But, you know,
Starting point is 00:58:18 that's something where, you know, actually saving that bit of time by using Murmur instead of SHA will actually really matter because the computation there is so intense. And so the hash in HashMaps is needed because ultimately what you're trying to do is take some key. And we should mention, I guess, HashMaps and HashSets are kind of interchangeable. Like it's just kind of the underlying fundamentals are the same as just what data type you use, whether it's just the key only or the key plus some data. And so what you're trying to do with the hash is take arbitrary data, mix it up, combine it, do something in some way to go from an arbitrary,
Starting point is 00:59:02 any input size of bytes. Imagine a string and you don't know the length of the string right it could be Bob and it could be Sir Ferdinand the Great I don't know like something really big and you want to combine it all down and let's say you want only one byte to represent all of that data how do you mix it up so that your keys are as evenly distributed as possible across your key space. Now the other thing you have is you have the hash function which goes from an arbitrary number of bytes usually like for the ones Jason was mentioning to you
Starting point is 00:59:33 know 4 bytes or 8 bytes for 64 bits or 256 bytes would be what 16 bytes? No? Yeah that's true. So you end up having, you know, say, let's say four bytes, right? But that's still two to the 32. That's a lot of like keys. And the getting constant time, you have something called a bucket. And a bucket is taking a key and having, you can think of it as an array of a size. So if you is 32 bytes. There we go. Thank you. That makes sense. And so you have these buckets, and the buckets are like slots in your array. So say I have, I want to make my array small to begin with. So I have seven buckets, seven elements in my array. And I have a four byte hash result, right right so i need to get from four bytes down to
Starting point is 01:00:26 seven um and the way you do that typically i mean there's other variation is is a modulus and if you do the modulus of you know two to the 32 whatever the value is modulus seven then it'll just keep wrapping around and you'll find which slot to uh put your in. Now there is interplay as well that we won't get into about how to choose that when you want more slots than seven, like what do you choose as your next? Do you choose eight? Do you choose another prime number? Seven's prime. And there's some trade-offs there. But ultimately you're taking this hash function and fitting it into your buckets of some size. Now, the size is determined in part by one of the design factors that can end up sort of breaking the constant time assumption by something called the load factor. So the load factor is how many items you have stored in your hash map divided by your current number of buckets. So if you have buckets that are size seven, and you have 14 items you tried to store in seven buckets, then clearly you're going to have at
Starting point is 01:01:33 least some collisions, and your load factor would be two. Now, when you have collisions, two things mapping to the same slot, there's a variety of ways of handling it um the we'll talk about that in in sort of a minute but in all of the cases the bigger your load factor gets in some hash maps they try to keep or must keep the load factor under one um in other cases you could allow it to grow above one but in both times once the load factor gets too high, you start breaking your constant time assumption. You end up going to linear time, which is worse than logarithmic time. And so you have to be really careful to play this load factor. Now, where the load factor game comes into play is that you end up using more memory
Starting point is 01:02:20 because the smaller your load factor is, you're effectively wasting memory. So if I say, oh, I'm just going to create a, you know, billion entry array, and I'm just going to hash everything to it. And I'm not going to worry about hash collisions. Okay, cool. Except that billion element array is really big in memory, and it's mostly sitting empty. So there's a trade off between memory usage and speed. Yeah. So, I mean, you know, what makes the hash table or hash map or hash set, what makes these things so fast is that hashing is turning some content into a number and that number is then used as an index. And so if you have an array, you know, getting an item from that array from its index is super fast. And so you're kind of turning like you're doing sort of an association, right? That's turning, you know, strings or whatever you want into indices and then just putting it into that spot. But yeah, as Patrick said, the problem is that only makes sense if the array has already been totally allocated in memory and you can go to spot five by doing some arithmetic.
Starting point is 01:03:31 And so for that whole thing to work, there has to be something, even if it's just empty in spot four and spot three and spot two and spot one, like that space has to be occupied. So we were kind of getting at this load factor hashing about changing your data into a sort of number, modulizing that number by the number of buckets you have, and then looking in that slot. Now, what happens if two things map to the same slot? So it turns out there's roughly two strategies for dealing with it. The first one is called closed addressing. So an example of this is the C++ STL library for their unordered map and unordered set use closed addressing. And there's some variation and complexity about what they do,
Starting point is 01:04:16 but roughly what ends up happening is if you have a collision, because you could have seven buckets and you insert your first item in index one and then your second thing hashes and you also insert it in text one so you have only two things but they collided and what would happen in the stl library and these closed addressing is you would just make a linked list so you would take put them in the same bucket but but there would be a list. And so when you go to do a lookup, you get the bucket and you say,
Starting point is 01:04:48 does my key match the key at the head of the linked list? If not, go traverse the linked list until you either find the match or you find out that it's not in your set. So that's closed addressing. Yeah, actually, one thing that I wanted to just mention there is, yeah, you have to keep the original key because when you hash something, you've kind of destroyed the original value. Like if I, yeah, if I hash Jason, it's going to give me, you know, 13 or something like
Starting point is 01:05:17 that. And so now Jason is gone. Right. And so, and so in the hash table, when you go to spot 13, you actually still need to store the key again. And that's one thing that kind of confuses a lot of new people when they see hash tables. And so, yeah, so that's to Patrick's point. Like if you're looking for JSON, just because you got to 13, it doesn't mean that you found JSON because it could have been something else hashed to 13.
Starting point is 01:05:41 So that's where you would actually look through that list and say, OK, is there an actual JSON here? And if that list gets really long, that's going to take a long time. Hot tip here too, is I think every one of these things we're calling out, I've heard personally in an interview I've been in, done or listened to. So if you're going to do an interview for a job in Silicon Valley, like every one of these caveats I think we've mentioned so far has come up, at least in my experience. So the nuances here, if for no other reason than interviewing ability are important, but they do actually impact when you get into the subtleties of how your HashMap is performing. Because I do know a lot of people who just slap in a HashMap and assume it works and then move on. And they don't actually measure to see if it's an improvement or if they should tune it. Yeah. One of the things that I was really
Starting point is 01:06:33 fascinated by, again, working, so, you know, MapReduce, like working on MapReduce was definitely the time where I was really exposed to a lot of these things because that scale is when you'll really start to see all of the challenges get exposed. And MapReduce switched from using QuickSort to MergeSort. And the reason is because, you know, in QuickSort and MergeSort have the same runtime on average. But if you pick bad pivots which could happen because the data is just structured a certain way and and you're expecting totally
Starting point is 01:07:10 random or arbitrarily distributed data there's actually times where quick sort would you know consistently for this one map reduce quick sort would take almost linear time and and you know this is over you know like billions and billions of items so it just it would basically never finish and so um and so and so we end up switching everything to merge sort and so i think um and so that's that's that's one of these challenges with hash tables too is is um you know especially if you're doing something at scale, is you want to monitor the number of collisions and also something called the smearing, which is just another word for the entropy, really.
Starting point is 01:07:53 But you want to measure the entropy of your hash table. So in other words, you want to see that a lot of the items in the hash table, all the cells in the hash table are either empty or filled with one or two or three things. And yeah, just keep a constant eye on it. And you're right, it does come up a lot in interviews, trying to understand that whole,
Starting point is 01:08:16 like understand all of those dynamics. So if we gave the name closed addressing, it doesn't take much, I guess, to figure out the alternative is open addressing. So if you end up with a collision and you're using an open addressing hash map then you use some mechanism for finding another bucket, another slot in your array to go look at to see if your key is there. So the easiest one to kind of think about is when inserting, if I insert and there's already something there, then I just do my bucket plus one. And then I look,
Starting point is 01:08:53 is there something there? And if so, go bucket plus two, right? And I just keep walking down. And you could do linear probe, quadratic probe, right? You're just basically looking and kind of iterating through all of the buckets, looking for one that's open. And then at accessing, you just do kind of like the opposite, you would go to your array, if that item, if the bucket is empty, you're done. But if the bucket has an item, and it doesn't match your key, then you have to follow sort of the same probing as the insertion algorithm use. And there's a variety of these. And these actually, and I'll talk about why in a second, but there's there's cuckoo hashing, Robin Hood hashing. So like in Robin Hood hashing, you measure how far away from your original slot
Starting point is 01:09:38 each element is. And you use that to decide whether to keep looking or to displace the elements that they are put yours in and then move that one further down so that you're trying to get your keys as close to their original bucket as possible. And the reason why these end up mattering and why this versus like the linked list approach from closed addressing matters is because again, algorithmically,
Starting point is 01:10:03 these are constant time insertion and lookup on average but in practice you have problems with cache efficiency in a linked list if i don't pay attention i go look up one key and then it's not correct and so i have to traverse the linked list but that item in memory could be at an entirely different place in ram just based on wherever i got it from my allocator off the heap and I have to wait for the processor to go through all of its caches out to RAM, pull in that value, and access it, right? So the cache efficiency there is very different than if I have two slots in an array, which will be consecutive in memory, and I just need to look at the first one and then the second one,
Starting point is 01:10:46 because the way the processor works is it doesn't just fetch your value from RAM and into cache. It fetches, it's called sort of a cache line. It fetches a bunch of memory all at once into the cache. And then so it expects that your next lookup is close to your first lookup when you're accessing memory. And so these open addressing ones try to leverage that realization to make much more cache efficient ways of accessing the hash buckets. So yeah, one question, when you're doing the open addressing, how do you know when to stop?
Starting point is 01:11:21 Like, let's say the hash table is full, right? Yep. So you stop when either you get to an empty bucket or when you have if it's completely full and you've gone all the way around but these are much more sensitive so closed addressing and i i might someone will tell me i'm wrong that's fine closed addressing you can have a load factor over one open addressing you could never have a load factor over one right it'll crash or something right because it just won't be able to insert like yeah you have no you can't insert the last thing as soon as you go over one and if you have one then you're right any checking for any value not in the set will require you to iterate all buckets. Yeah, the only alternative
Starting point is 01:12:06 would be you'd have to also store the hash. So like in every bucket, you'd have to store what the hash would have been if there were no collisions. Yes, there are. So there are also hybrid hash maps, I think that kind of start to use rehashing methods. So rather than using a single hash, you use a different seed and you hash again, or multiple bucket lists. But yeah, so Oh, no, I was saying something a little different. I was saying like, so let's say you're doing linear probing, right? And so but but let's say in the hash table, along with the key, you're also storing the hash of the key. Or I guess you could just do it on the fly. So like if you're doing the linear probing and you're looking for Apple and Apple hashes to two, and then you go to the next one and that item, whatever that is, if it doesn't hash to two, then you know to stop, even if the hash table is full.
Starting point is 01:13:12 But then like, you're either like doing a lot of hashes or you'd have to somehow like store the hash or something. Yeah. Does that make sense? Maybe. I don't want to get into a design review of your algorithm. I feel something else could be hashed to three. So you could have some dog was hashed at three.
Starting point is 01:13:30 Yeah, but I'm saying like, okay, so let's say the thing you're searching for, your query hashes to two, right? And you go to that spot and there's something that hashes to two, but it's not your item, right? So then you go to the next spot and if that thing in that next spot doesn't hash to two then you're done right but i'm saying something else could have already been there yeah oh and then you have to jump over it yes oh and actually actually now i think about it so this is this is uh yeah this is bringing back a lot of memories so actually when you when you when you um if something if there's something that doesn't hash to two in that spot that you're in right now it
Starting point is 01:14:10 could be that apple is still there it just yes got there too late yes and like someone else oh yeah okay so yeah you're right you have to go through the entire hash table worst case yeah see i would totally fail actually just just to circle back something you said earlier, I've actually never had a hash table kind of question, but you could tell I would totally fail that. But I think it is super important. I think, you know, don't panic if like your background is in something else. I don't know if you'll get maybe these kinds of questions,
Starting point is 01:14:40 but it's still really good to know. Okay, I've seen good interviews and bad interviews. I won't speak to bad interviews, but in good interviews, I think they're actually hoping you don't know the answer because they don't care. They want you to be able like Jason just did, like, Oh, have this idea. Oh, wait, this idea doesn't work. And then, you know, be able to make the progression to refining your idea or discovering something that's already in literature that maybe your interview already knows about. But they're letting you sort of do it from first principles and seeing how your brain unpacks the problems. Yeah, it's a good call. So if you really, really want the cheat code, then there's something called the minimal perfect hash. So the idea here is if you know up front
Starting point is 01:15:20 all the items you're going to hash, say 397 of them, then you make exactly 397 buckets and you find the hash function which one-to-one maps all of your input keys to your bucket list with zero collisions. These are pretty cool because you have a perfectly space efficient answer. And the only problem is, how do you search the space of hash functions to find such a perfect hash? And there's a class of ones that you can narrow in on and how to structure this problem. It's a very interesting, sort of research topic. I've never seen it used in practice, but it is a really cool idea that you could have the best of all worlds, a constant time lookup, no wasted space, a load factor of one
Starting point is 01:16:13 and guarantee that you would instantly know if a key is in your set or not. It's just really awesome. The problem is it becomes very, very expensive to find this perfect hash. Yeah, that makes sense. Yeah, I mean, the only way I could think to do it as a person who has no experience would be to just try adding salt and just keep doing that until, but that's going to take forever. I mean, the odds that... You've just discovered Bitcoin mining. Yeah, I guess so.
Starting point is 01:16:41 Yeah, I think, I mean, the odds that you eventually hit that right hash are just Oh, for people who don't know, a salt is, a salt is very simple. It's something that you add to all of your keys that just make the hashing work differently. Oh, yeah, yeah. Okay. And the reason why, correct me if I'm wrong on this, or feel free to fill in the holes here, but I think the reason why they use salt is not to try and find better hash functions, but the idea is the salt is something that you as the programmer know, but someone on the outside doesn't know. And so if someone, let's say, gets a hold of your database, but they don't know your salt because they don't have the source code, then there are certain tricks
Starting point is 01:17:26 that they can't use. Like for example, let's say someone has a database full of hashed passwords. Well, they know that a lot of people out there make their password the word password. So they'll just run like the hash function. They'll like look at all the popular hash functions and run them all on the word password and see the numbers that come out of that or the hash function, they'll like look at all the popular hash functions and run them all on the word password and see the numbers that come out of that or the hash, let's say that comes out of that. And, um, and just see like, okay, which of these hashes is in your database a lot. Okay. It's this one. Therefore you're using this type of hash function. And then knowing that allows them to do a bunch of other things. But if you've put a salt in, now you don't have a whole bunch of hashed words of password.
Starting point is 01:18:10 Now you have a bunch of hashes of, you know, my cool salt password and my cool salt Abba and my cool salt Patrick and whatever. And just like, you know, even though you added the same thing to every key, it's going to cause the hashes to be totally different. Yep. That's right. And so, yeah, I think, uh, so getting back to Patrick saying like, you could just try like adding, trying different salts until you get one that hashes these 300 things in a 300 unique like integers after you've modded them. Uh, but that will take forever. Not forever. Just a long time. Not literally forever.
Starting point is 01:18:48 But yeah, really, really, really long time. Yeah, I mean, definitely. I mean, intractably long time. Like not long as in come back in a week. Long as in like your whole lifetime. Like more than that. Another thing that I've used hashing a lot for which is like very different
Starting point is 01:19:04 from the traditional hashing, but still really useful, is locality-sensitive hashing. And this is kind of the opposite of everything we just talked about for goals for hashing. In locality-sensitive hashing, you actually want things that are similar to hash to the same number, right? So imagine a map, right? Like a map of United States, right? And you have a bunch of points on the map. And so maybe all the points represent different businesses or something, right? And so in this case, you want to hash the points that are close together to the same number because that number is going to represent, let's say, a file on disk that has all the names of the businesses.
Starting point is 01:19:54 And so when you're looking on Google Maps, you're probably looking at a whole area. And so you're probably looking at a bunch of businesses, but they're all close together spatially. You want to hash all of those to the same number so that all the names of the businesses come from the same file on disk, right? And so that's locality sensitive hashing. And so in that case, you're not worried about collisions. It's kind of a different use case, but you still get a lot of benefit. You get a lot of the order one type benefits. So imagine if you now have a bunch of these points from your map,
Starting point is 01:20:35 or maybe even you have a region of points, right? And you can feed this through this locality sensitive hash, maybe feed the boundaries or Monte Carlo it or what have you. And it'll say, okay, the hashes are three, four, and five. And then you can load those files, which might have businesses that are outside of the range, but they're all going to be kind of in that area because of the way you've done the hashing. And that's how a lot of these kind of spatial databases work. Awesome. Yeah, this is a really good episode.
Starting point is 01:21:04 This is very fun. Yeah. I mean, hash tables and trees, you know, it's sort of like, uh, there's that saying that, uh, you know, with a, uh, uh, with a linear saw and a hammer, like you could do just about anything. Like you build a whole house just with a linear saw and a hammer. I feel like trees and hash tables are kind of like that for us. Like trees and hash tables, you can build just about... Things like arrays and algebra, that's all primitives, right? But with those two data structures, you could build just about everything. And a 55-gallon drum of water.
Starting point is 01:21:39 Yeah, that's right. You could pass the computer science GRE with a 55-gallon drum of water, a water computer, and trees and hashtags. Music by Eric Feindler. music by eric barnmeller programming throwdown is distributed under a creative commons attribution share alike 2.0 license you're free to share copy distribute transmit the work to remix adapt the work but you must provide an attribution to patrick and i and share alike in kind

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.