Coding Blocks - Data Structures – Hashtable vs Dictionary
Episode Date: December 17, 2018
Just in time to help you spread some cheer this holiday season, the dad jokes are back as we dig into the details of hash tables and dictionaries...
Transcript
You're listening to Coding Blocks, episode 96.
Subscribe to us and leave us a review on iTunes, Stitcher, and more using your favorite podcast app.
And you can go to codingblocks.net where you can find things like show notes and examples and discussion and whatnot.
Did you just say uh twice in that sentence? That's it.
That's right, man. Take it.
Uh, well, Alan, it's your turn.
Alan? That's it turn. Oh, Alan.
Oh, that's it.
I'm walking away.
Alan, if we can, we can we change his name from Alan to Alan?
Send your feedback, questions and rants to comments at codingblocks.net.
Follow us on Twitter at codingblocks or head to www.codingblocks.net and find all our social links are at the top of the page.
With that, I am,
uh,
Alan Underwood.
I'm Joe Zack.
Uh,
And I'm Michael Outlaw.
I just vomited.
This episode is sponsored by Manning Publications.
Manning is running a special promotion this December.
The countdown to 2019 will run on manning.com all the way through December. Answer
just a single question every day and you'll be in the running to win free eBooks, videos, and even
a whole year's worth of new releases. Plus every week, everyone will get to enjoy massive discounts
on Manning products. All you need is to sign up to Manning's deal of the day at manning.com slash mail dash preferences.
That's www.manning.com slash mail dash preferences.
And you're good to go.
Also, while you're up there, take a moment to shop around for your favorite books and use the code CODBLOCK40 to save 40%.
All right.
So as we like to do,
we'd like to start off each episode with a big thank you to all those who have
taken the time to leave us a review or written us.
So either which way.
So on iTunes,
we've got CrossLink, JLA115, Ross444Ross, Hutchie Bong, John Mabel, CodeInMyRobe, Saltashi.
I'm going to go ahead and do him some justice here because he went through the effort.
Sex and Edger, maybe.
And then Bubba Cow.
All right.
And Stitcher, we've got Jarvanen.
Sorry about that.
Illilville, Cole Israel, Sbgood, Maxim Bendis, and Nathan Ayer.
We really appreciate that.
Thank you so much for leaving those reviews.
Even though we butcher your names, we really do appreciate it.
So thank you.
You rock.
Wait, is that Ayer or Liar?
Oh, I don't know.
I don't know. I don't know.
Hey, by the way, I forget who it was, but whoever said that they learned more in one episode of Coding Blocks about Big O notation than they did in about two semesters of it in college.
That's amazing.
And I feel you.
I was there.
You're going to be fun, man.
Wait, that's an option.
Yeah, right.
You guys didn't go to Costco? You all right. A quick update from, uh, I think it was the last episode, where we talked about C# strings and how they kind of dedupe the memory and use that as a performance optimization. The technical term for that is actually string interning, and I didn't realize that's actually kind of common in a lot of languages. So you can read about it on Wikipedia, but I wanted to point out a great tip from a two Mac 84.
He had a correction for us.
Uh,
it only works for string literals.
So those are going to be strings that like show up in quotes in your code.
So if you do something like, um, you know, GetDate().ToString(), even though that may have the same value in memory, it's not going to get the same benefits as you would if you had that thing in quotes.
So I just thought it was kind of interesting, and he sent me a little code snippet.
That was really cool.
So really appreciate that.
We love getting that feedback, especially when it results in us learning something.
Yeah, I want to be clear there.
We're talking about hard-coded strings.
Yeah.
Okay.
Yeah.
That's basically what you're talking about. And by the way, we've gotten some really,
really good comments on the past two data structure episodes.
So,
I mean,
we're not going to cover everything that the people shared up there,
but there's some really good things about,
you know,
why unmanaged code is faster than managed code and some of the reasons behind
it and all that.
So,
you know,
if you want to go continue the conversation and learn some more, definitely
head up to, you know, slash episode 94 or slash episode 95 and take a look at those
because there's some great comments up there.
So.
Yeah.
So continuing on, it's time we're talking about a couple more data structures.
You want to kick it off outlaw?
Yep.
So let's start with hash tables.
So we've built up to here, right?
We started with, well, we started with primitives.
So don't let me talk to you about floats again because I will.
But as we got into the more advanced data types, you know, arrays kind of laid some groundwork there.
So now we understand that, you know, arrays have their place.
There's a lot of benefits to arrays, you know, as a way of keeping a collection of data.
But also like some of the performance tradeoffs, right, that you might get.
So they're great for lookups, you know, uh, random lookups, but if you had to, like, insert something in the middle, not so hot there, uh,
versus the linked list on the other extreme, which were great for being able to insert or remove
items from the middle of a list. But if you had to search or scan that list,
they're not so hot, right? So what if we could live in some utopian world where we could get
the benefits of both? That'd be great, right? Well, that's where the hash table comes in.
So you get these kind of benefits with both of these,
uh, or with the linked lists and the arrays, you get that with the hash tables. Um,
but there's some trade off. There's still some caveats. There's some things to be aware of.
Right. So, um, yeah, so I had here, like, the hash table smashes these two worlds together.
And so, uh, hang on a second. When you say hash table, are you talking about maps or dictionaries? Or, uh, yeah, I've seen these things go by a bunch of different names in different languages. Are we talking about the same thing?
Uh, no, no. All right. No, I'm just talking about hash tables.
Well, all right. I mean, yeah, because that's the other thing that gets weird too. Like, let's not, let's not confuse it with the specifics of a language yet, okay? We're talking about, like, hash table data structures in, like, classic computer science.
Yes.
All right. So, because there are, like, depending on your language, you know, the terminology might get a little bit more jumbled and mixed up, right?
At least in the C Sharp world, though, there are classes for both.
So there's a separate class for hash table, a separate class for dictionary.
Yeah, we talked, like, for an hour on arrays and JavaScript and how they're not always arrays.
So I'm not surprised the hash tables are no different.
Well,
just on the naming thing in on the Wikipedia article,
just to bring it up,
it is also referred to as a hash map.
So again,
not getting into the actual implementations,
whether it be Java or C sharp or any of those,
they are sometimes called one or the other hash table or hash map.
Yeah. So, um, I mean, specifically the MSDN documentation, they succinctly described the Hashtable class as representing a collection of key/value pairs that are organized based on the hash code of the key. And even while they might be documenting their specific implementation of it, I mean, I felt like that was a pretty good overall description of what the hash table is and what purpose it's trying to serve for you, right?
So how does the hash table work? So let's consider that at the core,
the core structure of the hash table is an array.
And within this array,
each element of this array contains an object that contains the key and some
value.
So it's, it's not just a simple integer, you know, at that key. It's some complex type of object, right?
Now, if you're reading the documentation from Wikipedia, they refer to these as buckets.
Some, I think in the imposter handbook, it was referred to as slots.
There's not really like a name that I really liked for any of these.
Like, buckets felt like a super weird name.
Like why would you call it a bucket, not just like a name value pair or something else?
I don't know.
But a bucket seemed like a really weird name for what you would call that particular index within the array.
But so I digress. So
you have the underlying structure of the hash table as an array for a key value lookup.
The data that you want to read or write to the hash table, you're going to use that data
to create a hash. And that hash will serve as the index to the array.
Right?
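What that core mechanism looks like, as a minimal sketch (illustrative Python, not any particular runtime's implementation; the capacity of 8 and the stored names are made up):

```python
# Illustrative sketch only: the key's hash, reduced modulo the array length,
# picks the slot where the key/value pair lives.
capacity = 8                      # made-up size for the backing array
slots = [None] * capacity

def index_for(key):
    return hash(key) % capacity   # Python's built-in hash() stands in here

i = index_for("Abigail")
slots[i] = ("Abigail", "person record")   # store the key *and* its value
assert slots[index_for("Abigail")] == ("Abigail", "person record")
```

The important part is that the same key always hashes to the same slot, so a later read recomputes the index instead of scanning the array.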
And I heard a good example once, when I first learned about hash tables, and they were kind of trying to describe the purpose of the hash function.
And the example that they started off with was not something that you would see in production, like a real language.
But the example I heard is, let's say you've got a word,
you've got English words, and you want to store them in a hash table.
You may take the word that you're trying to store,
take a look at the first letter,
and then store it in a location based on that first letter.
So if the word is Abigail,
then we're going to go ahead and take that first letter,
and be like, okay, this is an A, so let's go over here.
Next word, let's say, is .NET Core.
And we say, okay, that's a D, so let's put it over there with the Ds.
And then later when you come to look something up, say, okay, give me back Abigail,
then we could go and look in the As.
And so it's really fast for inserts.
It's fast for looking up things too.
So it's just a good data structure.
Now, in that example, it's a really bad choice
because if we add the word Alan in there,
suddenly we've got two words now
that both start with an A,
and so we've got a collision.
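That first-letter scheme is easy to write down (a deliberately bad toy hash, as the discussion says, shown here in illustrative Python):

```python
# The toy "first letter" hash from the example. Fine for illustrating the
# idea, terrible in practice, because lots of words share a first letter.
def first_letter_hash(word):
    return ord(word[0].upper()) - ord("A")  # 'A' -> bucket 0, 'B' -> bucket 1, ...

assert first_letter_hash("Abigail") == 0
assert first_letter_hash("Brandon") == 1
# Alan also hashes to bucket 0, so Abigail and Alan collide:
assert first_letter_hash("Alan") == first_letter_hash("Abigail")
```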
Yeah, so this is where, like you were saying,
the implementation of the hash table can vary.
There's multiple implementations of them.
And really what's changing behind there
is how they're handling collisions, the
decisions they're making.
So going along with your example there with Abigail and Alan, your hashing function in
this example that you gave is only looking at the first letter.
So it picked A for both of them.
There was a collision.
So how does it put the second one in? So this is where, like, let's think of there being another data structure, a linked list. So you have an array of indexes, and then a linked list that a particular index, in this case the index for A, would point to. And Alan would get tacked on to the end of that linked list.
So when you want to look up Alan, you're, you're going to go to the A's,
then you're going to scan through the A's and you're hoping that you're not
going to have to go deep. Um, you know,
you're not going to have like, you know, again,
Joe's alphabetical hashing algorithm is probably not the greatest one.
Sorry, Joe.
So you're going to have a ton of collisions in that particular example,
but let's pretend that you didn't, because really the idea of the hashing function is that the collisions are going to be rare, is the hope. I mean, they're going to happen, but you definitely want them to happen less often than not. So, so that's the idea, right? Or can we agree to that structure,
an array for the index for the hashes and then a linked list for where the values are within there?
So is that every implementation of the, because like, I don't know, it's,
they talk about collisions and there's gotta be some sort of method to handle the collision.
So is that the standard way it's done? Okay, let's talk about, okay. So you can't talk about hash tables without getting into a big conversation about the ways to resolve these collisions, these collision strategies, right? So there are popular collision strategies. And if you go to the Wikipedia page, there are a bunch in there, but what seemed to be, like, the two more popular ones were the separate chaining and open addressing variations. So in the separate chaining implementation,
you know, you would have those pointers, but when there's a collision, right, you'll traverse that
list, look for your key, right? And so this is basically the example that I gave a minute ago.
And, you know, assuming that you have a good hash function, you're rarely going to have more than three items in any given index.
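A separate-chaining table along those lines might be sketched like this (illustrative Python; Python lists stand in for the linked lists being described, and the class name and capacity are made up):

```python
# Minimal separate-chaining table, for illustration only (no resizing).
# Each slot holds a "bucket": a list of (key, value) pairs that collided there.
class ChainedHashTable:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # empty slot or collision: extend chain

    def get(self, key):
        for k, v in self._bucket(key):   # walk the (hopefully short) chain
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put("Abigail", 1)
t.put("Alan", 2)
assert t.get("Abigail") == 1 and t.get("Alan") == 2
assert t.get("missing") is None
```

With a decent hash, each chain stays tiny, which is why the linked-list traversal doesn't hurt in practice.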
So when you think about the, we talked about linked lists being awful, just awful for traversal, right?
And that wouldn't be your greatest choice for doing any kind of sorting. But because
these are so small, remember the article that we talked about that we referenced
before, like everything is fast for small n, right? So
it's going to be good enough in this particular example. Because like if you
only have three elements, that's... Assuming you don't have a bunch of collisions, which you should
never have a ton of collisions if you have a good hashing algorithm. Well, which is also
another point too. Don't try to
roll your own hashing function, right? Like if you can just
use the library's hashing function, like if you had
to create your own hash table, then
if you could use a hashing function that's already built into
either the operating system or the library or whatever that's available to you,
you're probably gonna be better off than if you tried to roll your own hashing algorithm.
That's where you're going to get into trouble and have more potential for collisions.
Yeah, you want an even distribution, and you want, like, a wide distribution too. So you want a lot of different values, because you want to keep those lists short. So the alphabetical thing is terrible. You're not going to get that many Zs, you're probably going to get a lot of As, and besides, there's only 26 buckets. So the chances of you having more than three items in your linked list are pretty high, depending on how many words you're shoving in there.
I always did wonder what kind of hashing algorithms they were using.
I've never really seen a good example of like, oh, hey, here's how .NET does it.
It's based off the memory addresses or something.
So I just kind of wondered, like, with a hash table with thousands and thousands of keys,
like, what algorithm are they using to divide those up evenly?
And one other thing to point out here, too, is while we're talking about this, there's this whole notion of collisions, and that's because the hash can't be perfect, right?
Right.
And there is this notion in the Wikipedia article that's interesting. It says the perfect hash function can only exist if you know all the items ahead of time. So that, that makes sense? If, if you know exactly what your data set is, then you can write the perfect hashing algorithm, which is not the case 99.999% of the time.
So,
so these collisions we're talking about happen because you have a very good
hashing algorithm that can't be perfect.
Correct.
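The "know all the items ahead of time" idea can be sketched by brute force (illustrative Python only; real perfect-hash generators such as gperf are far more clever, so treat this purely as a demonstration of the concept):

```python
# If every key is known up front, you can search for a collision-free
# arrangement. Here we just try salt values until no two keys share a slot.
def find_collision_free_salt(keys):
    capacity = len(keys)                 # one slot per key: a "perfect" fit
    for salt in range(100_000):
        slots = {hash((salt, key)) % capacity for key in keys}
        if len(slots) == len(keys):      # every key landed in its own slot
            return salt
    raise ValueError("no salt found; try a bigger table")

keys = ["Abigail", "Alan", "Brandon"]
salt = find_collision_free_salt(keys)
assert len({hash((salt, k)) % len(keys) for k in keys}) == len(keys)
```

That search is only possible because the key set is fixed; with arbitrary incoming keys, some collisions are unavoidable.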
And on another good point, too, ideally, okay, so going to your point about you're not going to have a perfect hashing algorithm: you want it to have an even distribution of where it's going to place things.
Because again, going with that assumption that you're not going to have any more than three items in a given index within the hash table's internal array, in order to make that kind of assumption, that assumes that your hashing function does create a good, even distribution when you're feeding it random data.
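One rough way to see what a good versus bad distribution looks like (illustrative Python; the 1,000 generated keys are made up for the experiment):

```python
# Compare Python's built-in hash against the first-letter toy on the same
# keys, using 26 buckets like the alphabet example.
from collections import Counter

words = [f"word{i}" for i in range(1000)]
builtin = Counter(hash(w) % 26 for w in words)   # built-in hash, 26 buckets
by_first_letter = Counter(w[0] for w in words)   # every key starts with 'w'!

assert len(by_first_letter) == 1     # one giant chain: the worst case
assert max(builtin.values()) < 100   # built-in hash spreads them far better
```

Same keys, same bucket count; only the hash function changed, and with it the chain lengths.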
Hey, one other thing I wanted to add, I don't know where you finished.
Sorry.
No, go ahead.
I just want to say one thing that's really important, too, is that hashing function has to be performant.
So obviously, we don't want it to scale poorly based on like the size of the input, right?
We don't want to have like a big, long, slow hashing algorithm.
This thing needs to happen fast.
If you look at the big O cheat sheet or something and look at the lookup times and the insert times for hash table, it puts them at O of 1, constant time for fast lookup because it's assuming a good algorithm.
And it's treating that as constant time, which means that as your hash table gets bigger, the lookup time needs to stay about the same.
And here we're talking about linked lists, which we know the insertion there and lookup time is going to be O of N,
but they kind of cheat a little bit.
And so there's like a little asterisk in the Wikipedias of the world,
whenever they talk about hash tables, they're like, well, it's basically O of 1.
It's basically constant time for lookups and inserts in the normal case,
but that's really not so much the case for the worst case scenario. But it just happens so
infrequently if we have a good hashing algorithm, which is like this magical question mark that we
don't have to worry about it so much. And in practice, it works out really, really well.
And they say the average time is O of 1, right?
So the average.
So that's why you can get away with that.
And I'm sure Mike's going to get into some ways that it goes awry.
Yeah.
So, well, before we go any further there, let's cover another one of the popular collision strategies, which is called open addressing.
So in this implementation, the elements in the array are the buckets themselves. So rather than the previous one, where the elements in the array were pointers to linked lists of buckets, in open addressing, the item that is in the index is the item itself.
Again, I hate the term bucket, but.
That's what everybody uses apparently.
Yeah.
So what this means is, let's say that, let's continue the example of Joe's amazing first-letter hashing algorithm. So Alan and Abigail, we'll just skip, let's skip Alan for when we only have Abigail in the list. And now Brandon comes in. So Abigail is in the first array index. Brandon comes in, he goes into the second because it's a B. Well, now when Alan gets inserted into the list, his slot is already taken, so he probes forward past Brandon. So it would be Abigail in the first index, Brandon in the second, and Alan in the third index.
So you see how this already got weird, right? The hash value only serves as a starting point for where you would look up the data from.
So meaning that during a write operation, if you go to that index in the array and something is already there, you would continue on looking for an empty slot. Now in, in the example that I just gave,
I kind of smashed, you know, uh, Abigail and Brandon's indexes, like, right next to each other. And in reality, there would be some space there. But, um, because what you're doing is, when you get to that first index, again, the index that your hashing function computes is just the starting point. So once you get to that index, you're trying to traverse from there forward through the array, looking for the actual item you're trying to get to, and you're only going to keep looking until you get to an empty slot. If you get to an empty slot, then that means that you never found what you were looking for.
If you were doing a read.
And if you were doing a write, then that would be where you would put the item.
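A linear-probing sketch of that read/write behavior (illustrative Python; there's no resizing here, so it assumes the table never fills up, which a real implementation would have to handle):

```python
# Open addressing with linear probing: the hash picks a starting slot, and on
# a collision we walk forward until we find the key or an empty slot.
capacity = 8
table = [None] * capacity        # each slot holds a (key, value) pair or None

def put(key, value):
    i = hash(key) % capacity
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % capacity   # occupied by someone else: probe the next slot
    table[i] = (key, value)

def get(key):
    i = hash(key) % capacity
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % capacity
    return None                  # hit an empty slot: the key was never stored

put("Abigail", 1)
put("Alan", 2)
put("Brandon", 3)
assert get("Alan") == 2 and get("missing") is None
```

Note how the empty slot does double duty: on a write it's where the item lands, and on a read it's the signal to stop looking.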
Does that make sense? So open addressing sounds
awful to me in terms of
performance.
Yeah, it just, I don't, I don't know.
I guess it's kind of unbounded.
It's like, how long does it take you to look up an item in the worst case? So it's like, well, in the worst case, we keep looking at every little chunk of memory that could possibly kind of fit this thing. So it's, yeah, as you kind of start one spot, it's not there. Either let me rehash it, or let me look at the memory spot next to it, or something like that, or, you know, some sort of method for determining the next place to look, and you just keep looking until you're out of places to look. And yeah, that could potentially be pretty bad. So that seems pretty, pretty bad to me.
Yeah, it sounds bad, but
it also sounds like, even behind the open addressing thing, there's additional, you know, features or ways to go about doing it. There's, like, several well-known ones: linear probing, quadratic probing, double hashing.
And it's basically,
so you're not just going to the next one.
You might be chopping it up,
doing similar things like binary searches or something like that.
So you're trying to scan those indexes really fast, right?
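Those variants differ only in how they generate the next slot to try, which can be sketched as (illustrative Python; the capacity of 11 and the sample hash values are made up):

```python
# Three well-known probe sequences for open addressing. h is the initial
# hash-derived index, attempt counts how many probes we've made so far.
capacity = 11

def linear_probe(h, attempt):
    return (h + attempt) % capacity            # next slot over, every time

def quadratic_probe(h, attempt):
    return (h + attempt * attempt) % capacity  # 1, 4, 9, ... slots away

def double_hash_probe(h, h2, attempt):
    return (h + attempt * h2) % capacity       # step size from a 2nd hash;
                                               # h2 must never be 0

assert linear_probe(3, 1) == 4
assert quadratic_probe(3, 2) == 7        # 3 + 2*2
assert double_hash_probe(3, 5, 2) == 2   # (3 + 2*5) % 11
```

Quadratic probing and double hashing exist largely to avoid the clumps of occupied slots that linear probing tends to build up.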
Yeah.
I mean, I imagine that the performance of it isn't that awful, because it's only for things that happen to hash to the same value. Which again, you're hoping that your hashing algorithm will be good and give you a good distribution, so that you're not going to have those collisions often. So when you do, you're probably not searching a lot. So it probably, much like the separate chaining gets away with traversing the linked list, open addressing probably gets away with having to traverse that array, because there's not going to be that many collisions, or at least that's the assumption, right?
Yeah.
Assuming, so it's probably one of two things, right?
Either assuming that the hashing algorithm is good enough or that your data set isn't so large
that you're going to run into those collisions more, right, is my guess.
Yeah, because even if you, okay, even if you had a lot of elements in the hash's internal array at that point, assuming that there's a good distribution, you're only going to be scanning a few items before you get to a space. And you're going to be like, oh, there actually wasn't anything there. That key is available. You know, that's where, like, uh, if you try to get that value, you're just going to get back a null, or depending on whatever framework you're using at the time, uh, however it decides to handle that type of situation.
Cool. All right.
You know, we, I don't think we've really talked about, um, what this looks like for the programmer. So I love hashes and their ilk because of how easy they are to use for, like, a programmer. So in JavaScript, you know, I'm used to using it as an object. There's different ways to do that, though, but basically I create my object and I throw the indexer, so the square brackets, around it and give it some sort of key. So I could say, you know, Abigail is my key, equals new Person, and I've created a new person, and it's stored in what I think of as being the key of Abigail. And then when we tell the hash, hey, give me the object at Abigail, it's going to run the hash on that word Abigail there, on my key. And that's how it's going to figure out how to look up the actual person in memory that I stored.
It's pretty awesome.
So if you've got something where you want to be able to just kind of shove stuff in
and look it back up, then that's a great data structure.
I've been doing the Advent of Code this year, for 2018.
And one of the problems I just solved, I ended up using a hash table.
And the reason I went with a hash table over an array because I was storing numeric keys
is because I thought the data might be sparse.
And this is something we talked a little bit about arrays.
So if you've got like a fixed or a known size,
say like, you know, 1000,
you could do an array with that.
But if you know that you're only going to be using like,
you know, a handful of indexes out of that 1000,
then you're going to be pre-allocating
a huge chunk of space in memory and only using, you know, 10% of it or some small number.
If you use a hash table, then you're only, you're only allocating the memory as you put the stuff
in. So memory-wise, it's going to be really efficient. And the insert and lookup times are on average O of 1. So it's going to be just as fast as an array for those lookups on those numeric keys,
but much more memory efficient.
Turned out I was wrong on that, by the way.
It would have been much better to use an array in my case because it ended up
not being sparse.
So that's what I get for pre-optimization.
Yeah, like our previous conversation of like a list
versus array.
Right, yeah.
Use the list until you know that you shouldn't.
Right, exactly.
Okay, so
what are some
other strategies? So the Wikipedia article
has other strategies that they listed
for hash tables, but they didn't go into
quite as much detail.
But so I'll just quickly, like, say some of the names, like cuckoo hashing, hopscotch hashing, Robin Hood hashing.
And then this one was interesting to me, the two choice hashing,
which kind of sounded like a rapper name.
You know, that would be like my rap name.
I'm Two Hash. I'm Two Choice Hashing, you know, be right next to 2 Chainz, like, what's up?
But yeah, so that one was kind of neat because as the name implies,
actually it uses two hashing functions.
And so if during a write operation you find a location that is already in use, it's going to use the hashing function's location that would provide the fewest objects already at that value.
So hash it one way, then flip it
down that and reverse it, and hash
it again, and see which one looks better,
and just kind of go with that.
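A two-choice sketch along those lines (illustrative Python; the second "salted" hash is just a stand-in for a genuinely independent second hash function):

```python
# Two-choice hashing: compute two different hashes and insert into whichever
# bucket currently holds fewer items; lookups then check both buckets.
capacity = 8
buckets = [[] for _ in range(capacity)]

def h1(key):
    return hash(key) % capacity

def h2(key):
    return hash(("salt", key)) % capacity   # stand-in for a second hash

def put(key, value):
    a, b = h1(key), h2(key)
    target = a if len(buckets[a]) <= len(buckets[b]) else b
    buckets[target].append((key, value))    # go to the less-crowded bucket

def get(key):
    for i in (h1(key), h2(key)):            # only two buckets can hold the key
        for k, v in buckets[i]:
            if k == key:
                return v
    return None

for n, name in enumerate(["Abigail", "Alan", "Brandon", "Joe"]):
    put(name, n)
assert get("Alan") == 1 and get("missing") is None
```

Notice the lookup never has to remember which hash was used at insert time; it just checks both candidate buckets, which keeps the bookkeeping simple.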
It sounds kind of awful from a developer perspective. Like, if you were the developer of that thing, though, of that two-choice rapping hashing implementation, it does sound kind of awful to try to maintain that code. Because then you're like, I mean, just wrap your head around that for a moment.
Like, okay, I just put something into the hash's internal array.
How do I know which hash function was used? Like, just trying to keep track of that kind of thing. Like, obviously they've solved the problem, someone solved it, right? Because it is a real thing. But I would imagine it has extra storage needs behind the scenes, right, in order to map those things out. They've got a hash table to keep track of their hash table.
I feel like somebody came up with this at like the last second.
It's like minutes to the deadline.
They're like, oh, this is going too slow.
And I tell you what, just hash the stupid thing twice.
That's right.
And if one of them doesn't have an item, just put it there.
And then whenever we need to look it up, we'll just do it twice.
It'll halve the time that we need to go chasing down this crazy chain.
Let's just go with it.
And they tried it and it worked and they shipped and now they're Google.
That's right.
Yeah,
exactly.
And to be fair,
when we said,
don't go writing your own hashing algorithms,
you know,
we say that if you're just,
you know,
writing some code,
obviously if you've got a real use case for it,
right?
Like you're writing the next, you know, network-layer topology type stuff that's got to be crazy fast, and you're
not finding anything that suits your needs. And then probably you will write your own, but we're
saying on general day-to-day type stuff, if you find yourself writing your own hashing algorithm,
you're probably doing something wrong, right? And it might be special cases where you know enough about your data set
that you could kind of make an informed decision about that.
And of course, if you were just wanting to learn about hashing,
it was like, I'm really interested in how the heck
they come up with an algorithm that works.
Like, this is not something I could do on my own.
There's some mathematician out there who came up with this hashing strategy, who was really good and smart about coming up with a good distribution.
So I'd love to know what that is.
So maybe one of these days I'll look it up and try to program it, just like from an academic
perspective.
But this is not the kind of thing that you want to do on a Friday afternoon before launch, you know?
Just use whatever is built into your language.
Right.
Agreed.
So we've kind of already hinted on this, but I, just to go ahead and throw it out there,
like, officially, you know, the complexity around hash tables and their usage. So in terms of space complexity, the average and the worst are just going to be O of N. So however many items you're trying to put into your hash table, there's your space complexity.
The insert, the search, searching the hash table, inserting into the hash table, and deleting from the hash table are on average an O of 1.
Which is as good as it gets.
Yeah, that's amazing.
And then the worst case is going to be an O of N operation.
So it's awful, but I mean, it can be worse.
And this is one of those ones, like we mentioned,
there's kind of an asterisk when you look up it a lot of times in the charts.
So often in Big O, we talk about the worst case scenario
because it's kind of that limiting function deal.
That's kind of what Big O is designed for, is to let you know the worst case scenario.
But in this case, it's so much more frequently the good case.
Whenever you do some sort of lookup, it's like they don't want to scare you.
Like, listen, listen, just pretend.
Just get it in your mind that it's O of one.
And then if you have a problem with it later, come back here and notice the asterisk and then look at it deeper.
And you know something? I'm sure we've all seen situations where just hundreds of thousands of items have been crammed into a hash table and it doesn't perform fast.
You're like, why?
Because I would expect it to be really fast because this should be a direct lookup.
It should be an O of one operation, right? And then these underlying implementation details right here of when
there's collisions and how those things are handled, just imagine you're doing some sort of
lookup or you're looking up many items in that thing and it's having to go to it. But then it's
like, Oh no, now I got to scan the rest of the contents in here because there were multiple
collisions. That's probably why you end up running into those things.
So it's nice to know these things behind the scenes that are happening for you that you probably never even think about.
Yeah, or now that if you do have a slow-performing hash table usage, maybe you have some target to go after.
It'll be like, well, huh, maybe there's something about my data that I'm not
getting the uniform distribution that I thought I should have been getting. But just to point out,
though, that when we talk about the best case or the average case, I'm sorry,
time complexity being an O of 1 versus the worst case being an O of N. I mean,
O of N was not far off. Like, O of 1 was a flat line, and O of N was, it's not terrible. It was like maybe a three-degree, four-degree line. It wasn't a big line in comparison. I mean, it was definitely still at the bottom of the yellow, like, if you were to go to bigocheatsheet.com, right, O of N is there at the bottom. So in your, like, quote, bad cases, it was the best of the bad, right? All right.
right it still kind of stinks though like if you've got a algorithm that takes one second
and then o of one with a thousand inputs it's still one second but in o of n even though it's
great a size of one of one thousand is going to give you one thousand seconds so i mean it's still one second but in o of n even though it's great a size of one of
1000 is going to give you 1000 seconds so i mean it's a big difference and you can notice if you
ever had an algorithm that was like looking stuff up in arrays over and over and over again by
searching through all of them and you convert that thing to a hash table you're probably going to see
really big delay i mean a really big improvement there but yeah you're right and it's not it's
definitely not as bad as the others yeah that that That's the point I was trying to get, I wanted to make was that, yes, it could still be bad,
but it's not like, Oh, of in fact, factorial, you know, it's not a hockey stick is where
I'm, I'm going with that.
If you just really want to boil it down to that big, Oh, cheat sheet.
Oh, of one is the best.
Oh, log in is the very next best.
And then, Oh, of in this is the next.
So you're, you're not doing bad, right?
You're still barely out of the green.
You're a little bit out of the green, but still, it's not a bad operation.
So the fact that your worst case is O of N is still not terrible.
But, I mean, to Joe's point, should you find some kind of massive data set where you have collisions 99% of the time, right, and, say, underneath the covers, unbeknownst to you, the separate chaining resolution strategy is being used, you're like, hey, why is my hash table performing so badly? You know, if you happen to notice, like, oh, my data happens to randomly produce keys, hashes, that are all the same, that's why it's performing so badly. You could kind of just make guesses, because even if you did happen to notice that it was producing that, without looking at the implementation of the hash table, you'd be like, oh, they're probably using a separate chaining collision strategy behind the scenes, and that's why.
Yep.
Right?
Now, who's actually going to go start hashing all of their own data to see what it's going to produce?
I don't know.
Maybe you're using it.
Don't judge me, man.
You do that, I'll be over here.
Like, npm install.
Right.
Hash 2.
Right.
Exactly.
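To make the collision discussion above concrete, here's a small Java sketch (Java's HashMap, which the hosts mention, resolves collisions by keeping colliding entries together in one bucket). BadKey is an invented class whose hash code is deliberately constant, so every entry lands in the same bucket and each lookup has to search that bucket's entire contents instead of jumping straight to the entry:

```java
import java.util.HashMap;
import java.util.Map;

// A key whose hashCode is intentionally constant: every instance collides,
// so the table piles all entries into a single bucket.
class BadKey {
    final String name;
    BadKey(String name) { this.name = name; }

    @Override public int hashCode() { return 42; } // pathological: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(name);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> table = new HashMap<>();
        for (int i = 0; i < 5_000; i++) {
            table.put(new BadKey("key-" + i), i);
        }
        // The lookup still returns the right value, but instead of an O(1)
        // jump to the entry, each get() has to search the one overloaded
        // bucket -- the hidden cost the hosts are describing.
        System.out.println(table.get(new BadKey("key-1234"))); // prints 1234
    }
}
```

Lookups still return the right answers; they just quietly pay the bucket-scan cost instead of the O(1) direct hit.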
You know what?
I'm going to derail this for a second. That never happens, right? I want to say how much better things like npm are, and I'm even going to give a shout-out to the Java community here, and Maven. Oh, God, the NuGet. Why does NuGet suck so bad? Oh, man. Like, seriously, I'm so angry at NuGet almost all the time because of this whole, it doesn't encapsulate its own dependencies well.
And that, man, that makes me mad.
NPM does it with JavaScript.
Like, how?
Is it a NuGet problem, though, or an MSBuild problem?
I don't care.
Let's be fair.
I don't care what it is.
Are you talking about like where A depends on B, B depends on C, C depends on D, and A doesn't have an explicit reference to D, but yet it's needed?
Well, that's one.
Because that's an MSBuild.
Let's be fair.
That's one.
But the one that really irks me is you – all right.
So we're going to dive down here into the C Sharp for just a second.
Let's do it.
So I am using log4net 1.2 in my app, right? Let's say. But I pull in a dependency that's using log4net 1.1. Well, guess what? I can't use it, because I have to have the same dependency that my dependency has.
And that burns me up.
Like, dude, Maven and Java work really well in this world.
You can bundle up everything you need in your jar and it just works.
Why can't we make that happen?
Okay, two things.
Number one is, technically, you could solve that problem with a binding redirect in your app's config.
You could change the version.
You could give a range for the DLL binding and say anything from this range to this range.
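For reference, the app.config binding redirect being described might look something like this; the assembly name follows the log4net example above, but the version range and the publicKeyToken are placeholders, not taken from the episode:

```xml
<!-- app.config (illustrative): redirect any older log4net reference
     to the single version the app actually ships. -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- publicKeyToken here is a placeholder -->
        <assemblyIdentity name="log4net" publicKeyToken="..." culture="neutral" />
        <!-- anything in the old range binds to the version we reference -->
        <bindingRedirect oldVersion="1.1.0.0-1.2.0.0" newVersion="1.2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```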
Okay, let's say that it's breaking.
The number two way that you could solve this problem is, as the publisher of that NuGet package, you could ILRepack your dependencies in so that they come along for the ride.
And they're baked into your...
So bake them in so you're not actually including a dependency. The whole fact that almost everybody else has figured out this way to be able to bundle in your dependencies in a way that doesn't damage your own application, it burns me up every time I face this with NuGet. Like, it makes me so mad that I'm like, I'm not doing this in C Sharp. I'm doing it in JavaScript. I don't care that I like C Sharp better. I don't want to deal with this dependency entanglement nightmare, right?
Like, every time I deal with NuGet, it does seem like a bit of a headache. And when I'm used to using things like Node, you know, it's a no-brainer that I can just npm install anything in, like, two seconds and get it working.
Now, the node_modules folder is horrible.
Oh, it's a nightmare to look at. It's massive. It's like 300 gigabytes or whatever. Densest whole thing.
But I mean, that's crazy. But that's your tip. There's other good stuff, though. Your tip last week, or last episode, Joe, that Jamie provided. I mean, that was the whole point of that even for npm: trying to get around some of these problems where, you know, if you only built based off of the package.json file, that's kind of loosey-goosey, right, as to what the versions could be, whereas the package-lock.json isn't. That's specifically, here's the entire dependency graph, like, every version that was used, right? And that's where, while we're on this tangent, like, even if you do the ILRepack that I mentioned as a way to get around it, that's where those kinds of solutions can still irk me.
Right.
Because I'm willing to understand, like, the challenges that NuGet has in this build. I kind of can't forgive the, like, not-including-the-dependency thing, skipping the chained dependency. But then that problem then carries forward, or is replicated by, and I don't know which it is, but by, like, ILRepack, for example. Where, so let's say app A depends on dependency B, B depends on dependency C, and dependency C depends on dependency D. And MSBuild, when it does that compilation, if app A isn't explicitly using dependency D, and nothing in its execution path is using dependency D, dependency D will not come along for the ride. And then you'll be missing it. So in your bin directory, or your output of the compilation process, you're not going to have that dependency there, right?
Well, you and I especially, I know, have seen cases where, and it's well documented, like, Stack Overflow has plenty of articles on this problem, where even though it thought that it didn't need it, it actually was needed in some scenarios, and you'll get runtime errors that you won't see coming. But ILRepack, to its credit, I guess, instead of making a runtime error, you'll see it as a build error, because it's not smart enough to recognize that app A doesn't use certain things. And so if it sees that one of those random dependencies has a dependency, the next dependency, it'll keep walking that app's dependency chain, expecting all of those DLLs to be in there so that it can hack them all in together. And when it can't find D, it's like, oh, well, we're done.
All right.
Well, there's such different cultures, too.
I just had to mention, like, when I'm working in front end, like, man, I'll require something in, like, a heartbeat.
Oh, I don't want to left pad this thing.
Just require, just look it up.
Like, nobody cares, you know, because it's the front end.
And for some reason, I guess maybe because there's, like, this culture of these things being small and single-purpose, and DLLs tend to be heavier, man, when you're working on the backend and you want to bring something in, it's like, all right, let's call the tribes together. Let's get the council, all the dev managers, together. And you've got to go there, you've got to bring presents, and you've got to sweet-talk them, and you've got to set up a recurring meeting, because it's probably going to take you a couple to actually get this thing added to the project. And then when you finally get around to it, when everyone, or, you know, 80 percent of the people, agree, yes, let's bring this package in instead of rewriting it, then you do it, and then it breaks the build, and you've got to mess with it for the next 24 hours.
It's so true, too. Like, you've got the Senate over here, you've got the House of Representatives over here, you've got Congress, you've got veto power over here.
See, I thought when you said the Senate, my mind immediately went to Star Wars.
And I was thinking of the Senate within Star Wars.
Oh, gosh.
So I was picturing that kind of room.
You go out on your little hover.
Floating platform.
Yeah, your hover platform to make your speech about like, we need this great library. Library. Library. Yeah. Your hover platform to, you know, make your speech about like, we need this great library.
Yeah.
You know?
So, yeah, man, I apologize for derailing. I don't even... something snapped in my head when you said something, and I was like, oh no, I've got to get...
It was like you had a twitch, like you hadn't had your medication yet.
Like, it's one of those things where you fight it.
You're just like, at first you're angry and then you're confused, right?
Like why would anybody ever let this happen?
And then it's just, I don't know.
I think it's backwards, right?
Like you're confused.
Like, wait a minute.
Why is this broken?
Yeah.
This should work.
It might be many stages.
What DLL is that?
I'm not using that.
Like there's no reference to that
Why is it complaining about that? So, yeah, you definitely start out confused, and then when you realize what the problem is, then you're angry. And you're super angry. You're like, what?! And then you're like, you know what, I'm going over to Java. I may think it's way more explicit than I want it to be, but I'm about to change, you know, coding paths in my career.
Whoa, whoa, whoa. Now you're talking emotionally, because that is not a rational decision. Just come over to JavaScript, man. Seriously, if I want an uppercase string, I just npm install uppercase. I don't even care that it's built into JavaScript. I'm just going to get the package, and if it's mining some crypto coins in the background, then, you know, whatever, that's the price we pay. Someone will fix it eventually.
Oh, that's so amazing. I'm going to assume that some of our Java brethren among us here are going to be upset.
Yeah, they're already upset.
Yeah, don't hate us.
We don't hate you.
We don't hate you.
It's just, it's funny.
There are definitely things that are way more polished and more better in the Java world.
Well, I was going to say.
Come to Microsoft Land. We got LINQ.
Oh, that's so beautiful.
That almost makes up for all of it.
It almost does.
Well, I was going to say, too, like, I don't know why, in my reaction to that situation, like, you would suddenly, in this hypothetical situation, turn into Lil Jon.
What?
What?
Yeah, yeah.
What?
Oh, man.
And then when you do finally get it to build, I mean, Lil Jon comes back at me like, yeah!
Yeah.
You're over there like high-fiving yourself in the corner and everyone else is still pissed.
Yeah, exactly.
You broke the build all day and you're high-fiving yourself?
Great.
You don't know where I've been.
I've got to work on my Lil Jon impersonations, apparently, though.
Yeah, those are hard ones to do, man. He worked on that for years.
All right,
let's get back to hash tables and all the fun that they are.
Yeah.
Sorry about that.
And so quickly, let's talk about, like, the pros of the hash table. The speed, number one. By far and away, the speed of the hash tables. Yeah, the reads, ignoring collisions, you're generally going to consider that an O of 1 operation. If there is a collision, the read/write time can be reduced to O of N divided by K, where K is the size of the hash table, which can just be reduced to O of N. Don't you love Big O notation? Just throw everything away. There should be, like, a whole Big O math.
It's basically drop all
the constants, right?
Oh, you had some operation in there?
Eh, don't worry about it.
Whatever.
N divided by K.
Assuming a good hashing algorithm is used, it's usually going to be O of 1, like we said. But this assumes that, by good, the performance of the hashing algorithm has been considered. So, to Joe's point, if your hashing algorithm is going to take a second and you have to put a thousand things in this hash table, that's not going to be a good hashing algorithm. So, yeah, I mean, you can't blame the idea of the hash table at that point, because you have a bad-performing algorithm. So even if you do have a slow algorithm, it might still be O of 1, but it may be slower than the alternatives. And I think that's worth explaining, right? It's O of 1 because the lookup operation itself can go directly to it, but the hashing behind it is ridiculously long and tedious.
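The N-divided-by-K figure above is the standard load-factor argument for a chained hash table; as a sketch (writing k for the number of buckets, which the hosts call the size of the table, this is the textbook analysis, not verbatim from the episode):

```latex
% n = number of stored items, k = number of buckets.
\alpha = \frac{n}{k} \quad \text{(the load factor)}
% Under simple uniform hashing the expected chain length is \alpha,
% so an expected search costs
T(n) = \Theta(1 + \alpha) = \Theta\!\left(1 + \frac{n}{k}\right)
% With k fixed, \alpha grows linearly in n and lookups degrade to O(n);
% if the table resizes to keep \alpha bounded, lookups stay O(1) on average.
```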
All right, so to the cons of the hash table.
So depending on your language, I'm looking at you, C Sharp, the hash table type is loosely typed.
We'll come back to that in a moment.
The cost of the hashing function can be more than just looping over the list, especially for few entries. So, you know, if you only have two items in your hash table,
did you really need a hash table? Right? I mean, probably not. Maybe. Probably not.
Because the hash table entries are spread around, there is a poor locality of reference, which can trigger processor cache misses. So, you know, again, if you're trying to squeeze every bit of performance out. Cache performance can also be poor or ineffective depending on the implementation, such as separate chaining. And the performance can degrade when there are many collisions, which, you know, we pretty much already covered.
All right. So when should you use a hash table? Now, if we are talking about specific language implementations of a hash table class or type, then don't. At least not in C Sharp. You should prefer the Dictionary type over the Hashtable type.
Every time.
And not in Java either.
You should use the hash map.
Isn't that interesting?
Yeah. Because it's strongly typed. And so the alternative is, if you use this Hashtable, then what happens is you're going to get an object back, and you're going to have to cast it. So you're losing the benefits of having your compiled language, right? There's no sort of checking around that.
And, you know, potentially you're doing something wrong
or getting an error. So that kind of
stinks. And if you're storing simple values, it's even worse
because you're going to end up boxing them. So you're going to take these
simple values that would take, say, 32 bits for an integer, and then you're going to stuff them in a 64-bit reference and throw them on the heap.
And then now you're taking up whatever 32 times 3 is.
Well, not just taking them up.
Now you've also got the garbage collection and everything else on top of it, right?
Yeah.
And totally unnecessary if you just use the strongly typed version.
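The episode's example is C#'s Hashtable versus Dictionary<TKey, TValue>. As a rough analog in Java (whose HashMap the hosts also recommend), using the legacy Hashtable as a raw type reproduces the cast-everything problem; note the boxing point bites harder in C#, since Java boxes the int into an Integer either way, while a C# Dictionary<string, int> keeps the int unboxed:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class TypedVsUntyped {
    public static void main(String[] args) {
        // Loosely typed: a raw Hashtable hands back Object, so every read
        // needs a cast, and a wrong cast only blows up at runtime.
        Hashtable raw = new Hashtable();           // raw type, no compile-time checking
        raw.put("answer", 42);                     // the int is boxed into an Integer
        int fromRaw = (Integer) raw.get("answer"); // cast required

        // Strongly typed: the compiler checks keys and values for us.
        Map<String, Integer> typed = new HashMap<>();
        typed.put("answer", 42);
        int fromTyped = typed.get("answer");       // no cast needed

        System.out.println(fromRaw + " " + fromTyped); // prints "42 42"
    }
}
```

The raw version compiles with warnings and only fails at runtime if the cast is wrong; the generic version turns that same mistake into a compile error.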
Right. So if we're not talking about the specific language implementations, then when should you use the hash tables?
Okay, so anytime you need an associative array.
So by that I mean you want to have an array, but you don't want to necessarily look up items in the array by their index. You might want to be able to say, like, hey, in this array of people, I want to find the Alan object, right? That's what I mean by the associative array. There were other examples that were given, too, that I found, like database indexing. Or another good one, too, would be a cache. If you wanted to build your own cache, you could use a hash table behind the scenes, so that when your user wants to add something to the cache or look up something from the cache, you hash the value that they give you as the key.
Then go to that point in your hash tables array and find it.
And then sets was another option when you might want to
use these. I think it's worth calling out cache specifically, just because that's a very common
use case for it. If you're doing like something, a REST API, like one common approach might be to
get the REST call and you take those arguments and you kind of put them in one big string and say,
have I seen this in the last 30 seconds or something like that? And then if you've seen
that, if you've got that object in the hash table memory, then you can go ahead and return it without going and hitting a slower service.
And so it's really common to use it in those types of scenarios.
And it's all about coming up with and managing those keys.
Then on your end in the background, this should be using a really fast, efficient data structure for storing that data.
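As a sketch of the cache idea just described: one big string key built from the request arguments, a hash map behind the scenes, and a 30-second freshness window. All the names here are invented for illustration, and the clock is passed in explicitly to keep the sketch easy to check:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal TTL cache: key the hash map by a string built from the
// request's arguments, and treat entries older than 30 seconds as misses.
public class TtlCache {
    private static final long TTL_MILLIS = 30_000; // "seen in the last 30 seconds"

    private static class Entry {
        final String value;
        final long storedAt;
        Entry(String value, long storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    public String get(String key, long now) {
        Entry e = entries.get(key);          // O(1) expected hash lookup
        if (e == null || now - e.storedAt > TTL_MILLIS) {
            return null;                     // miss or expired: caller hits the slow service
        }
        return e.value;
    }

    public void put(String key, String value, long now) {
        entries.put(key, new Entry(value, now));
    }

    // One big string key from the call's arguments, as described above.
    public static String keyFor(String endpoint, String... args) {
        return endpoint + "?" + String.join("&", args);
    }
}
```

A production version would also need eviction and thread safety (for example, a ConcurrentHashMap), but the core is just the O(1) keyed lookup.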
This episode is sponsored by Datadog.
You've heard us tell you about Datadog.
You know they're a software-as-a-service monitoring platform that provides developer and operation teams with a unified view of their infrastructure, apps, and logs.
But did you know about these features?
Like Watchdog.
Watchdog automatically detects performance problems in your applications without any manual setup or configuration. And by continuously examining
application performance data, it identifies anomalies like a sudden spike in your hit rate
or something that could otherwise have remained invisible. So once an anomaly is detected,
Watchdog provides you with all the relevant information you need to get to the root cause
faster. Things like stack traces, error messages, and related issues from the same time frame.
Or what about Trace Search and Analytics? Trace Search and Analytics allows you to explore, graph, and correlate application
performance data using high cardinality attributes.
You can search and filter request traces using key business and application attributes,
such as user IDs or host names or product SKUs so you can quickly pinpoint where performance issues are originating and who's being affected.
Tight integration with data from logs and infrastructure metrics also lets you correlate
these specific trace events to the performance of the underlying infrastructure so you can
resolve the problem quickly.
And don't forget about logging without
limits. Logging Without Limits is the thing where you can cost-effectively process and archive all of your logs, and then later decide on the fly which to index, visualize, or retain for analytics in Datadog. Now you can collect every single log produced by your applications and infrastructure
without having to decide ahead of time which logs will be the most valuable for monitoring, analytics, and troubleshooting.
Datadog is offering our listeners a free 14-day trial with no credit card required.
And as an added bonus for signing up and creating a dashboard, they'll send you a Datadog
t-shirt.
Head to www.datadog.com slash codingblocks to sign up today.
All right.
So it's that time of the show when we ask you,
if you haven't already,
please do go leave us a review.
You know,
we say that it puts a smile on her face and it totally does.
And actually somebody left us reviews.
They left the review to put a smile on her face because we put one on
there.
So,
you know,
that's awesome.
I mean, wasn't that nice? That put a smile on my face. Imagine that.
So, yeah, I mean, if you ever find yourself
and you're in front of your keyboard
or on your phone and you're bored
and you're thinking, hey, you know what?
I meant to leave those guys a review.
Please do.
It makes our day and we really appreciate it.
And it's a nice way to give back.
So thanks.
Yeah.
And also to,
you know,
leave it.
We greatly appreciate the reviews like Alan said,
but you know,
you could also spread the word,
share coding blocks with a friend,
tell,
you know,
let one of your coworkers know about the show.
You know,
expose more people to it.
And it inevitably you're going to say,
yeah,
there's this podcast and somebody's going to look at you sideways and be like, what's a podcast?
And then you can explain to them, oh, it's really not that big of a deal, right?
So, yeah.
It's even a good icebreaker for people that haven't been introduced to the podcasting world.
It's like TiVo for radio.
Wait, what's TiVo?
And then you can turn them on to Serial and Hardcore History and a bunch of other things that they'll love.
So, yeah.
Wait, cereal?
You could definitely be the life of the party.
You're like, oh, God, here comes the podcast guy.
He's going to tell us about Koenigsegg again.
The podcast guy.
I've been that guy.
All right.
Well.
Whoa, excuse me.
Yeah, you got to be ready to do this.
You can't mess this part up.
I can't mess this up like my Lil Jon impersonation.
So this is time for my favorite portion of the show.
Survey says.
All right.
So in episode 94, we asked, what do you value most in a job?
And your choices were: pay, it's all about the Benjamins. Or tech stack, I need to remain interested. Or commute, or the lack thereof; I mean, I love listening to Coding Blocks and all, but a new set of tires every month is ridiculous. Or location, location, location; it's just like real estate. Or team, I need to be surrounded by people better than me so I can grow. Or industry, I want the type of problems I solve to matter to me. Or benefits, I like to take off for the summer. And lastly, work-life balance.
I have a life outside of the office.
All right, I believe Joe went first, I think.
So let's start with you, Alan.
What do you think?
Man, this one's tough because there's a lot of, like, I'd say a lot of these, at least personally, speak to me.
I'm going to go with, for most people,
pay. It's all about the Benjamins.
I'm going to say
30...
35%.
Okay.
I like where your head's at.
For me, I think the location's
going to be the biggest factor. Either
wanting to work from home or having something where I'm going to guess that most people don't want to move unless there's plenty of Benjamins.
But I'm going to say location at 30%.
You both are committed to your choices.
Alan with pay at 35% and Joe with location at 30%.
I have that right?
That's correct.
Drumroll, please.
You're both wrong. We're both wrong, really?
I'm super
surprised, and I'm curious to see
where you're going to come in on this,
Alan, because you were super opinionated, and we're
just dying to spill your
opinion on this at the time when we were doing
episode 94, if you recall.
But, no, work-life balance, far and away, at 30%. You know, everything else was kind of scattered among it, but yeah, it was clearly, you know, the one that walked away with it.
That one I thought would be up there. I didn't think it'd win. And I wonder if that's because everybody's already paid well, so now they're on to the other thing.
Yeah, you listen to this show and you're an A-tier.
So you're already making the bucks.
So yeah, it's about that work-life balance.
If you're listening to this podcast, this two-and-a-half-hour podcast,
while you're on your downtime, while you're commuting or washing dishes or whatever,
then yeah, chances are you're working too much.
Yeah, I don't know, man. For me, commute's a big one, right? Living in the Atlanta area and knowing that you can give half your life away to the gods of the highway.
So that's what we're calling them now? The gods?
Yeah, I guess. Um, yeah, I honestly thought that...
I smite thee!
What was number two? So if work-life balance was number one at 30%, you said.
Yeah, I mean, then from there, tech stack was surprisingly number two at 21%.
Wow.
That's shocking.
Team was close, though, at 20.
Really?
So there wasn't a big difference between those two.
And that one was so hard for me, because I'm like, really? Because how do you judge the team? Unless you know, because they're already friends of yours, or people that you've met or whatnot, and so you kind of already have an idea of who they are. Unless you're just making an assumption, like, oh, it's Google, so I just assume they're going to be, you know, a bunch of smart people, for example. But otherwise, it's like, well, how do you know who the team is, to use that as your determination?
I don't know. I personally thought this was going to be a silly survey. I thought far and away the winner would be pay. It would be, like, 98% pay, and, like, two people are going to make a joke about something else, right? But I was really surprised to see that, you know, pay didn't rank as high. Honestly, you know what, and maybe this is just something about our audience, I'm wondering if a lot of it really is that people are getting paid pretty well, because they've taken the time to invest in their skills by listening to podcasts or reading books or doing courses or just constantly improving.
So that's almost like, yeah, that's coming, right?
Now I need to focus on what else is important to me.
Oh, you know, if you're in our Slack, then you can go to the salary survey channel.
And we've got a little survey.
So you can go ahead and you can already see what people put in there.
You know, it's, of course, all anonymous. So you might be interested in that.
Yeah.
If you're not anything,
what was the lowest though? I have to know.
Yeah.
I'll tell you, but I did want to just finish up this one thought, though. Along that line, you kind of hit on something: maybe it's just our industry, you know? Maybe we're asking the wrong industry. So maybe people within our industry, these are the things that would matter; like, work-life balance may rank higher. But if you were to go across industries, then maybe pay would rank higher. I guess what I'm saying is, if the survey was more general-purpose, across all of the industries, then maybe pay would rank higher, or maybe I wouldn't be surprised.
But to your question, benefits was the last.
And I think that's what you would have said.
That's what you were wanting to say last time, Alan.
Am I wrong?
Was benefits not the…
I don't think so, no. I think for me, it was probably going to be either commute or pay. I was pretty sure it was going to be one of those two.
Oh, really? I swore it was going to be benefits.
Yeah, I mean, the benefits thing, I don't think it's ever anything somebody's, like, striving for, to get the best benefits, although it could make or break the deal when they go to a particular spot, right? So, I don't know.
Yeah. Well, that's really interesting. So, yeah, good to know.
All right. Well, and by the way, you can see that whole pie chart by going to the show notes and voting. Then you'll be able to see it, actually, right after you vote.
Episode 94?
Oh, for that one, yeah, that was episode 94. And all surveys that we do, once you take the survey, you see the results of it, so you don't have to wait so long to see at least a few results. Although it's much funner to hear it.
Sure. All right.
So, I was at DataSciCon last week, and, obviously, there were all kinds of amazing talks, but there was one little thing where I was like, oh, this is going to be so fun, I've got to save this for the show. So, a little fun, quick game that I wanted to play with you two, just to talk about the magnitude of data. This comes from, well, I'm not going to tell you the source yet, because I know Alan's already at the keyboard. He's going to go look for it.
I would never.
What? I just saw him reach for his mouse.
But so, if I were to ask you how much data is generated every minute. All right, and this is as it relates to 2017.
How much data was generated every minute?
Let's talk about first.
What would be a big one?
Okay, here's a good one.
Let's pick Amazon.
How much money in sales did Amazon make every minute for 2017?
Golly.
Oh, it's going to make you sick.
$100 million.
$100 million every minute.
In sales.
We're not talking about revenue.
We're talking about sales.
Yeah.
Joe?
A million a minute.
Wow.
Jeff Bezos would really like you guys' numbers. It was $258,751.90.
I way overblew this one.
I figured the costs were high.
You guys overshot it by a multiple of four.
Everyone at Data Science Con
is shaking their head right now.
It's like, you guys.
If the rest of these are going to go like this, you're going to be depressed by every answer I give.
So I'm just going to go ahead and throw that out there.
All right.
We're going to shoot further down.
Okay.
800 petabytes.
Okay.
No, sorry.
Go ahead.
How many tweets per minute did users send on Twitter?
God.
$400 per minute.
Wait, what was the question?
750,000 tweets per minute.
750,000?
Per minute.
One million.
456,000 tweets per minute
went out across Twitter.
That is pretty crazy.
That's a lot. I understand tweets. I don't across Twitter. That is pretty crazy. That's a lot.
I understand tweets. I don't understand dollars.
Here's a couple of interesting ones.
What about
how many spam emails
do you think were sent
every minute of every
day in 2017?
750,000.
Is that going to be your answer from now on?
How many spam emails per minute?
10 million.
No, actually, I'm going to say 100 million spam emails per minute.
Absolutely. I at least get a million a day.
Well, I'm going to put it to you like this, and I'm going to break your heart, because he is way closer than you were. Really, way, by orders of magnitude, closer. It was 103,447,520 spam emails sent every minute of every day for the year 2017.
Dude, that's ridiculous.
How is the internet even running?
Because that server's relaying that stuff.
Let me put this into context for you.
This is really going to make you feel depressed about the use of the internet. For Google searches conducted, there were 3,607,080 Google searches for every minute of 2017.
Wow.
And it was a factor of 30. There were 100 million more email spams being sent than there were Google searches.
Good Lord.
That's crazy, right?
That's ridiculous.
Let me tell you, if they use my definition of spam email, which is just about anything that wasn't sent from a human to a human, then it's probably way higher.
So here's one that I didn't expect to see.
How many forecast requests do you think that the weather channel received?
Ooh, just for, total for 2017?
For every minute. We're doing every minute. Every one of these questions is going to be relative to a minute.
500,000.
500,000? Two mil.
I gave you guys a hint by saying that I wasn't expecting this one. It was 18,055,555 requests every minute.
Jeez, man.
I got three devices on my desk on right now.
Four that are all probably checking the weather right now.
Right.
All right. Let me, I don't want to go through every one of these. I'm going to, like, pull out a couple more. So I've got three last ones that I want to say.
All right.
YouTube videos watched per minute.
In 2017?
Yep.
Watched per minute.
5 million.
400.
1 million.
Joe wins.
400 was the correct answer.
Yes.
No, you're pretty close, Alan.
4.1 million.
Nice.
Yeah.
And that number is just going up. Yeah, sorry, $400 is how much they pay
out per year to, uh, people who post their videos there. What about, what about, uh, text messages sent? 10 million per minute.
10 mil.
Oh, you're both going with 10 million.
Okay.
I like it.
15 million text messages per minute.
Wow.
I'm rounding down because there was some more to that number.
Last one that I'll say, I'll include a link to this in the show notes, but the last one
that I've got here is, how much internet data did Americans use?
Per minute or like total?
No, every one of these is per minute for every day of 2017.
What was the question?
How much? How much internet data did Americans use every minute of every day of 2017?
I don't know the number.
It's like 250 Potterbytes.
Petabytes.
Potterbytes.
No, I'm the one after that.
Oh, the Harry Potterbytes.
The Harry Potterbytes.
I'm going to go with 10 terabytes.
Okay, so they put the number in terms of gigabytes, and so that's how I'm going to read the number.
It would be 2,657,700 gigabytes every minute of every day.
That's what I said.
I know, Joe, but I couldn't give it to you like that.
I had to take some of the steam away from you.
Yeah, the translation is magical.
That's crazy, though.
That's crazy.
Right?
Yeah, I thought you guys might enjoy that a little bit.
So, again, there'll be a link to that in the show notes so, um, you know, the listeners can play along at home. Hey, so you said that was two million
gigabytes, right? Yes. That's two petabytes is what that is. Oh yeah, Google translate. So just
in case you wondered, the next one after it is not the Harry Potterbyte. It's the pebibyte. Never heard of it.
So the Pebibyte.
Yeah, because I should have thought about that.
Yeah, you're right.
Yeah.
So next time you're mad at your cable company or, you know,
wherever you're getting internet access,
just remember what they're putting up with.
Oh, you know.
You're paying for that, man.
You get all of it.
Oh, yeah.
I'm shilling.
They're sliding me them dollars under the table for
repping.
All right, so
we'll wrap this up by saying that
today's survey
will be, we're heading into the holiday season,
so you got to start making some
important decisions here. Namely,
how do you plan
to spend your time off
this holiday season? And your choices are
spending time with the family because the holidays are all about the memories. Or
I'm not avoiding the family. I'm building my next great project. Or escaping the family and the keyboard?
Or lastly, wait, what time off?
Have you guys got in your minds what the answers are going to be?
You can't ever do that.
We're not going to tell it.
Just wondering.
All right.
Well, I know my family's not listening.
So I feel like it's okay for me to say that I plan on spending a lot of time with my computer.
Because the computer's always listening, by the way.
So you should never talk bad about it.
That's right.
Hey, you guys want to hear a roof joke?
Okay.
The first one's on the house.
Oh, God.
Oh, yeah. That's right. There were some jokes
that we had, too.
That was really awful. That's great.
What do you call a pile of
kittens?
I don't know. Perfection?
A meowton.
Oh, boy.
Here's one
why are teddy bears never hungry
because they're always stuffed
oh boy
well in the interest of being done
before three hours this episode
move along
these are ones that you can share with your family
they're safe
This episode is brought to you by Manning.
Now, I just purchased the physical book, Kafka in Action, from Manning.com,
which is really nice because it also lets me access the e-book immediately.
And I saved $18 because I used the code CODBLOCK40.
And best of both worlds because I get the physical and the digital.
But they don't just have books. I also recently watched, again, Zach Braddy's React in Motion course, which is really great. And he's a really funny and smart guy. So that was really awesome.
And they have this really cool feature that I don't think I've seen anywhere else,
where as you're watching the video, it actually, you can highlight the text. So you can see the
words as you're listening and watching what's going on on the screen there. And it works even if you speed
up the video, which is something I definitely do. Manning is running a special promotion this
December. The countdown to 2019 will run on manning.com all the way through the end of December.
Answer just a single question every day and you'll be in the running to win free eBooks,
videos, and even a whole year's worth of new releases. Plus every week, everyone will get
to enjoy massive discounts on Manning products. All you need is to sign up on Manning's deal of
the day at manning.com slash mail dash preferences, and you're good to go. Again, that's www.manning.com slash mail dash preferences
to sign up. And be like Joe. While you're at manning.com, take a moment, shop around,
find the next great book that you want to read and use the code C-O-D-B-L-O-C-K-40 to save 40%. All right, so now we're going to talk about dictionaries.
And much like the hash tables, actually, they're pretty much the same thing as hash tables.
The main differentiator being that with hash tables, in the Wikipedia computer science
definition of the hash table data structure, the hash table holds key-value pairs and those values
are untyped.
So a dictionary is the same thing,
except we know what type it is.
And so there are a couple little shortcuts
that we can take
because we know the size of that ahead of time.
And like we mentioned earlier,
when we talked about hash tables
and C-sharp and Java,
there's no good reason that I'm aware of
to use one of those untyped data structures when you have a
strongly typed option available. If you're working in something like JavaScript, you don't really
have this sort of differentiation. It doesn't make sense because it's a loosely typed language.
But in something like C Sharp or, you know, I don't know what C++ has available here,
but anything like that, you're going to want to use that strongly typed option if you can get away with it.
And you pretty much always can.
I don't think I've ever used a hash table in C Sharp.
What about you?
Ever?
Oh, definitely.
Yeah, back in the C Sharp, I think it was before 2.0 days.
Yeah, they didn't have dictionaries.
They didn't have the typed ones.
So all you had were hash tables back in the day.
Yeah, and you just cast it on the way out.
Yep.
Yeah, that stinks.
Yeah, and so they even look really similar, you know,
and in C Sharp you would do like a hash table,
or var myThing equals new Hashtable, no types there.
Dictionary, we've got generics, which we mentioned.
And if you're not familiar with generics,
basically you've got these cool angle brackets where you specify the types.
So you say a dictionary, and my key is, say, an integer,
and my value is a string.
And you can change those items in the brackets.
So it's the same class underneath.
It's dictionary, but you're changing the strongly typed arguments
that can go inside of it.
So we could say a dictionary with a string is the key,
and a person is the value.
And you can get even crazier.
You don't have to use primitives for that first type.
You could say my dictionary has keys of a person,
and the values are strings.
So kind of interesting.
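[Show notes: a rough sketch of the typed-versus-untyped distinction being described. It's written in Java rather than C#, since the same split exists there (a raw Hashtable versus a generic Map); the names and data are made up for illustration.]

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class TypedVsUntyped {
    public static void main(String[] args) {
        // Untyped: a raw Hashtable accepts any Object for keys and values,
        // so the compiler can't catch mistakes for you.
        Hashtable raw = new Hashtable();
        raw.put("one", 1);
        raw.put(2, "two"); // compiles fine, even though the types are mixed

        // Typed: the angle brackets (generics) pin down the key and value
        // types at compile time -- same idea as Dictionary<int, string>.
        Map<Integer, String> byId = new HashMap<>();
        byId.put(1, "Abigail");
        // byId.put("oops", 2); // would not compile

        // Keys don't have to be primitives/wrappers; any type with sane
        // hashCode()/equals() works, e.g. Map<Person, String>.
        System.out.println(byId.get(1)); // Abigail
    }
}
```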
I don't think I've ever used a complex object as a key before
I don't know. Is that... well, I mean, would you call a string a complex object, though?
Uh, no. I mean, it's not a primitive, so I guess, I guess it is. I definitely use strings there. I was
just trying to think, like, a customer object or something like that. I think I have. I'm pretty sure I have. I can't think off the top of my head why, but something like that is kind of where I'm thinking.
But yeah, if you had something like a customer object or an order object, right,
then maybe you had something that you were trying to associate with it in your dictionary.
So if I've got like a sort of code that takes in a bunch of customers and it counts up the number of orders, then I might have a hash table.
And as I loop through the orders, I might say, okay, this customer, if I've seen him before, just go ahead and add to the number of orders.
If I haven't seen him, go ahead and initialize that spot in that, the dictionary with
a value of one for that order. And so as I loop through the orders at the end, I'll end up with
something that I can say, Hey, dictionary at a customer, Abigail, they have three orders.
That's going to be a fast lookup, which is really nice. And so in that case, I guess it is nice to
be able to use that complex object. So I can just say, just, just take the whole customer object.
And then that way I don't have to have have a separate data structure where I say, okay, the dictionary has the string
Abigail that uniquely identifies this customer. Then I go over to some other dictionary and then
look up the object based on the key of Abigail there again. So that would be pretty awkward.
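[Show notes: the order-counting idea described here might look something like this in Java; the OrderCounts name and the customer data are hypothetical, just for illustration.]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderCounts {
    // Count orders per customer: one pass over the orders, with an O(1)
    // dictionary lookup for each one.
    static Map<String, Integer> countOrders(List<String> orders) {
        Map<String, Integer> counts = new HashMap<>();
        for (String customer : orders) {
            // merge() initializes the slot to 1 the first time we see a
            // customer, and adds 1 on every subsequent sighting.
            counts.merge(customer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countOrders(List.of("Abigail", "Alan", "Abigail", "Abigail"));
        System.out.println(counts.get("Abigail")); // 3
    }
}
```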
So I guess it's really nice. I've just never really thought to do that. So I'm kind of curious, because the way you started with, like, the hash table versus the dictionary, right?
And it was about, like, specific implementation, though.
Like, specifically in, like, a C-sharp implementation, right, where, like, hash tables, you know, are based on object and dictionary are based on generics, right? But what about – well, for example, in the imposter's handbook, right?
Like he says the specific difference was that the dictionary, I don't know how, but is able to guarantee that it's going to have a unique index.
Each item is going to have a unique index, so therefore it's O of 1.
That was the performance gain of the dictionary over the hash table.
So I can tell you a little bit about that.
From what I've read, specifically in C Sharp, that is the case.
And I'm not too well-versed in the whole underlying implementation, but I do know
that underneath the covers, there are
differences with these two data structures.
Wikipedia definition-wise,
the only difference between a dictionary
and a hash table is that a dictionary is
strongly typed and you should use it. Implementation-wise,
if you look under the hood in the CLR
for C Sharp,
then you can just read about it.
That's the way to really do it. We'll have some links there.
The hash tables, that's the untyped versions,
they use a rehashing scheme.
And so we kind of talked about a couple of those earlier where it basically says, okay, the spot's full.
Let me try rehashing it and find another spot.
And I keep doing that until I find an open spot.
And then I plunk my item in.
And the dictionary uses chaining,
which is like the linked list idea
where we say, okay,
we've already come to this bucket before.
Let's go ahead and chain the items.
However, under the covers,
C sharp for whatever reason,
I've never really found a good reason
why they did this.
But in the way they implemented
their hash table,
they made certain
to have an underlying data structure
that had one bucket
for every item in your hash table.
And so they end up more or less guaranteeing that you always have a small number of items in that list.
Does that make sense?
So if you have 10 items in your hash table, you've got 10 buckets. If you have 1,000, then they give you 1,000 buckets and they hash your key down to one of those X number of buckets.
And so it keeps the ratio really good or close to one, I guess.
So the number of available buckets is always equal to the number of items that you're storing. I mean, where I was kind of getting tripped up mentally in my own head was just that in
that the book that we referenced, The Imposter's Handbook, there wasn't a lot of detail about
that implementation of it.
So I was like, well, that's a bold claim.
I mean, I'm not saying that it's wrong, but that the dictionary has a unique key for accessing any given value.
According to the book, there's not going to be a collision in the dictionary.
So conversations about like chaining, separate chaining or open addressing or rehashing are moot because there's not going to be a collision. So what I read was basically that the length of the list that it uses to store the collisions
is never going to exceed the bucket size because the buckets are always guaranteed to have the same number of buckets as you have elements.
So it ends up basically keeping a good ratio, a load factor I think they called it,
of, uh, nodes to buckets, that ends up, uh, you know, mathy, mathy hand-wavy stuff, ends up keeping
the ratio really low. And so, am I saying this right?
You can never have a longer list than you have buckets.
Hmm.
And I'm going to do a terrible job of explaining this because it's way over my head and very complicated.
But that's kind of the secret there of how they end up guaranteeing performance for the dictionary.
They basically have a specialized algorithm where they keep a good ratio of the length of the list and the number
of buckets. But as for how they do that, that's way over my head. So here's something interesting
on stack overflow about the collisions with dictionaries and the fact that how they're
handled. Basically, it uses GetHashCode. So the answer is basically saying, you know, don't assume. You could even do, like, an IL peek on these types of things and look at what the implementation is.
But don't think that that's what it's always going to be, right?
Like they could change it in the next version of .NET how they implement it. But they're actually saying that if you have an object that's being put in there, the GetHashCode method must return the same value for the lifetime of the object.
Right.
So that's basically how they're ensuring that these things go to the right place.
So the Equals and the GetHashCode are the important parts of this.
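[Show notes: the same contract exists in Java -- a key's hashCode() has to stay stable for the object's lifetime and agree with equals(). A hypothetical Customer key might look like this:]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CustomerKey {
    // A hypothetical immutable key type. Immutability matters: if the fields
    // feeding hashCode() changed after insertion, the map could no longer
    // find the entry in the bucket it was originally hashed into.
    static final class Customer {
        final String id;
        Customer(String id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof Customer && ((Customer) o).id.equals(id);
        }
        // Must return the same value for the lifetime of the object,
        // and equal objects must return equal hash codes.
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<Customer, Integer> orders = new HashMap<>();
        orders.put(new Customer("abigail"), 3);
        // A *different* instance with the same id still finds the entry,
        // because equals()/hashCode() are what define key identity.
        System.out.println(orders.get(new Customer("abigail"))); // 3
    }
}
```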
And so when a collision happens,
we store more than one element in the bucket,
which affects the time we need to look it up because now we need to traverse this linked list
in order to find our item.
Now, in order to make that look up fast,
they keep the ratio of the number of the items in the list
and the number of the buckets in the container low.
That's the magical mathy part that I just don't understand.
I've looked at the algorithm a couple of times
and read a couple of articles
and I don't understand why that ratio is important
or why it guarantees things are fast,
but I did not see anything that said
that you wouldn't have a list.
Like in fact, I did read a lot
about this special kind of load factor
that definitely refers to the length of the list in comparison to the buckets.
Wait, what were the nodes again?
So I definitely.
What were the nodes again?
You had the nodes and the buckets.
It was a ratio of the nodes to the buckets.
But what were the nodes?
Load factor.
The node is the load factor?
The load factor.
Well, the load factor is the ratio of the nodes to buckets.
It's the ratio of the length of the list to the number of buckets.
Oh, okay.
So if you want a one-to-one, right, is what you want.
I got you.
Yeah, yeah, what Alan's saying.
Either that or you want it to be uniform
is what we're saying.
So if it was like going back
to the hash table definition
that I gave earlier, right?
Like if the underlying structure was an array,
then at each index in that array,
you want it to be a uniform thing
that are in each of those index.
So each one of those indexes is a bucket, right?
And if it was like a separate chaining,
then you have a linked list of buckets per index in that array, right?
So you want the same number of buckets being pointed to by each index in the array
so that it's balanced, right?
So the load factor should be close to one.
And what I read is that the way that they kind of manage it and the way that they do
the chaining is that it was so that they add a new bucket every time you get a new item
and they do this in order to keep that ratio positive.
As to how they guarantee things get split up correctly, I just don't, I plain flat out
don't get it.
I don't understand what adding more buckets has to do with making things faster other than it improves this load factor number because the number of the buckets is now getting higher as the potential list of the length of the list is getting higher.
So it keeps that ratio really, I guess, at worst one.
But I don't understand why that's desirable.
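[Show notes: for what it's worth, Java's HashMap exposes this knob directly. Per its documentation, the table resizes, doubling the bucket count, once the entry count exceeds capacity times the load factor, which is what keeps the average chain length short. A sketch of that threshold arithmetic, using the documented defaults:]

```java
public class LoadFactorSketch {
    public static void main(String[] args) {
        int buckets = 16;          // HashMap's default initial capacity
        float loadFactor = 0.75f;  // HashMap's default load factor

        // The map resizes once size exceeds buckets * loadFactor, so with
        // the defaults a 16-bucket table rehashes after the 12th entry.
        int threshold = (int) (buckets * loadFactor);
        System.out.println(threshold); // 12

        // Doubling the buckets keeps entries-per-bucket roughly constant,
        // which is why lookups stay O(1) on average even as the map grows.
        int afterResize = buckets * 2;
        System.out.println((int) (afterResize * loadFactor)); // 24
    }
}
```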
We'll have some links in the show, though.
I mean, I don't think it's something we're going to figure out because it's definitely pretty hairy in there.
There are some really nice articles on it.
And if you understand it, Chumac84, I'm looking at you.
I could definitely use your help on trying to figure out why.
Because what I'm thinking is there's a decision that Microsoft made here.
And so we've got hash tables up here and dictionaries.
And we're choosing to implement dictionaries, the one that came out later differently. So either that's because
they figure out the chaining is actually better, but they didn't want to change the original
implementation for backwards compatibility reasons. Or they said, you know what, because
maps are sorry, the dictionaries are strongly typed. We can do a little bit of extra mojo
that helps us balance things better.
And,
you know,
maybe allocate stuff in a different way.
Like maybe that's why they can create those buckets of a uniform size and
know it's going to be like,
I don't know the answer to that.
That's just me speculating.
I never did find authoritative sources that said,
Hey,
dictionary does it differently because it's better.
I mean,
they could,
I am curious though.
Cause like one of these has to be wrong, right?
Because going back to the book, it was saying that it says the dictionary is exactly like a hash table, except it has a unique key for accessing a given value.
And so collisions are not something you have to worry about with a dictionary. Right.
But yet, you know, to your point, like the documentation from Microsoft says, you know, it talks about employing alternate collision resolution strategies.
And even I questioned, like, how could you guarantee a unique key?
Like, how does that how could that possibly work? So my inclination is to think like, well, I guess the handbook might be wrong about that part or maybe I'm misunderstanding that part.
But then on the other little, you know, I got like a little devil on each shoulder, right? And so then we're saying like, well, I mean, when Microsoft decided in.NET 2.0 to create the dictionary class, they could have just said, well, we already used the hash table class name.
So we already have a class named hash table, and we want to have something very similar to it.
So I guess we'll call it dictionary, but maybe they really implemented a hash table behind the scenes.
You know what I'm saying? Just similar to how like in Java, they started out with the hash table and then they
decided, I don't remember the version, to create a better version of it and they call it the hash
map, right? Well, I guess thinking out loud a little bit though, if you add keys, like if you
newed up a dictionary and you try and add a duplicate key, if it's a number, it'll yell
at you, right? Like it's like, no, you can't do that.
If it's an object, though, it would have to rely on the get hash code.
So I don't know.
Maybe you don't have collisions for primitive type keys.
I'm not sure.
Did you see anything on there?
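[Show notes: whether a duplicate key "yells at you" turns out to be API-specific. C#'s Dictionary.Add throws on a duplicate key, while the indexer assignment silently overwrites; Java's HashMap.put quietly replaces the old value. A quick Java illustration:]

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeys {
    public static void main(String[] args) {
        Map<Integer, String> m = new HashMap<>();
        m.put(42, "first");

        // put() on an existing key doesn't error; it overwrites the old
        // value and returns it. (C#'s Dictionary.Add would throw here.)
        String previous = m.put(42, "second");
        System.out.println(previous);   // first
        System.out.println(m.get(42));  // second
        System.out.println(m.size());   // 1 -- still only one entry per key

        // putIfAbsent() is the "don't clobber the existing value" variant.
        m.putIfAbsent(42, "third");
        System.out.println(m.get(42));  // second
    }
}
```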
The computer sciency, though, definition.
Well, if we're taking the way it's written here in the handbook as the, quote, computer-sciencey way, right, where there's not a collision, right? I mean, then it's
not... yeah, I don't know, man. I'm taking this thing at face value, but maybe that part is
not accurate. I mean, looking at the Microsoft docs, it doesn't sound like it. And, I mean, I don't know, but
i don't know how to make that. I'm trying to decide
like what do I make of that?
Are they talking about specifically from the name of their
class, right, and how they implemented
their class or are they talking about like
here's how the computer science-y
definition of a dictionary.
Yeah. Anyway.
My takeaway is that the Microsoft
documentation like clearly
says like, hey, we do chaining for dictionaries
and we do re-hashing for the hashes.
So everything I know about chaining
says that it's using a linked list underneath.
So maybe they're just getting a little loosey-goosey
about what they define it to be a collision.
Maybe they're saying it's not a collision
because no matter what,
you're always getting a pointer to a linked list back.
And so you don't really know if you're colliding or not.
You basically get a pointer and you throw it on there.
And so when you go to look stuff up, then I'm speculating again,
but I don't really know why there would be discrepancy between the book and the documentation.
But I have no reason to think that the documentation is wrong.
There's several articles and stuff that I read specifically on that keeping of the list.
Yep.
So, I don't know.
But it's really interesting. I think like any
language that you use, like if you take like a
hard look at the data structures, like even ones
you think you know, like arrays, for example,
you're going to find some really
interesting stuff underneath the covers.
Might I point
you to float?
Yep, float. And you know, JavaScript has objects, but they're basically like hash tables, so, you know, that's something. Or
also referred to as associative arrays. Do you guys see anything about, uh, associative arrays
and how they differ or are the same? Is it just a synonym for hash?
when I searched for just, you know, if you go to associative array on wiki, so en.wikipedia.org
slash wiki slash associative underscore array, the dictionary type redirects there,
associative container redirects there, map redirects there.
So there's several
other terms that are very much in line
with what we're talking about, right?
So an associative array slash dictionary
basically the same thing.
Yep.
Yeah, going back to the
handbook for a moment,
you know, we've had all this conversation
about arrays in JavaScript,
right.
And things like that.
And it was actually saying like,
okay,
well now that you know about this,
actually arrays under the covers are just a dictionary in JavaScript.
Yeah.
Sometimes.
Exactly.
Until it's not.
Yeah.
And then it's like an associative array,
which is a dictionary.
Right.
Yeah.
So as for the pros,
it's basically the same as the hash table
except in static languages
you can be a little bit
more efficient
because there's no
boxing necessary.
We did episode way back
on boxing, episode two.
So operations are safer
and the errors are
caught at compile time
and perhaps maybe there are some performance gains to be had because, you know, presumably you know the size of the objects that you're storing as keys and also as values.
So maybe there's something there, although I don't have an authoritative source on that.
Cons are the same as the hash too.
And there's something about a class resolution strategy that I don't know about.
Who put that in there?
Was that you, Outlaw?
The class resolution strategy?
What?
Yeah.
I didn't.
Okay.
That was some sort of –
Maybe that was supposed to be like different conflict or collision resolution strategies
and class got written instead?
I think maybe we had a collision on the Google Docs document
and we have an errant line posted here.
It's moving right along.
When to use this?
Basically, it's the same as a hash table.
So whenever you need a hash-like data structure,
but you want that type safety
and you're working in a language that has it,
you need those fast inserts, those fast deletes, fast lookups.
And another case I mentioned there was like if you have sparse data, so you don't necessarily want to pre-allocate the whole universe of what you might use if you know you're only going to be using a small percentage.
And this is a really memory-efficient data structure that's still going to give you fast random access.
Yeah, I mean, kind of going back to my summary of the hash table, though,
like when to use the dictionary is like always.
Almost always, yeah.
You should prefer the dictionary over the hash table would be my opinion,
at least in like a language like C Sharp, for example.
And one thing I don't think we touched on earlier is that hashes and dictionaries don't
generally support ordering.
So if you add the keys, Abigail, Alan, and Brad, and then you, uh, and outlaw don't leave
you out.
And then you say, okay, um, get me the keys loop over.
I might get outlaw first and then Brad and then Abigail and then Alan, there's no type
of guarantee built into that data structure definition on what you're going to
get returned. Now, some languages, like, in particular, C Sharp has, like, OrderedDictionary
and an IOrderedDictionary interface, in which case they basically end up storing, like, some sort of
list or array data structure that keeps those things in the order that you sort them or that you
insert them in so they can return those to you. But that's gonna be a specialized form of this
data structure.
But I just wanted to call it out just to kind of highlight the fact that there are a lot
of variants that you might find in different languages or for very specific purposes that
are going to be similar to a hash table or dictionary that are going to be just a little
bit different for whatever reason.
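[Show notes: the same variants exist in Java, which makes the point concrete. HashMap makes no ordering promise, LinkedHashMap threads a linked list through the entries to preserve insertion order, and TreeMap keeps keys sorted. A small sketch, using the names from the example:]

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrdering {
    public static void main(String[] args) {
        // Plain HashMap: iteration order is unspecified and may change.
        Map<String, Integer> plain = new HashMap<>();

        // LinkedHashMap: same O(1) hashing, plus a doubly linked list
        // threaded through the entries to remember insertion order.
        Map<String, Integer> insertion = new LinkedHashMap<>();

        // TreeMap: keys kept sorted (a red-black tree, so O(log n) ops).
        Map<String, Integer> sorted = new TreeMap<>();

        for (String name : new String[] {"Outlaw", "Brad", "Abigail", "Alan"}) {
            plain.put(name, 1);
            insertion.put(name, 1);
            sorted.put(name, 1);
        }

        System.out.println(insertion.keySet()); // [Outlaw, Brad, Abigail, Alan]
        System.out.println(sorted.keySet());    // [Abigail, Alan, Brad, Outlaw]
    }
}
```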
Yeah.
So, you know, I have a summary here of like the dictionary versus the hash table as it relates to C Sharp.
And much like Joe already said, the dictionaries are strongly typed, whereas the hash tables aren't.
So I've got an example of how you could get yourself into trouble with the hash table by mixing those types.
And in hash table, it would be perfectly valid code.
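[Show notes: the kind of trouble being described might look like this in Java, as a rough analogue of the C# situation; the names here are made up. The bad put() compiles against the untyped container, and the mistake only surfaces as a runtime cast failure on the way out:]

```java
import java.util.Hashtable;

public class MixedTypesGotcha {
    public static void main(String[] args) {
        // A raw (untyped) Hashtable happily stores whatever you give it.
        Hashtable grabBag = new Hashtable();
        grabBag.put("count", 7);
        grabBag.put("name", "Abigail");

        // The trouble only shows up later, when you cast on the way out.
        Integer count = (Integer) grabBag.get("count"); // fine

        try {
            Integer oops = (Integer) grabBag.get("name"); // compiles...
            System.out.println(oops);
        } catch (ClassCastException e) {
            // ...but blows up at runtime. A generic Map<String, Integer>
            // would have rejected the bad put() at compile time instead.
            System.out.println("runtime cast failure");
        }
    }
}
```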
Well, it would compile, it could work, but whether or not the validity of it would be what you actually want, we could argue. But, um, yeah, so the dictionaries in C Sharp are much like the Java hash table, where
you would have the key and value types, you know, as generics. Hey, I just had a thought.
What if you were trying to truly just store a collection of random things? You know, you didn't care
about strong typing. That's a good point. That's a good time that you might use the hash table over
the dictionary. Oh yeah, sure, right, where you want to have a mix. Kind of like in the last, maybe it was
the last episode, where I'd mentioned where you might have the array of pointers and each pointer
pointed to something different. Yeah, something like that. So a hash table might work there really well. Yeah. An example I like there is if you have an array and you want to
say, get the unique items out of it. One strategy for doing that is to loop through that array and
throw those items into a hash table. And whenever there's a collision or rather you try to store the
same key twice, then you might say, okay, this item's been duplicated. Either kick it out or do something special with it.
But in that case, all you really care about are the keys.
So you're throwing this thing into a hash table or a dictionary,
but the value is what, zero?
Is it true?
It doesn't matter because all you care about are the keys.
So that's kind of an interesting case there.
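[Show notes: that dedup trick, sketched in Java. When only the keys matter, a HashSet -- essentially a hash map without values -- is the idiomatic form; the UniqueItems name is hypothetical.]

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueItems {
    // Return the unique items, preserving first-seen order.
    static List<String> unique(List<String> items) {
        Set<String> seen = new HashSet<>(); // only keys, no values needed
        List<String> result = new ArrayList<>();
        for (String item : items) {
            // add() returns false when the key is already present --
            // that duplicate-key check is what does the dedup work.
            if (seen.add(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unique(List.of("a", "b", "a", "c", "b")));
        // [a, b, c]
    }
}
```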
Yep.
Here was another, just going back to this
Java versus C Sharp
the dictionary in C Sharp versus the hash table
in Java both of them being typed
it was kind of curious
that in Java the hash table actually
extends dictionary. Well, it's hash map,
right, in Java? No, no, no, wait, I'm not
talking about hash map. I'm talking about hash table,
if you look at the class.
You want to get confusing now, right?
Because our whole buildup here is we've used hash table to build up to our understanding of what dictionary is.
But in Java, hash table extends dictionary.
I just tweeted at Larry Ellison.
I said, yo.
What's up with this?
What gives? Wikipedia says the dictionary comes after;
the dictionary
is strongly typed.
Coding blocks on that.
Yeah, that's
right. Send your tweets to
Larry Ellison.
It's actually up on the Oracle doc, so
you're absolutely right.
The dictionary extends object
and the hash table extends dictionary to your
point.
Yes.
You should in Java prefer hash map over a hash table.
And I forget why there was like some performance optimization,
um,
that was made.
I'm not enough in the Java world to be able to like speak to that with,
uh,
any kind of authority.
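[Show notes: that hierarchy is easy to verify with a line of reflection. java.util.Hashtable really does extend the abstract java.util.Dictionary class, while HashMap -- the one generally preferred, partly because it skips Hashtable's per-method synchronization -- extends AbstractMap instead:]

```java
import java.util.Dictionary;
import java.util.HashMap;
import java.util.Hashtable;

public class Hierarchy {
    public static void main(String[] args) {
        // Hashtable's direct superclass is the abstract Dictionary class.
        System.out.println(Hashtable.class.getSuperclass().getSimpleName());
        // Dictionary

        // HashMap, the newer unsynchronized replacement, is not a Dictionary.
        System.out.println(
            Dictionary.class.isAssignableFrom(HashMap.class)); // false
        System.out.println(HashMap.class.getSuperclass().getSimpleName());
        // AbstractMap
    }
}
```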
But, uh, last point I wanted to add here though,
in this hash table versus dictionary kind of wrap up is that the JavaScript
object we talked about already could be used like a dictionary.
You can use it like an associative array.
Yep.
And I do many,
many times.
We all do,
especially for caching.
We abuse it.
Yeah, all the time.
Would you consider that an abuse of JavaScript?
No, that's a good use of JavaScript.
No, that's just what we do.
That's the appropriate use of JavaScript.
Appropriate use of JavaScript is to abuse it.
That's right.
Last episode, we talked about arrays and similars.
You know, I'm not going to be
implementing my own array, but for linked lists, like, I don't have a problem implementing my
own version of a linked list over using, like, C Sharp's built-in one. Now, I know about the
built-in one, I'll probably use it, but it doesn't bother me to think about somebody recreating a
linked list, or even a stack. Like, you can really easily make your own stack. So if you don't use
the built-in one, fine. JavaScript? Fine, just use an array and, you know, push, pop, whatever. You've got a stack there. Good enough. However,
hash table is one that I will not be writing on my own because I want that really good distribution
of those keys. And that's not something I'm confident in my abilities to do without spending
a lot of time researching something that I'm really not interested in knowing that much about.
So, yeah, we've got tons
and tons of resources that we'll link to
here in the show notes.
I mean, we've got a lot.
And with that,
it is time for my
favorite part of the show. It's the tip
of the week.
Alright, and I'll start us off.
We have a Slack. We talk about it all the time.
Go to codingblocks.net slash slack and you can send yourself an invite.
Hop on in there.
This is my favorite.
It's a lot of fun.
That's right.
One thing I'll tell you, though, is because we have a lot of people, a lot of active conversations going on, we have our data truncated by Slack all the time.
So you'll see a lot of channels that sometimes are just empty.
And so you might join an empty channel and think it's dead.
But really, it was really active two weeks ago.
And just three days have gone by and the Slack Langoliers have eaten it.
So one of those channels that sometimes goes dormant.
Yeah, you like that, Langoliers?
Yeah, okay, Langoliers.
I thought you mispronounced it or pronounced it some other way.
I'm like, wait a minute.
No, I thought it was pronounced Langoliers.
Am I saying it wrong this whole time?
Stephen King. He still hasn't
answered my tweet.
Anyway.
We have these channels that sometimes look
dormant, but sometimes they wake
up with a passion.
And one of those is hashtag
pet dash pictures.
It's lit right now.
So Jacob started the fire.
He posted a picture of his pupper laying on his back.
Arlene sent, uh, oh my gosh, hold on. Sorry, I started watching it.
Uh, Arlene posted a picture of a cat fetching a fork, which is ridiculous. Ridiculously cute.
And it sounds like Critter had his cat on a trampoline.
Anyway, it just went on and on and on. So the real tip is you should
join the Slack. And specifically, you
should join that Pet Pictures channel
because it's lit right now. It's awesome.
There's so many cute animals
in there that I'm dying. But so far, they're only
cats and dogs. And I know
you out there,
dear listener, someone has a cute
lizard or a squirrel or something, and I want to
see it and think about getting it while I'm working, because we all need that positivity
this time of year.
But wait, wait. Logan posted a picture of a dog that looks very angry. That was
the best picture he could get of it. It's an all-black dog.
I love this channel.
He's talking about how it's hard to get a good picture of it.
So, yeah, look at the teeth on that dog. Yeah, it doesn't look happy.
It looks kind of like your cat.
Whoa, talking about the man's dog? Come on. I'm not talking smack. It just looks
really unhappy. You need to pet your dog, Logan. Pet your dog.
Oh, boy. All right. Well, I'm going to
go warn them in the pet pictures channel about this episode coming out.
Like, Logan, listen, your dog comes up.
All right.
I'm sorry to say.
I guess.
Am I next?
Yeah, I'm next.
All right.
So I don't know how I've been on the interwebs as long as I have and been as interested in big data as I have.
As a matter of fact, I got a question or we all got a question.
I think I'm the only person that responded that was, you know, what are your current interests?
You know, like we know, we hear that you're C-sharp guys or whatever.
And so I posted some of my interests up there.
And a lot of them revolve around this whole big data and just data in general, right?
I'd never heard of Grafana.
You guys?
Sounds familiar.
A little bit.
Just because I was looking at the search talk I did a while back.
That was a really good example of visualizations based on time series type stuff.
So I know that it goes hand-in-hand with Prometheus.
It's the only time I've ever really seen it.
Yeah.
Yeah.
So this thing is awesome. It's an open source package that you can hook up to up to 51 different types of data sources.
51.
That's like a lot.
Yeah.
And you can literally just build dashboards on the fly.
So if you've got some time series data or something, you want to be able to visualize it,
hook this thing up to your Kafka or your Elasticsearch or whatever you want.
Like he said, Prometheus, um, CloudWatch, you know, probably any Azure alert type things.
And you can actually drag a panel up there, drag a chart up there, hook it up, tell it what the
columns are, and boom, you have this thing that you can go look at. And I believe it's even got
alerts on it. I can't remember, but yeah, totally. This thing's really cool. And it looks
like a really quick way to be able to start visualizing any kind of big data, or even
just data that you have access to. So check that one out.
Should we answer now about the, uh, things we're interested in? Or why not? I say you put the post up on the page. Um, I don't remember
where I put it, though.
I don't remember which episode that was.
But you can, yeah. I mean, you guys want to do it real quick what you're interested in thing?
Is it fast?
Is it fast?
How many things are you interested in?
I mean, like
Python, data science,
JavaScript, machine learning.
Okay.
Like those kind of topics.
I mean, topics we've talked about.
They shouldn't shock you.
Yep.
DevOps.
Yep.
What about you, Joe?
I don't understand the question.
Is that what we're interested in?
Yeah.
The types of things that you're currently like hot on learning about or just.
Oh, yeah.
I know that.
All right.
I totally know that.
Well, tell us.
Oh, he has to tell us? No, hold on. I'm thinking. No, yeah. I know that. All right. I totally know that. Well, tell us. Oh, he has to tell us?
No, hold on.
I'm thinking.
No, absolutely.
I am very super interested in, I really want to focus 2019 on two things.
Search engines.
So, Elasticsearch, Azure Search, Algolia, stuff like that.
I think you build a lot of really cool user experiences in it.
And the Jamstack, which we're going to be talking to you about
really soon here. I'm really interested in
Jamstack, and I'm worried about
my little kind of middleware island shrinking
year after year, and I'm deciding
to join them rather than be beaten by them.
So, running into all the things.
Really? Jamstack? I swear
I thought you were just messing with me when you said that
previously. No, man. That's my profile
of them two. Man, I've been changing everywhere.
The very first time he mentioned something about it, I assumed it was a reference to
Dance2Die's jam article, where he'd referred to Joe, Alan, and Michael as the jam.
Jam, yeah.
So I assumed that Jamstack was something like that.
Dance2Die hooks up.
Yeah.
So you're totally on board with that.
All right, interesting.
Yeah, Jamstack and Search.
All right, then.
Software for humans.
That's me.
All right.
All right.
So my tip of the week comes to us from Angry Zoot.
Thank you, Angry Zoot.
It is to add emojis to your file,
or, oh my God, if you use this in your code.
In Visual Studio Code,
by using the Windows key plus the semicolon, it'll bring up a little window where you can select
the emoji you want to use.
So you could say var foobar equals, and then put a smiley face in it.
Oh, that's amazing.
Or worse, you could do var smiley face emoji equals foobar.
In which case, I'm going to be like, why would you do this in your code? But you can do this in Visual
Studio Code without any add-ons, without any extensions, plugins, whatnot. Zoot, you're awesome.
Yeah, or all my variables, maybe. Because if Alan starts using these as variable names,
I'm going to be looking at it like, wait, smiley face plus frown face equals meh face?
Oh, that's amazing.
Right?
Smiley plus smiley equals super smiley?
Oh, dude, this is going to be great.
If smiley greater than zero.
Yeah.
Really, though, I am so on board with emojis everywhere, and it's Dance2Die.
He's actually the one who made those ticket templates for QIT, where it would say, like,
you know, steps to reproduce or environment, that sort of thing,
but he used emojis, and it really broke it up visually.
And so you could kind of look at this,
what would otherwise be kind of a blob of text and see like,
okay,
there's three parts to this.
And you know,
here's some visual indicators as to their meaning.
And so now I see him everywhere.
And MVP, uh, Nicholas, um, he has his Advent of Code, uh, code up on GitHub.
And he's got a couple of stars there that represent the kinds of problems
because you're rewarded with stars and it just looks really nice.
And so when you go to a page of GitHub,
it's so text heavy and you're so used to seeing it.
And all of a sudden you see this cute little pictures that kind of divide up the thing that you're looking into.
Synthetical operations.
Like suddenly, emojis aren't so crazy.
That's really cool.
I want to see where you're talking about.
Where's the QIT one with the emojis?
Do you have a link?
Just go create an issue. So if you go to github.com slash codingblocks
slash podcast dash app
and create a new issue, you'll see
when you select the issue type that it gives you a template
that asks you to fill in different things
when you create the ticket. I want to
create a new issue.
Yeah, and Dance2Die, man, he
is the emoji
master for sure. Oh, man,
you want me to log in?
What's that about?
Hold on.
You're not always logged into GitHub.
Come on.
Oh, man.
Oh, I still got to open pull request tags.
Dude.
All right.
Well, so very coolness.
Thank you.
All right.
Well, with that, I hope you've enjoyed this episode of Hash Tables versus Dictionaries.
Be sure to subscribe to us in case a friend happened to let you borrow their device to listen to this or if they sent you a link and pointed you to the right place.
But be sure to subscribe to us on iTunes to hear more using your favorite podcast app.
You can leave us a review if you haven't already.
Like Alan discussed earlier, you can head to www.codingblocks.net slash review.
And happy holidays to all, right?
What everybody doesn't know is Joe Zach, while he seems like he's probably the most festive of the bunch of us,
he is the Grinch of the three of us.
I'm one of those warriors that you hear about that is, uh, actively trying
to destroy Christmas. Sorry.
So truly, kick him in the shins if you see him during this holiday season. He needs some.
Yeah, I legitimately hate holidays.
That's, that's crazy talk. So anyways, happy holidays
to everybody. And while you're up there at codingblocks.net, check out our show notes, examples,
discussions, and more, and send your feedback, questions, and rants to the Slack channel,
because it's brilliant and awesome. There's awesome people in there.
And you can get an invite there by going
to codingblocks.net slash slack.
Yeah, and Logan, pet your dog.
Pet your dog, man.
You can also follow us on Twitter or
head over to codingblocks.net where you can find all our social
links at the top of the page.
Love you guys. Thanks.