Coding Blocks - Data Structures – Hashtable vs Dictionary
Episode Date: December 17, 2018
Just in time to help you spread some cheer this holiday season, the dad jokes are back as we dig into the details of hash tables and dictionaries...
Transcript
You're listening to Coding Blocks, episode 96.
Subscribe to us and leave us a review on iTunes, Stitcher, and more using your favorite podcast app.
And you can go to codingblocks.net where you can find things like show notes and examples and discussion and whatnot.
Did you just say uh twice in that sentence? That's it.
That's right, man. Take it.
Uh, well, Alan, it's your turn.
Alan? That's it turn. Oh, Alan.
Oh, that's it.
I'm walking away.
Alan, if we can, we can we change his name from Alan to Alan?
Send your feedback, questions and rants to comments at codingblocks.net.
Follow us on Twitter at codingblocks or head to www.codingblocks.net and find all our social links are at the top of the page.
With that, I am,
uh,
Alan Underwood.
I'm Joe Zack.
Uh,
And I'm Michael Outlaw.
I just vomited.
This episode is sponsored by Manning Publications.
Manning is running a special promotion this December.
The countdown to 2019 will run on manning.com all the way through December. Answer
just a single question every day and you'll be in the running to win free eBooks, videos, and even
a whole year's worth of new releases. Plus every week, everyone will get to enjoy massive discounts
on Manning products. All you need is to sign up to Manning's deal of the day at manning.com slash mail dash preferences.
That's www.manning.com slash mail dash preferences.
And you're good to go.
Also, while you're up there, take a moment to shop around for your favorite books and use the code CODBLOCK40 to save 40%.
All right.
So as we like to do,
we'd like to start off each episode with a big thank you to all those who have
taken the time to leave us a review or written us.
So either which way.
So on iTunes,
we've got CrossLink, JLA115, Ross444Ross, Hutchie Bong, John Mabel, CodeInMyRobe, Saltashi.
I'm going to go ahead and do him some justice here because he went through the effort.
Sex and Edger, maybe.
And then Bubba Cow.
All right.
And Stitcher, we've got Jarvanen.
Sorry about that.
Illilville, Cole Israel, Sbgood, Maxim Bendis, and Nathan Ayer.
We really appreciate that.
Thank you so much for leaving those reviews.
Even though we butcher your names, we really do appreciate it.
So thank you.
You rock.
Wait, is that Ayer or Liar?
Oh, I don't know.
I don't know. I don't know.
Hey, by the way, I forget who it was, but whoever said that they learned more in one episode of Coding Blocks about Big O notation than they did in about two semesters of it in college.
That's amazing.
And I feel you.
I was there.
You're going to be fun, man.
Wait, that's an option.
Yeah, right.
You guys didn't go to Costco? You all right. A quick update from, uh, I think it was the last episode, where we talked about C# strings and how they kind of dedupe the memory and use that as a performance optimization. The technical term for that is actually string interning, and I didn't realize that's actually kind of common in a lot of languages. So you can read about it on Wikipedia, but I wanted to point out a great tip from a two Mac 84.
He had a correction for us.
Uh,
it only works for string literals.
So those are going to be strings that like show up in quotes in your code.
So if you do something like, um, you know, GetDate().ToString(), even though that may have the same value in memory, it's not going to get the same benefits as you would if you had that thing in quotes.
So I just thought it was kind of interesting, and he sent me a little code snippet.
That was really cool.
So really appreciate that.
We love getting that feedback, especially when it results in us learning something.
Yeah, I want to be clear there.
We're talking about hard-coded strings.
Yeah.
Okay.
Yeah.
That's basically what you're talking about. And by the way, we've gotten some really,
really good comments on the past two data structure episodes.
So,
I mean,
we're not going to cover everything that the people shared up there,
but there's some really good things about,
you know,
why unmanaged code is faster than managed code and some of the reasons behind
it and all that.
So,
you know,
if you want to go continue the conversation and learn some more, definitely
head up to, you know, slash episode 94 or slash episode 95 and take a look at those
because there's some great comments up there.
So.
Yeah.
So continuing on, it's time we're talking about a couple more data structures.
You want to kick it off outlaw?
Yep.
So let's start with hash tables.
So we've built up to here, right?
We started with, well, we started with primitives.
So don't let me talk to you about floats again because I will.
But as we got into the more advanced data types, you know, arrays kind of laid some groundwork there.
So now we understand that, you know, arrays have their place.
There's a lot of benefits to arrays, you know, as a way of keeping a collection of data.
But also like some of the performance tradeoffs, right, that you might get.
So they're great for lookups, you know, uh, random lookups, but if you had to, like, insert something in the middle, not so hot there, uh,
versus the linked list on the other extreme, which were great for being able to insert or remove
items from the middle of a list. But if you had to search or scan that list,
they're not so hot, right? So what if we could live in some utopian world where we could get
the benefits of both? That'd be great, right? Well, that's where the hash table comes in.
So you get these kind of benefits with both of these,
uh, or with the linked lists and the arrays, you get that with the hash tables. Um,
but there's some trade off. There's still some caveats. There's some things to be aware of.
Right. So, um, yeah, so I had here, like, the hash table smashes these two worlds together.
And so, uh, hang on a second. When you say hash table, are you talking about maps or dictionaries? Or, uh, yeah, I've seen these things go by a bunch of different names in different languages. Are we talking about the same thing?
Uh, no, no. All right. No, I'm just talking about hash tables.
Well, all right. I mean, yeah, because that's the other thing that gets weird too. Like, let's not, let's not confuse it with the specifics of a language yet, okay? We're talking about, like, hash table data structures in, like, classic computer science.
Yes.
All right. So, because there are, like, depending on your language, you know, the terminology might get a little bit more jumbled and mixed up, right?
At least in the C Sharp world, though, there are classes for both.
So there's a separate class for hash table, a separate class for dictionary.
Yeah, we talked, like, for an hour on arrays and JavaScript and how they're not always arrays.
So I'm not surprised the hash tables are no different.
Well,
just on the naming thing in on the Wikipedia article,
just to bring it up,
it is also referred to as a hash map.
So again,
not getting into the actual implementations,
whether it be Java or C sharp or any of those,
they are sometimes called one or the other hash table or hash map.
Yeah. So, um, I mean, specifically the MSDN documentation, they succinctly described the Hashtable class as representing a collection of key/value pairs that are organized based on the hash code of the key. And even while they might be documenting their specific implementation of it, I mean, I felt like that was a pretty good overall description of what the hash table is and what purpose it's trying to serve for you, right?
So how does the hash table work? So let's consider that at the core,
the core structure of the hash table is an array.
And within this array,
each element of this array contains an object that contains the key and some
value.
So it's, it's not just a simple integer, you know, at that key. It's some complex type of object, right?
Now, if you're reading the documentation from Wikipedia, they refer to these as buckets.
Some, I think in the imposter handbook, it was referred to as slots.
There's not really like a name that I really liked for any of these.
Like, buckets felt like a super weird name.
Like why would you call it a bucket, not just like a name value pair or something else?
I don't know.
But a bucket seemed like a really weird name for what you would call that particular index within the array.
But so I digress. So
you have the underlying structure of the hash table as an array for a key value lookup.
The data that you want to read or write to the hash table, you're going to use that data
to create a hash. And that hash will serve as the index to the array.
Right?
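What that core mechanism looks like, as a minimal sketch (illustrative Python, not any particular runtime's implementation; the capacity of 8 and the stored names are made up):

```python
# Illustrative sketch only: the key's hash, reduced modulo the array length,
# picks the slot where the key/value pair lives.
capacity = 8                      # made-up size for the backing array
slots = [None] * capacity

def index_for(key):
    return hash(key) % capacity   # Python's built-in hash() stands in here

i = index_for("Abigail")
slots[i] = ("Abigail", "person record")   # store the key *and* its value
assert slots[index_for("Abigail")] == ("Abigail", "person record")
```

The important part is that the same key always hashes to the same slot, so a later read recomputes the index instead of scanning the array.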
And I heard a good example once, when I first learned about hash tables, and they were kind of trying to describe the purpose of the hash function.
And the example that they started off with was not something that you would see in production, like a real language.
But the example I heard is, let's say you've got a word,
you've got English words, and you want to store them in a hash table.
You may take the word that you're trying to store,
take a look at the first letter,
and then store it in a location based on that first letter.
So if the word is Abigail,
then we're going to go ahead and take that first letter,
and be like, okay, this is an A, so let's go over here.
Next word, let's say, is .NET Core.
And we say, okay, that's a D, so let's put it over there with the Ds.
And then later when you come to look something up, say, okay, give me back Abigail,
then we could go and look in the As.
And so it's really fast for inserts.
It's fast for looking up things too.
So it's just a good data structure.
Now, in that example, it's a really bad choice
because if we add the word Alan in there,
suddenly we've got two words now
that both start with an A,
and so we've got a collision.
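That first-letter scheme is easy to write down (a deliberately bad toy hash, as the discussion says, shown here in illustrative Python):

```python
# The toy "first letter" hash from the example. Fine for illustrating the
# idea, terrible in practice, because lots of words share a first letter.
def first_letter_hash(word):
    return ord(word[0].upper()) - ord("A")  # 'A' -> bucket 0, 'B' -> bucket 1, ...

assert first_letter_hash("Abigail") == 0
assert first_letter_hash("Brandon") == 1
# Alan also hashes to bucket 0, so Abigail and Alan collide:
assert first_letter_hash("Alan") == first_letter_hash("Abigail")
```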
Yeah, so this is where, like you were saying,
the implementation of the hash table can vary.
There's multiple implementations of them.
And really what's changing behind there
is how they're handling collisions, the
decisions they're making.
So going along with your example there with Abigail and Alan, your hashing function in
this example that you gave is only looking at the first letter.
So it picked A for both of them.
There was a collision.
So how does it put the second one in? So this is where, like, let's think of there being another data structure, a linked list. So you have an array of indexes, and then a linked list that a particular index, in this case the index for A, would point to. And Alan would get tacked on to the end of that linked list.
So when you want to look up Alan, you're, you're going to go to the A's,
then you're going to scan through the A's and you're hoping that you're not
going to have to go deep. Um, you know,
you're not going to have like, you know, again,
Joe's alphabetical hashing algorithm is probably not the greatest one.
Sorry, Joe.
So you're going to have a ton of collisions in that particular example,
but let's pretend that you didn't, because really the idea of the hashing function is that the collisions are going to be rare, is the hope. I mean, they're going to happen, but you definitely want them to happen less often than not. So, so that's the idea, right? Or can we agree to that structure,
an array for the index for the hashes and then a linked list for where the values are within there?
So is that every implementation of the, because like, I don't know, it's,
they talk about collisions and there's gotta be some sort of method to handle the collision.
So is that the standard way it's done? Okay, let's talk about, okay. So you can't talk about hash tables without getting into a big conversation about the ways to resolve these collisions, these collision strategies, right? So there are popular collision strategies. And if you go to the Wikipedia page, there are a bunch in there, but what seemed to be, like, the two more popular ones were the separate chaining and open addressing variations. So in the separate chaining implementation,
you know, you would have those pointers, but when there's a collision, right, you'll traverse that
list, look for your key, right? And so this is basically the example that I gave a minute ago.
And, you know, assuming that you have a good hash function, you're rarely going to have more than three items in any given index.
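A separate-chaining table along those lines might be sketched like this (illustrative Python; Python lists stand in for the linked lists being described, and the class name and capacity are made up):

```python
# Minimal separate-chaining table, for illustration only (no resizing).
# Each slot holds a "bucket": a list of (key, value) pairs that collided there.
class ChainedHashTable:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # empty slot or collision: extend chain

    def get(self, key):
        for k, v in self._bucket(key):   # walk the (hopefully short) chain
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put("Abigail", 1)
t.put("Alan", 2)
assert t.get("Abigail") == 1 and t.get("Alan") == 2
assert t.get("missing") is None
```

With a decent hash, each chain stays tiny, which is why the linked-list traversal doesn't hurt in practice.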
So when you think about the, we talked about linked lists being awful, just awful for traversal, right?
And that wouldn't be your greatest choice for doing any kind of sorting. But because
these are so small, remember the article that we talked about that we referenced
before, like everything is fast for small n, right? So
it's going to be good enough in this particular example. Because like if you
only have three elements, that's... Assuming you don't have a bunch of collisions, which you should
never have a ton of collisions if you have a good hashing algorithm. Well, which is also
another point too. Don't try to
roll your own hashing function, right? Like if you can just
use the library's hashing function, like if you had
to create your own hash table, then
if you could use a hashing function that's already built into
either the operating system or the library or whatever that's available to you,
you're probably gonna be better off than if you tried to roll your own hashing algorithm.
That's where you're going to get into trouble and have more potential for collisions.
Yeah, you want an even distribution, and you want, like, a wide distribution too. So you want a lot of different values, because you want to keep those lists short. So the alphabetical thing is terrible. You're not going to get that many Zs, you're probably going to get a lot of As, and besides, there's only 26 buckets. So the chances of you having more than three items in your linked list are pretty high, depending on how many words you're shoving in there.
I always did wonder what kind of hashing algorithms they were using.
I've never really seen a good example of like, oh, hey, here's how .NET does it.
It's based off the memory addresses or something.
So I just kind of wondered, like, with a hash table with thousands and thousands of keys,
like, what algorithm are they using to divide those up evenly?
And one other thing to point out here, too, is while we're talking about this, there's this whole notion of collisions, and that's because the hash can't be perfect, right?
Right.
And there is this notion in the Wikipedia article that's interesting. It says the perfect hash function can only exist if you know all the items ahead of time. So that, that makes sense? If, if you know exactly what your data set is, then you can write the perfect hashing algorithm, which is not the case 99.999% of the time.
So,
so these collisions we're talking about happen because you have a very good
hashing algorithm that can't be perfect.
Correct.
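The "know all the items ahead of time" idea can be sketched by brute force (illustrative Python only; real perfect-hash generators such as gperf are far more clever, so treat this purely as a demonstration of the concept):

```python
# If every key is known up front, you can search for a collision-free
# arrangement. Here we just try salt values until no two keys share a slot.
def find_collision_free_salt(keys):
    capacity = len(keys)                 # one slot per key: a "perfect" fit
    for salt in range(100_000):
        slots = {hash((salt, key)) % capacity for key in keys}
        if len(slots) == len(keys):      # every key landed in its own slot
            return salt
    raise ValueError("no salt found; try a bigger table")

keys = ["Abigail", "Alan", "Brandon"]
salt = find_collision_free_salt(keys)
assert len({hash((salt, k)) % len(keys) for k in keys}) == len(keys)
```

That search is only possible because the key set is fixed; with arbitrary incoming keys, some collisions are unavoidable.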
And on another good point, too, ideally, okay, so going to your point about you're not going to have a perfect hashing algorithm: you want it to have an even distribution of where it's going to place things.
Because again, going with that assumption that you're not going to have any more than three items in a given index within the hash table's internal array, in order to make that kind of assumption, that assumes that your hashing function does create a good, even distribution when you're feeding it random data.
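One rough way to see what a good versus bad distribution looks like (illustrative Python; the 1,000 generated keys are made up for the experiment):

```python
# Compare Python's built-in hash against the first-letter toy on the same
# keys, using 26 buckets like the alphabet example.
from collections import Counter

words = [f"word{i}" for i in range(1000)]
builtin = Counter(hash(w) % 26 for w in words)   # built-in hash, 26 buckets
by_first_letter = Counter(w[0] for w in words)   # every key starts with 'w'!

assert len(by_first_letter) == 1     # one giant chain: the worst case
assert max(builtin.values()) < 100   # built-in hash spreads them far better
```

Same keys, same bucket count; only the hash function changed, and with it the chain lengths.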
Hey, one other thing I wanted to add, I don't know where you finished.
Sorry.
No, go ahead.
I just want to say one thing that's really important, too, is that hashing function has to be performant.
So obviously, we don't want it to scale poorly based on like the size of the input, right?
We don't want to have like a big, long, slow hashing algorithm.
This thing needs to happen fast.
If you look at the big O cheat sheet or something and look at the lookup times and the insert times for hash table, it puts them at O of 1, constant time for fast lookup because it's assuming a good algorithm.
And it's treating that as constant time, which means that as your hash table gets bigger, the lookup time needs to stay about the same.
And here we're talking about linked lists, which we know the insertion there and lookup time is going to be O of N,
but they kind of cheat a little bit.
And so there's like a little asterisk in the Wikipedias of the world,
whenever they talk about hash tables, they're like, well, it's basically O of 1.
It's basically constant time for lookups and inserts in the normal case,
but that's really not so much the case for the worst case scenario. But it just happens so
infrequently if we have a good hashing algorithm, which is like this magical question mark that we
don't have to worry about it so much. And in practice, it works out really, really well.
And they say the average time is O of 1, right?
So the average.
So that's why you can get away with that.
And I'm sure Mike's going to get into some ways that it goes awry.
Yeah.
So, well, before we go any further there, let's cover another one of the popular collision strategies, which is called open addressing.
So in this implementation, the elements in the array are the buckets themselves. So rather than the previous one, where the elements in the array were pointers to linked lists of buckets, in open addressing, the item that is in the index is the item itself.
Again, I hate the term bucket, but.
That's what everybody uses apparently.
Yeah.
So what this means is, let's say that, let's continue the example of Joe's amazing first-letter hashing algorithm. So Alan and Abigail, we'll just skip, let's skip Alan for when we only have Abigail in the list. And now Brandon comes in. So Abigail is in the first array index. Brandon comes in, he goes into the second because it's a B. Well, now when Alan gets inserted into the list, his slot is already taken, so he probes forward past Brandon. So it would be Abigail in the first index, Brandon in the second, and Alan in the third index.
So you see how this already got weird, right? The hash value only serves as a starting point for where you would look up the data from.
So meaning that during a write operation, if you go to that index in the array and something is already there, you would continue on looking for an empty slot. Now in, in the example that I just gave,
I kind of smashed, you know, uh, Abigail and Brandon's indexes, like, right next to each other. And in reality, there would be some space there. But, um, because what you're doing is, when you get to that first index, again, the index that your hashing function computes is just the starting point. So once you get to that index, you're trying to traverse from there forward through the array, looking for the actual item you're trying to get to, and you're only going to keep looking until you get to an empty slot. If you get to an empty slot, then that means that you never found what you were looking for.
If you were doing a read.
And if you were doing a write, then that would be where you would put the item.
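A linear-probing sketch of that read/write behavior (illustrative Python; there's no resizing here, so it assumes the table never fills up, which a real implementation would have to handle):

```python
# Open addressing with linear probing: the hash picks a starting slot, and on
# a collision we walk forward until we find the key or an empty slot.
capacity = 8
table = [None] * capacity        # each slot holds a (key, value) pair or None

def put(key, value):
    i = hash(key) % capacity
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % capacity   # occupied by someone else: probe the next slot
    table[i] = (key, value)

def get(key):
    i = hash(key) % capacity
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % capacity
    return None                  # hit an empty slot: the key was never stored

put("Abigail", 1)
put("Alan", 2)
put("Brandon", 3)
assert get("Alan") == 2 and get("missing") is None
```

Note how the empty slot does double duty: on a write it's where the item lands, and on a read it's the signal to stop looking.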
Does that make sense? So open addressing sounds
awful to me in terms of
performance.
Yeah, it just, I don't, I don't know.
I guess it's kind of unbounded.
It's like, how long does it take you to look up an item in the worst case? So it's like, well, in the worst case, we keep looking at every little chunk of memory that could possibly kind of fit this thing. So it's, yeah, as you kind of start one spot, it's not there. Either let me rehash it, or let me look at the memory spot next to it, or something like that, or, you know, some sort of method for determining the next place to look, and you just keep looking until you're out of places to look. And yeah, that could potentially be pretty bad. So that seems pretty, pretty bad to me.
Yeah, it sounds bad, but
it also sounds like, even behind the open addressing thing, there's additional, you know, features or ways to go about doing it. There's, like, several well-known ones: linear probing, quadratic probing, double hashing.
And it's basically,
so you're not just going to the next one.
You might be chopping it up,
doing similar things like binary searches or something like that.
So you're trying to scan those indexes really fast, right?
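Those variants differ only in how they generate the next slot to try, which can be sketched as (illustrative Python; the capacity of 11 and the sample hash values are made up):

```python
# Three well-known probe sequences for open addressing. h is the initial
# hash-derived index, attempt counts how many probes we've made so far.
capacity = 11

def linear_probe(h, attempt):
    return (h + attempt) % capacity            # next slot over, every time

def quadratic_probe(h, attempt):
    return (h + attempt * attempt) % capacity  # 1, 4, 9, ... slots away

def double_hash_probe(h, h2, attempt):
    return (h + attempt * h2) % capacity       # step size from a 2nd hash;
                                               # h2 must never be 0

assert linear_probe(3, 1) == 4
assert quadratic_probe(3, 2) == 7        # 3 + 2*2
assert double_hash_probe(3, 5, 2) == 2   # (3 + 2*5) % 11
```

Quadratic probing and double hashing exist largely to avoid the clumps of occupied slots that linear probing tends to build up.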
Yeah.
I mean, I imagine that the performance of it isn't that awful, because it's only for things that happen to hash to the same value. Which again, you're hoping that your hashing algorithm will be good and give you a good distribution, so that you're not going to have those collisions often. So when you do, you're probably not searching a lot. So it probably, much like the separate chaining gets away with traversing the linked list, open addressing probably gets away with having to traverse that array, because there's not going to be that many collisions, or at least that's the assumption, right?
Yeah.
Assuming, so it's probably one of two things, right?
Either assuming that the hashing algorithm is good enough or that your data set isn't so large
that you're going to run into those collisions more, right, is my guess.
Yeah, because even if you, okay, even if you had a lot of elements in the hash's internal array at that point, assuming that there's a good distribution, you're only going to be scanning a few items before you get to a space. And you're going to be like, oh, there actually wasn't anything there. That key is available. You know, that's where, like, uh, if you try to get that value, you're just going to get back a null, or depending on whatever framework you're using at the time, uh, however it decides to handle that type of situation.
Cool. All right.
You know, we, I don't think we've really talked about, um, what this looks like for the programmer. So I love hashes and their ilk because of how easy they are to use for, like, a programmer. So in JavaScript, you know, I'm used to using it as an object. There's different ways to do that, though, but basically I create my object and I throw the indexer, so the square brackets, around it and give it some sort of key. So I could say, you know, Abigail is my key, equals new Person, and I've created a new person, and it's stored in what I think of as being the key of Abigail. And then when we tell the hash, hey, give me the object at Abigail, it's going to run the hash on that word Abigail there, on my key. And that's how it's going to figure out how to look up the actual person in memory that I stored.
It's pretty awesome.
So if you've got something where you want to be able to just kind of shove stuff in
and look it back up, then that's a great data structure.
I've been doing the Advent of Code this year, for 2018.
And one of the problems I just solved, I ended up using a hash table.
And the reason I went with a hash table over an array because I was storing numeric keys
is because I thought the data might be sparse.
And this is something we talked a little bit about arrays.
So if you've got like a fixed or a known size,
say like, you know, 1000,
you could do an array with that.
But if you know that you're only going to be using like,
you know, a handful of indexes out of that 1000,
then you're going to be pre-allocating
a huge chunk of space in memory and only using, you know, 10% of it or some small number.
If you use a hash table, then you're only, you're only allocating the memory as you put the stuff
in. So memory-wise, it's going to be really efficient. And the insert and lookup times are on average O of 1. So it's going to be just as fast as an array for those lookups on those numeric keys,
but much more memory efficient.
Turned out I was wrong on that, by the way.
It would have been much better to use an array in my case because it ended up
not being sparse.
So that's what I get for pre-optimization.
Yeah, like our previous conversation of like a list
versus array.
Right, yeah.
Use the list until you know that you shouldn't.
Right, exactly.
Okay, so
what are some
other strategies? So the Wikipedia article
has other strategies that they listed
for hash tables, but they didn't go into
quite as much detail.
But so I'll just quickly, like, say some of the names, like cuckoo hashing, hopscotch hashing, Robin Hood hashing.
And then this one was interesting to me, the two choice hashing,
which kind of sounded like a rapper name.
You know, that would be like my rap name.
I'm Two Hash. I'm Two Choice Hashing, you know, be right next to 2 Chainz, like, what's up?
But yeah, so that one was kind of neat because as the name implies,
actually it uses two hashing functions.
And so if during a write operation you find a location that is already in use, it's going to use the hashing function's location that would provide the fewest objects already at that value.
So hash it one way, then flip it
down that and reverse it, and hash
it again, and see which one looks better,
and just kind of go with that.
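A two-choice sketch along those lines (illustrative Python; the second "salted" hash is just a stand-in for a genuinely independent second hash function):

```python
# Two-choice hashing: compute two different hashes and insert into whichever
# bucket currently holds fewer items; lookups then check both buckets.
capacity = 8
buckets = [[] for _ in range(capacity)]

def h1(key):
    return hash(key) % capacity

def h2(key):
    return hash(("salt", key)) % capacity   # stand-in for a second hash

def put(key, value):
    a, b = h1(key), h2(key)
    target = a if len(buckets[a]) <= len(buckets[b]) else b
    buckets[target].append((key, value))    # go to the less-crowded bucket

def get(key):
    for i in (h1(key), h2(key)):            # only two buckets can hold the key
        for k, v in buckets[i]:
            if k == key:
                return v
    return None

for n, name in enumerate(["Abigail", "Alan", "Brandon", "Joe"]):
    put(name, n)
assert get("Alan") == 1 and get("missing") is None
```

Notice the lookup never has to remember which hash was used at insert time; it just checks both candidate buckets, which keeps the bookkeeping simple.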
It sounds kind of awful from a developer perspective. Like, if you were the developer of that thing, though, of that two-choice rapping hashing implementation, it does sound kind of awful to try to maintain that code. Because then you're like, I mean, just wrap your head around that for a moment.
Like, okay, I just put something into the hash's internal array.
How do I know which hash function was used? Like, just trying to keep track of that kind of thing. Like, obviously they've solved the problem, someone solved it, right? Because it is a real thing. But I would imagine it has extra storage needs behind the scenes, right, in order to map those things out. They've got a hash table to keep track of their hash table.
I feel like somebody came up with this at like the last second.
It's like minutes to the deadline.
They're like, oh, this is going too slow.
And I tell you what, just hash the stupid thing twice.
That's right.
And if one of them doesn't have an item, just put it there.
And then whenever we need to look it up, we'll just do it twice.
It'll halve the time that we need to go chasing down this crazy chain.
Let's just go with it.
And they tried it and it worked and they shipped and now they're Google.
That's right.
Yeah,
exactly.
And to be fair,
when we said,
don't go writing your own hashing algorithms,
you know,
we say that if you're just,
you know,
writing some code,
obviously if you've got a real use case for it,
right?
Like you're writing the next, you know, network-layer topology type stuff that's got to be crazy fast, and you're
not finding anything that suits your needs. And then probably you will write your own, but we're
saying on general day-to-day type stuff, if you find yourself writing your own hashing algorithm,
you're probably doing something wrong, right? And it might be special cases where you know enough about your data set
that you could kind of make an informed decision about that.
And of course, if you were just wanting to learn about hashing,
it was like, I'm really interested in how the heck
they come up with an algorithm that works.
Like, this is not something I could do on my own.
There's some mathematician out there who came up with this hashing strategy, who was really good and smart about coming up with a good distribution.
So I'd love to know what that is.
So maybe one of these days I'll look it up and try to program it, just like from an academic
perspective.
But this is not the kind of thing that you want to do on a Friday afternoon before launch, you know?
Just use whatever is built into your language.
Right.
Agreed.
So we've kind of already hinted on this, but I, just to go ahead and throw it out there,
like, officially, you know, the complexity around hash tables and their usage. So in terms of space complexity, the average and the worst are just going to be O of N. So however many items you're trying to put into your hash table, there's your space complexity.
The insert, the search, searching the hash table, inserting into the hash table, and deleting from the hash table are on average an O of 1.
Which is as good as it gets.
Yeah, that's amazing.
And then the worst case is going to be an O of N operation.
So it's awful, but I mean, it can be worse.
And this is one of those ones, like we mentioned,
there's kind of an asterisk when you look up it a lot of times in the charts.
So often in Big O, we talk about the worst case scenario
because it's kind of that limiting function deal.
That's kind of what Big O is designed for, is to let you know the worst case scenario.
But in this case, it's so much more frequently the good case.
Whenever you do some sort of lookup, it's like they don't want to scare you.
Like, listen, listen, just pretend.
Just get it in your mind that it's O of one.
And then if you have a problem with it later, come back here and notice the asterisk and then look at it deeper.
And you know something? I'm sure we've all seen situations where just hundreds of thousands of items have been crammed into a hash table and it doesn't perform fast.
You're like, why?
Because I would expect it to be really fast because this should be a direct lookup.
It should be an O of one operation, right? And then these underlying implementation details right here of when
there's collisions and how those things are handled, just imagine you're doing some sort of
lookup or you're looking up many items in that thing and it's having to go to it. But then it's
like, Oh no, now I got to scan the rest of the contents in here because there were multiple
collisions. That's probably why you end up running into those things.
So it's nice to know these things behind the scenes that are happening for you that you probably never even think about.
Yeah, or now that if you do have a slow-performing hash table usage, maybe you have some target to go after.
It'll be like, well, huh, maybe there's something about my data that I'm not
getting the uniform distribution that I thought I should have been getting. But just to point out,
though, that when we talk about the best case or the average case, I'm sorry,
time complexity being an O of 1 versus the worst case being an O of N. I mean,
O of N was not far off. Like, O of 1 was a flat line, and O of N was, it's not terrible. It was like maybe a three-degree, four-degree line. It wasn't a big line in comparison. I mean, it was definitely still at the bottom of the yellow, like, if you were to go to bigocheatsheet.com, right, O of N is there at the bottom. So in your, like, quote, bad cases, it was the best of the bad, right? All right.
right it still kind of stinks though like if you've got a algorithm that takes one second
and then o of one with a thousand inputs it's still one second but in o of n even though it's
great a size of one of one thousand is going to give you one thousand seconds so i mean it's still one second but in o of n even though it's great a size of one of
1000 is going to give you 1000 seconds so i mean it's a big difference and you can notice if you
ever had an algorithm that was like looking stuff up in arrays over and over and over again by
searching through all of them and you convert that thing to a hash table you're probably going to see
really big delay i mean a really big improvement there but yeah you're right and it's not it's
definitely not as bad as the others yeah that that That's the point I was trying to get, I wanted to make was that, yes, it could still be bad,
but it's not like, Oh, of in fact, factorial, you know, it's not a hockey stick is where
I'm, I'm going with that.
If you just really want to boil it down to that big, Oh, cheat sheet.
Oh, of one is the best.
Oh, log in is the very next best.
And then, Oh, of in this is the next.
So you're, you're not doing bad, right?
You're still barely out of the green.
You're a little bit out of the green, but still, it's not a bad operation.
So the fact that your worst case is O of N is still not terrible.
But, I mean, to Joe's point, should you find some kind of massive data set where you have collisions 99% of the time, right, and, say, underneath the covers, unbeknownst to you, the separate chaining resolution strategy is being used, you're like, hey, why is my hash table performing so badly? You know, if you happen to notice, like, oh, my data happens to randomly produce keys, hashes, that are all the same, that's why it's performing so badly. You could kind of just make guesses, because even if you did happen to notice that it was producing that, without looking at the implementation of the hash table, you'd be like, oh, they're probably using a separate chaining collision strategy behind the scenes, and that's why.
Yep.
Right?
Now, who's actually going to go start hashing all of their own data to see what it's going to produce?
I don't know.
Maybe you're using it.
Don't judge me, man.
You do that, I'll be over here.
Like, npm install.
Right.
Hash 2.
Right.
Exactly.
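To make the collision discussion above concrete, here's a small Java sketch (Java's HashMap, which the hosts mention, resolves collisions by keeping colliding entries together in one bucket). BadKey is an invented class whose hash code is deliberately constant, so every entry lands in the same bucket and each lookup has to search that bucket's entire contents instead of jumping straight to the entry:

```java
import java.util.HashMap;
import java.util.Map;

// A key whose hashCode is intentionally constant: every instance collides,
// so the table piles all entries into a single bucket.
class BadKey {
    final String name;
    BadKey(String name) { this.name = name; }

    @Override public int hashCode() { return 42; } // pathological: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(name);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> table = new HashMap<>();
        for (int i = 0; i < 5_000; i++) {
            table.put(new BadKey("key-" + i), i);
        }
        // The lookup still returns the right value, but instead of an O(1)
        // jump to the entry, each get() has to search the one overloaded
        // bucket -- the hidden cost the hosts are describing.
        System.out.println(table.get(new BadKey("key-1234"))); // prints 1234
    }
}
```

Lookups still return the right answers; they just quietly pay the bucket-scan cost instead of the O(1) direct hit.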
You know what?
I'm going to derail this for a second. That never happens, right? I want to say how much better things like npm are, and I'm even going to give a shout-out to the Java community here, and Maven. Oh, God, the NuGet. Why does NuGet suck so bad? Oh, man. Like, seriously, I'm so angry at NuGet almost all the time because of this whole, it doesn't encapsulate its own dependencies well.
And that, man, that makes me mad.
NPM does it with JavaScript.
Like, how?
Is it a NuGet problem, though, or an MSBuild problem?
I don't care.
Let's be fair.
I don't care what it is.
Are you talking about like where A depends on B, B depends on C, C depends on D, and A doesn't have an explicit reference to D, but yet it's needed?
Well, that's one.
Because that's an MSBuild.
Let's be fair.
That's one.
But the one that really irks me is you – all right.
So we're going to dive down here into the C Sharp for just a second.
Let's do it.
So I am using log4net 1.2 in my app, right? Let's say. But I pull in a dependency that's using log4net 1.1. Well, guess what? I can't use it, because I have to have the same dependency that my dependency has.
And that burns me up.
Like, dude, Maven and Java work really well in this world.
You can bundle up everything you need in your jar and it just works.
Why can't we make that happen?
Okay, two things.
Number one is, technically, you could solve that problem with a binding redirect in your app's config.
You could change the version.
You could give a range for the DLL binding and say anything from this range to this range.
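For reference, the app.config binding redirect being described might look something like this; the assembly name follows the log4net example above, but the version range and the publicKeyToken are placeholders, not taken from the episode:

```xml
<!-- app.config (illustrative): redirect any older log4net reference
     to the single version the app actually ships. -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- publicKeyToken here is a placeholder -->
        <assemblyIdentity name="log4net" publicKeyToken="..." culture="neutral" />
        <!-- anything in the old range binds to the version we reference -->
        <bindingRedirect oldVersion="1.1.0.0-1.2.0.0" newVersion="1.2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```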
Okay, let's say that it's breaking.
The number two way that you could solve this problem is, as the publisher of that NuGet package, you could ILRepack your dependencies in so that they come along for the ride.
And they're baked into your...
So bake them in so you're not actually including a dependency. The whole fact that almost everybody else has figured out this way to be able to bundle in your dependencies in a way that doesn't damage your own application, it burns me up every time I face this with NuGet. Like, it makes me so mad that I'm like, I'm not doing this in C Sharp. I'm doing it in JavaScript. I don't care that I like C Sharp better. I don't want to deal with this dependency entanglement nightmare, right?
Like, every time I deal with NuGet, it does seem like a bit of a headache. And when I'm used to using things like Node, you know, it's a no-brainer that I can just npm install anything in, like, two seconds and get it working.
Now, the node_modules folder is horrible.
Oh, it's a nightmare to look at. It's massive. It's like 300 gigabytes or whatever. Densest whole thing.
But I mean, that's crazy. But that's your tip. There's other good stuff, though. Your tip last week, or last episode, Joe, that Jamie provided. I mean, that was the whole point of that even for npm: trying to get around some of these problems where, you know, if you only built based off of the package.json file, that's kind of loosey-goosey, right, as to what the versions could be, whereas the package-lock.json isn't. That's specifically, here's the entire dependency graph, like, every version that was used, right? And that's where, while we're on this tangent, like, even if you do the ILRepack that I mentioned as a way to get around it, that's where those kinds of solutions can still irk me.
Right.
Because I'm willing to understand, like, the challenges that NuGet has in this build. I kind of can't forgive the, like, not-including-the-dependency thing, skipping the chained dependency. But then that problem then carries forward, or is replicated by, and I don't know which it is, but by, like, ILRepack, for example. Where, so let's say app A depends on dependency B, B depends on dependency C, and dependency C depends on dependency D. And MSBuild, when it does that compilation, if app A isn't explicitly using dependency D, and nothing in its execution path is using dependency D, dependency D will not come along for the ride. And then you'll be missing it. So in your bin directory, or your output of the compilation process, you're not going to have that dependency there, right?
Well, you and I especially, I know, have seen cases where, and it's well documented, like, Stack Overflow has plenty of articles on this problem, where even though it thought that it didn't need it, it actually was needed in some scenarios, and you'll get runtime errors that you won't see coming. But ILRepack, to its credit, I guess, instead of making a runtime error, you'll see it as a build error, because it's not smart enough to recognize that app A doesn't use certain things. And so if it sees that one of those random dependencies has a dependency, the next dependency, it'll keep walking that app's dependency chain, expecting all of those DLLs to be in there so that it can hack them all in together. And when it can't find D, it's like, oh, well, we're done.
All right.
Well, there's such different cultures, too.
I just had to mention, like, when I'm working in front end, like, man, I'll require something in, like, a heartbeat.
Oh, I don't want to left pad this thing.
Just require, just look it up.
Like, nobody cares, you know, because it's the front end.
And for some reason, I guess maybe because there's, like, this culture of these things being small and single-purpose, and DLLs tend to be heavier, man, when you're working on the backend and you want to bring something in, it's like, all right, let's call the tribes together. Let's get the council, all the dev managers, together. And you've got to go there, you've got to bring presents, and you've got to sweet-talk them, and you've got to set up a recurring meeting, because it's probably going to take you a couple to actually get this thing added to the project. And then when you finally get around to it, when everyone, or, you know, 80 percent of the people, agree, yes, let's bring this package in instead of rewriting it, then you do it, and then it breaks the build, and you've got to mess with it for the next 24 hours.
It's so true, too. Like, you've got the Senate over here, you've got the House of Representatives over here, you've got Congress, you've got veto power over here.
See, I thought when you said the Senate, my mind immediately went to Star Wars.
And I was thinking of the Senate within Star Wars.
Oh, gosh.
So I was picturing that kind of room.
You go out on your little hover.
Floating platform.
Yeah, your hover platform to make your speech about like, we need this great library. Library. Library. Yeah. Your hover platform to, you know, make your speech about like, we need this great library.
Yeah.
You know?
So, yeah, man, I apologize for derailing. I don't even... something snapped in my head when you said something, and I was like, oh no, I've got to get...
It was like you had a twitch, like you hadn't had your medication yet.
Like, it's one of those things where you fight it.
You're just like, at first you're angry and then you're confused, right?
Like why would anybody ever let this happen?
And then it's just, I don't know.
I think it's backwards, right?
Like you're confused.
Like, wait a minute.
Why is this broken?
Yeah.
This should work.
It might be many stages.
What DLL is that?
I'm not using that.
Like there's no reference to that
Why is it complaining about that? So, yeah, you definitely start out confused, and then when you realize what the problem is, then you're angry. And you're super angry. You're like, what?! And then you're like, you know what, I'm going over to Java. I may think it's way more explicit than I want it to be, but I'm about to change, you know, coding paths in my career.
Whoa, whoa, whoa. Now you're talking emotionally, because that is not a rational decision. Just come over to JavaScript, man. Seriously, if I want an uppercase string, I just npm install uppercase. I don't even care that it's built into JavaScript. I'm just going to get the package, and if it's mining some crypto coins in the background, then, you know, whatever, that's the price we pay. Someone will fix it eventually.
Oh, that's so amazing. I'm going to assume that some of our Java brethren among us here are going to be upset.
Yeah, they're already upset.
Yeah, don't hate us.
We don't hate you.
We don't hate you.
It's just, it's funny.
There are definitely things that are way more polished and more better in the Java world.
Well, I was going to say.
Come to Microsoft Land. We got LINQ.
Oh, that's so beautiful.
That almost makes up for all of it.
It almost does.
Well, I was going to say, too, like, I don't know why, in my reaction to that situation, like, you would suddenly, in this hypothetical situation, turn into Lil Jon.
What?
What?
Yeah, yeah.
What?
Oh, man.
And then when you do finally get it to build, I mean, Lil Jon comes back at me like, yeah!
Yeah.
You're over there like high-fiving yourself in the corner and everyone else is still pissed.
Yeah, exactly.
You broke the build all day and you're high-fiving yourself?
Great.
You don't know where I've been.
I've got to work on my Lil Jon impersonations, apparently, though.
Yeah, those are hard ones to do, man. He worked on that for years.
All right,
let's get back to hash tables and all the fun that they are.
Yeah.
Sorry about that.
And so quickly, let's talk about, like, the pros of the hash table. The speed, number one. By far and away, the speed of the hash tables. Yeah, the reads, ignoring collisions, you're generally going to consider that an O of 1 operation. If there is a collision, the read/write time can be reduced to O of N divided by K, where K is the size of the hash table, which can just be reduced to O of N. Don't you love Big O notation? Just throw everything away. There should be, like, a whole Big O math.
It's basically drop all
the constants, right?
Oh, you had some operation in there?
Eh, don't worry about it.
Whatever.
N divided by K.
Assuming a good hashing algorithm is used, it's usually going to be O of 1, like we said. But this assumes that, by good, the performance of the hashing algorithm has been considered. So, to Joe's point, if your hashing algorithm is going to take a second and you have to put a thousand things in this hash table, that's not going to be a good hashing algorithm. So, yeah, I mean, you can't blame the idea of the hash table at that point, because you have a bad-performing algorithm. So even if you do have a slow algorithm, it might still be O of 1, but it may be slower than the alternatives. And I think that's worth explaining, right? It's O of 1 because the lookup operation itself can go directly to it, but the hashing behind it is ridiculously long and tedious.
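The N-divided-by-K figure above is the standard load-factor argument for a chained hash table; as a sketch (writing k for the number of buckets, which the hosts call the size of the table, this is the textbook analysis, not verbatim from the episode):

```latex
% n = number of stored items, k = number of buckets.
\alpha = \frac{n}{k} \quad \text{(the load factor)}
% Under simple uniform hashing the expected chain length is \alpha,
% so an expected search costs
T(n) = \Theta(1 + \alpha) = \Theta\!\left(1 + \frac{n}{k}\right)
% With k fixed, \alpha grows linearly in n and lookups degrade to O(n);
% if the table resizes to keep \alpha bounded, lookups stay O(1) on average.
```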
All right, so to the cons of the hash table.
So depending on your language, I'm looking at you, C Sharp, the hash table type is loosely typed.
We'll come back to that in a moment.
The cost of the hashing function can be more than just looping over the list, especially for few entries. So, you know, if you only have two items in your hash table,
did you really need a hash table? Right? I mean, probably not. Maybe. Probably not.
Because the hash table entries are spread around, there is a poor locality of reference, which can trigger processor cache misses. So, you know, again, if you're trying to squeeze every bit of performance out. Cache performance can also be poor or ineffective depending on the implementation, such as separate chaining. And the performance can degrade when there are many collisions, which, you know, we pretty much already covered.
All right. So when should you use a hash table? Now, if we are talking about specific language implementations of a hash table class or type, then don't. At least not in C Sharp. You should prefer the Dictionary type over the Hashtable type.
Every time.
And not in Java either.
You should use the hash map.
Isn't that interesting?
Yeah. Because it's strongly typed. And so the alternative is, if you use this Hashtable, then what happens is you're going to get an object back, and you're going to have to cast it. So you're losing the benefits of having your compiled language, right? There's no sort of checking around that.
And, you know, potentially you're doing something wrong
or getting an error. So that kind of
stinks. And if you're storing simple values, it's even worse
because you're going to end up boxing them. So you're going to take these
simple values that would take, say, 32 bits for an integer, and then you're going to stuff them in a 64-bit reference and throw them on the heap.
And then now you're taking up whatever 32 times 3 is.
Well, not just taking them up.
Now you've also got the garbage collection and everything else on top of it, right?
Yeah.
And totally unnecessary if you just use the strongly typed version.
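The episode's example is C#'s Hashtable versus Dictionary<TKey, TValue>. As a rough analog in Java (whose HashMap the hosts also recommend), using the legacy Hashtable as a raw type reproduces the cast-everything problem; note the boxing point bites harder in C#, since Java boxes the int into an Integer either way, while a C# Dictionary<string, int> keeps the int unboxed:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class TypedVsUntyped {
    public static void main(String[] args) {
        // Loosely typed: a raw Hashtable hands back Object, so every read
        // needs a cast, and a wrong cast only blows up at runtime.
        Hashtable raw = new Hashtable();           // raw type, no compile-time checking
        raw.put("answer", 42);                     // the int is boxed into an Integer
        int fromRaw = (Integer) raw.get("answer"); // cast required

        // Strongly typed: the compiler checks keys and values for us.
        Map<String, Integer> typed = new HashMap<>();
        typed.put("answer", 42);
        int fromTyped = typed.get("answer");       // no cast needed

        System.out.println(fromRaw + " " + fromTyped); // prints "42 42"
    }
}
```

The raw version compiles with warnings and only fails at runtime if the cast is wrong; the generic version turns that same mistake into a compile error.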
Right. So if we're not talking about the specific language implementations, then when should you use the hash tables?
Okay, so anytime you need an associative array.
So by that I mean you want to have an array, but you don't want to necessarily look up items in the array by their index. You might want to be able to say, like, hey, in this array of people, I want to find the Alan object, right? That's what I mean by the associative array. There were other examples that were given, too, that I found, like database indexing. Or another good one, too, would be a cache. If you wanted to build your own cache, you could use a hash table behind the scenes, so that when your user wants to add something to the cache or look up something from the cache, you hash the value that they give you as the key.
Then go to that point in your hash tables array and find it.
And then sets was another option when you might want to
use these. I think it's worth calling out cache specifically, just because that's a very common
use case for it. If you're doing like something, a REST API, like one common approach might be to
get the REST call and you take those arguments and you kind of put them in one big string and say,
have I seen this in the last 30 seconds or something like that? And then if you've seen
that, if you've got that object in the hash table memory, then you can go ahead and return it without going and hitting a slower service.
And so it's really common to use it in those types of scenarios.
And it's all about coming up with and managing those keys.
Then on your end in the background, this should be using a really fast, efficient data structure for storing that data.
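As a sketch of the cache idea just described: one big string key built from the request arguments, a hash map behind the scenes, and a 30-second freshness window. All the names here are invented for illustration, and the clock is passed in explicitly to keep the sketch easy to check:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal TTL cache: key the hash map by a string built from the
// request's arguments, and treat entries older than 30 seconds as misses.
public class TtlCache {
    private static final long TTL_MILLIS = 30_000; // "seen in the last 30 seconds"

    private static class Entry {
        final String value;
        final long storedAt;
        Entry(String value, long storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    public String get(String key, long now) {
        Entry e = entries.get(key);          // O(1) expected hash lookup
        if (e == null || now - e.storedAt > TTL_MILLIS) {
            return null;                     // miss or expired: caller hits the slow service
        }
        return e.value;
    }

    public void put(String key, String value, long now) {
        entries.put(key, new Entry(value, now));
    }

    // One big string key from the call's arguments, as described above.
    public static String keyFor(String endpoint, String... args) {
        return endpoint + "?" + String.join("&", args);
    }
}
```

A production version would also need eviction and thread safety (for example, a ConcurrentHashMap), but the core is just the O(1) keyed lookup.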
This episode is sponsored by Datadog.
You've heard us tell you about Datadog.
You know they're a software-as-a-service monitoring platform that provides developer and operation teams with a unified view of their infrastructure, apps, and logs.
But did you know about these features?
Like Watchdog.
Watchdog automatically detects performance problems in your applications without any manual setup or configuration. And by continuously examining
application performance data, it identifies anomalies like a sudden spike in your hit rate
or something that could otherwise have remained invisible. So once an anomaly is detected,
Watchdog provides you with all the relevant information you need to get to the root cause
faster. Things like stack traces, error messages, and related issues from the same time frame.
Or what about Trace Search and Analytics? Trace Search and Analytics allows you to explore, graph, and correlate application
performance data using high cardinality attributes.
You can search and filter request traces using key business and application attributes,
such as user IDs or host names or product SKUs so you can quickly pinpoint where performance issues are originating and who's being affected.
Tight integration with data from logs and infrastructure metrics also lets you correlate
these specific trace events to the performance of the underlying infrastructure so you can
resolve the problem quickly.
And don't forget about logging without
limits. Logging Without Limits is the thing where you can cost-effectively process and archive all of your logs, and then later decide on the fly which to index, visualize, or retain for analytics in Datadog. Now you can collect every single log produced by your applications and infrastructure
without having to decide ahead of time which logs will be the most valuable for monitoring, analytics, and troubleshooting.
Datadog is offering our listeners a free 14-day trial with no credit card required.
And as an added bonus for signing up and creating a dashboard, they'll send you a Datadog
t-shirt.
Head to www.datadog.com slash codingblocks to sign up today.
All right.
So it's that time of the show when we ask you,
if you haven't already,
please do go leave us a review.
You know,
we say that it puts a smile on her face and it totally does.
And actually somebody left us reviews.
They left the review to put a smile on her face because we put one on
there.
So,
you know,
that's awesome.
I mean, wasn't that nice? That put a smile on my face. Imagine that.
So, yeah, I mean, if you ever find yourself
and you're in front of your keyboard
or on your phone and you're bored
and you're thinking, hey, you know what?
I meant to leave those guys a review.
Please do.
It makes our day and we really appreciate it.
And it's a nice way to give back.
So thanks.
Yeah.
And also to,
you know,
leave it.
We greatly appreciate the reviews like Alan said,
but you know,
you could also spread the word,
share coding blocks with a friend,
tell,
you know,
let one of your coworkers know about the show.
You know,
expose more people to it.
And it inevitably you're going to say,
yeah,
there's this podcast and somebody's going to look at you sideways and be like, what's a podcast?
And then you can explain to them, oh, it's really not that big of a deal, right?
So, yeah.
It's even a good icebreaker for people that haven't been introduced to the podcasting world.
It's like TiVo for radio.
Wait, what's TiVo?
And then you can turn them on to Serial and Hardcore History and a bunch of other things that they'll love.
So, yeah.
Wait, cereal?
You could definitely be the life of the party.
You're like, oh, God, here comes the podcast guy.
He's going to tell us about Koenigsegg again.
The podcast guy.
I've been that guy.
All right.
Well.
Whoa, excuse me.
Yeah, you got to be ready to do this.
You can't mess this part up.
I can't mess this up like my Lil Jon impersonation.
So this is time for my favorite portion of the show.
Survey says.
All right.
So in episode 94, we asked, what do you value most in a job?
And your choices were: pay, it's all about the Benjamins. Or tech stack, I need to remain interested. Or commute, or the lack thereof; I mean, I love listening to Coding Blocks and all, but a new set of tires every month is ridiculous. Or location, location, location; it's just like real estate. Or team, I need to be surrounded by people better than me so I can grow. Or industry, I want the type of problems I solve to matter to me. Or benefits, I like to take off for the summer. And lastly, work-life balance.
I have a life outside of the office.
All right, I believe Joe went first, I think.
So let's start with you, Alan.
What do you think?
Man, this one's tough because there's a lot of, like, I'd say a lot of these, at least personally, speak to me.
I'm going to go with, for most people,
pay. It's all about the Benjamins.
I'm going to say
30...
35%.
Okay.
I like where your head's at.
For me, I think the location's
going to be the biggest factor. Either
wanting to work from home or having something where I'm going to guess that most people don't want to move unless there's plenty of Benjamins.
But I'm going to say location at 30%.
You both are committed to your choices.
Alan with pay at 35% and Joe with location at 30%.
I have that right?
That's correct.
Drumroll, please.
You're both wrong. We're both wrong, really?
I'm super
surprised, and I'm curious to see
where you're going to come in on this,
Alan, because you were super opinionated, and we're
just dying to spill your
opinion on this at the time when we were doing
episode 94, if you recall.
But, no, work-life balance, far and away, at 30%. You know, everything else was kind of scattered among it, but yeah, it was clearly, you know, the one that walked away with it.
That one I thought would be up there. I didn't think it'd win. And I wonder if that's because everybody's already paid well, so now they're on to the other thing.
Yeah, you listen to this show and you're an A-tier.
So you're already making the bucks.
So yeah, it's about that work-life balance.
If you're listening to this podcast, this two-and-a-half-hour podcast,
while you're on your downtime, while you're commuting or washing dishes or whatever,
then yeah, chances are you're working too much.
Yeah, I don't know, man. For me, commute's a big one, right? Living in the Atlanta area and knowing that you can give half your life away to the gods of the highway.
So that's what we're calling them now? The gods?
Yeah, I guess. Um, yeah, I honestly thought that...
I smite thee!
What was number two? So if work-life balance was number one at 30%, you said.
Yeah, I mean, then from there, tech stack was surprisingly number two at 21%.
Wow.
That's shocking.
Team was close, though, at 20.
Really?
So there wasn't a big difference between those two.
And that one was so hard for me, because I'm like, really? Because how do you judge the team? Unless you know, because they're already friends of yours, or people that you've met or whatnot, and so you kind of already have an idea of who they are. Unless you're just making an assumption, like, oh, it's Google, so I just assume they're going to be, you know, a bunch of smart people, for example. But otherwise, it's like, well, how do you know who the team is, to use that as your determination?
I don't know. I personally thought this was going to be a silly survey. I thought far and away the winner would be pay. It would be, like, 98% pay, and, like, two people are going to make a joke about something else, right? But I was really surprised to see that, you know, pay didn't rank as high. Honestly, you know what, and maybe this is just something about our audience, I'm wondering if a lot of it really is that people are getting paid pretty well, because they've taken the time to invest in their skills by listening to podcasts or reading books or doing courses or just constantly improving.
So that's almost like, yeah, that's coming, right?
Now I need to focus on what else is important to me.
Oh, you know, if you're in our Slack, then you can go to the salary survey channel.
And we've got a little survey.
So you can go ahead and you can already see what people put in there.
You know, it's, of course, all anonymous. So you might be interested in that.
Yeah.
If you're not anything,
what was the lowest though? I have to know.
Yeah.
I'll tell you, but I did want to just finish up this one thought, though. Along that line, you kind of hit on something: maybe it's just our industry, you know? Maybe we're asking the wrong industry. So maybe people within our industry, these are the things that would matter; like, work-life balance may rank higher. But if you were to go across industries, then maybe pay would rank higher. I guess what I'm saying is, if the survey was more general-purpose, across all of the industries, then maybe pay would rank higher, or maybe I wouldn't be surprised.
But to your question, benefits was the last.
And I think that's what you would have said.
That's what you were wanting to say last time, Alan.
Am I wrong?
Was benefits not the…
I don't think so, no. I think for me, it was probably going to be either commute or pay. I was pretty sure it was going to be one of those two.
Oh, really? I swore it was going to be benefits.
Yeah, I mean, the benefits thing, I don't think it's ever anything somebody's, like, striving for, to get the best benefits, although it could make or break the deal when they go to a particular spot, right? So, I don't know.
Yeah. Well, that's really interesting. So, yeah, good to know.
All right. Well, and by the way, you can see that whole pie chart by going to the show notes and voting. Then you'll be able to see it, actually, right after you vote.
Episode 94?
Oh, for that one, yeah, that was episode 94. And all surveys that we do, once you take the survey, you see the results of it, so you don't have to wait so long to see at least a few results. Although it's much funner to hear it.
Sure. All right.
So, I was at DataSciCon last week, and, obviously, there were all kinds of amazing talks, but there was one little thing where I was like, oh, this is going to be so fun, I've got to save this for the show. So, a little fun, quick game that I wanted to play with you two, just to talk about the magnitude of data. This comes from, well, I'm not going to tell you the source yet, because I know Alan's already at the keyboard. He's going to go look for it.
I would never.
What? I just saw him reach for his mouse.
But so, if I were to ask you how much data is generated every minute. All right, and this is as it relates to 2017.
How much data was generated every minute?
Let's talk about first.
What would be a big one?
Okay, here's a good one.
Let's pick Amazon.
How much money in sales did Amazon make every minute for 2017?
Golly.
Oh, it's going to make you sick.
$100 million.
$100 million every minute.
In sales.
We're not talking about revenue.
We're talking about sales.
Yeah.
Joe?
A million a minute.
Wow.
Jeff Bezos would really like you guys' numbers. It was $258,751.90.
I way overblew this one.
I figured the costs were high.
You guys overshot it by a multiple of four.
Everyone at Data Science Con
is shaking their head right now.
It's like, you guys.
If the rest of these are going to go like this, you're going to be depressed by every answer I give.
So I'm just going to go ahead and throw that out there.
All right.
We're going to shoot further down.
Okay.
800 petabytes.
Okay.
No, sorry.
Go ahead.
How many tweets per minute did users send on Twitter?
God.
$400 per minute.
Wait, what was the question?
750,000 tweets per minute.
750,000?
Per minute.
One million.
456,000 tweets per minute
went out across Twitter.
That is pretty crazy.
That's a lot. I understand tweets. I don't across Twitter. That is pretty crazy. That's a lot.
I understand tweets. I don't understand dollars.
Here's a couple of interesting ones.
What about
how many spam emails
do you think were sent
every minute of every
day in 2017?
750,000.
Is that going to be your answer from now on?
How many spam emails per minute?
10 million.
No, actually, I'm going to say 100 million spam emails per minute.
Absolutely. I at least get a million a day.
Well, I'm going to put it to you like this, and I'm going to break your heart, because he is way closer than you were. Really, way, by orders of magnitude, closer. It was 103,447,520 spam emails sent every minute of every day for the year 2017.
Dude, that's ridiculous.
How is the internet even running?
Because that server's relaying that stuff.
Let me put this into context for you.
This is really going to make you feel depressed about the use of the internet. For Google searches conducted, there were 3,607,080 Google searches for every minute of 2017.
Wow.
And it was a factor of 30. There were 100 million more email spams being sent than there were Google searches.
Good Lord.
That's crazy, right?
That's ridiculous.
Let me tell you, if they use my definition of spam email, which is just about anything that wasn't sent from a human to a human, then it's probably way higher.
So here's one that I didn't expect to see.
How many forecast requests do you think that the weather channel received?
Ooh, just for, total for 2017?
For every minute. We're doing every minute. Every one of these questions is going to be relative to a minute.
500,000.
500,000? Two mil.
I gave you guys a hint by saying that I wasn't expecting this one. It was 18,055,555 requests every minute.
Jeez, man.
I got three devices on my desk on right now.
Four that are all probably checking the weather right now.
Right.
All right. Let me, I don't want to go through every one of these. I'm going to, like, pull out a couple more. So I've got three last ones that I want to say.
All right.
YouTube videos watched per minute.
In 2017?
Yep.
Watched per minute.
5 million.
400.
1 million.
Joe wins.
400 was the correct answer.
Yes.
No, you're pretty close, Alan.
4.1 million.
Nice.
Yeah.
And that number is just going up. Yeah, sorry, $400 is how much they pay
out per year to, uh, people who post their videos there. What about, what about, uh, text messages sent? 10 million per minute.
10 mil.
Oh, you're both going with 10 million.
Okay.
I like it.
15 million text messages per minute.
Wow.
I'm rounding down because there was some more to that number.
Last one that I'll say, I'll include a link to this in the show notes, but the last one
that I've got here is, how much internet data did Americans use?
Per minute or like total?
No, every one of these is per minute for every day of 2017.
What was the question?
How much? How much internet data did Americans use every minute of every day of 2017?
I don't know the number.
It's like 250 Potterbytes.
Petabytes.
Potterbytes.
No, I'm the one after that.
Oh, the Harry Potterbytes.
The Harry Potterbytes.
I'm going to go with 10 terabytes.
Okay, so they put the number in terms of gigabytes, and so that's how I'm going to read the number.
It would be 2,657,700 gigabytes every minute of every day.
That's what I said.
I know, Joe, but I couldn't give it to you like that.
I had to take some of the steam away from you.
Yeah, the translation is magical.
That's crazy, though.
That's crazy.
Right?
Yeah, I thought you guys might enjoy that a little bit.
So, again, there'll be a link to that in the show notes so, um, you know, the listeners can play along at home. Hey, so you said that was two million
gigabytes, right? Yes. That's two petabytes is what that is. Oh yeah, Google translate. So just
in case you wondered, the next one after it is not the Harry Potterbyte. It's the pebibyte. Never heard of it.
So the Pebibyte.
Yeah, because I should have thought about that.
Yeah, you're right.
Yeah.
So next time you're mad at your cable company or, you know,
wherever you're getting internet access,
just remember what they're putting up with.
Oh, you know.
You're paying for that, man.
You get all of it.
Oh, yeah.
I'm shilling.
They're sliding me them dollars under the table for
repping.
All right, so
we'll wrap this up by saying that
today's survey
will be, we're heading into the holiday season,
so you got to start making some
important decisions here. Namely,
how do you plan
to spend your time off
this holiday season? And your choices are
spending time with the family because the holidays are all about the memories. Or
I'm not avoiding the family. I'm building my next great project. Or escaping the family and the keyboard?
Or lastly, wait, what time off?
Have you guys got in your minds what the answers are going to be?
You can't ever do that.
We're not going to tell it.
Just wondering.
All right.
Well, I know my family's not listening.
So I feel like it's okay for me to say that I plan on spending a lot of time with my computer.
Because the computer's always listening, by the way.
So you should never talk bad about it.
That's right.
Hey, you guys want to hear a roof joke?
Okay.
The first one's on the house.
Oh, God.
Oh, yeah. That's right. There were some jokes
that we had, too.
That was really awful. That's great.
What do you call a pile of
kittens?
I don't know. Perfection?
A meowton.
Oh, boy.
Here's one
why are teddy bears never hungry
because they're always stuffed
oh boy
well in the interest of being done
before three hours this episode
move along
these are ones that you can share with your family
they're safe
This episode is brought to you by Manning.
Now, I just purchased the physical book, Kafka in Action, from Manning.com,
which is really nice because it also lets me access the e-book immediately.
And I saved $18 because I used the code CODBLOCK40.
And best of both worlds because I get the physical and the digital.
But they don't just have books. I also recently watched, again, Zach Braddy's React in Motion course, which is really great. And he's a really funny and smart guy. So that was really awesome.
And they have this really cool feature that I don't think I've seen anywhere else,
where as you're watching the video, it actually, you can highlight the text. So you can see the
words as you're listening and watching what's going on on the screen there. And it works even if you speed
up the video, which is something I definitely do. Manning is running a special promotion this
December. The countdown to 2019 will run on manning.com all the way through the end of December.
Answer just a single question every day and you'll be in the running to win free eBooks,
videos, and even a whole year's worth of new releases. Plus every week, everyone will get
to enjoy massive discounts on Manning products. All you need is to sign up on Manning's deal of
the day at manning.com slash mail dash preferences, and you're good to go. Again, that's www.manning.com slash mail dash preferences
to sign up. And be like Joe. While you're at manning.com, take a moment, shop around,
find the next great book that you want to read and use the code C-O-D-B-L-O-C-K-40 to save 40%. All right, so now we're going to talk about dictionaries.
And much like the hash tables, actually, they're pretty much the same thing as hash tables.
The main differentiator being that with hash tables, in the Wikipedia computer science
definition of the hash table data structure, the hash table holds key-value pairs and those values
are untyped.
So a dictionary is the same thing,
except we know what type it is.
And so there are a couple little shortcuts
that we can take
because we know the size of that ahead of time.
And like we mentioned earlier,
when we talked about hash tables
and C-sharp and Java,
there's no good reason that I'm aware of
to use one of those untyped data structures when you have a
strongly typed option available. If you're working in something like JavaScript, you don't really
have this sort of differentiation. It doesn't make sense because it's a loosely typed language.
But in something like C Sharp or, you know, I don't know what C++ has available here,
but anything like that, you're going to want to use that strongly typed option if you can get away with it.
And you pretty much always can.
I don't think I've ever used a hash table in C Sharp.
What about you?
Ever?
Oh, definitely.
Yeah, back in the C Sharp, I think it was before 2.0 days.
Yeah, they didn't have dictionaries.
They didn't have the typed ones.
So all you had were hash tables back in the day.
Yeah, and you just cast it on the way out.
Yep.
Yeah, that stinks.
Yeah, and so they even look really similar, you know,
and in C Sharp you would do like a hash table,
or var myThing equals new Hashtable, no types there.
Dictionary, we've got generics, which we mentioned.
And if you're not familiar with generics,
basically you've got these cool angle brackets where you specify the types.
So you say a dictionary, and my key is, say, an integer,
and my value is a string.
And you can change those items in the brackets.
So it's the same class underneath.
It's dictionary, but you're changing the strongly typed arguments
that can go inside of it.
So we could say a dictionary with a string is the key,
and a person is the value.
And you can get even crazier.
You don't have to use primitives for that first type.
You could say my dictionary has keys of a person,
and the values are strings.
So kind of interesting.
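[Show notes: a rough sketch of the typed-versus-untyped distinction being described. It's written in Java rather than C#, since the same split exists there (a raw Hashtable versus a generic Map); the names and data are made up for illustration.]

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class TypedVsUntyped {
    public static void main(String[] args) {
        // Untyped: a raw Hashtable accepts any Object for keys and values,
        // so the compiler can't catch mistakes for you.
        Hashtable raw = new Hashtable();
        raw.put("one", 1);
        raw.put(2, "two"); // compiles fine, even though the types are mixed

        // Typed: the angle brackets (generics) pin down the key and value
        // types at compile time -- same idea as Dictionary<int, string>.
        Map<Integer, String> byId = new HashMap<>();
        byId.put(1, "Abigail");
        // byId.put("oops", 2); // would not compile

        // Keys don't have to be primitives/wrappers; any type with sane
        // hashCode()/equals() works, e.g. Map<Person, String>.
        System.out.println(byId.get(1)); // Abigail
    }
}
```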
I don't think I've ever used a complex object as a key before
I don't know. Is that... well, I mean, would you call a string a complex object, though?
Uh, no. I mean, it's not a primitive, so I guess, I guess it is. I definitely use strings there. I was
just trying to think, like, a customer object or something like that. I think I have. I'm pretty sure I have. I can't think off the top of my head why, but something like that is kind of where I'm thinking.
But yeah, if you had something like a customer object or an order object, right,
then maybe you had something that you were trying to associate with it in your dictionary.
So if I've got like a sort of code that takes in a bunch of customers and it counts up the number of orders, then I might have a hash table.
And as I loop through the orders, I might say, okay, this customer, if I've seen him before, just go ahead and add to the number of orders.
If I haven't seen him, go ahead and initialize that spot in that, the dictionary with
a value of one for that order. And so as I loop through the orders at the end, I'll end up with
something that I can say, Hey, dictionary at a customer, Abigail, they have three orders.
That's going to be a fast lookup, which is really nice. And so in that case, I guess it is nice to
be able to use that complex object. So I can just say, just, just take the whole customer object.
And then that way I don't have to have have a separate data structure where I say, okay, the dictionary has the string
Abigail that uniquely identifies this customer. Then I go over to some other dictionary and then
look up the object based on the key of Abigail there again. So that would be pretty awkward.
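[Show notes: the order-counting idea described here might look something like this in Java; the OrderCounts name and the customer data are hypothetical, just for illustration.]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderCounts {
    // Count orders per customer: one pass over the orders, with an O(1)
    // dictionary lookup for each one.
    static Map<String, Integer> countOrders(List<String> orders) {
        Map<String, Integer> counts = new HashMap<>();
        for (String customer : orders) {
            // merge() initializes the slot to 1 the first time we see a
            // customer, and adds 1 on every subsequent sighting.
            counts.merge(customer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countOrders(List.of("Abigail", "Alan", "Abigail", "Abigail"));
        System.out.println(counts.get("Abigail")); // 3
    }
}
```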
So I guess it's really nice. I've just never really thought to do that. So I'm kind of curious, because the way you started with, like, the hash table versus the dictionary, right?
And it was about, like, specific implementation, though.
Like, specifically in, like, a C-sharp implementation, right, where, like, hash tables, you know, are based on object and dictionary are based on generics, right? But what about – well, for example, in the imposter's handbook, right?
Like he says the specific difference was that the dictionary, I don't know how, but is able to guarantee that it's going to have a unique index.
Each item is going to have a unique index, so therefore it's O of 1.
That was the performance gain of the dictionary over the hash table.
So I can tell you a little bit about that.
From what I've read, specifically in C Sharp, that is the case.
And I'm not too well-versed in the whole underlying implementation, but I do know
that underneath the covers, there are
differences with these two data structures.
Wikipedia definition-wise,
the only difference between a dictionary
and a hash table is that a dictionary is
strongly typed and you should use it. Implementation-wise,
if you look under the hood in the CLR
for C Sharp,
then you can just read about it.
That's the way to really do it. We'll have some links there.
The hash tables, that's the untyped versions,
they use a rehashing scheme.
And so we kind of talked about a couple of those earlier where it basically says, okay, the spot's full.
Let me try rehashing it and find another spot.
And I keep doing that until I find an open spot.
And then I plunk my item in.
And the dictionary uses chaining,
which is like the linked list idea
where we say, okay,
we've already come to this bucket before.
Let's go ahead and chain the items.
However, under the covers,
C sharp for whatever reason,
I've never really found a good reason
why they did this.
But in the way they implemented
their hash table,
they made certain
to have an underlying data structure
that had one bucket
for every item in your hash table.
And so they end up more or less guaranteeing that you always have a small number of items in that list.
Does that make sense?
So if you have 10 items in your hash table, you've got 10 buckets. If you have 1,000, then they give you 1,000 buckets and they hash your key down to one of those X number of buckets.
And so it keeps the ratio really good or close to one, I guess.
So the number of available buckets is always equal to the number of items that you're storing. I mean, where I was kind of getting tripped up mentally in my own head was just that in
that the book that we referenced, The Imposter's Handbook, there wasn't a lot of detail about
that implementation of it.
So I was like, well, that's a bold claim.
I mean, I'm not saying that it's wrong, but that the dictionary has a unique key for accessing any given value.
According to the book, there's not going to be a collision in the dictionary.
So conversations about like chaining, separate chaining or open addressing or rehashing are moot because there's not going to be a collision. So what I read was basically that the length of the list that it uses to store the collisions
is never going to exceed the bucket size because the buckets are always guaranteed to have the same number of buckets as you have elements.
So it ends up basically keeping a good ratio, a load factor I think they called it,
of, uh, nodes to buckets, that ends up, uh, you know, mathy, mathy hand-wavy stuff, ends up keeping
the ratio really low. And so, am I saying this right?
You can never have a longer list than you have buckets.
Hmm.
And I'm going to do a terrible job of explaining this because it's way over my head and very complicated.
But that's kind of the secret there of how they end up guaranteeing performance for the dictionary.
They basically have a specialized algorithm where they keep a good ratio of the length of the list and the number
of buckets. But as for how they do that, that's way over my head. So here's something interesting
on stack overflow about the collisions with dictionaries and the fact that how they're
handled. Basically, it uses GetHashCode. So the answer is basically saying, you know, don't assume. You could even do, like, an IL peek on these types of things and look at what the implementation is.
But don't think that that's what it's always going to be, right?
Like they could change it in the next version of .NET how they implement it. But they're actually saying that if you have an object that's being put in there, the GetHashCode method must return the same value for the lifetime of the object.
Right.
So that's basically how they're ensuring that these things go to the right place.
So the Equals and the GetHashCode are the important parts of this.
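[Show notes: the same contract exists in Java -- a key's hashCode() has to stay stable for the object's lifetime and agree with equals(). A hypothetical Customer key might look like this:]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CustomerKey {
    // A hypothetical immutable key type. Immutability matters: if the fields
    // feeding hashCode() changed after insertion, the map could no longer
    // find the entry in the bucket it was originally hashed into.
    static final class Customer {
        final String id;
        Customer(String id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof Customer && ((Customer) o).id.equals(id);
        }
        // Must return the same value for the lifetime of the object,
        // and equal objects must return equal hash codes.
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<Customer, Integer> orders = new HashMap<>();
        orders.put(new Customer("abigail"), 3);
        // A *different* instance with the same id still finds the entry,
        // because equals()/hashCode() are what define key identity.
        System.out.println(orders.get(new Customer("abigail"))); // 3
    }
}
```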
And so when a collision happens,
we store more than one element in the bucket,
which affects the time we need to look it up because now we need to traverse this linked list
in order to find our item.
Now, in order to make that look up fast,
they keep the ratio of the number of the items in the list
and the number of the buckets in the container low.
That's the magical mathy part that I just don't understand.
I've looked at the algorithm a couple of times
and read a couple of articles
and I don't understand why that ratio is important
or why it guarantees things are fast,
but I did not see anything that said
that you wouldn't have a list.
Like in fact, I did read a lot
about this special kind of load factor
that definitely refers to the length of the list in comparison to the buckets.
Wait, what were the nodes again?
So I definitely.
What were the nodes again?
You had the nodes and the buckets.
It was a ratio of the nodes to the buckets.
But what were the nodes?
Load factor.
The node is the load factor?
The load factor.
Well, the load factor is the ratio of the nodes to buckets.
It's the ratio of the length of the list to the number of buckets.
Oh, okay.
So if you want a one-to-one, right, is what you want.
I got you.
Yeah, yeah, what Alan's saying.
Either that or you want it to be uniform
is what we're saying.
So if it was like going back
to the hash table definition
that I gave earlier, right?
Like if the underlying structure was an array,
then at each index in that array,
you want it to be a uniform thing
that are in each of those index.
So each one of those indexes is a bucket, right?
And if it was like a separate chaining,
then you have a linked list of buckets per index in that array, right?
So you want the same number of buckets being pointed to by each index in the array
so that it's balanced, right?
So the load factor should be close to one.
And what I read is that the way that they kind of manage it and the way that they do
the chaining is that it was so that they add a new bucket every time you get a new item
and they do this in order to keep that ratio positive.
As to how they guarantee things get split up correctly, I just don't, I plain flat out
don't get it.
I don't understand what adding more buckets has to do with making things faster other than it improves this load factor number because the number of the buckets is now getting higher as the potential list of the length of the list is getting higher.
So it keeps that ratio really, I guess, at worst one.
But I don't understand why that's desirable.
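[Show notes: for what it's worth, Java's HashMap exposes this knob directly. Per its documentation, the table resizes, doubling the bucket count, once the entry count exceeds capacity times the load factor, which is what keeps the average chain length short. A sketch of that threshold arithmetic, using the documented defaults:]

```java
public class LoadFactorSketch {
    public static void main(String[] args) {
        int buckets = 16;          // HashMap's default initial capacity
        float loadFactor = 0.75f;  // HashMap's default load factor

        // The map resizes once size exceeds buckets * loadFactor, so with
        // the defaults a 16-bucket table rehashes after the 12th entry.
        int threshold = (int) (buckets * loadFactor);
        System.out.println(threshold); // 12

        // Doubling the buckets keeps entries-per-bucket roughly constant,
        // which is why lookups stay O(1) on average even as the map grows.
        int afterResize = buckets * 2;
        System.out.println((int) (afterResize * loadFactor)); // 24
    }
}
```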
We'll have some links in the show, though.
I mean, I don't think it's something we're going to figure out because it's definitely pretty hairy in there.
There are some really nice articles on it.
And if you understand it, Chumac84, I'm looking at you.
I could definitely use your help on trying to figure out why.
Because what I'm thinking is there's a decision that Microsoft made here.
And so we've got hash tables up here and dictionaries.
And we're choosing to implement dictionaries, the one that came out later differently. So either that's because
they figure out the chaining is actually better, but they didn't want to change the original
implementation for backwards compatibility reasons. Or they said, you know what, because
maps are sorry, the dictionaries are strongly typed. We can do a little bit of extra mojo
that helps us balance things better.
And,
you know,
maybe allocate stuff in a different way.
Like maybe that's why they can create those buckets of a uniform size and
know it's going to be like,
I don't know the answer to that.
That's just me speculating.
I never did find authoritative sources that said,
Hey,
dictionary does it differently because it's better.
I mean,
they could,
I am curious though.
Cause like one of these has to be wrong, right?
Because going back to the book, it was saying that it says the dictionary is exactly like a hash table, except it has a unique key for accessing a given value.
And so collisions are not something you have to worry about with a dictionary. Right.
But yet, you know, to your point, like the documentation from Microsoft says, you know, it talks about employing alternate collision resolution strategies.
And even I questioned, like, how could you guarantee a unique key?
Like, how does that how could that possibly work? So my inclination is to think like, well, I guess the handbook might be wrong about that part or maybe I'm misunderstanding that part.
But then on the other little, you know, I got like a little devil on each shoulder, right? And so then we're saying like, well, I mean, when Microsoft decided in.NET 2.0 to create the dictionary class, they could have just said, well, we already used the hash table class name.
So we already have a class named hash table, and we want to have something very similar to it.
So I guess we'll call it dictionary, but maybe they really implemented a hash table behind the scenes.
You know what I'm saying? Just similar to how like in Java, they started out with the hash table and then they
decided, I don't remember the version, to create a better version of it and they call it the hash
map, right? Well, I guess thinking out loud a little bit though, if you add keys, like if you
newed up a dictionary and you try and add a duplicate key, if it's a number, it'll yell
at you, right? Like it's like, no, you can't do that.
If it's an object, though, it would have to rely on the get hash code.
So I don't know.
Maybe you don't have collisions for primitive type keys.
I'm not sure.
Did you see anything on there?
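[Show notes: whether a duplicate key "yells at you" turns out to be API-specific. C#'s Dictionary.Add throws on a duplicate key, while the indexer assignment silently overwrites; Java's HashMap.put quietly replaces the old value. A quick Java illustration:]

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeys {
    public static void main(String[] args) {
        Map<Integer, String> m = new HashMap<>();
        m.put(42, "first");

        // put() on an existing key doesn't error; it overwrites the old
        // value and returns it. (C#'s Dictionary.Add would throw here.)
        String previous = m.put(42, "second");
        System.out.println(previous);   // first
        System.out.println(m.get(42));  // second
        System.out.println(m.size());   // 1 -- still only one entry per key

        // putIfAbsent() is the "don't clobber the existing value" variant.
        m.putIfAbsent(42, "third");
        System.out.println(m.get(42));  // second
    }
}
```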
The computer sciency, though, definition.
Well, if we're taking the way it's written here in the handbook as the, quote, computer-sciencey way, right, where there's not a collision, right? I mean, then it's
not... yeah, I don't know, man. I'm taking this thing at face value, but maybe that part is
not accurate. I mean, looking at the Microsoft docs, it doesn't sound like it. And, I mean, I don't know, but
i don't know how to make that. I'm trying to decide
like what do I make of that?
Are they talking about specifically from the name of their
class, right, and how they implemented
their class or are they talking about like
here's how the computer science-y
definition of a dictionary.
Yeah. Anyway.
My takeaway is that the Microsoft
documentation like clearly
says like, hey, we do chaining for dictionaries
and we do re-hashing for the hashes.
So everything I know about chaining
says that it's using a linked list underneath.
So maybe they're just getting a little loosey-goosey
about what they define it to be a collision.
Maybe they're saying it's not a collision
because no matter what,
you're always getting a pointer to a linked list back.
And so you don't really know if you're colliding or not.
You basically get a pointer and you throw it on there.
And so when you go to look stuff up, then I'm speculating again,
but I don't really know why there would be discrepancy between the book and the documentation.
But I have no reason to think that the documentation is wrong.
There's several articles and stuff that I read specifically on that keeping of the list.
Yep.
So, I don't know.
But it's really interesting. I think like any
language that you use, like if you take like a
hard look at the data structures, like even ones
you think you know, like arrays, for example,
you're going to find some really
interesting stuff underneath the covers.
Might I point
you to float?
Yep, float. And you know, JavaScript has objects, but they're basically like hash tables, so, you know, that's something. Or
also referred to as associative arrays. Do you guys see anything about, uh, associative arrays
and how they differ or are the same? Is it just a synonym for hash?
when I searched for just, you know, if you go to associative array on wiki, so en.wikipedia.org
slash wiki slash associative underscore array, the dictionary type redirects there,
associative container redirects there, map redirects there.
So there's several
other terms that are very much in line
with what we're talking about, right?
So an associative array slash dictionary
basically the same thing.
Yep.
Yeah, going back to the
handbook for a moment,
you know, we've had all this conversation
about arrays in JavaScript,
right.
And things like that.
And it was actually saying like,
okay,
well now that you know about this,
actually arrays under the covers are just a dictionary in JavaScript.
Yeah.
Sometimes.
Exactly.
Until it's not.
Yeah.
And then it's like an associative array,
which is a dictionary.
Right.
Yeah.
So as for the pros,
it's basically the same as the hash table
except in static languages
you can be a little bit
more efficient
because there's no
boxing necessary.
We did episode way back
on boxing, episode two.
So operations are safer
and the errors are
caught at compile time
and perhaps maybe there are some performance gains to be had because, you know, presumably you know the size of the objects that you're storing as keys and also as values.
So maybe there's something there, although I don't have an authoritative source on that.
Cons are the same as the hash too.
And there's something about a class resolution strategy that I don't know about.
Who put that in there?
Was that you, Outlaw?
The class resolution strategy?
What?
Yeah.
I didn't.
Okay.
That was some sort of –
Maybe that was supposed to be like different conflict or collision resolution strategies
and class got written instead?
I think maybe we had a collision on the Google Docs document
and we have an errant line posted here.
It's moving right along.
When to use this?
Basically, it's the same as a hash table.
So whenever you need a hash-like data structure,
but you want that type safety
and you're working in a language that has it,
you need those fast inserts, those fast deletes, fast lookups.
And another case I mentioned there was like if you have sparse data, so you don't necessarily want to pre-allocate the whole universe of what you might use if you know you're only going to be using a small percentage.
And this is a really memory-efficient data structure that's still going to give you fast random access.
Yeah, I mean, kind of going back to my summary of the hash table, though,
like when to use the dictionary is like always.
Almost always, yeah.
You should prefer the dictionary over the hash table would be my opinion,
at least in like a language like C Sharp, for example.
And one thing I don't think we touched on earlier is that hashes and dictionaries don't
generally support ordering.
So if you add the keys, Abigail, Alan, and Brad, and then you, uh, and outlaw don't leave
you out.
And then you say, okay, um, get me the keys loop over.
I might get outlaw first and then Brad and then Abigail and then Alan, there's no type
of guarantee built into that data structure definition on what you're going to
get returned. Now, some languages, like, in particular, C Sharp has, like, OrderedDictionary
and an IOrderedDictionary interface, in which case they basically end up storing, like, some sort of
list or array data structure that keeps those things in the order that you sort them or that you
insert them in so they can return those to you. But that's gonna be a specialized form of this
data structure.
But I just wanted to call it out just to kind of highlight the fact that there are a lot
of variants that you might find in different languages or for very specific purposes that
are going to be similar to a hash table or dictionary that are going to be just a little
bit different for whatever reason.
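[Show notes: the same variants exist in Java, which makes the point concrete. HashMap makes no ordering promise, LinkedHashMap threads a linked list through the entries to preserve insertion order, and TreeMap keeps keys sorted. A small sketch, using the names from the example:]

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrdering {
    public static void main(String[] args) {
        // Plain HashMap: iteration order is unspecified and may change.
        Map<String, Integer> plain = new HashMap<>();

        // LinkedHashMap: same O(1) hashing, plus a doubly linked list
        // threaded through the entries to remember insertion order.
        Map<String, Integer> insertion = new LinkedHashMap<>();

        // TreeMap: keys kept sorted (a red-black tree, so O(log n) ops).
        Map<String, Integer> sorted = new TreeMap<>();

        for (String name : new String[] {"Outlaw", "Brad", "Abigail", "Alan"}) {
            plain.put(name, 1);
            insertion.put(name, 1);
            sorted.put(name, 1);
        }

        System.out.println(insertion.keySet()); // [Outlaw, Brad, Abigail, Alan]
        System.out.println(sorted.keySet());    // [Abigail, Alan, Brad, Outlaw]
    }
}
```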
Yeah.
So, you know, I have a summary here of like the dictionary versus the hash table as it relates to C Sharp.
And much like Joe already said, the dictionaries are strongly typed, whereas the hash tables aren't.
So I've got an example of how you could get yourself into trouble with the hash table by mixing those types.
And in hash table, it would be perfectly valid code.
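[Show notes: the kind of trouble being described might look like this in Java, as a rough analogue of the C# situation; the names here are made up. The bad put() compiles against the untyped container, and the mistake only surfaces as a runtime cast failure on the way out:]

```java
import java.util.Hashtable;

public class MixedTypesGotcha {
    public static void main(String[] args) {
        // A raw (untyped) Hashtable happily stores whatever you give it.
        Hashtable grabBag = new Hashtable();
        grabBag.put("count", 7);
        grabBag.put("name", "Abigail");

        // The trouble only shows up later, when you cast on the way out.
        Integer count = (Integer) grabBag.get("count"); // fine

        try {
            Integer oops = (Integer) grabBag.get("name"); // compiles...
            System.out.println(oops);
        } catch (ClassCastException e) {
            // ...but blows up at runtime. A generic Map<String, Integer>
            // would have rejected the bad put() at compile time instead.
            System.out.println("runtime cast failure");
        }
    }
}
```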
Well, it would compile, it could work, but whether or not the validity of it would be what you actually want, we could argue. But, um, yeah, so the dictionaries in C Sharp are much like the Java hash table, where
you would have the key and value types, you know, as generics. Hey, I just had a thought.
What if you were trying to truly just store a collection of random things? You know, you didn't care
about strong typing. That's a good point. That's a good time that you might use the hash table over
the dictionary. Oh yeah, sure, right, where you want to have a mix. Kind of like in the last, maybe it was
the last episode, where I'd mentioned where you might have the array of pointers and each pointer
pointed to something different. Yeah, something like that. So a hash table might work there really well. Yeah. An example I like there is if you have an array and you want to
say, get the unique items out of it. One strategy for doing that is to loop through that array and
throw those items into a hash table. And whenever there's a collision or rather you try to store the
same key twice, then you might say, okay, this item's been duplicated. Either kick it out or do something special with it.
But in that case, all you really care about are the keys.
So you're throwing this thing into a hash table or a dictionary,
but the value is what, zero?
Is it true?
It doesn't matter because all you care about are the keys.
So that's kind of an interesting case there.
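[Show notes: that dedup trick, sketched in Java. When only the keys matter, a HashSet -- essentially a hash map without values -- is the idiomatic form; the UniqueItems name is hypothetical.]

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueItems {
    // Return the unique items, preserving first-seen order.
    static List<String> unique(List<String> items) {
        Set<String> seen = new HashSet<>(); // only keys, no values needed
        List<String> result = new ArrayList<>();
        for (String item : items) {
            // add() returns false when the key is already present --
            // that duplicate-key check is what does the dedup work.
            if (seen.add(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unique(List.of("a", "b", "a", "c", "b")));
        // [a, b, c]
    }
}
```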
Yep.
Here was another, just going back to this
Java versus C Sharp
the dictionary in C Sharp versus the hash table
in Java both of them being typed
it was kind of curious
that in Java the hash table actually
extends dictionary. Well, it's hash map,
right, in Java? No, no, no, wait, I'm not
talking about hash map. I'm talking about hash table,
if you look at the class.
You want to get confusing now, right?
Because our whole buildup here is we've used hash table to build up to our understanding of what dictionary is.
But in Java, hash table extends dictionary.
I just tweeted at Larry Ellison.
I said, yo.
What's up with this?
What gives? Wikipedia says the dictionary comes after;
the dictionary
is strongly typed.
Coding blocks on that.
Yeah, that's
right. Send your tweets to
Larry Ellison.
It's actually up on the Oracle doc, so
you're absolutely right.
The dictionary extends object
and the hash table extends dictionary to your
point.
Yes.
You should in Java prefer hash map over a hash table.
And I forget why there was like some performance optimization,
um,
that was made.
I'm not enough in the Java world to be able to like speak to that with,
uh,
any kind of authority.
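[Show notes: that hierarchy is easy to verify with a line of reflection. java.util.Hashtable really does extend the abstract java.util.Dictionary class, while HashMap -- the one generally preferred, partly because it skips Hashtable's per-method synchronization -- extends AbstractMap instead:]

```java
import java.util.Dictionary;
import java.util.HashMap;
import java.util.Hashtable;

public class Hierarchy {
    public static void main(String[] args) {
        // Hashtable's direct superclass is the abstract Dictionary class.
        System.out.println(Hashtable.class.getSuperclass().getSimpleName());
        // Dictionary

        // HashMap, the newer unsynchronized replacement, is not a Dictionary.
        System.out.println(
            Dictionary.class.isAssignableFrom(HashMap.class)); // false
        System.out.println(HashMap.class.getSuperclass().getSimpleName());
        // AbstractMap
    }
}
```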
But, uh, last point I wanted to add here though,
in this hash table versus dictionary kind of wrap up is that the JavaScript
object we talked about already could be used like a dictionary.
You can use it like an associative array.
Yep.
And I do many,
many times.
We all do,
especially for caching.
We abuse it.
Yeah, all the time.
Would you consider that an abuse of JavaScript?
No, that's a good use of JavaScript.
No, that's just what we do.
That's the appropriate use of JavaScript.
Appropriate use of JavaScript is to abuse it.
That's right.
Last episode, we talked about arrays and similars.
You know, I'm not going to be
implementing my own array, but for linked lists, like, I don't have a problem implementing my
own version of a linked list over using, like, C Sharp's built-in one. Now, I know about the
built-in one, I'll probably use it, but it doesn't bother me to think about somebody recreating a
linked list, or even a stack. Like, you can really easily make your own stack. So if you don't use
the built-in one, fine. JavaScript? Fine, just use an array and, you know, push, pop, whatever. You've got a stack there. Good enough. However,
hash table is one that I will not be writing on my own because I want that really good distribution
of those keys. And that's not something I'm confident in my abilities to do without spending
a lot of time researching something that I'm really not interested in knowing that much about.
So, yeah, we've got tons
and tons of resources that we'll link to
here in the show notes.
I mean, we've got a lot.
And with that,
it is time for my
favorite part of the show. It's the tip
of the week.
Alright, and I'll start us off.
We have a Slack. We talk about it all the time.
Go to codingblocks.net slash slack and you can send yourself an invite.
Hop on in there.
This is my favorite.
It's a lot of fun.
That's right.
One thing I'll tell you, though, is because we have a lot of people, a lot of active conversations going on, we have our data truncated by Slack all the time.
So you'll see a lot of channels that sometimes are just empty.
And so you might join an empty channel and think it's dead.
But really, it was really active two weeks ago.
And just three days have gone by and the Slack Langoliers have eaten it.
So one of those channels that sometimes goes dormant.
Yeah, you like that, Langoliers?
Yeah, okay, Langoliers.
I thought you mispronounced it or pronounced it some other way.
I'm like, wait a minute.
No, I thought it was pronounced Langoliers.
Am I saying it wrong this whole time?
Stephen King. He still hasn't
answered my tweet.
Anyway.
We have these channels that sometimes look
dormant, but sometimes they wake
up with a passion.
And one of those is hashtag
pet dash pictures.
It's lit right now.
So Jacob started the fire.
He posted a picture of his pupper laying on his back.
Arlene sent, uh, oh my gosh, hold on. Sorry, I started watching it.
Uh, Arlene posted a picture of a cat fetching a fork, which is ridiculous. Ridiculously cute.
And it sounds like Critter had his cat on a trampoline.
Anyway, it just went on and on and on. So the real tip is you should
join the Slack. And specifically, you
should join that Pet Pictures channel
because it's lit right now. It's awesome.
There's so many cute animals
in there that I'm dying. But so far, they're only
cats and dogs. And I know
you out there,
dear listener, someone has a cute
lizard or a squirrel or something, and I want to
see it and think about getting it while I'm working, because we all need that positivity
this time of year.
But wait, wait. Logan posted a picture of a dog that looks very angry. That was
the best picture he could get of it. It's an all-black dog.
I love this channel.
He's talking about how it's hard to get a good picture of it.
So, yeah, look at the teeth on that dog. Yeah, it doesn't look happy.
It looks kind of like your cat.
Whoa, talking about the man's dog? Come on. I'm not talking smack. It just looks
really unhappy. You need to pet your dog, Logan. Pet your dog.
Oh, boy. All right. Well, I'm going to
go warn them in the pet pictures channel about this episode coming out.
Like, Logan, listen, your dog comes up.
All right.
I'm sorry to say.
I guess.
Am I next?
Yeah, I'm next.
All right.
So I don't know how I've been on the interwebs as long as I have and been as interested in big data as I have.
As a matter of fact, I got a question or we all got a question.
I think I'm the only person that responded that was, you know, what are your current interests?
You know, like we know, we hear that you're C-sharp guys or whatever.
And so I posted some of my interests up there.
And a lot of them revolve around this whole big data and just data in general, right?
I'd never heard of Grafana.
You guys?
Sounds familiar.
A little bit.
Just because I was looking at the search talk I did a while back.
That was a really good example of visualizations based on time series type stuff.
So I know that it goes hand-in-hand with Prometheus.
It's the only time I've ever really seen it.
Yeah.
Yeah.
So this thing is awesome. It's an open source package that you can hook up to up to 51 different types of data sources.
51.
That's like a lot.
Yeah.
And you can literally just build dashboards on the fly.
So if you've got some time series data or something, you want to be able to visualize it,
hook this thing up to your Kafka or your Elasticsearch or whatever you want.
Like he said, Prometheus, um, CloudWatch, you know, probably any Azure alert type things.
And you can actually drag a panel up there, drag a chart up there, hook it up, tell it what the
columns are, and boom, you have this thing that you can go look at. And I believe it's even got
alerts on it. I can't remember, but yeah, totally. This thing's really cool. And it looks
like a really quick way to be able to start visualizing any kind of big data, or even
just data that you have access to. So check that one out.
Should we answer now about the, uh, things we're interested in? Or why not? I say you put the post up on the page. Um, I don't remember
where I put it, though.
I don't remember which episode that was.
But you can, yeah. I mean, you guys want to do it real quick what you're interested in thing?
Is it fast?
Is it fast?
How many things are you interested in?
I mean, like
Python, data science,
JavaScript, machine learning.
Okay.
Like those kind of topics.
I mean, topics we've talked about.
They shouldn't shock you.
Yep.
DevOps.
Yep.
What about you, Joe?
I don't understand the question.
Is that what we're interested in?
Yeah.
The types of things that you're currently like hot on learning about or just.
Oh, yeah.
I know that.
All right.
I totally know that.
Well, tell us.
Oh, he has to tell us? No, hold on. I'm thinking. No, yeah. I know that. All right. I totally know that. Well, tell us. Oh, he has to tell us?
No, hold on.
I'm thinking.
No, absolutely.
I am very super interested in, I really want to focus 2019 on two things.
Search engines.
So, Elasticsearch, Azure Search, Algolia, stuff like that.
I think you build a lot of really cool user experiences in it.
And the Jamstack, which we're going to be talking to you about
really soon here. I'm really interested in
Jamstack, and I'm worried about
my little kind of middleware island shrinking
year after year, and I'm deciding
to join them rather than be beaten by them.
So, running into all the things.
Really? Jamstack? I swear
I thought you were just messing with me when you said that
previously. No, man. That's my profile
of them two. Man, I've been changing everywhere.
The very first time he mentioned something about it, I assumed it was a reference to
Dance2Die's jam article, where he'd referred to Joe, Alan, and Michael as the jam.
Jam, yeah.
So I assumed that Jamstack was something like that.
Dance2Die hooks up.
Yeah.
So you're totally on board with that.
All right, interesting.
Yeah, Jamstack and Search.
All right, then.
Software for humans.
That's me.
All right.
All right.
So my tip of the week comes to us from Angry Zoot.
Thank you, Angry Zoot.
It is to add emojis to your file,
or, oh my God, if you use this in your code.
In Visual Studio Code,
by using the Windows key plus the semicolon, it'll bring up a little window where you can select
the emoji you want to use.
So you could say var foobar equals, and then put a smiley face in it.
Oh, that's amazing.
Or worse, you could do var smiley face emoji equals foobar.
In which case, I'm going to be like, why would you do this in your code? But you can do this in Visual
Studio Code without any add-ons, without any extensions, plugins, whatnot. Zoot, you're awesome.
Yeah, or all my variables, maybe. Because if Alan starts using these as variable names,
I'm going to be looking at it like, wait, smiley face plus frown face equals meh face?
Oh, that's amazing.
Right?
Smiley plus smiley equals super smiley?
Oh, dude, this is going to be great.
If smiley greater than zero.
Yeah.
Really, though, I am so on board with emojis everywhere, and it's Dance2Die.
He's actually the one who made those ticket templates for QIT, where it would say, like,
you know, steps to reproduce or environment, that sort of thing,
but he used emojis, and it really broke it up visually.
And so you could kind of look at this,
what would otherwise be kind of a blob of text and see like,
okay,
there's three parts to this.
And you know,
here's some visual indicators as to their meaning.
And so now I see him everywhere.
And MVP, uh, Nicholas, um, he has his Advent of Code, uh, code up on GitHub.
And he's got a couple of stars there that represent the kinds of problems
because you're rewarded with stars and it just looks really nice.
And so when you go to a page of GitHub,
it's so text heavy and you're so used to seeing it.
And all of a sudden you see this cute little pictures that kind of divide up the thing that you're looking into.
Synthetical operations.
Like suddenly, emojis aren't so crazy.
That's really cool.
I want to see where you're talking about.
Where's the QIT one with the emojis?
Do you have a link?
Just go create an issue. So if you go to github.com slash codingblocks
slash podcast dash app
and create a new issue, you'll see
when you select the issue type that it gives you a template
that asks you to fill in different things
when you create the ticket. I want to
create a new issue.
Yeah, and Dance2Die, man, he
is the emoji
master for sure. Oh, man,
you want me to log in?
What's that about?
Hold on.
You're not always logged into GitHub.
Come on.
Oh, man.
Oh, I still got to open pull request tags.
Dude.
All right.
Well, so very coolness.
Thank you.
All right.
Well, with that, I hope you've enjoyed this episode of Hash Tables versus Dictionaries.
Be sure to subscribe to us in case a friend happened to let you borrow their device to listen to this or if they sent you a link and pointed you to the right place.
But be sure to subscribe to us on iTunes to hear more using your favorite podcast app.
You can leave us a review if you haven't already.
Like Alan discussed earlier, you can head to www.codingblocks.net slash review.
And happy holidays to all, right?
What everybody doesn't know is Joe Zach, while he seems like he's probably the most festive of the bunch of us,
he is the Grinch of the three of us.
I'm one of those warriors that you hear about that is, uh, actively trying
to destroy Christmas. Sorry.
So truly, kick him in the shins if you see him during this holiday season. He needs some.
Yeah, I legitimately hate holidays.
That's, that's crazy talk. So anyways, happy holidays
to everybody. And while you're up there at codingblocks.net, check out our show notes, examples,
discussions, and more, and send your feedback, questions, and rants to the Slack channel,
because it's brilliant and awesome. There's awesome people in there.
And you can get an invite there by going
to codingblocks.net slash slack.
Yeah, and Logan, pet your dog.
Pet your dog, man.
You can also follow us on Twitter or
head over to codingblocks.net where you can find all our social
links at the top of the page.
Love you guys. Thanks.