Postgres FM - Hash indexes
Episode Date: December 15, 2023

Nikolay and Michael discuss hash indexes in Postgres — what they are, some brief history, their pros and cons vs btrees, and whether or when they recommend using them.

Update: the idea Nikolay mentioned at the end of this episode turns out to be a little fraught (and as such, inadvisable).

Here are some links to things they mentioned:

Index types (docs) https://www.postgresql.org/docs/current/indexes-types.html
Re-introducing hash indexes in PostgreSQL (blog post by Haki Benita and Michael) https://hakibenita.com/postgresql-hash-index
Hash indexes intro (docs) https://www.postgresql.org/docs/current/hash-intro.html
Hash indexes implementation (docs) https://www.postgresql.org/docs/current/hash-implementation.html

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello, hello, this is PostgresFM episode 76.
My name is Nikolai and as usual, together with me is Michael.
Hi, Michael.
Hello, Nikolai.
So, today's topic is super interesting and very popular, right?
Not exactly. I chose this one.
Okay.
Therefore, it's boring.
But we will learn something new, which is good.
I hope so.
Even if it's useless, it's still new.
We're here to learn, right?
So Nikolai's teasing me because I've picked the topic hash indexes,
which until relatively recently were highly discouraged by the Postgres docs
and not very sensible to use.
Until Postgres 10.
Yes.
But have been in Postgres for a long, long time.
I looked into the history of it.
And therefore, some people must think they're useful.
And I definitely think there's one or two use cases for them.
But more to the point, I think they're very interesting for people.
I think this is the kind of thing that will make you
appreciate the B-tree index as well
and also understand
a little bit more about
trade-offs we make when we're choosing
things within Postgres.
Indexes, right?
Why doesn't
Postgres choose indexes,
index type for us? We just want
to make it fast and that's it, right?
It should choose.
Okay, different question.
Are hash indexes more useful than the money data type?
Oh, I think so, yes.
Okay, this is already better.
I hope we will never have an episode about the money data type.
Yeah, me too.
Okay.
So, yeah, should we start with what a hash index is?
Yeah, hash function, hash index, right?
Yeah, that's pretty much it.
Hash table.
Yeah.
So in fact, that's a good point.
A lot of people have come across hashing from looking at query plans for hash joins, or, you know, backend engineers are very familiar with hash tables. It can be a very efficient way of taking a value, hashing it, storing that, and then in future, if we want to look up the value by equality, and equality only, we can hash the value we see and look it up in the index, in this case.
So people that like theory talk a lot about big O notation, don't they? So this is one of those things that I struggle to love, but it has a low...
Like, O(1) is what people often say. I'm not sure it's strictly true, depending on how deep down that rabbit hole you dive, but yeah, it can be very, very efficient for looking up things by equality, especially in certain conditions.
So that's like, actually, a couple of things on details.
I didn't realize you could, and it makes sense,
but you can use a hash index on any value, any data type,
which is quite cool.
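As a rough illustration of what that looks like (a minimal sketch with a hypothetical table; only equality lookups can use a hash index):

    CREATE TABLE users (id bigint PRIMARY KEY, email text);
    CREATE INDEX users_email_hash ON users USING hash (email);

    -- Can use the hash index (equality only):
    SELECT * FROM users WHERE email = 'someone@example.com';

    -- Cannot use it (hash indexes have no ordering):
    SELECT * FROM users WHERE email > 'a';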
I guess the same is true.
Postgres has a lot of internal hashing functions for all data types, and also for the record type. So you can hash a record for some table, and it will give you a hash as an integer 4, right? A regular integer, 32 bits.
And I actually recently rediscovered it and used it. It was interesting. I didn't use the hash index, but I used one of the hash functions, probably hashtext. Yeah, it was the hashtext function. I needed to, you know, when you need to reindex a lot of indexes. And it's a huge database. For example, after an upgrade to Postgres 13 or 14, 14 especially, you need to reindex a lot of B-tree indexes to start benefiting from the optimizations Peter Geoghegan and others implemented.
And obviously you need to re-index.
Okay, you will do re-index concurrently and so on.
It's good, but the question will be how to move faster.
There are parallel workers, as we discussed, and so on.
But probably you will decide
to use multiple processes
issuing the REINDEX CONCURRENTLY command.
And in this case,
if two of them
try to reindex
two different indexes
which belong to the same table,
you will have a deadlock.
Right?
So my idea was we just need to make each table associated with a particular worker. And to do it, I just took the table name, calculated hashtext from it, and then took it modulo the number of workers. If I have, for example, five workers... So we have the table name and the hashtext function. The hashtext function produces a number, like an integer 4 number. And then we just take it modulo five. The remainder of the division will be 0, 1, 2, 3, or 4. And each index associated with a particular table name will always go to a particular worker. So there will never be a conflict, no more deadlocks. So this is how you can redistribute the work.
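Roughly, the idea Nikolay describes looks like this (a sketch; hashtext is a built-in Postgres function, the five-worker count is just an example, and abs() is needed because hashtext can return negative values):

    -- Assign each table to one of five reindex workers deterministically,
    -- so two workers never touch indexes belonging to the same table.
    SELECT tablename,
           abs(hashtext(tablename)) % 5 AS worker_id
    FROM pg_tables
    WHERE schemaname = 'public';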
Maybe also amcheck, if you need to redistribute that work too: you need to check a lot of indexes for corruption. You can also use many workers, like 16, for example, and you redistribute the work between them to avoid deadlocks. So hashtext is useful. I was already thinking about CRC32 or something. By the way, I asked my bot to advise me, and it said, oh, there is hashtext. I never used it before, or maybe I just forgot. In psql, if you
say backslash df then hash star, you will see more than 50 functions. We checked it before the call.
More than 50 functions, many of them return integer 4, some of them return integer 8. So it's good to have a lot of already implemented functions.
You don't need to care.
Just use it.
Right.
Good?
Yeah, absolutely.
Small comment, slightly off topic.
Yeah.
I mean, related to hashing, but I guess completely unrelated to hash indexing.
Yeah, exactly.
Yeah, so you mentioned already there's a big change in the history of hash indexes that I think is worth mentioning. And that's that version 10 made hash indexes crash safe, or WAL-logged. Before that, it made them basically unusable, or unsafe to use, especially in a high availability setup. If you had replication and then failed over, your replica wouldn't have that index. And if you're-
Ephemeral indexes.
Yeah, right.
But no more. So it's solved. The same was solved with GiST very long ago.
Very long. Originally, GiST was also not WAL-supported, no WAL at all, but it was like
20-something years ago it was
implemented. Well, speaking
of 20-something years ago, I was
trying to find out when hash
indexes were added, and
the Git history doesn't go far
enough back, I don't think. And the
online docs,
7.2 is the oldest
version on the online docs, and they have
hash indexes. So that's as far back
as 2002 already. So
they're more than 20 years old as well.
Well, I'm quite sure they're older
because it's maybe one of the simplest
type of index you can implement.
But it's interesting that WAL support was not implemented for so long, right?
Right.
Until 2017, right?
I think you'd struggle to get – I could be wrong,
but I think if we didn't have hash indexes in Postgres today,
they wouldn't get added,
and they would be encouraged to be added via an extension.
That's my guess as to how they would be implemented if they were done today. But I mean, I might be wrong. Or maybe via the contrib modules.
Well, yeah, but we have them. They're first-class supported, right? They're WAL-logged now, they're replicated and everything. Crash safe, even while they're doing complex operations,
we might get to a bit later, it will recover. So we can use them if we want to. I guess the
question then is, why would we? And you mentioned something before the call I found very interesting
and it's probably very telling. But should I ask you, have you ever used a hash index?
Only in experimental environments, never in production.
I think I saw them in the wild, you know, saw them.
Yeah, I can believe that.
Yeah, but I don't think I ever seriously considered them.
But you need to take into account that version 10 and later, it's already mostly
consulting practice. Before that, I created a lot of systems, social networks, and so on. There,
it was before Postgres 10, so I never considered them. So my memory says stay away from them.
Yeah, that's a really good point. So it's only since 2017 that you should even be considering using these. So I'd like to first cover, I know there's going to be quite a few limitations that we need to talk about, but I'd like to cover some of the potential benefits of hash indexes. Like I think there's a couple that are worth mentioning.
Size? So they can be. Imagine if you are hashing a fairly large string, or even a number, or just a piece of data that is more than four bytes. When you index it with a hash index, it's only having to store four bytes or, you know, a bit more. It's not having to store that value in the index, and that means for larger data types or larger values it can be significantly smaller than a B-tree index, with some caveats, naturally. There are some optimizations for B-trees that we've got in recent versions. But yeah, it really stands out for larger values, and for columns that are unique or mostly unique.
Haki Benita and I wrote a blog post about this, and that's a lot of his work. He deserves most of the credit, but he put me down as a co-author,
which was kind of him.
And we looked at quite a few things.
And for the use case that he had for hash indexes,
which was a URL shortening service,
you get some quite large,
URLs can be quite large strings
in the grand scheme of things,
especially full URLs.
So they actually came out quite a lot smaller when you put a hash index on it versus a B-tree, which is pretty cool.
Even post-deduplication, because these are largely going to be unique.
We didn't make them completely unique, but largely your URL is going to be unique strings.
So, yeah, they can be significantly smaller.
They can be significantly bigger, depending on your data, but I think size is underrated in terms of index types. I think not only performance, which we're going to look at in a bit, but space and the cache, you know, we've talked about this kind of thing before, but I think it is worth mentioning.
Yeah, by the way, if we
try to index
in a regular way with a B-tree some large text values, there's some limitation.
I always forget.
Oh, good point.
Yeah, so sometimes we just need to apply a hash and use an expression B-tree on top of it. But of course it's not a hash index; we use a combination, right?
In this case, we cannot order these values,
but for exact matching, this is what can work well.
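A sketch of that combination (hypothetical table and column names; md5 is used here, but any of the built-in hash functions would do):

    -- Expression B-tree over a hash of a large text column:
    CREATE INDEX docs_body_md5 ON docs (md5(body));

    -- Exact-match lookups must repeat the expression so the planner can use it
    -- (optionally also compare the column itself to guard against collisions):
    SELECT * FROM docs WHERE md5(body) = md5('...the full text value...');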
Why do you hash it and then put a B-tree index on
instead of just putting a hash index on?
Because, again, I'm old guy.
You know, like, again, my mom always says,
stay away from hash. By the way, with my bot and GitHub, we just checked when the hashbuild function was added, and it's 27 years ago, original Postgres95.
Yeah, original source code of Postgres95. Wow.
Committed by Marc Fournier. It already had this hashbuild, which builds a new hash index.
So this is super old stuff.
I think maybe it even came from original Postgres.
Obviously, it came from original Postgres, maybe even from Ingress.
Who knows?
So maybe it's even more than 30 years ago.
Yeah, very cool. But yeah, that's another advantage, right? There's no limit to the size.
So you think... yeah, yeah, that's great. So you think I can just create a hash index if I have very large text values?
You're losing the same things, right? You can't order by them. And yeah, you're kind of doing a self-rolled one, which maybe even proves that we don't need them. But I think less footprint, right? Because you're having to... yeah, I'm not sure. I think there might be some weird trade-offs around maintenance, but I think you'd largely recreate them.
Yeah, so if I say CREATE TABLE something AS SELECT, and then I say repeat as column c1, and I repeat some letter a million times, I obviously have a text value of a million letters.
Then I say create index on this value, on this table, on this column.
In this case, B-tree will say index row requires 11,464 bytes, so 11K, right?
Ah, in my case.
But maximum size is 8k.
Okay, like block size.
But if I say using hash, right, using hash, same column, create index.
Yeah, it works.
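The experiment looks roughly like this (a sketch; the exact byte count in the error depends on how well the value compresses):

    -- One row with a million-character text value:
    CREATE TABLE t AS SELECT repeat('a', 1000000) AS c1;

    -- B-tree fails, the index row exceeds the page-size limit:
    CREATE INDEX ON t (c1);
    -- ERROR:  index row requires ... bytes, maximum size is 8191

    -- A hash index works, since only the 32-bit hash is stored:
    CREATE INDEX ON t USING hash (c1);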
So why do I... Okay, I'm an old guy. I have an old habit. I need to get rid of it. Actually, this is a perfect case: large text values. I don't need to hash first and then B-tree it; I can just use a hash index in this case. I will need to think about it. Thank you.
There are some reasons you might not want to do this.
Why?
Okay, well, let's go through the positives first. I promised positives. Size: it can be worse, but it can be a lot better. Speed can be better, not just lookups, but also the average latency of inserts. I definitely had some discussions with Haki when we were reporting these numbers, that we could have done a better job, I think. I hold my hands up, this was a couple of years ago and I've learned a lot since then. But insert performance on a batch, or when we were inserting one row at a time, was about 10% slower, I think, for the B-tree versus hash, which I was a bit surprised by,
because I knew there was some overhead around splits
for hash indexes or higher overheads
some of the time. But look
up performance...
B-tree also has splits sometimes, right?
Yeah, of course.
So we need to compare. In this case, we can't say a hash index has this disadvantage and that's it, right? B-tree also has this disadvantage.
Yeah. They're kind of bigger splits, less often.
Yeah, that's a good point. But look
up speed, I was expecting a difference
and admittedly they're both really
fast, right? We're talking about basically
key lookups.
So B-trees, we've always
raved about how good they are at
key lookups, right?
Yeah, just a few pages to find proper reference.
It's just a few blocks to fetch from disk
or to read from page cache.
I think we only did about 100,000 lookups,
and both of them were under a second to do 100,000 lookups.
So we haven't checked buffers at that time.
Only timing, which is quite vulnerable.
Yeah.
Again, I would do some things differently now.
This is a couple of years ago, but the, yeah, the, the lookups whilst they were both under
a second, I think it was something like 30 or 40% faster for the hash lookups, which
was, I thought, actually quite significant.
So I think there are some cases where in some extreme,
if you've got long strings and you only care about uniqueness lookups,
then I think there's a case for using them.
But that is obviously a very narrow, very specific use case.
I wonder in these tests if you considered the planning time,
because I guess if table is not absolutely huge,
like billions of rows,
in this case planning time can be maybe even a bigger contributor
than index scan itself, right?
So we need to exclude it using maybe prepared statements
and consider only execution.
That would be a good test. But also, why would the planning time differ between index types?
I'm not saying it differs. I'm saying if you're comparing these like latencies.
Yeah, sure.
What percentage of planning time is there and here? So we need to exclude it to compare clean numbers, right?
Yeah, that'd be great.
I don't think we used prepared statements.
I'm pretty sure we didn't.
And honestly, I wouldn't look at timing
here at the first metric.
I would still prefer looking at buffers.
But okay.
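For what it's worth, a sketch of how planning time could be excluded from such a comparison (hypothetical table from the URL-shortener example):

    -- A prepared statement: repeated EXECUTEs mostly measure execution, not planning.
    PREPARE lookup(text) AS SELECT * FROM urls WHERE url = $1;
    EXECUTE lookup('https://example.com/some/long/path');

    -- And buffers rather than timing:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM urls WHERE url = 'https://example.com/some/long/path';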
But the point is they can be more efficient, and that's partly because, if you imagine a B-tree structure, you might have to go through a few layers, you know, a few hops, especially once your table gets really large. Now, we've talked about partitioning so many times, right? So you could argue that you could keep your B-trees relatively small in the grand scheme of things. But if we do get large enough,
the hash structure is just so much more efficient
and that will show up in buffers as well, right?
So that is the argument, I think.
So you've got potentially smaller indexes
for highly unique, slightly larger data types
and potential benefits on the insert and lookup speeds.
Have you considered collisions?
Because if you have only integer four, obviously you might have collisions,
and you need to resolve this with additional hops, additional comparisons, and so on.
Have you considered trying to measure some edge cases?
Yeah, I looked into this.
I looked into the internals a bit.
Well, actually, there's a couple of really good internals pages
on the Postgres docs on hash indexes.
And there's only a throwaway word quite early on in one of them
that mentions that they are a lossy index type.
So it's not the only lossy index type,
but it means that just because we get
a match for the hash, the 32-bit integer, doesn't mean we do actually have a hit. We could have a
collision. So there is a recheck, and you can see those as "Rows Removed by Index Recheck" type things in your EXPLAIN, but I haven't seen it. Like, we're talking... you'd have to have,
well, I'm not so sure you'd have to have billions,
but I'm guessing you'd start, you know,
in the tests I've done, I haven't seen any.
But I guess you'd pay that overhead for every lookup, right?
You're having to do that recheck all the time
because they're lossy.
It's not just when there are collisions
that you pay that recheck.
So those numbers I was talking about
include the overhead of rechecks.
Okay.
So what are the downsides?
Why shouldn't I use...
So many.
So many, okay.
Index-only scan, as I remember, right?
Yep.
Lack of index-only scan.
Because we don't have the value in the index,
it's literally impossible to have an index-only scan.
Right. That's bad. Yeah.
Yeah. So...
But if I, in my case, if I have large text value
and I consider hash index versus a B-tree over a hash expression...
You can't have an index-only scan there either.
Yeah, because I cannot fetch the original value.
So it's not a reason that would stop me from using hash index.
Okay.
No.
Right.
In fact, have you considered collisions for that?
Yeah.
Well, yeah, yeah, yeah.
You have to implement it yourself.
That's true, yeah.
So I think the big one is ordering.
Obviously not in the case you're talking about where you're hashing,
but the big advantage B-trees have is they order your data,
and that supports so many different things.
And greater than or less than comparison, obviously.
Yeah, range scans.
Yeah, it feels like a limitless number of things that then enables.
Right, and you get to deal with collation issues.
It's so cool.
They can be corrupted because of collation changes.
I mean, that's a good point.
Maybe hash indexes are less likely to get corrupted.
Right.
Okay.
Actually, what if the hash function created a different value instead...
I don't know, it's a good question. I don't know how it's implemented internally. It should not be super difficult, but obviously, how will it hash non-Latin characters and so on? And it doesn't depend on a glibc version change, for example.
It's an interesting question.
I never had this question because, again,
I'm not seeing them
in the wild often at all.
Another huge difference, and I don't think this one is necessarily inherent, I don't think it's like the index-only scans where it's literally impossible,
but in Postgres,
we can't use hash indexes
to enforce uniqueness.
So for unique constraints, which is a big deal.
Like, for example, we can't use them for primary keys.
Well, like, there's no point because we've already got a B-tree index on them.
Okay.
Good to know.
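To illustrate (a sketch with a hypothetical table; the error text may vary slightly between versions):

    CREATE TABLE items (sku text);
    CREATE UNIQUE INDEX ON items USING hash (sku);
    -- ERROR:  access method "hash" does not support unique indexes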
What else?
Multicolumn, right?
I remember now.
Yeah.
Multicolumn.
Exactly.
So hash indexes don't support multicolumn.
You can combine values, or you can, I don't know.
But more to the point, one of the nice things about multicolumn B-tree indexes
is you've also got the sorting, sorted by the first column,
then sorted by the second.
So we just don't have advantages like that. And it ties into the index-only scans as well, right? Like, one of the best things about multicolumn indexes is we then get the potential for index-only scans as well. So a lot of these things interplay.
One difference that isn't necessarily an issue, but that I found interesting and thought you might as well, was that the fill factor for hash indexes is 75 by default, whereas B-tree's is 90.
Really interesting.
Some old decision, I guess. Maybe not, but maybe. I don't know.
We also looked at what it looks like if you change the fill factor to 25, 50, or 100. And at 100 versus 75, you just see it not getting larger as quickly, but when it does, it bounces up at a faster rate. So as the index grows, it grows less at first and then more all at once. So my guess is it's some trade-off between general size and performance and the impact of splits, or the cost of splits.
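If you want to play with that yourself, fillfactor is just a storage parameter on the index (a sketch with hypothetical names):

    -- Default fillfactor is 75 for hash indexes, 90 for B-trees:
    CREATE INDEX urls_url_hash ON urls USING hash (url) WITH (fillfactor = 90);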
And in fact, this is probably the biggest one
you'd want to consider on the difference
between doing a B-tree index on a hash function
versus using a hash index.
And that's how insert performance would look, in terms of probably not average latency, but maybe the standard deviation of latency. So with hash indexes, the way they're structured, we have buckets that values can go into, but beyond a certain size, Postgres works all this out for you. It decides, oh, we need more space, it'd be best to have another bucket, let's split one of the existing buckets. And then we pay that overhead during the insert, right? That can slow down your insert. So my guess is the standard deviation for inserts would be perhaps significantly higher than for B-trees, but I haven't checked.
So another thing I would like to do.
Maybe.
So on high throughput,
there is a note in the
internals document about
high throughput. Let's say you've
got a really high throughput table.
You might actually be better
off with your B-tree index that you mentioned
than a hash one.
Okay.
And multicolumn index, I think we can... I've just checked, by the way, hash_record applied to a whole table row. It works. It produces an integer 4 number.
I think we might probably decide to use it to... I remember I was always using an MD5 hash, converting the row first to text and then MD5. But maybe hash_record is a better choice when we need to, for example, compare the content of two snapshots row by row, you know, like to find which rows were changed, comparing them, joining by ID, by primary key for example, and then we compare hashes of the whole row, right? In this case hash_record works.
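A sketch of that snapshot comparison (hypothetical table names; hash_record is available since PostgreSQL 14, and since the hash is only 32 bits, collisions are possible, so a changed row could in theory be missed):

    -- Find rows whose content differs between two snapshots, joining by primary key:
    SELECT a.id
    FROM snapshot_a a
    JOIN snapshot_b b USING (id)
    WHERE hash_record(a.*) <> hash_record(b.*);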
I wonder, if a multicolumn index is not supported, what if we just combine multiple columns and produce a hash out of them? It's almost the same, right?
It's strange that it's not supported.
I mean, we can still only use uniqueness
lookups, right? So
I guess it's the case where you want
to see if two columns when
combined are equal to
the record you've stored. Yeah.
But I'm not seeing the same benefits.
Okay. I will check a couple of things more.
Yeah.
Any more limitations or that's it?
Don't think so.
I might have missed one,
but like those for me,
they're substantial limitations.
And I think if you're,
imagine when you're designing a system
or you're choosing an index type
for a new feature you're building,
and it turns out that right now the hash might just about come out on top.
Maybe you get a slightly smaller index.
Maybe your lookups are slightly faster.
I'm not sure I'd still advise using a hash index.
I think I might say, look, if your requirements change in the future,
if you ever do need to look up a range of these values, if there's some analytics you might need to do on it or something, you might have been better off with a B-tree index.
You've got that extra flexibility.
You've also got, I think, probably more effort being put into optimizations on them going forwards.
If it's a tight call between them, I might still encourage people not to use them in general.
I never did it before. I just did it now. I used the hash_record function applied to the whole row in a table which has three columns. And then I created an index on top of that hash_record. So for the first time, I created an index not specifying columns, but involving all columns in a table. This is super strange. It could be a B-tree actually as well, because hash_record produces an integer 4, right? So it applies to all columns. So I said CREATE INDEX ON t1 USING hash of hash_record of t1. On the whole record.
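A sketch of what Nikolay describes (hypothetical table; hash_record assumes PostgreSQL 14 or later, and the double parentheses are needed for an expression index):

    CREATE TABLE t1 (a int, b text, c date);

    -- Hash index over a hash of the whole row, the "hash of hash" case:
    CREATE INDEX ON t1 USING hash ((hash_record(t1.*)));

    -- A B-tree over the same expression also works, since the result is just an int4:
    CREATE INDEX ON t1 ((hash_record(t1.*)));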
That's super strange.
I need to think about it.
What use case are you imagining for this?
I don't know.
I'm just exploring, you know, like curious.
I don't know.
Maybe like, again, like to track which records changed content or something.
I don't know.
I don't know. I don't know.
It's just interesting that it works.
I thought we are required to specify columns, right?
When we, like, this column, that column.
But here you just specify whole record.
And it works.
This shows flexibility of Postgres as usual, right?
Yeah.
But I created a multicolumn hash index,
but it's hash of hash. So double hashing here. Okay. So I don't know. I'm still thinking about use cases. Realistic one is only these really large text values. That's it so far for me.
Yeah. And when you only need unique lookups of them.
Yeah, so, interesting. I will keep it in mind.
So yeah, I love that during this episode we've gone from "I never have use cases for hash indexes"...
Yeah, exactly. There we go.
Cool. But I also think, you know, you said you've seen them in the wild.
I think that might be a result of people overthinking it.
And I was there a couple of years ago.
I was thinking,
you know,
there might be some cases where that difference is
meaningful and worthwhile, but for the flexibility,
I think anyway,
I've said that a few times now.
Cool.
Well,
thank you,
Nikolai.
Thanks for indulging me. Hope a few people found that interesting
and useful. And yeah, catch you next time. Thank you, bye.