Postgres FM - GIN Indexes

Episode Date: March 14, 2025

Nikolay and Michael discuss GIN indexes in Postgres — what they are, what they're used for, and some limitations to be aware of. Here are some links to things they mentioned:

GIN Indexes: https://www.postgresql.org/docs/current/gin.html
Generalized Search Trees for Database Systems (Hellerstein, Naughton, Pfeffer): https://dsf.berkeley.edu/papers/vldb95-gist.pdf
RUM extension: https://pgxn.org/dist/rum/1.1.0/
Understanding Postgres GIN Indexes: The Good and the Bad (Lukas Fittl): https://pganalyze.com/blog/gin-index

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork

Transcript
Starting point is 00:00:00 Hello, hello, this is Postgres FM. As usual, I don't remember the episode number. 137? 137. My name is Nikolay, Postgres.AI, and as usual, my co-host is Michael, pgMustard. Hi, Michael, how are you doing today? Hello, Nikolay. I am good, thank you. How are you doing? Fantastic, honestly. Good? A lot of things are happening, but all good. Things are changing, and a lot of things are changing in a constructive way, which I like. Let's discuss GIN. It's one of the index types a lot of folks still use, and I think it's not going to
Starting point is 00:00:48 disappear despite the rise of vector related types of indexes and approximate k nearest neighbors indexes right gin is very interesting type of index developed by the same folks who brought wall support to JIST. JIST was the only thing for complex data types before Jin. But then with the rise of JSON and JSONB, Jin became very popular, finding a lot of applications. Not only full-text searches, I think originally it. Not only full-text search, as it was, I think originally it was created for full-text search. Yeah. Yeah.
Starting point is 00:01:30 So I looked at the history. I was shocked to see that it was added back in 8.2. So 2006, nearly 20 years ago. Yeah. This is exactly when I started to be involved with Postgres. Yeah. This is exactly when I started to be involved with Postgres and 8.2 when I worked on XML as well and I thought how, very sad about what's happening in the world and how disconnected we became all of us. Yeah. Because these guys are all Russian folks but also I was looking at the domains and one of the, there's an article mentioned in the Postgres docs on gin indexes that references the initial release
Starting point is 00:02:27 and mentions who sponsored the work which i think was a german company but one of the domain the domain it's hosted on is.su which i didn't even know i didn't even know what that stood for it turns out it's ussr or soviet union which makes sense yeah i wanted to say that originally the primary purpose was to support better performance for full-text search and originally full-text search was powered by Gist and its version called RD3, Russian DOL3, where its B3-like structure was created for basically arrays, for sets, created for basically arrays, for sets and supported operations like instead of less than greater than supported operations is contained, one array is contained in another or contains, right?
Starting point is 00:03:17 So also overlaps and it just can work with this, it can support full-text search, but it's slow if you have large volumes of data and a lot of words. Different words, like vocabulary is huge. Actually used vocabulary, right? So gene is inverted index and the name was inherited from Gist. Gist is generalized search tree. I like this so much. Folks who are interested in databases and want to learn something cool, read article
Starting point is 00:03:56 by Hellerstein from Berkeley. Joseph Hellerstein and article about jumping from single dimension B3 to two dimensional space R3 and then generalizing this idea to GIST generalized search tree. It's so cool. And then additionally Rd3, it's four sets already. And of course you can have multiple dimensions, like many dimensions. Not so many as four vectors, unfortunately, like not 2000, but for few dimensions JIST works well and even there was a history for R3, original implementation of R3, native implementation, JIST behaved much better, so it was replaced. Original implementation of R tree for multiple dimensions was replaced by JIST, not for B tree.
Starting point is 00:04:48 But there is extension, JIST B tree for various like multi-column indexes and so on, when you have columns of different nature. And trees are great, but for full-text search they are not so great. So they jumped from idea of working with trees to inverted index, which is a more common idea for search engines. And compared to tree-like structure, inverted index is very simple explanation. Can be, okay, we have list of words and for each word we have list of references to places where this word was noticed, right? Like rows.
Starting point is 00:05:39 Links to tuples. Yeah. Not row versions. So list of words, of course normalized and then for each word list of places where it can be found. It's very very similar to index in a book, in the end of the book. Yeah, I read this and for so many years I've been using and reading about, you know, indexes at the back of books, or, you know, in recipe books is like, there's a really famous example,
Starting point is 00:06:12 really easy one to understand how indexing speeds things up. But it never occurred to me that they're not a B-tree structure, they are a list of words. Yeah, exactly, it's an inverted index. So yeah. And I think the thing that clicked for me as to like why the word inverted is used was in a standard, in a non-inverted index, we have an entry per row or row version.
Starting point is 00:06:42 And in an inverted one, each row could be referenced multiple times assuming exactly in a full-text search case for example each document is likely to have lots of words in it so that's the where the word inverted comes from. Yeah in tree like structure in tree in B3, R3, RD3, and just in general. We perform descending. We say we need to find this value, for example, and we jump from one note to another down to the index towards leaf notes, right? Leaves.
Starting point is 00:07:19 Performing some operations to understand where to go, which child node to choose, not only direction because it's how to say, each internal node might have multiple, many, many children, unlike binary tree. And my favorite example, favorite question when hiring engineer, not database engineer. For database engineer this question is mandatory, but software engineer, backend engineer. You can ask what is B3 and if you hear binary, this is game over for you. And this, unfortunately I had situations when very, very good software engineers, I had this feeling, do I want to ask that question? Because it's like kind of flipping the coin.
Starting point is 00:08:14 And then I hear binary and my internal principles say this is no go. So B3 is not binary tree. B3, unlike binary, we are only two children's in the maximum. We have many children for each internal node, maintaining some rules. In case anybody's wondering, balanced. Well, there are several theories about it, as I remember, but I don't remember theories. But remember serious but okay yeah balanced but actually it's almost balanced meaning that like conceptually the path from the root to each leaf can be n or n plus 1 this helps with balancing so almost balanced okay again again when we work with tree, we have several operations until we reach leaf. And leaf has pointer to the tuple we are looking for.
Starting point is 00:09:13 Inverted index. Okay, we find our word. And then we find many places for this word. And this also means there are several challenges here. I remember originally, as I understand, maybe like my perception might be wrong, but it was long ago, how many, 20 years, right? Almost. I remember GIN was lacking some internal B3 indexes. It needs B3 indexes twice. First, to find the word, because if the number of different words is huge, we need additional mechanism to perform the search faster.
Starting point is 00:09:52 And it can be B3. And second, inside posting lists, we also have B3s. So inside GIN, there are two types of B3 indexes, as I remember. And I saw Igor Rogov's articles and book about internals. And this book has, of course, a whole section about GIN indexes. And I saw a representation, visualization for GIN was, I think actually, the computational should have it, right? One of the, maybe the first illustration which was added to Postgres documentation was Gene,
Starting point is 00:10:32 if I'm not mistaken. My memory might trick me a little bit, but I remember I saw it was 90 degrees rotated, so instead of like in book we have words and then we see page numbers to the right for each word. It was rotated 90 degrees so everything looks down. So it seems like we're going down all the time like in case of gist or bit or bitree. So starting from search of our word, then we using internal B3, then we go in posting link, posting list, we also use B3 and also descending, right? And I honestly feel confusion.
Starting point is 00:11:14 I like the book-like visualization because it distinguishes from tree structures. And also we have three internal trees here. It only adds confusion, in my opinion. This is the choice. The guys who created visualization and wrote, Igor Rogov and others, they are great in terms of education. I definitely admire them, their work in education.
Starting point is 00:11:48 But I personally like original representation. We could write B3 also rotated, highlighting that it's kind of different beast in terms of indexes, right? Yeah. Good? Yeah, this is... Sorry, sorry. This was my introduction, just like history and so on like and some simple
Starting point is 00:12:07 Explanation Yeah. Yeah, I really liked it and I really want to like diagrams in the docs I think they a lot of effort goes into making diagrams that are easy to understand Unfortunately, we don't have that many in the post-growth documentation, but I also find that I personally And unfortunately we don't have that many in the Postgres documentation, but I also find that I personally, I don't learn best visually, I don't think. And I found the description in the Postgres docs of how Jim works particularly useful. It's really well written, like most of the documentation, but I found that really helpful. So I'll link to that in the show notes as well.
Starting point is 00:12:41 I think that's an incredibly good description with a few examples of, I mean obviously full text search is the main use case but it's not the only. I think right now full text search was original main use case, right now main use case is JSONB. Yeah interesting. Well I don't honestly I don't see that I'd say I think I'd be interested in the results of your poll. I think you've put a poll on Twitter, but I do think gin is probably the second most popular index type. After B-tree. But it's by somewhere in the region. I was thinking order of magnitude wise. It's, I say it's there's fewer than one in 10 maybe maybe, but probably more than one in 100. So like somewhere in between
Starting point is 00:13:30 those. So it's still not super common for me to see them. Yeah, almost half of folks voted. They say two index types are used. And I guess it's either B3 and GIN or B3 and vector types like HMSW. Interesting. Who knows? I do see the occasional brin and the occasional hash, but often those are kind of... people have left them lying around after an experiment, after like trying them out, but not necessarily getting that much usage. Postgres should have telemetry enabled by default.
Starting point is 00:14:07 I'm joking, I'm joking. To get usage stats. Well actually, I know this is a real tangent now, but I do think that the usage stats can be misleading for people as well because the last person I came across that had a hash index, they had a hash index on the same column they had a regular B-tree index on. And I said, you know, that's redundant, like if you could, the B-tree could, could... You already pay the price for maintaining B-tree, so... Exactly. But the hash index was still getting used because it's a bit smaller for that, like in...
Starting point is 00:14:40 So it was still getting usage in the telemetry. Had a method about hash indexes, remember? Yeah. But all I mean in the telemetry. About hash indexes, remember? Yeah, but all I mean is the telemetry isn't enough to know that you can drop that hash index because it's still getting usage. I'm joking about telemetry. I cannot imagine Postgres would add some telemetry stuff enabled by default. It's completely opposite to what it's usually done in Polgis. But I'm very curious, of course. I think big platforms might have some stats, but who knows. Anyway, it definitely looks like B3 is number one, but just GIN may be number two or number three.
Starting point is 00:15:21 Oh, sorry, I understand what you mean by telemetry. I thought you meant like for the users to see what they were using. No, imagine Postgres has telemetry sending to some server. The cloud providers could do it. Like, anonymized. Yeah. If they can, because it's also a question how much they can go to schema analysis. Usually it should be possible. But it's better to focus on different tasks for cloud providers and we can discuss this another time. This is not number one priority, I'm pretty sure. How about you? How often do you see gin being used in the world? Well it's used for full tech search but often it's losing to elastic in larger companies. In smaller projects it's great but in larger companies we should discuss performance gains
Starting point is 00:16:22 and overhead next, right? But yeah, what I see is great until it's not in terms of full-text search. For JSONB, kind of similar, but there we can tune it, right? We can use smaller size of index just preserving only information about key and keys and values just on be pattern ops with these things like additional you can tweak it right and yeah path that may be path ops exactly yeah just to be path ops so you can say I don't need everything to be indexed all the knowledge about structure, I just need keys and values and support only part of operations, search operations I have. And for JSONB for sure I see it, but at the same time in many cases JSON or JSONB values
Starting point is 00:17:19 left unindexed because it's just storage for some metadata for example and we don't need to search in it or we can we decide to search only using specific paths inside JSON value and then in this case for both JSON and JSONB and even for XML data type you can have B3 because it becomes a scalar value when you apply some like you need to get the value from some path that's it. It's just an expression index yeah yeah exactly so I also see quite a lot Trigram PgTRGM being used with sometimes with some issues but yeah this this this I see as well.
Starting point is 00:18:06 Yeah. So again, but again, normally related to full-text search, right? I say trigram by the way, and I thought maybe it's trigram, I don't know. I think JSONB is number one in my head, it's so I don't have analytics, so it's some subjective perception.
Starting point is 00:18:24 It makes sense, right? It makes sense if you think I might want to do some search on this. It's a perfect, I mean, it came 10 years before JSON V, but it's the perfect index type for it. If you don't know in advance what you're going to want to be searching for. There is also a full-text search for JSON V, additionally,
Starting point is 00:18:44 a right specific case. I remember some support was added to perform full text search inside JSONv values, also powered by GIN, I guess. So anyway, this index is great. Well, for arrays as well, I use it many times for arrays. If I know I need to store arrays, text arrays, number like integer arrays, doesn't matter. I like how Postgres supports them for decades. Array support in Postgres is great,
Starting point is 00:19:18 except one thing, you know it, right? No's? No, no, no. Similar, but no. I don't know it right? No, no, no, similar but no it's... I don't know it. No. It's indexes. It starts with one. It's very confusing. Oh, okay. Super confusing. You confused me saying indexes. I was thinking... No, no, yeah, it's different. Index structures. Yeah, yeah. I mean to access the first element, everywhere else... outside Postgres, it's zero, right? But in Postgres, it's one. And you need to switch your mind all the time. Not everywhere else, but yeah. And for arrays, it's great.
Starting point is 00:19:56 For example, I remember my very first talk in the US in 2008 was about how cool Postgres is for Web 2.0. And I talked about EAV structure, how it's performance-wise not good for larger data volumes. And when we talked, I talked about tagging, tags, right? For social media, we need tags. And putting tags into separate tables following EAV, entity attribute value structure is going to hit performance. So instead we can put it as arrays
Starting point is 00:20:33 in the same table and then to search we can just use gin, right? And Russian Dole Tree will be effectively used but supported by gin implicitly, right. So in this case you can find quickly like give me rows where we have this tag. There will be problem although of ordering. Before we move on to discussing this problem I think this is the biggest problem with full-text search and general gene people suffer from it but we will discuss it before that let's discuss performance gains and losses again is obvious search speed is much better for inverted index than for tree like structure so for larger volumes of data, search is good, select is good, right?
Starting point is 00:21:27 If we, if like one comment here, only if we talk about the use of gene only, right? And like only searching by like one dimension. Yeah, well one word or few words also, it has additional internal query language, you can say, and or phrase search. A lot of features there. But also, when we say this, search speed is better. We need to think about edge cases as well. For example, if we have a very, very popular word which was not excluded by putting it to stop list of vocabulary, stop list dictionary. If we didn't put it there,
Starting point is 00:22:17 we're going to have poor selectivity and high cardinality of results and performance, I don't know, limit works, right? But then there are edge cases where when like it's quite popular word, we use gene and there is a big problem of the need to combine gene-based search with something else like for example additional filter on some scholar like time stamp or anything or ordering. And this is huge problem. I want to find everything which includes these words or these values, doesn't matter, but I also need to get 25 last items from there.
Starting point is 00:23:04 And this is somehow weight it like you might want to wait based on the the more recent The more you want to wait it, but if it's got you know, the word many many times over I I get it I think we're probably Venturing a bit too far into full tech search stuff rather than gin stuff, but you're right and it's about gin It's not about full-text search but for for three grams don't you want to have most popular use or like the latest or for for just on be search I want to find everything which which includes these things but I want very fresh I want to order somehow and in most cases and modern work like when it was created, the idea was we are going to order by ranking.
Starting point is 00:23:49 It was full search, ranking, most relevant. But in social media, we usually don't want that, we usually want the freshest information. So creation time or ID, if it's numeric, or UID version 7 would work. So we need to order by some scalar. And this works not well with gin. I think you're right though. I think maybe at scale it's always going to be a problem. There is a feature that was added from the beginning with gin, at least for the stop word issue. And that's gin fuzzy search limit that you can set in that it recommends in the thousands
Starting point is 00:24:28 so that if if your search would have returned more than that number roughly it will limit them but it will limit them randomly so that that plays into your which which one should be returned but the idea behind that feature is if you're returning thousands of options anyway how good is that search in the first place like it's a so yeah you yeah you're right also statistics related work like it over time it was improved but like I remember still having issues like with lossiness of for a planner right like like we need to check I don't know actually details right now. I'm using Jin blindly lately in terms of performance. But for me, the key problem is an ability for Jin
Starting point is 00:25:12 to combine search with additional scholar-based filters or ordering. Ordering is number one problem for me. It led the same folks to create RAM indexes. Yeah, which confused me because when they launched gin they said you should think of it as a genie not as an alcoholic drink. But then they come out with rum and it's definitely about alcoholic drinks. It was a catch. I don't believe them. Yeah. Ambush, right? So yeah, yeah. Well, and the rum unfortunately in my case didn't work well because of like it was huge and so on
Starting point is 00:25:46 maybe it was because I tried to put to the idea of RAM is let's put colors into a structure right like time stamp. So that you can do the creation time. You can do the thing you wanted. Right exactly and creation time it's how many? Eight bytes or 16 and I keep forgetting. It was huge. I guess I could reduce it, for example, putting 4-byte integers just to support ordering. There are ideas how to improve, but still this is an extension that is not available on many platforms, unfortunately. But there is another thing. There is a standard contrib module extension which is available everywhere called Bitregin. At least this can help you to combine different filters. For example, if you want to have filter to search in JSONB or full-text search or Trigram
Starting point is 00:26:42 index, you want to combine it with additional filtering, for example, price, right, or age, depending on data, right. This is possible, this is worth considering, so to have multi-column index and maybe for ordering it also can be helpful, but if you don't do anything, what is usually happening, the planner needs to decide. Should it use, for example, primary key to order, or create an add timestamp to order by, and then apply filter on instead of gene, it will be just applying filter for whole data.
Starting point is 00:27:20 It will find it. And it can be very, It's like alternative is to perform gene filtering, find everything and then sort and memory. And it's hit and miss. Sometimes we see that planners are very optimistic thinking we will quickly find just following bit. by created ad in reverse order like very fresh item dynamically check does it match our filter which supposed to go through gin well it's not matching not matching not matching and if it it's a miss it might like skip many many many rows until it finds our 25 target hopes and just following B3 not involving gene at all and this is
Starting point is 00:28:06 terrible situation I saw many incidents because of that so or vice versa gene again we use some popular word it finds millions of entries then tries to like sort them in memory to find top 25 maybe it's like edge cases here maybe hitting less in terms of performance but also not fun at all to have this situation. Right? Yeah. And this is number one. It could be where the fuzzy limit could actually maybe help there depending on how important is to not have false negative like things that should show up not showing up I guess depending on the use case. If you just want to show something, let's say you're showing a social media feed,
Starting point is 00:28:49 trying to show engaging things to people, it doesn't matter if there's a few really good ones that don't show, but then if you've got a different use case, then it matters a lot. Like for full-text search, if you don't show the most relevant thing, that's an issue. Or for example user just added something goes to search expects this to be there but it's not there it's it's not good it's like it's feeling of this is like it's a bug so it's in many cases unacceptable unfortunately and also there is a problem with pending list right? This is about inserts rather than selects. Not posting lists.
Starting point is 00:29:28 There are posting lists for each word. Basically we have references to tuples. And then there is pending list, which was the trick to... We forgot to mention the main problem with GIN, actually. The key problem is slow writes. Because search is fast with all the things we discussed, but writes are slower than writes to tree-like structures, to trees, to B-tree, to Gist. So even with three balancing, I guess it's like slower.
Starting point is 00:30:02 And what was created fast update what option isn't it's an unfair comparison like it's almost by design that they have to be slower because Documents contain lots of words one. Yeah one row being updated or added Creates multiple rights to that to the index. So it's yeah, it's not a fair fight. It's not a fair comparison I agree so to mitigate this fast update Technique was created and there's an option fast update when you create an index and I guess it's owned by default, right? Yeah, yeah So fast update means that changes will go to some temporary like buffer like kind of location inside the structure pending list Pending list. Yes, and then at some point when they reach by default four megabyte
Starting point is 00:30:53 Yeah, there's a few things that can trigger it. That's one there's like a Catch what happens but also back. Oh It batch processes them. Yeah, and it's happening it can happen synchronously so some write some update or insert might suddenly be very slow so slow it might hit like your statement amount which is like 30 seconds for example or 60 depending right or 10 seconds and this is like unpredictable performance for rights yeah or vacuum can do it. Asynchronously.
Starting point is 00:31:26 Vacuum I like, but what I saw from practice, vacuum is good, mitigates this, and not allowing this job to be done in a synchronous manner. But it works only if our duration of vacuum is reasonable, which points to the direction we do need partitioning. Because if we have a huge table, vacuum is super slow, it cannot catch up with our writes and pending list problem hits our writes, and users notice it. At some point, it might be so that we decide, okay, we need to switch off fast update for some indexes, for large tables, because we want predictable, maybe not good,
Starting point is 00:32:12 but predictable performance of writes. It depends. There is a trade-off, and I see both cases are not ideal, you know, like if you have really large volumes of data and workload, heavy workload. So partitioning can help here and to keep fast update on, in my opinion. Well, yeah, have you tried, have you tried that? I'm just thinking actually, then you've got to have the partition key in the query and then suddenly you've got another dimension there. Sounds tricky. And so what? We need a different index in this case, right?
Starting point is 00:32:55 Well, it just might not then use the gin. I've had trouble convincing the planet to use gin indexes in the past when like they've got another choice and it doesn't I'm probably doing I'm maybe I'm doing something wrong but I wanted to try and get a bitmap scan like and in the index like with a GIN index as one bitmap index scan and let's say a B-tree index is another sounds like the case for B-tree GIN for me well a multi column gin index with B-tree in it could have helped but it wouldn't have been as like it wouldn't have been structured quite as like optimally in my opinion so I
Starting point is 00:33:35 was hoping to get this and list really short and then a really like quick scan and I didn't have luck with multi-column be a gin index being as like as performant as I thought we could get it with a bitmap scan but that the partitioning thing I would love to be wrong but the partitioning thing sounds like you've now got a gin index per partition right so you've got to get to the right partition so then it's like, I get if it's time, partition should work partition pruning should be in good shape of course,
Starting point is 00:34:12 but this is general rule for partitioning. Yeah. But so let's say you're partitioning by time. It means we need, we've got the time filter on, right? Right. Is, is that the case? Like for full for full text text search for example, we're gonna be filtering by time? Well if we're ordering by time as I like for social media again we need fresh information in most cases this is the rule compared to previous like pre-previous era for web 1.0. Well unless there isn't fresh information then you need to go back and back and back right? Maybe yes well
Starting point is 00:34:44 depends depends so my question is how we limit how we order. Like if we don't limit don't order this is found fundamental question to this query why do we request unlimited results? This is this question I raise very often during consulting practice, very often I see no limit, no order by, and this is the key question. In most cases people say, oh, actually, yeah, it makes sense to have pagination here, for example. Or pagination with gin. Yeah.
Starting point is 00:35:16 It's interesting. It's difficult, right? Yeah. Well, again, if we order by ID or timestamp by scalar value, it's like key set pagination works and that's it. And Gene is just additional filtering in this case. And we're kind of back to where we started with your original comment that it works until it doesn't. So until and set and scale, it's fine to not have these.
Starting point is 00:35:40 It's fine to not have the limit and the order by because your data set is small enough that your query response times are pretty good. But we design for future, not only for now, right? So it's always worth asking yourself how much data you expect in the future, how much data you need to return with this query to your application or front end. I don't know. Do we need everything?
Starting point is 00:36:04 I saw some case, interesting case't know, do we need everything? I saw some case, interesting case, when people intentionally decided to return everything and then do a lot of work on front-end. I saw it many times actually with guys who decided that our front-end should be very thick and a lot of logic there and it's better to cache everything, like it's let's load and then work well in my opinion it's a dead end because if your project grows it's a dead end you will experience poor performance and other case was guys who decided to perform analytics with R so they also
Starting point is 00:36:39 needed to load everything and then build B3 and so on and kind of group by and so on was performed only in memory using R which for me like looks like why do you do this do it on inside your database so yeah and I let me not to forget when when when pending list is growing and like limit, it can exceed limits, right? Or no? Is it a soft limit or no? I'm not sure. This is a good question.
Starting point is 00:37:13 But anyway, maybe it was a bug when it exceeded. I don't remember. I remember some parts of my memory tell me that it happened in the past. I'm not sure it was a bug or no. But obviously also select performance will be affected if pending list is huge. For example imagine for specific filters we are unlucky and we need to process all like we need to deal with it and selects become slower. Yeah so it's four megabytes right? The default. I suspect, I
Starting point is 00:37:46 suspect it's not too much of a penalty if you leave it at the four megabytes, but a tempting thing to do like might be to increase that so that you don't have, so that you hopefully have auto vacuum kick in and do the work in the background instead of having a random insert paying the penalty. Right. But then once you've increased the list, then each select has to look through more pending entries, or especially towards the end, and then you're paying the price on selects as well. So yeah, I don't think, I haven't seen, in fact, actually we probably should mention there's a really good write-up about these kinds of issues by Lucas Fittle
Starting point is 00:38:26 of PG Analyze who also linked to the GitLab work. I think Lucas does fantastic job analyzing various Postgres cases. I think that's why his company is called PG Analyze, but this is your joke, not mine. You're stealing my jokes now. Yes. Well, yeah, good job with PG analysis and yeah in GitLab's case I was slightly involved only and I remember that case was few years ago and the result was decision for specific indexes to switch off fast update because table was huge and so on
Starting point is 00:39:01 right? Yeah anyway. So switching off fast update that means we're not using a pending list anymore and every insert and update pays the price at the time of insertion to update the index. And I think that's a really interesting trade off to make each insert a bit slower, but not have to have occasional very slow ones. I like that a lot in terms of performance, in terms of trade-off. Anyway kudos to GitLab for openness as usual
Starting point is 00:39:30 because all the details are visible to public, to community and it's super helpful for like for general development of Postgres and so on. And kudos to Lukas for very great analysis. Yeah. Of this Posgo's case. Good, I think we touched a little bit of some deeper waters, but not super deep if someone wants to read internal SIGOR ROGOS articles and books already. Great in this area. Yeah.
Starting point is 00:40:03 And yeah, I think that's it. Right. Right. Or anything else. One good one last tiny tip that I think is worth you mentioned Jason be path ops briefly. I think for most other data types, the defaults are the only operator class. But for Jason B, the default is actually JSONB ops, not JSONB path ops. And you can get a performance boost with JSONB path ops if you don't need setting up. Like if you're just using the usual JSON, the usual contains operators. Final words, gin is very important index type. It's not going to disappear.
Starting point is 00:40:41 It's going to be used heavily. It's one of the strengths of Postgres, like a rich set of index types. Yeah, for sure. Okay, so thank you. See you next time. Thanks so much, Nikolai. Catch you next week. Bye. Bye.
